Paper co-authored by MSc student Nikos Nikitas on optimizing the internal shuffle operation of Apache Spark, a leading Big Data system, presented at IEEE BigData 2021

We are pleased to announce that the paper titled “Cherry: A Distributed Task-Aware Shuffle Service for Serverless Analytics” was presented at the prestigious IEEE BigData conference, held virtually on December 15-18, 2021.

The paper was co-authored by Nikos Nikitas (MSc student in Data Science and Machine Learning, ECE-NTUA), Ioannis Konstantinou (Assistant Professor, University of Thessaly), Vana Kalogeraki (Professor, Athens University of Economics and Business) and Nectarios Koziris (Professor, ECE-NTUA), and is the result of Nikos’s Master’s thesis.

The paper presents Cherry, an open-source, distributed, task-aware Caching sHuffle sErvice for seRveRless analYtics. Cherry optimizes Apache Spark’s internal shuffle mechanism, which is a typical bottleneck in heavy distributed computations. It introduces a remote, disaggregated storage engine for intermediate files and employs a task-aware look-ahead caching policy that pre-fetches blocks ahead of the computation that needs them. Cherry is built on cloud-native Kubernetes (k8s) technologies and is released as an open-source module.
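
To give a feel for what a task-aware look-ahead caching policy can look like, here is a minimal Python sketch. It is not Cherry's implementation: the class `LookAheadCache`, the function `run`, the dictionary standing in for remote storage, and the fixed task schedule are all hypothetical names and assumptions made for illustration. The sketch assumes the order of upcoming tasks and the blocks each task reads are known in advance, and it pre-fetches synchronously, whereas a real service would presumably fetch in the background.

```python
from collections import OrderedDict


class LookAheadCache:
    """Toy block cache that pre-fetches blocks for upcoming tasks.

    Hypothetical illustration only; not taken from the Cherry code base.
    """

    def __init__(self, remote_store, capacity, look_ahead):
        self.remote_store = remote_store  # dict standing in for remote storage
        self.capacity = capacity          # maximum number of cached blocks
        self.look_ahead = look_ahead      # number of tasks covered by a pre-fetch
        self.cache = OrderedDict()        # block id -> data, oldest use first
        self.hits = 0
        self.misses = 0

    def _load(self, block_id, protected):
        """Copy a block into the cache, evicting unprotected blocks if full."""
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            return
        while len(self.cache) >= self.capacity:
            # Evict the least recently used block that no upcoming task needs.
            victim = next((b for b in self.cache if b not in protected), None)
            if victim is None:
                return  # everything cached is still needed; skip this block
            del self.cache[victim]
        self.cache[block_id] = self.remote_store[block_id]

    def prefetch(self, schedule, next_index):
        """Load the blocks of the next `look_ahead` tasks, starting at next_index."""
        window = schedule[next_index:next_index + self.look_ahead]
        wanted = [block for task in window for block in task]
        protected = set(wanted)
        for block_id in wanted:
            self._load(block_id, protected)

    def read(self, block_id):
        """Return a block, counting whether it was already cached."""
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.misses += 1
        data = self.remote_store[block_id]
        self._load(block_id, {block_id})
        return data


def run(schedule, cache):
    """Run tasks in order, pre-fetching before each task reads its blocks."""
    results = []
    for index, task in enumerate(schedule):
        cache.prefetch(schedule, index)
        results.append([cache.read(block_id) for block_id in task])
    return results


if __name__ == "__main__":
    remote = {f"b{i}": i for i in range(6)}
    schedule = [["b0", "b1"], ["b2", "b3"], ["b4", "b5"]]
    cache = LookAheadCache(remote, capacity=4, look_ahead=2)
    run(schedule, cache)
    print(cache.hits, cache.misses)
```

In this toy setup, setting `look_ahead=0` disables pre-fetching, so every first read of a block becomes a miss. Comparing the hit and miss counters between the two settings illustrates why knowing the task schedule can help a shuffle cache.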