Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 16 Jun 2014]
Title: OS4M: Achieving Global Load Balance of MapReduce Workload by Scheduling at the Operation Level
Abstract: The efficiency of MapReduce is closely related to its load balance. Existing work on MapReduce load balance focuses on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations, where each operation represents one invocation of the Map or Reduce function. By default, MapReduce uses a hash-based method to schedule Reduce operations, which often leads to poor load balance. In addition, the copy phase of Reduce tasks overlaps with Map tasks, which significantly hinders the progress of Map tasks because of I/O contention. Moreover, the three phases of Reduce tasks run in sequence while consuming different resources, which leaves resources under-utilized. To overcome these problems, we introduce a set of mechanisms named OS4M (Operation Scheduling for MapReduce) to improve MapReduce's performance. OS4M achieves load balance by collecting statistics on all Map operations and calculating a globally optimal schedule for distributing Reduce operations. With OS4M, the copy phase of Reduce tasks no longer overlaps with Map tasks, and the three phases of Reduce tasks are pipelined based on their operation loads. OS4M has been transparently incorporated into MapReduce. Evaluations on standard benchmarks show that OS4M shortens job duration by up to 42% compared with a Hadoop baseline.
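The contrast between hash-based placement and load-aware placement of Reduce operations can be illustrated with a small Python sketch. It is not OS4M's algorithm, which the abstract does not specify: the greedy longest-processing-time-first (LPT) heuristic is swapped in as a simple stand-in for a schedule computed from collected statistics, and the names `hash_schedule`, `lpt_schedule`, and `key_loads`, along with the per-key load figures, are assumptions made for illustration.

```python
import heapq
import zlib


def hash_schedule(key_loads, num_reducers):
    """Place each key by a hash of the key alone, ignoring its load."""
    loads = [0] * num_reducers
    for key, load in key_loads.items():
        # crc32 is a deterministic stand-in for a partitioner's hash function.
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += load
    return loads


def lpt_schedule(key_loads, num_reducers):
    """Greedy LPT: give the heaviest remaining key to the lightest reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; break ties by key so the result is deterministic.
    for key, load in sorted(key_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (total + load, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_loads[key]
    return assignment, loads


# Hypothetical per-key Reduce loads, as if gathered after all Map operations.
key_loads = {"a": 90, "b": 60, "c": 50, "d": 40, "e": 30, "f": 30}

hashed_loads = hash_schedule(key_loads, 3)
assignment, balanced_loads = lpt_schedule(key_loads, 3)
print("hash-based loads:", hashed_loads)
print("load-aware loads:", balanced_loads)
```

The job finishes only when the most heavily loaded reducer finishes, so the maximum entry of each load list is the figure to compare. LPT is a heuristic and does not guarantee an optimal schedule; it only shows how knowing the per-key loads in advance makes a balanced placement possible, which a hash of the key alone cannot do.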