Efficient Job Scheduling for Big Data

Job scheduling in Big Data clusters is crucial both for cluster operators’ return on investment and for overall user experience. In this context, we observe several anomalies in how modern cluster schedulers manage queues, and argue that maintaining queues of tasks at worker nodes has significant benefits. On the one hand, centralized approaches do not use worker-side queues. Given the inherent feedback delays that these systems incur, they achieve sub-optimal cluster utilization, particularly for workloads dominated by short tasks. On the other hand, distributed schedulers typically do employ worker-side queuing, and achieve higher cluster utilization. However, because they lack cluster-wide information, they fail to place tasks at the best possible machine, which leads to worse job completion times, especially for heterogeneous workloads. To the best of our knowledge, this is the first work to provide principled solutions to the above problems by introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks to queues. We instantiate our techniques by extending both a centralized (YARN) and a distributed (Mercury) scheduler, and evaluate their performance on a wide variety of synthetic and production workloads derived from Microsoft clusters. Our centralized implementation, Yaq-c, achieves a 1.7x improvement in median job completion time compared to YARN, and our distributed one, Yaq-d, achieves a 9.3x improvement over an implementation of Sparrow’s batch sampling on Mercury.
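To make the queue management techniques named above more concrete, here is a minimal Python sketch of a bounded worker-side queue. It is an illustration under stated assumptions, not the Yaq-c or Yaq-d implementation: the names `Task`, `WorkerQueue`, and `place`, the shortest-estimate-first reordering rule, the fixed `capacity`, the `max_wait` starvation bound, and the least-backlog placement rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    est_duration: float   # hypothetical run-time estimate, in seconds
    enqueued_at: float    # time at which the task entered the worker queue


@dataclass
class WorkerQueue:
    capacity: int         # queue sizing: assumed to be a fixed bound here
    max_wait: float       # starvation bound: older tasks are served first
    tasks: List[Task] = field(default_factory=list)

    def offer(self, task: Task) -> bool:
        """Accept the task only if the bounded queue still has room."""
        if len(self.tasks) >= self.capacity:
            return False
        self.tasks.append(task)
        return True

    def backlog(self) -> float:
        """Total estimated work currently waiting in this queue."""
        return sum(t.est_duration for t in self.tasks)

    def next_task(self, now: float) -> Optional[Task]:
        """Reorder on dequeue: starved tasks first, else shortest estimate."""
        if not self.tasks:
            return None
        starved = [t for t in self.tasks if now - t.enqueued_at >= self.max_wait]
        if starved:
            # Starvation freedom: the longest-waiting starved task runs next.
            chosen = min(starved, key=lambda t: t.enqueued_at)
        else:
            # Prioritization: assumed shortest-estimated-task-first policy.
            chosen = min(self.tasks, key=lambda t: t.est_duration)
        self.tasks.remove(chosen)
        return chosen


def place(task: Task, queues: List[WorkerQueue]) -> Optional[WorkerQueue]:
    """Placement: pick the non-full queue with the least estimated backlog."""
    candidates = [q for q in queues if len(q.tasks) < q.capacity]
    if not candidates:
        return None
    best = min(candidates, key=lambda q: q.backlog())
    best.offer(task)
    return best
```

In this sketch a short task overtakes a long one that arrived earlier, but once the long task has waited at least `max_wait` seconds it is served ahead of newer short tasks, so it cannot be postponed indefinitely.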


People Involved

Jeff Rasley

PhD Student (2012)

Distributed Systems, Networking

Rodrigo Fonseca

Assistant Professor of Computer Science

Group Director

Konstantinos Karanasos

Collaborator

Microsoft CISL

Srikanth Kandula

Collaborator

Microsoft Research Redmond

Milan Vojnovic

Collaborator

Microsoft Research, Cambridge

Sriram Rao

Collaborator

Microsoft CISL

Publications
