multi-resource packing for cluster schedulers robert grandl aditya akella srikanth kandula ganesh...

12
Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Upload: leon-harrison

Post on 18-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Multi-Resource Packing for

Cluster Schedulers

Robert Grandl Aditya Akella

Srikanth KandulaGanesh AnanthanarayananSriram Rao

Page 2: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Diverse Resource Requirements

Tasks need varying amounts of each resource E.g., Memory [100MB to 17GB] CPU [2% of a core to 6 cores]

Need to match tasks with machines based on resource

Demands for resources are not correlated Correlation coefficient across resource demands [–0.11, 0.33]

Page 3: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Current Schedulers do not Pack

Resources allocated in terms of “slots”

Resource Fragmentation

T1: 2 GB

T2: 2 GB

T3: 4 GB

4 GB Memory Machine A

4 GB Memory Machine B

Current Schedulers Packer Schedulers

T1: 2 GB

T2: 2 GB

T3: 4 GB

4 GB Memory Machine A

4 GB Memory Machine B

Page 4: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Current Schedulers do not Pack

Resources allocated in terms of “slots”

Over-allocation

20 MB/sIn Nw.

T1: 2 GB Mem

Current Schedulers

4 GB Memory 20 MB/s In Nw.

Machine A

20 MB/sIn Nw.

T2: 2 GB Mem

20 MB/sIn Nw.

20 MB/sIn Nw.

Packer Schedulers

20 MB/sIn Nw.

T1: 2 GB Mem

4 GB Memory 20 MB/s In Nw.

Machine A

20 MB/sIn Nw.

T2: 2 GB Mem

T3: 2 GB Mem

Page 5: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Current Schedulers do not Pack

Slots allocated purely on fairness

considerations

Cluster [18 Cores, 36 GB] / Job: [Task Prof.], # tasks

A [1 Core, 2 GB], 18

B [3 Cores, 1 GB], 6

C [3 Cores, 1 GB], 6

6 tasks

2 tasks

2 tasks

18cores

16 GB

6 tasks

2 tasks

2 tasks

6 tasks

2 tasks

2 tasks

18cores

16 GB

18cores

16 GB

A

B

Ct 2t 3t

18 tasks

0 tasks

0 tasks

18cores

36 GB

6 tasks

0 tasks 6 tasks

18cores

6 GB

18cores

6 GB

A

B

Ct 2t 3t

Durations:

DRF Packer

Resources used: DRF share = 1/3 Resources used: Packer

Current Schedulers Packer Schedulers

A: 3tB: 3tC: 3t

Durations:A: tB: 2tC: 3t

33% improvement

Page 6: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

It is all about packing ?

Multi-dimensional bin packing is NP-hard for #dimens. ≥ 2 Several heuristics proposed But they do not apply here … size of the ball, contiguity of allocation, resource demands are elastic in time

Will perfect packing suffice ?

Competing objectives: Cluster utilization vs. Job completion times vs.

Fairness

Page 7: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Something reasonably simple and which can be applied

Intuition behind the solution

Cluster efficiency Job completion time

Cluster efficiency Fairness

Page 8: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Tetris

Pack tasks along multiple resources Cosine similarity between task demand vector

and machine resource vector

Multi-resource version of SRTF Favor jobs with small remaining duration

and small resource consumption

Incorporate Fairness Fairness knob (0, 1] f → 0 close to perfect fairness f = 1 most efficient scheduling

A

T

F

1: while (resources R are free)2: among FJ jobs furthest from fair share3: score (j) = 4: max task t in j, demand(t) ≤ R A(t, R) + T(j)5: pick j*, t* = argmax score(j)6: R = R – demand(t*) 7: end while

(simplified) Scheduling procedure

Page 9: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Learning task requirements From tasks that have finished in the same phase Coefficient of variation [0.022, 0.41] Collecting statistics from recurring jobs

Task Requirements and resource usages

Resource Tracker measure actual usage of resources enforce allocations aware of activities on the cluster other than

tasks assignment: ingest and evacuation

Peak usage demands estimates for tasks

Machine - In Network

850

1024

0

512

MBy

tes /

sec

Time (sec)In Network UsedIn Network Free Task In Network Estimates

Resource Tracker

Page 10: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Prototype atop Hadoop 2.3

Evaluation

Tetris as a pluggable scheduler to RM Implement RT as a NM service Modified AM/RM resource allocation

protocol

Cluster capacity: 250 Nodes

4 hour synthetic workload

60 jobs with complementary task demands

Reduction (%) in Job Duration - DRF

CDF

Reduces average job duration by up to 40%

Reduces makespan by 39%

Large scale evaluation

Page 11: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Evaluation

Facebook production traces analysis

Fairness knob: fewer than 6% of jobs slow down; by not more than 8% on average

Knob value of 0.75 offers nearly the best possible efficiency with little unfairness

Trace-driven simulation

Fairness Knob - Fair

Slow

dow

n (%

)

Fairness Knob - DRF

Slow

dow

n (%

)

Page 12: Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao

Conclusion

Identify the importance of scheduling all relevant resources in a cluster

Resource Fragmentation

Over-allocation and Interference

New scheduler that pack tasks along multiple resources

Reduce makespan

Job Completion Time

Enable a trade-off between packing efficiency and fairness

Fairness Knob

Come and see our poster !