Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing 72150263 심윤석


Upload: fay-holland

Post on 19-Jan-2016


Page 1: APOLLO : SCALABLE AND COORDINATED SCHEDULING FOR CLOUD-SCALE COMPUTING 72150263 심윤석

APOLLO: SCALABLE AND COORDINATED SCHEDULING FOR CLOUD-SCALE COMPUTING

72150263 심윤석

Page 2:

INDEX

• Background

• Goals & Challenges of Apollo

• Apollo Framework

• Evaluation

• Conclusion

Page 3:

BACKGROUND

[Figure: a SCOPE script is compiled into a job, represented as a DAG (directed acyclic graph) of stages, each made up of tasks. Diagram labels: Job, Stage, Task, SCOPE, Compile, "150 DOG".]

Page 4:

BACKGROUND

Page 5:

GOALS & CHALLENGES

• Minimize Job Latency & Maximize Cluster Utilization

• Challenges
  • Scaling
  • Heterogeneous workload
  • Maximize Resource Utilization

Page 6:

GOALS & CHALLENGES

• Scale
  • Jobs process GB to PB of data
  • 100,000 scheduling requests/sec (at peak)
  • Clusters contain over 20,000 servers
  • Clusters run up to 170,000 tasks in parallel

Page 7:

GOALS & CHALLENGES

• Heterogeneous workload
  • Short (Seconds) & Long (Hours) Execution Time
  • I/O bound, CPU bound
  • Various Resource Requirements (e.g. Memory, Cores)
  • Data Locality (Long Task) & Scheduling Latency (Short Task)

Page 8:

GOALS & CHALLENGES

• Maximize Utilization
  • Workload Fluctuates Regularly
  • Especially CPU Utilization

Page 9:

APOLLO FRAMEWORK

Page 10:

APOLLO FRAMEWORK

Distributed and Coordinated Scheduler

Page 11:

APOLLO FRAMEWORK

Estimation-Based Scheduling

Page 12:

APOLLO FRAMEWORK

Wait-Time Update

Page 13:

APOLLO FRAMEWORK

• Wait-Time Matrix
  • Represents server load
  • Lightweight
  • Expected Wait Time
  • Future Resource Availability
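
A minimal sketch of how a scheduler might read such a matrix, assuming rows index CPU-core buckets, columns index memory buckets, and each cell holds the expected wait in seconds. The class name `WaitTimeMatrix`, the bucket values, and the round-up lookup are illustrative assumptions, not Apollo's actual data layout.

```python
from bisect import bisect_left


class WaitTimeMatrix:
    """Hypothetical per-server matrix: expected wait (seconds) per resource bucket."""

    def __init__(self, cpu_buckets, mem_buckets, waits):
        self.cpu_buckets = cpu_buckets  # ascending core counts
        self.mem_buckets = mem_buckets  # ascending memory sizes (GB)
        self.waits = waits              # waits[i][j] for cpu_buckets[i], mem_buckets[j]

    def expected_wait(self, cpu, mem):
        # Round the request up to the smallest bucket that can hold it.
        i = bisect_left(self.cpu_buckets, cpu)
        j = bisect_left(self.mem_buckets, mem)
        if i == len(self.cpu_buckets) or j == len(self.mem_buckets):
            raise ValueError("request exceeds the largest bucket")
        return self.waits[i][j]


matrix = WaitTimeMatrix(
    cpu_buckets=[1, 2, 4],
    mem_buckets=[4, 8, 16],
    waits=[[0, 0, 10], [0, 5, 20], [15, 30, 60]],
)
# A 3-core, 5 GB request is looked up in the (4 cores, 8 GB) cell.
wait = matrix.expected_wait(cpu=3, mem=5)
```

Keeping the structure to a small fixed grid is what makes it cheap to publish and aggregate, which fits the "lightweight" point above.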

Page 14:

APOLLO FRAMEWORK

• Estimation-Based Scheduling
  • Minimizes Task Completion Time
  • Stable matching algorithm
  • Task Completion Time Equation
    • E: Estimated Task Completion Time, I: Initialization Time, W: Wait Time, R: Runtime
  • Includes Server Failure Cost
    • C: Final Estimated Completion Time, P_succ: Success Probability, K: Server Failure Penalty

$$E = I + W + R$$

$$C = P_{succ} E + K (1 - P_{succ}) E$$
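
The two formulas on this slide, and the idea of matching tasks to servers by estimated completion time, can be sketched as follows. This is a simplified illustration under stated assumptions: `estimate_completion` follows the slide's symbols directly, while `stable_match` is a textbook deferred-acceptance matching (one task per server, lower cost preferred on both sides), not necessarily the exact variant Apollo uses. All function and parameter names are hypothetical.

```python
def estimate_completion(init, wait, runtime, p_succ, penalty):
    """Return C = P_succ * E + K * (1 - P_succ) * E, where E = I + W + R."""
    e = init + wait + runtime
    return p_succ * e + penalty * (1.0 - p_succ) * e


def stable_match(cost):
    """Match tasks to servers given cost[t][s], the estimated completion time.

    Tasks propose to servers in order of increasing cost; a server keeps the
    proposer with the lowest cost on it. Returns {task_index: server_index};
    tasks that exhaust every server stay unmatched.
    """
    n_tasks, n_servers = len(cost), len(cost[0])
    prefs = [sorted(range(n_servers), key=lambda s: cost[t][s])
             for t in range(n_tasks)]
    next_choice = [0] * n_tasks
    holder = {}                      # server -> task currently held
    free = list(range(n_tasks))
    while free:
        t = free.pop()
        if next_choice[t] >= n_servers:
            continue                 # no server left to propose to
        s = prefs[t][next_choice[t]]
        next_choice[t] += 1
        current = holder.get(s)
        if current is None:
            holder[s] = t
        elif cost[t][s] < cost[current][s]:
            holder[s] = t            # server trades up; old task proposes again
            free.append(current)
        else:
            free.append(t)           # rejected; try the next preferred server
    return {t: s for s, t in holder.items()}


# Example: E = 2 + 3 + 5 = 10, so C = 0.9 * 10 + 2.0 * 0.1 * 10.
c = estimate_completion(init=2, wait=3, runtime=5, p_succ=0.9, penalty=2.0)
assignment = stable_match([[1, 5], [2, 3]])
```

With a penalty K greater than 1, a server that is more likely to fail yields a larger C and is therefore less attractive even when its E is the same.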

Page 15:

APOLLO FRAMEWORK

• Distributed and Coordinated Scheduler
  • One scheduler per job
  • Each scheduler makes independent decisions based on global status
  • Conflicts can occur
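
A small illustration of why conflicts arise, assuming each job's scheduler simply picks the server with the lowest expected wait from the same shared snapshot. The server names, wait values, and `pick_server` helper are made up for the example.

```python
def pick_server(wait_times):
    """Pick the server with the lowest expected wait (seconds)."""
    return min(wait_times, key=wait_times.get)


# Both schedulers see the same global status at the same moment.
snapshot = {"s1": 4.0, "s2": 1.0, "s3": 3.0}
choice_a = pick_server(snapshot)        # scheduler for job A
choice_b = pick_server(dict(snapshot))  # scheduler for job B, own copy

# Independent decisions on identical information collide on one server.
conflict = choice_a == choice_b
```

Neither scheduler is wrong given what it knew; the conflict only becomes visible once both tasks reach the server's queue.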

Page 16:

APOLLO FRAMEWORK

• Correcting Conflicts (Correction Mechanism)
  • Re-evaluates prior scheduling decisions
  • Duplicate Scheduling
  • Confidence

• Scattering completion time
  • Randomization
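
Two of these ideas can be sketched in a few lines. The code below is an assumption-laden illustration: the jitter range, the 20% threshold, and both function names are hypothetical and are not taken from Apollo.

```python
import random


def jittered_estimate(estimate, rng, max_jitter=0.05):
    """Randomization: inflate an estimate by a small random factor so that
    schedulers with identical information tend to pick different servers."""
    return estimate * (1.0 + rng.uniform(0.0, max_jitter))


def should_duplicate(original, updated, best_alternative, threshold=0.2):
    """Duplicate scheduling: re-evaluate a queued task and schedule a copy
    elsewhere only if its estimate got clearly worse AND another server now
    looks clearly better. All arguments are completion-time estimates."""
    got_worse = updated > original * (1.0 + threshold)
    alternative_better = best_alternative < updated * (1.0 - threshold)
    return got_worse and alternative_better


rng = random.Random(0)  # seeded only to make the example repeatable
scattered = jittered_estimate(100.0, rng)
retry = should_duplicate(original=10.0, updated=20.0, best_alternative=8.0)
keep = should_duplicate(original=10.0, updated=11.0, best_alternative=8.0)
```

Requiring both conditions keeps duplicates rare, so correction adds little extra load to the cluster.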

Page 17:

APOLLO FRAMEWORK

• Opportunistic Scheduling
  • Maximize Utilization
  • Randomized scheduling for fairness

• Opportunistic Task
  • Can be preempted
  • Can be upgraded to a regular task
  • Only consumes idle resources

Opportunistic tasks can use resources when no regular task is present
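
The preemption rule can be illustrated with a toy server model. This is a sketch under simplifying assumptions: the only resource is CPU cores, and the `Task` and `Server` classes are hypothetical. Upgrading an opportunistic task to a regular one is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cores: int
    opportunistic: bool = False


@dataclass
class Server:
    capacity: int
    running: list = field(default_factory=list)

    def free_cores(self):
        return self.capacity - sum(t.cores for t in self.running)

    def submit(self, task):
        """Start a task. Returns the list of preempted tasks, or None if rejected."""
        if task.cores <= self.free_cores():
            self.running.append(task)
            return []
        if task.opportunistic:
            return None  # opportunistic tasks only consume idle resources
        # Regular task: preempt opportunistic tasks until it fits.
        preempted = []
        for victim in [t for t in self.running if t.opportunistic]:
            if task.cores <= self.free_cores():
                break
            self.running.remove(victim)
            preempted.append(victim)
        if task.cores > self.free_cores():
            self.running.extend(preempted)  # still does not fit: undo and reject
            return None
        self.running.append(task)
        return preempted


server = Server(capacity=4)
server.submit(Task("regular-1", 2))
filler = Task("filler", 2, opportunistic=True)
server.submit(filler)                      # uses the idle half of the server
evicted = server.submit(Task("regular-2", 2))  # preempts the filler task
```

Regular tasks never wait behind opportunistic ones in this model, which is what lets idle capacity be used without hurting regular work.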

Page 18:

EVALUATION

• Apollo at Scale

• Scheduling Quality

• Evaluating Estimated Completion Time

• Correction Effectiveness

• Stable Matching Efficiency

Page 19:

EVALUATION

• Apollo at Scale
  • Runs 170,000 tasks in parallel
  • Tracks 14,000,000 pending tasks
  • Well utilized on weekdays (90% median CPU utilization)

Page 20:

EVALUATION

• Scheduling Quality
  • 80% of recurring jobs run faster
  • Significantly improved wait time
  • Performance close to the oracle scheduler (no scheduling latency, conflicts, failures …)

Page 21:

EVALUATION

• Evaluating Estimated Completion Time

Page 22:

EVALUATION

• Correction Effectiveness

• 82% Success rate

• < 0.5% Trigger rate

• Stable matching Efficiency

Page 23:

CONCLUSION

• Minimize Job Latency
  • Loosely Coordinated Distributed Scheduler
  • High Quality Scheduling

• Maximize Cluster Utilization
  • Opportunistic Scheduling

Page 24:

REFERENCE

• https://www.usenix.org/conference/osdi14/technical-sessions/presentation/boutin

• https://www.usenix.org/sites/default/files/conference/protected-files/osdi14_slides_boutin.pdf