Introduction Off-Line Problem On-Line Problem Summary Appendix
Dynamic Fractional Resource Scheduling for Cluster Platforms
Mark Stillwell
Department of Information and Computer Sciences
University of Hawai’i at Manoa
Achievement Rewards for College Scientists
2010 Scholarship Awards Program
Mark Stillwell UH Manoa ICS
DFRS for Clusters
Clusters
Definition: A cluster is a group of independent computers, or nodes, working together closely, usually connected by a high-speed network.
Jobs
- The system can accept user requests to run jobs, or the administrator can instantiate jobs directly
- Running jobs are made up of nearly identical tasks
  - The number of tasks is specified by the user/administrator
  - Tasks can block while communicating with each other
- The assignment of resources to tasks is called scheduling
Current Approaches
- Service Hosting (Off-Line Scheduling)
  - Traditionally, dedicated machines
  - Current interest in server consolidation
  - Few good theoretical models
  - Heavy “engineering” bias
- High-Performance Computing (On-Line Scheduling)
  - Usually First-Come-First-Served (FCFS) with backfilling
  - Backfilling needs (unreliable) compute time estimates
  - Unbounded wait times
  - Inefficient use of nodes/resources
Our Proposal
- Use virtual machine technology
  - Multiple tasks on one node
  - Performance isolation
  - Sharing of fractional resources
- Define a run-time computable metric that captures notions of performance and fairness
- Design heuristics that allocate resources to jobs while explicitly trying to achieve high ratings by our metric
Requirements, Needs, and Yield
- Tasks have memory requirements and CPU needs
- All tasks of a job have the same requirements and needs
- For a task to be placed on a node, the node must have available memory at least equal to the task's requirement
- A task can be allocated less CPU than its need; the ratio of the allocation to the need is the yield
- All tasks of a job must have the same yield, so we can also speak of the yield of a job
- The yield of a job gives its performance relative to running on a dedicated system
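The definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the `Task` class, the field names, and the convention that memory and CPU are expressed as fractions of one node are assumptions made here, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    mem_req: float   # memory requirement, as a fraction of one node (hard constraint)
    cpu_need: float  # CPU need, as a fraction of one node (soft: may be under-allocated)


def fits(task: Task, free_mem: float) -> bool:
    """A task may be placed only if the node has enough free memory."""
    return task.mem_req <= free_mem


def task_yield(task: Task, cpu_alloc: float) -> float:
    """Yield = CPU allocation / CPU need, capped at 1 (no benefit beyond the need)."""
    if task.cpu_need <= 0:
        raise ValueError("cpu_need must be positive")
    return min(cpu_alloc, task.cpu_need) / task.cpu_need


t = Task(mem_req=0.25, cpu_need=0.5)
print(fits(t, free_mem=0.5))          # True
print(task_yield(t, cpu_alloc=0.25))  # 0.5
```

A yield of 0.5 here means the task receives half the CPU it would use on a dedicated node.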
Off-Line Problem Assumptions
- Steady-state execution with infinite jobs
  - Models an ideal service hosting environment
  - Makes the problem more tractable [Marchal et al., 2006]
  - Good when job duration is longer than schedule time
- Jobs are serial (single task)
Objective Function
- The performance of a job is correlated with its yield
- Maximizing the average or sum of the yields may be unfair to some jobs
- Instead, we seek to maximize the minimum yield
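A small, made-up example shows why the sum of yields is a poor objective. Two jobs each need a full CPU and share one node; the allocations below are hypothetical numbers chosen for illustration.

```python
def yields(allocs, needs):
    """Per-job yield: CPU allocation divided by CPU need."""
    return [a / n for a, n in zip(allocs, needs)]


needs = [1.0, 1.0]                      # two jobs, each needing one full CPU
starving = yields([1.0, 0.0], needs)    # job 2 gets nothing
even = yields([0.5, 0.5], needs)        # CPU split evenly

print(sum(starving), min(starving))  # 1.0 0.0
print(sum(even), min(even))          # 1.0 0.5
```

Both allocations have the same sum of yields, so a sum (or average) objective cannot tell them apart, while the minimum yield clearly prefers the even split.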
Task Placement Heuristics
- Greedy Task Placement – Incremental, puts each task on the node with the lowest computational load on which it can fit without violating memory constraints
- Randomized Rounding Task Placement – Based on relaxing an MILP to an LP and rounding probabilistically
- MCB Task Placement – Global, iteratively applies multi-capacity (vector) bin-packing heuristics during a binary search for the maximized minimum yield
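The greedy and binary-search ideas can be sketched as follows. This is a rough illustration under several assumptions: tasks are `(mem_req, cpu_need)` tuples, every node has unit memory and unit CPU, and the packing step uses a plain first-fit-decreasing rule as a stand-in. It is not the MCB8 heuristic from the talk, and randomized rounding is not shown.

```python
def greedy_place(tasks, num_nodes):
    """Put each task on the least-loaded node that has enough free memory.

    Returns (placement, load); placement is None if some task fits nowhere.
    """
    free_mem = [1.0] * num_nodes
    load = [0.0] * num_nodes
    placement = []
    for mem_req, cpu_need in tasks:
        candidates = [n for n in range(num_nodes) if mem_req <= free_mem[n]]
        if not candidates:
            return None, load
        best = min(candidates, key=lambda n: load[n])
        free_mem[best] -= mem_req
        load[best] += cpu_need
        placement.append(best)
    return placement, load


def min_yield(load):
    """With one common yield Y per node, Y * load <= 1, so Y = min(1, 1 / load)."""
    worst = max(load)
    return 1.0 if worst <= 1.0 else 1.0 / worst


def packs(tasks, num_nodes, y):
    """Stand-in vector packing: first fit, items sorted by largest dimension."""
    items = sorted(((m, y * c) for m, c in tasks), key=max, reverse=True)
    bins = [[0.0, 0.0] for _ in range(num_nodes)]
    for mem, cpu in items:
        for b in bins:
            if b[0] + mem <= 1.0 and b[1] + cpu <= 1.0:
                b[0] += mem
                b[1] += cpu
                break
        else:
            return False  # no bin can hold this item
    return True


def search_min_yield(tasks, num_nodes, tol=1e-3):
    """Binary search for the largest yield at which the packing heuristic succeeds."""
    if not packs(tasks, num_nodes, 0.0):
        return None  # memory alone does not fit
    if packs(tasks, num_nodes, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if packs(tasks, num_nodes, mid):
            lo = mid
        else:
            hi = mid
    return lo


tasks = [(0.5, 1.0), (0.5, 0.5), (0.25, 0.5), (0.5, 1.0)]
placement, load = greedy_place(tasks, num_nodes=2)
print(placement, min_yield(load))  # [0, 1, 1, 0] 0.5
print(search_min_yield(tasks, num_nodes=2))
```

On this hand-picked instance the search-based sketch ends just below 2/3, while the greedy placement yields 0.5. One instance proves nothing in general; it only illustrates how a global repacking can beat an incremental choice.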
Large Problem Set: Minimum Yield vs. Free Memory
[Figure: minimum yield (y-axis, 0.3 to 0.55) vs. slack (x-axis, 0.1 to 0.9). Legend: bound, MCB8, GR, GB, SG, SGB, RRND, RRNZ.]
Conclusions
- The problem is tractable
- Multi-capacity bin packing algorithms perform well
- Greedy algorithms perform okay, and are very fast
- Randomized rounding approaches are not very promising
On-Line Problem Assumptions
- Job arrival/completion times are not known in advance
- Jobs are temporary
  - The user wants a final result
  - Quick turnaround relative to runtime is desired
- Jobs are not interactive
  - So they can wait until resources are available to start
- We avoid the use of runtime estimates
Stretch
- Our goal: minimize maximum stretch (aka slowdown)
  - Stretch: the time a job spends in the system divided by the time that would be spent in a dedicated system [Bender et al., 1998]
  - Popular to quantify schedule quality post-mortem
  - Not generally used to make scheduling decisions
  - Runtime computation requires (unreliable) user estimates
  - Minimizing average stretch is prone to starvation
  - Minimizing maximum stretch captures notions of both performance and fairness
- Our approach: try to maximize minimum yield
  - Similar to, but not the same as, minimizing maximum stretch
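The stretch definition above can be computed post-mortem as in the following sketch. The job tuples `(release, completion, dedicated_runtime)` and their values are hypothetical, chosen only to illustrate the metric.

```python
def stretch(release, completion, dedicated_runtime):
    """Time spent in the system divided by the runtime on a dedicated system."""
    return (completion - release) / dedicated_runtime


# Hypothetical finished jobs: (release time, completion time, dedicated runtime)
jobs = [(0.0, 20.0, 10.0), (5.0, 10.0, 5.0), (0.0, 40.0, 10.0)]

stretches = [stretch(*job) for job in jobs]
print(stretches)       # [2.0, 1.0, 4.0]
print(max(stretches))  # 4.0
```

A job that starts as soon as it is released and runs at a constant yield y takes 1/y times its dedicated runtime, so its stretch is 1/y. Waiting time and changing yields break this equivalence, which is one way to see why maximizing minimum yield is similar to, but not the same as, minimizing maximum stretch.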
Max Stretch Degradation vs. Load, no migration cost
[Figure: stretch degradation factor (y-axis, log scale, 1 to 1000) vs. load (x-axis, 0.1 to 0.9). Legend: EASY, FCFS, GREEDY, GREEDYP, GREEDYPM, DynMCB8, MCB8P 600, MCB8PG 600.]
Conclusions
- DFRS approaches can significantly outperform traditional approaches
- Aggressive repacking can lead to much better resource allocations
  - But also to heavy migration costs
- Greedy migration is not that useful
Summary
- We have proposed a novel approach to job scheduling on clusters, Dynamic Fractional Resource Scheduling, that makes use of modern virtual machine technology and seeks to optimize a runtime-computable, user-centric measure of performance called the minimum yield
- Multi-capacity bin packing heuristics can be used to find good solutions
- Our approach avoids the use of unreliable runtime estimates
- This approach has the potential to lead to order-of-magnitude improvements in performance over current technology
References I
Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. (1998). Flow and Stretch Metrics for Scheduling Continuous Job Streams. In Proc. of the 9th ACM-SIAM Symp. On Discrete Algorithms, pages 270–279.
Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006). Steady-state scheduling of multiple divisible load applications on wide-area distributed computing platforms. Intl. J. of High Performance Computing Applications, 20(3):365–381.
References II
Stillwell, M., Schanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In CCGrid, pages 260–267. IEEE.
Stillwell, M., Vivien, F., and Casanova, H. (2010). Dynamic fractional resource scheduling for HPC workloads. In IPDPS. To appear.