Scheduling Strategies for Mapping Application Workflows Onto the Grid
A. Mandal, K. Kennedy, C. Koelbel, G. Marin, J. Mellor-Crummey, B. Liu, L. Johnsson
The Forest
Performance Prediction (G. Marin, 2004) + Scheduling Heuristics (T. Braun, 1999)
Static Schedule for Workflow Components
Environment
• GrADSoft
– Runs on top of Globus
– Facilitates scheduling, launching, and monitoring of grid apps
• Extend GrADSoft to deal with workflows (not only tightly coupled apps)
What’s a workflow?
• A set of applications (workflow components) that must be run in a specific order
• DAG – Directed Acyclic Graph
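To make the DAG view concrete, here is a minimal sketch of a workflow as a dependency map, with one valid run order derived from it. The component names are hypothetical and do not come from the talk; `graphlib` is in the Python standard library from 3.9 onward.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a component, each value is the set of
# components that must finish before it can start.
workflow = {
    "preprocess": set(),
    "classify": {"preprocess"},
    "align": {"preprocess"},
    "reconstruct": {"classify", "align"},
}


def valid_order(dag):
    """Return one execution order that respects every dependency edge."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # i.e. if the input is not actually a DAG.
    return list(TopologicalSorter(dag).static_order())


order = valid_order(workflow)
print(order)
```

Any order it returns runs "preprocess" first and "reconstruct" last; "classify" and "align" are independent of each other, so they could run in parallel on different machines.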
Workflow Scheduling
• Condor DAGMan – dynamic, effectively random scheduling
• This approach is to do static scheduling
– Classic problem: given a set of machines, a set of jobs, and the performance of each job on each machine, schedule all jobs so as to minimize the makespan
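The objective in the classic problem can be sketched as follows. This assumes independent jobs that run one after another on their assigned machine; the job names, machine names, and run times are made up for illustration.

```python
def makespan(assignment, etc):
    """Finish time of the busiest machine.

    assignment maps job -> machine; etc[job][machine] is the expected
    run time of that job on that machine.
    """
    load = {}
    for job, machine in assignment.items():
        load[machine] = load.get(machine, 0.0) + etc[job][machine]
    return max(load.values(), default=0.0)


# Hypothetical run times in seconds.
etc = {
    "j1": {"m1": 3.0, "m2": 5.0},
    "j2": {"m1": 4.0, "m2": 2.0},
    "j3": {"m1": 2.0, "m2": 6.0},
}

all_on_m1 = makespan({"j1": "m1", "j2": "m1", "j3": "m1"}, etc)
spread = makespan({"j1": "m1", "j2": "m2", "j3": "m1"}, etc)
print(all_on_m1, spread)
```

Moving j2 to the machine where it runs fastest shortens the schedule, which is the effect a static scheduler tries to find systematically.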
Determining Machine Fitness
• Marin and Mellor-Crummey’s performance models
– For each workflow component and target machine, produce a performance model
– Advantage of performance models over cycle-accurate simulations!
• Add data transfer penalty (using Network Weather Service)
• We now have the expected time to completion (ETC) of every task on every machine.
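A minimal sketch of how such an ETC table might be assembled. The cost formula (latency plus input size over bandwidth) and all names and numbers are assumptions for illustration; the slides only say that a data transfer penalty from Network Weather Service measurements is added to the modeled run time.

```python
def build_etc(compute_time, input_bytes, bandwidth, latency):
    """Combine predicted compute time with an estimated transfer penalty.

    compute_time[task][machine]: seconds predicted by a performance model
    input_bytes[task]:           size of the task's input data
    bandwidth[machine]:          forecast bytes/second to that machine
    latency[machine]:            forecast seconds of latency to that machine
    """
    etc = {}
    for task, per_machine in compute_time.items():
        etc[task] = {}
        for machine, seconds in per_machine.items():
            # Assumed penalty model: latency + size / bandwidth.
            transfer = latency[machine] + input_bytes[task] / bandwidth[machine]
            etc[task][machine] = seconds + transfer
    return etc


# Hypothetical inputs: machine "b" computes faster but sits behind a slow link.
etc = build_etc(
    compute_time={"t": {"a": 10.0, "b": 6.0}},
    input_bytes={"t": 100e6},
    bandwidth={"a": 100e6, "b": 10e6},
    latency={"a": 0.0, "b": 0.5},
)
print(etc)
```

In this example the faster machine loses once the transfer penalty is included, which is why the penalty belongs in the table the scheduler sees.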
Minimum Multiprocessor Scheduling Problem
• Classic problem is NP-Complete
• Use traditional heuristics:
– Min-Min – Schedule minimum-length job
– Max-Min – Schedule maximum-length job
– Sufferage – Schedule job with most to lose by waiting
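The three heuristics differ only in which job they pick next, so they can share one loop. The sketch below follows the common textbook formulation for independent jobs and uses a simplified Sufferage (second-best minus best completion time); it is not the authors' implementation, and the ETC values are invented. For a workflow, one plausible use is to apply it to the components whose predecessors have already been scheduled.

```python
def schedule(etc, machines, heuristic="min-min"):
    """Greedy batch mapping of independent tasks onto machines.

    etc[task][machine] is the expected time to completion. Returns the
    task -> machine assignment and the resulting makespan.
    """
    ready = {m: 0.0 for m in machines}  # time at which each machine is free
    unscheduled = set(etc)
    assignment = {}
    while unscheduled:
        candidates = []
        for task in sorted(unscheduled):
            # Completion time of this task on every machine, best first.
            times = sorted((ready[m] + etc[task][m], m) for m in machines)
            best_time, best_machine = times[0]
            # Sufferage: how much the task loses if denied its best machine.
            sufferage = times[1][0] - best_time if len(times) > 1 else 0.0
            candidates.append((task, best_machine, best_time, sufferage))
        if heuristic == "min-min":
            chosen = min(candidates, key=lambda c: c[2])
        elif heuristic == "max-min":
            chosen = max(candidates, key=lambda c: c[2])
        elif heuristic == "sufferage":
            chosen = max(candidates, key=lambda c: c[3])
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
        task, machine, finish, _ = chosen
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return assignment, max(ready.values())


# Hypothetical ETC table (seconds).
etc = {
    "a": {"m1": 2, "m2": 3},
    "b": {"m1": 4, "m2": 9},
    "c": {"m1": 6, "m2": 8},
}
results = {h: schedule(etc, ["m1", "m2"], h)
           for h in ("min-min", "max-min", "sufferage")}
for name, (mapping, span) in results.items():
    print(name, mapping, span)
```

No heuristic wins on every input, so one reasonable design is to run all three on the same ETC table and keep the schedule with the smallest makespan.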
Is This a Workflow Problem?
• Only one component is easy (Marin already showed this works)
• Scheduling many may not be tractable
Evaluation
EMAN – Electron Micrograph Analysis
Almost entire time spent here
Evaluation
• RN: Random Scheduling (DAGMan)
• RA: Weighted Random
• HC: Heuristic Scheduling with crude performance models (CPU speed)
• HA: Heuristic Scheduling with accurate performance models (this scheme)
Evaluation Testbed
• 147 machines
• 4 types
– 64 dual processor Itanium 900MHz IA-64 nodes (RTC – Houston)
– 16 Opteron 2009MHz nodes (Medusa – Houston)
– 60 dual processor 1300MHz Itanium IA-64 nodes (acrl – Houston)
– 7 Pentium IA-32 nodes (Knoxville) – used?
Results
2.2x improvement over random
Discussion
• Static vs Dynamic Scheduling
– Problems?
– Why not use performance models dynamically?
• Application to workflows or more to parameter sweeps?
• How did they achieve load balance?
• Barriers to adoption?