7th Biennial Ptolemy Miniconference
Berkeley, CA
February 13, 2007
Scheduling Data-Intensive Workflows
Tim H. Wong, Daniel Zinn, Bertram Ludäscher
(UC Davis)
Ptolemy Miniconference 2007, Daniel Zinn
Outline
Problem motivation
Assumptions
Cost model
Problem formalization
Different “simplifications” and their complexity
Prototypical Java implementation for Kepler
Summary
Motivation: Distributed Execution of Scientific Workflows
Process a set of data on a set of machines
GOAL: Minimize WF-Execution time!
Allocation Problem: Which actors are computed on which hosts?
Assumptions
Arbitrary data size
Arbitrary machine speed
Arbitrary bandwidth
Arbitrary number of inputs
Scientific workflow is a DAG (!)
GRID COMPUTING
Cost Model
Communication Time: TC
Function Execution Time: TE
Total Time: TT = TC + TE
Shipping and Handling Problem: Schedule all tasks such that the total time is minimal
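As a rough illustration of the cost model above, the sketch below adds up communication and execution time for one allocation of actors to hosts. The slides do not say how TC and TE are computed, so the formulas used here (data size divided by bandwidth, actor work divided by host speed) and all names and numbers are assumptions made for illustration only.

```python
def total_time(edges, work, speed, bandwidth, alloc):
    """Return (TC, TE, TT) for one allocation of actors to hosts.

    Assumed formulas (not taken from the slides):
      TC = sum of size / bandwidth over edges whose endpoints run on different hosts
      TE = sum of work / speed over all actors
      TT = TC + TE
    """
    tc = 0.0
    for (src, dst), size in edges.items():
        h1, h2 = alloc[src], alloc[dst]
        if h1 != h2:  # data only has to be shipped between different hosts
            tc += size / bandwidth[(h1, h2)]
    te = sum(work[a] / speed[alloc[a]] for a in work)
    return tc, te, tc + te


# Hypothetical three-actor pipeline on two hosts.
edges = {("read", "filter"): 100.0, ("filter", "plot"): 10.0}
work = {"read": 4.0, "filter": 8.0, "plot": 2.0}
speed = {"h1": 1.0, "h2": 2.0}
bandwidth = {("h1", "h2"): 10.0, ("h2", "h1"): 10.0}

all_on_h1 = {"read": "h1", "filter": "h1", "plot": "h1"}
split = {"read": "h1", "filter": "h2", "plot": "h2"}

print(total_time(edges, work, speed, bandwidth, all_on_h1))
print(total_time(edges, work, speed, bandwidth, split))
```

In this made-up instance, moving two actors to the faster host lowers TE but adds enough TC that the total gets worse, which is the trade-off the Shipping and Handling Problem has to balance.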
Problem Variants and Complexities
Shipping and Handling Problem (SHP)
Communication Cost: Non-uniform
Function Execution Cost: Non-uniform
Complexity: NP-complete

Task Handling Problem (THP)
Communication Cost: Zero
Function Execution Cost: Non-uniform
Complexity: NP-complete

Data Shipping Problem (DSP)
Communication Cost: Non-uniform
Function Execution Cost: Zero
Complexity: NP-complete

Reduction from Task Scheduling Problem [ERLA94]
Reduction from Multiprocessor Scheduling Problem [KA99]
Reduction from 1-Multiterminal Cut
easy-DSP: Uniform Transfer Rate, Uniform Data Size
Given:
Directed Acyclic Graph
Set of Colors
Some vertices are already colored
Edge Weight = 1, if two adjacent vertices are of different colors
Edge Weight = 0, otherwise
TASK: Color the rest of the vertices such that total weight is minimal!
Cost Model: Minimize Total Shipped Volume!
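The coloring formulation above can be made concrete with a small brute-force sketch. It is only meant to pin down the objective: the instance, the function names, and the exhaustive search are illustrative assumptions, and the search is exponential in the number of uncolored vertices.

```python
from itertools import product


def coloring_cost(edges, color):
    """Number of edges whose endpoints have different colors (unit weights)."""
    return sum(1 for u, v in edges if color[u] != color[v])


def best_coloring(vertices, edges, colors, precolored):
    """Try every coloring of the uncolored vertices and keep the cheapest one.

    Exponential in the number of uncolored vertices; for tiny examples only.
    """
    free = [v for v in vertices if v not in precolored]
    best_cost, best = None, None
    for choice in product(colors, repeat=len(free)):
        color = dict(precolored)
        color.update(zip(free, choice))
        cost = coloring_cost(edges, color)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, color
    return best_cost, best


# Hypothetical instance: vertices are actors, colors are hosts.
vertices = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
precolored = {"a": "red", "b": "blue", "d": "red"}

cost, coloring = best_coloring(vertices, edges, ["red", "blue"], precolored)
print(cost, coloring["c"])
```

Here the only free vertex is `c`. Coloring it red leaves a single bichromatic edge, while blue would leave two.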
1 - Multi-Terminal CUT
Given:
Undirected Graph: G = (V,E)
Set of Terminals: S ⊆ V
Edge Weights: 1
TASK: Find a multi-way cut of G with a minimum number of edges
NP-Complete for more than 3 Terminals!
Minimize #edges between different terminals!
Reduction: 1-MTC <= DSP
“Order graph, color terminals”
[Figure: a DSP instance next to the corresponding 1-MTC instance, each labeled 4]
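One way to read “order graph, color terminals” is sketched below: pick an arbitrary order on the vertices, direct every undirected edge from the earlier to the later vertex (which yields a DAG), and give each terminal its own color. This is an interpretation of the slide, not code from the authors, and the function and variable names are hypothetical.

```python
def mtc_to_dsp(vertices, undirected_edges, terminals):
    """Turn a unit-weight multiterminal cut instance into an easy-DSP instance.

    vertices: list whose order serves as the (arbitrary) vertex order.
    Returns (dag_edges, colors, precolored).
    """
    rank = {v: i for i, v in enumerate(vertices)}
    # "Order graph": direct each edge from the lower-ranked to the
    # higher-ranked endpoint, so no directed cycle can arise.
    dag_edges = [(u, v) if rank[u] < rank[v] else (v, u)
                 for u, v in undirected_edges]
    # "Color terminals": one distinct color per terminal.
    precolored = {t: i for i, t in enumerate(terminals)}
    colors = list(range(len(terminals)))
    return dag_edges, colors, precolored


# Hypothetical 1-MTC instance with two terminals.
vertices = ["s1", "x", "y", "s2"]
undirected_edges = [("x", "s1"), ("x", "y"), ("s2", "y"), ("s1", "y")]

dag_edges, colors, precolored = mtc_to_dsp(vertices, undirected_edges, ["s1", "s2"])
print(dag_edges)
print(precolored)
```

Under this reading, any coloring of the remaining vertices assigns each vertex to the side of one terminal, and the edges with differently colored endpoints are exactly the cut edges, so a minimum-weight coloring corresponds to a minimum multiway cut.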
NP-Hard ... But: Need to solve
Greedy Algorithm
Dynamic Programming Algorithm
Investigate approximation algorithms for MTC and related problems!
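The slides do not spell out the greedy or dynamic programming algorithms, so the following is not the authors' method. It is one simple greedy heuristic for the unit-weight coloring problem, with no quality guarantee: each uncolored vertex takes the color that is most common among its already colored neighbors. All names and the example instance are assumptions.

```python
from collections import Counter


def greedy_coloring(vertices, edges, colors, precolored):
    """Greedy heuristic: adopt the majority color of already colored neighbors."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    color = dict(precolored)
    pending = [v for v in vertices if v not in color]
    while pending:
        progressed = False
        for v in list(pending):
            seen = Counter(color[n] for n in neighbors[v] if n in color)
            if seen:
                color[v] = seen.most_common(1)[0][0]
                pending.remove(v)
                progressed = True
        if not progressed:
            # No pending vertex touches a colored one: fall back to the first color.
            color[pending.pop(0)] = colors[0]
    return color


# Hypothetical instance: vertices are actors, colors are hosts.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
precolored = {"a": "red", "b": "blue", "d": "red"}

result = greedy_coloring(vertices, edges, ["red", "blue"], precolored)
shipped = sum(1 for u, v in edges if result[u] != result[v])
print(result, shipped)
```

On this small instance the heuristic happens to find an optimal plan. In general it can be arbitrarily far from optimal, which is the approximation-quality question the talk leaves open.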
Prototypical Implementation ...
abstract: only some nodes assigned
scheduling
concrete: all nodes assigned
Prototypical Implementation ... in Kepler!
Abstract Workflow ...
SCHEDULING
Concrete Workflow ...
Future Work
Use heuristics about looping to guess multiplicities (then not ACYCLIC any more!)
Investigate approximation algorithms with error guarantees for 1-MTC, then try to apply them to DSP
ALSO: Relevant for COMAD workflows: can be “compiled” into a low-level conventional WF
Summary
Bad news:
  Scheduling is hard
  DSP is hard (for BEST plans)
Good news:
  Finding a quite good plan is easy
  Greedy/Dynamic Algorithms
Open Problems:
  Approximation Quality of “simple algorithms”?
  When do they perform badly?
  Does this occur often in real-life workflows?
References
Thank You. Questions?