Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Florina M. Ciorba, Theodore Andronikos, Ioannis Riakiotakis, Anthony T. Chronopoulos and George Papakonstantinou
National Technical University of Athens, Computing Systems Laboratory; University of Texas at San Antonio
[email protected] www.cslab.ece.ntua.gr
20th International Parallel and Distributed Processing Symposium, 25-29 April 2006


Page 1: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Florina M. Ciorba†, Theodore Andronikos†, Ioannis Riakiotakis†,

Anthony T. Chronopoulos‡ and George Papakonstantinou†

† National Technical University of Athens

Computing Systems Laboratory

‡ University of Texas at San Antonio

[email protected]

20th International Parallel and Distributed Processing Symposium

25-29 April 2006

Page 2: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

April 27, 2006 IPDPS 2006 2

Outline

• Introduction
• Notation
• Some existing self-scheduling algorithms
• Dynamic self-scheduling for dependence loops
• Implementation and test results
• Conclusions
• Future work

Page 3: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Introduction

Motivation for dynamically scheduling loops with dependencies:

• Existing dynamic algorithms cannot cope with dependencies, because they lack inter-slave communication
• Static algorithms are not always efficient
• In their original form, if dynamic algorithms are applied to loops with dependencies, they yield a serial/invalid execution


Page 5: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Notation

Algorithmic model:

FOR (i1=l1; i1<=u1; i1++)
  FOR (i2=l2; i2<=u2; i2++)
    …
      FOR (in=ln; in<=un; in++)
        Loop Body
      ENDFOR
    …
  ENDFOR
ENDFOR

• Perfectly nested loops
• Constant flow data dependencies
• General program statements within the loop body
• J – index space of an n-dimensional uniform dependence loop:

$J = \{\, j \in \mathbb{N}^n \mid l_r \le i_r \le u_r,\ 1 \le r \le n \,\}$
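As a small illustration of the index space J, the sketch below enumerates the points of a 2-dimensional loop nest. The function name `index_space` and the bounds used are illustrative assumptions, not part of the presentation.

```python
from itertools import product

def index_space(lower, upper):
    """Return every index point j with lower[r] <= j[r] <= upper[r] in each dimension r."""
    ranges = (range(lo, hi + 1) for lo, hi in zip(lower, upper))
    return list(product(*ranges))

# Hypothetical bounds: l = (1, 1), u = (2, 3), i.e. a 2 x 3 index space.
J = index_space((1, 1), (2, 3))
size_of_J = len(J)  # |J|, the number of iterations of the loop nest
```

For rectangular bounds like these, |J| is simply the product of the extents of all dimensions.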

Page 6: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Notation

• u1 – synchronization dimension, un – scheduling dimension
• $DS = \{d_1, \ldots, d_p\}$ – set of dependence vectors
• PE – processing element
• P1,...,Pm – slaves
• N – number of scheduling steps
• Ci – chunk size at the i-th scheduling step
• Vi – size (iteration-wise) of Ci along scheduling dimension un
• VPk – virtual computing power of slave Pk
• Qk – number of processes in the run-queue of slave Pk
• $A_k = VP_k / Q_k$ – available computing power of slave Pk
• $A = \sum_{k=1}^{m} A_k$ – total available computing power of the cluster
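The two power definitions can be sketched in a few lines, assuming that the available power of a slave is its virtual power divided by its run-queue length and that the cluster total is the sum over all slaves. The function names and the three-slave cluster below are hypothetical.

```python
def available_power(virtual_power, run_queue):
    """A_k = VP_k / Q_k for one slave; Q_k counts the processes in its run-queue."""
    if run_queue < 1:
        raise ValueError("the run-queue holds at least the slave's own process")
    return virtual_power / run_queue

def total_available_power(slaves):
    """A = sum of A_k over all slaves, given (VP_k, Q_k) pairs."""
    return sum(available_power(vp, q) for vp, q in slaves)

# Hypothetical cluster: two unloaded slaves and one slave sharing its CPU
# with one extra process (Q_k = 2), which halves its available power.
slaves = [(1.5, 1), (0.5, 1), (1.5, 2)]
total = total_available_power(slaves)  # 1.5 + 0.5 + 0.75
```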


Page 8: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Some existing self-scheduling algorithms

3 self-scheduling algorithms:

• CSS – Chunk Self-Scheduling, Ci = constant
• TSS – Trapezoid Self-Scheduling, Ci = Ci-1 – D, where D – decrement, and the first chunk is F = |J|/(2×m) and the last chunk is L = 1
• DTSS – Distributed TSS, Ci = Ci-1 – D, where D – decrement, and the first chunk is F = |J|/(2×A) and the last chunk is L = 1

CSS and TSS are devised for homogeneous systems.

DTSS improves on TSS for heterogeneous systems by selecting the chunk sizes according to:

• the virtual computational power of the slaves, Vk
• the number of processes in the run-queue of each PE, Qk

[Figure: chunks Ci-1, Ci, Ci+1 with sizes Vi-1, Vi, Vi+1 (V1 ... VN) in the u1 × u2 index space, shown for CSS, TSS and DTSS]

Page 9: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Some existing self-scheduling algorithms

|J| = 5000×10000, m = 10 slaves

Algorithm             Chunk sizes
CSS                   300 300 300 300 300 300 300 300 300 300 300 300 300 300 300 300 200
TSS                   277 270 263 256 249 242 235 228 221 214 207 200 193 186 179 172 165 158 151 144 137 130 123 116 109 102 73
DTSS (dedicated)      392 253 368 237 344 221 108 211 103 300 192 276 176 176 252 160 77 149 72 207 130 183 114 159 98 46 87 41 44
DTSS (non-dedicated)  263 383 369 355 229 112 219 107 209 203 293 279 265 169 33 96 46 89 86 83 80 77 74 24 69 66 31 59 56 53 50 47 44 20 39 20 33 30 27 24 21 20 20 20 20 20 20 20 20 8

CSS and TSS give the same chunk sizes in both dedicated and non-dedicated systems.

DTSS adjusts the chunk sizes to match the different Ak of the slaves.
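The CSS and TSS rows of the chunk-size table can be reproduced with a short sketch. Several things here are assumptions rather than statements from the slides: the 5000 iterations are taken to lie along the scheduling dimension, the number of steps N = ceil(2·I/(F+L)) and the decrement D = floor((F−L)/(N−1)) follow the usual TSS formulation, and the divisor used for F is 9 rather than the stated m = 10, because 5000/(2×9) rounds down to the first TSS entry, 277. The function names are hypothetical.

```python
import math

def css_chunks(total, chunk):
    """Constant chunk sizes; the last chunk takes whatever is left."""
    sizes = []
    remaining = total
    while remaining > 0:
        size = min(chunk, remaining)
        sizes.append(size)
        remaining -= size
    return sizes

def tss_chunks(total, workers, last=1):
    """Linearly decreasing chunk sizes: C_i = C_(i-1) - D, starting from F."""
    first = total // (2 * workers)                     # F
    steps = math.ceil(2 * total / (first + last))      # N (assumed formula)
    decrement = (first - last) // (steps - 1)          # D (assumed formula)
    sizes = []
    remaining = total
    size = first
    while remaining > 0:
        current = min(max(size, last), remaining)
        sizes.append(current)
        remaining -= current
        size -= decrement
    return sizes

css = css_chunks(5000, 300)   # 300, 300, ..., 300, 200
tss = tss_chunks(5000, 9)     # 277, 270, ..., 102, 73
```

With these assumptions the decrement comes out as 7, and both lists sum to 5000, matching the table. The DTSS rows are not reproduced here, since they depend on the order in which slaves with different Ak issue their requests.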


Page 11: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

More notation

• SP – synchronization point
• M – number of SPs inserted along synchronization dimension u1
• H – interval (iteration-wise) between two SPs along u1; H is the same for every chunk
• SCi,j – the set of iterations of Ci between SPj-1 and SPj
• Ci = Vi × M × H
• Current slave – the slave assigned chunk Ci
• Previous slave – the slave assigned chunk Ci-1
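The relation Ci = Vi × M × H can be checked with a small worked example. The sketch assumes that M is obtained by dividing the extent of the synchronization dimension by H (and that H divides it evenly); the extent of 10000 and the value Vi = 277 are example values, and the function names are hypothetical.

```python
def sync_points(extent_u1, h):
    """Number of SPs M when one SP is placed every h iterations along u1."""
    if h <= 0 or extent_u1 % h != 0:
        raise ValueError("this sketch assumes h > 0 and that h divides the extent of u1")
    return extent_u1 // h

def chunk_iterations(v_i, m, h):
    """C_i = V_i x M x H: total number of iterations in chunk i."""
    return v_i * m * h

M = sync_points(10000, 100)            # extent of u1 assumed to be 10000
C_i = chunk_iterations(277, M, 100)    # a chunk that is 277 iterations wide
```

Because M × H equals the full extent of u1, changing H changes how often a slave synchronizes but not how many iterations its chunk contains.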

Page 12: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Self-scheduling with synchronization

• Chunks are formed along scheduling dimension, here say u2
• SPs are inserted along synchronization dimension, u1
• Phase 1: Apply self-scheduling algorithms to the scheduling dimension
• Phase 2: Insert synchronization points along the synchronization dimension

Page 13: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

The inter-slave communication scheme

• Ci-1 is assigned to Pk-1, Ci is assigned to Pk and Ci+1 to Pk+1
• When Pk reaches SPj+1, it sends to Pk+1 only the data Pk+1 requires (i.e., those iterations imposed by the existing dependence vectors)
• Afterwards, Pk receives from Pk-1 the data required for the current computation

Slaves do not reach an SP at the same time, which leads to a wavefront style of execution.

[Figure: chunks Ci-1, Ci, Ci+1 assigned to slaves Pk-1, Pk, Pk+1, with synchronization points SPj, SPj+1, SPj+2 and the sets SCi-1,j+1 and SCi,j+1. Legend: communication set, set of points computed at moment t, set of points computed at moment t+1, arrows indicate communication, auxiliary explanations]
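The wavefront order that results from this scheme can be illustrated with a single-process simulation. This is a sketch under stated assumptions, not the MPI implementation: the loop body `a[i][j] = a[i-1][j] + a[i][j-1]` is a hypothetical example with dependence vectors (1,0) and (0,1), rows play the role of the synchronization dimension u1, columns the scheduling dimension u2, and all chunks have the same width. Block (chunk c, interval s) is executed at step c + s, so every block runs after the blocks it depends on.

```python
def serial(rows, cols):
    """Reference result computed in plain loop order, with a boundary of ones."""
    a = [[1] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def wavefront(rows, cols, chunk, h):
    """Compute the same loop block by block, in wavefront order."""
    # Interior points start as None, so reading a point too early raises TypeError.
    a = [[1 if i == 0 or j == 0 else None for j in range(cols + 1)]
         for i in range(rows + 1)]
    col_starts = range(1, cols + 1, chunk)   # chunks along the scheduling dimension
    row_starts = range(1, rows + 1, h)       # SP intervals along the synchronization dimension
    blocks = [(c, s, j0, i0)
              for c, j0 in enumerate(col_starts)
              for s, i0 in enumerate(row_starts)]
    blocks.sort(key=lambda block: block[0] + block[1])   # block (c, s) runs at step c + s
    for _c, _s, j0, i0 in blocks:
        for i in range(i0, min(i0 + h, rows + 1)):
            for j in range(j0, min(j0 + chunk, cols + 1)):
                a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

result = wavefront(6, 7, chunk=3, h=2)
```

Blocks that share the same step c + s do not depend on each other, which is what allows different slaves to work on them at the same time.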

Page 14: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Dynamic Multi-Phase Scheduling DMPS(x)

INPUT:
(a) An n-dimensional dependence nested loop.
(b) The choice of the algorithm CSS, TSS or DTSS.
(c) If CSS is chosen, then chunk size Ci.
(d) The synchronization interval H.
(e) The number of slaves m; in case of DTSS, the virtual power Vk of every slave.

Master:

Initialization:
(M.a) Register slaves. In case of DTSS, slaves report their Ak.
(M.b) Calculate F, L, N, D for TSS and DTSS. For CSS use the given Ci.

While there are unassigned iterations do:
(M.1) If a request arrives, put it in the queue.
(M.2) Pick a request from the queue, and compute the next chunk size using CSS, TSS or DTSS.
(M.3) Update the current and previous slave ids.
(M.4) Send the id of the current slave to the previous one.
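The master loop above can be sketched without any message passing, to show how steps M.2 to M.4 chain consecutive chunks together. This is an illustrative sketch, not the C/C++ MPI implementation: the function name `run_master` is hypothetical, requests are a pre-filled queue of slave ids instead of incoming messages, and CSS is used as the chunk-size rule.

```python
from collections import deque

def run_master(total, chunk_size, requests):
    """Assign CSS chunks along the scheduling dimension.

    requests: slave ids in the order their requests arrive (M.1).
    Returns (assignments, notifications), where assignments holds
    (slave id, first iteration, size) and notifications holds
    (previous slave, current slave) pairs as sent in M.4.
    """
    queue = deque(requests)
    assignments = []
    notifications = []
    start = 0
    previous = None
    while start < total and queue:
        current = queue.popleft()                         # M.2: pick a request
        size = min(chunk_size, total - start)             # M.2: next chunk size (CSS)
        assignments.append((current, start, size))
        if previous is not None:
            notifications.append((previous, current))     # M.4: tell previous its send-to slave
        previous = current                                # M.3: update slave ids
        start += size
    return assignments, notifications

assignments, notifications = run_master(10, 4, ["P1", "P2", "P3"])
```

The notification in M.4 is what lets the slave holding chunk Ci-1 learn where to send its boundary data, since it cannot know in advance which slave will request chunk Ci.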

Page 15: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Dynamic Multi-Phase Scheduling DMPS(x)

Slave Pk:

Initialization:
(S.a) Register with the master. In case of DTSS, report Ak.
(S.b) Compute M according to the given H.

(S.1) Send request to the master.
(S.2) Wait for reply; if a chunk is received from the master, go to step S.3, else go to OUTPUT.
(S.3) While the next SP is not reached, compute chunk i.
(S.4) If the id of the send-to slave is known, go to step S.5, else go to step S.6.
(S.5) Send computed data to the send-to slave.
(S.6) Receive data from the receive-from slave and go to step S.3.

OUTPUT

Master: If there are no more chunks to be assigned, terminate.
Slave Pk: If no more tasks come from the master, terminate.

Page 16: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Dynamic Multi-Phase Scheduling DMPS(x)

Advantages of DMPS(x):

• Can take as input any self-scheduling algorithm, without any modifications
• Phase 2 is independent of Phase 1
• Phase 1 deals with the heterogeneity & load variation in the system
• Phase 2 deals with minimizing the inter-slave communication cost
• Suitable for any type of heterogeneous system


Page 18: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Implementation and testing setup

• The algorithms are implemented in C and C++
• MPI is used for master-slave and inter-slave communication
• The heterogeneous system consists of 10 machines:
  - 4 Intel Pentium III, 1266 MHz with 1GB RAM (called zealots), assumed to have VPk = 1.5 (one of them is the master)
  - 6 Intel Pentium III, 500 MHz with 512MB RAM (called kids), assumed to have VPk = 0.5
• Interconnection network is Fast Ethernet, at 100 Mbit/s
• Dedicated system: all machines are dedicated to running the program and no other loads are interposed during the execution
• Non-dedicated system: at the beginning of the program's execution, a resource-expensive process is started on some of the slaves, halving their Ak

Page 19: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Implementation and testing setup

• System configuration: zealot1 (master), zealot2, kid1, zealot3, kid2, zealot4, kid3, kid4, kid5, kid6.
• Three series of experiments for both dedicated & non-dedicated systems, for m = 3,4,5,6,7,8,9 slaves:
  1) DMPS(CSS)
  2) DMPS(TSS)
  3) DMPS(DTSS)
• Two real-life applications: heat equation, Floyd-Steinberg computation
• Speedup Sp is computed with:

$S_p = \frac{\min\{T_{P_1}, T_{P_2}, \ldots, T_{P_m}\}}{T_{PAR}}$

where TPi – serial execution time on slave Pi, 1 ≤ i ≤ m, and TPAR – parallel execution time (on m slaves)

• In the plotting of Sp, VP is used instead of m on the x-axis.
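The speedup definition and the virtual-power x-axis can both be sketched briefly. The slave order and VPk values (1.5 for zealots, 0.5 for kids) are taken from the setup slides; the function name `speedup` and the timing values passed to it are hypothetical and are not measurements from the presentation.

```python
from itertools import accumulate

def speedup(serial_times, parallel_time):
    """S_p = min(T_P1, ..., T_Pm) / T_PAR: measured against the fastest slave."""
    return min(serial_times) / parallel_time

# Slaves in configuration order (zealot1 is the master and is excluded):
# zealot2, kid1, zealot3, kid2, zealot4, kid3, kid4, kid5, kid6
slave_vp = [1.5, 0.5, 1.5, 0.5, 1.5, 0.5, 0.5, 0.5, 0.5]
cumulative_vp = list(accumulate(slave_vp))
x_axis = cumulative_vp[2:]   # total virtual power for m = 3, 4, ..., 9 slaves

example = speedup([10.0, 30.0], 4.0)   # hypothetical times in seconds
```

The cumulative values for m = 3 to 9 are 3.5, 4, 5.5, 6, 6.5, 7 and 7.5, which are the tick labels on the virtual-power axis of the speedup plots. Dividing by the fastest slave's serial time, rather than an average, gives the more conservative speedup on a heterogeneous cluster.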

Page 20: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Performance results – Heat equation

Dedicated system

Sync. interval H  Series tested   Number of slaves m
                                  3     4     5     6     7     8     9
100               1) DMPS(CSS)    2.32  1.75  1.73  1.23  1.21  1.21  1.18
                  2) DMPS(TSS)    2.20  1.73  1.56  1.38  1.25  1.14  1.02
                  3) DMPS(DTSS)   1.42  1.14  1.00  0.95  0.91  0.85  0.78
150               1) DMPS(CSS)    2.31  1.74  1.71  1.21  1.22  1.21  1.18
                  2) DMPS(TSS)    2.18  1.72  1.54  1.38  1.25  1.14  1.02
                  3) DMPS(DTSS)   1.42  1.13  0.99  0.93  0.90  0.84  0.78
200               1) DMPS(CSS)    2.30  1.74  1.73  1.22  1.23  1.22  1.19
                  2) DMPS(TSS)    2.21  1.74  1.55  1.38  1.25  1.14  1.02
                  3) DMPS(DTSS)   1.42  1.13  0.99  0.94  0.90  0.83  0.78

[Figure: Heat Equation, dedicated heterogeneous cluster. Speedup (y-axis, 0 to 6) versus virtual powers (x-axis: 3.5, 4, 5.5, 6, 6.5, 7, 7.5) for DMPS(CSS), DMPS(TSS) and DMPS(DTSS)]

Page 21: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Performance results – Heat equation

Non-dedicated system

Sync. interval H  Series tested   Number of slaves m
                                  3     4     5     6     7     8     9
100               1) DMPS(CSS)    2.33  1.76  1.73  2.46  2.45  2.38  2.06
                  2) DMPS(TSS)    2.20  1.74  1.56  2.52  2.56  2.18  2.10
                  3) DMPS(DTSS)   1.95  1.45  1.30  1.31  1.33  1.38  1.25
150               1) DMPS(CSS)    2.33  1.74  1.72  2.46  2.49  2.43  2.05
                  2) DMPS(TSS)    2.19  1.72  1.54  2.42  2.23  2.31  2.06
                  3) DMPS(DTSS)   1.94  1.47  1.30  1.30  1.28  1.36  1.23
200               1) DMPS(CSS)    2.30  1.74  1.73  2.39  2.36  2.38  2.10
                  2) DMPS(TSS)    2.22  1.75  1.56  1.79  2.32  2.10  2.02
                  3) DMPS(DTSS)   1.96  1.44  1.29  1.29  1.27  1.32  1.21

[Figure: Heat Equation, non-dedicated heterogeneous cluster. Speedup (y-axis, 0 to 4) versus virtual powers (x-axis: 3.5, 4, 5.5, 6, 6.5, 7, 7.5) for DMPS(CSS), DMPS(TSS) and DMPS(DTSS)]

Page 22: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Performance results – Floyd-Steinberg

Dedicated system

Sync. interval H  Series tested   Number of slaves m
                                  3      4      5      6      7      8      9
50                1) DMPS(CSS)    27.79  22.14  16.78  16.69  16.53  11.38  11.36
                  2) DMPS(TSS)    25.32  19.77  17.30  15.41  13.80  12.43  11.40
                  3) DMPS(DTSS)   19.63  14.87  13.28  12.72  11.57  11.45  10.73
100               1) DMPS(CSS)    27.52  22.01  16.70  16.65  16.43  11.34  11.33
                  2) DMPS(TSS)    25.22  19.70  17.24  15.35  13.75  12.38  11.38
                  3) DMPS(DTSS)   19.63  14.80  13.21  12.66  11.52  11.34  10.64
150               1) DMPS(CSS)    27.58  22.03  16.75  16.70  16.44  11.43  11.43
                  2) DMPS(TSS)    25.22  19.70  17.22  15.34  13.75  12.39  11.38
                  3) DMPS(DTSS)   19.62  14.82  13.24  12.67  11.53  11.34  10.65

[Figure: Floyd-Steinberg, dedicated heterogeneous cluster. Speedup (y-axis, 0 to 7) versus virtual powers (x-axis: 3.5, 4, 5.5, 6, 6.5, 7, 7.5) for DMPS(CSS), DMPS(TSS) and DMPS(DTSS)]

Page 23: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Performance results – Floyd-Steinberg

Non-dedicated system

Sync. interval H  Series tested   Number of slaves m
                                  3      4      5      6      7      8      9
50                1) DMPS(CSS)    27.72  22.13  16.76  23.81  22.32  22.47  22.44
                  2) DMPS(TSS)    25.18  19.72  17.24  22.34  24.14  22.26  20.95
                  3) DMPS(DTSS)   21.88  16.06  14.38  13.74  13.26  13.02  11.71
100               1) DMPS(CSS)    27.49  21.99  16.67  22.61  22.42  22.59  22.35
                  2) DMPS(TSS)    25.18  19.66  17.17  19.23  24.15  22.24  20.88
                  3) DMPS(DTSS)   21.85  15.96  14.32  13.65  13.11  12.80  11.58
150               1) DMPS(CSS)    27.57  22.01  16.74  22.49  22.48  22.32  22.46
                  2) DMPS(TSS)    25.17  19.65  17.20  26.20  24.14  22.02  20.82
                  3) DMPS(DTSS)   21.86  15.96  14.31  13.58  13.18  12.80  11.59

[Figure: Floyd-Steinberg, non-dedicated heterogeneous cluster. Speedup (y-axis, 0 to 6) versus virtual power (x-axis: 3.5, 4, 5.5, 6, 6.5, 7, 7.5) for DMPS(CSS), DMPS(TSS) and DMPS(DTSS)]

Page 24: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Interpretation of the results

• Dedicated system:
  • as expected, all algorithms perform better on a dedicated system, compared to a non-dedicated one
  • DMPS(TSS) slightly outperforms DMPS(CSS) for parallel loops, because it provides better load balancing
  • DMPS(DTSS) outperforms both other algorithms because it explicitly accounts for the system's heterogeneity
• Non-dedicated system:
  • DMPS(DTSS) stands out even more, since the other algorithms cannot handle extra load variations
  • The speedup for DMPS(DTSS) increases in all cases
• H must be chosen so as to maintain the comm/comp ratio < 1 for every test case
• Even then, small variations of the value of H do not significantly affect the overall performance
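To make the comm/comp condition concrete, the sketch below evaluates the ratio for one chunk under a deliberately simple cost model. The model is an assumption made for illustration and does not come from the presentation: each SP costs one message with a fixed latency plus a per-item transfer time for H boundary items, and the computation between two SPs covers Vi × H iterations. All parameter values are hypothetical.

```python
def comm_comp_ratio(h, v_i, t_iter, latency, t_item):
    """Ratio of communication time to computation time between two SPs.

    Assumed model: one message of h boundary items per SP,
    and v_i * h loop iterations computed per SP interval.
    """
    comm = latency + h * t_item
    comp = v_i * h * t_iter
    return comm / comp

# Hypothetical costs: 1 microsecond per iteration and per transferred item,
# 1 millisecond message latency, chunk width V_i = 200.
params = dict(v_i=200, t_iter=1e-6, latency=1e-3, t_item=1e-6)
too_small = comm_comp_ratio(1, **params)     # synchronizing at every iteration
reasonable = comm_comp_ratio(100, **params)  # synchronizing every 100 iterations
```

Under this model the fixed latency dominates when H is very small, so the ratio exceeds 1, while for larger H the latency is amortized over more iterations and the ratio flattens out. That flattening is consistent with the observation that small variations of H have little effect on performance.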


Page 26: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Conclusions

• Loops with dependencies can now be dynamically scheduled on heterogeneous dedicated & non-dedicated systems
• Distributed algorithms efficiently compensate for the system's heterogeneity for loops with dependencies, especially in non-dedicated systems


Page 28: Dynamic Multi Phase Scheduling for Heterogeneous Clusters

Future work

• Establish a model for predicting the optimal synchronization interval H and minimizing the communication
• Extend all other self-scheduling algorithms, such that they can handle loops with dependencies and account for the system's heterogeneity

Page 29: Dynamic Multi Phase Scheduling for Heterogeneous Clusters


Thank you

Questions?