complexity measures for parallel computation
DESCRIPTION
Complexity Measures for Parallel Computation. Several possible models!. Execution time and parallelism: Work / Span Model Total cost of moving data: Communication Volume Model Detailed models that try to capture time for moving data: Latency / Bandwidth Model (for message-passing) - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/1.jpg)
Complexity Measures for
Parallel Computation
![Page 2: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/2.jpg)
Several possible models!Several possible models!
• Execution time and parallelism: • Work / Span Model
• Total cost of moving data: • Communication Volume Model
• Detailed models that try to capture time for moving data:• Latency / Bandwidth Model (for message-passing)
• Cache Memory Model (for hierarchical memory)
• Other detailed models we won’t discuss: LogP, UMH, ….
![Page 3: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/3.jpg)
tp = execution time on p processors
Work / Span ModelWork / Span Model
![Page 4: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/4.jpg)
tp = execution time on p processors
t1 = work
Work / Span ModelWork / Span Model
![Page 5: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/5.jpg)
tp = execution time on p processors
*Also called critical-path lengthor computational depth.
t1 = work t∞ = span *
Work / Span ModelWork / Span Model
![Page 6: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/6.jpg)
tp = execution time on p processors
t1 = work t∞ = span *
*Also called critical-path lengthor computational depth.
WORK LAW∙tp ≥t1/pSPAN LAW
∙tp ≥ t∞
Work / Span ModelWork / Span Model
![Page 7: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/7.jpg)
Work: t1(A∪B) =
Series CompositionSeries Composition
AA BB
Work: t1(A∪B) = t1(A) + t1(B)Span: t∞(A∪B) = t∞(A) +t∞(B)Span: t∞(A∪B) =
![Page 8: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/8.jpg)
Parallel CompositionParallel Composition
AA
BB
Span: t∞(A∪B) = max{t∞(A), t∞(B)}
Work: t1(A∪B) = t1(A) + t1(B)
![Page 9: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/9.jpg)
Def. t1/tP = speedup on p processors.
If t1/tP = (p), we have linear speedup,
= p, we have perfect linear speedup,
> p, we have superlinear speedup,
(which is not possible in this model, because of the Work Law tp ≥ t1/p)
SpeedupSpeedup
![Page 10: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/10.jpg)
ParallelismParallelism
Because the Span Law requires tp ≥ t∞,
the maximum possible speedup is
t1/t∞ = (potential) parallelism
= the average amount of work per step along the span.
![Page 11: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/11.jpg)
Communication Volume ModelCommunication Volume Model
• Network of p processors• Each with local memory• Message-passing
• Communication volume (v)• Total size (words) of all messages passed during computation• Broadcasting one word costs volume p (actually, p-1)
• No explicit accounting for communication time• Thus, can’t really model parallel efficiency or speedup;
for that, we’d use the latency-bandwidth model (see next slide)
![Page 12: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/12.jpg)
Complexity Measures for Parallel ComputationComplexity Measures for Parallel Computation
Problem parameters:• n index of problem size• p number of processors
Algorithm parameters:
• tp running time on p processors
• t1 time on 1 processor = sequential time = “work”
• t∞ time on unlimited procs = critical path length = “span”
• v total communication volume
Performance measures
• speedup s = t1 / tp
• efficiency e = t1 / (p*tp) = s / p
• (potential) parallelism pp = t1 / t∞
![Page 13: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/13.jpg)
Laws of Parallel ComplexityLaws of Parallel Complexity
• Work law: tp ≥ t1 / p
• Span law: tp ≥ t∞
• Amdahl’s law:
• If a fraction f, between 0 and 1, of the work must be
done sequentially, then
speedup ≤ 1 / f
• Exercise: prove Amdahl’s law from the span law.
![Page 14: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/14.jpg)
Detailed complexity measures for data movement I: Detailed complexity measures for data movement I: Latency/Bandwith Model Latency/Bandwith Model
Moving data between processors by message-passing
• Machine parameters:
• or tstartup latency (message startup time in seconds)
• or tdata inverse bandwidth (in seconds per word)
• between nodes of Triton, × 10-6and × 10-9
• Time to send & recv or bcast a message of w words: + w*
• tcomm total commmunication time
• tcomp total computation time
• Total parallel time: tp = tcomp + tcomm
![Page 15: Complexity Measures for Parallel Computation](https://reader035.vdocuments.mx/reader035/viewer/2022062410/568158c7550346895dc610ee/html5/thumbnails/15.jpg)
Moving data between cache and memory on one processor:• Assume just two levels in memory hierarchy, fast and slow• All data initially in slow memory
• m = number of memory elements (words) moved between fast and slow memory
• tm = time per slow memory operation
• f = number of arithmetic operations
• tf = time per arithmetic operation, tf << tm
• q = f / m average number of flops per slow element access
• Minimum possible time = f * tf when all data in fast memory
• Actual time
• f * tf + m * tm = f * tf * (1 + tm/tf * 1/q)
• Larger q means time closer to minimum f * tf
Detailed complexity measures for data movement II: Detailed complexity measures for data movement II: Cache Memory Model Cache Memory Model