Parallel Computing Chapter 7: Performance and Scalability (jzhang/CS621/chapter7.pdf, 2014-03-26)
TRANSCRIPT
Parallel ComputingChapter 7
Performance and ScalabilityJun Zhang
Department of Computer ScienceUniversity of Kentucky
7.1 Parallel Systems
• Definition: A parallel system consists of an algorithm and the parallel architecture on which the algorithm is implemented.
• Note that an algorithm may have different performance on different parallel architectures.
• For example, an algorithm may perform differently on a linear array of processors and on a hypercube of processors.
7.2 Performance Metrics for Parallel Systems
• Run Time: The parallel run time is defined as the time that elapses from the moment a parallel computation starts to the moment the last processor finishes execution.
• Notation: serial run time T_S, parallel run time T_P.
7.3 Speedup
• The speedup is defined as the ratio of the serial run time of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors:

  S = T_S / T_P

• Example: Adding n numbers on an n-processor hypercube:

  T_S = Θ(n),  T_P = Θ(log n),  S = Θ(n / log n)
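The counts behind this example can be checked numerically. A minimal sketch (not from the slides), assuming unit cost for each addition and each reduction step:

```python
from math import log2

def speedup_reduction(n):
    """Speedup of adding n numbers on an n-processor hypercube.

    Serial time: n - 1 additions. Parallel time: log2(n) reduction
    steps (assumed unit cost each), so S = Theta(n / log n).
    """
    t_serial = n - 1
    t_parallel = log2(n)
    return t_serial / t_parallel

# Speedup grows like n / log n, far short of the ideal S = n.
print(speedup_reduction(1024))  # 1023 additions / 10 steps = 102.3
```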
Using the Reduction Algorithm
7.4 Efficiency
• The efficiency is defined as the ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully employed:

  E = S / p = T_S / (p T_P)

• Example: Efficiency of adding n numbers on an n-processor hypercube:

  E = Θ(n / (n log n)) = Θ(1 / log n)
7.5 Cost
• The cost of solving a problem on a parallel system is defined as the product of run time and the number of processors.
• A cost‐optimal parallel system solves a problem with a cost proportional to the execution time of the fastest known sequential algorithm on a single processor.
• Example: Adding n numbers on an n-processor hypercube. The cost is Θ(n log n) for the parallel system and Θ(n) for the sequential algorithm. The system is not cost-optimal.
7.6 Granularity and Performance
• Use fewer than the maximum number of processors.
• Increase performance by increasing the granularity of computation in each processor.
• Example: Adding n numbers cost-optimally on a hypercube. Use p processors, each holding n/p numbers. First add the n/p numbers locally; the situation then becomes adding p numbers on a p-processor hypercube. Parallel run time and cost:

  T_P = Θ(n/p + log p),  cost = Θ(n + p log p)
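As a quick numeric check (a sketch, not from the slides; it assumes one unit of time per addition and per communication hop, which gives the 2 log p combining term used in §7.7):

```python
from math import log2

def t_parallel(n, p):
    # n/p local additions, then log2(p) combine steps
    # (one addition + one hop each, unit cost assumed)
    return n / p + 2 * log2(p)

def cost(n, p):
    return p * t_parallel(n, p)  # Theta(n + p log p)

# n = 1024, p = 32: cost = 1024 + 2*32*5 = 1344, within a constant
# factor of the serial cost n, so the system is cost-optimal here.
print(cost(1024, 32))
```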
7.7 Scalability
• Scalability is a measure of a parallel system’s capacity to increase speedup in proportion to the number of processors.
• Example: Adding n numbers cost-optimally. The well-known Amdahl's law dictates the achievable speedup and efficiency.

  T_P = n/p + 2 log p
  S = np / (n + 2p log p)
  E = S/p = n / (n + 2p log p)
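These formulas can be tabulated directly (a sketch under the same unit-cost assumptions as above):

```python
from math import log2

def efficiency(n, p):
    # E = n / (n + 2 p log2 p) for cost-optimal addition on a hypercube
    return n / (n + 2 * p * log2(p))

# Fixed problem size: efficiency decays as p grows.
print(efficiency(1024, 4), efficiency(1024, 64))
# Scaling n in proportion to p log2(p) (here n = 8 p log2 p)
# holds the efficiency constant at 0.8.
print(efficiency(64, 4), efficiency(512, 16))
```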
7.8 Amdahl’s Law (1967)
• The speedup of a program using multiple processors in parallel computing is limited by the time needed for the serial fraction of the problem.
• If a problem of size W has a serial component W_s, the parallel run time and speedup are

  T_P = W_s + (W - W_s)/p
  S = W / T_P = W / (W_s + (W - W_s)/p)

  and the asymptotic speedup as p → ∞ is S_as = W / W_s.
7.9 Amdahl’s Law
• If W_s = 20% and W - W_s = 80%, then

  S = 1 / (0.2 + 0.8/p),  S_as = 1 / 0.2 = 5

• So no matter how many processors are used, the speedup cannot be greater than 5.
• Amdahl's law implies that parallel computing is only useful when the number of processors is small, or when the problem is perfectly parallel, i.e., embarrassingly parallel.
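The bound is easy to verify numerically (a minimal sketch; not part of the slides):

```python
def amdahl_speedup(serial_frac, p):
    # S = 1 / (s + (1 - s)/p); as p -> infinity, S -> 1/s
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

print(amdahl_speedup(0.2, 4))      # 2.5: modest gain at small p
print(amdahl_speedup(0.2, 10**6))  # just under the 1/0.2 = 5 ceiling
```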
7.10 Gustafson’s Law (1988)
• Also known as Gustafson-Barsis's law.
• Any sufficiently large problem can be efficiently parallelized with a speedup

  S = p - α(p - 1)

• where p is the number of processors and α is the serial portion of the problem.
• Gustafson proposed a fixed-time concept, which leads to scaled speedup for larger problem sizes.
• Basically, we use larger systems with more processors to solve larger problems.
7.10 Gustafson’s Law (Cont)
• The execution time of the program on a parallel computer is (a + b).
• a is the sequential time and b is the parallel time.
• The total amount of work to be done in parallel varies linearly with the number of processors, so b is fixed as p is varied. The total run time is (a + p*b).
• The speedup is (a + p*b)/(a + b).
• Define α = a/(a + b), the sequential fraction of the execution time; then

  S = p - α(p - 1)
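The two forms of the law agree, as a quick check shows (the run times a and b below are hypothetical values chosen for illustration):

```python
def gustafson_speedup(alpha, p):
    # S = p - alpha * (p - 1), where alpha = a / (a + b)
    return p - alpha * (p - 1)

a, b, p = 2.0, 8.0, 16          # hypothetical serial and parallel times
alpha = a / (a + b)             # sequential fraction: 0.2
print(gustafson_speedup(alpha, p))   # 13.0
print((a + p * b) / (a + b))         # same value from (a + p*b)/(a + b)
```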
Gustafson’s Law
7.11 Scalability (cont.)
• Increasing the number of processors --> decreases efficiency.
• Increasing the problem size --> increases efficiency.
• Can a parallel system keep its efficiency by increasing the number of processors and the problem size simultaneously?
  Yes: --> scalable parallel system
  No: --> non-scalable parallel system
• A scalable parallel system can always be made cost-optimal by adjusting the number of processors and the problem size.
7.12 Scalability (cont.)
• E.g. 7.7: Adding n numbers on an n-processor hypercube, the efficiency is

  E = Θ(1 / log n)

• There is no way to hold E fixed as n grows, so the system is not scalable.
• Adding n numbers on a p-processor hypercube cost-optimally, the efficiency is

  E = n / (n + 2p log p)

• Choosing n = Θ(p log p) makes E a constant.
7.13 Isoefficiency Metric of Scalability
• Degree of scalability: The rate to increase the size of the problem to maintain efficiency as the number of processors changes.
• Problem size: the number of basic computation steps in the best sequential algorithm to solve the problem on a single processor (W = T_S).
• Overhead function: the part of the parallel system cost (processor-time product) that is not incurred by the fastest known serial algorithm on a serial computer:

  T_O = p T_P - W
7.14 Isoefficiency Function
• In terms of W and the overhead function, the parallel run time and efficiency are

  T_P = (W + T_O(W, p)) / p
  E = S/p = W / (p T_P) = 1 / (1 + T_O(W, p)/W)

• Solve the above equation for W:

  W = (E / (1 - E)) T_O(W, p) = K T_O(W, p),  where K = E / (1 - E)

• The isoefficiency function determines the growth rate of W required to keep the efficiency fixed as p increases.
• Highly scalable systems have a small isoefficiency function.
7.15 Sources of Parallel Overhead
• Interprocessor communication: increase data locality to minimize communication.
• Load imbalance: the distribution of work (load) is not uniform, or the inherent parallelism of the algorithm is not sufficient.
• Extra computation: modifying the best sequential algorithm for parallel execution may introduce extra computation; extra computation may also be done to avoid communication.
7.16 Minimum Execution Time
• For adding n numbers on a p-processor hypercube, the parallel run time is T_P = n/p + 2 log p; the minimum execution time is obtained by minimizing T_P over p (see the examples below).
A Few Examples
• Example: Overhead function for adding n numbers on a p-processor hypercube.
  Parallel run time: T_P = n/p + 2 log p
  Parallel system cost: p T_P = n + 2p log p
  Serial cost (and problem size): T_S = W = n
  Overhead function: T_O = p T_P - W = (n + 2p log p) - n = 2p log p
  What can we learn from this?
Example
• Example: Isoefficiency function for adding n numbers on a p-processor hypercube.
  The overhead function is T_O = 2p log p, hence the isoefficiency function is

  W = 2K p log p

  Increasing the number of processors from p_0 to p_1, the problem size has to be increased by a factor of (p_1 log p_1) / (p_0 log p_0).
  If p = 4, n = 64, and E = 0.8, then for p = 16 and E = 0.8 we must have n = 512.
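The numbers in this example follow directly from the isoefficiency function (a sketch; K = E/(1 - E) = 4 for E = 0.8):

```python
from math import log2

K = 4.0  # K = E / (1 - E) = 0.8 / 0.2

def iso_W(p):
    # W = 2 K p log2(p): problem size needed to hold E = 0.8
    return 2 * K * p * log2(p)

print(iso_W(4))    # 64.0: matches p = 4, n = 64
print(iso_W(16))   # 512.0: growing p from 4 to 16 forces n to 512
```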
Example
• Example: Isoefficiency function with a complex overhead function.
  Suppose the overhead function is

  T_O = p^(3/2) + p^(3/4) W^(3/4)

  For the first term: W = K p^(3/2)
  For the second term: W = K p^(3/4) W^(3/4); solving it yields W = K^4 p^3 = Θ(p^3)
  The second term dominates, so the overall asymptotic isoefficiency function is Θ(p^3).
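The dominant balance can also be found numerically; a sketch (taking K = 1) that solves the second-term condition W = p^(3/4) W^(3/4) by fixed-point iteration:

```python
def iso_second_term(p, iters=200):
    # Fixed-point iteration for W = p**0.75 * W**0.75 (with K = 1).
    # The iteration converges to W = p**3, confirming Theta(p**3).
    w = 1.0
    for _ in range(iters):
        w = p ** 0.75 * w ** 0.75
    return w

print(iso_second_term(16))   # converges to 16**3 = 4096
```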
Example
• Example: Minimum cost-optimal execution time for adding n numbers.
  Setting dT_P/dp = 0 for T_P = n/p + 2 log p gives the minimum execution time:

  p = n/2,  T_P^min = 2 log n

  For cost-optimality the problem size must satisfy

  W = n = f(p) = p log p    (1)

  so that log n = log p + log log p ≈ log p.
  Solving (1) for p gives p = f^{-1}(n) ≈ n / log n, and therefore

  T_P^cost-optimal = n/p + 2 log p ≈ log n + 2(log n - log log n) = 3 log n - 2 log log n
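The closed forms above can be verified numerically (a sketch; p is treated as a continuous variable):

```python
from math import log2

def t_parallel(n, p):
    return n / p + 2 * log2(p)

n = 1024
# Minimum execution time: p = n/2 gives T_P = 2 log2(n).
t_min = t_parallel(n, n / 2)
print(t_min)                     # 20.0 = 2 * log2(1024)
# Cost-optimal choice p = n / log2(n):
t_co = t_parallel(n, n / log2(n))
print(t_co)                      # 3*log2(n) - 2*log2(log2(n)) ≈ 23.36
```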