The Impact of Data Dependence Analysis on Compilation and
Program Parallelization
Original Research by Kleanthis Psarris & Konstantinos Kyriakopoulos
Year of Publication: 2003
Presentation by Jamie Perkins
Data Dependence Analysis
• Key to optimization and to detecting implicit parallelism in sequential code.
• Helps the compiler improve memory usage, improve load balancing, and determine efficient scheduling.
• Different tests for data dependence provide different trade-offs.
  – Accuracy vs. efficiency
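To make the accuracy vs. efficiency trade-off concrete, here is a minimal sketch of the most accurate and least efficient approach possible: checking every pair of loop iterations directly. The function name and the way subscripts are passed in (as Python callables) are assumptions made for illustration; they are not taken from the original research.

```python
def loop_carried_dependence(write_idx, read_idx, lower, upper):
    """Exhaustively check whether one iteration writes an array element
    that a different iteration reads (hypothetical helper).

    write_idx, read_idx: functions mapping the loop index to an array subscript.
    lower, upper: inclusive loop bounds.
    """
    # Record which iterations write each array element.
    writers = {}
    for i in range(lower, upper + 1):
        writers.setdefault(write_idx(i), set()).add(i)
    # A dependence exists if some other iteration reads a written element.
    for j in range(lower, upper + 1):
        if any(i != j for i in writers.get(read_idx(j), ())):
            return True
    return False


# A(i) = A(i-1) + ...   carries a dependence between iterations.
print(loop_carried_dependence(lambda i: i, lambda i: i - 1, 1, 100))
# A(2*i) = A(2*i+1) + ...   touches disjoint elements, so iterations are independent.
print(loop_carried_dependence(lambda i: 2 * i, lambda i: 2 * i + 1, 1, 100))
```

This exhaustive check is exact but its cost grows with the iteration count, which is why compilers rely on symbolic tests instead.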
About this research…
• Sun UltraSPARC-IIi with a 440 MHz CPU and 512 Mbytes of main memory.
• 2 different applications tested
  – Perfect Club Benchmarks
  – Lapack
• 4 different tests applied
  – Greatest Common Divisor Test (GCD)
  – Banerjee Test
  – I-Test
  – Omega Test
Polaris Compiler
• Developed at the University of Illinois at Urbana-Champaign & Purdue University.
• Parallelizes Fortran 77 programs for execution on shared memory multiprocessors.
Applications
• Perfect Club Benchmark (PCB)
  – Collection of 13 scientific & engineering Fortran 77 programs.
• Lapack (LP)
  – A library of subroutines for solving linear algebra problems in Fortran 77.
Tests applied
• Greatest Common Divisor Test (GCD)
  – Based on a theorem of elementary number theory.
• Banerjee Test
  – Based on the Intermediate Value Theorem.
These two tests are applied together.
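As a rough illustration of how the two tests combine, the sketch below treats a dependence problem as one linear equation a1*x1 + ... + an*xn = c with bounds on each loop variable. The GCD test asks whether any integer solution exists; the Banerjee test asks whether c lies between the minimum and maximum of the left-hand side over the bounds. This is a simplified version that ignores direction vectors, and the function names and data layout are assumptions for illustration, not the Polaris implementation.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, const):
    """True if coeffs . x = const can have an integer solution (bounds ignored)."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:
        return const == 0
    return const % g == 0


def banerjee_test(coeffs, const, bounds):
    """True if const lies within [min, max] of coeffs . x over the given bounds."""
    lo = sum(a * (l if a >= 0 else u) for a, (l, u) in zip(coeffs, bounds))
    hi = sum(a * (u if a >= 0 else l) for a, (l, u) in zip(coeffs, bounds))
    return lo <= const <= hi


def maybe_dependent(coeffs, const, bounds):
    """False means proven independent; True only means 'maybe'."""
    return gcd_test(coeffs, const) and banerjee_test(coeffs, const, bounds)


# A(2*i) vs A(2*j+1): 2i - 2j = 1, and gcd 2 does not divide 1.
print(maybe_dependent([2, -2], 1, [(1, 10), (1, 10)]))
# A(i) vs A(j+100), 1 <= i, j <= 50: i - j = 100 is outside [-49, 49].
print(maybe_dependent([1, -1], 100, [(1, 50), (1, 50)]))
# A(i+1) vs A(j): i - j = -1 passes both tests, so the answer is "maybe".
print(maybe_dependent([1, -1], -1, [(1, 50), (1, 50)]))
```

Neither test can prove a dependence on its own; passing both only leaves the "maybe" answer.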
Tests Applied (cont.)
• I-Test
  – Based on and enhances the Banerjee test and the GCD test.
  – Adds “accuracy conditions” to the previous tests.
• Omega Test
  – Based on a combination of the Least Remainder Algorithm and Fourier-Motzkin Variable Elimination.
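Fourier-Motzkin variable elimination can be sketched in a few lines. The version below works over the rationals only: it projects variables out of a system of inequalities of the form a . x <= b one at a time, and reports whether the system has a real-valued solution. The Omega test adds integer-specific handling on top of this step, which is not shown here. The function names and the constraint encoding are assumptions for illustration.

```python
from fractions import Fraction


def eliminate(constraints, k):
    """One Fourier-Motzkin step: project variable k out of constraints.

    Each constraint is (coeffs, bound), meaning sum(coeffs[j] * x[j]) <= bound.
    """
    pos, neg, rest = [], [], []
    for a, b in constraints:
        (pos if a[k] > 0 else neg if a[k] < 0 else rest).append((a, b))
    out = list(rest)
    for ap, bp in pos:
        for an, bn in neg:
            # Scale both rows so x_k has coefficients +1 and -1, then add them.
            sp = 1 / Fraction(ap[k])
            sn = 1 / Fraction(-an[k])
            coeffs = [sp * x + sn * y for x, y in zip(ap, an)]
            out.append((coeffs, sp * bp + sn * bn))
    return out


def feasible_over_reals(constraints, nvars):
    """True if the inequalities have a real-valued solution."""
    for k in range(nvars):
        constraints = eliminate(constraints, k)
    # Every coefficient is now zero, so each row reads 0 <= bound.
    return all(b >= 0 for _, b in constraints)


def system(offset):
    """Constraints for i = j + offset with 1 <= i, j <= 10 (variables: i, j)."""
    return [([1, 0], 10), ([-1, 0], -1), ([0, 1], 10), ([0, -1], -1),
            ([1, -1], offset), ([-1, 1], -offset)]


print(feasible_over_reals(system(1), 2))   # i = j + 1 fits inside the bounds
print(feasible_over_reals(system(20), 2))  # i = j + 20 cannot fit inside the bounds
```

Each elimination step can multiply the number of constraints, which hints at why a test built on this technique tends to cost more than the GCD and Banerjee checks.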
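Data Dependence Problems for PCB
[Figure: three pie charts (Banerjee Test, I-Test, Omega Test) showing the share of data dependence problems reported as independent, dependent, or maybe. Extracted slice values: Banerjee Test 30%, 70%; I-Test 7%, 30%, 63%; Omega Test 11%, 35%, 54%. 100% is equal to 59,936.]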
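Data Dependence Problems for LP
[Figure: three pie charts (Banerjee Test, I-Test, Omega Test) showing the share of data dependence problems reported as independent, dependent, or maybe. Extracted slice values: Banerjee Test 13%, 87%; I-Test 1%, 13%, 86%; Omega Test 10%, 22%, 68%. 100% is equal to 293,718.]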
Avg. Cost per Data Dependence in PCB
[Figure: bar chart of time (msec) for the Banerjee, I-Test, and Omega tests, grouped by Total, Indep., Dep., and Maybe; vertical axis from 0 to 300.]
Avg. Cost per Data Dependence in LP
[Figure: bar chart of time (msec) for the Banerjee, I-Test, and Omega tests, grouped by Total, Indep., Dep., and Maybe; vertical axis from 0 to 100.]
Total Compilation Time (in minutes)

Test        Perfect Club Benchmark    Lapack Library
Banerjee    113.4                     87.9
I-Test      111.6                     88.2
Omega       330.5                     371.7
Parallelizable Loops
[Figure: two bar charts of the number of loops (Total, Banerjee, I-Test, Omega), one for the Perfect Club Benchmark and one for the Lapack Library. Bar values recovered for one panel only: Total 4118, Banerjee 2241, I-Test 2253, Omega 2295.]
Execution Time
• Perfect Club Benchmark
  – Only 4 out of the 11 could be effectively parallelized.
• Lapack Library
  – Much better results: the execution time of 7 of the programs was cut in half.
Perfect Club Benchmark

Prog.    Test        Serial Time    2-p      4-p      6-p      8-p
OCEAN    Banerjee    26.70          26.64    25.87    26.23    27.71
         I-Test                     26.62    25.87    26.24    27.80
         Omega                      26.62    25.90    26.21    27.83
BDNA     Banerjee    8.54           5.56     3.27     2.57     2.24
         I-Test                     5.56     3.27     2.57     2.24
         Omega                      5.44     3.15     2.41     2.07
Lapack Library

Prog.       Test        Serial Time    2-p      4-p      6-p      8-p
GEP EIN     Banerjee    18.40          9.54     6.60     4.43     4.97
            I-Test                     9.53     6.62     4.49     4.86
            Omega                      9.48     6.56     4.42     4.83
RECT LIN    Banerjee    33.85          17.03    11.61    7.48     14.79
            I-Test                     17.06    11.58    7.52     14.34
            Omega                      17.02    11.56    7.42     15.74
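The execution times in the tables above can be turned into speedup and efficiency figures. The sketch below does this for the Banerjee rows of the Perfect Club program with serial time 8.54 and the Lapack program with serial time 33.85. It assumes the "2-p" through "8-p" columns mean 2 through 8 processors; the helper names are hypothetical.

```python
def speedup(serial, parallel):
    """Serial time divided by parallel time."""
    return serial / parallel


def efficiency(serial, parallel, procs):
    """Speedup per processor (1.0 would be ideal scaling)."""
    return speedup(serial, parallel) / procs


# Banerjee rows copied from the tables; keys are assumed processor counts.
runs = {
    "serial 8.54": (8.54, {2: 5.56, 4: 3.27, 6: 2.57, 8: 2.24}),
    "serial 33.85": (33.85, {2: 17.03, 4: 11.61, 6: 7.48, 8: 14.79}),
}

for label, (serial, times) in runs.items():
    for procs, t in times.items():
        print(f"{label}, {procs} processors: "
              f"speedup {speedup(serial, t):.2f}, "
              f"efficiency {efficiency(serial, t, procs):.2f}")
```

For the second program the time rises again from 7.48 to 14.79 between the last two columns, so the computed speedup drops there regardless of which dependence test was used.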
Conclusions
– Data dependence accuracy
  • Differences between the tests may not be substantial, depending on the program (PCB vs. LP).
– Efficiency
  • Often a trade-off (efficiency vs. accuracy); Omega proved more accurate at a high cost.
– Effectiveness
  • All 3 tests found a similar number of parallelizable loops.
– Execution performance
  • Again, all three tests produced similar execution results.
Thank You
Any Questions?