data-centric performance measurement technique for chapel ... · data-centric performance...
TRANSCRIPT
![Page 1: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/1.jpg)
Data-Centric Performance Measurement Technique for
Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth
{hzhang86, hollings}@cs.umd.edu Department of Computer Science, University of Maryland-
College Park
1
![Page 2: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/2.jpg)
Introduction
• Why PGAS (Partitioned Global Address Space )
Parallel programming is too hard
Unified solution for mixed mode parallelism (multi-core + multi-node)
• Why Chapel Emerging PGAS language with productive features
Potential for performance improvement and few useful profilers for its end users
Insights for the language evolvement in the future
2
![Page 3: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/3.jpg)
Data-centric Profiling
3
int busy(int *x) { // hotspot function *x = complex(); return *x; } int main() { for (i=0; i<n; i++) { A[i] = busy(&B[i]) + busy(&C[i-1]) + busy(&C[i+1]); } }
Data-centric Profiling
main: 100% latency busy: 100% latency complex: 100% latency
main: 100% latency busy: 100% latency complex: 100% latency
Code-centric Profiling
A: 100% latency B: 33.3% latency C: 66.7% latency
A: 100% latency B: 33.3% latency C: 66.7% latency
![Page 4: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/4.jpg)
Our Contribution
1. Data-centric profiling of PGAS programs
2. First Chapel-specific profiler
3. Profiled three benchmarks and improved the performance up to 2.3x
4
![Page 5: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/5.jpg)
Tool Framework
1: Intraprocedural Static Analysis
Module: Global Variables, Type
Analysis (class, record)
Function: Local Variables,
Parameters, Return Values
Module: Global Variables, Type
Analysis (class, record)
Function: Local Variables,
Parameters, Return Values
2: Monitored Execution
Run the Program with Sampling and
Instrumentation Enabled
Run the Program with Sampling and
Instrumentation Enabled
3: Post Processing
Data Flow
Analysis
Control Flow
Analysis
Node 1 Node 4 Node 2
Node 3
Aggregate Data from All Nodes and Display Aggregate Data from All Nodes and Display
4: GUI Presentation
Decode Context Sensitive
Samples
Variable Profiles
(Per Node)
5
![Page 6: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/6.jpg)
Blame Definition
1) 𝑩𝒍𝒂𝒎𝒆𝑺𝒆𝒕 𝒗 = 𝑩𝒂𝒄𝒌𝒘𝒂𝒓𝒅𝑺𝒍𝒊𝒄𝒆 𝒘𝒘∈𝑾
2) 𝒊𝒔𝑩𝒍𝒂𝒎𝒆𝒅 𝒗, 𝒔 = {𝒊𝒇 𝒔 ∈ 𝑩𝒍𝒂𝒎𝒆𝑺𝒆𝒕 𝒗 𝒕𝒉𝒆𝒏 𝟏 𝒆𝒍𝒔𝒆 𝟎}
3) 𝑩𝒍𝒂𝒎𝒆𝑷𝒆𝒓𝒄𝒆𝒏𝒕𝒂𝒈𝒆 𝒗, 𝑺 = 𝒊𝒔𝑩𝒍𝒂𝒎𝒆𝒅(𝒗,𝒔)𝒔∈𝑺
𝑺
• v: a certain variable • w: a write statement to v’s memory region • W: a set of w (all write statements to v’s memory region) • s: a sample • S: a set of samples
6
![Page 7: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/7.jpg)
Blame Calculation Example
1 a=2;
2 b=3; //Sample 1
3 if a<b //Sample 2
4 a=b+1; //Sample 3
5 c=a+b; //Sample 4
7
Variable Name a b c
BlameSet 1, 3, 4 2 1, 2, 3, 4, 5
Blame Samples S2, S3 S1 S1, S2, S3, S4
Blame 50% 25% 100%
![Page 8: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/8.jpg)
GUI screenshots of MiniMD
Code-centric Data-centric
8
![Page 9: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/9.jpg)
Optimization Result - MiniMD
0
5
10
15
20
25
w/o --fast w/ --fast
20.9
6.41
9.2
2.5 Exe
cuti
on
Tim
e (
s)
original
optimized
9
![Page 10: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/10.jpg)
Experiment - CLOMP
Name Type Blame Context
partArray [partDomain] Part 99.5% main
->partArray[i] Part 99.5% main
->partArray[i].zoneArray[j] Zone 99.0% main
->partArray[i].zoneArray[j].value real 99.0% main
->partArray[i].residue real 12.3% main
remaining_deposit real 11.8% update_part
10
![Page 11: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/11.jpg)
Optimization Result – CLOMP
0
1
2
3
4
5
6
7
8
1024/64,000 65536/10 12/640,000 65536/6400
4.02
4.79
3.87
7.88
2.18
4.4
1.82
7.14
Exe
cuti
on
Tim
e (
s)
Different Problem Sizes (#parts/#zones per part)
original
optimized
w/o --fast
11
![Page 12: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/12.jpg)
Experiment – LULESH
12
1. Number of profiling samples in this function 2. Percentage of profiling samples in this function 3. Cumulative percentage of samples 4. Number of samples in this function and its callees 5. Percentage of samples in this function and its callees 6. Function name
1 2 3 4 5 6
![Page 13: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/13.jpg)
Experiment – LULESH Name Type Blame Context
hgfz 8*real 30.8% CalcFBHourglassForceForElems
hgfx 8*real 29.5% CalcFBHourglassForceForElems
hgfy 8*real 29.2% CalcFBHourglassForceForElems
shz real 27.9% CalcElemFBHourglassForce
hz 4*real 27.6% CalcElemFBHourglassForce
shx real 26.9% CalcElemFBHourglassForce
shy real 26.6% CalcElemFBHourglassForce
hx 4*real 26.6% CalcElemFBHourglassForce
hy 4*real 26.6% CalcElemFBHourglassForce
hourgam 8*(4*real) 25.0% CalcFBHourglassForceForElems
determ [Elems] real 15.7% CalcVolumeForceForElems
b_x 8*real 9.7% IntegrateStressForElems
b_z 8*real 9.7% IntegrateStressForElems
b_y 8*real 8.7% IntegrateStressForElems
dvdx(y/z) [Elems] 8*real 8.3% CalcHourglassControlForElems
hourmodx real 5.8% CalcFBHourglassForceForElems
hourmody real 5.1% CalcFBHourglassForceForElems
hourmodz real 4.8% CaclFBHourglassForceForElems
13
![Page 14: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/14.jpg)
Optimization Example - Loop
Code Snapshot of LULESH Hot Spot
14
![Page 15: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/15.jpg)
Results for different loop optimizations
11
11.2
11.4
11.6
11.8
12
12.2
12.4
12.6
12.8
13
12.47
12.04
11.65
12.95
11.78
12.59
11.89
12.6
12.1
12.33
12.75
Exe
cuti
on
Tim
e (
s)
U*: manual loop unrolling at place *
15
![Page 16: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/16.jpg)
Optimization Result – LULESH
0
2
4
6
8
10
12
14
w/o --fast w/ --fast
12.47
4.7
11.57
4.59
11.65
4.54
9.98
3.39
9.02
3.2
original
CENN
P1
VG
best case
Performance Improvement 27.7%
Exe
cuti
on
Tim
e (
s)
16
![Page 17: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/17.jpg)
Updates & Future Work
• Updates: – Built a prototype for multi-node Chapel
– Optimized runtime instrumentation
– Improved Graphic-User-Interface
• Future work: – Large-size problems on distributed systems
– Further application of “Blame” in other fields
17
![Page 18: Data-Centric Performance Measurement Technique for Chapel ... · Data-Centric Performance Measurement Technique for Chapel Programs Hui Zhang, Jeffrey K. Hollingsworth {hzhang86,](https://reader035.vdocuments.mx/reader035/viewer/2022071011/5fc99af23f3db94e3d282cf4/html5/thumbnails/18.jpg)
Conclusion
“Blame” application on PGAS programs
First Chapel-specific profiler
Benchmark optimization
18
1 1 1
2.3 2.1
1.4
0
0.5
1
1.5
2
2.5
MiniMD CLOMP LULESH
Spe
ed
-up
s
orginal
optimized