Prospector: A Toolchain To Help Parallel Programming
Minjang Kim and Hyesoon Kim (HPArch Lab), and Chi-Keung Luk (Intel)
This work will also be supported by Samsung
2
Motivation (1/2)
  Parallel programming is hard
  What if there were a tool that helps with parallel programming?
    We already have some tools, such as race detectors
    However, few tools guide the parallelization itself
  A programmer wants to parallelize serial code
    Where to parallelize?
    How to parallelize?
3
Motivation (2/2)
  We propose Prospector
    A set of dynamic program analyzers that help parallelize serial code
  Goals
    Give information for finding the right parallelization targets
    Provide advice on writing correct and optimized parallel code
4
Overview of Prospector
  Input: source code or binary
  Components
    Loop-Centric Profiler
    Parallelizable Section Finder
    Parallelism Pattern Advisor
    Parallel Speedup Predictor
    Architecture Advisor
    Parallel Performance Analyzer
  [Figure: overview diagram with example code (Func1, Func2, Loop1 to Loop3, a lock-protected region in Loop3), a sample Loop1 profile, a plot of speedup vs. number of cores (2, 4, 8), and a plot of speedup on CPU vs. GPU]
5
Prospector: Loop-Centric Profiler
  Q: Which code sections would be good for parallelization?
    The most frequently executed loops
    Legacy profilers only report hot functions and instructions
  We provide details of loop execution
    Trip count: is there sufficient work?
    Number of invocations: is the fork/join overhead low?
    Statistics of loop iteration length (min, max, stdev): is the load balanced?
  [Figure: sample report for Loop1. Invocation: 8, Iteration: 5,000, Max Iter: 1,600, Min Iter: 40]
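The kind of per-loop statistics listed on this slide can be sketched in a few lines of Python. This is only an illustration, not Prospector's profiler: the names `LoopStats`, `profiled_loop`, and `work` are hypothetical, and the sketch assumes that "iteration length" is approximated by the trip count of each invocation rather than by measured time.

```python
import statistics
from collections import defaultdict


class LoopStats:
    """Per-loop counters of the kind a loop-centric profiler might report."""

    def __init__(self):
        self.trip_counts = []  # iterations observed in each invocation

    def record_invocation(self, trip_count):
        self.trip_counts.append(trip_count)

    def report(self):
        return {
            "invocations": len(self.trip_counts),
            "iterations": sum(self.trip_counts),
            "max_iter": max(self.trip_counts),
            "min_iter": min(self.trip_counts),
            "stdev_iter": statistics.pstdev(self.trip_counts),
        }


stats = defaultdict(LoopStats)  # loop name -> LoopStats


def profiled_loop(name, iterable):
    """Wrap a loop's iterable and count its iterations for one invocation."""
    count = 0
    try:
        for item in iterable:
            count += 1
            yield item
    finally:
        # Runs when the loop finishes, so each invocation is recorded once.
        stats[name].record_invocation(count)


def work(sizes):
    total = 0
    for n in sizes:  # each n triggers one invocation of the inner loop
        for i in profiled_loop("Loop1", range(n)):
            total += i
    return total


work([40, 1600, 360])
print(stats["Loop1"].report())
```

A large gap between `min_iter` and `max_iter` hints at load imbalance, and many invocations with few iterations each hint at high fork/join overhead.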
6
Prospector: Parallel Speedup Predictor (1/2)
  Q: What speedup can be expected?
    Analytical models (e.g., Amdahl's Law) are not practical for predicting speedup in the presence of locks
  Our approach
    Dynamically predict speedup based on lightweight profiling
  Challenges
    How to model architectural factors (e.g., caches, memory)?
  [Figure: speedup vs. number of cores (2, 4, 8)]
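For reference, Amdahl's Law itself is easy to state in code. The sketch below uses a hypothetical helper, `amdahl_speedup`, and an assumed parallel fraction of 0.9. The formula takes a single fixed serial fraction, whereas time lost to lock contention typically varies with the number of threads and with timing, which is one way to read the slide's point that the model is impractical when locks are present.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 90% of the serial run time is parallelizable.
for cores in (2, 4, 8):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```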
7
Prospector: Parallel Speedup Predictor (2/2)
  Mechanisms
    Programmers annotate the serial code
      Describe the behavior of parallel execution and locks
    Fast, lightweight profiling
      Measure the time between annotations
    Emulation
      Obtain the estimated parallel execution time to compute speedup
  Modeling architectural parameters
    Sampling memory accesses
    Using an analytical model for cache hit/miss prediction
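The emulation step can be illustrated with a small event-driven model. This is a sketch under stated assumptions, not Prospector's emulator: it assumes the profile has already been reduced to a list of iterations, each a list of `(kind, duration)` segments where `kind` is `"work"` or `"lock"`; it assumes a single global lock and a static round-robin assignment of iterations to threads; and it ignores caches and memory entirely. The function names are hypothetical.

```python
import heapq


def emulate_parallel_time(iterations, threads):
    """Estimate the parallel run time of a loop from measured segments."""
    # Assumption: iterations are dealt out to threads round-robin.
    queues = [[] for _ in range(threads)]
    for index, segments in enumerate(iterations):
        queues[index % threads].extend(segments)

    lock_free_at = 0.0  # time at which the single lock is next available
    finish = 0.0
    # Heap entries: (current time, thread id, position in that thread's queue).
    ready = [(0.0, t, 0) for t in range(threads) if queues[t]]
    heapq.heapify(ready)
    while ready:
        # Always advance the thread that is earliest in emulated time, so
        # lock requests are granted in the order they are made.
        now, t, pos = heapq.heappop(ready)
        kind, duration = queues[t][pos]
        if kind == "lock":
            start = max(now, lock_free_at)  # wait until the lock is released
            now = start + duration
            lock_free_at = now
        else:
            now += duration
        finish = max(finish, now)
        if pos + 1 < len(queues[t]):
            heapq.heappush(ready, (now, t, pos + 1))
    return finish


def predicted_speedup(iterations, threads):
    serial = sum(d for segments in iterations for _, d in segments)
    return serial / emulate_parallel_time(iterations, threads)


# Assumed profile: 8 iterations, each 3 time units of work plus 1 under a lock.
profile = [[("work", 3.0), ("lock", 1.0)] for _ in range(8)]
for n in (1, 2, 4, 8):
    print(n, round(predicted_speedup(profile, n), 2))
```

Unlike a closed-form model, the emulation accounts for threads stalling on the lock, so the predicted speedup flattens as the lock-protected segments start to serialize execution.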
8
Prospector: Parallelizable Section Finder (1/3)
  Q: Is this code section parallelizable?
    Data dependences determine parallelizability
    Compilers may not analyze them well because of pointers and complex control flow
  Our approach
    Dynamic data-dependence profiling
    Provides detailed dependence information for a given input
  Challenges
    Too much overhead; a smart algorithm is needed
  [Figure: Func1(){ Loop1; Loop2; Func2();} with one loop marked "Parallelizable!"]
9
Prospector: Parallelizable Section Finder (2/3)
  Mechanisms
    A dynamic profiler based on instrumentation
      Instrumentation can be at either the binary or the source level
    At instrumentation time (i.e., statically)
      Analyze control-flow graphs and loop structures
    At runtime
      Observe memory addresses (no points-to analysis)
      Store and analyze these addresses to discover data dependences
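The runtime step, storing observed addresses and checking them for dependences, can be sketched as follows. This is a deliberately naive illustration and not the algorithm Prospector uses: it assumes the instrumentation has produced a trace of `(iteration, op, address)` tuples for a single loop, it keeps only the most recent reader and writer of each address, and the function name is hypothetical.

```python
def find_loop_carried_dependences(trace):
    """Return the set of (kind, source_iter, sink_iter, address) tuples.

    trace: iterable of (iteration, op, address), with op "R" or "W".
    """
    last_write = {}  # address -> iteration of the most recent write
    last_read = {}   # address -> iteration of the most recent read
    deps = set()
    for iteration, op, address in trace:
        w = last_write.get(address)
        r = last_read.get(address)
        if op == "R":
            if w is not None and w != iteration:
                deps.add(("RAW", w, iteration, address))  # read after write
            last_read[address] = iteration
        else:
            if w is not None and w != iteration:
                deps.add(("WAW", w, iteration, address))  # write after write
            if r is not None and r != iteration:
                deps.add(("WAR", r, iteration, address))  # write after read
            last_write[address] = iteration
    return deps


# a[i] = a[i - 1] + 1: iteration i reads element i - 1 and writes element i.
carried = [ev for i in range(1, 4) for ev in ((i, "R", i - 1), (i, "W", i))]
# b[i] = a[i] * 2: every iteration touches its own addresses only.
independent = [ev for i in range(4) for ev in ((i, "R", 100 + i), (i, "W", 200 + i))]

print(sorted(find_loop_carried_dependences(carried)))
print(find_loop_carried_dependences(independent))
```

An empty result for a given input suggests, but does not prove, that the loop can be parallelized, since the analysis only covers the addresses seen in that run.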
10
Prospector: Parallelizable Section Finder (3/3)
  Mechanisms
    Scalability
      Current tools require too much memory and time to analyze data dependences
      Prospector implements a new scalable algorithm for data-dependence profiling
    Key ideas
      Compression and parallelization (MICRO '10)
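To give a feel for why compression helps, the sketch below collapses a sequence of observed addresses into `(base, stride, count)` runs, so that a loop walking an array is stored as one record instead of one entry per access. This is one plausible form of compression chosen for illustration; the slide does not describe the actual scheme, and the function name is hypothetical.

```python
def compress_strides(addresses):
    """Collapse a list of addresses into (base, stride, count) runs."""
    runs = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            runs.append((base, 0, 1))  # a lone trailing address
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while consecutive addresses keep the same stride.
        while (i + count < n
               and addresses[i + count] - addresses[i + count - 1] == stride):
            count += 1
        runs.append((base, stride, count))
        i += count
    return runs


# 1,000 accesses to consecutive 4-byte elements become a single record.
print(compress_strides(list(range(0x1000, 0x1000 + 4 * 1000, 4))))
```

Dependence checks can then compare runs instead of individual addresses, which is where the memory and time savings would come from.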
11
Prospector: Parallelism Pattern Advisor
  Q: How can I transform the serial code?
  If dependences are easily removable
    I.e., embarrassingly parallel loops with some reductions
    Guide the parallelization strategy directly
      E.g., "Use an OpenMP pragma here"
  If severe dependences exist
    Can we give advice on avoiding these dependences?
      General solutions are extremely hard
      Instead, use data-dependence pattern analysis
        E.g., pipeline parallelism, a certain form of locking
  [Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements;}]
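The "easily removable" case can be made concrete with a reduction. In the serial loop below, every iteration depends on the running `total`; giving each worker a private partial sum and combining the partials at the end removes that dependence, which is what an OpenMP `reduction(+:total)` clause on a `parallel for` expresses in C or C++. The Python version is only a structural sketch with hypothetical function names; because of CPython's global interpreter lock, the threads here show the shape of the transformation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    total = 0
    for v in values:
        total += v * v  # loop-carried dependence on `total` (a reduction)
    return total


def parallel_sum_of_squares(values, workers=4):
    """Break the dependence by giving each worker a private partial sum."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    return sum(partials)  # combine the private partial sums


data = list(range(1000))
print(serial_sum_of_squares(data), parallel_sum_of_squares(data))
```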
12
Prospector: Parallel Architecture Advisor
  Q: Which parallel hardware would be better?
    Can we predict performance on different hardware?
      E.g., speedups on multicore CPUs and GPGPUs
  Challenges
    Need to model more architectural factors
  [Figure: speedup on CPU vs. GPU]
13
Prospector: Parallel Performance Analyzer
  Q: What is the reason for poor speedup?
  There are a few profilers for this purpose
    They analyze the degree of concurrency
    They profile lock contention (wait time)
    Their information is too low-level for understanding problems
  Alternative
    Macroscopic profiling of parallelized programs
    An alternative form of visualization
  [Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements;}]
14
Related Work
  State-of-the-art tools
    Parallel Advisor from Intel Parallel Studio 2011
      Speedup predictor: cannot model architectures
      Parallelizable section finder: scalability issues
    vfAnalyst from Vector Fabrics
      Parallelizable section finder: scalability issues
15
Current Status and Timeline
  June 2010
    The initial idea of Prospector was presented at HotPar '10
  Dec 2010
    A scalable data-dependence profiling algorithm (for the Parallelizable Section Finder and the Pattern Advisor) will be presented at MICRO '10
    A beta version will be released as open source
      Loop-Centric Profiler
      Parallelizable Section Finder (i.e., the data-dependence profiler)
      Parallel Speedup Predictor
  Mar 2011
    The Parallel Speedup Predictor will be released
  Aug 2011
    The first Parallelism Pattern Advisor will be released
16
Conclusion
  We need a new type of tool to help parallel programming
  Prospector is a set of parallel programming advisors based on dynamic program analysis
    Finds good parallelization targets
    Analyzes serial code to understand its behavior
    Predicts speedup
    Provides advice on code changes
17
Thank you! Q&A
References
  Overall tool architecture
    Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "Prospector: Helping Parallel Programming by A Data-Dependence Profiler", 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10), June 2010.
  Scalable data-dependence profiling
    Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "SD3: A Scalable Approach To Dynamic Data-Dependence Profiling", Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2010.