Prospector: A Toolchain To Help Parallel Programming Minjang Kim, Hyesoon Kim, HPArch Lab, and Chi-Keung Luk Intel This work will be also supported by Samsung

Post on 26-Dec-2015

Page 1: Prospector: A Toolchain To Help Parallel Programming

Minjang Kim and Hyesoon Kim (HPArch Lab), and Chi-Keung Luk (Intel)

This work will also be supported by Samsung.

Page 2: Motivation (1/2)

- Parallel programming is hard.
- What if there were a tool that helps with parallel programming?
  - We already have some tools, such as race detectors.
  - However, few tools guide parallel programming itself.
- A programmer wants to parallelize serial code:
  - Where to parallelize?
  - How to parallelize?

Page 3: Motivation (2/2)

- We propose Prospector: a set of dynamic program analyzers that help parallelize serial code.
- Goals
  - Give information for finding the right parallelization targets.
  - Provide advice on writing correct and optimized parallel code.

Page 4: Overview of Prospector

[Figure: overview of the toolchain. Input: source code or binary. Components: Loop-Centric Profiler, Parallelizable Section Finder, Parallelism Pattern Advisor, Parallel Speedup Predictor, Architecture Advisor, Parallel Performance Analyzer. Illustrations: example code (Func1 containing Loop1, Loop2, and a call to Func2; Func2 containing Loop3; Loop3 with statements around a Lock()/Unlock() pair), a chart of speedup versus number of cores (2, 4, 8), a chart of speedup on CPU versus GPU, and a Loop1 statistics box with the fields Invocation, Iteration, Max Iter, and Min Iter.]

Page 5: Prospector: Loop-Centric Profiler

Q: Which code sections would be good for parallelization?

- The most frequently executed loops.
  - Legacy profilers report only hot functions and instructions.
- We provide details of loop execution:
  - Trip count: is there sufficient work?
  - Number of invocations: is the fork/join overhead low?
  - Statistics on the length of loop iterations (min, max, standard deviation): is the work balanced?

[Figure: Loop1 statistics box with the fields Invocation, Iteration, Max Iter, and Min Iter.]
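
The per-loop statistics listed on this slide can be illustrated with a minimal Python sketch. It is not Prospector's implementation: the class name `LoopProfile`, its methods, and the idea of feeding it already-measured iteration durations are assumptions made for illustration.

```python
import statistics


class LoopProfile:
    """Hypothetical container for the loop statistics named on the slide."""

    def __init__(self, name):
        self.name = name
        self.trip_counts = []      # one entry per loop invocation
        self.iteration_times = []  # one entry per executed iteration

    def record_invocation(self, iteration_times):
        """Record one invocation given the duration of each of its iterations."""
        durations = list(iteration_times)
        self.trip_counts.append(len(durations))
        self.iteration_times.extend(durations)

    def summary(self):
        """Return invocation count, trip counts, and iteration-length statistics."""
        if not self.iteration_times:
            raise ValueError("no iterations recorded")
        return {
            "invocations": len(self.trip_counts),
            "iterations": len(self.iteration_times),
            "mean_trip_count": statistics.mean(self.trip_counts),
            "min_iter": min(self.iteration_times),
            "max_iter": max(self.iteration_times),
            # Population standard deviation: a large value hints at imbalance.
            "stdev_iter": statistics.pstdev(self.iteration_times),
        }


profile = LoopProfile("Loop1")
profile.record_invocation([1.0, 2.0, 3.0])
profile.record_invocation([2.0])
print(profile.summary())
```

A loop with few invocations, a high trip count, and a small spread between the shortest and longest iteration would be the kind of candidate the slide's three questions point to.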

Page 6: Prospector: Parallel Speedup Predictor (1/2)

Q: What speedup can be expected?

- Analytical models (e.g., Amdahl's Law) are not practical for predicting speedup in the presence of locks.
- Our approach
  - Dynamically predict speedup based on lightweight profiling.
- Challenges
  - How to model architectural factors (e.g., caches, memory)?

[Figure: speedup versus number of cores (2, 4, 8).]
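
For reference, Amdahl's Law, the analytical model named on this slide, can be written in a few lines of Python. The function name and the 90% parallel fraction in the usage loop are illustrative assumptions, not figures from the talk. The formula has no term for lock contention or cache behavior, which is the limitation the slide points out.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Core counts taken from the slide's chart axis; 0.9 is an assumed fraction.
for cores in (2, 4, 8):
    print(cores, round(amdahl_speedup(0.9, cores), 3))
```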

Page 7: Prospector: Parallel Speedup Predictor (2/2)

Mechanisms

- Programmers annotate the serial code.
  - The annotations describe the behavior of parallel execution and locks.
- Fast, lightweight profiling
  - Measure the time between annotations.
- Emulation
  - Obtain an estimated parallel execution time, from which the speedup is computed.
- Modeling architectural parameters
  - Sample memory accesses.
  - Use an analytical model for cache hit/miss prediction.
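
The emulation step can be sketched as a small discrete-event simulation. This is only one way to read the slide, under loudly simplified assumptions: each iteration is described by three serially measured durations (before the lock, inside the lock, after the lock), there is a single global lock granted in arrival order, iterations are assigned to threads round-robin, and no cache or memory effects are modeled. The function name `emulate` and the tuple layout are hypothetical.

```python
import heapq


def emulate(iterations, n_threads):
    """Estimate parallel time for iterations of (before, locked, after) durations.

    Assumes one global lock, FIFO lock hand-off, static round-robin scheduling.
    """
    queues = [iterations[t::n_threads] for t in range(n_threads)]
    position = [0] * n_threads   # next iteration index per thread
    finish = [0.0] * n_threads   # time at which each thread becomes idle
    pending = []                 # heap of (time thread reaches the lock, thread)

    for t in range(n_threads):
        if queues[t]:
            heapq.heappush(pending, (queues[t][0][0], t))

    lock_free_at = 0.0
    while pending:
        arrival, t = heapq.heappop(pending)
        _, locked, after = queues[t][position[t]]
        start = max(arrival, lock_free_at)   # wait if the lock is held
        lock_free_at = start + locked
        finish[t] = lock_free_at + after
        position[t] += 1
        if position[t] < len(queues[t]):
            next_before = queues[t][position[t]][0]
            heapq.heappush(pending, (finish[t] + next_before, t))

    return max(finish)


work = [(1.0, 1.0, 1.0)] * 4
serial_time = emulate(work, 1)
for threads in (2, 4):
    print(threads, serial_time / emulate(work, threads))
```

Dividing the single-thread estimate by the multi-thread estimate gives a predicted speedup that, unlike Amdahl's Law, reflects time lost waiting for the lock.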

Page 8: Prospector: Parallelizable Section Finder (1/3)

Q: Is this code section parallelizable?

- Data dependences determine parallelizability.
  - Compilers may not analyze them well because of pointers and complex control flow.
- Our approach
  - Dynamic data-dependence profiling.
  - Provides detailed dependence information for a given input.
- Challenges
  - Too much overhead; a smart algorithm is needed.

[Figure: example code Func1(){ Loop1; Loop2; Func2(); } annotated "Parallelizable!".]

Page 9: Prospector: Parallelizable Section Finder (2/3)

Mechanisms

- A dynamic profiler based on instrumentation.
  - Instrumentation can be at either the binary or the source level.
- At instrumentation time (statically)
  - Analyze control-flow graphs and loop structures.
- At runtime
  - Observe memory addresses (no points-to analysis).
  - Store and analyze these memory addresses to discover data dependences.
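
The runtime step, recording memory addresses and analyzing them for dependences, can be illustrated with a naive Python sketch. It is a textbook-style pairwise check, not the scalable algorithm the talk refers to. The trace format, a sequence of `(iteration, address, kind)` records with `kind` being `"R"` or `"W"`, and the function name are assumptions for illustration.

```python
def find_loop_carried_dependences(trace):
    """Return the set of (type, address, source_iter, sink_iter) dependences
    that cross loop iterations, given accesses in execution order."""
    last_writer = {}  # address -> iteration of the most recent write
    readers = {}      # address -> iterations that read since the last write
    found = set()

    for iteration, address, kind in trace:
        writer = last_writer.get(address)
        if kind == "R":
            if writer is not None and writer != iteration:
                found.add(("RAW", address, writer, iteration))
            readers.setdefault(address, set()).add(iteration)
        elif kind == "W":
            if writer is not None and writer != iteration:
                found.add(("WAW", address, writer, iteration))
            for reader in readers.pop(address, set()):
                if reader != iteration:
                    found.add(("WAR", address, reader, iteration))
            last_writer[address] = iteration
        else:
            raise ValueError("kind must be 'R' or 'W'")
    return found


# Trace of a loop like a[i] = a[i-1] + 1 for two iterations.
chained = [(0, "a0", "R"), (0, "a1", "W"), (1, "a1", "R"), (1, "a2", "W")]
print(find_loop_carried_dependences(chained))
```

An empty result for a given input suggests, but does not prove, that the loop is parallelizable, because a dynamic profile only covers the executed paths. The per-address tables also grow with the number of distinct addresses touched, which is the overhead problem the slides raise.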

Page 10: Prospector: Parallelizable Section Finder (3/3)

Mechanisms: scalability

- Current tools require too much memory and time to analyze data dependences.
- Prospector implements a new, scalable algorithm for data-dependence profiling.
- Key ideas
  - Compression and parallelization (MICRO '10).
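
The slide does not say what is compressed or how. As one plausible illustration only, the sketch below compresses a sequence of memory addresses into stride runs `(base, stride, count)`, a common way to shrink the address streams produced by loops over arrays. The function names and the greedy run-splitting policy are assumptions, not a description of the MICRO '10 algorithm.

```python
def compress_strides(addresses):
    """Greedily encode addresses as runs of (base, stride, count)."""
    runs = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 < n:
            stride = addresses[i + 1] - base
            count = 2
            # Extend the run while consecutive differences match the stride.
            while (i + count < n
                   and addresses[i + count] - addresses[i + count - 1] == stride):
                count += 1
        else:
            stride, count = 0, 1
        runs.append((base, stride, count))
        i += count
    return runs


def decompress_strides(runs):
    """Expand (base, stride, count) runs back into the address sequence."""
    return [base + k * stride for base, stride, count in runs for k in range(count)]


accesses = [100, 104, 108, 112, 500, 508, 516]
runs = compress_strides(accesses)
print(runs)
assert decompress_strides(runs) == accesses
```

With such a representation, a long run of array accesses costs one tuple instead of one entry per address, which is the kind of memory saving a scalable profiler needs.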

Page 11: Prospector: Parallelism Pattern Advisor

Q: How can I transform the serial code?

- If dependences are easily removable
  - I.e., embarrassingly parallel loops with some reductions.
  - Guide the parallelization strategy directly.
    - E.g., "Use an OpenMP pragma here."
- If severe dependences exist
  - Can we give advice on avoiding these dependences?
    - General solutions are extremely hard.
    - Instead, analyze data-dependence patterns.
    - E.g., pipeline parallelism, a certain form of locking.

[Figure: example code Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }.]
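
The "easily removable" case, a loop whose only cross-iteration dependence is a reduction, can be sketched in Python. The transformation mirrors what an OpenMP reduction clause does: each worker accumulates a private partial result, and the partial results are combined at the end. The function names and the sum-of-squares workload are illustrative assumptions. The sketch shows the structure of the transformation rather than a speedup, since CPython threads do not run this kind of arithmetic in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Accumulate into a private variable: no shared state, so no lock."""
    total = 0
    for x in chunk:
        total += x * x
    return total


def parallel_sum_of_squares(values, workers=4):
    """Split the loop into chunks, reduce each privately, then combine."""
    values = list(values)
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum_of_squares(range(1000)))
```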

Page 12: Prospector: Parallel Architecture Advisor

Q: Which parallel hardware would be better?

- Can we predict performance on different hardware?
  - E.g., speedups on multicore CPUs and GPGPUs.
- Challenges
  - Need to model more architectural factors.

[Figure: speedup on CPU versus GPU.]

Page 13: Prospector: Parallel Performance Analyzer

Q: What is the reason for poor speedup?

- A few profilers exist for this purpose. They
  - analyze the degree of concurrency;
  - profile lock contention (wait time);
  - but report information that is too low-level for understanding the problems.
- Alternative
  - Macroscopic profiling of parallelized programs.
  - An alternative form of visualization.

[Figure: example code Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }.]

Page 14: Related Work

State-of-the-art tools

- Parallel Advisor from Intel Parallel Studio 2011
  - Speedup predictor: cannot model architectures.
  - Parallelizable section finder: scalability issues.
- vfAnalyst from Vector Fabrics
  - Parallelizable section finder: scalability issues.

Page 15: Current Status and Timeline

- June 2010
  - The initial Prospector idea is presented at HotPar '10.
- Dec 2010
  - The scalable data-dependence profiling algorithm (for the Parallelizable Section Finder and the Pattern Advisor) will be presented at MICRO '10.
  - A beta version will be released as open source:
    - Loop-Centric Profiler
    - Parallelizable Section Finder (i.e., the data-dependence profiler)
    - Parallel Speedup Predictor
- Mar 2010
  - The Parallel Speedup Predictor will be released.
- Aug 2010
  - The first Parallelism Pattern Advisor will be released.

Page 16: Conclusion

- We need a new type of tool to help parallel programming.
- Prospector is a set of parallel programming advisors based on dynamic program analysis. It
  - finds good parallelization targets;
  - analyzes serial code to understand its behavior;
  - predicts speedup;
  - provides advice on code changes.

Page 17: Thank you! Q&A

References

- Overall tool architecture: Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "Prospector: Helping Parallel Programming by A Data-Dependence Profiler", 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10), June 2010.
- Scalable data-dependence profiling: Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "SD3: A Scalable Approach To Dynamic Data-Dependence Profiling", Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2010.