parallelising serial applications

Darryl Gove Compiler Performance Engineering Parallelising serial applications


Page 1: Parallelising serial applications

Darryl Gove
Compiler Performance Engineering

Parallelising serial applications

Page 2: Parallelising serial applications

2Parallelising serial applications – Darryl Gove – April 2008

Topics

• Process
• Tools
• Expectations

Page 3: Parallelising serial applications

Profile

• Compile with debug info
  > -g [C/Fortran]
  > -g0 [C++]
  > Enables mapping of disassembly to source

• Profile with Performance Analyzer
  > collect <app> <params>
  > collect -P <pid>

Page 4: Parallelising serial applications

Application profile

Page 5: Parallelising serial applications

Source level profile

Page 6: Parallelising serial applications

Parallelisation opportunities

• Large tasks
• No dependencies
• Avoid synchronisation
• Examples:
  > Independent transactions
  > Different iterations of loops
  > Different tasks

Page 7: Parallelising serial applications

Expected gains from parallelisation

Maximum performance gain from parallelisation is determined by the time spent in code that can be parallelised (Amdahl's Law).
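As a rough illustration of Amdahl's Law, the sketch below computes the best-case speedup from the fraction of runtime that can be parallelised and the number of threads. The function names `amdahl_speedup` and `print_speedups` and the thread counts are assumptions made for this example, not part of the slides.

```c
#include <stdio.h>

/* Best-case speedup under Amdahl's Law.
 * parallel_fraction: share of the serial runtime that can be parallelised (0..1)
 * threads:           number of threads applied to that share (>= 1)
 */
double amdahl_speedup(double parallel_fraction, int threads)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}

/* Print the predicted speedup for a few illustrative thread counts. */
void print_speedups(double parallel_fraction)
{
    int counts[] = {1, 2, 4, 8};
    for (int i = 0; i < 4; i++) {
        printf("%d threads: %.2fx\n", counts[i],
               amdahl_speedup(parallel_fraction, counts[i]));
    }
}
```

For instance, if 80% of the runtime can be parallelised, `amdahl_speedup(0.8, 4)` gives about 2.5x, and no thread count can push the gain past 5x because the serial 20% remains.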

Page 8: Parallelising serial applications

POSIX Threads

• POSIX Threads (Pthreads)
  > Very flexible
  > Requires new code
  > May mean restructuring

Page 9: Parallelising serial applications

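Setting up Pthreads

    #include <pthread.h>

    void *calculate(void *arg);   /* target routine, defined elsewhere */

    int main(void)
    {
        pthread_t threads[2];
        int id[2];
        for (int i = 0; i < 2; i++) {
            id[i] = i;
            /* Parameter passing: each thread gets a pointer to its own id */
            pthread_create(&threads[i], 0, calculate, (void *)&id[i]);
        }
        /* Wait for both threads to finish */
        for (int i = 0; i < 2; i++) {
            pthread_join(threads[i], 0);
        }
        return 0;
    }

Callouts: calculate is the target routine; (void *)&id[i] is the parameter passing.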
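The target routine itself is not shown on the slide. A minimal sketch of what it might look like follows, assuming each thread only needs its integer id; the `results` array, the placeholder work, and the `run_threads` wrapper are hypothetical names introduced for illustration.

```c
#include <pthread.h>

#define NTHREADS 2

/* Hypothetical per-thread output slots; each thread writes only its own. */
int results[NTHREADS];

/* Target routine: Pthreads requires the signature void *(*)(void *). */
void *calculate(void *arg)
{
    int id = *(int *)arg;          /* recover the id passed by the creator */
    results[id] = (id + 1) * 100;  /* placeholder for the real work */
    return NULL;
}

/* Start the threads, wait for them, and return the combined result.
 * Returns -1 if a thread could not be created or joined. */
int run_threads(void)
{
    pthread_t threads[NTHREADS];
    int id[NTHREADS];
    int started = 0, failed = 0, total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        id[i] = i;  /* id[] stays valid because we join before returning */
        if (pthread_create(&threads[i], NULL, calculate, &id[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            failed = 1;
    }
    if (failed)
        return -1;
    for (int i = 0; i < NTHREADS; i++)
        total += results[i];
    return total;
}
```

Passing a pointer to a per-thread element of `id[]`, rather than a pointer to the loop counter, avoids every thread reading a value that the creating loop is still changing.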

Page 10: Parallelising serial applications

Notes: Pthreads

• Solaris 9
  > -mt -lpthread

• Solaris 10
  > -mt
  > libc includes thread library

Page 11: Parallelising serial applications

OpenMP

• OpenMP
  > Minimal source modification
  > Incremental parallelisation
  > Serial and parallel versions from same source
  > Some skill needed
  > Limited parallelisation options
    – 2.5 standard
      – Parallel for
      – Parallel sections
    – 3.0 standard
      – Tasks
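Parallel sections are the less familiar of the two 2.5 constructs, so here is a minimal sketch, assuming two independent pieces of work. The helper `sum_range` and the wrapper `run_sections` are hypothetical names used only for this example.

```c
/* Hypothetical independent task: sum the integers lo..hi inclusive. */
int sum_range(int lo, int hi)
{
    int total = 0;
    for (int i = lo; i <= hi; i++)
        total += i;
    return total;
}

/* Run two unrelated tasks as OpenMP sections.
 * a and b are shared, but each section writes only one of them,
 * so no synchronisation is needed. */
int run_sections(void)
{
    int a = 0, b = 0;

#pragma omp parallel sections
    {
#pragma omp section
        a = sum_range(1, 50);

#pragma omp section
        b = sum_range(51, 100);
    }
    /* The end of the parallel region is an implicit barrier,
     * so both results are ready here. */
    return a + b;
}
```

If the compiler is not told to enable OpenMP, the pragmas are ignored and the two assignments simply run one after the other, which is the "serial and parallel versions from same source" property listed above.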

Page 12: Parallelising serial applications

OpenMP

    void calculate()
    {
        int x, y;
        double xv, yv;
    #pragma omp parallel for private(y,xv,yv)
        for (x = 0; x < SIZE; x++) {
            for (y = 0; y < SIZE; y++) {
                xv = ((double)(x - SIZE/2)) / (double)(SIZE/4);
                yv = ((double)(y - SIZE/2)) / (double)(SIZE/4);
                data[x][y] = inset(xv, yv);
            }
        }
    }

Page 13: Parallelising serial applications

Notes: OpenMP

• Compile with
  > -xopenmp

• Warnings with
  > -xloopinfo -xvpara

• Set number of threads with
  > OMP_NUM_THREADS

Page 14: Parallelising serial applications

Autoparallelisation

• Autoparallelisation
  > Easy to use
  > Limited range
  > Improve with:
    – Source changes
    – Compiler flags
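The slides do not say which source changes help. One plausible example, offered here as an assumption, is telling the compiler that pointer arguments do not alias so that it can prove loop iterations are independent. The sketch below uses the C99 `restrict` qualifier on a hypothetical `scale_add` routine; whether a particular compiler then parallelises the loop depends on the compiler and its flags.

```c
#include <stddef.h>

/* Computes out[i] = a[i] * k + b[i] for i in [0, n).
 * restrict is a promise from the caller that the three arrays do not
 * overlap, so no iteration can read a value written by another one. */
void scale_add(double *restrict out, const double *restrict a,
               const double *restrict b, double k, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * k + b[i];
    }
}
```

The promise is the caller's responsibility: passing overlapping arrays to a routine declared this way is undefined behaviour.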

Page 15: Parallelising serial applications

Using autopar

    $ cc -fast -xautopar -xreduction \
    > -xloopinfo -xvpara m1.c
    ...
    "m1.c", line 40: PARALLELIZED, interchanged (inlined loop)
    ...

Callout: Emit auto-parallelisation information

Page 16: Parallelising serial applications

Notes: autoparallelisation

• Compiler options
  > -xautopar -xreduction

• Warnings:
  > -xloopinfo -xvpara

• Set number of threads with
  > OMP_NUM_THREADS

Page 17: Parallelising serial applications

Reductions

• Multiple threads cooperating to produce a single value.
• Order of calculation will be different to serial case

    for (i=0; i<N; i++) {
        total += a[i];
    }
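With OpenMP, the loop above can be expressed as a reduction. A minimal sketch follows, assuming the array holds doubles; the wrapper name `sum_array` is an assumption for this example.

```c
/* Sum n elements of a[] using an OpenMP reduction.
 * Each thread accumulates into its own private copy of total;
 * the copies are combined when the loop finishes. */
double sum_array(const double *a, int n)
{
    double total = 0.0;
    int i;

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Because the partial sums are combined in a different order from the serial loop, floating-point results may differ slightly in the last digits between serial and parallel runs, or between runs with different thread counts.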

Page 18: Parallelising serial applications

Profiling a Multi-threaded application

Workload imbalance between the two threads

Page 19: Parallelising serial applications

Data races

$ cc -fast -mt -o race race.c -lpthread

$ datarace
sum = 385158800

$ datarace
sum = 385071679

Page 20: Parallelising serial applications

Data races

• Multiple threads reading/writing the same data without exclusive access
• Results in unpredictable behaviour

Each thread updates Sum in three steps:

• Load Sum
• Add inset() to Sum
• Store Sum
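To see why the load, add, store sequence loses updates, the sketch below replays one unlucky interleaving of two threads by hand. It deliberately runs in a single thread so the outcome is repeatable; the function name `lost_update_demo`, the starting value, and the amounts added are assumptions for illustration.

```c
/* Replays, step by step, what can happen when two threads both
 * perform load/add/store on a shared sum without exclusive access. */
int lost_update_demo(void)
{
    int sum = 10;       /* shared variable */
    int reg_a, reg_b;   /* each thread's private copy of sum */

    reg_a = sum;        /* thread A: Load Sum  (reads 10) */
    reg_b = sum;        /* thread B: Load Sum  (also reads 10) */
    reg_a += 1;         /* thread A: Add       (11) */
    reg_b += 2;         /* thread B: Add       (12) */
    sum = reg_a;        /* thread A: Store Sum (sum is 11) */
    sum = reg_b;        /* thread B: Store Sum (sum is 12) */

    return sum;         /* 12: thread A's update was overwritten */
}
```

Both updates applied one after the other would give 13. In a real multi-threaded run the interleaving varies from run to run, which is why the program above prints a different sum each time.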

Page 21: Parallelising serial applications

Thread Analyzer

Page 22: Parallelising serial applications

View source code

Page 23: Parallelising serial applications

Notes: Data races

    $ cc -g -xinstrument=datarace -mt \
      -lpthread -o race race.c
    $ collect -r on datarace
    $ analyzer tha.1.er

Callout (compile step): Generate instrumented executable

Page 24: Parallelising serial applications

Fixing data accesses

• Single thread access:
  > Mutex locks
  > Critical regions (OpenMP)

• Lock-less:
  > Atomic operations (man atomic_ops)

• Data sharing:
  > Thread local storage
  > OpenMP reduction clause
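As a sketch of the first option, the code below protects a shared sum with a Pthreads mutex so that only one thread at a time performs the load, add, store sequence. The names `add_to_sum` and `locked_sum`, the thread count, and the iteration count are assumptions for illustration; this is not the race.c program from the slides.

```c
#include <pthread.h>

#define NTHREADS   2
#define ITERATIONS 100000

static long sum = 0;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread routine: add 1 to the shared sum ITERATIONS times,
 * holding the mutex around every update. */
static void *add_to_sum(void *arg)
{
    (void)arg;  /* unused */
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&sum_lock);
        sum += 1;                       /* load, add, store under the lock */
        pthread_mutex_unlock(&sum_lock);
    }
    return NULL;
}

/* Run the threads and return the final sum. */
long locked_sum(void)
{
    pthread_t threads[NTHREADS];
    int started = 0;

    sum = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&threads[i], NULL, add_to_sum, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return sum;
}
```

Taking the lock on every iteration is correct but serialises the threads. Accumulating into a thread-local total and taking the lock once per thread, or using an OpenMP reduction, corresponds to the data-sharing options in the list above and usually scales better.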

Page 25: Parallelising serial applications

Summary

• Profile to find hot code
• Use multiple threads
  > Autoparallelisation
  > OpenMP easy to use
  > Pthreads gives more control

• Use the tools in Sun Studio
  > Performance Analyzer for profiling
  > Thread Analyzer for race condition detection

Page 26: Parallelising serial applications

Parallelising serial applications

Darryl Gove
http://blogs.sun.com/d/