masterpraktikum - high performance computing - openmp · #pragma omp for ..., and many more...

15
Masterpraktikum - High Performance Computing OpenMP Michael Bader Alexander Heinecke Alexander Breuer Technische Universit¨ at M ¨ unchen, Germany

Upload: others

Post on 16-Mar-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Masterpraktikum - High PerformanceComputing

OpenMP

Michael BaderAlexander Heinecke

Alexander Breuer

Technische Universitat Munchen, Germany

Page 2: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

2

Page 3: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

#include <omp.h>...#pragma omp parallel forfor(i = 0; i < n; i++){

for(j = 0; j < n; j++){

do some work(i, j);}

}

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

3

Page 4: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

4

Page 5: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

Overview• Compiler-directives#pragma omp ... [clause[[,] clause]...]

• Work-sharing#pragma omp for ..., and many more

• Synchronization#pragma omp barrier, and many more

• Data Scope Clausesshared, private, firstprivate, reduction, u.a.

• Bibliotheksroutinenomp get num treads(),omp get tread num(), u.a.

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

5

Page 6: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

compiling and excuting• compile with additional Compiler-Flag -openmp (-fopenmp in

case of gcc):icc -openmp -o my alg my alg.c

• ausfuhren:export OMP NUM THREADS=number of threads(default = 1)./my alg

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

6

Page 7: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

Parallel Region Construct, OpenMP API#pragma omp parallel [clause[[,] clause]...]structured block

• code within parallel region are executed by all threads

Example#include <omp.h>...#pragma omp parallel private(size, rank){

size = omp get num threads();rank = omp get thread num();printf("Hello World! (Thread %d of %d)", rank,

size);}

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

7

Page 8: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

work sharing constructs• #pragma omp for [clause[[,] clause]...]for-loop

• within a parallel region and directly in front of afor-loop

• iterations are scheduled across different threads• schedule clause determines how to map iterations to

threadsschedule(static[, chunksize]) default-valuechunksize is #iterations divided by #hreadsschedule(dynamic[, chunksize]) default-valueschunksize is 1schedule(runtime) Scheduling is given by environmentvariable OMP SCEDULE

• implicit synchronization in the end of for-loop (can bedisabled with nowait clause)

• shortcut possible: #pragma omp parallel for

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

8

Page 9: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

work sharing constructs II#pragma omp task [clause[[,] clause]...]structured blockwhere clause can be...

• final(expression): if expression is true, task isexecuted sequentially; no recursive task generation

• untied: the execution of a task is not tied to one single thread

• shared | firstprivate | private

• some more...

#pragma omp taskwait

• waits for children completition

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

9

Page 10: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

work sharing constructs III• #pragma omp single [clause[[,] clause]...]structured block

• use only one (arbitrary) thread• implicit synchronization

• #pragma omp masterstructured block

• use only master thread for structures block• NO synchronization in the end

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

10

Page 11: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

synchronization constructs• #pragma omp barrier

• blocks execution until all threads have reached the barrier

• #pragma omp criticalstructured block

• only one thread a time can execute the structured blockedencapsulated by critical

• ATTENTION: use carefully! Will definitely kill performance.

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

11

Page 12: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

reduction clause• reduction (operator: list)

• executes a reduction of variables in list using operator• available operators: +, ∗, −, &&, ||

Beispiel#pragma omp parallel for private(r), reduction(+:sum)for(i = 0; i < n; i++){

r = compute r(i);sum = sum + r;

}

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

12

Page 13: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

OpenMP data scope clauses• private(list): DECLARES variables in list as private for each

thread (no copy!)

• shared(list): variables in list are used by all threads (raceconditions are possible!), write accesses have to be handled byprogrammer

• firstprivate(list): private varibles AND init them with latestvalid value before parallel region

• lastprivate(list): after parallel execution, in serial part reusedlatest modification.

• default data scope clause is shared, but exceptions exists⇒ beprecise!

• local variables are always private• loop-variables of for-loops are private (C++ only)

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

13

Page 14: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

OpenMP data scope clauses - Exercise

Beispielk = compute k();#pragma omp parallel for private(?), shared(?),firstprivate(?), lastprivate(?), reduction(?)for(i = 0; i < n; i++){

k = compute my k(i, k);r = compute r(i, k, h);sum = sum + r;

}

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

14

Page 15: Masterpraktikum - High Performance Computing - OpenMP · #pragma omp for ..., and many more Synchronization #pragma omp barrier, and many more Data Scope Clauses shared, private,

Technische Universitat Munchen

Questions?

Michael Bader Alexander Heinecke Alexander Breuer: Masterpraktikum - High Performance Computing

15