Masterpraktikum - High Performance Computing
OpenMP
Michael Bader, Alexander Heinecke, Alexander Breuer
Technische Universität München, Germany
Michael Bader, Alexander Heinecke, Alexander Breuer: Masterpraktikum - High Performance Computing
#include <omp.h>
...
#pragma omp parallel for
for(i = 0; i < n; i++){
  for(j = 0; j < n; j++){
    do_some_work(i, j);
  }
}
Overview
• Compiler directives: #pragma omp ... [clause[[,] clause]...]
• Work-sharing: #pragma omp for ..., and many more
• Synchronization: #pragma omp barrier, and many more
• Data scope clauses: shared, private, firstprivate, reduction, and more
• Library routines: omp_get_num_threads(), omp_get_thread_num(), and more
compiling and executing
• compile with the additional compiler flag -openmp (-fopenmp in case of gcc):
  icc -openmp -o my_alg my_alg.c
• execute:
  export OMP_NUM_THREADS=<number of threads> (default = 1)
  ./my_alg
Parallel Region Construct, OpenMP API
#pragma omp parallel [clause[[,] clause]...]
structured block
• code within the parallel region is executed by all threads

Example
#include <omp.h>
...
#pragma omp parallel private(size, rank)
{
  size = omp_get_num_threads();
  rank = omp_get_thread_num();
  printf("Hello World! (Thread %d of %d)", rank, size);
}
work sharing constructs
• #pragma omp for [clause[[,] clause]...]
  for-loop
• within a parallel region and directly in front of a for-loop
• iterations are scheduled across different threads
• the schedule clause determines how to map iterations to threads:
  schedule(static[, chunksize]): default chunksize is #iterations divided by #threads
  schedule(dynamic[, chunksize]): default chunksize is 1
  schedule(runtime): scheduling is given by the environment variable OMP_SCHEDULE
• implicit synchronization at the end of the for-loop (can be disabled with the nowait clause)
• shortcut possible: #pragma omp parallel for
work sharing constructs II
#pragma omp task [clause[[,] clause]...]
structured block
where clause can be ...
• final(expression): if expression is true, the task is executed sequentially; no recursive task generation
• untied: the execution of a task is not tied to one single thread
• shared | firstprivate | private
• some more ...
#pragma omp taskwait
• waits for the completion of all child tasks
work sharing constructs III
• #pragma omp single [clause[[,] clause]...]
  structured block
• executed by only one (arbitrary) thread
• implicit synchronization at the end
• #pragma omp master
  structured block
• executed only by the master thread
• NO synchronization at the end
synchronization constructs
• #pragma omp barrier
• blocks execution until all threads have reached the barrier
• #pragma omp critical
  structured block
• only one thread at a time can execute the structured block encapsulated by critical
• ATTENTION: use carefully! Will definitely kill performance.
reduction clause
• reduction(operator: list)
• executes a reduction of the variables in list using operator
• available operators: +, *, -, &&, ||

Example
#pragma omp parallel for private(r), reduction(+:sum)
for(i = 0; i < n; i++){
  r = compute_r(i);
  sum = sum + r;
}
OpenMP data scope clauses
• private(list): DECLARES the variables in list as private for each thread (no copy!)
• shared(list): variables in list are used by all threads (race conditions are possible!); write accesses have to be handled by the programmer
• firstprivate(list): makes the variables in list private AND initializes them with the last valid value before the parallel region
• lastprivate(list): after the parallel execution, the serial part reuses the latest modification
• the default data scope is shared, but exceptions exist ⇒ be precise!
• local variables are always private
• loop variables of for-loops are private (C++ only)
OpenMP data scope clauses - Exercise
Example
k = compute_k();
#pragma omp parallel for private(?), shared(?), firstprivate(?), lastprivate(?), reduction(?)
for(i = 0; i < n; i++){
  k = compute_my_k(i, k);
  r = compute_r(i, k, h);
  sum = sum + r;
}
Questions?