Parallel Programming with OpenMP

Alejandro Duran

Barcelona Supercomputing Center

Agenda

- Thursday
  10:00 - 11:15 OpenMP Basics
  11:00 - 11:30 Break
  11:30 - 13:00 Hands-on (I)
  13:00 - 14:30 Lunch
  14:30 - 15:15 Task parallelism in OpenMP
  15:15 - 17:00 Hands-on (II)

- Friday
  10:00 - 11:00 Data parallelism in OpenMP
  11:00 - 11:30 Break
  11:30 - 13:00 Hands-on (III)
  13:00 - 14:30 Lunch
  14:30 - 15:00 Other OpenMP topics
  15:00 - 16:00 Hands-on (IV)
  16:00 - 16:30 OpenMP in the future

Alex Duran (BSC) Advanced Programming with OpenMP February 2, 2013 2 / 217


Part I

OpenMP Basics


Outline

OpenMP Overview

The OpenMP model

Writing OpenMP programs

Creating Threads

Data-sharing attributes

Synchronization


OpenMP Overview

What is OpenMP?

It’s an API extension to the C, C++ and Fortran languages to write parallel programs for shared memory machines

- Current version is 3.1 (July 2011); version 3.0 was released in May 2008
- Supported by most compiler vendors
  - Intel, IBM, PGI, Sun, Cray, Fujitsu, HP, GCC, ...
- Maintained by the Architecture Review Board (ARB), a consortium of industry and academia
- http://www.openmp.org

A bit of history

- 1997: OpenMP Fortran 1.0
- 1998: OpenMP C/C++ 1.0
- 1999: OpenMP Fortran 1.1
- 2000: OpenMP Fortran 2.0
- 2002: OpenMP C/C++ 2.0
- 2005: OpenMP 2.5
- 2008: OpenMP 3.0
- 2011: OpenMP 3.1

Advantages of OpenMP

- Mature standard and implementations
  - Standardizes practice of the last 20 years
- Good performance and scalability
- Portable across architectures
- Incremental parallelization
- Maintains sequential version
- (mostly) High level language
  - Some people may say a medium level language :-)
- Supports both task and data parallelism
- Communication is implicit

Disadvantages of OpenMP

- Communication is implicit
- Flat memory model
- Incremental parallelization creates false sense of glory/failure
- No support for accelerators
- No error recovery capabilities
- Difficult to compose
- Lacks high-level algorithms and structures
- Does not run on clusters

The OpenMP model


OpenMP at a glance

[Figure: OpenMP components. Labels: OpenMP API (constructs, environment variables), compiler, OpenMP executable, OpenMP runtime library (ICVs), OS threading libraries, SMP CPUs]

Execution model

Fork-join model

OpenMP uses a fork-join model

- The master thread spawns a team of threads that joins at the end of the parallel region
- Threads in the same team can collaborate to do work

[Figure: master thread, parallel regions, and a nested parallel region]

Memory model

- OpenMP defines a relaxed memory model
  - Threads can see different values for the same variable
  - Memory consistency is only guaranteed at specific points
  - Luckily, the default points are usually enough
- Variables can be shared or private to each thread

Writing OpenMP programs


OpenMP directives syntax

In Fortran

Through a specially formatted comment:

    sentinel construct [clauses]

where sentinel is one of:
- !$OMP or C$OMP or *$OMP in fixed format
- !$OMP in free format

In C/C++

Through a compiler directive:

    #pragma omp construct [clauses]

OpenMP syntax is ignored if the compiler does not recognize OpenMP

We’ll be using C/C++ syntax throughout this tutorial

Headers/Macros

C/C++ only
- omp.h contains the API prototypes and data type definitions
- The _OPENMP macro is defined by OpenMP-enabled compilers
  - Allows conditional compilation of OpenMP code

Fortran only
- The omp_lib module contains the subroutine and function definitions


Structured Block

Definition

Most directives apply to a structured block:
- Block of one or more statements
- One entry point, one exit point
  - No branching in or out allowed
- Terminating the program is allowed


Hello world!

Example

    int id;
    char *message = "Hello world!";

    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        printf("Thread %d says: %s\n", id, message);
    }

- Directive: #pragma omp parallel
- Clause: private(id)
- API call: omp_get_thread_num()
- Structured block: the statements between the braces

Creating Threads


The parallel construct

Directive

    #pragma omp parallel [clauses]
        structured block

where clauses can be:
- num_threads(expression)
- if(expression)
- shared(var-list) (coming shortly!)
- private(var-list) (coming shortly!)
- firstprivate(var-list) (coming shortly!)
- default(none|shared|private|firstprivate) (coming shortly!; private and firstprivate only in Fortran)
- reduction(operator:var-list) (we’ll see it later)
- copyin(var-list) (not today)

Specifying the number of threads

- The number of threads is controlled by an internal control variable (ICV) called nthreads-var.
- When a parallel construct is found, a parallel region with a maximum of nthreads-var threads is created
  - Parallel constructs can be nested, creating nested parallelism
- The nthreads-var ICV can be modified through
  - the omp_set_num_threads API call
  - the OMP_NUM_THREADS environment variable
- Additionally, the num_threads clause causes the implementation to ignore the ICV and use the value of the clause for that region.


Avoiding parallel regions

- Sometimes we only want to run in parallel under certain conditions
  - E.g., enough input data, not running already in parallel, ...
- The if clause allows specifying an expression. When it evaluates to false, the parallel construct will only use 1 thread
  - Note that it still creates a new team and data environment


Hello world!

Example

    int id;
    char *message = "Hello world!";

    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        printf("Thread %d says: %s\n", id, message);
    }

- Creates a parallel region of OMP_NUM_THREADS threads
- All threads execute the same code
- id is private to each thread
- Each thread gets its id in the team
- message is shared among all threads

Putting it together

Example

    int main() {
        #pragma omp parallel
        ...   /* An unknown number of threads here. Use OMP_NUM_THREADS */
        omp_set_num_threads(2);
        #pragma omp parallel
        ...   /* A team of two threads here. */
        #pragma omp parallel num_threads(random()%4+1) if(0)
        ...   /* A team of 1 thread here. */
    }

API calls

Other useful routines

- int omp_get_num_threads(): Returns the number of threads in the current team
- int omp_get_thread_num(): Returns the id of the thread in the current team
- int omp_get_num_procs(): Returns the number of processors in the machine
- int omp_get_max_threads(): Returns the maximum number of threads that will be used in the next parallel region
- double omp_get_wtime(): Returns the number of seconds since an arbitrary point in the past


Data-sharing attributes


Data environment

A number of clauses are related to building the data environment that the construct will use when executing.

- shared
- private
- firstprivate
- default
- threadprivate
- lastprivate, reduction (we’ll see them later)
- copyin, copyprivate (out of our scope today)

Data-sharing attributes

Shared

When a variable is marked as shared, the variable inside the construct is the same as the one outside the construct.

- In a parallel construct this means all threads see the same variable
  - but not necessarily the same value
- Usually need some kind of synchronization to update them correctly
  - OpenMP has consistency points at synchronizations

Example

    int x = 1;
    #pragma omp parallel shared(x) num_threads(2)
    {
        x++;
        printf("%d\n", x);
    }
    printf("%d\n", x);   /* Prints 2 or 3 */

Private

When a variable is marked as private, the variable inside the construct is a new variable of the same type with an undefined value.

- In a parallel construct this means all threads have a different variable
- Can be accessed without any kind of synchronization

Example

    int x = 1;
    #pragma omp parallel private(x) num_threads(2)
    {
        x++;
        printf("%d\n", x);   /* Can print anything */
    }
    printf("%d\n", x);       /* Prints 1 */

Firstprivate

When a variable is marked as firstprivate, the variable inside the construct is a new variable of the same type but it is initialized to the original variable value.

- In a parallel construct this means all threads have a different variable with the same initial value
- Can be accessed without any kind of synchronization

Example

    int x = 1;
    #pragma omp parallel firstprivate(x) num_threads(2)
    {
        x++;
        printf("%d\n", x);   /* Prints 2 (twice) */
    }
    printf("%d\n", x);       /* Prints 1 */

What is the default?

- Static/global storage is shared
- Heap-allocated storage is shared
- Stack-allocated storage inside the construct is private
- Others
  - If there is a default clause, what the clause says
    - none means that the compiler will issue an error if the attribute is not explicitly set by the programmer
  - Otherwise, depends on the construct
    - For the parallel region the default is shared


Example

    int x, y;
    #pragma omp parallel private(y)
    {
        x =        /* x is shared */
        y =        /* y is private */
        #pragma omp parallel private(x)
        {
            x =    /* x is private */
            y =    /* y is shared */
        }
    }

Threadprivate storage

The threadprivate construct

    #pragma omp threadprivate(var-list)

- Can be applied to:
  - Global variables
  - Static variables
  - Class-static members
- Allows creating a per-thread copy of “global” variables.
- threadprivate storage persists across parallel regions if the number of threads is the same
  - Threadprivate persistence across nested regions is complex


Example

    char *foo()
    {
        static char buffer[BUF_SIZE];

        ...

        return buffer;
    }

    void bar()
    {
        #pragma omp parallel
        {
            char *str = foo();
            str[0] = random();
        }
    }

Unsafe. All threads access the same buffer

Example

    char *foo()
    {
        static char buffer[BUF_SIZE];
        #pragma omp threadprivate(buffer)

        ...

        return buffer;
    }

    void bar()
    {
        #pragma omp parallel
        {
            char *str = foo();
            str[0] = random();
        }
    }

- Creates one static copy of buffer per thread
- Now foo can be called safely by multiple threads at the same time

Synchronization


Why synchronization?

Mechanisms

Threads need to synchronize to impose some ordering in the sequence of actions of the threads. OpenMP provides different synchronization mechanisms:

- barrier
- critical
- atomic
- taskwait, ordered, locks (we’ll see them later)

Thread Barrier

The barrier construct

    #pragma omp barrier

- Threads cannot proceed past a barrier point until all threads reach the barrier AND all previously generated work is completed
- Some constructs have an implicit barrier at the end
  - E.g., the parallel construct


Barrier

Example

    #pragma omp parallel
    {
        foo();
        #pragma omp barrier   /* Forces all foo occurrences to happen before all bar occurrences */
        bar();
    }                         /* Implicit barrier at the end of the parallel region */

Exclusive access

The critical construct

#pragma omp critical [ ( name ) ]s t r u c t u r e d block

Provides a region of mutual exclusion where only one thread canbe working at any given time.By default all critical regions are the same, but you can providethem with names

Only those with the same name synchronize

Alex Duran (BSC) Advanced Programming with OpenMP February 2, 2013 43 / 217

Synchronization

Critical construct

Example

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp critical
        x++;
    }
    printf("%d\n", x);

- Critical region: only one thread at a time here
- Prints 3!

Synchronization

Critical construct

Example

    int x = 1, y = 0;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp critical (x)
        x++;

        #pragma omp critical (y)
        y++;
    }

Different names: one thread can update x while another updates y

Synchronization

Exclusive access

The atomic construct

    #pragma omp atomic
        expression

- Provides a special mechanism of mutual exclusion to do read & update operations
- Only supports simple read & update expressions
  - E.g., x++, x -= foo()
- Only protects the read & update part
  - foo() not protected
- Usually much more efficient than a critical construct
- Not compatible with critical

Synchronization

Atomic construct

Example

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp atomic
        x++;
    }
    printf("%d\n", x);

- Only one thread at a time updates x here
- Prints 3!

Synchronization

Atomic construct

Example

    int x = 1;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp critical
        x++;

        #pragma omp atomic
        x++;
    }
    printf("%d\n", x);

- Different threads can update x at the same time!
- Prints 3, 4 or 5 :(

Break

Coffee time! :-)

Part II

Hands-on (I)

Outline

Setup

Hello world!

Other

Setup

Hands-on preparation
Environment

We'll be using ...

- an SGI Altix 4700 System
  - 128 dual-core Montecito (IA-64) CPUs. Each of the 256 cores runs at 1.6 GHz, with an 8 MB L3 cache and a 533 MHz bus.
    - Unfortunately, we will be using just 8 of them :-)
  - 2.5 TB RAM
  - 2 internal SAS disks of 146 GB at 15000 RPM
  - 12 external SAS disks of 300 GB at 10000 RPM
- Intel's compiler version 11.0
  - Full support of OpenMP 3.0
  - Other vendors that support 3.0: PGI, IBM, SUN, GCC
- Log into the system with the provided username and password

Setup

Hands-on preparation

Ready...

Copy the exercises from my home:

    $ cp -a ~aduran/Prace_OpenMP_Handson_1/hello .

Go!

Now enter the hello directory to start the fun :-)

Hello world!

First exercise
Hello world!

Compile

1. Edit the Makefile in the directory and answer the following questions:
   - What is the compiler name?
   - Which flag activates OpenMP?
2. Run make and check that it generates a hello program.

Hello world!

First exercise
Hello world!

Run

1. Edit the file hello.c and try to figure out what is going to be the output of the following commands:

    $ ./hello

    $ OMP_NUM_THREADS=2 ./hello

    $ OMP_NUM_THREADS=4 ./hello

2. Now run them. Were you right?

Hello world!

First exercise
Hello world!

Being oneself

Now modify our hello program so that each thread generates a message with its id.

Tip: Use omp_get_thread_num()

Hello world!

First exercise
Hello world!

Generate extra info

Now modify our hello program so that, before any thread says hello, it outputs the following information:

1. The number of processors in the system
2. The number of threads that will be available in the parallel region

Hello world!

First exercise
Hello world!

Measuring time

Measure the time that it takes to execute the parallel region and output it at the end of the program.

Tip: Use omp_get_wtime()

Hello world!

First exercise

One at a time!

Extend the program so that each thread uses C's rand() to get a random number. Accumulate those numbers in a shared variable and output the result at the end of the program.

Should the result always be the same given the same seed and number of threads?

Other

Second exercise

1. Edit the sync.c file
2. Is the access to the variable x correct?
3. Fix it using a critical construct. Compile it:

    $ make sync

4. Run it with 1 to 4 threads and observe how the average time changes
5. Now replace the critical construct with an atomic one.
6. Run it with 1 to 4 threads. How do the average times compare to the previous ones?

Other

Some more...

One for each thread

1. Compile the tp.c program:

    $ make tp

2. The program is supposed to print the thread id three times
3. Run it with 4 threads. Observe the results
4. Edit tp.c and fix it so it behaves correctly
5. How did you solve the problem for x?
6. How did you solve the problem for y?
7. If you solved them in the same way, then rethink what you did for x

Break

Bon appétit!*

*Disclaimer: actual food may differ from the image! :-)

Part III

Part IV

The OpenMP Tasking Model

Outline

OpenMP tasks

Task synchronization

The single construct

Task clauses

Common tasking problems

OpenMP tasks

Task parallelism in OpenMP

Task parallelism model

[Figure labels: Team, Task pool]

- Parallelism is extracted from “several” pieces of code
- Allows parallelizing very unstructured parallelism
  - Unbounded loops, recursive functions, ...

OpenMP tasks

What is a task in OpenMP?

- Tasks are work units whose execution may be deferred
  - they can also be executed immediately
- Tasks are composed of:
  - code to execute
  - a data environment
    - Initialized at creation time
  - internal control variables (ICVs)
- Threads of the team cooperate to execute them

OpenMP tasks

Creating tasks

The task construct

    #pragma omp task [clauses]
        structured block

Where clauses can be:

- shared
- private
- firstprivate
  - Values are captured at creation time
- default
- if(expression)
- untied

OpenMP tasks

When are tasks created?

- Parallel regions create tasks
  - One implicit task is created and assigned to each thread
    - So all task concepts make sense inside the parallel region
- Each thread that encounters a task construct
  - Packages the code and data
  - Creates a new explicit task

OpenMP tasks

Default task data-sharing attributes
When there are no clauses ...

If no default clause:

- Implicit rules apply
  - e.g., global variables are shared
- Otherwise...
  - firstprivate
  - shared attribute is lexically inherited

OpenMP tasks

Task default data-sharing attributes
In practice...

Example

    int a;
    void foo() {
        int b, c;
        #pragma omp parallel shared(b)
        #pragma omp parallel private(b)
        {
            int d;
            #pragma omp task
            {
                int e;

                a =     /* shared */
                b =     /* firstprivate */
                c =     /* shared */
                d =     /* firstprivate */
                e =     /* private */
            } } }

Tip: default(none) is your friend if you do not see it clearly

OpenMP tasks

List traversal

Example

    void traverse_list(List l)
    {
        Element e;
        for (e = l->first; e; e = e->next)
            #pragma omp task
            process(e);
    }

e is firstprivate

Task synchronization

Task synchronization

There are two main constructs to synchronize tasks:

- barrier
  - Remember: all previous work (including tasks) must be completed
- taskwait

Task synchronization

Waiting for children

The taskwait construct

    #pragma omp taskwait

- Suspends the current task until all child tasks are completed
  - Just direct children, not descendants

Task synchronization

Taskwait

Example

    void traverse_list(List l)
    {
        Element e;
        for (e = l->first; e; e = e->next)
            #pragma omp task
            process(e);

        #pragma omp taskwait
    }

- After the taskwait: all tasks guaranteed to be completed here
- Now we need some threads to execute the tasks

Task synchronization

List traversal
Completing the picture

Example

    List l;

    #pragma omp parallel
    traverse_list(l);

- This will generate multiple traversals
- We need a way to have a single thread execute traverse_list

The single construct

Giving work to just one thread

The single construct

    #pragma omp single [clauses]
        structured block

where clauses can be:

- private
- firstprivate
- nowait (we'll see it later)
- copyprivate (not today)

Only one thread of the team executes the structured block

There is an implicit barrier at the end

The single construct

The single construct

Example

    int main(int argc, char **argv)
    {
        #pragma omp parallel
        {
            #pragma omp single
            {
                printf("Hello world!\n");
            }
        }
    }

This program outputs just one “Hello world”

The single construct

List traversal
Completing the picture

Example

    List l;

    #pragma omp parallel
    #pragma omp single
    traverse_list(l);

- One thread creates the tasks of the traversal
- All threads cooperate to execute them

Task clauses

Task scheduling

How does it work?

Tasks are tied by default

- Tied tasks are always executed by the same thread
  - Not necessarily the creator
- Tied tasks have scheduling restrictions
  - Deterministic scheduling points (creation, synchronization, ...)
    - Tasks can be suspended/resumed at these points
  - Another constraint to avoid deadlock problems
- Tied tasks may run into performance problems

Task clauses

The untied clause

A task that has been marked as untied has none of the previous scheduling restrictions:

- Can potentially switch to any thread
- Can potentially switch at any moment
- Bad mix with thread-based features
  - thread-id, critical regions, threadprivate
- Gives the runtime more flexibility to schedule tasks

Page 129: Parallel Programming with OpenMP › courses › csep524 › 13wi › omp_t… · Writing OpenMP programs Headers/Macros C/C++ only omp.hcontains the API prototypes and data types

Task clauses

The if clause

If the the expression of an if clause evaluates to falseThe encountering task is suspendedThe new task is executed immediately

with its own data environmentdifferent task with respect to synchronization

The parent task resumes when the task finishesAllows implementations to optimize task creation

For very fine grain task you may need to do your own if
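A minimal sketch of the if clause in use, assuming a recursive array sum that is not part of the original slides. The names range_sum and MIN_TASK_SIZE are illustrative only. When the range is small the expression is false, so each task runs immediately in the encountering thread instead of being deferred.

```c
#include <stddef.h>

/* Hypothetical threshold: smaller ranges are not worth a deferred task. */
#define MIN_TASK_SIZE 1024

/* Sum a[lo..hi) recursively, creating one task per half. */
long range_sum(const int *a, size_t lo, size_t hi)
{
    if (hi - lo <= 4) {                 /* small base case: plain loop */
        long s = 0;
        for (size_t k = lo; k < hi; k++)
            s += a[k];
        return s;
    }

    size_t mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    /* If the expression is false the task is still a separate task,
       but the encountering task executes it immediately. */
    #pragma omp task shared(left) if(hi - lo > MIN_TASK_SIZE)
    left = range_sum(a, lo, mid);

    #pragma omp task shared(right) if(hi - lo > MIN_TASK_SIZE)
    right = range_sum(a, mid, hi);

    /* left and right live on this stack frame, so wait before returning. */
    #pragma omp taskwait
    return left + right;
}
```

Called from inside a parallel region (typically under single), the large ranges are spread over the team while the small ones stay cheap. Compiled without OpenMP the pragmas are ignored and the function is an ordinary recursive sum.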


Common tasking problems

Outline

OpenMP tasks

Task synchronization

The single construct

Task clauses

Common tasking problems

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            solutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++) {
            state[j] = i;
            if (ok(j + 1, state)) {
                search(n, j + 1, state);
            }
        }
    }

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            solutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task
        {
            state[j] = i;
            if (ok(j + 1, state)) {
                search(n, j + 1, state);
            }
        }
    }

Data scoping
Because it's an orphaned task, all variables are firstprivate

State is not captured
Just the pointer is captured, not the pointed data

Problem #1
Incorrectly capturing pointed data

Problem #1
Incorrectly capturing pointed data

Problem
firstprivate does not capture data through pointers

Solutions
1. Capture it manually
2. Copy it to an array and capture the array with firstprivate
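Solution 2 can be sketched as follows, assuming the data fits in a fixed-size array. The function snapshot_demo and the size N are illustrative, not part of the original example. Unlike a pointer, an array listed in firstprivate is copied element by element when the task is created.

```c
#define N 4   /* hypothetical fixed size, so the data can live in an array */

/* Returns the value the child task saw in data[0]. */
int snapshot_demo(void)
{
    int data[N] = {1, 2, 3, 4};
    int seen = 0;

    /* firstprivate on an array copies all N elements into the task. */
    #pragma omp task firstprivate(data) shared(seen)
    {
        seen = data[0];     /* reads the task's private snapshot */
    }

    data[0] = 99;           /* changes the parent's array only */

    #pragma omp taskwait    /* seen is on our stack: wait before reading it */
    return seen;
}
```

The task sees the value captured at creation time (1), whatever the parent does to its own copy afterwards.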

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            solutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state);
            }
        }
    }

Caution!
Will state still be valid by the time memcpy is executed?

Problem #2
Data can go out of scope!

Problem #2
Out-of-scope data

Problem
Stack-allocated parent data can become invalid before being used by child tasks
- Only if not captured with firstprivate

Solutions
1. Use firstprivate when possible
2. Allocate it in the heap
   - Not always easy (we also need to free it)
3. Put additional synchronizations
   - May reduce the available parallelism
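Solution 2 might look like the sketch below, under the assumption that the child task is the only user of the copy and can therefore free it. The functions spawn_sum and sum_via_task are hypothetical helpers, not part of the search example.

```c
#include <stdlib.h>
#include <string.h>

/* Create a child task that works on a heap copy of src[0..n).
   The copy outlives this function's stack frame; the task frees it. */
void spawn_sum(const int *src, int n, long *result)
{
    int *copy = malloc((size_t)n * sizeof *copy);
    if (copy == NULL) {
        *result = -1;
        return;
    }
    memcpy(copy, src, (size_t)n * sizeof *copy);

    /* Only the pointers are captured; the pointed data is on the heap. */
    #pragma omp task firstprivate(copy, n, result)
    {
        long s = 0;
        for (int k = 0; k < n; k++)
            s += copy[k];
        *result = s;
        free(copy);         /* the "we also need to free it" part */
    }
}

long sum_via_task(const int *src, int n)
{
    long result = 0;
    spawn_sum(src, n, &result);
    /* result is stack data of this function, so solution 3
       (extra synchronization) is still needed for it. */
    #pragma omp taskwait
    return result;
}
```

The heap copy removes the lifetime problem for the input. The output location is still parent stack data, which is why the taskwait stays.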

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            solutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state);
            }
        }

        #pragma omp taskwait
    }

Shared variable (solutions) needs protected access

Solutions
- Use critical
- Use atomic
- Use threadprivate
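A minimal sketch of the atomic option, assuming a global counter like the one in the search example. The functions count_solution and count_many are illustrative stand-ins for the real search.

```c
static int solutions = 0;   /* shared by all threads */

/* Stand-in for the "good solution, count it" branch of search(). */
void count_solution(void)
{
    /* atomic makes the read-modify-write of the shared counter safe */
    #pragma omp atomic
    solutions++;
}

/* Create n tasks that each count one solution. */
int count_many(int n)
{
    solutions = 0;
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int k = 0; k < n; k++) {
                #pragma omp task
                count_solution();
            }
        }
        /* implicit barrier at the end of the parallel region:
           all tasks have finished when it is passed */
    }
    return solutions;
}
```

critical would work the same way but is heavier. atomic is enough here because the protected operation is a single update of a scalar.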

Reductions for tasks

Example

    int solutions = 0;
    int mysolutions = 0;
    #pragma omp threadprivate(mysolutions)

    void start_search()
    {
        #pragma omp parallel
        {
            #pragma omp single
            {
                bool initial_state[n];
                search(n, 0, initial_state);
            }
            #pragma omp atomic
            solutions += mysolutions;
        }
    }

- Use a separate counter for each thread
- Accumulate them at the end

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            mysolutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state);
            }
        }

        #pragma omp taskwait
    }

Pruning mechanism potentially introduces imbalance in the tree

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            mysolutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task untied
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state);
            }
        }

        #pragma omp taskwait
    }

Untied clause
Allows the implementation to load balance more easily

Because of untied, this is not safe! (mysolutions++ updates a threadprivate variable)

Pitfall #3
Unsafe use of untied tasks

Problem
Because tasks can migrate between threads at any point, thread-centric constructs can yield unexpected results

Remember
When using untied tasks avoid:
- Threadprivate variables
- Any thread-id uses
And be very careful with:
- Critical regions (and locks)

Simple solution
Create a tied task region with #pragma omp task if(0)

Search problem

Example

    void search(int n, int j, bool *state)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            #pragma omp task if(0)
            mysolutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task untied
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state);
            }
        }

        #pragma omp taskwait
    }

Now this statement is tied and safe

Task granularity

- Granularity is a key performance factor
- Tasks tend to be fine-grained
- Try to "group" tasks together
  - Use if clause or manual transformations

Using the if clause

Example

    void search(int n, int j, bool *state, int depth)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            #pragma omp task if(0)
            mysolutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task untied if(depth < MAX_DEPTH)
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                search(n, j + 1, new_state, depth + 1);
            }
        }
        #pragma omp taskwait
    }

Using an if statement

Example

    void search(int n, int j, bool *state, int depth)
    {
        int i, res;

        if (n == j) {
            /* good solution, count it */
            #pragma omp task if(0)
            mysolutions++;
            return;
        }

        /* try each possible solution */
        for (i = 0; i < n; i++)
        #pragma omp task untied
        {
            bool *new_state = alloca(sizeof(bool) * n);
            memcpy(new_state, state, sizeof(bool) * n);
            new_state[j] = i;
            if (ok(j + 1, new_state)) {
                if (depth < MAX_DEPTH)
                    search(n, j + 1, new_state, depth + 1);
                else
                    search_serial(n, j + 1, new_state);
            }
        }
        #pragma omp taskwait
    }


Part V

Hands-on (II)


Outline

List traversal

Computing Pi

Finding Fibonacci

Before you start

Copy the exercises to your directory:

    $ cp -a ~aduran/Prace_OpenMP_Handson_1/tasking .

Enter the tasking directory to do the following exercises.


List traversal

Outline

List traversal

Computing Pi

Finding Fibonacci

List traversal

Examine the code
Take a look at the list.cc file, which implements a parallel list traversal with OpenMP.

1. What should be the output of executing this program?
2. Run it with one thread:

    $ ./list

3. Do you get the expected result?
4. Run it with two threads:

    $ OMP_NUM_THREADS=2 ./list

5. Does it work?

List traversal

Fix it
Fix the list traversal so it gets the correct result with two threads (or more). Use the following questions as a guide to help you:

1. How many tasks are being generated?
2. What is the data scoping in each construct?
3. Are memory accesses properly synchronized?


Computing Pi

Outline

List traversal

Computing Pi

Finding Fibonacci

Computing Pi

Our algorithm
We will use an algorithm that computes the number pi through numerical integration.

- Take a look at the pi.c file
- Because iterations are independent we will create one task per iteration

When you run make it will generate two programs: pi.serial and pi.omp. We will use the serial version to evaluate our parallel version.

Computing Pi

Measuring time
To get reliable execution times we will use the Altix batch system. Use the following command to launch your executions:

    $ make run-$program-$threads

- It sets up OMP_NUM_THREADS for you
- It will generate an output file in your directory when it finishes
- You can check your status with mnq

Run both versions with one thread

    $ make run-pi.ser-1

    $ make run-pi.omp-1

When they finish compare the results. Now run it with 2 threads. What do you observe? How is this possible?

Computing Pi

Problems
Our version of pi has two main problems:

- Tasks are too fine-grained. The overheads associated with creating a task cannot be overcome.
- There is too much synchronization. Hidden synchronization and communications are a common source of performance problems.

Computing Pi

Increase the granularity
1. Modify the pi program so that each task executes a chunk of N iterations
2. Experiment with different values of N and see how the execution time changes
   - Which would be the optimal value for N?

Computing Pi

Reduce the number of synchronizations
1. Modify the pi program so that it uses an atomic construct instead of critical
   - Does the execution time improve?
2. We can improve it further by reducing the number of atomic accesses
   - Use a private variable and only do one atomic update at the end of the task
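One possible shape for the result of these steps, as a sketch only: pi.c is not reproduced in these slides, so compute_pi, num_steps and chunk are assumed names, and the integrand 4/(1+x²) on [0,1] is the usual choice for this exercise rather than a quote from the file.

```c
/* Midpoint-rule integration of 4/(1+x^2) on [0,1], one task per chunk. */
double compute_pi(long num_steps, long chunk)
{
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (long start = 0; start < num_steps; start += chunk) {
                long end = start + chunk < num_steps ? start + chunk
                                                     : num_steps;

                #pragma omp task firstprivate(start, end) shared(sum)
                {
                    double local = 0.0;     /* private partial sum */
                    for (long i = start; i < end; i++) {
                        double x = ((double)i + 0.5) * step;
                        local += 4.0 / (1.0 + x * x);
                    }
                    /* a single atomic update per task */
                    #pragma omp atomic
                    sum += local;
                }
            }
        }
    }
    return sum * step;
}
```

With chunk equal to 1 this degenerates to one task per iteration again, which makes it easy to measure the effect of the chunk size.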

Computing Pi

Final numbers
1. Run our improved version up to 8 threads.
   - Does it scale?
   - How does it compare to the serial version?
2. Now increase the total number of iterations by 10 and run it again.
   - How does it behave now?

Computing Pi

Some conclusions
- It's difficult to go further than this with tasks
  - Task parallelism is very flexible but we need to overcome the overheads
- Beware hidden communication and synchronizations
- OpenMP parallelization is an incremental process
  - As with every other paradigm, sometimes we need effort to obtain optimal performance
- We'll see later how to further improve our pi program


Finding Fibonacci

Outline

List traversal

Computing Pi

Finding Fibonacci

Fibonacci

The algorithm
We used a recursive implementation to find the Fibonacci number in the fib.c file.

- It's very inefficient
- But useful for educational purposes :-)

To compile it use:

    $ make fib

To submit jobs use:

    $ make run-fib-threads

Fibonacci

First
- Complete the code so all the branches are computed in parallel
  - Use the serial version to check you have the correct result
- Add code to measure the time it takes to compute the number
  - To be more precise put the code inside the single region

Fibonacci

Evaluate
1. Run the code from 1 to 8 threads.
2. Compare it to the time of the serial version
3. What do you observe?

Fibonacci

Increasing granularity
As in the pi program, Fibonacci, because of its recursive nature, ends up generating tasks that are too fine-grained.

1. Modify the program so it does not generate tasks at all when n is too small (e.g. 20)
2. Run this improved version again up to 8 threads
3. How does it compare with respect to the serial version?
4. Try changing the cut-off value from 20 and see how it affects performance
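A sketch of what the cut-off version could look like. fib.c is not shown in these slides, so the function names and the CUTOFF macro are assumptions, not the exercise's actual code.

```c
#define CUTOFF 20   /* hypothetical cut-off; below it no tasks are created */

/* Plain recursive version, also used below the cut-off. */
long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

/* Task version: one task per branch while n is large enough. */
long fib_task(int n)
{
    long x, y;

    if (n < CUTOFF)
        return fib_serial(n);       /* no task creation at all */

    #pragma omp task shared(x)
    x = fib_task(n - 1);

    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* x and y are on this stack frame: wait for both children */
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long r = 0;
    #pragma omp parallel
    {
        #pragma omp single
        r = fib_task(n);
    }
    return r;
}
```

Changing CUTOFF trades the number of tasks against the amount of work each one carries, which is what step 4 asks you to explore.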


Part VI

Data Parallelism in OpenMP


Outline

The worksharing concept

Loop worksharing


The worksharing concept

Outline

The worksharing concept

Loop worksharing

Worksharings

Worksharing constructs divide the execution of a code region among the threads of a team

- Threads cooperate to do some work
- Better way to split work than using thread-ids
- Lower overhead than using tasks
  - But, less flexible

In OpenMP, there are four worksharing constructs:
- single
- loop worksharing
- sections
- workshare

We'll see them later

Restriction: worksharings cannot be nested


Loop worksharing

Outline

The worksharing concept

Loop worksharing

Loop parallelism

The for construct

    #pragma omp for [clauses]
    for (init-expr; test-expr; inc-expr)

where clauses can be:
- private
- firstprivate
- lastprivate(variable-list)
- reduction(operator:variable-list)
- schedule(schedule-kind)
- nowait
- collapse(n)
- ordered

We'll see it later
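As an illustration of two of these clauses together, here is a minimal sketch of a dot product. The function dot and its parameters are illustrative names, not taken from the course material.

```c
/* Dot product of a[0..n) and b[0..n). */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;

    /* reduction(+:sum): each thread accumulates into a private copy of
       sum, and the copies are added together at the end of the loop.
       schedule(static): iterations are split into fixed blocks. */
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Without the reduction clause the update of sum would be a race, and protecting it with atomic or critical would serialize the loop.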

The for construct

How does it work?
The iterations of the loop(s) associated with the construct are divided among the threads of the team.

- Loop iterations must be independent
- Loops must follow a form that allows the number of iterations to be computed
- Valid data types for induction variables are: integer types, pointers and random access iterators (in C++)
  - The induction variable(s) are automatically privatized
- The default data-sharing attribute is shared

It can be merged with the parallel construct:

    #pragma omp parallel for

The for construct

Example

    void foo(int *m, int N, int M)
    {
        int i, j;
        #pragma omp parallel for private(j)
        for (i = 0; i < N; i++)
            for (j = 0; j < M; j++)
                m[i * M + j] = 0;
    }

- Newly created threads cooperate to execute all the iterations of the loop
- The i variable is automatically privatized
- The j variable must be explicitly privatized

Alex Duran (BSC) Advanced Programming with OpenMP February 2, 2013 132 / 217

Page 184: Parallel Programming with OpenMP › courses › csep524 › 13wi › omp_t… · Writing OpenMP programs Headers/Macros C/C++ only omp.hcontains the API prototypes and data types

Loop worksharing

The for construct

Example

    #include <vector>

    void foo(std::vector<int> &v)
    {
        #pragma omp parallel for
        for (std::vector<int>::iterator it = v.begin();
             it < v.end();
             it++)
            *it = 0;
    }

Random access iterators (and pointers) are valid types
!= cannot be used in the test expression (in the OpenMP versions covered here; OpenMP 5.0 later allowed it)
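The same idea can be sketched in C with a pointer as the induction variable. The name zero_range is hypothetical; the test expression uses <, as in the iterator example.

```c
/* Zero n ints using a pointer as the loop induction variable. */
void zero_range(int *v, int n)
{
    int *end = v + n;
    #pragma omp parallel for
    for (int *p = v; p < end; p++)   /* p is privatized like any induction variable */
        *p = 0;
}
```

The number of iterations is still computable up front (end - v), which is what the loop form requires.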


Loop worksharing

Removing dependences

Example

    x = 0;
    for (i = 0; i < n; i++)
    {
        v[i] = x;
        x += dx;
    }

In each iteration, x depends on the value from the previous one. The loop can't be parallelized as written


Loop worksharing

Removing dependences

Example

    x = 0;
    for (i = 0; i < n; i++)
    {
        x = i * dx;
        v[i] = x;
    }

But x can be rewritten in terms of i. Now it can be parallelized


Loop worksharing

Removing dependences

Example

    x = 0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++)
    {
        x = i * dx;
        v[i] = x;
    }
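A minimal sketch that puts both versions side by side so they can be compared. The function names fill_seq and fill_par are assumptions. For a general dx the two may differ in the last bits, because repeated addition and a single multiplication round differently.

```c
/* Sequential version: x carries a dependence from one iteration to the next. */
void fill_seq(double *v, int n, double dx)
{
    double x = 0;
    for (int i = 0; i < n; i++) {
        v[i] = x;
        x += dx;
    }
}

/* Parallel version: x is recomputed from i, so iterations are independent. */
void fill_par(double *v, int n, double dx)
{
    double x;
    #pragma omp parallel for private(x)
    for (int i = 0; i < n; i++) {
        x = i * dx;
        v[i] = x;
    }
}
```

With an exactly representable step such as dx = 0.5, both functions fill v with identical values.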


Loop worksharing

The lastprivate clause

When a variable is declared lastprivate, a private copy is generated for each thread. Then the value of the variable in the sequentially last iteration of the loop is copied back to the original variable.

A variable can be both firstprivate and lastprivate


Loop worksharing

The lastprivate clause

Example

    int i;
    #pragma omp for lastprivate(i)
    for (i = 0; i < 100; i++)
        v[i] = 0;

    printf("i=%d\n", i);

Prints i=100
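lastprivate is not limited to the induction variable. A small sketch, assuming a hypothetical function last_square, where an ordinary variable keeps the value from the sequentially last iteration:

```c
/* Returns the value computed in the sequentially last iteration (i == n - 1). */
int last_square(int n)
{
    int sq = 0;
    #pragma omp parallel for lastprivate(sq)
    for (int i = 0; i < n; i++)
        sq = i * i;   /* each thread writes its own private sq */
    return sq;        /* holds the value from iteration n - 1 */
}
```

Whichever thread happens to run iteration n - 1, its private value is the one copied back.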


Loop worksharing

The reduction clause

A very common pattern is where all threads accumulate some values into a shared variable

E.g., n += v[i], our pi program, ...
Using critical or atomic is not good enough
Besides being error prone and cumbersome

Instead we can use the reduction clause for basic types.
Valid operators for C/C++: +, -, *, |, ||, &, &&, ^
Valid operators for Fortran: +, -, *, .and., .or., .eqv., .neqv., max, min
Fortran also supports reductions of arrays

The compiler creates a private copy that is properly initialized
At the end of the region, the compiler ensures that the shared variable is properly (and safely) updated.

We can also specify reduction variables in the parallel construct.


Loop worksharing

The reduction clause

Example

    int vector_sum(int n, int v[n])
    {
        int i, sum = 0;
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += v[i];
        return sum;
    }

At the start of the loop, each thread's private copy of sum is initialized to the identity value (0 for +)

At the end of the loop, the shared variable is updated with the partial values of each thread
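To make the two steps concrete, here is a hand-written sketch of what reduction(+:sum) amounts to. It is an illustration under assumptions, not the code a compiler actually generates, and the name vector_sum_manual is hypothetical.

```c
/* Hand-written equivalent of reduction(+:sum); a sketch only. */
int vector_sum_manual(int n, const int *v)
{
    int sum = 0;
    #pragma omp parallel
    {
        int partial = 0;              /* private copy, set to the identity of + */
        #pragma omp for
        for (int i = 0; i < n; i++)
            partial += v[i];
        #pragma omp critical
        sum += partial;               /* one synchronized update per thread */
    }
    return sum;
}
```

There is one critical section per thread rather than one per iteration, which is why this pattern scales better than protecting every sum += v[i].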


Loop worksharing

Also in parallel

Example

    int nt = 0;

    #pragma omp parallel reduction(+:nt)
    nt++;

    printf("%d\n", nt);

reduction is available in parallel as well

Prints the number of threads


Loop worksharing

The schedule clause

The schedule clause determines which iterations are executed by each thread.

If no schedule clause is present, the schedule is implementation defined
There are several possible options as schedule:

STATIC

STATIC,chunk

DYNAMIC[,chunk]

GUIDED[,chunk]

AUTO

RUNTIME


Loop worksharing

The schedule clause

Static schedule
The iteration space is broken into chunks of approximately size N/num_threads, where N is the number of iterations. Then these chunks are assigned to the threads in a round-robin fashion.

Static,N schedule (Interleaved)
The iteration space is broken into chunks of size N. Then these chunks are assigned to the threads in a round-robin fashion.

Characteristics of static schedules
Low overhead
Good locality (usually)
Can have load imbalance problems
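The round-robin assignment of schedule(static, chunk) can be written down as a formula. This sketch assumes a loop whose iterations are numbered from 0; static_owner is a hypothetical helper, not an OpenMP routine.

```c
/* Thread that runs iteration i under schedule(static, chunk) with nthreads threads. */
int static_owner(int i, int chunk, int nthreads)
{
    return (i / chunk) % nthreads;   /* chunk index, dealt out round-robin */
}
```

With chunk = 2 and 3 threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, and 6-7 back to thread 0.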


Loop worksharing

The schedule clause

Dynamic,N schedule
Threads dynamically grab chunks of N iterations until all iterations have been executed. If no chunk is specified, N = 1.

Guided,N schedule
Variant of dynamic. The size of the chunks decreases as the threads grab iterations, but it is at least of size N. If no chunk is specified, N = 1.

Characteristics of dynamic schedules
Higher overhead
Not very good locality (usually)
Can solve imbalance problems
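How the guided chunk sizes shrink can be illustrated with a simple model: each chunk is the number of remaining iterations divided by the number of threads, never below the minimum. Real implementations are free to use a different proportionality, so treat this as an assumption; guided_chunks is a hypothetical helper.

```c
/* Fills sizes[] with successive chunk sizes under a simple model of
   schedule(guided, min_chunk); returns the number of chunks written. */
int guided_chunks(int iterations, int nthreads, int min_chunk,
                  int *sizes, int max_chunks)
{
    int count = 0;
    int remaining = iterations;
    while (remaining > 0 && count < max_chunks) {
        int size = (remaining + nthreads - 1) / nthreads;  /* ceil(remaining / nthreads) */
        if (size < min_chunk)
            size = min_chunk;
        if (size > remaining)
            size = remaining;    /* only the last chunk may be below min_chunk */
        sizes[count++] = size;
        remaining -= size;
    }
    return count;
}
```

Under this model, 100 iterations on 4 threads with a minimum of 5 start with a chunk of 25 and end with small chunks, which is what balances the tail of the loop.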


Loop worksharing

The schedule clause

Auto schedule
In this case, the implementation is allowed to do whatever it wishes.

Do not expect much of it as of now

Runtime schedule
The decision is delayed until the program is run, through the run-sched-var ICV. It can be set with:

The OMP_SCHEDULE environment variable
The omp_set_schedule() API call
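A minimal sketch of a loop that defers its schedule to run time. The function triangular_work and its workload are assumptions; the _OPENMP guard only keeps the sketch buildable when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Triangular workload: iteration i performs i additions. */
long triangular_work(int n)
{
    long total = 0;
#ifdef _OPENMP
    /* Sets the runtime schedule from code; exporting
       OMP_SCHEDULE="dynamic,4" before the run is the shell equivalent. */
    omp_set_schedule(omp_sched_dynamic, 4);
#endif
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            total += j;
    return total;
}
```

Dropping the omp_set_schedule call lets the same binary be tried with static, dynamic and guided just by changing OMP_SCHEDULE.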


Loop worksharing

False sharing

When a thread writes to a cache location and another thread reads the same location, the coherence protocol will copy the data from one cache to the other. This is called true sharing
But this communication can happen even if two threads are not working on the same memory address, as long as both addresses fall in the same cache line. This is false sharing

[Figure: Cpu1 and Cpu2 access x and y, which sit in the same cache line; each write triggers invalidations]
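One common remedy is to pad data so that values updated by different threads never share a cache line. The sketch below assumes a 64-byte line (check the target hardware); padded_counter and count_events are hypothetical names.

```c
#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define NSLOTS 8

/* Consecutive counters are CACHE_LINE bytes apart thanks to the padding. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Each slot is updated only by the thread that owns iteration t. */
void count_events(struct padded_counter slots[NSLOTS], long events)
{
    #pragma omp parallel for
    for (int t = 0; t < NSLOTS; t++) {
        slots[t].value = 0;
        for (long e = 0; e < events; e++)
            slots[t].value++;   /* without pad, neighbouring slots would share a line */
    }
}
```

The trade-off is memory: each counter now occupies a full line instead of a few bytes.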


Loop worksharing

Scheduling

Example

    int v[N];

    #pragma omp for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < i; j++)
            v[i] += j;

The i loop is quite unbalanced
Dynamic schedule?
Lots of false sharing!
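One possible compromise, sketched under assumptions (4-byte int, 64-byte cache lines, hypothetical name prefix_counts): use dynamic scheduling with a chunk that covers a whole cache line of v, and accumulate in a private variable so the shared array is written once per iteration.

```c
/* Adds 0 + 1 + ... + (i - 1) to v[i]; iteration cost grows with i. */
void prefix_counts(int *v, int n)
{
    /* 16 ints = 64 bytes under the stated assumptions, so neighbouring
       elements of v are mostly written by the same thread. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++) {
        int acc = 0;              /* accumulate privately... */
        for (int j = 0; j < i; j++)
            acc += j;
        v[i] += acc;              /* ...and touch the shared array once */
    }
}
```

Whether 16 is the right chunk size depends on N and on the machine; it is a starting point for experiments, not a recommendation.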


Loop worksharing

The nowait clause

When a worksharing construct has a nowait clause, the implicit barrier at the end of the construct is removed.

This allows overlapping the execution of non-dependent loops/tasks/worksharings


Loop worksharing

The nowait clause

Example

    #pragma omp for nowait
    for (i = 0; i < n; i++)
        v[i] = 0;
    #pragma omp for
    for (i = 0; i < n; i++)
        a[i] = 0;

First and second loop are independent, so we can overlap them


On a side note, you would be better off fusing the loops in this case
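A sketch of the fused version, assuming both arrays have n elements; the name clear_both is hypothetical.

```c
/* One worksharing loop instead of two: a single pass and a single barrier. */
void clear_both(int *v, int *a, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        v[i] = 0;
        a[i] = 0;
    }
}
```

Fusing removes the need for nowait altogether, since there is only one loop left to schedule.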


Loop worksharing

The nowait clause

Example

    #pragma omp for nowait
    for (i = 0; i < n; i++)
        v[i] = 0;
    #pragma omp for
    for (i = 0; i < n; i++)
        a[i] = v[i] * v[i];

First and second loop are dependent! With nowait there is no guarantee that the iteration of the first loop that wrote v[i] has finished


Loop worksharing

The nowait clause

Exception: static schedules
nowait is safe if the two (or more) loops have the same static schedule and all have the same number of iterations, because each thread then executes the same iterations in every loop.

Example

    #pragma omp for schedule(static, 2) nowait
    for (i = 0; i < n; i++)
        v[i] = 0;
    #pragma omp for schedule(static, 2)
    for (i = 0; i < n; i++)
        a[i] = v[i] * v[i];


Loop worksharing

The collapse clause

Allows distributing work from a set of n nested loops.
Loops must be perfectly nested
The nest must traverse a rectangular iteration space

Example

    #pragma omp for collapse(2)
    for (i = 0; i < N; i++)
        for (j = 0; j < M; j++)
            foo(i, j);

The i and j loops are folded and the iterations are distributed among all threads. Both i and j are privatized
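What "folding" means can be sketched by collapsing the nest by hand: one loop over N * M iterations, with the two indices recovered from the flat index. The function fill_collapsed and the value stored are assumptions standing in for foo(i, j).

```c
/* Hand-collapsed version of a rectangular N x M nest; out is row-major (assumed). */
void fill_collapsed(int *out, int N, int M)
{
    #pragma omp parallel for
    for (int k = 0; k < N * M; k++) {
        int i = k / M;          /* recover the outer index */
        int j = k % M;          /* recover the inner index */
        out[k] = 10 * i + j;    /* stands in for foo(i, j) */
    }
}
```

This index arithmetic only works because the iteration space is rectangular, which is why the clause has that requirement.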


Break

Coffee time! :-)


Part VII

Hands-on (III)


Outline

Matrix Multiply

Computing Pi (revisited)

Mandelbrot


Before you start

Copy the exercises to your directory:

$ cp -a ~aduran/Prace_OpenMP_Handson_2/worksharing .

Enter the worksharing directory to do the following exercises.


Matrix Multiply

Matrix Multiply

Parallel loops
The file matmul implements a sequential matrix multiply.

1. Use OpenMP worksharings to parallelize the application.
Check the init_mat and matmul functions
2. Run it up to 8 threads to check the scalability


Remember: To submit it use make run-matmul.omp-$threads


Matrix Multiply

Matrix Multiply

Memory matters!
To optimize accesses to the cache in this kind of algorithm, it is a common practice to "logically" split the matrix into blocks of size BxB, and do the computation block by block instead of going through the whole matrix at once.

1. Implement such a blocking scheme for our matrix multiply
2. Experiment with different sizes of B
3. Run it up to 8 threads and compare the results with the previous version


Tip: You need three additional inner loops


Computing Pi (revisited)

Computing Pi

Using data parallelism
1. Complete the implementation of our pi algorithm using data parallelism
2. Execute with 1 and 2 threads.

Does it scale?
How does it compare to our previous implementation with tasks?
What is the problem?


Computing Pi (revisited)

Computing Pi

Problem
The number of synchronizations is still too high for this program to scale.

Using reduction

1. Change the program to make use of the reduction clause
2. Run it up to 8 threads
3. How does it compare to the previous version?


Mandelbrot

Mandelbrot

More data parallelism
We will now parallelize an algorithm that generates sections of the Mandelbrot set.

1. Edit file mandel.c and complete the parallelization in function mandel

Note that there is a dependence on the variable x


Mandelbrot

Mandelbrot

Uncover load imbalance
We can see that each point in the final output is computed through the mandel_point function. If we check the code of that function we can see that the number of iterations it takes will be different from one point to another.
We want to know how many iterations (this also happens to be the result of mandel_point) each thread does.

1. Add a private counter to each thread
2. Add to this counter the result of each mandel_point call by that thread
3. Output the count for each thread at the end of the parallel region
4. What do you observe?


Mandelbrot

Mandelbrot

Playing with schedules

To overcome the observed load imbalance we can use a different loop schedule.

Use the clause schedule(runtime) so the schedule is not fixed at compile time
Now run different experiments with different schedules and number of threads

Try at least static, dynamic and guided

Which one obtains the best result?


Tip: Change OMP_SCHEDULE before doing make run-...


Part VIII

Other OpenMP Topics


Outline

The master construct

Other synchronization mechanisms

Nested parallelism

Other worksharings

Other environment variables and API calls


The master construct

Only the master thread

The master construct

    #pragma omp master
        structured block

The structured block is only executed by the master thread
Useful when we always want the same thread to execute something

No implicit barrier at the end
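Because master has no implicit barrier, other threads can run ahead of it. A minimal sketch, with the hypothetical name publish_and_read, where an explicit barrier makes the master's write visible before anyone reads it:

```c
/* Returns how many threads observed the value published by the master thread. */
int publish_and_read(int value)
{
    int shared_value = 0;
    int readers = 0;
    #pragma omp parallel reduction(+:readers)
    {
        #pragma omp master
        shared_value = value;    /* only the master thread writes */

        #pragma omp barrier      /* master has no implicit barrier, so add one */

        if (shared_value == value)
            readers++;
    }
    return readers;
}
```

Using single instead of master would give the barrier for free, at the cost of not knowing which thread does the write.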


The master construct

Master construct

Example

    #include <stdio.h>
    #include <omp.h>

    void foo()
    {
        #pragma omp parallel
        {
            #pragma omp single
            printf("I am %d\n", omp_get_thread_num());

            #pragma omp master
            printf("I am %d\n", omp_get_thread_num());
        }
    }

single: can be any thread

master: it's always thread 0


Other synchronization mechanisms

Ordering

The ordered construct

    #pragma omp ordered
        structured block

Must appear in the dynamic extent of a loop worksharing
The worksharing must also have the ordered clause

The structured block is executed in the iteration’s sequential order
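A small sketch of the construct in use, assuming a hypothetical function ordered_squares: the computation may run in any order, but results are appended in iteration order.

```c
/* Squares are computed in parallel but appended to out in iteration order. */
int ordered_squares(int *out, int n)
{
    int next = 0;    /* next free position in out, shared by all threads */
    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < n; i++) {
        int sq = i * i;          /* this part may run out of order */
        #pragma omp ordered
        out[next++] = sq;        /* this part runs as in the sequential loop */
    }
    return next;
}
```

Only the code outside the ordered block runs concurrently, so the construct pays off when that part dominates the iteration.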


Other synchronization mechanisms

Locks

OpenMP provides lock primitives for low-level synchronization
omp_init_lock     Initialize the lock
omp_set_lock      Acquires the lock
omp_unset_lock    Releases the lock
omp_test_lock     Tries to acquire the lock (won't block)
omp_destroy_lock  Frees lock resources

OpenMP also provides nested locks where the thread owning the lock can reacquire the lock without blocking.
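A sketch of where a nested lock helps: a function that takes the lock calls another function that takes the same lock. The names add_one, add_two and count_pairs are hypothetical, and the serial stand-ins in the #else branch are an assumption added only so the sketch builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) for builds without OpenMP. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { ++*l; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { --*l; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

static omp_nest_lock_t lock;
static int total;

/* Adds one item; can be called with or without the lock already held. */
void add_one(void)
{
    omp_set_nest_lock(&lock);
    total++;
    omp_unset_nest_lock(&lock);
}

/* Adds two items under one lock; add_one re-acquires it without blocking. */
void add_two(void)
{
    omp_set_nest_lock(&lock);    /* first acquisition */
    add_one();                   /* same thread re-acquires: no deadlock */
    add_one();
    omp_unset_nest_lock(&lock);
}

int count_pairs(int n)
{
    total = 0;
    omp_init_nest_lock(&lock);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        add_two();
    omp_destroy_nest_lock(&lock);
    return total;
}
```

With a plain omp_lock_t, the call to add_one inside add_two would block forever on a lock the thread already owns.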


Other synchronization mechanisms

Locks

Example

    #include <omp.h>
    void foo()
    {
        omp_lock_t lock;

        omp_init_lock(&lock);
        #pragma omp parallel
        {
            omp_set_lock(&lock);
            // mutual exclusion region
            omp_unset_lock(&lock);
        }
        omp_destroy_lock(&lock);
    }


Lock must be initialized before being used


Only one thread at a time executes the region between omp_set_lock and omp_unset_lock


Other synchronization mechanisms

Locks

Example

# inc lude <omp . h>

omp_lock_t l ock ;

void foo ( ){

omp_set_lock(& lock ) ;}

void bar ( ){

omp_unset_lock(& lock ) ;}

Locks are unstructured: a lock can be set in one function and unset in another.


Nested parallelism

OpenMP parallel constructs can be dynamically nested. This creates a hierarchy of teams that is called nested parallelism.

- Useful when a single level of parallelism does not expose enough parallelism
- More difficult to understand and manage
- Implementations are not required to support it


Controlling nested parallelism

Related Internal Control Variables

The ICV nest-var controls whether nested parallelism is enabled or not.
- Set with the OMP_NESTED environment variable
- Set with the omp_set_nested API call
- The current value can be retrieved with omp_get_nested

The ICV max-active-levels-var controls the maximum number of nested active parallel regions.
- Set with the OMP_MAX_ACTIVE_LEVELS environment variable
- Set with the omp_set_max_active_levels API call
- The current value can be retrieved with omp_get_max_active_levels


Nested parallelism info API

To obtain information about nested parallelism:

- How many nested parallel regions enclose this point?
  omp_get_level()
- How many of them are active (with 2 or more threads)?
  omp_get_active_level()
- Which thread id did my ancestor have at a given level?
  omp_get_ancestor_thread_num(level)
- How many threads are there in the team at a given level?
  omp_get_team_size(level)


Other worksharings

Static tasks

The sections construct

#pragma omp sections [clauses]
{
    #pragma omp section
        structured block
    ...
}

- The different sections are distributed among the threads
- There is an implicit barrier at the end
- Clauses can be:
  private
  lastprivate
  firstprivate
  reduction
  nowait


Sections

Example

#pragma omp parallel sections num_threads(3)
{
    #pragma omp section
        read(data);
    #pragma omp section
        #pragma omp parallel
            work(data);
    #pragma omp section
        write(data);
}

parallel sections is a combined construct.

The parallel construct inside the second section creates a nested parallel region.


The three sections are distributed among the threads of the team.
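A self-contained sketch of the combined construct, assuming a hypothetical function run_sections whose section bodies are placeholders rather than the read/work/write calls of the slide. Each section writes a different array element, so no synchronization is needed inside the construct.

```c
/* Hypothetical example: three independent sections, each filling
   one slot of the caller's array. */
int run_sections(int out[3])
{
    #pragma omp parallel sections num_threads(3)
    {
        #pragma omp section
        out[0] = 10;
        #pragma omp section
        out[1] = 20;
        #pragma omp section
        out[2] = 30;
    }   /* implicit barrier: all three sections have finished here */
    return out[0] + out[1] + out[2];
}
```

Which thread runs which section is up to the runtime; with fewer than three threads, some thread simply executes more than one section.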


Supporting array syntax

The workshare construct

!$OMP WORKSHARE
    array syntax
!$OMP END WORKSHARE [NOWAIT]

- Only for Fortran
- The array operation is distributed among threads

Example

!$OMP WORKSHARE
    A(1:M) = A(1:M) * B(1:M)
!$OMP END WORKSHARE NOWAIT
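Since workshare exists only for Fortran, a C program would have to spell out the loop. The following is a sketch of one possible C counterpart of the array assignment above, using a parallel loop; the function name multiply_in_place is hypothetical.

```c
/* Element-wise a[i] = a[i] * b[i], a C counterpart of the Fortran
   array assignment A(1:M) = A(1:M) * B(1:M). */
void multiply_in_place(double *a, const double *b, int m)
{
    #pragma omp parallel for
    for (int i = 0; i < m; i++)
        a[i] *= b[i];
}
```

Unlike the Fortran example, this combined construct cannot take nowait; the parallel region always ends with a barrier.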


Other environment variables and API calls

Other Environment variables

OMP_STACKSIZE      Controls the stack size of created threads
OMP_WAIT_POLICY    Controls the behaviour of idle threads
OMP_THREAD_LIMIT   Limits the number of threads that can be created
OMP_DYNAMIC        Turns dynamic adjustment of the number of threads on or off


Other API calls

omp_in_parallel        Returns true if inside a parallel region
omp_get_wtick          Returns the precision of the omp_get_wtime clock
omp_get_thread_limit   Returns the limit of threads
omp_set_dynamic        Turns dynamic adjustment of the number of threads on or off
omp_get_dynamic        Returns the current value of dynamic adjusting
omp_get_schedule       Returns the current loop schedule


Part IX

Hands-on (IV)


Before you start

Copy the exercises to your directory:

$ cp -a ~aduran/Prace_OpenMP_Handson_2/other .

Enter the other directory to do the following exercises.


Nested parallelism

First take

1. Edit the file nested.c and try to understand what it does
2. Run make
3. Execute the program nested with different numbers of threads
   How many messages are printed? Does it match your expectations?
4. Run the program again, this time defining the OMP_NESTED variable. E.g.:
   $ OMP_NUM_THREADS=2 OMP_NESTED=true ./nested
5. What is the difference? Why?


Shaping the tree

1. Now, change the code so that the nested level creates only as many threads as the parent's thread id + 1
   - Thread 0 creates a nested parallel region of 1 thread
   - Thread 1 creates a nested parallel region of 2 threads
   - ...

Tip: Use either omp_set_num_threads or num_threads


Locks

Exclusive access

1. Edit the file lock.c and take a look at the code
2. Parallelize the first two loops of the application
3. Now run it several times with different numbers of threads
4. We see that the result differs because of improper synchronization
5. Use critical to fix it
   What problem do we have?


Locks to the rescue

1. Use locks to implement a fine-grained locking scheme
2. Assign a lock to each position of the array a
3. Then use it to lock only that position in the main loop
   Does it work better?
4. Now compare it to an implementation using atomic


Part X

OpenMP in the future


Outline

How OpenMP evolves

OpenMP 3.1

OpenMP 4.0

OpenMP is Open


How OpenMP evolves

The OpenMP Language Committee

- Body that prepares new versions of the standard for the ARB
- Composed of representatives of all ARB members
- Led by Bronis de Supinski from LLNL
- Integrates the information from the different subcommittees
- Currently working on OpenMP 3.1


The OpenMP Subcommittees

When a topic is deemed important or too complex, a separate group is usually formed (usually with a subset of the same people). Currently, the following subcommittees exist:

1. Error model subcommittee
   In charge of defining an error model for OpenMP
2. Tasking subcommittee
   In charge of defining new extensions to the tasking model
3. Affinity subcommittee
   In charge of breaking the flat memory model
4. Accelerators subcommittee
   In charge of integrating accelerator computing into OpenMP
5. Interoperability and Composability subcommittee


What can we expect in the future?

Disclaimer
- These are my subjective impressions.
- All these dates and topics are my guesses.
- They might or might not happen.

Tentative Timeline
November 2010   3.1 Public comment version
May 2011        3.1 Final version
June 2012       4.0 Public comment version
November 2012   4.0 Final version


OpenMP 3.1

Clarifications

- Several clarifications to different parts of the specification
- Nothing exciting, but it needs to be done


Atomic extensions

Extensions to the atomic construct to allow:

- atomic writes

    #pragma omp atomic
    x = value;

- capturing the value before/after the atomic update

    #pragma omp atomic
    v = x, x--;
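For reference, OpenMP 3.1 as published spells these extensions with explicit write and capture clauses. The sketch below uses that final syntax; the function take_tickets and its parameters are hypothetical and only serve to make the example checkable.

```c
/* Hypothetical helper: n loop iterations each take a unique ticket
   from a shared counter. seen[] must hold n zero-initialized ints. */
int take_tickets(int n, int *seen, int *last)
{
    int next = 0;
    *last = -1;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int mine;

        #pragma omp atomic capture
        mine = next++;        /* read the old value and increment, atomically */

        seen[mine] = 1;       /* every ticket is handed out exactly once */

        #pragma omp atomic write
        *last = mine;         /* plain store, made atomic */
    }
    return next;              /* total number of tickets handed out */
}
```

After the loop every entry of seen is 1, and *last holds whichever ticket happened to be written last.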


User-defined reductions

Allow users to extend reductions to cope with non-basic types and non-standard operators.

In 3.1
- Including pointer reductions in C
- Including class members and operators in C++

In 4.0
- Arrays for C
- Template reductions for C++


Example

#include <iostream>
#include <string>

#pragma omp declare reduction(+ : std::string : omp_out += omp_in)

void foo()
{
    std::string s;

    #pragma omp parallel reduction(+ : s)
    {
        s += "I'm a thread";
    }

    std::cout << s << std::endl;
}
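The same idea in plain C, as a sketch. The declare reduction directive was eventually standardized in OpenMP 4.0, so this assumes a 4.0-capable compiler; the range_t type, range_merge helper, and find_range function are hypothetical names chosen for illustration.

```c
#include <limits.h>

/* Hypothetical type: running minimum and maximum of a data set. */
typedef struct { int lo; int hi; } range_t;

/* Combiner: fold the partial result `in` into `out`. */
static void range_merge(range_t *out, const range_t *in)
{
    if (in->lo < out->lo) out->lo = in->lo;
    if (in->hi > out->hi) out->hi = in->hi;
}

/* The initializer gives each private copy the identity value. */
#pragma omp declare reduction(merge : range_t : range_merge(&omp_out, &omp_in)) \
    initializer(omp_priv = { INT_MAX, INT_MIN })

range_t find_range(const int *v, int n)
{
    range_t r = { INT_MAX, INT_MIN };
    #pragma omp parallel for reduction(merge : r)
    for (int i = 0; i < n; i++) {
        if (v[i] < r.lo) r.lo = v[i];
        if (v[i] > r.hi) r.hi = v[i];
    }
    return r;
}
```

Each thread works on its own private range_t, and the runtime merges the private copies into r with range_merge at the end of the loop.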


Affinity extensions

New environment variables

- OMP_PROCBIND=true, false
  Portable mechanism to bind threads
- Extend OMP_NUM_THREADS to support multiple levels of parallelism
- OMP_AFFINITY=scatter,compact
  Specifies how threads should be distributed in the machine
- OMP_MEMORY_PLACEMENT=first_touch|round_robin|random
  Portable mechanisms to specify memory placement policies


Tasking extensions

New constructs/clauses
- the taskyield construct, to allow user-defined scheduling points
- the final clause, to allow the optimization of leaf tasks
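A sketch of the final clause only (taskyield is not shown), using the classic recursive Fibonacci. The CUTOFF value and the function names are assumptions made for this illustration. Once a task is final, all the tasks it creates are executed immediately as included tasks, which avoids task-creation overhead near the leaves.

```c
/* Hypothetical cutoff: at or below it, tasks are created as final. */
#define CUTOFF 10

static long fib_task(int n)
{
    long a, b;
    if (n < 2)
        return n;

    /* Near the leaves the tasks become final, so their descendants
       run immediately instead of being deferred. */
    #pragma omp task shared(a) final(n <= CUTOFF)
    a = fib_task(n - 1);
    #pragma omp task shared(b) final(n <= CUTOFF)
    b = fib_task(n - 2);
    #pragma omp taskwait
    return a + b;
}

long fib(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);
    return result;
}
```

A suitable cutoff depends on the machine and the work per task; the value here is arbitrary.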


OpenMP 4.0

Error model

- Allow the programmer to catch and react to runtime errors
- Integrate C++ exceptions into this model
- Allow the programmer to cleanly cancel the parallel computation

It looks like we are leaning towards a model based on callbacks


Example

void error_handler(omp_err_info_t *info, int *nths)
{
    if (omp_get_error_type(info) == OMP_ERR_NOT_ENOUGH_THREADS)
        *nths = *nths > 1 ? *nths - 1 : 1;

    return OMP_RETRY;
}

nths = 4;
#pragma omp parallel onerror(error_handler, &nths) num_threads(nths)
{
    ....
}


Other tasking improvements

Tasking reductions
- Add a reduction clause to the task construct

Tasking dependences
- Allow finer task synchronization by expressing data dependences among tasks

Scheduling hints for the runtime
- Allow the programmer to express some kind of task priority


Task dependences

Example

for (;;) {
    char *buffer;
    #pragma omp task output(buffer)
    {
        buffer = malloc(...);
        stage1(buffer);
    }
    #pragma omp task inout(buffer)
    {
        stage2(buffer);
    }
    #pragma omp task input(buffer)
    {
        stage3(buffer);
    }
}
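OpenMP 4.0 as published expresses these dependences with a depend clause (in, out, inout) rather than the input/output/inout clauses of the proposal above. A minimal sketch with that syntax follows; the function run_pipeline and its placeholder stage bodies are hypothetical.

```c
/* Hypothetical three-stage chain on one integer. The depend clauses
   force stage 1 -> stage 2 -> stage 3 even though the tasks could
   otherwise run in any order. */
int run_pipeline(int seed)
{
    int buffer = 0;
    int result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: buffer) shared(buffer)
        buffer = seed;                /* stage 1 produces the value */

        #pragma omp task depend(inout: buffer) shared(buffer)
        buffer = buffer * 2;          /* stage 2 updates it in place */

        #pragma omp task depend(in: buffer) shared(buffer, result)
        result = buffer + 1;          /* stage 3 only reads it */
    }   /* all tasks are complete at the end of the parallel region */

    return result;
}
```

For a seed of 5 the chain computes 5, then 10, then 11; without the depend clauses the three tasks would race on buffer.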


Accelerators support

- Discussion is in the very early stages
- Several proposals on the table
- Cover both data and task parallelism
- Will probably take care of the backend compilation


A glimpse into the BSC proposal

Example

int main(void)
{
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                #pragma omp target device(smp, cell) \
                    copy_in([BS][BS] A, [BS][BS] B, [BS][BS] C) \
                    copy_out([BS][BS] C)
                #pragma omp task inout([BS][BS] C)
                matmul(A[i][k], B[k][j], C[i][j]);
}


OpenMP is Open

Compunity
- Compunity represents the OpenMP Users' Group
- It is a special ARB member
- Representative: Barbara Chapman from the University of Houston
- Anyone can join, participate, and give feedback

OpenMP Forum
- Forum overseen by ARB members
- OpenMP usage forum
- Spec clarifications forum
- Several 3.1 clarifications have their origin in comments from users


Where to go now?

http://www.openmp.org
http://www.compunity.org
http://nanos.ac.upc.edu
