
Page 1: An Introduction to OpenMP · 2 OpenMP General Concepts Parallelization Constructs Data Environment Synchronization 3 Final Considerations Complete Example Summary ... Thread libraries

IntroductionOpenMP

Final Considerations

An Introduction to OpenMP

Mirto Musci, PhD Candidate

Department of Computer Science, University of Pavia

Processors Architecture Class, Fall 2011

Mirto Musci, PhD Candidate An Introduction to OpenMP

Page 2:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 3:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 4:

Recap

Parallel machines are everywhere

Many architectures, many programming models

Multithreading is the natural programming model for multicore and SMP machines (desktop, server, cluster, embedded):

All processors share the same memory
Threads in a process see the same address space
Lots of shared-memory algorithms defined

Multithreading is (correctly) perceived to be hard!

Lots of expertise necessary
Synchronization, dependencies, granularity, balancing...
Non-deterministic behavior makes it hard to debug

Page 5:

What is OpenMP?

Standard API for defining multi-threaded shared-memory programs
www.openmp.org - Talks, examples, forums, etc.
Last release: OpenMP 3.1 (July 2011)

High-level API

Preprocessor (compiler) directives ( ~ 80% )

Library Calls ( ~ 19% )

Environment Variables ( ~ 1% )

Page 6:

A Programmer's View of OpenMP

OpenMP is a portable, threaded, shared-memory programming specification with "light" syntax

Exact behavior depends on OpenMP implementation!
Requires compiler support (C or Fortran)

OpenMP will:

Allow a programmer to separate a program into serial regions and parallel regions, rather than T concurrent threads
Hide stack management
Provide synchronization constructs

OpenMP will not:

Parallelize (or detect!) dependencies
Guarantee speedup
Provide freedom from data races

Page 7:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 8:

Ideal Parallel Programming Paradigm

1 Start with any algorithm

Embarrassing parallelism is helpful, but not necessary

2 Implement serially, ignoring:

Data races
Synchronization
Threading syntax

3 Test and Debug

4 Automatically (magically?) parallelize

Expect linear speedup

Page 9:

Threading Programming Paradigm

1 Start with a parallel algorithm

2 Implement, keeping in mind:

Data races
Synchronization
Threading syntax

3 Test & Debug

4 Debug

5 Debug

Page 10:

Problem

Parallelize the following code using threads:

#include <stdio.h>

int main() {
    for (int i = 0; i < 16; i++)
        printf("Hello, World!\n");
}

A lot of work to do a simple thing

Different threading APIs:

Windows: CreateThread
UNIX: pthread_create

Problems with the code:

Different code for serial and parallel version
No built-in tuning (number of processors, anyone?)

Page 11:

Motivation - Threading Library

#include <stdio.h>
#include <pthread.h>

void *SayHello(void *foo) {
    printf("Hello, world!\n");
    return NULL;
}

int main() {
    pthread_attr_t attr;
    pthread_t threads[16];
    int tn;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    for (tn = 0; tn < 16; tn++) {
        pthread_create(&threads[tn], &attr, SayHello, NULL);
    }
    for (tn = 0; tn < 16; tn++) {
        pthread_join(threads[tn], NULL);
    }
    return 0;
}

Page 12:

Motivation - Threading Library II

Thread libraries are hard to use

PThreads has many library calls for initialization, synchronization, thread creation, condition variables, etc.
Programmer must code with multiple threads in mind

Synchronization between threads introduces a new dimension of program correctness

Wouldn't it be nice to write serial programs and somehow parallelize them "automatically"?

OpenMP can parallelize many serial programs with relatively few annotations that specify parallelism and independence
OpenMP is a small API that hides cumbersome threading calls with simpler directives

Page 13:

Motivation - OpenMP

#include <stdio.h>

int main() {
    // Do this part in parallel
    printf("Hello, World!\n");

    return 0;
}

Page 14:

Motivation - OpenMP

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(16);

    // Do this part in parallel
    #pragma omp parallel
    {
        printf("Hello, World!\n");
    }

    return 0;
}

Page 15:

OpenMP Programming Paradigm

1 Start with a parallelizable algorithm

Embarrassing parallelism is good, loop-level parallelism is necessary

2 Implement serially, mostly ignoring:

Data races
Synchronization
Threading syntax

3 Test and Debug

4 Annotate the code with parallelization (and sync) directives

Hope for linear speedup

5 Test and Debug

Page 16:

OpenMP Methodology

Parallelization with OpenMP is an optimization process. Proceed with care:

Start with a working program, then add parallelization
Measure the changes after every step
Remember Amdahl's law
Use the profiler tools available

Page 17:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 18:

Programming Model

Thread-like Fork/Join model

Arbitrary number of logical thread creation/destruction events

Serial regions by default; annotate to create parallel regions

Generic parallel regions
Parallelized loops
Sectioned parallel regions

Page 19:

Nested Threading

Fork/Join can be nested

Nesting complications handled "automagically" at compile-time
Independent of the number of threads actually running

Page 20:

Thread Identification

Master Thread

Thread with ID=0
Only thread that exists in sequential regions
Depending on implementation, may have a special purpose inside parallel regions
Some special directives affect only the master thread (like master)

Page 21:

Data/Control Parallelism

Data parallelism

Threads perform similar functions, guided by thread identifier

Control parallelism

Threads perform differing functions
One thread for I/O, one for computation, etc.

Page 22:

Programming Model - Summary

Fork and Join: Master thread spawns a team of threads as needed

Page 23:

Memory Model

Shared memory communication

Threads cooperate by accessing shared variables

The sharing is defined syntactically

Any variable that is seen by two or more threads is shared
Any variable that is seen by one thread only is private

Race conditions possible

Use synchronization to protect from conflicts
Change how data is stored to minimize the synchronization

Page 24:

OpenMP Syntax

Most of the constructs of OpenMP are pragmas

#pragma omp construct [clause [clause] ...]

(FORTRAN: !$OMP, not covered here)
An OpenMP construct applies to a structured block (one entry point, one exit point)

In addition:

Several omp_<something> function calls
Several OMP_<something> environment variables

Page 25:

Controlling OpenMP Behavior

Function calls and (for each one) matching environment variables:

omp_set_dynamic(int)/omp_get_dynamic()

Allows the implementation to adjust the number of threads dynamically

omp_set_num_threads(int)/omp_get_num_threads()

Control the number of threads used for parallelization (the maximum, in case of dynamic adjustment)

Must be called from sequential code

Can also be set via the OMP_NUM_THREADS environment variable

Page 26:

Controlling OpenMP Behavior II

omp_get_num_procs()

How many processors are currently available?

omp_get_thread_num()

Which thread am I? (IDs range from 0 to team size minus 1)

omp_set_nested(int)/omp_get_nested()

Enable nested parallelism

omp_in_parallel()

Am I currently running in parallel mode?

omp_get_wtime()

A portable way to compute wall clock time

Page 27:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 28:

OpenMP Structure

The OpenMP language extensions fall into five groups:

Parallel control structures: govern flow of control in the program (the parallel directive)
Work sharing: distributes work among threads (do/parallel do and section directives)
Data environment: scopes variables (shared and private clauses)
Synchronization: coordinates thread execution (critical, atomic and barrier directives)
Runtime environment: runtime functions and environment variables (omp_get_thread_num(), omp_set_num_threads(), OMP_NUM_THREADS, OMP_SCHEDULE)

Page 29:

Parallel Regions I

Main construct: #pragma omp parallel

Defines a parallel region over a structured block of code

Threads are created as the parallel pragma is crossed

Threads block at end of region (implicit barrier)

Page 30:

Parallel Regions II

double D[1000];
#pragma omp parallel
{
    int i;
    double sum = 0;
    for (i = 0; i < 1000; i++)
        sum += D[i];
    printf("Thread %d computes %f\n", omp_get_thread_num(), sum);
}

Executes the same code as many times as there are threads

How many threads do we have? omp_set_num_threads(n)
What is the use of repeating the same work n times in parallel?
Can use omp_get_thread_num() to distribute the work between threads
D is shared between the threads; i and sum are private

Page 31:

Implementing Work Sharing

Sequential code:

for (int i = 0; i < N; i++) { a[i] = b[i] + c[i]; }

(Semi) manual parallelization:

#pragma omp parallel
{
    int id = omp_get_thread_num();
    int Nthr = omp_get_num_threads();
    int istart = id * N / Nthr;
    int iend = (id + 1) * N / Nthr;
    for (int i = istart; i < iend; i++) {
        a[i] = b[i] + c[i];
    }
}

Automatic parallelization:

#pragma omp parallel for schedule(static)
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}

Page 32:

Work Sharing: For

Used to assign each thread an independent set of iterations

Threads must wait at the end

Can combine the directives:

#pragma omp parallel for

Only simple kinds of for loops:

Only one signed integer loop variable
Initialization: var = init
Comparison: var op last, where op is one of <, >, <=, >=
Increment: var++, var--, var += incr, var -= incr, etc.

Page 33:

Problems of #pragma omp parallel for

Load balancing

If all the iterations execute at the same speed, the processors are used optimally
If some iterations are faster than others, some processors may go idle, reducing the speedup
We don't always know the distribution of work; we may need to re-distribute it dynamically

Granularity

Thread creation and synchronization take time
Assigning work to threads at per-iteration resolution may take more time than the execution itself!
Need to coalesce the work into coarse chunks to overcome the threading overhead

Trade-off between load balancing and granularity!

Page 34:

Controlling Granularity

#pragma omp parallel if (expression)

Can be used to disable parallelization in some cases (when the input is determined to be too small to be beneficially multithreaded)

#pragma omp parallel num_threads(expression)

Control the number of threads used for this parallel region (num_threads is a clause of the parallel directive)

Page 35:

Work Sharing: Sections

answer1 = long_computation_1();
answer2 = long_computation_2();
if (answer1 != answer2) { ... }

How to parallelize? These are just two independent computations!

#pragma omp parallel sections
{
    #pragma omp section
    answer1 = long_computation_1();
    #pragma omp section
    answer2 = long_computation_2();
}
if (answer1 != answer2) { ... }

Page 36:

Schedule Clause: Controlling Work Distribution

schedule(static [, chunksize])

Default: chunks of approximately equal size, one to each thread
If there are more chunks than threads: assigned round-robin to the threads
Why might we want to use chunks of different size?

schedule(dynamic [, chunksize])

Threads receive chunk assignments dynamically
Default chunk size = 1 (why?)

schedule(guided [, chunksize])

Start with large chunks
Threads receive chunks dynamically
Chunk size shrinks exponentially, down to chunksize

Page 37:

Graphic Scheduling

[Figure: iteration-to-thread assignment under static, dynamic, and guided(1) scheduling]

Page 38:

Scheduling Example

The function TestForPrime (usually) takes little time

But it can take long if the number is indeed a prime

#pragma omp parallel for schedule( ???? )
for (int i = start; i <= end; i += 2) {
    if (TestForPrime(i)) gPrimesFound++;
}

Solution: use dynamic, but with chunks

Page 39:

Outline

1 IntroductionMotivationParadigm Shift

2 OpenMPGeneral ConceptsParallelization ConstructsData EnvironmentSynchronization

3 Final ConsiderationsComplete ExampleSummary

Page 40:

Data Visibility

Shared Memory programming model

Most variables (including locals) are shared by default (unlike Pthreads!)

{
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        sum += i;   /* every thread updates the same shared sum */
}

Global variables are shared

Some variables can be private

Automatic variables inside the statement block
Automatic variables in the called functions
Variables can be explicitly declared as private; in that case, a local copy is created for each thread

Page 41:

Overriding Storage Attributes

private:

A copy of the variable iscreated for each thread.

No connection between theoriginal variable and theprivate copies

Can achieve the same usingvariables inside { }

int i;

#pragma omp parallel for private(i)
for (i = 0; i < n; i++) { ... }

Page 42:

Overriding Storage Attributes II

firstprivate:

Same, but the initial value is copied from the main copy

lastprivate:

Same, but the last value is copied back to the main copy

int idx = 1;
int x = 10;

#pragma omp parallel for firstprivate(x) lastprivate(idx)
for (i = 0; i < n; i++) {
    if (data[i] == x) idx = i;
}

Page 43:

Threadprivate

Similar to private, but defined per variable

Declaration immediately after variable definition; must be visible in all translation units
Persistent between parallel sections
Can be initialized from the master's copy with #pragma omp copyin
More efficient than private, but a global variable!

Example:

int data[100];
#pragma omp threadprivate(data)
...
#pragma omp parallel for copyin(data)
for (...)

Page 44:

Outline

1 Introduction: Motivation; Paradigm Shift

2 OpenMP: General Concepts; Parallelization Constructs; Data Environment; Synchronization

3 Final Considerations: Complete Example; Summary

Page 45:

Synchronization

X = 0;

#pragma omp parallel
X = X + 1;

What should the result be (assuming 2 threads)?

2 is the expected answer
But it can be 1 with unfortunate interleaving

OpenMP assumes that the programmer knows what he is doing

Regions of code that are marked to run in parallel are independent
If access collisions are possible, it is the programmer's responsibility to insert protection

Page 46:

Synchronization Mechanisms

Many of the existing mechanisms for shared-memory programming

OpenMP Synchronization

Nowait (turns synchronization off!)

Single/Master execution

Critical sections, Atomic updates

Ordered

Barriers

Flush (memory subsystem synchronization)

Reduction (special case)

Page 47:

Single/Master

#pragma omp single

Only one of the threads will execute the following block of code

The rest will wait for it to complete
Good for non-thread-safe regions of code (such as I/O)
Must be used in a parallel region
Applicable to parallel for and sections as well

Page 48:

Single/Master II

#pragma omp master

The following block will be executed by the master thread

No synchronization involved

Applicable only to parallel sections

#pragma omp parallel
{
    do_preprocessing();

    #pragma omp single
    read_input();

    #pragma omp master
    notify_input_consumed();

    do_processing();
}

Page 49:

Critical Sections

#pragma omp critical [name]

Standard critical section functionality

Critical sections are global in the program

Can be used to protect a single resource in different functions

Critical sections are identified by their name

All the unnamed critical sections are mutually exclusive throughout the program
All the critical sections having the same name are mutually exclusive between themselves

int x = 0;
#pragma omp parallel shared(x)
{
#pragma omp critical
    x++;
}


Atomic Execution

Critical sections on the cheap

Protects a single variable update

Can be much more efficient (a dedicated assembly instruction on some architectures)

#pragma omp atomic
    update_statement

The update statement is one of: var = var op expr, var op= expr, var++, var--.

The variable must be a scalar
The operation op is one of: +, -, *, /, ^, &, |, <<, >>
The evaluation of expr is not atomic!


Ordered

#pragma omp ordered
    statement

Executes the statement in the sequential order of iterations

Example:

#pragma omp parallel for ordered
for (j = 0; j < N; j++) {
    int result = heavy_computation(j);
#pragma omp ordered
    printf("computation(%d) = %d\n", j, result);
}


Barrier synchronization

#pragma omp barrier

Performs a barrier synchronization between all the threads in a team at the given point.

Example:

#pragma omp parallel
{
    int result = heavy_computation_part1();
#pragma omp atomic
    sum += result;
#pragma omp barrier
    heavy_computation_part2(sum);
}


Explicit Locking

Can be used to pass lock variables around (unlike critical sections!)

Can be used to implement more involved synchronization constructs

Functions:

omp_init_lock(), omp_destroy_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock()
The usual semantics

Use #pragma omp flush to synchronize memory


Consistency Violation?

#pragma omp parallel for \
    shared(x) private(i)
for (i = 0; i < 100; i++) {
#pragma omp atomic
    x++;
}
printf("%i", x); /* 100 */

#pragma omp parallel for \
    shared(x) private(i)
for (i = 0; i < 100; i++) {
    omp_set_lock(my_lock);
    x++;
    omp_unset_lock(my_lock);
}
printf("%i", x); /* 96 !! */


Consistency Violation?

#pragma omp parallel for \
    shared(x) private(i)
for (i = 0; i < 100; i++) {
    omp_set_lock(my_lock);
    x++;
#pragma omp flush
    omp_unset_lock(my_lock);
}
printf("%i", x); /* 100 */


Reduction

for (j = 0; j < N; j++) {
    sum = sum + a[j] * b[j];
}

How to parallelize this code?

sum is not private, but accessing it atomically is too expensive
Have a private copy of sum in each thread, then add them up

Use the reduction clause!
#pragma omp parallel for reduction(+:sum)

Any associative operator can be used: +, -, ||, |, *, etc.
The private copy is initialized automatically (to 0, 1, ~0, ...)


Synchronization Overhead

Lost time waiting for locks

Prefer to use structures that are as lock-free as possible!
Use a parallelization granularity which is as large as possible

#pragma omp parallel
{
#pragma omp critical
    {
        ...
    }
    ...
}



Numerical Integration

Mathematically, we know that:

    ∫₀¹ 4.0 / (1 + x²) dx = π

We can approximate the integral as a sum of rectangles:

    ∑ᵢ₌₀ᴺ F(xᵢ) ∆x ≈ π

where each rectangle has width ∆x and height F(xᵢ) at the middle of interval i.


Serial Code

#include <stdio.h>

static long num_steps = 100000;
double step, pi;

int main(void)
{
    int i;
    double x, sum = 0.0;

    step = 1.0 / (double) num_steps;

    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    printf("Pi = %f\n", pi);
    return 0;
}

Parallelize the numerical integration code using OpenMP

What variables can be shared?

What variables need to be private?

What variables should be set up for reductions?


Parallel Code

#include <stdio.h>

static long num_steps = 100000;
double step, pi;

int main(void)
{
    int i;
    double x, sum = 0.0;

    step = 1.0 / (double) num_steps;

#pragma omp parallel for \
    private(x) reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    printf("Pi = %f\n", pi);
    return 0;
}

The parallelization code is a one-liner!

sum is a reduction variable, hence shared

i is private since it is the loop variable



Summary

OpenMP: A framework for code parallelization

Available for C/C++ and Fortran
Based on a standard
Implementations from a wide selection of vendors

Easy to use

Write (and debug!) code first, parallelize later
Parallelization can be incremental
Parallelization can be turned off at runtime or compile time
The code is still correct for a serial machine


Limitations

OpenMP requires compiler support

Sun, Intel, GCC...
Embedded systems?

OpenMP does not parallelize dependencies

Often it does not even detect dependencies
Nasty race conditions still exist!

OpenMP is not guaranteed to divide work optimally among threads

Programmer-tweakable with scheduling clauses
Still lots of rope available



For Further Reading I

Clay Breshears
The Art of Concurrency
O'Reilly, 2009.

Blaise Barney
Introduction to OpenMP, 2011
https://computing.llnl.gov/tutorials/
