Transcript
Page 1:

Shared Memory Parallelism - OpenMP

Sathish Vadhiyar

Credits/Sources:

OpenMP C/C++ standard (openmp.org)

OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openMP/#Introduction)

OpenMP sc99 tutorial presentation (openmp.org)

Dr. Eric Strohmaier (University of Tennessee, CS594 class, Feb 9, 2000)

Page 2:

Introduction

An API for multi-threaded shared-memory parallelism

A specification for a set of compiler directives, library routines, and environment variables – standardizing pragmas

Standardized by a group of hardware and software vendors

Both fine-grain and coarse-grain parallelism (orphaned directives)

Much easier to program than MPI

Page 3:

History

Many different vendors provided their own compiler directives for shared-memory programming using threads

OpenMP standardization started in 1997:
- October 1997: Fortran version 1.0
- Late 1998: C/C++ version 1.0
- June 2000: Fortran version 2.0
- April 2002: C/C++ version 2.0

Page 4:

Introduction

Mainly supports loop-level parallelism

Follows the fork-join model

The number of threads can be varied from one region to another

Based on compiler directives

User's responsibility to ensure program correctness, avoiding deadlocks, etc.

Page 5:

Execution Model

Begins as a single thread, called the master thread

Fork: when a parallel construct is encountered, a team of threads is created

Statements in the parallel region are executed in parallel

Join: At the end of the parallel region, the team threads synchronize and terminate

Page 6:

Definitions

Construct – a statement containing a directive and a structured block

Directive – #pragma <omp id> <other text>; based on C #pragma directives

#pragma omp directive-name [clause [, clause] …] new-line

Example: #pragma omp parallel default(shared) private(beta,pi)

Page 7:

Types of constructs, Calls, Variables

Work-sharing constructs
Synchronization constructs
Data environment constructs
Library calls, environment variables

Page 8:

parallel construct

#pragma omp parallel [clause [, clause] …] new-line
    structured-block

Clauses: if(scalar-expression), num_threads(integer-expression), default(shared | none), private(variable-list), firstprivate(variable-list), shared(variable-list), copyin(variable-list), reduction(operator: variable-list)

Page 9:

Parallel construct

Parallel region executed by multiple threads

Implied barrier at the end of the parallel region

If the num_threads clause, omp_set_num_threads(), or the OMP_NUM_THREADS environment variable is not used, the number of created threads is implementation dependent

The number of threads can be dynamically adjusted using omp_set_dynamic() or OMP_DYNAMIC

The number of physical processors hosting the threads is also implementation dependent

Threads are numbered from 0 to N-1

Nested parallelism is possible by embedding one parallel construct inside another
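As a minimal sketch of this precedence (not from the slides): the num_threads clause overrides omp_set_num_threads(), which in turn overrides the OMP_NUM_THREADS environment variable.

#include <omp.h>
#include <stdio.h>

int main() {
  omp_set_num_threads(4);               /* request 4 threads for later parallel regions */
  #pragma omp parallel num_threads(2)   /* the clause takes precedence: a team of 2 */
  {
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
  }
  return 0;
}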

Page 10:

Parallel construct - Example

#include <omp.h>
#include <stdio.h>

main () {
  int nthreads, tid;
  #pragma omp parallel private(nthreads, tid)
  {
    tid = omp_get_thread_num();            /* this thread's id */
    nthreads = omp_get_num_threads();      /* size of the team */
    printf("Hello World from thread %d of %d\n", tid, nthreads);
  }
}

Page 11:

Work-sharing constructs

For distributing the execution among the threads that encounter them

3 types – for, sections, single

Page 12:

for construct

For distributing the loop iterations among the threads

#pragma omp for [clause [, clause] …] new-line
    for-loop

Clauses: private(variable-list), firstprivate(variable-list), lastprivate(variable-list), reduction(operator: variable-list), ordered, schedule(kind[, chunk_size]), nowait

Page 13:

for construct

Restrictions on the structure of the for loop so that the compiler can determine the number of iterations – e.g. no branching out of the loop

The assignment of iterations to threads depends on the schedule clause

Implicit barrier at the end of for if nowait is not specified

Page 14:

schedule clause

1. schedule(static, chunk_size) – iterations are divided into chunks of chunk_size, which are distributed to the threads in round-robin order

2. schedule(dynamic, chunk_size) – a chunk of chunk_size iterations is handed to the next ready thread

3. schedule(guided, chunk_size) – the chunk handed to the next ready thread is proportional to unassigned_iterations/threads, with chunk_size as the minimum; chunk sizes thus decrease exponentially

4. schedule(runtime) – the decision is deferred to run time (e.g. via the OMP_SCHEDULE environment variable); implementation dependent

Page 15:

for - Example

#include <omp.h>
#define CHUNKSIZE 100
#define N 1000

main () {
  int i, chunk;
  float a[N], b[N], c[N];

  /* Some initializations */
  for (i=0; i < N; i++)
    a[i] = b[i] = i * 1.0;

  chunk = CHUNKSIZE;
  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i=0; i < N; i++)
      c[i] = a[i] + b[i];
  } /* end of parallel section */
}

Page 16:

sections construct

For distributing non-iterative sections of code among the threads

#pragma omp sections [clause [, clause] …] new-line
{
    [#pragma omp section new-line]
    structured-block
    …
}

Clauses: private(variable-list), firstprivate(variable-list), lastprivate(variable-list), reduction(operator: variable-list), nowait

Page 17:

sections - Example
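A minimal sketch in the style of the deck's other examples: two independent loops are placed in separate section blocks, so different threads can execute them concurrently.

#include <omp.h>
#define N 1000

int main() {
  int i;
  float a[N], b[N], c[N], d[N];

  for (i = 0; i < N; i++) {        /* some initializations */
    a[i] = i * 1.5;
    b[i] = i + 22.35;
  }

  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections nowait
    {
      #pragma omp section          /* executed by one thread */
      for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

      #pragma omp section          /* executed, possibly concurrently, by another */
      for (i = 0; i < N; i++)
        d[i] = a[i] * b[i];
    }
  }
  return 0;
}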

Page 18:

single directive

Only a single thread of the team executes the block; the other threads wait at the implicit barrier at the end of the single construct unless nowait is specified

Page 19:

Single - Example
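A minimal sketch: every thread executes the parallel region, but the block under single is executed by exactly one thread.

#include <omp.h>
#include <stdio.h>

int main() {
  #pragma omp parallel
  {
    #pragma omp single      /* one thread only; the others wait at the implicit barrier */
    printf("single: thread %d\n", omp_get_thread_num());

    printf("all: thread %d\n", omp_get_thread_num());   /* executed by every thread */
  }
  return 0;
}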

Page 20:

Synchronization directives

master, critical, barrier, atomic, flush, ordered

Page 21:

critical - Example

#include <omp.h>

main() {
  int x;
  x = 0;
  #pragma omp parallel shared(x)
  {
    #pragma omp critical   /* only one thread at a time updates x */
    x = x + 1;
  }
}

Page 22:

atomic - Example
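For comparison with the critical example above, the same counter can be protected with atomic, which guards a single memory update and is typically cheaper than a critical section. A minimal sketch:

#include <omp.h>
#include <stdio.h>

int main() {
  int x = 0;
  #pragma omp parallel shared(x)
  {
    #pragma omp atomic    /* the read-modify-write of x is performed atomically */
    x++;
  }
  printf("x = %d\n", x);  /* x equals the number of threads in the team */
  return 0;
}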

Page 23:

flush directive

#pragma omp flush [(variable-list)] new-line

A point at which a consistent view of memory is provided among the threads

Thread-visible variables (global variables, shared variables, etc.) are written back to memory

If variable-list is used, only the variables in the list are flushed

Page 24:

flush - Example

Page 25:

flush – Example (Contd…)
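A typical use of flush is pairwise synchronization through a flag variable; a minimal sketch, assuming a team of at least two threads (with a single thread the consumer section could spin forever):

#include <omp.h>
#include <stdio.h>

int data = 0, flag = 0;

int main() {
  #pragma omp parallel sections num_threads(2) shared(data, flag)
  {
    #pragma omp section           /* producer */
    {
      data = 42;
      #pragma omp flush(data)     /* make data visible before raising the flag */
      flag = 1;
      #pragma omp flush(flag)
    }
    #pragma omp section           /* consumer: spin until the flag becomes visible */
    {
      int ready;
      do {
        #pragma omp flush(flag)
        ready = flag;
      } while (!ready);
      #pragma omp flush(data)     /* re-read a consistent value of data */
      printf("data = %d\n", data);
    }
  }
  return 0;
}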

Page 26:

ordered - Example
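A minimal sketch: the ordered clause on the loop permits an ordered directive in its body, whose block executes in the loop's sequential order even though the iterations themselves run in parallel.

#include <stdio.h>

int main() {
  int i;
  #pragma omp parallel for ordered schedule(dynamic)
  for (i = 0; i < 8; i++) {
    int sq = i * i;            /* computed in parallel, in any order */
    #pragma omp ordered        /* printed strictly in the order i = 0, 1, 2, ... */
    printf("i = %d, i*i = %d\n", i, sq);
  }
  return 0;
}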

Page 27:

Data Environment – threadprivate directive

Global variables in variable-list are made private to each thread

Each thread gets its own copy

Copies persist between different parallel regions

#include <omp.h>
#include <stdio.h>

int alpha[10], beta[10], i;
#pragma omp threadprivate(alpha)

main () {
  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);

  /* First parallel region */
  #pragma omp parallel private(i,beta)
  for (i=0; i < 10; i++)
    alpha[i] = beta[i] = i;

  /* Second parallel region: alpha persists per thread, beta does not */
  #pragma omp parallel
  printf("alpha[3]= %d and beta[3]= %d\n", alpha[3], beta[3]);
}

Page 28:

Data Scope Attribute Clauses

Most variables are shared by default

Data scopes are explicitly specified by data scope attribute clauses

Clauses:

1. private

2. firstprivate

3. lastprivate

4. shared

5. default

6. reduction

7. copyin

8. copyprivate

Page 29:

private, firstprivate & lastprivate

private (variable-list)
- variable-list is private to each thread
- a new object with automatic storage duration is allocated for the construct

firstprivate (variable-list)
- the new object is initialized with the value of the old object that existed prior to the construct

lastprivate (variable-list)
- the value of the private object corresponding to the last iteration or the last section is assigned to the original object

Page 30:

private - Example
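A minimal sketch of private: each thread gets its own uninitialized copy of tid, and the original object is untouched after the region.

#include <omp.h>
#include <stdio.h>

int main() {
  int tid = -1;
  #pragma omp parallel private(tid)      /* each thread: its own, uninitialized tid */
  {
    tid = omp_get_thread_num();
    printf("inside: tid = %d\n", tid);
  }
  printf("outside: tid = %d\n", tid);    /* still -1: private copies are not copied back */
  return 0;
}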

Page 31:

lastprivate - Example
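A minimal sketch of lastprivate: after the loop, the original variable holds the value written by the sequentially last iteration (i == 99).

#include <stdio.h>

int main() {
  int i, last = -1;
  #pragma omp parallel for lastprivate(last)
  for (i = 0; i < 100; i++)
    last = i * 2;               /* each thread writes its private copy */
  printf("last = %d\n", last);  /* 198: the value from the last iteration */
  return 0;
}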

Page 32:

shared, default, reduction

shared(variable-list)
- the listed variables are shared among all the threads in the team

default(shared | none)
- specifies the sharing behavior of all of the variables visible in the construct

reduction(op: variable-list)
- private copies of the variables are made for each thread
- the final value at the end of the reduction is the combination, under op, of all the private copies

Page 33:

default - Example
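A minimal sketch of default(none), which forces every variable used in the construct to be scoped explicitly; omitting shared(a, b) here would be a compile-time error.

#include <stdio.h>

int main() {
  int i, a = 1, b = 2;
  #pragma omp parallel for default(none) shared(a, b) private(i)
  for (i = 0; i < 4; i++)
    printf("%d\n", a + b + i);
  return 0;
}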

Page 34:

reduction - Example

#include <omp.h>
#include <stdio.h>

main () {
  int i, n, chunk;
  float a[100], b[100], result;

  /* Some initializations */
  n = 100;
  chunk = 10;
  result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  #pragma omp parallel for \
      default(shared) private(i) \
      schedule(static,chunk) \
      reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
}

Page 35:

copyin, copyprivate

copyin(variable-list)
- applicable to threadprivate variables
- the value of each variable in the master thread is copied to the individual threadprivate copies

copyprivate(variable-list)
- appears on a single directive
- variables in variable-list are broadcast from the thread that executed the single construct to the other threads in the team

Page 36:

copyprivate - Example
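A minimal sketch of copyprivate: x is declared inside the parallel region and is therefore private; the value produced by the single thread is broadcast to every other thread's copy.

#include <omp.h>
#include <stdio.h>

int main() {
  #pragma omp parallel
  {
    float x;                          /* private: declared inside the region */
    #pragma omp single copyprivate(x)
    x = 3.14f;                        /* e.g. read from input in a real code */
    printf("thread %d: x = %f\n", omp_get_thread_num(), x);
  }
  return 0;
}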

Page 37:

Library Routines (API)

Querying functions (number of threads, etc.)

General-purpose locking routines

Setting the execution environment (dynamic threads, nested parallelism, etc.)

Page 38:

API

omp_set_num_threads(num_threads)
omp_get_num_threads()
omp_get_max_threads()
omp_get_thread_num()
omp_get_num_procs()
omp_in_parallel()
omp_set_dynamic(dynamic_threads)
omp_get_dynamic()
omp_set_nested(nested)
omp_get_nested()

Page 39:

API (Contd..)

omp_init_lock(omp_lock_t *lock)
omp_init_nest_lock(omp_nest_lock_t *lock)
omp_destroy_lock(omp_lock_t *lock)
omp_destroy_nest_lock(omp_nest_lock_t *lock)
omp_set_lock(omp_lock_t *lock)
omp_set_nest_lock(omp_nest_lock_t *lock)
omp_unset_lock(omp_lock_t *lock)
omp_unset_nest_lock(omp_nest_lock_t *lock)
omp_test_lock(omp_lock_t *lock)
omp_test_nest_lock(omp_nest_lock_t *lock)

omp_get_wtime()
omp_get_wtick()

Page 40:

Lock details

Simple locks and nestable locks

A simple lock may not be set again if it is already in a locked state

A nestable lock may be set multiple times by the same thread

A simple lock is available if it is unlocked

A nestable lock is available if it is unlocked or owned by the calling thread

Page 41:

Example – Lock functions
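A minimal sketch of the simple-lock routines: the lock serializes the update of a shared sum.

#include <omp.h>
#include <stdio.h>

int main() {
  omp_lock_t lock;
  int sum = 0;
  omp_init_lock(&lock);              /* the lock starts out unlocked */

  #pragma omp parallel shared(sum, lock)
  {
    omp_set_lock(&lock);             /* blocks until the lock is acquired */
    sum += omp_get_thread_num();     /* update guarded by the lock */
    omp_unset_lock(&lock);
  }

  omp_destroy_lock(&lock);
  printf("sum = %d\n", sum);
  return 0;
}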

Page 42:

Example – Nested lock

Page 43:

Example – Nested lock (Contd..)
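A minimal sketch of a nestable lock: increment() is called both directly and from increment_twice(), which already holds the lock; with a simple lock the inner acquisition would deadlock.

#include <omp.h>
#include <stdio.h>

omp_nest_lock_t lock;
int counter = 0;

void increment(int amount) {
  omp_set_nest_lock(&lock);      /* re-acquisition by the owning thread is legal */
  counter += amount;
  omp_unset_nest_lock(&lock);
}

void increment_twice(void) {
  omp_set_nest_lock(&lock);      /* outer acquisition */
  increment(1);                  /* nested acquisitions by the same thread */
  increment(1);
  omp_unset_nest_lock(&lock);
}

int main() {
  omp_init_nest_lock(&lock);
  #pragma omp parallel
  increment_twice();
  omp_destroy_nest_lock(&lock);
  printf("counter = %d\n", counter);   /* 2 * number of threads */
  return 0;
}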

Page 44:

Hybrid Programming – Combining MPI and OpenMP benefits

MPI
- explicit parallelism, no synchronization problems
- suitable for coarse grain

OpenMP
- easy to program, dynamic scheduling allowed
- only for shared memory; data synchronization problems

MPI/OpenMP Hybrid
- can combine MPI data placement with OpenMP fine-grain parallelism (see the sketch below)
- suitable for clusters of SMPs (Clumps)
- can implement a hierarchical model
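A minimal hybrid sketch (illustrative only, not from the slides), assuming one MPI process per SMP node: OpenMP threads accumulate a node-local partial sum, and MPI combines the per-node results across the cluster.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, i;
  double local = 0.0, global = 0.0;

  MPI_Init(&argc, &argv);                  /* coarse grain: one process per node */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  #pragma omp parallel for reduction(+:local)   /* fine grain: threads within the node */
  for (i = 0; i < 1000; i++)
    local += 0.5 * i;

  /* combine per-node partial results across the cluster */
  MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("global = %f\n", global);

  MPI_Finalize();
  return 0;
}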

