OpenMP

António Abreu

Instituto Politécnico de Setúbal

1 March 2013

openMP – what?

It is an Application Program Interface (API) that allows parallel programs to be developed explicitly and simply, in C/C++, for multi-platform, shared-memory, multiprocessor computers (including Solaris, AIX, HP-UX, GNU/Linux, Mac OS X, and Windows platforms). It is supported by the major computer hardware and software vendors (including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Microsoft, Texas Instruments, Oracle Corporation, and others).

cores and memory

Multicore computers have a memory system where some memories are shared while others are not. The next figure makes this distinction clear. TLB stands for Translation Lookaside Buffer, which is an address cache. When making parallel programs one must know which memory is shared and which memory is not.

Fork – join

OpenMP is based on multithreading, i.e., a form of parallelization whereby a master thread forks a specified number of slave threads, with the runtime environment allocating threads to different processors.

How many cores does my machine have?

In Linux, the file /proc/cpuinfo contains a lot of information about the hardware of the machine. Typing less /proc/cpuinfo allows one to see it all.

To see information about memory, look at the contents of the file /proc/meminfo. The first number one wants to see is the one corresponding to MemTotal.

In order to use openMP, one needs a proper compiler. In Linux, GCC 4.2 or higher supports openMP. To see the version of your (Linux) compiler, type the command gcc -v.
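A small C program can also confirm that the compiler actually enables OpenMP: the standard _OPENMP macro is defined when compiling with gcc -fopenmp. A minimal sketch:

#include <stdio.h>

int main(void)
{
#ifdef _OPENMP
    printf("OpenMP enabled, version macro = %d\n", _OPENMP);  /* e.g. 201511 */
#else
    printf("OpenMP not enabled\n");
#endif
    return 0;
}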

parallel directive

#pragma omp parallel [clause ...] newline

{

structured_block

}

where clause can be

if (scalar_expression)

private (list)

shared (list)

default (shared | none)

firstprivate (list)

reduction (operator: list)

copyin (list)

num_threads (integer-expression)
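As an illustration, several of these clauses can be combined on one parallel directive. A minimal sketch (the variables n and id and the chosen team size are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 8;      /* shared by default */
    int id;         /* made private below */

    #pragma omp parallel num_threads(4) default(shared) private(id) if(n > 1)
    {
        id = omp_get_thread_num();
        printf("thread %d of %d sees n = %d\n", id, omp_get_num_threads(), n);
    }
    return 0;
}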

Hello world

#include <stdio.h>

#include <omp.h>

int main(void)

{

#pragma omp parallel

{

int ID = omp_get_thread_num();

printf("Hello (%d)\n",ID);

printf("world (%d)\n",ID);

printf("! (%d)\n",ID);

}

return 0;

}

Compile with gcc -fopenmp hello.c -o hello

Hello (0)

world (0)

! (0)

Hello (1)

world (1)

! (1)

Hello (2)

world (2)

! (2)

Hello (3)

world (3)

! (3)

The code between the curly brackets (after the pragma directive) is set to execute in a predetermined number of threads.

After the first curly bracket there is a fork, i.e., the master thread creates a team of parallel threads, and after the second curly bracket there is a join, i.e., the master thread continues execution after all the slave threads end. The second curly bracket constitutes a barrier, past which only the master thread continues.

The number of threads is typically set to the number of cores in the microprocessor; it can be set from the command line with export OMP_NUM_THREADS=4.

omp_get_thread_num() is a function that returns the ID of the calling thread. The master thread has ID 0 and is part of the thread team.
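The team size can also be requested from within the code, using the num_threads clause listed earlier. A minimal sketch (the requested value 3 is arbitrary):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel num_threads(3)        /* request a team of 3 threads */
    {
        if (omp_get_thread_num() == 0)         /* only the master reports */
            printf("team size = %d\n", omp_get_num_threads());
    }
    return 0;
}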

We observe an ordered output, but sometimes this may not happen; in fact there is a race condition because the four threads share the standard output.

Note that openMP is not necessarily implemented identically by all vendors. Also, it does not check for data dependencies, data conflicts, race conditions, or deadlocks. In particular, it does not guarantee that input or output to the same file is synchronous when executed in parallel. It is up to the programmer to synchronize input and output.

Synchronization Constructs – barriers

#include <omp.h>

#include <stdio.h>

#include <stdlib.h>

int main (int argc, char *argv[]) {

int th_id, nthreads;

#pragma omp parallel private(th_id)

{

th_id = omp_get_thread_num();

printf("Hello World from thread %d\n", th_id);

#pragma omp barrier

if ( th_id == 0 ) {

nthreads = omp_get_num_threads();

printf("There are %d threads\n",nthreads);

}

}

return EXIT_SUCCESS;

}

Hello World from thread 1

Hello World from thread 3

Hello World from thread 0

Hello World from thread 2

There are 4 threads

Barriers are a synchronization primitive. This means that all threads in the team wait for the last one to reach the barrier. At that moment, all threads in the team resume execution in parallel. If there is a thread that does not reach the barrier, all threads in the team wait, and the process hangs without any work being produced.

Quiz

If we comment out the barrier pragma in the code above, the output can be,

Hello World from thread 0

There are 4 threads

Hello World from thread 3

Hello World from thread 1

Hello World from thread 2

Explain why.

Quiz

If we add the code

printf("Bye from thread %d\n", th_id);

after the if, what would be the output?

Workshare directives – for

#pragma omp for [clause ...] newline

for_loop

where clause can be,

schedule (type [,chunk])

ordered

private (list)

firstprivate (list)

lastprivate (list)

shared (list)

reduction (operator: list)

collapse (n)

nowait

parallel for example

#include <omp.h>

#define CHUNKSIZE 100

#define N 1000

int main()

{

int i, chunk = CHUNKSIZE;

float a[N], b[N], c[N];

/* Some initializations */

for (i=0; i < N; i++)

a[i] = b[i] = i * 1.0;

#pragma omp parallel shared(a,b,c,chunk) private(i)

{

#pragma omp for schedule(dynamic,chunk) nowait

for (i=0; i < N; i++)

c[i] = a[i] + b[i];

} /* end of parallel section */

}

The for pragma asks the compiler to distribute the N iterations of the for loop among the threads of the team.

The schedule clause informs the runtime about how to distribute the iterations among those threads. In this case, the scheduling policy is dynamic, which means that chunks of iterations are assigned to threads on a first-come, first-served basis.

In this case each thread will execute chunk (i.e., 100) iterations at a time, out of the total of 1000 iterations in the loop.

The nowait clause makes the implied barrier at the end of the for directive be ignored. Put differently, if the clause were absent, all team threads would stop at the end of the for construct and wait for each other before continuing.
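For comparison, a static schedule splits the iterations into fixed chunks decided before the loop runs. A minimal sketch (the array who and the chunk size are illustrative) that also records which thread handled each iteration:

#include <stdio.h>
#include <omp.h>
#define N 16

int main(void)
{
    int i, who[N];
    #pragma omp parallel for schedule(static, 4)   /* fixed chunks of 4 iterations */
    for (i = 0; i < N; i++)
        who[i] = omp_get_thread_num();             /* record which thread ran iteration i */
    for (i = 0; i < N; i++)
        printf("iteration %d done by thread %d\n", i, who[i]);
    return 0;
}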

Quiz

In the following program, which for loop is executed in parallel: the first, or both? Before answering, note that the directives parallel and for are combined into a single directive. This is valid.

#include <stdio.h>

int main(int argc, char *argv[])

{

const int N = 100;

int i, a[N];

#pragma omp parallel for

for (i = 0; i < N; i++)

a[i] = 2 * i;

for (i = 0; i < N; i++)

printf("%d ",a[i]);

return 0;

}

Workshare directives – sections

#pragma omp sections [clause ...] newline

{

#pragma omp section newline

structured_block

#pragma omp section newline

structured_block

}

where clause can be,

private (list)

firstprivate (list)

lastprivate (list)

reduction (operator: list)

nowait

section directive example

#include <stdio.h>

#include <omp.h>

int main(void)

{

#pragma omp parallel sections

{

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

}

printf("Bye from thread %d\n",omp_get_thread_num());

}

A few executions

First execution

hello from thread 0

hello from thread 0

hello from thread 0

Bye from thread 0

Second execution

hello from thread 0

hello from thread 1

hello from thread 3

Bye from thread 0

Third execution

hello from thread 2

hello from thread 1

hello from thread 0

Bye from thread 0

Another example

#include <stdio.h>

#include <omp.h>

int main(void)

{

int i=0;

#pragma omp parallel sections if (i==1)

{

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

#pragma omp section

{

printf("hello from thread %d\n",omp_get_thread_num());

}

}

printf("Bye from thread %d\n",omp_get_thread_num());

}

Unique result

hello from thread 0

hello from thread 0

hello from thread 0

Bye from thread 0

Since the condition is false, the team of threads is not created, but the master thread remains. Note that the assigned work (three blocks of code) is executed serially; so the if clause allows the work to be parallelized or not (i.e., to be serialized), and the decision is made at runtime. Also, there is an implicit barrier at the end of the sections construct (and of the parallel region). This explains why Bye from ... (in the last two examples) is always the last message to print.

Clause reduction

reduction (operator: list)

At the creation of the team of threads, the variables in list are created as private copies. When the threads in the team finish, operator is applied to those copies, a process known as reduction, and the final result is written back to the variables in list, now seen again as shared variables. Variables in list must be scalar; arrays and structures are not allowed.

#include <stdio.h>

#include <omp.h>

int main(void)

{

int t=0;

omp_set_num_threads(4);

#pragma omp parallel reduction(+:t)

{

t = omp_get_thread_num() + 1;

printf("local %d\n", t);

}

printf("reduction %d\n", t);

}

Result

local 1

local 2

local 3

local 4

reduction 10

The function of omp_set_num_threads() is self-explanatory. As expected, it cannot be called from inside a parallelized block of code.

Synchronization Constructs – atomic

Used to identify a memory location that must not be modified simultaneously by more than one thread in the team. In other words, it provides atomic access to that memory location.

#pragma omp atomic

statement

The directive applies only to the single statement that follows it (typically an update such as x++ or x += expr).
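A minimal sketch with a shared counter protected by atomic (the variable name hits is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int hits = 0;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp atomic
        hits++;                      /* each increment is performed atomically */
    }
    printf("hits = %d\n", hits);     /* equals the team size; no lost updates */
    return 0;
}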

Synchronization Constructs – single

Used when there is a block of code that must be executed by a single thread in the team. Note that this by no means implies that the code is made atomic. It may happen that other threads (outside this team) access the same memory locations, thus creating a race condition.

#pragma omp single [clause[[,] clause] ...]

statement_block

Threads in the team that do not execute this directive wait at the end of the code block, unless a nowait clause is specified.
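A minimal sketch: one thread in the team prints a message while the others wait at the end of the single block.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp single
        printf("printed once, by thread %d\n", omp_get_thread_num());
        /* all threads wait here, unless nowait is given on the single */
        printf("printed by every thread (%d)\n", omp_get_thread_num());
    }
    return 0;
}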

Synchronization Constructs – master

Used to identify a block of code that must be executed only by the master thread. Unlike single, no barrier is implied at the end of the block.

#pragma omp master

statement_block
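A minimal sketch; since master implies no barrier, an explicit barrier is added before the other threads read the value set by the master (the variable n is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 0;
    #pragma omp parallel shared(n)
    {
        #pragma omp master
        n = omp_get_num_threads();    /* only thread 0 executes this */
        #pragma omp barrier           /* make n visible to the whole team */
        printf("thread %d sees n = %d\n", omp_get_thread_num(), n);
    }
    return 0;
}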

Synchronization Constructs – critical

Specifies a block of code that must be executed by only one thread at a time. In other words, while the code in a critical region is executing, no other thread will execute that code in parallel.

#pragma omp critical [(name)]

statement_block

Different critical regions with the same name are treated as the same region. All unnamed critical regions are treated as the same region.
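As an illustration, two differently named critical regions can run concurrently because they protect independent data; a minimal sketch (the names xlock and ylock and the variables x and y are illustrative):

#include <omp.h>

int main(void)
{
    int x = 0, y = 0;
    #pragma omp parallel shared(x, y)
    {
        #pragma omp critical (xlock)   /* serializes updates of x only */
        x = x + 1;
        #pragma omp critical (ylock)   /* serializes updates of y only */
        y = y + 1;
    }
    return 0;
}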

Example

#include <omp.h>

int main()

{

int x;

x = 0;

#pragma omp parallel shared(x)

{

#pragma omp critical

x = x + 1;

} /* end of parallel section */

}

Synchronization Constructs – flush

This directive identifies a point at which a consistent view of memory must exist, i.e., thread-visible variables are written back to memory in response to this directive.

#pragma omp flush [ (list) ]

Remember the first figure of these course notes. This directive forces the data in the data cache of each core to be written to the shared unified cache memory (and not necessarily to the main memory; that decision is made by the virtual memory system).
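A common use of flush is a hand-rolled producer/consumer handshake. The sketch below (the variables data and flag are illustrative) follows that pattern and assumes the runtime actually provides two threads:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int data = 0, flag = 0;
    #pragma omp parallel sections num_threads(2) shared(data, flag)
    {
        #pragma omp section               /* producer */
        {
            data = 42;
            #pragma omp flush(data)       /* publish data before the flag */
            flag = 1;
            #pragma omp flush(flag)
        }
        #pragma omp section               /* consumer */
        {
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)   /* re-read the flag from memory */
                ready = flag;
            }
            #pragma omp flush(data)       /* make sure data is up to date */
            printf("received %d\n", data);
        }
    }
    return 0;
}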

openMP functions about threads

#include <stdio.h>

#include <omp.h>

int main(void)

{

printf("omp_get_max_threads=%d\n",omp_get_max_threads());

omp_set_num_threads(2);

printf("omp_get_num_procs=%d\n",omp_get_num_procs());

#pragma omp parallel

printf("omp_get_thread_num=%d\n",omp_get_thread_num());

printf("omp_get_thread_num=%d\n",omp_get_thread_num());

}

omp_get_max_threads=4

omp_get_num_procs=2

omp_get_thread_num=0

omp_get_thread_num=1

omp_get_thread_num=0

omp_get_num_procs() returns the number of processors in the machine.

Synchronization – locks

#include <stdio.h>
#include <omp.h>

/* placeholder work functions, so that the fragment compiles */
int do_lots_of_work(int id)      { return id * 10; }
int do_more_lots_of_work(int id) { return id * 100; }

int main(void)
{
    int tmp, id;
    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel private(tmp, id)
    {
        id = omp_get_thread_num();
        tmp = do_lots_of_work(id);       /* tmp is private, no protection needed */
        omp_set_lock(&lck);              /* only one thread at a time past this point */
        printf("%d %d\n", id, tmp);      /* output protected by the lock */
        omp_unset_lock(&lck);
        tmp = do_more_lots_of_work(id);  /* again outside the locked region */
    }
    omp_destroy_lock(&lck);
    return 0;
}

Bibliography

Wikipedia

http://openmp.org

https://computing.llnl.gov/tutorials/openMP/

http://msdn.microsoft.com/

http://publib.boulder.ibm.com
