
Page 1: Parallel Processing (CS 667) Lecture 5: Shared Memory Parallel Programming with OpenMP*

Parallel Processing (CS 667)

Lecture 5: Shared Memory Parallel Programming with OpenMP*

Jeremy R. Johnson

Page 2: Introduction

• Objective: To further study the shared memory model of parallel programming and to introduce the OpenMP standard for shared memory parallel programming.

• Topics
  – OpenMP vs. Pthreads
    • hello_pthreads.c
    • hello_openmp.c
  – Parallel regions and execution model
  – Data parallelism with loops
  – Shared vs. private variables
  – Scheduling and chunk size
  – Synchronization and reduction variables
  – Functional parallelism with parallel sections
  – Case studies

Page 3: OpenMP

• Extension to FORTRAN and C/C++
  – Uses directives (comments in FORTRAN, pragmas in C/C++)
    • ignored without compiler support (see the sketch below)
    • some library support required
• Shared memory model
  – parallel regions
  – loop-level parallelism
  – implicit thread model
  – communication via shared address space
  – private vs. shared variables (declaration)
  – explicit synchronization via directives (e.g. critical)
  – library routines for returning thread information (e.g. omp_get_num_threads(), omp_get_thread_num())
  – environment variables used to provide system info (e.g. OMP_NUM_THREADS)
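A minimal sketch of this directive style (standard OpenMP, not taken from the slides): the pragma parallelizes the loop when compiled with OpenMP support, and is simply ignored otherwise.

#include <stdio.h>

#define N 8

int main(void)
{
    int i, a[N];

    /* Without OpenMP compiler support (e.g. gcc without -fopenmp),
       this pragma is ignored and the loop runs serially, with the
       same result. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = i * i;

    for (i = 0; i < N; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}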

Page 4: Benefits

• Provides incremental parallelism

• Small increase in code size

• Simpler model than message passing

• Easier to use than thread library

• With hardware and compiler support, exploits finer granularity than message passing.

Page 5: Further Information

• Adopted as a standard in 1997
  – Initiated by SGI
• www.openmp.org
• computing.llnl.gov/tutorials/openMP

• Chandra, Dagum, Kohr, Maydan, McDonald, and Menon, Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2001.

• Chapman, Jost, and Van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, The MIT Press, 2008.

Page 6: Shared vs. Distributed Memory

[Figure: two diagrams. Shared memory: processors P0, P1, ..., Pn all connected to a single shared Memory. Distributed memory: processors P0, P1, ..., Pn, each with a local memory M0, M1, ..., Mn, connected by an Interconnection Network.]

Page 7: Shared Memory Programming Model

• Shared memory programming does not require physically shared memory so long as there is support for logically shared memory (in either hardware or software)

• With logically shared memory, the cost of a memory access may differ depending on the physical location of the data.

• UMA - uniform memory access
  – SMP - symmetric multiprocessor
  – typically memory connected to processors via a bus
• NUMA - non-uniform memory access
  – typically physically distributed memory connected via an interconnection network

Page 8: hello_openmp.c

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv)
{
  int n;

  if (argc > 1) {
    n = atoi(argv[1]);
    omp_set_num_threads(n);
  }
  /* Called in the serial region, so this reports a team size of 1. */
  printf("Number of threads = %d\n", omp_get_num_threads());
  #pragma omp parallel
  {
    int id = omp_get_thread_num();
    printf("Hello World from %d\n", id);
    if (id == 0)
      printf("Number of threads = %d\n", omp_get_num_threads());
  }
  exit(0);
}

Page 9: Compiling & Running hello_openmp

% gcc –fopenmp hello_openmp.c –o hello

% ./hello 4

Number of threads = 1

Hello World from 1

Hello World from 0

Hello World from 3

Number of threads = 4

Hello World from 2

The order of the print statements from the parallel region is nondeterministic. The first "Number of threads" line reports 1 because omp_get_num_threads() is called in the serial region, before the team of threads is created.
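To query how many threads the next parallel region would use while still in the serial region, the standard routine omp_get_max_threads() can be called instead; a minimal sketch (not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);
    /* Serial region: the current team contains only the master thread. */
    printf("current team size = %d\n", omp_get_num_threads()); /* prints 1 */
    /* Upper bound on the team size of the next parallel region. */
    printf("max threads = %d\n", omp_get_max_threads());       /* prints 4 */
    return 0;
}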

Page 10: Execution Model

[Figure: fork-join execution model. The master thread runs alone until it reaches a parallel region, where a team of master and slave threads is created (implicit thread creation, fork); at the end of the region the threads synchronize at an implicit barrier and join, and the master thread continues alone.]
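Mapped onto code, the fork and join points sit at the braces of the parallel region; a minimal sketch using the routines from hello_openmp.c:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("serial: master thread only\n");

    #pragma omp parallel    /* implicit fork: team of threads created */
    {
        printf("parallel region: thread %d\n", omp_get_thread_num());
    }                       /* implicit barrier, then join */

    printf("serial again: master thread only\n");
    return 0;
}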

Page 11: Explicit Barrier

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv)
{
  int n;

  if (argc > 1) {
    n = atoi(argv[1]);
    omp_set_num_threads(n);
  }
  printf("Number of threads = %d\n", omp_get_num_threads());
  #pragma omp parallel
  {
    int id = omp_get_thread_num();
    printf("Hello World from %d\n", id);
    /* Every thread must reach this point before any can continue. */
    #pragma omp barrier
    if (id == 0)
      printf("Number of threads = %d\n", omp_get_num_threads());
  }
  exit(0);
}

Page 12: Output with Barrier

% ./hellob 4

Number of threads = 1

Hello World from 1

Hello World from 0

Hello World from 2

Hello World from 3

Number of threads = 4

The order of the "Hello World" print statements is nondeterministic; however, because of the barrier, the final "Number of threads" print statement always comes at the end.

Page 13: hello_pthreads.c

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <errno.h>

#define MAXTHREADS 32

int main(int argc, char **argv)
{
  int error, i, n;
  void hello(int *pid);
  pthread_t tid[MAXTHREADS];
  int pid[MAXTHREADS];

  if (argc > 1) {
    n = atoi(argv[1]);
    if (n > MAXTHREADS) {
      printf("Too many threads\n");
      exit(1);
    }
    pthread_setconcurrency(n);
  }
  printf("Number of threads = %d\n", pthread_getconcurrency());
  /* Explicit fork: one pthread_create call per thread. */
  for (i = 0; i < n; i++) {
    pid[i] = i;
    error = pthread_create(&tid[i], NULL, (void *(*)(void *))hello, &pid[i]);
  }
  /* Explicit join: wait for each thread to finish. */
  for (i = 0; i < n; i++) {
    error = pthread_join(tid[i], NULL);
  }
  exit(0);
}

Page 14: hello_pthreads.c (continued)

void hello(int *pid)

{

pthread_t tid;

tid = pthread_self();

printf("Hello World from %d (tid = %u)\n",*pid,(unsigned int) tid);

if (*pid == 0)

printf("Number of threads = %d\n",pthread_getconcurrency());

}

% gcc -pthread hello.c -o hello

% ./hello 4

Number of threads = 4

Hello World from 0 (tid = 1832728912)

Hello World from 1 (tid = 1824336208)

Number of threads = 4

Hello World from 3 (tid = 1807550800)

Hello World from 2 (tid = 1815943504)

The order of the print statements is nondeterministic

Page 15: Types of Parallelism

• Data parallelism: threads execute the same instructions, but on different data.

• Functional parallelism: threads execute different instructions, and can read the same data but should write different data.

[Figure: functional parallelism. Four independent functions F1, F2, F3, and F4 execute concurrently on different threads.]

Page 16: Parallel Loop

Serial version:

int a[1000], b[1000];

int main()
{
  int i;
  int N = 1000;

  for (i = 0; i < N; i++) {
    a[i] = i;
    b[i] = N - i;
  }
  for (i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
  }
}

OpenMP version:

int a[1000], b[1000];

int main()
{
  int i;
  int N = 1000;

  // Serial initialization
  for (i = 0; i < N; i++) {
    a[i] = i;
    b[i] = N - i;
  }
  // Parallel loop: iterations are divided among the threads
  #pragma omp parallel for shared(a,b) private(i) schedule(static)
  for (i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
  }
}

Page 17: Scheduling of Parallel Loop

[Figure: stripmining. The iterations of the vector addition a + b are interleaved across the threads: thread 0 takes elements 0, Nthreads, 2*Nthreads, ...; thread 1 takes elements 1, Nthreads+1, ...; and so on through thread Nthreads-1.]

Page 18: Implementation of Parallel Loop

void vadd(int *id)
{
  int i;
  /* Stripmined loop: thread *id handles elements *id, *id + numthreads, ... */
  for (i = *id; i < N; i += numthreads) {
    a[i] = a[i] + b[i];
  }
}

for (i = 0; i < numthreads; i++) {
  id[i] = i;
  error = pthread_create(&tid[i], NULL, (void *(*)(void *))vadd, &id[i]);
}
for (i = 0; i < numthreads; i++) {
  error = pthread_join(tid[i], NULL);
}
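For reference, a self-contained version of this stripmined implementation might look as follows; the globals a, b, N, and numthreads, and the thread count of 4, are assumptions filled in to match the fragments above:

#include <stdio.h>
#include <pthread.h>

#define N 1000
#define NUMTHREADS 4

int a[N], b[N];
int numthreads = NUMTHREADS;

/* Each thread handles elements id, id + numthreads, id + 2*numthreads, ... */
void *vadd(void *arg)
{
    int i, id = *(int *)arg;
    for (i = id; i < N; i += numthreads)
        a[i] = a[i] + b[i];
    return NULL;
}

int main(void)
{
    pthread_t tid[NUMTHREADS];
    int id[NUMTHREADS], i;

    for (i = 0; i < N; i++) { a[i] = i; b[i] = N - i; }

    for (i = 0; i < numthreads; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, vadd, &id[i]);
    }
    for (i = 0; i < numthreads; i++)
        pthread_join(tid[i], NULL);

    /* Every element should now equal N. */
    printf("a[0] = %d, a[N-1] = %d\n", a[0], a[N-1]);
    return 0;
}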

Page 19: Scheduling Chunks of Parallel Loop

[Figure: chunked scheduling. The arrays a and b are divided into chunks of CHUNK consecutive elements; chunk 0 goes to thread 0, chunk 1 to thread 1, ..., chunk Nthreads-1 to thread Nthreads-1, and the assignment then wraps around to thread 0 again.]

Page 20: Implementation of Chunking

#pragma omp parallel for shared(a,b) private(i) schedule(static,CHUNK)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

void vadd(int *id)
{
  int i, j;
  /* Each thread takes every numthreads-th chunk of CHUNK elements;
     the i + j < N test guards the last chunk when N is not a
     multiple of CHUNK. */
  for (i = *id * CHUNK; i < N; i += numthreads * CHUNK) {
    for (j = 0; j < CHUNK && i + j < N; j++)
      a[i+j] = a[i+j] + b[i+j];
  }
}
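The schedule clause also accepts a dynamic kind (standard OpenMP, not shown on the slides), where chunks of CHUNK iterations are handed to threads on demand; this can balance load when iteration costs vary. A sketch, with the wrapper function vadd_dynamic hypothetical:

#include <omp.h>

#define N     1000
#define CHUNK 100

int a[N], b[N];

void vadd_dynamic(void)
{
  int i;
  /* Idle threads grab the next available chunk instead of following
     the fixed round-robin assignment of schedule(static,CHUNK). */
  #pragma omp parallel for shared(a,b) private(i) schedule(dynamic,CHUNK)
  for (i = 0; i < N; i++)
    a[i] = a[i] + b[i];
}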

Page 21: Race Condition

int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  omp_set_num_threads(numcounters);
  for (i = 0; i < numcounters*limit; i++)
    x[i] = 1;

  #pragma omp parallel for schedule(static) private(i) shared(sum,x)
  for (i = 0; i < numcounters*limit; i++) {
    /* Race: unsynchronized read-modify-write of the shared variable
       sum; updates from different threads can be lost. */
    sum = sum + x[i];
    if (i == 0)
      printf("num threads = %d\n", omp_get_num_threads());
  }

Page 22: Critical Sections

int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  #pragma omp parallel for schedule(static) private(i) shared(sum,x)
  for (i = 0; i < numcounters*limit; i++) {
    /* The named critical section serializes the updates to sum,
       removing the race at the cost of serializing every iteration. */
    #pragma omp critical(sum)
    sum = sum + x[i];
  }

Page 23: Reduction Variables

int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  /* Each thread accumulates into a private copy of sum; the private
     copies are combined with + at the end of the loop. */
  #pragma omp parallel for schedule(static) private(i) shared(x) reduction(+:sum)
  for (i = 0; i < numcounters*limit; i++) {
    sum = sum + x[i];
  }

Page 24: Reduction

[Figure: reduction. The array x[] is split among the threads; each thread adds its portion into a private partial sum, and the partial sums are then added together to produce the total sum.]

Page 25: Implementing Reduction

#pragma omp parallel shared(sum,x)
{
  int i;
  int localsum = 0;
  int id;

  id = omp_get_thread_num();
  /* Stripmined accumulation into a private local sum. */
  for (i = id; i < numcounters*limit; i += numcounters) {
    localsum = localsum + x[i];
  }
  /* Only the final combination needs to be protected. */
  #pragma omp critical(sum)
  sum = sum + localsum;
}

Page 26: Functional Parallelism Example

#define N 1000  /* size assumed, matching the earlier examples */

int main()
{
  int i;
  double a[N], b[N], c[N], d[N];

  // Parallel function: the two sections run concurrently on different threads
  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections
    {
      #pragma omp section
      for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

      #pragma omp section
      for (i = 0; i < N; i++)
        d[i] = a[i] * b[i];
    }
  }
}
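As with parallel for, the parallel and sections directives can be fused into the single combined construct parallel sections (standard OpenMP); a sketch of the same example, where the wrapper function compute_sections and the value of N are assumptions:

#define N 1000

double a[N], b[N], c[N], d[N];

void compute_sections(void)
{
    int i;
    #pragma omp parallel sections shared(a,b,c,d) private(i)
    {
        #pragma omp section        /* one thread computes the sums */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        #pragma omp section        /* another thread computes the products */
        for (i = 0; i < N; i++)
            d[i] = a[i] * b[i];
    }
}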