Post on 26-Jan-2021
-
Parallel Programming Patterns
Moreno Marzolla
Dip. di Informatica—Scienza e Ingegneria (DISI)
Università di Bologna
http://www.moreno.marzolla.name/
-
Parallel Programming Patterns 2
-
Parallel Programming Patterns 3
What is a pattern?
● A design pattern is “a general solution to a recurring engineering problem”
● A design pattern is not a ready-made solution to a given problem...
● ...rather, it is a description of how a certain kind of problem can be solved
-
Parallel Programming Patterns 4
Parallel Programming Patterns
● Embarrassingly Parallel
● Partition
● Master-Worker
● Stencil
● Reduce
● Scan
-
Parallel Programming Patterns 5
Example
● Building a bridge across a river
● You do not “invent” a brand new type of bridge each time
– Instead, you adapt an already existing type of bridge
-
Parallel Programming Patterns 6
Example
-
Parallel Programming Patterns 9
Embarrassingly Parallel
● Applies when the computation can be decomposed into independent tasks that require little or no communication
● Examples:
– Vector sum
– Mandelbrot set
– 3D rendering
– Brute force password cracking
– ...
[Figure: vector sum c[] = a[] + b[], with chunks of the arrays assigned to Processor 0, Processor 1 and Processor 2]
-
Parallel Programming Patterns 10
Partition
● The input data space (in short, domain) is split into disjoint regions called partitions
● Each processor operates on one partition
● This pattern is particularly useful when the application exhibits locality of reference
– i.e., when processors can refer to their own partition only and need little or no communication with other processors
-
Parallel Programming Patterns 11
Example
● Matrix-vector product Ax = b
● Matrix A[][] is partitioned into P horizontal blocks
● Each processor
– operates on one block of A[][] and on a full copy of x[]
– computes a portion of the result b[]
[Figure: A[][] split into four row blocks (Proc 0..Proc 3), multiplied by x[] to produce b[]]
-
Parallel Programming Patterns 12
Regular vs Irregular partitioning
● Regular
– the domain is split into partitions of roughly the same size and shape
● Irregular
– partitions do not necessarily have the same size or shape
[Figure: a domain split into four irregular partitions P0..P3]
Source: http://www.cdac.in/HTmL/events/beta-test/archives/promcore-2008/mpi-1x-promcore-2008/partial-diff-eqns-solvers-mpi.html
-
Parallel Programming Patterns 13
Fine grained vs Coarse grained partitioning
● Fine-grained Partitioning
– Better load balancing, especially if combined with the master-worker pattern (see later)
– If granularity is too fine, the computation / communication ratio might become too low (communication dominates computation)
● Coarse-grained Partitioning
– In general improves the computation / communication ratio
– However, it might cause load imbalance
● The "optimal" granularity is sometimes problem-dependent; in other cases the user must choose which granularity to use
[Figure: timelines comparing computation and communication time for the two granularities]
-
Parallel Programming Patterns 14
Example: Mandelbrot set
● The Mandelbrot set is the set of points c on the complex plane such that the sequence z_n(c), defined as

  z_n(c) = 0               if n = 0
  z_n(c) = z_{n-1}(c)^2 + c  otherwise

does not diverge when n → +∞
-
Parallel Programming Patterns 15
Mandelbrot set in color
● If the modulus of z_n(c) does not exceed 2 after nmax iterations, the pixel is black (the point is assumed to be part of the Mandelbrot set)
● Otherwise, the color depends on the number of iterations required for the modulus of z_n(c) to become > 2
-
Parallel Programming Patterns 16
Pseudocode

maxit = 1000
for each point (cx, cy) {
    x = 0; y = 0; it = 0;
    while ( it < maxit AND x*x + y*y ≤ 2*2 ) {
        xnew = x*x - y*y + cx;
        ynew = 2*x*y + cy;
        x = xnew; y = ynew;
        it = it + 1;
    }
    plot(cx, cy, it);
}

Embarrassingly parallel structure: the color of each pixel can be computed independently from other pixels
Source: http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
-
Parallel Programming Patterns 17
Mandelbrot set
● A regular partitioning can result in uneven load distribution
– Black pixels require maxit iterations
– Other pixels require fewer iterations
-
Parallel Programming Patterns 18
Load balancing
● Ideally, each processor should perform the same amount of work
– If the tasks synchronize at the end of the computation, the execution time will be that of the slowest task
[Figure: busy/idle timeline of Tasks 0..3; faster tasks sit idle until the slowest one finishes]
-
Parallel Programming Patterns 19
Load balancing howto
● The workload is balanced if each processor performs more or less the same amount of work
● How to achieve load balancing:
– Use fine-grained partitioning
  ● ...but beware of the possible communication overhead if the tasks need to communicate
– Use dynamic task allocation (master-worker paradigm)
  ● ...but beware that dynamic task allocation might incur higher overhead than static task allocation
-
Parallel Programming Patterns 20
Master-worker paradigm (process farm, work pool)
● Apply a fine-grained partitioning
– number of tasks >> number of cores
● The master assigns a task to the first available worker
[Figure: a master distributing a bag of tasks of possibly different duration to Worker 0 .. Worker P-1]
-
Parallel Programming Patterns 21
Choosing the partition size
● Too small = higher scheduling overhead
● Too large = unbalanced workload
-
Parallel Programming Patterns 22
Stencils
● Stencil computations involve a grid whose values are updated according to a fixed pattern called stencil
– Example: the Gaussian smoothing of an image updates the color of each pixel with the weighted average of the previous colors of the 5 × 5 neighborhood
[Figure: 5 × 5 Gaussian convolution kernel]
-
Parallel Programming Patterns 23
2D Stencils
● 5-point 2-axis 2D stencil (von Neumann neighborhood)
● 9-point 2-axis 2D stencil
● 9-point 1-plane 2D stencil (Moore neighborhood)
-
Parallel Programming Patterns 24
2D Stencils
● 2D stencil computations usually employ two grids to keep the current and next values
– Values are read from the current grid
– New values are written to the next grid
– current and next grid are exchanged at the end of each phase
-
Parallel Programming Patterns 25
Ghost Cells
● How do we handle cells on the border of the domain?
– We might assume that cells outside the border have some fixed, application-dependent value, or
– We may assume periodic boundary conditions, where sides are “glued” together to form a torus
● In either case, we extend the domain with ghost cells, so that cells on the border do not require any special treatment
[Figure: a rectangular domain surrounded by a ring of ghost cells]
https://blender.stackexchange.com/questions/39735/how-could-i-animate-a-plane-into-a-pipe-and-then-a-pipe-into-a-torus
-
Parallel Programming Patterns 26
Parallelizing stencil computations
● Computing the next grid from the current one has embarrassingly parallel structure

Initialize current grid
while (!terminated) {
    Fill ghost cells
    Compute next grid          // embarrassingly parallel
    Exchange current and next grids
}
-
Parallel Programming Patterns 27
Reduce
● A reduction is the application of an associative binary operator (e.g., sum, product, min, max...) to the elements of an array [x0, x1, … xn-1]
– sum-reduce( [x0, x1, … xn-1] ) = x0 + x1 + … + xn-1
– min-reduce( [x0, x1, … xn-1] ) = min { x0, x1, … xn-1 }
– …
● A reduction can be realized in O(log2 n) parallel steps
-
Parallel Programming Patterns 28
Example: sum-reduce
[Figure: a 16-element array to be summed]
-
Parallel Programming Patterns 29
Example: sum-reduce
12-52416-512-81174-231
3-669814-22
-
Parallel Programming Patterns 30
Example: sum-reduce
12-52416-512-81174-231
3-669814-22
118411
-
Parallel Programming Patterns 31
Example: sum-reduce
12-52416-512-81174-231
3-669814-22
118411
1519
-
Parallel Programming Patterns 32
Example: sum-reduce
12-52416-512-81174-231
3-669814-22
118411
1519
34
-
Parallel Programming Patterns 33
Example: sum-reduce
[Figure: pairwise sums form a tree; each level halves the number of partial sums, and the final sum is 34]
int d, i;
/* compute largest power of two < n */
for (d=1; 2*d < n; d *= 2) ;
/* do reduction */
for ( ; d > 0; d /= 2) {
    for (i=0; i < d; i++) {        /* in parallel */
        if (i + d < n)
            x[i] += x[i + d];
    }
}
/* the result is in x[0] */
-
Parallel Programming Patterns 34
Scan (Prefix Sum)
● A scan computes all prefixes of an array [x0, x1, … xn-1] using a given associative binary operator op (e.g., sum, product, min, max...)

  [y0, y1, … yn-1] = inclusive-scan( op, [x0, x1, … xn-1] )

  where
  y0 = x0
  y1 = x0 op x1
  y2 = x0 op x1 op x2
  …
  yn-1 = x0 op x1 op … op xn-1
-
Parallel Programming Patterns 35
Scan (Prefix Sum)
● A scan computes all prefixes of an array [x0, x1, … xn-1] using a given associative binary operator op (e.g., sum, product, min, max...)

  [y0, y1, … yn-1] = exclusive-scan( op, [x0, x1, … xn-1] )

  where
  y0 = 0   (the neutral element of the binary operator: zero for sum, 1 for product, ...)
  y1 = x0
  y2 = x0 op x1
  …
  yn-1 = x0 op x1 op … op xn-2
-
Parallel Programming Patterns 36
Example
x[] =                  [ 1  -3  12   6   2  -3   7 -10 ]
inclusive-scan(+, x) = [ 1  -2  10  16  18  15  22  12 ]
exclusive-scan(+, x) = [ 0   1  -2  10  16  18  15  22 ]
-
Parallel Programming Patterns 39
Serial implementation
void inclusive_scan(int *x, int *s, int n) /* n must be > 0 */
{
    int i;
    s[0] = x[0];
    for (i=1; i<n; i++) {
        s[i] = s[i-1] + x[i];
    }
}

void exclusive_scan(int *x, int *s, int n) /* n must be > 0 */
{
    int i;
    s[0] = 0;
    for (i=1; i<n; i++) {
        s[i] = s[i-1] + x[i-1];
    }
}
-
Parallel Programming Patterns 40
Exclusive scan: Up-sweep
x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[7]
x[0] ∑x[0..1] x[2] ∑x[2..3] x[4] ∑x[4..5] x[6] ∑x[6..7]
x[0] ∑x[0..1] x[2] ∑x[0..3] x[4] ∑x[4..5] x[6] ∑x[4..7]
x[0] ∑x[0..1] x[2] ∑x[0..3] x[4] ∑x[4..5] x[6] ∑x[0..7]
for ( d=1; d<n; d <<= 1 ) {
    for ( k=0; k<n; k += 2*d ) {   /* in parallel */
        x[k+2*d-1] = x[k+d-1] + x[k+2*d-1];
    }
}
-
Parallel Programming Patterns 41
Exclusive scan: Down-sweep

x[0] ∑x[0..1] x[2] ∑x[0..3] x[4] ∑x[4..5] x[6] ∑x[0..7]
x[0] ∑x[0..1] x[2] ∑x[0..3] x[4] ∑x[4..5] x[6] 0          (zero the last element)
x[0] ∑x[0..1] x[2] 0 x[4] ∑x[4..5] x[6] ∑x[0..3]
x[0] 0 x[2] ∑x[0..1] x[4] ∑x[0..3] x[6] ∑x[0..5]
0 x[0] ∑x[0..1] ∑x[0..2] ∑x[0..3] ∑x[0..4] ∑x[0..5] ∑x[0..6]

x[n-1] = 0;
for ( d = n/2; d > 0; d >>= 1 ) {
    for ( k=0; k<n; k += 2*d ) {   /* in parallel */
        int t = x[k+d-1];
        x[k+d-1] = x[k+2*d-1];
        x[k+2*d-1] = t + x[k+2*d-1];
    }
}
-
Parallel Programming Patterns 42
Example: Line of Sight
● n peaks of heights h[0], … h[n-1]; the distance between consecutive peaks is one
● Which peaks are visible from peak 0?
[Figure: a skyline of peaks h[0]..h[7]; some peaks are visible from peak 0, others are hidden behind higher ones]
-
Parallel Programming Patterns 52
Serial algorithm
● For each i = 0, … n-1
– Let a[i] be the slope of the line connecting peak 0 to peak i
– a[0] ← -∞
– a[i] ← arctan( ( h[i] – h[0] ) / i ), if i > 0
● For each i = 0, … n-1
– amax[0] ← -∞
– amax[i] ← max {a[0], a[1], … a[i-1]}, if i > 0
● For each i = 0, … n-1
– If a[i] ≥ amax[i] then peak i is visible
– otherwise peak i is not visible
-
Parallel Programming Patterns 53
Serial algorithm
bool[0..n-1] Line-of-sight( double h[0..n-1] )
    bool v[0..n-1]
    double a[0..n-1], amax[0..n-1]
    a[0] ← -∞
    for i ← 1 to n-1 do
        a[i] ← arctan( ( h[i] – h[0] ) / i )
    endfor
    amax[0] ← -∞
    for i ← 1 to n-1 do
        amax[i] ← max{ a[i-1], amax[i-1] }
    endfor
    for i ← 0 to n-1 do
        v[i] ← ( a[i] ≥ amax[i] )
    endfor
    return v
-
Parallel Programming Patterns 54
Serial algorithm
bool[0..n-1] Line-of-sight( double h[0..n-1] )
    bool v[0..n-1]
    double a[0..n-1], amax[0..n-1]
    a[0] ← -∞
    for i ← 1 to n-1 do                      // embarrassingly parallel
        a[i] ← arctan( ( h[i] – h[0] ) / i )
    endfor
    amax[0] ← -∞
    for i ← 1 to n-1 do
        amax[i] ← max{ a[i-1], amax[i-1] }
    endfor
    for i ← 0 to n-1 do                      // embarrassingly parallel
        v[i] ← ( a[i] ≥ amax[i] )
    endfor
    return v
-
Parallel Programming Patterns 55
Parallel algorithm
bool[0..n-1] Parallel-line-of-sight( double h[0..n-1] )
    bool v[0..n-1]
    double a[0..n-1], amax[0..n-1]
    a[0] ← -∞
    for i ← 1 to n-1 do in parallel
        a[i] ← arctan( ( h[i] – h[0] ) / i )
    endfor
    amax ← exclusive-scan( “max”, a )
    for i ← 0 to n-1 do in parallel
        v[i] ← ( a[i] ≥ amax[i] )
    endfor
    return v
-
Parallel Programming Patterns 56
Conclusions
● A parallel programming pattern defines:
– a partitioning of the input data
– a communication structure among parallel tasks
● Parallel programming patterns can help to define efficient algorithms
– Many problems can be solved by applying one or more known patterns