Parallelizing Iterative Computation for Multiprocessor Architectures
Peter Cappello
What is the problem?
Create programs for multiprocessor units (MPUs):
– Multicore processors
– Graphics processing units (GPU)
For whom is it a problem? Compiler designer
Application program → Compiler → Executable
Target: CPU (EASY)
For whom is it a problem? Compiler designer
Application program → Compiler → Executable
Target: MPU (HARD)
For whom is it a problem? Application programmer
Application program → Compiler → Executable
Target: MPU
Complex machine consequences
• Programmer needs to be highly skilled
• Programming is error-prone
These consequences imply . . .
Increased parallelism ⇒ increased development cost!
Amdahl’s Law
The speedup of a program is bounded by its inherently sequential part.
(http://en.wikipedia.org/wiki/Amdahl's_law)
If
– A program needs 20 hours using a CPU
– 1 hour cannot be parallelized
Then
– Minimum execution time ≥ 1 hour.
– Maximum speedup ≤ 20.
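The bound above follows from the usual statement of Amdahl's law: with a sequential fraction s and p processors, the speedup is 1 / (s + (1 - s) / p). A minimal Java sketch using the slide's numbers (20 hours in total, 1 hour sequential); the class and method names are illustrative and not from the slides.

```java
public class AmdahlSketch {
    // Speedup predicted by Amdahl's law for a given sequential fraction
    // of the work and a given number of processors.
    static double speedup(double sequentialFraction, int processors) {
        return 1.0 / (sequentialFraction + (1.0 - sequentialFraction) / processors);
    }

    public static void main(String[] args) {
        double s = 1.0 / 20.0; // 1 of the 20 hours cannot be parallelized
        for (int p : new int[] {1, 2, 16, 1024}) {
            System.out.printf("p = %4d  speedup = %.2f%n", p, speedup(s, p));
        }
        // As p grows, the speedup approaches 1 / s = 20 but never exceeds it.
    }
}
```

Even with 1024 processors the predicted speedup stays below 20, which is the slide's point.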
[Figure from http://en.wikipedia.org/wiki/Amdahl's_law]
Parallelization opportunities
Scalable parallelism resides in 2 sequential program constructs:
• Divide-and-conquer recursion
• Iterative statements (for)
2 schools of thought
• Create a general solution
(Address everything somewhat well)
• Create a specific solution
(Address one thing very well)
Focus on iterative statements (for)
float[] x = new float[n];
float[] b = new float[n];
float[][] a = new float[n][n];
. . .
for ( int i = 0; i < n; i++ )
{
    b[i] = 0;
    for ( int j = 0; j < n; j++ )
        b[i] += a[i][j]*x[j];
}
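Each b[i] in the loop above depends only on row i of a and on x, so the iterations of the outer loop are independent and can run concurrently. A minimal sketch using Java parallel streams, which is one of several possible approaches and not necessarily the one the talk has in mind; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelMatVec {
    // Computes b = a * x. Rows are processed in parallel, which is safe
    // because each b[i] is written by exactly one outer iteration.
    static float[] multiply(float[][] a, float[] x) {
        int n = a.length;
        float[] b = new float[n];
        IntStream.range(0, n).parallel().forEach(i -> {
            float sum = 0;
            for (int j = 0; j < x.length; j++) {
                sum += a[i][j] * x[j];
            }
            b[i] = sum;
        });
        return b;
    }

    public static void main(String[] args) {
        float[][] a = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        float[] x = {1, 0, -1};
        System.out.println(Arrays.toString(multiply(a, x)));
    }
}
```

This parallelizes only the outer loop; the inner sum over j stays sequential within each row.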
Matrix-Vector Product
b = Ax, illustrated with a 3×3 matrix, A.
b1 = a11*x1 + a12*x2 + a13*x3
b2 = a21*x1 + a22*x2 + a23*x3
b3 = a31*x1 + a32*x2 + a33*x3
[Figure: the 3×3 matrix-vector product drawn as a grid of nodes a11 to a33; x1, x2, x3 are passed from node to node, and b1, b2, b3 are the outputs.]
[Figure: the same grid of nodes a11 to a33 with x1, x2, x3 passed between nodes; axes labelled TIME and SPACE.]
[Figure: the same grid of nodes, again with axes labelled SPACE and TIME.]
[Figure: a further layout of the nodes a11 to a33 and the values x1, x2, x3, with axes labelled SPACE and TIME.]
Matrix Product
C = AB, illustrated with 2×2 matrices.
c11 = a11*b11 + a12*b21
c12 = a11*b12 + a12*b22
c21 = a21*b11 + a22*b21
c22 = a21*b12 + a22*b22
[Figure: the 2×2 matrix product drawn in a three-dimensional index space with axes row, col, and k; the a and b entries are passed between nodes.]
[Figure: the same matrix product with axes labelled T, S, and S.]
[Figure: another view of the matrix product with axes labelled T, S, and S.]
Declaring an iterative computation
• Index set
• Data network
• Functions
• Space-time embedding
Declaring an Index set
I1: 1 ≤ i ≤ j ≤ n
I2: 1 ≤ i ≤ n, 1 ≤ j ≤ n
[Figure: index sets I1 and I2, each drawn on axes i and j.]
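The two index sets can be written down directly as predicates over (i, j). A minimal Java sketch; representing an index set as a BiPredicate is an assumption made for illustration, and the class and method names are not from the slides.

```java
import java.util.function.BiPredicate;

public class IndexSets {
    // I1: upper triangular index set, 1 <= i <= j <= n.
    static BiPredicate<Integer, Integer> i1(int n) {
        return (i, j) -> 1 <= i && i <= j && j <= n;
    }

    // I2: full square index set, 1 <= i <= n and 1 <= j <= n.
    static BiPredicate<Integer, Integer> i2(int n) {
        return (i, j) -> 1 <= i && i <= n && 1 <= j && j <= n;
    }

    // Counts the points of an index set inside the box [1, n] x [1, n].
    static int size(BiPredicate<Integer, Integer> indexSet, int n) {
        int count = 0;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                if (indexSet.test(i, j)) count++;
        return count;
    }

    public static void main(String[] args) {
        int n = 4;
        System.out.println("|I1| = " + size(i1(n), n)); // n(n+1)/2 points
        System.out.println("|I2| = " + size(i2(n), n)); // n*n points
    }
}
```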
Declaring a Data network
D1:
x: [ -1, 0];
b: [ 0, -1];
a: [ 0, 0];
D2:
x: [ -1, 0];
b: [ -1, -1];
a: [ 0, -1];
[Figure: data networks D1 and D2, with links labelled x, b, and a.]
Declaring an Index set + Data network
I1: 1 ≤ i ≤ j ≤ n
D1:
x: [ -1, 0];
b: [ 0, -1];
a: [ 0, 0];
[Figure: index set I1 on axes i and j, with data links x, b, and a.]
Declaring the Functions
F1:
float x’ (float x) { return x; }
float b’ (float b, float x, float a) { return b + a*x; }
F2:
char x’ (char x) { return x; }
boolean b’ (boolean b, char x, char a) { return b && a == x; }
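One plausible reading of these declarations, stated here as an assumption because the slides do not spell out the semantics, is that a data-network vector d for a variable means its value at index p is taken from index p + d. Under that reading, D1 with the float functions above passes x along i, accumulates b along j, and keeps a local to each point, which yields b = Ax on the full index set I2. A sequential Java sketch; the boundary values (x entering at i = 0, b starting at 0 for j = 0) and all names are illustrative choices.

```java
import java.util.Arrays;

public class RecurrenceSketch {
    // The float functions from the slide: x passes through, b accumulates a*x.
    static float xNext(float x) { return x; }
    static float bNext(float b, float x, float a) { return b + a * x; }

    // Evaluates the recurrence over 1 <= i, j <= n. Arrays are 1-based to
    // match the index set; row and column 0 hold the assumed boundary values.
    static float[] evaluate(float[][] a, float[] x0) {
        int n = x0.length;
        float[][] x = new float[n + 1][n + 1];
        float[][] b = new float[n + 1][n + 1];
        for (int j = 1; j <= n; j++) x[0][j] = x0[j - 1]; // x enters at i = 0
        // b[i][0] is already 0, so every sum starts empty at j = 0.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                x[i][j] = xNext(x[i - 1][j]);                           // x: [-1, 0]
                b[i][j] = bNext(b[i][j - 1], x[i][j], a[i - 1][j - 1]); // b: [0, -1], a: [0, 0]
            }
        }
        float[] result = new float[n];
        for (int i = 1; i <= n; i++) result[i - 1] = b[i][n]; // b leaves at j = n
        return result;
    }

    public static void main(String[] args) {
        float[][] a = { {1, 2}, {3, 4} };
        float[] x = {5, 6};
        System.out.println(Arrays.toString(evaluate(a, x)));
    }
}
```

The loop order here is just one valid sequential order; any order that respects the two dependence directions gives the same result.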
Declaring a Spacetime embedding
E1:
– space = -i + j
– time = i + j
E2:
– space1 = i
– space2 = j
– time = i + j
[Figures: E1 drawn on axes time and space; E2 drawn on axes time, space1, and space2.]
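To see what E1 does, the sketch below applies it to the full index set 1 ≤ i, j ≤ n and groups the points by time step, checking that no processor (space coordinate) is asked to handle two points at the same time. This is an illustrative Java sketch; the class name, method names, and the choice of a TreeMap are assumptions, not part of the slides.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class SpacetimeSketch {
    // E1 from the slide.
    static int space(int i, int j) { return -i + j; }
    static int time(int i, int j) { return i + j; }

    // Maps each time step to the set of processors busy at that step.
    // Throws if two index points land on the same (space, time) pair.
    static TreeMap<Integer, TreeSet<Integer>> schedule(int n) {
        TreeMap<Integer, TreeSet<Integer>> busy = new TreeMap<>();
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                boolean fresh = busy
                        .computeIfAbsent(time(i, j), t -> new TreeSet<>())
                        .add(space(i, j));
                if (!fresh) {
                    throw new IllegalStateException("two points share a processor and a time step");
                }
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, TreeSet<Integer>> e : schedule(3).entrySet()) {
            System.out.println("time " + e.getKey() + ": processors " + e.getValue());
        }
    }
}
```

For an n×n index set this embedding uses 2n - 1 time steps and 2n - 1 distinct space coordinates, since time ranges over 2..2n and space over -(n-1)..(n-1).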
Declaring an iterative computation: Upper triangular matrix-vector product
UTMVP = (I1, D1, F1, E1)
[Figure: spacetime diagram with axes time and space.]
Declaring an iterative computation: Full matrix-vector product
FullMVP = (I2, D1, F1, E1)
[Figure: spacetime diagram with axes time and space.]
Declaring an iterative computation: Convolution (polynomial product)
Convolution = (I2, D2, F1, E1)
[Figure: spacetime diagram with axes time and space.]
Declaring an iterative computation: String pattern matching
StringMatch = (I2, D2, F2, E1)
[Figure: spacetime diagram with axes time and space.]
Declaring an iterative computation: Pipelined string pattern matching
PipelinedStringMatch = (I2, D2, F2, E2)
[Figure: spacetime diagram with axes time, space1, and space2.]
Iterative computation specification
Declarative specification
Is a 4-dimensional design space
(actually 5-dimensional: the space embedding is independent of the time embedding)
Facilitates reuse of design components.
Starting with an existing language …
• Can infer
– Index set
– Data network
– Functions
• Cannot infer
– Space embedding
– Time embedding
Spacetime embedding
• Start with it as a program annotation
• More advanced: the compiler optimizes the embedding based on a figure of merit annotated in the program.
Work
• Work out details of notation
• Implement in Java, C, Matlab, HDL, …
• Map virtual processor network to actual processor network
• Map
– Java: map processors to Threads, [links to Channels]
– GPU: map processors to GPU processing elements
(Challenge: spacetime embedding depends on underlying architecture)
Work …
• The output of 1 iterative computation is the input to another.
• Develop a notation for specifying composite iterative computation?
Thanks for listening!
Questions?