Introduction to Parallel Computation
Wolfgang Schreiner
Research Institute for Symbolic Computation (RISC-Linz)
Johannes Kepler University, A-4040 Linz, Austria
http://www.risc.uni-linz.ac.at/people/schreine
Wolfgang Schreiner RISC-Linz
A Graph-Theoretical Problem
“All Pairs Shortest Paths”
• Given: graph G = (V, A)
  – Directed graph with positive edge weights, |V| = n
• Representation: weight matrix W
W(i, j) =
  0      if i = j
  w > 0  if there is an edge of length w between nodes i and j
  ∞      if there is no edge between i and j
•Wanted: distance matrix D
D(i, j) =
  0      if i = j
  d > 0  if i ≠ j, where d is the length of the shortest path between i and j
  ∞      if there is no path between i and j
Example
[Figure: example graph with nodes v0, ..., v4 and weighted directed edges; the edge weights are those of the matrix W below]
W    v0  v1  v2  v3  v4
v0    0   1   ∞   2   ∞
v1    ∞   0   2   ∞   ∞
v2    ∞   ∞   0   1   ∞
v3    1   ∞   ∞   0   3
v4    ∞   ∞   1   ∞   0
⇒
D    v0  v1  v2  v3  v4
v0    0   1   3   2   5
v1    4   0   2   3   6
v2    2   3   0   1   4
v3    1   2   4   0   3
v4    3   4   1   2   0
Solution Idea
• Construct a sequence of matrices
  – D0, D1, ..., Dn−1
  – Di describes all shortest paths with not more than i edges.
• Consequence: Dn−1 = D
• Proof
  – Assume a shortest path p with more than n − 1 edges. Then some node v occurs twice in this path, i.e. p = ⟨i, ..., v, ..., v, ..., j⟩. But then the path p′ = ⟨i, ..., v, ..., j⟩, with the cycle removed, is shorter, since all edge weights are positive. Contradiction!
Construction
D0(i, j) =
  0  if i = j
  ∞  if i ≠ j
D1(i, j) = W (i, j)
Dr+1(i, j) =?
Two Cases

Let p be a shortest path between i and j with at most r + 1 edges, and let l be its number of edges.

1. |p| = l ≤ r
   • p = ⟨i, ..., j⟩ with l edges
   • Dr+1(i, j) = Dr(i, j)

2. |p| = r + 1
   • p = ⟨i, ..., k, j⟩ where ⟨i, ..., k⟩ has r edges
   • Dr+1(i, j) = Dr(i, k) + W(k, j)

Dr+1(i, j) = min{Dr(i, j), min_k{Dr(i, k) + W(k, j)}}
Sequential Algorithm
AllPairsShortestPaths(W):
  D = W
  for r=1 to n-2 do
    D = MatMin(D, W)
  return D

MatMin(D, W):
  for i=1 to n do
    for j=1 to n do
      E[i,j] = D[i,j]
      for k=1 to n do
        E[i,j] = min(E[i,j], D[i,k]+W[k,j])
  return E
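A direct Python transcription of the pseudocode can serve as a check; this is a minimal sketch (the function and variable names are mine, not from the slides), with ∞ modeled as float("inf"):

```python
INF = float("inf")

def mat_min(D, W):
    # E[i,j] = min(D[i,j], min over k of D[i,k] + W[k,j])
    n = len(D)
    return [[min([D[i][j]] + [D[i][k] + W[k][j] for k in range(n)])
             for j in range(n)] for i in range(n)]

def all_pairs_shortest_paths(W):
    # D1 = W; n-2 further MatMin steps yield D_{n-1} = D
    D = W
    for _ in range(len(W) - 2):
        D = mat_min(D, W)
    return D

# Weight matrix of the example graph above:
W = [[0,   1,   INF, 2,   INF],
     [INF, 0,   2,   INF, INF],
     [INF, INF, 0,   1,   INF],
     [1,   INF, INF, 0,   3],
     [INF, INF, 1,   INF, 0]]

D = all_pairs_shortest_paths(W)  # reproduces the distance matrix D of the example
```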
Observation
MatMin has the same structure as matrix multiplication (+ → min, ∗ → +).

• Define D × W = MatMin(D, W)

• Begin: D1 = W

• General: Di = Di−1 × W = W^i

• End: D = Dn−1 = W^(n−1)

The problem solution is essentially repeated matrix multiplication!
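The correspondence can be made literal with a product routine that is generic over the two operations: instantiating it with (+, ∗) gives ordinary matrix multiplication, and with (min, +) it gives MatMin. An illustrative sketch (names are mine, not from the slides):

```python
import operator

INF = float("inf")

def mat_prod(A, B, add, mul, zero):
    # Generic matrix product: C[i,j] = add-reduction over k of
    # mul(A[i,k], B[k,j]); `zero` is the neutral element of `add`.
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = zero
            for k in range(n):
                acc = add(acc, mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C

def mat_mul(A, B):   # ordinary matrix multiplication: add = +, mul = *
    return mat_prod(A, B, operator.add, operator.mul, 0)

def mat_min(A, B):   # MatMin: add = min (neutral element ∞), mul = +
    return mat_prod(A, B, min, operator.add, INF)
```

Note that since W(j, j) = 0, the k = j term already reproduces D(i, j), so the explicit comparison with D[i,j] in the pseudocode is subsumed by the pure (min, +) product.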
Optimization
• Instead of computing
  – W^1, W^2, W^3, ..., W^(n−1)

• we can compute
  – W^1, W^2, W^4, ..., W^(2^s) (where 2^s ≥ n − 1).

This holds for matrix multiplication as well as for MatMin (since both are associative).
AllPairsShortestPaths(W):
  D = W
  for r=1 to s do
    D = MatMin(D, D)
  return D
We can reduce n matrix multiplications to log n square computations!
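The squaring loop can be sketched as follows (a minimal Python version under my own naming; it uses the pure (min,+) product, which is valid here because distance matrices have a zero diagonal):

```python
INF = float("inf")

def mat_min(D, E):
    # (min,+) product; since the diagonals are 0, the k = j term
    # already reproduces D[i][j].
    n = len(D)
    return [[min(D[i][k] + E[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def all_pairs_by_squaring(W):
    D = W
    bound = 1                 # D covers shortest paths with <= bound edges
    while bound < len(W) - 1:
        D = mat_min(D, D)     # now covers <= 2 * bound edges
        bound *= 2
    return D
```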
Time Analysis
• n nodes.
• log n square computations.
• n^3 (min,+) operations for each square computation.

Sequential time complexity O(log n · n^3)
Parallel Algorithm
• Sequence of square operations.
• Each square operation contains n^3 independent (min,+) operations that can be performed in parallel, yielding n^3 results.

• Each of the n^2 entries D(i, j) is the minimum of n values.

• Time complexity:
  – log n square computations.
  – 1 time step for all (min,+) operations.
  – log n time to compute each of the n^2 minimums.

Parallel time complexity O(log^2 n)
Minimum of n Values
Tree-like minimum construction
v1  v2     v3  v4     v5  v6     v7  v8
  \ /        \ /        \ /        \ /
min{v1,v2} min{v3,v4} min{v5,v6} min{v7,v8}
      \      /               \      /
  min{v1,v2,v3,v4}      min{v5,v6,v7,v8}
            \                 /
        min{v1,v2,v3,v4,v5,v6,v7,v8}
Depth of tree = computation time = O(log n)
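In code, the tree corresponds to rounds of pairwise minima, one round per parallel time step; a small sketch (names are mine):

```python
def tree_min(values):
    # Each round halves the number of survivors, so for n values
    # the number of rounds is ceil(log2 n).
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        # one parallel round: take the minimum of disjoint pairs
        vals = [min(vals[i:i + 2]) for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds
```

For example, tree_min([3, 1, 4, 1, 5, 9, 2, 6]) returns (1, 3): the minimum of 8 values found in log2(8) = 3 rounds.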
Comparison
• General
  – Sequential: O(log n · n^3)
  – Parallel: O(log^2 n)
  – Processors: O(n^3)

• Time/processor product
  – Sequential product: O(log n · n^3)
  – Parallel product: O(log^2 n · n^3)

The parallel algorithm is in some sense less efficient than the sequential one!
Parallel Machine Models
• How to define algorithm?
• How to implement algorithm?
• How to analyze algorithm?
We need a parallel machine and programming model!
Sequential Machine Model
Von Neumann Computer, Random Access Machine (RAM).

• Central Processing Unit (CPU)
  – Executes a stored program.

• Storage Unit (Memory)
  – Random access to any memory cell.
[Figure: a CPU connected to a memory whose cells hold values such as 62, 3.14, "John"]

Program execution is a sequence of read/write operations on memory.
PRAM Model
Parallel Random Access Machine.
• Set of CPUs.
• CPUs operate on same memory.
• CPUs execute same program lock-step.
[Figure: several CPUs operating on a common memory holding values v1, ..., v4 and min{v1, ...}]

A theoretical model for designing and analyzing parallel algorithms.
PRAM Variants
Restrictions of access to memory cells.
• EREW (exclusive read, exclusive write)
  – Only one access to an individual memory cell at a time.

• CREW (concurrent read, exclusive write)
  – Multiple concurrent reads of a memory cell; writes are exclusive.

• CRCW (concurrent read, concurrent write)
  – Multiple concurrent writes are allowed; a resolution rule determines which value is written (a random value, the maximum, the sum, ...).

Different variants may yield different complexities of parallel algorithms.
PRAM Program
AllPairsShortestPaths(W):
  D = W
  for r=1 to s do
    D = MatMin(D, D)
  return D

MatMin(D, W):
  for i=1 to n do in parallel
    for j=1 to n do in parallel
      for k=1 to n do in parallel
        M[i,j,k] = min(D[i,j], D[i,k]+W[k,j])
      E[i,j] = MIN(M[i,j])
  return E
MIN computes minimum of n values.
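The data flow of the PRAM program can be mimicked sequentially: first all n^3 candidates M[i,j,k] are materialized (independent, hence one parallel step on a PRAM), then each M[i,j] is reduced to its minimum. A Python sketch under my own naming:

```python
def pram_mat_min(D, W):
    n = len(D)
    # Step 1: all n^3 independent (min,+) candidates.
    M = [[[min(D[i][j], D[i][k] + W[k][j]) for k in range(n)]
          for j in range(n)] for i in range(n)]
    # Step 2: n^2 minimum reductions (tree-like on a PRAM: O(log n)).
    return [[min(M[i][j]) for j in range(n)] for i in range(n)]
```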
Complexity of MIN
• EREW, CREW: O(log n)
  – Tree-like combination of values.

• CRCW: O(1)
  – All values are written simultaneously into the same cell; the minimum value remains.
High-Performance Architectures
[Diagram: classification of high-performance architectures]

• High performance by pipelining
  – Vector supercomputers (numerical applications)

• High performance by concurrency
  – SIMD (Single Instruction, Multiple Data): array computers (regular applications)
  – MIMD (Multiple Instruction, Multiple Data): multiprocessors
    * Shared memory: SMP (Symmetrical Multiprocessing)
    * Distributed memory: MPP (Massively Parallel Multiprocessors)
  – Computer clusters (irregular applications)
Vector Supercomputers
[Figure: vector registers feeding elements a0, a1, ... and b0, b1, ... through an arithmetic pipeline]

Vectors of numbers are passed through an arithmetic pipeline.
SIMD Array Computers
[Figure: a control unit driving an array of arithmetic units, each with memory and a communication interface, linked by control and communication networks]

An array of arithmetic units operates and communicates in lock-step.
Shared Memory Multiprocessors
[Figure: processors with caches attached via a bus to a global memory]

Processors operate asynchronously on shared memory.
Distributed Memory Multiprocessors
[Figure: processors, each with local memory and a communication interface, connected by a communication network]

Processors operate asynchronously on local memory and communicate over a point-to-point network.
Computer Clusters
[Figure: independent computers connected by a Local Area Network (LAN)]

A cluster of independent computers cooperates via a local network.
A Parallel Program
cccccccccccccccccccccccccccccccccccccccccccccccc
c Matrix Multiplication MPI Program c
c For this simple version, # of processors     c
c equals # of columns in matrix c
c c
c To run, mpirun -np 4 a.out c
cccccccccccccccccccccccccccccccccccccccccccccccc
include ’mpif.h’
parameter (ncols=4, nrows=4)
integer a(ncols,nrows), b(ncols,nrows), c(ncols,nrows)
integer buf(ncols),ans(nrows)
integer myid, root, numprocs, ierr, status(MPI_STATUS_SIZE)
integer sender, count, tag
call MPI_INIT(ierr)
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
if(numprocs.ne.4) then
print *, "Please run this exercise on 4 processors"
call MPI_FINALIZE(ierr)
stop
endif
root = 0
tag = 100
count = nrows*ncols
c Master initializes and then dispatches to others
IF ( myid .eq. root ) then
do j=1,ncols
do i=1,nrows
a(i,j) = 1
b(i,j) = j
enddo
enddo
A Parallel Program (Contd)
c send a to all other processes
call MPI_BCAST(a,count,MPI_INTEGER,root,MPI_COMM_WORLD,ierr)
c send one column of b to each other process
do j=1,numprocs-1
do i = 1,nrows
buf(i) = b(i,j+1)
enddo
call MPI_SEND(buf,nrows,MPI_INTEGER,j,tag,MPI_COMM_WORLD,ierr)
enddo
c Master does his own part here
do i=1,nrows
ans(i) = 0
do j=1,ncols
ans(i) = ans(i) + a(i,j) * b(j,1)
enddo
c(i,1) = ans(i)
enddo
c then receives answers from others
do j=1,numprocs-1
call MPI_RECV(ans, nrows, MPI_INTEGER, MPI_ANY_SOURCE,
$ MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
sender = status(MPI_SOURCE)
do i=1,nrows
c(i,sender+1) = ans(i)
enddo
enddo
do i=1,nrows
write(6,*)(c(i,j),j=1,ncols)
enddo
A Parallel Program (Contd)
ELSE
c slaves receive a, and one column of b, then compute dot product
call MPI_BCAST(a,count,MPI_INTEGER,root,MPI_COMM_WORLD,ierr)
call MPI_RECV(buf, nrows, MPI_INTEGER, root,
$ MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
do i=1,nrows
ans(i) = 0
do j=1,ncols
ans(i) = ans(i) + a(i,j) * buf(j)
enddo
enddo
call MPI_SEND(ans,nrows,MPI_INTEGER,root,0,MPI_COMM_WORLD,ierr)
ENDIF
call MPI_FINALIZE(ierr)
stop
end
Literature
• Ian T. Foster, Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley, Reading, Massachusetts, 1995. Online version: http://www.mcs.anl.gov/dbpp

• Michael J. Quinn, Parallel Computing: Theory and Practice, 2nd edition, McGraw-Hill, New York, 1994.

• Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, New York, 1993.