CS 179: GPU Programming Lab 7 Recitation: The MPI/CUDA Wave Equation Solver
TRANSCRIPT
CS 179: GPU Programming
Lab 7 Recitation: The MPI/CUDA Wave Equation Solver
MPI/CUDA – Wave Equation
Big idea: divide our data array between n processes!
MPI/CUDA – Wave Equation
Problem if we're at the boundary of a process!
$$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$$
Where do we get $y_{x+1,t}$ (or $y_{x-1,t}$ at the other end)? It's outside our process!
(Figure: stencil diagram over position x and times t-1, t, t+1)
Wave Equation – Simple Solution
After every timestep, each process gives its leftmost and rightmost piece of "current" data to its neighbor processes!
(Diagram: the data array divided across Proc0–Proc4)
Wave Equation – Simple Solution
Pieces of data to communicate:
(Diagram: the edge values each of Proc0–Proc4 must exchange)
Wave Equation – Simple Solution
Can do this with MPI_Irecv, MPI_Isend, and MPI_Wait (see the sketch below). Suppose our process has rank r:
- If we're not the rightmost process: send data to process r+1, and receive data from process r+1
- If we're not the leftmost process: send data to process r-1, and receive data from process r-1
- Wait on the requests
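A minimal sketch of that exchange, assuming (hypothetically) that each process holds a padded array where y[0] and y[local_n + 1] are ghost cells and y[1..local_n] are its own values:

```cuda
#include <mpi.h>

// Per-timestep halo exchange for the simple solution: each process trades
// one edge value with each neighbor, then waits on all requests.
void exchange_halo(float *y, int local_n, int rank, int size) {
    MPI_Request reqs[4];
    int nreqs = 0;

    if (rank < size - 1) {   // not the rightmost process
        MPI_Isend(&y[local_n], 1, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   // my rightmost value
        MPI_Irecv(&y[local_n + 1], 1, MPI_FLOAT, rank + 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   // neighbor's leftmost value
    }
    if (rank > 0) {          // not the leftmost process
        MPI_Isend(&y[1], 1, MPI_FLOAT, rank - 1, 1,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   // my leftmost value
        MPI_Irecv(&y[0], 1, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   // neighbor's rightmost value
    }
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);   // "wait on requests"
}
```

With plain (non-CUDA-aware) MPI, the edge values would first be copied from the device to the host with cudaMemcpy, and the received values copied back.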
Wave Equation – Simple Solution
Boundary conditions (sketch below):
- Use MPI_Comm_rank and MPI_Comm_size
- The rank 0 process sets the leftmost condition
- The rank (size-1) process sets the rightmost condition
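A sketch of what that might look like, reusing the padded layout above; left_bc is a hypothetical function giving the driven left-end value at time t:

```cuda
#include <mpi.h>

float left_bc(float t);   // hypothetical: driven value for the left end at time t

// Only the end processes apply boundary conditions.
void apply_boundaries(float *y, int local_n, float t) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        y[1] = left_bc(t);      // leftmost real cell (index 0 is a ghost)
    if (rank == size - 1)
        y[local_n] = 0.0f;      // rightmost real cell, e.g. fixed at zero
}
```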
Simple Solution – Problems
Communication can be expensive! Communicating every timestep just to send 1 value is costly.
Better solution: send m values every m timesteps! (Loop sketch below.)
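With that change, the main loop only communicates every m-th step. A rough sketch, leaning on helpers (step, rotate, exchange_interval) that are sketched on later slides; all names are hypothetical:

```cuda
// Amortized communication: redundant regions of width m keep the stencil
// valid for m timesteps, so we exchange only every m-th step.
void run(float *y_old, float *y_curr, float *y_new,
         int local_n, int m, int nsteps, int rank, int size) {
    for (int t = 0; t < nsteps; ++t) {
        step(y_old, y_curr, y_new, local_n + 2 * m); // hypothetical host wrapper that
                                                     // launches the stencil kernel
        rotate(&y_old, &y_curr, &y_new);             // pointer swap, no copies
        if ((t + 1) % m == 0)                        // every m timesteps...
            exchange_interval(y_curr, y_old, local_n, m, rank, size);
    }
}
```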
Possible Implementation
Initial setup (assume 3 processes):
(Diagram: the array split across Proc0–Proc2)
Possible Implementation
Give each array "redundant regions" (assume communication interval = 3).
(Diagram: each process's array, now padded with redundant regions on both sides)
Possible Implementation
Every 3 timesteps (the communication interval), send some of your data to the neighbor processes!
Possible Implementation
Send "current" data (current at the time of communication).
(Diagram: "current" edge values copied into neighbors' redundant regions, Proc0–Proc2)
Possible Implementation
Then send "old" data.
(Diagram: "old" edge values copied the same way, Proc0–Proc2)
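Putting the two transfers together: a sketch of the interval exchange, assuming (hypothetically) each array is padded with m redundant cells on each side, so indices m..local_n+m-1 are the process's own data. Both time levels are sent, since the stencil needs "current" and "old" values to keep stepping inside the redundant regions:

```cuda
#include <mpi.h>

// Every m timesteps, refresh both redundant regions of both time levels.
void exchange_interval(float *y_curr, float *y_old, int local_n, int m,
                       int rank, int size) {
    MPI_Request reqs[8];
    int nreqs = 0;
    float *levels[2] = { y_curr, y_old };   // "current" first, then "old"

    for (int a = 0; a < 2; ++a) {
        float *y = levels[a];
        if (rank < size - 1) {   // exchange with the right neighbor
            MPI_Isend(&y[local_n], m, MPI_FLOAT, rank + 1, 2 * a,
                      MPI_COMM_WORLD, &reqs[nreqs++]);     // my m rightmost values
            MPI_Irecv(&y[local_n + m], m, MPI_FLOAT, rank + 1, 2 * a + 1,
                      MPI_COMM_WORLD, &reqs[nreqs++]);     // my right redundant region
        }
        if (rank > 0) {          // exchange with the left neighbor
            MPI_Isend(&y[m], m, MPI_FLOAT, rank - 1, 2 * a + 1,
                      MPI_COMM_WORLD, &reqs[nreqs++]);     // my m leftmost values
            MPI_Irecv(&y[0], m, MPI_FLOAT, rank - 1, 2 * a,
                      MPI_COMM_WORLD, &reqs[nreqs++]);     // my left redundant region
        }
    }
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
}
```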
Then…
Do our calculation as normal (skipping only the very ends of the array) over our entire array, including the redundancies!
$$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$$
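On the GPU side, that update might look like this minimal CUDA kernel sketch, where courant_sq is the precomputed $(c\,\Delta t / \Delta x)^2$ and the names are hypothetical:

```cuda
// One timestep of the 1D wave stencil over the whole padded array,
// including redundant regions; only the two outermost cells are skipped.
__global__ void wave_step(const float *y_old, const float *y_curr,
                          float *y_new, int n, float courant_sq) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x > 0 && x < n - 1) {
        y_new[x] = 2.0f * y_curr[x] - y_old[x]
                 + courant_sq * (y_curr[x + 1] - 2.0f * y_curr[x] + y_curr[x - 1]);
    }
}
```

Garbage produced near the edges is expected; the next slides show why it never reaches the core data.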
What about corruption?
Suppose we've just copied our data… (assume a non-boundary process)
Legend: . = valid, ? = garbage, ~ = doesn't matter
(Recall that only 3 time levels are stored – the gray areas in the diagram don't exist at the current time.)
What about corruption?
Calculate new data… at the very ends of the padded array, the new value is unknown!
What about corruption?
Time t+1: current -> old, new -> current (and the space for old is overwritten by new…)
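In code, that's just a three-pointer rotation (a minimal sketch; no data moves):

```cuda
// Advance one time level by rotating three buffers; the storage that held
// "old" is reused as the target for the next "new".
void rotate(float **y_old, float **y_curr, float **y_new) {
    float *tmp = *y_old;
    *y_old  = *y_curr;   // current -> old
    *y_curr = *y_new;    // new -> current
    *y_new  = tmp;       // old space gets overwritten by the next new
}
```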
What about corruption?
More garbage data!
“Garbage in, garbage out!”
What about corruption?
Time t+2…
What about corruption?
Even more garbage!
What about corruption?
Time t+3…
Core data region - corruption imminent!?
What about corruption?
Saved! The data exchange occurs just as the communication interval passes. Garbage creeps inward one cell per timestep, so redundant regions of width m keep the core data valid for exactly m steps, and the exchange refreshes them in time.
“It’s okay to play with garbage… just don’t get sick”
Boundary Conditions
Applied only at the leftmost and rightmost processes!
Boundary corruption?
Examine the leftmost process: we never copy into it from a left neighbor, so its left redundant region is garbage!
(B = boundary condition set)
Boundary corruption?
Calculation brings garbage into the non-redundant region!
Boundary corruption?
…but the boundary condition is set at every interval!
Other details
To run programs with MPI, use the mpirun command, e.g.:
mpirun -np (number of processes) (your program and arguments)
On the CMS machines, add this to your .bashrc file:
alias mpirun=/cs/courses/cs179/openmpi-1.6.4/bin/mpirun
Common bugs (and likely causes)
Lock-up (it seems like nothing's happening):
- Often an MPI issue – the program blocks in MPI_Wait because some request was never fulfilled
- Check that all sends have corresponding receives
Your wave looks weird:
- Likely cause 1: garbage data is being passed between processes
- Likely cause 2: redundant regions aren't being refreshed and/or are contaminating non-redundant regions
Common bugs (and likely causes)
Your wave is flat-zero:
- The left boundary condition isn't being initialized and/or isn't propagating
- Otherwise, the same likely causes as the previous slide
Common bugs (and likely causes)
General debugging tips:
- Run MPI with the number of processes set to 1 or 2
- Set the kernel to write a constant value (see the sketch below)
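For that last tip, a sketch of a stand-in kernel that writes a constant everywhere (names hypothetical); swapping it in for the real stencil separates communication/copy bugs from kernel bugs:

```cuda
// Debug kernel: fills the output with a constant, so any structure you
// still see in the plotted wave must come from the MPI/copy logic.
__global__ void debug_step(float *y_new, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < n)
        y_new[x] = 1.0f;   // any recognizable constant works
}
```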