Heterogeneous Particle Based Simulation (SIGGRAPH Asia 2011)
![Page 1: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/1.jpg)
HETEROGENEOUS PARTICLE BASED SIMULATION
Takahiro Harada, AMD
![Page 2: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/2.jpg)
2 Harada, Heterogeneous Particle-based Simulation
PARTICLE BASED SIMULATION ON THE GPU
Large number of particles
Particles with identical size
– Work granularity is almost the same
– Good for the wide SIMD architecture
Harada et al. 2007
![Page 3: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/3.jpg)
PARTICLE BASED SIMULATION
Collision
$f_{\mathrm{collide}} = \sum_{j} f_{ij}$
Integration
$\frac{dv}{dt} = \frac{f}{m}, \qquad \frac{dx}{dt} = v$
$v \mathrel{+}= \frac{f}{m}\,\Delta t, \qquad x \mathrel{+}= v\,\Delta t$
Acceleration structure is used for efficient collision detection
– Uniform grid → Suited for the GPU
– Less divergence
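The integration step above can be sketched in a few lines. This is a minimal illustration only: the 1D `Particle` struct, its field names, and the choice to clear the force after each step are assumptions, not details from the talk.

```cpp
#include <vector>

// Hypothetical 1D particle state; the slides do not specify a data layout.
struct Particle {
    float x;  // position
    float v;  // velocity
    float f;  // accumulated collision force
    float m;  // mass
};

// One integration step: v += (f / m) * dt, then x += v * dt.
void integrate(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles) {
        p.v += (p.f / p.m) * dt;
        p.x += p.v * dt;
        p.f = 0.0f;  // assumption: clear the force before the next collision pass
    }
}
```

Every particle runs the same arithmetic with no branching, which is why this stage maps well to wide SIMD hardware.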
![Page 4: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/4.jpg)
DIVERGENCE ON SIMD
SIMD lanes: 0 1 2 3 4 5 6 7
void Kernel()
{
    if (A)
        FuncA();
    else if (B)
        FuncB();
    else
        FuncC();
}
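A rough way to see the cost of divergence is to count how many branch bodies an 8-lane SIMD unit has to execute for the kernel above. The model below is a simplified assumption (one serialized pass per distinct branch taken by any lane); `serializedPasses` and the branch encoding are hypothetical names for illustration.

```cpp
#include <array>
#include <set>

// Hypothetical model: each of the 8 lanes picks one branch
// (0 = FuncA, 1 = FuncB, 2 = FuncC). The SIMD unit runs every branch that
// at least one lane takes, one after another, masking off the other lanes.
int serializedPasses(const std::array<int, 8>& branchPerLane) {
    std::set<int> taken(branchPerLane.begin(), branchPerLane.end());
    return static_cast<int>(taken.size());
}
```

When all lanes take the same branch the unit does one pass; when the lanes are spread over all three branches it does three, with most lanes idle in each.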
![Page 5: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/5.jpg)
PARTICLE BASED SIMULATION ON THE GPU
Particle collision using a uniform grid
SIMD lanes: 0 1 2 3 4 5 6 7
void Kernel()
{
    prepare();
    collide(Cell0);
    collide(Cell1);
    collide(Cell2);
    collide(Cell3);
    collide(Cell4);
    collide(Cell5);
    collide(Cell6);
    collide(Cell7);
    collide(Cell8);
}
Cell0 Cell1 Cell2
Cell3 Cell4 Cell5
Cell6 Cell7 Cell8
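The fixed nine-cell loop can be sketched on the CPU as follows. This is an illustrative 2D version under stated assumptions: the `P2` struct, the `countContacts` helper, the square domain, and the requirement that the cell size is at least one particle diameter are all introduced here and are not taken from the slides.

```cpp
#include <cmath>
#include <vector>

struct P2 { float x, y; };

// Hypothetical helper: counts overlapping pairs among particles of radius r
// inside the square [0, cells * cellSize)^2, visiting only the 3x3 block of
// cells around each particle (Cell0..Cell8 on the slide).
// Assumes cellSize >= 2 * r so any overlapping pair lies in adjacent cells.
int countContacts(const std::vector<P2>& ps, float r, float cellSize, int cells) {
    std::vector<std::vector<int>> grid(cells * cells);
    auto cellOf = [&](float c) {
        int i = static_cast<int>(std::floor(c / cellSize));
        return i < 0 ? 0 : (i >= cells ? cells - 1 : i);
    };
    const int n = static_cast<int>(ps.size());
    for (int i = 0; i < n; ++i)
        grid[cellOf(ps[i].y) * cells + cellOf(ps[i].x)].push_back(i);

    int contacts = 0;
    for (int i = 0; i < n; ++i) {
        const int cx = cellOf(ps[i].x), cy = cellOf(ps[i].y);
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                const int nx = cx + dx, ny = cy + dy;
                if (nx < 0 || ny < 0 || nx >= cells || ny >= cells) continue;
                for (int j : grid[ny * cells + nx]) {
                    if (j <= i) continue;  // count each pair once
                    const float ddx = ps[i].x - ps[j].x;
                    const float ddy = ps[i].y - ps[j].y;
                    if (ddx * ddx + ddy * ddy < 4.0f * r * r) ++contacts;
                }
            }
        }
    }
    return contacts;
}
```

Each particle visits the same number of cells, so the control flow is nearly identical across lanes; only the per-cell particle counts vary.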
![Page 6: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/6.jpg)
MIXED PARTICLE SIMULATION
Not only small particles
Difficulty for GPUs
– Large particles interact with small particles
– Large-large collision
![Page 7: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/7.jpg)
CHALLENGE
Non uniform work granularity
– Small-small (SS) collision: uniform, GPU
– Large-large (LL) collision: non uniform, CPU
– Large-small (LS) collision: non uniform, CPU
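The classification above amounts to a small lookup. The sketch below assumes a single radius threshold separates small from large particles; the threshold, the enum names, and both functions are hypothetical and only mirror the SS/LL/LS split listed on the slide.

```cpp
enum class PairKind { SS, LS, LL };
enum class Processor { GPU, CPU };

// Hypothetical threshold test: a particle counts as "large" if its radius
// exceeds smallRadius.
PairKind classify(float ra, float rb, float smallRadius) {
    const bool la = ra > smallRadius;
    const bool lb = rb > smallRadius;
    if (la && lb) return PairKind::LL;
    if (la || lb) return PairKind::LS;
    return PairKind::SS;
}

// Assignment from the slide: uniform SS work goes to the GPU, the rest to the CPU.
Processor assign(PairKind kind) {
    return kind == PairKind::SS ? Processor::GPU : Processor::CPU;
}
```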
![Page 8: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/8.jpg)
FUSION ARCHITECTURE
CPU and GPU are:
– On the same die
– Much closer
– Efficient data sharing
CPU and GPU are good at different kinds of work
– CPU: serial computation, conditional branches
– GPU: parallel computation
Able to dispatch work to the suitable processor:
– Serial work with varying granularity → CPU
– Parallel work with uniform granularity → GPU
![Page 9: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/9.jpg)
MIXED PARTICLE SIMULATION
Benefit from Fusion Architecture
– Different kinds of work in a simulation
– CPU and GPU work together
– CPU and GPU share data
![Page 10: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/10.jpg)
METHOD
![Page 11: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/11.jpg)
TWO SIMULATIONS
[Diagram: two pipelines coupled by LS Collision.
Small particles: Build Acc. Structure → SS Collision → S Integration (data: Position, Velocity, Force, Grid).
Large particles: Build Acc. Structure → LL Collision → L Integration (data: Position, Velocity, Force).]
![Page 12: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/12.jpg)
CLASSIFY BY WORK GRANULARITY
[Diagram: the stages Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision and LS Collision, with the small particle data (Position, Velocity, Force, Grid) and large particle data (Position, Velocity, Force), grouped into Uniform Work and Non Uniform Work.]
![Page 13: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/13.jpg)
CLASSIFY BY WORK GRANULARITY, ASSIGN PROCESSOR
[Diagram: the stages Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision and LS Collision, with the small particle data (Position, Velocity, Force, Grid) and large particle data (Position, Velocity, Force), assigned to the GPU and the CPU.]
![Page 14: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/14.jpg)
DATA SHARING
The grid and the small particle data have to be shared with the CPU for LS collision
– Allocated as zero copy buffer
[Diagram: the GPU and CPU stages as before; the shared Position, Velocity, Grid and Force buffers of the small particles are shown next to LS Collision.]
![Page 15: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/15.jpg)
SYNCHRONIZATION
The grid and the small particle data have to be shared with the CPU for LS collision
– Allocated as zero copy buffer
[Diagram: the GPU stages (Build Acc. Structure, SS Collision, S Integration) and the CPU stages (Build Acc. Structure, LL Collision, LS Collision, L Integration) with the shared Position, Velocity, Grid and Force buffers and two Synchronization points between the GPU and the CPU.]
![Page 16: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/16.jpg)
VISUALIZING WORKLOADS
Grid construction can be moved to the end of the pipeline
– Unbalanced workload
[Diagram: GPU and CPU timelines with the blocks Build Acc. Structure, SS Collision, S Integration, LL Collision, LS Collision, L Integration and Synchronization, plus the small and large particle data buffers.]
![Page 17: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/17.jpg)
LOAD BALANCING
To get better load balancing
– The sync is for passing the force buffer filled by the CPU to the GPU
– Move the LL collision after the sync
[Diagram: GPU and CPU timelines with the blocks Build Acc. Structure, SS Collision, S Integration, LS Collision, Synchronization, LL Collision and L Integration, plus the small and large particle data buffers.]
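The reordered pipeline can be mimicked with ordinary C++ tasks. In this sketch the "GPU" is only simulated by an asynchronous CPU task, and every stage body is a dummy placeholder; the function names and `StepResult` are hypothetical. It shows only the ordering described above: LS collision overlaps with SS collision, the sync hands over the CPU-filled force buffer, and LL collision runs after the sync.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Placeholder stages: the bodies only produce dummy values and are not the
// talk's kernels. The "GPU" is simulated by an asynchronous CPU task.
std::vector<float> ssCollision(std::size_t n) { return std::vector<float>(n, 1.0f); }
std::vector<float> lsCollision(std::size_t n) { return std::vector<float>(n, 0.5f); }
float llCollision() { return 2.0f; }
std::vector<float> integrateSmall(std::vector<float> force) {
    for (float& f : force) f *= 0.5f;  // dummy velocity update (dt = 0.5, m = 1)
    return force;
}

struct StepResult {
    std::vector<float> smallVelocity;
    float largeForce;
};

StepResult step(std::size_t numSmall) {
    // "GPU": SS collision overlaps with the CPU's LS collision.
    std::future<std::vector<float>> gpu =
        std::async(std::launch::async, ssCollision, numSmall);
    std::vector<float> lsForce = lsCollision(numSmall);

    // Synchronization: the CPU-filled force buffer is passed to the GPU side.
    std::vector<float> force = gpu.get();
    for (std::size_t i = 0; i < numSmall; ++i) force[i] += lsForce[i];

    // After the sync, the "GPU" integrates small particles while the CPU
    // performs the LL collision.
    std::future<std::vector<float>> integ =
        std::async(std::launch::async, integrateSmall, std::move(force));
    const float ll = llCollision();
    return StepResult{integ.get(), ll};
}
```

On real hardware the two `std::async` calls would be replaced by kernel launches and the `get()` calls by event waits; the data flow stays the same.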
![Page 18: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/18.jpg)
[Figure: GPU Work and CPU Work]
![Page 19: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/19.jpg)
MULTI THREADING (4 THREADS)
![Page 20: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/20.jpg)
FURTHER OPTIMIZATION
[Diagram: timeline with rows GPU, CPU0, CPU1 and CPU2, showing the blocks Build Acc. Structure, SS Collision, S Integ., LL Collision, L Integ., LS Collision and Synchronization.]
1. Not optimized for “Llano”, which has a 4 core CPU
– Only 2 CPU cores were used
– Can use 2 more cores for LS collision
2. LL collision was not optimized
– The CPU waits while the GPU is constructing the grid
– Use the CPU to improve SS collision
![Page 21: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/21.jpg)
OPTIMIZATION1: MULTITHREADING LARGE-SMALL COLLISION
Cannot split the work by large particle indices
– More than 1 large particle can collide with a small particle
– Have to lock the memory on write → Inefficient
Prepare a local buffer for each thread
– A buffer storing force on small particles
– Lock free
Local buffers are merged into one
[Diagram: large particles L0, L1 and small particles S0, S1 handled by Thread0, Thread1 and Thread2.]
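The local-buffer scheme can be sketched with standard threads. The `Contact` record, the function name, and the rule that thread `t` handles large particles with `large % numThreads == t` are assumptions for illustration; the point is that each thread writes only to its own buffer, so two large particles touching the same small particle never race.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical contact record: large particle `large` pushes small particle
// `small` with `force`.
struct Contact { int large; int small; float force; };

std::vector<float> accumulateSmallForces(const std::vector<Contact>& contacts,
                                         int numSmall, int numThreads) {
    // One private force buffer per thread, so no lock is needed on write.
    std::vector<std::vector<float>> local(numThreads,
                                          std::vector<float>(numSmall, 0.0f));
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            // Assumed split: thread t owns large particles with index % numThreads == t.
            for (const Contact& c : contacts)
                if (c.large % numThreads == t) local[t][c.small] += c.force;
        });
    }
    for (std::thread& w : workers) w.join();

    // Merge step: sum the local buffers into one force buffer.
    std::vector<float> merged(numSmall, 0.0f);
    for (const std::vector<float>& buf : local)
        for (int s = 0; s < numSmall; ++s) merged[s] += buf[s];
    return merged;
}
```

The trade-off is extra memory (one buffer per thread) and a merge pass, in exchange for removing all locking from the collision loop.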
![Page 22: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/22.jpg)
OPTIMIZATION1: MULTITHREADING LARGE-SMALL COLLISION
[Diagram: timeline with rows GPU, CPU0, CPU1 and CPU2, showing the blocks Build Acc. Structure, SS Collision, S Integ., LL Collision, L Integ., LS Collision and Synchronization.]
![Page 23: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/23.jpg)
OPTIMIZATION1: MULTITHREADING LARGE-SMALL COLLISION
[Diagram: timeline with rows GPU, CPU0, CPU1 and CPU2, showing the blocks Build Acc. Structure, SS Collision, S Integ., LL Collision, L Integ., three LS Collision blocks, three Merge blocks and two Synchronization points.]
![Page 24: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/24.jpg)
OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION
Spatially coherent memory layout improves cache utilization
As particles move, spatial locality decreases
![Page 25: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/25.jpg)
![Page 26: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/26.jpg)
SPATIAL SORT
Sort particles by spatial location to improve cache utilization
– Z curve
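A Z-curve sort key is obtained by interleaving the bits of a particle's grid cell coordinates. The sketch below is a 2D, 16-bit-per-axis version chosen for brevity; the dimensionality, bit width, and function names are assumptions, since the slides do not state them.

```cpp
#include <cstdint>

// Spread the lower 16 bits of v so that a zero bit sits between each
// original bit.
std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// 2D Z-curve (Morton) key: x bits in even positions, y bits in odd positions.
std::uint32_t mortonKey(std::uint32_t cellX, std::uint32_t cellY) {
    return spreadBits(cellX) | (spreadBits(cellY) << 1);
}
```

Sorting particles by this key places particles from neighboring cells close together in memory, which is the property the collision kernel's cache behavior depends on.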
![Page 27: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/27.jpg)
![Page 28: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/28.jpg)
CHOOSE SORT
Requirements
– Full sort was over the budget
– Full sort is not “a must”
– Sort is an optional computation for performance improvement
– Incremental sort
– Use multiple threads
Solution
– Used generalized “Odd-even transition sort”
![Page 29: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/29.jpg)
BLOCK TRANSITION SORT
Generalized “Odd-even transition sort”
Instead of sorting 2 adjacent elements, sort 2 adjacent blocks
Iterate until convergence
Use a thread to sort 2 adjacent blocks
– 6 blocks for 3 threads
– Radix sort
[Figure: odd-even transition sort compared with block transition sort]
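The block scheme can be sketched as follows (the element-wise version is commonly called odd-even transposition sort). This is a serial illustration under stated assumptions: `std::sort` stands in for the radix sort named on the slide, the block pairs that a thread would handle are processed one after another, `blockSize` must be greater than zero, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One pass: sort each pair of adjacent blocks, starting at block `phase`
// (0 = even pairs, 1 = odd pairs). The pairs do not overlap, so each could be
// given to its own thread; this sketch runs them serially.
// std::sort stands in for the radix sort named on the slide.
bool sortBlockPairs(std::vector<int>& keys, std::size_t blockSize, std::size_t phase) {
    bool changed = false;
    for (std::size_t begin = phase * blockSize; begin < keys.size();
         begin += 2 * blockSize) {
        const std::size_t end = std::min(begin + 2 * blockSize, keys.size());
        if (!std::is_sorted(keys.begin() + begin, keys.begin() + end)) {
            std::sort(keys.begin() + begin, keys.begin() + end);
            changed = true;
        }
    }
    return changed;
}

// Alternate even and odd passes until a full round changes nothing.
void blockTransitionSort(std::vector<int>& keys, std::size_t blockSize) {
    bool changed = true;
    while (changed) {
        const bool even = sortBlockPairs(keys, blockSize, 0);
        const bool odd = sortBlockPairs(keys, blockSize, 1);
        changed = even || odd;
    }
}
```

Because every pass leaves the array more ordered than before, a simulation can run only one or two passes per frame and still improve locality, which fits the incremental requirement.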
![Page 30: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/30.jpg)
OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION
[Diagram: timeline with rows GPU, CPU0, CPU1 and CPU2, showing the blocks Build Acc. Structure, SS Collision, S Integ., LL Collision, L Integ., three LS Collision blocks, three Merge blocks and two Synchronization points.]
![Page 31: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/31.jpg)
OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION
[Diagram: timeline with rows GPU, CPU0, CPU1 and CPU2, showing the blocks Build Acc. Structure, SS Collision, S Integ., three LS Collision blocks, three Merge blocks, LL Coll., L Integ., three S Sorting blocks and three Synchronization points.]
![Page 32: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/32.jpg)
DEMO
[Figure: GPU Work and CPU Work]
![Page 33: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/33.jpg)
![Page 34: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/34.jpg)
CONCLUSIONS
Realized a simulation that handles variable sized particles by leveraging the best features of both the CPU and GPU on AMD’s Fusion Architecture
– The CPU is used for work with non-uniform compute granularity
– The GPU is used for highly parallel work
Memory sharing between the CPU and GPU is the key to efficiency
– Avoids wasteful memory copies
![Page 35: Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)](https://reader033.vdocuments.mx/reader033/viewer/2022052311/5579abaed8b42ac1148b4e11/html5/thumbnails/35.jpg)
REFERENCE
Takahiro Harada, Seiichi Koshizuka, Yoichiro Kawaguchi, Smoothed Particle Hydrodynamics on GPUs, Proc. of Computer Graphics International, 63-70 (2007)
Justin Hensley, Takahiro Harada, Chapter X OpenCL Case Study: Mixed Particle Simulation, Heterogeneous Computing with OpenCL, Morgan Kaufmann (2011)