TRANSCRIPT
Multi-GPU Load Balancing for Simulation and Rendering
Yong Cao, Computer Science Department, Virginia Tech, USA
In-situ Visualization and Visual Analytics
• Instant visualization and interaction of computing tasks
• Applications:
  – Computational Fluid Dynamics
  – Seismic Propagation
  – Molecular Dynamics
  – Network Security Analysis
  – …
Parallel Execution – Task Split
[Figure: task split — each processor alternates between tasks T1 and T2, exchanging results through memory (data write / data read)]
Problem: Task (Context) Switch
• Disadvantages of context switching:
  – Overhead of another kernel launch
  – Flushing of the cache lines
  – Disallows persistent threads
Parallel Execution: Pipelining
[Figure: pipelining — Processor 1 runs Task 1 for time steps t, t+1, …, writing each step's results to memory; Processor 2 runs Task 2, reading each step's results while Processor 1 works on the next step]
+ Simplified kernel for each GPU
+ Better shared-memory and cache usage
+ Persistent threads for distributed scheduling
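A minimal sketch of this two-stage producer/consumer pipeline, using Python threads and a bounded queue as stand-ins for the two GPUs and the FIFO data buffer (all names and timings here are illustrative, not from the talk):

```python
import queue
import threading
import time

def simulate(step):
    """Stand-in for a simulation kernel; produces the data for one time step."""
    time.sleep(0.01)          # pretend GPU work
    return {"step": step}

def render(frame):
    """Stand-in for a rendering kernel consuming one simulated frame."""
    time.sleep(0.01)
    return f"image_{frame['step']}"

def sim_worker(buf, n_steps):
    # Producer: writes each simulated frame into the FIFO buffer.
    for step in range(n_steps):
        buf.put(simulate(step))
    buf.put(None)             # sentinel: no more frames

def vis_worker(buf, images):
    # Consumer: reads frames from the buffer and renders them.
    while True:
        frame = buf.get()
        if frame is None:
            break
        images.append(render(frame))

buf = queue.Queue(maxsize=4)  # bounded FIFO data buffer between the two stages
images = []
t1 = threading.Thread(target=sim_worker, args=(buf, 8))
t2 = threading.Thread(target=vis_worker, args=(buf, images))
t1.start(); t2.start(); t1.join(); t2.join()
print(images)                 # frames arrive in FIFO order: image_0 … image_7
```

The bounded queue also models the back-pressure of a real FIFO buffer: when it is full, the simulation stage blocks until the rendering stage catches up.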
Problem: bubbles in the pipeline — when one stage takes longer than the other, the faster processor sits idle.
Multi-GPU Pipeline Architecture
[Figure: multi-GPU pipeline architecture — the multi-GPU array is split into simulation (Sim) GPUs and visualization (Vis) GPUs; Sim GPUs write each time step's results into a FIFO data buffer, and Vis GPUs read them out, so simulation of later time steps overlaps with rendering of earlier ones across time steps 1 … n]
Adaptive Load Balancing
[Figure: adaptive and distributed scheduling — GPUs in the multi-GPU array are reassigned between the Sim and Vis groups based on the fullness of the FIFO data buffer:
  – Full buffer: shift GPUs toward rendering (more Vis GPUs)
  – Empty buffer: shift GPUs toward simulation (more Sim GPUs)]
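The buffer-driven shifting described above can be sketched as a simple rebalancing rule (the threshold values are illustrative assumptions, not from the talk):

```python
def rebalance(n_sim, n_vis, fullness, high=0.75, low=0.25):
    """Shift one GPU between the simulation and rendering groups based on
    FIFO-buffer fullness (0.0 = empty, 1.0 = full).

    `high` and `low` are hypothetical thresholds for illustration.
    """
    if fullness > high and n_sim > 1:
        # Buffer filling up: simulation is ahead, give a GPU to rendering.
        n_sim, n_vis = n_sim - 1, n_vis + 1
    elif fullness < low and n_vis > 1:
        # Buffer draining: rendering is ahead, give a GPU to simulation.
        n_sim, n_vis = n_sim + 1, n_vis - 1
    return n_sim, n_vis

print(rebalance(3, 3, 0.9))   # full buffer  -> (2, 4): shift toward rendering
print(rebalance(3, 3, 0.1))   # empty buffer -> (4, 2): shift toward simulation
```

Calling this once per time step keeps the pipeline stages roughly matched without any central coordinator, which is what makes the scheduling distributed.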
Task Partition
• Intra-frame partition
• Inter-frame partition
[Figure: intra-frame partition — all processors work on the same time step t; inter-frame partition — each processor works on a different time step t, t+1, t+2, t+3]
Task Partition for Visual Simulation
• Simulation: intra-frame partition
• Rendering: inter-frame partition
Problem: Scheduling Algorithm
• Performance Model:
• Schedule to optimize:
n: the number of assigned GPUs.
M_t: the number of GPUs assigned to simulation at time step t.
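The slide's performance model and objective function are not reproduced in the transcript. As an illustration of the kind of optimization involved, here is a hedged sketch that picks M, the number of simulation GPUs out of n, to minimize the slower pipeline stage; the per-stage cost models `sim_time` and `vis_time` are hypothetical:

```python
def schedule(n, sim_time, vis_time):
    """Pick M, the number of simulation GPUs out of n, that minimizes the
    pipeline bottleneck.  sim_time(m) / vis_time(m) are assumed per-time-step
    cost models for running a stage on m GPUs."""
    best_m, best_cost = None, float("inf")
    for m in range(1, n):                         # at least one GPU per stage
        cost = max(sim_time(m), vis_time(n - m))  # pipeline runs at the slower stage
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m

# Hypothetical cost models: work divides evenly across the assigned GPUs.
sim = lambda m: 12.0 / m      # simulation is 12 units of work per step
vis = lambda m: 4.0 / m       # rendering is 4 units of work per step
print(schedule(4, sim, vis))  # -> 3: simulation gets 3 of the 4 GPUs
```

Because the pipeline's throughput is set by its slowest stage, minimizing max(T_sim, T_vis) balances the two groups.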
Case Study Application
• N-body simulation with ray-traced rendering
• Performance model parameters:
  – Simulation: number of iterations (i), number of simulated bodies (p)
  – Rendering: number of samples for super-sampling (s)
• Scheduling optimization: M_t = f(i_t, s_t, p_t)
Static Load-Balancing
• Assumption: the performance parameters do NOT change at run-time.
• Data-driven modeling approach:
  – Sample the 3-dimensional (i, s, p) space on a regular grid
  – Use tri-linear interpolation to estimate the result for new inputs
Since the parameters are fixed, M_t = f(i_t, s_t, p_t) reduces to M = f(i, s, p).
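The static approach — sampling the (i, s, p) space on a grid and tri-linearly interpolating — can be sketched as follows. The grid axes and the sampled M values here are made up for illustration:

```python
import numpy as np

# Hypothetical sample grid over (i, s, p): iterations, super-sampling, particles.
i_axis = np.array([20.0, 40.0, 80.0])
s_axis = np.array([1.0, 4.0, 16.0])
p_axis = np.array([1000.0, 2000.0, 4000.0])
# M[a, b, c] = measured best number of simulation GPUs at that grid point
# (made-up numbers for illustration).
M = np.random.default_rng(0).uniform(1, 7, (3, 3, 3))

def trilinear(i, s, p):
    """Tri-linearly interpolate M = f(i, s, p) from the sampled grid."""
    def locate(x, axis):
        # Index of the grid cell containing x, and the fractional position in it.
        k = np.clip(np.searchsorted(axis, x) - 1, 0, len(axis) - 2)
        t = (x - axis[k]) / (axis[k + 1] - axis[k])
        return k, t
    (a, ta), (b, tb), (c, tc) = locate(i, i_axis), locate(s, s_axis), locate(p, p_axis)
    # Blend the 8 surrounding grid points, one axis at a time.
    cube = M[a:a + 2, b:b + 2, c:c + 2]
    plane = cube[0] * (1 - ta) + cube[1] * ta   # interpolate along i
    line = plane[0] * (1 - tb) + plane[1] * tb  # along s
    return line[0] * (1 - tc) + line[1] * tc    # along p

# At a grid point the interpolant reproduces the sample exactly:
print(np.isclose(trilinear(40.0, 4.0, 2000.0), M[1, 1, 1]))  # True
```

In production one would typically reach for a library routine such as SciPy's `RegularGridInterpolator` rather than hand-rolling the blend.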
Static Load-Balancing: Results
• Performance Parameter Sampling
• Load Balancing
[Plots: performance parameter sampling and load balancing — 16 samples, 80 iterations vs. 4 samples, 80 iterations]
Dynamic Load Balancing
• Assumption: performance parameters change at run-time.
• Indirect load-balance indicators:
  – Execution time of the previous time step
  – Problem: performance can differ dramatically between consecutive time steps
  – Remedy: also use the fullness of the buffer, F
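A hedged sketch of a feedback rule combining the two indicators; the transcript does not give the exact scheduling algorithm, so the split ratio and thresholds below are assumptions:

```python
def dynamic_m(n, t_sim_prev, t_vis_prev, fullness):
    """One step of an illustrative feedback scheduler.

    t_sim_prev / t_vis_prev: per-stage times measured in the previous time
    step; fullness: FIFO-buffer occupancy in [0, 1].  Returns M, the number
    of simulation GPUs out of n for the next time step."""
    # The ratio of simulation work to total work suggests the GPU split ...
    m = round(n * t_sim_prev / (t_sim_prev + t_vis_prev))
    # ... but abrupt workload changes make previous-step times unreliable,
    # so the buffer fullness acts as a correction term.
    if fullness > 0.75:
        m -= 1            # buffer full: simulation too fast, shed a sim GPU
    elif fullness < 0.25:
        m += 1            # buffer empty: simulation too slow, add a sim GPU
    return max(1, min(n - 1, m))   # keep at least one GPU per stage

print(dynamic_m(8, t_sim_prev=3.0, t_vis_prev=1.0, fullness=0.5))  # -> 6
```

The fullness term damps the scheduler: even if the timing estimate is briefly wrong, a persistently full or empty buffer pulls the split back toward balance.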
Dynamic Load Balancing: Result
• Stability of the Dynamic Scheduling Algorithm
[Plots: stability of the dynamic scheduler — one run with no parameter change (only at the beginning), one where parameters change at the dotted line]
Comparison: Dynamic vs. Static Scheduling
[Plots: performance speedup of dynamic over static load balancing, for 2000 and 4000 particles]
Conclusion
+ Pipelining
+ Dynamic load balancing
- Fine-granularity load balancing (SM level)
- Communication overhead
- Programmability: software framework, library
Questions?
• Contact Information:
  Yong Cao
  Computer Science Department, Virginia Tech
  Email: [email protected]
  Website: www.cs.vt.edu/~yongcao