TRANSCRIPT
Multi-GPU Load Balancing for Simulation and Rendering
Yong Cao, Computer Science Department, Virginia Tech, USA
In-situ Visualization and Visual Analytics
• Instant visualization and interaction of computing tasks
• Applications:
  – Computational Fluid Dynamics
  – Seismic Propagation
  – Molecular Dynamics
  – Network Security Analysis
  – …
Parallel Execution – Task Split
[Figure: task split — each processor alternates between tasks T1 and T2, exchanging results through memory (data write / data read)]
Problem: Task (Context) Switch
• Disadvantages of context switching:
  – Overhead of another kernel launch
  – Flushing of the cache lines
  – Disallows persistent threads
Parallel Execution: Pipelining
[Figure: pipelining — Processor 1 runs Task 1 for time steps t, t+1, …, writing each step's results to memory; Processor 2 runs Task 2, reading each step's results while Processor 1 works on the next step]
+ Simplified kernel for each GPU
+ Better shared-memory and cache usage
+ Persistent threads for distributed scheduling
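A minimal sketch of this two-stage producer/consumer pipeline, using Python threads and a bounded queue as stand-ins for the two GPUs and the FIFO data buffer (all names and timings here are illustrative, not from the talk):

```python
import queue
import threading
import time

def simulate(step):
    """Stand-in for a simulation kernel; produces the data for one time step."""
    time.sleep(0.01)          # pretend GPU work
    return {"step": step}

def render(frame):
    """Stand-in for a rendering kernel consuming one simulated frame."""
    time.sleep(0.01)
    return f"image_{frame['step']}"

def sim_worker(buf, n_steps):
    # Producer: writes each simulated frame into the FIFO buffer.
    for step in range(n_steps):
        buf.put(simulate(step))
    buf.put(None)             # sentinel: no more frames

def vis_worker(buf, images):
    # Consumer: reads frames from the buffer and renders them.
    while True:
        frame = buf.get()
        if frame is None:
            break
        images.append(render(frame))

buf = queue.Queue(maxsize=4)  # bounded FIFO data buffer between the two stages
images = []
t1 = threading.Thread(target=sim_worker, args=(buf, 8))
t2 = threading.Thread(target=vis_worker, args=(buf, images))
t1.start(); t2.start(); t1.join(); t2.join()
print(images)                 # frames arrive in FIFO order: image_0 … image_7
```

The bounded queue also models the back-pressure of a real FIFO buffer: when it is full, the simulation stage blocks until the rendering stage catches up.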
Problem: bubbles in the pipeline — when one stage takes longer than the other, the faster processor sits idle.
Multi-GPU Pipeline Architecture
[Figure: multi-GPU pipeline architecture — the multi-GPU array is split into simulation (Sim) GPUs and visualization (Vis) GPUs; Sim GPUs write each time step's results into a FIFO data buffer, and Vis GPUs read them out, so simulation of later time steps overlaps with rendering of earlier ones across time steps 1 … n]
Adaptive Load Balancing
[Figure: adaptive and distributed scheduling — GPUs in the multi-GPU array are reassigned between the Sim and Vis groups based on the fullness of the FIFO data buffer:
  – Full buffer: shift GPUs toward rendering (more Vis GPUs)
  – Empty buffer: shift GPUs toward simulation (more Sim GPUs)]
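The buffer-driven shifting described above can be sketched as a simple rebalancing rule (the threshold values are illustrative assumptions, not from the talk):

```python
def rebalance(n_sim, n_vis, fullness, high=0.75, low=0.25):
    """Shift one GPU between the simulation and rendering groups based on
    FIFO-buffer fullness (0.0 = empty, 1.0 = full).

    `high` and `low` are hypothetical thresholds for illustration.
    """
    if fullness > high and n_sim > 1:
        # Buffer filling up: simulation is ahead, give a GPU to rendering.
        n_sim, n_vis = n_sim - 1, n_vis + 1
    elif fullness < low and n_vis > 1:
        # Buffer draining: rendering is ahead, give a GPU to simulation.
        n_sim, n_vis = n_sim + 1, n_vis - 1
    return n_sim, n_vis

print(rebalance(3, 3, 0.9))   # full buffer  -> (2, 4): shift toward rendering
print(rebalance(3, 3, 0.1))   # empty buffer -> (4, 2): shift toward simulation
```

Calling this once per time step keeps the pipeline stages roughly matched without any central coordinator, which is what makes the scheduling distributed.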
Task Partition
• Intra-frame partition
• Inter-frame partition
[Figure: intra-frame partition — all processors work on the same time step t; inter-frame partition — each processor works on a different time step t, t+1, t+2, t+3]
Task Partition for Visual Simulation
• Simulation: intra-frame partition
• Rendering: inter-frame partition
Problem: Scheduling Algorithm
• Performance Model:
• Schedule to optimize:
n: the number of assigned GPUs.
M_t: the number of GPUs assigned to simulation at time step t.
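The slide's performance model and objective function are not reproduced in the transcript. As an illustration of the kind of optimization involved, here is a hedged sketch that picks M, the number of simulation GPUs out of n, to minimize the slower pipeline stage; the per-stage cost models `sim_time` and `vis_time` are hypothetical:

```python
def schedule(n, sim_time, vis_time):
    """Pick M, the number of simulation GPUs out of n, that minimizes the
    pipeline bottleneck.  sim_time(m) / vis_time(m) are assumed per-time-step
    cost models for running a stage on m GPUs."""
    best_m, best_cost = None, float("inf")
    for m in range(1, n):                         # at least one GPU per stage
        cost = max(sim_time(m), vis_time(n - m))  # pipeline runs at the slower stage
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m

# Hypothetical cost models: work divides evenly across the assigned GPUs.
sim = lambda m: 12.0 / m      # simulation is 12 units of work per step
vis = lambda m: 4.0 / m       # rendering is 4 units of work per step
print(schedule(4, sim, vis))  # -> 3: simulation gets 3 of the 4 GPUs
```

Because the pipeline's throughput is set by its slowest stage, minimizing max(T_sim, T_vis) balances the two groups.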
Case Study Application
• N-body simulation with ray-traced rendering
• Performance model parameters:
  – Simulation: number of iterations (i), number of simulated bodies (p)
  – Rendering: number of samples for super-sampling (s)
• Scheduling optimization: M_t = f(i_t, s_t, p_t)
Static Load-Balancing
• Assumption: the performance parameters do NOT change at run-time.
• Data-driven modeling approach:
  – Sample the 3-dimensional (i, s, p) space on a regular grid
  – Use tri-linear interpolation to estimate the result for new inputs
Since the parameters are fixed, M_t = f(i_t, s_t, p_t) reduces to M = f(i, s, p).
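The static approach — sampling the (i, s, p) space on a grid and tri-linearly interpolating — can be sketched as follows. The grid axes and the sampled M values here are made up for illustration:

```python
import numpy as np

# Hypothetical sample grid over (i, s, p): iterations, super-sampling, particles.
i_axis = np.array([20.0, 40.0, 80.0])
s_axis = np.array([1.0, 4.0, 16.0])
p_axis = np.array([1000.0, 2000.0, 4000.0])
# M[a, b, c] = measured best number of simulation GPUs at that grid point
# (made-up numbers for illustration).
M = np.random.default_rng(0).uniform(1, 7, (3, 3, 3))

def trilinear(i, s, p):
    """Tri-linearly interpolate M = f(i, s, p) from the sampled grid."""
    def locate(x, axis):
        # Index of the grid cell containing x, and the fractional position in it.
        k = np.clip(np.searchsorted(axis, x) - 1, 0, len(axis) - 2)
        t = (x - axis[k]) / (axis[k + 1] - axis[k])
        return k, t
    (a, ta), (b, tb), (c, tc) = locate(i, i_axis), locate(s, s_axis), locate(p, p_axis)
    # Blend the 8 surrounding grid points, one axis at a time.
    cube = M[a:a + 2, b:b + 2, c:c + 2]
    plane = cube[0] * (1 - ta) + cube[1] * ta   # interpolate along i
    line = plane[0] * (1 - tb) + plane[1] * tb  # along s
    return line[0] * (1 - tc) + line[1] * tc    # along p

# At a grid point the interpolant reproduces the sample exactly:
print(np.isclose(trilinear(40.0, 4.0, 2000.0), M[1, 1, 1]))  # True
```

In production one would typically reach for a library routine such as SciPy's `RegularGridInterpolator` rather than hand-rolling the blend.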
Static Load-Balancing: Results
• Performance Parameter Sampling
• Load Balancing
[Plots: performance parameter sampling and load balancing — 16 samples, 80 iterations vs. 4 samples, 80 iterations]
Dynamic Load Balancing
• Assumption: performance parameters change at run-time.
• Indirect load-balance indicators:
  – Execution time of the previous time step
  – Problem: performance can differ dramatically between consecutive time steps
  – Remedy: also use the fullness of the buffer, F
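A hedged sketch of a feedback rule combining the two indicators; the transcript does not give the exact scheduling algorithm, so the split ratio and thresholds below are assumptions:

```python
def dynamic_m(n, t_sim_prev, t_vis_prev, fullness):
    """One step of an illustrative feedback scheduler.

    t_sim_prev / t_vis_prev: per-stage times measured in the previous time
    step; fullness: FIFO-buffer occupancy in [0, 1].  Returns M, the number
    of simulation GPUs out of n for the next time step."""
    # The ratio of simulation work to total work suggests the GPU split ...
    m = round(n * t_sim_prev / (t_sim_prev + t_vis_prev))
    # ... but abrupt workload changes make previous-step times unreliable,
    # so the buffer fullness acts as a correction term.
    if fullness > 0.75:
        m -= 1            # buffer full: simulation too fast, shed a sim GPU
    elif fullness < 0.25:
        m += 1            # buffer empty: simulation too slow, add a sim GPU
    return max(1, min(n - 1, m))   # keep at least one GPU per stage

print(dynamic_m(8, t_sim_prev=3.0, t_vis_prev=1.0, fullness=0.5))  # -> 6
```

The fullness term damps the scheduler: even if the timing estimate is briefly wrong, a persistently full or empty buffer pulls the split back toward balance.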
Dynamic Load Balancing: Result
• Stability of the Dynamic Scheduling Algorithm
[Plots: stability of the dynamic scheduler — one run with no parameter change (only at the beginning), one where parameters change at the dotted line]
Comparison: Dynamic vs. Static Scheduling
[Plots: performance speedup of dynamic over static load balancing, for 2000 and 4000 particles]
Conclusion
+ Pipelining
+ Dynamic load balancing
- Fine-granularity load balancing (SM level)
- Communication overhead
- Programmability: software framework, library
Questions?
• Contact Information:
  Yong Cao
  Computer Science Department, Virginia Tech
  Email: [email protected]
  Website: www.cs.vt.edu/~yongcao