
Page 1: Real-Time GPU Managementheechul/courses/eecs753/S20/slides/W10...GPU with Preemption Support N. Capodieci, R. Cavicchioli, M. Bertogna, A. Paramakuru. RTSS, 2018 57 Pipelined Data-Parallel

Real-Time GPU Management

Heechul Yun

1

Page 2:

This Week

• Topic: General-Purpose Graphics Processing Unit (GPGPU) management

• Today

– GPU architecture

– GPU programming model

– Challenges

– Real-Time GPU management

2

Page 3:

History

• GPU
– Graphics is embarrassingly parallel by nature
– GeForce 6800 (2003): 53 GFLOPS (MUL)
– Some PhD students tried to use the GPU for general-purpose computing, but it was difficult to program

• GPGPU
– Ian Buck (Stanford PhD, 2004) joined Nvidia and created the CUDA language and runtime
– General purpose: (relatively) easy to program; many scientific applications

3

Page 4:

Discrete GPU

• Add-on PCIe cards on a PC
– GPU and CPU memories are separate
– GPU memory (GDDR) is much faster than CPU memory (DDR)

4

[Figure: Nvidia Tesla K80 (4992 GPU cores, Graphic DRAM) connected to an Intel Core i7 (4 CPU cores, Host DRAM) over PCIe 3.0]

Page 5:

Integrated CPU-GPU SoC

• Tighter integration of CPU and GPU

– Memory is shared by both CPU and GPU

– Good for embedded systems (e.g., smartphone)

5

[Figure: Nvidia Tegra K1: CPU cores (Core1-Core4) and GPU cores behind a shared memory controller and shared DRAM. Image credit: GPUSync: A Framework for Real-Time GPU Management]

Page 6:

NVIDIA Titan Xp

• 3840 CUDA cores, 12GB GDDR5X

• Peak performance: 12 TFLOPS

6

Page 7:

NVIDIA Jetson TX2

• 256 CUDA GPU cores + 4 CPU cores

7

Image credit: T. Amert et al., “GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed,” RTSS'17

Page 8:

NVIDIA Jetson Platforms

8

Page 9:

CPU vs. GPGPU

• CPU
– Designed to run sequential programs faster
– High ILP: pipelined, superscalar, out-of-order, multi-level cache hierarchy
– Powerful, but complex and big

• GPGPU
– Designed to compute math faster on embarrassingly parallel data (e.g., pixels)
– No need for complex logic (no superscalar, out-of-order, cache)
– Simple, less powerful, but small: can put many of them on a chip

9

Pages 10-15:

[Slides 10-15: figures] “From Shader Code to a Teraflop: How GPU Shader Cores Work”, Kayvon Fatahalian, Stanford University

Page 16:

GPU Programming Model

• Host = CPU

• Device = GPU

• Kernel

– Function that executes on the device

– Multiple threads execute each kernel

16
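The model above can be emulated in plain C to make the thread-to-data mapping concrete. This is an illustrative sketch (not NVIDIA code): the nested loops stand in for CUDA's grid of blocks and the threads within each block, and each emulated thread computes exactly one array element, as a kernel body like `c[i] = a[i] + b[i]` would.

```c
#include <assert.h>

/* A CUDA kernel such as
 *   __global__ void add(float *c, const float *a, const float *b)
 *   { int i = blockIdx.x * blockDim.x + threadIdx.x; c[i] = a[i] + b[i]; }
 * assigns one array element to each GPU thread. The loops below emulate
 * that mapping on the CPU: the outer loop stands in for the grid of
 * blocks, the inner loop for the threads within one block. */
static void add_emulated(float *c, const float *a, const float *b,
                         int grid_dim, int block_dim)
{
    for (int block_idx = 0; block_idx < grid_dim; block_idx++)
        for (int thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            int i = block_idx * block_dim + thread_idx; /* global thread id */
            c[i] = a[i] + b[i];
        }
}
```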

Pages 17-20:

[Slides 17-20: figures] Source: http://www.sdsc.edu/us/training/assets/docs/NVIDIA-02-BasicsOfCUDA.pdf

Page 21:

Challenges for Discrete GPU

• Data movement problem
– Host memory <-> GPU memory
– Copy overhead can be high

• Scheduling problem
– Limited ability to prioritize important GPU kernels
– Most (older) GPUs don’t support preemption
– Newer GPUs support preemption within a process

21

[Figure: data path from a user buffer to a kernel buffer on the CPU side, then over PCIe 3.0 from Host DRAM to Graphic DRAM on the GPU]

Page 22:

Data Movement Challenge

22

[Figure: Nvidia Tesla K80 (4992 GPU cores, Graphic DRAM at 480 GB/s) and Intel Core i7 (4 CPU cores, Host DRAM at 25 GB/s) connected by PCIe 3.0 at 16 GB/s]

Data transfer is the bottleneck
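A back-of-envelope check with the slide's nominal peak numbers shows the gap. The helper below is an illustrative sketch, not a benchmark: moving 1 GB over PCIe 3.0 at 16 GB/s takes about 62.5 ms, while the K80 can stream the same gigabyte from its own GDDR at 480 GB/s in roughly 2 ms.

```c
#include <assert.h>

/* Back-of-envelope from the slide's numbers: copying data over PCIe 3.0
 * (~16 GB/s) is ~30x slower than the K80 reading the same data from its
 * own GDDR (~480 GB/s), which is why the copy often dominates. */
static double transfer_ms(double bytes, double gbytes_per_sec)
{
    return bytes / (gbytes_per_sec * 1e9) * 1e3; /* milliseconds */
}
```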

Page 23:

An Example

23

[Figure: a pipeline whose stages alternate CPU, GPU, GPU, CPU]

PTask: Operating System Abstractions To Manage GPUs as Compute Devices, SOSP'11

Page 24:

Inefficient Data Migration

#> capture | xform | filter | detect &

[Figure: running this four-stage camera pipeline across the OS and GPU: each stage (capture, xform, filter, detect) goes through read()/write() calls, user/kernel copies (copyto/copyfrom GPU), and PCI transfers (PCI-xfer) via the camera driver, GPU driver, HID driver, and OS executive]

24

Acknowledgement: This slide is from the paper author’s slide

A lot of copies

Page 25:

Scheduling Challenge

CPU priorities do not apply to the GPU

A long-running GPU task (xform) is not preemptible, delaying a short GPU task (mouse update)

PTask: Operating System Abstractions To Manage GPUs as Compute Devices, SOSP'11

Page 26:

Challenges for Integrated CPU-GPU

• Memory is shared by both CPU and GPU

• Data movement may be easier but…

26

[Figure: Nvidia Tegra X2: CPU cores (Core1-Core4, each with a PMC) and GPU cores behind a shared memory controller and shared DRAM (16 GB/s). Image credit: GPUSync: A Framework for Real-Time GPU Management]

Page 27:

Memory Bandwidth Contention

27

[Figure: GPU application slowdown as 0-3 memory-intensive CPU co-runners are added]

Co-scheduling a memory-intensive CPU task affects GPU performance on an integrated CPU-GPU SoC

Waqar Ali, Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. In ECRTS, 2018

Page 28:

Summary

• GPU architecture
– Many simple in-order cores

• GPU programming model
– SIMD

• Challenges
– Data movement cost
– Scheduling
– Bandwidth bottleneck
– NOT time predictable!

28

Page 29:

Real-Time GPU Management

• Goal

– Time predictable and efficient GPU sharing in multi-tasking environment

• Challenges

– High data copy overhead

– Real-time scheduling support -- preemption

– Shared resource (bandwidth) contention

29

Page 30:

References

• TimeGraph: GPU scheduling for real-time multi-tasking environments. In USENIX ATC, 2011.

• Gdev: First-class GPU resource management in the operating system. In USENIX ATC, 2012.

• GPES: A preemptive execution system for GPGPU computing. In RTAS, 2015.

• GPUSync: A framework for real-time GPU management. In RTSS, 2013.

• A server-based approach for predictable GPU access control. In RTCSA, 2017.

30

Page 31:

Real-Time GPU Scheduling

• Early real-time GPU schedulers
– TimeGraph
– Gdev

• GPU kernel slicing
– GPES

• Synchronization (lock) based approach
– GPUSync

• Server-based approach
– GPU server

31

Page 32:

GPU Software Stack

32

Acknowledgement: This slide is from the paper author’s slide

Gdev: First-class gpu resource management in the operating system. In ATC, 2012.

Page 33:

TimeGraph

• First work to support “soft” real-time GPU scheduling

• Implemented at the device-driver level

33

S. Kato, K. Lakshmanan, R. R. Rajkumar, and Y. Ishikawa, “TimeGraph: GPU scheduling for real-time multi-tasking environments,” in USENIX ATC, 2011

Page 34:

TimeGraph Scheduling

• GPU commands are not immediately sent to the GPU if it is busy

• High-priority GPU commands are scheduled when the GPU becomes idle

34
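The queue-then-dispatch idea can be sketched in C. This is a toy model, not TimeGraph's implementation: commands submitted while the GPU is busy are held in a pending queue, and when the GPU goes idle the highest-priority pending command is dispatched first.

```c
#include <assert.h>

/* Toy sketch of TimeGraph-style command scheduling: commands arriving
 * while the GPU is busy are queued instead of being sent to hardware;
 * when the GPU goes idle, the highest-priority pending command runs. */
#define MAXQ 16

struct cmd { int prio; int id; };

struct gsched {
    struct cmd q[MAXQ];
    int n;        /* number of queued commands */
    int gpu_busy;
    int running;  /* id of the command on the GPU, -1 if idle */
};

static void submit(struct gsched *s, int prio, int id)
{
    if (!s->gpu_busy) {            /* GPU idle: dispatch immediately */
        s->gpu_busy = 1;
        s->running = id;
    } else {                       /* GPU busy: hold the command back */
        s->q[s->n].prio = prio;
        s->q[s->n].id = id;
        s->n++;
    }
}

static void gpu_done(struct gsched *s)
{
    s->gpu_busy = 0;
    s->running = -1;
    if (s->n == 0) return;
    int best = 0;                  /* pick highest-priority queued command */
    for (int i = 1; i < s->n; i++)
        if (s->q[i].prio > s->q[best].prio) best = i;
    s->gpu_busy = 1;
    s->running = s->q[best].id;
    s->q[best] = s->q[--s->n];     /* remove from queue */
}
```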

Page 35:

GDev

• Implemented at the kernel level on top of the stock GPU driver

35

Acknowledgement: This slide is from the paper author’s slide

Gdev: First-class gpu resource management in the operating system. In ATC, 2012.

Page 36:

TimeGraph Scheduling

• High-priority tasks can still suffer long delays

• Due to lack of hardware preemption

36

Acknowledgement: This slide is from the paper author’s slide
Gdev: First-class GPU resource management in the operating system. In ATC, 2012.

Page 37:

GDev’s BAND Scheduler

• Monitors consumed bandwidth and adds delay so that high-priority requests can be served

• Non-work conserving, no real-time guarantee

37

Acknowledgement: This slide is from the paper author’s slide

Gdev: First-class gpu resource management in the operating system. In ATC, 2012.
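The BAND policy can be sketched as simple bandwidth accounting (a toy model, not Gdev's code): each task has an assigned share of GPU time, and a task that has consumed more than its share is delayed until its usage falls back under that share, leaving room for high-priority requests. The non-work-conserving nature is visible here: the GPU may sit idle while an overrunning task is held back.

```c
#include <assert.h>

/* Rough sketch of Gdev's BAND idea: track how much GPU time a task has
 * consumed against its assigned share, and make a task that overran
 * wait before its next submission. */
struct band_task {
    double share;    /* fraction of GPU time this task is entitled to */
    double used;     /* GPU time consumed so far */
    double elapsed;  /* wall-clock time observed so far */
};

/* Returns how long (same time unit) the task must wait before
 * submitting again; 0 if it is within its bandwidth share. */
static double band_delay(const struct band_task *t)
{
    double entitled = t->share * t->elapsed;
    if (t->used <= entitled) return 0.0;
    /* wait until elapsed time grows so that used == share * elapsed */
    return t->used / t->share - t->elapsed;
}
```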

Page 38:

GPES

• Based on Gdev

• Implements kernel slicing to reduce blocking latency

38

(*) GPES: A Preemptive Execution System for GPGPU Computing, RTAS'15
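Kernel slicing can be sketched as follows (hypothetical code, not GPES's implementation): a long kernel over n elements is launched as ceil(n/slice) short kernels, and each gap between launches is a yield point where a pending higher-priority kernel could be dispatched, so worst-case blocking shrinks from the whole kernel to one slice.

```c
#include <assert.h>

static long g_sum;  /* stands in for the kernel's output */

/* One small "kernel launch" covering elements [lo, hi). */
static void slice_body(int lo, int hi)
{
    for (int i = lo; i < hi; i++)
        g_sum += i;
}

/* Launch a logical kernel over n elements as ceil(n/slice) slices;
 * returns the number of launches. The gap after each launch is where a
 * real scheduler would check for a pending higher-priority kernel. */
static int run_sliced(int n, int slice)
{
    int launches = 0;
    for (int lo = 0; lo < n; lo += slice) {
        int hi = lo + slice < n ? lo + slice : n;
        slice_body(lo, hi);
        launches++;
        /* yield point: dispatch a waiting high-priority kernel here */
    }
    return launches;
}
```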

Page 39:

39

GPUSync

http://www.ece.ucr.edu/~hyoseung/pdf/rtcsa17-gpu-server-slides.pdf

Pages 40-43:

[Slides 40-43: figures] http://www.ece.ucr.edu/~hyoseung/pdf/rtcsa17-gpu-server-slides.pdf

Page 44:

Hardware Preemption

• Recent GPUs (NVIDIA Pascal) support hardware preemption capability

– Problem solved?

• Issues

– Works only between GPU streams within a single address space (process)

– High context switching overhead

• ~100us per context switch (*)

44

(*) AnandTech, “Preemption Improved: Fine-Grained Preemption for Time-Critical Tasks”
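The quoted ~100 us figure translates directly into overhead. A rough cost model (assumed numbers, not measurements): if a workload forces one preemption per scheduling period, context switching alone consumes switch_time/period of the GPU, which is already 10% at a 1 ms period.

```c
#include <assert.h>

/* Fraction of GPU time lost to context switching, assuming one
 * preemption per scheduling period. */
static double preempt_overhead(double switch_us, double period_us)
{
    return switch_us / period_us;
}
```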

Page 45:

Hardware Preemption

45

Page 46:

Discussion

• Long running low priority GPU kernel?

• Memory interference from the CPU?

46

Page 47:

Challenges for Integrated CPU-GPU

• Memory is shared by both CPU and GPU

• Data movement may be easier but…

47

[Figure: Nvidia Tegra X2: CPU cores (Core1-Core4, each with a PMC) and GPU cores behind a shared memory controller and shared DRAM (16 GB/s). Image credit: GPUSync: A Framework for Real-Time GPU Management]

Page 48:

References

• SiGAMMA: server based integrated GPU arbitration mechanism for memory accesses, RTNS, 2017

• GPUguard: towards supporting a predictable execution model for heterogeneous SoC, DATE, 2017

• Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms, ECRTS, 2018

48

Page 49:

SiGAMMA

• Protects PREM-compliant real-time CPU tasks

• Throttles the GPU when the CPU is in a memory phase

49

Page 50:

SiGAMMA

• GPU is throttled by launching a high-priority spinning kernel

50

Page 51:

GPUGuard

• PREM-compliant GPU and CPU tasks

• CPU is throttled using MemGuard

• Needs GPU source code modification

51

Page 52:

BWLOCK++

• Protects real-time GPU tasks by throttling the CPU

• Runtime binary instrumentation overriding the CUDA API
– No source code modification is needed

52

Waqar Ali, Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. In ECRTS, 2018

Page 53:

Dynamic Instrumentation

• Starts/stops throttling by instrumenting CUDA API calls

53

[Figure: CPU-side CUDA call sequence (cudaMalloc, cudaMemcpy, kernel<<<...>>>, cudaMemcpy, cudaFree); the binary is instrumented via LD_PRELOAD so that cudaLaunch() acquires the memory bandwidth lock and cudaSynchronize() releases it]

No source code modification is needed
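The interposition pattern can be sketched in C. Names and control flow here are illustrative, not BWLOCK++'s actual code: in the real system an LD_PRELOAD library resolves the genuine CUDA entry points with dlsym(RTLD_NEXT, ...), acquires the memory-bandwidth lock (via a kernel module that throttles the CPU cores) before a kernel launch, and releases it on synchronization. The stubs below stand in for the dlsym'd functions so the flow is runnable anywhere.

```c
#include <assert.h>

/* Hypothetical sketch of BWLOCK++-style CUDA API interposition. */
static int g_bwlock_held;  /* in reality: an ioctl to a kernel module  */
static int g_kernel_ran;

/* Stub for the real cudaLaunch found via dlsym(RTLD_NEXT, ...). */
static void real_launch(void) { g_kernel_ran = 1; }

/* The LD_PRELOAD'd wrapper: take the bandwidth lock, then forward. */
static void wrapped_launch(void)
{
    g_bwlock_held = 1;   /* acquire lock: memory-intensive CPU tasks throttled */
    real_launch();
}

/* Wrapper for cudaSynchronize: the kernel is done, release the lock. */
static void wrapped_synchronize(void)
{
    /* the real wrapper would call the dlsym'd synchronize here */
    g_bwlock_held = 0;   /* release: CPU cores run at full speed again */
}
```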

Page 54:

BWLOCK++

• Real-time GPU kernels are protected.

54

Page 55:

Summary

• Integrated CPU-GPU SoC
– Shared main memory is a source of interference

• SiGAMMA
– PREM-compliant CPU tasks
– Throttles the GPU to protect real-time CPU tasks

• GPUGuard
– PREM-compliant GPU (and CPU) tasks
– Needs GPU source code modification

• BWLOCK++
– Throttles the CPU (MemGuard) to protect real-time GPU tasks
– Runtime instrumentation (no code modification)

55

Page 56:

Discussion Papers

• Deadline-based Scheduling for GPU with Preemption Support, RTSS, 2018

• Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference, RTSS, 2019

56

Page 57:

Deadline-based Scheduling for GPU with Preemption Support

N. Capodieci, R. Cavicchioli, M. Bertogna, A. Paramakuru.

RTSS, 2018

57

Page 58:

Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference

Yecheng Xiang and Hyoseung Kim

RTSS’19

58

Page 59:

Cameras in Self-Driving Car

59

Page 60:

NVIDIA Jetson TX2

60

Page 61:

DNN Layer Processing Time

61

Page 62:

Problem

• Processing DNNs on a heterogeneous platform (e.g., TX2) is not efficient
– Using one type of resource (e.g., GPU) may not always be the best
– The CPUs don’t do much (if anything), and GPU utilization is low for many layers
– Also, sometimes the CPUs are faster than the GPU

• How to process multiple DNNs efficiently on a heterogeneous platform?

• How to process multiple DNNs efficiently on a heterogeneous platform?

62

Page 63:

Key Ideas

• Group layers into multiple stages

• A stage can be processed on different computing nodes

• Dynamically matchmake stages and nodes to maximize performance while respecting real-time requirements

63
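The matching idea can be sketched as a per-stage choice (a toy greedy model with invented costs, not the paper's algorithm): given each stage's measured time on the CPU and on the GPU, pick the cheaper node per stage; when some layers run faster on the CPU, the mixed plan beats an all-GPU one.

```c
#include <assert.h>

/* Toy stage-to-node matching: for each pipeline stage, choose CPU or
 * GPU based on measured per-stage cost; returns the total plan cost and
 * records the choice (1 = GPU, 0 = CPU) in pick_gpu[]. */
static double plan_cost(const double cpu_ms[], const double gpu_ms[],
                        int stages, int pick_gpu[])
{
    double total = 0.0;
    for (int i = 0; i < stages; i++) {
        pick_gpu[i] = gpu_ms[i] < cpu_ms[i];
        total += pick_gpu[i] ? gpu_ms[i] : cpu_ms[i];
    }
    return total;
}
```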

Page 64:

64

Page 65:

65

Page 66:

Summary & Discussion

• Efficiently schedule multiple DNN models on a heterogeneous computing platform

• Exploit DNN layer-level differences on best computing resource configurations

• Improve throughput and support real-time scheduling

• Downsides?

66