
Page 1:

rCUDA 4: GPGPU as a service in HPC clusters

Rafael Mayo
Antonio J. Peña

Universitat Jaume I, Castelló (Spain)

Joint research effort

Page 2:

Outline

GPU computing
GPU computing scenarios
The wish list
Introduction to rCUDA
rCUDA functionality
rCUDA v3 (stable)
rCUDA v4 (alpha)
Current status

HPC Advisory Council Spain Conference 2012 2/72


Page 4:

GPU computing

GPU computing: covers all the technological issues involved in using the GPU computational power to execute general-purpose code

GPU computing has experienced remarkable growth in the last years

Page 5:

GPU computing

The basic building block is a node with 1 or more GPUs

[Figure: a node — one or more CPUs sharing the main memory, GPUs each with their own memory attached through PCI-e, and a network interface]

Page 6:

GPU computing

From the programming point of view, a set of nodes, each one with:
one or more CPUs (with several cores per CPU)
one or more GPUs (1-4)
An interconnection network

[Figure: several such nodes linked by an interconnection network]

Page 7:

GPU computing

Development tools have been introduced in order to ease the programming of GPUs

Two main approaches in GPU computing development environments:
CUDA: NVIDIA proprietary
OpenCL: open standard

Page 8:

GPU computing

Basically, CUDA and OpenCL have the same working scheme:

Compilation: separate CPU code from GPU code (the GPU kernel)

Execution: data transfers between the CPU and GPU memory spaces
1. Before GPU kernel execution: data from the CPU memory space to the GPU memory space
2. Computation: kernel execution
3. After GPU kernel execution: results from the GPU memory space to the CPU memory space
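The three-step flow above can be sketched from the host's point of view. The snippet below is a toy model, not real CUDA code: plain malloc/memcpy stand in for cudaMalloc/cudaMemcpy, a simple loop plays the role of the GPU kernel, and the function names are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model of the CUDA/OpenCL working scheme (not real CUDA code):
 * malloc/memcpy stand in for cudaMalloc/cudaMemcpy, and a loop stands
 * in for the GPU kernel. */
static void fake_kernel(float *d_buf, int n) {
    for (int i = 0; i < n; ++i)       /* 2. computation: the "kernel" runs */
        d_buf[i] *= 2.0f;             /*    on the device copy of the data */
}

int run_gpu_job(const float *host_in, float *host_out, int n) {
    float *d_buf = malloc(n * sizeof *d_buf);     /* cudaMalloc            */
    if (!d_buf) return -1;
    memcpy(d_buf, host_in, n * sizeof *d_buf);    /* 1. CPU -> GPU copy    */
    fake_kernel(d_buf, n);                        /* 2. kernel execution   */
    memcpy(host_out, d_buf, n * sizeof *d_buf);   /* 3. GPU -> CPU copy    */
    free(d_buf);                                  /* cudaFree              */
    return 0;
}
```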

Page 9:

GPU computing

Time spent on data transfers may not be negligible

[Figure: influence of data transfers for SGEMM — percentage of time devoted to data transfers vs matrix size (0-18000), for pinned and non-pinned memory]
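A back-of-the-envelope model explains the shape of such a curve. Assuming three n x n single-precision matrices cross the bus at a given GB/s rate and the kernel performs 2n^3 flops at a given GFLOP/s rate (both numbers are illustrative, not measured), the fraction of time spent on transfers falls as n grows, because compute grows as n^3 while traffic grows only as n^2:

```c
/* Fraction of total time spent moving data for a GEMM of size n,
 * under illustrative assumptions: 3 matrices of n*n floats move over
 * the bus at bw_gbs GB/s, and the kernel sustains gflops GFLOP/s
 * over 2*n^3 floating-point operations. */
double transfer_fraction(double n, double bw_gbs, double gflops) {
    double t_xfer = (3.0 * n * n * 4.0) / (bw_gbs * 1e9);   /* bus time    */
    double t_kern = (2.0 * n * n * n) / (gflops * 1e9);     /* kernel time */
    return t_xfer / (t_xfer + t_kern);
}
```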


Page 11:

GPU computing scenarios

For the right kind of code, the use of GPUs brings huge benefits in terms of performance and energy

There must be data parallelism in the code: this is the only way to take benefit from the hundreds of processors in a GPU

Different scenarios from the point of view of the application:
Low level of data parallelism
High level of data parallelism
Moderate level of data parallelism
Applications for multi-GPU computing

Page 12:

GPU computing scenarios

Low level of data parallelism
Regarding GPU computing?
No GPU is needed; just proceed with the traditional HPC strategies

High level of data parallelism
Regarding GPU computing?
Add one or more GPUs to every node in the system and rewrite applications to use them

Page 13:

GPU computing scenarios

Moderate level of data parallelism: the application presents a data parallelism around 40%-80%
Regarding GPU computing?
The GPUs in the system are used only for some parts of the application, remaining idle the rest of the time and, thus, wasting resources and energy

Page 14:

GPU computing scenarios

Applications for multi-GPU computing
An application can use a large amount of GPUs in parallel
Regarding GPU computing?
The code running in a node can only access the GPUs in that node, but it would run faster if it could have access to more GPUs


Page 16:

The wish list

Current trend in GPU deployment: each node with one or more GPUs

CONCERNS:

For applications with moderate levels of data parallelism, the GPUs in the cluster may be idle for long periods of time

Multi-GPU applications cannot make use of the tremendous GPU resources available across the cluster

Deployment and energy costs!

Page 17:

The wish list

A way of addressing the first concern, the energy concern, is by reducing the number of GPUs present in the cluster

NEW CONCERN:
A lower amount of GPUs noticeably increases the difficulty of efficiently scheduling jobs (considering global CPU and GPU requirements)

Page 18:

The wish list

A way of addressing the new scheduling problem is by reducing the number of GPUs present in the cluster and sharing the remaining ones among the CPU nodes

Additionally, by appropriately sharing GPUs, multi-GPU computing is also feasible

This would increase GPU utilization, also lowering power consumption, at the same time that initial acquisition costs are reduced
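The saving can be quantified with a simple utilization model. If each of N nodes runs a job that keeps a GPU busy only a fraction p of the time, N private GPUs each sit at utilization p, while a shared pool only needs about ceil(N*p) GPUs to serve the same aggregate load. This is an idealized sketch that ignores contention and scheduling overhead:

```c
/* Idealized sizing model for a shared GPU pool: n_nodes jobs each keep
 * a GPU busy a fraction `busy` of the time, so the pool needs only
 * enough GPUs to cover the aggregate demand (contention ignored). */
int gpus_needed_shared(int n_nodes, double busy) {
    double demand = n_nodes * busy;
    int whole = (int)demand;
    return (whole < demand) ? whole + 1 : whole;  /* ceil without math.h */
}

/* With one private GPU per node, each GPU's utilization is just `busy`. */
double private_gpu_utilization(double busy) {
    return busy;
}
```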

Page 19:

The wish list

Efficiently sharing GPUs across a cluster can be achieved by leveraging GPU virtualization

Remote GPU virtualization solutions: rCUDA, gVirtuS, vCUDA, GViM, VGPU, GridCuda

As far as we know, rCUDA is the only remote virtualization solution with CUDA 4 support

Page 20:

The wish list

Going even beyond: consolidating GPUs into dedicated servers (no CPU power) and allowing GPU task migration

TRUE GREEN GPU COMPUTING

Page 21:

The wish list

GPUs are disaggregated from CPUs and global schedulers are enhanced


Page 23:

Introduction to rCUDA

A framework enabling a CUDA-based application running in one node to access GPUs in other nodes

It is useful when you have:
Moderate level of data parallelism
Applications for multi-GPU computing

Page 24:

Introduction to rCUDA

Moderate level of data parallelism

[Figure: a cluster where every node has its own GPUs]

Adding GPUs at each node makes some GPUs remain idle for long periods. This is a waste of money and energy

Page 25:

Introduction to rCUDA

Moderate level of data parallelism

[Figure: a cluster where only some nodes have GPUs]

Add only the GPUs that are needed, considering application requirements and their level of data parallelism, and…

Page 26:

Introduction to rCUDA

Moderate level of data parallelism

[Figure: the same cluster with logical connections from every node to the GPUs]

Add only the GPUs that are needed, considering application requirements and their level of data parallelism, and make all of them accessible from every node

Page 27:

Introduction to rCUDA

Applications for multi-GPU computing

[Figure: a cluster where every node has its own GPUs]

From a given CPU it is only possible to access the corresponding GPUs in that very same node

Page 28:

Introduction to rCUDA

Applications for multi-GPU computing

[Figure: logical connections making every GPU reachable from every node]

Make all GPUs accessible from every node

Page 29:

Introduction to rCUDA

Applications for multi-GPU computing

[Figure: a single node holding logical connections to many GPUs across the cluster]

Make all GPUs accessible from every node and enable the access from a CPU to as many GPUs as required


Page 31:

rCUDA functionality

[Figure: a CUDA application running on top of the CUDA libraries]

Page 32:

rCUDA functionality

[Figure: the CUDA stack split into a client side (application, CUDA libraries) and a server side (CUDA driver + runtime)]

Page 33:

rCUDA functionality

[Figure: client side — the application runs on top of the rCUDA library; server side — the rCUDA daemon runs on top of the CUDA libraries; both sides communicate through their network devices]

Page 34:

rCUDA functionality

[Figure: on the server side, the rCUDA daemon forwards the requests to the CUDA driver + runtime]

Page 35:

rCUDA functionality

[Figure: the complete rCUDA stack — application, rCUDA library, network devices, rCUDA daemon, CUDA driver + runtime]
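The split shown in the figures can be sketched as a remote procedure call: the rCUDA library intercepts a CUDA Runtime call on the client, packs it into a message, and the daemon unpacks it and invokes the real runtime on the server. The snippet below is a minimal in-process model of that idea — the message format, handle table, and function names are invented for illustration, and plain malloc plays the part of the server's real cudaMalloc.

```c
#include <stdlib.h>

/* Minimal model of rCUDA-style call forwarding (formats invented).
 * A client stub packs the call into a message; the "daemon" unpacks
 * it and performs the real work, returning a device-memory handle. */
enum opcode { OP_MALLOC, OP_FREE };

struct message { enum opcode op; size_t size; int handle; };

#define MAX_ALLOC 16
static void *alloc_table[MAX_ALLOC];   /* server-side handle -> pointer */

/* Server side: what the rCUDA daemon would do on message arrival. */
static int daemon_dispatch(struct message *m) {
    switch (m->op) {
    case OP_MALLOC:
        for (int h = 0; h < MAX_ALLOC; ++h)
            if (!alloc_table[h]) {
                alloc_table[h] = malloc(m->size); /* stands in for cudaMalloc */
                m->handle = h;
                return alloc_table[h] ? 0 : -1;
            }
        return -1;
    case OP_FREE:
        free(alloc_table[m->handle]);             /* stands in for cudaFree */
        alloc_table[m->handle] = NULL;
        return 0;
    }
    return -1;
}

/* Client side: the stub the application links against instead of CUDA. */
int rcuda_malloc(size_t size, int *handle) {
    struct message m = { OP_MALLOC, size, -1 };
    int rc = daemon_dispatch(&m);   /* in rCUDA this crosses the network */
    *handle = m.handle;
    return rc;
}

int rcuda_free(int handle) {
    struct message m = { OP_FREE, 0, handle };
    return daemon_dispatch(&m);
}
```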

Page 36:

rCUDA functionality

rCUDA uses a proprietary communication protocol

Example:
1) initialization
2) memory allocation on the remote GPU
3) CPU to GPU memory transfer of the input data
4) kernel execution
5) GPU to CPU memory transfer of the results
6) GPU memory release
7) communication channel closing and server process finalization
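The seven-step session above implies an ordering discipline: a transfer or launch is only legal after the allocation it touches, results come after the kernel, and the channel close must come last. A toy validator for that discipline is sketched below — the opcodes and rules are invented for illustration, since the real wire protocol is proprietary.

```c
/* Toy validator for the example session of the (proprietary) protocol:
 * opcodes are invented; the rule checked is simply that allocation
 * precedes use, results come after the kernel, and CLOSE ends it. */
enum op { INIT, MALLOC, COPY_H2D, LAUNCH, COPY_D2H, FREE, CLOSE };

int session_is_valid(const enum op *ops, int n) {
    int inited = 0, alloced = 0, launched = 0, closed = 0;
    for (int i = 0; i < n; ++i) {
        if (closed) return 0;                       /* nothing after CLOSE */
        switch (ops[i]) {
        case INIT:     inited = 1; break;
        case MALLOC:   if (!inited) return 0; alloced = 1; break;
        case COPY_H2D: if (!alloced) return 0; break;
        case LAUNCH:   if (!alloced) return 0; launched = 1; break;
        case COPY_D2H: if (!launched) return 0; break;
        case FREE:     if (!alloced) return 0; alloced = 0; break;
        case CLOSE:    closed = 1; break;
        }
    }
    return closed;
}
```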


Page 38:

rCUDA v3 (stable)

Current stable release

Uses the TCP/IP stack

Runs over all TCP/IP networks

Presents some non-negligible overhead due to the use of TCP/IP

Page 39:

rCUDA v3 (stable)

[Figure: execution time for matrix-matrix multiplication (GEMM) — CPU, CUDA, and rCUDA over Gigabit Ethernet, with time split into kernel execution and data transfers; Tesla C1060, Intel Xeon E5410 (8 cores), matrix dimensions 512-16384]

J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, "An efficient implementation of GPU virtualization in high performance clusters", in Euro-Par 2009, Parallel Processing - Workshops, Lecture Notes in Computer Science 6043, pp. 385-394, Springer-Verlag, 2010.


Page 41:

rCUDA v4 (alpha): low-level InfiniBand support

InfiniBand is the most used HPC network
Low latency and high bandwidth

[Figure: interconnect share in the June Top500 list]

Page 42:

rCUDA v4 (alpha)

Same user-level functionality: CUDA Runtime
Use of IB Verbs
All the TCP/IP stack overhead is avoided
Bandwidth from the client to/from the remote GPU is near the peak InfiniBand network bandwidth
Use of GPUDirect
Reduces the number of intra-node data movements
Use of pipelined transfers
Overlaps intra-node data movements and network transfers

GPUDirect RDMA support is coming
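The pipelining idea — overlap the intra-node staging of chunk i+1 with the network send of chunk i — can be sketched as a double-buffered loop. Serial C cannot show true overlap, so the toy below only records the schedule such a pipeline would follow; the step names and chunking are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Double-buffered pipeline model: while chunk i is "on the wire",
 * chunk i+1 is being staged from host memory into the other buffer.
 * The log records the interleaving a real pipeline would achieve:
 * stage0, then send(i) overlapped with stage(i+1), then the last send. */
int pipeline_schedule(int nchunks, char *log, size_t logsz) {
    size_t used = 0;
    for (int i = -1; i < nchunks; ++i) {
        char step[32];
        int n;
        if (i < 0)
            n = snprintf(step, sizeof step, "stage%d ", 0);        /* prologue */
        else if (i + 1 < nchunks)
            n = snprintf(step, sizeof step, "send%d+stage%d ", i, i + 1);
        else
            n = snprintf(step, sizeof step, "send%d", i);          /* drain */
        if (used + n + 1 > logsz) return -1;
        memcpy(log + used, step, n + 1);
        used += n;
    }
    return 0;
}
```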

Page 43:

rCUDA v4 (alpha)

Remote GPU Transfers (GPUDirect)

[Figure: step-by-step data path of a remote GPU transfer using GPUDirect]


Page 48:

rCUDA v4 (alpha)

Remote GPU Transfers (GPUDirect RDMA) — CUDA 5, upcoming

[Figure: data path of a remote GPU transfer using GPUDirect RDMA]


Page 51:

rCUDA v4 (alpha)

[Figure: bandwidth for a 4096 x 4096 single-precision matrix — rCUDA GigaE, rCUDA IPoIB, rCUDA IBVerbs, and CUDA over QDR InfiniBand; the InfiniBand peak bandwidth is 2900 MB/s]

J. Duato, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, "rCUDA InfiniBand performance", poster, International Supercomputing Conference, 2011.

Page 52:

rCUDA v4 (alpha)

[Figure: execution time for a matrix-matrix product — Tcpu, Tgpu, and TrCUDA vs matrix dimension (100-14600); Tesla C2050, Intel Xeon E5645, QDR InfiniBand]

Page 53:

rCUDA v4 (alpha)

[Figure: overhead time for a matrix-matrix product — % overhead of GPU and rCUDA vs matrix dimension (100-14600); Tesla C2050, Intel Xeon E5645, QDR InfiniBand]

Page 54:

rCUDA v4 (alpha)

[Figure: execution time for the LAMMPS application, in.eam input script scaled by a factor of 5 in the three dimensions; Tesla C2050, Intel Xeon E5520, QDR InfiniBand]

C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, "CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution", in Proceedings of the 2012 International Conference on High Performance Computing (HiPC 2012), Pune, India, Dec. 2012.

Page 55:

rCUDA v4 (alpha)

Multi-thread capabilities (thread-safe)

Page 56:

rCUDA v4 (alpha)

Multi-server capabilities

Page 57:

rCUDA v4 (alpha)

Multi-thread and multi-server capabilities

Page 58:

rCUDA v4 (alpha)

GPUDirect 2.0 emulation is coming

Page 59:

rCUDA v4 (alpha)

Multi-thread and multi-server performance

C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, "CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution", in Proceedings of the 2012 International Conference on High Performance Computing (HiPC 2012), Pune, India, Dec. 2012.

Page 60:

rCUDA v4 (alpha)

rCUDA does not support the CUDA extensions to C

In order to execute a program within rCUDA, the CUDA extensions included in its code must be "unextended" to the plain C API

NVCC inserts calls to undocumented CUDA functions
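What the CU2rCU tool automates can be illustrated with a toy source-to-source rewrite: a triple-angle-bracket launch becomes a plain-C runtime sequence (configure the launch, set up each argument, then launch by name). The parser below is a gross simplification invented for illustration — the real tool is a full source translator, not a sscanf one-liner — and the emitted call names follow the CUDA 4-era plain runtime API.

```c
#include <stdio.h>

/* Toy model of the CU2rCU rewrite (greatly simplified): turn one
 * kernel launch written with the CUDA C <<<...>>> extension into the
 * plain C runtime calls that rCUDA can intercept. Returns 0 on success. */
int unextend_launch(const char *src, char *dst, size_t n) {
    char name[64], cfg[128], args[128];
    if (sscanf(src, " %63[^<]<<<%127[^>]>>>(%127[^)])", name, cfg, args) != 3)
        return -1;                      /* not a <<<...>>> launch */
    snprintf(dst, n,
             "cudaConfigureCall(%s);\n"
             "/* one cudaSetupArgument(...) per argument: %s */\n"
             "cudaLaunch(\"%s\");",
             cfg, args, name);
    return 0;
}
```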

Page 61:

rCUDA v4 (alpha)

CU2rCU tool

Page 62:

rCUDA v4 (alpha)

C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, "CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution", in Proceedings of the 2012 International Conference on High Performance Computing (HiPC 2012), Pune, India, Dec. 2012.


Page 64:

Current Status

How compatible with CUDA is rCUDA?

Page 65:

Current Status

CUDA Runtime API functions implemented by rCUDA

rCUDA does not provide support for graphic functions

Page 66:

Current Status

NVIDIA CUDA C SDK code samples included in the rCUDA SDK

Not supported: SDK samples using graphic functions or driver API functions

Page 67:

Current Status

rCUDA v3.2 stable release
Non-pipelined TCP/IP transfers
CUDA 4.2 support:
  Only excluding graphics interoperability
Not thread-safe
Single rCUDA server per application
Only one GPU module per application allowed
CUDA C extensions unsupported
rCUDA 3.0a experimental build for MS Windows

Page 68:

Current Status

rCUDA v4 alpha features (experimental)
CUDA 4.2 (excluding graphics interoperability)
TCP + low-level InfiniBand communications
GPU & network pipelined transfers (GPUDirect)
Multithread support for applications (thread-safe)
Multiserver capabilities
Support for CUDA C extensions (CU2rCU)
Multiple GPU modules allowed
MPI integration (MVAPICH + OpenMPI)

Page 69:

Current Status

What's coming?
CUDA 5
Inter-node GPUDirect 2 emulation
GPUDirect RDMA
rCUDA integration into SLURM
Full Runtime API support:
  Graphics interoperability
rCUDA 4 for MS Windows
…

Page 70:

http://www.rcuda.net

Free binary distribution (rCUDA v3.2 + rCUDA v4.0a):
o Fedora
o Scientific Linux (RHEL compatible)
o Ubuntu
o OpenSuse
o MS Windows

Page 71:

http://www.rcuda.net

rCUDA Team:
Antonio Peña
Carlos Reaño
Federico Silla
José Duato
Adrian Castelló
Rafael Mayo
Enrique S. Quintana-Ortí

Parallel Architectures Group
High Performance Computing and Architectures Group

Thanks to Mellanox and AIC for their kind support of this work