nvidia® cudnn gpu-accelerated machine learningspeech.ee.ntu.edu.tw/~tlkagk/courses/mlds_2015/nn...

NVIDIA® cuDNN

GPU-Accelerated Machine Learning

How GPU Acceleration Works

Application Code

- Compute-Intensive Functions: ~5% of the code but ~80% of the run-time, executed on the GPU
- Rest of Sequential CPU Code: executed on the CPU
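The code/run-time split above explains both why offloading only the compute-intensive functions pays off and why the gain has a ceiling. The following minimal sketch uses Amdahl's law, which is our choice of model rather than something the slide states. It assumes the ~80% share of run-time speeds up by a uniform factor and the remaining sequential CPU code is unchanged. The function name is hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall application speedup when only part of the run-time gets faster.

    Amdahl's law: 1 / ((1 - p) + p / s), where p is the fraction of run-time
    that is accelerated and s is how much faster that part becomes.
    The remaining (1 - p) of run-time (sequential CPU code) is assumed unchanged.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    p = accelerated_fraction
    return 1.0 / ((1.0 - p) + p / kernel_speedup)


if __name__ == "__main__":
    # ~80% of run-time spent in compute-intensive functions, per the slide.
    for s in (2, 10, 100, 1000):
        print(f"kernels {s:>4}x faster -> application {amdahl_speedup(0.8, s):.2f}x faster")
```

In this model, even infinitely fast GPU kernels cap the overall gain at 1 / 0.2 = 5x. Beyond that point, the remaining CPU code dominates the run-time.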

3 Ways to Program GPUs

Applications

- Libraries: “Drop-in” Acceleration
- OpenACC Directives: Easily Accelerate Applications
- Programming Languages: Maximum Flexibility

Deep Learning with cuDNN

cuDNN is a library of primitives for deep learning.

Software stack, top to bottom: Applications, Frameworks, cuDNN, GPUs (Tesla, TX1, Titan)

LARGE SCALE VISUAL RECOGNITION CHALLENGE (ILSVRC)

[Figure: sample images labeled person, car, helmet, motorcycle, bird, frog, dog, chair, hammer, flower pot, power drill]

1.2M training images • 1000 object categories

Image Classification Error Rates: 28% (2010), 26% (2011), 16% (2012), 12% (2013), 7% (2014)

CHALLENGE SUMMARY

Entries using GPUs: 4 (2012), 60 (2013), 110 (2014)

DEEP LEARNING VISUALIZED

- Image Classification, Object Detection, Localization
- Face Recognition
- Speech & Natural Language Processing
- Medical Imaging & Interpretation
- Seismic Imaging & Interpretation
- Recommendation

Example Use Cases

Deep learning is revolutionizing medical research:

- Detecting Mitosis in Breast Cancer Cells (IDSIA)
- Predicting the Toxicity of New Drugs (Johannes Kepler University)
- Understanding Gene Mutation to Prevent Disease (University of Toronto)

cuDNN Version 2

cuDNN Design Goals

- Basic Deep Learning Subroutines: allow users to write a DNN application without any CUDA code
- Flexible Layout: handle any data layout
- Memory/performance tradeoff: great performance with more memory use, or good performance with minimal memory usage
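Choosing between more memory for more speed and less memory for somewhat less speed often depends on how a convolution is lowered to matrix multiplication. If the unrolled patch matrix is materialized explicitly, a plain GEMM can process it, but the matrix needs a temporary workspace. Implicit approaches avoid that buffer, though they are often slower. The minimal sketch below only estimates that workspace for one image, assuming explicit lowering with a given stride and zero padding. The function names and the layer shape in the usage line are hypothetical. The sketch does not describe how cuDNN sizes its own workspaces.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def lowered_workspace_elements(c: int, h: int, w: int, r: int, s: int,
                               stride: int = 1, pad: int = 0) -> int:
    """Elements in an explicitly unrolled patch matrix for one image.

    Rows: one per output position (out_h * out_w).
    Columns: one per filter tap across all input channels (c * r * s).
    """
    out_h = conv_output_size(h, r, stride, pad)
    out_w = conv_output_size(w, s, stride, pad)
    return out_h * out_w * c * r * s


if __name__ == "__main__":
    # Hypothetical layer: 64 channels, 56x56 input, 3x3 filter, stride 1, pad 1.
    c, h, w = 64, 56, 56
    ws = lowered_workspace_elements(c, h, w, 3, 3, stride=1, pad=1)
    # Workspace size and how many times larger it is than the input itself.
    print(ws, ws / (c * h * w))
```

For that shape, the buffer is 9 times the size of the input, because each input value is copied once per filter tap. Trading that memory away is what a low-memory code path avoids.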

DNN ROUTINES

Convolutions – 80-90% of the execution time

Pooling – Spatial smoothing

Activation – Pointwise non-linear function
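The two non-convolution routines are simple to state precisely. Below is a minimal pure-Python sketch of 2D pooling and of the pointwise ReLU, sigmoid, and tanh activations. Average pooling corresponds to the "spatial smoothing" view, and max pooling keeps the strongest response in each window. This is an illustration only, with hypothetical function names. It is not cuDNN's API, which operates on GPU tensors.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def pool2d(x: Matrix, window: int = 2, stride: int = 2, mode: str = "max") -> Matrix:
    """Slide a window x window region over x with the given stride (no padding).

    mode="avg" averages each region (spatial smoothing);
    mode="max" keeps the largest value in each region.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out_rows = (len(x) - window) // stride + 1
    out_cols = (len(x[0]) - window) // stride + 1
    out: Matrix = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            region = [x[i * stride + di][j * stride + dj]
                      for di in range(window) for dj in range(window)]
            row.append(max(region) if mode == "max" else sum(region) / len(region))
        out.append(row)
    return out


def _sigmoid(v: float) -> float:
    # Split by sign so math.exp never overflows for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(0.0, v),
    "sigmoid": _sigmoid,
    "tanh": math.tanh,
}


def activate(x: Matrix, kind: str = "relu") -> Matrix:
    """Apply a pointwise non-linearity independently to every element."""
    f = ACTIVATIONS[kind]
    return [[f(v) for v in row] for row in x]


if __name__ == "__main__":
    x = [[1.0, -2.0, 3.0, 0.0],
         [4.0, 5.0, -6.0, 7.0],
         [-1.0, 2.0, 0.5, 1.5],
         [0.0, -3.0, 2.5, -0.5]]
    print(pool2d(x, mode="max"))   # [[5.0, 7.0], [2.0, 2.5]]
    print(pool2d(x, mode="avg"))   # [[2.0, 1.0], [-0.5, 1.0]]
    print(activate(x, "relu")[0])  # [1.0, 0.0, 3.0, 0.0]
```

Each activation output depends only on its own input element. Pooling, by contrast, reads a whole window, which is why the two are separate routines.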

CONVOLUTIONS – The MAIN Workload

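2D conv as a GEMV

Image: a 6×6 input with pixels I1 to I36 in row-major order. Filter: a 3×3 kernel with weights F1 to F9.

Each 3×3 patch of the image is unrolled into one row of a matrix (16 rows, one per output pixel O1 to O16 of the 4×4 output). The filter is flattened into a vector, so the whole convolution becomes a single matrix-vector product (GEMV):

$$
\begin{bmatrix}
I_1 & I_2 & I_3 & I_7 & I_8 & I_9 & I_{13} & I_{14} & I_{15} \\
I_2 & I_3 & I_4 & I_8 & I_9 & I_{10} & I_{14} & I_{15} & I_{16} \\
I_3 & I_4 & I_5 & I_9 & I_{10} & I_{11} & I_{15} & I_{16} & I_{17} \\
\vdots & & & & & & & & \vdots
\end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_9 \end{bmatrix}
=
\begin{bmatrix} O_1 \\ O_2 \\ O_3 \\ \vdots \\ O_{16} \end{bmatrix}
$$

Multi-convolve: with several filters, the flattened filters form the columns of a matrix, and the GEMV becomes a matrix-matrix product (GEMM).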
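The unrolling of the 6×6 image against the 3×3 filter can be checked directly in code. This minimal pure-Python sketch builds the patch matrix (commonly called im2col) and computes the convolution as one GEMV. It then confirms the result against a direct nested-loop convolution and stacks several filters to turn the GEMV into a GEMM. It assumes stride 1, no padding, and the cross-correlation convention used in most DNN code. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def im2col(image: Matrix, k: int) -> Matrix:
    """Unroll every k x k patch (stride 1, no padding) into one row.

    Row r holds the patch for output position r (row-major), read row by row,
    matching the I1 I2 I3 I7 I8 I9 I13 I14 I15 layout of the first row.
    """
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1) for j in range(w - k + 1)]


def conv2d_as_gemv(image: Matrix, filt: Matrix) -> Matrix:
    """Valid 2D convolution (cross-correlation) computed as a single GEMV."""
    k = len(filt)
    f_vec = [v for row in filt for v in row]               # F1..F9 as a vector
    lowered = im2col(image, k)                              # (out_h*out_w) x (k*k)
    y = [sum(a * b for a, b in zip(row, f_vec)) for row in lowered]  # GEMV
    out_w = len(image[0]) - k + 1
    return [y[r:r + out_w] for r in range(0, len(y), out_w)]  # reshape to 2D


def conv2d_direct(image: Matrix, filt: Matrix) -> Matrix:
    """Reference: the same convolution with explicit nested loops."""
    k = len(filt)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * filt[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def multi_conv_as_gemm(image: Matrix, filters: List[Matrix]) -> Matrix:
    """Several filters at once: each output row holds one value per filter."""
    lowered = im2col(image, len(filters[0]))
    f_cols = [[v for row in f for v in row] for f in filters]  # one column per filter
    return [[sum(a * b for a, b in zip(row, col)) for col in f_cols] for row in lowered]


if __name__ == "__main__":
    # Pixel values equal their index, so row 0 of im2col reads 1 2 3 7 8 9 13 14 15.
    image = [[float(6 * r + c + 1) for c in range(6)] for r in range(6)]
    edge = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    print(im2col(image, 3)[0])
    print(conv2d_as_gemv(image, edge) == conv2d_direct(image, edge))
```

The GEMV form performs exactly the same multiply-adds as the direct loops, just rearranged so that a tuned matrix routine can do the work. The price is the extra memory held by the unrolled matrix.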

cuDNN V2 Flexibility

cuDNN V2 new features

cuDNN Version 2

- Accelerates key routines to improve performance of neural net training
- Up to 1.8x faster on AlexNet than a baseline GPU implementation
- New support for 3D convolutions
- Integrated into all major Deep Learning frameworks: Caffe, Theano, Torch

Speedup over a baseline GPU implementation:

- Caffe (GoogLeNet): 1.0x baseline (GPU), 1.6x with cuDNN
- Torch (OverFeat): 1.0x baseline (GPU), 1.2x with cuDNN

Images Trained Per Day (Caffe AlexNet), in millions of images:

- 16 Core CPU: 2.5M
- GTX Titan: 18M
- Titan Black with cuDNN v1: 23M
- Titan X with cuDNN v2: 43M

CPU: E5-2698 v3 @ 2.3GHz / 3.6GHz Turbo
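The per-day figures are easier to compare when converted to per-second throughput and expressed as ratios to the CPU. This small sketch does that arithmetic using only the four numbers from the chart. The dictionary keys are just labels, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Images trained per day (Caffe AlexNet), as reported in the chart above.
IMAGES_PER_DAY = {
    "16-core CPU": 2.5e6,
    "GTX Titan": 18e6,
    "Titan Black + cuDNN v1": 23e6,
    "Titan X + cuDNN v2": 43e6,
}


def images_per_second(per_day: float) -> float:
    """Convert a daily training throughput into images per second."""
    return per_day / SECONDS_PER_DAY


def speedup_vs(baseline: str, system: str) -> float:
    """Throughput of `system` relative to `baseline`."""
    return IMAGES_PER_DAY[system] / IMAGES_PER_DAY[baseline]


if __name__ == "__main__":
    for name, per_day in IMAGES_PER_DAY.items():
        print(f"{name:<24} {images_per_second(per_day):7.1f} img/s "
              f"{speedup_vs('16-core CPU', name):5.1f}x vs CPU")
```

Under this arithmetic, the Titan X with cuDNN v2 trains roughly 500 images per second, about 17.2x the 16-core CPU figure.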

NVIDIA® cuDNN Roadmap

Release 1 (September 2014, Q3’14)

- Layers (forward & backprop): Convolutional, Pooling, Softmax, ReLU/Sigmoid/Tanh
- Performance features: High-performance convolution

Release 2 (Q1’15) and Release 3 (Q2’15)

- Layers: Local receptive field, Contrast normalization, Fully-connected, Recurrent
- Support for multiple GPUs per node
- Faster convolution routines
- Tuning for future chips

GPU-Accelerated Deep Learning Frameworks

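Frameworks compared: Caffe, Torch, Theano, Nervana neon, cuda-convnet2, Kaldi

Description: Caffe (Deep Learning Framework), Torch (Scientific Computing Framework), Theano (Math Expression Compiler), neon (Deep Learning Framework), cuda-convnet2 (Deep Learning Application), Kaldi (Speech Recognition Toolkit)

cuDNN: R2 for Caffe, Torch, and Theano; -- for neon, cuda-convnet2, and Kaldi

Multi-GPU: In Progress, In Progress, In Progress; Kaldi (nnet2)

Multi-CPU: Kaldi (nnet2)

License: Caffe (BSD-2), Torch (BSD), Theano (BSD), neon (Apache 2.0), cuda-convnet2 (Apache 2.0), Kaldi (Apache 2.0)

Interface(s): Caffe (text-based definition files, C++, Python, MATLAB), Torch (Python, Lua, MATLAB), Theano (Python), neon (Python), cuda-convnet2 (C++), Kaldi (C++, shell scripts)

Embedded (TK1)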

http://developer.nvidia.com/deeplearning


Using cuDNN

cuDNN Easy to Enable

DIGITS

Deep Learning GPU Training System

- Visualization tool for DNN training
- Use the default network, import one, or design your own
- Import your training data from disk or the web
- Monitor multiple trainings in parallel

Workflow: Process Data, Configure DNN, Monitor Progress, Visualize Layers, Test Image

Who it is for:

- Deep learning researchers
- Automotive
- Medical researchers
- Defense
- Intelligent video analytics
- Web companies
- Startups

Thank you!

Developer Zone: https://developer.nvidia.com/deeplearning

GPU Technology Conference: http://www.gputechconf.com/

cuDNN Download: https://developer.nvidia.com/cuDNN

DIGITS Download: https://developer.nvidia.com/digits

DIGITS Source: https://www.github.com/nvidia/digits
