Master Class: Deep Learning - MATLAB EXPO

Master Class: Deep Learning. From Prototype to Deployment in Embedded Environments. Lucas García. © 2015 The MathWorks, Inc.

Page 1: Master Class: Deep Learning - MATLAB EXPO

Master Class: Deep Learning
From Prototype to Deployment in Embedded Environments

Lucas García

© 2015 The MathWorks, Inc.

Page 2: Master Class: Deep Learning - MATLAB EXPO

2

Page 3: Master Class: Deep Learning - MATLAB EXPO

3

Why MATLAB for Deep Learning?

▪ MATLAB is Productive

▪ MATLAB Integrates with Open Source

▪ MATLAB is Fast

Page 4: Master Class: Deep Learning - MATLAB EXPO

4

MATLAB Deep Learning Framework

Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models

Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters

Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 5x faster than TensorFlow
– 2x faster than MXNet

Page 5: Master Class: Deep Learning - MATLAB EXPO

5

Deep Learning Applications

Voice assistants (speech to text)

Teaching a character to beat a video game

Automatically coloring black-and-white images

Page 6: Master Class: Deep Learning - MATLAB EXPO

6

What is Deep Learning?

Page 7: Master Class: Deep Learning - MATLAB EXPO

7

Deep Learning

Model learns to perform classification tasks directly from data.

[Figure: image sets (x1000 each), Deep Learning Model, Image Classifier]

Page 8: Master Class: Deep Learning - MATLAB EXPO

8

Data Types for Deep Learning

Signal, Image, Text

Page 9: Master Class: Deep Learning - MATLAB EXPO

9

Deep Learning is Versatile

Iris Recognition – 99.4% accuracy [2]

Rain Detection and Removal [1]

Detection of cars and road in autonomous driving systems

1. Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan, "Deep Joint Rain Detection and Removal from a Single Image"

2. Shervin Minaee, Amirali Abdolrashidi, and Yao Wang, "An experimental study of deep convolutional features for iris recognition," IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2016

Page 10: Master Class: Deep Learning - MATLAB EXPO

10

How is deep learning performing so well?

Page 11: Master Class: Deep Learning - MATLAB EXPO

11

Deep Learning Uses a Neural Network Architecture

Input Layer → Hidden Layers (n) → Output Layer

Page 12: Master Class: Deep Learning - MATLAB EXPO

12

Thinking about Layers

▪ Layers are like blocks

– Stack them on top of each other

– Replace one block with a different one

▪ Each hidden layer processes the information from the previous layer

Page 13: Master Class: Deep Learning - MATLAB EXPO

13

Thinking about Layers

▪ Layers are like blocks

– Stack them on top of each other

– Replace one block with a different one

▪ Each hidden layer processes the information from the previous layer

▪ Layers can be ordered in different ways
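
The block analogy can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the layer builders `scale`, `shift`, and `relu` and the helper `run_network` are hypothetical names. The sketch shows each layer consuming the output of the previous one, and that reordering the same blocks changes the result.

```python
from typing import Callable, List

# A "layer" here is just a function from a list of values to a list of values.
Layer = Callable[[List[float]], List[float]]

def scale(factor: float) -> Layer:
    """Hypothetical layer: multiply every value by a constant."""
    return lambda xs: [factor * x for x in xs]

def shift(offset: float) -> Layer:
    """Hypothetical layer: add a constant to every value."""
    return lambda xs: [x + offset for x in xs]

def relu() -> Layer:
    """Rectified linear unit: clamp negative values to zero."""
    return lambda xs: [max(0.0, x) for x in xs]

def run_network(layers: List[Layer], inputs: List[float]) -> List[float]:
    """Feed the inputs through the layers in order, one block after another."""
    activations = inputs
    for layer in layers:
        activations = layer(activations)
    return activations

stack_a = [scale(2.0), shift(-3.0), relu()]
stack_b = [scale(2.0), relu(), shift(-3.0)]  # same blocks, different order

out_a = run_network(stack_a, [1.0, 2.0])
out_b = run_network(stack_b, [1.0, 2.0])
```

Swapping one block for another is a one-element change to the list, which is the point of the analogy.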

Page 14: Master Class: Deep Learning - MATLAB EXPO

14

Deep Learning in 6 Lines of MATLAB Code

Page 15: Master Class: Deep Learning - MATLAB EXPO

15

Why MATLAB for Deep Learning?

▪ MATLAB is Productive

▪ MATLAB integrates with Open Source

▪ MATLAB is Fast

Page 16: Master Class: Deep Learning - MATLAB EXPO

16

“I love to label and preprocess my data”

~ Said no engineer, ever.

Page 17: Master Class: Deep Learning - MATLAB EXPO

17

Caterpillar Case Study

▪ World’s leading manufacturer of construction and mining equipment.

▪ What do these projects have in common?

– Autonomous haul trucks

– Pedestrian detection

– Equipment classification

– Terrain mapping

Page 18: Master Class: Deep Learning - MATLAB EXPO

18

Computer Must Learn from Lots of Data

▪ ALL data must first be labeled to create these autonomous systems.

“We were spending way too much time ground-truthing [the data]”

--Larry Mianzo, Caterpillar

Page 19: Master Class: Deep Learning - MATLAB EXPO

19

How Did Caterpillar Do with Our Tools?

▪ Semi-automated labeling process

– “We go from having to label 100 percent of our data to only having to label about 80 to 90 percent”

▪ Used MATLAB for entire development workflow.

– “Because everything is in MATLAB, development time is short”

Page 20: Master Class: Deep Learning - MATLAB EXPO

20

How Does MATLAB Come into Play?

Page 21: Master Class: Deep Learning - MATLAB EXPO

21

Page 22: Master Class: Deep Learning - MATLAB EXPO

22

Page 23: Master Class: Deep Learning - MATLAB EXPO

23

Page 24: Master Class: Deep Learning - MATLAB EXPO

24

MATLAB is Productive

▪ Image Labeler App semi-automates labeling workflow

▪ Bootstrapping

– Improve automatic labeling by updating algorithm as you label more images correctly.

▪ Easy to load metadata even when labeling manually

Page 25: Master Class: Deep Learning - MATLAB EXPO

25

Why MATLAB?

▪ MATLAB is Productive

▪ MATLAB Integrates with Open Source

▪ MATLAB is Fast

Page 26: Master Class: Deep Learning - MATLAB EXPO

26

Used MATLAB and Open Source Together

1. Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan, "Deep Joint Rain Detection and Removal from a Single Image"

▪ Used Caffe and MATLAB together

▪ Achieved significantly better results than an engineered rain model.

▪ Use our tools where it makes your workflow easier!

Page 27: Master Class: Deep Learning - MATLAB EXPO

27

MATLAB Integrates with Open Source Frameworks

▪ Access to many pretrained models through add-ons

▪ Users wanted to import latest models

▪ Import models directly from TensorFlow or Caffe

– Allows for improved collaboration

Page 28: Master Class: Deep Learning - MATLAB EXPO

28

Keras-TensorFlow Importer

Page 29: Master Class: Deep Learning - MATLAB EXPO

29

MATLAB Integrates with Open Source Frameworks

▪ MATLAB supports entire deep learning workflow

– Use when it is convenient for your workflow

▪ Access to latest models

▪ Improved collaboration with other users

Page 30: Master Class: Deep Learning - MATLAB EXPO

30

Why MATLAB?

▪ MATLAB is Productive

▪ MATLAB Integrates with Open Source

▪ MATLAB is Fast

Page 31: Master Class: Deep Learning - MATLAB EXPO

31

MATLAB is Fast

Performance

Training Deployment

Page 32: Master Class: Deep Learning - MATLAB EXPO

32

What is Training?

Feed labeled data into a neural network to create a working model

[Figure: labeled image sets (x1000 each) → Convolutional Neural Network → Image Classifier Model]

Page 33: Master Class: Deep Learning - MATLAB EXPO

33

Speech Recognition Example

Audio signal → Spectrogram → Image Classification algorithm

[Figure: audio signal (amplitude vs. time) and its spectrogram (frequency vs. time)]
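
As a rough illustration of the first arrow in that pipeline, the sketch below turns a signal into a spectrogram with a plain short-time discrete Fourier transform. It is a minimal stdlib-only Python version, assuming a rectangular window and hypothetical parameter names (`frame_len`, `hop`); it is not the spectrogram routine used in the talk.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Split the signal into overlapping frames and return, per frame,
    the magnitudes of DFT bins 0 .. frame_len // 2."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT sum for bin k (slow but easy to read).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Test tone with exactly 4 cycles per 16-sample frame.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = spectrogram(tone, frame_len=16, hop=8)

# Index of the strongest frequency bin in each time frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each row of `spec` is one time slice and each column one frequency bin, so the result can be treated as a grayscale image and passed to an image classifier.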

Page 34: Master Class: Deep Learning - MATLAB EXPO

34

Another Network for Signals - LSTM

▪ LSTM = Long Short-Term Memory (Networks)

– Signal, text, time-series data

– Use previous data to predict new information

▪ I live in France. I speak ___________.

[Figure: cell states c0, c1, ..., ct]
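
To make "use previous data to predict new information" concrete, here is a minimal sketch of one LSTM step using the standard gate equations, reduced to scalars. The weight dictionary layout and the names `lstm_step` and `run_lstm` are assumptions for illustration; real networks use weight matrices and learned values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step.
    `w` maps a gate name to (input weight, recurrent weight, bias)."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to admit
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    c = f * c_prev + i * g            # new cell state c_t
    h = o * math.tanh(c)              # new hidden state h_t
    return h, c

def run_lstm(sequence, w):
    """Carry the hidden and cell state across a whole sequence."""
    h, c = 0.0, 0.0
    cell_states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        cell_states.append(c)
    return h, cell_states

# Arbitrary illustrative weights, not trained values.
demo_weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, 0.2, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
final_h, cells = run_lstm([0.2, -0.1, 0.7], demo_weights)
```

The cell state `c` is what carries context such as "France" forward until it is needed to fill in the blank.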

Page 35: Master Class: Deep Learning - MATLAB EXPO

35

1. Create Datastore

▪ Datastore creates a reference to the data

▪ No need to load all objects into memory
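
The idea of a datastore as a reference rather than a loaded dataset can be sketched as follows. `LazyDatastore` is a hypothetical Python class written for this illustration (it is not the MATLAB datastore API): it records file paths up front and reads each file only when iteration reaches it.

```python
import os
import tempfile

class LazyDatastore:
    """Holds file paths only; contents are read one file at a time on demand."""

    def __init__(self, folder, extension):
        self.paths = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if name.endswith(extension)
        )
        self.loaded_count = 0  # how many files have actually been read

    def __len__(self):
        return len(self.paths)

    def __iter__(self):
        for path in self.paths:
            with open(path, "rb") as handle:
                self.loaded_count += 1
                yield path, handle.read()

# Demo on three small placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for idx in range(3):
        with open(os.path.join(folder, f"clip{idx}.wav"), "wb") as handle:
            handle.write(bytes([idx]) * 4)

    store = LazyDatastore(folder, ".wav")
    count_before_read = store.loaded_count      # nothing read yet
    first_path, first_bytes = next(iter(store))
    count_after_read = store.loaded_count       # exactly one file read
    n_files = len(store)
```

Creating the store costs only a directory listing, so the same pattern scales to collections far larger than memory.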

Page 36: Master Class: Deep Learning - MATLAB EXPO

36

2. Compute Speech Spectrograms

[Figure: speech signal (amplitude vs. time) and its spectrogram (frequency vs. time)]

Page 37: Master Class: Deep Learning - MATLAB EXPO

37

3. Split datastores

Training (70%)
• Trains the model
• Computer “learns” from this data

Validation (15%)
• Checks accuracy of model during training

Test (15%)
• Tests model accuracy
• Not used until validation accuracy is good
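
A 70/15/15 split like the one above can be sketched in Python as follows. The function name `split_dataset`, the fixed `seed`, and the shuffle-then-slice strategy are assumptions made for this example; the talk does not show how its split is implemented.

```python
import random

def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle once, then slice into training, validation, and test parts.
    Whatever remains after the first two slices becomes the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train_set, val_set, test_set = split_dataset(range(100))
```

For classification data it is usually better to split within each label so that every class keeps the same proportions; this sketch ignores labels for brevity.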

Page 38: Master Class: Deep Learning - MATLAB EXPO

38

4. Define Architecture and Parameters

Neural Network Architecture

Model Parameters

Page 39: Master Class: Deep Learning - MATLAB EXPO

39

5. Train Network

Page 40: Master Class: Deep Learning - MATLAB EXPO

40

Deep Learning on CPU, GPU, Multi-GPU and Clusters

HOW TO TARGET?

▪ Single CPU

▪ Single CPU, Single GPU

▪ Single CPU, Multiple GPUs

▪ On-prem server with GPUs

▪ Cloud GPUs (AWS)

Page 41: Master Class: Deep Learning - MATLAB EXPO

41

Training Performance

[Figure: training time in seconds per epoch for TensorFlow, MATLAB, and MXNet at batch size 32]

Page 42: Master Class: Deep Learning - MATLAB EXPO

42

Training is an Iterative Process

Parameters adjusted according to performance
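
As a toy illustration of that loop, the sketch below fits a single weight by gradient descent: each epoch measures the error and nudges the parameter to reduce it. The model (y = w * x), the learning rate, and the name `train` are assumptions for this example, far simpler than a real deep network.

```python
def train(xs, ys, learning_rate=0.05, epochs=200):
    """Fit y = w * x by repeatedly measuring the mean squared error
    and moving w a small step against its gradient."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / len(xs)
        grad = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * grad   # adjust the parameter
        losses.append(loss)         # record performance for this epoch
    return w, losses

# Data generated by the rule y = 3 * x, so the ideal weight is 3.
weight, history = train([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

Plotting `history` gives the kind of loss curve typically watched during training to decide whether settings such as the learning rate need changing.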

Page 43: Master Class: Deep Learning - MATLAB EXPO

43

MATLAB is Fast for Deployment

▪ Target a GPU for optimal performance

▪ NVIDIA GPUs use CUDA code

▪ We only have MATLAB code. Can we translate this?

Page 44: Master Class: Deep Learning - MATLAB EXPO

44

GPU Coder

▪ Automatically generates CUDA Code from MATLAB Code

– can be used on NVIDIA GPUs

▪ CUDA extends C/C++ code with constructs for parallel computing

Page 45: Master Class: Deep Learning - MATLAB EXPO

45

GPU Coder for Deployment

GPU Coder: Accelerated implementation of parallel algorithms on GPUs & CPUs

▪ Deep Neural Networks (deep learning, machine learning)
– 5x faster than TensorFlow
– 2x faster than MXNet

▪ Image Processing and Computer Vision (image filtering, feature detection/extraction)
– 60x faster than CPUs for stereo disparity

▪ Signal Processing and Communications (FFT, filtering, cross correlation)
– 20x faster than CPUs for FFTs

▪ Libraries: ARM Compute Library, Intel MKL-DNN Library

Page 46: Master Class: Deep Learning - MATLAB EXPO

46

GPUs and CUDA

[Figure: ARM Cortex CPU running C/C++ with its CPU memory space, and GPU CUDA cores running CUDA kernels with their GPU memory space]

Page 47: Master Class: Deep Learning - MATLAB EXPO

47

Challenges of Programming in CUDA for GPUs

▪ Learning to program in CUDA

– Need to rewrite algorithms for parallel processing paradigm

▪ Creating CUDA kernels

– Need to analyze algorithms to create CUDA kernels that maximize parallel processing

▪ Allocating memory

– Need to deal with memory allocation on both CPU and GPU memory spaces

▪ Minimizing data transfers

– Need to minimize transfers while ensuring that required data transfers happen at the appropriate points in your algorithm

Page 48: Master Class: Deep Learning - MATLAB EXPO

48

GPU Coder Helps You Deploy to GPUs Faster

GPU Coder

CUDA Kernel creation
• Library function mapping
• Loop optimizations
• Dependence analysis

Memory allocation
• Data locality analysis
• GPU memory allocation

Data transfer minimization
• Data-dependence analysis
• Dynamic memcpy reduction

Page 49: Master Class: Deep Learning - MATLAB EXPO

49

GPU Coder Generates CUDA from MATLAB: saxpy

[Figure: scalarized MATLAB and vectorized MATLAB versions of saxpy, each shown with the generated CUDA kernel for GPU parallelization]

Loops and matrix operations are directly compiled into kernels
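
The slide's code is not preserved in this transcript, so as a stand-in here are the two styles of saxpy (z = a*x + y) written in Python. The function names are hypothetical, and the whole-array version uses a comprehension where MATLAB would use an elementwise expression such as `z = a .* x + y`. Every element is computed independently, which is what makes the operation a natural fit for one GPU thread per element.

```python
def saxpy_scalarized(a, x, y):
    """Explicit index loop: one multiply-add per element."""
    z = [0.0] * len(x)
    for i in range(len(x)):
        z[i] = a * x[i] + y[i]
    return z

def saxpy_vectorized(a, x, y):
    """Whole-array form: the same arithmetic without an explicit index."""
    return [a * xi + yi for xi, yi in zip(x, y)]

a = 2.0
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

z_loop = saxpy_scalarized(a, x, y)
z_vec = saxpy_vectorized(a, x, y)
```

Both forms describe the same computation, which is why a code generator can map either one to the same kernel.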

Page 50: Master Class: Deep Learning - MATLAB EXPO

50

Generated CUDA Optimized for Memory Performance

[Figure: Mandelbrot space code and the generated CUDA kernel for GPU parallelization]

Kernel data allocation is automatically optimized

Page 51: Master Class: Deep Learning - MATLAB EXPO

51

Example: Fog Rectification

Page 52: Master Class: Deep Learning - MATLAB EXPO

52

Algorithm Design to Embedded Deployment Workflow

1. Functional test
– MATLAB algorithm (functional reference)

2. Deployment unit-test
– Target: Desktop GPU, C++
– Build type: .mex
– Call CUDA from MATLAB directly

3. Deployment integration-test
– Target: Desktop GPU, C++
– Build type: .lib
– Call CUDA from (C++) hand-coded main()

4. Real-time test
– Target: Embedded GPU
– Build type: Cross-compiled .lib
– Call CUDA from (C++) hand-coded main()

Page 53: Master Class: Deep Learning - MATLAB EXPO

53

Demo: Alexnet Deployment with ‘mex’ Code Generation

Page 54: Master Class: Deep Learning - MATLAB EXPO

54

Algorithm Design to Embedded Deployment on Tegra GPU

1. Functional test (Test in MATLAB on host)
– MATLAB algorithm (functional reference)

2. Deployment unit-test (Test generated code in MATLAB on host + GPU)
– Target: Tesla GPU, C++
– Build type: .mex
– Call CUDA from MATLAB directly

3. Deployment integration-test (Test generated code within C/C++ app on host + GPU)
– Target: Tesla GPU, C++
– Build type: .lib
– Call CUDA from (C++) hand-coded main()

4. Real-time test (Test generated code within C/C++ app on Tegra target)
– Target: Tegra GPU
– Build type: Cross-compiled .lib
– Call CUDA from (C++) hand-coded main()
– Cross-compiled on host with Linaro toolchain

Page 55: Master Class: Deep Learning - MATLAB EXPO

55

Alexnet Deployment to Tegra: Cross-Compiled with ‘lib’

Two small changes

1. Change build-type to ‘lib’

2. Select cross-compile toolchain

Page 56: Master Class: Deep Learning - MATLAB EXPO

56

End-to-End Application: Lane Detection

Transfer Learning: Alexnet → Lane detection CNN

Image → Lane detection CNN → Left lane coefficients, Right lane coefficients → Post-processing (find left/right lane points) → Image with marked lanes

Output of CNN is lane parabola coefficients according to: y = ax^2 + bx + c

GPU Coder generates code for whole application
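
The post-processing step can be pictured as evaluating that parabola at a set of x positions to get the points to draw. The sketch below is a minimal Python illustration; the function name `lane_points` and the coefficient values are made up for the example and do not come from the talk.

```python
def lane_points(coeffs, xs):
    """Evaluate y = a*x^2 + b*x + c at each x and return (x, y) pairs."""
    a, b, c = coeffs
    return [(x, a * x * x + b * x + c) for x in xs]

# Hypothetical coefficients standing in for the CNN output.
left_lane = (0.5, -1.0, 2.0)
right_lane = (0.5, -1.0, 5.0)

sample_xs = [0.0, 2.0, 4.0]
left_pts = lane_points(left_lane, sample_xs)
right_pts = lane_points(right_lane, sample_xs)
```

Because the network outputs only three numbers per lane, the drawing step stays cheap regardless of image size.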

Page 57: Master Class: Deep Learning - MATLAB EXPO

57

Deep Learning Network Support (with Neural Network Toolbox)

SeriesNetwork (GPU Coder: R2017b)
Networks: MNIST, Alexnet, YOLO, VGG, Lane detection, Pedestrian detection

DAGNetwork (GPU Coder: R2018a)
Networks: GoogLeNet, ResNet, SegNet, DeconvNet

[Figure labels: Semantic segmentation, Object detection]

Page 58: Master Class: Deep Learning - MATLAB EXPO

58

Semantic Segmentation

[Figure: semantic segmentation running in MATLAB, and running as generated code from GPU Coder]

Page 59: Master Class: Deep Learning - MATLAB EXPO

59

Deploying to CPUs

[Figure: GPU Coder deploys deep learning networks through NVIDIA TensorRT & cuDNN libraries, ARM Compute Library, and Intel MKL-DNN Library]

Page 60: Master Class: Deep Learning - MATLAB EXPO

60

Deploying to CPUs

[Figure: GPU Coder deploys deep learning networks through NVIDIA TensorRT & cuDNN libraries; CPU targets shown are a desktop CPU and a Raspberry Pi board]

Page 61: Master Class: Deep Learning - MATLAB EXPO

61

How Good is Generated Code Performance?

▪ Performance of image processing and computer vision

▪ Performance of CNN inference (Alexnet) on Titan Xp GPU

▪ Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2

Page 62: Master Class: Deep Learning - MATLAB EXPO

62

GPU Coder for Image Processing and Computer Vision

▪ Distance transform: 8x speedup

▪ Fog removal: 5x speedup

▪ SURF feature extraction: 700x speedup

▪ Ray tracing: 18x speedup

▪ Frangi filter: 3x speedup

Page 63: Master Class: Deep Learning - MATLAB EXPO

63

Alexnet Inference on NVIDIA Titan Xp

[Figure: frames per second vs. batch size for GPU Coder + TensorRT (3.0.1, int8), GPU Coder + TensorRT (3.0.1), GPU Coder + cuDNN, MXNet (1.1.0), and TensorFlow (1.6.0)]

Testing platform
CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
GPU: Pascal Titan Xp
cuDNN: v7

Page 64: Master Class: Deep Learning - MATLAB EXPO

64

VGG-16 Inference on NVIDIA Titan Xp

[Figure: frames per second vs. batch size for GPU Coder + TensorRT (3.0.1, int8), GPU Coder + TensorRT (3.0.1), GPU Coder + cuDNN, MXNet (1.1.0), and TensorFlow (1.6.0)]

Testing platform
CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
GPU: Pascal Titan Xp
cuDNN: v7

Page 65: Master Class: Deep Learning - MATLAB EXPO

65

Alexnet Inference on Jetson TX2: Frame-Rate Performance

[Figure: frames per second vs. batch size for MATLAB GPU Coder (R2017b), TensorRT (2.1), and C++ Caffe (1.0.0-rc5); annotated speedups of 2x and 1.15x]

Page 66: Master Class: Deep Learning - MATLAB EXPO

66

Alexnet Inference on Jetson TX2: Memory Performance

[Figure: peak memory (MB) vs. batch size for MATLAB GPU Coder (R2017b), C++ Caffe (1.0.0-rc5), and TensorRT 2.1 (using giexec wrapper)]

Page 67: Master Class: Deep Learning - MATLAB EXPO

67

MATLAB Deep Learning Framework

Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models

Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters

Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 5x faster than TensorFlow
– 2x faster than MXNet

Page 68: Master Class: Deep Learning - MATLAB EXPO

68

Why MATLAB for Deep Learning?

▪ MATLAB is Productive

▪ MATLAB Integrates with Open Source (Frameworks)

▪ MATLAB is Fast (Performance)