© 2015 The MathWorks, Inc.
Deploying Deep Learning Networks to Embedded GPUs and CPUs
성호현, Department Manager
MATLAB Deep Learning Framework
Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models
Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters
Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 5x faster than TensorFlow
– 2x faster than MXNet
Design Deep Learning & Vision Algorithms
Transfer Learning Workflow
[Diagram: Training Data (Images + Labels) → Load Reference Network → Modify Network Structure → Learn New Weights → New Classifier]
Labels: Hot dogs, Pizzas, Ice cream, Chocolate cake, French fries
Example: Transfer Learning in MATLAB
[Code annotations on the slide:]
▪ Set up training dataset
▪ Load Reference Network
▪ Modify Network Structure
▪ Learn New Weights
Scaling Up Model Training Performance
Multiple GPU support
Training on AWS (EC2)
Single GPU performance
Visualizing and Debugging Intermediate Results
Filters…
Activations
Deep Dream
[Figure panels: Training Accuracy Visualization, Deep Dream, Layer Activations, Feature Visualization]
• Many options for visualizations and debugging
• Examples to get started
GPU Coder for Deployment
GPU Coder: Accelerated implementation of parallel algorithms on GPUs & CPUs
▪ Deep Neural Networks (deep learning, machine learning)
– 5x faster than TensorFlow
– 2x faster than MXNet
▪ Image Processing and Computer Vision (image filtering, feature detection/extraction)
– 60x faster than CPUs for stereo disparity
▪ Signal Processing and Communications (FFT, filtering, cross correlation)
– 20x faster than CPUs for FFTs
▪ CPU libraries: ARM Compute Library, Intel MKL-DNN Library
GPUs and CUDA
[Diagram: C/C++ code runs on the ARM Cortex CPU and uses the CPU memory space; CUDA kernels run on the GPU's CUDA cores and use the GPU memory space]
Challenges of Programming in CUDA for GPUs
▪ Learning to program in CUDA
– Need to rewrite algorithms for the parallel processing paradigm
▪ Creating CUDA kernels
– Need to analyze algorithms to create CUDA kernels that maximize parallel processing
▪ Allocating memory
– Need to deal with memory allocation in both the CPU and GPU memory spaces
▪ Minimizing data transfers
– Need to minimize data transfers while ensuring that the required ones happen at the appropriate points in your algorithm
GPU Coder Helps You Deploy to GPUs Faster
GPU Coder
▪ CUDA kernel creation
– Library function mapping
– Loop optimizations
– Dependence analysis
▪ Memory allocation
– Data locality analysis
– GPU memory allocation
▪ Data transfer minimization
– Data-dependence analysis
– Dynamic memcpy reduction
GPU Coder Generates CUDA from MATLAB: saxpy
[Code panels: Scalarized MATLAB, Vectorized MATLAB, and the generated CUDA (CUDA kernel for GPU parallelization)]
Loops and matrix operations are directly compiled into kernels
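The contrast on this slide can be sketched roughly as follows. This is a Python illustration, not the MATLAB or the generated CUDA shown on the slide, and the function names are hypothetical. It writes saxpy (y = a*x + y) first as an explicit scalar loop and then as a whole-array expression. Each output element depends on a single index, which is the property that lets such a loop body be mapped to one GPU thread per element.

```python
def saxpy_scalarized(a, x, y):
    """Explicit element-by-element loop (the 'scalarized' form)."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        # Iteration i reads only x[i] and y[i], so iterations are independent.
        out[i] = a * x[i] + y[i]
    return out


def saxpy_vectorized(a, x, y):
    """Whole-array expression (the 'vectorized' form)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(saxpy_scalarized(2.0, x, y))
print(saxpy_vectorized(2.0, x, y))
```

Both forms compute the same result; the difference is only in how the parallelism is expressed to the compiler.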
Generated CUDA Optimized for Memory Performance
[Code panels: Mandelbrot space (MATLAB) and the generated CUDA (CUDA kernel for GPU parallelization)]
Kernel data allocation is automatically optimized
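For readers unfamiliar with the Mandelbrot example, here is a minimal Python sketch of the escape-time computation. It is an assumption about the kind of algorithm the slide shows, not the MATLAB source or the generated CUDA, and `mandelbrot_counts` and its parameters are hypothetical names. Every grid point is computed independently of the others, which is why the computation parallelizes well on a GPU.

```python
def mandelbrot_counts(x_values, y_values, max_iterations=50):
    """Return escape-iteration counts for every point of a grid."""
    counts = []
    for y0 in y_values:
        row = []
        for x0 in x_values:
            c = complex(x0, y0)
            z = 0j
            n = 0
            # Iterate z = z*z + c until |z| exceeds 2 or the budget runs out.
            while n < max_iterations and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            row.append(n)
        counts.append(row)
    return counts


grid = mandelbrot_counts([-2.0, -1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], max_iterations=20)
for row in grid:
    print(row)
```

Points that never escape within the budget report `max_iterations`; points outside the set report the iteration at which they escaped.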
Algorithm Design to Embedded Deployment Workflow
1. Functional test: MATLAB algorithm (functional reference)
2. Deployment unit-test (Desktop GPU, C++): build type .mex; call CUDA from MATLAB directly
3. Deployment integration-test (Desktop GPU, C++): build type .lib; call CUDA from (C++) hand-coded main()
4. Real-time test (Embedded GPU): build type cross-compiled .lib; call CUDA from (C++) hand-coded main()
Demo: Alexnet Deployment with ‘mex’ Code Generation
Algorithm Design to Embedded Deployment on Tegra GPU
1. Functional test (test in MATLAB on host): MATLAB algorithm (functional reference)
2. Deployment unit-test (test generated code in MATLAB on host + GPU): Tesla GPU, C++; build type .mex; call CUDA from MATLAB directly
3. Deployment integration-test (test generated code within C/C++ app on host + GPU): Tesla GPU, C++; build type .lib; call CUDA from (C++) hand-coded main()
4. Real-time test (test generated code within C/C++ app on Tegra target): Tegra GPU; build type cross-compiled .lib; call CUDA from (C++) hand-coded main(), cross-compiled on host with Linaro toolchain
Alexnet Deployment to Tegra: Cross-Compiled with ‘lib’
Two small changes
1. Change build-type to ‘lib’
2. Select cross-compile toolchain
End-to-End Application: Lane Detection
[Diagram: Image → Lane detection CNN (transfer learning from Alexnet) → left lane coefficients and right lane coefficients → Post-processing (find left/right lane points) → Image with marked lanes]
Output of CNN is lane parabola coefficients according to: y = ax^2 + bx + c
GPU Coder generates code for whole application
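The post-processing step can be pictured as evaluating each lane's parabola at a set of x positions. The Python sketch below assumes that interpretation and is not the code from the slide. The function name `lane_points` and the coefficient values are hypothetical; real coefficients would come from the CNN output.

```python
def lane_points(coefficients, x_positions):
    """Evaluate y = a*x^2 + b*x + c at each x position."""
    a, b, c = coefficients
    return [(x, a * x * x + b * x + c) for x in x_positions]


# Illustrative coefficients only; real values would come from the CNN output.
left_lane = (0.5, 1.0, -2.0)
right_lane = (0.5, 1.0, 2.0)
xs = [0.0, 1.0, 2.0]
print(lane_points(left_lane, xs))
print(lane_points(right_lane, xs))
```

The resulting (x, y) pairs are the points that would be drawn on the image to mark each lane.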
Deep Learning Network Support (with Neural Network Toolbox)
▪ SeriesNetwork (GPU Coder: R2017b)
– Networks: MNIST, Alexnet, YOLO, VGG, Lane detection, Pedestrian detection
▪ DAGNetwork (GPU Coder: R2018a)
– Networks: GoogLeNet, ResNet, SegNet, DeconvNet
▪ Semantic segmentation, Object detection
Semantic Segmentation
[Video panels: Running in MATLAB; Generated Code from GPU Coder]
Deploying to CPUs
[Diagram: Deep Learning Networks → GPU Coder → NVIDIA TensorRT & cuDNN Libraries, ARM Compute Library, Intel MKL-DNN Library]
Deploying to CPUs
[Diagram: Deep Learning Networks → GPU Coder → NVIDIA TensorRT & cuDNN Libraries; CPU targets shown: Desktop CPU, Raspberry Pi board]
How Good Is Generated Code Performance?
▪ Performance of image processing and computer vision
▪ Performance of CNN inference (Alexnet) on Titan Xp GPU
▪ Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2
GPU Coder for Image Processing and Computer Vision
▪ Distance transform: 8x speedup
▪ Fog removal: 5x speedup
▪ SURF feature extraction: 700x speedup
▪ Ray tracing: 18x speedup
▪ Frangi filter: 3x speedup
Alexnet Inference on NVIDIA Titan Xp
[Chart: frames per second vs. batch size for GPU Coder + TensorRT (3.0.1, int8), GPU Coder + TensorRT (3.0.1), GPU Coder + cuDNN, MXNet (1.1.0), and TensorFlow (1.6.0)]
Testing platform
▪ CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
▪ GPU: Pascal Titan Xp
▪ cuDNN: v7
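The metric on this chart, frames per second at a given batch size, is commonly computed as total frames processed divided by elapsed wall-clock time. The Python sketch below shows that arithmetic only; it is not the benchmark procedure used for the slide. `dummy_inference` is a hypothetical stand-in for a real network's batched inference call.

```python
import time


def frames_per_second(run_batch, batch_size, num_batches):
    """Time num_batches calls of run_batch and return processed frames per second."""
    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * num_batches) / elapsed


def dummy_inference(batch_size):
    # Hypothetical stand-in: does trivial work proportional to the batch size.
    return sum(i * i for i in range(batch_size * 1000))


for batch_size in (1, 8, 32):
    fps = frames_per_second(dummy_inference, batch_size, num_batches=5)
    print(f"batch size {batch_size}: {fps:.1f} frames per second")
```

With a real network, larger batches usually raise throughput until the device is saturated, which is why such charts sweep the batch size.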
VGG-16 Inference on NVIDIA Titan Xp
[Chart: frames per second vs. batch size for GPU Coder + TensorRT (3.0.1, int8), GPU Coder + TensorRT (3.0.1), GPU Coder + cuDNN, MXNet (1.1.0), and TensorFlow (1.6.0)]
Testing platform
▪ CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
▪ GPU: Pascal Titan Xp
▪ cuDNN: v7
Alexnet Inference on Jetson TX2: Frame-Rate Performance
[Chart: frames per second vs. batch size for MATLAB GPU Coder (R2017b), C++ Caffe (1.0.0-rc5), and TensorRT (2.1); speedup annotations on the chart: 2x and 1.15x]
To be updated with R2018a benchmarks soon
Contact
more information
Alexnet Inference on Jetson TX2: Memory Performance
[Chart: peak memory (MB) vs. batch size for MATLAB GPU Coder (R2017b), C++ Caffe (1.0.0-rc5), and TensorRT 2.1 (using giexec wrapper)]
To be updated with R2018a benchmarks soon
Contact
more information
Design Your DNNs in MATLAB, Deploy with GPU Coder
Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models
Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters
Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 5x faster than TensorFlow
– 2x faster than MXNet
Thank you.