The Long Road to Model Deployment · 2018-04-11
The Long Road to Model Deployment
…or how to make a good (machine learning) model great.
GTC, March 2018
Greg Heinrich
Page 2
MODELS IN ACTION
Same model design, same data, different parameters.
Panels: Exemplar model | A less successful trial
Page 3
PARAMETER TUNING
It is as easy as flying a Concorde.
Images: Object Detector | Concorde
• Topology parameters
▪ Number of layers and their width
▪ Choice of activations
• Training parameters
▪ Learning rate
▪ Batch size
▪ Choice of optimizer
▪ Normalization
▪ Number of iterations
• Data parameters
▪ Spatial augmentation
▪ Color augmentation
▪ Rasterization
▪ Number of training samples
Source: Christian Kath
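The three parameter groups listed above can be written down as a search space that a tuner samples from. The sketch below is only an illustration: every parameter name, choice, and range in it is hypothetical and not taken from the talk.

```python
import math
import random

# Hypothetical search space; names, choices and ranges are illustrative only.
SEARCH_SPACE = {
    # Topology parameters
    "num_layers": ("choice", [8, 16, 32, 64]),
    "activation": ("choice", ["relu", "elu", "tanh", "sigmoid"]),
    # Training parameters
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "batch_size": ("choice", [16, 32, 64, 128]),
    "optimizer": ("choice", ["sgd", "adam", "rmsprop", "adagrad"]),
    # Data parameters
    "color_augmentation": ("uniform", 0.0, 1.0),
}


def sample_config(space, rng):
    """Draw one configuration at random from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "choice":
            config[name] = rng.choice(spec[1])
        elif kind == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "log_uniform":
            # Sample uniformly in log space so every decade is equally likely.
            low, high = math.log(spec[1]), math.log(spec[2])
            config[name] = math.exp(rng.uniform(low, high))
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


print(sample_config(SEARCH_SPACE, random.Random(0)))
```

Sampling the learning rate in log space is a common design choice because useful values typically span several orders of magnitude.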
Page 4
PARAMETER TUNING
What is the scale of the problem?
Hypothetical example
• Data
▪ 5 cars, 6 cameras, 1 month, 10 frames per minute per camera → 1.3M images
▪ 60 epochs
• Hyper-parameters
▪ 5 categorical parameters of 4 values
▪ 5 continuous parameters
• Quasi-Exhaustive (“Grid”) search
▪ Explore only 3 values of the continuous parameters → 250k jobs
▪ 10ms/image → 6 thousand years (!)
• Random search
▪ May reduce total time by orders of magnitude: 50 jobs → 1.25 years
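The job counts and durations above can be reproduced with a few lines of arithmetic. The sketch takes the slide's 1.3M images as given and assumes jobs run one after another at 10 ms per image. Recording 10 frames per minute around the clock for 30 days would give roughly 13M images across 5 cars and 6 cameras, so the 1.3M figure presumably rests on an assumption the slide does not state.

```python
# Figures taken from the slide.
IMAGES = 1.3e6              # training images
EPOCHS = 60
SECONDS_PER_IMAGE = 0.010   # 10 ms/image
CATEGORICAL = 4 ** 5        # 5 categorical parameters of 4 values each
CONTINUOUS_GRID = 3 ** 5    # 3 grid values for each of 5 continuous parameters
RANDOM_JOBS = 50

SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

job_seconds = IMAGES * EPOCHS * SECONDS_PER_IMAGE   # one training run
grid_jobs = CATEGORICAL * CONTINUOUS_GRID           # 1024 * 243 = 248832
grid_years = grid_jobs * job_seconds / SECONDS_PER_YEAR
random_years = RANDOM_JOBS * job_seconds / SECONDS_PER_YEAR

print(f"one job:       {job_seconds / SECONDS_PER_DAY:.1f} days")
print(f"grid search:   {grid_jobs} jobs, about {grid_years:.0f} years")
print(f"random search: {RANDOM_JOBS} jobs, about {random_years:.2f} years")
```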
Workflow diagram labels: Model, Experiment, Dataset
Page 5
PARAMETER TUNING
Parallelism comes to the rescue
Divide & conquer
• Run the 50 jobs in parallel
• Use 8 GPUs per job
• Total time → 1 day
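The one-day estimate follows from the earlier figures if training throughput scales linearly with the number of GPUs. That linear scaling, and reading 10 ms/image as a single-GPU figure, are assumptions of this sketch rather than statements from the talk.

```python
IMAGES = 1.3e6
EPOCHS = 60
SECONDS_PER_IMAGE = 0.010   # assumed to be single-GPU throughput
GPUS_PER_JOB = 8
JOBS = 50

# Assumption: training speed scales linearly with the number of GPUs.
job_days_single_gpu = IMAGES * EPOCHS * SECONDS_PER_IMAGE / 86400
job_days_multi_gpu = job_days_single_gpu / GPUS_PER_JOB

# All jobs run concurrently, so wall-clock time equals one job's duration.
wall_clock_days = job_days_multi_gpu
total_gpus = JOBS * GPUS_PER_JOB

print(f"{total_gpus} GPUs, about {wall_clock_days:.1f} days of wall-clock time")
```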
Workflow diagram labels:
▪ Steps: Use Datasets, Build Experiments, Run Training, Analyze Results
▪ Components: Dataset Service, Experiment Service, Workflow Manager, Training Cluster (10’s of thousands of GPUs), Metrics
Page 6
PARAMETER TUNING
One service, multiple identical workers
Diagram: Experiment/hyperopt Service ↔ Worker. Worker loop: Get Params → Train → Evaluate → Report Metrics → Continue?
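The worker loop in the diagram can be sketched as follows. `ExperimentService`, `train`, and `evaluate` are hypothetical single-process stand-ins written for this example; the real service, its API, and the training code are not described in the slides.

```python
import math
import random


class ExperimentService:
    """Hypothetical in-memory stand-in for the experiment/hyperopt service."""

    def __init__(self, max_trials, seed=0):
        self.max_trials = max_trials
        self.rng = random.Random(seed)
        self.results = []  # list of (params, metrics) pairs

    def get_params(self):
        # Random search: draw a fresh configuration for every trial.
        return {
            "learning_rate": 10 ** self.rng.uniform(-5, -1),
            "batch_size": self.rng.choice([16, 32, 64, 128]),
        }

    def report_metrics(self, params, metrics):
        self.results.append((params, metrics))

    def should_continue(self):
        return len(self.results) < self.max_trials


def train(params):
    # Placeholder for a real training run; returns a fake "model".
    return {"params": params}


def evaluate(model):
    # Toy metric that peaks when the learning rate is close to 1e-3.
    lr = model["params"]["learning_rate"]
    return {"accuracy": 1.0 / (1.0 + abs(math.log10(lr) + 3))}


def worker(service):
    """Get Params -> Train -> Evaluate -> Report Metrics -> Continue?"""
    while True:
        params = service.get_params()
        model = train(params)
        metrics = evaluate(model)
        service.report_metrics(params, metrics)
        if not service.should_continue():
            break


service = ExperimentService(max_trials=5)
worker(service)
best_params, best_metrics = max(service.results, key=lambda r: r[1]["accuracy"])
print(best_params, best_metrics)
```

Because every worker runs the same loop and keeps no state of its own, adding workers only requires the service to hand out parameters and collect metrics safely.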
Page 7
PARAMETER TUNING
Collecting and analyzing results
Screenshot: Jupyter notebook
Page 8
PARAMETER TUNING
Which parameters have the most impact on metrics?
Plots: Overall parameter sensitivity | When accuracy is over 60%
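The slides do not say how parameter sensitivity was computed. One simple proxy, assumed here purely for illustration, is the spread of the mean metric across the values each parameter took, optionally restricted to trials above an accuracy threshold. The trial records below are made up.

```python
from collections import defaultdict
from statistics import mean


def sensitivity(trials, metric="accuracy", min_metric=None):
    """Spread of the mean metric across the values of each parameter."""
    if min_metric is not None:
        trials = [t for t in trials if t[metric] > min_metric]
    by_param = defaultdict(lambda: defaultdict(list))
    for trial in trials:
        for name, value in trial["params"].items():
            by_param[name][value].append(trial[metric])
    spread = {}
    for name, groups in by_param.items():
        means = [mean(scores) for scores in groups.values()]
        spread[name] = max(means) - min(means)
    return spread


# Made-up trial records, for illustration only.
trials = [
    {"params": {"optimizer": "sgd", "batch_size": 32}, "accuracy": 0.55},
    {"params": {"optimizer": "sgd", "batch_size": 64}, "accuracy": 0.57},
    {"params": {"optimizer": "adam", "batch_size": 32}, "accuracy": 0.71},
    {"params": {"optimizer": "adam", "batch_size": 64}, "accuracy": 0.69},
]

overall = sensitivity(trials)                     # all trials
good_only = sensitivity(trials, min_metric=0.60)  # only trials above 60%
print(overall)
print(good_only)
```

In this toy data the optimizer dominates overall, while among the trials above 60% only the batch size still makes a difference. Continuous parameters would need to be binned before applying the same grouping.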
Page 9
PARAMETER TUNING
Zooming in on important parameters
Plots: Learning rate | Batch size
Page 10
PARAMETER TUNING
Eliminating underperformers
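The slide gives only the idea of stopping weak trials early, not the rule used. A minimal sketch under one common assumption, a median stopping rule, is shown below: a trial is stopped when its best metric so far falls below the median of what the other trials had reached by the same step. The function name and data are hypothetical.

```python
from statistics import median


def should_stop(trial_history, other_histories, step):
    """Median stopping rule (assumed here, not confirmed by the talk).

    Each history is a list of metric values, one per evaluation step.
    """
    peers = [max(h[: step + 1]) for h in other_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare against yet
    return max(trial_history[: step + 1]) < median(peers)


# Made-up accuracy curves of three earlier trials.
others = [
    [0.2, 0.4, 0.6],
    [0.3, 0.5, 0.7],
    [0.1, 0.2, 0.3],
]
print(should_stop([0.10, 0.15], others, step=1))  # True: below the median of 0.4
print(should_stop([0.30, 0.45], others, step=1))  # False: above the median
```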
Page 11
DATA
Do we need more?
Diminishing returns
• Adding data helps
• Increasing accuracy through more data is increasingly expensive
Page 12
DATA
Selecting better data
Active learning
• Collecting data is relatively cheap
• Annotating data is very expensive
• Use trained model to select next frame to annotate
• Strategies:
▪ Maximum variance
▪ Maximum entropy
Diagram (loop): Annotated dataset → Train → Trained model → Select image to annotate (from Unannotated dataset) → Annotate → Annotated dataset
See: Adam Lesnikowsky’s talk on Deep Active Learning
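The maximum-entropy strategy can be illustrated for a classifier: score each unannotated frame by the entropy of the model's predicted class probabilities and send the most uncertain frames to annotation. This is a minimal sketch with made-up predictions; how the strategy is applied to an object detector is not covered in the slides.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k=1):
    """Return the ids of the k frames whose predictions have the highest entropy.

    predictions maps a frame id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda fid: entropy(predictions[fid]), reverse=True)
    return ranked[:k]


# Made-up model outputs for three unannotated frames.
predictions = {
    "frame_a": [0.98, 0.01, 0.01],  # confident prediction
    "frame_b": [0.40, 0.35, 0.25],  # uncertain prediction
    "frame_c": [0.70, 0.20, 0.10],
}
chosen = select_for_annotation(predictions, k=1)
print(chosen)  # ['frame_b']
```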
Page 13
SELECTING BEST MODEL
Great accuracy, slow response time
Inference time
• Using DL framework to run inference
• DrivePX2 platform
• Time/frame (excluding data loading): 73ms
• 6 cameras → one prediction every 438ms
Page 14
REDUCING MODEL COMPLEXITY
Pruning unimportant neurons
Unpruned network: 12 neurons, 32 connections
Pruned network: 11 neurons, 24 connections
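The neuron and connection counts are consistent with, for example, a fully connected 4-4-4 network that loses one hidden neuron. The layer layout is an assumption, since the figure itself is not reproduced here.

```python
def count_connections(layer_sizes):
    """Number of weights between consecutive fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


unpruned = [4, 4, 4]  # assumed layout: 12 neurons
pruned = [4, 3, 4]    # one hidden neuron removed: 11 neurons

print(sum(unpruned), count_connections(unpruned))  # 12 32
print(sum(pruned), count_connections(pruned))      # 11 24
```

Removing a single hidden neuron drops both its incoming and its outgoing weights, which is why one neuron accounts for eight connections in this layout.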
Page 15
REDUCING MODEL COMPLEXITY
Pruning implementation
Selecting neurons to prune
• Exhaustive search
• Random
• Minimum weight
• Minimum activations
• Gradient based
Workflow
Train → Prune → Re-train
Results
• Min weight method, single pass
• 83% of weights removed
• Inference time: 73ms → 26ms
• 2.7x speed-up
See: Jose Alvarez’s talk on Model Compression
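A minimum-weight criterion can be sketched by ranking the neurons of a layer by the magnitude of their incoming weights and keeping only the strongest. Using the L1 norm as the score, the function name, and the toy weights are all assumptions of this sketch; the slides do not give the exact scoring used.

```python
def prune_min_weight(weights, keep_fraction):
    """Keep the neurons with the largest L1 norm of incoming weights.

    weights[i] is the list of incoming weights of neuron i.
    Returns the sorted indices of the neurons that are kept.
    """
    n_keep = max(1, round(len(weights) * keep_fraction))
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy layer with four neurons and three inputs each (made-up values).
layer = [
    [0.9, -1.1, 0.4],     # neuron 0
    [0.01, 0.02, -0.01],  # neuron 1: near-zero weights
    [-0.7, 0.6, 0.8],     # neuron 2
    [0.05, -0.03, 0.02],  # neuron 3: near-zero weights
]
kept = prune_min_weight(layer, keep_fraction=0.5)
pruned_layer = [layer[i] for i in kept]
print(kept)  # [0, 2]
```

In a real network the matching input weights of the next layer must be removed as well, and the pruned model is then re-trained, as in the Train → Prune → Re-train workflow.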
Page 16
TENSORRT
Inference optimization
TensorRT conversion
• Inference-specific optimizations
• Platform-specific optimizations
Diagram: Trained Neural Network → Import Model → TensorRT Optimizer → Serialize Engine → Optimized Plans (Plan 1, Plan 2, Plan 3)
Results (FP32 precision)
• 26ms → 8.5ms/frame
• 3x speed-up
Page 17
TENSORRT
INT8 precision
TensorRT conversion
• Store weights and activations in 8 bits
• Accumulations in FP16
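Storing values in 8 bits requires a scale factor per tensor, which is what a calibration step determines from sample data. The sketch below uses a simplified max-abs calibration with symmetric rounding to show the principle. It is not TensorRT's calibration algorithm, and the function names and sample values are made up.

```python
def calibrate_scale(samples):
    """Max-abs calibration: map the largest observed magnitude to 127."""
    max_abs = max(abs(x) for x in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest step and clamp to the signed 8-bit range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map 8-bit integers back to approximate floating-point values."""
    return [q * scale for q in q_values]


activations = [0.02, -1.27, 0.64, 0.9]  # made-up calibration data
scale = calibrate_scale(activations)    # about 0.01
q = quantize(activations, scale)
restored = dequantize(q, scale)
print(q)
print(restored)
```

The rounding error per value is at most half a quantization step, so a well-chosen scale keeps the loss in accuracy small while cutting storage to a quarter of FP32.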
Results (FP16 precision)
• 8.5ms → 3.9ms/frame
• 2.2x speed-up
• Even greater speed-up with Tensor Cores on Xavier SoC!
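The per-frame times quoted across the talk can be combined into a cumulative picture. The sketch assumes the six cameras are processed one after another, as in the earlier 6 × 73ms = 438ms figure. The ratios are computed from the rounded per-frame times, so they may differ slightly from the speed-ups quoted on the slides.

```python
CAMERAS = 6
STAGES_MS = [
    ("DL framework", 73.0),
    ("pruned", 26.0),
    ("TensorRT FP32", 8.5),
    ("TensorRT reduced precision", 3.9),
]

baseline = STAGES_MS[0][1]
previous = baseline
for name, ms in STAGES_MS:
    print(
        f"{name:28s} {ms:5.1f} ms/frame  "
        f"step x{previous / ms:.1f}  total x{baseline / ms:.1f}  "
        f"{CAMERAS} cameras: {CAMERAS * ms:.1f} ms"
    )
    previous = ms
```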
Diagram labels (legend: Process, Artifact): Dataset, Pre-processor, Pre-processed images, INT8 calibrator, INT8 cal file, Train, Model, TensorRT Optimizer, TensorRT engine, Evaluator, Metrics
Page 18
AUTOMATION
Automated workflows enable traceability
Diagram labels: SCM, Code, Build, Config, Data, Data Loader, Train (repeated), Pick best model, Prune, Re-Train, TensorRT, Evaluate, Knowledge base, Traceability firewall
Page 19
Q&A
For more information contact: Poonam Chitale ([email protected]), NVIDIA AV Perception Infrastructure Product Manager