real world problem simplification using deep learning … · gpu servers: dual xeon e5-2690...
TRANSCRIPT
Dr. Ettikan Kandasamy Karuppiah
Director, Developers’ Ecosystem
2 November 2017
REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING / AI
2
“Find where I parked my car”
AI IS EVERYWHERE
“Find the bag I just saw in this magazine”
“What movie should I watch next?”
3
Bringing grandmother closer to family by bridging language barrier
TOUCHING OUR LIVES
Predicting sick baby’s vitals like heart rate, blood pressure, survival rate
Enabling the blind to “see” their surrounding, read emotions on faces
4
Increasing public safety with smart video surveillance at airports & malls
AI FOR PUBLIC GOOD
Providing intelligent services in hotels, banks and stores
Separating weeds as it harvests, reduces chemical usage by 90%
5
HOW A DEEP NEURAL NETWORK SEES
Image “Audi A7”
Raw data Low-level features Mid-level features High-level features
6
KNOW YOUR PROBLEM WELL
7
3000
AI MOMENTUM
TODAY BY 2020
AI startups
85% $47B 20%
of all customer service interactions will
be powered by AI bots
spend on AI technologies
of companies will dedicate workers to monitor and guide neural networks
8
EVERY INDUSTRY HAS AWOKEN TO AI
2014 2016
1,549
19,439
Higher Ed
Internet
Healthcare
Finance
Automotive
Others
Government
Developer Tools
Organizations engaged with NVIDIA on Deep Learning
9
1980 1990 2000 2010 2020
GPU-Computing perf
1.5X per year
1000X
by
2025
RISE OF GPU COMPUTING
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte,
O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected
for 2010-2015 by K. Rupp
102
103
104
105
106
107
Single-threaded perf
1.5X per year
1.1X per year
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
10
GPUCPU
Add GPUs: Accelerate Data Processing & Analytics
© NVIDIA 2013
11
NVIDIA IGNITES
THE AI BIG BANGArtificial intelligence is the use of computers to simulate human intelligence.
AI amplifies our cognitive abilities — letting us solve problems where the
complexity is too great, the information is incomplete, or the details are
too subtle and require expert training.
Learning from data — a computer’s version of life experience — is how AI evolves.
GPU computing powers the computation required for deep neural networks to learn
to recognize patterns from massive amounts of data.
This new computing model sparked the AI era.
12
DEEP LEARNING FRAMEWORKS
VISION SPEECH BEHAVIOR
Object Detection Voice Recognition Language TranslationRecommendation
EnginesSentiment Analysis
DEEP LEARNING
cuDNN
MATH LIBRARIES
cuBLAS cuSPARSE
MULTI-GPU
NCCL
cuFFT
Mocha.jl
Image Classification
NVIDIA DEEP LEARNING SDKHigh Performance GPU-Acceleration for Deep Learning
13
ANNOUNCING TESLA V100GIANT LEAP FOR AI & HPCVOLTA WITH NEW TENSOR CORE
21B xtors | TSMC 12nm FFN | 815mm2
5,120 CUDA cores
7.5 FP64 TFLOPS | 15 FP32 TFLOPS
NEW 120 Tensor TFLOPS
20MB SM RF | 16MB Cache
16GB HBM2 @ 900 GB/s
300 GB/s NVLink
14
NEW TENSOR CORE
New CUDA TensorOp instructions & data formats
4x4 matrix processing array
D[FP32] = A[FP16] * B[FP16] + C[FP32]
Optimized for deep learning
Activation Inputs Weights Inputs Output Results
15
16
MODEL COMPLEXITY IS EXPLODING
2016 — Baidu Deep Speech 22015 — Microsoft ResNet 2017 — Google NMT
105 ExaFLOPS8.7 Billion Parameters
20 ExaFLOPS300 Million Parameters
7 ExaFLOPS60 Million Parameters
17
REVOLUTIONARY AI PERFORMANCE3X Faster DL Training Performance
Over 80x DL Training Performance in 3 Years
1x K80cuDNN2
4x M40cuDNN3
8x P100cuDNN6
8x V100cuDNN7
0x
20x
40x
60x
80x
100x
Q1
15
Q3
15
Q2
17
Q2
16
Googlenet Training Performance(Speedup Vs K80)
Speedup v
s K80
85% Scale-Out EfficiencyScales to 64 GPUs with Microsoft
Cognitive Toolkit
0 5 10 15
64X V100
8X V100
8X P100
Multi-Node Training with NCCL2.0(ResNet-50)
ResNet50 Training for 90 Epochs with 1.28M images dataset | Using Caffe2 | V100 performance measured on pre-production hardware.
1 Hour
7.4 Hours
18 Hours
3X Reduction in Time to Train Over P100
0 10 20
1XV100
1XP100
2XCPU
LSTM Training(Neural Machine Translation)
Neural Machine Translation Training for 13 Epochs |German ->English, WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance
measured on pre-production hardware.
15 Days
18 Hours
6 Hours
18
Deep Learning-Inferencing:TESLA V100 DELIVERS NEEDED RESPONSIVENESS WITH UP TO 99X MORE THROUGHPUT
0
1,000
2,000
3,000
4,000
5,000
CPU Server Tesla V100
ResNet-50
4,647 i/s@ 7mslatency
47 i/s@ 21msLatency
0
600
1,200
1,800
CPU Server Tesla V100
VGG-16
1,658i/s@ 7ms
Latency
23 i/s@ 43msLatency
0
500
1,000
1,500
2,000
2,500
3,000
3,500
CPU Server Tesla V100
GoogleNet
3,270 i/s@ 7ms
Latency
136 i/s@ 7ms
LatencyThro
ughput
(Im
ages/
Sec)
Thro
ughput
(Im
ages/
Sec)
Thro
ughput
(Im
ages/
Sec)
CPU: Xeon E5-2690 V4
GPU Servers: Dual Xeon E5-2690 [email protected] with 16GB PCIe GPUs configs as shownUbuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13; NCCL 2.0.4, TensorRT pre-release, data set: ImageNet,GPU Optimal batch size used to achieve 7ms latency; CPU batch size reduced to 1 if latency exceeds 7ms
NVIDIA METROPOLIS —EDGE TO CLOUD AI CITY PLATFORM
Camera
CLOUDTraining and Inference
EDGE AND ON-PREMISESInference
DGX Appliance
Video recorder Server
JETSON TESLA/QUADRO
JETPACK, TENSOR RT, DEEPSTREAM
20
0
10
20
30
40
50
60
70
HPCG Performance EquivalencySingle GPU Server vs Multiple CPU-Only Servers
CPU Server: Dual Xeon E5-2690 [email protected], GPU Servers: same CPU server w/ V100s PCIeCUDA Version: CUDA 9.0.103; Dataset: 256x256x256 local sizeTo arrive at CPU node equivalence, we use measured benchmark with up to 8 CPU nodes. Then we use linear scaling to scale beyond 8 nodes.
HPCGBenchmark
Exercises computational and data access patterns that closely match a broad set of important HPC applications
VERSION3
ACCELERATED FEATURESAll
SCALABILITYMulti-GPU and Multi-Node
MORE INFORMATIONhttp://www.hpcg-benchmark.org/index.html
19 CPU Servers
37 CPU Servers
67 CPU Servers # of CPU Only Servers
1 server 8x V100 GPU’s
19x 36x 67xSpeed up vs
CPU server
1 server 2x V100 GPU’s
1 server 4x V100 GPU’s
REINVENTING OUR COMMUNITY
REINVENTING OUR COMMUNITY
23
AUTOMOTIVE DEEP LEARNING
Deep Learning for Damage Estimation based on photo of the damage
Custom development for one of top 5 Insurance Companies
Total Data Available
90,000 Claims~380,000 car images
Scope of Work
Data Processed to Date*
1000 Claims4500 images
*Time lag in data migration, data clean-up and tagging images
Visualization of the activations of theconvolutions over a car damage image
Visualization of the layers of a neural network and the back propagation process
Neural Network Demo
https://www.galacticar.ai/ -> Galaxy.ai : Artificial Intelligence Driven Sedan Damage Estimator
Software as a Service
Annual fee + fixed fee every time API is pinged
Integration with Insurance Mobile app.
Use cases verified by the industry and insurance clients:
1) Whether client should file a claim or not - Claims triaging
2) Claims estimation from damaged sedan vehicle images
Business Model
27
(1) DEEP LEARNING IN FINANCE - TRADING
http://on-demand.gputechconf.com/gtc/2016/presentation/s6589-masahiko-todoriki-performance-improvement-algorithmic-trading.pdf
http://tradetechasia.wbresearch.com/ 20 Oct 2016
Generic Bigdata Operations
Correction
& Assurance
Real-time
Data
Sources
Data Analytics
Data Visualization
Scrambling
&
Protecting
Fields/Data
Data
Governanc
e
Published
Data
Marts/Lake
Harmonization
& Ontology
Mapping
Data
Harmonization
Harmonization
Terminologies
Data
Quality &
Integrity
Data
Cleansing
Data
Staging
Data
Acquisition
Unstructured
Data
Sources
Structured
Data
Sources
Data Preparation Data Dissemination
Virtualized Platform & Security Management
Data Warehousing/Storage (including Network Data)
Check for different Representation and usage Ontology
Check forCorrectionException
Data Anonymity &
Data Protection
Structured Data Access
Data Modeling
Sentiment Analytics
Data Reporting & Visualization
Data Interpretation
Social Network Understanding
Real-time Data Ingestion
Data Analytics
NetworkAnalytics
Unstructured Data Collection
Data ……….
Data Statistics
Mobile Sharing
Tablet Sharing
PC Sharing
Push & Pull Data Platform
Data Models
& Schema
29
GPU DATABASES ARE EVEN FASTER1.1 Billion Taxi Ride Benchmarks
21 30
1560
80 99
1250
150269
2250
372
696
2970
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
MapD DGX-1 MapD 4 x P100 Redshift 6-node Spark 11-node
Query 1 Query 2 Query 3 Query 4
Tim
e in M
illise
conds
Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS @marklit82
10190 8134 19624 85942
30NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Silicon Valley-based Blue River
Technology has developed
a deep learning solution called
LettuceBot that rolls through a
field photographing 5,000
young plants a minute, using
algorithms and machine vision
to identify each sprout as
lettuce or a weed. The
company trained their neural
network with GPUs and the
Caffe deep learning
framework.
Accurate within a quarter inch,
the LettuceBot automatically
pinpoints weeds,
underdeveloped sprouts, and
overplanted areas and then
applies tiny doses of herbicide
to maximize crop production.
Automated Crop Management
Source : https://news.developer.nvidia.com/artificial-intelligence-helping-to-ensure-humanitys-future-food-supply/
31NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Researchers from the Costa
Rica Institute of Technology
and French Agricultural
Research Centre for
International Development
developed a deep learning
algorithm to automatically
identify plant specimens that
have been pressed, dried and
mounted on herbarium sheets.
According to the researchers,
this is the first attempt to use
deep learning to tackle the
difficult taxonomic task of
identifying species in natural-
history collections.
Artificial Intelligence Helps Identify Plant Species
Source : https://news.developer.nvidia.com/artificial-intelligence-helps-identify-plant-species-for-science/
32
Train
whole slide image
sample
sample
training data
no
rmal
tum
or
Test
whole slide imageoverlapping image
patches tumor prob. map
1.0
0.0
0.5
Convolutional Neural Network
P(tumor)
How does it work?HOW DOES IT WORK?
SAFE AND SMART CITIES IS AN AI PROBLEM
0M
200M
400M
600M
800M
1,000M
2016 2020
1 billion installed security cameras WW (2020)
30 billion frames per day
Challenging real world conditions
Traditional video analytics not trustworthy
74%
9…
2010 2011 2012 2013 2014 2015 2016
Accura
cy
Image Classification
Human
Hand-coded CV
Deep Learning
AI achieves super human results
AI driven intelligent video analytics
34
35
36
38
39
AI FOR VISUAL SEARCH
IN MARKET PLACE
40
TESLA PLATFORMLeading Data Center Platform for HPC and AI
TESLA GPU & SYSTEMS
NVIDIA SDK
INDUSTRY TOOLS
APPLICATIONS & SERVICES
C/C++
ECOSYSTEM TOOLS & LIBRARIES
HPC
+400 More Applications
cuBLAS
cuDNN TensorRT
DeepStreamSDK
FRAMEWORKS MODELS
Cognitive Services
AI TRAINING & INFERENCE
Machine Learning Services
ResNetGoogleNetAlexNet
DeepSpeechInceptionBigLSTM
DEEP LEARNING SDK
NCCL
COMPUTEWORKS
NVLINK SYSTEM OEM CLOUDTESLA GPU
41
Instant productivity — plug-and-play, supports every AI framework
Performance optimized across the entire stack
Docker/Container Development and Deployment model
Always up-to-date via the cloud
Mixed framework environments — containerized
Direct access to NVIDIA experts
DGX STACKFully integrated Deep Learning platform
42
DEEP LEARNING REQUIREMENTS
DEEP LEARNING NEEDS… DEEP LEARNING CHALLENGES NVIDIA DELIVERS
Data Scientists Demand far exceeds supply DIGITS, DLI Training
Latest Algorithms Rapidly evolvingDL SDK, GPU-Accelerated
Frameworks
Fast Training Impossible -> Practical DGX-1, P100, P40, TITAN X
Deployment Platform Must be available everywhereTensorRT, P40, P4, Jetson,
Drive PX
43
DEEP LEARNING SOFTWARE
developer.nvidia.com/deep-learning
44
NVIDIA DEEP LEARNING INSTITUTE
Training organizations and individuals to solve challenging problems using Deep Learning
On-site workshops and online courses presented by certified experts
Covering complete workflows for proven application use cases
Self-driving cars, recommendation engines, medical image classification, intelligent video analytics and more
Hands-on training for data scientists and software engineers
Nvidia AI Conference Singapore
23/24th October 2017
THANK YOU