"The Road Ahead for Neural Networks: Five Likely Surprises," a presentation from Cadence
TRANSCRIPT
Copyright © 2016 Cadence Design Systems Inc. 1
The Road Ahead for Neural Networks: Five Likely Surprises
Chris Rowen, PhD, FIEEE – CTO IP Group
May 2, 2016
The Deep Learning Buzz
• Speech recognition: Apple, Google, Nuance, Microsoft
• Vision/ADAS: Nvidia, Mobileye
• Finance: TradeTrek, M.J. Futures, Alyuda
• Social media and big-data search: Google, Facebook, Microsoft, Baidu, NEC, IBM, Yahoo, AT&T
• Medical: genomics, radiology, screening, protein sequencing
• Security: Google
Tracking the Enthusiasm: Gartner Hype Cycle for Emerging Technology, 2014 versus 2015
• 2014: machine learning not on the radar
• 2015: machine learning at peak hype
Source: Gartner, August 2015: "Gartner's 2015 Hype Cycle for Emerging Technologies Identifies the Computing Innovations That Organizations Should Monitor"
Vision Is the Computing Challenge: Growing data + compute drives new system-on-chip designs
[Chart: Sensor Unit Volume, units per year (M), 0 to 40,000, for 2010-2018. Sensors tracked: microphone, image, gyroscope, accelerometer, ambient light, proximity, magnetometer, pressure, touch, fingerprint, chemical/gas, temperature, ultrasonic, IR, biological, humidity, Hall effect, UV, ECG, EMG, ultrasonic, EEG. Source: Semico Research, 2014]
[Chart: Sensor Volume Adjusted for Data Rate, data-rate-weighted volume per year (M units * bits per second), 0 to 1.4E13, for 2010-2018]
Cisco: “Consumer internet video traffic will be 80 percent of all consumer Internet traffic in 2019” Source: Cisco May 2015: “Cisco Visual Networking Index: Forecast and Methodology, 2014-2019 White Paper”
The Basics of Real-Time Neural Networks
• Training: runs once per database, server-based, very compute-intensive (10^16-10^22 MACs/dataset). Flow: labeled dataset → selection of layered network → iterative derivation of coefficients by stochastic-descent error minimization
• Deployment ("inference"): runs on every image, device-based, compute-intensive (10^7-10^11 MACs/image). Flow: single-pass evaluation of input image → most probable label
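Inference as described above is just a single forward pass through the trained layers. A minimal sketch in NumPy (the layer sizes, weights, and labels here are hypothetical, chosen purely to illustrate the single-pass flow, not any specific deployed network):

```python
import numpy as np

# Hypothetical two-layer network; shapes and labels are illustrative only.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((64, 16)), np.zeros(16)  # layer-1 coefficients
W2, b2 = rng.standard_normal((16, 3)), np.zeros(3)    # layer-2 coefficients
labels = ["cat", "dog", "car"]

def infer(image_vec):
    """Single-pass evaluation: input image in, most probable label out."""
    h = np.maximum(image_vec @ W1 + b1, 0.0)  # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    p /= p.sum()                              # softmax probabilities
    return labels[int(np.argmax(p))]

print(infer(rng.standard_normal(64)))         # prints one of the three labels
```

In training, the coefficients W1, b1, W2, b2 would instead be derived iteratively by stochastic-descent error minimization over a labeled dataset.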
Programming versus Training
• Typical embedded SW (programming): algorithm exploration (MATLAB) → algorithm optimization (e.g., float → fixed point) → code porting to target platform → code testing and characterization (against test data)
• Embedded neural network (training): data selection and labeling (training data) → network selection and optimization → network training (training data) → embedded code generation → testing (against test data)
Training Challenges: tiny apps/data ecosystem, few commercial tools, scarce expertise
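One concrete step in the programming flow above, "algorithm optimization (e.g., float → fixed point)," can be sketched as symmetric int8 quantization of a weight tensor. This is an illustrative assumption with made-up data, not a Cadence tool flow:

```python
import numpy as np

# Illustrative symmetric int8 quantization of FP32 weights (hypothetical data).
rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)   # FP32 weights: 4 bytes each

scale = float(np.abs(w).max()) / 127.0             # map max magnitude to int8 range
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 1 byte each
w_hat = w_q.astype(np.float32) * scale             # dequantized approximation

err = float(np.abs(w - w_hat).max())
print(f"4x smaller (int8 vs FP32); max abs error {err:.4f}, step size {scale:.4f}")
```

The 4x size reduction (and the cheaper integer MACs that come with it) is one reason fixed-point networks fit embedded memory and power budgets far better than FP32 models.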
New Value Chains for Deep Learning
Extreme range of market size estimates: $5B to $2T
Players in the chain: neural network silicon vendor (e.g., Nvidia); neural network silicon IP (e.g., Cadence); neural network tools (TBD); training data owner (TBD); system integrator (e.g., Delphi); system OEM (e.g., Ford); end user
What's Hard About Neural Networks: Training for Useful Behavior
Today: training for recognition of objects
• Matching to labeled images
• Simple data set-up
• Image enhancement
• Find regions of interest
Next: training for judgment and strategy
• More complex extraction and labeling of key patterns
• Neural networks INSIDE larger algorithms
• Push to get enough good data
Example: Google AlphaGo beats best human Go players
• Based on 2 neural networks
  • Policy network: find highest-probability moves
  • Value network: assign value to board positions
• Complex training
  • Patterns from expert games
  • Reinforcement training from machine-vs-machine games
Today's Neural Networks Are Inefficient (but likely to get MUCH better!)
Example: AlexNet
• ~60M model parameters (FP32: 240MB)
• ~800M multiply-accumulates (MACs) per image
• At 1 TMAC/s: 350GB/s DDR bandwidth (FP32)
• Killed by the memory power, not the compute
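The bandwidth bullet can be checked with back-of-envelope arithmetic: if a 1 TMAC/s engine re-reads the full FP32 weight set for every image, the numbers above imply roughly 300 GB/s of weight traffic alone, the same order as the slide's 350 GB/s (a sketch; the assumption here is that the slide's figure also counts activation and overhead traffic):

```python
# Back-of-envelope check of the AlexNet bullets (illustrative arithmetic only).
params = 60e6             # ~60M model parameters
bytes_per_param = 4       # FP32
macs_per_image = 800e6    # ~800M multiply-accumulates per image
throughput = 1e12         # 1 TMAC/s engine

model_bytes = params * bytes_per_param        # 240 MB of weights
images_per_s = throughput / macs_per_image    # 1250 images/s
weight_bw = images_per_s * model_bytes        # weight bytes fetched per second

print(f"model size: {model_bytes / 1e6:.0f} MB")        # 240 MB
print(f"weight bandwidth: {weight_bw / 1e9:.0f} GB/s")  # 300 GB/s
```

This is why the slide concludes the design is killed by memory power rather than compute: each MAC is cheap, but streaming hundreds of GB/s from DDR is not.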
[Chart: German Traffic Sign Recognition Benchmark, accuracy (99.00%-100.00%) versus parameter count (0-100M). Design points: record circa 2012 at 1.5M parameters (6 MB); a performance-optimized network; and a compute-optimized network at 154K parameters (150KB), ~10x more efficient. A 2M-parameter (2 MB) point is also marked. Region annotations: "need too much memory bandwidth," "need too much compute," "not accurate enough"]
Neural Network Efficiency Trends
• Good neural networks need lots of compute, especially multiply-add
• Two key metrics: scaling to high total compute, and high multiply-add per watt
• Vision DSPs often give greater efficiency than GPUs or FPGAs
• CNN-specific architectures
• Clusters essential to scaling
[Chart: Estimated CNN Throughput and Efficiency, plotting efficiency (peak GMACs per watt, 10 to 10,000) against throughput (GMACs per second, 10 to 10,000) for FPGAs, desktop GPUs, embedded vision DSPs, and CNN-specific engines]
Data Ownership and Privacy
Data:
• As programming shifts to training, training data gets more valuable
• Must have large, relevant data sets
• Must label the data
• Often must clean up the data to fit the task
• Data scientists more in demand
• Open data sets become the new "open source"
• Data can be "mined" to serve many different problems
Privacy:
• Neural networks may identify health, habits, opinions, and finances
• Large-scale data collection picks up personally sensitive data ("by-catch")
• Difficulty in constraining usage
• Unexpected personal insights need protection
Distributed Systems and Neural Networks: What Happens Where
• Default: training in the cloud, inference (recognition) in the device
• Technical factors drive work distribution:
1. Energy and bandwidth cost of shipping raw data up to the cloud
2. Latency and reliability of the network in real-time applications
3. Usage frequency: occasional use may make the cloud cost-effective
4. Frequency of retraining: ship data up, or trained weights down
• Business factors drive work distribution:
1. Liability concerns push network execution to "deep pockets" or pull it to local control
2. Today's input data is tomorrow's training data; network execution has fringe benefits
3. Privacy concerns minimize data movement; raw streams are expensive to encrypt
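Technical factor 1 (the cost of shipping raw data up) is easy to quantify with rough arithmetic. The rates below are illustrative assumptions, not figures from the talk:

```python
# Illustrative comparison: upload raw video for cloud inference, or run
# inference on-device and send only labels. All rates are assumptions.
raw_stream_bps = 1920 * 1080 * 24 * 30   # raw 1080p30 video: pixels * bits * fps
label_bps = 100 * 30                     # ~100 bits of labels per frame

ratio = raw_stream_bps / label_bps
print(f"raw stream: {raw_stream_bps / 1e6:.0f} Mbit/s")
print(f"uploading raw data moves ~{ratio:,.0f}x more bits than sending labels")
```

Even with heavy video compression the gap stays several orders of magnitude, which is one reason the default split keeps inference in the device.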
The Road Ahead: Not Too Surprising
1. Neural networks will continue to proliferate in cloud-based applications
2. Neural networks will expand rapidly into real-time embedded functions
3. Power constraints and extreme throughput needs will drive CNN optimization in processor platforms, both embedded and server
4. Real-time neural networks evolve from object recognition to action recognition
5. Vision-based problems dominate the computations and the high-profile deployments
6. Expect a mad, sometimes unguided, scramble for expertise, data, and applications
The Road Ahead: More Surprising
1. >100x in energy and >20x in bandwidth from network AND engine architecture optimization near-term
2. In time: deployment of 1000 tera-MAC (peta-MAC) embedded and 1,000,000 tera-MAC (exa-MAC) server neural networks
3. Network optimization evolves from ad hoc exploration to automated "synthesis," a new kind of EDA
4. New value chains emerge, and swing between vertical integration and disintegration; new kinds of IP, tools, and data services
5. Data is king: access to large, diverse training sets makes new winners
6. Potential backlash over privacy and "rise of the machines"
Cadence Product Announcement Today
Vision P6 DSP: a complete imaging/vision and CNN processor
• Neural network performance: up to 4X
• Imaging and vision performance: up to 4X in well-known benchmarks
• Multiply-accumulate: 4X MAC count
• Vector floating-point support: 32-way vector FPU on FP16; easy GPU code porting
(All relative to the Tensilica Vision P5 DSP on the same process node)
Extends the Cadence product portfolio further into the fast-growing vision/deep learning application areas
Resource Slide
• Some market sizing efforts: http://techemergence.com/valuing-the-artificial-intelligence-market-2016-and-beyond/
• Cadence neural network story: http://ip.cadence.com/applications/cnn
• Cadence Embedded Neural Network Summit proceedings: http://ip.cadence.com/knowledgecenter/enns
• "Using Convolutional Neural Networks for Image Recognition": http://ip.cadence.com/uploads/901/cnn_wp-pdf
• The latest Cadence vision DSPs: http://ip.cadence.com/ipportfolio/tensilica-ip/image-vision-processing
Please come to our demo table at the Summit and talk with our neural network, imaging, and vision experts. Cadence MIPI CSI/DSI IP demo at the MIPI Alliance table.
Thank You
© 2016 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, Denali, and Tensilica are registered trademarks of Cadence Design
Systems, Inc. All others are the property of their respective holders.