"The Road Ahead for Neural Networks: Five Likely Surprises," a presentation from Cadence
TRANSCRIPT
Copyright © 2016 Cadence Design Systems Inc. 1
The Road Ahead for Neural Networks: Five Likely Surprises
Chris Rowen, PhD, FIEEE – CTO IP Group
May 2, 2016
The Deep Learning Buzz
• Speech recognition: Apple, Google, Nuance, Microsoft
• Vision/ADAS: Nvidia, Mobileye
• Finance: TradeTrek, M.J. Futures, Alyuda
• Social media and big-data search: Google, Facebook, Microsoft, Baidu, NEC, IBM, Yahoo, AT&T
• Medical: genomics, radiology, screening, protein sequencing
• Security: Google
Tracking the Enthusiasm: Gartner Hype Cycle for Emerging Technology, 2014 versus 2015
• 2014: machine learning not on the radar
• 2015: machine learning at peak hype
Source: Gartner, August 2015: "Gartner's 2015 Hype Cycle for Emerging Technologies Identifies the Computing Innovations That Organizations Should Monitor"
Vision Is the Computing Challenge: Growing data + compute drives new system-on-chip designs
[Chart: Sensor Unit Volume, units per year (M), 0 to 40,000, for 2010-2018. Sensors tracked: microphone, image, gyroscope, accelerometer, ambient light, proximity, magnetometer, pressure, touch, fingerprint, chemical/gas, temperature, ultrasonic, IR, biological, humidity, Hall effect, UV, ECG, EMG, ultrasonic, EEG. Source: Semico Research, 2014]
[Chart: Sensor Volume Adjusted for Data Rate, data-rate-weighted volume per year (M units * bits per second), 0 to 1.4E13, for 2010-2018]
Cisco: “Consumer internet video traffic will be 80 percent of all consumer Internet traffic in 2019” Source: Cisco May 2015: “Cisco Visual Networking Index: Forecast and Methodology, 2014-2019 White Paper”
The Basics of Real-Time Neural Networks
• Training: runs once per database, server-based, very compute-intensive (10^16-10^22 MACs/dataset). Flow: labeled dataset → selection of layered network → iterative derivation of coefficients by stochastic-descent error minimization
• Deployment ("inference"): runs on every image, device-based, compute-intensive (10^7-10^11 MACs/image). Flow: single-pass evaluation of input image → most probable label
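Inference as described above is just a single forward pass through the trained layers. A minimal sketch in NumPy (the layer sizes, weights, and labels here are hypothetical, chosen purely to illustrate the single-pass flow, not any specific deployed network):

```python
import numpy as np

# Hypothetical two-layer network; shapes and labels are illustrative only.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((64, 16)), np.zeros(16)  # layer-1 coefficients
W2, b2 = rng.standard_normal((16, 3)), np.zeros(3)    # layer-2 coefficients
labels = ["cat", "dog", "car"]

def infer(image_vec):
    """Single-pass evaluation: input image in, most probable label out."""
    h = np.maximum(image_vec @ W1 + b1, 0.0)  # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    p /= p.sum()                              # softmax probabilities
    return labels[int(np.argmax(p))]

print(infer(rng.standard_normal(64)))         # prints one of the three labels
```

In training, the coefficients W1, b1, W2, b2 would instead be derived iteratively by stochastic-descent error minimization over a labeled dataset.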
Programming versus Training
• Typical embedded SW (programming): algorithm exploration (MATLAB) → algorithm optimization (e.g., float → fixed point) → code porting to target platform → code testing and characterization (against test data)
• Embedded neural network (training): data selection and labeling (training data) → network selection and optimization → network training (training data) → embedded code generation → testing (against test data)
Training Challenges: tiny apps/data ecosystem, few commercial tools, scarce expertise
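One concrete step in the programming flow above, "algorithm optimization (e.g., float → fixed point)," can be sketched as symmetric int8 quantization of a weight tensor. This is an illustrative assumption with made-up data, not a Cadence tool flow:

```python
import numpy as np

# Illustrative symmetric int8 quantization of FP32 weights (hypothetical data).
rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)   # FP32 weights: 4 bytes each

scale = float(np.abs(w).max()) / 127.0             # map max magnitude to int8 range
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 1 byte each
w_hat = w_q.astype(np.float32) * scale             # dequantized approximation

err = float(np.abs(w - w_hat).max())
print(f"4x smaller (int8 vs FP32); max abs error {err:.4f}, step size {scale:.4f}")
```

The 4x size reduction (and the cheaper integer MACs that come with it) is one reason fixed-point networks fit embedded memory and power budgets far better than FP32 models.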
New Value Chains for Deep Learning
Extreme range of market size estimates: $5B to $2T
Players in the chain: neural network silicon vendor (e.g., Nvidia); neural network silicon IP (e.g., Cadence); neural network tools (TBD); training data owner (TBD); system integrator (e.g., Delphi); system OEM (e.g., Ford); end user
What's Hard About Neural Networks: Training for Useful Behavior
Today: training for recognition of objects
• Matching to labeled images
• Simple data set-up
• Image enhancement
• Find regions of interest
Next: training for judgment and strategy
• More complex extraction and labeling of key patterns
• Neural networks INSIDE larger algorithms
• Push to get enough good data
Example: Google AlphaGo beats best human Go players
• Based on 2 neural networks
  • Policy network: find highest-probability moves
  • Value network: assign value to board positions
• Complex training
  • Patterns from expert games
  • Reinforcement training from machine-vs-machine games
Today's Neural Networks Are Inefficient (but likely to get MUCH better!)
Example: AlexNet
• ~60M model parameters (FP32: 240MB)
• ~800M multiply-accumulates (MACs) per image
• At 1 TMAC/s: 350GB/s DDR bandwidth (FP32)
• Killed by the memory power, not the compute
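The bandwidth bullet can be checked with back-of-envelope arithmetic: if a 1 TMAC/s engine re-reads the full FP32 weight set for every image, the numbers above imply roughly 300 GB/s of weight traffic alone, the same order as the slide's 350 GB/s (a sketch; the assumption here is that the slide's figure also counts activation and overhead traffic):

```python
# Back-of-envelope check of the AlexNet bullets (illustrative arithmetic only).
params = 60e6             # ~60M model parameters
bytes_per_param = 4       # FP32
macs_per_image = 800e6    # ~800M multiply-accumulates per image
throughput = 1e12         # 1 TMAC/s engine

model_bytes = params * bytes_per_param        # 240 MB of weights
images_per_s = throughput / macs_per_image    # 1250 images/s
weight_bw = images_per_s * model_bytes        # weight bytes fetched per second

print(f"model size: {model_bytes / 1e6:.0f} MB")        # 240 MB
print(f"weight bandwidth: {weight_bw / 1e9:.0f} GB/s")  # 300 GB/s
```

This is why the slide concludes the design is killed by memory power rather than compute: each MAC is cheap, but streaming hundreds of GB/s from DDR is not.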
[Chart: German Traffic Sign Recognition Benchmark, accuracy (99.00%-100.00%) versus parameter count (0-100M). Design points: record circa 2012 at 1.5M parameters (6 MB); a performance-optimized network; and a compute-optimized network at 154K parameters (150KB), ~10x more efficient. A 2M-parameter (2 MB) point is also marked. Region annotations: "need too much memory bandwidth," "need too much compute," "not accurate enough"]
Neural Network Efficiency Trends
• Good neural networks need lots of compute, especially multiply-add
• Two key metrics: scaling to high total compute, and high multiply-add per watt
• Vision DSPs often give greater efficiency than GPUs or FPGAs
• CNN-specific architectures
• Clusters essential to scaling
[Chart: Estimated CNN Throughput and Efficiency, plotting efficiency (peak GMACs per watt, 10 to 10,000) against throughput (GMACs per second, 10 to 10,000) for FPGAs, desktop GPUs, embedded vision DSPs, and CNN-specific engines]
Data Ownership and Privacy
Data:
• As programming shifts to training, training data gets more valuable
• Must have large, relevant data sets
• Must label the data
• Often must clean up the data to fit the task
• Data scientists more in demand
• Open data sets become the new "open source"
• Data can be "mined" to serve many different problems
Privacy:
• Neural networks may identify health, habits, opinions, and finances
• Large-scale data collection picks up personally sensitive data ("by-catch")
• Difficulty in constraining usage
• Unexpected personal insights need protection
Distributed Systems and Neural Networks: What Happens Where
• Default: training in the cloud, inference (recognition) in the device
• Technical factors drive work distribution:
1. Energy and bandwidth cost of shipping raw data up to the cloud
2. Latency and reliability of the network in real-time applications
3. Usage frequency: occasional use may make the cloud cost-effective
4. Frequency of retraining: ship data up, or trained weights down
• Business factors drive work distribution:
1. Liability concerns push network execution to "deep pockets" or pull it to local control
2. Today's input data is tomorrow's training data; network execution has fringe benefits
3. Privacy concerns minimize data movement; raw streams are expensive to encrypt
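Technical factor 1 (the cost of shipping raw data up) is easy to quantify with rough arithmetic. The rates below are illustrative assumptions, not figures from the talk:

```python
# Illustrative comparison: upload raw video for cloud inference, or run
# inference on-device and send only labels. All rates are assumptions.
raw_stream_bps = 1920 * 1080 * 24 * 30   # raw 1080p30 video: pixels * bits * fps
label_bps = 100 * 30                     # ~100 bits of labels per frame

ratio = raw_stream_bps / label_bps
print(f"raw stream: {raw_stream_bps / 1e6:.0f} Mbit/s")
print(f"uploading raw data moves ~{ratio:,.0f}x more bits than sending labels")
```

Even with heavy video compression the gap stays several orders of magnitude, which is one reason the default split keeps inference in the device.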
The Road Ahead: Not Too Surprising
1. Neural networks will continue to proliferate in cloud-based applications
2. Neural networks will expand rapidly into real-time embedded functions
3. Power constraints and extreme throughput needs will drive CNN optimization in processor platforms, both embedded and server
4. Real-time neural networks evolve from object recognition to action recognition
5. Vision-based problems dominate the computations and the high-profile deployments
6. Expect a mad, sometimes unguided, scramble for expertise, data, and applications
The Road Ahead: More Surprising
1. >100x in energy and >20x in bandwidth from network AND engine architecture optimization near-term
2. In time: deployment of 1000 tera-MAC (peta-MAC) embedded and 1,000,000 tera-MAC (exa-MAC) server neural networks
3. Network optimization evolves from ad hoc exploration to automated "synthesis," a new kind of EDA
4. New value chains emerge, and swing between vertical integration and disintegration; new kinds of IP, tools, and data services
5. Data is king: access to large, diverse training sets makes new winners
6. Potential backlash over privacy and "rise of the machines"
Cadence Product Announcement Today
Vision P6 DSP: a complete imaging/vision and CNN processor
• Neural network performance: up to 4X
• Imaging and vision performance: up to 4X in well-known benchmarks
• Multiply-accumulate: 4X MAC count
• Vector floating-point support: 32-way vector FPU on FP16; easy GPU code porting
(All relative to the Tensilica Vision P5 DSP on the same process node)
Extends the Cadence product portfolio further into the fast-growing vision/deep learning application areas
Resource Slide
• Some market sizing efforts: http://techemergence.com/valuing-the-artificial-intelligence-market-2016-and-beyond/
• Cadence neural network story: http://ip.cadence.com/applications/cnn
• Cadence Embedded Neural Network Summit proceedings: http://ip.cadence.com/knowledgecenter/enns
• "Using Convolutional Neural Networks for Image Recognition": http://ip.cadence.com/uploads/901/cnn_wp-pdf
• The latest Cadence vision DSPs: http://ip.cadence.com/ipportfolio/tensilica-ip/image-vision-processing
Please come to our demo table at the Summit and talk with our neural network, imaging, and vision experts. Cadence MIPI CSI/DSI IP demo at the MIPI Alliance table.
Thank You
© 2016 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, Denali, and Tensilica are registered trademarks of Cadence Design
Systems, Inc. All others are the property of their respective holders.