deep learning scaling is predictable (empirically)

Silicon Valley AI Lab Deep Learning scaling is predictable (empirically) Greg Diamos December 9, 2017


TRANSCRIPT

Page 1: Deep Learning scaling is predictable (empirically)

Silicon Valley AI Lab

Deep Learning scaling is predictable (empirically)

Greg Diamos

December 9, 2017

Page 2: Deep Learning scaling is predictable (empirically)

AI

• AI is like electricity

Page 3: Deep Learning scaling is predictable (empirically)

Deep Learning scales

[Figure: Accuracy vs. Data + Model Size; curves: Deep Learning, Traditional methods]

Page 4: Deep Learning scaling is predictable (empirically)

Why?

• Why do deep neural networks scale so well?

• How much data do we need?

• How fast do computers need to be?

Page 5: Deep Learning scaling is predictable (empirically)

This talk: looking deeper

Page 6: Deep Learning scaling is predictable (empirically)

SVAIL’s ASIMOV supercomputer

• We used an 11 PFLOP/s GPU supercomputer to study deep learning scaling

• 1500 GPUs

• 2 months training time

• **This experiment would cost over $2 million USD if performed on AWS**

Page 7: Deep Learning scaling is predictable (empirically)

Application domains

Speech Recognition

Speech Synthesis

Natural Language Understanding

Computer Vision

Page 8: Deep Learning scaling is predictable (empirically)

State of the art neural nets

[Diagram: block diagrams of the networks studied (nodes labeled weights, relu, H, C, T, + and *), captioned CONV + RNN, SPECTRA NET, RECURRENT HIGHWAY NET, RNN + ATTENTION, RESNET]

Page 9: Deep Learning scaling is predictable (empirically)

Methodology

More Data, Bigger Model

Page 10: Deep Learning scaling is predictable (empirically)

Generalization error scaling

Page 11: Deep Learning scaling is predictable (empirically)

Generalization error scaling

[Plots: Neural Language Model; Deep Speech]

Page 12: Deep Learning scaling is predictable (empirically)

Model size scaling data

Page 13: Deep Learning scaling is predictable (empirically)

Model size scaling data

Resnet50 Object Detection Neural Language Model

Page 14: Deep Learning scaling is predictable (empirically)

What do you think?

Page 15: Deep Learning scaling is predictable (empirically)

We find: generalization error scaling consistently follows a power-law

[Figure: log(Error) vs. log(Data); labeled levels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.)]
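
The power-law claim on this slide can be illustrated with a short sketch. It assumes the form error(m) ≈ a · m^b with b < 0 over the power-law region, which is a straight line on log-log axes. The function name and the sample points are hypothetical: the points are synthetic values generated from a made-up curve, not results from the talk, and the sketch does not model the irreducible error floor.

```python
import math

def fit_power_law(data_sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(m); returns (a, b)."""
    xs = [math.log(m) for m in data_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope on log-log axes
    a = math.exp(mean_y - b * mean_x)   # intercept, mapped back from log space
    return a, b

# Synthetic points from a made-up curve: error = 5.0 * m ** -0.3.
sizes = [1e3, 1e4, 1e5, 1e6]
errs = [5.0 * m ** -0.3 for m in sizes]
a, b = fit_power_law(sizes, errs)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real measurements the fit would only be meaningful for points inside the power-law region, away from both the best-guess plateau and the irreducible error.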

Page 16: Deep Learning scaling is predictable (empirically)

We find: model size scales sublinearly

[Figure: Best Model Size vs. Data; points: SOTA Models]
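
A sublinear relationship can be sketched as best model size ∝ data^p with 0 < p < 1, so the model grows more slowly than the dataset. The function name and the exponent 0.7 below are hypothetical illustrations, not values from the talk.

```python
def model_growth_factor(data_growth, exponent):
    """Factor by which the best model size grows when the dataset grows by
    `data_growth`, assuming best_model_size is proportional to data ** exponent."""
    if not 0.0 < exponent < 1.0:
        raise ValueError("sublinear scaling requires 0 < exponent < 1")
    return data_growth ** exponent

# Made-up exponent: with p = 0.7, 10x more data calls for about 5x more parameters.
factor = model_growth_factor(10.0, 0.7)
print(f"10x data -> {factor:.2f}x model size")
```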

Page 17: Deep Learning scaling is predictable (empirically)

Acknowledgements

Page 18: Deep Learning scaling is predictable (empirically)

The Deep Learning Recipe

Page 19: Deep Learning scaling is predictable (empirically)

Data-limited problems

[Figure: log(Error) vs. log(Data); labeled levels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.); annotation: Not Enough Data!]

Page 20: Deep Learning scaling is predictable (empirically)

Compute-limited problems

[Figure: log(Error) vs. log(Data); labeled levels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.); annotation: It Takes Forever!]

Page 21: Deep Learning scaling is predictable (empirically)

Solved problems

[Figure: log(Error) vs. log(Data); labeled levels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]

Page 22: Deep Learning scaling is predictable (empirically)

Impossible problems

[Figure: log(Error) vs. log(Data); labeled levels: Best Guess, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]

Page 23: Deep Learning scaling is predictable (empirically)

Impossible problem
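
One way to read the four problem types above is as a decision procedure over a fitted learning curve. The sketch below is an interpretation, not the authors' method. It assumes error(m) = a · m^b + floor with b < 0, and the function name, parameter names, and budget figures are all hypothetical.

```python
def classify_problem(a, b, floor, acceptable, data_available, data_trainable):
    """Classify a problem given a learning curve error(m) = a * m**b + floor.

    data_available: samples that could be collected (hypothetical budget).
    data_trainable: samples that could be trained on in tolerable time.
    """
    if acceptable <= floor:
        return "impossible"        # target lies at or below the irreducible error
    # Invert the curve: samples needed to reach the acceptable error.
    needed = ((acceptable - floor) / a) ** (1.0 / b)
    if needed > data_available:
        return "data-limited"      # not enough data
    if needed > data_trainable:
        return "compute-limited"   # enough data, but training takes too long
    return "solved"

# Made-up curve and budgets, for illustration only.
print(classify_problem(5.0, -0.3, 0.05, 0.10, 1e6, 1e9))
print(classify_problem(5.0, -0.3, 0.05, 0.10, 1e8, 1e6))
print(classify_problem(5.0, -0.3, 0.05, 0.10, 1e8, 1e8))
print(classify_problem(5.0, -0.3, 0.05, 0.04, 1e8, 1e8))
```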

Page 24: Deep Learning scaling is predictable (empirically)

Implications

Page 25: Deep Learning scaling is predictable (empirically)

#1: Data is extremely valuable

• If all you need is scale, then we should invest in data

• How can we reduce the cost to collect and label data?

Page 26: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law

[Figure: log(Error) vs. log(Data) and log(Computer Speed); labeled levels: Random Guessing, Irreducible Error (model bias, Bayes Error, etc.), Acceptable Error]

Page 27: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law

Supporting Evidence

[Figure: log(Computer Speed) vs. Time. Source: http://cpudb.stanford.edu/]

Page 28: Deep Learning scaling is predictable (empirically)

#2: Achievable error follows Moore’s Law

[Figure: log(Achievable Error) vs. Time; labeled levels: Random Guessing, Irreducible Error (model bias, Bayes Error, etc.)]
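
The argument on the "#2" slides composes two trends. If the amount of data that can be trained on grows in proportion to computer speed, and speed doubles at a fixed interval, then a power-law learning curve becomes a straight line of log(reducible error) against time, until the irreducible error dominates. This is a minimal sketch under those assumptions; the function name, the two-year doubling period, and the curve parameters are hypothetical.

```python
def achievable_error(years, a=5.0, b=-0.3, floor=0.05,
                     data_now=1e6, doubling_years=2.0):
    """Projected error after `years`, assuming trainable data doubles every
    `doubling_years` and error(m) = a * m**b + floor (all values made up)."""
    data = data_now * 2.0 ** (years / doubling_years)
    return a * data ** b + floor

for year in (0, 4, 8):
    print(year, round(achievable_error(year), 4))
```

Under these assumptions the reducible part of the error shrinks by the same factor over every equal time interval, which is the straight-line behavior sketched on the slide.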

Page 29: Deep Learning scaling is predictable (empirically)

#3: Requirements are predictable

• We can now predict

• How much data we need

• How fast computers need to be
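
Both predictions can be sketched by inverting a fitted learning curve. The example assumes error(m) = a · m^b + floor for the data requirement. For the compute requirement it assumes training cost proportional to data × model size, with model size growing as data^p for p < 1. These forms, the function names, and every number below are assumptions for illustration, not figures from the talk.

```python
def required_data(target_error, a, b, floor):
    """Samples needed to reach target_error on error(m) = a * m**b + floor."""
    if target_error <= floor:
        raise ValueError("target is at or below the irreducible error")
    return ((target_error - floor) / a) ** (1.0 / b)

def required_speedup(data_needed, data_now, model_exponent):
    """Rough compute multiple, assuming cost ~ data * model size and
    model size ~ data ** model_exponent (sublinear, so exponent < 1)."""
    growth = data_needed / data_now
    return growth ** (1.0 + model_exponent)

# Made-up curve: how much data and compute to go from 1e6 samples to 10% error?
n = required_data(0.10, a=5.0, b=-0.3, floor=0.05)
speedup = required_speedup(n, data_now=1e6, model_exponent=0.7)
print(f"data needed: {n:.3g} samples, compute: {speedup:.1f}x")
```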

Page 30: Deep Learning scaling is predictable (empirically)

#4: Model architecture search

• Search may be feasible in the small data regime
  • if architecture affects the intercept, not the slope

• Caveats:
  • variance
  • models with different irreducible error
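
The intercept-versus-slope condition can be made concrete. If every candidate architecture shares one power-law slope and differs only in intercept, the ranking measured on a small dataset carries over to a large one. The sketch below assumes exactly that and ignores both caveats (variance and differing irreducible error). The architecture names, error values, and slope are hypothetical.

```python
def rank_by_small_data(small_data_errors):
    """Order architecture names by error measured on a small dataset."""
    return sorted(small_data_errors, key=small_data_errors.get)

def extrapolate(error_small, data_small, data_large, shared_slope):
    """Project an error to a larger dataset, assuming all architectures share
    the same power-law slope and differ only in intercept."""
    return error_small * (data_large / data_small) ** shared_slope

# Made-up errors measured at 1e4 samples for three hypothetical architectures.
small = {"arch_a": 0.40, "arch_b": 0.32, "arch_c": 0.36}
large = {name: extrapolate(err, 1e4, 1e7, shared_slope=-0.3)
         for name, err in small.items()}
print(rank_by_small_data(small))
print(rank_by_small_data(large))
```

The two rankings agree because scaling every error by the same factor preserves order. The same reasoning fails if slopes or irreducible errors differ between architectures.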

Page 31: Deep Learning scaling is predictable (empirically)

We need you!

Page 32: Deep Learning scaling is predictable (empirically)

Reproduce our work

[Diagram: block diagrams of the networks studied (nodes labeled weights, relu, H, C, T, + and *), captioned SPECTRA NET, RECURRENT HIGHWAY NET, RNN + ATTENTION, RESNET, and "?"]

Page 33: Deep Learning scaling is predictable (empirically)

Build AI Data Centers

• AI Node: 1x (2017)

• AI Data Center: 10,000x-100,000x (2025)

• Improved AI Chips: 10x-100x (2025)

Page 34: Deep Learning scaling is predictable (empirically)

Join Us!

• http://bit.ly/join-svail

Page 35: Deep Learning scaling is predictable (empirically)

Silicon Valley AI Lab

Deep Learning scaling is predictable (empirically)

http://research.baidu.com/deep-learning-scaling-predictable-empirically/

https://arxiv.org/abs/1712.00409

Greg Diamos

December 9, 2017