
TRANSCRIPT

Page 1: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

USING SIGOPT TO TUNE DEEP LEARNING MODELS WITH NERVANA CLOUD

Scott Clark, Co-founder and CEO of SigOpt

[email protected] @DrScottClark

Page 2: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

TRIAL AND ERROR WASTES EXPERT TIME

● Deep Learning is extremely powerful

● Tuning Deep Learning systems is extremely non-intuitive

Page 3: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

UNRESOLVED PROBLEM IN ML

https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3

What is the most important unresolved problem in machine learning?

“...we still don't really know why some configurations of deep neural networks work in some case and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.”

Xavier Amatriain, VP Engineering at Quora (former Director of Research at Netflix)

Page 4: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

TUNING DEEP LEARNING MODELS

[Diagram] Big Data and Expertise feed into a Deep Learning System with tunable parameters.

Page 5: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

TUNING DEEP LEARNING MODELS

[Diagram] Big Data and Expertise feed into a Deep Learning System with tunable parameters; the system reports an objective metric, and SigOpt optimally suggests new parameters.

Page 6: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

TUNING DEEP LEARNING MODELS

[Diagram] The same loop: the objective metric is reported back, SigOpt optimally suggests new parameters, and the loop produces better results.

Page 7: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

COMMON APPROACH

Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012

1. Random search or grid search
2. Expert-defined grid search near “good” points
3. Refine the domain and repeat the steps (“grad student descent”)

Page 8: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

COMMON APPROACH

● Expert intensive
● Computationally intensive
● Finds potentially local optima
● Does not fully exploit useful information

Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012

1. Random search or grid search (a minimal sketch follows below)
2. Expert-defined grid search near “good” points
3. Refine the domain and repeat the steps (“grad student descent”)
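For concreteness, step 1 in code: a minimal random-search sketch in Python. The parameter ranges and the train_and_evaluate stand-in are illustrative placeholders, not the settings used in the talk.

import random

def train_and_evaluate(learning_rate, momentum_coef, epochs):
    # Hypothetical stand-in: replace with code that trains the network
    # with these hyperparameters and returns validation accuracy.
    return 0.9 - (learning_rate - 0.05) ** 2 - 0.01 * abs(momentum_coef - 0.9)

def random_config():
    # Illustrative ranges only.
    return {
        "learning_rate": 10 ** random.uniform(-4, 0),   # log-uniform in (1e-4, 1]
        "momentum_coef": random.uniform(0.0, 1.0),
        "epochs": random.randint(1, 100),
    }

best_score, best_config = float("-inf"), None
for _ in range(50):                          # fixed evaluation budget
    config = random_config()
    score = train_and_evaluate(**config)     # one full training run per sample
    if score > best_score:
        best_score, best_config = score, config

Each configuration is sampled independently, so nothing learned from earlier runs guides the later ones; that is the inefficiency the rest of the talk addresses.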

Page 9: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

OPTIMAL LEARNING

“… the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.”
Prof. Warren Powell, Princeton

“What is the most efficient way to collect information?”
Prof. Peter Frazier, Cornell

“How do we make the most money, as fast as possible?”
Me, @DrScottClark

Page 10: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

BAYESIAN GLOBAL OPTIMIZATION

● Optimize some Overall Evaluation Criterion (OEC)
○ Loss, accuracy, likelihood, revenue
● Given tunable parameters
○ Hyperparameters, feature parameters
● In an efficient way
○ Sample the function as few times as possible
○ Training on big data is expensive

Details at https://sigopt.com/research
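In symbols, this is the standard expensive-black-box formulation (a generic statement, not SigOpt-specific):

x^{*} = \operatorname*{arg\,max}_{x \in \mathcal{X}} f(x)

where x is a configuration of the tunable parameters, \mathcal{X} is the search domain, and f(x) is the OEC obtained by training and evaluating the model at x. Each evaluation of f is a full training run, so the goal is to get close to x^{*} with as few evaluations as possible.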

Page 11: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

EXAMPLE: TUNING DNN CLASSIFIERS

CIFAR10 Dataset
● Photos of objects
● 10 classes
● Metric: accuracy
○ in [0.1, 1.0]

Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.

Page 12: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

USE CASE: ALL CONVOLUTIONAL

● All-convolutional neural network
● Multiple convolutional and dropout layers
● Hyperparameter optimization: a mixture of domain expertise and grid search (brute force)

http://arxiv.org/pdf/1412.6806.pdf

Page 13: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

EXAMPLE: NCLOUD/NEON

● epochs: “number of epochs to run fit” - int [1, ∞)
● learning rate: influence of each step on the current value of the weights - double (0, 1]
● momentum coefficient: “the coefficient of momentum” - double (0, 1]
● weight decay: parameter affecting how quickly the weights decay - double (0, 1]
● depth: parameter affecting the number of layers in the net - int [1, 20(?)]
● gaussian scale: standard deviation of the initialization normal distribution - double (0, ∞)
● momentum step change: multiplicative amount by which to decrease momentum - double (0, 1]
● momentum step schedule start: epoch at which to start decreasing momentum - int [1, ∞)
● momentum schedule width: epoch stride for decreasing momentum - int [1, ∞)

Many tunable parameters... and the optimal values are non-intuitive (an illustrative declaration follows below).
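As a rough illustration, these parameters could be declared for an optimization experiment as plain Python dicts. The finite upper bounds below stand in for the open-ended ranges and are placeholders, not values from the talk.

# Hypothetical parameter declarations mirroring the list above; the finite
# upper bounds are illustrative stand-ins for the open-ended ranges.
parameters = [
    dict(name="epochs", type="int", bounds=dict(min=1, max=200)),
    dict(name="learning_rate", type="double", bounds=dict(min=1e-6, max=1.0)),
    dict(name="momentum_coef", type="double", bounds=dict(min=1e-6, max=1.0)),
    dict(name="weight_decay", type="double", bounds=dict(min=1e-6, max=1.0)),
    dict(name="depth", type="int", bounds=dict(min=1, max=20)),
    dict(name="gaussian_scale", type="double", bounds=dict(min=1e-6, max=10.0)),
    dict(name="momentum_step_change", type="double", bounds=dict(min=1e-6, max=1.0)),
    dict(name="momentum_step_schedule_start", type="int", bounds=dict(min=1, max=200)),
    dict(name="momentum_schedule_width", type="int", bounds=dict(min=1, max=50)),
]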

Page 14: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

COMPARATIVE PERFORMANCE

● Expert baseline: 0.8995 (using neon)
● SigOpt best: 0.9011
○ 1.6% reduction in error rate
○ No expert time wasted in tuning
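For reference, the relative reduction is computed on error = 1 - accuracy:

\frac{(1 - 0.8995) - (1 - 0.9011)}{1 - 0.8995} = \frac{0.1005 - 0.0989}{0.1005} \approx 1.6\%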

Page 15: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

USE CASE: DEEP RESIDUAL

http://arxiv.org/pdf/1512.03385v1.pdf

● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
● Variable depth
● Hyperparameter optimization: a mixture of domain expertise and grid search (brute force)

Page 16: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

COMPARATIVE PERFORMANCE

● Expert baseline (standard method, from paper): 0.9339
● SigOpt best: 0.9343
○ Found after 17 trials
○ 0.61% reduction in error rate
○ No expert time wasted in tuning
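The same error-rate computation for this case:

\frac{(1 - 0.9339) - (1 - 0.9343)}{1 - 0.9339} = \frac{0.0661 - 0.0657}{0.0661} \approx 0.61\%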

Page 18: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

TRY OUT SIGOPT FOR FREE

https://sigopt.com/get_started

● Quick example and intro to SigOpt
● No signup required
● Visual and code examples

https://sigopt.com/text-classifier

● Jupyter Notebook
● Use SigOpt to tune feature and model parameters
● Detailed walkthrough with code

Page 19: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

MORE EXAMPLES

https://github.com/sigopt/sigopt-examples
Examples of using SigOpt in a variety of languages and contexts.

Tuning Machine Learning Models (with code)
A comparison of different hyperparameter optimization methods.

Using Model Tuning to Beat Vegas (with code)
Using SigOpt to tune a model for predicting basketball scores.

Learn more about the technology behind SigOpt at https://sigopt.com/research

Page 20: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

HOW DOES IT WORK?

1. User reports data

2. SigOpt builds statistical model (Gaussian Process)

3. SigOpt finds the points of highest Expected Improvement

4. SigOpt suggests best parameters to test next

5. User tests those parameters and reports results to SigOpt

6. Repeat
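A sketch of this loop in Python, assuming the SigOpt client's Connection-based suggest/observe interface; the experiment name, parameter bounds, evaluation budget, and train_and_evaluate function are illustrative placeholders.

from sigopt import Connection

def train_and_evaluate(assignments):
    # Hypothetical stand-in: train the network with the suggested parameters
    # and return the metric being optimized (e.g. validation accuracy).
    raise NotImplementedError("plug in your training code here")

conn = Connection(client_token="YOUR_SIGOPT_API_TOKEN")

experiment = conn.experiments().create(
    name="CIFAR10 all-convolutional net (illustrative)",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1.0)),
        dict(name="momentum_coef", type="double", bounds=dict(min=1e-4, max=1.0)),
        dict(name="epochs", type="int", bounds=dict(min=1, max=100)),
    ],
)

for _ in range(30):  # evaluation budget
    # Steps 2-4: SigOpt models the observations and suggests the next parameters.
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # Step 5: the user trains with those parameters and reports the result.
    value = train_and_evaluate(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )

Each observation feeds back into the statistical model, so unlike grid or random search the suggestions concentrate where improvement is most likely.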


Page 26: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

GPs: FUNCTIONAL VIEW

Page 27: Using SigOpt to Tune Deep Learning Models with Nervana Cloud

GPs: FITTING THE GP

[Figure panels: overfit, good fit, underfit]
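A minimal sketch of the fitting idea (not SigOpt's internal implementation): fit a Gaussian process to a few noisy observations with three fixed RBF length scales, which roughly correspond to the overfit, good fit, and underfit panels above.

# Fit a GP to noisy observations with three fixed RBF length scales to see
# the overfit / good fit / underfit regimes. Data here is synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(12, 1))               # observed parameter values
y = np.sin(X).ravel() + rng.normal(0, 0.1, 12)     # noisy objective measurements
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)

for length_scale, label in [(0.1, "overfit"), (1.0, "good fit"), (10.0, "underfit")]:
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=length_scale),
                                  alpha=0.01, optimizer=None)  # fixed kernel, no refit
    gp.fit(X, y)
    mean, std = gp.predict(X_grid, return_std=True)
    print(f"{label:9s} length_scale={length_scale:4.1f} "
          f"mean posterior std={std.mean():.3f}")

A very short length scale chases the noise (overfit), a very long one washes out the structure (underfit); the posterior mean and uncertainty from a well-fit GP are what drive the Expected Improvement step in the loop above.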