Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017

TRANSCRIPT

Common Problems in Hyperparameter Optimization

Alexandra Johnson @alexandraj777

What are Hyperparameters?

Hyperparameter Optimization

● Also known as hyperparameter tuning, model tuning, or model selection
● Finding "the best" values for the hyperparameters of your model (sketch below)

Better Performance

● +315% accuracy boost for TensorFlow
● +49% accuracy boost for xgboost
● -41% error reduction for recommender system

#1 Trusting the Defaults

● Default values are an implicit choice
● Defaults are not always appropriate for your model
● You may build a classifier that looks like this:

Default Values
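For example (a scikit-learn sketch, not the Keras example cited in the references), leaving the constructor arguments empty still fixes every hyperparameter, just implicitly:

    from sklearn.linear_model import SGDClassifier

    # Calling the constructor with no arguments is still a choice:
    # every hyperparameter silently takes the library's default value.
    clf = SGDClassifier()
    print(clf.get_params())
    # e.g. {'alpha': 0.0001, 'loss': 'hinge', 'penalty': 'l2', ...}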

#2 Using the Wrong Metric

Choosing a Metric

● Balance long-term and short-term goals
● Question underlying assumptions
● Example from Microsoft

Choose Multiple Metrics

● Composite metric (sketch below)
● Multi-metric
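As an illustrative sketch of the composite-metric idea (the metric names and weight below are invented, not from the talk), several goals can be folded into one number that an optimizer can maximize:

    def composite_metric(accuracy, latency_ms, latency_weight=0.01):
        # Reward accuracy, penalize slow inference; the weight encodes how much
        # latency you are willing to trade for accuracy (illustrative value only).
        return accuracy - latency_weight * latency_ms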

#3 Overfitting

Metric Generalization

● Cross validation (sketch below)
● Backtesting
● Regularization terms

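A minimal cross-validation sketch (assuming scikit-learn, not from the talk): report the cross-validated score to the tuner rather than the training score, so better hyperparameters mean better generalization rather than better memorization.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    model = RandomForestClassifier(max_depth=8, n_estimators=200, random_state=0)

    # Optimize the cross-validated score, not the training score,
    # so the tuner cannot "win" by overfitting a single split.
    cv_accuracy = cross_val_score(model, X, y, cv=5).mean()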

#4 Too Few Hyperparameters

Optimize all Parameters at Once

Include Feature Parameters


Example: xgboost

● Optimized model always performed better with tuned feature parameters
● This held no matter which optimization method was used
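A generic sketch of a joint search space (assumed names, not the exact setup from the referenced SigOpt post): feature parameters such as the number of PCA components sit in the same pipeline and search space as the xgboost hyperparameters, so one search tunes both.

    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline
    from xgboost import XGBClassifier

    # Feature parameters (PCA components) and model hyperparameters (xgboost)
    # live in one pipeline, so a single search tunes them together.
    pipeline = Pipeline([("pca", PCA()), ("xgb", XGBClassifier())])

    search_space = {
        "pca__n_components": [10, 20, 40],        # feature parameter
        "xgb__max_depth": [3, 6, 9],              # model hyperparameters
        "xgb__learning_rate": [0.01, 0.1, 0.3],
    }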

#5 Hand Tuning

What is an Optimization Method?
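One way to picture an optimization method (a generic sketch, not from the talk): a loop that repeatedly suggests hyperparameters, evaluates them, and keeps the best result. Grid search, random search, and Bayesian optimization differ mainly in how the suggest step uses the history.

    def optimize(suggest, evaluate, n_iterations):
        # Generic shape of a hyperparameter optimization method:
        # suggest a configuration, evaluate it, remember the best.
        best_config, best_value = None, float("-inf")
        history = []
        for _ in range(n_iterations):
            config = suggest(history)     # grid, random, Bayesian, ...
            value = evaluate(config)      # train the model, measure your metric
            history.append((config, value))
            if value > best_value:
                best_config, best_value = config, value
        return best_config, best_value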

You are not an Optimization Method

● Hand tuning is time consuming and expensive
● Algorithms can quickly and cheaply beat expert tuning

Grid Search / Random Search / Bayesian Optimization

Use an Algorithm

#6 Grid Search

No Grid Search

Hyperparameters    Model Evaluations
2                  100
3                  1,000
4                  10,000
5                  100,000
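The table appears to assume roughly 10 candidate values per hyperparameter, so the grid grows as 10^n; a tiny sketch reproducing that arithmetic:

    from itertools import product

    # With 10 candidate values per hyperparameter, the grid has 10**n points.
    values = range(10)
    for n_hyperparameters in range(2, 6):
        grid = product(*[values] * n_hyperparameters)
        print(n_hyperparameters, sum(1 for _ in grid))
    # Prints: 2 100, 3 1000, 4 10000, 5 100000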

#7 Random Search

Random Search

● Theoretically more effective than grid search
● Large variance in results
● No intelligence
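A short random-search sketch using scikit-learn's RandomizedSearchCV (the dataset and ranges are placeholders): each run draws n_iter independent samples from the distributions, which is also why results can vary noticeably between runs.

    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=500, random_state=0)

    # Random search samples the space instead of enumerating a grid.
    search = RandomizedSearchCV(
        GradientBoostingClassifier(),
        param_distributions={
            "learning_rate": uniform(0.01, 0.29),   # samples from [0.01, 0.30]
            "max_depth": randint(2, 10),
        },
        n_iter=30,
        cv=3,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)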

Use an Intelligent Method

● Genetic algorithms
● Bayesian optimization
● Particle-based methods
● Convex optimizers
● Simulated annealing

To name a few...

SigOpt: Bayesian Optimization Service

Three API calls:

1. Define hyperparameters

2. Receive suggested hyperparameters

3. Report observed performance
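A sketch of the loop those three calls describe, following the pattern of SigOpt's public Python client at the time (the experiment definition and the evaluate_model function below are placeholders, not code from the talk):

    from sigopt import Connection

    conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

    # 1. Define hyperparameters
    experiment = conn.experiments().create(
        name="xgboost tuning",
        parameters=[
            dict(name="max_depth", type="int", bounds=dict(min=2, max=10)),
            dict(name="learning_rate", type="double", bounds=dict(min=0.01, max=0.3)),
        ],
    )

    for _ in range(30):
        # 2. Receive suggested hyperparameters
        suggestion = conn.experiments(experiment.id).suggestions().create()

        # evaluate_model stands in for your own training + validation code
        value = evaluate_model(suggestion.assignments)

        # 3. Report observed performance
        conn.experiments(experiment.id).observations().create(
            suggestion=suggestion.id,
            value=value,
        )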

Thank You!

References - by Section

Intro
● Ian Dewancker. SigOpt for ML: TensorFlow ConvNets on a Budget with Bayesian Optimization.
● Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
● Ian Dewancker. SigOpt for ML: Bayesian Optimization for Collaborative Filtering with MLlib.

#1 Trusting the Defaults
● Keras recurrent layers documentation

#2 Using the Wrong Metric
● Ron Kohavi et al. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained.
● Xavier Amatriain. 10 Lessons Learned from Building ML Systems [Video at 19:03].
● Image from PhD Comics.
● See also: SigOpt in Depth: Intro to Multicriteria Optimization.

#4 Too Few Hyperparameters
● Image from TensorFlow Playground.
● Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.

#5 Hand Tuning
● On algorithms beating experts: Scott Clark, Ian Dewancker, and Sathish Nagappan. Deep Neural Network Optimization with SigOpt and Nervana Cloud.

#6 Grid Search
● NoGridSearch.com

#7 Random Search
● James Bergstra and Yoshua Bengio. Random Search for Hyper-Parameter Optimization.
● Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke. A Stratified Analysis of Bayesian Optimization Methods.

Learn More
● blog.sigopt.com
● sigopt.com/research