
Page 1:

DEPARTMENT OF HEALTH AND HUMAN SERVICES • National Institutes of Health • National Cancer Institute
Frederick National Laboratory is a federally funded research and development center operated by Leidos Biomedical Research, Inc., for the National Cancer Institute.

NIH.AI Workshop on Hyperparameters Optimization

Intro to hyperparameter sweep techniques
George Zaki, Ph.D. [C]
BIDS, Frederick National Lab for Cancer Research (FNLCR)

July 18, 2019

Page 2:

Model’s Parameters

• Are fit during training
• Result of model fitting or training
• Also optimized during training

• Examples:
  – Slope and intercept in linear modeling
  – Weights and biases in neural networks
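As a quick illustration of this distinction, here is a minimal sketch (assuming scikit-learn and NumPy, which are not part of the slides) that fits a linear model; the slope and intercept are produced by the fit itself:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data: y = 2x + 1 plus a little noise
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

    model = LinearRegression()
    model.fit(X, y)            # parameters are estimated here, during training

    print(model.coef_)         # learned slope (approx. 2.0)
    print(model.intercept_)    # learned intercept (approx. 1.0)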

Page 3:

What are Hyperparameters?

• Parameters of your system with no straightforward method for setting their values:
  – Usually set before the learning process
  – Not directly estimated from the data

deepai.org

Page 4:

Examples of Hyperparameters

• The depth of a decision tree
• Number of trees in a forest
• Number of hidden layers and neurons in a neural network
• Degree of regularization to prevent overfitting
• K in K-means
• Learning rate schedule in Stochastic Gradient Descent (SGD)
• Activation functions
• ...
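To make the "set before training" aspect concrete, here is a minimal sketch (assuming scikit-learn, which is not part of the slides): the hyperparameters are chosen when the estimator is constructed, while the model's parameters only appear after fit():

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Hyperparameters: chosen up front, not estimated from the data
    clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)

    # Parameters (the individual trees and their splits) are fit here
    clf.fit(X, y)
    print(clf.score(X, y))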


Page 5:

Generalized Machine Learning Workflow

https://github.com/ECP-CANDLE/Tutorials/tree/master/2019/ECP

Page 6:

What is Hyperparameter Optimization

• Models have a large number of possible configuration parameters, called hyperparameters

• Applying optimization can automate part of the design of machine learning models

• Involves two problems:
  – How to set the values of the hyperparameters?
  – How to manage multiple evaluations on compute resources?

Hyperparameter Optimization (tuning) = HPO

Page 7:

Generalized HPO Diagram

https://sigopt.com/blog/common-problems-in-hyperparameter-optimization/

Page 8:

Basic HPO Strategies

• Grid search:
  – Generate all possible combinatorial configurations
  – Example: 6 hyperparameters, each with 4 values: 4^6 = 4096 configurations (see the sketch after this list)

• Random search:
  – Randomly select some configurations to evaluate

• Sequential grid search:
  – Adjust one hyperparameter at a time, while fixing all other hyperparameters

• Generic optimization:
  – Evolutionary algorithms (simulated annealing, particle swarm, genetic algorithms)
  – Bayesian optimization
  – Gradient-based optimization
  – Model-based optimization (mlrMBO in R)
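As a minimal sketch of the two baseline strategies (plain Python; the objective is a placeholder, since in practice it would train and validate a model):

    import itertools
    import random

    # Hypothetical search space: 6 hyperparameters, 4 values each -> 4^6 = 4096 configurations
    space = {
        "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
        "batch_size":    [16, 32, 64, 128],
        "num_layers":    [2, 3, 4, 5],
        "num_units":     [32, 64, 128, 256],
        "dropout":       [0.0, 0.2, 0.4, 0.6],
        "optimizer":     ["sgd", "adam", "rmsprop", "adagrad"],
    }

    def objective(config):
        # Placeholder: train a model with `config` and return a validation score
        return random.random()

    # Grid search: evaluate every combination (4096 runs here)
    keys = list(space)
    grid = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    print(len(grid))  # 4096

    # Random search: evaluate only a randomly chosen subset of the grid
    budget = 50
    sampled = random.sample(grid, budget)
    best = max(sampled, key=objective)
    print(best)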

Page 9:

U-Net Hyperparameters Example: 288 possible configurations

• Only 2 levels of max-pooling? N_layers = {2, 3, 4, 5}
• How many convolution filters? Num_filters = {16, 32, 64}
• What is the activation function? Activation = {relu, softmax, tanh}
• Size of conv filter? Filter_size = {3x3, 5x5}
• Drop out some results to avoid overfitting? Drop_out = {0, 0.2, 0.4, 0.6, 0.8}
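A minimal sketch (plain Python; names are illustrative) that enumerates the Cartesian product of the value sets above. Note that the unconstrained product is 4 x 3 x 3 x 2 x 5 = 360 combinations, so the 288 figure on the slide presumably reflects additional architectural constraints not captured here:

    import itertools

    # Hyperparameter grids taken from the slide (names are illustrative)
    unet_space = {
        "n_layers":    [2, 3, 4, 5],
        "num_filters": [16, 32, 64],
        "activation":  ["relu", "softmax", "tanh"],
        "filter_size": [(3, 3), (5, 5)],
        "drop_out":    [0.0, 0.2, 0.4, 0.6, 0.8],
    }

    keys = list(unet_space)
    configs = [dict(zip(keys, values)) for values in itertools.product(*unet_space.values())]
    print(len(configs))  # 360 before applying any constraints
    print(configs[0])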

Page 10:

Effect of Hyperparameter Sweep on the Objective Function

[Plot: DICE value (0 to 1) for each Configuration ID (1 to roughly 288), showing how the objective varies across the sweep]

DICE coefficient (overlap between the Ground Truth and Predicted segmentations):
DICE = 2 |Ground Truth ∩ Predicted| / (|Ground Truth| + |Predicted|)
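For reference, a minimal sketch (assuming NumPy and boolean segmentation masks) of how the DICE value for one configuration's prediction could be computed:

    import numpy as np

    def dice_coefficient(ground_truth, predicted):
        """DICE = 2 * |GT intersect Pred| / (|GT| + |Pred|) for boolean masks."""
        ground_truth = ground_truth.astype(bool)
        predicted = predicted.astype(bool)
        intersection = np.logical_and(ground_truth, predicted).sum()
        total = ground_truth.sum() + predicted.sum()
        return 2.0 * intersection / total if total > 0 else 1.0

    # Toy example: two 4x4 masks that partially overlap
    gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True
    pred = np.zeros((4, 4), dtype=bool); pred[1:3, 2:4] = True
    print(dice_coefficient(gt, pred))  # 0.5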

Page 11:


Baseline Methods: Grid Search & Random Search

Grid search:
• Embarrassingly parallel
• Curse of dimensionality

Random search:
• Embarrassingly parallel
• Does not learn from history

Page 12:


Bayesian Optimization

1. Initially select random configurations to evaluate
2. Build a surrogate Gaussian process as an approximation of the objective function, based on the evaluations seen so far (the posterior distribution)
3. Select promising configurations to evaluate next based on an acquisition function built from the surrogate of your real objective
4. Balance exploration versus exploitation
5. Repeat steps 2-4 until you reach your compute budget
(See the sketch after this slide.)

Gaussian process approximation of the objective function, from Brochu, Cora, and de Freitas (2010)
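As an illustration only, here is a minimal Bayesian-optimization sketch using scikit-optimize's gp_minimize (one of the packages listed later); the search space and the objective are stand-ins for a real training-and-validation run:

    from skopt import gp_minimize
    from skopt.space import Real, Integer

    # Illustrative search space: learning rate and number of hidden units
    space = [
        Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
        Integer(16, 256, name="num_units"),
    ]

    def objective(params):
        learning_rate, num_units = params
        # Placeholder: train a model with these hyperparameters and
        # return the validation loss (gp_minimize minimizes).
        return (learning_rate - 1e-3) ** 2 + abs(num_units - 64) / 1000.0

    result = gp_minimize(
        objective,
        space,
        n_calls=30,           # total compute budget
        n_initial_points=10,  # step 1: random configurations to seed the surrogate
        acq_func="EI",        # acquisition function: expected improvement
        random_state=0,
    )
    print(result.x, result.fun)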

Page 13:


Random versus Bayesian

https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f

Page 14:


HPO packages

• Python:
  – Hyperopt (see the sketch after this list)
  – scikit-optimize
  – Spearmint
• R:
  – mlrMBO
• Cloud:
  – Google's Hypertune
  – Amazon's SageMaker
• NN hyperparameter-specific optimization:
  – NEAT, Optunity, …
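For example, a minimal Hyperopt sketch (the objective is a placeholder for an actual training run; the parameter names are illustrative):

    from hyperopt import fmin, tpe, hp, Trials

    # Define the search space: log-uniform, uniform, and categorical choices
    space = {
        "learning_rate": hp.loguniform("learning_rate", -10, 0),  # e^-10 .. 1
        "dropout":       hp.uniform("dropout", 0.0, 0.8),
        "activation":    hp.choice("activation", ["relu", "tanh"]),
    }

    def objective(params):
        # Placeholder: train and validate a model here, return the loss to minimize
        return (params["dropout"] - 0.2) ** 2

    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=50, trials=trials)
    print(best)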

Page 15:


HPO and High Performance Computing (HPC)

• HPO requires a significant amount of compute resources
• HPC is used to manage large-scale training runs (see the job-submission sketch below)
  – Hyperparameter searches: O(10^4) jobs
  – Cross validation (5-fold, 10-fold, etc.)
• Each job could use 10s to 100s of nodes
• At NIH, we can use the Biowulf HPC cluster to perform these evaluations
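As a rough illustration only (not an official Biowulf recipe), one common pattern is to submit one Slurm job per configuration from a small Python driver; the script name, sbatch options, and partition below are hypothetical and would need to match the cluster's actual setup:

    import itertools
    import json
    import subprocess

    # Illustrative grid; a real sweep could easily reach O(10^4) configurations
    space = {
        "learning_rate": [1e-4, 1e-3, 1e-2],
        "batch_size":    [32, 64, 128],
    }

    keys = list(space)
    for i, values in enumerate(itertools.product(*space.values())):
        config = dict(zip(keys, values))
        # Hypothetical: train.py reads its hyperparameters from a JSON string
        subprocess.run(
            ["sbatch", "--partition=gpu", "--gres=gpu:1",
             f"--job-name=hpo_{i}",
             "--wrap", f"python train.py '{json.dumps(config)}'"],
            check=True,
        )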

Page 16:

Survey

• Please use the following link to share your thoughts about the workshop:

https://bit.ly/2JPagbe

Page 17:


References

• https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf

• https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization

• https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html

• https://roamanalytics.com/2016/09/15/optimizing-the-hyperparameter-of-which-hyperparameter-optimizer-to-use/

• https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters