Hyperparameter Search in Machine Learning
Marc Claesen and Bart De Moor
ESAT-STADIUS, KU Leuven
iMinds Medical IT Department
STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics
Outline
1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art
Machine learning
Methods capable of learning patterns of interest from data.
by formulating the learning task as an optimization problem
Machine learning sits at the intersection of several fields:
statistics, computer science, optimization, (biology), ...
The field encompasses learning methods with various origins, e.g.:
biology, e.g. neural networks [1]
convex optimization, e.g. support vector machines [2]
statistics, e.g. hidden Markov models [3]
tensor decompositions, e.g. recommender systems [4]
Hyperparameter search
Most machine learning methods are (hyper)parameterized.
e.g. Occam’s razor: model complexity and overfitting
Hyperparameters can significantly impact performance
suitable hyperparameters must be determined for each task
occurs in both supervised and unsupervised learning
→ need for disciplined, automated optimization methods
Some examples:
SVM: regularization and kernel hyperparameters
ANN: regularization, network architecture, transfer functions
Formalizing hyperparameter tuning
In a general sense, tuning involves these components:
a learning algorithm A, parameterized by hyperparameters λ
training and test data X(tr), X(te)
a model M = A(X(tr) | λ)
a loss function L to assess the quality of M, typically using X(te): L(M | X(te))
In optimization terms, we aim to find λ∗ (assuming minimization):
λ∗ = arg min_λ L(A(X(tr) | λ) | X(te)) = arg min_λ F(λ | A, X(tr), X(te), L)
where F is the objective function.
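The components above can be sketched directly in code. This is a minimal, self-contained illustration in which the learning algorithm A, the loss L, and the data are toy stand-ins (a shrinkage-regularized mean estimator with squared-error loss), not any particular library:

```python
import random

def train(X_tr, lam):
    # Toy "learning algorithm" A: a regularized mean estimator.
    # lam acts as a shrinkage hyperparameter pulling the estimate toward 0.
    mean = sum(X_tr) / len(X_tr)
    return mean / (1.0 + lam)          # the "model" M = A(X_tr | lam)

def loss(model, X_te):
    # Loss L(M | X_te): mean squared error of the estimate on test data.
    return sum((x - model) ** 2 for x in X_te) / len(X_te)

def F(lam, X_tr, X_te):
    # The tuning objective F(lam | A, X_tr, X_te, L).
    return loss(train(X_tr, lam), X_te)

random.seed(0)
X_tr = [random.gauss(1.0, 0.5) for _ in range(50)]
X_te = [random.gauss(1.0, 0.5) for _ in range(50)]

# Crude search for lam* = arg min F(lam) over a grid of candidates.
lam_star = min((i * 0.01 for i in range(101)),
               key=lambda lam: F(lam, X_tr, X_te))
```

Any hyperparameter optimizer discussed later only ever sees F as a black box: it proposes a λ, receives a score, and proposes the next λ.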
Tuning in practice
Most often done using a combination of grid and manual search:
grid search suffers from the curse of dimensionality
manual tuning leads to poor reproducibility
Better solutions exist but lack adoption because:
potential performance improvements are underestimated
lack of availability and/or ease of use
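The curse of dimensionality mentioned above is easy to quantify: a grid with k candidate values per hyperparameter requires k^d objective evaluations in d dimensions. A quick stdlib illustration:

```python
from itertools import product

def grid(axes):
    # Cartesian product of per-hyperparameter value lists:
    # every combination becomes one objective function evaluation.
    return list(product(*axes))

# 10 candidate values per hyperparameter:
values = [0.1 * i for i in range(10)]

evals_2d = len(grid([values] * 2))   # 10^2 = 100 evaluations
evals_5d = len(grid([values] * 5))   # 10^5 = 100,000 evaluations
```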
Support vector machine (SVM) classifiers
min over α, ξ, b of
    (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j κ(x_i, x_j) + C Σ_{i=1..n} ξ_i,
subject to
    y_i ( Σ_{j∈SV} α_j y_j κ(x_i, x_j) + b ) ≥ 1 − ξ_i,   ξ_i ≥ 0, ∀i.
Task: optimize hyperparameters for an SVM
Tune an SVM classifier with RBF kernel κ(u, v) = exp(−γ‖u − v‖²):
min over α, b, ξ of
    (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j exp(−γ‖x_i − x_j‖²) + C Σ_{i∈SV} ξ_i,
where the double sum equals ‖w‖².
optimize regularization parameter C and kernel parameter γ
evaluate each (C, γ) pair using 2× iterated 10-fold cross-validation
via Optunity’s particle swarm optimizer [5]
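The mechanics of this tuning setup can be mimicked with a minimal particle swarm optimizer. This stand-alone sketch is not Optunity's implementation, and the objective is a smooth toy stand-in for the cross-validated error surface over (C, γ), with its minimum placed at C = 10, γ = 0.1 by construction:

```python
import random

def toy_cv_error(C, gamma):
    # Toy stand-in for the cross-validated error over (C, gamma).
    return (C - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2 * 100.0

def pso(f, bounds, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5):
    # Minimal particle swarm optimization: each particle is attracted to
    # its personal best (pbest) and the swarm's global best (gbest).
    rng = random.Random(42)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(*pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

(best_C, best_gamma), best_err = pso(toy_cv_error, [(0.0, 100.0), (0.0, 1.0)])
```

In real use, the objective would be the cross-validated score of an actual SVM rather than this synthetic surface; the swarm loop itself is unchanged.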
Response surface I (figure)
Response surface II (figure)
Expensive function evaluations
A single objective function evaluation consists of:
1 training a model via the learning method
(can be very time consuming: days up to weeks! [6, 7, 8])
2 predicting on a test set (for supervised methods)
3 computing an evaluation metric for the model or its predictions
All of the above is often done in cross-validation [9, 10]:
used to reliably estimate generalization performance
involves many repetitions → exacerbates computation time
Training/evaluation time is itself a function of the hyperparameter choice!
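To see why cross-validation exacerbates the cost: 2× iterated 10-fold cross-validation trains 20 models per hyperparameter evaluation. A sketch that builds the folds and counts the trainings; the actual train-and-score step is a placeholder, not a real learner:

```python
import random

def kfold_indices(n, k, rng):
    # Shuffle the indices, then split into k (nearly) equal folds.
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def iterated_cv(n, k=10, iterations=2, seed=0):
    rng = random.Random(seed)
    trainings = 0
    scores = []
    for _ in range(iterations):
        folds = kfold_indices(n, k, rng)
        for i, test_idx in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            assert not set(train_idx) & set(test_idx)  # folds are disjoint
            trainings += 1        # one full model training per fold
            scores.append(0.0)    # placeholder: train on train_idx, score on test_idx
    return trainings, sum(scores) / len(scores)

trainings, mean_score = iterated_cv(n=200)   # 2 iterations x 10 folds = 20 trainings
```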
Randomness
The objective function measures empirical performance based on a finite sample (data set) → induces discrete, non-smooth jumps
This gives rise to a stochastic component, inherent to:
the learning method (e.g. resampling methods [11, 12, 13])
random sampling (e.g. cross-validation, bootstrap [10, 9])
The objective function F is not a strict mathematical function
→ evaluating F(x) multiple times yields different results
The empirical optimum might not really be the best!
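The stochastic component can be made concrete: evaluating the "same" objective with different resampling seeds yields different values. A toy simulation in which the cross-validation shuffle drives the noise (the scores are synthetic, for illustration only):

```python
import random

def noisy_objective(lam, seed):
    # Simulated cross-validated score: a deterministic part depending on
    # the hyperparameter lam, plus sampling noise driven by the CV shuffle.
    rng = random.Random(seed)
    deterministic = (lam - 0.3) ** 2
    sampling_noise = rng.gauss(0.0, 0.02)
    return deterministic + sampling_noise

# Same hyperparameter, different resampling seeds -> different results:
a = noisy_objective(0.3, seed=1)
b = noisy_objective(0.3, seed=2)
# Same seed reproduces the same value:
c = noisy_objective(0.3, seed=1)
```

This is why a hyperparameter that merely "won" a single noisy comparison may not generalize best; averaging over repeated resamplings reduces, but does not remove, the effect.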
Exotic search spaces
Hyperparameter search spaces can be extremely complex:
mixed integer-continuous (e.g. regularization & kernel)
often domain constrained (e.g. positive regularization)
combinatorial (e.g. feature selection)
conditional dimensions (*)
(*) Consider the architecture of an artificial neural network:
number of hidden layers
size per hidden layer
(transfer functions per layer)
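Conditional dimensions like the network architecture above can be encoded by sampling the structural choice first, then the parameters that only exist given that structure. A stdlib sketch; the layer-count and size ranges are made-up illustration values:

```python
import random

def sample_architecture(rng):
    # First sample the structural choice...
    n_layers = rng.randint(1, 4)
    # ...then the dimensions that only exist conditional on it:
    layer_sizes = [rng.randint(8, 256) for _ in range(n_layers)]
    transfer = [rng.choice(["tanh", "relu", "sigmoid"]) for _ in range(n_layers)]
    return {"n_layers": n_layers, "sizes": layer_sizes, "transfer": transfer}

rng = random.Random(0)
arch = sample_architecture(rng)
```

A fixed-dimensional optimizer cannot represent such a space directly; the number of active dimensions itself depends on an earlier decision.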
Desiderata for hyperparameter optimizers
Optimization routines for hyperparameter search are ideally:
efficient in terms of function evaluations,
appropriate for wildly varying objective functions,
able to account for randomness,
flexible in terms of search space,
parallelizable.
The practical performance bottleneck is evaluating F → deciding on the next point to evaluate need not be fast.
Sequential model-based optimization (SMBO)
Commonly used for time-consuming objective functions F [14, 15].
SMBO is an iterative approach, in which each iteration involves:
1 model the response surface M based on previous evaluations
→ evaluating M is cheap, so use M as a surrogate for F
2 find the optimal test point x∗ based on M
→ optimize some criterion, e.g. expected improvement [16]
Approaches differ in terms of model and criterion [14, 15, 17].
But: inherently sequential!
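The two-step loop can be sketched with a deliberately simple surrogate: fit a quadratic to all evaluations so far, then pick the next point by minimizing the surrogate over a candidate grid. Real SMBO uses probabilistic models and acquisition criteria such as expected improvement; this one-dimensional sketch only shows the control flow, on a toy expensive objective with its minimum at x = 2:

```python
import random

def expensive_f(x):
    # Stand-in for the expensive objective F; minimum value 1.0 at x = 2.
    return (x - 2.0) ** 2 + 1.0

def fit_quadratic(xs, ys):
    # Least-squares fit of y ~ a + b*x + c*x^2 via the 3x3 normal equations.
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    b = T[:]
    for col in range(3):                       # Gaussian elimination, partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                        # back-substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef

def smbo(n_init=4, n_iter=10, lo=-5.0, hi=5.0):
    rng = random.Random(0)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [expensive_f(x) for x in xs]
    candidates = [lo + i * (hi - lo) / 200 for i in range(201)]
    for _ in range(n_iter):
        a, b_, c = fit_quadratic(xs, ys)              # 1) model the response surface
        x_next = min(candidates, key=lambda x: a + b_ * x + c * x * x)
        xs.append(x_next)                             # 2) optimize the cheap surrogate
        ys.append(expensive_f(x_next))                # one expensive evaluation per step
    y_best, x_best = min(zip(ys, xs))
    return x_best, y_best

x_best, y_best = smbo()
```

Note the sequential dependency: each surrogate fit needs the result of the previous expensive evaluation, which is exactly why vanilla SMBO is hard to parallelize.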
Metaheuristic optimization techniques
A large variety of metaheuristic methods have been used, such as:
particle swarm optimization [18, 19, 20]
genetic algorithms [21, 22]
artificial bee colony [23]
harmonic search [24]
simulated annealing [25]
Nelder-Mead simplex [26]
Advantages:
ease of implementation and parallelization
general purpose solvers → few implicit assumptions
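Random search, one of the simplest general purpose alternatives, illustrates these advantages: each hyperparameter is sampled independently within its box constraints, evaluations are embarrassingly parallel, and the budget is decoupled from the dimensionality. A minimal sketch over a toy objective (minimum at C = 1, γ = 0.5 by construction):

```python
import random

def toy_objective(params):
    # Toy stand-in for a tuning objective; minimum at C = 1, gamma = 0.5.
    return (params["C"] - 1.0) ** 2 + (params["gamma"] - 0.5) ** 2

def random_search(f, space, num_evals, seed=0):
    # Sample each hyperparameter uniformly within its (lo, hi) box,
    # keep the best configuration seen so far.
    rng = random.Random(seed)
    best_params, best_val = None, float("inf")
    for _ in range(num_evals):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = f(params)
        if val < best_val:
            best_params, best_val = params, val
    return best_params, best_val

space = {"C": (0.0, 10.0), "gamma": (0.0, 1.0)}
best_params, best_val = random_search(toy_objective, space, num_evals=500)
```

Unlike grid search, adding a hyperparameter to `space` does not multiply the budget; `num_evals` stays whatever the user can afford.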
Software
Several packages offer Bayesian SMBO approaches:
Hyperopt [27], Spearmint [17]
ParamILS [28], AutoWEKA [29]
BayesOpt [30], DiceKriging [31]
Optunity offers fundamentally distinct methods [5]:
focus on metaheuristic techniques not offered elsewhere
PSO, CMA-ES, random search, Sobol sequences, ...
multiplatform: Python, R, MATLAB, Octave
General purpose optimization libraries are also applicable
→ but often difficult to integrate into a machine learning pipeline
Metaheuristic methods are competitive to SMBO
Optunity’s standard PSO [5] versus Hyperopt’s tree-structured Parzen estimator [15, 27] on the two-dimensional Rastrigin function.
(Figure: best error so far, on a log scale from 10^0 to 10^1, versus function evaluation number (1 to 500), comparing random search, the tree of Parzen estimators, and particle swarm optimization.)
Conclusion
Hyperparameter search in machine learning
requires disciplined optimization methods
is receiving a lot of research attention, e.g. ChaLearn AutoML
The main challenges are:
expensive function evaluations with a stochastic component
exotic search spaces
Hyperparameter search is an interesting optimization problem
→ metaheuristic optimization methods are good candidates
Acknowledgements
Research Council KU Leuven: GOA/10/09 MaNet
Flemish Government:
FWO: project G.0871.12N (Neural circuits)
IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256); PhD grant (111065)
Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin
iMinds Medical Information Technologies SBO 2014
VLK Stichting E. van der Schueren: rectal cancer
Federal Government: FOD: Cancer Plan 2012-2015, KPC-29-023 (prostate)
COST: Action: BM1104: Mass Spectrometry Imaging
References I
[1] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2004.
[2] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[3] Lawrence Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[4] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender Systems, pages 79–86. ACM, 2010.
References II
[5] Marc Claesen, Jaak Simm, Dusan Popovic, Yves Moreau, and Bart De Moor. Easy hyperparameter search using Optunity. arXiv preprint arXiv:1412.1114, 2014.
[6] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[7] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.
References III
[8] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[9] Bradley Efron and Gail Gong. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37(1):36–48, 1983.
[10] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1145, 1995.
[11] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
References IV
[12] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. EnsembleSVM: A library for ensemble learning using support vector machines. Journal of Machine Learning Research, 15:141–145, 2014.
[13] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. A robust ensemble approach to learn from positive and unlabeled data using SVM base models. Neurocomputing, 160:73–84, 2015.
[14] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization, pages 507–523. Springer, 2011.
References V
[15] James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.
[16] Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
[17] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.
References VI
[18] Michael Meissner, Michael Schmuker, and Gisbert Schneider. Optimized particle swarm optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics, 7(1):125, 2006.
[19] XC Guo, JH Yang, CG Wu, CY Wang, and YC Liang. A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing, 71(16):3211–3215, 2008.
[20] Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, and Zne-Jung Lee. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 35(4):1817–1824, 2008.
References VII
[21] Jinn-Tsong Tsai, Jyh-Horng Chou, and Tung-Kuan Liu. Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. Neural Networks, IEEE Transactions on, 17(1):69–80, 2006.
[22] Carlos Ansótegui, Meinolf Sellmann, and Kevin Tierney. A gender-based genetic algorithm for the automatic configuration of algorithms. In Principles and Practice of Constraint Programming, CP 2009, pages 142–157. Springer, 2009.
[23] Dervis Karaboga, Bahriye Akay, and Celal Ozturk. Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks. In Modeling Decisions for Artificial Intelligence, pages 318–329. Springer, 2007.
References VIII
[24] João P Papa, Gustavo H Rosa, Aparecido N Marana, Walter Scheirer, and David D Cox. Model selection for Discriminative Restricted Boltzmann Machines through meta-heuristic techniques. Journal of Computational Science, 9:14–18, 2015.
[25] Samuel Xavier-de Souza, Johan A.K. Suykens, Joos Vandewalle, and Désiré Bollé. Coupled simulated annealing. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 40(2):320–335, 2010.
[26] Gavin C Cawley and Nicola LC Talbot. Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10):1467–1475, 2004.
References IX
[27] James Bergstra, Dan Yamins, and David D Cox. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pages 13–20. SciPy, 2013.
[28] Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009.
[29] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-WEKA: Automated selection and hyper-parameter optimization of classification algorithms. CoRR, abs/1208.3719, 2012.
References X
[30] Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430, 2014.
[31] Olivier Roustant, David Ginsbourger, Yves Deville, et al. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.