TRANSCRIPT
An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science
Kirthevasan Kandasamy
Machine Learning Dept, CMU
Electrochemical Energy Symposium
Pittsburgh, PA, November 2017
Designing Electrolytes in Batteries
1/19
Black-box Optimisation in Computational Astrophysics

[Diagram: Cosmological Simulator → Observation (e.g. Hubble Constant, Baryonic Density) → Likelihood Score, via a likelihood computation]
Black-box Optimisation

[Diagram: an expensive black-box function]

Other examples:
I Pre-clinical drug discovery
I Optimal policy in autonomous driving
I Synthetic gene design
Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only via noisy evaluations.
Let x∗ = argmaxx f (x).

[Figure: f(x) vs. x, with the maximiser x∗ and the optimum value f(x∗) marked]

2/19
Outline
I Part I: Bayesian Optimisation
I Bayesian Models for f
I Two algorithms: upper confidence bounds & Thompson sampling
I Part II: Some Modern Challenges
I Multi-fidelity Optimisation
I Parallelisation
3/19
Bayesian Models for f, e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

[Figure sequence: sample functions from the prior GP with no observations; a set of observations; the posterior GP given those observations]

After t observations, f (x) ∼ N (µt(x), σt²(x)).

4/19
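The posterior update above can be sketched in a few lines of NumPy. This is an illustrative zero-mean GP with a squared-exponential kernel; the kernel choice, lengthscale, and noise level are assumptions for the sketch, not taken from the slides.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between 1-D point sets A and B.
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-3, lengthscale=1.0):
    # Posterior mean mu_t(x) and variance sigma_t^2(x) of a zero-mean GP
    # after t = len(X_obs) noisy observations.
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise * np.eye(len(X_obs))
    k_star = rbf_kernel(X_query, X_obs, lengthscale)
    mu = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mu, np.maximum(var, 0.0)

X = np.array([0.0, 1.0])     # observed inputs
y = np.array([0.5, -0.2])    # noisy observations
mu, var = gp_posterior(X, y, np.array([0.0, 3.0]))
# The posterior mean tracks the data at x = 0; the variance is small there
# and close to the prior variance far away at x = 3.
```

This is exactly the N(µt(x), σt²(x)) picture on the slide: uncertainty collapses near observations and reverts to the prior away from them.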
Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010)

[Figure: posterior GP with the upper confidence bound ϕt and the chosen point xt]

1) Construct posterior GP.
2) ϕt = µt−1 + βt^(1/2) σt−1 is a UCB.
3) Choose xt = argmaxx ϕt(x).
4) Evaluate f at xt.

5/19
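The four steps above can be sketched as a loop over a 1-D grid of candidates. This is a minimal illustration with a squared-exponential kernel and a fixed β (Srinivas et al. let βt grow with t); the toy objective and all hyperparameters are assumptions.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def gp_post(Xo, yo, Xq, noise=1e-4):
    # Step 1: posterior GP from the evaluations collected so far.
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(Xq, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    var = 1.0 - np.sum(ks * np.linalg.solve(K, ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

def gp_ucb(f, grid, n_iters=25, beta=4.0):
    Xo = np.array([grid[len(grid) // 2]])   # start from a single evaluation
    yo = np.array([f(Xo[0])])
    for t in range(n_iters):
        mu, var = gp_post(Xo, yo, grid)
        phi = mu + np.sqrt(beta * var)      # step 2: the UCB phi_t
        xt = grid[np.argmax(phi)]           # step 3: maximise the UCB
        Xo = np.append(Xo, xt)              # step 4: evaluate f at x_t
        yo = np.append(yo, f(xt))
    return Xo[np.argmax(yo)]

f = lambda x: -(x - 0.7) ** 2               # toy objective with maximum at 0.7
grid = np.linspace(0.0, 1.0, 101)
best = gp_ucb(f, grid)                      # ends up close to 0.7
```

High-σ regions inflate ϕt, so early iterations explore; once the posterior tightens, ϕt ≈ µ and the loop exploits the believed maximum.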
GP-UCB (Srinivas et al. 2010)

[Animation: GP-UCB on a 1-D example, shown at iterations t = 1, 2, 3, 4, 5, 6, 7, 11, 25]

6/19
Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933)

[Figure: posterior GP, a sample g drawn from it, and the chosen point xt at the sample's maximum]

1) Construct posterior GP.
2) Draw sample g from posterior.
3) Choose xt = argmaxx g(x).
4) Evaluate f at xt.

7/19
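One TS step can be sketched by sampling g from the posterior on a finite grid and maximising it. The kernel, jitter, toy objective, and iteration budget below are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def ts_step(Xo, yo, grid, noise=1e-4):
    # Steps 1-3: posterior GP, draw one sample g on the grid, maximise g.
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(grid, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    cov = rbf(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    g = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(len(grid)))
    return grid[np.argmax(g)]

f = lambda x: np.sin(3.0 * x)          # toy objective, maximiser near pi/6 on [0, 1]
grid = np.linspace(0.0, 1.0, 60)
Xo = np.array([0.1]); yo = np.array([f(0.1)])
for t in range(20):
    xt = ts_step(Xo, yo, grid)         # step 3: x_t = argmax g
    Xo = np.append(Xo, xt)             # step 4: evaluate f at x_t
    yo = np.append(yo, f(xt))
best = Xo[np.argmax(yo)]
```

Randomness in the posterior sample plays the exploration role that the βt^(1/2) σt−1 term plays in GP-UCB: where uncertainty is high, samples occasionally place their maximum there.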
More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.

Other criteria for selecting xt:
I Expected improvement (Jones et al. 1998)
I Probability of improvement (Kushner et al. 1964)
I Predictive entropy search (Hernández-Lobato et al. 2014)
I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f:
I Neural networks (Snoek et al. 2015)
I Random forests (Hutter 2009)

8/19
Some Modern Challenges/Opportunities

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b; Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. arXiv 2017)

9/19
1. Multi-fidelity Optimisation
(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but ... we have access to cheap approximations: f1, f2, f3 ≈ f, which are cheaper to evaluate.

[Figure: f with its maximiser x⋆, together with the approximations f1, f2, f3]

E.g. f : a real-world battery experiment
f2 : lab experiment
f1 : computer simulation

10/19
MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 approximation),

[Figure: the cheap fidelity f(1) and the target f(2), with x⋆ and the point xt chosen at t = 14]

Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB run on f(2).

Can be extended to multiple approximations and continuous approximations.

11/19
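The core acquisition idea can be sketched as follows: each fidelity m supplies its own upper bound µ(m) + √β σ(m) + ζ(m), where ζ(m) bounds the bias between f(m) and the target f, and the tightest (elementwise minimum) of these bounds is still a valid UCB for f. This is a simplified sketch of the combination rule; the ζ values, β, and all numbers below are illustrative assumptions.

```python
import numpy as np

def mf_ucb(mus, sigmas, zetas, beta=4.0):
    # Each fidelity m gives an upper bound on f: mu_m + sqrt(beta)*sigma_m + zeta_m,
    # with zeta_m bounding how far the approximation f^(m) can lie from f.
    # The elementwise minimum over fidelities is the tightest combined UCB.
    bounds = [mu + np.sqrt(beta) * s + z for mu, s, z in zip(mus, sigmas, zetas)]
    return np.min(np.stack(bounds), axis=0)

# Two fidelities on a 3-point candidate grid (illustrative numbers).
mus    = [np.array([0.0, 0.5, 0.2]), np.array([0.1, 0.4, 0.3])]   # posterior means
sigmas = [np.array([0.1, 0.1, 0.1]), np.array([0.5, 0.5, 0.5])]   # f(2) less explored
zetas  = [0.2, 0.0]   # the cheap fidelity carries bias zeta; the target has none
phi = mf_ucb(mus, sigmas, zetas)
```

Because the cheap fidelity is heavily sampled, its bound is tight almost everywhere, so most of the exploration cost is paid at the cheap fidelity rather than on f(2).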
Experiment: Cosmological Maximum Likelihood Inference
I Type Ia Supernovae Data
I Maximum likelihood inference for 3 cosmological parameters:
I Hubble Constant H0
I Dark Energy Fraction ΩΛ
I Dark Matter Fraction ΩM
I Likelihood: Robertson Walker metric (Robertson 1936)
Requires numerical integration for each point in the dataset.
12/19
Experiment: Cosmological Maximum Likelihood Inference

3 cosmological parameters. (d = 3)
Fidelities: integration on grids of size (10^2, 10^4, 10^6). (M = 3)

[Plot: optimisation progress on the cosmological likelihood; axis ticks omitted]

13/19
Experiment: Hartmann-3D

2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is cheapest.

[Bar plot: "Query frequencies for Hartmann-3D" — number of queries at each fidelity m = 1, 2, 3 across values of f(3)(x); axis ticks omitted]

14/19
2. Parallelising Function Evaluations

Parallelisation with M workers: we can evaluate f at M different points at the same time.
E.g.: Test M different battery solvents at the same time.

[Figure: evaluation timelines]
Sequential evaluations with one worker
Parallel evaluations with M workers (Asynchronous)
Parallel evaluations with M workers (Synchronous)

15/19
Parallel Thompson Sampling (Kandasamy et al. arXiv 2017)

Asynchronous: asyTS
At any given time,
1. (x′, y′) ← Wait for a worker to finish.
2. Compute posterior GP.
3. Draw a sample g ∼ GP.
4. Re-deploy worker at argmax g.

Synchronous: synTS
At any given time,
1. {(x′m, y′m)}m=1..M ← Wait for all workers to finish.
2. Compute posterior GP.
3. Draw M samples gm ∼ GP, ∀m.
4. Re-deploy worker m at argmax gm, ∀m.

16/19
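A synchronous round can be sketched by drawing M independent posterior samples and sending each worker to its own sample's argmax; the asynchronous variant would do the same for one freed worker at a time. The kernel, toy objective, and budgets below are illustrative assumptions, and the "workers" here are just batched evaluations, not real parallel processes.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def posterior(Xo, yo, grid, noise=1e-4):
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(grid, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    cov = rbf(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    return mu, cov

def syn_ts_round(Xo, yo, grid, M=4):
    # One synchronous round: all M workers' results are already in (Xo, yo);
    # draw M posterior samples and re-deploy worker m at argmax g_m.
    mu, cov = posterior(Xo, yo, grid)
    jitter = 1e-6 * np.eye(len(grid))
    return [grid[np.argmax(rng.multivariate_normal(mu, cov + jitter))]
            for _ in range(M)]

f = lambda x: -np.abs(x - 0.3)          # toy objective, maximum at 0.3
grid = np.linspace(0.0, 1.0, 50)
Xo = np.array([0.9]); yo = np.array([f(0.9)])
for _ in range(5):                      # 5 rounds x 4 workers = 20 evaluations
    batch = syn_ts_round(Xo, yo, grid, M=4)
    Xo = np.concatenate([Xo, batch])
    yo = np.concatenate([yo, f(np.array(batch))])
best = Xo[np.argmax(yo)]
```

Because each worker maximises an independent posterior sample, the M picks in a round naturally spread out without any explicit diversity penalty.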
Experiment: Branin-2D, M = 4
Evaluation time sampled from a uniform distribution

[Plot: optimisation progress vs. time (log scale); methods compared: synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, asyTS]

17/19
Experiment: Hartmann-18D, M = 25
Evaluation time sampled from an exponential distribution

[Plot: optimisation progress vs. time; methods compared: synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, asyTS]

18/19
Summary

I Black-box optimisation methods are used in several scientific and engineering applications.
I Bayesian optimisation: a method for black-box optimisation which uses Bayesian uncertainty estimates for f.
I Some modern challenges:
I Multi-fidelity optimisation
I Parallel evaluations
I and several more ...

Thank you. Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19