TRANSCRIPT
An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science
Kirthevasan Kandasamy
Machine Learning Dept, CMU
Electrochemical Energy Symposium
Pittsburgh, PA, November 2017
Designing Electrolytes in Batteries
1/19
Black-box Optimisation in Computational Astrophysics

[Diagram: Cosmological Simulator → Observation (e.g. Hubble Constant, Baryonic Density) → Likelihood Score, via a likelihood computation]
Black-box Optimisation

[Diagram: an expensive black-box function]

Other examples:
I Pre-clinical drug discovery
I Optimal policy in autonomous driving
I Synthetic gene design
Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only via noisy evaluations.
Let x∗ = argmaxx f (x).

[Figure: f(x) vs. x, with the maximiser x∗ and the optimum value f(x∗) marked]

2/19
Outline
I Part I: Bayesian Optimisation
I Bayesian Models for f
I Two algorithms: upper confidence bounds & Thompson sampling
I Part II: Some Modern Challenges
I Multi-fidelity Optimisation
I Parallelisation
3/19
Bayesian Models for f, e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

[Figure sequence: sample functions from the prior GP with no observations; a set of observations; the posterior GP given those observations]

After t observations, f (x) ∼ N (µt(x), σt²(x)).

4/19
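The posterior update above can be sketched in a few lines of NumPy. This is an illustrative zero-mean GP with a squared-exponential kernel; the kernel choice, lengthscale, and noise level are assumptions for the sketch, not taken from the slides.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between 1-D point sets A and B.
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-3, lengthscale=1.0):
    # Posterior mean mu_t(x) and variance sigma_t^2(x) of a zero-mean GP
    # after t = len(X_obs) noisy observations.
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise * np.eye(len(X_obs))
    k_star = rbf_kernel(X_query, X_obs, lengthscale)
    mu = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mu, np.maximum(var, 0.0)

X = np.array([0.0, 1.0])     # observed inputs
y = np.array([0.5, -0.2])    # noisy observations
mu, var = gp_posterior(X, y, np.array([0.0, 3.0]))
# The posterior mean tracks the data at x = 0; the variance is small there
# and close to the prior variance far away at x = 3.
```

This is exactly the N(µt(x), σt²(x)) picture on the slide: uncertainty collapses near observations and reverts to the prior away from them.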
Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010)

[Figure: posterior GP with the upper confidence bound ϕt and the chosen point xt]

1) Construct posterior GP.
2) ϕt = µt−1 + βt^(1/2) σt−1 is a UCB.
3) Choose xt = argmaxx ϕt(x).
4) Evaluate f at xt.

5/19
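The four steps above can be sketched as a loop over a 1-D grid of candidates. This is a minimal illustration with a squared-exponential kernel and a fixed β (Srinivas et al. let βt grow with t); the toy objective and all hyperparameters are assumptions.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def gp_post(Xo, yo, Xq, noise=1e-4):
    # Step 1: posterior GP from the evaluations collected so far.
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(Xq, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    var = 1.0 - np.sum(ks * np.linalg.solve(K, ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

def gp_ucb(f, grid, n_iters=25, beta=4.0):
    Xo = np.array([grid[len(grid) // 2]])   # start from a single evaluation
    yo = np.array([f(Xo[0])])
    for t in range(n_iters):
        mu, var = gp_post(Xo, yo, grid)
        phi = mu + np.sqrt(beta * var)      # step 2: the UCB phi_t
        xt = grid[np.argmax(phi)]           # step 3: maximise the UCB
        Xo = np.append(Xo, xt)              # step 4: evaluate f at x_t
        yo = np.append(yo, f(xt))
    return Xo[np.argmax(yo)]

f = lambda x: -(x - 0.7) ** 2               # toy objective with maximum at 0.7
grid = np.linspace(0.0, 1.0, 101)
best = gp_ucb(f, grid)                      # ends up close to 0.7
```

High-σ regions inflate ϕt, so early iterations explore; once the posterior tightens, ϕt ≈ µ and the loop exploits the believed maximum.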
GP-UCB (Srinivas et al. 2010)

[Animation: GP-UCB on a 1-D example, shown at iterations t = 1, 2, 3, 4, 5, 6, 7, 11, 25]

6/19
Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933)

[Figure: posterior GP, a sample g drawn from it, and the chosen point xt at the sample's maximum]

1) Construct posterior GP.
2) Draw sample g from posterior.
3) Choose xt = argmaxx g(x).
4) Evaluate f at xt.

7/19
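One TS step can be sketched by sampling g from the posterior on a finite grid and maximising it. The kernel, jitter, toy objective, and iteration budget below are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def ts_step(Xo, yo, grid, noise=1e-4):
    # Steps 1-3: posterior GP, draw one sample g on the grid, maximise g.
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(grid, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    cov = rbf(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    g = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(len(grid)))
    return grid[np.argmax(g)]

f = lambda x: np.sin(3.0 * x)          # toy objective, maximiser near pi/6 on [0, 1]
grid = np.linspace(0.0, 1.0, 60)
Xo = np.array([0.1]); yo = np.array([f(0.1)])
for t in range(20):
    xt = ts_step(Xo, yo, grid)         # step 3: x_t = argmax g
    Xo = np.append(Xo, xt)             # step 4: evaluate f at x_t
    yo = np.append(yo, f(xt))
best = Xo[np.argmax(yo)]
```

Randomness in the posterior sample plays the exploration role that the βt^(1/2) σt−1 term plays in GP-UCB: where uncertainty is high, samples occasionally place their maximum there.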
More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.

Other criteria for selecting xt:
I Expected improvement (Jones et al. 1998)
I Probability of improvement (Kushner et al. 1964)
I Predictive entropy search (Hernández-Lobato et al. 2014)
I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f:
I Neural networks (Snoek et al. 2015)
I Random forests (Hutter 2009)

8/19
Some Modern Challenges/Opportunities

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b; Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. arXiv 2017)

9/19
1. Multi-fidelity Optimisation
(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but ... we have access to cheap approximations: f1, f2, f3 ≈ f, which are cheaper to evaluate.

[Figure: f with its maximiser x⋆, together with the approximations f1, f2, f3]

E.g. f : a real-world battery experiment
f2 : lab experiment
f1 : computer simulation

10/19
MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 approximation),

[Figure: the cheap fidelity f(1) and the target f(2), with x⋆ and the point xt chosen at t = 14]

Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB run on f(2).

Can be extended to multiple approximations and continuous approximations.

11/19
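The core acquisition idea can be sketched as follows: each fidelity m supplies its own upper bound µ(m) + √β σ(m) + ζ(m), where ζ(m) bounds the bias between f(m) and the target f, and the tightest (elementwise minimum) of these bounds is still a valid UCB for f. This is a simplified sketch of the combination rule; the ζ values, β, and all numbers below are illustrative assumptions.

```python
import numpy as np

def mf_ucb(mus, sigmas, zetas, beta=4.0):
    # Each fidelity m gives an upper bound on f: mu_m + sqrt(beta)*sigma_m + zeta_m,
    # with zeta_m bounding how far the approximation f^(m) can lie from f.
    # The elementwise minimum over fidelities is the tightest combined UCB.
    bounds = [mu + np.sqrt(beta) * s + z for mu, s, z in zip(mus, sigmas, zetas)]
    return np.min(np.stack(bounds), axis=0)

# Two fidelities on a 3-point candidate grid (illustrative numbers).
mus    = [np.array([0.0, 0.5, 0.2]), np.array([0.1, 0.4, 0.3])]   # posterior means
sigmas = [np.array([0.1, 0.1, 0.1]), np.array([0.5, 0.5, 0.5])]   # f(2) less explored
zetas  = [0.2, 0.0]   # the cheap fidelity carries bias zeta; the target has none
phi = mf_ucb(mus, sigmas, zetas)
```

Because the cheap fidelity is heavily sampled, its bound is tight almost everywhere, so most of the exploration cost is paid at the cheap fidelity rather than on f(2).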
Experiment: Cosmological Maximum Likelihood Inference
I Type Ia Supernovae Data
I Maximum likelihood inference for 3 cosmological parameters:
I Hubble Constant H0
I Dark Energy Fraction ΩΛ
I Dark Matter Fraction ΩM
I Likelihood: Robertson Walker metric (Robertson 1936)
Requires numerical integration for each point in the dataset.
12/19
Experiment: Cosmological Maximum Likelihood Inference

3 cosmological parameters. (d = 3)
Fidelities: integration on grids of size (10^2, 10^4, 10^6). (M = 3)

[Plot: optimisation progress on the cosmological likelihood; axis ticks omitted]

13/19
Experiment: Hartmann-3D

2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is cheapest.

[Bar plot: "Query frequencies for Hartmann-3D" — number of queries at each fidelity m = 1, 2, 3 across values of f(3)(x); axis ticks omitted]

14/19
2. Parallelising Function Evaluations

Parallelisation with M workers: we can evaluate f at M different points at the same time.
E.g.: Test M different battery solvents at the same time.

[Figure: evaluation timelines]
Sequential evaluations with one worker
Parallel evaluations with M workers (Asynchronous)
Parallel evaluations with M workers (Synchronous)

15/19
Parallel Thompson Sampling (Kandasamy et al. arXiv 2017)

Asynchronous: asyTS
At any given time,
1. (x′, y′) ← Wait for a worker to finish.
2. Compute posterior GP.
3. Draw a sample g ∼ GP.
4. Re-deploy worker at argmax g.

Synchronous: synTS
At any given time,
1. {(x′m, y′m)}m=1..M ← Wait for all workers to finish.
2. Compute posterior GP.
3. Draw M samples gm ∼ GP, ∀m.
4. Re-deploy worker m at argmax gm, ∀m.

16/19
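A synchronous round can be sketched by drawing M independent posterior samples and sending each worker to its own sample's argmax; the asynchronous variant would do the same for one freed worker at a time. The kernel, toy objective, and budgets below are illustrative assumptions, and the "workers" here are just batched evaluations, not real parallel processes.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)

def posterior(Xo, yo, grid, noise=1e-4):
    K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
    ks = rbf(grid, Xo)
    mu = ks @ np.linalg.solve(K, yo)
    cov = rbf(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    return mu, cov

def syn_ts_round(Xo, yo, grid, M=4):
    # One synchronous round: all M workers' results are already in (Xo, yo);
    # draw M posterior samples and re-deploy worker m at argmax g_m.
    mu, cov = posterior(Xo, yo, grid)
    jitter = 1e-6 * np.eye(len(grid))
    return [grid[np.argmax(rng.multivariate_normal(mu, cov + jitter))]
            for _ in range(M)]

f = lambda x: -np.abs(x - 0.3)          # toy objective, maximum at 0.3
grid = np.linspace(0.0, 1.0, 50)
Xo = np.array([0.9]); yo = np.array([f(0.9)])
for _ in range(5):                      # 5 rounds x 4 workers = 20 evaluations
    batch = syn_ts_round(Xo, yo, grid, M=4)
    Xo = np.concatenate([Xo, batch])
    yo = np.concatenate([yo, f(np.array(batch))])
best = Xo[np.argmax(yo)]
```

Because each worker maximises an independent posterior sample, the M picks in a round naturally spread out without any explicit diversity penalty.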
Experiment: Branin-2D, M = 4
Evaluation time sampled from a uniform distribution

[Plot: optimisation progress vs. time (log scale); methods compared: synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, asyTS]

17/19
Experiment: Hartmann-18D, M = 25
Evaluation time sampled from an exponential distribution

[Plot: optimisation progress vs. time; methods compared: synRAND, synHUCB, synUCBPE, synTS, asyRAND, asyUCB, asyHUCB, asyEI, asyHTS, asyTS]

18/19
Summary

I Black-box optimisation methods are used in several scientific and engineering applications.
I Bayesian optimisation: a method for black-box optimisation which uses Bayesian uncertainty estimates for f.
I Some modern challenges:
I Multi-fidelity optimisation
I Parallel evaluations
I and several more ...

Thank you. Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19