
Page 1: Bayesian Optimization (BO)

Bayesian Optimization (BO)

Javad Azimi
Fall 2010
http://web.engr.oregonstate.edu/~azimi/

Page 2: Bayesian Optimization (BO)

Outline

• Formal Definition
• Application
• Bayesian Optimization Steps
– Surrogate Function (Gaussian Process)
– Acquisition Function
• PMAX
• IEMAX
• MPI
• MEI
• UCB
• GP-Hedge

Page 3: Bayesian Optimization (BO)

Formal Definition

• Input:

• Goal:
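The Input/Goal equations on this slide were images and did not survive extraction; a standard statement of the BO setting, consistent with the rest of the deck, is:

Input: an expensive-to-evaluate black-box function f : \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R} (only point evaluations, no gradients).

Goal: find x^{*} = \arg\max_{x \in \mathcal{X}} f(x) using as few evaluations of f as possible.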

Page 4: Bayesian Optimization (BO)

Fuel Cell Application

[Figure: schematic of a microbial fuel cell. Bacteria at the anode oxidize fuel (organic matter) into oxidation products (CO2), releasing electrons e- that flow to the cathode, where O2 and H+ combine into H2O. This is how an MFC works.]

[Figure: SEM image of bacteria sp. on Ni-nanoparticle-enhanced carbon fibers.]

The nano-structure of the anode significantly impacts electricity production.

We want to optimize the anode nano-structure to maximize power by selecting a set of experiments.

Page 5: Bayesian Optimization (BO)

Big Picture

• Since running experiments is very expensive, we use BO.

• Select one experiment to run at a time, based on the results of previous experiments.

[Diagram: the BO loop — Current Experiments → Our Current Model → Select Single Experiment → Run Experiment → back to Current Experiments.]

Page 6: Bayesian Optimization (BO)

BO Main Steps

• Surrogate Function (Response Surface, Model)
– Makes a posterior over unobserved points based on the prior.
– Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
• Acquisition Criterion (Function)
– Decides which sample should be selected next.

Page 7: Bayesian Optimization (BO)

Surrogate Function

• Simulates the unknown function's distribution based on the prior.
– Deterministic (classical linear regression, …)
• There is a deterministic prediction for each point x in the input space.
– Stochastic (Bayesian regression, Gaussian Process, …)
• There is a distribution over the prediction for each point x in the input space (e.g., a normal distribution).
– Example:
• Deterministic: f(x1) = y1, f(x2) = y2
• Stochastic: f(x1) = N(y1, 2), f(x2) = N(y2, 5)

Page 8: Bayesian Optimization (BO)

Gaussian Process(GP)

• A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
– This is a consistency requirement, also called the marginalization property.
• Marginalization property:
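The property's formula was an image in the original slide; in standard notation, it says that a joint Gaussian over two blocks of variables implies the Gaussian marginal you would expect:

\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) \;\Rightarrow\; f_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})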

Page 9: Bayesian Optimization (BO)

Gaussian Process (GP)

• Formal prediction:

• Interesting points:
– The squared exponential covariance function corresponds to Bayesian linear regression with an infinite number of basis functions.
– The variance is independent of the observations.
– The mean is a linear combination of the observations.
– If the covariance function specifies the entries of the covariance matrix, marginalization is satisfied!
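The prediction formula was an image in the original; the standard GP posterior at a test point x_*, given training inputs X = \{x_1, \dots, x_n\}, targets y, and covariance function k, is:

\mu(x_*) = k_*^{\top} K^{-1} y, \qquad \sigma^2(x_*) = k(x_*, x_*) - k_*^{\top} K^{-1} k_*,

where K_{ij} = k(x_i, x_j) and (k_*)_i = k(x_i, x_*). Note how this matches the bullets above: the mean is linear in the observations y, and the variance does not involve y at all.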

Page 10: Bayesian Optimization (BO)

Gaussian Process (GP)

• A Gaussian Process is:
– An exact interpolating regression method.
• It predicts the training data perfectly (not true in classical regression).
– A natural generalization of linear regression.
• A nonlinear regression approach!
– A simple example of a GP can be obtained from Bayesian regression.
• Identical results.
– A specification of a distribution over functions.
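To make the prediction equations concrete, here is a minimal GP regression sketch in Python/NumPy. The noise-free squared exponential kernel with unit length-scale is an assumption for illustration; the slides do not fix these choices.

    import numpy as np

    def sq_exp_kernel(A, B, length_scale=1.0):
        # Squared exponential covariance: k(a, b) = exp(-(a - b)^2 / (2 l^2)).
        d2 = (A[:, None] - B[None, :]) ** 2
        return np.exp(-d2 / (2.0 * length_scale ** 2))

    def gp_posterior(X_train, y_train, X_test, jitter=1e-8):
        # Posterior mean and variance at X_test, assuming noise-free observations.
        K = sq_exp_kernel(X_train, X_train) + jitter * np.eye(len(X_train))
        k_star = sq_exp_kernel(X_train, X_test)       # cross-covariances
        alpha = np.linalg.solve(K, y_train)
        mean = k_star.T @ alpha                       # linear in the observations
        v = np.linalg.solve(K, k_star)
        var = 1.0 - np.sum(k_star * v, axis=0)        # does not depend on y_train
        return mean, var

    # The posterior mean interpolates the training data exactly:
    X = np.array([-1.0, 0.0, 1.0]); y = np.sin(X)
    mu, var = gp_posterior(X, y, X)   # mu ~ y and var ~ 0 at the training points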

Page 11: Bayesian Optimization (BO)

Gaussian Process (2): Distribution over Functions

[Figure: a GP posterior showing the 95% confidence interval for each point x, together with three sampled functions.]
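Sample functions like those in the figure can be drawn directly from the GP prior; a short sketch reusing sq_exp_kernel from the earlier example (the grid and sample count are arbitrary choices):

    import numpy as np

    grid = np.linspace(-5.0, 5.0, 100)
    K = sq_exp_kernel(grid, grid) + 1e-8 * np.eye(len(grid))
    # Each row is one function from the zero-mean GP prior, evaluated on `grid`.
    samples = np.random.multivariate_normal(np.zeros(len(grid)), K, size=3)
    # Pointwise 95% bands are mean +/- 1.96 * sqrt of the diagonal of K.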

Page 12: Bayesian Optimization (BO)

Gaussian Process (2): GP vs. Bayesian Regression

• Bayesian regression:
– Distribution over weights.
– The prior is defined over the weights.
• Gaussian Process:
– Distribution over functions.
– The prior is defined over the function space.
• These are the same, just viewed from different perspectives.

Page 13: Bayesian Optimization (BO)

Short Summary

• Given any unobserved point z, we can define a normal distribution over its predicted value such that:
– Its mean is a linear combination of the observed values.
– Its variance is related to its distance from the observed values (closer to observed data means less variance).

Page 14: Bayesian Optimization (BO)

BO Main Steps

• Surrogate Function (Response Surface, Model)
– Makes a posterior over unobserved points based on the prior.
– Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
• Acquisition Criterion (Function)
– Decides which sample should be selected next.

Page 15: Bayesian Optimization (BO)

Bayesian Optimization: Acquisition Criterion

• Remember: we are looking for the maximizer x^* = \arg\max_{x \in \mathcal{X}} f(x).

• Input:
– The set of observed data.
– A set of points with their corresponding means and variances.

• Goal: decide which point should be selected next in order to reach the maximizer of the function faster.

• Different acquisition criteria (acquisition functions, or policies) answer this differently.

Page 16: Bayesian Optimization (BO)

Policies

• Maximum Mean (MM)
• Maximum Upper Interval (MUI)
• Maximum Probability of Improvement (MPI)
• Maximum Expected Improvement (MEI)

Page 17: Bayesian Optimization (BO)

Policies: Maximum Mean (MM)

• Returns the point with highest expected value.
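The selection rule, whose formula was an image on the slide, is in standard notation:

x_{\text{next}} = \arg\max_{x \in \mathcal{X}} \mu(x)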

• Advantage:
– If the model is stable and has been learned very well, MM performs very well.
• Disadvantage:
– There is a high chance of falling into a local optimum (it only exploits).
• Can it converge to the global optimum eventually?
– No.

Page 18: Bayesian Optimization (BO)

Policies: Maximum Upper Interval (MUI)

• Returns the point with highest 95% upper interval.
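The slide's formula was an image; the usual form of this rule, matching the 95% intervals shown earlier, is:

x_{\text{next}} = \arg\max_{x \in \mathcal{X}} \, \mu(x) + 1.96\,\sigma(x)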

• Advantage:
– Combines mean and variance (exploitation and exploration).
• Disadvantage:
– Dominated by the variance; mainly explores the input space.
• Can it converge to the global optimum eventually?
– Yes, but it needs an almost infinite number of samples.

Page 19: Bayesian Optimization (BO)

Policies: Maximum Probability of Improvement (MPI)

• Selects the sample with the highest probability of improving on the current best observation (ymax) by some margin m.
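The slide's formula was an image; the standard MPI rule, with \Phi the standard normal CDF, is:

x_{\text{next}} = \arg\max_{x \in \mathcal{X}} P\big(f(x) > y_{\max} + m\big) = \arg\max_{x \in \mathcal{X}} \Phi\!\left(\frac{\mu(x) - y_{\max} - m}{\sigma(x)}\right)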

Page 20: Bayesian Optimization (BO)

Policies: Maximum Probability of Improvement (MPI)

• Advantage:
– Considers the mean, the variance, and ymax in the policy (smarter than MUI).
• Disadvantage:
– The ad-hoc parameter m.
– Large value of m?
• Exploration.
– Small value of m?
• Exploitation.

Page 21: Bayesian Optimization (BO)

Policies: Maximum Expected Improvement (MEI)

• Maximum Expected Improvement.
• Question: expectation over which variable?
– The margin m: MEI can be obtained by averaging MPI's probability of improvement over all margins m ≥ 0 (see the formula below).
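The slide's formula was an image; under the GP posterior f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x)), the expected improvement has the standard closed form

\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - y_{\max}, 0)\big] = \big(\mu(x) - y_{\max}\big)\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - y_{\max}}{\sigma(x)},

with \Phi and \phi the standard normal CDF and PDF. Equivalently, \mathrm{EI}(x) = \int_0^{\infty} P\big(f(x) > y_{\max} + m\big)\, dm, which is the sense in which MEI is MPI averaged over the margin m.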

Page 22: Bayesian Optimization (BO)

Policies: Upper Confidence Bounds

• Selects based on the variance and mean of each point (see the formula below).
– The selection of k is left to the user.
– Recently, a principled approach to selecting this parameter has been proposed.
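The slide's formula was an image; the rule is the standard upper confidence bound

x_{\text{next}} = \arg\max_{x \in \mathcal{X}} \, \mu(x) + k\,\sigma(x),

and the principled choice of the trade-off parameter alluded to above is presumably the time-varying schedule of GP-UCB (Srinivas et al., 2010).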

Page 23: Bayesian Optimization (BO)

Summary

• We introduced several approaches, each of which has advantages and disadvantages:
– MM
– MUI
– MPI
– MEI
– GP-UCB

• Which one should be selected for an unknown model?

Page 24: Bayesian Optimization (BO)

GP-Hedge

• GP-Hedge (2010).
• It selects one of the baseline policies at each step, based on theoretical results for the multi-armed bandit problem, although the objective is a bit different!
• The authors show that it can perform better than (or as well as) the best baseline policy in some frameworks.
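A minimal sketch of the hedge-style selection loop, in Python. All names here (x_mm, x_mui, posterior_mean, run_experiment) are hypothetical placeholders; the actual GP-Hedge algorithm also specifies how the gains are computed from the GP posterior mean.

    import numpy as np

    def gp_hedge_choose(candidates, gains, eta=1.0, rng=np.random):
        # Pick one base policy's nominee with probability softmax(eta * gains).
        p = np.exp(eta * (gains - gains.max()))   # subtract max for stability
        p /= p.sum()
        i = rng.choice(len(candidates), p=p)
        return i, candidates[i]

    # Hypothetical usage inside a BO loop:
    # candidates = [x_mm, x_mui, x_mpi, x_mei]    # one nominee per base policy
    # gains = np.zeros(len(candidates))           # cumulative gain per policy
    # i, x_next = gp_hedge_choose(candidates, gains)
    # y = run_experiment(x_next)                  # evaluate the black box
    # gains += np.array([posterior_mean(x) for x in candidates])  # hedge update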

Page 25: Bayesian Optimization (BO)

Future Work

• Method selection smarter than GP-Hedge, with theoretical analysis.
• Batch Bayesian optimization.
• Scheduling Bayesian optimization.
