
  • Bayesian Prediction of Code Output
    ASA Albuquerque Chapter Short Course, October 2014

  • 2

    Abstract

    This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation of computer model output. The conditional predictive distribution for predicting unsampled code output is derived, followed by posterior distributions for GP regression and precision parameters based on two common prior distribution assumptions. These results lead to a description of the predictive distribution itself when the GP correlation parameters are assumed known. In the case of unknown correlation parameters, their posterior distribution is provided. Markov chain Monte Carlo (MCMC) techniques are introduced as a means of sampling this posterior, and this results in a Monte Carlo method of sampling the predictive distribution when the GP correlation parameters are unknown. Two examples of Bayesian prediction of unsampled code output are provided to illustrate the methods discussed.

  • 3

    Bayesian Prediction: Framework

    Inference is based on the predictive distribution.

    Training data: $y^s_{tr} = \left( x^{tr}_1, y^s(x^{tr}_1) \right), \ldots, \left( x^{tr}_n, y^s(x^{tr}_n) \right)$

    Test data: $y^s_{te} = \left( x^{te}_1, y^s(x^{te}_1) \right), \ldots, \left( x^{te}_{n_p}, y^s(x^{te}_{n_p}) \right)$

    Predictive distribution ($\xi$ collects all uncertain model parameters):

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te}, \xi \mid y^s_{tr}) \, d\xi = \int p(y^s_{te} \mid y^s_{tr}, \xi) \, \pi(\xi \mid y^s_{tr}) \, d\xi$$

    The conditional density $p(y^s_{te} \mid y^s_{tr}, \xi)$ is derived from process modeling assumptions (e.g., a Gaussian process); the parameter posterior $\pi(\xi \mid y^s_{tr})$ is derived analytically (conjugate prior) or sampled via MCMC.

  • 4

    Bayesian Prediction: Sampling Distribution

    Joint sampling distribution (test and training data), given $(\beta, \lambda, R)$:

    $$(Y^s_{te}, Y^s_{tr}) \equiv \left( Y^s(x^{te}_1), \ldots, Y^s(x^{te}_{n_p}), Y^s(x^{tr}_1), \ldots, Y^s(x^{tr}_n) \right) \sim N_{n_p+n}\left( F\beta, \lambda^{-1} R \right)$$

    $$F = \begin{pmatrix} F_{te} \\ F_{tr} \end{pmatrix} \;\; (n_p+n) \times k \text{ regression matrix}, \qquad R = \begin{pmatrix} R_{te} & R_{te,tr} \\ R^T_{te,tr} & R_{tr} \end{pmatrix} \;\; (n_p+n) \times (n_p+n) \text{ correlation matrix}$$

    Conditional distribution (test data given training data):

    $$m(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = F_{te}\beta + R_{te,tr} R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\beta \right)$$

    $$V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = \lambda^{-1} \left( R_{te} - R_{te,tr} R^{-1}_{tr} R^T_{te,tr} \right)$$

    $p(y^s_{te} \mid y^s_{tr}, \beta, \lambda)$ is $N_{n_p}\left( m(y^s_{te} \mid y^s_{tr}, \beta, \lambda), \, V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) \right)$
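
    A minimal R sketch of the conditional mean and covariance above, assuming the regression matrices (Ftr, Fte), correlation blocks (Rtr, Rte, Rtetr), training output ytr, and parameters (beta, lambda) have already been built; all object names here are illustrative, not from the slides.

    ```r
    # Conditional (kriging-type) mean and covariance of test output given training output
    gp_conditional <- function(Fte, Ftr, Rte, Rtetr, Rtr, ytr, beta, lambda) {
      resid <- ytr - Ftr %*% beta
      m <- Fte %*% beta + Rtetr %*% solve(Rtr, resid)       # m(y_te | y_tr, beta, lambda)
      V <- (Rte - Rtetr %*% solve(Rtr, t(Rtetr))) / lambda  # V(y_te | y_tr, beta, lambda)
      list(mean = m, cov = V)
    }
    ```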

  • 5

    Bayesian Prediction: Priors and Posteriors I

    Prior distributions, Case I (informative):

    $$\pi(\beta \mid \lambda) \text{ is } N_k\left( \beta_0, \lambda^{-1} \Sigma^{-1}_\beta \right), \qquad \pi(\lambda) \text{ is Gamma}(a, b)$$

    Posterior distributions:

    $$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \quad a_1 = (2a + n)/2$$

    $$b_1 = \left( 2b + \left( y^s_{tr} - F_{tr}\hat\beta \right)^T R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\beta \right) + \left( \hat\beta - \beta_0 \right)^T \Sigma^{-1}_\pi \left( \hat\beta - \beta_0 \right) \right) / 2$$

    where $\hat\beta = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} F^T_{tr} R^{-1}_{tr} y^s_{tr}$ and $\Sigma_\pi = \Sigma^{-1}_\beta + \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1}$.

    $$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\left( 2a_1, \hat\mu, b_1 \hat\Sigma / a_1 \right)$$

    where $\hat\Sigma^{-1} = \Sigma_\beta + F^T_{tr} R^{-1}_{tr} F_{tr}$ and $\hat\mu = \hat\Sigma \left[ \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right) \hat\beta + \Sigma_\beta \beta_0 \right]$.
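
    A sketch of the Case I posterior quantities in R, assuming the training regression matrix Ftr, correlation matrix Rtr, training output ytr, and prior hyperparameters (beta0, Sigma_beta, a, b) are given; names are illustrative.

    ```r
    # Posterior quantities for the informative (Case I) prior
    case1_posterior <- function(Ftr, Rtr, ytr, beta0, Sigma_beta, a, b) {
      n <- length(ytr)
      FtRinv <- t(Ftr) %*% solve(Rtr)                # F' R^{-1}
      XtX <- FtRinv %*% Ftr                          # F' R^{-1} F
      beta_hat <- solve(XtX, FtRinv %*% ytr)         # GLS estimate of beta
      Sigma_pi <- solve(Sigma_beta) + solve(XtX)
      resid <- ytr - Ftr %*% beta_hat
      a1 <- (2 * a + n) / 2
      b1 <- (2 * b + t(resid) %*% solve(Rtr, resid) +
             t(beta_hat - beta0) %*% solve(Sigma_pi, beta_hat - beta0)) / 2
      Sigma_hat <- solve(Sigma_beta + XtX)
      mu_hat <- Sigma_hat %*% (XtX %*% beta_hat + Sigma_beta %*% beta0)
      list(a1 = a1, b1 = as.numeric(b1), mu_hat = mu_hat, Sigma_hat = Sigma_hat)
    }
    ```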

  • 6

    Bayesian Prediction: Priors and Posteriors II

    Prior distributions, Case II (noninformative):

    $$\pi(\beta) \propto 1, \text{ independent of } \pi(\lambda), \text{ which is Gamma}(a, b)$$

    Posterior distributions:

    $$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \quad a_1 = (2a + n - k)/2$$

    $$b_1 = \left( 2b + \left( y^s_{tr} - F_{tr}\hat\beta \right)^T R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\beta \right) \right) / 2$$

    where $\hat\beta = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} F^T_{tr} R^{-1}_{tr} y^s_{tr}$.

    $$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\left( 2a_1, \hat\beta, b_1 \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} / a_1 \right)$$

    Results for Jeffreys' prior distribution $\pi(\beta, \lambda) \propto 1/\lambda$: set $a = b = 0$.
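
    The Case II quantities simplify accordingly; a short R sketch (same illustrative names as above), with Jeffreys' prior recovered by the defaults a = b = 0:

    ```r
    # Posterior quantities for the noninformative (Case II) prior
    case2_posterior <- function(Ftr, Rtr, ytr, a = 0, b = 0) {
      n <- length(ytr); k <- ncol(Ftr)
      FtRinv <- t(Ftr) %*% solve(Rtr)
      XtX <- FtRinv %*% Ftr
      beta_hat <- solve(XtX, FtRinv %*% ytr)
      resid <- ytr - Ftr %*% beta_hat
      a1 <- (2 * a + n - k) / 2
      b1 <- (2 * b + t(resid) %*% solve(Rtr, resid)) / 2
      list(a1 = a1, b1 = as.numeric(b1), beta_hat = beta_hat, XtX_inv = solve(XtX))
    }
    ```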

  • 7

    Bayesian Prediction: Predictive Distribution

    $$p(y^s_{te} \mid y^s_{tr}) \text{ is } T_{n_p}\left( 2a_1, \mu_{te}, b_1 \Sigma_{te} / a_1 \right)$$

    $$\mu_{te} = F_{te}\hat\mu + R_{te,tr} R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\mu \right)$$

    $$\Sigma_{te} = R_{te} - R_{te,tr} R^{-1}_{tr} R^T_{te,tr} + H_{te} \hat\Sigma H^T_{te}, \quad \text{where } H_{te} = F_{te} - R_{te,tr} R^{-1}_{tr} F_{tr}$$

    For Case II priors, $\hat\mu = \hat\beta$ and $\hat\Sigma = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1}$.

    For Case I priors, $\hat\mu$ and $\hat\Sigma$ are given on the Priors and Posteriors I slide.

    For Jeffreys' prior, $\mu_{te}$ is the BLUP and $\Sigma_{te}$ the associated prediction uncertainty.
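
    Putting the pieces together in R: a sketch that assembles the t predictive distribution above from the conditional-GP blocks and the posterior quantities (a1, b1, mu_hat, Sigma_hat); names remain illustrative.

    ```r
    # Student-t predictive distribution for the test outputs
    gp_predictive <- function(Fte, Ftr, Rte, Rtetr, Rtr, ytr, a1, b1, mu_hat, Sigma_hat) {
      Hte <- Fte - Rtetr %*% solve(Rtr, Ftr)
      mu_te <- Fte %*% mu_hat + Rtetr %*% solve(Rtr, ytr - Ftr %*% mu_hat)
      Sigma_te <- Rte - Rtetr %*% solve(Rtr, t(Rtetr)) + Hte %*% Sigma_hat %*% t(Hte)
      # p(y_te | y_tr) is multivariate t with 2*a1 degrees of freedom,
      # location mu_te, and scale matrix (b1 / a1) * Sigma_te
      list(df = 2 * a1, location = mu_te, scale = (b1 / a1) * Sigma_te)
    }
    ```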

  • 8

    Bayesian Prediction: Uncertain Correlation Parameters

    Parametric correlation functions:

    •  $\phi$ denotes the uncertain correlation function parameters.
    •  For example, consider the Gaussian correlation function parameterized by correlation lengths $0 < \rho_i < 1$:

    $$\phi = (\rho_1, \ldots, \rho_k), \qquad R(u, v \mid \rho) = \exp\left[ 4 \sum_{i=1}^{k} \log(\rho_i) \, (u_i - v_i)^2 \right]; \quad u, v \in [0, 1]^k$$

    Predictive distribution:

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi) \, \pi(\phi \mid y^s_{tr}) \, d\phi$$

    Here $p(y^s_{te} \mid y^s_{tr}, \phi)$ is obtained from the results on the previous slide, and $\pi(\phi \mid y^s_{tr})$ from the results on the next slide.
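
    A sketch of this Gaussian correlation function in R, building the correlation matrix between two input sets U (n x k) and V (m x k) with inputs scaled to [0, 1]^k; names are illustrative.

    ```r
    # Gaussian correlation matrix R(u, v | rho) = exp[4 * sum_i log(rho_i) * (u_i - v_i)^2]
    gauss_corr <- function(U, V, rho) {
      R <- matrix(1, nrow(U), nrow(V))
      for (i in seq_along(rho)) {
        D2 <- outer(U[, i], V[, i], "-")^2   # squared coordinate-wise distances
        R <- R * exp(4 * log(rho[i]) * D2)   # product over input dimensions
      }
      R
    }
    ```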

  • 9

    Bayesian Prediction: Correlation Posterior

    Correlation parameter posterior distribution:

    $$\pi(\phi \mid y^s_{tr}) \propto \frac{\pi(\phi)}{b_1^{a_1} \, |R_{tr}|^{1/2} \, \left| F^T_{tr} R^{-1}_{tr} F_{tr} \right|^{1/2} \, |\Sigma_\pi|^{1/2}}$$

    For $\pi(\beta \mid \lambda) \propto 1$, use $|\Sigma_\pi| = 1$.

    Example: Gaussian correlation with $(\rho_1, \ldots, \rho_k)$ independent, $\rho_i \sim \text{Beta}(a_\rho, b_\rho)$; the choice $a_\rho = 1$, $b_\rho = 0.1$ encodes effect sparsity.

    $(\phi_1, \ldots, \phi_M)$ are sampled from $\pi(\phi \mid y^s_{tr})$ via MCMC.
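
    An R sketch of the (unnormalized) log posterior above for the noninformative case ($|\Sigma_\pi| = 1$), reusing the gauss_corr and case2_posterior sketches from earlier; X holds the training inputs, and all names are illustrative.

    ```r
    # Unnormalized log posterior of the correlation lengths rho
    log_post_rho <- function(rho, X, Ftr, ytr, a = 0, b = 0, a_rho = 1, b_rho = 0.1) {
      if (any(rho <= 0) || any(rho >= 1)) return(-Inf)   # stay inside (0, 1)^k
      Rtr <- gauss_corr(X, X, rho)
      post <- case2_posterior(Ftr, Rtr, ytr, a, b)
      XtX <- t(Ftr) %*% solve(Rtr, Ftr)
      sum(dbeta(rho, a_rho, b_rho, log = TRUE)) -        # log Beta prior
        post$a1 * log(post$b1) -                         # -a1 * log(b1)
        0.5 * as.numeric(determinant(Rtr)$modulus) -     # -0.5 * log|Rtr|
        0.5 * as.numeric(determinant(XtX)$modulus)       # -0.5 * log|F'R^{-1}F|
    }
    ```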

  • 10

    MCMC: Metropolis-Hastings

    Objective: generate a sample from the target $\pi(x)$.

    Algorithm:
    •  Repeat for j = 1, 2, ..., M
    •  Generate y from the proposal density q(x_j, ·) and u from Uniform(0, 1)
    •  If $u \le \alpha(x_j, y)$, set $x_{j+1} = y$
    •  Else, set $x_{j+1} = x_j$
    •  Return values $x_1, \ldots, x_M$

    $$\alpha(x, y) = \begin{cases} \min\left[ \dfrac{\pi(y) \, q(y, x)}{\pi(x) \, q(x, y)}, \, 1 \right], & \text{if } \pi(x) \, q(x, y) > 0 \\ 1, & \text{otherwise} \end{cases}$$

    Implementation (see the sketch after this list):
    •  Discard the initial m_0 samples as "burn-in"
    •  Metropolis: a symmetric proposal distribution q(y, x) = q(x, y) gives $\alpha(x, y) = \min\left[ \pi(y)/\pi(x), \, 1 \right]$
    •  The challenge is choosing q(x, ·) for effective "mixing"; rule-of-thumb target acceptance rates are 23.4% (multi-parameter), 44% (single parameter), and 57.4% (Langevin diffusion)
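
    A minimal random-walk Metropolis sketch in R for the algorithm above, using a symmetric Gaussian proposal so that $\alpha$ reduces to $\min[\pi(y)/\pi(x), 1]$; log_target is any function returning the log of the (unnormalized) target density, and x0 should lie in its support.

    ```r
    # Random-walk Metropolis sampler (symmetric proposal)
    metropolis <- function(log_target, x0, M, prop_sd = 0.1, burn_in = 0) {
      d <- length(x0)
      X <- matrix(NA_real_, M, d)
      x <- x0; lp <- log_target(x)
      for (j in 1:M) {
        y <- x + rnorm(d, sd = prop_sd)                       # propose y ~ q(x, .)
        lpy <- log_target(y)
        if (log(runif(1)) <= lpy - lp) { x <- y; lp <- lpy }  # accept with prob. alpha
        X[j, ] <- x
      }
      X[(burn_in + 1):M, , drop = FALSE]                      # discard burn-in samples
    }
    ```

    For the correlation posterior on the previous slide, one could pass log_post_rho (with the data arguments fixed in a closure) as log_target, tuning prop_sd toward the multi-parameter acceptance rate above.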

  • 11

    Bayesian Prediction: Estimation of Predictive Distribution

    Predictive distribution:

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \xi) \, \pi(\xi \mid y^s_{tr}) \, d\xi$$

    Predictive distribution estimation: given $(\xi_1, \ldots, \xi_M) \sim \pi(\xi \mid y^s_{tr})$ sampled via MCMC,

    $$p(y^s_{te} \mid y^s_{tr}) \approx \frac{1}{M} \sum_{i=1}^{M} p(y^s_{te} \mid y^s_{tr}, \xi_i)$$

    $$\mu^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \mu^i_{te}, \qquad \Sigma^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \left[ \Sigma^i_{te} + \left( \mu^i_{te} - \mu^s_{te} \right) \left( \mu^i_{te} - \mu^s_{te} \right)^T \right]$$

    In the previous example with uncertain correlation parameters ($\xi = \phi$), the estimated predictive distribution is a mixture of t-distributions.
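
    An R sketch of the mixture-moment formulas above, where mu_list and Sigma_list hold the per-draw conditional means $\mu^i_{te}$ and covariances $\Sigma^i_{te}$ (e.g., from gp_predictive evaluated at each MCMC draw); names are illustrative.

    ```r
    # Mean and covariance of the estimated (mixture) predictive distribution
    mixture_moments <- function(mu_list, Sigma_list) {
      M <- length(mu_list)
      mu_bar <- Reduce(`+`, mu_list) / M               # average of conditional means
      Sigma_bar <- Reduce(`+`, lapply(seq_len(M), function(i) {
        d <- mu_list[[i]] - mu_bar
        Sigma_list[[i]] + d %*% t(d)                   # within- plus between-draw spread
      })) / M
      list(mean = mu_bar, cov = Sigma_bar)
    }
    ```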

  • 12

    Bayesian Prediction: Damped Sine Example

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi) \, \pi(\phi \mid y^s_{tr}) \, d\phi$$

    Priors: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\phi \sim \text{Beta}(1, 0.1)$

    [Figures: data and posterior realizations of y versus x on [0, 1]; MAP and Bayes predictive standard errors versus x]

    $$\sigma^s_{te} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left[ \left( \sigma^i_{te} \right)^2 + \left( \mu^i_{te} - \mu^s_{te} \right)^2 \right] }$$

    There is small variation in the conditional means compared with the average conditional variances.

  • 13

    Bayesian Prediction: Sheet Metal Pockets Example

    •  6 code inputs
    •  60 training runs
    •  174 test runs

    Prior distributions: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\rho \sim \text{Beta}(5, 1)$

    [Figure: estimated Bayesian prediction standard error versus REML-EBLUP prediction standard error; panel also reports RMSPE = 17.3, r = 0.98]

    [Figure: Bayesian prediction of the 174 failure depths, y(x), versus observed failure depth; RMSPE = 17.9, r = 0.98]

    Bayesian prediction standard errors tend to be larger than REML-EBLUP standard errors.

    Frequentist coverage (nominal 95%): REML-EBLUP 61%, Bayes 71%

  • 14

    Bayesian Prediction: Summary

    Bayesian prediction is based on the predictive distribution $\pi(y_{new} \mid y_{current})$:
    •  Derive it analytically when possible
    •  Otherwise, generate realizations from the parameter posterior (via MCMC) and from conditional predictive samples

    Many MCMC algorithms are implemented in software:
    •  MCMCpack, mcmc, adaptMCMC, AMCMC in R: http://cran.r-project.org
    •  OpenBUGS, WinBUGS: http://www.mrc-bsu.cam.ac.uk/software/bugs/
    •  Delayed Rejection Adaptive Metropolis: http://helios.fmi.fi/~lainema/dram/

    The R package coda provides a suite of MCMC diagnostic tools.

  • 15

    References

    Andrieu, C. and Thoms, J. (2008). "A tutorial on adaptive MCMC," Statistics and Computing, 18, 343-373.

    Casella, G. and George, E. (1992). "Explaining the Gibbs sampler," The American Statistician, 46, 167-174.

    Chib, S. and Greenberg, E. (1995). "Understanding the Metropolis-Hastings algorithm," The American Statistician, 49, 327-335.

    Currin, C., Mitchell, T.J., Morris, M.D., and Ylvisaker, D. (1991). "Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments," Journal of the American Statistical Association, 86, 953-963.

    Haario, H., Laine, M., and Mira, A. (2006). "DRAM: Efficient adaptive MCMC," Statistics and Computing, 16, 339-354.

    O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, 2B, Bayesian Inference. London: Edward Arnold.

    Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.

    Santner, T.J., Williams, B.J., and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.