-
Bayesian Prediction of Code Output
ASA Albuquerque Chapter Short Course
October 2014
-
2
Abstract
This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation of computer model output. The conditional predictive distribution for predicting unsampled code output is derived, followed by posterior distributions for GP regression and precision parameters based on two common prior distribution assumptions. These results lead to a description of the predictive distribution itself when the GP correlation parameters are assumed known. In the case of unknown correlation parameters, their posterior distribution is provided. Markov chain Monte Carlo (MCMC) techniques are introduced as a means of sampling this posterior, yielding a Monte Carlo method for sampling the predictive distribution when the GP correlation parameters are unknown. Two examples of Bayesian prediction of unsampled code output illustrate the methods discussed.
-
3
Bayesian Prediction: Framework
Inference is based on the predictive distribution.

Training data:
$$y^s_{tr} = \big((x_{tr,1}, y^s(x_{tr,1})), \ldots, (x_{tr,n}, y^s(x_{tr,n}))\big)$$

Test data:
$$y^s_{te} = \big((x_{te,1}, y^s(x_{te,1})), \ldots, (x_{te,n_p}, y^s(x_{te,n_p}))\big)$$

Predictive distribution ($\xi$ collects all uncertain model parameters):
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te}, \xi \mid y^s_{tr})\, d\xi = \int p(y^s_{te} \mid y^s_{tr}, \xi)\, \pi(\xi \mid y^s_{tr})\, d\xi$$

- $p(y^s_{te} \mid y^s_{tr}, \xi)$: derived from process modeling assumptions, e.g., a Gaussian process
- $\pi(\xi \mid y^s_{tr})$: derived analytically (conjugate prior) or sampled via MCMC
-
4
Bayesian Prediction: Sampling Distribution
Joint Sampling Distribution (test and training data), given $(\beta, \lambda, R)$:
$$(Y^s_{te}, Y^s_{tr}) \equiv \big(Y^s(x_{te,1}), \ldots, Y^s(x_{te,n_p}),\; Y^s(x_{tr,1}), \ldots, Y^s(x_{tr,n})\big) \sim N_{n_p+n}\big(F\beta,\; \lambda^{-1} R\big)$$

$$F = \begin{pmatrix} F_{te} \\ F_{tr} \end{pmatrix} \;\; (n_p+n) \times k \text{ regression matrix}, \qquad R = \begin{pmatrix} R_{te} & R_{te,tr} \\ R_{te,tr}^T & R_{tr} \end{pmatrix} \;\; (n_p+n) \times (n_p+n) \text{ correlation matrix}$$

Conditional Distribution (test data given training data):
$$m(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = F_{te}\beta + R_{te,tr} R_{tr}^{-1}\big(y^s_{tr} - F_{tr}\beta\big)$$
$$V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = \lambda^{-1}\big(R_{te} - R_{te,tr} R_{tr}^{-1} R_{te,tr}^T\big)$$

$p(y^s_{te} \mid y^s_{tr}, \beta, \lambda)$ is $N_{n_p}\big(m(y^s_{te} \mid y^s_{tr}, \beta, \lambda),\; V(y^s_{te} \mid y^s_{tr}, \beta, \lambda)\big)$
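The conditional mean and covariance above translate directly into code. A minimal sketch in Python with NumPy (the function name and the toy matrices are illustrative, not from the slides):

```python
import numpy as np

def gp_conditional(y_tr, F_te, F_tr, R_te, R_te_tr, R_tr, beta, lam):
    """Conditional mean m(.) and covariance V(.) of the test outputs
    given training data, for known (beta, lambda, R)."""
    A = np.linalg.solve(R_tr, R_te_tr.T).T       # R_te,tr R_tr^{-1}
    m = F_te @ beta + A @ (y_tr - F_tr @ beta)   # conditional mean
    V = (R_te - A @ R_te_tr.T) / lam             # conditional covariance
    return m, V

# Toy example: n = 2 training points, one test point, constant regression
y_tr = np.array([1.0, 2.0])
F_tr = np.ones((2, 1)); F_te = np.ones((1, 1))
R_tr = np.array([[1.0, 0.5], [0.5, 1.0]])
R_te_tr = np.array([[0.3, 0.3]]); R_te = np.array([[1.0]])
m, V = gp_conditional(y_tr, F_te, F_tr, R_te, R_te_tr, R_tr,
                      beta=np.array([1.5]), lam=2.0)
```

Using `solve` rather than forming $R_{tr}^{-1}$ explicitly is the standard numerically stable choice.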
-
5
Bayesian Prediction: Priors and Posteriors I
Prior Distributions: Case I (Informative)
$$\pi(\beta \mid \lambda) \text{ is } N_k\big(\beta_0,\; \lambda^{-1}\Sigma_\beta^{-1}\big), \qquad \pi(\lambda) \text{ is Gamma}(a, b)$$

Posterior Distributions
$$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \qquad a_1 = (2a + n)/2$$
$$b_1 = \Big(2b + \big(y^s_{tr} - F_{tr}\hat{\beta}\big)^T R_{tr}^{-1}\big(y^s_{tr} - F_{tr}\hat{\beta}\big) + \big(\hat{\beta} - \beta_0\big)^T \Sigma_\pi^{-1}\big(\hat{\beta} - \beta_0\big)\Big)/2$$
where
$$\hat{\beta} = \big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)^{-1} F_{tr}^T R_{tr}^{-1} y^s_{tr} \quad \text{and} \quad \Sigma_\pi = \Sigma_\beta^{-1} + \big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)^{-1}$$

$$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\big(2a_1,\; \hat{\mu},\; b_1\hat{\Sigma}/a_1\big)$$
where
$$\hat{\Sigma}^{-1} = \Sigma_\beta + F_{tr}^T R_{tr}^{-1} F_{tr} \quad \text{and} \quad \hat{\mu} = \hat{\Sigma}\Big[\big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)\hat{\beta} + \Sigma_\beta \beta_0\Big]$$
-
6
Bayesian Prediction: Priors and Posteriors II
Prior Distributions: Case II (Noninformative)
$$\pi(\beta) \propto 1, \text{ independent of } \pi(\lambda), \text{ which is Gamma}(a, b)$$

Posterior Distributions
$$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \qquad a_1 = (2a + n - k)/2$$
$$b_1 = \Big(2b + \big(y^s_{tr} - F_{tr}\hat{\beta}\big)^T R_{tr}^{-1}\big(y^s_{tr} - F_{tr}\hat{\beta}\big)\Big)/2, \qquad \hat{\beta} = \big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)^{-1} F_{tr}^T R_{tr}^{-1} y^s_{tr}$$

$$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\Big(2a_1,\; \hat{\beta},\; b_1\big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)^{-1}/a_1\Big)$$

Results for Jeffreys' prior distribution $\pi(\beta, \lambda) \propto 1/\lambda$: set $a = b = 0$.
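The Case II posterior quantities $\hat{\beta}$, $a_1$, $b_1$ reduce to generalized least squares plus two scalars. A sketch assuming NumPy (names are illustrative):

```python
import numpy as np

def case2_posterior(y_tr, F_tr, R_tr, a=0.0, b=0.0):
    """GLS estimate beta_hat and the Gamma(a1, b1) posterior parameters
    for lambda under the Case II prior; a = b = 0 gives Jeffreys' prior."""
    n, k = F_tr.shape
    Ri_F = np.linalg.solve(R_tr, F_tr)                     # R_tr^{-1} F_tr
    beta_hat = np.linalg.solve(F_tr.T @ Ri_F, Ri_F.T @ y_tr)
    resid = y_tr - F_tr @ beta_hat
    a1 = (2 * a + n - k) / 2
    b1 = (2 * b + resid @ np.linalg.solve(R_tr, resid)) / 2
    return beta_hat, a1, b1

# Jeffreys' prior with R_tr = I: reduces to ordinary least squares
beta_hat, a1, b1 = case2_posterior(np.array([1.0, 2.0, 3.0]),
                                   np.ones((3, 1)), np.eye(3))
```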
-
7
Bayesian Prediction: Predictive Distribution
Predictive Distribution
$$p(y^s_{te} \mid y^s_{tr}) \text{ is } T_{n_p}\big(2a_1,\; \mu_{te},\; b_1\Sigma_{te}/a_1\big)$$
$$\mu_{te} = F_{te}\hat{\mu} + R_{te,tr} R_{tr}^{-1}\big(y^s_{tr} - F_{tr}\hat{\mu}\big)$$
$$\Sigma_{te} = R_{te} - R_{te,tr} R_{tr}^{-1} R_{te,tr}^T + H_{te}\hat{\Sigma}H_{te}^T, \quad \text{where } H_{te} = F_{te} - R_{te,tr} R_{tr}^{-1} F_{tr}$$

- For Case II priors, $\hat{\mu} = \hat{\beta}$ and $\hat{\Sigma} = \big(F_{tr}^T R_{tr}^{-1} F_{tr}\big)^{-1}$
- For Case I priors, $\hat{\mu}$ and $\hat{\Sigma}$ are given on slide 5
- For Jeffreys' prior, $\mu_{te}$ is the BLUP and $\Sigma_{te}$ the associated prediction uncertainty
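Assembling $\mu_{te}$ and $\Sigma_{te}$ under Case II priors is a few lines of linear algebra. A sketch assuming NumPy (names illustrative):

```python
import numpy as np

def predictive_t_params(y_tr, F_te, F_tr, R_te, R_te_tr, R_tr):
    """Location mu_te and scale matrix Sigma_te of the T_np predictive
    distribution, with mu_hat = beta_hat and Sigma_hat = (F'R^{-1}F)^{-1}
    (Case II priors)."""
    Ri_F = np.linalg.solve(R_tr, F_tr)
    FtRiF = F_tr.T @ Ri_F
    mu_hat = np.linalg.solve(FtRiF, Ri_F.T @ y_tr)        # beta_hat
    Sig_hat = np.linalg.inv(FtRiF)
    A = np.linalg.solve(R_tr, R_te_tr.T).T                # R_te,tr R_tr^{-1}
    mu_te = F_te @ mu_hat + A @ (y_tr - F_tr @ mu_hat)
    H_te = F_te - A @ F_tr
    Sig_te = R_te - A @ R_te_tr.T + H_te @ Sig_hat @ H_te.T
    return mu_te, Sig_te

# Test point uncorrelated with training: mu_te = beta_hat,
# Sig_te = R_te + F_te Sig_hat F_te'
mu_te, Sig_te = predictive_t_params(np.array([1.0, 2.0, 3.0]),
                                    np.ones((1, 1)), np.ones((3, 1)),
                                    np.array([[1.0]]), np.zeros((1, 3)),
                                    np.eye(3))
```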
-
8
Bayesian Prediction: Uncertain Correlation Parameters
Parametric Correlation Functions
- $\phi$ denotes the uncertain correlation function parameters.
- For example, consider the Gaussian correlation function parameterized by correlation lengths $0 < \rho_i < 1$, with $\phi = (\rho_1, \ldots, \rho_k)$:
$$R(u, v \mid \rho) = \exp\Big[4 \sum_{i=1}^{k} \log(\rho_i)\,(u_i - v_i)^2\Big]; \qquad u, v \in [0, 1]^k$$

Predictive Distribution
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi)\, \pi(\phi \mid y^s_{tr})\, d\phi$$
- $p(y^s_{te} \mid y^s_{tr}, \phi)$: obtained from the results on slide 7
- $\pi(\phi \mid y^s_{tr})$: obtained from the results on the next slide
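With this parameterization, $\rho_i$ is the correlation between outputs at two inputs that differ by $1/2$ in coordinate $i$ alone, and $\rho_i$ near 1 makes input $i$ nearly inactive; a prior concentrated near 1 therefore encodes effect sparsity. A sketch of the correlation function (assuming NumPy):

```python
import numpy as np

def gauss_corr(u, v, rho):
    """R(u, v | rho) = exp[ 4 * sum_i log(rho_i) * (u_i - v_i)^2 ],
    with 0 < rho_i < 1 and u, v in [0, 1]^k."""
    u, v, rho = map(np.asarray, (u, v, rho))
    return float(np.exp(4.0 * np.sum(np.log(rho) * (u - v) ** 2)))
```

For inputs differing by 1/2 in coordinate $i$ only, the exponent is $4 \log(\rho_i) \cdot (1/2)^2 = \log(\rho_i)$, so the correlation is exactly $\rho_i$.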
-
9
Bayesian Prediction: Correlation Posterior
Correlation Parameter Posterior Distribution
$$\pi(\phi \mid y^s_{tr}) \propto \frac{\pi(\phi)}{b_1^{a_1}\, |R_{tr}|^{1/2}\, \big|F_{tr}^T R_{tr}^{-1} F_{tr}\big|^{1/2}\, |\Sigma_\pi|^{1/2}}$$
For $\pi(\beta \mid \lambda) \propto 1$, use $|\Sigma_\pi| = 1$.

$(\phi_1, \ldots, \phi_M)$ sampled from $\pi(\phi \mid y^s_{tr})$ via MCMC

Example: Gaussian correlation with $(\rho_1, \ldots, \rho_k)$ independent,
$$\rho_i \sim \text{Beta}(a_\rho, b_\rho); \qquad a_\rho = 1,\; b_\rho = 0.1 \implies \text{effect sparsity}$$
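MCMC needs only the unnormalized posterior, so in practice one evaluates its log. A sketch for the flat $\pi(\beta \mid \lambda)$ case ($|\Sigma_\pi| = 1$) with Gaussian correlation, assuming NumPy (names and the toy inputs are illustrative):

```python
import numpy as np

def log_corr_posterior(rho, y_tr, F_tr, X_tr, log_prior, a=0.0, b=0.0):
    """log pi(phi | y_tr) up to an additive constant, for the flat
    pi(beta | lambda) prior (|Sigma_pi| = 1) and Gaussian correlation."""
    n, k = F_tr.shape
    # Gaussian correlation matrix over the training inputs X_tr (n x k)
    D2 = (X_tr[:, None, :] - X_tr[None, :, :]) ** 2
    R_tr = np.exp(4.0 * D2 @ np.log(rho))
    Ri_F = np.linalg.solve(R_tr, F_tr)
    FtRiF = F_tr.T @ Ri_F
    beta_hat = np.linalg.solve(FtRiF, Ri_F.T @ y_tr)
    resid = y_tr - F_tr @ beta_hat
    a1 = (2 * a + n - k) / 2
    b1 = (2 * b + resid @ np.linalg.solve(R_tr, resid)) / 2
    _, logdet_R = np.linalg.slogdet(R_tr)
    _, logdet_FtRiF = np.linalg.slogdet(FtRiF)
    return (log_prior(rho) - a1 * np.log(b1)
            - 0.5 * logdet_R - 0.5 * logdet_FtRiF)

# Toy evaluation: k = 1, three training points on [0, 1], flat prior
val = log_corr_posterior(np.array([0.5]), np.array([0.0, 1.0, 0.5]),
                         np.ones((3, 1)), np.array([[0.0], [0.5], [1.0]]),
                         log_prior=lambda r: 0.0)
```

`slogdet` avoids overflow/underflow in the determinants for larger training sets.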
-
10
MCMC: Metropolis-Hastings

Objective: generate a sample from the target $\pi(x)$

Algorithm:
- Repeat for $j = 1, 2, \ldots, M$:
  - Generate $y$ from the proposal density $q(x_j, \cdot)$ and $u$ from Uniform(0, 1)
  - If $u \le \alpha(x_j, y)$, set $x_{j+1} = y$; else set $x_{j+1} = x_j$, where
$$\alpha(x, y) = \begin{cases} \min\Big[\dfrac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)},\; 1\Big], & \text{if } \pi(x)\, q(x, y) > 0 \\ 1, & \text{otherwise} \end{cases}$$
- Return the values $x_1, \ldots, x_M$

Implementation:
- Discard the initial $m_0$ samples as "burn-in"
- Metropolis: a symmetric proposal distribution $q(y, x) = q(x, y)$ gives $\alpha(x, y) = \min\big[\pi(y)/\pi(x),\; 1\big]$
- The challenge is choosing $q(x, \cdot)$ for effective "mixing"
- Optimal acceptance rates: 23.4% (multi-parameter), 44% (single parameter), 57.4% (Langevin diffusion)
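The algorithm above, specialized to a symmetric random-walk proposal, fits in a few lines of pure Python (the standard-normal target and step size below are illustrative):

```python
import math
import random

def metropolis(log_target, x0, step, M, burn_in=0):
    """Random-walk Metropolis: q(x, .) = Normal(x, step^2) is symmetric,
    so alpha(x, y) = min[pi(y)/pi(x), 1]."""
    x, lp_x, chain = x0, log_target(x0), []
    for j in range(M + burn_in):
        y = x + random.gauss(0.0, step)               # propose
        lp_y = log_target(y)
        if math.log(random.random()) < lp_y - lp_x:   # accept w.p. alpha
            x, lp_x = y, lp_y
        if j >= burn_in:                              # discard burn-in
            chain.append(x)
    return chain

# Standard normal target; a step near 2.4 mixes well in one dimension
random.seed(0)
draws = metropolis(lambda x: -0.5 * x * x, 0.0, 2.4, M=20000, burn_in=1000)
```

Working on the log scale avoids underflow in $\pi(y)/\pi(x)$ for peaked targets.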
-
11
Bayesian Prediction: Estimation of Predictive Distribution

Predictive Distribution
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \xi)\, \pi(\xi \mid y^s_{tr})\, d\xi$$

Predictive Distribution Estimation
Given $(\xi_1, \ldots, \xi_M) \sim \pi(\xi \mid y^s_{tr})$ sampled via MCMC,
$$p(y^s_{te} \mid y^s_{tr}) \approx \frac{1}{M} \sum_{i=1}^{M} p(y^s_{te} \mid y^s_{tr}, \xi_i)$$
with mixture mean and covariance
$$\mu^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \mu^i_{te}, \qquad \Sigma^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \Big[\Sigma^i_{te} + \big(\mu^i_{te} - \mu^s_{te}\big)\big(\mu^i_{te} - \mu^s_{te}\big)^T\Big]$$

In the setting with uncertain correlation parameters ($\xi = \phi$), the estimated predictive distribution is a mixture of t-distributions.
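The mixture mean and covariance above can be computed directly from the per-draw conditional moments. A sketch assuming NumPy (names illustrative):

```python
import numpy as np

def mixture_moments(mus, Sigmas):
    """mu_te^s and Sigma_te^s of the equally weighted mixture of the M
    conditional predictive distributions."""
    mus = np.asarray(mus)        # (M, n_p) conditional means
    Sigmas = np.asarray(Sigmas)  # (M, n_p, n_p) conditional covariances
    mu_s = mus.mean(axis=0)
    dev = mus - mu_s
    # average within-draw covariance + between-draw spread of the means
    Sigma_s = Sigmas.mean(axis=0) + np.einsum('mi,mj->ij', dev, dev) / len(mus)
    return mu_s, Sigma_s

# Two components with unit variance and means 0 and 2
mu_s, Sigma_s = mixture_moments([[0.0], [2.0]], [[[1.0]], [[1.0]]])
```

The second term is why predictive uncertainty can exceed any single conditional variance: it adds the spread of the conditional means across posterior draws.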
-
12
Bayesian Prediction: Damped Sine Example
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi)\, \pi(\phi \mid y^s_{tr})\, d\phi$$

Priors: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\phi \sim \text{Beta}(1, 0.1)$

[Figure: left panel shows the data, posterior realizations, and the MAP and Bayes predictions of $y$ over $x \in [0, 1]$; right panel shows the predictive standard error as a function of $x$.]

$$\sigma^s_{te} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \Big[\big(\sigma^i_{te}\big)^2 + \big(\mu^i_{te} - \mu^s_{te}\big)^2\Big]}$$

There is small variation in the conditional means compared with the average conditional variances.
-
13
Bayesian Prediction: Sheet Metal Pockets Example

- 6 code inputs
- 60 training runs
- 174 test runs

Prior Distributions: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\rho \sim \text{Beta}(5, 1)$

[Figure: Bayesian prediction standard error versus REML-EBLUP prediction standard error for the test runs; the slide annotates RMSPE = 17.3, r = 0.98.]

- Frequentist coverage (nominal 95%): REML-EBLUP 61%, Bayes 71%
- Bayesian prediction standard errors tend to be larger than REML-EBLUP standard errors

[Figure: Bayesian prediction of the 174 failure depths, observed failure depth versus Bayesian prediction of y(x); RMSPE = 17.9, r = 0.98.]
-
14
Bayesian Prediction: Summary

Bayesian prediction is based on the predictive distribution $\pi(y_{new} \mid y_{current})$:
- Derive it analytically when possible
- Otherwise, generate realizations from the parameter posterior (MCMC) and from conditional predictive samples

Many MCMC algorithms are implemented in software:
- MCMCpack, mcmc, adaptMCMC, AMCMC in R: http://cran.r-project.org
- OpenBUGS, WinBUGS: http://www.mrc-bsu.cam.ac.uk/software/bugs/
- Delayed Rejection Adaptive Metropolis: http://helios.fmi.fi/~lainema/dram/

The R package coda provides a suite of MCMC diagnostic tools.
-
15
References

Andrieu, C. and Thoms, J. (2008). "A tutorial on adaptive MCMC," Statistics and Computing, 18, 343-373.
Casella, G. and George, E. (1992). "Explaining the Gibbs sampler," The American Statistician, 46, 167-174.
Chib, S. and Greenberg, E. (1995). "Understanding the Metropolis-Hastings algorithm," The American Statistician, 49, 327-335.
Currin, C., Mitchell, T.J., Morris, M.D., and Ylvisaker, D. (1991). "Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments," Journal of the American Statistical Association, 86, 953-963.
Haario, H., Laine, M., and Mira, A. (2006). "DRAM: Efficient adaptive MCMC," Statistics and Computing, 16, 339-354.
O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, 2B, Bayesian Inference. London: Edward Arnold.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.
Santner, T.J., Williams, B.J., and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.