
  • Bayesian Prediction of Code Output
    ASA Albuquerque Chapter Short Course, October 2014

  • 2

    Abstract

    This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation of computer model output. The conditional predictive distribution for predicting unsampled code output is derived, followed by posterior distributions for GP regression and precision parameters based on two common prior distribution assumptions. These results lead to a description of the predictive distribution itself when the GP correlation parameters are assumed known. In the case of unknown correlation parameters, their posterior distribution is provided. Markov chain Monte Carlo (MCMC) techniques are introduced as a means of sampling this posterior, and this results in a Monte Carlo method of sampling the predictive distribution when the GP correlation parameters are unknown. Two examples of Bayesian prediction of unsampled code output are provided to illustrate the methods discussed.

  • 3

    Bayesian Prediction: Framework

    Inference is based on the predictive distribution.

    Training data: $y^s_{tr} = \left( x^{tr}_1, y^s(x^{tr}_1) \right), \ldots, \left( x^{tr}_n, y^s(x^{tr}_n) \right)$

    Test data: $y^s_{te} = \left( x^{te}_1, y^s(x^{te}_1) \right), \ldots, \left( x^{te}_{n_p}, y^s(x^{te}_{n_p}) \right)$

    Predictive distribution ($\xi$ collects all uncertain model parameters):

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te}, \xi \mid y^s_{tr}) \, d\xi = \int p(y^s_{te} \mid y^s_{tr}, \xi) \, \pi(\xi \mid y^s_{tr}) \, d\xi$$

    The conditional density $p(y^s_{te} \mid y^s_{tr}, \xi)$ is derived from process modeling assumptions (e.g., a Gaussian process); the parameter posterior $\pi(\xi \mid y^s_{tr})$ is derived analytically (conjugate prior) or sampled via MCMC.

  • 4

    Bayesian Prediction: Sampling Distribution

    Joint sampling distribution (test and training data), given $(\beta, \lambda, R)$:

    $$(Y^s_{te}, Y^s_{tr}) \equiv \left( Y^s(x^{te}_1), \ldots, Y^s(x^{te}_{n_p}), Y^s(x^{tr}_1), \ldots, Y^s(x^{tr}_n) \right) \sim N_{n_p+n}\left( F\beta, \lambda^{-1} R \right)$$

    $$F = \begin{pmatrix} F_{te} \\ F_{tr} \end{pmatrix} \;\; (n_p+n) \times k \text{ regression matrix}, \qquad R = \begin{pmatrix} R_{te} & R_{te,tr} \\ R^T_{te,tr} & R_{tr} \end{pmatrix} \;\; (n_p+n) \times (n_p+n) \text{ correlation matrix}$$

    Conditional distribution (test data given training data):

    $$m(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = F_{te}\beta + R_{te,tr} R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\beta \right)$$

    $$V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = \lambda^{-1} \left( R_{te} - R_{te,tr} R^{-1}_{tr} R^T_{te,tr} \right)$$

    $p(y^s_{te} \mid y^s_{tr}, \beta, \lambda)$ is $N_{n_p}\left( m(y^s_{te} \mid y^s_{tr}, \beta, \lambda), \, V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) \right)$
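
    A minimal R sketch of the conditional mean and covariance above, assuming the regression matrices (Ftr, Fte), correlation blocks (Rtr, Rte, Rtetr), training output ytr, and parameters (beta, lambda) have already been built; all object names here are illustrative, not from the slides.

    ```r
    # Conditional (kriging-type) mean and covariance of test output given training output
    gp_conditional <- function(Fte, Ftr, Rte, Rtetr, Rtr, ytr, beta, lambda) {
      resid <- ytr - Ftr %*% beta
      m <- Fte %*% beta + Rtetr %*% solve(Rtr, resid)       # m(y_te | y_tr, beta, lambda)
      V <- (Rte - Rtetr %*% solve(Rtr, t(Rtetr))) / lambda  # V(y_te | y_tr, beta, lambda)
      list(mean = m, cov = V)
    }
    ```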

  • 5

    Bayesian Prediction: Priors and Posteriors I

    Prior distributions, Case I (informative):

    $$\pi(\beta \mid \lambda) \text{ is } N_k\left( \beta_0, \lambda^{-1} \Sigma^{-1}_\beta \right), \qquad \pi(\lambda) \text{ is Gamma}(a, b)$$

    Posterior distributions:

    $$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \quad a_1 = (2a + n)/2$$

    $$b_1 = \left( 2b + \left( y^s_{tr} - F_{tr}\hat\beta \right)^T R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\beta \right) + \left( \hat\beta - \beta_0 \right)^T \Sigma^{-1}_\pi \left( \hat\beta - \beta_0 \right) \right) / 2$$

    where $\hat\beta = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} F^T_{tr} R^{-1}_{tr} y^s_{tr}$ and $\Sigma_\pi = \Sigma^{-1}_\beta + \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1}$.

    $$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\left( 2a_1, \hat\mu, b_1 \hat\Sigma / a_1 \right)$$

    where $\hat\Sigma^{-1} = \Sigma_\beta + F^T_{tr} R^{-1}_{tr} F_{tr}$ and $\hat\mu = \hat\Sigma \left[ \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right) \hat\beta + \Sigma_\beta \beta_0 \right]$.
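
    A sketch of the Case I posterior quantities in R, assuming the training regression matrix Ftr, correlation matrix Rtr, training output ytr, and prior hyperparameters (beta0, Sigma_beta, a, b) are given; names are illustrative.

    ```r
    # Posterior quantities for the informative (Case I) prior
    case1_posterior <- function(Ftr, Rtr, ytr, beta0, Sigma_beta, a, b) {
      n <- length(ytr)
      FtRinv <- t(Ftr) %*% solve(Rtr)                # F' R^{-1}
      XtX <- FtRinv %*% Ftr                          # F' R^{-1} F
      beta_hat <- solve(XtX, FtRinv %*% ytr)         # GLS estimate of beta
      Sigma_pi <- solve(Sigma_beta) + solve(XtX)
      resid <- ytr - Ftr %*% beta_hat
      a1 <- (2 * a + n) / 2
      b1 <- (2 * b + t(resid) %*% solve(Rtr, resid) +
             t(beta_hat - beta0) %*% solve(Sigma_pi, beta_hat - beta0)) / 2
      Sigma_hat <- solve(Sigma_beta + XtX)
      mu_hat <- Sigma_hat %*% (XtX %*% beta_hat + Sigma_beta %*% beta0)
      list(a1 = a1, b1 = as.numeric(b1), mu_hat = mu_hat, Sigma_hat = Sigma_hat)
    }
    ```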

  • 6

    Bayesian Prediction: Priors and Posteriors II

    Prior distributions, Case II (noninformative):

    $$\pi(\beta) \propto 1, \text{ independent of } \pi(\lambda), \text{ which is Gamma}(a, b)$$

    Posterior distributions:

    $$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \quad a_1 = (2a + n - k)/2$$

    $$b_1 = \left( 2b + \left( y^s_{tr} - F_{tr}\hat\beta \right)^T R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\beta \right) \right) / 2$$

    where $\hat\beta = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} F^T_{tr} R^{-1}_{tr} y^s_{tr}$.

    $$\pi(\beta \mid y^s_{tr}) \text{ is } T_k\left( 2a_1, \hat\beta, b_1 \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1} / a_1 \right)$$

    Results for Jeffreys' prior distribution $\pi(\beta, \lambda) \propto 1/\lambda$: set $a = b = 0$.
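
    The Case II quantities simplify accordingly; a short R sketch (same illustrative names as above), with Jeffreys' prior recovered by the defaults a = b = 0:

    ```r
    # Posterior quantities for the noninformative (Case II) prior
    case2_posterior <- function(Ftr, Rtr, ytr, a = 0, b = 0) {
      n <- length(ytr); k <- ncol(Ftr)
      FtRinv <- t(Ftr) %*% solve(Rtr)
      XtX <- FtRinv %*% Ftr
      beta_hat <- solve(XtX, FtRinv %*% ytr)
      resid <- ytr - Ftr %*% beta_hat
      a1 <- (2 * a + n - k) / 2
      b1 <- (2 * b + t(resid) %*% solve(Rtr, resid)) / 2
      list(a1 = a1, b1 = as.numeric(b1), beta_hat = beta_hat, XtX_inv = solve(XtX))
    }
    ```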

  • 7

    Bayesian Prediction: Predictive Distribution

    $$p(y^s_{te} \mid y^s_{tr}) \text{ is } T_{n_p}\left( 2a_1, \mu_{te}, b_1 \Sigma_{te} / a_1 \right)$$

    $$\mu_{te} = F_{te}\hat\mu + R_{te,tr} R^{-1}_{tr} \left( y^s_{tr} - F_{tr}\hat\mu \right)$$

    $$\Sigma_{te} = R_{te} - R_{te,tr} R^{-1}_{tr} R^T_{te,tr} + H_{te} \hat\Sigma H^T_{te}, \quad \text{where } H_{te} = F_{te} - R_{te,tr} R^{-1}_{tr} F_{tr}$$

    For Case II priors, $\hat\mu = \hat\beta$ and $\hat\Sigma = \left( F^T_{tr} R^{-1}_{tr} F_{tr} \right)^{-1}$.

    For Case I priors, $\hat\mu$ and $\hat\Sigma$ are given on the Priors and Posteriors I slide.

    For Jeffreys' prior, $\mu_{te}$ is the BLUP and $\Sigma_{te}$ the associated prediction uncertainty.
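
    Putting the pieces together in R: a sketch that assembles the t predictive distribution above from the conditional-GP blocks and the posterior quantities (a1, b1, mu_hat, Sigma_hat); names remain illustrative.

    ```r
    # Student-t predictive distribution for the test outputs
    gp_predictive <- function(Fte, Ftr, Rte, Rtetr, Rtr, ytr, a1, b1, mu_hat, Sigma_hat) {
      Hte <- Fte - Rtetr %*% solve(Rtr, Ftr)
      mu_te <- Fte %*% mu_hat + Rtetr %*% solve(Rtr, ytr - Ftr %*% mu_hat)
      Sigma_te <- Rte - Rtetr %*% solve(Rtr, t(Rtetr)) + Hte %*% Sigma_hat %*% t(Hte)
      # p(y_te | y_tr) is multivariate t with 2*a1 degrees of freedom,
      # location mu_te, and scale matrix (b1 / a1) * Sigma_te
      list(df = 2 * a1, location = mu_te, scale = (b1 / a1) * Sigma_te)
    }
    ```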

  • 8

    Bayesian Prediction: Uncertain Correlation Parameters

    Parametric correlation functions:

    •  $\phi$ denotes the uncertain correlation function parameters.
    •  For example, consider the Gaussian correlation function parameterized by correlation lengths $0 < \rho_i < 1$:

    $$\phi = (\rho_1, \ldots, \rho_k), \qquad R(u, v \mid \rho) = \exp\left[ 4 \sum_{i=1}^{k} \log(\rho_i) \, (u_i - v_i)^2 \right]; \quad u, v \in [0, 1]^k$$

    Predictive distribution:

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi) \, \pi(\phi \mid y^s_{tr}) \, d\phi$$

    Here $p(y^s_{te} \mid y^s_{tr}, \phi)$ is obtained from the results on the previous slide, and $\pi(\phi \mid y^s_{tr})$ from the results on the next slide.
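
    A sketch of this Gaussian correlation function in R, building the correlation matrix between two input sets U (n x k) and V (m x k) with inputs scaled to [0, 1]^k; names are illustrative.

    ```r
    # Gaussian correlation matrix R(u, v | rho) = exp[4 * sum_i log(rho_i) * (u_i - v_i)^2]
    gauss_corr <- function(U, V, rho) {
      R <- matrix(1, nrow(U), nrow(V))
      for (i in seq_along(rho)) {
        D2 <- outer(U[, i], V[, i], "-")^2   # squared coordinate-wise distances
        R <- R * exp(4 * log(rho[i]) * D2)   # product over input dimensions
      }
      R
    }
    ```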

  • 9

    Bayesian Prediction: Correlation Posterior

    Correlation parameter posterior distribution:

    $$\pi(\phi \mid y^s_{tr}) \propto \frac{\pi(\phi)}{b_1^{a_1} \, |R_{tr}|^{1/2} \, \left| F^T_{tr} R^{-1}_{tr} F_{tr} \right|^{1/2} \, |\Sigma_\pi|^{1/2}}$$

    For $\pi(\beta \mid \lambda) \propto 1$, use $|\Sigma_\pi| = 1$.

    Example: Gaussian correlation with $(\rho_1, \ldots, \rho_k)$ independent, $\rho_i \sim \text{Beta}(a_\rho, b_\rho)$; the choice $a_\rho = 1$, $b_\rho = 0.1$ encodes effect sparsity.

    $(\phi_1, \ldots, \phi_M)$ are sampled from $\pi(\phi \mid y^s_{tr})$ via MCMC.
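
    An R sketch of the (unnormalized) log posterior above for the noninformative case ($|\Sigma_\pi| = 1$), reusing the gauss_corr and case2_posterior sketches from earlier; X holds the training inputs, and all names are illustrative.

    ```r
    # Unnormalized log posterior of the correlation lengths rho
    log_post_rho <- function(rho, X, Ftr, ytr, a = 0, b = 0, a_rho = 1, b_rho = 0.1) {
      if (any(rho <= 0) || any(rho >= 1)) return(-Inf)   # stay inside (0, 1)^k
      Rtr <- gauss_corr(X, X, rho)
      post <- case2_posterior(Ftr, Rtr, ytr, a, b)
      XtX <- t(Ftr) %*% solve(Rtr, Ftr)
      sum(dbeta(rho, a_rho, b_rho, log = TRUE)) -        # log Beta prior
        post$a1 * log(post$b1) -                         # -a1 * log(b1)
        0.5 * as.numeric(determinant(Rtr)$modulus) -     # -0.5 * log|Rtr|
        0.5 * as.numeric(determinant(XtX)$modulus)       # -0.5 * log|F'R^{-1}F|
    }
    ```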

  • 10

    MCMC: Metropolis-Hastings

    Objective: generate a sample from the target $\pi(x)$.

    Algorithm:
    •  Repeat for j = 1, 2, ..., M
    •  Generate y from the proposal density q(x_j, ·) and u from Uniform(0, 1)
    •  If $u \le \alpha(x_j, y)$, set $x_{j+1} = y$
    •  Else, set $x_{j+1} = x_j$
    •  Return values $x_1, \ldots, x_M$

    $$\alpha(x, y) = \begin{cases} \min\left[ \dfrac{\pi(y) \, q(y, x)}{\pi(x) \, q(x, y)}, \, 1 \right], & \text{if } \pi(x) \, q(x, y) > 0 \\ 1, & \text{otherwise} \end{cases}$$

    Implementation (see the sketch after this list):
    •  Discard the initial m_0 samples as "burn-in"
    •  Metropolis: a symmetric proposal distribution q(y, x) = q(x, y) gives $\alpha(x, y) = \min\left[ \pi(y)/\pi(x), \, 1 \right]$
    •  The challenge is choosing q(x, ·) for effective "mixing"; rule-of-thumb target acceptance rates are 23.4% (multi-parameter), 44% (single parameter), and 57.4% (Langevin diffusion)
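
    A minimal random-walk Metropolis sketch in R for the algorithm above, using a symmetric Gaussian proposal so that $\alpha$ reduces to $\min[\pi(y)/\pi(x), 1]$; log_target is any function returning the log of the (unnormalized) target density, and x0 should lie in its support.

    ```r
    # Random-walk Metropolis sampler (symmetric proposal)
    metropolis <- function(log_target, x0, M, prop_sd = 0.1, burn_in = 0) {
      d <- length(x0)
      X <- matrix(NA_real_, M, d)
      x <- x0; lp <- log_target(x)
      for (j in 1:M) {
        y <- x + rnorm(d, sd = prop_sd)                       # propose y ~ q(x, .)
        lpy <- log_target(y)
        if (log(runif(1)) <= lpy - lp) { x <- y; lp <- lpy }  # accept with prob. alpha
        X[j, ] <- x
      }
      X[(burn_in + 1):M, , drop = FALSE]                      # discard burn-in samples
    }
    ```

    For the correlation posterior on the previous slide, one could pass log_post_rho (with the data arguments fixed in a closure) as log_target, tuning prop_sd toward the multi-parameter acceptance rate above.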

  • 11

    Bayesian Prediction: Estimation of Predictive Distribution

    Predictive distribution:

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \xi) \, \pi(\xi \mid y^s_{tr}) \, d\xi$$

    Predictive distribution estimation: given $(\xi_1, \ldots, \xi_M) \sim \pi(\xi \mid y^s_{tr})$ sampled via MCMC,

    $$p(y^s_{te} \mid y^s_{tr}) \approx \frac{1}{M} \sum_{i=1}^{M} p(y^s_{te} \mid y^s_{tr}, \xi_i)$$

    $$\mu^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \mu^i_{te}, \qquad \Sigma^s_{te} = \frac{1}{M} \sum_{i=1}^{M} \left[ \Sigma^i_{te} + \left( \mu^i_{te} - \mu^s_{te} \right) \left( \mu^i_{te} - \mu^s_{te} \right)^T \right]$$

    In the previous example with uncertain correlation parameters ($\xi = \phi$), the estimated predictive distribution is a mixture of t-distributions.
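
    An R sketch of the mixture-moment formulas above, where mu_list and Sigma_list hold the per-draw conditional means $\mu^i_{te}$ and covariances $\Sigma^i_{te}$ (e.g., from gp_predictive evaluated at each MCMC draw); names are illustrative.

    ```r
    # Mean and covariance of the estimated (mixture) predictive distribution
    mixture_moments <- function(mu_list, Sigma_list) {
      M <- length(mu_list)
      mu_bar <- Reduce(`+`, mu_list) / M               # average of conditional means
      Sigma_bar <- Reduce(`+`, lapply(seq_len(M), function(i) {
        d <- mu_list[[i]] - mu_bar
        Sigma_list[[i]] + d %*% t(d)                   # within- plus between-draw spread
      })) / M
      list(mean = mu_bar, cov = Sigma_bar)
    }
    ```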

  • 12

    Bayesian Prediction: Damped Sine Example

    $$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi) \, \pi(\phi \mid y^s_{tr}) \, d\phi$$

    Priors: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\phi \sim \text{Beta}(1, 0.1)$

    [Figures: data and posterior realizations of y versus x on [0, 1]; MAP and Bayes predictive standard errors versus x]

    $$\sigma^s_{te} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left[ \left( \sigma^i_{te} \right)^2 + \left( \mu^i_{te} - \mu^s_{te} \right)^2 \right] }$$

    There is small variation in the conditional means compared with the average conditional variances.

  • 13

    Bayesian Prediction: Sheet Metal Pockets Example

    •  6 code inputs
    •  60 training runs
    •  174 test runs

    Prior distributions: $\beta = 0$; $\lambda \sim \text{Gamma}(5, 5)$; $\rho \sim \text{Beta}(5, 1)$

    [Figure: estimated Bayesian prediction standard error versus REML-EBLUP prediction standard error; panel also reports RMSPE = 17.3, r = 0.98]

    [Figure: Bayesian prediction of the 174 failure depths, y(x), versus observed failure depth; RMSPE = 17.9, r = 0.98]

    Bayesian prediction standard errors tend to be larger than REML-EBLUP standard errors.

    Frequentist coverage (nominal 95%): REML-EBLUP 61%, Bayes 71%

  • 14

    Bayesian Prediction: Summary

    Bayesian prediction is based on the predictive distribution $\pi(y_{new} \mid y_{current})$:
    •  Derive it analytically when possible
    •  Otherwise, generate realizations from the parameter posterior (via MCMC) and from conditional predictive samples

    Many MCMC algorithms are implemented in software:
    •  MCMCpack, mcmc, adaptMCMC, AMCMC in R: http://cran.r-project.org
    •  OpenBUGS, WinBUGS: http://www.mrc-bsu.cam.ac.uk/software/bugs/
    •  Delayed Rejection Adaptive Metropolis: http://helios.fmi.fi/~lainema/dram/

    The R package coda provides a suite of MCMC diagnostic tools.

  • 15

    References

    Andrieu, C. and Thoms, J. (2008). "A tutorial on adaptive MCMC," Statistics and Computing, 18, 343-373.

    Casella, G. and George, E. (1992). "Explaining the Gibbs sampler," The American Statistician, 46, 167-174.

    Chib, S. and Greenberg, E. (1995). "Understanding the Metropolis-Hastings algorithm," The American Statistician, 49, 327-335.

    Currin, C., Mitchell, T.J., Morris, M.D., and Ylvisaker, D. (1991). "Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments," Journal of the American Statistical Association, 86, 953-963.

    Haario, H., Laine, M., and Mira, A. (2006). "DRAM: Efficient adaptive MCMC," Statistics and Computing, 16, 339-354.

    O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, 2B, Bayesian Inference. London: Edward Arnold.

    Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.

    Santner, T.J., Williams, B.J., and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.