prior and posterior predictive checking · visualization in bayesian workflow. estimation of human...

17
Prior and posterior predictive checking Bayesian Data Analysis, 3rd ed, Chapter 6 Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman (2018). Visualization in Bayesian workflow. Journal of the Royal Statistical Society Series A, accepted for publication as discussion paper. arXiv preprint arXiv:1709.01449. Graphical posterior predictive checks using the bayesplot package http://mc-stan.org/bayesplot/articles/graphical-ppcs.html demo demos rstan/ppc/poisson-ppc.Rmd Workflow with prior and posterior predictive checking https://betanalpha.github.io/assets/case studies/ principled bayesian workflow.html Aki.Vehtari@aalto.fi – @avehtari

Upload: others

Post on 22-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Prior and posterior predictive checking

Bayesian Data Analysis, 3rd ed, Chapter 6Jonah Gabry, Daniel Simpson, Aki Vehtari, MichaelBetancourt, and Andrew Gelman (2018). Visualization inBayesian workflow. Journal of the Royal Statistical SocietySeries A, accepted for publication as discussion paper.arXiv preprint arXiv:1709.01449.Graphical posterior predictive checks using the bayesplotpackagehttp://mc-stan.org/bayesplot/articles/graphical-ppcs.htmldemo demos rstan/ppc/poisson-ppc.RmdWorkflow with prior and posterior predictive checkinghttps://betanalpha.github.io/assets/case studies/principled bayesian workflow.html

[email protected] – @avehtari

Page 2: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Replicates vs. future observation

Predictive y is the next not yet observed possibleobservation. y rep refers to replicating the whole experiment(with same values of x) and obtaining as many replicatedobservations as in the original data.

[email protected] – @avehtari

Page 3: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Posterior predictive checking

Data yParameters θReplicated data y rep

assume that the data has been generated by a processwhich can be well described by the model M withparameters θreplicated data could be observed if the experiment wererepeatedreplace “true” data generating process by the model

p(y rep|y ,M) =

∫p(y rep|θ,M)p(θ|y ,M)dθ

Graphical checking: compare distribution plotsTest quantity (or discrepancy measure) T (y , θ)

summary quantity used to compare the observed data andreplicates from the predictive distribution

[email protected] – @avehtari

Page 4: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Posterior predictive checking

Data yParameters θReplicated data y rep

assume that the data has been generated by a processwhich can be well described by the model M withparameters θreplicated data could be observed if the experiment wererepeatedreplace “true” data generating process by the model

p(y rep|y ,M) =

∫p(y rep|θ,M)p(θ|y ,M)dθ

Graphical checking: compare distribution plots

Test quantity (or discrepancy measure) T (y , θ)summary quantity used to compare the observed data andreplicates from the predictive distribution

[email protected] – @avehtari

Page 5: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Posterior predictive checking

Data yParameters θReplicated data y rep

assume that the data has been generated by a processwhich can be well described by the model M withparameters θreplicated data could be observed if the experiment wererepeatedreplace “true” data generating process by the model

p(y rep|y ,M) =

∫p(y rep|θ,M)p(θ|y ,M)dθ

Graphical checking: compare distribution plotsTest quantity (or discrepancy measure) T (y , θ)

summary quantity used to compare the observed data andreplicates from the predictive distribution

[email protected] – @avehtari

Page 6: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Prior predictive checking

Similar to posterior predictive checking but without dataDraws from the prior predictive distribution are comparedto external information

[email protected] – @avehtari

Page 7: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

Gabry et al (2017). Visualization in Bayesian workflow.Estimation of human exposure to air pollution fromparticulate matter measuring less than 2.5 microns indiameter (PM2.5)A recent report estimated that PM2.5 is responsible for threemillion deaths worldwide each year (Shaddick et al, 2017)

[email protected] – @avehtari

Page 8: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

Gabry et al (2017). Visualization in Bayesian workflow.

Satellite estimates and ground monitor locations

[email protected] – @avehtari

Page 9: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

Gabry et al (2017). Visualization in Bayesian workflow.

[email protected] – @avehtari

Page 10: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

−1500 −1000 −500 0 500 1000log(PM2.5)

Prior predictive distribution with vague prior

[email protected] – @avehtari

Page 11: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

Pallastunturi fellsConcrete

Neutron star

−1500 −1000 −500 0 500 1000log(PM2.5)

Prior predictive distribution with vague prior

[email protected] – @avehtari

Page 12: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of prior predictive checking

Pallastunturi fells

Clean room ISO 6

Concrete

−20 −10 0 10 20 30log10(PM2.5)

Prior predictive distribution with weakly informative prior

[email protected] – @avehtari

Page 13: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of posterior predictive checkingPosterior predictive checking – predictive distributions

(a) Model 1 (b) Model 2 (c) Model 3

[email protected] – @avehtari

Page 14: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of posterior predictive checkingPosterior predictive checking – test statistic (skewness)

(a) Model 1 (b) Model 2 (c) Model 3

[email protected] – @avehtari

Page 15: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of posterior predictive checkingPosterior predictive checking – test statistic (median for groups)

(a) Model 1 (b) Model 2 (c) Model 3

[email protected] – @avehtari

Page 16: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Example of posterior predictive checkingLOO predictive checking – LOO-PIT

(a) Model 1 (b) Model 2 (c) Model 3

[email protected] – @avehtari

Page 17: Prior and posterior predictive checking · Visualization in Bayesian workflow. Estimation of human exposure to air pollution from particulate matter measuring less than 2.5 microns

Posterior predictive checking

demo demos rstan/ppc/poisson-ppc.Rmddata {

int<lower=1> N;int<lower=0> y [N ] ;

}parameters {

real<lower=0> lambda ;}model {

lambda ˜ exponential ( 0 . 2 ) ;y ˜ poisson ( lambda ) ;

}generated quant i t ies {

rea l l o g l i k [N ] ;i n t y rep [N ] ;for ( n in 1:N) {

y rep [ n ] = poisson rng ( lambda ) ;l o g l i k [ n ] = po isson lpmf ( y [ n ] | lambda ) ;}

}[email protected] – @avehtari