
Page 1

Introduction to Bayesian methods - II

Rita Almeida

2nd of February, 2016

Page 2

Overview

Previous part:

Bayes theorem

Examples with discrete and continuous normal variables

Posteriors, priors and likelihood

Comparison with frequentist approaches

This part:

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 3

Overview

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 4

Bayesian estimation

The posterior represents all the information about θ.

Sometimes one wants to provide a single value - an estimator.

For δ(Y) to be a good estimator of θ, the probability of δ(Y) − θ being close to 0 must be high.

Let L(θ, a) be the loss when the estimate is a and the true value is θ.

• A common loss function is L(θ, a) = (θ − a)².

The Bayes estimator δ*(y) of θ satisfies:

E[L(θ, δ*(y)) | y] = min_{a ∈ Ω} E[L(θ, a) | y]

Page 5

Bayesian estimation - example

Someone answers a test with 12 equally difficult true-or-false questions. The answers are independent; 9 are correct.

Estimate the probability of answering a question correctly.

Likelihood: binomial with n = 12, y = 9

p(y | θ) = (n choose y) θ^y (1 − θ)^(n−y)

Prior: uniform - Beta distribution with parameters α0 = 1, β0 = 1

• The mean of a Beta distribution with parameters α, β is α/(α + β).

Figure adapted from Wagenmakers 2007

Page 6

Bayesian estimation - example

Posterior: Beta with parameters α = α0 + y = 10, β = β0 + n − y = 4 (n = 12)

p(θ | y = 9) = 13 · (12 choose 9) · θ^9 (1 − θ)^3

Considering L(θ, a) = (θ − a)²:

δ*(y) = (α0 + y) / (α0 + y + β0 + n − y) = 10/14 ≈ 0.71

Figure adapted from Wagenmakers 2007

The estimator is calculated from the posterior.
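As a sanity check, the whole calculation fits in a few lines of Python. This is a sketch added here, not part of the original slides; it uses scipy's Beta distribution and a crude grid search to confirm that, under squared-error loss, no other value of a beats the posterior mean.

```python
# A minimal sketch of the slide's calculation: posterior Beta(10, 4) for
# the 12-question example and the Bayes estimator under squared-error loss.
import numpy as np
from scipy import stats

alpha0, beta0 = 1, 1                      # uniform prior Beta(1, 1)
n, y = 12, 9                              # 12 questions, 9 correct

alpha, beta = alpha0 + y, beta0 + n - y   # posterior Beta(10, 4)
posterior = stats.beta(alpha, beta)

# The Bayes estimator under L(theta, a) = (theta - a)^2 is the posterior mean.
print(posterior.mean())                   # 10/14 = 0.714...

# Numerical check: the posterior mean minimizes E[(theta - a)^2 | y].
samples = posterior.rvs(size=100_000, random_state=0)
grid = np.linspace(0.01, 0.99, 99)
expected_loss = [np.mean((samples - a) ** 2) for a in grid]
print(grid[np.argmin(expected_loss)])     # close to 0.71
```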

Page 7

Bayesian estimation - example with normal

Suppose we have a random sample y1, y2, ..., yn from a normal distribution with unknown mean θ and known variance σ².

Considering a normal prior with mean µ and variance ν²: p(θ) is N(µ, ν²)

The posterior becomes:

θ | y1, y2, ..., yn ∼ N(µ1, ν1²), with µ1 = w µ + (1 − w) ȳn,

where ȳn is the sample mean and w = σ² / (σ² + n ν²), so the prior mean counts less as n grows.

Considering L(θ, a) = (θ − a)²: δ*(y) = µ1
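A minimal sketch of this update on simulated data (the data, the true mean, and the prior values are assumed for illustration, not from the slides):

```python
# Normal-normal conjugate update: known variance sigma^2, prior N(mu, nu^2).
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0                            # known data variance
mu, nu2 = 0.0, 1.0                      # prior mean and variance
y = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=50)  # simulated sample
n, ybar = len(y), y.mean()

w = sigma2 / (sigma2 + n * nu2)         # weight on the prior mean
mu1 = w * mu + (1 - w) * ybar           # posterior mean = Bayes estimator
nu1_sq = 1.0 / (1.0 / nu2 + n / sigma2) # posterior variance
print(mu1, nu1_sq)
```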

Page 8

Overview

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 9

Bayesian testing

Hypotheses can be described as prior beliefs: p(H0), p(H1), p(H2), ...

Posteriors after observing the data (2 hypotheses): p(H0|y), p(H1|y)

Posterior odds in favor of H0 compared to H1:

p(H0|y) / p(H1|y) = [p(y|H0) / p(y|H1)] · [p(H0) / p(H1)]

posterior odds = Bayes factor (B) × prior odds

If H0 is the complement of H1:

p(H0|y) = ( (1/B) · p(H1)/p(H0) + 1 )⁻¹

If p(H0) = p(H1), one can report the Bayes factor.

Page 10

Bayesian testing - example

Someone answers a test with 12 equally difficult true-or-false questions. The answers are independent; 9 are correct.

What is the probability of random guessing?

H0: θ = 1/2, θ is fixed
H1: θ ≠ 1/2, θ ∼ Beta(1, 1)

The Bayes factor is:

B = p(y | θ = 1/2) / p(y | θ ∼ Beta(1,1)) = [(12 choose 9) (1/2)^12] / [∫₀¹ p(y|θ) p(θ) dθ] = 0.7

The data are 1/0.7 ≈ 1.4 times more likely under H1.

If the priors are equal, p(H0|y) = 0.41.

One determines p(H0|y) instead of p(y|H0)!
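The Bayes factor and p(H0|y) can be reproduced numerically. A minimal sketch (not from the slides), using scipy for the binomial pmf and for the integral over θ:

```python
# Bayes factor for the guessing example: H0 fixes theta = 1/2,
# H1 integrates theta out under a Beta(1, 1) prior.
from scipy import stats
from scipy.integrate import quad

n, y = 12, 9
# Marginal likelihood under H0: theta fixed at 1/2.
m0 = stats.binom.pmf(y, n, 0.5)
# Marginal likelihood under H1: integrate the likelihood against the prior.
m1, _ = quad(lambda t: stats.binom.pmf(y, n, t) * stats.beta.pdf(t, 1, 1),
             0.0, 1.0)
B = m0 / m1
print(B)                              # about 0.7
# With equal prior probabilities p(H0) = p(H1):
print(1.0 / (1.0 / B + 1.0))          # p(H0 | y) = B/(B+1), about 0.41
```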

Page 11

Summary

Advantages of Bayesian methods:

Focus on the data collected, not on the average over many hypothetical sets of data.

Logical / more intuitive

Frequentist hypothesis testing does not necessarily do what one would like:

• It does not give the probability of the null hypothesis.

• If the sample is large enough, a small effect becomes significant.

Principled way to take into consideration prior knowledge and incorporate new evidence.

Unified, flexible approach

Page 12

Overview

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 13

Bayesian model comparison

In general, for a model m:

p(m|y) = p(y|m) p(m) / p(y)

p(y|m) is the model evidence.

Hypotheses can be seen as models.

One can compute a Bayes factor Bij for two models mi and mj:

Bij = p(y|mi) / p(y|mj)

With equal priors for models i and j, Bij is enough to compare the models.

If Bij is large, model i is more likely than model j.

Page 14

Bayesian model comparison

For a model m with parameters θ:

p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)

The model evidence is:

p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ
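A minimal sketch of model comparison via the evidence (not from the slides): it reuses the binomial example and treats two hypothetical priors as the competing models, a flat Beta(1, 1) and a Beta(5, 5) concentrated near 1/2.

```python
# Model evidences p(y|m) by integrating theta out, then the Bayes factor Bij.
from scipy import stats
from scipy.integrate import quad

n, y = 12, 9

def evidence(a, b):
    """p(y | m) for a binomial likelihood with a Beta(a, b) prior on theta."""
    integrand = lambda t: stats.binom.pmf(y, n, t) * stats.beta.pdf(t, a, b)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

# Model i: flat Beta(1, 1) prior; model j: Beta(5, 5), concentrated near 1/2.
Bij = evidence(1, 1) / evidence(5, 5)
print(Bij)   # Bij > 1 would favor model i, Bij < 1 model j
```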

Page 15

Overview

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 16

Posterior distribution calculations

Bayesian inference is based on the posterior distribution and functions of the posterior distribution.

These calculations are often difficult.

Methods used:

Monte Carlo / sampling approximations: asymptotically exact, simple idea, computationally expensive.

• Markov Chain Monte Carlo (MCMC) sampling

Deterministic approximations: not exact, computationally efficient.

• Variational Bayes

Page 17

Monte Carlo methods

Idea: instead of calculating the posterior analytically, approximate it (or a function of it) by sampling.

Example:

g(θ) is a function of θ.

The aim is to estimate E[g(θ) | y], but p(θ|y) is not possible to integrate analytically.

Example from estimation: E[L(θ, a) | y], with L(θ, a) = (θ − a)².

One generates a sample of size n of θ: a set of independent and identically distributed draws θ1, θ2, ..., θn from p(θ|y).

E[g(θ)|y] is estimated by:

ḡ = (1/n) Σ_{i=1}^{n} g(θᵢ)
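A minimal sketch (not from the slides), reusing the Beta(10, 4) posterior from the estimation example; g is the squared-error loss, so the exact answer is available for comparison:

```python
# Monte Carlo approximation of E[g(theta) | y] with draws from Beta(10, 4).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta = stats.beta.rvs(10, 4, size=100_000, random_state=rng)

a = 0.71
g = (theta - a) ** 2                 # g(theta) = squared-error loss at a
print(g.mean())                      # Monte Carlo estimate of E[g(theta)|y]
# Exact value: Var(theta) + (E[theta] - a)^2.
print(stats.beta.var(10, 4) + (stats.beta.mean(10, 4) - a) ** 2)
```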

Page 18

Markov Chain Monte Carlo

Problem: how to generate a sample θ1, θ2, ..., θn from a posterior distribution p(θ|y)?

In general it is not simple to generate random numbers according to a given distribution.

Markov Chain Monte Carlo (MCMC) methods provide a way to generate the samples.

MCMC methods are based on constructing a Markov chain on the space of θ whose stationary distribution is the desired probability p(θ|y).

The Metropolis-Hastings algorithm and Gibbs sampling are MCMC methods.

Computationally expensive.
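A minimal random-walk Metropolis sketch (one variant of Metropolis-Hastings, added here for illustration): it targets the Beta(10, 4) posterior from the earlier example, known only up to a constant, so the output can be checked against the exact mean. The step size and chain length are arbitrary choices.

```python
# Random-walk Metropolis sampling from an unnormalized posterior.
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    """Log posterior density up to a constant: Beta(10, 4)."""
    if not 0.0 < theta < 1.0:
        return -np.inf               # zero density outside (0, 1)
    return 9 * np.log(theta) + 3 * np.log(1 - theta)

n_samples, step = 20_000, 0.2
theta = 0.5                          # starting value
chain = np.empty(n_samples)
for i in range(n_samples):
    proposal = theta + step * rng.normal()   # symmetric proposal
    # Accept with probability min(1, p(proposal|y) / p(theta|y)).
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal
    chain[i] = theta

print(chain[2000:].mean())           # close to 10/14 after burn-in
```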

Page 19

Variational Bayes

Idea: approximate the posterior distribution by a distribution that is easier to use.

Variational Bayes (VB) is a particular approach to approximate a distribution.

It is not exact.

It is less computationally expensive - fast.

Often difficult to derive.

It has been widely used in neuroimaging.

Page 20

Overview

Bayesian estimation

Bayesian testing

Bayesian model comparison

Methods to calculate the posterior

Bayesian linear and hierarchical linear models

Page 21

General linear model

Y = β1 X1 + β2 X2 + ... + βp Xp + ε

(y1, ..., yn)′ = β1 (x11, ..., xn1)′ + β2 (x12, ..., xn2)′ + ... + βp (x1p, ..., xnp)′ + (ε1, ..., εn)′

Y = Xβ + ε, ε ∼ N(0, σ²I)

One finds parameters β so that ŷ = β̂1 X1 + β̂2 X2 + ... is as close as possible to y.

The ordinary least squares estimator of β is:

β̂ = (X′X)⁻¹ X′y

If ε ∼ N(0, Σ), the generalized least squares estimator is:

β̂GLS = (X′Σ⁻¹X)⁻¹ X′Σ⁻¹y
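A minimal sketch of both estimators on simulated data (the design matrix, true coefficients, and covariance are assumed for illustration, not from the slides):

```python
# OLS and GLS estimates for Y = X beta + epsilon.
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS: (X'X)^{-1} X'y (solve the normal equations, no explicit inverse).
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS with a known covariance Sigma (here heteroscedastic, diagonal).
Sigma = np.diag(rng.uniform(0.25, 1.0, size=n))
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(beta_ols, beta_gls)
```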

Page 22

Bayesian analysis of the general linear model

Y = Xβ + ε, ε ∼ N(0, Σ)

Likelihood: p(y|β) ∼ N(Xβ, Σ)

Assuming a normal prior: p(β) ∼ N(β0, Σ0)

The posterior is normal: p(β|y) ∼ N(β*, Σp)

β* = (X′Σ⁻¹X + Σ0⁻¹)⁻¹ (X′Σ⁻¹y + Σ0⁻¹ β0)

If Σ0 is large one gets GLS: β* ≈ (X′Σ⁻¹X)⁻¹ X′Σ⁻¹y
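A minimal sketch of the posterior computation on simulated data (noise level and prior values assumed for illustration), showing that a diffuse prior gives a GLS-like estimate:

```python
# Posterior for beta in Y = X beta + epsilon with a N(beta0, Sigma0) prior.
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
Sigma_inv = np.eye(n) / 0.25          # known noise covariance Sigma = 0.25 I

beta0 = np.zeros(2)                   # prior mean
Sigma0_inv = np.linalg.inv(10.0 * np.eye(2))   # diffuse prior, Sigma0 = 10 I

Sigma_p = np.linalg.inv(X.T @ Sigma_inv @ X + Sigma0_inv)   # posterior cov.
beta_star = Sigma_p @ (X.T @ Sigma_inv @ y + Sigma0_inv @ beta0)
print(beta_star)   # close to the GLS estimate because Sigma0 is large
```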

Page 23

Bayesian analysis of the general linear model

Bayesian approach: calculate the probability φ that β exceeds some specific threshold γ given the data:

φ = p(β > γ | y)

Frequentist approach: p values reflect how probable the data are given that there is no effect:

p = p(f(y) > u | β = 0)

[Figure: posterior density of β with the area φ above the threshold γ; null distribution of t = f(y) with the area p above the threshold u.]
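Computing φ is a one-liner once the posterior is normal. A minimal sketch with hypothetical posterior values (the numbers are assumptions, not from the slides):

```python
# phi = p(beta > gamma | y) for a scalar coefficient with posterior
# N(beta_star, sd^2).
from scipy import stats

beta_star, sd = 2.0, 0.3     # assumed posterior mean and s.d. of beta
gamma = 1.5                  # assumed threshold of interest
phi = stats.norm.sf(gamma, loc=beta_star, scale=sd)   # P(beta > gamma | y)
print(phi)
```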

Page 24

Multilevel/hierarchical models

Multilevel / hierarchical data: data with a clustered structure.

• Example: repeated measures nested in subjects, subjects nested in medical doctors, doctors nested in hospitals.

For repeated measures in subjects one could consider:

taking one measure per subject
[Diagram: a single parameter θ generating one measure y1, ..., yk per subject]

analyzing each subject separately
[Diagram: separate parameters θ1, ..., θm, each generating that subject's measures yi,1, ..., yi,n]

Page 25

Multilevel/hierarchical models

Multilevel / hierarchical models can be used to analyze such data.

Some parameters will model the data within a cluster and others across clusters.

Combine information across groups, but allow group-specific characteristics.

• Example: each subject can have its specific response, but there is also a common aspect.

[Diagram: a hyperparameter π generating subject parameters θ1, ..., θm, each generating that subject's measures yi,1, ..., yi,n]

Page 26

Hierarchical Bayesian linear models

Linear models are very commonly used.

For clustered data, linear models can be written as hierarchical models.

Bayesian approach - hierarchical Bayesian linear models: using priors and estimating posterior distributions.

Page 27

Hierarchical Bayesian linear models - example

Single-subject level:

Y = Xβ + ε, ε ∼ N(0, Σ)
p(y|β) ∼ N(Xβ, Σ)

Group level:

β = βg + η, η ∼ N(0, Σg)
p(β|βg) ∼ N(βg, Σg)
βg ∼ f

[Figure: subject-level estimates (Subjects 1-5) shrunk towards the population distribution.]

Shrinkage: the cluster-specific estimates get shrunk towards the common mean.

More uncertainty implies more shrinkage.

• Example: unreliable subjects will count less.
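A minimal numerical sketch of shrinkage (all values assumed for illustration, not from the slides): per-subject means with a known within-subject variance and a common normal prior, where the shrinkage weight grows with the within-subject noise.

```python
# Shrinkage of per-subject means towards a common mean pi.
import numpy as np

rng = np.random.default_rng(5)
m, n_obs = 5, 10
pi, tau2 = 0.0, 1.0                        # group-level mean and variance
sigma2 = 4.0                               # within-subject variance
theta = rng.normal(pi, np.sqrt(tau2), size=m)         # true subject effects
y = rng.normal(theta[:, None], np.sqrt(sigma2), size=(m, n_obs))

ybar = y.mean(axis=1)                      # raw per-subject means
w = (sigma2 / n_obs) / (sigma2 / n_obs + tau2)   # more noise => more shrinkage
theta_post = w * pi + (1 - w) * ybar       # posterior means per subject
print(ybar)
print(theta_post)                          # each estimate pulled towards pi
```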

Page 28

Hierarchical Bayesian linear models - example

Example: response measured across time for each subject.

yj|i is the response for subject i at time xj|i:

yj|i ∼ N(ωj|i, λ)
ωj|i = ϕi + ξi xj|i

ϕi: individual intercept
ξi: individual slope

ϕi ∼ N(κ, δ)
ξi ∼ N(ζ, γ)
κ, ζ ∼ N(M, H)
λ, δ, γ ∼ Γ(K, I)

Figure adapted from Kruschke 2014.
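A minimal forward simulation from this model (hyperparameter values and the Gamma shape/scale parametrization are assumptions for illustration, not from the slides); simulating top-down makes the generative structure of the hierarchy explicit:

```python
# Forward simulation of the hierarchical linear model on this slide.
import numpy as np

rng = np.random.default_rng(6)
M, H = 0.0, 1.0                       # hyperprior for kappa and zeta
K, I = 2.0, 2.0                       # assumed Gamma shape and scale

kappa, zeta = rng.normal(M, np.sqrt(H), size=2)
lam, delta, gam = rng.gamma(K, I, size=3)      # lambda, delta, gamma

m, n_times = 8, 6
x = np.tile(np.arange(n_times, dtype=float), (m, 1))   # times x_{j|i}
phi = rng.normal(kappa, np.sqrt(delta), size=m)        # intercepts phi_i
xi = rng.normal(zeta, np.sqrt(gam), size=m)            # slopes xi_i
omega = phi[:, None] + xi[:, None] * x                 # means omega_{j|i}
y = rng.normal(omega, np.sqrt(lam))                    # responses y_{j|i}
print(y.shape)     # (subjects, time points)
```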

Page 29

Empirical Bayes methods

If a prior distribution is known - full Bayesian approach.

If a prior distribution is not known, the data can be used to estimate:

the parameters of the prior distribution - hyperparameters, or

the prior distribution itself.

Empirical Bayes is used in inference with hierarchical Bayesian models.

Page 30

References

M. DeGroot and M. Schervish, Probability and Statistics, 2002.

A. Gelman et al., Bayesian Data Analysis, 2014.

J.K. Kruschke, Doing Bayesian Data Analysis, 2014.

E.-J. Wagenmakers, A practical solution to the pervasive problems of p values, Psychonomic Bulletin & Review, 2007.