Introduction to Bayesian Inference

My Adventures with Bayes, Peter Chapman, Wokingham U3A Maths Group, 6 April 2011


Page 1: Introduction to Bayesian Inference

My Adventures with Bayes

Peter Chapman

Wokingham U3A Maths Group

6 April 2011

Page 2: Introduction to Bayesian Inference

Contents • My background

• Motivation

• Some data

• The normal distribution

• Classical inference

• Bayes theorem

• Who was Thomas Bayes?

• Bayesian inference

• Some examples of Bayesian inference

Page 3: Introduction to Bayesian Inference

WHO AM I

Page 4: Introduction to Bayesian Inference

CV

1962-1969: Ashford Grammar School (Middlesex/Surrey). A-levels in Pure Maths, Applied Maths, Chemistry, Physics.
1969-1972: Manchester University – Pure and Applied Maths.
1973: Department of Education, London – Assistant Statistician.
1973-1977: Exeter University – PhD in Applied Statistics.
1977-1982: Grassland Research Institute, Hurley – Statistician.
1982-2007: ICI/Zeneca/AstraZeneca/Syngenta, Bracknell – Statistician.
2007-2009: Unilever, Sharnbrook, Bedfordshire.
2009: Retired – joined Wokingham U3A – some consultancy.

Page 5: Introduction to Bayesian Inference

MOTIVATION

Page 6: Introduction to Bayesian Inference

In September 2010 I was offered a contract by my former employer, Syngenta, of Bracknell. The contract on offer required me to (a) carry out a Bayesian analysis, and (b) use the freeware software R. Both of these were new to me and required a significant amount of learning. About the same time I was asked to make a presentation to the Wokingham U3A Maths Group. Since I was putting in a significant amount of time to learn new techniques, it seemed only appropriate to share this learning with them.

Page 7: Introduction to Bayesian Inference

This is a presentation about Bayesian methods. Although I am using UK temperature records to illustrate methods, this is not a presentation about climate change. A much more thorough analysis is necessary before we can say anything substantial about climate change. This presentation is not about the normal distribution. Because the normal distribution is well known and easy to work with I have used it to demonstrate Bayesian methodology. The ideas presented here will translate to other, more complex, distributions.

Page 8: Introduction to Bayesian Inference

SOME DATA

Page 9: Introduction to Bayesian Inference

Monthly mean Central England temperature (degrees C): 1659-1973, Manley (Q.J.R. Meteorol. Soc., 1974); 1974 onwards, Parker et al. (Int. J. Clim., 1992) and Parker and Horton (Int. J. Clim., 2005).

[Plot: Average January Temperature, Central England, 1659-2010; year on the x-axis, °C on the y-axis.]

http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat

Page 10: Introduction to Bayesian Inference

[Plot: Average June Temperature, Central England, 1659-2010; year on the x-axis, °C on the y-axis.]

Page 11: Introduction to Bayesian Inference

[Plot: Average Annual Temperature, Central England, 1659-2010; year on the x-axis, °C on the y-axis.]

Page 12: Introduction to Bayesian Inference
Page 13: Introduction to Bayesian Inference
Page 14: Introduction to Bayesian Inference

[Plots: Average Monthly Temperature, Central England, 1659-2010; panels for January, June and the annual average.]

Page 15: Introduction to Bayesian Inference

THE NORMAL DISTRIBUTION

Page 16: Introduction to Bayesian Inference

\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

Page 17: Introduction to Bayesian Inference

\[ y = f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

μ is called the mean

σ² is called the variance

σ is called the standard deviation

Page 18: Introduction to Bayesian Inference

\[ \mathrm{Prob}(X \le b) = \int_{-\infty}^{b} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \]

[Plot: the probability is the area under the density curve to the left of b.]

Page 19: Introduction to Bayesian Inference

\[ \mathrm{Prob}(b_1 \le X \le b_2) = \int_{b_1}^{b_2} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \]

[Plot: the probability is the area under the density curve between b₁ and b₂.]

Page 20: Introduction to Bayesian Inference

\[ \mathrm{Prob}(-\infty < X < \infty) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = 1 \]

Page 21: Introduction to Bayesian Inference

\[ \mathrm{Prob}(\mu - \sigma \le X \le \mu + \sigma) = \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \approx 0.68 \]

Page 22: Introduction to Bayesian Inference

\[ \mathrm{Prob}(\mu - 2\sigma \le X \le \mu + 2\sigma) = \int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \approx 0.95 \]

Page 23: Introduction to Bayesian Inference

\[ \mathrm{Prob}(\mu - 3\sigma \le X \le \mu + 3\sigma) = \int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \approx 0.99 \]
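These areas can be checked numerically. The sketch below is in Python rather than the R used for the talk's contract work; it builds the normal CDF from the error function (no external libraries needed) and evaluates the 1σ, 2σ and 3σ probabilities:

```python
import math

def normal_cdf(x, mu, sigma):
    """F(x | mu, sigma^2) for the normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """Prob(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

print(round(prob_within(1), 4))  # 0.6827
print(round(prob_within(2), 4))  # 0.9545
print(round(prob_within(3), 4))  # 0.9973
```

The answers are the same for any μ and σ, which is why the rules of thumb above are stated in units of σ.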

Page 24: Introduction to Bayesian Inference

\[ \mathrm{Prob}(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(u-\mu)^2}{2\sigma^2}} \, du \]

[Plot: this cumulative probability rises from 0.0 to 1.0 as x increases; its value at x = b is Prob(X ≤ b), the area computed on the earlier slide.]

Page 25: Introduction to Bayesian Inference

\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] is called a Probability Density Function (PDF).

\[ F(x \mid \mu, \sigma^2) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(u-\mu)^2}{2\sigma^2}} \, du \] is called a Cumulative Distribution Function (CDF).

The vertical line indicates a distribution of x conditional on the values of μ and σ².

Page 26: Introduction to Bayesian Inference

CLASSICAL INFERENCE

Page 27: Introduction to Bayesian Inference

We have some data.............. and we believe that the data derive from a normal distribution.

Fundamental principle: the parameters, μ and σ² in our case, are fixed or constant.

Our objective is therefore to estimate μ and σ².............. the estimates are written μ̂ and σ̂².

We also want to know how precise the parameter estimates are.............. so we need to compute confidence intervals.

At this stage we can compute f(x | μ, σ²) for a variety of values of μ and σ², but we do not know the correct values for μ and σ².

Page 28: Introduction to Bayesian Inference

[Plot: Average June Temperature, Central England, 1659-2010; year on the x-axis, °C on the y-axis.]

I am going to guess that μ = 15 and σ² = 1 (σ = 1).

Page 29: Introduction to Bayesian Inference
Page 30: Introduction to Bayesian Inference

[Plot: the June data with a normal density, μ = 15, σ = 1, overlaid.]

Page 31: Introduction to Bayesian Inference

[Plot: the June data with normal densities overlaid for μ = 15, σ = 1; μ = 13, σ = 1; and μ = 14, σ = 2.]

Page 32: Introduction to Bayesian Inference

We have 352 values of temperature, tᵢ, where i = 1659 to 2010.

We can compute \( f(t_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t_i-\mu)^2}{2\sigma^2}} \) for any values of μ and σ² we like.

In the classical approach we compute f(tᵢ | μ, σ²) for all tᵢ, i = 1659 to 2010, then multiply them together:

\[ L(\mu, \sigma^2 \mid t) = \prod_{i=1659}^{2010} f(t_i \mid \mu, \sigma^2) \]

This is called the likelihood.

We then find the values of μ and σ² that maximise the likelihood. We call these maximum-likelihood estimates: μ̂ and σ̂².
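As a sketch of the idea (in Python; the talk's own computations used R), the hypothetical five-value sample below shows that the log of the likelihood is larger when μ sits closer to the data, which is what maximising the likelihood exploits:

```python
import math

def log_likelihood(data, mu, var):
    """log L(mu, var | data): the sum of log f(t_i | mu, var) for a normal density."""
    n = len(data)
    ss = sum((t - mu) ** 2 for t in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)

# Hypothetical June-style temperatures; the talk used the 352 real CET values.
temps = [14.2, 15.1, 13.8, 14.9, 14.5]
print(log_likelihood(temps, mu=15.0, var=1.0))
print(log_likelihood(temps, mu=14.5, var=1.0))  # closer to the data, so larger
```

Working with the log of the likelihood is standard practice: the product of 352 small densities underflows, while the sum of their logs does not.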

Page 33: Introduction to Bayesian Inference
Page 34: Introduction to Bayesian Inference

[Plot: the June data against a normal density with μ = 7, σ = 1.5.]

Page 35: Introduction to Bayesian Inference

[Plot: the June data against a normal density with μ = 23, σ = 1.5.]

Page 36: Introduction to Bayesian Inference

[Plot: the June data against a normal density with μ = 14, σ = 1.]

Page 37: Introduction to Bayesian Inference

\[ L(\mu, \sigma^2) = \prod_{i=1659}^{2010} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t_i-\mu)^2}{2\sigma^2}} \]

\[ L^* = \log_e L(\mu, \sigma^2) = -\frac{352}{2}\log_e(2\pi) - \frac{352}{2}\log_e(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1659}^{2010}(t_i-\mu)^2 \]

\[ \frac{\partial L^*}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1659}^{2010}(t_i-\mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{352}\sum_{i=1659}^{2010} t_i = \bar{t} \]

\[ \frac{\partial L^*}{\partial \sigma^2} = -\frac{352}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1659}^{2010}(t_i-\mu)^2 = 0 \quad\Rightarrow\quad \hat{\sigma}^2 = \frac{1}{352}\sum_{i=1659}^{2010}(t_i-\bar{t})^2 \]
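The closed forms above make the estimation one line each. A minimal sketch with a hypothetical sample (Python; note that the maximum-likelihood variance divides by n, not n − 1):

```python
# Maximum-likelihood estimates for a normal sample. The data here are
# illustrative; the talk applies the same formulas to 352 June temperatures.
temps = [14.2, 15.1, 13.8, 14.9, 14.5]
n = len(temps)

mu_hat = sum(temps) / n                               # sample mean
var_hat = sum((t - mu_hat) ** 2 for t in temps) / n   # ML variance: divide by n

print(round(mu_hat, 2), round(var_hat, 2))  # 14.5 0.22
```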

Page 38: Introduction to Bayesian Inference

μ̂ = 14.33

σ̂ = 1.09

σ̂² = 1.188

Page 39: Introduction to Bayesian Inference

July:     μ̂ = 15.96, σ̂ = 1.15

February: μ̂ = 3.86,  σ̂ = 1.83

October:  μ̂ = 9.69,  σ̂ = 1.30

Page 40: Introduction to Bayesian Inference

Confidence Intervals

Beyond the scope of this talk

Page 41: Introduction to Bayesian Inference

BAYES THEOREM

Page 42: Introduction to Bayesian Inference

Q = set of people tested for disease

D = subset of people who have the disease

D̄ = subset of people who do not have the disease

T = subset of people who test positive

T̄ = subset of people who do not test positive

D ∪ D̄ = T ∪ T̄ = Q

P(D) = probability that an individual has the disease

P(D | T) = probability that an individual has the disease given that they have tested positive

P(T) = probability that an individual tests positive

P(T | D) = probability that an individual tests positive given that they have the disease

Bayes theorem:

\[ P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T)} \]

Page 43: Introduction to Bayesian Inference

        D          D̄            Sum
T       9,900      10,000       19,900
T̄       100        9,980,000    9,980,100
Sum     10,000     9,990,000    10,000,000

Page 44: Introduction to Bayesian Inference

Using the table above:

P(D) = 10,000 / 10,000,000 = 0.001

P(T) = 19,900 / 10,000,000 = 0.00199

P(T) and P(D) are marginal probabilities.

Page 45: Introduction to Bayesian Inference

Again from the table:

P(T | D) = 9,900 / 10,000 = 0.99

P(D | T) and P(T | D) are conditional probabilities.

\[ P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T)} = \frac{0.99 \times 0.001}{0.00199} = \frac{0.00099}{0.00199} = 0.497487 \]

Equivalently, reading the first row of the table directly: P(D | T) = 9,900 / 19,900 = 0.497487.
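The whole calculation is a few lines of arithmetic. This sketch (Python) reproduces the numbers from the table and confirms the two routes to P(D | T) agree:

```python
# Counts from the 2x2 screening table.
n_total = 10_000_000
n_disease = 10_000       # D: have the disease
n_positive = 19_900      # T: test positive
n_pos_and_dis = 9_900    # T and D

p_D = n_disease / n_total                 # 0.001
p_T = n_positive / n_total                # 0.00199
p_T_given_D = n_pos_and_dis / n_disease   # 0.99

# Bayes theorem: P(D | T) = P(T | D) P(D) / P(T)
p_D_given_T = p_T_given_D * p_D / p_T
print(round(p_D_given_T, 6))  # 0.497487
```

Despite a 99% accurate test, barely half of those who test positive actually have the disease, because the disease itself is rare.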

Page 46: Introduction to Bayesian Inference

BAYESIAN INFERENCE

Page 47: Introduction to Bayesian Inference

A fundamental assumption of Bayesian inference is that the unknown parameters are variables.

For the normal distribution this means that μ and σ² are variables, not constants.

If we apply Bayes theorem to the normal density function we get:

\[ f(\mu, \sigma^2 \mid t) = \frac{L(t \mid \mu, \sigma^2)\, f(\mu, \sigma^2)}{f(t)} \propto L(t \mid \mu, \sigma^2)\, f(\mu, \sigma^2) \]

i.e. posterior distribution ∝ likelihood × prior distribution, where t is the data.

Page 48: Introduction to Bayesian Inference

For many years Bayesian analysis was a theoretical academic pastime.

This was because the mathematics was very difficult.

Analytic solutions for the posterior often involved complex multiple integrals.

One of the few that can be solved is the normal distribution with uniform priors.

In what follows:

\[ \bar{t} = \frac{1}{352}\sum_{i=1659}^{2010} t_i \quad\text{and}\quad s^2 = \frac{1}{352-1}\sum_{i=1659}^{2010} (t_i - \bar{t})^2 . \]

Page 49: Introduction to Bayesian Inference

If μ and log σ follow independent uniform prior distributions, then:

\[ f(\mu, \sigma^2) \propto \frac{1}{\sigma^2}, \]

so

\[ f(\mu, \sigma^2 \mid T) \propto L(T \mid \mu, \sigma^2)\, f(\mu, \sigma^2) \propto \frac{1}{\sigma^{352}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1659}^{2010} (t_i - \mu)^2 \right) \times \frac{1}{\sigma^2} \]

\[ \propto \frac{1}{\sigma^{352+2}} \exp\!\left( -\frac{1}{2\sigma^2} \Big[ (352-1)\,s^2 + 352\,(\bar{t} - \mu)^2 \Big] \right) \]

Page 50: Introduction to Bayesian Inference

We need to factorise the posterior as follows:

\[ f(\mu, \sigma^2 \mid T) = f(\mu \mid \sigma^2, T)\, f(\sigma^2 \mid T) \]

and it can be shown that:

\[ \mu \mid \sigma^2, T \sim N\!\left(\bar{t},\, \frac{\sigma^2}{352}\right) \]  (conditional posterior for μ)

\[ \sigma^2 \mid T \sim \text{Inv-}\chi^2(352-1,\, s^2) \]  (marginal posterior for σ²)

\[ \mu \mid T \sim t_{352-1}\!\left(\bar{t},\, \frac{s^2}{352}\right) \]  (marginal posterior for μ)

where the marginal posteriors are obtained by integrating out the other parameter:

\[ f(\mu \mid T) = \int f(\mu, \sigma^2 \mid T)\, d\sigma^2 \qquad f(\sigma^2 \mid T) = \int f(\mu, \sigma^2 \mid T)\, d\mu \]
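This factorisation gives a direct way to simulate the joint posterior without any MCMC: draw σ² from the scaled inverse-χ², then μ given that σ². A Python sketch, where the summary statistics t̄, s² and the draw count are illustrative stand-ins for the real June series:

```python
import math
import random

def sample_posterior(t_bar, s2, n, draws, seed=1):
    """Sample (mu, sigma^2) from the factorised posterior:
    sigma^2 | T ~ (n-1) s^2 / chi2_{n-1}, then mu | sigma^2, T ~ N(t_bar, sigma^2/n)."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        # A chi2_{n-1} draw is Gamma(shape=(n-1)/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        var = (n - 1) * s2 / chi2                  # scaled inverse-chi-squared draw
        mu = rng.gauss(t_bar, math.sqrt(var / n))  # then mu given that variance
        out.append((mu, var))
    return out

# Illustrative summary statistics in the spirit of the June series.
samples = sample_posterior(t_bar=14.33, s2=1.19, n=352, draws=20000)
mu_mean = sum(m for m, _ in samples) / len(samples)
var_mean = sum(v for _, v in samples) / len(samples)
print(round(mu_mean, 2), round(var_mean, 2))
```

The sample means land close to t̄ and s², as the theory says they should for a series this long.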

Page 51: Introduction to Bayesian Inference

MARKOV CHAIN MONTE-CARLO AND THE METROPOLIS METHOD

Page 52: Introduction to Bayesian Inference

Set up the Bayesian posterior:

\[ f(\mu, \sigma^2 \mid T) = \frac{L(T \mid \mu, \sigma^2)\, f(\mu, \sigma^2)}{f(T)} \propto L(T \mid \mu, \sigma^2)\, f(\mu, \sigma^2) \]

In our case it takes the following form:

\[ f(\mu, \sigma^2 \mid T) \propto \frac{1}{\sigma^{352+2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1659}^{2010} (t_i - \mu)^2 \right) \]

Page 53: Introduction to Bayesian Inference

Select initial values, μ₀ and σ₀², for μ and σ².

Introduce jump functions: μ₀ → μ₁ and σ₀² → σ₁².

Compute \( R = \dfrac{f(\mu_1, \sigma_1^2 \mid T)}{f(\mu_0, \sigma_0^2 \mid T)} \).

Sample a single random value Q from a Uniform(0, 1) distribution.

If Q ≤ min(1, R) keep (μ₁, σ₁²), else set (μ₁, σ₁²) = (μ₀, σ₀²).

Continue doing this: (μ₀, σ₀²) → (μ₁, σ₁²) → (μ₂, σ₂²) → … → (μₙ, σₙ²) → (μₙ₊₁, σₙ₊₁²) → … → (μ_big, σ_big²).

This results in a random joint sample from the posterior distribution.

Page 54: Introduction to Bayesian Inference

[Plot: posterior distribution with a proposed jump from (μₙ, σₙ²) to (μₙ₊₁, σₙ₊₁²) moving uphill.]

\[ \frac{f(\mu_{n+1}, \sigma_{n+1}^2 \mid T)}{f(\mu_n, \sigma_n^2 \mid T)} = R > 1 \ge Q \]

so we keep (μₙ₊₁, σₙ₊₁²).

Page 55: Introduction to Bayesian Inference

[Plot: posterior distribution with a proposed jump from (μₙ, σₙ²) to (μₙ₊₁, σₙ₊₁²) moving downhill.]

\[ \frac{f(\mu_{n+1}, \sigma_{n+1}^2 \mid T)}{f(\mu_n, \sigma_n^2 \mid T)} = R < 1 \]

If Q ≤ R we keep (μₙ₊₁, σₙ₊₁²), so we keep the new pair with probability = R.

Page 56: Introduction to Bayesian Inference

Jump function, from (μₙ, σₙ²) to (μₙ₊₁, σₙ₊₁²):

\[ \mu_{n+1} = \mu_n + Z, \qquad Z = \texttt{rnorm}(0, \text{jump s.d.}) \]

\[ \sigma^2_{n+1} = \sigma^2_n + W, \qquad W = \texttt{rnorm}(0, \text{jump s.d.}) \]
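Putting the jump functions and the accept/reject rule together gives the full Metropolis sampler. A compact sketch follows; Python stands in for the talk's R, the data, starting values and jump standard deviations are illustrative, and the ratio R is handled on the log scale to avoid underflow of the 352-term likelihood product:

```python
import math
import random

def log_post(mu, var, data):
    """Log posterior for normal data with the f(mu, sigma^2) ∝ 1/sigma^2 prior
    (additive constants dropped)."""
    if var <= 0.0:
        return float("-inf")  # reject proposals with a non-positive variance
    n = len(data)
    ss = sum((t - mu) ** 2 for t in data)
    return -(n / 2.0 + 1.0) * math.log(var) - ss / (2.0 * var)

def metropolis(data, mu0, var0, draws, jump_mu, jump_var, seed=2):
    """Metropolis sampler: normal jumps, keep with probability min(1, R)."""
    rng = random.Random(seed)
    mu, var = mu0, var0
    chain = []
    for _ in range(draws):
        # Jump functions: perturb each parameter with rnorm-style noise.
        mu_new = mu + rng.gauss(0.0, jump_mu)
        var_new = var + rng.gauss(0.0, jump_var)
        # log R = log posterior ratio.
        log_r = log_post(mu_new, var_new, data) - log_post(mu, var, data)
        q = rng.random()  # Q ~ Uniform(0, 1)
        if log_r >= 0.0 or q < math.exp(log_r):
            mu, var = mu_new, var_new  # keep the new pair
        chain.append((mu, var))
    return chain

# Illustrative data standing in for the 352 CET temperatures.
data_rng = random.Random(0)
data = [data_rng.gauss(14.3, 1.1) for _ in range(352)]

chain = metropolis(data, mu0=14.0, var0=1.0, draws=6000,
                   jump_mu=0.1, jump_var=0.1)
kept = chain[1000:]  # discard a burn-in phase
mu_mean = sum(m for m, _ in kept) / len(kept)
var_mean = sum(v for _, v in kept) / len(kept)
print(round(mu_mean, 1), round(var_mean, 1))
```

The posterior means recovered from the chain sit close to the values used to simulate the data, which is the sanity check one would run before trusting the sampler on the real series.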

Page 57: Introduction to Bayesian Inference

A COMPARISON OF THREE MODELS

Page 58: Introduction to Bayesian Inference

Model 1, for all years:

\[ f(t \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t-\mu)^2}{2\sigma^2}} \]

Model 2:

\[ f(t \mid \mu_1, \sigma_1^2) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \, e^{-\frac{(t-\mu_1)^2}{2\sigma_1^2}} \]  for earlier years

\[ f(t \mid \mu_2, \sigma_2^2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \, e^{-\frac{(t-\mu_2)^2}{2\sigma_2^2}} \]  for later years

Model 3:

\[ f(t \mid \alpha, \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t-\alpha-\beta\,\mathrm{year})^2}{2\sigma^2}} \]

i.e. the mean μ = α + β·year changes linearly with time.

Page 59: Introduction to Bayesian Inference

Model 1, for all years: t ~ N(μ, σ²)

Model 2: t ~ N(μ₁, σ₁²) for earlier years; t ~ N(μ₂, σ₂²) for later years

Model 3: t ~ N(α + β·year, σ²), i.e. the mean changes linearly with the year.

Page 60: Introduction to Bayesian Inference

Model 2:

t ~ N(μ₁, σ₁²) for earlier years

t ~ N(μ₂, σ₂²) for later years

is the same as

t ~ N(μ_early, σ²_early) for earlier years

t ~ N(μ_late, σ²_late) for later years

where μ_late = μ_early + δ and σ²_late = σ²_early + Δ.

Page 61: Introduction to Bayesian Inference

Model 1:

\[ f(t \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t-\mu)^2}{2\sigma^2}} \qquad\text{i.e. } t \sim N(\mu, \sigma^2) \]

Page 62: Introduction to Bayesian Inference

MCMC Done Very Badly: June 1659 - 2010

[Plot: trace of (μ, σ²) pairs starting from (μ₀, σ₀²), overlaid on the posterior distribution.]

Jumps too small: high correlation between consecutive pairs of sample values.

Page 63: Introduction to Bayesian Inference

Solution:

• Better starting values (μ₀, σ₀²): the maximum-likelihood estimates (μ̂, σ̂²).

• A burn-in sampling phase that is discarded.

• A main phase with infrequent sampling, e.g. keeping every 10,000th pair.

Page 64: Introduction to Bayesian Inference

Model 1: Posterior Distribution for June: 1659 - 2010

Burn-in stage = 1,000,000 pairs

Main sampling = 100,000,000 pairs, sampling every 10,000th

[Plot: joint posterior sample of (μ, σ²).]

Page 65: Introduction to Bayesian Inference

Model 1: Posterior Distribution for June: 1659 - 2010

Burn-in stage = 1,000,000 pairs

Main sampling = 100,000,000 pairs, sampling every 10,000th

μ: mean 14.33 (14.213, 14.440)

σ²: mean 1.20 (1.034, 1.389)

Page 66: Introduction to Bayesian Inference

Model 1: Posterior Distribution for January: 1659 - 2010

Burn-in stage = 1,000,000 pairs

Main sampling = 100,000,000 pairs, sampling every 10,000th

[Plot: joint posterior sample of (μ, σ²).]

Page 67: Introduction to Bayesian Inference

Model 1: Posterior Distribution for January: 1659 - 2010

Burn-in stage = 1,000,000 iterations

Main sampling = 100,000,000 pairs, sampling every 10,000th

μ: mean 3.23 (3.022, 3.442)

σ²: mean 4.02 (3.467, 4.691)

Page 68: Introduction to Bayesian Inference

Model 1: Posterior Distribution for January: 1981 - 2010

Burn-in stage = 1,000,000 iterations

Main sampling = 100,000,000 pairs, sampling every 10,000th

[Plot: joint posterior sample of (μ, σ²).]

Page 69: Introduction to Bayesian Inference

Model 1: Posterior Distribution for January: 1981 - 2010

Burn-in stage = 1,000,000 iterations

Main sampling = 100,000,000 pairs, sampling every 10,000th

μ: mean 4.43 (3.796, 5.064)

σ²: mean 2.98 (1.807, 5.374)

Page 70: Introduction to Bayesian Inference

Model 1: Distribution for January: 1881 - 1910

Burn-in stage = 1,000,000 iterations

Main sampling = 100,000,000 pairs, sampling every 10,000th

[Plot: joint posterior sample of (μ, σ²).]

Page 71: Introduction to Bayesian Inference

Model 1: Distribution for January: 1881 - 1910

Burn-in stage = 1,000,000 iterations

Main sampling = 100,000,000 pairs, sampling every 10,000th

μ: mean 3.50 (2.857, 4.144)

σ²: mean 3.17 (1.996, 5.893)

Page 72: Introduction to Bayesian Inference

[Plot: Average January Temperature, °C, Central England, for 1881 - 1910 and 1981 - 2010.]

Page 73: Introduction to Bayesian Inference

[Plot: Average January Temperature, °C, for 1781 - 1810, 1881 - 1910 and 1981 - 2010.]

μ̂ = 2.87, μ̂ = 3.49 and μ̂ = 4.44 respectively.

Page 74: Introduction to Bayesian Inference

[Plot: Average June Temperature, °C, for 1781 - 1810, 1881 - 1910 and 1981 - 2010.]

μ̂ = 14.54, μ̂ = 14.11 and μ̂ = 14.48 respectively.

Page 75: Introduction to Bayesian Inference

Model 2:

\[ f(t \mid \mu_1, \sigma_1^2) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \, e^{-\frac{(t-\mu_1)^2}{2\sigma_1^2}} \quad\text{i.e. } t \sim N(\mu_1, \sigma_1^2) \text{ for earlier years} \]

\[ f(t \mid \mu_2, \sigma_2^2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \, e^{-\frac{(t-\mu_2)^2}{2\sigma_2^2}} \quad\text{i.e. } t \sim N(\mu_2, \sigma_2^2) \text{ for later years} \]

Page 76: Introduction to Bayesian Inference

Model 2 for January: (1881, 1910) and (1981, 2010)

Burn-in stage = 10,000,000 iterations

Main stage = 100,000,000 sets of four, sampling every 10,000th

[Plots: posterior samples of μ, δ, σ² and Δ.]

Page 77: Introduction to Bayesian Inference

Model 2 for January: (1881, 1910) and (1981, 2010)

[Plots: marginal posterior distributions of μ, δ, σ² and Δ.]

Page 78: Introduction to Bayesian Inference

Model 2 for January: (1881, 1910) and (1981, 2010)

Parameter            2.5th Percentile   Median   97.5th Percentile
μ (earlier period)              2.868     3.49               4.119
δ (change in μ)                 0.079     0.94               1.851
σ² (earlier period)             1.942     3.03               4.686
Δ (change in σ²)               -1.588     0.07               2.276

Page 79: Introduction to Bayesian Inference

Model 2 for June: (1881, 1910) and (1981, 2010)

Burn-in stage = 10,000,000 iterations

Main stage = 100,000,000 sets of four, sampling every 10,000th

Page 80: Introduction to Bayesian Inference

Model 2 for June: (1881, 1910) and (1981, 2010)

[Plots: marginal posterior distributions of μ, δ, σ² and Δ.]

Page 81: Introduction to Bayesian Inference

Model 2 for June: (1881, 1910) and (1981, 2010)

Parameter            2.5th Percentile   Median   97.5th Percentile
μ (earlier period)             13.745    14.10              14.464
δ (change in μ)                -0.122     0.38               0.866
σ² (earlier period)             0.641     0.98               1.515
Δ (change in σ²)               -0.587    -0.09               0.778

Page 82: Introduction to Bayesian Inference

Model 3:

\[ f(t \mid \alpha, \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(t-\alpha-\beta\,\mathrm{year})^2}{2\sigma^2}} \quad\text{i.e. } t \sim N(\alpha + \beta\,\mathrm{year},\, \sigma^2) \]

Page 83: Introduction to Bayesian Inference

Model 3 for January: 1659 - 2010

[Plots: posterior samples of α, β and σ².]

Page 84: Introduction to Bayesian Inference

Model 3 for January: 1659 - 2010

[Plots: marginal posterior distributions of α, β and σ².]

Page 85: Introduction to Bayesian Inference

Model 3 for January: 1659 - 2010

Parameter   2.5th Percentile   Median   97.5th Percentile
α                      1.921     2.35               2.770
β                     0.0028   0.0048              0.0068
σ²                     1.271    3.790               4.411

Page 86: Introduction to Bayesian Inference

Model 3 for January: 1659 - 2010

[Plot: January temperature, °C, against year, with the fitted line.]

temp = 2.35 + 0.0048(year − 1650)

Page 87: Introduction to Bayesian Inference

Model 3 for June: 1659 - 2010

[Plots: posterior samples of α, β and σ².]

Page 88: Introduction to Bayesian Inference

Model 3 for June: 1659 - 2010

[Plots: marginal posterior distributions of α, β and σ².]

Page 89: Introduction to Bayesian Inference

Model 3 for June: 1659 - 2010

Parameter   2.5th Percentile   Median   97.5th Percentile
α                     11.873    14.10              16.830
β                   -0.00136  0.00013             0.00134
σ²                     1.035     1.20               1.391

Page 90: Introduction to Bayesian Inference

Model 3 for June: 1659 - 2010

[Plot: June temperature, °C, against year, with the fitted line.]

temp = 14.096 + 0.000127(year − 1650)

Page 91: Introduction to Bayesian Inference

Model 3 for June: 1801 - 2010

[Plot: June temperature, °C, against year, with the fitted line.]

temp = 14.18 + 0.0011(year − 1800)

Page 92: Introduction to Bayesian Inference

Model 3 for Annual Average: 1659 - 2010

[Plots: posterior samples of α, β and σ².]

Page 93: Introduction to Bayesian Inference

Model 3 for Annual Average: 1659 - 2010

[Plots: marginal posterior distributions of α, β and σ².]

Page 94: Introduction to Bayesian Inference

Model 3 for Annual Average: 1659 - 2010

Parameter   2.5th Percentile   Median   97.5th Percentile
α                      8.619     8.75               8.880
β                     0.0019   0.0025              0.0032
σ²                     0.319    0.369               0.430

Page 95: Introduction to Bayesian Inference

Model 3 for Annual Average: 1659 - 2010

[Plot: annual average temperature, °C, against year, with the fitted line.]

temp = 8.75 + 0.0025(year − 1650)
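Taken together, the fitted Model 3 lines quoted on these slides imply very different changes across the record. Evaluating them at the endpoints of 1659 - 2010 (a small Python sketch; the coefficients are the posterior medians from the slides):

```python
# Fitted Model 3 trend lines (posterior medians) from the slides.
def january(year):
    return 2.35 + 0.0048 * (year - 1650)

def june(year):
    return 14.096 + 0.000127 * (year - 1650)

def annual(year):
    return 8.75 + 0.0025 * (year - 1650)

# Implied change across the record, 1659 to 2010 (351 years):
print(round(january(2010) - january(1659), 2))  # ≈ 1.68 °C
print(round(annual(2010) - annual(1659), 2))    # ≈ 0.88 °C
print(round(june(2010) - june(1659), 2))        # ≈ 0.04 °C
```

January carries almost all of the fitted trend and June essentially none, which matches the closing comments about winters rather than summers driving the annual increase.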

Page 96: Introduction to Bayesian Inference

THOMAS BAYES

Page 97: Introduction to Bayesian Inference

Rev. Thomas Bayes (1702-1761) His friend Richard Price edited and presented his work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. The French mathematician Pierre-Simon Laplace reproduced and extended Bayes' results in 1774, apparently quite unaware of Bayes' work. It is speculated that Bayes was elected as a Fellow of the Royal Society in 1742 on the strength of the Introduction to the Doctrine of Fluxions, as he is not known to have published any other mathematical works during his lifetime. It has been suggested that Bayes' theorem, as we now call it, was discovered by Nicholas Saunderson some time before Bayes. This is disputed.

Page 98: Introduction to Bayesian Inference

Comments in place of Conclusions: Bayesian Inference

• This presentation was not about the Normal distribution.

• The Normal distribution was used to illustrate the methods.

• For the problems discussed here, Bayesian inference offers few advantages over classical inference.

• For more complex problems, Bayesian inference offers big advantages.

• Prior to 1990 or so, Bayesian inference was a largely academic subject.

• The advent of MCMC and fast computers has made Bayesian inference a significant player in the world of data analysis.

• The number of PhDs in statistics is small and getting smaller, but most of them are absorbed in Bayesian issues.

• Bayesian approaches are now commonplace.

Page 99: Introduction to Bayesian Inference

Comments: Climate Change

• This presentation was not about climate change.

• A more thorough analysis is required before we can say anything substantial about climate change.

• The results of a limited analysis so far indicate, for Central England, that summers are not getting warmer. The range of average summer temperatures that we see today is similar to that seen in the past.

• The range of average winter temperatures seems to be narrower than in the past, with an absence of very cold months in recent years.

• This effect could, of course, be a result of thermometers being placed in urban areas.

• The statistically significant increase in annual average temperature may be caused by increasing average winter temperatures.