Some preliminary Q&A (ph7440/pubh7440/slide1.pdf)


Some preliminary Q&A

What is the philosophical difference between classical (“frequentist”) and Bayesian statistics?

To a frequentist, unknown model parameters are fixed and unknown, and only estimable by replications of data from some experiment.

A Bayesian thinks of parameters as random, and thus having distributions (just like the data). We can thus think about unknowns for which no reliable frequentist experiment exists, e.g.

θ = proportion of US men with untreated atrial fibrillation

Chapter 1: Approaches for Statistical Inference – p. 1/19

Some preliminary Q&A

How does it work?

A Bayesian writes down a prior guess for θ, p(θ), then combines this with the information that the data X provide to obtain the posterior distribution of θ, p(θ|X). All statistical inferences (point and interval estimates, hypothesis tests) then follow as appropriate summaries of the posterior.

Note that

posterior information ≥ prior information ≥ 0,

with the second “≥” replaced by “=” only if the prior is noninformative (which is often uniform, or “flat”).

Chapter 1: Approaches for Statistical Inference – p. 2/19

Some preliminary Q&A

Is the classical approach “wrong”?

While a “hardcore” Bayesian might say so, it is probably more accurate to think of classical methods as merely “limited in scope”!

The Bayesian approach expands the class of models we can fit to our data, enabling us to handle

repeated measures
unbalanced or missing data
nonhomogeneous variances
multivariate data

– and many other settings that are awkward or infeasible from a classical point of view.

The approach also eases the interpretation of and learning from those models once fit.

Chapter 1: Approaches for Statistical Inference – p. 3/19

Simple example of Bayesian thinking

From Business Week, online edition, July 31, 2001:

“Economists might note, to take a simple example, that American turkey consumption tends to increase in November. A Bayesian would clarify this by observing that Thanksgiving occurs in this month.”

Data: plot of turkey consumption by month

Prior:
location of Thanksgiving in the calendar
knowledge of Americans’ Thanksgiving eating habits

Posterior: Understanding of the pattern in the data!

Chapter 1: Approaches for Statistical Inference – p. 4/19

Bayes means revision of estimates

Humans tend to be Bayesian in the sense that most revise their opinions about uncertain quantities as more data accumulate. For example:

Suppose you are about to make your first submission to a particular academic journal

You assess your chances of your paper being accepted (you have an opinion, but the “true” probability is unknown)

You submit your article and it is accepted!

Question: What is your revised opinion regarding the acceptance probability for papers like yours?

If you said anything other than “1”, you are a Bayesian!

Chapter 1: Approaches for Statistical Inference – p. 5/19
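The journal example can be made concrete with the same conjugate machinery. A sketch, assuming a hypothetical uniform Beta(1, 1) prior opinion about the acceptance probability (the slides do not specify a prior):

```python
# Prior opinion about the acceptance probability: Beta(a, b).
# Observing one accepted paper (one Bernoulli success) updates it to Beta(a + 1, b).
a, b = 1, 1                 # uniform prior: no strong opinion about the journal
a_post, b_post = a + 1, b   # one submission, one acceptance

revised = a_post / (a_post + b_post)   # posterior mean = 2/3

# The revised opinion moves toward 1 but does not jump to 1 --
# which is the point of the slide's closing question.
print(revised)
```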

Bayes can account for structure

County-level breast cancer rates per 10,000 women:

 79  87  83  80  78
 90  89  92  99  95
 96 100   ⋆ 110 115
101 109 105 108 112
 96 104  92 101  96

With no direct data for ⋆, what estimate would you use?

Is 200 reasonable?

Probably not: all the other rates are around 100

Perhaps use the average of the “neighboring” values (again, near 100)

Chapter 1: Approaches for Statistical Inference – p. 6/19

Accounting for structure (cont’d)

Now assume that data become available for county ⋆: 100 women at risk, 2 cancer cases. Thus

rate = (2/100) × 10,000 = 200

Would you use this value as the estimate?

Probably not: The sample size is very small, so this estimate will be unreliable. How about a compromise between 200 and the rates in the neighboring counties?

Now repeat this thought experiment if the county ⋆ data were 20/1000, 200/10000, ...

Bayes and empirical Bayes methods can incorporate the structure in the data, weight the data and prior information appropriately, and allow the data to dominate as the sample size becomes large.

Chapter 1: Approaches for Statistical Inference – p. 7/19
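The compromise the slide describes can be sketched with a conjugate Gamma-Poisson model for the county rate. The prior parameters below (a Gamma prior centered at 100 per 10,000, reflecting the neighboring counties, and worth roughly 10,000 women of data) are illustrative assumptions, not values from the slides:

```python
# County rate theta, per 10,000 women. Cases y ~ Poisson(n * theta / 10000),
# prior theta ~ Gamma(shape=alpha, rate=beta) with prior mean alpha/beta = 100.
alpha, beta = 100.0, 1.0   # hypothetical prior: mean 100 per 10,000

def posterior_mean(y, n):
    # Conjugate update: posterior is Gamma(alpha + y, beta + n/10000),
    # so the posterior mean is a weighted compromise of prior and data.
    return (alpha + y) / (beta + n / 10000)

# The slide's thought experiment: 2/100, then 20/1000, then 200/10000 women.
# The crude rate is 200 every time, but the posterior mean starts near the
# prior (100) and is pulled toward 200 only as the sample size grows.
for y, n in [(2, 100), (20, 1000), (200, 10000)]:
    crude = y / n * 10000   # always 200
    print(y, n, crude, round(posterior_mean(y, n), 1))
```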

Motivating Example

From Berger and Berry (1988, Amer. Scientist): Consider a clinical trial to study the effectiveness of Vitamin C in treating the common cold.

Observations are matched pairs of subjects (twins?), half randomized (in “double blind” fashion) to vitamin C, half to placebo. We count how many pairs had C giving superior relief after 48 hours.

Chapter 1: Approaches for Statistical Inference – p. 8/19

Two Designs

Design #1: Sample n = 17 pairs, and test

H0: P(C better) = 1/2 vs. HA: P(C better) ≠ 1/2

Suppose we observe x = 13 preferences for C. Then

p-value = P(X ≥ 13 or X ≤ 4) = .049

So if α = .05, stop and reject H0.

Design #2: Sample n1 = 17 pairs. Then:

if x1 ≥ 13 or x1 ≤ 4, stop;
otherwise, sample an additional n2 = 27 pairs.

Reject H0 if X1 + X2 ≥ 29 or X1 + X2 ≤ 15.

Chapter 1: Approaches for Statistical Inference – p. 9/19
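The .049 for Design #1 is an exact two-sided binomial tail probability and can be verified in a few lines of standard-library Python:

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    # Exact Bin(n, p) probability mass at k.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Design #1: n = 17 pairs; under H0: P(C better) = 1/2,
# reject if X >= 13 or X <= 4.
n = 17
p_value = (sum(binom_pmf(k, n) for k in range(13, n + 1))
           + sum(binom_pmf(k, n) for k in range(0, 5)))

print(round(p_value, 3))   # 0.049, matching the slide
```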

Two Designs (cont’d)

We choose this second stage since under H0,

P(X1 + X2 ≥ 29 or X1 + X2 ≤ 15) = .049

– the same as Stage 1!

Suppose we again observe X1 = 13. Now:

p-value = P(X1 ≥ 13 or X1 ≤ 4)
        + P(X1 + X2 ≥ 29 and 4 < X1 < 13)
        + P(X1 + X2 ≤ 15 and 4 < X1 < 13)
        = .085 ← no longer significant at α = .05!

Yet the observed data were exactly the same; all we did was contemplate a second stage (no effect on the data), and it changed our answer!

Chapter 1: Approaches for Statistical Inference – p. 10/19
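The Design #2 p-value sums the stage-1 rejection probability and the probability of continuing (4 < X1 < 13) and then rejecting at stage 2. A sketch of that exact computation, using only standard-library binomial sums:

```python
from math import comb

def pmf(k, n):
    # Bin(n, 1/2) probability mass at k.
    return comb(n, k) / 2**n

n1, n2 = 17, 27

# Stage 1: stop and reject if X1 >= 13 or X1 <= 4.
stage1 = (sum(pmf(k, n1) for k in range(13, n1 + 1))
          + sum(pmf(k, n1) for k in range(0, 5)))

# Stage 2: for 4 < X1 < 13 we continue, then reject
# if X1 + X2 >= 29 or X1 + X2 <= 15, with X2 ~ Bin(27, 1/2) under H0.
stage2 = 0.0
for x1 in range(5, 13):
    upper = sum(pmf(k, n2) for k in range(29 - x1, n2 + 1))
    lower = sum(pmf(k, n2) for k in range(0, 15 - x1 + 1))
    stage2 += pmf(x1, n1) * (upper + lower)

p_value = stage1 + stage2
print(round(p_value, 3))   # 0.085, matching the slide
```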

Additional Q & A

Q: What if we kept adding stages?
A: The p-value → 1, even though x1 is still 13!

Q: So are p-values really “objective evidence”?
A: No, since extra information (like the design) is critical!

Q: What about unforeseen events? Example: The first 5 patients develop an allergic reaction to the treatment, and the trial is stopped by the clinicians. Can a frequentist analyze these data?

A: No: This aspect of the design wasn’t anticipated, so p-values are not computable!

Q: Can a Bayesian?
A: (Obviously) Yes – as we shall see...

Chapter 1: Approaches for Statistical Inference – p. 11/19

Bayes/frequentist controversy

Frequentist outlook: Suppose we have k unknown quantities θ = (θ1, θ2, . . . , θk), and data X = (X1, ..., Xn) whose distribution depends on θ, say p(x|θ).

The frequentist selects a loss function, L(θ, a), and develops procedures which perform well with respect to the frequentist risk,

R(θ, δ) = E_{X|θ}[L(θ, δ(X))]

A good frequentist procedure is one that enjoys low loss over repeated data sampling regardless of the true value of θ.

Chapter 1: Approaches for Statistical Inference – p. 12/19

Bayes/frequentist controversy

So are there any such procedures? Sure:

Example: Suppose Xi|θ ∼ N(θ, σ²), iid for i = 1, . . . , n. The standard 95% frequentist confidence interval (CI) is δ(x) = (x̄ ± 1.96 s/√n).

If we choose the loss function

L(θ, δ) = 0 if θ ∈ δ(x), 1 if θ ∉ δ(x),

then

R[(θ, σ), δ] = E_{x|θ,σ}[L(θ, δ(x))] = P_{θ,σ}(θ ∉ δ(x)) = .05

On average in repeated use, δ fails only 5% of the time.

Has some appeal, since it works for all θ and σ!

Chapter 1: Approaches for Statistical Inference – p. 13/19
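The "fails only 5% of the time, for all θ and σ" claim can be checked by simulation. A sketch with arbitrary choices of θ, σ, sample size, and seed (none of which appear on the slide); since the interval uses s with the z-value 1.96 rather than a t-value, the coverage is only approximately .95, so n = 100 is used here:

```python
import random
from math import sqrt

def coverage(theta, sigma, n=100, reps=2000, seed=1):
    # Fraction of simulated datasets whose interval xbar +/- 1.96 s/sqrt(n)
    # covers the true theta.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(theta, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        half = 1.96 * s / sqrt(n)
        hits += (xbar - half <= theta <= xbar + half)
    return hits / reps

# Roughly 0.95 no matter which (theta, sigma) we pick -- the point of the slide.
c1 = coverage(0.0, 1.0)
c2 = coverage(50.0, 7.0)
print(c1, c2)
```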

Bayes/frequentist controversy

Many frequentist analyses are based on the likelihood function, which is just p(x|θ) viewed as a function of θ:

L(θ;x) ≡ p(x|θ)

Larger p(x|θ) for values of θ which are more “likely”. Leads naturally to...

The Likelihood Principle: In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x. Furthermore, two likelihood functions contain the same information about θ if they are proportional to each other as functions of θ.

Seems completely reasonable (and theoretically justifiable) – yet frequentist analyses may violate it!...

Chapter 1: Approaches for Statistical Inference – p. 14/19

Binomial vs. Negative Binomial

Example due to Pratt (comment on Birnbaum, 1962 JASA): suppose 12 independent coin tosses: 9H, 3T. Test:

H0 : θ = 1/2  vs.  HA : θ > 1/2

Two possibilities for f(x|θ):

Binomial: n = 12 tosses (fixed beforehand) ⇒ X = #H ∼ Bin(12, θ)

⇒ L1(θ) = p1(x|θ) = C(n, x) θ^x (1 − θ)^(n−x) = C(12, 9) θ^9 (1 − θ)^3.

Negative Binomial: flip until we get r = 3 tails ⇒ X ∼ NB(3, θ)

⇒ L2(θ) = p2(x|θ) = C(r + x − 1, x) θ^x (1 − θ)^r = C(11, 9) θ^9 (1 − θ)^3.

Adopt the rejection region: "Reject H0 if X ≥ c."

Chapter 1: Approaches for Statistical Inference – p. 15/19
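The proportionality of the two likelihoods can be checked directly; a minimal Python sketch (not part of the slides) evaluates both on a grid of θ values and shows the ratio is the constant C(12, 9)/C(11, 9) = 220/55 = 4:

```python
import math

def L1(theta, x=9, n=12):
    """Binomial likelihood: n fixed, X = number of heads."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def L2(theta, x=9, r=3):
    """Negative binomial likelihood: flip until r tails, X = number of heads."""
    return math.comb(r + x - 1, x) * theta**x * (1 - theta)**r

# The likelihoods are proportional as functions of theta; the ratio is ~4 everywhere.
theta_grid = [i / 100 for i in range(1, 100)]
ratios = [L1(t) / L2(t) for t in theta_grid]
print(min(ratios), max(ratios))  # both ≈ 4.0
```

By the Likelihood Principle, then, the two designs carry the same information about θ; this is exactly what the frequentist p-value calculation on the next slide fails to respect.
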


Binomial vs. Negative Binomial

p-values:

α1 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9}^{12} C(12, j) θ^j (1 − θ)^(12−j) = .075

α2 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9}^{∞} C(2 + j, j) θ^j (1 − θ)^3 = .0325

So at α = .05, two different decisions! Violates the Likelihood Principle, since L1(θ) ∝ L2(θ)!!

What happened? Besides the observed x = 9, we also took into account the "more extreme" X ≥ 10.

Jeffreys (1961): "...a hypothesis which may be true may be rejected because it has not predicted observable results which have not occurred."

In our example, the probability of the unpredicted and nonoccurring set X ≥ 10 has been used as evidence against H0!

Chapter 1: Approaches for Statistical Inference – p. 16/19
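The two p-values can be verified numerically; a minimal Python sketch (not part of the slides). The exact sums evaluate to ≈ .073 and ≈ .0327, slightly different from the rounded figures quoted above, but the conclusion is unchanged: α1 > .05 > α2, so the two designs yield opposite decisions from identical likelihoods:

```python
import math

theta = 0.5  # null value

# Binomial design (n = 12 fixed): P(X >= 9)
alpha1 = sum(math.comb(12, j) * theta**j * (1 - theta)**(12 - j)
             for j in range(9, 13))

# Negative binomial design (flip until r = 3 tails): P(X >= 9) = 1 - P(X <= 8)
alpha2 = 1 - sum(math.comb(2 + j, j) * theta**j * (1 - theta)**3
                 for j in range(0, 9))

print(round(alpha1, 4), round(alpha2, 4))  # 0.073 0.0327
```
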


Conditional (Bayesian) Perspective

Always condition on the data that have actually occurred; the long-run performance of a procedure is of (at most) secondary interest. Fix a prior distribution p(θ), and use Bayes' Theorem (1763):

p(θ|x) ∝ p(x|θ) p(θ)

("posterior ∝ likelihood × prior")

Indeed, it often turns out that using the Bayesian formalism with relatively vague priors produces procedures that perform well by traditional frequentist criteria (e.g., low mean squared error over repeated sampling)! – several examples in Chapter 4 of the C&L text!

Chapter 1: Approaches for Statistical Inference – p. 17/19
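A minimal Python sketch of the Bayesian calculation for the coin-tossing example (the choice of a uniform Beta(1, 1) prior is an illustrative assumption, not from the slides): since both sampling models give likelihoods ∝ θ^9 (1 − θ)^3, the posterior is the same Beta(10, 4) under either design, so the stopping rule drops out of the inference:

```python
import math

# Illustrative uniform Beta(1, 1) prior; data: 9 heads, 3 tails.
a0, b0 = 1, 1
heads, tails = 9, 3

# posterior ∝ likelihood × prior ∝ theta^(a0+heads-1) (1-theta)^(b0+tails-1),
# i.e. a Beta(a0 + heads, b0 + tails) = Beta(10, 4) density.
a_post, b_post = a0 + heads, b0 + tails

def beta_pdf(t, a, b):
    """Beta(a, b) density, using only the standard library."""
    return t**(a - 1) * (1 - t)**(b - 1) * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

# Posterior mean, and P(theta > 1/2 | x) by midpoint-rule integration.
n = 100_000
post_prob = sum(beta_pdf((i + 0.5) / n, a_post, b_post) for i in range(n // 2, n)) / n
print(round(a_post / (a_post + b_post), 3), round(post_prob, 3))  # 0.714 0.954
```

The posterior probability that θ exceeds 1/2 answers the test question directly, conditional only on the observed data, with no reference to "more extreme" unobserved outcomes.
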


Frequentist criticisms

[from B. Efron (1986, Amer. Statist.), "Why isn't everyone a Bayesian?"]

Shouldn't condition on x (but this renews conflict with the Likelihood Principle)

Not easy/automatic (but computing keeps improving: MCMC methods, WinBUGS software, ...)

How to pick the prior p(θ)? Two experimenters could get different answers with the same data! How to control the influence of the prior? How to get objective results (say, for a court case, scientific report, ...)?

⇒ Clearly this final criticism is the most serious!

Chapter 1: Approaches for Statistical Inference – p. 18/19


Bayesian Advantages in Inference

Ability to formally incorporate prior information

The reason for stopping experimentation does not affect the inference

Answers are more easily interpretable by nonspecialists (e.g. confidence intervals)

All analyses follow directly from the posterior; no separate theories of estimation, testing, multiple comparisons, etc. are needed

Any question can be directly answered (bioequivalence, multiple comparisons/hypotheses, ...)

Inferences are conditional on the actual data

Bayes procedures possess many optimality properties (e.g. consistent, impose parsimony in model choice, define the class of optimal frequentist procedures, ...)

Chapter 1: Approaches for Statistical Inference – p. 19/19