why we (usually) don't worry about multiple …gelman/presentations/multiple...why we (usually)...

118
What is the multiple comparisons problem? Why don’t I (usually) care? Some stories Statistical framework and multilevel modeling Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima Columbia University 8 Nov 2007 Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Upload: others

Post on 05-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why we (usually) don’t worry about multiplecomparisons

Andrew Gelman, Jennifer Hill, and Masanao YajimaColumbia University

8 Nov 2007

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 2: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Contents

I What is the multiple comparisons problem?

I Why don’t I (usually) care about it?

I Some stories

I Statistical framework and multilevel modeling

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 3: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Contents

I What is the multiple comparisons problem?

I Why don’t I (usually) care about it?

I Some stories

I Statistical framework and multilevel modeling

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 4: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Contents

I What is the multiple comparisons problem?

I Why don’t I (usually) care about it?

I Some stories

I Statistical framework and multilevel modeling

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 5: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Contents

I What is the multiple comparisons problem?

I Why don’t I (usually) care about it?

I Some stories

I Statistical framework and multilevel modeling

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 6: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Contents

I What is the multiple comparisons problem?

I Why don’t I (usually) care about it?

I Some stories

I Statistical framework and multilevel modeling

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 7: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

What is the multiple comparisons problem?

I Even if nothing is going on, you can find thingsI Data snoopingI Overwhelmed by data and plausible “findings”

I “If not accounted for, false positive differences are very likelyto be identified”: 5% of our 95% intervals will be wrong

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 8: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

What is the multiple comparisons problem?

I Even if nothing is going on, you can find thingsI Data snoopingI Overwhelmed by data and plausible “findings”

I “If not accounted for, false positive differences are very likelyto be identified”: 5% of our 95% intervals will be wrong

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 9: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

What is the multiple comparisons problem?

I Even if nothing is going on, you can find thingsI Data snoopingI Overwhelmed by data and plausible “findings”

I “If not accounted for, false positive differences are very likelyto be identified”: 5% of our 95% intervals will be wrong

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 10: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

What is the multiple comparisons problem?

I Even if nothing is going on, you can find thingsI Data snoopingI Overwhelmed by data and plausible “findings”

I “If not accounted for, false positive differences are very likelyto be identified”: 5% of our 95% intervals will be wrong

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 11: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

What is the multiple comparisons problem?

I Even if nothing is going on, you can find thingsI Data snoopingI Overwhelmed by data and plausible “findings”

I “If not accounted for, false positive differences are very likelyto be identified”: 5% of our 95% intervals will be wrong

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 12: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 13: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 14: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 15: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 16: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 17: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Why don’t I (usually) care about multiple comparisons?

I When looked at one way, multiple comparisons seem like amajor worry

I But from another perspective, they don’t matter at all:I I don’t (usually) study study phenomena with zero effectsI I don’t (usually) study comparisons with zero differencesI I don’t mind being wrong 5% of the time

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 18: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 19: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 20: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 21: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 22: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 23: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 24: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Some stories

I SAT coaching in 8 schools

I Effects of electromagnetic fields at 38 frequencies

I Teacher and school effects in NYC schools

I Grades and classroom seating

I Beautiful parents have more daughters

I Comparing test scores across states

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 25: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 26: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 27: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 28: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 29: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 30: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

SAT coaching in 8 schools

Estimated Standard errortreatment of effect

School effect, yj estimate, σj

A 28 15B 8 10C −3 16D 7 11E −1 9F 1 11G 18 10H 12 18

I Separate experiment in each schoolI Variation in treatment effects is indistinguishable from 0I Multilevel Bayes analysis

I Overlappling confidence intervals for the 8 school effectsI Statements such as Pr (effect in A > effect in C)= 0.7

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 31: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Effects of electromagnetic fields at 38 frequencies

0 100 200 300 400 500Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0.0

0.2

0.4

Estimates with statistical significance

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)E

st. t

reat

men

t effe

ct

I Background: electromagnetic fields and cancerI Original article summarized using p-valuesI Confidence intervals show comparisons more clearly

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 32: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Effects of electromagnetic fields at 38 frequencies

0 100 200 300 400 500Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0.0

0.2

0.4

Estimates with statistical significance

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)E

st. t

reat

men

t effe

ct

I Background: electromagnetic fields and cancerI Original article summarized using p-valuesI Confidence intervals show comparisons more clearly

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 33: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Effects of electromagnetic fields at 38 frequencies

0 100 200 300 400 500Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0.0

0.2

0.4

Estimates with statistical significance

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)E

st. t

reat

men

t effe

ct

I Background: electromagnetic fields and cancerI Original article summarized using p-valuesI Confidence intervals show comparisons more clearly

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 34: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Effects of electromagnetic fields at 38 frequencies

0 100 200 300 400 500Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0.0

0.2

0.4

Estimates with statistical significance

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)E

st. t

reat

men

t effe

ct

I Background: electromagnetic fields and cancerI Original article summarized using p-valuesI Confidence intervals show comparisons more clearly

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 35: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Separate estimates and hierarchical Bayes estimates

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0 100 200 300 400 500

0.0

0.2

0.4

Multilevel estimates ± standard errors

Frequency of magnetic field (Hz)E

stim

ated

trea

tmen

t effe

ct

I Most comparisons are no longer statistically significantI “Multiple comparisons” is less of a concernI We moved the intervals together instead of widening them!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 36: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Separate estimates and hierarchical Bayes estimates

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0 100 200 300 400 500

0.0

0.2

0.4

Multilevel estimates ± standard errors

Frequency of magnetic field (Hz)E

stim

ated

trea

tmen

t effe

ct

I Most comparisons are no longer statistically significantI “Multiple comparisons” is less of a concernI We moved the intervals together instead of widening them!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 37: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Separate estimates and hierarchical Bayes estimates

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0 100 200 300 400 500

0.0

0.2

0.4

Multilevel estimates ± standard errors

Frequency of magnetic field (Hz)E

stim

ated

trea

tmen

t effe

ct

I Most comparisons are no longer statistically significantI “Multiple comparisons” is less of a concernI We moved the intervals together instead of widening them!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 38: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Separate estimates and hierarchical Bayes estimates

0 100 200 300 400 500

0.0

0.2

0.4

Estimates ± standard errors

Frequency of magnetic field (Hz)

Est

. tre

atm

ent e

ffect

0 100 200 300 400 500

0.0

0.2

0.4

Multilevel estimates ± standard errors

Frequency of magnetic field (Hz)E

stim

ated

trea

tmen

t effe

ct

I Most comparisons are no longer statistically significantI “Multiple comparisons” is less of a concernI We moved the intervals together instead of widening them!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 39: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Teacher and school effects in NYC schools

I Goal is to estimate range of variation(How important are teachers? Schools?)

I Key statistic is year-to-year persistence (e.g., for teachersranked in top 25% one year, how well do they do the next?)

I The “multiple comparisons” issue never arises!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 40: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Teacher and school effects in NYC schools

I Goal is to estimate range of variation(How important are teachers? Schools?)

I Key statistic is year-to-year persistence (e.g., for teachersranked in top 25% one year, how well do they do the next?)

I The “multiple comparisons” issue never arises!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 41: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Teacher and school effects in NYC schools

I Goal is to estimate range of variation(How important are teachers? Schools?)

I Key statistic is year-to-year persistence (e.g., for teachersranked in top 25% one year, how well do they do the next?)

I The “multiple comparisons” issue never arises!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 42: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Teacher and school effects in NYC schools

I Goal is to estimate range of variation(How important are teachers? Schools?)

I Key statistic is year-to-year persistence (e.g., for teachersranked in top 25% one year, how well do they do the next?)

I The “multiple comparisons” issue never arises!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 43: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 44: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 45: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 46: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 47: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 48: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Grades and classroom seating

I Classroom demonstration

I Assign students random numbers as “grades”

I Ask students with “grades” 0–25 to raise one finger,students with “grades” 75–100 to raise one hand

I Instructor scans the room to find a statistically significantcomparison (e.g., “boys on the left side of the classroom havehigher grades than girls in the back row”)

I This is a pure multiple comparisons problem!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 49: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 50: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 51: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 52: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 53: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 54: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 55: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 56: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Beautiful parents have more daughters

I S. Kanazawa (2007). Beautiful parents have more daughters:a further implication of the generalized Trivers-Willardhypothesis. Journal of Theoretical Biology.

I Attractiveness was measured on a 1–5 scale(“very unattractive” to “very attractive”)

I 56% of children of parents in category 5 were girlsI 48% of children of parents in categories 1–4 were girls

I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)

I But the simple regression of sex ratio on attractiveness is notsignificant (estimate is 1.5 with s.e. of 1.4)

I Multiple comparisons problem: 5 natural comparisons × 4possible time summaries!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 57: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Comparing test scores across states

I National Assessment of Educational Progress (NAEP)

I Comparing states: which comparisons are statisticallysignificant?

I 50× 49/2: a classic multiple comparions problem!

I Our multilevel inferences

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 58: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Comparing test scores across states

I National Assessment of Educational Progress (NAEP)

I Comparing states: which comparisons are statisticallysignificant?

I 50× 49/2: a classic multiple comparions problem!

I Our multilevel inferences

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 59: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Comparing test scores across states

I National Assessment of Educational Progress (NAEP)

I Comparing states: which comparisons are statisticallysignificant?

I 50× 49/2: a classic multiple comparions problem!

I Our multilevel inferences

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 60: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Comparing test scores across states

I National Assessment of Educational Progress (NAEP)

I Comparing states: which comparisons are statisticallysignificant?

I 50× 49/2: a classic multiple comparions problem!

I Our multilevel inferences

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 61: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Comparing test scores across states

I National Assessment of Educational Progress (NAEP)

I Comparing states: which comparisons are statisticallysignificant?

I 50× 49/2: a classic multiple comparions problem!

I Our multilevel inferences

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 62: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Classical multiple comparisons inferences for NAEP

Ma

ine

(M

E)

Min

ne

sota

(M

N)

Co

nn

ect

icu

t (C

T)

Wis

con

sin

(W

I)

No

rth

Da

kota

(N

D)

Ind

ian

a (

IN)

Iow

a (

IA) ‡

Ma

ssa

chu

sett

s (M

A)

Texa

s (T

X)

Ne

bra

ska

(N

E)

Mo

nta

na

(M

T) ‡

New

Je

rsey

(N

J)‡

Uta

h (

UT

)

Mic

hig

an

(M

I)‡

Pe

nn

sylv

an

ia (

PA

)‡

Co

lora

do

(C

O)

Wa

shin

gto

n (

WA

)

Ve

rmo

nt

(VT

) ‡

Mis

sou

ri (

MO

)

No

rth

Ca

rolin

a (

NC

)

DD

ES

S (

DD

)

Ala

ska

(A

K) ‡

Ore

go

n (

OR

)

We

st V

irg

inia

(W

V)

Do

DD

S (

DO

)

Wyo

min

g (

WY

)

Vir

gin

ia (

VA

)

New

Yo

rk (

NY

) ‡

Ma

ryla

nd

(M

D)

Rh

od

e I

sla

nd

(R

I)

Ke

ntu

cky

(KY

)

Ten

ne

sse

e (

TN

)

Nev

ad

a (

NV

) ‡

Ari

zon

a (

AZ

)‡

Ark

an

sas

(AR

)

Flo

rid

a (

FL

)

Ge

org

ia (

GA

)

De

law

are

(D

E)

Haw

aii

(HI)

New

Mex

ico

(N

M)

So

uth

Ca

rolin

a (

SC

) ‡

Ala

ba

ma

(A

L)

Ca

lifo

rnia

(C

A)

Lo

uis

ian

a (

LA

)

Mis

siss

ipp

i (M

S)

Gu

am

(G

U)

Dis

tric

t o

f C

olu

mb

ia (

DC

)

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 63: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Classical inferences for NAEP: close-up

Ma

ine

(M

E)

Min

ne

sota

(M

N)

Co

nn

ect

icu

t (C

T)

Wis

con

sin

(W

I)

No

rth

Da

kota

(N

D)

Ind

ian

a (

IN)

Iow

a (

IA) ‡

Ma

ssa

chu

sett

s (M

A)

Texa

s (T

X)

Ne

bra

ska

(N

E)

Mo

nta

na

(M

T) ‡

New

Je

rsey

(N

J)‡

Uta

h (

UT

)

Mic

hig

an

(M

I)‡

Pe

nn

sylv

an

ia (

PA

)‡

Co

lora

do

(C

O)

Wa

shin

gto

n (

WA

)

Ve

rmo

nt

(VT

) ‡

Mis

sou

ri (

MO

)

No

rth

Ca

rolin

a (

NC

)

DD

ES

S (

DD

)

Ala

ska

(A

K) ‡

Ore

go

n (

OR

)

We

st V

irg

inia

(W

V)

Do

DD

S (

DO

)

Wyo

min

g (

WY

)

Vir

gin

ia (

VA

)

New

Yo

rk (

NY

) ‡

Ma

ryla

nd

(M

D)

Rh

od

e I

sla

nd

(R

I)

Ke

ntu

cky

(KY

)

Ten

ne

sse

e (

TN

)

Nev

ad

a (

NV

) ‡

Ari

zon

a (

AZ

)‡

Ark

an

sas

(AR

)

Flo

rid

a (

FL

)

Ge

org

ia (

GA

)

De

law

are

(D

E)

Haw

aii

(HI)

New

Mex

ico

(N

M)

So

uth

Ca

rolin

a (

SC

) ‡

Ala

ba

ma

(A

L)

Ca

lifo

rnia

(C

A)

Lo

uis

ian

a (

LA

)

Mis

siss

ipp

i (M

S)

Gu

am

(G

U)

Dis

tric

t o

f C

olu

mb

ia (

DC

)

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

ME

MN

C T

WI

ND

IN

IA

MA

T X

NE

MT

NJ

UT

MI

PA

C O

WA

V T

MO

NC

DD

AK

OR

WV

DO

WY

VA

NY

MD

R I

K Y

T N

NV

AZ

AR

F L

G A

DE

HI

NM

S C

AL

C A

LA

MS

G U

DC

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 64: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Multilevel inferences for NAEP: close-up

Ma

ine

M

inn

eso

ta

Co

nn

ect

icu

t W

isco

nsi

n

No

rth

Da

kota

In

dia

na

Io

wa

M

ass

ach

use

tts

Te

xas

Ne

bra

ska

M

on

tan

a

Ne

w J

ers

ey

Uta

h

Mic

hig

an

P

en

nsy

lva

nia

C

olo

rad

o

Wa

shin

gto

n

Ve

rmo

nt

Mis

sou

ri

No

rth

Ca

rolin

a

Ala

ska

O

reg

on

W

est

Vir

gin

ia

Wyo

min

g

Vir

gin

ia

Ne

w Y

ork

M

ary

lan

d

Rh

od

e I

sla

nd

K

en

tuck

y T

en

ne

sse

e

Ne

vad

a

Ari

zon

a

Ark

an

sas

Flo

rid

a

Ge

org

ia

De

law

are

H

aw

aii

Ne

w M

exi

co

So

uth

Ca

rolin

a

Ala

ba

ma

C

alif

orn

ia

Lo

uis

ian

a

Mis

siss

ipp

i

C omparis ons of A verage Mathematic s S c ale S c hores forG rade 4 P ublic S c hools in P artic ipating J uris dic tions

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

MS

LA

C A

AL

S C

NM

HI

DE

G A

F L

AR

AZ

NV

T N

K Y

R I

MD

NY

V A

WY

WV

OR

AK

NC

MO

V T

WA

C O

P A

MI

UT

NJ

MT

NE

T X

MA

IA

IN

ND

WI

C T

MN

ME

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 65: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

NAEP: classical vs. multilevel

I Both procedures are algorithmic (“push a button”)

I Both procedures treat 50 states exchangeably

I Multilevel inferences are sharper (more comparisons are“statistically significant”)

I How can this be?

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 66: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

NAEP: classical vs. multilevel

I Both procedures are algorithmic (“push a button”)

I Both procedures treat 50 states exchangeably

I Multilevel inferences are sharper (more comparisons are“statistically significant”)

I How can this be?

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 67: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

NAEP: classical vs. multilevel

I Both procedures are algorithmic (“push a button”)

I Both procedures treat 50 states exchangeably

I Multilevel inferences are sharper (more comparisons are“statistically significant”)

I How can this be?

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 68: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

NAEP: classical vs. multilevel

I Both procedures are algorithmic (“push a button”)

I Both procedures treat 50 states exchangeably

I Multilevel inferences are sharper (more comparisons are“statistically significant”)

I How can this be?

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 69: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

NAEP: classical vs. multilevel

I Both procedures are algorithmic (“push a button”)

I Both procedures treat 50 states exchangeably

I Multilevel inferences are sharper (more comparisons are“statistically significant”)

I How can this be?

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 70: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Something for nothing? A free lunch?

I Classical multiple comparisons worries aboutθ1 = θ2 = · · · = θ50

I Not an issue with NAEP

I Multilevel model estimates the group-level variance, decidesbased on the data how much to adjust

I Classical procedure does not learn from the data

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 71: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Something for nothing? A free lunch?

I Classical multiple comparisons worries aboutθ1 = θ2 = · · · = θ50

I Not an issue with NAEP

I Multilevel model estimates the group-level variance, decidesbased on the data how much to adjust

I Classical procedure does not learn from the data

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 72: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Something for nothing? A free lunch?

I Classical multiple comparisons worries aboutθ1 = θ2 = · · · = θ50

I Not an issue with NAEP

I Multilevel model estimates the group-level variance, decidesbased on the data how much to adjust

I Classical procedure does not learn from the data

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 73: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Something for nothing? A free lunch?

I Classical multiple comparisons worries aboutθ1 = θ2 = · · · = θ50

I Not an issue with NAEP

I Multilevel model estimates the group-level variance, decidesbased on the data how much to adjust

I Classical procedure does not learn from the data

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 74: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

SAT coaching in 8 schoolsEffects of electromagnetic fields at 38 frequenciesTeacher and school effects in NYC schoolsGrades and classroom seatingBeautiful parents have more daughtersComparing test scores across states

Something for nothing? A free lunch?

I Classical multiple comparisons worries aboutθ1 = θ2 = · · · = θ50

I Not an issue with NAEP

I Multilevel model estimates the group-level variance, decidesbased on the data how much to adjust

I Classical procedure does not learn from the data

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 75: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Message from the examples

I Classical multiple comparisons corrections don’t seem soimportant when we fit hierarchical models

I But they can be crucial for classical comparisons

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 76: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Message from the examples

I Classical multiple comparisons corrections don’t seem soimportant when we fit hierarchical models

I But they can be crucial for classical comparisons

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 77: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Message from the examples

I Classical multiple comparisons corrections don’t seem soimportant when we fit hierarchical models

I But they can be crucial for classical comparisons

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 78: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 79: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 80: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 81: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 82: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 83: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Statistical framework

I Goal is to estimate θj , for j = 1, . . . , J (for example, effects ofJ schools)

I Comparisons have the form, θj − θk .

I For simplicity, suppose data come from J separate experiments

I Type S errors

I Multilevel modeling as a solution to the multiple comparisonsissue

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 84: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 85: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 86: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 87: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 88: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 89: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 90: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 91: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 92: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 93: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 94: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Type S (sign) errors

I I’ve never made a Type 1 error in my lifeI Type 1 error is θj = θk , but I claim they’re differentI I’ve never studied anything where θj = θk

I I’ve never made a Type 2 error in my lifeI Type 2 error is θj 6= θk , but I claim they’re the sameI I’ve never claimed that θj = θk

I But I make errors all the time!

I Type S error: θ1 > θ2, but I claim that θ2 > θ1 (or vice versa)

I Type S errors can occur when we make claims with confidence(i.e., have confidence intervals for θj − θk that exclude zero)

I We want to make fewer claims with confidence when we areless certain about the ranking of the θj ’s

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 95: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 96: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 97: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 98: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 99: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 100: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 101: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 102: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 103: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 104: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 105: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Multilevel (hierarchical) modeling

I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:

I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking

comparisonsI Very few claims with confidence (far fewer than 5%)

I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed

I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 106: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 107: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 108: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 109: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 110: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 111: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Conclusions

I “Multiple comparisons” is a real concern, but . . .

I Don’t “fix” by altering p-values or (equivalently) by makingconfidence intervals wider

I Instead, multilevel modeling does partial pooling wherenecessary (especially when much of the variation in the datacan be explained by noise), so that few claims can be madewith confidence

I Learn from the data how much to adjust

I By comparison, classical intervals can have Type S error ratesas high as 50%: then, multiple comparisons can be a big deal!

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 112: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 1

I The key step in getting everything to work is to move the coefficientestimates: not just to adjust the p-values or widen intervals, but toactually move the estimates toward each other (or, more generally, towarda group-level regression model)

I But, to be honest, multilevel modeling can be difficult for complicatedstructures. For example, analyzing 16 schools or 16 outcomes is simpleenough, but a 24 structure (e.g., 2 groups × 2 outcomes × 2 flavors oftreatment × 2 time points) is more of a research problem.

I There was some discussion of main analyses as compared to minoranalyses, with the thought that different levels of significance can be usedfor these two areas. I would prefer to use the full model for all of this andthen report all the results graphically. In no case do I think would it benecessary to do a multiple comparisons adjustment. As noted in the firstbullet point above, the multilevel inference should do fine at combiningthe information.

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 113: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 1

I The key step in getting everything to work is to move the coefficientestimates: not just to adjust the p-values or widen intervals, but toactually move the estimates toward each other (or, more generally, towarda group-level regression model)

I But, to be honest, multilevel modeling can be difficult for complicatedstructures. For example, analyzing 16 schools or 16 outcomes is simpleenough, but a 24 structure (e.g., 2 groups × 2 outcomes × 2 flavors oftreatment × 2 time points) is more of a research problem.

I There was some discussion of main analyses as compared to minoranalyses, with the thought that different levels of significance can be usedfor these two areas. I would prefer to use the full model for all of this andthen report all the results graphically. In no case do I think would it benecessary to do a multiple comparisons adjustment. As noted in the firstbullet point above, the multilevel inference should do fine at combiningthe information.

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 114: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 1

I The key step in getting everything to work is to move the coefficientestimates: not just to adjust the p-values or widen intervals, but toactually move the estimates toward each other (or, more generally, towarda group-level regression model)

I But, to be honest, multilevel modeling can be difficult for complicatedstructures. For example, analyzing 16 schools or 16 outcomes is simpleenough, but a 24 structure (e.g., 2 groups × 2 outcomes × 2 flavors oftreatment × 2 time points) is more of a research problem.

I There was some discussion of main analyses as compared to minoranalyses, with the thought that different levels of significance can be usedfor these two areas. I would prefer to use the full model for all of this andthen report all the results graphically. In no case do I think would it benecessary to do a multiple comparisons adjustment. As noted in the firstbullet point above, the multilevel inference should do fine at combiningthe information.

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 115: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 1

I The key step in getting everything to work is to move the coefficientestimates: not just to adjust the p-values or widen intervals, but toactually move the estimates toward each other (or, more generally, towarda group-level regression model)

I But, to be honest, multilevel modeling can be difficult for complicatedstructures. For example, analyzing 16 schools or 16 outcomes is simpleenough, but a 24 structure (e.g., 2 groups × 2 outcomes × 2 flavors oftreatment × 2 time points) is more of a research problem.

I There was some discussion of main analyses as compared to minoranalyses, with the thought that different levels of significance can be usedfor these two areas. I would prefer to use the full model for all of this andthen report all the results graphically. In no case do I think would it benecessary to do a multiple comparisons adjustment. As noted in the firstbullet point above, the multilevel inference should do fine at combiningthe information.

Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 116: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 2

I The key concern about multiple comparisons is when the noise level ishigh, so that classical Type S error rates will be high. Having manycomparisons (100, or 1000, or whatever) is not a problem at all in thecontext of multilevel modeling. This gets back to our willingness to bewrong occasionally. If we are testing 1000 comparisons, and there’s reallynothing much going on, then the multilevel analysis will do lots ofpooling, and the result is to make very few claims with confidence.

I For this presentation, I’d been thinking about the sort of examples wherethe differences are unquestionably real, and the key concern is Type Serrors: for example, saying that School A is better than School B, whenSchool B is really better than School A. But at the meeting, there waslots of discussion of examples where the differences might actually bevery close to zero: for example, comparing different educationalinterventions, none of which might be very effective. Here we would wantto be thinking about “Type M” (magnitude) errors: saying that atreatment effect is near zero when it is actually large, or saying that it’slarge when it’s near zero.Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 117: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 2

I The key concern about multiple comparisons is when the noise level ishigh, so that classical Type S error rates will be high. Having manycomparisons (100, or 1000, or whatever) is not a problem at all in thecontext of multilevel modeling. This gets back to our willingness to bewrong occasionally. If we are testing 1000 comparisons, and there’s reallynothing much going on, then the multilevel analysis will do lots ofpooling, and the result is to make very few claims with confidence.

I For this presentation, I’d been thinking about the sort of examples wherethe differences are unquestionably real, and the key concern is Type Serrors: for example, saying that School A is better than School B, whenSchool B is really better than School A. But at the meeting, there waslots of discussion of examples where the differences might actually bevery close to zero: for example, comparing different educationalinterventions, none of which might be very effective. Here we would wantto be thinking about “Type M” (magnitude) errors: saying that atreatment effect is near zero when it is actually large, or saying that it’slarge when it’s near zero.Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models

Page 118: Why we (usually) don't worry about multiple …gelman/presentations/multiple...Why we (usually) don’t worry about multiple comparisons Andrew Gelman, Jennifer Hill, and Masanao Yajima

What is the multiple comparisons problem?Why don’t I (usually) care?

Some storiesStatistical framework and multilevel modeling

Message from the examplesStatistical frameworkConclusionsFurther thoughts

Further thoughts inspired by the workshop: part 2

I The key concern about multiple comparisons is when the noise level ishigh, so that classical Type S error rates will be high. Having manycomparisons (100, or 1000, or whatever) is not a problem at all in thecontext of multilevel modeling. This gets back to our willingness to bewrong occasionally. If we are testing 1000 comparisons, and there’s reallynothing much going on, then the multilevel analysis will do lots ofpooling, and the result is to make very few claims with confidence.

I For this presentation, I’d been thinking about the sort of examples wherethe differences are unquestionably real, and the key concern is Type Serrors: for example, saying that School A is better than School B, whenSchool B is really better than School A. But at the meeting, there waslots of discussion of examples where the differences might actually bevery close to zero: for example, comparing different educationalinterventions, none of which might be very effective. Here we would wantto be thinking about “Type M” (magnitude) errors: saying that atreatment effect is near zero when it is actually large, or saying that it’slarge when it’s near zero.Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models