# How Likely is Simpson's Paradox?

Post on 06-Aug-2015


Marios G. Pavlides (Department of Mechanical Engineering, Frederick University Cyprus, Box 24729, 1303 Nicosia, Cyprus; email: m.pavlides@frederick.ac.cy) and Michael D. Perlman (Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA; email: michael@stat.washington.edu)

January 9, 2009

## Abstract

What proportion of all $2 \times 2 \times 2$ contingency tables exhibit Simpson's Paradox? An approximate answer is obtained for large sample sizes and extended to $2 \times 2 \times \ell$ tables. Several conditional probabilities of the occurrence of Simpson's Paradox are also derived. Given that the observed cell frequencies satisfy a Simpson reversal, the posterior probability that the population parameters satisfy the same reversal is obtained. This Bayesian analysis is applied to the well-known Simpson reversal of the 1995-1997 batting averages of Derek Jeter and David Justice.

Keywords and phrases: Simpson, paradox, reversal, Bayesian, Dirichlet, distribution, multinomial

## 1 Introduction

The celebrated Simpson's Paradox occurs when two random events $A$ and $B$ are conditionally positively (negatively) correlated given a third event $C$, and also given its complement $\bar{C}$, but are unconditionally negatively (positively) correlated. That is, either

$$P(A \cap B \mid C) \ge P(A \mid C)\, P(B \mid C), \tag{1.1}$$

$$P(A \cap B \mid \bar{C}) \ge P(A \mid \bar{C})\, P(B \mid \bar{C}), \tag{1.2}$$

$$P(A \cap B) \le P(A)\, P(B), \tag{1.3}$$

with at least one inequality strict, or the three inequalities are replaced by their opposites. We shall refer to (1.1)-(1.3) as a positive Simpson reversal, and to their opposites as a negative Simpson reversal.

As an example of this phenomenon, Julious and Mullee (1994) report a study of two treatments for kidney stones in which the first treatment was more effective for both large stones and small stones but appeared less effective when the data were aggregated over both classes of stones.

Although a Simpson reversal at first appears paradoxical, many examples occur throughout the literature, especially in the social and biomedical sciences. See, for example, Bickel et al. (1975), Blyth (1972) and the references available at http://en.wikipedia.org/wiki/Simpsons_paradox.

Thus one may ask: Just how likely is a Simpson reversal? We shall address several versions of this question.

First, if $[p_{ijk} \mid i, j, k = 1, 2]$ are the eight cell probabilities in the $2 \times 2 \times 2$ table corresponding to the dichotomies $A$ or $\bar{A}$, $B$ or $\bar{B}$, and $C$ or $\bar{C}$, where $\sum_{i,j,k} p_{ijk} = 1$, then (1.1)-(1.3) can be expressed equivalently as

$$p_{111}\, p_{221} \ge p_{121}\, p_{211}, \tag{1.4}$$

$$p_{112}\, p_{222} \ge p_{122}\, p_{212}, \tag{1.5}$$

$$(p_{111} + p_{112})(p_{221} + p_{222}) \le (p_{121} + p_{122})(p_{211} + p_{212}). \tag{1.6}$$

**Question 1.1.** Assume that the array of cell probabilities $[p_{ijk}]$ is random and distributed uniformly over the probability simplex

$$S_8 \equiv \Big\{ [p_{ijk}] \;\Big|\; p_{ijk} \ge 0,\ \sum_{i,j,k} p_{ijk} = 1 \Big\} \tag{1.7}$$

(a 7-dimensional set in $\mathbb{R}^8$). What is the probability that $[p_{ijk}]$ satisfy a positive or negative Simpson reversal, i.e., satisfy the inequalities (1.4)-(1.6) or their opposites? (The answer is about 0.0166; see Section 2.)

It follows that if $[M_{ijk} \mid i, j, k = 1, 2]$ are the cell counts and $[\hat{p}_{ijk}] \equiv [M_{ijk}/n]$ the corresponding cell frequencies in a $2 \times 2 \times 2$ contingency table, then for large $n \equiv \sum_{i,j,k} M_{ijk}$, approximately 1.66% of all possible such frequency tables exhibit a Simpson reversal.

In Section 3, related questions involving conditional prior probabilities of a Simpson reversal are addressed, such as the following: Given that the events $A$ and $B$ are negatively correlated (as in (1.3)), what is the conditional prior probability that they are positively correlated under both $C$ and $\bar{C}$ (as in (1.1) and (1.2))? For example, if one treatment is less effective than another in the aggregate over large and small kidney stones, what is the prior probability that it is actually more effective for both large and small kidney stones considered separately? More precisely:

**Question 1.2.** Again assume that $[p_{ijk}]$ is uniformly distributed over $S_8$. Given that $[p_{ijk}]$ satisfies (1.6), what is the probability that $[p_{ijk}]$ satisfies (1.4) and (1.5)? Conversely, given that $[p_{ijk}]$ satisfies (1.4) and (1.5), what is the probability that $[p_{ijk}]$ satisfies (1.6)? (The answers are surprisingly easy to obtain.)

Lastly, if it is assumed that

$$[M_{ijk}] \mid [p_{ijk}] \sim M_8(n; [p_{ijk}]), \tag{1.8}$$

the multinomial distribution with 8 cells, $n$ total observations, and cell probabilities $[p_{ijk}]$, then we can pose the following inferential question.

**Question 1.3.** Again assume that $[p_{ijk}]$ is uniformly distributed over $S_8$. If the observed cell frequencies $[\hat{p}_{ijk}]$ exhibit a Simpson reversal, what is the posterior probability that $[p_{ijk}]$ satisfy the same Simpson reversal?

This question is addressed in Section 4 as a Bayesian decision problem under both the uniform prior and the Jeffreys invariant prior on $S_8$ for the multinomial sampling model. In fact, the method can be applied for all Dirichlet prior distributions on $S_8$, which class is rich enough to include informative priors, i.e., those that actually reflect prior information.

These three questions will also be addressed for the more general case where the third (conditioning) factor $C$ has more than 2 levels. Section 4 also contains several illustrative examples, including an analysis of the well-known Simpson reversal of the batting averages of baseball stars Derek Jeter and David Justice.
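The cross-product conditions (1.4)-(1.6) lend themselves to a mechanical check. The following is a minimal sketch, not taken from the paper: the function name `simpson_reversal`, the 0-based nested-list layout `table[i][j][k]`, and the example counts are all hypothetical choices made for illustration. Because the inequalities are unchanged by rescaling, the check can be run on raw counts as well as on probabilities.

```python
def simpson_reversal(table):
    """Classify a 2x2x2 table using the cross-product form (1.4)-(1.6).

    table[i][j][k] holds the cell for A-level i, B-level j, C-level k
    (0-based indices; this layout is an assumption of the sketch).
    Returns "positive", "negative", or None.
    """
    # Conditional cross-product differences, one per level of C: (1.4), (1.5)
    cond = [
        table[0][0][k] * table[1][1][k] - table[0][1][k] * table[1][0][k]
        for k in range(2)
    ]
    # Aggregate over C, then form the unconditional difference: (1.6)
    agg = [[table[i][j][0] + table[i][j][1] for j in range(2)] for i in range(2)]
    marg = agg[0][0] * agg[1][1] - agg[0][1] * agg[1][0]

    # "At least one inequality strict" excludes the all-equalities case
    if cond[0] == 0 and cond[1] == 0 and marg == 0:
        return None
    if cond[0] >= 0 and cond[1] >= 0 and marg <= 0:
        return "positive"
    if cond[0] <= 0 and cond[1] <= 0 and marg >= 0:
        return "negative"
    return None


# Hypothetical counts (not data from the paper): A = treatment 1 vs. 2,
# B = success vs. failure, C = two patient groups.
table = [
    [[8, 40], [2, 60]],   # treatment 1: successes per group, failures per group
    [[70, 3], [30, 7]],   # treatment 2: successes per group, failures per group
]
print(simpson_reversal(table))
```

In this made-up table, treatment 1 has the higher success rate within each group (8/10 vs. 70/100 and 40/100 vs. 3/10) but the lower rate in the aggregate (48/110 vs. 73/110), so the check reports a positive reversal.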
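Some related topics are discussed in Section 5.

Simpson's Paradox is usually attributed to Simpson (1951). However, according to Good and Mittal (1987, p. 695), the underlying idea goes back at least as far as Pearson et al. (1899) and Yule (1903). Good and Mittal (1987) attribute the term Simpson's Paradox to Blyth (1972).

## 2 The prior probability of Simpson's Paradox

Suppose that factors $A$ and $B$ each have 2 levels while factor $C$ has $\ell$ levels, $\ell \in \{2, 3, \ldots\}$. As in Section 1 we could denote the $2 \times 2 \times \ell$ table of population cell probabilities by $[p_{ijk}]$. However, to avoid triple subscripts, we instead represent $p_{ijk}$ by $p_{4k+2i+j-6}$ for $i, j = 1, 2$ and $k = 1, \ldots, \ell$. Thus a positive Simpson reversal occurs when

$$p_1 p_4 \ge p_2 p_3, \tag{$R_1^+$}$$

$$p_5 p_8 \ge p_6 p_7, \tag{$R_2^+$}$$

$$\vdots$$

$$p_{4\ell-3}\, p_{4\ell} \ge p_{4\ell-2}\, p_{4\ell-1}, \tag{$R_\ell^+$}$$

but

$$\Big(\sum_{k=1}^{\ell} p_{4k-3}\Big)\Big(\sum_{k=1}^{\ell} p_{4k}\Big) \le \Big(\sum_{k=1}^{\ell} p_{4k-2}\Big)\Big(\sum_{k=1}^{\ell} p_{4k-1}\Big), \tag{$R_{\ell+1}^+$}$$

with at least one inequality strict. A negative Simpson reversal occurs when these $\ell + 1$ inequalities are replaced by their opposites, which events we denote by $R_1^-, \ldots, R_\ell^-, R_{\ell+1}^-$. Thus:

$$\{\text{positive Simpson reversal}\} = \bigcap_{k=1}^{\ell+1} R_k^+, \tag{2.1}$$

$$\{\text{negative Simpson reversal}\} = \bigcap_{k=1}^{\ell+1} R_k^-, \tag{2.2}$$

$$\{\text{Simpson's Paradox}\} = \Big(\bigcap_{k=1}^{\ell+1} R_k^+\Big) \cup \Big(\bigcap_{k=1}^{\ell+1} R_k^-\Big). \tag{2.3}$$

The array of cell probabilities $p \equiv [p_1, \ldots, p_{4\ell}]$ for the $2 \times 2 \times \ell$ table lies in the probability simplex

$$S_{4\ell} := \big\{ p \;\big|\; p_1 \ge 0, \ldots, p_{4\ell} \ge 0,\ p_1 + \cdots + p_{4\ell} = 1 \big\}, \tag{2.4}$$

a $(4\ell - 1)$-dimensional set in $\mathbb{R}^{4\ell}$. The general Dirichlet distribution on $S_{4\ell}$, denoted by $D_{4\ell}(\boldsymbol{\alpha})$, has probability density function proportional to

$$\prod_{r=1}^{4\ell} p_r^{\alpha_r - 1}, \tag{2.5}$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{4\ell})$ and each $\alpha_r > 0$ (cf. Hartigan (1983, p. 97)).

Because $\bigcap_{k=1}^{\ell+1} R_k^+$ and $\bigcap_{k=1}^{\ell+1} R_k^-$ are essentially disjoint events in $S_{4\ell}$,

$$\pi_\ell(\boldsymbol{\alpha}) := \Pr\big(\text{Simpson's Paradox under } D_{4\ell}(\boldsymbol{\alpha})\big) \tag{2.6}$$

$$= P\Big(\bigcap_{k=1}^{\ell+1} R_k^+\Big) + P\Big(\bigcap_{k=1}^{\ell+1} R_k^-\Big) \tag{2.7}$$

$$=: \pi_\ell^+(\boldsymbol{\alpha}) + \pi_\ell^-(\boldsymbol{\alpha}). \tag{2.8}$$

The following classical representation (cf. Wilks (1962, Section 7.7) or Hartigan (1983, p. 97)) of the general Dirichlet distribution in terms of independent gamma random variables simplifies the calculation of probabilities of events in $S_{4\ell}$ that involve $R_1^+, \ldots, R_\ell^+, R_{\ell+1}^+$ and/or $R_1^-, \ldots, R_\ell^-, R_{\ell+1}^-$, such as $\bigcap_{k=1}^{\ell+1} R_k^+$ and $\bigcap_{k=1}^{\ell+1} R_k^-$.

**Proposition 2.1.** Let $V_1, \ldots, V_{4\ell}$ be independent random variables with $V_r \sim \text{Gamma}(\alpha_r, 1)$ for $r = 1, \ldots, 4\ell$. Then

$$p := \Big(\frac{V_1}{\sum_{r=1}^{4\ell} V_r}, \ldots, \frac{V_{4\ell}}{\sum_{r=1}^{4\ell} V_r}\Big) \sim D_{4\ell}(\boldsymbol{\alpha}). \tag{2.9}$$

Thus $R_1^+, \ldots, R_\ell^+, R_{\ell+1}^+$ can be expressed in terms of $V_1, \ldots, V_{4\ell}$ as follows:

$$V_1 V_4 \ge V_2 V_3, \tag{$R_1^+$}$$

$$\vdots$$

$$V_{4\ell-3}\, V_{4\ell} \ge V_{4\ell-2}\, V_{4\ell-1}, \tag{$R_\ell^+$}$$

$$\Big(\sum_{k=1}^{\ell} V_{4k-3}\Big)\Big(\sum_{k=1}^{\ell} V_{4k}\Big) \le \Big(\sum_{k=1}^{\ell} V_{4k-2}\Big)\Big(\sum_{k=1}^{\ell} V_{4k-1}\Big). \tag{$R_{\ell+1}^+$}$$

**Remark 2.1.** It follows from this representation that the events $R_1^+, \ldots, R_\ell^+$ are mutually independent under any Dirichlet distribution on $S_{4\ell}$; the same is true for $R_1^-, \ldots, R_\ell^-$.

By simulating the independent gamma random variables $V_1, \ldots, V_{4\ell}$ we can apply Proposition 2.1 to approximate the prior probability of a Simpson reversal under any Dirichlet distribution $D_{4\ell}(\boldsymbol{\alpha})$ on $S_{4\ell}$.

When $\boldsymbol{\alpha} = (\alpha, \ldots, \alpha)$ with $\alpha > 0$ we write $D_{4\ell}(\boldsymbol{\alpha})$ as $D_{4\ell}(\alpha)$, a symmetric (i.e., exchangeable) distribution on $S_{4\ell}$. Note that $D_{4\ell}(1)$ is the uniform distribution on $S_{4\ell}$, while $D_{4\ell}(0.5)$ is the Jeffreys invariant prior for the multinomial model (cf. Hartigan (1983, p. 100); Davison (2003, p. 575)).

We now approximate the prior probability of Simpson's Paradox under $D_{4\ell}(\alpha)$ for several values of $\ell$ and $\alpha$. Here $V_1, \ldots, V_{4\ell}$ are i.i.d. $\text{Gamma}(\alpha, 1)$ random variables, hence exchangeable, so the positive and negative Simpson reversals $\bigcap_{k=1}^{\ell+1} R_k^+$ and $\bigcap_{k=1}^{\ell+1} R_k^-$ have the same prior probabilities. Thus from (2.7),

$$\pi_\ell(\alpha) := \Pr\big(\text{Simpson's Paradox under } D_{4\ell}(\alpha)\big) \tag{2.10}$$

$$= 2\, P\Big(\bigcap_{k=1}^{\ell+1} R_k^+\Big) \tag{2.11}$$

$$=: 2\, \pi_\ell^+(\alpha). \tag{2.12}$$

Table 1: 99% confidence intervals for $\pi_\ell(\alpha)$ under the symmetric Dirichlet prior $D_{4\ell}(\alpha)$. The confidence intervals are based on $10^7$ Monte Carlo simulations of the i.i.d. $\text{Gamma}(\alpha, 1)$ random variables $V_1, \ldots, V_{4\ell}$ (see (2.9)).

Table 1 displays 99% confidence intervals for $\pi_\ell(\alpha)$ based on $10^7$ Monte Carlo simulations of $V_1, \ldots, V_{4\ell}$. The following features are noteworthy:

(a) [$\alpha = 1$]: As stated earlier, this case corresponds to the uniform prior on the probability simplex $S_{4\ell}$. The 99% confidence interval for $\pi_2(1)$ is $0.0166 \pm .00010$, so the prior probability of Simpson's Paradox in a $2 \times 2 \times 2$ table under the uniform prior is 0.0166 (correct to three significant figures).

(b) [$\alpha = 0.5$]: This case corresponds to the Jeffreys prior on $S_{4\ell}$. The 99% confidence interval for $\pi_2(0.5)$ is $0.0267 \pm .00013$, so the prior probability of Simpson's Paradox in a $2 \times 2 \times 2$ table under the Jeffreys prior is about 0.0267. That this exceeds the value 0.0166 under the uniform prior is expected because the Jeffreys prior puts more of its mass near the boundary of $S_{4\ell}$, where the Simpson reversals tend to be most pronounced. (See, for example, Blyth (1972, p. 365).)

(c) For each fixed $\ell$, $\pi_\ell(\alpha)$ decreases as $\alpha$ increases. This can be explained as follows. If $p \sim D_{4\ell}(\alpha)$ then for each $r = 1, \ldots, 4\ell$,

$$E(p_r) = \frac{1}{4\ell}, \tag{2.13}$$

$$\operatorname{Var}(p_r) = \frac{4\ell - 1}{(4\ell)^2 (4\ell\alpha + 1)} = O\Big(\frac{1}{\alpha}\Big), \tag{2.14}$$

(cf. Wilks (1962, p. 179)). Thus as $\alpha$ increases, the distribution of $p$ moves away from the boundary of $S_{4\ell}$ and concentrates around its central point $(1/(4\ell), \ldots, 1/(4\ell))$, where the inequalities in $R_1^+, \ldots, R_{\ell+1}^+$ and $R_1^-, \ldots, R_{\ell+1}^-$ become equalities, so the Simpson reversals should become less likely.

(d) For each fixed $\alpha$, $\pi_\ell(\alpha)$ decreases as $\ell$ increases. This might be expected, because a reversal from $\ell$ positive conditional correlations to one negative unconditional correlation seems increasingly unlikely for large $\ell$.

In fact it is easy to show that $\pi_\ell(\alpha)$ decreases at least exponentially fast in $\ell$: From (2.11), Remark 2.1, and the exchangeability of $V_1, \ldots, V_{4\ell}$,

$$\pi_\ell(\alpha) \le 2\, P\Big(\bigcap_{k=1}^{\ell} R_k^+\Big) = \frac{1}{2^{\ell-1}}. \tag{2.15}$$

We conjecture that $\pi_\ell(\alpha)$ decreases at an exponential rate.

**Conjecture 2.1.** For each $\alpha > 0$ there exists $k_1(\alpha) > 0$ such that

$$\pi_\ell(\alpha) \approx \pi_2(\alpha)\, \exp\Big\{-k_1(\alpha)\Big(\frac{\ell}{2} - 1\Big)\Big\}, \qquad \ell = 2, 3, \ldots$$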
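The gamma representation of Proposition 2.1 suggests a direct Monte Carlo recipe for the prior probability of Simpson's Paradox under a symmetric Dirichlet prior. The sketch below is an illustration under stated assumptions rather than the authors' code: the function name `estimate_paradox_probability`, its parameters (`alpha` for the common Dirichlet parameter, `levels` for the number of levels of factor C, `n_sims`, `seed`), and the use of NumPy are choices made here. It uses far fewer draws than the $10^7$ reported for Table 1. Normalizing the gamma variables is skipped because every inequality compares products of the same degree, so rescaling all variables by their sum does not change which inequalities hold.

```python
import numpy as np


def estimate_paradox_probability(alpha=1.0, levels=2, n_sims=200_000, seed=0):
    """Monte Carlo estimate of the prior probability of Simpson's Paradox
    in a 2 x 2 x levels table under a symmetric Dirichlet(alpha) prior.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    # v[s, k, :] holds the four gamma variables for level k of C in
    # simulation s, in the order V_{4k-3}, V_{4k-2}, V_{4k-1}, V_{4k}
    v = rng.gamma(shape=alpha, scale=1.0, size=(n_sims, levels, 4))

    # Conditional cross-product differences, one per level of C
    cond = v[..., 0] * v[..., 3] - v[..., 1] * v[..., 2]

    # Aggregate over the levels of C, then the unconditional difference
    s = v.sum(axis=1)
    marg = s[:, 0] * s[:, 3] - s[:, 1] * s[:, 2]

    # Ties have probability zero for continuous draws, so the
    # "at least one inequality strict" clause needs no separate handling
    positive = (cond >= 0).all(axis=1) & (marg <= 0)
    negative = (cond <= 0).all(axis=1) & (marg >= 0)
    return float((positive | negative).mean())


if __name__ == "__main__":
    for alpha in (0.5, 1.0):
        est = estimate_paradox_probability(alpha=alpha, levels=2)
        print(f"alpha={alpha}: estimated probability {est:.4f}")
```

With `alpha=1.0` and `levels=2` the estimate should land near the 0.0166 quoted in feature (a), up to Monte Carlo error, and for any `levels` it should respect the bound in (2.15).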