May 2, 2017, BFF4: Mayo comments on Reid
TRANSCRIPT
<1>
Fusion Confusion? Comments on Nancy Reid: “BFF Four–Are we Converging?”
Deborah G. Mayo
The Fourth Bayesian, Fiducial and Frequentist Workshop (BFF4): Harvard University
May 2, 2017
<2>
I’m delighted to be part of a workshop linking statistics and philosophy of statistics! I thank the organizers for inviting me.
Nancy Reid’s “BFF Four–Are we Converging?” gives numerous avenues for discussion.
She zeroes in on obstacles to fusion: confusion or disagreement on the nature of probability and its use in statistical inference
<3>
From Nancy Reid: Nature of probability
probability to describe physical haphazard variability
• probabilities represent features of the “real” world in idealized form
• subject to empirical test and improvement
• conclusions of statistical analysis expressed in terms of interpretable parameters
• enhanced understanding of the data generating process
probability to describe the uncertainty of knowledge
• measures rational, supposedly impersonal, degree of belief given relevant information (Jeffreys)
• measures a particular person’s degree of belief, subject typically to some constraints of self-consistency …
• often linked with personal decision-making
<4>
• As is common, she labels the second “epistemological”
But a key question for me is: what’s relevant for a normative epistemology, for an account of what’s warranted/unwarranted to infer?
<5>
Reid quite rightly asks:
• in what sense are confidence distribution functions, significance functions, structural or fiducial probabilities to be interpreted?
• empirically? degree of belief?
• literature is not very clear
<6>
Reid: We may avoid the need for a different version of probability by appeal to a notion of calibration (Cox 2006, Reid & Cox 2015)
• This is my central focus
I approach this indirectly, with an analogy between philosophy of statistics and statistics
<7>
Carnap : Bayesians as Popper : Frequentists (N-P/Fisher)
Can’t solve induction but can build logics of induction or confirmation theories (e.g., Carnap 1962).
• Define a confirmation relation: C(H,e) (written with “,” rather than “|”)
• logical probabilities deduced from first order languages
• to measure the “degree of implication” or confirmation that e affords H (syntactical)
<8>
Problems
• Languages too restricted
• There was a continuum of inductive logics (tried to restrict via “inductive intuition”)
• How can a priori assignments of probability be relevant to reliability? (“guide to life”)
• Few philosophers of science are logical positivists, but the hankering for a logic of induction remains in some quarters
<9>
Popper: “In opposition to [the] inductivist attitude, I assert that C(H,e) must not be interpreted as the degree of corroboration of H by e, unless e reports the results of our sincere efforts to overthrow H.” (Popper 1959, 418)
“The requirement of sincerity cannot be formalized--” (ibid.)
“Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory – or in other words, only if they result from serious attempts to refute the theory.” (Popper 1994, 89)
- never successfully formulated the notion
<10>
Ian Hacking (1965) gives a logic of induction that does not require priors, based on the (Barnard, Royall, Edwards) “Law of Likelihood”: x supports hypothesis H1 more than H0 if Pr(x;H1) > Pr(x;H0) (i.e., if the likelihood ratio LR > 1).
George Barnard: “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (1972, 129).
Pr(LR in favor of H1 over H0; H0) = high.
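Barnard's point can be checked numerically. The following is a minimal sketch, not something from the slides: it assumes Normal data with known σ, a point null H0: μ = 0, and a rival H1 picked after the fact to be whatever mean the data happen to show (μ = observed sample mean). The function name and all settings are hypothetical.

```python
import random

def fraction_favoring_posthoc_rival(n=20, sigma=1.0, trials=2000, seed=1):
    """Share of samples generated under H0 (mu = 0) whose likelihood ratio
    favors a rival H1 chosen post hoc as mu = observed sample mean."""
    rng = random.Random(seed)
    favoring = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        # For Normal data with known sigma:
        # log LR = log Pr(x; mu = xbar) - log Pr(x; mu = 0)
        #        = n * xbar^2 / (2 * sigma^2), which is never negative.
        log_lr = n * xbar ** 2 / (2 * sigma ** 2)
        if log_lr > 0:  # LR > 1, i.e. the data "support" H1 over H0
            favoring += 1
    return favoring / trials

print(fraction_favoring_posthoc_rival())
```

Under these assumptions the printed fraction should be essentially 1 even though H0 is true in every simulated sample, which is the sense in which Pr(LR in favor of H1 over H0; H0) is high.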
<11>
Neyman-Pearson: “In order to fix a limit between ‘small’ and ‘large’ values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis.” (Pearson and Neyman 1967, 106)
Sampling distribution of LR
A crucial criticism in statistical foundations
<12>
In statistics:
“Sampling distributions, significance levels, power, all depend on something more [than the likelihood function] – something that is irrelevant in Bayesian inference – namely the sample space.” (Lindley 1971, 436)
Once the data are in hand: Inference should follow the Likelihood Principle (LP)
In philosophy (R. Rosenkrantz defending the LP):
“The LP implies … the irrelevance of predesignation, of whether a hypothesis was thought of beforehand or was introduced to explain known effects.” (Rosenkrantz 1977, 122)
(don’t mix discovery with justification)
<13>
Probabilism vs Performance
• Are you looking for a way to assign degree of belief, confirmation, support in a hypothesis – considered epistemological
• Or to ensure long-run reliability of methods, coverage probabilities (via the sampling distribution) – considered only for long-run behavior, acceptance sampling
<14>
We require a third role:
• Probativism (severe-testing). To assess and control erroneous interpretations of data, post-data
The problems with selective reporting (Fisher) and non-novel data (Popper) are not problems about long-runs. It’s that we cannot say about the case at hand that it has done a good job of avoiding the sources of misinterpretation.
<15>
Ian Hacking: “there is no such thing as a logic of statistical inference” (1980, 145)
Though I’m responsible for much of the criticism…. “I now believe that Neyman, Peirce, and Braithwaite were on the right lines to follow in the analysis of inductive arguments”
• Probability enters to qualify a claim inferred, it reports the method’s capabilities to control and alert us to erroneous interpretations (error probabilities)
• Assigning probability to the conclusion rather than the method is “founded on a false analogy with deductive logic” (Hacking, 141). – he’s convinced by Peirce
<16>
The only two who are clear on the false analogy:
Fisher (1935, 54):
“In deductive reasoning all knowledge obtainable is already latent in the postulates... The conclusions are never more accurate than the data. In inductive reasoning .. [t]he conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based.”
Peirce (“The Probability of Induction” 1878):
“In the case of analytic [deductive] inference we know the probability of our conclusion (if the premises are true), but in the case of synthetic [inductive] inferences we only know the degree of trustworthiness of our proceeding.”
<17>
Neyman and His Performance
You could say Neyman gets his performance idea trying to clarify Fisher’s fiducial intervals
• Neyman thought his confidence intervals were the same as Fisher’s fiducial intervals.
• In a (1934) paper (to generalize fiducial limits), Neyman said a confidence coefficient refers to “the probability of our being right when applying a certain rule” for making statements set out in advance. (623)
• Fisher was highly complimentary: Neyman “had every reason to be proud of the line of argument he had developed for its perfect clarity.” (Fisher comment in Neyman 1934, 618)
<18>
Neyman thinks he’s clarifying Fisher’s (1936, 253) equivocal reference to the “aggregate of all such statements…”.[1]
“This then is a definite probability statement about the unknown parameter…” (Fisher 1930, 533)
<19>
It’s interesting, too, to hear Neyman’s response to Carnap’s criticism of “Neyman’s frequentism”
Neyman: “I am concerned with the term ‘degree of confirmation’ introduced by Carnap. … [if] the application of the locally best one-sided test … failed to reject the [test] hypothesis …” (Neyman 1955, 40)
The question is: does a failure to reject the hypothesis confirm it?
A sample X = (X1, …, Xn), each Xi is Normal, N(μ, σ²) (NIID), σ assumed known;
H0: μ ≤ μ0 against H1: μ > μ0.
Test fails to reject H0, d(x0) ≤ cα.
<20>
Carnap says yes…
Neyman: “…. the attitude described is dangerous. … the chance of detecting the presence [of discrepancy δ from H0], when only [this number of] observations are available, is extremely slim, even if [δ is present].” (Neyman 1955, 41)
“The situation would have been radically different if the power function … were greater than … 0.95.” (ibid.)
Merely surviving the statistical test is too easy, occurs too frequently, even when H0 is false.
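Neyman's contrast between slim and high power can be made concrete for the one-sided Normal test described on the previous slide. This is a rough sketch with hypothetical numbers (δ = 0.2, σ = 1, n = 25 versus n = 400, α = 0.05), none of which come from the talk or from Neyman's paper; the function name is likewise made up.

```python
from statistics import NormalDist

def power_one_sided_z(delta, n, sigma, alpha=0.05):
    """Power of the one-sided z-test of H0: mu <= mu0 vs H1: mu > mu0
    against the alternative mu = mu0 + delta (sigma known).
    The test rejects when d(X) = sqrt(n) * (Xbar - mu0) / sigma > c_alpha."""
    z = NormalDist()
    c_alpha = z.inv_cdf(1 - alpha)
    shift = delta * n ** 0.5 / sigma  # mean of d(X) when mu = mu0 + delta
    return 1 - z.cdf(c_alpha - shift)

# Hypothetical numbers: same discrepancy, small versus large sample.
low = power_one_sided_z(delta=0.2, n=25, sigma=1.0)
high = power_one_sided_z(delta=0.2, n=400, sigma=1.0)
print(f"power with n=25:  {low:.2f}")
print(f"power with n=400: {high:.2f}")
```

With the small sample the test would usually fail to reject even if the discrepancy were present, so a non-rejection says little; with the large sample the power exceeds the 0.95 mark Neyman mentions.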
<21>
A post-data analysis is even better*:
Mayo and Cox 2006 (“Frequentist principle of evidence”):
“FEV: insignificant result: A moderate P-value is evidence of the absence of a discrepancy δ from H0, only if there is a high probability (1 – c) the test would have given a worse fit with H0 (i.e., d(X) > d(x0)) were a discrepancy δ to exist.” (83-4)
If Pr(d(X) > d(x0); μ = μ0 + δ) is high
d(X) ≤ d(x0);
infer: any discrepancy from μ0 < δ [Infer: μ < CIu]
(*severity for “acceptance”: Mayo & Spanos 2006/2011)
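The probability in FEV can be computed directly for the one-sided Normal test with known σ. The sketch below uses a hypothetical insignificant result (observed mean 0.1, μ0 = 0, n = 100, σ = 1, so d(x0) = 1); these numbers and the function name are illustrative assumptions, not taken from Mayo and Cox.

```python
from statistics import NormalDist

def severity_of_upper_bound(xbar, mu0, delta, n, sigma):
    """Pr(d(X) > d(x0); mu = mu0 + delta) for the one-sided z-test,
    i.e. the probability of a worse fit with H0 than the one observed,
    were the discrepancy delta to exist."""
    se = sigma / n ** 0.5
    d_obs = (xbar - mu0) / se   # observed d(x0)
    shift = delta / se          # mean of d(X) when mu = mu0 + delta
    return 1 - NormalDist().cdf(d_obs - shift)

# Hypothetical insignificant result: xbar = 0.1, mu0 = 0, n = 100, sigma = 1.
for delta in (0.1, 0.2, 0.3, 0.4):
    sev = severity_of_upper_bound(xbar=0.1, mu0=0.0, delta=delta, n=100, sigma=1.0)
    print(f"delta = {delta:.1f}: Pr(d(X) > d(x0); mu0 + delta) = {sev:.3f}")
```

The same insignificant result gives poor grounds for ruling out a small discrepancy and good grounds for ruling out a larger one, so the inference depends on which δ is in question.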
<22>
How to justify detaching the inference?
Rubbing off: The procedure is rarely wrong, therefore, the probability it is wrong in this case is low.
What’s rubbed off?
(could be a probabilism or a performance)
Bayesian epistemologists:
(Having no other relevant information): A rational degree of belief or epistemic probability rubs off
Attaching the probability to the claim differs from a report of well-testedness of the claim
<23>
Severe Probing Reasoning
The reasoning of the severe testing theorist is counterfactual:
H: μ ≤ x̄0 + 1.96σx (x̄0 the observed sample mean, σx its standard error)
(i.e., μ ≤ CIu)
H passes severely because were this inference false, and the true mean μ > CIu, then, very probably, we would have observed a larger sample mean. (I don’t saddle Cox with my take, nor Popper)
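The counterfactual can be simulated at the boundary case μ = CIu, which is the least favorable value among those that would make H false. This is an illustrative sketch only: the observed mean, σ, n, the seed and the function name are hypothetical, and σx is taken to be the standard error σ/√n.

```python
import random

def prob_larger_mean_if_mu_at_upper_limit(xbar_obs, sigma, n, trials=20000, seed=7):
    """Simulate the counterfactual: were mu equal to CIu = xbar_obs + 1.96 * sigma_x,
    how often would a new sample mean exceed the one actually observed?"""
    rng = random.Random(seed)
    sigma_x = sigma / n ** 0.5           # standard error of the mean
    ci_upper = xbar_obs + 1.96 * sigma_x
    larger = 0
    for _ in range(trials):
        # The mean of n NIID Normal draws is itself Normal(mu, sigma_x).
        simulated_mean = rng.gauss(ci_upper, sigma_x)
        if simulated_mean > xbar_obs:
            larger += 1
    return larger / trials

print(prob_larger_mean_if_mu_at_upper_limit(xbar_obs=0.1, sigma=1.0, n=100))
```

The simulated frequency should land close to 0.975, and it would only be higher for any μ beyond CIu, which is what "very probably, we would have observed a larger sample mean" amounts to here.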
<24>
How Well Tested (Corroborated, Probed) ≠ How Probable
We can build a “logic” for severity (it won’t be probability)
• both C and ~C can be poorly tested
• low severity is not just a little bit of evidence, but bad or no evidence
• Formal error probabilities may serve to quantify probativeness or severity of tests (for a given inference), but they do not automatically give this: they must be relevant
<25>
What Nancy Reid’s paper got me thinking about is the calibration point:
Here’s the longer quote: “We may avoid the need for a different version of probability by appeal to a notion of calibration, as measured by the behaviour of a procedure under hypothetical repetition. That is, we study assessing uncertainty, as with other measuring devices, by assessing the performance of proposed methods under hypothetical repetition. Within this scheme of repetition, probability is defined as a hypothetical frequency.” (Reid and Cox 2015, 295)
<26>
Notions of calibration also vary!
(1) If we calibrate p-values by a Bayes factor or other probabilism, p-values exaggerate evidence
(2) If we calibrate Bayes factors by performance or severity, they exaggerate what’s warranted to infer
“depends on one’s philosophy of statistics” Greenland, Senn, Rothman, Carlin, Poole, Goodman, Altman (2016, 342).
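One example of a calibration of type (1), offered only as an illustration since the slide does not name a specific one, is the bound from Sellke, Bayarri and Berger (2001) in the reference list: for p < 1/e, the Bayes factor for a precise null against the alternative is at least -e·p·ln(p). The equal prior odds used below are an added assumption, and the function names are hypothetical.

```python
import math

def bayes_factor_bound(p):
    """Lower bound -e * p * ln(p) on the Bayes factor for H0 versus H1
    (valid for p < 1/e), per Sellke, Bayarri and Berger (2001)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)

def posterior_null_bound(p, prior_null=0.5):
    """Implied lower bound on Pr(H0 | data), assuming prior Pr(H0) = prior_null."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bayes_factor_bound(p)
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: BF bound = {bayes_factor_bound(p):.3f}, "
          f"Pr(H0 | data) >= {posterior_null_bound(p):.3f}")
```

Judged by this yardstick, p = 0.05 corresponds to a posterior probability for H0 of roughly 0.29 or more, which is the sense in which p-values are said to exaggerate evidence. Judged by error probabilities instead, the verdict runs the other way, as point (2) states.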
<27>
Reid:
• it is unacceptable if a procedure yielding high-probability regions in some non-frequency sense are poorly calibrated
I agree. I take this as calling for the second (2), frequentist, calibration
<28>
This takes me to my last point: an irony about today’s ‘replication crisis’
In some cases it’s thought Big Data foisted statistics on fields unfamiliar with its dangers, and Reid discusses some foibles
A lot of consciousness-raising is going on
More hand-wringing than ever regarding cherry-picking, selection effects (p-hacking, significance seeking)
R.A. Fisher: “it’s easy to lie with statistics by selective reporting” (1955, p. 75): new names, same problem
<29>
Returns to a question from back when the possibility of a logic of induction was still viable: can’t data speak for themselves?
Preregistration calls are everywhere:
“Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.” (Simmons, Nelson, and Simonsohn 2011, 1362)
At the same time…
“Use of the Bayes factor gives experimenters the freedom to employ optional stopping without penalty. (In fact, Bayes factors can be used in the complete absence of a sampling plan…)” (Bayarri, Benjamin, Berger, Sellke 2016, 100)
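Why the stopping rule matters to an error statistician can be seen in a small simulation. The sketch assumes a two-sided z-test at the nominal 0.05 level, Normal(0, 1) data with H0: μ = 0 true, and a check after every observation up to 100; these settings and the function name are hypothetical choices, not part of either quoted paper.

```python
import random

def rejection_rates_under_null(max_n=100, trials=2000, z_crit=1.96, seed=3):
    """Simulate Normal(0, 1) data (so H0: mu = 0 is true) and compare
    (a) testing once at n = max_n with (b) testing after every observation
    and stopping at the first nominally significant result."""
    rng = random.Random(seed)
    fixed_rejections = 0
    optional_rejections = 0
    for _ in range(trials):
        total = 0.0
        crossed = False
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)
            z = total / n ** 0.5          # z statistic with sigma = 1
            if abs(z) > z_crit:
                crossed = True            # an optional stopper would stop here
        if crossed:
            optional_rejections += 1
        if abs(z) > z_crit:               # z now holds the value at n = max_n
            fixed_rejections += 1
    return fixed_rejections / trials, optional_rejections / trials

fixed, optional = rejection_rates_under_null()
print(f"fixed n: {fixed:.3f}   optional stopping: {optional:.3f}")
```

Under these assumptions the fixed-sample rate stays near the nominal 0.05, while the rate with optional stopping is several times larger. The likelihood function for any given data set is the same under both plans, which is why an account obeying the LP registers no penalty.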
<30>
What I take away from Nancy Reid’s talk is: if we don’t know what we mean by saying an account “works”, we can’t tell how to calibrate it
<31>
In the severe testing view: In order for a calibration to be relevant to normative epistemology, that is, to what is warranted to infer (what’s well and poorly tested):
1. It must be directly affected by selection effects (cherry picking, multiple testing, stopping rules)
2. enable testing assumptions
3. enable statistical falsification.
Points to the need for further philosophical-statistical interaction
<32>
Philosophy of Inductive/Statistical Inference
Inductive Logics: Carnap C(H,e), Hacking
Falsification, testing accounts: Popper
Parallels in Formal Statistics (goes much further)
Bayesian and Likelihoodist accounts
Probability: to assign degree of confirmation, support, belief (posterior or comparative)
Probabilisms
Fiducial?
Fisherian, Neyman-Pearson frequentist methods:
Probability: (a) to ensure reliable performance
(b) severity of tests, probativeness
Fiducial?
<33>
[1] (endnote)
<34>
REFERENCES
Barnard, G. (1972). “The Logic of Statistical Inference (review of “The Logic of Statistical Inference” by Ian Hacking).” British Journal for the Philosophy of Science 23(2): 123-132.
Bayarri, M., Benjamin, D., Berger, J., Sellke, T. (2016). “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses.” Journal of Mathematical Psychology 72: 90-103.
Berger, J. O. and Wolpert, R. (1988). The Likelihood Principle. 2nd ed. Vol. 6. Lecture Notes-Monograph Series. Hayward, California: Institute of Mathematical Statistics.
Carnap, R. (1962). Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.
Cox, D. R. (2006). Principles of Statistical Inference. Cambridge: Cambridge University Press.
Fisher, R. A. (1930). “Inverse Probability.” Mathematical Proceedings of the Cambridge Philosophical Society 26(4): 528-535.
Fisher, R. A. (1935). “The Logic of Inductive Inference.” Journal of the Royal Statistical Society 98(1): 39-82.
Fisher, R. A. (1936). “Uncertain Inference.” Proceedings of the American Academy of Arts and Sciences 71: 248-258.
Fisher, R. A. (1955). “Statistical Methods and Scientific Induction.” Journal of the Royal Statistical Society, Series B (Methodological) 17(1): 69-78.
Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press.
Hacking, I. (1972). “Review: Likelihood.” British Journal for the Philosophy of Science 23(2): 132-7.
Hacking, I. (1980). “The Theory of Probable Inference: Neyman, Peirce and Braithwaite,” in Mellor, D. (ed.), pp. 141-60. Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite. Cambridge: CUP.
<35>
Jeffreys, H. (1939). Theory of Probability. Oxford: Oxford University Press.
Lindley, D. (1971). “The Estimation of Many Parameters.” In Foundations of Statistical Inference, edited by V. P. Godambe and D. A. Sprott, 435-455. Toronto: Holt, Rinehart and Winston.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundation. Chicago: University of Chicago Press.
Mayo, D. G. (2014). “On the Birnbaum Argument for the Strong Likelihood Principle (with discussion).” Statistical Science 29(2): 227-39, 261-6.
Mayo, D. G. (2016). “Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater: A Commentary” on Wasserstein, R. L. & Lazar, N. A. 2016, “The ASA’s Statement on p-Values: Context, Process, and Purpose.” The American Statistician, vol. 70, no. 2, supplemental materials.
Mayo, D. G. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive Inference,” in Optimality: The Second Erich L. Lehmann Symposium (ed. J. Rojo), Lecture Notes-Monograph Series, Institute of Mathematical Statistics (IMS) 49: 77-97.
Mayo, D. G. and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science 57: 323-357.
Mayo, D. G. and Spanos, A. (2011). “Error Statistics,” in Bandyopadhyay, P. and Forster, M. (eds.), pp. 152-198. Philosophy of Statistics, Vol. 7, Handbook of the Philosophy of Science. The Netherlands: Elsevier.
Neyman, J. (1930). “Methodes nouvelles de verification des hypotheses.” Compt Rend Premier Congr Math Pays Slaves: 355-366.
Neyman, J. (1934). “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection.” Early Statistical Papers of J. Neyman: 98-141. [Originally published (1934) in The Journal of the Royal Statistical Society 97(4): 558-625.]
<36>
Neyman, J. (1955). “The Problem of Inductive Inference.” Communications on Pure and Applied Mathematics 8(1): 13-46.
Pearson, E. and Neyman, J. (1967). “On the Problem of Two Samples.” In Joint Statistical Papers, by J. Neyman and E. S. Pearson, 99-115 (Berkeley: University of California Press). First published in Bull. Acad. Pol. Sci (1930): 73-96.
Peirce, C. S. (1931). Collected Papers of Charles Sanders Peirce, Hartshorne, C. and Weiss, P. (eds.), 6 vols. Cambridge: Harvard University Press.
Popper, K. (1959). The Logic of Scientific Discovery. New York: Basic Books.
Popper, K. (1994). The Myth of the Framework: In Defense of Science and Rationality. (ed. M. A. Notturno). London & New York: Routledge.
Reid, C. (1997). Neyman. New York: Springer Science & Business Media.
Reid, N. & Cox, D. R. (2015). “On Some Principles of Statistical Inference.” International Statistical Review 83(2): 293-308.
Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, CRC Press.
Sellke, T., Bayarri, M. & Berger, J. O. (2001). “Calibration of p Values for Testing Precise Hypotheses.” The American Statistician 55(1): 62-71.
Simmons, J., Nelson, L. and Simonsohn, U. (2011). “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22(11): 1359-1366.