may 2, 2017 bff 4- mayo comments on reidthis takes me to my last point: an irony about today’s...

<1>

Fusion Confusion? Comments on Nancy Reid: “BFF Four–Are we Converging?”

Deborah G. Mayo

The Fourth Bayesian, Fiducial and Frequentist Workshop (BFF4): Harvard University

May 2, 2017

<2>

I’m delighted to be part of a workshop linking statistics and philosophy of statistics! I thank the organizers for inviting me.

Nancy Reid’s “BFF Four–Are we Converging?” gives numerous avenues for discussion.

She zeroes in on obstacles to fusion: confusion or disagreement on the nature of probability and its use in statistical inference.

<3>

From Nancy Reid: Nature of probability

probability to describe physical haphazard variability

• probabilities represent features of the “real” world in idealized form

• subject to empirical test and improvement

• conclusions of statistical analysis expressed in terms of interpretable parameters

• enhanced understanding of the data generating process

probability to describe the uncertainty of knowledge

• measures rational, supposedly impersonal, degree of belief given relevant information (Jeffreys)

• measures a particular person’s degree of belief, subject typically to some constraints of self-consistency …

• often linked with personal decision-making

<4>

• As is common, she labels the second “epistemological”

But a key question for me is: what’s relevant for a normative epistemology, for an account of what’s warranted/unwarranted to infer?

<5>

Reid quite rightly asks:

• in what sense are confidence distribution functions, significance functions, structural or fiducial probabilities to be interpreted?

• empirically? degree of belief?

• literature is not very clear

<6>

Reid: We may avoid the need for a different version of probability by appeal to a notion of calibration (Cox 2006, Reid & Cox 2015)

• This is my central focus

I approach this indirectly, with an analogy between philosophy of statistics and statistics

<7>

Carnap : Bayesians as Popper : Frequentists (N-P/Fisher)

Can’t solve induction but can build logics of induction or confirmation theories (e.g., Carnap 1962).

• Define a confirmation relation: C(H,e) (written with “,” rather than “|”)

• logical probabilities deduced from first order languages

• to measure the “degree of implication” or confirmation that e affords H (syntactical)

<8>

Problems

• Languages too restricted

• There was a continuum of inductive logics (tried to restrict via “inductive intuition”)

• How can a priori assignments of probability be relevant to reliability? (“guide to life”)

• Few philosophers of science are logical positivists, but the hankering for a logic of induction remains in some quarters

<9>

Popper: “In opposition to [the] inductivist attitude, I assert that C(H,e) must not be interpreted as the degree of corroboration of H by e, unless e reports the results of our sincere efforts to overthrow H.” (Popper 1959, 418)

“The requirement of sincerity cannot be formalized--” (ibid.)

“Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory – or in other words, only if they result from serious attempts to refute the theory.” (Popper 1994, 89)

- never successfully formulated the notion

<10>

Ian Hacking (1965) gives a logic of induction that does not require priors, based on the (Barnard, Royall, Edwards) “Law of Likelihood”: x supports hypothesis H1 more than H0 if Pr(x;H1) > Pr(x;H0) (i.e., if the likelihood ratio LR > 1).

George Barnard: “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (1972, 129).

Pr(LR in favor of H1 over H0; H0) = high.
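Barnard’s point can be checked numerically. The sketch below is only an illustration, not Hacking’s or Barnard’s own calculation. It assumes Normal data with known σ = 1, takes H0: μ = 0, and lets the rival H1 be the best-fitting value μ = x̄, chosen after seeing the data. The function names and the settings (n = 10, 2000 trials, the seed) are hypothetical.

```python
import random

def log_lr_best_rival(xs, mu0=0.0, sigma=1.0):
    """Log likelihood ratio of the rival mu = sample mean against H0: mu = mu0.

    For Normal data with known sigma this reduces to
    n * (xbar - mu0)**2 / (2 * sigma**2), which is never negative.
    """
    n = len(xs)
    xbar = sum(xs) / n
    return n * (xbar - mu0) ** 2 / (2 * sigma ** 2)

def fraction_favoring_rival(trials=2000, n=10, seed=1):
    """Share of samples, all generated under H0, where LR > 1 (log LR > 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true here
        if log_lr_best_rival(xs) > 0.0:
            hits += 1
    return hits / trials
```

Calling `fraction_favoring_rival()` should return a value at or very near 1: although every sample is generated under H0, the data-selected rival is essentially always better supported according to the Law of Likelihood.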

<11>

Neyman-Pearson: “In order to fix a limit between ‘small’ and ‘large’ values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis.” (Pearson and Neyman 1967, 106)

Sampling distribution of LR

A crucial criticism in statistical foundations

<12>

In statistics:

“Sampling distributions, significance levels, power, all depend on something more [than the likelihood function] – something that is irrelevant in Bayesian inference – namely the sample space.” (Lindley 1971, 436)

Once the data are in hand: Inference should follow the Likelihood Principle (LP)

In philosophy (R. Rosenkrantz defending the LP):

“The LP implies … the irrelevance of predesignation, of whether a hypothesis was thought of beforehand or was introduced to explain known effects.” (Rosenkrantz 1977, 122)

(don’t mix discovery with justification)

<13>

Probabilism vs Performance

• Are you looking for a way to assign degree of belief, confirmation, support in a hypothesis – considered epistemological

• Or to ensure long-run reliability of methods, coverage probabilities (via the sampling distribution) – considered only for long-run behavior, acceptance sampling

<14>

We require a third role:

• Probativism (severe-testing). To assess and control erroneous interpretations of data, post-data

The problems with selective reporting (Fisher) and non-novel data (Popper) are not problems about long-runs. It’s that we cannot say about the case at hand that it has done a good job of avoiding the sources of misinterpretation.

<15>

Ian Hacking: “there is no such thing as a logic of statistical inference” (1980, 145)

Though I’m responsible for much of the criticism…. “I now believe that Neyman, Peirce, and Braithwaite were on the right lines to follow in the analysis of inductive arguments”

• Probability enters to qualify a claim inferred, it reports the method’s capabilities to control and alert us to erroneous interpretations (error probabilities)

• Assigning probability to the conclusion rather than the method is “founded on a false analogy with deductive logic” (Hacking, 141). – he’s convinced by Peirce

<16>

The only two who are clear on the false analogy:

Fisher (1935, 54):

“In deductive reasoning all knowledge obtainable is already latent in the postulates... The conclusions are never more accurate than the data. In inductive reasoning .. [t]he conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based.”

Peirce (“The Probability of Induction” 1878):

“In the case of analytic [deductive] inference we know the probability of our conclusion (if the premises are true), but in the case of synthetic [inductive] inferences we only know the degree of trustworthiness of our proceeding.”

<17>

Neyman and His Performance

You could say Neyman gets his performance idea trying to clarify Fisher’s fiducial intervals

• Neyman thought his confidence intervals were the same as Fisher’s fiducial intervals.

• In a (1934) paper (to generalize fiducial limits), Neyman said a confidence coefficient refers to “the probability of our being right when applying a certain rule” for making statements set out in advance. (623)

• Fisher was highly complimentary: Neyman “had every reason to be proud of the line of argument he had developed for its perfect clarity.” (Fisher comment in Neyman 1934, 618)

<18>

Neyman thinks he’s clarifying Fisher’s (1936, 253) equivocal reference to the “aggregate of all such statements…”.[1]

“This then is a definite probability statement about the unknown parameter…” (Fisher 1930, 533)

<19>

It’s interesting, too, to hear Neyman’s response to Carnap’s criticism of “Neyman’s frequentism”

Neyman: “I am concerned with the term ‘degree of confirmation’ introduced by Carnap. … [if] the application of the locally best one-sided test … failed to reject the [test] hypothesis …” (Neyman 1955, 40)

The question is: does a failure to reject the hypothesis confirm it?

A sample X = (X1, …, Xn), each Xi is Normal, N(μ, σ²) (NIID), σ assumed known;

H0: μ ≤ μ0 against H1: μ > μ0.

Test fails to reject H0, d(x0) ≤ cα.

<20>

Carnap says yes…

Neyman: “…. the attitude described is dangerous. … the chance of detecting the presence [of discrepancy δ from H0], when only [this number of] observations are available, is extremely slim, even if [δ is present].” (Neyman 1955, 41)

“The situation would have been radically different if the power function … were greater than … 0.95.” (ibid.)

Merely surviving the statistical test is too easy, occurs too frequently, even when H0 is false.
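Neyman’s worry can be made concrete with a small power calculation for the one-sided Normal test of H0: μ ≤ μ0 against H1: μ > μ0 with σ known. This is a minimal sketch. The function name `power`, the level α = 0.05 and the sample sizes and discrepancy used below are hypothetical choices for illustration, not the numbers in Neyman’s example.

```python
from math import sqrt
from statistics import NormalDist

def power(delta, n, sigma=1.0, alpha=0.05):
    """Pr(d(X) > c_alpha; mu = mu0 + delta) for the one-sided Normal test.

    d(X) = sqrt(n) * (Xbar - mu0) / sigma is N(delta * sqrt(n) / sigma, 1)
    when the true mean is mu0 + delta.
    """
    z = NormalDist()                  # standard Normal
    c_alpha = z.inv_cdf(1.0 - alpha)  # rejection cutoff for d(X)
    return 1.0 - z.cdf(c_alpha - delta * sqrt(n) / sigma)
```

With these illustrative settings, `power(0.2, 5)` is well below 0.2, so a failure to reject would occur most of the time even if a discrepancy of 0.2 were present, while `power(0.2, 300)` exceeds 0.95, the kind of situation Neyman calls “radically different”.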

<21>

A post-data analysis is even better*:

Mayo and Cox 2006 (“Frequentist principle of evidence”):

“FEV: insignificant result: A moderate P-value is evidence of the absence of a discrepancy δ from H0, only if there is a high probability (1 – c) the test would have given a worse fit with H0 (i.e., d(X) > d(x0)) were a discrepancy δ to exist.” (83-4)

If Pr(d(X) > d(x0); μ = μ0 + δ) is high

d(X) ≤ d(x0);

infer: any discrepancy from μ0 < δ [Infer: µ < CIu]

(*severity for “acceptance”: Mayo & Spanos 2006/2011)
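The post-data probability Pr(d(X) > d(x0); μ = μ0 + δ) is easy to compute in the Normal case with σ known. The sketch below is an illustration under that assumption. The function name `severity_upper_bound` and the numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def severity_upper_bound(xbar, mu0, delta, sigma, n):
    """Pr(d(X) > d(x0); mu = mu0 + delta) for the claim mu < mu0 + delta.

    xbar is the observed sample mean, so d(x0) = (xbar - mu0) / se.
    """
    se = sigma / sqrt(n)        # standard error of the sample mean
    d_obs = (xbar - mu0) / se   # observed test statistic d(x0)
    # Under mu = mu0 + delta, d(X) is Normal with mean delta / se and variance 1.
    return 1.0 - NormalDist().cdf(d_obs - delta / se)
```

For example, with μ0 = 0, σ = 1, n = 25 and an observed mean of 0.1 (an insignificant result, d(x0) = 0.5), `severity_upper_bound(0.1, 0.0, 0.1 + 1.96 * 0.2, 1.0, 25)` is about 0.975. That is the case where μ0 + δ equals the upper confidence limit CIu. Taking δ = 0.1 instead gives 0.5, so the claim μ < 0.1 is poorly warranted by the same data.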

<22>

How to justify detaching the inference?

Rubbing off: The procedure is rarely wrong, therefore, the probability it is wrong in this case is low.

What’s rubbed off?

(could be a probabilism or a performance)

Bayesian epistemologists:

(Having no other relevant information): A rational degree of belief or epistemic probability rubs off

Attaching the probability to the claim differs from a report of well-testedness of the claim

<23>

Severe Probing Reasoning

The reasoning of the severe testing theorist is counterfactual:

H: μ ≤ x̄0 + 1.96σx̄

(i.e., μ ≤ CIu)

H passes severely because were this inference false, and the true mean μ > CIu, then, very probably, we would have observed a larger sample mean.

(I don’t saddle Cox with my take, nor Popper)

<24>

How Well Tested (Corroborated, Probed) ≠ How Probable

We can build a “logic” for severity (it won’t be probability)

• both C and ~C can be poorly tested

• low severity is not just a little bit of evidence, but bad or no evidence

• Formal error probabilities may serve to quantify probativeness or severity of tests (for a given inference), they do not automatically give this - must be relevant

<25>

What Nancy Reid’s paper got me thinking about is the calibration point:

Here’s the longer quote: “We may avoid the need for a different version of probability by appeal to a notion of calibration, as measured by the behaviour of a procedure under hypothetical repetition. That is, we study assessing uncertainty, as with other measuring devices, by assessing the performance of proposed methods under hypothetical repetition. Within this scheme of repetition, probability is defined as a hypothetical frequency.” (Reid and Cox 2015, 295)

<26>

Notions of calibration also vary!

(1) If we calibrate p-values by a Bayes factor or other probabilism, p-values exaggerate evidence

(2) If we calibrate Bayes factors by performance or severity, they exaggerate what’s warranted to infer

“depends on one’s philosophy of statistics” Greenland, Senn, Rothman, Carlin, Poole, Goodman, Altman (2016, 342).

<27>

Notions of calibration also vary!

Reid:

• it is unacceptable if a procedure yielding high-probability regions in some non-frequency sense is poorly calibrated

I agree. I take this as calling for the second (2), frequentist, calibration

<28>

This takes me to my last point: an irony about today’s ‘replication crisis’

In some cases it’s thought Big Data foisted statistics on fields unfamiliar with its dangers, and Reid discusses some foibles

A lot of consciousness-raising is going on

More hand-wringing than ever regarding cherry-picking, selection effects (p-hacking, significance seeking)

R.A. Fisher: “it’s easy to lie with statistics by selective reporting” (1955, p. 75): new names, same problem

<29>

Returns to a question from back when the possibility of a logic of induction was still viable: can’t data speak for themselves?

Preregistration calls are everywhere:

“Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.” (Simmons, Nelson, and Simonsohn 2011, 1362)

At the same time…

“Use of the Bayes factor gives experimenters the freedom to employ optional stopping without penalty. (In fact, Bayes factors can be used in the complete absence of a sampling plan…)” (Bayarri, Benjamin, Berger, Sellke 2016, 100)
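Why the stopping rule matters to error probabilities can be seen in a small simulation. The sketch below covers only the frequentist side of the contrast and does not compute a Bayes factor. It assumes N(0, 1) data with H0: μ = 0 true and a nominal two-sided 5% cutoff, and compares testing once at a fixed n with testing after every observation and stopping at the first “significant” result. All function names and settings (n = 100, 2000 trials, the seed) are hypothetical.

```python
import math
import random

def rejects_with_optional_stopping(rng, n_max, z_crit=1.96):
    """Test after each new observation; stop at the first |z| > z_crit."""
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)  # data generated under H0: mu = 0
        if abs(total / math.sqrt(n)) > z_crit:
            return True
    return False

def rejects_at_fixed_n(rng, n, z_crit=1.96):
    """Test once, after exactly n observations."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return abs(total / math.sqrt(n)) > z_crit

def rejection_rate(rule, n, trials=2000, seed=0):
    """Relative frequency of rejecting a true H0 under the given rule."""
    rng = random.Random(seed)
    return sum(rule(rng, n) for _ in range(trials)) / trials
```

Comparing `rejection_rate(rejects_at_fixed_n, 100)` with `rejection_rate(rejects_with_optional_stopping, 100)` shows the fixed-n rate staying close to the nominal 0.05 while the try-and-try-again rate is several times larger. That is the selection effect the preregistration requirement is meant to control.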

<30>

What I take away from Nancy Reid’s talk is: if we don’t know what we mean when we say an account “works”, we can’t tell how to calibrate

<31>

In the severe testing view:

In order for a calibration to be relevant to normative epistemology, that is, to what is warranted to infer (what’s well and poorly tested):

1. It must be directly affected by selection effects (cherry picking, multiple testing, stopping rules)

2. enable testing assumptions

3. enable statistical falsification.

Points to the need for further philosophical-statistical interaction

<32>

Philosophy of Inductive/Statistical Inference

Inductive Logics | Falsification, testing accounts

Carnap C(H,e), Hacking | Popper

Parallels in Formal Statistics (goes much further)

Bayesian and Likelihoodist accounts

Probability: to assign degree of confirmation, support, belief (posterior or comparative)

Probabilisms

Fiducial?

Fisherian, Neyman-Pearson frequentist methods:

Probability: (a) to ensure reliable performance

(b) severity of tests, probativeness

Fiducial?

<33>

[1](endnote)

<34>

REFERENCES

Barnard, G. (1972). “The Logic of Statistical Inference (review of “The Logic of Statistical Inference” by Ian Hacking).” British Journal for the Philosophy of Science 23(2): 123-132.

Bayarri, M., Benjamin, D., Berger, J., Sellke, T. (2016). “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses.” Journal of Mathematical Psychology 72: 90-103.

Berger, J. O. and Wolpert, R. (1988). The Likelihood Principle. 2nd ed. Vol. 6. Lecture Notes-Monograph Series. Hayward, California: Institute of Mathematical Statistics.

Carnap, R. (1962). Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.

Cox, D. R. (2006). Principles of Statistical Inference. Cambridge: Cambridge University Press.

Fisher, R. A. (1930). “Inverse Probability.” Mathematical Proceedings of the Cambridge Philosophical Society 26(4): 528-535.

Fisher, R. A. (1935). “The Logic of Inductive Inference.” Journal of the Royal Statistical Society 98(1): 39-82.

Fisher, R. A. (1936). “Uncertain Inference.” Proceedings of the American Academy of Arts and Sciences 71: 248-258.

Fisher, R. A. (1955). “Statistical Methods and Scientific Induction.” Journal of the Royal Statistical Society, Series B (Methodological) 17(1): 69-78.

Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press.

Hacking, I. (1972). “Review: Likelihood.” British Journal for the Philosophy of Science 23(2): 132-7.

Hacking, I. (1980). “The Theory of Probable Inference: Neyman, Peirce and Braithwaite,” in Mellor, D. (ed.), pp. 141-60. Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite. Cambridge: CUP.

<35>

Jeffreys, H. (1939). Theory of Probability. Oxford: Oxford University Press.

Lindley, D. (1971). “The Estimation of Many Parameters.” In Foundations of Statistical Inference, edited by V. P. Godambe and D. A. Sprott, 435-455. Toronto: Holt, Rinehart and Winston.

Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundation. Chicago: University of Chicago Press.

Mayo, D. G. (2014). “On the Birnbaum Argument for the Strong Likelihood Principle (with discussion).” Statistical Science 29(2): 227-39, 261-6.

Mayo, D. G. (2016). “Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater: A Commentary” on Wasserstein, R. L. & Lazar, N. A. 2016, “The ASA’s Statement on p-Values: Context, Process, and Purpose.” The American Statistician, vol. 70, no. 2, supplemental materials.

Mayo, D. G. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive Inference,” in Optimality: The Second Erich L. Lehmann Symposium (ed. J. Rojo), Lecture Notes-Monograph Series, Institute of Mathematical Statistics (IMS) 49: 77-97.

Mayo, D. G. and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science 57: 323-357.

Mayo, D. G. and Spanos, A. (2011). “Error Statistics,” in Bandyopadhyay, P. and Forster, M. (eds.), pp. 152-198. Philosophy of Statistics, Vol. 7, Handbook of the Philosophy of Science. The Netherlands: Elsevier.

Neyman, J. (1930). “Methodes nouvelles de verification des hypotheses.” Compt Rend Premier Congr Math Pays Slaves: 355-366.

Neyman, J. (1934). “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection.” Early Statistical Papers of J. Neyman: 98-141. [Originally published (1934) in The Journal of the Royal Statistical Society 97(4): 558-625.]

<36>

Neyman, J. (1955). “The Problem of Inductive Inference.” Communications on Pure and Applied Mathematics 8(1): 13-46.

Pearson, E. and Neyman, J. (1967). “On the Problem of Two Samples.” In Joint Statistical Papers, by J. Neyman and E. S. Pearson, 99-115 (Berkeley: University of California Press). First published in Bull. Acad. Pol. Sci (1930): 73-96.

Peirce, C. S. (1931). Collected Papers of Charles Sanders Peirce, Hartshorne, C. and Weiss, P. (eds.), 6 vols. Cambridge: Harvard University Press.

Popper, K. (1959). The Logic of Scientific Discovery. New York: Basic Books.

Popper, K. (1994). The Myth of the Framework: In Defense of Science and Rationality. (ed. M. A. Notturno). London & New York: Routledge.

Reid, C. (1997). Neyman. New York: Springer Science & Business Media.

Reid, N. & Cox, D. R. (2015). “On Some Principles of Statistical Inference.” International Statistical Review 83(2): 293-308.

Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.

Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, CRC Press.

Sellke, T., Bayarri, M. & Berger, J. O. (2001). “Calibration of p Values for Testing Precise Hypotheses.” The American Statistician 55(1): 62-71.

Simmons, J., Nelson, L. and Simonsohn, U. (2011). “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psych. Sci. 22(11): 1359-1366.
