praccal issues in stascal interpretaon of tevatron datatrj/trjslacstats.pdf · praccal issues in...
TRANSCRIPT
Prac%calIssuesinSta%s%calInterpreta%onofTevatronData
6/5/12 1T.JunkSLACStatsProgress
ThomasR.JunkFermilab
ProgressonSta<s<calIssuesinSearchesConferenceSLACNa<onalAcceleratorLaboratory
June5,2012
• Mul<pleParametersofInterest• HandlingNuisanceParameters• Overbinning,Smoothing,andDistribu<onsthatOughtNottobeSmoothed• Modelvalida<onwithMul<variateAnalyses• ABCDMethods• Look‐Elsewhere
6/5/12 T.JunkSLACStatsProgress 2
2
Protons‐an<protoncollisionsforRunIandRunII
1.96 TeVpps =
Main Injector and Recycler
Antiproton source
Booster
Tevatronringradius=1kmCommissionedin1983
MainInjectorcommissionedforRunII
Recyclerusedasanotheran<protonaccumulator
RunIIendedSep.30,201110W‐1ofanalyzabledata/experiment.500+papers/experimentandcoun<ng!
RunII:
6/5/12 T.JunkSLACStatsProgress 3
3
6/5/12 T.JunkSLACStatsProgress 4
Two(ormore)ParametersofInterest
Fromthe2011PDGSta<s<csReview
h`p://pdg.lbl.gov/2011/reviews/rpp2011‐rev‐sta<s<cs.pdf
Forquo<ngGaussianuncertain<esonsingleparameters.Ellipseisacontourof2ΔlnL=1
Fordisplayingjointes<ma<onofseveralparameters
6/5/12 T.JunkSLACStatsProgress 5
1Dor2DPresenta%on
Parameter1
Parameter2
68%
68%
2ΔlnL=168%
2ΔlnL=2.3
Ipreferwhenshowinga2Dplot,showingthecontourswhichcoverin2D.The2ΔlnL=1contouronlycoversforthe1Dparameters,oneata<me.
6/5/12 T.JunkSLACStatsProgress 6
PRD85,071106PRD83,032009
h`p://www‐cdf.fnal.gov/physics/new/top/2011/WhelDil/index.html
AVarietyofwaystoshow2DFitresults
68%and95%contours
6/5/12 T.JunkSLACStatsProgress 7
HypothesisTes%ngwithTwoParametersofInterest?
Seeforexample,D0’sevidencefort‐channelsingletopproduc<on:Phys.Le`.B705(2011)313‐319
“...usingthelog‐likelihoodapproach...wecomputeforthefirst<methesignificanceofthetqbcrosssec<onindependentlyofanyassump<onontheproduc<onrateoftb.”
Inthisspecificcasethecorrela<onissmall,andthisclaimisn’tsobad.Buttocalculateap‐valuep(λ≥λobs|signal=0),oneneedsasamplespaceofpseudoexperiments,andthusanassump<onofthes‐channelrate.
6/5/12 T.JunkSLACStatsProgress 8
HypothesisTes%ngwithTwoParametersofInterest?
Anotherissue:Avaria<onontheLEEtheme.
s‐channelandt‐channelsingletopeventsaredifferintheirkinema<cdistribu<ons
Forobserva<onandmeasurementofthetotalsingletopproduc<onrateσs+t,we’dtrainourMVA’sontheStandardModelmixtureofboth.
Forseparatemeasurementofσsandσt,wehavechoicesoftrainingstrategies.Single‐tagevents–t‐channel;Double‐tagevents–s‐channel.
Supposewewanttodoahypothesistestonjustthet‐channel?Re‐op<mizeallMVA’switht‐channelasthesignalands‐channelasbackground.Re‐dothisfors‐channel.
Ideallywe’dpickthemostsensi<veMVA(highestexpectedsignificance)forthetestwewanttodo,butthereisatempta<ontopicktheonewiththehighestmeasuredsignificance(wetellanalyzerstopickthemostsensi<ve).
Ifyouhaveanexcessofdataeventsthatcouldbes‐ort‐channel(can’ttell),thisstrategymayendupgivinganobserva<onofboth(orneither),whenwe’rereallyonlysurethere’satleastoneprocesspresent.
6/5/12 T.JunkSLACStatsProgress 9
SeveralAnalysesontheSameData• Differentgroupsareinterestedinthesamesearch/measurementusingthesamedata.• Mayhaveslightlydifferentselec<onrequirements(Jetenergies,leptontypes,missingEt,etc).• UsuallyhavedifferentchoicesofMVAoreventrainingstrategiesforthesameMVA• Alwayswillgivedifferentresults!
• Whattodo?• Pickoneandpublishit–criterion:bestsensi<vity.Medianexpectedlimit,medianexpectedp‐value,medianexpectedmeasurementuncertainty.Howtopickitiftheresultis2D?Needa1Dfigureofmerit.• Cancheckconsistencywithpseudoexperiments.Ap‐valueusingΔ(measurement)asateststa<s<c.What’sthechanceofrunningtwoanalysesonthesamedataandgevngaresultasdiscrepantaswhatwegot?• CombineMVA’sintoasuper‐MVA
• Keepseveryonehappyandinvolved• Usuallyhelpssensi<vity• Requirescoordina<onandalignmentofeacheventindataandMC• Easiestwhenoverlapindatasamplesis100%.Otherwisehavetobreaksampleupintosharedandnon‐sharedsubsetsandanalyzethemseparately
• Whatnottodo:Picktheonewiththe“best”observedresult.(LEE!)
6/5/12 T.JunkSLACStatsProgress 10
AnExampleofRunningThreeAnalysesontheSameEventsinMonteCarloRepe%%ons
Differentques<onscanbeasked:What’sthedistribu<onofthemaximumdifferencebetweenthemeasurementsanytwoteams?What’sthequadraturesumofthepairwisedifferences?Condi<ononthesum?(Probablynot..)
6/5/12 T.JunkSLACStatsProgress 11
Systema%cUncertaintyHandlingForaverythoroughreview,seeLucDemor<er’snote:“PValues:WhatTheyAreandHowtoUseThem“
h`p://www‐cdf.fnal.gov/~luc/sta<s<cs/cdf8662.pdf
Plausibleop<ons:
1)Prior‐Predic<vemethod2)Supremummethod3)Confidence‐Levelmethod4)Plug‐Inp‐values5)Defineallnuisanceparameterstobeparametersofinterest6)Defineonlytheimportantnuisanceparameterstobeparametersofinterest
Theprior‐predic<vemethodisamixtureofBayesianandFrequen<streasoning
Thesupremummethodisveryconserva<veandIarguenotfullynon‐Bayesian.Italsoproducesmixedresults–canhaveanoutcomewhichisanexcessoverbackgroundwhensevnglimitsandadeficitwhencompu<ngap‐value.
6/5/12 T.JunkSLACStatsProgress 12
TreatNuisanceParametersAsParametersofInterest!
• Oneperson’snuisanceparameterisanother’sparameterofinterest.
• Reallyonlygoodifyouhaveonedominantsourceofsystema<cuncertainty,andyouwanttoshowyourjointmeasurementofthenuisanceparameterandtheparameterofinterest.
Example:topquarkmass(parameterofinterest),vs.CDF’sjetenergyscaleinall‐hadronic`barevents.Doesn’tfollowmysugges<on!Butoneparameter(JES)isnotaparameterof“interest”Ifitwere,we’duseΔLnL=1.15insteadof0.5
Difficulttoapplytocaseswithmanynuisanceparameters.
6/5/12 T.JunkSLACStatsProgress 13
“Strong”SidebandConstraints
6/5/12 T.JunkSLACStatsProgress 14
AnotherStrongSidebandConstraintExample
6/5/12 T.JunkSLACStatsProgress 15
“Weak”SidebandConstraints
CDF’sΩbobserva<onpaper:
Phys.Rev.D80(2009)072003
6/5/12 T.JunkSLACStatsProgress 16
AMixtureofTheoryandDataisNeededforamoreComplicatedsitua<on
HZZFourLeptonsMainBackgrounds:
ppZZ(MC)*theoryppZ+jets,ZW+jet(s),...Datapp`bar
Low‐mass4Lside:off‐shellZ’s,“radia<vetail”,andotherbackgrounds
Dependentontheore<calpredic<onsoftheshapeofthedominantZZbackground.
Withmoredata,replace“bad”systema<ccswith“good”ones(theoryreplacedwithdata).Butintheearlystages,the“bad”systema<cuncertain<esaresmaller!
6/5/12 T.JunkSLACStatsProgress 17
AnotherWeakSidebandConstraintExamplethatLooksLikeaStrongSidebandConstraint
Phys.Rev.D85(2012)032005e‐Print:arXiv:1106.4782[hep‐ex]
6/5/12 T.JunkSLACStatsProgress 18
BreakingtheFlavorDegeneracywithaTagVariable
6/5/12 T.JunkSLACStatsProgress 19
NoSidebandConstraints?Example:Coun<ngexperiment,onlyhaveaprioripredic<onsofexpectedsignalandbackground
Allteststa<s<csareequivalenttotheeventcount–theyservetoorderoutcomesasmoresignal‐likeandlesssignal‐like.Moreevents==moresignal‐like.
Classicalexample:RayDavis’sSolarNeutrinoDeficitobserva<on.Comparingdata(neutrinointerac<onsonaChlorinedetectorattheHomestakemine)withamodel(JohnBahcall’sStandardSolarModel).Calibra<onsofdetec<onsystemwereexquisite.Butitlackedastandardcandle.
Howtoincorporatesystema<cuncertain<es?Fewerop<onsle�.
Anotherexample:Beforeyouruntheexperiment,youhavetoes<matethesensi<vity.Nosidebandconstraintsyet(exceptfromotherexperiments).
Priorpredic<vemethodthenisequivalenttotheprofilemethodusingthecontrolsamplestoes<matenuisanceparameters.Andit’smoregeneralincasesthatthesignalcontamina<onofthesidebandsisimportant.
6/5/12 T.JunkSLACStatsProgress 20
Several“on’s”,Several“off’s”
6/5/12 T.JunkSLACStatsProgress 21
Morethanone“off”sample
Conflic<nges<matesofbackground–whattodo?
Verytypicalexample:Pythiavs.Herwig(herethe“off”samplesareMonteCarlo).“Takethedifferenceasasystema<c”“Taketheaverageandhalfthedifferenceasasystema<c”Trytolearnsomethingaboutwhichoneismorereliable.
Butyoucaninvertmorethanonecutandhave“conflic<ng”offsamplesinthedatatoo.Reallytheextrapola<onfactorsorthesamplecomposi<ones<matesarewhat’swrong,nottheactualdata.
6/5/12 T.JunkSLACStatsProgress 22
Lessonlearned:Trytodoabe`erjobwiththepredic<ons!Sta<s<calmethodswon’tsaveus.
SeeGlenCowan’stalkyesterday
6/5/12 T.JunkSLACStatsProgress 23
MCSta%s%csand“Broken”Bins
• Limitcalculators/discoverytoolscannottellifthebackgroundexpecta<onisreallyzeroorjustadownwardMCfluctua<on.• Realbackgroundes<ma<onsaresumsofpredic<onswithverydifferentweightsineachMCevent(ordataevent)• Rebinningorjustcollec<ngthelastfewbinstogethero�enhelps.• Problemcompoundedbyrequiringshapeuncertain<estobeevaluated!AlternateshapeMCsamplesareo�enevenmorethinlypopulatedthanthenominalsamples.Valida<onofadequateprepara<onofresultsisnecessary?(butwhatarethecriteria?)
NDOF=?
OverbinningislikeovertrainingaNN.s,b,anddcanallbeindifferentbins.Ahistogramcanbepar<allyoverbinned,likethisonehere:
6/5/12 T.JunkSLACStatsProgress 24
SomeVeryEarlyPlotsfromATLASSufferfromlimitedsamplesizesincontrolsamplesandMonteCarloNearlyallexperimentsareguiltyofthis,especiallyintheearlydays!
Thele�plothasadequatebinninginthe“uninteres<ng”region.Fallsapartontheright‐handside,wherethesignalisexpected.Sugges<ons:MoreMC,Widerbins,transforma<onofthevariable(e.g.,takethelogarithm).Notsurewhattodowiththeright‐handplotexceptgetmoremodelingevents.
Datapoints’errorbarsarenotsqrt(n).Whatarethey?Idon’tknow.Howabouttheuncertaintyonthepredic<on?
6/5/12 T.JunkSLACStatsProgress 25
ThisHistogramisProbablyOkay
Thebinningisali`leodd,though.Youcangetthiskindofdistribu<onfromadecisiontreeoralikelihoodMVA.(forestofdeltafunc<ons)
Watchoutthough!Smoothingandsomekindsofinterpola<ons(E.G.horizontalmorphingalaAlexRead)areinappropriateforthisdistribu<on.
Some<mesdistribu<onslikethesehavenaturalcauses:Leptonφdistribu<onsfordetectorswithmanycracks,forexample.
MVAmark
6/5/12 T.JunkSLACStatsProgress 26
AMoreCommonExample–muoncoverageathighangles.
Nosmoothing/extrapoa<onallowedhere!
6/5/12 T.JunkSLACStatsProgress 27
Twocompe<ngeffects:
1)Separa<onofeventsintoclasseswithdifferents/bimprovesthesensi<vityofasearchorameasurement.Addingeventsincategorieswithlows/btoeventsincategorieswithhighers/bdilutesinforma<onandreducessensi<vity.
Pushestowardsmorebins
2)InsufficientMonteCarlocancausesomebinstobeempty,ornearlyso.Thisonlyhastobetrueforonehigh‐weightcontribu<on.
Needreliablepredic<onsofsignalsandbackgroundsineachbin
Pushestowardsfewerbins
Note:Itdoesn’tma`erthattherearebinswithzerodataevents–there’salwaysaPoissonprobabilityforobservingzero.
Theproblemisinadequatepredic<on.Zerobackgroundexpecta<onandnonzerosignalexpecta<onisadiscovery!
Op%mizingHistogramBinning
6/5/12 T.JunkSLACStatsProgress 28
ACommonpi�all–Choosingselec<oncriteriaa�erseeingthedata.“Drawingsmallboxesaroundindividualdataevents”
ThesamethingcanhappenwithMonteCarloPredic<ons–
Limi<ngcase–eacheventinsignalandbackgroundMCgetsitsownbin.FakePerfectsepara<onofsignalandbackground!.
Sta<s<caltoolsshouldn’tgiveadifferentanswerifbinsareshuffled/sorted.
Trysor<ngbys/b.Andcollectbinswithsimilars/btogether.Cangetarbitrarilygoodperformancefromananalysisjustbyoverbinningit.
Note:Emptydatabinsareokay–justemptypredic<onisaproblem.Itisourjobhowevertoproperlyassigns/btodataeventsthatwedidget(andallpossibleones).
Overbinning=Overlearning
6/5/12 T.JunkSLACStatsProgress 29
ModelValida%on
• Notnormallyasta<s<csissue,butsomethingHEPexperimentalistsspendmostoftheir<meworryingabout.
• Systema<cUncertain<esonpredic<onsareusuallyconstrainedbydatapredic<ons.
• O�endiscrepanciesbetweendataandpredic<onarethebasisfores<ma<ngsystema<cuncertainty
6/5/12 T.JunkSLACStatsProgress 30
CheckingInputDistribu%onstoanMVA• Relaxselec<onrequirements–showmodelinginaninclusivesample(example–nob‐tagrequiredforthecheck,butrequireitinthesignalsample)• Checkthedistribu<onsinsidebands(requirezerob‐tags)• Checkthedistribu<oninthesignalsampleforallselectedevents• Checkthedistribu<ona�erahigh‐scorecutontheMVA
Example:Qlepton*ηuntaggedjetinCDF’ssingletopanalysis.Goodsepara<onpowerfort‐channelsignal.
highest|η|jetasawell‐chosenproxy
Phys.Rev.D82:112005(2010)
6/5/12 31T.JunkSta<s<csETHZurich30Jan‐3Feb
CheckingMVAOutputDistribu%ons• CalculatethesameMVAfunc<onforeventsinsideband(control)regions• Forvariablesthatarenotdefinedoutsideofthesignalregions,putinproxies.(some<mesjustazerofortheinputvariableworkswellifthequan<tyreallyisn’tdefinedatall–pickatypicalvalue,notonewayoffontheedgeofitsdistribu<on)• BesuretousethesameMVAfunc<onasforanalyzingthesignaldata.
Example:CDFNNsingle‐topNNvalidatedusingeventswithzerob‐tag
signalregion
Phys.Rev.D82:112005(2010)
6/5/12 T.JunkSLACStatsProgress 32
AComparisoninaControlSamplethatisLessthanPerfectCDF’ssingletopLikelihoodFunc<ondiscriminantcheckedinuntaggedevents
Phys.Rev.D82:112005(2010)
Strategy:Assessashapesystema<ccoveringthedifferencebetweendataandMC–extrapolatetheuncertaintyfromthecontrolsampletothesignalsample.
Ifthecomparisonisokaywithinsta<s<calprecision,donotassesanaddi<onaluncertainty(even/especiallyiftheprecisionisweak).Barlow,hep‐ex/0207026(2002).
6/5/12 T.JunkSLACStatsProgress 33
AnotherValida%onPossibility–TrainDiscriminantstoSeparateEachBackground
Phys.Rev.D82:112005(2010)
SameinputvariablesassignalLF.LFhasthepropertythatthesumoftheseplusthesignalLFis1.0foreachevent.Givesconfidence.Ifthecheckfails,it’sastar<ngpointforaninves<ga<on,andnotawaytoes<mateanuncertainty.
6/5/12 T.JunkSLACStatsProgress 34
ModelValida%onwithMVA’s
• Eventhoughinputdistribu<onscanlookwellmodeled,theMVAoutputcoulds<llbemismodeled.Possiblecause–correla<onsbetweenoneormorevariablescouldbemismodeled• Checksinsubsetsofeventscanalsobeincomplete.Asumofdistribu<onswhoseshapesarewellreproducedbythetheorycans<llbemismodelediftherela<venormaliza<onsofthecomponentsismismodeled.
• Cancheckthecorrela<onsbetweenvariablespairwisebetweendataandpredic<on• Difficulttodoifsomeofthepredic<onisaone‐dimensionalextrapola<onfromcontrolregions(e.g.,ABCDmethods).
• Myfavorite:ChecktheMVAoutputdistribu<oninbinsoftheinputvariables!WecaremoreabouttheMVAoutputmodelingthantheinputvariablemodelinganyway.• Makesuretousethesamenormaliza<onschemeasfortheen<redistribu<on–donotrescaletoeachbin’scontents.
Ideally,we’dtrytofindacontrolsampledepletedinsignalthathasexactlythesamekindofbackgroundasthesignalregion(usuallythisisunavailable).
6/5/12 T.JunkSLACStatsProgress 35
TheSumofUncorrelated2DDistribu%onsmaybeCorrelated
x
y
Knowledgeofonevariablehelpsiden<fywhichsampletheeventcamefromandthushelpspredicttheothervariable’svalueeveniftheindividualsampleshavenocovariance.
6/5/12 T.JunkSLACStatsProgress 36
“ABCD”MethodsCDF’sWCrossSec<onMeasurement
Isola<onfrac<on=
Energyinaconeofradius0.4aroundleptoncandidatenotincludingtheleptoncandidate/EnergyofleptoncandidateMissingTransverseEnergy(MET)
WantQCDcontribu<ontothe“D”regionwheresignalisselected.
Assumes:METandISOareuncorrelatedsamplebysampleSignalcontribu<ontoA,B,andCaresmallandsubtractable
ABCDmethodsarereallyjuston‐offmethodswhereτismeasuredusingdatasamples
6/5/12 T.JunkSLACStatsProgress 37
“ABCD”MethodsAdvantages• Purelydatabased,goodifyoudon’ttrustthesimula<on• Modelassump<onsareinjectedbyhandandnotinacomplicatedMonteCarloprogram(mostly)• Modelassump<onsareintui<ve
Disadvantages• Thelackofcorrela<onbetweenMETandISOassump<onmaybefalse.e.g.,semileptonicBdecaysproduceunisolatedleptonsandMETfromtheneutrinos.• Evenatwo‐componentbackgroundcanbecorrelatedwhenthecontribu<onsaren’tbythemselves.• Anotherwayofsayingthatextrapola<onsaretobechecked/assignedsufficientuncertainty• WorksbestwhentherearemanyeventsinregionsA,B,andC.Otherwisealltheproblemsoflowstatsinthe“Off”sampleintheOn/Offproblemreappearhere.LargenumbersofeventsGaussianapproxima<ontouncertaintyinbackgroundinD• Requiressubtrac<onofsignalfromdatainregionsA,B,andCintroducesmodeldependence• Worse,thesignalsubtrac<onfromthesidebandsdependsonthesignalratebeingmeasured/tested.Asmalleffectifs/binthesidebandsissmallYoucaniteratethemeasurementanditwillconvergequickly
6/5/12 T.JunkSLACStatsProgress 38
ExamplesofABCDMethods• Sidebandcalibra<onofbackgroundunderapeak.(“whatifthebackgroundpeaksalsowherethesignalpeaks?”)• Theon‐offproblemwithτ=A/C.VeryfrequentlysamplesAandCareinMCsimula<ons,wherewecanbesurenottocontaminatethebackgroundes<ma<onsw<hsignal.Example:UsingtheMCtoes<mateacceptanceforacutforbackground,tobescaledwithadatacontrolsample.ButwepaythepriceofunknownMCmismodeling.
Uncorrelatedvariableassump<on==assump<onthatτisthesameinthedataandtheMC.(checkmodelingofshapeofdistribu<onintheMC)
Equivalentofpreviousproblem:EvenifthebackgroundshapesarewellmodeledbytheMC,iftherearemul<plebackgroundprocesseswhichcontribute,theycanhavedifferentfrac<onalcontribu<ons,distor<ngthetotalshapes.
• FivnganMVAshapetothedata.Low‐scoreMC=A,High‐ScoreMC=CLow‐scoredata=B,High‐scoreData=D.
6/5/12 T.JunkSLACStatsProgress 39
AnApproximateLEECorrec%onforPeakHun%ngSeeE.GrossandO.Vitells,Eur.Phys.J.C70(2010)525‐530.
Approximateformulaappliestobumphuntsonasmoothbackground.
Requiresafewfullysimulatedpseudoexperimentswithcompletep‐valuecalcula<onsovertheregionofinterest.Countup‐crossingsofathreshold.Extrapolatestohigherthresholdsassuminglarge‐samplebehavior.Specifically,thattheLRteststa<s<chasachisquareddistribu<on.
Aninteres<ngfeature–specifictobumphuntsbutmaybemoregeneral:
Astheexpectedsignificancegoesup,sodoestheLEEcorrec%on
Thismakeslotsofsense:LEEdependsonthenumberofseparatemodelsthatcanbetested.Aswecollectmoredata,wecanmeasuretheposi<onofthepeakmoreprecisely.
Sowecantellmorepeaksapartfromeachother,evenwiththesamereconstruc<onresolu<on.
But:Combineapoorresolu<onlows/bsearchwithahighresolu<onhighs/bbutvery<nysandvery<nybsearch–maynotgettherightanswer.
6/5/12 T.JunkSLACStatsProgress 40
CDF’s2011HγγSearch
+2otherchannelswithsmallerexcesses
Insufficientsensi<vitytoaSMHiggsboson.Rateruledoutbyothersearches(ggHWWforexample).Soweknowthebumpisastatfluctua<on.
6/5/12 T.JunkSLACStatsProgress 41
arXiv:1203.3774
CDF+D0HiggsSearchChannelsCombined
>300nuisanceparameters
6/5/12 T.JunkSLACStatsProgress 42
TevatronHiggsSearchLEELocalp‐valuevsHiggsbosonmass
Bandsshowexpecta<onassumingasignalispresent(ateachmHseparately)
CrossSec<on<mesBranchingRa<oFitsvs.mH
Acomplica<on:MostsearchchannelstraintheirMVA’sseparatelyateachmHtoop<mizesensi<vity
6/5/12 T.JunkSLACStatsProgress 43
Signal,Background,andDataintheZHllbbSearch
Reconstructedmjjdistribu%on
YoucantellmHisonthehighsideoftherangeonlybywhat’smissingandnotwhat’sthere!
6/5/12 T.JunkSLACStatsProgress 44
)2Higgs Mass (GeV/c
100 110 120 130 140 150
95
% C
L L
imit
/SM
1
10
210 ! 1!Expected Limits
! 2!Expected Limits
2=115 GeV/cH
Expected with Injected M
)-1CDF II Preliminary (10 fb
CDF’sWHchannelexpecta<on(x3luminositytosimulatethepresenceofotherchannels:llbb,METbb)
Witha115GeVsignalinjected
6/5/12 T.JunkSLACStatsProgress 45
S<rringitAllTogether–D0’sLLRTest
Assumingobservedandexpected+3sigmaexcess,andmedianoutcome.Resolu<onfrom‐2ΔLLR=Δχ2=1
Resolu<onat115GeV:±5GeVResolu<onat135GeV:~±10GeV
6/5/12 T.JunkSLACStatsProgress 46
Aninteres<ngBiasBillMurrayShowedatTheNextStretchoftheHiggsMagnificentMileConference
h`ps://twindico.hep.anl.gov/indico/conferenceOtherViews.py?view=standard&confId=856
SeekabumponasmoothbackgroundExample:LHC(orTevatron)Hγγsearch.
AllowmHtofloatandpickthemHthatmaximizesthefi`edcrosssec<on.
Thefi`edcrosssec<onwillbebiasedupwardsandtheposi<onresolu<onof“lucky”outcomeswillbeworsethanunluckyonesevenifasignalistrulypresent.
Why?Atruebumpcancoalescewithafluctua<oneithertothele�ortotherightofthebump(twochancestofluctuateupwards).
Effectcanbesubstan<al!Calibratewithsimulatedexperimentaloutcomes(FC).
6/5/12 T.JunkSLACStatsProgress 47
LEEforLimits?
10-6
10-5
10-4
10-3
10-2
10-1
1
100 102 104 106 108 110 112 114 116 118 120
mH
(GeV/c2)
CL
s
114.4115.3
LEP
Observed
Expected forbackground
No,butthereistheoppositeeffect.Wetakethemostconserva<vemassexclusion.IftheCLscurvecrossesseveral<meswequotethesmallest(LEP).
Hardtosaywhatthemedianexpectedlimitis–theplacewherethemedianCLscrossesthelineishigherthanthemedianlowestlimit.
LHCandTevatronexperimentsquotemul<pledisjointmHlimits.
NoLEE:jus<fica<on–eachtestateachmHisanindependentsearchwithitsownerrorrate,assumingapar<cleistrulythereateachmassoneata<me.
6/5/12 T.JunkSLACStatsProgress 48
1
10
100 110 120 130 140 150 160 170 180 190 200
1
10
mH (GeV/c2)
95
% C
L L
imit
/SM
Tevatron Run II Preliminary, L ! 10.0 fb-1
Expected
Observed
!1 s.d. Expected
!2 s.d. Expected
LE
P E
xc
lus
ion
Tevatron
+ATLAS+CMS
Exclusion
SM=1
Te
va
tro
n +
LE
P E
xc
lus
ion
CM
S E
xclu
sio
n
AT
LA
S E
xclu
sio
n
AT
LA
S E
xclu
sio
n
LE
P+
AT
LA
S E
xclu
sio
n
ATLAS+CMS
Exclusion
ATLAS+CMS
Exclusion
February 27, 2012
LEEforLimits?Mul<pleexperimentssearchingforthesamepar<cle.
Mul<plechancestofalselyexcludeapar<clethat’sactuallythere.
Veryeasytotaketheunionofexcludedregions,butthisdoesnothave95%coverage.
Thebestthingtodoistocombineforasingleinterpreta<on.
Butthelimitsareofsecondaryimportancehere.
6/5/12 T.JunkSLACStatsProgress 49
Whereis“Elsewhere?”Acollidercollabora<onistypicallyverylarge;>1000Ph.D.students.ATLAS+CMSisanotherfactoroftwo.(FourLEPcollabora<ons,TwoTevatroncollabora<ons).
Manyongoinganalysesfornewphysics.Thechanceofseeingabumpsomewhereislarge.WhatistheLEE?
Dowehavetocorrectourpreviouslypublishedp‐valuesforalargerLEEwhenweaddnewanalysestoourpor�olio?
Howaboutthephysicistwhogoestothelibraryandhand‐picksallthelargestexcesses?WhatisLEEthen?
“Consensus”attheBanff2010Sta<s<csWorkshop:LEEshouldcorrectonlyforthosemodelsthataretestedwithinasinglepublishedanalysis.Usuallyonepapercoversoneanalysis,butreviewpaperssummarizingmanyanalysesdonothavetoputinaddi<onalcorrec<onfactors.
FortheWinter2012Higgssearchanalyses,wehadseveralLEE’scomputed,dependingonthemassrangedefinedtobeelsewhere.
Caveatlector.
6/5/12 T.JunkSLACStatsProgress 50
Whereis“Elsewhere?”LEEiso�enhardenoughtoevaluate.Rightwaytodoit–computep‐valueofp‐valuessimulateexperimentassumingzerosignalmany<mesandforeachsimulatedoutcomefindthemodelwiththesmallestp‐value.Mul<dimensionalmodelsareharder,andLEEisworse.
Kane,Wang,Nelson,Wang,Phys.Rev.D71,035006(2005)
ALEPH,DELPHI,L3,OPAL,andtheLHWGPhys.Lel.B565(2003)61‐75
ALELPH,DELPHI,L3,OPAL,andtheLHWGEur.Phys.J.C47(2006)547‐587
Twoexcessesseen;proposedmodelsexplainbothwithtwoHiggsbosons.Combinedlocalsignificanceisgreater,butLEEnowismuchlarger(andunevaluated).Publishedplotgraysoutregionbeyondexperimentalsensi<vity.
6/5/12 T.JunkSLACStatsProgress 51
6/5/12 T.JunkSLACStatsProgress 52
ChoosingaRegionofInterest
• Idonothaveafoolproofprescrip<onforthis,justsomethoughts.
• Analysesaredesignedtoop<mizesensi<vity,butLEEdilutessensi<vity.Thereisapenaltyforlookingformanyindependentlytestablemodels.Canweop<mizethis?
• Butyoushouldalwaysdoasearchanyway!Ifyouexpecttobeabletotestamodel,youshould.
• Tes<ngpreviouslyexcludedmodels?Wedothisanyway,justincasesomenewphysicsshowsupinawaythatevadedtheprevioustest.
• Thereisnosuchthingasamodel‐independentsearch.MerelybuildingtheLHCortheTevatronmeanswehadsomethinginmind.AndtheSM(orjustourimplementa<onofit)iswrong,butpossiblynotinawaythatisbothinteres<ngandtestable.
6/5/12 T.JunkSLACStatsProgress 53
LookElseWHENRunningaveragesconvergeoncorrectanswer,butthedevia<onsinunitsoftheexpecteduncertaintyhavearandomwalkinthelogarithmofthenumberoftrials
TherkareIIDnumbersdrawnfromaunitGaussian.
€
dn =
rk /nk=1
n
∑1/ n
Trial Number
d-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
1 10 102
103
104
105
106
107
108
109
=nIt’spossibletocherry‐pickadatasetwithamaximumdevia<on.“Samplingtoaforegoneconclusion”
StoppingRule:InHEP,we(almostalways!)takedataun<lourmoneyisgone.Weproduceresultsforthemajorconferencesalongtheway.Somewillcoincidentallystopwhenthefluctua<onsarebiggest.Wetakethemostrecent/largestdatasampleresultandignore(orshould!)resultsperformedonsmallerdatasets.p‐valuess<lldistributeduniformlyfrom0to1.Arecipeforgenera<ng“effectsthatgoaway”
6/5/12 T.JunkSLACStatsProgress 54
LookElseWHENC.Paus,Implica<onsWorkshop,Mar.27,2012
h`ps://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=162621
6/5/12 T.JunkSLACStatsProgress 55
ParameterEs%ma%on–MarginalizeorProfile?
0
5
10
15
20
25
30
-3 -2 -1 0 1 2 3Nuisance Parameter ! (units of ")
Yie
ld
PredictedObserved
Predicted = 10+6
Observed = 15+3
Nuisance Parameter ! (units of ")
Lik
elih
oo
d
0
0.02
0.04
0.06
0.08
0.1
-3 -2 -1 0 1 2 3
IfPred=10‐6‐3,andobs=15,thenthelikelihoodwouldhaveonemaximum,butitwouldhaveacorner.MINUITmayquoteinappropriateuncertain<esasthesecondderiva<veisn’twelldefined.
Thecornercanbesmoothedout–SeeButIknowofnowayR.Barlow,h`p://arxiv.org/abs/physics/0406120,togetridofthedouble‐peakh`p://arxiv.org/abs/physics/0401042(norshouldtherebeaway‐‐h`p://arxiv.org/abs/physics/0306138itcanbearealeffect.SeetheLEP2TGCmeasurements)
6/5/12 T.JunkSLACStatsProgress 56
AnalysisOp%miza%oninIsola%onorinCombina%on?Typicalsitua<on:
Ameasurementhasasta<s<calandasystema<cuncertainty,wherethesta<s<caluncertaintyincludes“good”systema<csthatareconstrainedbythedata,andthe“bad”onesnevergetbe`erconstrainednoma`erhowmuchdataarecollected.
Wesome<meshaveachoiceofhowtoanalyzemarkedPoissondata.
1)aggressivereconstruc<onmakingassump<onsaboutpar<cledistribu<ons–moresta<s<calpowerpereventatthecostofintroducingsystema<cuncertainty
2)moremodel‐independentanalysiswithfewerassump<ons–lesssta<s<calpowerpereventbutbe`ercontroloversystema<cs.
Combina<onwithothermeasurements(fromotherdatarunsorothercollabora<ons)islikecollec<ngmoredata.Method1hitsthesystema<climitandlosesweightinthecombina<oneventhoughitmaybethemostpowerfulmethodbyitself.
Moregeneral:Withli`ledata,wearemoredependentonourassump<ons,withmoredatawecanrelaxtheassump<onsandconstrainourmodels.
Recommenda<on:Forcombina<ons,op<mizeforthelargeluminositycase.
6/5/12 T.JunkSLACStatsProgress 57
Correla%onsamongUncertain%es–WhenisitConserva%ve,whennot?
• Withinachannel–contribu<onsthataddtogether:includingcorrela<onsusuallyweakensthesensi<vity(always:sensi<vityisexpected)• Betweenchannels–accoun<ngforcorrela<onsisnotconserva<veOnechannel’sobserveddatabecomesanother“off”sampleforanother’s.Havetotrustalltheτfactors,andevenoffsetsfromcentralpredic<onsinordertoputinthesecorrela<ons.
• Overes<ma<ngtheimpactsofsystema<cuncertaintyonapredic<onisnotconserva<veifacorrela<onistakenintoaccount.Canresultinunderes<matedsystema<cerroronacombinedresult.
Example(systema<cuncertainty1is100%correlated,systuncertainty2is100%correlated)
Measurement1:m1=5±1(syst1)±1(syst2)CombinewithBLUE:mbest=2m1‐m2Measurement2:m2=5±1(syst1)±2(syst2)mbest=5±1(syst1)±0(syst2)
Hereaccoun<ngforcorrela<onandanoveres<matedsystema<cuncertaintyresultsinanaggressiveresult.
6/5/12 T.JunkSLACStatsProgress 58
ExtraSlides
6/5/12 T.JunkSLACStatsProgress 59
Whereis“Elsewhere?”• Mostsearchesfornewphysicshavea“regionofinterest”
• Defini<onisachoiceoftheanalyzer/collabora<on• O�enboundedbelowbyprevioussearches,boundedabovebykinema<creachoftheaccelerator/detector• Limitstheamountofworkinvolvedinpreparingananalysis.Some<mesa2DsearchinvolveslotsoftrainingofMVA’sandcheckingsidebandsandvalida<onofinputsandoutputs
Example:Asearchforpair‐producedstopquarkswhichdecaytoc+Neutralino
IfMstop>mW+mb+mneutralinothenanotheranalysistakesover.
6/5/12 60T.JunkSta<s<csETHZurich30Jan‐3Feb
AnExample:Double‐TagMethods
x
yDijeteventsatLEP1/SLD
Zu,ubard,dbars,sbarb,bbarleptonsneutrinos
PrimaryVertex
Adouble‐vertex‐B‐taggedeventwithasemileptonicdecay
B‐taggingefficiencies(efficiencyoffindingthedisplacedvertex)areabout40%.WedonottrustMCmodelingoftheb‐tagefficiency.WouldliketomeasuretheB‐tagefficiencyandtheBr(Zb,bbar)branchingfrac<ontogetherinthesamedata.Counteventswith0,1,and2vertextags.Enoughinforma<ontosolvefortheBrandtheefficiency.
x=b‐tagofjet1,y=b‐tagofjet2.Assumeuncorrelatedprobabili<esfortaggingthejets.Buttheflavorofthejetsiscorrelated!Itisthisflavorcorrela<onthatallowsustoextractBrandTageff.
6/5/12 T.JunkSLACStatsProgress 61
On‐OffMeasurements–AveragedorCombined
• Oneglobalon‐offmeasurementvs.breakingthedataintosubsamples• Assumethe“off”dataarecollectedalongwiththe“on”data(controlsampleontheothersideofacutforexample)
• Globalon‐offmeasurementallowseachdatasubsample’soffmeasurementstohelpmeasureeachotherdatasubsample’sonsample’sbackgrounds.Assump<onwhichmaybefalse:youareallowedtodothis.Ifthedetectororacceleratorchangedpartwaythroughtherun,thenyoumayneedtobreakthesamplesup.
• Breakingthemapartallowsonlyeachsubsample’soffmeasurementstocalibratethebackgroundinthecorrespondingonsamples.
• SameforABCDmethods–averagingsubsamples
6/5/12 T.JunkSLACStatsProgress 62
SingleTopProduc5onMechanisms
“s‐Channel” “t‐Channel”
“NLOContribu<onstot‐ChannelProduc<on”
6/5/12 T.JunkSLACStatsProgress 63
LeveragingourRateMeasurementstoMeasuretheHiggsBosonMass
AssumingSMcrosssec<onsandbranchingfrac<ons,measuredratesarestrongfunc<onsofmH.ExampleatmH=115GeV,assuming+3sigmaexcess,andamedianoutcomeinboththebb,ττchannelsandtheWWchannels:
Tauchannelscancontributehere,evenwithlessprecisemrecthanthebbchannels
6/5/12 T.JunkSLACStatsProgress 64
6/5/12 T.JunkSLACStatsProgress 65
6/5/12 66
Interes%ngBehaviorofCLsCLsmaynotbeamonotonicfunc<onof‐2lnQ
Tailsinthe‐2lnQdistribu<onsharedinthes+bandb‐onlyhypothesis(fitfailures)
Notreallyapathologyofthemethod,butratherareflec<onthattheteststa<s<cisn’talwaysdoingitsjobofsepara<ngs+b‐likeoutcomesfromb‐likeoutcomesinsomefrac<onofthecases.
CLs=1for‐2lnQ<‐15or‐2lnQ>+15
Distribu<onsaresumsoftwoGaussianseach.ThewideGaussianiscenteredonzero.
Prac<calreasonthiscouldhappen–everythousandthexperimentaloutcome,thefitprogram“fails”
andgivesarandomanswer.
T.JunkSLACStatsProgress
6/5/12 T.JunkSLACStatsProgress 67
ABumpthatGotAway
Dijet mass sum in e+e-→jjjj ALEPH Collaboration, Z. Phys. C71, 179 (1996)
“the width of the bins is designed to correspond to twice the expected resolution ... and their origin is deliberately chosen to maximize the number of events found in any two consecutive bins”
6/5/12 T.JunkSLACStatsProgress 68
ASamplewithZeroCovarianceisNotNecessarilyUncorrelated
y
x
Example–perimeterofacircle.Knowledgeofxprovidesknowledgeofyuptoa2‐foldambiguity.Butthecovarianceofthesamplevanishes!
SomethingtowatchoutforwithPrincipalComponentsAnalysis–doesnotremovecorrela<on,onlycovariance.
6/5/12 T.JunkSLACStatsProgress 69
Notallsearchesarebumphuntsonasmoothbackground
–Mul<variateAnalysesareusuallytrainedupateachmassseparately,andthereisnotasingledistribu<onwecanlookelsewherein.
Sta<s<caleffectsonly.Ifthere’sasystema<ceffectinthebackgroundmodeling,a“signal”maygrowinsignificancewithaddi<onaldatainawaythat’snotdescribedhere.
Themismodelingmaybeconcentratedinasmallpor<onofthehistogram(thisisnotaLEEeffectbutamoredifficultques<on).
Backgroundparameteriza<onmaygrowinsophis<ca<onasdataarecollected.
NotallLRteststa<s<cdistribu<onsaremodeledwellbychisquareddistribu<ons.
Combinealarge‐data‐samplebumphuntwithahighs/b,low‐background(say,b=1e‐5)searchandthedistribu<onoftheLRisaconvolu<onofchisquaredandPoisson.
CasestobeCarefulaboutApplyingtheLEEApproxima%on