
  • G. Cowan ATLAS UK Cambridge 1 May 2012 / Statistics for SUSY/Exotics 1

    Statistics for SUSY/Exotics

    ATLAS UK SUSY/Exotics

    Cambridge, 1 May, 2012

    Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan


  • Outline

    Some developments for Bayesian methods:

    Bayesian limits

    Reference priors

    Bayes factors for discovery

    Discovery sensitivity: beyond s/√b

    Some comments on unfolding

  • Quick review of probability

  • The Bayesian approach to limits

    In Bayesian statistics one needs to start with a ‘prior pdf’ π(θ); this reflects the degree of belief about θ before doing the experiment.

    Bayes’ theorem tells how our beliefs should be updated in light of the data x:

    Integrate posterior pdf p(θ | x) to give interval with any desired probability content.

    For e.g. n ~ Poisson(s+b), 95% CL upper limit on s from

  • Bayesian prior for Poisson parameter

    Include the knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0.

  • Bayesian interval with flat prior for s

    Solve numerically to find the upper limit s_up.

    For special case b = 0, Bayesian upper limit with flat prior numerically same as one-sided frequentist case (‘coincidence’).

    Otherwise Bayesian limit is everywhere greater than the one-sided frequentist limit, and here (Poisson problem) it coincides with the CLs limit.

    Never goes negative.

    Doesn’t depend on b if n = 0.
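The flat-prior limit described above can be computed directly; a minimal sketch (my own illustration, not from the slides), using the fact that for π(s) = const the posterior is p(s|n) ∝ (s+b)ⁿ e^−(s+b) with CDF expressible through the regularized upper incomplete gamma function Q:

```python
# Bayesian 95% CL upper limit on s for n ~ Poisson(s + b), flat prior on s >= 0.
from scipy.special import gammaincc   # Q(a, x), regularized upper incomplete gamma
from scipy.optimize import brentq

def bayes_upper_limit(n, b, cl=0.95):
    """Solve P(s <= s_up | n) = cl for the flat-prior posterior.

    Posterior: p(s|n) ∝ (s+b)^n exp(-(s+b)) for s >= 0; its CDF is
    1 - Q(n+1, s+b) / Q(n+1, b).
    """
    cdf = lambda s: 1.0 - gammaincc(n + 1, s + b) / gammaincc(n + 1, b)
    return brentq(lambda s: cdf(s) - cl, 0.0, 100.0 + 10 * n)

# b = 0, n = 0 reproduces the one-sided frequentist limit -ln(0.05) ≈ 3.0
# (the 'coincidence' noted above), and the limit is independent of b for n = 0.
print(bayes_upper_limit(0, 0.0))   # ≈ 2.996
print(bayes_upper_limit(0, 3.0))   # same value
```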

  • Priors from formal rules

    Because of the difficulty of encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements.

    Often called “objective priors”; these form the basis of objective Bayesian statistics.

    The priors do not reflect a degree of belief (but might represent possible extreme cases).

    In Objective Bayesian analysis, can use the intervals in a frequentist way, i.e., regard Bayes’ theorem as a recipe to produce an interval with certain coverage properties.

  • Priors from formal rules (cont.)

    For a review of priors obtained by formal rules see, e.g.,

    Formal priors have not been widely used in HEP, but there is recent interest in this direction, especially the reference priors of Bernardo and Berger; see e.g.

    L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111.

    D. Casadei, Reference analysis of the signal + background model in counting experiments, JINST 7 (2012) 01012; arXiv:1108.4270.

    Casadei’s approach: describe nuisance parameters with informative gamma priors and marginalize; from this find the Jeffreys’ prior for the (single) parameter of interest.

  • Jeffreys’ prior

    According to Jeffreys’ rule, take the prior according to

    where

    is the Fisher information matrix.

    One can show that this leads to inference that is invariant under a transformation of parameters.

    For a Gaussian mean, the Jeffreys’ prior is constant; for a Poisson mean µ it is proportional to 1/√µ.

  • Jeffreys’ prior for Poisson mean

    Suppose n ~ Poisson(µ). To find the Jeffreys’ prior for µ,

    So e.g. for µ = s + b, this means the prior π(s) ~ 1/√(s + b), which depends on b. Note this is not designed as a degree of belief about s.
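A quick numerical sanity check of the statement above (my own illustration): for n ~ Poisson(µ) the Fisher information is E[(∂/∂µ ln P(n|µ))²] = E[(n/µ − 1)²] = 1/µ, so Jeffreys’ rule π(µ) ∝ √I(µ) gives π(µ) ∝ 1/√µ.

```python
# Monte Carlo estimate of the Poisson Fisher information I(mu) = 1/mu.
import numpy as np

rng = np.random.default_rng(1)
for mu in (2.0, 10.0, 50.0):
    n = rng.poisson(mu, size=1_000_000)
    info = np.mean((n / mu - 1.0) ** 2)   # MC estimate of I(mu)
    print(mu, info, 1.0 / mu)             # info ≈ 1/mu in each case
```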

  • Bayesian limit on s with uncertainty on b

    The uncertainty on b goes into the prior, e.g.,

    Put this into Bayes’ theorem,

    Marginalize over the nuisance parameter b,

    Then use p(s|n) to find intervals for s with any desired probability content.

  • Digression: marginalization with MCMC

    Bayesian computations involve integrals like

    These are often of high dimensionality and impossible in closed form, and also impractical with ‘normal’ acceptance-rejection Monte Carlo.

    Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation.

    MCMC (e.g., Metropolis-Hastings algorithm) generates correlated sequence of random numbers:

    It cannot be used for many applications, e.g., detector MC; the effective statistical error is greater than if all values were independent.

    Basic idea: sample the full multidimensional parameter space, but look, e.g., only at the distribution of the parameters of interest.

  • Example: posterior pdf from MCMC

    Sample the posterior pdf from the previous example with MCMC:

    Summarize the pdf of the parameter of interest with, e.g., mean, median, standard deviation, etc.

    Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

  • Bayesian model selection (‘discovery’)

    (Figure: mass distributions under the ‘no Higgs’ and ‘Higgs’ hypotheses.)

    The probability of hypothesis H0 relative to an alternative H1 is often given by the posterior odds:

    posterior odds = Bayes factor B01 × prior odds

    The Bayes factor is regarded as measuring the weight of evidence of the data in support of H0 over H1.

    Interchangeably use B10 = 1/B01.

  • Rewriting the Bayes factor

    Suppose we have models Hi, i = 0, 1, ...,

    each with a likelihood

    and a prior pdf for its internal parameters

    so that the full prior is

    where P(Hi) is the overall prior probability for Hi.

    The Bayes factor comparing Hi and Hj can be written

  • Bayes factor is independent of P(Hi)

    For Bij we need the posterior probabilities marginalized over all of the internal parameters of the models:

    Use Bayes’ theorem:

    Therefore the Bayes factor is

    The prior probabilities πi = P(Hi) cancel.

    Ratio of marginal likelihoods

  • Numerical determination of Bayes factors

    Both the numerator and denominator of Bij are of the form of a ‘marginal likelihood’:

    There are various ways to compute these, e.g., using sampling of the posterior pdf (which we can do with MCMC):

    Harmonic mean (and improvements)

    Importance sampling

    Parallel tempering (~thermodynamic integration)

    Nested sampling (MultiNest)

    ...

    See e.g.
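For a one-parameter counting model the marginal likelihoods can simply be integrated numerically; a minimal sketch (my own toy example, not from the slides), comparing H0: n ~ Poisson(b) against H1: n ~ Poisson(s + b) with a flat prior for s on [0, s_max]:

```python
# Bayes factor B10 = m1(n) / m0(n), marginal likelihoods by 1-D quadrature.
# In realistic problems one resorts to the MCMC-based methods listed above.
from scipy.stats import poisson
from scipy.integrate import quad

def bayes_factor_10(n, b, s_max=50.0):
    # marginal likelihood under H1: integrate L(n|s+b) over the flat prior
    m1, _ = quad(lambda s: poisson.pmf(n, s + b) / s_max, 0.0, s_max)
    m0 = poisson.pmf(n, b)            # H0 has no free internal parameters
    return m1 / m0

print(bayes_factor_10(n=10, b=2.0))   # data well above b: favours H1
print(bayes_factor_10(n=2, b=2.0))    # data consistent with b: B10 < 1
```

Note that the result depends on the (here arbitrary) prior range s_max, a generic feature of Bayes factors with diffuse priors.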

  • Assessing Bayes factors

    One can use the Bayes factor much like a p-value (or Z value).

    There is an “established” scale, analogous to HEP's 5σ rule:

    B10          Evidence against H0
    -----------  ------------------------------------
    1 to 3       Not worth more than a bare mention
    3 to 20      Positive
    20 to 150    Strong
    > 150        Very strong

    Kass and Raftery, Bayes Factors, J. Am Stat. Assoc 90 (1995) 773.

  • The Look-Elsewhere Effect

    Gross and Vitells, EPJC 70:525-530,2010, arXiv:1005.1891

    Suppose a model for a mass distribution allows for a peak at a mass m with amplitude µ.

    The data show a bump at a mass m0.

    How consistent is this with the no-bump (µ = 0) hypothesis?

  • p-value for fixed mass

    First, suppose the mass m0 of the peak was specified a priori.

    Test consistency of bump with the no-signal (µ = 0) hypothesis with e.g. likelihood ratio

    where “fix” indicates that the mass of the peak is fixed to m0.

    The resulting p-value

    gives the probability to find a value of tfix at least as great as observed at the specific mass m0.

  • p-value for floating mass

    But suppose we did not know where in the distribution to expect a peak.

    What we want is the probability to find a peak at least as significant as the one observed anywhere in the distribution.

    Include the mass as an adjustable parameter in the fit, test significance of peak using

    (Note m does not appear in the µ = 0 model.)

  • Distributions of tfix, tfloat

    For a sufficiently large data sample, tfix ~ chi-square for 1 degree of freedom (Wilks’ theorem).

    For tfloat there are two adjustable parameters, µ and m, and naively Wilks’ theorem says tfloat ~ chi-square for 2 d.o.f.

    In fact Wilks’ theorem does not hold in the floating-mass case because one of the parameters (m) is not defined in the µ = 0 model.

    So getting the tfloat distribution is more difficult.

    Gross and Vitells

  • Approximate correction for LEE

    We would like to be able to relate the p-values for the fixed and floating mass analyses (at least approximately).

    Gross and Vitells show the p-values are approximately related by

    where 〈N(c)〉 is the mean number of “upcrossings” of -2ln λ(0) in the fit range based on a threshold

    and where Zfix is the significance for the fixed-mass case. So we can either carry out the full floating-mass analysis (e.g. use MC to get the p-value), or do the fixed-mass analysis and apply a correction factor (much faster than MC).

    Gross and Vitells

  • Upcrossings of -2lnλ

    The Gross-Vitells formula for the trials factor requires 〈N(c)〉, the mean number of “upcrossings” of -2lnλ(0) in the fit range above a threshold c = tfix = Zfix².

    〈N(c)〉 can be estimated from MC (or the real data) using a much lower threshold c0:

    In this way 〈N(c)〉 can be estimated without the need for large MC samples, even if the threshold c is quite high. (Gross and Vitells)
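The recipe can be sketched in a few lines (illustrative numbers, my own, not from the slides): count upcrossings at a low threshold c0, extrapolate to c = Zfix² using the one-floating-parameter relation 〈N(c)〉 = 〈N(c0)〉 exp(−(c − c0)/2), and correct the local p-value via p_float ≈ p_fix + 〈N(c)〉:

```python
# Gross-Vitells-style LEE correction for a single floating mass parameter.
import math
from scipy.stats import norm

def global_p_value(z_fix, n_c0, c0):
    """z_fix: local (fixed-mass) significance; n_c0: mean number of
    upcrossings counted at the low threshold c0 (from toys or data)."""
    c = z_fix ** 2
    n_c = n_c0 * math.exp(-(c - c0) / 2.0)   # extrapolated <N(c)>
    p_fix = norm.sf(z_fix)                   # local p-value
    return p_fix + n_c

# e.g. a local 4 sigma bump, with <N(c0)> = 5 upcrossings seen at c0 = 0.5:
p = global_p_value(4.0, n_c0=5.0, c0=0.5)
print(p, norm.isf(p))   # global p-value and the corresponding (smaller) Z
```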

  • Multidimensional look-elsewhere effect

    Generalization to multiple dimensions: the number of upcrossings is replaced by the expectation of the Euler characteristic:

    Applications: astrophysics (coordinates on the sky), search for a resonance of unknown mass and width, ...

    Vitells and Gross, Astropart. Phys. 35 (2011) 230-234; arXiv:1105.4355

  • Remember that the Look-Elsewhere Effect arises when we test a single model (e.g., the SM) with multiple observations, i.e., in multiple places.

    Note there is no look-elsewhere effect when considering exclusion limits. There we test specific signal models (typically once) and say whether each is excluded.

    With exclusion there is, however, the analogous issue of testing many signal models (or parameter values) and thus excluding some even in the absence of signal (“spurious exclusion”).

    Approximate correction for LEE should be sufficient, and one should also report the uncorrected significance.

    “There's no sense in being precise when you don't even know what you're talking about.” –– John von Neumann


    Summary on Look-Elsewhere Effect


  • Discovery significance for n ~ Poisson(s + b)

    Consider the frequently seen case where we observe n events and model them as following a Poisson distribution with mean s + b (assume b is known).

    1) For an observed n, what is the significance Z0 with which we would reject the s = 0 hypothesis?

    2) What is the expected (or more precisely, median) Z0 if the true value of the signal rate is s?

  • Gaussian approximation for Poisson significance

    For large s + b, n → x ~ Gaussian(µ, σ), with µ = s + b and σ = √(s + b).

    For an observed value xobs, the p-value of s = 0 is Prob(x > xobs | s = 0):

    Significance for rejecting s = 0 is therefore

    Expected (median) significance assuming signal rate s is

  • Better approximation for Poisson significance

    Likelihood function for parameter s is

    or equivalently the log-likelihood is

    Find the maximum by setting

    gives the estimator for s:

  • Approximate Poisson significance (continued)

    The likelihood ratio statistic for testing s = 0 is

    For sufficiently large s + b (using Wilks’ theorem),

    To find median[Z0|s+b], let n → s + b (i.e., the Asimov data set), which gives

    median[Z0] = √( 2 [ (s + b) ln(1 + s/b) − s ] )

    This reduces to s/√b for s « b.
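The Asimov formula above is trivial to evaluate; a short sketch comparing it with the naive s/√b approximation (cf. CCGV, arXiv:1007.1727):

```python
# Median discovery significance for n ~ Poisson(s + b), Asimov approximation.
import math

def z_asimov(s, b):
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

for s, b in [(5.0, 100.0), (10.0, 10.0), (2.0, 0.5)]:
    print(s, b, z_asimov(s, b), s / math.sqrt(b))
# For s << b the two agree; when s is comparable to or larger than b,
# s/sqrt(b) overestimates the significance.
```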

  • n ~ Poisson(µs + b): median significance, assuming µ = 1, of the hypothesis µ = 0

    “Exact” values from MC; the jumps are due to the discreteness of the data. The Asimov √q0,A is a good approximation over a broad range of s and b; s/√b is only good for s « b.

    CCGV, arXiv:1007.1727

  • Some comments on unfolding

    Given an observed distribution subject to statistical fluctuations and event migration between bins, try to estimate the “true” underlying distribution:

  • The (naive) Maximum Likelihood solution

    Maximizing the likelihood gives

    µ̂ = R⁻¹(n − β)

    where R is the response matrix and β the expected background.

    Catastrophic failure??? (The unregularized estimate exhibits huge, strongly anti-correlated bin-to-bin fluctuations.)

  • Regularized unfolding

  • Stat. and sys. errors of the unfolded solution

    In general the statistical covariance matrix of the unfolded estimators is not diagonal; one needs to report the full matrix Ustat.

    But unfolding necessarily introduces biases as well, corresponding to a systematic uncertainty (also correlated between bins).

    This is more difficult to estimate. Suppose, nevertheless, we manage to report both Ustat and Usys.

    To test a new theory depending on parameters θ, use e.g.

    Mixes frequentist and Bayesian elements; interpretation of result can be problematic, especially if Usys itself has large uncertainty.

  • Folding

    Suppose a theory predicts f(y) → µ (which may depend on parameters θ). Given the response matrix R and the expected background β, this predicts the expected numbers of observed events:

    From this we can get the likelihood, e.g., for Poisson data,

    And using this we can fit parameters and/or test, e.g., using the likelihood ratio statistic
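A minimal sketch of the folding approach (toy response matrix and numbers of my own, not from the slides): fold the truth-level prediction µ through R, add the expected background β, and form the Poisson log-likelihood of the observed counts.

```python
# Folding: nu = R @ mu + beta, then Poisson likelihood of the observed n.
import numpy as np
from scipy.special import gammaln

R = np.array([[0.8, 0.2, 0.0],      # toy response matrix (columns: truth bins)
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])
beta = np.array([5.0, 5.0, 5.0])    # expected background per observed bin

def log_likelihood(mu, n):
    nu = R @ mu + beta              # expected observed counts
    return np.sum(n * np.log(nu) - nu - gammaln(n + 1.0))

n_obs = np.array([45.0, 60.0, 50.0])
mu_a = np.array([50.0, 70.0, 55.0]) # candidate truth spectrum, near the data
mu_b = np.array([10.0, 10.0, 10.0]) # candidate truth spectrum, far from it
print(log_likelihood(mu_a, n_obs) - log_likelihood(mu_b, n_obs))  # > 0
```

Differences of such log-likelihoods are exactly what enters the likelihood ratio statistic mentioned above.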

  • Versus unfolding

    If we have an unfolded spectrum and the full statistical and systematic covariance matrices, then to compare with a model µ we compute the likelihood

    where

    Complications arise because one needs an estimate of the systematic bias Usys.

    If we find a gain in sensitivity from the test using the unfolded distribution, e.g., through a decrease in statistical errors, then we are exploiting information inserted via the regularization (e.g., imposed smoothness).

  • ML solution again

    From the standpoint of testing a theory or estimating its parameters, the ML solution, despite its catastrophically large errors, is equivalent to using the uncorrected data (same information content).

    There is no bias (at least from unfolding), so use

    The estimators of θ should have close to optimal properties: zero bias, minimum variance.

    The corresponding estimators from any unfolded solution cannot in general match this.

    Crucial point is to use full covariance, not just diagonal errors.
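The point about using the full covariance can be seen in a two-bin toy example (my own illustration): with strong anti-correlation between bins, the diagonal-only chi-square badly understates the incompatibility.

```python
# chi2 = r^T C^{-1} r with the full covariance vs. diagonal errors only.
import numpy as np

C = np.array([[4.0, -3.0],          # toy covariance with strong negative
              [-3.0, 4.0]])         # bin-to-bin correlation
r = np.array([2.0, 2.0])            # residuals (data - model)

chi2_full = r @ np.linalg.solve(C, r)
chi2_diag = np.sum(r**2 / np.diag(C))
print(chi2_full, chi2_diag)         # 8.0 vs 2.0: very different conclusions
```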

  • Some unfolding references

    H. Prosper and L. Lyons (eds.), Proceedings of PHYSTAT 2011, CERN-2011-006; many contributions on unfolding from p. 225.

    G. Cowan, A Survey of Unfolding Methods for Particle Physics, www.ippp.dur.ac.uk/old/Workshops/02/statistics/proceedings/cowan.pdf

    G. Cowan, 2012 CERN Academic Training Lecture #4: indico.cern.ch/conferenceDisplay.py?confId=173728

    G. Cowan, Statistical Data Analysis, OUP (1998), Chapter 11.

    V. Blobel, An Unfolding Method for High Energy Physics Experiments, arXiv:hep-ex/0208022v1

    G. D'Agostini, Improved iterative Bayesian unfolding, arXiv:1010.0632

    G. Choudalakis, Fully Bayesian Unfolding, arXiv:1201.4612

    Tim Adye, Unfolding algorithms and tests using RooUnfold, arXiv:1105.1160v1

  • Summary

    Bayesian methods for limits

    Beware of issues with flat priors

    Consider reference priors

    Issues related to discovery

    For (frequentist) sensitivity, use the Asimov median significance √q0,A (not s/√b)

    Look-Elsewhere Effect ~solved

    For Bayesian discovery, use Bayes Factor

    Unfolding

    A big topic – avoid if possible!

  • Extra slides

  • Why 5 sigma?

    Common practice in HEP has been to claim a discovery if the p-value of the no-signal hypothesis is below 2.9 × 10⁻⁷, corresponding to a significance Z = Φ⁻¹(1 − p) = 5 (a 5σ effect).

    There are a number of reasons why one may want to require such a high threshold for discovery:

    The “cost” of announcing a false discovery is high.

    Unsure about systematics.

    Unsure about the look-elsewhere effect.

    The implied signal may be a priori highly improbable (e.g., violation of Lorentz invariance).

  • Why 5 sigma (cont.)?

    But the primary role of the p-value is to quantify the probability that the background-only model gives a statistical fluctuation as big as the one seen or bigger.

    It is not intended as a means to protect against hidden systematics or as the high standard required for a claim of an important discovery.

    In the process of establishing a discovery there comes a point where it is clear that the observation is not simply a fluctuation, but an “effect”, and the focus shifts to whether this is new physics or a systematic.

    Provided the LEE is dealt with, that threshold is probably closer to 3σ than 5σ.

  • Reference priors

    J. Bernardo, L. Demortier, M. Pierini (PHYSTAT 2011)

    Maximize the expected Kullback–Leibler divergence of the posterior relative to the prior:

    This maximizes the expected posterior information about θ when the prior density is π(θ).

    Finding reference priors is “easy” for one parameter:

  • Reference priors (2)

    J. Bernardo, L. Demortier, M. Pierini (PHYSTAT 2011)

    The actual recipe to find the reference prior is nontrivial; see the references from Bernardo’s talk, the website of Berger (www.stat.duke.edu/~berger/papers) and also Demortier, Jain and Prosper, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111.

    The prior depends on the order of the parameters. (Is the order dependence important? Symmetrize? Sample the result from different orderings?)

  • MCMC basics: Metropolis-Hastings algorithm

    Goal: given an n-dimensional pdf, generate a sequence of points:

    1) Start at some point.

    2) Generate a proposed point from a proposal density, e.g., a Gaussian centred about the current point.

    3) Form the Hastings test ratio.

    4) Generate a uniform random number u in [0, 1].

    5) If u is less than the test ratio, move to the proposed point; else the old point is repeated.

    6) Iterate.

  • Metropolis-Hastings (continued)

    This rule produces a correlated sequence of points (note how each new point depends on the previous one).

    For our purposes this correlation is not fatal, but the statistical errors are larger than the naive independent-sampling estimate.

    The proposal density can be (almost) anything, but choose it so as to minimize the autocorrelation. Often the proposal density is taken to be symmetric:

    The test ratio is then (Metropolis-Hastings):

    I.e., if the proposed step is to a point of higher probability density, take it; if not, take the step only with probability given by the ratio of densities. If the proposed step is rejected, hop in place.
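The steps above can be sketched as a minimal Metropolis sampler (my own illustration; symmetric Gaussian proposal, so the test ratio reduces to p(θ′)/p(θ)), here applied to a standard Gaussian as a stand-in for a posterior:

```python
# Minimal Metropolis sampler with a symmetric Gaussian proposal.
import numpy as np

def metropolis(log_p, theta0, n_steps, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + rng.normal(0.0, step)      # symmetric proposal
        # accept with probability min(1, p(proposal)/p(theta))
        if np.log(rng.uniform()) < log_p(proposal) - log_p(theta):
            theta = proposal                          # move to proposed point
        chain[i] = theta                              # else old point repeated
    return chain

# target: standard Gaussian, log p(theta) = -theta^2/2 + const
chain = metropolis(lambda t: -0.5 * t * t, 0.0, 200_000)
print(chain.mean(), chain.std())   # ≈ 0 and ≈ 1 (but correlated samples)
```

The correlation of the chain is visible in its autocorrelation; thinning or simply quoting an effective sample size accounts for the inflated statistical errors mentioned above.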