
Page 1: Title

G. Cowan Systematic uncertainties in statistical data analysis page 1

Systematic uncertainties in statistical data analysis for particle physics

DESY Seminar
Hamburg, 31 March, 2009

Glen Cowan
Physics Department
Royal Holloway, University of London
[email protected]
www.pp.rhul.ac.uk/~cowan

Page 2: Outline

Preliminaries

Role of probability in data analysis (Frequentist, Bayesian)

Systematic errors and nuisance parameters

A simple fitting problem
Frequentist solution / Bayesian solution
When does σ_tot² = σ_stat² + σ_sys² make sense?

Systematic uncertainties in a search
Example of search for Higgs (ATLAS)

Examples of nuisance parameters in fits
b → sγ with recoil method (BaBar)
Towards a general strategy for nuisance parameters

Conclusions

Page 3: Data analysis in HEP

Particle physics experiments are expensive,

e.g. LHC, ~ $10¹⁰ (accelerator and experiments);

the competition is intense: (ATLAS vs. CMS) vs. Tevatron,

and the stakes are high:

[Figure: two signal distributions, labelled "4 sigma effect" and "5 sigma effect"]

So there is a strong motivation to know precisely whether one's signal is a 4 sigma or a 5 sigma effect.

Page 4: Frequentist vs. Bayesian approaches

In frequentist statistics, probabilities are associated only with the data, i.e., with outcomes of repeatable observations.

Probability = limiting frequency

The preferred hypotheses (theories, models, parameter values, ...) are those for which our observations would be considered ‘usual’.

In Bayesian statistics, the interpretation of probability is extended to degree of belief (subjective probability).

Use Bayes' theorem to relate (posterior) probability for hypothesis H given data x to probability of x given H (the likelihood):

Need the prior probability π(H), i.e., the probability before seeing the data.
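As a toy illustration of Bayes' theorem (the numbers below are invented for illustration, not from the talk), the posterior probabilities for two hypotheses can be computed directly from the likelihoods and the prior:

```python
# Bayes' theorem for two hypotheses H0, H1 with made-up numbers.
prior = {"H0": 0.5, "H1": 0.5}          # pi(H): degree of belief before the data
likelihood = {"H0": 0.10, "H1": 0.40}   # P(x | H) for the observed data x

# Normalizing constant P(x) = sum over hypotheses of P(x|H) * pi(H)
evidence = sum(likelihood[h] * prior[h] for h in prior)

# Posterior P(H | x) = P(x | H) * pi(H) / P(x)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)  # H1 favoured 4:1, since its likelihood is 4x larger
```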

Page 5: Statistical vs. systematic errors

Statistical errors:

How much would the result fluctuate upon repetition of the measurement?

Implies some set of assumptions to define probability of outcome of the measurement.

Systematic errors:

What is the uncertainty in my result due to uncertainty in my assumptions, e.g.,

model (theoretical) uncertainty;
modelling of the measurement apparatus.

Usually taken to mean that the sources of error do not vary upon repetition of the measurement. They often result from uncertain values of calibration constants, efficiencies, etc.

Page 6: Systematic errors and nuisance parameters

The model prediction (including e.g. detector effects) is never the same as the "true prediction" of the theory:

[Figure: model curve vs. truth curve in the x-y plane]

The model can be made to approximate the truth better by including more free parameters:

systematic uncertainty ↔ nuisance parameters

Page 7: Example: fitting a straight line

Data:

Model: the measured y_i are independent and Gaussian, with mean θ0 + θ1 x_i; assume the x_i and σ_i are known.

Goal: estimate θ0 (don't care about θ1).
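A minimal sketch of such a fit, using invented data points and a common, known σ (closed-form normal equations rather than any particular fitting package):

```python
import math

# Least-squares fit of y = theta0 + theta1 * x with known, equal sigma.
# Data values are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
sigma = 0.2  # common measurement error, assumed known

n = len(x)
Sx, Sy = sum(x), sum(y)
Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
D = n * Sxx - Sx * Sx

theta1 = (n * Sxy - Sx * Sy) / D    # slope (nuisance parameter)
theta0 = (Sy - theta1 * Sx) / n     # intercept (parameter of interest)

# Variance of theta0 from error propagation through the normal equations;
# the correlation with theta1 is what inflates this error.
var_theta0 = sigma**2 * Sxx / D
print(theta0, theta1, math.sqrt(var_theta0))
```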

Page 8: Frequentist approach

[Figure: likelihood contour in the (θ0, θ1) plane]

Correlation between θ0 and θ1 causes the errors to increase.

Standard deviations from tangent lines to the contour.

Page 9: Frequentist case with a measurement t1 of θ1

The information on θ1 improves the accuracy of θ0.

Page 10: Bayesian method

We need to associate prior probabilities with θ0 and θ1, e.g.,

π(θ0): reflects 'prior ignorance'; in any case much broader than the likelihood.
π(θ1): based on the previous measurement.

Putting this into Bayes' theorem gives:

posterior ∝ likelihood × prior

Page 11: Bayesian method (continued)

We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x):

Usually one needs numerical methods (e.g. Markov Chain Monte Carlo) to do the integral.

In this example we can do the integral in closed form (rare). We find:
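When the integral cannot be done in closed form, a brute-force grid marginalization illustrates the idea. The toy posterior below (a correlated Gaussian likelihood plus a Gaussian prior on θ1) is an assumption for illustration, not the talk's actual model:

```python
import math

# Toy 2-parameter log-posterior: correlated Gaussian likelihood centred at
# (theta0, theta1) = (1.0, 0.5), Gaussian prior on theta1 from a
# hypothetical "previous measurement". All numbers are illustrative.
def log_post(t0, t1):
    d0, d1 = t0 - 1.0, t1 - 0.5
    rho = 0.8  # correlation between the two parameters
    like = -(d0 * d0 - 2 * rho * d0 * d1 + d1 * d1) / (2 * (1 - rho * rho))
    prior = -(t1 - 0.5) ** 2 / (2 * 0.2 ** 2)
    return like + prior

# Marginalize over theta1 on a grid to get p(theta0 | x), up to normalization.
t1_grid = [i * 0.01 for i in range(-200, 300)]

def marginal(t0):
    return sum(math.exp(log_post(t0, t1)) for t1 in t1_grid)

t0_grid = [i * 0.01 for i in range(0, 200)]
weights = [marginal(t0) for t0 in t0_grid]
mode = t0_grid[max(range(len(weights)), key=weights.__getitem__)]
print(mode)  # close to 1.0, the centre of the likelihood
```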

Page 12: Bayesian method with alternative priors

Suppose we don't have a previous measurement of θ1 but rather, e.g., a theorist says it should be positive and not too much greater than 0.1 "or so", i.e., something like

From this we obtain (numerically) the posterior pdf for θ0:

This summarizes all knowledge about θ0.

Look also at the results from a variety of priors.

Page 13: A more general fit (symbolic)

Given measurements:

and (usually) covariances:

Predicted value (its expectation value), with control variable x, parameters θ, and bias b:

Often take:

Minimize

This is equivalent to maximizing L(θ) ∝ e^(-χ²/2), i.e., least squares is the same as maximum likelihood using a Gaussian likelihood function.

Page 14: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 14

Its Bayesian equivalent

and use Bayes’ theorem:

To get desired probability for , integrate (marginalize) over b:

→ Posterior is Gaussian with mode same as least squares estimator, same as from 2 = 2

min + 1. (Back where we started!)

Take

Joint probabilityfor all parameters

Page 15: Alternative priors for systematic errors

A Gaussian prior for the bias b is often not realistic, especially if one considers the "error on the error". Incorporating this can give a prior π_b(b) with longer tails:

[Figure: π_b(b) vs. b. Represents the 'error on the error'; the standard deviation of π_s(σ_s) is σ_s.]

Page 16: A simple test

Suppose the fit effectively averages four measurements.

Take σ_sys = σ_stat = 0.1, uncorrelated.

Case #1: data appear compatible. Posterior p(μ|y):

Usually summarize the posterior p(μ|y) with its mode and standard deviation:

[Figure: the four measurements and the posterior p(μ|y)]

Page 17: Simple test with inconsistent data

Case #2: there is an outlier. Posterior p(μ|y):

[Figure: the four measurements (one an outlier) and the posterior p(μ|y)]

→ Bayesian fit less sensitive to the outlier.

(See also D'Agostini 1999; Dose & von der Linden 1999)

Page 18: Example of systematics in a search

Combination of Higgs search channels (ATLAS). Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arXiv:0901.0512, CERN-OPEN-2008-20.

Standard Model Higgs channels considered (more to be used later):
H → γγ
H → WW(*) → eνμν
H → ZZ(*) → 4l (l = e, μ)
H → ττ → ll, lh

Used the profile likelihood method for systematic uncertainties: background rates, signal & background shapes.

Page 19: Statistical model for Higgs search

Bin i of a given channel has n_i events; its expectation value is

Expected signal and background are:

μ is a global strength parameter, common to all channels. μ = 0 means background only, μ = 1 is the SM hypothesis.

b_tot, θ_s, θ_b are nuisance parameters.

Page 20: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 20

The likelihood function

represents all nuisance parameters, e.g., background rate, shapes

here signal rate is only parameterof interest

The single-channel likelihood function uses Poisson modelfor events in signal and control histograms:

The full likelihood function is

There is a likelihood Li(,i) for each channel, i = 1, …, N.

data in signal histogramdata in control histogram

Page 21: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 21

Profile likelihood ratioTo test hypothesized value of , construct profile likelihood ratio:

Maximized L for given

Maximized L

Equivalently use q = 2 ln ():

data agree well with hypothesized → q small

data disagree with hypothesized → q largeDistribution of qunder assumption of related to chi-square

(Wilks' theorem, here approximation valid for roughly L > 2 fb):
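A sketch of the profile likelihood ratio for the simplest case: a single counting channel with one nuisance parameter, a background rate b constrained by a control measurement (the "on/off" problem). All numbers here are illustrative assumptions, not from the ATLAS study:

```python
import math

# n ~ Poisson(mu*s + b): events in the signal region
# m ~ Poisson(tau*b):    events in the control region (tau known)
n, m = 15, 20
s, tau = 10.0, 2.0

def log_like(mu, b):
    lam1, lam2 = mu * s + b, tau * b
    return n * math.log(lam1) - lam1 + m * math.log(lam2) - lam2

def b_hat_hat(mu):
    # conditional MLE of b for fixed mu: positive root of d(lnL)/db = 0
    c = n + m - (1 + tau) * mu * s
    return (c + math.sqrt(c * c + 4 * (1 + tau) * m * mu * s)) / (2 * (1 + tau))

# unconditional MLEs
b_hat = m / tau
mu_hat = (n - b_hat) / s

def q(mu):
    # q_mu = -2 ln lambda(mu), with b profiled in the numerator
    return -2 * (log_like(mu, b_hat_hat(mu)) - log_like(mu_hat, b_hat))

print(q(0.0), q(mu_hat))  # q at mu_hat is 0 by construction
```

For discovery one would test μ = 0 and quote Z ≈ √q(0) in the Wilks regime.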

Page 22: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 22

p-value / significance of hypothesized

Test hypothesized by givingp-value, probability to see data with ≤ compatibility with compared to data observed:

Equivalently use significance,Z, defined as equivalent numberof sigmas for a Gaussian fluctuation in one direction:

Page 23: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 23

SensitivityDiscovery:

Generate data under s+b (= 1) hypothesis;Test hypothesis = 0 → p-value → Z.

Exclusion:Generate data under background-only (= 0) hypothesis;Test hypothesis .If = 1 has p-value < 0.05 exclude mH at 95% CL.

Presence of nuisance parameters leads to broadening of theprofile likelihood, reflecting the loss of information, and givesappropriately reduced discovery significance, weaker limits.

Page 24: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 24

Combined discovery significance

Discovery signficance (in colour) vs. L, mH:

Approximations used here not always accurate for L < 2 fb but in most cases conservative.

Page 25: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 25

Combined 95% CL exclusion limits

p-value of mH (in colour) vs. L, mH:

Page 26: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 26

Fit example: b → s (BaBar)B. Aubert et al. (BaBar), Phys. Rev. D 77, 051103(R) (2008).

Decay of one B fully reconstructed (Btag).

Look for high-energy from rest of event.

Signal and background yields from fit to mES in bins of E.

ee-- DD**

ee++

BBtagtag

BBsignalsignal

XsXs

high-energy

"recoil method"

Page 27: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 27

Fitting mES distribution for b → s

Fit mES distribution using superposition of Crystal Ball and Argus functions:

CrystalBall

Argus

shapesrates obs./pred. events in ith bin

log-likelihood:

Page 28: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 28

Simultaneous fit of all mES distributions

Need fits of mES distributions in 14 bins of E:

At high E, not enough events to constrain shape,

so combine all E bins into global fit:

Start with no energy dependence, and include oneby one more parameters until data well described.

Shape parameters could vary (smoothly) with E.

So make Ansatz for shape parameters such as

Page 29: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 29

Finding appropriate model flexibilityHere for Argus parameter, linear dependence gives significantimprovement; fitted coefficient of linear term ±.

Inclusion of additional free parameters (e.g., a quadratic Eγ dependence for the shape parameter) does not bring significant improvement.

So including the additional energy dependence for the shape parameters converts the systematic uncertainty into a statistical uncertainty on the parameters of interest.

D. Hopkins, PhD thesis, RHUL (2007).

χ²(1) - χ²(2) = 3.48; p-value of (1) = 0.062 → data want the extra parameter.

Page 30: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 30

Towards a general strategy (frequentist)

Suppose one needs to know the shape of a distribution. An initial model (e.g. MC) is available, but known to be imperfect.

Q: How can one incorporate the systematic error arising from use of the incorrect model?

A: Improve the model.

That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth.

Then profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors on the parameters of interest.

The difficulty is deciding how to introduce the additional parameters.

In progress together with: S. Caron, S. Horner, J. Sundermann, E. Gross, O. Vitells, A. Alam

Page 31: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 31

A simple example

The naive model (a) could have been e.g. from MC (herestatistical errors suppressed; point is to illustrate how toincorporate systematics.)

0th order modelTrue model (Nature) Data

Page 32: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 32

Comparing model vs. data

In the example shown, the model and data clearly don't agree well.

To compare, use e.g.

Model number of entries ni in ith bin as ~Poisson(i)

Will follow chi-square distribution for N dof for sufficientlylarge ni.

Page 33: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 33

Model-data comparison with likelihood ratioThis is very similar to a comparison based on the likelihood ratio

where L() = P(n;) is the likelihood and the hat indicatesthe ML estimator (value that maximizes the likelihood).

Here easy to show that

Equivalently use logarithmic variable

If model correct, q ~ chi-square for N degrees of freedom.
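A sketch of this comparison for Poisson-distributed bin counts, using the standard form of -2 ln λ with the saturated model (ν_i = n_i) in the denominator; the bin contents below are invented for illustration:

```python
import math

# Likelihood-ratio comparison of binned data to a model, n_i ~ Poisson(nu_i):
#   q = -2 ln lambda = 2 * sum_i ( nu_i - n_i + n_i * ln(n_i / nu_i) )
# (valid for n_i > 0; a bin with n_i = 0 contributes just nu_i).
n = [12, 25, 31, 18, 9]             # observed entries per bin (invented)
nu = [10.0, 24.0, 35.0, 20.0, 7.0]  # model prediction per bin (invented)

q = 2 * sum(nui - ni + ni * math.log(ni / nui)
            for ni, nui in zip(n, nu))
print(q)  # compare to a chi-square with N = 5 degrees of freedom
```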

Page 34: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 34

p-valuesUsing either 2

P or q, state level of data-model agreementby giving the p-value: the probability, under assumption of themodel, of obtaining an equal or greater incompatibility with thedata relative to that found with the actual data:

where (in both cases) the integrand is the chi-square distributionfor N degrees of freedom,

Page 35: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 35

Comparison with the 0th order model

The 0th order model gives q = 258.8, p ×

Page 36: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 36

Enlarging the modelHere try to enlarge the model by multiplying the 0th orderdistribution by a function s:

where s(x) is a linear superposition of Bernstein basis polynomials of order m:
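The Bernstein basis itself is simple to write down. The sketch below (a hypothetical helper, not code from the talk) also checks the partition-of-unity property, which means that setting all coefficients to 1 recovers the unmodified 0th order model:

```python
from math import comb

# Bernstein basis polynomial of order m on [0, 1]:
#   b_{k,m}(x) = C(m, k) * x^k * (1 - x)^(m - k)
def bernstein(k, m, x):
    return comb(m, k) * x**k * (1 - x)**(m - k)

# Correction factor s(x) as a linear combination of the basis,
# with coefficients beta_0, ..., beta_m.
def s(x, beta):
    m = len(beta) - 1
    return sum(b * bernstein(k, m, x) for k, b in enumerate(beta))

# Partition of unity: with all coefficients equal to 1, s(x) = 1 for any x,
# so beta_k = 1 leaves the 0th order model unchanged.
print(s(0.3, [1.0, 1.0, 1.0, 1.0]))
```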

Page 37: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 37

Bernstein basis polynomials

Using increasingly high order for the basis polynomials givesan increasingly flexible function.

Page 38: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 38

Enlarging the parameter space

Using increasingly high order for the basis polynomials givesan increasingly flexible function.

At each stage compare the p-value to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter.

Now iterate this procedure, and stop when the data do notrequire addition of further parameters based on the likelihoodratio test. (And overall goodness-of-fit should also be good.)

Once the enlarged model has been found, simply includeit in any further statistical procedures, and the statistical errorsfrom the additional parameters will account for the systematicuncertainty in the original model.
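The stepwise decision can be sketched as follows. The q values per order are hypothetical fit results (not the talk's numbers), and the chi-square(1) p-value for the improvement uses the identity p = erfc(√(Δq/2)):

```python
import math

# If the extra parameter is superfluous, the improvement q(m) - q(m+1)
# follows a chi-square with 1 degree of freedom (Wilks' theorem).
def p_one_dof(delta_q):
    # survival function of chi-square(1): p = erfc( sqrt(delta_q / 2) )
    return math.erfc(math.sqrt(delta_q / 2.0))

# Hypothetical -2 ln lambda values for Bernstein orders 0, 1, 2, 3, 4:
q_by_order = [258.8, 160.0, 30.2, 28.9, 28.6]
threshold = 0.1  # stop once the extra parameter's p-value exceeds this

order = 0
while order + 1 < len(q_by_order):
    p = p_one_dof(q_by_order[order] - q_by_order[order + 1])
    if p > threshold:   # data do not require the extra parameter
        break
    order += 1
print(order)  # the chosen order for these hypothetical q values
```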

Page 39: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 39

Fits using increasing numbers of parameters

Stop here

Page 40: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 40

Deciding appropriate level of flexibility

Stop here

says whether data prefer additionalparameter

says whether data well described overall

When p-value exceeds ~0.1 to 0.2, fit is good enough.

Page 41: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 41

Issues with finding an improved modelSometimes, e.g., if the data set is very large, the total 2 canbe very high (bad), even though the absolute deviation betweenmodel and data may be small.

It may be that including additional parameters "spoils" theparameter of interest and/or leads to an unphysical fit resultwell before it succeeds in improving the overall goodness-of-fit.

Include new parameters in a clever (physically motivated,local) way, so that it affects only the required regions.

Use Bayesian approach -- assign priors to the new nuisanceparameters that constrain them from moving too far (or use equivalent frequentist penalty terms in likelihood).

Unfortunately these solutions may not be practical and one maybe forced to use ad hoc recipes (last resort).

Page 42: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 42

Summary and conclusions Key to covering a systematic uncertainty is to include the appropriate nuisance parameters, constrained by all available info.

Enlarge model so that for at least one point in itsparameter space, its difference from the truth is negligible.

In frequentist approach can use profile likelihood (similar withintegrated product of likelihood and prior -- not discussed today).

Too many nuisance parameters spoils information aboutparameter(s) of interest.

In Bayesian approach, need to assign priors to (all) parameters.Can provide important flexibility over frequentist methods.Can be difficult to encode uncertainty in priors.Exploit recent progress in Bayesian computation (MCMC).

Finally, when the LHC announces a 5 sigma effect, it's importantto know precisely what the "sigma" means.

Page 43: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 43

Extra slides

Page 44: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 44

Digression: marginalization with MCMCBayesian computations involve integrals like

often high dimensionality and impossible in closed form,also impossible with ‘normal’ acceptance-rejection Monte Carlo.

Markov Chain Monte Carlo (MCMC) has revolutionizedBayesian computation.

MCMC (e.g., Metropolis-Hastings algorithm) generates correlated sequence of random numbers:

cannot use for many applications, e.g., detector MC;effective stat. error greater than naive √n .

Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest.

Page 45: Example: posterior pdf from MCMC

Sample the posterior pdf from the previous example with MCMC:

Summarize the pdf of the parameter of interest with, e.g., mean, median, standard deviation, etc.

Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

Page 46: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 46

MCMC basics: Metropolis-Hastings algorithmGoal: given an n-dimensional pdf

generate a sequence of points

1) Start at some point

2) Generate

Proposal densitye.g. Gaussian centredabout

3) Form Hastings test ratio

4) Generate

5) If

else

move to proposed point

old point repeated

6) Iterate
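A minimal Metropolis sampler implementing steps 1) to 6) for a one-dimensional target; the target distribution, step size, and chain length are illustrative choices, not from the talk:

```python
import math
import random

random.seed(1)  # deterministic for reproducibility

def log_target(theta):
    # standard Gaussian, up to an additive constant
    return -0.5 * theta * theta

def metropolis(n_steps, step=1.0, theta0=0.0):
    chain, theta = [], theta0           # 1) start at some point
    lp = log_target(theta)
    for _ in range(n_steps):
        prop = theta + random.gauss(0.0, step)   # 2) symmetric Gaussian proposal
        lp_prop = log_target(prop)
        # 3)-5) accept with probability min(1, target(prop) / target(theta))
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop   # move to the proposed point
        chain.append(theta)             # else the old point is repeated
    return chain                        # 6) iterate

chain = metropolis(20000)
mean = sum(chain) / len(chain)
print(mean)  # near 0 for the standard-Gaussian target
```

Because the proposal is symmetric, the Hastings ratio reduces to the plain Metropolis ratio of target values, compared on the log scale here for numerical stability.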

Page 47: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 47

Metropolis-Hastings (continued)This rule produces a correlated sequence of points (note how each new point depends on the previous one).

For our purposes this correlation is not fatal, but statisticalerrors larger than naive

The proposal density can be (almost) anything, but chooseso as to minimize autocorrelation. Often take proposaldensity symmetric:

Test ratio is (Metropolis-Hastings):

I.e. if the proposed step is to a point of higher , take it;

if not, only take the step with probability

If proposed step rejected, hop in place.

Page 48: Systematic uncertainties in statistical data analysis for

G. Cowan Systematic uncertainties in statistical data analysis page 48

Metropolis-Hastings caveatsActually one can only prove that the sequence of points followsthe desired pdf in the limit where it runs forever.

There may be a “burn-in” period where the sequence doesnot initially follow

Unfortunately there are few useful theorems to tell us when thesequence has converged.

Look at trace plots, autocorrelation.

Check result with different proposal density.

If you think it’s converged, try starting from a differentpoint and see if the result is similar.