Difficulties in Limit Setting and the Strong Confidence Approach

Giovanni Punzi

SNS and INFN - Pisa

Advanced Statistical Techniques in Particle Physics

Durham, 18-22 March 2002

Durham 2002

G. Punzi - Strong CL

Outline

• Motivations for a Strong CL

• Summary of properties of Strong CL

• Some examples

• Limits in presence of systematic uncertainties

Motivation

• The set of Neyman’s bands is large, and contains all sorts of inferences like: “I bought a lottery ticket. If I win, I will conclude that donkeys can fly @99.9999% CL”

• I want to get rid of those, but keep being frequentist.

Why should you care?

• Wrong reason: to make the CL look more like p(hypothesis | data).

• Right reason: you don’t want to have to quote a conclusion you know is bad. If you think harder, you can do better:

– You are drawing conclusions based on irrelevant facts (like a bad fit).

– As a consequence, you are not making the best use of the information you have.

– Your results are counter-intuitive and convey little information.

• You must make sure your conclusions do not depend on irrelevant information.

SOLUTION: Impose a form of Likelihood Principle

• Take any two experiments whose pdfs are equal, apart from a multiplicative constant, for some subset χ of observable values of x. Any valid Confidence Limits you can derive in one experiment from observing x in χ must also be valid for the other experiment.

• If you require the Limits to be uniquely determined, there is no solution.

RESULT

Surprise: a solution exists, and gives for any experiment a well-defined, unique subset of Confidence Bands.

[Figure: diagram of the space of bands, with regions labelled “Neyman’s CL bands”, “Strong bands” and “Non-coverageland”.]

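Construction of CL bands

[Figure: two panels in the (x, μ) plane, labelled “Regular” and “Strong”, each showing the Observation x and the Confidence Region. Regular: for each μ, the probability of an incorrect conclusion, ∫p(x|µ) dx, is < 1 − CL. Strong: within any subset of x, the probability of an incorrect conclusion, compared with the maximum probability in this subset (reached at μmax), is < 1 − sCL.]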

Strong CL vs. standard CL

• A new parameter emerges: sCL. Every valid band @xx% sCL is also a valid band @xx% CL.

• You can check sCL for a band built in any other way.

• The sCL requirement effectively amounts to re-applying the usual Neyman condition locally on every subsample of possible results. This ensures uniform treatment of all experimental results, but in a frequentist way.

• The Strong Band definition is not an ordering algorithm, and the answer is still not unique. You may need to add an ordering to obtain a unique solution.

Strong CL

• It is similar to conditioning, a standard practice in modern frequentist statistics.

• “There is a long history of attempts to modify frequentist theory by utilizing some form of conditioning. Earlier works are summarized in Kiefer(1977), Berger and Wolpert(1988) […] Kiefer(1977) formally established the conditional confidence approach”

• “The first point to stress is the unreasonable nature of the unconditional test […] the unconditional test is arguably the worst possible frequentist test […] it is in some sense true that, the more one can condition, the better”

• “It is sometimes argued that conditioning on non-ancillary statistics will ‘lose information’, but nothing loses as much information as use of unconditional testing” (J. Berger)

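Neyman:

$$\forall \mu:\quad p(\mu \notin \mathrm{CR}(x) \mid \mu) \le 1 - \mathrm{CL}$$

Strong:

$$\forall \chi\ \forall \mu:\quad \frac{p(x \in \chi \,\wedge\, \mu \notin \mathrm{CR}(x) \mid \mu)}{\sup_{\mu}\, p(x \in \chi \mid \mu)} \le 1 - \mathrm{sCL}$$

(CR(x) is the accepted region for µ given the observation of x. χ is an arbitrary subset of x space.)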
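As an illustration of the two conditions, here is a minimal Python sketch that checks the CL and the sCL of a given band by brute force, for a toy experiment with a finite set of outcomes and a finite set of µ values. The function names (`regular_cl`, `strong_cl`), the toy pmf and the band are all made up for this example; they are not taken from the talk.

```python
from itertools import combinations

def regular_cl(pmf, band):
    """Neyman: 1 - max over mu of p(mu not in CR(x) | mu)."""
    n_mu, n_x = len(pmf), len(pmf[0])
    worst = 0.0
    for m in range(n_mu):
        miss = sum(pmf[m][x] for x in range(n_x) if m not in band[x])
        worst = max(worst, miss)
    return 1.0 - worst

def strong_cl(pmf, band):
    """Strong: 1 - max over subsets chi and over mu of
    p(x in chi and mu not in CR(x) | mu) / sup_mu' p(x in chi | mu')."""
    n_mu, n_x = len(pmf), len(pmf[0])
    worst = 0.0
    for size in range(1, n_x + 1):
        for chi in combinations(range(n_x), size):
            denom = max(sum(pmf[m][x] for x in chi) for m in range(n_mu))
            if denom == 0.0:
                continue  # subset that can never occur: no constraint
            for m in range(n_mu):
                num = sum(pmf[m][x] for x in chi if m not in band[x])
                worst = max(worst, num / denom)
    return 1.0 - worst

# Hypothetical toy experiment: pmf[m][x] = p(x | mu_m)
pmf = [[0.90, 0.10],   # mu_0
       [0.99, 0.01]]   # mu_1
# band[x] = set of mu indices accepted when x is observed
band = [{0, 1}, {0}]   # on x = 1, mu_1 is excluded

cl = regular_cl(pmf, band)    # close to 0.99
scl = strong_cl(pmf, band)    # close to 0.90
print(cl, scl)
```

In this toy case the band excludes mu_1 only on the rare outcome x = 1, so it reaches 99% CL, while the Strong condition applied to χ = {x = 1} alone brings it down to 90% sCL. The enumeration grows as 2^N with the number of outcomes, so this only works for very small problems.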

Summary of sCL properties

• 100% frequentist, completely general.

• The only frequentist method complying with Likelihood Principle

• Invariant for any change of variables

• No empty regions, in full generality

• No “unlucky results”, no need for quoting additional information on sensitivity. No pathologies.

• Robust for small changes of pdf

• More information gives tighter limits

• Easier incorporation of systematics

• Price tag:

– Overcoverage

– Heavy computation

(see CLW proceedings and hep-ex/9912048)

Invariance for change of the observable

• All classical bands are invariant for change of variable in the parameter (unlike Bayesian limits)

• The CL definition is invariant for change of variable in the observable, too. But, most rules for constructing bands break this invariance !

• Strong-CL is also invariant for any change of variable.

• Likelihood Ratio is also invariant (non-advertised property?), so it is a natural choice of ordering to select a unique Strong Band.

Effect of changing variables

[Figure: diagram of the space of bands, with regions labelled “Neyman’s CL bands”, “Strong bands”, “LR-ordered bands” and “Non-coverageland”.]

Poisson+background

• The upper limit on µ decreases with expected background in all unconditioned approaches.

• Often criticized on the basis that for n=0 the value of b should be irrelevant.

[Figure: upper limit @90%CL for n=0 versus expected background (horizontal axis from 1 to 7, vertical axis from 0.5 to 3). Curves: “LR-ordering” and “sCL = 90%, or R.-W.”]
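A short worked version of why b drops out for n=0 under the Strong condition. It assumes the usual Poisson model p(n|µ) = e^{-(µ+b)} (µ+b)^n / n! with µ ≥ 0, which is not written out on the slide. Applying the Strong condition to the single-outcome subset χ = {n = 0}, any µ excluded on observing n = 0 must satisfy:

$$\frac{p(n=0 \mid \mu)}{\sup_{\mu}\, p(n=0 \mid \mu)} = \frac{e^{-(\mu+b)}}{e^{-b}} = e^{-\mu} \le 1 - \mathrm{sCL} = 0.1 \quad\Longrightarrow\quad \mu \ge \ln 10 \approx 2.30$$

So at 90% sCL no value µ < 2.30 can be excluded when n = 0, whatever the expected background.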

Behavior when new observables are added

• Do you expect limits to improve when you add extra information ?

• A simple example shows that neither PO nor LRO has this property (conjecture: no ordering algorithm has it!)

• Example: comparing a signal level with Gaussian noise against some fixed thresholds

• Problem: the limit loosens dramatically when adding an extra threshold measurement.

Example

• Unknown electrical level µ plus Gaussian noise (σ = 1). Limited to |µ| < 0.5.

• Compare with a fixed threshold (2.5σ), get a (0,1) response.

• Observe > threshold:

– PO: empty region @90%CL

– LR: 0.49 < µ < 0.50 @90%CL

– sCL: -0.34 < µ < 0.50 @90%sCL

• N.B. you MUST overcover unless you want an empty region.

[Figure: L(µ) and LR(µ)]
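The sCL interval quoted in this example can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: Gaussian noise with σ = 1, threshold at 2.5σ, µ restricted to |µ| < 0.5, and “observe > threshold” treated as the outcome x = 1. It applies the Strong condition to the subset χ = {x = 1} only, which is the binding one if nothing is excluded when the threshold does not fire (also an assumption). The names `p_fire` and `mu_low` are hypothetical.

```python
import math

THRESHOLD = 2.5            # threshold in units of sigma (sigma = 1 assumed)
MU_MIN, MU_MAX = -0.5, 0.5 # allowed range of the unknown level mu
S_CL = 0.90

def tail(z):
    """Upper tail of a unit Gaussian, P(noise > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_fire(mu):
    """p(x = 1 | mu): probability that level + noise exceeds the threshold."""
    return tail(THRESHOLD - mu)

# p_fire increases with mu, so sup_mu p(x = 1 | mu) sits at the upper edge.
# A value mu may be excluded on x = 1 only if
#   p_fire(mu) / p_fire(MU_MAX) <= 1 - sCL.
bound = (1.0 - S_CL) * p_fire(MU_MAX)

# Bisection for the mu where p_fire(mu) equals the bound.
lo, hi = MU_MIN, MU_MAX
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if p_fire(mid) > bound:
        hi = mid
    else:
        lo = mid
mu_low = 0.5 * (lo + hi)

print(f"values below mu = {mu_low:.3f} can be excluded at {S_CL:.0%} sCL")
```

Under these assumptions the lower edge should land close to the -0.34 quoted in the example, with the upper edge fixed by the physical bound µ < 0.5.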

Add another threshold

• Now, add a second independent threshold measurement at 0: the limit becomes much looser!

• sCL limit is unaffected

• Conjecture: no ordering algorithm can provide a sensible answer in all cases.

[Figure: L(µ) and LR(µ), with the interval 0.27 < µ < 0.5]

Observations

• It may be impossible to get sensible results without accepting some overcoverage. Why blame sCL for overcoverage ?

• Ordering algorithms alone seem unable to prevent very strange results: the inclusion of additional (irrelevant) information may produce a dramatic worsening of limits.

Adding systematics to CL limits

• Problem:

– My pdf p(x|µ) is actually a p(x|µ,α), where α is an unknown parameter I don’t care about, but it influences my measurement (nuisance)

– I may have some info on α coming from another measurement y: q(y|α)

– My problem is: p(x,y|µ,α) = p(x|µ,α)*q(y|α)

• Many attempts to get rid of α: three main routes:

– Integration/smearing (a la Bayes)

– Maximization (“profile Likelihood”)

– Projection (strictly classical)

Hybrid method: Bayesian Smearing

• 1) define a new (smeared) pdf:

p’(x|µ) = ∫ p(x|µ,α) π(α) dα, where π(α) is obtained through Bayes:

– π(α) = q(y|α) p(α)/q(y)

– Need to assume some prior p(α)

• 2) Use p’ to obtain Conf. Limits as usual

• GOOD:

– Simple and fast

– Used in many places

– Intuitively appealing

• BAD:

– Intuitively appealing

– Interpretation: mix Bayes and Neyman. Output results have neither coverage nor correct Bayesian probability => the effort of calculating a rigorous CL is wasted

– May undercover

– May exhibit paradoxical tightening of limits
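Step 1) can be sketched numerically as follows, writing the nuisance parameter as α. Everything specific here is an assumption made for illustration only: a Poisson count with mean µ + α, where α is a background level, and a flat π(α) on a discrete grid standing in for the integral. The function names are hypothetical.

```python
import math

def poisson(n, mean):
    """Poisson probability of observing n counts for a given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)

def smeared_pmf(n, mu, alphas, weights):
    """p'(n | mu) = sum_k p(n | mu, alpha_k) * pi(alpha_k).

    Discrete stand-in for the integral over alpha; `weights` plays the
    role of pi(alpha) and is assumed to sum to 1.
    """
    return sum(w * poisson(n, mu + a) for a, w in zip(alphas, weights))

# Hypothetical pi(alpha): flat between 2 and 4 expected background events.
alphas = [2.0 + 0.1 * k for k in range(21)]
weights = [1.0 / len(alphas)] * len(alphas)

# The smeared pdf is still normalized over n (sum truncated at n = 59).
total = sum(smeared_pmf(n, 1.0, alphas, weights) for n in range(60))
print(total)
```

The smeared p’ no longer depends on α, which is what makes step 2) simple. It is also where the frequentist interpretation is lost, since the weights come from a prior.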

A simple example + Bayes systematics

• Introduce a systematic uncertainty on the actual position of the 0 threshold. Assume a flat prior in [-1,1].

• Do smearing => get tighter limits !

• No reason to expect a good behavior

[Figure: two LR(µ) plots, labelled µ > 0.272 and µ > 0.294]

Approximate classical method: Profile Likelihood

• 1) define a new (profile) pdf:

p_prof(x|µ) = p(x,y0|µ,αbest(µ))

where αbest(µ) maximizes the value of

a) p(x0,y0|µ,αbest)

b) p(x,y0|µ,αbest) (αbest = αbest(µ,x) !)

This means maximizing the likelihood wrt the nuisance parameters, for each µ

• 2) Use p_prof to obtain Conf. Limits as usual

• GOOD:

– Reasonably simple and fast

– Approximation of an actual frequentist method

• BAD:

– Flip-flop in case a), non-normalized in case b) !!

– Only approximate for low-statistics, which is when you need limits after all.

– You don’t know how far off it is unless you explicitly calculate correct limits.

– Systematically undercovers

Exact Classical Treatment of Systematics in Limits

1) Use p(x,y|µ,α) = p(x|µ,α)*q(y|α), and consider it as p( (x,y) | (µ,α) )

2) Evaluate CR in (µ,α) from the measurement (x0,y0)

3) Project on µ space to get rid of uninteresting information on α

• It is clean and conceptually simple.

• It is well-behaved.

• No issues like Bayesian integrals

Why is it used so rarely ?

1) It produces overcoverage

2) The idea is simple, but computation is heavy. Have to deal with large dimensions

3) Results may strongly depend on ordering algorithm, even more than usual.
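The projection in step 3) is mechanically very simple once the joint region exists. Below is a minimal sketch on a grid: `accepted[i][j]` says whether the point (µ_i, α_j) belongs to the joint Confidence Region obtained in step 2). The grid, the boolean table and the function name are invented for illustration, and the sketch assumes the projected set is a single interval.

```python
def project_on_mu(accepted, mu_grid):
    """Keep every mu for which at least one alpha is in the joint region.

    Returns (mu_min, mu_max), or None if the joint region is empty.
    Assumes the projected set is connected, so two end points describe it.
    """
    kept = [mu for mu, row in zip(mu_grid, accepted) if any(row)]
    return (min(kept), max(kept)) if kept else None

# Hypothetical joint region on a 5 x 3 grid (rows: mu, columns: alpha).
mu_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
accepted = [
    [False, False, False],
    [False, True,  False],
    [True,  True,  False],
    [False, True,  True],
    [False, False, False],
]

interval = project_on_mu(accepted, mu_grid)

# For comparison: the mu values accepted at one fixed alpha (first column).
slice_at_alpha0 = [mu for mu, row in zip(mu_grid, accepted) if row[0]]
print(interval, slice_at_alpha0)
```

The projected interval is at least as wide as the slice at any single α, which is the origin of the overcoverage in point 1). The expensive part is building the `accepted` table in the first place.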

[Figure: (x,y) space with the measurement (x0,y0), and (µ,α) space with the Confidence Region; labels µmin, µmax, µbest and “profile method”.]

“Overcoverage”

• Projecting on µ effectively widens the CR => overcoverage. BUT:

– You chose to ignore information on α: you cannot ask Neyman to give it all back to you as information on µ. The two things are just not interchangeable.

=> overcoverage is a natural consequence, not a weakness

• Q: can you find a smaller µ interval that does not undercover ? (same situation with discretization)

[Figure: projected µ interval from µmin to µmax, with the “Extra coverage” marked]

Optimization issue

• You want to stretch out the CR along the α direction as far as possible.

• BUT:

– The choice of band is constrained by the need to avoid paradoxes (empty regions, and the like) !

– No method on the market allows you to treat µ and α in a different fashion

• Strong CL allows you to specify µ as the parameter of interest, and to obtain the narrowest µ interval

• The solution does not require constructing a multidimensional region

Strong CL Band with systematics

• The solution does not require explicit construction of a multidimensional region

• The narrowest µ interval compatible with Strong CL is readily found.

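$$\forall \chi\ \forall \mu:\quad \frac{\sup_{\alpha}\, p(x \in \chi \,\wedge\, \mu \notin \mathrm{CR}(x) \mid \mu, \alpha)}{\sup_{\alpha,\mu}\, p(x \in \chi \mid \mu, \alpha)} \le 1 - \mathrm{sCL}$$

$$LR_{prof} = \frac{\sup_{\alpha}\, p(x \mid \mu, \alpha)}{\sup_{\alpha,\mu}\, p(x \mid \mu, \alpha)}$$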
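The profile likelihood ratio LRprof can be evaluated on a grid without building any multidimensional region. The sketch below is a toy setup, not the talk's own calculation: the main count x is Poisson with mean µ + α, and a hypothetical sideband count y is Poisson with mean τα, so that the joint pdf factorizes as p(x|µ,α)*q(y|α). The model, τ, the grids and the function names are all assumptions.

```python
import math

def poisson(n, mean):
    """Poisson probability of observing n counts for a given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)

def joint(x, y, mu, alpha, tau=2.0):
    """p(x, y | mu, alpha) = p(x | mu, alpha) * q(y | alpha) (toy model)."""
    return poisson(x, mu + alpha) * poisson(y, tau * alpha)

def profile_lr(x, y, mu_grid, alpha_grid):
    """LRprof(mu) = sup_alpha p / sup_{alpha, mu} p, evaluated on grids."""
    # Numerator: maximize over alpha separately for each mu.
    prof = [max(joint(x, y, mu, a) for a in alpha_grid) for mu in mu_grid]
    # Denominator: global maximum over both mu and alpha.
    best = max(prof)
    return [p / best for p in prof]

mu_grid = [0.1 * i for i in range(101)]          # mu from 0 to 10
alpha_grid = [0.05 * i for i in range(1, 201)]   # alpha from 0.05 to 10

x0, y0 = 5, 6   # hypothetical observed counts
lr = profile_lr(x0, y0, mu_grid, alpha_grid)
mu_hat = mu_grid[lr.index(max(lr))]
print(mu_hat)
```

For these toy counts the ratio peaks near µ = x0 - y0/τ = 2, as expected from the maximum-likelihood estimate. Only one-dimensional scans in α are needed for each µ.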

Conclusions

• Strong Confidence bands have all the good properties you may ask for.

• Systematics can be included naturally and rigorously

• They can even be actually evaluated
