Developments in Bayesian Priors
Roger Barlow
Manchester IoP Meeting, 16 November 2005
Manchester IoP Half Day Meeting
Roger Barlow: Developments in Bayesian Priors
Slide 2
Plan
• Probability
  – Frequentist
  – Bayesian
• Bayes' Theorem
  – Priors
• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys' Prior
  – Fisher Information
• Reference Priors: Demortier
Slide 3
Probability
Probability as the limit of a frequency:
P(A) = lim (N_A / N_total) as N_total → ∞
The usual definition taught to students. Makes sense, and works well most of the time.
But not always.
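The frequency definition above can be illustrated with a short simulation (my own sketch, not from the talk): the observed ratio N_A/N_total wanders for small N and settles towards the underlying probability as N grows.

```python
# Sketch of P(A) = lim N_A/N_total, using simulated coin tosses.
import random

random.seed(1)
p_true = 0.5  # assumed probability of heads
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < p_true for _ in range(n))
    print(f"N = {n:>9}: N_A/N = {heads / n:.4f}")

# The ratio fluctuates at small N and converges towards 0.5 as N grows.
```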
Slide 4
Frequentist probability
“It will probably rain tomorrow.”
“Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.”
Neither is a statement about frequencies. The frequentist versions:
“The statement ‘It will rain tomorrow’ is probably true.”
“Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”
Slide 5
Bayesian Probability
P(A) expresses my belief that A is true
Limits: 0 (impossible) and 1 (certain)
Calibrated off clear-cut instances (coins, dice, urns)
Slide 6
Frequentist versus Bayesian?
Two sorts of probability – totally different. (Bayesian probability is also known as Inverse Probability.)
Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.
No: they are two different tools for practitioners. Important to:
• Be aware of the limits and pitfalls of both
• Always be aware which you're using
Slide 7
Bayes Theorem (1763)
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
⇒ P(A|B) = P(B|A) P(A) / P(B)
Frequentist use, e.g. a Čerenkov counter:
P(π | signal) = P(signal | π) P(π) / P(signal)
Bayesian use:
P(theory | data) = P(data | theory) P(theory) / P(data)
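A numeric sketch of the Čerenkov-counter use of Bayes' theorem (all numbers invented for illustration, not from the talk):

```python
# Hedged sketch of Bayes' theorem for particle ID (toy numbers, assumed):
p_pi, p_K = 0.9, 0.1            # priors: assumed beam composition
p_sig_pi, p_sig_K = 0.95, 0.05  # assumed P(Cherenkov signal | species)

p_sig = p_sig_pi * p_pi + p_sig_K * p_K   # P(signal), by total probability
p_pi_given_sig = p_sig_pi * p_pi / p_sig  # Bayes' theorem
print(f"P(pi | signal) = {p_pi_given_sig:.4f}")  # -> 0.9942
```

Even a modest signal efficiency difference sharpens the already-large prior: the posterior probability of a pion rises from 0.9 to about 0.994.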
Slide 8
Bayesian Prior
P(theory) is the Prior. It expresses prior belief that the theory is true, and can be a function of a parameter:
P(Mtop), P(MH), P(α,β,γ)
Bayes' Theorem describes the way prior belief is modified by experimental data.
But what do you take as the initial prior?
Slide 9
Uniform Prior
General usage: choose P(a) uniform in a (the principle of insufficient reason).
Often ‘improper’: ∫P(a) da = ∞. Though the posterior P(a|x) comes out sensible.
BUT! If P(a) is uniform, P(a²), P(ln a), P(√a)… are not. Insufficient reason is not valid (unless a is ‘most fundamental’ – whatever that means).
Statisticians handle this: check results for ‘robustness’ under different priors.
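The non-invariance point can be checked numerically (my sketch, assuming a ~ Uniform(0,1)): uniform prior mass in a does not translate into uniform mass in a².

```python
# A prior uniform in a is not uniform in a^2:
# P(a^2 < 0.25) = P(a < 0.5) = 0.5 under a uniform prior in a,
# whereas a prior uniform in a^2 would give that region only 0.25.
import random

random.seed(2)
samples = [random.random() for _ in range(200_000)]
frac = sum(a * a < 0.25 for a in samples) / len(samples)
print(f"P(a^2 < 0.25) under a prior uniform in a: {frac:.3f}")  # ~0.5, not 0.25
```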
Slide 10
Example – Le Diberder
A sad story: fitting the CKM angle α from B decays. 6 observables; 3 amplitudes, giving 6 unknown parameters (magnitudes and phases). α is the fundamentally interesting one.
Slide 11
Results
Frequentist
Bayesian: set one phase to zero, with uniform priors in the other two phases and the 3 magnitudes.
Slide 12
More Results
Bayesian: parametrise in terms of Tree and Penguin amplitudes:
A+- = T e^{iα} + P e^{iδ}
√2 A+0 = e^{iα} (T + TC e^{iδC})
√2 A00 = e^{iα} TC e^{iδC} − P e^{iδ}
Bayesian: 3 amplitudes, i.e. 3 real parts and 3 imaginary parts.
Slide 13
Interpretation
• B shows same (mis)behaviour
• Removing all experimental info gives similar P(α)
• The curse of high dimensions is at work
Uniformity in x, y, z makes P(r) peak at large r.
This result is not robust under changes of prior.
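The x, y, z example on this slide can be checked with a quick Monte Carlo (my sketch): uniform priors on the components push the implied prior on the radius away from zero.

```python
# Sketch: x, y, z uniform in [-1,1]^3 gives P(r) rising like r^2 near 0,
# so prior mass concentrates at large r (the curse of high dimensions).
import random, math

random.seed(3)
rs = []
for _ in range(100_000):
    x, y, z = (random.uniform(-1, 1) for _ in range(3))
    rs.append(math.sqrt(x*x + y*y + z*z))

# Volume of the r < 0.5 ball over the cube: (4/3)*pi*0.5^3 / 8 ~ 0.065
small = sum(r < 0.5 for r in rs) / len(rs)
print(f"fraction with r < 0.5: {small:.3f}")
```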
Slide 14
Example - Heinrich
CDF statistics group looking at problem of estimating signal cross section S in presence of background and efficiency.
N = εS + b
Efficiency and background come from separate calibration experiments (sidebands or MC). Scaling factors κ, ω are known.
Everything is done using Bayesian methods with uniform priors and the Poisson statistics formula. The calibration experiments use uniform priors for ε and for b, yielding posteriors used for S:
P(N|S) = (1/N!) ∫∫ e^{−(εS+b)} (εS+b)^N P(ε) P(b) dε db
Check coverage – all fine.
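A minimal numeric sketch of this marginalisation (toy numbers throughout; Gaussian-shaped weights stand in for the actual calibration posteriors, which is an assumption, not the analysis the slide describes):

```python
# Toy sketch: marginalise P(N|S) over epsilon and b on a grid,
# with assumed Gaussian-shaped weights for the calibration posteriors.
import math

def poisson_pmf(n, mu):
    return math.exp(-mu) * mu**n / math.factorial(n)

def gauss(x, mu, sigma):  # unnormalised Gaussian weight
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

N_obs = 5
eps_grid = [0.05 + 0.01 * i for i in range(60)]  # epsilon in (0, 0.65)
b_grid = [0.05 + 0.05 * i for i in range(40)]    # b in (0, 2)

def p_N_given_S(S):
    total, norm = 0.0, 0.0
    for eps in eps_grid:
        for b in b_grid:
            w = gauss(eps, 0.25, 0.10) * gauss(b, 0.75, 0.25)  # assumed
            total += w * poisson_pmf(N_obs, eps * S + b)
            norm += w
    return total / norm

# With a uniform prior in S, the posterior is proportional to p(N|S):
post = {S: p_N_given_S(S) for S in range(0, 61, 5)}
best = max(post, key=post.get)
print("posterior peaks near S =", best)
```

With N = 5, ε ≈ 0.25 and b ≈ 0.75, the posterior peaks near S ≈ (N − b)/ε, i.e. in the mid-teens.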
Slide 15
But it all goes pear-shaped…
If the particle decays in several channels (H→γγ, H→τ+τ−, H→bb), each channel has its own b and ε: in total 2N+1 parameters and 2N+1 experiments.
Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10% and b = 0.75 ± 0.25, for S ≈ 10 the ‘90% upper limit’ lies above S in only 80% of cases.
[Plot: coverage of the quoted ‘90%’ upper limit versus S (axis ticks 90%, 100%; S = 10, 20)]
Slide 16
The curse strikes again
A uniform prior in a single ε: fine.
But uniform priors in ε1, ε2, … εN amount to an ε^{N−1} prior in the total ε: a prejudice in favour of high efficiency, so the signal size is downgraded.
Slide 17
Happy ending
The effect is avoided by using Jeffreys' priors instead of uniform priors for ε and b.
These are not uniform, but behave like 1/ε and 1/b.
Not entirely realistic, but interesting. A uniform prior in S is not a problem – but maybe one should consider 1/√S?
Coverage (a very frequentist concept) is a useful tool for Bayesians.
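Such a coverage check can be sketched for the simplest case, N ~ Poisson(S) with no background or efficiency (my simplification, not Heinrich's full problem): with a uniform prior on S the posterior is Gamma(N+1, 1), so the Bayesian 90% upper limit is its 0.9 quantile, and toy experiments measure how often it covers the true S.

```python
# Coverage check sketch: Bayesian 90% upper limits for N ~ Poisson(S)
# with a uniform prior on S (posterior Gamma(N+1, 1)), stdlib only.
import random, math

random.seed(4)

def poisson_sample(mu):
    # Knuth's algorithm; fine for small mu
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def gamma_ppf(q, shape):
    # Quantile of Gamma(shape, 1) for integer shape, by bisection:
    # P(Gamma(shape,1) <= x) = 1 - sum_{k<shape} e^-x x^k / k!
    def cdf(x):
        return 1.0 - sum(math.exp(-x) * x**k / math.factorial(k)
                         for k in range(shape))
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < q else (lo, mid)
    return 0.5 * (lo + hi)

S_true, n_toys = 3.0, 2000
covered = sum(gamma_ppf(0.9, poisson_sample(S_true) + 1) >= S_true
              for _ in range(n_toys))
print(f"coverage of the 90% upper limit: {covered / n_toys:.3f}")
```

For S_true = 3 the limit fails to cover only when N = 0, so the coverage is 1 − e^−3 ≈ 0.95: the uniform-prior Poisson upper limit overcovers, consistent with the "all fine" one-channel result two slides back.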
Slide 18
Fisher Information
An informative experiment is one for which a measurement of x will give precise information about the parameter a.
Quantify: I(a) = −⟨∂² ln L / ∂a²⟩
(Second derivative – curvature.)
P(x,a): contains everything.
At fixed a, P(x|a) is the pdf of x.
At fixed x, P(x|a) is the likelihood L(a).
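For a Poisson mean the expectation can be evaluated directly (my sketch): ln L = −a + n ln a − ln n!, so ∂² ln L/∂a² = −n/a², and I(a) = ⟨n⟩/a² = 1/a.

```python
# Numerical check that the Poisson mean has Fisher information I(a) = 1/a,
# which is what gives the Jeffreys prior sqrt(I(a)) ~ 1/sqrt(a) later on.
import math

def fisher_info_poisson(a, n_max=200):
    # I(a) = -< d^2 ln L / da^2 > = sum_n P(n|a) * n/a^2
    term = math.exp(-a)  # P(n=0|a); recurrence avoids factorial overflow
    total = 0.0
    for n in range(n_max):
        total += term * (n / a**2)
        term *= a / (n + 1)  # P(n+1|a) = P(n|a) * a/(n+1)
    return total

for a in (0.5, 2.0, 10.0):
    print(f"a = {a:5}: I(a) = {fisher_info_poisson(a):.4f}, 1/a = {1/a:.4f}")
```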
Slide 19
Jeffreys' Prior
A prior may be uniform in a – but if I(a) depends on a it's still not ‘flat’: special values of a give better measurements.
Transform a → a′ such that I(a′) is constant, then choose a prior uniform in a′:
• location parameter – uniform prior OK
• scale parameter – a′ = ln a, prior 1/a
• Poisson mean – prior 1/√a
Slide 20
Objective Prior?
Jeffreys called this an ‘objective’ prior, as opposed to ‘subjective’ straight guesswork, but not everyone was convinced.
For statisticians, a ‘flat prior’ means the Jeffreys prior; for physicists it means a uniform prior.
The prior depends on the likelihood: your ‘prior belief’ P(MH) (or whatever) depends on the analysis.
Equivalent to a prior proportional to √I.
Slide 21
Reference Priors (Demortier)
4 steps.
1) Intrinsic Discrepancy between two PDFs:
δ{P1(z), P2(z)} = min{ ∫P1(z) ln(P1(z)/P2(z)) dz, ∫P2(z) ln(P2(z)/P1(z)) dz }
A sensible measure of difference: δ = 0 iff P1(z) and P2(z) are the same, otherwise positive. Invariant under all transformations of z.
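On a discrete grid the intrinsic discrepancy takes a few lines (my sketch; the two distributions are invented for illustration):

```python
# Intrinsic discrepancy on a discrete grid: the smaller of the two
# KL divergences between P1 and P2; zero iff they coincide.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def intrinsic_discrepancy(p, q):
    return min(kl(p, q), kl(q, p))

p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
print(intrinsic_discrepancy(p1, p1))                 # identical: 0.0
print(f"{intrinsic_discrepancy(p1, p2):.4f}")        # different: positive
```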
Slide 22
Reference Priors (2)
2) Expected Intrinsic Information.
Measurement M: x is sampled from p(x|a). Parameter a has a prior p(a).
Joint distribution: p(x,a) = p(x|a) p(a). Marginal distribution: p(x) = ∫p(x|a) p(a) da.
I(p(a), M) = δ{p(x,a), p(x) p(a)}
This depends on (i) the x–a relationship and (ii) the breadth of p(a). It is the expected intrinsic (Shannon) information from measurement M about parameter a.
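A toy discrete version (2×2 distributions invented for illustration) shows how I(p(a), M) rewards measurements where x tracks a:

```python
# Sketch of I(p(a), M) = delta{p(x,a), p(x)p(a)} on a small discrete grid.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_info(p_x_given_a, p_a):
    na, nx = len(p_a), len(p_x_given_a[0])
    joint = [p_x_given_a[i][j] * p_a[i] for i in range(na) for j in range(nx)]
    p_x = [sum(p_x_given_a[i][j] * p_a[i] for i in range(na)) for j in range(nx)]
    prod = [p_a[i] * p_x[j] for i in range(na) for j in range(nx)]
    return min(kl(joint, prod), kl(prod, joint))  # intrinsic discrepancy

p_a = [0.5, 0.5]
sharp = [[0.9, 0.1], [0.1, 0.9]]   # x follows a closely: informative
vague = [[0.5, 0.5], [0.5, 0.5]]   # x independent of a: no information
print(f"informative M:   {expected_info(sharp, p_a):.4f}")
print(f"uninformative M: {expected_info(vague, p_a):.4f}")
```

When x is independent of a the joint equals the product and the information is zero; the sharper the x–a link, the larger I(p(a), M).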
Slide 23
Reference Priors (3)
3) Missing Information.
Measurement Mk: k samples of x. Enough measurements fix a completely.
The limit as k→∞ of I(p(a), Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a – hence the Missing Information given p(a).
Slide 24
Reference Priors (4)
4) Family of priors P (e.g. Fourier series, polynomials, histograms), with p(a) ∈ P.
Ignorance principle: choose the least informative (dumbest) prior in the family – the one for which the missing information lim k→∞ I(p(a), Mk) is largest.
There are technical difficulties in taking the k→∞ limit and in integrating over an infinite range of a.
Slide 25
Family of Priors (Google)
Slide 26
Reference Priors
Reference priors do not represent subjective belief – in fact the opposite (like a jury selection): they allow the most input to come from the data. A formal consensus that practitioners can use to arrive at a sensible posterior.
They depend on the measurement p(x|a) – cf. Jeffreys. They also require a family P of possible priors.
They may be improper, but this doesn't matter (they do not represent belief).
For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior.
But they can also (unlike Jeffreys) work for several parameters.