Ryan [email protected]://rreece.github.io/
Feb 16, 2018Insight Data Science - AI Fellows Workshop
Primer on statistics:
MLE, Confidence Intervals, and Hypothesis Testing
Outline
1. Maximum likelihood estimators
2. Variance of MLE → Confidence intervals
3.
4. Efficiency example
5. A/B testing
[Figure: 12 panels (a)–(l); ATLAS, Data 2011, √s = 7 TeV, ∫L dt ≈ 4.7–4.9 fb⁻¹]

FIG. 1. Invariant or transverse mass distributions for the selected candidate events, the total background, and the signal expected in the following channels: (a) H → γγ, (b) H → ZZ(*) → ℓ⁺ℓ⁻ℓ⁺ℓ⁻ in the entire mass range, (c) H → ZZ(*) → ℓ⁺ℓ⁻ℓ⁺ℓ⁻ in the low mass range, (d) H → ZZ → ℓ⁺ℓ⁻νν, (e) b-tagged selection and (f) untagged selection for H → ZZ → ℓ⁺ℓ⁻qq, (g) H → WW(*) → ℓ⁺νℓ⁻ν + 0 jets, (h) H → WW(*) → ℓ⁺νℓ⁻ν + 1 jet, (i) H → WW(*) → ℓ⁺νℓ⁻ν + 2 jets, (j) H → WW → ℓνqq′ + 0 jets, (k) H → WW → ℓνqq′ + 1 jet, and (l) H → WW → ℓνqq′ + 2 jets. The H → WW(*) → ℓ⁺νℓ⁻ν + 2 jets distribution is shown before the final selection requirements are applied.
Is this significant?

[arXiv:1207.0319]

An excess of 3 events has a local p0 of ≈ 2%.

Statistical questions:
• How can we calculate the best-fit estimate of some parameter?
  ‣ Point estimation and confidence intervals
• How can we be precise and rigorous about how confident we are that a model is wrong?
  ‣ Hypothesis testing
Confidence Intervals
• A frequentist confidence interval is constructed such that, given the model, if the experiment were repeated, each time creating an interval, 95% (or other CL) of the intervals would contain the true population parameter (i.e. the interval has ≈95% coverage).
‣ They can be one-sided exclusions, e.g. X > 100.2 at 95% CL
‣ Two-sided measurements, e.g. X = 125.1 ± 0.2 at 68% CL
‣ Or contours in two or more parameters
• This is not the same as saying “There is a 95% probability that the true parameter is in my interval”. Any probability assigned to a parameter strictly involves a Bayesian prior probability.
• Bayes’ theorem: P(Theory | Data) ∝ P(Data | Theory) × P(Theory), where P(Data | Theory) is the “likelihood” and P(Theory) is the “prior”.
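The frequentist coverage property described above can be checked by simulation. Below is a minimal sketch using only the Python standard library (the true mean, σ, sample size, and seed are illustrative choices, not from the slides): in each pseudo-experiment we build the known-σ 68% interval x̄ ± σ/√N and count how often it contains the true mean.

```python
import math
import random

# Coverage check: build a 68% interval (mean +/- sigma/sqrt(N)) in each of
# many repeated experiments and count how often it contains the true mu.
random.seed(5)
mu_true, sigma, n = 10.0, 2.0, 25
n_experiments = 20_000

covered = 0
for _ in range(n_experiments):
    xs = [random.gauss(mu_true, sigma) for _ in range(n)]
    mean = sum(xs) / n
    half = sigma / math.sqrt(n)   # known-sigma 1-sigma half-width
    if mean - half <= mu_true <= mean + half:
        covered += 1

coverage = covered / n_experiments
# Should be close to the 1-sigma Gaussian coverage, 68.27%.
assert abs(coverage - 0.6827) < 0.015
```

The point of the exercise: the probability statement is about the ensemble of intervals, not about any single interval.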
χ² Fits and Confidence Contours

\[ \chi^2 = \sum_i \frac{(C_i - P_i)^2}{\sigma(C_i)^2 + \sigma(P_i)^2} + \chi^2(\text{search constraints}) + \sum_i \frac{(f^{\mathrm{obs}}_{\mathrm{SM}\,i} - f^{\mathrm{fit}}_{\mathrm{SM}\,i})^2}{\sigma(f_{\mathrm{SM}\,i})^2} \]

[Plot: confidence contours in the (m_0, m_{1/2}) plane, 0–1200 GeV on each axis, shown with and without the WMAP constraint; arXiv:0808.4128]

We'll discuss how one goes from the statistic on the left to the plot on the right.

(Ryan D. Reece (Penn), Likelihood Functions for SUSY, 4 / 24)
Maximum likelihood method
Primer on the Maximum Likelihood Method
Consider a Gaussian distributed measurement:

\[ f_1(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

If we repeat the measurement, the joint PDF is just a product:

\[ f(\vec{x} \mid \mu, \sigma) = \prod_i f_1(x_i \mid \mu, \sigma) \]

The likelihood function is the same function as the PDF, only thought of as a function of the parameters, given the data. The experiment is over.

\[ L(\mu, \sigma \mid \vec{x}) = f(\vec{x} \mid \mu, \sigma) \]

The likelihood principle states that the best estimates of the true parameters are the values which maximize the likelihood.
It is often more convenient to consider the log likelihood, which has the same maximum.

\[ \ln L = \ln \prod_i f_1 = \sum_i \ln f_1 = \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

Maximize:

\[ \frac{\partial \ln L}{\partial \mu} = \sum_i \frac{x_i - \hat{\mu}}{\sigma^2} = 0 \;\Rightarrow\; \sum_i (x_i - \hat{\mu}) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_i x_i = \bar{x} \]

This agrees with our intuition that the best estimate of the mean of a Gaussian is the sample mean.
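As a numerical sanity check of this result, one can scan the Gaussian log-likelihood on a grid and confirm that the maximizer agrees with the sample mean. A sketch with illustrative numbers, assuming only the standard library:

```python
import math
import random

def log_likelihood(mu, xs, sigma):
    """Gaussian log-likelihood ln L(mu | xs) for known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(1000)]

# Scan mu on a fine grid and take the maximizer of ln L.
grid = [4.0 + 0.001 * i for i in range(2001)]   # mu in [4.0, 6.0]
mu_hat = max(grid, key=lambda mu: log_likelihood(mu, xs, 2.0))

# The grid maximizer agrees with the sample mean to within the grid spacing.
sample_mean = sum(xs) / len(xs)
assert abs(mu_hat - sample_mean) < 0.001
```

In real fits one uses a numerical optimizer rather than a grid, but the idea is the same.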
Note that in the case of a Gaussian PDF, maximizing the likelihood is equivalent to minimizing χ².

\[ \ln L = \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

is maximized when

\[ \chi^2 = \sum_i \frac{(x_i - \mu)^2}{\sigma^2} \]

is minimized.

This was a simple example of what statisticians call point estimation. Now we would like to quantify our error on this estimate.
Variance of MLEs
Recap of the results above: \( \mu \) is the "true parameter", and \( \hat{\mu} = \frac{1}{N} \sum_i x_i = \bar{x} \) is its maximum likelihood estimator (MLE).

Assuming this model, how confident are we that \( \hat{\mu} \) is close to \( \mu \)? ➡ What is the variance of \( \hat{\mu} \)?
One would think that if the likelihood function varies rather slowly near the peak, then there is a wide range of values of the parameters that are consistent with the data, and thus the estimate should have a large error.

To see the behavior of the likelihood function near the peak, consider the Taylor expansion of a general lnL of some parameter \( \theta \), near its maximum likelihood estimate \( \hat{\theta} \):

\[ \ln L(\theta) = \ln L(\hat{\theta}) + \underbrace{\left. \frac{\partial \ln L}{\partial \theta} \right|_{\hat{\theta}}}_{=\,0} (\theta - \hat{\theta}) + \frac{1}{2!} \underbrace{\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}}}_{\equiv\, -1/s^2} (\theta - \hat{\theta})^2 + \cdots \]

(The first-derivative term vanishes because we expand at the maximum.) Dropping the remaining terms would imply that

\[ L(\theta) = L(\hat{\theta}) \, \exp\!\left( -\frac{(\theta - \hat{\theta})^2}{2 s^2} \right) \]

Note that \( \ln L(\theta) \) is parabolic.
So if this were a good approximation, we would expect that the variance of \( \hat{\theta} \) would be given by

\[ V[\hat{\theta}] \equiv \sigma_{\hat{\theta}}^2 = s^2 = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]

It turns out that there is more truth to this than you would think, given by an important theorem in statistics, the Cramer-Rao inequality:

\[ V[\hat{\theta}] \;\geq\; \left( 1 + \frac{\partial b}{\partial \theta} \right)^2 \Big/ \; E\!\left[ -\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] \]

where b is the bias of the estimator. An estimator's efficiency is defined to measure to what extent this inequality is an equality:

\[ \varepsilon[\hat{\theta}] \;\equiv\; \frac{ 1 \Big/ E\!\left[ -\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] }{ V[\hat{\theta}] } \]
It can be shown that in the large sample limit, maximum likelihood estimators are unbiased and 100% efficient.

Therefore, in principle, one can calculate the variance of an ML estimator with

\[ V[\hat{\theta}] = -\left( E\!\left[ \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] \right)^{-1} \]

Calculating the expectation value would involve an analytic integration over the PDFs of all our possible measurements, or a Monte Carlo simulation of it. In practice, one usually uses the observed maximum likelihood estimate as the expectation:

\[ V[\hat{\theta}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]
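The observed-curvature formula above can be evaluated numerically. A sketch with illustrative numbers (standard library only): a central finite difference estimates the curvature of lnL at the maximum, and minus its inverse reproduces the analytic σ²/N for a Gaussian. Since this lnL is exactly quadratic in μ, the finite difference is essentially exact here.

```python
import math
import random

def ln_like(mu, xs, sigma):
    """Gaussian log-likelihood ln L(mu | xs) for known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(2)
sigma = 3.0
xs = [random.gauss(1.0, sigma) for _ in range(500)]
mu_hat = sum(xs) / len(xs)   # the MLE of mu is the sample mean

# Central second difference approximates the curvature of ln L at the maximum.
h = 1e-3
curvature = (ln_like(mu_hat + h, xs, sigma) - 2 * ln_like(mu_hat, xs, sigma)
             + ln_like(mu_hat - h, xs, sigma)) / h**2

# V[mu_hat] = -(d^2 lnL / d mu^2)^(-1), which should equal sigma^2 / N = 0.018.
var_numeric = -1.0 / curvature
assert abs(var_numeric - sigma**2 / len(xs)) < 1e-6
```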
Let's go back to our simple example of a Gaussian likelihood to test this method of calculating the ML estimator's variance.

\[ V[\hat{\mu}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \mu^2} \right|_{\hat{\mu}} \right)^{-1} \]

\[ \frac{\partial^2 \ln L}{\partial \mu^2} = \frac{\partial^2}{\partial \mu^2} \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) = \frac{\partial}{\partial \mu} \sum_i \frac{x_i - \mu}{\sigma^2} = \sum_i \frac{-1}{\sigma^2} = \frac{-N}{\sigma^2} \]

\[ \Rightarrow\; V[\hat{\mu}] = \frac{\sigma^2}{N} \;\Rightarrow\; \sigma_{\hat{\mu}} = \frac{\sigma}{\sqrt{N}} \]

Many of you will recognize this as the proper error on the sample mean. If you are unfamiliar with it, we can actually derive it analytically in this case.
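The σ/√N result can also be checked by brute force: simulate many repetitions of the experiment and compare the spread of the sample means to the prediction σ²/N. A sketch with illustrative numbers, standard library only:

```python
import random

random.seed(1)
sigma, n_samples, n_experiments = 2.0, 100, 20000

# Each pseudo-experiment draws N points and records the sample mean.
means = []
for _ in range(n_experiments):
    xs = [random.gauss(0.0, sigma) for _ in range(n_samples)]
    means.append(sum(xs) / n_samples)

grand_mean = sum(means) / n_experiments
var_of_means = sum((m - grand_mean) ** 2 for m in means) / (n_experiments - 1)

# Compare to the analytic prediction V[mu_hat] = sigma^2 / N = 0.04.
predicted = sigma**2 / n_samples
assert abs(var_of_means - predicted) / predicted < 0.05
```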
\[ V[\bar{x}] = E[\bar{x}^2] - \underbrace{E[\bar{x}]^2}_{\mu^2} \]

\[ = E\!\left[ \left( \frac{1}{N} \sum_i x_i \right) \left( \frac{1}{N} \sum_j x_j \right) \right] - \mu^2 = \frac{1}{N^2} \, E\!\left[ \sum_{i \neq j} x_i x_j + \sum_i x_i^2 \right] - \mu^2 \]

\[ = \frac{1}{N^2} \left( \sum_{i \neq j} \underbrace{E[x]^2}_{\mu^2} + \sum_i E[x^2] \right) - \mu^2 \]

To find \( E[x^2] \), consider

\[ V[x] = \sigma^2 = E[x^2] - E[x]^2 = E[x^2] - \mu^2 \;\Rightarrow\; E[x^2] = \sigma^2 + \mu^2 \]
Analytic variance of a Gaussian:
\[ \Rightarrow\; V[\bar{x}] = \frac{1}{N^2} \left( \sum_{i \neq j} \mu^2 + \sum_i (\sigma^2 + \mu^2) \right) - \mu^2 = \frac{1}{N^2} \left( (N^2 - N)\,\mu^2 + N(\sigma^2 + \mu^2) \right) - \mu^2 = \frac{\sigma^2}{N} \]

which verifies the result we got from calculating derivatives of the likelihood function:

\[ V[\hat{\theta}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]

In practice, one usually doesn't calculate this analytically, but instead:
- calculates the derivatives numerically, or
- uses the Δ lnL or Δχ² method, described next.
Variance by Δ lnL

Back to our Taylor expansion of lnL:

\[ \ln L(\theta) = \ln L(\hat{\theta}) + \frac{1}{2!} \underbrace{\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}}}_{-1/\sigma_{\hat{\theta}}^2} (\theta - \hat{\theta})^2 + \cdots \]

Let \( \Delta \ln L(\theta) \equiv \ln L(\theta) - \ln L(\hat{\theta}) \), so that

\[ \Delta \ln L(\theta) \simeq -\frac{(\theta - \hat{\theta})^2}{2 \sigma_{\hat{\theta}}^2} \]

Evaluating at \( \theta \to \hat{\theta} \pm n\,\sigma_{\hat{\theta}} \):

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = -\frac{(\pm n\,\sigma_{\hat{\theta}})^2}{2 \sigma_{\hat{\theta}}^2} = -\frac{n^2}{2} \]
Variance by Δ lnL (continued)

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = -\frac{n^2}{2} \]

This is the most common definition of the 68% and 95% confidence intervals:

- 68% / \( 1\,\sigma_{\hat{\theta}} \): \( \Delta \ln L = -1/2 \)
- 95% / \( 2\,\sigma_{\hat{\theta}} \): \( \Delta \ln L = -2 \)
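A sketch of using Δ lnL = −1/2 in practice (illustrative Gaussian example, standard library only): walk away from the maximum until the log-likelihood has dropped by 1/2, and compare the interval half-width to the analytic σ/√N.

```python
import math
import random

def ln_like(mu, xs, sigma):
    # Constants dropped: only differences of ln L matter here.
    return sum(-(x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(3)
sigma, n = 2.0, 400
xs = [random.gauss(0.0, sigma) for _ in range(n)]
mu_hat = sum(xs) / n
ln_max = ln_like(mu_hat, xs, sigma)

# Walk outward from the maximum until ln L has dropped by 1/2 (68% CL edge).
step = 1e-4
hi = mu_hat
while ln_like(hi, xs, sigma) - ln_max > -0.5:
    hi += step
lo = mu_hat
while ln_like(lo, xs, sigma) - ln_max > -0.5:
    lo -= step

# For a Gaussian likelihood the half-width should be sigma / sqrt(N) = 0.1.
half_width = (hi - lo) / 2
assert abs(half_width - sigma / math.sqrt(n)) < 1e-3
```

For non-parabolic lnL the same scan gives (possibly asymmetric) intervals, which is exactly why the method is used in practice.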
Variance by Δχ²

Recall that in the case that the PDF is Gaussian, lnL is just the χ² statistic (times −1/2):

\[ \ln L = -\frac{\chi^2}{2}, \qquad \chi^2 = \sum_i \frac{(x_i - \theta)^2}{\sigma^2} \]

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) - \ln L_{\max} = -\frac{n^2}{2} \]

\[ \Rightarrow\; -\frac{1}{2} \left( \chi^2(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) - \chi^2_{\min} \right) = -\frac{n^2}{2} \]

\[ \Delta \chi^2(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = n^2 \]

- 68% / \( 1\,\sigma_{\hat{\theta}} \): \( \Delta \chi^2 = 1 \)
- 95% / \( 2\,\sigma_{\hat{\theta}} \): \( \Delta \chi^2 = 4 \) (exactly 95% corresponds to Δχ² = 3.84; 4 corresponds to 2σ)
Variance by Δχ²: thresholds

Thresholds Q_c on Δχ² giving coverage c for n fitted parameters:

    c       n=1    n=2    n=3
    0.683   1.00   2.30   3.53
    0.95    3.84   5.99   7.82
    0.99    6.63   9.21   11.3

A note about confidence contours/intervals: a 95% confidence contour doesn't mean that the true value of the parameter is in the contour with 95% probability. That would be a Bayesian probability (a probability of belief). It means that if the model is correct and we have properly estimated our errors, then if we were to repeat the experiment over and over again, each time creating a new contour, the contour would cover the true parameter in 95% of the experiments (a frequentist probability).
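These thresholds are quantiles of the χ² distribution with n degrees of freedom. For n = 1, 2, 3 the χ² CDF has closed forms in terms of erf and exp, so the table can be verified with the standard library alone (a sketch; `chi2_cdf` is a hand-rolled helper, not a library call):

```python
import math

def chi2_cdf(x, k):
    """Chi-square CDF for k = 1, 2, 3 degrees of freedom (closed forms)."""
    if k == 1:
        return math.erf(math.sqrt(x / 2))
    if k == 2:
        return 1 - math.exp(-x / 2)
    if k == 3:
        return (math.erf(math.sqrt(x / 2))
                - math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("k must be 1, 2, or 3")

# Reproduce the table of Delta-chi^2 thresholds Q_c for coverage c.
table = {
    0.683: {1: 1.00, 2: 2.30, 3: 3.53},
    0.95:  {1: 3.84, 2: 5.99, 3: 7.82},
    0.99:  {1: 6.63, 2: 9.21, 3: 11.3},
}
for c, row in table.items():
    for k, q in row.items():
        assert abs(chi2_cdf(q, k) - c) < 0.005
```

In practice one would use a statistics library's inverse CDF directly; the point here is only that the tabulated numbers are χ² quantiles.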
Multi-dimensional case
Example: measuring an efficiency & A/B testing
Efficiency

(Slide: Louise Heelan, "Calc Efficiency Uncertainties": Bayesian statistics, mode vs. mean)

Out of n trials I measure k "conversions". Estimate the conversion rate and its precision/confidence. Without a precision, we cannot know if observed changes are significant.
Efficiency: measuring an occupancy ~ measuring a coin's fairness
Binomial distribution:

\[ f(k; n, p) = \binom{n}{k} \, p^k \, (1 - p)^{n - k} \]

\[ \sigma_k^2 = n\,p\,(1 - p) \]

\[ \sigma_o^2 = \left( \frac{\sigma_k}{n} \right)^2 = \frac{p\,(1 - p)}{n} \]

\[ p \approx 0.01 \;\Rightarrow\; \sigma_o^2 = \frac{(0.01)(0.99)}{n} \approx \frac{0.01}{n} \;\Rightarrow\; \sigma_o = \frac{1}{10\sqrt{n}} \]

(Ryan D. Reece (Penn), TRT Low Threshold Optimization, 19 / 28)
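The σ_o = √(p(1−p)/n) formula and the 1/(10√n) rule of thumb for p = 0.01 can be checked directly; the few lines below also reproduce the precision table in this deck (`binomial_precision` is a hypothetical helper name, not from the slides):

```python
import math

def binomial_precision(p, n):
    """Standard error on an estimated rate p from n Bernoulli trials."""
    return math.sqrt(p * (1 - p) / n)

p = 0.01
# n -> expected sigma_o, matching the "events needed for some precision" table.
for n, sigma_expected in [(400, 0.0050), (1000, 0.0032), (2000, 0.0022),
                          (10_000, 0.0010), (1_000_000, 0.0001)]:
    sigma = binomial_precision(p, n)
    assert abs(sigma - sigma_expected) < 1e-4
    # For p = 0.01, sigma is within 1% of the rule of thumb 1 / (10 sqrt(n)).
    assert abs(sigma - 1 / (10 * math.sqrt(n))) / sigma < 0.01
```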
The MLE of the rate is \( \hat{\varepsilon} = \frac{k}{n} \).

Number of events needed for some precision, for p = 0.01:

    n          σ_o      σ_o / p
    400        0.0050   50%
    1,000      0.0032   32%
    2,000      0.0022   22%
    10,000     0.0010   10%
    10^6       0.0001   1%
A/B testing (example taken from https://www.blitzresults.com/en/ab-tests/)
Treat the data like a 1-dimensional, 4-bin histogram. Construct

\[ \chi^2 = \frac{(N'_A - N_A\,\varepsilon)^2}{N_A\,\varepsilon} + \frac{\left( (N_A - N'_A) - N_A (1 - \varepsilon) \right)^2}{N_A (1 - \varepsilon)} + (A \to B) \]

Assume the conversion rate is unchanged from A to B:

\[ \varepsilon = \frac{N'_A}{N_A} \simeq \frac{N'_B}{N_B} \simeq \frac{N'_A + N'_B}{N_A + N_B} \]
A/B testing

Plugging in the example's values gives χ² = 7.52.

Remember the Δχ² thresholds: 1σ (68%): 1; 2σ (95%): 3.84; 3σ (99%): 6.63.

Since 7.52 > 6.63, this is a significant improvement in the B model at > 99% CL.
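The 4-term χ² can be computed by hand or in a few lines. A sketch: `ab_chi2` is an illustrative helper, and the counts below are hypothetical (the slide's own example numbers, which give χ² = 7.52, come from the linked page and are not reproduced here).

```python
def ab_chi2(n_a, k_a, n_b, k_b):
    """Four-term Pearson chi^2 for k conversions out of n trials in groups
    A and B, under the null hypothesis of a common conversion rate."""
    eps = (k_a + k_b) / (n_a + n_b)   # pooled rate under H0
    chi2 = 0.0
    for n, k in [(n_a, k_a), (n_b, k_b)]:
        chi2 += (k - n * eps) ** 2 / (n * eps)                     # converted
        chi2 += ((n - k) - n * (1 - eps)) ** 2 / (n * (1 - eps))   # not converted
    return chi2

# Hypothetical counts: 5000 visitors per variant, 100 vs. 150 conversions.
chi2 = ab_chi2(5000, 100, 5000, 150)
# Compare against the 1-parameter thresholds: 3.84 (95% CL), 6.63 (99% CL).
assert chi2 > 6.63
```

With these hypothetical counts χ² ≈ 10.3, so the difference would be significant at > 99% CL.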
Hypothesis test
Take-aways

• MLEs are the best (asymptotically unbiased and 100% efficient).
• Don't just calculate the MLE; find its variance!
• Quantify significance with a confidence interval.
• Under many common assumptions, \( \ln L = -\chi^2/2 \), and one can calculate Δχ² to determine confidence intervals/contours.
• In the simplest case of measuring an efficiency, A/B testing amounts to a 4-term χ² that can be calculated by hand.
• If the Δχ² is large, the change is significant!
Backup slides
Efficiency: Results (Louise Heelan, "Calc Efficiency Uncertainties")

Other examples with numbers: different techniques for a top-trigger analysis (105200, r635, ≈ 30k events), with two examples of a particular bin (low and high statistics):

    Method    Numerator  Denominator  Mean (Mode)       Variance  Uncertainty σ
    Poisson   1          45           0.0222            0.00050   0.02246
    Binomial  1          45           0.0222            0.00048   0.02197
    Bayesian  1          45           0.04255 (0.0222)  0.00085   0.02913

    Method    Numerator  Denominator  Mean (Mode)       Variance  Uncertainty σ
    Poisson   100        106          0.9433            0.01729   0.13151
    Binomial  100        106          0.9433            0.00050   0.02244
    Bayesian  100        106          0.9352 (0.9433)   0.00056   0.02358
Hypothesis testing
• Null hypothesis, H0: the SM
• Alternative hypothesis, H1: some new physics
• Type-I error: false positive rate (α)
• Type-II error: false negative rate (β)
• Power: 1 − β
• Want to maximize power for a fixed false positive rate
• Particle physics has a tradition of claiming discovery at 5σ ⇒ p0 = 2.9×10⁻⁷ = 1 in 3.5 million, and presents exclusions with p0 = 5% (95% CL "coverage").
• Neyman-Pearson lemma (1933): the most powerful test for fixed α is the likelihood ratio:
The Neyman-Pearson Lemma (slide: Kyle Cranmer, CERN Academic Training, Feb 2-5, 2009)

The region W that minimizes the probability of wrongly accepting H0 is just a contour of the likelihood ratio:

\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} > k_\alpha \]

This is the goal! The problem is we don't have access to L(x|H0) and L(x|H1) directly.

Prediction via Monte Carlo simulation (slide: Kyle Cranmer, PhyStat2005, Oxford, Sep 13, 2005): the enormous detectors are still being constructed, but we have detailed simulations of the detectors' response. The advancements in theoretical predictions, detector simulation, tracking, calorimetry, triggering, and computing set the bar high for equivalent advances in our statistical treatment of the data.
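For two simple hypotheses the lemma can be demonstrated by simulation. In this sketch (illustrative numbers, standard library only), take H0: μ = 0 vs. H1: μ = 1 for a single Gaussian observation; cutting on the likelihood ratio is then equivalent to cutting on x itself, giving false positive rate α ≈ 5% and power 1 − β ≈ 26%.

```python
import math
import random

def like(x, mu, sigma=1.0):
    """Gaussian likelihood of one observation x given mean mu."""
    return (math.exp(-(x - mu) ** 2 / (2 * sigma**2))
            / math.sqrt(2 * math.pi * sigma**2))

# Simple-vs-simple test: H0: mu = 0, H1: mu = 1, one observation.
# Reject H0 when L(x|H0)/L(x|H1) is small; here that ratio is exp(0.5 - x),
# so the likelihood-ratio cut is the same as the cut x > 1.6449.
cut = 1.6449  # one-sided 5% point of a standard Gaussian

def reject(x):
    return like(x, 0.0) / like(x, 1.0) < math.exp(0.5 - cut)

random.seed(4)
trials = 200_000
alpha = sum(reject(random.gauss(0.0, 1.0)) for _ in range(trials)) / trials
power = sum(reject(random.gauss(1.0, 1.0)) for _ in range(trials)) / trials

assert abs(alpha - 0.05) < 0.005    # false positive rate ~ 5%
assert abs(power - 0.2595) < 0.01   # power = P(x > 1.6449 | mu = 1)
```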
Power & Significance
Variance by Δ lnL: the non-Gaussian case

What if \( L(\theta) \) is not Gaussian, i.e. \( \ln L(\theta) \) is not parabolic?

Likelihood functions have an invariance property, such that if \( g(\theta) \) is a monotonic function, then the maximum likelihood estimate of \( g(\theta) \) is \( g(\hat{\theta}) \). In principle, one can find a change-of-variables function \( g(\theta) \) for which \( \ln L(g(\theta)) \) is parabolic as a function of \( g(\theta) \). Therefore, using the invariance of the likelihood function, one can make inferences about a parameter of a non-Gaussian likelihood function without actually finding such a transformation [James p. 234].
Systematics

[Figure from: http://philosophy-in-figures.tumblr.com/]