Ryan [email protected]://rreece.github.io/
Feb 16, 2018Insight Data Science - AI Fellows Workshop
Primer on statistics:
MLE, Confidence Intervals, and Hypothesis Testing
Outline
1. Maximum likelihood estimators
2. Variance of MLE → Confidence intervals
3.
4. Efficiency example
5. A/B testing
[Figure: 12 panels (a)–(l); ATLAS, Data 2011, √s = 7 TeV, ∫L dt ≈ 4.7–4.9 fb⁻¹]

FIG. 1. Invariant or transverse mass distributions for the selected candidate events, the total background, and the signal expected in the following channels: (a) H → γγ, (b) H → ZZ(*) → ℓ⁺ℓ⁻ℓ⁺ℓ⁻ in the entire mass range, (c) H → ZZ(*) → ℓ⁺ℓ⁻ℓ⁺ℓ⁻ in the low mass range, (d) H → ZZ → ℓ⁺ℓ⁻νν, (e) b-tagged selection and (f) untagged selection for H → ZZ → ℓ⁺ℓ⁻qq, (g) H → WW(*) → ℓ⁺νℓ⁻ν + 0 jets, (h) H → WW(*) → ℓ⁺νℓ⁻ν + 1 jet, (i) H → WW(*) → ℓ⁺νℓ⁻ν + 2 jets, (j) H → WW → ℓνqq′ + 0 jets, (k) H → WW → ℓνqq′ + 1 jet, and (l) H → WW → ℓνqq′ + 2 jets. The H → WW(*) → ℓ⁺νℓ⁻ν + 2 jets distribution is shown before the final selection requirements are applied.
Is this significant?

[arXiv:1207.0319]

An excess of 3 events has a local p0 of ≈ 2%.

Statistical questions:
• How can we calculate the best-fit estimate of some parameter?
  ‣ Point estimation and confidence intervals
• How can we be precise and rigorous about how confident we are that a model is wrong?
  ‣ Hypothesis testing
Confidence Intervals
• A frequentist confidence interval is constructed such that, given the model, if the experiment were repeated, each time creating an interval, 95% (or other CL) of the intervals would contain the true population parameter (i.e. the interval has ≈95% coverage).
‣ They can be one-sided exclusions, e.g. X > 100.2 at 95% CL
‣ Two-sided measurements, e.g. X = 125.1 ± 0.2 at 68% CL
‣ Or contours in two or more parameters
• This is not the same as saying “There is a 95% probability that the true parameter is in my interval”. Any probability assigned to a parameter strictly involves a Bayesian prior probability.
• Bayes’ theorem: P(Theory | Data) ∝ P(Data | Theory) × P(Theory), where P(Data | Theory) is the “likelihood” and P(Theory) is the “prior”.
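The frequentist coverage property described above can be checked by simulation. Below is a minimal sketch using only the Python standard library (the true mean, σ, sample size, and seed are illustrative choices, not from the slides): in each pseudo-experiment we build the known-σ 68% interval x̄ ± σ/√N and count how often it contains the true mean.

```python
import math
import random

# Coverage check: build a 68% interval (mean +/- sigma/sqrt(N)) in each of
# many repeated experiments and count how often it contains the true mu.
random.seed(5)
mu_true, sigma, n = 10.0, 2.0, 25
n_experiments = 20_000

covered = 0
for _ in range(n_experiments):
    xs = [random.gauss(mu_true, sigma) for _ in range(n)]
    mean = sum(xs) / n
    half = sigma / math.sqrt(n)   # known-sigma 1-sigma half-width
    if mean - half <= mu_true <= mean + half:
        covered += 1

coverage = covered / n_experiments
# Should be close to the 1-sigma Gaussian coverage, 68.27%.
assert abs(coverage - 0.6827) < 0.015
```

The point of the exercise: the probability statement is about the ensemble of intervals, not about any single interval.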
χ² Fits and Confidence Contours

\[ \chi^2 = \sum_i \frac{(C_i - P_i)^2}{\sigma(C_i)^2 + \sigma(P_i)^2} + \chi^2(\text{search constraints}) + \sum_i \frac{(f^{\mathrm{obs}}_{\mathrm{SM}\,i} - f^{\mathrm{fit}}_{\mathrm{SM}\,i})^2}{\sigma(f_{\mathrm{SM}\,i})^2} \]

[Plot: confidence contours in the (m_0, m_{1/2}) plane, 0–1200 GeV on each axis, shown with and without the WMAP constraint; arXiv:0808.4128]

We'll discuss how one goes from the statistic on the left to the plot on the right.

(Ryan D. Reece (Penn), Likelihood Functions for SUSY, 4 / 24)
Maximum likelihood method
Primer on the Maximum Likelihood Method
Consider a Gaussian distributed measurement:

\[ f_1(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

If we repeat the measurement, the joint PDF is just a product:

\[ f(\vec{x} \mid \mu, \sigma) = \prod_i f_1(x_i \mid \mu, \sigma) \]

The likelihood function is the same function as the PDF, only thought of as a function of the parameters, given the data. The experiment is over.

\[ L(\mu, \sigma \mid \vec{x}) = f(\vec{x} \mid \mu, \sigma) \]

The likelihood principle states that the best estimates of the true parameters are the values which maximize the likelihood.
It is often more convenient to consider the log likelihood, which has the same maximum.

\[ \ln L = \ln \prod_i f_1 = \sum_i \ln f_1 = \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

Maximize:

\[ \frac{\partial \ln L}{\partial \mu} = \sum_i \frac{x_i - \hat{\mu}}{\sigma^2} = 0 \;\Rightarrow\; \sum_i (x_i - \hat{\mu}) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_i x_i = \bar{x} \]

This agrees with our intuition that the best estimate of the mean of a Gaussian is the sample mean.
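As a numerical sanity check of this result, one can scan the Gaussian log-likelihood on a grid and confirm that the maximizer agrees with the sample mean. A sketch with illustrative numbers, assuming only the standard library:

```python
import math
import random

def log_likelihood(mu, xs, sigma):
    """Gaussian log-likelihood ln L(mu | xs) for known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(1000)]

# Scan mu on a fine grid and take the maximizer of ln L.
grid = [4.0 + 0.001 * i for i in range(2001)]   # mu in [4.0, 6.0]
mu_hat = max(grid, key=lambda mu: log_likelihood(mu, xs, 2.0))

# The grid maximizer agrees with the sample mean to within the grid spacing.
sample_mean = sum(xs) / len(xs)
assert abs(mu_hat - sample_mean) < 0.001
```

In real fits one uses a numerical optimizer rather than a grid, but the idea is the same.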
Note that in the case of a Gaussian PDF, maximizing the likelihood is equivalent to minimizing χ².

\[ \ln L = \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

is maximized when

\[ \chi^2 = \sum_i \frac{(x_i - \mu)^2}{\sigma^2} \]

is minimized.

This was a simple example of what statisticians call point estimation. Now we would like to quantify our error on this estimate.
Variance of MLEs
Recap of the results above: \( \mu \) is the "true parameter", and \( \hat{\mu} = \frac{1}{N} \sum_i x_i = \bar{x} \) is its maximum likelihood estimator (MLE).

Assuming this model, how confident are we that \( \hat{\mu} \) is close to \( \mu \)? ➡ What is the variance of \( \hat{\mu} \)?
One would think that if the likelihood function varies rather slowly near the peak, then there is a wide range of values of the parameters that are consistent with the data, and thus the estimate should have a large error.

To see the behavior of the likelihood function near the peak, consider the Taylor expansion of a general lnL of some parameter \( \theta \), near its maximum likelihood estimate \( \hat{\theta} \):

\[ \ln L(\theta) = \ln L(\hat{\theta}) + \underbrace{\left. \frac{\partial \ln L}{\partial \theta} \right|_{\hat{\theta}}}_{=\,0} (\theta - \hat{\theta}) + \frac{1}{2!} \underbrace{\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}}}_{\equiv\, -1/s^2} (\theta - \hat{\theta})^2 + \cdots \]

(The first-derivative term vanishes because we expand at the maximum.) Dropping the remaining terms would imply that

\[ L(\theta) = L(\hat{\theta}) \, \exp\!\left( -\frac{(\theta - \hat{\theta})^2}{2 s^2} \right) \]

Note that \( \ln L(\theta) \) is parabolic.
So if this were a good approximation, we would expect that the variance of \( \hat{\theta} \) would be given by

\[ V[\hat{\theta}] \equiv \sigma_{\hat{\theta}}^2 = s^2 = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]

It turns out that there is more truth to this than you would think, given by an important theorem in statistics, the Cramer-Rao inequality:

\[ V[\hat{\theta}] \;\geq\; \left( 1 + \frac{\partial b}{\partial \theta} \right)^2 \Big/ \; E\!\left[ -\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] \]

where b is the bias of the estimator. An estimator's efficiency is defined to measure to what extent this inequality is an equality:

\[ \varepsilon[\hat{\theta}] \;\equiv\; \frac{ 1 \Big/ E\!\left[ -\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] }{ V[\hat{\theta}] } \]
It can be shown that in the large sample limit, maximum likelihood estimators are unbiased and 100% efficient.

Therefore, in principle, one can calculate the variance of an ML estimator with

\[ V[\hat{\theta}] = -\left( E\!\left[ \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right] \right)^{-1} \]

Calculating the expectation value would involve an analytic integration over the PDFs of all our possible measurements, or a Monte Carlo simulation of it. In practice, one usually uses the observed maximum likelihood estimate as the expectation:

\[ V[\hat{\theta}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]
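The observed-curvature formula above can be evaluated numerically. A sketch with illustrative numbers (standard library only): a central finite difference estimates the curvature of lnL at the maximum, and minus its inverse reproduces the analytic σ²/N for a Gaussian. Since this lnL is exactly quadratic in μ, the finite difference is essentially exact here.

```python
import math
import random

def ln_like(mu, xs, sigma):
    """Gaussian log-likelihood ln L(mu | xs) for known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(2)
sigma = 3.0
xs = [random.gauss(1.0, sigma) for _ in range(500)]
mu_hat = sum(xs) / len(xs)   # the MLE of mu is the sample mean

# Central second difference approximates the curvature of ln L at the maximum.
h = 1e-3
curvature = (ln_like(mu_hat + h, xs, sigma) - 2 * ln_like(mu_hat, xs, sigma)
             + ln_like(mu_hat - h, xs, sigma)) / h**2

# V[mu_hat] = -(d^2 lnL / d mu^2)^(-1), which should equal sigma^2 / N = 0.018.
var_numeric = -1.0 / curvature
assert abs(var_numeric - sigma**2 / len(xs)) < 1e-6
```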
Let's go back to our simple example of a Gaussian likelihood to test this method of calculating the ML estimator's variance.

\[ V[\hat{\mu}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \mu^2} \right|_{\hat{\mu}} \right)^{-1} \]

\[ \frac{\partial^2 \ln L}{\partial \mu^2} = \frac{\partial^2}{\partial \mu^2} \sum_i \left( -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right) = \frac{\partial}{\partial \mu} \sum_i \frac{x_i - \mu}{\sigma^2} = \sum_i \frac{-1}{\sigma^2} = \frac{-N}{\sigma^2} \]

\[ \Rightarrow\; V[\hat{\mu}] = \frac{\sigma^2}{N} \;\Rightarrow\; \sigma_{\hat{\mu}} = \frac{\sigma}{\sqrt{N}} \]

Many of you will recognize this as the proper error on the sample mean. If you are unfamiliar with it, we can actually derive it analytically in this case.
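The σ/√N result can also be checked by brute force: simulate many repetitions of the experiment and compare the spread of the sample means to the prediction σ²/N. A sketch with illustrative numbers, standard library only:

```python
import random

random.seed(1)
sigma, n_samples, n_experiments = 2.0, 100, 20000

# Each pseudo-experiment draws N points and records the sample mean.
means = []
for _ in range(n_experiments):
    xs = [random.gauss(0.0, sigma) for _ in range(n_samples)]
    means.append(sum(xs) / n_samples)

grand_mean = sum(means) / n_experiments
var_of_means = sum((m - grand_mean) ** 2 for m in means) / (n_experiments - 1)

# Compare to the analytic prediction V[mu_hat] = sigma^2 / N = 0.04.
predicted = sigma**2 / n_samples
assert abs(var_of_means - predicted) / predicted < 0.05
```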
\[ V[\bar{x}] = E[\bar{x}^2] - \underbrace{E[\bar{x}]^2}_{\mu^2} \]

\[ = E\!\left[ \left( \frac{1}{N} \sum_i x_i \right) \left( \frac{1}{N} \sum_j x_j \right) \right] - \mu^2 = \frac{1}{N^2} \, E\!\left[ \sum_{i \neq j} x_i x_j + \sum_i x_i^2 \right] - \mu^2 \]

\[ = \frac{1}{N^2} \left( \sum_{i \neq j} \underbrace{E[x]^2}_{\mu^2} + \sum_i E[x^2] \right) - \mu^2 \]

To find \( E[x^2] \), consider

\[ V[x] = \sigma^2 = E[x^2] - E[x]^2 = E[x^2] - \mu^2 \;\Rightarrow\; E[x^2] = \sigma^2 + \mu^2 \]
Analytic variance of a Gaussian:
\[ \Rightarrow\; V[\bar{x}] = \frac{1}{N^2} \left( \sum_{i \neq j} \mu^2 + \sum_i (\sigma^2 + \mu^2) \right) - \mu^2 = \frac{1}{N^2} \left( (N^2 - N)\,\mu^2 + N(\sigma^2 + \mu^2) \right) - \mu^2 = \frac{\sigma^2}{N} \]

which verifies the result we got from calculating derivatives of the likelihood function:

\[ V[\hat{\theta}] = -\left( \left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}} \right)^{-1} \]

In practice, one usually doesn't calculate this analytically, but instead:
- calculates the derivatives numerically, or
- uses the Δ lnL or Δχ² method, described next.
Variance by Δ lnL

Back to our Taylor expansion of lnL:

\[ \ln L(\theta) = \ln L(\hat{\theta}) + \frac{1}{2!} \underbrace{\left. \frac{\partial^2 \ln L}{\partial \theta^2} \right|_{\hat{\theta}}}_{-1/\sigma_{\hat{\theta}}^2} (\theta - \hat{\theta})^2 + \cdots \]

Let \( \Delta \ln L(\theta) \equiv \ln L(\theta) - \ln L(\hat{\theta}) \), so that

\[ \Delta \ln L(\theta) \simeq -\frac{(\theta - \hat{\theta})^2}{2 \sigma_{\hat{\theta}}^2} \]

Evaluating at \( \theta \to \hat{\theta} \pm n\,\sigma_{\hat{\theta}} \):

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = -\frac{(\pm n\,\sigma_{\hat{\theta}})^2}{2 \sigma_{\hat{\theta}}^2} = -\frac{n^2}{2} \]
Variance by Δ lnL (continued)

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = -\frac{n^2}{2} \]

This is the most common definition of the 68% and 95% confidence intervals:

- 68% / \( 1\,\sigma_{\hat{\theta}} \): \( \Delta \ln L = -1/2 \)
- 95% / \( 2\,\sigma_{\hat{\theta}} \): \( \Delta \ln L = -2 \)
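A sketch of using Δ lnL = −1/2 in practice (illustrative Gaussian example, standard library only): walk away from the maximum until the log-likelihood has dropped by 1/2, and compare the interval half-width to the analytic σ/√N.

```python
import math
import random

def ln_like(mu, xs, sigma):
    # Constants dropped: only differences of ln L matter here.
    return sum(-(x - mu) ** 2 / (2 * sigma**2) for x in xs)

random.seed(3)
sigma, n = 2.0, 400
xs = [random.gauss(0.0, sigma) for _ in range(n)]
mu_hat = sum(xs) / n
ln_max = ln_like(mu_hat, xs, sigma)

# Walk outward from the maximum until ln L has dropped by 1/2 (68% CL edge).
step = 1e-4
hi = mu_hat
while ln_like(hi, xs, sigma) - ln_max > -0.5:
    hi += step
lo = mu_hat
while ln_like(lo, xs, sigma) - ln_max > -0.5:
    lo -= step

# For a Gaussian likelihood the half-width should be sigma / sqrt(N) = 0.1.
half_width = (hi - lo) / 2
assert abs(half_width - sigma / math.sqrt(n)) < 1e-3
```

For non-parabolic lnL the same scan gives (possibly asymmetric) intervals, which is exactly why the method is used in practice.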
Variance by Δχ²

Recall that in the case that the PDF is Gaussian, lnL is just the χ² statistic (times −1/2):

\[ \ln L = -\frac{\chi^2}{2}, \qquad \chi^2 = \sum_i \frac{(x_i - \theta)^2}{\sigma^2} \]

\[ \Delta \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = \ln L(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) - \ln L_{\max} = -\frac{n^2}{2} \]

\[ \Rightarrow\; -\frac{1}{2} \left( \chi^2(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) - \chi^2_{\min} \right) = -\frac{n^2}{2} \]

\[ \Delta \chi^2(\hat{\theta} \pm n\,\sigma_{\hat{\theta}}) = n^2 \]

- 68% / \( 1\,\sigma_{\hat{\theta}} \): \( \Delta \chi^2 = 1 \)
- 95% / \( 2\,\sigma_{\hat{\theta}} \): \( \Delta \chi^2 = 4 \) (exactly 95% corresponds to Δχ² = 3.84; 4 corresponds to 2σ)
Variance by Δχ²: thresholds

Thresholds Q_c on Δχ² giving coverage c for n fitted parameters:

    c       n=1    n=2    n=3
    0.683   1.00   2.30   3.53
    0.95    3.84   5.99   7.82
    0.99    6.63   9.21   11.3

A note about confidence contours/intervals: a 95% confidence contour doesn't mean that the true value of the parameter is in the contour with 95% probability. That would be a Bayesian probability (a probability of belief). It means that if the model is correct and we have properly estimated our errors, then if we were to repeat the experiment over and over again, each time creating a new contour, the contour would cover the true parameter in 95% of the experiments (a frequentist probability).
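These thresholds are quantiles of the χ² distribution with n degrees of freedom. For n = 1, 2, 3 the χ² CDF has closed forms in terms of erf and exp, so the table can be verified with the standard library alone (a sketch; `chi2_cdf` is a hand-rolled helper, not a library call):

```python
import math

def chi2_cdf(x, k):
    """Chi-square CDF for k = 1, 2, 3 degrees of freedom (closed forms)."""
    if k == 1:
        return math.erf(math.sqrt(x / 2))
    if k == 2:
        return 1 - math.exp(-x / 2)
    if k == 3:
        return (math.erf(math.sqrt(x / 2))
                - math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("k must be 1, 2, or 3")

# Reproduce the table of Delta-chi^2 thresholds Q_c for coverage c.
table = {
    0.683: {1: 1.00, 2: 2.30, 3: 3.53},
    0.95:  {1: 3.84, 2: 5.99, 3: 7.82},
    0.99:  {1: 6.63, 2: 9.21, 3: 11.3},
}
for c, row in table.items():
    for k, q in row.items():
        assert abs(chi2_cdf(q, k) - c) < 0.005
```

In practice one would use a statistics library's inverse CDF directly; the point here is only that the tabulated numbers are χ² quantiles.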
Multi-dimensional case
Example: measuring an efficiency & A/B testing
Efficiency

(Slide: Louise Heelan, "Calc Efficiency Uncertainties": Bayesian statistics, mode vs. mean)

Out of n trials I measure k "conversions". Estimate the conversion rate and its precision/confidence. Without a precision, we cannot know if observed changes are significant.
Efficiency: measuring an occupancy ~ measuring a coin's fairness
Binomial distribution:

\[ f(k; n, p) = \binom{n}{k} \, p^k \, (1 - p)^{n - k} \]

\[ \sigma_k^2 = n\,p\,(1 - p) \]

\[ \sigma_o^2 = \left( \frac{\sigma_k}{n} \right)^2 = \frac{p\,(1 - p)}{n} \]

\[ p \approx 0.01 \;\Rightarrow\; \sigma_o^2 = \frac{(0.01)(0.99)}{n} \approx \frac{0.01}{n} \;\Rightarrow\; \sigma_o = \frac{1}{10\sqrt{n}} \]

(Ryan D. Reece (Penn), TRT Low Threshold Optimization, 19 / 28)
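The σ_o = √(p(1−p)/n) formula and the 1/(10√n) rule of thumb for p = 0.01 can be checked directly; the few lines below also reproduce the precision table in this deck (`binomial_precision` is a hypothetical helper name, not from the slides):

```python
import math

def binomial_precision(p, n):
    """Standard error on an estimated rate p from n Bernoulli trials."""
    return math.sqrt(p * (1 - p) / n)

p = 0.01
# n -> expected sigma_o, matching the "events needed for some precision" table.
for n, sigma_expected in [(400, 0.0050), (1000, 0.0032), (2000, 0.0022),
                          (10_000, 0.0010), (1_000_000, 0.0001)]:
    sigma = binomial_precision(p, n)
    assert abs(sigma - sigma_expected) < 1e-4
    # For p = 0.01, sigma is within 1% of the rule of thumb 1 / (10 sqrt(n)).
    assert abs(sigma - 1 / (10 * math.sqrt(n))) / sigma < 0.01
```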
The MLE of the rate is \( \hat{\varepsilon} = \frac{k}{n} \).

Number of events needed for some precision, for p = 0.01:

    n          σ_o      σ_o / p
    400        0.0050   50%
    1,000      0.0032   32%
    2,000      0.0022   22%
    10,000     0.0010   10%
    10^6       0.0001   1%
A/B testing (example taken from https://www.blitzresults.com/en/ab-tests/)
Treat the data like a 1-dimensional, 4-bin histogram. Construct

\[ \chi^2 = \frac{(N'_A - N_A\,\varepsilon)^2}{N_A\,\varepsilon} + \frac{\left( (N_A - N'_A) - N_A (1 - \varepsilon) \right)^2}{N_A (1 - \varepsilon)} + (A \to B) \]

Assume the conversion rate is unchanged from A to B:

\[ \varepsilon = \frac{N'_A}{N_A} \simeq \frac{N'_B}{N_B} \simeq \frac{N'_A + N'_B}{N_A + N_B} \]
A/B testing

Plugging in the example's values gives χ² = 7.52.

Remember the Δχ² thresholds: 1σ (68%): 1; 2σ (95%): 3.84; 3σ (99%): 6.63.

Since 7.52 > 6.63, this is a significant improvement in the B model at > 99% CL.
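The 4-term χ² can be computed by hand or in a few lines. A sketch: `ab_chi2` is an illustrative helper, and the counts below are hypothetical (the slide's own example numbers, which give χ² = 7.52, come from the linked page and are not reproduced here).

```python
def ab_chi2(n_a, k_a, n_b, k_b):
    """Four-term Pearson chi^2 for k conversions out of n trials in groups
    A and B, under the null hypothesis of a common conversion rate."""
    eps = (k_a + k_b) / (n_a + n_b)   # pooled rate under H0
    chi2 = 0.0
    for n, k in [(n_a, k_a), (n_b, k_b)]:
        chi2 += (k - n * eps) ** 2 / (n * eps)                     # converted
        chi2 += ((n - k) - n * (1 - eps)) ** 2 / (n * (1 - eps))   # not converted
    return chi2

# Hypothetical counts: 5000 visitors per variant, 100 vs. 150 conversions.
chi2 = ab_chi2(5000, 100, 5000, 150)
# Compare against the 1-parameter thresholds: 3.84 (95% CL), 6.63 (99% CL).
assert chi2 > 6.63
```

With these hypothetical counts χ² ≈ 10.3, so the difference would be significant at > 99% CL.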
Hypothesis test
Take-aways

• MLEs are the best (asymptotically unbiased and 100% efficient).
• Don't just calculate the MLE; find its variance!
• Quantify significance with a confidence interval.
• Under many common assumptions, \( \ln L = -\chi^2/2 \), and one can calculate Δχ² to determine confidence intervals/contours.
• In the simplest case of measuring an efficiency, A/B testing amounts to a 4-term χ² that can be calculated by hand.
• If the Δχ² is large, the change is significant!
Backup slides
Efficiency: Results (Louise Heelan, "Calc Efficiency Uncertainties")

Other examples with numbers: different techniques for a top-trigger analysis (105200, r635, ≈ 30k events), with two examples of a particular bin (low and high statistics):

    Method    Numerator  Denominator  Mean (Mode)       Variance  Uncertainty σ
    Poisson   1          45           0.0222            0.00050   0.02246
    Binomial  1          45           0.0222            0.00048   0.02197
    Bayesian  1          45           0.04255 (0.0222)  0.00085   0.02913

    Method    Numerator  Denominator  Mean (Mode)       Variance  Uncertainty σ
    Poisson   100        106          0.9433            0.01729   0.13151
    Binomial  100        106          0.9433            0.00050   0.02244
    Bayesian  100        106          0.9352 (0.9433)   0.00056   0.02358
Hypothesis testing
• Null hypothesis, H0: the SM
• Alternative hypothesis, H1: some new physics
• Type-I error: false positive rate (α)
• Type-II error: false negative rate (β)
• Power: 1 − β
• Want to maximize power for a fixed false positive rate
• Particle physics has a tradition of claiming discovery at 5σ ⇒ p0 = 2.9×10⁻⁷ = 1 in 3.5 million, and presents exclusions with p0 = 5% (95% CL "coverage").
• Neyman-Pearson lemma (1933): the most powerful test for fixed α is the likelihood ratio:
The Neyman-Pearson Lemma (slide: Kyle Cranmer, CERN Academic Training, Feb 2-5, 2009)

The region W that minimizes the probability of wrongly accepting H0 is just a contour of the likelihood ratio:

\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} > k_\alpha \]

This is the goal! The problem is we don't have access to L(x|H0) and L(x|H1) directly.

Prediction via Monte Carlo simulation (slide: Kyle Cranmer, PhyStat2005, Oxford, Sep 13, 2005): the enormous detectors are still being constructed, but we have detailed simulations of the detectors' response. The advancements in theoretical predictions, detector simulation, tracking, calorimetry, triggering, and computing set the bar high for equivalent advances in our statistical treatment of the data.
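For two simple hypotheses the lemma can be demonstrated by simulation. In this sketch (illustrative numbers, standard library only), take H0: μ = 0 vs. H1: μ = 1 for a single Gaussian observation; cutting on the likelihood ratio is then equivalent to cutting on x itself, giving false positive rate α ≈ 5% and power 1 − β ≈ 26%.

```python
import math
import random

def like(x, mu, sigma=1.0):
    """Gaussian likelihood of one observation x given mean mu."""
    return (math.exp(-(x - mu) ** 2 / (2 * sigma**2))
            / math.sqrt(2 * math.pi * sigma**2))

# Simple-vs-simple test: H0: mu = 0, H1: mu = 1, one observation.
# Reject H0 when L(x|H0)/L(x|H1) is small; here that ratio is exp(0.5 - x),
# so the likelihood-ratio cut is the same as the cut x > 1.6449.
cut = 1.6449  # one-sided 5% point of a standard Gaussian

def reject(x):
    return like(x, 0.0) / like(x, 1.0) < math.exp(0.5 - cut)

random.seed(4)
trials = 200_000
alpha = sum(reject(random.gauss(0.0, 1.0)) for _ in range(trials)) / trials
power = sum(reject(random.gauss(1.0, 1.0)) for _ in range(trials)) / trials

assert abs(alpha - 0.05) < 0.005    # false positive rate ~ 5%
assert abs(power - 0.2595) < 0.01   # power = P(x > 1.6449 | mu = 1)
```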
Power & Significance
Variance by Δ lnL: the non-Gaussian case

What if \( L(\theta) \) is not Gaussian, i.e. \( \ln L(\theta) \) is not parabolic?

Likelihood functions have an invariance property, such that if \( g(\theta) \) is a monotonic function, then the maximum likelihood estimate of \( g(\theta) \) is \( g(\hat{\theta}) \). In principle, one can find a change-of-variables function \( g(\theta) \) for which \( \ln L(g(\theta)) \) is parabolic as a function of \( g(\theta) \). Therefore, using the invariance of the likelihood function, one can make inferences about a parameter of a non-Gaussian likelihood function without actually finding such a transformation [James p. 234].
Systematics

[Figure from: http://philosophy-in-figures.tumblr.com/]