Seminar Applied Mathematical Statistics
Contiguity, Local Asymptotic Normality, Likelihood Ratio Tests
Johannes Musebeck
TU Kaiserslautern
13.02.2015
Johannes Musebeck (TU Kaiserslautern) Seminar Applied Mathematical Statistics 13.02.2015 1 / 41
Contents
1 Chapter 6: Contiguity
2 Chapter 7: Local Asymptotic Normality, Maximum Likelihood
3 Chapter 16: Likelihood Ratio Tests
Contiguity

Contiguity
Abbreviation for asymptotic absolute continuity.
Technique to obtain the limit distribution of a sequence of statistics.

Recall
Let P and Q be two measures on the measurable space (Ω, A).
Q is absolutely continuous w.r.t. P if P(A) = 0 implies Q(A) = 0 for all A ∈ A. This is denoted by Q ≪ P.
P and Q are orthogonal if Ω can be partitioned as Ω = Ω_P ∪ Ω_Q with Ω_P ∩ Ω_Q = ∅ such that P(Ω_Q) = Q(Ω_P) = 0. This is denoted by P ⊥ Q.
Contiguity

Contiguity
Suppose that the measures P and Q have densities p and q w.r.t. a measure µ.

Definition (Lebesgue decomposition)
The measure Q can be written as Q = Q_a + Q_⊥, where Q_a(A) = Q(A ∩ {p > 0}) is called the absolutely continuous part and Q_⊥(A) = Q(A ∩ {p = 0}) is called the orthogonal part of Q w.r.t. P.

Lemma
Let P and Q be probability measures with densities p and q w.r.t. µ. Then we have:
(i) Q = Q_a + Q_⊥, where Q_a ≪ P and Q_⊥ ⊥ P.
(ii) Q_a(A) = ∫_A (q/p) dP for every A ∈ A.
Contiguity

Contiguity
Likelihood Ratio
The function q/p is a density of Q_a with respect to P. It is denoted by dQ/dP.
The random variable dQ/dP : Ω → [0, ∞) is called the Radon–Nikodym density or likelihood ratio.

Note that for any P and Q and any nonnegative measurable function f we have

    ∫ f dQ ≥ ∫_{p>0} f q dµ = ∫_{p>0} f (q/p) p dµ = ∫ f (dQ/dP) dP.
Contiguity

Contiguity
Consider measurable spaces (Ω_n, A_n) equipped with probability measures P_n, Q_n and random vectors X_n : Ω_n → ℝ^k.

Goal
Derive a Q_n-limit law of X_n from a P_n-limit law.

Non-asymptotic situation
Let Q be absolutely continuous w.r.t. P and X : Ω → ℝ^k. Then

    E_Q[f(X)] = E_P[f(X) dQ/dP].

In the asymptotic case we need Q_n to be asymptotically absolutely continuous with respect to P_n.
Contiguity

Contiguity
Definition
The sequence Q_n is called contiguous w.r.t. the sequence P_n if

    P_n(A_n) → 0 implies Q_n(A_n) → 0

for every sequence of measurable sets A_n ∈ A_n. We write Q_n ◁ P_n.
The sequences are mutually contiguous if both P_n ◁ Q_n and Q_n ◁ P_n. This is denoted by P_n ◁▷ Q_n.
Contiguity

Characterization of contiguity
Lemma (Le Cam's first lemma)
Let P_n and Q_n be probability measures on (Ω_n, A_n). Then the following are equivalent:
(i) Q_n ◁ P_n.
(ii) If dP_n/dQ_n ⇒ U under Q_n along a subsequence, then P(U > 0) = 1.
(iii) If dQ_n/dP_n ⇒ V under P_n along a subsequence, then E[V] = 1.
(iv) For any statistics T_n : Ω_n → ℝ^k: if T_n → 0 in P_n-probability, then T_n → 0 in Q_n-probability.
Contiguity

Example (Asymptotic log normality)
Let P_n and Q_n be probability measures such that

    dP_n/dQ_n ⇒ e^{N(µ,σ²)} under Q_n.

Then Q_n ◁ P_n by Le Cam's first lemma, because P(e^{N(µ,σ²)} = 0) = 0.
Furthermore, P_n ◁ Q_n if and only if E[e^{N(µ,σ²)}] = e^{µ + σ²/2} = 1.
(Remember the moment generating function M_X(t) := E[e^{tX}]: if X ∼ N(µ, σ²), then M_X(t) = e^{µt + σ²t²/2}.)
⇒ Q_n ◁▷ P_n if and only if µ = −σ²/2.
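The condition E[e^{N(µ,σ²)}] = e^{µ + σ²/2} = 1 from this example can be checked by simulation. A minimal Monte Carlo sketch in Python; the choice σ = 0.5 and the sample size are illustrative, not from the slides:

```python
import math
import random

random.seed(0)

def mean_exp_normal(mu, sigma, n=200_000):
    """Monte Carlo estimate of E[exp(Z)] for Z ~ N(mu, sigma^2)."""
    return sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n

sigma = 0.5
# Choosing mu = -sigma^2/2 makes E[e^{N(mu, sigma^2)}] = e^{mu + sigma^2/2} = 1,
# which is exactly the condition for mutual contiguity in the example above.
est = mean_exp_normal(-sigma ** 2 / 2, sigma)
exact = math.exp(-sigma ** 2 / 2 + sigma ** 2 / 2)  # = 1
print(est, exact)
```

For any other µ the estimated mean drifts away from 1, matching the "if and only if" in the example.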
Contiguity

Le Cam's third lemma
Central result
Obtaining a Q_n-limit law from a P_n-limit law.

Theorem
Let P_n and Q_n be sequences of probability measures on measurable spaces (Ω_n, A_n) and let X_n : Ω_n → ℝ^k be a sequence of random vectors. Suppose that Q_n ◁ P_n and

    (X_n, dQ_n/dP_n) ⇒ (X, V) under P_n.

Then L(B) = E[1_B(X) V] defines a probability measure and X_n ⇒ L under Q_n.

Proof.
On the board.
Contiguity

Le Cam's third lemma
Example (Le Cam's third lemma)
If we have

    (X_n, log dQ_n/dP_n) ⇒ N_{k+1}( (µ, −σ²/2)ᵀ, [Σ  τ; τᵀ  σ²] ) under P_n,

then

    X_n ⇒ N_k(µ + τ, Σ) under Q_n.

Special case of the previous theorem
Assume that we are given (X_n, log dQ_n/dP_n) ⇒ (X, W) under P_n, where (X, W) has the (k+1)-dimensional normal distribution above. Then, by the continuous mapping theorem,

    (X_n, dQ_n/dP_n) ⇒ (X, e^W) under P_n.
Contiguity

Le Cam's third lemma
Example continued
Notice that P_n ◁▷ Q_n because W ∼ N(−σ²/2, σ²).
Using the theorem it follows that X_n ⇒ L under Q_n with L(B) = E[1_B(X) e^W]. The characteristic function of L is given by

    L̂(t) = ∫ e^{i tᵀx} L(dx) = E[e^{i tᵀX} e^W] = E[exp( i (t, −i)ᵀ (X, W) )],

which is the characteristic function of our (k+1)-dimensional normal vector (X, W) evaluated at (t, −i)ᵀ.
Contiguity

Le Cam's third lemma
Reminder
The characteristic function of N_k(µ, Σ) is t ↦ exp(i tᵀµ − ½ tᵀΣt).

Example continued

    N̂_{k+1}( (µ, −σ²/2)ᵀ, [Σ  τ; τᵀ  σ²] )(t, −i)
      = exp( i tᵀµ − ½ σ² − ½ (tᵀ, −i) [Σ  τ; τᵀ  σ²] (t, −i)ᵀ )
      = exp( i tᵀ(µ + τ) − ½ tᵀΣt )
      = N̂_k(µ + τ, Σ)(t).

Since a distribution is uniquely determined by its characteristic function, the claim follows.
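The mean shift by τ predicted by Le Cam's third lemma can be seen numerically in the simplest i.i.d. setting. A sketch under illustrative assumptions (the N(θ, 1) model and all constants are my choices, not from the slides): here X_n = √n(X̄ − θ) has limit N(0, 1) under P_n, and the covariance of X_n with the log likelihood ratio is τ = h, so the lemma predicts the limit N(h, 1) under Q_n.

```python
import random
import statistics

random.seed(1)

# Illustrative model: X_1, ..., X_n i.i.d. N(theta, 1); P_n uses theta,
# Q_n uses theta + h/sqrt(n).  The statistic X_n := sqrt(n)(mean - theta)
# satisfies X_n => N(0, 1) under P_n; Le Cam's third lemma predicts
# X_n => N(tau, 1) under Q_n with tau = h.
theta, h, n, reps = 0.0, 2.0, 400, 5000

def statistic_under(shift):
    xs = [random.gauss(theta + shift, 1.0) for _ in range(n)]
    return n ** 0.5 * (sum(xs) / n - theta)

draws = [statistic_under(h / n ** 0.5) for _ in range(reps)]
m_hat = statistics.mean(draws)   # close to tau = h = 2
s_hat = statistics.stdev(draws)  # close to 1
print(m_hat, s_hat)
```

The variance is unchanged and only the mean moves by τ, exactly as in the example above.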
Local Asymptotic Normality

Local Asymptotic Normality
A sequence of statistical models is locally asymptotically normal if, asymptotically, their likelihood ratio processes are similar to those of a normal model.

Model/Experiment
Consider an i.i.d. sample X_1, ..., X_n from a distribution P_θ on a measurable space (X, A), where θ lies in an open subset Θ of ℝ^k. Then X = (X_1, ..., X_n)ᵀ is a single observation from P_θⁿ in the sample space (Xⁿ, Aⁿ).
→ The experiment can be completely described by (P_θⁿ : θ ∈ Θ).

Goal
Approximation of this statistical experiment by a Gaussian experiment after a suitable reparametrization.
Local Asymptotic Normality

Reparametrization
Define the local parameter h = √n (θ − θ_0) with fixed θ_0.
Rewrite P_θⁿ as Pⁿ_{θ_0 + h/√n} and consider experiments with parameter h.
We will see that (Pⁿ_{θ_0 + h/√n} : h ∈ ℝ^k) and (N(h, I_{θ_0}⁻¹) : h ∈ ℝ^k) have similar statistical properties for large n.
Local parameter set: H_n = √n (Θ − θ_0).
We take H_n equal to ℝ^k, since:
1 Θ = ℝ^k ⇒ H_n = ℝ^k.
2 Θ ⊊ ℝ^k ⇒ H_n ≠ ℝ^k, but if θ_0 is an inner point of Θ, then 0 is an inner point of the set (Θ − θ_0). ⇒ H_n converges to ℝ^k for n → ∞.
Local Asymptotic Normality

Score function
Let p_θ be a density of P_θ and assume that the log likelihood ℓ_θ(x) = log p_θ(x) is twice differentiable with respect to θ.
The first derivative ℓ̇_θ(x) = (∂/∂θ) log p_θ(x) is called the score function.

Moments of the score function

    E_θ[ℓ̇_θ] = ∫ ℓ̇_θ p_θ dµ = ∫ (ṗ_θ/p_θ) p_θ dµ = ∫ ṗ_θ dµ = (∂/∂θ) ∫ p_θ dµ = (∂/∂θ) 1 = 0.

    E_θ[ℓ̈_θ] = ∫ ℓ̈_θ p_θ dµ = ∫ ( p̈_θ/p_θ − ṗ_θ ṗ_θᵀ / p_θ² ) p_θ dµ
              = ∫ p̈_θ dµ − ∫ ℓ̇_θ ℓ̇_θᵀ p_θ dµ = (∂²/∂θ²) ∫ p_θ dµ − E_θ[ℓ̇_θ ℓ̇_θᵀ] = −I_θ.
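The two moment identities can be illustrated numerically. For the N(θ, 1) location model the score is ℓ̇_θ(x) = x − θ, so E_θ[ℓ̇_θ] = 0 and Var_θ[ℓ̇_θ] = I_θ = 1; a quick Monte Carlo sketch (model and sample size are illustrative choices):

```python
import random

random.seed(2)

# Score of the N(theta, 1) model: d/dtheta log p_theta(x) = x - theta.
# The moment identities above say E[score] = 0 and Var[score] = I_theta = 1.
theta = 1.5
scores = [random.gauss(theta, 1.0) - theta for _ in range(100_000)]

mean_score = sum(scores) / len(scores)                                # ~ 0
fisher_info = sum(s * s for s in scores) / len(scores) - mean_score ** 2  # ~ 1
print(mean_score, fisher_info)
```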
Local Asymptotic Normality

Expanding the Likelihood
Assume that θ is one-dimensional. A Taylor expansion of the log likelihood ratio around θ yields

    log (p_{θ+h}/p_θ)(x) = log p_{θ+h}(x) − log p_θ(x) = ℓ_{θ+h}(x) − ℓ_θ(x)
                         = h ℓ̇_θ(x) + ½ h² ℓ̈_θ(x) + o_x(h²).

From this it follows that

    log (dPⁿ_{θ+h/√n} / dP_θⁿ)(X) = log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i) = ∑_{i=1}^n log (p_{θ+h/√n}/p_θ)(X_i)
                                  = (h/√n) ∑_{i=1}^n ℓ̇_θ(X_i) + (h²/2n) ∑_{i=1}^n ℓ̈_θ(X_i) + Rem_n.
Local Asymptotic Normality

Expanding the Likelihood
Using the Central Limit Theorem:

    (1/√n) ∑_{i=1}^n ( ℓ̇_θ(X_i) − E[ℓ̇_θ(X_i)] ) ⇒ N(0, Cov[ℓ̇_θ(X_1)]).

Because E[ℓ̇_θ(X_i)] = 0 and Cov[ℓ̇_θ(X_1)] = E[ℓ̇_θ(X_1)²] = I_θ, we have

    (1/√n) ∑_{i=1}^n ℓ̇_θ(X_i) ⇒ N(0, I_θ).

By the Law of Large Numbers (convergence in probability):

    (1/n) ∑_{i=1}^n ℓ̈_θ(X_i) → E[ℓ̈_θ(X_1)] = −I_θ.
Local Asymptotic Normality

Expanding the Likelihood
Asymptotically we get

    log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i) = (h/√n) ∑_{i=1}^n ℓ̇_θ(X_i) + (h²/2n) ∑_{i=1}^n ℓ̈_θ(X_i) + Rem_n
                                        ⇒ h N(0, I_θ) − ½ h² I_θ = N(−½ h² I_θ, h² I_θ) under P_θ

for every h.

Conclusion
Expansion of the likelihood process in a neighborhood of θ → local asymptotic normality.
We will see that the likelihood process of a normal experiment has a similar form.
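The limit N(−½h²I_θ, h²I_θ) can be observed directly in a simulation. A sketch for the N(θ, 1) model, where I_θ = 1 and the expansion is exact (model and constants are illustrative choices, not from the slides):

```python
import random

random.seed(3)

# Model N(theta, 1), I_theta = 1: LAN predicts
#   log prod_i p_{theta + h/sqrt(n)}/p_theta (X_i)  =>  N(-h^2/2, h^2)  under P_theta.
# For this normal model the expansion is exact:
#   log LR = (h/sqrt(n)) * sum_i (X_i - theta) - h^2/2.
theta, h, n, reps = 0.0, 1.5, 500, 4000

def log_lr():
    s = h / n ** 0.5
    return sum(s * (random.gauss(theta, 1.0) - theta) - s * s / 2 for _ in range(n))

vals = [log_lr() for _ in range(reps)]
m = sum(vals) / reps
v = sum((x - m) ** 2 for x in vals) / (reps - 1)
print(m, v)  # ~ -h^2/2 = -1.125 and ~ h^2 = 2.25
```

Note the mean is minus half the variance, the relation that characterizes mutual contiguity in Le Cam's first lemma.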
Local Asymptotic Normality

Definition
The mapping θ ↦ √p_θ is called differentiable in quadratic mean at θ if there exists a vector of measurable functions ℓ̇_θ = (ℓ̇_{θ,1}, ..., ℓ̇_{θ,k})ᵀ such that

    ∫ [ √p_{θ+h} − √p_θ − ½ hᵀ ℓ̇_θ √p_θ ]² dµ = o(‖h‖²) for h → 0.

In this case, the model (P_θ : θ ∈ Θ) is called differentiable in quadratic mean at θ.

Lemma
For every θ in an open subset of ℝ^k let p_θ be the density of P_θ w.r.t. µ. If the map θ ↦ √p_θ(x) is continuously differentiable for every x, and the elements of the matrix

    I_θ = ∫ (ṗ_θ/p_θ)(ṗ_θᵀ/p_θ) p_θ dµ

are well defined and continuous in θ, then the map θ ↦ √p_θ is differentiable in quadratic mean with ℓ̇_θ = ṗ_θ/p_θ.
Local Asymptotic Normality

Local Asymptotic Normality
Under the condition of differentiability in quadratic mean we can establish local asymptotic normality:

Theorem
Suppose that Θ is an open subset of ℝ^k and that the model (P_θ : θ ∈ Θ) is differentiable in quadratic mean at θ. Then E_θ[ℓ̇_θ] = 0 and the Fisher information matrix I_θ = E_θ[ℓ̇_θ ℓ̇_θᵀ] exists. Additionally, for every converging sequence h_n → h,

    log ∏_{i=1}^n (p_{θ+h_n/√n}/p_θ)(X_i) = (1/√n) ∑_{i=1}^n hᵀ ℓ̇_θ(X_i) − ½ hᵀ I_θ h + o_{P_θ}(1)
                                          ⇒ N(−½ hᵀ I_θ h, hᵀ I_θ h) under P_θ.
Local Asymptotic Normality

Examples
Location models
Let f be a positive, continuously differentiable density w.r.t. µ. Consider p_θ(x) = f(x − θ) and the location model (f(x − θ) : θ ∈ ℝ). For the Fisher information we get

    I_θ = E_θ[ ( (∂/∂θ) log f(x − θ) )² ] = ∫ ( −(1/f(x − θ)) f′(x − θ) )² f(x − θ) dx
        = ∫ (f′/f)²(x) f(x) dx,

which does not depend on θ and is in particular continuous in θ. By the preceding lemma we obtain differentiability in quadratic mean with ℓ̇_θ(x) = −(f′/f)(x − θ).
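The formula I = ∫ (f′/f)² f dx is easy to evaluate numerically. A quadrature sketch for two standard location families (the grid limits and step size are ad-hoc choices; the logistic value I = 1/3 is a well-known closed form):

```python
import math

# Fisher information of a location model: I = integral of (f'/f)^2 * f dx,
# independent of theta, approximated by trapezoidal quadrature.
def fisher_location(f, fprime, lo=-20.0, hi=20.0, m=40_000):
    step = (hi - lo) / m
    total = 0.0
    for i in range(m + 1):
        x = lo + i * step
        fx = f(x)
        w = 0.5 if i in (0, m) else 1.0  # trapezoidal end-point weights
        total += w * (fprime(x) / fx) ** 2 * fx
    return total * step

# Standard normal: (f'/f)(x) = -x, so I = E[X^2] = 1.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
phi_p = lambda x: -x * phi(x)
I_normal = fisher_location(phi, phi_p)

# Standard logistic: f = F(1 - F) with F(x) = 1/(1 + e^{-x}); known I = 1/3.
F = lambda x: 1.0 / (1.0 + math.exp(-x))
logis = lambda x: F(x) * (1.0 - F(x))
logis_p = lambda x: logis(x) * (1.0 - 2.0 * F(x))
I_logistic = fisher_location(logis, logis_p)

print(I_normal, I_logistic)  # ~ 1.0 and ~ 0.3333
```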
Local Asymptotic Normality

Examples
Uniform distribution
If a family of distributions is differentiable in quadratic mean, we have

    ∫ [ √p_{θ+h} − √p_θ − ½ hᵀ ℓ̇_θ √p_θ ]² dµ = o(‖h‖²) for h → 0,

and after restriction of the integral to the set {p_θ = 0},

    P_{θ+h}(p_θ = 0) = ∫_{p_θ=0} p_{θ+h} dµ = o(‖h‖²).

This is not true for the family (U([0, θ]) : θ ∈ Θ), because for h ≥ 0,

    P_{θ+h}(p_θ = 0) = ∫_{[0,θ]ᶜ} (1/(θ + h)) 1_{[0,θ+h]}(x) dx = h/(θ + h) = O(h).

⇒ The uniform distribution is nowhere differentiable in quadratic mean.
Local Asymptotic Normality

Convergence to a normal experiment
Limit distribution
Now we consider the limit distribution N(h, I_θ⁻¹). The log likelihood ratio process is given by

    log (dN(h, I_θ⁻¹) / dN(0, I_θ⁻¹))(X)
      = log [ (2π)^{−k/2} (det I_θ⁻¹)^{−1/2} exp(−½ (X − h)ᵀ I_θ (X − h)) ]
        − log [ (2π)^{−k/2} (det I_θ⁻¹)^{−1/2} exp(−½ Xᵀ I_θ X) ]
      = −½ (X − h)ᵀ I_θ (X − h) + ½ Xᵀ I_θ X
      = hᵀ I_θ X − ½ hᵀ I_θ h.

The right-hand side looks similar to the Taylor expansion of the log likelihood ratio

    log (dPⁿ_{θ+h/√n} / dP_θⁿ)(X) = log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i).
Local Asymptotic Normality

Convergence to a normal experiment
Why study the local approximation?
Let us consider limit distributions of a sequence of statistics T_n = T_n(X_1, ..., X_n) in the experiment (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) for a fixed θ. If we have convergence in distribution

    T_n ⇒ L_{θ,h} under Pⁿ_{θ+h/√n} for every h,

then the distributions (L_{θ,h} : h ∈ ℝ^k) have to be the distributions of a statistic T in the normal experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) → Theorem below.

Conclusion
Every weakly converging sequence of statistics is matched by a statistic in the limit experiment. → Application: measure the quality of a statistic.
Local Asymptotic Normality

Convergence to a normal experiment
Statistical interpretation: Look at measures of quality of a statistic.
If T_n is a test statistic: power function h ↦ P_h(T_n > c).
If T_n is an estimator of h: mean squared error h ↦ E_h[(T_n − h)²].

Observation
Measures of quality only depend on the distribution of the statistic T_n.
⇒ After approximation of the law of T_n by the law of a statistic T, the asymptotic quality of T_n is the same as the quality of T.
Local Asymptotic Normality

Technical complication: Need randomized statistics
Definition
A randomized statistic T based on the observation X is defined as a measurable map T = T(X, U) that depends on X but may additionally depend on an independent uniformly distributed random variable U ∼ U([0, 1]).

Theorem
Let the experiment (P_θ : θ ∈ Θ) be differentiable in quadratic mean at θ with invertible Fisher information matrix I_θ. Let T_n be a sequence of statistics in the experiment (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) such that T_n converges in distribution under every h. Then there exists a randomized statistic T in the experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) such that T_n ⇒ T under Pⁿ_{θ+h/√n} for every h.

Proof.
On the board.
Local Asymptotic Normality

Maximum Likelihood
The ML-estimator for h in the experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) is ĥ = X (which is normally distributed).
We expect: ML-estimators ĥ_n in (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) should converge in distribution to X.
Note: The local parameter h = 0 corresponds to the value θ of the original parameter (remember: H_n = √n (Θ − θ)).
→ We expect that ĥ_n = √n (θ̂_n − θ) ⇒ N(0, I_θ⁻¹) under P_θⁿ and therefore

    I_θ^{1/2} √n (θ̂_n − θ) ⇒ N(0, Id) under P_θⁿ.

Compare to Theorem 5.39
We have shown that result under the assumption of differentiability in quadratic mean, a Lipschitz condition on log p_θ(x), and consistency of θ̂_n.
Restriction: θ had to be an inner point of Θ.
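The expected limit √n(θ̂_n − θ) ⇒ N(0, I_θ⁻¹) can be illustrated with a one-dimensional model where the MLE has a closed form. A sketch for the exponential model (my choice of example; all constants are illustrative):

```python
import random

random.seed(5)

# Exponential(theta) model, density theta * exp(-theta * x): the MLE is
# 1 / sample-mean and I_theta = 1/theta^2, so we expect
#   sqrt(n) * (mle - theta)  =>  N(0, theta^2)  under P_theta.
theta, n, reps = 2.0, 1000, 3000

def scaled_error():
    xs = [random.expovariate(theta) for _ in range(n)]
    mle = n / sum(xs)
    return n ** 0.5 * (mle - theta)

errs = [scaled_error() for _ in range(reps)]
m = sum(errs) / reps
s = (sum((e - m) ** 2 for e in errs) / (reps - 1)) ** 0.5
print(m, s)  # mean ~ 0, standard deviation ~ theta = 2 = sqrt(I_theta^{-1})
```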
Local Asymptotic Normality

Let Θ ⊂ ℝ^k be arbitrary and H_n = √n (Θ − θ) the local parameter space. The ML-estimator ĥ_n maximizes the random function

    h ↦ log (dPⁿ_{θ+h/√n} / dP_θⁿ)

over H_n. If (P_θ : θ ∈ Θ) is differentiable in quadratic mean, these processes converge in distribution to the process

    h ↦ log (dN(h, I_θ⁻¹) / dN(0, I_θ⁻¹))(X) = −½ (X − h)ᵀ I_θ (X − h) + ½ Xᵀ I_θ X.

If H_n converges to a set H, we expect that ĥ_n converges to the maximizer ĥ of the latter process over H. This means ĥ minimizes d(X, h) over h ∈ H, where the squared distance is defined as d(x, y) = (x − y)ᵀ I_θ (x − y).
⇒ ĥ is the projection of X onto H with respect to d. If H = ℝ^k, this projection reduces to ĥ = X.
Local Asymptotic Normality

Theorem
Suppose that the experiment (P_θ : θ ∈ Θ) is differentiable in quadratic mean at θ_0 with nonsingular Fisher information matrix I_{θ_0}. Assume that for every θ_1 and θ_2 in a neighborhood of θ_0 and a measurable function ℓ̇ with E_{θ_0}[ℓ̇²] < ∞,

    |log p_{θ_1}(x) − log p_{θ_2}(x)| ≤ ℓ̇(x) ‖θ_1 − θ_2‖.

If the sequence of maximum likelihood estimators θ̂_n is consistent and the sets H_n = √n (Θ − θ_0) converge to a nonempty, convex set H, then the sequence I_{θ_0}^{1/2} √n (θ̂_n − θ_0) converges under θ_0 in distribution to the projection of a standard normal vector onto the set I_{θ_0}^{1/2} H.
Local Asymptotic Normality

Limit Distributions under Alternatives
Under local asymptotic normality,

    log (dPⁿ_{θ+h/√n} / dP_θⁿ) ⇒ N(−½ hᵀ I_θ h, hᵀ I_θ h) under P_θⁿ.

Therefore Pⁿ_{θ+h/√n} and P_θⁿ are mutually contiguous.

Aim
Obtain limit distributions of statistics under the parameters θ + h/√n from the limit behaviour under θ by using Le Cam's third lemma.

Suppose that a sequence of statistics T_n can be written as

    √n (T_n − µ_θ) = (1/√n) ∑_{i=1}^n ψ_θ(X_i) + o_{P_θ}(1).
Local Asymptotic Normality

Limit Distributions under Alternatives
By the CLT: If E[ψ_θ] = 0 and E[ψ_θ ψ_θᵀ] < ∞, then (1/√n) ∑_{i=1}^n (ψ_θ(X_i), hᵀ ℓ̇_θ(X_i)) is asymptotically multivariate normal under θ.
With Slutsky's Lemma:

    ( √n (T_n − µ_θ), log (dPⁿ_{θ+h/√n} / dP_θⁿ) )
      ⇒ N( (0, −½ hᵀ I_θ h)ᵀ, [ E_θ[ψ_θ ψ_θᵀ]  E_θ[ψ_θ hᵀ ℓ̇_θ]; E_θ[ψ_θᵀ hᵀ ℓ̇_θ]  hᵀ I_θ h ] ) under P_θⁿ.

By Le Cam's third lemma:

    √n (T_n − µ_θ) ⇒ N( E_θ[ψ_θ hᵀ ℓ̇_θ], E_θ[ψ_θ ψ_θᵀ] ) under Pⁿ_{θ+h/√n}.
Likelihood Ratio Tests

Likelihood Ratio Tests
Task
Derive the asymptotic distribution of the likelihood ratio statistic and investigate its asymptotic quality.

Consider an i.i.d. sample X_1, ..., X_n from a distribution P_θ with density p_θ. We want to test

    H_0 : θ ∈ Θ_0 against H_1 : θ ∈ Θ_1.

Neyman–Pearson test
If Θ_0 = {θ_0} and Θ_1 = {θ_1}, we know the Neyman–Pearson test using the statistic

    log ( L(θ_1|X) / L(θ_0|X) ) = log ( ∏_{i=1}^n p_{θ_1}(X_i) / ∏_{i=1}^n p_{θ_0}(X_i) ).

This is the most powerful test.
Likelihood Ratio Tests

Likelihood Ratio Tests
Extension: We replace single points by the suprema over the hypotheses,

    Λ_n = log ( sup_{θ∈Θ_1} ∏_{i=1}^n p_θ(X_i) / sup_{θ∈Θ_0} ∏_{i=1}^n p_θ(X_i) ).

H_0 is rejected for large values of Λ_n. In the following we consider the alternative statistic

    Λ_n = 2 log ( sup_{θ∈Θ} ∏_{i=1}^n p_θ(X_i) / sup_{θ∈Θ_0} ∏_{i=1}^n p_θ(X_i) ),

where Θ = Θ_0 ∪ Θ_1.

Goal
Study distributional properties of Λ_n.
Likelihood Ratio Tests

Example
Multinomial distribution
We consider a multinomially distributed random vector N = (N_1, ..., N_k) with parameters n and p = (p_1, ..., p_k). The ML-estimator for p_i is known to be p̂_i = N_i/n. The log likelihood ratio for testing H_0 : p ∈ P_0 against H_1 : p ∉ P_0 is given by

    Λ_n = 2 log [ (n choose N_1, ..., N_k) (N_1/n)^{N_1} ⋯ (N_k/n)^{N_k}
                  / sup_{p∈P_0} (n choose N_1, ..., N_k) p_1^{N_1} ⋯ p_k^{N_k} ]
        = 2 inf_{p∈P_0} ∑_{i=1}^k N_i log ( N_i / (n p_i) ).

From the general result of this chapter it will follow that the statistic Λ_n is asymptotically χ²_{k−1}-distributed.
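The χ² limit can be checked by simulation. A sketch for a simple null P_0 = {p_0}, so the infimum in the formula is attained at p_0 (the fair four-cell null and the sample sizes are illustrative choices, not from the slides):

```python
import math
import random

random.seed(4)

# Multinomial LR statistic against a simple null P_0 = {p0}:
#   Lambda_n = 2 * sum_i N_i * log(N_i / (n * p0_i)),
# asymptotically chi^2 with k - 1 degrees of freedom under H_0.
p0 = [0.25, 0.25, 0.25, 0.25]
k, n, reps = len(p0), 500, 3000

def lr_stat():
    counts = [0] * k
    for _ in range(n):  # draw one multinomial sample by inverse CDF
        u, acc = random.random(), 0.0
        for i, p in enumerate(p0):
            acc += p
            if u < acc:
                counts[i] += 1
                break
    return 2.0 * sum(c * math.log(c / (n * p)) for c, p in zip(counts, p0) if c > 0)

mean_stat = sum(lr_stat() for _ in range(reps)) / reps
print(mean_stat)  # ~ k - 1 = 3, the mean of chi^2_{k-1}
```

The simulated mean matches the mean k − 1 of the χ²_{k−1} limit; a histogram of the statistic would match the whole density.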
Likelihood Ratio Tests
Application
Testing goodness-of-fit
We want to test whether the distribution of an i.i.d. sample X1, ..., Xn with values in 𝒳 belongs to the parametric model (Pθ : θ ∈ Θ). Let 𝒳1, ..., 𝒳k be a partition of the sample space and N1, ..., Nk the numbers of observations falling into the sets of the partition. Then N = (N1, ..., Nk) is multinomially distributed with parameters n and (p1, ..., pk). The original test can be formulated as testing

H0 : (p1, ..., pk) = (Pθ(𝒳1), ..., Pθ(𝒳k)) for some θ ∈ Θ

against

H1 : (p1, ..., pk) ≠ (Pθ(𝒳1), ..., Pθ(𝒳k)) for every θ ∈ Θ.
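The reduction of the original sample to a multinomial vector of cell counts can be sketched as follows (a hypothetical numpy example; the partition, sample size, and model N(θ, 1) with θ = 0 are illustrative assumptions):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)   # i.i.d. sample, model N(θ, 1), θ = 0

# Partition the sample space R into k = 4 cells X_1, ..., X_4
edges = np.array([-np.inf, -0.5, 0.0, 0.5, np.inf])
N, _ = np.histogram(x, bins=edges)   # cell counts (N_1, ..., N_k)

# Under H0 the cell probabilities are p_i = P_θ(X_i); here for θ = 0
cdf = NormalDist().cdf
p = np.diff([0.0] + [cdf(e) for e in edges[1:-1]] + [1.0])

print(N, p)  # N is multinomial with parameters n = 500 and p under H0
```

The goodness-of-fit test then proceeds on (N, p) exactly as in the multinomial example above.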
We want to use local asymptotic normality to derive the asymptotic distribution of the likelihood ratio statistic.

Local parameter spaces: H_n = √n (Θ − ϑ) and H_{n,0} = √n (Θ0 − ϑ). Then

\[
\begin{aligned}
\Lambda_n
&= 2 \log \frac{\sup_{\theta \in \Theta} \prod_{i=1}^n p_\theta(X_i)}{\sup_{\theta \in \Theta_0} \prod_{i=1}^n p_\theta(X_i)}
= 2 \log \frac{\sup_{h \in H_n} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\sup_{h \in H_{n,0}} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)} \\
&= 2 \log \frac{\sup_{h \in H_n} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i) \big/ \prod_{i=1}^n p_\vartheta(X_i)}{\sup_{h \in H_{n,0}} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i) \big/ \prod_{i=1}^n p_\vartheta(X_i)} \\
&= 2 \sup_{h \in H_n} \log \frac{\prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\prod_{i=1}^n p_\vartheta(X_i)}
 - 2 \sup_{h \in H_{n,0}} \log \frac{\prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\prod_{i=1}^n p_\vartheta(X_i)}.
\end{aligned}
\]
Connection to Chapter 7
For large n, the above likelihood ratio process is similar to the likelihood ratio process of the normal experiment \((\mathcal{N}(h, I_\vartheta^{-1}) : h \in \mathbb{R}^k)\).

If Hn and Hn,0 converge to sets H and H0, the sequence Λn converges in distribution to Λ, given by

\[
\Lambda = 2 \sup_{h \in H} \log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X)
 - 2 \sup_{h \in H_0} \log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X).
\]

This is related to testing h ∈ H0 versus h ∈ H \ H0 based on the observation X in the normal experiment.
Reminder: limit distribution \(\mathcal{N}(h, I_\vartheta^{-1})\)

The log likelihood ratio process is given by

\[
\log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X)
= \log \frac{\frac{1}{(2\pi)^{k/2} \sqrt{\det I_\vartheta^{-1}}} \exp\!\left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h)\right)}{\frac{1}{(2\pi)^{k/2} \sqrt{\det I_\vartheta^{-1}}} \exp\!\left(-\tfrac{1}{2} X^T I_\vartheta X\right)}
= -\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X.
\]
Likelihood ratio in the normal case
\[
\begin{aligned}
\Lambda
&= 2 \sup_{h \in H} \left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X\right)
 - 2 \sup_{h \in H_0} \left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X\right) \\
&= \inf_{h \in H_0} (X - h)^T I_\vartheta (X - h) - \inf_{h \in H} (X - h)^T I_\vartheta (X - h) \\
&= \inf_{h \in H_0} \left(I_\vartheta^{1/2}(X - h)\right)^T I_\vartheta^{1/2}(X - h)
 - \inf_{h \in H} \left(I_\vartheta^{1/2}(X - h)\right)^T I_\vartheta^{1/2}(X - h) \\
&= \inf_{h \in H_0} \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} h\|^2 - \inf_{h \in H} \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} h\|^2 \\
&= \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2 - \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H\|^2.
\end{aligned}
\]
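When H and H0 are linear subspaces, the two squared distances in the last line are obtained by orthogonal projection onto the transformed sets \(I_\vartheta^{1/2}H_0\) and \(I_\vartheta^{1/2}H\). A small numerical sketch (the Fisher information, observation, and subspaces are made-up illustrations):

```python
import numpy as np

def sq_dist_to_subspace(y, B):
    """Squared Euclidean distance of y to the column span of B."""
    h, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ h) ** 2))

# Made-up 2-dimensional illustration
I = np.array([[2.0, 0.5], [0.5, 1.0]])   # Fisher information I_ϑ
A = np.linalg.cholesky(I).T              # any A with AᵀA = I_ϑ serves as I_ϑ^{1/2} here
X = np.array([0.7, -0.3])                # "observation" from N(h, I_ϑ⁻¹)

H = np.eye(2)                            # H  = R² (ϑ an inner point of Θ)
H0 = np.array([[0.0], [1.0]])            # H0 = {0} × R (testing the first coordinate)

y = A @ X
lam = (sq_dist_to_subspace(y, A @ H0)
       - sq_dist_to_subspace(y, A @ H))  # Λ = ‖y − I^{1/2}H0‖² − ‖y − I^{1/2}H‖²
print(lam)
```

Since H is the full space, the second distance is zero and Λ reduces to the squared distance of \(I_\vartheta^{1/2}X\) to \(I_\vartheta^{1/2}H_0\), as on the next slide.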
Rigorous formulation
Theorem
Let the model (Pθ : θ ∈ Θ) be differentiable in quadratic mean at ϑ with nonsingular Fisher information matrix I_ϑ, and suppose that for every θ1 and θ2 in a neighborhood of ϑ and for a measurable function \(\dot{\ell}\) with \(E_\vartheta[\dot{\ell}^2] < \infty\),

\[
|\log p_{\theta_1}(x) - \log p_{\theta_2}(x)| \le \dot{\ell}(x) \, \|\theta_1 - \theta_2\|.
\]

If the maximum likelihood estimators \(\hat{\theta}_{n,0}\) and \(\hat{\theta}_n\) are consistent under ϑ and the sets H_{n,0} and H_n converge to sets H0 and H, then, under ϑ + h/√n,

\[
\Lambda_n \Rightarrow \Lambda,
\]

where

\[
\Lambda = \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2 - \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H\|^2
\]

with normally distributed \(X \sim \mathcal{N}(h, I_\vartheta^{-1})\).
Chi-squared distribution
If ϑ is an inner point of Θ, we have H = ℝ^k and therefore

\[
\Lambda = \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2.
\]

If ϑ is the true parameter, the distribution of Λn corresponds to the distribution of Λ under h = 0. In this case the random vector \(I_\vartheta^{1/2} X\) is standard normal.

Lemma: Let Z be a k-dimensional random vector with a standard normal distribution and let H0 be an r-dimensional linear subspace of ℝ^k. Then ‖Z − H0‖² is χ²_{k−r}-distributed.

Hence, if √n (Θ0 − ϑ) → H0 for a linear subspace H0 of ℝ^k with dim H0 = r, the likelihood ratio statistic Λn is asymptotically χ²_{k−r}-distributed.
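The lemma can be checked by simulation (a hypothetical numpy sketch; the subspace H0 is an arbitrary illustration): χ²_{k−r} has mean k − r, so the empirical mean of ‖Z − H0‖² should be close to k − r.

```python
import numpy as np

rng = np.random.default_rng(2)
k, r = 5, 2

# A made-up r-dimensional linear subspace H0 of R^k (the column span of B)
B = rng.normal(size=(k, r))
P = B @ np.linalg.inv(B.T @ B) @ B.T     # orthogonal projection onto H0

Z = rng.normal(size=(100_000, k))        # standard normal vectors in R^k
d2 = np.sum((Z - Z @ P) ** 2, axis=1)    # ‖Z − H0‖² = ‖(I − P)Z‖²

# Lemma: d2 is χ²_{k−r}-distributed; its mean should be near k − r = 3
print(d2.mean())
```

The design choice here is to represent H0 by a projection matrix, since the distance to a linear subspace is exactly the norm of the projection residual.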
Examples
Location scale
Suppose we have a sample from the density \(\frac{1}{\sigma} f\!\left(\frac{x - \mu}{\sigma}\right)\) for a given probability density f, with location-scale parameter θ = (µ, σ) ∈ Θ = ℝ × ℝ⁺. Note that ϑ = (0, σ) is an inner point of Θ and H_n = √n (Θ − ϑ) = ℝ × (−√n σ, ∞) converges to ℝ × ℝ.

Consider some testing problems:

H0 : µ = 0 versus H1 : µ ≠ 0: This corresponds to the set Θ0 = {0} × ℝ⁺. From

\[
H_{n,0} = \sqrt{n}\,(\Theta_0 - \vartheta) = \{0\} \times (-\sqrt{n}\,\sigma, \infty) \xrightarrow{n \to \infty} \{0\} \times \mathbb{R}
\]

it follows that the sequence of likelihood ratio statistics is asymptotically χ²₁-distributed.
→ Level-α test: reject the null hypothesis if Λn exceeds the (1 − α)-quantile of the χ²₁-distribution.
Location scale (continued)
H0 : µ ≤ 0 versus H1 : µ > 0: This corresponds to the set Θ0 = (−∞, 0] × ℝ⁺ and

\[
H_{n,0} = (-\infty, 0] \times (-\sqrt{n}\,\sigma, \infty) \xrightarrow{n \to \infty} (-\infty, 0] \times \mathbb{R} = H_0,
\]

which is not a linear subspace of ℝ × ℝ. Thus the limit distribution of the likelihood ratio statistic is not χ², but equals the distribution of

\[
\|Z - I_\vartheta^{1/2} H_0\|^2
\]

with Z standard normally distributed. The set \(I_\vartheta^{1/2} H_0 = \{h : \langle h, I_\vartheta^{-1/2} e_1\rangle \le 0\}\) is a half space whose boundary is a line through the origin.
Location scale (continued)
Because a standard normal vector is rotationally symmetric, the limit distribution equals the squared distance of Z to the half space {h : h₂ ≤ 0}. This is the distribution of (Z ∨ 0)² for Z ∼ 𝒩(0, 1). Because

\[
P\big((Z \vee 0)^2 > c\big) = \tfrac{1}{2}\, P\big(Z^2 > c\big)
\]

for every c > 0, we choose the critical value equal to the (1 − 2α)-quantile of χ²₁ to reach level α.

If ϑ is an inner point of Θ0, the sets H_{n,0} converge to ℝ × ℝ and the sequence of likelihood ratio statistics converges in distribution to 0.
→ The probability of an error of the first kind converges to 0.
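The choice of the (1 − 2α)-quantile can be verified numerically (a sketch using only numpy and the standard library; since P(Z² > c) = 2α gives c = z₁₋α², no χ² quantile function is needed):

```python
import numpy as np
from statistics import NormalDist

alpha = 0.05
# (1 − 2α)-quantile of χ²_1:  P(Z² > c) = 2α  ⇔  c = z_{1−α}²
c = NormalDist().inv_cdf(1 - alpha) ** 2

rng = np.random.default_rng(3)
Z = rng.normal(size=1_000_000)
lam = np.maximum(Z, 0.0) ** 2   # the limit statistic (Z ∨ 0)²

print(np.mean(lam > c))          # empirical rejection rate, should be ≈ α
```

The simulated rejection frequency of the event {(Z ∨ 0)² > c} stays close to α, confirming that the one-sided test is calibrated by the (1 − 2α)-quantile.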
Thank you for your attention.