Seminar Applied Mathematical Statistics
Contiguity, Local Asymptotic Normality, Likelihood Ratio Tests
Johannes Musebeck
TU Kaiserslautern
13.02.2015
Johannes Musebeck (TU Kaiserslautern) Seminar Applied Mathematical Statistics 13.02.2015 1 / 41
Contents
1 Chapter 6: Contiguity
2 Chapter 7: Local Asymptotic Normality, Maximum Likelihood
3 Chapter 16: Likelihood Ratio Tests
Contiguity

Contiguity
Abbreviation for asymptotic absolute continuity.
Technique to obtain the limit distribution of a sequence of statistics.

Recall
Let P and Q be two measures on the measurable space (Ω, A).
Q is absolutely continuous w.r.t. P if P(A) = 0 implies Q(A) = 0 for all A ∈ A. This is denoted by Q ≪ P.
P and Q are orthogonal if Ω can be partitioned as Ω = Ω_P ∪ Ω_Q with Ω_P ∩ Ω_Q = ∅ such that P(Ω_Q) = Q(Ω_P) = 0. This is denoted by P ⊥ Q.
Contiguity

Contiguity
Suppose that the measures P and Q have densities p and q w.r.t. a measure µ.

Definition (Lebesgue decomposition)
The measure Q can be written as Q = Q_a + Q_⊥, where Q_a(A) = Q(A ∩ {p > 0}) is called the absolutely continuous part and Q_⊥(A) = Q(A ∩ {p = 0}) is called the orthogonal part of Q w.r.t. P.

Lemma
Let P and Q be probability measures with densities p and q w.r.t. µ. Then we have:
(i) Q = Q_a + Q_⊥, where Q_a ≪ P and Q_⊥ ⊥ P.
(ii) Q_a(A) = ∫_A (q/p) dP for every A ∈ A.
Contiguity

Contiguity
Likelihood Ratio
The function q/p is a density of Q_a with respect to P. It is denoted by dQ/dP.
The random variable dQ/dP : Ω → [0, ∞) is called the Radon–Nikodym density or likelihood ratio.

Note that for any P and Q and any nonnegative measurable function f we have

    ∫ f dQ ≥ ∫_{p>0} f q dµ = ∫_{p>0} f (q/p) p dµ = ∫ f (dQ/dP) dP.
Contiguity

Contiguity
Consider measurable spaces (Ω_n, A_n) equipped with probability measures P_n, Q_n and random vectors X_n : Ω_n → ℝ^k.

Goal
Derive a Q_n-limit law of X_n from a P_n-limit law.

Non-asymptotic situation
Let Q be absolutely continuous w.r.t. P and X : Ω → ℝ^k. Then

    E_Q[f(X)] = E_P[f(X) dQ/dP].

In the asymptotic case we need Q_n to be asymptotically absolutely continuous with respect to P_n.
Contiguity

Contiguity
Definition
The sequence Q_n is called contiguous w.r.t. the sequence P_n if

    P_n(A_n) → 0 implies Q_n(A_n) → 0

for every sequence of measurable sets A_n ∈ A_n. We write Q_n ◁ P_n.
The sequences are mutually contiguous if both P_n ◁ Q_n and Q_n ◁ P_n. This is denoted by P_n ◁▷ Q_n.
Contiguity

Characterization of contiguity
Lemma (Le Cam's first lemma)
Let P_n and Q_n be probability measures on (Ω_n, A_n). Then the following are equivalent:
(i) Q_n ◁ P_n.
(ii) If dP_n/dQ_n ⇒ U under Q_n along a subsequence, then P(U > 0) = 1.
(iii) If dQ_n/dP_n ⇒ V under P_n along a subsequence, then E[V] = 1.
(iv) For any statistics T_n : Ω_n → ℝ^k: if T_n → 0 in P_n-probability, then T_n → 0 in Q_n-probability.
Contiguity

Example (Asymptotic log normality)
Let P_n and Q_n be probability measures such that

    dP_n/dQ_n ⇒ e^{N(µ,σ²)} under Q_n.

Then Q_n ◁ P_n by Le Cam's first lemma, because P(e^{N(µ,σ²)} = 0) = 0.
Furthermore, P_n ◁ Q_n if and only if E[e^{N(µ,σ²)}] = e^{µ + σ²/2} = 1.
(Remember the moment generating function M_X(t) := E[e^{tX}]: if X ∼ N(µ, σ²), then M_X(t) = e^{µt + σ²t²/2}.)
⇒ Q_n ◁▷ P_n if and only if µ = −σ²/2.
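The condition E[e^{N(µ,σ²)}] = e^{µ + σ²/2} = 1 from this example can be checked by simulation. A minimal Monte Carlo sketch in Python; the choice σ = 0.5 and the sample size are illustrative, not from the slides:

```python
import math
import random

random.seed(0)

def mean_exp_normal(mu, sigma, n=200_000):
    """Monte Carlo estimate of E[exp(Z)] for Z ~ N(mu, sigma^2)."""
    return sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n

sigma = 0.5
# Choosing mu = -sigma^2/2 makes E[e^{N(mu, sigma^2)}] = e^{mu + sigma^2/2} = 1,
# which is exactly the condition for mutual contiguity in the example above.
est = mean_exp_normal(-sigma ** 2 / 2, sigma)
exact = math.exp(-sigma ** 2 / 2 + sigma ** 2 / 2)  # = 1
print(est, exact)
```

For any other µ the estimated mean drifts away from 1, matching the "if and only if" in the example.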
Contiguity

Le Cam's third lemma
Central result
Obtaining a Q_n-limit law from a P_n-limit law.

Theorem
Let P_n and Q_n be sequences of probability measures on measurable spaces (Ω_n, A_n) and let X_n : Ω_n → ℝ^k be a sequence of random vectors. Suppose that Q_n ◁ P_n and

    (X_n, dQ_n/dP_n) ⇒ (X, V) under P_n.

Then L(B) = E[1_B(X) V] defines a probability measure and X_n ⇒ L under Q_n.

Proof.
On the board.
Contiguity

Le Cam's third lemma
Example (Le Cam's third lemma)
If we have

    (X_n, log dQ_n/dP_n) ⇒ N_{k+1}( (µ, −σ²/2)ᵀ, [Σ  τ; τᵀ  σ²] ) under P_n,

then

    X_n ⇒ N_k(µ + τ, Σ) under Q_n.

Special case of the previous theorem
Assume that we are given (X_n, log dQ_n/dP_n) ⇒ (X, W) under P_n, where (X, W) has the (k+1)-dimensional normal distribution above. Then, by the continuous mapping theorem,

    (X_n, dQ_n/dP_n) ⇒ (X, e^W) under P_n.
Contiguity

Le Cam's third lemma
Example continued
Notice that P_n ◁▷ Q_n because W ∼ N(−σ²/2, σ²).
Using the theorem it follows that X_n ⇒ L under Q_n with L(B) = E[1_B(X) e^W]. The characteristic function of L is given by

    L̂(t) = ∫ e^{i tᵀx} L(dx) = E[e^{i tᵀX} e^W] = E[exp( i (t, −i)ᵀ (X, W) )],

which is the characteristic function of our (k+1)-dimensional normal vector (X, W) evaluated at (t, −i)ᵀ.
Contiguity

Le Cam's third lemma
Reminder
The characteristic function of N_k(µ, Σ) is t ↦ exp(i tᵀµ − ½ tᵀΣt).

Example continued

    N̂_{k+1}( (µ, −σ²/2)ᵀ, [Σ  τ; τᵀ  σ²] )(t, −i)
      = exp( i tᵀµ − ½ σ² − ½ (tᵀ, −i) [Σ  τ; τᵀ  σ²] (t, −i)ᵀ )
      = exp( i tᵀ(µ + τ) − ½ tᵀΣt )
      = N̂_k(µ + τ, Σ)(t).

Since a distribution is uniquely determined by its characteristic function, the claim follows.
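The mean shift by τ predicted by Le Cam's third lemma can be seen numerically in the simplest i.i.d. setting. A sketch under illustrative assumptions (the N(θ, 1) model and all constants are my choices, not from the slides): here X_n = √n(X̄ − θ) has limit N(0, 1) under P_n, and the covariance of X_n with the log likelihood ratio is τ = h, so the lemma predicts the limit N(h, 1) under Q_n.

```python
import random
import statistics

random.seed(1)

# Illustrative model: X_1, ..., X_n i.i.d. N(theta, 1); P_n uses theta,
# Q_n uses theta + h/sqrt(n).  The statistic X_n := sqrt(n)(mean - theta)
# satisfies X_n => N(0, 1) under P_n; Le Cam's third lemma predicts
# X_n => N(tau, 1) under Q_n with tau = h.
theta, h, n, reps = 0.0, 2.0, 400, 5000

def statistic_under(shift):
    xs = [random.gauss(theta + shift, 1.0) for _ in range(n)]
    return n ** 0.5 * (sum(xs) / n - theta)

draws = [statistic_under(h / n ** 0.5) for _ in range(reps)]
m_hat = statistics.mean(draws)   # close to tau = h = 2
s_hat = statistics.stdev(draws)  # close to 1
print(m_hat, s_hat)
```

The variance is unchanged and only the mean moves by τ, exactly as in the example above.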
Local Asymptotic Normality

Local Asymptotic Normality
A sequence of statistical models is locally asymptotically normal if, asymptotically, their likelihood ratio processes are similar to those of a normal model.

Model/Experiment
Consider an i.i.d. sample X_1, ..., X_n from a distribution P_θ on a measurable space (X, A), where θ lies in an open subset Θ of ℝ^k. Then X = (X_1, ..., X_n)ᵀ is a single observation from P_θⁿ in the sample space (Xⁿ, Aⁿ).
→ The experiment can be completely described by (P_θⁿ : θ ∈ Θ).

Goal
Approximation of this statistical experiment by a Gaussian experiment after a suitable reparametrization.
Local Asymptotic Normality

Reparametrization
Define the local parameter h = √n (θ − θ_0) with fixed θ_0.
Rewrite P_θⁿ as Pⁿ_{θ_0 + h/√n} and consider experiments with parameter h.
We will see that (Pⁿ_{θ_0 + h/√n} : h ∈ ℝ^k) and (N(h, I_{θ_0}⁻¹) : h ∈ ℝ^k) have similar statistical properties for large n.
Local parameter set: H_n = √n (Θ − θ_0).
We take H_n equal to ℝ^k, since:
1 Θ = ℝ^k ⇒ H_n = ℝ^k.
2 Θ ⊊ ℝ^k ⇒ H_n ≠ ℝ^k, but if θ_0 is an inner point of Θ, then 0 is an inner point of the set (Θ − θ_0). ⇒ H_n converges to ℝ^k for n → ∞.
Local Asymptotic Normality

Score function
Let p_θ be a density of P_θ and assume that the log likelihood ℓ_θ(x) = log p_θ(x) is twice differentiable with respect to θ.
The first derivative ℓ̇_θ(x) = (∂/∂θ) log p_θ(x) is called the score function.

Moments of the score function

    E_θ[ℓ̇_θ] = ∫ ℓ̇_θ p_θ dµ = ∫ (ṗ_θ/p_θ) p_θ dµ = ∫ ṗ_θ dµ = (∂/∂θ) ∫ p_θ dµ = (∂/∂θ) 1 = 0.

    E_θ[ℓ̈_θ] = ∫ ℓ̈_θ p_θ dµ = ∫ ( p̈_θ/p_θ − ṗ_θ ṗ_θᵀ / p_θ² ) p_θ dµ
              = ∫ p̈_θ dµ − ∫ ℓ̇_θ ℓ̇_θᵀ p_θ dµ = (∂²/∂θ²) ∫ p_θ dµ − E_θ[ℓ̇_θ ℓ̇_θᵀ] = −I_θ.
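The two moment identities can be illustrated numerically. For the N(θ, 1) location model the score is ℓ̇_θ(x) = x − θ, so E_θ[ℓ̇_θ] = 0 and Var_θ[ℓ̇_θ] = I_θ = 1; a quick Monte Carlo sketch (model and sample size are illustrative choices):

```python
import random

random.seed(2)

# Score of the N(theta, 1) model: d/dtheta log p_theta(x) = x - theta.
# The moment identities above say E[score] = 0 and Var[score] = I_theta = 1.
theta = 1.5
scores = [random.gauss(theta, 1.0) - theta for _ in range(100_000)]

mean_score = sum(scores) / len(scores)                                # ~ 0
fisher_info = sum(s * s for s in scores) / len(scores) - mean_score ** 2  # ~ 1
print(mean_score, fisher_info)
```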
Local Asymptotic Normality

Expanding the Likelihood
Assume that θ is one-dimensional. A Taylor expansion of the log likelihood ratio around θ yields

    log (p_{θ+h}/p_θ)(x) = log p_{θ+h}(x) − log p_θ(x) = ℓ_{θ+h}(x) − ℓ_θ(x)
                         = h ℓ̇_θ(x) + ½ h² ℓ̈_θ(x) + o_x(h²).

From this it follows that

    log (dPⁿ_{θ+h/√n} / dP_θⁿ)(X) = log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i) = ∑_{i=1}^n log (p_{θ+h/√n}/p_θ)(X_i)
                                  = (h/√n) ∑_{i=1}^n ℓ̇_θ(X_i) + (h²/2n) ∑_{i=1}^n ℓ̈_θ(X_i) + Rem_n.
Local Asymptotic Normality

Expanding the Likelihood
Using the Central Limit Theorem:

    (1/√n) ∑_{i=1}^n ( ℓ̇_θ(X_i) − E[ℓ̇_θ(X_i)] ) ⇒ N(0, Cov[ℓ̇_θ(X_1)]).

Because E[ℓ̇_θ(X_i)] = 0 and Cov[ℓ̇_θ(X_1)] = E[ℓ̇_θ(X_1)²] = I_θ, we have

    (1/√n) ∑_{i=1}^n ℓ̇_θ(X_i) ⇒ N(0, I_θ).

By the Law of Large Numbers (convergence in probability):

    (1/n) ∑_{i=1}^n ℓ̈_θ(X_i) → E[ℓ̈_θ(X_1)] = −I_θ.
Local Asymptotic Normality

Expanding the Likelihood
Asymptotically we get

    log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i) = (h/√n) ∑_{i=1}^n ℓ̇_θ(X_i) + (h²/2n) ∑_{i=1}^n ℓ̈_θ(X_i) + Rem_n
                                        ⇒ h N(0, I_θ) − ½ h² I_θ = N(−½ h² I_θ, h² I_θ) under P_θ

for every h.

Conclusion
Expansion of the likelihood process in a neighborhood of θ → local asymptotic normality.
We will see that the likelihood process of a normal experiment has a similar form.
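The limit N(−½h²I_θ, h²I_θ) can be observed directly in a simulation. A sketch for the N(θ, 1) model, where I_θ = 1 and the expansion is exact (model and constants are illustrative choices, not from the slides):

```python
import random

random.seed(3)

# Model N(theta, 1), I_theta = 1: LAN predicts
#   log prod_i p_{theta + h/sqrt(n)}/p_theta (X_i)  =>  N(-h^2/2, h^2)  under P_theta.
# For this normal model the expansion is exact:
#   log LR = (h/sqrt(n)) * sum_i (X_i - theta) - h^2/2.
theta, h, n, reps = 0.0, 1.5, 500, 4000

def log_lr():
    s = h / n ** 0.5
    return sum(s * (random.gauss(theta, 1.0) - theta) - s * s / 2 for _ in range(n))

vals = [log_lr() for _ in range(reps)]
m = sum(vals) / reps
v = sum((x - m) ** 2 for x in vals) / (reps - 1)
print(m, v)  # ~ -h^2/2 = -1.125 and ~ h^2 = 2.25
```

Note the mean is minus half the variance, the relation that characterizes mutual contiguity in Le Cam's first lemma.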
Local Asymptotic Normality

Definition
The mapping θ ↦ √p_θ is called differentiable in quadratic mean at θ if there exists a vector of measurable functions ℓ̇_θ = (ℓ̇_{θ,1}, ..., ℓ̇_{θ,k})ᵀ such that

    ∫ [ √p_{θ+h} − √p_θ − ½ hᵀ ℓ̇_θ √p_θ ]² dµ = o(‖h‖²) for h → 0.

In this case, the model (P_θ : θ ∈ Θ) is called differentiable in quadratic mean at θ.

Lemma
For every θ in an open subset of ℝ^k let p_θ be the density of P_θ w.r.t. µ. If the map θ ↦ √p_θ(x) is continuously differentiable for every x, and the elements of the matrix

    I_θ = ∫ (ṗ_θ/p_θ)(ṗ_θᵀ/p_θ) p_θ dµ

are well defined and continuous in θ, then the map θ ↦ √p_θ is differentiable in quadratic mean with ℓ̇_θ = ṗ_θ/p_θ.
Local Asymptotic Normality

Local Asymptotic Normality
Under the condition of differentiability in quadratic mean we can establish local asymptotic normality:

Theorem
Suppose that Θ is an open subset of ℝ^k and that the model (P_θ : θ ∈ Θ) is differentiable in quadratic mean at θ. Then E_θ[ℓ̇_θ] = 0 and the Fisher information matrix I_θ = E_θ[ℓ̇_θ ℓ̇_θᵀ] exists. Additionally, for every converging sequence h_n → h,

    log ∏_{i=1}^n (p_{θ+h_n/√n}/p_θ)(X_i) = (1/√n) ∑_{i=1}^n hᵀ ℓ̇_θ(X_i) − ½ hᵀ I_θ h + o_{P_θ}(1)
                                          ⇒ N(−½ hᵀ I_θ h, hᵀ I_θ h) under P_θ.
Local Asymptotic Normality

Examples
Location models
Let f be a positive, continuously differentiable density w.r.t. µ. Consider p_θ(x) = f(x − θ) and the location model (f(x − θ) : θ ∈ ℝ). For the Fisher information we get

    I_θ = E_θ[ ( (∂/∂θ) log f(x − θ) )² ] = ∫ ( −(1/f(x − θ)) f′(x − θ) )² f(x − θ) dx
        = ∫ (f′/f)²(x) f(x) dx,

which does not depend on θ and is in particular continuous in θ. By the preceding lemma we obtain differentiability in quadratic mean with ℓ̇_θ(x) = −(f′/f)(x − θ).
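The formula I = ∫ (f′/f)² f dx is easy to evaluate numerically. A quadrature sketch for two standard location families (the grid limits and step size are ad-hoc choices; the logistic value I = 1/3 is a well-known closed form):

```python
import math

# Fisher information of a location model: I = integral of (f'/f)^2 * f dx,
# independent of theta, approximated by trapezoidal quadrature.
def fisher_location(f, fprime, lo=-20.0, hi=20.0, m=40_000):
    step = (hi - lo) / m
    total = 0.0
    for i in range(m + 1):
        x = lo + i * step
        fx = f(x)
        w = 0.5 if i in (0, m) else 1.0  # trapezoidal end-point weights
        total += w * (fprime(x) / fx) ** 2 * fx
    return total * step

# Standard normal: (f'/f)(x) = -x, so I = E[X^2] = 1.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
phi_p = lambda x: -x * phi(x)
I_normal = fisher_location(phi, phi_p)

# Standard logistic: f = F(1 - F) with F(x) = 1/(1 + e^{-x}); known I = 1/3.
F = lambda x: 1.0 / (1.0 + math.exp(-x))
logis = lambda x: F(x) * (1.0 - F(x))
logis_p = lambda x: logis(x) * (1.0 - 2.0 * F(x))
I_logistic = fisher_location(logis, logis_p)

print(I_normal, I_logistic)  # ~ 1.0 and ~ 0.3333
```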
Local Asymptotic Normality

Examples
Uniform distribution
If a family of distributions is differentiable in quadratic mean, we have

    ∫ [ √p_{θ+h} − √p_θ − ½ hᵀ ℓ̇_θ √p_θ ]² dµ = o(‖h‖²) for h → 0,

and after restriction of the integral to the set {p_θ = 0},

    P_{θ+h}(p_θ = 0) = ∫_{p_θ=0} p_{θ+h} dµ = o(‖h‖²).

This is not true for the family (U([0, θ]) : θ ∈ Θ), because for h ≥ 0,

    P_{θ+h}(p_θ = 0) = ∫_{[0,θ]ᶜ} (1/(θ + h)) 1_{[0,θ+h]}(x) dx = h/(θ + h) = O(h).

⇒ The uniform distribution is nowhere differentiable in quadratic mean.
Local Asymptotic Normality

Convergence to a normal experiment
Limit distribution
Now we consider the limit distribution N(h, I_θ⁻¹). The log likelihood ratio process is given by

    log (dN(h, I_θ⁻¹) / dN(0, I_θ⁻¹))(X)
      = log [ (2π)^{−k/2} (det I_θ⁻¹)^{−1/2} exp(−½ (X − h)ᵀ I_θ (X − h)) ]
        − log [ (2π)^{−k/2} (det I_θ⁻¹)^{−1/2} exp(−½ Xᵀ I_θ X) ]
      = −½ (X − h)ᵀ I_θ (X − h) + ½ Xᵀ I_θ X
      = hᵀ I_θ X − ½ hᵀ I_θ h.

The right-hand side looks similar to the Taylor expansion of the log likelihood ratio

    log (dPⁿ_{θ+h/√n} / dP_θⁿ)(X) = log ∏_{i=1}^n (p_{θ+h/√n}/p_θ)(X_i).
Local Asymptotic Normality

Convergence to a normal experiment
Why study the local approximation?
Let us consider limit distributions of a sequence of statistics T_n = T_n(X_1, ..., X_n) in the experiment (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) for a fixed θ. If we have convergence in distribution

    T_n ⇒ L_{θ,h} under Pⁿ_{θ+h/√n} for every h,

then the distributions (L_{θ,h} : h ∈ ℝ^k) have to be the distributions of a statistic T in the normal experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) → Theorem below.

Conclusion
Every weakly converging sequence of statistics is matched by a statistic in the limit experiment. → Application: measure the quality of a statistic.
Local Asymptotic Normality

Convergence to a normal experiment
Statistical interpretation: Look at measures of quality of a statistic.
If T_n is a test statistic: power function h ↦ P_h(T_n > c).
If T_n is an estimator of h: mean squared error h ↦ E_h[(T_n − h)²].

Observation
Measures of quality only depend on the distribution of the statistic T_n.
⇒ After approximation of the law of T_n by the law of a statistic T, the asymptotic quality of T_n is the same as the quality of T.
Local Asymptotic Normality

Technical complication: Need randomized statistics
Definition
A randomized statistic T based on the observation X is defined as a measurable map T = T(X, U) that depends on X but may additionally depend on an independent uniformly distributed random variable U ∼ U([0, 1]).

Theorem
Let the experiment (P_θ : θ ∈ Θ) be differentiable in quadratic mean at θ with invertible Fisher information matrix I_θ. Let T_n be a sequence of statistics in the experiment (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) such that T_n converges in distribution under every h. Then there exists a randomized statistic T in the experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) such that T_n ⇒ T under Pⁿ_{θ+h/√n} for every h.

Proof.
On the board.
Local Asymptotic Normality

Maximum Likelihood
The ML-estimator for h in the experiment (N(h, I_θ⁻¹) : h ∈ ℝ^k) is ĥ = X (which is normally distributed).
We expect: ML-estimators ĥ_n in (Pⁿ_{θ+h/√n} : h ∈ ℝ^k) should converge in distribution to X.
Note: The local parameter h = 0 corresponds to the value θ of the original parameter (remember: H_n = √n (Θ − θ)).
→ We expect that ĥ_n = √n (θ̂_n − θ) ⇒ N(0, I_θ⁻¹) under P_θⁿ and therefore

    I_θ^{1/2} √n (θ̂_n − θ) ⇒ N(0, Id) under P_θⁿ.

Compare to Theorem 5.39
We have shown that result under the assumption of differentiability in quadratic mean, a Lipschitz condition on log p_θ(x), and consistency of θ̂_n.
Restriction: θ had to be an inner point of Θ.
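The expected limit √n(θ̂_n − θ) ⇒ N(0, I_θ⁻¹) can be illustrated with a one-dimensional model where the MLE has a closed form. A sketch for the exponential model (my choice of example; all constants are illustrative):

```python
import random

random.seed(5)

# Exponential(theta) model, density theta * exp(-theta * x): the MLE is
# 1 / sample-mean and I_theta = 1/theta^2, so we expect
#   sqrt(n) * (mle - theta)  =>  N(0, theta^2)  under P_theta.
theta, n, reps = 2.0, 1000, 3000

def scaled_error():
    xs = [random.expovariate(theta) for _ in range(n)]
    mle = n / sum(xs)
    return n ** 0.5 * (mle - theta)

errs = [scaled_error() for _ in range(reps)]
m = sum(errs) / reps
s = (sum((e - m) ** 2 for e in errs) / (reps - 1)) ** 0.5
print(m, s)  # mean ~ 0, standard deviation ~ theta = 2 = sqrt(I_theta^{-1})
```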
Local Asymptotic Normality

Let Θ ⊂ ℝ^k be arbitrary and H_n = √n (Θ − θ) the local parameter space. The ML-estimator ĥ_n maximizes the random function

    h ↦ log (dPⁿ_{θ+h/√n} / dP_θⁿ)

over H_n. If (P_θ : θ ∈ Θ) is differentiable in quadratic mean, these processes converge in distribution to the process

    h ↦ log (dN(h, I_θ⁻¹) / dN(0, I_θ⁻¹))(X) = −½ (X − h)ᵀ I_θ (X − h) + ½ Xᵀ I_θ X.

If H_n converges to a set H, we expect that ĥ_n converges to the maximizer ĥ of the latter process over H. This means ĥ minimizes d(X, h) over h ∈ H, where the squared distance is defined as d(x, y) = (x − y)ᵀ I_θ (x − y).
⇒ ĥ is the projection of X onto H with respect to d. If H = ℝ^k, this projection reduces to ĥ = X.
Local Asymptotic Normality

Theorem
Suppose that the experiment (P_θ : θ ∈ Θ) is differentiable in quadratic mean at θ_0 with nonsingular Fisher information matrix I_{θ_0}. Assume that for every θ_1 and θ_2 in a neighborhood of θ_0 and a measurable function ℓ̇ with E_{θ_0}[ℓ̇²] < ∞,

    |log p_{θ_1}(x) − log p_{θ_2}(x)| ≤ ℓ̇(x) ‖θ_1 − θ_2‖.

If the sequence of maximum likelihood estimators θ̂_n is consistent and the sets H_n = √n (Θ − θ_0) converge to a nonempty, convex set H, then the sequence I_{θ_0}^{1/2} √n (θ̂_n − θ_0) converges under θ_0 in distribution to the projection of a standard normal vector onto the set I_{θ_0}^{1/2} H.
Local Asymptotic Normality

Limit Distributions under Alternatives
Under local asymptotic normality,

    log (dPⁿ_{θ+h/√n} / dP_θⁿ) ⇒ N(−½ hᵀ I_θ h, hᵀ I_θ h) under P_θⁿ.

Therefore Pⁿ_{θ+h/√n} and P_θⁿ are mutually contiguous.

Aim
Obtain limit distributions of statistics under the parameters θ + h/√n from the limit behaviour under θ by using Le Cam's third lemma.

Suppose that a sequence of statistics T_n can be written as

    √n (T_n − µ_θ) = (1/√n) ∑_{i=1}^n ψ_θ(X_i) + o_{P_θ}(1).
Local Asymptotic Normality

Limit Distributions under Alternatives
By the CLT: If E[ψ_θ] = 0 and E[ψ_θ ψ_θᵀ] < ∞, then (1/√n) ∑_{i=1}^n (ψ_θ(X_i), hᵀ ℓ̇_θ(X_i)) is asymptotically multivariate normal under θ.
With Slutsky's Lemma:

    ( √n (T_n − µ_θ), log (dPⁿ_{θ+h/√n} / dP_θⁿ) )
      ⇒ N( (0, −½ hᵀ I_θ h)ᵀ, [ E_θ[ψ_θ ψ_θᵀ]  E_θ[ψ_θ hᵀ ℓ̇_θ]; E_θ[ψ_θᵀ hᵀ ℓ̇_θ]  hᵀ I_θ h ] ) under P_θⁿ.

By Le Cam's third lemma:

    √n (T_n − µ_θ) ⇒ N( E_θ[ψ_θ hᵀ ℓ̇_θ], E_θ[ψ_θ ψ_θᵀ] ) under Pⁿ_{θ+h/√n}.
Likelihood Ratio Tests

Likelihood Ratio Tests
Task
Derive the asymptotic distribution of the likelihood ratio statistic and investigate its asymptotic quality.

Consider an i.i.d. sample X_1, ..., X_n from a distribution P_θ with density p_θ. We want to test

    H_0 : θ ∈ Θ_0 against H_1 : θ ∈ Θ_1.

Neyman–Pearson test
If Θ_0 = {θ_0} and Θ_1 = {θ_1}, we know the Neyman–Pearson test using the statistic

    log ( L(θ_1|X) / L(θ_0|X) ) = log ( ∏_{i=1}^n p_{θ_1}(X_i) / ∏_{i=1}^n p_{θ_0}(X_i) ).

This is the most powerful test.
Likelihood Ratio Tests

Likelihood Ratio Tests
Extension: We replace single points by the suprema over the hypotheses,

    Λ_n = log ( sup_{θ∈Θ_1} ∏_{i=1}^n p_θ(X_i) / sup_{θ∈Θ_0} ∏_{i=1}^n p_θ(X_i) ).

H_0 is rejected for large values of Λ_n. In the following we consider the alternative statistic

    Λ_n = 2 log ( sup_{θ∈Θ} ∏_{i=1}^n p_θ(X_i) / sup_{θ∈Θ_0} ∏_{i=1}^n p_θ(X_i) ),

where Θ = Θ_0 ∪ Θ_1.

Goal
Study distributional properties of Λ_n.
Likelihood Ratio Tests

Example
Multinomial distribution
We consider a multinomially distributed random vector N = (N_1, ..., N_k) with parameters n and p = (p_1, ..., p_k). The ML-estimator for p_i is known to be p̂_i = N_i/n. The log likelihood ratio for testing H_0 : p ∈ P_0 against H_1 : p ∉ P_0 is given by

    Λ_n = 2 log [ (n choose N_1, ..., N_k) (N_1/n)^{N_1} ⋯ (N_k/n)^{N_k}
                  / sup_{p∈P_0} (n choose N_1, ..., N_k) p_1^{N_1} ⋯ p_k^{N_k} ]
        = 2 inf_{p∈P_0} ∑_{i=1}^k N_i log ( N_i / (n p_i) ).

From the general result of this chapter it will follow that the statistic Λ_n is asymptotically χ²_{k−1}-distributed.
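The χ² limit can be checked by simulation. A sketch for a simple null P_0 = {p_0}, so the infimum in the formula is attained at p_0 (the fair four-cell null and the sample sizes are illustrative choices, not from the slides):

```python
import math
import random

random.seed(4)

# Multinomial LR statistic against a simple null P_0 = {p0}:
#   Lambda_n = 2 * sum_i N_i * log(N_i / (n * p0_i)),
# asymptotically chi^2 with k - 1 degrees of freedom under H_0.
p0 = [0.25, 0.25, 0.25, 0.25]
k, n, reps = len(p0), 500, 3000

def lr_stat():
    counts = [0] * k
    for _ in range(n):  # draw one multinomial sample by inverse CDF
        u, acc = random.random(), 0.0
        for i, p in enumerate(p0):
            acc += p
            if u < acc:
                counts[i] += 1
                break
    return 2.0 * sum(c * math.log(c / (n * p)) for c, p in zip(counts, p0) if c > 0)

mean_stat = sum(lr_stat() for _ in range(reps)) / reps
print(mean_stat)  # ~ k - 1 = 3, the mean of chi^2_{k-1}
```

The simulated mean matches the mean k − 1 of the χ²_{k−1} limit; a histogram of the statistic would match the whole density.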
Likelihood Ratio Tests
Application
Testing goodness-of-fit
We want to test whether the distribution of an i.i.d. sample X1, ..., Xn with values in 𝒳 belongs to the parametric model (Pθ : θ ∈ Θ). Let 𝒳1, ..., 𝒳k be a partition of the sample space and N1, ..., Nk the numbers of observations falling into the sets of the partition. Then N = (N1, ..., Nk) is multinomially distributed with parameters n and (p1, ..., pk). The original test can be formulated as testing

H0 : (p1, ..., pk) = (Pθ(𝒳1), ..., Pθ(𝒳k)) for some θ ∈ Θ

against

H1 : (p1, ..., pk) ≠ (Pθ(𝒳1), ..., Pθ(𝒳k)) for every θ ∈ Θ.
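The reduction of the original sample to a multinomial vector of cell counts can be sketched as follows (a hypothetical numpy example; the partition, sample size, and model N(θ, 1) with θ = 0 are illustrative assumptions):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)   # i.i.d. sample, model N(θ, 1), θ = 0

# Partition the sample space R into k = 4 cells X_1, ..., X_4
edges = np.array([-np.inf, -0.5, 0.0, 0.5, np.inf])
N, _ = np.histogram(x, bins=edges)   # cell counts (N_1, ..., N_k)

# Under H0 the cell probabilities are p_i = P_θ(X_i); here for θ = 0
cdf = NormalDist().cdf
p = np.diff([0.0] + [cdf(e) for e in edges[1:-1]] + [1.0])

print(N, p)  # N is multinomial with parameters n = 500 and p under H0
```

The goodness-of-fit test then proceeds on (N, p) exactly as in the multinomial example above.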
We want to use local asymptotic normality to derive the asymptotic distribution of the likelihood ratio statistic.

Local parameter spaces: H_n = √n (Θ − ϑ) and H_{n,0} = √n (Θ0 − ϑ). Then

\[
\begin{aligned}
\Lambda_n
&= 2 \log \frac{\sup_{\theta \in \Theta} \prod_{i=1}^n p_\theta(X_i)}{\sup_{\theta \in \Theta_0} \prod_{i=1}^n p_\theta(X_i)}
= 2 \log \frac{\sup_{h \in H_n} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\sup_{h \in H_{n,0}} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)} \\
&= 2 \log \frac{\sup_{h \in H_n} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i) \big/ \prod_{i=1}^n p_\vartheta(X_i)}{\sup_{h \in H_{n,0}} \prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i) \big/ \prod_{i=1}^n p_\vartheta(X_i)} \\
&= 2 \sup_{h \in H_n} \log \frac{\prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\prod_{i=1}^n p_\vartheta(X_i)}
 - 2 \sup_{h \in H_{n,0}} \log \frac{\prod_{i=1}^n p_{\vartheta + h/\sqrt{n}}(X_i)}{\prod_{i=1}^n p_\vartheta(X_i)}.
\end{aligned}
\]
Connection to Chapter 7
For large n, the above likelihood ratio process is similar to the likelihood ratio process of the normal experiment \((\mathcal{N}(h, I_\vartheta^{-1}) : h \in \mathbb{R}^k)\).

If Hn and Hn,0 converge to sets H and H0, the sequence Λn converges in distribution to Λ, given by

\[
\Lambda = 2 \sup_{h \in H} \log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X)
 - 2 \sup_{h \in H_0} \log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X).
\]

This is related to testing h ∈ H0 versus h ∈ H \ H0 based on the observation X in the normal experiment.
Reminder: limit distribution \(\mathcal{N}(h, I_\vartheta^{-1})\)

The log likelihood ratio process is given by

\[
\log \frac{d\mathcal{N}(h, I_\vartheta^{-1})}{d\mathcal{N}(0, I_\vartheta^{-1})}(X)
= \log \frac{\frac{1}{(2\pi)^{k/2} \sqrt{\det I_\vartheta^{-1}}} \exp\!\left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h)\right)}{\frac{1}{(2\pi)^{k/2} \sqrt{\det I_\vartheta^{-1}}} \exp\!\left(-\tfrac{1}{2} X^T I_\vartheta X\right)}
= -\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X.
\]
Likelihood ratio in the normal case
\[
\begin{aligned}
\Lambda
&= 2 \sup_{h \in H} \left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X\right)
 - 2 \sup_{h \in H_0} \left(-\tfrac{1}{2}(X - h)^T I_\vartheta (X - h) + \tfrac{1}{2} X^T I_\vartheta X\right) \\
&= \inf_{h \in H_0} (X - h)^T I_\vartheta (X - h) - \inf_{h \in H} (X - h)^T I_\vartheta (X - h) \\
&= \inf_{h \in H_0} \left(I_\vartheta^{1/2}(X - h)\right)^T I_\vartheta^{1/2}(X - h)
 - \inf_{h \in H} \left(I_\vartheta^{1/2}(X - h)\right)^T I_\vartheta^{1/2}(X - h) \\
&= \inf_{h \in H_0} \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} h\|^2 - \inf_{h \in H} \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} h\|^2 \\
&= \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2 - \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H\|^2.
\end{aligned}
\]
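When H and H0 are linear subspaces, the two squared distances in the last line are obtained by orthogonal projection onto the transformed sets \(I_\vartheta^{1/2}H_0\) and \(I_\vartheta^{1/2}H\). A small numerical sketch (the Fisher information, observation, and subspaces are made-up illustrations):

```python
import numpy as np

def sq_dist_to_subspace(y, B):
    """Squared Euclidean distance of y to the column span of B."""
    h, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ h) ** 2))

# Made-up 2-dimensional illustration
I = np.array([[2.0, 0.5], [0.5, 1.0]])   # Fisher information I_ϑ
A = np.linalg.cholesky(I).T              # any A with AᵀA = I_ϑ serves as I_ϑ^{1/2} here
X = np.array([0.7, -0.3])                # "observation" from N(h, I_ϑ⁻¹)

H = np.eye(2)                            # H  = R² (ϑ an inner point of Θ)
H0 = np.array([[0.0], [1.0]])            # H0 = {0} × R (testing the first coordinate)

y = A @ X
lam = (sq_dist_to_subspace(y, A @ H0)
       - sq_dist_to_subspace(y, A @ H))  # Λ = ‖y − I^{1/2}H0‖² − ‖y − I^{1/2}H‖²
print(lam)
```

Since H is the full space, the second distance is zero and Λ reduces to the squared distance of \(I_\vartheta^{1/2}X\) to \(I_\vartheta^{1/2}H_0\), as on the next slide.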
Rigorous formulation
Theorem
Let the model (Pθ : θ ∈ Θ) be differentiable in quadratic mean at ϑ with nonsingular Fisher information matrix I_ϑ, and suppose that for every θ1 and θ2 in a neighborhood of ϑ and for a measurable function \(\dot{\ell}\) with \(E_\vartheta[\dot{\ell}^2] < \infty\),

\[
|\log p_{\theta_1}(x) - \log p_{\theta_2}(x)| \le \dot{\ell}(x) \, \|\theta_1 - \theta_2\|.
\]

If the maximum likelihood estimators \(\hat{\theta}_{n,0}\) and \(\hat{\theta}_n\) are consistent under ϑ and the sets H_{n,0} and H_n converge to sets H0 and H, then, under ϑ + h/√n,

\[
\Lambda_n \Rightarrow \Lambda,
\]

where

\[
\Lambda = \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2 - \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H\|^2
\]

with normally distributed \(X \sim \mathcal{N}(h, I_\vartheta^{-1})\).
Chi-squared distribution
If ϑ is an inner point of Θ, we have H = ℝ^k and therefore

\[
\Lambda = \|I_\vartheta^{1/2} X - I_\vartheta^{1/2} H_0\|^2.
\]

If ϑ is the true parameter, the distribution of Λn corresponds to the distribution of Λ under h = 0. In this case the random vector \(I_\vartheta^{1/2} X\) is standard normal.

Lemma: Let Z be a k-dimensional random vector with a standard normal distribution and let H0 be an r-dimensional linear subspace of ℝ^k. Then ‖Z − H0‖² is χ²_{k−r}-distributed.

Hence, if √n (Θ0 − ϑ) → H0 for a linear subspace H0 of ℝ^k with dim H0 = r, the likelihood ratio statistic Λn is asymptotically χ²_{k−r}-distributed.
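The lemma can be checked by simulation (a hypothetical numpy sketch; the subspace H0 is an arbitrary illustration): χ²_{k−r} has mean k − r, so the empirical mean of ‖Z − H0‖² should be close to k − r.

```python
import numpy as np

rng = np.random.default_rng(2)
k, r = 5, 2

# A made-up r-dimensional linear subspace H0 of R^k (the column span of B)
B = rng.normal(size=(k, r))
P = B @ np.linalg.inv(B.T @ B) @ B.T     # orthogonal projection onto H0

Z = rng.normal(size=(100_000, k))        # standard normal vectors in R^k
d2 = np.sum((Z - Z @ P) ** 2, axis=1)    # ‖Z − H0‖² = ‖(I − P)Z‖²

# Lemma: d2 is χ²_{k−r}-distributed; its mean should be near k − r = 3
print(d2.mean())
```

The design choice here is to represent H0 by a projection matrix, since the distance to a linear subspace is exactly the norm of the projection residual.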
Examples
Location scale
Suppose we have a sample from the density \(\frac{1}{\sigma} f\!\left(\frac{x - \mu}{\sigma}\right)\) for a given probability density f, with location-scale parameter θ = (µ, σ) ∈ Θ = ℝ × ℝ⁺. Note that ϑ = (0, σ) is an inner point of Θ and H_n = √n (Θ − ϑ) = ℝ × (−√n σ, ∞) converges to ℝ × ℝ.

Consider some testing problems:

H0 : µ = 0 versus H1 : µ ≠ 0: This corresponds to the set Θ0 = {0} × ℝ⁺. From

\[
H_{n,0} = \sqrt{n}\,(\Theta_0 - \vartheta) = \{0\} \times (-\sqrt{n}\,\sigma, \infty) \xrightarrow{n \to \infty} \{0\} \times \mathbb{R}
\]

it follows that the sequence of likelihood ratio statistics is asymptotically χ²₁-distributed.
→ Level-α test: reject the null hypothesis if Λn exceeds the (1 − α)-quantile of the χ²₁-distribution.
Location scale (continued)
H0 : µ ≤ 0 versus H1 : µ > 0: This corresponds to the set Θ0 = (−∞, 0] × ℝ⁺ and

\[
H_{n,0} = (-\infty, 0] \times (-\sqrt{n}\,\sigma, \infty) \xrightarrow{n \to \infty} (-\infty, 0] \times \mathbb{R} = H_0,
\]

which is not a linear subspace of ℝ × ℝ. Thus the limit distribution of the likelihood ratio statistic is not χ², but equals the distribution of

\[
\|Z - I_\vartheta^{1/2} H_0\|^2
\]

with Z standard normally distributed. The set \(I_\vartheta^{1/2} H_0 = \{h : \langle h, I_\vartheta^{-1/2} e_1\rangle \le 0\}\) is a half space whose boundary is a line through the origin.
Location scale (continued)
Because a standard normal vector is rotationally symmetric, the limit distribution equals the squared distance of Z to the half space {h : h₂ ≤ 0}. This is the distribution of (Z ∨ 0)² for Z ∼ 𝒩(0, 1). Because

\[
P\big((Z \vee 0)^2 > c\big) = \tfrac{1}{2}\, P\big(Z^2 > c\big)
\]

for every c > 0, we choose the critical value equal to the (1 − 2α)-quantile of χ²₁ to reach level α.

If ϑ is an inner point of Θ0, the sets H_{n,0} converge to ℝ × ℝ and the sequence of likelihood ratio statistics converges in distribution to 0.
→ The probability of an error of the first kind converges to 0.
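The choice of the (1 − 2α)-quantile can be verified numerically (a sketch using only numpy and the standard library; since P(Z² > c) = 2α gives c = z₁₋α², no χ² quantile function is needed):

```python
import numpy as np
from statistics import NormalDist

alpha = 0.05
# (1 − 2α)-quantile of χ²_1:  P(Z² > c) = 2α  ⇔  c = z_{1−α}²
c = NormalDist().inv_cdf(1 - alpha) ** 2

rng = np.random.default_rng(3)
Z = rng.normal(size=1_000_000)
lam = np.maximum(Z, 0.0) ** 2   # the limit statistic (Z ∨ 0)²

print(np.mean(lam > c))          # empirical rejection rate, should be ≈ α
```

The simulated rejection frequency of the event {(Z ∨ 0)² > c} stays close to α, confirming that the one-sided test is calibrated by the (1 − 2α)-quantile.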
Thank you for your attention.