Lecture 3: Estimating Parameters and Assessing Normality
Måns Thulin
Department of Mathematics, Uppsala University
Multivariate Methods, 1/4 2011
Homeworks

- To pass the course (grade 3), all mandatory problems must be solved, you must hold an oral presentation of a clustering method and you must pass the exam.
- For grade 4, you must present satisfactory solutions to at least 4 bonus problems (at least one from each homework).
- For grade 5, you must present satisfactory solutions to at least 8 bonus problems (at least two from each homework).
- Bonus problems on the exam can be counted as belonging to the corresponding homework.
Outline

- Sample moments
  - Unbiasedness
  - Asymptotics
- Estimation for the multivariate normal distribution
  - Maximum likelihood estimation
  - Distributions of estimators
- Assessing normality
  - How to investigate the validity of the assumption of normality
- Outliers
- Transformations to normality
Multivariate data

We study a p-dimensional data set consisting of n observations. The data is stored in an n × p matrix:

X = [ x11 x12 ... x1p ]
    [ x21 x22 ... x2p ]
    [  ⋮   ⋮   ⋱   ⋮  ]
    [ xn1 xn2 ... xnp ]

Row j contains the p measurements for subject j: xjk = measurement k for subject j.
Sample moments

Sample mean:

X̄ = (x̄1, x̄2, ..., x̄p)',  where x̄k = (1/n) ∑_{j=1}^n xjk.

Sample covariance matrix:

S = [ s11 s12 ... s1p ]
    [ s12 s22 ... s2p ]
    [  ⋮   ⋮   ⋱   ⋮  ]
    [ s1p s2p ... spp ]

where skℓ = sℓk = (1/(n−1)) ∑_{j=1}^n (xjk − x̄k)(xjℓ − x̄ℓ),  k, ℓ = 1, 2, ..., p.
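As a small sketch (not from the slides), the sample mean vector and sample covariance matrix above can be computed directly; note that numpy's `np.cov` with `ddof=1` uses exactly the 1/(n−1) scaling in the definition.

```python
import numpy as np

# A small 4 x 2 data matrix: n = 4 subjects, p = 2 measurements each.
X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [2.0, 4.0],
              [6.0, 1.0]])

xbar = X.mean(axis=0)                  # sample mean vector
S = np.cov(X, rowvar=False, ddof=1)    # sample covariance, 1/(n-1) scaling

# The same quantities computed straight from the definitions on the slide:
n = X.shape[0]
centered = X - xbar
S_manual = centered.T @ centered / (n - 1)

print(xbar)                       # [3. 3.]
print(np.allclose(S, S_manual))   # True
```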
Sample moments: Unbiasedness

Theorem. Let X1, ..., Xn be an i.i.d. sample from a joint distribution with mean µ and covariance matrix Σ. Then

E(X̄) = µ  and  Cov(X̄) = (1/n) Σ.

Furthermore, E(S) = Σ.

X̄ is an unbiased estimator of µ and S is an unbiased estimator of Σ.
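The unbiasedness of X̄ and S can be illustrated by Monte Carlo (a sketch under an arbitrary choice of µ and Σ, not taken from the slides): averaging the estimates over many simulated samples should recover the true parameters even for small n.

```python
import numpy as np

# Monte Carlo check of unbiasedness: average X-bar and S over many
# simulated samples of size n = 10 and compare to mu and Sigma.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 10, 20000

means = np.empty((reps, 2))
covs = np.empty((reps, 2, 2))
for r in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    means[r] = X.mean(axis=0)
    covs[r] = np.cov(X, rowvar=False, ddof=1)  # the unbiased estimator S

print(means.mean(axis=0))  # close to mu
print(covs.mean(axis=0))   # close to Sigma, even though n is small
```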
Sample moments: Asymptotics

Let X1, ..., Xn be i.i.d. observations with mean µ. Then we have

the multivariate law of large numbers:

X̄ →p µ as n → ∞,

that is, P(|X̄ − µ| > ε) → 0 as n → ∞ for all ε > 0.

If we further assume that the observations have a finite covariance matrix Σ, then we also have

the multivariate central limit theorem:

√n (X̄ − µ) →d N(0, Σ) as n → ∞,

that is, the distribution function of √n (X̄ − µ) converges to the distribution function of the N(0, Σ) distribution.
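A simulation sketch of the multivariate CLT (my own illustration, assuming bivariate observations with i.i.d. unit-rate exponential components, which are clearly non-normal): for large n, √n(X̄ − µ) should behave like N(0, Σ), here with Σ equal to the identity.

```python
import numpy as np

# CLT sketch: unit-rate exponentials have mean 1 and variance 1, so for
# independent components Sigma = I. Collect sqrt(n)(X-bar - mu) over many
# replications and inspect its covariance.
rng = np.random.default_rng(1)
n, reps, p = 200, 5000, 2
mu = np.ones(p)

Z = np.empty((reps, p))
for r in range(reps):
    X = rng.exponential(scale=1.0, size=(n, p))
    Z[r] = np.sqrt(n) * (X.mean(axis=0) - mu)

print(Z.mean(axis=0))            # close to the zero vector
print(np.cov(Z, rowvar=False))   # close to the identity matrix
```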
Estimation

We would like to be able to estimate the parameters µ and Σ of the multivariate normal distribution.

- X̄ and S seem like natural (unbiased!) estimators. What are their properties?
- We'll find the maximum likelihood estimators of µ and Σ and study their distributions.
Estimation: ML principle

Let X1, ..., Xn be observations with densities f_{Xi,θ} with an unknown parameter θ. The maximum likelihood estimate of θ is the value of θ that maximizes the likelihood function

L(θ) = ∏_{i=1}^n f_{Xi}(xi).

In general, maximum likelihood estimators have desirable properties, such as consistency and asymptotic efficiency.
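The ML principle can be made concrete with a univariate sketch (my own example, assuming a normal model with known σ = 1): the log-likelihood is maximized at the sample mean, which a simple grid search confirms.

```python
import math

# ML principle sketch for N(mu, 1): the log-likelihood
# sum_i log f(x_i; mu) is maximized at mu = x-bar.
x = [2.1, 1.9, 2.5, 2.3, 1.7]
xbar = sum(x) / len(x)

def log_likelihood(mu):
    # log of prod_i (2*pi)^(-1/2) * exp(-(x_i - mu)^2 / 2)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - mu) ** 2 for xi in x)

# Evaluate on a fine grid; the grid maximizer should sit at (or next to) x-bar.
grid = [i / 1000 for i in range(1000, 3001)]  # mu in [1.0, 3.0]
mu_hat = max(grid, key=log_likelihood)
print(mu_hat, xbar)  # both approximately 2.1
```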
Estimation: MLE for MVN

For i.i.d. X1, ..., Xn from Np(µ, Σ), the likelihood function is

L(µ, Σ) = (2π)^{−np/2} |Σ|^{−n/2} exp( −(1/2) ∑_{i=1}^n (xi − µ)' Σ^{−1} (xi − µ) ).

- Taking µ = X̄ maximizes L(µ, Σ) with respect to µ.
- Taking Σ = ((n − 1)/n) S maximizes L(µ, Σ) with respect to Σ.
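A sketch of the MVN maximum likelihood estimates on simulated data (my own example): Σ̂ = ((n−1)/n)S is the MLE, and evaluating the log-likelihood shows it is at least as large at Σ̂ as at the unbiased S.

```python
import numpy as np

# ML estimates for N_p(mu, Sigma) and a log-likelihood comparison.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]], size=50)
n, p = X.shape

mu_hat = X.mean(axis=0)
S = np.cov(X, rowvar=False, ddof=1)   # unbiased estimator
Sigma_hat = (n - 1) / n * S           # maximum likelihood estimator

def log_lik(mu, Sigma):
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum(diff @ np.linalg.inv(Sigma) * diff)  # sum of quadratic forms
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

print(log_lik(mu_hat, Sigma_hat) >= log_lik(mu_hat, S))  # True
```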
Estimation: MLE for MVN

Some further remarks:

- Functions of parameters: if θ̂ is the ML estimator of θ, then h(θ̂) is the ML estimator of h(θ).
- For the multivariate normal distribution:
  - The ML estimator of µ'Σ^{−1}µ is µ̂'Σ̂^{−1}µ̂.
  - The ML estimator of √σii is √σ̂ii.
- For the multivariate normal distribution, X̄ and Sn are sufficient statistics.
- Thus all the information about µ and Σ in the data matrix X is contained in X̄ and Sn.
Estimation: Distribution of X̄

Theorem. Let X1, ..., Xn be i.i.d. observations from Np(µ, Σ). Then

X̄ ∼ Np(µ, (1/n) Σ).
Estimation: Distribution of Sn

In the univariate setting,

(n − 1)s² = ∑_{i=1}^n (Xi − X̄)² ∼ σ² · χ²(n − 1).

By the definition of the χ²-distribution, this means that ∑_{i=1}^n (Xi − X̄)² is distributed as

σ²(Z1² + ... + Z²_{n−1}) = (σZ1)² + ... + (σZ_{n−1})²,

where the Zi are i.i.d. N(0, 1), so that σZi ∼ N(0, σ²).

Returning to the multivariate setting, let Z1, ..., Zm be i.i.d. Np(0, Σ). The distribution of the matrix ∑_{i=1}^m Zi Zi' is called the Wishart distribution, denoted Wm(Σ), where m is called the degrees of freedom and Σ is called the scale matrix.
Estimation: Distribution of Sn

Theorem. Let X1, ..., Xn be i.i.d. observations from Np(µ, Σ). Then

(n − 1)S ∼ W_{n−1}(Σ).

Properties of the Wishart distribution:

- If A1 ∼ W_{m1}(Σ), A2 ∼ W_{m2}(Σ) and A1 and A2 are independent, then A1 + A2 ∼ W_{m1+m2}(Σ), i.e. their sum is Wishart distributed with m1 + m2 degrees of freedom.
- If A ∼ W_{m1}(Σ) then CAC' ∼ W_{m1}(CΣC').
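The Wishart construction from the definition can be simulated directly (a sketch with an arbitrary Σ, not from the slides); a useful Monte Carlo check is the known mean E[W] = mΣ of a Wm(Σ) matrix.

```python
import numpy as np

# Draw W = sum_i Z_i Z_i' with Z_1, ..., Z_m i.i.d. N_p(0, Sigma),
# and check E[W] = m * Sigma by averaging over many replications.
rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
m, reps = 5, 20000

W_sum = np.zeros((2, 2))
for _ in range(reps):
    Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=m)  # m draws, rows Z_i'
    W_sum += Z.T @ Z                                        # sum_i Z_i Z_i'

print(W_sum / reps)  # close to m * Sigma
```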
Assessing normality

- The assumption of normality is fundamental for many methods of multivariate statistics.
- Due to the multivariate central limit theorem, methods based on the normal distribution can often be used as approximations for large n, but it is often better to use other (perhaps non-parametric) methods if the data is non-normal or if n is small.
- For multivariate data, the ways in which distributions can deviate from normality are many and varied.
- Using univariate normality tests on the marginal distributions may miss departures in multivariate combinations of variables.
- Using multivariate tests may dilute the effects of a single non-normal variable.
Assessing normality: Graphical methods

Graphical presentations of the data can be very useful for detecting deviations from normality. Useful methods include:

- Scatter plots
- Q-Q plots of marginal distributions
- χ²-plots (β-plots)
Assessing normality: Scatter plots

Here all variables are normal. The histograms resemble the normal density and the point clouds are elliptic.

[Figure: scatter plot matrix of x1, x2, x3, x4; the plot itself is not recoverable from the transcript.]
Assessing normality: Scatter plots

Here X3 and X4 are non-normal. Their histograms are far from the normal density and the clouds are not elliptic.

[Figure: scatter plot matrix of x1, x2, x3, x4; the plot itself is not recoverable from the transcript.]
Assessing normality: Q-Q plots

Normal samples. First row n = 15, second row n = 40 and third row n = 100.

[Figure: 3 × 3 grid of normal Q-Q plots; the plots themselves are not recoverable from the transcript.]
Assessing normality: Q-Q plots

First column: uniform distribution. Second column: exponential distribution. Third column: β(1/2, 1/2) distribution (bimodal).

[Figure: 3 × 3 grid of normal Q-Q plots; the plots themselves are not recoverable from the transcript.]
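The mechanics behind a normal Q-Q plot can be sketched without any plotting library (my own illustration; the probability points (i − 0.5)/n are one common convention among several): pair the sorted sample with standard normal quantiles and, for near-normal data, the pairs fall close to a straight line, which can be summarized by their correlation.

```python
from statistics import NormalDist

# Q-Q plot sketch: theoretical quantiles at probability points (i - 0.5)/n
# paired with the order statistics of the sample.
def qq_pairs(sample):
    n = len(sample)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sorted(sample)

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# A symmetric, roughly normal-shaped sample: the Q-Q correlation is near 1.
sample = [-1.8, -1.1, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.1, 1.8]
theo, obs = qq_pairs(sample)
print(round(correlation(theo, obs), 3))  # close to 1
```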
Assessing normality: χ²-plots (β-plots)

For normal data, (Xj − X̄)'Σ^{−1}(Xj − X̄) ∼ χ²_p.

Idea: for large n,

d²j = (xj − x̄)'S^{−1}(xj − x̄)

should be approximately χ²_p-distributed. We could thus do a Q-Q plot of the d²j against the χ²_p quantiles. This reduces the p-dimensional data to just one dimension, giving a single Q-Q plot instead of p plots.

Problem: convergence to χ²_p turns out to be slow. However, Gnanadesikan and Kettenring (1972) showed that

n d²j / (n − 1)² ∼ β(p/2, (n − p − 1)/2),

and thus quantiles from the beta distribution are more appropriate to use.

Gnanadesikan & Kettenring (1972), Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data, Biometrics, 28, pp. 81-124.
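Computing the squared distances d²j is a one-liner with numpy (a sketch on simulated data, not from the slides). A handy exact check: the d²j always sum to (n − 1)p, since ∑j d²j = trace(S^{−1} · (n − 1)S) = (n − 1)p.

```python
import numpy as np

# Squared generalized distances d_j^2 = (x_j - x-bar)' S^{-1} (x_j - x-bar)
# used in the chi^2-plot / beta-plot.
rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=30)
n, p = X.shape

xbar = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False, ddof=1))
diff = X - xbar
d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)  # one d_j^2 per observation

print(np.isclose(d2.sum(), (n - 1) * p))  # True, by the trace identity
# For the beta-plot, n*d2/(n-1)^2 would be compared to
# beta(p/2, (n-p-1)/2) quantiles instead of chi^2_p quantiles.
```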
Assessing normality: Formal tests

- Univariate data:
  - The Shapiro-Wilk test.
  - Tests based on skewness and kurtosis.
  - Univariate tests can be used in multivariate analysis by looking at the marginal distributions one by one.
- Mardia's tests:
  - These tests are generalizations of the tests based on skewness and kurtosis.
Assessing normality: Shapiro-Wilk

For a univariate sample, consider the order statistics x(1) ≤ x(2) ≤ ... ≤ x(n). The Shapiro-Wilk test is based on the statistic

W = (∑_{i=1}^n ai x(i))² / ∑_{i=1}^n (x(i) − x̄)²,

where the coefficients ai are derived from the means and covariance matrix of the order statistics of a standard normal sample.

- Published in 1965.
- Scale and location invariant.
- A formalization of Q-Q plots.
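In practice the test is available off the shelf; a sketch (my own example) of applying `scipy.stats.shapiro` to each marginal of a data matrix, the marginal-by-marginal strategy mentioned above:

```python
import numpy as np
from scipy import stats

# Shapiro-Wilk on each marginal: stats.shapiro returns the statistic W
# (at most 1) and a p-value; small p-values indicate non-normality.
rng = np.random.default_rng(5)
X = np.column_stack([
    rng.normal(size=100),        # a normal marginal
    rng.exponential(size=100),   # a clearly skewed, non-normal marginal
])

results = [stats.shapiro(X[:, k]) for k in range(X.shape[1])]
for k, (W, pval) in enumerate(results):
    print(f"variable {k}: W = {W:.3f}, p = {pval:.4f}")
```

The exponential marginal is strongly rejected, while the normal one typically is not; remember that per-marginal testing can still miss joint departures from multivariate normality.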
Assessing normality: Skewness and kurtosis

For a univariate random variable X, the skewness is

γ = E(X − µ)³ / σ³

and the kurtosis is

κ = E(X − µ)⁴ / σ⁴ − 3.

Both these quantities are 0 for the normal distribution, but non-zero for many other distributions.

In particular, all symmetric distributions have γ = 0. κ is related to how heavy the tails of the distribution are, and to some extent to bimodality.

To use skewness and kurtosis to test for normality, compute the sample versions

γ̂ = [(1/n) ∑_{i=1}^n (xi − x̄)³] / [(1/n) ∑_{i=1}^n (xi − x̄)²]^{3/2}

and

κ̂ = [(1/n) ∑_{i=1}^n (xi − x̄)⁴] / [(1/n) ∑_{i=1}^n (xi − x̄)²]² − 3,

and reject the hypothesis of normality if the statistics are too far from 0.
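The sample versions are direct to compute; a sketch in NumPy (the critical values of the tests, which depend on n, are not shown):

```python
import numpy as np

def sample_skewness(x):
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)                     # (1/n) * sum of squared deviations
    return np.mean(d ** 3) / m2 ** 1.5

def sample_kurtosis(x):
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    return np.mean(d ** 4) / m2 ** 2 - 3.0   # excess kurtosis: 0 under normality

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(sample_skewness(x), sample_kurtosis(x))       # both close to 0
print(sample_skewness(rng.exponential(size=1000)))  # clearly positive
```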
Assessing normality: Univariate tests

I The skewness test based on γ is sensitive against asymmetric distributions, but not against kurtotic distributions.
I The kurtosis test based on κ is sensitive against kurtotic distributions, but not against asymmetric distributions.
I Should we use the skewness test or the kurtosis test? Rule of thumb: for inference about μ we should worry more about asymmetric distributions, while for inference about σ² deviations in kurtosis are more dangerous.
I The Shapiro-Wilk test is usually less sensitive than γ and κ against asymmetric and kurtotic alternatives, respectively, but has high average power against all classes of alternatives.

25/36
Assessing normality: Univariate tests

I A necessary, but not sufficient, condition for a distribution to be multivariate normal is that all marginal distributions are normal.
I A univariate normality test can thus be applied to each of the marginal variables in the p-variate sample.
I However, the variables may be dependent. If so, the outcomes of the p tests will also be dependent!
I What is the joint significance level of the normality tests? How can we control this level?
I One way of handling this problem is to use Bonferroni's inequality. (We'll discuss this in the next lecture.)
I Some authors suggest reducing the dimension of the problem by performing a univariate normality test on e_1' x_j, where e_1 is the eigenvector corresponding to the largest eigenvalue of S. (More on this when we discuss PCA.)

26/36
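A sketch of the per-marginal approach with a Bonferroni correction (assuming SciPy; testing each of the p marginals at level α/p keeps the joint significance level at most α):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0, 0], cov=np.eye(3), size=100)  # n x p sample

alpha = 0.05
n, p = X.shape
rejected = []
for k in range(p):
    W, pval = stats.shapiro(X[:, k])
    # Bonferroni: reject marginal normality only if pval < alpha / p,
    # so that P(at least one false rejection) <= alpha.
    reject = bool(pval < alpha / p)
    rejected.append(reject)
    print(f"variable {k}: p-value = {pval:.3f}, reject = {reject}")
```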
Assessing normality: Mardia's multivariate tests

Mardia's tests for multivariate normality are based on the statistics

d_{ij} = (x_i - \bar{x})' S^{-1} (x_j - \bar{x}).

The test statistics

\gamma_p^2 = \frac{1}{n^2} \sum_{i,j=1}^n d_{ij}^3 \quad \text{and} \quad \kappa_p = \frac{1}{n} \sum_{i=1}^n d_{ii}^2

are generalizations of γ² and κ.

I Published 1970.
I Scale and location invariant.
I Extends the notions of skewness and kurtosis to the multivariate setting.
I Various generalizations exist, in which the d_{ij} are used in slightly different ways.

27/36
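Both statistics are straightforward to compute from the matrix of d_ij values; a sketch in NumPy (here S is estimated with divisor n, one common convention):

```python
import numpy as np

def mardia_statistics(X):
    """Mardia's multivariate skewness and kurtosis statistics."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                     # covariance estimate (divisor n)
    D = Xc @ np.linalg.inv(S) @ Xc.T      # D[i, j] = d_ij
    skew = (D ** 3).sum() / n ** 2        # gamma_p^2
    kurt = (np.diag(D) ** 2).sum() / n    # kappa_p
    return skew, kurt

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=500)
g2, k = mardia_statistics(X)
print(g2, k)  # under normality: g2 near 0, k near p(p + 2) = 8
```

Note that, unlike the univariate κ, the value of κ_p under normality is close to p(p + 2) rather than 0.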
Assessing normality: Recommendations

When assessing multivariate normality, it is often a good idea to use more than one method, in order to account for different possible deviations from normality.

I Inspect scatter plots and univariate Q-Q-plots.
I Perform univariate tests of normality on each variable.
I Use methods based on dimension reduction: test some linear combination and look at the β plot.
I Use a test for multivariate normality.

Care must be taken to make sure that the joint significance level of the tests is reasonable!

28/36
Outliers

I Graphical methods:
  I Scatter plots
  I Chernoff faces
  I Stars
  I Andrews curves
I Examine the standardized observations z_{jk} = (x_{jk} - \bar{x}_k) / \sqrt{s_{kk}} and look for unusually large or small values.
I Examine d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x}) and look for unusually large values.
I Wilks' test

29/36
Outliers: Wilks' test

Wilks' test for outliers in a multivariate normal sample is based on the statistic

\Lambda = \min_j \Lambda_j \quad \text{where} \quad \Lambda_j = 1 - \frac{n \, d_j^2}{(n-1)^2};

recall that d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x}). If Λ is small, there are likely outliers in the sample.

I Published in 1963.
I A formalization of β plots.
I Equivalent to looking at max_j d_j^2.
I Related to the hat matrix in linear regression.
I A multitude of extensions and variations of the test exist.

30/36
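A sketch of computing the d_j² values and Λ_j in NumPy, with one planted outlier (the critical value of the test, which depends on n and p, is not shown):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0], np.eye(2), size=50)
X[0] = [6.0, 6.0]                     # plant an obvious outlier

n, p = X.shape
xbar = X.mean(axis=0)
Sinv = np.linalg.inv(np.cov(X, rowvar=False))  # S with divisor n - 1

# Squared Mahalanobis distances d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar)
d2 = np.einsum("ij,jk,ik->i", X - xbar, Sinv, X - xbar)

Lam = 1.0 - n * d2 / (n - 1) ** 2     # Wilks' statistics Lambda_j
j = int(np.argmin(Lam))               # minimizing Lambda_j = maximizing d_j^2
print(j, round(float(d2[j]), 2))      # picks out the planted outlier (index 0)
```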
Transformations to normality

If the data are found to be non-normal, it may still be possible to transform them to normality.

A useful family of power transformations was described by Box and Cox in 1964, along with a method for determining which transformation to use.

31/36
Transformations: Box–Cox transformation

Assumption: x_{ik} > 0. Box and Cox (1964):

x_{ik}^{(\lambda)} = \begin{cases} \dfrac{x_{ik}^{\lambda} - 1}{\lambda} & \text{when } \lambda \neq 0 \\ \ln(x_{ik}) & \text{when } \lambda = 0 \end{cases}

where i = 1, …, n and k is fixed.

Note that, by L'Hospital's rule, \lim_{\lambda \to 0} (x_{ik}^{\lambda} - 1)/\lambda = \ln(x_{ik}), so the transformation is continuous in λ.

32/36
Transformations: Box–Cox transformation

Assumption: x_{ik} > 0. Box and Cox (1964):

x_{ik}^{(\lambda)} = \begin{cases} \dfrac{x_{ik}^{\lambda} - 1}{\lambda} & \text{when } \lambda \neq 0 \\ \ln(x_{ik}) & \text{when } \lambda = 0 \end{cases}

where i = 1, …, n and k is fixed.

Which λ should we choose? A maximum likelihood approach is to find the λ that maximizes g(\lambda) = -\frac{n}{2} \ln(s^2(\lambda)). Rewritten:

g(\lambda) = (\lambda - 1) \sum_{i=1}^n \ln(x_{ik}) - \frac{n}{2} \ln[\hat{\sigma}^2(\lambda)]

where

\hat{\sigma}^2(\lambda) = \frac{1}{n} y^{(\lambda)\prime} (I - H) y^{(\lambda)}

and H is the hat matrix of the model under consideration.

R function: boxcox (in library MASS)

33/36
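As an alternative to R's boxcox, a sketch using SciPy (scipy.stats.boxcox maximizes the profile log-likelihood for a single sample, which corresponds to an intercept-only model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.lognormal(mean=0.0, sigma=0.7, size=200)  # positive, right-skewed data

# boxcox returns the transformed data and the ML estimate of lambda.
y, lam = stats.boxcox(x)
print(lam)  # for lognormal data the estimate should be near 0 (log transform)
```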
Transformations: Box–Cox transformation
[Figure: profile log-likelihood g(λ) plotted against λ for −1 ≤ λ ≤ 2, with the 95% confidence interval for λ marked.]
34/36
Transformations: Box–Cox method, notes

I Box–Cox gets upset by outliers: if one finds λ = 5, that is probably the reason.
I If some x_{ik} < 0, adding a small constant to all x_{ik} can sometimes work.
I If max_i x_{ik} / min_i x_{ik} is small, Box–Cox will not do anything, since power transforms are well approximated by linear transformations over short intervals.
I Should the estimation of λ count as an extra parameter to be taken into account in the degrees of freedom? Difficult question: λ is not a linear parameter.

35/36
Summary

I Sample moments
  I Unbiasedness
  I Asymptotics
I Estimation for the multivariate normal distribution
  I Maximum likelihood estimation
  I Distributions of estimators
I Assessing normality
  I How to investigate the validity of the assumption of normality
I Outliers
I Transformations to normality

36/36