lec 3: model adequacy checkinggauss.stat.su.se/gu/ep/lec3.pdfying li lec 3: model adequacy checking...

30
Lec 3: Model Adequacy Checking Ying Li November 16, 2011 Ying Li Lec 3: Model Adequacy Checking

Upload: others

Post on 14-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Lec 3: Model Adequacy Checking

Ying Li

November 16, 2011

Ying Li Lec 3: Model Adequacy Checking

Page 2: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Model validation

Model validation is a very important step in the modelbuilding procedure. (one of the most overlooked)

A high R2 value does not guarantee that the model fits thedata well.

Use of a model that does not fit the data well can not providegood answers to the underlying scientific questions.

Ying Li Lec 3: Model Adequacy Checking

Page 3: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

An interesting example: Ascombe dataset

4 6 8 10 12 14

45

67

89

1011

x1

y1

●●

4 6 8 10 12 14

34

56

78

9

x2

y2

4 6 8 10 12 14

68

1012

x3

y3

8 10 12 14 16 18

68

1012

x4

y4

Ying Li Lec 3: Model Adequacy Checking

Page 4: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

4 6 8 10 12 14

45

67

89

1011

x1

y1a=3.0001

b=0.5001

R_square=0.667

●●

4 6 8 10 12 14

34

56

78

9

x2

y2

a=3.0001

b=0.5001

R_square=0.667

4 6 8 10 12 14

68

1012

x3

y3

a=3.0001

b=0.5001

R_square=0.667

8 10 12 14 16 18

68

1012

x4

y4

a=3.0001

b=0.5001

R_square=0.667

Ying Li Lec 3: Model Adequacy Checking

Page 5: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Main Tool: Graphical residual Analysis

Different types of plots of residuals (histgram, Plot of residualsin time sequence, plot of residuals versus fitted values) provideinformation on the adequacy of different aspects of the model.

Graphical methods have an advantage over numericalmethods in model validation

graphical methods: a broad range of complex aspectsnumerical methods: narrowly focused on a particular aspect(anumber)

Ying Li Lec 3: Model Adequacy Checking

Page 6: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Why use residuals?

If the model fit to the data were correct, the residuals wouldapproximate the random errors.

If the residuals appear to behave as the assumptions of theerror it suggests the model fit the data well.

Otherwise the model fits the data poorly.(non-normality,dependency, heteroscedasticity).

Ying Li Lec 3: Model Adequacy Checking

Page 7: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Two concepts

Homogeneity (Homoscedasticity) : In statistics, asequence or a vector of random variables is homoscedastic ifall random variables in the sequence or vector have the samefinite variance. This is also known as homogeneity of variance.

Heteroscedasticity: a collection of random variables isheteroscedastic, or heteroscedastic, if there aresub-populations that have different variabilities than others

Ying Li Lec 3: Model Adequacy Checking

Page 8: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

The assumptions for ANOVA

yij = µ+ τi + εij

Assumptions of the errors:

εij are normally distributed

εij are independent

εij has mean zero and constant variance σ2.

Ying Li Lec 3: Model Adequacy Checking

Page 9: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Check the normality

Normal probability plot.

Ying Li Lec 3: Model Adequacy Checking

Page 10: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Check the normality

Normal probability plot.

Ying Li Lec 3: Model Adequacy Checking

Page 11: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Plot of residuals in Time sequence

Detect the correlation between the residuals.

Ying Li Lec 3: Model Adequacy Checking

Page 12: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Plot of residuals versus fitted values

Ying Li Lec 3: Model Adequacy Checking

Page 13: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Plot of residuals versus fitted values

Detect the nonconstant variance.

Ying Li Lec 3: Model Adequacy Checking

Page 14: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Statistical Tests for Equality of Variance

H0 : σ21 = σ22 = · · · = σ2a

H1 : above not true for at least oneσ2i

Bartlett’s test

Modified Levene test

Ying Li Lec 3: Model Adequacy Checking

Page 15: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Bartlett’s test

The test statistic isχ20 = 2.3026

q

c

where

q = (N − a)log10S2p −

a∑i=1

(ni − 1)log10S2i

c = 1 +1

3(a− 1)(

a∑i=1

(ni − 1)−1 − (N − a)−1)

S2p =

∑ai=1(ni − 1)S2

i

N − a

and S2i is the sample variance of the ith population.

Ying Li Lec 3: Model Adequacy Checking

Page 16: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

modified Levene test

The modified Levene test uses the absolute deviation of theobservations yij in each treatment from the treatment median yi .Denote these deviations by

dij = |yij − yi |.

The modified Levene test then evaluates whether or not the meansof these deviations are equal for all treatment.

Apply the ANOVA F test.

Ying Li Lec 3: Model Adequacy Checking

Page 17: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

modified Levene test

The modified Levene test uses the absolute deviation of theobservations yij in each treatment from the treatment median yi .Denote these deviations by

dij = |yij − yi |.

The modified Levene test then evaluates whether or not the meansof these deviations are equal for all treatment.Apply the ANOVA F test.

Ying Li Lec 3: Model Adequacy Checking

Page 18: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Bartlett’s test, good accuracy, but very sensitive to normalityassumption

Modified Levene test, robust to departures from normality.

Ying Li Lec 3: Model Adequacy Checking

Page 19: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

A dilemma

Assume that we test for homogeneity and whether theresiduals are normally distributed or not.

The more observations, the easier it is to show that therequirement are not fulfilled.

Conclusion: The validity of the ANOVA is reduced with thenumber of observation.

Ying Li Lec 3: Model Adequacy Checking

Page 20: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Recommendation

ANOVA is robust against minor heteroscedasticity and minordeviations form the normal distribution.

Do not use tests, but study the residual plot.

Ying Li Lec 3: Model Adequacy Checking

Page 21: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Data transformation

Suppose that the standard deviation of y is proportional to apower of the mean of y such that

σy ∝ µα

We want to transform the data to yield a constant variance.Usually we use

y? = yλ

Then it can be shown that

σy? ∝ µλ+α−1.

If we set λ = 1− α, the variance of the transformed data y? isconstant.

Ying Li Lec 3: Model Adequacy Checking

Page 22: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

How to find α

Estimate α empirically from the data. Consider σyi = θµαi . Wemake the logs

log σyi = log θ + α logµi .

Ying Li Lec 3: Model Adequacy Checking

Page 23: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Ying Li Lec 3: Model Adequacy Checking

Page 24: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Example

A civil engineer is interested in determining whether four differentmethods of estimating flood flow frequency produce equivalentestimates of peak discharge when applied to the same watershed.

Ying Li Lec 3: Model Adequacy Checking

Page 25: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Example

Ying Li Lec 3: Model Adequacy Checking

Page 26: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Example

Ying Li Lec 3: Model Adequacy Checking

Page 27: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Example

Ying Li Lec 3: Model Adequacy Checking

Page 28: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Example

After square-root transformation

Ying Li Lec 3: Model Adequacy Checking

Page 29: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

Box-Cox transformation family

In the Box-Cox transformation model it is assumed that there issuch that a transformation of the observed data y according to:

yλ = {yλ−1λ , λ 6= 0

logy , λ = 0

Ying Li Lec 3: Model Adequacy Checking

Page 30: Lec 3: Model Adequacy Checkinggauss.stat.su.se/gu/ep/lec3.pdfYing Li Lec 3: Model Adequacy Checking modi ed Levene test The modi ed Levene test uses the absolute deviation of the observations

The Kruskal-Wallis Test

1 Rank the observation yij in ascending order.

2 Replace each observation it its rank Rij .

3 The test statistic is

H =1

S2[

a∑i=1

R2i .

ni− N(N + 1)2

4],

where

S2 =1

N − 1[

a∑i=1

ni∑j=1

R2ij −

N(N + 1)2

4].

4 If ni ≥ 5, and H0 is ture, H approximately∼ χa−1 .If H > χα,a−1, then the null hypothesis is rejected.

Ying Li Lec 3: Model Adequacy Checking