STAT 497 LECTURE NOTE 9: DIAGNOSTIC CHECKS

Upload: neil-douglas

Post on 18-Jan-2016


DIAGNOSTIC CHECKS

• After identifying and estimating a time series model, the goodness of fit of the model and the validity of its assumptions should be checked. If the fitted model passes these checks, we can use it to construct the ARIMA forecasts.


1. NORMALITY OF ERRORS

• Check the histogram of the standardized residuals, $\hat{a}_t / \hat{\sigma}_a$.

• Draw the normal QQ-plot of the standardized residuals (the points should fall on a straight line along the 45° line).

• Look at Tukey's simple 5-number summary, plus the skewness (should be 0 for normal) and the kurtosis (should be 3 for normal) or the excess kurtosis (should be 0 for normal).
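The numerical checks in the last bullet can be sketched in base R as follows. This is only an illustration: `res` is a simulated stand-in for the standardized residuals (an assumption; with a fitted model one would use something like `rstandard(fit)`), and the moment formulas use the divisor n.

```r
# Simulated stand-in for standardized residuals (assumption, not real data).
set.seed(497)
res <- rnorm(200)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(res)

# Moment-based skewness and kurtosis with divisor n.
d    <- res - mean(res)
m2   <- mean(d^2)
skew <- mean(d^3) / m2^(3 / 2)   # close to 0 for normal data
kurt <- mean(d^4) / m2^2         # close to 3 for normal data
excess_kurt <- kurt - 3          # close to 0 for normal data

print(five)
print(c(skewness = skew, kurtosis = kurt, excess_kurtosis = excess_kurt))
```

For real residuals, values of the skewness or excess kurtosis far from 0 suggest a departure from normality that the formal tests can then confirm.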


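• Jarque-Bera Normality Test: Skewness and kurtosis are used for constructing this test statistic. Jarque and Bera (1981) test whether the coefficients of skewness and excess kurtosis are jointly 0.

$$\text{skewness: } \sqrt{\beta_1} = \frac{E(a_t^3)}{\left[E(a_t^2)\right]^{3/2}}, \qquad \text{kurtosis: } \beta_2 = \frac{E(a_t^4)}{\left[E(a_t^2)\right]^{2}}$$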


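• JB test statistic:

$$JB = \frac{n}{6}\left[\hat{\beta}_1 + \frac{(\hat{\beta}_2 - 3)^2}{4}\right] \overset{approx}{\sim} \chi^2_2 \quad \text{under } H_0,$$

where $\sqrt{\hat{\beta}_1}$ is the sample skewness and $\hat{\beta}_2$ is the sample kurtosis:

$$\sqrt{\hat{\beta}_1} = \frac{\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^3}{\left[\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2\right]^{3/2}}, \qquad \hat{\beta}_2 = \frac{\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^4}{\left[\frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2\right]^{2}}$$

• If $JB > \chi^2_{\alpha,2}$, then reject the null hypothesis that the residuals are normally distributed.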
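As a rough sketch, the JB statistic can be computed directly from the sample moments in base R. The function name `jb_test` and the simulated residuals are assumptions made for illustration; the `tseries` package provides `jarque.bera.test()` for routine use.

```r
# Jarque-Bera statistic from sample skewness and kurtosis (divisor n).
# jb_test is a hypothetical helper name, not part of any package.
jb_test <- function(x) {
  x <- as.numeric(x[!is.na(x)])
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^(3 / 2)                # sample skewness
  kurt <- mean(d^4) / m2^2                      # sample kurtosis
  jb <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)     # JB statistic
  p <- pchisq(jb, df = 2, lower.tail = FALSE)   # chi-square(2) approximation
  list(statistic = jb, p.value = p, skewness = skew, kurtosis = kurt)
}

set.seed(1)
res <- rnorm(150)          # placeholder residuals (assumption)
out <- jb_test(res)
print(out)

# Decision rule at level alpha = 0.05: reject if JB exceeds the chi-square(2) quantile.
reject <- out$statistic > qchisq(0.95, df = 2)
```

The p-value here relies on the chi-square approximation, so it should be read with caution when the number of residuals is small.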


• The chi-square approximation, however, is overly sensitive for small samples, rejecting the null hypothesis often when it is in fact true. Furthermore, the distribution of p-values departs from a uniform distribution and becomes a right-skewed uni-modal distribution, especially for small p-values. This leads to a large Type I error rate. The table below shows some p-values approximated by a chi-square distribution that differ from their true alpha levels for very small samples.

• You can also use the Shapiro-Wilk test.


Calculated p-value equivalents to true alpha levels at given sample sizes

True α level    n = 20    n = 30    n = 50    n = 70    n = 100
0.1             0.307     0.252     0.201     0.183     0.1560
0.05            0.1461    0.109     0.079     0.067     0.062
0.025           0.051     0.0303    0.020     0.016     0.0168
0.01            0.0064    0.0033    0.0015    0.0012    0.002
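One way to see how the chi-square approximation behaves at a particular sample size is a small Monte Carlo experiment. The sketch below is an assumption-laden illustration (sample size, number of replications, seed, and the helper name `jb_stat` are all arbitrary choices) and does not reproduce the table above; it estimates how often the chi-square critical value rejects when the data really are normal, and what critical value the simulation itself would suggest.

```r
# JB statistic from sample moments (divisor n); jb_stat is a hypothetical helper.
jb_stat <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  n / 6 * ((mean(d^3) / m2^1.5)^2 + (mean(d^4) / m2^2 - 3)^2 / 4)
}

set.seed(2016)
n_obs <- 20      # sample size to examine (assumption)
n_sim <- 5000    # number of simulated normal samples (assumption)
alpha <- 0.05

# JB statistics for samples drawn under the null hypothesis of normality.
sims <- replicate(n_sim, jb_stat(rnorm(n_obs)))

# Share of true-null samples rejected when the chi-square(2) critical value is used.
empirical_size <- mean(sims > qchisq(1 - alpha, df = 2))

# Simulated critical value that matches the nominal level at this sample size.
sim_crit <- unname(quantile(sims, 1 - alpha))

print(c(empirical_size = empirical_size, simulated_critical_value = sim_crit))
```

Comparing `empirical_size` with `alpha` shows how far the approximation is off for that n; the simulated critical value can be used in place of the chi-square quantile.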


2. DETECTION OF THE SERIAL CORRELATION

• In OLS regression with time series data, the residuals are often found to be serially correlated with their own lagged values.

• Serial correlation means:
– OLS is no longer an efficient linear estimator.
– Standard errors are incorrect and generally understated (positive serial correlation, the usual case, makes the OLS standard errors too small).
– OLS estimates are biased and inconsistent if a lagged dependent variable is used as a regressor.


• The Durbin-Watson test is designed for ordinary regression with independent variables. It is not appropriate for time series models with lagged dependent variables, and it only tests for AR(1) errors. The model should contain a constant term and deterministic independent variables.
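For reference, the Durbin-Watson statistic is $d = \sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2 / \sum_{t=1}^{n}\hat{u}_t^2$, which is roughly $2(1 - r_1)$ with $r_1$ the lag-1 residual autocorrelation. A minimal base R sketch, assuming a simulated regression on a linear trend with AR(1) errors (the data and variable names are made up for illustration):

```r
# Simulated regression with a constant, a deterministic trend and AR(1) errors.
set.seed(9)
n <- 100
trend <- 1:n
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))   # AR(1) errors (assumption)
y <- 2 + 0.5 * trend + e

ols <- lm(y ~ trend)
u <- as.numeric(resid(ols))

# Durbin-Watson statistic: values near 2 indicate no first-order autocorrelation,
# values well below 2 indicate positive autocorrelation.
dw <- sum(diff(u)^2) / sum(u^2)

# Lag-1 residual autocorrelation and the approximation d ~ 2 * (1 - r1).
r1 <- sum(u[-1] * u[-n]) / sum(u^2)
print(c(DW = dw, approx = 2 * (1 - r1)))
```

The statistic only gives the value of d; the decision still requires the Durbin-Watson bounds tables or a packaged test.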


• The Serial Correlation Lagrange Multiplier (Breusch-Godfrey) Test is valid in the presence of lagged dependent variables. It tests for AR(r) errors.

Let $\hat{a}_t$ be the residual at time t. The auxiliary linear model is

$$\hat{a}_t = \alpha_1 \hat{a}_{t-1} + \dots + \alpha_r \hat{a}_{t-r} + u_t, \quad \text{where } u_t \sim N(0, \sigma_u^2).$$


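• The test hypothesis, where $\alpha_1, \dots, \alpha_r$ are the coefficients on the lagged residuals in the auxiliary regression:

$$H_0: \alpha_1 = 0 \text{ and } \alpha_2 = 0 \text{ and } \dots \text{ and } \alpha_r = 0 \quad \text{(no serial autocorrelation up to } r\text{-th order)}$$

$$H_1: \text{at least one } \alpha_i \neq 0.$$

• Test statistic:

$$(n - r)R^2 \sim \chi^2_r,$$

where $R^2$ is obtained from the auxiliary regression.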
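The auxiliary regression and the statistic $(n-r)R^2$ can be sketched in base R as below. This follows the simplified form on the slide, regressing the residual only on its own lags; the helper name `bg_lm_test`, the intercept in the auxiliary regression, and the simulated series are assumptions for illustration.

```r
# LM-type serial correlation check: regress the residual on its first r lags
# and compare (n - r) * R^2 with a chi-square(r) distribution.
# bg_lm_test is a hypothetical helper name.
bg_lm_test <- function(res, r) {
  res <- as.numeric(res)
  n <- length(res)
  lagmat <- embed(res, r + 1)              # col 1: a_t, cols 2..r+1: a_{t-1},...,a_{t-r}
  a_now  <- lagmat[, 1]
  a_lags <- lagmat[, -1, drop = FALSE]
  aux <- lm(a_now ~ a_lags)                # auxiliary regression (with intercept)
  stat <- (n - r) * summary(aux)$r.squared
  list(statistic = stat, df = r,
       p.value = pchisq(stat, df = r, lower.tail = FALSE))
}

# Example: an AR(2) series fitted with a deliberately too-small AR(1) model.
set.seed(3)
y <- arima.sim(list(ar = c(0.5, 0.3)), n = 300)
fit <- arima(y, order = c(1, 0, 0))
out <- bg_lm_test(resid(fit), r = 2)
print(out)
```

A small p-value points to remaining serial correlation in the residuals up to order r.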


• Determination of r: No obvious answer exists. In empirical studies, common choices are:

– for AR and ARMA models: r = p + 1 lags

– for seasonal models: r = s.


• Ljung-Box (Modified Box-Pierce) or Portmanteau Lack-of-Fit Test: Box and Pierce (1970) developed a test to check the autocorrelation structure of the residuals. It was later modified by Ljung and Box.

• The null hypothesis to be tested:

$$H_0: \rho_1 = \rho_2 = \dots = \rho_K = 0$$


• The test statistic:

$$Q = n(n+2)\sum_{k=1}^{K}\frac{\hat{\rho}_k^2}{n-k}$$

where n is the number of observations, K is the maximum lag length, and $\hat{\rho}_k$ is the sample ACF of the residuals at lag k.


• If the correct model is estimated,

$$Q \overset{approx}{\sim} \chi^2_{K-m}, \quad \text{where } m = p + q.$$

• If Q exceeds the tabulated critical value, $Q > \chi^2_{\alpha, K-m}$, reject $H_0$. This means that autocorrelation exists in the residuals, so the assumption is violated. Check the model again; consider adding another lag to the AR or MA part of the model.
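The Q statistic and its K - m degrees of freedom can be checked by hand against `Box.test()` from base R, whose `fitdf` argument subtracts the number of fitted ARMA parameters. The simulated AR(1) series and the choice K = 10 below are assumptions for illustration.

```r
# Ljung-Box Q computed directly and compared with stats::Box.test().
set.seed(15)
y <- arima.sim(list(ar = 0.7), n = 200)   # simulated AR(1) series (assumption)
fit <- arima(y, order = c(1, 0, 0))
res <- as.numeric(resid(fit))

n <- length(res)
K <- 10    # maximum lag (assumption)
m <- 1     # p + q for the fitted AR(1)

# Sample ACF of the residuals at lags 1..K (the first element is lag 0).
rho <- acf(res, lag.max = K, plot = FALSE)$acf[-1]

Q <- n * (n + 2) * sum(rho^2 / (n - 1:K))
p_value <- pchisq(Q, df = K - m, lower.tail = FALSE)

# fitdf = m makes Box.test use K - m degrees of freedom.
bt <- Box.test(res, lag = K, type = "Ljung-Box", fitdf = m)
print(c(Q = Q, p_value = p_value))
print(bt)
```

Without `fitdf`, `Box.test()` uses K degrees of freedom, which ignores the parameters estimated in the ARMA model.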


3. DETECTING HETEROSCEDASTICITY

• Heteroscedasticity is a violation of the constant error variance assumption. It occurs if the variance of the error changes over time:

$$Var(a_t) = \sigma_t^2.$$


• ACF-PACF PLOT OF SQUARED RESIDUALS: Since $\{a_t\}$ is a zero-mean process, the variance of $a_t$ is the expected value of $a_t^2$. So, if the $a_t$'s are homoscedastic, the variance is constant (it does not change over time), and the ACF and PACF plots of the squared residuals should lie within the 95% white noise (WN) limits. If not, this is a sign of heteroscedasticity.


• Let $r_t$ be the log return of an asset at time t. We are going to look at the study of volatility: the series is either serially uncorrelated or has only minor lower-order serial correlations, but it is a dependent series.

• Examine the ACF for the residuals and squared residuals for the calamari catch data. The catch data had a definite seasonality, which was removed. Then, the remaining series was modelled with an AR(5) model and the residuals of this model are obtained.

• There are various definitions of what constitutes weak dependence of a time series. However, the operational definition of independence here will be that both the autocorrelation functions of the series and the squared series show no autocorrelation. If there is no serial correlation of the series but there is of the squared series, then we will say there is weak dependence. This will lead us to examine the volatility of the series, since that is exemplified by the squared terms.


Figure 1: Residuals after AR(5) fitted to the deseasoned calamari data. [Plot: Autocorrelation Function for RESI1, with 5% significance limits for the autocorrelations; x-axis: Lag (1 to 16), y-axis: Autocorrelation (-1.0 to 1.0).]

Figure 2: Autocorrelation of the squared residuals. [Plot: Autocorrelation Function for C3, with 5% significance limits for the autocorrelations; x-axis: Lag (1 to 16), y-axis: Autocorrelation (-1.0 to 1.0).]


Figure 3: Autocorrelation for the log returns for the Intel series. [Plot: Autocorrelation Function for C1, with 5% significance limits for the autocorrelations; x-axis: Lag (1 to 60), y-axis: Autocorrelation (-1.0 to 1.0).]


Figure 4: ACF of the squared returns. [Plot: Autocorrelation Function for C2, with 5% significance limits for the autocorrelations; x-axis: Lag (1 to 60), y-axis: Autocorrelation (-1.0 to 1.0).]

Figure 5: PACF for squared returns. [Plot: Partial Autocorrelation Function for C2, with 5% significance limits for the partial autocorrelations; x-axis: Lag (1 to 60), y-axis: Partial Autocorrelation (-1.0 to 1.0).]

Combining these three plots, it appears that this series is serially uncorrelated but dependent. Volatility models attempt to capture such dependence in the return series.


• If we ignore heteroscedasticity:

– The OLS estimator is unbiased but not efficient. The GLS or WLS estimator is the Gauss-Markov estimator.

– The estimate of the variance of the OLS estimator is a biased estimator of the true variance. The classical testing procedures are invalidated.

• Now, the question is: how can we detect heteroscedasticity?


• White's General Test for Heteroscedasticity: The null hypothesis is

$$H_0: Var(a_t) = E(a_t^2 \mid Y_{t-1}, Y_{t-2}, \dots) = \sigma_a^2 = \text{constant}.$$

• After the identified model is estimated, we obtain the residuals, $\hat{a}_t$. Then $\hat{a}_t^2$ can be written in terms of the levels, squares, and cross-products of the lagged observations, which gives the following artificial regression:

$$\hat{a}_t^2 = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \beta_1 Y_{t-1}^2 + \beta_2 Y_{t-2}^2 + \dots + \gamma_1 Y_{t-1}Y_{t-2} + \dots + u_t.$$

• The homoscedastic case implies that $\alpha_1 = \alpha_2 = \dots = \beta_1 = \beta_2 = \dots = \gamma_1 = \gamma_2 = \dots = 0$, therefore

$$E(a_t^2 \mid Y_{t-1}, Y_{t-2}, \dots) = \alpha_0.$$


• Then, the test statistic is given by

$$nR^2 \sim \chi^2_m$$

under the null hypothesis of homoscedasticity, where $R^2$ comes from the artificial regression for $\hat{a}_t^2$ and m is the number of variables in the artificial regression except the constant term.
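A minimal base R sketch of this $nR^2$ test, restricted to two lags for illustration: the squared residuals are regressed on the levels, squares, and cross-product of $Y_{t-1}$ and $Y_{t-2}$. The simulated AR(2) series and all variable names are assumptions, and n is taken as the number of observations used in the artificial regression.

```r
# White-type test with two lags on a simulated AR(2) series (assumption).
set.seed(23)
y <- as.numeric(arima.sim(list(ar = c(0.5, -0.2)), n = 300))
fit <- arima(y, order = c(2, 0, 0))
a2 <- as.numeric(resid(fit))^2          # squared residuals

n <- length(y)
idx <- 3:n                              # time points with both lags available
d <- data.frame(a2 = a2[idx], y1 = y[idx - 1], y2 = y[idx - 2])

# Artificial regression: levels, squares and cross-product of the two lags.
aux <- lm(a2 ~ y1 + y2 + I(y1^2) + I(y2^2) + I(y1 * y2), data = d)

m <- length(coef(aux)) - 1              # regressors excluding the constant
stat <- nrow(d) * summary(aux)$r.squared
p_value <- pchisq(stat, df = m, lower.tail = FALSE)
print(c(statistic = stat, df = m, p_value = p_value))
```

With more lags, the number of squared and cross-product terms grows quickly, so m can become large relative to the sample size.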


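• The Breusch-Pagan Test: It is a Lagrange multiplier test for heteroscedasticity. Consider the inverted form (IF) of a time series, and assume that we can write our model as an AR(m):

$$(1 - \phi_1 B - \dots - \phi_m B^m)Y_t = \theta_0 + a_t.$$

• Then, consider testing

$$H_0: Var(a_t) = E(a_t^2 \mid Y_{t-1}, Y_{t-2}, \dots) = \sigma_a^2 = \text{constant}.$$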


• Note that we need to evaluate the conditional (on the independent variables) expectation of the squared error term:

$$a_t^2 = \alpha_0 + \alpha_1 Y_{t-1} + \dots + \alpha_m Y_{t-m} + u_t.$$

• The homoscedastic case implies that $\alpha_1 = \alpha_2 = \dots = \alpha_m = 0$.


• The problem, however, is that we do not know the squared error term $a_t^2$, but it can be replaced by an estimate $\hat{a}_t^2$. A simple approach is to run the regression

$$\hat{a}_t^2 = \alpha_0 + \alpha_1 Y_{t-1} + \dots + \alpha_m Y_{t-m} + u_t$$

and test if the slope coefficients are all equal to zero.


• The test statistic is

$$nR^2 \sim \chi^2_m$$

under the null hypothesis of homoscedasticity, where $R^2$ comes from the artificial regression for $\hat{a}_t^2$ and m is the number of variables in the artificial regression except the constant term.
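The regression of squared residuals on m lagged observations, followed by the $nR^2$ statistic, could be wrapped in a small base R function as sketched below. The name `bp_lag_test`, the simulated data, and the use of the auxiliary-regression sample size as n are assumptions for illustration.

```r
# Breusch-Pagan-type check: regress squared residuals on Y_{t-1},...,Y_{t-m}
# and compare n * R^2 with a chi-square(m) distribution.
# bp_lag_test is a hypothetical helper name.
bp_lag_test <- function(y, res, m) {
  y  <- as.numeric(y)
  a2 <- as.numeric(res)^2
  n  <- length(y)
  y_lags <- embed(y, m + 1)[, -1, drop = FALSE]   # columns: Y_{t-1}, ..., Y_{t-m}
  a2_use <- a2[(m + 1):n]                         # align with the available lags
  aux <- lm(a2_use ~ y_lags)                      # artificial regression
  stat <- length(a2_use) * summary(aux)$r.squared
  list(statistic = stat, df = m,
       p.value = pchisq(stat, df = m, lower.tail = FALSE))
}

# Example on a simulated homoscedastic AR(2) series (assumption).
set.seed(29)
y <- arima.sim(list(ar = c(0.4, 0.2)), n = 250)
fit <- arima(y, order = c(2, 0, 0))
out <- bp_lag_test(y, resid(fit), m = 2)
print(out)
```

A small p-value would indicate that the error variance depends on the lagged observations, which is the situation that motivates volatility modelling.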


• If we reject the null hypothesis, this means that the error variance is not constant. It is changing over time. Therefore, we need to model the volatility. ARCH (Autoregressive Conditional Heteroskedasticity) or GARCH (Generalized Autoregressive Conditional Heteroskedasticity) modeling helps us to model the error variance.


EXAMPLE (BEER)

> # 'beer' is assumed to be a quarterly ts object already loaded in the workspace
> library(TSA)
> fit=arima(beer,order=c(3,1,0),seasonal=list(order=c(3,0,0), period=4))
> par(mfrow=c(1,3))
> # time plot, ACF and PACF of the standardized residuals from 1975 onwards
> plot(window(rstandard(fit),start=c(1975,1)), ylab='Standardized Residuals',type='o')
> abline(h=0)
> acf(as.vector(window(rstandard(fit),start=c(1975,1))), lag.max=36)
> pacf(as.vector(window(rstandard(fit),start=c(1975,1))), lag.max=36)


> fit2=arima(beer,order=c(2,1,1),seasonal=list(order=c(3,0,1), period=4))
> fit2

Call:
arima(x = beer, order = c(2, 1, 1), seasonal = list(order = c(3, 0, 1), period = 4))

Coefficients:
          ar1      ar2      ma1    sar1    sar2     sar3     sma1
      -0.2567  -0.4255  -0.4990  1.1333  0.2656  -0.3991  -0.9721
s.e.   0.1426   0.1280   0.1501  0.1329  0.1656   0.1248   0.1160

sigma^2 estimated as 1.564:  log likelihood = -157.47,  aic = 328.95

> plot(window(rstandard(fit2),start=c(1975,1)), ylab='Standardized Residuals',type='o')
> abline(h=0)
> acf(as.vector(window(rstandard(fit2),start=c(1975,1))), lag.max=36)
> pacf(as.vector(window(rstandard(fit2),start=c(1975,1))), lag.max=36)


> hist(rstandard(fit2), xlab='Standardized Residuals')


> qqnorm(rstandard(fit2))
> qqline(rstandard(fit2))


> shapiro.test(window(rstandard(fit2),start=c(1975,1)))

        Shapiro-Wilk normality test

data:  window(rstandard(fit2), start = c(1975, 1))
W = 0.9857, p-value = 0.4181

> # jarque.bera.test() comes from the tseries package
> library(tseries)
> jarque.bera.test(resid(fit2))

        Jarque Bera Test

data:  resid(fit2)
X-squared = 1.0508, df = 2, p-value = 0.5913


> tsdiag(fit2)


> Box.test(resid(fit2),lag=15,type = c("Ljung-Box"))

        Box-Ljung test

data:  resid(fit2)
X-squared = 24.2371, df = 15, p-value = 0.06118

> Box.test(resid(fit2),lag=15,type = c("Box-Pierce"))

        Box-Pierce test

data:  resid(fit2)
X-squared = 21.4548, df = 15, p-value = 0.1229


> # ACF and PACF of the squared residuals (check for heteroscedasticity)
> rr=resid(fit2)^2
> par(mfrow=c(1,2))
> acf(rr)
> pacf(rr)


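> par(mfrow=c(1,1))
> # plot the series with 12-step-ahead forecasts and keep the returned forecasts
> result=plot(fit2,n.ahead=12,ylab='Series & Forecasts',col=NULL,pch=19)
> abline(h=coef(fit2))
> forecast=result$pred
> cbind(beer,forecast)
> # same plot starting in 1975, showing forecasts with their limits
> plot(fit2,n1=1975,n.ahead=12,ylab='Series, Forecasts, Actuals & Limits', pch=19)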
