forecasting using r - rob j. hyndman · 2016. 10. 7. · forecasting using r differencing 27....

Forecasting using R

Rob J Hyndman

2.3 Stationarity and differencing

Outline

1 Stationarity

2 Differencing

3 Unit root tests

4 Lab session 10

5 Backshift notation

Stationarity

Definition: If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t, \dots, y_{t+s})$ does not depend on $t$.

A stationary series is:

- roughly horizontal
- constant variance
- no patterns predictable in the long-term

Stationary?

[Figure: Dow Jones Index against Day (0 to 300)]

[Figure: Change in Dow Jones Index against Day (0 to 300)]

[Figure: Number of strikes against Year (1950 to 1980)]

[Figure: Sales of new one-family houses, USA. Total sales against Year (1975 to 1995)]

[Figure: Price of a dozen eggs in 1993 dollars. $ against Year (1900 to 1980)]

[Figure: Number of pigs slaughtered in Victoria. Thousands against Year (1990 to 1995)]

[Figure: Annual Canadian Lynx Trappings. Number trapped against Year (1820 to 1920)]

[Figure: Australian quarterly beer production. Megalitres against Year (1995 to 2005)]

Stationarity

- Transformations help to stabilize the variance.
- For ARIMA modelling, we also need to stabilize the mean.

Non-stationarity in the mean

Identifying non-stationary series

- Time plot.
- The ACF of stationary data drops to zero relatively quickly.
- The ACF of non-stationary data decreases slowly.
- For non-stationary data, the value of $r_1$ is often large and positive.
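The ACF contrast described above can be sketched in base R. This is only an illustration under stated assumptions: the series is a simulated random walk, not the Dow-Jones data, and the names `rw`, `r1_rw` and `r1_diff` are hypothetical.

```r
set.seed(123)

# Simulated random walk: a standard example of a non-stationary series
# (illustrative data only, not the Dow-Jones index)
rw <- cumsum(rnorm(300))

# acf() returns autocorrelations for lags 0, 1, 2, ...;
# the second element is the lag-1 autocorrelation r1
r1_rw   <- acf(rw, plot = FALSE)$acf[2]
r1_diff <- acf(diff(rw), plot = FALSE)$acf[2]

# r1 should be large and positive for the random walk
# and close to zero for its first difference
print(c(random_walk = r1_rw, differenced = r1_diff))
```

Dropping `plot = FALSE` draws the usual ACF plots, where the slow decay of the random walk's ACF is easier to see than in a single number.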

Example: Dow-Jones index

[Figure: Dow Jones Index against Day (0 to 300), with its ACF for lags 0 to 25 (Series: dj)]

[Figure: Change in Dow Jones Index against Day (0 to 300), with its ACF for lags 0 to 25 (Series: diff(dj))]

Differencing

- Differencing helps to stabilize the mean.
- The differenced series is the change between each observation in the original series: $y'_t = y_t - y_{t-1}$.
- The differenced series will have only $T - 1$ values since it is not possible to calculate a difference $y'_1$ for the first observation.
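A minimal sketch of first differencing with base R's `diff()`, using a made-up five-point series (`y` and `dy` are hypothetical names chosen for illustration):

```r
# Hypothetical toy series with T = 5 observations
y <- c(3, 5, 4, 8, 10)

# First difference: y'_t = y_t - y_{t-1}
dy <- diff(y)
print(dy)      # 2 -1 4 2

# One observation is lost, so only T - 1 = 4 values remain
print(length(dy))
```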

Second-order differencing

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

$$
\begin{aligned}
y''_t &= y'_t - y'_{t-1} \\
      &= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \\
      &= y_t - 2y_{t-1} + y_{t-2}.
\end{aligned}
$$

- $y''_t$ will have $T - 2$ values.
- In practice, it is almost never necessary to go beyond second-order differences.
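The expansion above can be checked numerically. The sketch below assumes a made-up series and compares `diff(y, differences = 2)` with the explicit formula $y_t - 2y_{t-1} + y_{t-2}$; the names `d2`, `idx` and `manual` are hypothetical.

```r
# Hypothetical toy series with T = 6 observations
y <- c(3, 5, 4, 8, 10, 9)
T_len <- length(y)

# Second-order difference via diff()
d2 <- diff(y, differences = 2)

# The same quantity from the expanded formula, for t = 3, ..., T
idx <- 3:T_len
manual <- y[idx] - 2 * y[idx - 1] + y[idx - 2]

print(d2)                      # -3 5 -2 -3
print(all.equal(d2, manual))   # TRUE
print(length(d2))              # T - 2 = 4
```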

Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year.

$$y'_t = y_t - y_{t-m}$$

where $m$ = number of seasons.

- For monthly data $m = 12$.
- For quarterly data $m = 4$.
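As a small illustration, a seasonal difference can be taken in base R by passing the seasonal period as the `lag` argument of `diff()`. The quarterly values below are made up, and `x`, `m` and `sdx` are hypothetical names.

```r
# Hypothetical quarterly series covering three years (m = 4)
x <- ts(c(10, 20, 30, 40,
          12, 23, 31, 44,
          15, 25, 36, 47),
        frequency = 4, start = c(2000, 1))

m <- frequency(x)

# Seasonal difference: y'_t = y_t - y_{t-m}
sdx <- diff(x, lag = m)
print(sdx)           # 2 3 1 4 3 2 5 3

# The first m observations have no value a year earlier, so T - m values remain
print(length(sdx))   # 8
```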

Electricity production

autoplot(usmelec)

[Figure: usmelec against Time (1980 to 2010)]

autoplot(log(usmelec))

[Figure: log(usmelec) against Time (1980 to 2010)]

autoplot(diff(log(usmelec), 12))

[Figure: diff(log(usmelec), 12) against Time (1980 to 2010)]

autoplot(diff(diff(log(usmelec),12),1))

[Figure: diff(diff(log(usmelec), 12), 1) against Time (1980 to 2010)]

Electricity production

- Seasonally differenced series is closer to being stationary.
- Remaining non-stationarity can be removed with a further first difference.

If $y'_t = y_t - y_{t-12}$ denotes the seasonally differenced series, then the twice-differenced series is

$$
\begin{aligned}
y^*_t &= y'_t - y'_{t-1} \\
      &= (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) \\
      &= y_t - y_{t-1} - y_{t-12} + y_{t-13}.
\end{aligned}
$$

When both seasonal and first differences are applied...

- it makes no difference which is done first; the result will be the same.
- If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for a further first difference.
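The claim that the order of the two differences does not matter can be checked numerically. This is a sketch on a simulated monthly series (trend plus a seasonal pattern plus noise); the data and the variable names are assumptions for illustration only.

```r
set.seed(42)

# Hypothetical monthly series, 10 years long
n <- 120
y <- ts(0.5 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n),
        frequency = 12)

# Seasonal difference first, then first difference
seasonal_then_first <- diff(diff(y, lag = 12), lag = 1)

# First difference first, then seasonal difference
first_then_seasonal <- diff(diff(y, lag = 1), lag = 12)

# Both orders give the same series of T - 13 values
print(all.equal(as.numeric(seasonal_then_first),
                as.numeric(first_then_seasonal)))
print(length(seasonal_then_first))
```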

It is important that if differencing is used, the differences are interpretable.

Interpretation of differencing

- first differences are the change between one observation and the next;
- seasonal differences are the change from one year to the next.

But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.

Unit root tests

Statistical tests to determine the required order of differencing.

1 Augmented Dickey-Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.

2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.

3 Other tests available for seasonal data.

Dickey-Fuller test

Test for “unit root”

Estimate regression model

$$y'_t = \phi y_{t-1} + b_1 y'_{t-1} + b_2 y'_{t-2} + \cdots + b_k y'_{t-k}$$

where $y'_t$ denotes the differenced series $y_t - y_{t-1}$.

- Number of lagged terms, $k$, is usually set to be about 3.
- If original series, $y_t$, needs differencing, $\hat{\phi} \approx 0$.
- If $y_t$ is already stationary, $\hat{\phi} < 0$.
- In R: Use adf.test().
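To make the regression above concrete, here is a rough base R sketch that fits it by ordinary least squares on simulated data. It is not how `adf.test()` is implemented (the packaged test may include extra terms and it computes a proper test statistic and p-value); the helper `df_phi` and the simulated series are hypothetical and only illustrate how $\hat{\phi}$ behaves.

```r
# Hypothetical helper: estimate phi in
#   y'_t = phi * y_{t-1} + b_1 * y'_{t-1} + ... + b_k * y'_{t-k}
df_phi <- function(y, k = 3) {
  dy <- diff(y)
  # embed() puts y'_t in column 1 and y'_{t-1}, ..., y'_{t-k} in columns 2..k+1
  E <- embed(dy, k + 1)
  dy_t   <- E[, 1]
  dy_lag <- E[, -1, drop = FALSE]
  # y_{t-1}, aligned with the rows of E
  y_lag  <- y[(k + 1):(length(y) - 1)]
  # No intercept, matching the regression as written on the slide
  fit <- lm(dy_t ~ 0 + y_lag + dy_lag)
  unname(coef(fit)["y_lag"])
}

set.seed(1)
phi_rw <- df_phi(cumsum(rnorm(200)))  # random walk: needs differencing
phi_wn <- df_phi(rnorm(200))          # white noise: already stationary

# Expect phi_rw close to 0 and phi_wn clearly negative
print(c(random_walk = phi_rw, white_noise = phi_wn))
```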

Dickey-Fuller test in R

adf.test(x,
  alternative = c("stationary", "explosive"),
  k = trunc((length(x)-1)^(1/3)))

- $k = \lfloor (T - 1)^{1/3} \rfloor$
- Set alternative = stationary.

adf.test(dj)

##
##  Augmented Dickey-Fuller Test
##
## data:  dj
## Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816
## alternative hypothesis: stationary

How many differences?

ndiffs(x)
nsdiffs(x)

ndiffs(dj)

## [1] 1

nsdiffs(hsales)

## [1] 0

Lab Session 10

Backshift notation

A very useful notational device is the backward shift operator, $B$, which is used as follows:

$$B y_t = y_{t-1}.$$

In other words, $B$, operating on $y_t$, has the effect of shifting the data back one period. Two applications of $B$ to $y_t$ shifts the data back two periods:

$$B(B y_t) = B^2 y_t = y_{t-2}.$$

For monthly data, if we wish to shift attention to “the same month last year,” then $B^{12}$ is used, and the notation is $B^{12} y_t = y_{t-12}$.

Backshift notation

The backward shift operator is convenient for describing the process of differencing. A first difference can be written as

$$y'_t = y_t - y_{t-1} = y_t - B y_t = (1 - B) y_t.$$

Note that a first difference is represented by $(1 - B)$. Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:

$$y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - B)^2 y_t.$$

Backshift notation

- Second-order difference is denoted $(1 - B)^2$.
- Second-order difference is not the same as a second difference, which would be denoted $1 - B^2$.
- In general, a $d$th-order difference can be written as

$$(1 - B)^d y_t.$$

- A seasonal difference followed by a first difference can be written as

$$(1 - B)(1 - B^m) y_t.$$

Backshift notation

The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect.

$$
\begin{aligned}
(1 - B)(1 - B^m) y_t &= (1 - B - B^m + B^{m+1}) y_t \\
                     &= y_t - y_{t-1} - y_{t-m} + y_{t-m-1}.
\end{aligned}
$$

For monthly data, $m = 12$ and we obtain the same result as earlier.
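The multiplication of backshift terms can be mimicked in R by treating each operator as a vector of polynomial coefficients in $B$. The sketch below is only an illustration: `poly_mult` is a hypothetical helper written for this example, and the series `y` is simulated.

```r
# Hypothetical helper: multiply two polynomials in B, given as coefficient
# vectors ordered by increasing power (B^0, B^1, B^2, ...)
poly_mult <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
    }
  }
  out
}

m <- 12
first_diff    <- c(1, -1)                  # 1 - B
seasonal_diff <- c(1, rep(0, m - 1), -1)   # 1 - B^m

combined <- poly_mult(first_diff, seasonal_diff)
print(which(combined != 0) - 1)   # powers of B present: 0 1 12 13
print(combined[combined != 0])    # their coefficients:  1 -1 -1 1

# Applying the combined operator as a one-sided linear filter reproduces
# the seasonal difference followed by a first difference
set.seed(7)
y <- rnorm(60)
via_filter <- stats::filter(y, combined, sides = 1)
via_diff   <- diff(diff(y, lag = m), lag = 1)
print(all.equal(as.numeric(via_filter)[-(1:(m + 1))], via_diff))
```

The first $m + 1$ filtered values are `NA` because they would need observations from before the start of the series, which is why they are dropped before the comparison.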
