forecasting using r - rob j. hyndman · 2016. 10. 7. · forecasting using r di˙erencing 27....
TRANSCRIPT
-
Forecasting using R 1
Forecasting using R
Rob J Hyndman
2.3 Stationarity and differencing
-
Outline
1 Stationarity
2 Differencing
3 Unit root tests
4 Lab session 10
5 Backshift notation
Forecasting using R Stationarity 2
-
Stationarity
DefinitionIf {yt} is a stationary time series, then for all s, thedistribution of (yt, . . . , yt+s) does not depend on t.
A stationary series is:
roughly horizontalconstant varianceno patterns predictable in the long-term
Forecasting using R Stationarity 3
-
Stationarity
DefinitionIf {yt} is a stationary time series, then for all s, thedistribution of (yt, . . . , yt+s) does not depend on t.
A stationary series is:
roughly horizontalconstant varianceno patterns predictable in the long-term
Forecasting using R Stationarity 3
-
Stationary?
3600
3700
3800
3900
4000
0 50 100 150 200 250 300Day
Dow
Jon
es In
dex
Forecasting using R Stationarity 4
-
Stationary?
−100
−50
0
50
0 50 100 150 200 250 300Day
Cha
nge
in D
ow J
ones
Inde
x
Forecasting using R Stationarity 5
-
Stationary?
4000
5000
6000
1950 1955 1960 1965 1970 1975 1980Year
Num
ber
of s
trik
es
Forecasting using R Stationarity 6
-
Stationary?
40
60
80
1975 1980 1985 1990 1995Year
Tota
l sal
es
Sales of new one−family houses, USA
Forecasting using R Stationarity 7
-
Stationary?
100
200
300
1900 1920 1940 1960 1980Year
$
Price of a dozen eggs in 1993 dollars
Forecasting using R Stationarity 8
-
Stationary?
80
90
100
110
1990 1991 1992 1993 1994 1995Year
thou
sand
s
Number of pigs slaughtered in Victoria
Forecasting using R Stationarity 9
-
Stationary?
0
2000
4000
6000
1820 1840 1860 1880 1900 1920Year
Num
ber
trap
ped
Annual Canadian Lynx Trappings
Forecasting using R Stationarity 10
-
Stationary?
400
450
500
1995 2000 2005Year
meg
alitr
es
Australian quarterly beer production
Forecasting using R Stationarity 11
-
Stationarity
DefinitionIf {yt} is a stationary time series, then for all s, thedistribution of (yt, . . . , yt+s) does not depend on t.
Transformations help to stabilize the variance.For ARIMA modelling, we also need to stabilize the mean.
Forecasting using R Stationarity 12
-
Stationarity
DefinitionIf {yt} is a stationary time series, then for all s, thedistribution of (yt, . . . , yt+s) does not depend on t.
Transformations help to stabilize the variance.For ARIMA modelling, we also need to stabilize the mean.
Forecasting using R Stationarity 12
-
Non-stationarity in the mean
Identifying non-stationary series
time plot.The ACF of stationary data drops to zero relativelyquicklyThe ACF of non-stationary data decreases slowly.For non-stationary data, the value of r1 is often largeand positive.
Forecasting using R Stationarity 13
-
Example: Dow-Jones index
3600
3700
3800
3900
4000
0 50 100 150 200 250 300Day
Dow
Jon
es In
dex
Forecasting using R Stationarity 14
-
Example: Dow-Jones index
0.00
0.25
0.50
0.75
1.00
0 5 10 15 20 25Lag
AC
F
Series: dj
Forecasting using R Stationarity 15
-
Example: Dow-Jones index
−100
−50
0
50
0 50 100 150 200 250 300Day
Cha
nge
in D
ow J
ones
Inde
x
Forecasting using R Stationarity 16
-
Example: Dow-Jones index
−0.10
−0.05
0.00
0.05
0.10
0 5 10 15 20 25Lag
AC
F
Series: diff(dj)
Forecasting using R Stationarity 17
-
Outline
1 Stationarity
2 Differencing
3 Unit root tests
4 Lab session 10
5 Backshift notation
Forecasting using R Differencing 18
-
Differencing
Differencing helps to stabilize the mean.The differenced series is the change between eachobservation in the original series: y′t = yt − yt−1.The differenced series will have only T − 1 valuessince it is not possible to calculate a difference y′1 forthe first observation.
Forecasting using R Differencing 19
-
Second-order differencing
Occasionally the differenced data will not appearstationary and it may be necessary to difference the data asecond time:
y′′t = y′t − y′t−1
= (yt − yt−1)− (yt−1 − yt−2)= yt − 2yt−1 + yt−2.
y′′t will have T − 2 values.In practice, it is almost never necessary to go beyondsecond-order differences.
Forecasting using R Differencing 20
-
Second-order differencing
Occasionally the differenced data will not appearstationary and it may be necessary to difference the data asecond time:
y′′t = y′t − y′t−1
= (yt − yt−1)− (yt−1 − yt−2)= yt − 2yt−1 + yt−2.
y′′t will have T − 2 values.In practice, it is almost never necessary to go beyondsecond-order differences.
Forecasting using R Differencing 20
-
Second-order differencing
Occasionally the differenced data will not appearstationary and it may be necessary to difference the data asecond time:
y′′t = y′t − y′t−1
= (yt − yt−1)− (yt−1 − yt−2)= yt − 2yt−1 + yt−2.
y′′t will have T − 2 values.In practice, it is almost never necessary to go beyondsecond-order differences.
Forecasting using R Differencing 20
-
Seasonal differencing
A seasonal difference is the difference between anobservation and the corresponding observation from theprevious year.
y′t = yt − yt−mwherem = number of seasons.
For monthly datam = 12.For quarterly datam = 4.
Forecasting using R Differencing 21
-
Seasonal differencing
A seasonal difference is the difference between anobservation and the corresponding observation from theprevious year.
y′t = yt − yt−mwherem = number of seasons.
For monthly datam = 12.For quarterly datam = 4.
Forecasting using R Differencing 21
-
Seasonal differencing
A seasonal difference is the difference between anobservation and the corresponding observation from theprevious year.
y′t = yt − yt−mwherem = number of seasons.
For monthly datam = 12.For quarterly datam = 4.
Forecasting using R Differencing 21
-
Electricity production
autoplot(usmelec)
200
300
400
1980 1990 2000 2010Time
usm
elec
Forecasting using R Differencing 22
-
Electricity production
autoplot(log(usmelec))
5.1
5.4
5.7
6.0
1980 1990 2000 2010Time
log(
usm
elec
)
Forecasting using R Differencing 23
-
Electricity production
autoplot(diff(log(usmelec), 12))
0.0
0.1
1980 1990 2000 2010Time
diff(
log(
usm
elec
), 1
2)
Forecasting using R Differencing 24
-
Electricity production
autoplot(diff(diff(log(usmelec),12),1))
−0.15
−0.10
−0.05
0.00
0.05
0.10
1980 1990 2000 2010Time
diff(
diff(
log(
usm
elec
), 1
2), 1
)
Forecasting using R Differencing 25
-
Electricity productionSeasonally differenced series is closer to beingstationary.Remaining non-stationarity can be removed withfurther first difference.
If y′t = yt − yt−12 denotes seasonally differenced series,then twice-differenced series is
y∗t = y′t − y′t−1
= (yt − yt−12)− (yt−1 − yt−13)= yt − yt−1 − yt−12 + yt−13 .
Forecasting using R Differencing 26
-
Seasonal differencing
When both seasonal and first differences are applied. . .
it makes no difference which is done first—the resultwill be the same.If seasonality is strong, we recommend that seasonaldifferencing be done first because sometimes theresulting series will be stationary and there will be noneed for further first difference.
It is important that if differencing is used, the differencesare interpretable.
Forecasting using R Differencing 27
-
Seasonal differencing
When both seasonal and first differences are applied. . .
it makes no difference which is done first—the resultwill be the same.If seasonality is strong, we recommend that seasonaldifferencing be done first because sometimes theresulting series will be stationary and there will be noneed for further first difference.
It is important that if differencing is used, the differencesare interpretable.
Forecasting using R Differencing 27
-
Seasonal differencing
When both seasonal and first differences are applied. . .
it makes no difference which is done first—the resultwill be the same.If seasonality is strong, we recommend that seasonaldifferencing be done first because sometimes theresulting series will be stationary and there will be noneed for further first difference.
It is important that if differencing is used, the differencesare interpretable.
Forecasting using R Differencing 27
-
Interpretation of differencing
first differences are the change between oneobservation and the next;seasonal differences are the change between oneyear to the next.
But taking lag 3 differences for yearly data, for example,results in a model which cannot be sensibly interpreted.
Forecasting using R Differencing 28
-
Interpretation of differencing
first differences are the change between oneobservation and the next;seasonal differences are the change between oneyear to the next.
But taking lag 3 differences for yearly data, for example,results in a model which cannot be sensibly interpreted.
Forecasting using R Differencing 28
-
Outline
1 Stationarity
2 Differencing
3 Unit root tests
4 Lab session 10
5 Backshift notation
Forecasting using R Unit root tests 29
-
Unit root tests
Statistical tests to determine the required order ofdifferencing.
1 Augmented Dickey Fuller test: null hypothesis is thatthe data are non-stationary and non-seasonal.
2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: nullhypothesis is that the data are stationary andnon-seasonal.
3 Other tests available for seasonal data.
Forecasting using R Unit root tests 30
-
Dickey-Fuller test
Test for “unit root”
Estimate regression model
y′t = φyt−1 + b1y′t−1 + b2y
′t−2 + · · · + bky′t−k
where y′t denotes differenced series yt − yt−1.Number of lagged terms, k, is usually set to be about3.If original series, yt, needs differencing, φ̂ ≈ 0.If yt is already stationary, φ̂ < 0.In R: Use adf.test().
Forecasting using R Unit root tests 31
-
Dickey-Fuller test in R
adf.test(x,alternative = c("stationary", "explosive"),k = trunc((length(x)-1)^(1/3)))
k = bT − 1c1/3Set alternative = stationary.
adf.test(dj)
#### Augmented Dickey-Fuller Test#### data: dj## Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816## alternative hypothesis: stationary
Forecasting using R Unit root tests 32
-
Dickey-Fuller test in R
adf.test(x,alternative = c("stationary", "explosive"),k = trunc((length(x)-1)^(1/3)))
k = bT − 1c1/3Set alternative = stationary.
adf.test(dj)
#### Augmented Dickey-Fuller Test#### data: dj## Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816## alternative hypothesis: stationary
Forecasting using R Unit root tests 32
-
Dickey-Fuller test in R
adf.test(x,alternative = c("stationary", "explosive"),k = trunc((length(x)-1)^(1/3)))
k = bT − 1c1/3Set alternative = stationary.
adf.test(dj)
#### Augmented Dickey-Fuller Test#### data: dj## Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816## alternative hypothesis: stationary
Forecasting using R Unit root tests 32
-
How many differences?
ndiffs(x)nsdiffs(x)
ndiffs(dj)
## [1] 1
nsdiffs(hsales)
## [1] 0
Forecasting using R Unit root tests 33
-
Outline
1 Stationarity
2 Differencing
3 Unit root tests
4 Lab session 10
5 Backshift notation
Forecasting using R Lab session 10 34
-
Lab Session 10
Forecasting using R Lab session 10 35
-
Outline
1 Stationarity
2 Differencing
3 Unit root tests
4 Lab session 10
5 Backshift notation
Forecasting using R Backshift notation 36
-
Backshift notationA very useful notational device is the backward shiftoperator, B, which is used as follows:
Byt = yt−1 .
In other words, B, operating on yt, has the effect ofshifting the data back one period. Two applications of Bto yt shifts the data back two periods:
B(Byt) = B2yt = yt−2 .
For monthly data, if we wish to shift attention to “thesame month last year,” then B12 is used, and the notationis B12yt = yt−12.
Forecasting using R Backshift notation 37
-
Backshift notationA very useful notational device is the backward shiftoperator, B, which is used as follows:
Byt = yt−1 .
In other words, B, operating on yt, has the effect ofshifting the data back one period. Two applications of Bto yt shifts the data back two periods:
B(Byt) = B2yt = yt−2 .
For monthly data, if we wish to shift attention to “thesame month last year,” then B12 is used, and the notationis B12yt = yt−12.
Forecasting using R Backshift notation 37
-
Backshift notationA very useful notational device is the backward shiftoperator, B, which is used as follows:
Byt = yt−1 .
In other words, B, operating on yt, has the effect ofshifting the data back one period. Two applications of Bto yt shifts the data back two periods:
B(Byt) = B2yt = yt−2 .
For monthly data, if we wish to shift attention to “thesame month last year,” then B12 is used, and the notationis B12yt = yt−12.
Forecasting using R Backshift notation 37
-
Backshift notationA very useful notational device is the backward shiftoperator, B, which is used as follows:
Byt = yt−1 .
In other words, B, operating on yt, has the effect ofshifting the data back one period. Two applications of Bto yt shifts the data back two periods:
B(Byt) = B2yt = yt−2 .
For monthly data, if we wish to shift attention to “thesame month last year,” then B12 is used, and the notationis B12yt = yt−12.
Forecasting using R Backshift notation 37
-
Backshift notation
The backward shift operator is convenient for describingthe process of differencing. A first difference can bewritten as
y′t = yt − yt−1 = yt − Byt = (1− B)yt .
Note that a first difference is represented by (1− B).Similarly, if second-order differences (i.e., first differencesof first differences) have to be computed, then:
y′′t = yt − 2yt−1 + yt−2 = (1− B)2yt .
Forecasting using R Backshift notation 38
-
Backshift notation
The backward shift operator is convenient for describingthe process of differencing. A first difference can bewritten as
y′t = yt − yt−1 = yt − Byt = (1− B)yt .
Note that a first difference is represented by (1− B).Similarly, if second-order differences (i.e., first differencesof first differences) have to be computed, then:
y′′t = yt − 2yt−1 + yt−2 = (1− B)2yt .
Forecasting using R Backshift notation 38
-
Backshift notation
The backward shift operator is convenient for describingthe process of differencing. A first difference can bewritten as
y′t = yt − yt−1 = yt − Byt = (1− B)yt .
Note that a first difference is represented by (1− B).Similarly, if second-order differences (i.e., first differencesof first differences) have to be computed, then:
y′′t = yt − 2yt−1 + yt−2 = (1− B)2yt .
Forecasting using R Backshift notation 38
-
Backshift notation
The backward shift operator is convenient for describingthe process of differencing. A first difference can bewritten as
y′t = yt − yt−1 = yt − Byt = (1− B)yt .
Note that a first difference is represented by (1− B).Similarly, if second-order differences (i.e., first differencesof first differences) have to be computed, then:
y′′t = yt − 2yt−1 + yt−2 = (1− B)2yt .
Forecasting using R Backshift notation 38
-
Backshift notation
Second-order difference is denoted (1− B)2.Second-order difference is not the same as a seconddifference, which would be denoted 1− B2;In general, a dth-order difference can be written as
(1− B)dyt.
A seasonal difference followed by a first differencecan be written as
(1− B)(1− Bm)yt .
Forecasting using R Backshift notation 39
-
Backshift notation
The “backshift” notation is convenient because the termscan be multiplied together to see the combined effect.
(1− B)(1− Bm)yt = (1− B− Bm + Bm+1)yt= yt − yt−1 − yt−m + yt−m−1.
For monthly data,m = 12 and we obtain the same result asearlier.
Forecasting using R Backshift notation 40
-
Backshift notation
The “backshift” notation is convenient because the termscan be multiplied together to see the combined effect.
(1− B)(1− Bm)yt = (1− B− Bm + Bm+1)yt= yt − yt−1 − yt−m + yt−m−1.
For monthly data,m = 12 and we obtain the same result asearlier.
Forecasting using R Backshift notation 40
StationarityDifferencingUnit root testsLab session 10Backshift notation