Forecasting using R · Rob J. Hyndman · 2016. 10. 7.


  • Forecasting using R

    Rob J Hyndman

    2.3 Stationarity and differencing

  • Outline

    1 Stationarity

    2 Differencing

    3 Unit root tests

    4 Lab session 10

    5 Backshift notation

    Forecasting using R Stationarity 2

  • Stationarity

    Definition: If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t, \dots, y_{t+s})$ does not depend on $t$.

    A stationary series is:

      - roughly horizontal
      - constant variance
      - no patterns predictable in the long-term

  • Stationary?

    [Figure: time plot of the Dow Jones index against Day.]

  • Stationary?

    [Figure: time plot of the change in the Dow Jones index against Day.]

  • Stationary?

    [Figure: time plot of the number of strikes against Year.]

  • Stationary?

    [Figure: Sales of new one-family houses, USA. Total sales against Year.]

  • Stationary?

    [Figure: Price of a dozen eggs in 1993 dollars. $ against Year.]

  • Stationary?

    [Figure: Number of pigs slaughtered in Victoria. Thousands against Year.]

  • Stationary?

    [Figure: Annual Canadian Lynx Trappings. Number trapped against Year.]

  • Stationary?

    [Figure: Australian quarterly beer production. Megalitres against Year.]

  • Stationarity

    Definition: If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t, \dots, y_{t+s})$ does not depend on $t$.

    Transformations help to stabilize the variance.
    For ARIMA modelling, we also need to stabilize the mean.

  • Non-stationarity in the mean

    Identifying non-stationary series

      - Time plot.
      - The ACF of stationary data drops to zero relatively quickly.
      - The ACF of non-stationary data decreases slowly.
      - For non-stationary data, the value of $r_1$ is often large and positive.
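    A minimal R sketch of the ACF behaviour listed above, using simulated data rather than any series from the slides. The white noise `wn` and random walk `rw` are illustrative assumptions; only base R's `acf()` and `diff()` are used.

```r
set.seed(42)                      # reproducible simulated data (illustrative only)
wn <- rnorm(300)                  # white noise: stationary
rw <- cumsum(wn)                  # random walk: non-stationary in the mean

# acf() returns autocorrelations for lags 0, 1, 2, ...; element 2 is r1
acf_wn <- acf(wn, lag.max = 25, plot = FALSE)$acf
acf_rw <- acf(rw, lag.max = 25, plot = FALSE)$acf

r1_wn <- acf_wn[2]                # expected to be near zero
r1_rw <- acf_rw[2]                # expected to be large and positive

# First differences of the random walk are the white noise again (minus one value)
r1_diff <- acf(diff(rw), lag.max = 25, plot = FALSE)$acf[2]
```

    Dropping `plot = FALSE` draws the two correlograms, which should show the slow decay for `rw` and the quick drop for `wn`.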

  • Example: Dow-Jones index

    [Figure: time plot of the Dow Jones index against Day.]

  • Example: Dow-Jones index

    [Figure: ACF plot, Series: dj. ACF against Lag.]

  • Example: Dow-Jones index

    [Figure: time plot of the change in the Dow Jones index against Day.]

  • Example: Dow-Jones index

    [Figure: ACF plot, Series: diff(dj). ACF against Lag.]

  • Differencing

      - Differencing helps to stabilize the mean.
      - The differenced series is the change between consecutive observations in the original series: $y'_t = y_t - y_{t-1}$.
      - The differenced series will have only $T - 1$ values since it is not possible to calculate a difference $y'_1$ for the first observation.
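    As a small illustration of first differencing, the sketch below applies base R's `diff()` to a made-up vector (the values are not from the slides) and compares it with the difference computed by hand.

```r
y <- c(3, 5, 4, 8, 10, 9)      # made-up series with T = 6 observations
dy <- diff(y)                  # y'_t = y_t - y_{t-1}, for t = 2, ..., T

# The same thing by hand: drop the first value, subtract the lagged values
dy_manual <- y[-1] - y[-length(y)]

length(y)                      # T
length(dy)                     # T - 1
```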

  • Second-order differencing

    Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

    $$
    \begin{aligned}
    y''_t &= y'_t - y'_{t-1} \\
          &= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \\
          &= y_t - 2y_{t-1} + y_{t-2}.
    \end{aligned}
    $$

      - $y''_t$ will have $T - 2$ values.
      - In practice, it is almost never necessary to go beyond second-order differences.
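    The expansion above can be checked numerically. This is a sketch on made-up values, assuming only base R's `diff()` with its `differences` argument.

```r
y <- c(1, 4, 9, 16, 25, 36)          # illustrative values, T = 6
d2 <- diff(y, differences = 2)       # difference the differenced series

n <- length(y)
# y''_t = y_t - 2 y_{t-1} + y_{t-2}, for t = 3, ..., T
d2_manual <- y[3:n] - 2 * y[2:(n - 1)] + y[1:(n - 2)]

length(d2)                           # T - 2
```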

  • Seasonal differencing

    A seasonal difference is the difference between an observation and the corresponding observation from the previous year.

    $$y'_t = y_t - y_{t-m}, \quad \text{where } m = \text{number of seasons}.$$

      - For monthly data, $m = 12$.
      - For quarterly data, $m = 4$.
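    A short sketch of seasonal differencing in R. It uses `AirPassengers`, a monthly series shipped with base R, purely as a stand-in; it is not one of the series used in the slides.

```r
x <- AirPassengers                 # monthly series from base R (illustrative choice)
m <- frequency(x)                  # number of seasons: 12 for monthly data

sx <- diff(x, lag = m)             # y'_t = y_t - y_{t-m}

# By hand: each value minus the value from the same month one year earlier
n <- length(x)
sx_manual <- as.numeric(x)[(m + 1):n] - as.numeric(x)[1:(n - m)]
```

    The seasonally differenced series loses the first `m` observations, in the same way that a first difference loses one.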

  • Electricity production

    autoplot(usmelec)

    [Figure: time plot of usmelec against Time.]

  • Electricity production

    autoplot(log(usmelec))

    [Figure: time plot of log(usmelec) against Time.]

  • Electricity production

    autoplot(diff(log(usmelec), 12))

    [Figure: time plot of diff(log(usmelec), 12) against Time.]

  • Electricity production

    autoplot(diff(diff(log(usmelec),12),1))

    [Figure: time plot of diff(diff(log(usmelec), 12), 1) against Time.]

  • Electricity production

      - Seasonally differenced series is closer to being stationary.
      - Remaining non-stationarity can be removed with further first difference.

    If $y'_t = y_t - y_{t-12}$ denotes seasonally differenced series, then twice-differenced series is

    $$
    \begin{aligned}
    y^*_t &= y'_t - y'_{t-1} \\
          &= (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) \\
          &= y_t - y_{t-1} - y_{t-12} + y_{t-13}.
    \end{aligned}
    $$

  • Seasonal differencing

    When both seasonal and first differences are applied:

      - it makes no difference which is done first; the result will be the same.
      - If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for further first difference.

    It is important that if differencing is used, the differences are interpretable.
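    The claim that the order of differencing does not matter can be checked directly. The sketch below again uses base R's `AirPassengers` as a stand-in monthly series; the variable names are illustrative.

```r
x <- log(AirPassengers)            # base R monthly series, used only for illustration

seasonal_then_first <- diff(diff(x, lag = 12), lag = 1)
first_then_seasonal <- diff(diff(x, lag = 1), lag = 12)

# TRUE when both orders give the same values (up to floating-point tolerance)
same <- isTRUE(all.equal(as.numeric(seasonal_then_first),
                         as.numeric(first_then_seasonal)))
```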

  • Interpretation of differencing

      - First differences are the change between one observation and the next;
      - seasonal differences are the change from one year to the next.

    But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.

  • Unit root tests

    Statistical tests to determine the required order of differencing.

    1 Augmented Dickey Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.

    2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.

    3 Other tests available for seasonal data.

  • Dickey-Fuller test

    Test for “unit root”

    Estimate regression model

    $$y'_t = \phi y_{t-1} + b_1 y'_{t-1} + b_2 y'_{t-2} + \cdots + b_k y'_{t-k}$$

    where $y'_t$ denotes differenced series $y_t - y_{t-1}$.

      - Number of lagged terms, $k$, is usually set to be about 3.
      - If original series, $y_t$, needs differencing, $\hat{\phi} \approx 0$.
      - If $y_t$ is already stationary, $\hat{\phi} < 0$.
      - In R: Use adf.test() (from the tseries package).
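    To make the regression above concrete, here is a rough base R sketch that estimates it by ordinary least squares on a simulated random walk. It is not how `adf.test()` is implemented; the simulated series and the names `y_lag1`, `lagged_dy` and `phi_hat` are assumptions made for illustration, and the regression has no intercept only because that is how it is written on the slide.

```r
set.seed(1)
y <- cumsum(rnorm(300))              # simulated random walk, so differencing is needed
k <- 3                               # number of lagged differences, as on the slide
n <- length(y)

dy <- diff(y)                        # dy[i] = y[i + 1] - y[i]
E <- embed(dy, k + 1)                # column 1: y'_t; columns 2..(k+1): y'_{t-1}, ..., y'_{t-k}
y_lag1 <- y[(k + 1):(n - 1)]         # y_{t-1}, aligned row by row with column 1 of E
lagged_dy <- E[, -1, drop = FALSE]

# No intercept, matching the regression as written on the slide
fit <- lm(E[, 1] ~ 0 + y_lag1 + lagged_dy)
phi_hat <- unname(coef(fit)["y_lag1"])   # expected to be close to zero for a random walk
```

    The p-values printed by `summary(fit)` should not be used for the unit root test, because under the null hypothesis the statistic does not follow the usual t distribution; `adf.test()` takes care of that.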

  • Dickey-Fuller test in R

    adf.test(x,
             alternative = c("stationary", "explosive"),
             k = trunc((length(x)-1)^(1/3)))

      - $k = \lfloor (T - 1)^{1/3} \rfloor$
      - Set alternative = stationary.

    adf.test(dj)

    ##
    ## Augmented Dickey-Fuller Test
    ##
    ## data: dj
    ## Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816
    ## alternative hypothesis: stationary

    The large p-value means the null hypothesis of non-stationarity is not rejected, which suggests that dj needs differencing.

  • How many differences?

    ndiffs(x)
    nsdiffs(x)

    ndiffs(dj)

    ## [1] 1

    nsdiffs(hsales)

    ## [1] 0

  • Lab Session 10

  • Backshift notation

    A very useful notational device is the backward shift operator, $B$, which is used as follows:

    $$B y_t = y_{t-1}.$$

    In other words, $B$, operating on $y_t$, has the effect of shifting the data back one period. Two applications of $B$ to $y_t$ shift the data back two periods:

    $$B(B y_t) = B^2 y_t = y_{t-2}.$$

    For monthly data, if we wish to shift attention to “the same month last year,” then $B^{12}$ is used, and the notation is $B^{12} y_t = y_{t-12}$.
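    One way to picture the operator is as a function that shifts a vector back in time. The helper `backshift()` below is hypothetical (it is not part of base R or of any package used in the slides) and pads the start with `NA` where no earlier value exists.

```r
# Hypothetical helper: apply B^k to a numeric vector, padding the start with NA
backshift <- function(y, k = 1) {
  stopifnot(k >= 0, k < length(y))
  c(rep(NA, k), y[seq_len(length(y) - k)])
}

y <- c(10, 12, 15, 11, 14)           # made-up values

By  <- backshift(y)                  # B y_t    = y_{t-1}
B2y <- backshift(backshift(y))       # B(B y_t) = y_{t-2}
```

    Applying the helper twice gives the same result as `backshift(y, 2)`, which mirrors $B(By_t) = B^2 y_t$.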

  • Backshift notation

    The backward shift operator is convenient for describing the process of differencing. A first difference can be written as

    $$y'_t = y_t - y_{t-1} = y_t - B y_t = (1 - B) y_t.$$

    Note that a first difference is represented by $(1 - B)$. Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:

    $$y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - B)^2 y_t.$$

  • Backshift notation

      - Second-order difference is denoted $(1 - B)^2$.
      - Second-order difference is not the same as a second difference, which would be denoted $1 - B^2$.
      - In general, a $d$th-order difference can be written as

        $$(1 - B)^d y_t.$$

      - A seasonal difference followed by a first difference can be written as

        $$(1 - B)(1 - B^m) y_t.$$
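    The distinction between $(1 - B)^2$ and $1 - B^2$ maps onto two different arguments of base R's `diff()`. The sketch below uses made-up values to show that they give different results.

```r
y <- c(1, 4, 9, 16, 25, 36)                  # illustrative values

second_order <- diff(y, differences = 2)     # (1 - B)^2 y_t = y_t - 2 y_{t-1} + y_{t-2}
lag_two      <- diff(y, lag = 2)             # (1 - B^2) y_t = y_t - y_{t-2}
```

    Both results have $T - 2$ values, so the lengths alone do not reveal which operation was applied.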

  • Backshift notation

    The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect.

    $$
    \begin{aligned}
    (1 - B)(1 - B^m) y_t &= (1 - B - B^m + B^{m+1}) y_t \\
                         &= y_t - y_{t-1} - y_{t-m} + y_{t-m-1}.
    \end{aligned}
    $$

    For monthly data, $m = 12$ and we obtain the same result as earlier.
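    The multiplication of backshift terms can be mimicked by multiplying polynomial coefficient vectors. In the sketch below, `poly_mult()` is a hypothetical helper written for this example, and `AirPassengers` from base R stands in for a monthly series; the combined operator is then applied with `stats::filter()` and compared with two calls to `diff()`.

```r
# Hypothetical helper: multiply two polynomials in B given as coefficient
# vectors ordered by increasing power (B^0, B^1, B^2, ...)
poly_mult <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

m <- 12
first_diff    <- c(1, -1)                       # 1 - B
seasonal_diff <- c(1, rep(0, m - 1), -1)        # 1 - B^m
combined <- poly_mult(first_diff, seasonal_diff)  # 1 - B - B^m + B^(m+1)

x <- log(AirPassengers)                         # illustrative monthly series
# sides = 1 applies the coefficients to the current and past values only
filtered <- as.numeric(stats::filter(x, combined, method = "convolution", sides = 1))
filtered <- filtered[!is.na(filtered)]          # the first m + 1 values are undefined

twice_differenced <- as.numeric(diff(diff(x, lag = m), lag = 1))
```

    The non-zero entries of `combined` sit at powers 0, 1, 12 and 13, matching $y_t - y_{t-1} - y_{t-12} + y_{t-13}$.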