TRANSCRIPT

4. Exponential smoothing II

Rob J Hyndman
Forecasting: Principles and Practice
OTexts.com/fpp/7/
A confusing array of methods?

All these methods can be confusing! How do we choose between them?

The ETS framework provides an automatic way of selecting the best method. It was developed to solve the problem of automatically forecasting pharmaceutical sales across thousands of products.
Outline

1 Taxonomy of exponential smoothing methods
2 Innovations state space models
3 ETS in R
4 Forecasting with ETS models
Exponential smoothing methods

Trend Component               Seasonal Component
                              N (None)   A (Additive)   M (Multiplicative)
N  (None)                     N,N        N,A            N,M
A  (Additive)                 A,N        A,A            A,M
Ad (Additive damped)          Ad,N       Ad,A           Ad,M
M  (Multiplicative)           M,N        M,A            M,M
Md (Multiplicative damped)    Md,N       Md,A           Md,M
N,N: Simple exponential smoothing
A,N: Holt’s linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters’ method
A,M: Multiplicative Holt-Winters’ method

There are 15 separate exponential smoothing methods.
State space form
ADDITIVE ERROR MODELS

Trend N, Seasonal N:
  y_t = ℓ_{t−1} + ε_t
  ℓ_t = ℓ_{t−1} + αε_t
Trend N, Seasonal A:
  y_t = ℓ_{t−1} + s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + αε_t
  s_t = s_{t−m} + γε_t
Trend N, Seasonal M:
  y_t = ℓ_{t−1} s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + αε_t / s_{t−m}
  s_t = s_{t−m} + γε_t / ℓ_{t−1}
Trend A, Seasonal N:
  y_t = ℓ_{t−1} + b_{t−1} + ε_t
  ℓ_t = ℓ_{t−1} + b_{t−1} + αε_t
  b_t = b_{t−1} + βε_t
Trend A, Seasonal A:
  y_t = ℓ_{t−1} + b_{t−1} + s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + b_{t−1} + αε_t
  b_t = b_{t−1} + βε_t
  s_t = s_{t−m} + γε_t
Trend A, Seasonal M:
  y_t = (ℓ_{t−1} + b_{t−1}) s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + b_{t−1} + αε_t / s_{t−m}
  b_t = b_{t−1} + βε_t / s_{t−m}
  s_t = s_{t−m} + γε_t / (ℓ_{t−1} + b_{t−1})
Trend Ad, Seasonal N:
  y_t = ℓ_{t−1} + φb_{t−1} + ε_t
  ℓ_t = ℓ_{t−1} + φb_{t−1} + αε_t
  b_t = φb_{t−1} + βε_t
Trend Ad, Seasonal A:
  y_t = ℓ_{t−1} + φb_{t−1} + s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + φb_{t−1} + αε_t
  b_t = φb_{t−1} + βε_t
  s_t = s_{t−m} + γε_t
Trend Ad, Seasonal M:
  y_t = (ℓ_{t−1} + φb_{t−1}) s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} + φb_{t−1} + αε_t / s_{t−m}
  b_t = φb_{t−1} + βε_t / s_{t−m}
  s_t = s_{t−m} + γε_t / (ℓ_{t−1} + φb_{t−1})
Trend M, Seasonal N:
  y_t = ℓ_{t−1} b_{t−1} + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1} + αε_t
  b_t = b_{t−1} + βε_t / ℓ_{t−1}
Trend M, Seasonal A:
  y_t = ℓ_{t−1} b_{t−1} + s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1} + αε_t
  b_t = b_{t−1} + βε_t / ℓ_{t−1}
  s_t = s_{t−m} + γε_t
Trend M, Seasonal M:
  y_t = ℓ_{t−1} b_{t−1} s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1} + αε_t / s_{t−m}
  b_t = b_{t−1} + βε_t / (s_{t−m} ℓ_{t−1})
  s_t = s_{t−m} + γε_t / (ℓ_{t−1} b_{t−1})
Trend Md, Seasonal N:
  y_t = ℓ_{t−1} b_{t−1}^φ + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1}^φ + αε_t
  b_t = b_{t−1}^φ + βε_t / ℓ_{t−1}
Trend Md, Seasonal A:
  y_t = ℓ_{t−1} b_{t−1}^φ + s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1}^φ + αε_t
  b_t = b_{t−1}^φ + βε_t / ℓ_{t−1}
  s_t = s_{t−m} + γε_t
Trend Md, Seasonal M:
  y_t = ℓ_{t−1} b_{t−1}^φ s_{t−m} + ε_t
  ℓ_t = ℓ_{t−1} b_{t−1}^φ + αε_t / s_{t−m}
  b_t = b_{t−1}^φ + βε_t / (s_{t−m} ℓ_{t−1})
  s_t = s_{t−m} + γε_t / (ℓ_{t−1} b_{t−1}^φ)

MULTIPLICATIVE ERROR MODELS

Trend N, Seasonal N:
  y_t = ℓ_{t−1}(1 + ε_t)
  ℓ_t = ℓ_{t−1}(1 + αε_t)
Trend N, Seasonal A:
  y_t = (ℓ_{t−1} + s_{t−m})(1 + ε_t)
  ℓ_t = ℓ_{t−1} + α(ℓ_{t−1} + s_{t−m})ε_t
  s_t = s_{t−m} + γ(ℓ_{t−1} + s_{t−m})ε_t
Trend N, Seasonal M:
  y_t = ℓ_{t−1} s_{t−m}(1 + ε_t)
  ℓ_t = ℓ_{t−1}(1 + αε_t)
  s_t = s_{t−m}(1 + γε_t)
Trend A, Seasonal N:
  y_t = (ℓ_{t−1} + b_{t−1})(1 + ε_t)
  ℓ_t = (ℓ_{t−1} + b_{t−1})(1 + αε_t)
  b_t = b_{t−1} + β(ℓ_{t−1} + b_{t−1})ε_t
Trend A, Seasonal A:
  y_t = (ℓ_{t−1} + b_{t−1} + s_{t−m})(1 + ε_t)
  ℓ_t = ℓ_{t−1} + b_{t−1} + α(ℓ_{t−1} + b_{t−1} + s_{t−m})ε_t
  b_t = b_{t−1} + β(ℓ_{t−1} + b_{t−1} + s_{t−m})ε_t
  s_t = s_{t−m} + γ(ℓ_{t−1} + b_{t−1} + s_{t−m})ε_t
Trend A, Seasonal M:
  y_t = (ℓ_{t−1} + b_{t−1}) s_{t−m}(1 + ε_t)
  ℓ_t = (ℓ_{t−1} + b_{t−1})(1 + αε_t)
  b_t = b_{t−1} + β(ℓ_{t−1} + b_{t−1})ε_t
  s_t = s_{t−m}(1 + γε_t)
Trend Ad, Seasonal N:
  y_t = (ℓ_{t−1} + φb_{t−1})(1 + ε_t)
  ℓ_t = (ℓ_{t−1} + φb_{t−1})(1 + αε_t)
  b_t = φb_{t−1} + β(ℓ_{t−1} + φb_{t−1})ε_t
Trend Ad, Seasonal A:
  y_t = (ℓ_{t−1} + φb_{t−1} + s_{t−m})(1 + ε_t)
  ℓ_t = ℓ_{t−1} + φb_{t−1} + α(ℓ_{t−1} + φb_{t−1} + s_{t−m})ε_t
  b_t = φb_{t−1} + β(ℓ_{t−1} + φb_{t−1} + s_{t−m})ε_t
  s_t = s_{t−m} + γ(ℓ_{t−1} + φb_{t−1} + s_{t−m})ε_t
Trend Ad, Seasonal M:
  y_t = (ℓ_{t−1} + φb_{t−1}) s_{t−m}(1 + ε_t)
  ℓ_t = (ℓ_{t−1} + φb_{t−1})(1 + αε_t)
  b_t = φb_{t−1} + β(ℓ_{t−1} + φb_{t−1})ε_t
  s_t = s_{t−m}(1 + γε_t)
Trend M, Seasonal N:
  y_t = ℓ_{t−1} b_{t−1}(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1}(1 + αε_t)
  b_t = b_{t−1}(1 + βε_t)
Trend M, Seasonal A:
  y_t = (ℓ_{t−1} b_{t−1} + s_{t−m})(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1} + α(ℓ_{t−1} b_{t−1} + s_{t−m})ε_t
  b_t = b_{t−1} + β(ℓ_{t−1} b_{t−1} + s_{t−m})ε_t / ℓ_{t−1}
  s_t = s_{t−m} + γ(ℓ_{t−1} b_{t−1} + s_{t−m})ε_t
Trend M, Seasonal M:
  y_t = ℓ_{t−1} b_{t−1} s_{t−m}(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1}(1 + αε_t)
  b_t = b_{t−1}(1 + βε_t)
  s_t = s_{t−m}(1 + γε_t)
Trend Md, Seasonal N:
  y_t = ℓ_{t−1} b_{t−1}^φ(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1}^φ(1 + αε_t)
  b_t = b_{t−1}^φ(1 + βε_t)
Trend Md, Seasonal A:
  y_t = (ℓ_{t−1} b_{t−1}^φ + s_{t−m})(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1}^φ + α(ℓ_{t−1} b_{t−1}^φ + s_{t−m})ε_t
  b_t = b_{t−1}^φ + β(ℓ_{t−1} b_{t−1}^φ + s_{t−m})ε_t / ℓ_{t−1}
  s_t = s_{t−m} + γ(ℓ_{t−1} b_{t−1}^φ + s_{t−m})ε_t
Trend Md, Seasonal M:
  y_t = ℓ_{t−1} b_{t−1}^φ s_{t−m}(1 + ε_t)
  ℓ_t = ℓ_{t−1} b_{t−1}^φ(1 + αε_t)
  b_t = b_{t−1}^φ(1 + βε_t)
  s_t = s_{t−m}(1 + γε_t)

Table 7.10: State space equations for each of the models in the ETS framework.
Methods v Models

Exponential smoothing methods
Algorithms that return point forecasts.

Innovations state space models
Generate the same point forecasts but can also generate forecast intervals.
A stochastic (or random) data generating process that can generate an entire forecast distribution.
Allow for “proper” model selection.
ETS models

Each model has an observation equation and transition equations, one for each state (level, trend, seasonal), i.e., these are state space models.

Two models for each method: one with additive and one with multiplicative errors, i.e., 30 models in total.

ETS(Error, Trend, Seasonal):
Error = {A, M}
Trend = {N, A, Ad, M, Md}
Seasonal = {N, A, M}
Exponential smoothing methods

General notation ETS(Error, Trend, Seasonal): ExponenTial Smoothing.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt’s linear method with additive errors
M,A,M: Multiplicative Holt-Winters’ method with multiplicative errors
Innovations state space models

All ETS models can be written in innovations state space form.
Additive and multiplicative versions give the same point forecasts but different prediction intervals.

ETS(A,N,N)

Observation equation: y_t = ℓ_{t−1} + ε_t
State equation: ℓ_t = ℓ_{t−1} + αε_t

e_t = y_t − ŷ_{t|t−1} = ε_t
Assume ε_t ∼ NID(0, σ²).

These are called “innovations” or “single source of error” models because the same error process ε_t appears in every equation.
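The ETS(A,N,N) recursions above are easy to run by hand. A minimal sketch, assuming illustrative values for α and the initial level ℓ_0 (these are not fitted to any real data):

```python
# Sketch of ETS(A,N,N): simple exponential smoothing with additive errors.
# alpha and l0 are illustrative values, not estimated from data.

def ses_fit(y, alpha, l0):
    """Run the ETS(A,N,N) recursions, returning one-step forecasts,
    one-step errors, and the final level."""
    level = l0
    forecasts, errors = [], []
    for obs in y:
        forecasts.append(level)       # y_hat_{t|t-1} = l_{t-1}
        e = obs - level               # e_t = y_t - y_hat_{t|t-1} = eps_t
        errors.append(e)
        level = level + alpha * e     # l_t = l_{t-1} + alpha * eps_t
    return forecasts, errors, level

y = [10.0, 12.0, 11.0, 13.0]
fc, err, final_level = ses_fit(y, alpha=0.5, l0=10.0)
print(fc)           # [10.0, 10.0, 11.0, 11.0]
print(final_level)  # 12.0: all future point forecasts equal the final level
```

Because the trend and seasonal components are absent, every h-step-ahead point forecast equals the final level.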
ETS(M,N,N)

SES with multiplicative errors.

Specify relative errors: ε_t = (y_t − ŷ_{t|t−1}) / ŷ_{t|t−1} ∼ NID(0, σ²).

Substituting ŷ_{t|t−1} = ℓ_{t−1} gives:
y_t = ℓ_{t−1} + ℓ_{t−1}ε_t
e_t = y_t − ŷ_{t|t−1} = ℓ_{t−1}ε_t

Observation equation: y_t = ℓ_{t−1}(1 + ε_t)
State equation: ℓ_t = ℓ_{t−1}(1 + αε_t)

Models with additive and multiplicative errors with the same parameters generate the same point forecasts but different prediction intervals.
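The claim that additive and multiplicative errors give the same point forecasts can be checked directly: with ε_t = (y_t − ℓ_{t−1})/ℓ_{t−1}, the update ℓ_t = ℓ_{t−1}(1 + αε_t) is algebraically identical to ℓ_t = ℓ_{t−1} + α(y_t − ℓ_{t−1}). A quick sketch with made-up data:

```python
# Compare ETS(A,N,N) and ETS(M,N,N) one-step point forecasts on the same data.
# Both use y_hat_{t|t-1} = l_{t-1}; only the error definition differs.
# The data, alpha, and l0 are illustrative.

def ses_additive(y, alpha, l0):
    level, fc = l0, []
    for obs in y:
        fc.append(level)
        level = level + alpha * (obs - level)   # l_t = l_{t-1} + alpha*eps_t
    return fc

def ses_multiplicative(y, alpha, l0):
    level, fc = l0, []
    for obs in y:
        fc.append(level)
        eps = (obs - level) / level             # relative error
        level = level * (1 + alpha * eps)       # l_t = l_{t-1}(1 + alpha*eps_t)
    return fc

y = [100.0, 110.0, 105.0]
ya = ses_additive(y, 0.3, 100.0)
ym = ses_multiplicative(y, 0.3, 100.0)
print(max(abs(a - b) for a, b in zip(ya, ym)))  # agree up to rounding
```

The forecast distributions (and so the prediction intervals) still differ, since the multiplicative model's variance scales with the level.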
Holt’s linear method

ETS(A,A,N):
y_t = ℓ_{t−1} + b_{t−1} + ε_t
ℓ_t = ℓ_{t−1} + b_{t−1} + αε_t
b_t = b_{t−1} + βε_t

ETS(M,A,N):
y_t = (ℓ_{t−1} + b_{t−1})(1 + ε_t)
ℓ_t = (ℓ_{t−1} + b_{t−1})(1 + αε_t)
b_t = b_{t−1} + β(ℓ_{t−1} + b_{t−1})ε_t
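The ETS(A,A,N) recursions can be sketched in a few lines. The smoothing parameters and initial states below are illustrative, not estimated:

```python
# Sketch of ETS(A,A,N) (Holt's linear method with additive errors).
# alpha, beta, l0, b0 are illustrative values.

def holt_forecast(y, alpha, beta, l0, b0, h):
    level, trend = l0, b0
    for obs in y:
        e = obs - (level + trend)          # eps_t = y_t - (l_{t-1} + b_{t-1})
        level = level + trend + alpha * e  # l_t = l_{t-1} + b_{t-1} + alpha*eps_t
        trend = trend + beta * e           # b_t = b_{t-1} + beta*eps_t
    # Point forecasts are linear in h: y_hat_{t+h|t} = l_t + h*b_t
    return [level + k * trend for k in range(1, h + 1)]

print(holt_forecast([10.0, 12.0, 14.0], alpha=0.8, beta=0.2, l0=10.0, b0=2.0, h=3))
```

Replacing the two update lines with the ETS(M,A,N) versions changes the error process but not the point-forecast function ℓ_t + h b_t.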
ETS(A,A,A)

Holt-Winters additive method with additive errors.

Forecast equation: ŷ_{t+h|t} = ℓ_t + h b_t + s_{t−m+h_m^+}
Observation equation: y_t = ℓ_{t−1} + b_{t−1} + s_{t−m} + ε_t
State equations:
ℓ_t = ℓ_{t−1} + b_{t−1} + αε_t
b_t = b_{t−1} + βε_t
s_t = s_{t−m} + γε_t

Forecast errors: ε_t = y_t − ŷ_{t|t−1}
h_m^+ = ⌊(h − 1) mod m⌋ + 1
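The seasonal index h_m^+ simply cycles through the last m estimated seasonal states as the horizon grows. A sketch of the forecast equation, assuming the states have already been estimated (all values here are illustrative):

```python
# Sketch of the ETS(A,A,A) forecast equation, given already-estimated states.
# level, trend, and the seasonal values are illustrative, not fitted.

def hw_additive_forecast(level, trend, seasonals, m, h):
    """y_hat_{t+h|t} = l_t + h*b_t + s_{t-m+h_m_plus},
    where h_m_plus = ((h-1) mod m) + 1 selects the right seasonal state.
    `seasonals` holds (s_{t-m+1}, ..., s_t), the last m seasonal states."""
    out = []
    for step in range(1, h + 1):
        hm = ((step - 1) % m) + 1                       # h_m^+ in the slide's notation
        out.append(level + step * trend + seasonals[hm - 1])
    return out

print(hw_additive_forecast(level=100.0, trend=1.0,
                           seasonals=[-5.0, 0.0, 2.0, 3.0], m=4, h=6))
# [96.0, 102.0, 105.0, 107.0, 100.0, 106.0]
```

Note how forecasts 5 and 6 reuse the first two seasonal states one cycle later.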
Additive error models

(See the additive error panel of Table 7.10 above.)
Multiplicative error models

(See the multiplicative error panel of Table 7.10 above.)
Innovations state space models

Let x_t = (ℓ_t, b_t, s_t, s_{t−1}, …, s_{t−m+1}) and ε_t ∼ iid N(0, σ²).

y_t = h(x_{t−1}) + k(x_{t−1})ε_t, where μ_t = h(x_{t−1}) and e_t = k(x_{t−1})ε_t
x_t = f(x_{t−1}) + g(x_{t−1})ε_t

Additive errors: k(x) = 1, so y_t = μ_t + ε_t.
Multiplicative errors: k(x_{t−1}) = μ_t, so y_t = μ_t(1 + ε_t), and ε_t = (y_t − μ_t)/μ_t is a relative error.
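The general form above is a pair of recursions driven by a single noise process, so it can be simulated generically once h, k, f, and g are supplied. A sketch, specialised to ETS(A,N,N), where the state is just the level, h and f are the identity, k = 1, and g = α (the parameter values are illustrative):

```python
# Generic innovations state-space simulation:
#   y_t = h(x_{t-1}) + k(x_{t-1})*eps_t   (observation equation)
#   x_t = f(x_{t-1}) + g(x_{t-1})*eps_t   (transition equation)
# Specialised here to ETS(A,N,N): x = level, h = f = identity, k = 1, g = alpha.

import random

def simulate(h, k, f, g, x0, sigma, n, rng):
    x, ys = x0, []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)     # single source of error
        ys.append(h(x) + k(x) * eps)    # observation equation
        x = f(x) + g(x) * eps           # transition equation
    return ys

rng = random.Random(1)
ys = simulate(h=lambda x: x, k=lambda x: 1.0,
              f=lambda x: x, g=lambda x: 0.4,   # alpha = 0.4
              x0=10.0, sigma=1.0, n=5, rng=rng)
print(len(ys))  # 5 simulated observations
```

Swapping in k(x) = x and g(x) = 0.4 * x would give the multiplicative-error ETS(M,N,N) version with the same point-forecast function.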
Innovations state space models

All the methods can be written in this state space form.
The only difference between the additive error and multiplicative error models is in the observation equation.
Additive and multiplicative versions give the same point forecasts.
Some unstable models

Some combinations of (Error, Trend, Seasonal) can lead to numerical difficulties; see the equations with division by a state.
These are: ETS(M,M,A), ETS(M,Md,A), ETS(A,N,M), ETS(A,A,M), ETS(A,Ad,M), ETS(A,M,N), ETS(A,M,A), ETS(A,M,M), ETS(A,Md,N), ETS(A,Md,A), and ETS(A,Md,M).
Models with multiplicative errors are useful for strictly positive data, but are not numerically stable with data containing zeros or negative values. In that case only the six fully additive models will be applied.
Exponential smoothing models

Additive error
Trend Component               Seasonal Component
                              N (None)   A (Additive)   M (Multiplicative)
N  (None)                     A,N,N      A,N,A          A,N,M
A  (Additive)                 A,A,N      A,A,A          A,A,M
Ad (Additive damped)          A,Ad,N     A,Ad,A         A,Ad,M
M  (Multiplicative)           A,M,N      A,M,A          A,M,M
Md (Multiplicative damped)    A,Md,N     A,Md,A         A,Md,M

Multiplicative error
Trend Component               Seasonal Component
                              N (None)   A (Additive)   M (Multiplicative)
N  (None)                     M,N,N      M,N,A          M,N,M
A  (Additive)                 M,A,N      M,A,A          M,A,M
Ad (Additive damped)          M,Ad,N     M,Ad,A         M,Ad,M
M  (Multiplicative)           M,M,N      M,M,A          M,M,M
Md (Multiplicative damped)    M,Md,N     M,Md,A         M,Md,M
Innovations state space models
Estimation
L∗(θ,x0) = n log
( n∑t=1
ε2t /k
2(xt−1)
)+ 2
n∑t=1
log |k(xt−1)|
= −2 log(Likelihood) + constant
Estimate the parameters θ = (α, β, γ, φ) and the initial states x0 = (ℓ0, b0, s0, s−1, …, s−m+1) by minimizing L*.
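The criterion can be restated as a short base-R function, assuming you already have the one-step residuals ε_t and the scaling terms k(x_{t−1}) from a model; for additive-error models k(·) = 1. This is a sketch of the formula above, not the internal estimation code of the forecast package, and the name ets_lik is made up.

```r
# L*(theta, x0) restated: eps are one-step residuals, k the scaling terms
# k(x_{t-1}); for additive-error models k is identically 1.
ets_lik <- function(eps, k = rep(1, length(eps))) {
  n <- length(eps)
  n * log(sum(eps^2 / k^2)) + 2 * sum(log(abs(k)))
}

eps <- c(0.5, -1.2, 0.3, 0.8)
ets_lik(eps)   # additive-error case: reduces to 4 * log(sum(eps^2))
```

In practice ets() minimizes this numerically over θ and x0; here the residuals and states are simply taken as given.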
Parameter restrictions
Usual region
Traditional restrictions in the methods: 0 < α, β*, γ*, φ < 1 (the equations are interpreted as weighted averages).
In the models we set β = αβ* and γ = (1 − α)γ*; therefore 0 < α < 1, 0 < β < α and 0 < γ < 1 − α.
0.8 < φ < 0.98, to prevent numerical difficulties.
Admissible region
To prevent observations in the distant past having a continuing effect on current forecasts.
Usually (but not always) less restrictive than the usual region.
For example, for ETS(A,N,N): the usual region is 0 < α < 1, while the admissible region is 0 < α < 2.
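The usual-region constraints above can be written as a tiny base-R predicate. This is a hedged sketch with a made-up name, not the check ets() uses internally:

```r
# "Usual" region in the model parameterisation beta = alpha*beta*,
# gamma = (1 - alpha)*gamma*: exactly the constraints listed above.
in_usual_region <- function(alpha, beta, gamma, phi = 0.9) {
  alpha > 0 && alpha < 1 &&
    beta  > 0 && beta  < alpha &&
    gamma > 0 && gamma < 1 - alpha &&
    phi > 0.8 && phi < 0.98
}

in_usual_region(alpha = 0.3, beta = 0.1, gamma = 0.2)   # TRUE
in_usual_region(alpha = 0.3, beta = 0.5, gamma = 0.2)   # FALSE: beta > alpha
```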
Model selection
Akaike’s Information Criterion
AIC = −2 log(Likelihood) + 2p
where p is the number of estimated parameters in the model.
Minimizing the AIC gives the best model for prediction.
AIC corrected (for small sample bias)
AICc = AIC + 2(p + 1)(p + 2) / (n − p)
Schwarz’s Bayesian IC
BIC = AIC + p(log(n) − 2)
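The two corrections are easy to apply by hand. As a sketch in base R (the helper names are made up; the formulas are the ones above):

```r
# AICc and BIC computed from an AIC value, p estimated parameters,
# and n observations.
aicc_from_aic <- function(aic, p, n) aic + 2 * (p + 1) * (p + 2) / (n - p)
bic_from_aic  <- function(aic, p, n) aic + p * (log(n) - 2)

aicc_from_aic(100, p = 5, n = 40)   # 100 + 84/35 = 102.4
bic_from_aic(100, p = 5, n = 40)    # 100 + 5 * (log(40) - 2)
```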
Akaike’s Information Criterion
Values of AIC/AICc/BIC are given in the R output.
The AIC does not have much meaning by itself. It is only useful in comparison to the AIC value for another model fitted to the same data set.
Consider several models with AIC values close to the minimum.
A difference in AIC values of 2 or less is not regarded as substantial, and you may choose the simpler but non-optimal model.
AIC can be negative.
Automatic forecasting
From Hyndman et al. (IJF, 2002):
Apply each model that is appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
Select the best model using the AICc.
Produce forecasts using the best model.
Obtain prediction intervals using the underlying state space model.
The method performed very well in the M3 competition.
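The selection step can be sketched in base R. Here plain AIC over two toy lm() models stands in for AICc over the full set of ETS candidates; the structure (fit every appropriate candidate, keep the information-criterion minimiser) is the same.

```r
set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)   # a clearly trending toy series

# Fit each candidate model, then keep the one minimising the criterion.
candidates <- list(constant = lm(y ~ 1), linear = lm(y ~ x))
ics  <- sapply(candidates, AIC)
best <- names(candidates)[which.min(ics)]
best   # the linear model wins on this trending series
```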
Outline
1 Taxonomy of exponential smoothing methods
2 Innovations state space models
3 ETS in R
4 Forecasting with ETS models
Exponential smoothing
fit <- ets(ausbeer)
fit2 <- ets(ausbeer, model="AAA", damped=FALSE)
fcast1 <- forecast(fit, h=20)
fcast2 <- forecast(fit2, h=20)

ets(y, model="ZZZ", damped=NULL, alpha=NULL,
    beta=NULL, gamma=NULL, phi=NULL,
    additive.only=FALSE,
    lower=c(rep(0.0001,3), 0.80),
    upper=c(rep(0.9999,3), 0.98),
    opt.crit=c("lik","amse","mse","sigma"), nmse=3,
    bounds=c("both","usual","admissible"),
    ic=c("aic","aicc","bic"), restrict=TRUE)
Exponential smoothing

> fit
ETS(M,Md,M)

Smoothing parameters:
  alpha = 0.1776
  beta  = 0.0454
  gamma = 0.1947
  phi   = 0.9549

Initial states:
  l = 263.8531
  b = 0.9997
  s = 1.1856 0.9109 0.8612 1.0423

sigma: 0.0356

     AIC     AICc      BIC
2272.549 2273.444 2302.715
Exponential smoothing
> fit2
ETS(A,A,A)

Smoothing parameters:
  alpha = 0.2079
  beta  = 0.0304
  gamma = 0.2483

Initial states:
  l = 255.6559
  b = 0.5687
  s = 52.3841 -27.1061 -37.6758 12.3978

sigma: 15.9053

     AIC     AICc      BIC
2312.768 2313.481 2339.583
Exponential smoothing
ets() function
Automatically chooses a model by default using the AIC, AICc or BIC.
Can handle any combination of trend, seasonality and damping.
Produces prediction intervals for every model.
Ensures the parameters are admissible (equivalent to invertible).
Produces an object of class ets.
Exponential smoothing
ets objects
Methods: coef(), plot(), summary(), residuals(), fitted(), simulate() and forecast().
The plot() function shows time plots of the original time series along with the extracted components (level, growth and seasonal).
Exponential smoothing

[Figure: decomposition by the ETS(M,Md,M) method, produced by plot(fit): time plots of the observed series and the extracted level, slope and seasonal components, 1960–2010.]
Goodness-of-fit
> accuracy(fit)
      ME     RMSE      MAE     MPE    MAPE    MASE
 0.17847 15.48781 11.77800 0.07204 2.81921 0.20705

> accuracy(fit2)
       ME     RMSE      MAE      MPE    MAPE    MASE
-0.11711 15.90526 12.18930 -0.03765 2.91255 0.21428
Forecast intervals

[Figure: forecasts from ETS(M,Md,M) with 50%, 80% and 95% prediction intervals, 1995–2010.]

> plot(forecast(fit, level=c(50,80,95)))
Forecast intervals

[Figure: forecasts from ETS(M,Md,M) shown as a fan plot.]

> plot(forecast(fit, fan=TRUE))
Exponential smoothing
The ets() function also allows refitting a model to a new data set.
> usfit <- ets(usnetelec[1:45])
> test <- ets(usnetelec[46:55], model = usfit)

> accuracy(test)
      ME     RMSE      MAE      MPE    MAPE    MASE
-3.35419 58.02763 43.85545 -0.07624 1.18483 0.52452

> accuracy(forecast(usfit,10), usnetelec[46:55])
     ME    RMSE     MAE    MPE   MAPE   MASE
40.7034 61.2075 46.3246 1.0980 1.2620 0.6776
The ets() function in R
ets(y, model="ZZZ", damped=NULL,
    alpha=NULL, beta=NULL, gamma=NULL, phi=NULL,
    additive.only=FALSE, lambda=NULL,
    lower=c(rep(0.0001,3), 0.80),
    upper=c(rep(0.9999,3), 0.98),
    opt.crit=c("lik","amse","mse","sigma"), nmse=3,
    bounds=c("both","usual","admissible"),
    ic=c("aic","aicc","bic"), restrict=TRUE)
The ets() function in R
y
The time series to be forecast.

model
Use the ETS classification and notation: “N” for none, “A” for additive, “M” for multiplicative, or “Z” for automatic selection. With the default “ZZZ”, all components are selected using the information criterion.

damped
If damped=TRUE, then a damped trend will be used (either Ad or Md). If damped=FALSE, then a non-damped trend will be used. If damped=NULL (the default), then either a damped or a non-damped trend will be selected according to the information criterion chosen.
The ets() function in R

alpha, beta, gamma, phi
The values of the smoothing parameters can be specified using these arguments. If they are set to NULL (the default value for each of them), the parameters are estimated.

additive.only
Only models with additive components will be considered if additive.only=TRUE. Otherwise all models will be considered.

lambda
Box-Cox transformation parameter. It will be ignored if lambda=NULL (the default value). Otherwise, the time series will be transformed before the model is estimated. When lambda is not NULL, additive.only is set to TRUE.
The ets() function in R
lower, upper
Bounds for the parameter estimates of α, β, γ and φ.

opt.crit="lik" (the default)
Optimisation criterion used for estimation.

bounds
Constraints on the parameters: the usual region (bounds="usual"); the admissible region (bounds="admissible"); bounds="both" (the default) requires the parameters to satisfy both sets of constraints.

ic="aic" (the default)
Information criterion to be used in selecting models.

restrict=TRUE (the default)
Models that cause numerical difficulties are not considered in model selection.
Outline
1 Taxonomy of exponential smoothing methods
2 Innovations state space models
3 ETS in R
4 Forecasting with ETS models
Forecasting with ETS models
Point forecasts are obtained by iterating the equations for t = T+1, …, T+h, setting εt = 0 for t > T.
They are not the same as E(y_{T+h} | x_T) unless trend and seasonality are both additive.
Point forecasts for ETS(A,x,y) are identical to those for ETS(M,x,y) if the parameters are the same.
Prediction intervals will differ between models with additive and multiplicative methods.
Exact prediction intervals are available for many models.
Otherwise, simulate future sample paths, conditional on the last estimate of the states, and obtain prediction intervals from percentiles of the simulated paths.
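The simulation approach in the last bullet can be sketched for the simplest case, ETS(A,N,N), in base R. Here alpha, the final level l0 and sigma are illustrative values rather than estimates, and the function name is made up; ets()/forecast() handle all of this for you.

```r
# ETS(A,N,N): y_t = l_{t-1} + eps_t, l_t = l_{t-1} + alpha * eps_t.
# Simulate many h-step-ahead paths from the final level, then take
# percentiles across paths at each horizon.
simulate_pi_ann <- function(l0, alpha, sigma, h, nsim = 5000, level = 0.95) {
  paths <- matrix(NA_real_, nsim, h)
  for (i in seq_len(nsim)) {
    l <- l0
    for (t in seq_len(h)) {
      e <- rnorm(1, 0, sigma)
      paths[i, t] <- l + e     # observation at horizon t
      l <- l + alpha * e       # update the level
    }
  }
  probs <- c((1 - level) / 2, 1 - (1 - level) / 2)
  apply(paths, 2, quantile, probs = probs)
}

set.seed(42)
pi95 <- simulate_pi_ann(l0 = 100, alpha = 0.3, sigma = 2, h = 4)
pi95   # 2 x 4 matrix: lower/upper 95% bounds for horizons 1..4
```

The bounds widen with the horizon because each simulated path accumulates level shocks; for ETS(A,N,N) the exact h-step variance is σ²(1 + (h−1)α²), so the simulated intervals should agree with the analytic ones up to Monte Carlo error.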
Forecasting with ETS models
Point forecasts: iterate the equations for t = T+1, T+2, …, T+h and set all εt = 0 for t > T.
For example, for ETS(M,A,N):

y_{T+1} = (ℓ_T + b_T)(1 + ε_{T+1})
Therefore ŷ_{T+1|T} = ℓ_T + b_T.

y_{T+2} = (ℓ_{T+1} + b_{T+1})(1 + ε_{T+2})
        = [(ℓ_T + b_T)(1 + αε_{T+1}) + b_T + β(ℓ_T + b_T)ε_{T+1}] (1 + ε_{T+2})
Therefore ŷ_{T+2|T} = ℓ_T + 2b_T, and so on.

These are identical to the forecasts from Holt’s linear method and from ETS(A,A,N). So the point forecasts obtained from the method and from the two models that underlie the method are identical (assuming the same parameter values are used).
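The recursion above, run with every εt set to zero, collapses to ŷ_{T+h|T} = ℓ_T + h·b_T for both ETS(A,A,N) and ETS(M,A,N). A minimal base-R sketch (illustrative final states, not fitted values):

```r
# Iterate the (A,A,N)/(M,A,N) state equations with eps = 0:
# forecast l + b, then the level advances by b and the trend is unchanged.
ets_aan_forecast <- function(l, b, h) {
  yhat <- numeric(h)
  for (i in seq_len(h)) {
    yhat[i] <- l + b
    l <- l + b
  }
  yhat
}

ets_aan_forecast(l = 100, b = 2, h = 3)   # 102 104 106, i.e. l + (1:3) * b
```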
Forecasting: Principles and Practice Forecasting with ETS models 41
Forecasting with ETS models
Prediction intervals: cannot be generated using the methods.
The prediction intervals will differ between models with additive and multiplicative methods.
Exact formulae for some models.
More general to simulate future sample paths, conditional on the last estimate of the states, and to obtain prediction intervals from the percentiles of these simulated future paths.
Options are available in R using the forecast function in the forecast package.
Forecasting: Principles and Practice Forecasting with ETS models 42
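The simulation approach can be sketched as follows. This is not the forecast package's implementation, only a minimal Python illustration (all names and parameter values are made up): draw Gaussian errors, run the ETS(M,A,N) equations forward from the last state, and read the interval off the empirical percentiles of the simulated paths.

```python
import random

def simulate_ets_man_paths(ell, b, alpha, beta, sigma, h, n_paths=2000, seed=1):
    """Simulate future sample paths of ETS(M,A,N) from the last state (ell, b)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        l, bt = ell, b
        path = []
        for _ in range(h):
            eps = rng.gauss(0.0, sigma)          # draw a future error
            y = (l + bt) * (1 + eps)             # observation equation
            l_new = (l + bt) * (1 + alpha * eps)
            bt = bt + beta * (l + bt) * eps
            l = l_new
            path.append(y)
        paths.append(path)
    return paths

def percentile_interval(paths, step, level=0.95):
    """Empirical prediction interval at horizon index `step` (0-based)."""
    ys = sorted(p[step] for p in paths)
    lo = ys[int((1 - level) / 2 * len(ys))]
    hi = ys[int((1 + level) / 2 * len(ys)) - 1]
    return lo, hi

paths = simulate_ets_man_paths(100.0, 2.0, 0.3, 0.1, sigma=0.05, h=3)
lo, hi = percentile_interval(paths, step=2)   # 95% interval at horizon h = 3
```

Because the error enters multiplicatively, the resulting interval is not symmetric around the point forecast ℓ_T + h·b_T, which is why intervals from additive and multiplicative models differ even when the point forecasts agree.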