automatic algorithms for time series forecasting

Rob J Hyndman

Posted 28-Jul-2015

TRANSCRIPT

Page 1: Automatic algorithms for time series forecasting

Rob J Hyndman

Automatic algorithms for time series forecasting

Page 2: Automatic algorithms for time series forecasting

Outline

1 Motivation

2 Forecasting competitions

3 Exponential smoothing

4 ARIMA modelling

5 Automatic nonlinear forecasting?

6 Time series with complex seasonality

7 Hierarchical and grouped time series

8 Recent developments

Automatic algorithms for time series forecasting Motivation 2

Page 3: Automatic algorithms for time series forecasting

Motivation

Automatic algorithms for time series forecasting Motivation 3

Page 8: Automatic algorithms for time series forecasting

Motivation

1 Common in business to have over 1000 products that need forecasting at least monthly.

2 Forecasts are often required by people who are untrained in time series analysis.

Specifications

Automatic forecasting algorithms must:

- determine an appropriate time series model;

- estimate the parameters;

- compute the forecasts with prediction intervals.

Automatic algorithms for time series forecasting Motivation 4

Page 10: Automatic algorithms for time series forecasting

Example: Asian sheep

Automatic algorithms for time series forecasting Motivation 5

[Plot: Numbers of sheep in Asia, 1960–2010; y-axis: millions of sheep, 250–550]

Page 11: Automatic algorithms for time series forecasting

Example: Asian sheep

Automatic algorithms for time series forecasting Motivation 5

[Plot: Automatic ETS forecasts for the numbers of sheep in Asia, 1960–2010; y-axis: millions of sheep, 250–550]

Page 12: Automatic algorithms for time series forecasting

Example: Corticosteroid sales

Automatic algorithms for time series forecasting Motivation 6

[Plot: Monthly corticosteroid drug sales in Australia, 1995–2010; y-axis: total scripts (millions), 0.4–1.4]

Page 13: Automatic algorithms for time series forecasting

Example: Corticosteroid sales

Automatic algorithms for time series forecasting Motivation 6

[Plot: Forecasts from ARIMA(3,1,3)(0,1,1)[12] for monthly corticosteroid drug sales in Australia, 1995–2010; y-axis: total scripts (millions), 0.4–1.6]

Page 15: Automatic algorithms for time series forecasting

Makridakis and Hibon (1979)

Automatic algorithms for time series forecasting Forecasting competitions 8

Page 17: Automatic algorithms for time series forecasting

Makridakis and Hibon (1979)

This was the first large-scale empirical evaluation of time series forecasting methods.

Highly controversial at the time.

Difficulties:

- How to measure forecast accuracy?

- How to apply methods consistently and objectively?

- How to explain unexpected results?

Common thinking was that the more sophisticated mathematical models (ARIMA models at the time) were necessarily better. If results showed ARIMA models were not best, it must be because the analyst was unskilled.

Automatic algorithms for time series forecasting Forecasting competitions 9

Page 18: Automatic algorithms for time series forecasting

Makridakis and Hibon (1979)

It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors.

— W.G. Gilchrist

I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods . . . these authors are more at home with simple procedures than with Box-Jenkins.

— C. Chatfield

Automatic algorithms for time series forecasting Forecasting competitions 10

Page 20: Automatic algorithms for time series forecasting

Consequences of M&H (1979)

As a result of this paper, researchers started to:

- consider how to automate forecasting methods;

- study what methods give the best forecasts;

- be aware of the dangers of over-fitting;

- treat forecasting as a different problem from time series analysis.

Makridakis & Hibon followed up with a new competition in 1982:

- 1001 series

- Anyone could submit forecasts (avoiding the charge of incompetence)

- Multiple forecast measures used.

Automatic algorithms for time series forecasting Forecasting competitions 11

Page 22: Automatic algorithms for time series forecasting

M-competition

Automatic algorithms for time series forecasting Forecasting competitions 12

Page 23: Automatic algorithms for time series forecasting

M-competition

Main findings (taken from Makridakis & Hibon, 2000)

1 Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.

2 The relative ranking of the performance of the various methods varies according to the accuracy measure being used.

3 The accuracy when various methods are combined outperforms, on average, the individual methods being combined, and does very well in comparison to other methods.

4 The accuracy of the various methods depends upon the length of the forecasting horizon involved.

Automatic algorithms for time series forecasting Forecasting competitions 13

Page 24: Automatic algorithms for time series forecasting

M3 competition

Automatic algorithms for time series forecasting Forecasting competitions 14

Page 25: Automatic algorithms for time series forecasting

Makridakis and Hibon (2000)

“The M3-Competition is a final attempt by the authors to settle the accuracy issue of various time series methods . . . The extension involves the inclusion of more methods/researchers (in particular in the areas of neural networks and expert systems) and more series.”

- 3003 series

- All data from business, demography, finance and economics.

- Series length between 14 and 126.

- Either non-seasonal, monthly or quarterly.

- All time series positive.

M&H claimed that the M3-competition supported the findings of their earlier work. However, the best performing methods were far from “simple”.

Automatic algorithms for time series forecasting Forecasting competitions 15

Page 26: Automatic algorithms for time series forecasting

Makridakis and Hibon (2000)

Best methods:

Theta

A very confusing explanation.

Shown by Hyndman and Billah (2003) to be an average of linear regression and simple exponential smoothing with drift, applied to seasonally adjusted data.

Later, the original authors claimed that their explanation was incorrect.

Forecast Pro

A commercial software package with an unknown algorithm.

Known to fit either exponential smoothing or ARIMA models using BIC.

Automatic algorithms for time series forecasting Forecasting competitions 16

Page 27: Automatic algorithms for time series forecasting

M3 results (recalculated)

Method          MAPE   sMAPE  MASE
Theta           17.42  12.76  1.39
ForecastPro     18.00  13.06  1.47
ForecastX       17.35  13.09  1.42
Automatic ANN   17.18  13.98  1.53
B-J automatic   19.13  13.72  1.54

Automatic algorithms for time series forecasting Forecasting competitions 17
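The accuracy measures in the table can be sketched in a few lines of Python. The series and forecasts below are invented for illustration (not M3 data); the functions follow the usual MAPE, sMAPE and MASE definitions, with MASE scaling errors by the in-sample one-step naive forecast error.

```python
# Sketch of the accuracy measures used in the M3 comparison.
# Data here are illustrative only, not the M3 series.

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE: absolute error relative to the mean of |a| and |f|."""
    return 100 * sum(abs(a - f) / ((abs(a) + abs(f)) / 2)
                     for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train):
    """Mean absolute scaled error: test errors scaled by the mean absolute
    one-step change in the training data (the naive forecast error)."""
    scale = sum(abs(train[t] - train[t - 1])
                for t in range(1, len(train))) / (len(train) - 1)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual) / scale

train = [10, 12, 11, 13, 12, 14]      # hypothetical training series
actual = [15, 14, 16]                  # hypothetical test observations
forecast = [14, 14, 14]                # hypothetical forecasts

print(mape(actual, forecast), smape(actual, forecast), mase(actual, forecast, train))
```

Note that the three measures can rank the same forecasts differently, which is exactly the second M-competition finding above.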

- Calculations do not match the published paper.

- Some contestants apparently submitted multiple entries, but only the best ones were published.

Page 30: Automatic algorithms for time series forecasting

Exponential smoothing methods

                              Seasonal Component
Trend Component               N (None)   A (Additive)   M (Multiplicative)
N (None)                      N,N        N,A            N,M
A (Additive)                  A,N        A,A            A,M
Ad (Additive damped)          Ad,N       Ad,A           Ad,M
M (Multiplicative)            M,N        M,A            M,M
Md (Multiplicative damped)    Md,N       Md,A           Md,M

N,N: Simple exponential smoothing
A,N: Holt's linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters' method
A,M: Multiplicative Holt-Winters' method

There are 15 separate exponential smoothing methods. Each can have an additive or multiplicative error, giving 30 separate models. Only 19 models are numerically stable. Multiplicative trend models give poor forecasts, leaving 15 models.

General notation E T S: ExponenTial Smoothing, with the three letters denoting the Error, Trend and Seasonal components.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors

Automatic algorithms for time series forecasting Exponential smoothing 19

Innovations state space models

- All ETS models can be written in innovations state space form (IJF, 2002).

- Additive and multiplicative versions give the same point forecasts but different prediction intervals.
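As an illustration of the simplest cell of the taxonomy, here is a minimal sketch of the N,N method (simple exponential smoothing) in Python. The series, the value of α and the function names are invented for the example; this is not the implementation used in any forecasting package.

```python
# Minimal sketch of simple exponential smoothing (the N,N method):
# the level is updated as a weighted average of the latest observation
# and the previous level; forecasts are flat at the final level.

def ses(y, alpha, level0):
    """Return the one-step fitted values and the final level."""
    level = level0
    fitted = []
    for obs in y:
        fitted.append(level)                  # one-step forecast = current level
        level = alpha * obs + (1 - alpha) * level
    return fitted, level

y = [100, 104, 101, 106, 103]                 # hypothetical series
fitted, level = ses(y, alpha=0.5, level0=100.0)
forecasts = [level] * 3                       # flat forecasts for horizons 1..3
```

The other 14 methods extend this recursion with trend and seasonal states, which is what the (Trend, Seasonal) table above enumerates.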

Page 49: Automatic algorithms for time series forecasting

ETS state space model

[Diagram: the states x_{t−1}, x_t, x_{t+1}, … evolve through time; each state x_{t−1}, combined with the innovation ε_t, produces the observation y_t and the next state x_t]

State space model: x_t = (level, slope, seasonal)

Estimation: compute the likelihood L from ε_1, ε_2, . . . , ε_T, and optimize L with respect to the model parameters.

Page 59: Automatic algorithms for time series forecasting

Innovations state space models

Let x_t = (ℓ_t, b_t, s_t, s_{t−1}, . . . , s_{t−m+1}) and ε_t ~ iid N(0, σ²).

y_t = h(x_{t−1}) + k(x_{t−1}) ε_t      (Observation equation)

where µ_t = h(x_{t−1}) and e_t = k(x_{t−1}) ε_t.

x_t = f(x_{t−1}) + g(x_{t−1}) ε_t      (State equation)

Additive errors: k(x_{t−1}) = 1, so y_t = µ_t + ε_t.

Multiplicative errors: k(x_{t−1}) = µ_t, so y_t = µ_t(1 + ε_t), and ε_t = (y_t − µ_t)/µ_t is a relative error.

Automatic algorithms for time series forecasting Exponential smoothing 22
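A small sketch of the multiplicative-error case for the local-level model, where µ_t = ℓ_{t−1}: running the recursion forward extracts the relative errors ε_t = (y_t − µ_t)/µ_t. The data and the particular state update ℓ_t = ℓ_{t−1}(1 + αε_t) are illustrative assumptions for this example.

```python
# Multiplicative-error local level model, following the slide:
#   y_t = mu_t (1 + eps_t),  eps_t = (y_t - mu_t)/mu_t   (relative error)
# with the state updated as l_t = l_{t-1} * (1 + alpha*eps_t)
# (an illustrative choice for the g(.) function).

def relative_errors(y, alpha, level0):
    """Run the recursion and return the relative errors eps_1..eps_T."""
    level = level0
    eps = []
    for obs in y:
        mu = level                       # mu_t = h(x_{t-1}) = l_{t-1}
        e = (obs - mu) / mu              # relative error
        eps.append(e)
        level = level * (1 + alpha * e)
    return eps

y = [100.0, 110.0, 104.5]                # hypothetical observations
eps = relative_errors(y, alpha=0.5, level0=100.0)
```

With additive errors the same loop would use e = obs − mu and level = level + alpha * e, giving identical point forecasts µ_t but a different error sequence, which is why the prediction intervals differ.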

Page 60: Automatic algorithms for time series forecasting

Innovations state space models

All models can be written in state space form.

Additive and multiplicative versions give the same point forecasts but different prediction intervals.

Estimation

L*(θ, x_0) = n log( Σ_{t=1}^{n} ε_t² / k²(x_{t−1}) ) + 2 Σ_{t=1}^{n} log |k(x_{t−1})|
           = −2 log(Likelihood) + constant

Minimize with respect to θ = (α, β, γ, φ) and the initial states x_0 = (ℓ_0, b_0, s_0, s_{−1}, . . . , s_{−m+1}).

Automatic algorithms for time series forecasting Exponential smoothing 23

Q: How to choose between the 15 useful ETS models?
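For additive errors k(x_{t−1}) = 1, so the criterion reduces to L* = n log(Σ ε_t²). A sketch of evaluating L* for the local-level model over a crude grid of α values; the series and the grid are hypothetical, and a real implementation would numerically optimize over all parameters and initial states rather than a grid.

```python
import math

# For additive errors k(.) = 1, so L*(theta, x0) = n * log(sum of eps_t^2):
# the second term of the criterion vanishes. Here L* is evaluated for a
# local-level model at a few candidate alpha values (a crude grid search,
# not a proper optimizer).

def l_star(y, alpha, level0):
    level, sse = level0, 0.0
    for obs in y:
        e = obs - level                  # innovation eps_t = y_t - l_{t-1}
        sse += e * e
        level += alpha * e               # l_t = l_{t-1} + alpha * eps_t
    return len(y) * math.log(sse)

y = [100, 104, 101, 106, 103, 108]       # hypothetical series
best = min([0.1, 0.3, 0.5, 0.7, 0.9], key=lambda a: l_star(y, a, y[0]))
```

Minimizing L* answers only the estimation question; choosing among the 15 candidate model forms additionally needs a penalty for parameters, which is where the AIC discussed below comes in.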

Page 66: Automatic algorithms for time series forecasting

Cross-validation

Traditional evaluation

[Diagram: a single split of the series into training data followed by test data]

Standard cross-validation

[Diagram: randomly chosen observations held out as test points, with the remainder used for training]

Time series cross-validation

[Diagram: a sequence of training sets, each one observation longer than the last, with the following observation used as the test point]

Also known as “evaluation on a rolling forecast origin”.

Automatic algorithms for time series forecasting Exponential smoothing 24

Page 71: Automatic algorithms for time series forecasting

Akaike’s Information Criterion

AIC = −2 log(L) + 2k

where L is the likelihood and k is the number of estimated parameters in the model.

This is a penalized likelihood approach. If L is Gaussian, then AIC ≈ c + T log(MSE) + 2k, where c is a constant, MSE is computed from one-step forecasts on the training set, and T is the length of the series.

Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE from one-step forecasts on the test set via time series cross-validation.

Automatic algorithms for time series forecasting Exponential smoothing 25


Page 76: Automatic algorithms for time series forecasting

Akaike’s Information Criterion

AIC = −2 log(L) + 2k

Corrected AIC
For small T, the AIC tends to over-fit. Bias-corrected version:

AICc = AIC + 2(k+1)(k+2)/(T−k)

Bayesian Information Criterion

BIC = AIC + k[log(T) − 2]

BIC penalizes extra terms more heavily than AIC. Minimizing BIC is consistent if there is a true model.

Automatic algorithms for time series forecasting Exponential smoothing 26
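The three criteria differ only in their penalty terms, which is easy to see in code. A minimal sketch using the Gaussian approximation AIC ≈ T log(MSE) + 2k from the previous slide (the constant c is dropped, so only differences between criteria are meaningful; `info_criteria` is an illustrative name):

```python
import numpy as np

def info_criteria(mse, T, k):
    """Gaussian approximations to AIC, AICc and BIC.

    aic  : T log(MSE) + 2k (additive constant dropped)
    aicc : AIC plus the small-sample correction 2(k+1)(k+2)/(T-k)
    bic  : AIC + k(log T - 2), a heavier penalty for large T
    """
    aic = T * np.log(mse) + 2 * k
    aicc = aic + 2 * (k + 1) * (k + 2) / (T - k)
    bic = aic + k * (np.log(T) - 2)
    return aic, aicc, bic

aic, aicc, bic = info_criteria(mse=0.5, T=100, k=3)
print(aic, aicc, bic)
```

For T = 100 and k = 3 the AICc correction is small, but it grows quickly as k approaches T, which is what protects against over-fitting on short series.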


Page 79: Automatic algorithms for time series forecasting

What to use?

Choice: AIC, AICc, BIC, CV-MSE

CV-MSE is too time-consuming for most automatic forecasting purposes. It also requires large T.

As T → ∞, BIC selects the true model if there is one. But there never is a true model!

AICc focuses on forecasting performance, can be used on small samples, and is very fast to compute.

Empirical studies in forecasting show AIC is better than BIC for forecast accuracy.

Automatic algorithms for time series forecasting Exponential smoothing 27


Page 84: Automatic algorithms for time series forecasting

ets algorithm in R

Automatic algorithms for time series forecasting Exponential smoothing 28

Based on Hyndman, Koehler, Snyder & Grose (IJF 2002):

Apply each of the 15 models that are appropriate to the data. Optimize parameters and initial values using MLE.

Select the best model using AICc.

Produce forecasts using the best model.

Obtain prediction intervals using the underlying state space model.
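The select-by-AICc step is generic: fit every candidate, keep the minimizer. A minimal Python sketch with hypothetical stub fitters (the real ets() fits each of its ETS state space models by MLE; `select_by_aicc` and the stubs are illustrative names only):

```python
def select_by_aicc(y, candidates):
    """Fit every candidate model and keep the one with the smallest
    AICc -- the selection step of the ets() algorithm in outline.
    `candidates` maps a model name (e.g. "ANN", "MNM") to a fitting
    function returning (aicc, fitted_model)."""
    best_name, best_aicc, best_fit = None, float("inf"), None
    for name, fit_fn in candidates.items():
        aicc, fitted = fit_fn(y)
        if aicc < best_aicc:
            best_name, best_aicc, best_fit = name, aicc, fitted
    return best_name, best_fit

# Toy usage with stub fitters standing in for MLE-fitted ETS models.
stubs = {"ANN": lambda y: (12.3, "model-ANN"),
         "MNM": lambda y: (9.8, "model-MNM")}
print(select_by_aicc([1.0, 2.0, 3.0], stubs))  # → ('MNM', 'model-MNM')
```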



Page 89: Automatic algorithms for time series forecasting

Exponential smoothing

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting Exponential smoothing 30

[Figure: Forecasts from ETS(M,A,N) — livestock series, millions of sheep, 1960–2010]


Page 91: Automatic algorithms for time series forecasting

Exponential smoothing

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting Exponential smoothing 32

[Figure: Forecasts from ETS(M,N,M) — total scripts (millions), 1995–2010]

Page 92: Automatic algorithms for time series forecasting

Exponential smoothing

> fit
ETS(M,N,M)

Smoothing parameters:
  alpha = 0.4597
  gamma = 1e-04

Initial states:
  l = 0.4501
  s = 0.8628 0.8193 0.7648 0.7675 0.6946 1.2921
      1.3327 1.1833 1.1617 1.0899 1.0377 0.9937

sigma: 0.0675

       AIC       AICc        BIC
-115.69960 -113.47738  -69.24592

Automatic algorithms for time series forecasting Exponential smoothing 33

Page 93: Automatic algorithms for time series forecasting

M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
ForecastX       17.35  13.09   1.42
Automatic ANN   17.18  13.98   1.53
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43

Automatic algorithms for time series forecasting Exponential smoothing 34


Page 95: Automatic algorithms for time series forecasting

Exponential smoothing

Automatic algorithms for time series forecasting Exponential smoothing 35

www.OTexts.org/fpp


Page 97: Automatic algorithms for time series forecasting

Outline

1 Motivation

2 Forecasting competitions

3 Exponential smoothing

4 ARIMA modelling

5 Automatic nonlinear forecasting?

6 Time series with complex seasonality

7 Hierarchical and grouped time series

8 Recent developments

Automatic algorithms for time series forecasting ARIMA modelling 36


Page 102: Automatic algorithms for time series forecasting

ARIMA models

[Diagram: inputs yt−1, yt−2, yt−3 and errors εt, εt−1, εt−2 feeding the output yt]

Autoregression moving average (ARMA) model

Estimation
Compute the likelihood L from ε1, ε2, . . . , εT.
Use an optimization algorithm to maximize L.

ARIMA model
An autoregression moving average (ARMA) model applied to differences.

Automatic algorithms for time series forecasting ARIMA modelling 37
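To make the estimation step concrete: for a zero-mean Gaussian AR(1), maximizing the conditional likelihood reduces to ordinary least squares of yt on yt−1. A minimal numpy sketch, for illustration only (R's arima() maximizes the full state space likelihood; `ar1_fit` is a hypothetical name):

```python
import numpy as np

def ar1_fit(y):
    """Conditional maximum-likelihood estimate for a zero-mean AR(1).

    Maximizing the Gaussian likelihood of the errors e_2, ..., e_T
    is equivalent to least squares of y_t on y_{t-1}: the 'compute
    the likelihood, then optimize' step in closed form.
    """
    y0, y1 = y[:-1], y[1:]
    phi = np.dot(y0, y1) / np.dot(y0, y0)   # OLS slope
    resid = y1 - phi * y0
    sigma2 = np.mean(resid ** 2)            # error variance estimate
    return phi, sigma2

# Simulate an AR(1) with phi = 0.7 and recover the coefficient.
rng = np.random.default_rng(42)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()
phi_hat, sigma2_hat = ar1_fit(y)
print(round(phi_hat, 2))
```

With 2000 observations the estimate lands close to the true 0.7; models with MA terms need iterative optimization because the errors themselves must be reconstructed recursively.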

Page 103: Automatic algorithms for time series forecasting

ARIMA modelling

Automatic algorithms for time series forecasting ARIMA modelling 38



Page 107: Automatic algorithms for time series forecasting

Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting ARIMA modelling 40

[Figure: Forecasts from ARIMA(0,1,0) with drift — livestock series, millions of sheep, 1960–2010]


Page 109: Automatic algorithms for time series forecasting

Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting ARIMA modelling 42

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12] — total scripts (millions), 1995–2010]

Page 110: Automatic algorithms for time series forecasting

Auto ARIMA

> fit
Series: h02
ARIMA(3,1,3)(0,1,1)[12]

Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651

sigma^2 estimated as 0.002706: log likelihood=290.25
AIC=-564.5   AICc=-563.71   BIC=-538.48

Automatic algorithms for time series forecasting ARIMA modelling 43


Page 112: Automatic algorithms for time series forecasting

How does auto.arima() work?

A non-seasonal ARIMA process

φ(B)(1 − B)^d yt = c + θ(B)εt

Need to select appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
Select the number of differences d via the KPSS unit root test.
Select p, q, c by minimising AICc.
Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Automatic algorithms for time series forecasting ARIMA modelling 44

Algorithm choices driven by forecast accuracy.


Page 114: Automatic algorithms for time series forecasting

How does auto.arima() work?

A seasonal ARIMA process

Φ(B^m)φ(B)(1 − B)^d (1 − B^m)^D yt = c + Θ(B^m)θ(B)εt

Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
Select the number of differences d via the KPSS unit root test.
Select D using the OCSB unit root test.
Select p, q, P, Q, c by minimising AICc.
Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Automatic algorithms for time series forecasting ARIMA modelling 45
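The stepwise idea can be shown in miniature: score neighbouring orders by AICc, move whenever a neighbour improves, stop at a local minimum. The sketch below searches over the AR order only; the actual Hyndman–Khandakar algorithm also varies q, P, Q and the constant, and uses unit root tests for d and D. `ar_aicc` and `stepwise_ar` are illustrative names, not auto.arima() internals.

```python
import numpy as np

def ar_aicc(y, p):
    """AICc of an AR(p) with intercept, fitted by OLS
    (conditional Gaussian likelihood: AIC ≈ T log(MSE) + 2k)."""
    n = len(y)
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - i: n - i] for i in range(1, p + 1)])
    Y = y[p:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    mse = np.mean((Y - X @ beta) ** 2)
    T, k = n - p, p + 2      # AR coefficients + intercept + variance
    aic = T * np.log(mse) + 2 * k
    return aic + 2 * (k + 1) * (k + 2) / (T - k)

def stepwise_ar(y, max_p=5):
    """Move to a neighbouring order (p-1 or p+1) while it lowers AICc."""
    p, best = 1, ar_aicc(y, 1)
    improved = True
    while improved:
        improved = False
        for cand in (p - 1, p + 1):
            if 0 <= cand <= max_p:
                a = ar_aicc(y, cand)
                if a < best:
                    p, best, improved = cand, a, True
    return p

# AR(1) data: the search should settle near the true order.
rng = np.random.default_rng(1)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
print(stepwise_ar(y))
```

The point of the stepwise traversal is speed: only a handful of neighbouring models are fitted rather than the full grid of candidate orders.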

Page 115: Automatic algorithms for time series forecasting

M3 comparisons

Method          MAPE   sMAPE   MASE
Theta           17.42  12.76   1.39
ForecastPro     18.00  13.06   1.47
B-J automatic   19.13  13.72   1.54
ETS             17.38  13.13   1.43
AutoARIMA       19.12  13.85   1.47

Automatic algorithms for time series forecasting ARIMA modelling 46


Page 117: Automatic algorithms for time series forecasting

Automatic nonlinear forecasting

Automatic ANN in M3 competition did poorly.

Linear methods did best in the NN3 competition!

Very few machine learning methods get published in the IJF because authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.

Some good recent work by Kourentzes and Crone on automated ANNs for time series.

Watch this space!

Automatic algorithms for time series forecasting Automatic nonlinear forecasting? 48



Page 123: Automatic algorithms for time series forecasting

Examples

Automatic algorithms for time series forecasting Time series with complex seasonality 50

[Figure: US finished motor gasoline products — thousands of barrels per day, weekly, 1992–2004]

Page 124: Automatic algorithms for time series forecasting

Examples

Automatic algorithms for time series forecasting Time series with complex seasonality 50

[Figure: Number of calls to a large American bank (7am–9pm) — call arrivals per 5-minute interval, 3 March to 12 May]

Page 125: Automatic algorithms for time series forecasting

Examples

Automatic algorithms for time series forecasting Time series with complex seasonality 50

[Figure: Turkish electricity demand — daily demand (GW), 2000–2008]

Page 126: Automatic algorithms for time series forecasting

TBATS model

TBATS
Trigonometric terms for seasonality

Box-Cox transformations for heterogeneity

ARMA errors for short-term dynamics

Trend (possibly damped)

Seasonal (including multiple and non-integer periods)

Automatic algorithm described in A.M. De Livera, R.J. Hyndman and R.D. Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513–1527.

Automatic algorithms for time series forecasting Time series with complex seasonality 51

Page 127: Automatic algorithms for time series forecasting

TBATS model

yt = observation at time t

y_t^(ω) = { (y_t^ω − 1)/ω  if ω ≠ 0;   log y_t  if ω = 0 }            (Box-Cox transformation)

y_t^(ω) = ℓ_{t−1} + φ b_{t−1} + Σ_{i=1}^{M} s^{(i)}_{t−m_i} + d_t     (M seasonal periods)

ℓ_t = ℓ_{t−1} + φ b_{t−1} + α d_t

b_t = (1 − φ)b + φ b_{t−1} + β d_t                                    (global and local trend)

d_t = Σ_{i=1}^{p} φ_i d_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t         (ARMA error)

s^{(i)}_t = Σ_{j=1}^{k_i} s^{(i)}_{j,t}                               (Fourier-like seasonal terms)

s^{(i)}_{j,t}  =  s^{(i)}_{j,t−1} cos λ^{(i)}_j + s^{*(i)}_{j,t−1} sin λ^{(i)}_j + γ^{(i)}_1 d_t
s^{*(i)}_{j,t} = −s^{(i)}_{j,t−1} sin λ^{(i)}_j + s^{*(i)}_{j,t−1} cos λ^{(i)}_j + γ^{(i)}_2 d_t

Automatic algorithms for time series forecasting Time series with complex seasonality 52
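With the γ coefficients set to zero, the (s, s*) recursion above is a pure rotation of each harmonic, so iterating it over one full period returns the seasonal state to where it started. A small numpy sketch of the update (function name illustrative; the real TBATS fit estimates γ1, γ2 jointly with the other parameters):

```python
import numpy as np

def trig_season_step(s, s_star, lam, g1, g2, d):
    """One update of the trigonometric seasonal states:
      s_{j,t}  =  s_{j,t-1} cos λ_j + s*_{j,t-1} sin λ_j + γ1 d_t
      s*_{j,t} = -s_{j,t-1} sin λ_j + s*_{j,t-1} cos λ_j + γ2 d_t
    vectorized over the harmonics j."""
    c, sn = np.cos(lam), np.sin(lam)
    s_new = s * c + s_star * sn + g1 * d
    s_star_new = -s * sn + s_star * c + g2 * d
    return s_new, s_star_new

# With γ1 = γ2 = 0 the update is a rotation, so the seasonal
# component s_t = Σ_j s_{j,t} repeats with the period m.
m, k = 12.0, 3                       # period and number of harmonics
lam = 2 * np.pi * np.arange(1, k + 1) / m
s, s_star = np.ones(k), np.zeros(k)
for _ in range(12):                  # advance one full period
    s, s_star = trig_season_step(s, s_star, lam, 0.0, 0.0, 0.0)
print(np.allclose(s, np.ones(k)))   # → True: back at the start
```

Because λ_j = 2πj/m works for any real m, the same recursion handles non-integer periods such as 52.18 weeks per year.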


Page 134: Automatic algorithms for time series forecasting

Examples

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting Time series with complex seasonality 53

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}) — US gasoline, thousands of barrels per day, 1992–2008]

Page 135: Automatic algorithms for time series forecasting

Examples

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting Time series with complex seasonality 54

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}) — call arrivals per 5-minute interval, 3 March to 9 June]

Page 136: Automatic algorithms for time series forecasting

Examples

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

Automatic algorithms for time series forecasting Time series with complex seasonality 55

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}) — Turkish electricity demand (GW), 2000–2010]


Page 138: Automatic algorithms for time series forecasting

Hierarchical time series
A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

Total
A: AA AB AC    B: BA BB BC    C: CA CB CC

Examples
Net labour turnover
Tourism by state and region

Automatic algorithms for time series forecasting Hierarchical and grouped time series 57



Page 146: Automatic algorithms for time series forecasting

Hierarchical time series

Total
A  B  C

yt = [Yt, YA,t, YB,t, YC,t]′ = S bt,  where

    ⎡ 1 1 1 ⎤
S = ⎢ 1 0 0 ⎥   and   bt = [YA,t, YB,t, YC,t]′
    ⎢ 0 1 0 ⎥
    ⎣ 0 0 1 ⎦

Yt : observed aggregate of all series at time t.
YX,t : observation on series X at time t.
bt : vector of all series at the bottom level at time t.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 58
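The identity yt = S bt is just a matrix multiply; a minimal numpy sketch for the Total → A, B, C hierarchy (the bottom-level values are made-up numbers for illustration):

```python
import numpy as np

# Summing matrix for the one-level hierarchy Total -> A, B, C:
# yt = [Yt, YA,t, YB,t, YC,t]' = S bt, with bt the bottom level.
S = np.array([
    [1, 1, 1],   # Total = A + B + C
    [1, 0, 0],   # A
    [0, 1, 0],   # B
    [0, 0, 1],   # C
])

b_t = np.array([3.0, 5.0, 2.0])   # bottom-level observations at time t
y_t = S @ b_t
print(y_t)                        # → [10.  3.  5.  2.]
```

Deeper hierarchies only add rows to S, one per aggregate; the bottom-level block of S is always an identity matrix.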

Page 149: Automatic algorithms for time series forecasting

Hierarchical time series

Total
 ├─ A: AX, AY, AZ
 ├─ B: BX, BY, BZ
 └─ C: CX, CY, CZ

yt = [Yt, YA,t, YB,t, YC,t, YAX,t, YAY,t, YAZ,t, YBX,t, YBY,t, YBZ,t, YCX,t, YCY,t, YCZ,t]′

bt = [YAX,t, YAY,t, YAZ,t, YBX,t, YBY,t, YBZ,t, YCX,t, YCY,t, YCZ,t]′

    yt = S bt,  where

    S = [ 1 1 1 1 1 1 1 1 1
          1 1 1 0 0 0 0 0 0
          0 0 0 1 1 1 0 0 0
          0 0 0 0 0 0 1 1 1
          1 0 0 0 0 0 0 0 0
          0 1 0 0 0 0 0 0 0
          0 0 1 0 0 0 0 0 0
          0 0 0 1 0 0 0 0 0
          0 0 0 0 1 0 0 0 0
          0 0 0 0 0 1 0 0 0
          0 0 0 0 0 0 1 0 0
          0 0 0 0 0 0 0 1 0
          0 0 0 0 0 0 0 0 1 ]

Automatic algorithms for time series forecasting Hierarchical and grouped time series 59
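The summing structure above is easy to verify numerically. A minimal sketch (NumPy; not part of the original slides) builds S for this two-level hierarchy and checks that yt = S bt reproduces the aggregates:

```python
import numpy as np

# Summing matrix S for Total -> (A, B, C) -> three children each.
# Rows: Total; the A, B, C aggregates; then the 9 bottom-level series.
agg_rows = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],   # Total
    [1, 1, 1, 0, 0, 0, 0, 0, 0],   # A = AX + AY + AZ
    [0, 0, 0, 1, 1, 1, 0, 0, 0],   # B = BX + BY + BZ
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # C = CX + CY + CZ
])
S = np.vstack([agg_rows, np.eye(9)])   # 13 x 9

b_t = np.arange(1.0, 10.0)             # toy bottom-level observations
y_t = S @ b_t                          # full vector, top level down

assert y_t[0] == b_t.sum()             # Total aggregates everything
assert y_t[1] == b_t[:3].sum()         # A aggregates its own children
```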

Page 150: Automatic algorithms for time series forecasting

Forecasting notation

Let ŷn(h) be a vector of initial h-step forecasts, made at time n, stacked in the same order as yt. (They may not add up.)

Reconciled forecasts are of the form

    ỹn(h) = S P ŷn(h)

for some matrix P.

P extracts and combines the base forecasts ŷn(h) to get bottom-level forecasts.

S adds them up.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 60
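As a concrete special case (a sketch, not from the slides): bottom-up forecasting takes P to be the matrix that discards the aggregate base forecasts and keeps the bottom-level ones, so S P ŷn(h) simply re-aggregates the bottom-level forecasts:

```python
import numpy as np

# Bottom-up reconciliation written as  y_tilde = S @ P @ y_hat.
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)          # Total over A, B, C
P = np.hstack([np.zeros((3, 1)), np.eye(3)])    # drop the Total forecast

y_hat = np.array([100.0, 40.0, 35.0, 30.0])     # base forecasts (don't add up)
y_tilde = S @ P @ y_hat                         # reconciled forecasts

assert np.allclose(y_tilde[1:], y_hat[1:])      # bottom level unchanged
assert y_tilde[0] == y_hat[1:].sum()            # Total now adds up
```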

Page 155: Automatic algorithms for time series forecasting

General properties

    ỹn(h) = S P ŷn(h)

Forecast bias
Assuming the base forecasts ŷn(h) are unbiased, the revised forecasts are unbiased iff SPS = S.

Forecast variance
For any given P satisfying SPS = S, the covariance matrix of the h-step-ahead reconciled forecast errors is given by

    Var[yn+h − ỹn(h)] = S P Wh P′ S′

where Wh is the covariance matrix of the h-step-ahead base forecast errors.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 61

Page 158: Automatic algorithms for time series forecasting

BLUF via trace minimization

Theorem
For any P satisfying SPS = S,

    min_P trace[S P Wh P′ S′]

has solution P = (S′Wh†S)−1 S′Wh†, where Wh† is a generalized inverse of Wh.

    ỹn(h) = S (S′Wh†S)−1 S′Wh† ŷn(h)
    (revised forecasts)        (base forecasts)

Equivalent to the GLS estimate of the regression ŷn(h) = S βn(h) + εh, where εh ∼ N(0, Wh).

Problem: Wh is hard to estimate.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 62
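A small numerical sketch of the solution above, using an invertible diagonal W as a stand-in for Wh (so the generalized inverse is just the ordinary inverse); the variances are toy values chosen only for illustration:

```python
import numpy as np

# Trace-minimizing (GLS) reconciliation for the small Total/A/B/C hierarchy.
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
W = np.diag([4.0, 1.0, 1.5, 2.0])     # stand-in for Wh (toy error variances)
W_inv = np.linalg.inv(W)              # generalized inverse = inverse here

P = np.linalg.inv(S.T @ W_inv @ S) @ S.T @ W_inv
y_hat = np.array([100.0, 40.0, 35.0, 30.0])   # incoherent base forecasts
y_tilde = S @ P @ y_hat                       # reconciled forecasts

assert np.allclose(S @ P @ S, S)                  # unbiasedness: SPS = S
assert np.isclose(y_tilde[0], y_tilde[1:].sum())  # forecasts now add up
```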

Page 164: Automatic algorithms for time series forecasting

Optimal combination forecasts

    ỹn(h) = S (S′Wh†S)−1 S′Wh† ŷn(h)
    (revised forecasts)        (base forecasts)

Solution 1: OLS

    ỹn(h) = S (S′S)−1 S′ ŷn(h)

Solution 2: WLS
Approximate W1 by its diagonal, and assume Wh = kh W1. Easy to estimate, and places weight where we have the best one-step forecasts.

    ỹn(h) = S (S′ΛS)−1 S′Λ ŷn(h)

Automatic algorithms for time series forecasting Hierarchical and grouped time series 63
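Both solutions can be sketched on the same toy hierarchy; the Λ weights below are hypothetical inverse one-step error variances, chosen only for illustration:

```python
import numpy as np

# OLS and WLS reconciliation on the small Total/A/B/C hierarchy.
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
y_hat = np.array([100.0, 40.0, 35.0, 30.0])   # incoherent base forecasts

# Solution 1: OLS -- ignore the error variances entirely.
y_ols = S @ np.linalg.solve(S.T @ S, S.T @ y_hat)

# Solution 2: WLS -- Lambda = diag(1 / variance), toy variances.
Lam = np.diag(1.0 / np.array([4.0, 1.0, 1.5, 2.0]))
y_wls = S @ np.linalg.solve(S.T @ Lam @ S, S.T @ Lam @ y_hat)

for y in (y_ols, y_wls):
    assert np.isclose(y[0], y[1:].sum())      # both are coherent
```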

Page 171: Automatic algorithms for time series forecasting

Challenges

Computational difficulties in big hierarchies, due to the size of the S matrix and singular behavior of (S′ΛS).

Loss of information from ignoring the covariance matrix when computing point forecasts.

Still need to estimate the covariance matrix to produce prediction intervals.

    ỹn(h) = S (S′ΛS)−1 S′Λ ŷn(h)

Automatic algorithms for time series forecasting Hierarchical and grouped time series 64

Page 176: Automatic algorithms for time series forecasting

Australian tourism

Hierarchy:
  States (7)
  Zones (27)
  Regions (82)

Base forecasts: ETS (exponential smoothing) models.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 65

Page 177: Automatic algorithms for time series forecasting

Base forecasts

[Figures: domestic tourism base forecasts of visitor nights, 1998–2008, for the Total series and for NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly.]

Automatic algorithms for time series forecasting Hierarchical and grouped time series 66

Page 186: Automatic algorithms for time series forecasting

Reconciled forecasts

[Figures: reconciled forecasts of visitor nights, 2000–2010, for the Total series; for the states NSW, VIC, QLD and Other; and for Sydney, Other NSW, Melbourne, Other VIC, GC and Brisbane, Other QLD, Capital cities and Other.]

Automatic algorithms for time series forecasting Hierarchical and grouped time series 67

Page 189: Automatic algorithms for time series forecasting

Forecast evaluation

Select models using all observations.

Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts.

Increase the sample size one observation at a time, re-estimating the models and generating forecasts, until the end of the sample.

In total: 24 one-step-ahead forecasts, 23 two-steps-ahead, down to 17 eight-steps-ahead, for forecast evaluation.

Automatic algorithms for time series forecasting Hierarchical and grouped time series 68
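The rolling-origin scheme above can be sketched as follows; the naive last-value "model" is a hypothetical stand-in for the actual ETS models, used only to show the expanding-window loop and the resulting forecast counts:

```python
import numpy as np

def naive_forecast(train, h):
    """Hypothetical stand-in model: repeat the last observed value h times."""
    return np.repeat(train[-1], h)

series = np.sin(np.arange(36) / 3.0) + 5.0    # toy series, 36 observations
H, first = 8, 12                              # max horizon, initial window
errors = {h: [] for h in range(1, H + 1)}

# Expanding window: refit at each origin, forecast up to H steps ahead.
for origin in range(first, len(series)):
    train = series[:origin]
    fc = naive_forecast(train, H)
    for h in range(1, H + 1):
        if origin + h <= len(series):
            errors[h].append(series[origin + h - 1] - fc[h - 1])

assert len(errors[1]) == 24                   # 24 one-step-ahead errors
assert len(errors[8]) == 17                   # down to 17 eight-steps-ahead
```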

Page 193: Automatic algorithms for time series forecasting

Hierarchy: states, zones, regions

MAPE             h=1     h=2     h=4     h=6     h=8   Average

Top level: Australia
  Bottom-up     3.79    3.58    4.01    4.55    4.24     4.06
  OLS           3.83    3.66    3.88    4.19    4.25     3.94
  WLS           3.68    3.56    3.97    4.57    4.25     4.04

Level: States
  Bottom-up    10.70   10.52   10.85   11.46   11.27    11.03
  OLS          11.07   10.58   11.13   11.62   12.21    11.35
  WLS          10.44   10.17   10.47   10.97   10.98    10.67

Level: Zones
  Bottom-up    14.99   14.97   14.98   15.69   15.65    15.32
  OLS          15.16   15.06   15.27   15.74   16.15    15.48
  WLS          14.63   14.62   14.68   15.17   15.25    14.94

Bottom level: Regions
  Bottom-up    33.12   32.54   32.26   33.74   33.96    33.18
  OLS          35.89   33.86   34.26   36.06   37.49    35.43
  WLS          31.68   31.22   31.08   32.41   32.77    31.89

Automatic algorithms for time series forecasting Hierarchical and grouped time series 69

Page 194: Automatic algorithms for time series forecasting

hts package for R

Automatic algorithms for time series forecasting Hierarchical and grouped time series 70

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series.

Version: 4.5
Depends: forecast (≥ 5.0), SparseM
Imports: parallel, utils
Published: 2014-12-09
Author: Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)

Page 195: Automatic algorithms for time series forecasting

Outline

1 Motivation

2 Forecasting competitions

3 Exponential smoothing

4 ARIMA modelling

5 Automatic nonlinear forecasting?

6 Time series with complex seasonality

7 Hierarchical and grouped time series

8 Recent developments

Automatic algorithms for time series forecasting Recent developments 71

Page 196: Automatic algorithms for time series forecasting

Further competitions

1 2011 tourism forecasting competition.

2 Kaggle and other forecasting platforms.

3 GEFCom 2012: point forecasting of electricity load and wind power.

4 GEFCom 2014: probabilistic forecasting of electricity load, electricity price, wind energy and solar energy.

Automatic algorithms for time series forecasting Recent developments 72

Page 200: Automatic algorithms for time series forecasting

Forecasts about forecasting

1 Automatic algorithms will become more general, handling a wide variety of time series.

2 Model selection methods will take account of multi-step forecast accuracy as well as one-step forecast accuracy.

3 Automatic forecasting algorithms for multivariate time series will be developed.

4 Automatic forecasting algorithms that include covariate information will be developed.

Automatic algorithms for time series forecasting Recent developments 73

Page 204: Automatic algorithms for time series forecasting

For further information

robjhyndman.com

Slides and references for this talk.

Links to all papers and books.

Links to R packages.

A blog about forecasting research.

Automatic algorithms for time series forecasting Recent developments 74