2017 Beijing Workshop on Forecasting

Forecast Accuracy and Evaluation
Rob J Hyndman
robjhyndman.com/beijing2017
Outline
1 The statistical forecasting perspective
2 Some simple forecasting methods
3 Forecasting residuals
4 Measuring forecast accuracy
5 Time series cross-validation
6 Probability scoring
The statistical forecasting perspective
[Figure: Total international visitors to Australia, 1980–2020, in millions of visitors]
Forecasting is estimating how the sequence of observations will continue into the future.
Sample futures
[Figure: Total international visitors to Australia, with ten simulated sample futures (Future 1–10) overlaid on the data]
Forecast intervals
[Figure: Total international visitors to Australia, with 80% and 95% forecast intervals]
Statistical forecasting
Thing to be forecast: a random variable, $y_t$.

Forecast distributions:
$$y_{t|t-1} = y_t \mid \{y_1, y_2, \dots, y_{t-1}\}, \qquad y_{T+h|T} = y_{T+h} \mid \{y_1, y_2, \dots, y_T\}$$

The "point forecast" is the mean (or median) of $y_{T+h|T}$. The "forecast variance" is $\mathrm{Var}[y_{T+h} \mid y_1, y_2, \dots, y_T]$. A prediction interval or "interval forecast" is a range of values of $y_{T+h}$ with high probability.
Some simple forecasting methods
[Figure: Australian quarterly beer production, 1995–2005, in megalitres]

How would you forecast these data?
[Figure: Number of pigs slaughtered in Victoria, 1990–1995, in thousands]

How would you forecast these data?
[Figure: Google stock price over 800 trading days from 25 February 2013, in US$]

How would you forecast these data?
Some simple forecasting methods

Average method
Forecasts of all future values are equal to the mean of the historical data $\{y_1, \dots, y_T\}$.
Forecasts: $\hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_T)/T$.

Naïve method
Forecasts are equal to the last observed value.
Forecasts: $\hat{y}_{T+h|T} = y_T$.
A consequence of the efficient market hypothesis.

Seasonal naïve method
Forecasts are equal to the last value from the same season.
Forecasts: $\hat{y}_{T+h|T} = y_{T+h-km}$, where $m$ is the seasonal period and $k = \lfloor (h-1)/m \rfloor + 1$.
Drift method
Forecasts are equal to the last value plus the average change.
Forecasts:
$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + \frac{h}{T-1}(y_T - y_1).$$
Equivalent to extrapolating a line drawn between the first and last observations.
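All four benchmarks are built into the forecast package. Below is a minimal sketch, assuming the fpp2 package (which loads forecast and provides the ausbeer data shown in the plots); the subset and forecast horizon are illustrative.

library(fpp2)                           # loads the forecast package and data sets
beer <- window(ausbeer, start = 1995)   # illustrative subset of the beer data
meanf(beer, h = 8)                      # average method
naive(beer, h = 8)                      # naive method
snaive(beer, h = 8)                     # seasonal naive method
rwf(beer, h = 8, drift = TRUE)          # drift method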
[Figure: Forecasts for quarterly beer production from the mean, naïve, and seasonal naïve methods]
[Figure: Forecasts for the number of pigs slaughtered in Victoria from the mean, naïve, and seasonal naïve methods]
[Figure: Forecasts for the Google stock price from the drift, mean, and naïve methods]
Fitted values

$\hat{y}_{t|t-1}$ is the forecast of $y_t$ based on observations $y_1, \dots, y_{t-1}$. We call these "fitted values". We sometimes drop the subscript: $\hat{y}_t \equiv \hat{y}_{t|t-1}$. They are often not true forecasts, since the parameters are estimated on all of the data.

Examples:
1. $\hat{y}_t = \bar{y}$ for the average method.
2. $\hat{y}_t = y_{t-1}$ for the naïve method.
3. $\hat{y}_t = y_{t-1} + (y_T - y_1)/(T-1)$ for the drift method.
4. $\hat{y}_t = y_{t-m}$ for the seasonal naïve method.
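In R, the fitted values and residuals of any of these methods can be extracted with fitted() and residuals(); a small sketch, reusing the beer series from the earlier benchmark sketch:

fit <- snaive(beer)    # seasonal naive fit
head(fitted(fit))      # one-step fitted values (here equal to y[t-4])
head(residuals(fit))   # residuals e_t = y_t minus the fitted value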
Forecasting residuals

Residuals in forecasting: the difference between an observed value and its fitted value, $e_t = y_t - \hat{y}_{t|t-1}$.

Assumptions
1. $\{e_t\}$ are uncorrelated. If they aren't, there is information left in the residuals that should be used in computing the forecasts.
2. $\{e_t\}$ have mean zero. If they don't, the forecasts are biased.

Useful properties (for prediction intervals)
3. $\{e_t\}$ have constant variance.
4. $\{e_t\}$ are normally distributed.
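Recent versions of the forecast package bundle these residual diagnostics into a single call; a sketch, assuming the snaive fit from above:

checkresiduals(fit)   # time plot, ACF and histogram of the residuals,
                      # plus a Ljung-Box test for autocorrelation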
Example: Australian beer production

Seasonal naïve forecasts: $\hat{y}_{t|t-1} = y_{t-4}$, with residuals $e_t = y_t - y_{t-4}$.

[Figure: Australian quarterly beer production, 1995–2005, in megalitres]
[Figure: Residuals from the seasonal naïve method, 1995–2005, in megalitres]
[Figure: ACF of the residuals from the seasonal naïve method, lags 1–16]
[Figure: Histogram of the residuals from the seasonal naïve method]
Forecasting residuals

Minimizing the size of the forecasting residuals is how model parameters are estimated (e.g., by minimizing the MSE or maximizing the likelihood).
In general, forecasting residuals cannot be used (directly) for estimating forecast accuracy.
Forecast accuracy can only be measured using genuine forecasts, i.e., on different data.
Forecasting residuals can help suggest model improvements.
Training and test sets

[Diagram: time axis split into training data followed by test data]

The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.
A model which fits the data well does not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters. (Compare $R^2$.)
These problems can be overcome by measuring true out-of-sample forecast accuracy: the training set is used to estimate parameters, and forecasts are made for the test set.
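A minimal sketch of a training/test split in R, again assuming the fpp2 beer data; the split point is arbitrary and for illustration only:

train <- window(ausbeer, end = c(2007, 4))    # training set
test  <- window(ausbeer, start = c(2008, 1))  # held-out test set
fc <- snaive(train, h = length(test))         # forecasts over the test period
accuracy(fc, test)    # RMSE, MAE, MAPE, MASE on training and test sets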
Measures of forecast accuracy

Training set: $T$ observations. Test set: $H$ observations.

$$\text{MAE} = \frac{1}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right|$$

$$\text{MSE} = \frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2 \qquad \text{RMSE} = \sqrt{\frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2}$$

$$\text{MAPE} = \frac{100}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right| / \left|y_{T+h}\right|$$

MAE, MSE, and RMSE are all scale dependent. MAPE is scale independent, but is only sensible if $y_{T+h} \gg 0$ for all $h$ and $y$ has a natural zero.
Mean Absolute Scaled Error

$$\text{MASE} = \frac{1}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right| / Q$$

where $Q$ is a stable measure of the scale of the time series $\{y_t\}$. Proposed by Hyndman and Koehler (IJF, 2006). For non-seasonal time series,

$$Q = \frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|$$

works well. Then MASE is equivalent to MAE relative to a naïve method.
For seasonal time series,

$$Q = \frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|$$

works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.
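As a sketch, MASE can also be computed by hand for quarterly data ($m = 4$), reusing the hypothetical train/test split and snaive forecasts from the earlier accuracy() example:

m <- 4                                # seasonal period for quarterly data
Q <- mean(abs(diff(train, lag = m)))  # in-sample seasonal naive MAE
mean(abs(test - fc$mean)) / Q         # MASE on the test set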
[Figure: Forecasts for quarterly beer production from the mean, naïve, and seasonal naïve methods]
Measures of forecast accuracy

Method                   RMSE    MAE   MAPE   MASE
Mean method              38.5   34.8   8.28   2.44
Naïve method             62.7   57.4  14.18   4.01
Seasonal naïve method    14.3   13.4   3.17   0.94
Measures of forecast accuracy

Scaling can be used with any measure, and with different scaling statistics.

Mean Squared Scaled Error

$$\text{MSSE} = \frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2 / Q \qquad \text{where } Q = \frac{1}{T-m}\sum_{t=m+1}^{T} (y_t - y_{t-m})^2$$

Assumes $\{y_t\}$ is difference stationary.
Minimizing MSSE leads to conditional mean forecasts.
MSSE $< 1$: out-of-sample multi-step forecasts are more accurate than in-sample one-step forecasts.
Measures of forecast accuracy

Many suggested scale-free measures of forecast accuracy are degenerate due to infinite variance: the denominator must be positive with probability one.
The distributions of most measures are highly skewed when applied to real data.
Poll: true or false?

1. Good forecast methods should have normally distributed residuals.
2. A model with small residuals will give good forecasts.
3. The best measure of forecast accuracy is MAPE.
4. If your model doesn't forecast well, you should make it more complicated.
5. Always choose the model with the best forecast accuracy as measured on the test set.
Cross-validation

Traditional evaluation:
[Diagram: a single split of the time axis into training data followed by test data]

Leave-one-out cross-validation:
[Diagram: each observation in turn serves as the test set, with the remaining observations used for training]
Time series cross-validation:

[Diagram: a sequence of growing training sets, each followed immediately by its test observation further along the time axis]
[Diagrams: the same rolling-origin scheme for forecast horizons h = 1 to 6; each training set ends at a rolling origin, and the test observation lies h steps beyond it]
Also known as “Evaluation on a rolling forecast origin”
Time series cross-validation

Assume $k$ is the minimum number of observations needed for a training set.

1. Select observation $t+h$ for the test set, and use the observations at times $1, 2, \dots, t$ to estimate the model.
2. Compute the error on the forecast for time $t+h$.
3. Repeat for $t = k, k+1, \dots, T-h$, where $T$ is the total number of observations.
4. Compute the accuracy measure over all errors.
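The tsCV() function in the forecast package implements this procedure directly. A sketch comparing genuine cross-validated errors with ordinary residuals, assuming the goog series from fpp2:

e_cv  <- tsCV(goog, rwf, h = 1)     # rolling one-step errors, naive method
e_fit <- residuals(rwf(goog))       # in-sample residuals
sqrt(mean(e_cv^2,  na.rm = TRUE))   # cross-validated RMSE
sqrt(mean(e_fit^2, na.rm = TRUE))   # residual RMSE (optimistically small)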
Example: Pharmaceutical sales

[Figure: Antidiabetic drug sales, in $ million]
Example: Pharmaceutical sales

Which of these models is best?
- A linear model with trend and seasonal dummies applied to the log data.
- An ARIMA model applied to the log data.
- An ETS model applied to the original data.

Forecast $h = 12$ steps ahead based on data up to time $t$, for $t = 1, 2, \dots$. Compare the MAE values for each forecast horizon.
[Figure: MAE by forecast horizon (1–12) for the LM, ARIMA, and ETS models]
Example: R code

library(fpp2)   # provides forecast(), tsCV() and the a10 data

f1 <- function(y, h) {
  forecast(tslm(y ~ trend + season, lambda = 0), h = h)
}
f2 <- function(y, h) {
  forecast(auto.arima(y, D = 1, lambda = 0), h = h)
}
f3 <- function(y, h) {
  forecast(ets(y), h = h)
}
e1 <- tsCV(a10, f1, h = 12)
e2 <- tsCV(a10, f2, h = 12)
e3 <- tsCV(a10, f3, h = 12)
mae <- data.frame(
  LM    = colMeans(abs(e1), na.rm = TRUE),
  ARIMA = colMeans(abs(e2), na.rm = TRUE),
  ETS   = colMeans(abs(e3), na.rm = TRUE))
Hirotugu Akaike (1927–2009)
Akaike, H. (1974), “A new look at the statistical modelidentification”, IEEE Transactions on Automatic Control,19(6): 716–723.
Akaike's Information Criterion

$$\text{AIC} = -2\log(L) + 2k$$

where $L$ is the model likelihood and $k$ is the number of estimated parameters in the model.

If $L$ is Gaussian, then $\text{AIC} \approx c + T\log(\text{MSE}) + 2k$, where $c$ is a constant, MSE is from one-step forecasts on the training set, and $T$ is the length of the series.

Minimizing the Gaussian AIC is asymptotically equivalent (as $T \to \infty$) to minimizing the MSE from one-step forecasts on the test set via time series cross-validation.

AICc is a bias-corrected version for small samples. AIC/AICc is much faster to compute than cross-validation.
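In R, fitted model objects expose these criteria directly; a sketch using auto.arima() on a built-in series (the choice of data is arbitrary):

fit <- auto.arima(WWWusage)   # auto.arima() selects orders by minimizing AICc
fit$aic    # AIC of the selected model
fit$aicc   # bias-corrected small-sample version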
Probabilistic forecasting

[Figure: Google stock price over 800 trading days from 25 February 2013, in US$]
How to evaluate a forecast probability distribution?

- Forecast intervals: the percentage of observations covered, compared to the nominal percentage.
- Density forecasting
- Quantile forecasting
- Distribution forecasting
Probability Integral Transform

Let $F$ be the cdf of $Y$ and $Z = F(Y)$. If $F$ is continuous, then $Z$ is standard uniform.

[Figure: the probability integral transform, plotting $Z = F(Y)$ against $Y$]
Calibration

$Y_{T+h|T}$ has cdf $F_{T+h|T}$; this is our forecast cdf.

(a) $F$ is marginally calibrated if $E[F(y)] = P(Y \le y)$ for all $y \in \mathbb{R}$.
(b) $F$ is probabilistically calibrated if $Z = F(Y)$ has a standard uniform distribution.

→ We could plot a histogram of $Z = F(Y)$ and check that it looks uniform.
→ This is a more sophisticated version of testing whether prediction intervals have the correct coverage.
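A sketch of this probabilistic calibration check in base R: when the forecast cdf matches the data-generating distribution, the PIT histogram should look flat (all numbers below are illustrative):

set.seed(1)
y <- rnorm(1000, mean = 2, sd = 3)   # observations
z <- pnorm(y, mean = 2, sd = 3)      # Z = F(Y) under the forecast cdf
hist(z, breaks = 20)                 # roughly uniform if calibrated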
Sharpness

→ A "sharp" forecast distribution has narrow prediction intervals.

A good probabilistic forecast is both calibrated and sharp. Scoring rules combine calibration and sharpness in a single measure.
Scoring rules

$Y_{T+h|T}$ has cdf $F_{T+h|T}$, our forecast cdf. A scoring rule assigns a numerical score $S(F_{T+h|T}, y_{T+h})$.

Dawid–Sebastiani score:
$$\text{DSS}(F, y) = \frac{(y - \mu_F)^2}{\sigma_F^2} + 2\log\sigma_F$$
A generalization of MSE assuming normality.

A "proper" scoring rule has the property
$$E_F[S(F, Y)] \le E_F[S(G, Y)] \quad \text{for every forecast distribution } G,$$
so that, in expectation, no forecast scores better than the true distribution $F$.
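The DSS needs only a forecast mean and standard deviation, so it is simple to compute directly; a minimal sketch with illustrative values:

dss <- function(y, mu, sigma) {
  # Dawid-Sebastiani score of observation y under a N(mu, sigma^2) forecast
  (y - mu)^2 / sigma^2 + 2 * log(sigma)
}
dss(y = 5.2, mu = 5, sigma = 0.5)   # lower scores are better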
Continuous Ranked Probability Score:
$$\text{CRPS}(F, y) = \int \left[F(x) - 1\{y \le x\}\right]^2 dx = E_F|Y - y| - \tfrac{1}{2}E_F|Y - Y'|$$
where $Y$ and $Y'$ are independent with cdf $F$. A generalization of MAE.

Equivalently, in terms of the forecast quantile function $Q_{T+h|T} = F^{-1}_{T+h|T}$:
$$\text{CRPS}(Q, y) = 2\int_0^1 \left[Q(p) - y\right]\left[1\{y < Q(p)\} - p\right] dp$$
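The expectation form suggests a simple Monte Carlo estimate of the CRPS from a sample of the forecast distribution; a sketch with illustrative values (the helper name crps_sample is ours):

crps_sample <- function(y, draws) {
  # CRPS(F, y) = E|Y - y| - 0.5 * E|Y - Y'|, estimated from draws of F
  mean(abs(draws - y)) - 0.5 * mean(abs(outer(draws, draws, "-")))
}
set.seed(1)
draws <- rnorm(2000, mean = 5, sd = 1)   # sample from the forecast distribution
crps_sample(y = 5.8, draws = draws)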
Energy score:
$$\text{ES}(F, y) = E_F|Y - y|^\alpha - \tfrac{1}{2}E_F|Y - Y'|^\alpha$$
where $Y$ and $Y'$ have cdf $F$ and $\alpha \in (0, 2]$.

Log score:
$$\text{logS}(F, y) = -\log f(y)$$
where $f = dF/dy$ is the density corresponding to $F$.
R packages
tsibble: https://github.com/earowang/tsibble
sugrrants: http://pkg.earo.me/sugrrants
fasster: https://github.com/mitchelloharawild/fasster
forecast: http://pkg.robjhyndman.com/forecast
hts: http://pkg.earo.me/hts
Textbook
Hyndman, R.J., & Athanasopoulos, G., Forecasting: Principles and Practice (2nd ed.), OTexts.org/fpp2