2017 Beijing Workshop on Forecasting

Forecast Accuracy and Evaluation
Rob J Hyndman
robjhyndman.com/beijing2017
Outline
1 The statistical forecasting perspective
2 Some simple forecasting methods
3 Forecasting residuals
4 Measuring forecast accuracy
5 Time series cross-validation
6 Probability scoring
The statistical forecasting perspective
[Figure: Total international visitors to Australia, 1980–2020, in millions of visitors]
Forecasting is estimating how the sequence of observations will continue into the future.
Sample futures
[Figure: Total international visitors to Australia, with ten simulated sample futures (Future 1–10) overlaid on the data]
Forecast intervals
[Figure: Total international visitors to Australia, with 80% and 95% forecast intervals]
Statistical forecasting
Thing to be forecast: a random variable, $y_t$.

Forecast distributions:
$$y_{t|t-1} = y_t \mid \{y_1, y_2, \dots, y_{t-1}\}, \qquad y_{T+h|T} = y_{T+h} \mid \{y_1, y_2, \dots, y_T\}$$

The "point forecast" is the mean (or median) of $y_{T+h|T}$. The "forecast variance" is $\mathrm{Var}[y_{T+h} \mid y_1, y_2, \dots, y_T]$. A prediction interval or "interval forecast" is a range of values of $y_{T+h}$ with high probability.
Some simple forecasting methods
[Figure: Australian quarterly beer production, 1995–2005, in megalitres]

How would you forecast these data?
[Figure: Number of pigs slaughtered in Victoria, 1990–1995, in thousands]

How would you forecast these data?
[Figure: Google stock price over 800 trading days from 25 February 2013, in US$]

How would you forecast these data?
Some simple forecasting methods

Average method
Forecasts of all future values are equal to the mean of the historical data $\{y_1, \dots, y_T\}$.
Forecasts: $\hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_T)/T$.

Naïve method
Forecasts are equal to the last observed value.
Forecasts: $\hat{y}_{T+h|T} = y_T$.
A consequence of the efficient market hypothesis.

Seasonal naïve method
Forecasts are equal to the last value from the same season.
Forecasts: $\hat{y}_{T+h|T} = y_{T+h-km}$, where $m$ is the seasonal period and $k = \lfloor (h-1)/m \rfloor + 1$.
Drift method
Forecasts are equal to the last value plus the average change.
Forecasts:
$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + \frac{h}{T-1}(y_T - y_1).$$
Equivalent to extrapolating a line drawn between the first and last observations.
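All four benchmarks are built into the forecast package. Below is a minimal sketch, assuming the fpp2 package (which loads forecast and provides the ausbeer data shown in the plots); the subset and forecast horizon are illustrative.

library(fpp2)                           # loads the forecast package and data sets
beer <- window(ausbeer, start = 1995)   # illustrative subset of the beer data
meanf(beer, h = 8)                      # average method
naive(beer, h = 8)                      # naive method
snaive(beer, h = 8)                     # seasonal naive method
rwf(beer, h = 8, drift = TRUE)          # drift method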
[Figure: Forecasts for quarterly beer production from the mean, naïve, and seasonal naïve methods]
[Figure: Forecasts for the number of pigs slaughtered in Victoria from the mean, naïve, and seasonal naïve methods]
[Figure: Forecasts for the Google stock price from the drift, mean, and naïve methods]
Fitted values

$\hat{y}_{t|t-1}$ is the forecast of $y_t$ based on observations $y_1, \dots, y_{t-1}$. We call these "fitted values". We sometimes drop the subscript: $\hat{y}_t \equiv \hat{y}_{t|t-1}$. They are often not true forecasts, since the parameters are estimated on all of the data.

Examples:
1. $\hat{y}_t = \bar{y}$ for the average method.
2. $\hat{y}_t = y_{t-1}$ for the naïve method.
3. $\hat{y}_t = y_{t-1} + (y_T - y_1)/(T-1)$ for the drift method.
4. $\hat{y}_t = y_{t-m}$ for the seasonal naïve method.
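In R, the fitted values and residuals of any of these methods can be extracted with fitted() and residuals(); a small sketch, reusing the beer series from the earlier benchmark sketch:

fit <- snaive(beer)    # seasonal naive fit
head(fitted(fit))      # one-step fitted values (here equal to y[t-4])
head(residuals(fit))   # residuals e_t = y_t minus the fitted value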
Forecasting residuals

Residuals in forecasting: the difference between an observed value and its fitted value, $e_t = y_t - \hat{y}_{t|t-1}$.

Assumptions
1. $\{e_t\}$ are uncorrelated. If they aren't, there is information left in the residuals that should be used in computing the forecasts.
2. $\{e_t\}$ have mean zero. If they don't, the forecasts are biased.

Useful properties (for prediction intervals)
3. $\{e_t\}$ have constant variance.
4. $\{e_t\}$ are normally distributed.
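Recent versions of the forecast package bundle these residual diagnostics into a single call; a sketch, assuming the snaive fit from above:

checkresiduals(fit)   # time plot, ACF and histogram of the residuals,
                      # plus a Ljung-Box test for autocorrelation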
Example: Australian beer production

Seasonal naïve forecasts: $\hat{y}_{t|t-1} = y_{t-4}$, with residuals $e_t = y_t - y_{t-4}$.

[Figure: Australian quarterly beer production, 1995–2005, in megalitres]
[Figure: Residuals from the seasonal naïve method, 1995–2005, in megalitres]
[Figure: ACF of the residuals from the seasonal naïve method, lags 1–16]
[Figure: Histogram of the residuals from the seasonal naïve method]
Forecasting residuals

Minimizing the size of the forecasting residuals is how model parameters are estimated (e.g., by minimizing the MSE or maximizing the likelihood).
In general, forecasting residuals cannot be used (directly) for estimating forecast accuracy.
Forecast accuracy can only be measured using genuine forecasts, i.e., on different data.
Forecasting residuals can help suggest model improvements.
Training and test sets

[Diagram: time axis split into training data followed by test data]

The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.
A model which fits the data well does not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters. (Compare $R^2$.)
These problems can be overcome by measuring true out-of-sample forecast accuracy: the training set is used to estimate parameters, and forecasts are made for the test set.
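A minimal sketch of a training/test split in R, again assuming the fpp2 beer data; the split point is arbitrary and for illustration only:

train <- window(ausbeer, end = c(2007, 4))    # training set
test  <- window(ausbeer, start = c(2008, 1))  # held-out test set
fc <- snaive(train, h = length(test))         # forecasts over the test period
accuracy(fc, test)    # RMSE, MAE, MAPE, MASE on training and test sets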
Measures of forecast accuracy

Training set: $T$ observations. Test set: $H$ observations.

$$\text{MAE} = \frac{1}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right|$$

$$\text{MSE} = \frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2 \qquad \text{RMSE} = \sqrt{\frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2}$$

$$\text{MAPE} = \frac{100}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right| / \left|y_{T+h}\right|$$

MAE, MSE, and RMSE are all scale dependent. MAPE is scale independent, but is only sensible if $y_{T+h} \gg 0$ for all $h$ and $y$ has a natural zero.
Mean Absolute Scaled Error

$$\text{MASE} = \frac{1}{H}\sum_{h=1}^{H} \left|y_{T+h} - \hat{y}_{T+h|T}\right| / Q$$

where $Q$ is a stable measure of the scale of the time series $\{y_t\}$. Proposed by Hyndman and Koehler (IJF, 2006). For non-seasonal time series,

$$Q = \frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|$$

works well. Then MASE is equivalent to MAE relative to a naïve method.
For seasonal time series,

$$Q = \frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|$$

works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.
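As a sketch, MASE can also be computed by hand for quarterly data ($m = 4$), reusing the hypothetical train/test split and snaive forecasts from the earlier accuracy() example:

m <- 4                                # seasonal period for quarterly data
Q <- mean(abs(diff(train, lag = m)))  # in-sample seasonal naive MAE
mean(abs(test - fc$mean)) / Q         # MASE on the test set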
[Figure: Forecasts for quarterly beer production from the mean, naïve, and seasonal naïve methods]
Measures of forecast accuracy

Method                   RMSE    MAE   MAPE   MASE
Mean method              38.5   34.8   8.28   2.44
Naïve method             62.7   57.4  14.18   4.01
Seasonal naïve method    14.3   13.4   3.17   0.94
Measures of forecast accuracy

Scaling can be used with any measure, and with different scaling statistics.

Mean Squared Scaled Error

$$\text{MSSE} = \frac{1}{H}\sum_{h=1}^{H} \left(y_{T+h} - \hat{y}_{T+h|T}\right)^2 / Q \qquad \text{where } Q = \frac{1}{T-m}\sum_{t=m+1}^{T} (y_t - y_{t-m})^2$$

Assumes $\{y_t\}$ is difference stationary.
Minimizing MSSE leads to conditional mean forecasts.
MSSE $< 1$: out-of-sample multi-step forecasts are more accurate than in-sample one-step forecasts.
Measures of forecast accuracy

Many suggested scale-free measures of forecast accuracy are degenerate due to infinite variance: the denominator must be positive with probability one.
The distributions of most measures are highly skewed when applied to real data.
Poll: true or false?

1. Good forecast methods should have normally distributed residuals.
2. A model with small residuals will give good forecasts.
3. The best measure of forecast accuracy is MAPE.
4. If your model doesn't forecast well, you should make it more complicated.
5. Always choose the model with the best forecast accuracy as measured on the test set.
Cross-validation

Traditional evaluation:
[Diagram: a single split of the time axis into training data followed by test data]

Leave-one-out cross-validation:
[Diagram: each observation in turn serves as the test set, with the remaining observations used for training]
Time series cross-validation:

[Diagram: a sequence of growing training sets, each followed immediately by its test observation further along the time axis]
[Diagrams: the same rolling-origin scheme for forecast horizons h = 1 to 6; each training set ends at a rolling origin, and the test observation lies h steps beyond it]
Also known as “Evaluation on a rolling forecast origin”
Time series cross-validation

Assume $k$ is the minimum number of observations needed for a training set.

1. Select observation $t+h$ for the test set, and use the observations at times $1, 2, \dots, t$ to estimate the model.
2. Compute the error on the forecast for time $t+h$.
3. Repeat for $t = k, k+1, \dots, T-h$, where $T$ is the total number of observations.
4. Compute the accuracy measure over all errors.
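The tsCV() function in the forecast package implements this procedure directly. A sketch comparing genuine cross-validated errors with ordinary residuals, assuming the goog series from fpp2:

e_cv  <- tsCV(goog, rwf, h = 1)     # rolling one-step errors, naive method
e_fit <- residuals(rwf(goog))       # in-sample residuals
sqrt(mean(e_cv^2,  na.rm = TRUE))   # cross-validated RMSE
sqrt(mean(e_fit^2, na.rm = TRUE))   # residual RMSE (optimistically small)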
Example: Pharmaceutical sales

[Figure: Antidiabetic drug sales, in $ million]
Example: Pharmaceutical sales

Which of these models is best?
- A linear model with trend and seasonal dummies applied to the log data.
- An ARIMA model applied to the log data.
- An ETS model applied to the original data.

Forecast $h = 12$ steps ahead based on data up to time $t$, for $t = 1, 2, \dots$. Compare the MAE values for each forecast horizon.
[Figure: MAE by forecast horizon (1–12) for the LM, ARIMA, and ETS models]
Example: R code

library(fpp2)   # provides forecast(), tsCV() and the a10 data

f1 <- function(y, h) {
  forecast(tslm(y ~ trend + season, lambda = 0), h = h)
}
f2 <- function(y, h) {
  forecast(auto.arima(y, D = 1, lambda = 0), h = h)
}
f3 <- function(y, h) {
  forecast(ets(y), h = h)
}
e1 <- tsCV(a10, f1, h = 12)
e2 <- tsCV(a10, f2, h = 12)
e3 <- tsCV(a10, f3, h = 12)
mae <- data.frame(
  LM    = colMeans(abs(e1), na.rm = TRUE),
  ARIMA = colMeans(abs(e2), na.rm = TRUE),
  ETS   = colMeans(abs(e3), na.rm = TRUE))
Hirotugu Akaike (1927–2009)
Akaike, H. (1974), “A new look at the statistical modelidentification”, IEEE Transactions on Automatic Control,19(6): 716–723.
Akaike's Information Criterion

$$\text{AIC} = -2\log(L) + 2k$$

where $L$ is the model likelihood and $k$ is the number of estimated parameters in the model.

If $L$ is Gaussian, then $\text{AIC} \approx c + T\log(\text{MSE}) + 2k$, where $c$ is a constant, MSE is from one-step forecasts on the training set, and $T$ is the length of the series.

Minimizing the Gaussian AIC is asymptotically equivalent (as $T \to \infty$) to minimizing the MSE from one-step forecasts on the test set via time series cross-validation.

AICc is a bias-corrected version for small samples. AIC/AICc is much faster to compute than cross-validation.
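In R, fitted model objects expose these criteria directly; a sketch using auto.arima() on a built-in series (the choice of data is arbitrary):

fit <- auto.arima(WWWusage)   # auto.arima() selects orders by minimizing AICc
fit$aic    # AIC of the selected model
fit$aicc   # bias-corrected small-sample version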
Probabilistic forecasting

[Figure: Google stock price over 800 trading days from 25 February 2013, in US$]
How to evaluate a forecast probability distribution?

- Forecast intervals: the percentage of observations covered, compared to the nominal percentage.
- Density forecasting
- Quantile forecasting
- Distribution forecasting
Probability Integral Transform

Let $F$ be the cdf of $Y$ and $Z = F(Y)$. If $F$ is continuous, then $Z$ is standard uniform.

[Figure: the probability integral transform, plotting $Z = F(Y)$ against $Y$]
Calibration

$Y_{T+h|T}$ has cdf $F_{T+h|T}$; this is our forecast cdf.

(a) $F$ is marginally calibrated if $E[F(y)] = P(Y \le y)$ for all $y \in \mathbb{R}$.
(b) $F$ is probabilistically calibrated if $Z = F(Y)$ has a standard uniform distribution.

→ We could plot a histogram of $Z = F(Y)$ and check that it looks uniform.
→ This is a more sophisticated version of testing whether prediction intervals have the correct coverage.
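A sketch of this probabilistic calibration check in base R: when the forecast cdf matches the data-generating distribution, the PIT histogram should look flat (all numbers below are illustrative):

set.seed(1)
y <- rnorm(1000, mean = 2, sd = 3)   # observations
z <- pnorm(y, mean = 2, sd = 3)      # Z = F(Y) under the forecast cdf
hist(z, breaks = 20)                 # roughly uniform if calibrated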
Sharpness

→ A "sharp" forecast distribution has narrow prediction intervals.

A good probabilistic forecast is both calibrated and sharp. Scoring rules combine calibration and sharpness in a single measure.
Scoring rules

$Y_{T+h|T}$ has cdf $F_{T+h|T}$, our forecast cdf. A scoring rule assigns a numerical score $S(F_{T+h|T}, y_{T+h})$.

Dawid–Sebastiani score:
$$\text{DSS}(F, y) = \frac{(y - \mu_F)^2}{\sigma_F^2} + 2\log\sigma_F$$
A generalization of MSE assuming normality.

A "proper" scoring rule has the property
$$E_F[S(F, Y)] \le E_F[S(G, Y)] \quad \text{for every forecast distribution } G,$$
so that, in expectation, no forecast scores better than the true distribution $F$.
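The DSS needs only a forecast mean and standard deviation, so it is simple to compute directly; a minimal sketch with illustrative values:

dss <- function(y, mu, sigma) {
  # Dawid-Sebastiani score of observation y under a N(mu, sigma^2) forecast
  (y - mu)^2 / sigma^2 + 2 * log(sigma)
}
dss(y = 5.2, mu = 5, sigma = 0.5)   # lower scores are better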
Continuous Ranked Probability Score:
$$\text{CRPS}(F, y) = \int \left[F(x) - 1\{y \le x\}\right]^2 dx = E_F|Y - y| - \tfrac{1}{2}E_F|Y - Y'|$$
where $Y$ and $Y'$ are independent with cdf $F$. A generalization of MAE.

Equivalently, in terms of the forecast quantile function $Q_{T+h|T} = F^{-1}_{T+h|T}$:
$$\text{CRPS}(Q, y) = 2\int_0^1 \left[Q(p) - y\right]\left[1\{y < Q(p)\} - p\right] dp$$
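The expectation form suggests a simple Monte Carlo estimate of the CRPS from a sample of the forecast distribution; a sketch with illustrative values (the helper name crps_sample is ours):

crps_sample <- function(y, draws) {
  # CRPS(F, y) = E|Y - y| - 0.5 * E|Y - Y'|, estimated from draws of F
  mean(abs(draws - y)) - 0.5 * mean(abs(outer(draws, draws, "-")))
}
set.seed(1)
draws <- rnorm(2000, mean = 5, sd = 1)   # sample from the forecast distribution
crps_sample(y = 5.8, draws = draws)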
Energy score:
$$\text{ES}(F, y) = E_F|Y - y|^\alpha - \tfrac{1}{2}E_F|Y - Y'|^\alpha$$
where $Y$ and $Y'$ have cdf $F$ and $\alpha \in (0, 2]$.

Log score:
$$\text{logS}(F, y) = -\log f(y)$$
where $f = dF/dy$ is the density corresponding to $F$.
R packages
tsibble: https://github.com/earowang/tsibble
sugrrants: http://pkg.earo.me/sugrrants
fasster: https://github.com/mitchelloharawild/fasster
forecast: http://pkg.robjhyndman.com/forecast
hts: http://pkg.earo.me/hts
Textbook
Hyndman, R.J., & Athanasopoulos, G., Forecasting: Principles and Practice (2nd ed.), OTexts.org/fpp2