frequency independent automatic input variable selection
TRANSCRIPT
Universität Hamburg Institut für Wirtschaftsinformatik
Prof. Dr. D.B. Preßmar
Frequency independent automatic input variable
selection for neural networks for forecasting
www.lancs.ac.uk
Nikolaos Kourentzes, Sven F. Crone
LUMS – Department of Management Science
Motivation
► Large numbers of univariate time series often need to be forecast
automatically in business and other contexts [Hyndman & Khandakar, 08]
Large Scale Automatic Forecasting Problems
10,000+ products → Forecast daily
► Questions: Appropriate forecasting method? Correct specification? [Goodrich, 00]
► Typically time series periodicity is provided by experts → Fully automatic?
Automatic forecasting → Necessary!
Motivation – Why Neural Networks?
NN in Business Time Series Forecasting
► Promising performance: 64% (out of 126) articles found ANNs outperforming
benchmarks [Kourentzes, 10] (73% according to Adya & Collopy, 98)
► Large scale studies (100+ time series): NN at least as good as benchmarks
[Hill et al., 96, Liao & Fildes, 05] → Evidence of automatic forecasting with NNs [Crone & Kourentzes, 10]
► Forecasting Competitions (M3) → NN lower accuracy than statistical models [Makridakis & Hibon, 00]
► NN produce unreliable forecasts → criticised to "offer little promise even after
much research" [Armstrong, 06]
Why? What does NN research suggest?
Motivation – Focus on the input vector
Modelling complexity gives rise to the problems
► Problem caused by inconsistent trial-and-error modelling approaches [Zhang et al., 98]
► Input variable selection
→ The most important issue in forecasting with NN [Zhang, 01, Zhang et al., 01, Zhang et al., 98, Darbellay and Slama, 00]
→ No widely accepted methodology on how to select the input variables [Anders and Korn, 99, Zhang et al., 98]
► Fully automatic input selection implies knowledge of time series
frequency/periodicities → Ignored in "automated forecasting applications"
Focus on frequency identification & input variable selection
Iterative Neural Filter
A methodology to identify seasonal frequencies and inputs
► Step 1. Identify seasonal frequencies using the Iterative Neural Filter (INF)
► Step 2. Identify lagged inputs
► Step 3. Fit model & produce forecasts
► Split time series into different possible seasonalities → find mean Euclidean
distance
Iterative Neural Filter – Step 1. Identify seasonal frequencies
Euclidean Distance to Identify Seasonality
[Figure: Y = sin(2πt/12) split into sub-series of candidate length s; mean Euclidean distance between sub-series: s = 5 → 0.847, s = 12 → 0, s = 19 → 0.962, s = 24 → 0]
Multiple seasonalities (12, 24, 36, ...) → Identification problem
► Split time series into different possible seasonalities (periodicities) → find
mean Euclidean distance
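The split-and-compare idea above can be sketched in a few lines of Python. This is a toy illustration, not the authors' code; the function name and the 120-point test sine are assumptions:

```python
import math

def mean_euclidean_distance(y, s):
    """Split y into consecutive sub-series of length s and return the
    mean pairwise Euclidean distance between them (near 0 when s
    matches the true periodicity of a noise-free series)."""
    segments = [y[i * s:(i + 1) * s] for i in range(len(y) // s)]
    if len(segments) < 2:
        return float("inf")
    total, pairs = 0.0, 0
    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            total += math.dist(segments[a], segments[b])
            pairs += 1
    return total / pairs

# Monthly toy signal from the slide: Y = sin(2*pi*t/12)
y = [math.sin(2 * math.pi * t / 12) for t in range(120)]
for s in (5, 12, 19, 24):
    print(s, round(mean_euclidean_distance(y, s), 3))
```

As on the slide, the distance vanishes at the true season (12) but also at all its multiples (24, 36, ...), which is what motivates the penalised distance next.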
Iterative Neural Filter – Step 1. Identify seasonal frequencies
Penalised Euclidean Distance
[Figure: two example series split at candidate seasons 4 and 7; mean Euclidean distances 391.6 and 171.2; penalised distance plotted against candidate season]

Penalised Euclidean Distance:  D_p(s) = log(1 + D_s) − τ·log(s)
Identify seasonality avoiding multiples
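The penalty can be written directly from the formula above (notation reconstructed from the slide; the weight τ = 0.5 and the example values are assumptions for illustration):

```python
import math

def penalised_distance(d_s, s, tau=0.5):
    """Penalised Euclidean distance D_p(s) = log(1 + D_s) - tau * log(s),
    where D_s is the mean Euclidean distance for candidate season s.
    The log(s) term discounts the growth of raw distances with segment
    length, helping to separate a season from its multiples."""
    return math.log(1.0 + d_s) - tau * math.log(s)

# Example: candidate season 5 with raw distance 0.847 (slide values)
print(round(penalised_distance(0.847, 5), 3))
```

The weight τ trades off fit against season length; 0.5 here is an assumed default, not a value given on the slide.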
Iterative Neural Filter – Step 1. Identify seasonal frequencies
[Figure: INF network with inputs I1, I2, hidden units H1 ... Hn, and output Y]

Deterministic inputs for the identified seasonality S:
ψ1(t) = sin(2πt/S)
ψ2(t) = cos(2πt/S)
The iterative neural filter removes each identified seasonality → explore remaining information for additional seasonalities
[Figure: INF network extended with additional inputs I3, I4]

If more seasonalities are identified, add more inputs in each iteration:
ψ1(t) = sin(2πt/S)
ψ2(t) = cos(2πt/S)
ψ3(t) = t/S
ψ4(t) = N − t + 1   (trends, level shifts, etc.)
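These deterministic inputs can be generated as a design matrix. A minimal sketch, assuming the ψ definitions above; the function name and toy sizes (N = 24, S = 12) are illustrative:

```python
import math

def deterministic_inputs(N, S):
    """Deterministic INF inputs for an identified seasonality S over a
    series of length N: the seasonal sine/cosine pair psi1, psi2 plus
    the trend terms psi3, psi4 (trends, level shifts, etc.)."""
    rows = []
    for t in range(1, N + 1):
        rows.append([
            math.sin(2 * math.pi * t / S),  # psi1(t) = sin(2*pi*t/S)
            math.cos(2 * math.pi * t / S),  # psi2(t) = cos(2*pi*t/S)
            t / S,                          # psi3(t) = t/S
            N - t + 1,                      # psi4(t) = N - t + 1
        ])
    return rows

# Two years of monthly data with S = 12
X = deterministic_inputs(N=24, S=12)
```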
Iterative Neural Filter – Step 1. Identify seasonal frequencies
Iteration 1:
[Figure: input time series; penalised Euclidean distance by candidate season; INF output]

Subtract the INF output from the input time series and repeat.

Iteration 2:
[Figure: residual time series; penalised distance by candidate season]
Season = 1 → Stop!
Iterative Neural Filter – Step 2. Identify inputs
Fit two competing regressions:

Deterministic:  Ŷ_t = a + M_t + Σ_{j=1..N} Σ_{i=1..S_j} b_ji · D_ji,t + ε_t
(D_ji,t: seasonal dummies; M_t: moving average of order max(S_j))

Stochastic:  Ŷ_t = a + Σ_{j=1..N} b_j · Y_{t−S_j} + ε_t

→ Compare using AIC
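The AIC comparison between the two candidate forms can be sketched with a pure-Python toy. Assumptions: a single quarterly seasonality, the moving-average level term M_t omitted for brevity, one seasonal lag in the stochastic form, and an invented noisy seasonal series:

```python
import math

def aic(sse, n, k):
    """Akaike Information Criterion for a least-squares fit."""
    return n * math.log(sse / n) + 2 * k

def fit_deterministic(y, S):
    """Seasonal-dummy regression: with one dummy per phase the OLS
    coefficients reduce to the per-phase means."""
    tot, cnt = [0.0] * S, [0] * S
    for t, v in enumerate(y):
        tot[t % S] += v
        cnt[t % S] += 1
    means = [a / b for a, b in zip(tot, cnt)]
    sse = sum((v - means[t % S]) ** 2 for t, v in enumerate(y))
    return aic(sse, len(y), S)

def fit_stochastic(y, S):
    """Seasonal-lag regression Y_t = a + b * Y_{t-S} + e_t."""
    x, z = y[:-S], y[S:]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    num = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    den = sum((xi - mx) ** 2 for xi in x)
    b = num / den
    a = mz - b * mx
    sse = sum((zi - (a + b * xi)) ** 2 for xi, zi in zip(x, z))
    return aic(sse, n, 2)

# Toy quarterly series: fixed seasonal pattern plus small pseudo-noise
pattern = [10.0, 12.0, 8.0, 11.0]
seed, y = 1, []
for t in range(120):
    seed = (1103515245 * seed + 12345) % 2 ** 31
    y.append(pattern[t % 4] + (seed / 2 ** 31 - 0.5) * 0.4)

print("deterministic" if fit_deterministic(y, 4) < fit_stochastic(y, 4)
      else "stochastic")
```

For this fixed seasonal pattern the deterministic form achieves the lower AIC, as expected; a series whose seasonality drifts would favour the stochastic lag form instead.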
Iterative Neural Filter – Step 2. Identify inputs
► Use stepwise regression
→ Force as initial inputs the pre-identified inputs
(deterministic/stochastic)
► Input vector identified
Experimental Setup – Time series
► Synthetic time series:
• Deterministic / Stochastic
• Four different levels of noise (None, Low, Medium, High)
• Quarterly and monthly seasonality & day-of-the-week and yearly double seasonality
• Total time series: 520
► Real time series:
• US air passenger miles
• Average bus ridership for Portland Oregon
• Total number of room nights and takings in Victoria
• Number of serious injuries and deaths in UK road accidents
Results
► Use INF to identify inputs for neural networks (Primed_NN)
► Use NNs with automatically identified inputs as benchmarks (Stoch_NN)
→ Inputs identified using regression [Swanson & White, 98, Kourentzes, 10]
► Use exponential smoothing as statistical benchmark (EXSM)
→ Robust and accurate benchmark [Makridakis & Hibon, 00]
→ Use INF seasonality output to set up seasonal models
         Synthetic Data                       Real Data
Subset   Primed_NN  Stoch_NN  EXSM    Subset  Primed_NN  Stoch_NN  EXSM
Train    7.25%      7.45%     7.68%   Train   8.52%      7.86%     7.82%
Valid    7.16%      7.47%     7.52%   Valid   5.06%      5.91%     6.83%
Test     7.37%      7.70%     7.47%   Test    7.86%      11.95%    10.72%
Conclusions
► Proposed methodology identifies seasonal frequencies and inputs for neural
networks automatically
► Outperforms statistical and neural network benchmarks
► INF is useful to fully automate other forecasting methods → seasonal frequency
identification without the need for human experts
► Future work: Introduce stochastic elements in the INF to separate the seasonal
components more accurately
Nikolaos Kourentzes
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel. +44 (0) 7960271368
email [email protected]