frequency independent automatic input variable selection


Page 1: Frequency independent automatic input variable selection

Universität Hamburg Institut für Wirtschaftsinformatik

Prof. Dr. D.B. Preßmar

Frequency independent automatic input variable

selection for neural networks for forecasting

www.lancs.ac.uk

Nikolaos Kourentzes, Sven F. Crone

LUMS – Department of Management Science

Page 2: Frequency independent automatic input variable selection

Motivation

► Large numbers of univariate time series often need to be forecast

automatically in business and other contexts [Hyndman & Khandakar, 08]

Large Scale Automatic Forecasting Problems

10,000+ products, forecast daily

► Questions: Appropriate forecasting method? Correct specification? [Goodrich, 00]

► Typically the time series periodicity is provided by experts → Fully automatic?

Automatic forecasting → Necessary!

Page 3: Frequency independent automatic input variable selection

Motivation – Why Neural Networks?

► Promising performance: 64% of 126 reviewed articles found ANNs outperforming

benchmarks [Kourentzes, 10] (73% according to Adya & Collopy, 98)

NN in Business Time Series Forecasting

► Large scale studies (100+ time series): NN at least as good as benchmarks [Hill et al., 96, Liao & Fildes, 05]

→ Evidence of automatic forecasting with NNs [Crone & Kourentzes, 10]

► Forecasting Competitions (M3) → NN lower accuracy than statistical models [Makridakis & Hibon, 00]

► NN produce unreliable forecasts → criticised to “offer little promise even after

much research” [Armstrong, 06]

Why? What does NN research suggest?

Page 4: Frequency independent automatic input variable selection

Motivation – Focus on the input vector

Modelling complexity gives rise to the problems

► Problem caused by inconsistent trial-and-error modelling approaches [Zhang et al., 98]

► Input variable selection

→ The most important issue in forecasting with NN [Zhang, 01, Zhang et al., 01, Zhang et al., 98, Darbellay and Slama, 00]

→ No widely accepted methodology on how to select the input variables [Anders and Korn, 99, Zhang et al., 98]

► Fully automatic input selection implies knowledge of the time series

frequency/periodicities → Ignored in “automated forecasting applications”

→ Focus on frequency identification & input variable selection

Page 5: Frequency independent automatic input variable selection

Iterative Neural Filter

A methodology to identify seasonal frequencies and inputs

► Step 1. Identify seasonal frequencies using the Iterative Neural Filter (INF)

► Step 2. Identify lagged inputs

► Step 3. Fit model & produce forecasts

Page 6: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 1. Identify seasonal frequencies

Euclidean Distance to Identify Seasonality

► Split the time series into segments of each candidate seasonality → find the mean Euclidean distance between consecutive segments

[Figure: Y = sin(2πt/12) split at candidate season lengths; mean Euclidean distance: s = 5 → 0.847, s = 12 → 0, s = 19 → 0.962, s = 24 → 0]

Multiple seasonalities (12, 24, 36, ...) → Identification problem
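The segment-distance idea can be sketched as follows; `mean_euclidean_distance` and the example series are a minimal illustration, not the authors' implementation:

```python
import numpy as np

def mean_euclidean_distance(y, s):
    """Mean Euclidean distance between consecutive length-s segments of y."""
    n = len(y) // s                      # number of complete candidate seasons
    if n < 2:
        return float("inf")              # cannot compare fewer than two segments
    segs = np.asarray(y[: n * s]).reshape(n, s)
    return float(np.mean([np.linalg.norm(segs[i] - segs[i + 1])
                          for i in range(n - 1)]))

t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)           # purely seasonal, period 12

print(mean_euclidean_distance(y, 12))    # ~0: segments repeat exactly
print(mean_euclidean_distance(y, 24))    # ~0 as well: the multiple also matches
print(mean_euclidean_distance(y, 5))     # clearly > 0: not a seasonality
```

Both the true period 12 and its multiples score (near) zero, which is exactly the identification problem noted above.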

Page 7: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 1. Identify seasonal frequencies

Penalised Euclidean Distance

► Split the time series into segments of each candidate seasonality (periodicity) → find the penalised Euclidean distance

[Figure: mean Euclidean distance computed for candidate seasons 4 (distance 391.6) and 7 (distance 171.2), and the resulting penalised distance curve over seasons 1–100]

Penalised Euclidean Distance: D_p(s) = log(1 + D(s)) − τ log(s)

Identify seasonality avoiding multiples
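A sketch of the penalised score; the penalty form D_p(s) = log(1 + D(s)) − τ log(s) is a reconstruction of the slide's formula, and τ is an assumed tuning constant:

```python
import numpy as np

def mean_euclidean_distance(y, s):
    """Mean Euclidean distance between consecutive length-s segments."""
    n = len(y) // s
    if n < 2:
        return float("inf")
    segs = np.asarray(y[: n * s]).reshape(n, s)
    return float(np.mean([np.linalg.norm(segs[i] - segs[i + 1])
                          for i in range(n - 1)]))

def penalised_distance(y, s, tau=0.1):
    # D_p(s) = log(1 + D(s)) - tau * log(s); tau is an assumed tuning constant
    return float(np.log(1 + mean_euclidean_distance(y, s)) - tau * np.log(s))

t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)

print(penalised_distance(y, 12))   # low: a genuine seasonality
print(penalised_distance(y, 5))    # clearly higher: no seasonal match
```

The log terms compress the raw distance and adjust for the candidate season length, so scores of different season lengths become comparable.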

Page 8: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 1. Identify seasonal frequencies

[Figure: single-hidden-layer network; deterministic inputs I1, I2 (and later I3, I4) built from the identified seasonality feed hidden units H1 … Hn and a single output Y]

Deterministic inputs for an identified seasonality S:

ψ1(t) = sin(2πt/S)

ψ2(t) = cos(2πt/S)

The iterative neural filter removes each identified seasonality → explore the remaining information for additional seasonalities

If more seasonalities are identified, add more inputs in each iteration:

ψ1(t) = sin(2πt/S), ψ2(t) = cos(2πt/S), ψ3(t) = t/S, ψ4(t) = N − t + 1

Trends, level shifts, etc
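The deterministic inputs above can be generated as below; the helper name is illustrative, and ψ3(t) = t/S is one reading of the slide's trend term:

```python
import numpy as np

def deterministic_inputs(n, seasons):
    """Build the deterministic regressors psi_i(t) for the identified seasons."""
    t = np.arange(1, n + 1)
    cols = []
    for S in seasons:
        cols.append(np.sin(2 * np.pi * t / S))   # psi_1(t) = sin(2*pi*t/S)
        cols.append(np.cos(2 * np.pi * t / S))   # psi_2(t) = cos(2*pi*t/S)
    cols.append(t / max(seasons))                # psi_3: trend term (t/S assumed)
    cols.append(n - t + 1.0)                     # psi_4(t) = N - t + 1
    return np.column_stack(cols)

X = deterministic_inputs(120, [12])
print(X.shape)   # (120, 4): sin, cos, trend, reversed index
```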

Page 9: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 1. Identify seasonal frequencies

Iteration 1:

[Figure: input time series (t = 1…200, level ≈ 450–550) → penalised Euclidean distance over candidate seasons → INF output]

Subtract the INF output from the input time series and repeat

Iteration 2:

[Figure: residual time series → penalised distance over candidate seasons]

Season = 1 → Stop!
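A rough sketch of the subtract-and-repeat loop, with the neural filter replaced by a simple seasonal-means filter as a stand-in (the actual INF is a trained network) and the same assumed penalty form as before:

```python
import numpy as np

def mean_dist(y, s):
    """Mean Euclidean distance between consecutive length-s segments."""
    n = len(y) // s
    if n < 2:
        return float("inf")
    segs = np.asarray(y[: n * s]).reshape(n, s)
    return float(np.mean([np.linalg.norm(segs[i] - segs[i + 1])
                          for i in range(n - 1)]))

def penalised(y, s, tau=0.1):
    # Assumed penalty form: D_p(s) = log(1 + D(s)) - tau * log(s)
    return float(np.log(1 + mean_dist(y, s)) - tau * np.log(s))

def seasonal_filter(y, s):
    """Stand-in for the neural filter: per-position seasonal means."""
    idx = np.arange(len(y)) % s
    means = np.array([y[idx == i].mean() for i in range(s)])
    return means[idx]

rng = np.random.default_rng(0)
t = np.arange(240)
y = np.sin(2 * np.pi * t / 12) + 0.2 * rng.standard_normal(240)

# Iteration 1: pick the candidate season with the lowest penalised distance
scores = {s: penalised(y, s) for s in range(2, 41)}
best = min(scores, key=scores.get)
print("identified season:", best)     # a multiple of the true period 12

# Subtract the filter output from the series and search the residual again;
# the slides stop iterating once the identified season is 1
resid = y - seasonal_filter(y, best)
print("residual std:", round(float(resid.std()), 3))
```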

Page 10: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 1. Identify seasonal frequencies

Page 11: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 2. Identify inputs

Fit two competing regressions:

Deterministic: Ŷ_t = a + b·M_t + Σ_{j=1}^{N} Σ_{i=1}^{S_j} d_{ji} D_{ji,t} + ε_t

Stochastic: Ŷ_t = a + Σ_{j=1}^{N} b_j Y_{t−S_j} + ε_t

where D_{ji,t} are seasonal dummies and M_t is a moving average of order max(S_j)

→ Compare using AIC
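The model comparison can be sketched with ordinary least squares and a Gaussian AIC; seasonal dummies stand in for the deterministic terms, and the moving-average level term M_t is omitted for brevity:

```python
import numpy as np

def ols_aic(X, y):
    """OLS with intercept; Gaussian AIC up to constants: n*log(RSS/n) + 2k."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

rng = np.random.default_rng(1)
S, n = 12, 240
t = np.arange(n)
y = 10 + np.sin(2 * np.pi * t / S) + 0.2 * rng.standard_normal(n)

# Deterministic candidate: seasonal dummies (one dropped against the intercept)
D = np.eye(S)[t % S][:, 1:]
aic_det = ols_aic(D[S:], y[S:])

# Stochastic candidate: the seasonally lagged observation Y_{t-S}
aic_sto = ols_aic(y[:-S].reshape(-1, 1), y[S:])

print(aic_det, aic_sto)  # the deterministic model should win on this series
```

The lower AIC decides which input representation (dummies vs. seasonal lags) is carried forward.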

Page 12: Frequency independent automatic input variable selection

Iterative Neural Filter – Step 2. Identify inputs

► Use stepwise regression

→ Force as initial inputs the pre-identified inputs (deterministic/stochastic)

► Input vector identified
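A minimal forward stepwise selection with forced initial inputs might look like this; the lag structure and all names are illustrative assumptions:

```python
import numpy as np

def aic(cols, y):
    """Gaussian AIC of an OLS fit with intercept on the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

def stepwise(candidates, y, forced=()):
    """Forward stepwise: start from the forced (pre-identified) inputs and
    greedily add any candidate that lowers the AIC; stop when none helps."""
    chosen = list(forced)
    best = aic([candidates[c] for c in chosen], y)
    improved = True
    while improved:
        improved = False
        for name in candidates:
            if name in chosen:
                continue
            score = aic([candidates[c] for c in chosen + [name]], y)
            if score < best:
                best, chosen, improved = score, chosen + [name], True
    return chosen

# Illustrative series: lag 1 and lag 7 matter, lag 3 is irrelevant
rng = np.random.default_rng(2)
y = np.zeros(310)
e = rng.standard_normal(310)
for i in range(10, 310):
    y[i] = 0.5 * y[i - 1] + 0.3 * y[i - 7] + e[i]

Y = y[10:]
cand = {f"lag{k}": y[10 - k:310 - k] for k in (1, 3, 7)}
sel = stepwise(cand, Y, forced=["lag7"])
print(sel)   # lag7 is forced in; lag1 should be added
```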

Page 13: Frequency independent automatic input variable selection

Experimental Setup – Time series

► Synthetic time series:

• Deterministic / Stochastic

• Four different levels of noise (None, Low, Medium, High)

• Quarterly and monthly seasonality & day-of-the-week and yearly double seasonality

• Total time series: 520

► Real time series:

• US air passenger miles

• Average bus ridership for Portland Oregon

• Total number of room nights and takings in Victoria

• Number of serious injuries and deaths in UK road accidents

Page 14: Frequency independent automatic input variable selection

Results

► Use INF to identify inputs for neural networks (Primed_NN)

► Use NNs with automatically identified inputs as benchmarks (Stoch_NN)

→ Inputs identified using regression [Swanson & White, 98, Kourentzes, 10]

► Use exponential smoothing as statistical benchmark (EXSM)

→ Robust and accurate benchmark [Makridakis & Hibon, 00]

→ Use INF seasonality output to set up seasonal models

Synthetic Data
Subset   Primed_NN   Stoch_NN   EXSM
Train    7.25%       7.45%      7.68%
Valid    7.16%       7.47%      7.52%
Test     7.37%       7.70%      7.47%

Real Data
Subset   Primed_NN   Stoch_NN   EXSM
Train    8.52%       7.86%      7.82%
Valid    5.06%       5.91%      6.83%
Test     7.86%       11.95%     10.72%

Page 15: Frequency independent automatic input variable selection

Conclusions

► Proposed methodology identifies seasonal frequencies and inputs for neural networks automatically

► Outperforms statistical and neural network benchmarks

► INF is useful to fully automate other forecasting methods → seasonal frequency identification without the need for human experts

► Future work: Introduce stochastic elements in the INF to separate the seasonal components more accurately

Page 16: Frequency independent automatic input variable selection

Nikolaos Kourentzes
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel. +44 (0) 7960271368
email [email protected]