Centro de Investigación Operativa
I-2005-05
A decision support system methodology for automatic forecasting of time series
J. D. Bermúdez, J.V. Segura and
E. Vercher
March 2005
ISSN 1576-7264 Depósito legal A-646-2000
Centro de Investigación Operativa Universidad Miguel Hernández de Elche Avda. de la Universidad s/n 03202 Elche (Alicante) [email protected]
A decision support system methodology for automatic forecasting of time series¹
J.D. Bermúdezᵃ, J.V. Seguraᵇ and E. Vercherᵃ,²
ᵃ Dpto. Estadística e Investigación Operativa, Universitat de València, C/ Dr. Moliner 50, 46100-Burjassot, Valencia, Spain
ᵇ Dpto. Estadística y Matemática Aplicada, Universidad Miguel Hernández de Elche, Avd. del Ferrocarril s/n, 03202-Elche, Alicante, Spain
Abstract
Exponential procedures are widely used as forecasting techniques for
inventory control and business planning. This paper presents a
number of modifications to the generalized exponential smoothing
(Holt-Winters) approach to forecasting univariate time series and
proposes the result as a tool for decision support systems. The starting
values of level, trend and seasonal factors become additional
parameters in the optimization procedure, in order to make the
forecasts less sensitive to initial values and to extend the applicability
of exponential smoothing to short series of data. This procedure may
provide forecasts from different versions of exponential smoothing by
fitting the updated formulas of Holt-Winters and selects the best
method using a fuzzy multicriteria approach. In this paper we propose
a methodology that unifies the phases of estimation and model
selection into just one optimization framework which permits the
identification of robust solutions. It is compared to other forecasting
methods on the 111 series from the M–competition.
Keywords: Forecasting, Exponential smoothing, Holt-Winters method,
Non-linear programming, Fuzzy multi-objective programming
¹ This work is partially supported by the Conselleria d'Educació, Ciència i Esport de la Generalitat Valenciana, grant no. GV04B-090.
² Corresponding author. Tel. +34-96-354-3793; fax: +34-96-354-4735. E-mail address: [email protected]
1. Introduction
Accurate forecasting is an essential tool for many management decisions, for both
strategic and tactical business planning. Advances in data analysis and software
capabilities have the potential to offer effective forecasting to anticipate future demands,
schedule productions and reduce inventories.
Traditional forecasting techniques rely on proper specifications of systems that we
assume to be understood. A statistical time series forecasting method is often applied to
a finite set of observations under certain assumptions regarding the data generating
process, which are that the variables change with time and their future values are related
in some way to their past values. Many statistical methods have been proposed to
forecast time series and it seems necessary for the forecaster to be familiar with a range
of possible models. An important forecasting tool is the ARIMA class of models. In
fitting an ARIMA model to time series data the Box-Jenkins framework (Box et al, 1994)
is used, which requires the forecaster to make judgements and have
experience and expertise.
Exponential smoothing methods are a class of methods that produce forecasts
with simple formulae, taking into account trend and seasonal effects of the data. These
procedures are widely used as forecasting techniques in inventory management and
sales forecasting, where a very large number of time series with similar properties may
arise. They are especially good for short-term forecasting. Recent papers have
stimulated renewed interest in the technique, putting exponential smoothing procedures
on a sound theoretical ground by identifying and examining the underlying statistical
models (Ord et al, 1997; Koehler et al, 2001). Moreover, the results of M-competitions
(Makridakis et al, 1998; Ord, 2001) confirmed that exponential smoothing methods give
reliable post-sample forecasts and indicate that in order to obtain more accurate ones it
would be worthwhile developing procedures that identify the most appropriate method
from a set of possible choices.
This paper deals with a decision support system methodology which searches for
model specifications that can replicate patterns of observed series, using a number of
modifications to the generalized Holt-Winters method. It is based on two issues:
considering the initial values of level, trend and seasonal factors as decision variables of
an optimization problem, and the perception that a model that minimizes three different
measures of fit will produce good forecasting results. Concerning the first issue, and
working with the mean squared error as a measure of fit to estimate the vector of
parameters, our approach shows a good performance with time series with a limited
amount of data (Bermúdez et al, 2004).
In this paper we propose an alternative model selection strategy based on a full
optimization framework, which permits both the specification of a broad class of
candidate methods and the selection of a method from within that class which is the
‘best’ in some sense. The procedure, using the update formulae of the generalized Holt-
Winters (HW) methods on a given time series, first provides the smoothing and initial
parameters which minimize one of the three measures of fit. At a second stage the
procedure uses a multi-objective formulation which jointly minimizes the error
measurements of fit while keeping the updating equations of the generalized HW
method. The selection of the model which is used for forecasting is determined by a
fuzzy framework.
Multiple objective decision problems are a well-known area in decision making
theory. There are several strategies for characterizing the efficient set of a multi-
objective problem in terms of optimal solutions of appropriate scalar optimization
problems. Within the context of fuzzy logic there are numerous books and papers
devoted to this matter (see, for instance, Zimmermann, 1978; Chanas, 1989; Delgado et
al, 1990; Lai and Hwang, 1996). To deal with imprecision, goal programming has also
provided some useful methodology which takes into account the decision-maker’s
preferences (see, for instance, Ramik (2000), Jimenez et al (2004)).
Our proposal for solving the non-linear multi-objective problem incorporates all
the information previously obtained at the first stage of the procedure, and selects the
fuzzy decision as in Zimmermann’s approach (Zimmermann, 1978). Assuming that both
fuzzy goals and specified membership functions are known for each of the objective
functions, the problem matches a fuzzy version of the multi-objective programming
problem under the given crisp constraints (Sakawa, 1998).
The outline of the paper is as follows. The following section describes a general
approach to the point forecast equations of some exponential smoothing methods from
the generalized HW updating formulas. We discuss estimation and model selection
criteria in Section 3 by using crisp and fuzzy mathematical programming techniques. In
Section 4 some numerical examples illustrate the performance of our proposal. Section
5 presents the empirical results of applying our approach to a set of 111 series used in
the M-competition along with a comparison with existing methods in previous studies.
2. The Holt-Winters forecasting procedures
The Holt-Winters method was originally developed by Winters (1960) and it
involves estimating three smoothing parameters associated with level, trend and
seasonal factors. The seasonal variation can be of either an additive or multiplicative
form. It is known that the additive seasonal form is optimal for a seasonal ARIMA
model. Recently, a class of dynamic non-linear state space models has led to several
models for which the optimal set of updating equations is close to the multiplicative
form of Holt-Winters (Ord et al, 1997), and hence this method may enjoy the
advantages that forecasting procedures based on a proper statistical model have.
Moreover, this class of models allows the forecast error variance to depend on the trend
and/or the seasonality (Chatfield et al (2001) and Koehler et al (2001)).
Let us introduce some notation and expressions. An observed time series is
denoted by {Dt}t=1,…,n. The forecast of Dn+h made at time n for the h-th period into the
future is denoted by Fn(h), h≥1. In the multiplicative form of seasonal effects, the
equations that enable the level, Lt, trend, bt, and seasonal indices, It, to be updated based
on a new observation Dt, are the following:
Lt = αDt/It-p + (1-α)(Lt-1 + bt-1)   (2.1)

bt = β(Lt - Lt-1) + (1-β)bt-1   (2.2)

It = γDt/Lt + (1-γ)It-p   (2.3)
where α, β, and γ are the smoothing parameters associated with level, trend and
seasonal effects respectively (with values between 0 and 1), and with p being the
number of observations per seasonal cycle. Winters (1960) recommends that the
seasonal factors be normalized at the beginning of the series in such a way that, in the
additive case, the seasonal indices must add up to zero and, in the multiplicative one, the
seasonal indices are constrained so that they average to unity.
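This normalization can be sketched in a few lines of Python; the helper below is ours, illustrative only, and not part of the paper's system:

```python
def normalize_indices(indices, multiplicative=True):
    """Normalize one cycle of seasonal indices as Winters (1960) recommends:
    multiplicative indices average to unity, additive indices sum to zero."""
    mean = sum(indices) / len(indices)
    if multiplicative:
        return [i / mean for i in indices]   # new average is exactly 1
    return [i - mean for i in indices]       # new sum is exactly 0
```

For example, `normalize_indices([0.5, 1.5, 2.0])` rescales the three indices so that they average to one.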
Formulas (2.1)-(2.3) are appropriate when there is a linear trend and
multiplicative seasonal effects in the time series. They provide robust forecasts that are
not very different from those obtained with more complex, expensive procedures for
seasonal time series (Chatfield and Yar, 1988). The forecast for time period (n+1) is
given by Fn(1):= (Ln + bn) In+1-p and the one-step-ahead forecast error at time n is en=
Dn - Fn-1(1). The forecast made at time n for h periods ahead is obtained from the
expression Fn(h):= (Ln + hbn) In+h-p.
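As a minimal sketch, the multiplicative recursions (2.1)-(2.3) and the h-step-ahead forecast translate directly into code; the function names are ours, and the smoothing parameters are taken as known here, whereas the paper estimates them:

```python
def hw_mult_step(d_t, level, trend, index_old, alpha, beta, gamma):
    """One multiplicative Holt-Winters update following (2.1)-(2.3):
    index_old is the seasonal index I_{t-p} from one cycle earlier;
    returns the updated (L_t, b_t, I_t)."""
    new_level = alpha * d_t / index_old + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_index = gamma * d_t / new_level + (1 - gamma) * index_old
    return new_level, new_trend, new_index

def hw_mult_forecast(level, trend, index_future, h):
    """h-step-ahead forecast F_n(h) = (L_n + h b_n) I_{n+h-p}."""
    return (level + h * trend) * index_future
```

With all three smoothing parameters at zero the update reduces to pure extrapolation of the previous level, trend and index, which is a quick sanity check on the recursions.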
The additive seasonal form of the Holt-Winters method works with the following
recursive updating equations:
Lt = α(Dt - It-p) + (1-α)(Lt-1 + bt-1)   (2.4)

bt = β(Lt - Lt-1) + (1-β)bt-1   (2.5)

It = γ(Dt - Lt) + (1-γ)It-p   (2.6)
and with the h-step-ahead forecast Fn(h):= Ln+ hbn+In+h-p. Notice that the seasonal
component is additive and the trend is linear.
In both schemes the effect of the trend may be high for long-term forecasts.
Hence, we use the recursive equations that allow us to incorporate a damping parameter
for the trend φ, such that β<φ≤1, and we modify the equations (2.2) and (2.5), in a
similar way to the damped-trend exponential smoothing method proposed by Gardner
and McKenzie (1985), which become:
bt = β(Lt - Lt-1) + (φ-β)bt-1   (2.7)
but setting the forecasts to Fn(h):= (Ln + (∑i=0,...,h-1 φ^i)bn) In+h-p in the multiplicative version
and Fn(h):= Ln + (∑i=0,...,h-1 φ^i)bn + In+h-p in the additive one, respectively, as in the
proposal of Hyndman et al (2002). Notice that for φ=1 the former methods are obtained.
The cases with φ<1 describe their damped-trend versions.
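The damped-trend forecasts simply replace the horizon h by the partial sum of powers of φ; a small sketch (our naming) for the multiplicative version:

```python
def damping_sum(phi, h):
    """Sum of phi**i for i = 0, ..., h-1, which replaces the factor h in
    the damped-trend forecast equations; equals h when phi = 1."""
    return sum(phi ** i for i in range(h))

def damped_forecast_mult(level, trend, index_future, phi, h):
    """Damped multiplicative forecast F_n(h) = (L_n + (sum phi^i) b_n) I_{n+h-p}."""
    return (level + damping_sum(phi, h) * trend) * index_future
```

For φ<1 the trend contribution is bounded as h grows, which is what damping means in practice.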
The above recursive procedures will lead to a broad range of exponential
smoothing models in the Pegels classification (1969). The work of Hyndman et al
(2002) gives the point forecast and state space equations for all these methods. In their
study they propose applying 24 state space models on each time series, with all the
smoothing parameters constrained to lie within [0.1,0.9] to avoid non-invertible models.
In practice neither the initial parameters ζ = (L0, b0, I1-p,..., I0)∈ℜp+2 nor the
smoothing parameters χ = (α, β, γ, φ) are known exactly, but they have to be estimated
from the data. The initial parameters are often calculated roughly with a heuristic
procedure (see, for instance, Makridakis et al, 1998) and then the smoothing parameters
are estimated. In Segura and Vercher (2001) it is shown that smoothing parameter
values (α, β, γ) are very sensitive to the specific formulae used to calculate the initial
values of the local level, trend and seasonal factors at the beginning of the series.
The estimation problem is often solved by minimizing some measure of fit based
on the one-step-ahead errors, {et=Dt-Ft-1(1)}t=1,..,n, over the period of fit. For example,
assuming the statistical model for the disturbances Dt = Ft-1(1)+εt, where {εt} is a
Gaussian white noise process, the maximum likelihood estimator is obtained by
minimizing the one-step-ahead error sum of squares (Ord et al, 1997; Hyndman et al,
2002).
In order to obtain a nonparametric, more robust estimation procedure we decide to
use not only one but three measures of fit as objective functions: the mean absolute
percentage error, MAPE, given by ϕ1, the root of the mean square error, RMSE,
denoted by ϕ2, and the mean absolute deviation, MAD, given by ϕ3. These measures
give relative and absolute information about the data fitting. The use of ϕ3 implies an
underlying linear loss function, and that of ϕ2 a quadratic one, while MAPE is a
scale-independent statistic.
The explicit formulas of these fit errors are:

MAPE: ϕ1(χ, ζ) = (100/n) ∑i=0,...,m-1 ∑j=1,...,p |Dj+ip - Fj+ip-1(1)| / Dj+ip

RMSE: ϕ2(χ, ζ) = [ (1/(n-1)) ∑i=0,...,m-1 ∑j=1,...,p (Dj+ip - Fj+ip-1(1))² ]^(1/2)

MAD: ϕ3(χ, ζ) = (1/n) ∑i=0,...,m-1 ∑j=1,...,p |Dj+ip - Fj+ip-1(1)|
where m denotes the number of seasonal cycles and n=m×p.
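The three measures can be computed directly from the observed series and its one-step-ahead forecasts; a sketch under the paper's definitions (note the n-1 divisor for RMSE, as in the formula above; the function name is ours):

```python
from math import sqrt

def fit_errors(data, one_step_forecasts):
    """Return (MAPE, RMSE, MAD) over the fitting period, where
    one_step_forecasts[t] is F_{t-1}(1), the forecast of data[t]
    made one period earlier."""
    n = len(data)
    errors = [d - f for d, f in zip(data, one_step_forecasts)]
    mape = (100.0 / n) * sum(abs(e) / d for e, d in zip(errors, data))
    rmse = sqrt(sum(e * e for e in errors) / (n - 1))
    mad = sum(abs(e) for e in errors) / n
    return mape, rmse, mad
```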
Moreover, in this paper we do not use any heuristic to evaluate the initial values.
Instead we propose to estimate all the unknowns, the smoothing parameters as well as
the initial parameters, minimizing those objective functions.
3. An optimization approach for estimation and model selection
In a univariate time series context a very important question is that of model
selection. The first consideration appears with respect to whether the model is linear or
non-seasonal. It is very important to identify these special cases because this can avoid
the need to solve some optimization problems and/or reduce their dimension in the case
of p=1. Otherwise, all possibilities may be considered and the model selection is made
by means of fuzzy logic.
We decide to treat the values of ζ as decision variables of the optimization
problem in such a way that a broad range of ES models can be obtained (see Section
3.3). The estimation of smoothing parameters is then determined along with the optimal
starting values for the level, trend and seasonal indices, which implies estimating all the
parameters over the fitting period. One of the interesting features of this scheme is that
we can fit different exponential smoothing methods using a pair of optimization
problems, for the additive and multiplicative versions, at least as regards the feasible
region, because they have respectively the same recursive updating equations and
structural constraints. The set of parameters is now (χ,ζ) and the seasonal indices
constraint depends on the additive or multiplicative form of the seasonal effects. That is,
for the multiplicative form of the seasonal factors, the goal is to determine the vector
(χ,ζ) such that a certain measure of forecast error is minimized as long as the following
constraints are fulfilled:
Lt = αDt/It-p + (1-α)(Lt-1 + bt-1),   t=1, ..., n

bt = β(Lt - Lt-1) + (φ-β)bt-1,   t=1, ..., n

It = γDt/Lt + (1-γ)It-p,   t=1, ..., n

∑h=1,...,p Ih-p = p

(α, β, γ)∈[0, 1]3

β<φ≤1

L0, I1-p, ..., I0∈ℜ+, b0∈ℜ
Throughout the paper this feasible region will be denoted by Φm. Analogously, the
feasible region for the additive form, Φa, requires I1-p, ..., I0∈ℜ and contains the
corresponding updating equations (2.4) and (2.6) and the constraint ∑h=1,...,p Ih-p = 0
instead of (2.1), (2.3) and ∑h=1,...,p Ih-p = p, respectively. Without loss of generality
we denote these feasible sets by Φ.
Assuming that the model has multiplicative seasonal factors (resp. additive) and
that the number of seasons per cycle is p, for each of the three error measures, ϕi, the
algorithm computes the optimal values of (χ,ζ) by solving three non-linear
programming problems which include the following additional constraint: ϕU(χ, ζ) ≤ 1,
to prevent the fitted models from performing worse than the naïve method.
The function ϕU(χ, ζ) evaluates Theil's U-statistic (Theil, 1966) for each iterate.
This statistic acts as a relative measure of the worth of determining the parameter values
with respect to the naïve forecast (which simply uses the previous period's value as the
forecast for the next period, Fn(1):= Dn).
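The paper does not spell out the U formula; the sketch below uses the standard form consistent with that description, i.e. the ratio of the root squared one-step error of the method to that of the naïve forecast F_{t-1}(1) = D_{t-1} (the function name is ours):

```python
from math import sqrt

def theil_u(data, one_step_forecasts):
    """Theil's U-statistic: values below 1 mean the fitted model beats the
    naive forecast over the fitting period.  Only t >= 2 enters both sums,
    because the naive forecast needs a previous observation."""
    num = sum((d - f) ** 2 for d, f in zip(data[1:], one_step_forecasts[1:]))
    den = sum((d1 - d0) ** 2 for d0, d1 in zip(data, data[1:]))
    return sqrt(num / den)
```

If the supplied forecasts are exactly the naive ones, the statistic equals 1, which is the boundary enforced by the constraint ϕU(χ, ζ) ≤ 1.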
3.1 Model fitting
Let us describe the performance of the first stage of our procedure for obtaining
forecasts through the generalized Holt-Winters method, which is based on the ideas
previously introduced. Notice that the feasible set Φ may adopt several different forms
depending on the forecaster information about the seasonal effect and the number of
observations per seasonal cycle, although for p=1 there are no differences between the
additive and multiplicative versions of the Holt-Winters method.
Stage I.
Step 0. Let {Dt}t=1,...,n be an observed data series. Let p be the number of seasons per
cycle.
Step 1. Build the (NLP)i problems for i=1, 2, 3, where
(NLP)i min {ϕi(χ, ζ): (χ, ζ)∈Φ, ϕU(χ, ζ)≤1}
Step 2. Solve the (NLP)i problem using a multi-start strategy. Keep the set of local
minima in Ai, for i=1, 2, 3.
Step 3. Evaluate the optimal criterion vector [ϕ1(a), ϕ2(a), ϕ3(a)] for each a∈Ai, i=1, 2,
3 and select as the best solution {ai} the one with the lowest fit error in Ai, i=1, 2, 3.
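The multi-start strategy of Step 2 can be pictured with a toy stand-in: a one-parameter simple exponential smoothing objective minimized from several random starting points by a crude local search. The real system solves the full (NLP)i problems over (χ, ζ) with a GRG solver; the names and the search scheme here are ours, for illustration only:

```python
import random

def sse(alpha, data):
    """One-step-ahead squared-error sum for single exponential smoothing,
    a one-parameter stand-in for the (NLP)_i objectives."""
    level, total = data[0], 0.0
    for d in data[1:]:
        total += (d - level) ** 2
        level = alpha * d + (1 - alpha) * level
    return total

def multistart(data, starts=12, seed=0):
    """Refine several random starting values of alpha by a coarse-to-fine
    local search, keeping every local minimum found (the set A_i)."""
    rng = random.Random(seed)
    minima = []
    for _ in range(starts):
        a, step = rng.random(), 0.1
        for _ in range(200):
            cands = {max(0.0, min(1.0, a + d)) for d in (-step, 0.0, step)}
            best = min(cands, key=lambda x: sse(x, data))
            if best == a:          # no neighbour improves: refine the step
                step /= 2
                if step < 1e-4:
                    break
            else:
                a = best
        minima.append((a, sse(a, data)))
    return minima
```

Keeping all the local minima, rather than only the best one, is what later feeds the fuzzy model-selection stage.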
At this stage, concerning the model selection criteria, the best fit of every error
function may or may not correspond to the same version of exponential smoothing, that
is to the same ES method. In fact, although we use only one measure of fitting error,
several candidate models associated to local minima of the corresponding non-linear
programming problem may appear.
Solving those non-linear programming problems using a multi-start strategy could
give us a collection of alternative local optimal solutions. We have a lot of information
about every measure of fitting that is summarized in the set A:={x=(χ,ζ)∈Φ: x is a
solution for at least one of the objectives ϕi}. For i=1, 2, 3, we define ϕi+= min{ϕi(x):
x∈A}, i.e. the lowest value obtained for each objective, ϕi-= max{ϕi(x): x∈A}, the
highest one and ϕim, as the median of all the values of the i-th objective in the set A.
3.2 Model building: the fuzzy multi-objective approach
In this section we describe the ideas that are the basis of the second stage of our
proposal. Our goal is to identify the most appropriate ES method from a set of possible
choices, summarized in A. We want to select the ‘best’ method and so we decide to
formulate a fuzzy multi-objective optimization problem that minimizes the three
measurements of fitting within the set Φm (resp. Φa). This problem may be expressed as
follows:
(FMOP)   min~ [ϕi(χ, ζ), i=1, 2, 3]
         s. t. (χ, ζ)∈Φ
               ϕU(χ, ζ) ≤ 1
where the structural constraints and the recursive updating equations that define the
feasible region Φ are crisp constraints, while for the objective functions we apply the
fuzzy version of minimize, that is they should be minimized as much as possible under
the given constraints. Therefore, we will find a fuzzy solution which is a model that
satisfies the requirements of some of the ES models that underlie versions of the
generalized HW methods.
Here the fuzziness is assumed in the objective functions, for which we have
aspiration levels together with membership functions, fixed from the information
obtained in the first stage of the procedure for every fitting measure. Then the model
selection problem is defined as
(FP)   Find (χ, ζ)
       s. t. ϕi(χ, ζ) <~ ϕi0,   i=1, 2, 3
             ϕU(χ, ζ) ≤ 1
             (χ, ζ)∈Φ

where ϕi0 = ϕi+ is the fuzzy goal of the i-th objective, for i=1, 2, 3, and <~ is a fuzzy
constraint. Once the membership functions µi(ϕi(x)) have been specified, the algorithm
selects the fuzzy decision defined by the max-min operator, as in Zimmermann’s
approach (1978). It then finds the solution, (χ, ζ, λ)max, with the maximum degree of
global satisfaction in the (FP) problem, and it is obtained by solving the following crisp
optimization problem:
(NLP)4   max λ
         s. t. µi(ϕi(χ, ζ)) ≥ λ,   i=1, 2, 3
               ϕU(χ, ζ) ≤ 1
               (χ, ζ)∈Φ
               λ∈[0, 1]
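Over a finite set of candidate solutions the max-min operator reduces to a few lines; the discrete sketch below (our naming) illustrates the selection rule, while the paper actually solves the continuous problem (NLP)4 over (χ, ζ, λ):

```python
def maxmin_select(criterion_vectors, memberships):
    """Zimmermann's max-min fuzzy decision over a finite candidate set:
    criterion_vectors maps a candidate label to its (phi_1, phi_2, phi_3),
    memberships holds the three functions mu_i.  Returns the candidate with
    the largest lambda = min_i mu_i(phi_i) and that lambda."""
    def degree(phis):
        return min(mu(phi) for mu, phi in zip(memberships, phis))
    best = max(criterion_vectors, key=lambda c: degree(criterion_vectors[c]))
    return best, degree(criterion_vectors[best])
```

A candidate that is excellent on two objectives but poor on the third gets a low λ, which is exactly the compromise behaviour the max-min operator is meant to enforce.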
The algorithm makes use of a separate subroutine for building the membership
function of each of the measures of fit. It receives as input all the information obtained
in Stage I about ϕi. This information enables the goal, tolerance and shape of the
membership functions of the objectives, µi(ϕi(x)), to be fully characterized, that is

µi(ϕi(x)) = 1,   if ϕi(x) ≤ ϕi0
µi(ϕi(x)) = the chosen strictly decreasing (linear or exponential) form,   if ϕi0 < ϕi(x) ≤ ϕi0 + ti
µi(ϕi(x)) = 0,   if ϕi(x) > ϕi0 + ti
with µi(ϕi(x)) being a continuous and strictly decreasing function on [ϕi0, ϕi0 + ti],
where ti ≥ 0 measures the amplitude of the tolerance interval for the fuzzy inequality
ϕi(χ, ζ) <~ ϕi0 to be met.
The tolerances are given by ti := ϕi- - ϕi+, for i=1, 2, 3 and exponential or linear
membership functions are then obtained from the result of {ϕi+, ϕi-, ϕim}, according to
the scheme proposed in Carlsson and Korhonen (1986). If ϕim = (ϕi+ + ϕi-)/2, then the
linear membership function is used. Otherwise, the value of the parameter bi, which
defines the concave or convex form of the exponential membership function, must be
determined. To do that, we assume that ϕim is the value of the objective function ϕi(x)
associated with a possibility of 0.5, so the following equation may be solved
µi(ϕim)=0.5, where
µi(ϕi(x)) = [exp(-bi(ϕi(x) - ϕi0)/ti) - exp(-bi)] / [1 - exp(-bi)]
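The construction of one membership function can be sketched as follows. The use of bisection to solve µi(ϕim) = 0.5 is our choice; the paper only states that the equation is solved. Names are ours:

```python
from math import exp

def build_membership(phi_best, phi_worst, phi_median):
    """Build mu_i from {phi_i+, phi_i-, phi_i^m}: linear when phi_i^m is the
    midpoint of [phi_i+, phi_i-], otherwise exponential with b_i solved
    from mu_i(phi_i^m) = 0.5 by bisection."""
    t = phi_worst - phi_best                    # tolerance t_i = phi_i- - phi_i+
    if abs(phi_median - (phi_best + phi_worst) / 2) < 1e-9 * max(t, 1.0):
        shape = lambda v: 1.0 - (v - phi_best) / t          # linear form
    else:
        def mu_exp(v, b):
            return (exp(-b * (v - phi_best) / t) - exp(-b)) / (1.0 - exp(-b))
        lo, hi = -50.0, 50.0       # mu_exp(phi_median, b) decreases in b
        for _ in range(200):       # bisection for mu_exp(phi_median, b) = 0.5
            b = (lo + hi) / 2 or 1e-9            # avoid the singularity b = 0
            if mu_exp(phi_median, b) > 0.5:
                lo = b
            else:
                hi = b
        shape = lambda v: mu_exp(v, b)
    return lambda v: 1.0 if v <= phi_best else (0.0 if v > phi_best + t else shape(v))
```

The sign of the solved b gives a convex or concave shape depending on which side of the midpoint the median falls, matching the Carlsson-Korhonen scheme described above.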
Let us describe the performance of the second stage of our procedure for finding
optimal forecasts with the generalized Holt-Winters methods.
Stage II.
Step 4. Let A be the set of local minima from Stage I. For i=1, 2, 3 compute {ϕi+, ϕi-, ϕim}.
If ϕim = (ϕi+ + ϕi-)/2, build a linear membership function for ϕi.
Otherwise, compute bi and build an exponential membership function.
Step 5. Solve the (NLP)4 problem and select the solution with the highest λ value.
Step 6. Evaluate Ln, bn and In+h-p for 1 ≤ h ≤ τ for the solution (χ, ζ, λ)max and calculate the
forecasts made at time n for τ periods ahead.
3.2.1 Implementation issues
1. Our decision support system provides a consistent framework for selecting different
exponential smoothing methods, taking into account several measures of fit. Both the
model fitting and the building problem are formulated as a non-linear programming
problem. A GRG-based solution procedure is implemented in the widely accepted
spreadsheet environment, EXCEL, to solve this problem.
Our system has been implemented in the Visual Basic language. It consists of
three different modules: (i) data entry, (ii) active selection of either the multiplicative or
the additive form of the generalized HW method and of the number of observations per seasonal
cycle, and (iii) storage of the solution with the maximum degree of global satisfaction:
the model parameters, the fitting errors and the τ-step-ahead predictions.
2. Step 2 involves the evaluation of the local minima of several non-linear functions.
For improving the performance of the algorithm the analyst must provide feasible initial
values of (χ,ζ) for the optimization procedure. We then use twelve different initial
solutions, in which the values of the parameters in χ vary in a grid, while the initial
values for ζ are evaluated using the following equations (Makridakis et al, 1998) for the
multiplicative version:
L0 = D̄1,   b0 = (D̄2 - D̄1)/p,   It-p = Dt/D̄1,   t=1,…, p

where D̄i is the arithmetic mean of the data for the i-th cycle. For the additive form of
the updating equations the initial values of the seasonal indices are obtained by means
of It-p = Dt - (L0 + (t - 1/2)b0), for t=1,…, p, suggested by Chatfield and Yar (1988).
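For the multiplicative version, the initial-solution heuristic of Makridakis et al (1998) quoted above can be sketched as follows (the data are assumed to span at least two complete cycles; the function name is ours):

```python
def initial_values_mult(data, p):
    """Heuristic starting point (L0, b0, seasonal indices) for the
    multiplicative form: level from the first cycle mean, trend from the
    difference of the first two cycle means, indices from the first cycle."""
    mean1 = sum(data[:p]) / p                # arithmetic mean of cycle 1
    mean2 = sum(data[p:2 * p]) / p           # arithmetic mean of cycle 2
    level0 = mean1
    trend0 = (mean2 - mean1) / p
    indices = [d / mean1 for d in data[:p]]  # I_{t-p} = D_t / mean1
    return level0, trend0, indices
```

These values only seed the optimization; the algorithm then treats ζ as decision variables, so the final estimates need not stay close to them.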
3. Concerning the forecasts provided by Step 6, it must be pointed out that additive and
multiplicative versions have been performed separately. Therefore, the information used
in Stage II and the solution of the (NLP)4 problem is associated to a version of ES, of
either additive or multiplicative form.
3.3 Model checking
Notice that the smoothing constants (α, β, γ) are parameters set within the limits
[0,1]. This implementation of the generalized HW methods will never produce inferior fits
compared with simpler ES methods, since those methods optimize the same objective over a
restricted set of the decision variables. The following cases may
then appear if both the smoothing parameters and the initial values ζ have been
estimated by minimizing the one-step prediction error through a joint optimization
scheme:
a) Non-seasonal models with trend
• If the optimal value for the smoothing constant α=0, the value of β is not
relevant, and we then have the linear regression model (for φ=1) or Holt’s
damped method, for φ<1.
• If the smoothing constant α>0, the fixed trend corresponds to β=0, and then
bn=φ^n b0. For β>0 the additive trend models appear.
b) Models with trend and seasons (p>1)
• If the optimal smoothing constant γ for the seasonal parameters is equal to
zero, the update values of the seasonal indices will not change. Also the fixed
seasonal set of indices is obtained if the optimal α=1. In both cases the
seasonal indices are fixed, that is It-p=Ij-p for j≡t(mod p).
• If we find an optimal solution with γ=0 and the values of the seasonal factors
are It=1 for all t∈{1-p,...,0}, for the multiplicative case or It=0 for the additive
form, then Holt’s linear method is obtained.
• If γ=0 and there is any It0 ≠ 1 (resp. It0≠0), the data are just seasonally adjusted
and we apply Holt’s deseasonalized method.
• If the data have no trend, then solutions with β=0 and b0=0 are appropriate.
Moreover, if the seasonal set of indices is fixed we obtain the single
exponential smoothing method or its seasonally adjusted version, depending on
the values of the seasonal factors.
It should be pointed out that the fixed seasonal effect, which is obtained for γ=0 or α=1,
implies the same ‘optimal’ pattern of deseasonalization for all the seasonal cycles.
Furthermore, the optimal set of indices is normalized, that is the sum for a cycle should
equal 0 (resp. p) in a model with additive (resp. multiplicative) adjustments of seasonal
effects.
Our main goal is to obtain a robust estimation of the unknown parameters in order
to build good forecasts. Notice that the proposed algorithm uses a multi-objective
formulation that permits the re-optimization of the forecasting procedure. This cross-
over scheme runs a crisp optimization procedure first and later switches to the fuzzy
multi-objective framework, where the parameters of the model are re-optimized. It
proves to be of great use from the point of view of the automation of the process of
obtaining the model that ‘best’ approximates data with respect to all the objectives
simultaneously. In the next section we present some comparative out-of-sample results
obtained from the optimal solution, at either Stage I or Stage II.
4. Comparative results
In order to illustrate the performance of our automatic forecasting procedure, the
method is applied to a pair of classical data sets. Notice that our algorithm may stop at
the end of Stage I selecting the model with the lowest fit error for every measure of
fitting, that is {ai}, or perform the complete scheme and select the model with the
highest degree of satisfaction. For the following examples a comparison between the
partial and complete performance of our procedure is shown.
Example 1. The data correspond to monthly passengers of an international airline
(Brown, 1963), from 1949 to 1959, where p=12. The multiplicative Holt-Winters
method is applied here to a sample of 132 months, the origin being January 1949. The
forecasted values are projected from month 133 to the planning horizon of the
subsequent 12 months.
Table 1 shows the best solutions obtained in Steps 3 and 5 for every (NLP)i
problem associated with the multiplicative form of the updating equations, i=1,…,4.
The subindex of the solution informs us about the fitting error which has been
optimized, that is MAPE, RMSE and MAD, and NLP for the compromise solution,
respectively. Notice that all the solutions have fixed seasonal factors (α=1 or γ=0), but
that there are two types of solutions. In fact, the underlying ES method in SMAPE and
SRMSE solutions is Holt’s deseasonalized method with a fixed set of seasonal indices
provided by the optimization scheme, and a damped-trend. The other solutions do not
have a damped trend and there are slight differences between them.
Table 2 contains the criterion vector of the fitting errors for every solution and
their U-statistic value. Notice that the SNLP solution is clearly a compromise solution
with respect to the fitting errors.
We compute forecasts up to 12 steps-ahead and evaluate the one-step-ahead
forecast error for all the solutions. For every forecast horizon we then determine the
post-sample accuracy by averaging the absolute percentage errors. Table 3 shows the
forecasting accuracy for the predictions for several horizons. The SNLP solution nearly
coincides with SMAD and their forecasts and post-sample forecasting errors are very
similar.
For this example, both the fitting errors and the accuracy of the predictions for
every solution are quite similar, with only a slight improvement in the post-sample
accuracy achieved for the proposal SNLP. Let us see how the algorithm performs for the
champagne example.
Example 2. Consider the time series corresponding to the sales of champagne in France
during the period 1962-1969. The monthly data, in thousands of bottles, which appear in
the AUTOCAST package (Gardner, 1986) are slightly different from those used in
Wheelwright and Makridakis (1973). We shall use the data for the demand found in
AUTOCAST, in order to compare our results with those published in Chatfield and Yar
(1988). The algorithm therefore works with the data obtained up to month 76, as they
did. The forecasted values are projected from month 76 to cover the demand
corresponding to the planning horizon of the subsequent 12 months.
Tables 4 and 5 show the best models for the champagne data (Step 3 of Stage I)
and their optimal criterion vector respectively, and also for the proposal SNLP (Step 5 of
Stage II). All the solutions have been evaluated using the multiplicative version of the
HW recursive equations. Notice that SMAPE, SRMSE and SNLP optimal solutions work with
seasonal indices obtained from the optimal fixed set (γ=0). For SMAD the seasonal
factors are not fixed and the underlying ES method has no trend (β=φ=0). In fact, the
best solutions for the non-linear optimization problems usually provide different
versions of exponential smoothing methods, which give different forecasts.
Table 6 contains the post-sample accuracy for the above solutions and for the
solution associated with the predictions in the paper written by Chatfield and Yar (1991).
The compromise solution is only better for the predictions of the last six months, where
it also improves the results obtained by Chatfield and Yar (1991).
Concerning the good performance of the MAD criterion in the model fit, our
results coincide with those obtained by Gardner (1999), which compares the damped-trend
exponential smoothing method with a rule-based forecasting (RBF) approach over
a set of 126 time series.
5. Application to the 111 series of the M-competition
To further investigate the predictive behavior of our proposal and its accuracy
with respect to other forecasting approaches, we apply it to a well-known data bank
with which to validate the out-of-sample predictions. We make comparisons for
the collection of 111 series from the M-competition (Makridakis et al, 1982), which
contains 51 monthly, 9 quarterly and 51 annual time series. As specified in the M-
competition, we compute forecasts up to 18, 8 or 6 steps ahead, respectively, as well as
the MAPE error for each forecast horizon.
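The horizon-wise MAPE summaries used throughout this section can be computed with a simple helper like the following (a straightforward sketch; the names are ours, not the paper's code):

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

def mape_by_horizon(actual, forecast, horizons=(1, 4, 6, 8, 12, 18)):
    """Average MAPE over the first h forecast steps, for each horizon h
    that fits within the available post-sample data."""
    return {h: mape(actual[:h], forecast[:h])
            for h in horizons if h <= len(actual)}
```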
First we investigate the robustness properties of our forecasting procedure with
respect to the initial values, i.e. the initial solutions of the nonlinear problems used as
starting values in our algorithm. In this context, we say that a forecasting method is
robust if it is not sensitive to the formulae used to set up those initial solutions; this is
not the case for most of the forecasting approaches based on the Holt-Winters model
(see, for instance, Chatfield and Yar, 1988; Segura and Vercher, 2001).
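To make concrete the kind of formulae at stake, a common textbook recipe computes the starting values from the first two complete seasons. This sketch is one plausible reading of such recipes (the exact expressions differ between Makridakis et al, 1998, and the other sources compared below), shown only to illustrate what an "initial solution" contains:

```python
def initial_solution(y, p):
    """Classical starting values for level, trend and multiplicative
    seasonal indices, computed from the first two seasons of data.
    One plausible textbook recipe, not the paper's exact formulae."""
    s1 = sum(y[:p]) / p            # mean of the first season
    s2 = sum(y[p:2 * p]) / p       # mean of the second season
    L0 = s1
    b0 = (s2 - s1) / p             # average per-period growth
    # seasonal indices: average the ratios to each season's mean
    I = [(y[i] / s1 + y[p + i] / s2) / 2 for i in range(p)]
    return L0, b0, I
```

In our procedure such a vector is only the starting point of the optimization, which is why the particular recipe chosen turns out to matter so little.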
Table 7 compares the post-sample accuracy of the forecasts provided by our
method using four different initial solutions of the nonlinear problems (those
suggested by Makridakis et al, 1998; Larrañeta et al, 1988; Granger and Newbold, 1986;
and Winters, 1960) for the multiplicative version of the recursive equations. Table 8
shows a similar comparison for the additive version; the four initial solutions
used there are the one proposed by Chatfield and Yar (1988) and those in Larrañeta et al
(1988), Granger and Newbold (1986) and Winters (1960). Tables 7 and 8 show no
significant differences in average predictive accuracy across the initial solutions when
our optimization procedure is used.
To further study the influence of the initial solution, Table 9 gives a
brief summary of the average MAPE of the out-of-sample predictions for each of
the 111 series and for each of the four criteria used to specify the initial values,
assuming multiplicative seasonality. It contains the median and the 80th, 90th
and 95th percentiles, in order to compare the worst predictive behavior of the criteria
as well as their average behavior. Once more, there are no practical differences
between the four criteria. A similar result is obtained for the additive seasonal form
of the HW methods.
The use of the methodology proposed in this paper allows us to obtain very good
fits to the sample data in most of the series, as shown in Table 10. There the average
MAPE of the within-sample fit errors of the 111 series is described, assuming both
additive and multiplicative seasonality, for all series and for each seasonal subset of the
series. The formulae used here, and in the rest of the paper, to set the initial points of the
optimization algorithm are those in Makridakis et al (1998) for the multiplicative case
and in Chatfield and Yar (1988) for the additive one. There are no differences, on
average, between the fit of a multiplicative seasonal component and that of an additive
one, except for the subset of quarterly series, for which multiplicative seasonality seems
to work better. For monthly data the difference is insignificant, and for non-seasonal
data the two approaches give, of course, the same results.
The predictive accuracy of our results is outlined in Figure 1, where the average
post-sample MAPE across different forecast horizons is plotted for each subset of
series: non-seasonal, quarterly and monthly. The average MAPE for the one-step-ahead
forecast and its mean over the first four steps are reported in Table 11 for each
subset of series, showing very good short-term predictive behavior. Table 11 also
shows the long-term predictive behavior, giving the average MAPE of steps one to eight
and one to eighteen for the non-seasonal series, one to eight for the quarterly series,
and one to twelve and one to eighteen for the monthly series.
Notice that our scheme may run either the additive or the multiplicative version of the
Holt-Winters method, obtaining very similar predictive accuracy. However, on average
the multiplicative version seems to work slightly better than the additive one, and it is the
first choice in an automatic implementation of our method.
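In an automatic run this choice can be reduced to a simple guard: default to the multiplicative version, but fall back to the additive form when the multiplicative recursions are undefined. The following is a hypothetical selection rule, not the paper's exact implementation:

```python
def choose_seasonal_form(y):
    """Pick the seasonal form for an automatic run.  Multiplicative is the
    default (it performed slightly better on average), but its recursions
    divide by the observations and seasonal indices, so it is unusable when
    the series contains zero or negative values; fall back to additive then."""
    if any(v <= 0 for v in y):
        return "additive"
    return "multiplicative"
```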
Figure 2 compares the post-sample accuracy of our method with some of the best-
performing methods in the M-competition, including some of the more time-consuming
ones and the method recently proposed by Hyndman et al. (2002), by plotting the average
MAPE at different forecast horizons for each method.
Table 12 shows the values plotted in Figure 2 and their averages over different
forecasting horizons. As shown there, our method, in both its additive and multiplicative
versions, performs better than the others for shorter forecast horizons, while for longer
forecast horizons only the Parzen method (see Makridakis et al, 1998) works better than
ours.
It should be pointed out that we have not done any preprocessing of the data. We
have not even treated the series assumed to be non-seasonal (p=1) differently:
the algorithm is always applied to the data in the same way for each time
series.
Conclusions
Exponential smoothing methods are widely used forecasting techniques for
time series in stock management. They require the user to specify initial values for the
level, trend and seasonal components as well as three smoothing constants, and we
decided to treat these quantities as decision variables and use mathematical
programming to find optimal values. This enables us to achieve a considerable
reduction in all the measures of forecast error in the cases studied. Besides, using fuzzy
techniques to solve the multi-objective non-linear programming problem that includes
the three measures of error enables a flexible optimal solution to be generated for either
the multiplicative or the additive Holt-Winters forecasting procedure.
The method used to specify initial values for the level, trend and seasonal
components has been very influential in previous works on the Holt-Winters forecasting
procedure. This is not the case in our proposal, because we treat these quantities as
decision variables. However, we still need to specify an initial vector of solutions from
which to start the optimization algorithm. Our empirical study, based on the 111 series
from the M-competition, shows that the method used to designate the initial vector has
very little effect on the goodness of the predictions obtained.
The multiplicative version of our method works better on average than the
additive one; it should be the first option in an automatic implementation of our
procedure. However, the differences in prediction accuracy between the two versions
are very small, so we still expect good results from the additive version when the
multiplicative one is inappropriate. Note that, if a data series contains values equal
to zero, the multiplicative Holt-Winters method cannot be used.
The use of optimization tools in the estimation analysis associated with the
generalized Holt-Winters methods with damped trend is very fruitful. Although
incorporating the initial values as decision variables complicates the related non-
linear problems, the optimization tools make those problems tractable.
The decision to use a fuzzy methodology is due to the imprecise knowledge of the
goals and the need to manage several sources of fitting error in order to find robust
solutions. Here the fuzzy system is modeled using only the information that Stage I
of the procedure can provide. There are many effective methods available for
replacing multi-objective non-linear programs with fuzzy goals by equivalent crisp
optimization problems. In order to find the compromise solution we have selected the
max-min operator, because it gives the solution with the maximum degree of global
satisfaction and favours the complete automation of the forecasting procedure.
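The max-min operator can be illustrated as follows. For simplicity the sketch maximizes the smallest membership over a finite set of candidate solutions, whereas our procedure solves the equivalent crisp non-linear program; the linear membership shape and all names are illustrative.

```python
def maxmin_compromise(objectives, goal_bounds, candidates):
    """Zimmermann-style max-min compromise.  Each fuzzy goal i has a linear
    membership mu_i(x) = (U_i - f_i(x)) / (U_i - L_i), built from the best
    (L_i) and worst (U_i) criterion values observed in Stage I.  The
    compromise solution maximizes the smallest membership, i.e. the global
    degree of satisfaction."""
    def satisfaction(x):
        return min((U - f(x)) / (U - L)
                   for f, (L, U) in zip(objectives, goal_bounds))
    return max(candidates, key=satisfaction)
```

Because the operator needs only the criterion ranges from Stage I and a single scalar to maximize, it lends itself naturally to a fully automatic forecasting procedure.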
References
Bermúdez, J.D., Segura, J.V. and Vercher, E., 2004. Improving demand forecasting
accuracy using non-linear programming software, Journal of the Operational
Research Society (to appear).
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C., 1994. Time Series Analysis,
Forecasting and Control, 3rd ed. Prentice Hall, Englewood Cliffs, NJ.
Brown, R.G., 1963. Smoothing, forecasting and prediction of discrete time series.
  Prentice Hall, Englewood Cliffs, NJ.
Carlsson, Ch. and Korhonen, P., 1986. A parametric approach to fuzzy linear
programming. Fuzzy Sets and Systems 20, 17-30.
Chanas, S., 1989. Fuzzy programming in multiobjective linear programming: a
  parametric approach. Fuzzy Sets and Systems 29, 303-313.
Chatfield, C., Koehler, A.B., Ord, J.K. and Snyder, R.D., 2001. A new look at models
for exponential smoothing. The Statistician 50, 147-159.
Chatfield, C. and Yar, M., 1988. Holt-Winters forecasting: some practical issues. The
Statistician 37, 129-140.
Chatfield, C. and Yar, M., 1991. Prediction intervals for multiplicative Holt-Winters.
International Journal of Forecasting 7, 31-37.
Delgado, M., Verdegay, J.L. and Vila, M.A., 1990. A possibilistic approach for
multiobjective programming problems. Efficiency of solutions. In: R. Slowinski and
J. Teghem (Eds.), Stochastic versus Fuzzy Approaches to Multiobjective Mathematical
Programming under Uncertainty, Kluwer, Dordrecht, 229-248.
Gardner, Jr, E.S., 1985. Exponential smoothing: the state of the art. Journal of
Forecasting 4, 1-28.
Gardner, Jr, E.S., 1986. AUTOCAST User’s Manual. Core Analytic Inc., Bridgewater,
New Jersey.
Gardner, Jr, E.S., 1999. Note: Rule-Based Forecasting vs. Damped-Trend Exponential
Smoothing. Management Science 45, 1169-1176.
Gardner, Jr, E.S. and McKenzie, E., 1985. Forecasting trends in time series.
  Management Science 31, 1237-1246.
Granger, C.W.J. and Newbold, P., 1986. Forecasting economic time series. 2nd edition.
Academic Press, New York.
Hyndman, R.J., Koehler, A.B., Snyder, R.D. and Grose, S., 2002. A state space
  framework for automatic forecasting using exponential smoothing methods.
  International Journal of Forecasting 18, 439-454.
Jiménez, M., Arenas, M., Bilbao, A. and Rodríguez Uría, M.V., 2004. Solving fuzzy
goal programming problems, Fuzzy Economic Review IX(1) 19-33.
Koehler, A.B., Snyder, R.D. and Ord, J.K., 2001. Forecasting models and prediction
intervals for the multiplicative Holt-Winters method. International Journal of
Forecasting 17, 269-286.
Lai, Y.J. and Hwang, Ch.L., 1996. Fuzzy Multiple Objective Decision Making:
  Methods and Applications, Springer, Berlin.
Larrañeta, J.C. et al., 1988. Métodos modernos de gestión de la producción. Alianza
Universidad Textos, Madrid.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R.,
Newton, J., Parzen, E. and Winkler, R., 1982. The accuracy of extrapolation (time
series) methods: Results of a Forecasting Competition. Journal of Forecasting 1, 111-
153.
Makridakis, S. and Hibon, M., 2000. The M3-Competition: results, conclusions and
implications. International Journal of Forecasting 16, 451-476.
Makridakis, S., Wheelwright, S.C. and Hyndman, R.J., 1998. Forecasting. Methods and
Applications, 3rd edition. Wiley, New York.
Ord, J. K. (ed.), 2001. Commentaries on the M3-Competition. International Journal of
Forecasting 17, 537-584.
Ord, J. K., Koehler, A.B. and Snyder, R.D., 1997. Estimation and Prediction for a Class
of Dynamic Nonlinear Statistical Models. Journal of the American Statistical
Association 92, 1621-1629.
Pegels, C.C., 1969. Exponential forecasting: some new variations. Management Science
  15, 311-315.
Ramik, J., 2000. Fuzzy goals and fuzzy alternatives in goal programming problem.
Fuzzy Sets and Systems 111, 81-86.
Sakawa, M., 1998. Fuzzy nonlinear programming with single and multiple objective
functions, In R. Slowinski (Ed.), Fuzzy Sets in Decision Analysis, Operations
Research and Statistics, Kluwer, Boston.
Segura, J.V. and Vercher, E., 2001. A spreadsheet modeling approach to the Holt-
Winters optimal forecasting. European Journal of Operational Research 131, 147-
160.
Theil, H., 1966. Applied Economic Forecasting. North-Holland, Amsterdam.
Wheelwright, S.C. and Makridakis, S., 1973. An examination of the use of adaptive
  filtering in forecasting. Operational Research Quarterly 24, 55-64.
Winters, P.R., 1960. Forecasting sales by exponentially weighted moving averages.
Management Science 6, 324-342.
Zimmermann, H.J., 1978. Fuzzy programming and linear programming with several
objective functions. Fuzzy Sets and Systems 1, 45-55.
Table 1: Best solutions (χ, ζ) for the airline data
Best solution   α      β      γ      φ      L0        b0     |  I-11   I-10   I-9    I-8    I-7    I-6    I-5    I-4    I-3    I-2    I-1    I0
SMAPE           0.782  0.245  0      0.570  120.294   1.208  |  0.922  0.882  1.006  0.972  0.982  1.101  1.215  1.216  1.060  0.929  0.808  0.908
SRMSE           0.646  0.395  0      0.395  103.076  21.765  |  0.907  0.865  0.998  0.962  0.977  1.119  1.251  1.250  1.060  0.922  0.800  0.889
SMAD            1      0.024  0.864  1      121.272   0.945  |  0.916  0.869  0.989  0.958  0.966  1.118  1.242  1.237  1.062  0.928  0.810  0.904
SNLP            1      0.024  0.864  1      121.375   0.945  |  0.915  0.870  0.994  0.961  0.972  1.116  1.242  1.238  1.060  0.925  0.807  0.901
Table 2: Fitting errors for the airline data
         MAPE   RMSE   MAD    U-Statistic
SMAPE    2.69   9.45   6.60   0.36
SRMSE    2.88   8.61   6.81   0.36
SMAD     2.69   8.85   6.38   0.37
SNLP     2.71   8.69   6.43   0.36
Table 3: Post-sample accuracy for the airline data
MAPE, averaged over forecasting horizons
         1-4    1-6    1-8    1-12
SMAPE    3.81   5.01   6.60   6.87
SRMSE    3.83   4.16   4.82   5.29
SMAD     3.51   3.70   4.08   3.26
SNLP     3.62   3.60   3.91   3.12
Table 4: The best solutions for the champagne data
Best solution   α      β      γ      φ      L0     b0      |  I-11   I-10   I-9    I-8    I-7    I-6    I-5    I-4    I-3    I-2    I-1    I0
SMAPE           0.040  0.835  0      0.835  4.057  -0.236  |  0.746  0.725  0.808  0.818  0.921  0.926  0.749  0.338  0.929  1.186  1.735  2.119
SRMSE           0.014  0.417  0      0.977  3.107   0.022  |  0.747  0.696  0.788  0.804  0.924  0.897  0.723  0.376  0.913  1.197  1.769  2.167
SMAD            0.188  0      0.001  0      3.109   0.032  |  0.745  0.720  0.818  0.811  0.922  0.913  0.749  0.349  0.899  1.151  1.771  2.153
SNLP            0.015  0.442  0      0.966  3.088   0.023  |  0.728  0.711  0.821  0.817  0.922  0.921  0.756  0.347  0.932  1.186  1.725  2.136
Table 5. Fitting errors of the champagne data
         MAPE   RMSE   MAD    U-Statistic
SMAPE    7.31   0.56   0.31   0.24
SRMSE    8.99   0.52   0.35   0.24
SMAD     8.32   0.55   0.34   0.25
SNLP     8.24   0.54   0.33   0.24
Table 6: Post-sample accuracy for the champagne data
MAPE, averaged over forecasting horizons
                        1-4     1-6     1-8     1-12
SMAPE                   23.43   19.83   19.77   17.21
SRMSE                   32.39   22.61   18.04   14.76
SMAD                    20.08   18.34   17.88   15.42
SNLP                    28.34   19.39   16.53   13.60
Chatfield & Yar (1991)  17.11   15.73   17.08   17.25
Table 7: Average MAPE for all 111 series across different forecasting horizons, using multiplicative
seasonality: A comparison of four initial solutions.
           Forecasting horizons                                      |  Average of forecasting horizons
Criteria   1    2    3     4     5     6     8     12    15    18    |  1-4   1-6   1-8   1-12  1-15  1-18
M. et al   9.0  9.1  11.0  11.8  13.9  15.2  16.1  14.2  26.1  28.2  |  10.2  11.7  13.1  13.4  14.8  16.6
L. et al   9.0  9.3  11.0  11.9  13.9  15.1  16.3  14.2  26.4  28.2  |  10.3  11.7  13.1  13.4  14.9  16.7
G. & N.    8.9  9.0  11.0  12.2  14.7  15.9  17.2  13.8  26.3  28.7  |  10.3  11.9  13.5  13.7  15.2  16.9
W          9.1  9.1  11.1  11.9  14.0  15.3  16.3  14.4  26.3  28.3  |  10.3  11.7  13.2  13.5  14.9  16.7
Table 8: Average MAPE for all 111 series across different forecasting horizons, using additive seasonality: A
comparison of four initial solutions.
           Forecasting horizons                                      |  Average of forecasting horizons
Criteria   1    2    3     4     5     6     8     12    15    18    |  1-4   1-6   1-8   1-12  1-15  1-18
C. & Y.    9.8  9.3  11.3  12.1  14.7  15.8  16.6  14.5  25.6  28.5  |  10.6  12.2  13.7  13.9  15.2  16.9
L. et al   9.7  9.1  11.3  12.1  14.8  16.1  16.6  14.4  26.2  29.2  |  10.5  12.2  13.7  13.8  15.3  17.0
G. & N.    9.1  9.2  11.5  12.5  15.0  16.5  17.0  15.0  27.3  30.0  |  10.6  12.3  13.8  14.1  15.7  17.5
W          9.8  9.2  11.4  12.1  14.6  16.0  16.7  14.7  26.5  29.5  |  10.6  12.2  13.8  13.9  15.4  17.1
Table 9: Some sample percentiles of the average MAPE for each one of the 111 series, using
multiplicative seasonality only: A comparison of four criteria to calculate initial values.
           Percentiles
Criteria   50     80     90     95
M. et al   10.3   21.5   32.9   49.0
L. et al   10.5   20.9   32.8   47.9
G. & N.    9.3    21.0   30.7   53.1
W          10.3   21.5   32.9   49.4
Table 10: Average MAPE fit for all 111 series, and for each seasonal subset of series.
Seasonality      All series   Non-seasonal   Quarterly   Monthly
Additive         7.40         8.88           3.88        6.55
Multiplicative   7.38         8.88           3.53        6.57
Figure 1: Average MAPE across different forecast horizons for each subset of series.
Table 11: Average MAPE across some forecast horizons, using all 111 series and for each subset of
series.
                             Non-seasonal                 Quarterly          Monthly
Forecast horizons            1     1-4   1-8   1-18       1    1-4   1-8     1    1-4   1-12  1-18
Additive seasonality         10.4  12.0  16.0  22.5       4.6  10.2  19.6    9.8  9.3   11.7  14.2
Multiplicative seasonality   10.4  12.0  16.0  22.5       5.0  8.7   14.8    8.4  8.7   11.2  13.8
Figure 2: Average MAPE across different forecast horizons for all 111 series: A comparison of our method (in both
its additive and multiplicative versions) with some of the best methods in the M-competition.
Table 12: Average MAPE across different forecast horizons for all 111 series, comparing our method
(including both its additive and multiplicative version) with some of the best methods in the M competition. Forecasting horizons Average of forecasting horizons
Methods 1 2 3 4 5 6 8 12 15 18 1-4 1-6 1-8 1-12 1-15 1-18
Deseasonalised SES 7,8 10,8 13,1 14,5 15,7 17,2 16,5 13,6 29,3 30,1 11,6 13,2 14,1 14,0 15,3 16,8
Box-Jenkins 10,3 10,7 11,4 14,5 16,4 17,1 18,9 16,4 26,2 34,2 11,7 13,4 14,8 15,1 16,3 18,0
Parzen 10,6 10,7 10,7 13,5 14,3 14,7 16,0 13,7 22,5 26,5 11,4 12,4 13,3 13,4 14,3 15,4
Hyndman et al. 8,7 9,2 11,9 13,3 16,0 16,9 19,2 15,2 28,0 31,0 10,8 12,7 14,3 14,5 15,7 17,3
Our additive method 9,8 9,3 11,3 12,1 14,7 15,8 16,6 14,5 25,6 28,5 10,6 12,2 13,7 13,9 15,2 16,9
Our multiplicative method 9,0 9,1 11,0 11,8 13,9 15,2 16,1 14,2 26,1 28,2 10,2 11,7 13,1 13,4 14,8 16,6