Centro de Investigación Operativa
I-2005-05
A decision support system methodology for automatic forecasting of time series
J. D. Bermúdez, J.V. Segura and
E. Vercher
March 2005
ISSN 1576-7264 Depósito legal A-646-2000
Centro de Investigación Operativa Universidad Miguel Hernández de Elche Avda. de la Universidad s/n 03202 Elche (Alicante) [email protected]
A decision support system methodology for automatic forecasting of time series¹
J.D. Bermúdezᵃ, J.V. Seguraᵇ and E. Vercherᵃ,²
ᵃ Dpto. Estadística e Investigación Operativa, Universitat de València, C/ Dr. Moliner 50, 46100-Burjassot, Valencia, Spain
ᵇ Dpto. Estadística y Matemática Aplicada, Universidad Miguel Hernández de Elche, Avd. del Ferrocarril s/n, 03202-Elche, Alicante, Spain
Abstract
Exponential procedures are widely used as forecasting techniques for
inventory control and business planning. This paper presents a
number of modifications to the generalized exponential smoothing
(Holt-Winters) approach to forecasting univariate time series and
proposes the result as a tool for decision support systems. The starting
values of level, trend and seasonal factors become additional
parameters in the optimization procedure, in order to make the
forecasts less sensitive to initial values and to extend the applicability
of exponential smoothing to short series of data. This procedure may
provide forecasts from different versions of exponential smoothing by
fitting the updated formulas of Holt-Winters and selects the best
method using a fuzzy multicriteria approach. In this paper we propose
a methodology that unifies the phases of estimation and model
selection into just one optimization framework which permits the
identification of robust solutions. It is compared to other forecasting
methods on the 111 series from the M–competition.
Keywords: Forecasting, Exponential smoothing, Holt-Winters method,
Non-linear programming, Fuzzy multi-objective programming
¹ This work is partially supported by the Conselleria d'Educació, Ciència i Esport de la Generalitat Valenciana, grant no. GV04B-090.
² Corresponding author. Tel. +34-96-354-3793; fax: +34-96-354-4735. E-mail address: [email protected]
1. Introduction
Accurate forecasting is an essential tool for many management decisions, for both
strategic and tactical business planning. Advances in data analysis and software
capabilities have the potential to offer effective forecasting to anticipate future demands,
schedule productions and reduce inventories.
Traditional forecasting techniques rely on proper specifications of systems that we
assume to be understood. A statistical time series forecasting method is often applied to
a finite set of observations under certain assumptions regarding the data generating
process, which are that the variables change with time and their future values are related
in some way to their past values. Many statistical methods have been proposed to
forecast time series and it seems necessary for the forecaster to be familiar with a range
of possible models. An important forecasting tool is the ARIMA class of models. In
fitting an ARIMA model to time series data the Box-Jenkins framework (Box et al, 1994)
is used, which requires the forecaster to make judgements and have
experience and expertise.
Exponential smoothing methods are a class of methods that produce forecasts
with simple formulae, taking into account trend and seasonal effects of the data. These
procedures are widely used as forecasting techniques in inventory management and
sales forecasting, where a very large number of time series with similar properties may
arise. They are especially good for short-term forecasting. Recent papers have
stimulated renewed interest in the technique, putting exponential smoothing procedures
on a sound theoretical ground by identifying and examining the underlying statistical
models (Ord et al, 1997; Koehler et al, 2001). Moreover, the results of M-competitions
(Makridakis et al, 1998; Ord, 2001) confirmed that exponential smoothing methods give
reliable post-sample forecasts and indicate that in order to obtain more accurate ones it
would be worthwhile developing procedures that identify the most appropriate method
from a set of possible choices.
This paper deals with a decision support system methodology which searches for
model specifications that can replicate patterns of observed series, using a number of
modifications to the generalized Holt-Winters method. It is based on two issues:
considering the initial values of level, trend and seasonal factors as decision variables of
an optimization problem, and the perception that a model that minimizes three different
measures of fit will produce good forecasting results. Concerning the first issue, and
working with the mean squared error as a measure of fit to estimate the vector of
parameters, our approach shows a good performance with time series with a limited
amount of data (Bermúdez et al, 2004).
In this paper we propose an alternative model selection strategy based on a full
optimization framework, which permits both the specification of a broad class of
candidate methods and the selection of a method from within that class which is the
‘best’ in some sense. The procedure, using the update formulae of the generalized Holt-
Winters (HW) methods on a given time series, first provides the smoothing and initial
parameters which minimize one of the three measures of fit. At a second stage the
procedure uses a multi-objective formulation which jointly minimizes the error
measurements of fit while keeping the updating equations of the generalized HW
method. The selection of the model which is used for forecasting is determined by a
fuzzy framework.
Multiple objective decision problems are a well-known area in decision making
theory. There are several strategies for characterizing the efficient set of a multi-
objective problem in terms of optimal solutions of appropriate scalar optimization
problems. Within the context of fuzzy logic there are numerous books and papers
devoted to this matter (see, for instance, Zimmermann, 1978; Chanas, 1989; Delgado et
al, 1990; Lai and Hwang, 1996). To deal with imprecision, goal programming has also
provided some useful methodology which takes into account the decision-maker’s
preferences (see, for instance, Ramik (2000), Jimenez et al (2004)).
Our proposal for solving the non-linear multi-objective problem incorporates all
the information previously obtained at the first stage of the procedure, and selects the
fuzzy decision as in Zimmermann’s approach (Zimmermann, 1978). Assuming that both
fuzzy goals and specified membership functions are known for each of the objective
functions, the problem matches a fuzzy version of the multi-objective programming
problem under the given crisp constraints (Sakawa, 1998).
The outline of the paper is as follows. The following section describes a general
approach to the point forecast equations of some exponential smoothing methods from
the generalized HW updating formulas. We discuss estimation and model selection
criteria in Section 3 by using crisp and fuzzy mathematical programming techniques. In
Section 4 some numerical examples illustrate the performance of our proposal. Section
5 presents the empirical results of applying our approach to a set of 111 series used in
the M-competition along with a comparison with existing methods in previous studies.
2. The Holt-Winters forecasting procedures
The Holt-Winters method was originally developed by Winters (1960) and it
involves estimating three smoothing parameters associated with level, trend and
seasonal factors. The seasonal variation can be of either an additive or multiplicative
form. It is known that the additive seasonal form is optimal for a seasonal ARIMA
model. Recently, a class of dynamic non-linear state space models has led to several
models for which the optimal set of updating equations is close to the multiplicative
form of Holt-Winters (Ord et al, 1997), and hence this method may enjoy the
advantages that forecasting procedures based on a proper statistical model have.
Moreover, this class of models allows the forecast error variance to depend on the trend
and/or the seasonality (Chatfield et al (2001) and Koehler et al (2001)).
Let us introduce some notation and expressions. An observed time series is
denoted by {Dt}t=1,…,n. The forecast of Dn+h made at time n for the h-th period into the
future is denoted by Fn(h), h≥1. In the multiplicative form of seasonal effects, the
equations that enable the level, Lt, trend, bt, and seasonal indices, It, to be updated based
on a new observation Dt, are the following:
Lt = αDt/It-p + (1-α)(Lt-1 + bt-1)   (2.1)

bt = β(Lt - Lt-1) + (1-β)bt-1   (2.2)

It = γDt/Lt + (1-γ)It-p   (2.3)
where α, β, and γ are the smoothing parameters associated with level, trend and
seasonal effects respectively (with values between 0 and 1), and with p being the
number of observations per seasonal cycle. Winters (1960) recommends that the
seasonal factors be normalized at the beginning of the series in such a way that, in the
additive case, the seasonal indices must add up to zero and, in the multiplicative one, the
seasonal indices are constrained so that they average to unity.
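This normalization can be sketched in a few lines of Python; the helper below is ours, illustrative only, and not part of the paper's system:

```python
def normalize_indices(indices, multiplicative=True):
    """Normalize one cycle of seasonal indices as Winters (1960) recommends:
    multiplicative indices average to unity, additive indices sum to zero."""
    mean = sum(indices) / len(indices)
    if multiplicative:
        return [i / mean for i in indices]   # new average is exactly 1
    return [i - mean for i in indices]       # new sum is exactly 0
```

For example, `normalize_indices([0.5, 1.5, 2.0])` rescales the three indices so that they average to one.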
Formulas (2.1)-(2.3) are appropriate when there is a linear trend and
multiplicative seasonal effects in the time series. They provide robust forecasts that are
not very different from those obtained with more complex, expensive procedures for
seasonal time series (Chatfield and Yar, 1988). The forecast for time period (n+1) is
given by Fn(1):= (Ln + bn) In+1-p and the one-step-ahead forecast error at time n is en=
Dn - Fn-1(1). The forecast made at time n for h periods ahead is obtained from the
expression Fn(h):= (Ln + hbn) In+h-p.
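As a minimal sketch, the multiplicative recursions (2.1)-(2.3) and the h-step-ahead forecast translate directly into code; the function names are ours, and the smoothing parameters are taken as known here, whereas the paper estimates them:

```python
def hw_mult_step(d_t, level, trend, index_old, alpha, beta, gamma):
    """One multiplicative Holt-Winters update following (2.1)-(2.3):
    index_old is the seasonal index I_{t-p} from one cycle earlier;
    returns the updated (L_t, b_t, I_t)."""
    new_level = alpha * d_t / index_old + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_index = gamma * d_t / new_level + (1 - gamma) * index_old
    return new_level, new_trend, new_index

def hw_mult_forecast(level, trend, index_future, h):
    """h-step-ahead forecast F_n(h) = (L_n + h b_n) I_{n+h-p}."""
    return (level + h * trend) * index_future
```

With all three smoothing parameters at zero the update reduces to pure extrapolation of the previous level, trend and index, which is a quick sanity check on the recursions.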
The additive seasonal form of the Holt-Winters method works with the following
recursive updating equations:
Lt = α(Dt - It-p) + (1-α)(Lt-1 + bt-1)   (2.4)

bt = β(Lt - Lt-1) + (1-β)bt-1   (2.5)

It = γ(Dt - Lt) + (1-γ)It-p   (2.6)
and with the h-step-ahead forecast Fn(h):= Ln+ hbn+In+h-p. Notice that the seasonal
component is additive and the trend is linear.
In both schemes the effect of the trend may be high for long-term forecasts.
Hence, we use the recursive equations that allow us to incorporate a damping parameter
for the trend φ, such that β<φ≤1, and we modify the equations (2.2) and (2.5), in a
similar way to the damped-trend exponential smoothing method proposed by Gardner
and McKenzie (1985), which become:
bt = β(Lt - Lt-1) + (φ-β)bt-1   (2.7)
but setting the forecasts to Fn(h):= (Ln + (∑i=0,...,h-1 φ^i)bn) In+h-p in the multiplicative version
and Fn(h):= Ln + (∑i=0,...,h-1 φ^i)bn + In+h-p in the additive one, respectively, as in the
proposal of Hyndman et al (2002). Notice that for φ=1 the former methods are obtained.
The cases with φ<1 describe their damped-trend versions.
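The damped-trend forecasts simply replace the horizon h by the partial sum of powers of φ; a small sketch (our naming) for the multiplicative version:

```python
def damping_sum(phi, h):
    """Sum of phi**i for i = 0, ..., h-1, which replaces the factor h in
    the damped-trend forecast equations; equals h when phi = 1."""
    return sum(phi ** i for i in range(h))

def damped_forecast_mult(level, trend, index_future, phi, h):
    """Damped multiplicative forecast F_n(h) = (L_n + (sum phi^i) b_n) I_{n+h-p}."""
    return (level + damping_sum(phi, h) * trend) * index_future
```

For φ<1 the trend contribution is bounded as h grows, which is what damping means in practice.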
The above recursive procedures will lead to a broad range of exponential
smoothing models in the Pegels classification (1969). The work of Hyndman et al
(2002) gives the point forecast and state space equations for all these methods. In their
study they propose applying 24 state space models on each time series, with all the
smoothing parameters constrained to lie within [0.1,0.9] to avoid non-invertible models.
In practice neither the initial parameters ζ = (L0, b0, I1-p,..., I0)∈ℜp+2 nor the
smoothing parameters χ = (α, β, γ, φ) are known exactly, but they have to be estimated
from the data. The initial parameters are often calculated roughly with a heuristic
procedure (see, for instance, Makridakis et al, 1998) and then the smoothing parameters
are estimated. In Segura and Vercher (2001) it is shown that smoothing parameter
values (α, β, γ) are very sensitive to the specific formulae used to calculate the initial
values of the local level, trend and seasonal factors at the beginning of the series.
The estimation problem is often solved by minimizing some measure of fit based
on the one-step-ahead errors, {et=Dt-Ft-1(1)}t=1,..,n, over the period of fit. For example,
assuming the statistical model for the disturbances Dt = Ft-1(1)+εt, where {εt} is a
Gaussian white noise process, the maximum likelihood estimator is obtained by
minimizing the one-step-ahead error sum of squares (Ord et al, 1997; Hyndman et al,
2002).
In order to obtain a nonparametric, more robust estimation procedure we decide to
use not only one but three measures of fit as objective functions: the mean absolute
percentage error, MAPE, given by ϕ1, the root of the mean square error, RMSE,
denoted by ϕ2, and the mean absolute deviation, MAD, given by ϕ3. These measures
give relative and absolute information about the data fitting. The use of ϕ3 implies an
underlying linear loss function, and that of ϕ2 a quadratic one, while MAPE is a
scale-independent statistic.
The explicit formulas of these fit errors are:

MAPE: ϕ1(χ, ζ) = (100/n) ∑i=0,...,m-1 ∑j=1,...,p |Dj+ip - Fj+ip-1(1)| / Dj+ip

RMSE: ϕ2(χ, ζ) = [ (1/(n-1)) ∑i=0,...,m-1 ∑j=1,...,p (Dj+ip - Fj+ip-1(1))² ]^(1/2)

MAD: ϕ3(χ, ζ) = (1/n) ∑i=0,...,m-1 ∑j=1,...,p |Dj+ip - Fj+ip-1(1)|
where m denotes the number of seasonal cycles and n=m×p.
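The three measures can be computed directly from the observed series and its one-step-ahead forecasts; a sketch under the paper's definitions (note the n-1 divisor for RMSE, as in the formula above; the function name is ours):

```python
from math import sqrt

def fit_errors(data, one_step_forecasts):
    """Return (MAPE, RMSE, MAD) over the fitting period, where
    one_step_forecasts[t] is F_{t-1}(1), the forecast of data[t]
    made one period earlier."""
    n = len(data)
    errors = [d - f for d, f in zip(data, one_step_forecasts)]
    mape = (100.0 / n) * sum(abs(e) / d for e, d in zip(errors, data))
    rmse = sqrt(sum(e * e for e in errors) / (n - 1))
    mad = sum(abs(e) for e in errors) / n
    return mape, rmse, mad
```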
Moreover, in this paper we do not use any heuristic to evaluate the initial values.
Instead we propose to estimate all the unknowns, the smoothing parameters as well as
the initial parameters, minimizing those objective functions.
3. An optimization approach for estimation and model selection
In a univariate time series context a very important question is that of model
selection. The first consideration appears with respect to whether the model is linear or
non-seasonal. It is very important to identify these special cases because this can avoid
the need to solve some optimization problems and/or reduce their dimension in the case
of p=1. Otherwise, all possibilities may be considered and the model selection is made
by means of fuzzy logic.
We decide to treat the values of ζ as decision variables of the optimization
problem in such a way that a broad range of ES models can be obtained (see Section
3.3). The estimation of smoothing parameters is then determined along with the optimal
starting values for the level, trend and seasonal indices, which implies estimating all the
parameters over the fitting period. One of the interesting features of this scheme is that
we can fit different exponential smoothing methods using a pair of optimization
problems, for the additive and multiplicative versions, at least as regards the feasible
region, because they have respectively the same recursive updating equations and
structural constraints. The set of parameters is now (χ,ζ) and the seasonal indices
constraint depends on the additive or multiplicative form of the seasonal effects. That is,
for the multiplicative form of the seasonal factors, the goal is to determine the vector
(χ,ζ) such that a certain measure of forecast error is minimized as long as the following
constraints are fulfilled:
Lt = αDt/It-p + (1-α)(Lt-1 + bt-1),   t=1, ..., n

bt = β(Lt - Lt-1) + (φ-β)bt-1,   t=1, ..., n

It = γDt/Lt + (1-γ)It-p,   t=1, ..., n

∑h=1,...,p Ih-p = p

(α, β, γ)∈[0, 1]3

β<φ≤1

L0, I1-p, ..., I0∈ℜ+, b0∈ℜ
Throughout the paper this feasible region will be denoted by Φm. Analogously, the
feasible region for the additive form, Φa, requires I1-p, ..., I0∈ℜ and contains the
corresponding updating equations (2.4) and (2.6) and the constraint ∑h=1,...,p Ih-p = 0
instead of (2.1), (2.3) and ∑h=1,...,p Ih-p = p, respectively. Without loss of generality
we denote these feasible sets by Φ.
Assuming that the model has multiplicative seasonal factors (resp. additive) and
that the number of seasons per cycle is p, for each of the three error measures, ϕi, the
algorithm computes the optimal values of (χ,ζ) by solving three non-linear
programming problems which include the following additional constraint: ϕU(χ, ζ) ≤ 1,
to prevent the fitted models from performing worse than the naïve method.
The function ϕU(χ, ζ) evaluates Theil's U-statistic (Theil, 1966) for each iterate.
This statistic acts as a relative measure of the worth of determining the parameter values
with respect to the naïve forecast (which simply uses the previous period's value as the
forecast for the next period, Fn(1):= Dn).
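The paper does not spell out the U formula; the sketch below uses the standard form consistent with that description, i.e. the ratio of the root squared one-step error of the method to that of the naïve forecast F_{t-1}(1) = D_{t-1} (the function name is ours):

```python
from math import sqrt

def theil_u(data, one_step_forecasts):
    """Theil's U-statistic: values below 1 mean the fitted model beats the
    naive forecast over the fitting period.  Only t >= 2 enters both sums,
    because the naive forecast needs a previous observation."""
    num = sum((d - f) ** 2 for d, f in zip(data[1:], one_step_forecasts[1:]))
    den = sum((d1 - d0) ** 2 for d0, d1 in zip(data, data[1:]))
    return sqrt(num / den)
```

If the supplied forecasts are exactly the naive ones, the statistic equals 1, which is the boundary enforced by the constraint ϕU(χ, ζ) ≤ 1.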
3.1 Model fitting
Let us describe the performance of the first stage of our procedure for obtaining
forecasts through the generalized Holt-Winters method, which is based on the ideas
previously introduced. Notice that the feasible set Φ may adopt several different forms
depending on the forecaster information about the seasonal effect and the number of
observations per seasonal cycle, although for p=1 there are no differences between the
additive and multiplicative versions of the Holt-Winters method.
Stage I.
Step 0. Let {Dt}t=1,...,n be an observed data series. Let p be the number of seasons per
cycle.
Step 1. Build the (NLP)i problems for i=1, 2, 3, where
(NLP)i min {ϕi(χ, ζ): (χ, ζ)∈Φ, ϕU(χ, ζ)≤1}
Step 2. Solve the (NLP)i problem using a multi-start strategy. Keep the set of local
minima in Ai, for i=1, 2, 3.
Step 3. Evaluate the optimal criterion vector [ϕ1(a), ϕ2(a), ϕ3(a)] for each a∈Ai, i=1, 2,
3 and select as the best solution {ai} the one with the lowest fit error in Ai, i=1, 2, 3.
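The multi-start strategy of Step 2 can be pictured with a toy stand-in: a one-parameter simple exponential smoothing objective minimized from several random starting points by a crude local search. The real system solves the full (NLP)i problems over (χ, ζ) with a GRG solver; the names and the search scheme here are ours, for illustration only:

```python
import random

def sse(alpha, data):
    """One-step-ahead squared-error sum for single exponential smoothing,
    a one-parameter stand-in for the (NLP)_i objectives."""
    level, total = data[0], 0.0
    for d in data[1:]:
        total += (d - level) ** 2
        level = alpha * d + (1 - alpha) * level
    return total

def multistart(data, starts=12, seed=0):
    """Refine several random starting values of alpha by a coarse-to-fine
    local search, keeping every local minimum found (the set A_i)."""
    rng = random.Random(seed)
    minima = []
    for _ in range(starts):
        a, step = rng.random(), 0.1
        for _ in range(200):
            cands = {max(0.0, min(1.0, a + d)) for d in (-step, 0.0, step)}
            best = min(cands, key=lambda x: sse(x, data))
            if best == a:          # no neighbour improves: refine the step
                step /= 2
                if step < 1e-4:
                    break
            else:
                a = best
        minima.append((a, sse(a, data)))
    return minima
```

Keeping all the local minima, rather than only the best one, is what later feeds the fuzzy model-selection stage.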
At this stage, concerning the model selection criteria, the best fit of every error
function may or may not correspond to the same version of exponential smoothing, that
is to the same ES method. In fact, although we use only one measure of fitting error,
several candidate models associated to local minima of the corresponding non-linear
programming problem may appear.
Solving those non-linear programming problems using a multi-start strategy could
give us a collection of alternative local optimal solutions. We have a lot of information
about every measure of fitting that is summarized in the set A:={x=(χ,ζ)∈Φ: x is a
solution for at least one of the objectives ϕi}. For i=1, 2, 3, we define ϕi+= min{ϕi(x):
x∈A}, i.e. the lowest value obtained for each objective, ϕi-= max{ϕi(x): x∈A}, the
highest one and ϕim, as the median of all the values of the i-th objective in the set A.
3.2 Model building: the fuzzy multi-objective approach
In this section we describe the ideas that are the basis of the second stage of our
proposal. Our goal is to identify the most appropriate ES method from a set of possible
choices, summarized in A. We want to select the ‘best’ method and so we decide to
formulate a fuzzy multi-objective optimization problem that minimizes the three
measurements of fitting within the set Φm (resp. Φa). This problem may be expressed as
follows:
(FMOP)   min~ [ϕi(χ, ζ), i=1, 2, 3]
         s. t. (χ, ζ)∈Φ
               ϕU(χ, ζ) ≤ 1
where the structural constraints and the recursive updating equations that define the
feasible region Φ are crisp constraints, while for the objective functions we apply the
fuzzy version of minimize, that is they should be minimized as much as possible under
the given constraints. Therefore, we will find a fuzzy solution which is a model that
satisfies the requirements of some of the ES models that underlie versions of the
generalized HW methods.
Here the fuzziness is assumed in the objective functions, for which we have
aspiration levels together with membership functions, fixed from the information
obtained in the first stage of the procedure for every fitting measure. Then the model
selection problem is defined as
(FP)   Find (χ, ζ)
       s. t. ϕi(χ, ζ) <~ ϕi0,   i=1, 2, 3
             ϕU(χ, ζ) ≤ 1
             (χ, ζ)∈Φ

where ϕi0 = ϕi+ is the fuzzy goal of the i-th objective, for i=1, 2, 3, and <~ is a fuzzy
constraint. Once the membership functions µi(ϕi(x)) have been specified, the algorithm
selects the fuzzy decision defined by the max-min operator, as in Zimmermann’s
approach (1978). It then finds the solution, (χ, ζ, λ)max, with the maximum degree of
global satisfaction in the (FP) problem, and it is obtained by solving the following crisp
optimization problem:
(NLP)4   max λ
         s. t. µi(ϕi(χ, ζ)) ≥ λ,   i=1, 2, 3
               ϕU(χ, ζ) ≤ 1
               (χ, ζ)∈Φ
               λ∈[0, 1]
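Over a finite set of candidate solutions the max-min operator reduces to a few lines; the discrete sketch below (our naming) illustrates the selection rule, while the paper actually solves the continuous problem (NLP)4 over (χ, ζ, λ):

```python
def maxmin_select(criterion_vectors, memberships):
    """Zimmermann's max-min fuzzy decision over a finite candidate set:
    criterion_vectors maps a candidate label to its (phi_1, phi_2, phi_3),
    memberships holds the three functions mu_i.  Returns the candidate with
    the largest lambda = min_i mu_i(phi_i) and that lambda."""
    def degree(phis):
        return min(mu(phi) for mu, phi in zip(memberships, phis))
    best = max(criterion_vectors, key=lambda c: degree(criterion_vectors[c]))
    return best, degree(criterion_vectors[best])
```

A candidate that is excellent on two objectives but poor on the third gets a low λ, which is exactly the compromise behaviour the max-min operator is meant to enforce.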
The algorithm makes use of a separate subroutine for building the membership
function of each of the measures of fit. It receives as input all the information obtained
in Stage I about ϕi. This information enables the goal, tolerance and shape of the
membership functions of the objectives, µi(ϕi(x)), to be fully characterized, that is

µi(ϕi(x)) = 1,   if ϕi(x) ≤ ϕi0
µi(ϕi(x)) = the chosen strictly decreasing (linear or exponential) form,   if ϕi0 < ϕi(x) ≤ ϕi0 + ti
µi(ϕi(x)) = 0,   if ϕi(x) > ϕi0 + ti
with µi(ϕi(x)) being a continuous and strictly decreasing function on [ϕi0, ϕi0 + ti],
where ti ≥ 0 measures the amplitude of the tolerance interval for the fuzzy inequality
ϕi(χ, ζ) <~ ϕi0 to be met.
The tolerances are given by ti := ϕi- - ϕi+, for i=1, 2, 3 and exponential or linear
membership functions are then obtained from the result of {ϕi+, ϕi-, ϕim}, according to
the scheme proposed in Carlsson and Korhonen (1986). If ϕim = (ϕi+ + ϕi-)/2, then the
linear membership function is used. Otherwise, the value of the parameter bi, which
defines the concave or convex form of the exponential membership function, must be
determined. To do that, we assume that ϕim is the value of the objective function ϕi(x)
associated with a possibility of 0.5, so the following equation may be solved
µi(ϕim)=0.5, where
µi(ϕi(x)) = [exp(-bi(ϕi(x) - ϕi0)/ti) - exp(-bi)] / [1 - exp(-bi)]
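The construction of one membership function can be sketched as follows. The use of bisection to solve µi(ϕim) = 0.5 is our choice; the paper only states that the equation is solved. Names are ours:

```python
from math import exp

def build_membership(phi_best, phi_worst, phi_median):
    """Build mu_i from {phi_i+, phi_i-, phi_i^m}: linear when phi_i^m is the
    midpoint of [phi_i+, phi_i-], otherwise exponential with b_i solved
    from mu_i(phi_i^m) = 0.5 by bisection."""
    t = phi_worst - phi_best                    # tolerance t_i = phi_i- - phi_i+
    if abs(phi_median - (phi_best + phi_worst) / 2) < 1e-9 * max(t, 1.0):
        shape = lambda v: 1.0 - (v - phi_best) / t          # linear form
    else:
        def mu_exp(v, b):
            return (exp(-b * (v - phi_best) / t) - exp(-b)) / (1.0 - exp(-b))
        lo, hi = -50.0, 50.0       # mu_exp(phi_median, b) decreases in b
        for _ in range(200):       # bisection for mu_exp(phi_median, b) = 0.5
            b = (lo + hi) / 2 or 1e-9            # avoid the singularity b = 0
            if mu_exp(phi_median, b) > 0.5:
                lo = b
            else:
                hi = b
        shape = lambda v: mu_exp(v, b)
    return lambda v: 1.0 if v <= phi_best else (0.0 if v > phi_best + t else shape(v))
```

The sign of the solved b gives a convex or concave shape depending on which side of the midpoint the median falls, matching the Carlsson-Korhonen scheme described above.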
Let us describe the performance of the second stage of our procedure for finding
optimal forecasts with the generalized Holt-Winters methods.
Stage II.
Step 4. Let A be the set of local minima from Stage I. For i=1, 2, 3 compute {ϕi+, ϕi-, ϕim}.
If ϕim = (ϕi+ + ϕi-)/2, build a linear membership function for ϕi.
Otherwise, compute bi and build an exponential membership function.
Step 5. Solve the (NLP)4 problem and select the solution with the highest λ value.
Step 6. Evaluate Ln, bn and In+h-p for 1 ≤ h ≤ τ for the solution (χ, ζ, λ)max and calculate the
forecasts made at time n for τ periods ahead.
3.2.1 Implementation issues
1. Our decision support system provides a consistent framework for selecting different
exponential smoothing methods, taking into account several measures of fit. Both the
model fitting and the building problem are formulated as a non-linear programming
problem. A GRG-based solution procedure is implemented in the widely accepted
spreadsheet environment, EXCEL, to solve this problem.
Our system has been implemented in the Visual Basic language. It consists of
three different modules: (i) data entry, (ii) active selection of either the multiplicative or
the additive form of the generalized HW method and of the number of observations per seasonal
cycle, and (iii) storage of the solution with the maximum degree of global satisfaction:
the model parameters, the fitting errors and the τ-step-ahead predictions.
2. Step 2 involves the evaluation of the local minima of several non-linear functions.
For improving the performance of the algorithm the analyst must provide feasible initial
values of (χ,ζ) for the optimization procedure. We then use twelve different initial
solutions, in which the values of the parameters in χ vary in a grid, while the initial
values for ζ are evaluated using the following equations (Makridakis et al, 1998) for the
multiplicative version:
L0 = D̄1,   b0 = (D̄2 - D̄1)/p,   It-p = Dt/D̄1,   t=1,…, p

where D̄i is the arithmetic mean of the data for the i-th cycle. For the additive form of
the updating equations the initial values of the seasonal indices are obtained by means
of It-p = Dt - (L0 + (t - 1/2)b0), for t=1,…, p, suggested by Chatfield and Yar (1988).
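For the multiplicative version, the initial-solution heuristic of Makridakis et al (1998) quoted above can be sketched as follows (the data are assumed to span at least two complete cycles; the function name is ours):

```python
def initial_values_mult(data, p):
    """Heuristic starting point (L0, b0, seasonal indices) for the
    multiplicative form: level from the first cycle mean, trend from the
    difference of the first two cycle means, indices from the first cycle."""
    mean1 = sum(data[:p]) / p                # arithmetic mean of cycle 1
    mean2 = sum(data[p:2 * p]) / p           # arithmetic mean of cycle 2
    level0 = mean1
    trend0 = (mean2 - mean1) / p
    indices = [d / mean1 for d in data[:p]]  # I_{t-p} = D_t / mean1
    return level0, trend0, indices
```

These values only seed the optimization; the algorithm then treats ζ as decision variables, so the final estimates need not stay close to them.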
3. Concerning the forecasts provided by Step 6, it must be pointed out that additive and
multiplicative versions have been performed separately. Therefore, the information used
in Stage II and the solution of the (NLP)4 problem is associated to a version of ES, of
either additive or multiplicative form.
3.3 Model checking
Notice that the smoothing constants (α, β, γ) are parameters set within the limits
[0,1]. This implementation of the generalized HW methods will never produce inferior fits
compared with simpler ES methods, since those methods optimize the same objective over a
restricted set of the decision variables. The following cases may
then appear if both the smoothing parameters and the initial values ζ have been
estimated by minimizing the one-step prediction error through a joint optimization
scheme:
a) Non-seasonal models with trend
• If the optimal value for the smoothing constant α=0, the value of β is not
relevant, and we then have the linear regression model (for φ=1) or Holt’s
damped method, for φ<1.
• If the smoothing constant α>0, the fixed trend corresponds to β=0, and then
bn=φ^n b0. For β>0 the additive trend models appear.
b) Models with trend and seasons (p>1)
• If the optimal smoothing constant γ for the seasonal parameters is equal to
zero, the update values of the seasonal indices will not change. Also the fixed
seasonal set of indices is obtained if the optimal α=1. In both cases the
seasonal indices are fixed, that is It-p=Ij-p for j≡t(mod p).
• If we find an optimal solution with γ=0 and the values of the seasonal factors
are It=1 for all t∈{1-p,...,0}, for the multiplicative case or It=0 for the additive
form, then Holt’s linear method is obtained.
• If γ=0 and there is any It0 ≠ 1 (resp. It0≠0), the data are just seasonally adjusted
and we apply Holt’s deseasonalized method.
• If the data have no trend, then solutions with β=0 and b0=0 are appropriate.
Moreover, if the seasonal set of indices is fixed we obtain the single
exponential smoothing method or its seasonally adjusted version, depending on
the values of the seasonal factors.
It should be pointed out that the fixed seasonal effect, which is obtained for γ=0 or α=1,
implies the same ‘optimal’ pattern of deseasonalization for all the seasonal cycles.
Furthermore, the optimal set of indices is normalized, that is the sum for a cycle should
equal 0 (resp. p) in a model with additive (resp. multiplicative) adjustments of seasonal
effects.
Our main goal is to obtain a robust estimation of the unknown parameters in order
to build good forecasts. Notice that the proposed algorithm uses a multi-objective
formulation that permits the re-optimization of the forecasting procedure. This cross-
over scheme runs a crisp optimization procedure first and later switches to the fuzzy
multi-objective framework, where the parameters of the model are re-optimized. It
proves to be of great use from the point of view of the automation of the process of
obtaining the model that ‘best’ approximates data with respect to all the objectives
simultaneously. In the next section we present some comparative out-of-sample results
obtained from the optimal solution, at either Stage I or Stage II.
4. Comparative results
In order to illustrate the performance of our automatic forecasting procedure, the
method is applied to a pair of classical data sets. Notice that our algorithm may stop at
the end of Stage I selecting the model with the lowest fit error for every measure of
fitting, that is {ai}, or perform the complete scheme and select the model with the
highest degree of satisfaction. For the following examples a comparison between the
partial and complete performance of our procedure is shown.
Example 1. The data correspond to monthly passengers of an international airline
(Brown, 1963), from 1949 to 1959, where p=12. The multiplicative Holt-Winters
method is applied here to a sample of 132 months, the origin being January 1949. The
forecasted values are projected from month 133 to the planning horizon of the
subsequent 12 months.
Table 1 shows the best solutions obtained in Steps 3 and 5 for every (NLP)i
problem associated with the multiplicative form of the updating equations, i=1,…,4.
The subindex of the solution informs us about the fitting error which has been
optimized, that is MAPE, RMSE and MAD, and NLP for the compromise solution,
respectively. Notice that all the solutions have fixed seasonal factors (α=1 or γ=0), but
that there are two types of solutions. In fact, the underlying ES method in SMAPE and
SRMSE solutions is Holt’s deseasonalized method with a fixed set of seasonal indices
provided by the optimization scheme, and a damped-trend. The other solutions do not
have a damped trend and there are slight differences between them.
Table 2 contains the criterion vector of the fitting errors for every solution and
their U-statistic value. Notice that the SNLP solution is clearly a compromise solution
with respect to the fitting errors.
We compute forecasts up to 12 steps-ahead and evaluate the one-step-ahead
forecast error for all the solutions. For every forecast horizon we then determine the
post-sample accuracy by averaging the absolute percentage errors. Table 3 shows the
forecasting accuracy for the predictions for several horizons. The SNLP solution nearly
coincides with SMAD and their forecasts and post-sample forecasting errors are very
similar.
For this example, both the fitting errors and the accuracy of the predictions for
every solution are quite similar, with only a slight improvement in the post-sample
accuracy achieved for the proposal SNLP. Let us see how the algorithm performs for the
champagne example.
Example 2. Consider the time series corresponding to the sales of champagne in France
during the period 1962-1969. The monthly data, in thousands of bottles, which appear in
the AUTOCAST package (Gardner, 1986) are slightly different from those used in
Wheelwright and Makridakis (1973). We shall use the data for the demand found in
AUTOCAST, in order to compare our results with those published in Chatfield and Yar
(1988). The algorithm therefore works with the data obtained up to month 76, as they
did. The forecasted values are projected from month 76 to cover the demand
corresponding to the planning horizon of the subsequent 12 months.
Tables 4 and 5 show the best models for the champagne data (Step 3 of Stage I)
and their optimal criterion vector respectively, and also for the proposal SNLP (Step 5 of
Stage II). All the solutions have been evaluated using the multiplicative version of the
HW recursive equations. Notice that SMAPE, SRMSE and SNLP optimal solutions work with
seasonal indices obtained from the optimal fixed set (γ=0). For SMAD the seasonal
factors are not fixed and the underlying ES method has no trend (β=φ=0). In fact, the
best solutions for the non-linear optimization problems usually provide different
versions of exponential smoothing methods, which give different forecasts.
Table 6 contains the post-sample accuracy for the above solutions and for the
solution associated with the predictions in the paper written by Chatfield and Yar (1991).
The compromise solution is only better for the predictions of the last six months, where
it also improves the results obtained by Chatfield and Yar (1991).
Concerning the good performance of the MAD criterion in the model fit, our
results coincide with those obtained by Gardner (1999), which compares the damped-trend
exponential smoothing method with a rule-based forecasting (RBF) approach over
a set of 126 time series.
5. Application to the 111 series of the M-competition
To further investigate the predictive behavior of our proposal and its accuracy
with respect to other forecasting approaches, we apply it to a well-known data bank
with which to validate the out-of-sample predictions. We make comparisons for
the collection of 111 series from the M-competition (Makridakis et al, 1982), which
contains 51 monthly, 9 quarterly and 51 annual time series. As specified in the M-
competition, we compute forecasts up to 18, 8 or 6 steps ahead, respectively, as well as
the MAPE error for each forecast horizon.
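The horizon-wise MAPE summaries used throughout this section can be computed with a simple helper like the following (a straightforward sketch; the names are ours, not the paper's code):

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

def mape_by_horizon(actual, forecast, horizons=(1, 4, 6, 8, 12, 18)):
    """Average MAPE over the first h forecast steps, for each horizon h
    that fits within the available post-sample data."""
    return {h: mape(actual[:h], forecast[:h])
            for h in horizons if h <= len(actual)}
```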
First we investigate the robustness properties of our forecasting procedure with
respect to the initial values, i.e. the initial solutions of the nonlinear problems used as
starting values in our algorithm. In this context, we say that a forecasting method is
robust if it is not sensitive to the formulae used to set up those initial solutions; this is
not the case for most of the forecasting approaches based on the Holt-Winters model
(see, for instance, Chatfield and Yar, 1988; Segura and Vercher, 2001).
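To make concrete the kind of formulae at stake, a common textbook recipe computes the starting values from the first two complete seasons. This sketch is one plausible reading of such recipes (the exact expressions differ between Makridakis et al, 1998, and the other sources compared below), shown only to illustrate what an "initial solution" contains:

```python
def initial_solution(y, p):
    """Classical starting values for level, trend and multiplicative
    seasonal indices, computed from the first two seasons of data.
    One plausible textbook recipe, not the paper's exact formulae."""
    s1 = sum(y[:p]) / p            # mean of the first season
    s2 = sum(y[p:2 * p]) / p       # mean of the second season
    L0 = s1
    b0 = (s2 - s1) / p             # average per-period growth
    # seasonal indices: average the ratios to each season's mean
    I = [(y[i] / s1 + y[p + i] / s2) / 2 for i in range(p)]
    return L0, b0, I
```

In our procedure such a vector is only the starting point of the optimization, which is why the particular recipe chosen turns out to matter so little.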
Table 7 compares the post-sample accuracy of the forecasts provided by our
method using four different initial solutions of the nonlinear problems (those
suggested by Makridakis et al, 1998; Larrañeta et al, 1988; Granger and Newbold, 1986;
and Winters, 1960) for the multiplicative version of the recursive equations. Table 8
shows a similar comparison for the additive version; the four initial solutions
used there are the one proposed by Chatfield and Yar (1988) and those in Larrañeta et al
(1988), Granger and Newbold (1986) and Winters (1960). Tables 7 and 8 show no
significant differences in average predictive accuracy across the initial solutions when
our optimization procedure is used.
To further study the influence of the initial solution, Table 9 gives a
brief summary of the average MAPE of the out-of-sample predictions for each of
the 111 series and for each of the four criteria used to specify the initial values,
assuming multiplicative seasonality. It contains the median and the 80th, 90th
and 95th percentiles, in order to compare the worst predictive behavior of the criteria
as well as their average behavior. Once more, there are no practical differences
between the four criteria. A similar result is obtained for the additive seasonal form
of the HW methods.
The use of the methodology proposed in this paper allows us to obtain very good
fits to the sample data in most of the series, as shown in Table 10. There the average
MAPE of the within-sample fit errors of the 111 series is described, assuming both
additive and multiplicative seasonality, for all series and for each seasonal subset of the
series. The formulae used here, and in the rest of the paper, to set the initial points of the
optimization algorithm are those in Makridakis et al (1998) for the multiplicative case
and in Chatfield and Yar (1988) for the additive one. There are no differences, on
average, between the fit of a multiplicative seasonal component and that of an additive
one, except for the subset of quarterly series, for which multiplicative seasonality seems
to work better. For monthly data the difference is insignificant, and for non-seasonal
data the two approaches give, of course, the same results.
The predictive accuracy of our results is outlined in Figure 1, where the average
post-sample MAPE across different forecast horizons is plotted for each subset of
series: non-seasonal, quarterly and monthly. The average MAPE for the one-step-ahead
forecast and its mean over the first four steps are reported in Table 11 for each
subset of series, showing very good short-term predictive behavior. Table 11 also
shows the long-term predictive behavior, giving the average MAPE of steps one to eight
and one to eighteen for the non-seasonal series, one to eight for the quarterly series,
and one to twelve and one to eighteen for the monthly series.
Notice that our scheme may run either the additive or the multiplicative version of the
Holt-Winters method, obtaining very similar predictive accuracy. However, on average
the multiplicative version seems to work slightly better than the additive one, and it is the
first choice in an automatic implementation of our method.
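In an automatic run this choice can be reduced to a simple guard: default to the multiplicative version, but fall back to the additive form when the multiplicative recursions are undefined. The following is a hypothetical selection rule, not the paper's exact implementation:

```python
def choose_seasonal_form(y):
    """Pick the seasonal form for an automatic run.  Multiplicative is the
    default (it performed slightly better on average), but its recursions
    divide by the observations and seasonal indices, so it is unusable when
    the series contains zero or negative values; fall back to additive then."""
    if any(v <= 0 for v in y):
        return "additive"
    return "multiplicative"
```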
Figure 2 compares the post-sample accuracy of our method with some of the best-
performing methods in the M-competition, including some of the more time-consuming
ones and the method recently proposed by Hyndman et al. (2002), by plotting the average
MAPE at different forecast horizons for each method.
Table 12 shows the values plotted in Figure 2 and their averages over different
forecasting horizons. As shown there, our method, in both its additive and multiplicative
versions, performs better than the others for shorter forecast horizons, while for longer
forecast horizons only the Parzen method (see Makridakis et al, 1998) works better than
ours.
It should be pointed out that we have not done any preprocessing of the data. We
have not even treated the series assumed to be non-seasonal (p=1) differently:
the algorithm is always applied to the data in the same way for each time
series.
Conclusions
Exponential smoothing methods are widely used forecasting techniques for
time series in stock management. They require the user to specify initial values for the
level, trend and seasonal components as well as three smoothing constants, and we
decided to treat these quantities as decision variables and use mathematical
programming to find optimal values. This enables us to achieve a considerable
reduction in all the measures of forecast error in the cases studied. Besides, using fuzzy
techniques to solve the multi-objective non-linear programming problem that includes
the three measures of error enables a flexible optimal solution to be generated for either
the multiplicative or the additive Holt-Winters forecasting procedure.
The method used to specify initial values for the level, trend and seasonal
components has been very influential in previous works on the Holt-Winters forecasting
procedure. This is not the case in our proposal, because we treat these quantities as
decision variables. However, we still need to specify an initial vector of solutions from
which to start the optimization algorithm. Our empirical study, based on the 111 series
from the M-competition, shows that the method used to designate the initial vector has
very little effect on the goodness of the predictions obtained.
The multiplicative version of our method works better on average than the
additive one; it should be the first option in an automatic implementation of our
procedure. However, the differences in prediction accuracy between the two versions
are very small, so we still expect good results from the additive version when the
multiplicative one is inappropriate. Note that, if a data series contains values equal
to zero, the multiplicative Holt-Winters method cannot be used.
The use of optimization tools in the estimation analysis associated with the
generalized Holt-Winters methods with damped trend is very fruitful. Although
incorporating the initial values as decision variables complicates the related non-
linear problems, the optimization tools make those problems tractable.
The decision to use a fuzzy methodology is due to the imprecise knowledge of the
goals and the need to manage several sources of fitting error in order to find robust
solutions. Here the fuzzy system is modeled using only the information that Stage I
of the procedure can provide. There are many effective methods available for
replacing multi-objective non-linear programs with fuzzy goals by equivalent crisp
optimization problems. In order to find the compromise solution we have selected the
max-min operator, because it gives the solution with the maximum degree of global
satisfaction and favours the complete automation of the forecasting procedure.
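The max-min operator can be illustrated as follows. For simplicity the sketch maximizes the smallest membership over a finite set of candidate solutions, whereas our procedure solves the equivalent crisp non-linear program; the linear membership shape and all names are illustrative.

```python
def maxmin_compromise(objectives, goal_bounds, candidates):
    """Zimmermann-style max-min compromise.  Each fuzzy goal i has a linear
    membership mu_i(x) = (U_i - f_i(x)) / (U_i - L_i), built from the best
    (L_i) and worst (U_i) criterion values observed in Stage I.  The
    compromise solution maximizes the smallest membership, i.e. the global
    degree of satisfaction."""
    def satisfaction(x):
        return min((U - f(x)) / (U - L)
                   for f, (L, U) in zip(objectives, goal_bounds))
    return max(candidates, key=satisfaction)
```

Because the operator needs only the criterion ranges from Stage I and a single scalar to maximize, it lends itself naturally to a fully automatic forecasting procedure.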
References
Bermúdez, J.D., Segura, J.V. and Vercher, E., 2004. Improving demand forecasting
accuracy using non-linear programming software, Journal of the Operational
Research Society (to appear).
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C., 1994. Time Series Analysis,
Forecasting and Control, 3rd ed. Prentice Hall, Englewood Cliffs, NJ.
Brown, R.G., 1963. Smoothing, forecasting and prediction of discrete time series.
  Prentice Hall, Englewood Cliffs, NJ.
Carlsson, Ch. and Korhonen, P., 1986. A parametric approach to fuzzy linear
programming. Fuzzy Sets and Systems 20, 17-30.
Chanas, S., 1989. Fuzzy programming in multiobjective linear programming: a
  parametric approach. Fuzzy Sets and Systems 29, 303-313.
Chatfield, C., Koehler, A.B., Ord, J.K. and Snyder, R.D., 2001. A new look at models
for exponential smoothing. The Statistician 50, 147-159.
Chatfield, C. and Yar, M., 1988. Holt-Winters forecasting: some practical issues. The
Statistician 37, 129-140.
Chatfield, C. and Yar, M., 1991. Prediction intervals for multiplicative Holt-Winters.
International Journal of Forecasting 7, 31-37.
Delgado, M., Verdegay, J.L. and Vila, M.A., 1990. A possibilistic approach for
multiobjective programming problems. Efficiency of solutions. In: R. Slowinski and
J. Teghem (Eds.), Stochastic versus Fuzzy Approaches to Multiobjective Mathematical
Programming under Uncertainty, Kluwer, Dordrecht, 229-248.
Gardner, Jr, E.S., 1985. Exponential smoothing: the state of the art. Journal of
Forecasting 4, 1-28.
Gardner, Jr, E.S., 1986. AUTOCAST User’s Manual. Core Analytic Inc., Bridgewater,
New Jersey.
Gardner, Jr, E.S., 1999. Note: Rule-Based Forecasting vs. Damped-Trend Exponential
Smoothing. Management Science 45, 1169-1176.
Gardner, Jr, E.S. and McKenzie, E., 1985. Forecasting trends in time series.
  Management Science 31, 1237-1246.
Granger, C.W.J. and Newbold, P., 1986. Forecasting economic time series. 2nd edition.
Academic Press, New York.
Hyndman, R.J., Koehler, A.B., Snyder, R.D. and Grose, S., 2002. A state space
  framework for automatic forecasting using exponential smoothing methods.
  International Journal of Forecasting 18, 439-454.
Jiménez, M., Arenas, M., Bilbao, A. and Rodríguez Uría, M.V., 2004. Solving fuzzy
goal programming problems, Fuzzy Economic Review IX(1) 19-33.
Koehler, A.B., Snyder, R.D. and Ord, J.K., 2001. Forecasting models and prediction
intervals for the multiplicative Holt-Winters method. International Journal of
Forecasting 17, 269-286.
Lai, Y.J. and Hwang, Ch.L., 1996. Fuzzy Multiple Objective Decision Making:
  Methods and Applications, Springer, Berlin.
Larrañeta, J.C. et al., 1988. Métodos modernos de gestión de la producción. Alianza
Universidad Textos, Madrid.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R.,
Newton, J., Parzen, E. and Winkler, R., 1982. The accuracy of extrapolation (time
series) methods: Results of a Forecasting Competition. Journal of Forecasting 1, 111-
153.
Makridakis, S. and Hibon, M., 2000. The M3-Competition: results, conclusions and
implications. International Journal of Forecasting 16, 451-476.
Makridakis, S., Wheelwright, S.C. and Hyndman, R.J., 1998. Forecasting. Methods and
Applications, 3rd edition. Wiley, New York.
Ord, J. K. (ed.), 2001. Commentaries on the M3-Competition. International Journal of
Forecasting 17, 537-584.
Ord, J. K., Koehler, A.B. and Snyder, R.D., 1997. Estimation and Prediction for a Class
of Dynamic Nonlinear Statistical Models. Journal of the American Statistical
Association 92, 1621-1629.
Pegels, C.C., 1969. Exponential forecasting: some new variations. Management Science
  15, 311-315.
Ramik, J., 2000. Fuzzy goals and fuzzy alternatives in goal programming problem.
Fuzzy Sets and Systems 111, 81-86.
Sakawa, M., 1998. Fuzzy nonlinear programming with single and multiple objective
functions, In R. Slowinski (Ed.), Fuzzy Sets in Decision Analysis, Operations
Research and Statistics, Kluwer, Boston.
Segura, J.V. and Vercher, E., 2001. A spreadsheet modeling approach to the Holt-
Winters optimal forecasting. European Journal of Operational Research 131, 147-
160.
Theil, H., 1966. Applied Economic Forecasting. North-Holland, Amsterdam.
Wheelwright, S.C. and Makridakis, S., 1973. An examination of the use of adaptive
  filtering in forecasting. Operational Research Quarterly 24, 55-64.
Winters, P.R., 1960. Forecasting sales by exponentially weighted moving averages.
Management Science 6, 324-342.
Zimmermann, H.J., 1978. Fuzzy programming and linear programming with several
objective functions. Fuzzy Sets and Systems 1, 45-55.
Table 1: Best solutions (χ, ζ) for the airline data
Best solution   α      β      γ      φ      L0        b0     |  I-11   I-10   I-9    I-8    I-7    I-6    I-5    I-4    I-3    I-2    I-1    I0
SMAPE           0.782  0.245  0      0.570  120.294   1.208  |  0.922  0.882  1.006  0.972  0.982  1.101  1.215  1.216  1.060  0.929  0.808  0.908
SRMSE           0.646  0.395  0      0.395  103.076  21.765  |  0.907  0.865  0.998  0.962  0.977  1.119  1.251  1.250  1.060  0.922  0.800  0.889
SMAD            1      0.024  0.864  1      121.272   0.945  |  0.916  0.869  0.989  0.958  0.966  1.118  1.242  1.237  1.062  0.928  0.810  0.904
SNLP            1      0.024  0.864  1      121.375   0.945  |  0.915  0.870  0.994  0.961  0.972  1.116  1.242  1.238  1.060  0.925  0.807  0.901
Table 2: Fitting errors for the airline data
         MAPE   RMSE   MAD    U-Statistic
SMAPE    2.69   9.45   6.60   0.36
SRMSE    2.88   8.61   6.81   0.36
SMAD     2.69   8.85   6.38   0.37
SNLP     2.71   8.69   6.43   0.36
Table 3: Post-sample accuracy for the airline data
MAPE, averaged over forecasting horizons
         1-4    1-6    1-8    1-12
SMAPE    3.81   5.01   6.60   6.87
SRMSE    3.83   4.16   4.82   5.29
SMAD     3.51   3.70   4.08   3.26
SNLP     3.62   3.60   3.91   3.12
Table 4: The best solutions for the champagne data
Best solution   α      β      γ      φ      L0     b0      |  I-11   I-10   I-9    I-8    I-7    I-6    I-5    I-4    I-3    I-2    I-1    I0
SMAPE           0.040  0.835  0      0.835  4.057  -0.236  |  0.746  0.725  0.808  0.818  0.921  0.926  0.749  0.338  0.929  1.186  1.735  2.119
SRMSE           0.014  0.417  0      0.977  3.107   0.022  |  0.747  0.696  0.788  0.804  0.924  0.897  0.723  0.376  0.913  1.197  1.769  2.167
SMAD            0.188  0      0.001  0      3.109   0.032  |  0.745  0.720  0.818  0.811  0.922  0.913  0.749  0.349  0.899  1.151  1.771  2.153
SNLP            0.015  0.442  0      0.966  3.088   0.023  |  0.728  0.711  0.821  0.817  0.922  0.921  0.756  0.347  0.932  1.186  1.725  2.136
Table 5. Fitting errors of the champagne data
         MAPE   RMSE   MAD    U-Statistic
SMAPE    7.31   0.56   0.31   0.24
SRMSE    8.99   0.52   0.35   0.24
SMAD     8.32   0.55   0.34   0.25
SNLP     8.24   0.54   0.33   0.24
Table 6: Post-sample accuracy for the champagne data
MAPE, averaged over forecasting horizons
                        1-4     1-6     1-8     1-12
SMAPE                   23.43   19.83   19.77   17.21
SRMSE                   32.39   22.61   18.04   14.76
SMAD                    20.08   18.34   17.88   15.42
SNLP                    28.34   19.39   16.53   13.60
Chatfield & Yar (1991)  17.11   15.73   17.08   17.25
Table 7: Average MAPE for all 111 series across different forecasting horizons, using multiplicative
seasonality: A comparison of four initial solutions.
           Forecasting horizons                                      |  Average of forecasting horizons
Criteria   1    2    3     4     5     6     8     12    15    18    |  1-4   1-6   1-8   1-12  1-15  1-18
M. et al   9.0  9.1  11.0  11.8  13.9  15.2  16.1  14.2  26.1  28.2  |  10.2  11.7  13.1  13.4  14.8  16.6
L. et al   9.0  9.3  11.0  11.9  13.9  15.1  16.3  14.2  26.4  28.2  |  10.3  11.7  13.1  13.4  14.9  16.7
G. & N.    8.9  9.0  11.0  12.2  14.7  15.9  17.2  13.8  26.3  28.7  |  10.3  11.9  13.5  13.7  15.2  16.9
W          9.1  9.1  11.1  11.9  14.0  15.3  16.3  14.4  26.3  28.3  |  10.3  11.7  13.2  13.5  14.9  16.7
Table 8: Average MAPE for all 111 series across different forecasting horizons, using additive seasonality: A
comparison of four initial solutions.
           Forecasting horizons                                      |  Average of forecasting horizons
Criteria   1    2    3     4     5     6     8     12    15    18    |  1-4   1-6   1-8   1-12  1-15  1-18
C. & Y.    9.8  9.3  11.3  12.1  14.7  15.8  16.6  14.5  25.6  28.5  |  10.6  12.2  13.7  13.9  15.2  16.9
L. et al   9.7  9.1  11.3  12.1  14.8  16.1  16.6  14.4  26.2  29.2  |  10.5  12.2  13.7  13.8  15.3  17.0
G. & N.    9.1  9.2  11.5  12.5  15.0  16.5  17.0  15.0  27.3  30.0  |  10.6  12.3  13.8  14.1  15.7  17.5
W          9.8  9.2  11.4  12.1  14.6  16.0  16.7  14.7  26.5  29.5  |  10.6  12.2  13.8  13.9  15.4  17.1
Table 9: Some sample percentiles of the average MAPE for each one of the 111 series, using
multiplicative seasonality only: A comparison of four criteria to calculate initial values.
           Percentiles
Criteria   50     80     90     95
M. et al   10.3   21.5   32.9   49.0
L. et al   10.5   20.9   32.8   47.9
G. & N.    9.3    21.0   30.7   53.1
W          10.3   21.5   32.9   49.4
Table 10: Average MAPE fit for all 111 series, and for each seasonal subset of series.
Seasonality      All series   Non-seasonal   Quarterly   Monthly
Additive         7.40         8.88           3.88        6.55
Multiplicative   7.38         8.88           3.53        6.57
Figure 1: Average MAPE across different forecast horizons for each subset of series.
Table 11: Average MAPE across some forecast horizons, using all 111 series and for each subset of
series.
                             Non-seasonal                 Quarterly          Monthly
Forecast horizons            1     1-4   1-8   1-18       1    1-4   1-8     1    1-4   1-12  1-18
Additive seasonality         10.4  12.0  16.0  22.5       4.6  10.2  19.6    9.8  9.3   11.7  14.2
Multiplicative seasonality   10.4  12.0  16.0  22.5       5.0  8.7   14.8    8.4  8.7   11.2  13.8
Figure 2: Average MAPE across different forecast horizons for all 111 series: A comparison of our method (in both
its additive and multiplicative versions) with some of the best methods in the M-competition.
Table 12: Average MAPE across different forecast horizons for all 111 series, comparing our method
(including both its additive and multiplicative version) with some of the best methods in the M competition. Forecasting horizons Average of forecasting horizons
Methods 1 2 3 4 5 6 8 12 15 18 1-4 1-6 1-8 1-12 1-15 1-18
Deseasonalised SES 7,8 10,8 13,1 14,5 15,7 17,2 16,5 13,6 29,3 30,1 11,6 13,2 14,1 14,0 15,3 16,8
Box-Jenkins 10,3 10,7 11,4 14,5 16,4 17,1 18,9 16,4 26,2 34,2 11,7 13,4 14,8 15,1 16,3 18,0
Parzen 10,6 10,7 10,7 13,5 14,3 14,7 16,0 13,7 22,5 26,5 11,4 12,4 13,3 13,4 14,3 15,4
Hyndman et al. 8,7 9,2 11,9 13,3 16,0 16,9 19,2 15,2 28,0 31,0 10,8 12,7 14,3 14,5 15,7 17,3
Our additive method 9,8 9,3 11,3 12,1 14,7 15,8 16,6 14,5 25,6 28,5 10,6 12,2 13,7 13,9 15,2 16,9
Our multiplicative method 9,0 9,1 11,0 11,8 13,9 15,2 16,1 14,2 26,1 28,2 10,2 11,7 13,1 13,4 14,8 16,6