


0925-5273/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.ijpe.2007.01.007

Corresponding author. Tel.: +1 915 747 7757; fax: +1 915 747 5126.
E-mail address: [email protected] (A.O. Solis).
Int. J. Production Economics 111 (2008) 409–420

www.elsevier.com/locate/ijpe

Lumpy demand forecasting using neural networks

Rafael S. Gutierreza, Adriano O. Solisb,�, Somnath Mukhopadhyayb

aDepartment of Mechanical and Industrial Engineering, College of Engineering, The University of Texas at El Paso,

El Paso, TX 79968-0521, USAbDepartment of Information and Decision Sciences, College of Business Administration, The University of Texas at El Paso,

El Paso, TX 79968-0544, USA

Received 1 August 2005; accepted 1 January 2007

Available online 21 February 2007

Abstract

The current study applies neural network (NN) modeling in forecasting lumpy demand. It is, to the best of our

knowledge, the first such study. Our study compares the performance of NN forecasts to those using three traditional time-

series methods (single exponential smoothing, Croston’s method, and the Syntetos–Boylan approximation). We find NN

models to generally perform better than the traditional methods, using three different performance measures. We also

independently validate earlier findings that the Syntetos–Boylan approximation performs better than the Croston’s and

single exponential smoothing methods in lumpy demand forecasting.

r 2007 Elsevier B.V. All rights reserved.

Keywords: Forecasting; Lumpy demand; Neural network modeling

1. Introduction

Because future demand plays a very important role in production planning and inventory management, fairly accurate forecasts are needed. The manufacturing sector has been trying to manage the uncertainty of demand for many years, which has brought about the development of many forecasting methods and techniques (Makridakis and Wheelwright, 1987).

Statistical methods, such as exponential smoothing and regression analysis, have been used by decision makers for several decades in forecasting demand. In addition to ‘uncertainty reduction methods’ like forecasting, ‘uncertainty management methods’ such as adding redundant resources have also been devised to cope with demand uncertainty in manufacturing planning and control systems (Bartezzaghi et al., 1999). However, many of these uncertainty reduction or management methods may perform poorly when demand for an item is lumpy or intermittent—that is, characterized by intervals in which there is no demand and, for periods with actual demand occurrences, a large variation in demand levels. Lumpy demand exists in both manufacturing and service environments, including items such as heavy machinery or spare parts (Willemain et al., 2004). For instance, lumpy demand has been observed in the automotive industry (Syntetos and Boylan, 2001, 2005), in durable goods spare parts (Kalchschmidt et al., 2003), in aircraft maintenance service parts



(Ghobbar and Friend, 2003), and in telecommunication systems, large compressors, and textile machines (Bartezzaghi et al., 1999), among others.

Croston (1972) noted that, while single exponential smoothing has been frequently used for forecasting in inventory control systems, demand lumpiness generally leads to stock levels that are inappropriate. He noted a bias associated with placing the most weight on the most recent demand data, leading to demand estimates that tend to be highest just after a demand occurrence and lowest just before one. To address this bias, Croston proposed a new method of forecasting lumpy demand, using both the average size of nonzero demand occurrences and the average interval between such occurrences. Croston’s method is applied in leading application software packages for statistical forecasting (Syntetos and Boylan, 2005).

Johnston and Boylan (1996) revisited Croston’s method, using simulation analysis to establish that the average inter-demand interval must be greater than 1.25 forecast revision periods in order for the benefits of Croston’s method over exponential smoothing to be realized. On the other hand, Syntetos and Boylan (2001) reported an error in Croston’s mathematical derivation of expected demand and proposed a revision to approximately correct a resulting bias built into the estimates of demand. Leven and Segerstedt (2004) reported a different, independently developed modification proposed by Segerstedt (2000).

Syntetos and Boylan (2005) quantified the bias associated with Croston’s method and introduced a new modification involving a factor of (1 − α/2) applied to Croston’s original estimator of mean demand, where α is the smoothing constant in use for updating the inter-demand intervals. This modification of Croston’s method, which has come to be known as the Syntetos–Boylan approximation (SBA), yields an approximately unbiased estimator. Syntetos and Boylan applied four forecasting methods—simple moving average over 13 periods, single exponential smoothing, Croston’s method, and SBA—on monthly lumpy demand histories (over a 2-year period) of 3000 stock keeping units in the automotive industry. They undertook extended simulation experiments establishing the superiority of SBA over the three other methods, using either scaled mean error or relative geometric root-mean-square error as the ordering criterion.

Syntetos et al. (2005) established regions of superior performance in comparing exponential smoothing, Croston’s method, and SBA. They used mean square error (MSE) as the performance measure in the comparisons, owing to the mathematical tractability of this measure.

Hill et al. (1996) pointed out that traditional statistical time-series methods can misjudge the functional form relating the independent and dependent variables. These misjudged relationships are inflexible to modification during the model building process. These traditional methods can also fail to make necessary data transformations. The presence of outliers in data can lead to biased estimates of the model parameters (Iman and Conover, 1983). These models also need to be recalibrated on all previous data. Finally, traditional time-series methods may sometimes fail to capture nonlinear patterns in data. Artificial neural network (ANN) or, simply, neural network (NN) modeling is a logical choice to overcome these limitations. NN models can provide good approximations to just about any functional relationship (White, 1992). Successful applications of NNs have been well documented as early as the 1980s (Elman and Zipser, 1987; Sejnowski and Rosenberg, 1987). NN models, capable of modeling linear or nonlinear patterns in data, can be enhanced easily by introducing new attributes into the model. Flexibility and nonlinearity are the two most powerful aspects of NN modeling (Weiland and Leighton, 1988). NN models are also designed to learn the relationship between independent and dependent variables incrementally (they are adaptive in nature) when new information is available. Most NN models are convenient because they do not require any distributional assumptions about the data.

In spite of the above promises, NN models do not always generalize for many applications when used for prediction in extrapolation, unless these models are designed properly to fit the application phenomenon (Roy and Mukhopadhyay, 1997). This is because the gradient search technique may find a local minimum of the least mean squared cost function instead of the global minimum (Lippmann, 1987). Another difficulty with the NN training algorithm is that in many cases the amount of training data required for convergence is large. For many practical applications, NN models must learn the mathematical function relating correct outputs to inputs with the desired degree of accuracy from a limited number of available training data. In order


to be competitive with alternative methods, NN models must achieve at least the same degree of accuracy as the other methods do.

Hill et al. (1994) noted both advocacy for, as well as concern over, the use of NN models in place of statistical forecasting techniques. They reported that earlier empirical studies find NNs comparable with statistical forecasting methods. Furthermore, Hill et al. (1996) compared forecasts produced by NNs against forecasts generated using six time-series methods applied to 1001 actual time series in a well-known ‘M-competition’ (Makridakis et al., 1982). The Hill et al. (1996) study found the mean absolute percentage errors (MAPEs) of the NN forecasts to be significantly better than the MAPEs of the traditional statistical forecasts across monthly and quarterly demand data.

Little work has been done on the application of NN modeling in lumpy demand forecasting. Carmo and Rodrigues (2004) applied NN modeling to 10 ‘‘irregularly spaced’’ time series. Our first research objective, then, is to assess whether the NN-based approach is a superior alternative to traditional approaches for modeling and forecasting lumpy demand. Superior forecasting alone has obvious practical benefits. If one can show that the NN-based technique generally performs better in forecasting lumpy demand, that would be a significant contribution to research. In addition, the contribution to practice comes from increased confidence in forecasts, leading to better planning and policies for businesses dealing with the uncertainty associated with lumpy demand. The second objective of this study is to gain some insight from the analysis into when NN-based models may be superior (or inferior) to the traditional time-series models. This research intends to achieve the two objectives through detailed empirical analysis of forecasts from alternative methods on real-world lumpy demand data series. The current study applies four methods: (i) single exponential smoothing, (ii) Croston’s method, (iii) SBA, and (iv) NN modeling. Regattieri et al. (2005) analyzed 20 methods to deal with lumpy demand; the three traditional time-series methods we used for comparison ranked among the top five in their simulation results. We did not use seasonal regression models because we found no evidence of seasonality in our data series. We apply these methods to actual demand data from an electronics distributor operating in Monterrey, Mexico, involving 24 stock keeping units, each with 967 daily demand observations exhibiting a wide range of demand values and intervals between demand occurrences.

The rest of the paper is organized as follows. In Section 2, we provide a brief discussion of the forecasting methods, the lumpy demand data, and the error statistics used in the forecast comparisons. Section 3 presents and analyzes the results of our study. Section 4 provides conclusions as well as recommendations for future study. Appendix A provides a brief discussion of NN methods.

2. Forecasting methods and data

2.1. Single exponential smoothing and the Syntetos–Boylan approximation

Ghobbar and Friend (2003) referred to the single exponential smoothing method as the ‘‘standard’’ method, in practice, for forecasting intermittent demand. Evaluation of single exponential smoothing as a forecasting technique is fairly common in the literature on lumpy demand forecasting (among many others, Ghobbar and Friend, 2003; Leven and Segerstedt, 2004; Willemain et al., 2004; Syntetos and Boylan, 2005; Regattieri et al., 2005).

Croston (1972) observed that when demand is intermittent, single exponential smoothing generally leads to inappropriate stock levels in inventory control systems. He noted a bias associated with placing the most weight on the most recent demand data. To address this bias, Croston proposed a new method of forecasting intermittent demand, involving both the average size of nonzero demand occurrences and the average interval between consecutive occurrences.

An error in Croston’s mathematical derivation of expected demand size was reported by Syntetos and Boylan (2001), who proposed a revision to approximately correct Croston’s demand estimates. Subsequently, Syntetos and Boylan (2005) and Syntetos et al. (2005) introduced a correction factor of (1 − α/2), where α is the smoothing constant in use for updating the inter-demand intervals, to arrive at a modified demand forecast. We refer to this modification as the SBA.
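To make the mechanics concrete, the three traditional estimators can be sketched as follows. This is a minimal illustration, not the authors’ code; in particular, initializing the smoothed forecast with the first observation (for single exponential smoothing) and initializing Croston’s size and interval estimates at the first demand occurrence are our assumptions, since the paper does not specify initialization.

```python
import numpy as np

def ses(demand, alpha=0.05):
    """Single exponential smoothing; the forecast is updated every
    period. Initializing with the first observation is an assumption."""
    f = float(demand[0])
    forecasts = []
    for d in demand:
        forecasts.append(f)                 # forecast made before seeing d
        f = alpha * d + (1.0 - alpha) * f
    return np.array(forecasts)

def croston(demand, alpha=0.05, sba=False):
    """Croston's method: separately smooth nonzero demand sizes (z) and
    inter-demand intervals (p); forecast z/p. With sba=True, the
    Syntetos-Boylan (1 - alpha/2) correction is applied."""
    z = p = None                            # size and interval estimates
    q = 1                                   # periods since last transaction
    factor = (1.0 - alpha / 2.0) if sba else 1.0
    forecasts = []
    for d in demand:
        forecasts.append(np.nan if z is None else factor * z / p)
        if d > 0:
            if z is None:                   # initialize at first occurrence
                z, p = float(d), float(q)
            else:
                z = alpha * d + (1.0 - alpha) * z
                p = alpha * q + (1.0 - alpha) * p
            q = 1
        else:
            q += 1
    return np.array(forecasts)
```

Between transactions the Croston and SBA forecasts stay flat, updating only when a demand occurrence arrives, which is what removes the bias Croston identified in exponential smoothing.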

2.2. Exponential smoothing constant

The use of low exponential smoothing constant values in the range 0.05–0.20 has been considered realistic and recommended in the literature on lumpy demand (Croston, 1972; Johnston and Boylan, 1996). In our study, we used the same four smoothing constant values of 0.05, 0.10, 0.15, and 0.20 used by Syntetos and Boylan (2005) in the single exponential smoothing, Croston’s, and SBA methods.

Fig. 1. Lumpy demand forecasting neural network diagram. [Two input units (demand at time t − 1; number of no-transaction periods), three hidden units (HU 1–3), and one output unit (the predicted value of the demand transaction at time t); bias unit connections represent a constant term for each hidden unit and the output unit.]

2.3. Neural network models

In Section 1 we briefly discussed the strengths of NN models in approximating a functional relationship between dependent and independent variables. We note that NN models are an ideal choice for dealing with disturbances in a diffusion process due to external factors. There are many methods in the NN literature that can be used for flexible nonlinear modeling. We adopted the most widely used method, a multi-layered perceptron (MLP) trained by a back-propagation (BP) algorithm (Rumelhart et al., 1988). Appendix A summarizes the MLP network and the BP training algorithm. We used three layers in the MLP: one input layer for the input variables, one hidden layer, and one output layer. The MLP had three nodes in the hidden layer (n = 3 in Eq. (A.1)). One output unit was used in the output layer. All the input nodes were fully connected to all the hidden nodes, and the hidden nodes were in turn connected to the output node. The input nodes represent two variables: (1) the demand at the end of the immediately preceding period and (2) the number of periods separating the last two nonzero demand transactions as of the end of the immediately preceding period. The output node represents the predicted value of the demand transaction for the current period.

The MLP network learning process can be sped up by increasing the learning rate. However, the learning process can jump back and forth in the solution space if the learning rate is too high, leading to a phenomenon called oscillation. One way to increase the learning rate without causing oscillation is to include a momentum factor in the weight change formula. The momentum factor is a constant that determines the effect of past weight changes on the current direction of movement in the weight space, effectively filtering out high-frequency variations of the error surface in that space. We used a learning rate of 0.1 and a momentum factor of 0.9, in line with past research (Rumelhart et al., 1988).

We followed the guidelines proposed by a recent study on architecture selection for MLPs (Xiang et al., 2005). The study suggests that one should first try a three-layered MLP, and should start with the minimum number of hidden units required to approximate the target function, for a minimal architecture. Functions learned by a minimal net over calibration sample points work well on new samples. We therefore used three network layers (one input layer for the input variables, one hidden layer, and one output layer of one unit) and chose three hidden units (n = 3), a reasonably low number for approximating a complex function. The NN architecture is shown in Fig. 1.
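A bare-bones version of this network and training rule can be sketched as below. This is a sketch under assumptions: the paper does not state the hidden-unit activation or the output-unit form, so logistic hidden units with a linear output unit are assumed here, and the class and method names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """2-input, 3-hidden-unit, 1-output MLP trained by incremental
    back-propagation with momentum (learning rate 0.1, momentum 0.9)."""

    def __init__(self, n_in=2, n_hidden=3, lr=0.1, momentum=0.9):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in + 1))  # +1 for bias
        self.W2 = rng.normal(0.0, 0.1, (1, n_hidden + 1))
        self.dW1 = np.zeros_like(self.W1)
        self.dW2 = np.zeros_like(self.W2)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        self.x = np.append(x, 1.0)                          # inputs + bias unit
        self.h = np.append(sigmoid(self.W1 @ self.x), 1.0)  # hidden + bias unit
        return float(self.W2 @ self.h)                      # linear output unit

    def train_step(self, x, target):
        y = self.forward(x)
        err = y - target
        g2 = err * self.h                                   # output-weight gradient
        hidden = self.h[:-1]
        g_h = err * self.W2[0, :-1] * hidden * (1.0 - hidden)
        g1 = np.outer(g_h, self.x)                          # hidden-weight gradient
        # Momentum: past weight changes shape the current update,
        # damping the oscillation discussed above.
        self.dW2 = self.momentum * self.dW2 - self.lr * g2
        self.dW1 = self.momentum * self.dW1 - self.lr * g1
        self.W2 += self.dW2
        self.W1 += self.dW1
        return 0.5 * err ** 2
```

In use, each period’s two inputs (previous demand, periods since the last two nonzero transactions) would be fed to `train_step` with the current demand as target, so the network adapts incrementally as new observations arrive.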

2.4. Demand lumpiness and data set

Each of the 24 time series in our set of industrial data consists of 967 daily demand observations for stock keeping units carried by an electronics distributor operating in Monterrey, Mexico. We apply the criteria for classifying demand patterns into four categories (Syntetos, 2001; Syntetos et al., 2005) as outlined by Ghobbar and Friend (2002, 2003). In particular, lumpy demand is ‘‘defined as a demand with great differences between each period’s requirements and with a great number of periods with zero requests’’ (Ghobbar and Friend, 2003). The demand pattern is classified into the lumpy demand category when the average inter-demand interval (ADI) is greater than 1.32 and the squared coefficient of variation (CV2) is greater than 0.49.

Table 1
Mean demand, standard deviation, CV2 and ADI

Series    1        2        3        4        5        6        7        8
Mean      251.02   262.08   271.60   274.43   278.01   324.84   237.09   274.31
S.D.      1078.80  985.19   1305.36  1221.31  1191.04  1387.20  743.88   1134.55
CV2       18.47    14.13    23.10    19.81    18.35    18.24    9.84     17.11
ADI       4.51     4.25     4.78     3.97     3.77     3.73     5.21     4.73

Series    9        10       11       12       13       14       15       16
Mean      253.77   346.04   303.11   321.61   299.15   296.07   288.78   305.81
S.D.      959.19   1710.19  1229.80  1149.70  1425.87  1321.28  1090.65  1257.98
CV2       14.29    24.43    16.46    12.78    22.72    19.92    14.26    16.92
ADI       4.03     4.83     5.14     4.83     5.44     4.68     4.39     4.41

Series    17       18       19       20       21       22       23       24
Mean      228.74   352.32   322.98   355.48   328.70   394.84   314.33   410.00
S.D.      889.07   1480.69  1054.75  1609.05  1390.67  2675.95  1438.57  1929.56
CV2       15.11    17.66    10.66    20.49    17.90    45.93    20.95    22.15
ADI       4.30     4.09     3.90     4.86     4.09     4.37     3.38     3.39

Table 1 shows that each of the 24 demand series under consideration satisfies these criteria for demand lumpiness.
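The two classification statistics can be computed as below. The exact estimators behind Table 1 are not spelled out in the paper, so this is one straightforward reading: ADI as the mean gap between successive nonzero-demand periods, and CV2 over the nonzero demand sizes using the population standard deviation.

```python
import numpy as np

def lumpiness_stats(demand):
    """Return (ADI, CV^2): ADI is the mean gap between successive
    nonzero-demand periods; CV^2 is the squared coefficient of
    variation of the nonzero demand sizes (population std assumed)."""
    demand = np.asarray(demand, dtype=float)
    nz = np.flatnonzero(demand)
    adi = float(np.mean(np.diff(nz))) if len(nz) > 1 else float("inf")
    sizes = demand[nz]
    cv2 = float((sizes.std() / sizes.mean()) ** 2)
    return adi, cv2

def is_lumpy(demand):
    """Lumpy when ADI > 1.32 and CV^2 > 0.49 (Syntetos, 2001;
    Ghobbar and Friend, 2003)."""
    adi, cv2 = lumpiness_stats(demand)
    return adi > 1.32 and cv2 > 0.49
```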

We used the first 624 observations of each series to calibrate all the models; we shall henceforth refer to these first 624 observations as the training sample. We then tested, at each of the four values of α (0.05, 0.10, 0.15, and 0.20), the four forecasting models under consideration on the final 343 observations, which we shall call the test sample. The test samples of some series started with no demand; we began analyzing each test sample only from its first nonzero demand period. As a result, there are slightly fewer than 343 test observations in a few series.
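The split and the trimming of leading zero-demand periods from the test sample can be expressed as follows (the function name is ours):

```python
def split_and_trim(series, n_train=624):
    """Split a demand series into training and test samples, then start
    the test sample at its first nonzero-demand period, as the study
    does for series whose test samples begin with zero demand."""
    train, test = series[:n_train], series[n_train:]
    first = next((i for i, d in enumerate(test) if d != 0), len(test))
    return train, test[first:]
```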

2.5. Selection of error statistics

We initially ranked the methods using the overall MAPE as the forecasting performance criterion based on the test results (as in Hill et al., 1996). Given that lumpy demand involves periods with zero demand, the traditional definition of MAPE, which involves terms of the form |E_t|/D_t (where D_t and E_t, respectively, represent actual demand and forecast error in period t), fails. We instead applied the following alternative specification of MAPE (Gilliland, 2002):

MAPE = Σ_{t=1}^{n} |E_t| / Σ_{t=1}^{n} D_t.  (1)
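A minimal sketch of Eq. (1), scaled to a percentage as in the paper’s tables:

```python
def lumpy_mape(actual, forecast):
    """Eq. (1): total absolute error divided by total demand, expressed
    as a percentage. Unlike the per-period MAPE definition, this never
    divides by a zero demand."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_error / sum(actual)
```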

We, however, considered alternative error statistics beyond MAPE to make the comparisons more robust. A previous study (Armstrong and Collopy, 1992) evaluated measures for making comparisons of errors across 90 annual and 101 quarterly time-series data sets. The study concluded that MAPE should not be the choice if large errors are expected, because MAPE is biased in favor of low forecasts. The study also concluded that root-mean-square error (RMSE) is not reliable, even though most practitioners prefer RMSE to all other error measures since it describes the magnitude of the errors in terms useful to decision makers (Carbone and Armstrong, 1982). The study recommended the median absolute percent error (MdAPE) statistic for selecting the most accurate methods when many time-series data are available. However, computing MdAPE for intermittent demand is difficult because of zero demand over many time periods. We therefore decided to use other accuracy measures in addition to MAPE to compare the forecasting techniques.

Syntetos and Boylan (2005) used two accuracy measures that are relative to other methods. The first measure, the relative geometric root-mean-square error (RGRMSE), is given by

RGRMSE = [ Π_{t=1}^{n} (A_{a,t} − F_{a,t})² ]^{1/2n} / [ Π_{t=1}^{n} (A_{b,t} − F_{b,t})² ]^{1/2n},  (2)

where A_{k,t} and F_{k,t} denote actual demand and forecast demand, respectively, under forecasting method k at the end of time period t.
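Eq. (2) can be sketched as below; computing in log space avoids overflow in the long products, using the identity [Π e_t²]^{1/2n} = exp(mean(log |e_t|)).

```python
import numpy as np

def rgrmse(actual, f_a, f_b):
    """Eq. (2): geometric root-mean-square error of method a relative
    to method b. Values below 1 favor method a. A zero error in any
    period drives the ratio to 0 or infinity, a known caveat of
    geometric measures."""
    ea = np.abs(np.asarray(actual, dtype=float) - np.asarray(f_a, dtype=float))
    eb = np.abs(np.asarray(actual, dtype=float) - np.asarray(f_b, dtype=float))
    # [prod e^2]^(1/2n) = exp(mean(log |e|)); take the ratio in log space
    return float(np.exp(np.mean(np.log(ea)) - np.mean(np.log(eb))))
```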


Table 2
Average nonzero demand transaction sizes in training and test samples

Series   All (n = 967)   Training (n = 624)   Test (n ≤ 343)
1        826             732                  937
2        799             497                  1132
3        831             699                  979
4        804             635                  956
5        779             480                  1118
6        897             530                  1225
7        732             566                  888
8        824             548                  1064
9        737             469                  970
10       1023            662                  1311
11       867             709                  993
12       915             617                  1147
13       890             786                  979
14       868             626                  1064
15       821             535                  1082
16       875             542                  1160
17       676             387                  913
18       971             877                  1052
19       849             662                  1018
20       1023            826                  1209
21       919             905                  934
22       1197            1423                 936
23       881             963                  785
24       1257            1467                 991


Fildes (1992) argued that RGRMSE has a desirable statistical property. According to him, the error in a particular time period consists of two parts: one due to the method and the other due to the time period only. Expressed in a relative way, RGRMSE is independent of the error due to the time period, thereby focusing only on the relative merits of the methods. We therefore also used RGRMSE as an error measure in this research.

The second error measure, the percentage best (PB), is the percentage of time periods in which one method performs better than the other methods under consideration. PB is particularly meaningful because all series and all data periods in each series generate results (Syntetos and Boylan, 2005). We used absolute error as the criterion to assess the alternative methods’ performance. The mathematical expression for PB for method m is

PB_m = ( Σ_{t=1}^{n} B_{m,t} / n ) × 100,  (3)

where, for time period t, B_{m,t} = 1 if |A_{m,t} − F_{m,t}| is the minimum of |A_{k,t} − F_{k,t}| over all methods k under consideration, and B_{m,t} = 0 otherwise.
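Eq. (3) can be sketched as below. Ties are counted for every tied method here, which is an assumption; the paper does not specify tie handling.

```python
import numpy as np

def percentage_best(actual, forecasts):
    """Eq. (3): for each method, the percentage of periods in which it
    attains the minimum absolute error among all methods compared.
    `forecasts` maps a method name to its forecast sequence."""
    actual = np.asarray(actual, dtype=float)
    errors = {m: np.abs(actual - np.asarray(f, dtype=float))
              for m, f in forecasts.items()}
    best = np.min(np.vstack(list(errors.values())), axis=0)  # per-period minimum
    n = len(actual)
    return {m: 100.0 * float(np.sum(e == best)) / n for m, e in errors.items()}
```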

We report on three error measures, MAPE,RGRMSE and PB, to compare the alternativemethods under consideration.

The statistical results reported in this paper wereobtained using SAS/STAT software version 9.1(SAS Institute, Inc., 1999). The results reported inthe next section are based on testing the modelsexactly once on the test sample. This single testingprocess prevents bringing in information from thetest data to improve NN model performance duringcalibration.

3. Results and analysis

3.1. Average nonzero demand size

Table 2 depicts one notable characteristic of the 24 time-series data sets: the average of the nonzero demand sizes. The average nonzero demand size is smaller in the training sample than in the test sample for each of the first 21 time series. This characteristic, however, behaves differently in series 22, 23, and 24, where the average nonzero demand size is greater in the training samples than in the test samples. This may be an important observation, because NN models typically look at and adjust to the demand data in the training sample. The current study therefore also observes model performance in relation to the difference in average nonzero demand transactions between the calibration and test samples.

3.2. Relative performance of forecasting models

Two factors were considered in comparing the performance of the four methods on test data: forecast error and forecast variance. We did not tweak the parameter values of the NN models, which used only one type of network with the same step size and momentum values. Therefore, the NN model for each time series is the same across all four values of α applied to the three traditional time-series methods.

The SBA method clearly outperformed single exponential smoothing and Croston’s method (based on overall MAPEs over the entire test samples) across all four α values used. This constitutes an independent validation of a similar finding by Syntetos and Boylan (2005) on the superiority of SBA over the other two traditional time-series methods, though in our case involving time series with far more than the 24 monthly demand observations in the time-series data they tested.

MAPEs for Croston’s method were smaller than those for single exponential smoothing in all 24 time series at α = 0.05, and in all but one of them (series 24) at α = 0.10. Single exponential smoothing actually outperformed Croston’s method, even without the correction suggested by Syntetos and Boylan (2005), at higher levels of α—i.e., in eight of the 24 time series at α = 0.15 and in more than half (14 of 24) of them at α = 0.20.

The NN models generally outperformed the three traditional time-series methods, except for demand series 24 when α = 0.10, 0.15, or 0.20, and with the exception of series 22, 23, and 24 when α = 0.05. Figs. 2 and 3 show the plots of overall MAPEs for each of the four forecasting methods across the 24 time series when α = 0.05 and α = 0.20, respectively. The plots suggest the general superiority of NN model performance, except for the last three time series (22–24). In particular, NN model performance is slightly inferior to all three traditional time-series methods in series 24 at all four values of α, and more so as α decreases. For series 22 and 23, the NN model MAPE is inferior to that of SBA when α = 0.05. These observations appear consistent with the earlier discussion in Section 3.1 (based on Table 2) concerning the large drops in average nonzero demand sizes from training sample to test sample for series 22–24: model performance may vary depending on the training and test sample distributions.

Fig. 2. Comparison of MAPEs when α = 0.05. [Line plot of overall MAPE (%), roughly 90–160%, for Croston, SBA, neural network, and exponential smoothing across the 24 series.]

Fig. 3. Comparison of MAPEs when α = 0.20. [Line plot of overall MAPE (%), roughly 90–160%, for the same four methods across the 24 series.]

Given the superiority of SBA over Croston’smethod, we dropped the latter method from

2 13 14 15 16 17 18 19 20 21 22 23 24

eural Network Exp Smoothing

hen α = 0.05

eries

PEs when a ¼ 0.05.

2 13 14 15 16 17 18 19 20 21 22 23 24

eural Network Exp Smoothing

hen α α = 0.20

eries

PEs when a ¼ 0.20.

Page 8: Lumpy demand forecasting using neural networks

ARTICLE IN PRESS

Table 4

Results of tests of paired differences between absolute errors for

NN vs. SBA (a ¼ 0.05)

Series t-test of paired differences between absolute errors:

NN vs. SBA

d.f. t value Pr (t4| t value |)

1 341 3.34 0.0009

2 341 �46.64 o.0001

3 342 �5.89 o.0001

4 341 7.61 o.0001

5 340 �20.66 o.0001

6 337 �29.91 o.0001

7 338 �90.88 o.0001

8 342 �47.47 o.0001

9 342 �8.24 o.0001

10 340 �37.06 o.0001

11 342 �17.75 o.0001

12 339 �15.93 o.0001

13 340 �21.00 o.0001

14 342 �61.27 o.0001

15 339 �30.84 o.0001

16 335 �17.48 o.0001

17 342 �60.36 o.0001

18 342 �17.99 o.0001

19 339 �2.07 0.039

R.S. Gutierrez et al. / Int. J. Production Economics 111 (2008) 409–420416

subsequent analysis. We also consider a ¼ 0.05 forsubsequent analysis since it is the best among all thefour values for both single exponential smoothingand SBA. Table 3 reports overall MAPEs for themethods under consideration when a ¼ 0.05. NNmodel performance is superior among all methodsoverall (a simple average of 24 MAPEs: 111.42%for NN vs. 126.98% for SBA and 132.75% forsingle exponential smoothing), except for series22–24 where SBA yields better MAPEs. SBA clearlyshows the best performance among the threetraditional time-series forecasting methods. Onemust note that all the MAPEs are higher than 100since we considered all periods including the periodswith no transactions.

Table 4 shows results of t-tests of the paireddifferences between absolute errors for NN vs. SBAmethods. The difference in performance is signifi-cant for all the 24 time series, with paired differencesbeing in favor of NN model performance (indicatedby negative t-values) in 20 of the 24 time series. Thedegrees of freedom (d.f.) varied slightly across the24 series because we lost a few data points due to

Table 3
Overall MAPEs of different forecasting methods with α = 0.05

Series   Exponential smoothing   SBA      NN
1        143.78                  138.36   128.20
2        144.57                  136.85   107.70
3        143.32                  134.13   112.62
4        132.14                  125.43   110.58
5        133.59                  124.90   106.01
6        134.02                  127.66   103.71
7        132.95                  127.16   108.85
8        132.75                  126.82   104.23
9        133.45                  128.29   104.53
10       135.01                  129.51   102.42
11       126.77                  121.84   105.09
12       126.10                  121.06   104.06
13       127.88                  121.71   111.70
14       122.64                  114.79   103.89
15       130.70                  123.20   103.94
16       126.68                  120.66   102.27
17       122.36                  116.53   102.20
18       129.76                  124.74   107.76
19       125.86                  121.44   106.63
20       136.07                  130.25   110.22
21       131.50                  127.25   113.73
22       135.05                  130.11   125.62
23       135.77                  133.53   134.60
24       143.38                  141.37   153.49
Overall  132.75                  126.98   111.42

Series   d.f.   t-value   p-value
20       340    −43.24    <0.0001
21       342     −2.71     0.0071
22       340     18.66    <0.0001
23       342     −2.08     0.0384
24       337     34.40    <0.0001

zero demand intervals at the beginning of the test samples.
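The paired comparisons reported in Table 4 amount to a standard paired t-test on absolute forecast errors. A sketch of the statistic, assuming two aligned arrays of forecast errors (the function name is ours); the two-sided p-value then follows from Student's t distribution with n − 1 degrees of freedom:

```python
import numpy as np

def paired_abs_error_t(errors_a, errors_b):
    """Paired t statistic on the differences |e_a| - |e_b|.
    A negative t favors method a (smaller absolute errors).
    Returns the t statistic and the degrees of freedom n - 1."""
    d = np.abs(np.asarray(errors_a, float)) - np.abs(np.asarray(errors_b, float))
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference over its standard error
    return t, n - 1
```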

Table 5 reports on model performance based on PB statistics. NN models had the highest PB values for series 1 through 22. In series 23, the comparison between exponential smoothing and NN is close, with exponential smoothing slightly superior (42.86 vs. 38.19). For series 24, exponential smoothing was best most of the time. These PB statistics further establish the overall superiority of NN models (averaging 62.37% vs. 23.14% for single exponential smoothing and 14.52% for SBA).
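A percentage-best (PB) count of the kind reported in Table 5 simply tallies, per period, which method's forecast lands closest to the actual. A minimal sketch; tie-breaking here credits the first method listed, which may differ from the authors' handling:

```python
import numpy as np

def percentage_best(actuals, forecasts_by_method):
    """PB per method: the percentage of periods in which that method has
    the smallest absolute error. `forecasts_by_method` maps a method name
    to a forecast sequence aligned with `actuals`."""
    names = list(forecasts_by_method)
    errs = np.array([np.abs(np.asarray(f, float) - np.asarray(actuals, float))
                     for f in forecasts_by_method.values()])
    winners = errs.argmin(axis=0)        # index of the best method each period
    n = len(actuals)
    return {name: 100.0 * np.sum(winners == i) / n
            for i, name in enumerate(names)}
```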

Table 6 reports on performance of NN with respect to the single exponential smoothing and SBA methods, respectively, based on Eq. (2), with NN as method a and exponential smoothing or SBA as method b. If the value for the NN method is less than 1, then it performed better than exponential smoothing or SBA, as the case may be, under the RGRMSE criterion. We see that, in terms of RGRMSE, NN models performed better than both exponential smoothing and SBA in series 1 through


Table 5
Percentage best statistics of the forecasting methods (with α = 0.05)

Series   Exponential smoothing   SBA     NN
1        25.66                   18.08   56.27
2        15.74                    8.45   75.80
3        20.12                   12.24   67.64
4        17.78                   16.03   66.18
5        19.24                   17.78   62.97
6        13.99                   14.58   71.43
7        14.29                   14.58   71.14
8        18.08                   14.58   67.35
9        19.53                   12.24   68.22
10       17.78                   14.29   67.93
11       27.41                   15.16   57.43
12       19.83                   11.37   68.80
13       34.99                   12.83   52.19
14       18.37                   19.83   61.81
15       14.58                   13.70   71.72
16       22.16                   11.37   66.47
17       18.66                   15.45   65.89
18       20.99                   15.45   63.56
19       20.70                   17.78   61.52
20       21.28                   12.83   65.89
21       31.49                   11.37   57.14
22       25.66                   11.66   62.68
23       42.86                   18.95   38.19
24       54.23                   17.78   28.57
Overall  23.14                   14.52   62.37

Table 6
RGRMSE for NN method with respect to exponential smoothing and with respect to SBA (with α = 0.05)

Series   NN with respect to exponential smoothing   NN with respect to SBA
1        0.76                                       0.77
2        0.41                                       0.45
3        0.54                                       0.59
4        0.44                                       0.47
5        0.47                                       0.53
6        0.42                                       0.45
7        0.59                                       0.63
8        0.44                                       0.46
9        0.42                                       0.42
10       0.38                                       0.40
11       0.58                                       0.60
12       0.45                                       0.45
13       0.67                                       0.68
14       0.60                                       0.65
15       0.47                                       0.50
16       0.55                                       0.55
17       0.41                                       0.44
18       0.57                                       0.59
19       0.58                                       0.59
20       0.55                                       0.54
21       0.72                                       0.73
22       0.72                                       0.72
23       1.07                                       1.02
24       1.41                                       1.23
Overall  0.59                                       0.60


22. For series 23, both exponential smoothing and SBA slightly outperformed the NN method, with RGRMSE = 1.07 and 1.02, respectively. For series 24, NN performed worse than both exponential smoothing and SBA, with RGRMSE = 1.41 and 1.23, respectively. Overall, NN modeling is superior to both exponential smoothing and SBA, with RGRMSE averaging 0.59 and 0.60, respectively.
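Eq. (2) is given earlier in the paper; RGRMSE is conventionally the ratio of the two methods' geometric root mean square errors (cf. Fildes, 1992). A sketch under that reading; the small floor on squared errors is our addition, since the geometric mean degenerates when any error is exactly zero:

```python
import numpy as np

def rgrmse(errors_a, errors_b, eps=1e-12):
    """Relative geometric root mean square error of method a vs. method b:
    the ratio of the geometric means of the squared errors, square-rooted.
    Values below 1 favor method a. Squared errors are floored at `eps`
    because log(0) is undefined."""
    sq_a = np.maximum(np.asarray(errors_a, float) ** 2, eps)
    sq_b = np.maximum(np.asarray(errors_b, float) ** 2, eps)
    grmse_a = np.exp(np.mean(np.log(sq_a))) ** 0.5   # geometric RMSE of a
    grmse_b = np.exp(np.mean(np.log(sq_b))) ** 0.5   # geometric RMSE of b
    return grmse_a / grmse_b
```

If method a's errors are uniformly half the size of method b's, the ratio is 0.5 regardless of scale, which is what makes the geometric form robust to a few large errors.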

In summary, we found NN models to be superior to both single exponential smoothing and SBA overall, based on all three error measures (MAPE, PB, and RGRMSE).

3.3. Forecast error variances

Another important aspect of a forecasting model is the variance of its forecast errors. Although emphasis is usually placed on forecasting accuracy, a reduction in the likelihood of extreme errors is useful to forecast managers because they can trust the forecasts more (Hill et al., 1996). We conducted two tests for forecast error variances at α = 0.05 for

the two best methods (NN and SBA): equality of variances and the Cochran and Cox (1950) test at p < 0.05 (SAS/STAT, 1999). The superior performance of the NN model was found statistically significant for most of the 24 time series. However, the superior performance of the SBA method was statistically significant for series 22 and 24.
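The equality-of-variances comparison can be sketched as a simple variance-ratio (F) statistic; this is an illustration of the generic test, not the exact SAS procedure the authors used (the Cochran–Cox approximation they cite is a related adjustment for comparing means under unequal variances, available in SAS PROC TTEST):

```python
import numpy as np

def variance_ratio(errors_a, errors_b):
    """F statistic for the equality-of-variances test on two sets of
    forecast errors: ratio of the larger sample variance to the smaller.
    Significance follows from the F distribution with (n1 - 1, n2 - 1)
    degrees of freedom."""
    va = np.var(np.asarray(errors_a, float), ddof=1)
    vb = np.var(np.asarray(errors_b, float), ddof=1)
    return max(va, vb) / min(va, vb)
```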

3.4. Analysis

Using four different values of α, we looked into the best performance of the three traditional time-series methods in the test samples for generalization power, and compared them against NN models whose characteristics were held identical. Our intention in this research was to establish a higher standard for the NN models by not tweaking the parameter values across the series. By keeping the network topology and the step size values the same across all 24 time series, we established that NN models generalize. In contrast, the traditional time-series forecasting methods benefited from trying out four different values of α. The average forecasts for the SBA


method became lower as α values were reduced from 0.20 to 0.05. In particular, for series 22, 23, and 24, respectively, the average forecast demand per period was 524, 625, and 623 at α = 0.20 vs. 342, 349, and 369 at α = 0.05. This improved the MAPE performance of the traditional methods in series 22–24 relative to NN models, because the average transaction sizes happened to decline in these series in the test samples. We also emphasize that NN model performance may improve by adjusting the parameter values, just like the traditional methods.

Forecast error variances were also shown, through statistical tests, to be better for NN models, indicating that the superior performance achieved by these NN models was not simply by chance. Since the number of sample points is sufficiently large, the normality assumption underlying these tests was not critically important.

One other critical issue pertaining to NN models is overfitting. Weights and biases in NN models are fit parameters. Arbitrarily increasing the size of an NN model would lead to overfitting. Overfitting causes poor generalization because NN models essentially form many meaningless local (nonlinear) decision boundaries around each sample point in the training data. These decision boundaries may not be effective in a new sample with a different distribution of observations. To avoid overfitting, we started with a very simple network (n = 3). As a result, there were few fit parameters and the network did not require any pruning to improve generalization.

A major consideration for many practitioners is the implementation of a forecasting system. By not fitting an NN model individually for each data series, implementation and automation may be facilitated.

4. Conclusion and future research directions

We have accomplished the two objectives, as specified earlier, through detailed empirical analysis of forecasts from alternative methods as applied to 24 real-world lumpy demand data series. Based on our results, we conclude that NN models are generally superior to the traditional time-series models in forecasting lumpy demand.

Little work has been done on the application of NN modeling in lumpy demand forecasting. We are aware of only one earlier study (Carmo and Rodrigues, 2004) applying NN modeling, on 10 ''irregularly spaced'' time series. Our study compares the performance of NN forecasts to those from three traditional time-series methods which have been considered fairly extensively in the literature. We were also able, using much larger samples, to independently validate earlier findings (Syntetos and Boylan, 2005) of the superiority of the SBA over both Croston's method and single exponential smoothing in lumpy demand forecasting. More importantly, we have found that the NN models, even under a relatively simple network topology, generally perform better than the three traditional methods, using three different forecast performance measures. However, in cases where a significant decrease in the average of the nonzero demand sizes arises between the training sample and the test sample, we observed that forecasts from traditional time-series methods tend to improve in performance relative to NN forecasts with lower values of the exponential smoothing constant in these traditional methods.

We have thus far used a fairly simple NN topology. In future work, we shall endeavor to increase the number of parameters in the NN models, while taking every precaution to ensure that model overfitting does not occur.

Perhaps more significantly, considering the observed overall superiority of NN models, we intend in future research to look more closely into factors that lead to a diminution in performance of these models relative to traditional time-series forecasting techniques. The objective will be to identify conditions under which either NN or traditional models would be expected to perform better in forecasting lumpy demand.

Another interesting research issue is the possibility of combining traditional models with NN models to build hybrid models. Combined forecasts are useful when two or more different methods forecast close to the actual in different directions. Rules to combine NN models with traditional methods can be generated for a typical lumpy demand series in future studies.

Furthermore, NN models may be extended to causal forecasting models, as opposed to their traditional time-series counterparts. Knowledge experts can identify relevant attributes that may have causal relationships with lumpiness in the demand. Even though causal models may be difficult to calibrate, these models can be built in the future with the combined power of expert knowledge and the flexibility of NN models.


Finally, we point out that improved forecasting accuracy does not necessarily translate into better stock control performance. Syntetos and Boylan (2006), for instance, have empirically assessed stock control performance associated with the use of four traditional time-series forecasting methods (exponential smoothing, simple moving average of length 13, Croston's method, and SBA). Using simulation, they established the overall superiority of SBA in stock control performance with respect to three possible managerial considerations (a specified customer service level and two cost policies). We intend to likewise evaluate and compare stock control performance associated with NN model forecasts in relation to the traditional time-series methods.

Appendix A. Neural network model

Biologically motivated artificial neural networks (ANN or simply NN) were developed to represent a learning system to solve problems in all areas of human interest. Human brains are composed of a number of interconnected simple processing elements called neurons or nodes. Rosenblatt (1962) first formally introduced a convergence procedure

for the perceptron, a brain-like single-layer network comprising neurons receiving the input information to learn a pattern or relationship between input and output. However, an additional layer of units (called a hidden layer) is required to mathematically approximate arbitrary relationships between input and output. As a result, the multi-layered perceptron (MLP) became a popular NN tool for computation. The MLP network has a layer of input nodes, one or more layers of hidden nodes, and a layer of output nodes. The nodes of the first hidden layer are connected with the input layer nodes. The nodes of the output layer are connected with the last hidden layer nodes. The values assigned to connections between nodes are called connection strengths or weights. The output of each node in an MLP network, sometimes called an activation value, is a function of the inputs from the connecting nodes of the previous layer and the corresponding weights. This function is called an activation function. The outputs of the input layer nodes have the values of the input variables. The MLP learns the mathematical relationship through the values of the weights. The equivalent nonlinear regression model form of a one-hidden-layer feed-forward (information flows from input to output) neural network is as

follows:

\log y_{t+h} = \beta_{\phi,h} + \sum_{j=1}^{n} \beta_{j,h}\, f(I_t, w_{h,j}),   (A.1)

where I_t is the input vector of the current time period value and lagged values of \log y_{t+h}. The parameter h is the forecast horizon, and w_{h,j} is the network weight vector corresponding to the forecast horizon h and the jth hidden node. This research used the logistic form of the transfer function f:

f(I_t, w_{h,j}) = (1 + e^{-z})^{-1},   (A.2)

where

z = w_{h,j,\phi} + w_{h,j,t}\, t + \sum_{i=1}^{l} w_{h,j,i} \log y_{t+h-i}.   (A.3)

The MLP needs a training algorithm to learn the nonlinear mathematical relationship (A.1) between the input and output attributes of a problem. Back propagation (BP), the most widely used training algorithm for MLP, is a supervised learning technique in which the values of the independent variables, along with the values of the dependent variables, are fed to an MLP network input layer. The task of the training algorithm is to extract the functional relationship between the input variables and the dependent variable, called the ''target'', through proper assignment of weights. The output layer units represent the values of the target variables. An MLP uses the information available from the independent variables for each observation in the training set to compute an output value. The output value is then compared with the target value to generate error signals for all units. There will be no error signal if there is no difference between the two values. Otherwise, the training involves a backward pass (hence the name BP) through the network, during which error signals are sent to all units in the network. Weight changes in network connections are proportional to the error signals. In this procedure, the constant of proportionality is called the learning rate: the larger this constant, the larger the changes in the weights at each step. The learning is complete when an error function based on the output node errors is minimized. Nonlinearity comes through the neural activation functions (A.2).
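Under our reading of (A.1)–(A.3), producing a single forecast is one forward pass through the network. The sketch below uses placeholder weights (in practice these are fit by back propagation as described above), and the layout of the weight vectors and the function names are our illustrative assumptions:

```python
import numpy as np

def logistic(z):
    """Activation (A.2): f = (1 + e^{-z})^{-1}."""
    return 1.0 / (1.0 + np.exp(-z))

def nn_forecast(log_y_lags, t, hidden_w, beta0, beta):
    """One-hidden-layer forecast of log y_{t+h} per (A.1)-(A.3).
    log_y_lags : the l lagged values log y_{t+h-1}, ..., log y_{t+h-l}
    t          : current time index, taken as an input alongside the lags
    hidden_w   : n weight vectors, each (bias, time weight, l lag weights)
    beta0,beta : output-layer bias and the n hidden-to-output weights
    """
    out = beta0
    for j, w in enumerate(hidden_w):
        z = w[0] + w[1] * t + np.dot(w[2:], log_y_lags)   # (A.3)
        out += beta[j] * logistic(z)                      # sum in (A.1)
    return out   # estimate of log y_{t+h}
```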


References

Armstrong, J.S., Collopy, F., 1992. Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting 8, 69–80.
Bartezzaghi, E., Verganti, R., Zotteri, G., 1999. A simulation framework for forecasting uncertain lumpy demand. International Journal of Production Economics 59, 499–510.
Carbone, R., Armstrong, J.S., 1982. Evaluation of extrapolative forecasting methods: Results of a survey of academicians and practitioners. Journal of Forecasting 1, 215–217.
Carmo, J.L., Rodrigues, A.J., 2004. Adaptive forecasting of irregular demand processes. Engineering Applications of Artificial Intelligence 17, 137–143.
Cochran, W.G., Cox, G.M., 1950. Experimental Designs. Wiley, New York.
Croston, J.D., 1972. Forecasting and stock control for intermittent demands. Operational Research Quarterly 23, 289–304.
Elman, J.L., Zipser, D., 1987. Learning the hidden structure of speech. Institute of Cognitive Science Report 8701, UC San Diego.
Fildes, R., 1992. The evaluation of extrapolative forecasting methods. International Journal of Forecasting 8, 88–98.
Ghobbar, A.A., Friend, C.H., 2002. Sources of intermittent demand for aircraft spare parts within airline operations. Journal of Air Transport Management 8, 221–231.
Ghobbar, A.A., Friend, C.H., 2003. Evaluation of forecasting methods for intermittent parts demand in the field of aviation: A predictive model. Computers & Operations Research 30, 2097–2114.
Gilliland, M., 2002. Is forecasting a waste of time? Supply Chain Management Review 6 (4), 16–23.
Hill, T., Marquez, L., O'Connor, M., Remus, W., 1994. Artificial neural network models for forecasting and decision making. International Journal of Forecasting 10, 5–15.
Hill, T., O'Connor, M., Remus, W., 1996. Neural network models for time series forecasts. Management Science 42, 1082–1092.
Iman, R., Conover, W.J., 1983. Modern Business Statistics. Wiley, New York.
Johnston, F.R., Boylan, J.E., 1996. Forecasting for items with intermittent demand. Journal of the Operational Research Society 47, 113–121.
Kalchschmidt, M., Zotteri, G., Verganti, R., 2003. Inventory management in a multi-echelon spare parts supply chain. International Journal of Production Economics 81–82, 397–413.
Leven, E., Segerstedt, A., 2004. Inventory control with a modified Croston procedure and Erlang distribution. International Journal of Production Economics 90, 361–367.
Lippmann, R.P., 1987. An introduction to computing with neural nets. IEEE ASSP Magazine, April, 4–17.
Makridakis, S., Wheelwright, S.C., 1987. The Handbook of Forecasting: A Manager's Guide. Wiley, New York.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., Winkler, R., 1982. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting 1, 111–153.
Regattieri, A., Gamberi, M., Gamberini, R., Manzini, R., 2005. Managing lumpy demand for aircraft spare parts. Journal of Air Transport Management 11, 426–431.
Rosenblatt, F., 1962. Principles of Neurodynamics. Spartan, New York.
Roy, A., Mukhopadhyay, S., 1997. Interactive generation of higher-order nets in polynomial time using linear programming. IEEE Transactions on Neural Networks 8, 402–412.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1988. Learning internal representations by error propagation. In: Rumelhart, D.E., McClelland, J.L., PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, pp. 328–330.
SAS Institute, Inc., 1999. SAS/STAT User's Guide, Version 8. SAS Institute, Inc., Cary, NC, pp. 3578–3579.
Segerstedt, A., 2000. Forecasting slow-moving items and ordinary items—a modification of Croston's idea. Working paper, Industrial Logistics, Luleå University of Technology, Luleå, Sweden.
Sejnowski, T.J., Rosenberg, C.R., 1987. Parallel networks that learn to pronounce English text. Complex Systems 1, 145–168.
Syntetos, A.A., 2001. Forecasting of intermittent demand. Unpublished Ph.D. Thesis, Buckinghamshire Business School, Brunel University, UK.
Syntetos, A.A., Boylan, J.E., 2001. On the bias of intermittent demand estimates. International Journal of Production Economics 71, 457–466.
Syntetos, A.A., Boylan, J.E., 2005. The accuracy of intermittent demand estimates. International Journal of Forecasting 21, 303–314.
Syntetos, A.A., Boylan, J.E., 2006. On the stock control performance of intermittent demand estimators. International Journal of Production Economics 103, 36–47.
Syntetos, A.A., Boylan, J.E., Croston, J.D., 2005. On the categorization of demand patterns. Journal of the Operational Research Society 56, 495–503.
Weiland, A., Leighton, R., 1988. Geometric analysis of neural network capabilities. Technical Report, Arpanet III, pp. 385–392.
White, H., 1992. Learning and statistics. In: White, H. (Ed.), Artificial Neural Networks: Approximation and Learning Theory. Blackwell, Oxford, UK, p. 79.
Willemain, T.R., Smart, C.N., Schwarz, H.F., 2004. A new approach to forecasting intermittent demand for service parts inventories. International Journal of Forecasting 20, 375–387.
Xiang, C., Ding, S.Q., Lee, T.H., 2005. Geometrical interpretation and architecture selection of MLP. IEEE Transactions on Neural Networks 16, 84–96.