
European Journal of Operational Research 180 (2007) 174–193

www.elsevier.com/locate/ejor

Production, Manufacturing and Logistics

Modeling and optimizing a vendor managed replenishment system using machine learning and genetic algorithms

Hoi-Ming Chi a, Okan K. Ersoy a, Herbert Moskowitz b,*, Jim Ward b

a School of Electrical and Computer Engineering, Purdue University, Electrical Engineering Building, 465 Northwestern Avenue, West Lafayette, IN 47907-2035, USA
b The Krannert School of Management, Purdue University, 403 West State Street, West Lafayette, IN 47907-2056, USA

Received 30 October 2004; accepted 1 March 2006; available online 13 June 2006. doi:10.1016/j.ejor.2006.03.040

Abstract

Using a supply chain network, we demonstrate the feasibility, viability, and robustness of applying machine learning and genetic algorithms to respectively model, understand, and optimize such data intensive environments. Deployment of these algorithms, which learn from and optimize data, can obviate the need to perform more complex, expensive, and time consuming design of experiments (DOE), which usually disrupt system operations. We apply and compare the behavior and performance of the proposed machine learning algorithms to that obtained via DOE in a simulated Vendor Managed Replenishment system, developed for an actual firm. The results show that the models resulting from the proposed algorithms had strong explanatory and predictive power, comparable to that of DOE. The optimal system settings and profit were also similar to those obtained from DOE. The virtues of using machine learning and evolutionary algorithms to model and optimize data rich environments thus seem promising because they are automatic, involving little human intervention and expertise. We believe, and are exploring how, they can be made adaptive to improve parameter estimates with increasing data, as well as to seamlessly detect system (and therefore model) changes, thus being capable of recursively updating and reoptimizing a modified or new model.
© 2006 Published by Elsevier B.V.

Keywords: Supply chain management; Support vector machines; Genetic algorithms; Machine learning; Decision support systems

1. Introduction

The objective of this paper is to demonstrate the feasibility and viability of applying computational intelligence (machine learning) and evolutionary algorithms to respectively model and optimize data rich environments, such as supply chains. We will attempt to show that the proposed machine learning and evolutionary algorithms perform statistically as well as formal design of experiments, and thus have promise as a potential, unobtrusive, and attractive methodology for system modeling and optimization (which can be automated) vis-à-vis


traditional DOE methods, which can be disruptive to ongoing operations and require expertise in statistics. A major motivation for proposing such an approach is the ability of information technology (IT) to gather and distribute process data in real time. IT has been a major driver of increased productivity in the manufacturing and service sectors, by bringing real-time information to the appropriate decision makers/process owners to improve process behavior and performance. Supply chains are data intensive, generating a plethora of operational and financial data that can be and is being used to better manage these systems. Such systems have many inputs and outputs, which are often too complex to model and optimize analytically and/or expediently. Hence, in some cases, simulation models are constructed and experiments are performed on the simulation model to evaluate and implement system improvements. This can be more feasible and efficient than performing experiments on the actual system. An alternative to either of the above is to (a) apply machine learning and evolutionary algorithms to model and optimize the actual system, and (b) use these algorithms dynamically so that they adjust and adapt contemporaneously to additional data and system changes. Simply put, let humans do what they do best... perform the creative part of decision making... define and measure (e.g., define inputs, outputs and associated metrics); let machines do what they do best... analyze, improve, and control (e.g., model, optimize).

Machine learning and optimization algorithms have a long and rich history of applications in manufacturing and, more recently, in supply chain management. See, for example, Yih and Nof (1991), Liang et al. (1992), Chiu and Yih (1995), Hamamoto et al. (1999), and Sun et al. (2005) for applications in manufacturing. For supply chain applications see, for example, Lin and Pai (2000), Kimbrough et al. (2002), Naso et al. (2004), Daniel and Rajendran (2005), Emerson and Piramuthu (2004), and Lu et al. (2005). However, our approach differs from previous research in the following sense: we propose and define an automated intelligent management system (AIMS) for analysis and decision making, which mines real-time and/or historical data by using computational intelligence and evolutionary algorithms to simultaneously model and optimize manufacturing and non-manufacturing processes. The statistical and computational intelligence algorithms employed involve the use of a regression support vector machine (SVM) for model construction and a genetic algorithm (GA) for model optimization. Similar approaches using artificial neural networks have been applied to manufacturing systems (e.g., Wang and Yih, 1997; Min et al., 1998; Kim et al., 1998; Wu et al., in press). Although the use of neural networks provides good predictive power, it does not provide the explanatory power of our approach, which is fundamental for system understanding.

Although not reported here, we have applied AIMS successfully to simulated as well as real product development and manufacturing environments in the consumer electronics and pharmaceutical industries, respectively. In this paper, we apply AIMS to an actual company's simulated vendor managed replenishment system, which is essentially a sophisticated supply chain. (For confidentiality purposes, we call this company AGA.) Our experimental results indicate the broad applicability, effectiveness, and efficiency of such methodologies for managing systems in data-rich environments.

This paper is partitioned into five sections. Section 2 describes the AGA supply chain, its various operational parameters, and its supply chain simulation model. Section 3 explains in detail the machine learning, model reduction, and evolutionary algorithms employed in AIMS. Section 4 describes the experiments conducted to compare the performance of the proposed machine learning algorithms and DOE. Section 5 presents conclusions and future research.

2. Background

AGA's vendor managed replenishment (VMR) system is a classic example of advanced supply chain management practices. In a VMR system, the most fundamental inventory management decisions, such as order quantity and frequency, are the responsibility of the vendor, not the retailer. AGA's system included regular in-store visits from an AGA "merchandiser", who was responsible for the appearance and stocking of AGA's products on the retail floor. AGA desired to explore variations of and alternatives to its current policies, and to relate these policies to the overall management of its supply chain in the face of demand potential, lead time, and sales conversion rate uncertainties. At the individual retail store level, particular emphasis was on shipping frequency, minimum shipment requirements, order-up-to levels, and the coordination of


shipments with the in-store visits of the AGA merchandiser.

To this end we constructed a simulation model based on AGA's understanding of how the various components of their supply chain interacted. The simulation provides a mechanism for evaluating a wide range of options relative to the six issues raised above. Our model focused on a product group at the store level. Product groups to be modeled were calendars, planning products, and posters. A product group typically contained between 50 and 100 sku's, and these were usually single-season products with a relatively narrow selling season. Store types included both mass retail chains and super stores. Key elements of each model included shipment policies, in-store visits, display aesthetics, product location (e.g., in-transit, backroom, on-display), information collection, lead times, and retailer-impact measures. The model is stochastic in nature, reflecting not only the uncertainty inherent in customer demand, but also the uncertainty in product movement, and hence lead times. The model attempts to capture some rather complex interactions. For example, daily demand not only has a random component, but is also affected by a parameter that represents the attractiveness of the product display, which in turn depends on the time since the last merchandiser visit, which may itself have a random component. By varying the parameters of the model (mean customer demand rates and uncertainty, lead times, shipping frequency, order-up-to levels, etc.), the model can be used to represent the broad spectrum of AGA's retailers and product families.

Some of the key parameters and interactions built into the model are described below; a configuration sketch in code follows the list. More details can be found at http://web.ics.purdue.edu/~chih/Supply_Chain_Appendix.doc.

1. Shipping frequency: Shipping frequency impacts system performance directly through its impact on product availability, inventory costs, obsolescence, and shipping costs (i.e., freight, handling, etc.), and indirectly through its impact on display aesthetics (full enough?).

2. "Store visiting" frequency: Merchandiser visits help ensure that AGA stock moves from the backroom to the display floor in a timely fashion. Visiting frequency may impact customer demand through display aesthetics, and, together with shipment frequency, may indirectly impact operating costs, product availability, and obsolescence.

3. Shipment trigger policy: AGA shipped weekly, but only if a minimum amount (in $) was needed to reach the order-up-to levels. This decision impacts shipping frequency and inventory-related costs. It also may affect the timing of shipments relative to merchandiser visits, which in turn could impact several other system parameters.

4. Order-up-to inventory levels: Order-up-to levels are closely linked to several AGA controlled factors, such as frequency of shipments and information and delivery lead times, as well as retailer/product specific factors, such as desired display size and stock, demand, lead time, product substitutability, sales rate uncertainty, and restocking (from backroom inventory) timeliness. These are particularly critical in determining the amount of returned merchandise at the end of a product's selling season.

5. Information and delivery lead times between retailers and warehouse: While reducing information and delivery lead times would require cooperation with retailers and shippers, which might be impractical at this time, it was nevertheless useful to know what potential benefits might arise from shorter lead times and faster response rates. Other system parameters (shipment frequency, order-up-to levels, etc.) would need to be recalibrated for different lead times to assess the potential operational and financial impact on the system as a whole.
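For concreteness, the policy levers just described could be grouped into a single configuration object when scripting experiments against the simulator. A minimal sketch, assuming nothing about AGA's actual implementation (field names and default values are ours, not AGA's):

```python
from dataclasses import dataclass

@dataclass
class ReplenishmentPolicy:
    """Hypothetical grouping of the policy levers described above."""
    days_between_shipments: int = 7       # shipping frequency
    days_between_visits: int = 14         # merchandiser visiting frequency
    min_shipment_dollars: float = 100.0   # shipment trigger threshold
    order_up_to_units: int = 400          # order-up-to inventory level
    lead_time_days: int = 4               # information + delivery lead time
```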

The simulation model was programmed in EXTEND, a PC-based, highly graphic, relatively intuitive simulation language. Because of its user-friendliness, after development the EXTEND model could be seamlessly "handed off" to users (who need not be simulation programmers) for further analysis and development.

We next discuss the machine learning and genetic algorithms used to model and optimize system behavior and performance, respectively.

3. Structure of the proposed algorithms

The proposed algorithms are data mining tools capable of modeling a supply chain and specifying and adjusting the input settings and configurations to optimize system performance. They have the potential to complement, and under certain circumstances perhaps even to replace, classical design of experiments (DOE) and regression methods used on observational data, which are often highly


correlated. Unlike DOE, the proposed algorithms do not necessarily require the use of experimental designs, which can be disruptive and time consuming, and thus can potentially save significant amounts of money and time (although causality is compromised).

The structure of the proposed algorithms consists of two major components: (a) training a machine learning algorithm (e.g., a 2nd-order (or higher order) polynomial support vector machine (SVM) for regression) to model a supply chain (the explanatory modeling process), and (b) applying a global optimization algorithm, such as a genetic algorithm (GA), to obtain input settings that yield optimum system performance (the optimization process).

3.1. Support vector machines (SVMs) for regression

Support vector learning is a machine learning algorithm recently developed by Vapnik (1995). Its idea originates from the structural risk minimization principle in statistical learning theory, which uses an induction principle to minimize a bound on generalization error. It involves implicitly mapping the input vectors to the feature vectors through various kernel functions, so that computation in the high dimensional feature space is avoided, and then constructing a linear function with the ε-insensitive loss function in the feature space. Extensive research in the past few years has shown that SVMs have many attractive features and promising empirical performance over other conventional learning algorithms in many real-world problems. For example, SVMs for regression have been used successfully in medical image segmentation (Wang et al., 2001), signal processing (Vapnik et al., 1996), time series prediction (Muller et al., 1997), and financial forecasting (Trafalis and Ince, 2000).

Mathematically, a SVM is formulated as a convex quadratic programming problem (QPP) with inequality constraints. The primal QPP formulation is as follows (Smola and Scholkopf, 1998):

\[
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) \\
\text{subject to} \quad & y_i - w^T\phi(x_i) - b \le \varepsilon + \xi_i, \\
& w^T\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0,
\end{aligned}
\tag{1}
\]

where x_i is the ith data vector, y_i is the output response of the ith data vector, ξ_i and ξ_i^* are the slack variables, N is the number of data vectors, w is the coefficient vector of the decision function, φ(·) is the nonlinear mapping function from the input space to the high-dimensional feature space, C is the regularization parameter, and b is the bias. In SVM for regression, a special loss function, called the ε-insensitive loss function, is used. It penalizes deviations larger than ε in a linear fashion, and is zero otherwise. The regularization parameter C determines the tradeoff between the flatness ‖w‖² and the deviations ξ_i, ξ_i^* larger than ε from the actual target values, and its value is usually determined by cross-validation. The relationship between ε and ξ_i, ξ_i^* is depicted in Fig. 1.

Fig. 1. The ε-insensitive loss function used in SVM (Smola and Scholkopf, 1998).
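The geometry of Fig. 1 amounts to one line of arithmetic: deviations inside the ε-tube cost nothing, deviations outside it cost their excess over ε. A minimal numpy transcription (function and argument names are ours):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    residual = np.abs(y_true - y_pred)
    return np.maximum(0.0, residual - eps)
```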

The primal formulation in (1) can be converted to a dual formulation using the Karush–Kuhn–Tucker (KKT) conditions (Fletcher, 1987) as follows:


\[
\begin{aligned}
\text{maximize} \quad & -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) \\
& - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^*) \\
\text{subject to} \quad & \sum_{i=1}^{N}(\alpha_i - \alpha_i^*) = 0, \\
& 0 \le \alpha_i,\ \alpha_i^* \le C,
\end{aligned}
\tag{2}
\]

where α_i, α_i^* are the Lagrangian multipliers and K(·,·) is the kernel function, which is equal to φ^T(·)φ(·) if Mercer's condition (Vapnik, 1995) is satisfied.

The optimal w is given by

\[
w = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\phi(x_i). \tag{3}
\]

The corresponding decision function is given by

\[
f(x) = w^T\phi(x) + b = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)K(x_i, x) + b. \tag{4}
\]

It should be noted that K(·,·) is simply a function of the input vectors x_i and x_j, and it can take on many different forms, such as polynomial and radial basis functions, as shown in Eqs. (5) and (6), respectively. This implicit kernel mapping avoids computation in the high dimensional feature space, and thus cures the curse of dimensionality.

\[
K(x_i, x_j) = (x_i^T x_j + 1)^d \quad \text{(polynomial)}, \tag{5}
\]
\[
K(x_i, x_j) = \exp\left(-\|x_i - x_j\|_2^2 / \sigma^2\right) \quad \text{(radial basis)}. \tag{6}
\]
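Both kernels are a few lines of code. A direct, hedged transcription of Eqs. (5) and (6) in numpy (function and argument names are ours):

```python
import numpy as np

def poly_kernel(xi, xj, d=2):
    """Polynomial kernel of Eq. (5); d = 2 gives the 2nd-order kernel."""
    return (xi @ xj + 1.0) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """Radial basis kernel of Eq. (6)."""
    return np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2)
```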

The proposed machine learning approach we employed uses a 2nd-order polynomial SVM for modeling AGA's supply chain, as opposed to a regression model obtained from a DOE/response surface design or actual observational data. There are several advantages of SVMs over a traditional regression model. First, the SVM formulation attempts to minimize the sum of the regularization term ‖w‖² and the error term, while regression simply minimizes the latter, and is therefore more prone to overfitting. This aspect of the SVM formulation is very similar to ridge regression, which tends to outperform regular ordinary least squares regression in the usual case of correlated input variables (multi-collinearity) when dealing with observational data. Secondly, the SVM is not restricted to a 2nd-order polynomial representation of a supply chain. By using a different kernel, such as a higher order polynomial function or even a radial basis function, the SVM can model more complex nonlinear systems with great flexibility. This is especially important in situations where the 2nd or higher order polynomial representation is not sufficient to model an actual supply chain. The SVM thus provides a more general approach to model and optimize complex and nonlinear systems. The implicit kernel mapping also avoids computation in the high dimensional feature space (the complexity of training a SVM only depends on the number of training vectors), and thus cures the curse of dimensionality. In addition, SVMs impose sparseness on the final learned function, which only consists of a subset of the training vectors, called support vectors. Finally, the loss function used in a SVM is an ε-insensitive function instead of the quadratic function used in regression. As a result, a regular SVM is less sensitive to outliers present in the training data, because they are emphasized less heavily when the ε-insensitive loss function is used.

SVM is a powerful machine learning and data mining tool that has already been shown to possess predictive power. Moreover, it can also have explanatory power if an appropriate kernel function is selected, as we shall show using a SVM with a 2nd-order polynomial kernel function,¹ whose functional form is:

\[
K(x, y) = (x^T y + 1)^2, \tag{7}
\]

where x and y are the input vectors. This is the reason why a SVM with a 2nd-order polynomial kernel function was used to model the supply chain.

¹ We chose a 2nd-order polynomial kernel function for the following reasons: (1) polynomial functions are often used to represent physical systems and are the core models of DOEs, and (2) terms higher than 2nd-order in DOEs are usually disregarded because of their added complexity and relative insignificance.

Having explanatory power means the following mathematical model is produced for a 2nd-order polynomial kernel function:

\[
y = w_0 + w_1 x_1 + \cdots + w_m x_m + w_{m+1} x_1^2 + \cdots + w_{2m} x_m^2 + w_{2m+1} x_1 x_2 + w_{2m+2} x_1 x_3 + \cdots \tag{8}
\]

Namely, the response y is a linear combination of linear, quadratic, and two-way interaction terms of the input variables x_i (similar to regression analysis or DOE used to characterize many physical


processes). Eq. (8) has explanatory power because each w represents the effect of the corresponding right-hand-side term on the output response y. A larger w_i implies that x_i has a more significant effect on y than a smaller w_i, assuming the data have been properly normalized. Therefore, unlike artificial neural networks, which operate like a "black box", a SVM with a 2nd-order polynomial kernel function possesses explanatory power, which allows for easy explanation, interpretation, and therefore understanding of the supply chain. If the supply chain is more complex, a higher order polynomial or other functional SVM may be used instead. Based on the model constructed, a GA, which is a well-known global optimization technique, can then be applied to seek the optimum input settings for maximizing single or multiple conflicting performance criteria.
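The explicit weights of Eq. (8) can be recovered mechanically from a trained model: expanding (x_i^T x + 1)^2 inside Eq. (4) yields the constant, linear, quadratic, and interaction coefficients. A sketch, assuming the dual coefficients β_i = α_i − α_i^* and the support vectors are available from the trained SVM (function name is ours):

```python
import numpy as np

def explicit_poly2_coeffs(X_sv, beta, b):
    """Expand f(x) = sum_i beta_i (x_i^T x + 1)^2 + b into Eq. (8):
    intercept, linear, quadratic, and pairwise interaction weights.
    X_sv: (n_sv, m) support vectors; beta: (n_sv,) dual coefficients."""
    m = X_sv.shape[1]
    w0 = beta.sum() + b                          # constant term
    w_lin = 2.0 * beta @ X_sv                    # coefficients of x_k
    w_quad = beta @ (X_sv ** 2)                  # coefficients of x_k^2
    M = 2.0 * (X_sv * beta[:, None]).T @ X_sv    # 2 * sum_i beta_i x_ik x_il
    w_int = {(k, l): M[k, l] for k in range(m) for l in range(k + 1, m)}
    return w0, w_lin, w_quad, w_int
```

With scikit-learn, for instance, the inputs would be beta = svr.dual_coef_.ravel(), X_sv = svr.support_vectors_, and b = float(svr.intercept_), provided the kernel is exactly (x^T y + 1)^2 (i.e., gamma=1, coef0=1, degree=2).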

3.2. SVM model reduction using the random probe method

A 2nd-order polynomial SVM always produces a linear output equation with coefficients for all linear, interaction, and quadratic terms. However, in real situations, many of these terms, including some of the interaction and quadratic terms, will not be significant in modeling a process. In fact, via model reduction by removing the insignificant terms, the overall prediction accuracy of the SVM model can often be improved, due to the bias–variance trade-off (Hastie et al., 2001). In addition, since a mathematical equation with fewer terms is more parsimonious and allows easier analysis and interpretation, a simpler and more compact equation, which is sufficient to model a process, is obviously preferable. Therefore, we apply a novel feature selection algorithm to the 2nd-order polynomial SVM to remove insignificant terms.

The criterion used in the proposed feature selection algorithm is w_i², where w_i is the coefficient of the ith term in the 2nd-order polynomial SVM (Eq. (8)). The algorithm removes terms whose corresponding w_i² is small (with respect to a specified criterion, which will be discussed later), as the contribution by the ith term to the output y is proportional to w_i². Namely, a larger w_i² implies that x_i has a more significant effect on y than a smaller w_i², assuming the data have been properly normalized. Moreover, the SVM is formulated to minimize the sum of ‖w‖² and the error term in the primal sense. Therefore, loosely speaking, each w_i² tends to be minimized towards zero unless it helps in reducing the error term. In other words, the w_i² of useless or irrelevant features are probably minimized towards zero. This is another justification for using w_i² as a criterion for feature selection. This method intuitively makes sense, and Brank et al. (2002) have already proposed a similar idea for feature selection using a linear SVM in document categorization. They chose the threshold value by considering the tradeoff between the sparsity of the document representation and the amount of training data for a fixed amount of system memory. In their work, they only considered linear SVMs, while in this paper this idea is combined with our random probe feature selection method, which is extended to selecting significant terms in a 2nd-order polynomial SVM.

The random probe feature selection method was originally developed by Stoppiglia et al. (2003). We extended this method to feature selection in SVMs using w_i² as the criterion. A linear SVM is first considered, and the extension to a 2nd-order polynomial SVM is discussed at the end of this section. The method involves artificially creating a random probe feature as an extra input variable. The random probe can be generated as a Gaussian random variable with zero mean and unit variance. A linear SVM is then trained using this new data set with the extra random probe feature, and all those input variables whose w_i² are less than w_r² are discarded, where w_r is the coefficient (or weight) in the linear SVM of the random probe (the subscript r stands for random). Since the probe feature is a random variable, all coefficients w_i, i = 1, ..., n, where n is the number of input features, will also be random variables, and will be different every time a SVM is trained. Here, w_i and w_r are assumed to be jointly distributed Gaussian random variables. This random probe method has also been extended to 1-norm SVMs (Chi et al., 2005), which have good feature selection results, as reported by Bradley and Mangasarian (1998).

Both Stoppiglia's and our feature selection methods share the same idea of artificially creating a random probe feature as an extra input variable in the initial step, but the latter steps in the two methods differ drastically: Stoppiglia's method performs successive Gram–Schmidt orthogonalization procedures to extract the features in a closed form solution, whereas our method focuses on training multiple linear SVMs and comparing w_r² to w_i² statistically. To determine whether the ith input feature should be kept in our feature selection method, we compute P_i, defined as the probability of the random probe


being more relevant than the ith input feature. Mathematically,

\[
P_i = P\left(w_r^2 \ge c^2 \cdot w_i^2\right). \tag{9}
\]

The parameter c² is introduced to allow flexibility in controlling the level of sparsity of the remaining input features. c² should always be less than or equal to 1; a smaller c² will result in more input variables being removed, and conversely. In most problems, typical values of c² range between 0.5 and 1.

Eq. (9) can further be written as

\[
P_i = P\left((w_r + c \cdot w_i) \cdot (w_r - c \cdot w_i) \ge 0\right). \tag{10}
\]

Now we denote w_r + c·w_i by a_i and w_r − c·w_i by b_i. It can be shown that if w_r and w_i are jointly Gaussian, then a_i and b_i are also jointly Gaussian, with means, variances, and correlation coefficient ρ_{a_i b_i} shown below:

\[
\begin{aligned}
m_{a_i} &= m_{w_r} + c \cdot m_{w_i}, & \text{(11)} \\
m_{b_i} &= m_{w_r} - c \cdot m_{w_i}, & \text{(12)} \\
\sigma_{a_i}^2 &= \sigma_{w_r}^2 + c^2 \sigma_{w_i}^2 + 2c \cdot \rho_{w_r w_i} \sigma_{w_r} \sigma_{w_i}, & \text{(13)} \\
\sigma_{b_i}^2 &= \sigma_{w_r}^2 + c^2 \sigma_{w_i}^2 - 2c \cdot \rho_{w_r w_i} \sigma_{w_r} \sigma_{w_i}, & \text{(14)} \\
\rho_{a_i b_i} &= \frac{\sigma_{w_r}^2 - c^2 \sigma_{w_i}^2}{\sigma_{a_i} \sigma_{b_i}}, & \text{(15)}
\end{aligned}
\]

where m_{w_r}, m_{w_i}, σ²_{w_r}, σ²_{w_i} are the means and variances of the coefficients of the random probe and the ith term, respectively, and ρ_{w_r w_i} is their correlation coefficient. These five parameters can be estimated by training the linear SVM several times with the random probe feature added to the training set.

Now, P_i can be written in terms of a_i and b_i as follows:

\[
P_i = P(a_i \cdot b_i \ge 0) = P(a_i \ge 0 \text{ and } b_i \ge 0) + P(a_i \le 0 \text{ and } b_i \le 0). \tag{16}
\]

Since a_i and b_i are jointly Gaussian,

\[
P(a_i \ge 0 \text{ and } b_i \ge 0) = \int_0^\infty \!\! \int_0^\infty \frac{1}{2\pi \sigma_{a_i} \sigma_{b_i} \sqrt{1 - \rho_{a_i b_i}^2}} \exp\!\left( -\frac{1}{2(1 - \rho_{a_i b_i}^2)} \left( \frac{(a_i - m_{a_i})^2}{\sigma_{a_i}^2} + \frac{(b_i - m_{b_i})^2}{\sigma_{b_i}^2} - \frac{2 \rho_{a_i b_i} (a_i - m_{a_i})(b_i - m_{b_i})}{\sigma_{a_i} \sigma_{b_i}} \right) \right) da_i \, db_i
\]

and

\[
P(a_i \le 0 \text{ and } b_i \le 0) = \int_{-\infty}^0 \!\! \int_{-\infty}^0 \frac{1}{2\pi \sigma_{a_i} \sigma_{b_i} \sqrt{1 - \rho_{a_i b_i}^2}} \exp\!\left( -\frac{1}{2(1 - \rho_{a_i b_i}^2)} \left( \frac{(a_i - m_{a_i})^2}{\sigma_{a_i}^2} + \frac{(b_i - m_{b_i})^2}{\sigma_{b_i}^2} - \frac{2 \rho_{a_i b_i} (a_i - m_{a_i})(b_i - m_{b_i})}{\sigma_{a_i} \sigma_{b_i}} \right) \right) da_i \, db_i.
\]

Now P_i can be computed for each input feature, and those input features which have a P_i larger than a threshold are discarded.
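In practice, Eq. (16) is a pair of orthant probabilities of a bivariate Gaussian and can be evaluated numerically rather than via the double integrals above. A sketch using scipy's multivariate normal CDF (function name is ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def prob_probe_more_relevant(m_a, m_b, s_a, s_b, rho):
    """P_i of Eq. (16) for jointly Gaussian (a_i, b_i) with the moments
    of Eqs. (11)-(15): P(a >= 0, b >= 0) + P(a <= 0, b <= 0)."""
    cov = np.array([[s_a ** 2, rho * s_a * s_b],
                    [rho * s_a * s_b, s_b ** 2]])
    p_neg = multivariate_normal.cdf([0.0, 0.0], mean=[m_a, m_b], cov=cov)
    # P(a >= 0, b >= 0) = P(-a <= 0, -b <= 0): negate the means; the
    # covariance matrix is unchanged when both variables are negated.
    p_pos = multivariate_normal.cdf([0.0, 0.0], mean=[-m_a, -m_b], cov=cov)
    return p_pos + p_neg
```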

A summary of the random probe method for a linear SVM is given below; a sketch in code follows the list:

1. Set the sparsity level parameter c² to a value between 0 and 1. A small c² will cause more input features to be discarded.
2. Standardize the training data set (with n input features) such that each input feature has zero mean and standard deviation equal to one.
3. Artificially generate a random probe feature from a Gaussian distribution with zero mean and standard deviation equal to one, and append it to the training set.
4. Train a linear SVM and compute the coefficients w_1, ..., w_n and w_r.
5. Repeat steps 3 and 4 several times to estimate the means and variances of w_1, ..., w_n and w_r.
6. Compute the means, variances, and correlation coefficients of a_i and b_i for i = 1, ..., n using Eqs. (11)–(15).
7. Compute P_i for i = 1, ..., n using Eq. (16).
8. Discard the input features which have a P_i larger than a threshold.
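The eight steps fit in a short function. A sketch under stated assumptions: scikit-learn's LinearSVR stands in for the paper's linear ε-SVR, the moments of a_i and b_i are estimated directly from the repeated trainings (equivalent to Eqs. (11)–(15)), and prob_probe_more_relevant is the Eq. (16) helper sketched above:

```python
import numpy as np
from sklearn.svm import LinearSVR

def random_probe_select(X, y, n_repeats=10, c2=1.0, threshold=0.1, seed=None):
    """Steps 1-8 for a linear SVM: repeatedly train with an appended
    Gaussian probe column; keep feature i only if
    P_i = P(w_r^2 >= c2 * w_i^2) <= threshold."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # step 2: standardize
    W = []
    for _ in range(n_repeats):                        # steps 3-5
        probe = rng.standard_normal((X.shape[0], 1))  # the random probe feature
        W.append(LinearSVR().fit(np.hstack([X, probe]), y).coef_)
    W = np.asarray(W)                                 # (n_repeats, n+1); last column is w_r
    c, keep = np.sqrt(c2), []
    for i in range(X.shape[1]):                       # steps 6-8
        a = W[:, -1] + c * W[:, i]
        b = W[:, -1] - c * W[:, i]
        rho = np.corrcoef(a, b)[0, 1]
        Pi = prob_probe_more_relevant(a.mean(), b.mean(), a.std(), b.std(), rho)
        if Pi <= threshold:
            keep.append(i)
    return keep
```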

The random probe method described above is used to select relevant input features, but it can also be easily extended to selecting significant terms in a 2nd-order polynomial SVM using a simple heuristic. As in the linear case, a Gaussian random feature is artificially generated as a random probe and appended to the training set to train a 2nd-order polynomial SVM. In a 2nd-order polynomial representation, there are linear, quadratic, and interaction terms. To discard the irrelevant linear terms, the exact random probe method described above can be used. For quadratic terms, the coefficients of x_1², ..., x_n² are compared to that of x_r² using the same random probe method. For interaction terms, a term x_i·x_j will only be discarded if it is simultaneously less relevant than both x_i·x_r and x_j·x_r. Once all irrelevant terms are removed, the remaining terms can be used as input features to train a linear SVM. It should be noted that this random probe method can be extended to higher order polynomial representations similarly.
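One way to realize this heuristic is to build the explicit degree-2 design matrix (with the probe column already appended to X), so the linear probe test above can be applied term by term; the special discard rule for interactions, comparing each x_i·x_j against both x_i·x_r and x_j·x_r, is then applied on top. A sketch (function name is ours):

```python
from itertools import combinations
import numpy as np

def poly2_expand(X, names):
    """Explicit degree-2 design: linear, squared, and pairwise interaction
    columns, with human-readable labels for each generated term."""
    cols, labels = [X, X ** 2], list(names) + [f"{n}^2" for n in names]
    for i, j in combinations(range(X.shape[1]), 2):
        cols.append((X[:, i] * X[:, j])[:, None])
        labels.append(f"{names[i]}*{names[j]}")
    return np.hstack(cols), labels
```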

3.3. Optimization using genetic algorithm (GA)

Global optimization attempts to locate an absolutely best set of optimum conditions that results in the highest objective value. It is usually a very difficult problem. Traditional optimization methods such as gradient ascent/descent search in the direction of the local gradient vector, and thus easily get stuck in problems with a multimodal objective function. The optimization method used in our study is the genetic algorithm (GA), which is one of the most popular and widely-used techniques for global optimization.

The GA is a searching technique that mimics biological evolution (Holland, 1974; Winter et al., 1995). It starts with a population of potential solutions, and computes the fitness values of all individuals using a user-defined objective function. The individuals are usually encoded in binary format (0 or 1), and are called chromosomes. Integer-valued or real-valued encoding can also be used. Within this population, chromosomes are selected for reproduction with probabilities according to their fitness, using sampling methods such as Roulette Wheel Selection and Stochastic Universal Sampling. For instance, in Roulette Wheel Selection, the probability that a chromosome will be selected is equal to its fitness value divided by the total population fitness (Chipperfield et al., 1994). As a result, the better-fitted chromosomes have a higher probability of surviving and producing offspring in the next generation through crossover. Crossover is a recombination operation in which genetic material of the parents' chromosomes is exchanged to produce the offspring. Mutation, which is a sudden and random change of the genetic material of chromosomes, can occur during this process as well. The offspring form the new parent population, their fitness values are then computed, and the whole process is repeated for a pre-specified number of generations. The best chromosome in each generation is stored, and the final solution(s) can be chosen among these chromosomes.
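To make the mechanics concrete, here is a minimal real-valued GA sketch with roulette-wheel selection, single-point crossover, and random-reset mutation. This is our illustration, not the MATLAB GA toolbox used later in the paper; default rates echo Table 3:

```python
import numpy as np

def genetic_algorithm(objective, bounds, pop_size=20, n_gen=120,
                      cx_rate=0.85, mut_rate=0.125, seed=None):
    """Maximize `objective` over box constraints `bounds` = [(lo, hi), ...]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    m = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, m))
    best_x, best_f = None, -np.inf
    for _ in range(n_gen):
        fit = np.array([objective(ind) for ind in pop])
        if fit.max() > best_f:                      # store the best chromosome seen
            best_x, best_f = pop[fit.argmax()].copy(), fit.max()
        probs = fit - fit.min() + 1e-9              # shift so selection probabilities
        probs /= probs.sum()                        # are positive (roulette wheel)
        pop = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        for k in range(0, pop_size - 1, 2):         # single-point crossover on pairs
            if m > 1 and rng.random() < cx_rate:
                cut = int(rng.integers(1, m))
                pop[[k, k + 1], cut:] = pop[[k + 1, k], cut:]
        mask = rng.random(pop.shape) < mut_rate     # mutation: random reset in range
        pop[mask] = rng.uniform(lo, hi, size=pop.shape)[mask]
    return best_x, best_f
```

A call such as genetic_algorithm(model_profit, bounds=[(7, 21), (30, 50), ...]) would then search over the input ranges of Table 1.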

It should be noted that the GA is stochastic in nature, and thus, similar to other traditional optimization algorithms, it does not always guarantee globally optimal solutions. However, unlike steepest ascent/descent or other traditional optimization methods, the GA does not require the intensive computation of the gradient vector, and it advantageously provides a number of potential solutions to a given optimization problem. It is also highly customizable and robust in many problems. The major reason for using the GA rather than gradient search or other methods is its efficiency in solving multimodal optimization problems (Ortiz et al., 2004). Usually in supply chain optimization there are several conflicting objectives, which result in a highly multimodal objective function, especially with an increasing number of input factors and/or output responses.

4. Performance comparison of machine learning algorithms versus DOE

The objectives of this research were to:

(a) Apply the proposed data mining and machine learning algorithms to model the supply chain of a calendar-manufacturing company called AGA, and to obtain the optimum input settings that maximize system performance.

(b) Compare the results to those obtained via DOE using a common data set.

(c) Assess the viability and desirability of using machine learning algorithms and GAs to model and optimize complex supply chains.

4.1. Computer animated simulation of the VMR system

Using the EXTEND software modeling tool, a computer simulation model of AGA's supply chain system was developed. We used a representative store (a product family composed of 50 sku's with a seasonal demand mean of 1000 units) for all the experimental simulation runs. Part of the purpose for building the AGA simulation model was to perform experiments on the simulated system rather than on the actual system, which would have been operationally and financially prohibitive.

Eight potentially significant control parameters (input variables x) were identified: average time between merchandiser visits x1 (ADBV), order-up-to level x2 (OUT), time between shipments x3 (TBS), shipment leadtime x4 (SLT), initial inventory x5 (IINV), range of time between merchandiser visits x6 (RDBV), minimum shipment quantity x7 (MinQ), and day shipments are stopped x8 (SDay). For each control parameter, we chose a minimum and maximum value that reflected the range of possible values for that parameter, as shown in Table 1. Seven performance measures (output variables y) were also identified: shortages (units) y1, sales (units) y2, returns (units) y3, average inventory (units) y4, number of shipments (counts) y5, number of visits (counts) y6, and average order size (units) y7. Table 2(a) shows the correlation matrix for these seven performance measures, reflecting the close connection between sales, shortages, and returns. Table 2(b) gives the correlation matrix between the eight input variables and the seven performance measures, reflecting strong correlation among several variables. For example, ADBV and number of visits are strongly negatively correlated, which is not a surprising observation, because a shorter time between merchandiser visits implies more visits. It should be noted that the input variables x are uncorrelated because the data used came from a design of experiments (DOE), which will be described further below.

Table 1
Range of input variables

Input variable X                                       Minimum value           Maximum value
Average time between merchandiser visits (ADBV) (x1)   7 days                  21 days
Order-up-to level (OUT) (x2)                           30 days of demand       50 days of demand
Time between shipments (TBS) (x3)                      7 days                  21 days
Shipment leadtime (SLT) (x4)                           2 days                  6 days
Initial inventory (IINV) (x5)                          25% of season demand^a  50% of season demand^a
Range of time between merchandiser visits (RDBV) (x6)  No variation            114% of mean
Minimum shipment quantity (MinQ) (x7)                  10 units                40 units
Day shipments are stopped (SDay) (x8)                  154th day of season     168th day of season

^a Season demand is 1000 units.

Table 2
Correlation matrices of input and output variables

(a) Correlation matrix between pairs of performance measures

                        Shorts (y1)  Sales (y2)  Returns (y3)  Avg. inv. (y4)  No. orders (y5)  No. visits (y6)  Av. OS (y7)
Shorts (y1)             1.000
Sales (y2)              -0.961       1.000
Returns (y3)            0.767        -0.798      1.000
Average inventory (y4)  0.017        0.006       0.240         1.000
No. orders (y5)         -0.350       0.388       -0.136        -0.379          1.000
No. visits (y6)         -0.676       0.541       -0.446        -0.201          0.129            1.000
Av. OS (y7)             0.156        -0.189      -0.011        0.071           -0.797           0.056            1.000

(b) Correlation matrix between input variables and performance measures

                        ADBV (x1)  OUT (x2)  TBS (x3)  SLT (x4)  IINV (x5)  RDBV (x6)  MinQ (x7)  SDay (x8)
Shorts (y1)             0.73       -0.47     0.169     0.215     -0.133     0.385      0.026      -0.013
Sales (y2)              -0.625     0.522     -0.197    -0.246    0.143      -0.315     -0.045     0.027
Returns (y3)            0.556      -0.314    -0.013    0.327     -0.021     0.321      0.06       0.132
Average inventory (y4)  0.264      0.275     -0.108    0.133     0.851      0.146      0.011      0.06
No. orders (y5)         -0.128     0.195     -0.69     -0.03     -0.51      -0.051     -0.231     0.044
[rows for No. visits (y6) and Av. OS (y7) are missing in the source]
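Correlation matrices like those of Table 2 are a two-line computation on the run data. A sketch, assuming X is a (650, 8) array of inputs and Y a (650, 7) array of outputs collected from the simulator (array names are ours):

```python
import numpy as np

# With rowvar=False, variables are columns; X and Y are stacked as 15 variables.
corr_outputs = np.corrcoef(Y, rowvar=False)                     # Table 2(a), 7 x 7
corr_inputs_outputs = np.corrcoef(X, Y, rowvar=False)[:8, 8:]   # Table 2(b), 8 x 7
```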

The above seven performance metrics were combined into an overall financial metric, which was equal to Sales minus Cost in dollars. Two separate analyses of the experimental design yielded the settings for the control parameters to (a) first maximize the single criterion of profit, and then (b) simultaneously optimize a bi-criteria model; namely, maximize the sales in dollars and minimize the cost in dollars using an appropriate preference (value) function (as well as indicating the impact of the control parameters on the seven operational and cost performance measures).

It is thus a multi-input, multi-output problem. The x's and y's are related by "hidden functions" through the simulation model. To train the proposed



machine learning algorithm, we used the simulated data collected from a previously created and executed DOE. We could have generated observational data from the simulation, but the DOE-created data was used in our experiments as a better way to compare the proposed machine learning and DOE approaches. If data are observational, then the input variables x may be correlated. Conventionally, ridge regression would be used to overcome multi-collinearity present in the data. SVMs, with an objective function similar to that of ridge regression in minimizing ‖w‖², are therefore a better approach to multi-collinear data than a regression model used in a DOE. Moreover, it would be operationally disruptive, and may not even be feasible, to perform a full DOE analysis in an operational supply chain setting. This is precisely the reason why a simulation model was first built to allow DOE analysis. The proposed machine learning algorithm is thus more robust, in the sense that either DOE-created data or observational data can be used to learn the underlying supply chain model.

The DOE required the simulation runs to be made with each input variable set to either its minimum or maximum value (Montgomery, 1997). In all, 65 combinations of settings were used (a 1/4 fractional factorial design with a center point). For each setting, 10 independent simulation runs (replications) were performed. Therefore, a total of 650 runs of data were collected, of which 550 data points were randomly selected to form the training set used to train the SVM model with the random probe method. The remaining 100 data points were used for cross-validation purposes. However, in an actual operational setting, when training the SVM model with the random probe method, the data would normally, and more seamlessly and advantageously, be randomly collected from ongoing operations, as opposed to systematically collected according to a specific DOE configuration.

4.2. Modeling using SVMs and random probe method

All training input variables were first standardized to zero mean and unit variance. Using the data generated from the simulator, a 2nd-order polynomial SVM system was trained to discover the x–y relationship, i.e., Eq. (8).

The SVM was implemented in MATLAB using SVMlight (Joachims, 1998), a popular software package for SVM simulations, and its MATLAB interface by Schwaighofer (online). Usually, before training a SVM, certain parameters (kernel type and its associated parameters, as well as the regularization parameter C) must be chosen to ensure good prediction performance. This is called model selection, and it is problem specific. The typical method involves a grid search over a wide range of the parameter space to locate the parameter values which yield the lowest validation error. This method is computationally intensive and time-consuming. However, in our application, since a 2nd-order polynomial kernel function is usually sufficient to reasonably model a physical process such as a typical supply chain, the only parameter left is the regularization parameter C. This reduces the model selection problem to a one-dimensional search. The value of C was chosen using the validation method, which involves training the SVM on the training data, validating it on a separate set of data, and selecting the value of C that gives the lowest validation mean squared error. Once the 2nd-order polynomial SVM is trained, the random probe feature selection (or model reduction) method (with the threshold set to 0.1 and c² set to 1) can be applied to remove irrelevant terms, as described in Section 3.2.
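The one-dimensional search over C is a simple loop. A sketch with scikit-learn's SVR standing in for SVMlight (setting coef0=1 and gamma=1 makes the poly kernel exactly Eq. (7); the grid, epsilon value, and split names X_train, y_train, X_val, y_val are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVR

best_C, best_mse = None, np.inf
for C in 10.0 ** np.arange(-2, 4):   # candidate values of C on a log grid
    model = SVR(kernel="poly", degree=2, coef0=1.0, gamma=1.0,
                C=C, epsilon=0.1).fit(X_train, y_train)
    mse = np.mean((model.predict(X_val) - y_val) ** 2)
    if mse < best_mse:               # keep the C with the lowest validation MSE
        best_C, best_mse = C, mse
```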

4.3. Optimization using GAs

The GA was then applied to determine the optimum input settings, based on the mathematical models generated by the SVMs with the random probe method, using first a single objective function (profit) and then a multiple criteria objective function with conflicting objectives (sales, cost), which are further discussed in Sections 4.4.2 and 4.4.3, respectively. The GA was implemented in MATLAB using the GA toolbox by Chipperfield et al. (1994). In the GA toolbox we used, there is a linear scaling function which converts the objective function values (which are the same as the overall desirability) to fitness values, upon which the GA selects the individuals for reproduction. The purpose of this scaling function is to further differentiate the "goodness" of each individual objective's overall desirability. The hope is that, by feeding the input settings returned by the GA into the simulator, performance would be optimized as predicted by the learned model.

Similar to training a SVM, there are certain parameters in the GAs that must be chosen properly to ensure good performance, both in terms of finding


a global (or at least a very deep) optimum and a rapid rate of convergence. These parameters include the size of the parent population, parent/offspring ratio, selection type, crossover type, crossover rate, mutation type, and mutation rate. These parameters are all interrelated and have a great impact on the GA's performance. For example, on the one hand, a large parent population size requires more computation but allows a more thorough search for a global optimum over the entire input space. On the other hand, a small population size may speed up convergence (due to fewer computations in each generation), but it may yield a local optimum as the GA terminates, because the entire input space is not searched thoroughly. The effects of the other parameters are more subtle, and probably can best be discovered through experimentation. How to choose these parameters is usually problem specific, but fortunately there are usually some guidelines for doing so.

As a starting point, the parameters were initially chosen according to the settings for the robust genetic algorithm proposed by Ortiz et al. (2004), as shown in Table 3. Due to implementation restrictions of the GA toolbox we used, the selection type was chosen to be "Stochastic Universal Sampling" instead of "Tournament". In addition, as in Ortiz et al. (2004), the mutation rate was set to 0.125. Also shown are the final settings. For the single criterion optimization, the parent population was reduced to 15 in order to save computation time without any significant loss in performance. For the bi-criteria optimization, the parent population size and the parent/offspring ratio were increased to 50 and 1:15, respectively, to enlarge the search for a global optimum.

4.4. Experimental results

The simulator was set to run independently 10 times using the optimum input settings obtained from the following sources and approaches: (a) DOE/response surface analysis performed using Minitab (Minitab Training Manual, 2000), (b) the proposed machine learning and evolutionary algorithms (the reduced SVM model and GA), and (c) the best settings of the training data.

Table 3
Initial and final GA settings used in the experiment

Parameter               Initial GA settings            Final GA settings (single criterion)  Final GA settings (bi-criteria)
Parent population       20                             15                                    50
Parent/offspring ratio  1:7                            1:7                                   1:15
Selection type          Stochastic universal sampling  Stochastic universal sampling         Stochastic universal sampling
Crossover type          Single-point                   Single-point                          Single-point
Crossover rate          0.85                           0.85                                  0.85
Mutation rate           0.125                          0.125                                 0.125

4.4.1. Explanatory models and predictive behavior

Table 4(a) and (b) list, respectively, the coefficients (non-standardized) of the equations for the outputs y1, ..., y7 generated by the SVM with the random probe method and by the DOE. In using the random probe method, 10 2nd-order polynomial SVMs were trained independently to estimate the means and variances of w_1, ..., w_n and w_r, and the threshold and c² were set to 0.10 and 1, respectively. Some of the interaction and quadratic terms were regarded as insignificant and were removed from these final equations. The whole training process took several minutes on a Pentium 4 machine with 512 MB RAM.

The corresponding root mean squared errors and Fractional Variances Unexplained (FVUs) for both the training and validation data sets are shown in Table 5. The FVU is a standardized measure: the mean squared error divided by the variance of the output, which allows comparison among outputs with different scales and magnitudes. Both the training and validation FVUs of the reduced SVM and the DOE are quite small, indicating that both were able to predict the supply chain simulation outputs reasonably well. Note that both the training and validation data sets were used to run the DOE, while only the training data set was used for the SVM. Therefore, the validation root mean squared errors and FVUs of the DOE were slightly lower than those of the proposed machine learning algorithm, although comparable (i.e., not significantly different).
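The FVU itself is one line of arithmetic; a sketch (function name is ours):

```python
import numpy as np

def fvu(y_true, y_pred):
    """Fractional Variance Unexplained: MSE divided by the variance of the
    observed output, comparable across outputs of different scales."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```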


Table 4
Coefficients (up to 2nd-order polynomial) of the system equations for shortages, sales, returns, average inventory, number of orders, number of visits, and average order size generated by (a) the SVM with the random probe method and (b) DOE

(a)
Term        Shortages (y1)  Sales (y2)  Returns (y3)  Avg. inv. (y4)  No. orders (y5)  No. visits (y6)  Av. OS (y7)
Constant    3098     5406.8   -704.02  -398.2   46.57   567.02  -8.82
ADBV        -13.75   25.90    2.03     3.54     0.20    -2.35   0.46
OUT         6.56     -11.40   -2.71    3.05     -0.17   -0.81   1.15
TBS         -7.00    12.84    36.40    10.34    -0.05   -1.78   7.89
SLT         22.96    -29.34   13.74    5.15     -0.17   -2.01   -9.77
IINV        0.30     -0.63    0.84     0.90     -0.03   -0.05   0.11
RDBV        5.50     -17.86   2.35     0.76     -0.03   -0.01   0.17
MINQ        -8.17    7.84     2.29     0.87     -0.05   0.02    0.68
SDAY        -35.94   -56.80   3.43     1.59     -0.37   -5.98   -0.19
ADBV*ADBV   0.00     -0.05    -0.01    0        0.00    0.02    0
ADBV*OUT    0        0        0        0        0.00    0       0
ADBV*TBS    0        0        0        0.01     0.01    0       -0.06
ADBV*SLT    0.65     -1.24    0.66     0.25     -0.01   0       0
ADBV*IINV   -0.01    0.01     0        0.00     0.00    0.00    0.00
ADBV*RDBV   -0.20    0.28     -0.11    -0.01    0.00    0.04    0
ADBV*MINQ   -0.02    0        0        0        0.00    0       0.01
ADBV*SDAY   0.15     -0.20    0        -0.02    0.00    0       0
OUT*OUT     0        0        -0.02    0        0.00    0.01    0
OUT*TBS     -0.09    0.12     -0.28    -0.12    0.00    0.00    -0.05
OUT*SLT     -0.37    0.83     -0.48    -0.19    0.00    0       0
OUT*IINV    0.01     -0.01    0        0.00     0.00    0.00    0.00
OUT*RDBV    0.02     -0.03    0        0        0.00    0.00    0
OUT*MINQ    -0.05    0.08     -0.05    -0.02    0.00    0.00    0
OUT*SDAY    -0.07    0.09     0.06     0.02     0.00    0       0
TBS*TBS     0.01     -0.06    -0.32    0        0.00    0.06    0
TBS*SLT     0.28     -0.68    0.64     0.23     0       0.00    0
TBS*IINV    -0.01    0.01     0        0.00     0.00    0       0.00
TBS*RDBV    -0.05    0        0        0        0.00    0.00    0
TBS*MINQ    0        0        0.02     0.01     0.01    0       -0.03
TBS*SDAY    0.09     -0.13    -0.12    -0.05    0.00    0.00    0
SLT*SLT     0.05     -0.75    -0.17    0        0.02    0.25    0
SLT*IINV    -0.02    0.02     0        0        0.00    0       0.00
SLT*RDBV    -0.51    0.80     -0.47    -0.14    0.01    0.00    0
SLT*MINQ    -0.04    0        0.09     0.03     0.00    0.00    0.03
SLT*SDAY    0        0        0        0        0.00    0       0.05
IINV*IINV   0.00     0.00     0.00     0        0.00    0.00    0
IINV*RDBV   0.00     0        0        0        0.00    0.00    0
IINV*MINQ   0.00     0.00     0.00     0.00     0.00    0.00    0
IINV*SDAY   0.00     0.01     -0.01    0.00     0       0       0
RDBV*RDBV   0.23     -0.25    0.10     0.01     0.00    -0.04   -0.01
RDBV*MINQ   0.03     -0.06    0        0        0.00    0       0
RDBV*SDAY   -0.03    0.10     0        0        0.00    0       0
MINQ*MINQ   0.00     -0.01    0.00     0        0.00    0.00    0
MINQ*SDAY   0.07     -0.07    0        0        0.00    0       0
SDAY*SDAY   0.11     0.18     0.00     0        0.00    0.02    0

(b)
Term        Shortages (y1)  Sales (y2)  Returns (y3)  Avg. inv. (y4)  No. orders (y5)  No. visits (y6)  Av. OS (y7)
Constant    1100.12  -340.22  -448.60  -276.81  18.00   136.03  25.79
ADBV        -11.63   5.95     9.00     3.97     0.19    -1.71   -3.27
OUT         -23.84   31.27    -6.78    0.97     -0.58   -4.67   1.86
TBS         -17.17   22.39    26.52    11.42    -0.09   -0.02   7.24
SLT         -14.60   28.92    -14.08   -6.59    -0.12   0.00    0.00
IINV        0.39     -0.64    0.91     0.85     -0.03   0.00    0.18
RDBV        8.75     -0.99    2.32     0.92     0.01    -0.04   -0.33
MINQ        -4.71    3.51     2.27     0.66     -0.08   0.00    -0.95
SDAY        -3.04    3.37     1.90     0.88     0.09    0.00    -0.73
ADBV*ADBV   0.00     0.00     0.00     0.00     0.00    0.00    0.00
ADBV*OUT    0.00     0.05     0.00     0.00     0.00    0.00    0.00
ADBV*TBS    -0.05    0.00     0.00     0.00     0.01    0.00    -0.08
ADBV*SLT    0.62     -0.98    0.61     0.23     -0.01   0.00    0.00
ADBV*IINV   -0.01    0.00     0.00     0.00     0.00    0.00    0.00
ADBV*RDBV   0.00     0.00     0.00     0.00     0.00    0.00    0.00
ADBV*MINQ   0.00     0.00     0.00     0.00     0.00    0.00    0.02
ADBV*SDAY   0.13     -0.10    -0.05    -0.02    0.00    0.00    0.03
OUT*OUT     0.31     -0.42    0.00     0.00     0.01    0.06    0.00
OUT*TBS     -0.05    0.09     -0.27    -0.12    0.00    0.00    -0.10
OUT*SLT     -0.44    0.74     -0.45    -0.16    0.00    0.00    0.00
OUT*IINV    0.01     -0.01    0.00     0.00     0.00    0.00    0.00
OUT*RDBV    0.03     -0.06    0.00     0.00     0.00    0.00    0.00
OUT*MINQ    -0.04    0.07     -0.04    -0.02    0.00    0.00    0.00
OUT*SDAY    -0.03    0.04     0.07     0.03     0.00    0.00    0.00
TBS*TBS     0.00     0.00     0.00     0.00     0.00    0.00    0.00
TBS*SLT     0.57     -0.93    0.73     0.28     0.00    0.00    0.00
TBS*IINV    -0.01    0.01     0.00     0.00     0.00    0.00    0.00
TBS*RDBV    0.00     0.00     0.00     0.00     0.00    0.00    0.00
TBS*MINQ    0.00     0.00     0.02     0.01     0.01    0.00    -0.03
TBS*SDAY    0.14     -0.18    -0.12    -0.06    0.00    0.00    0.02
SLT*SLT     0.00     0.00     0.00     0.00     0.00    0.00    0.00
SLT*IINV    -0.02    0.02     0.00     0.00     0.00    0.00    0.00
SLT*RDBV    -0.46    0.77     -0.42    -0.17    0.01    0.00    0.00
SLT*MINQ    0.00     0.00     0.00     0.03     0.00    0.00    0.00
SLT*SDAY    0.23     -0.38    0.16     0.06     0.00    0.00    0.00
IINV*IINV   0.00     0.00     0.00     0.00     0.00    0.00    0.00
IINV*RDBV   0.00     0.00     0.00     0.00     0.00    0.00    0.00
IINV*MINQ   0.00     0.00     0.00     0.00     0.00    0.00    0.00
IINV*SDAY   0.00     0.01     -0.01    0.00     0.00    0.00    0.00
RDBV*RDBV   0.00     0.00     0.00     0.00     0.00    0.00    0.00
RDBV*MINQ   0.00     -0.03    0.00     0.00     0.00    0.00    0.00
RDBV*SDAY   -0.05    0.00     0.00     0.00     0.00    0.00    0.00
MINQ*MINQ   0.00     0.00     0.00     0.00     0.00    0.00    0.00
MINQ*SDAY   0.04     -0.05    0.00     0.00     0.00    0.00    0.01
SDAY*SDAY   0.00     0.00     0.00     0.00     0.00    0.00    0.00

Table 5
Training and validation root mean squared errors and FVUs of each output

                                              y1       y2       y3       y4       y5      y6      y7
Training RMSE    DOE                          26.20    44.40    26.30    10.10    0.59    1.23    9.94
                 SVM + random probe           27.8915  45.7545  26.5424  10.3317  0.6197  1.2181  13.7556
                 2nd-order polynomial SVM     27.6525  45.3065  26.5469  10.2661  0.6209  1.2133  11.425
Training FVU     DOE                          0.08     0.13     0.22     0.04     0.03    0.01    0.06
                 SVM + random probe           0.086    0.1385   0.2224   0.0372   0.0359  0.0104  0.1131
                 2nd-order polynomial SVM     0.0842   0.1361   0.2229   0.0368   0.0361  0.0104  0.0781
Validation RMSE  DOE                          29.80    43.60    25.50    9.74     0.59    1.20    9.84
                 SVM + random probe           33.5567  49.8234  26.4905  10.6325  0.6285  1.2689  10.3697
                 2nd-order polynomial SVM     33.5986  49.9872  28.2618  10.7114  0.63    1.2843  9.5613
Validation FVU   DOE                          0.10     0.13     0.21     0.03     0.03    0.01    0.06
                 SVM + random probe           0.1237   0.1643   0.2215   0.0394   0.0369  0.0113  0.0642
                 2nd-order polynomial SVM     0.1253   0.1670   0.2547   0.0404   0.0375  0.0117  0.0552

Fig. 2. Estimated profit ($) versus generations using the GA in single criterion optimization.

The root mean squared errors and FVUs of a full 2nd-order polynomial SVM (without using the random probe method to remove irrelevant terms) are also shown in the table. Except for y7, the SVM with the random probe method achieved lower validation root mean squared errors for all outputs than the full SVM, indicating the success of applying the random probe method to remove irrelevant terms.

4.4.2. Profit (single criterion) optimization

After building the mathematical models relating x and y by the SVMs, the GA was tested on both a single criterion and a multiple criteria objective function. For the single criterion case, our objective was to specify the x's that would maximize expected profit in dollars, which is a pre-specified function of the y's:²

Profit = Sales in dollars − Cost in dollars,   (17)

where

Sales in dollars = 40 · y2,   (18)
Cost in dollars = y3 + 20 · y4 + 200 · y5 + 200 · y6.   (19)
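A direct transcription of (17)–(19) (the function name is ours) makes the objective easy to check against Table 6(a):

def profit(y):
    """Profit per Eqs. (17)-(19); y = (y1, ..., y7) as defined above."""
    y1, y2, y3, y4, y5, y6, y7 = y
    sales = 40 * y2                              # Eq. (18)
    cost = y3 + 20 * y4 + 200 * y5 + 200 * y6    # Eq. (19)
    return sales - cost                          # Eq. (17)

# Sanity check against the DOE row of Table 6(a) (rounded means):
# profit([88, 896, 12, 168, 5, 36, 132]) -> 24268, i.e. about 2.43 x 10^4.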

It should be noted that y1 and y7 do not explicitly appear in (17)–(19). Eq. (17) is the objective function used in the GA. The GA was run independently 10 times, with a different initial parent population in each trial. The final GA parameter settings shown in Table 3 were determined experimentally to ensure that the GA converged to the same final optimum value in all 10 trials while simultaneously minimizing the computational load. Fig. 2 shows a typical plot of the average estimated profit of all the individuals in each generation versus the generation step. The average estimated profit converged to a final optimum value after about twenty generations.
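To make the procedure concrete, the sketch below shows a minimal real-coded GA of the kind described: roulette-wheel selection, arithmetic crossover, and uniform mutation over the eight input settings, with fitness given by the SVM-predicted profit per (17)–(19). The bounds, operators, and parameters here are illustrative assumptions (the actual settings appear in Table 3), and `svm_predict` is a hypothetical stand-in for the trained SVM response models.

import numpy as np

rng = np.random.default_rng(0)
LOW  = np.array([7.0, 10.0, 7.0, 2.0, 250.0, 0.0, 10.0, 154.0])   # illustrative bounds
HIGH = np.array([21.0, 50.0, 21.0, 4.0, 500.0, 5.0, 40.0, 168.0])

def profit_hat(pop, svm_predict):
    """SVM-predicted profit for each row of settings in pop."""
    y = svm_predict(pop)                      # (n, 7) array, columns y1..y7
    sales = 40.0 * y[:, 1]                    # Eq. (18)
    cost = y[:, 2] + 20.0 * y[:, 3] + 200.0 * y[:, 4] + 200.0 * y[:, 5]  # Eq. (19)
    return sales - cost                       # Eq. (17)

def ga_maximize(svm_predict, pop_size=30, n_gen=120, p_mut=0.05):
    pop = rng.uniform(LOW, HIGH, size=(pop_size, LOW.size))
    for _ in range(n_gen):
        f = profit_hat(pop, svm_predict)
        w = f - f.min() + 1e-9                # roulette-wheel selection weights
        parents = pop[rng.choice(pop_size, pop_size, p=w / w.sum())]
        a = rng.uniform(size=(pop_size, 1))   # arithmetic crossover
        pop = a * parents + (1.0 - a) * np.roll(parents, 1, axis=0)
        mut = rng.uniform(size=pop.shape) < p_mut   # uniform mutation
        pop[mut] = rng.uniform(LOW, HIGH, size=pop.shape)[mut]
    return pop[np.argmax(profit_hat(pop, svm_predict))]

Feeding the returned setting back into the simulator, as the authors do, then provides the actual (rather than predicted) performance.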

As most of the time required by the GA was spent computing the objective function of the offspring, the total number of objective function evaluations provides a good indication of the computational load required. In this single criterion optimization, the GA required about 2100 evaluations of the objective function to reach convergence, which translated to a few seconds on a Pentium 4 machine with 512 MB RAM.

² AGA's overall objective was to investigate the relationships among the input and output variables, x and y respectively. We believed it would also be useful to model and optimize the system. This would present an opportunity to apply the proposed machine learning and genetic algorithms and compare their results to traditional DOE, response surface design, and optimization.

The optimum settings returned by the GA and DOE, together with the 65 different settings of the training data, were all input into the simulator as parameters (x). All resulting seven y outputs were recorded and averaged over 10 independent simulation runs. Table 6(a) shows the actual simulation outputs (y) given by the proposed SVM and GA algorithms, by DOE, and by the three training data settings with the highest mean profit. The profit in dollars was then computed using (17)–(19) and is also shown in Table 6(a). The corresponding optimum settings (x) given by the proposed machine learning algorithm, DOE, and the best three settings of the training data are depicted in Table 6(b).

All hypothesis tests were conducted at the 5% level.³ As Table 6(a) shows, the optimum setting obtained from the proposed machine learning algorithm achieved the highest expected profit, $25,038, which exceeds the optima from DOE ($24,285 on average) and from the best three training data settings (which ranged from $24,218 to $24,785). The difference in means between the machine learning/evolutionary algorithms and DOE, however, was not statistically significant (Δx̄ = 753; p = 0.158).

³ p-critical = 0.05; the computed p-value is designated as p.


Table 6
Performance results and optimum input settings—single criterion

(a) Actual values of shortages, sales, returns, average inventory, number of orders, number of visits,
average order size, sales in dollars, costs in dollars, and profit in dollars obtained from the simulator
with the optimum settings given by each method^a

                     Shortages  Sales      Returns  Average    No.     No.     Average     Sales in   Costs in    Profit in
                                                    inventory  orders  visits  order size  dollars    dollars     dollars
                                                                                           × 10^4     × 10^4      × 10^4
Proposed algorithms  175(0)     825(766)   30(152)  207(39)    5(0)    14(0)   100(18)     3.30(122)  0.80(1.58)  2.50(150)
DOE                   88(0)     896(596)   12(108)  168(26)    5(0)    36(0)   132(19)     3.58(95)   1.16(1)     2.43(111)
Best three settings of training data:
                     192(397)   796(1690)  46(694)  182(69)    5(0)    12(0)   121(32)     3.19(270)  0.71(3)     2.48(312)
                      88(0)     895(1270)  14(241)  168(40)    5(0)    36(0)   132(23)     3.58(203)  1.16(2)     2.42(241)
                     185(0)     824(1270)  42(261)  265(53)    3(0)    12(0)   122(81)     3.29(202)  0.83(2)     2.46(245)

(b) Optimum settings for maximizing profit given by each method: single criterion

                     ADBV (x1)  OUT (x2)  TBS (x3)  SLT (x4)  IINV (x5)  RDBV (x6)  MinQ (x7)  SDay (x8)
Proposed algorithms  18         50        16        2         355        0          40         154
DOE                   7         50        21        2         250        0          40         154
Best three settings of training data:
                     21         50        21        2         500        0          40         168
                      7         50        21        2         250        0          40         168
                      7         50        21        2         500        0          40         154

Attributes with different optimum values are shown in italics.
^a Values in each cell are Mean (Variance).


The proposed algorithms also yielded a higher profit variance than DOE, but this difference was also not statistically significant (F = 150/110, p = 0.662). Hence, in terms of profit, the proposed machine learning algorithms performed comparably to (perhaps even slightly better than) DOE, and just as consistently.

The best three settings in the training data achieved a slightly lower profit than the machine learning algorithm, although the differences were not statistically significant (the corresponding p-values were 0.715, 0.206, and 0.504, and the confidence intervals of their means overlapped considerably with those of the proposed algorithms and DOE). Moreover, the proposed algorithms yielded a lower profit variance than these three settings (again not statistically significant, with corresponding p-values of 0.290, 0.488, and 0.475).

Table 6(b) reveals that four of the eight input settings obtained from each method were similar; the other four input attributes, ADBV (average time between merchandiser visits), TBS (time between shipments), IINV (initial inventory), and SDay (day shipments are stopped), differed. In addition, some of the input variables were at their boundary values, implying that better performance (i.e., lower shortages and higher sales) might be attainable if the ranges of these input variables could be expanded. The fact that a global optimum setting was found near the boundary is unusual and implies that this firm (and perhaps others) specified input parameter ranges that may be too restrictive.

Fig. 3(a) portrays the variance versus the mean of the profit in dollars for all 67 settings (the proposed algorithms, DOE, and 65 different settings of the training set). Points further to the bottom right are more desirable, as they represent a higher mean profit and lower variance. Similarly, Figs. 3(b) and 3(c) depict the respective variances versus the means of the sales and costs, and Fig. 3(d) shows the expected sales versus expected costs. An efficient frontier is sketched in as the solid line, with the cost–sales points given by the proposed machine learning algorithm and DOE lying essentially on it.

4.4.3. Sales versus cost tradeoffs (bi-criteria) optimization

It may be the case that profit is not the overall objective. For example, an enterprise may wish to capture market share by focusing on sales (e.g., a growth company), or it may wish to emphasize cost reduction (e.g., a mature company whose product may be a commodity). Given these possible considerations, an enterprise might wish to make tradeoffs between sales and costs.

[Figure 3 — four scatter panels with log-scale variance axes; series: Training Data, SVM+Random Probe, DOE; panel (d) shows the efficient frontier as a solid line.]
Fig. 3. Single criterion performance. (a) Variance versus mean of profit for all settings, (b) variance versus mean of sales for all settings, (c) variance versus mean of costs for all settings, (d) trade-off between sales versus costs.

This part of our study tests the GA's ability to solve such multi-objective optimization problems by changing the objective function from maximizing profit (single criterion) to simultaneously maximizing sales in dollars and minimizing cost in dollars, for example by placing more weight on maximizing sales in dollars (multiple criteria). Such multiple criteria optimization is usually very difficult to solve analytically, owing to the highly nonlinear objective function. We followed an approach similar to that used in Minitab to optimize multiple criteria, creating a desirability function and using it as the objective function in the GA. It is given as

D = d1^t1 · d2^t2,   (20)

where

d1 = 1                               if s > B1,
     ((s − A1)/(B1 − A1))^w1         if A1 ≤ s ≤ B1,
     0                               if s < A1,

and

d2 = 0                               if c > A2,
     ((c − A2)/(B2 − A2))^w2         if B2 ≤ c ≤ A2,
     1                               if c < B2,

where s = 40 · y2 and c = y3 + 20 · y4 + 200 · y5 + 200 · y6 are the estimated responses of the sales and cost in dollars ((10) and (11)) computed by the SVM, respectively; Bi is the target value; Ai is the minimum (for sales) or the maximum (for cost) acceptable response; wi is the weight that affects the shape (convex, linear, concave) of the desirability function; and ti is the importance weight (tradeoff) given to each response. Their values were set to those shown in Table 7(a). The overall desirability D is simply the geometric mean of the individual desirabilities di, which is exactly the desirability function used in DOE when dealing with multiple conflicting criteria optimization problems in Minitab. The importance weight given to sales, t1, was arbitrarily set to be twice as large as that given to cost, t2. This represents the situation in which an enterprise wishes to place twice as much emphasis on market share (top-line growth) as on cost, and is not necessarily interested primarily in maximizing profit (bottom-line growth) per se.
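As a concrete check, the following minimal transcription of (20) with the Table 7(a) constants reproduces the reported desirability values. The function and variable names are ours; in practice s and c would come from the SVM response models.

def d1(s, A1=0.0, B1=50_000.0, w1=1.0):
    # individual desirability for sales: larger is better
    if s > B1: return 1.0
    if s < A1: return 0.0
    return ((s - A1) / (B1 - A1)) ** w1

def d2(c, A2=50_000.0, B2=0.0, w2=1.0):
    # individual desirability for cost: smaller is better
    if c > A2: return 0.0
    if c < B2: return 1.0
    return ((c - A2) / (B2 - A2)) ** w2

def overall_desirability(s, c, t1=2.0, t2=1.0):
    return d1(s) ** t1 * d2(c) ** t2   # Eq. (20)

# e.g. overall_desirability(36_908, 13_795) evaluates to about 0.395,
# matching the proposed algorithms' entry in Table 7(b).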

As in the single criterion optimization, the GA was run independently 10 times, with a different initial parent population in each trial. The final GA parameter settings are shown in Table 3. Fig. 4 shows a typical plot of the average overall desirability of all the individuals in each generation versus the generation step. The average overall desirability converged to a final optimum value after about 10 generations.


[Figure 4 — line plot; x-axis: Generations (0–120); y-axis: Overall desirability (0–0.45).]
Fig. 4. Overall desirability versus generations using the GA in bi-criteria optimization.

Table 7
Performance results and optimum input settings—bi-criteria

(a) The values of A, B, w and t used in the desirability function

                                                          Sales in dollars s   Cost in dollars c
A (minimum (for sales) or maximum (for cost)
  acceptable response)                                    0                    50,000
B (target value)                                          50,000               0
w (weight)                                                1                    1
t (importance weight)                                     2                    1

(b) Actual values of sales and cost in dollars obtained from the simulator and the corresponding
overall desirability value with the optimum settings given by each method^a

                                      Sales in dollars  Costs in dollars  Overall
                                      × 10^4            × 10^4            desirability D
Proposed algorithms                   3.69(126)         1.38(6)           0.395(5)
DOE                                   3.68(50)          1.39(3)           0.391(2)
Best three settings of training data  3.66(31)          1.28(1)           0.400(2)
                                      3.77(115)         1.49(4)           0.399(5)
                                      3.58(203)         1.16(2)           0.395(11)

(c) Optimum settings for simultaneously maximizing sales in dollars and minimizing cost in dollars
given by each method: bi-criteria

                                      ADBV (x1)  OUT (x2)  TBS (x3)  SLT (x4)  IINV (x5)  RDBV (x6)  MinQ (x7)  SDay (x8)
Proposed algorithms                    7          50        12        3         426        0          40         168
DOE                                    7          50         7        2         250        0          40         168
Best three settings of training data   7          50        21        2         500        0          40         154
                                       7          50         7        2         500        0          10         168
                                       7          50        21        2         250        0          40         168

^a Values in each cell are Mean (Variance).


In this bi-criteria optimization, the GA required about 7500 evaluations of the objective function to reach convergence, which translated to a few seconds on a Pentium 4 machine with 512 MB RAM.

Actual sales and costs in dollars given by the simulation, and the corresponding overall desirability values for each method, are shown in Table 7(b). The corresponding optimum settings given by the proposed machine learning algorithm and DOE are depicted in Table 7(c). The optimum setting obtained from the proposed machine learning algorithm achieved sales of $36,908 and a cost of $13,795, with an overall desirability of 0.395, while that obtained from DOE achieved sales of $36,776 and a cost of $13,866, with an overall desirability of 0.391. These values indicate that the proposed machine learning algorithm performed slightly but not significantly better than the traditional DOE method (p = 0.676). As with profit maximization, we compared these results with the best three input settings of the training data set. As Table 7(b) shows, the best input setting of the training data achieved sales of $36,644 and a cost of $12,801, with an overall desirability of 0.400, which is only slightly and not significantly larger than that of the proposed machine learning algorithm (p = 0.618).

Table 7(c) shows that the input settings obtained from each method were similar, except for the input attributes TBS (time between shipments), SLT (shipment lead time), IINV (initial inventory), MinQ (minimum shipment quantity), and SDay (day shipments are stopped).


[Figure 5 — scatter plot; x-axis: Mean (0–0.5); y-axis: Variance (0–0.008); series: Training Data, SVM+Random Probe, DOE.]
Fig. 5. Bi-criteria performance—variance versus mean of overall desirability for all settings.


In addition, similar to the results for profit maximization, some of the input attributes were at their boundary values, implying that better performance (i.e., fewer shortages and higher sales) may be achievable if the ranges of the input attributes were expanded.

Fig. 5 depicts the variance versus the mean of the overall desirability, where points further to the bottom right are more desirable, as they represent settings yielding higher mean overall desirability and lower desirability variance. The optimum settings obtained from both the proposed machine learning algorithm and DOE are located in this desired region, indicating the GA's ability to optimize this multiple criteria objective.

Again, data would ordinarily not be organized and collected from a DOE-created design to train the proposed machine learning algorithms. In fact, observational data is usually a virtue, because variation in the training data (i.e., in each input attribute) is preferable in order to represent the input space fully. Moreover, machine learning algorithms are particularly suited to observational data that are multicollinear. Notwithstanding the use of a DOE design here, the results of this experiment show the robustness of the proposed machine learning algorithms in seamlessly modeling and optimizing a supply chain, and their potential to do so for any data-rich system.

5. Conclusions and future research

The main goal of this paper was to demonstrate and validate the applicability and desirability of using machine learning techniques to model, understand, and optimize complex supply chains, which are data rich but mathematically complex. Based on this experimental study, the proposed machine learning algorithms show promise as a potential methodology for system modeling and optimization vis-à-vis traditional DOE methods, which often are disruptive to ongoing operations and require expertise in statistics. The proposed machine learning algorithms have also been applied successfully for process improvement in the pharmaceutical industry, with similar conclusions.

Two main virtues of the proposed approach are: (a) it is automatic, requiring little human intervention, and (b) it can be made adaptive. Information technology (IT) has made the use of machine learning algorithms not only feasible but indeed a potentially highly desirable management and decision making tool, saving companies considerable time and money by reducing the need for statistical training of employees and the need to perform experiments for improvement and optimization. The proposed machine learning and evolutionary algorithms can easily be extended to other management practices and applications in manufacturing, service, and healthcare. Our next step will be to implement the proposed machine learning and evolutionary algorithms in an actual, real-time supply chain environment.

In terms of future research, the proposed machine learning algorithms can be further modified and improved by:

• Incorporating Bayesian procedures to train the algorithms more parsimoniously, i.e., with less data, by making use of prior knowledge of the model and its parameters, revised in light of the training data.

• Incorporating an adaptive capability to continually track and adjust to additional data and any changes occurring in the supply chain system. If there were a sudden internal change in the system, the system would be able to detect the disturbance (spurious or sustained) and alert the user to the change. The user would then have the option of collecting new data and retraining the SVM, or of applying adaptive filtering techniques such as least mean squares (LMS) to update the system model⁴ (a minimal LMS sketch appears after this list).

• Incorporating sensitivity analysis to illustrate the relative importance of each input variable or factor. Sensitivity analysis is particularly essential and beneficial in cases where the process system is modeled by a complex nonlinear mathematical structure other than a 2nd-order polynomial SVM. In such situations, a variance-based approach to global sensitivity analysis, such as the Fourier amplitude sensitivity test (FAST) (Cukier et al., 1973; Sobol', 1993), can be employed to discover how variations in the outputs of a model relate to variations in the model inputs.

⁴ We are currently working on making the proposed machine learning models adaptive, using statistical process control techniques to detect a system (model) change and then recursively updating and reoptimizing the system model.
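As an illustration of the adaptive option in the second bullet, here is a minimal LMS sketch (hypothetical; the paper names LMS only as a candidate technique). Each new observation (x, y) nudges the weights of a linear surrogate model along the negative gradient of the squared prediction error:

import numpy as np

def lms_update(w, x, y, mu=1e-4):
    """One least-mean-squares step on weights w for a new sample (x, y);
    mu is the learning rate."""
    e = y - w @ x              # prediction error on the new observation
    return w + mu * e * x      # stochastic-gradient correction

A sustained drift in the error e is precisely the kind of disturbance that the statistical process control monitoring mentioned in footnote 4 would be designed to flag.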

References

Bradley, P.S., Mangasarian, O.L., 1998. Feature selection via concave minimization and support vector machines. In: Shavlik, J. (Ed.), Proceedings of the 15th International Conference on Machine Learning, pp. 82–90.
Brank, J., Grobelnik, M., Milic-Frayling, N., Mladenic, D., 2002. Feature selection using linear support vector machines. Technical Report MSR-TR-2002-63, Microsoft Research, Microsoft Corporation.
Chi, H.M., Ersoy, O., Moskowitz, H., 2005. Feature selection using random probes and linear support vector machines. In: Computational Intelligence: Methods and Applications (CIMA), Istanbul, Turkey, December 15–17, in press.
Chipperfield, A.J., Fleming, P.J., Pohlheim, H., Fonseca, C.M., 1994. Genetic algorithm toolbox user's guide. ACSE Research Report No. 512, University of Sheffield.
Chiu, C., Yih, Y., 1995. A learning-based methodology for dynamic scheduling in distributed manufacturing systems. International Journal of Production Research 33 (11), 3217–3232.
Cukier, R.I., Fortuin, C.M., Shuler, K.E., Petschek, A.G., Schaibly, J.H., 1973. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. The Journal of Chemical Physics 59, 3873–3878.
Daniel Sr., J., Rajendran, C., 2005. A simulation-based genetic algorithm for inventory optimization in a serial supply chain. International Transactions in Operational Research 12 (1), 101–127.
Emerson, D., Piramuthu, S., 2004. Agent-based framework for dynamic supply chain configuration. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04).
Fletcher, R., 1987. Practical Methods of Optimization. John Wiley and Sons, New York.
Hamamoto, S., Yih, Y., Salvendy, G., 1999. Development and validation of genetic algorithm-based facility layout—a case study in the pharmaceutical industry. International Journal of Production Research 37 (4), 749–768.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
Holland, J., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, MI.
Joachims, T., 1998. Making large-scale SVM learning practical. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA.
Kim, C., Min, H.-S., Yih, Y., 1998. Integration of inductive learning and neural network for multi-objective FMS scheduling. International Journal of Production Research 36 (9), 2497–2509.
Kimbrough, S.O., Wu, D.J., Zhong, F., 2002. Computers play the beer game: Can artificial agents manage supply chains? Decision Support Systems 33, 323–333.
Liang, T.P., Moskowitz, H., Yih, Y., 1992. Integrating neural networks and semi-Markov processes for automated knowledge acquisition: An application to real-time scheduling. Decision Sciences 23 (6), 1297–1314.
Lin, F., Pai, Y., 2000. Using multi-agent simulation and learning to design new business processes. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans 30 (3), 380–384.
Lu, T.P., Chang, T.M., Yih, Y., 2005. Production control framework for supply chain management—an application in the elevator manufacturing industry. International Journal of Production Research 43 (20), 4219–4233.
Min, H.-S., Yih, Y., Kim, C., 1998. A competitive neural network approach to multi-objective FMS scheduling. International Journal of Production Research 36 (7), 1749–1765.
Minitab Training Manual, 2000. Factorial DOE using Minitab. Minitab Inc., Rel. 13, Ver. 2.2.
Montgomery, D.C., 1997. Design and Analysis of Experiments, fourth ed. John Wiley and Sons, New York.
Müller, K.R., Smola, A., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V., 1997. Predicting time series with support vector machines. In: Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN 1997), pp. 999–1004.
Naso, D., Surico, M., Turchiano, B., Kaymak, U., 2004. Genetic algorithms in supply chain scheduling of ready-mixed concrete. ERIM Report Series Reference No. ERS-2004-096-LIS.
Ortiz Jr., F., Simpson, J.R., Pignatiello Jr., J.J., Heredia-Langner, A., 2004. A genetic algorithm approach to multiple-response optimization. Journal of Quality Technology 36 (4), 432–450.
Schwaighofer, A. MATLAB interface to SVM light. Institute for Theoretical Computer Science, Graz University of Technology. Available from: <http://www.cis.tugraz.at/igi/aschwaig/software.html>.
Smola, A., Schölkopf, B., 1998. A tutorial on support vector regression. NeuroCOLT2 Technical Report Series, NC2-TR-1998-030.
Sobol', I.M., 1993. Sensitivity analysis for nonlinear mathematical models. Mathematical Modeling and Computational Experiment 1, 407–414.
Stoppiglia, H., Dreyfus, G., Dubois, R., Oussar, Y., 2003. Ranking a random feature for variable and feature selection. Journal of Machine Learning Research 3, 1399–1414.
Sun, Y.-L., Chang, T.M., Yih, Y., 2005. A learning-based adaptive controller for dynamic manufacturing cells. International Journal of Production Research 43 (14), 3011–3025.
Trafalis, T.B., Ince, H., 2000. Support vector machine for regression and applications to financial forecasting. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2000, Como, Italy, pp. 348–353.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vapnik, V., Golowich, S., Smola, A., 1996. Support vector method for function approximation, regression estimation and signal processing. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.
Wang, J.-Y., Yih, Y., 1997. Using neural networks to select control strategy for automated storage and retrieval systems (AS/RS). International Journal of Computer Integrated Manufacturing 10 (6), 487–495.
Wang, S., Zhu, W., Liang, Z., 2001. Shape deformation: SVM regression and application to medical image segmentation. In: International Conference on Computer Vision (ICCV'01), Vancouver, BC, Canada, vol. 2, pp. 209–216.
Winter, G., Periaux, J., Galan, M., 1995. Genetic Algorithms in Engineering and Computer Science. John Wiley and Sons Ltd., New York.
Wu, S.-J., Gebraeel, N., Lawley, M., Yih, Y., in press. A neural-network integrated decision support system for the optimal predictive maintenance policy of rotating machinery. IEEE Transactions on Systems, Man, and Cybernetics—Part A.
Yih, Y., Nof, S.Y., 1991. The impact of integrating knowledge-based technologies in manufacturing: An evaluation. Computer-Integrated Manufacturing Systems 4 (4), 254–263.