kleijnen - a methodology for fitting and validating metamodels in simulation
TRANSCRIPT
Theory and Methodology
A methodology for fitting and validating metamodels in simulation 1
Jack P.C. Kleijnen a,*, Robert G. Sargent b
a Department of Information Systems (BIK)/Center for Economic Research, Tilburg University (KUB), 5000 LE Tilburg, The Netherlands
b Simulation Research Group, Department of Electrical Engineering and Computer Science, 439 Link Hall, Syracuse University,
Syracuse, NY 13244, USA
Received 1 December 1997; accepted 1 October 1998
* Corresponding author. Tel.: +31-13-466-2029; fax: +31-13-466-3377; e-mail: [email protected]
1 Two anonymous referees' comments on the first draft led to an improved organization of our paper.
Abstract
This paper proposes a methodology that replaces the usual ad hoc approach to metamodeling. This methodology considers validation of a metamodel with respect to both the underlying simulation model and the problem entity. It distinguishes between fitting and validating a metamodel, and covers four types of goal: (i) understanding, (ii) prediction, (iii) optimization, and (iv) verification and validation. The methodology consists of a metamodeling process with 10 steps. This process includes classic design of experiments (DOE) and measuring fit through standard measures such as R-square and cross-validation statistics. The paper extends this DOE to stagewise DOE, and discusses several validation criteria, measures, and estimators. The methodology covers metamodels in general (including neural networks); it also gives a specific procedure for developing linear regression (including polynomial) metamodels for random simulation. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Simulation; Approximation; Response surface; Modeling; Regression
1. Introduction
Although metamodels have been quite frequently studied in the simulation literature and applied in simulation practice, this has been done in an ad hoc way. Therefore in this paper we offer a methodology for developing metamodels. Our methodology has the following novel characteristics: (i) We strictly distinguish between fitting and validating a metamodel. (ii) We emphasize the role of the problem entity besides the metamodel and the simulation model. (iii) We assume that any model should be developed for a specific goal; for metamodels we distinguish four types of goal. Guided by these three characteristics, we derive a metamodeling process with 10 steps. We also give a detailed procedure for
validating metamodels that use linear regression analysis in random simulation; we feel that this procedure is suited to all four goals of metamodeling.
Our methodology is meant for both researchers and practitioners. We assume that those readers have a basic knowledge of simulation and mathematical statistics, especially regression analysis. The characteristics of our methodology are detailed as follows.
We distinguish the relationships among metamodel, simulation model, and problem entity that are displayed in Fig. 1, where w.r.t. stands for 'with respect to'. A problem entity is some system (real or proposed), idea, situation, policy, or phenomenon that is being modeled. A simulation model is a causal model of some problem entity; this model may be deterministic or stochastic. A metamodel is an approximation of the input/output (I/O) transformation that is implied by the simulation model; the resulting black-box model is also known as a response surface. There are different types of metamodels; for example, polynomial regression models (which are a type of linear regression), splines (which partition the domain of applicability into subdomains and fit simple regression models to each of the subdomains), and neural networks (a type of non-linear regression); see Barton (1993, 1994), Friedman (1996), Huber et al. (1996), Kleijnen (1999), Pierreval (1996), Yu and Popplewell (1994).
Validation is the 'substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model'; see Sargent (1996) and Schlesinger et al. (1979). Fig. 1 shows that validation of a metamodel relates to both the simulation model and the problem entity. To validate a metamodel, we must know the accuracy required of the metamodel; this depends on the goal of the metamodel. We do not discuss validation of the underlying simulation model, except when discussing the use of a metamodel to aid in this validation (for validation of simulation models see Beck et al., 1997; Kleijnen, 1995b; Sargent, 1996 and http://manta.cs.vt.edu/biblio/). So we assume that the simulation model under consideration is valid, unless stated otherwise.
Fitting applies mathematical and statistical techniques to a set of simulation I/O data, to (i) estimate the metamodel's parameter values, and (ii) evaluate the estimated parameter values with respect to the data set, using quantitative criteria. Prior to fitting a metamodel, we must specify the type and form (or class) of the metamodel. Whereas validation requires (domain) knowledge about the problem entity and the specified accuracy required of the metamodel, fitting concerns only the determination of a 'good' set of parameter values from the data set. So fitting is not concerned with whether the metamodel adequately describes the problem entity and the simulation model, given the goal of the metamodel.
For metamodels we identify four general goals: (i) understanding the problem entity, (ii) predicting values of the output or response variable, (iii) performing optimization, and (iv) aiding the verification and validation (V & V) of a simulation model. These goals need attention when specifying the type and form of metamodel to be fitted and when determining the validity of the metamodel.
Fig. 1. Metamodel, simulation model, and problem entity.
The study of a problem entity usually distinguishes several outputs. Hence, the corresponding simulation also has multiple response variables. Current practice, however, uses metamodels with a single output: a separate metamodel is developed for each output. Indeed we develop a methodology
for a single output; this methodology can be applied to each of the single-response metamodels.
We use different symbols for the I/O data of problem entity, simulation, and metamodel: see Fig. 2. We shall return to the individual symbols.
We organize the remainder of this paper as follows. Section 2 covers our methodology for developing metamodels, consisting of a process with 10 steps. Section 3 details some steps. Section 4 focuses on linear-regression metamodels in random simulation. Section 5 contains research issues. Section 6 provides a summary.
2. The metamodeling process
We suggest the following 10 steps for developing a metamodel.
1. Determine the goal of the metamodel: Let us reconsider the four goals mentioned above. (a) Problem entity understanding: A goal may be understanding the internal behavior of a problem entity (e.g., determining the 'bottlenecks' in a system), understanding the behavior of the output, performing sensitivity analysis (i.e., determining how sensitive the output is to changes to the inputs), and conducting what-if studies. (b) Prediction: The metamodel may replace the simulation, to obtain a set of values of the output for a specific set of values of the inputs. The metamodel is then used routinely instead of the simulation, since the metamodel is usually quicker and easier to use than the simulation. (c) Optimization: The metamodel may be used to determine the set of values of the problem entity inputs that optimizes a specific objective function. This function may depend on one or more of the problem entity's outputs. (d) Aid in V & V of the simulation: see Section 1.
2. Identify the inputs and their characteristics: We distinguish between input variables (briefly: inputs) and parameters (see Zeigler, 1976). Inputs are directly observable (for example, number of servers), whereas parameters require statistical inference (for example, service rate $\lambda$). (The simulation computer program may treat these inputs and parameters as 'inputs'.) For each input we should determine whether the input is deterministic or random; also we must select a measurement scale (these scales, namely, nominal, ordinal, interval, ratio, and absolute, are discussed in Kleijnen (1987, pp. 138–141)). The metamodel has 'independent variables' x, which are functions of the simulation inputs and parameters; for example, $x_1 = \lambda^2$; also see Fig. 2. The independent variables of the metamodel should be identified. The simulation inputs and parameters are called factors in the design of experiments (DOE). Even a random simulation has deterministic inputs and parameters; the only random element is the seed of its pseudorandom number generator $s_0$, but even this seed may be fixed.
3. Specify the domain of applicability (experimental region): We should determine the combination of values of the factors for which the metamodel is to be valid. (To obtain a valid simulation, its factor values also need to be restricted to a certain domain; see the 'experimental frame' concept in Zeigler, 1976.)
4. Identify the output variable and its characteristics: The output should be identified, and, like the inputs, this output must be classified as random or deterministic, and a measurement scale must be selected.
Fig. 2. Notation for I/O data of problem entity, simulation, and metamodel.
5. Specify the accuracy required of the metamodel: We specify the range of accuracy required of the output of the metamodel, for its domain of
applicability, with respect to both the simulation and the problem entity. This range depends on the goal of the metamodel and also, perhaps, on the goal of the simulation study. The accuracy required of the metamodel when applied to the simulation may differ from the accuracy when it is applied to the problem entity. The ranges of accuracies may be specified via the validity measures discussed in the next step.
6. Specify the metamodel's validity measures and
their required values: We determine the metamodel's
validity measures (e.g., absolute error and absolute
relative error), along with their required values.
7. Specify the metamodel, and review this specification: First we select a type of metamodel; examples are polynomial regression models and neural networks (nets). Next, we select a form of the metamodel from the type selected; examples are first- or second-degree polynomials, and neural nets with different numbers of levels and nodes. The users of the metamodel should review the type and form of metamodel specified, to ensure that the particular metamodel is satisfactory for its intended goals.
8. Specify a design including tactical issues, and review the DOE: We determine the type of design to be used (e.g., $2^{k-p}$) and the specific (say) n design points for which data are to be collected from the simulation. The resulting I/O data are partitioned into two parts: one for fitting and one for validating the metamodel. Determining the design is frequently called a strategic issue in DOE. We must also decide on tactical issues when the simulation is random: the number of observations (runs) at each design point must be determined. Further, we may use variance reduction techniques (e.g., common pseudorandom numbers). There may be interaction between strategic and tactical issues; for example, common and antithetic numbers may be selected following the well-known Schruben–Margolin strategy; see Donohue (1995). In this step we must finally review all DOE aspects, to determine whether they are satisfactory for the goal, type, and form of metamodel specified and for the validation of the metamodel.
9. Fit the metamodel: We run the simulation to obtain those I/O data specified in Step 8 for fitting the metamodel. (We do not yet obtain the data for determining validity; see the next step.) From these data we estimate the parameter values of the metamodel, using (say) least squares. We evaluate these estimates, using mathematical and statistical criteria. For example, if the metamodel is a linear-regression model, then a statistical analysis can determine if the metamodel is overfitted (i.e., some parameters can be set to zero). If the assessment is not satisfactory, then we repeat Steps 7–9.
10. Determine the validity of the fitted metamodel: We run the simulation to obtain the data specified in Step 8 for validating the metamodel. We then determine if the metamodel satisfies the validity measures with respect to the simulation (specified in Step 6), first for the validity data set, and then for the fitting data set. If the metamodel satisfies the validity measures for both data sets, then we consider the metamodel valid relative to the simulation; else we repeat Steps 7–10. Lastly, we determine the validity of the metamodel versus the problem entity; the amount of testing and evaluation depends on the goal of the metamodel. If validity cannot be established with respect to the problem entity, then we repeat Steps 7–10.
3. Discussion of some steps
In this section we detail some of the steps in the
metamodeling process presented in Section 2. We
do not devote equal space to each step: we are brief
on information widely known in the simulation
literature. Specically, Section 3.1 discusses Step 1
(determine the goal of the metamodel), Section 3.2
returns to Step 6 (specify the metamodel's validity
measures and their required values), Section 3.3
details Step 8 (specify a design including tactical
issues, and review the DOE), Section 3.4 treatsStep 9 (t the metamodel), and Section 3.5 covers
Step 10 (determine the validity of the tted meta-
model).
3.1. Step 1: Determine the goal of the metamodel
3.1.1. Goal 1 (understanding)
We distinguish the following three levels of understanding.
(a) Direction of output: does the output increase or decrease for an increase in a specific input? This goal requires at least ordinal measurement scales for input and output. For this goal a metamodel can often be simple; in fact, it may be desirable to develop several simple models instead of one complex model. The signs of the individual parameters of the metamodel should support prior knowledge about the problem entity. For example, in a queuing problem the mean waiting time should increase with the traffic rate; so if a first-order polynomial is used and $x_1$ denotes the traffic rate, then its estimated coefficient $\hat\beta_1$ should be non-negative.
(b) Screening: give a 'short list' of the most important factors. For this goal, a rather crude metamodel may suffice (of course there is always the danger that such a crude metamodel is misleading). For example, a first-order polynomial, possibly augmented with cross-products (two-factor interactions), is used by a technique called sequential bifurcation; see Bettonvil and Kleijnen (1997). An alternative technique is discussed in Saltelli et al. (1995).
(c) Main effects, interactions, quadratic effects, and other high-order effects: what are the effects of a specific factor, possibly in combination with other factors? The type of metamodel may still be simple: first- or second-order polynomials, possibly combined with transformations such as 1/x and log(x). For example, in a case study of Flexible Manufacturing Systems (FMSs) Kleijnen and Standridge (1988) use simulation and metamodeling: the metamodel suggests that of the four machine types, only types one and two are bottlenecks that do interact. Other examples are the single-server queues with different priority rules, and the queuing networks in Cheng and Kleijnen (1999): a second-degree polynomial can explain the 'exploding' behavior of such a simulated system as the traffic rate x approaches unity; even better is the explanation by a first-order polynomial multiplied by the factor $1/(1 - x)$.
Regression analysis aims at more understanding than classic Analysis of Variance (ANOVA) and its extensions, namely Multiple Comparison Procedures (MCPs) and Multiple Ranking Procedures (MRPs). Indeed, in ANOVA we test the null-hypothesis that a given number of system configurations (populations) have the same mean (no factor effect at all). Regression analysis, however, aims at estimating the magnitudes of the effects and at inter- and extrapolation. MCPs aim at detecting clusters of system configurations with the same mean per cluster. MRPs try to find the best system configuration among a limited set of configurations. See Ehrman et al. (1996), Fu (1994), and Goldsman and Nelson (1998).
Low-order polynomials are simpler to understand than splines and neural nets. It is easy to comprehend main effects, two-factor interactions, and quadratic effects; they can be easily displayed graphically.
The type of metamodel selected should be appropriate for the type of understanding desired. For example, if it is desired to understand what is occurring inside some system, then a metamodel such as a polynomial over the entire domain of applicability is probably more appropriate than (say) a spline metamodel (several simple metamodels fitted to subdomains of applicability). A spline model may be the best choice if the response surface is expected to be highly complex and the understanding desired is which inputs have the most effect on the output.
3.1.2. Goal 2 (prediction)
Prediction requires either the absolute or the relative magnitude of the output. For example, the goals of the simulation study in Kleijnen (1995a) are: (a) determine whether it is worthwhile to send a ship equipped with sonar into a certain area, for a mine search mission: absolute scale necessary; (b) determine the relative effects of certain technical sonar parameters: interval or ratio scale suffices. Other examples are the queuing systems in Cheng and Kleijnen (1999); these examples require up to a fourth- and sixth-degree polynomial, respectively.
In general, if the metamodel is used for prediction, then the type of metamodel must be selected with extreme care: (a) The simulation's I/O transformation to be approximated by the metamodel often has complex behavior over the domain of applicability; see Sargent (1991). (b) The accuracy required for a predictive metamodel is usually very high. If, for example, a polynomial is used directly on the simulation data, then it
probably will have to be of high order (with numerous interaction terms). However, if a good transformation of the data can be found (see Cheng and Kleijnen, 1999), then a simpler polynomial regression model can be used. Several other types of metamodels may be appropriate; for example, neural nets and splines (see Section 1).
To predict an output for a new set of input values, the simulation itself can be used if the simulation run time is not prohibitive. In real-time control, however, a metamodel may be preferred; examples are found in production scheduling and financial markets.
3.1.3. Goal 3 (optimization)
Optimization is a well-known topic in Operations Research. Closely related is goal seeking: given a target value for the output, find the corresponding input values. There are many mathematical techniques; for example, Response Surface Methodology (RSM), sequential simplex search, genetic algorithms, simulated annealing, and tabu search; see Kleijnen (1999). Metamodels for optimization that account for both the mean and the variance of the output are the focus of Taguchi's methods; see Kleijnen and Gaury (1998) and Sanchez et al. (1996).
Consider one specific technique, namely RSM (books and review articles are referenced in Khuri, 1996b, p. 377). RSM relies on low-order polynomial regression metamodels. Local marginal effects are estimated to find the direction of improvement. In the local-exploration phase, RSM uses a sequence of first-order polynomial metamodels, combined with steepest-ascent search. In the final optimization phase, RSM uses a single second-order polynomial metamodel. Clearly, optimization is more demanding than 'understanding' and less demanding than prediction.
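To make the local-exploration phase concrete, the following minimal sketch (Python with NumPy; the design, the outputs, and the step size are hypothetical placeholders, not taken from the paper) fits a first-order polynomial by OLS and moves the center of exploration along the estimated steepest-ascent direction:

```python
import numpy as np

def steepest_ascent_step(D, w, step=0.5):
    """One RSM local-exploration step: fit a first-order polynomial
    y = b0 + b1*d1 + ... + bk*dk by OLS, then return a new center
    moved along the steepest-ascent direction (the gradient b1..bk)."""
    X = np.column_stack([np.ones(len(D)), D])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, w, rcond=None)   # OLS estimates
    gradient = beta[1:]                            # local marginal effects
    direction = gradient / np.linalg.norm(gradient)
    return D.mean(axis=0) + step * direction       # new exploration center

# Hypothetical use: D is an n x k design in coded units, w the simulation outputs.
D = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
w = np.array([3.1, 5.0, 4.2, 6.1])
print(steepest_ascent_step(D, w))
```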
3.1.4. Goal 4 (V & V of simulation)
We present two ways that metamodels can aid goal 4. The first way determines whether the metamodel has effects with the correct signs: does the output in the metamodel respond to changes in the inputs in the direction that agrees with prior, qualitative knowledge about the simulated problem entity? A simple academic example is a queuing simulation: waiting time increases when the traffic rate increases. Suppose that for low values of the traffic rate the metamodel is a first-order polynomial. Then the estimated parameter relating these two variables should be positive; else there is an error somewhere. Two case studies are the ecological simulation in Kleijnen et al. (1992) and the military simulation in Kleijnen (1995a).
The second way to this goal becomes relevant when the problem entity is easily observable. Then comparisons can be made between the simulation's and the problem entity's outputs for numerous experimental conditions over the simulation's domain of applicability. Since the collection of data on the problem entity is usually costly, it is important to do this data gathering efficiently. So it must be decided how many experimental conditions to observe, and where to locate them. This depends on the complexity of the simulation's I/O transformation over the domain of applicability. For example, whether the metamodel is linear or quadratic guides the number and location of data points needed for comparison; that is, DOE can be applied. The resulting design is a plan for comparing the problem entity to the simulation for validation of the simulation. See Chen (1985).
In general, a simulation is supposed to meet the V & V requirements only for a certain application domain. Hence, V & V of simulations requires a test plan; we propose to base that plan on DOE. But DOE is substantially aided by the use of an explicit metamodel! However, that metamodel should be validated with respect to the simulation.
3.2. Step 6: Specify the metamodel's validity
measures and their required values
Fig. 1 shows that we wish to determine the validity of the metamodel versus the problem entity. We may use the approaches also used for validating a simulation. High validity requires that problem entity outputs be obtained for numerous experimental conditions. Unfortunately, this is usually impossible. Therefore, the validity of a metamodel is determined by making (i) many comparisons between the outputs of the metamodel and the simulation, and (ii) a few comparisons between the
metamodel and the problem entity, using whatever approaches are appropriate. If the problem entity is unobservable, then it is impossible to determine whether the metamodel satisfies any specific numerical accuracy with respect to the problem entity. (A special case is the simulation of a computer system: the simulation usually runs slower than the real system, if the latter exists.)
Fig. 1 further shows that the difference between the metamodel's and the problem entity's responses results from a combination of two approximations: (i) metamodel versus simulation, and (ii) simulation versus problem entity. These two approximations either add to each other, or partially cancel each other. It is not uncommon to have different ranges of accuracy required of the simulation and of the metamodel.
Whenever the simulation has continuous factors, it is impossible to compare the metamodel and the simulation over the metamodel's complete domain. Instead, DOE determines the data points actually used in these comparisons. A metamodel may be valid for one subdomain, and not for another. For example, in queuing simulations a first-order polynomial regression may be valid for low traffic rates, whereas higher traffic rates require more sophisticated metamodels; see Cheng and Kleijnen (1999).
A metamodel is modified until it is valid for all experimental conditions tested, using the specified validity measures and their required values. When statistical tests are used to determine validity, the probabilities of type I and type II errors can sometimes be calculated; see Balci and Sargent (1981). A type II error occurs if the variance of an estimated factor effect is high (high output variance or small total sample size). A type I error occurs if that variance is small (large sample sizes do occur in simulation).
To decide whether to accept a specific model, we need a criterion, which depends on the model's goal. Some case studies use the Absolute Relative Error or ARE (say) $r_1(w, y) = |(w - y)/w|$, or simply $r_1$ or r (simulation output w and metamodel output y; see Fig. 2).
However, this measure is deficient if w may be zero; for example, w may be the Net Present Value (NPV) in a financial analysis such as the case study in Van Groenendaal and Kleijnen (1997). Therefore we may prefer the Absolute Error or AE (say) $r_2 = |w - y|$ (the numerator of $r_1$). Kleijnen (1995a) gives as example: 'if the target is not detected within 5 m, the weapon is ineffective'.
Another popular measure is the Mean Squared Prediction Error (MSPE); see Diebold and Mariano (1995). This reference also considers more general 'economic loss' functions, for example, asymmetric functions.
Besides the selection of a criterion r, we should quantify a threshold (say) $r_{\max}$; for example, $r_{\max} = 0.1$.
Since a metamodel is a derivative of the underlying simulation, both may use the same criterion. So ideally, the metamodel's output y should be compared with the problem entity's output z (Fig. 2), which gives $r_1(z, y)$, analogous to $r_1(z, w)$.
In the remainder of this section we focus on comparing the metamodel and the simulation. The value of the criterion r(w, y) varies with the 'system configuration', determined by DOE's vector of k factor values $\mathbf{d} = (d_1, \ldots, d_k)'$:
$$r_1(w(\mathbf{d}), y(\mathbf{d})) = |(w(\mathbf{d}) - y(\mathbf{d}))/w(\mathbf{d})|. \quad (1)$$
These values may be characterized by their mean (say) $c_1$ or by their maximum (say) $c_2$:
$$c_1 = \int_{\mathbf{d}} r(w(\mathbf{d}), y(\mathbf{d}))\, \mathrm{d}\mathbf{d}, \qquad c_2 = \max_{\mathbf{d}} r(w(\mathbf{d}), y(\mathbf{d})). \quad (2)$$
The choice between mean and maximum again depends on the problem. For example, in the insurance business, an individual policy holder may generate a loss, but on average the insurance company makes a profit. So, if a model is used many times and small errors may compensate large errors, then the mean $c_1$ is adequate. Examples are the neural-network metamodels for economic and financial problems in Verkooyen (1996). If, however, a single large error is catastrophic, then the maximum $c_2$ is a better characteristic. Examples are nuclear simulations and econometric time-series models. (Different criteria for model selection are examined in the statistical monograph by Linhart and Zucchini, 1986.)
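As a concrete illustration, the following Python sketch (with hypothetical data; the arrays w and y stand for the simulation and metamodel outputs at the n design points) computes the ARE $r_1$ and AE $r_2$ per design point, and estimates $c_1$ by the sample average and median and $c_2$ by the sample maximum, as discussed below:

```python
import numpy as np

w = np.array([10.2, 12.5, 9.8, 15.1])   # simulation outputs per design point (hypothetical)
y = np.array([10.0, 13.0, 9.5, 14.8])   # metamodel outputs at the same points (hypothetical)

r1 = np.abs((w - y) / w)   # Absolute Relative Error (ARE), Eq. (1)
r2 = np.abs(w - y)         # Absolute Error (AE)

c1_mean = r1.mean()        # sample average as estimator of the mean c1
c1_median = np.median(r1)  # more robust estimator of c1
c2_max = r1.max()          # sample maximum as estimator of the maximum c2

r_max = 0.1                # required threshold, e.g., 10% relative error
print(c1_mean, c2_max, bool(c2_max <= r_max))
```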
How can the quantities $c_1$ and $c_2$ be estimated? Different statistics may be used to estimate the mean ($c_1$); most popular is the sample average (say) $\bar r$. This average has attractive properties (maximum likelihood; minimum variance) provided certain assumptions hold (e.g., normally, identically, and independently or NID distributed r). A more robust statistic is the median of the sample of size n (say) $r_{(n/2)}$ (see Kleijnen, 1987). To estimate the maximum ($c_2$), we may take the sample maximum $r_{(n)}$. Diebold and Mariano (1995) discuss several statistical tests for comparing predictive accuracy.
Two applications of the accuracy measure ARE and its maximum, but without an explicit a priori threshold $r_{\max}$, are the coal-transportation system-dynamics study in Kleijnen (1995c) and the FMS simulation in Kleijnen and Standridge (1988). Both examples are deterministic simulations. (The mean AE is used in a case study on forecasting the dollar/Dutch guilder in Diebold and Mariano (1995); that study, however, concerns an econometric time-series model.)
The literature on random simulations focuses on the precision of a single characteristic (e.g., mean steady-state waiting time). This precision is measured by the confidence interval half-width, expressed in either absolute units (say, seconds) or a percentage (relative precision). Such simulations have intrinsic noise: while the input vector $\mathbf{d}_i$ is constant, the simulation output $w_i$ still shows variation. This variation may be measured by the variance $\mathrm{var}(w \mid \mathbf{d}_i)$ with $i = 1, \ldots, n$. Consequently, it is no longer obvious what the simulation output w is, when measuring the accuracy; see Eq. (1). Actually this output has a statistical distribution determined by the simulation. This distribution may be characterized through different quantities, such as its mean E(w) and its 90% quantile $w_{.90}$. These quantities may be estimated based on simulation replications (alternatives are renewal analysis, spectral analysis, standardized time series; see Alexopoulos and Seila, 1998). Then in Eq. (1) w becomes the estimated mean (say) $\bar w$ or the estimated quantile $w_{(0.9m)}$, with m denoting the number of replications for the factor combination. We may formulate a null-hypothesis $H_0$ and an alternative hypothesis $H_1$ (analogous to $r_2$ and $r_{\max}$):
$$H_0: |E(w) - E(y)| \le \delta, \qquad H_1: |E(w) - E(y)| > \delta, \quad (3)$$
where the threshold $\delta$ depends on the problem. Usually such a hypothesis is tested through a Student statistic, $t_m$. This statistic resembles the ARE, but now the standardization is done by the standard error $\{\mathrm{var}(y - \bar w)\}^{1/2}$, not by the simulation output w. Actually, several hypotheses need to be formulated, one for each of the n factor combinations. Their simultaneous testing may use Bonferroni's inequality. Also see Diebold and Mariano (1995).
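A minimal sketch of this per-combination test (hypothetical data; for simplicity the threshold $\delta$ of Eq. (3) is set to zero and the metamodel prediction is treated as a fixed constant, so the standard error reduces to that of the replication average), with Bonferroni's correction $\alpha/n$:

```python
import numpy as np
from scipy import stats

def bonferroni_t_tests(W, y, alpha=0.05):
    """W: n x m array of simulation replications (n design points, m replicates).
    y: length-n metamodel predictions, treated here as fixed constants.
    Tests H0: E(w_i) = y_i per point, with Bonferroni-corrected level alpha/n."""
    n, m = W.shape
    w_bar = W.mean(axis=1)
    se = W.std(axis=1, ddof=1) / np.sqrt(m)            # standard error of each average
    t = (w_bar - y) / se                               # Student statistics, m-1 d.f.
    crit = stats.t.ppf(1 - alpha / (2 * n), df=m - 1)  # two-sided, alpha/n per test
    return t, np.abs(t) > crit                         # reject H0 where True

# Hypothetical use:
rng = np.random.default_rng(0)
W = rng.normal(loc=[[10.0], [12.0], [9.5]], scale=1.0, size=(3, 20))
y = np.array([10.1, 12.0, 8.0])
print(bonferroni_t_tests(W, y))
```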
Instead of the classic statistical test statistics (such as the t statistic), we may apply bootstrapping, which allows the estimation of the distribution of any statistic, for any distribution; see Efron and Tibshirani (1993). Recently, bootstrapping has been applied to the validation of both simulations and metamodels by Kleijnen et al. (1998) and Kleijnen et al. (1999), respectively.
3.3. Step 8: Specify a design including tactical
issues, and review the DOE
To save space we assume that the reader has some familiarity with classic DOE, which assumes a low-order polynomial regression metamodel for a single output, with NID outputs with common variance (say) $\sigma_i^2 = \sigma^2$. For other metamodels there are no standard solutions; instead a computerized search for 'optimal' designs, given the metamodel, is performed. For example, Sacks et al. (1989) assume that the response function is smooth, and that a covariance-stationary process is a valid model for the fitting errors (the closer two factor combinations are, the more correlated the fitting errors are). They use splines as metamodels. Their designs do not show regular geometric patterns. Also see Welch et al. (1992).
We introduce some more symbols: main effects $\beta_j$ $(j = 1, \ldots, k)$, two-factor interactions $\beta_{j,j'}$ $(j < j'$ and $j' = 2, \ldots, k)$, quadratic effects $\beta_{j,j}$; total number of regression parameters q; bold letters for matrices, including vectors. For certain k values, a design is saturated; that is, $n = q$.
DOE implies an $n \times k$ design matrix $\mathbf{D} = (d_{ij})$. For example, a first-order polynomial metamodel is $\mathbf{y} = \mathbf{X}_1 \boldsymbol{\beta}_1$, where $\mathbf{X}_1$ is the $n \times (k+1)$ matrix of independent variables with row vectors $\mathbf{x}_i = (1, d_{i1}, \ldots, d_{ik})$; $\mathbf{X}_1$ is fixed by $\mathbf{D}$ since $\mathbf{X}_1 = (\mathbf{1}, \mathbf{D})$ with $\mathbf{1} = (1, \ldots, 1)'$.
Classic DOE uses Ordinary Least Squares (OLS) fitting (a mathematical criterion). So the OLS estimator $\hat{\boldsymbol{\beta}}$ is computed by fitting the specified linear-regression metamodel to a set of I/O data that consist of the $n \times q$ matrix of independent regression variables $\mathbf{X}$ and the n-dimensional vector of simulation outputs $\mathbf{w} = (w_i)$; Fig. 2 shows these data for a single factor combination. This gives $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{w}$, which assumes that the inverse exists; DOE must ensure that $\mathbf{X}$ (fixed by $\mathbf{D}$) is indeed not collinear.
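In practice the OLS computation is a one-liner; the following minimal sketch (with a hypothetical design and outputs) builds $\mathbf{X}_1 = (\mathbf{1}, \mathbf{D})$ for a first-order polynomial and solves for the estimates, using numpy.linalg.lstsq to avoid forming the inverse explicitly:

```python
import numpy as np

# Hypothetical 2^2 design in coded units and its simulation outputs.
D = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
w = np.array([4.1, 6.0, 5.2, 7.3])

X1 = np.column_stack([np.ones(len(D)), D])     # X1 = (1, D): intercept plus main effects
beta, *_ = np.linalg.lstsq(X1, w, rcond=None)  # OLS: solves min ||w - X1 b||_2
print(beta)  # [b0, b1, b2]: intercept and the two estimated main effects
```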
Knowledge of the problem entity should inspire
the selection of the metamodel type, which in turn
guides the selection of the design.
Generalizations of OLS are Weighted LS (WLS) and Generalized LS (GLS). WLS uses the standard deviations of the outputs, $\sigma_i$, as weights. GLS is recommended when common pseudorandom numbers are used, so the simulation outputs are correlated. For details on WLS and GLS we refer to Kleijnen (1999).
The goal of DOE may be to minimize the variances of the estimated factor effects $\hat\beta_j$. It can be proven that an orthogonal design matrix $\mathbf{D}$ is optimal, given the classic assumptions. Classic design classes are resolution III or R-3, R-4, R-5, and Central Composite (CC) designs. We (briefly) discuss these designs because we shall apply them later; more examples and references are given in Kleijnen (1999).
By definition, R-3 gives unique estimates of the k + 1 parameters in a first-order polynomial, using only $n = k + 4 - (k \bmod 4)$ combinations; for example, $k = 7$ requires $n = 8 = 2^{7-4}$.
R-4 gives unique estimates of all first-order effects, plus unique estimates of certain linear combinations of two-factor interactions. For example, when $k = 8$ then a $2^{8-4}$ design may estimate $\beta_{1,2} + \beta_{4,8} + \beta_{3,7} + \beta_{5,6}$ (see Kleijnen, 1999, pp. 304–305). R-4 is simply constructed from R-3 through the foldover principle: to the original R-3's $\mathbf{D}$, we add the mirror image $-\mathbf{D}$, which yields an R-4 with n equal to 2k rounded upward to the next power of two. For example, $n = 16$ when $k = 5$, 6, 7 or 8.
R-5 gives estimators of all main effects and all two-factor interactions, not biased by each other. Because R-5 requires many factor combinations, these designs are used only for small values of k.
CC assumes a second-degree polynomial. A CC design is easily constructed, once an R-5 design is available: simply add 2k axial points changing each factor one at a time, once increasing and once decreasing each factor in turn; also add the 'center' point. A disadvantage is its high number of combinations, $n \gg q = 1 + k + k(k-1)/2 + k$, and its non-orthogonality.
These four design classes imply a k-dimensional hypercube for the experimental domain, expressed in standardized factors. We recommend the following standardization (see Bettonvil and Kleijnen, 1997). Let the original factor j, denoted by $v_j$, range between lower value $l_j$ and upper value $u_j$; that is, the simulation is not valid outside that range. Measure variation by the half-range $h_j = (u_j - l_j)/2$ and location by the mean $m_j = (u_j + l_j)/2$. Then the design matrix $\mathbf{D} = (d_{ij})$ uses $d_{ij} = (v_{ij} - m_j)/h_j$.
If a qualitative factor (nominal scale) has more than two levels or 'values', then several binary variables should be used for coding this factor in regression analysis. The sonar case study in Kleijnen (1995a) erroneously coded three kinds of bottom (rock, sand, mud) as the values 1, 2, and 3. Most ANOVA software helps avoid such errors.
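In code, this standardization is a simple elementwise map (a sketch; the factor ranges below are hypothetical):

```python
import numpy as np

def standardize(V, lower, upper):
    """Map original factor values V (n x k) to coded units in [-1, +1]:
    d_ij = (v_ij - m_j) / h_j with mean m_j and half-range h_j per factor."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    h = (upper - lower) / 2.0   # half-range per factor
    m = (upper + lower) / 2.0   # midpoint per factor
    return (np.asarray(V, dtype=float) - m) / h

# Hypothetical example: traffic rate in [0.1, 0.9], number of servers in [1, 5].
V = [[0.1, 1], [0.9, 5], [0.5, 3]]
print(standardize(V, lower=[0.1, 1], upper=[0.9, 5]))
```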
Next we focus on stagewise DOE, because in simulation the computer generates runs one after the other. Moreover, classic DOE assumes a specific polynomial regression metamodel, whereas we suppose that it is uncertain what the correct degree of the polynomial is. Therefore we recommend R-4 over R-3. The foldover principle, however, implies that in the first stage we run an R-3 to estimate the first-order effects. Some R-3 designs are saturated, so it is impossible to apply cross-validation. However, certain values of k give a non-saturated design; for example, $k = 4$, 5 or 6 requires $n = 8$.
After we have run the R-3, we run the mirror design. Unlike classic DOE, we may execute the combinations of the mirror design one after another, and apply cross-validation. Once we have run all combinations of the mirror design, we compute sums of certain two-factor interactions. These tell which interactions explain the lack of fit of the first-order metamodel. For example, suppose factor 1 has an unimportant first-order effect and this factor is not involved in an important sum of interactions $\beta_{1,2}, \ldots, \beta_{1,k}$. Then we may remove this factor, to avoid overfitting. As in any modeling effort, there are no foolproof recipes: factor 1 may still have a quadratic effect.
It may take too much time to run the mirror design. Then we may run only the center point.
If we find out that a second-order polynomial metamodel is needed, then we expand the R-3 or the R-4 to a CC. Again, stagewise experimentation is possible: add 2k axial points plus the center point. The R-4 may not be a proper subset of the CC; then some combinations may be used for checking the fit of the second-order polynomial metamodel.
Once we have run a CC and fitted a second-order polynomial, we check this model's fit: third-order effects may be important. As new check-points we take combinations not yet simulated: part of the CC is a fractional two-level factorial (e.g., $2^{k-p}$), assuming $k > 4$ (see Kleijnen, 1999, p. 309); hence a fraction of the full $2^k$ is not yet simulated. Which check-points to select may be determined randomly, if no other information is available.
The CC's check-points require extrapolation, whereas the center point implies interpolation. In general, a metamodel is more reliable when used for interpolation; it may be dangerous when used to extrapolate the simulated behavior far outside the domain simulated in the DOE.
3.4. Step 9: Fit the metamodel
After the parameters of a metamodel have been estimated, the fit of the metamodel should be evaluated by using one or more measures. The lack-of-fit measures discussed in this section may be applied to any type of metamodel (linear regression, neural net, etc.), fitted to either random or deterministic simulation output (Section 4 will focus on random simulations and linear regression metamodels). Random simulation usually has several observations per combination, say $m_i$. So the total number of simulation observations (say) N equals $\sum_{i=1}^{n} m_i$. Let $\hat y_i$ denote the metamodel's output for the estimated parameter vector $\hat{\boldsymbol{\beta}}$; $w_{i,r}$ denotes simulation observation r for combination i; $\bar w$ is the average of all N simulation outputs.
The classic measure is the coefficient of determination
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \sum_{r=1}^{m_i} (\hat y_i - w_{i,r})^2}{\sum_{i=1}^{n} \sum_{r=1}^{m_i} (w_{i,r} - \bar w)^2}. \quad (4)$$
$R^2$ equals one (perfect fit) if all N metamodel outputs equal their corresponding simulation outputs, $\hat y_i = w_{i,r}$, which holds if $N = q$ (saturated design with $m_i = 1$). Because $R^2$ always increases as more regression variables are added (higher q), the adjusted $R^2$ is introduced:
$$R^2_{\text{adjusted}} = 1 - (1 - R^2)\frac{N - 1}{N - q}. \quad (5)$$
Unfortunately, there is no simple lower threshold
for this criterion; bootstrapping may help.
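A sketch of Eqs. (4) and (5) in code (replicated observations per design point; the data are hypothetical):

```python
import numpy as np

def r_squared(W, y_hat, q):
    """W: n x m array of simulation outputs (m replicates per design point).
    y_hat: length-n metamodel predictions. q: number of regression parameters.
    Returns R^2 of Eq. (4) and the adjusted R^2 of Eq. (5)."""
    n, m = W.shape
    N = n * m
    sse = np.sum((y_hat[:, None] - W) ** 2)   # metamodel vs each replicate
    sst = np.sum((W - W.mean()) ** 2)         # replicates vs overall average
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (N - 1) / (N - q)
    return r2, r2_adj

# Hypothetical example: 4 design points, 3 replicates, first-order model with q = 3.
W = np.array([[4.0, 4.2, 3.9], [6.1, 5.8, 6.0], [5.3, 5.1, 5.4], [7.2, 7.4, 7.1]])
y_hat = np.array([4.05, 6.0, 5.25, 7.2])
print(r_squared(W, y_hat, q=3))
```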
The linear correlation coefficient $\rho$ was originally developed to characterize a bivariate normal variable (say) (w, y): all observations lie on a straight line with intercept $a_0$ and slope $a_1$ given by
$$a_0 = E(y) - a_1 E(w), \qquad a_1 = \rho(w, y)\sqrt{\mathrm{var}(y)/\mathrm{var}(w)}. \quad (6)$$
Hence, if $\rho(w, y) = 1$, and y and w have equal means and equal variances, then this line has slope one and intercept zero. In metamodeling, however, this measure must be interpreted with care: in Fig. 2 the variables w and y are not bivariate normal. Nevertheless, y can be regressed on w. Ideally, simulation and metamodel outputs are equal (see $R^2$), which means zero intercept and unity slope. A case study on gas transmission in Indonesia indeed uses such a plot; see Van Groenendaal and Kleijnen (1997).
Cross-validation ('fitting to subsets' would be a better term) uses the fitted metamodel to predict the outcomes for new combinations of the simulation, and to compare these predictions with the corresponding simulation responses. 'Leave-one-out' cross-validation eliminates one combination (instead of adding new combinations); re-estimates the metamodel from the remaining $n - 1$ combinations, assuming $n - 1$ observations suffice (non-saturated design); recomputes the forecast $\hat y$ and compares this forecast with the simulation output w; and repeats this elimination for a different combination. In practice, the relative prediction errors $\hat y/w$ are often 'eyeballed', given the goals of the model. Examples are the deterministic simulations for an FMS in Kleijnen and Standridge (1988) and a coal transport system in Kleijnen (1995c). (We shall discuss cross-validation for random simulation; see Section 4.)
Only if the metamodel as a whole fits well (measured as explained above) should analysts examine individual parameters. In cross-validation they may observe how these estimated parameters change as simulated combinations are deleted; if the specified metamodel is a good approximation, then these estimates remain stable. Case studies are the FMS and the coal-transport studies mentioned above. In these polynomial metamodels the parameters can be interpreted as factor effects, so the metamodel can be used for explanation, whereas the $\hat y/w$ concern prediction. Other metamodels (e.g., neural nets) are harder to interpret.
Eliminating noisy parameters may decrease the variance of the remaining parameter estimators $\hat\beta$ and the corresponding predictions $\hat y$. For example, in an M/M/1 simulation Cheng and Kleijnen (1999) fit a first-order and a second-order polynomial, respectively; the former has a variance seven times smaller than the overfitted metamodel! Elimination of unimportant factors may also increase the adjusted $R^2$. Kleijnen and Standridge (1988) apply cross-validation to a first-order polynomial with $k = 4$, which gives unstable estimates for factors 1 and 3 and high values for the relative prediction errors. Therefore they replace the first-order polynomial by a second-order polynomial in only the $k = 2$ stable factors; they set the non-stable factor effects to zero. Now they get an adequate metamodel: stable effects $\hat\beta_2$, $\hat\beta_4$, $\hat\beta_{2,4}$ and low prediction errors. In general, the mission of science is to come up with simple explanations: parsimony or Occam's razor.
3.5. Step 10: Determine the validity of the fitted metamodel
The effort devoted to validating a metamodel depends on the goal of this metamodel. If the goal is 'understanding', then a 'reasonable' effort should be spent. However, for 'prediction' extensive validation should be performed. In the case of 'optimization', a series of metamodels is usually developed; validity testing may be limited to the last few metamodels, which are tested extensively. Finally, if the goal is 'aiding in V & V of a simulation', then validation is usually conducted only with respect to the simulation (not the problem entity).
The data for validating a metamodel versus the simulation must first be generated. In Step 8 we developed a DOE for data generation in two parts: one for fitting, and one for validating. Prior to generating the data for validation, the DOE needs to be reviewed. It may need modification because the original design was 'expanded' in fitting and checking the fit. The DOE should allow testing during validation for a higher-order model than the fitted metamodel. After the metamodel has passed validity testing with the new data, the data for fitting may also be used for additional validity testing. This is especially worthwhile if the measures used in validation and fitting are different.
Next we conduct validation of the metamodel versus the problem entity. The extent of this validation again depends on the goal of the metamodel. The same methods for validating simulations (see Section 1) can be used for validating metamodels versus the problem entity. The $c_1$ and $c_2$ in Eq. (2), and the statistical formulas given above, can now be used for testing validity, but the symbol w (simulation output) is now replaced by z (problem entity output). Suppose that the problem entity is observable and data have been collected on it and used for validating the simulation. Then it seems reasonable to use these same data when validating the metamodel versus the problem
entity. The resulting statistical confidence level seems to deserve more research.
4. A procedure for linear-regression metamodeling
in random simulation
In this section we focus on random simulations and linear-regression metamodels, highlighting Steps 8–10. If this metamodel is estimated through OLS, the variance of its output is $\mathrm{var}(\hat y) = \mathbf{x}'\,\mathrm{cov}(\hat{\boldsymbol{\beta}})\,\mathbf{x}$; for GLS we refer to Kleijnen (1999). Denoting cross-validation with elimination of I/O combination i by the subscript $-i$ gives
$$\hat{\boldsymbol{\beta}}_{-i} = (\mathbf{X}'_{-i}\mathbf{X}_{-i})^{-1}\mathbf{X}'_{-i}\mathbf{w}_{-i} \quad \text{with } i = 1, \ldots, n. \quad (7)$$
Actually, these n recomputations can be avoided, using the original $\hat{\boldsymbol{\beta}}$ and the so-called hat matrix $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ (see Kleijnen and Van Groenendaal, 1992, pp. 156–157). Using $\hat{\boldsymbol{\beta}}_{-i}$ gives the predictor $\hat y_{-i}$. The Studentized cross-validation prediction error is
$$t_{\nu} = \frac{\bar w_i - \hat y_{-i}}{\{\mathrm{var}(\bar w_i) + \mathrm{var}(\hat y_{-i})\}^{1/2}}, \quad (8)$$
where the degrees of freedom $\nu$ are unknown. Kleijnen (1992) uses $\nu = m - 1$. To control the type I error rate, we apply Bonferroni's inequality: replace $\alpha$ by $\alpha/n$. Diagnostic statistics in modern software related to our cross-validation measures are PRESS, DFFITS, DFBETAS, and Cook's D.
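The shortcut via the hat matrix is easy to implement; the sketch below (hypothetical data) computes the leave-one-out prediction errors from a single fit, using the standard identity $e_i/(1 - h_{ii})$ for the deleted residual:

```python
import numpy as np

def loo_prediction_errors(X, w):
    """Leave-one-out cross-validation errors w_i - y_hat_{-i} without refitting:
    uses the hat matrix H = X (X'X)^{-1} X' and e_i / (1 - h_ii)."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    beta = np.linalg.solve(X.T @ X, X.T @ w)  # ordinary OLS fit on all n points
    e = w - X @ beta                          # ordinary residuals
    return e / (1.0 - np.diag(H))             # deleted (PRESS) residuals

# Hypothetical use with a first-order polynomial in two factors:
D = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
X = np.column_stack([np.ones(len(D)), D])
w = np.array([4.0, 6.1, 5.2, 7.3, 5.8])
print(loo_prediction_errors(X, w))
```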
An alternative is the F lack-of-fit test, which compares two estimators of the common variance of the simulation outputs ($\sigma^2$). The first estimator is based on replication:
$$s_i^2 = \sum_{r=1}^{m_i} (w_{i,r} - \bar w_i)^2/(m_i - 1), \quad (9)$$
where $\bar w_i$ is the average of the $m_i$ independent simulation replications. The weighted average of these n estimators is $\sum_{i=1}^{n} s_i^2 (m_i - 1)/(N - n)$. The second variance estimator is $\sum_{i=1}^{n} \hat e_i^2 m_i/(n - q)$, with the estimated residuals $\hat e_i = \bar w_i - \hat y_i$. The latter estimator is unbiased only if the regression model is specified correctly; otherwise it overestimates. The two estimators are compared through $F_{n-q,\, N-n}$.
Rao (1959) extends this test from OLS to GLS. Kleijnen (1992) shows that Rao's test is better than cross-validation if the simulation responses are symmetrically distributed, for example, normally or uniformly distributed; lognormally distributed responses are better analyzed through cross-validation (or bootstrapping of Rao's statistic; see Kleijnen et al. (1999)).
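A sketch of this lack-of-fit test (hypothetical data; SciPy provides the F distribution):

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(W, y_hat, q):
    """F lack-of-fit test: W is n x m (m replicates per design point),
    y_hat the fitted metamodel outputs, q the number of parameters.
    Compares pure-error and lack-of-fit variance estimators via F_{n-q, N-n}."""
    n, m = W.shape
    N = n * m
    w_bar = W.mean(axis=1)
    pure_error = np.sum((W - w_bar[:, None]) ** 2) / (N - n)  # replication-based
    lack_of_fit = np.sum(m * (w_bar - y_hat) ** 2) / (n - q)  # residual-based
    F = lack_of_fit / pure_error
    p_value = 1.0 - stats.f.cdf(F, dfn=n - q, dfd=N - n)
    return F, p_value

# Hypothetical use: 5 design points, 4 replicates, q = 3 parameters.
rng = np.random.default_rng(1)
W = rng.normal(loc=[[4.0], [6.0], [5.0], [7.0], [5.5]], scale=0.5, size=(5, 4))
y_hat = np.array([4.1, 5.9, 5.1, 7.0, 5.4])
print(lack_of_fit_test(W, y_hat, q=3))
```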
Most simulations have multiple outputs (e.g., customer's queuing time and server's idle time; mean and 90% quantile). In practice, multiple outputs are handled through the application of the techniques of this paper per output type. Actually, for linear-regression metamodels Khuri (1996b, p. 385) proves that the best linear unbiased estimator (BLUE) of the factor effects for the various responses remains the same as the OLS estimators obtained from fitting per individual output, assuming a single experimental design is used for the data generation of all outputs. Ideally, the design should account for the presence of multiple outputs; see Khuri (1996a, b) and Kleijnen (1987). Simultaneous inference may be based on Bonferroni's inequality. Optimization in the presence of multiple responses is discussed in Kleijnen (1999).
If a metamodel does not pass the fitting and validation tests, then instead of adding interactions and simulating the combinations of the augmented design we may try transformations, or disaggregate an independent variable into its components (e.g., disaggregate traffic rate into arrival rate and service rate).
In Fig. 3 we give a flowchart for this metamodeling process, focusing on Steps 8–10 restricted to linear-regression metamodels. We add the following comments.
Blocks 13, 16, and 19 state 'Go back to 1'. Obviously, in such an iteration a step may degenerate (e.g., Step 3, select type of design, may use exactly the same design as the one used in the preceding iteration). A practical example is the FMS case study in Kleijnen and Standridge (1988): a first-order polynomial in four factors is rejected, and replaced by a first-order polynomial in only two of the original four factors augmented with the interaction between the two remaining factors; the same design with its I/O simulation data is used.
Fig. 3. A procedure for linear-regression metamodeling in random simulation.
Goals may affect certain blocks; for example, optimization through RSM implies that block 2
will always select a first-order polynomial in the first iterations. Yet we feel that the figure is suited to all four goals of metamodeling: this figure is based on years of experience with linear regression for all four goals of metamodeling.
After validating the metamodel versus the simulation, this metamodel should be validated against the problem entity; to save space, these steps are not displayed in the figure.
5. Research issues
A few academic studies use mathematical norms other than $L_2$ (least squares); see Kleijnen (1987). Well-known alternative norms are $L_1$ (absolute fitting errors) and $L_\infty$ (Tchebyshev, maximum error):
$$L_1: \min_{\boldsymbol{\beta}} \sum_{i=1}^{n} |w_i - y_i(\boldsymbol{\beta})|, \qquad L_\infty: \min_{\boldsymbol{\beta}} \max_i |w_i - y_i(\boldsymbol{\beta})|. \quad (10)$$
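Both norms can be minimized numerically; the sketch below (hypothetical data) fits a first-order polynomial under $L_1$ and $L_\infty$ using scipy.optimize.minimize with a derivative-free method, rather than the linear-programming formulations that are also possible:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical design (with intercept column) and simulation outputs.
D = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
X = np.column_stack([np.ones(len(D)), D])
w = np.array([4.0, 6.1, 5.2, 7.3, 5.6])

def l1_loss(beta):
    return np.sum(np.abs(w - X @ beta))  # L1: sum of absolute fitting errors

def linf_loss(beta):
    return np.max(np.abs(w - X @ beta))  # L-infinity: maximum error (Tchebyshev)

beta0 = np.zeros(X.shape[1])             # crude starting point
b_l1 = minimize(l1_loss, beta0, method="Nelder-Mead").x
b_linf = minimize(linf_loss, beta0, method="Nelder-Mead").x
print(b_l1, b_linf)
```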
Current practice is as follows. Simulationists use a convenient mathematical norm to fit a specific metamodel to the simulation I/O data. Next, they often use a different norm when they examine the resulting fit and validity of the metamodel! Examples are provided by the case studies mentioned above: Kleijnen (1995a) and Kleijnen and Standridge (1988) use $L_2$ (OLS) to compute the estimated parameter vector $\hat{\boldsymbol{\beta}}$, but $L_\infty$ to decide on the adequacy of the resulting metamodel. Verkooyen (1996), however, uses the same norm, namely $L_2$, in both fitting and validating his neural net. No research has yet been done on how to compute the metamodel's parameters such that $c_1$ (mean inaccuracy in validation; see Eq. (2)) or $c_2$ (maximum) is minimized.
In random simulations, cross-validation and the lack-of-fit F-statistic are usually applied to test for zero lack of fit. However, we may wish to test for a non-zero threshold ($\delta > 0$ in Eq. (3)).
Accuracy of metamodel versus problem entity
(not simulation) has not been investigated in the
literature.
The four different goals of metamodeling may require different metamodel types and different accuracies. Fig. 3 neglects this problem; more research is needed.
Multi-variate regression analysis and multi-variate DOE for metamodeling certainly deserve more research.
A procedure such as outlined in Fig. 3 (linear
regression in random simulation) should also be
developed for other metamodel types such as
neural nets and splines.
6. Summary
Because metamodels are often used in an ad hoc way, we derived a methodology for developing metamodels with the following characteristics: (i) We strictly distinguished between fitting and validating. (ii) We emphasized the role of the problem entity. (iii) We distinguished four types of goal. This yielded a metamodeling process with 10 steps. We also gave a detailed procedure for validating linear-regression metamodels in random simulation, suitable for all four goals.
We covered several metamodel types, including linear regression and neural nets, but we focused on polynomial metamodels. The four types of goal are: (i) understanding, (ii) prediction, (iii) optimization, and (iv) V & V of the underlying simulation.
Our methodology with 10 steps includes classic DOE and standard measures of fit, such as the R-square coefficient and various cross-validation measures. The methodology, however, also includes stagewise DOE and several validation criteria, measures, and estimators.
Acknowledgements
CentER provided financial support enabling R.G. Sargent to visit Tilburg University during one month for research on the topic of this paper.
References
Alexopoulos, C., Seila, A.F., 1998. Output data analysis. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York.
Balci, O., Sargent, R.G., 1981. A methodology for cost-risk analysis in the statistical validation of simulation models. Communications of the ACM 24 (4), 190–197.
Barton, R.R., 1993. New tools for simulation metamodels. IMSE Working Paper 93-110, Department of Industrial and Management Systems Engineering, Penn State University, University Park, PA 16802.
Barton, R.R., 1994. Metamodeling: A state of the art review. In: Tew, J.D., Manivannan, S., Sadowski, D.A., Seila, A.F. (Eds.), Proceedings of the 1994 Winter Simulation Conference, pp. 237–244.
Beck, M.B., Ravetz, J.R., Mulkey, L.A., Barnwell, T.O., 1997. On the problem of model validation for predictive exposure assessments. Stochastic Hydrology and Hydraulics 11, 229–254.
Bettonvil, B., Kleijnen, J.P.C., 1997. Searching for important factors in simulation models with many factors: Sequential bifurcation. European Journal of Operational Research 96 (1), 180–194.
Chen, B., 1985. A statistical validation procedure for discrete event simulation over experimental regions. Ph.D. Dissertation, Department of Industrial Engineering and Operations Research, Syracuse University, Syracuse, NY.
Cheng, R.C.H., Kleijnen, J.P.C., 1999. Improved designs of queuing simulation experiments with highly heteroscedastic responses. Operations Research (forthcoming).
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13 (3), 253–263.
Donohue, J.M., 1995. The use of variance reduction techniques in the estimation of simulation metamodels. In: Alexopoulos, C., Kang, K., Lilegdon, W.R., Goldsman, D. (Eds.), Proceedings of the 1995 Winter Simulation Conference, pp. 195–199.
Efron, B., Tibshirani, R.J., 1993. Introduction to the Bootstrap. Chapman and Hall, New York.
Ehrman, C.M., Hamburg, M., Krieger, A.M., 1996. A method for selecting a subset of alternatives for future decision making. European Journal of Operational Research 96, 407–416.
Friedman, L.W., 1996. The Simulation Metamodel. Kluwer, Dordrecht, The Netherlands.
Fu, M.C., 1994. Optimization via simulation: A review. Annals of Operations Research 53, 199–247.
Goldsman, D., Nelson, B.L., 1998. Comparing systems via simulation. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York.
Huber, K.P., Berthold, M.R., Szczerbicka, H., 1996. Analysis of simulation models with fuzzy graph based metamodeling. Performance Evaluation 27/28, 473–490.
Khuri, A.I., 1996a. Analysis of multiresponse experiments: A review. In: Ghosh, S. (Ed.), Statistical Design and Analysis of Industrial Experiments. Marcel Dekker, New York, pp. 231–246.
Khuri, A.I., 1996b. Multiresponse surface methodology. In: Ghosh, S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 13. Elsevier, Amsterdam, pp. 377–406.
Kleijnen, J.P.C., 1987. Statistical Tools for Simulation Practitioners. Marcel Dekker, New York.
Kleijnen, J.P.C., 1992. Regression metamodels for simulation with common random numbers: Comparison of validation tests and confidence intervals. Management Science 38 (8), 1164–1185.
Kleijnen, J.P.C., 1995a. Case study: Statistical validation of simulation models. European Journal of Operational Research 87 (1), 21–34.
Kleijnen, J.P.C., 1995b. Verification and validation of simulation models. European Journal of Operational Research 82 (1), 145–162.
Kleijnen, J.P.C., 1995c. Sensitivity analysis and optimization of system dynamics models: Regression analysis and statistical design of experiments. System Dynamics Review 11 (4), 275–288.
Kleijnen, J.P.C., 1999. Experimental design for sensitivity analysis, optimization, and validation of simulation models. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York. (Preprint: CentER Discussion Paper, no. 9752.)
Kleijnen, J.P.C., Gaury, E., 1998. Risk analysis of robust system design. In: Medeiros, D.J., Watson, E.F., Carson, J.S., Manivannan, M.S. (Eds.), Proceedings of the 1998 Winter Simulation Conference, pp. 1533–1540.
Kleijnen, J.P.C., Standridge, C., 1988. Experimental design and regression analysis: An FMS case study. European Journal of Operational Research 33 (3), 257–261.
Kleijnen, J.P.C., van Groenendaal, W., 1992. Simulation: A Statistical Perspective. Wiley, Chichester.
Kleijnen, J.P.C., Cheng, R.C.H., Bettonvil, B., 1998. Validation of trace-driven simulation models: Bootstrapped tests. Working Paper.
Kleijnen, J.P.C., Feelders, A.J., Cheng, R.C.H., 1998. Bootstrapping and validation of metamodels in simulation. In: Medeiros, D.J., Watson, E.F., Carson, J.S., Manivannan, M.S. (Eds.), Proceedings of the 1998 Winter Simulation Conference, pp. 701–706.
Kleijnen, J.P.C., Van Ham, G., Rotmans, J., 1992. Techniques for sensitivity analysis of simulation models: A case study of the CO2 greenhouse effect. Simulation 58 (6), 410–417.
Linhart, H., Zucchini, W., 1986. Model Selection. Wiley, New York.
Pierreval, H., 1996. A metamodel approach based on neural networks. International Journal in Computer Simulation 6 (3), 365–378.
Rao, C.R., 1959. Some problems involving linear hypothesis in multivariate analysis. Biometrika 46, 49–58.
Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P., 1989. Design and analysis of computer experiments (includes comments and rejoinder). Statistical Science 4 (4), 409–435.
Saltelli, A., Andres, T.H., Homma, T., 1995. Sensitivity analysis of model output; performance of the iterated fractional factorial design method. Computational Statistics and Data Analysis 20, 387–407.
Sanchez, S.M., Sanchez, P.J., Ramberg, J.S., Moeeni, F., 1996. Effective engineering design through simulation. International Transactions in Operational Research 3 (2), 169–185.
Sargent, R.G., 1991. Research issues in metamodeling. In: Nelson, B.L., Kelton, W.D., Clark, G.M. (Eds.), Proceedings of the 1991 Winter Simulation Conference, pp. 888–893.
Sargent, R.G., 1996. Verifying and validating simulation models. In: Charnes, J.M., Morrice, D.M., Brunner, D.T., Swain, J.J. (Eds.), Proceedings of the 1996 Winter Simulation Conference, pp. 55–64.
Schlesinger, S., et al., 1979. Terminology for model credibility. Simulation 32 (3), 103–104.
Van Groenendaal, W.J.H., Kleijnen, J.P.C., 1997. On the assessment of economic risk: Factorial design versus Monte Carlo methods. Journal of Reliability Engineering and Systems Safety 57 (1), 103–105.
Verkooyen, W.J.H., 1996. Neural Networks in Economic Modelling. CentER, Tilburg University, Tilburg.
Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., et al., 1992. Screening, predicting, and computer experiments. Technometrics 34 (1), 15–25.
Yu, B., Popplewell, K., 1994. Metamodel in manufacturing: A review. International Journal of Production Research 32 (4), 787–796.
Zeigler, B., 1976. Theory of Modelling and Simulation. Wiley/Interscience, New York.