

Theory and Methodology

A methodology for fitting and validating metamodels in simulation

Jack P.C. Kleijnen a,*, Robert G. Sargent b

a Department of Information Systems (BIK)/Center for Economic Research, Tilburg University (KUB), 5000 LE Tilburg, The Netherlands
b Simulation Research Group, Department of Electrical Engineering and Computer Science, 439 Link Hall, Syracuse University, Syracuse, NY 13244, USA

Received 1 December 1997; accepted 1 October 1998

Abstract

This paper proposes a methodology that replaces the usual ad hoc approach to metamodeling. This methodology considers validation of a metamodel with respect to both the underlying simulation model and the problem entity. It distinguishes between fitting and validating a metamodel, and covers four types of goal: (i) understanding, (ii) prediction, (iii) optimization, and (iv) verification and validation. The methodology consists of a metamodeling process with 10 steps. This process includes classic design of experiments (DOE) and measuring fit through standard measures such as R-square and cross-validation statistics. The paper extends this DOE to stagewise DOE, and discusses several validation criteria, measures, and estimators. The methodology covers metamodels in general (including neural networks); it also gives a specific procedure for developing linear regression (including polynomial) metamodels for random simulation. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Simulation; Approximation; Response surface; Modeling; Regression

1. Introduction

Although metamodels have been quite frequently studied in the simulation literature and applied in simulation practice, this has been done in an ad hoc way. Therefore in this paper we offer a methodology for developing metamodels. Our methodology has the following novel characteristics: (i) We strictly distinguish between fitting and validating a metamodel. (ii) We emphasize the role of the problem entity besides the metamodel and the simulation model. (iii) We assume that any model should be developed for a specific goal; for metamodels we distinguish four types of goal. Guided by these three characteristics, we derive a metamodeling process with 10 steps.


We also give a detailed procedure for validating metamodels that use linear regression analysis in random simulation; we feel that this procedure is suited to all four goals of metamodeling.

Our methodology is meant for both researchers and practitioners. We assume that those readers have a basic knowledge of simulation and mathematical statistics, especially regression analysis. The characteristics of our methodology are detailed as follows.

We distinguish the relationships among metamodel, simulation model, and problem entity that are displayed in Fig. 1, where w.r.t. stands for 'with respect to'. A problem entity is some system (real or proposed), idea, situation, policy, or phenomenon that is being modeled. A simulation model is a causal model of some problem entity; this model may be deterministic or stochastic. A metamodel is an approximation of the input/output (I/O) transformation that is implied by the simulation model; the resulting black-box model is also known as a response surface. There are different types of metamodels; for example, polynomial regression models (which are a type of linear regression), splines (which partition the domain of applicability into subdomains and fit simple regression models to each of the subdomains), and neural networks (a type of non-linear regression); see Barton (1993, 1994), Friedman (1996), Huber et al. (1996), Kleijnen (1999), Pierreval (1996), Yu and Popplewell (1994).

Validation is the 'substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model'; see Sargent (1996) and Schlesinger et al. (1979). Fig. 1 shows that validation of a metamodel relates to both the simulation model and the problem entity. To validate a metamodel, we must know the accuracy required of the metamodel; this depends on the goal of the metamodel. We do not discuss validation of the underlying simulation model, except when discussing the use of a metamodel to aid in this validation (for validation of simulation models see Beck et al., 1997; Kleijnen, 1995b; Sargent, 1996; and http://manta.cs.vt.edu/biblio/). So we assume that the simulation model under consideration is valid, unless stated otherwise.

Fitting applies mathematical and statistical techniques to a set of simulation I/O data, to (i) estimate the metamodel's parameter values, and (ii) evaluate the estimated parameter values with respect to the data set, using quantitative criteria. Prior to fitting a metamodel, we must specify the type and form (or class) of the metamodel. Whereas validation requires (domain) knowledge about the problem entity and the specified accuracy required of the metamodel, fitting concerns only the determination of a 'good' set of parameter values from the data set. So fitting is not concerned with whether the metamodel adequately describes the problem entity and the simulation model, given the goal of the metamodel.

For metamodels we identify four general goals: (i) understanding the problem entity, (ii) predicting values of the output or response variable, (iii) performing optimization, and (iv) aiding the verification and validation (V & V) of a simulation model. These goals need attention when specifying the type and form of metamodel to be fitted and when determining the validity of the metamodel.

The study of a problem entity usually distinguishes several outputs. Hence, the corresponding simulation also has multiple response variables. Current practice, however, uses metamodels with a single output: a separate metamodel is developed for each output.

Fig. 1. Metamodel, simulation model, and problem entity.


Indeed we develop a methodology for a single output; this methodology can be applied to each of the single-response metamodels.

We use different symbols for the I/O data of problem entity, simulation, and metamodel: see Fig. 2. We shall return to the individual symbols.

We organize the remainder of this paper as follows. Section 2 covers our methodology for developing metamodels, consisting of a process with 10 steps. Section 3 details some steps. Section 4 focuses on linear-regression metamodels in random simulation. Section 5 contains research issues. Section 6 provides a summary.

2. The metamodeling process

We suggest the following 10 steps for developing a metamodel.

1. Determine the goal of the metamodel: Let us reconsider the four goals mentioned above. (a) Problem entity understanding: A goal may be understanding the internal behavior of a problem entity (e.g., determining the 'bottlenecks' in a system), understanding the behavior of the output, performing sensitivity analysis (i.e., determining how sensitive the output is to changes to the inputs), and conducting what-if studies. (b) Prediction: The metamodel may replace the simulation, to obtain a set of values of the output for a specific set of values of the inputs. The metamodel is then used routinely instead of the simulation, since the metamodel is usually quicker and easier to use than the simulation. (c) Optimization: The metamodel may be used to determine the set of values of the problem entity inputs that optimizes a specific objective function. This function may depend on one or more of the problem entity's outputs. (d) Aid in V & V of the simulation: see Section 1.

2. Identify the inputs and their characteristics: We distinguish between input variables (briefly: inputs) and parameters (see Zeigler, 1976). Inputs are directly observable (for example, number of servers), whereas parameters require statistical inference (for example, service rate λ). (The simulation computer program may treat these inputs and parameters as 'inputs'.) For each input we should determine whether the input is deterministic or random; also we must select a measurement scale (these scales, namely nominal, ordinal, interval, ratio, and absolute, are discussed in Kleijnen (1987, pp. 138-141)). The metamodel has 'independent variables' x, which are functions of the simulation inputs and parameters; for example, x_1 = λ²; also see Fig. 2. The independent variables of the metamodel should be identified. The simulation inputs and parameters are called factors in the design of experiments (DOE). Even a random simulation has deterministic inputs and parameters; the only random element is the seed of its pseudorandom number generator s_0, but even this seed may be fixed.

3. Specify the domain of applicability (experimental region): We should determine the combinations of values of the factors for which the metamodel is to be valid. (To obtain a valid simulation, its factor values also need to be restricted to a certain domain; see the 'experimental frame' concept in Zeigler, 1976.)

4. Identify the output variable and its characteristics: The output should be identified, and like the inputs this output must be classified as random or deterministic, and a measurement scale must be selected.

Fig. 2. Notation for I/O data of problem entity, simulation, and metamodel.


5. Specify the accuracy required of the metamodel: We specify the range of accuracy required of the output of the metamodel, for its domain of applicability, with respect to both the simulation and the problem entity. This range depends on the goal of the metamodel and also, perhaps, on the goal of the simulation study. The accuracy required of the metamodel when applied to the simulation may differ from the accuracy required when it is applied to the problem entity. The ranges of accuracy may be specified via the validity measures discussed in the next step.

6. Specify the metamodel's validity measures and their required values: We determine the metamodel's validity measures (e.g., absolute error and absolute relative error), along with their required values.

7. Specify the metamodel, and review this specification: First we select a type of metamodel; examples are polynomial regression models and neural networks (nets). Next, we select a form of the metamodel from the type selected; examples are first- or second-degree polynomials, and neural nets with different numbers of levels and nodes. The users of the metamodel should review the type and form of metamodel specified, to ensure that the particular metamodel is satisfactory for its intended goals.

8. Specify a design including tactical issues, and review the DOE: We determine the type of design to be used (e.g., 2^(k-p)) and the specific (say) n design points for which data are to be collected from the simulation. The resulting I/O data are partitioned into two parts: one for fitting and one for validating the metamodel. Determining the design is frequently called a strategic issue in DOE. We must also decide on tactical issues when the simulation is random: the number of observations (runs) at each design point must be determined. Further, we may use variance reduction techniques (e.g., common pseudorandom numbers). There may be interaction between strategic and tactical issues; for example, common and antithetic numbers may be selected following the well-known Schruben-Margolin strategy; see Donohue (1995). In this step we must finally review all DOE aspects, to determine whether they are satisfactory for the goal, type, and form of metamodel specified and for the validation of the metamodel.

9. Fit the metamodel: We run the simulation to obtain those I/O data specified in Step 8 for fitting the metamodel. (We do not yet obtain the data for determining validity; see the next step.) From these data we estimate the parameter values of the metamodel, using (say) least squares. We evaluate these estimates, using mathematical and statistical criteria. For example, if the metamodel is a linear-regression model, then a statistical analysis can determine if the metamodel is overfitted (i.e., some parameters can be set to zero). If the assessment is not satisfactory, then we repeat Steps 7-9.

10. Determine the validity of the fitted metamodel: We run the simulation to obtain the data specified in Step 8 for validating the metamodel. We then determine if the metamodel satisfies the validity measures with respect to the simulation (specified in Step 6), first for the validity data set, and then for the fitting data set. If the metamodel satisfies the validity measures for both data sets, then we consider the metamodel valid relative to the simulation; else we repeat Steps 7-10. Lastly, we determine the validity of the metamodel versus the problem entity; the amount of testing and evaluation depends on the goal of the metamodel. If validity cannot be established with respect to the problem entity, then we repeat Steps 7-10.

3. Discussion of some steps

In this section we detail some of the steps in the metamodeling process presented in Section 2. We do not devote equal space to each step: we are brief on information widely known in the simulation literature. Specifically, Section 3.1 discusses Step 1 (determine the goal of the metamodel), Section 3.2 returns to Step 6 (specify the metamodel's validity measures and their required values), Section 3.3 details Step 8 (specify a design including tactical issues, and review the DOE), Section 3.4 treats Step 9 (fit the metamodel), and Section 3.5 covers Step 10 (determine the validity of the fitted metamodel).

3.1. Step 1: Determine the goal of the metamodel

3.1.1. Goal 1 (understanding)

We distinguish the following three levels of understanding.


(a) Direction of output: does the output increase or decrease for an increase in a specific input? This goal requires at least ordinal measurement scales for input and output. For this goal a metamodel can often be simple; in fact, it may be desirable to develop several simple models instead of one complex model. The signs of the individual parameters of the metamodel should support prior knowledge about the problem entity. For example, in a queuing problem the mean waiting time should increase with the traffic rate; so if a first-order polynomial is used and x_1 denotes the traffic rate, then its estimated coefficient β̂_1 should be non-negative.

(b) Screening: give a 'short list' of the most important factors. For this goal, a rather crude metamodel may suffice (of course there is always the danger that such a crude metamodel is misleading). For example, a first-order polynomial, possibly augmented with cross-products (two-factor interactions), is used by a technique called sequential bifurcation; see Bettonvil and Kleijnen (1997). An alternative technique is discussed in Saltelli et al. (1995).

(c) Main effects, interactions, quadratic effects, and other high-order effects: what are the effects of a specific factor, possibly in combination with other factors? The type of metamodel may still be simple: first- or second-order polynomials, possibly combined with transformations such as 1/x and log(x). For example, in a case study of Flexible Manufacturing Systems (FMSs) Kleijnen and Standridge (1988) use simulation and metamodeling: the metamodel suggests that of the four machine types, only types one and two are bottlenecks, and these two do interact. Other examples are the single-server queues with different priority rules, and the queuing networks, in Cheng and Kleijnen (1999): a second-degree polynomial can explain the 'exploding' behavior of such a simulated system as the traffic rate x approaches unity; even better is the explanation by a first-order polynomial multiplied by the factor 1/(1 - x).

Regression analysis aims at more understanding than classic Analysis of Variance (ANOVA) and its extensions, namely Multiple Comparison Procedures (MCPs) and Multiple Ranking Procedures (MRPs). Indeed, in ANOVA we test the null-hypothesis that a given number of system configurations (populations) have the same mean (no factor effect at all). Regression analysis, however, aims at estimating the magnitudes of the effects and at inter- and extrapolation. MCPs aim at detecting clusters of system configurations with the same mean per cluster. MRPs try to find the best system configuration among a limited set of configurations. See Ehrman et al. (1996), Fu (1994), and Goldsman and Nelson (1998).

Low-order polynomials are simpler to understand than splines and neural nets. It is easy to comprehend main effects, two-factor interactions, and quadratic effects; they can be easily displayed graphically.

The type of metamodel selected should be appropriate for the type of understanding desired. For example, if it is desired to understand what is occurring inside some system, then a metamodel such as a polynomial over the entire domain of applicability is probably more appropriate than (say) a spline metamodel (several simple metamodels fitted to subdomains of applicability). A spline model may be the best choice if the response surface is expected to be highly complex and the understanding desired is which inputs have the most effect on the output.

3.1.2. Goal 2 (prediction)

Prediction requires either the absolute or the relative magnitude of the output. For example, the goals of the simulation study in Kleijnen (1995a) are: (a) determine whether it is worthwhile to send a ship equipped with sonar into a certain area, for a mine search mission: an absolute scale is necessary; (b) determine the relative effects of certain technical sonar parameters: an interval or ratio scale suffices. Other examples are the queuing systems in Cheng and Kleijnen (1999); these examples require up to a fourth- and sixth-degree polynomial, respectively.

In general, if the metamodel is used for prediction, then the type of metamodel must be selected with extreme care: (a) The simulation's I/O transformation to be approximated by the metamodel often has complex behavior over the domain of applicability; see Sargent (1991). (b) The accuracy required for a predictive metamodel is usually very high.


If, for example, a polynomial is used directly on the simulation data, then it probably will have to be of high order (with numerous interaction terms). However, if a good transformation of the data can be found (see Cheng and Kleijnen, 1999), then a simpler polynomial regression model can be used. Several other types of metamodels may be appropriate; for example, neural nets and splines (see Section 1).

To predict an output for a new set of input values, the simulation itself can be used if the simulation run time is not prohibitive. In real-time control, however, a metamodel may be preferred; examples are found in production scheduling and financial markets.

3.1.3. Goal 3 (optimization)

Optimization is a well-known topic in Operations Research. Closely related is goal seeking: given a target value for the output, find the corresponding input values. There are many mathematical techniques; for example, Response Surface Methodology (RSM), sequential simplex search, genetic algorithms, simulated annealing, and tabu search; see Kleijnen (1999). Metamodels for optimization that account for both the mean and the variance of the output are the focus of Taguchi's methods; see Kleijnen and Gaury (1998) and Sanchez et al. (1996).

Consider one specific technique, namely RSM (books and review articles are referenced in Khuri, 1996b, p. 377). RSM relies on low-order polynomial regression metamodels. Local marginal effects are estimated to find the direction of improvement. In the local-exploration phase, RSM uses a sequence of first-order polynomial metamodels, combined with steepest-ascent search. In the final optimization phase, RSM uses a single second-order polynomial metamodel. Clearly, optimization is more demanding than 'understanding' and less demanding than prediction.

3.1.4. Goal 4 (V & V of simulation)

We present two ways that metamodels can aid goal 4. The first way determines whether the metamodel has effects with the correct signs: does the output in the metamodel respond to changes in the inputs in the direction that agrees with prior, qualitative knowledge about the simulated problem entity? A simple academic example is a queuing simulation: waiting time increases when the traffic rate increases. Suppose that for low values of the traffic rate the metamodel is a first-order polynomial. Then the estimated parameter relating these two variables should be positive; else there is an error somewhere. Two case studies are the ecological simulation in Kleijnen et al. (1992) and the military simulation in Kleijnen (1995a).

The second way to aid this goal becomes relevant when the problem entity is easily observable. Then comparisons can be made between the simulation's and the problem entity's outputs for numerous experimental conditions over the simulation's domain of applicability. Since the collection of data on the problem entity is usually costly, it is important to do this data gathering efficiently. So it must be decided how many experimental conditions to observe, and where to locate them. This depends on the complexity of the simulation's I/O transformation over the domain of applicability. For example, whether the metamodel is linear or quadratic guides the number and location of data points needed for comparison; that is, DOE can be applied. The resulting design is a plan for comparing the problem entity to the simulation for validation of the simulation. See Chen (1985).

In general, a simulation is supposed to meet the V & V requirements only for a certain application domain. Hence, V & V of simulations requires a test plan; we propose to base that plan on DOE. But DOE is substantially aided by the use of an explicit metamodel! However, that metamodel should be validated with respect to the simulation.

3.2. Step 6: Specify the metamodel's validity measures and their required values

Fig. 1 shows that we wish to determine the validity of the metamodel versus the problem entity. We may use the approaches also used for validating a simulation. High validity requires that problem entity outputs be obtained for numerous experimental conditions. Unfortunately, this is usually impossible.


Therefore, the validity of a metamodel is determined by making (i) many comparisons between the outputs of the metamodel and the simulation, and (ii) a few comparisons between the metamodel and the problem entity, using whatever approaches are appropriate. If the problem entity is unobservable, then it is impossible to determine whether the metamodel satisfies any specific numerical accuracy with respect to the problem entity. (A special case is the simulation of a computer system: the simulation usually runs slower than the real system, if the latter exists.)

Fig. 1 further shows that the differences between the metamodel's and the problem entity's responses result from a combination of two approximations: (i) metamodel versus simulation, and (ii) simulation versus problem entity. These two approximations either add to each other, or partially cancel each other. It is not uncommon to have different ranges of accuracy required of the simulation and of the metamodel.

Whenever the simulation has continuous factors, it is impossible to compare metamodel and simulation over the metamodel's complete domain. Instead, DOE determines the data points actually used in these comparisons. A metamodel may be valid for one subdomain, and not for another. For example, in queuing simulations a first-order polynomial regression may be valid for low traffic rates, whereas higher traffic rates require more sophisticated metamodels; see Cheng and Kleijnen (1999).

A metamodel is modified until it is valid for all experimental conditions tested, using the specified validity measures and their required values. When statistical tests are used to determine validity, the probabilities of type I and type II errors can sometimes be calculated; see Balci and Sargent (1981). A type II error occurs if the variance of an estimated factor effect is high (high output variance or small total sample size). A type I error occurs if that variance is small (large sample sizes do occur in simulation).

To decide whether to accept a specific model, we need a criterion, which depends on the model's goal. Some case studies use the Absolute Relative Error or ARE (say) r_1(w, y) = |(w - y)/w|, or simply r_1 or r (simulation output w and metamodel output y; see Fig. 2).

However, this measure is deficient if w may be zero; for example, w may be the Net Present Value (NPV) in a financial analysis such as the case study in Van Groenendaal and Kleijnen (1997). Therefore we may prefer the Absolute Error or AE (say) r_2 = |w - y| (the numerator of r_1). Kleijnen (1995a) gives as an example: 'if the target is not detected within 5 m, the weapon is ineffective'.

Another popular measure is the Mean Squared Prediction Error (MSPE); see Diebold and Mariano (1995). This reference also considers more general 'economic loss' functions, for example, asymmetric functions.

Besides the selection of a criterion r, we should quantify a threshold (say) r_max, for example, r_max = 0.1.

Since a metamodel is a derivative of the underlying simulation, both may use the same criterion. So ideally, the metamodel's output y should be compared with the problem entity's output z (Fig. 2), which gives r_1(z, y), analogous to r_1(z, w).

In the remainder of this section we focus on comparing the metamodel and the simulation. The value of the criterion r(w, y) varies with the 'system configuration', determined by DOE's vector of k factor values d = (d_1, ..., d_k)':

$$r_1(w(d), y(d)) = \left|\frac{w(d) - y(d)}{w(d)}\right|. \qquad (1)$$

These values may be characterized by their mean (say) c_1 or by their maximum (say) c_2:

$$c_1 = \int_{d} r(w(d), y(d))\,\mathrm{d}d, \qquad c_2 = \max_{d} r(w(d), y(d)). \qquad (2)$$

The choice between mean and maximum again depends on the problem. For example, in the insurance business, an individual policy holder may generate a loss, but on average the insurance company makes a profit. So, if a model is used many times and small errors may compensate large errors, then the mean c_1 is adequate. Examples are the neural-network metamodels for economic and financial problems in Verkooyen (1996). If, however, a single large error is catastrophic, then the maximum c_2 is a better characteristic. Examples are nuclear simulations and econometric time-series models. (Different criteria for model selection are examined in the statistical monograph by Linhart and Zucchini, 1986.)


How can the quantities c_1 and c_2 be estimated? Different statistics may be used to estimate the mean (c_1); the most popular is the sample average (say) r̄. This average has attractive properties (maximum likelihood; minimum variance) provided certain assumptions hold (e.g., normally, identically, and independently or NID distributed r). A more robust statistic is the median of the sample of size n, say r_(n/2) (see Kleijnen, 1987). To estimate the maximum (c_2), we may take the sample maximum r_(n). Diebold and Mariano (1995) discuss several statistical tests for comparing predictive accuracy.
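These estimators can be computed directly from paired outputs at the n design points. The following Python sketch is an illustration only (the function name validity_criteria and the use of NumPy are our own, not the paper's); it computes the ARE and AE per design point and the sample-based estimators of c_1 (average and median) and c_2 (maximum), assuming the simulation outputs are nonzero whenever the ARE is used.

```python
import numpy as np

def validity_criteria(w, y):
    """Estimate the validity criteria of Eq. (2) from paired outputs.

    w : simulation outputs at the n design points (assumed nonzero for ARE)
    y : metamodel predictions at the same design points
    Returns sample estimates of c1 (mean) and c2 (maximum) for the
    Absolute Relative Error r1 and the Absolute Error r2.
    """
    w, y = np.asarray(w, float), np.asarray(y, float)
    r1 = np.abs((w - y) / w)           # ARE per design point, Eq. (1)
    r2 = np.abs(w - y)                 # AE per design point
    return {
        "c1_hat_mean_ARE": r1.mean(),        # sample average estimates c1
        "c1_hat_median_ARE": np.median(r1),  # more robust estimator of c1
        "c2_hat_max_ARE": r1.max(),          # sample maximum estimates c2
        "c1_hat_mean_AE": r2.mean(),
        "c2_hat_max_AE": r2.max(),
    }

# Example: accept the metamodel if the estimated c2 stays below rmax = 0.1:
# crit = validity_criteria(w_sim, y_meta); ok = crit["c2_hat_max_ARE"] <= 0.1
```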

Two applications of the accuracy measure ARE and its maximum, but without an explicit a priori threshold r_max, are the coal-transportation system-dynamics study in Kleijnen (1995c) and the FMS simulation in Kleijnen and Standridge (1988). Both examples are deterministic simulations. (The mean AE is used in a case study on forecasting the dollar/Dutch guilder exchange rate in Diebold and Mariano (1995); that study, however, concerns an econometric time-series model.)

The literature on random simulations focuses on the precision of a single characteristic (e.g., mean steady-state waiting time). This precision is measured by the confidence interval half-width, expressed either in absolute units (say, seconds) or as a percentage (relative precision). Such simulations have intrinsic noise: while the input vector d_i is constant, the simulation output w_i still shows variation. This variation may be measured by the variance var(w|d_i) with i = 1, ..., n. Consequently, it is no longer obvious what the simulation output w is when measuring the accuracy; see Eq. (1). Actually this output has a statistical distribution determined by the simulation. This distribution may be characterized through different quantities, such as its mean E(w) and its 90% quantile w_{0.90}. These quantities may be estimated from simulation replications (alternatives are renewal analysis, spectral analysis, and standardized time series; see Alexopoulos and Seila, 1998). Then in Eq. (1) w becomes the estimated mean (say) w̄ or the estimated quantile ŵ_{0.90;m}, with m denoting the number of replications for the factor combination. We may formulate a null-hypothesis H_0 and an alternative hypothesis H_1 (analogous to r_2 and r_max):

$$H_0:\ |E(w) - E(y)| \le d, \qquad H_1:\ |E(w) - E(y)| > d, \qquad (3)$$

where the threshold d depends on the problem. Usually such a hypothesis is tested through a Student statistic, t_m. This statistic resembles the ARE, but now the standardization is done by the standard error, var(y - w̄)^{1/2}, not by the simulation output w. Actually, several hypotheses need to be formulated, one for each of the n factor combinations. Their simultaneous testing may use Bonferroni's inequality. Also see Diebold and Mariano (1995).
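Such a per-combination test might be sketched as follows in Python with SciPy. This is an illustration only, not the authors' exact procedure: it treats each metamodel prediction as a fixed number (ignoring its estimation variance) and applies the Bonferroni correction α/n mentioned in the text; the function name threshold_tests is hypothetical.

```python
import numpy as np
from scipy import stats

def threshold_tests(w_reps, y_hat, d=0.0, alpha=0.05):
    """Test H0: |E(w_i) - E(y_i)| <= d for each of the n factor combinations,
    cf. Eq. (3), with Bonferroni control of the overall type I error.

    w_reps : list of arrays; w_reps[i] holds the m_i replicated simulation
             outputs at combination i
    y_hat  : metamodel predictions, treated here as fixed numbers (a
             simplification: their estimation variance is ignored)
    """
    n = len(w_reps)
    reject = []
    for w_i, y_i in zip(w_reps, y_hat):
        w_i = np.asarray(w_i, float)
        m = w_i.size
        se = w_i.std(ddof=1) / np.sqrt(m)              # standard error of w-bar
        t_stat = (abs(w_i.mean() - y_i) - d) / se      # exceedance over threshold d
        t_crit = stats.t.ppf(1 - alpha / n, df=m - 1)  # Bonferroni: alpha / n
        reject.append(t_stat > t_crit)
    return reject  # True flags a lack of validity at combination i
```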

Instead of the classic statistical test statistics (such as the t statistic), we may apply bootstrapping, which allows the estimation of the distribution of any statistic, for any distribution; see Efron and Tibshirani (1993). Recently, bootstrapping has been applied to the validation of both simulations and metamodels, by Kleijnen et al. (1998) and Kleijnen et al. (1999), respectively.

3.3. Step 8: Specify a design including tactical issues, and review the DOE

To save space we assume that the reader has some familiarity with classic DOE, which assumes a low-order polynomial regression metamodel for a single output with NID outputs with common variance (say) σ². For other metamodels there are no standard solutions; instead a computerized search for 'optimal' designs, given the metamodel, is performed. For example, Sacks et al. (1989) assume that the response function is smooth, and that a covariance-stationary process is a valid model for the fitting errors (the closer two factor combinations are, the more correlated the fitting errors are). They use splines as metamodels. Their designs do not show regular geometric patterns. Also see Welch et al. (1992).

We introduce some more symbols: main effects β_j (j = 1, ..., k); two-factor interactions β_{j;j'} (j < j'; j' = 2, ..., k); quadratic effects β_{j;j}; total number of regression parameters q; bold letters for matrices, including vectors. For certain k values a design is saturated; that is, n = q.


DOE implies an n × k design matrix D = (d_{i;j}). For example, a first-order polynomial metamodel is y = X_1 β_1, where X_1 is the n × (k + 1) matrix of independent variables with row vectors x_i = (1, d_{i;1}, ..., d_{i;k}); X_1 is fixed by D, since X_1 = (1, D) with 1 = (1, ..., 1)'.

Classic DOE uses Ordinary Least Squares (OLS) fitting (a mathematical criterion). So the OLS estimator β̂ is computed by fitting the specified linear-regression metamodel to a set of I/O data that consist of the n × q matrix of independent regression variables X and the n-dimensional vector of simulation outputs w = (w_i); Fig. 2 shows these data for a single factor combination. This gives β̂ = (X'X)^{-1} X'w, which assumes that the inverse exists; DOE must ensure that X (fixed by D) is indeed not collinear.
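A minimal sketch of this OLS fit, assuming NumPy and a first-order polynomial in standardized factors (the helper name ols_metamodel is ours, not from the paper):

```python
import numpy as np

def ols_metamodel(D, w):
    """Fit a first-order polynomial metamodel by OLS, as in classic DOE.

    D : n x k design matrix of standardized factor values
    w : n-vector of simulation outputs (one output per design point)
    Returns the estimate b = (X'X)^{-1} X'w with X = (1, D).
    """
    D = np.asarray(D, float)
    X = np.column_stack([np.ones(len(D)), D])   # X1 = (1, D)
    XtX = X.T @ X
    # DOE should make X orthogonal (and certainly non-collinear);
    # otherwise the inverse below does not exist.
    b = np.linalg.solve(XtX, X.T @ np.asarray(w, float))
    return b, X
```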

    Knowledge of the problem entity should inspire

    the selection of the metamodel type, which in turn

    guides the selection of the design.

Generalizations of OLS are Weighted LS (WLS) and Generalized LS (GLS). WLS uses the standard deviations of the outputs, σ_i, as weights. GLS is recommended when common pseudorandom numbers are used, so that the simulation outputs are correlated. For details on WLS and GLS we refer to Kleijnen (1999).

The goal of DOE may be to minimize the variances of the estimated factor effects β̂_j. It can be proven that an orthogonal design matrix D is optimal, given the classic assumptions. Classic design classes are resolution III or R-3, R-4, R-5, and Central Composite (CC) designs. We briefly discuss these designs because we shall apply them later; more examples and references are given in Kleijnen (1999).

By definition, R-3 gives unique estimates of the k + 1 parameters in a first-order polynomial, using only n = k + 4 - (k modulo 4) combinations; for example, k = 7 requires n = 8 = 2^(7-4).

R-4 gives unique estimates of all first-order effects, plus unique estimates of certain linear combinations of two-factor interactions. For example, when k = 8 then a 2^(8-4) design may estimate β_{1;2} + β_{4;8} + β_{3;7} + β_{5;6} (see Kleijnen, 1999, pp. 304-305). R-4 is simply constructed from R-3 through the foldover principle: to the original R-3 design matrix D, we add the mirror image -D, which yields an R-4 design with n equal to 2k rounded upward to the next power of two. For example, n = 16 when k = 5, 6, 7 or 8.
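As an illustration of the foldover principle, the following Python sketch builds a saturated 2^(7-4) R-3 design for k = 7 (using the standard generators 4 = 12, 5 = 13, 6 = 23, 7 = 123, an assumption on our part, since the paper does not list generators) and appends its mirror image to obtain an R-4 design with n = 16:

```python
import numpy as np
from itertools import product

def r3_design_k7():
    """Saturated resolution-III 2^(7-4) design: n = 8 runs for k = 7 factors.
    Columns 1-3 form a full 2^3 factorial; generators 4=12, 5=13, 6=23, 7=123
    define the remaining columns (a standard construction)."""
    base = np.array(list(product([-1, 1], repeat=3)), float)   # columns 1,2,3
    c4 = base[:, 0] * base[:, 1]
    c5 = base[:, 0] * base[:, 2]
    c6 = base[:, 1] * base[:, 2]
    c7 = base[:, 0] * base[:, 1] * base[:, 2]
    return np.column_stack([base, c4, c5, c6, c7])

def foldover(D):
    """Foldover principle: append the mirror image -D to obtain an R-4 design."""
    return np.vstack([D, -D])

D_r3 = r3_design_k7()   # 8 x 7 matrix of -1/+1 factor levels
D_r4 = foldover(D_r3)   # 16 x 7: main effects unbiased by two-factor interactions
```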

R-5 gives estimators of all main effects and all two-factor interactions that are not biased by each other. Because R-5 requires many factor combinations, these designs are used only for small values of k.

CC assumes a second-degree polynomial. A CC design is easily constructed once an R-5 design is available: simply add 2k points, changing each factor one at a time, once increasing and once decreasing each factor in turn; also add the 'center' point. A disadvantage is its high number of combinations, n ≫ q = 1 + k + k(k - 1)/2 + k, and its non-orthogonality.
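A sketch of this CC construction in Python; for illustration the two-level part defaults to the full 2^k factorial, whereas in practice an R-5 fraction would be used (the function name and the axial distance of one standardized unit are our own choices):

```python
import numpy as np
from itertools import product

def central_composite(k, factorial_part=None, axial=1.0):
    """Construct a central composite design in standardized factors.

    factorial_part : the two-level (R-5) part; defaults to the full 2^k
                     factorial for illustration
    axial          : distance of the 2k axial points from the center
    """
    if factorial_part is None:
        factorial_part = np.array(list(product([-1, 1], repeat=k)), float)
    axial_points = np.zeros((2 * k, k))
    for j in range(k):                        # change one factor at a time,
        axial_points[2 * j, j] = +axial       # once increasing
        axial_points[2 * j + 1, j] = -axial   # and once decreasing
    center = np.zeros((1, k))                 # the 'center' point
    return np.vstack([factorial_part, axial_points, center])

D_cc = central_composite(k=3)   # 8 + 6 + 1 = 15 combinations
```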

These four design classes imply a k-dimensional hypercube for the experimental domain, expressed in standardized factors. We recommend the following standardization (see Bettonvil and Kleijnen, 1997). Let the original factor j, denoted by v_j, range between lower value l_j and upper value u_j; that is, the simulation is not valid outside that range. Measure variation by the half-range h_j = (u_j - l_j)/2 and location by the mean c_j = (u_j + l_j)/2. Then the design matrix D = (d_{i;j}) uses d_{i;j} = (v_{i;j} - c_j)/h_j.

If a qualitative factor (nominal scale) has more than two levels or 'values', then several binary variables should be used for coding this factor in regression analysis. The sonar case study in Kleijnen (1995a) erroneously coded three kinds of bottom (rock, sand, mud) as the values 1, 2, and 3. Most ANOVA software helps avoid such errors.
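The standardization and the binary coding of a qualitative factor might look as follows in Python; the symbols h_j and c_j are our own notation for the half-range and center introduced above, and the function names are illustrative:

```python
import numpy as np

def standardize(V, lower, upper):
    """Standardize original factor values to the [-1, +1] hypercube.

    V            : n x k matrix of original factor values v_ij
    lower, upper : length-k vectors with the validity range (l_j, u_j)
    Uses the half-range h_j = (u_j - l_j)/2 and center c_j = (u_j + l_j)/2,
    so d_ij = (v_ij - c_j) / h_j.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    h = (upper - lower) / 2.0
    c = (upper + lower) / 2.0
    return (np.asarray(V, float) - c) / h

def dummy_code(levels, categories):
    """Code a qualitative factor with binary (0/1) variables rather than the
    erroneous values 1, 2, 3; the first category serves as the baseline."""
    return np.column_stack([(np.asarray(levels) == cat).astype(float)
                            for cat in categories[1:]])

# Example: three kinds of bottom, coded with two binary variables.
# X_bottom = dummy_code(["rock", "mud", "sand", "rock"], ["rock", "sand", "mud"])
```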

Next we focus on stagewise DOE, because in simulation the computer generates runs one after the other. Moreover, classic DOE assumes a specific polynomial regression metamodel, whereas we suppose that it is uncertain what the correct degree of the polynomial is. Therefore we recommend R-4 over R-3. The foldover principle, however, implies that in the first stage we run an R-3 design to estimate the first-order effects. Some R-3 designs are saturated, so it is impossible to apply cross-validation. However, certain values of k give a non-saturated design; for example, k = 4, 5 or 6 requires n = 8.


After we have run the R-3 design, we run the mirror design. Unlike classic DOE, we may execute the combinations of the mirror design one after another, and apply cross-validation. Once we have run all combinations of the mirror design, we compute sums of certain two-factor interactions. These tell which interactions explain the lack of fit of the first-order metamodel. For example, suppose factor 1 has an unimportant first-order effect and this factor is not involved in an important sum of interactions β_{1;2}, ..., β_{1;k}. Then we may remove this factor, to avoid overfitting. As in any modeling effort, there are no foolproof recipes: factor 1 may still have a quadratic effect.

It may take too much time to run the mirror design. Then we may run only the center point.

If we find out that a second-order polynomial metamodel is needed, then we expand the R-3 or R-4 design to a CC design. Again, stagewise experimentation is possible: add 2k axial points plus the center point. The R-4 design may not be a proper subset of the CC design; then some combinations may be used for checking the fit of the second-order polynomial metamodel.

Once we have run a CC design and fitted a second-order polynomial, we check this model's fit: third-order effects may be important. As new check-points we take combinations not yet simulated: part of the CC design is a fractional two-level factorial (e.g., 2^(k-p)), assuming k > 4 (see Kleijnen, 1999, p. 309); hence a fraction of the full 2^k factorial is not yet simulated. Which check-points to select may be determined randomly, if no other information is available.

The CC's check-points require extrapolation, whereas the center point implies interpolation. In general, a metamodel is more reliable when used for interpolation; it may be dangerous when used to extrapolate the simulated behavior far outside the domain simulated in the DOE.

3.4. Step 9: Fit the metamodel

After the parameters of a metamodel have been estimated, the fit of the metamodel should be evaluated by using one or more measures. The lack-of-fit measures discussed in this section may be applied to any type of metamodel (linear regression, neural net, etc.), fitted to either random or deterministic simulation output (Section 4 will focus on random simulations and linear-regression metamodels). Random simulation usually has several observations per combination, say m_i. So the total number of simulation observations (say) N equals Σ_{i=1}^{n} m_i. Let ŷ_i denote the metamodel's output for the estimated parameter vector β̂; let w_{i;r} denote simulation observation r for combination i; w̄ is the average of all N simulation outputs.

The classic measure is the coefficient of determination

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\sum_{r=1}^{m_i} (\hat{y}_i - w_{i,r})^2}{\sum_{i=1}^{n}\sum_{r=1}^{m_i} (w_{i,r} - \bar{w})^2}. \qquad (4)$$

R² equals one (perfect fit) if all N metamodel outputs equal their corresponding simulation outputs, ŷ_i = w_{i;r}, which holds if N = q (saturated design with m_i = 1). Because R² always increases as more regression variables are added (higher q), the adjusted R² is introduced:

$$R^2_{\mathrm{adjusted}} = 1 - (1 - R^2)\frac{N - 1}{N - q}. \qquad (5)$$

Unfortunately, there is no simple lower threshold for this criterion; bootstrapping may help.
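A small Python sketch of Eqs. (4) and (5), assuming replicated simulation outputs are available per design point (the function names are illustrative, not the paper's):

```python
import numpy as np

def r_square(y_hat, w_reps):
    """Coefficient of determination, Eq. (4).

    y_hat  : metamodel outputs for the n design points
    w_reps : list of arrays; w_reps[i] holds the m_i simulation outputs
             observed at design point i
    """
    all_w = np.concatenate([np.asarray(w, float) for w in w_reps])
    w_bar = all_w.mean()                         # average of all N outputs
    sse, sst = 0.0, 0.0
    for y_i, w_i in zip(y_hat, w_reps):
        w_i = np.asarray(w_i, float)
        sse += np.sum((y_i - w_i) ** 2)          # metamodel vs. simulation
        sst += np.sum((w_i - w_bar) ** 2)        # simulation vs. grand mean
    return 1.0 - sse / sst

def adjusted_r_square(r2, N, q):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1)/(N - q), Eq. (5)."""
    return 1.0 - (1.0 - r2) * (N - 1) / (N - q)
```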

The linear correlation coefficient ρ was originally developed to characterize a bivariate normal variable (say) (w, y): all observations lie on a straight line with intercept a_0 and slope a_1 given by

$$a_0 = E(y) - a_1 E(w), \qquad a_1 = \rho(w, y)\sqrt{\operatorname{var}(y)/\operatorname{var}(w)}. \qquad (6)$$

Hence, if ρ(w, y) = 1, and y and w have equal means and equal variances, then this line has slope one and intercept zero. In metamodeling, however, this measure must be interpreted with care: in Fig. 2 the variables w and y are not bivariate normal. Nevertheless, y can be regressed on w. Ideally, simulation and metamodel outputs are equal (see R²), which means a zero intercept and unit slope. A case study on gas transmission in Indonesia indeed uses such a plot; see Van Groenendaal and Kleijnen (1997).


Cross-validation ('fitting to subsets' would be a better term) uses the fitted metamodel to predict the outcomes for new combinations of the simulation, and to compare these predictions with the corresponding simulation responses. 'Leave-one-out' cross-validation eliminates one combination (instead of adding new combinations); re-estimates the metamodel from the remaining n - 1 combinations, assuming n - 1 observations suffice (non-saturated design); recomputes the forecast ŷ and compares this forecast with the simulation output w; and repeats this elimination for a different combination. In practice, the relative prediction errors ŷ/w are often 'eye-balled', given the goals of the model. Examples are the deterministic simulations for an FMS in Kleijnen and Standridge (1988) and for a coal transport system in Kleijnen (1995c). (We shall discuss cross-validation for random simulation in Section 4.)
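Leave-one-out cross-validation for a linear-regression metamodel can be sketched as follows (Python/NumPy; the brute-force refitting shown here is replaced by a hat-matrix shortcut in Section 4, and the function name is our own):

```python
import numpy as np

def leave_one_out(X, w):
    """Leave-one-out cross-validation for a linear-regression metamodel.

    X : n x q matrix of regression variables (n > q, non-saturated design)
    w : n-vector of simulation outputs (one per design point)
    Returns the cross-validation predictions y_(-i) and the relative
    prediction errors y_(-i)/w_i that are often 'eye-balled' in practice.
    """
    X, w = np.asarray(X, float), np.asarray(w, float)
    n = len(w)
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                     # delete combination i
        b_i, *_ = np.linalg.lstsq(X[keep], w[keep], rcond=None)
        y_loo[i] = X[i] @ b_i                        # re-forecast point i
    return y_loo, y_loo / w
```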

Only if the metamodel as a whole fits well (measured as explained above) should analysts examine individual parameters. In cross-validation they may observe how these estimated parameters change as simulated combinations are deleted; if the specified metamodel is a good approximation, then these estimates remain stable. Case studies are the FMS and the coal-transport studies mentioned above. In these polynomial metamodels the parameters can be interpreted as factor effects, so the metamodel can be used for explanation, whereas the ratios ŷ/w concern prediction. Other metamodels (e.g., neural nets) are harder to interpret.

Eliminating noisy parameters may decrease the variance of the remaining parameter estimators β̂ and of the corresponding predictions ŷ. For example, for an M/M/1 simulation Cheng and Kleijnen (1999) fit a first-order and a second-order polynomial, respectively; the former has a variance seven times smaller than the overfitted metamodel! Elimination of unimportant factors may also increase the adjusted R². Kleijnen and Standridge (1988) apply cross-validation to a first-order polynomial with k = 4, which gives unstable estimates for factors 1 and 3 and high values for the relative prediction errors. Therefore they replace the first-order polynomial by a second-order polynomial in only the k = 2 stable factors; they set the non-stable factor effects to zero. Now they get an adequate metamodel: stable effects β̂_2, β̂_4, β̂_{2;4} and low prediction errors. In general, the mission of science is to come up with simple explanations: parsimony or Occam's razor.

3.5. Step 10: Determine the validity of the fitted metamodel

The effort devoted to validating a metamodel depends on the goal of this metamodel. If the goal is 'understanding', then a 'reasonable' effort should be spent. However, for 'prediction' extensive validation should be performed. In case of 'optimization', a series of metamodels is usually developed; validity testing may be limited to the last few metamodels, which are tested extensively. Finally, if the goal is 'aiding in V & V of a simulation', then validation is usually conducted only with respect to the simulation (not the problem entity).

The data for validating a metamodel versus the simulation must first be generated. In Step 8 we developed a DOE for data generation in two parts: one for fitting, and one for validating. Prior to generating the data for validation, the DOE needs to be reviewed. It may need modification because the original design was 'expanded' in fitting and checking the fit. The DOE should allow testing, during validation, for a higher-order model than the fitted metamodel. After the metamodel has passed the validity tests with the new data, the data for fitting may also be used for additional validity testing. This is especially worthwhile if the measures used in validation and fitting are different.

Next we conduct validation of the metamodel versus the problem entity. The extent of this validation again depends on the goal of the metamodel. The same methods used for validating simulations (see Section 1) can be used for validating metamodels versus the problem entity. The criteria c_1 and c_2 in Eq. (2), and the statistical formulas given above, can now be used for testing validity, but the symbol w (simulation output) is now replaced by z (problem entity output). Suppose that the problem entity is observable and data have been collected on it and used for validating the simulation.


Then it seems reasonable to use these same data when validating the metamodel versus the problem entity. The resulting statistical confidence level seems to deserve more research.

4. A procedure for linear-regression metamodeling in random simulation

In this section we focus on random simulations and linear-regression metamodels, highlighting Steps 8-10. If this metamodel is estimated through OLS, the variance of its output is var(ŷ) = x' cov(β̂) x; for GLS we refer to Kleijnen (1999). Denoting cross-validation with elimination of I/O combination i by the subscript -i gives

$$\hat{\beta}_{-i} = (X_{-i}' X_{-i})^{-1} X_{-i}' w_{-i}, \quad i = 1, \ldots, n. \qquad (7)$$

Actually, these n recomputations can be avoided, using the original β̂ and the so-called hat matrix X(X'X)^{-1}X' (see Kleijnen and Van Groenendaal, 1992, pp. 156-157). Using β̂_{-i} gives the predictor ŷ_{-i}. The Studentized cross-validation prediction error is

$$t^{(i)} = \frac{w_i - \hat{y}_{-i}}{\{\operatorname{var}(w_i) + \operatorname{var}(\hat{y}_{-i})\}^{1/2}}, \qquad (8)$$

where the degrees of freedom of this Student statistic are unknown; Kleijnen (1992) uses m - 1. To control the type I error rate, we apply Bonferroni's inequality: replace α by α/n. Diagnostic statistics in modern regression software related to our cross-validation measures are PRESS, DFFITS, DFBETAS, and Cook's D.
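A hedged Python sketch of this shortcut: it uses the hat matrix to obtain the deleted prediction errors w_i - ŷ_{-i} from a single fit and studentizes them in a simplified way. The approximation of var(ŷ_{-i}) below, and the function name, are our own and are not claimed to be the paper's exact formulas.

```python
import numpy as np

def studentized_cv_errors(X, w_bar, var_w_bar):
    """Cross-validation without n refits, using the hat matrix
    H = X (X'X)^{-1} X' (cf. Eqs. (7)-(8), in simplified form).

    X         : n x q matrix of regression variables
    w_bar     : n-vector of average simulation outputs per combination
    var_w_bar : n-vector with the estimated variances of those averages
                (e.g. s_i^2 / m_i from the replications)
    The identity w_i - y_(-i) = e_i / (1 - h_ii) gives the deleted prediction
    errors from a single fit; var(y_(-i)) is approximated below by
    h_ii/(1 - h_ii) times var(w_bar_i), which assumes roughly common variance.
    """
    X = np.asarray(X, float)
    w_bar = np.asarray(w_bar, float)
    var_w_bar = np.asarray(var_w_bar, float)
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    h = np.diag(H)
    e = w_bar - H @ w_bar                          # ordinary residuals
    e_del = e / (1.0 - h)                          # w_i - y_(-i)
    var_y_del = var_w_bar * h / (1.0 - h)          # rough variance of y_(-i)
    t = e_del / np.sqrt(var_w_bar + var_y_del)
    # Bonferroni: compare each |t| with the alpha/n critical value; the
    # degrees of freedom are debatable (Kleijnen, 1992 uses m - 1).
    return t, h
```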

An alternative is the F lack-of-fit test, which compares two estimators of the common variance of the simulation outputs (σ²). The first estimator is based on replication:

$$s_i^2 = \sum_{j=1}^{m_i} (w_{i,j} - \bar{w}_i)^2 / (m_i - 1), \qquad (9)$$

where w̄_i is the average of the m_i independent simulation replications at combination i. The weighted average of these n estimators is Σ_{i=1}^{n} (m_i - 1) s_i² / (N - n). The second variance estimator is Σ_{i=1}^{n} m_i e_i² / (n - q), with the estimated residuals e_i = w̄_i - ŷ_i. The latter estimator is unbiased only if the regression model is specified correctly; otherwise it overestimates. The two estimators are compared through F_{n-q, N-n}. Rao (1959) extends this test from OLS to GLS. Kleijnen (1992) shows that Rao's test is better than cross-validation if the simulation responses are symmetrically distributed, for example, normally or uniformly distributed; lognormally distributed responses are better analyzed through cross-validation (or bootstrapping of Rao's statistic; see Kleijnen et al. (1999)).
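A sketch of this F lack-of-fit test in Python/SciPy, under the reconstruction of the two variance estimators given above (an assumption on our part, since the original formulas are only partly legible in this transcript):

```python
import numpy as np
from scipy import stats

def lack_of_fit_F(w_reps, y_hat, q):
    """Classic F lack-of-fit test comparing two estimators of the common
    variance of the simulation outputs (cf. Eq. (9) and the text).

    w_reps : list of arrays; w_reps[i] holds the m_i replications at point i
    y_hat  : metamodel predictions for the n design points
    q      : number of regression parameters in the metamodel
    """
    n = len(w_reps)
    m = np.array([len(w) for w in w_reps])
    N = m.sum()
    w_bar = np.array([np.mean(w) for w in w_reps])
    s2 = np.array([np.var(w, ddof=1) for w in w_reps])        # Eq. (9)
    pure_error = np.sum((m - 1) * s2) / (N - n)                # replication-based
    lack_of_fit = np.sum(m * (w_bar - np.asarray(y_hat)) ** 2) / (n - q)
    F = lack_of_fit / pure_error
    p_value = stats.f.sf(F, n - q, N - n)                      # F_{n-q, N-n}
    return F, p_value
```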

Most simulations have multiple outputs (e.g., customer's queuing time and server's idle time; mean and 90% quantile). In practice, multiple outputs are handled through the application of the techniques of this paper per output type. Actually, for linear-regression metamodels Khuri (1996b, p. 385) proves that the best linear unbiased estimator (BLUE) of the factor effects for the various responses remains the same as the OLS estimators obtained from fitting per individual output, assuming a single experimental design is used for the data generation of all outputs. Ideally, the design should account for the presence of multiple outputs; see Khuri (1996a, b) and Kleijnen (1987). Simultaneous inference may be based on Bonferroni's inequality. Optimization in the presence of multiple responses is discussed in Kleijnen (1999).

If a metamodel does not pass the fitting and validation tests, then instead of adding interactions and simulating the combinations of the augmented design, we may try transformations or disaggregate an independent variable into its components (e.g., disaggregate traffic rate into arrival rate and service rate).

In Fig. 3 we give a flowchart for this metamodeling process, focusing on Steps 8-10 restricted to linear-regression metamodels. We add the following comments.

Blocks 13, 16, and 19 state 'Go back to 1'. Obviously, in such an iteration a step may degenerate (e.g., Step 3, select type of design, may use exactly the same design as the one used in the preceding iteration). A practical example is the FMS case study in Kleijnen and Standridge (1988): a first-order polynomial in four factors is rejected, and replaced by a first-order polynomial in only two of the original four factors, augmented with the interaction between these two remaining factors; the same design with its I/O simulation data is used.


Fig. 3. A procedure for linear-regression metamodeling in random simulation.


Goals may affect certain blocks; for example, optimization through RSM implies that block 2 will always select a first-order polynomial in the first iterations. Yet we feel that the figure is suited to all four goals of metamodeling: this figure is based on years of experience with linear regression for all four goals of metamodeling.

After validating the metamodel versus the simulation, this metamodel should be validated against the problem entity; to save space, these steps are not displayed in the figure.

5. Research issues

A few academic studies use mathematical norms other than L2 (least squares); see Kleijnen (1987). Well-known alternative norms are L1 (sum of absolute fitting errors) and L∞ (Tchebyshev, maximum error):

$$L_1:\ \min_{\beta} \sum_{i=1}^{n} |w_i - y_i(\beta)|, \qquad L_\infty:\ \min_{\beta} \max_{i} |w_i - y_i(\beta)|. \qquad (10)$$

Current practice is as follows. Simulationists use a convenient mathematical norm to fit a specific metamodel to the simulation I/O data. Next, they often use a different norm when they examine the resulting fit and validity of the metamodel! Examples are provided by the case studies mentioned above: Kleijnen (1995a) and Kleijnen and Standridge (1988) use L2 (OLS) to compute the estimated parameter vector β̂, but L∞ to decide on the adequacy of the resulting metamodel. Verkooyen (1996), however, uses the same norm, namely L2, in both fitting and validating his neural net. No research has yet been done on how to compute the metamodel's parameters such that c_1 (mean inaccuracy in validation; see Eq. (2)) or c_2 (maximum) is minimized.
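For illustration, both alternative norms of Eq. (10) can be minimized numerically; the following Python sketch uses a generic Nelder-Mead search from SciPy starting at the OLS solution (dedicated linear-programming formulations exist for both norms and would be preferred in practice; the function name is ours):

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_norm(X, w, norm="L1"):
    """Fit a linear metamodel y = X b under the L1 or L-infinity norm of
    Eq. (10), starting from the OLS (L2) solution.

    This is a generic numerical sketch; it does not reproduce any fitting
    procedure from the paper.
    """
    X, w = np.asarray(X, float), np.asarray(w, float)
    b0, *_ = np.linalg.lstsq(X, w, rcond=None)        # L2 starting point
    if norm == "L1":
        loss = lambda b: np.sum(np.abs(w - X @ b))    # sum of absolute errors
    else:                                             # "Linf": Tchebyshev
        loss = lambda b: np.max(np.abs(w - X @ b))    # maximum absolute error
    res = minimize(loss, b0, method="Nelder-Mead")
    return res.x
```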

    In random simulations, cross-validation and

    the lack-of-t F-statistic are usually applied to test

    zero lack of t. However, we may wish to test for a

    non-zero threshold (d > 0 in Eq. (3)).

    Accuracy of metamodel versus problem entity

    (not simulation) has not been investigated in the

    literature.

    The four dierent goals of metamodeling may

    require dierent metamodel types and dierent

    accuracies. Fig. 3 neglects this problem; more re-

    search is needed.

    Multi-variate regression analysis and multi-

    variate DOE for metamodeling certainly deserve

    more research.

A procedure such as that outlined in Fig. 3 (linear

    regression in random simulation) should also be

    developed for other metamodel types such as

    neural nets and splines.

    6. Summary

    Because metamodels are often used in an ad

hoc way, we derived a methodology for developing metamodels with the following characteristics: (i) We strictly distinguished between fitting and vali-

    dating. (ii) We emphasized the role of the problem

    entity. (iii) We distinguished four types of goal.

    This yielded a metamodeling process with 10 steps.

    We also gave a detailed procedure for validating

    linear-regression metamodels in random simula-

    tion, suitable to all four goals.

    We covered several metamodel types, including

    linear regression and neural nets, but we focused

on polynomial metamodels. The four types of goals are: (i) understanding, (ii) prediction, (iii)

    optimization, and (iv) V & V of the underlying

    simulation.

    Our methodology with 10 steps includes classic

DOE and standard measures of fit, such as the R-square coefficient and various cross-validation

    measures. The methodology, however, also in-

    cludes stagewise DOE and several validation cri-

    teria, measures, and estimators.

    Acknowledgements

CentER provided financial support enabling

    R.G. Sargent to visit Tilburg University during

    one month for research on the topic of this paper.

    References

Alexopoulos, C., Seila, A.F., 1998. Output data analysis. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York.


Balci, O., Sargent, R.G., 1981. A methodology for cost-risk analysis in the statistical validation of simulation models. Communications of the ACM 24 (4), 190–197.
Barton, R.R., 1993. New tools for simulation metamodels. IMSE Working Paper 93-110, Department of Industrial and Management Systems Engineering, Penn State University, University Park, PA 16802.
Barton, R.R., 1994. Metamodeling: A state of the art review. In: Tew, J.D., Manivannan, S., Sadowski, D.A., Seila, A.F. (Eds.), Proceedings of the 1994 Winter Simulation Conference, pp. 237–244.
Beck, M.B., Ravetz, J.R., Mulkey, L.A., Barnwell, T.O., 1997. On the problem of model validation for predictive exposure assessments. Stochastic Hydrology and Hydraulics 11, 229–254.
Bettonvil, B., Kleijnen, J.P.C., 1997. Searching for important factors in simulation models with many factors: Sequential bifurcation. European Journal of Operational Research 96 (1), 180–194.

    Chen, B., 1985. A statistical validation procedure for discrete

    event simulation over experimental regions. Ph.D. Disser-

    tation, Department of Industrial Engineering and Opera-

    tions Research, Syracuse University, Syracuse, NY.

    Cheng, R.C.H., Kleijnen, J.P.C., 1999. Improved designs of

    queuing simulation experiments with highly heteroscedastic

    responses. Operations Research (forthcoming).

Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13 (3), 253–263.
Donohue, J.M., 1995. The use of variance reduction techniques in the estimation of simulation metamodels. In: Alexopoulos, C., Kang, K., Lilegdon, W.R., Goldsman, D. (Eds.), Proceedings of the 1995 Winter Simulation Conference, pp. 195–199.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall, New York.
Ehrman, C.M., Hamburg, M., Krieger, A.M., 1996. A method for selecting a subset of alternatives for future decision making. European Journal of Operational Research 96, 407–416.

    Friedman, L.W., 1996. The Simulation Metamodel. Kluwer,

    Dordrecht, The Netherlands.

Fu, M.C., 1994. Optimization via simulation: A review. Annals of Operations Research 53, 199–247.
Goldsman, D., Nelson, B.L., 1998. Comparing systems via simulation. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York.
Huber, K.P., Berthold, M.R., Szczerbicka, H., 1996. Analysis of simulation models with fuzzy graph based metamodeling. Performance Evaluation 27/28, 473–490.
Khuri, A.I., 1996a. Analysis of multiresponse experiments: A review. In: Ghosh, S. (Ed.), Statistical Design and Analysis of Industrial Experiments. Marcel Dekker, New York, pp. 231–246.
Khuri, A.I., 1996b. Multiresponse surface methodology. In: Ghosh, S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 13. Elsevier, Amsterdam, pp. 377–406.

    Kleijnen, J.P.C., 1987. Statistical tools for simulation practi-

    tioners. Marcel Dekker, New York.

Kleijnen, J.P.C., 1992. Regression metamodels for simulation with common random numbers: comparison of validation tests and confidence intervals. Management Science 38 (8), 1164–1185.
Kleijnen, J.P.C., 1995a. Case study: Statistical validation of simulation models. European Journal of Operational Research 87 (1), 21–34.
Kleijnen, J.P.C., 1995b. Verification and validation of simulation models. European Journal of Operational Research 82 (1), 145–162.
Kleijnen, J.P.C., 1995c. Sensitivity analysis and optimization of system dynamics models: Regression analysis and statistical design of experiments. System Dynamics Review 11 (4), 275–288.
Kleijnen, J.P.C., 1999. Experimental design for sensitivity analysis, optimization, and validation of simulation models. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York. (Preprint: CentER Discussion Paper, no. 9752.)
Kleijnen, J.P.C., Gaury, E., 1998. Risk analysis of robust system design. In: Medeiros, D.J., Watson, E.F., Carson, J.S., Manivannan, M.S. (Eds.), Proceedings of the 1998 Winter Simulation Conference, pp. 1533–1540.
Kleijnen, J.P.C., Standridge, C., 1988. Experimental design and regression analysis: An FMS case study. European Journal of Operational Research 33 (3), 257–261.

    Kleijnen, J.P.C., van Groenendaal, W., 1992. Simulation: A

    statistical perspective. Wiley, Chichester.

    Kleijnen, J.P.C., Cheng, R.C.H., Bettonvil, B., 1998. Validation

    of trace-driven simulation models: bootstrapped tests.

    Working Paper.

Kleijnen, J.P.C., Feelders, A.J., Cheng, R.C.H., 1998. Bootstrapping and validation of metamodels in simulation. In: Medeiros, D.J., Watson, E.F., Carson, J.S., Manivannan, M.S. (Eds.), Proceedings of the 1998 Winter Simulation Conference, pp. 701–706.
Kleijnen, J.P.C., Van Ham, G., Rotmans, J., 1992. Techniques for sensitivity analysis of simulation models: a case study of the CO2 greenhouse effect. Simulation 58 (6), 410–417.

    Linhart, H., Zucchini, W., 1986. Model selection. Wiley, New

    York.

Pierreval, H., 1996. A metamodel approach based on neural networks. International Journal in Computer Simulation 6 (3), 365–378.
Rao, C.R., 1959. Some problems involving linear hypothesis in multivariate analysis. Biometrika 46, 49–58.
Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P., 1989. Design and analysis of computer experiments (includes comments and rejoinder). Statistical Science 4 (4), 409–435.
Saltelli, A., Andres, T.H., Homma, T., 1995. Sensitivity analysis of model output; performance of the iterated fractional factorial design method. Computational Statistics and Data Analysis 20, 387–407.
Sanchez, S.M., Sanchez, P.J., Ramberg, J.S., Moeeni, F., 1996. Effective engineering design through simulation. International Transactions in Operational Research 3 (2), 169–185.


Sargent, R.G., 1991. Research issues in metamodeling. In: Nelson, B.L., Kelton, W.D., Clark, G.M. (Eds.), Proceedings of the 1991 Winter Simulation Conference, pp. 888–893.
Sargent, R.G., 1996. Verifying and validating simulation models. In: Charnes, J.M., Morrice, D.M., Brunner, D.T., Swain, J.J. (Eds.), Proceedings of the 1996 Winter Simulation Conference, pp. 55–64.
Schlesinger, S., et al., 1979. Terminology for model credibility. Simulation 32 (3), 103–104.
Van Groenendaal, W.J.H., Kleijnen, J.P.C., 1997. On the assessment of economical risk: Factorial design versus Monte Carlo methods. Journal of Reliability Engineering and Systems Safety 57 (1), 103–105.
Verkooyen, W.J.H., 1996. Neural networks in economic modelling. CentER, Tilburg University, Tilburg.
Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., et al., 1992. Screening, predicting, and computer experiments. Technometrics 34 (1), 15–25.
Yu, B., Popplewell, K., 1994. Metamodel in manufacturing: A review. International Journal of Production Research 32 (4), 787–796.

    Zeigler, B., 1976. Theory of Modelling and Simulation. Wiley/

    Interscience, New York.
