Univariate input models for stochastic simulation

ME Kuhl¹, JS Ivy², EK Lada³, NM Steiger⁴, MA Wagner⁵ and JR Wilson²*

¹Rochester Institute of Technology, Rochester, NY, USA; ²North Carolina State University, Raleigh, NC, USA; ³SAS Institute Inc., Cary, NC, USA; ⁴University of Maine, Orono, ME, USA; ⁵SAIC, Vienna, VA, USA

Techniques are presented for modelling and then randomly sampling many of the continuous univariate probabilistic

input processes that drive discrete-event simulation experiments. Emphasis is given to the generalized beta distribution

family, the Johnson translation system of distributions, and the Bézier distribution family because of the flexibility of

these families to model a wide range of distributional shapes that arise in practical applications. Methods are described

for rapidly fitting these distributions to data or to subjective information (expert opinion) and for randomly sampling

from the fitted distributions. Also discussed are applications ranging from pharmaceutical manufacturing and medical

decision analysis to smart-materials research and health-care systems analysis.

Journal of Simulation (2010) 4, 81–97. doi:10.1057/jos.2009.31; published online 26 February 2010

Keywords: simulation; continuous univariate input models; generalized beta distributions; Johnson translation

system of distributions; Bézier distributions

1. Introduction

One of the main problems in the design and construction of

stochastic simulation experiments is the selection of valid

input models, that is, probability distributions that accurately mimic the behaviour of the random input processes

driving the system under study. Often the following

interrelated difficulties arise in attempts to use standard

distribution families for simulation input modelling:

1. Standard distribution families cannot adequately represent the probabilistic behaviour of many real-world

input processes, especially in the tails of the underlying

distribution.

2. The parameters of the selected distribution family are

troublesome to estimate from either sample data or

subjective information (expert opinion).

3. Fine-tuning or editing the shape of the fitted distribution

is difficult because (i) there are a limited number of

parameters available to control the shape of the fitted

distribution, and (ii) there is no effective mechanism for

directly manipulating the shape of the fitted distribution

while simultaneously updating the corresponding parameter estimates.

In modelling a simulation input process, the practitioner

must identify an appropriate distribution family and then

estimate the corresponding distribution parameters; and the

problems enumerated above can hinder the progress of both

of these model-building activities.

The conventional approach to identification of a stochastic simulation input model encompasses several procedures

for using sample data to accept, reject, or somehow rank

each of the distribution families in a list of well-known

alternatives. These procedures include (i) informal graphical

techniques based on probability plots, frequency distributions, or box plots; and (ii) statistical goodness-of-fit tests such as the Kolmogorov–Smirnov, chi-squared, Anderson–Darling, and Cramér–von Mises tests. For a detailed discussion of these procedures, see Sections 6.3–6.6 of Law (2007) and Stephens (1974). Unfortunately, none of these

procedures is guaranteed to yield a definitive conclusion. For

example, identification of an input distribution can be based

on visual comparison of superimposed graphs of a

histogram of the available data set and the fitted probability

density function (p.d.f.) for each of several alternative

distribution families. In this situation, however, the final

conclusion depends largely on the number of class intervals

(also called bins or cells) in the histogram as well as the

class boundaries; and a different layout for the histogram could lead the user to identify a different distribution family. Similar anomalies can occur in the use of statistical goodness-of-fit tests. In small samples, these tests can

have very low power to detect lack of fit between the

empirical distribution and each alternative theoretical

distribution, resulting in an inability to reject any of the

alternative distributions. In large samples, moreover, practically insignificant discrepancies between the empirical and theoretical distributions often appear to be statistically significant, resulting in rejection of all the alternative

distributions.
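The sample-size effect described above can be illustrated with a small experiment. The following is a minimal sketch, not a procedure from this article: it assumes a fully specified N(0, 1) null hypothesis, data that actually come from N(0.05, 1) (a practically negligible shift chosen purely for illustration), and the asymptotic 5% critical value 1.358/√n for the Kolmogorov–Smirnov statistic, which is only approximate in small samples.

```python
import math
import random

def std_normal_cdf(x):
    """C.d.f. of the N(0, 1) distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical c.d.f. and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the hypothesized c.d.f. with the empirical c.d.f.
        # just after and just before the jump at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def rejects_at_5_percent(sample, cdf):
    """Asymptotic 5% test for a fully specified null (illustrative only)."""
    return ks_statistic(sample, cdf) > 1.358 / math.sqrt(len(sample))

rng = random.Random(2010)  # arbitrary seed
for n in (25, 100_000):
    sample = [rng.gauss(0.05, 1.0) for _ in range(n)]  # slightly shifted data
    d = ks_statistic(sample, std_normal_cdf)
    print(n, round(d, 4), rejects_at_5_percent(sample, std_normal_cdf))
```

With the large sample the tiny shift is detected, whereas a sample of 25 observations will rarely lead to rejection; neither outcome says much about whether N(0, 1) is an adequate input model for practical purposes.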

*Correspondence: JR Wilson, Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, 111 Lampe Drive, Daniels Hall, Room 370, Campus Box 7906, Raleigh, North Carolina 27695-7906, USA. E-mail: [email protected]

Journal of Simulation (2010) 4, 81–97 © 2010 Operational Research Society Ltd. All rights reserved. 1747-7778/10

www.palgrave-journals.com/jos/

After somehow identifying an appropriate family of

distributions to model an input process, the simulation user

also faces problems in estimating the associated distribution

parameters. The user often attempts to match the mean

and standard deviation of the fitted distribution with the

sample mean and standard deviation of a data set, but shape

characteristics such as the sample skewness and kurtosis are

less frequently considered when estimating the parameters

of an input distribution. Some estimation methods, such as

maximum likelihood and percentile matching, may simply

fail to yield parameter estimates for some distribution

families. Even if several distribution families are readily

fitted to a set of sample data, the user generally lacks a

definitive basis for selecting the appropriate best-fitting distribution. In particular, several commercial input-modelling packages base their model-selection procedure on an unspecified combination of some of the goodness-of-fit test statistics mentioned above, and the details of the model-selection procedure are actually concealed from the user on the grounds that such information is proprietary. A notable exception to this is the automatic distribution-fitting

procedure of JMP 8 (SAS Institute Inc., 2008), which makes

transparent use of the Akaike information criterion (Akaike,

1974) as the basis for selecting the distribution that yields the

best fit to a given data set.

The task of building a simulation input model is further

complicated if sample data are not available. In this

situation, identification of an appropriate distribution family

is arbitrarily based on whatever information can be elicited

from knowledgeable individuals (experts); and the corresponding distribution parameters are computed from subjective estimates of simple numerical characteristics of the underlying distribution such as the mode, selected percentiles, or low-order moments. In summary, there is some

evidence that many simulation practitioners lack a clear-cut,

definitive procedure for identifying and estimating high-

fidelity stochastic input models (or even merely acceptable,

rough-cut input models); consequently, simulation output

analysis is often based on input processes of questionable

validity. The latter observation, coupled with the current

capabilities and limitations of typical off-the-shelf simulation

input-modelling software, has led to the research that is

surveyed in this article for handling some of the difficulties

outlined above.

This invited article is an expanded version of a series of introductory tutorials on simulation input modelling, which

we have been asked to present at the Winter Simulation

Conference for the past several years (Kuhl et al, 2006,

2008a,b). In this article techniques are presented for

modelling and then randomly sampling many of the

continuous univariate probabilistic input processes that

drive discrete-event simulation experiments, with the primary focus on methods designed to alleviate the difficulties

encountered in using conventional approaches to simulation

input modelling. Emphasis is given to the generalized beta

distribution family (Section 2), the Johnson translation

system of distributions (Section 3), and the Bézier distribution family (Section 4) because in our experience these families can be most readily and effectively used in a broad diversity of simulation applications, especially in large-scale

applications for which reasonably accurate input models

must be delivered under severe time pressure, and the user

may not have immediate access to detailed knowledge of the

physics of all the input processes so that empirical input

models must be formulated and fitted quickly using readily

available sample data or subjective information. For each

distribution family, we describe methods for fitting distributions to sample data or expert opinion and then for

randomly sampling the fitted distributions. Much of the

discussion concerns public-domain software and fitting

procedures that facilitate rapid univariate simulation input

modelling. To illustrate these procedures, we also discuss

applications ranging from pharmaceutical manufacturing

and medical decision analysis to smart-materials research

and health-care systems analysis. Finally, in Section 5 conclusions and recommendations are presented, including a brief discussion of other discrete and continuous distribution families, which can be used for simulation input

modelling. In a companion article (Kuhl et al, 2010), we

discuss some multivariate distributions that frequently arise

in probabilistic simulation input modelling; see also Sections 3–4 of Kuhl et al (2006).

2. Generalized beta distribution family

Suppose X is a continuous random variable with lower limit

a and upper limit b whose distribution is to be approximatedand then randomly sampled in a simulation experiment. In

such a situation, it is often possible to model the proba-

bilistic behaviour of X using a generalized beta distribution,

whose p.d.f. has the form

fXx Ga1 a2x a

a11b xa21

Ga1Ga2b aa1 a21

for apxpb

1

where G(z) R1

0 tz1etdt (for z40) denotes the gamma

function. For graphs illustrating the wide range of distribu-

tional shapes achievable with generalized beta distributions,

see one of the following references: pp 92–93 of Hahn and Shapiro (1967); pp 291–293 of Law (2007); or pp 11–14 of

Kuhl et al (2008b), which is available online.

If X has the p.d.f. (1), then the cumulative distribution function (c.d.f.) of X, which is defined by $F_X(x) = \Pr\{X \le x\} = \int_{-\infty}^{x} f_X(w)\,dw$ for all real x, unfortunately has no convenient analytical expression; but the mean and variance of X are respectively given by

$$\mu_X = E[X] = \frac{\alpha_1 b + \alpha_2 a}{\alpha_1+\alpha_2} \tag{2}$$

and

$$\sigma_X^2 = E\big[(X-\mu_X)^2\big] = \frac{(b-a)^2\,\alpha_1\alpha_2}{(\alpha_1+\alpha_2)^2(\alpha_1+\alpha_2+1)} \tag{3}$$

Recall that for a continuous p.d.f. $f_X(\cdot)$, a mode m is a local maximum of that function; and if there is a unique global maximum for $f_X(\cdot)$, then the p.d.f. is said to be unimodal, and m is usually called the most likely value of the random variable X. If $\alpha_1, \alpha_2 \ge 1$ and either $\alpha_1 > 1$ or $\alpha_2 > 1$, then the beta p.d.f. (1) is unimodal; and the mode is given by

$$m = \frac{(\alpha_1-1)\,b + (\alpha_2-1)\,a}{\alpha_1+\alpha_2-2} \quad (\alpha_1, \alpha_2 \ge 1 \text{ and } \alpha_1\alpha_2 > 1) \tag{4}$$

Equations (2)–(4) reveal that key distributional characteristics of the generalized beta distribution are simple functions of the parameters $a$, $b$, $\alpha_1$, and $\alpha_2$; and this facilitates input

modelling, especially in pilot studies in which rapid model

development is critical.
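Equations (2)–(4) translate directly into a few lines of code. The following is a minimal sketch; the function name `beta_summary` and the example parameter values are ours and are chosen only for illustration.

```python
def beta_summary(a, b, alpha1, alpha2):
    """Mean, variance, and mode of a generalized beta distribution on [a, b].

    The mode is returned as None when the unimodality condition attached
    to Equation (4) does not hold.
    """
    s = alpha1 + alpha2
    mean = (alpha1 * b + alpha2 * a) / s                        # Equation (2)
    var = (b - a) ** 2 * alpha1 * alpha2 / (s ** 2 * (s + 1))   # Equation (3)
    mode = None
    if alpha1 >= 1 and alpha2 >= 1 and alpha1 * alpha2 > 1:
        mode = ((alpha1 - 1) * b + (alpha2 - 1) * a) / (s - 2)  # Equation (4)
    return mean, var, mode

# Illustrative values only: a beta distribution on [0, 10] with shapes (2, 4).
mean, var, mode = beta_summary(0.0, 10.0, 2.0, 4.0)
print(mean, var, mode)
```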

2.1. Fitting beta distributions to data or subjective

information

Given a random sample $\{X_i : i = 1, \ldots, n\}$ of size n from the distribution to be estimated, let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics obtained by sorting the $\{X_i\}$ in ascending order so that $X_{(1)} = \min\{X_i : i = 1, \ldots, n\}$ and $X_{(n)} = \max\{X_i : i = 1, \ldots, n\}$. We can fit a generalized beta distribution to this data set using the following sample statistics:

$$\hat a = 2X_{(1)} - X_{(2)}, \quad \hat b = 2X_{(n)} - X_{(n-1)}, \quad \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2 \tag{5}$$

In particular the method of moment matching involves (i) setting the right-hand sides of (2) and (3) equal to the sample mean $\bar X$ and the sample variance $S^2$, respectively; and (ii) solving the resulting equations for the corresponding estimates $\hat\alpha_1$ and $\hat\alpha_2$ of the shape parameters. In terms of the auxiliary quantities

$$d_1 = \frac{\bar X - \hat a}{\hat b - \hat a} \quad \text{and} \quad d_2 = \frac{S}{\hat b - \hat a}$$

the moment-matching estimates of $\alpha_1$ and $\alpha_2$ are given by

$$\hat\alpha_1 = \frac{d_1^2(1-d_1)}{d_2^2} - d_1, \qquad \hat\alpha_2 = \frac{d_1(1-d_1)^2}{d_2^2} - (1-d_1) \tag{6}$$
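The moment-matching recipe in Equations (5) and (6) can be sketched as follows. This is an illustrative implementation rather than the BetaFit code; the function name and the small data set are hypothetical, and no feasibility safeguards are included beyond a check on the sample size.

```python
import math
from statistics import mean, variance

def fit_beta_by_moments(data):
    """Moment-matching fit of a generalized beta distribution.

    Returns (a_hat, b_hat, alpha1_hat, alpha2_hat) following
    Equations (5) and (6).
    """
    xs = sorted(data)
    if len(xs) < 3:
        raise ValueError("need at least three observations")
    a_hat = 2 * xs[0] - xs[1]        # extrapolated lower end-point
    b_hat = 2 * xs[-1] - xs[-2]      # extrapolated upper end-point
    xbar = mean(xs)
    s2 = variance(xs)                # sample variance with divisor n - 1
    d1 = (xbar - a_hat) / (b_hat - a_hat)
    d2 = math.sqrt(s2) / (b_hat - a_hat)
    alpha1 = d1 ** 2 * (1 - d1) / d2 ** 2 - d1
    alpha2 = d1 * (1 - d1) ** 2 / d2 ** 2 - (1 - d1)
    return a_hat, b_hat, alpha1, alpha2

# Hypothetical observations, for illustration only.
data = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0]
print(fit_beta_by_moments(data))
```

Substituting the fitted values back into Equations (2) and (3) reproduces the sample mean and sample variance, which is a convenient sanity check on any implementation.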

AbouRizk et al (1994) discuss BetaFit, a Windows-based software package for fitting the generalized beta distribution to sample data by computing estimators $\hat a$, $\hat b$, $\hat\alpha_1$, and $\hat\alpha_2$ using the following estimation methods:

• moment matching with $\hat a = X_{(1)}$ and $\hat b = X_{(n)}$;

• feasibility-constrained moment matching, so that the feasibility conditions $\hat a < X_{(1)}$ and $X_{(n)} < \hat b$ are always satisfied;

• maximum likelihood (assuming a and b are known and thus are not estimated); and

• ordinary least squares (OLS) and diagonally weighted least squares (DWLS) estimation of the c.d.f.

Figure 1 demonstrates the application of BetaFit to a

sample of 9980 observations of end-to-end chain lengths

(in ångströms) of the ionic polymer Nafion based on the method of moment matching. In Section 3.5 below, we

provide further details on the origin of the Nafion data set

and its relevance to the problem of predicting the stiffness

properties of a certain class of smart materials. Like all

the software packages mentioned in this article, BetaFit is in

the public domain and is available on the Web site via

www.ise.ncsu.edu/jwilson/page3.

For rapid development of preliminary simulation models,

practitioners often base an initial input model for the

random variable X on subjective estimates $\hat a$, $\hat m$, and $\hat b$ of the minimum, mode, and maximum, respectively, of the distribution of X. Although the triangular distribution is often used in such circumstances, it can yield excessively heavy tails, and hence grossly unrealistic simulation results, when the distance $\hat b - \hat m$ between the estimates of the upper limit and mode is much larger than the distance $\hat m - \hat a$ between the estimates of the mode and lower limit, or vice versa. The generalized beta distribution is usually a better choice in such situations; but there is some difficulty in selecting the shape parameters to yield the desired value $\hat m$ for the mode. For an elaboration of this point in the context

of project-management simulations, see Vanhoucke (2010).

In many project-management and quality-control applica-

tions, it is convenient to assume that the standard deviation

of the random variable at hand is one-sixth of the corresponding range; and if we equate the right-hand sides of (3) and (4), respectively, with the subjective estimates $(\hat b - \hat a)^2/36$ and $\hat m$ of the variance and mode of X, then we must solve a cubic equation to obtain the corresponding shape parameters of the beta p.d.f. (1). In terms of the auxiliary quantity

$$q = \frac{\hat m - \hat a}{\hat b - \hat a}$$

we see that in the special cases in which $q = 0$ or $q = 1$, the required shape parameters are exactly given by

$$\begin{cases} \hat\alpha_1 = 1 \text{ and } \hat\alpha_2 = 3.87227 & \text{if } q = 0 \\ \hat\alpha_1 = 3.87227 \text{ and } \hat\alpha_2 = 1 & \text{if } q = 1 \end{cases} \tag{7}$$

(For a detailed justification of (7), see the Appendix of this

article, which contains exact computing formulas for the

shape parameters of a beta distribution with user-specified

values of the end-points, mode, and variance.)

For the more common case in which $0 < q < 1$, remarkably accurate, simple approximations to the shape parameters of the beta distribution with minimum $\hat a$, mode $\hat m$, maximum $\hat b$, and standard deviation $(\hat b - \hat a)/6$ can be conveniently calculated from the asymmetry ratio

$$r = \frac{\hat b - \hat m}{\hat m - \hat a} = \frac{1-q}{q}$$

so that the required shape parameters are given by

$$\hat\alpha_1 = \frac{r^2 + 3r + 4}{r^2 + 1} \quad \text{and} \quad \hat\alpha_2 = \frac{4r^2 + 3r + 1}{r^2 + 1}; \tag{8}$$

see pp 202–203 of Wilson et al (1982) and McBride and McClelland (1967). If $0.02 \le q \le 0.98$, then the error in the approximation (8) is less than 3%; and if $0.1 \le q \le 0.9$, then the error in this approximation is less than 1.2%. To handle situations in which the estimated mode $\hat m$ is very close to one of the estimated end-points $\hat a$ and $\hat b$ (that is, $q < 0.02$ or $q > 0.98$), see the Appendix. In the application of beta distributions to a problem in medical decision making that is detailed in Section 2.4 below, the error in using the approximation (8) was essentially zero (that is, less than $10^{-8}$) on each of 50 different beta distributions used in the

associated simulation study.

AbouRizk et al (1991) discuss the Visual Interactive Beta Estimation System (VIBES), a Windows-based software package that enables graphically oriented fitting of generalized beta distributions to subjective estimates of: (i) the end-points a and b; and (ii) any of the following combinations of distributional characteristics:

• the mean $\mu_X$ and the variance $\sigma_X^2$,

• the mean $\mu_X$ and the mode m,

• the mode m and the variance $\sigma_X^2$,

• the mode m and an arbitrary quantile $x_p = F_X^{-1}(p)$ for $p \in (0, 1)$, or

• two quantiles $x_p$ and $x_q$ for $p, q \in (0, 1)$.

Figure 1 Beta p.d.f. (top panel) and c.d.f. (bottom panel) fitted to 9980 Nafion chain lengths.




As a general-purpose tool for simulation input modelling,

the generalized beta distribution family has the following

advantages:

• It is sufficiently flexible to represent with reasonable accuracy a wide diversity of distributional shapes.

• Its parameters are easily estimated from either sample data or subjective information.

On the other hand, generating samples from the beta

distribution is relatively slow; and in some applications,

the time to generate beta random variables can be a

substantial fraction of the overall simulation run time

(Wilson et al, 1982).

2.2. Generating beta variates

Although most general-purpose simulation packages provide a generator of beta random variables, in our experience some care is required to verify the performance of a beta variate generator in cases where any shape parameter is less than one or is very large (say, greater than 30). Note that Equations (7)–(8) always yield $1 \le \alpha_1, \alpha_2 \le 4$ while Equations (A1)–(A5) in the Appendix always yield $\alpha_1, \alpha_2 \ge 1$; and in these situations, we have obtained excellent results using two procedures available in Press et al (2007). To generate a generalized beta random variable X with minimum a, maximum b, and shape parameters $\alpha_1$ and $\alpha_2$, the first method uses Gammadev of Press et al (2007) to generate $Y(\alpha_1, \alpha_2)$, a standard beta random variable on the unit interval [0, 1] with shape parameters $\alpha_1$ and $\alpha_2$; and then the desired random sample is given by

$$X = a + (b-a)\,Y(\alpha_1, \alpha_2) \tag{9}$$

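In terms of the incomplete beta function

$$I_x(\alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)} \int_0^x t^{\alpha_1-1}(1-t)^{\alpha_2-1}\,dt \quad \text{for } 0 \le x \le 1 \tag{10}$$

(which coincides with the c.d.f. $F_{Y(\alpha_1,\alpha_2)}(x) = \Pr\{Y(\alpha_1,\alpha_2) \le x\}$ of a standard beta random variable $Y(\alpha_1,\alpha_2)$ for $0 \le x \le 1$), the second method for generating X is based on inversion of the c.d.f. of X,

$$X = F_X^{-1}(U) = a + (b-a)\,F_{Y(\alpha_1,\alpha_2)}^{-1}(U) = a + (b-a)\,I_U^{-1}(\alpha_1, \alpha_2) \tag{11}$$

where $U \sim \text{Uniform}[0, 1]$ is a random number and we use the procedure invbetai of Press et al (2007) to obtain a highly accurate approximation to $I_x^{-1}(\alpha_1, \alpha_2)$ for all x in [0, 1].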
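Both generation methods can be sketched with the Python standard library alone. The code below is not the Press et al (2007) implementation: the first function assumes the standard construction of a beta variate as a ratio of gamma variates, and the second replaces invbetai with a deliberately simple stand-in (Simpson's rule for the incomplete beta function plus bisection) that is adequate only for shape parameters of at least one. All function names and parameter values are ours.

```python
import math
import random

def beta_by_gamma_ratio(a, b, alpha1, alpha2, rng):
    """Equation (9), with Y formed as G1 / (G1 + G2) from two gamma variates."""
    g1 = rng.gammavariate(alpha1, 1.0)
    g2 = rng.gammavariate(alpha2, 1.0)
    return a + (b - a) * g1 / (g1 + g2)

def std_beta_cdf(x, alpha1, alpha2, steps=2000):
    """Incomplete beta function (10) by Simpson's rule; assumes shapes >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    def integrand(t):
        return t ** (alpha1 - 1) * (1 - t) ** (alpha2 - 1)
    h = x / steps                      # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    beta_fn = math.gamma(alpha1) * math.gamma(alpha2) / math.gamma(alpha1 + alpha2)
    return total * h / 3.0 / beta_fn

def beta_by_inversion(a, b, alpha1, alpha2, u, tol=1e-10):
    """Equation (11), inverting the c.d.f. numerically by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_beta_cdf(mid, alpha1, alpha2) < u:
            lo = mid
        else:
            hi = mid
    return a + (b - a) * 0.5 * (lo + hi)

rng = random.Random(12345)             # arbitrary seed
print(beta_by_gamma_ratio(0.0, 10.0, 2.0, 4.0, rng))
print(beta_by_inversion(0.0, 10.0, 2.0, 4.0, rng.random()))
```

Inversion uses exactly one random number per variate, which is convenient when variance-reduction techniques rely on synchronized random-number streams; a production implementation would use a dedicated inverse incomplete beta routine rather than the bisection shown here.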

Remark 1. In the companion paper on multivariate input

modelling (Kuhl et al, 2010), $I_x^{-1}(\alpha_1, \alpha_2)$ and the associated approximation invbetai of Press et al (2007) are important tools in our approach to building multivariate beta distributions as well as stationary univariate time series whose

marginals are generalized beta distributions.

2.3. Application of beta distributions to pharmaceutical

manufacturing

Pearlswig (1995) provides a good example of a pharmaceutical manufacturing simulation whose credibility depended

critically on the use of appropriate input models. In this

study of the estimated production capacity of a plant that

had been designed but not yet built, the usual three-time

estimates ($\hat a$, $\hat m$, and $\hat b$) were obtained from the process engineer for each of the operations in manufacturing a certain type of effervescent tablet. Unfortunately very conservative (ie, large) estimates were provided for the upper limit $\hat b$ of each operation time; and when triangular

distributions were used to represent batch-to-batch variation

in actual processing times for each operation within each

step of production, the resulting bottlenecks resulted in very

low estimates of the probability of reaching a prespecified

annual production level.

As in many simulation applications in which subjective

estimates $\hat a$, $\hat m$, and $\hat b$ are elicited from experts, the estimate $\hat m$ of the modal (most likely) time to perform a given operation was substantially more reliable than the estimates $\hat a$ and $\hat b$ of the lower and upper limits on the same operation time. When all the triangular distributions in the simulation were replaced by generalized beta distributions using (8) to ensure conformance to the engineer's estimate of the most likely processing time for each operation within each step, the resulting annual tablet production was in excellent

agreement with the production of similar plants already in

existence. This simple remedy restored the faith of management in the validity of the overall simulation model, which

was subsequently used to finalize certain aspects of the

design and operation of the new plant.

2.4. Application of beta distributions to medical decision

analysis

In the following application of simulation input modelling

to medical decision analysis, we compare two alternative methods for estimating the parameters of a generalized beta distribution from limited sample data or subjective information about the minimum, mode, and maximum values of the

target random variable. The discussion is also intended to

illustrate the extent to which simulation-generated outputs

may depend on the end-points of the fitted beta distributions

used in the simulation. This example provides insight into

the issues surrounding the use of the generalized beta

distribution to represent a simulation input that is subject to

randomness or uncertainty when that distribution must be




fitted to subjective information or some combination of

limited sample data and subjective information.

Cost-effectiveness studies are frequently used in medical

decision making for comparing various treatment or

intervention alternatives. The Panel on Cost-Effectiveness

in Health and Medicine (Gold et al, 1996) defines cost-effectiveness analysis (CEA) as “… a method designed to assess the comparative impacts of expenditures on different health interventions … that … involves estimating the net, or incremental, costs and effects of an intervention – its costs and health outcomes compared with some alternative.”

Decision models for CEA involve a large number of input

parameters, each subject to substantial uncertainty. In

particular, these studies involve uncertainty and random

variability with respect to the following quantities:

(a) Probability of occurrence for each health-related out-

come of interest;

(b) Utility, that is, a number between 0 (death) and 1 (perfect health) that is assigned to each state of health or outcome relevant to item (a); and

(c) Cost in constant dollars for each disease state and

intervention.

There is variability between patients and parameter uncertainty, each reflected in the standard errors associated with simulation-based estimates of mean performance, for example, the expected values of the costs, quality-adjusted

life years, and utilities resulting from alternative treatments.

Therefore an accurate assessment of cost effectiveness must

involve sensitivity analysis and must attempt to model

the inherent variability and uncertainty in these parameter

estimates. Probabilistic sensitivity analysis is one method for

performing a multiway sensitivity analysis in which all

parameters subject to uncertainty are varied simultaneously

by Monte Carlo sampling from the distributions postulated

for those parameters.

Xu et al (2010) develop a decision-tree model for

determining the cost effectiveness of cesarean delivery upon

maternal request (CDMR) for women having a single

childbirth without indications. Their model compares

CDMR with trial of labour (TOL) considering all possible

short- and long-term outcomes and the resulting consequences for the mother and neonate. The model takes the form of a decision tree containing over 100 chance events. For each parameter in their decision model, Xu et al use either literature-based or expert-opinion-based estimates for

the mode, minimum, and maximum values. Typically there

is limited information available for parameter distribution

estimation; moreover, there is significant variability in the parameter values because of substantial uncertainty regarding mode of delivery with respect to utility measures, the

probabilities of outcomes, and outcome costs. Here we

explore two examples from Xu et al in which we fit beta

distributions for utility and probability parameter estimates

by two different approaches:

• Using the approximation based on Equations (7) and (8); and

• Using the version of the so-called Beta PERT distribution that is implemented in the @RISK software (Palisade Corporation, 2009), which is usually termed the RiskPert distribution and is detailed in Equations (12) and (13) below.

To illustrate each approach, we discuss in some detail how

we formulated probabilistic input models of the following

quantities:

(i) P(Vag), the probability of a vaginal delivery given that

the decision maker pursues a trial of labour; and

(ii) U(SpVag), the utility associated with a spontaneous

vaginal delivery given that the decision maker pursues a

trial of labour.

A trial of labour is a decision to attempt a vaginal

delivery; this will result in a vaginal delivery or an emergency

cesarean section. Given a vaginal delivery, there are two

possible outcomes: a spontaneous vaginal delivery or an

instrumental vaginal delivery. For the probability of a

vaginal delivery P(Vag), the most likely value of 0.9

was obtained from the published literature. Not only was

0.9 the most frequently cited value, it was also judged

to be the highest-quality estimate in terms of sample size

and its applicability to populations cited in the literature.

The values 0.844 and 0.97 were taken to be the lower and

upper bounds on P(Vag), respectively, because they

corresponded to the smallest and largest estimates found in

the literature. The associated estimates of the utility

U(SpVag) resulting from a spontaneous vaginal delivery

were obtained similarly; and the mode, minimum, and

maximum values found in the literature were 0.92, 0.69, and

1.0, respectively.

While the minimum and maximum values were the

smallest and largest values found in the available literature,

we recognized that the true lower bound might be less than

the estimated minimum and the true upper bound might be

greater than the estimated maximum in many cases. In

contrast to Xu et al, who assume that the minimum and maximum values from the literature correspond to the 0.025 and 0.975 quantiles, we explored the effect of assuming

that the true lower and upper bounds could be obtained

by taking an appropriate offset from the original estimated

minimum and maximum values, where the offset is

expressed as a fraction c of the original estimate of the

range,

$$a' = \max\{0,\; a - c(b-a)\} \quad \text{and} \quad b' = \min\{b + c(b-a),\; 1\} \quad \text{for } c > 0.$$




Based on the original estimate of the mode m as well as the new estimates $a'$ and $b'$ of the true minimum and maximum values, respectively, for each distribution used in the probabilistic sensitivity analysis, we fitted a beta distribution using the approximation for the associated shape parameters given by Equations (7) and (8). In addition, we fitted the RiskPert version of the beta distribution by assuming that the mean and variance of the random variable X satisfy the following equations,

$$\mu_X = \frac{a' + 4m + b'}{6} \quad \text{and} \quad \sigma_X^2 = \frac{(b' - \mu_X)(\mu_X - a')}{7} \tag{12}$$

so that the corresponding shape parameters are given by

$$\alpha_1 = 6\,\frac{\mu_X - a'}{b' - a'} \quad \text{and} \quad \alpha_2 = 6\,\frac{b' - \mu_X}{b' - a'} = 6 - \alpha_1 \tag{13}$$

(Note that whereas Equations (2) and (3) are always true for a beta random variable X, Equations (12) and (13) are only satisfied when X has a RiskPert distribution, which is a special type of beta distribution.)
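The end-point offset and the RiskPert fit in Equations (12) and (13) can be sketched as follows. The function names are ours; the input values are the minimum, mode, and maximum reported above for P(Vag), and the offset c = 0.05 is one of the values considered in this section. This is an illustration of the formulas, not the @RISK implementation.

```python
import math

def widen_bounds(a, b, c):
    """Offset the end-points by a fraction c of the range, clipped to [0, 1]."""
    width = b - a
    return max(0.0, a - c * width), min(b + c * width, 1.0)

def riskpert_shapes(a, m, b):
    """RiskPert fit: returns (alpha1, alpha2, mean, variance) per (12)-(13)."""
    mu = (a + 4.0 * m + b) / 6.0                 # Equation (12), mean
    var = (b - mu) * (mu - a) / 7.0              # Equation (12), variance
    alpha1 = 6.0 * (mu - a) / (b - a)            # Equation (13)
    alpha2 = 6.0 - alpha1                        # Equation (13)
    return alpha1, alpha2, mu, var

a, m, b = 0.844, 0.9, 0.97                       # P(Vag) estimates
a_new, b_new = widen_bounds(a, b, 0.05)
alpha1, alpha2, mu, var = riskpert_shapes(a_new, m, b_new)
# Compare the RiskPert standard deviation with one-sixth of the range,
# which is the target underlying Equations (7) and (8).
print(a_new, b_new, alpha1, alpha2)
print(math.sqrt(var), (b_new - a_new) / 6.0)
```

Because the two shape parameters always sum to 6, the RiskPert variance is tied to the mean through (12), whereas the fit based on (7) and (8) targets a standard deviation of one-sixth of the range regardless of where the mode falls.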

The value for c was varied from 0 to 0.1. Varying c

yielded small changes in the shape parameters for the beta

distributions fitted by each method. However, we found that

the value of c had an effect on the cost-effectiveness

decision; and the effect varied depending on the type of

distribution used for all the probabilities and utilities in the

decision tree. For $c \in [0, 0.02)$, there was a significant difference in the effectiveness of CDMR and TOL (ie, the 95% confidence interval for the mean difference in the utility between CDMR and TOL did not include zero) when using beta distributions fitted by each method. For $c \in [0.02, 0.07]$, there was a significant difference in the effectiveness of CDMR and TOL only when using beta distributions fitted via Equations (7) and (8). And for $c > 0.07$, the difference in

effectiveness of CDMR and TOL was not significant for

either method of fitting beta distributions.

The difference in the effect of c as a function of the

distributional assumptions can be explained by the shapes of

the beta distributions fitted by each method. The p.d.f.s of

the fitted beta distributions for P(Vag) and U(SpVag) are

shown in Figure 2, subfigures 2(a)–2(f), for the cases in which $c = 0$, 0.05, and 0.1. For all the other beta distributions used in this application, similar behaviour was seen in the superimposed plots of the beta p.d.f. fitted via Equations (7) and (8) versus the beta p.d.f. fitted via

Equations (12) and (13). While each fitted distribution has

the desired mode in each case, the RiskPert distribution

based on (12) and (13) has fatter tails than those of the p.d.f.

based on (7) and (8); moreover, we see that for the RiskPert

distribution, the variance clearly depends on the mean. As

indicated above, the assumptions about the variance that

underlie Equations (7) and (8) differ substantially from the

assumptions about the mean and variance that underlie the

RiskPert distribution; and these differences lead to different

conclusions about the cost-effectiveness of CDMR compared with TOL when $c \in [0.02, 0.07]$.

Remark 2. Several general conclusions emerged from the

foregoing applications to pharmaceutical manufacturing and

medical decision analysis. When input modelling is based on

estimates of the minimum, most likely, and maximum values

of a target random variable, there is often substantial

uncertainty in the estimates of the extreme values; and in

such situations the fitted distribution should generally have

most of its probability concentrated in the vicinity of the

estimated mode, which is much more accurate than the other

two estimates. The generalized beta distribution is usually a

good choice for rapid input modelling in these situations;

and often acceptable results can be obtained using either

Equations (7) and (8) or Equations (12) and (13). In our view

the primary disadvantage of Equations (12) and (13) is that

the variance of the fitted distribution is a function of its

mean. In general the analysis of a simulation-generated

response is complicated by dependence of the variance of the response on its mean; and numerous variance-stabilizing transformations have been proposed to avoid such undesirable behaviour (Irizarry et al, 2003). In some types of

applications, it may be necessary to study systematically the

sensitivity of the simulation-generated results to changes in

the assumed values of the mode and variance of each input

random variable; and in this case the development given in

the Appendix can be used to investigate the impact of

independently varying the postulated values of the mode and

variance of the fitted beta distribution.

3. Johnson translation system of distributions

Starting from a continuous random variable X whose

distribution is unknown and is to be approximated and

subsequently sampled, Johnson (1949) proposes the idea of

inferring an appropriate distribution by identifying a suitable

translation (or transformation) of X to a standard normal

random variable Z with mean 0 and variance 1 so that

$Z \sim N(0, 1)$. The translations have the form

$$Z = \gamma + \delta\, g\!\left(\frac{X - \xi}{\lambda}\right) \tag{14}$$

where $\gamma$ and $\delta$ are shape parameters, $\lambda$ is a scale parameter, $\xi$ is a location parameter, and $g(\cdot)$ is a function whose form defines the four distribution families in the Johnson translation system,

$$g(y) = \begin{cases}
\ln(y) & \text{for } S_L \text{ (lognormal) family} \\
\ln\!\left(y + \sqrt{y^2 + 1}\right) & \text{for } S_U \text{ (unbounded) family} \\
\ln[y/(1 - y)] & \text{for } S_B \text{ (bounded) family} \\
y & \text{for } S_N \text{ (normal) family}
\end{cases}$$

DeBrota et al (1989a) detail the advantages of the Johnson translation system of distributions for simulation input modelling, especially in comparison with the triangular, beta, and normal distribution families.

3.1. Johnson distribution and density functions

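If (14) is an exact normalizing translation of X to a standard normal random variable, then the c.d.f. of X is given by

$$F_X(x) = \Phi\!\left[\gamma + \delta\, g\!\left(\frac{x - \xi}{\lambda}\right)\right] \quad \text{for all } x \in H$$

where: (i) $\Phi(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} \exp\!\left(-\tfrac{1}{2} w^2\right) dw$ denotes the c.d.f. of the N(0, 1) distribution; and (ii) the space H of X is

$$H = \begin{cases}
[\xi, +\infty) & \text{for } S_L \text{ (lognormal) family} \\
(-\infty, +\infty) & \text{for } S_U \text{ (unbounded) family} \\
[\xi, \xi + \lambda] & \text{for } S_B \text{ (bounded) family} \\
(-\infty, +\infty) & \text{for } S_N \text{ (normal) family}
\end{cases}$$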

Figure 2 Beta distributions fitted to P(Vag), the probability of vaginal delivery (subfigures 2(a)–2(c)), and to U(SpVag), the utility of spontaneous vaginal delivery (subfigures 2(d)–2(f)), where the solid line is the fit using Equations (7) and (8) and the dashed line is the RiskPert fit using (12) and (13).

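The p.d.f. of X is given by

$$f_X(x) = \frac{\delta}{\lambda (2\pi)^{1/2}}\, g'\!\left(\frac{x - \xi}{\lambda}\right) \exp\!\left\{ -\frac{1}{2} \left[ \gamma + \delta\, g\!\left(\frac{x - \xi}{\lambda}\right) \right]^2 \right\}$$

for all $x \in H$, where

$$g'(y) = \begin{cases}
1/y & \text{for } S_L \text{ (lognormal) family} \\
1/\sqrt{y^2 + 1} & \text{for } S_U \text{ (unbounded) family} \\
1/[y(1 - y)] & \text{for } S_B \text{ (bounded) family} \\
1 & \text{for } S_N \text{ (normal) family}
\end{cases}$$

For graphs illustrating the diversity of distributional shapes that can be achieved with the Johnson system of univariate distributions, see DeBrota et al (1989a) or pp 34–37 of Kuhl et al (2008b).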
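As an illustration of the c.d.f. and p.d.f. formulas of this section, the following Python sketch evaluates both functions for each of the four families. The function names (`g`, `g_prime`, `johnson_cdf`, `johnson_pdf`) and the string labels used for the families are assumptions made for this example and are not taken from any software cited in this article; the code also assumes that x lies in the interior of H and that $\delta > 0$ and $\lambda > 0$.

```python
import math


def g(y, family):
    """Johnson translation function g(y) for family 'SL', 'SU', 'SB', or 'SN'."""
    if family == "SL":
        return math.log(y)
    if family == "SU":
        return math.asinh(y)  # equals ln(y + sqrt(y^2 + 1))
    if family == "SB":
        return math.log(y / (1.0 - y))
    if family == "SN":
        return y
    raise ValueError(f"unknown family: {family}")


def g_prime(y, family):
    """Derivative g'(y) of the translation function."""
    if family == "SL":
        return 1.0 / y
    if family == "SU":
        return 1.0 / math.sqrt(y * y + 1.0)
    if family == "SB":
        return 1.0 / (y * (1.0 - y))
    if family == "SN":
        return 1.0
    raise ValueError(f"unknown family: {family}")


def std_normal_cdf(z):
    """C.d.f. of the N(0, 1) distribution, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def johnson_cdf(x, gamma, delta, lam, xi, family):
    """F_X(x) = Phi(gamma + delta * g((x - xi) / lam)) for x inside H."""
    y = (x - xi) / lam
    return std_normal_cdf(gamma + delta * g(y, family))


def johnson_pdf(x, gamma, delta, lam, xi, family):
    """Johnson p.d.f. f_X(x) for x inside H."""
    y = (x - xi) / lam
    z = gamma + delta * g(y, family)
    scale = delta / (lam * math.sqrt(2.0 * math.pi))
    return scale * g_prime(y, family) * math.exp(-0.5 * z * z)
```

For example, `johnson_cdf(4.0, 0.0, 1.0, 4.0, 2.0, "SB")` evaluates the c.d.f. at the midpoint of the interval [2, 6] for a symmetric bounded distribution and returns 0.5.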

3.2. Fitting Johnson distributions to sample data

The process of fitting a Johnson distribution to sample data involves first selecting an estimation method and the desired translation function $g(\cdot)$ and then obtaining estimates of the four parameters $\gamma$, $\delta$, $\lambda$, and $\xi$. The Johnson translation system of distributions has the flexibility to match (i) any feasible combination of values for the mean $\mu_X$, variance $\sigma_X^2$, skewness

$$\mathrm{Sk}_X = E\!\left[(X - \mu_X)^3\right] / \sigma_X^3$$

and kurtosis

$$\mathrm{Ku}_X = E\!\left[(X - \mu_X)^4\right] / \sigma_X^4$$

or (ii) sample estimates of the moments $\mu_X$, $\sigma_X^2$, $\mathrm{Sk}_X$, and $\mathrm{Ku}_X$. Moreover, in principle the skewness $\mathrm{Sk}_X$ and kurtosis $\mathrm{Ku}_X$ uniquely identify the appropriate translation function $g(\cdot)$. Although there are no closed-form expressions for the parameter estimates based on the method of moment matching, these quantities can be accurately approximated using the iterative procedure of Hill et al (1976). Other estimation methods may also be used to fit Johnson distributions to sample data; for example, in the FITTR1 software package (Swain et al, 1988), the following methods are available:

• OLS and DWLS estimation of the c.d.f.;
• minimum $L_1$ and $L_\infty$ norm estimation of the c.d.f.;
• moment matching; and
• percentile matching.
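Option (ii) above requires sample estimates of the first four moments. A minimal Python sketch of that computation follows. It uses the plain moment estimators that divide by the sample size n, which is an assumption because the article does not state which estimators are intended, and the function name `sample_moments` is hypothetical. Passing these estimates to a moment-matching routine such as that of Hill et al (1976) is not shown.

```python
def sample_moments(data):
    """Return (mean, variance, skewness, kurtosis) of a sample.

    Central moments are divided by n (not n - 1); this choice is an
    assumption of the sketch.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0.0:
        raise ValueError("sample variance is zero")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5   # Sk = E[(X - mu)^3] / sigma^3
    kurtosis = m4 / m2 ** 2     # Ku = E[(X - mu)^4] / sigma^4
    return mean, m2, skewness, kurtosis
```

Note that the kurtosis defined in this section is not the excess kurtosis, so a normal sample gives a value near 3 rather than 0.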

3.3. Fitting SB distributions to subjective information

DeBrota et al (1989b) discuss VISIFIT, a public-domain

software package for fitting Johnson SB distributions to

subjective information, possibly combined with sample data.

The user must provide estimates of the end-points a and b

together with any two of the following characteristics:

• the mode m;
• the mean $\mu_X$;
• the median $x_{0.5}$;
• arbitrary quantile(s) $x_p$ or $x_q$ for $p, q \in (0, 1)$;
• the width of the central 95% of the distribution; or
• the standard deviation $\sigma_X$.
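To make one of these cases concrete, suppose the user supplies the end-points a and b together with two quantiles $x_p$ and $x_q$. Setting $\xi = a$ and $\lambda = b - a$ in (14) and requiring $\gamma + \delta\, g[(x_p - \xi)/\lambda] = \Phi^{-1}(p)$, and likewise for q, gives two linear equations in $\gamma$ and $\delta$. The Python sketch below solves them. It covers this one case only and is not a description of how VISIFIT works internally; the function name `fit_sb_from_quantiles` is hypothetical, and `statistics.NormalDist().inv_cdf` from the Python standard library stands in for $\Phi^{-1}$.

```python
import math
from statistics import NormalDist


def fit_sb_from_quantiles(a, b, p, x_p, q, x_q):
    """Return (gamma, delta, lam, xi) of an S_B distribution on [a, b]
    whose p- and q-quantiles equal x_p and x_q (illustrative sketch)."""
    if not (a < x_p < x_q < b and 0.0 < p < q < 1.0):
        raise ValueError("need a < x_p < x_q < b and 0 < p < q < 1")
    xi, lam = a, b - a

    def g_sb(x):
        # g(y) = ln[y / (1 - y)] with y = (x - a) / (b - a)
        return math.log((x - a) / (b - x))

    z_p = NormalDist().inv_cdf(p)
    z_q = NormalDist().inv_cdf(q)
    # Two equations gamma + delta * g_sb(x) = z, solved for delta, then gamma.
    delta = (z_q - z_p) / (g_sb(x_q) - g_sb(x_p))
    gamma = z_p - delta * g_sb(x_p)
    return gamma, delta, lam, xi
```

A call such as `fit_sb_from_quantiles(0.0, 10.0, 0.5, 4.0, 0.95, 8.0)` corresponds to end-points 0 and 10, a median of 4, and a 0.95 quantile of 8.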

3.4. Generating Johnson variates by inversion

After a Johnson distribution has been fitted to a data set, generating samples from the fitted distribution is straightforward. First, a standard normal variate $Z \sim N(0, 1)$ is generated. Then the corresponding realization of the Johnson random variable X is found by applying to Z the inverse translation

$$X = \xi + \lambda\, g^{-1}\!\left(\frac{Z - \gamma}{\delta}\right) \tag{15}$$

where for all real z we define the inverse translation function

$$g^{-1}(z) = \begin{cases}
e^{z} & \text{for } S_L \text{ (lognormal) family} \\
\left(e^{z} - e^{-z}\right)/2 & \text{for } S_U \text{ (unbounded) family} \\
1/\left(1 + e^{-z}\right) & \text{for } S_B \text{ (bounded) family} \\
z & \text{for } S_N \text{ (normal) family}
\end{cases} \tag{16}$$

Remark 3. Although most popular general-purpose simulation packages provide an acceptable generator of standard normal random variables, we are particularly interested in generating Z by the method of inversion, $Z = \Phi^{-1}(U)$, where $U \sim \text{Uniform}[0, 1]$ is a random number and we use the approximation to $\Phi^{-1}(\cdot)$ that is available via Normaldist of Press et al (2007). Also recommended is the approximation to $\Phi^{-1}(\cdot)$ given in Section 26.2.22 of Abramowitz and Stegun (1972). As documented in the companion paper on multivariate input modelling (Kuhl et al, 2010), an accurate approximation to $\Phi^{-1}(\cdot)$ will be a key element in our approach to building multivariate extensions of the Johnson translation system of distributions as well as stationary univariate time series whose marginals are Johnson distributions.
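The two-step procedure of Equations (15) and (16) can be sketched in Python as follows. Here `statistics.NormalDist().inv_cdf` from the Python standard library is used as a stand-in for the approximations to $\Phi^{-1}(\cdot)$ recommended in Remark 3; the function names and the string labels for the four families are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def g_inverse(z, family):
    """Inverse translation function of Equation (16)."""
    if family == "SL":
        return math.exp(z)
    if family == "SU":
        return math.sinh(z)  # (e^z - e^-z) / 2
    if family == "SB":
        # 1 / (1 + e^-z), arranged to avoid overflow for large |z|
        if z >= 0.0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    if family == "SN":
        return z
    raise ValueError(f"unknown family: {family}")


def johnson_from_uniform(u, gamma, delta, lam, xi, family):
    """Map a random number u in (0, 1) to a Johnson variate via (15)."""
    z = NormalDist().inv_cdf(u)  # Z = Phi^{-1}(U); requires 0 < u < 1
    return xi + lam * g_inverse((z - gamma) / delta, family)


def johnson_variates(n, gamma, delta, lam, xi, family, seed=None):
    """Generate n Johnson variates by inversion (illustrative sketch)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()      # lies in [0, 1)
        if u > 0.0:           # skip the endpoint 0, where Phi^{-1} is undefined
            out.append(johnson_from_uniform(u, gamma, delta, lam, xi, family))
    return out
```

Because each variate is a monotone function of a single random number U, this scheme preserves the one-to-one correspondence between random numbers and variates that inversion is valued for.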

3.5. Application of Johnson distributions to

smart-materials research

Matthews et al (2006), Weiland et al (2005), and Gao and

Weiland (2008) present a multiscale modelling approach for

the prediction of material stiffness of a certain class of smart

materials called ionic polymers. The material stiffness

depends on multiple parameters, including the effective

length of the polymer chains composing the material. In a

case study of Nafion, a specific type of ionic polymer,



Matthews et al (2006) develop a simulation model of the

conformation of Nafion polymer chains on a nanoscopic

level, from which a large number of end-to-end chain lengths

are generated. The p.d.f. of end-to-end distances is then

estimated and used as an input to a macroscopic-level

mathematical model to quantify material stiffness.

Figure 3 shows the empirical distribution of 9980

simulation-generated observations of end-to-end Nafion

chain lengths (in ångströms). Superimposed on the empirical

distribution is the result of using the DWLS estimation

method to fit an unbounded Johnson (SU) distribution to the

chain length data. Figure 3 reveals a remarkably accurate fit

to the given data set. Furthermore, comparing the Johnson

fit in Figure 3 with the beta fits for the same data set in

Figure 1, we see that the Johnson distribution is able to

capture certain key aspects of the Nafion data set that the

beta distribution is unable to represent adequately.

Gao and Weiland (2008), Matthews et al (2006), and

Weiland et al (2005) conclude that the estimates of the

distribution of chain lengths obtained by fitting an appropriate Johnson distribution to the data are more intuitive

than those using other density estimation techniques for the

following reasons. First, it is possible to write down an

explicit functional form for the Johnson p.d.f. fX(x) that is

simple to differentiate. This is a crucial property because the

second derivative $f_X''(x)$ of the p.d.f. will be used as an input

to a mathematical model to estimate material stiffness.

Second, there is a relatively simple relationship between the

Johnson parameters and the material stiffness. Weiland et al

(2005) summarize the results of a sensitivity analysis for the

Johnson parameters and the corresponding effect on

material stiffness. In general, Weiland et al find that

increasing the location parameter $\xi$ leads to an increase in

predicted stiffness. Similarly, increasing the shape parameter

$\delta$ or decreasing the scale parameter $\lambda$ leads to marginally

higher predicted material stiffness. Establishing a consistent

relationship between these parameters and stiffness would

first serve to extend the current theory to stiffness predictions,

and may ultimately also serve as a step toward the

custom design of materials with specific stiffness properties.

3.6. Application of Johnson distributions to health-care

systems analysis

In a recent study of the arrival patterns of patients who have

scheduled appointments at a community health-care clinic,

Alexopoulos et al (2008) find that patient tardiness (ie, the

patient's deviation from the scheduled appointment time) is

most accurately modelled using an $S_U$ distribution. Specifically, they consider data on patient tardiness collected by the

Partnership of Immunization Providers, a collaborative

public-private project created by the University of California,

San Diego School of Medicine, Division of Community

Pediatrics, in association with community clinics and small,

private provider practices. Alexopoulos et al (2008) perform

an exhaustive analysis of 18 continuous distributions, and

they conclude that the SU distribution provides superior fits

to the available data.

4. Bézier distribution family

4.1. Definition of Bézier curves

In computer graphics, a Bézier curve is often used to approximate a smooth (continuously differentiable) function on a bounded interval by forcing the Bézier curve to pass in the vicinity of selected control points $\{\mathbf{p}_i \equiv (x_i, z_i)^{\mathrm{T}} : i = 0, 1, \ldots, n\}$ in two-dimensional Euclidean space. (Throughout this article, all vectors will be column vectors unless otherwise stated; and the roman superscript T will denote the transpose of a vector or matrix.) Formally, a Bézier curve of degree n with control points $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$ is given parametrically by

$$\mathbf{P}(t) = \sum_{i=0}^{n} B_{n,i}(t)\, \mathbf{p}_i \quad \text{for } t \in [0, 1] \tag{17}$$

where the blending function $B_{n,i}(t)$ (for all $t \in [0, 1]$) is the Bernstein polynomial

$$B_{n,i}(t) = \frac{n!}{i!\,(n - i)!}\, t^{i} (1 - t)^{n - i} \quad \text{for } i = 0, 1, \ldots, n \tag{18}$$
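As a small worked case of Equations (17) and (18), consider n = 2, that is, three control points:

```latex
\begin{align*}
B_{2,0}(t) &= (1-t)^2, \qquad B_{2,1}(t) = 2t(1-t), \qquad B_{2,2}(t) = t^2, \\
\mathbf{P}(t) &= (1-t)^2\,\mathbf{p}_0 + 2t(1-t)\,\mathbf{p}_1 + t^2\,\mathbf{p}_2, \\
\sum_{i=0}^{2} B_{2,i}(t) &= \bigl[(1-t) + t\bigr]^2 = 1, \\
\mathbf{P}(0) &= \mathbf{p}_0, \qquad \mathbf{P}(1) = \mathbf{p}_2 .
\end{align*}
```

The blending functions are nonnegative on [0, 1] and sum to 1, so each point on the curve is a weighted average of the control points; the curve passes through the first and last control points but in general only near the middle one.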

Figure 3 Johnson $S_U$ c.d.f. (left panel) and p.d.f. (right panel) fitted to 9980 Nafion chain lengths.

4.2. Bézier distribution and density functions

If X is a continuous random variable whose space is the bounded interval [a, b] and if X has c.d.f. $F_X(\cdot)$ and p.d.f. $f_X(\cdot)$, then in principle we can approximate $F_X(\cdot)$ arbitrarily closely using a Bézier curve of the form (17) by taking a sufficient number (n + 1) of control points with appropriate values for the coordinates $(x_i, z_i)^{\mathrm{T}}$ of the ith control point $\mathbf{p}_i$ for $i = 0, \ldots, n$. If X is a Bézier random variable, then the c.d.f. of X is given parametrically by

$$\mathbf{P}(t) = \{x(t), F_X[x(t)]\}^{\mathrm{T}} \quad \text{for } t \in [0, 1] \tag{19}$$

where

$$x(t) = \sum_{i=0}^{n} B_{n,i}(t)\, x_i, \qquad F_X[x(t)] = \sum_{i=0}^{n} B_{n,i}(t)\, z_i \tag{20}$$

Equation (20) reveals that the control points $\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n$ constitute the parameters regulating all the properties of a Bézier distribution. Thus the control points must be arranged so as to ensure the basic requirements of a c.d.f.: (i) $F_X(x)$ is monotonically nondecreasing in the cutoff value x; (ii) $F_X(a) = 0$; and (iii) $F_X(b) = 1$. By utilizing the Bézier property that the curve described by (19)–(20) passes through the control points $\mathbf{p}_0$ and $\mathbf{p}_n$ exactly, we can ensure that $F_X(a) = 0$ if we take $\mathbf{p}_0 \equiv (a, 0)^{\mathrm{T}}$; and we can ensure that $F_X(b) = 1$ if we take $\mathbf{p}_n \equiv (b, 1)^{\mathrm{T}}$. See Wagner and Wilson (1996a) for a complete discussion of univariate Bézier distributions and their use in simulation input modelling.

If X is a Bézier random variable with c.d.f. $F_X(\cdot)$ given parametrically by (19), then it follows that the corresponding p.d.f. $f_X(x)$ for all real x is given parametrically by

$$\mathbf{P}(t) = \{x(t), f_X[x(t)]\}^{\mathrm{T}} \quad \text{for } t \in [0, 1]$$

where x(t) is given by (20) and

$$f_X[x(t)] = \frac{\sum_{i=0}^{n-1} B_{n-1,i}(t)\, \Delta z_i}{\sum_{i=0}^{n-1} B_{n-1,i}(t)\, \Delta x_i}$$

In the last equation, $\Delta x_i \equiv x_{i+1} - x_i$ and $\Delta z_i \equiv z_{i+1} - z_i$ (for $i = 0, 1, \ldots, n - 1$) represent the corresponding first differences of the x- and z-coordinates of the original control points $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$ in the parametric representation (19) of the c.d.f.
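The parametric representations of the c.d.f. and p.d.f. can be evaluated directly from the control points. The Python sketch below is a minimal illustration under the assumptions that the x-coordinates are nondecreasing and that the denominator of the p.d.f. expression is nonzero at the chosen t; the function names are hypothetical and are not taken from any software cited in this article.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{n,i}(t) of Equation (18)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_cdf_point(xs, zs, t):
    """Return (x(t), F_X[x(t)]) from Equation (20).

    xs and zs hold the x- and z-coordinates of control points p_0..p_n.
    """
    n = len(xs) - 1
    x = sum(bernstein(n, i, t) * xs[i] for i in range(n + 1))
    cdf = sum(bernstein(n, i, t) * zs[i] for i in range(n + 1))
    return x, cdf


def bezier_pdf_point(xs, zs, t):
    """Return (x(t), f_X[x(t)]) using first differences of the coordinates."""
    n = len(xs) - 1
    dx = [xs[i + 1] - xs[i] for i in range(n)]
    dz = [zs[i + 1] - zs[i] for i in range(n)]
    num = sum(bernstein(n - 1, i, t) * dz[i] for i in range(n))
    den = sum(bernstein(n - 1, i, t) * dx[i] for i in range(n))
    x, _ = bezier_cdf_point(xs, zs, t)
    return x, num / den  # assumes den != 0 at this t
```

With the collinear control points (0, 0), (0.5, 0.5), and (1, 1), for instance, the c.d.f. reduces to $F_X(x) = x$ and the p.d.f. to the constant 1, that is, the Uniform[0, 1] distribution.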

4.3. Generating Bézier variates by inversion

The method of inversion can be used to generate a Bézier random variable whose c.d.f. has the parametric representation displayed in Equations (19) and (20). Given a random number $U \sim \text{Uniform}[0, 1]$, we perform the following steps: (i) find $t_U \in [0, 1]$ such that

$$\sum_{i=0}^{n} B_{n,i}(t_U)\, z_i = U \tag{21}$$

and (ii) deliver the variate

$$X = \sum_{i=0}^{n} B_{n,i}(t_U)\, x_i \tag{22}$$

The solution to (21) can be computed by any root-finding algorithm such as Müller's method, Newton's method, or the bisection method. Codes to implement this approach to generating Bézier variates are available on the Web site www.ise.ncsu.edu/jwilson/page3.
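Steps (i) and (ii) can be sketched in Python using the bisection method, one of the root-finding options mentioned above. The sketch assumes that the z-coordinates define a nondecreasing function of t with $z_0 = 0$ and $z_n = 1$; the function names and the tolerance `tol` are assumptions of this example, which does not reproduce the codes available on the Web site.

```python
import random
from math import comb


def bernstein(n, i, t):
    """Bernstein polynomial B_{n,i}(t) of Equation (18)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_coord(coords, t):
    """Evaluate sum_i B_{n,i}(t) * coords[i] for one coordinate."""
    n = len(coords) - 1
    return sum(bernstein(n, i, t) * c for i, c in enumerate(coords))


def bezier_variate(u, xs, zs, tol=1e-12):
    """Return the Bezier variate corresponding to the random number u."""
    lo, hi = 0.0, 1.0
    # Step (i): bisection on t for the root of Equation (21).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bezier_coord(zs, mid) < u:
            lo = mid
        else:
            hi = mid
    t_u = 0.5 * (lo + hi)
    # Step (ii): deliver X from Equation (22).
    return bezier_coord(xs, t_u)


def bezier_variates(n, xs, zs, seed=None):
    """Generate n Bezier variates by inversion (illustrative sketch)."""
    rng = random.Random(seed)
    return [bezier_variate(rng.random(), xs, zs) for _ in range(n)]
```

Bisection is chosen here for its robustness; Newton's method would typically need fewer iterations but requires the derivative of the left-hand side of (21) and a safeguard to keep the iterates inside [0, 1].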

Remark 4. As documented in the companion paper on multivariate input modelling (Kuhl et al, 2010), the inversion scheme specified in Equations (21) and (22) for generating Bézier random variables will be a key element in our approach to building multivariate extensions of the univariate Bézier distributions as well as stationary univariate time series whose marginals are Bézier distributions.

4.4. Using PRIME to model Bézier distributions

PRIME is a graphical, interactive software system that

incorporates the methodology detailed in this section to help

an analyst estimate the univariate input processes arising in

simulation studies. PRIME is written entirely in the C

programming language, and it has been developed to run

under Microsoft Windows. A public-domain version of the

software is available on the previously mentioned Web site.

PRIME is designed to be easy and intuitive to use. The

construction of a c.d.f. is performed through the actions of

the mouse, and several options are conveniently available

through menu selections. Control points are represented as

small black squares, and each control point is given a unique

label corresponding to its index i in Equation (17). Figure 4

shows a typical session in PRIME, where the c.d.f. and p.d.f.

windows are both displayed.

In the absence of data, PRIME can be used to model an

input process conceptualized from subjective information or

expertise. Section 5.1 of Wagner and Wilson (1996a)

contains a detailed example of the interactive use of PRIME

for subjective input modelling; here we merely provide an

overview of this approach to using PRIME. The representation

of the conceptualized distribution is achieved by adding,

deleting, and moving the control points via the mouse. Each

control point acts like a magnet that pulls the curve in the

direction of the control point, where the blending functions

(ie, the Bernstein polynomials defined by Equation (18))

govern the strength of the magnetic attraction exerted on

the curve by each control point. Clicking (ie, selecting) and

dragging (ie, moving) a control point causes the displayed



c.d.f. to be updated (nearly) instantaneously. If they are

displayed, the corresponding p.d.f., the first four moments (that is, the mean, variance, skewness, and kurtosis), and

selected percentile values of the Bézier distribution are

updated (nearly) simultaneously in adjacent windows so that

the user gets immediate feedback on the effects of moving

selected control points. Thus, the user has a variety of

readily available indicators and measures, as well as visually

appealing displays, to aid in the construction of the

conceptualized distribution.

As detailed in Wagner and Wilson (1996a, b), PRIME

includes several standard estimation procedures for fitting

distributions to sample data sets:

• OLS estimation of the c.d.f.;
• minimum $L_1$ and $L_\infty$ norm estimation of the c.d.f.;
• maximum likelihood estimation (assuming a and b are known);
• moment matching; and
• percentile matching.

Figure 5 shows a Bézier distribution that was fitted to the same data set consisting of Nafion polymer chain lengths as shown in Figure 3. In this application of PRIME, we obtained the fitted Bézier distribution automatically, where: (i) the number of control points (n + 1) was determined by the likelihood ratio test detailed in Wagner and Wilson (1996b); and (ii) the components of the control points were estimated by the method of OLS. Figure 5 shows that a Bézier distribution yielded an excellent fit to the given data set.

As another example that illustrates the capability of

PRIME and the Bézier distribution family to handle

multimodal data, we describe briefly an input-modelling

problem that arose in a manufacturing simulation study. For

more details on this application using an earlier version of

PRIME that did not incorporate automatic determination of

the number of control points to be used in the fitted Bézier distribution, see Section 5.2 of Wagner and Wilson

(1996a). Surface mount capacitors were stored in lots of

varying sizes in a facility adjacent to the insulation resistance

(IR) testing area. To model the operation of the IR testing

area, we needed to estimate the distribution of capacitor lot

sizes in the storage facility.

Capacitor lot-size data were available for 2083 tested lots.

The left-hand panel of Figure 6 displays the empirical c.d.f.

for this data set and the final fitted Bézier c.d.f.; and the

right-hand panel displays a histogram and the final fitted

Bézier p.d.f., where all of the original observations were

divided by 1000 for simplicity. Notice that in the vicinity of

20 and 270 on the new scale (that is, lot sizes expressed in

1000s), there are pronounced peaks in the histogram. Usually

such a bimodal distribution indicates that the sample was

taken from two distinct distributions that must be fitted

separately so that the overall fitted distribution is a mixture

of the two component distributions; for an elaboration of this

point, see Remark 5 below. However in the current context,

the production engineers were unable to provide any additional

information that would have enabled us to model

the lot-size distribution as a mixture of two simpler distributions;

and thus we were forced to exploit the capabilities

of PRIME for modelling multimodal distributions.

The fitted Bézier distribution displayed in Figure 6 was

obtained in two steps using the method of OLS. First we

simply used the default settings of PRIME to fit a Bézier

distribution with six control points; and the resulting fit was

unimodal and was judged to be unsatisfactory based on

visual inspection of the fitted p.d.f. and c.d.f. (As detailed in

Wagner and Wilson (1996a), several other widely used

commercial input-modelling packages also yielded unsatisfactory

fits to this data set precisely because they do not

include any distribution families that can adequately handle

multimodal data sets.) In the second step of using PRIME to fit a Bézier distribution to the lot-size data set, we used the option for automatic determination of the number of control points starting from the current configuration. As shown in Figure 6, the final fitted Bézier distribution had 13 control points; and the fitted p.d.f. and c.d.f. closely approximated the corresponding histogram and empirical c.d.f. for the lot-size data set.

Figure 4 PRIME windows showing the Bézier c.d.f. (left panel) with its control points and the p.d.f. (right panel).

Figure 5 Bézier distribution fitted to 9980 Nafion chain lengths.

Figure 6 Bézier distribution fitted to capacitor lot-size data set of size 2083.

Remark 5. If a data set has two or more clearly distinguishable sources each with its own distribution, then an alternative approach to fitting a multimodal distribution to the overall data set is to represent the corresponding c.d.f. (or p.d.f.) as a mixture of the c.d.f.s (or p.d.f.s) for the individual sources, where the mixing probabilities are the associated long-run percentages of the overall data set obtained from each source; see Section 8.2.2 of Law (2007). In this situation it is natural to fit a distribution to the subsample from each source separately; and then the corresponding estimate of the mixing probability is simply the fraction of the entire data set obtained from the relevant source. This approach could not be used in the lot-size application described above because separate sources of data could not be identified.
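When the sources can be identified, the mixture approach of Remark 5 can be sketched in Python as follows. The function names are hypothetical, and the per-source samplers are placeholders for whatever fitted distributions are used for the individual sources.

```python
import random


def mixing_probabilities(subsample_sizes):
    """Estimate each mixing probability as the fraction of the entire
    data set obtained from the corresponding source."""
    total = sum(subsample_sizes)
    if total <= 0:
        raise ValueError("need at least one observation")
    return [m / total for m in subsample_sizes]


def mixture_variate(rng, probs, samplers):
    """Draw one variate from the mixture.

    samplers[k](rng) must return a variate from the distribution fitted
    to source k; probs[k] is the associated mixing probability.
    """
    u = rng.random()
    cumulative = 0.0
    for p, sampler in zip(probs, samplers):
        cumulative += p
        if u < cumulative:
            return sampler(rng)
    return samplers[-1](rng)  # guards against round-off in the running sum
```

For instance, with subsamples of sizes 300 and 700, `mixing_probabilities([300, 700])` gives estimated mixing probabilities of 0.3 and 0.7.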

The Bézier distribution family, which is entirely specified by its control points $\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_n\}$, has the following advantages:

• It is extremely flexible and can represent a wide diversity of distributional shapes. For instance, Figures 4 and 6 depict multimodal distributions that are easily constructed using PRIME, yet impossible to achieve with other distribution families.
• If data are available, then the likelihood ratio test of Wagner and Wilson (1996b) can be used in conjunction with any of the estimation methods enumerated above to find automatically both the number and location of the control points.
• In the absence of data, PRIME can be used to determine the conceptualized distribution based on known quantitative or qualitative information that the user perceives to be pertinent.
• As the number (n + 1) of control points increases, so does the flexibility in fitting Bézier distributions. The interpretation and complexity of the control points, however, do not change with the number of control points.

5. Conclusions and recommendations

The common thread running through this article is the focus

on robust input models that are computationally tractable

and sufficiently flexible to represent adequately many of the

probabilistic phenomena that arise in many applications of

discrete-event stochastic simulation. For another approach

to input modelling with no data, see Craney and White

(2004).

The emphasis in this article has been on the beta, Johnson,

and Be zier families because of their flexibility and because

we have found that in practice, they can be most effectively applied to simulation projects in which a large number of

input models must be built under conditions in which the

user lacks either of the following: (i) detailed information

about the mechanism generating the target inputs; or (ii) the

time to gather the information specified in (i) and use that

information to derive the precise functional form of the

relevant distribution. For situations in which the user has

more information about the genesis of the continuous

univariate distribution to be modelled, we have found the

Pearson system of distributions can often be used effectively;

see Chapter 4 of Elderton and Johnson (1969) and Sections

6.2–6.13 of Stuart and Ord (1994). Johnson et al (1994, 2004)

provide a comprehensive discussion of continuous univariate

distributions; see also Kotz and van Dorp (2004). For a

similar treatment of discrete univariate distributions, see

Johnson et al (2005).

Notably missing from this article is a discussion of

Bayesian techniques for simulation input modelling, a topic

that we think will receive increasing attention from

practitioners and researchers alike in the future. In selecting

the input models for a simulation, we must account for three

main sources of uncertainty:

1. Stochastic uncertainty arises from dependence of the

simulation output on the random numbers generated and

used on each run; for example, the random number U

used to generate a generalized beta random variable X via

Equation (11).

2. Model uncertainty arises when the correct input model is unknown, and we must choose between alternative input

models with different functional forms that adequately fit

available sample data or subjective information; for

example, the generalized beta, Johnson $S_U$, and Bézier

distributions fitted to the Nafion data set as depicted in

Figures 1, 3 and 5, respectively.

3. Parameter uncertainty arises when the parameters of the

selected input model(s) are unknown and must be

estimated from sample data or subjective information.

Although stochastic uncertainty is much more widely

recognized by simulation practitioners than the other two types of uncertainty, it is not always a major source of

variation in simulation output as demonstrated by Zouaoui

and Wilson (2004) using an M/G/1 queueing system

simulation in which stochastic uncertainty accounts for only

2% of the posterior variance of the average waiting time in

the queue, while model uncertainty regarding the exact

functional form of the service-time distribution accounts for

18% of the posterior variance; thus the remaining 80% of the

posterior variance is due to uncertainty regarding the exact

numerical values of the arrival rate and the parameters of the

service-time distribution. In such a situation, conventional

approaches to input modelling have the potential to yield a

grossly misleading picture of the inherent accuracy of

simulation-generated system performance measures such as

the average queue waiting time. For an introduction to

Bayesian input modelling, see Chick (1999, 2001) and

Zouaoui and Wilson (2003, 2004).

Another topic not discussed in this article is the use of

heavy-tailed distributions in simulation input modelling. If

the random variable X has a heavy-tailed distribution, then

$$1 - F_X(x) = \Pr\{X > x\} \sim c\, x^{-\alpha} \quad \text{as } x \to \infty \tag{23}$$

where $c > 0$ is a location parameter, $\alpha$ is a shape parameter with $\alpha \in (1, 2)$, and $\sim$ means that the ratio of the left- and right-hand sides of (23) tends to 1 as $x \to \infty$. Heavy-tailed

distributions frequently arise in simulations of computer and

communications systems (Crovella and Lipsky, 1997;

Greiner et al, 1999; Heyde and Kou, 2004). Fishman and

Adan (2005) discuss some situations in which the lognormal

distribution (a member of the Johnson translation system)

can provide a reasonable substitute for a heavy-tailed

distribution.
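As a simple illustration of (23), the Pareto distribution with minimum value $x_{\min} > 0$ has $1 - F_X(x) = (x_{\min}/x)^{\alpha}$ for $x \ge x_{\min}$, so that (23) holds exactly with $c = x_{\min}^{\alpha}$. The Pareto distribution is chosen here only as a convenient example and is not discussed elsewhere in this article; the function names in the Python sketch below are hypothetical. Variates are generated by inversion.

```python
import random


def pareto_survival(x, x_min, alpha):
    """Tail probability 1 - F_X(x) = Pr{X > x} of the Pareto distribution."""
    return 1.0 if x <= x_min else (x_min / x) ** alpha


def pareto_variate(u, x_min, alpha):
    """Invert F_X(x) = 1 - (x_min / x)^alpha at the random number u in [0, 1)."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)


def pareto_variates(n, x_min, alpha, seed=None):
    """Generate n Pareto variates by inversion (illustrative sketch)."""
    rng = random.Random(seed)
    return [pareto_variate(rng.random(), x_min, alpha) for _ in range(n)]
```

For $\alpha \in (1, 2)$ such a distribution has a finite mean but an infinite variance, which is one reason sample averages from simulations driven by heavy-tailed inputs can converge slowly.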

Additional material on techniques for simulation input

modelling will be posted to the Web site http://www.ise.ncsu.edu/jwilson/more_info.

Acknowledgements

Partial support for some of the research described in this article was provided by National Science Foundation Grant DMI-9900164.

References

AbouRizk SM, Halpin DW and Wilson JR (1991). Visual interactive fitting of beta distributions. J Constr Eng Mngt 117: 589–605.

AbouRizk SM, Halpin DW and Wilson JR (1994). Fitting beta distributions based on sample data. J Constr Eng Mngt 120: 288–305.

Abramowitz M and Stegun IA (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover: New York.

Akaike H (1974). A new look at the statistical model identification. IEEE T Automat Contr AC-19: 716–723.

Alexopoulos C et al (2008). Modeling patient arrival times in community clinics. Omega 36: 33–43.

Chick SE (1999). Steps to implement Bayesian input distribution selection. In: Farrington PA, Nembhard HB, Sturrock DT and Evans GW (eds). Proceedings of the 1999 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 317–324, http://www.informs-sim.org/wsc99papers/044.PDF, accessed 28 March 2009.

Chick SE (2001). Input distribution selection for simulation experiments: Accounting for input uncertainty. Opns Res 49: 744–758.

Craney TA and White N (2004). Distribution selection with no data using VBA and Excel. Qual Eng 16: 643–656.

Crovella ME and Lipsky L (1997). Long-lasting transient conditions in simulations with heavy-tailed workloads. In: Andradottir S, Healy KJ, Withers DH and Nelson BL (eds). Proceedings of the 1997 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 1005–1012, http://www.informs-sim.org/wsc97papers/1005.PDF, accessed 8 July 2009.

DeBrota DJ et al (1989a). Modeling input processes with Johnson distributions. In: MacNair EA, Musselman KJ and Heidelberger P (eds). Proceedings of the 1989 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 308–318, http://www.ise.ncsu.edu/jwilson/files/debrota89wsc.pdf, accessed 28 March 2009.

DeBrota DJ, Dittus RS, Roberts SD and Wilson JR (1989b). Visual interactive fitting of bounded Johnson distributions. Simulation 52: 199–205.

Dickson LE (1939). New First Course in the Theory of Equations. Wiley: New York.

Elderton WP and Johnson NL (1969). Systems of Frequency Curves. Cambridge University Press: Cambridge.

Fishman GS and Adan IJB (2005). How heavy-tailed distributions affect simulation-generated time averages. ACM Trans Model Comput Simul 16: 152–173.

Gao F and Weiland LM (2008). A multiscale model applied to ionic polymer stiffness prediction. J Mater Res 23: 833–841.

Gold MR, Siegel JE, Russell LB and Weinstein MC (1996). Cost-effectiveness in Health and Medicine. Oxford University Press: New York.

Greiner M, Jobmann M and Lipsky L (1999). The importance of power-tail distributions for modeling queueing systems. Opns Res 47: 313–326.

Hahn GJ and Shapiro SS (1967). Statistical Models in Engineering. Wiley: New York.

Heyde CC and Kou SG (2004). On the controversy over tailweight distributions. Opns Res Lett 32: 399–408.

Hill ID, Hill R and Holder RL (1976). Algorithm AS99: Fitting Johnson curves by moments. Appl Stat 25: 180–189.

Irizarry MA et al (2003). Analyzing transformation-based simulation metamodels. IIE Trans 35: 271–283.

Johnson NL (1949). Systems of frequency curves generated by methods of translation. Biometrika 36: 149–176.

Johnson NL, Kemp AW and Kotz S (2005). Univariate Discrete Distributions, 3rd edn. Wiley-Interscience: New York.

Johnson NL, Kotz S and Balakrishnan N (1994). Continuous Univariate Distributions, Vol. 1, 2nd edn. Wiley-Interscience: New York.

Johnson NL, Kotz S and Balakrishnan N (2004). Continuous Univariate Distributions, Vol. 2, 2nd edn. Wiley-Interscience: New York.

Kotz S and van Dorp JR (2004). Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific: Singapore.

Kuhl ME et al (2006). Introduction to modeling and generating probabilistic input processes for simulation. In: Perrone LF, et al. (eds). Proceedings of the 2006 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 19–35, http://www.informs-sim.org/wsc06papers/003.pdf, accessed 28 March 2009.

Kuhl ME et al (2008a). Introduction to modeling and generating probabilistic input processes for simulation. In: Mason SJ, et al. (eds). Proceedings of the 2008 Winter Simulation Conference. Institute of Electrical and Electronics Engineers: Piscataway, NJ, pp 48–61, http://www.informs-sim.org/wsc08papers/008.pdf, accessed 28 March 2009.

Kuhl ME et al (2008b). Introduction to modeling and generating probabilistic input processes for simulation. Slides accompanying the oral presentation of Kuhl et al (2008a), http://www.ise.ncsu.edu/jwilson/files/wsc08imt.pdf, accessed 28 March 2009.

Kuhl ME et al (2010). Multivariate input models for stochastic simulation. J Simul (in preparation).

Law AM (2007). Simulation Modeling and Analysis, 4th edn. McGraw-Hill: New York.

Matthews JL et al (2006). Monte Carlo simulation of a solvated ionic polymer with cluster morphology. Smart Mater Struct 15: 187–199.

McBride WJ and McClelland CW (1967). PERT and the beta distribution. IEEE Trans Eng Mngt EM-14: 166–169.

Palisade Corp (2009). Getting started in @RISK. Palisade Corp.: Ithaca, NY, http://www.palisade.com/risk/5/tips/EN/gs/, accessed 5 July 2009.




Pearlswig DM (1995). Simulation modeling applied to the single pot

processing of effervescent tablets. Master's thesis, Integrated
Manufacturing Systems Engineering Institute, North Carolina
State University, Raleigh, NC, http://www.ise.ncsu.edu/jwilson/files/pearlswig95.pdf, accessed 28 March 2009.

Press WH, Teukolsky SA, Vetterling WT and Flannery BP (2007).

Numerical Recipes: The Art of Scientific Computing, 3rd edn.

Cambridge University Press: Cambridge.

SAS Institute Inc (2008). JMP 8 Statistics and Graphics Guide. http://www.jmp.com/support/downloads/pdf/jmp8/jmp_stat_graph_guide.pdf, accessed 28 October 2009.

Stephens MA (1974). EDF statistics for goodness of fit and some

comparisons. J Am Stat Assoc 69: 730–737.

Stuart A and Ord K (1994). Kendall's Advanced Theory of Statistics,

Volume 1: Distribution Theory, 6th edn, Edward Arnold: London.

Swain JJ, Venkatraman S and Wilson JR (1988). Least-squares

estimation of distribution functions in Johnson's translation

system. J Stat Comput Simul 29: 271–297.

Vanhoucke M (2010). Using activity and sensitivity and network

topology information to monitor project time performance.

Omega (forthcoming).

Wagner MAF and Wilson JR (1996a). Using univariate Bézier

distributions to model simulation input processes. IIE Trans 28:

699–711.

Wagner MAF and Wilson JR (1996b). Recent developments in

input modeling with Bézier distributions. In: Charnes JM,
Morrice DJ, Brunner DT and Swain JJ (eds). Proceedings of the
1996 Winter Simulation Conference. Institute of Electrical
and Electronics Engineers: Piscataway, NJ, pp 1448–1456,
http://www.ise.ncsu.edu/jwilson/files/wagner96wsc.pdf, accessed
28 March 2009.

Weiland LM, Lada EK, Smith RC and Leo DJ (2005). Application

of rotational isomeric state theory to ionic polymer stiffness

predictions. J Mater Res 20: 2443–2455.

Wilson JR, Vaughan DK, Naylor E and Voss RG (1982). Analysis

of Space Shuttle ground operations. Simulation 38: 187–203.

Xu X et al (2010). Pelvic floor consequences of cesarean delivery

on maternal request in women with a single birth: A cost-effectiveness analysis. J Womens Health 19: 147–160.

Zouaoui F and Wilson JR (2003). Accounting for parameter
uncertainty in simulation input modeling. IIE Trans 35: 781–792.

Zouaoui F and Wilson JR (2004). Accounting for input-model

and input-parameter uncertainties in simulation. IIE Trans 36:

1135–1151.

Appendix

Exact computation of shape parameters for beta distribution fitted to user-specified mode and variance

To simplify the notation in this appendix, we let $a$, $m$, and $b$ denote the user-specified minimum, mode, and
maximum of the target distribution with $a < b$ and $m \in [a, b]$
as if these quantities were known exactly; in practice, of
course, it is often necessary to use estimates $\hat{a}$, $\hat{m}$, and $\hat{b}$ of these quantities in the following development. In
this appendix, we provide exact computing formulas
for the shape parameters $\alpha_1$ and $\alpha_2$ of the generalized
beta distribution (1) on the interval $[a, b]$ that has the
user-specified mode $m$ and the user-specified variance
$\sigma_X^2 = (b - a)^2/\omega$.

If $\omega > 12$ (so that the desired beta distribution has a
smaller variance than that of the uniform distribution on the
interval $[a, b]$), then for any value of $m \in [a, b]$, there is a
unique generalized beta distribution on $[a, b]$ with a unique
mode at $m$. (If $\omega = 12$, then it can be shown that we must
have $\alpha_1 = \alpha_2 = 1$ so that the beta distribution with the given
mode and variance coincides with the uniform distribution
on $[a, b]$. Since the mode is assumed to be unique, this
uninteresting case is eliminated from further consideration.)
If we set the right-hand side of (4) equal to $m$ and the
right-hand side of (3) equal to $(b - a)^2/\omega$, then we obtain the
following equivalent system of equations in terms of the
asymmetry ratio $r = (b - m)/(m - a)$, provided $m > a$ so that
$r < \infty$:

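$$
\left.
\begin{aligned}
\alpha_1^3 + B\alpha_1^2 + C\alpha_1 + D &= 0 \\
\alpha_2 &= r\alpha_1 + 1 - r
\end{aligned}
\right\} \tag{A1}
$$

where

$$
\left.
\begin{aligned}
B &= \frac{-3r^3 - 2r^2 + (5 - \omega)r + 4}{(1 + r)^3} \\
C &= \frac{3r^3 - 5r^2 + (\omega - 3)r + 5 - \omega}{(1 + r)^3} \\
D &= \frac{-r^3 + 4r^2 - 5r + 2}{(1 + r)^3}
\end{aligned}
\right\} \tag{A2}
$$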
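The step from Equations (3) and (4) to (A1)–(A2) can be written out as a short derivation. The sketch below assumes that (3) and (4) are the usual variance and mode formulas of the generalized beta distribution, restated in the first two lines; the shorthand $s = 1 + r$ and $t = 1 - r$ is introduced only for this illustration and is not part of the original notation.

```latex
\begin{align*}
\text{(4):}\quad & m = \frac{(\alpha_1 - 1)\,b + (\alpha_2 - 1)\,a}{\alpha_1 + \alpha_2 - 2}
  \;\Longrightarrow\;
  r = \frac{b - m}{m - a} = \frac{\alpha_2 - 1}{\alpha_1 - 1}
  \;\Longrightarrow\;
  \alpha_2 = r\alpha_1 + 1 - r, \\[4pt]
\text{(3):}\quad & \frac{(b - a)^2\,\alpha_1\alpha_2}{(\alpha_1 + \alpha_2)^2(\alpha_1 + \alpha_2 + 1)}
  = \frac{(b - a)^2}{\omega}
  \;\Longrightarrow\;
  (s\alpha_1 + t)^2 (s\alpha_1 + t + 1) - \omega\,\alpha_1 (r\alpha_1 + t) = 0, \\[4pt]
& s^3\alpha_1^3
  + \bigl[s^2(3t + 1) - \omega r\bigr]\alpha_1^2
  + \bigl[s\,t\,(3t + 2) - \omega t\bigr]\alpha_1
  + t^2 (t + 1) = 0 .
\end{align*}
```

Dividing the last line by $s^3 = (1 + r)^3$ and expanding each bracket in powers of $r$ reproduces the coefficients $B$, $C$, and $D$ in (A2).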

Remark 6. In the case that $m = a$ so that $r = \infty$, we solve
the mirror image problem for which $m = b$ and $r = 0$; and
then we interchange the resulting shape parameters to obtain
a generalized beta distribution whose mode coincides with its
minimum. See also Remark 7 below.

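It can be proved that if $\omega > 12$, then for all $r \in [0, \infty)$ the
cubic equation in $\alpha_1$ defined by (A1)–(A2) has a nonnegative
discriminant

$$
\Delta = 18BCD - 4B^3D + B^2C^2 - 4C^3 - 27D^2
$$

so that the cubic equation has three real roots $\{z_j : j = 1, 2, 3\}$
such that:

$$
\left.
\begin{aligned}
z_1 &> 1 \\
z_2, z_3 &< 1
\end{aligned}
\right\} \tag{A3}
$$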
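One part of (A3) can be checked by hand. Evaluating the cubic in (A1) at $\alpha_1 = 1$, with the coefficients taken from (A2), gives the following; this is a supplementary check and not the proof referred to in the text.

```latex
\[
1 + B + C + D
  = \frac{(1 + r)^3
      + \bigl[-3r^3 - 2r^2 + (5 - \omega)r + 4\bigr]
      + \bigl[3r^3 - 5r^2 + (\omega - 3)r + 5 - \omega\bigr]
      + \bigl[-r^3 + 4r^2 - 5r + 2\bigr]}{(1 + r)^3}
  = \frac{12 - \omega}{(1 + r)^3} .
\]
```

For $\omega > 12$ this value is negative, while the cubic tends to $+\infty$ as $\alpha_1 \to \infty$, so at least one root exceeds 1. The check does not by itself show that the other two roots lie below 1.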

As possible values of $\alpha_1$, the roots $z_2$ and $z_3$ are unacceptable
for the following reasons:

(i) The assignment $\alpha_1 \in (0, 1)$ yields a generalized beta
distribution with an asymptote at its lower limit $a$,
which seems intuitively problematic and is clearly
unacceptable when the user-specified mode $m$ exceeds
the lower limit.

(ii) The assignment $\alpha_1 \le 0$ does not define a legitimate
generalized beta distribution.

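We are therefore left with the unique assignment $\alpha_1 = z_1$;
and a computing formula for $\alpha_1$ can be derived from the
explicit solution to a cubic equation as follows (see Sections
33–38 of Dickson, 1939). In terms of the auxiliary quantities

$$
\left.
\begin{aligned}
P &= C - \tfrac{1}{3}B^2 \\
Q &= D - \tfrac{1}{3}BC + \tfrac{2}{27}B^3
\end{aligned}
\right\} \tag{A4}
$$

we have

$$
\alpha_1 = z_1 =
\begin{cases}
\left(-\tfrac{4}{3}P\right)^{1/2}
\cos\left\{ \tfrac{1}{3}\cos^{-1}\left[ -\tfrac{1}{2}\,Q\left(-\tfrac{3}{P}\right)^{3/2} \right] \right\}
- \tfrac{1}{3}B, & \text{if } \Delta > 0 \\[6pt]
-B, & \text{if } \Delta = 0
\end{cases} \tag{A5}
$$

Finally we take $\alpha_2 = r\alpha_1 + 1 - r$ to complete the specification
of the generalized beta distribution.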
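The computing formulas (A1)–(A5) translate directly into a few lines of code. The following Python sketch is illustrative only: the function names `beta_shapes` and `beta_mode_and_variance` are hypothetical, the mode and variance expressions are assumed to be the standard ones behind Equations (4) and (3), and the clamp on the arccosine argument is an added safeguard against rounding that also lets the trigonometric branch of (A5) cover the case $\Delta = 0$ numerically.

```python
import math


def beta_shapes(r, omega):
    """Return (alpha1, alpha2) from Equations (A1)-(A5).

    r     -- asymmetry ratio (b - m)/(m - a), finite and nonnegative
    omega -- (b - a)**2 divided by the target variance, must exceed 12
    """
    if not (omega > 12 and 0 <= r < math.inf):
        raise ValueError("need omega > 12 and 0 <= r < infinity")
    s3 = (1 + r) ** 3
    # Coefficients of the cubic in alpha1, Equation (A2).
    B = (-3 * r**3 - 2 * r**2 + (5 - omega) * r + 4) / s3
    C = (3 * r**3 - 5 * r**2 + (omega - 3) * r + 5 - omega) / s3
    D = (-r**3 + 4 * r**2 - 5 * r + 2) / s3
    # Auxiliary quantities, Equation (A4).
    P = C - B**2 / 3
    Q = D - B * C / 3 + 2 * B**3 / 27
    # Largest root z1, Equation (A5). Clamping to [-1, 1] is an added
    # guard: rounding can push the argument marginally outside that range.
    arg = -0.5 * Q * (-3 / P) ** 1.5
    arg = max(-1.0, min(1.0, arg))
    alpha1 = math.sqrt(-4 * P / 3) * math.cos(math.acos(arg) / 3) - B / 3
    alpha2 = r * alpha1 + 1 - r
    return alpha1, alpha2


def beta_mode_and_variance(a, b, alpha1, alpha2):
    """Mode and variance of the generalized beta distribution on [a, b]."""
    total = alpha1 + alpha2
    mode = ((alpha1 - 1) * b + (alpha2 - 1) * a) / (total - 2)
    variance = (b - a) ** 2 * alpha1 * alpha2 / (total**2 * (total + 1))
    return mode, variance


if __name__ == "__main__":
    # Hypothetical target: support [0, 10], mode 6, variance 4.
    a, m, b, var = 0.0, 6.0, 10.0, 4.0
    shapes = beta_shapes((b - m) / (m - a), (b - a) ** 2 / var)
    print(shapes, beta_mode_and_variance(a, b, *shapes))
```

Feeding the fitted shape parameters back through `beta_mode_and_variance` is a convenient way to confirm that the requested mode and variance are recovered.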

Remark 7. In general, to avoid numerical difficulties that
can occur with large values of $r$ (that is, when $r \gg 1$),
we recommend the following approach to the use of
Equations (A1)–(A5). If $(b - m)/(m - a) > 1$, then we solve
the mirror image problem for which $r = (m - a)/(b - m) < 1$;
and finally we interchange the resulting shape parameters to
obtain a generalized beta distribution with the user-specified
mode $m$.
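The mirror-image device of Remarks 6 and 7 can be sketched as a thin wrapper around any solver for the cubic. In the Python sketch below, the names `fit_beta` and `_largest_root` are hypothetical, and the root $z_1$ is located by bisection on the interval above 1 rather than by the closed form (A5); bisection is a swapped-in technique used here as an independent cross-check, relying on the property in (A3) that $z_1$ is the only root greater than 1.

```python
def _largest_root(r, omega):
    """Root z1 > 1 of the cubic (A1)-(A2), by bisection (stand-in for (A5))."""
    s3 = (1 + r) ** 3
    B = (-3 * r**3 - 2 * r**2 + (5 - omega) * r + 4) / s3
    C = (3 * r**3 - 5 * r**2 + (omega - 3) * r + 5 - omega) / s3
    D = (-r**3 + 4 * r**2 - 5 * r + 2) / s3

    def cubic(x):
        return ((x + B) * x + C) * x + D

    lo, hi = 1.0, 2.0
    while cubic(hi) < 0:  # the cubic is negative at 1 when omega > 12
        hi *= 2.0
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_beta(a, m, b, variance):
    """Shape parameters (alpha1, alpha2) for a given mode and variance.

    Applies the mirror-image rule whenever (b - m)/(m - a) > 1, which
    also covers the case m == a of Remark 6 (r infinite).
    """
    if not (a < b and a <= m <= b and variance > 0):
        raise ValueError("need a < b, a <= m <= b, and variance > 0")
    omega = (b - a) ** 2 / variance
    if omega <= 12:
        raise ValueError("variance must be below that of the uniform on [a, b]")
    mirrored = (b - m) > (m - a)
    # After mirroring, r never exceeds 1 and the denominator is positive.
    r = (m - a) / (b - m) if mirrored else (b - m) / (m - a)
    alpha1 = _largest_root(r, omega)
    alpha2 = r * alpha1 + 1 - r
    # Interchange the shape parameters when the mirror problem was solved.
    return (alpha2, alpha1) if mirrored else (alpha1, alpha2)
```

Because the ratio passed to the solver always lies in $[0, 1]$, the denominators $(1 + r)^3$ in (A2) stay between 1 and 8 regardless of how close the mode is to either end of the interval.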

Received 13 July 2009;

accepted 9 November 2009 after one revision

