industrial mathematical and statistical modeling workshop for

86
Thirteenth Industrial Mathematical and Statistical Modeling Workshop for Graduate Students July 23-31, 2007 North Carolina State University Raleigh, NC Edited by Zhilin Li, Sharon R. Lubkin, Mette S. Olufsen, and Ralph C. Smith

Upload: hakiet

Post on 02-Jan-2017

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Industrial Mathematical and Statistical Modeling Workshop for

Thirteenth Industrial Mathematical and Statistical Modeling

Workshop for Graduate Students

July 23-31, 2007 North Carolina State University

Raleigh, NC

Edited by Zhilin Li, Sharon R. Lubkin, Mette S. Olufsen, and Ralph C. Smith

Page 2: Industrial Mathematical and Statistical Modeling Workshop for

Faculty mentors Ilse Ipsen, Kazufumi Ito, Min Kang, Zhilin Li, Xiaobiao Lin, Mette Olufsen, Jeff Scroggs, Kartik Sivaramakrishnan Department of Mathematics, Box 8205, North Carolina State University, Raleigh, NC 27695-8205 USA

Participants

Megan Buzby Colorado State University Field Cady University of Washington Zhen Chen Auburn University Kuan-yu Chen National Chiao Tung University Tianbing Chen University of California, Berkeley Geoffrey Cox University of California, Irvine Kristen Dang UNC Chapel Hill Asha Dixit Auburn University Julia Eaton University of Washington Erica Fields University of Georgia Rachel Fonstad University of Iowa/Winona State University Tobias Graf Emory University Christina Ho Texas Tech University Alaina Houmard University of North Carolina Wilmington Maged Ismail Claremont Graduate University Mark Iwen University of Michigan - Ann Arbor Shuang Ji University of Texas at Arlington Yogesh Joshi New Jersey Institute of Technology Hoi Tin Kong University of Georgia Felix Krahmer Courant Institute of Mathematical Sciences Yijiang Li University of Illinois at Chicago Bo Li University of Toledo Kang-Ling Liao National Chiao Tung University Sarah Lukens Tulane University Don Kumudu Mallawaarachchi Texas Tech University Xiaoyu Mu University of Tennessee - Knoxville Sarah Olson NC State University Jim Reale-Levis NC State University Sandra Milena Hurtado Rua University of Connecticut Suman Sanyal University of Missouri - Rolla Tsvetanka Sendova Texas A&M University Svetlana Simakhina Florida State University Stephanie Vance University of Washington Xinli Wang New Jersey Institute of Technology Chang Xu Virginia Polytechnic Institute Ke Xu UNC Chapel Hill Claire (Yun) Zhang University of Washington

Page 3: Industrial Mathematical and Statistical Modeling Workshop for

Preface This volume contains the proceedings of the Industrial Mathematical and Statistical Modeling Workshop for graduate students, held at the Center for Research in Scientific Computation at North Carolina State University (NCSU) in Raleigh, North Carolina from July 23-31, 2007. This was the thirteenth such workshop to be held at NCSU, and brought together 37 graduate students from 27 universities. On the morning of the first day, industrial scientists presented six research problems. Students were divided into six teams to work on mathematical and statistical modeling problems presented. In contrast to neat, well-posed academic exercises that are typically found in coursework or textbooks, the workshop problems are challenging real-world problems that required the varied expertise and fresh insights of the group for their formulation, solution and interpretation. Each group spent the first eight days of the workshop investigating their project and reported their findings in half-hour public seminars on the final day of the workshop. This year’s projects were Density Estimation from Interval Data Bureau of Labor Statistics Presenter: Meghan O’Malley Private Equity Integrated Risk Management and Portfolio Optimization Cherokee Investment Partners Presenter: Charles Atkins Analyzing Papilloma Data in Mice Constella Group Presenter: Marjolein Smith Microbial Indicators of Drinking Water Contamination Environmental Protection Agency Presenter: Michael Messner Estimating Rocket Parameters from Flight Data MIT Lincoln Laboratories Presenter: John Peach Image processing for infrared thermography Nippon Steel Presenters: Junichi Nakagawa, Belal Gharaibeh, Keng H. Chuah, Tomoya Takeuchi, Tadayuki Ito

Page 4: Industrial Mathematical and Statistical Modeling Workshop for

These problems represent a broad spectrum of mathematical and statistical topics and applications. While the problems included many aspects that are challenging to address in the nine-day duration of the workshop, readers of this report will observe remarkable progress on all the projects. We strongly believe that this type of modeling workshop provides a highly valuable research related experience for graduate students that, simultaneously, contributes to the research of the industrial participants. One of the biggest challenges in real-world research is defining the problem and identifying what can be done about it. Our workshop participants get experience in this crucial life skill. In addition, the workshop improves graduate students’ ability to communicate and interact with scientists who are not traditional mathematicians, but who require and employ mathematical tools in their work. Through this unique experience that exposes students to the means by which mathematics is applied outside academia, the workshop has helped many students clarify their career aspirations. In several cases, in past workshops, students have been hired by the participating companies. Students interested in academic careers also find a renewed sense of excitement about applied mathematics and statistics from broadening their horizon beyond that which is typically presented in mathematics graduate education. Active participation of all involved during a nine day period of almost uninterrupted work, plus a series of social activities, enhanced the success of the workshop. The organizers are grateful to the participants for their dedicated effort and contributions. Funding for the workshop was provided by the Statistical and Applied Mathematical Sciences Institute (SAMSI) with additional financial support, personnel, and facilities provided by the Center for Research in Scientific Computation (CRSC) and the Department of Mathematics at North Carolina State University. Finally, we would like to express our sincere gratitude to Deborah Leistikow, Lesa Denning and Madeline Kamal for their efforts and help in all administrative matters. We are also grateful to (who??) for conducting the CRSC lab demonstration and to Ben Lobo, Ellen Peterson, Sara Torres, and Cory Winton for their assistance in providing transportation.

Zhilin Li, Sharon Lubkin, Mette S. Olufsen and Ralph C. Smith

Raleigh, 2007

Page 5: Industrial Mathematical and Statistical Modeling Workshop for

DENSITY ESTIMATION FROM INTERVAL DATA

Zhen Chen1, Rachel Fonstad2, Suman Sanyal3, Svetlana Simakhina4, Stephanie Vance5, Chang Xu6

Problem Presenter:Meghan O’Malley

Bureau of Labor StatisticsFaculty Advisors: Dr. Ilse Ipsen7 & Dr. Xiaobiao Lin8

Abstract

The forms for Occupational Employment Statistics request wage data from approximately 1.2 millionbusiness establishments as counts of employees in 12 wage intervals. The interval counts are used to calculateoccupational means and percentiles for the underlying wage distributions. Estimating population parametersfrom such low-detail data is difficult; density estimation leads to more precisely inference of statistics such asmeans and percentiles than the 12-bar histogram does. We construct three simple, versatile models informedby the interval percentages which can be used for various distributions. After that we compute means andpercentiles from the models and compare them with the means and percentiles from the true distributions. Inmost cases our models give satisfying approximation to the true distributions.

1Auburn University2University of Iowa3University of Missouri - Rolla4Florida State University5University of Washington6Virginia Tech7North Carolina State University8North Carolina State University

1

Page 6: Industrial Mathematical and Statistical Modeling Workshop for

1 Introduction and Motivation

Due to practical restrictions such as budget and time, data is often collected in broad categories rather thanspecific points. When a person looks at a histogram of these broad categories, he or she will usually mentallysmooth out the histogram, imagining the form of the underlying distribution. Formulas to obtain specificvalues for parameter estimates ignore the underlying distribution. They are usually calculated using weightedaverages of the broad category values. In a sense, by failing to use information on relationships betweencategories, this method ignores much of the information in a histogram. Is it possible to use the intervalsand basic intuition about “smoothness” to recover a density function? Could a recovered density function givebetter parameter estimates than weighted averages of the interval values?

For confidentiality purpose as well as the motivation of creating a versatile algorithm which is used fordensity estimation for data from any underlying distribution, we perform a 2-step simulation study.

In the first step of our simulation, data is generated from various truncated distributions by using sta-tistical software R. The underlying truncated distributions are from 3 categories, the unimodal family likeNormal and Gamma (i.e. for some value m (the mode), it is monotonically increasing for x ≤ m andmonotonically decreasing for x ≥ m.), and Bimodal shape curves (with two different modes) from themixture of two unimodel curves. Then, we create the histogram in a manner such that the area of eachbar will be identical with the area beneath the curve for each interval. The data set created for test isthe 12-bar histogram with each interval boundary and interval percentage known. The intervals we haveare A = [5.15, 7.5), B = [7.5, 9.5), C = [9.5, 12), D = [12, 15.25), E = [15.25, 19.25), F = [19, 25, 24.5), G =[24.5, 31),H = [31, 39.25), I = [39.25, 49.75), J = [49.75, 63.25),K = [63.25, 80), L = [80, 118.85).

Next, we implement our 2 methods separately to each histogram generated in the first step, and estimatethe parameters of the underlying distribution from the histogram information.

At the end, the comparison between the estimated parameters or statistics (Mean, Median, Percentiles)from our algorithm with the true parameters or statistics from the underlying distribution is performed to testthe Goodness-of-Fit of our methods.

2 Data Generation

2.1 Unimodal Histogram Data Generation

The following R codes is listed for demonstration of the data generation process, the complete code is given inthe Appendix at the end of the report.

The code following illustrates the procedure of creating unimodal histogram.

#### For the Unimodal Gamma distribution with alpha=12,beta=2 ###

n=100000; ## number of data generated

alpha=12; beta=2; ## alpha and beta are the parameters of gamma distribution;

X=rgamma(n, alpha, beta); ## function rgamma() is used to create gamma

distribution sample;

########### 12 Areas Computation ##########

A = length(X[5.15<=X & X <7.5])/length(X)

B = length(X[7.5 <=X & X <9.5])/length(X)

C = length(X[9.5 <=X & X <12])/length(X)

.........

where, in the third line of the code above, an i.i.d. (independent and identical distributed) sample is generatedfrom the Gamma distribution with parameters α = 12 and β = 2 by using function rgamma(·). After that, apercentage histogram is computed for each interval based on this i.i.d. sample. Notice that since the mean ofthe sample is far away from the left boundary 5.15, the truncation effect is trivial ( P (X < 5.15) = O(10−6),where X is the random variable following gamma distribution and function P (·) is the probability density

2

Page 7: Industrial Mathematical and Statistical Modeling Workshop for

function). But, in the bimodal sample generation process as in the following example, the truncation effectmust be considered by using conditional probability formula.

2.2 Bimodal Histogram Data Generation

This section presents the way to get a bimodal histogram from mixing two unimodal density functions andexplain how to neutralize the truncation effect. Without loss of generality, we take the following as an exampleof generating bimodal histogram.

First generate two iid samples from Gamma distributions Gamma1(α1 = 20, β1 = 2) and Gamma2(α2 =20, β2 = 2)respectively by using function rgamma(·). After that, the truncation effect is computed as

C = P (X < 5.15) =∫ 5.15

−∞f(x)dx

where, f(x) is the PDF(Probability Density Equation) of the underlying distribution. Then, the conditionalpdf is

P (X|X > 5.15) =P (X)

P (X > 5.15)=

P (X)1− P (X < 5.15)

=P (X)1− C

At the end, the two samples and are mixed together according to the known two ratios that add up to be one(for example: r1 = 3/4 and r2 = 1/4), as well as the truncation effect as in the following formula:

M(X|X > 5.15) = r1 ∗ P1(X) ∗ 11− C1

+ r2 ∗ P2(X) ∗ 11− C2

where C1 and C2 are the two truncation effects of the two distributions respectively.

2.3 Test Data Sets Summary and Expected Simulation Result

With the same idea, we successfully generated 6 unimodal PDF curves and 4 bimodal curves, which is listedin the table below:

CDF Parameters Mean Pct 10 Pct25 Pct50 Pct75 Pct906 Unimodal Curves

G (α = 12, β = 2) 24 15.66 19.04 23.33 28.24 33.20N (µ = 35, σ = 9) 35.01 23.49 28.94 35 41.07 46.54N (µ = 58, σ = 15) 58.01 38.79 47.89 58.00 68.12 77.23N (µ = 6.75, σ = 1.5) 7.15 5.64 6.20 7.02 7.94 8.80N (µ = 3.5, σ = 3.5) 7.42 5.47 5.98 6.99 8.43 9.99

LogN (µ = 3, σ = 1) 35.93 7.98 12.41 22.40 42.29 76.154 Bimodal Curves

0.87N1+0.13N2 (µ1 = 58, σ1 = 15), (µ2 = 58, σ2 = 15) 46.81 30.00 36.39 43.89 52.90 74.650.9N1+0.1N2 (µ1 = 58, σ1 = 15), (µ2 = 58, σ2 = 15) 69.00 57.68 61.46 65.84 70.81 90.00

0.85N1+0.15N2 (µ1 = 35, σ1 = 10), (µ2 = 75, σ2 = 5) 40.29 23.19 29.61 37.24 46.88 67.890.75N+0.25G (µ = 16, σ = 9), (α = 30, β = 2) 27.58 9.29 13.71 20.44 33.57 61.17

where, N, G and LogN represents Normal, Gamma and Log Normal Distribution respectively. Row 1-9corresponding to data set 1-9 that we will use to test the algorithm in the following sections.

Our major goal is to recover the parameters and statistics in the table by using the limited information fromthe percentage histogram. We will also compare the discrete mean computed from the histogram bar-chartwith the continuous mean computed from our recovered density curves.

3

Page 8: Industrial Mathematical and Statistical Modeling Workshop for

3 Approach I : Polynomial Approximation

One of the way to approach the problem is to use polynomial interpolation [2]. Here we have freedom to choosepiecewise polynomial models of degree 0 to 12 since we have 13 conditions to satisfy. We want to preserve 13areas under the interpolating curve in each 13 intervals given to us. It would be nice to use standard finitedifferences technics or splines (natural or B-splines) but it is not sufficient for our problem of interest. We arenot given collocation points we are given areas in each interval. One can construct a cumulative distributionfunction (CDF) based on the histogram and obtain collocation points for standard interpolation procedures.By definition, a CDF approximation curve must be monotonically increasing otherwise we violate positivityof the probability density function (PDF) which is a derivative function of the CDF. This is the reason whystandard finite differences and high order spline methods can not be used for solving the problem of interest.We observe that simple piecewise linear continuous connection at the collocation points of the CDF gives usa continuous approximation for CDF and discontinuous approximation for the PDF, i.e. we are back to theinitial stage of histogram. If we smooth out the CDF approximation with quadratic polynomials (splines orfinite differences) we obtain a smooth CDF approximation and a continuous PDF approximation. One stepfurther in this way (natural cubic splines) gives us twice continuously differentiable CDF approximation andcontinuously differentiable PDF approximation. But there is no way to get a curve positive everywhere usingthis standard methods. And there is another issue which has to be taken into account – the greater the degreeof an interpolating polynomial the worse it looks because of fluctuations of high degree polynomials. Onemore aspect that we focus on is to keep the algorithm simple and robust. We have to keep in mind thatthis interpolation modeling is needed for statistical data containing thousands of data sets which have to beanalyzed using a computer code. So we have to minimize the computational time and keep the algorithm assimple as possible.

3.1 Method 1: Piecewise linear approximation

We are given the partition of the wage interval [0,∞) as {x1, x2, . . . , x13} and areas of corresponding bars ineach interval of the partition {A1, A2, . . . , A13}. We use a piecewise linear approximation to fit a histogram. Inother words we are generating a PDF approximation preserving areas of the histogram. To insure uniquenessof the approximation we have to make some additional assumptions about the statistical data. We assumethat discontinuity of PDF at x2 is allowed; the area of the first bar is zero; the PDF is zero, linear or convexup and decreasing to zero on the interval [x13,∞) (the tail). We choose the tail for our approximation modelto be linearly or exponentially decaying. We pick these two forms of the tail because the exponential tailgives the upper bound on the mean and the linear tail gives the lower bound on the mean. We have 12 areasA2, A3, . . . , A13 to preserve with 12 pieces in the linear form. (The first bar is always zero, i.e. A1 = 0.)

y = f2(x) = a2x + b2 for x ∈ [xb, x3], where xb ∈ [x3, x3],y = f3(x) = a3x + b3 for x ∈ [x3, x4],...

......

y = f12(x) = a12x + b12 for x ∈ [x12, x13],y = f13(x).

The tail y = f13(x), x ∈ [x13,+∞) is chosen to be in linear or exponential form.For the linearly decaying tail, we have

f13(x) = a13x + b13 for x ∈ [x13, xe] ⊂ [x13,+∞), (1)

where xe = f−113 (0). We have 12 sets of a and b parameters plus xb and xe are to be found. Area matching

gives 12 conditions to satisfy, which are∫ x3

xb

(a2x + b2) dx = A2,

4

Page 9: Industrial Mathematical and Statistical Modeling Workshop for

∫ xi+1

xi

(aix + bi) dx = Ai for i=3,4,. . . , 12,∫ xe

x13

(a13x + b13) dx = A13.

In addition we have xb = min[x2, f−12 (0)] and the continuity conditions at each intermediate point (except

x2)

yi(xi+1) = yi+1(xi+1), for i = 3, . . . , 13.

So we have 25 unknowns and 25 conditions.For the exponential decaying tail, we have

f13(x) = a13 exp(−b13x), x ∈ [x13,+∞)

We have 12 sets (a,b) parameters to find and xb. Equations: 12 area modeling equations and 11 continuityconditions and xb = min[x2, f

−12 (0)]. Totally, 25 unknowns and 24 equations. To add one more missing

condition to have a unique solution we make the additional assumption that

f2(x3) =(

A2

∆x2+

A3

∆x3

), (2)

i.e. the curve passes through the middle point of neighboring bars 2 and 3.Now to take care of “bad” intervals, where a line can go negative, we split it into two piecewise linear

subintervals xi < x < xi+1. Then we set (xi+1, yi+1) = (xi+1, 0) and (x, y) = (Ai

yi+ xi,

Ai

∆xi). This conserves

the area of each bar and gives a unique solution.

Illustration of a piecewise linear approach.

3.2 Method 2: Piecewise quadratic approximation

Since we agreed to minimize fluctuations in PDF approximation, using quadratic polynomials we do thefollowing. We force the curve to cross middle parts. We accept this interpolation to refine the shape of thehistogram even better than the piecewise linear approach. We skip the first interval where A1 = 0. We make

5

Page 10: Industrial Mathematical and Statistical Modeling Workshop for

the second interval to be linear. All other intervals have some quadratic form and again as in Method 1 wemake the last interval (the tail) to be (a) linearly decreasing and (b) exponentially decreasing.

y = b2x + c2, x ∈ [xb, x3] ⊂ [x2, x3],y = a3x

2 + b3x + c3, x ∈ [x3, x4],...

......

y = a12x2 + b12x + c12,

y = f(x).

where f(x) = b13x + c13 for x ∈ [x13, xe] or f(x) = a13 exp(−b13x) for x ∈ [x13,∞). We have 10 sets ofparameters (ai, bi, ci) for parabolas and b2, c2, b13, c13, x

∗, xe making it a total of 36 unknowns. To uniquelydetermine these parameters we set the following conditions:

A2 =∫ x3

xb

(b2x + c2) dx,

Ai =∫ xi+1

xi

(aix2 + bix + ci) dx for i = 3, 4, . . . , 12,

A13 =∫ xe

x13

(b13x + c13) dx.

Matching values at the ends of each intervals we have

fi(xi+1) = fi+1(xi+1) =

(Ai+1

xi+2−xi+1+ Ai

xi+1−xi

)2

for i = 2, 3, . . . , 13. (3)

In total we get 36 equations to uniquely find the 36 unknowns. Now again we have to take care of ”bad”intervals where a parabola can have a negative value. We decrease a degree of a polynomial by one in suchintervals. This procedure will be enough to preserve the area and keeps the number of unknowns the same asthe number of equations to get a unique solution.

We note that there could be different ways to get over such situations. Number of improvements to suchsubjected algorithms can be applied further. Making different assumptions about the unknown PDF, differentresults can be obtained. For example we can use the following values of the approximating curves at the endsof intervals, (a) geometric average of heights of two neighboring bars, i.e. fi(xi+1) =

√AiAi+1

(xi+2−xi+1)(xi+1−xi),

(b) weighted average of heights of four neighboring bars.

6

Page 11: Industrial Mathematical and Statistical Modeling Workshop for

Illustration of a piecewise quadratic approach.

4 Approach II: Weighted Normal Approximation

In this method we try to fit two weighted normal distributions, f = w1f1 + w2f2, where w1 and w2 are twoweights and f1 and f2 are two normal density functions.

4.1 Attempt I

Fit distribution curve by minimizing the following function:

12∑k=1

∣∣∣∣∫Ik

f(x)a

dx− pk

∣∣∣∣ (4)

where Ik represents the kth interval, pk is the area of the kth interval and a is the truncation effect defined as

a =∫ 118.85

5.15

f(x)dx. (5)

Here (4) is a function of w1, µ1, σ1, w2, µ2, σ2, where, µi and σi are the mean and the standard deviation ofnormal distribution (i = 1, 2). Issues with directional derivative being “Zero”.(i.e. too small).

4.2 Attempt II

Transform each of the 12 interval linearly to unit intervals.

i.e. [5.15, 481) → [0, 12]

Specifically, if Ik is the kth interval, then Ik is mapped linearly to [k-1,k]. For instance let Ik be the interval[xL, xU ], then the mapping function is

x → 1xU − xL

x + k − xU

xU − xL, xL ≤ x ≤ xU . (6)

This results in 12 intervals of equal width. Then try to fit distributions as above. The problem with thismethod is that the pdf transformed back onto [5.15, 481) is usually discontinuous at interval boundaries.

7

Page 12: Industrial Mathematical and Statistical Modeling Workshop for

4.3 Attempt III

Fit multiple distributions (Normal distribution) to the scaled histogram on [0, 12] with fixed variance. Thatis: 6 total distributions

f1 ∼ N(µ1, 1)f2 ∼ N(µ2, 1)f3 ∼ N(µ3, 1/2)f4 ∼ N(µ4, 1/2)f5 ∼ N(µ5, 2)f6 ∼ N(µ6, 1/4)

f =6∑

i=1

wifi with∑6

i=1 wi = 1 and each wi ≥ 0

Here we don’t have any condition given on µ1, . . . , µ6 and N(µ1, 1) is the normal distribution with mean µ1

and variance 1.Minimization Process by Using Matlab

Matlab Program—- Optimization of error function previously given

Choose initial vector: −→x = (µ1, . . . , µ6, w1, . . . , w6) (Algorithm is very sensitive to initial vector). Whereinitial µi values chosen to be center of 2 initials with largest probability initially.

Optimize objective function with “fmincon”function in the optimization toolbox of matlab.

Objective function: sum of squared error in actual and estimated interval probabilities, which is definedas following

g(µ1, µ2, . . . , µ6, w1, w2, . . . , w6) :=12∑

k=1

[(6∑

i=1

wi (Φ(k, µi, σi)− Φ(k − 1, µi, σi))

)− pk

]2

(7)

where,

σ1 = σ2 = 1σ3 = σ4 = 1/2

σ5 = 2σ6 = 1/4

Output coefficient vector: test absolute difference between histogram of model and actual differences. Ifabsolute difference is greater than the tolerance = 0.1 then run optimization algorithm with new initial vectortake better of the two estimates in term of objective function g(−→x ).

In order to mean must integrate our initial distributions∫ 481

5.15

xπ(f(x))dx

where, π represents the piecewise linear change of variable function initially used.

5 Result and Discussion

5.1 Means

The comparisons of each approach’s mean to the ”true” mean can be visualized by the following graphs.Using dataset1, the linear polynomial approximation with a linear tail matched closest by absolute error to

8

Page 13: Industrial Mathematical and Statistical Modeling Workshop for

the ”true” mean. In dataset2, dataset3, dataset5 and dataset9, the linear polynomial approximation withan exponential tail was the best method. Dataset7 and dataset8’s ”true” means were best described by thequadratic polynomial estimation with a linear tail. Dataset4 and dataset6 both had a ”true” mean that wascalculated closest from the linear combination/optimization method.

opt: Optimization Method; quad-lin: Quadratic polynomial estimation with linear tail; quad-exp:Quadratic polynomial estimation with exponential tail; lin-lin: Linear estimation with linear tail; lin-expLinear estimation with exponential tail; true: True mean.

9

Page 14: Industrial Mathematical and Statistical Modeling Workshop for

Figure 1: Comparison of mean for nine data sets.

10

Page 15: Industrial Mathematical and Statistical Modeling Workshop for

5.2 Percentile

For each data set’s ”true” percentile (10, 25, median, 75, and 90) value are compared against the three generalapproaches; optimization/linear combination, quadratic, and linear polynomial approximation. Dataset1 anddataset8’s percentiles match closest with the linear combination method. In dataset2 and dataset3 the methodapproximating furthest from the ”true” values is the quadratic polynomial approximation. Dataset4 doesnot seem to have a best estimation by any of the methods. For dataset5, either one of the polynomialapproximations appear to work best in estimating the distribution of the data set. All of the methods arecomparable based on dataset6, similarly for dataset7.

11

Page 16: Industrial Mathematical and Statistical Modeling Workshop for

Figure 2: Comparison of percentiles for 9 data sets.

12

Page 17: Industrial Mathematical and Statistical Modeling Workshop for

6 Conclusion

In summary we have five fitting methods: four variances of the piecewise approximation plus the normalapproximation. Based on the comparison result, the piecewise linear approximation with linear decaying tailgets the highest score and in most cases gives the best inference of means and percentiles among the five.Future works could be as follows: (1) The piecewise approximation method might be improved by usinghigher degree polynomials [4] such as cubic curves to get more precisely fitting. More than that it is alsopossible to trying piecewise exponential or power curves. However these approaches may lead to multiple oreven infinite solutions. For the polynomial approximation a very promising approach is to use monotone cubicHermite splines [1] for CDF with optimized smoothness. For the choice of connection point different valuescan be chosen, for example geometric or weighted averages of heights of neighboring bars. (2) The normalapproximation method might be improved by incorporating variances as optimization parameters. And theresult might be better if we find a way to choose the initial parameters smartly. (3) The use of additionalinformation about the real data will be helpful. More tests should be run.

References

[1] Carlson, R. E. & Fritsch, F. N.Monotone piecewise cubic interpolation, SIAM J. Numer. Anal., Volume17, 1980, pages 238–246.

[2] Michelle Schatzman, Numerical analysis–A mathematical introduction, Oxford, 2002.

[3] John A. Rice, Mathematical Statistics and Data Analysis, 3rd Edition.

[4] M.Berzins. Adaptive polynomial interpolation on the evenly spaced meshes, Accepted for publication.

13

Page 18: Industrial Mathematical and Statistical Modeling Workshop for

Appendix A: Relative R code

#### For the Unimodal Gamma distribution alpha=12,beta=2 ###

n=100000; ## number of data generated

alpha=12; beta=2;

X=rgamma(n, alpha, beta);

length(X);

mean(X); median(X);

quantile(X, probs=c(10,25,50,75, 90)/100)

########### 12 Areas Computation ##########

A = length(X[5.15<X & X <=7.5])/length(X)

B = length(X[7.5 <X & X <=9.5])/length(X)

C = length(X[9.5 <X & X <= 12])/length(X)

D = length(X[12 <X & X<=15.25])/length(X)

E = length(X[15.25<X & X<=19.25])/length(X)

F = length(X[19.25<X & X<=24.50])/length(X)

G = length(X[24.50<X & X<=31.00])/length(X)

H = length(X[31.00<X & X<=39.25])/length(X)

I = length(X[39.25<X & X<=49.75])/length(X)

J = length(X[49.75<X & X<=63.25])/length(X)

K = length(X[63.25 <X& X<=80]) /length(X)

L = length(X[80 <X & X])/length(X)

#### For the Bimodal Mixture PDF of two Gamma distributions ###

### Gamma1(alpha=20,beta=2) and Gamma2(alpha=6,beta=2) ###

n=100000; ## number of data generated

shape=30; rate=1; beta=2;

gammar1=rgamma(n, shape, rate = 1, scale = beta);

gammar2=rgamma(n, 6, rate = 1, scale = beta);

c=1-pgamma(5.15,shape = 6,rate = 1,scale=2);#truncation factor#

f1=3/4

f2=1/4

trucgamma2=gammar2[gammar2>5.15]

samgamma2=sample(trucgamma2,f1*n)

samgamma=sample(gammar, f2*(1/c)*n)

X=c(samgamma2,samgamma)

mean(X);

median(X);

quantile(X, probs=c(10,25,50,75, 90)/100)

14

Page 19: Industrial Mathematical and Statistical Modeling Workshop for

PROBLEM 2: Private Equity Integrated Risk Management and Portfolio Optimization 1

Megan Buzby2, Field Cady3, Alaina Houmard4, Shuang Ji5, Hoi Tin Kong6, Don Kumudu Mallawaarachchi7

Problem Presenter:Charles Atkins

Cherokee Investment Partners

Faculty Mentor:Jeff Scroggs

Department of Mathematics, North Carolina State University

Abstract

In corporate finance, the Discounted Cash Flow (DCF) model is used as a basic framework for businessdecision making; however DCF is incapable of capturing the value of options embedded in many situations.We propose to value real options, and incorporate this valuation process into an Asset Liability Model. Weuse data provided by Cherokee Investment Partners and value options using both Balck-Scholes and DecisionTrees.

1 Introduction

Cherokee Investment Partners, LLC (CIP) is a redevelopment firm. CIP invests in abandoned and contam-inated industrial properties called brownfields. The purchased properties undergo remediation so that theycan be redeveloped into residential and/or commercial property. CIP is regularly presented with candidateprojects. Use of risk assessment and management assists CIP in choosing which properties to pursue.

Our challenge is to incorporate real options with Asset Liability Management to form a unified quantitativemodel for project risk assessment and financial management. In this paper we will provide some preliminarydefinitions and explanations in Section ??, then in Section ?? we will discuss some of the existing modelsused such as the Decision Tree Model and the Black- Scholes Model. In Section ?? we will discuss the Black-Scholes Model with delay option and then Classical Asset Liability Management in Section ??. Our new AssetLiability Management Model will be explained in Section ?? followed by the results of the model in Section?? and conclusion in Section ??.

2 Preliminary Information

We begin by explaining the basics of options in terms of financial options and real options. In the financialmarket an option provides the holder a right but not obligation to buy or sell an underlying asset at a fixedprice, namely the strike price K, at or before the expiration period T of the option. There are two typesof options – a call option and put option. A call(put) option allow the buyer of the option to buy(sell) theunderlying asset at the strike price. If the underlying asset value at the expiration ST is higher(lower) thanthe strike price, the option is usually exercised. The buyer of the option buys(sells) the asset at the strikeprice. The difference between the asset value and the strike price ST −K is the profit. On the other hand,if the underlying asset value at the expiration ST is lower(higher) than the strike price K, then the option isnot exercised and the buyer of the option loses the amount he/she paid for the option.

1Research supported by NSF, NCSU, and Cherokee Investment Partners2Colorado State University3University of Washington4University of North Carolina Wilmington5University of Texas Arlington6University of Georgia7Texas Tech University

1

Page 20: Industrial Mathematical and Statistical Modeling Workshop for

The term ”Real Option” is introduced by Prof. Steward Myers at MIT Sloan School of Management. Inany kind of business, a company needs to deal with a sequence of decision making process such as building amanufacturing plant, extending product line, investing in research and development, etc. These are called RealOptions because they involve a physical asset, not just financial instruments. In each strategic investment, acompany has the right but not obligation to exploit these opportunities. Taking real options into account cansignificantly affect the evaluation of the potential investment.

3 Applying Existing Models to the Texas Project

Our group was provided with a summary of scenarios of an example referred to as the ‘Texas project’ [?].There are five different scenarios;

Scenario 1: Base Case,

Scenario 2: 25% increase in costs,

Scenario 3: 25% increase in costs, 25% decrease in absorption rate,

Scenario 4: 25% increase in costs, 25% decrease in revenue, and

Scenario 5: 25% increase in costs, 25% decrease in revenue and absorption rate.

The cost consists of the sum of the purchase price, remediation costs and infrastructure costs. The AbsorptionRate measures the rate at which units are sold over a period of time. Revenue is the amount received in sales.For each scenario the value of asset in terms of millions of dollars and the CIP multiples are given in Table??.

Scenario1 2 3 4 5

Value of the Asset $22.6 $24.2 $27.9 $26.2 $33.9CIP Multiple 2.79× 2.13× 1.98× 1.39× 1.26×

Table 1: Summary of Scenarios

Our task is to decide if the value of the call option is greater than $500,000. The CIP multiple is a multipleof the funds invested that CIP expects to get back. From the data provided we were able to explore variousmodels. For the models we will set our price of the call option C to be $500,000 and the strike price K to be $25,000,000. The decision to buy the option will depend on the value of the call option calculated from variousmodels. If the call option obtained through calculation is greater than $500,000 then we will choose to buythe option, otherwise we will not. Now we explore, what models to apply and decide if we would purchasethe option to buy the property. Note that we will be calculating the price of the European call option, C.European options can only be exercised at expiration, while the American option can be exercised before orat expiration.

3.1 Decision Tree Pricing Model

In case of several scenarios, the Decision Tree Model (see Fig. ??) can be used to help guide decisions thatconsider the expected multiples, the asset value and the price of the option. In general, we suppose there aren scenarios s1, s2, . . . ,sn, each of which is associated with a probability pi and a multiple mi. We computethe expected value of the option using Eqn. (??).

expected value =n∑

i=1

pi max(miS −K, 0). (1)

Now let’s consider the Texas project where there are five scenarios. The Strike price is set to K =$25, 000, 000. The multiples are the CIP multiples given in Table ??. A probability pi is chosen for scenario i.

2

Page 21: Industrial Mathematical and Statistical Modeling Workshop for

Figure 1: Decision Tree Model

Figure 2: Texas project Decision Tree Model

We let p1 = 0.4, p2 = 0.3, and p3 = p4 = p5 = 0.1. The Texas project can therefore be depicted in the followingFig. ??. After calculating Eqn. (??) for the Texas project we obtain an expected value of $30,450,000 whichis significantly greater than the call price of $500,000. Therefore we would decide to purchase the option.

3.2 Black-Scholes Model

We used the Decision Tree Model to value the option and now we will look at the Black-Scholes Model. TheBlack-Scholes Model values a European call with the formula in Eqn. (??).

C = SN(d1)−Ke−rtN(d2) (2)

d1 =ln(S/K) + (r + σ2/2)t

σ√

t

d2 = d1 − σ√

t

where C denotes the value of the call option, K denotes the strike price of the option, S denotes the value ofthe underlying asset at time 0, t denotes the time to expiration of the option, r denotes the riskless interestrate corresponding to the life of the option and σ2 denotes the variance of the value of underlying assets. The

3

Page 22: Industrial Mathematical and Statistical Modeling Workshop for

riskless interest rate is from a different kind of investment, for example a bond, and it is the interest rate atwhich an investor can earn interest without incurring any risk.

We apply Eqn. (??) to price the option of the five scenarios of the Texas project. The parameters aredefined to fit the idea of pricing the option for the scenarios. The values of each asset, Si, i = 1, · · · , n aregiven in Table ??. The time allotted to decide on the option, t, and the riskless rate, r, are the same for everyscenario. We set t at 2 time periods and set r at 4% to find the value of the call option after t = 2 years. Thevariance in the value of the asset is σ2, and we calculate σ to be 0.1555 from the variance of the values of theasset in Table ??. We obtain the following values of the option in terms of millions of dollars:

Scenario1 2 3 4 5

C $0.91 $2.26 $5.41 $3.96 $10.61

Table 2: Black-Scholes Model - Value of the Option

The average of the options from the five scenarios is $4,600,000. This value is also significantly greater thanthe call price of $500,000 so again we would choose to purchase the option.

4 Black-Scholes Model with Delay Option

Our group was also provided with an Excel workbook of the Texas project [?] which details the expectedcash flow. We will use this information to calculate the value for delaying the remediation option 1 or 2years. The asset was purchased in June 2007 by CIP and consists of 350 acres of mixed use development.This included single family homes, business/office and commercial/retail acreage, and mixed use acreage forretail/apartment/condo development. We assumed the purchase price of the land was $25,000,000 and theremediation and infrastructure costs are expected to be $12,500,000 each. From here on, we will consider’remediation’ to include remediation and infrastructure. In addition, we assumed the remediation to takeplace on a schedule of two years (8 quarters) with equivalent costs each quarter ($3,125,000).

Given a table of revenue from the sale of the land after remediation, we consider three cases:

1. No change: The purchase and remediation take place as scheduled and is finished in the 8th quarterafter purchase.

2. Four quarter delay: Purchase as scheduled, but delay remediation for 4 quarters. Note that the delay ofremediation would also delay any revenues an equivalent 4 quarters.

3. Eight quarter delay: Purchase as scheduled, but delay remediation for 8 quarters.

Using the data provided in the Excel workbook, we calculated the call price of purchasing the land (in the firstcase) and of delaying for 4 and 8 quarters (in the 2nd and 3rd cases, respectively). Since we will be applyingthe idea of dividends we will consider the cost of delay y as a dividend yield. The Black-Scholes formula givenby Eqn. (??) is modified to incorporate dividends and the new Black-Scholes Model [?] used is Eqn. (??).

C = Se−ytN(d1)−Ke−rtN(d2) (3)

d1 =ln(S/K) + (r − y + σ2/2)t

σ√

t

d2 = d1 − σ√

t

For each case, the strike price K is still $25,000,000, the riskless rate r is set at 5%, and the variation σ2 isset at 0.045. The present value PV and length of time in quarters t differ for each case. As in [?, page 35],the cost of delay y is estimated to be 1/t, again where t is the length of the total time period.

4

Page 23: Industrial Mathematical and Statistical Modeling Workshop for

We calculate the present value by using the formula:

PV =n∑

t=1

Ct

(1 + i)t(4)

where Ct is your net cash flow at time t as seen in Figure ?? for Case 1.

Figure 3: Cash Flow with no delay

The discount rate, i, corresponds to the discount on the future cash flow and is set at 5%. The time andcalculated present values from Eqn. (??) in terms of millions of dollars are given in Table ??.

Case1 2 3

PV $23.8 $17.3 $10.0t 20 24 28

Table 3: Time and Present Value

We obtained the following values of the option using Eqn. (??).

Case1 2 3

C $39.0 $24.9 $22.9

Table 4: Black-Scholes Model with Delay Option - Value of the Option

The value of the option for each case is also significantly greater than the call price of $500,000 so we wouldchoose to purchase the option.

5 Classical Asset Liability Management

Asset Liability Management (ALM) is any of several systems for turning money management problems intomath problems. Your first instinct may be that this is ridiculous; you should put money in what appear to be

5

Page 24: Industrial Mathematical and Statistical Modeling Workshop for

good investments and only use math to make sure the books balance. But ALM, where the macrostructureof a financial portfolio is determined with recommendation from a computer program, is the norm amongpension funds and university endowments, and is a growing trend even among individual investors. ALM isspecially designed for long-term investments with multiple assets and liabilities, so that the effects of individualinvestments are minimized, and the laws of large numbers make the mathematical model increasingly accurate.

As a simple illustrative example, let us assume that there are n assets a1, . . ., an in which we can invest,and N scenarios for their future performance with probabilities p1, . . ., pN . Let us assume that any tradesyou make are executed at the end of an economic quarter. Let Xsat be the fraction of your portfolio in asseta, at time t, if scenario s happens. Let us also say that you want to maximize your expected wealth after Qquarters. That is, you want to maximize

E [W ] =∑

s

psW (s, x), (5)

where x is a vector of all the Xsat and W (s, x) is the final wealth if you invest according to x and scenario shappens. Two scenarios may look the same up until time t, and the decisions based on identicle data shouldbe the same. The result is the following linear programming problem.

maxX

∑s

psW (s, x) (6)

subject to

1. if scenario i and scenario j are identical until time t, Xiat = Xjat

2. for all s and t,∑

s Xsat = 1

If your scenarios and their probabilities are realistic (a big if, but lots of smart people are working on thatwith good results), solving the math problem gives sound investment advice.

6 Our New ALM Model and Computational Implementation

Following classical ALM models, we assume there is a probability distribution function over the space ofeconomic scenarios. However, whereas the strategy space in the classical model is simply Rn (the componentsof a strategy being the Xsat), the possibilities with projects is more complicated (due to necessarily sequentialnature of steps, etc.). Thus, we found it more conceptually accessible (and computationally implementable)to think of optimizing decision strategies over the space of scenarios. A decision strategy is a function whichaccepts the history of the portfolio, future projections, and current projections, as its arguments and gives acourse of action for each project in the portfolio.

In the specific model that we implemented, each project starts off with a list of projected costs and a listof projected revenues for each step of the project. At each time step, for each project, the remaining costsand revenues are randomly perturbed, and a strategy is chosen. If the strategy is to continue as planned, thefirst element in the projected revenue list, divided by (1 + r)t, is added to total cash, and the discounted firstprojected cost is subtracted. These elements are then removed from the lists. If the project is accelerated, thefirst two costs/revenues are realized and removed. If the project is delayed, no action is taken. If the projectis abandoned, it is deleted and has no further impact on the portfolio value.

Five strategies were chosen to consider;

1. always continue as planned,

2. always delay,

3. always accelerate,

4. react based only on the most recent changes in cost/revenue projections, and

5. calculate the total remaining discounted revenues and costs.

6

Page 25: Industrial Mathematical and Statistical Modeling Workshop for

For each scenario, at each time stage, we make a decision for all the projects, respectively, by the five differentstrategies. Decisions (accelerate, continue, delay, abandon) are made and the portforlio is updated accordingly.Since changes in cost, price, and absorption rate vary, we generated 1000 different scenarios. After we have runthrough all the years, we look at the last updated portforlio averaged over the 1000 scenarios. This simulationalso considered multiple projects at the same time.

Our trials are a proof-of-concept for this kind of simulation, and an expected verification of the efficacy ofthe NPV in making financial decisions. However, we emphasize the many shortcomings of this simulation.First, the scenarios are only marginally realistic. Second, the parameters for the strategies were based on”reasonable-looking” values. And third, the model ignores the many complications of real business dealings(taxes, contract clauses particular to a certain project, etc.) A more robust version of the code was used whichattempted to correct for many of these shortcomings.

7 Results

Strategy1 2 3 4 5

Portfolio Value 18.9693 10.0000 19.1747 16.6808 20.9425

Table 5: New Asset Liability Management Model - Portfolio Value

After running our simulations it was consistently found that strategy 5 performed the best, followed inorder by strategies 3, 1, 4, and 2. The ending portfolio values for each of the strategies are shown in Table??. It was not surprising that the optimal strategy was scenario 5 since it was based on projected cost andrevenue. Although the scenarios we generated are somewhat unrealistic, the decision making process providesa good framework to incorporate real options with the ALM model.

8 Conclusion

This project represents both a learning process on our part and a venture to and, we believe, beyond, thefrontiers of modern quantitative finance. Of the students, only one approached the project with any backgroundin financial math, but we all left with a strong foundation in both quantitative financial models and their placein the world of business. Asset liability management, in its various incarnations, is the overarching frameworkof modern finance, taking math from a peripheral tool of economic planning to a unifying component. Andoptions, both real and tradable, are among the most important elements of modern finance.

Additionally, to our knowledge there has been no work in the past to fit real options into an ALM framework;ALM models have focused on fluid, continuous assets which can be restructured at will. But real options areinherently discrete, and they have complications not seen in other assets, so fitting them into a larger financialpicture is challenging. Our conceptual framework optimizes not over continuous variables so much as strategies,which have both discrete and continuous parts. Our computational work is a proof of concept that this kindof optimization is both possible and promising.

Given more time, we would like to expand our work in several ways. In particular, our computationalmodel should be made more flexible and more realistic. It would be interesting to compare its performancewith traditional ALM portfolios, and even to explore portfolios containing both projects and liquid assets.These are the first steps into a new area, and its possibilities remain to be seen.

7

Page 26: Industrial Mathematical and Statistical Modeling Workshop for

References

[1] Berger, Adam J. and Mulvey, John M. The Home Account Advisor, Asset and Liability Managementfor Individual Investors, Worldwide Asset and Liability Management, edited by W.T. Ziemba and J.M.Mulvey, Cambridge University Press, Cambridge, 1998.

[2] Carino, David R., Kent, T., Myers, D.H., Stacy, C., Sylvanus, M., Turner, A.L., Watanabe, K., andZiemba, W.T. The Russell-Yasuda Kasai Model: An Asset/Liability Model for Japanese Insurance Com-pany Using Multistage Stochastic Programming, Worldwide Asset and Liability Management, edited byW.T. Ziemba and J.M. Mulvey, Cambridge University Press, Cambridge, 1998.

[3] Damodaran, Aswath The Promise and Peril of Real Options, Stern School of Business, New York Univer-sity, New York, Unpublished Manuscript.

[4] Day, Alastair L. Mastering Risk Modelling, Pearson Education Limited, Great Britain, 2003.

[5] Ho, Thomas S. Y. and Lee, Sang Bin The Oxford Guide to Financial Modeling: Applications for CapitalMarkets, Corporate Finance, Risk Management, and Financial Institutions, Oxford University Press, Inc.,New York, 2004.

[6] IMSM Example Cherokee Model, Microsoft Excel Workbook for Texas Project, Cherokee Investment Part-ners, June 2007.

[7] L. Lamport, LATEX: A Document Preparation System, Addison-Wesley Publishing Company, New York,1994.

[8] Ziemba, William T. and Mulvey, John M. Asset Liability Managment Systems for Long-Term Investors:Discussion of the Issues, Worldwide Asset and Liability Management, Cambridge University Press, Cam-bridge, 1998.

8

Page 27: Industrial Mathematical and Statistical Modeling Workshop for

This manuscript appears in the proceedings of the 2007 Industrial Mathematical and Statistical ModelingWorkshop for Graduate Students held at the Center for Research in Scientific Computation at North CarolinaState University, July 23-31, 2007.

www.ncsu.edu/crsc/imsm

Jeffrey S. ScroggsFinancial MathematicsNorth Carolina State UniversityBox 7640Raleigh, NC [email protected]

Megan BuzbyColorado State [email protected]

Field CadyUniversity of [email protected]

Alaina HoumardUniversity of North Carolina WilmingtonWilmington, NC [email protected]

Shuang JiUniversity of Texas [email protected]

Hoi Tin KongUniversity of [email protected]

Don Kumudu MallawaarachchiTexas Tech [email protected]

Charles AtkinsCherokee Investment Partners

9

Page 28: Industrial Mathematical and Statistical Modeling Workshop for

PROBLEM 3: Analyzing Papilloma Data in Mice

Kuan-yu Chen1, Kristen Dang2, Julia Eaton3, Sandra Hurtado Rua4, Yogesh Joshi5, Xiaoyu Mu6

Problem Presenter:

Marjolein Smith, Constella Group, Research Triangle Park, North Carolina.

Abstract

Mathematical modelling of physiological systems is playing an increasingly important role in toxicologystudies. Typically, these studies require the use of many experimental animals and are thus costly andtime-consuming. Models can help with both problems by allowing an investigator to conduct hypothesistesting with a model that is based on parameters measured from biological systems. Specifically, for thisproject, the Constella Group wished to design a model that would predict incidence of papillomas in micebased on the dermally- or orally-applied dose of 2,3,7,8-Tetrachlorodibenzo-p-Dioxin (TCDD). We developeda differential equation model to describe the concentration of TCDD in various body tissues over time. Theconcentration predictions were then used in a stochastic model of papilloma formation, where the expectednumber of papillomas is assumed to be a Poisson random variable. This variable is a linear function of theskin concentration of TCDD. We show that, using our model predictions for skin concentration of TCDD,there is a consistent relationship between drug dose and papilloma incidence, regardless of the route of drugadministration.

1 Introduction

Many toxicology studies use mice as models for the effects of chemicals on mammalian systems. The typ-ical experiment requires the sacrifice of many mice in order to have a complete experimental design thatincludes both genders, 3-4 chemical dose levels, and 50 replicates over 2 years. Industry trends toward usingfewer animals and decreasing the costs of these experiments are leading to increased reliance on mechanisticmath models that can be used to help design mouse experiments and test physiological hypotheses. Thesemodels, typically manifest as differential equation models, are known in industry as Physiologically-BasedPharmacoKinetic (PBPK) models.

The Constella Group uses PBPK models among other techniques in their work on health-related re-search. As part of the National Institute of Environmental Health Sciences’ National Toxicology Program,they conducted a mouse model study of the relationship of dose levels to papilloma formation caused by2,3,7,8-Tetrachlorodibenzo-p-Dioxin (TCDD), a ubiquitous environmental contaminant[7]. This study usedthe transgenic Tg.AC mouse, which has been genetically altered such that all cells are in an initiated, pre-cancerous state. TCDD was administered dermally by skin painting and orally by gavage in separate groupsof mice in order to compare the effect of route of exposure. In total, there were 13 different treatment groups:9 groups of 20 mice had part of their skin painted at different doses and 4 groups were given oral doses. Theexperiment lasted 26 weeks.

Data output from the experiment includes the mean body weights and concentration of TCDD in the skin,liver, and fat for each treatment group. These measurements were taken at the end of the study. We also havethe weekly papilloma counts for the individual animals in all groups.

Here we develop a PBPK differential equation model to describe the concentration of TCDD in variousbody tissues throughout the course of the experiment. We use published parameter values where possible;others are estimated to fit the observed concentrations of TCDD in the skin, fat, and liver tissue at the end of

1National Chiao Tung University, Taiwan2University of North Carolina at Chapel Hill3University of Washington4University of Connecticut5New Jersey Institute of Technology6University of Tennessee

1

Page 29: Industrial Mathematical and Statistical Modeling Workshop for

the experiment. Next, we model papilloma growth using an stochastic model. Here, we assume that the totalnumber of papillomas on treatment j, j = 1, . . . , 13, follow a Poisson distribution with mean λi(t), where trepresents the time in weeks t = 0, 1, · · · , 26. The log linear generalized linear model was used to model themean of the total number of papillomas per treatment versus the dioxin concentration in the skin at time t.

Finally, we were interested in ascertaining how much of the TCDD was retained in these tissues relative tothe total amount administered on average. Due to limitations in the data, the computation that ensued gaveus a rough idea, at best, as to the retention of the TCDD.

2 Methods

2.1 An example toy model

A simple PBPK model is based on the following system:

1C

2C12

r!!"

21r

#!!

Figure 1: Simple model of two tissues.

In this model, a change in amount of substance in tissue C1 is calculated by the difference between theamount coming in and the amount going out. We can derive simple differential equations to calculate theamount in tissue C1 and C2 that account for the difference in inputs and outputs:

dC1dt = −r12C1(t) + r21C2(t)

dC2dt = −r12C1(t) + r21C2(t)

In this model, r12, r21 represent the rates of movement between the boxes and have units of 1/t.

2.2 The PBPK Model

2.2.1 Model description and equations

We developed a model to describe the movement of TCDD throughout the mouse’s body based on prior workon similar models [3]. We consider the concentration of TCDD in the blood, liver, fat, skin, gastrointestinal(GI) tract, and another generic compartment that represents other parts of the body. We assume that all thechanges of amount of TCDD in a given tissue compartment depend on the blood flow into or out of the partof the body, applied drug doses, and metabolism.

The dermal drug application is represented as a source term in the change of the concentration in the skin.Since the chemicals painted on the skin may not be totally absorbed, we model a skin absorption rate. Theoral drug application is represented as a source term in the gastrointestinal system, which also includes anabsorption rate for a similar reason as in the dermal test.

We separate the skin into two parts: the TCDD-painted part and the remainder. We approximate thepainted part to be about 5 percent of the whole skin. In oral drug application, both parts should have similarconcentrations of TCDD, since there is no other source term on the skin.

We model 80 percent of the blood flow to liver as coming from the capillary blood of the GI tract, and theother 20 percent from the usual blood flow. Based on the measurements from [7], we assume that TCDD isaccumulated in the liver and fat tissue. Hence we may assume the amount flowing into these tissues is largerthan that flowing out and the respective rate parameters for in and out flow will be different. Conversely, inthe GI tract, we assume the rate of outflow is larger than that inflow.

2

Page 30: Industrial Mathematical and Statistical Modeling Workshop for

Figure 1. Schematic drawing of the PBPK model used in this study

Blood

Fat

Other

Parts of

Body

Liver GI

Tract

Normal

Skin

Dose-

Painted

Skin

Oral

Dose Dermal

Dose

Figure 2: Schematic diagram of the PBPK model.

Except for the GI tract, liver, and fat, the change rate for other tissues depends on the difference ofconcentration between blood and the tissue. We assume the other parameters for the inward blood flow arethe same to those for the outward flow.

We derive the change rate from the simple physical law:

The change rate of TCDD = (Some constant) · (Total blood flow) · (Flow fraction in the part)· (Concentration of TCDD in the part) · (Blood density),

where Total blood flow = (Cardiac output) * (Body weight)0.75.

The change rate is obtained by:

The change rate of TCDD = lim∆t→0

1∆t

(TCDD amount at the time (t + ∆t)− TCDD amount at the time (t))

= lim∆t→0

1∆t

(Body weight) · (Weight fraction of the part)

·(Blood weight fraction in the part)·(Concentration at the time (t + ∆t)− Concentration at the time (t))

Hence we have

The change rate of TCDD =(Body weight) · (Weight fraction of the part)

·(Blood weight fraction in the part) · dC

dt

where C represents the concentration.Denote Wfrac as the blood weight fraction, Bfrac as the blood fraction, ρ is the blood density, and cd is

the cardiac output. Note that the total blood flow is given by the cardiac output multiplying the body weightto the power 0.75, as the literature mentions. By dividing the body weight on both sides, the whole equationis written as:

WfracdC

dt= Constant ·BfracCρ

cd

(Body weight)0.25+ (source terms)− (metabolism).

3

Page 31: Industrial Mathematical and Statistical Modeling Workshop for

Now, by the diagram of PBPK model, we may write down all of the equations as

WBdCB

dt= −(qBF + qBG + qBL + qBP + qBR + qBS)CB+

qFBCF + qGBCG + qLBCL + qPBCP + qRBCR + qSBCS

WFdCF

dt= qBF CB − qFBCF

WGdCG

dt= oral ∗ absorption rate + qBGCB − CG(qGB + qGL)

WLdCL

dt= qBLCB + qGLCG − qLBCL −Metabolism

WPdCP

dt= dermal ∗ absorption rate + qBP CB − qPBCP

WRdCR

dt= qBRCB − qRBCR

WSdCS

dt= qBSCB − qSBCS

Here, CB , CF , CG, CL, CP , CR, CS are the concentration in blood, fat tissue, GI tract system, liver, dose-painted skin, other parts of the body, and the normal skin, respectively.

The weight fraction parameters are listed in Table 1. All parameters were from an allometric conversionof values reported in published works (M. Kohn, personal communication). The weight fraction parameterWX for part X of the body refers to the data of physical facts for female mice. Note that WR is calculatedby weight average of brain, muscle, kidney, lung, bone, and other viscera. WP is about 1/20 of the whole skinsince the dose-painted part of the skin is near 5 percent.

Table 1: Weight fraction parametersparameter part of the body value

WB blood 0.054WF fat 0.000819WG GI tract 0.001266WL liver 0.017019WP dose-painted skin 0.00024795WR other parts of body 0.036726WS normal skin 0.00471105

The flow coefficients are listed in Table 2. All parameters except qBF , qBL, and qGB were taken froman allometric conversion of values reported in published works (M. Kohn, personal communication). Theremainder were estimated to fit the data. The notation qXY represents the blood flow fraction from com-partment X to compartment Y, the density of blood, cardiac output, and the body weight. We assume thatqBP , qBS , qPB , qSB are equal since all of them are skin and they would have the same properties. We alsoassume that qBF is larger than qFB and qBL is larger than qLB by considering the accumulation of TCDD infat and liver respectively. We assume that qGL is four times of qBL according to the literature mentioned above.

The blood density is approximated by 1.05 g/mL. The cardiac output is 16.5 L/hr/g0.75. We estimatethe absorption rate through the skin to be 0.75 and for absorption through the stomach to be 0.9. The rateof metabolism is following by the Michaelis-Menton model:

Metabolism rate =1.117 ∗ CL

1.306 + CL

where the number 1.117 and 1.306 are from the literature.

4

Page 32: Industrial Mathematical and Statistical Modeling Workshop for

Table 2: Flow coefficients (multiplied by bodyweight0.25)parameter relation value

qBF blood to fat 0.04244625qBG blood to GI tract 0.02442825qBL blood to liver 0.06237qBP blood to dose-painted skin 0.0100485qBR blood to other parts of body 0.09927225qBS blood to normal skin 0.0100485qGL GI tract to liver 0.24948qFB fat to blood 0.0121275qGB GI tract to blood 0.17099775qLB liver to blood 0.003465qPB dose-painted skin to blood 0.0100485qRB other parts of body to blood 0.09927225qSB normal skin to blood 0.0100485

2.2.2 Numerical method for PBPK model

We adapt Forward-Euler method for simplicity. Let C be the vector which consists of the concentration ofTCDD in the blood, fat, GI tract, liver, dose-painted skin, other parts of body and normal skin in order, i.e.

C = [CB , CF , CG, CL, CP , CR, CS ]T

Let W be the diagonal matrix with the entries on the diagonal corresponding to the weight fraction of eachpart, respectively, i.e.

W =

WB

WF

WG

WL

WP

WR

WS

Let Q be the matrix with the entries, which are related to flow coefficients and positioned by the blood flowpath, i.e.

Q =

−qBF − qBG − qBL − qBP − qBR − qBS qFB qGB qLB qPB qRB qSB

qBF −qFB

qBG −qGB − qGL

qBL qGL −qLB

qBP −qPB

qBR −qRB

qBS −qSB

and let the source and metabolism term be

DM = [0, 0, 0.9 ∗ (oral dose),1.117 ∗ CL

1.306 + CL, 0.75 ∗ (dermal dose), 0, 0]T .

Then the numerical scheme for the PBPK model can be written as

W ∗ Cn+1 − Cn

∆t= Q ∗ Cn + DM.

5

Page 33: Industrial Mathematical and Statistical Modeling Workshop for

for the Forward-Euler method. The notation Cn refers to the concentration at time (n ∗∆t). The matrices Qand the vector DM have to be modified for characteristic time ∆t. We pick up ∆t as a small fraction of timeunit which is in hours, since the cardiac output has the unit L/hr/g0.75. The initial data for concentration isset to zero.

Since the experiment was executed for a twenty-six week period, we set our final time Tmax = 26. Thetime step size ∆t = 1/16800 = 0.000059523809524 is chosen to be 1/100 hour. The doses applied per day areas stated in [7]:

For dermal application = 0, 2.1, 7.3, 15, 33, 52, 71, 152, 326ng/kg/day

For oral application = 0, 75, 321, 893ng/kg/day.

2.3 Estimation of weight with respect to time

For both the system of differential equations in section 2.2.2 and for an analysis of TCDD retention insection 2.5, it is necessary to understand how the weights of the mice varied over time.

Due to the special circumstances of this experiment, the values of the weekly average weights of the micewere included neither in the published papers nor the additional data made available to us[2]. We only havedata on the final average weights of the mice. To find an ad hoc estimate for this function, our supervisorprovided us with data on the average weekly weights from a different data set taken from female Tg.AC micewho grew to be roughly the same weight as those in this experiment over the same length of time. The differentdata set consisted of average weights of two groups (sizes not included) of female Tc.AG mice from weeks 1through 25. From[7], we can compute the average final weights of the dermal and oral groups.

We estimate the relationship between body weight w, indicator variables Ij where Ij = 0 if the jth groupis a dermal treatment group and Ij = 1 if the jth group is an oral treatment group, and time t by a linearmodel in w, using the linear least squares technique. The best model is given by

wt,j = α0 + α1 log(t) + α2Ij ,

where t = 1, . . . , 26 is measured in weeks and j = 1, 2, . . . , 13 represents the thirteen treatment groups (9dermal and 4 oral).

The fitted model is:

wt,j = 17.56070 + 2.96503log(t) + 0.56800Ij .

Below are the statistics of this fit.

ANOVA TABLEEstimate Std. Error t value Pr(> |t|)

Intercept 17.56070 0.22330 78.643 < 2e− 16 ***log(t) 2.96503 0.08614 34.420 < 2e− 16 ***

I 0.56800 0.14083 4.033 0.000201 ***

2.4 Stochastic Model for Papilloma Counts

Let Yj(t) be counts of total number of papilloma’s cells on treatment j at time t where j = 1, · · · , 11, andt = 0, 1, · · · , 26. Here, treatment refers to the variable used to classufy the counts given doses and protocol(dermal or oral).

The Poisson sampling model for counts assume that the total number of papilloma’s cells on treatment j attime t are independent random variables with a poisson distribution with mean E[Yj(t)] = V ar[Yj(t)] = λj(t),where t represents the time in weeks.

The Poisson distribution is used for counts of events that occurs randomly over time or space, whenoutcomes in disjoint periods are independent. In our case, assuming independence over disjoint periods isreasonable, however assuming the mean and variance are the same could be incorrect. Nonetheless for this

6

Page 34: Industrial Mathematical and Statistical Modeling Workshop for

analysis we assume the papilloma counts behave as Poisson random variables mean E[Yj(t)] = V ar[Yj(t)] =λj(t).

The log linear generalized linear model was used to model the mean of the total number of papillomas,Yj(t), as function of the dioxin concentration in the skin CSKIN

j (t) at time t, and the treatment effect j.The generalized linear model specification is listed below:

(1) The Random Component: Yj(t) ∼ Poisson(λj), where λj = λj(t). That is, the mean of theexponential random variable λj(t) is a function of time t and treatment j. The probability mass functionis given by:

P (Yj(t)|λj(t)) =e−λj λ

Yj

j

Yj != e−λj eYj log λj Yj !−1.

For Yj = 0, 1, 2, · · ·

(2) The systematic Component: The systematic component is a linear predictor as a function of theexplanatory variables CSKIN

j (t), and the treatment which is the dioxin concentration in the skin foreach protocol (dermal, oral). For the treatment j, Ij = 1 if the observation correspond to treatment j,zero otherwise, where j = 1, . . . , 11, and t = 0, 1, . . . , 26. The proposed systematic component is givenby:

η = β0 + β1CSKINj (t) + β2,jIj + β3,jC

SKINj (t)Ij .

(3) The Link function: The link function, g(.), used here is the canonical link function widely used in theliterature for Poisson regression models: g(λj(t)) = log (λj(t)). For further references see[6, 4].

In summary, the proposed model is given by:

log (λj(t)) = β0 + β1CSKINj (t) + β2,jIj + β3,jC

SKINj (t)Ij = x′jβ

j ,

Where

• Yj(t) ∼ Poisson(λj).

• The matrix of explanatory variables X = [xjk], which dimension is 297 × 4 , contains the informationabout:

– CSKINj (t) is the dioxin concentration in the skin at time t, for the treatment j,

– Ij = 1 if the observation corresponds to treatment j, zero otherwise, andFor j = 1, . . . , 11, and t = 0, 1, . . . , 26.

The logarithm of the log likelihood function is given by:

l (Yj(t)|λj(t)) =297∑t=1

3∑k=0

11∑j=1

Yj(t)Xj,kβj −297∑t=1

exp(x′jβj

).

We estimate the parameters using the iteratively reweighed least squares technique (IRLS), see[6, 4].

2.5 TCDD retention in certain tissues

The calculation involved the following considerations.As the dose administered to each mouse was dependent upon its weight, we need to understand how the

weight of the mice varied over time. This problem is addressed in section 2.3.For each dosage group within the dermal and oral studies, we computed the dosage (in ng) per day given

the corresponding function representing the weight of the mice over time. Summing, we obtained a roughestimate for the total mass of TCDD administered to each treatment group over the 26 week period.

The second main component to this computation was estimating the amount of TCDD in the tissues forwhich the final concentrations of the compound were already given. The average absolute liver weights were

7

Page 35: Industrial Mathematical and Statistical Modeling Workshop for

given in [7], so estimating the average weight of TCDD retained in the liver for each treatment group wasstraightfoward.

For the fat and skin, we were given a table of well-established weight fractions for female mice: skin weightratio of 0.1653 and an fat weight ratio of 0.063. Using these and the average final weights of the mice, weestimated the average final weights of the skin and fat. In addition, in the dermal case we assumed thatthe fraction of the skin weight corresponding to the painted area was 5% of the total weight of the skin (seesection 2.2.1).

3 Results

3.1 Results for the PBPK Model

In comparing the predictions of our model to the observed data (Figures 3-5), we see a good fit for theresults in the skin tissue, both painted and normal, (ρ = .98, p-value = 1.3e-05 and ρ = .98, p-value = .0004,respectively) and the fat tissue (ρ = .99, p-value = 6.7e-07). The fit is less good for the liver data (ρ = .93,p-value = .005). Our predictions about the concentration seems to be proportional to the dose given. Theconcentration in the dose-painted skin initially increases very fast. Since the flow coefficient from skin to bloodis quite small (Table 2), the dose accumulation in painted skin is faster than its diffusion in the initial stage.After the blood transports a large enough amount of TCDD to other parts of the body, the dose accumulationis slower. A similar pattern is observed for the normal skin tissue.

In the oral dose experiment, since we have fewer points to fit, the predicted results are a good fit to theobserved data for skin, fat, and liver (Figures 6-8). Because there are so few data points, we did not conducta correlation analysis for this data. Again, the model predicts that the concentration increases proportionallyto the dose given. Since the flow coefficient from the GI tract to the blood is quite large compared to thecoefficient from the skin to blood (Table 2), the TCDD travels through the body faster than in the dermalexperiment. The concentration in the skin shows a more linear relationship with time in the oral experimentthan in the dermal experiment.

8

Page 36: Industrial Mathematical and Statistical Modeling Workshop for

10!4 10!3 10!2 10!1101

102

103

104

105

average daily dose

dose!p

aint

ed s

kin

TCD

D

our predictexperiment data

10!4 10!3 10!2 10!1101

102

103

104

average daily dose

norm

al s

kin

TCD

D

our predictexperiment data

Figure 3: Predicted and observed average endpoint concentrations of TCDD versus dose in painted (left) andnormal (right) skin in the dermal application experiment.

9

Page 37: Industrial Mathematical and Statistical Modeling Workshop for

10!4 10!3 10!2 10!1102

103

104

105

average daily dose

liver

TC

DD

our predictexperiment data

10!4 10!3 10!2 10!1101

102

103

104

average daily dose

fat T

CD

D

our predictexperiment data

Figure 4: Predicted and observed average endpoint concentrations of TCDD versus dose in liver (left) and fat(right) tissue in the dermal application experiment.

10

Page 38: Industrial Mathematical and Statistical Modeling Workshop for

0 5 10 15 20 25 300

2000

4000

6000

8000

10000

12000

week

dose!p

aint

ed s

kin

TCD

D

predict TCDD in dose!painted skin

0 5 10 15 20 25 300

500

1000

1500

2000

2500

week

norm

al s

kin T

CDD

predict TCDD in normal skin

Figure 5: Predicted concentration of TCDD over the course of the experiment in the painted (left) and normal(right) skin when applied at the rate of 326 ng/kg/day.

11

Page 39: Industrial Mathematical and Statistical Modeling Workshop for

10!3 10!2 10!1102

103

104

average daily dose

skin

TC

DD

our predictexperiment data

Figure 6: Predicted and observed average endpoint concentrations of TCDD versus dose in normal skin in theoral application experiment.

12

Page 40: Industrial Mathematical and Statistical Modeling Workshop for

10!3 10!2 10!1103

104

105

106

average daily dose

liver

TC

DD

our predictexperiment data

10!3 10!2 10!1103

104

105

average daily dose

fat T

CD

D

our predictexperiment data

Figure 7: Predicted and observed average endpoint concentrations of TCDD versus dose in liver (left) and fat(right) tissue in the oral application experiment.

13

Page 41: Industrial Mathematical and Statistical Modeling Workshop for

0 5 10 15 20 25 300

1000

2000

3000

4000

5000

6000

7000

week

skin

TCD

D

predict TCDD in skin

Figure 8: Predicted concentration of TCDD in skin over the course of the experiment when applied at therate of 893 ng/kg/day.

3.2 Results for Papilloma Counts

We estimate the parameters using iteratively reweighed least squares technique (IRLS) in R. The data were fitto the model described in section 2.4 separately for each dose within each treatment type (dermal, oral). Thisresulted in estimation of 11 values for β0 and for β1. The fit of the data to the model is shown separately foreach dose/treatment in Figures 9-10, where the x-axis shows the predicted concentration in the skin, derivedfrom the results of our PBPK model.

Finally, we plotted the model parameters, β0 and β1 for all models against drug dose (Figure 11). Notethat for the dermal and oral protocol, the intercept of the linear systematic component of the model increasesas doses increases, which is expected since it must be a higher initial count of papillomas for high doses. Weobserved also that the slope of the of the linear systematic component for dermal protocol decreases as doseincreases which suggests that time plays less of a role in the formation of papillomas for higher dosages thanat lower dosages–at higher dosages the skin reaches saturation faster.

We can observe that the intercept of the linear systematic component for dermal and oral protocol havesimilar slopes for comparable doses.

The parameter estimates are listed below:Table of Parameters

Coefficients:Estimate Std. Error z value Pr(>|z|)

(Intercept) -7.94761 2.82676 -2.812 0.00493 **treatment2 -0.18450 3.42335 -0.054 0.95702treatment3 -0.88453 3.03957 -0.291 0.77105treatment4 2.28114 2.90093 0.786 0.43166treatment5 4.84908 2.84930 1.702 0.08878 .treatment6 5.27172 2.84608 1.852 0.06399 .treatment7 5.97703 2.83561 2.108 0.03504 *treatment8 8.03507 2.82985 2.839 0.00452 **treatment9 -0.69776 4.48344 -0.156 0.87632treatment10 -1.08117 4.04651 -0.267 0.78933

14

Page 42: Industrial Mathematical and Statistical Modeling Workshop for

treatment11 -0.03826 3.17358 -0.012 0.99038c.t 0.46537 0.17060 2.728 0.00637 **treatment2:c.t -0.31079 0.17377 -1.789 0.07369 .treatment3:c.t -0.37201 0.17084 -2.177 0.02945 *treatment4:c.t -0.43276 0.17062 -2.536 0.01120 *treatment5:c.t -0.44906 0.17060 -2.632 0.00848 **treatment6:c.t -0.45410 0.17060 -2.662 0.00777 **treatment7:c.t -0.46003 0.17060 -2.697 0.00701 **treatment8:c.t -0.46344 0.17060 -2.717 0.00660 **treatment9:c.t -0.44859 0.17074 -2.627 0.00861 **treatment10:c.t -0.46102 0.17061 -2.702 0.00689 **treatment11:c.t -0.46374 0.17060 -2.718 0.00656 **---Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 14121.65 on 296 degrees of freedomResidual deviance: 767.95 on 275 degrees of freedomAIC: 1390.0

Note that 16 of the 22 parameter estimates are significantly different from zero at 5% significant level.Fitted Model

Treatment 1: dose: 2.1 and protocol:dermal

log (λ1(t)) = −7.9476 + 0.4654CSKIN1 (t)

Treatment 2: dose:7.3 and protocol:dermal

log (λ2(t)) = −8.1321 + 0.1545CSKIN2 (t)

Treatment 3: dose:15 and protocol:dermal

log (λ3(t)) = −8.8321 + 0.0933CSKIN3 (t)

Treatment 4: dose: 33 and protocol:dermal

log (λ4(t)) = −5.6664 + 0.0326CSKIN4 (t)

Treatment 5: doses: 52 and protocol:dermal

log (λ5(t)) = −3.0985 + 0.0163CSKIN5 (t)

Treatment 6: dose: 71 and protocol:dermal

log (λ6(t)) = −2.6758 + 0.0113CSKIN6 (t)

Treatment 7: dose: 152 and protocol:dermal

log (λ7(t)) = −1.9705 + 0.0053CSKIN7 (t)

Treatment 8: dose: 326 and protocol: dermal

log (λ8(t)) = 0.0874 + 0.0019CSKIN8 (t)

Treatment 9: dose:75 and protocol:oral

log (λ9(t)) = −8.6453 + 0.0167CSKIN9 (t)

15

Page 43: Industrial Mathematical and Statistical Modeling Workshop for

Treatment 10: dose:321 and protocol:oral

log (λ10(t)) = −9.0287 + 0.0043CSKIN10 (t)

Treatment 11: dose:893 and protocol:oral

log (λ11(t)) = −7.9858 + 0.00163CSKIN11 (t)

3.3 TCDD Retention

For the functional relationship between weight (in grams) and time (in weeks) for the dermal group, weobtained

weight(t) = 17.56070 + 2.96503 ∗ log(t).

For the oral group, we obtained

weight(t) = 17.56070 + 0.56800 + 2.96503 ∗ log(t).

Final TCDD amounts (in ng) in certain tissues, dermal study:

Dose Liver Remote skin Local skin Fat Total2.1 0 0 0.05±0.03 0 0.05±0.037.3 0 0 0.08±0.05 0.35±0.07 0.43±0.1115 0.27±0.25 0.88±0.15 0.13±0.06 0.60±0.32 1.88±0.7933 0.81±0.79 1.35±0.73 0.31±0.18 1.12±0.58 3.59±2.2852 1.14±1.35 1.42±0.70 0.36±0.33 1.54±0.65 4.47±3.0471 4.77±11.01 2.03±1.53 0.70±0.65 2.12±1.80 9.62±14.99152 86.69±55.50 3.58±1.13 0.71±0.45 5.89±1.84 96.87±58.91326 143.73±80.14 12.30±6.74 2.44±1.46 13.92±6.07 172.38±94.41

Final TCDD amounts (in ng) in certain tissues, oral study:

Dose Liver Skin Fat Total75 2.78±1.53 1.87±0.77 2.02±0.71 6.67±3.01321 112.09±69.42 12.55±7.19 16.40±12.93 141.03±89.54893 250.74±138.99 27.52±13.90 45.82±15.86 324.08±168.74

Estimated TCDD delivered on average in ng, dermal study:

Dose TCDD (ng) % retained by liver,fat,skin Error* (%)2.1 9.2926 0.58 0.327.3 32.3028 1.34 0.3515 66.3756 2.83 1.1833 146.0262 2.46 1.5652 230.1019 1.94 1.3271 314.1777 3.06 4.77152 672.6057 14.40 8.76326 1442.5622 11.95 6.54

*error not including that of weight estimates

Estimated TCDD delivered on average in ng, oral study:Dose TCDD (ng) % retained by liver,fat,skin Error* (%)75 331.8778 2.01 0.91321 1420.4370 9.93 6.30893 3951.5584 8.20 4.27

16

Page 44: Industrial Mathematical and Statistical Modeling Workshop for

*error not including that of weight estimates

We were unable to obtain an explicit functional relationship between the percentage retained by thesetissues and the dose levels. Also, due to the ad hoc weight estimate we made (whose error is not includedin the above errors), the actual fractional amounts retained by these organs could differ more than what theabove percent error accounts for. The general trend seems to be that the tissues retain more TCDD at higherdose levels. Also, much of the TCDD accumulated in other tissues or was expelled from the system, since atmost 15% of the total amount administered was retained in the liver, skin, and fat (dose 152, dermal study).

4 Discussion

We have developed a mechanistic-stochastic hybrid model that predicts incidence of papillomas based onapplication of a particular dose of TCDD. Our model shows a reasonably good fit to experimental dataavailable, including predictions of concentration of TCDD in skin, fat, and liver tissue, as well as incidence ofpapillomas, for both dermally and orally administered drug regimens.

Our PBPK model required relatively few assumptions and relied largely on experimentally-measured pa-rameters. The close match of our predictions to the observed data suggests that our estimated absorptionrates in the skin and GI tract are reasonable, as are the estimated differences in the flow coefficients betweenblood and the GI tract. On the other hand, our fit to the observed TCDD concentrations in liver might beimproved by altering the metabolism function, or by modeling additional physiological detail.

The stochastic model employs a log-linear generalized linear model to describe the relationship of papillomacount to skin concentration of TCDD. The success of this part of the model is dependent on both the accuracyof the PBPK model in predicting skin concentration and the fit of the observed data (papilloma counts) tothe model. Except for the lowest doses of drug, we see that the model is a good fit to the observed papillomadata, which suggests that both assumptions are valid.

We also observe that there appears to be a consistent relationship between dose and papilloma counts, asmodeled using our estimated parameters. This suggests that a single function could be fit to the relationshipbetween dose and the stochastic parameters and could be used to predict papilloma counts at other, non-testeddoses of drug. While the intercept term appears to increase with dose, reaching a plateau and medium doses,the relationship of slope to dose is a decreasing function. The relationship could suggest that above a certainconcentration of TCDD, the incidence of papillomas does not increase – that the chemical has saturated itsbiological receptors.

Because this part of our model is explicitly modelled as a Poisson variable, future work could developconfidence intervals on the parameters and predictions of the model. We may also want to consider tests ofthe data to ensure that the distribution is best modelled by a Poisson variable, since currently we assumethat the counts per week are independent events and that the mean and variance of the counts per weekare the same. We could also use the model to test whether the TCDD is distributed evenly throughout theskin. Currently, we use the weighted average of the concentrations in normal and painted skin as input to thestochastic model. To verify that the chemical is spread evenly throughout these separate portions, we couldinput their values separately into the stochastic model and estimate a compound slope that is equal to thesum of their individual fitted slope parameters.

Regarding actual final amounts of TCDD in the skin, liver, and fat, it appears that similar fractions of thedrug are retained in the conglomerate of these tissues among similar dose levels. Also, in both the dermal andoral studies, the greatest fraction was retained in the liver. An interesting question, beyond the scope of thisstudy and the motivation for the original problem, would be to study the formation of precancer growths inthe liver.

Further analysis will include looking at the fraction of TCDD retained in each individual tissue against theconcentration in these Finally, we could implement a more sophisticated fit of body weight to time, perhapsusing interpolation. These details may refine our fit in both the PBPK and stochastic parts of the model, butwe assume the qualitative results will remain the same.

17

Page 45: Industrial Mathematical and Statistical Modeling Workshop for

References

[1] Van Birgelen, A., Johnson, J., Fuciarelli, A., Toft, J., Mahler, J., Bucher, J. Dose- and time-response ofTCDD in Tg.AC mice after dermal and oral exposure Dioxin99: International Symposium on HalogenatedEnvironmental Organic Pollutants and Pops, Italy, 1999, 42, 235-239.

[2] Brooks, E., Kohn, M., van Birgelen, A., Lucier, G., Portier, C. Stochastic Models for Papilloma FormationFollowing Exposure to TCDD, Organohalogen Compounds 1999, 41, 522-524.

[3] Cole, C., Tran, H., and Schlosser, P. Physiologically Based Pharmacokinetic Modeling of BenzeneMetabolism in Mice through extrapolation from in vitro to in vivo Journal of Toxicology and Enviro-mental Health, Part A, 62:439-465,2001.

[4] Dey, D., and Ravishanker, N. A first Course in Linear Model Theory Chapman and Hall, 2002.

[5] Dunson, D., Haseman, J., van Birgelen, A., Stasiewiecz, S., Tennant, R. Statistical Analysis of SkinTumor Data from Tg.AC Mouse Bioassays Toxicological Sciences, 55, 293-302, 2000.

[6] McCullagh, P., and Nelder, J. Generalized Linear Models Chapman and Hall, 1989.

[7] Wyde, M., Braen, A., Hejtmancik, M., Johnson, J., Toft, J., Blake, J., Cooper, S., Mahler, J., Vallant,M., Bucher, J., Walker, N. Oral and Dermal Exposure to 2,3,7,8-Tetrachlorodibenzo-p-Dioxin (TCDD)Induces Cutaneous Papillomas and Squamous Cell Carcinomas in Female Hemizygous Tg.AC TransgenicMice. Toxicological Sciences, 82, 34-45, 2004.

[8] Zager, M., Tran, H., Schlosser, P. A Delayed Nonlinear PBPK Model for Genistein Dosimetry in RatsBulletin of Mathematical Biology, 69, 93-117, Jan 2007, Springer, New York.

18

Page 46: Industrial Mathematical and Statistical Modeling Workshop for

0 5 10 15

0.0

0.2

0.4

0.6

0.8

1.0

Papilloma counts versus c(t)Dermal!1

c(t)

papi

lom

a co

unts

observedfitted

0 10 20 30 40 50 60

01

23

45

Papilloma counts versus c(t)Dermal!2

c(t)

papi

lom

a co

unts

observedfitted

0 20 40 60 80 100 120

05

1015

2025

30

Papilloma counts versus c(t)Dermal!3

c(t)

papi

lom

a co

unts

observedfitted

0 50 100 150 200 250 300

05

1015

2025

3035

Papilloma counts versus c(t)Dermal!4

c(t)

papi

lom

a co

unts

observedfitted

0 100 200 300 400

010

2030

4050

Papilloma counts versus c(t)Dermal!5

c(t)

papi

lom

a co

unts

observedfitted

0 100 200 300 400 500 600

010

2030

4050

60

Papilloma counts versus c(t)Dermal!6

c(t)

papi

lom

a co

unts

observedfitted

0 200 400 600 800 1000

050

100

150

Papilloma counts versus c(t)Dermal!7

c(t)

papi

lom

a co

unts

observedfitted

0 500 1000 1500 2000 2500

050

100

150

200

Papilloma counts versus c(t)Dermal!8

c(t)

papi

lom

a co

unts

observedfitted

Figure 9: Fitted and observed papilloma counts versus predicted TCDD skin concentration for the dermalexperiment.

19

Page 47: Industrial Mathematical and Statistical Modeling Workshop for

0 100 200 300 400 500

0.0

0.2

0.4

0.6

0.8

1.0

Papilloma counts versus c(t)Oral!1

c(t)

papi

lom

a co

unts

observedfitted

0 500 1000 1500 2000

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Papilloma counts versus c(t)Oral!4

c(t)

papi

lom

a co

unts

observedfitted

0 1000 3000 5000

05

1015

Papilloma counts versus c(t)Oral!3

c(t)

papi

lom

a co

unts

observedfitted

Figure 10: Fitted and observed papilloma counts versus predicted TCDD skin concentration for the oralexperiment.

0 200 400 600 800

!8!6

!4!2

0

Intercept versus dose

dose

b0

0 200 400 600 800

0.00

0.05

0.10

0.15

Slope versus dose

dose[2:11]

b1[2:11]

Figure 11: Model parameters versus dose for all treatments, with the intercept, β0, on the left, and the slope,β1, on the right.

20

Page 48: Industrial Mathematical and Statistical Modeling Workshop for

Modeling Probability of Drinking Water Contamination

Presenter: Michael MessnerFaculty Mentor: Mette Olufsen

Tianbing Chen, Asha Dixit, Christina Ho, Jim Reale-Levis, Yijiang Li, Sarah Olson

IMSM Workshop, July 2007

1 Introduction

The drinking water in the United States is among the safest in the world, but risks ofwaterborne illness remain. During the 1990’s, there was nearly sixteen outbreaks of water-borne illnesses reported annually. These reported outbreaks caused hundreds of thousandsof individual cases of waterborne illness. Reported illnesses represent a small fraction ofoverall effects of waterborne illnesses, since most go unreported. Reported illness numbersonly serve as a minimum estimate of the total effects of contaminated drinking water. Thevast majority of the reported illnesses were due to a single Cryptosporidium outbreak inMilwaukee in 1993 [5]. The outbreak was caused by an ineffective filtration process in twomunicipal water treatment plants. The outbreak caused illness in over 400,000 individuals.The total illness related cost of that outbreak was estimated at over $91 million [1]. Thehigh cost of waterborne illness reinforces the need for appropriate safeguards in the form ofwater quality monitoring.

Waterborne illnesses can be caused by a variety pathogens. Cryptosporidium is an ex-ample of a parasitic protozoan. Parasitic protozoa are known to have accounted for 21% ofreported outbreaks in the US in 1990s. Bacteria, like the well known E. Coli., accountedfor 18% and viruses another 6%. The rest were either caused by chemical or undeterminedagents. Waterborne pathogens generally cause gastrointestinal illness resulting in diarrhea,cramping, nausea, and/or vomiting.

Cryptosporidium and E.coli. are often caused by fecal contamination of the drinkingwater supply. These pathogens live in the intestinal tracts of warm blooded animals and arepassed in the feces of an infected animal. In fact, a wide variety of waterborne pathogensindicate fecal contamination of water. This is why fecal contamination is the primary focusof biological water quality regulations.

The Total Coliform Rule (TCR) was published in 1989, and set Maximum ContaminantLevel Goals (MCLGs) as well as Maximum Contaminant Levels (MCLs) for the presenceof total coliforms in drinking water. MCLGs are regulatory objectives which indicate theoptimal level of a given contaminant. MCLs are the legal limits for a given contaminant.The TCR also details the type and frequency of testing that water systems must undertake.The rule applies to all public water systems. Coliforms are a broad class of bacteria, some

1

Page 49: Industrial Mathematical and Statistical Modeling Workshop for

of which live in the digestive tracts of humans and other warm blooded animals. Coliformsthat are usually found in digestive tracts are referred to as fecal coliforms. Due to the highercost of fecal coliform testing, the TCR requires only the less expensive Total Coliform teston routine samples. Specifically, the TCR requires systems to monitor for total coliforms ata frequency proportional to the number of people served. If any sample tests positive fortotal coliforms, the system must perform the following additional tests:

• Further test the positive culture for the presence of either fecal coliforms or E. coli.

• Take one set of 3-4 repeat samples at sites located within 5 or fewer sampling sitesadjacent to the location of the routine positive sample within 24 hours.

• Take at least 5 additional samples at the site of the original positive result during thenext month of operation.

In July of 2007 the Federal Advisory Committee Act (FACA) met to discuss revising theTCR. As part of this review of the TCR, the EPA is interested in identifying data sources andpotential analyses to support the advisory committee. Currently, a broad database of TCRsampling data is being assembled. Many states previously only reported TCR violations,so complete sampling information was not available. Most states are now moving towardscomplete reporting of TCR sampling results. This additional data may provide valuableinsights into the effectiveness of the TCR.

Specifically, it would be of interest to determine the frequency of positive TC results fora variety of system types (large or small, ground water or surface water, etc.). It wouldalso be of interest to determine how the frequency of fecal coliform and E.coli. given apositive TC result varies across those various types of systems. It is known that largesystems generally have a lower frequency of positive TC results than smaller systems. Itwould also be of interest to determine the effectiveness of repeat testing after positive TCresults because it is often difficult to obtain samples from adjacent locations. There is also apossibility of seasonal effects due to the freezing and subsequent rupture of pipes as well asdue to variations in agriculture runoff and weather patterns. Determining a more effectivesampling strategy would be of primary interest to the FACA committee. Interested readersare referred to [1, 2, 10] for more details.

2 Distributions

In order to model the hit rate, probability of a positive test in a given system, theBinomial and Beta-Binomial distribution will be used.

2.1 Binomial distribution

The probability density function of binomial distribution [6] is

f(k; n, p) =

(n

k

)pk(1− p)n−k.

If X is a binomial distributed variable, then the expected value of X is

E(X) = np,

2

Page 50: Industrial Mathematical and Statistical Modeling Workshop for

and the variance isV ar(X) = np(1− p).

2.1.1 Beta distribution

The beta family of distributions [3, 8] is a continuous family on (0, 1) indexed by twoparameters α and β and is often used to model proportions that lie between 0 and 1. Theprobability density function is

f(x|α, β) =1

B(α, β)xα−1(1− x)β−1, 0 < x < 1, α, β > 0,

where the beta function

B(α, β) =

∫ 1

0

xα−1(1− x)β−1dx =Γ(α)Γ(β)

Γ(α + β).

If X is a binomial distributed variable, then the expected value of X is

E(X) =α

α + β,

and the variance is

V ar(X) =αβ

(α + β)2(α + β + 1).

A useful fact to be used:

∂B(x, y)

∂x= B(x, y) (ψ(x)− ψ(x + y)) , (1)

where

ψ(x) , −∫ ∞

0

e−xt

1− e−tdt =

d

dxlog(Γ(x)).

2.2 Beta-Binomial distribution

A variable with a beta binomial distribution is distributed as a binomial distributionwith parameter p , where p is distribution with a beta distribution with parameters α andβ . For n trials, it has probability function

p(k|n) =B(k + α, n− k + β)

B(α, β)

(n

k

). (2)

If X is a beta-binomial distributed variable, then the expected value of X is

E(X) =nα

α + β,

and the variance is

V ar(X) =nαβ(n + α + β)

(α + β)2(α + β + 1).

• In Emprirical Bayes methods, the Beta-binomial model is an analytic model where thelikelihood function is specified by a binomial distribution and the conjugate prior is aBeta distribution [7].

3

Page 51: Industrial Mathematical and Statistical Modeling Workshop for

3 The Approach

In this section we describe our beta-binomial model approach.

3.1 Maximum Likelihood Function

From Equation (2), the MLE function [4] is

L(α, β) =n∏

i=1

p(ki|ni)

=n∏

i=1

B(ki + α, ni − ki + β)

B(α, β)

(ni

ki

)

∝ 1

(B(α, β))n

n∏i=1

B(ki + α,Ni − ki + β)

(3)

where

• ni: total number of routine samples for each location i,

• ki: total number of positive total coliform samples for each location i,

• n: total number of locations for a given system of interest.

3.2 Log-Likelihood

Minimizing a log-likelihood function with respect to the model parameters α and β isequivalent to maximizing the likelihood function in Equation 3:

log(L(α, β)) ∝ −n log (B(α, β)) +n∑

i=1

log (B(ki + α, ni − ki + β)) . (4)

Since α and β are strongly correlated, we first use a location parameter a and a precisionparameter b that are less correlated such that

a =α

α + β, b =

1√α + β

,

or, equivalently,

α =a

b2, β =

1− a

b2.

As a result, Equation (4) can be rewritten as

log(L(α, β)) ∝ −n log

(B

(a

b2,1− a

b2

))

+n∑

i=1

log

(B

(ki +

a

b2, ni − ki +

1− a

b2

)) (5)

4

Page 52: Industrial Mathematical and Statistical Modeling Workshop for

Next will need to take the partial derivative with respect to a and b of the log-likelihoodfunction. To simplify the notation, let

F , log(L(α, β)),

B1 , B

(a

b2,1− a

b2

),

B2 , B

(ki +

a

b2, ni − ki +

1− a

b2

).

Following the fact (1), we have

∂B1

∂a=

B1

b2

( a

b2

)− ψ

(1− a

b2

)),

∂B2

∂a=

B2

b2

(ki +

a

b2

)− ψ

(ni − ki +

1− a

b2

)).

By the chain rule, we have the partial derivative as

∂F

∂a=−n

b2

( a

b2

)− ψ

(1− a

b2

))

+1

b2

n∑i=1

(ki +

a

b2

)− ψ

(ni − ki +

1− a

b2

)).

Similarly,

∂B1

∂(

1b2

) = B1

(aψ

( a

b2

)+ (1− a) ψ

(1− a

b2

)− ψ

(1

b2

)),

∂B2

∂(

1b2

) = B2

(aψ

(ki +

a

b2

)+ (1− a) ψ

(ni − ki +

1− a

b2

)− ψ

(ni +

1

b2

)).

It follows that

∂F

∂b=−2n

b3

(1

b2

)− aψ

( a

b2

)− (1− a) ψ

(1− a

b2

))

+−2

b3

n∑i=1

(aψ

(ki +

a

b2

)+ (1− a) ψ

(ni − ki +

1− a

b2

)− ψ

(ni +

1

b2

))

Therefore, we have a system of nonlinear equations:

f(a, b) = 0,

g(a, b) = 0,

where

f(a, b) , ∂F

∂a, g(a, b) , ∂F

∂b.

The next step is to apply a nonlinear solver(such as Newton-Raphson method [9]) to findvalues for a, b. A few comments before we move forward:

• A good initial guess is important for most nonlinear solvers. In our case, we will applythe method of moments to obtain the initial guess for a, b.

• The analytical formulas for the Jacobian matrix is not available for our case.

5

Page 53: Industrial Mathematical and Statistical Modeling Workshop for

3.3 Bayesian Method

For all the locations within one system, we assume that the hit-rate follows Beta (αprior, βprior)distribution. For location i, denote the number of times of sampling with ni, the number oftimes of positive response with ki, then

ki ∼ binomial(ni, pi)

pi ∼ beta(αprior, βprior)

The joint distribution of pi and ki is

f(ki, pi) =

(ni

ki

)pki

i (1− pi)ni−ki

Γ(αprior + βprior)

Γ(αprior)Γ(βprior)p

αprior−1i (1− pi)

βprior−1 (6)

f(ki, pi) =

(ni

ki

)Γ(αprior + βprior)

Γ(αprior)Γ(βprior)p

ki+αprior−1i (1− pi)

ni−ki+βprior−1 (7)

The marginal pdf of ki is

f(ki) =

∫ 1

0

f(ki, pi) dpi =

(ni

ki

)Γ(αprior + βprior)

Γ(αprior)Γ(βprior)

Γ(ki + αprior)Γ(ni − ki + βprior)

Γ(ni + αprior + βprior)

The posterior distribution of pi given ki is

f(pi|ki) =f(ki, pi)

f(ki)=

Γ(ni + αprior + βprior)

Γ(ki + αprior)Γ(ni − ki + βprior)p

ki+αprior−1i (1− pi)ni−ki+βprior−1

which means f(pi|ki) ∼ Beta(ki + αprior, ni − ki + βprior).The natural estimators of α and β given the data are ki + αprior and ni − ki + βprior. A

simple way is to treat mean(ki + αprior) and mean(ni − ki + βprior) as our final estimate ofα and β.

Setting αprior = 1, βprior = 1 which means pi ∼ uniform(0, 1) is a resonable way to assignthe prior information. Different prior information can also be applied.

4 Numerical Results

In this section we describe the numerical results from our experiments. Table 4 summa-rizes all the notations we used in categorize different systems. For more details about thebackground, we refer readers to [1].

4.1 Total Coliform

We present values of (α, β) in this section. In particular, we tested and compared two ap-proaches mentioned above – MLE and Bayesian. Table 4.1 summarizes the resutls. Figure 1through 4 are the plots of representative cases from State A,B,C,D.

6

Page 54: Industrial Mathematical and Statistical Modeling Workshop for

Table 1: Summary of Data for Analysis

Category Explanatory Parameters

States (4) A, B, C, DSystems (3) C = Community

NC = Transient Non-CommunityNTNC = Non-Transient Non-Community

Population (2) Low (< 1000)High (≥ 1000)

Sources of Water (3) R = Raw Ground WaterF = Finished (treated) ground waterS = Surface water

Analyte Code (2) 3014 = E. Coli3100 = Total Coliform

Presence (2) 1 = Presence (bacteria detected)2 = Absence (not detected)

Sample Types (2) RT = RoutineRP = Repeat

0 0.2 0.4 0.6 0.8 10

5

10

15ACHSF − Beta(.2061,12.889)

Hit Rate

Den

sity

(a)

0 0.2 0.4 0.6 0.8 10

5

10

15

20

25ANCHGF − Beta(16.0865,197.4581)

Hit Rate

Den

sity

(b)

Figure 1: Beta Distributions for State A

7

Page 55: Industrial Mathematical and Statistical Modeling Workshop for

0 0.2 0.4 0.6 0.8 10

10

20

30

40BCHGF − Beta(.7802,86.1623)

Hit Rate

Den

sity

(a)

0 0.2 0.4 0.6 0.8 10

5

10

15

20BNCLGF − Beta(.7448,12.4538)

Hit Rate

Den

sity

(b)

Figure 2: Beta Distributions for State B

0 0.2 0.4 0.6 0.8 10

5

10

15CCLGF − Beta(1.6636,27.014)

Hit Rate

Den

sity

(a)

0 0.2 0.4 0.6 0.8 10

2

4

6

8

10

12CNTNCLGF − Beta(.3152,3.5224)

Hit Rate

Den

sity

(b)

Figure 3: Beta Distributions for State C

0 0.2 0.4 0.6 0.8 10

5

10

15DNCLSF − Beta(.9973,15.1972)

Hit Rate

Den

sity

(a)

0 0.2 0.4 0.6 0.8 10

5

10

15

20

25DNCHGF − Beta(1.5947,49.6791)

Hit Rate

Den

sity

(b)

Figure 4: Beta Distributions for State D

8

Page 56: Industrial Mathematical and Statistical Modeling Workshop for

Table 2: Parameter Estimation Results for Beta-Binomial Distribution: Modelinghit rates of Coliform for a given routine sample across locations for a given system. A* means Newton method fails to converge. n is the total number of locations inside aparticular system. ni is the total number of sampling for a given location i.

Systems for State A Range of ni n (α, β) from MLE (α, β) from Bayesian

ACHGF [4,98] 733 * (1.10, 12.48)ACHSF [4,53] 136 (0.21, 12.89) (1.19, 19.79)ACLGF [4,34] 475 (0.52, 12.79) (1.27, 8.11)

ANCHGF [4,50] 23 (16.09, 197.46) (1.78, 10.83)ANCLGF [4,46] 207 (0.43, 2.72) (2.49, 8.79)

ANTNCLGF [4,29] 165 (0.41, 5.45) (1.71, 9.24)

Systems for State B

BCHGF [4,5656] 333 (0.78, 86.16) (1.20, 189.57)BCHSF [4,11434] 109 * (1.28, 194.17)BCLGF [4,213] 969 (0.74, 4.20) (3.41, 62.08)BCLSF [4,204] 73 (0.19, 1.74) (1.88, 62.74)

BNCLGF [4,275] 1173 (0.74, 12.45) (2.88, 31.57)BNTNCLGF [4,210] 256 (0.85, 21.17) (2.70, 42.27)

Systems for State C

CCHGF [4,87] 2592 (0.06, 0.61) (1.12, 10.31)CCHSF [4,221] 679 * (1.11, 52.70)CCLGF [4,67] 2806 (1.66, 27.01) (1.49, 8.90)CCLSF [4,35] 41 (1.65, 30.34) (1.39, 8.66)

CNCHLF [4,47] 140 * (1.06, 11.41)CNCLGF [4,42] 920 (0.90, 12.82) (1.57, 8.28)

CNTNCHGF [4,70] 88 (0.38, 9.62) (1.33, 10.73)CNTNCLGF [4,28] 280 (0.32, 3.52) (1.67, 8.45)

Systems for State D

DCLGF [4,52] 665 (0.92, 22.76) (2.20, 30.89)DNCHGF [5,175] 36 (1.59, 49.68) (4.75, 110.78)DNCLGF [4,231] 6349 (0.54, 8.46) (1.53, 9.10)DNCLSF [5,134] 79 (1.00, 15.20) (3.44, 45.85)

DNTNCHGF [6,436] 13 * (2.92, 178.69)

9

Page 57: Industrial Mathematical and Statistical Modeling Workshop for

Table 3: Probability of hit rates less than a given percentage for Coliform: for agiven routine sample across locations for a given system

Systems for State A Pr(Hit Rate< 0.10) Pr(Hit Rate< 0.05)

ACHSF 1.00 0.97ACLGF 1.00 0.89

ANCHGF 1.00 0.91ANCLGF 0.95 0.59

ANTNCLGF 0.99 0.76

Systems for State B

BCHGF 1.00 1.00BNCLGF 1.00 0.82

BNTNCLGF 1.00 0.92

Systems for State C

CCLGF 1.00 0.92CCLSF 1.00 0.89

CNCLGF 1.00 0.77CNTNCHGF 1.00 0.88CNTNCLGF 0.98 0.73

Systems for State D

DCLGF 1.00 0.92DNCHGF 1.00 0.98DNCLGF 1.00 0.79DNCLSF 1.00 0.80

10

Page 58: Industrial Mathematical and Statistical Modeling Workshop for

4.2 Conditional E. Coli

In this section, we repeat the numerical experiments as conducted in the previous section.The difference is that we use the sub data sets – ni is total number of positive sampling forlocation i while ki is the number of E. Coli positive cases out of ni. From this sub data set,we would like to study the conditional probability distribution. Table [?] summarizes the nu-merical results for (α, β) and Figure 5, ?? are the plots of Beta distribution of representativecases.

Table 4: Parameter Estimation Results for Beta-Binomial Distribution: Modelinghit rates of E. Coli for repeat samples across states for a given system. A * means Newtonmethod fails to converge.

System Range of ni n (α, β) from MLE (α, β) from Baysian

CHGF [4,319] 1700 * (0.0044, 6.41)CHSF [4,44] 180 (0.1164, 4.51) (0.0050, 1.87)CLGF [4,66] 180 * (0.020, 4.10)CLSF [4,35] 51 (0.0488, 1.39) (0.038, 0.96)

CNCLGF [4,76] 1412 (0.0579, 4.76) (0.026, 2.19)CNTNCHGF [4,65] 94 (0.0075, 0.44) (0.0038, 0.25)CNTNCLGF [4,64] 388 (0.0167, 1.92) (0.0094, 1.05)

Table 5: Probability of hit rates less than a given percentage for E. Coli givenColiform: for a repeat sample across states for a given system

Systems Pr(Hit Rate< 0.10) Pr(Hit Rate< 0.05)

CHSF 0.41 0.40CLSF 0.87 0.81

CNCLGF 0.92 0.89CNTNCHGF 0.29 0.28CNTNCLGF 0.57 0.52

References

[1] Occurrence and monitoring document for the final ground water rule.

[2] U.S. waterborne disease statistics 1991-2000 new international partnership announcedat 3rd world water forum.

[3] S. Nadarajah A. Gupta, Handbook of beta distribution and its applications, MarcelDekker,Inc, 2004.

[4] G. Casella and R. Berger, Statistical inference, Duxbury Press, 1990.

11

Page 59: Industrial Mathematical and Statistical Modeling Workshop for

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

CNTNCHGF − Beta(.0075,.4413) E.Coli given Coliform presence

Hit Rate

Den

sity

(a)

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

CNTNCLGF − Beta(.0167,1.9187) E.Coli given Coliform presence

Hit Rate

Den

sity

(b)

Figure 5: Conditional Beta Distributions

[5] P. Corso, Cost of illness in the 1993 waterborne cryptosporidium outbreak, milwaukee,wisconsin, http://www.cdc.gov/ncidod/eid/vol9no4/02-0417.htm.

[6] V. C. Elart, Binomial distribution handbook for scientists and engineers, Boston Press,1944.

[7] D. Griffiths, Maximum likelihood estimation for the beta-binomial distribution and anapplication to the household distribution of the total number of cases of a disease, Bio-metrics 29 (1973), 637–648.

[8] A. K. Gupta and S. Nadarajah, Handbook of beta distribution and its applications,Marcel Dekker Inc., 2004.

[9] S. Teukolsky W. Vetterling W. Press, B. Flannery, Numerical recipes in C, CambridgeUniversity Press, 1986.

[10] N. J. Hoxie W. R. M. Kenzie and M. E. Proctor, A massive outbreak in milwaukee ofcryptosporidium infection transmitted through the public water supply.

12

Page 60: Industrial Mathematical and Statistical Modeling Workshop for

������������� ����������������������������������� �!���"� �$#%�&�('��*)+��!�

,.-0/21435-06�78/:9<;>=<?@35ACBEDGFHAC-0IKJMLON�=MPQ/$RSAKT2=VU&D235D2W�RYXMZ2-E[VLO\]=M^_AC[MICAS`aD][Mb c]=Vde-e^_X<f

g 3O/�hMIi-Ej g 35-ELO-0[ k5-03>lm /2WM[ g -ED�BnW

oprq RSAC[VB0/2IC[sRtD]h4/23nDuk5/235Ai->LF<D2B0XMIvkr6 o -0[ k5/23nL0lw WVAiICAi[�RYAx=

dyD]3OkOACZzU{AC|:D235D2j$D]Z{3OAKLOWM[VD][} /23OkOW7QD]35/2ICAi[VD!U knDukO-�~_[MAC|2-03nLOAvkr6

�z�H�E�0�u�M�2����5�E�n�2�0�.�x�O�r�0�n�����u�����]���O�n�����n���0�����Y�]�!�n�0�������&�����u�4�n�z�<�: ������¡�K z�2�0�¢�0�������4�n���0�� 0�0£:�0�O�u¤S¥M�]�O�u�$�0�¢�0�n $ ��V�5¦§�: 

���V���¨���u¤Q�u�4��©@�4�]¤<¥M�u n���¨���u�V z�u�4���x���$��ª«�$�: n ­¬��V�]�V�¨���x���� ­�r®���¦V�$���K n n��¤C�z�u�¯�s ¨¥V�5�0� ©8�­���«¥¯�&¤° 0�E±³²<���������0���G���¯�$�´¦M�� �����V���¨���u¤S¥M�]�O�u�$�0�¢�0�n «®n�O�u�µ�u�+�2�� ��0��£u�5�­¥V�u���¨���u���r®$�´¦M�z�O�>�5�����:¶  «·Q�v¸>¦&��ª.¹º�»����£u��¤C�5¥V�5���´¦{�O�n�»�5¥2¥¯�O�>�2�n¦M�� »���§�]�O�2�0�­����2�0���u����¸ �>�E�­�n¥2¥¯�O�0¼����$�u�¨���u�M G�½®e��¦V���$±!²4¦&�K y¤i�n�!�¢�$��¦V�%¬0�V�� ��¨���u���½®e��¤�¤¿¾C¥M�u 0�5�]�4�� n ��½®e��¦V�e���V£u�0�n ��À¥4�5�2�0¤i���Gª@���¹t¦&���5¦¹º�t¥¯�O�u£u�n�e��¦V�u�t�]�$���V���¨���u¤&�¨�O���0�n�0�¢�]���e�]�&����� ¸e�´¦M�À��¦&���{ n�¨��� ¸Q¥M¦M�u 0�_�2�>�� 8�4�]�t�2�0�����������4�«�.���u���u¤{�&�<��¬��V���¨�O�r�0�5�0���u���y�5�u 0�5��u�+�2�u�����n�x®��u�O�$ O¦{�&�.���u¹Á�³�¨���$�E±�Â��$�´¦M�0�§����£u��¤C�5¥V�5���u�+�u¤¿�������¯�u�¨��£:�z�$���´¦M�>�s����n�u�_¥4�&���$�´¦M�$�5�E�n�2�0�:¶  ��¨�O���0�n�0�¢�]����{ ���� ¸!�´¦M�G�u���v¸2���4�]¤{¥M�]�O�u�$�0�¢�0�n y���¢¥V�0�4�2�0�V���u��´¦M�%�u� ¸2¤i�G�r®e�´¦M�e�´¦{���{ ��8�]�4�À©º�4�u¤¯�¨���!�G�n�: 0�n�­�u�s 0�u¤v£E���{¸»�� ��: ������Ã�½®�n¬��V�]�x���]�M y�O�]��¦V���y�´¦M�]� 0�u¤¿£>��� ¸s�]�³Ä@Å.ÆÀ±

Ç ÈSÉ!ÊHËÁÌ­ÍÎsÏQÊÁÐ�ÌGÉ

q WM-«j!/2kOAC/2[!/]ÑSDe35/{BnZ�-�kÁAKLºJ&-0kO-035j!Ai[V-EJ!h{6GDe[{XMjGh<-E3º/]ѯÒVD235D2j�-0kO-E35LÁD]14->B�kOAC[Mbek5WM-ÀÑ´/23nB�->L@D2B�kOAC[Mby/2[$Aik5L@h</&J&6�ÓÁ,.AC|2-E[D.kOACj!-_WMAKL½kO/2356%/2ѯkOWM-_D2Ivk5Avk5XVJ&-«/2ѯkOWM-Ô3O/&BnZ2-0kE=]ÕQ-«ÕQ/2XMIKJ�IiACZ2-«kO/y-EL½kOACj$Duk5-ÀkOWM->L½-_ÒVD]3nD]j!-�k5-03nL0=�D][VJ!XMIikOACj$Duk5-0IC6GÒM35-EJMACB�kBnWVD235D�B�kO-E3OAKL½kOAKB0LS/2ÑVk5WM-�3O-Ej!D2Ai[<J&-03H/]ÑMk5WM-QÖVACb2W kEÓ p kÁACL×J&->L½AC3O->Jyk5/ÔWVD:|�-QD2[GD2Iib�/235Avk5WMjØkOWVD]k×->Lrk5Aij$D]kO-ELtk5WM-ELO-�ÒVD]3nD]j!-�k5-03nL/u|2-E3@k5WM-_Ñ´-EÕ8->Lrk�[{XMj%h4-03�/2ÑYJMDuknD%Ò4/2AC[ k5LºkO/�h<-eD]hMIC-ÔkO/%ÒV3O->J&ACB�k8k5WM-eD]IikOAikOXVJ&-�=&|2-EIi/&B�Aikr6$D][VJ»D2BEB�-0IC-03nDuk5Ai/�[$XM[ kOACI¯j�/2kO/�3hMXM35[M/�X&kE=VD2[VJzkO/!WVD:|2-yD2[»->Lrk5Aij$Duk5-y/]ÑtkOWM-�D�B0B�XV35D�B�6$ÕÀAikOWsÕÀWMAKBnW»k5WM-06»j$D:6$h<-yÒM35-EJMACB�kO-EJSÓ

ÙÁÚ½Ù ��Û«Ü��ÀÝ@ÞV߯ÜtàâáÁß4ã&äOå«æçݺå�áè��ÝÀßMéºÜêãq WM-_Ñ´/23nB�-ELº/2[zD%3O/&BnZ�-�k8D23O-Àk5WM35XVLrk>=&J&35D2bV=�IiAiÑ�kE=MD2[VJ$ÕQ-0ACb2W k.ë ì>í¢Ó×FV/238kOWMAKL8ÒM3O/�hMIC-0js={Õ8-«ÕÀACICI¯D�LOLOXMj!-«-09{/2î¢D]kOj!/�LOÒMWM-E3OAKBÖVACb2W kÀ/2[MIC6»LO/�kOWM-yICAiÑ�k_D][VJ�J&3nD]b�Ñ´/23nB�->L8ÕÀACICISh<-yïE-035/VÓq WM-.Ñ´/23nB�-yJ&XV-ekO/�kOWM-yÕQ-0ACb2W k«ðHñ³AKLÀb2AC|2-E[zh{6

ðHñ�ò(ó»ô4õ ör÷:ø

ÕÀWM-E3O-eóùAKLQkOWM-yj$D�LOL�/2Ñtk5WM-y3O/&BnZ�-�kÀD][VJzô­ACLÀkOWM-eÑ´/�35B0-eJMXM-ekO/!b23nD:|{Avkr6�Óq WM-Àk5WM35XVLrkQACLºkOWV-«Ñ´/23nB�-_ÕÀWMACBnWzj�/u|�-EL@kOWM-Ô3O/&BnZ2-0kÁÑ´/235ÕQD235JSÓ p k�ACL@kOWM-Ô3O->D2B�k5Ai/�[!Ñ´3O/�j } -0Õ�k5/2[Sú L q WVAi3nJ­RYD:Õ+k5/�kOWV-

D2BEB�-EIi-E35D]kOAC/2[z/2ÑYk5WM-y-0[Mb�Ai[V-2Ó q WM-yb�/u|2-035[MAC[Mb%->û XVDuk5Ai/�[zÑ´/�3�kOWM-eÑ´/�35B0-eJMXM-ekO/�kOWV3OXVL½k_ðHüaACLÀb�Ai|�-0[»h 6

ð ü ò(ýEþ�ÿ��ó�ô � õ öxì�øÕÀWM-E3O-��ó ACL_kOWV-%j$D2L5L�ÖV/uÕ 3nDukO-�=Vô � ACL«kOWM-Gb23nD:| Aik5D]kOAC/2[VD2ISB�/�[VLrknD][ kE=4D][VJ�ý0þ ÿ�ACL«kOWM-�L½Ò4-EB0A��<B%Aij!ÒMXVICLO-2=4j!-ED2LOXM35-EJsAC[LO-EB�/�[VJMLEÓ q WM-�LOÒ4-EB�A��<B�ACj�ÒVXMICLO-!BnWVD]3nD2B�kO-035ACï0-EL«kOWM-�-��­B0Ai-E[VB�6�/2Ñ@kOWM-!LO6{L½kO-Ejs=êAxÓ -2ÓyAikyACLÔkOWV-G3nDuk5Ai/»/]Ñ@k5WM-$D]j!/2XM[ k./]ÑkOWV3OXVL½k«ÒM35/&J&XVB�->J­k5/$Õ8-EAib�W k8ÖV/uÕ�/]ÑHkOWM-yÒV3O/�Ò<-EIiIKD][ k5LEÓ

� ��� ��������� ���������! �"#� ���$���%�& ('�)����*�#���+ ��� ��������� ��������,-���$��.$�& / ��� ��������� �������102�$"#��3%�4 025�"& ����6 �%� ��������� ���7�8 �:9<;$�������:��)�������� ��5%���6���=02��>@?����$"#�$.��A$ ��� ��������� ������� 8 �$����?B�C ����$"#�#�D ('E�F?D �GH��"JI �#"#"

÷

Page 61: Industrial Mathematical and Statistical Modeling Workshop for

FHAib�XM3O-!÷�l×FV3O-E-yP8/&J&6��_AKD]b�35D2j /]ÑÁFM/�35B0-EL�D2B�kOAC[Mb!/2[sD��À/&BnZ2-�k>=&/2h&knD]AC[M-EJzÑ´35/2j k5WM- }�� U � ÕQ-0hsL½AikO-­ë ì>í¢Ó

q WM-ek5/]k5D2I¯Ñ´/23nB�-yD2B�kOAC[Mb$/2[»kOWV-�3O/&BnZ�-�k�ACL�k5WM-%LOXMj /2Ñ×D2IiI¯k5WM-yÑ´/�35B0-ELÀD2B�kOAC[Mb!/2[�kOWV-y3O/&BnZ2-0kE=MD2[VJ­k5WM-%D�B0B�-EIi-E35D]kOAC/2[B0D2[»h4-eÑ´/2XV[VJ�h 6»J&Ai|{AKJ&AC[Mb!kOWMAKL�k5/]k5D2I¯Ñ´/23nB�-eh{6$kOWV-yAi[VL½k5D2[ k5D][V-0/2X<L�j!D�LOL�ó ��� �ó�

ò�� ðó òýEþ�ÿ �ó+ô �ó ��� �ó�� � ô¯õ ö� ø

ÕÀWM-E3O-Àó � ACLÁkOWM-ÀAC[MAikOAKD]IMj$D2L5LÁ/2Ñ<k5WM-À3O/&BnZ�-�k@Duk@IKD]XM[<BnWS=2D][<J��ó�ò���ó������HAKL×kOWM-À3nDuk5-�/]ÑêBnWVD][Mb�-�/]ѯj!D�LOLºD2LHkOWV-À3O/&BnZ2-0kÖVAC-ELEÓ q WM-»j!D�LOLGBnWVD][Vb2-EL%D2Lyk5WM-z-E[Mb2AC[M-»-�9&Ò<-EICL%-�9&WVD2XVLrk�b�D�L½->L�J&XM35Ai[Vb�kOWM-zkOWM35XVL½kOAC[Mb�Ò4-035AC/{JS=×D2[VJâAKLGD�B�/2[<LrknD][ kû XVD][ k5Avkr6­XV[�k5AiISkOWM-y-E[Mb2AC[M-ehMXM35[VLÀ/�X&kEÓFM/�3.D»IC/2[Mb�35D2[Mb2-G3O/&BnZ�-�kE=4k5WM-G35/&BnZ2-0kÔAKLe/2X&knL½AKJ&-�/]Ñ@k5WM-$?ºD]3OkOWYú L.Duk5j�/ L½ÒVWM-035-�Ñ´/�3.j!/ Lrke/]ѺAik5L.Ö<Aib�W�k>=SL½/�ÕQ-GB0D2[

B�/�[VLOACJ&-E3�kOWM-�D]kOj!/�LOÒMWM-E3OAKBÔ-�1¯-EB�knL�kO/!h4-y[M-0b�IiACb2AChMIC-2Ó

ÙÁÚ�� ��Û«Ü�#�á����yÜSÞMã����Ôä��_á×åÔß¯Ü � �«à�ã{Ü!�,.AC|2-0[+k5WM-AC[MAvk5ACD2I_D][<J �<[VD]I«Ò4/�LOAikOAC/2[VL$/2ÑeDa35/&BnZ2-0k$D2Ii/�[MbaÕÀAvk5W+kOWM-�k5/]knD]IÀk5Aij!-J&->L½AC3O->J§kO/akO3nD:|2-EI�kOWMAKL­J&AKL½k5D][<B�-2=RYD2j%h4-03OkEú LÀÒM35/2hMIC-0j ACL�k5/ �V[VJ�k5WM-�AC[MAvk5ACD2IS|2-EIi/&B0Avkr6�L½/�kOW<Duk«k5WM-y3O/&BnZ�-�kÀÕÀAiICIYAC[VJ&-E-EJ�-0[<JsXMÒ�DukÀkOWV-B�V[VD2ISÒ</ L½AikOAC/2[�D]kkOWV- �V[VD]I¯k5Aij!-yÑ´35/2j AvknL«LrknD]3OkOAC[Mb�Ò</ L½AikOAC/2[âëi÷�í¨Ó`�-B�/2[<L½AKJ&-03!k5WMAKL$ÒM35/2hMIC-0j Ai[+kOWV-B0D2LO-�/]Ñ.kOWV-FtIKDukz?8D]3OkOW o /&J&-0I#"Àk5WVDukzAKLE=8ÕQ-�Aib�[M/235-�Ñ�D2B�kO/23nL$LOXVBnW+D�L!kOWV-

B�XV3O|uDuk5XM35-Ô/2ÑSkOWM-y?8D]3OkOWS=&D][<J»B0/2[VLOACJM-038kOWM-.krÕQ/]î�J&Aij!-E[VL½AC/2[<D]I¨=�Ò4/2AC[ k�j$D�LOLQj!ACL5LOAiIC-Ôb�-0/�j�-0kO3562Ó p [­k5WMAKLÀB0D2LO-2= k5WM- �V[VD]IÒ4/�LOAvk5Ai/�[VLyö%$'&Võ)(*& øQ/]ÑYk5WM-y35/{BnZ�-�kÀD]35-eb2AC|2-0[�h{6

$+&Gò�$tö��½ø!, �$.-�ö���& � �½ø�õ ö%/{ø( & ò�(¯ö��½ø0, �( - ö%� & � �½ø �

÷ì ô_ö�� & � �½ø N ö�1�ø

ÕÀWM-E3O-aö �$+-yõ �(�-«ø�D23O-zkOWM-�J&->L½AC3O->J³|�-0IC/{B0Avk5Ai->LGÑ´3O/�jµRYD]jGh<-E3½k>ú L�ÒM35/2hVIi-Ej�Ó�U{/2IC|{Ai[VbÑ´/23!k5WM-�J&-ELOAC3O->J³|2-EIi/&B0Avk5Ai->L�Ñ´35/2j-Eû XVD]kOAC/2[�1Gb2AC|2-EL

�$+-(ò $'& � $tö%�½ø� & � �õ ö�2 ø

�( - ò (*& � (êö%�½ø��& � � �

÷ì ô_ö%� & � �½ø ö#32ø

ì

Page 62: Industrial Mathematical and Statistical Modeling Workshop for

��������� ���� �0�� � �S��� �������p [aÒM3nD2B�k5ACB0-2=YkOWM-zD�B�k5XVD]I@|�-0IC/&B�Aikr6/]ÑQkOWM-z3O/&BnZ2-0keJM/ ->Ly[M/2k�XVL½X<D]ICIi6�j!D]k5BnWaXMÒâÕÀAikOW�kOWV-zJ&-ELOAC3O->J�|�-0IC/&B�Aikr6�Ñ´/2XM[<J�h{6LO/2IC| AC[Mb�RtD]jGh<-E3½k>ú LGÒM35/2hMIC-0jsÓ�U&Ai[VB0-zÕQ-»J&/�[M/]k�WVD:|2-zAij!ÒMXMIKLOAi|�-zj!ACL5LOAiIC-EL�ö´j!AKLOLOAiIC-EL%kOW<Duk�BED]["LOÒ4-0-EJ³XMÒ§D2[VJ³LOIi/uÕJ&/uÕÀ[zD]kQÕÀACICIKø�=&Õ8-Ô[M-0->J$kO/!D2J��rXVLrkQkOWV-eL½kO-E-035Ai[VbGLO/�kOW<DukQkOWM-.3O/&BnZ�-�kºk535D:|�-0IKL@/�[­kOWV-.J&->L½AC35-EJ­B0/2XM3nL½-*" kOWMAKLQACLQZ{[M/uÕÀ[zD�LRYD2j%h4-03Ok_,.XMACJVD][VB0-2Ó`�-%AC[�k53O/&J&X<B�-�k5WM-�-0353O/�3�����h4-�krÕQ-0-E[�kOWV-GD2B�kOXVD2It|�-0IC/&B�Aikr62=4/23.j�AKL5L½ACIi-%kOWM35XVL½kÔ|�-EB�k5/23>=4D2[VJsk5WM-GJ&->L½AC35-EJ�|�-0IC/{B0Avkr6

Duk«D�k5Aij!- �����ò�� � � - ö�� ø

ÕÀWM-E3O-���òèö2�$Hõ!�(&øQACLÀkOWM-y|�-0IC/{B0Avkr6$/2Ñtk5WM-yj!ACL5LOAiIC-2=MD2[VJ���-�òèö �$+-yõ �(�-«øQAKLQkOWV-�J&-ELOAi35-EJ»|2-EIi/&B0Avkr6�ÓFM/�3_/�XM3_35/&BnZ2-�kÀk5/zIKD][VJ�Duk_k5WM-GJM-ELOAi35-EJsÒMIKD2B0-%DukÔkOWM-�J&-ELOAC3O->JskOACj!-2=¯Õ8-%[M-0->JskO/�D]ICAib�[skOWM-�D2B�k5XVD]IH|2-EIi/&B0Avkr6�ÕÀAvk5W

kOWV-�|2-EIi/&B�Aikr6$k5/$h<-yb D]AC[M-EJê=VAxÓ -2ÓC=MÕQ-yÕQD2[�kÀk5/$j$D]Z2-����!ò ��/23«kO/­LO/2j!-yL½X2�­B0Ai-E[�k5Ii6�L½j$D2IiIS|uD2IiXM-�Ó q WM-E[S=Mk5WM-y-0[Mb�Ai[V-B�XMk5LÀ/21�D][VJ»kOWV-yj�AKL5L½ACIi-.ÖVAC-EL«hVD]ICIiAKL½kOAKB0D]ICIC6$kO/!Aik5LÀAC[ kO-0[<J&-EJ»k5D23Ob�-�k>Óq /%D]ICAib�[!kOWM-Ô|2-EIi/&B�Aikr6�k5/�k5WM-ÔJ&->L½AC35-EJ!|2-EIi/&B0Avkr6�=]kOWV-ÔJ&AC3O->B�k5Ai/�[$/]ÑêkOWV-_35/{BnZ�-�k>ú L@D�B0B0-0IC-03nDukOAC/2[$LOWM/2XVICJ!h4-_Ò4/2AC[ kOAC[Mb%AC[

kOWV-GJ&AC3O->B�k5Ai/�[�/]Ñ×kOWMAKLÔ-0353O/�3����<Ó q WV-�j$D]b2[VAvk5XVJ&-%/2Ñ×kOWM-�D2B0B0-0IC-03nDuk5Ai/�[ �!#"%$ ACL_Z{[M/uÕÀ[sÑ´3O/�j k5WM-GJMAi|{AKJ&Ai[Vb$kOWM-%kO/2k5D]IÑ´/23nB�->L�h{6$kOWM-yAC[VL½k5D2[�knD][M-E/2XVLQj$D2L5LE=MD2L�AC[ VÓ

'& ò �!(")$��� &* ��� * ö�+ ø

', ò !#"%$�-� ,* ��� */. ö½÷�� ø

ÙÁÚ10 ��Ý �32 �ÀãMä½å_æèãMÛ«Ü �µäOà à{ä54rÜ76�àâ��ÞMá�8 ÜYß4ã&Ý@Þ � �_å �«ÜSÞ�#�á����yÜYÞ&ã(���_ä)�«áÁå_߯Üq /zB�/�j�ÒVX&kO-%kOWM-%kO3nD��r-EB�kO/�3O6­/2ÑÁkOWM-Gj!ACL5L½ACIC-�XVLOAi[Mb»RYD]jGh<-E3½ke,.XMACJVD][VB0-2=4Õ8-�h4-0b�Ai[�ÕÀAvk5WAi[VAvk5ACD2ItJMD]k5Dö�$ � õ�( � õ �$ � õ �( � ø�=�V[VD2IºJMD]k5Daö%$'&Võ)(*&Võ���& ø�=tLOÒ4-EB�A��<B!ACj!ÒMXMIKL½-zýEþ�ÿ�=tkOWV-$Ai[MAikOAKD]Iºj!D�LOL�/]ÑQkOWM-$35/&BnZ2-�kyó � D][VJ�k5WM-$35D]kO-!/]ÑÀBnWVD2[Mb2-!/]ÑQkOWV-j$D2L5LÔJMXM3OAC[MbzkOWM-GkOWM35XVL½kOAC[MbzÒVWVD2LO- �ó�Ó q WMACL.3nDuk5-�/]Ñ8BnW<D][Mb�-�ACLeB0/2[VL½k5D2[�k>=4D�L«k5WM-�3O/&BnZ2-0kÔ-09{Ò4-0IKL.L½Ò4-0[ k.Ñ´XM-0I×ICAi[M->D]35Ii6J&XM35AC[Mb�kOWMAKLÀÒ4/23OkOAC/2[�/]ÑHkOWM-eÖVACb2W k>Óq /»B�/�j!ÒMX&kO-�k5WM-GÒ4/�LOAikOAC/2[�D2[VJ|�-0IC/&B�Aikr6�/]Ñ@k5WM-�kO3nD��r-EB�kO/�3O6sDuke-ED�BnWkOACj!-�L½kO-EÒ�/u|�-03_kOWM-$J&XM3nDuk5Ai/�[/]Ñ8kOWM-GÖVACb2W kE=

D][:9 �Ô?+LO/2IC|2-03QAKL�Aij!ÒMIC-0j!-0[ k5-EJ»/2[ o �Áq R � P D]ÒMÒMICAC-EJzkO/GkOWM-e|2-EIi/&B0Avkr6­D2[VJzD�B0B�-EIi-E35D]kOAC/2[zLOXMh;�r->B�kQkO/!WM/�j�/�b2-E[M-0/�XVLAC[MAvk5ACD2IÀB�/2[<J&Avk5Ai/�[VLEÓ � Ñ�kO-03!->D2BnW³AC[ kO-0b�35D]kOAC/2[³L½kO-EÒS=ÁkOWV-�J&->L½AC35-EJâ|�-0IC/{B0Avkr6�/]Ñ_kOWM-�3O/&BnZ�-�k%AKL%/�h&k5D2Ai[V-EJ³AC[§DÑ´-0->J&hVD2BnZÑ�D2LOWMAC/2[/u|�-03_kOACj�-GXM[ kOACI7�-�sAKLeL½X2�­B0Ai-E[�k5Ii6�LOj$D]ICIxÓ q WM-�k535D<�r-EB�kO/2356�-0[VJMLÔÕÀWM-E[ ( & ò=�M=¯/23Ô-Eû XMAC|uD]IC-0[ kOIC62=¯ÕÀWM-E[kOWV-35/{BnZ�-�k�ICD2[VJMLEÓ

> ?A@CBED ËÁÌ�F:GHB�I

,.AC|2-0[­kOWM-.ÒVD]3nD]j!-0kO-03nL«ö�$ � õ�( � õ �$ � õ �( � õ�$ & õ)( & õ)� & õ5ý þ ÿ õ½ó � õ��ósø�={Õ8-.B0D][­B0/2j!ÒMX&k5-_k5WM-«k535D<�r-EB�kO/2356�/]ÑêkOWV-_35/{BnZ�-�kºXV[VJ&-03RYD2j%h4-03Ok_,.XMACJVD][VB0-2ÓFM35/2jkOWMAKL­Z{[M/uÕÀIC-EJMb2-2=8kOWV-ÒM35/2hMIC-0j h<->B�/2j!->L$WM/uÕÃkO/³B�/�j!ÒMX&kO-kOWM->L½-Ò<D]3nD]j!-�k5-03nL!b2AC|2-0[+AC[&Ñ´/235j$DukOAC/2[+Ñ´35/2j

�rXVL½k.DzÒMAi->B�-G/]Ñ×kOWM-G3O/&BnZ�-�kEú LÀk535D<�r-EB�kO/2356»JMXM3OAC[Mb$k5WM-�k5WM35XVLrk5Ai[VbzÒMWVD�L½-%/]Ñ@Aik5L_ÖVAib�W kE=¯D][VJsk5W{XVLÔB0/2j!ÒMX&k5Ai[Vb$kOWM-�-0[ k5Ai35-kO3nD��r->B�k5/23562Ó q WM-�JMD]k5D�b2AC|2-E[»AKL�kOWV-yÒ</ L½AikOAC/2[�D][<J­k5Aij!-�/2ÑYkOWV-�ÒMAC-EB�-e/2Ñtk5WM-ekO3nD��r->B�kO/�3O6zZ [V/uÕÀ[�J&XV3OAC[Mb�kOWM-ek5WM35XVLrk5Ai[VbÒMWVD�L½-�/]Ñ8kOWM-$35/&BnZ2-0kEú L«ÖVACb2W k­ö%$�J½õ)(KJrõ)�LJ¨øNMJPO ; DukRQ"Ò4/2AC[ k5L.Ai[�k5WM-�kOACj�-!AC[ kO-035|uD]I«ë � � õ)�5Sní¢ÓT_-035-2=SÕQ-!B�/2[<L½AKJ&-03.k5WM-!kOACj!-AC[�k5-035|uD]I×h<-Eb2AC[M[MAC[Mb�Duk �ÔòU�»D2[VJ�-E[VJ&AC[MbsDuk�L½/�j!- �V[<D]I×kOACj!- �5SWV �NXZYu=SÕÀWV-035- �NXZYGACLÔk5WM-!hMXM35[M/2X&kek5Aij!-!ÕÀWM-0[�kOWV--0[Vb2AC[M-yB�X&knLÀ/2X&k>=M35-EJ&XVB0Ai[VbGk5WM-y[{XMj%h4-03À/2ÑtÒVD235D2j!-�kO-E35L8k5/$[M/]kÀAC[VB�ICXVJ&-ek5WM-yAi[VAvk5ACD2IYB�/�[VJ&AikOAC/2[VLyö�$ � õ�( � ø�Óq /­B�/2j!ÒMXMkO-�k5WM-ELO-�ÒVD235D2j!-�kO-E35LE=MÕQ-ykO356sJ&Av1¯-035-0[ k_j!-0kOWM/&JMLÔD][VJB0/2j!ÒVD23O-e35-ELOXMIvknL0ÓÀ`�-�XVLO-%JMD]k5D!Ñ´35/2j�J&Ai14-E3O-E[ k/2[V-�î�LrknD]b2-.35/{BnZ�-�knL.ë :íSk5/!kO->Lrk«/�XM3�j�-0kOWM/&JMLEÓ

��Ú½Ù ��Ü ���Ôß4ãMärݺå Ý7[­�sá×ÞMá��çÜêã&ÜSÞVàFM35/2j /2hVLO-035|uDuk5Ai/�[+/2ÑÔk5WM-\9 �.? LO6{L½kO-EjµÑ´/�3­XMÒ¯JMDuk5Ai[Mb³kOWM-35/&BnZ2-0k$kO3nD��r-EB�kO/�3O6�=8ÕQ-sÑ´/2XM[<J+kOWM-j$D2b2[MAikOXVJM-/]ÑekOWV-D2BEB�-EIi-E35D]kOAC/2[»BED][�h<-�JM-EB�/�XMÒMIC-EJzÑ´3O/�j k5WM-�L½6&L½kO-Ej�=&A¨Ó -2Ó

!#"%$ òý0þ ÿQô] � �

ö½÷2÷:ø

Page 63: Industrial Mathematical and Statistical Modeling Workshop for

� � ��� ������� � �� �� �� �� ��� �� �����p 3nD][MAKD][�U{WVD2WVD]hMî5÷ �K���Ô÷���T � M÷ ì� K� 1K+K�K� 3 2 1*��2� ÷��K�} 9_î �R9 } ,Ôî�P �K���Ô÷���T � M÷ ì�2K+ ÷�+ �K� � ÷��K� ÷��K�U{W<D]WVD2h&î 1 �K���Ô÷���T � M÷ ì� K� 1�1K��/'� 3 2 1*��2� ÷�÷>ìU{WVD2WVD]hMî 1 � �K���Ô÷���T � M÷ ì� K� 1�1K�*1�2 3 2 1���2� ÷�÷>ì} 9_î ��9 } ,Ôî � �K���Ô÷���T � M÷ ì� K� ÷ 1K�K� � 3 2 1���2� ÷�÷ 1

q D]hMIC-!÷2l g D235D2j�-0kO-E35LºÑ´/23��@D23OAC/2XVL �À/&BnZ�-�k5Lp [VAvk5ACD2It,.XM-EL5L q 35XM- g D]3nD]j!-�k5-03nL 78/2j!ÒMX&k5-EJ g D]3nD]j!-�k5-03nL �À-0IKDuk5Ai|�-y?@353O/�3

ýEþ�ÿ ÷�� � ì� K� ì2ì<+VÓ + +K+ +�2K� V � . ÷��o ÷�� � +K� . 3� *1 +K�VÓ 32ì�1<�*2*1 V � . ÷��

q D]hMIC-�ì&l q 3OXM-yD2[VJs78/�j!ÒMX&kO->J g D]3nD]j!-�k5-03nLºh{6 o -0kOWM/&J�/]Ñ��À->J&XVB�k5Ai/�[

D][<J�LO/!h{6zB�XM35|2- �MkOkOAC[Mb!kOWV-�D2B0B0-0IC-03nDuk5Ai/�[S={Õ8-yÕQ/2XMIKJ»h<-�D2hMIi-ek5/!-EL½kOACj!D]kO-(�rXVL½k%ö�ýEþ�ÿ«õ ] ø �V3nLrk«ÕÀAikOWV/2X&kÀk5WM-�JMDuknD�/]ÑkOWV-_35/{BnZ�-�k>ú L �<[VD]I<kOACj!-.D2[VJ!Ò</ L½AikOAC/2[YÓ×~ÔL½AC[Mb !#"%$ ò N& ,³ö �, ,�ô&ø½Nu={Õ8-ÔB�/2j!ÒMXMkO- !#"%$ Ñ´3O/�jçkOWM-_kO3nD��r-EB�kO/�3O6G/]ÑêkOWV-35/{BnZ�-�k>=2XVLOAi[Vb�kOWV- �V[MAikO-eJ&Ai14-E3O-E[VB�-Ô-EL½kOACj$DukO->LÁÑ´35/2jçkOWV-_Z{[M/uÕÀ[­JMD]k5Dyk5/%-09{ÒV3O->LOL �!#"%$! = kOWM-.D2B0B0-0IC-03nDuk5Ai/�[�j$D2b2[MAikOXVJM-B�/�3O35-ELOÒ4/2[VJ&AC[Mb�ÕÀAvk5W�JMDuknD�D]k�kOACj�- � J Óp [�/]kOWV-03ÀÕQ/23nJML0=&ÕQ-eÕQD2[ kQk5/ �V[VJ#"Ãò�öxý þ ÿ õ ] ø8k5WVDuk«j!AC[MACj�ACï0->L

ð$ö�ý þ ÿ õ ] ø×ò N& ,(ö , ,âôMø N �ýEþ�ÿÀô] � � .

ö½÷>ì�ø

q WM->L½-eÒVD235D2j�-0kO-E35L�B0D2[zk5W{XVLÀh4-�B�/2j!ÒMXMkO-EJ�AC[VJ&-EÒ<-E[VJ&-0[ k«/2Ñtk5WM- �V[VD2ISJMDuknDG/2Ñtk5WM-y35/{BnZ�-�k>ú L8kO3nD��r->B�k5/23562Ó`�-_B�/�j�ÒVX&kO->Jö�ý0þ ÿÀõ ] ø×h{6!XVL½AC[Mbyk5WM-_[VIiAC[2�Mk�B�/2j!j$D][<J!/2[ o D]kOIKD]hSÓ q WM-«35-0IKDuk5Ai|�-«-03535/23Á/]ѯk5WM-_ÒVD235D2j!-�kO-E35L@ÕQD�L

IC-EL5LQkOWVD2[�÷$�z=ML½-E- q D]hVIi-%ìMÓ

% &(')'sËÁÌ+*yÏ @ Ç-,=DèË B_ÍÐ�ÏQÊ×ÐEÉ#.0/ @zλÊ2143¡Ì65�É ?çÐ�I B

T_D:|{AC[MbGD2[z->Lrk5Aij$Duk5-_Ñ´/�3yö�ý0þ ÿÀõ ] ø�={Õ8-eLO-ED]3nBnW$Ñ´/�3QÕ�D:6&LÁk5/GÒM35-EJMACB�kQk5WM- �V[VD]IêJMD]k5D�ö%$'&Võ)��&�ø�Ó��À-EBED]ICIx=&XM[<J&-03ÀRYD2j%h4-03Ok,.XMAKJMD][<B�-2=�kOWV-_|2-EIi/&B0Avkr6GACLºXMÒ4JVDukO->J$AC[zDeÑ´-0->J&hVD�BnZ�Ñ�D2LOWMAC/2[!k5/%D]ICAib�[$Ai[»J&Ai35-EB�kOAC/2[!/]ÑYJM-ELOAi35-EJ�|2-EIi/&B0Avkr6�h{6�D�B0B�-EIi-E35D]kOAC[MbkOWV-G35/&BnZ2-�k.Ai[�kOWM-!J&AC3O->B�k5Ai/�[/]Ñ@k5WM-�-0353O/�3 ���sò=� � ��-e=SD][VJ�ÕÀWV-0[\����AKLÔLOX2�­B�AC-0[ k5Ii6LOj$D]ICIx=4kOWV-�-E[Mb2AC[M-�B�X&knLÔ/21HÓU{AC[VB�-�k5WMACLe|uD2IiXM-!AKL./2h&knD]AC[M-EJ�Duke-ED�BnWkOACj!-$Lrk5-0Ò�XVL½AC[MbsXMÒ¯JMDuk5-EJ�Ai[MÑ´/235j!D]kOAC/2[�Ñ´3O/�j�kOWM-$ÒM35AC/23ÔkOACj�-$L½kO-EÒS=SAikeAKLe[M/2kAC[�k5XMAikOAC|2-.kOWVD]k�kOWMAKL�|uD]ICXM-yB�/2XVICJ»h<-ej!/&J&-0IC-EJ�D][<J­XVLO-EJ»/u|2-E3ºk5WM-ekO3nD��r->B�kO/�3O6�Ó�T_/uÕ8-E|2-E3E= Ii/{/�Z AC[Mb�DukÀÒMIC/]k5L�/2Ñ ���$/u|2-E3kOACj!-�Ñ´/�3Ô|uD]35AC/2XVLÀ35/&BnZ2-0k_ÒVD]3nD]j!-0kO-03nL«LOWM/uÕ«LÀk5WVDuk.Ò<-E3OW<D]ÒVL_AvkyB�/�XMICJsh4-�j�/&J&-EIi->Jû XVD2JM35D]kOAKB0D]ICIC62=V/�3_h{6sL½/�j�- �Mk½k5Ai[VbÒ4/2IC6 [V/2j!ACD2IxÓq WM-eAKJ&-ED�Ñ´/�3�kOWMAKL«D]ÒVÒM3O/ D2BnW»ACL�k5/$Ai[ k5-035Ò</�ICD]kO-.kOWM-y-E3O35/23 ���4=<D][<J­k5/$XVL½-ek5WM-yAC[�k5-035Ò</�ICD]kOAC[Mb�Ñ´XM[VB�kOAC/2[�kO/$ÒM35-EJMACB�k

B�XMk./21âkOACj�-!D][<JIC/{BEDuk5Ai/�[SÓ�FM35/2j�k5WMACL.AC[&Ñ´/235j$Duk5Ai/�[S=¯Õ8-!B0D2[�ÒM35-EJ&AKB�keÕÀWM-E3O-!D][VJÕÀWM-0[�kOWM-�35/{BnZ�-�k.ÕÀAiICI×ICD2[VJ�Ñ´35/2jhVD�L½AKB.ÒMW{6{LOAKB0LEÓFM35/2j kOWV-.JMD]k5D­ö�$ J õ�( J õ�� J øLMJ O ; = ÕQ-ÔBED][Sú kQD2ÒMÒM35/:9&Aij$Duk5-�ö1��� &� õ ��� , õ�� J øLMJ O ; -�9MD2B�kOIC62= h<->B0D]X<L½-ÔZ [V/uÕÀAi[Mb-���$D2LºÕ8-EIiID2L_Z{[M/uÕÀAi[Vb �»ÕQ/2XVICJ�j!->D][ÕQ-2=¯D2LÔD][/�hVL½-E3O|�-03>=MÕQ/2XMIKJ�-�9MD�B�kOIC6�Z [V/uÕ � - =Vk5WM-�RtD]jGh<-E3½k6�Á-EIi/&B0Avkr6�=4ÕÀWVACBnWÕQ-�WVD:|�-

[M/!Õ�D:6­/]ÑHZ [V/uÕÀAi[Mb<ÓÁFtACb2XV3O- !L½WV/uÕ«LQkOWM-�B0/2j!ÒMX&k5-EJ»3O->L½XMIik5LQÑ´/�3ÀkOWV- �V[VD]IêÒ<D]3nD]j!-�k5-03nL8X<L½AC[Mb�kOWMAKLÀj!-�k5WM/&JêÓq /zb2-0k.D23O/�XM[VJ�kOWVACLE=êÕ8-GXVLO-%/]k5WM-03ÔÑ´-ED]kOXM35-ELÔ/]ÑÁk5WM-�ÒM3O/�hMIC-0j kOWVD]kÔÕQ-GJ&/�Z{[M/uÕyÓÔ~ÔL½AC[MbzkOWM-!D]ÒMÒV3O/:9&ACj!D]kOAC/2[sÑ´/�3

D2BEB�-EIi-E35D]kOAC/2[S=HÕ8-�B0D2["B�/�j�ÒVX&kO-»AvknL�JMAi35-EB�kOAC/2[S=×ÕÀWMAKBnW³AKLGD2ICLO/�kOWV-�L5D]j!-»JMAi35-EB�kOAC/2[³DukGkOWM-sB�/�3O35-EB�kOAC/2[a|2-EIi/&B0Avkr6 ���<=D]IikOWV/2XMb�W�Õ8-GWVD:|2-y[M/»ÕQD:6»/]Ñ@Z{[M/uÕÀAC[MbzD2[ 6 k5WMAi[VbzD]h4/2X&kÔkOWM-Gj$D]b2[VAvk5XVJ&-2Ó.FM35/2j -�9&Ò4-035Aij!-0[ knD]ISk53OAKD]IKLE=<ÕQ-%/�hVL½-E3O|�-EJkOW<DukÀkOWVACL«D2[Mb2IC-eACLÀB0/2[VL½k5D2[ kQÑ´/�3«D]ICI¯kOACj!-�J&XM35Ai[Mb!k5WM-.kOWM35XVL½kOAC[Mb$Ò4-035Ai/&JêÓ

� IKLO/V=4Õ8-�J&/zZ{[M/uÕØkOW<DukyD2LÀk5WM-GkOACj�-G/]ÑÁk5WM-%Ö<Aib�W�k.b2/{-EL«kO/»hMXM35[M/2X&kÔkOACj!-2=êhMX&keL½AC[VB0-����D]ÒMÒM35/�D�BnWM-EL«ï0-035/V=<kOWV-D][Vb2IC-�h<-0krÕ8-E-0[sk5WM-%|�-0IC/{B0Avkr6�D][VJ �-�sLOWM/2XMIKJ�D]IKLO/$b2/$k5/­ï0-E3O/<Ó q WV-%[M-09 k.ACJM-ED$AKLÀk5/zB�/�j�ÒVX&kO-%kOWM-G|2-0IC/&B�Aikr6»D2[VJskOWV-J&AC3O->B�k5Ai/�[$/]ѯkOWM-.D2BEB�-0IC-03nDuk5Ai/�[G/2ѯkOWM-.JMDuknDyÒ</�Ai[ knL0={D][VJ�k5WM-0[­kO356%k5/%B�/�j�ÒVX&kO-_D2[$-�9&ÒM35-EL5L½AC/2[�Ñ´/�3@kOWVACL8D][Mb�Ii-Àk5/�Ò4/2IC6 �VkD][<J»D2ÒMÒM35/:9{ACj$Duk5-_k5WM-y3O/{/2k5LQkO/�k5WMACL«-�9&ÒM35-EL5L½AC/2[»b2-�k«hVXM3O[V/2X&k�kOACj!-2ÓT«/uÕQ-0|�-03>=VLOAi[VB0-%LOW XMkÔ/]1�kOACj!-%AKL«[M/2kÔ-09&D�B�k5Ii6�ÕÀWM-0[\�-��-Eû XVD2ICL_ï0-E3O/<=VhMX&keDukeL½/�j�-GL½X2�­B0Ai-E[�k5Ii6�LOj$D]ICISkOACj!-2=4kOWV-

D][Vb2IC-.AKL�[M/2k_L½X2�­B0Ai-E[�k5Ii6»B�IC/�LO-Ôk5/$ï0-E3O/<Ó

/

Page 64: Industrial Mathematical and Statistical Modeling Workshop for

0 50 100 150 200 250 300 3500

200

400

600

800

1000

1200

1400

1600

t

del v

actual delv in x

actual delv in y

computed delv in x

computed delv in y

FHACb2XM35-�ì&l��Á-0IC/&B�Aikr6z-E3O35/23 ���­/u|2-E3QkOACj�-yAC[aö%$Yõ)(MøÀJ&AC3O->B�kOAC/2[<L0Ó

78/2j!ÒMX&k5-EJ � B�kOXVD2I �À-EICD]kOAC|2-y?@353O/�3� þ - 1<+MÓC÷>ì 1K�MÓ 1&÷ � � . �M÷��& M÷>ìMÓ 3<� M÷ ��� . �K�V÷$'& ì<+ +�/ K�MÓ �M÷ �K� �K�K� ��� . �K� ì

q D2hMIC- Mlº78/2j!ÒMX&k5-EJ g D]3nD]j!-0kO-03 �À->L½XMIik5L«XVL½AC[Mb­U{W{X&kOî �Ô/uÕÀ[ g 35-EJ&AKB�k5/23 o -�k5WM/&J»Ñ´/23 �V[VD2ISJMDuknD

0�Ú½Ù � 2 2ÔÞMÝ@áÁßVÛ ��� ��Û_Ü��$ä�ã{ãMä½å«æ ��ÜêãMÛ«Ý��� [�D]IikO-E3O[VD]kOAC|2-%D2ÒMÒM35/�D�BnW»k5/ �V[<J&Ai[Vb­kOWV-G35/&BnZ2-�k>ú L_ÒVD]3nD]j!-0kO-03nL�öxý0þ ÿÀõ �ó�õ½ó � õ)$+&Võ�(*&Võ���& ø_D2[VJsk5WM-%-E[�k5Ai35-�Ö<Aib�W�k.ÒVDuk5WÑ´35/2j JMDuknDe/]ÑSDeÒ</�3½k5Ai/�[!/]ѯkOWM-«kO3nD��r-EB�kO/�3O6%Ñ´3O/�j �@ò��.k5/ �Áò��5S�AKL×k5/ykO35-ED]k@kOWM-_ÒM3O/�hMIC-0j D�L×kOW<Dukº/2ÑêD�[M/2[MICAC[M-ED23ÁIi->D2L½kL5û�X<D]35-EL@ÒV3O/�hMIi-Ejs=�ÕÀWM-E3O-ÔJMDuknDzö%$�J½õ)(KJ¢øLMJ O ; AKL �Mk½k5-EJ­ÕÀAikOWzkOWM-Ôj!/{JM-0I��×ö�ý0þ ÿÀõ��ó�õOó � õ�$+&Võ�( &Mõ)��&Mõ)�½ø×kOW<Duk�ACL8[M/2[VIiAC[M-ED23ºAC[��Óy`�-!BnWV/ / L½-GDz35-ELOACJMXVD]ItÑ´XM[VB�kOAC/2[�� J ö%�½ø_ò��×ö "�õ��½ø � ö%$ J õ)( J ø�=SD][VJkO356�kO/sBnWM/{/�LO--" ò ö�ý þ ÿ õ �ó�õOó � õ�$ & õ)( & õ)� & ø.LO/kOW<Duk�kOWM- �Mk�AKLÀD2L�B�IC/�LO-ÔD�L8Ò4/�L5LOAihMIC-ÔkO/�j�AC[MACj!AiïE-Ôk5WM-yL½XVj /]ÑSk5WM-eIi->D2L½k�L5û XVD]35-EL8/]ÑYkOWM-e35-ELOACJ&X<D]I¨Ó p [»/]k5WM-03�ÕQ/23nJML0= ÕQ-Õ�D][ k�kO/!j!Ai[MACj!AiïE-

� ö "�øºòM�J O ;ö�×ö "õ)� J ø � ö�$ J õ�( J ø½ø N ö½÷ ø

/u|2-E3QkOWM-yÒVD235D2j!-�kO-E35L " òèö�ý0þ ÿÀõ �ó�õOó � õ�$+&<õ)(*&&õ)��&�ø�ÓFHAi3nL½kE=&ÕQ-ekO356$kO/$35-EJ&X<B�-.kOWM-y[{XMjGh<-E3«/]Ñ×|uD]35ACD2hMIC-ELEÓ } /]k5-Ôk5WVDuk.L½AC[VB�-.k5WM-ekO3nD��r->B�kO/�3O6$-E[VJMLÀÕÀWV-0[�kOWM-yj!AKLOLOACIi-yWMAik5LkOWV-�b�3O/�XM[VJê=êÕQ-�Z [V/uÕ�kOWVD]k (*&�ò3�VÓGFVXM3½k5WM-035j!/235-2=êÕQ-�D2ICLO/�Z [V/uÕ�kOWVD]keÑ´3O/�j�kOWM-!j$D]b�[MAvk5XVJ&-!/]Ñ8kOWM-$D�B0B�-EIi-E35D]kOAC/2[J&XM-ek5/�kOWM-ek5WM3OX<Lrk5Ai[Mb!Ñ´/�35B0-2=

!#"%$ ö%�½ø@òý0þ ÿ�ô] � �

õaÕÀWM-035- ] ò ó ��ó . ö½÷ /{ø

U{AC[VB�-«kOWMAKLÁAKLÁkOWV-«/2[MIC6%IC/&B0D]kOAC/2[!/]ѯ/&B0B�XV3O35-0[VB0-�/]Ñêó � D2[VJ �ó AC[!kOWM-ÔB�/2j!ÒMXMk5Duk5Ai/�[!/]Ñ4kOWM-Àk535D<�r-EB�kO/23562=2Õ8-«BED][$B�/�[VL½AKJ&-E3kOWV-0j B�/�XMÒMIC-EJ�D][<J�L½/�Ii|�-yÑ´/23 ] D2Ii/�[M-2Ó#T_-0[VB0-2=4Õ8-�35-EJMXVB�-yk5WM-%ÒV3O/�hMIi-Ej Ñ´/�3_j!AC[MAij!ACï0AC[Mbz/u|�-03À/�[MIi6»Ñ´/2XV3_|uD]35ACD2hMIC-EL "" òèö�ý0þ ÿÀõ ] õ�$+&Võ���& ø�Óq /sLO/2IC|2-GkOWVACLE=YÕQ-$XVLO-Gk5WM-$hMXMACIik%Ai[aLO/2IC|2-03e[MICAC[2�Mk%AC[ o Duk5ICD2hSú L�U{k5Duk5ACL½kOAKB0L q /{/2ICh4/:9¯=êÕÀWVACBnW �MknLy[M/2[VIiAC[M-ED23eIi->D2L½k

L5û�X<D]35-ELÀJMD]k5D!h{6»D­,eD2XVLOL½î } -0Õ�k5/2[�j!-�kOWV/{JSÓ q /$B�/2j!ÒMXMkO-yk5WM-�|2-EIi/&B0Avkr6zD2[VJsD2BEB�-0IC-03nDuk5Ai/�[»/2Ñtk5WM-�35/&BnZ2-�k«J&XV3OAC[Mb!kOWV-Z{[M/uÕÀ[�Ò</�3½k5Ai/�[»/2Ñtk5WM-�JMDuknDM=&ÕQ-eXVL½-yLO-EB0/2[VJ»/23nJ&-03«B0-0[ kO3nD]I �V[VAvk5-�J&Av1¯-035-0[<B�AC[MbVÓ

1

Page 65: Industrial Mathematical and Statistical Modeling Workshop for

FHACb2XM35- Ml g /2IC6 �MkÀ/]Ñ q 3nD��r->B�kO/�3O6!XVLOAC[Mb!kOWM-%U&W XMk½î �_/uÕÀ[ g 35-EJ&AKB�k5/23 o -0kOWM/&J»Ñ´3O/�j kOWV- } 9_î ��9 } ,Ôî¢P�35/&BnZ2-0k

p [VAvk5ACD2It,.XM-EL5L q 35XM- g D]3nD]j!-�k5-03nL 78/2j!ÒMX&k5-EJ�U{/2ICX&k5Ai/�[$+& /��Ô÷��2c 6�Ô÷���c . ÷ 2��Ô÷���c� & ì<+ � M÷ �ìM÷ý0þ ÿ ì<� � ì� K� ì� �o �M÷ +K�VÓ 32ì�1&÷ + �MÓ 3]ì�1

q D2hMIC- /<l �À-ELOXMIik5LÀ|{AKD�j�-0kOWM/&J�÷.Ñ´3O/�j D][ p [MAikOAKD]It,.XV-EL5L

FM/�3ykOWV-­Ai[VAvk5ACD2I8b�XM-EL5L " ò ë /'�K� �K�K�Võ5ìK+K�Mõnì<� �Mõ 1<+K�<��� 2� � 3�2*1uí¨=êAC[³B�/�[�k535D�Lrkyk5/�kOWV-zD2B�kOXVD2Iºj!/&J&-0I8ÒVD]3nD]j!-�k5-03nLë K�K� �K� �Mõ) V÷ Võ5ì� �Mõ 1�+K�<� � 2* � 3 2 1:í¢=�ÕQ-eb2-�k�k5WM-yÒM35-EJ&AKB�k5-EJ»kO3nD��r->B�kO/�3OAC-EL�LOWM/uÕÀ[�Ai[ �Vb2XV3O- /VÓ

� IikOWV/2XMb�W»k5WM-y35-ELOXMIvknL«J&-Ej�/�[VL½kO3nDukO-yD!35-0IKDuk5Ai|�-Ô-E3O35/23À/2ÑÁì � 1�� Ñ´/23�kOWV-%B�/�j�ÒVX&kO->J»|uD2IiXM->LÀ/]ÑÀö%$'&<õ���& ø�=&k5WM-035-�D]35-L½kOACIiI×D$B�/�XMÒMIC-�/]Ñ@ÒM35/2hVIi-Ej!LEÓ 9Ô[M-�ÒM35/2hVIi-EjÃAKLÀk5WVDuk_ÕÀWVAiIC-�k5WMACLÔj$D:6zh4-GD2[D2B0B0-0Ò&knD]hMIC-ekO/�Ii-E35D2[VB�-eÑ´/�3_j$D][{6�ÒMW{6&L½AKB0D2IÒM35/2hMIC-0j$LE=]Aik×AKL×[M/2kÁÑ´/23×3O/&BnZ�-�k5LEÓ q WV-�JMAv1¯-035-0[VB0-�/]Ñ=�<[VD]IMÒ4/�LOAikOAC/2[S=]ÕÀWVAiIC-�/2[MIC6%D]h4/2XMkÁDyì �Ø3O-EICD]kOAC|2-Q-0353O/�3E=uBED][!Lrk5AiICIMh4-kOWV/2XVL5D][VJVL@/]Ñêj!-�k5-03nL0={LO/ykOWMAKL8Õ8/�XMICJ![V/]kºh4-ÔD2[­D2B0B0XM3nDukO-ÀÕ�D:6�k5/%->Lrk5Aij$Duk5-«ÕÀWM-E3O-_D�3O/&BnZ2-0k@ÕÀAiICI4ICD2[VJ!Ai[zÒM35D�B�k5ACBED]ICAvkr6�Ó

� [V/]kOWV-03�ÒM3O/�hMIC-0j�AKL�k5WVDuk!b2/{/&J³35-ELOXMIik5L�J&-0Ò4-0[VJ"/2[§b2/{/&JâAi[VAvk5ACD2IQb�XM-EL5LO-ELE=ÁÕÀWMACBnW§j$D:6�[V/]k!h<-sD:|uD]ACICD2hMIi-zAi[+D35-ED]ICAKLrk5ACBeLOAvk5XVDuk5Ai/�[SÓ

0�Ú�� ��Û«Ü�)(ä��«ä)�«Ü áÁå � ��ݺå�� �_ÜSÞ��$ä�ã{ãMä½å«æç��ÜêãMÛ«Ý��FM35/2jkOWM-D2h</u|�-�J&ACL5B�X<LOLOAi/�[+D2[VJ+j!-�kOWV/{JVL0=QkOWV-[VDuk5XM3nD]I«û XM-EL½kOAC/2[+k5/³D2LOZ³ACL­ÕÀW 6§Õ8-JMACJ&[Yú kzX<L½-sk5WM-�-EL½kOACj$DukO->JÒVD235D2j�-0kO-E35L�Ñ´/�3Göxý þ ÿ õ ] ø«B�/2j!ÒMXMkO-EJAi[->û XVDuk5Ai/�[³÷>ì$D2[VJ�kOWV-0[�LO/2IC|2-ek5WM-�[M/2[MICAC[M-ED23_IC-ED2L½kÔL5û XVD]35-ELÀÒV3O/�hMIi-Ej ÷ !Ñ´/�3kOWV-y3O-Ej!D2Ai[VAi[Mb!ÒVD235D2j!-�kO-E35Leö�$ & õ�� & ø�Óp keACLyB0Ii->D]3.Ñ´3O/�j �Vb2XV3O-�1$k5WVDukek5WM- �Mk½k5Ai[MbsAKL.b2/{/&J-0[V/2XMb�W�[M->D]3.kOWM-!Z{[M/uÕÀ[�JMDuknD�ö´kOWM-!35-0IKDukOAC|2-�-03535/23_AKL!÷����ê;r\

Ò4/2AC[�krÕÀAKLO->ø�=�hMX&k8D�L×kOACj!-«AC[VB�35-ED�L½->L0=uk5WM-«ÒM35-EJMACB�kO-EJ�k535D<�r-EB�kO/2356yÑ�D2IiIKLÁ/]1»Ñ´3O/�j k5WM-_D2B�kOXVD2IM/2[V-2=�D2[VJ�kOWV- �V[VD2IM-EL½kOACj$DukO->JÒVD235D2j�-0kO-E35LÔD]35-�[M/]kÔkOWVD]k.b�/{/{JSÓ p [�k5WMAKLÔ-09MD]j!ÒMIC-2='$ & � ÷ . 2*1K+�� 1!D][VJ � & � ì� �ì . 3�3&=tö�ÕÀWM-035-�k5WM-�D�B�k5XVD]IH|uD]ICXM-EL.D]35- ]-�÷���D][VJ V÷ %Ñ´/23�$+&­D][VJ��&!35-ELOÒ<->B�kOAC|2-EIi6Mø�ÓT«/uÕQ-0|�-03>=]Ñ´/�3QD%h<-0k½kO-E3�Ai[MAikOAKD]I¯b�XM-EL5L_ë � �K�K� �Mõnì<�K�]í¨=�2GL½WV/uÕ«L@k5WVDuk8kOWM- �Mk�ACLQh4-�k½k5-03>Ó@U{XM35ÒM3OAKLOAi[Mb�Ii6�=�6�/2XSú ICI4LO-0-.ÕÀAvk5W

kOWV-�LOD2j�-eAC[MAikOAKD]ISb�XM-EL5LÀD2LQ/2XV3 �V35L½k«/2[V-2={kOWV- �Mk«ÕQD�L�h<-0k½k5-03ÀÑ´/�3�kOWM- �V3nL½k�j!/&J&-EIxÓU k5XVJ&6{Ai[Vb%k5WM-_k535D<�r-EB�kO/235AC-ELºB0/2j!ÒMX&k5Ai[VbGXVLOAC[Mb�k5WMACL�j!/&J&-EIx={ÕÀWMAKBnWzIC/{/2Z$B�/�j�ÒVIi-0kO-0IC6­J&Ai14-E3O-E[ kE={Ii->J$k5/%k5WM-.û XM->Lrk5Ai/�[

/]ÑtXM[MAKû�XV-0[M->LOL "{ACL8kOWV-Ô35/&BnZ2-�kºkO3nD��r-EB�kO/�3O6�XM[MAKû XM-_k5/GAik5LQb�Ai|�-0[zJMD]k5D� q WM-.D2[VLOÕ8-E3E={DuÑ�k5-03�L½/�j�-.D][VD2Ii6&LOACLE={L½WM/uÕQ-EJ$kOWVD]kkOWVACLÀAKL�[M/2kÀkOWM-�BED2LO-2Ó

2

Page 66: Industrial Mathematical and Statistical Modeling Workshop for

0 1 2 3 4

x 105

−2

0

2

4

6

8

10x 10

4

x

yy from dataprediction of yy from computation

0 100 200 300 4000

0.5

1

1.5

2

2.5

3

3.5x 10

5

t

x

x from dataprediction of xx from computation

0 100 200 300 400−2

0

2

4

6

8

10x 10

4

t

y

y from dataprediction of yy from computation

FHAib�XM35- /<lÁ78/�j!ÒVD]35ACLO/2[G/]Ñ g 35-EJ&AKB�k5-EJ �ÔD]k5DyD2[VJ � B�k5XVD]I.�ÔD]k5D.Ñ´/238FtICACb2W k q 3nD��r->B�k5/2356yXVL½AC[MbyFHAvkOkOAC[Mb o -0kOWM/&JêÓ@,.3nD]ÒMW<LJ&-EÒMACB�k_D]IikOAikOXVJM-2=Mb�3O/�XM[VJzkO35X&kOWS=<D][VJ»|2-EIi/&B�Aikr6$Ñ´35/2j IC-�Ñ�k_kO/!3OACb2W k

� ?A@CB�È�G�G!1 DØÌ���B«ÍÉCB���� Ì���Ê @CBµÈSÉ���B_Ë� B DèËÁÌ�F:G B I

q WM-eAC[ |�-03nL½-ÔÒM35/2hMIC-0j AC[{|2-EL½kOACb�D]kO->J$WM-035-./]ÑHB0/2j!ÒMX&k5Ai[VbGÖ<Aib�W�k�ÒVD235D2j�-0kO-E35LºÑ´3O/�j ÖVACb2W k«JMDuknD»ö�$ JOõ�(KJ½õ��LJ¢øLMJ O ; kOXM35[VLQ/2XMkkO/»h<-�ACIiIiî¨Ò4/�LO-EJSÓ�T«AC[ k5L./]Ñ@ÕÀW{6�kOWVACLÔj$D:6�h4-!L½/­W<D:|2-�Ò4/2ÒMÒ4-EJ�XMÒ�AC[�k5WM-�B0/2j!ÒMX&k5Ai[Vb»D]ÒVÒM3O/ D2BnWM->L0=4L½X<BnW�D2L«kOWM-GÑ�D2B�kkOWV-�J&Ai35-EB�kOAC/2[�/]Ñ7���­ACLÀB0/2[VL½k5D2[ kQÑ´/�XM[VJ�AC[sD]k½k5-0j!Ò&kOAC[Mb­D2ÒMÒM35/�D2BnW�÷2Óq /yÒM3O/u|�-ÀÕÀW{6 ����ACL8B�/2[<LrknD][ kE=�Õ8-«IC/{/2Z!Duk@k5WM-_D2[Mb2IC-_DukºkOWM-��V35L½k@krÕQ/ekOACj!-ÔL½kO-0Ò<L0= ÕÀWM-0[��@ò ��D][VJ!ÕÀWV-0[��@ò � �

kO/$J&-E3OAC|2-yD2[»-09{ÒV3O->LOLOAi/�[zÑ´/�3�kOWM-�B0/2[VL½k5D2[ k�D2[Mb2IC-2=MD2[VJ»kOWM-E[�X<L½-eAC[VJ&XVB�kOAC/2[SÓFHAi3nL½kE=&IC-�kEú LÀÒM35/u|2-ÔkOW<DukÀkOWV-�J&Ai35-EB�kOAC/2[�/]Ñ �-�zACL«D!B0/2[VL½k5D2[�k>Ó

÷�Ó � kQk5Aij!- �@ò �V=

��� & ö1��ø�ò � & -yö1��ø � � & ö1��ø@ò$+&� & õ

��� , ö�� ø+ò � , -yö���ø � � , ö1��ø@ò÷ì:ô ��& .

�_- �V[M-�%ò(B0/]k �S;uö & �������; � N $���� ø@ò B�/2k �ê;]öN &��$���� � ø�Ó

U{Ai[<B�- !(")$ J&-0Ò4-0[<JMLÀ/2[�k«/�[MIi6�=&Õ8-yBED][�ÕÀ3OAikO- !#"%$ ö%�½øºò ö%�½ø�Ó � k �@ò �M= ö�� ø@ò�ý���� �ºô � ] Ó q WV-0[

& ö1��ø§ò ö1��ø&B0/�L��, ö�� ø§ò ö�� ø&LOAi[� � ô .

q WM-035-�Ѵ/�3O-�=

� , ö � �½ø�ò � , ö1��ø , , ö�� øN� �Áò ö�LOAi[� � ô&ø5� �$Hö1� �½ø�ò $tö1��ø , � & ö1��øN� � ,(÷ �]ì �& ö���ø5� � N ò�÷ �]ì B0/�L� � � N(êö � �½ø�ò ÷ �]ìMö LOAC[� � ô&ø5� � N

3

Page 67: Industrial Mathematical and Statistical Modeling Workshop for

0 1 2 3

x 105

−2

0

2

4

6

8

10x 10

4

x

y

0 200 400

0

0.5

1

1.5

2

2.5

3x 10

5

t

x

0 200 400−2

0

2

4

6

8

10x 10

4

t

y

y from dataprediction of yy from computation

x from dataprediction of xx from computation

y from dataprediction of yy from computation

FHAib�XM35- 1Ml×78/2j!ÒVD23OAKLO/2[%/2Ñ g 3O->J&ACB�kO->J �.Duk5DeD2[VJ � B�k5XVD]I �.DuknD_Ñ´/�3@FHIiACb2W k q 3nD��r->B�k5/2356.X<L½AC[Mb �À-EJ&X<B�kOAC/2[!/2Ñ g D]3nD]j!-0kO-03nLFHAvkOkOAC[MbVÓ

ìMÓ � kQk5Aij!- �@ò�� ��=

� & - ö1� �½ø�ò$ & � $tö � �½ø��& � � �

ò $ & � ÷ �2ì B�/�L� � �rN��& � � �

� , -�ö � �½ø+ò ÷ �]ì:ôêö%��& � � �½ø �(êö1� �½ø� & � � �

ò ÷ �]ìuô¯ö���& � � �½ø �ö L½AC[ � ô&ø5� � NìMö�� & � � �½ø

��� & ö � �½ø��� , ö � �½ø

ò $ & ,(÷ �]ì B0/�L� � �rN � B�/�L�'� � � &÷ �2ì:ô � N & , L½AC[ � ��ör÷ �]ì � � � ��& ø

ò � B0/�L ,��³B0/�L � LOAi[ ,��³LOAi[ ò(B�/]k �ê;

ÕÀWM-035- � ò�� &�� � N& - , � N, - D][<J��èò � ��ö½÷ �2ìK� � � � & ø�Ó VÓ � L5L½XMj!AC[Mb�k5WM-eJ&Ai35-EB�kOAC/2[­/2Ñ����!ACL�B�/�[VL½k5D][ k8Ñ´/23�� V QL½kO-0Ò<L0Ó � k �Áòèö Q ,"÷>ø5� ��=�Ñ´/�IiIC/uÕÀAC[Mb�k5WM-.L5D]j!-.Lrk5-0ÒS={ÕQ-WVD:|2-

��� & öOö� ,(÷>ø5� �½ø��� , öOö� ,(÷>øN� �½ø ò

��� & ö� � �½ø ,���B0/�L ��� , ö� � �½ø!,���LOAi[ ò(B�/]k �ê; . ö½÷ 1�ø

FM/�IiIC/uÕÀAC[Mb�D]ICI8kOWM-sLrk5-0ÒVL�ICACL½kO->J³D]h4/u|2-�=Á62/�XSú ICI�L½-»AvÑ�$'& � �rN & ACL �M9&-EJê=8D2LGÕ8-EIiI�D2L�ý0þ ÿ D][<J ] =@J&Ai14-E3O-E[ k�35/&BnZ2-�knLÕÀACIiI�WVD:|�-$kOWM-sL5D]j!-­k535D<�r-EB�k5/235Ai->L�h4-�Ñ´/�3O-�D][{6a/]ÑÀkOWV-0AC3�-E[Mb2AC[M->LGLOW{X&k�J&/uÕÀ[YÓ q /k5-EL½kE=@IC-�k $+&³ò� �K� �K�K� ��� . 2�/�D2[VJ��&%ò M÷ ��� . �%D][VJ �M9�ö�ýEþ�ÿ«õ ] ø�=&6�/2XSú ICI¯LO-0-ÔkOWV-ÔkrÕQ/�3O/&BnZ2-0k5L8WVD:|2-ÔkOWV-eL5D]j!-ÔkO3nD��r->B�kO/�3OAC-EL8h4-�Ñ´/235-eL½W{X&k«J&/uÕÀ[zkOACj�-yD]k�@ò 1� GLO-EB0/2[VJMLEÓq WM-E3O-0Ñ´/235-2={kOWV-%L½/�IiXMkOAC/2[»kO/$k5WM-yAi[{|�-03nL½-eÒM35/2hMIC-0jÃACL«[M/]k_XV[MACû XM-�Ó T_/uÕ8-E|2-03>={kOWV-�L½/�Ii|�-03ÀÒM35/u|2->LQkO/$h4-�|2-E3O6­b�/{/{JSÓ

q WM-�ÒVD235D2j!-�kO-E35L�/�h&k5D2Ai[M->J+h 6"[MIiAC[2�Mk�D]35-aë $ & õ)� & õ5ý þ ÿ õ ] í%ò ë V÷ 2M÷ 3�� . �V÷ 2Mõ �ìM÷ . �ì�+Võ5ì � . � �&õ5+ � . 3]ì�1uí¨=tÑ´3O/�jµÕÀWMAKBnW$ & � �rN & ò� . �*2�ì&÷�+� V÷ / ì 3]ìK�<��ì{=EÕÀWMACIi- $ & � �rN & B0/2j!ÒMX&k5-EJGÑ´3O/�jèk5WM-_D�B�kOX<D]IMj!/&J&-0I<ACL K�K� �K� � � V÷ �N8ò� . �*2�ì&÷�+� V÷ / ì 3]ìK�K�2ìMÓ

Page 68: Industrial Mathematical and Statistical Modeling Workshop for

0 1 2 3

x 105

−2

0

2

4

6

8

10x 10

4

x

yy from data

prediction of y

y from computation

0 100 200 300 4000

0.5

1

1.5

2

2.5

3x 10

5

t

x

x from data

prediction of x

x from computation

0 100 200 300 400−2

0

2

4

6

8

10x 10

4

t

y

y from data

prediction of y

y from computation

FHAib�XM35- 2Ml×78/�j!ÒVD]35ACLO/2[�/]Ñ g 35-EJMACB�kO-EJ�D][<J � B�k5XVD]I<FtICACb2W k q 3nD��r->B�k5/235Ai->LtX<L½AC[Mb �À->J&XVB�k5Ai/�[$/]Ñ g D]3nD]j!-�k5-03nL×FtAik½k5Ai[Vb�ÕÀAvk5WD][�ACj�ÒV3O/u|�-EJ»Ai[MAikOAKD]Iêb�XM-EL5L

FHAib�XM35- 3&l � [Mb�Ii-��â/2Ñ q WV3OXVL½kÀÕÀAvk5Ws3O->L½Ò4-EB�kÀkO/�kOWM-%,.35/2XV[VJ

� & É & GnÊ B«Ë@É#*.Ê×Ð���B & ')'sËÁÌ-*yÏ @

FM35/2j /2XM3y[{XMj!-E3OAKB0D2I@-�9&Ò4-035Aij!-0[ knL0=tÕ8- �V[<J�/�X&k�k5WVDukyk5WM-zJ&AC3O->B�k5Ai/�[�/2Ñ8k5WM35XVLrk%JM/ ->Ly[M/2k%BnWVD][Mb�-2Ó p [�k5WMACLGB0D�L½-�=YAikAKL«-ED�L½6zkO/zLO-�kÔXMÒskOWM-%j!D]kOWM-Ej$DukOAKB0D2IYj!/&J&-0ItÕÀAvk5WM/2XMk_XVLOAi[Mb»RYD]jGh<-E3½ke,.XMACJVD][VB0-yj�/&J&-EIx=4ÕÀWMAKBnW�j!-ED][<LÀÕ8-GB0D][ �<[VJkOWV-�kO3nD��r->B�k5/2356»/2Ñ×j!AKLOLOACIi-%Ñ´3O/�jÃk5WM-GJM-ELOAib�[M-03.L½AKJ&-%ÕÀAikOWV/2X&k.[M-0->J&Ai[Vb­kO/�L½/�Ii|�-ekOWM- 9 �.?(L½6&L½kO-Ej�Ó T«-E3O-�b�35D:|{Aikr6»D2[VJkOWV3OXVL½k_D]35-ÔkOWM-y/�[MIi6zÑ´/23nB�->L0Ó � BEB�/23nJ&AC[Mb�kO/ } -EÕ�kO/�[Sú L_RYD:Õy={kOWV-�D2B0B0-0IC-03nDuk5Ai/�[VL8D23O-

�& òý�����ô] � �

B0/�L��

, òý����sô] � �

LOAi[�� � ô¯õ

ÕÀWM-E3O-��»AKLtkOWM-�D][Vb2IC-ºh4-�krÕQ-0-0[�kOWM-8kOWM35XVL½k×D2[VJ�b�3O/�XM[VJêÓ p [�k5-0b�35D]kOAC[MbÀkOWV-QD2h</u|�-º-Eû XVD]kOAC/2[VLYÕÀAvk5W%k5WM-QAi[MAikOAKD]I&B0/2[VJ&AikOAC/2[<LkOW<DukÀkOWV-y|2-0IC/&B�AikOAC-ELÀD23O-.ï0-E3O/$Duk�k5WM-yh<-Eb2AC[M[MAC[Mb$b2AC|2->L

� & ö1��øºò��Mõ� , ö�� ø@ò � .

+

Page 69: Industrial Mathematical and Statistical Modeling Workshop for

`�-y/2h&knD]AC[zk5WM-y-Eû XVD]kOAC/2[VL�/2Ñt|2-EIi/&B0Avkr6�=

� & ö%�½ø�ò B�/�L � �Qý����-�ºô8Ii[]] � �

õ

� , ö%�½ø�ò(LOAi[ �#�Qý����-�ºôºIC[]] � � �

ô � . ö½÷ 2 ø

p [ kO-Eb23nDukOAC[Mbyk5WM-yD]h4/u|2-«->û XVDuk5Ai/�[VL�D]b�D2Ai[zÕÀAvk5W­kOWV-.AC[MAikOAKD]I¯Ò4/�LOAvk5Ai/�[VLQ/]ÑYj!AKLOLOAiIC-!ö��Mõ ��ø�= ÕQ-_/�h&k5D2Ai[zkOWM-.Ò4/�LOAvk5Ai/�[VLQ/]Ñj!ACL5LOAiIC-

$Hö%�½ø�ò�B0/�L��#�Qý����-�ºôêö½ö%� � ] ø{IC[]] � �

, �½ø�õ

(êö%�½ø�ò L½AC[�� �Qý����-�ºôêö½ö�� � ] ø{IC[]] � �

, �½ø �÷ì ô � N . ö½÷ 32ø

� ICI×D]h4/u|2-%->û XVDuk5Ai/�[VL_J&->LOB03OAChMAC[Mb­k5WM-GkO3nD��r-EB�kO/�3O6�/]Ñ@j!AKLOLOAiIC-%AKL.XMÒk5/­k5WM-�LOW{X&k.JM/uÕÀ[�kOACj!- �½þ -yÓ � Ñ�k5-03.kOWM-!LOW XMkJ&/uÕÀ[�k5Aij!-�=<k5WM-�j�/2kOAC/2[�/2Ñ@j�AKL5L½ACIi-GACLeû XMAikO-�L½ACj!ÒMIi-�Ó p k.ACLÔICAiZ�-�kOWV3O/uÕÀAC[MbzDzhVD]ICItÕÀAikOW�L½Ò4-EB0A��<B%|2-EIi/&B�AikOAC-EL.DukeL½Ò4-EB0A��4BÒ4/�LOAvk5Ai/�[VLÀÕÀWM-E3O-.kOWV-�/2[MIC6­Ñ´/�35B0-eACLÀb�35D:|{Aikr62Ó p Ñ�$ & ACLÀkOWM- �V[<D]ISÒMIKD2B0-yÕ8-yÕ�D][ kÀk5/$b2/$D][VJ�� & AKL�k5WM-ykOACj!-�ÕQ-yÕQD2[�k�k5/LOÒ<-E[VJê=&k5WM-0[

$+& ò�$tö%�½þ -_ø!, � & ö%�½þ -«ø�ö���& � �½þ -«ø�õ(*& ò�(êö��½þ*-«ø0, � , ö��½þ*-«ø0ö%��& � �½þ -«ø �

÷ì ôêö%��& � �½þ -«ø N õ ö½÷�� ø

ÕÀWM-E3O-eD]ICI<|uD2IiXM->L $tö%�½þ -_ø�= (êö%�½þ -«ø�=;� & ö%�½þ -«øºD2[VJ � , ö��½þ -_øºBED][»h<-.-0|uD]ICXVD]kO-EJ­Ñ´3O/�j kOWM-.-Eû XVD]kOAC/2[VLeö½÷ 2�øºD][VJ�ör÷ 3]ø�=&D2[VJ(*&%ò��MÓ q WM-E[�kOWM-�D2h</u|�-Ô->û�X<DukOAC/2[<L�h<->B�/2j!-

$ & ò B�/ L�� �Qý���� �ºô¯öOö%� & � ] ø{IC[]

] � �½þ -, � þ - ø�õ

� ò(LOAi[��#�Qý����-�ºôêö½ö%� & � ] ø&Ii[]

] � �½þ -, � þ - ø �

÷ì ô � N & . ö½÷�+ ø

p [âk5WM-»D2h</u|�-zL½6&L½kO-Ej /2Ñ«-Eû XVDuk5Ai/�[VLE=tk5WM-035-»D23O-­Ñ´/2XM3GÒVD]3nD]j!-�k5-03nL �8=��½þ -e=�$+&�D][VJ ��&VÓ p Ñ_krÕ8/Ò<D]3nD]j!-�k5-03nL�D]35-Z{[M/uÕÀ[S=uk5WM-0[!k5WM-À/]k5WM-03@krÕ8/eÕÀACIiIMh4-«LO/2IC|2-EJ�û XMAKBnZ IC62Ó×FM/�3Á-�9MD]j!ÒMIC-2=�Õ8-«D�LOLOXMj!- $+&yD2[VJ ��&yD]35-�Z{[M/uÕÀ[SÓH`�-�k535D2[VLrÑ´/�3OjkOWV-y-Eû XVDuk5Ai/�[VLyör÷�+�ø8AC[sLOXVBnWsD�ÕQD:6$k5WVDuk �½þ -�D][VJ �³J&-EÒ<-E[VJ�/2[�$+&zD][VJ��&!ACj�ÒVIiAKB�AikOIC62Ó

$ N & ,(ö ÷ì ô � & ø N � ý���� N ô N öOö%� & � ] ø{IC[]

] � �½þ -, � N þ - ò �Mõ

$+&Ák5D][ � �÷ì ô � N & ò � . ö¨ì<� ø

U{/�Ii|{AC[Mb%k5WMAKLÀL½6&L½kO-0j -Eû XVDuk5Ai/�[VL�b2AC|2->L�D][»D2Ivk5-035[VDuk5-eÕQD:6$/2ÑHB�/�j!ÒMX&kOAC[Mb�kOWV-.35/&BnZ2-0kEú Lºk535D<�r-EB�kO/2356$ÕÀAikOWM/�X&k«L½/�Ii|{AC[MbD][ 9 �.?«Ó � B�/�j!ÒVD]35ACLO/2[s/2ÑHk5WM-%j!-0kOWM/&JMLÔB0D][�h4-GLO-0-E[�Ai[ �<b2XM35--�MÓ q WM-%-0353O/�3«Ai[sk5WM-GB0/2j!ÒMX&k5-EJ�k535D<�r-EB�k5/2356zAKL_J&XV-kO/zkOWM-�-03535/23nL�Ai[�kOWM-W9 �Ô?�L½/�Ii|�-03>=VÕÀWMACIi-�D][{6s-0353O/�3«AC[�L½/�Ii|{AC[Mb$k5WM-�LO6&Lrk5-0j�/]Ñ@->û XVDuk5Ai/�[VL«AKL_AC[�k5WM-�LO/2IC|2-E3E=<ÕÀWMAKBnWACLj%X<BnW»IC-EL5L«D][<Jz[M/2k_J&-0Ò4-0[<J&-0[ k«/2[sD�hMXMACICJ�XMÒs/2ÑH-E3O35/23QÑ´35/2j ÒM35-0|{AC/2XVL8k5Aij!-�L½kO-0Ò<L0Ó

� ��λÊÁÎsË B�� ÌGË��

9ÔXM3$j!/&J&-0IÔACL­[M/]k­/�[MIi6§XVLO-�Ñ´XMIÀÑ´/�3­35-0b2XVICD23�35/&BnZ2-0k5LE=8hMXMkzD]IKL½/aÑ´/�3$j%XMIikOAiî¢L½k5D2b2-s35/{BnZ�-�knL!/23­LO/2ICACJ§Ñ´XM-EI_35/{BnZ�-�knLGh{6D2L5LOXMj!Ai[Mb�+ � ��/]ѯkOWM-Ôj!/]kO/�3@j$D2L5LÁAKLÁÑ´XM-EIxÓ×`�-_B0D2[­D]IKL½/yXVLO-Àk5WMACL@kO/%LO-�kQD][$XMÒMÒ4-038h</�XM[VJ!/�[!kOWM-«hVXM3O[V/2X&k@k5Aij!- ���À=h{6zLO-�k½k5Ai[Vb��Áò � . + ] ÓH`�-GB0D][sD]IKLO/�LO-�k_D�IC/uÕ8-E3Qh4/2XV[VJ»Ñ´/23�kOWV-�B�/2j!ÒMXMkO-EJ�$'&!h{6»D2L5LOXMj!Ai[MbGkOWV-yj�/2kO/�3«L½W{X&knL«J&/uÕÀ[kOWV-­Ai[<LrknD][ k%DuÑ�k5-03yk5WM-­IKD2L½k�/2hVLO-035|uDukOAC/2[�AC[�k5WM-­Z{[M/uÕÀ[�kO3nD��r->B�k5/23562=YD]kykOACj�- � S Ó�U{-�kOkOAC[Mb�XMÒMÒ4-03�D][VJ�IC/uÕ8-E3yh</�XM[VJMLj$D:6­WM-EIiÒ�kO/ �V[<Jzk5WM-ekO3nD��r->B�k5/2356$/2X&k«/2ÑYkOWV-�L½-0k«/]ÑHLO/2ICX&k5Ai/�[»k535D<�r-EB�kO/235AC-ELºkOWVD]k«Õ8-eÕ�D][ kEÓ

÷��

Page 70: Industrial Mathematical and Statistical Modeling Workshop for

0 0.5 1 1.5 2 2.5 3 3.5

x 105

−2

0

2

4

6

8

10x 10

4

x

y

FtACb2XV3O-R�Ml878/�j!ÒMX&kO->J q 3nD��r->B�kO/�3O6!XVLOAC[Mb p j�ÒV3O/u|�-EJ o -�k5WM/&J»ÕÀAikOW q 35D<�r-EB�kO/2356!Ñ´/2XV[VJzh{6�L½/�Ii|{AC[MbW9 �.?

2.98 2.985 2.99 2.995 3 3.005 3.01

x 105

−200

0

200

400

600

800

x

y

ODE solverAnalytic solution

FHACb2XM35-R+MlÁPQIi/uÕ�XMÒs/]ÑÁJ&Av1¯-035-0[<B�-eAi[ �V[<D]IêÒ4/�LOAvk5Ai/�[VLÀh4-�krÕQ-0-0[ p j!ÒM35/u|2-EJzj!-�kOWV/{JsD2[VJ 9 �Ô?�LO/2IC|2-EJ»j!-�k5WM/&J

÷2÷

Page 71: Industrial Mathematical and Statistical Modeling Workshop for

� j$D<�r/23_D�LOLOXMj!Ò&kOAC/2[sk5WVDuk.ÕQD�L«j$D2JM-ykOWVD]kÔAKL_XV[M3O->D]ICACL½kOAKByAKL_AC[�[M/2k.D2B0/2XM[ k5Ai[Mb­Ñ´/23«k5WM-GB0XM35|:D]kOXM35-�/2ÑÁkOWM-G-ED]3OkOWYÓ7QD]IKB�XVICD]kOAC/2[VL«Ñ´/23eRYD]jGh<-E3½ky,.XMAKJMD][VB0-�knD]Z{Ai[Vb­kOWV-GB�XV3O|uDuk5XM35-%/]ÑÁk5WM-�-ED]3OkOWD23O-%3O->D2J&ACIi6�D:|uD]ACIKD]hMIC-2=¯L½-E-zëi÷�í¢=¯LO/­knD]Z{AC[MbkOWV-G[M-09{keL½kO-EÒ�L½WV/2XMIKJ[M/2kÔh4-%k5WVDukyJ&A��­B�XMIikE=SD2[VJ�LOWM/2XVICJ�h4-!D­[VD]kOXM3nD]IHÒM35/2b235-EL5LOAi/�[SÓ«`�-�D2ICLO/»D�LOLOXMj!-EJ�b�35D:|{Aikr6­k5/h4-eB0/2[VL½k5D2[ kE= ÕÀWMAiIC-eAi[�D2B�kOXVD2IiAikr6$AvkÀBnWVD2[Mb2->LºÕÀAikOW�3O->L½Ò4-EB�kºk5/�J&AKL½k5D][<B�-ÔÑ´3O/�j kOWM-y?ºD23½k5WSú L8B0-0[ kO-E3QD2[VJ­|uD]35AC-EL8/u|2-038kOWV-?ºD23½k5WSú LÀL½XM3OÑ�D2B0-�J&XM-yk5/$Ii/&BED]IêÑ´->DukOXV3O->L«L½X<BnWD2L�k5WM-�-Eû XVD]kO/�3OAKD]IêhMXVIib�-2Ó q WV-�-�9&ÒM35-EL5L½AC/2[»Ñ´/23_b23nD:| Aikr6­ÕÀAikOW�35-ELOÒ<->B�kÀk5/D]IikOAikOX<J&-���D][<J­k5WM-y35D�J&ACXVL�/]ÑtkOWM-y->D]3OkOW�� ��ACLÀb�Ai|�-0[�h{6

ôêö��4ø@ò+ô � ö� �

� � ,�� ø N . ö¨ì&÷:ø

� [V/]kOWV-03eB�/2[<L½AKJ&-03nDuk5Ai/�[skOWVD]k.j$D:6�h<-Gk5D2Z2-E[Ai[ k5/»D2BEB�/�XM[ kÔAC[�D2L5LOXMj!Ai[Mb­kOWM-GÖVACb2W k.J&AC35-EB�k5Ai/�[k5/zh4-G|�-03OkOAKB0D]IHD]35-kOWV-eÑ´/23nB�-EL�ICAiÑ�k_D][VJ�J&3nD]bV=&ÕÀWVACBnWsJ&-0Ò4-0[<Js/2[�Ii/&B0D2ISD]AC3«J&-0[<L½Aikr62ÓT«/uÕQ-0|�-03>=�h<D2LO-EJ­/�[­k5WM-ELO-e3O->L½XMIik5LE=&ÕQ-ÔWVD:|�-.DGj%X<BnW»h4-�kOkO-03ÀXV[VJ&-03nL½k5D][<J&Ai[VbG/2ÑYkOWV-.ÒV3O/�hMIi-Ejs=&[M-EB0-EL5LOD23O6!ÒVD235D2j�-0î

kO-E35LE=MD][<J$kOWV-yAij!Ò4/23Ok5D][<B�-./]ÑHD2[zAC[MAikOAKD]Iêb2XV-EL5Lºk5/!Z2-0-EÒ»AC[�j�AC[VJ»AvÑHÕQD2[ kOAC[Mb%k5/$J&-0|�-0IC/2Òz- �­B�AC-0[ kÀÕQD:6&L8kO/�ÒM3nD2B�k5ACBED]ICIi6B�/�j!ÒMX&kO-yAC[&Ñ´/�3Oj$Duk5Ai/�[»/�[�D�35/&BnZ2-�k�hVD�L½->J»/�[»/�hVLO-035| AC[Mb!D�Ò</�3½k5Ai/�[»/2ÑHAik5LQÖVACb2W k>Ó

� B��<B_Ë B_ÉsÏ#B��

ëi÷�í w D235BnWVD2[S= g D2XMIx= ²<�2�0�x���n�u¤S�]�4�G�<�¨�O�]�¢�¢¸2�������K n ���¤i�$���&�����u�4�n��ª � j�-E3OAKB0D2[ p [VL½kOAikOX&k5-_/]Ñ � -035/2[VD2X&kOAKB0L8D][VJ � Lrk53O/�[VD]XMîkOAKB0LE= p [VB2Ói= �Á/�IiXVj�-%ìM÷�+ �«p � � q D�B�k5ACBED]I o AKLOLOACIi-�U{-E3OAC-ELE=MFHAvÑ�k5W?ºJ&AikOAC/2[S=VìK�K� 3{Ó

ë ì:í� �e�&�¶  �_�¢¸2���V�4�0�M¶  Ô���&�������¢� �Ô�E�n�����x nª�� W k½k5ÒSl -�9&ÒMIC/23nDukOAC/2[YÓ b�35B2Ó [VD�LODVÓ b�/u| �:-EJMXVB0D]kOAC/2[ �>35/{BnZ2-�k �>hMb�j�3>Ó W�k5j!I � =nìK�K� 3&Óë uí Â��5�n¥M�]�ê¶   �½®���u n *Å��� ��¨���V���¨���u� �:Â��zÅ ��� W�kOkOÒSl ÕÀÕÀÕyÓ b2IC/2hVD2ICLO-EB0XM3OAikr62Ó /235b �uÕÀj$J4�:ÕQ/235ICJ �:AC35Du[ �uLOWVDuWVD2h&î5÷�îLOÒ<->B0LEÓ W kOj � =Vì<�K� 3{Ó

ë /]í �Ô-0[M[MAKLE= m 3EÓC= m Ó ?«ÓVD2[VJ�U&BnWV[VD]h4-0I¨=.�À/�h<-E3½keP.Ói=� �&�$�������n�]¤������´¦M�>�: �®��u���t�4�n�]�M ��x�5�u���4�n��Ät¥4�¨���G���>�u�¨���u�"�]�4� �]�V¾¤¿���¯�5�]�ÔÆÀ¬0�V�u�¨���u�M  U p �Ôo = �Á/2ICXMj!-$÷ 2$78IKD2L5L½AKB0L�AC[ � ÒMÒMICAC-EJ o Duk5WM-0j$D]kOAKB0LE=ê÷�+K+*2MÓ

÷>ì

Page 72: Industrial Mathematical and Statistical Modeling Workshop for

PROBLEM 6: DEFECT DETECTION IN STEEL VIA INFRARED THERMOGRAPHY

Tobias Graf1, Maged Ismail2, Mark Iwen3, Felix Krahmer4, Kang-Ling Liao5, Tsvetanka Sendova6, ClaireZhang7

Problem Presenters:Junichi Nakagawa, Belal Gharaibeh, Keng H. Chuah, Tomoya Takeuchi, Tadayuki Ito

Nippon Steel

Faculty Consultant:Dr. Kazufumi Ito

North Carolina State University

Abstract

Defects which appear on steel sheets can be dangerously harmful, especially when used in the automobileindustry. There are many cases of subsurface defects which are often impossible to detect with a naked eye,such as blowholes, metallic inclusions, etc.

Nippon Steel Corporation is the second largest steel producer in the world and manufactures steel slabs forautomobile producers like Toyota. The company is committed to the production of high quality steel sheetsand is looking for nondestructive tests to detect various defects in the steel sheets.

Infrared thermography (IRT) is one such method used to detect subsurface defects in materials. Thetechnique is usually based on applying a high-intensity light pulse. The pulse is absorbed by the surface ofthe material and the resulting photo-thermal waves are captured using infrared cameras (photonic detectors).Defects modify the thermal response of the material, in particular the cooling curves of pixels correspondingto defects differ from those corresponding to “healthy” pixels. The main objective of this project is to find amethod to locate defects in the steel sheets and characterize them.

1 Introduction

1.1 Preliminaries

There are four basic types of defects in the zinc coated steel sheets (see figure 1):

(1) blowhole defect - these are defects which exist just below the surface of a continuously cast slab. Sincemolten steel contains aluminum oxide Al2O3, argon gas is used in the manufacturing process to preventAl2O3 from clogging the nozzle. In the process of rolling, the argon gas bubbles are flattened and forma blowhole defect.

(2) dross defect - this is a slag inclusion of zinc into the steel slab.

(3) unevenness in the steel sheet - this type of defect appears when the rolls used in the rolling process arenot cleaned properly or have imperfections.

(4) unevenness in the zinc layer

1Emory University2Claremont Graduate University & Keck Graduate Institute3University of Michigan4Courant Institute of Mathematical Science, New York University5National Chiao-Tung University6Texas A&M University7University of Washington

1

Page 73: Industrial Mathematical and Statistical Modeling Workshop for

Figure 1: Types of defects.

1.2 Previous work

The dependence of the thermal behavior of a defect on its depth was studied in the one-dimensional case byJ.G. Sun [3]. The author concludes that although the deviation of the cooling curve of a defect from the“standard” cooling curve is not a reliable parameter for quantitative determination of defect depth, it hasbeen widely used as a “primary” parameter to investigate thermography data and it can be used to detectsmall subsurface defects. A major problem for the approach taken in [3] is the prior determination of areference point that is known on a sound material. The averaged temperature from the entire surface is takenas the reference temperature. This approach can work well when the defect region is small and the surface isuniformly illuminated. Methods of this type need to find a characteristic time to correlate with defect depth,so they are susceptible to signal noise.

The project presented herein uses ideas similar to the ones introduced in [5] and [3], generalizing thenumerical simulations to the three-dimensional case, which can be reduced to a two-dimensional problem byusing radial symmetry.

1.3 General framework

We approach the problem of detection of material defects from several different directions. In order to gainunderstanding of how the size and depth of the defect affect the thermal response of the material, a number ofnumerical simulations using COMSOL Multiphisics (a finite element based code) were performed. In additionto this, a technique for the reconstruction of the cooling curves of the material points in a given sample wasdeveloped. Through averaging techniques one can determine the thermal behavior of a sound area and identifyas defects regions whose cooling curves deviate significantly from this average curve. An important part of ourstudies is concerned with finding ways to distinguish between actual defects and spatial noise (e.g. scratches,dark spots, etc.). All these aspects of our work are closely interconnected and we are going to describe howthey fit together in the following sections.

2 Reconstruction of the cooling curves

In order to obtain reliable results, the experimentalist needs to calibrate the photonic detector separately fordifferent temperature ranges. For this reason, in the case of high pulse radiation technique, the data about thecooling curves of the material points is fragmented into parts corresponding to different temperature ranges.Since in each temperature range there are oscillations in regions which are “out of range” as well as in the

2

Page 74: Industrial Mathematical and Statistical Modeling Workshop for

region corresponding to the minimum temperature, the data points between the time when the maximum andminimum temperature is achieved are not all reliable points belonging to the cooling curve. Our approach toidentifying the points on the cooling curve is to find the largest monotonically decreasing sequence within thecamera’s reliable temperature range. We consider only this data to be relevant.

Given that defects of different size and location alter cooling by different amounts over different time scales,we must keep careful track of temperature vs. time after initial heating. Hence, we cannot compare the coolingcurves range by range. Rather, we first need to reconstruct the entire cooling curve from the data for eachpixel using only reliable temperature measurements from each temperature range curve. In what follows wedescribe our approach for constructing a single cooling curve for each pixel by “fitting” the reliable portionsof each temperature range’s curve together.

0 5 10 15 20400

450

500

550

0 10 20 30300

350

400

450

500

0 10 20 30 40300

350

400

450

0 20 40 60280

300

320

340

360

380

Figure 2: Raw date for the four temperature ranges for pixel (40, 30), (reflection technique, sample 6).

Before the actual reconstruction of the cooling curve, one needs to remove the irrelevant points from thedata set, i.e. points corresponding to the heating curve and points that are out of the temperature range ofthe camera.

We have to consider several cases.

(1) In the case when the data8 for the higher temperature range contains at least two points and there is anoverlap between the higher and lower temperature intervals, using linear interpolation we can find theneeded shift which “concatenates” the two data sets. In figure 2, all combinations fall into this category.fall into this category.

(2) When the data for the higher temperature range contains at least two points but there is no overlapbetween the higher and lower temperature intervals, we use extrapolation (quadratic if the first set

8Note that here by data we mean only the relevant data points (see figure 2).

3

Page 75: Industrial Mathematical and Statistical Modeling Workshop for

contains at least three points, linear otherwise) to find the shift. Especially between the first and secondrange, an extrapolation is often necessary, as the slope in this range is very large.

(3) It could happen so that the data for the higher temperature range contains a single point. In this casewe need to extrapolate the lower data set.

1 1.5 2460

480

500

520

540

1 2 3 4350

400

450

500

1 2 3 4320

340

360

380

400

420

0 10 20 30 40290

300

310

320

330

340

350

Figure 3: Relevant part of the cooling curve in the four temperature ranges for pixel (40, 30), (reflectiontechnique, sample 6).

In order to compare different cooling curves, we need the curve of each pixel to have the same time reference.This is violated because of the approximations we have made to find the relative shifts between the parts ofthe cooling curves. For this reason, once we have reconstructed the cooling curve for a given pixel (see figure4a), we interpolate it to find its values at the relevant integer time steps (figure 4b). For that, we calculatethe approximate values at each integer by averaging the values at all points in a neighborhood, weighted bythe distance to the desired integer.

As our goal is to detect points, where the cooling curve deviates from the regular behavior, we need an ideahow the cooling curve is supposed to look like. We want to construct a “generic cooling curve” - representativeof the thermal behavior of a “healthy” pixel. For that, we average the cooling curves in a neighborhood, at eachpixel disregarding the 25% highest and lowest function values (an idea developed by Pickering and Almond[2]). This gives us a “generic cooling curve”.

3 Numerical simulations using COMSOL Multiphysics

In order to better understand how defects affect the thermographic image, we outline several numerical exper-iments. The goal is to set up a model of the heat conduction during the process and obtain theoretical data

4

Page 76: Industrial Mathematical and Statistical Modeling Workshop for

0 5 10 15 20 25 30 35 40 45250

300

350

400

450

500

550

0 5 10 15 20 25 30 35 40 45250

300

350

400

450

500

550

Figure 4: Reconstructed curve for

Figure 5: The model geometry

that later can be used to analyze the experimental data. The final goal is to get a better understanding ofhow different parameters influence the thermographic image and use this knowledge to optimize (if possible)those parameters in the experimental set up and eventually in the final application.

3.1 Theoretical Model

3.1.1 Model Description

• We use a finite element simulation of heat conduction to model the thermal imaging process during IRT.Considering data at the top or bottom of the steel domain allows us to simulate the two different waysto set up a thermography experiment, i.e. reflection and transmission detection, in one simulation.

• We validate the model by comparison to experimental data, and conduct parametric studies for severaldefecr scenarios.

3.1.2 Model Geometry and Dimensions

The geometry we use for the model is the set difference of a rectangle of dimension 10mm × 1mm and arectangle of size 50µm× 5µm (length × height). The smaller one is used to model the defect in the steel slab(See fig. 5, 6, 7).

5

Page 77: Industrial Mathematical and Statistical Modeling Workshop for

Steel Dimensionslength 10 mmheight 1 mm

Figure 6: Model dimensions for half the cross section of a steel slab.

Blowhole Dimensionslength 50µmheight 5µmdepth 70µm

Figure 7: Model dimensions for half the cross section of a blowhole defect.

Note: Since we assume symmetry about the vertical (z–)axis, the data above is used for the computation.The corresponding physical dimensions are cosequently twice the model dimensions.

3.1.3 The Governing Equation

The heat conduction in the steel is modelled by the following differential equation:

ρCp∂T

∂t−∇ · (k∇T ) = 0 (1)

where we use the following constants

Symbol Name ValueCp Heat capacity 475 J

kg·Kk Thermal conductivity 44.5 W

m·Kρ Density 7850 kg

m3

3.1.4 The Boundary Conditions

In the following we introduce some notations in the table below

Symbol Name Valueq0 Inward heat fluxh Heat transfer coefficient 10 W

m2·KTinf External temperature 295 K

ε Emissivity 0.6σ Boltzmann constant 1.3806503m2·kg

s2·KTamb Ambient temperature 295 K

and use the following boundary conditions for our model:

• For the top of the outer boundary we use a heat flux boundary condition of the form

n · (k∇T ) = q0 + h(Tinf − T ) + σε(T 4amb − T 4). (2)

The heating impulse is simulated through the boundary condition on the top of the outer boundary:

q0(t) = 3.5 · 108 · χ(0,0.00033)(t) (3)

where χ(0,0.00033)(t) is the characteristic function on the interval (0, 0.00033).

6

Page 78: Industrial Mathematical and Statistical Modeling Workshop for

Outer Boundary BC Type Equationtop: heat flux n · (k∇T ) = q0 + h(Tinf − T ) + σε(T 4

amb − T 4)bottom: heat flux n · (k∇T ) = h(Tinf − T ) + σε(T 4

amb − T 4)left: insulation n · (k∇T ) = 0

right: insulation n · (k∇T ) = 0

Figure 8: Boundary conditions for the numerical model on the outer boundary

Defect Boundary BC Type Equationtop insulation n · (k∇T ) = 0

bottom insulation n · (k∇T ) = 0left insulation n · (k∇T ) = 0

Figure 9: Boundary conditions for the numerical model on the inner boundary

• For the bottom boundary condition we use a heat flux condition of the form

n · (k∇T ) = h(Tinf − T ) + σε(T 4amb − T 4). (4)

• The right boundary is assigned an insulation condition:

n · (k∇T ) = 0 (5)

• The inner boundaries model the steel defect interface. We assume that the defect encloses an insulatingmaterial and use insulation boundary conditions.

3.2 Numerical Results

We implemented a series of numerical experiments, all following the above structure, varying only the defectdepth in 100µm steps from 100µm to 900µm. We present only the results for 100µm and 200µm, but includethe other cases in our qualitative discussion.

3.2.1 Parametric Study A: Varying Defect Depth

• Depth: 100µm We model the heat convolution in a steel slab containing a subsurface ”blow–hole”defect at 100µm below the top and analyse the temperature distributions corresponding to reflectionand transmission detection.

– Heat Conduction in the Steel Slab:We plot several stages (fig. 3.2.1) of the heat conduction through the cross–section of a steel slab.

In the next section, we will evaluate the temperature distribution at the top and bottom surface ofthe steel slab,i.e. simulate reflection and transmission thermography.

– Selected Cooling Curves for the Reflection Case:The temperature distribution on the upper surface corresponds to the thermal image obtained byreflection thermography. In other words, the data simulates an experimental set up with heat sourceand camera on the same side – in our case the top – of the steel slab. For a better analysis thecooling curves are plotted on a linear (fig. 3.2.1a) and a logarithmic (3.2.1b) scale.

– Selected Cooling Curves for the Transmission Case:The temperature distribution on the upper surface corresponds to the thermal image obtained bytransmission thermography. In other words, the data simulates an experimental set up with heat

7

Page 79: Industrial Mathematical and Statistical Modeling Workshop for

Figure 10: The thermal distribution after 1,2,3 and 80ms, defect depth 100µm

Figure 11: Cooling curves of selected points on a linear and logarithmic scale in the reflection case, defectdepth 100µm

8

Page 80: Industrial Mathematical and Statistical Modeling Workshop for

Figure 12: Cooling curves of selected points on a linear and logarithmic scale in the transmission case, defectdepth 100µm

source and camera on different sides – in our case heat source at the top, camera at the bottom –of the steel slab. For a better analysis the cooling curves are plotted on a linear (fig. 3.2.1a) and alogarithmic (3.2.1b) scale.

• Depth: 200µm We model the heat convolution in a steel slab containing a subsurface blowhole defectat 200µm below the top and analyze the temperature distributions corresponding to reflection andtransmission detection.

– Heat Conduction in the Steel Slab: We plot several stages (fig. 3.2.1) of the heat conductionthrough the cross–section of a steel slab.

– Selected Cooling Curves for the Reflection Case: The temperature distribution on the uppersurface corresponds to the thermal image obtained by reflection thermography. In other words, thedata simulates an experimental set up with heat source and camera on the same side – in our casethe top – of the steel slab. For a better analysis the cooling curves are plotted on a linear (fig.3.2.1a) and a logarithmic (3.2.1b) scale.

– Selected Cooling Curves for the Transmission Case:

• Summary: The numerical experiments show that a blowhole defect changes the temperature distrib-ution at the top and the bottom surface. For the reflection case, the model predicts a hot spot in theare above the defect, since the enclosed material (e.g. argon) acts as an insulation layer. Similarly, weobserve a cold spot in the area under the defect at the bottom surface, which corresponds to transmis-sion detection. Intuition and tThe numerical results suggest that the temperature between the defectand sound area depends on the defect depth. The deeper the defect is below the surface that faces thecamera, the smaller is the temperature difference.

Also, the numerical simulation shows that there are critical time intervals for the defect detection whichalso depends on depth and the position of the camera. This effect might be used to localize defects moreacurately by combining reflection and transmission images to estimate the defect area and depth.

3.2.2 Parametric Study B: Varying Defect Size

We compute the temperature distribution for defects of varying sizes (250µm, 500µm, 1000µm) at constantdepth 70µm.

9

Page 81: Industrial Mathematical and Statistical Modeling Workshop for

Figure 13: The thermal distribution after 1,2,3 and 5ms, defect depth 200µm

Figure 14: Cooling curves of selected points on a linear and logarithmic scale in the reflection case, defectdepth 200µm

10

Page 82: Industrial Mathematical and Statistical Modeling Workshop for

Figure 15: Cooling curves for a sound and a defective point on the surface (reflection case) for varying defectsizes: 250µm, 500µm, 1000µm

• Cooling Curves:Figure 15 shows the cooling curves for a sound and a defective point point on the steel surface. In thereflection experiment the defective area is temporarily hotter than the sound area, due to the insulatingproperties of the blowhole defect. The defect sizes are 250µm, 500µm, 1000µm.

• Summary: To analyse the effect of the defect size on the thermographic image we compute the thermalcontrast curves, i.e. the difference of the cooling curves of the sound and the healthy point for each defectsize, respectively. In figure 16 we observe that the larger the defect, the higher the thermal contrast.

3.3 Outline of Future Experiments

There are several experiments that would further improve the understanding of the processes that are relevantfor thermographic defect detection and localization.

• Rigorous analysis of the varying depth: The data for a defect of constant size but varying depthneeds to be analyzed more systematically. One goal could be to determine critical time intervals thatgive the biggest temperature difference between the defect area and and the sound surrounding area.These time intervals will presumably differ with the defect depths and depend on the camera position(i.e. reflection or transmission detection). This could lead to a second goal, which is, how these shiftscan be used to locate the defect more accurately.

• Analysis of the effect of the defect size: The next step would be to vary the defect size at constantdepths in order to understand how this quantity influences the thermal images. Supposedly, largerdefects (in horizontal directions) will produce higher temperature differences.

• Combined analysis for depth and size: The previous experiments should be combined in order todetect and localize defects in a more reliable manner.

11

Page 83: Industrial Mathematical and Statistical Modeling Workshop for

10−4

10−3

10−2

10−1

100

0

50

100

150

200

250

300

time [sec]

0.25 mm0.5 mm1 mm

Figure 16: Thermal contrast curves for defect sizes 250µm, 500µm, 1000µm.

• Analysis of model dimensions and boundary conditions: The model dimensions could be chosento be closer to the dimensions that arise in the final application, also, there are other boundary conditionsone could think of.

• Validation of the models: In order to validate the numerical results they have to be compared toexperimental data, possibly even for different materials that are better understood.

4 Defect detection

4.1 Single frame detection strategy

The numerical simulations described above suggest that the thermal behavior of a defect has the greatestdeviation from that of a sound area in the very initial stages of the cooling curve. For this reason we concentrateour attention on the first few time frames.

In order to reduce the influence of the fact that the steel slab is heated unevenly (see figure 17), we applythe two-dimensional spatial Laplacian to the data from an initial frame, computed using a central differencescheme. The use of the two-dimensional Laplacian can also be motivated by the following formulation of thethree-dimensional equation of thermal conductivity:

∂T

∂t= α

(∂2T

∂x2+

∂2T

∂y2+

∂2T

∂z2

)⇔

∂T

∂t− α

∂2T

∂z2= α4xyT

(6)

where T (t, x, y, x) is the temperature at (x, y, z) at time t and α is the thermal conductivity. Equation (6)2can be viewed as the one-dimensional heat equation

∂T

∂t− α

∂2T

∂z2= f(x, y, z, t)

where f(x, y, z, t) = α4xyT can be interpreted as a source function (caused by the defects).Our conjecture is that the spots where the Laplacian is the greatest in absolute value are either defects or

spatial noise (e.g. scratches, dark spots, etc.). In order to distinguish between spatial noise and actual defectswe need to study the cooling curves corresponding to the “problematic” spots.

12

Page 84: Industrial Mathematical and Statistical Modeling Workshop for

50 100 150 200 250 300

50

100

150

200

250

Figure 17: Sample frame, (reflection technique, sample 5).

40 50 60 70 80

10 20 30 40 50 60 70

5

10

15

20

25

30

35

40

45

50

55

330

335

340

345

350

−1

−0.5

0

0.5

1

1.5

2

2.5

3

Figure 18: images of an initial time frame and its Laplacian - large sample 6

13

Page 85: Industrial Mathematical and Statistical Modeling Workshop for

4.2 Difficulties and solution ideas

Our numerical experiments suggest that the greatest deviation in the cooling curve of a defect from the coolingcurve of a sound material is in the very initial stages right after the maximum temperature is reached. Thisindicates that we have the greatest chance of detecting a defect by looking at the first few time frames ofthe relevant data in the maximum temperature range. Since this is the period when the cooling rate is thegreatest and the highest temperature range of the infrared camera9 used is 150 − 350◦C, we only have threedata points at the most belonging to this temperature range. There are cases where there is just a single pointin this most significant range. This can prevent us from finding a good approximation of the relative shiftbetween the neighboring ranges. A consequence of errors in the shift is the occurrence of discontinuities in thetemperature distribution for time frames after the matching. These jumps will be picked up by the Laplacian(see figure19), so the error detection method described in section 4.1 will not give reasonable results. Also, werealize that the last few time frames do not occur in the interpolated cooling curves of all pixels, confirmingthat shifting errors in the range of several time frames are present.

40 50 60 70 80

395

400

405

410

415

420

425

430

435

440

445

10 20 30 40 50 60 70

5

10

15

20

25

30

35

40

45

50

55 −15

−10

−5

0

5

10

15

Figure 19: images of an initial time frame and its Laplacian - small sample 6

Due to the monotonicity, whenever no extrapolation is necessary, the first relevant point of the lowertemperature range can be located uniquely between two points of the higher range. So in most cases, theerrors only occur in the fractional part. This part should be the same for all pixels as it is also the fractionalpart of the relative shift between the time reference frames associated to the two temperature ranges. Thatis why we average the fractional part over all values hoping that the average is a good approximation of theactual value.

However, this does not lead to any improvement, in fact the temperature plots show even more disconti-nuities than before (see figure 20).

40 50 60 70 80

440

450

460

470

480

490

10 20 30 40 50 60 70

10

20

30

40

50−10

−5

0

5

10

15

Figure 20: images of an initial time frame and its Laplacian - small sample 6

So very likely, errors occur in the integer part as well. Hence, we directly calculated the shift between thetime reference frames, but that did not help either. This strongly suggests that at least parts of the problemlies in the data. Indeed, figure 4a shows that the second and third data points are almost identical, whereasthe fourth and fifth differ significantly. In theory, as they lie exactly one time frame right of the second and

9FLIR (SC 6000) Photonic detector Resolution: 640x512 pixels, Speed: 120 Hz (fps), Temperature sensitivity: 0.01◦C

14

Page 86: Industrial Mathematical and Statistical Modeling Workshop for

third point, they should also be close together. This shows that the data from the different temperature rangesdo not agree, which naturally causes difficulties when trying to match them.

In conclusion, it is critical to have as many points as possible in the highest temperature range of thecooling curve.

5 Discussion and future work

Our numerical experiments show that the highest temperature ranges are most important for error detection.That is why it is crucial to avoid shift errors especially in this temperature range. For that, an additionalcalibration range capturing high temperature values could be helpful, as it would provide more data points inthe higher temperature range and allow better data matching.

If that is not available, the errors caused by wrong data might be avoided by repeated measurements. Inany case, with the data at hand, it seems almost impossible to accurately detect defects.

References

[1] Takashi Miyake, Masafumi Morishita, Hitoshi Nakata and Masahiko Kokita Influence of Sulphur Contentand Molten Steel Flow on Entrapment of Bubbles to Solid/Liquid Interface ISIJ International Vol. 46(2006) , No. 12 pp.1817-1822

[2] S. G. Pickering D.P. Almond Automated Defect Detection for Pulsed Transient Thermography Review ofQuantitative Nondestructive Evaluation Vol. 26 (2007), pp. 1585-1592

[3] J. G. Sun Analysis of Pulsed Thermography Methods for Defect Depth Prediction Journal of Heat TransferVol. 128 (2006), No. 4, pp. 329-338

[4] W. J. Parker, R. J. Jenkins, C. P. Butler and G. L. Abbott, Flash Method of Determining ThermalDiffusivity, Heat Capacity, and Thermal Conductivity J. Appl. Phys., vol. 32 (1961), pp. 16791684.

[5] Xavier Maldague Theory and Practice of Infrared Technology for Nondestructive Testing 2001, John Wiley& Sons

[6] Xavier Maldague Advances in Signal Processing for Nondestructive Evaluation of Materials 1993, NATOASI Series

15