
Systems Analysis of Stochastic and Population Balance Models for

Chemically Reacting Systems

by

Eric Lynn Haseltine

A dissertation submitted in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

(Chemical Engineering)

at the

UNIVERSITY OF WISCONSIN–MADISON

2005

© Copyright by Eric Lynn Haseltine 2005
All Rights Reserved


To Lori and Grace,
for their love and support


Systems Analysis of Stochastic and Population Balance Models for

Chemically Reacting Systems

Eric Lynn Haseltine
Under the supervision of Professor James B. Rawlings

At the University of Wisconsin–Madison

Chemical reaction models present one method of analyzing complex reaction pathways. Most models of chemical reaction networks employ a traditional, deterministic setting. The shortcomings of this traditional framework, namely difficulty in accounting for population heterogeneity and discrete numbers of reactants, motivate the need for more flexible modeling frameworks such as stochastic and cell population balance models. How to efficiently use models to perform systems-level tasks such as parameter estimation and feedback controller design is important in all frameworks. Consequently, this thesis focuses on three main areas:

1. improving the methods used to simulate and perform systems-level tasks using stochastic models,

2. formulating and applying cell population balance models to better account for experimental data, and

3. applying moving-horizon estimation to improve state estimates for nonlinear reaction systems.

For stochastic models, we have derived and implemented techniques that improve simulation efficiency and perform systems-level tasks using these simulations. For discrete stochastic models, these systems-level tasks rely on approximate, biased sensitivities, whereas continuous models (i.e., stochastic differential equations) permit calculation of unbiased sensitivities. Numerous examples illustrate the efficiency of these methods, including an application to modeling of batch crystallization systems.

We have also investigated using cell population balance models to incorporate both intracellular and extracellular levels of information in viral infections. Given experimental images of the focal infection system for vesicular stomatitis virus, we have applied these models to better understand the dynamics of multiple rounds of virus infection and the interferon (antiviral) host response. The model provides estimates of key parameters and suggests that the experimental technique may cause salient features in the data. We have also proposed an efficient and accurate model decomposition that predicts population-level measurements of intracellular and extracellular species.

Finally, we have assessed the capabilities of several state estimators, including moving-horizon estimation (MHE) and the extended Kalman filter (EKF). When multiple optima arise in the estimation problem, the judicious use of constraints and nonlinear optimization as employed by MHE can lead to better state estimates and closed-loop control performance than the EKF. This improvement comes at the price of the computational expense required to solve the MHE optimization.


Acknowledgments

“Whatever you do, work at it with all your heart, as working for the Lord, not for men, since you know that you will receive an inheritance from the Lord as a reward.”
-Colossians 3:23-24

I first thank God, creator of heaven and earth, by whose grace I have had the opportunity to complete the work comprising this thesis.

I thank my wife Lori, for her love, patience, and support. I would not have had the courage to aim so high without your encouragement. Also, the years in Madison would not have been as special without your presence.

I thank my daughter Grace, who has always been able to make me smile during this past year no matter how far away graduation seemed.

I am grateful to my family: my parents, Doug and Lydia, and my brother, David. Without your support and guidance through the years of my life, I would not be where I am today. I also wish to thank my in-laws, Carl and Linda Rutkowski, in particular for supporting my wife these past five years.

I thank my extended church family at Mad City Church: the Billers, the Thompsons, the Smiths, the Sells, and the Konkols. In particular, I wish to acknowledge Shane and Karen Biller, who have loved, supported, and prayed for my family as if we were part of their own.

There are many people in the chemical engineering department at the University of Wisconsin whom I must also acknowledge. First, I thank my advisor, Jim Rawlings, for giving me great latitude to exercise my creativity and to study interesting problems. I am always amazed by your ability to identify the important problems in a field. It has been a great honor to work with you and learn from you. I am also grateful to John Yin for first listening to my modeling ideas, then making ways for me to collaborate with his group. I am deeply indebted to Gabriele Pannocchia, who always made time to answer my questions, no matter how trivial. Since imitation is the highest form of flattery, I have tried to be as patient, kind, and understanding to my junior group members as you were to me. I could always count on either reasoning out research problems or taking a break for humor with Aswin Venkat (a.k.a. the British spy). Thank you, Matt Tenny, for your help in the office and the weight room, although perhaps I would have graduated sooner if you had not introduced me to Nethack. Brian Odelson and Daniel Patience always kept me from taking research too seriously, be it rounding everyone up for a game of darts, or getting MJ to drop by for an ice cream break. Thanks also to John Eaton for Octave and Linux support; who would have figured five years ago that I would install Linux on my laptop? It has been a pleasure getting to know Paul Larsen, Murali Rajamani, and especially Ethan Mastny, who listened to almost all of my ideas on stochastic simulation. I also thank former Rawlings group members Jenny Wang, Scott Middlebrooks, and Chris Rao for their help during my first years in the group. Finally, I have had the great pleasure of getting to know the Yin group over the past year. In particular, I thank Vy Lam for graciously putting up with all of my experimental questions. I am also grateful to Patrick Suthers and Hwijin Kim for their friendship.

ERIC LYNN HASELTINE

University of Wisconsin–Madison
February 2005


Contents

Abstract

Acknowledgments

List of Tables

List of Figures

Chapter 1 Introduction

Chapter 2 Literature Review
  2.1 Traditional Deterministic Reaction Models
  2.2 Systems Level Tasks for Deterministic Models
    2.2.1 Optimal Control
    2.2.2 State Estimation
    2.2.3 Parameter Estimation
    2.2.4 Sensitivities
  2.3 Stochastic Reaction Models
    2.3.1 Monte Carlo Simulation of the Stochastic Model
    2.3.2 Performing Systems Level Tasks with Stochastic Models
  2.4 Population Balance Models

Chapter 3 Motivation
  3.1 Current Limitations of Stochastic Models
    3.1.1 Integration Methods
    3.1.2 Systems Level Tasks
  3.2 Current Limitations of Traditional Deterministic Models
  3.3 Current Limitations of State Estimation Techniques

Chapter 4 Approximations for Stochastic Reaction Models
  4.1 Stochastic Partitioning
    4.1.1 Slow Reaction Subset
    4.1.2 Fast Reaction Subset
    4.1.3 The Combined System
    4.1.4 The Equilibrium Approximation
    4.1.5 The Langevin and Deterministic Approximations
  4.2 Numerical Implementation of the Approximations
    4.2.1 Simulating the Equilibrium Approximation
    4.2.2 Simulating the Langevin and Deterministic Approximations: Exact Next Reaction Time
    4.2.3 Simulating the Langevin and Deterministic Approximations: Approximate Next Reaction Time
  4.3 Practical Implementation
  4.4 Examples
    4.4.1 Enzyme Kinetics
    4.4.2 Simple Crystallization
    4.4.3 Intracellular Viral Infection
  4.5 Critical Analysis of the Stochastic Approximations

Chapter 5 Sensitivities for Stochastic Models
  5.1 The Chemical Master Equation
  5.2 Sensitivities for Stochastic Systems
    5.2.1 Approximate Methods for Generating Sensitivities
    5.2.2 Deterministic Approximation for the Sensitivity
    5.2.3 Finite Difference Sensitivities
    5.2.4 Examples
  5.3 Parameter Estimation With Approximate Sensitivities
    5.3.1 High-Order Rate Example Revisited
  5.4 Steady-State Analysis
    5.4.1 Lattice-Gas Example
  5.5 Conclusions

Chapter 6 Sensitivity Analysis of Discrete Markov Chain Models
  6.1 Smoothed Perturbation Analysis
    6.1.1 Coin Flip Example
    6.1.2 State-Dependent Simulation Example
  6.2 Smoothing by Integration
  6.3 Sensitivity Calculation for Stochastic Chemical Kinetics
  6.4 Conclusions and Future Directions

Chapter 7 Sensitivity Analysis of Stochastic Differential Equation Models
  7.1 The Master Equation
  7.2 Sensitivity Examples
    7.2.1 Simple Reversible Reaction
    7.2.2 Oregonator
  7.3 Applications of Parametric Sensitivities
    7.3.1 Parameter Estimation
    7.3.2 Calculating Steady States
    7.3.3 Simple Dumbbell Model of a Polymer in Solution
  7.4 Conclusions

Chapter 8 Stochastic Simulation of Particulate Systems
  8.1 Introduction
  8.2 Stochastic Chemical Kinetics Overview
    8.2.1 Stochastic Formulation of Isothermal Chemical Kinetics
    8.2.2 Extension of the Problem Scope
    8.2.3 Interpretation of the Simulation Output
  8.3 Crystallization Model Assumptions
  8.4 Stochastic Simulation of Batch Crystallization
    8.4.1 Isothermal Nucleation and Growth
    8.4.2 Nonisothermal Nucleation and Growth
    8.4.3 Isothermal Nucleation, Growth, and Agglomeration
  8.5 Parameter Estimation With Stochastic Models
    8.5.1 Trust-Region Optimization
    8.5.2 Finite Difference Sensitivities
    8.5.3 Parameter Estimation for Isothermal Nucleation, Growth, and Agglomeration
  8.6 Critical Analysis of Stochastic Simulation as a Modeling Tool
  8.7 Conclusions

Chapter 9 Population Balance Models for Cellular Systems
  9.1 Population Balance Modeling
  9.2 Application of the Model to Viral Infections
    9.2.1 Intracellular Model
    9.2.2 Extracellular Events
    9.2.3 Final Model Refinements
    9.2.4 Model Solution
  9.3 Application to In Vitro and In Vivo Conditions
    9.3.1 In Vitro Experiment
    9.3.2 In Vivo Initial Infection
    9.3.3 In Vivo Drug Therapy
  9.4 Future Outlook and Impact

Chapter 10 Modeling Virus Dynamics: Focal Infections
  10.1 Experimental System
    10.1.1 Modeling the Experiment
    10.1.2 Modeling the Measurement
    10.1.3 Analyzing and Modeling the Images
  10.2 Propagation of VSV on BHK-21 Cells
    10.2.1 Development of a Reaction-Diffusion Model
    10.2.2 Analysis of the Model Fit
  10.3 Propagation of VSV on DBT Cells
    10.3.1 Refinement of the Reaction-Diffusion Model
    10.3.2 Discussion
    10.3.3 Model Prediction: Infection Propagation in the Presence of Interferon Inhibitors
  10.4 Conclusions
  10.5 Appendix

Chapter 11 Multi-level Dynamics of Viral Infections
  11.1 Modeling Framework
  11.2 Examples
    11.2.1 Initial Infection for a Generic Viral Infection
    11.2.2 VSV/DBT Focal Infection
    11.2.3 Model Solution
  11.3 Conclusions

Chapter 12 Moving-Horizon State Estimation
  12.1 Formulation of the Estimation Problem
  12.2 Nonlinear Observability
  12.3 Extended Kalman Filtering
  12.4 Monte Carlo Filters
  12.5 Moving-Horizon Estimation
  12.6 Example 1
    12.6.1 Comparison of Results
    12.6.2 Evaluation of Arrival Cost Strategies
  12.7 EKF Failure
    12.7.1 Chemical Reaction Systems
    12.7.2 Example 2
    12.7.3 Example 3
    12.7.4 Computational Expense
  12.8 Conclusions
  12.9 Appendix
    12.9.1 Derivation of the MHE Smoothing Formulation
    12.9.2 Derivation of the MHE Filtering Formulation
    12.9.3 Equivalence of the Full Information and Least Squares Formulations
    12.9.4 Evolution of a Nonlinear Probability Density


Chapter 13 Closed Loop Performance Using Moving-Horizon Estimation
  13.1 Regulator
  13.2 Disturbance Models for Nonlinear Models
    13.2.1 Plant-model Mismatch: Exothermic CSTR Example
    13.2.2 Maximum Yield Example
  13.3 Conclusions

Chapter 14 Conclusions

Bibliography

Vita


List of Tables

2.1 Types of cell population models

4.1 Model parameters and reaction extents for the enzyme kinetics example
4.2 Model parameters and reaction extents for the simple crystallization example
4.3 Comparison of time steps for the simple crystallization example
4.4 Model parameters and reaction extents for the intracellular viral infection example
4.5 Simulation time comparison for the intracellular viral infection example

5.1 Parameters for the lattice-gas example.

6.1 Parameters for the coin flip example

7.1 Parameter values for the simple reversible reaction.
7.2 Parameter values for the Oregonator system of reactions.
7.3 Parameters for the simple dumbbell model.
7.4 Results for the simple dumbbell model.

8.1 Nucleation and growth parameters for an isothermal batch crystallizer
8.2 Nonisothermal nucleation and growth parameters for a batch crystallizer
8.3 Nucleation, growth, and agglomeration parameters for an isothermal, batch crystallizer
8.4 Parameters for the parameter estimation example.
8.5 Estimated parameters

9.1 Model parameters for in vitro simulation
9.2 Model parameters for in vivo simulation
9.3 Comparison of actual and fitted parameter values for in vivo simulation of an initial infection
9.4 Additional model parameters for in vivo drug therapy

10.1 Parameters used to describe the experimental conditions.
10.2 Parameter estimates for the VSV/BHK-21 focal infection models.
10.3 Hessian analysis for the parameter estimates of the original VSV/BHK-21 focal infection model.
10.4 Hessian analysis for the parameter estimates of the revised VSV/BHK-21 focal infection model.
10.5 Parameter estimates for the VSV/DBT focal infection models.
10.6 Hessian analysis for the parameter estimates of the reaction-diffusion VSV/DBT focal infection model.
10.7 Hessian analysis for the parameter estimates of the first segregated VSV/DBT focal infection model.
10.8 Hessian analysis for the parameter estimates of the second segregated VSV/DBT focal infection model.

11.1 Model parameters for the initial infection simulation.
11.2 Initial conditions and rate constants for the intracellular reactions of the VSV infection of DBT cells.
11.3 Initial conditions and rate constants for the reactions describing the intracellular host antiviral response of the VSV infection of DBT cells.
11.4 Extracellular model parameters for the infection of DBT cells by VSV.

12.1 Sample size required to ensure that the relative mean square error at zero is less than 0.1.
12.2 EKF steady-state behavior, no measurement or state noise
12.3 EKF steady-state behavior, no measurement or state noise
12.4 A priori initial conditions for state estimation
12.5 Effects of a priori initial conditions, constraints, and horizon length on state estimation.
12.6 Comparison of MHE and EKF computational expense.

13.1 Model Steady States for a Plant with $T_c = 300$ K, $T = 350$ K
13.2 Maximum yield CSTR parameters.


List of Figures

2.1 Microscopic volume considered in the equation of continuity for two dimensions.
2.2 Optimal control seeks to drive the output to set point.
2.3 Parameter estimation seeks to minimize the deviations between the model prediction and the data.
2.4 Illustration of the strong law of large numbers given a uniform distribution.
2.5 Illustration of the central limit theorem given a uniform distribution.

3.1 Computational time per simulation as a function of $n_{Ao}$.
3.2 Extent of reaction as a function of $n_{Ao}$.
3.3 Finite difference sensitivity for the stochastic model.
3.4 Cyclic nature of viral infections.

4.1 Comparison of the stochastic-equilibrium simulation to exact stochastic simulation.
4.2 Comparison of approximate tau-leap simulation to exact stochastic simulation.
4.3 Comparison of approximate stochastic-Langevin simulation to exact stochastic simulation.
4.4 Comparison of exact stochastic-deterministic simulation to exact stochastic simulation.
4.5 Comparison of approximate stochastic-deterministic simulation to exact stochastic simulation.
4.6 Squared error trends for the exact and approximate stochastic-deterministic simulations.
4.7 Intracellular viral infections: (a) typical and (b) aborted.
4.8 Evolution of the template probability distribution for the (a) exact stochastic and (b) approximate stochastic-deterministic simulations.
4.9 Comparisons of the template probability distribution for the exact stochastic and approximate stochastic-deterministic simulations.
4.10 Comparison of the template mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations.
4.11 Comparison of the genome mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations.
4.12 Comparison of the structural protein mean and standard deviation for exact stochastic, approximate stochastic-deterministic, and deterministic simulations.

5.1 Comparison of the exact, approximate, and central finite difference sensitivities for a second-order reaction.
5.2 Comparison of the exact and approximate sensitivities for the high-order rate example.
5.3 Relative error of the approximate sensitivity s with respect to the exact sensitivity s as the number of $n_{A,o}$ molecules increases for the high-order rate example.
5.4 Comparison of the exact, approximate, and finite difference sensitivity for the high-order rate example.
5.5 Comparison of the (a) parameter estimates per Newton-Raphson iteration and (b) model fit at iteration 20 using the approximate and finite difference sensitivities for the high-order rate example.
5.6 Results for the lattice-gas model.

6.1 Mean $E[S_n]$ as a function of the number of coin flips $n$
6.2 Mean sensitivity $\partial E[S_n]/\partial\theta$ as a function of the number of coin flips $n$
6.3 Comparison of nominal and perturbed path for SPA analysis
6.4 SPA analysis of the discrete decision.
6.5 Illustration of the branching nature of the perturbed path for SPA analysis
6.6 Mean $E[n_k]$ as a function of the number of decisions $k$
6.7 Mean sensitivity $\partial E[n_k]/\partial\theta$ as a function of the number of decisions $k$
6.8 Comparison of the exact and simulated (a) mean and (b) mean integrated sensitivity for the irreversible reaction $2A \rightarrow B$.

7.1 Results for the simple reversible reaction re-using the same random numbers.
7.2 Results for the simple reversible reaction using different random numbers.
7.3 Results for one trajectory of the Oregonator cyclical reactions.
7.4 Results for parameter estimation of the simple reversible reaction example.
7.5 Results for steady-state analysis of the Oregonator reaction example: estimated state per Newton iteration.

8.1 Method for calculating the population balance from stochastic simulation.
8.2 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, 1 simulation, characteristic particle size $\Delta = 0.01$, system volume $V = 1$
8.3 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size $\Delta = 0.01$, system volume $V = 1$
8.4 Average stochastic simulation time based on 10 simulations and $V = 1$
8.5 Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size $\Delta = 0.1$, system volume $V = 1$
8.6 Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth.
8.7 Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, inclusion of the diffusivity term.
8.8 Total and supersaturated monomer profiles for nonisothermal crystallization
8.9 Crystallizer and cooling jacket temperature profiles
8.10 Mean of the exact stochastic solution for nonisothermal crystallization with nucleation and growth.
8.11 Mean of the approximate stochastic solution for nonisothermal crystallization with nucleation and growth, propensity of no reaction $a_0 = 10$
8.12 Deterministic solution by orthogonal collocation for nonisothermal crystallization with nucleation and growth, inclusion of the diffusivity term
8.13 Zeroth moment comparisons
8.14 First moment comparisons
8.15 Mean of the stochastic solution for an isothermal crystallization with nucleation, growth, and agglomeration
8.16 Comparison of final model prediction and measurements for the parameter estimation example.
8.17 Convergence of parameter estimates as a function of the optimization iteration.

9.1 Fit of a structured, unsegregated model to experimental results.
9.2 Time evolution of intracellular components and secreted virus for the intracellular model
9.3 Fit of a structured, unsegregated model to experimental results.
9.4 Dynamic in vivo response of the cell population balance to initial infection
9.5 Extracellular model fit to dynamic in vivo response of an initial infection
9.6 Dynamic in vivo response to initial treatment with inhibitor drugs $I_1$ and $I_2$.
9.7 Effect of drug therapy on in vivo steady states.

10.1 Overview of the experimental system. . . . . . . . . . . . . . . . . . . . . . . . . 17210.2 Measurement model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17410.3 Comparison of representative experimental images to model fits. . . . . . . . . 17510.4 Comparison of the initial uninfected cell concentration for the original and re-

vised models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17710.5 Comparison of representative experimental images to model fits for VSV prop-

agation on DBT cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18010.6 Comparison of intracellular production rates of virus and interferon for the seg-

regated model of VSV propagation on DBT cells. . . . . . . . . . . . . . . . . . . 184


10.7 Comparison of representative experimental images to model predictions for VSV propagation on DBT cells in the presence of interferon inhibitors.
10.8 Experimental (averaged) images obtained from the dynamic propagation of VSV on BHK-21 cells.
10.9 Experimental (averaged) images obtained from the dynamic propagation of VSV on DBT cells.
11.1 (a) Comparison of the full and decoupled model solutions for the initial infection example. (b) Percent error for the decoupled model solution, assuming the full solution is exact.
11.2 Schematic of modeled events for the infection of DBT cells by VSV.
11.3 Detailed schematic of modeled events for the up-regulation of interferon (IFN) genes.
11.4 Comparison of experimental data, simple segregated model fit, and the developed model.
11.5 Comparison of total production of virus (VSV) and interferon (IFN) per cell for the simple segregated model and intracellularly-structured, segregated model.
11.6 Dynamic measurement of mRNA species for the focal infection system.
12.1 Comparison of potential point estimates (mean and mode) for (a) unimodal and (b) bimodal a posteriori distributions.
12.2 Example of using the kernel method to estimate the density of samples drawn from a normal distribution.
12.3 Example of using a histogram to estimate the density of samples drawn from a normal distribution.

12.4 Extended Kalman filter results.
12.5 Contours of $P(x_1 \mid y_0, y_1)$
12.6 Clipped extended Kalman filter results.
12.7 Moving-horizon estimation results.
12.8 Contours of $\max_{x_0} P(x_1, x_0 \mid y_0, y_1)$.
12.9 A posteriori density $P(x_1 \mid y_0, y_1)$ calculated using a Monte Carlo filter with density estimation.
12.10 Contours of $P(x_4 \mid y_0, \ldots, y_4)$.
12.11 Contours of $\max_{x_1,\ldots,x_3} P(x_1, \ldots, x_4 \mid y_0, \ldots, y_4)$ with the arrival cost approximated using the smoothing update.
12.12 Contours of $\max_{x_1,\ldots,x_3}$ … as a uniform prior.
12.13 Contours of $\max_{x_1,\ldots,x_9}$ … using the smoothing update.
12.14 Extended Kalman filter results.
12.15 Clipped extended Kalman filter results.


12.16 Moving-horizon estimation results.
12.17 Extended Kalman filter results.
12.18 Clipped extended Kalman filter results.
12.19 Moving-horizon estimation results.
12.20 Extended Kalman filter results.
12.21 Moving-horizon estimation results.
12.22 Extended Kalman filter results.
12.23 Moving-horizon estimation results.
12.24 Clipped extended Kalman filter results.
12.25 Moving-horizon estimation results.
12.26 Clipped extended Kalman filter results.
12.27 Moving-horizon estimation results.
13.1 General diagram of closed-loop control for the model-predictive control framework.
13.2 Exothermic CSTR diagram.
13.3 Steady states for the Exothermic CSTR example.
13.4 Exothermic CSTR feed disturbance.
13.5 Exothermic CSTR results: rejection of a feed disturbance using an output disturbance model.
13.6 Exothermic CSTR: Comparison of best nonlinear results to linear MPC results.
13.7 Maximum yield CSTR
13.8 Maximum yield CSTR steady states
13.9 Maximum yield CSTR: temporary output disturbance
13.10 Maximum yield CSTR results.


Chapter 1

Introduction

Chemical reaction models present one method of assimilating and interpreting complex reaction pathways. Usually a deterministic framework is employed to model these networks of chemical reactions. This framework assumes that a system evolves in a continuous, well-prescribed manner. Systems-level tasks seek to extract the maximum amount of utility from these models. Most of these tasks, such as parameter estimation and feedback control, can be posed in terms of optimization problems.

For systems containing small numbers of particles, such as intracellular reaction networks, concentrations are not large enough to justify applying the usual smoothly-varying assumption made in deterministic models. Rather, there are a countably finite number of molecules of each chemical species in the given system. Stochastic reaction models consider such mesoscopic phenomena in terms of discrete, molecular events that, given a cursory examination, occur in a “random” fashion. These stochastic simulations are merely realizations of a deterministically evolving probability distribution. Here, one must use simulation to reconstruct moments of this distribution due to the tremendous size of the probability space. The basis for these models is well established in the literature, but the methods that govern the exact simulation of these models often become computationally expensive to evaluate and hence have great room for improvement. Additionally, relatively little work has been performed in extending systems-level tasks to handle these sorts of models. Consequently, there exists a need to first formulate reasonable analogs of these traditionally deterministic tasks in a stochastic setting, and then propose methods for efficiently performing these tasks.

One of the simplest, yet most intriguing biological organisms is the virus. The virus contains enough genetic information to replicate itself given the machinery of a living host. So powerful is this strategy that viral infections present one of the most potent threats to human survival and well-being. The Joint United Nations Programme on HIV/AIDS (UNAIDS) estimates that in 2002, 42 million people were living with HIV/AIDS, 5 million people were newly infected with HIV, and 3.1 million people died due to AIDS-related illnesses. The World Health Organization estimates that of the 170 million people currently suffering from hepatitis C, roughly one million will develop cancer of the liver during the next 10 years. In the United States alone, researchers estimate that the 500 million cases of the common cold contracted annually cost $40 billion in health care costs and lost productivity [31]. Hence there is a vital humanitarian and economic interest in systematically understanding how viral infections progress and how this progression can be controlled. Accordingly, researchers have invested significant amounts of time and money towards determining the roles that individual components such as the genome or proteins play in viral infections. As of yet, however, there exists no comprehensive picture that quantitatively incorporates and integrates data on viral infections from multiple levels. Again, models offer one manner of consolidating the vast amount of information contained across these levels, and systems-level tasks provide one method of conveniently extracting information.

This dissertation considers the role of deterministic and stochastic models in assimilating dynamic data. The primary focus is on maximizing the information available from these models as well as applying such models to experimental systems. The remainder of this thesis is organized as follows:

• Chapter 2 reviews literature pertaining to simulation of deterministic and stochastic chemical reaction models and methods for extracting information from these simulations, such as parameter estimation and state estimation. Here, we introduce the sensitivity as a useful quantity for performing systems-level tasks.

• Chapter 3 provides motivation for solving the problems addressed in this thesis.

• Chapters 4 through 8 examine stochastic simulation with an emphasis on stochastic chemical kinetics. We present this material in the following order:

– In Chapter 4, we derive approximations for stochastic chemical kinetics for systems with coupled fast and slow reactions. These approximations lead to simulation strategies that result in drastic reductions of computational expense when compared to exact simulation methods.

– Chapter 5 considers biased approximations for calculating mean sensitivities from simulation for the stochastic chemical kinetics problem, and then applies these sensitivities to calculate steady states and estimate parameters.

– Chapter 6 explains how the discrete nature of the stochastic chemical kinetics formulation makes obtaining unbiased estimates of mean sensitivities difficult, then explores several techniques for calculating these unbiased estimates.

– Chapter 7 considers unbiased estimates for sensitivities of simulations governed by stochastic differential equations. Here, we simply differentiate the continuous sample paths to obtain the desired sensitivities, then use the sensitivities to perform useful tasks.

– Chapter 8 applies some of the stochastic simulation methods developed in previous chapters to solve the batch crystallization population balance. The flexibility of the simulation allows the modeler to focus on modeling the experimental system rather than the numerical methods required to solve the resulting models.


• Chapters 9 through 11 address population balance models for viral infections. We consider the following issues:

– Chapter 9 derives a population balance model incorporating information from both the intracellular and extracellular levels of description. To explore the utility of this model, we compare numerical results from this model to other simpler models for experimentally relevant conditions.

– Chapter 10 considers modeling of experimental data from the focal infection system. This experimental system provides dynamic image data for multiple rounds of virus infection and antiviral host response. Here, we place an emphasis on determining the minimal level of modeling complexity necessary to adequately describe the experimental data.

– Chapter 11 proposes a decomposition technique for solving population balance models when flow of information is restricted from the extracellular to intracellular level. The goal is to efficiently and accurately solve population balance models while reconstructing population-level dynamics for intracellular and extracellular species.

• Chapters 12 and 13 consider one specific systems-level task, namely state estimation. These chapters focus on the probabilistic formulation of the state estimation problem, in which the goal is to calculate the state estimate that maximizes the a posteriori distribution (the probability of the current state conditioned on all available experimental measurements). We examine the following topics:

– Chapter 12 outlines conditions for generating multiple modes in the a posteriori distribution for some relevant chemically reacting systems. We then construct examples exhibiting such conditions, and compare how several state estimators, namely the extended Kalman filter, moving-horizon estimator, and Monte Carlo filters, perform for these examples.

– Chapter 13 examines how multiple modes in the a posteriori distribution can affect the performance of closed-loop feedback control for different estimators.

• Finally, Chapter 14 presents conclusions, outlines major accomplishments, and discusses potential areas of future work.


Chapter 2

Literature Review

Models for chemical reaction networks usually arise in a traditional, deterministic setting. Given a deterministic model, we can consider performing various systems-level tasks such as parameter estimation and control. We can generally pose these tasks in terms of an optimization. In this context, a quantity known as the sensitivity becomes useful for efficient solution of the optimization. The shortcomings of the traditional deterministic framework motivate the need for alternatives that provide a more flexible foundation for chemical reaction modeling. Two such alternatives are stochastic and population balance models. This chapter presents a brief review of the modeling literature for both these subjects and the traditional models.

2.1 Traditional Deterministic Reaction Models

In a deterministic setting, we perform mass balances for the reactants and products of interest using the equation of continuity. Here we define the mass of these species as a function of time ($t$) and the internal ($y$) and external ($x$) characteristics of the system:

$$\eta(t, z)\,dz = \text{mass of reactants or products} \tag{2.1}$$

$$z = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \text{external characteristics} \\ \text{internal characteristics} \end{bmatrix} \tag{2.2}$$

We now consider an arbitrary, time-varying control volume $V(t)$ spanning a space in $z$. This volume has a time-varying surface $S(t)$. The normal vector $n_s$ points from the surface away from the volume, and the vector $v_s$ specifies the velocity of the surface. The vector $v_z$ specifies the velocity of material flowing through the volume. Figure 2.1 depicts a low-dimensional representation of this volume.

Assuming that $V(t)$ contains a statistically significant amount of mass, the conservation equation for the species contained in $V(t)$ is

$$\underbrace{\frac{d}{dt}\int_{V(t)} \eta(t, z)\,dz}_{\text{accumulation}} = \underbrace{\int_{V(t)} R_\eta\,dz}_{\text{generation}} - \underbrace{\int_{S(t)} F \cdot n_s\,d\Omega}_{\text{convective + diffusive flux}} + \underbrace{\int_{S(t)} \eta(t, z)\,(v_s \cdot n_s)\,d\Omega}_{\text{flux due to surface motion}} \tag{2.3}$$

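Figure 2.1: Microscopic volume considered in the equation of continuity for two dimensions. The sketch shows the control volume $V(t)$ with surface $S(t)$, surface normal $n_s$, surface velocity $v_s$, and material velocity $v_z$ in the $(z_1, z_2)$ plane.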

in which $R_\eta$ refers to the production rate of the species $\eta$, $F$ is the total flux, and $d\Omega$ is the differential change in the surface. Making use of the Leibniz formula permits differentiating the volume integral

$$\frac{d}{dt}\int_{V(t)} \eta(t, z)\,dz = \int_{V(t)} \frac{\partial \eta(t, z)}{\partial t}\,dz + \int_{S(t)} \eta(t, z)\,(v_s \cdot n_s)\,d\Omega \tag{2.4}$$

Substituting equation (2.4) into equation (2.3) yields

$$\int_{V(t)} \frac{\partial \eta(t, z)}{\partial t}\,dz = \int_{V(t)} R_\eta\,dz - \int_{S(t)} F \cdot n_s\,d\Omega \tag{2.5}$$

Now apply the divergence theorem to the surface integral to obtain

$$\int_{V(t)} \frac{\partial \eta(t, z)}{\partial t}\,dz = \int_{V(t)} R_\eta\,dz - \int_{V(t)} \nabla \cdot F\,dz \tag{2.6}$$

Combining all terms into the same integral yields

$$\int_{V(t)} \left( \frac{\partial \eta(t, z)}{\partial t} + \nabla \cdot F - R_\eta \right) dz = 0 \tag{2.7}$$

Since the element $V(t)$ is arbitrary, the argument of the integral must be zero; this result yields the microscopic equation of continuity:

$$\frac{\partial \eta(t, z)}{\partial t} + \nabla \cdot F = R_\eta \tag{2.8}$$


Equation (2.8) is the most general form of our proposed model. Both Bird, Stewart, and Lightfoot [11] and Deen [24] derive this equation without consideration of internal characteristics. We consider a time-varying control element, so our derivation is more akin to that of Deen [24].

Traditionally, one assumes that there are no internal characteristics of interest. Equation (2.8) then further reduces to:

$$\frac{\partial \eta(t, x)}{\partial t} + \nabla \cdot F = R_\eta \tag{2.9}$$

Additionally, we can write the total flux $F$ as the sum of convective and diffusive fluxes

$$F = \eta(t, x)\,v_x + f \tag{2.10}$$

We now assume that the reactor is well-stirred so that neither $\eta$ nor $R_\eta$ depends on the external coordinates $x$. This assumption implies that there is no diffusive flux, i.e. $f = 0$, which yields

$$\frac{\partial \eta(t)}{\partial t} + \nabla \cdot (\eta(t)\,v_x) = R_\eta \tag{2.11}$$

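Next, we integrate over the time-varying reactor volume $V_e$:

$$\int_{V_e} \left( \frac{\partial \eta(t)}{\partial t} + \nabla \cdot (\eta(t)\,v_x) \right) dx = \int_{V_e} R_\eta\,dx \tag{2.12}$$

$$\int_{V_e} \frac{\partial \eta(t)}{\partial t}\,dx + \int_{V_e} \nabla \cdot (\eta(t)\,v_x)\,dx = \int_{V_e} R_\eta\,dx \tag{2.13}$$

$$V_e \frac{d\eta}{dt} + \int_{V_e} \nabla \cdot (\eta v_x)\,dx = R_\eta V_e \tag{2.14}$$

in which we have dropped the time dependence of $\eta$ for notational convenience. Applying the divergence theorem to change the volume integral to a surface integral yields

$$V_e \frac{d\eta}{dt} + \int_{S_e} n_e \cdot (\eta v_x)\,d\Omega_e = R_\eta V_e \tag{2.15}$$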

in which

• Se is the time-varying surface of the reactor volume Ve,

• dΩe is the differential change in this surface, and

• ne is the normal vector with respect to the surface pointing away from the reactor volume.

Clearly $\eta$ does not change within the reactor volume. However, changes to the surface as well as influx and outflow of material across the reactor boundary affect $\eta$ as follows

$$\int_{S_e} n_e \cdot (\eta v_x)\,d\Omega_e = \underbrace{\int_{S_{e,1}} n_e \cdot (\eta v_x)\,d\Omega_{e,1}}_{\text{flow across the reactor surface}} + \underbrace{\int_{S_{e,2}} n_e \cdot (\eta v_x)\,d\Omega_{e,2}}_{\text{surface expansion due to reactor volume changes}} \tag{2.16}$$

$$= q\eta - q_f \eta_f + \eta \frac{dV_e}{dt} \tag{2.17}$$

in which $q$ and $q_f$ are respectively the effluent and feed volumetric flow rates, and $\eta_f$ is the concentration of $\eta$ in the feed. The resulting conservation equation is

$$V_e \frac{d\eta}{dt} - q_f \eta_f + q\eta + \eta \frac{dV_e}{dt} = R_\eta V_e \tag{2.18}$$

$$\frac{d(\eta V_e)}{dt} = q_f \eta_f - q\eta + R_\eta V_e \tag{2.19}$$

Equation (2.19) is commonly associated with continuous stirred-tank reactors (CSTRs). Alternatively, we could have derived the plug flow reactor (PFR) design equation by starting with equation (2.9) and assuming that the reactor is well mixed in only two external dimensions.
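As a rough illustration of how equation (2.19) is used in practice, the following minimal sketch integrates the balance with forward Euler. It assumes a constant reactor volume (q_f = q) and a hypothetical first-order consumption rate R_η = −k η; the function names, the rate law, and the numerical values are illustrative choices and are not taken from the thesis.

```python
def simulate_cstr(eta0, eta_f, q, volume, k_rxn, t_end, dt=1e-3):
    """Integrate equation (2.19) at constant volume with forward Euler.

    Assumptions (not from the source): q_f = q, so dV_e/dt = 0, and a
    hypothetical first-order consumption rate R_eta = -k_rxn * eta.
    """
    eta = eta0
    for _ in range(int(round(t_end / dt))):
        rate = -k_rxn * eta                           # R_eta
        d_eta = (q * eta_f - q * eta) / volume + rate  # (2.19) divided by V_e
        eta += dt * d_eta
    return eta


def steady_state(eta_f, q, volume, k_rxn):
    """Analytical steady state of the same balance: 0 = q*(eta_f - eta) - k*eta*V."""
    return q * eta_f / (q + k_rxn * volume)
```

For example, `simulate_cstr(0.0, 1.0, 1.0, 2.0, 0.5, 50.0)` should approach `steady_state(1.0, 1.0, 2.0, 0.5)`, which is a convenient sanity check on the sign conventions of the flow terms.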

2.2 Systems Level Tasks for Deterministic Models

Performing systems-level tasks such as parameter estimation, model-based feedback control, and process and product design requires a different set of tools than those required for pure simulation. Many systems-level tasks are conveniently posed as optimization problems. We briefly review several of these tasks, namely optimal control, state estimation, and parameter estimation, and introduce the sensitivity as a useful quantity for performing these tasks.

2.2.1 Optimal Control

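Optimal control consists of minimizing an objective of the form

$$\min_{u_0, \ldots, u_N} \Phi = \sum_{k=0}^{N} (y_k - y_{sp})^T Q (y_k - y_{sp}) + (u_k - u_{sp})^T R (u_k - u_{sp}) + (\Delta u_k)^T S \Delta u_k \tag{2.20a}$$

$$\text{s.t.} \quad x_{k+1} = F(x_k, u_k) \tag{2.20b}$$

$$y_k = h(x_k) \tag{2.20c}$$

$$\Delta u_k = u_k - u_{k-1}, \quad d(x_k) \geq 0, \quad g(u_k) \geq 0 \tag{2.20d}$$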

in which

• yk is the measurement at time tk;

• uk is the input at time tk;

• xk is the state at time tk;

• F (xk,uk) is the solution to a first-principles model (e.g. equation (2.19)) over the time interval [tk, tk+1);

• ysp and usp are the measurement and input, respectively, at the desired set point;

• Q and R are matrices that penalize deviations of the measurement and input from set point; and

• S is a matrix that penalizes changes in the input.

In general, the optimal control problem considers an infinite number of decisions, i.e. the control horizon N is infinite. As shown in Figure 2.2, the goal of optimization (2.20) is to drive the measurements to their set points. Most control applications consist of discrete-time samples, so we have formulated the model, equation (2.20b), in discrete time also.

Figure 2.2: Optimal control seeks to drive the output to set point by minimizing deviations of both the output y and the input u from their respective set points. The plot shows $y_k$ and $u_k$ against the sample index $k$ (past, present, and future), together with the value of the control objective.

There is a wealth of control literature that examines the properties of optimization (2.20). For example, this formulation does not even guarantee that the controller will drive the outputs to set point. Rather, one must include additional conditions such as enforcing a terminal constraint on each optimization (i.e. yN = ysp) or adding a terminal penalty to the final measurement yN that quantifies the cost-to-go for an infinite horizon. We refer the interested reader to the literature for additional information on this subject [119, 118, 90].
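To make the structure of objective (2.20a) concrete, here is a minimal sketch that evaluates Φ for a candidate input sequence. It assumes a hypothetical scalar linear model x_{k+1} = a x_k + b u_k with y_k = x_k, scalar weights in place of the matrices Q, R, and S, and an explicitly supplied previous input u_{-1} (needed for Δu_0); none of these specifics come from the thesis, and the constraints in (2.20d) are omitted.

```python
def control_objective(inputs, x0, u_prev, y_sp, u_sp,
                      q_w, r_w, s_w, a=0.9, b=0.5):
    """Evaluate the scalar analog of objective (2.20a).

    Assumptions (illustrative only): F(x, u) = a*x + b*u, h(x) = x, and
    scalar weights q_w, r_w, s_w standing in for Q, R, S.
    """
    x, phi = x0, 0.0
    for u in inputs:
        y = x                          # y_k = h(x_k)
        du = u - u_prev                # Delta u_k = u_k - u_{k-1}
        phi += q_w * (y - y_sp) ** 2 + r_w * (u - u_sp) ** 2 + s_w * du ** 2
        x = a * x + b * u              # x_{k+1} = F(x_k, u_k)
        u_prev = u
    return phi
```

An optimizer would minimize this function over `inputs`; the sketch only shows how the three penalty terms and the model recursion fit together.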

2.2.2 State Estimation

State estimation poses the problem: given a time course of experimental measurements and a dynamic model of the system, what is the most likely state of the system? This problem is usually formulated probabilistically, that is, we would like to calculate

$$\hat{x}_{k|k} = \arg\max_{x_k} P(x_k \mid y_0, \ldots, y_k) \tag{2.21}$$

in which $x_k$ is the state at time $t_k$, $y_k$ is the measurement at time $t_k$, and $\hat{x}_{k|k}$ is the a posteriori state estimate of $x$ at time $t_k$ given all measurements up to time $t_k$. The nature of the estimator depends greatly on the choice of dynamic model. For linear, unconstrained systems with additive Gaussian noise, the Kalman filter [144] provides a closed-form solution to equation (2.21). For constrained or nonlinear systems, solution of this equation may or may not be tractable. One computationally attractive method for addressing the nonlinear system is the extended Kalman filter, which first linearizes the nonlinear system, then applies the Kalman filter update equations to the linearized system [144]. This technique assumes that the a posteriori distribution is normally distributed (unimodal). Examples of implementations include estimation for the production of silicon/germanium alloy films [93], polymerization reactions [103], and fermentation processes [55]. However, the extended Kalman filter, or EKF, is at best an ad hoc solution to a difficult problem, and hence there exist many barriers to the practical implementation of EKFs (see, for example, Wilson et al. [163]).
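For the linear Gaussian case mentioned above, the maximizer in equation (2.21) is the mean of a normal distribution, so the estimate can be propagated in closed form. The sketch below shows the standard scalar Kalman filter recursion, assuming a hypothetical model x_{k+1} = a x_k + w_k, y_k = c x_k + v_k with w_k ~ N(0, q_var) and v_k ~ N(0, r_var); the function names and model are illustrative and not taken from the thesis.

```python
def kalman_update(x_prior, p_prior, y, c, r_var):
    """Condition the prior N(x_prior, p_prior) on y = c*x + v, v ~ N(0, r_var)."""
    gain = p_prior * c / (c * c * p_prior + r_var)
    x_post = x_prior + gain * (y - c * x_prior)   # a posteriori mean (and mode)
    p_post = (1.0 - gain * c) * p_prior           # a posteriori variance
    return x_post, p_post


def kalman_predict(x_post, p_post, a, q_var):
    """Propagate the estimate through x_{k+1} = a*x_k + w_k, w_k ~ N(0, q_var)."""
    return a * x_post, a * a * p_post + q_var


def kalman_filter(measurements, x0, p0, a, c, q_var, r_var):
    """Return the a posteriori estimates for a sequence of scalar measurements."""
    x_prior, p_prior = x0, p0
    estimates = []
    for y in measurements:
        x_post, p_post = kalman_update(x_prior, p_prior, y, c, r_var)
        estimates.append(x_post)
        x_prior, p_prior = kalman_predict(x_post, p_post, a, q_var)
    return estimates
```

An extended Kalman filter would reuse the same two steps after replacing `a` and `c` with local linearizations of the nonlinear model, which is where the unimodality assumption enters.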

2.2.3 Parameter Estimation

Figure 2.3: Parameter estimation seeks to minimize the deviations between the model prediction (solid line) and the data (points). The plot shows $y_k$ against $k$ for the reaction A + B ↔ C.

Parameter estimation seeks to reconcile model predictions with experimental data, as shown in Figure 2.3. In particular, we would like to maximize the probability of the mean parameter set $\theta$ given the measurements $y_k$

$$\max_{\theta} P_{\Theta \mid Y_0, \ldots, Y_N}(\theta \mid y_0, \ldots, y_N) \tag{2.22}$$

in which $\theta$ and $y_k$ are realizations of the random variables $\Theta$ and $Y_k$, respectively. For convenience, we drop the subscript denoting the random variable unless required for clarity. We assume that the measurements $y_k$ are generated from an underlying deterministic model whose measurements are corrupted by noise, i.e.

$$x_{k+1} = F(x_k; \theta) \tag{2.23}$$

$$y_k = h(x_k) + v_k \tag{2.24}$$

$$v_k \sim N(0, \Pi) \quad \forall k = 0, \ldots, N \tag{2.25}$$

in which

• the state variables $x_k$ are simply convenient functions of the parameters $\theta$ and

• the variables $v_k$ are realizations of the normally distributed random variable $\xi \sim N(0, \Pi)$.¹

Using Bayes’ Theorem to manipulate the a posteriori distribution $P(\theta \mid y_0, \ldots, y_N)$ yields

$$P(\theta \mid y_0, \ldots, y_N)\,\underbrace{P(y_0, \ldots, y_N)}_{\text{constant}} = P(y_0, \ldots, y_N \mid \theta)\,P(\theta) \tag{2.26}$$

$$P(\theta \mid y_0, \ldots, y_N) \propto P(y_0, \ldots, y_N \mid \theta)\,P(\theta) \tag{2.27}$$

In general, $P(\theta)$ is assumed to be a noninformative prior so as not to unduly influence the estimate of the parameters. For the chosen disturbances (i.e. normally distributed), Box and Tiao show that the noninformative prior is the distribution $P(\theta) = \text{constant}$ [14]. We derive the distribution of $P(y_0, \ldots, y_N, \theta)$ from the known distribution $P(v_0, \ldots, v_N, \theta)$ in the manner described by Ross [130]. This derivation requires use of the inverse function theorem from calculus [132]. First define the function mapping $(v_0, \ldots, v_N, \theta)$ onto $(y_0, \ldots, y_N, \theta)$ as

$$f(v_0, \ldots, v_N, \theta) = \begin{bmatrix} h(x_0(\theta)) + v_0 \\ \vdots \\ h(x_N(\theta)) + v_N \\ \theta \end{bmatrix} \tag{2.28}$$

We require that

1. $f(v_0, \ldots, v_N, \theta)$ can be uniquely solved for $v_0, \ldots, v_N$ and $\theta$ in terms of $y_0, \ldots, y_N$ and $\theta$. This condition is trivially true because

$$v_k = y_k - h(x_k(\theta)) \quad \forall k = 0, \ldots, N \tag{2.29a}$$

$$\theta = \theta \tag{2.29b}$$

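2. $f(v_0, \ldots, v_N, \theta)$ has continuous partial derivatives at all points and the determinant of its Jacobian is nonzero. The Jacobian $J$ of equation (2.28) is

$$J = \frac{\partial f(v_0, \ldots, v_N, \theta)}{\partial z^T} = \begin{bmatrix} I & & & \dfrac{\partial h(x_0(\theta))}{\partial x_0^T} \dfrac{\partial x_0}{\partial \theta^T} \\ & \ddots & & \vdots \\ & & I & \dfrac{\partial h(x_N(\theta))}{\partial x_N^T} \dfrac{\partial x_N}{\partial \theta^T} \\ & & & I \end{bmatrix} \tag{2.30}$$

$$z^T = \begin{bmatrix} v_0^T & \ldots & v_N^T & \theta^T \end{bmatrix} \tag{2.31}$$

If $h(x_k)$ and $x_k$ are at least once continuously differentiable for all $k = 0, \ldots, N$, then the Jacobian has continuous partial derivatives. Also, $J$ is a block-upper triangular matrix with ones on the diagonal, so its determinant is one (nonzero).

¹ The notation $N(0, \Pi)$ refers to a normally distributed random variable with mean 0 and covariance $\Pi$.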

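Since these conditions hold, we can calculate the distribution $P(y_0, \ldots, y_N, \theta)$ via

$$P(y_0, \ldots, y_N, \theta) = \det(J)^{-1} P(v_0, \ldots, v_N, \theta) \tag{2.32}$$

$$= \left( \prod_{k=0}^{N} P_\xi(v_k) \right) P(\theta) \tag{2.33}$$

Then the desired conditional is

$$P(y_0, \ldots, y_N \mid \theta) = \frac{P(y_0, \ldots, y_N, \theta)}{P(\theta)} \tag{2.34}$$

$$= \prod_{k=0}^{N} P_\xi(v_k) \tag{2.35}$$

$$= \prod_{k=0}^{N} P_\xi(y_k - h(x_k(\theta))) \tag{2.36}$$

We derive the desired optimization problem next:

$$\max_{\theta} P(\theta \mid y_0, \ldots, y_N) \propto \max_{\theta} \prod_{k=0}^{N} P_\xi(v_k) \tag{2.37}$$

$$= \max_{\theta} \log \left( \prod_{k=0}^{N} P_\xi(v_k) \right) \tag{2.38}$$

$$= \max_{\theta} \sum_{k=0}^{N} \log P_\xi(y_k - h(x_k(\theta))) \tag{2.39}$$

$$\propto \min_{\theta} \sum_{k=1}^{N} \frac{1}{2} (y_k - h(x_k))^T \Pi^{-1} (y_k - h(x_k)) \tag{2.40}$$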

Therefore, this problem is equivalent to the optimization

$$\min_{\theta} \Phi = \frac{1}{2} \sum_{k=1}^{N} e_k^T \Pi^{-1} e_k \tag{2.41a}$$

$$e_k = y_k - h(x_k) \tag{2.41b}$$

$$x_{k+1} = F(x_k; \theta) \tag{2.41c}$$

We refer the reader to Box and Tiao [14] and Stewart, Caracotsios, and Sørensen [145] for a more detailed account of estimating parameters from data. Their discussion includes, for example, calculation of confidence intervals for estimated parameters.
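A small numerical sketch may help connect optimization (2.41) to an actual computation. The example below assumes a hypothetical scalar model x_{k+1} = θ x_k with a known initial state, h(x) = x, and a scalar noise variance in place of Π. It minimizes Φ with a golden-section search, which is a stand-in chosen for brevity and is not the optimization method used in the thesis.

```python
import math


def simulate(theta, x0, n_steps):
    """Return [x_0, ..., x_n] for the hypothetical model x_{k+1} = theta * x_k."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(theta * xs[-1])
    return xs


def objective(theta, data, x0, noise_var):
    """Scalar analog of (2.41a): 0.5 * sum_{k=1}^{N} e_k^2 / noise_var."""
    xs = simulate(theta, x0, len(data) - 1)
    return 0.5 * sum((y - x) ** 2 / noise_var
                     for y, x in zip(data[1:], xs[1:]))


def golden_section(func, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function on the bracket [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def estimate_theta(data, x0, noise_var, lo=0.0, hi=2.0):
    """Estimate theta by minimizing the objective over the bracket [lo, hi]."""
    return golden_section(lambda th: objective(th, data, x0, noise_var), lo, hi)
```

With noise-free data generated by `simulate(0.8, 1.0, 10)`, `estimate_theta` should recover a value close to 0.8; with noisy data it returns the least-squares estimate on the chosen bracket.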

2.2.4 Sensitivities

We define the sensitivity $s$ as

$$s = \frac{\partial x}{\partial \theta^T} \tag{2.42}$$

in which $x$ is the state of the system and $\theta$ is a vector containing the parameters of interest for the system. This quantity is useful for efficiently performing optimization. In particular, sensitivities provide precise first-order information about the solution of the system, and this first-order information is manipulated to calculate gradients and Hessians that guide the nonlinear optimization routines. For example, consider the nonlinear optimization for parameter estimation, equation (2.41). A strict local solution to this optimization is obtained when the gradient is zero and the Hessian is positive definite. Calculating these quantities yields

$$\nabla_\theta \Phi = \frac{\partial}{\partial \theta^T} \frac{1}{2} \sum_k e_k^T \Pi^{-1} e_k \tag{2.43}$$

$$= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} \frac{\partial x_k}{\partial \theta^T} \right)^T \Pi^{-1} e_k \tag{2.44}$$

$$= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \tag{2.45}$$

$$\nabla_{\theta\theta} \Phi = \frac{\partial}{\partial \theta^T} \nabla_\theta \Phi \tag{2.46}$$

$$= \frac{\partial}{\partial \theta^T} \left( -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \right) \tag{2.47}$$

$$= \sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} \frac{\partial h(x_k)}{\partial x_k^T} s_k - \left( \frac{\partial h(x_k)}{\partial x_k^T} \frac{\partial^2 x_k}{\partial \theta\,\partial \theta^T} \right)^T \Pi^{-1} e_k \tag{2.48}$$

The sensitivity $s$ clearly arises in calculation of both of these quantities.

Next, we consider calculation of sensitivities for ordinary differential equations (ODEs) and differential algebraic equations (DAEs). This analysis summarizes the work presented by Caracotsios et al. [17].

ODE Sensitivities

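ODE systems may be written in the following form:

$$\frac{dx}{dt} = f(x, \theta) \tag{2.49a}$$

$$x(0) = x_0 \tag{2.49b}$$

Accordingly, we can obtain an expression for the evolution of the sensitivity by differentiating equation (2.49a) with respect to the parameters $\theta$:

$$\frac{\partial}{\partial \theta^T} \left( \frac{dx}{dt} \right) = \frac{\partial}{\partial \theta^T} f(x, \theta) \tag{2.50}$$

$$\frac{d}{dt} \left( \frac{\partial x}{\partial \theta^T} \right) = \frac{\partial f(x, \theta)}{\partial x^T} \frac{\partial x}{\partial \theta^T} + \frac{\partial f(x, \theta)}{\partial \theta^T} \frac{\partial \theta}{\partial \theta^T} \tag{2.51}$$

$$\frac{ds}{dt} = \frac{\partial f(x, \theta)}{\partial x^T} s + \frac{\partial f(x, \theta)}{\partial \theta^T} \tag{2.52}$$

This analysis demonstrates that the evolution equation for the sensitivity is the following ODE system:

$$\frac{ds}{dt} = \frac{\partial f(x, \theta)}{\partial x^T} s + \frac{\partial f(x, \theta)}{\partial \theta^T} \tag{2.53a}$$

$$s_{i,j}(0) = \begin{cases} 1 & \text{if } x_{0,i} = \theta_j \\ 0 & \text{otherwise} \end{cases} \tag{2.53b}$$

Equation (2.53) demonstrates two distinctive features about the evolution equation for the sensitivity:

1. it is linear with respect to $s$, and

2. it depends only on the current values of $s$ and $x$.

Therefore, we can solve for $s$ by merely integrating equation (2.53) along with the ODE system (2.49).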
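The joint integration described above can be sketched for a case with a known answer. The example assumes the hypothetical scalar model dx/dt = −θ x, for which equation (2.53a) becomes ds/dt = −θ s − x with s(0) = 0 (θ is not the initial condition), and the exact sensitivity is s(t) = −t x_0 e^{−θ t}. The classical fourth-order Runge-Kutta integrator is an illustrative choice, not the solver used in the thesis.

```python
import math


def rhs(x, s, theta):
    """Right-hand sides of (2.49a) and (2.53a) for f(x, theta) = -theta * x."""
    dx = -theta * x            # f(x, theta)
    ds = -theta * s - x        # (df/dx) * s + df/dtheta
    return dx, ds


def integrate(x0, theta, t_end, steps=1000):
    """Integrate the state and its sensitivity together with classical RK4."""
    h = t_end / steps
    x, s = x0, 0.0             # s(0) = 0 because x0 does not depend on theta
    for _ in range(steps):
        k1 = rhs(x, s, theta)
        k2 = rhs(x + 0.5 * h * k1[0], s + 0.5 * h * k1[1], theta)
        k3 = rhs(x + 0.5 * h * k2[0], s + 0.5 * h * k2[1], theta)
        k4 = rhs(x + h * k3[0], s + h * k3[1], theta)
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        s += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, s


def exact(x0, theta, t):
    """Analytical state and sensitivity for comparison."""
    x = x0 * math.exp(-theta * t)
    return x, -t * x
```

Comparing `integrate(2.0, 0.5, 3.0)` against `exact(2.0, 0.5, 3.0)` checks both the state and the sensitivity; because the sensitivity equation is linear in s, it adds little cost beyond the original integration.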

DAE Sensitivities

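DAE systems consider the following general form:

$$0 = g(x, \dot{x}, \theta) \tag{2.54a}$$

$$x(0) = x_0 \tag{2.54b}$$

$$\dot{x}(0) = \dot{x}_0 \tag{2.54c}$$

where $x$ is the state of the system, $\dot{x}$ is the first derivative of $x$, and $\theta$ is a vector containing the parameters of interest for the system. Again, we define the sensitivity $s$ by equation (2.42) and differentiate equation (2.54a) with respect to $\theta$ to determine an expression for the evolution of the sensitivity:

$$0 = \frac{\partial}{\partial \theta^T} g(x, \dot{x}, \theta) \tag{2.55}$$

$$0 = \frac{\partial g(x, \dot{x}, \theta)}{\partial \dot{x}^T} \frac{\partial \dot{x}}{\partial \theta^T} + \frac{\partial g(x, \dot{x}, \theta)}{\partial x^T} \frac{\partial x}{\partial \theta^T} + \frac{\partial g(x, \dot{x}, \theta)}{\partial \theta^T} \frac{\partial \theta}{\partial \theta^T} \tag{2.56}$$

$$0 = \frac{\partial g(x, \dot{x}, \theta)}{\partial \dot{x}^T} \frac{d}{dt} \left( \frac{\partial x}{\partial \theta^T} \right) + \frac{\partial g(x, \dot{x}, \theta)}{\partial x^T} \frac{\partial x}{\partial \theta^T} + \frac{\partial g(x, \dot{x}, \theta)}{\partial \theta^T} \tag{2.57}$$

$$0 = \frac{\partial g(x, \dot{x}, \theta)}{\partial \dot{x}^T} \frac{ds}{dt} + \frac{\partial g(x, \dot{x}, \theta)}{\partial x^T} s + \frac{\partial g(x, \dot{x}, \theta)}{\partial \theta^T} \tag{2.58}$$

This analysis demonstrates that the evolution equation for the sensitivity of a DAE system yields a linear DAE system:

$$0 = \frac{\partial g(x, \dot{x}, \theta)}{\partial \dot{x}^T} \dot{s} + \frac{\partial g(x, \dot{x}, \theta)}{\partial x^T} s + \frac{\partial g(x, \dot{x}, \theta)}{\partial \theta^T} \tag{2.59a}$$

$$s_{i,j}(0) = \begin{cases} 1 & \text{if } x_{0,i} = \theta_j \\ 0 & \text{otherwise} \end{cases} \tag{2.59b}$$

$$\dot{s}(0) = \dot{s}_0 \tag{2.59c}$$

As is the case for the original DAE system (2.54), we must pick a consistent initial condition (i.e. $s_0$ and $\dot{s}_0$ must satisfy equation (2.59a)). Again, we find that we can solve for the sensitivities of the system by merely integrating equation (2.59) along with the original DAE system (2.54).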

2.3 Stochastic Reaction Models

When dealing with systems containing a countably finite number of molecules, deterministic models make the unrealistic assumptions that

1. mesoscopic phenomena can be treated as continuous events; and

2. identical systems given identical perturbations behave precisely the same.

For example, most models of intracellular kinetics inherently examine a small number ofmolecules contained within a single cell (the finite number of chromosomes in the nucleus,for example), making the first assumption invalid. Additionally, identical systems given iden-tical perturbations may elicit completely different responses. Stochastic models of chemicalkinetics make no such assumptions, and hence offer one alternative to traditional determin-istic models. These models have recently received an increased amount of attention from themodeling community (see, for example, [3, 91, 79]).


Stochastic models of chemical kinetics postulate a deterministic evolution equation for the probability of being in a state rather than the state itself, as is the case in the usual deterministic models. Gillespie outlines the derivation of the evolution equation for this probability distribution in depth [48]. The basis of this derivation depends on the “fundamental hypothesis” of the stochastic formulation of chemical kinetics, which defines the reaction parameter $c_\mu$ characterizing reaction $\mu$ as:

$c_\mu\,dt$ = average probability, to first order in $dt$, that a particular combination of $\mu$ reactant molecules will react accordingly in the next time interval $dt$.

We also define

• $h_\mu$ as the number of distinct molecular reactant combinations for reaction $\mu$ at a given time, and

• $a_\mu(n)\,dt = h_\mu c_\mu\,dt$ as the probability, to first order in $dt$, that a $\mu$ reaction will occur in the next time interval $dt$.

Given this “fundamental hypothesis”, the governing equation for this system is the chemical master equation

$$\frac{dP(n,t)}{dt} = \sum_{k=1}^{m} \left[ a_k(n-\nu_k)\,P(n-\nu_k,t) - a_k(n)\,P(n,t) \right] \tag{2.60}$$

in which

• $n$ is the state of the system in terms of number of molecules (a $p$-vector),

• $P(n,t)$ is the probability that the system is in state $n$ at time $t$,

• $a_k(n)\,dt$ is the probability to order $dt$ that reaction $k$ occurs in the time interval $[t, t+dt)$, and

• $\nu_k$ is the $k$th column of the stoichiometric matrix $\nu$ (a $p \times m$ matrix).

Here, we assume that the initial condition $P(n, t_0)$ is known.

The solution of equation (2.60) is computationally intractable for all but the simplest systems. Rather, Monte Carlo methods are employed to reconstruct the probability distribution and its statistics (usually the mean and variance). We consider such methods subsequently.
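For one of the "simplest systems" the master equation can be integrated directly, which makes its structure concrete. The sketch below assumes a hypothetical irreversible reaction A → B with propensity $a(n) = c\,n$ and $\nu = -1$ (an illustrative choice, not taken from the text), so that (2.60) becomes $dP(n,t)/dt = c(n+1)P(n+1,t) - c\,n\,P(n,t)$ on the states $n = 0, \ldots, n_0$. For this case the solution is known to be binomial with survival probability $e^{-ct}$, which gives a check on the integration. Function names are assumptions of this sketch.

```python
import math

def cme_rhs(P, c):
    """Right-hand side of (2.60) for A -> B with a(n) = c*n and nu = -1."""
    n_max = len(P) - 1
    rhs = []
    for n in range(n_max + 1):
        # a(n - nu) P(n - nu, t): probability flowing in from state n + 1
        inflow = c * (n + 1) * P[n + 1] if n < n_max else 0.0
        # a(n) P(n, t): probability flowing out of state n
        outflow = c * n * P[n]
        rhs.append(inflow - outflow)
    return rhs

def solve_cme(n0, c, t_end, n_steps=2000):
    """Integrate the master equation with classical RK4 from P(n0, 0) = 1."""
    P = [0.0] * (n0 + 1)
    P[n0] = 1.0
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = cme_rhs(P, c)
        k2 = cme_rhs([p + 0.5 * h * k for p, k in zip(P, k1)], c)
        k3 = cme_rhs([p + 0.5 * h * k for p, k in zip(P, k2)], c)
        k4 = cme_rhs([p + h * k for p, k in zip(P, k3)], c)
        P = [p + h * (a + 2 * b + 2 * d + e) / 6.0
             for p, a, b, d, e in zip(P, k1, k2, k3, k4)]
    return P

n0, c, t_end = 10, 1.0, 1.0
P = solve_cme(n0, c, t_end)
survive = math.exp(-c * t_end)  # probability that a given A molecule has not yet reacted
binomial = [math.comb(n0, n) * survive**n * (1 - survive)**(n0 - n)
            for n in range(n0 + 1)]
```

With several species the number of states grows combinatorially, which is why direct integration of this kind stops being practical and Monte Carlo reconstruction is used instead.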

2.3.1 Monte Carlo Simulation of the Stochastic Model

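Monte Carlo methods take advantage of the fact that any statistic can be written in terms of a large sample limit of observations, i.e.

$$\langle h(n) \rangle \triangleq \int h(n)\,P(n,t)\,dn = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} h(n_i) \approx \frac{1}{N}\sum_{i=1}^{N} h(n_i) \quad \text{for } N \text{ sufficiently large} \tag{2.61}$$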

Figure 2.4: Illustration of the strong law of large numbers given a uniform distribution over the interval [0, 1]. As the number of samples increases, the sample mean converges to the true mean of 0.5. (Plot axes: number of samples, 0 to 5000, versus sample mean E[X].)

in which $n_i$ is the $i$th Monte Carlo reconstruction of the state $n$. Accordingly, the desired statistic can be reconstructed to sufficient accuracy given a large enough number of observations. This statement follows as a direct result of the strong law of large numbers, which we state next.

Theorem 2.1 (Strong Law of Large Numbers [130].) Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having finite mean $E[X_i] = m$. Then, with probability 1,

$$\lim_{n\to\infty} \frac{X_1 + \cdots + X_n}{n} = m \tag{2.62}$$

Proof: See Ross for details of the proof [130].

In this case, reconstructions of the desired statistic, i.e. $h(n_i)$, are independent and identically distributed variables according to the common density function given by the chemical master equation (2.60). Therefore, sampling sufficiently many of these $h(n_i)$ gives us the convergence to the expected value of $h(n)$ specified by the strong law of large numbers.

We illustrate the strong law of large numbers with a simple example. Consider a uniform distribution over the interval [0, 1]. This distribution has a finite mean of 0.5. The strong law of large numbers requires the average of samples drawn from this distribution to approach the mean with probability one. Figure 2.4 plots the average as a function of sample size; clearly this value approaches 0.5 as the number of samples increases.
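The uniform-distribution example can be reproduced with a few lines of code. The sketch below is an illustration only: the function name, the seed, and the use of Python's standard `random` module are assumptions, not the procedure used to generate Figure 2.4.

```python
import random

def running_mean(n_samples, seed=0):
    """Return the sample mean of U[0, 1) draws after 1, 2, ..., n_samples draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n_samples + 1):
        total += rng.random()
        means.append(total / i)
    return means

means = running_mean(5000)
# Inspect the running mean at a few sample sizes; it should settle near 0.5.
for n in (10, 100, 1000, 5000):
    print(n, means[n - 1])
```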

Unfortunately, the strong law of large numbers gives no indication as to the accuracy of the reconstructed statistic given a finite number of samples. An estimate for the degree of accuracy actually arises from the central limit theorem, which we state next.


Theorem 2.2 (Central Limit Theorem [130].) Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having finite mean $m$ and finite variance $\sigma^2$. Then the distribution of

$$Z_n = \frac{X_1 + \cdots + X_n - nm}{\sigma\sqrt{n}} \tag{2.63}$$

tends to the standard normal as $n \to \infty$. That is,

$$\lim_{n\to\infty} P(Z_n \le a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx$$

Proof: See Ross for details of the proof [130].

In this case, we now expect the reconstruction of the desired statistic, i.e. the expected value of $h(n)$, to be normally distributed assuming a large enough finite sample $N$. Simulating this statistic multiple times (e.g. twenty samples of the statistic reconstructed from $N$ samples each, or $20 \times N$ total samples) permits indirect estimation of standard statistics for it such as confidence intervals. How does one check whether or not the finite sample size $N$ is large enough to justify invocation of the central limit theorem, then? Kreyszig proposes the following rule of thumb for determining this number of samples: if the skewness of the distribution is small, use at least twenty and fifty samples to reconstruct the mean and variance, respectively [75]. We can also reconstruct multiple realizations of the $Z_N$ distribution, then use statistical tests such as the Shapiro-Wilk test to test this distribution for normality [137, 131]. If these tests indicate normality, then we are free to apply the usual statistical inferences for the $Z_N$ distribution and hence obtain some measure of the accuracy of the reconstructed statistic.

We illustrate the central limit theorem using again the uniform density over the range [0, 1]. Figure 2.5 compares the Monte Carlo reconstructed density for $Z_N$ to the standard normal distribution. For $N = 1$, the reconstructed density of $Z_N$ is obviously not normal; in fact, this plot merely reconstructs the underlying uniform distribution (appropriately shifted). For $N = 20$, the reconstructed density of $Z_N$ compares favorably to the standard normal.
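A rough numerical version of this comparison is sketched below. It draws realizations of $Z_N$ from (2.63) for the uniform distribution, using $m = 0.5$ and $\sigma = \sqrt{1/12}$, and reports the fraction of realizations with $|Z_N| \le 1.96$, which is about 0.95 for a standard normal. The function names, seed, and the choice of this particular summary (rather than a density plot or a Shapiro-Wilk test) are assumptions of the sketch.

```python
import math
import random

def z_realizations(n, n_realizations, seed=0):
    """Draw realizations of Z_N from (2.63) for the uniform distribution on [0, 1)."""
    rng = random.Random(seed)
    mean, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of U[0, 1)
    out = []
    for _ in range(n_realizations):
        total = sum(rng.random() for _ in range(n))
        out.append((total - n * mean) / (sigma * math.sqrt(n)))
    return out

def fraction_within(zs, bound=1.96):
    """Fraction of realizations with |z| <= bound."""
    return sum(abs(z) <= bound for z in zs) / len(zs)

z1 = z_realizations(1, 20000)
z20 = z_realizations(20, 20000)
print("N = 1 :", fraction_within(z1))   # Z_1 is uniform on [-sqrt(3), sqrt(3)], so all points lie inside
print("N = 20:", fraction_within(z20))  # expected to be close to the normal value of about 0.95
```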

These statistical theorems, then, ultimately require samples to be drawn exactly from the master equation. For nontrivial examples, direct solution of the master equation is not feasible. Alternatively, one could consider an exact stochastic simulation of the “fundamental hypothesis” as examined by Gillespie [45]. This method examines the joint probability function, $P(\tau,\mu)\,d\tau$, that governs when the next reaction occurs, and which reaction occurs. Here,

$$P(\tau,\mu|n,t) = a_\mu(n) \exp\left(-\sum_{k=1}^{m} a_k(n)\,\tau\right) \tag{2.64}$$

in which $P(\tau,\mu|n,t)\,d\tau$ is the probability that the next reaction will occur in the infinitesimal time interval $[t+\tau, t+\tau+d\tau)$ and will be a $\mu$ reaction, given that the original state is $n$ at time $t$. One can then construct numerical algorithms for simulating trajectories obeying the density (2.64).

To our knowledge, no one has yet demonstrated the equivalence between the chemical master equation and stochastic simulation. The fact that these two formulas are somehow equivalent rests solely on the basis that both arise from the “fundamental hypothesis”. This reasoning is tantamount to the logical statements “if A implies B and A implies C, then B implies C and C implies B”. This reasoning is incorrect. Here, we demonstrate that one can derive equations (2.60) and (2.64) from one another.

Figure 2.5: Illustration of the central limit theorem given a uniform distribution over the interval [0, 1]: (a) N = 1 sample and (b) N = 20 samples. Solid line plots the Monte Carlo reconstructed density. Dashed line plots the standard normal distribution. (Plot axes: $Z_N$, from −4 to 4, versus probability density $f(Z_N)$.)

Theorem 2.3 (Equivalence of the master equation and the next reaction probability density.) Assume that $P(N_0, t_0)$ is known, where

$$N_0 = \begin{bmatrix} n_0 & n_1 & \ldots \end{bmatrix} \tag{2.65}$$

The probability densities generated by the chemical master equation (i.e. equation (2.60)) and the joint density $P(\tau,\mu|n,t)\,d\tau$ (i.e. equation (2.64)) are identical.

Proof. If these probability densities are indeed equivalent, the evolution equations for these densities must be equivalent. Therefore we can prove this theorem by demonstrating that (1) $P(\tau,\mu|n,t)\,d\tau$ gives rise to the chemical master equation and (2) the chemical master equation gives rise to $P(\tau,\mu|n,t)\,d\tau$.

1. Given P (τ, µ|n, t)dτ , derive the chemical master equation.

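We consider propagating the marginal density $P(n_j, t)$ (dropping the conditional argument $(N_0, t_0)$ for convenience) from time $t$ to the future time $t + d\tau$. Noting that the probability of having multiple reactions occur over this time is of order $d\tau^2$, we have

$$\begin{aligned}
P(n_j, t+d\tau) ={}& P(n_j,t)\left(1 - \sum_{k=1}^{m} \lim_{\tau\to 0} P(\tau,k|n_j,t)\,d\tau\right) && (2.66)\\
&+ \sum_{k=1}^{m} P(n_j-\nu_k,t) \lim_{\tau\to 0} P(\tau,k|n_j-\nu_k,t)\,d\tau + O(d\tau^2) && (2.67)
\end{aligned}$$

Since $\lim_{\tau\to 0} P(\tau,k|n,t) = a_k(n)$ by equation (2.64), manipulating this equation gives rise to the chemical master equation:

$$\frac{P(n_j,t+d\tau) - P(n_j,t)}{d\tau} = \sum_{k=1}^{m} \left[ -a_k(n_j)P(n_j,t) + P(n_j-\nu_k,t)\,a_k(n_j-\nu_k) \right] + O(d\tau) \tag{2.68}$$

$$\lim_{d\tau\to 0} \frac{P(n_j,t+d\tau) - P(n_j,t)}{d\tau} = \lim_{d\tau\to 0} \left\{ \sum_{k=1}^{m} \left[ -a_k(n_j)P(n_j,t) + P(n_j-\nu_k,t)\,a_k(n_j-\nu_k) \right] + O(d\tau) \right\} \tag{2.69}$$

$$\frac{dP(n_j,t)}{dt} = \sum_{k=1}^{m} \left[ -a_k(n_j)P(n_j,t) + P(n_j-\nu_k,t)\,a_k(n_j-\nu_k) \right] \tag{2.70}$$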

2. Given the chemical master equation, derive P (τ, µ|n, t)dτ .

In this case, the master equation (2.60) is known. Given that the system is in state $n$ at time $t$, we seek to derive the probability that the next reaction will occur at time $t+\tau$ and will be reaction $\mu$. This statement is equivalent to specifying

(a) $P(n,t) = 1$ and

(b) no reactions occur over the interval $[t, t+\tau)$.

Accordingly, the master equation reduces to the following form:

$$\frac{d}{dt'} \begin{bmatrix} P(n,t') \\ P(n+\nu_1,t') \\ \vdots \\ P(n+\nu_m,t') \end{bmatrix} = \begin{bmatrix} -\sum_{k=1}^{m} a_k(n) & 0 & \cdots & 0 \\ a_1(n) & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_m(n) & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} P(n,t') \\ P(n+\nu_1,t') \\ \vdots \\ P(n+\nu_m,t') \end{bmatrix}, \quad t \le t' \le t+\tau \tag{2.71}$$

in which we have now effectively conditioned each $P(n,t')$ on the basis that no reaction occurs over the given interval. Solving for the desired probabilities yields

$$P(n,t') = \exp\left(-\sum_{k=1}^{m} a_k(n)(t'-t)\right) \tag{2.72}$$

$$P(n+\nu_j,t') = \frac{a_j(n)}{\sum_{k=1}^{m} a_k(n)} \left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t'-t)\right)\right], \quad 1 \le j \le m \tag{2.73}$$

Our strategy now is to first note that $P(\tau,\mu|n,t)\,d\tau$ consists of the independent probabilities

$$P(\tau,\mu|n,t)\,d\tau = P(\mu|n,t)\,P(\tau|n,t)\,d\tau \tag{2.74}$$

then solve for these marginal densities as a function of the $P(n,t')$'s. Conceptually, $P(\tau|n,t)\,d\tau$ is the probability that the first reaction occurs in the interval $[t+\tau, t+\tau+d\tau)$. We solve for this quantity by taking advantage of its relationship with $P(n,t+\tau)$:

$$\begin{aligned}
P(\tau|n,t)\,d\tau &= \sum_{j=1}^{m} \left.\frac{dP(n+\nu_j,t')}{dt'}\right|_{t'=t+\tau} d\tau && (2.75)\\
&= -\left.\frac{dP(n,t')}{dt'}\right|_{t'=t+\tau} d\tau && (2.76)\\
&= \sum_{k=1}^{m} a_k(n)\,P(n,t+\tau)\,d\tau && (2.77)\\
&= \sum_{k=1}^{m} a_k(n) \exp\left(-\sum_{k=1}^{m} a_k(n)\,\tau\right) d\tau && (2.78)
\end{aligned}$$

As expected, $P(\tau|n,t)\,d\tau$ is independent of $\mu$.

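Similarly, we express $P(\mu|n,t)$ as a function of the $P(n+\nu_j,t')$'s:

$$\begin{aligned}
P(\mu|n,t) &= \frac{P(n+\nu_\mu,t')}{\sum_{k=1}^{m} P(n+\nu_k,t')} && (2.79)\\
&= \frac{\dfrac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)} \left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t'-t)\right)\right]}{\sum_{j=1}^{m} \dfrac{a_j(n)}{\sum_{k=1}^{m} a_k(n)} \left[1 - \exp\left(-\sum_{k=1}^{m} a_k(n)(t'-t)\right)\right]} && (2.80)\\
&= \frac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)} && (2.81)
\end{aligned}$$

As expected, $P(\mu|n,t)$ is independent of $\tau$.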

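Combining the two marginal densities, we obtain

$$\begin{aligned}
P(\tau,\mu|n,t)\,d\tau &= P(\mu|n,t)\,P(\tau|n,t)\,d\tau && (2.82)\\
&= \frac{a_\mu(n)}{\sum_{k=1}^{m} a_k(n)} \sum_{k=1}^{m} a_k(n) \exp\left(-\sum_{k=1}^{m} a_k(n)\,\tau\right) d\tau && (2.83)\\
&= a_\mu(n) \exp\left(-\sum_{k=1}^{m} a_k(n)\,\tau\right) d\tau && (2.84)
\end{aligned}$$

as claimed.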

Theorem 2.4 (Reconstruction of the master equation density from exact simulation.) Assuming conservation of mass and a finite number of reactions, then the probability density at a single future time point $t$ reconstructed from Monte Carlo simulations converges to the density governed by the chemical master equation almost surely over the interval $[t_0, t]$. That is,

$$P\left\{ \lim_{N\to\infty} P_N(n_i,t|N_0,t_0) = P(n_i,t|N_0,t_0) \right\} = 1 \quad \forall i = 1,\ldots,n_s \tag{2.85}$$

in which

• $N$ is the number of exact Monte Carlo simulations,

• $P_N(n,t|N_0,t_0)$ is the Monte Carlo reconstruction of the probability density given $N$ exact simulations,

• $P(n,t|N_0,t_0)$ is the density governed by the master equation, and

• $n_s$ is the total number of possible species.

Proof: We must show that

$$P\left\{ \psi : \lim_{N\to\infty} n_{i,N}(\psi,t) = n_i(\psi,t) \right\} = 1 \quad \forall i = 1,\ldots,n_s \tag{2.86}$$

in which

• $N = \begin{bmatrix} n_1 & \ldots & n_{n_s} \end{bmatrix}^T$, and

• $n_{i,N}$ is the Monte Carlo reconstruction of $n_i$ given $N$ simulations.

Let $\varepsilon > 0$. We must show that there exists an $N$ such that if $m > N$,

$$\left| P\left\{ \psi : n_{i,m}(\psi,t) = n_i(\psi,t) \right\} - 1 \right| < \varepsilon \quad \forall i = 1,\ldots,n_s \tag{2.87}$$

The assumption of conservation of mass and a finite number of reactants indicates that $n_s$ is finite. Choose

$$X_i(\psi,t) = \delta(\psi - n_i, t) \tag{2.88}$$

in which the random variable $\psi$ is generated by running an exact stochastic simulation until time $t$. The mean of this random variable is $P(n_i,t|N_0,t_0)$. Theorem 2.3 states that any simulation scheme obeying the next reaction probability density $P(\tau,\mu|n,t)$ generates exact trajectories from the master equation. Therefore, we can apply the strong law of large numbers, which says that there exists an $N_i$ for each $i = 1,\ldots,n_s$ such that if $m > N_i$,

$$\left| P\left\{ \psi : X_{i,m}(\psi,t) = P(n_i,t) \right\} - 1 \right| \le \frac{\varepsilon}{2} \quad \forall i = 1,\ldots,n_s \tag{2.89}$$

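Let $N = \max_i N_i$. Then if $m > N$,

$$\begin{aligned}
\left| P\left\{ \psi : n_{i,m}(\psi,t) = n_i(\psi,t) \right\} - 1 \right| &\le \left| P\left\{ \psi : n_{i,N}(\psi,t) = n_i(\psi,t) \right\} - 1 \right| && \forall i = 1,\ldots,n_s && (2.90)\\
&\le \frac{\varepsilon}{2} && \forall i = 1,\ldots,n_s && (2.91)\\
&< \varepsilon && \forall i = 1,\ldots,n_s && (2.92)
\end{aligned}$$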

Since $\varepsilon$ is arbitrary, the proof is complete.

In his seminal works, Gillespie proposes two simple and efficient methods for generating exact trajectories obeying the probability function $P(\tau,\mu)$ [45, 46]. Theorem 2.3 proves that these trajectories obey exactly the chemical master equation (2.60). Gillespie appropriately named these algorithms the direct method and the first reaction method. We summarize these methods in algorithms 1 and 2.

Algorithm 1 Direct Method.
Initialize. Set the time, $t$, equal to zero. Set the number of species $n$ to $n_0$.

1. Calculate:

(a) the reaction rates $a_k(n)$ for $k = 1, \ldots, m$; and

(b) the total reaction rate, $r_{tot} = \sum_{k=1}^{m} a_k(n)$.

2. Select two random numbers $p_1$, $p_2$ from the uniform distribution (0, 1). Let $\tau = -\log(p_1)/r_{tot}$. Choose $j$ such that

$$\sum_{k=1}^{j-1} a_k(n) < p_2\, r_{tot} \le \sum_{k=1}^{j} a_k(n)$$

3. Let $t \leftarrow t + \tau$. Let $n \leftarrow n + \nu_j$. Go to 1.
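The steps of the direct method translate almost line for line into code. The following is a minimal sketch, not the implementation used in this work: the function signature, the stopping rule at a final time `t_end`, and the test reaction A → B with $a(n) = c\,n_A$ are all assumptions made for illustration. Uniform numbers are drawn as `1 - rng.random()` so that they lie in (0, 1] and the logarithm is always defined.

```python
import math
import random

def direct_method(n0, stoich, propensities, t_end, rng):
    """Simulate one trajectory with the direct method and return the state at t_end.

    stoich[k] is the state-change vector nu_k of reaction k;
    propensities(n) returns the list [a_1(n), ..., a_m(n)].
    """
    t, n = 0.0, list(n0)
    while True:
        a = propensities(n)                      # step 1(a): reaction rates
        r_tot = sum(a)                           # step 1(b): total reaction rate
        if r_tot <= 0.0:
            return n                             # no reaction can fire any more
        p1 = 1.0 - rng.random()                  # step 2: two uniform numbers in (0, 1]
        p2 = 1.0 - rng.random()
        tau = -math.log(p1) / r_tot
        if t + tau > t_end:
            return n                             # next event falls beyond the horizon
        threshold, cumulative = p2 * r_tot, 0.0
        j = len(a) - 1                           # fallback guards against round-off
        for k, a_k in enumerate(a):
            cumulative += a_k
            if threshold <= cumulative:          # first j with p2*r_tot <= sum_{k<=j} a_k
                j = k
                break
        t += tau                                 # step 3: advance time and state
        n = [n_i + nu for n_i, nu in zip(n, stoich[j])]

# Hypothetical test case: A -> B with c = 1, 50 initial A molecules.
rng = random.Random(1)
stoich = [[-1, 1]]
propensities = lambda n: [1.0 * n[0]]
finals = [direct_method([50, 0], stoich, propensities, 1.0, rng) for _ in range(2000)]
mean_a = sum(n[0] for n in finals) / len(finals)
```

For this linear test reaction the mean number of A molecules at time 1 should be close to $50e^{-1}$, which gives a simple sanity check on the sampler.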

Algorithm 2 First Reaction Method.
Initialize. Set the time, $t$, equal to zero. Set the number of species $n$ to $n_0$.

1. Calculate the reaction rates $a_k(n)$ for $k = 1, \ldots, m$.

2. Select $m$ random numbers $p_1, \ldots, p_m$ from the uniform distribution (0, 1). Let $\tau_k = -\log(p_k)/a_k(n)$, $k = 1, \ldots, m$. Choose $j$ such that

$$j = \arg\min_k \tau_k$$

3. Let $t \leftarrow t + \tau_j$. Let $n \leftarrow n + \nu_j$. Go to 1.

Exact algorithms such as the direct method treat microscopic phenomena as discrete, molecular events. For intracellular models, this feature is appealing because of the inherently small number of molecules contained within a single cell (the finite number of chromosomes in the nucleus, for example). As models become progressively more complex, however, these algorithms often become expensive computationally. Some recent efforts have focused upon reducing this computational load. He, Zhang, Chen, and Yang employ a deterministic equilibrium assumption on polymerization reaction kinetics [61]. Gibson and Bruck refine the first reaction method, i.e. algorithm 2, to reduce the required number of random numbers, a technique that works best for systems in which some reactions occur much more frequently than others [43]. Rao and Arkin demonstrate how to numerically simulate systems reduced by the quasi-steady-state assumption [113]. This work expands upon ideas by Janssen [69, 70] and Vlad and Pop [157] who first examined the adiabatic elimination of fast relaxing variables in stochastic chemical kinetics. Resat, Wiley, and Dixon address systems with reaction rates varying by several orders of magnitude by applying a probability-weighted Monte Carlo approach, but this method increases error in species fluctuations [126]. Gillespie examines two approximate methods, tau leaping and $k_\alpha$ leaping, for accelerating simulations by modeling the selection of “fast” reactions with Poisson distributions [50]. These methods employ explicit, first-order Euler approximations that permit larger time steps to be taken than exact methods by allowing multiple firings of fast reactions by approximating the next reaction distribution. In explicit tau leaping, one chooses a fixed time step $\tau$, then increments the state by

$$n(t+\tau) \approx n(t) + \sum_{k=1}^{m} \nu_k\, P_k\big(a_k(n(t))\,\tau\big) \tag{2.93}$$

in which $P_k(a_k(n(t))\tau)$ is a Poisson random variable with mean $a_k(n(t))\tau$. In $k_\alpha$ leaping, one chooses a particular reaction to undergo a predetermined number of events $k_\alpha$, then determines the time $\tau$ required for these events to occur by drawing a gamma random variable $\Gamma(a_\alpha(n), k_\alpha)$. Using this value of $\tau$, one draws Poisson random variables to determine how many events the remaining reactions undergo. A subsequent paper by Gillespie and Petzold discusses the error associated with the tau leaping approximation by using Taylor-series expansion arguments [51]. These conditions specify restrictions on the time increment $\tau$ to ensure that the error in the reconstructed mean and variance remains below a user-specified tolerance. However, this error only quantifies the effects of the reaction rate ($a_j(n)$'s) dependence upon the state $n$, not the effect of approximating the exact next reaction distribution with a Poisson distribution. Rathinam, Petzold, Cao, and Gillespie later present a first-order implicit version of tau leaping, i.e.

$$n(t+\tau) \approx n(t) + \sum_{k=1}^{m} \nu_k\, a_k\big(n(t+\tau)\big)\,\tau + \sum_{k=1}^{m} \nu_k \left[ P_k\big(a_k(n(t))\,\tau\big) - a_k(n(t))\,\tau \right] \tag{2.94}$$

This method has greater numerical stability than the explicit version [117].
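The explicit tau-leaping update (2.93) can be sketched in a few lines. The code below is an illustration under stated assumptions: the function names are hypothetical, the Poisson sampler is Knuth's simple multiplication method (adequate only for small means), the test reaction A → B with $a(n) = c\,n_A$ is not from the text, and no safeguard against negative populations is included, so the step $\tau$ must be small relative to the populations involved.

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson random number by Knuth's multiplication method (small means only)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def tau_leap(n0, stoich, propensities, tau, t_end, rng):
    """Explicit tau leaping: n(t + tau) = n(t) + sum_k nu_k * Poisson(a_k(n(t)) * tau)."""
    n = list(n0)
    for _ in range(round(t_end / tau)):
        a = propensities(n)                     # all rates evaluated at the current state n(t)
        for nu_k, a_k in zip(stoich, a):
            firings = poisson(a_k * tau, rng)   # number of times reaction k fires in this leap
            n = [n_i + firings * nu for n_i, nu in zip(n, nu_k)]
    return n

# Hypothetical test case: A -> B with c = 1, 1000 initial A molecules, tau = 0.01.
rng = random.Random(2)
stoich = [[-1, 1]]
propensities = lambda n: [1.0 * n[0]]
finals = [tau_leap([1000, 0], stoich, propensities, 0.01, 1.0, rng) for _ in range(200)]
mean_a = sum(n[0] for n in finals) / len(finals)
```

For this linear reaction the expected value of the explicit scheme follows the forward Euler recursion, so the mean of $n_A$ at time 1 should be near $1000(1-0.01)^{100}$ rather than the exact $1000e^{-1}$; the gap illustrates the first-order bias that the step-size restrictions discussed above are meant to control.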

2.3.2 Performing Systems Level Tasks with Stochastic Models

Employing kinetic Monte Carlo models for systems level tasks is an area of active research. Raimondeau, Aghalayam, Mhadeshwar, and Vlachos consider sensitivities via finite differences and parameter estimation for kinetic Monte Carlo simulations [105]. Drews, Braatz, and Alkire consider calculating the sensitivity for the mean of multiple Monte Carlo simulations via finite differences, and apply this method to copper electrodeposition to determine which parameter perturbations most significantly affect the measurements [25]. Gallivan and Murray consider model reduction techniques for the chemical master equation [39], then use the reduced models to determine optimal open-loop temperature profiles for epitaxial thin film growth [38]. Lou and Christofides consider control of growth rate and surface roughness in thin film growth [81, 82], employing proportional integral control that uses a kinetic Monte Carlo model to provide information about interactions between outputs and manipulated inputs. This simple form of feedback control does not require an optimization. Laurenzi uses a genetic algorithm to estimate parameters for a model of aggregating blood platelets and neutrophils [78]. Armaou and Kevrekidis employ a coarse time-stepper and a direct stochastic optimization method (Hooke-Jeeves) to determine an optimal control policy for a set of reactions on a catalyst surface [4]. Siettos, Armaou, Makeev and Kevrekidis use the coarse time stepper to identify the local linearization of the nonlinear stochastic model at a steady state of interest [138]. Given the local linearization of the model, standard linear quadratic control theory is then applied. Armaou, Siettos and Kevrekidis consider extending this control approach to spatially distributed processes [5]. Finally, Siettos, Maroudas and Kevrekidis construct bifurcation diagrams for the mean of the stochastic models [139].


2.4 Population Balance Models

Stochastic models of chemical kinetics pose one alternative to traditional deterministic models for modeling intracellular kinetics. Many biological systems of interest, however, consist of populations of cells influencing one another. Here, we consider the dynamic behavior of cell populations undergoing viral infections.

Traditionally, mathematical models for viral infections have focused solely on events occurring at either the intracellular or extracellular level. At the intracellular level, kinetic models have been applied to examine the dynamics of how viruses harness host cells to replicate more virus [73, 27, 29, 3], and how drugs targeting specific virus components affect this replication [122, 30]. These models, however, consider only one infection cycle, whereas infections commonly consist of numerous infection cycles. At the extracellular level, researchers have considered how drug therapies affect the dynamics of populations of viruses [164, 62, 98, 13, 100]. These models, though, neglect the fact that these drugs target specific intracellular viral components. To better understand the interplay of intracellular and extracellular events, a different modeling framework is necessary. We propose cell population balances as one such framework.

Mathematical models for cell population dynamics may be effectively grouped by two distinctive features: whether or not the model has structure, and whether or not the model has segregations [6]. If a model has structure, then multiple intracellular components affect the dynamics of the cell population. If a model has segregations, then some cellular characteristic can be employed to distinguish among different cells in a population. Table 2.1 summarizes the different combinations of models arising from these features. In this context, current extracellular models are equivalent to unstructured, unsegregated models because the cells in each population (uninfected and infected cells) are assumed indistinguishable from each other.

                Unstructured                        Structured

Unsegregated    Most idealized case.                Multicomponent average cell
                Cell population treated as          description
                one-component solute

Segregated      Single component,                   Multicomponent description of
                heterogeneous individual cells      cell-to-cell heterogeneity.
                                                    Most realistic case

Table 2.1: Types of cell population models [6]

The derivation of structured, segregated models stems from the equation of continuity. In particular, the derivation proceeds as before up to the microscopic equation (2.8), but now considers the effect of various internal segregations upon the population behavior. Fredrickson, Ramkrishna, and Tsuchiya consider the details of this derivation in their seminal contribution [36]. In recent years, this modeling framework has returned to the literature as researchers strive to adequately reconcile model predictions with the dynamics demonstrated by experimental data [80, 10, 33]. Also, new measurements such as flow cytometry offer the promise of actually differentiating between cells of a given population [1, 67], again implying the need to model distinctions between cells in a given population.

Notation

$a_\mu(n)$    $\mu$th reaction rate
$c_\mu\,dt$    average probability to $O(dt)$ that reaction $\mu$ will occur in the next time interval $dt$
$d\Omega$    differential change in the control surface $S(t)$
$d\Omega_e$    differential change in the reactor surface $S_e$
$e_k$    deviation between the predicted and actual measurement at time $t_k$
$F$    total flux of the quantity $\eta(t,z)$
$f$    diffusive contribution to the total flux $F$
$h_\mu$    number of distinct molecular reactant combinations for reaction $\mu$ at a given time
$J$    Jacobian
$m$    mean of a probability distribution
$\mathcal{N}(m,C)$    normal distribution with mean $m$ and covariance $C$
$N_0$    matrix containing all possible molecular configurations at time $t_0$
$n$    vector of the number of molecules for each chemical species
$n_i$    $i$th Monte Carlo reconstruction of the vector $n$
$\mathbf{n}_e$    normal vector pointing from the reactor surface $S_e$ away from the volume $V_e$
$\mathbf{n}_s$    normal vector pointing from the surface $S(t)$ away from the volume $V(t)$
$n_s$    total number of possible species
$P$    probability
$\mathcal{P}(m)$    random number drawn from the Poisson distribution with mean $m$
$p$    random number from the uniform distribution (0, 1)
$q$    effluent volumetric flow rate
$q_f$    feed volumetric flow rate
$R_\eta$    production rate of the species $\eta$
$r_{tot}$    sum of reaction rates
$S_e$    time-varying surface of the reactor volume $V_e$
$S(t)$    arbitrary, time-varying control surface spanning a space in $z$
$s$    sensitivity of the state $x$ with respect to the parameters $\theta$
$\dot{s}$    first derivative of the sensitivity with respect to time
$t$    time
$t_k$    discrete sampling time
$V_e$    time-varying reactor volume
$V(t)$    arbitrary, time-varying control volume spanning a space in $z$
$v_k$    realization of the variable $\xi$ at time $t_k$
$v_s$    velocity vector for the surface $S(t)$
$v_x$    $x$-component of the velocity vector $v_z$
$v_z$    velocity vector for material flowing through the volume $V(t)$
$X$    random variable
$x$    external characteristics
$x$    state
$\dot{x}$    first derivative of the state with respect to time
$x_k$    state at time $t_k$
$Y_k$    distribution for the measurement $y_k$
$y$    internal characteristics
$y_k$    measurement at time $t_k$
$Z_N$    random variable whose limiting distribution as $N \to \infty$ is the normal distribution
$z$    internal and external characteristics
$\Gamma$    random number drawn from the gamma distribution
$\delta$    Dirac delta function
$\eta(t,z)\,dz$    mass of reactants or products
$\Theta$    distribution for the parameter set $\theta$
$\theta$    parameter set for a given model
$\mu$    one possible reaction in the stochastic kinetics framework
$\nu$    stoichiometric matrix
$\xi$    $\mathcal{N}(0,\Pi)$-distributed random variable
$\Pi$    covariance matrix for the random variable $\xi$
$\sigma$    standard deviation
$\tau$    time of the next stochastic reaction
$\phi$    objective function
$\psi$    random variable


Chapter 3

Motivation

The motivation for this work is the current state of stochastic and deterministic methods used to model chemically reacting systems. For example, the rapid growth of biological measurements on the intracellular level (e.g. microarray and proteomic data) will require much more complicated models to adequately assimilate the data contained by these measurements. Therefore we seek to improve the current techniques used to evaluate and manipulate stochastic and deterministic models. In this chapter, we examine the current limitations of the existing methods for using stochastic models, traditional deterministic models, and state estimation techniques.

3.1 Current Limitations of Stochastic Models

We see two primary limitations of current methods for handling stochastic models:

1. exact integration methods scale with the number of reaction events, and

2. methods for performing systems level tasks require the use of noisy finite differencetechniques.

We illustrate these points next.

3.1.1 Integration Methods

The current options for performing exact simulation of stochastic chemical kinetics are Gillespie's direct and first reaction methods [45, 46], and the next reaction method of Gibson and Bruck [43]. Gibson and Bruck [43] analyze the computational expenditure of these methods, and find that Gillespie's methods at best scale with the number of reaction events, whereas their next reaction method scales with the log of the number of reaction events. To illustrate this point, we consider the simple reaction

$$2A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} B, \qquad a(\varepsilon) = \frac{1}{2} k_1 n_A (n_A - 1) \tag{3.1}$$

in which


• $k_1 = 4/(3n_{Ao})$ and $k_{-1} = 0.1$,

• $\varepsilon$ is the dimensionless extent of reaction,

• $a(\varepsilon)$ is the reaction propensity function,

• $n_A$ is the number of A molecules, and

• $n_{Ao}$ is the initial number of A molecules.

We consider simulating this system in which there are initially zero B molecules and a variable number of A molecules. For this system, the number of possible reactions scales with $n_{Ao}$. We scale rate constants for reactions with nonlinear rates so that the dimensionless extent of reaction remains constant as the variable $n_{Ao}$ changes. Figure 3.1 demonstrates that the computational time for one simulation scales linearly with $n_{Ao}$, as expected.
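The linear scaling can be checked without timing anything by counting reaction events, since the cost of the direct method is proportional to that count. The sketch below is a rough illustration rather than the experiment behind Figure 3.1: it assumes a first-order reverse propensity $k_{-1} n_B$ (the text gives only the forward propensity), a final time of 10, and hypothetical function names.

```python
import math
import random

def count_events(n_a0, t_end, rng, k_rev=0.1):
    """Direct-method simulation of 2A <-> B; returns the number of reaction events."""
    k1 = 4.0 / (3.0 * n_a0)                      # rate constant scaled with n_Ao as in the text
    t, n_a, n_b, events = 0.0, n_a0, 0, 0
    while True:
        a_fwd = 0.5 * k1 * n_a * (n_a - 1)       # forward propensity from (3.1)
        a_rev = k_rev * n_b                      # assumed first-order reverse propensity
        r_tot = a_fwd + a_rev
        if r_tot <= 0.0:
            return events
        t += -math.log(1.0 - rng.random()) / r_tot
        if t > t_end:
            return events
        if (1.0 - rng.random()) * r_tot <= a_fwd:
            n_a, n_b = n_a - 2, n_b + 1          # forward reaction fires
        else:
            n_a, n_b = n_a + 2, n_b - 1          # reverse reaction fires
        events += 1

rng = random.Random(3)
mean_events = {}
for n_a0 in (50, 100, 200, 400):
    runs = [count_events(n_a0, 10.0, rng) for _ in range(20)]
    mean_events[n_a0] = sum(runs) / len(runs)
    print(n_a0, mean_events[n_a0])
```

Doubling $n_{Ao}$ should roughly double the mean event count, which is the behavior the computational-time plot reflects.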

Figure 3.1: Computational time per simulation as a function of $n_{Ao}$. Line represents the least-squares fit of the data assuming that a simulation with $n_{Ao} = 0$ requires no computational time. (Plot axes: initial number of A molecules $n_{Ao}$ versus time required for simulation in seconds.)

The question arises, then, as to the suitability of these methods for simulating intracellular chemistry. As an example, we consider the case of a rapidly growing Escherichia coli (or E. coli) cell. For this circumstance, one E. coli cell contains approximately four molecules of deoxyribonucleic acid (DNA), 1000 molecules of messenger ribonucleic acid (mRNA), and $10^6$ proteins [6]. Simulating these conditions with methods that scale with the number of reaction events is clearly acceptable for modeling the DNA and mRNA species, but simulating events at the protein level is not a trivial task.

Now consider Figure 3.2, which plots how an intensive variable such as the extent of reaction $\varepsilon$ changes as $n_{Ao}$ increases. This figure demonstrates that, as the number of molecules increases, the extent appears to be converging to a smoothly-varying deterministic trajectory. This simulation exhibits precisely the mathematical result proven by Kurtz: in the thermodynamic limit ($n \to \infty$, $V \to \infty$, $n/V$ = constant), the master equation written for $n$ (number of molecules) collapses to a deterministic equation for $c$ (concentration of molecules) [76]. The appeal of the deterministic equation is that the computational time required for its solution does not scale with the simulated number of molecules. For E. coli, such an approximation may certainly be valid for reactions among proteins, but not for those among DNA. We address this issue further in Chapter 4.

Figure 3.2: Extent of reaction as a function of $n_{Ao}$. (Plot axes: time, 0 to 10, versus extent of reaction; curves for $n_{Ao}$, $10 \times n_{Ao}$, $20 \times n_{Ao}$, and the deterministic solution.)

3.1.2 Systems Level Tasks

A secondary issue arising from stochastic models is how to extract information from these models. Currently, most researchers merely integrate these types of models to determine the dynamic behavior of the system given a specific initial condition and inputs. As pointed out previously, this integration is potentially expensive. One recent strategy for obtaining more information from the model involves using finite difference methods to obtain estimates of the model sensitivity [105, 25], then using these sensitivities for parameter estimation and steady-state analysis. For example, we could determine the sensitivity of reaction (3.1) to the forward rate constant $k_1$ by evaluating the central finite difference

$$s = \frac{\partial n_A}{\partial k_1} \approx \frac{F(k_1+\delta) - F(k_1-\delta)}{2\delta} \tag{3.2}$$

in which

• $s$ is the sensitivity of the state $n_A$ with respect to the parameter $k_1$,

• $F(x)$ yields a trajectory from a stochastic model integration given the parameter $k_1 = x$ and $n_{Ao}$ initial molecules, and

• $\delta$ is a perturbation to the parameter $k_1$.

Figure 3.3 plots the perturbed trajectories and the desired sensitivity. At the smaller perturbation of $\delta = 0.2k_1$, the stochastic fluctuations of the simulation dominate, yielding a noisy, poor sensitivity estimate. The larger perturbation of $\delta = 0.8k_1$ yields a smoother sensitivity, but the accuracy of the central finite difference is questionable. There is obviously significant room for improvement in the methods used to calculate this quantity. We consider this issue further in Chapters 5 and 6. Additionally, little work has focused on how best to use information obtained from simulations of stochastic differential equations. Accordingly, we consider sensitivities for these types of models in Chapter 7. Finally, we apply many of the tools developed in these chapters to crystallization systems in Chapter 8.
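The trade-off between noise and perturbation size can be seen in a small experiment. The sketch below evaluates the central difference (3.2) at a single time point from pairs of independent direct-method simulations and compares the scatter of the estimates for the two perturbation sizes. It is an illustration only: the first-order reverse propensity $k_{-1} n_B$, the choice $n_{Ao} = 100$, the final time of 2, the seed, and the function names are all assumptions, and the figure in the text shows whole trajectories rather than one time point.

```python
import math
import random
import statistics

def simulate_n_a(n_a0, k1, k_rev, t_end, rng):
    """Direct-method simulation of 2A <-> B; returns n_A at time t_end."""
    t, n_a, n_b = 0.0, n_a0, 0
    while True:
        a_fwd = 0.5 * k1 * n_a * (n_a - 1)       # forward propensity from (3.1)
        a_rev = k_rev * n_b                      # assumed first-order reverse propensity
        r_tot = a_fwd + a_rev
        if r_tot <= 0.0:
            return n_a
        t += -math.log(1.0 - rng.random()) / r_tot
        if t > t_end:
            return n_a
        if (1.0 - rng.random()) * r_tot <= a_fwd:
            n_a, n_b = n_a - 2, n_b + 1
        else:
            n_a, n_b = n_a + 2, n_b - 1

def fd_estimates(delta_frac, n_rep, rng, n_a0=100, k_rev=0.1, t_end=2.0):
    """Central finite-difference estimates (3.2), each from one pair of simulations."""
    k1 = 4.0 / (3.0 * n_a0)
    delta = delta_frac * k1
    out = []
    for _ in range(n_rep):
        plus = simulate_n_a(n_a0, k1 + delta, k_rev, t_end, rng)
        minus = simulate_n_a(n_a0, k1 - delta, k_rev, t_end, rng)
        out.append((plus - minus) / (2.0 * delta))
    return out

rng = random.Random(0)
small = fd_estimates(0.2, 200, rng)   # delta = 0.2 * k1
large = fd_estimates(0.8, 200, rng)   # delta = 0.8 * k1
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print("scatter, small delta:", spread_small)
print("scatter, large delta:", spread_large)
```

Because the simulation noise is divided by $2\delta$, the smaller perturbation should give visibly more scatter, while the larger perturbation reduces scatter at the price of a coarser approximation to the derivative.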

Figure 3.3: Finite difference sensitivity for the stochastic model: (a) small perturbation ($\delta = 0.2k_1$) and (b) large perturbation ($\delta = 0.8k_1$). (Each panel plots the perturbed trajectories of $n_A$ for $+\delta$ and $-\delta$ together with the sensitivity $S$ versus time, 0 to 10.)

3.2 Current Limitations of Traditional Deterministic Models

We restrict this examination to modeling of viral infections, although the same arguments generally hold for virtually all systems involving populations of cells. Figure 3.4 generalizes the cyclic nature of viral infections. The initiation of a viral infection occurs when the virus is introduced to a host organism. The virus then targets specific uninfected host cells for infection. Once infected, these host cells become in essence “factories” that replicate and secrete the virus. The cycle of infection and virus production then continues. During this infection cycle, uninfected cells may continue to reproduce. This cycle is essentially the one proposed by Nowak and May [98].

[Figure 3.4 diagram omitted. Diagram labels: Uninfected Cells, Infected Cells, Free Virus, Generation, death.]

Figure 3.4: Cyclic nature of viral infections.

These types of models usually assume that the production rate of virus is directly proportional to the concentration of infected cells. This assumption generally permits reduction of the model to a coupled set of ordinary differential equations (e.g. three ODEs to model the uninfected cell population, the infected cell population, and the virus population). This assumption is a gross simplification; in fact, many modelers have focused entirely on considering the complex chemistry required at the intracellular level to produce viral progeny [73, 27, 29, 3]. A more realistic picture of viral infections consists of a combination of the intracellular and extracellular levels. As described in Chapter 2, cell population balance models offer one means of combining these two levels. Since the literature review uncovered little active research in this area, we seek to explore the utility of the cell population balance in explaining biological phenomena. We believe that refined versions of these models may lead to insights on how to best control viral propagation. We first explore the utility of the cell population balance in a numerical setting in Chapter 9, then investigate whether or not these types of models are useful in explaining actual experimental data in Chapter 10. Finally, we introduce an approximation that significantly reduces the computational expense of solving this class of models in Chapter 11.


3.3 Current Limitations of State Estimation Techniques

It is well established that the Kalman filter is the optimal state estimator for unconstrained, linear systems subject to normally distributed state and measurement noise. Many physical systems, however, exhibit nonlinear dynamics and have states subject to hard constraints, such as nonnegative concentrations or pressures. Hence Kalman filtering is no longer directly applicable. Perhaps the most popular method for estimating the state of nonlinear systems is the extended Kalman filter, which first linearizes the nonlinear system, then applies the Kalman filter update equations to the linearized system [144]. The extended Kalman filter assumes that the a posteriori distribution is normally distributed (unimodal), hence the mean and the mode of the distribution are equivalent. Questions that arise are: how does this strategy perform when multiple modes arise in the a posteriori distribution? Also, are multiple modes even a concern for chemically reacting systems? Finally, can multiple modes in the estimator hinder closed-loop performance? We address the first two of these questions in Chapter 12, and the final question in Chapter 13.

Notation

a(ε)    reaction propensity function
c       concentrations for all reaction species
kj      rate constant for reaction j
n       number of molecules for all reaction species
nA      number of molecules for species A
s       sensitivity
δ       finite difference perturbation
ε       extent of reaction


Chapter 4

Approximations for Stochastic Reaction Models¹

Exact methods are available for the simulation of isothermal, well-mixed stochastic chemical kinetics. As increasingly complex physical systems are modeled, however, these methods become difficult to solve because the computational burden scales with the number of reaction events [43]. We address one aspect of this problem: the case in which reacting species fluctuate by different orders of magnitude. We expand upon the idea of a partitioned system [113, 157] and simulation via Gillespie’s direct method [45, 46] to construct approximations that reduce the computational burden for simulation of these species. In particular, we partition the system into subsets of “fast” and “slow” reactions. We make various approximations for the “fast” reactions (either invoking an equilibrium approximation, or treating them deterministically or as Langevin equations), and treat the “slow” reactions as stochastic events. Such approximations can significantly reduce computational load while accurately reconstructing at least the first two moments of the probability distribution for each species.

This chapter provides a theoretical background for such approximations and outlines strategies for computing these approximations. First, we examine the theoretical underpinnings of the approximations. Next, we propose numerical algorithms for performing the simulations, review several practical implementation issues, and propose a further approximation. We then consider three motivating examples drawn from the fields of enzyme kinetics, particle technology, and biotechnology that illustrate the accuracy and computational efficiency of these approximations. Finally, we critically examine the technique and present conclusions.

4.1 Stochastic Partitioning

The key ideas are to 1) model the state of the reaction system using extents of reaction as opposed to molecules of species, and 2) partition the state into subsets of “fast” and “slow” reactions. With these two modeling choices, we can exploit the structure of the chemical master equation, the governing equation for the evolution of the system probability density, by making order of magnitude arguments. We then derive the master equations that govern the “fast” and “slow” reaction subsets. This section outlines these manipulations in greater detail.

¹ Portions of this chapter appear in Haseltine and Rawlings [57].

We model the state of the system, x, using an extent for each irreversible reaction². An extent of reaction model is consistent with a molecule balance model since

$$ n = n_0 + \nu^T x \tag{4.1} $$

in which, assuming that there are m extents of reaction and p chemical species:

• x is the state of the system in terms of extents (an m-vector),

• n is the number of molecules (a p-vector),

• n0 is the initial number of molecules (a p-vector), and

• ν is the stoichiometric matrix (an m × p matrix).
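As a concrete illustration of equation (4.1), the sketch below converts a vector of extents into molecule counts. The reaction set (A <-> B -> C written as three irreversible reactions), the initial condition, and the helper name `molecules_from_extents` are assumptions chosen for illustration.

```python
# Hypothetical example: A <-> B -> C written as three irreversible reactions.
# Rows of nu are reactions (m = 3); columns are the species A, B, C (p = 3).
nu = [
    [-1,  1, 0],   # reaction 1: A -> B
    [ 1, -1, 0],   # reaction 2: B -> A
    [ 0, -1, 1],   # reaction 3: B -> C
]

def molecules_from_extents(n0, nu, x):
    """Evaluate n = n0 + nu^T x for an m-vector of extents x."""
    p = len(n0)
    return [n0[s] + sum(nu[k][s] * x[k] for k in range(len(x))) for s in range(p)]

n0 = [10, 0, 0]        # initial molecules of A, B, C
x = [7, 2, 3]          # reaction 1 fired 7 times, reaction 2 twice, reaction 3 three times
n = molecules_from_extents(n0, nu, x)
```

Because every reaction is irreversible, each extent only increases, and the bounds on x follow from requiring every entry of n to remain nonnegative.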

The upper and lower bounds of x are constrained by the limiting reactant species. We arbitrarily set the initial condition to the origin. Given assumptions outlined by Gillespie [48], the governing equation for this system is the chemical master equation

$$ \frac{dP(x;t)}{dt} = \sum_{k=1}^{m} \left[ a_k(x - I_k)\, P(x - I_k; t) - a_k(x)\, P(x;t) \right] \tag{4.2} $$

in which

• P (x; t) is the probability that the system is in state x at time t,

• $a_k(x)dt$ is the probability to order dt that reaction k occurs in the time interval [t, t + dt), and

• $I_k$ is the kth column of the (m × m) identity matrix I.
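To make the structure of equation (4.2) concrete, the sketch below writes out the master equation for the smallest possible case and integrates it directly. It assumes a single hypothetical irreversible reaction A -> B with N initial molecules of A, so the state is one extent x = 0, ..., N with propensity a(x) = k(N − x). The explicit Euler integrator and the function names are illustrative choices, not part of the original method.

```python
import math

def cme_rhs(P, k, N):
    """Right-hand side of the master equation for one irreversible reaction A -> B.

    The state is the extent x = 0..N and the propensity is a(x) = k * (N - x).
    """
    dP = []
    for x in range(N + 1):
        # probability flowing in from state x - 1 (none for x = 0)
        inflow = k * (N - (x - 1)) * P[x - 1] if x >= 1 else 0.0
        # probability flowing out of state x
        outflow = k * (N - x) * P[x]
        dP.append(inflow - outflow)
    return dP

def integrate_cme(k, N, t_end, dt=1e-3):
    """Explicit Euler integration from the initial condition P(x = 0) = 1."""
    P = [1.0] + [0.0] * N
    for _ in range(int(round(t_end / dt))):
        dP = cme_rhs(P, k, N)
        P = [p + dt * d for p, d in zip(P, dP)]
    return P

P = integrate_cme(k=1.0, N=20, t_end=1.0)
total_probability = sum(P)
mean_extent = sum(x * p for x, p in enumerate(P))
```

Even in this one-reaction case the state vector has N + 1 entries; with m reactions the number of states grows combinatorially, which is why the chapter turns to simulation rather than direct solution.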

The structure of I arises for this particular chemical master equation because the reactions are irreversible. Also, we have implicitly conditioned the master equation (4.2) on a specific initial condition, i.e. $n_0$. Generalizing the analysis presented in this chapter to a distribution of initial conditions $(n_{0,1}, \ldots, n_{0,n})$ is straightforward due to the relation

$$ P(x \,|\, n_{0,1}, \ldots, n_{0,n}; t) = \sum_{j} P(x \,|\, n_{0,j}; t)\, P(n_{0,j}) \tag{4.3} $$

and the fact that the values of $P(n_{0,j})$ are specified in the initial condition.

² Note that reversible reactions can be modeled as two irreversible reactions.

Now we examine the time scale over which the extents of reaction change. We must first determine a relevant time scale so that we can partition the extents into two subsets: those that have small propensity functions ($a_k(x)$’s) and occur few if any times over the time scale, and those that have large propensity functions and occur numerous times over the given time scale. We designate these subsets of x as the (m − l)-vector y and the l-vector z, respectively. Note that

$$ x = \begin{bmatrix} y \\ z \end{bmatrix} \quad \text{and} \quad I = \begin{bmatrix} I^y & 0 \\ 0 & I^z \end{bmatrix} \tag{4.4} $$

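in which $I^y$ and $I^z$ are ((m − l) × (m − l)) and (l × l) identity matrices, respectively. We also partition the reaction propensities into groups of fast ($c_j$) and slow ($b_j$)

$$ \begin{bmatrix} a_1(y,z;t) \\ \vdots \\ a_{m-l}(y,z;t) \\ a_{m-l+1}(y,z;t) \\ \vdots \\ a_m(y,z;t) \end{bmatrix} = \begin{bmatrix} b_1(y,z;t) \\ \vdots \\ b_{m-l}(y,z;t) \\ c_1(y,z;t) \\ \vdots \\ c_l(y,z;t) \end{bmatrix} \tag{4.5} $$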

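Equation (4.2) becomes

$$ \begin{aligned} \frac{dP(y,z;t)}{dt} ={} & \sum_{j=1}^{m-l} \left[ b_j(y - I^y_j, z)\, P(y - I^y_j, z; t) - b_j(y,z)\, P(y,z;t) \right] \\ & + \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P(y, z - I^z_k; t) - c_k(y,z)\, P(y,z;t) \right] \end{aligned} \tag{4.6} $$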

Ultimately, we are interested in determining an approximate governing equation for the evolution of the joint density, P(y, z; t), in regimes where fast reaction extents are much greater than slow reaction extents. Denoting the total extent space as X, we define a subspace $X_p \subset X$ for which

$$ c_k(y,z) \gg b_j(y,z) \quad \forall\, 1 \le k \le l,\; 1 \le j \le m-l,\; \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \tag{4.7} $$

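By defining the conditional and marginal probabilities over this subspace as

$$ P(y,z;t) = P(z|y;t)\, P(y;t) \quad \forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \tag{4.8} $$

$$ P(y;t) = \sum_{z} P(y,z;t) \quad \forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p \tag{4.9} $$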

we can alternatively derive evolution equations for both the marginal probability of the slow reactions, P(y; t), and the probability of the fast reactions conditioned on the slow reactions, P(z|y; t). Consequently, we then know how the fast and slow reactions evolve over this time scale. Also, this partitioning is similar to that used by Rao and Arkin [113], who partition the master equation by species to treat the quasi-steady-state assumption. We partition by reaction extents to treat fast and slow reactions.


All the manipulations performed in the next two subsections apply only for fast and slow reactions in the partitioned subspace $X_p$. To simplify the presentation of the results, we drop the implied notation

$$ \forall \begin{bmatrix} y \\ z \end{bmatrix} \in X_p $$

from all subsequent equations.

4.1.1 Slow Reaction Subset

We first address the subset of slow reaction extents y. From the definition of the marginal density,

$$ P(y;t) = \sum_{z} P(y,z;t) \tag{4.10} $$

Differentiating equation (4.10) with respect to time yields

$$ \frac{dP(y;t)}{dt} = \sum_{z} \frac{dP(y,z;t)}{dt} \tag{4.11} $$

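Now substitute the master equation (4.6) into equation (4.11) and manipulate to yield

\begin{align}
\frac{dP(y;t)}{dt} &= \sum_{z} \Bigg( \sum_{j=1}^{m-l} \left[ b_j(y - I^y_j, z)\, P(y - I^y_j, z; t) - b_j(y,z)\, P(y,z;t) \right] \notag \\
&\qquad\quad + \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P(y, z - I^z_k; t) - c_k(y,z)\, P(y,z;t) \right] \Bigg) \tag{4.12} \\
&= \sum_{z} \sum_{j=1}^{m-l} \left[ b_j(y - I^y_j, z)\, P(y - I^y_j, z; t) - b_j(y,z)\, P(y,z;t) \right] \notag \\
&\qquad + \underbrace{\Bigg( \sum_{z} \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P(y, z - I^z_k; t) - c_k(y,z)\, P(y,z;t) \right] \Bigg)}_{0} \tag{4.13} \\
&= \sum_{z} \sum_{j=1}^{m-l} \left[ b_j(y - I^y_j, z)\, P(y - I^y_j, z; t) - b_j(y,z)\, P(y,z;t) \right] \tag{4.14}
\end{align}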

Equation (4.14) is exact; we have made no approximations in its derivation. Also, if we rewrite the joint density in terms of the conditional density using the definition

$$ P(y,z;t) = P(z|y;t)\, P(y;t) \tag{4.15} $$

then one interpretation of this analysis is that the evolution of the marginal P(y; t) depends on the conditional density P(z|y; t). We consider deriving an evolution equation for this conditional density next.
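The cancellation of the fast-reaction terms marked in equation (4.13) can be made explicit with a change of summation index. The following is a sketch under two assumptions: the sum over z covers every reachable value of the fast extents, and the shifted terms vanish at the boundaries (P is zero for negative extents, and $c_k$ is zero once a limiting reactant is exhausted). The dummy variable $z'$ is introduced here only for illustration.

```latex
\begin{align*}
\sum_{z} c_k(y, z - I^z_k)\, P(y, z - I^z_k; t)
  &= \sum_{z'} c_k(y, z')\, P(y, z'; t), \qquad z' = z - I^z_k, \\
\sum_{z} \Big[ c_k(y, z - I^z_k)\, P(y, z - I^z_k; t) - c_k(y,z)\, P(y,z;t) \Big]
  &= 0 \qquad \forall\, 1 \le k \le l .
\end{align*}
```

The same shift does not remove the slow terms, because those are shifted in y rather than in the summation variable z.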


4.1.2 Fast Reaction Subset

We now address the evolution of the probability density for the subset of fast reactions conditioned on the subset of slow reactions, P(z|y; t). For our starting point, we use order of magnitude arguments, i.e. equation (4.7), to approximate the original master equation (4.6) as

$$ \frac{dP(y,z;t)}{dt} \approx \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P(y, z - I^z_k; t) - c_k(y,z)\, P(y,z;t) \right] \tag{4.16} $$

We define this approximate joint density as $P_A(y,z;t)$, and thus its evolution equation is

$$ \frac{dP_A(y,z;t)}{dt} \triangleq \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(y, z - I^z_k; t) - c_k(y,z)\, P_A(y,z;t) \right] \tag{4.17} $$

Following Rao and Arkin [113], we define the joint density $P_A(y,z;t)$ as the product of the desired conditional density $P_A(z|y;t)$ and the marginal density $P_A(y;t)$:

$$ P_A(y,z;t) = P_A(z|y;t)\, P_A(y;t) \tag{4.18} $$

Differentiating equation (4.18) with respect to time yields

$$ \frac{dP_A(y,z;t)}{dt} = \frac{dP_A(z|y;t)}{dt}\, P_A(y;t) + \frac{dP_A(y;t)}{dt}\, P_A(z|y;t) \tag{4.19} $$

Solving equation (4.19) for the desired conditional derivative yields

$$ \frac{dP_A(z|y;t)}{dt} = \frac{1}{P_A(y;t)} \left( \frac{dP_A(y,z;t)}{dt} - \frac{dP_A(y;t)}{dt}\, P_A(z|y;t) \right) \tag{4.20} $$

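Evaluating the marginal evolution equation by summing equation (4.17) over the fast extents z yields

\begin{align}
\frac{dP_A(y;t)}{dt} &= \sum_{z} \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(y, z - I^z_k; t) - c_k(y,z)\, P_A(y,z;t) \right] \tag{4.21} \\
&= 0 \tag{4.22}
\end{align}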

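Consequently, equation (4.20) becomes

\begin{align}
\frac{dP_A(z|y;t)}{dt} &= \frac{1}{P_A(y;t)} \left( \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(y, z - I^z_k; t) - c_k(y,z)\, P_A(y,z;t) \right] \right) \tag{4.23} \\
&= \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(z - I^z_k | y; t) - c_k(y,z)\, P_A(z|y;t) \right] \tag{4.24}
\end{align}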

which is the desired closed-form expression for the conditional density PA(z|y; t).


4.1.3 The Combined System

For the slow reactions, we approximate the joint density P (y,z; t) as

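$$ P(y,z;t) \approx P_A(z|y;t)\, P(y;t) \tag{4.25} $$

Combining the evolution equations for the slow and fast reaction extents, i.e. equations (4.14) and (4.24) respectively, then yields the following coupled master equations

\begin{align}
\frac{dP(y;t)}{dt} &\approx \sum_{j=1}^{m-l} \Bigg[ \bigg( \sum_{z} b_j(y - I^y_j, z)\, P_A(z | y - I^y_j; t) \bigg) P(y - I^y_j; t) - \bigg( \sum_{z} b_j(y,z)\, P_A(z|y;t) \bigg) P(y;t) \Bigg] \tag{4.26a} \\
\frac{dP_A(z|y;t)}{dt} &= \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(z - I^z_k | y; t) - c_k(y,z)\, P_A(z|y;t) \right] \tag{4.26b}
\end{align}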

From these equations, using order of magnitude arguments to produce a time-scale separation has clearly had two effects: first, the coupled expressions for the marginal and conditional evolution equations in (4.26) are Markov in nature; and second, the evolution equation for the fast extents conditioned on the slow extents, $P_A(z|y)$, has decoupled from the slow extent marginal, P(y). Additionally, exact solution of the coupled master equations (4.26) is at least as difficult as the original master equation (4.2) due to the fact that one must solve an individual master equation of the form of equation (4.26b) for every element of the slow conditional equation (4.26a). From a simulation perspective, equation (4.26) is also as difficult to evaluate as the original master equation (4.2) since both of the coupled master equations are discrete and time-varying. However, approximating the fast extents can significantly reduce the computational expense involved with simulating these coupled equations. Different approximations are applicable based on the characteristic relaxation times of the fast and slow extents. Next, we investigate two such approximations: an equilibrium approximation for the case in which the fast extents relax significantly faster than the slow extents, and a Langevin or deterministic approximation for the case in which both fast and slow extents relax at similar rates.

4.1.4 The Equilibrium Approximation

We first consider the case in which the relaxation time for the fast extents is significantly smaller than the expected time to the first slow reaction. To illustrate this case, we consider the simple example

$$ A \underset{k_2}{\overset{k_1}{\rightleftharpoons}} B \overset{k_3}{\longrightarrow} C \tag{4.27} $$


We denote the extents of reaction for this example as ε1, ε2, and ε3, and define the reaction propensities as

$$ a_1(x) = k_1 n_A \tag{4.28a} $$

$$ a_2(x) = k_2 n_B \tag{4.28b} $$

$$ a_3(x) = k_3 n_B \tag{4.28c} $$

If $k_1, k_2 \gg k_3$, then we can partition ε1 and ε2 as the fast reactions z, and ε3 as the slow extent of reaction y. Additionally, we would expect the fast extents of reaction to equilibrate (relax) before the expected time to the first slow reaction. Returning to the master equation formalism, this equilibration implies that we should approximate the fast reactions, equation (4.26b), as

$$ 0 \approx \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(z - I^z_k | y; t) - c_k(y,z)\, P_A(z|y;t) \right] \tag{4.29} $$

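The resulting coupled master equations are

\begin{align}
\frac{dP(y;t)}{dt} &\approx \sum_{j=1}^{m-l} \Bigg[ \bigg( \sum_{z} b_j(y - I^y_j, z)\, P_A(z | y - I^y_j; t) \bigg) P(y - I^y_j; t) - \bigg( \sum_{z} b_j(y,z)\, P_A(z|y;t) \bigg) P(y;t) \Bigg] \tag{4.30a} \\
0 &= \sum_{k=1}^{l} \left[ c_k(y, z - I^z_k)\, P_A(z - I^z_k | y; t) - c_k(y,z)\, P_A(z|y;t) \right] \tag{4.30b}
\end{align}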

This coupled system, equation (4.30), is markedly similar to the governing equations for the slow-scale simulation recently proposed by Cao, Gillespie, and Petzold [16]. Their derivation deviates from ours, however, and the differences deserve some attention. First, Cao, Gillespie, and Petzold [16] partition on the basis of fast and slow species rather than extents, with fast species affected by at least one fast reaction and slow species affected by solely slow reactions. We have chosen to remain in the extent space because extents are equilibrating, not chemical species. Also, Cao, Gillespie, and Petzold [16] use the construct of a virtual fast system to arrive at an evolution equation for the slow species (similar to our evolution equation for the slow extent marginal, equation (4.14)), a choice that obviates the need for defining an evolution equation for the conditional density P(z|y). In contrast to this approach, we believe that our approach has a much tighter connection to the original master equation due to the fact that we derived the coupled system, equation (4.30), directly from the original master equation, and the fact that we can obtain an approximate value of the joint density P(y, z; t) through equation (4.25). Also, all approximations arise directly from order of magnitude and relaxation time arguments.
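For the example reaction (4.27), the equilibrium condition on the fast subset and the averaged slow propensity in the slow master equation can be evaluated in closed form. The sketch below is illustrative and rests on several assumptions: it works with the molecule count n_B rather than with the fast extents themselves, it holds n_A + n_B fixed at `n_total` between slow events, it takes the slow step B -> C to have propensity k3*n_B, and it builds the equilibrium distribution from detailed balance. The function names are hypothetical.

```python
def equilibrium_distribution(k1, k2, n_total):
    """Stationary distribution of n_B for A <-> B with n_A + n_B = n_total.

    Built from detailed balance:
        k1 * (n_total - n) * pi(n) = k2 * (n + 1) * pi(n + 1).
    """
    weights = [1.0]
    for n in range(n_total):
        weights.append(weights[-1] * k1 * (n_total - n) / (k2 * (n + 1)))
    norm = sum(weights)
    return [w / norm for w in weights]

def averaged_slow_propensity(k1, k2, k3, n_total):
    """Sum over the fast state of b * P_A, with b = k3 * n_B (assumed form)."""
    pi = equilibrium_distribution(k1, k2, n_total)
    return sum(k3 * n_b * p for n_b, p in enumerate(pi))

k1, k2, k3, n_total = 100.0, 50.0, 1.0, 30
pi = equilibrium_distribution(k1, k2, n_total)
b_bar = averaged_slow_propensity(k1, k2, k3, n_total)
```

For this linear example the equilibrium distribution is binomial with success probability k1/(k1 + k2), so the averaged propensity reduces to k3 * n_total * k1/(k1 + k2). Nonlinear fast reactions generally require a numerical solution instead.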

4.1.5 The Langevin and Deterministic Approximations

We now consider the case in which both fast and slow extents relax at similar time scales. Revisiting the reaction example 4.27, we consider the case in which $k_1 \gg k_2, k_3$ and $n_{Ao} \gg n_{Bo}, n_{Co}$, in which the notation $n_{Ao}$ refers to the initial number of A molecules. For this example, we partition ε1 as the fast extent of reaction z, and ε2 and ε3 as the slow extents of reaction y. Until a significant amount of A has been consumed, we would expect numerous firings of ε1 interspersed with relatively few firings of ε2 and ε3. Clearly the system never equilibrates, but rather fast and slow reactions fire until the fast extent reaches a similar order of magnitude as one of the slow extents. Note also that, in contrast to the equilibrium approximation, we have introduced the number of molecules into the time-scale argument. For most cases, we expect this time-scale argument to involve large numbers of reacting molecules, but such involvement is not always the case as demonstrated in the viral infection example presented later in this chapter. Rather, we require that the magnitude of the fast reactions remain large relative to the magnitude of the slow reactions through the expected time of the first slow reaction.

Returning to the master equation formalism, this process requires a different approximation for the conditional density P(z|y). We proceed by demonstrating, as outlined by Gardiner [41], how this subset can be approximated using the Langevin approximation. Define the characteristic size of the system to be Ω, and use this size to recast the master equation (4.24) in terms of intensive variables (let z ← z/Ω). Performing a Kramers-Moyal expansion on this master equation results in a system size expansion in Ω. In the limit as z and Ω become large, the discrete master equation (4.26b) can be approximated by its first two differential moments with the continuous Fokker-Planck equation

$$ \frac{\partial P_A(z|y;t)}{\partial t} = -\sum_{i=1}^{l} \frac{\partial}{\partial z_i} \big( A_i(y,z)\, P_A(z|y;t) \big) + \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \frac{\partial^2}{\partial z_i \partial z_j} \big( B_{ij}(y,z)^2\, P_A(z|y;t) \big) \tag{4.31} $$

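in which (noting that z consists of extents of reaction):

$$ A(y,z) = \sum_{i=1}^{l} I^z_i\, c_i(y,z) = \begin{bmatrix} c_1(y,z) & c_2(y,z) & \cdots & c_l(y,z) \end{bmatrix}^T \tag{4.32} $$

$$ [B(y,z)]^2 = \sum_{i=1}^{l} I^z_i (I^z_i)^T c_i(y,z) = \mathrm{diag}\big( c_1(y,z), c_2(y,z), \ldots, c_l(y,z) \big) \tag{4.33} $$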

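Here, diag(a, . . . , z) defines a matrix with elements a, . . . , z on the diagonal. Equation (4.31) has an Itô solution of the form

\begin{align}
dz_i &= A_i(y,z)\,dt + \sum_{j=1}^{l} B_{ij}(y,z)\, dW_j \quad \forall\, 1 \le i \le l \tag{4.34a} \\
&= c_i(y,z)\,dt + \sqrt{c_i(y,z)}\, dW_i \quad \forall\, 1 \le i \le l \tag{4.34b}
\end{align}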

in which W is a vector of Wiener processes. Equation (4.34) is the chemical Langevin equation, whose formulation was recently readdressed by Gillespie [49]. Note the difference between equations (4.31) and (4.34). The Fokker-Planck equation (4.31) specifies the distribution of the stochastic process, whereas the stochastic differential equation (4.34) specifies how the trajectories of the state evolve. Also, bear in mind that whether or not a given Ω is large enough to permit truncation of the system size expansion is relative. In this case, Ω is of sufficient magnitude to make this approximation valid for only a subset of the reactions, not the entire system.
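A trajectory of the chemical Langevin equation (4.34b) can be generated with a simple Euler-Maruyama scheme. The sketch below is a minimal illustration, assuming a single hypothetical fast reaction A -> B with propensity c(z) = k1(nA0 − z) and a fixed step size; the integrator choice, the clamp on the propensity, and the name `simulate_langevin` are assumptions, not the scheme used in this work.

```python
import math
import random

def simulate_langevin(k1, nA0, t_end, dt, rng, noise=True):
    """Euler-Maruyama integration of dz = c(z) dt + sqrt(c(z)) dW for one fast extent.

    Hypothetical fast reaction A -> B with propensity c(z) = k1 * (nA0 - z).
    Setting noise=False drops the Wiener term, leaving only the drift.
    """
    z = 0.0
    for _ in range(int(round(t_end / dt))):
        c = max(k1 * (nA0 - z), 0.0)          # clamp so the square root stays real
        dW = rng.gauss(0.0, math.sqrt(dt)) if noise else 0.0
        z += c * dt + math.sqrt(c) * dW
    return z

rng = random.Random(1)
paths = [simulate_langevin(1.0, 1000, 1.0, 1e-3, rng) for _ in range(200)]
mean_z = sum(paths) / len(paths)
z_drift_only = simulate_langevin(1.0, 1000, 1.0, 1e-3, rng, noise=False)
```

With 1000 initial molecules the sample mean of the noisy paths stays close to the drift-only trajectory, while individual paths scatter around it by roughly the square root of the number of reaction events.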

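Combining the evolution equations for the slow and fast reaction extents, i.e. equations (4.26a) and (4.31) respectively, the problem of interest is the coupled set of master equations

\begin{align}
\frac{dP(y;t)}{dt} &\approx \sum_{k=1}^{m-l} \Bigg[ \bigg( \int_{z} b_k(y - I^y_k, z')\, P_A(z' | y - I^y_k; t)\, dz' \bigg) P(y - I^y_k; t) \notag \\
&\qquad\qquad - \bigg( \int_{z} b_k(y, z')\, P_A(z'|y;t)\, dz' \bigg) P(y;t) \Bigg] \tag{4.35a} \\
\frac{\partial P_A(z|y;t)}{\partial t} &= -\sum_{i=1}^{l} \frac{\partial}{\partial z_i} \big( A_i(y,z)\, P_A(z|y;t) \big) + \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \frac{\partial^2}{\partial z_i \partial z_j} \big( B_{ij}(y,z)^2\, P_A(z|y;t) \big) \tag{4.35b}
\end{align}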

If we can solve these equations simultaneously, then we in fact have an approximate solution to the original master equation (4.6) due to the definition of the conditional density given by equation (4.25). Note that the solution is approximate due to the fact that we have used the Fokker-Planck approximation for the master equation of the fast reactions.

In the thermodynamic limit (z → ∞, Ω → ∞, z/Ω finite), the intensive variables for the fast subset of reactions (z’s) evolve deterministically [76]. Accordingly, we propose further approximating the Langevin equation (4.34) as

$$ dz_i = c_i(y,z)\,dt \quad \forall\, 1 \le i \le l \tag{4.36} $$

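In this case, the coupled master equations (4.35) reduce to

\begin{align}
\frac{dP(y;t)}{dt} &\approx \sum_{k=1}^{m-l} \left[ b_k(y - I^y_k, z(t))\, P(y - I^y_k; t) - b_k(y, z(t))\, P(y;t) \right] \tag{4.37a} \\
dz_i &= c_i(y,z)\,dt \quad \forall\, 1 \le i \le l \tag{4.37b}
\end{align}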

in which z(t) is the solution to the differential equation (4.36). The benefit of this assumption is that equation (4.36) can be solved rigorously using an ODE solver. Unfortunately for physical systems, the thermodynamic limit is obviously unattainable. However, knowledge of the modeled system can lead to this simplification. If the magnitude of the fluctuations in this term is small compared to the sensitivity of $c_i(y,z)$ to the subset y, then equation (4.36) is a valid approximation. This approximation is also valid if one is primarily concerned with the fluctuations in the small-numbered species as opposed to the large-numbered species, assuming that the extents approximated by equation (4.36) predominantly affect the population size of large-numbered species.


4.2 Numerical Implementation of the Approximations

We now outline procedures for implementing the equilibrium, Langevin, and deterministic approximations presented in the previous section. We propose using simulation to reconstruct moments of the underlying master equation. For the slow reactions, Gillespie [47] outlines a general method for exact stochastic simulation that is applicable to the desired problem, equation (4.26a). This method examines the joint probability function, P(τ, µ), that governs when the next reaction occurs, and which reaction occurs. We present a brief derivation of this function.

We proceed by noting that the key probabilistic questions are: when will the next reaction occur, and which reaction will it be [45]? To this end, we define

$$ b_\mu(y,z;t)\,dt = \begin{cases} \sum_{z} b_\mu(y,z)\, P_A(z|y;t)\,dt & \text{equilibrium approximation} \\ \int_{z} b_\mu(y,z')\, P_A(z'|y;t)\,dz'\,dt & \text{Langevin or deterministic approximation} \end{cases} \tag{4.38} $$

in which $b_\mu(y,z;t)dt$ is the probability (first order in dt) that reaction µ occurs in the next time interval dt. We express the joint probability P(τ, µ)dτ as the product of the independent probabilities

P (τ, µ)dτ = P0(τ)P (µ)dτ (4.39)

in which

• P0(τ) is the probability that no reaction occurs within [t, t+ τ), and

• P (µ)dτ is the probability that reaction µ takes place within [t+ τ, t+ τ + dτ).

To determine $P_0(\tau)$, consider the change in this probability over the differential increment in time dt, assuming that probabilities are independent over disjoint periods of time [68]:

\begin{align}
P_0(\tau + dt) &= P_0(\tau) \Bigg( 1 - \sum_{j=1}^{m-l} b_j(y,z;t+\tau)\,dt \Bigg) \tag{4.40a} \\
&= P_0(\tau) \big( 1 - r^y_{\mathrm{tot}}(t)\,dt \big) \tag{4.40b}
\end{align}

Here, $r^y_{\mathrm{tot}}(t)$ is the sum of reaction rates for subset y at time t. Rearranging equation (4.40a) and taking the limit as dt → 0 yields the differential equation

$$ \frac{dP_0(\tau)}{dt} = -r^y_{\mathrm{tot}}(t)\, P_0(\tau) \tag{4.41} $$

which has solution

$$ P_0(\tau) = \exp\left( -\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' \right) \tag{4.42} $$

The joint probability function P(τ, µ) is therefore:

$$ P(\tau, \mu) = b_\mu(y,z;t+\tau) \exp\left( -\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' \right) \tag{4.43} $$


We now address our key questions by conditioning the joint probability function P(τ, µ):

$$ P(\tau, \mu) = P(\mu|\tau)\, P(\tau) \tag{4.44} $$

in which P(τ) is the probability that a reaction occurs in the differential instant after time t + τ, and P(µ|τ) is the probability that this reaction will be µ. First note that by definition:

$$ P(\tau) = \sum_{\mu=1}^{m-l} P(\tau, \mu) \tag{4.45} $$

Implicit in this equation is the assumption that a reaction occurs, and hence the probability of not having a reaction is zero. Then by rearranging equation (4.44) and incorporating (4.45), it can be deduced that:

$$ P(\mu|\tau) = \frac{P(\tau, \mu)}{\sum_{\mu=1}^{m-l} P(\tau, \mu)} \tag{4.46} $$

Equation (4.46) can be solved exactly by employing equation (4.43) to yield:

$$ P(\mu|\tau) = \frac{b_\mu(y,z;t+\tau)}{\sum_{j=1}^{m-l} b_j(y,z;t+\tau)} \tag{4.47} $$

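We then solve equation (4.45) by employing equation (4.43):

\begin{align}
P(\tau) &= \Bigg( \sum_{j=1}^{m-l} b_j(y,z;t+\tau) \Bigg) \exp\left( -\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' \right) \tag{4.48a} \\
&= r^y_{\mathrm{tot}}(t+\tau) \exp\left( -\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' \right) \tag{4.48b}
\end{align}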

Using Monte Carlo simulation, we obtain realizations of the desired joint probability function P(τ, µ) by randomly selecting τ and µ from the probability densities defined by equations (4.48b) and (4.47). Such a method is the equivalent of the direct method for hybrid systems. Given two random numbers p1 and p2 uniformly distributed on (0, 1), τ and µ are constrained accordingly:

\begin{align}
\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' + \log(p_1) &= 0 \tag{4.49a} \\
\sum_{k=1}^{\mu-1} b_k(y,z;t+\tau) < p_2\, r^y_{\mathrm{tot}}(t+\tau) &\le \sum_{k=1}^{\mu} b_k(y,z;t+\tau) \tag{4.49b}
\end{align}

Simulating the different approximations requires slightly different algorithms, which we address next.


4.2.1 Simulating the Equilibrium Approximation

We first address the equilibrium approximation. For this case,

$$ b_j(y,z;t) = \sum_{z} b_j(y,z)\, P_A(z|y;t) \quad \forall\, 1 \le j \le m-l \tag{4.50} $$

Additionally, the quantities $b_j(y,z;t)$ are actually time invariant between slow reactions. Thus, the integral constraint (4.49a) reduces to the algebraic relation

$$ \tau = -\frac{\log(p_1)}{r^y_{\mathrm{tot}}(t)} \tag{4.51} $$
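When the propensities are constant between events, drawing the next reaction time and the reaction index takes only a few lines. The sketch below is illustrative: the function name `sample_next_reaction` and the example propensities are assumptions, and reaction indices are zero-based rather than one-based.

```python
import math
import random

def sample_next_reaction(propensities, rng):
    """Draw (tau, mu) for time-invariant propensities.

    tau = -log(p1) / r_tot, and mu is the first index whose cumulative
    propensity reaches p2 * r_tot. Indices are zero-based here.
    """
    r_tot = sum(propensities)
    if r_tot <= 0.0:
        raise ValueError("no reaction can fire")
    p1 = 1.0 - rng.random()        # in (0, 1], avoids log(0)
    p2 = 1.0 - rng.random()        # in (0, 1], so a zero-propensity first reaction is skipped
    tau = -math.log(p1) / r_tot
    cumulative = 0.0
    for mu, b in enumerate(propensities):
        cumulative += b
        if p2 * r_tot <= cumulative:
            return tau, mu
    return tau, len(propensities) - 1   # guard against round-off in the last comparison

rng = random.Random(2)
draws = [sample_next_reaction([1.0, 3.0], rng) for _ in range(20000)]
frac_second = sum(1 for _, mu in draws if mu == 1) / len(draws)
mean_tau = sum(tau for tau, _ in draws) / len(draws)
```

With propensities 1 and 3, the second reaction should be selected about three quarters of the time and the mean waiting time should be close to 1/4.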

Algorithm 3 Exact solution of the partitioned stochastic system for the equilibrium approximation.

Off-line. Partition the set x of m extents of reaction into fast and slow extents. Determine the partitioned stoichiometric matrices (the ((m − l) × p)-matrix $\nu^y$ and the (l × p)-matrix $\nu^z$) and the reaction propensity laws ($a^y_k(y,z)$’s). Also, choose a strategy for solving the distribution $P_A(z|y)$ given by equation (4.30) for the fast reactions in the partitioned case.

Initialize. Set the time, t, equal to zero. Set the number of species n to $n_0$.

1. Solve for the distribution $P_A(z|y)$, denoting all possible combinations of z as (z(0), . . . , z(t)). Record the initial value of z as z(i).

2. For subset y, calculate

   (a) the reaction propensities, $b_j(y,z) = \sum_{z} b_j(y,z)\, P_A(z|y) \;\; \forall j = 1, \ldots, m-l$, and

   (b) the total reaction propensity, $r^y_{\mathrm{tot}} = \sum_{j=1}^{m-l} b_j(y,z)$.

3. Select three random numbers p1, p2, and p3 from the uniform distribution (0, 1).

4. Choose z(j) from the distribution $P_A(z|y)$ such that

   $$ \sum_{k=1}^{j-1} P_A(z(k)|y) < p_1 \le \sum_{k=1}^{j} P_A(z(k)|y) $$

   Set $\nu^z = z(j) - z(i)$.

5. Let $\tau = -\log(p_2)/r^y_{\mathrm{tot}}$. Choose j such that

   $$ \sum_{k=1}^{j-1} b_k(y,z) < p_3\, r^y_{\mathrm{tot}} \le \sum_{k=1}^{j} b_k(y,z) $$

6. Let $n \leftarrow n + (\nu^y_j)^T + \nu^z$, where $\nu^y_j$ is the jth row of $\nu^y$.
   Go to step 1.
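The flavor of Algorithm 3 can be seen on the example reaction (4.27). The sketch below is a simplified illustration, not a line-by-line transcription: it assumes the slow step B -> C has propensity k3*n_B, tracks molecule counts rather than extents, uses the closed-form binomial equilibrium of the fast pair A <-> B, advances the clock explicitly after each slow event, and samples the fast state only at the final reporting time. With a single slow reaction, no reaction selection is needed. The name `slow_scale_simulation` is hypothetical.

```python
import math
import random

def slow_scale_simulation(k1, k2, k3, nA0, nB0, t_end, rng):
    """Equilibrium-approximation sketch for A <-> B -> C (assumed slow propensity k3 * n_B).

    Only slow events are simulated; the fast pair is replaced by its
    equilibrium (binomial) distribution given the current total n_A + n_B.
    """
    p_b = k1 / (k1 + k2)                 # equilibrium probability that a molecule is B
    t, n_ab, n_c = 0.0, nA0 + nB0, 0
    while n_ab > 0:
        b_bar = k3 * n_ab * p_b          # slow propensity averaged over the fast state
        tau = -math.log(1.0 - rng.random()) / b_bar
        if t + tau > t_end:
            break
        t += tau
        n_ab -= 1                        # one B is consumed ...
        n_c += 1                         # ... and one C is produced
    # Draw the fast state from its equilibrium distribution at the reporting time.
    n_b = sum(1 for _ in range(n_ab) if rng.random() < p_b)
    return n_ab - n_b, n_b, n_c

rng = random.Random(3)
n_a, n_b, n_c = slow_scale_simulation(100.0, 50.0, 1.0, 50, 0, 0.5, rng)
```

The cost per unit of simulated time scales with the number of slow events only, which is the source of the savings when k1 and k2 are much larger than k3.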

Algorithm 3 presents one method of solving this system. Note that we could draw a sample from the equilibrium distribution $P_A(z|y)$ at any time to determine a current value of the state, which may be desirable for sampling the system at uniform time increments. Also, this algorithm is very similar to the slow-scale stochastic simulation algorithm proposed by Cao, Gillespie, and Petzold [16], with the exception that our algorithm partitions extents as opposed to species.

Solution of the equilibrated density $P_A(z|y)$ deserves some further attention. If we stack probabilities for all possible values of the fast extents into a vector P, we can recast the continuous-time master equation as a vector-matrix problem, i.e.

$$ \frac{dP}{dt} = AP \approx 0 \quad \text{(equilibrium assumption)} \tag{4.52} $$

in which A is the matrix of reaction propensities. The equilibrium distribution is then the null space of the matrix A, which we can compute numerically. In general, we expect A to be a sparse matrix. Consequently, we can efficiently solve the linear system (4.52) for P using Krylov iterative methods [153] such as the biconjugate gradient stabilized method. Cao, Gillespie, and Petzold [16] outline some alternative, approximate methods for evaluating this equilibrated density.
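To show what the vector-matrix problem (4.52) looks like in practice, the sketch below assembles A for a hypothetical fast pair A <-> B and finds its null vector. Two simplifications are assumed and should not be read as the recommended approach: the matrix is stored densely, and a uniformized power iteration replaces the Krylov solver mentioned above. The state is indexed by the molecule count n_B rather than by extents, and the function names are illustrative.

```python
def generator_matrix(k1, k2, n_total):
    """Matrix A of dP/dt = A P for A <-> B, with states n_B = 0..n_total (dense storage)."""
    size = n_total + 1
    A = [[0.0] * size for _ in range(size)]
    for n in range(size):
        forward = k1 * (n_total - n)     # A -> B moves n_B from n to n + 1
        backward = k2 * n                # B -> A moves n_B from n to n - 1
        A[n][n] -= forward + backward
        if n + 1 < size:
            A[n + 1][n] += forward
        if n - 1 >= 0:
            A[n - 1][n] += backward
    return A

def matvec(A, P):
    """Dense matrix-vector product A P."""
    return [sum(a * p for a, p in zip(row, P)) for row in A]

def stationary_by_power_iteration(A, iterations=2000):
    """Iterate P <- (I + A / lam) P, which converges to the null vector of A."""
    size = len(A)
    lam = 1.1 * max(-A[i][i] for i in range(size))   # exceeds the largest exit rate
    P = [1.0 / size] * size
    for _ in range(iterations):
        P = [p + ap / lam for p, ap in zip(P, matvec(A, P))]
    return P

A = generator_matrix(k1=2.0, k2=1.0, n_total=10)
P_eq = stationary_by_power_iteration(A)
residual = max(abs(v) for v in matvec(A, P_eq))
mean_n_b = sum(n * p for n, p in enumerate(P_eq))
```

Each column of A sums to zero, so the iteration preserves total probability. For larger state spaces a sparse representation and a Krylov method would be the natural replacements for the dense loop used here.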

4.2.2 Simulating the Langevin and Deterministic Approximations: Exact Next Reaction Time

We now address methods for simulating the Langevin and deterministic approximations. These approximations have time-varying reaction propensities, so we must satisfy equation (4.49a) by integrating $r^y_{\mathrm{tot}}$ and the fast subset of reactions z forward in time until the following condition is met:

\begin{align}
\int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' + \log(p_1) &= 0 \tag{4.53} \\
r^y_{\mathrm{tot}}(t) &= \sum_{j=1}^{m-l} b_j(y,z;t) \tag{4.54} \\
b_j(y,z;t) &= \int_{z} b_j(y,z')\, P_A(z'|y;t)\,dz' \quad \forall\, 1 \le j \le m-l \tag{4.55}
\end{align}

For the Langevin approximation, we propose reconstructing the density $P_A(z|y;t)$ by simulating the stochastic differential equation (4.34) (also known as the Langevin equation). In this case, equation (4.55) becomes

$$ b_j(y,z;t) \approx \frac{1}{N} \sum_{k=1}^{N} b_j(y, z_k) \quad \forall\, 1 \le j \le m-l \tag{4.56} $$

in which $z_k$ is the kth of N simulations of equation (4.34). For the deterministic approximation, equation (4.37) indicates that we need only solve for the deterministic evolution of the fast extents. We propose using algorithm 4 to solve this partitioned reaction system, in which we choose only to use one simulation to evaluate equation (4.56) for the Langevin case. Using more than one simulation to evaluate equation (4.56) for the Langevin case is also possible.

Algorithm 4 Exact solution of the partitioned stochastic system for the Langevin and deterministic approximations.

Off-line. Determine the criteria for when and how the set x of m extents of reaction should be partitioned. Determine the stoichiometric matrices of the form given in equation (4.1) and reaction propensity laws for the unpartitioned (the (m × p)-matrix ν and $a_k(x)$’s) and partitioned cases (the ((m − l) × p)-matrix $\nu^y$, the (l × p)-matrix $\nu^z$, and $a^y_k(y,z)$’s). Also, determine the necessary Langevin or deterministic equations for the fast reactions in the partitioned case.

Initialize. Set the time, t, equal to zero. Set the number of species n to $n_0$.

1. If the partitioning criteria established off-line are met, go to step 5.

2. Calculate

   (a) the reaction propensities, $r_k = a_k(x)$, and

   (b) the total reaction propensity, $r_{\mathrm{tot}} = \sum_{k=1}^{m} r_k$.

3. Select two random numbers p1, p2 from the uniform distribution (0, 1).
   Let $\tau = -\log(p_1)/r_{\mathrm{tot}}$.
   Choose j such that

   $$ \sum_{k=1}^{j-1} r_k < p_2\, r_{\mathrm{tot}} \le \sum_{k=1}^{j} r_k $$

4. Let t ← t + τ.
   Let $n \leftarrow n + \nu_j^T$, where $\nu_j$ is the jth row of ν.
   Go to step 1.

5. For subset y, calculate

   (a) the reaction propensities, $r^y_k = b^y_k(y,z)$, and

   (b) the total reaction propensity, $r^y_{\mathrm{tot}} = \sum_{k=1}^{m-l} r^y_k$.

6. Select two random numbers p1, p2 from the uniform distribution (0, 1).

7. Determine $\nu^z = (\nu^z)^T [z(t+\tau) - z(t)]$ by integrating $r^y_{\mathrm{tot}}(t)$ and the subset of fast reactions z until the following condition is met:

   $$ \int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' + \log(p_1) = 0 \quad \text{s.t.:} \quad r^y_{\mathrm{tot}}(t) = \sum_{k=1}^{m-l} a^y_k(y,z;t) $$

8. Let t ← t + τ.
   Let $n \leftarrow n + \nu^z$.

9. Choose j such that

   $$ \sum_{k=1}^{j-1} r^y_k < p_2\, r^y_{\mathrm{tot}}(t) \le \sum_{k=1}^{j} r^y_k $$

   Current values of the $r^y_k$’s and $r^y_{\mathrm{tot}}$ should be available from step 7.

10. Let $n \leftarrow n + (\nu^y_j)^T$, where $\nu^y_j$ is the jth row of $\nu^y$.
    Go to step 1.
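The core of the exact next-reaction-time approach is to integrate the slow propensity alongside the fast extents until the integral reaches −log(p1). The sketch below illustrates this for the deterministic approximation only, under several assumptions: a hypothetical system with one fast reaction A -> B (extent z, propensity k1*n_A) and one slow reaction B -> C (extent y, propensity k2*n_B), fixed-step Euler integration, and a guard that delays a slow event until at least one whole B molecule is present. The name `hybrid_simulation` is illustrative.

```python
import math
import random

def hybrid_simulation(k1, k2, nA0, t_end, dt, rng):
    """Deterministic fast extent z (A -> B) coupled to a stochastic slow extent y (B -> C).

    A slow event fires when the integral of the slow propensity since the
    last event reaches -log(p1); the integral is accumulated with Euler steps.
    """
    t, z, y = 0.0, 0.0, 0
    event_times = []
    target = -math.log(1.0 - rng.random())   # -log(p1) with p1 in (0, 1]
    integral = 0.0
    while t < t_end:
        n_b = max(z - y, 0.0)
        integral += k2 * n_b * dt            # accumulate the integral of the slow propensity
        z += k1 * (nA0 - z) * dt             # deterministic evolution of the fast extent
        t += dt
        # Assumed guard: wait until a whole B molecule is available before firing.
        if integral >= target and z - y >= 1.0:
            y += 1                           # fire the slow reaction
            event_times.append(t)
            target = -math.log(1.0 - rng.random())
            integral = 0.0
    return z, y, event_times

rng = random.Random(4)
z, y, event_times = hybrid_simulation(1.0, 0.01, 1000, 2.0, 1e-3, rng)
```

Because only one slow reaction exists here, the selection in step 9 is trivial; with several slow reactions the index would be drawn from the propensities evaluated at the event time.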

Over the time interval τ, implementation of this algorithm actually enforces the more stringent requirement that

$$ \frac{dP(y)}{dt} = 0 \tag{4.57} $$

Hence equation (4.22) is exact, not approximate.

4.2.3 Simulating the Langevin and Deterministic Approximations: Approximate Next Reaction Time

One major difficulty in this method is satisfying the constraint

$$ \int_{t}^{t+\tau} r^y_{\mathrm{tot}}(t')\,dt' + \log(p_1) = 0 \tag{4.58} $$

in step 7 of algorithm 4 as opposed to the simple algebraic relation for τ used in the unmodified Gillespie algorithm (i.e. step 3 of algorithm 4). This constraint can prove to be computationally expensive.

If the reaction propensities for the fast subset of extents z change insignificantly over the stochastic time step τ, the unmodified Gillespie algorithm can still provide an approximate solution. When the reaction propensities change significantly over τ, steps can be taken to reduce the error of the Gillespie algorithm. One idea is to scale the stochastic time step τ by artificially introducing a probability of no reaction into the system:

• Let $a_0 dt$ be the contrived probability, first order in dt, that no reaction occurs in the next time interval dt.

This probability does not affect the number of molecules of the modeled reaction system while allowing adjustment of the stochastic time step by changing the magnitude of $a_0$. Theoretically, as the magnitude of $a_0$ becomes infinite, the total reaction rate becomes infinite. As the total reaction rate approaches infinity, the error of the stochastic simulation subject to constraints approaches zero because the algorithm checks whether or not a reaction occurs at every time. Even though the method outlined by Gillespie [47] and Jansen [68] is “exact”, for this case there is still error associated with 1) the number of simulations performed since it is a Monte Carlo method, and 2) integration of the Langevin equations for the fast extents of reaction. Thus it is plausible that these errors may be greater than the error introduced by the approximation. Hence our approximation may often prove to be less computationally expensive than the exact simulation while generating an acceptable amount of simulation error.

The approximation replaces steps 5-10 of algorithm 4 with those given by algorithm 5.


Algorithm 5 Approximate solution of the partitioned stochastic system.

5. For subset y, calculate

(a) the reaction propensities, $r^y_k = b^y_k(y, z)$, and

(b) the total reaction propensity, $\sum_{k=0}^{m-l} r^y_k$.

6. Select two random numbers p1, p2 from the uniform distribution (0, 1).

7. Let $\tau = -\log(p_1)/r_{tot}$.
Integrate subset z over the range [t, t + τ) to determine $\nu_z = (\nu^z)^T [z(t+\tau) - z(t)]$.
Let t ← t + τ.
Let n ← n + $\nu_z$.

8. Recalculate the reaction propensities $r^y_k$'s and the total reaction propensity $r^y_{tot}(t)$. Choose j such that

\[ \sum_{k=0}^{j-1} r^y_k < p_2\, r^y_{tot}(t) \le \sum_{k=0}^{j} r^y_k \]

9. Let $n \leftarrow n + (\nu^y_j)^T$, where $\nu^y_j$ is the jth row of $\nu^y$.
Go to step 1.
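The steps of the approximate scheme can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used to generate the results in this chapter. It assumes that index 0 of the propensity list carries the contrived no-reaction propensity a0, that the fast subset is treated deterministically, and that a fixed-step forward Euler integrator (an assumption made here for brevity) advances the fast reaction. The names approximate_step, integrate_fast, and simulate are hypothetical; the rate constants and initial conditions are those of the simple crystallization example (Table 4.2).

```python
import math
import random

K1, K2 = 1e-7, 1e-7                      # rate constants from Table 4.2
SLOW_STOICH = [[-1.0, 0.0, -1.0, 1.0]]   # A + C -> D acting on (nA, nB, nC, nD)


def slow_propensities(n):
    """Propensity of the slow reaction A + C -> D."""
    nA, nB, nC, nD = n
    return [K2 * nA * nC]


def integrate_fast(n, tau, substeps=10):
    """Advance the fast reaction 2A -> B deterministically (forward Euler, assumed)."""
    nA, nB, nC, nD = n
    h = tau / substeps
    for _ in range(substeps):
        rate = 0.5 * K1 * nA * (nA - 1.0)
        nA -= 2.0 * rate * h
        nB += rate * h
    return [nA, nB, nC, nD]


def approximate_step(n, t, a0, rng):
    """One pass through steps 5-9; returns the updated (n, t)."""
    # Step 5: slow propensities; index 0 is the no-reaction propensity a0.
    r = [a0] + slow_propensities(n)
    r_tot = sum(r)
    if r_tot <= 0.0:
        raise ValueError("total propensity is zero; choose a0 > 0")
    # Step 6: two uniform random numbers on (0, 1].
    p1 = 1.0 - rng.random()
    p2 = 1.0 - rng.random()
    # Step 7: algebraic next reaction time, then advance the fast subset.
    tau = -math.log(p1) / r_tot
    n = integrate_fast(n, tau)
    t += tau
    # Step 8: recalculate the propensities and choose event j.
    r = [a0] + slow_propensities(n)
    target = p2 * sum(r)
    j, cumulative = len(r) - 1, 0.0
    for k, rk in enumerate(r):
        cumulative += rk
        if target <= cumulative:
            j = k
            break
    # Step 9: j == 0 is the "no reaction" event and leaves n unchanged.
    if j > 0:
        n = [ni + dj for ni, dj in zip(n, SLOW_STOICH[j - 1])]
    return n, t


def simulate(a0=10.0, t_end=100.0, seed=0):
    """Run the approximate stochastic-deterministic scheme up to t_end."""
    rng = random.Random(seed)
    n, t = [1e6, 0.0, 10.0, 0.0], 0.0
    while t < t_end:
        n, t = approximate_step(n, t, a0, rng)
    return n, t
```

Calling simulate(a0=10.0) takes steps whose mean length is roughly 1/(a0 + k2 nA nC), so a larger a0 re-evaluates the slow propensity more often at the cost of more steps.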

4.3 Practical Implementation

Partitioning of the state x into “fast” and “slow” extents should be intuitive. We recommend maintaining at least two orders of magnitude difference between the values of the partitioned reaction propensities. It may also be helpful to generate results for a full stochastic simulation, and then identify which reactions are bottlenecks (i.e. ones occurring most frequently). Note that there may exist several regimes that require different partitioning of the state. Also, care should be exercised to maintain the validity of the order of magnitude partition between y and z. It is obviously undesirable for “slow” reaction extents to become the same order of magnitude as the “fast” extents during the time increment τ. Finally, nothing precludes one from invoking the equilibrium approximation for one subset of fast reactions, and the deterministic or Langevin approximation for another subset of reactions. We did not carry out such an analysis for notational simplicity.

4.4 Examples

We now consider three motivating examples that illustrate the accuracy of the approximations. For clarity, we first briefly review the nomenclature that indicates which approximations, if any, are performed in a given simulation. We can either perform a purely stochastic simulation on the unpartitioned reaction system, or we can partition the system into “fast” and “slow” reactions. For this partitioned case, a stochastic-equilibrium simulation equilibrates the fast reactions, a stochastic-Langevin simulation treats the fast reactions as Langevin equations, and a stochastic-deterministic simulation treats the fast reactions deterministically. We can then simulate this partitioned reaction system by exact simulation, in which the next reaction time exactly accounts for the time dependence of the “fast” reactions upon the “slow” reactions; or by an approximate simulation, which neglects this time dependence but scales the next reaction time with a propensity of no reaction. For comparison to other approximate techniques, we simulate the simple crystallization example using implicit tau leaping. In contrast to the partitioning techniques proposed here, tau leaping approximates the number of times every reaction fires in a fixed time interval using a rate-dependent Poisson distribution. The details of this method are presented in Chapter 2.

Table 4.1: Model parameters and reaction extents for the enzyme kinetics example

Parameter                           Symbol   Value
reaction propensity 4.59a           a1(x)    k1 nE nS
reaction propensity 4.59b           a2(x)    k2 nES
reaction propensity 4.59c           a3(x)    k3 nES
reaction 4.59a rate constant        k1       20.
reaction 4.59b rate constant        k2       200.
reaction 4.59c rate constant        k3       1.
initial number of E molecules       nEo      20
initial number of S molecules       nSo      10
initial number of ES molecules      nESo     0
initial number of P molecules       nPo      0

4.4.1 Enzyme Kinetics

We consider the simple enzyme kinetics problem

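\[ E + S \xrightarrow{k_1} ES \qquad \epsilon_1 \tag{4.59a} \]

\[ ES \xrightarrow{k_2} E + S \qquad \epsilon_2 \tag{4.59b} \]

\[ ES \xrightarrow{k_3} E + P \qquad \epsilon_3 \tag{4.59c} \]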

The model parameters and the reaction extents are given in Table 4.1. For this example, the first and second reactions equilibrate before the expected time of one third reaction. Hence we partition the extents of reaction (εi’s) as follows:

• ε3 comprises the subset of slow reactions y, and

• ε1 and ε2 comprise the subset of fast reactions z.
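For reference, the exact stochastic simulation of the unpartitioned system (4.59) can be sketched with the direct method of Gillespie. The code below is a minimal illustration using the parameters of Table 4.1; the function names propensities and gillespie are hypothetical, and the sketch returns event counts so that the imbalance between the fast reactions (ε1, ε2) and the slow reaction (ε3) can be inspected.

```python
import math
import random

K1, K2, K3 = 20.0, 200.0, 1.0   # rate constants from Table 4.1

# Rows: change in (nE, nS, nES, nP) for reactions 4.59a, 4.59b, 4.59c.
STOICH = [(-1, -1, +1, 0), (+1, +1, -1, 0), (+1, 0, -1, +1)]


def propensities(n):
    """Reaction propensities a1, a2, a3 as listed in Table 4.1."""
    nE, nS, nES, nP = n
    return [K1 * nE * nS, K2 * nES, K3 * nES]


def gillespie(n, t_end, rng):
    """Direct-method simulation; returns the state at t_end and the event counts."""
    n, t, counts = list(n), 0.0, [0, 0, 0]
    while True:
        a = propensities(n)
        a_tot = sum(a)
        if a_tot == 0.0:          # no reaction can fire any more
            break
        # Algebraic next reaction time of the unmodified algorithm.
        tau = -math.log(1.0 - rng.random()) / a_tot
        if t + tau > t_end:
            break
        t += tau
        # Choose which reaction fires, proportionally to its propensity.
        target, cumulative, j = rng.random() * a_tot, 0.0, len(a) - 1
        for k, ak in enumerate(a):
            cumulative += ak
            if target < cumulative:
                j = k
                break
        n = [ni + d for ni, d in zip(n, STOICH[j])]
        counts[j] += 1
    return n, counts


if __name__ == "__main__":
    state, counts = gillespie([20, 10, 0, 0], 10.0, random.Random(0))
    print("final (nE, nS, nES, nP):", state)
    print("events per reaction (4.59a, 4.59b, 4.59c):", counts)
```

The third count can never exceed the initial number of S molecules, whereas the first two counts are typically much larger, which is the imbalance that motivates placing ε1 and ε2 in the fast subset.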

Figure 4.1: Comparison of the stochastic-equilibrium simulation (dashed lines) to exact stochastic simulation (solid lines) based on 50 simulations. [Plot: number of molecules versus time for species E, S, ES, and P.]

We calculate the averages of all species using fifty simulations sampled at a time interval of 0.1 units. We use both the stochastic-equilibrium and exact simulations to compute these averages. For the stochastic-equilibrium simulation, solving for the equilibrium distribution in equation (4.52) is easiest if one treats the fast reactions ε1 and ε2 as one extent. Figure 4.1 presents the results of the comparison. The stochastic-equilibrium simulation provides an excellent reconstruction of the mean behavior. The exact simulation requires roughly twenty-three times the computational expense of the stochastic-equilibrium simulation.

We refer the interested reader to Cao, Gillespie, and Petzold [16] for additional examples and discussion on the equilibrium approximation. While their derivation of the equilibrium approximation differs from ours, their simulation algorithm is very similar to our algorithm 3.

4.4.2 Simple Crystallization

Consider a simplified reaction system for the crystallization of species A:

\[ 2A \xrightarrow{k_1} B \qquad \epsilon_1 \tag{4.60a} \]

\[ A + C \xrightarrow{k_2} D \qquad \epsilon_2 \tag{4.60b} \]

The model parameters and the reaction extents are given in Table 4.2. For this example, the first reaction occurs many more times than the second reaction. Hence we partition the extents of reaction (εi’s) as follows^3:

• ε2 comprises the subset of slow reactions y, and

• ε1 comprises the subset of fast reactions z.

^3 Reactions are partitioned on the basis of the magnitude of their extents, not their rate constants.

Table 4.2: Model parameters and reaction extents for the simple crystallization example

Parameter                           Symbol   Value
reaction propensity 4.60a           a1(x)    (1/2) k1 nA (nA − 1)
reaction propensity 4.60b           a2(x)    k2 nA nC
reaction 4.60a rate constant        k1       1 × 10^−7
reaction 4.60b rate constant        k2       1 × 10^−7
initial number of A molecules       nAo      1 × 10^6
initial number of B molecules       nBo      0
initial number of C molecules       nCo      10
initial number of D molecules       nDo      0

We first integrate the system using the implicit tau leap method [117]. We choose a time step of 0.2, and generate Poisson random numbers using code from Numerical Recipes in C [104]. Figure 4.2 demonstrates that this approximation adequately reconstructs the mean and standard deviation for all species.

We next perform an approximate stochastic-Langevin simulation. Here we approximate the fast reaction subset using the Langevin approximation and attempt to reconstruct the first two moments of each species. The Langevin equations are integrated using the Euler-Maruyama method [40] with a time increment of 0.01. We account for the time-varying propensity of the slow reaction by employing the approximate scheme, setting the propensity of no reaction (a0) to 10. Figure 4.3 compares these results to the exact stochastic results for ten thousand simulations. The approximation accurately reconstructs the mean and standard deviation for all species.
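As an illustration of the Euler-Maruyama integration, the sketch below advances only the fast extent ε1 of reaction (4.60a) and ignores the slow reaction. It assumes the unscaled chemical Langevin form dε1 = a1 dt + sqrt(a1) dW, with nA = nAo − 2ε1 and nB = ε1; this form, the function name euler_maruyama_fast, and the clipping of a1 at zero are assumptions made for the example rather than the exact equations integrated in this work.

```python
import math
import random

K1 = 1e-7   # rate constant of reaction 4.60a (Table 4.2)


def euler_maruyama_fast(nA0, t_end, dt, rng):
    """Integrate d(eps1) = a1 dt + sqrt(a1) dW with a fixed time increment dt."""
    eps1 = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        nA = nA0 - 2.0 * eps1
        # Propensity of 2A -> B, clipped at zero so the square root stays real.
        a1 = max(0.5 * K1 * nA * (nA - 1.0), 0.0)
        # Wiener increment: normal with mean 0 and standard deviation sqrt(dt).
        dW = rng.gauss(0.0, math.sqrt(dt))
        eps1 += a1 * dt + math.sqrt(a1) * dW
    return nA0 - 2.0 * eps1, eps1


if __name__ == "__main__":
    nA, nB = euler_maruyama_fast(1e6, 100.0, 0.01, random.Random(0))
    print("nA(100) =", nA, " nB(100) =", nB)
```

Each realization fluctuates about the deterministic solution, which for these parameters is close to nA(100) ≈ 1 × 10^6 / 11.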

Next, we approximate the fast reaction subset deterministically and attempt to reconstruct the first two moments of each species based upon ten thousand simulations. For this case, we consider both the exact and approximate stochastic-deterministic simulations.

Figure 4.4 compares the results of exact stochastic simulation to the exact stochastic-deterministic solution. This approximation does an excellent job of reconstructing all of the means as well as the standard deviations for species C and D. However, we are not able to reconstruct the standard deviations for species A and B. This phenomenon is expected because by approximating ε1 deterministically, we neglect all fluctuations caused by the first reaction.

Figure 4.5 compares the results of exact stochastic simulation to the approximate stochastic-deterministic solution given a small value for the propensity of no reaction, a0. For this value of a0, the approximation accurately reconstructs the means of species A and B, but fails to reconstruct the moments of species C and D as well as the standard deviations of species A and B. This phenomenon indicates that the value of a0 is too small. The cumulative squared error in Figure 4.6, however, demonstrates that increasing the value of a0 results in comparable error for the approximate and exact stochastic-deterministic simulations. Here, the least squares error is based on the deviation of the species C trajectories between the approximation techniques and the exact stochastic simulation.

Figure 4.2: Comparison of approximate tau-leap simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations and time step of 0.2. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D. [Panels (a)-(d): number of molecules versus time.]

Table 4.3 compares the order of magnitude of the limiting time step for the different methods in this example. The major improvement in the approximate methods is that the time step is now limited by the “slow” reaction time as opposed to the “fast” reaction time. Note that the solution methods for the partitioned reaction system require more computational expense per limiting time step than the exact stochastic solution method. However, we still observed an order of magnitude improvement in computational expense by employing the approximate solution methods. Also, the results indicate that the tau leap method is the fastest approximation. This result is a little misleading because we employed an implicit first-order method for tau leaping, whereas we integrated deterministic equations using stiff predictor-corrector methods. For a comparison using the same order of method, we expect the stochastic-deterministic simulation to yield slightly faster results than tau leaping because the former method does not draw any Poisson random variables.

Figure 4.3: Comparison of approximate stochastic-Langevin simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations, propensity of no reaction a0 = 10, and Langevin time step of 0.01. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D. [Panels (a)-(d): number of molecules versus time.]

Figure 4.4: Comparison of exact stochastic-deterministic simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D. [Panels (a)-(d): number of molecules versus time; panel (b) uses a logarithmic vertical axis.]

Table 4.3: Comparison of time steps for the simple crystallization example

Solution Method            System Type     Limiting Time Step                           O(Time Step)   Relative CPU Time
Exact Stochastic           unpartitioned   fast reaction time                           O(10^−5)       12.3
Tau Leap                   unpartitioned   slow reaction time                           O(0.25)        1.00
Stochastic-Langevin        partitioned     slow reaction time (Langevin integration)    O(0.01)        1.31
Stochastic-Deterministic   partitioned     slow reaction time (ODE solver)              O(1)           1.40

Figure 4.5: Comparison of approximate stochastic-deterministic simulation (points) to exact stochastic simulation (lines) based on 10,000 simulations and propensity of no reaction a0 = 0.01. (a) Comparison of the mean for species A and B. (b) Comparison of the standard deviations for species A and B. (c) Comparison of the mean (C) and standard deviation (±σ) for species C. (d) Comparison of the mean (D) and standard deviation (±σ) for species D. [Panels (a)-(d): number of molecules versus time; panel (b) uses a logarithmic vertical axis.]

Figure 4.6: Squared error trends for the exact and approximate stochastic-deterministic simulations based on 10,000 simulations. The squared error is calculated from the deviation of the moments for species C between the approximation techniques and the exact stochastic simulation. (a) Plot of the error in the mean of species C. (b) Plot of the error in the standard deviation of species C. [Panels (a)-(b): squared error versus propensity of no reaction, a0, on logarithmic axes.]

Table 4.4: Model parameters and reaction extents for the intracellular viral infection example

Parameter                                Symbol      Value
reaction propensity 4.61a                a1(x)       k1(template)
reaction propensity 4.61b                a2(x)       k2(genome)
reaction propensity 4.61c                a3(x)       k3(template)
reaction propensity 4.61d                a4(x)       k4(template)
reaction propensity 4.61e                a5(x)       k5(struct)
reaction propensity 4.61f                a6(x)       k6(genome)(struct)
reaction 4.61a rate constant             k1          1. day^−1
reaction 4.61b rate constant             k2          0.025 day^−1
reaction 4.61c rate constant             k3          1000. day^−1
reaction 4.61d rate constant             k4          0.25 day^−1
reaction 4.61e rate constant             k5          1.9985 day^−1
reaction 4.61f rate constant             k6          7.5 × 10^−6 (molecules day)^−1
initial number of template molecules     templateo   1
initial number of genome molecules       genomeo     0
initial number of struct molecules       structo     0

4.4.3 Intracellular Viral Infection

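We now consider a general model of an infection of a cell by a virus. A reduced system model consists of the following reaction mechanism [143]:

\[ \text{nucleotides} \xrightarrow{\text{template}} \text{genome} \qquad \epsilon_1 \tag{4.61a} \]

\[ \text{nucleotides} + \text{genome} \longrightarrow \text{template} \qquad \epsilon_2 \tag{4.61b} \]

\[ \text{nucleotides} + \text{amino acids} \xrightarrow{\text{template}} \text{struct} \qquad \epsilon_3 \tag{4.61c} \]

\[ \text{template} \longrightarrow \text{degraded} \qquad \epsilon_4 \tag{4.61d} \]

\[ \text{struct} \longrightarrow \text{secreted/degraded} \qquad \epsilon_5 \tag{4.61e} \]

\[ \text{genome} + \text{struct} \longrightarrow \text{secreted virus} \qquad \epsilon_6 \tag{4.61f} \]

where genome and template are the genomic and template viral nucleic acids respectively, and struct is the viral structural protein. Additional assumptions include: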

1. nucleotides and amino acids are available at constant concentrations, and

2. template catalyzes reactions (4.61a) and (4.61c).

We are interested in the time evolution of the template, genome, and struct species. We assume that the initial “infection” of a cell corresponds to the insertion of one template molecule into the cell. The model parameters and reaction extents are presented in Table 4.4.

Figure 4.7: Intracellular viral infections: (a) typical and (b) aborted. [Panels (a)-(b): number of molecules (logarithmic axis) versus time in days for the template, genome, and struct species.]

This model has two interesting features best illustrated by the two exact stochastic simulations presented in Figure 4.7. First, the three components of the model exhibit fluctuations that vary by differing orders of magnitude. For the same time scale, the struct species fluctuates by hundreds to thousands of molecules, whereas the template and genome species fluctuate by tens of molecules. Second, the model solution exhibits a bimodal distribution. In particular, a cell may exhibit either a “typical” infection in which all species become populated, or an “aborted” infection in which all species are eliminated from the cell.

When the number of template and struct molecules are greater than zero and one hundred respectively, reactions 4.61c and 4.61e occur many more times than the remaining reactions. Hence when template > 0 and struct > 100, we partition the system as follows:

• ε1, ε2, ε4, and ε6 comprise the subset of slow reactions y, and

• ε3 and ε5 comprise the subset of fast reactions z.

Table 4.5: Simulation time comparison for the intracellular viral infection example

Solution Method            System Type     Relative CPU Time
Exact Stochastic           unpartitioned   51.5
Stochastic-Deterministic   partitioned     1

Figure 4.7 indicates that the simulation should traverse between the partitioned and unpartitioned reaction systems. Since our approximation makes fast reactions continuous events as opposed to discrete ones, we round all species when transitioning from the approximate to exact stochastic simulation to prevent non-integer values. This rounding only affects the struct species, and therefore introduces negligible error into the system.
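The switching logic described above can be sketched as follows. The propensities and rate constants are those of Table 4.4, and the thresholds are the ones stated in the text (template > 0 and struct > 100). The function names propensities, use_partition, and to_exact_state are hypothetical, and rounding with Python's built-in round is an assumption about how the rounding might be carried out.

```python
# Rate constants k1..k6 from Table 4.4 (per day; k6 per molecule per day).
K = (1.0, 0.025, 1000.0, 0.25, 1.9985, 7.5e-6)


def propensities(template, genome, struct):
    """Propensities a1..a6 of reactions 4.61a-4.61f."""
    return [
        K[0] * template,         # 4.61a: template-catalyzed genome synthesis
        K[1] * genome,           # 4.61b: genome -> template
        K[2] * template,         # 4.61c: template-catalyzed struct synthesis
        K[3] * template,         # 4.61d: template degradation
        K[4] * struct,           # 4.61e: struct secretion/degradation
        K[5] * genome * struct,  # 4.61f: virus secretion
    ]


def use_partition(template, struct, struct_threshold=100):
    """True when reactions 4.61c and 4.61e are treated as the fast subset."""
    return template > 0 and struct > struct_threshold


def to_exact_state(template, genome, struct):
    """Round all species to integers before resuming exact simulation."""
    return int(round(template)), int(round(genome)), int(round(struct))


if __name__ == "__main__":
    state = (3, 12, 250.7)   # struct is continuous while partitioned
    a = propensities(*state)
    print("partitioned:", use_partition(state[0], state[2]))
    print("fast propensities:", a[2], a[4])
    print("slow propensities:", a[0], a[1], a[3], a[5])
    print("rounded state:", to_exact_state(*state))
```

For the sample state used in the usage lines, the two fast propensities exceed every slow propensity by roughly two orders of magnitude or more, in line with the recommendation of section 4.3.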

We choose to approximate the fast reaction subset deterministically, so we employ the approximate stochastic-deterministic simulation with propensity of no reaction a0 = 0. We compare the approximate stochastic-deterministic simulation to the exact stochastic simulation by reconstructing the statistics for each species based upon one thousand simulations. We also compare the evolution of the mean for these two simulations to the solution of the purely deterministic model.

Figures 4.8 through 4.10 compare the time evolution of the probability distribution for template, the small numbered species. These figures indicate that the approximate stochastic-deterministic simulation accurately reconstructs the entire template probability distribution. Note that the purely deterministic model, however, is unable to accurately reconstruct even the evolution of the mean. This phenomenon occurs because the deterministic model cannot describe the bimodal nature of the probability density.

Figure 4.11 compares the evolution of the mean and standard deviation for the genome species. Again, the approximate simulation accurately reconstructs the time evolution of these moments.

Figure 4.12 compares the evolution of the mean and standard deviation for struct, the large numbered species. Surprisingly, the approximate stochastic-deterministic simulation accurately reconstructs the time evolution of both of these statistics. Since we approximated the fast reactions deterministically, we did not expect to accurately reconstruct moments higher than the mean for the large numbered species. For this example, though, fluctuations in the small numbered species, template, are amplified into the struct species via reaction 4.61c. Thus we are able to accurately reconstruct moments higher than the mean.

Table 4.5 compares the computational expense between the exact stochastic and approximate stochastic-deterministic solution methods. The approximate solution method results in a fifty-fold reduction in computational expense over the exact solution method.

Figure 4.8: Evolution of the template probability distribution for the (a) exact stochastic and (b) approximate stochastic-deterministic simulations. [Panels (a)-(b): probability versus time in days and number of template molecules.]

4.5 Critical Analysis of the Stochastic Approximations

The primary contribution of this work is the idea of partitioning a purely stochastic reaction system using extents of reaction into subsets of slow and fast reactions. Using order of magnitude arguments, we can derive approximate Markov evolution equations for the slow extent marginal and the fast extents conditioned on the slow extents. The evolution equation for the fast extents conditioned on the slow extents is a closed-form expression, whereas the evolution equation for the slow extent marginal depends on this conditional probability. Using relaxation time arguments, we can propose two approximations for the fast extents: an equilibrium approximation when the fast extents relax faster than the slow extents, and a Langevin or deterministic approximation when both fast and slow extents exhibit similar relaxation times. The equilibrium assumption is similar in nature to the slow-reaction simulation recently proposed in the literature by Cao, Gillespie, and Petzold [16]. In contrast to this approach, we believe that our approach has a much tighter connection to the original master equation.

Figure 4.9: Comparisons of the (a) (template = 0, t) and (b) (template, t = 200 days) cross-sections of the template probability distribution for the exact stochastic (solid line) and approximate stochastic-deterministic (dashed line) simulations. [Panel (a): probability versus time in days. Panel (b): probability versus number of template molecules.]

By equilibrating the fast reaction subset, we can substantially reduce the computational requirement by integrating the system over a much larger time step than the exact stochastic simulation. This method requires solving for the equilibrium distribution of the fast reactions. If there are few fast extents or many of the fast extents are independent of one another, then exactly solving for this distribution is possible as illustrated by the enzyme kinetics example. If there are a large number of coupled fast extents, then exact solution may not be computationally feasible. For example, consider the coupled, fast reactions

Figure 4.10: Comparison of the template mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (long dashed lines), and deterministic (short dashed lines) simulations. [Plot: template molecules versus time in days.]

Figure 4.11: Comparison of the genome mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (dashed lines), and deterministic (points) simulations. [Plot: genome molecules versus time in days.]

Figure 4.12: Comparison of the structural protein (struct) mean and standard deviation (±σ) for exact stochastic (solid lines), approximate stochastic-deterministic (dashed lines), and deterministic (points) simulations. [Plot: struct molecules versus time in days.]

\[ A + E \rightleftharpoons B + E \rightleftharpoons C + E \rightleftharpoons D + E \]

A minimal representation of these reactions requires three (reversible) extents of reaction, which is difficult to solve given a reasonable number of molecules for each species.

By approximating the fast reaction subset using Langevin equations, we can reduce the computational requirement by integrating the system over a much larger time step than the exact stochastic simulation. However, we must now employ schemes for integrating stochastic differential equations. By approximating the fast reaction subset deterministically, we can bound the computational requirements for simulation of the system. For this case, we can employ existing and robust ordinary differential equation solvers for integration of this reaction subset. In contrast, the computational expense for exact stochastic simulation scales with the number of reaction events. For an example, reconsider simulation of the simple crystallization system presented in section 4.4.2. Doubling the initial amount of A doubles the number of times the fast reaction must occur, and thus significantly increases the computational load of an exact stochastic simulation. On the other hand, if the fast reaction is approximated deterministically, then doubling the initial amount of A does not require stochastic simulation of any additional reaction events, and thus results in no change in the computational load.
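The scaling argument can be checked with a short calculation. The sketch below estimates the number of fast reaction events from the deterministic solution of the fast reaction alone, dnA/dt = −k1 nA², which assumes nA ≫ 1 and neglects the at most ten A molecules consumed by the slow reaction. The function name expected_fast_events and the final time of 100 (the horizon of Figures 4.2 through 4.5) are choices made for this illustration.

```python
K1 = 1e-7   # rate constant of reaction 4.60a (Table 4.2)


def expected_fast_events(nA0, t_end):
    """Deterministic estimate of how many times 2A -> B fires by t_end."""
    # Closed-form solution of dnA/dt = -K1 * nA**2 with nA(0) = nA0.
    nA = nA0 / (1.0 + K1 * nA0 * t_end)
    # Each firing of 2A -> B consumes two A molecules.
    return 0.5 * (nA0 - nA)


if __name__ == "__main__":
    base = expected_fast_events(1e6, 100.0)
    doubled = expected_fast_events(2e6, 100.0)
    print("events with nAo = 1e6:", round(base))
    print("events with nAo = 2e6:", round(doubled))
    print("ratio:", doubled / base)
```

Under these assumptions the exact simulation must process roughly 4.5 × 10^5 fast events for the nominal initial condition and slightly more than twice as many when nAo is doubled, whereas the number of slow events is bounded by the ten C molecules in both cases.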

The partitioning techniques presented here sacrifice some numerical accuracy for a bound on the computational load. By equilibrating some fast reactions, one cannot expect to accurately reconstruct statistics for species affected by these fast reactions at very fine time scales. However, we are often interested in the macroscopic behavior of the system, and it may not be possible to even observe a physical system at such fine time scales. Approximating some discrete, molecular reaction events as continuous events via the Langevin approximation loses the discrete nature of the entire system. However, as illustrated by the simple crystallization example, this approximation still accurately reconstructs at least the first two moments of each reacting species. Furthermore, approximating fast reactions deterministically eliminates all fluctuations contributed to the system by these reactions. Depending upon the system and the modeling objective, though, these sacrifices may be acceptable. In the simple crystallization example, the stochastic-deterministic simulations accurately reconstructed the means of all species as well as the standard deviations for the small numbered species. If fluctuations in the larger species are not of interest, then these results are acceptable. In the intracellular viral infection example, the approximate stochastic-deterministic simulation accurately reconstructed the evolution of the probability distribution for the small numbered species, as well as the means and standard deviations for the large numbered species. Here, amplification of fluctuations from the small to large numbered species (template to struct) led to accurate estimates of the statistics of large numbered species.

A secondary contribution of this work is an approximate simulation for master equations subject to time-varying constraints. As demonstrated by the simple crystallization example, this approximate simulation approaches the accuracy of the exact simulation as the magnitude of the propensity of no reaction increases. This approximation is most useful for cases in which the total reaction rate, rtot, is not integrable analytically. For this case, we must use an ODE solver with a stopping criterion to determine the next reaction time. Since calling such an ODE solver requires some overhead computational expense, performing the approximate simulation may be computationally favorable.

The work presented here reflects only a fraction of the approximations that should prove useful for simulating stochastic chemical kinetics. For example, one could simulate fast reactions using tau-leaping schemes instead of deterministic or Langevin approximations. Also, we did not address the quasi-steady state assumption (QSSA). In a deterministic setting, the QSSA equilibrates the rate of change for a given chemical species. In terms of our previous example, reaction 4.27, such an assumption would set 0 = a1(x) − a2(x) − a3(x). For the discrete master equation, however, it is unlikely that such a situation can arise due to the integer nature of all chemical species. The most likely situation is for either ε1 > 0 and ε2 = ε3 = 0, or ε2, ε3 ≫ ε1. In this case, we would expect to almost never find a B molecule in an exact simulation. Although Rao and Arkin [113] recently addressed this issue, they assumed a Markovian form for their governing master equation rather than derive it directly from the original master equation (4.2). A tighter connection between the original and approximate systems should be possible.

We believe that the future of stochastic simulation lies in software packages that can

1. adaptively partition reactions into subsets, using appropriate approximations for each subset (i.e. exact, Poisson, Langevin, or deterministic approximations); and

2. adaptively adjust the integration time step to control the error induced at each step.


For reconstruction of only the mean and variance, this software should dramatically reduce the amount of computational expense required to generate approximate realizations from the underlying master equation.

We envision that the primary benefit of the tools presented in this work is bridging the gap from the microscopic to the macroscopic. In particular, researchers are becoming increasingly interested in modeling nanomaterials, phenomena at interfaces, and site interactions on catalysts. In each of these problems, macroscopic interactions in the bulk influence microscopic interactions at interfaces. Although most of the action is at the interface, we cannot neglect the bulk or we lose the ability to model the effect of process design and control strategies. The techniques presented here provide one method of modeling these interactions.

Notation

A            matrix of reaction propensities
aj(n)        jth reaction propensity (rate)
bj(y,z)      jth slow reaction rate averaged over values of the fast extents
bj(y,z)      jth slow reaction rate
cj(y,z)      jth fast reaction rate
I            identity matrix
kj           rate constant for reaction j
N            number of Monte Carlo simulations
nj           number of molecules for species j
njo          initial number of molecules for species j
n            number of molecules for all reaction species
n0           initial number of molecules for all reaction species
n0,j         jth initial number of molecules for all reaction species
P            probability vector for all possible values of the extents of reaction
P            probability
PA           approximate probability (reduced by order of magnitude arguments)
p            random number from the uniform distribution (0, 1)
rtot         sum of reaction rates
r^y_tot      sum of reaction rates for the slow reaction partition
t            time
W            vector of Wiener processes
x            state of the system in terms of extents
y            subset of slow reaction extents
z            subset of fast reaction extents
z            subset of fast reaction extents scaled by Ω
ε            extent of reaction
µ            one possible reaction in the stochastic kinetics framework
ν            stoichiometric matrix
σ            standard deviation
τ            time of the next stochastic reaction
Ω            characteristic system size

Chapter 5

Sensitivities for Stochastic Models

Recently, models of isothermal, well-mixed stochastic chemical kinetics and Monte Carlo techniques for simulating these models have garnered significant attention from researchers in a wide variety of disciplines. This chapter considers a next logical step in applying these models: performing systems level tasks such as parameter estimation and steady-state analysis. One useful quantity in performing these tasks is the sensitivity. Various methods for calculating sensitivities of the underlying probability distribution and its moments are considered. For nontrivial models, the most computationally efficient method of evaluating the sensitivity consists of coupling an approximate evolution equation for the sensitivity with Monte Carlo reconstruction of the desired moments. Several parameter estimation and steady-state analysis examples demonstrate that, for systems level tasks, this approximation is well suited. We also show that highly-accurate sensitivities are not critical because optimization algorithms generally converge without exact gradients.

This chapter is organized as follows. First we review the chemical kinetics master equation and define the sensitivity of moments of this equation with respect to model parameters. Next we propose and compare several methods for calculating approximations of the sensitivities with an eye on computational efficiency. Finally we illustrate how to use the sensitivities for (1) calculating parameter estimates for several linear and nonlinear kinetic models and (2) performing steady-state analysis.

5.1 The Chemical Master Equation

The governing equation for the system of interest is again the chemical master equation. In this case, however, we consider the dependence of the master equation upon the set of parameters θ

$$
\frac{dP(n,t;\theta)}{dt} = \sum_{k=1}^{m} \Big[ a_k(n-\nu_k,\theta)\,P(n-\nu_k,t;\theta) - a_k(n,\theta)\,P(n,t;\theta) \Big] \tag{5.1}
$$

in which

• n is the state of the system in terms of number of molecules (a p-vector),

• θ is a vector containing the system parameters (an l-vector),

• P (n, t; θ) is the probability that the system is in state n at time t given parameters θ,

• ak(n, θ)dt is the probability to order dt that reaction k occurs in the time interval [t, t + dt), and

• νk is the kth column of the stoichiometric matrix ν.

Here, we assume that the initial condition P(n, t0; θ) is known.

One useful quantity in performing systems level tasks is the sensitivity. We consider in the next section the calculation of the sensitivity for stochastic systems governed by the chemical master equation.

5.2 Sensitivities for Stochastic Systems

The sensitivity indicates how responsive the state is to perturbations of a given parameter. For the master equation (5.1), the state is the probability P(n, t; θ), and its sensitivity is

$$
s(n,t;\theta) = \frac{\partial P(n,t;\theta)}{\partial\theta} \tag{5.2}
$$

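Here, s(n, t; θ) is an l-vector. We derive the evolution equation for this sensitivity by differentiating the master equation (5.1) with respect to the parameters θ

$$
\frac{\partial}{\partial\theta}\frac{dP(n,t;\theta)}{dt} = \frac{\partial}{\partial\theta}\sum_{k=1}^{m} \Big[ a_k(n-\nu_k,\theta)\,P(n-\nu_k,t;\theta) - a_k(n,\theta)\,P(n,t;\theta) \Big] \tag{5.3}
$$

$$
\begin{aligned}
\frac{ds(n,t;\theta)}{dt} = \sum_{k=1}^{m} \Big[ & a_k(n-\nu_k,\theta)\,s(n-\nu_k,t;\theta) - a_k(n,\theta)\,s(n,t;\theta) \\
& + \frac{\partial a_k(n-\nu_k,\theta)}{\partial\theta}\,P(n-\nu_k,t;\theta) - \frac{\partial a_k(n,\theta)}{\partial\theta}\,P(n,t;\theta) \Big]
\end{aligned} \tag{5.4}
$$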

We make two observations about equation (5.4):

1. it is linear in the sensitivity s(n, t;θ) and

2. solution of this equation requires simultaneous solution of the master equation (5.1), but not vice versa.

For engineering purposes, we are interested in moments of the probability distribution, i.e.

$$
\overline{g(n)} = \sum_{n} g(n)\,P(n,t;\theta) \tag{5.5}
$$

in which g(n) and $\overline{g(n)}$ are q-vectors. For example, we might seek to implement control moves that drive the mean system behavior towards a desired set point. Such tasks require knowledge of how sensitive these moments are with respect to the parameters. The master equation (5.1) indicates that the probability distribution evolves continuously with time; consequently, moments of this distribution (assuming that they are well defined) evolve continuously as well. Therefore we can simply differentiate equation (5.5) with respect to the parameters to define the sensitivity of these moments, $s(\overline{g(n)})$, as follows:

$$
\frac{\partial}{\partial\theta^T}\,\overline{g(n)} = \frac{\partial}{\partial\theta^T}\sum_{n} g(n)\,P(n,t;\theta) \tag{5.6}
$$

$$
s(\overline{g(n)},t;\theta) = \sum_{n} g(n)\,s(n,t;\theta)^T \tag{5.7}
$$

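Here, $s(\overline{g(n)},t;\theta)$ is a q × l matrix. Equation (5.7) indicates that these sensitivities depend upon the sensitivity of the master equation, s(n, t; θ). Therefore, the exact solution of $s(\overline{g(n)})$ requires solving the following set of coupled equations:

$$
\frac{dP(n,t;\theta)}{dt} = \sum_{k=1}^{m} \Big[ a_k(n-\nu_k,\theta)\,P(n-\nu_k,t;\theta) - a_k(n,\theta)\,P(n,t;\theta) \Big] \tag{5.8a}
$$

$$
\begin{aligned}
\frac{ds(n,t;\theta)}{dt} = \sum_{k=1}^{m} \Big[ & a_k(n-\nu_k,\theta)\,s(n-\nu_k,t;\theta) - a_k(n,\theta)\,s(n,t;\theta) \\
& + \frac{\partial a_k(n-\nu_k,\theta)}{\partial\theta}\,P(n-\nu_k,t;\theta) - \frac{\partial a_k(n,\theta)}{\partial\theta}\,P(n,t;\theta) \Big]
\end{aligned} \tag{5.8b}
$$

$$
s(\overline{g(n)},t;\theta) = \sum_{n} g(n)\,s(n,t;\theta)^T \tag{5.8c}
$$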

Exact solution of even just the master equation (5.1) is computationally intractable for all but the simplest systems. Consequently, exact calculation of both the master equation and its sensitivity (i.e. equation (5.8)) is also intractable in general. However, Monte Carlo methods such as those proposed by Gillespie [45, 46] and Gibson and Bruck [43] can reconstruct moments of the master equation to some degree of precision (error associated with the finite number of simulations corrupts these reconstructed quantities). In the next section, we examine methods for reconstructing the sensitivities given only information about how moments of the master equation evolve.
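For a system small enough to enumerate, the coupled equations (5.8) can be integrated directly. The following is a minimal sketch, assuming a single unimolecular reaction A → B with propensity a(n) = k n (an illustrative choice, not one of the examples in this chapter); the function names, the RK4 integrator, and the parameter values are all assumptions made for this sketch. The linear propensity is convenient because the mean and its sensitivity are then known in closed form, which gives something to check against.

```python
import math


def cme_rhs(P, S, k, n0):
    """Right-hand sides of (5.8a) and (5.8b) for A -> B with a(n) = k*n.

    P[n] is the probability of n molecules of A and S[n] = dP[n]/dk.
    Each reaction changes n by nu = -1, so the inflow term comes from n + 1.
    """
    dP = [0.0] * (n0 + 1)
    dS = [0.0] * (n0 + 1)
    for n in range(n0 + 1):
        # Outflow from state n: -a(n) P(n), and its derivative with respect to k.
        dP[n] -= k * n * P[n]
        dS[n] -= k * n * S[n] + n * P[n]
        if n < n0:
            # Inflow from state n + 1: a(n + 1) P(n + 1), and its derivative.
            dP[n] += k * (n + 1) * P[n + 1]
            dS[n] += k * (n + 1) * S[n + 1] + (n + 1) * P[n + 1]
    return dP, dS


def exact_mean_and_sensitivity(k, n0, t_end, dt=1e-3):
    """Integrate (5.8a)-(5.8b) with classical RK4, then apply (5.8c) with g(n) = n."""
    P = [0.0] * n0 + [1.0]   # delta function at n = n0
    S = [0.0] * (n0 + 1)     # the initial condition does not depend on k
    steps = int(round(t_end / dt))

    def shifted(u, du, c):
        return [ui + c * dui for ui, dui in zip(u, du)]

    for _ in range(steps):
        p1, s1 = cme_rhs(P, S, k, n0)
        p2, s2 = cme_rhs(shifted(P, p1, dt / 2), shifted(S, s1, dt / 2), k, n0)
        p3, s3 = cme_rhs(shifted(P, p2, dt / 2), shifted(S, s2, dt / 2), k, n0)
        p4, s4 = cme_rhs(shifted(P, p3, dt), shifted(S, s3, dt), k, n0)
        P = [p + dt / 6 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, p1, p2, p3, p4)]
        S = [s + dt / 6 * (a + 2 * b + 2 * c + d)
             for s, a, b, c, d in zip(S, s1, s2, s3, s4)]

    mean = sum(n * P[n] for n in range(n0 + 1))
    mean_sens = sum(n * S[n] for n in range(n0 + 1))
    return mean, mean_sens


if __name__ == "__main__":
    k, n0, t_end = 0.5, 20, 2.0
    mean, sens = exact_mean_and_sensitivity(k, n0, t_end)
    # For this linear propensity the mean obeys n0*exp(-k*t) exactly,
    # so each pair of printed values should agree to integration accuracy.
    print(mean, n0 * math.exp(-k * t_end))
    print(sens, -n0 * t_end * math.exp(-k * t_end))
```

The state space here has only n0 + 1 entries. With several species the number of states grows combinatorially, which is the intractability the text refers to.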

5.2.1 Approximate Methods for Generating Sensitivities

Approximate methods of generating sensitivities for this system include

1. deriving an approximate model for the sensitivity of a desired moment and

2. applying finite difference schemes.

The primary benefit of these alternatives is that they require only reconstruction of the desired moment, not necessarily via solution of the master equation (5.1). For systems level tasks such as parameter estimation and steady-state analysis, we are particularly interested in the dynamic behavior of the mean $\bar{n}$

$$
\bar{n} = \sum_{n} n\,P(n,t;\theta) \tag{5.9}
$$

and its sensitivity

$$
\bar{s} = s(\bar{n},t;\theta) = \sum_{n} n\,s(n,t;\theta)^T \tag{5.10}
$$

in which $\bar{n}$ is a p-vector and $\bar{s}$ is a p × l matrix. We consider deriving approximations for the mean sensitivity $\bar{s}$ subsequently. We note that the sensitivity for any moment could be derived and calculated similarly.

5.2.2 Deterministic Approximation for the Sensitivity

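Combining equations (5.4) and (5.10) yields the following evolution equation for the mean sensitivity $\bar{s}$

\begin{align}
\frac{d\bar{s}}{dt} &= \frac{\partial}{\partial\theta^T}\sum_{n}\sum_{k=1}^{m} n\,\Big( a_k(n-\nu_k,\theta)\,P(n-\nu_k,t;\theta) - a_k(n,\theta)\,P(n,t;\theta) \Big) \tag{5.11} \\
&= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\left( \sum_{n} n\,a_k(n-\nu_k,\theta)\,P(n-\nu_k,t;\theta) - \sum_{n} n\,a_k(n,\theta)\,P(n,t;\theta) \right) \tag{5.12} \\
&= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\left( \sum_{n} (n+\nu_k)\,a_k(n,\theta)\,P(n,t;\theta) - \sum_{n} n\,a_k(n,\theta)\,P(n,t;\theta) \right) \tag{5.13} \\
&= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m}\sum_{n} \nu_k\,a_k(n,\theta)\,P(n,t;\theta) \tag{5.14}
\end{align}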

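Consider a Taylor series expansion of ak(n, θ) about the mean value $\bar{n}$

$$
a_k(n,\theta) = a_k(\bar{n},\theta) + \left.\frac{\partial a_k(n,\theta)}{\partial n^T}\right|_{n=\bar{n}} (n-\bar{n}) + \frac{1}{2}(n-\bar{n})^T \left.\frac{\partial^2 a_k(n,\theta)}{\partial n\,\partial n^T}\right|_{n=\bar{n}} (n-\bar{n}) + \cdots \tag{5.15}
$$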

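One approximation consists of incorporating only the first two terms of the expansion (5.15) into equation (5.14) to obtain

\begin{align}
\frac{d\bar{s}}{dt} &\approx \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m} \nu_k \sum_{n}\left( a_k(\bar{n},\theta) + \left.\frac{\partial a_k(n,\theta)}{\partial n^T}\right|_{n=\bar{n}} (n-\bar{n}) \right) P(n,t;\theta) \tag{5.16} \\
&= \frac{\partial}{\partial\theta^T}\sum_{k=1}^{m} \nu_k\,a_k(\bar{n},\theta) \tag{5.17} \\
&= \frac{\partial}{\partial\theta^T}\,\nu\,a(\bar{n},\theta) \tag{5.18} \\
&= \nu\left( \frac{\partial a(\bar{n},\theta)}{\partial\bar{n}^T}\frac{\partial\bar{n}}{\partial\theta^T} + \frac{\partial a(\bar{n},\theta)}{\partial\theta^T}\frac{\partial\theta}{\partial\theta^T} \right) \tag{5.19} \\
\frac{d\bar{s}}{dt} &\approx \nu\left( \frac{\partial a(\bar{n},\theta)}{\partial\bar{n}^T}\,\bar{s} + \frac{\partial a(\bar{n},\theta)}{\partial\theta^T} \right) \tag{5.20}
\end{align}

in which

$$
a(\bar{n},\theta) = \begin{bmatrix} a_1(\bar{n},\theta) & \cdots & a_m(\bar{n},\theta) \end{bmatrix}^T \tag{5.21}
$$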

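Equation (5.20), then, is the first-order approximation of the sensitivity evolution equation assuming that the mean $\bar{n}$ is known. Logically, then, we must specify how we plan on calculating the mean. Clearly we can also approximate the mean evolution equation using the first two terms of the truncated Taylor series expansion (5.15) as follows:

\begin{align}
\frac{d\bar{n}}{dt} &= \sum_{k=1}^{m}\sum_{n} \nu_k\,a_k(n,\theta)\,P(n,t;\theta) \tag{5.22} \\
&\approx \sum_{k=1}^{m} \nu_k \sum_{n}\left( a_k(\bar{n},\theta) + \left.\frac{\partial a_k(n,\theta)}{\partial n^T}\right|_{n=\bar{n}} (n-\bar{n}) \right) P(n,t;\theta) \tag{5.23} \\
&= \sum_{k=1}^{m} \nu_k\,a_k(\bar{n},\theta) \tag{5.24} \\
\frac{d\bar{n}}{dt} &\approx \nu\,a(\bar{n},\theta) \tag{5.25}
\end{align}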

Equation (5.25) is the usual deterministic approximation of the chemical master equation [154]. In general, the mean behavior of the chemical master equation does not obey the deterministic equation (5.25); see Arkin, Ross, and McAdams [3] and Srivastava, You, Summers, and Yin [143] for recent biological examples of this phenomenon. Therefore, we do not advise calculating both the mean and the sensitivity in this fashion.

We propose to estimate the mean by averaging the results of multiple Monte Carlo simulations, and to approximate the sensitivity of the mean using equation (5.20). Since both the mean and the sensitivity are linear functions, exchanging the order of evaluation is valid. So the following strategies are equivalent:

1. Evaluate sk, nk for every simulation using equation (5.20), in which nk denotes the kth Monte Carlo simulation of n. Since the reaction rate vector a(n, θ) is constant between reaction events, equation (5.20) can be solved exactly via a matrix exponential [21]. Finally, calculate $\bar{s}$ = E[sk], in which E[n] denotes the expectation of n.

2. Evaluate nk for every simulation, calculate E[nk], then calculate $\bar{s}$ using E[nk] and equation (5.20).

The first option is presumably the more computationally expensive option since exact solution of equation (5.20) requires evaluation of a matrix exponential for every reaction step. The second option, however, may experience difficulties because

• depending on the behavior of the mean $\bar{n}$, explicit strategies for evaluating equation (5.20) (e.g. Runge-Kutta methods) may require small time steps to ensure stability; and

• random noise associated with the finite number of Monte Carlo simulations may induce inaccuracies for higher-order methods.

In spite of these problems, we advocate using the second option to calculate the approximate sensitivity if performing Monte Carlo simulations is computationally expensive.
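The second option can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used in this work: it takes a single reaction A → B with the second-order propensity a1 = ½ k1 nA(nA − 1), averages Gillespie trajectories on a fixed time grid, and then integrates equation (5.20) with forward Euler using that averaged mean. The function names, the time grid, the seed handling, and the choice of forward Euler are all assumptions of the sketch.

```python
import random


def ssa_nA(k1, nA0, t_grid, rng):
    """One Gillespie trajectory of nA for A -> B, sampled on t_grid."""
    t, nA, idx = 0.0, nA0, 0
    samples = []
    while idx < len(t_grid):
        a1 = 0.5 * k1 * nA * (nA - 1)
        # With no reactant pairs left the next reaction never happens.
        t_next = t + rng.expovariate(a1) if a1 > 0.0 else float("inf")
        # The state is constant until t_next, so record it at every grid point before then.
        while idx < len(t_grid) and t_grid[idx] < t_next:
            samples.append(nA)
            idx += 1
        t, nA = t_next, nA - 1
    return samples


def mc_mean(k1, nA0, t_grid, n_sims, seed):
    """Average n_sims trajectories to reconstruct the mean of nA on t_grid."""
    rng = random.Random(seed)
    total = [0.0] * len(t_grid)
    for _ in range(n_sims):
        for i, n in enumerate(ssa_nA(k1, nA0, t_grid, rng)):
            total[i] += n
    return [x / n_sims for x in total]


def approx_sensitivity(k1, mean_nA, t_grid):
    """Forward-Euler integration of (5.20) for sA = d(mean nA)/dk1.

    The stoichiometric coefficient of A is -1, so
    d(sA)/dt = -(da1/dnA * sA + da1/dk1), evaluated at the reconstructed mean.
    """
    sA = 0.0
    out = [sA]
    for i in range(len(t_grid) - 1):
        dt = t_grid[i + 1] - t_grid[i]
        n = mean_nA[i]
        da_dn = k1 * (n - 0.5)         # derivative of a1 with respect to nA
        da_dk = 0.5 * n * (n - 1.0)    # derivative of a1 with respect to k1
        sA += dt * (-1.0) * (da_dn * sA + da_dk)
        out.append(sA)
    return out


if __name__ == "__main__":
    grid = [0.01 * i for i in range(1001)]            # t = 0 ... 10
    mean_nA = mc_mean(0.0333, 20, grid, n_sims=50, seed=1)
    sA = approx_sensitivity(0.0333, mean_nA, grid)
    sB = [-s for s in sA]                              # B gains what A loses
    print(mean_nA[-1], sB[-1])
```

Only one set of simulations is needed regardless of how many parameters appear in θ; each additional parameter adds one more inexpensive integration of equation (5.20).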

We note that elementary chemical reactions are generally bimolecular. For this case, the Taylor series expansion consists exactly of the first three terms of equation (5.15), and we expect that equation (5.20) adequately approximates the true sensitivity. For unimolecular or zero-order reactions, the Taylor series expansion is exact, so equation (5.20) is exact.
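To make the bimolecular case concrete, consider a single species with the second-order propensity a(n) = ½ k n(n − 1). The worked step below is a sketch for this one-species case; the symbol σ² for the variance of n is introduced here for illustration.

```latex
\begin{align*}
a(n) &= \tfrac{1}{2}k\,\bar{n}(\bar{n}-1)
      + k\left(\bar{n}-\tfrac{1}{2}\right)(n-\bar{n})
      + \tfrac{1}{2}k\,(n-\bar{n})^2 \\
\sum_{n} a(n)\,P(n,t;\theta) &= a(\bar{n}) + \tfrac{1}{2}k\,\sigma^2,
\qquad \sigma^2 = \sum_{n}(n-\bar{n})^2\,P(n,t;\theta)
\end{align*}
```

The first line is exact because the propensity is quadratic. Averaging it removes the linear term, so the first-order truncation differs from the exact mean rate only by the variance term ½ k σ², and the error in equation (5.20) for this case is the derivative of that term with respect to θ.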

Finally, reducing the master equation to a series of moments truncates some of the information contained by the probability distribution of the initial state. For the remainder of this chapter, we assume that this probability distribution is a delta function at the initial mean value. Our method is not restricted to this particular choice of distribution, however. Rather, one may set this initial distribution arbitrarily via proper configuration of the Monte Carlo simulations used to reconstruct the desired moments as discussed in Chapter 4 (see equation (4.3)).

5.2.3 Finite Difference Sensitivities

For finite differences, we assume that we have some evolution equation for the mean $\bar{n}$ that depends on the system parameters θ

$$
\bar{n}_{k+1} = F(\bar{n}_k;\theta,\Omega) \tag{5.26}
$$

Here, the notation $\bar{n}_k$ denotes the value of the mean $\bar{n}$ at time tk. Also, Ω denotes the string of random numbers used to propagate the state. Recall that the sensitivity $\bar{s}$ indicates how sensitive the mean is to perturbations of a given parameter, i.e.

$$
\bar{s}_k = \frac{\partial\bar{n}_k}{\partial\theta^T} \tag{5.27}
$$

We could then approximate the jth column of the desired sensitivity using, for example, a central difference scheme:

$$
\bar{s}_{k+1,j} = \frac{F(\bar{n}_k;\theta+\delta c_j,\Omega_1) - F(\bar{n}_k;\theta-\delta c_j,\Omega_2)}{2\delta} + i\cdot O(\delta^2) \tag{5.28}
$$

Here, δ is a small positive constant, cj is the jth unit vector, and i is a vector of ones. If we use the mean of Monte Carlo simulations to determine the state propagation function F($\bar{n}_k$; θ, Ω) and choose Ω1 ≠ Ω2, then we have essentially amplified the error associated with the finite number of simulations into evaluation of equation (5.28). On the other hand, evaluating the means by using the same strings of random numbers, i.e. Ω1 = Ω2, eliminates this amplification. However, we now have the potential of choosing a sufficiently small, non-zero perturbation such that F($\bar{n}_k$; θ + δcj, Ω1) = F($\bar{n}_k$; θ − δcj, Ω2). If we choose the parameter perturbation to be too large, then the O(δ²) term is not negligible in equation (5.28). Hence special care must be taken in the selection of the perturbation δ. The subsequent chapter, Chapter 6, discusses these subtleties in greater detail. Finally, the computational expense of this method may be prohibitive if evaluating the mean is computationally intensive because calculating the sensitivity requires, in this case, two mean evaluations per parameter. In contrast, calculating the additional sensitivities using the approximate calculation of equation (5.20) does not require any additional stochastic simulations.
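The role of the random strings Ω1 and Ω2 in equation (5.28) can be seen in a small experiment. The sketch below assumes a unimolecular decay A → B with propensity k nA (chosen only because it is short to simulate) and uses a per-trajectory seed to stand in for the random string; all function names and numerical values are illustrative assumptions.

```python
import random


def nA_at(T, k, n0, seed):
    """Number of A molecules left at time T in one SSA run of A -> B, a = k*nA."""
    rng = random.Random(seed)   # the seed plays the role of the random string Omega
    t, n = 0.0, n0
    while n > 0:
        t += rng.expovariate(k * n)
        if t > T:
            break
        n -= 1
    return n


def mean_nA(T, k, n0, seeds):
    """Monte Carlo reconstruction of the mean of nA at time T."""
    return sum(nA_at(T, k, n0, s) for s in seeds) / len(seeds)


def central_fd(T, k, n0, delta, seeds_plus, seeds_minus):
    """Central difference (5.28) for d(mean nA)/dk at time T."""
    plus = mean_nA(T, k + delta, n0, seeds_plus)
    minus = mean_nA(T, k - delta, n0, seeds_minus)
    return (plus - minus) / (2.0 * delta)


if __name__ == "__main__":
    T, k, n0 = 2.0, 0.5, 20
    same = list(range(50))
    other = list(range(1000, 1050))
    # Same random strings for both perturbed means (Omega1 = Omega2).
    print("common strings:     ", central_fd(T, k, n0, 0.05, same, same))
    # Independent random strings (Omega1 != Omega2): sampling error is divided by 2*delta.
    print("independent strings:", central_fd(T, k, n0, 0.05, same, other))
    # A perturbation that is too small moves no reaction across time T.
    print("tiny perturbation:  ", central_fd(T, k, n0, 1e-12, same, same))
```

With common strings every trajectory sees the same uniform draws, so the two perturbed means differ only through the parameter. The last call illustrates the caveat in the text: for a small enough δ the two reconstructed means coincide and the difference quotient is zero.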

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo simulations [105]. However, they use only a single simulation to generate their sensitivity and require relatively large parameter perturbations to generate measurable changes in model responses (one of their examples uses a parameter perturbation of approximately 30%). These authors make no appeal to the master equation nor to the fact that the mean should be a smoothly-varying function. We interpret their approach as a mean sensitivity calculation using a poor reconstruction of the mean. Due to the large choice of parameter perturbation, we infer that the authors did not use the same strings of random numbers to evaluate equation (5.28), i.e. Ω1 ≠ Ω2.

Drews, Braatz, and Alkire also recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo code simulating copper electrodeposition [25]. These authors consider the specific case of the mean sensitivity, and derive finite differences for cases with significant finite simulation error. In these cases, the finite simulation error is greater than higher-order contributions of the finite difference expansion, so the authors derive first-order finite differences that minimize the variance of the finite simulation error. No appeal is made to the master equation, and they implicitly assume that the mean should be a smoothly-varying function. Their computational requirements certainly motivate the approximations made in this chapter, however. Each simulation required on average 64 hours to complete, and the total computational requirement was 92,547 hours for 22 parameters. Additionally, the authors employed perturbations of +100% and −50%, so the accuracy of the finite difference is questionable. Solving for the approximate sensitivity would require only one mean evaluation (roughly 1400 hours) plus the computational time required for the sensitivity calculation, a computational savings of at least an order of magnitude. These authors have also chosen a rather large parameter perturbation, again leading us to infer that they did not use the same strings of random numbers to evaluate equation (5.28), i.e. Ω1 ≠ Ω2.

5.2.4 Examples

We now illustrate these different methods of calculating the sensitivity with two simple examples. For clarity, we first briefly review the nomenclature that indicates which approximations, if any, are performed in a given simulation. We can either reconstruct the mean exactly by solving the master equation, or approximately via Monte Carlo simulation. Given a reconstruction of the mean, we can then calculate the sensitivity using the approximate equation (5.20), or by finite differences, i.e. equation (5.28). Solving the exact sensitivity of the mean requires solution of equation (5.8), namely the master equation, the desired moment, and their respective sensitivities.

[Plot omitted: s versus Time (0 to 10); curves: Exact, Approximate, Finite Difference.]

Figure 5.1: Comparison of the exact, approximate, and central finite difference sensitivities for a second-order reaction.

Second-Order Reaction Example

We consider the simple second-order reaction

$$
A \rightarrow B, \qquad a_1 = \tfrac{1}{2}k_1 n_A(n_A - 1) \tag{5.29a}
$$

with initial condition nA,o = 20 and nB,o = 0, and k1 = 0.0333. For this example, we define

$$
x = \begin{bmatrix} n_A & n_B \end{bmatrix}^T, \qquad \theta = k_1, \qquad \bar{s} = \partial\bar{n}_B/\partial k_1
$$

The reaction rate is nonlinear, implying that equation (5.20) is an approximation of the actual sensitivity. We solve for the exact sensitivity. We also reconstruct the mean via Monte Carlo simulation, then calculate the sensitivity by both the approximate equation (5.20) and central finite differences. Each mean evaluation is calculated by averaging fifty Monte Carlo simulations. Additionally, we perturbed k1 by 10% to generate the finite difference sensitivity. Figure 5.1 compares the exact, approximate, and central finite difference sensitivities. For this example, the exact and approximate sensitivities are virtually identical. The central finite difference sensitivity, on the other hand, yields a very noisy and poor reconstruction at roughly twice the cost of the approximate sensitivity. Performing more Monte Carlo simulations per mean evaluation would improve this estimate at the expense of additional computational burden.

[Plot omitted: s versus Time (0 to 10) for nAo = 20; curves: Exact, Approximate.]

Figure 5.2: Comparison of the exact and approximate sensitivities for the high-order rate example.

High-Order Reaction Example

We consider the simple set of reactions

\begin{align}
A &\longrightarrow B, & a_1 &= \frac{k_1 n_A}{1 + K n_A} \tag{5.30a} \\
B &\longrightarrow A, & a_2 &= k_2 n_B \tag{5.30b}
\end{align}

with initial condition nA,o = 20 and nB,o = 0, and parameters k1 = 4.0, k2 = 0.1, and K = 20/nA,o. For this example, we define

$$
x = \begin{bmatrix} n_A & n_B \end{bmatrix}^T, \qquad \theta = K, \qquad \bar{s} = \partial\bar{n}_A/\partial K
$$

The first reaction rate is nonlinear, implying that equation (5.20) is an approximation of the actual sensitivity. In fact, for this case the Taylor series expansion (5.15) has an infinite number of terms. We solve for the exact sensitivity. We also reconstruct the mean exactly, then solve for the sensitivity via the approximate equation (5.20). Figure 5.2 plots this comparison, and demonstrates a large discrepancy between the exact and approximate sensitivities. As the initial number of A molecules increases, Figure 5.3 shows that the relative error between the exact and approximate sensitivities decreases. This trend is expected because, in the thermodynamic limit (i.e. x → ∞, Ω → ∞, z = x/Ω → constant), the chemical master equation reduces to a deterministic evolution equation for the concentrations z of the form given by the first-order approximation of the mean, equation (5.25) [76].

Next, we consider reconstructing the mean of the system via Monte Carlo simulation, and evaluate the sensitivity by both the approximate equation (5.20) and central finite differences. For this example, we set the initial condition nA,o = 20. Each mean evaluation is calculated by averaging fifty Monte Carlo simulations. Figure 5.4 compares the exact, approximate, and central finite difference sensitivities. The approximate sensitivity differs significantly from the exact sensitivity at later times but compares favorably with the approximate sensitivity obtained from using an exact reconstruction of the mean (i.e. Figure 5.2). Therefore the error in the approximate sensitivity is due to the truncation of the Taylor series expansion and not the Monte Carlo simulations. We perturbed K by 10% to generate the finite difference sensitivity. This sensitivity better approximates the exact sensitivity, but this method amplifies the error associated with the finite number of simulations. Additionally, the computational expense is roughly twice that required for the approximate sensitivity. Finally, we note that this computational expense results from perturbing only a single parameter. If we had required sensitivities for all parameters (k1, k2, and K), the computational expense would triple since the required number of simulations scales linearly with the desired number of sensitivities. In contrast, determining additional sensitivities using the approximate calculation does not require any additional stochastic simulations.

[Plot omitted: relative error between the approximate and exact sensitivities versus Time (0 to 10); curves: nAo = 20, nAo = 200, nAo = 400.]

Figure 5.3: Relative error of the approximate sensitivity with respect to the exact sensitivity as the initial number of A molecules, nA,o, increases for the high-order rate example.

[Plot omitted: s versus Time (0 to 10); curves: Exact, Approximate, Finite Difference.]

Figure 5.4: Comparison of the exact, approximate, and finite difference sensitivity for the high-order rate example.

5.3 Parameter Estimation With Approximate Sensitivities

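The goal of parameter estimation is to determine the set of parameters that best reconciles the measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

\begin{align}
\min_{\theta}\ \Phi &= \frac{1}{2}\sum_{k} e_k^T\,\Pi^{-1} e_k \tag{5.31a} \\
\text{s.t.:}\quad x_{k+1} &= F(x_k,\theta) \tag{5.31b} \\
e_k &= y_k - h(x_k) \tag{5.31c}
\end{align}

in which the ek's denote the difference between the measurements yk and the model predictions h(xk), and Π is the covariance matrix for the measurement noise. For the optimal set of parameters, the gradient ∇θΦ is zero. We can numerically evaluate the gradient according to

\begin{align}
\nabla_{\theta}\Phi &= \frac{\partial}{\partial\theta^T}\,\frac{1}{2}\sum_{k} e_k^T\,\Pi^{-1} e_k \tag{5.32} \\
&= -\sum_{k}\left( \frac{\partial h(x_k)}{\partial x_k^T}\frac{\partial x_k}{\partial\theta^T} \right)^T \Pi^{-1} e_k \tag{5.33} \\
&= -\sum_{k}\left( \frac{\partial h(x_k)}{\partial x_k^T}\,s_k \right)^T \Pi^{-1} e_k \tag{5.34}
\end{align}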

Equation (5.34) indicates that the gradient depends upon sk, the sensitivity of the state with respect to the parameters.

In general, most experiments do not include many replicates due to cost and time constraints. Therefore, the best experimental data we are likely to obtain is the average. In fitting these data to stochastic models governed by the master equation, we accordingly choose the mean $\bar{n}$ as the state of interest. Monte Carlo simulation and evaluation of equation (5.20) provide estimates of the mean and the sensitivities. Since equation (5.20) is approximate (first-order with respect to the mean), evaluating the gradient using this sensitivity is also approximate.

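For the sake of illustration, we obtain optimal parameter estimates using an optimization scheme analogous to the Newton-Raphson method. In particular, we perform a Taylor series expansion of the gradient around the current parameter estimate θk to generate the next estimate θk+1

$$
\nabla_{\theta}\Phi\big|_{\theta_{k+1}} \approx \nabla_{\theta}\Phi\big|_{\theta_k} + \nabla_{\theta\theta}\Phi\big|_{\theta_k}\,(\theta_{k+1} - \theta_k) \tag{5.35}
$$

Since we desire the gradient at the next iterate to be zero,

$$
\theta_{k+1} = \theta_k - \left( \nabla_{\theta\theta}\Phi\big|_{\theta_k} \right)^{-1} \nabla_{\theta}\Phi\big|_{\theta_k} \tag{5.36}
$$

Differentiating the gradient (i.e. equation (5.34)), and noting from equation (5.31c) that ∂ek/∂θT = −(∂h(xk)/∂xkT) sk, yields the Hessian

\begin{align}
\nabla_{\theta\theta}\Phi &= \frac{\partial}{\partial\theta^T}\left( -\sum_{k}\left( \frac{\partial h(x_k)}{\partial x_k^T}\,s_k \right)^T \Pi^{-1} e_k \right) \tag{5.37} \\
&= \sum_{k}\left[ \left( \frac{\partial h(x_k)}{\partial x_k^T}\,s_k \right)^T \Pi^{-1}\,\frac{\partial h(x_k)}{\partial x_k^T}\,s_k - \left( \frac{\partial h(x_k)}{\partial x_k^T}\frac{\partial^2 x_k}{\partial\theta\,\partial\theta^T} \right)^T \Pi^{-1} e_k \right] \tag{5.38}
\end{align}

Making the usual Gauss-Newton approximation for the Hessian (i.e. ek ≈ 0), we obtain

$$
\nabla_{\theta\theta}\Phi \approx \sum_{k}\left( \frac{\partial h(x_k)}{\partial x_k^T}\,s_k \right)^T \Pi^{-1}\,\frac{\partial h(x_k)}{\partial x_k^T}\,s_k \tag{5.39}
$$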

Finally, since we estimate both the mean and the sensitivities using Monte Carlo simulations, the finite number of simulations introduces some error into both of these estimates. Any convergence criterion specified for this method must take this error into account.
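The iteration built from the gradient (5.34), the Gauss-Newton Hessian, and the update (5.36) can be sketched for a scalar parameter. The example below is a deterministic stand-in, not one of the stochastic examples in this chapter: it assumes a model x(t) = x0 exp(−θt) with h(x) = x and Π = 1, so the sensitivity is available in closed form, and it takes the Gauss-Newton Hessian as the positive semidefinite sum of squared sensitivities. The function name and the data are hypothetical.

```python
import math


def gauss_newton_decay(times, data, x0, theta, iterations=20):
    """Estimate theta in x(t) = x0*exp(-theta*t) with h(x) = x and Pi = 1."""
    for _ in range(iterations):
        grad, hess = 0.0, 0.0
        for t, y in zip(times, data):
            x = x0 * math.exp(-theta * t)   # model prediction
            s = -t * x                      # sensitivity dx/dtheta
            e = y - x                       # residual, as in (5.31c)
            grad += -s * e                  # gradient, as in (5.34)
            hess += s * s                   # Gauss-Newton Hessian (residual term dropped)
        theta -= grad / hess                # Newton-Raphson update (5.36)
    return theta


if __name__ == "__main__":
    times = [0.5 * i for i in range(1, 11)]
    data = [20.0 * math.exp(-0.7 * t) for t in times]   # noise-free synthetic data
    print(gauss_newton_decay(times, data, x0=20.0, theta=0.3))
```

In the stochastic setting the prediction x and the sensitivity s would instead come from a Monte Carlo reconstruction of the mean and from equation (5.20), and a fixed iteration count or a tolerance chosen above the simulation noise would replace an exact convergence test.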

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos argue that using kinetic Monte Carlo simulation to perform parameter estimation is too computationally expensive [105]. They claim that a model with two to three parameters requiring 0.5 hours per simulation needs roughly $10^5$ function evaluations for direct optimization. We believe that the actual number of function evaluations required for direct optimization is significantly lower if one uses the approximate sensitivity coupled with the optimization scheme presented in this section. In the next example, we demonstrate that surprisingly few function evaluations lead to accurate parameter estimates.

5.3.1 High-Order Rate Example Revisited

We consider parameter estimation for the high-order rate example reactions (5.30). Our “experimental data” consists of the time evolution of species A obtained from the average of fifty Monte Carlo simulations. We assume that the values of k1 and k2 are known, and attempt to estimate K using the Newton-Raphson method described in the previous section. Sensitivities for this method are obtained using both the approximate and central finite difference sensitivities. For each method, mean evaluations are calculated by averaging fifty Monte Carlo simulations using the same strings of random numbers (note that a different string of random numbers is used to generate the experimental data). Hence calculation of the approximate sensitivity requires one mean evaluation per iteration, while the finite difference sensitivity requires three mean evaluations (K, K − δ, and K + δ). We perturbed K by 10% to calculate the central finite difference sensitivity.

[Plot omitted: (a) K versus Iteration (0 to 20), with the actual value K = 1 marked; (b) Measurement h(E(x)) versus Time (0 to 10).]

Figure 5.5: Comparison of the (a) parameter estimates per Newton-Raphson iteration and (b) model fit at iteration 20 using the approximate (dashed line) and finite difference (solid line) sensitivities for the high-order rate example. Points represent the actual measurement data.

Figure 5.5 plots the results of this parameter estimation. Both sensitivities lead to correct estimation of the parameter K, to within simulation error, in approximately the same number of Newton-Raphson iterations. Clearly the error in the approximate sensitivity does not significantly hinder the search. Neither method converges exactly to the true parameter value, however. This discrepancy arises because different strings of random numbers are used to generate the experimental data and the data used to estimate the parameter K. Finally, the estimation using the central finite difference required roughly three times the computational expense of that using the approximate sensitivity.

5.4 Steady-State Analysis

Exact determination of steady states requires solving for the stationary state of the master equation (5.1). The difficulty of this task is comparable to that of solving the dynamic response. The next logical question, then, is whether we can determine steady states from simulation. Unfortunately, we can only reconstruct the entire probability distribution from an infinite number of simulations. Given a finite number of simulations, we can reconstruct only a limited number of moments. Hence we can seek to find a steady state consistent with this desired number of moments.

An additional complication associated with simulation is that we only have information from integrating the model forward in time. At steady state, we know that

$$
x_{k+1} = x_k = \text{steady state} \tag{5.40}
$$

Thus, we propose two methods for determining steady states from simulation:

1. Run Monte Carlo simulation for a long time.

2. Guess the steady state.

Check: xk+1 = xk for short simulation?

If not, use a Newton-Raphson search algorithm to search for an improved estimate of the steady state:

\begin{align}
(x_{k+1} - x_k)\big|_{\theta_{j+1}} &\approx (x_{k+1} - x_k)\big|_{\theta_j} + \left.\frac{\partial}{\partial\theta^T}(x_{k+1} - x_k)\right|_{\theta_j} (\theta_{j+1} - \theta_j) \tag{5.41} \\
0 &= (x_{k+1} - x_k)\big|_{\theta_j} + (s_{k+1} - s_k)\big|_{\theta_j}\,(\theta_{j+1} - \theta_j) \tag{5.42} \\
\theta_{j+1} &= \theta_j - (s_{k+1} - s_k)^{-1}\,(x_{k+1} - x_k)\big|_{\theta_j} \tag{5.43}
\end{align}

Here, θj denotes the value of the initial state at iteration j. xk denotes the value of the state x at simulation time k for a given iteration.

The second method, recently employed by Makeev et al. [85] in the same capacity, uses short bursts of simulation to determine whether or not the system is at a steady state. Clearly this method may be significantly faster than the first method, which requires a lengthy simulation. Additionally, employing the second method permits use of the approximate sensitivity, which can be calculated inexpensively from simulation. We consider an example of the second method in the next section.
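The short-burst search of equation (5.43) can be sketched on a deterministic stand-in. The code below assumes a scalar state nA obeying the first-order mean equation (5.25) for reactions (5.30) with nA + nB = 20, and integrates each burst, together with the sensitivity of the final state to the initial state, by forward Euler. The function names, burst length, step size, and starting guess are assumptions made for this sketch; in the stochastic setting the burst would be a Monte Carlo reconstruction of the mean.

```python
def rate(x, k1=4.0, K=1.0, k2=0.1, n_tot=20.0):
    """Deterministic net rate of change of nA: -k1*x/(1 + K*x) + k2*(n_tot - x)."""
    return -k1 * x / (1.0 + K * x) + k2 * (n_tot - x)


def rate_dx(x, k1=4.0, K=1.0, k2=0.1):
    """Derivative of rate() with respect to x."""
    return -k1 / (1.0 + K * x) ** 2 - k2


def short_burst(theta, t_burst=0.1, dt=0.01):
    """Propagate the state and its sensitivity to the initial state over one burst."""
    x, s = theta, 1.0   # s_k = dx_k/dtheta = 1 at the start of the burst
    for _ in range(int(round(t_burst / dt))):
        # Forward Euler on dx/dt = f(x) and ds/dt = (df/dx) s; both use the old x.
        x, s = x + dt * rate(x), s + dt * rate_dx(x) * s
    return x, s


def find_steady_state(theta, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration (5.43) on the initial state theta."""
    for _ in range(max_iter):
        x_next, s_next = short_burst(theta)
        if abs(x_next - theta) < tol:
            break
        theta -= (x_next - theta) / (s_next - 1.0)
    return theta, s_next


if __name__ == "__main__":
    x_ss, s_ss = find_steady_state(2.0)
    # rate(x_ss) should be close to zero; |s_ss| < 1 indicates a stable steady state.
    print(x_ss, rate(x_ss), s_ss)
```

The returned sensitivity plays the role of sk+1 in the stability check: a magnitude below unity means perturbations shrink over a burst, while a magnitude above unity would flag an unstable steady state.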

Parameter              Value
Total catalyst sites   200²
k1                     1.6
k2                     0.04
k3                     1.0 × 10⁻⁴
k4                     1.03 × 10⁻³
k5                     0.36
k6                     1.6 × 10⁻²

Table 5.1: Parameters for the lattice-gas example.

5.4.1 Lattice-Gas Example

We consider the following lattice-gas reaction model [85]:

\begin{align}
A + \ast &\xrightarrow{k_1} A\ast \tag{5.44a} \\
A\ast &\xrightarrow{k_2} A + \ast \tag{5.44b} \\
A\ast + B\ast &\xrightarrow{k_3} C + 2\ast \tag{5.44c} \\
B_2 + 2\ast &\xrightarrow{k_4} 2B\ast \tag{5.44d} \\
C + \ast &\xrightarrow{k_5} C\ast \tag{5.44e} \\
C\ast &\xrightarrow{k_6} C + \ast \tag{5.44f}
\end{align}

All reactions are elementary as written. Parameters for these reactions are given in Table 5.1.

Figure 5.6 plots the results for a dynamic simulation of the lattice-gas model and the convergence of the steady-state search algorithm. For this example, the model response corresponds to a limit cycle; therefore, the eigenvalues of the state sensitivity matrix sk+1 = ∂xk+1/∂xkT should contain values with absolute value greater than unity to reflect the unstable nature of this steady state. The search algorithm finds a steady state within the region of this limit cycle with eigenvalues (calculated using the approximate sensitivity)

$$
\lambda(s_{k+1}) = \begin{bmatrix} -2.58 & -1.96 & 1.38\times 10^{-16} \end{bmatrix}^T
$$

Hence the approximate sensitivity indicates that the steady state is indeed unstable.

5.5 Conclusions

We have examined various methods of calculating sensitivities for the moments of the chemical master equation, and explicitly derived methods for calculating the mean sensitivity. Exact solution of the mean sensitivity requires solving the chemical master equation and its sensitivity, a task that is infeasible for all but trivial systems. For more complex systems, the mean and its sensitivity must be reconstructed from Monte Carlo simulations. If carefully implemented, finite differences can generate accurate sensitivities. However, the computational expense of this method scales linearly with the number of parameters, and is particularly burdensome for computationally intensive Monte Carlo simulations. In contrast, employing a first-order approximation of the sensitivity permits inexpensive calculation of the mean sensitivity from a reconstruction of the mean.

[Plot omitted: (a) Surface Species (A, B, C) versus Time (0 to 2000); (b) Surface Species (A, B, C) versus Iteration (1 to 20).]

Figure 5.6: Results for the lattice-gas model: (a) dynamic response of the model from an empty lattice initial condition and (b) convergence of the steady-state search algorithm.

Knowledge of model sensitivities permits execution of systems level tasks such as parameter estimation, optimal control, and steady-state analysis. In these operations, highly-accurate sensitivities are not critical because optimization algorithms generally converge, albeit more slowly, without exact gradients. For use in an optimization context, the efficient evaluation of the approximate sensitivity proposed in this chapter seems well suited.

Notation

a(n, t)  vector of all reaction rates (ak(n)'s)
ak(n)  kth reaction rate
cj  the jth unit vector
ek  deviation between the predicted and actual measurement vectors at time tk
i  a vector of ones
n  vector of the number of molecules for all reaction species
$\bar{n}$  vector of the mean number of molecules for all reaction species
nk  kth Monte Carlo reconstruction of the vector n
P  probability
s  sensitivity of the state x with respect to the parameters θ
$\bar{s}$  sensitivity of the mean $\bar{n}$ with respect to the parameters θ
sk  kth Monte Carlo simulation reconstruction of the sensitivity s
t  time
tk  discrete sampling time
x  state of the system
yk  measurement vector at time tk
z  state vector x scaled by the characteristic system size Ω
δ  finite difference perturbation
λ  eigenvalue
θ  parameter vector for a given model
ν  stoichiometric matrix
Π  covariance matrix for the measurement noise
Φ  objective function value
Ω  characteristic system size

Chapter 6

Sensitivity Analysis of Discrete Markov Chain Models

In the previous chapter, we considered two approximations to the sensitivity equation: (1) finite differences, which offer inherently biased estimates of the sensitivity for significant computational expense, and (2) a first-order approximation to the sensitivity that required trivial computational expense. The second of these methods is analogous to the stochastic fluid models currently proposed in the field of perturbation analysis [19, 167]. For use in the context of unconstrained optimization, we demonstrated that both these approximations to the sensitivities permit efficient optimization.

In this chapter, we consider methods for exactly calculating sensitivities for discrete Markov chain models from solely simulation information. In general, a discrete Markov chain model provides simple rules for propagating the discrete state n forward in time, i.e.

$$
P(n_{k+1}) = P(n_{k+1}\,|\,n_k)\,P(n_k) \tag{6.1}
$$

in which P(·) denotes the probability of (·), and nk refers to the state at time tk. Usually the accessible state space is too large to permit computation of the entire probability distribution, so we are forced to sample the distribution via Monte Carlo methods. These methods take advantage of the fact that any statistic can be written in terms of a large sample limit of observations, i.e.

$$
\overline{h(n)} \triangleq \int h(n)\,P(n,t)\,dn = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} h(n_i) \approx \frac{1}{N}\sum_{i=1}^{N} h(n_i) \quad \text{for } N \text{ sufficiently large} \tag{6.2}
$$

in which ni is the ith Monte Carlo reconstruction of the state n. The desired statistic can then be reconstructed to sufficient accuracy given a large enough number of observations.

Ultimately, we are interested in calculating sensitivities of expectations, i.e.

\begin{align}
s &= \frac{\partial}{\partial\theta}E[h(n;\theta)] \tag{6.3a} \\
&= \lim_{\Delta\to 0}\frac{E[h(n;\theta+\Delta)] - E[h(n;\theta)]}{\Delta} \tag{6.3b} \\
&= \lim_{\Delta\to 0}\,\lim_{N\to\infty}\frac{\sum_{i=1}^{N}\big( h(n_i;\theta+\Delta) - h(n_i;\theta) \big)}{\Delta N} \tag{6.3c}
\end{align}

in which θ is a parameter of interest.¹ For purposes of simulation, we must truncate the number of simulations N at some (hopefully) large, finite value. Then perhaps the easiest approximation to solving equation (6.3) is to fix ∆ at some nonzero value and use, for example, a forward finite difference scheme:

$$
s = \frac{\sum_{i=1}^{N}\big( h(n_i;\theta+\Delta,\Omega_1^i) - h(n_i;\theta,\Omega_2^i) \big)}{\Delta N} + O(\Delta) + \frac{O(N^{-0.5})}{\Delta} \tag{6.4}
$$

Here, Ωi1 and Ωi

2 refer to the string of random numbers used in the ith simulation. As notedby Fu and Hu [37] in their introduction, we can reduce the variance of this estimate by takingΩi

1 = Ωi2, that is, by using the same seed for multiple simulations. However, finite difference

methods are inherently biased estimators due to the fact that the O(∆) term does not go tozero as N → ∞. Additionally, these methods can suffer tremendously from finite simulationerror. If h(n) can only be reconstructed to several significant figures, we must choose a largevalue for the perturbation ∆, causing the O(N−0.5)/∆ term to dominate the expression (6.3).
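The effect of sharing random number strings in the forward finite difference (6.4) can be illustrated with a small sketch. The model here is an assumption chosen for illustration only: h counts how many of `n_flips` uniform draws land at or above θ, so that E[h] = n(1 − θ) and the exact sensitivity is −n. The function names, seeds, and parameter values are hypothetical. Because this h is linear in θ, the O(∆) bias happens to vanish and only the variance differs between the two seeding choices.

```python
import random
import statistics


def simulate(theta, n_flips, rng):
    """One sample of h: number of uniform draws that land at or above theta."""
    return sum(1 for _ in range(n_flips) if rng.random() >= theta)


def fd_sensitivity(theta, delta, n_flips, n_sims, seed, common):
    """Forward finite difference averaged over n_sims simulations."""
    total = 0.0
    for i in range(n_sims):
        s1 = seed + i
        # Same seed reuses the random string; the offset gives a different string.
        s2 = s1 if common else s1 + 10_000_000
        h_pert = simulate(theta + delta, n_flips, random.Random(s1))
        h_nom = simulate(theta, n_flips, random.Random(s2))
        total += (h_pert - h_nom) / delta
    return total / n_sims


theta, delta, n_flips, n_sims = 0.25, 0.1, 10, 50
# Repeat the whole estimate 200 times to see how much it scatters.
crn = [fd_sensitivity(theta, delta, n_flips, n_sims, 1000 * r, True) for r in range(200)]
ind = [fd_sensitivity(theta, delta, n_flips, n_sims, 1000 * r, False) for r in range(200)]
print("exact sensitivity:", -n_flips)
print("shared strings:   mean %.2f, std %.2f" % (statistics.mean(crn), statistics.stdev(crn)))
print("separate strings: mean %.2f, std %.2f" % (statistics.mean(ind), statistics.stdev(ind)))
```

Both variants average to roughly the same value, but the estimates built from shared strings scatter noticeably less across repetitions.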

Alternatively, we could seek to derive unbiased sensitivity estimates from the simulated sample paths alone. Accordingly, we would like to be able to justify the interchange of expectation and differentiation in equation (6.3). This particular problem has been well-characterized in the field of perturbation analysis; see, for example, Ho and Cao [63] and Cassandras and Lafortune [18]. When n is discrete, it is clear that for any finite N we can always choose a $\Delta^+ > 0$ such that

$$\frac{\sum_{i=1}^{N}\left[h(n^i;\theta+\Delta) - h(n^i;\theta)\right]}{\Delta N} = 0 \quad \text{if } 0 < \Delta < \Delta^+ \tag{6.5}$$

That is, differentiating the discrete sample paths directly yields a zero sensitivity regardless of the true value. To overcome this problem, we must devise a means to make the sample paths continuous so that the exchange of expectation and differentiation is valid. In this chapter, we consider smoothing by both conditional expectation (smoothed perturbation analysis) and integration.

¹This analysis can easily be extended to multiple parameters. We choose to examine a single parameter for notational simplicity.

6.1 Smoothed Perturbation Analysis

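Smoothed perturbation analysis (SPA) “smooths” discrete sample paths by using conditional expectation. Choosing a characterization z for each simulated sample path, we see that

\begin{align}
\bar{h}(n;\theta) &= \sum_{n} h(n;\theta)\,P(n;\theta) \tag{6.6}\\
&= \sum_{n}\sum_{z} h(n;\theta)\,P(n,z;\theta) \tag{6.7}\\
&= \sum_{n}\sum_{z} h(n;\theta)\,\frac{P(n,z;\theta)}{P(z)}\,P(z) \tag{6.8}\\
&= \sum_{n}\sum_{z} h(n;\theta)\,P(n\mid z;\theta)\,P(z) \tag{6.9}\\
&= \sum_{z} P(z)\sum_{n} h(n;\theta)\,P(n\mid z;\theta) \tag{6.10}\\
&= \sum_{z} P(z)\sum_{n} h(n;\theta)\,P(n\mid z;\theta) \tag{6.11}\\
&= \sum_{z} P(z)\,E[h(n;\theta)\mid z;\theta] \tag{6.12}
\end{align}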

Because E[h(n; θ)|z; θ] varies continuously with θ, we can evaluate the desired sensitivity by differentiating both sides of equation (6.12):

\begin{align}
\frac{\partial}{\partial\theta}\bar{h}(n;\theta) &= \frac{\partial}{\partial\theta}\sum_{z} P(z)\,E[h(n;\theta)\mid z;\theta] \tag{6.13}\\
s &= \sum_{z} P(z)\,\frac{\partial}{\partial\theta}E[h(n;\theta)\mid z;\theta] \tag{6.14}\\
&= \sum_{z} P(z)\lim_{\Delta\to 0}\frac{E[h(n;\theta)\mid z,\theta+\Delta] - E[h(n;\theta)\mid z,\theta]}{\Delta} \tag{6.15}
\end{align}

Because each Monte Carlo sample path determines the characterization z, equation (6.15) corresponds to first differentiating the smoothed sample paths, then averaging the results. We refer the interested reader to Fu and Hu [37] for the proofs of the unbiasedness of this estimator. The remaining questions are how to choose the characterization z, and how to evaluate the conditional expectation E[h(n; θ)|z; θ]. We examine these issues further by considering two motivating examples.

6.1.1 Coin Flip Example

We consider the example of flipping a coin. Define $S_n$ to be the sum of n independent flips, in which

\begin{align}
S_n &= \sum_{j=1}^{n} x_j \tag{6.16}\\
P(X = 0) &= \theta \tag{6.17}\\
P(X = 1) &= 1-\theta \tag{6.18}\\
0 \le \theta &\le 1 \tag{6.19}
\end{align}

Here, $x_j$ is the jth realization of the random variable X. It is straightforward to show that

$$E[S_n] = \sum_{j=1}^{n} E[x_j] \tag{6.20}$$

We are interested in calculating the sensitivity of $S_n$ with respect to the parameter θ. It is easy to show that

\begin{align}
E[S_n] &= \sum_{j=1}^{n} (1-\theta) = n(1-\theta) \tag{6.21}\\
\frac{\partial E[S_n]}{\partial\theta} &= -n \tag{6.22}
\end{align}

For the sake of illustration, we compute the SPA estimate for this process. We choose the characterization $z^i$ to be the ith outcome of the n flips given the nominal parameter value θ, e.g. $z^i = \{x_1^i(\theta) = 0, \ldots, x_n^i(\theta) = 1\}$. Then equation (6.20) becomes

\begin{align}
E[S_n] &= \sum_{i=1}^{N} P(z^i)\,E[S_n^i \mid z^i] \tag{6.23}\\
&= \frac{1}{N}\sum_{i=1}^{N} E[S_n^i \mid z^i] \tag{6.24}\\
&= \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} E[x_j^i \mid z^i] \tag{6.25}
\end{align}

We turn our attention towards calculating the quantity $E[x_j^i \mid z^i]$. Suppose that the jth flip yields $x_j(\theta) = 0$. Because the n flips are independent, each $x_j$ depends only on the jth element of the characterization z. Therefore, we must calculate the conditional probabilities $P(x_j(\theta+\Delta) = 0 \mid x_j(\theta) = 0)$ and $P(x_j(\theta+\Delta) = 1 \mid x_j(\theta) = 0)$. To do so, we use the fact that the random variable X(θ) can be written in terms of the uniform distribution U:

$$P(X(\theta) = 0) = P(U < \theta) \tag{6.26}$$

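Assuming that the parameter perturbation ∆ > 0,² we evaluate the conditional probabilities:

\begin{align}
P(x_j(\theta+\Delta) = 0 \mid x_j(\theta) = 0) &= P(U < \theta+\Delta \mid U < \theta) \tag{6.27}\\
&= \frac{P(U < \theta+\Delta,\ U < \theta)}{P(U < \theta)} \tag{6.28}\\
&= 1 \tag{6.29}\\
P(x_j(\theta+\Delta) = 1 \mid x_j(\theta) = 0) &= P(U > \theta+\Delta \mid U < \theta) \tag{6.30}\\
&= 0 \tag{6.31}
\end{align}

Then the derivative of the desired conditional expectation is

\begin{align}
\frac{\partial E[x_j(\theta)\mid z]}{\partial\theta} = \lim_{\Delta\to 0}\Big[ &\frac{P(x_j(\theta+\Delta) = 0 \mid x_j(\theta) = 0) - P(x_j(\theta) = 0 \mid x_j(\theta) = 0)}{\Delta}\,(0-0) \tag{6.32}\\
&+ \frac{P(x_j(\theta+\Delta) = 1 \mid x_j(\theta) = 0) - P(x_j(\theta) = 1 \mid x_j(\theta) = 0)}{\Delta}\,(1-0)\Big] \tag{6.33}\\
= {}& 0 \tag{6.34}
\end{align}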

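Alternatively, we consider the case in which the jth flip yields $x_j = 1$. Then the conditional probabilities are

\begin{align}
P(x_j(\theta+\Delta) = 0 \mid x_j(\theta) = 1) &= P(U < \theta+\Delta \mid U > \theta) \tag{6.35}\\
&= \frac{P(\theta < U < \theta+\Delta)}{P(U > \theta)} \tag{6.36}\\
&= \frac{\Delta}{1-\theta} \tag{6.37}\\
P(x_j(\theta+\Delta) = 1 \mid x_j(\theta) = 1) &= P(U > \theta+\Delta \mid U > \theta) \tag{6.38}\\
&= \frac{P(U > \theta+\Delta,\ U > \theta)}{P(U > \theta)} \tag{6.39}\\
&= \frac{1-(\theta+\Delta)}{1-\theta} \tag{6.40}
\end{align}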

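and the derivative of the desired conditional expectation is

\begin{align}
\frac{\partial E[x_j(\theta)\mid z]}{\partial\theta} = \lim_{\Delta\to 0}\Big[ &\frac{P(x_j(\theta+\Delta) = 0 \mid x_j(\theta) = 1) - P(x_j(\theta) = 0 \mid x_j(\theta) = 1)}{\Delta}\,(0-1) \tag{6.41}\\
&+ \frac{P(x_j(\theta+\Delta) = 1 \mid x_j(\theta) = 1) - P(x_j(\theta) = 1 \mid x_j(\theta) = 1)}{\Delta}\,(1-1)\Big] \tag{6.42}\\
= {}& \lim_{\Delta\to 0} \frac{-\Delta}{\Delta(1-\theta)} \tag{6.43}\\
= {}& \frac{-1}{1-\theta} \tag{6.44}
\end{align}

²We can also calculate the conditional expectations assuming that ∆ < 0.

Parameter                        Symbol   Value
Finite difference perturbation   ∆        0.1
Number of simulations            N        50
Coin flip probability            θ        0.25

Table 6.1: Parameters for the coin flip example

Figure 6.1: Mean $E[S_n]$ as a function of the number of coin flips n (curves: Monte Carlo, exact)

From this analysis, it is clear that the only trials that impact the sensitivity are those in which $x_j(\theta) = 1$. Averaging the contribution $-1/(1-\theta)$ of each such trial recovers the exact value (6.22), since a fraction 1 − θ of the flips yields $x_j(\theta) = 1$. Our estimator of the sensitivity is then

\begin{align}
\frac{\partial E[S_n]}{\partial\theta} &= \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} \frac{-1}{1-\theta}\,\mathbf{1}\{x_j^i(\theta) - 1\} \tag{6.45}\\
\mathbf{1}\{\delta\} &= \begin{cases} 1 & \text{if } \delta = 0\\ 0 & \text{otherwise} \end{cases} \tag{6.46}
\end{align}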

We now examine the numerical results for this process via simulation. We use the parameters given in Table 6.1. Figure 6.1 plots the average $E[S_n]$ as a function of the number of flips n. Figure 6.2 plots the exact, SPA, and finite difference sensitivities s as a function of the number of flips n. This figure illustrates that the SPA estimate varies significantly less than the finite difference estimate. In fact, the SPA sensitivity appears to have roughly the same amount of error as the simulated estimate of the mean.
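The comparison described above can be sketched in a few lines. This is a minimal illustration rather than the code used to produce the figures: each flip is generated as x = 0 if U < θ and x = 1 otherwise, every flip with x = 1 contributes −1/(1 − θ) to the SPA estimate, and the finite difference reuses the same uniform draws at θ + ∆. The function name and the seed are hypothetical.

```python
import random


def spa_and_fd(theta, delta, n_flips, n_sims, seed=0):
    """Return (SPA estimate, forward finite difference) of dE[S_n]/dtheta."""
    rng = random.Random(seed)
    spa_total = 0.0
    fd_total = 0.0
    for _ in range(n_sims):
        draws = [rng.random() for _ in range(n_flips)]
        # Nominal flips: x_j = 1 whenever the uniform draw is at or above theta.
        ones_nominal = sum(1 for u in draws if u >= theta)
        # Perturbed flips reuse the same draws with theta + delta.
        ones_perturbed = sum(1 for u in draws if u >= theta + delta)
        spa_total += -ones_nominal / (1.0 - theta)
        fd_total += (ones_perturbed - ones_nominal) / delta
    return spa_total / n_sims, fd_total / n_sims


spa, fd = spa_and_fd(theta=0.25, delta=0.1, n_flips=10, n_sims=50)
print("exact %.1f   SPA %.2f   finite difference %.2f" % (-10.0, spa, fd))
```

With the values of Table 6.1 the SPA estimate typically lands closer to the exact value −n than the finite difference does, because the per-simulation variance of the SPA term is much smaller.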

Figure 6.2: Mean sensitivity $\partial E[S_n]/\partial\theta$ as a function of the number of coin flips n (curves: exact, SPA, finite difference)

6.1.2 State-Dependent Simulation Example

In the previous example, each of the n flips is independent, and the probability of choosing heads or tails depends solely on the parameter θ. We now consider calculating the sensitivity for a Markov chain in which the transition probabilities are state dependent, e.g.

\begin{align}
P(n_{k+1}\mid n_k;\theta) &= \begin{cases} n_k + \nu_1 & \text{if } U < H_1(n_k;\theta)\\ n_k + \nu_2 & \text{if } U > H_1(n_k;\theta) \end{cases} \tag{6.47}\\
n_k &= \begin{bmatrix} A_k & B_k \end{bmatrix}^T \tag{6.48}\\
\nu &= \begin{bmatrix} \nu_1 & \nu_2 \end{bmatrix} = \begin{bmatrix} -1 & 1\\ 1 & -1 \end{bmatrix} \tag{6.49}\\
r_1(n_k;\theta) &= \frac{k_1 A_k}{1+\theta A_k} \tag{6.50}\\
r_2(n_k;\theta) &= k_2 B_k \tag{6.51}\\
H_j(n_k;\theta) &= \frac{r_j(n_k;\theta)}{\sum_{l=1}^{2} r_l(n_k;\theta)} \tag{6.52}
\end{align}

We consider a total of n discrete decisions. One characterization for this system is $z^i = \{n_0, v_1^i(\theta), \ldots, v_n^i(\theta)\}$; namely, the initial state $n_0$ and the string of discrete decisions $v_j(\theta)$ for the ith simulation. We note that the simulation uses random numbers to generate the discrete decisions $v_j(\theta)$. Identifying the discrete decisions by the $v_j(\theta)$'s is more conducive to calculating sensitivities than the string of random numbers.

Figure 6.3: Comparison of nominal and perturbed path for SPA analysis

Figure 6.4: SPA analysis of the discrete decision. Given a positive perturbation ∆ > 0, $H(n_0, \theta+\Delta) < H(n_0, \theta)$. Therefore if decision $\nu_2$ is chosen given the nominal parameter θ, no perturbed parameter can change the choice to $\nu_1$.

We turn our attention towards calculating the quantity $E[n_1^i \mid z^i]$. Suppose that the first decision yields $v_1(\theta) = \nu_1$. We again ask the same question: what if we had chosen $\nu_2$ instead of $\nu_1$? Figure 6.3 illustrates this question, in which a new perturbed path deviates from the nominal path. Therefore, we must calculate the conditional probabilities $P(v_1(n_0,\theta+\Delta) = \nu_1 \mid v_1(n_0,\theta) = \nu_1, n_0)$ and $P(v_1(n_0,\theta+\Delta) = \nu_2 \mid v_1(n_0,\theta) = \nu_1, n_0)$. To do so, we again use the fact that the random variable $v_1(n_0,\theta)$ can be written in terms of the uniform distribution U:

$$P(v_1(n_0,\theta) = \nu_1) = P(U < H_1(n_0,\theta)) \tag{6.53}$$

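Assuming that the parameter perturbation ∆ > 0,³ we evaluate the conditional probabilities:

\begin{align}
P(v_0(\theta+\Delta,n_0) = \nu_1 \mid v_0(\theta,n_0) = \nu_1, n_0) &= P(U < H_1(\theta+\Delta,n_0) \mid U < H_1(\theta,n_0)) \tag{6.54}\\
&= \frac{P(U < H_1(\theta+\Delta,n_0),\ U < H_1(\theta,n_0))}{P(U < H_1(\theta,n_0))} \tag{6.55}\\
&= \frac{H_1(\theta+\Delta,n_0)}{H_1(\theta,n_0)} \tag{6.56}\\
P(v_0(\theta+\Delta,n_0) = \nu_2 \mid v_0(\theta,n_0) = \nu_1, n_0) &= P(U > H_1(\theta+\Delta,n_0) \mid U < H_1(\theta,n_0)) \tag{6.57}\\
&= \frac{P(H_1(\theta+\Delta,n_0) < U < H_1(\theta,n_0))}{P(U < H_1(\theta,n_0))} \tag{6.58}\\
&= \frac{H_1(\theta,n_0) - H_1(\theta+\Delta,n_0)}{H_1(\theta,n_0)} \tag{6.59}
\end{align}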

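We note that if $\nu_2$ is chosen, no perturbation ∆ > 0 could change the reaction; see Figure 6.4 for an illustration of why. Defining the nominal state $n_k$ and the perturbed state $\bar{n}_k$ as

\begin{align}
n_k &= n_0 + \sum_{j=0}^{k-1} v_j(\theta, n_j) \tag{6.60}\\
\bar{n}_k &= n_0 + \sum_{j=0}^{k-1} v_j(\theta+\Delta, \bar{n}_j) \tag{6.61}
\end{align}

the derivative of the desired conditional expectation is

\begin{align}
\frac{\partial E[n_1(\theta)\mid z]}{\partial\theta} = \lim_{\Delta\to 0}\Big[ &\frac{P(v_0(\theta+\Delta,n_0) = \nu_1 \mid v_0(\theta,n_0) = \nu_1)}{\Delta}\,(n_1 - n_1) \tag{6.62}\\
&+ \frac{P(v_0(\theta+\Delta,n_0) = \nu_2 \mid v_0(\theta,n_0) = \nu_1)}{\Delta}\,(\bar{n}_1 - n_1)\Big] \tag{6.63}\\
= {}& \lim_{\Delta\to 0} \frac{H_1(\theta,n_0) - H_1(\theta+\Delta,n_0)}{\Delta\,H_1(\theta,n_0)}\,(\bar{n}_1 - n_1) \tag{6.64}\\
= {}& \frac{-\frac{\partial}{\partial\theta}\left[H_1(\theta,n_0)\right]}{H_1(\theta,n_0)}\,(\nu_2 - \nu_1) \tag{6.65}
\end{align}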
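As a consistency check on the one-step result (6.65), the conditional sensitivities can be averaged over the nominal first decision. The sketch below assumes, as in Figure 6.4, that $H_1$ decreases with θ, so that a nominal choice of $\nu_2$ contributes zero; it abbreviates $H_1(\theta,n_0)$ as $H_1$.

```latex
\begin{align*}
\underbrace{H_1}_{P(v_0 = \nu_1)} \cdot
  \frac{-\frac{\partial H_1}{\partial\theta}}{H_1}\,(\nu_2 - \nu_1)
  \;+\; \underbrace{(1 - H_1)}_{P(v_0 = \nu_2)} \cdot 0
  &= \frac{\partial H_1}{\partial\theta}\,(\nu_1 - \nu_2) \\
\frac{\partial}{\partial\theta} E[n_1]
  = \frac{\partial}{\partial\theta}\Big[\, n_0 + \nu_1 H_1 + \nu_2 (1 - H_1) \Big]
  &= \frac{\partial H_1}{\partial\theta}\,(\nu_1 - \nu_2)
\end{align*}
```

The two lines agree, so averaging the smoothed one-step derivative over the nominal decisions reproduces the exact sensitivity of the one-step mean.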

³We can also calculate the conditional expectations assuming that ∆ < 0.

Figure 6.5: Illustration of the branching nature of the perturbed path for SPA analysis

In general, we are interested in calculating the sensitivity of $E[n_j(\theta)\mid z]$, i.e.

$$\frac{\partial E[n_j(\theta)\mid z]}{\partial\theta} = \lim_{\Delta\to 0}\frac{1}{N}\sum_{i=1}^{N}\sum_{v_{j-1}}\cdots\sum_{v_0}\frac{P(\bar{n}_j(\theta+\Delta)\mid z^i(\theta))}{\Delta}\,(\bar{n}_j - n_j) \tag{6.66}$$

so we must consider the probability $P(\bar{n}_j(\theta+\Delta)\mid z^i(\theta))$. Using the properties of conditional densities and Markov chains, we have

\begin{align}
P(\bar{n}_j(\theta+\Delta)\mid z^i(\theta)) &= \sum_{v_1}\cdots\sum_{v_{j-1}} P(v_j,\ldots,v_1;\theta+\Delta \mid z^i(\theta)) \tag{6.67}\\
&= \sum_{v_1}\cdots\sum_{v_{j-1}} P(v_j;\theta+\Delta \mid z^i(\theta), v_{j-1},\ldots,v_1;\theta+\Delta)\cdots P(v_1;\theta+\Delta \mid z^i(\theta)) \tag{6.68}
\end{align}

in which we use the notation P(·; θ) to denote that the quantity P(·) is a function of the parameter θ. It is clear that this process branches at every discrete decision, as shown in Figure 6.5, and that we must follow each of these branches with nonzero weight throughout the duration of the simulation.

Figures 6.6 and 6.7 plot the mean and sensitivity comparison for this example. The SPA estimate demonstrates superior reconstruction of the sensitivity in comparison to finite differences, albeit at a greater computational expense. Surprisingly, though, the SPA estimate for each sample path did not require tracking perturbed paths for all possible state combinations due to the coalescing of many perturbed paths. However, we do not expect this feature to hold for models that span larger dimensions, particularly those that include more discrete decisions of the form

$$P(n_{k+1}\mid n_k;\theta) = \begin{cases} n_k + \nu_1 & \text{if } U < H_1(n_k;\theta)\\ n_k + \nu_2 & \text{if } H_1(n_k;\theta) < U \le H_2(n_k;\theta)\\ \quad\vdots & \\ n_k + \nu_m & \text{if } U > H_{m-1}(n_k;\theta) \end{cases} \tag{6.69}$$

Accordingly, we could consider using a particle filter to track the perturbed paths.

Figure 6.6: Mean $E[n_k]$ as a function of the number of decisions k (curves: Monte Carlo, exact)

Figure 6.7: Mean sensitivity $\partial E[n_k]/\partial\theta$ as a function of the number of decisions k (curves: exact, SPA, finite difference)

6.2 Smoothing by Integration

In some cases, the SPA estimate is not easily calculable. Consequently, we are interested in simpler means of calculating sensitivities. As noted in the previous section, conditional expectation provides one means of “smoothing” discrete sample paths. In some sense, expectation may be viewed as an integration over the state space. For some systems, it may be more advantageous to integrate over a variable other than the state space.

In this section, we consider the simple example of a state-dependent timing event with no discrete decisions. In particular, we address calculation of sensitivities for stochastic chemical kinetics given only one reaction. In this case, infinitesimal parameter perturbations affect only when the single reaction occurs. To begin this analysis, we examine the first possible reaction. We can then define the discrete state n as

\begin{align}
n(t;\theta) &= \begin{cases} n_0, & t_0 \le t < t_0 + \tau_1\\ n_0 + \nu, & t \ge t_0 + \tau_1 \end{cases} \tag{6.70}\\
\tau_1 &= \frac{-\log(p_1)}{r_{tot}(n_0;\theta)} \tag{6.71}
\end{align}

in which $t_0$ is the initial time, $n_0$ is the initial state, $t_0 + \tau_1$ is the next reaction time, $\nu^T$ is the stoichiometric matrix, and θ is a vector of parameters. We can write equation (6.70) alternatively as

$$n(t;\theta) = n_0 + \nu\int_0^{t-t_0}\delta\big(t' - \tau_1(n_0;\theta)\big)\,dt' \tag{6.72}$$

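The smoothing trick that we apply here is to define an integrated sensitivity $s_I$ in terms of the state integrated with respect to time, i.e.

$$s_I \triangleq \frac{\partial}{\partial\theta^T}\left(\int_{t_0}^{t} n(t';\theta)\,dt'\right) \tag{6.73}$$

Integrating equation (6.72) with respect to time yields

$$\int_{t_0}^{t} n(t^*;\theta)\,dt^* = \int_{t_0}^{t}\left(n_0 + \nu\int_0^{t^*-t_0}\delta\big(t' - \tau_1(n_0;\theta)\big)\,dt'\right)dt^* \tag{6.74}$$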

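We can differentiate equation (6.74) with respect to the parameters θ to yield

$$s_I(t;\theta) = s_{I0} + \nu\,\frac{\tau_1}{r_{tot}(n_0;\theta)}\int_0^{t-t_0}\delta\big(t' - \tau_1(\theta)\big)\,dt' \tag{6.75}$$

We can similarly show for an arbitrary number µ of reactions that

\begin{align}
\int_{t_0}^{t} n(t^*;\theta)\,dt^* &= \int_{t_0}^{t}\left(n_0 + \nu\int_0^{t^*-t_0}\sum_{j=1}^{\mu}\delta\big(t' - \tau_j(n_0 + (j-1)\nu;\theta)\big)\,dt'\right)dt^* \tag{6.76}\\
s_I(t;\theta) &= s_{I0} + \nu\int_0^{t-t_0}\sum_{j=1}^{\mu}\delta\big(t' - \tau_j(n_0 + (j-1)\nu;\theta)\big)\,dt' \tag{6.77}
\end{align}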

Convergence of $s_I$ to the integral of the mean sensitivity follows from the law of large numbers.

We consider a simple example to illustrate this technique. The single reaction is

$$2A \xrightarrow{k} B \tag{6.78}$$

with the reaction elementary as written. The initial condition is $n_A = 20$ and $n_B = 0$ molecules, and the parameter value is k = 1/15. We solve for the mean and its integrated sensitivity both exactly (via solution of the master equation) and by Monte Carlo reconstruction. For the latter case, we average fifty simulations to reconstruct the mean behavior and apply equation (6.77) to evaluate the integrated sensitivity. Figure 6.8 presents the results for this case, and demonstrates excellent agreement between the exact and reconstructed values for both the mean and the integrated sensitivity.

Figure 6.8: Comparison of the exact and simulated (a) mean and (b) mean integrated sensitivity for the irreversible reaction 2A → B.
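The Monte Carlo side of this example can be sketched as follows. The sketch differentiates the time integral of one sample path of $n_A$ by hand, under the assumption that the propensity is $r_{tot} = \tfrac{1}{2} k\, n_A (n_A - 1)$; each reaction time $T_j$ is then proportional to 1/k, so $\partial T_j / \partial k = -T_j / k$. Function names, the seed, and the storage of the random string as −log(p_j) values are illustrative choices, not the implementation used for Figure 6.8.

```python
import math
import random


def integrated_state(k, t_end, log_draws, n_a0=20):
    """Time integral of n_A over [0, t_end] and its derivative with respect to k.

    log_draws holds -log(p_j) for each reaction event, so the same random
    string can be replayed at a different value of k.
    """
    n_a, clock = n_a0, 0.0
    integral = n_a0 * t_end            # contribution of the initial state
    d_integral = 0.0
    for e in log_draws:
        rate = 0.5 * k * n_a * (n_a - 1)   # 2A -> B, elementary as written
        if rate == 0.0:
            break                          # no A pairs left to react
        clock += e / rate                  # reaction time T_j
        if clock >= t_end:
            break
        n_a -= 2
        integral += -2.0 * (t_end - clock)  # nu_A * (t - T_j)
        d_integral += -2.0 * clock / k      # nu_A * (-dT_j/dk), dT_j/dk = -T_j/k
    return integral, d_integral


def mean_integrated_sensitivity(k, t_end, n_sims=50, seed=0):
    """Average the sample-path derivative over n_sims simulations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        draws = [-math.log(1.0 - rng.random()) for _ in range(10)]
        total += integrated_state(k, t_end, draws)[1]
    return total / n_sims


print(mean_integrated_sensitivity(k=1.0 / 15.0, t_end=10.0))
```

Because the time integral of a sample path varies continuously with k, a finite difference on a replayed random string agrees with the hand-derived derivative, which is a convenient check on any implementation of this idea.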

In general, we require values for the sensitivity rather than the integrated sensitivity. There are numerous possibilities for deriving this quantity. For example, a polynomial can be fitted through the integrated sensitivity. The derivative of this fitted polynomial would then provide an estimate for the desired sensitivity. As seen in Figure 6.8 (b), however, the reconstructed integrated sensitivity can be noisy. Therefore, we recommend against low-order differencing of the integrated sensitivity because such differencing amplifies noise.

6.3 Sensitivity Calculation for Stochastic Chemical Kinetics

Thus far, we have considered calculation of sensitivities for two cases: the discrete-time case (choosing only from a finite number of discrete events for the next reaction), and the time-dependent case with no discrete event selection. The stochastic chemical kinetics problem, however, is a combination of both time-dependent and discrete events. We envision that this problem could be addressed using the tools presented in previous sections, namely smoothed perturbation analysis (for the “which” reaction choice) and smoothing by integration (for the timing of the reaction). The problematic part for this particular problem, however, is really in implementing SPA. Because time is continuous, the selection of discrete events does not occur at the same time for every simulation, as it did in the discrete-time case. Hence there is no fortuitous coalescing of perturbed paths; in fact, one must nominally track every generated perturbed path to obtain the SPA estimate. Such a task seems unreasonable computationally for all but the simplest models. One potential means around this problem is to bound the computational expense by tracking only the paths that contribute the most to the SPA estimate. However, the problem of continuous time again appears because the perturbed paths are potentially at different time points in the simulation, making comparison of these paths difficult.

6.4 Conclusions and Future Directions

This chapter explored methods for computing the sensitivity of moments of the master equation from simulation via smoothing. We first examined smoothing by conditional expectation, or smoothed perturbation analysis, to address the case of sensitivities for time-independent, discrete event systems. We then applied smoothing by time integration to account for the effect of parameters on the timing of continuous events. Finally, we briefly examined how one might apply these two methods to evaluate sensitivities for stochastic chemical kinetics. As of the writing of this thesis, we do not know of a satisfactory method for efficiently evaluating unbiased estimates for the sensitivity of moments of the discrete master equation. Thus we are forced to conclude that the best options for evaluating these sensitivities are either the approximation proposed previously in Chapter 5, or finite differences. We can speculate on possible methods for doing so as presented next.

We speculate that directly solving the companion sensitivity equation to the master equation may offer some hope for calculation of unbiased sensitivities. Considering all the individual probabilities and their respective sensitivities, i.e.

\begin{align}
P(n;\theta) &= \begin{bmatrix} P(n_0;\theta)\\ P(n_1;\theta)\\ \vdots \end{bmatrix} \tag{6.79}\\
S(n;\theta) &= \frac{\partial P(n;\theta)}{\partial\theta^T} \tag{6.80}
\end{align}

we can write the evolution equations for the master equation and its sensitivity as linear systems

\begin{align}
\frac{dP(n;\theta)}{dt} &= A(\theta)\,P(n;\theta) \tag{6.81}\\
\frac{dS(n;\theta)}{dt} &= A(\theta)\,S(n;\theta) + J(\theta)\,P(n;\theta) \tag{6.82}\\
J(\theta) &= \frac{\partial A(\theta)}{\partial\theta} \tag{6.83}
\end{align}

Integrating equation (6.82) with respect to time yields the convolution integral

$$S(n;\theta) = e^{At}\,S(n;\theta)_0 + \int_0^{t} e^{A(t-t')}\,J(\theta)\,P(n,t';\theta)\,dt' \tag{6.84}$$

The primary drawback to this method is that the sensitivity equation (6.84) has the same large dimensionality as the master equation, the same problem that forced us to use simulation to solve the master equation. We can attempt to solve the sensitivity equation using simulation, but this method also suffers several drawbacks. First, the sensitivity is not a probability distribution, so we must recast the problem into a form conducive for solution by simulation. Even after doing so, solving for the sensitivity requires knowledge of the probability distribution, which is presumably reconstructed from simulation. Hence even if we could exactly solve for the sensitivity, the result would only be as accurate as the reconstructed probability density. The primary appeal of this method, though, is that the simulations used to reconstruct the probability density can also be used to evaluate the sensitivity, i.e. the convolution integral in equation (6.84). If we can efficiently store and retrieve this information, solving the sensitivity equation would require little or no additional simulation.
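For a state space small enough to enumerate, the linear systems (6.81) and (6.82) can be integrated directly. The following sketch does so for the reaction 2A → B of Section 6.2 with a single parameter k, assuming the propensity $\tfrac{1}{2} k\, n_A (n_A - 1)$ so that A(k) = k·A0 and J = A0. It uses a hand-written classical Runge-Kutta step rather than any particular solver package; all function names are hypothetical.

```python
def rhs(k, p, s, a):
    """Right-hand sides of dP/dt = A P and dS/dt = A S + J P for 2A -> B.

    State m means m reactions have fired; a[m] is the propensity divided by k,
    so A = k * A0 and J = dA/dk = A0.
    """
    dp, ds = [], []
    for m in range(len(a)):
        a0_p = (a[m - 1] * p[m - 1] if m > 0 else 0.0) - a[m] * p[m]  # (A0 P)_m
        a0_s = (a[m - 1] * s[m - 1] if m > 0 else 0.0) - a[m] * s[m]  # (A0 S)_m
        dp.append(k * a0_p)
        ds.append(k * a0_s + a0_p)
    return dp, ds


def rk4_step(k, p, s, a, h):
    """One classical Runge-Kutta step for the stacked system (P, S)."""
    def shift(v, dv, c):
        return [x + c * d for x, d in zip(v, dv)]

    def combine(v, d1, d2, d3, d4):
        return [x + h / 6 * (e1 + 2 * e2 + 2 * e3 + e4)
                for x, e1, e2, e3, e4 in zip(v, d1, d2, d3, d4)]

    p1, s1 = rhs(k, p, s, a)
    p2, s2 = rhs(k, shift(p, p1, h / 2), shift(s, s1, h / 2), a)
    p3, s3 = rhs(k, shift(p, p2, h / 2), shift(s, s2, h / 2), a)
    p4, s4 = rhs(k, shift(p, p3, h), shift(s, s3, h), a)
    return combine(p, p1, p2, p3, p4), combine(s, s1, s2, s3, s4)


def mean_and_sensitivity(k, t_end, n_a0=20, steps=4000):
    """Mean of n_A at t_end and its derivative with respect to k."""
    states = list(range(n_a0, -1, -2))        # accessible values of n_A
    a = [0.5 * n * (n - 1) for n in states]   # propensity divided by k
    p = [1.0] + [0.0] * (len(states) - 1)     # all probability starts at n_a0
    s = [0.0] * len(states)                   # initial distribution is independent of k
    h = t_end / steps
    for _ in range(steps):
        p, s = rk4_step(k, p, s, a, h)
    mean = sum(n * pm for n, pm in zip(states, p))
    dmean_dk = sum(n * sm for n, sm in zip(states, s))
    return mean, dmean_dk


mean, dmean_dk = mean_and_sensitivity(k=1.0 / 15.0, t_end=1.0)
print(mean, dmean_dk)
```

The sketch only works because this example has eleven accessible states; the dimensionality problem described above is exactly what prevents the same approach for realistic reaction networks.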

Notation

$H_j$   jth transition probability
n   discrete state vector
N   number of simulations
P   probability
$p_j$   jth random number
$\mathbf{P}$   vector of probabilities
$r_j$   jth transition rate
$S_n$   sum of n independent coin flips
S   matrix of probability sensitivity vectors
$s_I$   time-integrated sensitivity
t   time
U   uniform distribution
$v_j$   jth discrete decision
X   random variable
$x_j$   jth realization of the random variable X
z   characterization of a simulated trajectory
∆   finite difference perturbation
ν   stoichiometric matrix
τ   next reaction time
θ   parameter
$\boldsymbol{\theta}$   vector of parameters
Ω   random number string used for simulation

Chapter 7

Sensitivity Analysis of Stochastic Differential Equation Models

The purpose of this chapter is to develop and present methods for using stochastic differential equation models for purposes other than pure simulation. As a simulation tool, these types of models are becoming an increasingly popular method for introducing science and engineering students and researchers to the molecular world in which random fluctuations are an important physical phenomenon to be captured in the model. If we consider systems-level tasks, such as parameter estimation, model-based feedback control, and process and product design, we require a different set of tools than those required for pure simulation. Many systems-level tasks are conveniently posed as optimization problems, and brute force optimization of these highly “noisy” simulation models either fails outright or is so time consuming that the entire exercise becomes tedious and frustrating.

Simply attaching an optimization method to a stochastic simulation model is inefficient if we do not consider the engineering task that might come later when the simulation is created. We propose adding a small piece of code to the stochastic simulation that exactly computes the sensitivity of the trajectory to all model parameters of interest. These parameters may be kinetic parameters to be estimated from data or control decisions used to control the dynamic or steady-state behavior of the system.

Sensitivity analysis of stochastic differential equations (SDEs) is by no means a new concept. To the best of our knowledge, Dacol and Rabitz [23] first proposed such an analysis. These authors suggested using a Green’s function approach to solve for the sensitivity of moments of the underlying probability distribution. In this chapter, we propose differentiating simulated sample paths directly to calculate the same sensitivities. We first review the master equation of interest and define the sensitivity of moments of this equation with respect to model parameters. Next we propose and compare several methods for calculating these sensitivities with an eye on computational efficiency. Finally, we illustrate how to use the sensitivities for calculating parameter estimates, computing steady states, and computing quantities for polymer models.

7.1 The Master Equation

We consider the following master (Fokker-Planck) equation:

$$\frac{\partial P(x,t;\theta)}{\partial t} = -\sum_{i=1}^{l}\frac{\partial}{\partial x_i}\big(A_i(x;\theta)\,P(x,t;\theta)\big) + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\frac{\partial^2}{\partial x_i\,\partial x_j}\big(B_{ij}(x;\theta)^2\,P(x,t;\theta)\big) \tag{7.1}$$

in which x is the state vector for the system, θ is the vector of parameters, t is time, P(x, t; θ) is the probability distribution function, $A_i$ denotes the ith element of the vector A, and $B_{ij}$ denotes the (i, j)th element of the matrix B. Many different boundary conditions are possible for this system (see, for example, Gardiner [41]); for this chapter, we use reflecting boundary conditions of the form

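$$A_i(x;\theta)\,P(x,t;\theta) + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\frac{\partial}{\partial x_j}\big(B_{ij}(x;\theta)^2\,P(x,t;\theta)\big) = 0 \tag{7.2}$$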

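unless specified otherwise. Defining the sensitivity S(x, t; θ) as

$$S(x,t;\theta) \triangleq \frac{\partial P(x,t;\theta)}{\partial\theta} \tag{7.3}$$

we can differentiate equation (7.1) with respect to θ to obtain the sensitivity evolution equation

\begin{align}
\frac{\partial}{\partial\theta}\frac{\partial P(x,t;\theta)}{\partial t} &= \frac{\partial}{\partial\theta}\left(-\sum_{i=1}^{l}\frac{\partial}{\partial x_i}\big(A_i(x;\theta)\,P(x,t;\theta)\big) + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\frac{\partial^2}{\partial x_i\,\partial x_j}\big(B_{ij}(x;\theta)^2\,P(x,t;\theta)\big)\right) \tag{7.4}\\
\frac{\partial S(x,t;\theta)}{\partial t} &= -\sum_{i=1}^{l}\frac{\partial}{\partial x_i}\left(\frac{\partial A_i(x;\theta)}{\partial\theta}\,P(x,t;\theta) + A_i(x;\theta)\,S(x,t;\theta)\right) \notag\\
&\quad + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\frac{\partial^2}{\partial x_i\,\partial x_j}\left(\frac{\partial B_{ij}(x;\theta)}{\partial\theta}\,2B_{ij}(x;\theta)\,P(x,t;\theta) + B_{ij}(x;\theta)^2\,S(x,t;\theta)\right) \tag{7.5}
\end{align}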

Clearly solution of equation (7.5) requires the solution of equation (7.1), but not vice versa.

In general, we are interested in moments of the probability distribution, i.e.

$$\bar{g}(x) = \int_{x} g(\omega)\,P(\omega,t;\theta)\,d\omega \tag{7.6}$$

in which $\bar{g}(x)$ and g(x) are vectors. For example, we might seek to implement control moves that drive the mean system behavior towards a desired set point. Such tasks require knowledge of how sensitive these moments are with respect to the parameters. The master equation (7.1) indicates that the probability distribution evolves continuously with time; consequently, moments of this distribution (assuming that they are well defined) evolve continuously as well. Therefore we can simply differentiate equation (7.6) with respect to the parameters to define the sensitivity of these moments, $s(\bar{g}(x))$, as follows:

\begin{align}
\frac{\partial}{\partial\theta^T}\bar{g}(x) &= \frac{\partial}{\partial\theta^T}\int_{x} g(\omega)\,P(\omega,t;\theta)\,d\omega \tag{7.7}\\
s(\bar{g}(x),t;\theta) &= \int_{x} g(x)\,S(x,t;\theta)^T\,dx \tag{7.8}
\end{align}

Here, $s(\bar{g}(x),t;\theta)$ is a matrix. Equation (7.8) indicates that these sensitivities depend upon the sensitivity of the master equation, S(x, t; θ). Therefore, the exact solution of $s(\bar{g}(x))$ requires simultaneous solution of equations (7.1), (7.5), and (7.8).

As opposed to exactly solving for the desired moments of the master equation, we can reconstruct these moments via simulation. The master equation (7.1) has an Ito solution of the form

$$dx_i = A_i(x;\theta)\,dt + \sum_{j=1}^{l} B_{ij}(x;\theta)\,dW_j \tag{7.9}$$

in which W is a vector of Wiener processes. We can simulate trajectories of equation (7.9) by using, for example, an Euler scheme [40], then tabulate this trajectory information to reconstruct the desired moments (7.6) by applying the law of large numbers

$$\bar{g}(x) = \int_{x} g(\omega)\,P(\omega,t;\theta)\,d\omega = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} g(x^i) \approx \frac{1}{N}\sum_{i=1}^{N} g(x^i) \quad \text{for finite } N \tag{7.10}$$

in which N is the number of simulated trajectories and $x^i$ is the value of the state for the ith simulation. Logically, then, we could also attempt to reconstruct the sensitivities from the simulated sample paths alone. This analysis requires some care; in particular, we must justify interchanging the operators of expectation and differentiation, i.e.

$$\lim_{\Delta\to 0}\frac{E[g(x^i;\theta+\Delta,\Omega_1)] - E[g(x^i;\theta,\Omega_2)]}{\Delta} = E\left[\lim_{\Delta\to 0}\frac{g(x^i;\theta+\Delta,\Omega_1) - g(x^i;\theta,\Omega_2)}{\Delta}\right] \tag{7.11}$$

in which E(·) denotes the expectation operator and $\Omega_1$ and $\Omega_2$ refer to the random numbers used to generate the desired expectations. Because the individual sample paths are continuous, the interchange is justifiable and we can merely differentiate equation (7.9) with respect to θ

\begin{align}
\frac{\partial}{\partial\theta}\,dx_i &= \frac{\partial}{\partial\theta}\left(A_i(x;\theta)\,dt + \sum_{j=1}^{l} B_{ij}(x;\theta)\,dW_j\right) \tag{7.12}\\
ds_i &= \left(\frac{\partial A_i(x;\theta)}{\partial x}\,s_i + \frac{\partial A_i(x;\theta)}{\partial\theta}\right)dt + \sum_{j=1}^{l}\left(\frac{\partial B_{ij}(x;\theta)}{\partial x}\,s_i + \frac{\partial B_{ij}(x;\theta)}{\partial\theta}\right)dW_j \tag{7.13}
\end{align}

in which $s_i$ is defined as

$$s_i \triangleq \frac{\partial x_i}{\partial\theta} \tag{7.14}$$

Consequently, we can evaluate the desired moments and sensitivities of these moments by simultaneously evaluating equations (7.9) and (7.13). Additionally, we have the choice of using either the same strings of random numbers for evaluation of equation (7.11), i.e. $\Omega_1 = \Omega_2$, or different strings of random numbers. The former case corresponds to differentiating individual sample paths and consequently using the same values for the state and Brownian increments to evaluate the desired moments and their sensitivities. This subtle distinction actually results in dramatic differences in the evaluated sensitivities as pointed out by Fu and Hu [37]. We illustrate this point in the examples.
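Simultaneous Euler integration of equations (7.9) and (7.13) can be sketched for a scalar model. The model below is a hypothetical stand-in chosen because its mean is known in closed form: A(x; θ) = −θx and B(x; θ) = σx, so that (7.13) reads ds = (−θs − x) dt + σs dW and the exact mean sensitivity is $-t\,x_0 e^{-\theta t}$. The function names, σ, the step size, and the number of paths are illustrative assumptions.

```python
import math
import random


def euler_with_sensitivity(theta, sigma, x0, dt, increments):
    """Euler integration of dx = -theta*x dt + sigma*x dW and of s = dx/dtheta.

    The same Brownian increments drive both x and s, which corresponds to
    differentiating the individual sample path.
    """
    x, s = x0, 0.0
    for dw in increments:
        dx = -theta * x * dt + sigma * x * dw
        ds = (-theta * s - x) * dt + sigma * s * dw
        x, s = x + dx, s + ds
    return x, s


def mean_and_sensitivity(theta, sigma=0.2, x0=1.0, t_end=1.0, dt=1e-2,
                         n_paths=200, seed=0):
    """Average the state and its sensitivity at t_end over n_paths trajectories."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    sum_x = sum_s = 0.0
    for _ in range(n_paths):
        increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
        x, s = euler_with_sensitivity(theta, sigma, x0, dt, increments)
        sum_x += x
        sum_s += s
    return sum_x / n_paths, sum_s / n_paths


mean_x, mean_s = mean_and_sensitivity(theta=1.0)
print("mean %.4f (exact %.4f)" % (mean_x, math.exp(-1.0)))
print("sensitivity %.4f (exact %.4f)" % (mean_s, -math.exp(-1.0)))
```

The sensitivity recursion costs one extra update per time step and reuses the increments already drawn for the state, so no additional random numbers are needed.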

7.2 Sensitivity Examples

We now consider two motivating examples comparing parametric and finite difference sensitivities. The first example is a single, reversible reaction that demonstrates the accuracy of the parametric and finite difference sensitivities. The second example consists of the Oregonator reactions and illustrates the superiority of the parametric sensitivity over finite differences.

7.2.1 Simple Reversible Reaction

We consider the reversible reaction

$$2A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} B \qquad \varepsilon = 0.5\,k_1 c_A (c_A - 1) - k_2 c_B \tag{7.15}$$

in which ε denotes the extent of reaction. Parameter values for this example are given in Table 7.1. We solve the master equation (7.1) and its sensitivity (7.5) for the extent of reaction ε by using finite differences to discretize the ε dimension (∆ε = 2), then using DASKR (a variant of the package DASPK [15]) to integrate the resulting system of differential-algebraic equations. We also use simulation to evaluate the mean of the stochastic differential equation (7.9) and its sensitivity (7.13). Here, we use a first-order Euler integration with a time increment of $\Delta t = 10^{-2}$. We reconstruct the mean with ten simulations.

Figure 7.1 compares the mean results for the master equation, parametric sensitivity, and finite difference sensitivity. For this figure, we have chosen a central finite difference scheme with a perturbation of $10^{-8}$ of the parameter value. Figure 7.1 (a) demonstrates that ten simulations yield a reasonable approximation to the mean. Figures 7.1 (b) and (c) illustrate that the parametric and finite difference mean sensitivities yield indistinguishable results (to the scale of the graph), and that these results are similar to those of the master equation.

Figure 7.2 compares the mean sensitivity of parameter $k_1$ for the master equation and finite difference sensitivity. Rather than use the same random numbers for evaluation of the sensitivity, we evaluate the perturbed expectations using different strings of random numbers (i.e. $\Omega_1 \ne \Omega_2$ in equation (7.11)). Figure 7.2 presents these results for a parameter perturbation of 10%. The finite difference result is substantially noisier than when using the same strings of random numbers for each perturbation (e.g. Figure 7.1 (b)). The noise directly results from the error due to the finite number of simulations used to reconstruct the mean. In fact, the finite simulation error completely swamps the sensitivity calculation when using a parameter perturbation of 1% or less. This result underscores the importance of differentiating the sample trajectories to obtain sensitivity information.

Parameter                       Value
k1                              1/450
k2                              1/30
P(cA = 150, cB = 25, t = 0)     1

Table 7.1: Parameter values for the simple reversible reaction.

7.2.2 Oregonator

We now consider the Oregonator system of reactions [32]

\begin{align}
W + B &\xrightarrow{k_1} A \tag{7.16}\\
A + B &\xrightarrow{k_2} X \tag{7.17}\\
Y + A &\xrightarrow{k_3} 2A + C \tag{7.18}\\
2A &\xrightarrow{k_4} Z \tag{7.19}\\
C &\xrightarrow{k_5} B \tag{7.20}
\end{align}

Reactions are elementary as written. Parameters for this system are given in Table 7.2. It is assumed that concentrations of species W and Y remain constant throughout the reaction. Additionally, we track only species A, B, and C since species X and Z are products.

We use simulation to evaluate a single trajectory of the stochastic differential equation (7.9) and its sensitivity (7.13). Here, we use a first-order Euler integration with a time increment of $\Delta t = 10^{-3}$. The initial condition is $P(c_A = 500, c_B = 1000, c_C = 2000, t_0) = 1$.

Figure 7.3 presents the results for this example. Figure 7.3 (a) demonstrates that for the given set of parameters, these reactions yield a stable, oscillatory response. Although Figure 7.3 (b) shows good visual agreement between the parametric and finite difference sensitivities, plot (c) clearly shows that the difference between these two sensitivities is actually increasing with time even though the finite difference perturbation is small ($10^{-8}$ of the parameter $k_1$).

SimulationExact

(a)

B

A

Time

Num

ber

ofm

olec

ules

1086420

160140120100

80604020

FDParametric

Exact

(b)B

A

Time

s k1×

10−

3

1086420

10

5

0

-5

-10

-15

-20

FDParametric

Exact

(c)

B

A

Time

s k2

1086420

500400300200100

0-100-200-300

Figure 7.1: Results for the simple reversible reaction: (a) comparison of the exact and recon-structed mean (by simulation); (b) comparison of the exact, parametric, and finite difference(FD) sensitivities for parameter k1; and (c) comparison of the exact, parametric, and finite dif-ference (FD) sensitivities for parameter k2. Here, the finite difference perturbation is 10−8 ofeach parameter.

109

FDExact

B

A

Time

s k1×

10−

3

1086420

151050

-5-10-15-20-25

Figure 7.2: Results for the simple reversible reaction: comparison of the exact and finite differ-ence (FD) sensitivities for parameter k1 using different random numbers for each finite differ-ence expectation. Here, the finite difference perturbation is 10−1 of the parameter k1.

Parameter     Value
$k_1 c_W$     2
$k_2$         0.1
$k_3 c_Y$     104
$k_4$         0.016
$k_5$         26

Table 7.2: Parameter values for the Oregonator system of reactions.

7.3 Applications of Parametric Sensitivities

We now turn our attention to applications of parametric sensitivities. We first consider estimating parameters for the simple, reversible reaction of section 7.2.1. We then perform steady-state analysis for the Oregonator reactions of section 7.2.2. Finally, we use parametric sensitivities to evaluate the viscosity of a simple dumbbell model.

7.3.1 Parameter Estimation

[Figure 7.3: three panels plotted against time for $c_A$, $c_B$, and $c_C$. Panel (a): concentration $\times 10^{-3}$. Panel (b): sensitivity $\times 10^{-4}$. Panel (c): $\Delta$ sensitivity $\times 10^{3}$.]

Figure 7.3: Results for one trajectory of the Oregonator cyclical reactions: (a) simulated trajectory, (b) parametric and finite difference (FD) sensitivity for parameter $k_1$, and (c) difference between the parametric and finite difference sensitivities. Here, the finite difference perturbation is $10^{-8}$ of the parameter.

The goal of parameter estimation is to determine the set of parameters that best reconciles the measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

\begin{align}
\min_{\theta} \quad & \Phi = \frac{1}{2} \sum_k e_k^T \Pi^{-1} e_k \tag{7.21a} \\
\text{s.t.:} \quad & x_{k+1} = F(x_k; \theta) \tag{7.21b} \\
& e_k = y_k - h(x_k) \tag{7.21c}
\end{align}

in which $\Phi$ is the objective function value, the $e_k$'s denote the differences between the measurements $y_k$ and the model predictions $h(x_k)$, and $\Pi$ is the covariance matrix for the measurement noise. For the optimal set of parameters, the gradient $\nabla_\theta \Phi$ is zero. We can numerically evaluate the gradient according to

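\begin{align}
\nabla_\theta \Phi &= \frac{\partial}{\partial \theta^T} \left( \frac{1}{2} \sum_k e_k^T \Pi^{-1} e_k \right) \tag{7.22} \\
&= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} \frac{\partial x_k}{\partial \theta^T} \right)^T \Pi^{-1} e_k \tag{7.23} \\
&= -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \tag{7.24}
\end{align}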

Equation (7.24) indicates that the gradient depends upon $s_k$, the sensitivity of the state with respect to the parameters.
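The structure of equations (7.21) and (7.24) can be sketched in a few lines. The version below is illustrative only: it assumes a scalar measurement (so that $\Pi^{-1}$ is a single number), and the function name and argument layout are hypothetical rather than taken from the text.

```python
def objective_and_gradient(y, h_pred, dh_dx, s, pi_inv):
    """Least squares objective and its gradient for a scalar measurement.

    y[k], h_pred[k] : measurement and model prediction at sample k
    dh_dx[k][i]     : derivative of h with respect to state i at sample k
    s[k][i][j]      : sensitivity of state i to parameter j at sample k
    pi_inv          : inverse of the (scalar) measurement-noise variance
    """
    n_par = len(s[0][0])
    phi = 0.0
    grad = [0.0] * n_par
    for k in range(len(y)):
        e = y[k] - h_pred[k]
        phi += 0.5 * e * pi_inv * e
        for j in range(n_par):
            # (dh/dx) s_k for parameter j, i.e. dh/dtheta_j at sample k
            dh_dtheta = sum(dh_dx[k][i] * s[k][i][j] for i in range(len(dh_dx[k])))
            grad[j] -= dh_dtheta * pi_inv * e
    return phi, grad


# Toy check: x_k = theta * t_k with theta = 1, h(x) = x, and data y_k = 2 * t_k.
times = [1.0, 2.0, 3.0]
phi, grad = objective_and_gradient(
    y=[2.0 * t for t in times],
    h_pred=[1.0 * t for t in times],
    dh_dx=[[1.0] for _ in times],
    s=[[[t]] for t in times],
    pi_inv=1.0,
)
print(phi, grad)
```

For this toy problem the objective is $\frac{1}{2}(2-\theta)^2 \sum_k t_k^2$, so the analytical values at $\theta = 1$ are $\Phi = 7$ and $d\Phi/d\theta = -14$.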

In general, most experiments do not include many replicates due to cost and time constraints. Therefore, the best experimental data we are likely to obtain is the average. In fitting these data to stochastic models governed by the master equation, we accordingly choose the mean $\bar{x}$ as the state of interest. Monte Carlo simulation and parametric sensitivities provide estimates of the mean and its sensitivity.

For the sake of illustration, we obtain optimal parameter estimates using an unconstrained, line-search optimization with BFGS Hessian update; for further details on this method, we refer the interested reader to Nocedal and Wright [97]. Here, we provide the optimizer with both the objective function and the gradient given in equations (7.21) and (7.24), respectively. Although the Monte Carlo reconstruction of the mean is nominally “stochastic”, by reusing the same string of random numbers for every optimization iteration the objective function given in equation (7.21) becomes, in a sense, deterministic. Additionally, the objective function is continuous with respect to the parameters. Some care must be taken to ensure that the string of random numbers used by the optimization gives a representative reconstruction of the mean (recall that the finite number of simulations introduces some error in the reconstructed mean). Practically, this condition can be checked by optimizing the model with several different random number strings.

[Figure 7.4: panel (a) plots the measurement $n_B$ against time; panel (b) plots the parameters $k_1$ and $k_2$ against optimizer iteration.]

Figure 7.4: Results for parameter estimation of the simple reversible reaction example: (a) comparison of the “experimental” (points) and predicted (line) measurements and (b) convergence of the optimized parameters (dashed lines) to the true values (solid lines) for the proposed scheme.

We reconsider the simple reversible reaction of section 7.2.1. We assume that we can measure the average amount of $c_B$ with a sampling time of $\Delta t = 0.2$. “Experimental” data are generated using the parameters given in section 7.2.1 with the exception that one hundred simulations are used to generate the mean behavior. For the parameter estimation, we attempt to estimate the log base ten values of both $k_1$ and $k_2$ using a different seed for the random number generator than that used to generate the experimental data. We estimate $\log_{10}$ values to prevent both numerical conditioning problems and negative estimates of the rate constants. Figure 7.4 presents the results of this estimation. The experimental and predicted measurements agree well, and the parameter estimates quickly converge to values close to the true values. The offset between the estimated and true parameters is expected due to the finite simulation error since different seeds are used to generate the experimental and predicted measurements.
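The change of variables to $\log_{10}$ values enters the gradient (7.24) only through the chain rule. As a brief worked step, assuming the transformed parameter is written $\theta_j = \log_{10} k_j$ (this labeling is introduced here for illustration), the sensitivity with respect to $\theta_j$ follows from the sensitivity with respect to $k_j$:

```latex
\begin{align*}
k_j &= 10^{\theta_j}, \\
\frac{\partial x_k}{\partial \theta_j}
  &= \frac{\partial x_k}{\partial k_j} \frac{d k_j}{d \theta_j}
   = k_j \ln(10) \, \frac{\partial x_k}{\partial k_j}.
\end{align*}
```

Because $10^{\theta_j} > 0$ for every real $\theta_j$, an unconstrained optimizer working in $\theta_j$ cannot return a negative rate constant.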

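To determine the accuracy of the optimization, we analyze both the gradient and the Hessian of the objective function. Differentiating the gradient (i.e. equation (7.24)) yields the Hessian $\nabla_{\theta\theta} \Phi$

\begin{align}
\nabla_{\theta\theta} \Phi &= \frac{\partial}{\partial \theta^T} \left( -\sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} e_k \right) \tag{7.25} \\
&= \sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} \frac{\partial h(x_k)}{\partial x_k^T} s_k - \sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} \frac{\partial^2 x_k}{\partial \theta \, \partial \theta^T} \right)^T \Pi^{-1} e_k \tag{7.26}
\end{align}

in which the sign of the first term follows from $\partial e_k / \partial \theta^T = -(\partial h(x_k)/\partial x_k^T) \, s_k$. Making the usual Gauss-Newton approximation for the Hessian (i.e. $e_k \approx 0$), we obtain

\begin{equation}
\nabla_{\theta\theta} \Phi \approx \sum_k \left( \frac{\partial h(x_k)}{\partial x_k^T} s_k \right)^T \Pi^{-1} \frac{\partial h(x_k)}{\partial x_k^T} s_k \tag{7.27}
\end{equation}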

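For this optimization, the values of the gradient and the approximate Hessian are

\begin{align}
\nabla_\theta \Phi &= \begin{bmatrix} 1.03 \times 10^{-5} & -4.83 \times 10^{-6} \end{bmatrix} \tag{7.28} \\
\nabla_{\theta\theta} \Phi &= \begin{bmatrix} 1.36 \times 10^{5} & -3.37 \times 10^{4} \\ -3.37 \times 10^{4} & 1.09 \times 10^{4} \end{bmatrix} \tag{7.29}
\end{align}

Examining the eigenvalue/eigenvector ($\lambda/\nu$) decomposition of the Hessian yields

\begin{align}
\lambda_1 &= 2.40 \times 10^{3}, & \nu_1 &= \begin{bmatrix} -0.245 & -0.969 \end{bmatrix} \tag{7.30} \\
\lambda_2 &= 1.44 \times 10^{5}, & \nu_2 &= \begin{bmatrix} -0.969 & 0.245 \end{bmatrix} \tag{7.31}
\end{align}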

Because the gradient is reasonably small and the Hessian is positive definite (eigenvalues are positive), we conclude that the optimizer has indeed converged to a local minimum.
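The eigenvalue check can be reproduced directly from the rounded Hessian entries in equation (7.29). The helper below is a small sketch with a hypothetical name; it uses the closed-form solution for a symmetric 2 x 2 matrix and assumes a nonzero off-diagonal entry.

```python
import math


def sym2x2_eigen(a, b, d):
    """Eigenvalues (ascending) and unit eigenvectors of [[a, b], [b, d]], b != 0."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (a - lam) * v1 + b * v2 = 0 is satisfied by v = (b, lam - a)
        v1, v2 = b, lam - a
        norm = math.hypot(v1, v2)
        pairs.append((lam, (v1 / norm, v2 / norm)))
    return pairs


hessian_pairs = sym2x2_eigen(1.36e5, -3.37e4, 1.09e4)
for lam, vec in hessian_pairs:
    print(lam, vec)
positive_definite = all(lam > 0.0 for lam, _ in hessian_pairs)
```

Since the inputs are the three-digit values printed in the text, the computed eigenvalues agree with (7.30) and (7.31) only to roughly that precision.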

7.3.2 Calculating Steady States

Exact determination of steady states requires solving for the stationary state of the master equation (7.1). The difficulty of this task is comparable to that of solving the dynamic response. In this section, we use the result from section 5.4 which allows us to determine stationary points for moments of the underlying probability distribution given short bursts of simulation. The difference in the analysis presented here is that (1) the considered master equation is of the Fokker-Planck type, i.e. equation (7.1), and (2) sensitivities of the simulated moments can be determined exactly. We now apply this method to the Oregonator system of reactions previously presented in section 7.2.2.

[Figure 7.5: estimated state $x \times 10^{-3}$ for species A, B, and C plotted against Newton iteration.]

Figure 7.5: Results for steady-state analysis of the Oregonator reaction example: estimated state per Newton iteration.

For the steady-state calculation, we calculate the evolution of the mean using a short burst of simulation ($\Delta_{ss} = 10^{-2}$). We use an Euler integration with time increment $\Delta t = 10^{-3}$ to evaluate $\Delta_{ss}$. One hundred simulations are used to reconstruct the mean. Figure 7.5 presents the convergence of the steady-state calculation per completed Newton iteration. The majority of the convergence occurs within the first five iterations. The calculated mean and sensitivity are

\begin{align}
\bar{x} &= \begin{bmatrix} 491.7 & 1008.6 & 1979.9 \end{bmatrix}^T \tag{7.32} \\
s &= \begin{bmatrix} 1.031 & -0.344 & -0.036 \\ -0.654 & 0.748 & 0.179 \\ 0.838 & -0.146 & 0.780 \end{bmatrix} \tag{7.33}
\end{align}

Analyzing the eigenvalues of the mean sensitivity yields

\begin{equation}
\lambda = \begin{bmatrix} 0.341 & 1.11 + 0.065i & 1.11 - 0.065i \end{bmatrix} \tag{7.34}
\end{equation}

which indicates by linear stability analysis (see Chen [21] for further details) that the steady state is unstable, as expected.
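One way to read this conclusion is through the moduli of the eigenvalues in (7.34). The short check below assumes that $s$ is the sensitivity of the mean at the end of the simulation burst with respect to the state at its start, so that the discrete-time criterion applies (a fixed point is unstable if any eigenvalue has modulus greater than one). That interpretation is an assumption made for this sketch.

```python
# Eigenvalues as printed in equation (7.34).
eigenvalues = [0.341, complex(1.11, 0.065), complex(1.11, -0.065)]

# Discrete-time criterion (assumed): unstable if any modulus exceeds one.
moduli = [abs(lam) for lam in eigenvalues]
unstable = max(moduli) > 1.0
print(moduli, unstable)
```

The complex pair has modulus of about 1.11, which places it outside the unit circle and is consistent with an oscillatory departure from the steady state.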

7.3.3 Simple Dumbbell Model of a Polymer in Solution

We now consider calculation of the zero-shear viscosity for a simple dumbbell model of a polymer molecule in solution. For this model, two beads are connected by a Hookean spring. We track the coordinates of each bead, in which

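\begin{align}
dx_1 &= \left( \frac{H}{\zeta} (x_2 - x_1) + (\nabla v)^T x_1 \right) dt + \sqrt{2D} \, dW_1 \tag{7.35a} \\
dx_2 &= \left( -\frac{H}{\zeta} (x_2 - x_1) + (\nabla v)^T x_2 \right) dt + \sqrt{2D} \, dW_2 \tag{7.35b} \\
\nabla v &= \begin{bmatrix} 0 & 0 & 0 \\ \gamma & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{7.35c}
\end{align}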

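in which $x_1$ and $x_2$ are the Cartesian coordinates of each bead, $H$ is the spring constant, $\zeta$ is the friction coefficient, $v$ is the velocity field, $D$ is the diffusivity of each bead, and $W$ is a vector of Wiener processes. The stress $\tau$ is defined as

\begin{equation}
\tau = \langle H q q^T \rangle - n k T \delta \tag{7.36}
\end{equation}

in which $\langle \cdot \rangle$ denotes the expectation operator and $q = x_1 - x_2$. For this system, the viscosity $\eta$ is

\begin{equation}
\eta = \left. \frac{\partial \tau_{12}}{\partial \gamma} \right|_{\gamma = 0} \tag{7.37}
\end{equation}

Defining $\gamma$ as the parameter of interest, the viscosity $\eta$ clearly becomes a function of the sensitivities

\begin{align}
s_1 &= \frac{\partial x_1}{\partial \gamma} \tag{7.38} \\
s_2 &= \frac{\partial x_2}{\partial \gamma} \tag{7.39}
\end{align}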

Parameter              Symbol     Value
Friction coefficient   $\zeta$    1.
Diffusivity            $D$        $10^{-4}$
Spring constant        $H$        1.

Table 7.3: Parameters for the simple dumbbell model.

Quantity               Symbol      Value
Analytical viscosity   $\eta$      $2.5 \times 10^{-5}$
Estimated viscosity    $\eta_e$    $2.43 \times 10^{-5} \pm 3.90 \times 10^{-6}$

Table 7.4: Results for the simple dumbbell model. Standard deviation calculated by grouping the simulation results into groups of ten, then determining the standard deviation of the resulting ten averages.

We simulate equation (7.35) using an Euler discretization with time increment of $\Delta t = 10^{-2}$. The expectation $\langle H q q^T \rangle$ is calculated by averaging the time courses of one hundred simulations with a time period of 10. Parameters for the model are given in Table 7.3. For this simple example, the viscosity can be calculated exactly as

\begin{equation}
\eta = \frac{D \zeta^2}{4H} \tag{7.40}
\end{equation}

Table 7.4 presents the results of this simulation. The viscosity calculated using parametric sensitivities compares favorably to the exact value.
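A compact sketch of this calculation is given below, using the parameter values of Table 7.3. It is not the implementation used to produce Table 7.4. Several simplifications are assumptions of the sketch: subtracting (7.35b) from (7.35a) gives an equation for $q = x_1 - x_2$ with relaxation rate $2H/\zeta$ and noise variance $4D\,dt$; at $\gamma = 0$ only the component $q_2$ and the sensitivity $\partial q_1/\partial\gamma$ are needed for $\partial \tau_{12}/\partial \gamma$; trajectories start at zero; and the first half of each time course is discarded as a start-up transient. The function name and seed are hypothetical.

```python
import math
import random


def estimate_viscosity(H=1.0, zeta=1.0, D=1e-4, dt=1e-2, t_end=10.0,
                       n_traj=100, seed=0):
    """Estimate eta = d(tau_12)/d(gamma) at gamma = 0 from pathwise sensitivities."""
    rng = random.Random(seed)
    rate = 2.0 * H / zeta              # relaxation rate of q = x1 - x2
    noise = math.sqrt(4.0 * D * dt)    # std of the Euler noise increment for q
    n_steps = int(round(t_end / dt))
    burn_in = n_steps // 2             # discard the start-up transient (assumption)
    total, count = 0.0, 0
    for _ in range(n_traj):
        q2, s1 = 0.0, 0.0              # s1 = d(q_1)/d(gamma) at gamma = 0
        for n in range(n_steps):
            # Derivative of the Euler update for q_1 with respect to gamma.
            s1_next = s1 + (-rate * s1 + q2) * dt
            q2 = q2 - rate * q2 * dt + noise * rng.gauss(0.0, 1.0)
            s1 = s1_next
            if n >= burn_in:
                total += H * s1 * q2   # d(H q_1 q_2)/d(gamma); d(q_2)/d(gamma) = 0
                count += 1
    return total / count


eta_estimate = estimate_viscosity()
eta_exact = 1e-4 * 1.0 ** 2 / (4.0 * 1.0)   # equation (7.40)
print(eta_estimate, eta_exact)
```

The estimate scatters around the exact value from one seed to the next, so agreement should be judged relative to the sampling error rather than digit by digit.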


7.4 Conclusions

We have proposed differentiating simulated sample paths to obtain parametric sensitivities for models consisting of stochastic differential equations. The sensitivity equations are evaluated simultaneously with the model equations to yield accurate, first-order information about the simulated trajectories. Two simple examples demonstrated the accuracy of this technique in comparison to both finite differences and the solution of the underlying master equation and its sensitivity. These results underscore the importance of differencing each simulated trajectory rather than trajectories generated using different strings of random numbers. However, we observed little difference between the accuracy of parametric and finite difference sensitivities. Additionally, we have demonstrated how these sensitivities can be used to perform systems-level tasks for this class of models. The examples included using nonlinear optimization to estimate parameters, performing steady-state analysis, and evaluating derivatives for polymer models efficiently. We expect these tools to prove useful in a wide range of applications, from more complex polymer models to financial models.

Notation

$c_j$                 concentration of species $j$
$D$                   diffusivity
$e_k$                 difference vector between the measurements $y_k$ and the model predictions $h(x_k)$ at time $t_k$
$g(x)$                moment of the probability distribution
$H$                   spring constant
$h(x_k)$              model-predicted measurement vector at time $t_k$
$k_j$                 rate constant for the $j$th reaction
$N$                   number of simulated trajectories
$P(x, t; \theta)$     probability distribution function
$q$                   distance vector for the dumbbell model
$S(x, t; \theta)$     sensitivity of the probability distribution function
$s(g(x))$             sensitivity of a moment of the probability distribution
$s$                   sensitivity of $x$ for a simulated trajectory
$t$                   time
$W$                   vector of Wiener processes
$x$                   state vector
$x^i$                 value of the state $x$ for the $i$th simulation
$y_k$                 measurement vector at time $t_k$
$\varepsilon$         extent of reaction
$\eta$                viscosity
$\eta_e$              estimated viscosity
$\lambda$             eigenvalue
$\nu$                 eigenvector
$\Phi$                objective function value
$\Pi$                 covariance matrix for the measurement noise
$\zeta$               friction coefficient
$\tau$                shear matrix
$\theta$              vector of model parameters
$\Omega$              random number string used for simulation

Chapter 8

Stochastic Simulation of Particulate Systems¹

The stochastic chemical kinetics approach provides one method of formulating the stochastic crystallization population balance equation (PBE). In this formulation, crystal nucleation and growth are modeled as sequential additions of solubilized ions or molecules (units) to either other units or an assembly of any number of units. Monte Carlo methods provide one means of solving this problem. In this chapter, we assess the limitations of such methods by both (1) simulating models for isothermal and nonisothermal size-independent nucleation, growth and agglomeration; and (2) performing parameter estimation using these models. We also derive the macroscopic (deterministic) PBE from the stochastic formulation, and compare the numerical solutions of the stochastic and deterministic PBEs. The results demonstrate that even as we approach the thermodynamic limit, in which the deterministic model becomes valid, stochastic simulation provides a general, flexible solution technique for examining many possible mechanisms. Thus the stochastic simulation permits the user to focus more on modeling issues as opposed to solution techniques.

8.1 Introduction

Both deterministic and stochastic frameworks have been used to describe the time evolution of a population of particles. The classical deterministic framework consists of coupled population, mass, and energy balances which describe crystal nucleation, growth, agglomeration, and breakage as smooth, continuous processes. Randolph and Larson [110], Hulburt and Katz [65], and Ramkrishna and Borwanker [108, 109] have extensively studied the analysis and treatment of the deterministic population balance equation (PBE) for these crystal formation mechanisms. Hulburt and Katz [65] made a seminal contribution in which they develop a population balance that includes an arbitrary number of characteristic variables. They use the method of moments to solve the PBE for a variety of applications such as modeling systems with one or two length dimensions, and modeling agglomerating systems. Ramkrishna [107] provides an excellent summary of techniques used to solve the deterministic balances for models in a single distributed dimension. Ma, Braatz, and Tafti [84] apply high resolution methods to solve the deterministic balances with two characteristic length scales.

¹ Portions of this chapter to appear in Haseltine, Patience, and Rawlings [56].

If the population is large, single microscopic events such as incorporation of growth units into a crystal lattice and biparticle collisions are not significant. Microscopic events tend to occur on short time scales relative to those required to make a significant change in the macroscopic particle size density (PSD). If fluctuations about the average PSD are large, then the deterministic PBE is no longer valid. Large fluctuations about the average density occur when the population modeled is small. Examples of small populations in particulate systems in which fluctuations are significant include such varied applications as aggregation of platelets and neutrophils in flow fields, growth and aggregation of proteins, and aggregation of cell mixtures [79]. The deterministic PBE is also not valid in modeling precipitation reactions in micelles in which the micelles act as micro-scale reactors containing a small population of fine particles [86, 8].

In contrast to the deterministic framework, the stochastic framework models crystal nucleation, growth and agglomeration as random, discrete processes. Ramkrishna and Borwanker [108, 109] introduce the stochastic framework for modeling particulate processes. The authors show that the deterministic PBE is one of an infinite sequence of equations, called product densities, that describe the mean behavior and fluctuations about the mean behavior of the PSD. The deterministic PBE is, in fact, the expectation density of the infinite sequence of equations satisfied by the product density equations. As the population decreases, higher order product density equations are required to describe the time behavior and fluctuations about the expected behavior of the population. We refer the interested reader to Ramkrishna [107] for the details of this analysis.

One approach to solving the stochastic model for any population of crystals is the Monte Carlo simulation method. Kendall [72] first applies the concept of exponentially distributed time intervals between birth and death events in a single-species population. Shah, Ramkrishna, and Borwanker [136] use the same approach and simulate breakage and agglomeration in a dispersed-phase system. The rates of agglomeration and breakage are proportional to the number of particles in the system and the size-dependent mechanism of breakage and agglomeration. Laurenzi and Diamond [79] apply the same technique as Shah, Ramkrishna, and Borwanker to model aggregation kinetics of platelets and neutrophils in flow fields. Gooch and Hounslow [52] apply a Monte Carlo technique similar to Shah, Ramkrishna, and Borwanker to model breakage and agglomeration. Gooch and Hounslow calculate the event time interval from the numerical solution to the zeroth moment equation with $\Delta N = 1$ for breakage, and $\Delta N = -1$ for agglomeration. Manjunath et al. [86] and Bandyopadhyaya et al. [8] use the stochastic approach to model precipitation in small micellar systems. The model specifies the minimum number of solubilized ions and molecules to form a stable nucleus. Once a particle nucleates, growth is rapid and depletes the micelle of growth units. Brownian collisions govern the interaction between micelles. Solubilized ions and molecules are transferred during collisions.

In the stochastic approach developed here, nucleation and growth in a large-scale batch crystallizer are considered as a sequence of bimolecular chemical reactions. In particular, solubilized ions or molecules (units) sequentially add to other units or to an assembly of any number of units. Both Gillespie [46] and Shah, Ramkrishna, and Borwanker [136] propose equivalent methods for simulating exact trajectories of this random process. The expected behavior of the system can then be evaluated by averaging over many trajectory simulations. The burden of model solution rests mainly with the computing hardware, and these Monte Carlo simulations can be time intensive depending on the number of particles and the size of the molecular unit. Currently, desktop computers can simulate systems with reasonably large particle populations and small molecular units in a matter of seconds or minutes.

In this chapter we first review the stochastic formulation of chemical kinetics and summarize the exact simulation method used to solve this system. We then extend the scope of the formulation to describe nonisothermal systems. Since this extension leads to a constraint that hinders the computation, we suggest an approximation that overcomes this obstacle. We then outline assumptions for formulation of the crystallization model. We illustrate the dependence of the stochastic solution on key stochastic parameters, such as cluster size and simulation volume. We also provide an analysis showing the connection between the stochastic formulation and the deterministic PBE. Next, we solve the stochastic formulation for models incorporating isothermal and nonisothermal, size-independent nucleation, growth and agglomeration and contrast the solution to that from the deterministic framework. We then address how to estimate parameters using stochastic models, and provide an example. Finally, we assess the limitations of the Monte Carlo simulation technique.

8.2 Stochastic Chemical Kinetics Overview

In this section, we first review the stochastic formulation of chemical kinetics and one computational method for solving this problem. We then relax key assumptions of this problem formulation in order to address other interesting physical systems, and discuss one approximate computational solution method.

8.2.1 Stochastic Formulation of Isothermal Chemical Kinetics

The stochastic formulation of chemical kinetics has its physical basis in the kinetic theory of gases [48]. The modeled system consists of well-mixed, gas-phase chemical species maintained at thermal equilibrium. The key model assumptions are (1) a hard-sphere molecular model and (2) non-reactive collisions that occur much more frequently than reactive collisions. It is then possible to derive a deterministic time-evolution equation not for the state, but rather for the probability of being in a given state at a specific time. This evolution equation is the chemical master equation

\begin{equation}
\frac{dP(x, t)}{dt} = \sum_{k=1}^{m} \left[ a_k(x - \nu_k) P(x - \nu_k, t) - a_k(x) P(x, t) \right] \tag{8.1}
\end{equation}

in which

• $x$ is the state of the system in terms of number of molecules (a $p$-vector),

• $P(x, t)$ is the probability that the system is in state $x$ at time $t$,

• $a_k(x) dt$ is the probability to order $dt$ that reaction $k$ occurs in the time interval $[t, t + dt)$, and

• $\nu_k$ is the change in the state $x$ caused by a single occurrence of reaction $k$.
Here, we assume that the initial condition $P(x, t_0)$ is known.

The solution of equation (8.1) is computationally intractable for all but the simplest systems. Rather, Monte Carlo methods are employed to reconstruct the probability distribution and its moments (usually the mean and variance). Monte Carlo methods take advantage of the strong law of large numbers, which permits reconstruction of functions of the probability distribution $g(x)$ by drawing exact samples from this distribution, i.e.

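\begin{equation}
\bar{g}(x) \triangleq \int g(x) P(x, t) \, dx = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} g(x^i) \approx \frac{1}{N} \sum_{i=1}^{N} g(x^i) \quad \text{for } N \text{ sufficiently large} \tag{8.2}
\end{equation}

in which $\bar{g}(x)$ is the average value of $g(x)$, $N$ is the number of samples, and $x^i$ is the $i$th Monte Carlo reconstruction of $x$.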

One efficient method for generating exact trajectories from the master equation is Gillespie’s direct method [45, 46]. As noted previously, this particular simulation method is equivalent to the “interval of quiescence” technique proposed by Shah et al. [136]. This method was previously summarized in algorithm 1.
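For orientation, a generic sketch of the direct method and of the Monte Carlo mean in equation (8.2) is given below. It is written from the standard description of the direct method rather than copied from algorithm 1, and the test system (a reversible isomerization between two species with hypothetical rate constants 1 and 2) is chosen only for illustration.

```python
import math
import random


def direct_method(x0, nu, propensities, t_end, rng):
    """Simulate one trajectory up to t_end and return the final state."""
    t, x = 0.0, list(x0)
    while True:
        r = propensities(x)
        r_tot = sum(r)
        if r_tot <= 0.0:
            break                                   # no reaction can occur
        tau = -math.log(1.0 - rng.random()) / r_tot  # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        # Choose reaction k with probability r[k] / r_tot.
        target = rng.random() * r_tot
        cumulative = 0.0
        for k, r_k in enumerate(r):
            cumulative += r_k
            if target < cumulative:
                break
        x = [x_i + change for x_i, change in zip(x, nu[k])]
    return x


def isomerization_propensities(x):
    """Propensities for A -> B (rate 1) and B -> A (rate 2); illustrative values."""
    return [1.0 * x[0], 2.0 * x[1]]


rng = random.Random(0)
nu = [(-1, 1), (1, -1)]  # state change for each reaction
finals = [direct_method([100, 0], nu, isomerization_propensities, 5.0, rng)
          for _ in range(100)]
mean_a = sum(x[0] for x in finals) / len(finals)  # sample mean as in (8.2)
print(mean_a)
```

For this linear system the mean of A relaxes to 100 x 2/3, so the sample mean offers a simple sanity check on both the simulation and the averaging step.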

8.2.2 Extension of the Problem Scope

The previous problem formulation is quite restrictive from a modeling perspective. Firstly, many systems of interest are not solely gas phase. This restriction can be overcome by judicious modeling assumptions to ensure that neither thermodynamics nor conservation laws are violated. Secondly, the reaction propensities ($a_k$'s) often change between reaction events. For example, subjecting the system to a deterministic energy balance introduces time-varying reaction propensities into the system. In such cases the problem of interest is actually the following master equation subject to constraints:

\begin{align}
\frac{dP(x; t)}{dt} &= \sum_{k=1}^{m} \left[ a_k(x - \nu_k, y) P(x - \nu_k; t) - a_k(x, y) P(x; t) \right] \tag{8.3a} \\
\frac{dy(t)}{dt} &= b(P(x), y; t) \tag{8.3b}
\end{align}

To solve equation (8.3) exactly, we must revise algorithm 1 to account for the time dependence of the propensity functions, $a_k(x, y)$ [47]. Since $r_{tot}$ and $r_k$ are functions of time, they must be recalculated after determination of $\tau$ in order to choose which reaction occurs next. The major difficulty in this method is that in step 2 of algorithm 1, we must now satisfy the constraint

\begin{equation}
\int_{t}^{t + \tau} r_{tot}(t') \, dt' + \log(p_1) = 0 \tag{8.4}
\end{equation}

as opposed to a simple algebraic relation. This constraint often proves to be computationally expensive.
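To make the cost of constraint (8.4) concrete, the sketch below solves it by stepping the integral forward until it reaches $-\log(p_1)$. This is one simple numerical approach chosen for illustration, not the method of reference [47]; the function name, step size, and the exponentially decaying test propensity are assumptions, and $p_1$ is taken to lie strictly between 0 and 1.

```python
import math


def next_reaction_time(r_tot, t, p1, dt=1e-4, t_max=1e3):
    """Return tau such that the integral of r_tot over [t, t + tau] equals -log(p1)."""
    target = -math.log(p1)
    integral, tau = 0.0, 0.0
    while tau < t_max:
        # Midpoint rule on the sub-interval [t + tau, t + tau + dt].
        increment = r_tot(t + tau + 0.5 * dt) * dt
        if integral + increment >= target:
            # Linear interpolation inside the final sub-interval.
            return tau + dt * (target - integral) / increment
        integral += increment
        tau += dt
    return math.inf  # the propensity decayed before a reaction could fire


def decaying_propensity(time):
    """Hypothetical total propensity r_tot(t) = 2 exp(-t / 2)."""
    return 2.0 * math.exp(-0.5 * time)


tau_numeric = next_reaction_time(decaying_propensity, t=0.0, p1=0.5)
# For this propensity the constraint has the closed form 4 (1 - exp(-tau / 2)) = log(2).
tau_exact = -2.0 * math.log(1.0 - math.log(2.0) / 4.0)
print(tau_numeric, tau_exact)
```

Each reaction event here requires thousands of propensity evaluations, compared with the single division needed when the propensities are constant between events.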

If the reaction propensities do not change significantly over the stochastic time step $\tau$, the unmodified algorithm 1 can still provide an approximate solution. When the reaction propensities change significantly over $\tau$, steps can be taken to reduce the error of algorithm 1. One idea is to scale the stochastic time step $\tau$ by artificially introducing a probability of no reaction into the system [57]:

• Let $a_0 dt$ be the contrived probability, first order in $dt$, that no reaction occurs in the next time interval $dt$.

This probability does not affect the number of molecules of the modeled reactive system while allowing adjustment of the stochastic time step by changing the magnitude of $a_0$. Theoretically, as the magnitude of $a_0$ becomes infinite, the total reaction rate becomes infinite. As the total reaction rate approaches infinity, the error of the stochastic simulation subject to ODE constraints approaches zero because the algorithm checks whether or not a reaction occurs at every instant of time. Practically, the algorithm should first check the “no reaction propensity” at each iteration to prevent needless calculation of the entire range of actual reactions. Finally, we note that even though the method outlined by Gillespie is “exact” [47], there is still error associated with the finite number of simulations performed since it is a Monte Carlo method. Thus it is plausible that the inherent sampling error may be greater than the error introduced by our approximation. Hence our approximation may often prove to be less computationally expensive than the simulation by Gillespie [47] while generating an acceptable amount of simulation error. We summarize our approximation in algorithm 6.
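The steps of algorithm 6 can be sketched as follows. This is an illustrative reading of the algorithm, not a reference implementation: the constraint $dy/dt$ is advanced with a single explicit Euler step per stochastic time step, the "no reaction" event is stored as reaction 0, and the test problem (first-order decay whose rate depends on a relaxing variable $y$) together with all numerical values is hypothetical.

```python
import math
import random


def approximate_method(x0, y0, nu, propensities, dydt, a0, t_end, rng):
    """Stochastic simulation with a contrived no-reaction propensity a0."""
    t, x, y = 0.0, list(x0), y0
    while t < t_end:
        # Step 1: propensities, with r[0] the "no reaction" propensity.
        r = [a0] + propensities(x, y)
        r_tot = sum(r)
        # Step 2: two uniform random numbers; p1 lies in (0, 1].
        p1, p2 = 1.0 - rng.random(), rng.random()
        # Step 3: time step, then advance y (one Euler step; an assumption).
        tau = -math.log(p1) / r_tot
        y += dydt(x, y, t) * tau
        t += tau
        # Step 4: recalculate the propensities and choose the event j.
        r = [a0] + propensities(x, y)
        target, cumulative, j = p2 * sum(r), 0.0, 0
        for j, r_j in enumerate(r):
            cumulative += r_j
            if target < cumulative:
                break
        # Step 5: update the state; j == 0 means that no reaction occurred.
        if j > 0:
            x = [x_i + change for x_i, change in zip(x, nu[j - 1])]
    return t, x, y


rng = random.Random(1)
t_final, x_final, y_final = approximate_method(
    x0=[50],
    y0=2.0,
    nu=[(-1,)],                                      # decay removes one molecule
    propensities=lambda x, y: [0.1 * y * x[0]],      # rate depends on y
    dydt=lambda x, y, t: -0.5 * (y - 1.0),           # y relaxes toward 1
    a0=50.0,
    t_end=5.0,
    rng=rng,
)
print(t_final, x_final, y_final)
```

Raising `a0` shortens the time step and so tightens the Euler update of `y`, at the cost of more iterations in which nothing happens.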

8.2.3 Interpretation of the Simulation Output

Stochastic simulations of population balances involve two inherent and completely different distributions. First, each particle size $N_j$ has its own probability distribution $P(N_j, t)$ dictating the likelihood that the particle size contains a prescribed number of particles. Second, the population balance encompasses the entire distribution of these $N_j$'s. For the simulation results in this chapter, we perform multiple simulations given a specific initial condition. For each particle size at a given time, we then average over all simulations to obtain the expected number of particles for the given size, i.e.

\begin{equation}
\bar{N}_j(t) = \frac{1}{N} \sum_{i=1}^{N} N_j^i(t) \tag{8.5}
\end{equation}

Algorithm 6 Approximate Method (time-dependent reaction propensities).
Initialize. Set the time, $t$, equal to zero. Set $x$ and $y$ to $x_0$ and $y_0$, respectively.

1. Calculate:

(a) the reaction propensities, $r_k = a_k(x, y)$, and

(b) the total reaction propensity, $r_{tot} = \sum_{k=0}^{m} r_k$.

2. Generate two random numbers, $p_1$ and $p_2$, from the uniform distribution on $(0, 1)$.

3. Let $\tau = -\log(p_1)/r_{tot}$. Integrate $dy/dt = b(x, y; t)$ over the range $[t, t + \tau)$ to determine $y(t + \tau)$. Let $t \leftarrow t + \tau$.

4. Recalculate the reaction propensities $r_k$ and the total reaction propensity $r_{tot}$. Choose $j$ such that

\begin{equation*}
\sum_{k=0}^{j-1} r_k < p_2 r_{tot} \le \sum_{k=0}^{j} r_k
\end{equation*}

5. Let $x \leftarrow x + \nu_j$. Update $y$ if necessary. Go to 1.

Here, $\bar{N}_j(t)$ is clearly a scalar value. Finally, we tabulate all of these $\bar{N}_j(t)$'s, the expected numbers of particles, to yield a mean population balance. This procedure is illustrated in Figure 8.1.
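The sample-average-tabulate procedure amounts to a column-wise mean over the simulated realizations. A minimal sketch follows; the function name and the three small realizations are made up for illustration.

```python
def mean_population(realizations):
    """Average the particle counts of each size over all simulations.

    realizations[i][j] is the number of particles of size j in simulation i,
    all recorded at the same time t.
    """
    n_sims = len(realizations)
    n_sizes = len(realizations[0])
    return [sum(run[j] for run in realizations) / n_sims for j in range(n_sizes)]


# Three hypothetical realizations with three particle sizes each.
runs = [[3, 1, 0], [2, 2, 0], [4, 0, 1]]
mean_psd = mean_population(runs)
print(mean_psd)
```

Each entry of the result is a scalar average for one particle size, and the list as a whole is the tabulated mean population balance at that time.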

8.3 Crystallization Model Assumptions

Certain key assumptions ensure the validity of the stochastic problem formulation. These assumptions are:

1. The system of interest is a well-mixed, constant volume, batch crystallizer. The well-mixed assumption implies that the crystallizer temperature is homogeneous; that is, if any event creates a temperature change, the thermal energy is instantaneously distributed throughout the crystallizer.

2. Particles have discrete sizes and size changes occur in discrete increments. On an atomic level, this assumption is physically true since crystals are composed of a discrete number of molecules.

3. The degree of supersaturation acts as the thermodynamic driving force for crystallization. This assumption is necessary to account for the system thermodynamics. Otherwise we would need to employ molecular dynamics simulations using an appropriate model for the potential energy function to more accurately describe the time evolution of the population balance. The downside of that choice is that our problem of interest, the macroscopic behavior of the crystallizer, becomes computationally intractable.

[Figure 8.1: schematic. The stochastic realizations for each particle size $N_1, \ldots, N_j, \ldots, N_n$ (distributions) are sampled and averaged to give the stochastic averages (scalars), which are then tabulated to give the distribution of stochastic averages.]

Figure 8.1: Method for calculating the population balance from stochastic simulation. Each particle size $N_j$ has its own inherent probability distribution. Monte Carlo methods provide samples from these distributions, and the samples are averaged to yield the mean value. Tabulating the mean values yields the mean of the stochastic population balance.

The additional assumptions we use to simplify the solution of the population balance and reduce computational load are:

1. Physical properties for the heat capacity, liquid and crystal densities, and the heat of crystallization remain constant.

2. Nucleation, growth, and agglomeration rate constants are independent of temperature.

3. Crystal growth occurs in integer steps of a monomer unit.

4. The number of saturated monomers is an empirical function of temperature.


8.4 Stochastic Simulation of Batch Crystallization

To illustrate the solution of the population balance via stochastic simulation, we examine three examples:

1. isothermal nucleation and growth;

2. nonisothermal nucleation and growth; and

3. isothermal nucleation, growth, and agglomeration.

The mechanisms for each of these examples are size-independent. Also, we define the following nomenclature:

• $M_{tot}$, $M_{sat}$, and $M$ are the total number of monomers, number of saturated monomers, and number of supersaturated monomers, respectively, on a per volume basis. Hence:

\begin{equation}
M = M_{tot} - M_{sat} \tag{8.6}
\end{equation}

• $\Delta$ is the characteristic volume of one monomer unit.

• $N_n$ is the number of particles with size $l_n = \Delta(n + 1)$.

• $V$ is the system volume.

• $V^0_{mon}$ is the initial volume of monomer. For these examples, $V^0_{mon} = 800V$.

• $n^0_{mon}$ is the initial number of monomer particles, and is determined by the relation:

\begin{equation}
n^0_{mon} = \frac{V^0_{mon}}{\Delta} \tag{8.7}
\end{equation}

• $n^0_{seed}$ is the initial number of seed particles. For these examples, $n^0_{seed} = 10V$.
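As a worked instance of equation (8.7), consider $V = 1$ with $\Delta = 0.01$ and $\Delta = 0.1$, the settings quoted in the figure captions of this section:

```latex
\begin{align*}
\Delta = 0.01: \quad n^0_{mon} &= \frac{800 V}{\Delta} = \frac{800}{0.01} = 8 \times 10^{4}, \\
\Delta = 0.1: \quad n^0_{mon} &= \frac{800 V}{\Delta} = \frac{800}{0.1} = 8 \times 10^{3}, \\
n^0_{seed} &= 10 V = 10 \quad \text{in both cases.}
\end{align*}
```

A tenfold reduction in $\Delta$ therefore produces a tenfold increase in the number of monomer units that the simulation must track.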

8.4.1 Isothermal Nucleation and Growth

Consider the isothermal reaction system with second-order nucleation and growth and a uniformly incremented volume scale $\Delta = l_i - l_{i-1}$:

\begin{align}
2M &\xrightarrow{k_n} N_1 \tag{8.8a} \\
N_n + M &\xrightarrow{k_g} N_{n+1} \tag{8.8b}
\end{align}

The model parameters are given in Table 8.1. We have chosen to model the crystallization mechanism using a volume scale in order to conserve mass (recall the constant crystal density assumption). In accord with this choice, the initial number of monomers is computed based on the assigned value of $\Delta$. Finally, we quadratically distribute the seeds over the particle volume interval $l \in [2, 2.5]$.

Parameter                       Symbol      Value
nucleation rate constant        $k_n$       $3.125 \times 10^{-9}$
growth rate constant            $k_g$       $2.5 \times 10^{-4}$
number of saturated monomers    $M_{sat}$   0

Table 8.1: Nucleation and growth parameters for an isothermal batch crystallizer

Figure 8.2: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, 1 simulation, characteristic particle size $\Delta = 0.01$, system volume $V = 1$

Since $k_g$ is a constant, size-independent growth exhibits the same kinetics as the second-order reaction:

\begin{equation}
A + M \xrightarrow{k_g} B \tag{8.9}
\end{equation}

Here the number of species A molecules is equivalent to the zeroth moment of the particle distribution $N$. We can reduce computational expense by using reaction (8.9) to calculate the total reaction propensity ($r_{tot}$) in algorithm 1, then only calculating reaction propensities as needed to determine the next reaction.
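The saving can be sketched as follows. The code assumes that the growth propensity of size class $n$ has the second-order form $k_g M N_n$ (an assumption about the propensity expression, which is not spelled out in this section), so that the total growth propensity needs only the zeroth moment. The size class that actually grows is then drawn only when a growth event fires. All names and numbers are illustrative.

```python
import random


def growth_propensities(M, N, kg):
    """Per-size growth propensities, assumed to be kg * M * N[n]."""
    return [kg * M * count for count in N]


def lumped_growth_propensity(M, N, kg):
    """Total growth propensity from the zeroth moment, as in reaction (8.9)."""
    return kg * M * sum(N)


def pick_growth_class(N, rng):
    """Given that growth fires, choose class n with probability N[n] / sum(N)."""
    target = rng.random() * sum(N)
    cumulative = 0
    for n, count in enumerate(N):
        cumulative += count
        if target < cumulative:
            return n
    raise ValueError("no particles available for growth")


M = 1000                 # supersaturated monomers (illustrative)
N = [5, 0, 3, 2]         # particles in each size class (illustrative)
kg = 2.5e-4              # growth rate constant from Table 8.1
lumped = lumped_growth_propensity(M, N, kg)
detailed = sum(growth_propensities(M, N, kg))
rng = random.Random(0)
picks = [pick_growth_class(N, rng) for _ in range(200)]
print(lumped, detailed)
```

In a full simulation the zeroth moment would be carried as a running total and updated at each nucleation event, so the lumped propensity costs a constant amount of work regardless of the number of size classes.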

Simulation Results

The stochastic simulation contains two parameters, the simulation volume $V$ and the characteristic particle size $\Delta$, that do not exist in deterministic population balances. In deterministic population balances, the simulation volume is specified by the volume of the modeled crystallizer. In general, stochastic techniques cannot simulate the entire system volume due to excessive computational expense. To overcome this difficulty, we invoke the well-mixed assumption, choose a volume that accurately represents the system, and average the results of multiple simulations given this volume.² Care must be taken to ensure that the results are generated from a sufficient number of simulations. For example, consider the case in which $\Delta = 0.01$ and $V = 1$. For one simulation, Figure 8.2 shows that each particle size is sparsely populated, making discrete transitions between states clearly observable. Averaging over one hundred simulations, Figure 8.3 demonstrates that the particle sizes are more densely populated, thus credibly reproducing the average system behavior.

Figure 8.3: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size $\Delta = 0.01$, system volume $V = 1$

[Figure 8.4: average time for one simulation (sec) plotted against characteristic particle size, both on logarithmic axes.]

Figure 8.4: Average stochastic simulation time based on 10 simulations and $V = 1$

[Figure 8.5: surface plot of the number of crystals against time and volume.]

Figure 8.5: Mean of the stochastic solution for an isothermal crystallization with nucleation and growth, average of 100 simulations, characteristic particle size $\Delta = 0.1$, system volume $V = 1$

Varying the characteristic particle size ∆ varies the initial number of monomer units. As ∆ decreases, the initial number of monomer units increases. Since the computational expense scales with the number of reactant molecules, this expense increases as ∆ decreases. Figure 8.4 illustrates this point by examining the average computational expense for ten simulations as a function of ∆. In addition, the dispersion among particle sizes associated with the stochastic simulation becomes less pronounced as ∆ decreases. The effects of manipulating ∆ are illustrated in Figures 8.3 and 8.5.

² Rate constants for reactions of order greater than one are volume dependent in the stochastic simulation because reactions are molecular events.

Derivation of the Macroscopic Population Balance as the Limit of the Master Equation

The results of the stochastic simulations suggest that, under appropriate conditions, the deterministic population balance arises from the master equation system representation. We now prove this assertion.

The discrete master equation is of the form given in equation (8.1). Define the characteristic size of the system to be Ω, and use this size to recast the master equation (8.1) in terms of intensive variables (let z ← x/Ω). Performing a Kramers-Moyal expansion on this master equation results in a system size expansion in Ω. In the limit as x and Ω become large, the discrete master equation can be approximated by its first two differential moments. This approximation is the continuous Fokker-Planck equation [41]:

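$$\frac{\partial P(z;t)}{\partial t} = -\sum_{i=1}^{l} \frac{\partial}{\partial z_i}\left(A_i(z)P(z;t)\right) + \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \frac{\partial^2}{\partial z_i \partial z_j}\left(\left[B(z)^2\right]_{ij} P(z;t)\right) \tag{8.10a}$$

$$A(z) = \sum_{k=1}^{m} \nu_k a_k(z) \tag{8.10b}$$

$$B(z)^2 = \sum_{k=1}^{m} \nu_k \nu_k^T a_k(z) \tag{8.10c}$$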

Equation (8.10) has an Itô solution of the form:

$$dz_i = A_i(z)\,dt + \sum_{j=1}^{l} B_{ij}(z)\,dW_j \tag{8.11}$$

in which W is a vector of Wiener processes. The Fokker-Planck equation (8.10) specifies the distribution of the stochastic process, whereas the stochastic differential equation (8.11) specifies how the trajectories of the state evolve.
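As an illustration of how trajectories of a stochastic differential equation such as (8.11) can be generated, the following is a minimal Euler-Maruyama sketch. It rests on several assumptions that are not part of the text: the noise term is written in the per-reaction form ν_k sqrt(a_k) dW_k (one factorization whose covariance is Σ ν_k ν_k^T a_k, matching (8.10c)), the system size Ω is made explicit as a factor Ω^(-1/2) on the noise, and the birth-death mechanism, rate constants, step size, and function names are hypothetical.

```python
import math
import random

def euler_maruyama(z0, stoich, propensities, omega, dt, n_steps, rng):
    """Integrate dz = sum_k nu_k a_k(z) dt + omega**-0.5 * sum_k nu_k sqrt(a_k(z)) dW_k."""
    z = list(z0)
    for _ in range(n_steps):
        # Clip small negative rates that noise can produce near z = 0.
        a = [max(ak(z), 0.0) for ak in propensities]
        dz = [0.0] * len(z)
        for nu_k, a_k in zip(stoich, a):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment for reaction k
            for i, nu_ik in enumerate(nu_k):
                dz[i] += nu_ik * (a_k * dt + math.sqrt(a_k / omega) * dw)
        z = [zi + dzi for zi, dzi in zip(z, dz)]
    return z

# Hypothetical birth-death example: 0 -> A (rate k1), A -> 0 (rate k2 * z).
k1, k2 = 2.0, 1.0
stoich = [[1.0], [-1.0]]
propensities = [lambda z: k1, lambda z: k2 * z[0]]

def final_state(omega, seed=0):
    rng = random.Random(seed)
    return euler_maruyama([0.0], stoich, propensities, omega, 1e-3, 5000, rng)[0]

z_det = (k1 / k2) * (1.0 - math.exp(-k2 * 5.0))  # solution of dz/dt = k1 - k2*z at t = 5
z_small = final_state(omega=1e2)   # noticeable fluctuations
z_large = final_state(omega=1e8)   # close to the deterministic limit
print(z_det, z_small, z_large)
```

For large Ω the noise contribution becomes negligible and the trajectory approaches the solution of the drift equation alone, which is the behavior the thermodynamic limit describes.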

By taking the thermodynamic limit (x → ∞, Ω → ∞, z = x/Ω = finite), equation (8.11) approaches the deterministic limit [76]:

$$\frac{dz_i}{dt} = A_i(z) \tag{8.12}$$

The deterministic limit implies that the probability P(z; t) collapses to a delta function. Now consider the two densities N(l_i, t) and f(l, t), representing the discrete and continuous population balances, respectively. These densities are functions of the characteristic particle size l and the time t. N(l_i, t) has units of number of crystals per volume, and f(l, t) has units of number of crystals per volume per characteristic particle size. Define the system volume, V, as the extensive characteristic size of the system, Ω. For the kinetic mechanism (8.8), equation (8.12) defines the discrete population balance accordingly:

$$\frac{dM^{\mathrm{tot}}}{dt} = -k_n M^2 - \sum_{i=1}^{\infty} k_g M N(l_i,t) \tag{8.13a}$$

$$\frac{dN(l_1,t)}{dt} = \frac{1}{2}k_n M^2 - k_g M N(l_1,t) \tag{8.13b}$$

$$\frac{dN(l_i,t)}{dt} = k_g M\left[N(l_{i-1},t) - N(l_i,t)\right], \quad i = 2,\ldots,\infty \tag{8.13c}$$

For small ∆ and a ≥ 1, it is apparent that the following equality should hold:

$$N(l_a,t) = \int_{l_a-\frac{\Delta}{2}}^{l_a+\frac{\Delta}{2}} f(l,t)\,dl \tag{8.14a}$$

$$l_a = (a+1)\Delta \tag{8.14b}$$

Differentiating equation (8.14a) with respect to time yields:

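$$\frac{dN(l_a,t)}{dt} = \frac{d}{dt}\int_{l_a-\frac{\Delta}{2}}^{l_a+\frac{\Delta}{2}} f(l,t)\,dl = \int_{l_a-\frac{\Delta}{2}}^{l_a+\frac{\Delta}{2}} \frac{\partial f(l,t)}{\partial t}\,dl \tag{8.15}$$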

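For a > 1, substitute the definition given by (8.13c) into equation (8.15):

$$k_g M\left[N(l_a-\Delta,t) - N(l_a,t)\right] = \int_{l_a-\frac{\Delta}{2}}^{l_a+\frac{\Delta}{2}} \frac{\partial f(l,t)}{\partial t}\,dl \tag{8.16}$$

Rewriting the left hand side in terms of an integral over the particle size l and regrouping yields:

$$\int_{l_a-\frac{\Delta}{2}}^{l_a+\frac{\Delta}{2}} \left\{\frac{\partial f(l,t)}{\partial t} + k_g M\left[f(l,t) - f(l-\Delta,t)\right]\right\} dl = 0 \tag{8.17}$$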

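Since the bounds on the integral of equation (8.17) are arbitrary, i.e., they hold for any a such that a > 1, one solution is to set the integrand to zero:

$$\frac{\partial f(l,t)}{\partial t} + k_g M\left[f(l,t) - f(l-\Delta,t)\right] = 0 \tag{8.18}$$

McCoy [92] suggests considering a Taylor series expansion to determine the difference f(l, t) − f(l − ∆, t):

$$f(l-\Delta,t) = f(l,t) + \frac{\partial f(l,t)}{\partial l}\left[(l-\Delta)-l\right] + \frac{1}{2!}\frac{\partial^2 f(l,t)}{\partial l^2}\left[(l-\Delta)-l\right]^2 + \ldots \tag{8.19a}$$

$$\phantom{f(l-\Delta,t)} = f(l,t) - \Delta\frac{\partial f(l,t)}{\partial l} + \frac{\Delta^2}{2}\frac{\partial^2 f(l,t)}{\partial l^2} + \ldots \tag{8.19b}$$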

Hence the desired difference is:

$$f(l,t) - f(l-\Delta,t) = \Delta\frac{\partial f(l,t)}{\partial l} - \frac{\Delta^2}{2}\frac{\partial^2 f(l,t)}{\partial l^2} + \ldots \tag{8.20}$$
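The size of the terms neglected when the expansion for f(l, t) − f(l − ∆, t) is truncated can be checked numerically. The sketch below is only an illustration: the Gaussian-shaped density f, the evaluation point l = 2.2, and the function names are hypothetical choices, not quantities from this chapter. It compares the exact difference with the one-term and two-term truncations for several values of ∆.

```python
import math

def f(l):
    """Hypothetical smooth size density used only for this check."""
    return math.exp(-((l - 2.0) ** 2) / 0.1)

def df(l):
    """First derivative of f with respect to l."""
    return -2.0 * (l - 2.0) / 0.1 * f(l)

def d2f(l):
    """Second derivative of f with respect to l."""
    return (4.0 * (l - 2.0) ** 2 / 0.01 - 2.0 / 0.1) * f(l)

def truncation_errors(l, delta):
    """Absolute errors of the one-term and two-term truncations of f(l) - f(l - delta)."""
    exact = f(l) - f(l - delta)
    first = delta * df(l)
    second = first - 0.5 * delta ** 2 * d2f(l)
    return abs(exact - first), abs(exact - second)

for delta in (0.1, 0.01, 0.001):
    e1, e2 = truncation_errors(2.2, delta)
    print(f"delta={delta:g}  one-term error={e1:.3e}  two-term error={e2:.3e}")
```

The one-term error shrinks roughly as ∆², so it is the leading candidate for a correction whenever ∆ is not small compared with the scale over which f varies.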

For sufficiently small ∆, the first-derivative term of equation (8.20) adequately approximates this difference:

$$\frac{\partial f(l,t)}{\partial t} = -k_g M\left[f(l,t) - f(l-\Delta,t)\right] \tag{8.21a}$$

$$\phantom{\frac{\partial f(l,t)}{\partial t}} \approx -k_g' M \frac{\partial f(l,t)}{\partial l} \tag{8.21b}$$

where k′g = ∆kg. Equation (8.21b) is the corresponding macroscopic population balance equation for well-mixed systems with only nucleation and growth, and is defined over the range 0 ≤ l < ∞. The boundary condition for equation (8.21b) at l = ∞ is:

f(∞, t) = 0 (8.22)

The other boundary condition, f(0, t), can be determined by examining the zeroth moment (µ0) of equation (8.21b) and noting that only nucleation influences the number of particles:

$$\int_0^\infty \frac{\partial f(l,t)}{\partial t}\,dl = \int_0^\infty -k_g' M \frac{\partial f(l,t)}{\partial l}\,dl \tag{8.23a}$$

$$\frac{d\mu_0}{dt} = -k_g' M\left(f(\infty,t) - f(0,t)\right) \tag{8.23b}$$

$$\frac{1}{2}k_n M^2 = k_g' M f(0,t) \tag{8.23c}$$

$$f(0,t) = \frac{k_n M}{2k_g'} \tag{8.23d}$$

Finally, conservation of monomer dictates:

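$$\frac{dM^{\mathrm{tot}}}{dt} = -k_n M^2 - \sum_{i=1}^{\infty} k_g M N(l_i,t) \tag{8.24a}$$

$$\phantom{\frac{dM^{\mathrm{tot}}}{dt}} \approx -k_n M^2 - k_g M\int_0^\infty f(l,t)\,dl \tag{8.24b}$$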

In summary, in the thermodynamic limit and as ∆ becomes small, the stochastic formulation yields the following deterministic formulation:

$$\frac{\partial f(l,t)}{\partial t} = -k_g' M \frac{\partial f(l,t)}{\partial l} \tag{8.25a}$$

$$\frac{dM^{\mathrm{tot}}}{dt} = -k_n M^2 - k_g M\int_0^\infty f(l,t)\,dl \tag{8.25b}$$

$$f(0,t) = \frac{k_n M}{2k_g'} \tag{8.25c}$$

Figure 8.6: Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1

Using these results, we solve the deterministic population balance for ∆ = 0.01 using orthogonal collocation on finite elements [121, 127]. Figure 8.6 presents the resulting population balance discretized to ∆. Note that in comparison to the mean of the stochastic solution, i.e. Figure 8.3, the deterministic solution displays no dispersion in either the seed or nucleated particle distributions. This result indicates that the simulated characteristic particle size, ∆ = 0.01, is large enough to merit including higher order terms of the f(l, t) − f(l − ∆, t) expansion. The next correction is the “diffusivity” term commonly used to model growth rate dispersion. The corresponding formulation for this model is:

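$$\frac{\partial f(l,t)}{\partial t} = -k_g' M\left(\frac{\partial f(l,t)}{\partial l} - \frac{\Delta}{2}\frac{\partial^2 f(l,t)}{\partial l^2}\right) \tag{8.26a}$$

$$\frac{dM^{\mathrm{tot}}}{dt} = -k_n M^2 - k_g M\int_0^\infty f(l,t)\,dl \tag{8.26b}$$

$$f(0,t) = \frac{k_n M}{2k_g'} + \frac{\Delta}{2}\left.\frac{\partial f(l,t)}{\partial l}\right|_{l=0} \tag{8.26c}$$

$$0 = \left.\frac{\partial f(l,t)}{\partial l}\right|_{l=\infty} \tag{8.26d}$$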

Figure 8.7 presents this population balance discretized to ∆. Comparison of this result to Figure 8.3, the mean of the stochastic solution, demonstrates excellent agreement between the two distributions. In contrast to prior modeling efforts (e.g. [111]), however, the “diffusivity” term is a function of the growth rate, not a constant. Hence when the growth rate is zero, growth rate dispersion ceases.
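The link between the discrete balance and the “diffusivity” term can be checked numerically in a simplified setting. The sketch below makes assumptions that go beyond the text: the growth rate k_g M is held constant (no nucleation or monomer depletion), all particles start in a single size class, and the numerical values and function names are hypothetical. It integrates the growth equations of the form (8.13c) with a classical fourth-order Runge-Kutta scheme and compares the spread of the resulting distribution with 2Dt, where D = k′g M ∆/2 is the coefficient multiplying the second derivative in (8.26a).

```python
def growth_moments(rate, t_end, n_classes=150, dt=2e-3):
    """Integrate dN_i/dt = rate*(N_{i-1} - N_i) with RK4, starting from one seed class."""
    def rhs(n):
        out = [-rate * n[0]]  # first class only loses particles (no nucleation here)
        out += [rate * (n[i - 1] - n[i]) for i in range(1, len(n))]
        return out

    n = [0.0] * n_classes
    n[0] = 1.0  # all particles start in the first size class
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(n)
        k2 = rhs([x + 0.5 * dt * k for x, k in zip(n, k1)])
        k3 = rhs([x + 0.5 * dt * k for x, k in zip(n, k2)])
        k4 = rhs([x + dt * k for x, k in zip(n, k3)])
        n = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(n, k1, k2, k3, k4)]
    total = sum(n)
    mean = sum(i * x for i, x in enumerate(n)) / total
    var = sum((i - mean) ** 2 * x for i, x in enumerate(n)) / total
    return mean, var  # mean and variance of the size-class index

rate, t_end, delta = 50.0, 1.0, 0.01          # hypothetical k_g*M, final time, particle size
mean_idx, var_idx = growth_moments(rate, t_end)
mean_shift = delta * mean_idx                 # compare with k_g' * M * t = delta * rate * t
var_size = delta ** 2 * var_idx               # compare with 2 * D * t
diffusivity = 0.5 * delta * (delta * rate)    # D = k_g' * M * delta / 2
print(mean_shift, var_size, 2.0 * diffusivity * t_end)
```

Under these assumptions the variance in size grows linearly in time at a rate proportional to the growth rate, so setting the growth rate to zero halts the spreading, in line with the observation above.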

Figure 8.7: Deterministic solution by orthogonal collocation for isothermal crystallization with nucleation and growth, inclusion of the diffusivity term, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1

The key differences between the stochastic and deterministic population balances are somewhat subtle and deserve further attention. First, the stochastic population balance has discrete particle sizes containing an integer number of particles. The deterministic population balance, on the other hand, has continuous particle sizes, and integration over a range of particle sizes yields a real number of particles contained within this range. Second, the number of particles contained in each size class of the stochastic population balance is governed by an individual probability distribution; hence different simulations may yield different numbers of particles in a particular size class at the same time even if the initial condition is identical. Only in the large number (thermodynamic) limit do these probability distributions collapse to delta functions (single values) for the concentration of particles in a given size class. In the deterministic population balance, simulating a given initial condition multiple times yields the same number of particles over a given size range at the same simulation time.

We note that Ramkrishna [106] provides a related but distinct perspective on the connection between the stochastic and deterministic population balances. In his work, Ramkrishna considers continuous particle size classes, and demonstrates that the deterministic population balance can be obtained by averaging the governing master equation. Our derivation considers discrete particle sizes and derives the deterministic population balance as the large number (thermodynamic) limit of the governing master equation. We shy away from averaging because of literature examples demonstrating that this equivalence does not always hold in the small molecule limit [143].

8.4.2 Nonisothermal Nucleation and Growth

In this example, we are interested in modeling a nonisothermal crystallizer whose temperature is regulated by a cooling jacket. We consider the reaction system:

$$2M \xrightarrow{k_n} N_1 \qquad \Delta H^{n}_{\mathrm{rxn}} \tag{8.27a}$$

$$N_n + M \xrightarrow{k_g} N_{n+1} \qquad \Delta H^{g}_{\mathrm{rxn}} \tag{8.27b}$$

For the deterministic case, the energy balance should satisfy the following equation:

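$$\frac{dT}{dt} = \frac{UA}{\rho C_p V}(T_j - T) - \frac{\Delta H^{n}_{\mathrm{rxn}}}{\rho C_p}\left(\frac{1}{2}k_n M^2\right) - \frac{\Delta H^{g}_{\mathrm{rxn}}}{\rho C_p}\left(k_g M\int_0^\infty f(l,t)\,dl\right) \tag{8.28}$$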

Stochastically, we differentiate between enthalpy changes due to interaction with the cooling jacket and enthalpy changes due to nucleation and growth reactions. We treat enthalpy changes due to reactions stochastically in that they instantaneously release a specified heat of reaction upon completion. On the other hand, we treat enthalpy changes due to interaction with the cooling jacket continuously, giving rise to a deterministic enthalpy loss expression. This treatment of the energy balance with stochastic and deterministic contributions is discussed further by Vlachos [156]. Hence our simulation plan is as follows:

1. Upon completion of a reaction event, update the temperature due to the enthalpy of reaction.

2. Between reaction events, update the temperature using the following equation:

$$\frac{dT}{dt} = \frac{UA}{\rho C_p V}(T_j - T) \tag{8.29}$$
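The two-part temperature update in the plan above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the reported results: the jacket temperature is held constant over the interval between events (so equation (8.29) has a closed-form solution), one reaction event is assumed to change the temperature by −∆H_rxn/(ρCpV), and the function names, event time, and jacket temperature are hypothetical. The remaining parameter values are those of Table 8.2.

```python
import math

def cool_between_events(T, T_jacket, dt, UA, rho_cp, V):
    """Exact solution of dT/dt = UA/(rho_cp*V)*(T_jacket - T) over dt, jacket held constant."""
    return T_jacket + (T - T_jacket) * math.exp(-UA * dt / (rho_cp * V))

def apply_reaction_heat(T, dH_rxn, rho_cp, V):
    """Instantaneous jump after one reaction event (assumed to equal -dH_rxn/(rho_cp*V))."""
    return T - dH_rxn / (rho_cp * V)

# UA, rho*Cp, V, dH, and the initial temperature follow Table 8.2.
# The time to the next event and the jacket temperature are hypothetical.
UA, rho_cp, V, dH = 5.0, 1.0, 1.0, -0.01
T = 39.896
T = cool_between_events(T, T_jacket=39.0, dt=0.02, UA=UA, rho_cp=rho_cp, V=V)
T = apply_reaction_heat(T, dH, rho_cp, V)   # exothermic event raises the temperature
print(T)
```

When the jacket temperature varies appreciably between events, the closed-form step would need to be replaced by a numerical integration of (8.29).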

Since the monomer saturation, Msat, is a function of temperature, the monomer supersaturation, M, is also a function of temperature and we must apply an algorithm that accounts for time-dependent reaction propensities. We quadratically distribute the seeds over the crystal volume interval [2, 2.5]. The cooling temperature profile for the jacket (Tj) follows an exponentially decreasing trajectory. The solubility relationship for the number of monomers is given by:

$$\log_{10} M^{\mathrm{sat}} = 2.25\log_{10} T + \frac{0.04}{T} + 1.3 \tag{8.30}$$

The model parameters are given in Table 8.2.

| Parameter | Symbol | Value |
|---|---|---|
| characteristic particle volume | ∆ | 0.01 |
| initial crystallizer temperature | To | 39.896 |
| initial cooling jacket temperature | Tj,o | 39.896 |
| crystallizer heat transfer coefficient × area | UA | 5 |
| solution density × heat capacity | ρCp | 1 |
| simulation system volume | V | 1 |
| nucleation and growth heats of reaction | ∆H^n_rxn = ∆H^g_rxn | −0.01 |

Table 8.2: Nonisothermal nucleation and growth parameters for a batch crystallizer

The results for the mean of the exact stochastic simulation are presented in Figures 8.8 through 8.10. Figure 8.11 presents the result for the mean of the approximate stochastic simulation with propensity of no reaction a0 = 10. The discretized solution of the deterministic population balance including the diffusivity term is presented in Figure 8.12. These figures demonstrate agreement between the mean of the exact stochastic solution, the mean of the approximate stochastic solution, and the deterministic solution.

[Plot: number of supersaturated molecules and number of monomer molecules versus time]

Figure 8.8: Total and supersaturated monomer profiles for nonisothermal crystallization

Figures 8.13 and 8.14 compare the zeroth and first moments of the approximate stochastic simulation to the exact stochastic simulation. Here, we define the jth moment of the stochastic simulation µj as

$$\mu_j = \sum_{x} x^j N_x \tag{8.31}$$

in which Nx is the average number of particles in the xth size class. Varying the value of the propensity of no reaction, a0, controls the stochastic time step in the approximate stochastic solution. For this simulation, the value of a0 = 0.1 is clearly too small to account for the time-varying reaction propensities as evidenced by the poor initial reconstruction of the moments. However, as the value of a0 increases, the resulting population balances tend towards the exact stochastic solution. Although accuracy increases as a0 increases, computational expense increases as well. Hence the value of a0 must be carefully selected to balance the two. Also, our implementation of the exact stochastic simulation employed an ODE solver with a stopping criterion to account for the time-varying reaction propensities, whereas the approximate solution did not require an ODE solver. As a result, the exact solution was two orders of magnitude slower than the approximate solution.

[Plot: jacket and crystallizer temperature versus time]

Figure 8.9: Crystallizer and cooling jacket temperature profiles

Figure 8.10: Mean of the exact stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

Figure 8.11: Mean of the approximate stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1, propensity of no reaction a0 = 10

8.4.3 Isothermal Nucleation, Growth, and Agglomeration

We examine the same reactions as in mechanism (8.8), but now consider particle agglomeration as well:

$$2M \xrightarrow{k_n} N_1 \tag{8.32a}$$

$$N_n + M \xrightarrow{k_g} N_{n+1} \tag{8.32b}$$

$$N_p + N_q \xrightarrow{k_a} N_{p+q} \tag{8.32c}$$

The model parameters are given in Table 8.3.

| Parameter | Symbol | Value |
|---|---|---|
| agglomeration rate constant | ka | 2.5 × 10^−4 |
| simulation system volume | V | 1 |
| characteristic particle volume | ∆ | 0.01 |
| number of saturated monomers | Msat | 0 |

Table 8.3: Nucleation, growth, and agglomeration parameters for an isothermal, batch crystallizer

Figure 8.12: Deterministic solution by orthogonal collocation for nonisothermal crystallization with nucleation and growth, inclusion of the diffusivity term, results discretized to a characteristic particle size ∆ = 0.01, system volume V = 1

[Plot: percent error versus time for a0 = 0.1, a0 = 10, and a0 = 100]

Figure 8.13: Zeroth moment comparisons, mean of the stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

[Plot: percent error versus time for a0 = 0.1, a0 = 10, and a0 = 100]

Figure 8.14: First moment comparisons, mean of the stochastic solution for nonisothermal crystallization with nucleation and growth, average of 500 simulations, characteristic particle size ∆ = 0.01, system volume V = 1

For size-independent agglomeration, ka is a constant. To make the simulation efficient, we note that this type of agglomeration exhibits the same kinetics as the second-order reaction:

$$2A \xrightarrow{k_a} C \tag{8.33}$$

Again, the number of species A molecules is equivalent to the zeroth moment of the particle distribution N, so we can use reaction (8.33) to calculate the propensity of all agglomeration events occurring. In steps 1 and 4 of algorithm 1, we use this value in calculation of the total reaction rate. Next, we determine which type of reaction occurs (nucleation, growth, or agglomeration), then which specific event occurs, again calculating reaction propensities only as needed.
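The two-stage selection described above (reaction type first, specific event second) can be sketched as follows. The sketch makes several assumptions that are not taken from the text: rate constants are treated as already scaled for the simulation volume, the combinatorial propensity forms M(M − 1)/2 and µ0(µ0 − 1)/2 are used for the second-order steps, crystals are stored as a plain list of size-class indices, and all function names are hypothetical.

```python
import random

def group_propensities(M, particles, kn, kg, ka):
    """Lumped propensities; rate constants are assumed already scaled for the volume."""
    mu0 = len(particles)                         # zeroth moment = number of crystals
    nucleation = kn * M * (M - 1) / 2.0          # 2M -> N_1
    growth = kg * M * mu0                        # size-independent growth of any crystal
    agglomeration = ka * mu0 * (mu0 - 1) / 2.0   # same count as 2A -> C
    return nucleation, growth, agglomeration

def fire_event(M, particles, kn, kg, ka, rng):
    """Pick the reaction type, then the specific crystal(s); returns (type, new M)."""
    r_nuc, r_gro, r_agg = group_propensities(M, particles, kn, kg, ka)
    r_tot = r_nuc + r_gro + r_agg
    if r_tot <= 0.0:
        raise ValueError("no reaction can occur")
    u = rng.random() * r_tot
    if u < r_nuc:
        particles.append(1)                      # new crystal enters size class 1
        return "nucleation", M - 2
    if u < r_nuc + r_gro:
        i = rng.randrange(len(particles))        # every crystal is equally likely
        particles[i] += 1
        return "growth", M - 1
    i, j = rng.sample(range(len(particles)), 2)  # two distinct crystals
    merged = particles[i] + particles[j]         # N_p + N_q -> N_{p+q}
    for idx in sorted((i, j), reverse=True):
        particles.pop(idx)
    particles.append(merged)
    return "agglomeration", M

crystals = [1, 2, 3, 4]
print(group_propensities(10, crystals, 0.0, 0.0, 0.5))
print(fire_event(10, crystals, 0.0, 0.0, 0.5, random.Random(1)), crystals)
```

Because the lumped propensities depend only on M and the number of crystals, no per-pair propensity has to be evaluated before the reaction type is known.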

Figure 8.15: Mean of the stochastic solution for an isothermal crystallization with nucleation, growth, and agglomeration; average of 500 simulations; characteristic particle size ∆ = 0.01; system volume V = 1

The results of this simulation are presented in Figure 8.15. In contrast to Figure 8.3, the equivalent reaction system without agglomeration, we see that agglomeration increases the observed particle dispersion phenomenon.

8.5 Parameter Estimation With Stochastic Models

The goal of parameter estimation is to determine the set of parameters that best reconciles the experimental measurements with model predictions. The classical approach is to assume that measurements are corrupted by normally distributed noise. Accordingly, we calculate the optimal parameters via the least squares optimization

$$\min_{\theta}\; \Phi = \frac{1}{2}\sum_k e_k^T R e_k \tag{8.34a}$$

$$\text{s.t.:}\quad n_{k+1} = F(n_k,\theta) \tag{8.34b}$$

$$e_k = y_k - h(n_k) \tag{8.34c}$$

in which the ek's denote the differences between the measurements yk and the model predictions h(nk).³ In general, most experiments do not include many replicates due to cost and time constraints. Therefore, the best experimental data we are likely to obtain is in the form of moments of the master equation, i.e. equation (8.2). Clearly the master equation (8.1) demonstrates that these moments are twice continuously differentiable, so standard nonlinear optimization algorithms apply to fitting these moments to data.

³ We assume that the measurement residuals ek are normally distributed with zero mean and covariance R^−1.

In fitting data to stochastic models governed by the master equation, we choose the mean x as the state of interest. Monte Carlo simulation provides an estimate of this mean, albeit with some error due to the finite number of simulations (the finite simulation error). In the following subsections, we present a trust-region optimization method, discuss the calculation of finite difference sensitivities, and provide an example of estimating parameters for the nucleation, growth, and agglomeration mechanism of section 8.4.3.

8.5.1 Trust-Region Optimization

We perform optimization (8.34) using a trust-region method employing a Gauss-Newton approximation of the Hessian. This method has provable convergence to stationary points (i.e. ∇θΦ → 0) [97]. Algorithm 7 presents the basic steps of this method. Evaluation of the objective function is relatively expensive since it requires integrating the stochastic model. Therefore, we choose to accept all parameter changes that reduce the value of the objective function and solve the trust-region subproblem exactly using a quadratic programming solver. Also, we scale the optimized parameters using a log10 transformation.

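The trust-region subproblem requires knowledge of both the gradient and the Hessian. We can numerically evaluate both of these quantities

$$\nabla_\theta \Phi = \frac{\partial}{\partial \theta^T}\,\frac{1}{2}\sum_k e_k^T R e_k \tag{8.37}$$

$$= -\sum_k \left(\frac{\partial h(n_k)}{\partial n_k^T}\frac{\partial n_k}{\partial \theta^T}\right)^T R e_k \tag{8.38}$$

$$= -\sum_k \left(\frac{\partial h(n_k)}{\partial n_k^T} S_k\right)^T R e_k \tag{8.39}$$

$$\nabla_{\theta\theta} \Phi \approx \sum_k \left(\frac{\partial h(n_k)}{\partial n_k^T} S_k\right)^T R\, \frac{\partial h(n_k)}{\partial n_k^T} S_k \tag{8.40}$$

which indicates dependence upon Sk, the sensitivity of the state with respect to the parameters.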
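To make the assembly of these quantities concrete, the sketch below accumulates the gradient and a Gauss-Newton Hessian from per-measurement residuals and Jacobians. It is a minimal pure-Python illustration: the function name and the nested-list data layout are hypothetical, J_k stands for the product (∂h(n_k)/∂n_k^T) S_k, and the Hessian is formed as the positive semidefinite sum of J_k^T R J_k, which is the standard Gauss-Newton form for a least squares objective.

```python
def gradient_and_hessian(residuals, jacobians, R):
    """Return grad = -sum_k J_k^T R e_k and the Gauss-Newton Hessian sum_k J_k^T R J_k.

    residuals: list of e_k (each a list of length ny)
    jacobians: list of J_k = dh/dn * S_k (each ny rows by n_theta columns)
    R:         ny-by-ny inverse covariance matrix of the measurement noise
    """
    n_theta = len(jacobians[0][0])
    grad = [0.0] * n_theta
    hess = [[0.0] * n_theta for _ in range(n_theta)]
    for e, J in zip(residuals, jacobians):
        ny = len(e)
        Re = [sum(R[a][b] * e[b] for b in range(ny)) for a in range(ny)]
        RJ = [[sum(R[a][b] * J[b][q] for b in range(ny)) for q in range(n_theta)]
              for a in range(ny)]
        for p in range(n_theta):
            grad[p] -= sum(J[a][p] * Re[a] for a in range(ny))
            for q in range(n_theta):
                hess[p][q] += sum(J[a][p] * RJ[a][q] for a in range(ny))
    return grad, hess

# One scalar measurement, two parameters (all numbers hypothetical).
grad, hess = gradient_and_hessian([[3.0]], [[[1.0, 2.0]]], [[2.0]])
print(grad, hess)
```

Only first-order sensitivities enter this approximation, which is why the sensitivity S_k is the one additional quantity that must be computed from the stochastic model.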

8.5.2 Finite Difference Sensitivities

We assume that the unknown evolution equation for the mean x depends on the system parameters θ

$$x_{k+1} = F(x_k,\theta) \tag{8.41}$$

Algorithm 7 Trust Region Optimization.

Given k = 0, a maximum trust-region size ∆ > 0, an initial size ∆0 ∈ (0, ∆), and η ∈ [0, 0.25).

while (not converged)

    1. Solve the subproblem

$$p_k = \arg\min_{p \in \mathbb{R}^n} m_k(p) = \Phi|_{\theta_k} + \nabla_\theta\Phi|_{\theta_k}^T\, p + \frac{1}{2}p^T\, \nabla_{\theta\theta}\Phi|_{\theta_k}\, p \tag{8.35a}$$

$$\text{s.t.:}\quad \|p\|_\infty \le \Delta_k \tag{8.35b}$$

    2. Evaluate

$$\rho_k = \frac{\Phi(\theta_k) - \Phi(\theta_k + p_k)}{m_k(0) - m_k(p_k)} \tag{8.36}$$

    3. if ρk < 0.25
           ∆k+1 = 0.25 ||pk||∞
       else
           if ρk > 0.75 and ||pk||∞ = ∆k
               ∆k+1 = min(2∆k, ∆)
           else
               ∆k+1 = ∆k
           end if
       end if
       if ρk > η
           θk+1 = θk + pk
       else
           θk+1 = θk
       end if

    4. k ← k + 1

end while
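The steps of Algorithm 7 can be sketched in a few dozen lines. The code below is an illustration under stated assumptions, not the implementation used for the results in this chapter: the box-constrained subproblem is solved approximately by projected gradient iterations (a stand-in for the quadratic programming solver mentioned in the text), the convergence test on the gradient is a hypothetical choice, and the two-parameter exponential fitting problem is a deterministic toy objective rather than a stochastic model.

```python
import math

def solve_box_qp(g, H, radius, iters=2000):
    """Approximately minimize g.p + 0.5*p.H.p subject to ||p||_inf <= radius."""
    n = len(g)
    lip = max(sum(abs(h) for h in row) for row in H) or 1.0  # bound on largest eigenvalue
    p = [0.0] * n
    for _ in range(iters):
        grad = [g[i] + sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]
        p = [min(radius, max(-radius, p[i] - grad[i] / lip)) for i in range(n)]
    return p

def model_decrease(g, H, p):
    """Predicted reduction m(0) - m(p) of the quadratic model."""
    n = len(p)
    quad = sum(p[i] * H[i][j] * p[j] for i in range(n) for j in range(n))
    return -(sum(gi * pi for gi, pi in zip(g, p)) + 0.5 * quad)

def trust_region(fun, theta, radius_max=1.0, radius=0.5, eta=0.0, tol=1e-10, max_iter=100):
    """fun(theta) must return (objective, gradient, Hessian approximation)."""
    phi, g, H = fun(theta)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        p = solve_box_qp(g, H, radius)                        # step 1
        predicted = model_decrease(g, H, p)
        if predicted <= 0.0:
            break
        trial = [t + pi for t, pi in zip(theta, p)]
        phi_new, g_new, H_new = fun(trial)
        rho = (phi - phi_new) / predicted                     # step 2
        p_norm = max(abs(pi) for pi in p)
        if rho < 0.25:                                        # step 3: resize the region
            radius = 0.25 * p_norm
        elif rho > 0.75 and p_norm >= radius * (1.0 - 1e-12):
            radius = min(2.0 * radius, radius_max)
        if rho > eta:                                         # accept any improving step
            theta, phi, g, H = trial, phi_new, g_new, H_new
    return theta

# Toy least squares problem (hypothetical): fit y = a*exp(-b*t) to noise-free data.
T_GRID = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y_DATA = [2.0 * math.exp(-0.5 * t) for t in T_GRID]

def toy_objective(theta):
    a, b = theta
    phi, g, H = 0.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for t, y in zip(T_GRID, Y_DATA):
        e = y - a * math.exp(-b * t)
        J = [math.exp(-b * t), -a * t * math.exp(-b * t)]     # dh/dtheta
        phi += 0.5 * e * e
        for i in range(2):
            g[i] -= J[i] * e                                  # gradient, as in (8.39)
            for j in range(2):
                H[i][j] += J[i] * J[j]                        # Gauss-Newton Hessian
    return phi, g, H

theta_hat = trust_region(toy_objective, [1.0, 1.0])
print(theta_hat)
```

Setting eta = 0 mirrors the choice described in section 8.5.1 of accepting every step that reduces the objective; with a Monte Carlo objective, the ratio ρk would additionally be affected by the finite simulation error.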

| Parameter | Symbol | Value |
|---|---|---|
| simulations per measurement evaluation | nsim | 1 |
| finite difference perturbation | δ | 0.01 θj |
| transmittance constant | kt | 1/3000 |
| measurement inverse covariance | R | diag([10^−8, 1]) |

Table 8.4: Parameters for the parameter estimation example. Here, θj is the jth element of the vector θ.

Here, the notation xk denotes the value of the mean x at time tk. The sensitivity s indicates how sensitive the mean is to perturbations of a given parameter, i.e.

$$s_k = \frac{\partial x_k}{\partial \theta^T} \tag{8.42}$$

We can then approximate the jth component of the desired sensitivity using, for example, a central difference scheme:

$$s_{k+1,j} = \frac{F(x_k,\theta+\delta e_j) - F(x_k,\theta-\delta e_j)}{2\delta} + i\cdot O(\delta^2) \tag{8.43}$$

in which δ is a small positive constant, ej is the jth unit vector, and i is a vector of ones.

Finite difference methods have several potential problems when used in conjunction with Monte Carlo reconstructed quantities as discussed in Chapters 5 and 6. To reduce the finite simulation error, we re-seed the random number generator before each sample used to generate the mean xk. In doing so, we must take special care in the selection of the perturbation δ to ensure that its effect on the mean is sufficiently large; otherwise, the positive and negative perturbations are approximately equal (i.e. F(xk, θ + δej) ≈ F(xk, θ − δej)), resulting in a poor reconstruction of the sensitivity. Finally, the computational expense of this method can be prohibitive if evaluating the mean is computationally intensive because calculating the sensitivity requires, in this case, two mean evaluations per parameter.
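The effect of re-seeding can be illustrated with a small example. The sketch below assumes a hypothetical first-order decay A → 0 simulated with the direct method, for which the mean is known analytically; the function names, the seeding scheme (one seed per sample, shared by both perturbations), and all numerical values other than the 1% relative perturbation of Table 8.4 are illustrative choices.

```python
import math
import random

def remaining_after_decay(theta, n0, t_end, rng):
    """Direct-method simulation of A -> 0 with rate constant theta; returns A left at t_end."""
    n, t = n0, 0.0
    while n > 0:
        t += -math.log(1.0 - rng.random()) / (theta * n)  # exponential waiting time
        if t > t_end:
            break
        n -= 1
    return n

def mean_remaining(theta, n_sim, seed, n0=100, t_end=1.0):
    """Monte Carlo mean; sample i always uses seed + i, whatever the value of theta."""
    total = 0
    for sample in range(n_sim):
        rng = random.Random(seed + sample)   # re-seed before each sample
        total += remaining_after_decay(theta, n0, t_end, rng)
    return total / n_sim

def central_difference(theta, rel_step=0.01, n_sim=2000, seed=0):
    """Central difference of the Monte Carlo mean with respect to theta."""
    delta = rel_step * theta
    up = mean_remaining(theta + delta, n_sim, seed)
    down = mean_remaining(theta - delta, n_sim, seed)
    return (up - down) / (2.0 * delta)

s_fd = central_difference(1.0)
s_exact = -100 * 1.0 * math.exp(-1.0)   # d/dtheta of n0*exp(-theta*t) at theta = t = 1
print(s_fd, s_exact)
```

Because both perturbed means are built from the same random number streams, much of the sampling noise cancels in the difference; with independent seeds, the same number of simulations would give a far noisier estimate for a perturbation this small.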

Drews, Braatz, and Alkire [25] recently examined using finite differences to calculate sensitivities for kinetic Monte Carlo code simulating copper electrodeposition. These authors consider the specific case of the mean sensitivity, and derive finite differences for cases with significant finite simulation error. In these cases, the finite simulation error is greater than higher-order contributions of the finite difference expansion, so the authors derive first-order finite differences that minimize the variance of the finite simulation error. We circumvent the need for such expressions by appealing to the law of large numbers; that is, we reduce the variance of the finite simulation error by merely increasing the number of simulations used to evaluate the mean when necessary.

| Transformed Parameter | Symbol | Actual Value | Estimated Value |
|---|---|---|---|
| nucleation rate constant | log10 kn | −8.51 | −8.63 ± 0.03 |
| growth rate constant | log10 kg | −3.60 | −3.56 ± 0.02 |
| agglomeration rate constant | log10 ka | −3.60 | −3.73 ± 0.05 |

Table 8.5: Estimated parameters

[Plot: transmittance and number of supersaturated molecules versus time]

Figure 8.16: Comparison of final model prediction and measurements for the parameter estimation example.

8.5.3 Parameter Estimation for Isothermal Nucleation, Growth, and Agglomeration

We reconsider the isothermal nucleation, growth, and agglomeration example given in section 8.4.3. Traditional measurements for batch crystallizers yield moments of the PBE, so we assume that we can measure both the supersaturated monomer and transmittance, i.e.

$$y = \begin{bmatrix} M \\ \exp(-k_t\mu_2) \end{bmatrix} \tag{8.44}$$

$$\mu_2 = \sum_x x^2 N_x \tag{8.45}$$

in which µ2 is the second moment of the particle distribution and M is the average amount of supersaturated monomer. Parameters for the optimization routine are given in Table 8.4. Using the kinetic mechanism (8.32), we generate results from one simulation for the “experimental” measurements, then attempt to fit the parameters using subsequent simulations.

Table 8.5 compares the actual and estimated parameter values. We also report 95% confidence intervals for the estimated parameter values calculated by ignoring the effect of the finite simulation error. The results indicate excellent agreement between the actual and fitted parameters. The slight discrepancies in the fit most likely result from the finite simulation error (we simulated the “experimental” and predicted measurements using different seeds for the random number generator). Figure 8.16, which plots both the “experimental” and model predicted measurements, also demonstrates excellent agreement between the model and the experiment.

[Plot: log10 kn, log10 kg, and log10 ka versus optimization iteration]

Figure 8.17: Convergence of parameter estimates as a function of the optimization iteration.

Figure 8.17 plots the convergence of the parameter estimates as a function of the optimization iteration. This result indicates that the convergence to the optimal parameter values occurs relatively quickly (roughly five iterations). Each iteration requires seven mean evaluations (six for the finite difference calculations and one for the predicted step).

Raimondeau, Aghalayam, Mhadeshwar, and Vlachos [105] argue that using kinetic Monte Carlo simulation to perform parameter estimation is too computationally expensive. They claim that a model with two to three parameters needs roughly 10^5 function (mean) evaluations for direct optimization. For this example, in contrast, the required number of mean evaluations is less than 10^2. In general, we expect that the actual number of function evaluations required for direct optimization is significantly lower than their estimate when using an appropriate optimization scheme.

8.6 Critical Analysis of Stochastic Simulation as a Modeling Tool

Thus far, we have demonstrated the efficacy of stochastic simulation as a macroscopic modeling tool. Now we address the benefits and shortcomings of this technique. The primary shortcoming of stochastic simulation is the computational expense. Since the computational expense of stochastic simulation scales with the number of reactant molecules, this expense increases as the modeled volume increases or the characteristic particle size decreases. Also, the computational expense is significantly greater than that required to solve the equivalent deterministic system. However, as computing power continues to increase, this discrepancy will become less of a hindrance in solving the stochastic PBE.

Perhaps the greatest advantages of stochastic simulation are its flexibility and ease of implementation. The simple algorithms presented in this chapter are applicable to any reaction mechanism. For example, adding agglomeration to the preexisting isothermal nucleation and growth code required addition of the n(n−1)/2 possible agglomeration reactions between n possible particle sizes. We expect that adding more complicated mechanisms or tracking more than one crystal characteristic are straightforward extensions of this algorithm. Implementing size-dependent growth, for example, requires only making the reaction propensities functions of length (i.e., rk = ak(x, y, lk)). To track two characteristic lengths, we need only explicitly account for each particle and define mechanisms for growth of each characteristic length.

The most difficult part of augmenting the reaction mechanism is deciding how to store and update the active particle sizes. To illustrate these points, we invite the interested reader to download and examine codes that simulate isothermal nucleation and growth, and isothermal nucleation, growth, and agglomeration from our web site at

http://www.che.wisc.edu/~haseltin/stochsims.tar .

Addition of agglomeration requires approximately sixty additional lines of code to the nucleation and growth code. The majority of this code updates the data structure employed to account for existing crystal sizes. In contrast, attempting to examine nucleation, growth, and agglomeration using orthogonal collocation most likely requires major revision of the solution technique, such as adaptive mesh algorithms. Stochastic simulation inherently accounts for each crystal in the simulation. Hence we see stochastic simulation as a general solution technique that allows the user to focus on key modeling issues as opposed to population balance solution methods.

We also demonstrated one method of performing parameter estimation with stochastic models. By applying appropriate nonlinear optimization routines, we can obtain optimal parameter values with surprisingly few evaluations of the stochastic model. The primary drawback to the presented method is the calculation of sensitivities via finite differences. Finite difference methods quickly become expensive to evaluate as both the number of parameters and the computational burden of evaluating the stochastic model increase. Finally, refined optimization of Monte Carlo simulations requires quantifying the effects of the finite simulation error on both the model constraint (an error-in-variables formulation is more appropriate) and the termination criteria.

8.7 Conclusions

Stochastic simulation provides one alternative to solving the deterministic crystallization population balance. For systems with small numbers of monomer and seed, the stochastic crystallization model is more realistic than the deterministic model because it inherently accounts for the system fluctuations. In the limit as the numbers of monomer and seed become large, the deterministic model becomes valid. Even for this case, stochastic simulation provides a general, flexible solution technique for examining many possible reaction mechanisms. Additionally, optimization of the stochastic model for purposes such as parameter estimation is feasible and requires relatively few evaluations of the model. Simulation results presented in this chapter illustrate these claims. Thus stochastic simulation should permit the user to focus more on modeling issues as opposed to solution techniques.

Notation

| Symbol | Description |
|---|---|
| A | crystallizer area |
| a0 dt | contrived probability, first order in dt, that no reaction occurs in the next time interval dt |
| ak(x) dt | probability to order dt that reaction k occurs in the time interval [t, t + dt) |
| Cp | heat capacity |
| e | error vector |
| ej | jth unit vector |
| f(l, t) dl | concentration of particles |
| g(x) | average value of the quantity g(x) |
| h(xk) | model prediction of the measurement vector at time tk |
| i | vector of ones |
| ka | agglomeration rate constant |
| kg | growth rate constant |
| kn | nucleation rate constant |
| l | characteristic particle size |
| M | average amount of supersaturated monomer |
| M | number of supersaturated monomers |
| M^tot | total number of monomers |
| M^sat | number of saturated monomers |
| N | number of Monte Carlo samples |
| Nj | jth particle size |
| Nn | number of particles with size ln = ∆(n + 1) |
| n^0_mon | initial number of monomer particles |
| n^0_seed | initial number of seed particles |
| P(x, t) | probability that the system is in state x at time t |
| pk | kth uniformly-distributed random number |
| R | inverse covariance matrix of the measurement noise |
| rk | kth reaction propensity |
| rtot | total reaction propensity |
| S | sensitivity matrix of the state |
| s | sensitivity of the mean x |
| T | temperature |
| Tj,o | initial cooling jacket temperature |
| To | initial crystallizer temperature |
| t | time |
| U | crystallizer heat transfer coefficient |
| V | system volume |
| V^0_mon | initial volume of monomer |
| W | the Wiener process |
| x | state vector in terms of number of molecules |
| x | average state vector |
| xi | ith Monte Carlo reconstruction of x |
| y | vector of state-dependent variables |
| yk | measurement vector at time tk |
| z | state vector in terms of concentration (intensive variable) |
| ∆ | characteristic volume of one monomer unit |
| ∆k | trust-region optimization parameter at step k |
| ∆ | trust-region optimization parameter |
| ∆H^g_rxn | growth heat of reaction |
| ∆H^n_rxn | nucleation heat of reaction |
| δ | small positive constant |
| η | trust-region optimization parameter |
| µj | jth moment of the particle size distribution |
| ν | stoichiometric matrix |
| Φ | objective function value |
| ρ | solution density |
| ρk | trust-region optimization parameter at step k |
| τ | next reaction time |
| θ | vector of model parameters |
| Ω | characteristic system size |

Chapter 9

Population Balance Models for Cellular Systems¹

To date, most models of viral infections have focused exclusively on modeling either the intracellular level or the extracellular level. To more realistically model these infections, we propose incorporating both levels of information into the description. One way of performing this task in a deterministic setting is to derive cell population balances from the equation of continuity. In this chapter, we first outline the basics of deriving and solving these population balance models for viral infections. Next, we construct a population balance model for a generic viral infection. We examine the behavior of this model given in vitro and in vivo conditions, and compare the results to other model candidates. Finally, we present conclusions and consider the future role of cell population balances in modeling virus dynamics.

9.1 Population Balance Modeling

The general population balance equation for cell populations arises from the seminal contribution of Fredrickson, Ramkrishna, and Tsuchiya [36]. In recent years, this modeling framework has returned to the literature as researchers strive to adequately reconcile model predictions with the dynamics demonstrated by experimental data [80, 10, 33]. Also, new measurements such as flow cytometry offer the promise of actually differentiating between cells of a given population [1, 67], again implying the need to model distinctions between cells in a given population. Here, we present a brief derivation for models encompassing a population of infected cells as well as intracellular and extracellular components of interest.

In a deterministic setting, we can model the infected cell population by deriving a cell population balance from the equation of continuity. Here we define the concentration of infected cells as a function of time (t) and the internal (y) and external (x) characteristics of the system:

\[ \eta(t, z)\,dz = \text{concentration of infected cells} \tag{9.1} \]

\[ z = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \text{external characteristics} \\ \text{internal characteristics} \end{bmatrix} \tag{9.2} \]

¹ Portions of this chapter to appear in Haseltine, Rawlings, and Yin [60].

We can then write a conservation equation for these cells by considering an arbitrary control volume V(t) spanning a space in x and y, assuming that V(t) contains a statistically significant number of cells. Following the same arguments presented in section 2.1 results in the microscopic equation of continuity, equation (2.8). This equation is the most general form of our proposed model. We reiterate that the only assumption made thus far is that we consider a statistically significant number of cells.

We now must specify segregations for the infected cell population. First, we assume that the cells are well-mixed; this assumption allows us to eliminate the spatial dimensions from equation (2.8):

\[ \frac{\partial \eta(t, y)}{\partial t} + \nabla \cdot \left( \eta(t, y)\, v_y \right) = R_\eta \tag{9.3} \]

Next, we propose distinguishing infected cells by their stage of infection, using the infected cell age. The cell age acts as a “clock” that starts upon initial infection of an uninfected cell and ends upon the death of this cell. Although such a parameter cannot be explicitly measured, it can nonetheless be identified experimentally through its effect upon other observable quantities such as the expression of viral products. Because the age advances one-for-one with time, the age velocity term is unity:

\[ y = \tau = \text{infected cell age} \tag{9.4} \]

\[ v_y = 1 \tag{9.5} \]

Additionally, modeling the intracellular biochemical network necessitates augmenting the cell population balance with mass balances for viral components (labeled component i). Since the intracellular components are also segregated by the cell age, derivation of these mass balances follows that for the infected cell population (i.e. from equation (2.3) to (9.3)), yielding

\[ \frac{\partial i_j}{\partial t} + \frac{\partial i_j}{\partial \tau} = R_j + E_j, \qquad j = 1, \ldots, n \tag{9.6} \]

in which

• R_j is the intracellular production rate of component j. Processes such as transcription and translation of the viral genome are examples of events contributing to R_j.

• E_j accounts for the effect of extracellular events on the intracellular production rate of component j. An example of such an event is superinfection of an infected cell, which inserts additional viral genome and proteins into the cell.

We model extracellular components (labeled component e) as well-mixed and unsegregated (i.e. having no τ-dependence). The production rates for extracellular components may also be a function of both extracellular (E) and intracellular (R) events. In this case, however, infected cells produce and secrete extracellular components at an age-dependent rate. The conservation equation for the extracellular component, then, includes an integration of the intracellular rate over the infected cell population:

\[ \frac{\partial e_k}{\partial t} = E_k + \int_0^{\tau_d} \eta(t, \tau)\, R_k \, d\tau, \qquad k = 1, \ldots, m \tag{9.7} \]

Here, τ_d specifies the age of the oldest infected cell. Examples of processes contributing to E_k and R_k include regeneration of uninfected cells and secretion of virus from infected cells, respectively. The comprehensive model for this system is

\begin{align}
\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} &= R_\eta \tag{9.8a} \\
\frac{\partial i_j}{\partial t} + \frac{\partial i_j}{\partial \tau} &= R_j + E_j, \qquad j = 1, \ldots, n \tag{9.8b} \\
\frac{\partial e_k}{\partial t} &= E_k + \int_0^{\tau_d} \eta(t, \tau)\, R_k \, d\tau, \qquad k = 1, \ldots, m \tag{9.8c}
\end{align}

9.2 Application of the Model to Viral Infections

We now consider application of this model to a generic viral infection. We first outline the basic intracellular and extracellular events occurring in such an infection, discuss further model refinements, and present the numerical technique used to solve the final model.

9.2.1 Intracellular Model

At the intracellular level, we incorporate events from a simple structured model of virus growth [143]:

\begin{align}
\text{nucleotides} + \text{gen} &\xrightarrow[V_1]{k_1} \text{tem} & \varepsilon_1 &= k_1 i_{V_1} i_{\text{gen}} \tag{9.9a} \\
\text{amino acids} &\xrightarrow[V_2,\ \text{tem}]{k_2} \text{str} & \varepsilon_2 &= k_2 i_{V_2} i_{\text{tem}} \tag{9.9b} \\
\text{nucleotides} &\xrightarrow[\text{tem}]{k_3} \text{gen} & \varepsilon_3 &= k_3 i_{\text{tem}} \tag{9.9c} \\
\text{str} &\xrightarrow{k_4} \text{degraded} & \varepsilon_4 &= k_4 i_{\text{str}} \tag{9.9d} \\
\text{gen} + \text{str} &\xrightarrow{k_5} \text{secreted virus} & \varepsilon_5 &= k_5 i_{\text{gen}} i_{\text{str}} \tag{9.9e}
\end{align}

Here, gen and tem are the genomic and template viral nucleic acids respectively, str is the viral structural protein, V1 and V2 are viral enzymes that catalyze their respective reactions, and the reaction rates are given by the ε expressions. These events account for the insertion of the viral genome into the host nucleus, production of a viral template used to replicate the viral genome and mass-produce viral structural protein, and the assembly and secretion of viral progeny. We assume that host nucleotides and amino acids are available at constant concentrations. Therefore, the only intracellular components that we must track are the tem, gen, str, V1, and V2 components.
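To make the intracellular events concrete, the following is a minimal sketch that integrates them for a single infected cell. It assumes the rate constants of Table 9.1, the constant enzyme levels and initial insertion stated in Section 9.2.3, and net production rates assembled from our own reading of the stoichiometry in (9.9): species written under an arrow are treated as catalysts and are not consumed. The fixed-step Runge-Kutta integrator and the function names are illustrative choices, not the method used in this work.

```python
"""Sketch: intracellular events (9.9) for one infected cell, integrated with RK4."""

# Rate constants k1..k5 from Table 9.1 (units as listed there).
K1, K2, K3, K4, K5 = 3.13e-4, 25.0, 0.7, 2.0, 7.5e-6
V1, V2 = 80.0, 40.0  # enzyme levels per cell, held constant (Section 9.2.3)


def rates(state):
    """Net production rates (R_tem, R_gen, R_str); the assembly is an assumption."""
    tem, gen, struct = state
    eps1 = K1 * V1 * gen        # gen converted to tem
    eps2 = K2 * V2 * tem        # tem-catalyzed synthesis of str
    eps3 = K3 * tem             # tem-catalyzed synthesis of gen
    eps4 = K4 * struct          # degradation of str
    eps5 = K5 * gen * struct    # gen + str assembled into secreted virus
    return (eps1, eps3 - eps1 - eps5, eps2 - eps4 - eps5)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = rates(state)
    s2 = rates(shifted(state, s1, dt / 2))
    s3 = rates(shifted(state, s2, dt / 2))
    s4 = rates(shifted(state, s3, dt))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, s1, s2, s3, s4))


def simulate(age_end, dt=0.01):
    """Return the (tem, gen, str) trajectory from infection (age 0) to age_end days."""
    state = (0.0, 1.0, 0.0)  # initial infection inserts 1 gen/cell
    trajectory = [state]
    for _ in range(round(age_end / dt)):
        state = rk4_step(state, dt)
        trajectory.append(state)
    return trajectory
```

Calling `simulate(100.0)` covers the full infected-cell lifetime τ_d of Table 9.1; a stiff ODE solver would be the safer choice for other parameter sets.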

9.2.2 Extracellular Events

At the extracellular level, we adopt a standard model [98]:

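\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_6} \text{infected cell} & \varepsilon_6 &= k_6 e_{\text{vir}} e_{\text{unc}} \tag{9.10a} \\
\text{virus} &\xrightarrow{k_7} \text{degraded} & \varepsilon_7 &= k_7 e_{\text{vir}} \tag{9.10b} \\
\text{infected cell} &\xrightarrow{k_8} \text{death} & \varepsilon_8 &= k_8 e_{\text{inf}} \tag{9.10c} \\
\text{uninfected cell} &\xrightarrow{k_9} \text{death} & \varepsilon_9 &= k_9 e_{\text{unc}} \tag{9.10d} \\
\text{precursors} &\xrightarrow{k_{10}} \text{uninfected cell} & \varepsilon_{10} &= k_{10} \tag{9.10e}
\end{align}

These events address the intuitive notions of cell growth, death, and infection by free virus. From this point forward, we use the abbreviations unc, inf, and vir for uninfected host cells, infected host cells, and virus.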

9.2.3 Final Model Refinements

Further model assumptions include:

• Reaction rates of intracellular and extracellular events follow simple, mass-action kinetics. All reactions are elementary as written except for enzyme-catalyzed reactions, in which case the expressions result from performing model reduction on Michaelis-Menten kinetics.

• Infected cells are created at age zero due to interaction between uninfected cells and free virus, and infected cells die at an exponential rate until age τ_d:

\[ R_\eta = k_6 e_{\text{unc}} e_{\text{vir}} \delta(\tau) - \eta(t, \tau) \left( k_8 + \delta(\tau - \tau_d) \right) \tag{9.11} \]

Here, δ is the Dirac delta function. Also, an initial infection corresponds to insertion of 1 gen/cell, 80 V1/cell, and 40 V2/cell into an uninfected cell.

• No superinfection of infected cells occurs.

• Concentrations of intracellular enzymes remain constant throughout the life cycle of an infected cell.

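Therefore, our final model is

\begin{align}
\frac{\partial \eta(t, \tau)}{\partial t} + \frac{\partial \eta(t, \tau)}{\partial \tau} &= k_6 e_{\text{unc}} e_{\text{vir}} \delta(\tau) - \left( k_8 + \delta(\tau - \tau_d) \right) \eta(t, \tau) \tag{9.12a} \\
\frac{\partial i_{\text{tem}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{tem}}(t, \tau)}{\partial \tau} &= R_{\text{tem}} \tag{9.12b} \\
\frac{\partial i_{\text{gen}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{gen}}(t, \tau)}{\partial \tau} &= R_{\text{gen}} + \delta(\tau) \tag{9.12c} \\
\frac{\partial i_{\text{str}}(t, \tau)}{\partial t} + \frac{\partial i_{\text{str}}(t, \tau)}{\partial \tau} &= R_{\text{str}} \tag{9.12d} \\
\frac{\partial i_{V_1}(t, \tau)}{\partial t} + \frac{\partial i_{V_1}(t, \tau)}{\partial \tau} &= 80\,\delta(\tau) \tag{9.12e} \\
\frac{\partial i_{V_2}(t, \tau)}{\partial t} + \frac{\partial i_{V_2}(t, \tau)}{\partial \tau} &= 40\,\delta(\tau) \tag{9.12f} \\
\frac{d e_{\text{unc}}}{dt} &= k_{10} - k_9 e_{\text{unc}} - k_6 e_{\text{unc}} e_{\text{vir}} \tag{9.12g} \\
\frac{d e_{\text{vir}}}{dt} &= -k_7 e_{\text{vir}} - k_6 e_{\text{unc}} e_{\text{vir}} + \int_0^{\tau_d} \eta(t, \tau)\, R_{\text{vir}}(\tau)\, d\tau \tag{9.12h}
\end{align}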

9.2.4 Model Solution

To solve the model, we use orthogonal collocation on finite elements of Lagrange polynomials [155, 121, 127]. This method approximates functions of multiple coordinates, e.g. η(t, τ), by a linear combination of Lagrange interpolation polynomials:

\[ \eta(t, \tau) \approx \sum_{j=1}^{n} L_j(\tau)\, \eta(t, \tau_j) \tag{9.13} \]

in which L_j is a Lagrange interpolation polynomial of degree n, and η(t, τ_j) is the function evaluated at the point τ_j. Accordingly, we can approximate the age derivative at each collocation point τ_i as

\begin{align}
\left. \frac{\partial \eta(t, \tau)}{\partial \tau} \right|_{\tau = \tau_i} &\approx \sum_{j=1}^{n} \left. \frac{\partial L_j(\tau)}{\partial \tau} \right|_{\tau = \tau_i} \eta(t, \tau_j) \tag{9.14} \\
&\approx \sum_{j=1}^{n} A_{ij}\, \eta(t, \tau_j) \tag{9.15}
\end{align}

in which the matrix A is the derivative weight matrix. Also, we can approximately evaluate integrals by using quadrature

\[ \int_0^{\tau_d} \eta(t, \tau)\, d\tau \approx \sum_{j=1}^{n} q_j\, \eta(t, \tau_j) \tag{9.16} \]

where q_j is the jth quadrature weight.

This method is known as the global orthogonal collocation method when only one collocation element is applied to the entire domain of interest. Alternatively, one could split the domain into multiple subdomains, then apply a collocation element to each subdomain; in this case, the method is called orthogonal collocation on finite elements. Collocation on finite elements permits concentration of elements in regions where sharp gradients exist, a case that normally causes difficulties in global orthogonal collocation. At the junction of finite elements, one imposes continuity of the population η(t, τ), i.e.

\[ \eta(t, \tau) \big|_{\bar{\tau}^-} = \eta(t, \tau) \big|_{\bar{\tau}^+} \tag{9.17} \]

in which the boundary between elements occurs at τ = τ̄, and τ̄⁻ and τ̄⁺ represent the boundaries of the adjoining finite elements [127]. Note that the number of boundary conditions at the junction of elements is equal to the number of partial derivatives due to segregations (i.e. τ), and that the order of each boundary condition is one less than the order of its partial derivative. Unless otherwise specified, we use only one collocation element in our discretization.
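The derivative and quadrature weights used in the collocation approximation can be sketched as follows for an arbitrary set of distinct nodes. This is an illustration under stated assumptions rather than the implementation used in this work: the function names are hypothetical, the nodes are supplied by the caller instead of being chosen as roots of an orthogonal polynomial, and the Lagrange polynomials are expanded in monomial coefficients, which is adequate only for a modest number of nodes.

```python
"""Sketch: derivative weights A and quadrature weights q for Lagrange interpolation."""


def lagrange_coefficients(nodes, j):
    """Monomial coefficients (lowest order first) of the j-th Lagrange polynomial."""
    coeffs = [1.0]
    for k, node in enumerate(nodes):
        if k == j:
            continue
        scale = nodes[j] - node
        # Multiply the running polynomial by (tau - node) / scale.
        shifted = [0.0] + coeffs                     # tau * p(tau)
        scaled = [c * node for c in coeffs] + [0.0]  # node * p(tau)
        coeffs = [(s - t) / scale for s, t in zip(shifted, scaled)]
    return coeffs


def evaluate(coeffs, tau):
    """Horner evaluation of a polynomial with lowest-order-first coefficients."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * tau + c
    return value


def derivative_matrix(nodes):
    """A[i][j] = dL_j/dtau evaluated at nodes[i]."""
    rows = []
    for tau_i in nodes:
        row = []
        for j in range(len(nodes)):
            c = lagrange_coefficients(nodes, j)
            dcoeffs = [p * c[p] for p in range(1, len(c))]
            row.append(evaluate(dcoeffs, tau_i))
        rows.append(row)
    return rows


def quadrature_weights(nodes, lower, upper):
    """q[j] = integral of L_j(tau) from lower to upper."""
    weights = []
    for j in range(len(nodes)):
        c = lagrange_coefficients(nodes, j)
        antiderivative = [0.0] + [c[p] / (p + 1) for p in range(len(c))]
        weights.append(evaluate(antiderivative, upper)
                       - evaluate(antiderivative, lower))
    return weights
```

With these weights, the age derivative at node i is the dot product of row i of A with the nodal values, and the integral over [0, τ_d] is the dot product of q with the nodal values. Both are exact whenever the function is a polynomial of degree below the number of nodes.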

The collocation method is very sensitive to changes of several orders of magnitude in the approximating function. Since equation (9.12a) indicates that η(t, τ) changes exponentially, we use a logarithmic transformation to scale η(t, τ). Applying this method to equation (9.12) in effect discretizes the integro-partial differential equation into a system of differential algebraic equations (DAEs). We then use the software package DASPK [15] to integrate the DAE system.
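As a short worked illustration of why the logarithmic scaling helps (our own sketch, with w = ln η introduced here as a hypothetical symbol): away from the two delta-function terms, i.e. for 0 < τ < τ_d, equation (9.12a) becomes

\begin{align}
\frac{\partial \eta}{\partial t} + \frac{\partial \eta}{\partial \tau} &= -k_8\, \eta, \qquad 0 < \tau < \tau_d \\
\frac{\partial w}{\partial t} + \frac{\partial w}{\partial \tau} &= -k_8, \qquad w = \ln \eta
\end{align}

Along a line of constant t − τ (with t ≥ τ), the first equation gives η(t, τ) = η(t − τ, 0⁺) exp(−k_8 τ), whereas w decreases only linearly in τ. With the Table 9.1 values k_8 = 5.0 × 10^−2 day^−1 and τ_d = 100 days, η falls by a factor of exp(−5) ≈ 6.7 × 10^−3 over the lifetime of a cohort of infected cells, while w changes by exactly −5. A linear profile of this kind is far easier for a polynomial interpolant to represent than the exponential one.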

Orthogonal collocation on finite elements presents merely one manner of solving equation (9.12). We refer the interested reader to Mantzaris, Daoutidis, and Srienc [87, 88, 89] for an excellent overview of other numerical methods used to solve similar equations.

9.3 Application to In Vitro and In Vivo Conditions

To better understand the cell population balance, we apply model (9.12) to both in vitro and in vivo conditions. We also compare the results of the model to other commonly used models.

9.3.1 In Vitro Experiment

Here we construct an in silico example to simulate a laboratory experiment. Our apparatus is a well-mixed batch reactor containing uninfected cells, in which nutrients are provided to sustain cells without growth. We assume that assays are available that measure the concentrations of uninfected cells, infected cells, virus, genome, template, structural protein, and the V1 and V2 viral enzymes. With the goal of determining the intracellular kinetics, we consider performing the following experiment: infect a population of cells and measure components for a sample of cells. Although this technique has the disadvantage of introducing the population dynamics into the measurements, sampling a statistically significant number of cells has two primary advantages:

• stochastic effects and cell-to-cell variations should average out, and

• we can adjust the sample size so that each component can be detected by its assay while maintaining consistency with the key assumption of the continuity equation (a statistically significant number of cells).

Parameter   Value          Units
τ_d         100            days
k1          3.13 × 10^−4   cell/(#-day)
k2          25.0           cell/(#-day)
k3          0.7            day^−1
k4          2.0            day^−1
k5          7.5 × 10^−6    cell/(#-day)
k6          5.0 × 10^−9    host/(#-day)
k7          8.0 × 10^−2    day^−1
k8          5.0 × 10^−2    day^−1
k9          1.0 × 10^−2    day^−1
k10         0              #/(host-day)

Table 9.1: Model parameters for in vitro simulation

We simulate the population balance model (9.12) with parameters given in Table 9.1 for the following initial conditions:

1. extracellular virus >> uninfected cells (all uninfected cells are infected initially), and

2. extracellular virus > uninfected cells (only a fraction of uninfected cells are infected initially).

Experimental observations indicate that infected cells die [83]. Perhaps the simplest way to account for cell death is to combine the intracellular model (9.9) with a simple population balance, i.e.

\begin{align}
\frac{d e_{\text{unc}}}{dt} &= -k_5 e_{\text{unc}} - k_2 e_{\text{unc}} e_{\text{vir}} \tag{9.18a} \\
\frac{d e_{\text{inf}}}{dt} &= -k_4 e_{\text{inf}} + k_2 e_{\text{unc}} e_{\text{vir}} \tag{9.18b} \\
\frac{d e_{\text{vir}}}{dt} &= -k_3 e_{\text{vir}} - k_2 e_{\text{unc}} e_{\text{vir}} + R_{\text{vir}} e_{\text{inf}} \tag{9.18c}
\end{align}

Equation (9.18) is a structured, unsegregated model. Next, we perform parameter estimation and model reduction² to obtain an optimal fit of the structured, unsegregated model (9.18) to the data generated by the population balance (structured, segregated) model (9.12). For the sake of brevity, we do not report any of the fitted rate constants (k’s). Examining this optimal fit provides insight into the limitations of structured, unsegregated models.

² Rawlings and Ekerdt [120] provide the details of this method.

Case 1: All Uninfected Cells Infected Initially

Figure 9.1 presents the results for this case. These results indicate that the structured, unsegregated model provides an excellent fit to the data. Since all uninfected cells are infected within a relatively short period of time (roughly ten days), the approximation that all cells behave the same is valid; hence the good fit to the data.

We contrast these results to those obtained from only simulating the intracellular events (i.e. Figure 9.2). Over the same time, the purely intracellular model predicts that all intracellular components increase monotonically throughout the experiment. We therefore infer that the phenomenon of cell death causes the maxima observed in the measured intracellular components. This observation reiterates the fact that experiments of this type introduce the population dynamics into the measurements.

Case 2: A Fraction of Uninfected Cells Infected Initially

Figure 9.3 presents the results for this case. Examination of these results indicates that roughly two rounds of infection initiation occur (marked by peaks in the infected cell population): the first round within the first ten days of the experiment, corresponding to the initial infection; and the second round at roughly 75 to 100 days, corresponding to infection of uninfected cells by virus produced by the first round of infected cells. Since the structured, unsegregated model assumes that all cells behave on average the same, it cannot adequately describe the phenomenon of multiple rounds of infection. As a result, this model provides a poor fit to the data. Also, we note that multiple rounds of infection have been observed experimentally in continuous flow reactors [66, 151, 134, 74], as opposed to the batch conditions simulated here.

9.3.2 In Vivo Initial Infection

We now consider the in vivo behavior of the cell population balance for an initial infection of a virus-free host. Here, the initial condition is the steady state of the system with no virus. For the sake of illustration, we account for the host immune response very simply: comparison of Tables 9.1 and 9.2 shows that, in contrast to the in vitro system, the in vivo system:

• clears extracellular virus more rapidly (faster decay due to a larger value of k7), and

• produces uninfected host cells at a nonzero rate (k10 is now nonzero).
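The virus-free steady state used as the initial condition follows directly from equation (9.12g). As a brief worked check, setting the virus concentration to zero and the time derivative to zero, and inserting the in vivo values k10 = 1.0 × 10^6 #/(host-day) and k9 = 1.0 × 10^−2 day^−1 from Table 9.2, gives

\begin{align}
0 &= k_{10} - k_9 e_{\text{unc}} \\
e_{\text{unc}} &= \frac{k_{10}}{k_9} = \frac{1.0 \times 10^{6}}{1.0 \times 10^{-2}} = 1.0 \times 10^{8}\ \text{\#/host}
\end{align}

which agrees with the initial uninfected cell concentration of 10^8 #/host listed for model (9.12) in Table 9.3.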

Figure 9.4 demonstrates the host response for all extracellular components. The system exhibits three stages of infection: first, a period of relative dormancy for roughly two infection cycles (200 days ≈ 2τ_d); next, a cycle of rapid infection leading first to a peak in the infected cell population and then to a peak in the virus population; and finally, an approach to an infected steady state. In the first stage, both the extracellular virus and infected cell populations are actually increasing steadily. However, in contrast to the rapid rate of infection observed during the second stage, the first stage appears to be dormant on the scale of Figure 9.4.

[Figure 9.1, plots not reproduced: six panels versus time (days, 0 to 100) showing uninfected cells, infected cells, virus, tem, gen, and struct, each in scaled units of #/host.]

Figure 9.1: Fit of a structured, unsegregated model to experimental results. Initial condition is such that all uninfected cells are quickly infected by virus. Points present the “experimental” data obtained by solving the population balance model (structured, segregated model). Lines present the optimal fit of the structured, unsegregated model to the “experimental” data.

[Figure 9.2, plot not reproduced: concentration (logarithmic axis, 10^−1 to 10^6) versus time (days, 0 to 100) for secreted virus, str, gen, and tem.]

Figure 9.2: Time evolution of intracellular components and secreted virus for the intracellular model

Parameter   Value          Units
τ_d         100            days
k1          3.13 × 10^−4   cell/(#-day)
k2          25.0           cell/(#-day)
k3          0.7            day^−1
k4          2.0            day^−1
k5          7.5 × 10^−6    cell/(#-day)
k6          5.0 × 10^−9    host/(#-day)
k7          1.0            day^−1
k8          5.0 × 10^−2    day^−1
k9          1.0 × 10^−2    day^−1
k10         1.0 × 10^6     #/(host-day)

Table 9.2: Model parameters for in vivo simulation

For the in vivo case, structured, unsegregated models do not offer an adequate representation of the system. Firstly, the “average cell” approximation ignores the cyclic nature of an infection because we must assume that the average cell reaches a steady state when intuitively we know that cells are regenerating and dying. Secondly, our intracellular model (see Figure 9.2) does not reach a steady state over the lifetime of an infected cell (i.e. 100 days), so making the in vivo model reach a steady state requires unphysical changes to either the intracellular or extracellular description.

[Figure 9.3, plots not reproduced: six panels versus time (days, 0 to 200) showing uninfected cells, infected cells, virus, tem, gen, and struct, each in scaled units of #/host.]

Figure 9.3: Fit of a structured, unsegregated model to experimental results. Initial condition is such that not all uninfected cells are initially infected by virus. Points present the “experimental” data obtained by solving the population balance model (structured, segregated model). Lines present the optimal fit of the structured, unsegregated model to the “experimental” data.

[Figure 9.4, plot not reproduced: extracellular components (virus, uninfected cells, and infected cells, in scaled units of #/host) versus time (days, 0 to 600).]

Figure 9.4: Dynamic in vivo response of the cell population balance to initial infection

[Figure 9.5, plot not reproduced: extracellular components (in scaled units of #/host) versus time (days, 0 to 600).]

Figure 9.5: Extracellular model fit to dynamic in vivo response of an initial infection

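Alternatively, we could incorporate only the extracellular events (9.10) in a mathematical description as follows:

\begin{align}
\frac{d e_{\text{unc}}}{dt} &= k_1 - k_5 e_{\text{unc}} - k_2 e_{\text{unc}} e_{\text{vir}} \tag{9.19a} \\
\frac{d e_{\text{inf}}}{dt} &= -k_4 e_{\text{inf}} + k_2 e_{\text{unc}} e_{\text{vir}} \tag{9.19b} \\
\frac{d e_{\text{vir}}}{dt} &= -k_3 e_{\text{vir}} - k_2 e_{\text{unc}} e_{\text{vir}} + k_6 e_{\text{inf}} \tag{9.19c}
\end{align}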

Model (9.19)                                              Model (9.12)
Parameter   Fit Value      95% Confidence Interval        Parameter   Value         Units
k1          7.96 × 10^5    ±2.17 × 10^5                   k10         1.0 × 10^6    #/(host-day)
k2          4.28 × 10^−9   ±1.04 × 10^−9                  k6          5.0 × 10^−9   host/(#-day)
k3          1.56 × 10^−2   ±2.16 × 10^−3                  k7          1.0           day^−1
k4          3.86 × 10^−2   ±1.07 × 10^−2                  k8          5.0 × 10^−2   day^−1
k5          1.04 × 10^−2   ±2.10 × 10^−3                  k9          1.0 × 10^−2   day^−1
k6          0.104          ±8.68 × 10^−3                  NA                        day^−1
unc(t=0)    2.14 × 10^8    ±5.68 × 10^7                   unc(t=0)    10^8          #/host
vir(t=0)    37.2           ±9.25                          vir(t=0)    1000          #/host

Table 9.3: Comparison of actual and fitted parameter values for in vivo simulation of an initial infection
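For readers who wish to explore the purely extracellular model (9.19) numerically, the sketch below integrates it with the fitted values of Table 9.3. Several choices here are assumptions rather than details taken from this work: the initial infected cell concentration is set to zero, a fixed-step fourth-order Runge-Kutta scheme replaces a production ODE solver, and the function names are illustrative.

```python
"""Sketch: purely extracellular model (9.19) with the fitted values of Table 9.3."""

# Fitted rate constants k1..k6 of model (9.19), copied from Table 9.3.
K = {1: 7.96e5, 2: 4.28e-9, 3: 1.56e-2, 4: 3.86e-2, 5: 1.04e-2, 6: 0.104}


def rhs(state):
    """Right-hand side of (9.19) for state = (unc, inf, vir)."""
    unc, inf, vir = state
    infection = K[2] * unc * vir
    return (K[1] - K[5] * unc - infection,
            -K[4] * inf + infection,
            -K[3] * vir - infection + K[6] * inf)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = rhs(state)
    s2 = rhs(shifted(state, s1, dt / 2))
    s3 = rhs(shifted(state, s2, dt / 2))
    s4 = rhs(shifted(state, s3, dt))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, s1, s2, s3, s4))


def simulate(t_end, dt=0.05, initial=(2.14e8, 0.0, 37.2)):
    """Integrate from the fitted initial unc and vir; inf(0) = 0 is an assumption."""
    state = initial
    trajectory = [state]
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt)
        trajectory.append(state)
    return trajectory
```

Calling `simulate(600.0)` spans the time range plotted in Figure 9.5. Because the model lumps all intracellular events into the single production constant k6, it has no mechanism for an age-dependent delay between infection of a cell and release of virus.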

This model differs from that of Wodarz and Nowak [164] only in that we assume infection of an uninfected cell by a virus consumes the virus. Again, we attempt to optimally fit this model (9.19) to the cell population balance results³. Figure 9.5 shows that this model cannot exhibit the same behavior as the cell population balance; most noticeably, the purely extracellular model can capture neither the dynamics of the initial dormant phase nor the burst of virus that follows the peak in the infected cell population. Table 9.3 illustrates that the fitted and actual parameters do not match to 95% confidence, but all fitted parameters are roughly the same order of magnitude as the actual values, with the exception of the virus decay parameter (k7 in model (9.12) and k3 in model (9.19)) and the initial virus concentration. This discrepancy occurs because the purely extracellular model (9.19) lumps all intracellular virus production events together. This result indicates that unstructured, lumped parameter models can supply unreliable estimates for parameters that govern individual events.
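The least-squares criterion behind this fit is defined in footnote 3 as a residual between base-ten logarithms of shifted measurements and predictions. A minimal sketch of that objective follows; the function names and the default value of the small constant c are hypothetical, since the text specifies only that c is small.

```python
"""Sketch: least-squares objective on log10-scaled residuals (see footnote 3)."""
import math


def log_residuals(measured, predicted, c=1e-6):
    """Elementwise log10(y_k + c) - log10(s_k + c) for one measurement vector."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted vectors must have equal length")
    return [math.log10(y + c) - math.log10(s + c)
            for y, s in zip(measured, predicted)]


def objective(measurements, predictions, c=1e-6):
    """Sum of squared log residuals over all sampling times."""
    return sum(r * r
               for y_k, s_k in zip(measurements, predictions)
               for r in log_residuals(y_k, s_k, c))
```

The logarithm weights relative rather than absolute errors, which matters when populations span several orders of magnitude, and the shift c keeps the residual defined when a measurement or prediction is zero.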

9.3.3 In Vivo Drug Therapy

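Now we consider in vivo response to drug therapy. In particular, we examine the extracellular effect that viral enzyme inhibitors I1 and I2 produce by affecting the intracellular enzymes V1 and V2, respectively. Thus, the extracellular events associated with the drug therapy are

\begin{align}
I_1 &\xrightarrow{k_{13}} \text{degraded / secreted} & \varepsilon_{13} &= k_{13} e_{I_1} \tag{9.20a} \\
I_1 + \text{unc} &\xrightarrow{k_{14}} I_1(\text{adsorbed}) + \text{unc} & \varepsilon_{14} &= k_{14} e_{I_1} e_{\text{unc}} \tag{9.20b} \\
I_2 &\xrightarrow{k_{15}} \text{degraded / secreted} & \varepsilon_{15} &= k_{15} e_{I_2} \tag{9.20c} \\
I_2 + \text{unc} &\xrightarrow{k_{16}} I_2(\text{adsorbed}) + \text{unc} & \varepsilon_{16} &= k_{16} e_{I_2} e_{\text{unc}} \tag{9.20d}
\end{align}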

³ Optimal fit corresponds to a least squares fit for the residual log10(y_k + c i) − log10(s_k + c i), where log10 is the base ten logarithm, y_k is the measurement vector, s_k is the model predicted measurement vector, i is a vector of ones, and c is a small constant. Also, the initial uninfected cell and virus concentrations were used as model parameters.

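In equations (9.20b) and (9.20d), we use the notation “(adsorbed)” to designate that the extracellular drugs have been adsorbed into a cell. Intracellularly, these drugs then interact as follows:

\begin{align}
V_1 + I_1 &\overset{K_1}{\rightleftharpoons} V\text{-}I_1 \tag{9.21a} \\
V_2 + I_2 &\overset{K_2}{\rightleftharpoons} V\text{-}I_2 \tag{9.21b} \\
I_1 &\xrightarrow{k_{11}} \text{secreted} & \varepsilon_{11} &= k_{11} i_{I_1} \tag{9.21c} \\
I_2 &\xrightarrow{k_{12}} \text{secreted} & \varepsilon_{12} &= k_{12} i_{I_2} \tag{9.21d}
\end{align}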

For this situation, we assume that:

1. equilibrium holds for the intracellular reactions (9.21a) and (9.21b);

2. all other reactions in (9.21) and (9.20) are elementary as written;

3. the inhibitors interact only with uninfected cells; and

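4. the extracellular drug intake can be modeled as an overdamped second-order, linear function [99] of the form

\begin{align}
u_{I_j}(t) &= \bar{u}_{I_j} \left[ 1 - \exp\left( -\frac{\zeta t}{\tau_u} \right) \left( \cosh \frac{\beta t}{\tau_u} + \frac{\zeta}{\beta} \sinh \frac{\beta t}{\tau_u} \right) \right] \tag{9.22a} \\
\beta &= \left| \zeta^2 - 1 \right|^{0.5} \tag{9.22b}
\end{align}

assuming that a change in the drug intake occurs at time t = 0.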
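The drug intake function of equation (9.22) can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, the intake before the change at t = 0 is assumed to be zero, and only the overdamped case (ζ > 1) stated in the text is handled.

```python
"""Sketch: overdamped second-order drug intake, equation (9.22)."""
import math


def drug_intake(t, u_bar, zeta, tau_u):
    """Intake at time t (days) after a step of size u_bar applied at t = 0."""
    if zeta <= 1.0:
        raise ValueError("this sketch covers only the overdamped case, zeta > 1")
    if t <= 0.0:
        return 0.0  # assumption: no intake before the change at t = 0
    beta = math.sqrt(abs(zeta ** 2 - 1.0))
    x = beta * t / tau_u
    decay = math.exp(-zeta * t / tau_u)
    return u_bar * (1.0 - decay * (math.cosh(x) + zeta / beta * math.sinh(x)))
```

With the Table 9.4 values (ζ = 1.1, τ_u = 10 days, and a final intake of 4.0 × 10^7 #/(host-day)), the intake rises monotonically from zero and approaches its final value over several multiples of τ_u. For extremely large t the hyperbolic functions overflow in floating point; an equivalent form using two decaying exponentials avoids this.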

Parameters for this model are given in Tables 9.2 and 9.4. The initial condition for this model corresponds to the steady state of the previous section (see Figure 9.4).

Figure 9.6 presents the dynamic response for in vivo drug therapy. This response demonstrates the characteristic “pharmacokinetic lag” observed experimentally in viral treatments [13, 101]. In this model, however, the lag is directly attributable to modeled events, namely the drug intake dynamics, the assumption that the drugs interact only with uninfected cells, and the intracellular dynamics of drug interaction with virus enzymes. In contrast, purely extracellular models must lump each of these individual events into (generally) a single parameter to describe this lag, as examined by Perelson et al. [101].

Another attractive feature of the cell population balance over the purely extracellular model is the ability to examine the effects that perturbations to the intracellular model have upon the extracellular components. As an example, we consider the effect that changes in the efficacy of the viral inhibitors have upon the extracellular uninfected cell and virus concentrations. Such a change in efficacy may result, for example, from a mutation in the viral enzymes causing decreased efficiency in the viral enzyme-inhibitor interaction. Also, we assume that intracellular drug concentrations cannot exceed values of 45 and 60 #/cell for i_I1 and i_I2, respectively, due to adverse side-effects of the inhibitors. Plots (a) and (b) of Figure 9.7 present the results for the nominal case. If the goal of the drug therapy is to maximize the uninfected cell concentration while minimizing the virus concentration, then the optimal treatment strategy is to maximize intake of both drugs. Plots (c) and (d) of Figure 9.7 present the results for a mutated virus corresponding to an 80% and 90% decrease in the binding constants K1 and K2, respectively. After the mutation, the optimal treatment strategy is actually to maximize I1 intake and stop treatment with I2.

Parameter   Value          Units
K1          1.0            cell/#
K2          1.0            cell/#
k11         100.           day^−1
k12         100.           day^−1
k13         10.            day^−1
k14         1.0 × 10^−3    host/(#-day)
k15         9.0            day^−1
k16         8.0 × 10^−4    host/(#-day)
ū_I1        4.0 × 10^7     #/(host-day)
ū_I2        4.0 × 10^7     #/(host-day)
ζ           1.1            unitless
τ_u         10.            day

Table 9.4: Additional model parameters for in vivo drug therapy

[Figure 9.6, plots not reproduced: three panels versus time (days, 0 to 500) showing extracellular inhibitor (I1 and I2), cells (uninfected and infected), and extracellular virus, each in scaled units of #/host.]

Figure 9.6: Dynamic in vivo response to initial treatment with inhibitor drugs I1 and I2.

[Figure 9.7, plots not reproduced: four panels (a) to (d) showing uninfected cells (scaled #/host) or extracellular virus (#/host) as a function of intracellular I2 (#/cell), with curves for increasing I1.]

Figure 9.7: Effect of drug therapy on in vivo steady states. Amount of (a) uninfected cells and (b) extracellular virus given nominal drug efficacy. Amount of (c) uninfected cells and (d) extracellular virus given reduced drug efficacy due to virus mutation.

9.4 Future Outlook and Impact

The cell population balance offers an intuitive, flexible environment for modeling the combined intracellular and extracellular events associated with viral infections. Because this model has segregations, it can account for observed phenomena such as multiple rounds of infection and pharmacokinetic delays associated with drug treatments of infections. Because this model has structure, it can examine the effects that each intracellular component has upon the dynamics of the extracellular components. Neither structured, unsegregated models nor purely extracellular models can account for both of these phenomena.

Validation of cell population balance models requires experimental measurements of both extracellular populations and intracellular viral components. Traditional assays already offer a means for measuring extracellular populations; for example, clinicians routinely measure both host CD4+ T-cells and virus titers in HIV-infected patients. Methods such as polymerase chain reaction (PCR), western blotting, and plaque assays offer quantitative intracellular measurements of the viral genome, proteins, and infectious viral progeny, respectively. Cell population balance models provide one method of adequately assimilating the data contained in these measurements.

For in vitro experiments, we suspect that modifications to existing protocols may yield new information about the structure of the population balance model. For example, most studies of replication for animal viruses rely on one-step growth curves in which all cells in a culture are infected simultaneously [162]. While such experiments have supplied information on the intracellular dynamics of a single infection cycle, they offer no insight into how virus-mediated activities, such as activation of cellular antiviral responses and cell-cell communication, may influence the subsequent dynamics of viral propagation. New in vitro methods currently being developed [26, 28] allow viruses to infect cells sequentially rather than simultaneously, opening new opportunities to probe virus-host interactions at multiple levels.

A good quantitative model of how viral infections propagate will lead to better understanding of how to best control this propagation. For example, the steady-state analysis for in vivo drug therapy in Section 9.3.3 revealed that the optimal treatment strategy for one particular virus mutation requires stopping treatment with one drug. This counterintuitive result highlights a potential pitfall of current strategies that aim to thwart the emergence of drug-resistant virus mutants by employing multiple anti-viral drugs. Another intriguing possibility would be to perform sensitivity analysis for both intracellular components and rate constants to determine which ones have the greatest impact upon extracellular components such as the virus concentration. This analysis could then focus drug development towards those candidates having maximum therapeutic benefit. One could also consider tailoring therapies by characterizing both the virus and immune system for a given individual, rather than relying on general drug regimens obtained from the best “average” response for a given study.

Notation

A           derivative weight matrix for orthogonal collocation
c           small constant
E_j         extracellular production rate
e_j         extracellular viral component
i           a vector of ones
i_j         intracellular viral component
K_j         equilibrium constant for the segregated, structured model
k_j         reaction rate constant for the segregated, structured model
k_j         reaction rate constant for the unsegregated, structured model
k_j         reaction rate constant for the purely extracellular model
L_j(τ)      Lagrange interpolation polynomial of degree n for orthogonal collocation
log10       base ten logarithm
q_j         jth quadrature weight for orthogonal collocation
R_j         jth intracellular production rate
R_η         production rate for the infected cell population η
s_k         measurement vector predicted by the model
t           time
u_j         second-order input for extracellular component j
ū_j         input for extracellular component j
V(t)        arbitrary, time-varying control volume spanning a space in z
v_y         vector specifying the y-component velocity of cells flowing through the volume V
x           external characteristics
y           internal characteristics
y_k         experimental measurement vector
z           internal and external characteristics
β           parameter for the second-order input function
δ           Dirac delta function
ε_j         jth reaction rate
η(t, z)dz   concentration of infected cells
η(t, τ_j)   infected cell concentration evaluated at the point τ_j
τ           infected cell age
τ_d         age of the oldest infected cell permitted by the model
τ_u         natural period of the second-order input function
ζ           damping coefficient of the second-order input function

Chapter 10

Modeling Virus Dynamics: FocalInfectionsWe consider using dynamic models to obtain a better quantitative and integrative understand-ing of both viral infections and cellular antiviral mechanisms. We expect this approach toprovide key insights into mechanisms of viral pathogenesis and host immune responses, aswell as facilitate development of effective anti-viral strategies. Our focus, however, is not toincorporate all the wealth of information already known about either of these topics; rather,we seek to identify the critical biological and experimental phenomena that give rise to the ex-perimental observations. We consider the focal infection system described by Duca et al. [26],which permits quantification of multiple rounds of viral infection. This experimental systemprovides a unique platform for studying multiple rounds of the virus replication cycle as wellas the innate ability of host cells to combat the invading virus.

We consider the example virus/host system of vesicular stomatitis virus (VSV) propagating on either baby hamster kidney (BHK-21) cells or murine astrocytoma (DBT) cells. VSV is a member of the Rhabdoviridae family consisting of enveloped RNA viruses [129]. Its compact genome is only approximately 12 kb in length, and encodes genetic information for five proteins. Because VSV is highly infective and grows to high titer in cell culture, it is viewed as a model system for studying viral replication [64, 7]. Also, VSV infection can elicit an interferon-mediated antiviral response from host cells [129]. Thus the studied experimental system provides a platform for further probing the quantitative dynamics of this antiviral response. A great wealth of information is known about the interferon antiviral response (see, for example, [133, 54]). We seek to elucidate what level of complexity is requisite to explain the experimental data.

Yin and McCaskill [165] first proposed a reaction-diffusion model to capture the dynamics of plaque formation due to viral infection. The authors derived model solutions for this formulation in several limiting cases. You and Yin [166] later refined this model and used a finite difference method to numerically solve the time progression of the resulting model. Fort [34] and Fort and Mendez [35] revised the model of You and Yin [166] to account for the delay associated with intracellular events required to replicate virus, and derived expressions for the velocity of the propagating front. These works, however, focused on explaining the velocity of the infection front, a quantity derived from experimentally obtained images of the infection spread. Our goal in this chapter is to explain the infection dynamics contained within the entire images.

[Figure 10.1 schematic. Step 1: Monolayers fixed at selected times. Step 2: Removal of agar and washes. Step 3: Antibody labeling for viral glycoprotein. Step 4: Detection by antibody immunofluorescence (measurement, imaging). Labels: focal infection, uninfected cells, infection spread. Key: uninfected cell, infected cell, dead cell, antibody, virus.]

Figure 10.1: Overview of the experimental system. Initially, host cells are grown in a confluent monolayer on a plate. The cells are then covered by a layer of agar. To initiate the infection, a pipette (one mm radius) is used to carefully remove a small portion of the agar in the center of the plate. An initial inoculum of virus is then placed in the resulting hole in the agar, initiating the infection. The agar overlay serves to restrict virus propagation to nearby cells. To monitor the infection spread, monolayers are fixed at various times post-infection. The agar overlay is removed and the cells are rinsed several times, the last time with a labeled antibody that binds specifically to the viral glycoprotein coating the exterior of the virus capsid. Images of the monolayers are then acquired using an inverted epifluorescent microscope.

In this chapter, we first briefly review the experimental system of interest. Next, we outline the steps taken to analyze the experimental measurements (images of the infection spread) and propose a measurement model. We then successively formulate, fit, and refine models using the analyzed images, first for VSV infection of BHK-21 cells, then for DBT cells. Finally, we analyze the results of the parameter fitting and present conclusions.

10.1 Experimental System

Here we briefly review the experimental system of interest; for detailed information on the experimental procedure, we refer the interested reader to Duca et al. [26]. This system permits dynamic, spatial quantification of virus protein via antibody immunofluorescence. Figure 10.1 presents a general schematic of this experimental system along with a digital image acquired during such an infection. Initially, host cells are grown in a confluent monolayer on a plate. The cells are then covered by a layer of agar. To initiate the infection, a pipette (one mm radius) is used to carefully remove a small portion of the agar in the center of the plate. An initial inoculum of virus is then placed in the resulting hole in the agar, initiating the infection. The agar overlay serves to restrict virus propagation to nearby cells. To monitor the infection spread, monolayers are fixed at various times post-infection. The agar overlay is removed and the cells are rinsed several times, the last time with a labeled antibody that binds specifically to the viral glycoprotein coating the exterior of the virus capsid. Images of the monolayers are then acquired using an inverted epifluorescent microscope.

Parameter                                   Symbol     Value
Cell volume                                 V_c        3.4 × 10^−9 ml
Initial number of uninfected cells          n_unc,0    10^6 cells
Number of viruses in the initial inoculum   n_vir,0    8.0 × 10^4 viruses
Radius of the plate                         r_plate    1.75 cm

Table 10.1: Parameters used to describe the experimental conditions.

10.1.1 Modeling the Experiment

Table 10.1 presents parameters used to model the experimental conditions. We assume that cells are spherical objects, with the height of the cell monolayer equal to the resulting cell diameter. Concentrations for all species are calculated assuming that the volume of the monolayer is cylindrical. The dimensions of this cylinder are given by the height of the cell monolayer and the radius of the plate. We model the concentration of the initial virus inoculum using the continuous, piecewise-linear function

\[
c_{\text{vir}}(t = 0, r) =
\begin{cases}
c_{\text{vir},0}, & r < 0.075\ \text{cm} \\
\left(1 - \dfrac{20}{\text{cm}}\,(r - 0.075\ \text{cm})\right) c_{\text{vir},0}, & 0.075\ \text{cm} \le r \le 0.125\ \text{cm} \\
0, & r > 0.125\ \text{cm}
\end{cases}
\tag{10.1}
\]
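The geometric assumptions above and the inoculum profile of equation (10.1) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: the function and constant names are hypothetical, and the value of the inoculum concentration passed to `inoculum_profile` is left to the caller because the text does not state how it is obtained from the number of viruses in the inoculum.

```python
import math

V_CELL_ML = 3.4e-9   # cell volume from Table 10.1 (1 ml = 1 cm^3)
N_UNC_0 = 1.0e6      # initial number of uninfected cells, Table 10.1
R_PLATE_CM = 1.75    # plate radius, Table 10.1


def cell_diameter_cm(v_cell=V_CELL_ML):
    """Diameter of a sphere with volume v_cell; used as the monolayer height."""
    return (6.0 * v_cell / math.pi) ** (1.0 / 3.0)


def monolayer_volume_ml(r_plate=R_PLATE_CM, v_cell=V_CELL_ML):
    """Cylinder volume: plate radius by one cell diameter in height."""
    return math.pi * r_plate ** 2 * cell_diameter_cm(v_cell)


def inoculum_profile(r_cm, c_vir0):
    """Equation (10.1): flat core, then a linear ramp to zero at 0.125 cm."""
    if r_cm < 0.075:
        return c_vir0
    if r_cm <= 0.125:
        return (1.0 - 20.0 * (r_cm - 0.075)) * c_vir0
    return 0.0


# Initial uninfected cell concentration implied by the cylindrical monolayer.
c_unc0 = N_UNC_0 / monolayer_volume_ml()   # cells per ml
```

Under these assumptions the monolayer is roughly 19 micrometers tall, and the ramp in `inoculum_profile` reaches zero exactly at the outer edge of the transition region.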

10.1.2 Modeling the Measurement

We assume that the measurement process (steps one through four in Figure 10.1) is an equilibrium process in which virus associates indiscriminately with cells in the monolayer. Additionally, dead cells undergo a change in morphology which decreases their ability to remain bound to the plate during removal of the agar overlay. We account for this effect by estimating $k_{\text{wash}}$, the fraction of dead cells that adhere to the plate after the removal of the agar overlay and the subsequent washes. Accordingly, the amount of virus bound to host cells is given by the expression

\[
K_m = \frac{c_{\text{vir-host}}}{c_{\text{vir}}\left(c_{\text{unc}} + c_{\text{infc}} + k_{\text{wash}}\, c_{\text{dc}}\right)}
\tag{10.2}
\]

in which $K_m$ is the equilibrium constant, and $c_{\text{vir}}$, $c_{\text{unc}}$, $c_{\text{infc}}$, $c_{\text{dc}}$, and $c_{\text{vir-host}}$ refer to the concentrations of virus, uninfected cells, infected cells, dead cells, and virus-host complexes, respectively.

[Figure 10.2 plots: intensity (0 to 255) versus virus-host complex concentration for the original image and for the averaged image, with vmin, vmax, ibgd, and km marked.]

Figure 10.2: Measurement model. The original images quantize the virus-host concentration, a continuous variable, onto the integer-valued intensity. Each pixel in the averaged images is the mean of 400 pixels from the original image, and we approximate the step-wise discontinuous intensity (incremented by 1/400) as a piecewise continuous function.

10.1.3 Analyzing and Modeling the Images

We have reduced the amount of information in each image by partitioning the images into blocks of 20 pixels by 20 pixels, then averaging the pixels contained in each block. This averaging technique has the primary benefit of drastically reducing the total number of pixels that must be analyzed (in the case of the largest image, from roughly two million to five thousand pixels) while retaining the prominent features of the infection spread.

We assume that the intensity of each pixel in the image is due to the background fluorescence of cells and linear variation in the concentration of virus-host complexes, which fluoresce due to the labeled antibody. In the original images, the intensity information quantizes this essentially continuous variable into a step-wise, discontinuous signal (integer valued from 0 to the saturating value of 255). For the averaged images, the intensity information is step-wise, discontinuous with increments of 1/400. We approximate this signal using a piecewise continuous function. The comparison between the measurement model for the original and averaged images is illustrated in Figure 10.2. The measurement model is then:

\[
y_m =
\begin{cases}
i_{\text{bgd}}, & c_{\text{vir-host}} \le v_{\min} \\
k_m\, c_{\text{vir-host}} + i_{\text{bgd}}, & v_{\min} < c_{\text{vir-host}} < v_{\max} \\
255, & c_{\text{vir-host}} \ge v_{\max}
\end{cases}
\tag{10.3}
\]

in which $y_m$ is the intensity measurement, $k_m$ is the conversion constant from concentration to intensity, $i_{\text{bgd}}$ is the background fluorescence (in intensity), and $v_{\min}$ and $v_{\max}$ are the minimum and maximum detectable virus-host concentrations.
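The block averaging and the measurement model of equation (10.3) can be illustrated with a short Python sketch. The function names are hypothetical, the image is represented as a plain list of rows, and discarding any partial blocks at the image edges is an assumption that the text does not address.

```python
def block_average(image, block=20):
    """Average non-overlapping block x block tiles of a 2-D list of intensities.

    Assumption: rows and columns left over at the edges are discarded.
    """
    n_rows = len(image) // block
    n_cols = len(image[0]) // block
    averaged = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            total = 0
            for i in range(bi * block, (bi + 1) * block):
                for j in range(bj * block, (bj + 1) * block):
                    total += image[i][j]
            row.append(total / (block * block))
        averaged.append(row)
    return averaged


def predicted_intensity(c_vir_host, k_m, i_bgd, v_min, v_max):
    """Equation (10.3): background below v_min, linear in between, saturated at 255."""
    if c_vir_host <= v_min:
        return i_bgd
    if c_vir_host < v_max:
        return k_m * c_vir_host + i_bgd
    return 255.0
```

With 20 by 20 blocks, each averaged pixel is the mean of 400 integer intensities, which is where the 1/400 increment in the averaged signal comes from.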

[Figure 10.3 image grid: columns are Data, Original Model, and + Initial Inoculation; rows are 18, 30, 48, 72, and 90 hours.]

Figure 10.3: Comparison of representative experimental images to model fits. The full set of experimental images is available in the appendix. “Original Model” refers to the derived reaction-diffusion model. “+ Initial Inoculation” incorporates the variation in the concentration of uninfected cells within the radius of the initial inoculation. The white scale bar in the upper left-hand corner of the experimental images is one millimeter.

10.2 Propagation of VSV on BHK-21 Cells

We first consider propagation of VSV on baby hamster kidney (BHK-21) cells. The first column of images in Figure 10.3 presents representative images for the time course of the experiment; the full set of experimental images is available in the appendix. For this virus/host system, the images demonstrate two prominent features: (1) the infection propagates unimpeded outward radially and (2) the band of intensity amplifies from the first to the third measurement. We now consider models to quantitatively capture both of these features.


10.2.1 Development of a Reaction-Diffusion Model

We extend the reaction-diffusion model first proposed by Yin and McCaskill [165] and later refined by You and Yin [166] to model this infection. We consider only extracellular species, namely virus, uninfected cells, infected cells, and dead cells. In this context, only the virus is allowed to diffuse, and we model the following reactions:

\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_1} \text{infected cell} \tag{10.4a} \\
\text{infected cell} &\xrightarrow{k_2} Y\ \text{virus} \tag{10.4b}
\end{align}

in which $Y$ is the yield of virus per infected cell. We assume that the infection propagation is radially symmetric. The concentrations of all species are then segregated by both time and radial distance, giving rise to the following governing equations for the model:

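\begin{align}
\frac{\partial c_{\text{vir}}}{\partial t} &= \frac{1}{r}\frac{\partial}{\partial r}\left(D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r}\right) + R_{\text{vir}} \tag{10.5a} \\
\frac{\partial c_{\text{unc}}}{\partial t} &= R_{\text{unc}} \tag{10.5b} \\
\frac{\partial c_{\text{infc}}}{\partial t} &= R_{\text{infc}} \tag{10.5c} \\
D^{\text{eff}}_{\text{vir}} &= 2 D_{\text{vir}}\, \frac{1 - \phi}{2 + \phi} \tag{10.5d} \\
\phi &= V_e \left(c_{\text{unc}} + c_{\text{infc}}\right) \tag{10.5e} \\
c_j(t = 0, r)\ \text{known}, &\qquad \left.\frac{d c_{\text{vir}}}{d r}\right|_{r = 0,\, r_{\max}} = 0 \tag{10.5f}
\end{align}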

in which the reaction terms (e.g. $R_{\text{vir}}$) are dictated by the stoichiometry of reaction (10.4) assuming that the reactions are elementary as written. Also, diffusivity of the virus is hindered due to the presence of uninfected and infected cells on the plate. An effective diffusivity accounts for this effect. We solve equation (10.5) by discretizing the spatial dimension using central differences with an increment of 0.025 cm, then solving the resulting set of differential-algebraic equations using the package DASKR, a variant of the predictor-corrector solver DASPK [15], with the banded solver option. We determine optimal parameter estimates by solving the following least squares optimization

\begin{align*}
\min_{\theta} \Phi = \min_{\theta}\ & \sum_k e_k^T R\, e_k \\
\text{s.t.:}\quad & e_k = y_k - h(x_k; \theta) \\
& x_k = \begin{bmatrix} c_{\text{vir}} & c_{\text{unc}} & c_{\text{infc}} & c_{\text{dc}} \end{bmatrix}^T \\
& \text{Equation (10.5)}
\end{align*}

which minimizes the sum of squared residuals between the vectorized images $y_k$ and the model-predicted images $h(x_k; \theta)$ in a pixel by pixel comparison by manipulating the model parameters $\theta$. Here we use a log10 transformation of the parameters for the optimization.
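To make the spatial discretization concrete, the following Python sketch evaluates the semi-discrete right-hand sides of equations (10.5a) through (10.5c) using central differences on a radial grid. It is an illustration under stated assumptions rather than the thesis implementation: the grid is cell-centered with fluxes evaluated at cell faces, the face diffusivity is the arithmetic mean of its neighbors, the parameter dictionary keys are hypothetical names, and the time integrator shown is a plain explicit Euler step standing in for DASKR. The reaction terms follow from treating reactions (10.4) as elementary.

```python
def effective_diffusivity(d_vir, phi):
    """Equation (10.5d)."""
    return 2.0 * d_vir * (1.0 - phi) / (2.0 + phi)


def rhs(c_vir, c_unc, c_infc, p, dr):
    """Right-hand sides of (10.5a)-(10.5c) on cell centers r_i = (i + 1/2) dr."""
    n = len(c_vir)
    d_eff = [effective_diffusivity(p["D_vir"], p["V_e"] * (c_unc[i] + c_infc[i]))
             for i in range(n)]
    # flux[i] = r * D_eff * dc/dr at the face r = i * dr; the end faces stay
    # at zero, which imposes the no-flux conditions of (10.5f).
    flux = [0.0] * (n + 1)
    for i in range(1, n):
        d_face = 0.5 * (d_eff[i - 1] + d_eff[i])
        flux[i] = (i * dr) * d_face * (c_vir[i] - c_vir[i - 1]) / dr
    d_vir_dt, d_unc_dt, d_infc_dt = [], [], []
    for i in range(n):
        r_i = (i + 0.5) * dr
        infect = p["k1"] * c_vir[i] * c_unc[i]   # reaction (10.4a)
        release = p["k2"] * c_infc[i]            # reaction (10.4b)
        d_vir_dt.append((flux[i + 1] - flux[i]) / (r_i * dr)
                        - infect + p["Y"] * release)
        d_unc_dt.append(-infect)
        d_infc_dt.append(infect - release)
    return d_vir_dt, d_unc_dt, d_infc_dt


def euler_step(state, p, dr, dt):
    """One explicit Euler step (a simple stand-in for the DASKR integration)."""
    rates = rhs(state[0], state[1], state[2], p, dr)
    return tuple([c + dt * d for c, d in zip(cs, ds)]
                 for cs, ds in zip(state, rates))
```

With `dr = 0.025` and a plate radius of 1.75 cm this grid has 70 cells. Because the face fluxes telescope, the discretization conserves the total amount of virus when the reaction terms are switched off, which is a convenient check on any implementation. An explicit step like this one needs a small `dt` for stability, which is one reason an implicit solver is the more practical choice for the full problem.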

[Figure 10.4 plot: initial uninfected cell concentration, cunc × 10^−7 (#/ml), versus radius (cm) for the Original Model and the + Initial Inoculation model.]

Figure 10.4: Comparison of the initial uninfected cell concentration for the original and revised (accounting for the initial inoculation effect) models.

The second column of images in Figure 10.3 presents the results for the optimal fit. In comparison to the experimental data, the results demonstrate similar radial propagation of the infection front, but do not capture the amplification of intensity observed through the first three samples. To refine the model, we propose that the observed amplification results from an initial condition effect. In particular, we allow the initial concentration of uninfected cells to vary within the radius of the initial inoculum, and introduce the parameter $c'_{\text{unc},0}$ in which

\[
c_{\text{unc}}(t = 0, r) =
\begin{cases}
c'_{\text{unc},0}, & r < 0.075\ \text{cm} \\
\left(1 + \dfrac{20}{\text{cm}}\left(1 - \dfrac{c'_{\text{unc},0}}{c_{\text{unc},0}}\right)(r - 0.075\ \text{cm})\right) c_{\text{unc},0}, & 0.075\ \text{cm} \le r \le 0.125\ \text{cm} \\
c_{\text{unc},0}, & r > 0.125\ \text{cm}
\end{cases}
\tag{10.6}
\]

Performing the parameter estimation with this additional degree of freedom yields the altered initial concentration profile for uninfected cells in Figure 10.4 as well as the optimal fit presented in the third column of images in Figure 10.3. This fit captures both the outward radial propagation of the infection and the amplification of the intensity in the first three images of the time series data.

10.2.2 Analysis of the Model Fit

Table 10.2 presents the parameter estimates for both the original and refined models. Both models predict roughly the same estimates for all parameters. Also, adding the parameter $c'_{\text{unc},0}$ reduces the objective function $\Phi$ by about five percent.

Ware et al. [160] use laser light-scattering spectroscopy to estimate the diffusivity of the VSV virion to be 2.326 × 10^−8 cm^2/sec. Converting this value to cm^2/hr and taking the log10 yields a value of −4.08. This value is very close to the estimated values of −3.87 and −3.94 (see Table 10.2).

                          Model 1        Model 2
Parameter    Units        log10 Value    log10 Value
k1           hr^−1        −11.0          −10.8
k2           cm^3/hr      0.145          0.555
Dvir         cm^2/hr      −3.87          −3.94
Y                         2.66           2.51
ibgd                      1.50           1.50
kmKm         cm^−3        −15.9          −15.7
kwash                     −1.34          −1.39
c′unc,0      cm^−3        NA             6.25

Φ                         2.11 × 10^6    2.00 × 10^6

Table 10.2: Parameter estimates for the VSV/BHK-21 focal infection models. Parameters are estimated for the log10 transformation of the parameters. NA denotes that the parameter is not applicable for the given model.

                   Eigenvector
log10 Parameter    v1         v2         v3        v4        v5        v6        v7
k1                 0.769      0.046      −0.069    0.147     −0.195    −0.233    −0.537
k2                 −0.162     0.976      0.042     0.030     −0.077    −0.037    −0.102
Dvir               −0.241     −0.080     −0.103    −0.812    −0.285    −0.088    −0.420
Y                  −0.468     −0.144     0.142     0.384     0.331     −0.051    −0.693
ibgd               0.046      −0.003     −0.030    0.098     −0.303    0.926     −0.195
kmKm               0.285      0.131      −0.142    −0.358    0.821     0.272     −0.076
kwash              0.147      −0.007     0.970     −0.182    0.021     0.052     0.006

Eigenvalue         −1.05e7    −2.23e5    6.12e5    4.88e6    3.58e7    3.21e8    1.07e9

Table 10.3: Hessian analysis for the parameter estimates of the original VSV/BHK-21 focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian.

Table 10.3 analyzes the Hessian of the objective function for the parameter estimates of the original model. This analysis indicates that two linear combinations of parameters cannot be estimated due to negative eigenvalues (which most likely result from errors in the finite difference approximation of the Hessian). The first of these two linear combinations of parameters, i.e. v1, is primarily constituted by the first reaction rate constant k1 and the virus yield Y. The second rate constant k2 accounts for virtually all of the second of these linear combinations. Table 10.4 analyzes the Hessian of the objective function for the parameter estimates of the revised model. This analysis indicates that two linear combinations of parameters cannot be estimated due to negative eigenvalues. These two linear combinations of parameters correspond roughly to those of the original model.

                   Eigenvector
log10 Parameter    v1         v2         v3        v4        v5        v6        v7        v8
k1                 0.782      −0.019     −0.001    0.055     0.147     −0.167    −0.207    −0.541
k2                 −0.068     −0.995     0.024     −0.036    0.011     −0.039    −0.015    −0.046
Dvir               −0.212     0.035      −0.008    0.071     −0.828    −0.312    −0.070    −0.401
Y                  −0.504     0.065      0.005     −0.105    0.363     0.293     −0.050    −0.715
ibgd               −0.004     −0.025     −0.999    0.002     0.010     −0.000    0.000     −0.001
kmKm               0.048      0.002      0.001     0.029     0.101     −0.286    0.936     −0.170
kwash              0.263      −0.059     −0.003    0.109     −0.367    0.840     0.267     −0.068
c′unc,0            0.115      0.024      −0.004    −0.983    −0.128    0.023     0.047     0.006

Eigenvalue         −1.64e7    −2.92e4    7.66e3    5.31e5    7.59e6    3.60e7    3.24e8    1.24e9

Table 10.4: Hessian analysis for the parameter estimates of the revised VSV/BHK-21 focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian.
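The link between a finite-difference Hessian and parameter combinations that cannot be estimated can be seen in a toy problem. The Python sketch below is a hypothetical two-parameter example, not the thesis model: the objective depends on the log10 parameters only through their sum, much as the measurement depends on km and Km only through their product, so the Hessian has one direction with zero curvature. The central-difference formula, the step size, and the closed-form 2 by 2 eigenvalue routine are all illustrative choices.

```python
import math


def objective(theta, xs, ys):
    """Sum of squared residuals for y = 10**(theta[0] + theta[1]) * x."""
    gain = 10.0 ** (theta[0] + theta[1])
    return sum((y - gain * x) ** 2 for x, y in zip(xs, ys))


def fd_hessian(f, theta, h=1e-4):
    """Central-difference approximation of the Hessian of f at theta."""
    n = len(theta)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return f(t)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


def eig_sym_2x2(m):
    """Eigenvalues (small, large) and the unit eigenvector of the large one."""
    a, b, d = m[0][0], 0.5 * (m[0][1] + m[1][0]), m[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return mean - radius, mean + radius, (math.cos(angle), math.sin(angle))


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
theta_hat = [0.1, math.log10(2.0) - 0.1]   # any split of log10(2) fits exactly
hessian = fd_hessian(lambda t: objective(t, xs, ys), theta_hat)
small, large, v_large = eig_sym_2x2(hessian)
```

In this sketch the large eigenvalue belongs to the direction in which both log10 parameters increase together, which the data determine well. The small eigenvalue is zero in exact arithmetic, so its computed value, including its sign, is set entirely by finite-difference and round-off error.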

The modeling process gives insight into the key biological and experimental phenomena giving rise to the observed experimental measurements. First, manipulation of the initial concentration of uninfected cells within the radius of the initial inoculum accounts for the amplification of the intensity in the first three images of the time-series data. This effect has two possible causes: either cells are damaged or removed when the hole is made in the agar at the initiation of the experiment, or uninfected cells but not infected cells continue to grow during the first portion of the experiment. Second, the infection spread is well characterized by considering only extracellular species in the model development. We could have incorporated intracellular infection events (transcription, translation, replication, and assembly of virus) into the model description, but the additional parameters necessary for this model would not be justifiable for the given experimental data.

10.3 Propagation of VSV on DBT Cells

We now consider propagation of VSV on murine astrocytoma (DBT) cells. The first column of images in Figure 10.5 presents a representative time course for the experiment; the full set of experimental images is available in the appendix. For this virus/host system, the images demonstrate three prominent features: (1) the infection propagates unimpeded outward radially for the first three images, (2) the intensity of the measurement amplifies from the first to the third measurement, and (3) the infection spread is halted after the third image and the intensity of the measurement diminishes. This particular cell line is known to have an antiviral strategy, namely the interferon signaling pathway. We now consider models to quantitatively capture all of these features.

[Figure 10.5 image grid: columns are Data, Reaction-Diffusion Model, Segregated Model Fit 1, and Segregated Model Fit 2; rows are 7, 27, 48, 72, and 96 hours.]

Figure 10.5: Comparison of representative experimental images to model fits for VSV propagation on DBT cells. The white scale bar in the upper left-hand corner of the experimental images is one millimeter.

10.3.1 Refinement of the Reaction-Diffusion Model

We refine the reaction-diffusion model proposed in the previous section to model this infection. In addition to the extracellular species considered previously (virus, uninfected cells, infected cells, and dead cells), we also model interferon (without any distinction between the types α, β, and γ) and inoculated cells. Both virus and interferon are permitted to diffuse. We account for the following reactions:

\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_1} \text{infected cell} \tag{10.7a} \\
\text{infected cell} &\xrightarrow{k_2} Y\ \text{virus} + \text{dead cell} \tag{10.7b} \\
\text{infected cell} &\xrightarrow{k_3} \text{infected cell} + \text{interferon} \tag{10.7c} \\
\text{uninfected cell} + \text{interferon} &\xrightarrow{k_4} \text{inoculated cell} \tag{10.7d} \\
\text{inoculated cell} &\xrightarrow{k_5} \text{inoculated cell} + \text{interferon} \tag{10.7e} \\
\text{inoculated cell} + \text{virus} &\xrightarrow{k_1} \text{inoculated cell} \tag{10.7f} \\
\text{infected cell} + \text{virus} &\xrightarrow{k_1} \text{infected cell} \tag{10.7g}
\end{align}

This reaction mechanism makes the following assumptions:

1. interferon binds to uninfected cells to form inoculated cells that are resistant to viral infection,

2. super-infection of infected cells does not alter the yield of virus per infected cell, and

3. virus binds indiscriminately to uninfected, infected, and inoculated cells.

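We again assume that the infection propagation is radially symmetric. The concentrations of all species are then segregated by both time and radial distance, giving rise to the following governing equations for the model:

\begin{align}
\frac{\partial c_{\text{vir}}}{\partial t} &= \frac{1}{r}\frac{\partial}{\partial r}\left(D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r}\right) + R_{\text{vir}} \tag{10.8a} \\
\frac{\partial c_{\text{ifn}}}{\partial t} &= \frac{1}{r}\frac{\partial}{\partial r}\left(D^{\text{eff}}_{\text{ifn}}\, r\, \frac{\partial c_{\text{ifn}}}{\partial r}\right) + R_{\text{ifn}} \tag{10.8b} \\
\frac{\partial c_{\text{unc}}}{\partial t} &= R_{\text{unc}}, \qquad \frac{\partial c_{\text{infc}}}{\partial t} = R_{\text{infc}} \tag{10.8c} \\
\frac{\partial c_{\text{inoc}}}{\partial t} &= R_{\text{inoc}}, \qquad \frac{\partial c_{\text{dc}}}{\partial t} = R_{\text{dc}} \tag{10.8d} \\
D^{\text{eff}}_{\text{vir}} &= 2 D_{\text{vir}}\, \frac{1 - \phi}{2 + \phi}, \qquad D^{\text{eff}}_{\text{ifn}} = 2 D_{\text{ifn}}\, \frac{1 - \phi}{2 + \phi} \tag{10.8e} \\
\phi &= V_e \left(c_{\text{unc}} + c_{\text{infc}} + c_{\text{inoc}}\right) \tag{10.8f} \\
\left.\frac{d c_{\text{vir}}}{d r}\right|_{r = 0,\, r_{\max}} &= 0, \qquad \left.\frac{d c_{\text{ifn}}}{d r}\right|_{r = 0,\, r_{\max}} = 0 \tag{10.8g} \\
c_i(t = 0, r)\ &\text{known} \tag{10.8h}
\end{align}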

in which the reaction terms (e.g. $R_{\text{vir}}$) are dictated by the stoichiometry of reaction (10.7) assuming that the reactions are elementary as written. Additionally, the initial images of the infection indicate a ring-like pattern in the intensity. We account for this phenomenon by estimating two parameters, $c'_{\text{unc},1}$ and $c'_{\text{unc},2}$, that determine the shape of the initial radial profile for the uninfected cell concentration, i.e.

\[
c_{\text{unc}}(t = 0, r) =
\begin{cases}
c'_{\text{unc},2}, & r < 0.025\ \text{cm} \\
c'_{\text{unc},2} - \dfrac{20\,(c'_{\text{unc},2} - c'_{\text{unc},1})}{\text{cm}}\,(r - 0.025\ \text{cm}), & 0.025\ \text{cm} \le r < 0.075\ \text{cm} \\
c'_{\text{unc},1}, & 0.075\ \text{cm} \le r < 0.1\ \text{cm} \\
c_{\text{unc},0} - \dfrac{20\,(c_{\text{unc},0} - c'_{\text{unc},1})}{\text{cm}}\,(r - 0.025\ \text{cm}), & 0.1\ \text{cm} \le r < 0.15\ \text{cm} \\
c_{\text{unc},0}, & r \ge 0.15\ \text{cm}
\end{cases}
\tag{10.9}
\]

We estimate the optimal parameters using the same spatial discretization and nonlinear optimization as in the previous section.

The second column of images in Figure 10.5 presents the optimal fit for this model. In comparison to the experimentally obtained images, this model is able to capture quantitatively the radial propagation of the infection front. However, the fit only qualitatively captures the increase and decrease in the intensity of the experimental data. To better quantitatively capture the temporal changes in this intensity, we propose incorporating the life cycle of infected cells. We therefore segregate the infected cell population by the age of infection $\tau$, and model the intracellular production rates of virus and interferon using first-order plus time delay expressions, i.e.

\begin{align}
r_{\text{vir}}(\tau) &= K_{\text{vir}}\left[1 - \exp\left(-k_{\text{vir}}(\tau - d_{\text{vir}})\right)\right] \tag{10.10} \\
r_{\text{ifn}}(\tau) &= K_{\text{ifn}}\left[1 - \exp\left(-k_{\text{ifn}}(\tau - d_{\text{ifn}})\right)\right] \tag{10.11}
\end{align}
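The first-order plus time delay form of equations (10.10) and (10.11) can be written as a single Python function. The function name is hypothetical, and setting the rate to zero for ages at or below the delay is an assumption made here: the expressions as written would be negative there, and the text does not say how that range is handled.

```python
import math


def production_rate(tau, gain, rate_constant, delay):
    """First-order plus time delay production rate, equations (10.10)-(10.11).

    gain is K, rate_constant is k, and delay is d in the notation of the text.
    Assumption: no production occurs before the delay has elapsed.
    """
    if tau <= delay:
        return 0.0
    return gain * (1.0 - math.exp(-rate_constant * (tau - delay)))
```

In this form the gain sets the plateau that the rate approaches at large infection age, the rate constant sets how quickly the plateau is reached, and the delay shifts the onset of production.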

We also assume that infected cells cannot live longer than age $\tau_d$, at which point these cells die. This model requires fitting of four more parameters than the reaction-diffusion model (seven additional parameters are required for the first-order plus time delay description, but this description obviates the need for the virus yield $Y$ and the rate constants $k_2$ and $k_3$). The considered reactions now become:

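\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_1} \text{infected cell} \tag{10.12a} \\
\text{infected cell} &\longrightarrow \text{virus (age dependent)} \tag{10.12b} \\
\text{infected cell} &\longrightarrow \text{infected cell} + \text{interferon (age dependent)} \tag{10.12c} \\
\text{infected cell} + \text{virus} &\xrightarrow{k_1} \text{infected cell (all ages)} \tag{10.12g}
\end{align}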

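The model equations are then the following set of coupled integro-partial differential equations

\begin{align}
\frac{\partial c_{\text{vir}}}{\partial t} &= \frac{1}{r}\frac{\partial}{\partial r}\left(D^{\text{eff}}_{\text{vir}}\, r\, \frac{\partial c_{\text{vir}}}{\partial r}\right) + \int_0^{\tau_d} c_{\text{infc}}(\tau)\, r_{\text{vir}}(\tau)\, d\tau + R_{\text{vir}} \tag{10.13a} \\
\frac{\partial c_{\text{ifn}}}{\partial t} &= \frac{1}{r}\frac{\partial}{\partial r}\left(D^{\text{eff}}_{\text{ifn}}\, r\, \frac{\partial c_{\text{ifn}}}{\partial r}\right) + \int_0^{\tau_d} c_{\text{infc}}(\tau)\, r_{\text{ifn}}(\tau)\, d\tau + R_{\text{ifn}} \tag{10.13b} \\
\frac{\partial c_{\text{unc}}}{\partial t} &= R_{\text{unc}}, \qquad \frac{\partial c_{\text{infc}}}{\partial t} + \frac{\partial c_{\text{infc}}}{\partial \tau} = R_{\text{infc}} \tag{10.13c} \\
\frac{\partial c_{\text{inoc}}}{\partial t} &= R_{\text{inoc}}, \qquad \frac{\partial c_{\text{dc}}}{\partial t} = R_{\text{dc}} \tag{10.13d} \\
D^{\text{eff}}_{\text{vir}} &= 2 D_{\text{vir}}\, \frac{1 - \phi}{2 + \phi}, \qquad D^{\text{eff}}_{\text{ifn}} = 2 D_{\text{ifn}}\, \frac{1 - \phi}{2 + \phi} \tag{10.13e} \\
\phi &= V_e \left(c_{\text{unc}} + \int_0^{\tau_d} c_{\text{infc}}\, d\tau + c_{\text{inoc}}\right) \tag{10.13f} \\
\left.\frac{d c_{\text{vir}}}{d r}\right|_{r = 0,\, r_{\max}} &= 0, \qquad \left.\frac{d c_{\text{ifn}}}{d r}\right|_{r = 0,\, r_{\max}} = 0 \tag{10.13g} \\
\left.\frac{d c_{\text{infc}}}{d \tau}\right|_{\tau = 0} &= k_1 c_{\text{vir}} c_{\text{unc}}, \qquad \left.\frac{d c_{\text{infc}}}{d \tau}\right|_{\tau = \tau_d} = 0 \tag{10.13h} \\
c_i(t = 0, r)\ &\text{known} \tag{10.13i}
\end{align}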

We discretize the age dimension using orthogonal collocation on Lagrange polynomials [155] with seventeen points, and use the same spatial discretization scheme as in the reaction-diffusion model.
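As a rough illustration of collocation on Lagrange polynomials, the Python sketch below builds the derivative weight matrix whose entries are the derivatives of the Lagrange interpolation polynomials evaluated at the collocation points. The node placement is an assumption: Chebyshev-type points on the interval from zero to the maximum age are used here for simplicity, whereas the method of [155] may place its points differently. The function names and the barycentric formula used to fill the matrix are likewise illustrative choices.

```python
import math


def collocation_nodes(n_intervals, tau_d):
    """n_intervals + 1 points on [0, tau_d], clustered toward both ends.

    Assumption: Chebyshev-type points, chosen only for illustration.
    """
    return [0.5 * tau_d * (1.0 - math.cos(math.pi * j / n_intervals))
            for j in range(n_intervals + 1)]


def derivative_matrix(nodes):
    """A[i][j] = dL_j/dtau at nodes[i], with L_j the Lagrange polynomials."""
    m = len(nodes)
    # Barycentric weights w_j = 1 / prod_{k != j} (tau_j - tau_k).
    weights = []
    for j in range(m):
        prod = 1.0
        for k in range(m):
            if k != j:
                prod *= nodes[j] - nodes[k]
        weights.append(1.0 / prod)
    a = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                a[i][j] = (weights[j] / weights[i]) / (nodes[i] - nodes[j])
        # Each row sums to zero because the derivative of a constant vanishes.
        a[i][i] = -sum(a[i][j] for j in range(m) if j != i)
    return a


nodes = collocation_nodes(16, 26.0)   # seventeen points on an example age range
A = derivative_matrix(nodes)
```

Multiplying `A` by the vector of infected cell concentrations at the collocation points approximates the age derivative in equation (10.13c), which turns the age dimension into a set of coupled ordinary differential equations at each radial grid point.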

The third and fourth columns of images in Figure 10.5 present the optimal fits for this model. In comparison to the experimentally obtained images, this model is able to capture quantitatively both the radial propagation of the infection front and the changes in the intensity of the experimental data. The optimization also yields two sets of parameters with similar fits and similar values of the objective function, but different values for the parameters. Most overtly different are the estimates for the intracellular production rates of virus and interferon, which suggest two different mechanisms for up-regulation of the interferon pathway. These production rates are presented in Figure 10.6. In the first fit, the estimated maximum age of infected cells is roughly 26 hours, and the production of interferon lags significantly after the production of virus. For the second fit, the estimated maximum age of infected cells is only roughly 17 hours, and the production of interferon closely precedes the virus production. Additionally, the production rates in the second fit are approximately an order of magnitude lower than the production rates in the first fit.

10.3.2 Discussion

The models provide estimates for key parameters in the viral infection and host response. In this case, the three model fits only predict similar parameter values for the background fluorescence $i_{\text{bgd}}$ and the viral diffusivity $D_{\text{vir}}$. The remaining parameters are generally different by at least an order of magnitude.

[Figure 10.6 plots: panels (a) and (b), each showing production rate (#/hour) versus infection age (hours) for interferon and virus. Panel (a) spans 0 to 30 hours and 0 to 200 #/hour; panel (b) spans 0 to 18 hours and 0 to 35 #/hour.]

Figure 10.6: Comparison of intracellular production rates of virus and interferon for the segregated model of VSV propagation on DBT cells.

Ware et al. [160] estimate the diffusivity of the VSV virion to be 2.326 × 10^−8 cm^2/sec. Converting this value to cm^2/hr and taking the log10 yields a value of −4.08. The estimated values of this diffusivity, $D_{\text{vir}}$ in Table 10.5, are all within an order of magnitude of this value.

Porterfield et al. [102] and Nichol and Deutsch [96] estimate the diffusivity of γ-interferon to be 7.4 × 10^−7 and 4.1 × 10^−7 cm^2/sec, respectively. Converting these values to cm^2/hr and taking the log10 yields values of −2.57 and −2.83, respectively. These values have the same order of magnitude as the fits for the reaction-diffusion model and the first segregated fit. The second segregated fit predicts the diffusivity of interferon to be roughly two orders of magnitude greater than either of the previously reported values.

                          Reaction-Diffusion    Segregated Fit 1    Segregated Fit 2
Parameter    Units        log10 Value           log10 Value         log10 Value
k1           hr^−1        −12.103               −9.816              −10.159
k2           cm^3/hr      4.941                 NA                  NA
k3           cm^3/hr      −8.258                NA                  NA
k4           cm^3/hr      −8.181                −8.334              −11.890
k5           cm^3/hr      0.637                 0.752               3.717
Dvir         cm^2/hr      −3.737                −3.406              −3.445
Difn         cm^2/hr      −2.938                −2.981              −0.990
Y                         3.841                 NA                  NA
ibgd                      1.577                 1.573               1.571
km/Km                     −17.130               −16.571             −15.868
kwash                     −6.131                −0.785              −0.942
c′unc,1      cm^−3        7.466                 6.529               6.443
c′unc,2      cm^−3        7.602                 7.426               7.428
kvir         hr^−1        NA                    −0.197              0.727
kifn         hr^−1        NA                    −0.387              −0.838
Kvir         hr^−1        NA                    2.021               1.479
Kifn         hr^−1        NA                    2.304               1.434
dvir         hr^−1        NA                    0.834               0.619
difn         hr^−1        NA                    1.283               0.583
τd           hr           NA                    1.416               1.228
Φ                         7.63 × 10^5           6.35 × 10^5         6.30 × 10^5

Table 10.5: Parameter estimates for the VSV/DBT focal infection models. Parameters are estimated for the log10 transformation of the parameters. NA denotes that the parameter is not applicable for the given model.
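The unit conversions quoted for the literature diffusivities are easy to reproduce. The short Python sketch below (the function name is hypothetical) converts a diffusivity from cm^2/sec to cm^2/hr and takes the base ten logarithm, so that literature values can be compared directly with the log10 estimates in the parameter tables.

```python
import math

SECONDS_PER_HOUR = 3600.0


def log10_diffusivity_per_hour(d_cm2_per_sec):
    """Convert a diffusivity from cm^2/sec to cm^2/hr, then take log10."""
    return math.log10(d_cm2_per_sec * SECONDS_PER_HOUR)


vsv_virion = log10_diffusivity_per_hour(2.326e-8)    # Ware et al. [160]
ifn_porterfield = log10_diffusivity_per_hour(7.4e-7)  # Porterfield et al. [102]
ifn_nichol = log10_diffusivity_per_hour(4.1e-7)       # Nichol and Deutsch [96]
```

Rounded to two decimals, these evaluate to −4.08 for the VSV virion and to −2.57 and −2.83 for the two interferon values.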

The infection spread is not well characterized by considering only extracellular species in the model development. Incorporation of simple first-order plus time delay expressions for the production rates of virus and interferon leads to significantly improved quantitative prediction of the given experimental data (roughly a 17% decrease in the objective function $\Phi$ via the addition of four parameters). Additionally, the model fits suggest two different possible mechanisms for production of both virus and interferon. For VSV infection of Krebs-2 carcinoma cells [158] and mouse L cells [161], experimental studies place the first detectable amount of interferon at four and eight hours, respectively. These results suggest that the second segregated fit is more realistic than the first segregated fit.


log10 Parameter    Eigenvector: v1 v2 v3 v4 v5 v6

k1 −0.499 −0.229 −0.561 0.045 0.153

k2 0.289 −0.591 −0.142 −0.165 0.162

k3 −0.026 0.033 −0.016 −0.946 0.067

k4 0.303 −0.223 −0.302 0.070 0.144

k5 0.053 −0.019 −0.190 −0.016 −0.058

Dvir 0.584 −0.160 −0.281 0.096 −0.199

Difn −0.189 −0.677 0.561 0.003 −0.092

Y 0.423 0.180 0.301 −0.072 0.005

ibgd −0.054 −0.033 0.018 0.005

km/Km −0.053 −0.154 0.026 0.044 −0.118

kwash 1

c′unc,1 −0.086 −0.061 0.009 0.069 −0.656

c′unc,2 0.038 0.008 0.224 0.219 0.653

Eigenvalue −2.83e7 −6.81e6 −4.80e6 −2.66e6 −3.37e5 1.02

log10 Parameter    Eigenvector: v7 v8 v9 v10 v11 v12 v13

k1 0.086 0.067 −0.097 0.005 −0.142 −0.372 0.419

k2 −0.468 −0.449 0.154 −0.191 0.097 −0.017 0.013

k3 0.302 −0.033 −0.027 0.061 0.009 0.008 −0.010

k4 0.047 0.119 −0.243 0.767 −0.082 0.171 −0.199

k5 0.032 0.002 0.019 −0.297 −0.862 0.242 −0.254

Dvir 0.498 0.243 −0.032 −0.363 0.192 −0.083 0.141

Difn 0.145 0.316 −0.207 −0.015 −0.114 −0.054 −0.027

Y −0.191 0.018 −0.036 0.196 −0.392 −0.393 0.553

ibgd −0.013 −0.015 −0.126 −0.028 0.035 0.774 0.614

km/Km 0.215 0.052 0.899 0.274 −0.077 0.067 0.110

kwash

c′unc,1 0.244 −0.645 −0.187 0.182 −0.052 −0.066 0.040

c′unc,2 0.515 −0.445 −0.020 −0.037 −0.072 −0.016 0.009

Eigenvalue 1.89e6 6.81e6 2.09e7 3.61e7 1.22e8 4.15e8 1.27e9

Table 10.6: Hessian analysis for the parameter estimates of the reaction-diffusion VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Unreported values denote that the contribution of the parameter to the eigenvector is less than 5 × 10^−4.

Table 10.6 presents the Hessian analysis for the reaction-diffusion model. This analysis indicates that roughly five linear combinations of parameters cannot be estimated from the experimental data. However, Figure 10.5 demonstrates that this model is not capable of capturing the infection dynamics, particularly the magnitude of the intensity.

Tables 10.7 and 10.8 present the Hessian analysis of the objective function $\Phi$ for the segregated model fits. Roughly five linear combinations of parameters yield negative eigenvalues for both fits, indicating that these parameter combinations cannot be estimated from the experimental data. This analysis indicates that the experimental measurements are not informative


log10 Parameter    Eigenvector: v1 v2 v3 v4 v5 v6 v7 v8 v9

k1 0.635 −0.038 0.317 −0.239 −0.187 −0.097 −0.158

k4 −0.707

k5 −0.081 0.090 −0.322 0.047 −0.397 0.017 −0.643

Dvir −0.707

Difn −0.042 −0.124 0.402 −0.076 0.576 0.062 0.019

ibgd 0.014 0.006 0.003 −0.002 −0.001 −0.001 0.003

km/Km 0.376 0.497 −0.055 0.005 0.006 −0.021 0.174

kwash −0.025 −0.141 0.076 −0.042 −0.010 −0.002 −0.152

func,1 −0.044 −0.072 −0.070 0.172 0.060 −0.970 0.046

func,2 −0.105 −0.048 −0.047 0.033 0.027 0.075 −0.026

kvir 0.421 −0.468 −0.532 0.442 0.227 0.161 0.124

kifn −0.059 −0.079 0.419 0.571 −0.540 0.083 0.362

Kvir −0.216 −0.588 0.197 −0.111 −0.099 −0.009 −0.209

Kifn −0.059 −0.233 −0.297 −0.595 −0.318 −0.029 0.541

dvir 0.448 −0.270 0.174 −0.113 −0.120 −0.061 −0.143

difn 0.707

τd −0.707

Eigenvalue −2.27e14 −7.08e13 −1.67e5 −6.18e4 −2.61e3 4.32e3 1.20e4 2.07e4 7.33e4

log10 Parameter    Eigenvector: v10 v11 v12 v13 v14 v15 v16 v17

k1 0.018 −0.116 0.066 −0.511 −0.281 0.086

k4 −0.707

k5 −0.436 −0.333 −0.030 0.039 0.044 −0.021

Dvir 0.707

Difn −0.528 −0.436 0.002 0.047 0.074 −0.031

ibgd −0.003 −0.004 −0.013 −0.025 0.369 0.928

km/Km −0.076 −0.136 0.611 −0.389 0.162

kwash 0.056 0.058 0.948 0.205 0.012 0.015

func,1 −0.026 −0.096 0.014 −0.005 0.002

func,2 0.639 −0.753 −0.002 0.012 −0.010 0.005

kvir −0.105 −0.044 0.049 −0.058 −0.099 0.038

kifn −0.169 −0.156 0.048 0.008 0.034 −0.012

Kvir 0.052 0.113 −0.227 0.328 −0.527 0.221

Kifn −0.227 −0.211 0.067 0.011 0.047 −0.016

dvir 0.130 0.042 −0.185 0.454 0.577 −0.226

difn 0.707

τd 0.707

Eigenvalue 3.19e5 4.93e5 6.32e5 1.44e7 9.39e7 8.87e8 7.08e13 2.27e14

Table 10.7: Hessian analysis for the parameter estimates of the first segregated VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Unreported values denote that the contribution of the parameter to the eigenvector is less than 5 × 10^−4.


log10 Parameter    Eigenvector: v1 v2 v3 v4 v5 v6 v7 v8 v9

k1 0.074 0.412 0.363 −0.016 0.001 0.050 −0.015 −0.023 0.019

k4 0.030 −0.404 0.420 −0.495 0.048 0.063 0.022 0.002 0.005

k5 −0.039 0.147 −0.163 0.537 −0.192 0.285 −0.023 −0.022 0.145

Dvir 0.086 −0.097 0.204 −0.003 −0.010 0.046 0.016 0.005 −0.014

Difn 0.409 −0.022 −0.048 0.000 −0.001 −0.003 −0.002 0.033 0.001

ibgd −0.003 0.016 −0.009 −0.006 0.000 −0.001 0.001

km/Km −0.050 0.553 −0.224 −0.440 0.027 0.006 0.078 0.007 −0.056

kwash 0.011 0.081 0.140 0.323 −0.033 0.018 −0.076 −0.035 −0.059

func,1 0.005 −0.036 0.018 0.121 0.013 −0.094 0.982 0.031 −0.060

func,2 0.001 −0.028 0.010 0.100 −0.027 0.029 −0.069 −0.003 −0.980

kvir 0.411 −0.058 −0.216 −0.014 0.466 0.278 0.038 −0.670 −0.008

kifn 0.689 −0.014 −0.242 −0.024 −0.003 −0.121 −0.036 0.550 −0.005

Kvir 0.157 −0.320 0.230 0.265 −0.033 0.029 −0.028 0.017 0.070

Kifn −0.213 0.066 0.090 0.199 0.859 −0.110 −0.053 0.348 0.009

dvir 0.294 0.432 0.581 0.063 −0.012 0.127 0.036 −0.025 −0.000

difn −0.105 −0.070 −0.056 −0.125 0.025 0.881 0.086 0.347 −0.019

τd 0.024 −0.130 0.204 0.107 −0.002 0.014 −0.025 −0.010 0.017

Eigenvalue −5.04e5 −2.39e5 −1.38e5 −5.37e4 −1.35e2 9.32e3 1.21e4 2.66e5 1.08e5

log10 Parameter    Eigenvector: v10 v11 v12 v13 v14 v15 v16 v17

k1 −0.268 −0.379 0.306 0.098 −0.289 0.494 0.177 0.095

k4 −0.149 −0.067 −0.146 −0.037 −0.475 −0.186 −0.280 −0.134

k5 0.143 −0.293 −0.269 −0.060 −0.487 −0.157 −0.247 −0.116

Dvir 0.103 −0.145 −0.733 −0.246 0.262 0.474 0.060 0.101

Difn 0.014 0.013 −0.255 0.873 0.003 −0.000 0.000

ibgd 0.001 0.003 0.052 0.017 0.008 −0.033 −0.472 0.878

km/Km 0.093 0.220 −0.330 −0.075 −0.317 −0.190 0.313 0.168

kwash −0.754 0.483 −0.211 −0.051 −0.022 −0.075 −0.002 0.011

func,1 −0.056 −0.010 0.011 0.002 −0.028 0.001 −0.001 0.000

func,2 0.084 −0.069 0.010 0.004 −0.090 −0.008 −0.016 −0.005

kvir −0.059 −0.031 0.059 −0.160 −0.012 0.012 0.010 0.006

kifn −0.125 −0.089 0.082 −0.330 −0.062 −0.020 −0.033 −0.015

Kvir 0.372 0.436 0.124 −0.045 −0.373 0.129 0.424 0.239

Kifn 0.065 −0.020 −0.088 0.068 −0.080 −0.026 −0.042 −0.019

dvir 0.312 0.179 0.056 −0.085 0.267 −0.287 −0.228 −0.139

difn −0.064 0.080 0.095 0.062 0.153 0.051 0.082 0.037

τd −0.149 −0.465 −0.076 −0.017 0.166 −0.569 0.511 0.263

Eigenvalue 1.85e5 4.74e5 2.09e6 2.74e6 8.68e6 5.20e7 1.37e8 9.60e8

Table 10.8: Hessian analysis for the parameter estimates of the second segregated VSV/DBT focal infection model. Parameters are estimated for the log10 transformation of the parameters. Negative eigenvalues are likely due to error in the finite difference approximation used to calculate the Hessian. Unreported values denote that the contribution of the parameter to the eigenvector is less than 5 × 10^−4.


[Figure 10.7: image grid. Columns: Data; Segregated Model, Fit 1; Segregated Model, Fit 2. Rows: 24, 48, 96, and 144 hours.]

Figure 10.7: Comparison of representative experimental images to model predictions for VSV propagation on DBT cells in the presence of interferon inhibitors. The white scale bar in the lower left-hand corner of the experimental images is one millimeter.

enough to distinguish between these different mechanisms.

10.3.3 Model Prediction: Infection Propagation in the Presence of Interferon Inhibitors

To validate the model, we compare model predictions of the infection propagation in the presence of interferon inhibitors to experimentally obtained images. We assume that the dosing of interferon inhibitor is sufficiently large to completely inhibit production of interferon. Accordingly, we set the constants k5 and K2 corresponding to interferon production from inoculated cells and the production rate of interferon in infected cells to zero.

Figure 10.7 compares the results for the experimental data with the segregated model predictions. In both cases, the models over-predict the radial propagation of the infection front for the latter two time points. Additionally, the first segregated model predicts even farther propagation of the infection front than the second segregated model. The most likely explanation for the deviations between the data and predictions is that the dosing of the interferon inhibitor is not large enough to completely eliminate the host antiviral response.


10.4 Conclusions

We have used quantitative models to investigate the dynamics of multiple rounds of viral infection and host antiviral response for the focal infection system. For the VSV/BHK virus-host system, extracellular models capture the salient features contained in the measurements, namely unimpeded radial propagation of the infection front as well as amplification of the signal in the initial data points. The model suggests that an initial condition effect for the uninfected cell concentration is necessary to capture the latter feature of the data. This effect may result from the experimental technique used to initiate the viral infection.

For the VSV/DBT virus-host system, the data initially behave similarly to the VSV/BHK system (outward radial propagation of the infection front and amplification of the signal), but then the infection front stagnates and the signal strength diminishes. This stagnation occurs due to the host antiviral mechanism of interferon signaling. The proposed extracellular model is not capable of quantitatively capturing the measurement dynamics. Refining the model by introducing an age segregation significantly improves the data fit. Here, we use simple first-order plus time delay dynamics to model both the production rates of interferon and virus. Consequently, the model fit suggests a rough estimate for intracellular production rates of these species. However, the data are not informative enough to uniquely determine all of the parameters in the model, as evidenced by both the Hessian analysis and the fact that two sets of parameters fit the data equally well.

We also compared segregated model predictions with no interferon production to experiments of the VSV/DBT system dosed with interferon inhibitors. The model predictions overestimated the radial propagation of the infection front. This over-prediction likely results from incomplete inhibition of interferon production.

This work serves as a first step in providing a quantitative understanding of multiple rounds of both viral infection and host antiviral response. Also, comparing model predictions to experimental measurements requires modeling of both the underlying biology of the system and the experimental procedure. Additional experimental measurements, such as microarray data or reporter genes that detect interferon up-regulation, should provide further constraints to the developed model and necessitate future model modification. We expect future iterations of additional experiments, measurements, and modeling to yield a more comprehensive understanding of both viral infections and cell-cell signaling.

Notation

$c_j$  concentration of species j
$c'_{unc}$  initial concentration of uninfected cells in the radius of the initial inoculum for the VSV/BHK-21 fit
$c'_{unc,1}$  initial concentration of uninfected cells in the first radial region of the initial inoculum for the VSV/DBT fit
$c'_{unc,2}$  initial concentration of uninfected cells in the second radial region of the initial inoculum for the VSV/DBT fit
$D_{ifn}$  interferon diffusivity
$D^{eff}_{ifn}$  effective interferon diffusivity
$D_{vir}$  virus diffusivity
$D^{eff}_{vir}$  effective virus diffusivity
$d_j$  time delay for reaction j
$e$  error vector
$h(x_k; \theta)$  model prediction vector of the measurement
$i_{bgd}$  background fluorescence
$K_j$  rate constant for reaction j
$K_m$  equilibrium constant for the measurement
$k_j$  rate constant for reaction j
$k_m$  conversion constant from virus-host concentration to intensity
$k_{wash}$  fraction of dead cells removed during the measurement process
$n_{unc,0}$  initial number of uninfected cells
$n_{vir,0}$  number of viruses in the initial inoculum
$R$  weighting matrix for parameter estimation
$R_j$  production rate of species j
$r_j$  intracellular production rate of species j
$r$  radial dimension
$r_{plate}$  radius of the plate
$t$  time
$V_c$  cell volume
$v_{min}$  minimum detectable virus-host concentration
$v_{max}$  maximum detectable virus-host concentration
$x$  state vector
$Y$  virus yield per infected cell
$y$  measurement vector
$y_m$  intensity measurement
$\Phi$  objective function value for parameter estimation
$\phi$  correction to the diffusivity for hindered diffusion
$\tau$  age of infection
$\tau_d$  maximum age of infection
$\theta$  vector of model parameters

Subscripts

dc  dead cell
ifn  interferon
infc  infected cell
inoc  inoculated cell
unc  uninfected cell
vir  virus
vir-host  virus-host complex


10.5 Appendix

[Figure 10.8: image panels at 18, 30, 48, 72, and 90 hours.]

Figure 10.8: Experimental (averaged) images obtained from the dynamic propagation of VSV on BHK-21 cells. The white scale bar in the lower left-hand corner of the experimental images is one millimeter.


[Figure 10.9: image panels at 7, 27, 48, 72, and 96 hours.]

Figure 10.9: Experimental (averaged) images obtained from the dynamic propagation of VSV on DBT cells. The white scale bar in the lower left-hand corner of the experimental images is one millimeter.


Chapter 11

Multi-level Dynamics of Viral Infections

One of the simplest, yet most intriguing biological organisms is the virus. The virus contains enough genetic information to replicate itself given the machinery of a living host. So powerful is this strategy that viral infections are at once a threat to and a hope for human survival. According to the Joint United Nations Programme on HIV/AIDS (UNAIDS) in 2002, 42 million people were living with human immunodeficiency virus (HIV), 5 million people were newly infected with HIV, and 3.1 million people died due to acquired immune deficiency syndrome (AIDS) related illnesses. At the same time, viruses show promise in anti-tumor therapies as oncolytic agents [9] and as delivery vehicles for gene therapy [95]. The common thread between these two examples is that controlling the propagation of virus spread is essential, and doing so first requires understanding of how viruses propagate. Mathematical models offer one means of quantitatively understanding how viruses propagate, and how to best control this propagation. In particular, models can serve as a beneficial tool in proposing, identifying, and distinguishing between key biological and experimental phenomena contained in data.

Most mathematical models for viral infections have focused exclusively on events at either the intracellular or extracellular level. At the intracellular level, kinetic models have been applied to examine the dynamics of how viruses harness host cells to replicate more virus [73, 27, 29, 3], and how drugs targeting specific virus components affect this replication [122, 30]. These models, however, consider only one infection cycle, whereas infections commonly consist of numerous infection cycles. At the extracellular level, researchers have considered how drug therapies affect the dynamics of populations of viruses [164, 62, 98, 13, 100]. These models, though, neglect the fact that these drugs target specific intracellular viral components. To more realistically model these infections, we recently proposed incorporating both levels of information into the description in a deterministic setting via cell population balances [60].

In this chapter, we consider a limiting case of this general model in which information flows unidirectionally from the intracellular level to the extracellular level. In this case, it is possible to decouple the intracellular and extracellular levels such that one can first solve the equations governing the intracellular description of the model, then use these results to solve the extracellular description of the model. We first briefly review the general cell population balance modeling approach for viral infections. We then introduce the idea of decoupling the intracellular and extracellular descriptions. Two motivating examples illustrate the efficacy of this technique. Finally, we discuss the results and present conclusions.

11.1 Modeling Framework

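We consider population balance models containing an arbitrary number of internal segregations. One can readily extend these models to include external (i.e., spatial) segregations, as is considered in the second example presented in this chapter. The resulting segregated model is then

\begin{align}
\frac{\partial \eta(t,\mathbf{y})}{\partial t} + \nabla \cdot \left(\eta(t,\mathbf{y})\,\mathbf{v}_y\right) &= R_\eta \tag{11.1a} \\
\frac{\partial c^i_j(t,\mathbf{y})}{\partial t} + \nabla \cdot \left(c^i_j(t,\mathbf{y})\,\mathbf{v}_y\right) &= R_j + E_j, \qquad j = 1, \ldots, n \tag{11.1b} \\
\frac{\partial c_k}{\partial t} &= E_k + \int_{\mathbf{y}} \eta(t,\mathbf{y})\, R_k(t,\mathbf{y})\, d\mathbf{y}, \qquad k = 1, \ldots, m \tag{11.1c}
\end{align}

in which $\eta(t,\mathbf{y})\,d\mathbf{y}$, $c^i_j(t,\mathbf{y})\,d\mathbf{y}$, and $c_k$ are the concentrations of infected cells, intracellular components, and extracellular components respectively; $\mathbf{y}$ is a vector of all the internal segregations; $\mathbf{v}_y$ is the velocity vector for each of the $\mathbf{y}$ components; and $R_j$ and $E_j$ are the intracellular and extracellular reaction rates for species j, respectively.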

We focus our attention on the intracellular reaction set, i.e., equation (11.1b). If we can remove the time dependence for this set of equations, then the production rate term $R_k(t,\mathbf{y})$ also becomes time independent. In this case, the intracellular reactions

\begin{equation}
\nabla \cdot \left(c^i_j(t,\mathbf{y})\,\mathbf{v}_y\right) = R_j + E_j, \qquad j = 1, \ldots, n \tag{11.2}
\end{equation}

effectively decouple from the extracellular reactions and population balance

\begin{align}
\frac{\partial \eta(t,\mathbf{y})}{\partial t} + \nabla \cdot \left(\eta(t,\mathbf{y})\,\mathbf{v}_y\right) &= R_\eta \tag{11.3a} \\
\frac{\partial c_k}{\partial t} &= E_k + \int_{\mathbf{y}} \eta(t,\mathbf{y})\, R_k(\mathbf{y})\, d\mathbf{y}, \qquad k = 1, \ldots, m \tag{11.3b}
\end{align}

Consequently, we may first solve the intracellular equations (11.2) to determine the now time-independent production rate term $R_k(\mathbf{y})$, then use this term to solve the remaining equations (11.3). The primary benefit of this decomposition is the potential for significant reductions in both computational expense and the complexity of the resulting systems of equations. We illustrate these claims in the examples.

What then are the biological assumptions that we must make to validate this decomposition? The decomposition clearly requires that the time-dependent extracellular description have little or no interaction with the intracellular events. The most restrictive assumption, then, is that each host cell is infected by identically the same virus (i.e., identical initial conditions for each infected cell), and that the infected cell may affect the extracellular environment but not vice versa. A less restrictive assumption would permit variation in the initial condition, but at the expense of requiring more substantial simulation of the intracellular equations (11.2). Accounting for more extensive interaction from the extracellular to intracellular descriptions, such as super-infection of infected cells, requires solving the full model (11.1).

11.2 Examples

In this section, we consider two examples that illustrate the efficiency of the proposed decomposition. First, we re-examine the model previously presented by Haseltine, Rawlings, and Yin [60]. Then we develop a multi-level model describing the focal infection of murine astrocytoma (DBT) cells by vesicular stomatitis virus (VSV) [77].

11.2.1 Initial Infection for a Generic Viral Infection

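We reconsider the initial infection example of Haseltine, Rawlings, and Yin [60]. This model considers the intracellular species genomic (gen) and template (tem) viral nucleic acids, viral structural protein (str), and viral enzymes V1 and V2. Intracellular reactions include:

\begin{align}
\text{nucleotides} + \text{gen} &\xrightarrow[\text{V}_1]{k^i_1} \text{tem} & a_1 &= k^i_1\, c^i_{V_1} c^i_{gen} \tag{11.4a} \\
\text{amino acids} &\xrightarrow[\text{V}_2,\ \text{tem}]{k^i_2} \text{str} & a_2 &= k^i_2\, c^i_{V_2} c^i_{tem} \tag{11.4b} \\
\text{nucleotides} &\xrightarrow[\text{tem}]{k^i_3} \text{gen} & a_3 &= k^i_3\, c^i_{tem} \tag{11.4c} \\
\text{str} &\xrightarrow{k^i_4} \text{degraded} & a_4 &= k^i_4\, c^i_{str} \tag{11.4d} \\
\text{gen} + \text{str} &\xrightarrow{k^i_5} \text{secreted virus} & a_5 &= k^i_5\, c^i_{gen} c^i_{str} \tag{11.4e}
\end{align}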

Reaction rates are given by the $a_j$ expressions. These events account for the insertion of the viral genome into the host nucleus, production of a viral template used to replicate the viral genome and mass-produce viral structural protein, and the assembly and secretion of viral progeny. We assume that host nucleotides and amino acids are available at constant concentrations.
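As a minimal sketch of how the rate expressions in (11.4) translate into age-dependent intracellular balances, the following integrates the three reacting species from the moment of infection, using the rate constants and initial conditions listed in Table 11.1. The stoichiometry (reaction 1 consumes gen, reactions 2 and 3 are catalytic in tem, and enzymes V1 and V2 stay at 80 and 40 copies) is an assumption read off the reaction arrows. The fixed-step Runge-Kutta integrator and all function names are illustrative stand-ins, not the solver used in this work.

```python
# Rate constants and enzyme levels from Table 11.1 (per-day units).
K1, K2, K3, K4, K5 = 3.13e-4, 25.0, 0.7, 2.0, 7.5e-6
V1, V2 = 80.0, 40.0  # copies per cell, assumed constant


def rates(state):
    """Return d(tem, gen, str)/d(tau) from the mass-action rates a1..a5."""
    tem, gen, struct = state
    a1 = K1 * V1 * gen       # gen -> tem (assumed to consume gen)
    a2 = K2 * V2 * tem       # tem-catalyzed production of str
    a3 = K3 * tem            # tem-catalyzed production of gen
    a4 = K4 * struct         # degradation of str
    a5 = K5 * gen * struct   # assembly and secretion of virus
    return (a1, a3 - a1 - a5, a2 - a4 - a5)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rates(state)
    k2 = rates(shifted(state, k1, h / 2))
    k3 = rates(shifted(state, k2, h / 2))
    k4 = rates(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def solve_intracellular(tau_end=100.0, n_out=100, substeps=100):
    """Integrate from tau = 0; return ages and states at n_out + 1 even ages."""
    h = tau_end / (n_out * substeps)
    state = (0.0, 1.0, 0.0)  # tem, gen, str at tau = 0: one genome inserted
    ages, states = [0.0], [state]
    for i in range(1, n_out + 1):
        for _ in range(substeps):
            state = rk4_step(state, h)
        ages.append(i * tau_end / n_out)
        states.append(state)
    return ages, states


def virus_production(state):
    """Per-cell virus production rate, k5 * gen * str."""
    _, gen, struct = state
    return K5 * gen * struct
```

Calling `solve_intracellular()` tabulates the intracellular state over infection ages from 0 to 100 days, and `virus_production` then gives the per-cell secretion rate at each tabulated age.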

Parameter                 Value              Units
$\tau_d$                  100                days
$k^i_1$                   3.13 × 10^−4       cell/(#-day)
$k^i_2$                   25.0               cell/(#-day)
$k^i_3$                   0.7                day^−1
$k^i_4$                   2.0                day^−1
$k^i_5$                   7.5 × 10^−6        cell/(#-day)
$k_6$                     5.0 × 10^−9        host/(#-day)
$k_7$                     1.0                day^−1
$k_8$                     5.0 × 10^−2        day^−1
$k_9$                     1.0 × 10^−2        day^−1
$k_{10}$                  1.0 × 10^6         #/(host-day)
$c_{unc}(t = 0)$          10^8               #/host
$c_{vir}(t = 0)$          1000               #/host
$c^i_{gen}(\tau = 0)$     1                  #/cell
$c^i_{V_1}(\tau = 0)$     80                 #/cell
$c^i_{V_2}(\tau = 0)$     40                 #/cell

Table 11.1: Model parameters for the initial infection simulation.

Extracellularly, the model tracks uninfected host cells (unc), infected host cells (infc), and virus (vir) for the reactions

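\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_6} \text{infected cell} & a_6 &= k_6\, c_{vir} c_{unc} \tag{11.5a} \\
\text{virus} &\xrightarrow{k_7} \text{degraded} & a_7 &= k_7\, c_{vir} \tag{11.5b} \\
\text{infected cell} &\xrightarrow{k_8} \text{death} & a_8 &= k_8\, c_{infc} \tag{11.5c} \\
\text{uninfected cell} &\xrightarrow{k_9} \text{death} & a_9 &= k_9\, c_{unc} \tag{11.5d} \\
\text{precursors} &\xrightarrow{k_{10}} \text{uninfected cell} & a_{10} &= k_{10} \tag{11.5e}
\end{align}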

These events address the intuitive notions of cell growth, death, and infection by free virus. In this description, infected cells are segregated by the age of infection $\tau$, and it is assumed that all infected cells die by the maximum age $\tau_d$. The model equations for this system are then

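\begin{align}
\frac{\partial \eta(t,\tau)}{\partial t} + \frac{\partial \eta(t,\tau)}{\partial \tau} &= k_6 e_{unc} e_{vir}\, \delta(\tau) - k_8\, \eta(t,\tau) \tag{11.6a} \\
\frac{\partial i_{tem}(t,\tau)}{\partial t} + \frac{\partial i_{tem}(t,\tau)}{\partial \tau} &= R_{tem} \tag{11.6b} \\
\frac{\partial i_{gen}(t,\tau)}{\partial t} + \frac{\partial i_{gen}(t,\tau)}{\partial \tau} &= R_{gen} + \delta(\tau) \tag{11.6c} \\
\frac{\partial i_{str}(t,\tau)}{\partial t} + \frac{\partial i_{str}(t,\tau)}{\partial \tau} &= R_{str} \tag{11.6d} \\
\frac{\partial i_{V_1}(t,\tau)}{\partial t} + \frac{\partial i_{V_1}(t,\tau)}{\partial \tau} &= 80\,\delta(\tau) \tag{11.6e} \\
\frac{\partial i_{V_2}(t,\tau)}{\partial t} + \frac{\partial i_{V_2}(t,\tau)}{\partial \tau} &= 40\,\delta(\tau) \tag{11.6f} \\
\frac{d e_{unc}}{dt} &= k_{10} - k_9 e_{unc} - k_6 e_{unc} e_{vir} \tag{11.6g} \\
\frac{d e_{vir}}{dt} &= -k_7 e_{vir} - k_6 e_{unc} e_{vir} + \int_0^{\tau_d} \eta(t,\tau)\, R_{vir}(\tau)\, d\tau \tag{11.6h}
\end{align}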

Table 11.1 presents the initial conditions and rate constants used for this simulation. These parameters are the same as those used in Chapter 10.

For this example, the intracellular events decouple from the population balance (11.6a) in equation (11.6) due to the fact that the governing equations for the intracellular species do not depend on time. Consequently, we can first solve for the intracellular age distribution, i.e.

\begin{align}
\frac{\partial i_{tem}(\tau)}{\partial \tau} &= R_{tem} \tag{11.7a} \\
\frac{\partial i_{gen}(\tau)}{\partial \tau} &= R_{gen} + \delta(\tau) \tag{11.7b} \\
\frac{\partial i_{str}(\tau)}{\partial \tau} &= R_{str} \tag{11.7c} \\
\frac{\partial i_{V_1}(\tau)}{\partial \tau} &= 80\,\delta(\tau) \tag{11.7d} \\
\frac{\partial i_{V_2}(\tau)}{\partial \tau} &= 40\,\delta(\tau) \tag{11.7e}
\end{align}

and subsequently for the population-level dynamics

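\begin{align}
\frac{\partial \eta(t,\tau)}{\partial t} + \frac{\partial \eta(t,\tau)}{\partial \tau} &= k_6 e_{unc} e_{vir}\, \delta(\tau) - k_8\, \eta(t,\tau) \tag{11.8a} \\
\frac{d e_{unc}}{dt} &= k_{10} - k_9 e_{unc} - k_6 e_{unc} e_{vir} \tag{11.8b} \\
\frac{d e_{vir}}{dt} &= -k_7 e_{vir} - k_6 e_{unc} e_{vir} + \int_0^{\tau_d} \eta(t,\tau)\, R_{vir}(\tau)\, d\tau \tag{11.8c} \\
R_{vir}(\tau) &= k^i_5\, c^i_{gen}(\tau)\, c^i_{str}(\tau) \tag{11.8d}
\end{align}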
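The age-structured balance can also be illustrated with a much simpler scheme than the collocation method used in this work. The sketch below is a swapped-in technique, the method of characteristics on an age grid whose spacing equals the time step, and it assumes a first-order death rate and a known flux of newly infected cells entering at age zero. The function names and the constant-birth test case are hypothetical and serve only to show how infected cells age, die, and contribute to the virus production integral.

```python
import math


def advance_age_distribution(eta, birth_flux, death_rate, dt):
    """Advance eta by one step dt on an age grid with spacing dt.

    Age and time advance together along characteristics, so each age class
    shifts up by one grid cell while decaying at the first-order death rate.
    Newly infected cells enter at age zero, and the oldest class is dropped,
    which mimics a maximum age of infection.
    """
    decay = math.exp(-death_rate * dt)
    return [birth_flux] + [value * decay for value in eta[:-1]]


def virus_source(eta, r_vir, dt):
    """Trapezoidal estimate of the integral of eta(tau) * R_vir(tau) over age."""
    terms = [e * r for e, r in zip(eta, r_vir)]
    return dt * (sum(terms) - 0.5 * (terms[0] + terms[-1]))


# Hypothetical check: constant birth flux, death rate 0.05 per day,
# ages 0 to 100 days on a grid of 0.5 days.
dt, death_rate, birth_flux = 0.5, 0.05, 2.0
eta = [0.0] * 201
for _ in range(300):
    eta = advance_age_distribution(eta, birth_flux, death_rate, dt)
source = virus_source(eta, [1.0] * len(eta), dt)
```

With a constant birth flux the age distribution settles to an exponential in age, which gives a convenient check on the shift-and-decay update before coupling it to the extracellular balances.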


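To solve the coupled sets of integro-partial differential equations, we use orthogonal collocation of Lagrange polynomials on finite elements [155]. When $t < \tau_d$, we use a single element with the transformation

\[
\bar{\tau} = \frac{\tau}{t}
\]

For this case, the population balance becomes, by application of the chain rule,

\begin{align}
\frac{\partial \eta(t,\tau)}{\partial t} + \frac{\partial \eta(t,\tau)}{\partial \tau} &= \frac{\partial \eta(t,\bar{\tau})}{\partial t} + \frac{\partial \eta(t,\bar{\tau})}{\partial \bar{\tau}} \frac{\partial \bar{\tau}}{\partial t} + \frac{\partial \eta(t,\bar{\tau})}{\partial \bar{\tau}} \frac{\partial \bar{\tau}}{\partial \tau} \tag{11.9} \\
&= \frac{\partial \eta(t,\bar{\tau})}{\partial t} + \frac{1 - \bar{\tau}}{t} \frac{\partial \eta(t,\bar{\tau})}{\partial \bar{\tau}} \tag{11.10}
\end{align}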
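The step from (11.9) to (11.10) uses only the two partial derivatives of the scaled age. As a brief check, writing the scaled age as $\bar{\tau} = \tau / t$ (the bar is notation assumed here to distinguish the scaled age from $\tau$):

\[
\frac{\partial \bar{\tau}}{\partial t} = -\frac{\tau}{t^2} = -\frac{\bar{\tau}}{t},
\qquad
\frac{\partial \bar{\tau}}{\partial \tau} = \frac{1}{t},
\qquad \text{so} \qquad
\frac{\partial \bar{\tau}}{\partial t} + \frac{\partial \bar{\tau}}{\partial \tau} = \frac{1 - \bar{\tau}}{t}.
\]

The scaling maps the occupied age range $[0, t]$ onto the fixed interval $[0, 1]$, so the collocation points stay fixed in $\bar{\tau}$ while the element stretches in $\tau$.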

Similar expressions can be derived for each of the segregated species. When $t > \tau_d$, we use two elements with the transformations

\[
\bar{\tau}_1 = \frac{\tau - t + \tau_d}{\tau_{min} - t + \tau_d},
\qquad
\bar{\tau}_2 = \frac{\tau}{t - \tau_{min}}
\]

Transformation of the population balance and segregated intracellular species follows similarly as before by application of the chain rule. Also, continuity between elements is enforced, i.e.

\[
\bar{\tau}_1 \big|_{\tau_{min}} = \bar{\tau}_2 \big|_{\tau_{min}}
\]

We use twenty-five collocation points for each finite element. The discretized PDE yields a system of differential-algebraic equations, which we solve using the package DASKR, a variant of the predictor-corrector solver DASPK [15]. For the decoupled system, we first solve for the intracellular reactions, i.e., equations (11.7), over the age range $[0, \tau_d]$ using one hundred evenly incremented time points. Virus production rates required by the cell population balance, i.e., equation (11.8), are calculated from this intracellular information by linearly interpolating between the time points.
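The interpolation step can be sketched as follows. This is an illustrative helper, not code from this work: the function name, the clamping at the ends of the table, and the example values are assumptions.

```python
from bisect import bisect_right


def interpolate(ages, values, tau):
    """Piecewise-linear interpolation of tabulated values at age tau.

    ages must be strictly increasing. Ages outside the tabulated range are
    clamped to the first or last tabulated value.
    """
    if tau <= ages[0]:
        return values[0]
    if tau >= ages[-1]:
        return values[-1]
    hi = bisect_right(ages, tau)  # first tabulated age strictly above tau
    lo = hi - 1
    weight = (tau - ages[lo]) / (ages[hi] - ages[lo])
    return values[lo] + weight * (values[hi] - values[lo])


# Hypothetical table of production rates at three infection ages.
table_ages = [0.0, 1.0, 2.0]
table_rates = [0.0, 10.0, 30.0]
midpoint_rate = interpolate(table_ages, table_rates, 1.5)
```

In the decoupled solution, a lookup of this kind supplies the production rate at whatever ages the population balance discretization requires, without re-solving the intracellular equations.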

Figure 11.1 plots the full and decoupled model solutions for all extracellular species. The decoupled solution provides results indistinguishable from the full solution, but at roughly one third the computational expense (15.6 CPU seconds versus 45.2 CPU seconds on a 1.6 GHz Intel Centrino processor). The majority of the decrease in computational expense is directly attributable to the reduced size of the state vector; the decoupled solution requires only discretization of the cell population balance, whereas the full solution also requires discretization of all intracellular species. Because the predictor-corrector method employed by DASKR requires the solution of a Newton iteration, an operation that scales cubically with the size of the state, we expect more dramatic decreases in computational expense for the decoupled solution as the number of intracellular species increases. Additionally, formulating the problem in this manner significantly reduces the potential for discretization problems due to stiffness in the intracellular components.



[Figure 11.1 plots: (a) Extracellular Components (×10^−7 #/host) versus Time (Days), with curves for Infected Cells, Uninfected Cells, and Virus; (b) Percent Error versus Time (Days).]

Figure 11.1: (a) Comparison of the full (lines) and decoupled (points) model solutions for the initial infection example. (b) Percent error for the decoupled model solution, assuming the full solution is exact.

11.2.2 VSV/DBT Focal Infection

In this section, we incorporate intracellular events corresponding to viral infection and subsequent host-cell response for VSV infection of DBT cells. VSV is a member of the Rhabdoviridae family consisting of enveloped RNA viruses [129]. Its compact genome is only approximately 12 kb in length, and encodes genetic information for five proteins: nucleoprotein (N), phosphoprotein (P), matrix (M), glycoprotein (G), and large protein (L) [129]. Recently, researchers have begun investigating the potential of using VSV as an oncolytic agent for anti-tumor therapies (see [44], for example). However, the host antiviral strategy of interferon signaling can substantially limit the propagation of the infection. To maximize the therapeutic benefit of this agent, then, a better understanding of multiple rounds of viral propagation and host antiviral response is needed. The focal infection system proposed by Duca et al. [26] provides one in vitro platform for investigating these dynamics. We have already considered several simple dynamic models for this system given only measurements on an extracellular level in Chapter 10. Here we consider the potential for incorporating intracellular measurements by developing a model that contains intracellular structure and captures the spatial spread of the infection.

[Figure 11.2 schematic. Labels: Infected Cell, Uninfected Cell, VSV, IFN, RNP, dsRNA, L, M, mRNA_L, mRNA_M, IFN-α/β genes. Annotation: Interferon signaling occurs more rapidly than virus propagation.]

Figure 11.2: Schematic of modeled events for the infection of DBT cells by VSV. Infection of a host cell begins with insertion of the viral ribonucleoprotein (RNP) and polymerase (L). The polymerase reversibly binds to the RNP to form a double-stranded RNA complex (dsRNA), which serves as the template for viral transcription and replication. The model assumes that the matrix (M) and L proteins limit virus growth, and explicitly accounts for transcription and translation of these proteins. The M and L proteins combine with the RNP to form progeny viruses, which are secreted from infected cells. Additionally, the viral dsRNA induces up-regulation of the host interferon (IFN) genes, leading to production of IFN, which is also secreted from infected cells. The model allows secreted VSV and IFN to compete for uninfected cells: if VSV binds first, then the infection cycle starts again; if IFN binds first, up-regulation of host IFN genes leads to an inoculated state in which viral RNP is immediately degraded upon entry into the cell.

[Figure 11.3 schematic. Labels: Intracellular, Extracellular, dsRNA, IRF, IRFP, PRD, mRNA_IFN, IFN.]

Figure 11.3: Detailed schematic of modeled events for the up-regulation of interferon (IFN) genes. Presence of viral double-stranded RNA (dsRNA) leads to the phosphorylation of the interferon-response factor (IRF to IRFP). IRFP reversibly binds to the positive regulatory domain (PRD) of the interferon gene to up-regulate synthesis of interferon messenger RNA (mRNA_IFN). Translation of mRNA_IFN produces interferon, which is secreted from the cell.

For model development, we make the following assumptions for this system:

1. Virus replication is limited by large (L) and matrix (M) protein production.

2. Interferon genes are up-regulated by detection of the viral double-stranded ribonucleic acid (dsRNA) species.

3. Interferon up-regulation in inoculated cells occurs over a significantly faster time scale than events associated with viral infection (transcription, translation, and replication).

4. Antiviral mechanisms in inoculated cells destroy viral ribonucleoproteins (RNPs) immediately upon entry into the cell.

These assumptions greatly simplify the biological complexity of both viral replication and host antiviral response, as the following descriptions of the intracellular events leading to viral replication and up-regulation of the interferon signaling pathway make clear. Additional detail could readily be incorporated into future developments of this model.

Intracellular Viral Replication

For the intracellular model of viral replication, we start with the previous model development for this infection of Lim, Lang, and Yin (in preparation), using many of the same reaction expressions and rate constants. The primary differences between this previous work and the model derived here are that we track only the large and matrix proteins as opposed to all five VSV proteins, and we use a more detailed model for assembly of the virus. Consequently, we incorporate the following steps leading to virus replication:

1. Virus binds to a host cell and inserts a single viral ribonucleoprotein (RNP) and fifty polymerases (L protein).

2. The polymerase (L) reversibly binds to the ribonucleoprotein (RNP) to form the RL species, which serves as the template for both transcription and replication.

3. For virus replication, we neglect formation of positive-strand RNA and assume that packaging of negative-strand RNA by nucleoprotein occurs instantaneously.

4. Transcription proceeds processively from the 5' to the 3' end of the ribonucleoprotein. Since the gene order for the studied VSV is N-P-M-G-L, transcription yields the messenger RNA for the matrix protein (mRNAM) first. At this point, the polymerase may either dissociate from the template (this polymerase-template complex is denoted RL1), or continue to transcribe the mRNA for the large protein (mRNAL).

5. The rate of translation depends solely on the concentrations of the mRNA species (concentrations of amino acids and host ribosomes are not rate limiting).

6. Assembly of the virus capsid begins with construction of the matrix core. We model this construction as a polymerization-like process initiated by the fusion of two M proteins and sequential addition of M proteins. This sequential addition is approximated as a ten-step process. Given the completed matrix core Mfull, assembly continues with packaging of first the ribonucleoprotein (to form the MR complex) and then 50 L proteins to form a virus that is secreted from the cell.

Reactions (11.11) account for all of these events. All reactions are elementary as written unless specified otherwise by the reaction rate aj. Values for all intracellular rate constants (the ki's) and nonzero initial conditions are given in Table 11.2.


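\begin{align}
\text{RNP} &\xrightarrow{k^i_1} \text{degraded} \tag{11.11a} \\
\text{RNP} + \text{L} &\overset{k^i_2}{\underset{k^i_{-2}}{\rightleftharpoons}} \text{RL} \tag{11.11b} \\
\text{RL} &\xrightarrow{k^i_3} \text{RL1} + \text{mRNA}_M \tag{11.11c} \\
\text{RL1} &\xrightarrow{k^i_4} \text{RNP} + \text{L} \tag{11.11d} \\
\text{RL1} &\xrightarrow{k^i_5} \text{RNP} + \text{L} + \text{mRNA}_L \tag{11.11e} \\
\text{RL} &\xrightarrow{k^i_6} 2\,\text{RNP} + \text{L} \tag{11.11f} \\
\text{mRNA}_M &\xrightarrow{k^i_7} \text{mRNA}_M + \text{M} \tag{11.11g} \\
\text{mRNA}_L &\xrightarrow{k^i_8} \text{mRNA}_L + \text{L} \tag{11.11h} \\
\text{mRNA}_M &\xrightarrow{k^i_9} \text{degraded} \tag{11.11i} \\
\text{mRNA}_L &\xrightarrow{k^i_{10}} \text{degraded} \tag{11.11j} \\
\text{M} &\xrightarrow{k^i_{11}} \text{degraded} \tag{11.11k} \\
\text{L} &\xrightarrow{k^i_{12}} \text{degraded} \tag{11.11l} \\
2\,\text{M} &\xrightarrow{k^i_{13}} \text{M}_2 \tag{11.11m} \\
\text{M}_j + 182.2\,\text{M} &\xrightarrow{k^i_{14}} \text{M}_{j+1}, \qquad a_{14} = k^i_{14}\, c_{M_j} c_M \tag{11.11n} \\
\text{M}_{full} + \text{R} &\xrightarrow{k^i_{15}} \text{MR} \tag{11.11o} \\
50\,\text{L} + \text{MR} &\xrightarrow{k^i_{16}} \text{secreted virus}, \qquad a_{16} = k^i_{16}\, c_L c_{MR} \tag{11.11p}
\end{align}

The infected cell in Figure 11.2 presents a brief overview of these modeled intracellular events.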

Intracellular Host Antiviral Response

Figure 11.3 presents the modeled events for the host antiviral response, namely up-regulation of the interferon pathway, in greater detail. These events reflect a substantially simplified picture of interferon signaling (recently reviewed in [133, 54]). Initiation of this response occurs when the cell recognizes viral double-stranded RNA species, leading to phosphorylation of an interferon-response factor (IRF). Here we assume that the cell initially has twenty copies of IRF. The phosphorylated species, IRFP, binds to the positive regulatory domain (PRD) of the interferon gene to form the induced complex IP, which expresses interferon. Interferon secreted from the cell can diffuse radially and bind to uninfected cells to form inoculated cells that are resistant to viral infection (i.e., inserted viral ribonucleoprotein is degraded immediately). Figure 11.2 illustrates this inoculation.

Parameter        Value
$L_0$            50 #/(infected cell)
$RNP_0$          1 #/(infected cell)
$k^i_1$          10^−0.265 hr^−1
$k^i_2$          10^1.663 cell/hr
$k^i_{-2}$       10^0.369 hr^−1
$k^i_3$          10^0.529 hr^−1
$k^i_4$          10^1.428 hr^−1
$k^i_5$          10^0.319 hr^−1
$k^i_6$          10^0.177 hr^−1
$k^i_7$          10^1.796 hr^−1
$k^i_8$          10^1.031 hr^−1
$k^i_9$          10^−0.065 hr^−1
$k^i_{10}$       10^−0.265 hr^−1
$k^i_{11}$       10^−0.227 hr^−1
$k^i_{12}$       10^−0.902 hr^−1
$k^i_{13}$       10^−2.683 cell/hr
$k^i_{14}$       10^−0.823 cell/hr
$k^i_{15}$       10^−1.525 cell/hr
$k^i_{16}$       10^0.813 cell/hr

Table 11.2: Initial conditions and rate constants for the intracellular reactions of the VSV infection of DBT cells.

This model significantly reduces the complexity of the interferon response. For example, IFN-β-induced protein kinase R and IFN-α-induced 2'-5' oligoadenylate synthetase respectively inactivate protein synthesis by inactivation of the eukaryotic protein synthesis initiation factor eIF-2 and destroy mRNA through activation of the cellular endonuclease RNase L [162, 135]. Our simpler model does not distinguish between α, β, and γ interferons, nor does it account explicitly for eIF-2 or RNase L. Consequently, the model treats cellular antiviral response as all or none: either cells are susceptible to viral infection, or they are resistant due to interferon inoculation.


Parameter        Value
$IRF_0$          20 #/cell
$k^i_{17}$       10^1.597 cell/hr
$k^i_{18}$       10^1.294 cell/hr
$k^i_{-18}$      10^3.451 hr^−1
$k^i_{19}$       10^2.329 hr^−1
$k^i_{20}$       10^−0.294 hr^−1
$k^i_{21}$       10^−0.636 hr^−1
$k^i_{22}$       10^−1.303 hr^−1

Table 11.3: Initial conditions and rate constants for the reactions describing the intracellular host antiviral response of the VSV infection of DBT cells.

The intracellular reactions considered in this example are

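\begin{align}
\text{RL} + \text{IRF} &\xrightarrow{k^i_{17}} \text{RL} + \text{IRFP} \tag{11.12a} \\
\text{RL1} + \text{IRF} &\xrightarrow{k^i_{17}} \text{RL1} + \text{IRFP} \tag{11.12b} \\
\text{IRFP} + \text{PRD} &\overset{k^i_{18}}{\underset{k^i_{-18}}{\rightleftharpoons}} \text{IP} \tag{11.12c} \\
\text{IP} &\xrightarrow{k^i_{19}} \text{IRFP} + \text{PRD} + \text{mRNA}_{IFN} \tag{11.12d} \\
\text{mRNA}_{IFN} &\xrightarrow{k^i_{20}} \text{mRNA}_{IFN} + \text{IFN} \tag{11.12e} \\
\text{mRNA}_{IFN} &\xrightarrow{k^i_{21}} \text{degraded} \tag{11.12f} \\
\text{IFN} &\xrightarrow{k^i_{22}} \text{secreted} \tag{11.12g}
\end{align}

All reactions are elementary as written. Values for the required rate constants (the ki's) are given in Table 11.3.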
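Because reactions (11.12) are elementary, their mass-action rates follow directly from the reaction arrows. The sketch below assembles the resulting species balances with the rate constants of Table 11.3. The function name, the dictionary layout, and the single PRD copy used in the example state are hypothetical choices for illustration (the text specifies only the twenty initial copies of IRF).

```python
# Rate constants from Table 11.3.
K17 = 10 ** 1.597     # cell/hr
K18 = 10 ** 1.294     # cell/hr, forward binding in (11.12c)
K18R = 10 ** 3.451    # 1/hr, reverse of (11.12c)
K19 = 10 ** 2.329     # 1/hr
K20 = 10 ** -0.294    # 1/hr
K21 = 10 ** -0.636    # 1/hr
K22 = 10 ** -1.303    # 1/hr


def interferon_rates(c):
    """Net production rates (copies per cell per hour) for reactions (11.12).

    c maps species names to copies per cell: RL, RL1, IRF, IRFP, PRD, IP,
    mRNA_IFN, IFN.
    """
    a17 = K17 * (c["RL"] + c["RL1"]) * c["IRF"]   # (11.12a) and (11.12b)
    a18f = K18 * c["IRFP"] * c["PRD"]             # binding, (11.12c)
    a18r = K18R * c["IP"]                         # unbinding, (11.12c)
    a19 = K19 * c["IP"]                           # mRNA synthesis, (11.12d)
    a20 = K20 * c["mRNA_IFN"]                     # translation, (11.12e)
    a21 = K21 * c["mRNA_IFN"]                     # mRNA degradation, (11.12f)
    a22 = K22 * c["IFN"]                          # secretion, (11.12g)
    return {
        "IRF": -a17,
        "IRFP": a17 - a18f + a18r + a19,
        "PRD": -a18f + a18r + a19,
        "IP": a18f - a18r - a19,
        "mRNA_IFN": a19 - a21,
        "IFN": a20 - a22,
        "secreted_IFN": a22,
    }


# Example state: an uninfected cell (no RL or RL1), PRD count assumed to be 1.
resting = {"RL": 0.0, "RL1": 0.0, "IRF": 20.0, "IRFP": 0.0,
           "PRD": 1.0, "IP": 0.0, "mRNA_IFN": 0.0, "IFN": 0.0}
resting_rates = interferon_rates(resting)
```

With no viral template present, every rate is zero, which reflects the model assumption that interferon up-regulation is triggered only by the viral double-stranded RNA species.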

11.2.3 Model Solution

For the given assumptions, the intracellular events decouple from the extracellular events. Accordingly, the model equations for the intracellular events are of the form

\begin{equation}
\frac{d c^i_j}{d \tau} = R_j \tag{11.13}
\end{equation}

in which $c^i_j$ is the intracellular concentration of the jth species, $\tau$ is the infected cell age, and $R_j$ is the production rate of the jth species. We solve this ODE system using the package DASKR with a time step of 0.1 hours.


Parameter                                                       Symbol           Value
Cell volume                                                     $V_c$            3.4 × 10^−9 ml
Initial number of uninfected cells                              $n_{unc,0}$      10^6 cells
Number of viruses in the initial inoculum                       $n_{vir,0}$      8.0 × 10^4 viruses
Radius of the plate                                             $r_{plate}$      1.75 cm
Infection rate constant                                         $k_1$            10^−10.159 cm^3/hr
Inoculation rate constant                                       $k_2$            10^−11.890 cm^3/hr
Interferon production rate constant                             $k_3$            10^3.717 cm^3/hr
Virus diffusivity                                               $D_{vir}$        10^−3.445 cm^2/hr
Interferon diffusivity                                          $D_{ifn}$        10^−0.990 cm^2/hr
Background fluorescence                                         $i_{bgd}$        10^1.571
Measurement rate constant                                       $k_m/K_m$        10^−15.868
Fraction of dead cells removed during the measurement process   $k_{wash}$       10^−0.942
Initial concentration of uninfected cells                       $c'_{unc,1}$     10^6.443 cm^−3
Initial concentration of uninfected cells                       $c'_{unc,2}$     10^7.428 cm^−3
Maximum age of infection                                        $\tau_d$         10^1.228 hr

Table 11.4: Extracellular model parameters for the infection of DBT cells by VSV.

Extracellular reactions considered by this example are

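\begin{align}
\text{virus} + \text{uninfected cell} &\xrightarrow{k_1} \text{infected cell} \tag{11.14a} \\
\text{infected cell} &\longrightarrow \text{virus} \quad \text{(age dependent)} \tag{11.14b} \\
\text{infected cell} &\longrightarrow \text{infected cell} + \text{interferon} \quad \text{(age dependent)} \tag{11.14c}
\end{align}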




\[ \text{infected cell} + \text{virus} \xrightarrow{k_1} \text{infected cell} \quad \text{(all ages)} \tag{11.14g} \]

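The model equations are then the following set of coupled integro-partial differential equations

\begin{align}
\frac{\partial c_{vir}}{\partial t} &= \frac{1}{r} \frac{\partial}{\partial r} \left( D^{eff}_{vir}\, r\, \frac{\partial c_{vir}}{\partial r} \right) + \int_0^{\tau_d} c_{infc}(\tau) R^i_{vir}(\tau)\, d\tau + R_{vir} \tag{11.15a} \\
\frac{\partial c_{ifn}}{\partial t} &= \frac{1}{r} \frac{\partial}{\partial r} \left( D^{eff}_{ifn}\, r\, \frac{\partial c_{ifn}}{\partial r} \right) + \int_0^{\tau_d} c_{infc}(\tau) R^i_{ifn}(\tau)\, d\tau + R_{ifn} \tag{11.15b} \\
\frac{\partial c_{unc}}{\partial t} &= R_{unc}, \qquad \frac{\partial c_{infc}}{\partial t} + \frac{\partial c_{infc}}{\partial \tau} = R_{infc} \tag{11.15c} \\
\frac{\partial c_{inoc}}{\partial t} &= R_{inoc}, \qquad \frac{\partial c_{dc}}{\partial t} = R_{dc} \tag{11.15d} \\
D^{eff}_{vir} &= 2 D_{vir} \frac{1 - \phi}{2 + \phi}, \qquad D^{eff}_{ifn} = 2 D_{ifn} \frac{1 - \phi}{2 + \phi} \tag{11.15e} \\
\phi &= V_e \left( c_{unc} + \int_0^{\tau_d} c_{infc}\, d\tau + c_{inoc} \right) \tag{11.15f} \\
\frac{d c_{vir}}{d r} &= 0, \qquad \frac{d c_{ifn}}{d r} = 0 \tag{11.15g} \\
\left. \frac{d c_{infc}}{d \tau} \right|_{\tau = 0} &= k_1 c_{vir} c_{unc}, \qquad \left. \frac{d c_{infc}}{d \tau} \right|_{\tau = \tau_d} = 0 \tag{11.15h} \\
c_i(t = 0, r) &\ \text{known} \tag{11.15i}
\end{align}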

The production rates of virus and interferon are approximated using linear interpolation between time points for the intracellular model results. We discretize the spatial dimension using central differences with an increment of 0.025 cm. We assume that cells are spherical objects, with the height of the cell monolayer equal to the resulting cell diameter. Concentrations for all species are calculated assuming that the volume of the monolayer is cylindrical. The dimensions of this cylinder are given by the height of the cell monolayer and the radius of the plate. We model the concentration of the initial virus inoculum using the piecewise linear continuous function

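\[
c_{vir}(t = 0, r) =
\begin{cases}
c_{vir,0}, & r < 0.075 \text{ cm} \\
\left( 1 - \dfrac{20}{\text{cm}} (r - 0.075) \right) c_{vir,0}, & 0.075 \text{ cm} \le r \le 0.125 \text{ cm} \\
0, & r > 0.125 \text{ cm}
\end{cases}
\tag{11.16}
\]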
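A minimal sketch of how the inoculum profile of equation (11.16) could be evaluated on the radial grid. It assumes the profile is continuous, so that the linear ramp reaches zero at r = 0.125 cm; the function name and the use of NumPy are choices made here for illustration, not part of the original implementation.

```python
import numpy as np

def initial_virus_profile(r_cm, c_vir0):
    """Piecewise-linear initial virus concentration in the spirit of equation (11.16)."""
    r = np.asarray(r_cm, dtype=float)
    # np.interp holds the end values constant outside the knots:
    # c_vir0 for r < 0.075 cm and zero for r > 0.125 cm.
    return np.interp(r, [0.075, 0.125], [c_vir0, 0.0])

# Radial grid with the 0.025 cm increment quoted in the text, out to the 1.75 cm plate radius.
r_grid = np.linspace(0.0, 1.75, 71)
profile = initial_virus_profile(r_grid, c_vir0=1.0)
```

Interpolating between the two knots reproduces the middle branch of the piecewise definition without writing the three cases separately.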

The initial radial profile for the uninfected cell concentration is given by the expression

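\[
c_{unc}(t = 0, r) =
\begin{cases}
c'_{unc,2}, & r < 0.025 \text{ cm} \\
c'_{unc,2} - \dfrac{20 (c'_{unc,2} - c'_{unc,1})}{\text{cm}} (r - 0.025), & 0.025 \text{ cm} \le r < 0.075 \text{ cm} \\
c'_{unc,1}, & 0.075 \le r < 0.1 \text{ cm} \\
c_{unc,0} - \dfrac{20 (c_{unc,0} - c'_{unc,1})}{\text{cm}} (r - 0.025), & 0.1 \text{ cm} \le r < 0.15 \text{ cm} \\
c_{unc,0}, & r > 0.15 \text{ cm}
\end{cases}
\tag{11.17}
\]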

We first use the model to predict infection spread for the focal infection system [26].


Model predictions for the experimental measurements are calculated via the relations

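\[ K_m = \frac{c_{vir\text{-}host}}{c_{vir} \left( c_{unc} + c_{infc} + k_{wash} c_{dc} \right)} \tag{11.18} \]

\[
y_m =
\begin{cases}
i_{bgd}, & c_{vir\text{-}host} \le v_{min} \\
k_m c_{vir\text{-}host} + i_{bgd}, & v_{min} < c_{vir\text{-}host} < v_{max} \\
255, & c_{vir\text{-}host} \ge v_{max}
\end{cases}
\tag{11.19}
\]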

in which $K_m$ is the equilibrium constant; $c_{vir}$, $c_{unc}$, $c_{infc}$, $c_{dc}$, and $c_{vir\text{-}host}$ refer to the concentrations of virus, uninfected cells, infected cells, dead cells, and virus-host complexes, respectively; $y_m$ is the intensity measurement; $k_m$ is the conversion constant from concentration to intensity; $i_{bgd}$ is the background fluorescence (in intensity); and $v_{min}$ and $v_{max}$ are the minimum and maximum detectable virus-host concentrations. Parameters for the measurement and the extracellular model are given in Table 11.4.
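The measurement relations (11.18) and (11.19) amount to an equilibrium binding step followed by a saturating linear map to intensity. A minimal sketch, assuming NumPy arrays of concentrations on the radial grid; the function name and the numerical values in any call are hypothetical, and $v_{min}$ and $v_{max}$ must be supplied by the user because the text does not list them.

```python
import numpy as np

def predicted_intensity(c_vir, c_unc, c_infc, c_dc,
                        K_m, k_m, k_wash, i_bgd, v_min, v_max):
    """Map model concentrations to image intensity via equations (11.18)-(11.19)."""
    # Equation (11.18) solved for the virus-host complex concentration.
    c_vir_host = K_m * c_vir * (c_unc + c_infc + k_wash * c_dc)
    # Linear conversion to intensity in the detectable range.
    y = k_m * c_vir_host + i_bgd
    # Below the detection limit only the background is seen.
    y = np.where(c_vir_host <= v_min, i_bgd, y)
    # Above the upper limit the 8-bit intensity saturates at 255.
    y = np.where(c_vir_host >= v_max, 255.0, y)
    return y
```

Because only the product $k_m K_m$ enters the unsaturated branch, the two constants cannot be separated from intensity data alone.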

Figure 11.4 compares the experimental data, simple segregated model predictions (results taken directly from Chapter 10), and the predictions for the model developed in this chapter. This figure demonstrates excellent agreement between the two model predictions and the experimental data. Clearly the experimental data is not informative enough to merit all of the intracellular structure developed in this section. This model requires eighteen more parameters than the simple segregated model, which approximates intracellular production rates of both virus and interferon using first-order plus time delay models. Figure 11.5 demonstrates that the total production of both virus and interferon on a per infected cell basis is similar for both the simple and intracellularly-structured models. However, the simpler model includes no intracellular structure and hence cannot predict concentrations for specific intracellular components. For example, we might consider harvesting the entire monolayer of cells and assaying for intracellular species, as was recently performed by Munir and Kapur [94], who used microarrays to analyze host-pathogen interactions for an avian pneumovirus infection. Figure 11.6 presents the results for an mRNA assay of this type, assuming that inoculated cells maintain an average of fifty copies of interferon mRNA. The results intuitively follow the nature of the infection. Initially, VSV hijacks the host DBT cells to produce the viral components as evidenced by the sharp rises of matrix and large protein mRNA. As the host cell recognizes viral double-stranded RNA, it up-regulates the interferon response, leading to a burst in the interferon mRNA. Finally, all remaining living cells become inoculated to infection, leading to a steady state with a constant value of interferon mRNA and no viral mRNA species.

For this example, we expect that solving the decoupled system should yield dramatic reductions in the computational expense when compared to solving the full model. Since the intracellular level contains twenty-four distinct species, solving the full model would require discretization of each of these species at every node of the spatial discretization. Tracking this information is clearly unnecessary because the concentrations of intracellular species are not spatially dependent. Additionally, the intracellular description for this example is stiff due to the large rate constants for the reversible reactions (11.11b) and (11.12c), which poses potential problems for discretization of the age dimension.


[Figure 11.4: image grid with columns Data, Simple, and Intracellular, and rows at 7, 27, 48, 72, and 96 hours.]

Figure 11.4: Comparison of experimental data, simple segregated model fit (simple), and the developed model (intracellular). The two models fit the data equally well. White scale bar in the upper left-hand corner of the experimental images is one millimeter.

11.3 Conclusions

We have considered a decomposition for solving viral infection models with restricted flow of information from the extracellular to intracellular level. For these cases, the intracellular level decomposes from the extracellular level, leading to a more computationally tractable problem than the original formulation. Two examples illustrated the efficacy of this decomposition. The assumptions required for this decomposition restrict the ability of the model to compensate for instantaneous extracellular effects such as super-infection of infected cells and transfer of material from the extracellular to intracellular levels. However, we suspect that in many cases these effects may either be insignificant or indistinguishable from models making these assumptions given experimental data.

We also extended a model for vesicular stomatitis virus infection of murine astrocytoma cells to include intracellular structure for both virus replication and interferon signaling. This model predicts similar spatio-temporal profiles to a segregated, intracellularly-unstructured model for the focal infection system, indicating that the available extracellular data is not informative enough to justify even the simplified intracellular model presented in this paper. In addition, the proposed model predicts population-averaged concentrations of intracellular species, and thus serves as a basis for assimilating more extensive intracellular measurements in further studies of viral infection and cellular antiviral mechanisms.

[Figure 11.5: total amount secreted per infected cell (number) versus time (hours), with curves for IFN and VSV.]

Figure 11.5: Comparison of total production of virus (VSV) and interferon (IFN) per cell for the simple segregated model (lines; first-order plus time delay expressions for the production rate) and intracellularly-structured, segregated model (points).

[Figure 11.6: mRNA versus time (hours), with curves for IFN, M, and L.]

Figure 11.6: Dynamic measurement of mRNA species (interferon and viral matrix (M) and large (L) proteins) for the focal infection system. Here, the entire monolayer is harvested and analyzed for mRNA.

Notation

a_j          jth reaction rate
c_j          concentration of extracellular species j
c^i_j        concentration of intracellular species j
c'_unc       initial concentration of uninfected cells in the radius of the initial inoculum for the VSV/BHK-21 fit
c'_unc,1     initial concentration of uninfected cells in the first radial region of the initial inoculum for the VSV/DBT fit
c'_unc,2     initial concentration of uninfected cells in the second radial region of the initial inoculum for the VSV/DBT fit
d_j          time delay for reaction j
D_ifn        interferon diffusivity
D_vir        virus diffusivity
E_j          jth extracellular production rate
i_bgd        background fluorescence
K_j          rate constant for reaction j
K_m          equilibrium constant for the measurement
k_j          rate constant for reaction j
k^i_j        rate constant for intracellular reaction j
k_m          conversion constant from virus-host concentration to intensity
R_j          jth intracellular production rate
R_η          production rate for the infected cell population η
r            radial dimension
t            time
v_y          velocity vector for the internal characteristics y
Y            virus yield per infected cell
y            internal characteristics
y_m          intensity measurement
δ            Dirac delta function
η(t,y)dy     concentration of infected cells
τ            age of infection
τ_d          maximum age of infection

Subscripts

dc           dead cell
ifn          interferon
infc         infected cell
inoc         inoculated cell
unc          uninfected cell
vir          virus
vir-host     virus-host complex


Chapter 12

Moving-Horizon State Estimation 1

It is well established that the Kalman filter is the optimal state estimator for unconstrained, linear systems subject to normally distributed state and measurement noise. Many physical systems, however, exhibit nonlinear dynamics and have states subject to hard constraints, such as nonnegative concentrations or pressures. Hence Kalman filtering is no longer directly applicable. As a result, many different types of nonlinear state estimators have been proposed; Soroush [141] provides a review of many of these methods. We focus our attention on techniques that formulate state estimation in a probabilistic setting, that is, both the model and the measurement are potentially subject to random disturbances. Such techniques include the extended Kalman filter, moving-horizon estimation, Bayesian estimation, and Gaussian sum approximations. In this probabilistic setting, state estimators attempt to reconstruct the a posteriori distribution $P(x_T \mid y_0, \ldots, y_T)$, which is the probability that the state of the system is $x_T$ given measurements $y_0, \ldots, y_T$. The question arises, then, as to which point estimate should be used for the state estimate. Two obvious choices for the point estimate are the mean and the mode of the a posteriori distribution. For non-symmetric distributions, Figure 12.1 (a) demonstrates that these estimates are generally different. Additionally, if this distribution is multimodal as in Figure 12.1 (b), then the mean may place the state estimate in a region of low probability. Clearly the mode is a more desirable estimate in such cases.

For nonlinear systems, the a posteriori distribution is generally non-symmetric and potentially multimodal. In this chapter, we outline conditions that lead to the formation of multiple modes in the a posteriori distribution for systems tending to a steady state, and construct examples that generate multiple modes. To the best of our knowledge, only Alspach and Sorenson (and references contained within) [2], Gordon et al. [53], and Chaves and Sontag [20] have proposed examples in which multiple modes arise in the a posteriori distribution, but these contributions do not examine conditions leading to their formation. Gaussian sum approximations [2] offer one method for addressing the formation of multiple modes in the a posteriori distribution for unconstrained systems. Current Bayesian estimation methods [53, 12, 22, 142] offer another means for addressing multiple modes, but these methods propose estimation of the mean rather than the mode. In this chapter, we examine the estimation properties of both the extended Kalman filter and moving-horizon estimation through simulation. The extended Kalman filter assumes that the a posteriori distribution is normally distributed (unimodal), hence the mean and the mode of the distribution are equivalent. Moving-horizon estimation seeks to reconstruct the mode of the a posteriori distribution via constrained optimization, but current implementations employ local optimizations that offer no means of distinguishing between multiple modes of this distribution. The simulation examples thus provide a means of benchmarking these current industrially implementable technologies.

1 Portions of this chapter appear in Haseltine and Rawlings [58] and are to appear in Haseltine and Rawlings [59].

[Figure 12.1: two panels, (a) and (b), each plotting $p(x_T \mid y_0, \ldots, y_T)$ versus $x_T$ with the mean and mode(s) marked.]

Figure 12.1: Comparison of potential point estimates (mean and mode) for (a) unimodal and (b) bimodal a posteriori distributions.

In this chapter, we first formulate the estimation problem of interest. Next, we briefly review pertinent extended Kalman filtering, Monte Carlo filter, and moving-horizon estimation literature. Then we present several motivating chemical engineering examples in which the accurate incorporation of both state constraints and the nonlinear model are paramount for obtaining accurate estimates.


12.1 Formulation of the Estimation Problem

Most chemical engineering systems consist of continuous processes with discrete measurements. Therefore, for this work, we choose the discrete stochastic system model

\begin{align}
x_{k+1} &= F(x_k, u_k) + G(x_k, u_k) w_k \tag{12.1a} \\
y_k &= h(x_k) + v_k \tag{12.1b}
\end{align}

in which

• $x_k$ is the state of the system at time $t_k$,

• $u_k$ is the system input at time $t_k$ (assuming a zero-order hold over the interval $[t_k, t_{k+1})$),

• $w_k$ is an $N(0, Q_k)$ noise ($N(m, P)$ denotes a normal distribution with mean $m$ and covariance $P$),

• $F(x_k, u_k)$ is the solution to a first principles, differential equation model,

• $G(x_k, u_k)$ is a full column rank matrix (this condition is required for uniqueness of the a posteriori distribution defined in section 12.5),

• $y_k$ is the system measurement at time $t_k$,

• $h$ is a (possibly) nonlinear function of $x_k$ at time $t_k$, and

• $v_k$ is an $N(0, R_k)$ noise.

We believe that by appropriately choosing both a first principles model and a noise structure, we can identify both the model parameters (or a reduced set of these parameters) and the state and measurement noise covariance structures. Such identification could proceed as follows:

1. Assuming a noise structure, identify the model parameters.

2. Assuming a model, model parameters, and a noise structure, identify the covariance structures.

Here, we propose performing replicate experiments and measurements to estimate moments of the desired quantity (in general, the mean of the state or covariance structure), then fitting the model parameters by comparing the estimated moments to those reconstructed from Monte Carlo simulation of equation (12.1). This identification procedure is an area of current research beyond the scope of this chapter, but we maintain that such a procedure yields a rough, potentially biased, yet useful stochastic model from the system measurements.
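The Monte Carlo reconstruction of moments from equation (12.1) can be sketched as below. This is only an illustration under stated assumptions: the scalar test system (the lambdas `F`, `G`, `h` and the values of `Q` and `R`) is hypothetical, and the fitting step that would compare these moments against experimental replicates is not shown.

```python
import numpy as np

def simulate(F, G, h, x0, inputs, Q, R, rng):
    """Simulate one realization of model (12.1); returns states and measurements."""
    x = np.asarray(x0, dtype=float)
    states, outputs = [], []
    for u in inputs:
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)  # v_k ~ N(0, R)
        states.append(x)
        outputs.append(h(x) + v)                               # equation (12.1b)
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)  # w_k ~ N(0, Q)
        x = F(x, u) + G(x, u) @ w                              # equation (12.1a)
    return np.array(states), np.array(outputs)

# Hypothetical scalar test system, chosen only for illustration.
F = lambda x, u: 0.9 * x + u
G = lambda x, u: np.eye(1)
h = lambda x: x
Q = np.array([[0.01]])
R = np.array([[0.04]])

rng = np.random.default_rng(0)
inputs = np.zeros((20, 1))
replicates = np.array([simulate(F, G, h, [1.0], inputs, Q, R, rng)[1]
                       for _ in range(500)])
mean_y = replicates.mean(axis=0)        # first moment at each sample time
var_y = replicates.var(axis=0, ddof=1)  # second central moment at each sample time
```

In an identification loop, `mean_y` and `var_y` would be recomputed for each candidate parameter set and compared with the moments estimated from replicate experiments.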

As discussed in the introduction, state estimators given multimodal a posteriori distributions should solve the problem

\[ x^+_T = \arg\max_{x_T} P(x_T \mid y_0, \ldots, y_T) \tag{12.2} \]


Here, we assume that the input sequence $u_0, \ldots, u_T$ is known exactly. Equation (12.2) is referred to as the maximum a posteriori estimate. In the special case that the system is not constrained and in equation (12.1)

1. F (xk,uk) is linear with respect to xk,

2. h(xk) is linear with respect to xk, and

3. G(xk,uk) is a constant matrix,

the maximum a posteriori estimator is the Kalman filter, whose well-known recursive form is conducive to online implementation. For the more general formulation given by equation (12.1), online solution of the exact maximum a posteriori estimate is impractical, and approximations are used to obtain state estimates in real time.

12.2 Nonlinear Observability

The determination of observability for nonlinear systems such as equation (12.1) is substantially more difficult than for linear systems. For linear systems, either one state is the optimal estimate, or infinitely many states are optimal estimates, in which case the system is unobservable. Nonlinear systems have the additional complication that finitely many states may be locally optimal estimates. Definitions of nonlinear observability should account for such a condition. Concepts such as output-to-state stability [159] offer promise for a rigorous mathematical definition of nonlinear observability, but currently no easily implemented tests for such determination exist. In lay terms, such a definition for deterministic models should roughly correspond to "for the given model and measurements, if the measurement data are close, the initial conditions generating the measurements are close."

One approximate method of checking nonlinear observability is to examine the time-varying Gramian [21]. This test actually establishes the observability criterion for linear, time-varying systems. By approximating nonlinear systems as linear time-varying systems, we can obtain a rough estimate of the degree of observability for the system by checking the condition number of the time-varying Gramian. In general, ill-conditioned Gramians indicate poor observability because different initial conditions can reconstruct the data arbitrarily closely [93].
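A minimal sketch of the Gramian check, assuming the discrete-time form $W = \sum_k \Phi_k^T C_k^T C_k \Phi_k$ with $\Phi_k$ the state transition matrix from time 0 to time $k$. The discrete-time form and the function name are assumptions made here; the sequences `A_seq` and `C_seq` would come from linearizing the nonlinear model along a trajectory.

```python
import numpy as np

def observability_gramian(A_seq, C_seq):
    """Finite-horizon observability Gramian of a linear time-varying system."""
    n = A_seq[0].shape[0]
    Phi = np.eye(n)            # state transition matrix from time 0 to time k
    W = np.zeros((n, n))
    for A, C in zip(A_seq, C_seq):
        W += Phi.T @ C.T @ C @ Phi
        Phi = A @ Phi          # advance the transition matrix one step
    return W

# Two states, only their sum measured, no dynamics: initial conditions with the
# same sum are indistinguishable, so the Gramian is singular.
W_bad = observability_gramian([np.eye(2)] * 5, [np.array([[1.0, 1.0]])] * 5)

# Double integrator with a position measurement: observable over the horizon.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
W_good = observability_gramian([A] * 5, [np.array([[1.0, 0.0]])] * 5)

cond_bad = np.linalg.cond(W_bad)
cond_good = np.linalg.cond(W_good)
```

A very large condition number, as for `W_bad`, signals that some direction in the initial state barely affects the measurements.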

12.3 Extended Kalman Filtering

The extended Kalman filter (EKF) is one approximation for calculating equation (12.2). The EKF linearizes nonlinear systems, then applies the Kalman filter (the optimal, unconstrained, linear state estimator) to obtain the state estimates. The tacit approximation here is that the process statistics are multivariate normal distributions. We summarize the algorithm for implementing the EKF presented by Stengel [144], employing the following notation:


• $E[\alpha]$ denotes the expectation of $\alpha$,

• $A_k$ denotes the value of the function $A$ at time $t_k$,

• $x_{k|l}$ refers to the value of $x$ at time $t_k$ given measurements up to time $t_l$,

• $\hat{x}$ denotes the estimate of $x$, and

• $\bar{x}_0$ denotes the a priori estimate of $x_0$, that is, the estimate of $x_0$ with knowledge of no measurements.

The assumed prior knowledge is identical to that of the Kalman filter:

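\begin{align}
\bar{x}_0 &\ \text{given} \tag{12.3a} \\
P_0 &= E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T] \tag{12.3b} \\
R_k &= E[v_k v_k^T] \tag{12.3c} \\
Q_k &= E[w_k w_k^T] \tag{12.3d}
\end{align}

The inputs $u_k$ are also assumed to be known.

The approximation uses the following linearized portions of equation (12.1)

\begin{align}
A_k &= \left. \frac{\partial F(x, u)}{\partial x^T} \right|_{x = x_k,\, u = u_k} \tag{12.4} \\
C_k &= \left. \frac{\partial h(x)}{\partial x^T} \right|_{x = x_k} \tag{12.5}
\end{align}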

to implement the following algorithm:

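1. At each measurement time, compute the filter gain $L$ and update the state estimate and covariance matrix:

\begin{align}
L_k &= P_{k|k-1} C_k^T \left[ C_k P_{k|k-1} C_k^T + R_k \right]^{-1} \tag{12.6} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + L_k \left( y_k - h(\hat{x}_{k|k-1}) \right) \tag{12.7} \\
P_{k|k} &= P_{k|k-1} - L_k C_k P_{k|k-1} \tag{12.8}
\end{align}

2. Propagate the state estimate and covariance matrix to the next measurement time via the equations:

\begin{align}
\hat{x}_{k+1|k} &= F(\hat{x}_{k|k}, u_k) \tag{12.9} \\
P_{k+1|k} &= A_k P_{k|k} A_k^T + G_k Q_k G_k^T \tag{12.10}
\end{align}

3. Let $k \leftarrow k + 1$. Return to step 1.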
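The three steps above can be sketched as two small functions. This is an illustrative sketch rather than the implementation used for the results in this chapter: the function names are hypothetical, and the caller is assumed to supply the Jacobians `A` and `C` of equations (12.4) and (12.5) evaluated at the current estimate.

```python
import numpy as np

def ekf_update(x_pred, P_pred, y, h, C, R):
    """Measurement update, equations (12.6)-(12.8)."""
    L = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)   # filter gain
    x_upd = x_pred + L @ (y - h(x_pred))                     # updated estimate
    P_upd = P_pred - L @ C @ P_pred                          # updated covariance
    return x_upd, P_upd

def ekf_predict(x_upd, P_upd, u, F, A, G, Q):
    """Propagation to the next measurement time, equations (12.9)-(12.10)."""
    x_next = F(x_upd, u)
    P_next = A @ P_upd @ A.T + G @ Q @ G.T
    return x_next, P_next

# Scalar usage example with made-up numbers.
x_upd, P_upd = ekf_update(np.array([0.0]), np.array([[4.0]]), np.array([5.0]),
                          h=lambda x: x, C=np.array([[1.0]]), R=np.array([[1.0]]))
x_next, P_next = ekf_predict(x_upd, P_upd, None, F=lambda x, u: x,
                             A=np.array([[1.0]]), G=np.array([[1.0]]),
                             Q=np.array([[0.2]]))
```

Note that nothing in these updates enforces state constraints, so the estimate can leave the physically meaningful region.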


Until recently, few properties regarding the stability and convergence of the EKF have been proven. Recent publications present bounded estimation error and exponential convergence arguments for the continuous and discrete EKF forms given detectability, small initial estimation error, small noise terms, and perfect correspondence between the plant and the model [123, 124, 125]. However, depending on the system, the bounds on initial estimation error and noise terms may be unreasonably small. Also, initial estimation error may result in bounded estimate error but not exponential convergence, as illustrated by Chaves and Sontag [20].

12.4 Monte Carlo Filters

The basic idea of Monte Carlo filters is to use simulations of the stochastic process to reconstruct the state estimates. In general, we can reconstruct functions of the underlying probability distribution by sampling from this distribution, then averaging the resulting properties. In the limit as the number of samples approaches infinity, we obtain the equivalence

\[ \int h(x) P(x)\, dx = \lim_{N_s \to \infty} \frac{1}{N_s} \sum_{j=1}^{N_s} h(x_j) \tag{12.11} \]

in which $x_j$ is the jth realization of $x$. Monte Carlo filters approximate the left-hand side of equation (12.11) by evaluating the right-hand side of the same equation with a finite number of samples. For example, most Monte Carlo filters propose estimation of the mean

\[ E[x] = \int_x x' P(x')\, dx' \approx \frac{1}{N_s} \sum_{j=1}^{N_s} x_j \]

The primary benefits of using this type of filter are

• its relative simplicity (the most demanding requirement is the integration of model (12.1)), and

• the ability to use any combination of model and random noise in a straightforward manner.

Examples of Monte Carlo techniques include

• Rejection sampling [12]. Here, one draws samples from an initial distribution, propagates each sample to the next measurement time via equation (12.1a), then either accepts or rejects the integrated state based upon the statistics of the measurement noise in equation (12.1b). This process is repeated until one generates the desired number of accepted samples.

• Particle methods [22]. For this method, one randomly distributes a set number of initial "particles", i.e. states. Each of these states is propagated to the next measurement time via equation (12.1a), then a weight $q_j$ is assigned to each state according to the distribution of the measurement noise $v_k$ in equation (12.1b). For this case, functions of the underlying probability distribution are evaluated according to

\[ \int_x h(x') P(x')\, dx' \approx \frac{\sum_{j=1}^{N_s} q_j h(x_j)}{\sum_{j=1}^{N_s} q_j} \tag{12.12} \]
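The weighted average in equation (12.12) can be sketched as follows, assuming normally distributed measurement noise so that each weight is the (unnormalized) Gaussian likelihood of the measurement residual. The function names are hypothetical, and resampling, which practical particle filters typically add, is omitted.

```python
import numpy as np

def measurement_weights(particles, y, h, R):
    """Weights q_j proportional to the N(0, R) density of y - h(x_j)."""
    resid = np.array([y - h(x) for x in particles])          # shape (Ns, ny)
    R_inv = np.linalg.inv(R)
    quad = np.einsum("ij,jk,ik->i", resid, R_inv, resid)     # resid_j^T R^-1 resid_j
    return np.exp(-0.5 * quad)

def weighted_estimate(particles, weights, func=lambda x: x):
    """Right-hand side of equation (12.12) for a function func of the state."""
    vals = np.array([func(x) for x in particles])
    return (weights[:, None] * vals).sum(axis=0) / weights.sum()

# Two particles, one of which matches the measurement exactly.
particles = np.array([[0.0], [2.0]])
q = measurement_weights(particles, np.array([2.0]), lambda x: x, np.array([[1.0]]))
x_mean = weighted_estimate(particles, q)
```

The normalizing constant of the Gaussian density is dropped because it cancels between the numerator and denominator of equation (12.12).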

Spall [142] presents a nice overview of other Monte Carlo methods.

Most Monte Carlo filters propose estimation of the mean as opposed to the mode. For cases such as the bimodal distribution given in Figure 12.1(b), however, one may prefer the mode point estimate. In this case, a Monte Carlo filter must first estimate the entire probability density, then maximize this estimated density to calculate the mode. We consider the task of density estimation next.

Density estimation is well known in the field of statistics. The information presented here merely summarizes some of the information presented by Silverman [140]. The basic idea is that one can use samples of an underlying distribution to approximately reconstruct this distribution. Kernel methods are perhaps the most popular manner of performing this reconstruction. This technique proceeds roughly as follows:

1. Draw samples from the underlying distribution.

2. Apply a “kernel” density at each sample.

3. Sum the kernel densities to approximate the underlying distribution.

Mathematically, the approximate distribution is then

\[ \hat{f}(x) = \frac{1}{N_s h} \sum_{j=1}^{N_s} K\left( \frac{x - X_j}{h} \right) \tag{12.13} \]

in which

• $\hat{f}(x)$ is the reconstructed distribution, a function of $x$;

• Ns is the number of samples;

• h is the window width, also called the smoothing parameter or bandwidth;

• K is the kernel; and

• Xj is the jth sample of x.

Each kernel obeys the properties of a probability density function and is usually symmetric. Silverman [140] gives greater detail on the selection of the kernel $K$ and the window width $h$.
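A minimal sketch of equation (12.13) for scalar samples. The Gaussian kernel and the rule-of-thumb window width $h = 1.06\,\sigma N_s^{-1/5}$ are choices assumed here for illustration; other kernels and bandwidth selectors are equally admissible.

```python
import numpy as np

def gaussian_kernel(z):
    """Standard normal density, one common choice for the kernel K."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def kernel_density(x, samples, h):
    """Evaluate the kernel estimate of equation (12.13) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - samples[None, :]) / h       # one column per sample X_j
    return gaussian_kernel(z).sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(0)
samples = rng.standard_normal(10)                 # ten draws from N(0, 1)
h = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kernel_density(grid, samples, h)

# A crude mode estimate: the sample at which the estimated density is largest.
mode_guess = samples[np.argmax(kernel_density(samples, samples, h))]
```

Evaluating the estimate at the samples themselves gives inexpensive starting points for a local optimizer that refines the mode.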

As an illustrative example, we employ the kernel method to estimate the density of samples drawn from a normal distribution. Figure 12.2 demonstrates the three steps of this procedure and presents the resulting reconstructed density. This density approximates well the underlying normal distribution given a relatively small sample size of ten. In contrast, more naive methods of estimating the density such as using histograms generate substantially worse estimates as seen by Figure 12.3.

[Figure 12.2: three panels, each plotting f(x) versus x.]

Figure 12.2: Example of using the kernel method to estimate the density of samples drawn from a normal distribution.

[Figure 12.3: f(x) versus x.]

Figure 12.3: Example of using a histogram to estimate the density of samples drawn from a normal distribution.

The primary drawback of density estimation is its "curse of dimensionality". Table 12.1 presents the number of samples required to approximate, to a given relative error, an underlying standard multivariate normal distribution at the origin (a single point). The number of samples increases exponentially with the dimensionality. Since the computational expense of Monte Carlo methods scales with the number of samples, we expect density estimation to be practical only for systems of low dimension.

Dimensionality    Required Sample Size
1                 4
2                 19
3                 67
4                 223
5                 768
6                 2790
7                 10700
8                 43700
9                 187000
10                842000

Table 12.1: Sample size required to ensure that the relative mean square error at zero (a single point) is less than 0.1. The underlying distribution is a standard multivariate normal density.

Another drawback results if one is interested in obtaining the maximum mode of the reconstructed density. If the estimated density has multiple modes, then one requires a global optimizer to find the desired mode. However, we expect that local optimization may prove effective in calculating a global optimum since the samples of the underlying distribution should provide excellent initial guesses.

12.5 Moving-Horizon Estimation

One alternative to solving the maximum a posteriori estimate is to maximize a joint probability for a trajectory of state values, i.e.,

\[ x^*_0, \ldots, x^*_T = \arg\max_{x_0, \ldots, x_T} P(x_0, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.14} \]

Equation (12.14) is the full information estimate. The computational burden of calculating this estimate increases as more measurements come online. To bound this burden, one can fix the estimation horizon:

\[ x^*_{T-N+1}, \ldots, x^*_T = \arg\max_{x_{T-N+1}, \ldots, x_T} P(x_{T-N+1}, \ldots, x_T \mid y_0, \ldots, y_T) \tag{12.15} \]

Moving-horizon estimation, or MHE, corresponds probabilistically to equation (12.15), and is equivalent numerically to a constrained, nonlinear optimization problem [128, 114]. We note that the restrictive assumptions of normally distributed noises and the model given by equation (12.1) are not required by MHE. If the matrix $G$ in equation (12.1) is not a function of the state $x_k$, then these assumptions merely lead to a convenient least-squares optimization as demonstrated by Jazwinski [71].
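For orientation, the least-squares problem that results from taking the negative logarithm of equation (12.15) under normally distributed noises can be sketched as follows. The symbol $\Gamma$ for the term summarizing data before the horizon (the arrival cost) and the constraint set $\mathbb{X}$ are notation assumed here; this is the commonly used form rather than a transcription of a specific equation from the cited references.

```latex
\begin{align*}
\min_{x_{T-N+1},\, w_{T-N+1}, \ldots, w_{T-1}} \quad
  & \Gamma(x_{T-N+1})
    + \sum_{k=T-N+1}^{T-1} w_k^T Q_k^{-1} w_k
    + \sum_{k=T-N+1}^{T} v_k^T R_k^{-1} v_k \\
\text{subject to} \quad
  & x_{k+1} = F(x_k, u_k) + G\, w_k, \\
  & v_k = y_k - h(x_k), \\
  & x_k \in \mathbb{X}
\end{align*}
```

The quadratic stage costs arise directly from the exponents of the $N(0, Q_k)$ and $N(0, R_k)$ densities, and state constraints such as nonnegative pressures enter through $\mathbb{X}$.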

From a theoretical perspective, Tyler and Morari examine the feasibility of constrained MHE for linear, state-space models [152]. Rao et al. show that constrained MHE is an asymptotically stable observer in a nonlinear deterministic modeling framework [112, 116]. These works also provide a nice overview of current MHE research. Furthermore, recent advances in numerical computation have allowed real-time implementation of MHE strategies for the local optimization of the MHE problem [147, 148]. How to incorporate the effect of past data outside the current estimation horizon (also known as the arrival cost), though, remains an open issue of MHE.

Rao, Rawlings and Lee [115] explore estimating this cost for constrained linear systems with the corresponding cost for an unconstrained linear system. More specifically, the following two schemes are examined:

1. a "filtering" scheme that penalizes deviations of the initial estimate in the horizon from an a priori estimate, and

2. a "smoothing" scheme that penalizes deviations of the trajectory of states in the estimation horizon from an a priori estimate.

For unconstrained, linear systems, the MHE optimization collapses to the Kalman filter for both of these schemes. Rao [112] further considers several optimal and suboptimal approaches for estimating the arrival cost via a series of optimizations. These approaches stem from the property that, in a deterministic setting (no state or measurement noise), MHE is an asymptotically stable observer as long as the arrival cost is underbounded. One simple way of estimating the arrival cost, therefore, is to implement a uniform prior. Computationally, a uniform prior corresponds to not penalizing deviations of the initial state from the a priori estimate.

For nonlinear systems, Tenny and Rawlings [148] estimate the arrival cost by approximating the constrained, nonlinear system as an unconstrained, linear time-varying system and applying the corresponding filtering and smoothing schemes. They conclude that the smoothing scheme is superior to the filtering scheme because the filtering scheme induces oscillations in the state estimates due to unnecessary propagation of initial error. Here, the tacit assumption is that the probability distribution around the optimal estimate is a multivariate normal. The problem with this assumption is that nonlinear systems may exhibit multiple peaks (i.e. local optima) in this probability distribution. Haseltine and Rawlings [58] demonstrate that approximating the arrival cost with the smoothing scheme in the presence of multiple local optima may skew all future estimates. They conjecture that if global optimization is implementable in real time, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to an approximate multivariate normal arrival cost because of the latter's biasing effect on the state estimates.

We now seek to demonstrate by simulation examples that MHE is a useful and practical tool for state estimation of chemical process systems. We examine the performance of MHE with local optimization and an arrival cost approximated with a "smoothing" update. For further details regarding this MHE scheme, we refer the interested reader to Tenny and Rawlings [148] and note that this code is freely available as part of the NMPC toolbox (http://www.che.wisc.edu/~tenny/nmpc/). Currently, this particular MHE configuration represents a computationally feasible implementation for an industrial setting.

12.6 Example 1

Consider the gas-phase, irreversible reaction

\[ 2\mathrm{A} \xrightarrow{k} \mathrm{B}, \qquad k = 0.16 \tag{12.16} \]

with stoichiometric matrix

\[ \nu = \begin{bmatrix} -2 & 1 \end{bmatrix} \tag{12.17} \]

and reaction rate

\[ r = k P_A^2 \tag{12.18} \]

We define the state and measurement to be

\[ x = \begin{bmatrix} P_A \\ P_B \end{bmatrix}, \qquad y_k = \begin{bmatrix} 1 & 1 \end{bmatrix} x_k \tag{12.19} \]

where $P_j$ denotes the partial pressure of species j. We assume that the ideal gas law holds (high temperature, low pressure), and that the reaction occurs in a well-mixed, isothermal batch reactor. From first principles, the model for this system is

\[ \dot{x} = f(x) = \nu^T r, \qquad x_0 = \begin{bmatrix} 3 & 1 \end{bmatrix}^T \tag{12.20} \]

For state estimation, consider the following parameters:

\begin{gather}
\Delta t = t_{k+1} - t_k = 0.1, \quad \Pi_0 = \mathrm{diag}(6^2, 6^2), \quad G_k = \mathrm{diag}(1, 1), \notag \\
Q_k = \mathrm{diag}(0.001^2, 0.001^2), \quad R_k = 0.1^2, \quad \bar{x}_0 = \begin{bmatrix} 0.1 & 4.5 \end{bmatrix}^T \tag{12.21}
\end{gather}

Note that the initial guess, $\bar{x}_0$, is poor. The actual plant experiences $N(0, Q_k)$ noise in the state and $N(0, R_k)$ noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system.
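A minimal sketch of the plant simulation for this example. The closed-form one-step map follows from integrating $dP_A/dt = -2kP_A^2$ exactly over one sample interval and using the stoichiometry to recover $P_B$. The noise standard deviations 0.001 and 0.1 assume that the covariances in equation (12.21) are squared standard deviations; the names `step` and `simulate_plant` are hypothetical.

```python
import numpy as np

K_RATE, DT = 0.16, 0.1   # rate constant k and sample time from (12.16) and (12.21)

def step(x):
    """Exact noise-free one-step map for 2A -> B with rate r = k * P_A**2."""
    pa, pb = x
    denom = 2.0 * K_RATE * DT * pa + 1.0
    return np.array([pa / denom, pb + K_RATE * DT * pa**2 / denom])

def simulate_plant(x0, n_steps, q_std, r_std, rng):
    """Simulate noisy states and total-pressure measurements y = P_A + P_B + v."""
    x = np.array(x0, dtype=float)
    states = [x.copy()]
    meas = [x.sum() + r_std * rng.standard_normal()]
    for _ in range(n_steps):
        x = step(x) + q_std * rng.standard_normal(2)   # G_k = I, w_k ~ N(0, Q_k)
        states.append(x.copy())
        meas.append(x.sum() + r_std * rng.standard_normal())
    return np.array(states), np.array(meas)

rng = np.random.default_rng(0)
states, meas = simulate_plant([3.0, 1.0], 100, q_std=0.001, r_std=0.1, rng=rng)
```

Because only the sum $P_A + P_B$ is measured, any estimator must rely on the model dynamics to separate the two partial pressures.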

12.6.1 Comparison of Results

Figure 12.4 demonstrates that the EKF converges to incorrect estimates of the state (the par-tial pressures). In addition, the EKF estimates that the partial pressures are negative, which isphysically unrealizable. To explain why this phenomenon occurs, we examine the probabilitydensity P (xk|y0, . . . ,yk). Recall that the goal of the maximum likelihood estimator is to deter-mine the state xk that maximizes this probability density. Since we know the statistics of thesystem, we can calculate this density by successively

Figure 12.4: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

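1. using the discretized version of the nonlinear model

\[ x_{k+1} = F(x_k, w_k) = \begin{bmatrix} \dfrac{x_{k,1}}{2k\Delta t\, x_{k,1} + 1} \\[2ex] x_{k,2} + \dfrac{k\Delta t\, x_{k,1}^2}{2k\Delta t\, x_{k,1} + 1} \end{bmatrix} + w_k \tag{12.22} \]

to propagate the probability density from $P(x_k|y_0, \ldots, y_k)$ to $P(x_{k+1}|y_0, \ldots, y_k)$ via

\[ P(x_{k+1}, w_k|y_0, \ldots, y_k) = P(x_k|y_0, \ldots, y_k)\, P(w_k) \left| \begin{matrix} \dfrac{\partial F(x_k, w_k)}{\partial x_k^T} & \dfrac{\partial F(x_k, w_k)}{\partial w_k^T} \\[2ex] \dfrac{\partial w_k}{\partial x_k^T} & \dfrac{\partial w_k}{\partial w_k^T} \end{matrix} \right|^{-1} \tag{12.23} \]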

and then

Figure 12.5: Contours of $P(x_1|y_0, y_1)$, plotted against the transformed coordinates $t_1$ and $t_2$; the region where A > 0 and B > 0 is marked.

2. using measurements to update $P(x_k|y_0, \ldots, y_{k-1})$ to $P(x_k|y_0, \ldots, y_k)$

\[ P(x_k|y_0, \ldots, y_k) = \frac{P(x_k|y_0, \ldots, y_{k-1})\, p_{v_k}(y_k - Cx_k)}{\int_{-\infty}^{\infty} P(x_k|y_0, \ldots, y_{k-1})\, p_{v_k}(y_k - Cx_k)\, dx_k} \tag{12.24} \]

Therefore, the expression for the probability density we are interested in is

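\[ P(x_k|y_0, \ldots, y_k) = \frac{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Omega_k \, dw_0 \cdots dw_{k-1}}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Omega_k \, dw_0 \cdots dw_{k-1} \, dx_k} \tag{12.25} \]

in which

\[ \Omega_k = \prod_{j=0}^{k-1} \left( 2k\Delta t\, x_{j,1} + 1 \right)^2 \exp\left( -\frac{1}{2} \left[ (x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) + \sum_{j=0}^{k} v_j^T R^{-1} v_j + \sum_{j=0}^{k-1} w_j^T Q^{-1} w_j \right] \right) \tag{12.26} \]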
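The factor $(2k\Delta t\, x_{j,1} + 1)^2$ in equation (12.26) can be read as the inverse of the Jacobian determinant in equation (12.23): since $\partial w_k / \partial x_k^T = 0$ and $\partial w_k / \partial w_k^T = I$, the block determinant reduces to $\det(\partial F / \partial x_k^T)$. The following Python sketch checks this numerically for the model of equation (12.22). The function names are hypothetical and the finite-difference check is only an illustration of the relationship, not part of the original calculation.

```python
K_RATE, DT = 0.16, 0.1  # values from equations (12.16) and (12.21)


def model_step(x1, x2):
    """Noise-free part of equation (12.22)."""
    d = 2.0 * K_RATE * DT * x1 + 1.0
    return x1 / d, x2 + K_RATE * DT * x1 ** 2 / d


def jacobian_det_fd(x1, x2, h=1e-6):
    """Central finite-difference estimate of det(dF/dx) at (x1, x2)."""
    f1p, f2p = model_step(x1 + h, x2)
    f1m, f2m = model_step(x1 - h, x2)
    g1p, g2p = model_step(x1, x2 + h)
    g1m, g2m = model_step(x1, x2 - h)
    a11, a21 = (f1p - f1m) / (2 * h), (f2p - f2m) / (2 * h)
    a12, a22 = (g1p - g1m) / (2 * h), (g2p - g2m) / (2 * h)
    return a11 * a22 - a12 * a21


def omega_factor(x1):
    """Per-step factor (2 k dt x_{j,1} + 1)^2 appearing in equation (12.26)."""
    return (2.0 * K_RATE * DT * x1 + 1.0) ** 2


# At the true initial state the product of the two quantities should be one.
product = jacobian_det_fd(3.0, 1.0) * omega_factor(3.0)
next_state = model_step(3.0, 1.0)
```

The same step function also conserves $P_A + 2P_B$, which is a quick sanity check that the discretization is the exact solution of the batch model rather than an Euler approximation.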

We can numerically evaluate equation (12.25) using the integration package Bayespack [42]. Figure 12.5 presents a contour plot of the results for $P(x_1|y_0, y_1)$ with transformed axes

\[ t = \sqrt{2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}^{-1} x \]

This plot clearly illustrates the formation of two peaks in the probability density. However, only one of these peaks corresponds to a region where both the partial pressures for species A and B are positive. The real problem is that the process prohibits negative partial pressures, whereas unconstrained estimators permit updating of the state to regions where partial pressures may be negative. Since the EKF falls into the unconstrained estimator category with a local optimization (at best), the estimation behavior in Figure 12.4 is best explained as a poor initial guess leading to an errant region of attraction.

Figure 12.6: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

One method of preventing negative estimates for the partial pressure is to “clip” the EKF estimates. In this strategy, partial pressures rendered negative by the filter update are zeroed. As seen in Figure 12.6, this procedure results in an improved estimate in that the EKF eventually converges to the true state, but estimation during the initial dynamic response is poor. Also, only the estimates are “clipped”, not the covariance matrix. Thus the accuracy of the approximate covariance matrix is now questionable.
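The clipping step itself is simple. A minimal Python sketch follows, with a hypothetical helper name and an optional upper bound that is not needed for this example. It is applied to the filtered estimate only, which is exactly why the covariance matrix no longer corresponds to the reported estimate.

```python
def clip_estimate(x_hat, lower=0.0, upper=None):
    """Return a copy of x_hat with every element forced into [lower, upper]."""
    clipped = []
    for value in x_hat:
        value = max(value, lower)
        if upper is not None:
            value = min(value, upper)
        clipped.append(value)
    return clipped


# A filtered estimate with a negative partial pressure is zeroed elementwise.
example = clip_estimate([-0.2, 4.2])
```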

Figure 12.7: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of one time unit (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Alternatively, we can optimally constrain the partial pressures by applying MHE. Figure 12.7 presents the MHE results for a horizon length of one time unit (N = 11 measurements). These results indicate significant improvement over those of either the EKF or the clipped EKF.

To explore further the differences between the full information and maximum likelihood estimates, we examine contour plots of the projection

\[ \max_{x_0, \ldots, x_{k-1}} P(x_0, \ldots, x_k|y_0, \ldots, y_k) \tag{12.27} \]

noting again the equivalence between this probability and the full information cost function $\Phi_k$ given by equation (12.14). Figure 12.8 confirms our previous assertion that the full information and maximum likelihood estimates are not equivalent for nonlinear systems. In fact, even the global optima differ. However, the full information formulation retains the dominant characteristic of the maximum likelihood estimate, namely the formation of two local optima.

Figure 12.8: Contours of $\max_{x_0} P(x_1, x_0|y_0, y_1)$.

Figure 12.9: A posteriori density $P(x_1|y_0, y_1)$ calculated using a Monte Carlo filter with density estimation.

Finally, we consider using the rejection sampling technique outlined by Bølviken et al. [12] for the Monte Carlo filter, and reconstruct the a posteriori density $P(x_1|y_0, y_1)$ using one hundred accepted samples. Figure 12.9 presents these results. The actual distribution, Figure 12.5, is bimodal with the maximum mode placed in the region where $P_A > 0$ and $P_B > 0$. The Monte Carlo reconstruction is unimodal, and the single mode does not overlap in the same transformed coordinate $t_1$ space as the actual maximum. These results indicate that Monte Carlo methods do not provide very accurate estimation of the mode in the presence of multiple modes for models with small state noise. The primary sources of error are the finite number of samples associated with the Monte Carlo approximation, and the error induced by the density estimation approximation.

Figure 12.10: Contours of $P(x_4|y_0, \ldots, y_4)$.

12.6.2 Evaluation of Arrival Cost Strategies

The next logical question is: does MHE retain the same properties as the maximum likelihood estimate? The short answer is: it depends on what approximation one chooses for the arrival cost.

Figures 12.10 through 12.12 compare contours of the maximum likelihood estimate, unconstrained MHE with a smoothing update, and unconstrained MHE with a uniform prior, respectively, given five measurements. Figure 12.11 shows that the smoothing update biases the contours of the state estimate so much that the estimator no longer predicts multiple optima. This biasing occurs because the update has “smoothed” the estimate around only one of the optima in the estimator. Using MHE with a uniform prior, on the other hand, retains the property of multiple optima in the estimator as seen in Figure 12.12.

Increasing the number of measurements in the estimation horizon can overcome the biasing of the smoothing update. Figure 12.13 shows the eventual reemergence of multiple optima in the estimator upon increasing the estimation horizon from four (i.e. Figure 12.11) to ten. However, the optima are still heavily biased by the smoothing update.

Figure 12.11: Contours of $\max_{x_1, \ldots, x_3}$ […] using the smoothing update.

We speculate that any approximation of the arrival cost using the assumption that the process is a time-varying linear system may lead to substantial biasing of the estimator. A short estimation horizon further compounds such biasing because the information contained in the data can no longer overcome the prior information (i.e. the arrival cost). This situation is analogous to cases in Bayesian inference when the prior dominates and distorts the information contained in the data [14]. We expect the EKF to demonstrate similar biasing since it is essentially a suboptimal MHE with a short estimation horizon and an arrival cost approximated by a filtering update. For such approximations to work well, one must have a system that does not exhibit multiple local optima in the probability distribution.

The optimization strategy further obfuscates the issue of whether or not to approximate the arrival cost via linearization (e.g. the smoothing and filtering updates). Ideally, one would implement a global optimizer so that MHE could then distinguish between local optima. With global optimization, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to approximating the arrival cost as a multivariate normal because of the observed biasing effect. Currently, though, only local optimization strategies can provide the computational performance required to perform the MHE calculation in real time. For this case, it may be preferable to use a linear approximation of the arrival cost and then judiciously apply constraints to prevent multiple optima in the estimator. The examples considered next examine the estimator performance of this type of MHE.

Figure 12.12: […] as a uniform prior.

12.7 EKF Failure

In this section, we outline the conditions that generate EKF failure in two classes of chemical reactors. We then present several examples that demonstrate failure of the EKF as an estimator.

If there is no plant-model mismatch, measurement noise, or state noise, one definition of estimator failure is

\[ \lim_{k \to \infty} \left| x_{k|k} - x_k \right| > \varepsilon \tag{12.28} \]

for some $\varepsilon > 0$ ($|x|$ is a norm of $x$). That is, the estimator is unable to reconstruct the true state no matter how many measurements it processes. For stable systems, i.e. those systems tending to a steady state, we expect that

\[ x_{k|k} = x_{k-1|k-1} \tag{12.29} \]

in the same limit as equation (12.28). We now examine the discrete EKF given such conditions.


Combining expressions (12.30) and (12.32) yields:

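\[ 0 = F(x_{k-1|k-1}, u_{k-1}) - x_{k|k-1} \tag{12.33a} \]

\[ 0 = A_{k-1} P_{k-1|k-1} A_{k-1}^T + G_{k-1} Q_{k-1} G_{k-1}^T - P_{k|k-1} \tag{12.33b} \]

\[ 0 = x_{k|k-1} + L_k \left( y_k - h(x_{k|k-1}) \right) - x_{k-1|k-1} \tag{12.33c} \]

\[ 0 = P_{k|k-1} - L_k C_k P_{k|k-1} - P_{k-1|k-1} \tag{12.33d} \]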


\[ L_k = P_{k|k-1} C_k^T \left[ C_k P_{k|k-1} C_k^T + R_k \right]^{-1} \tag{12.33e} \]

If both equations (12.28) and (12.33) hold, then the EKF has failed as an estimator.

One solution to equation (12.33) results when multiple steady states satisfy the steady-state measurement. This phenomenon corresponds to the case that

\[ x_{k|k} = x_{k|k-1} = x_{k-1|k-1} \tag{12.34} \]

\[ y_k = h(x_{k|k-1}) \tag{12.35} \]

\[ x_{k|k} \neq x_k \tag{12.36} \]

We would expect the EKF to fail when

1. the system model and measurement are such that multiple states satisfy the steady-state measurement, and

2. the estimator is given a poor initial guess of the state.

Condition 1 does not imply that the system is unobservable; rather, this condition states that the state cannot be uniquely determined from solely the steady-state measurement. For such a case to be observable, the process dynamics must make the system observable. Condition 2 implies that the poor initial guess skews the estimates (the $x_{k|k}$) toward a region of attraction not corresponding to the actual state (the $x_k$).

12.7.1 Chemical Reaction Systems

For well-mixed systems consisting of reaction networks, the nonlinearity of the system must be present at steady state so that multiple steady states can satisfy the steady-state measurement. Consequently, we must analyze the structure of the stoichiometric matrix in combination with the number (and type) of measurements to determine whether or not multiple steady states can satisfy the steady-state measurement. Define:

• $\nu$, the stoichiometric matrix of size $r \times s$, in which $r$ is the number of reactions and $s$ is the number of species;

• ρ, the rank of ν (ρ = r if there are no linearly dependent reactions);

• η, the nullity of ν;

• n, the number of measurements; and

• $n_m$, the number of measurements that can be written as a linear combination of states (e.g. $y = x_1 + x_2$ and $(x_1 + x_2) y = x_1$).

For batch reactors, conservation laws yield a model of the form

\[ \frac{d}{dt}(x V_R) = \nu^T r(x) V_R \tag{12.37} \]

in which

• x is an s-vector containing the concentration of each species in the reactor,

• VR is the volume of the reactor, and

• r(x) is an r-vector containing the reaction rates.

For this system $\rho$ specifies the number of independent equations at equilibrium. In general, we require that

1. all reactions are reversible, and

2. the following inequalities hold:

\[ \underbrace{n_m + \eta}_{\text{number of ``linear'' equations}} < \underbrace{s}_{\text{number of estimated species}} \le \underbrace{n + \rho}_{\text{number of independent equations}} \]

Note that the batch reactor preserves the nonlinearity of the reaction rates in the steady-state calculation. Also, the combination of batch steady-state equations and measurements may or may not be an over-specified problem.

For continuously stirred tank reactors (CSTRs), conservation laws yield a model of the form

\[ \frac{d}{dt}(x V_R) = Q_f c_f - Q_o x + \nu^T r(x) V_R \tag{12.38} \]

where

• x is an s-vector containing the concentration of each species in the reactor,

• VR is the volume of the reactor,

• Qf is the volumetric flow rate into the reactor,

• cf is an s-vector containing the inlet concentrations of each species,

• Qo is the effluent volumetric flow rate, and

• r(x) is an r-vector containing the reaction rates.

Here $\eta$ specifies the number of linear algebraic relationships among the $s$ species at equilibrium because the null space represents linear combinations of the material balances that eliminate nonlinear reaction rates. We require

\[ \underbrace{n_m + \eta}_{\text{number of ``linear'' equations}} < \underbrace{s}_{\text{number of estimated species}} \tag{12.39} \]

If equation (12.39) is an equality instead of an inequality, then determination of the steady state is generally a well-defined, linear problem with a unique solution. Note that the left-hand side of equation (12.39) is actually an upper bound since we could potentially choose a measurement contained within the span of the null space (a linear combination of the null vectors). However, such measurements would be invariant and hence would give no dynamic information. Also, equation (12.39) does not imply that multiple steady states can satisfy the steady-state measurement; rather, having multiple steady states that can satisfy the steady-state measurement implies that equation (12.39) holds. EKF failure for CSTRs modeled by equation (12.38) must be confirmed by verifying that equation (12.33) holds. This requirement differs from the batch case because in general, the CSTR design equation (12.38) yields a sufficient number of equations to calculate all possible steady states, whereas the batch design equation (12.37) does not.
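The counting conditions above are easy to check mechanically. The following Python sketch is one way to do so, under two stated assumptions: the helper name `steady_state_counts` is hypothetical, and the nullity $\eta$ is computed as $s - \rho$ (rank-nullity for the $r \times s$ matrix $\nu$). The reversibility requirement for the batch case is not checked by the code.

```python
import numpy as np


def steady_state_counts(nu, n_meas, n_linear_meas):
    """Evaluate the batch and CSTR counting conditions for a stoichiometric matrix.

    nu            : r x s stoichiometric matrix (reactions by species)
    n_meas        : n, the number of measurements
    n_linear_meas : n_m, the number of measurements linear in the states
    """
    nu = np.atleast_2d(np.asarray(nu, dtype=float))
    n_species = nu.shape[1]
    rho = int(np.linalg.matrix_rank(nu))   # rank of nu
    eta = n_species - rho                  # nullity of nu (assumed s - rho)
    return {
        "s": n_species,
        "rho": rho,
        "eta": eta,
        # Batch inequalities: n_m + eta < s <= n + rho
        "batch": n_linear_meas + eta < n_species <= n_meas + rho,
        # CSTR inequality (12.39): n_m + eta < s
        "cstr": n_linear_meas + eta < n_species,
    }


# Two reactions among three species (A <-> B + C, 2B <-> C), one linear measurement.
counts = steady_state_counts([[-1, 1, 1], [0, -2, 1]], n_meas=1, n_linear_meas=1)
```

For this network the sketch gives $\rho = 2$ and $\eta = 1$, so $n_m + \eta = 2 < s = 3 \le n + \rho = 3$ and both counting conditions hold. As noted above, satisfying the inequality is necessary rather than sufficient for multiple steady states to match the measurement.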

We now examine several examples that illustrate these points.

12.7.2 Example 2

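Consider the gas-phase, reversible reactions

\[ A \underset{k_2}{\overset{k_1}{\rightleftharpoons}} B + C \tag{12.40a} \]

\[ 2B \underset{k_4}{\overset{k_3}{\rightleftharpoons}} C \tag{12.40b} \]

\[ k = \begin{bmatrix} 0.5 & 0.05 & 0.2 & 0.01 \end{bmatrix}^T \tag{12.40c} \]

with stoichiometric matrix

\[ \nu = \begin{bmatrix} -1 & 1 & 1 \\ 0 & -2 & 1 \end{bmatrix} \tag{12.41} \]

and reaction rates

\[ r = \begin{bmatrix} k_1 c_A - k_2 c_B c_C \\ k_3 c_B^2 - k_4 c_C \end{bmatrix} \tag{12.42} \]

We define the state and measurements to be

\[ x = \begin{bmatrix} c_A & c_B & c_C \end{bmatrix}^T \tag{12.43a} \]

\[ y = \begin{bmatrix} RT & RT & RT \end{bmatrix} x \tag{12.43b} \]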

where $c_j$ denotes the concentration of species $j$, $R$ is the ideal gas constant, and $T$ is the reactor temperature.² We assume that the ideal gas law holds (high temperature, low pressure). We consider state estimation for both a batch reactor and a CSTR.

Component    Predicted EKF Steady State    Actual Steady State
A            −0.0274                       0.01241
B            −0.2393                       0.1837
C             1.1450                       0.6753

Table 12.2: EKF steady-state behavior, no measurement or state noise

Batch Reactor

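From first principles, the model for a well-mixed, constant volume, isothermal batch reactor is

\[ \dot{x} = f(x) = \nu^T r(x) \tag{12.44} \]

\[ x_0 = \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.45} \]

We consider state estimation with the following parameters:

\[ \Delta t = t_{k+1} - t_k = 0.25 \tag{12.46a} \]

\[ \Pi_0 = \mathrm{diag}(0.5^2, 0.5^2, 0.5^2) \tag{12.46b} \]

\[ G_k = \mathrm{diag}(1, 1, 1) \tag{12.46c} \]

\[ Q_k = \mathrm{diag}(0.001^2, 0.001^2, 0.001^2) \tag{12.46d} \]

\[ R_k = 0.25^2 \tag{12.46e} \]

\[ \bar{x}_0 = \begin{bmatrix} 0 & 0 & 4 \end{bmatrix}^T \tag{12.46f} \]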

Note that the initial guess, $\bar{x}_0$, is poor. The actual plant experiences $N(0, Q_k)$ noise in the state and $N(0, R_k)$ noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system.

Figure 12.14 demonstrates that the EKF cannot reconstruct the evolution of the state for this system. In fact, the EKF appears to converge to incorrect steady-state estimates of the state. Table 12.2 presents the results of solving the equations in (12.33) for this system. Note that the concentrations of components A and B are negative, indicating that the EKF has converged to an unphysical state estimate. To prevent negative concentrations, we next implement an ad hoc clipping strategy in which negative filtered values of the state are set to zero (i.e. if $x_{k|k} < 0$, set $x_{k|k} = 0$). Figure 12.15 plots these clipped EKF results. Here, the clipped EKF drives the predicted pressure three orders of magnitude larger than the measured pressure before eventually converging to the actual states. Figure 12.16 presents the results of applying MHE. For these results, we have constrained the state to prevent estimation of negative concentrations. The figures demonstrate that MHE swiftly converges to the correct state estimates.

² For the simulations, RT = 32.84.

Figure 12.14: Extended Kalman filter results: (a) evolution of the actual (solid line) and EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and EKF updated (dashed line) pressure estimates.

A little algebraic analysis reveals that multiple steady states satisfy the steady-state measurement for this system. At steady state, the model and measurement equations yield one linear equation (assuming no noise in the steady-state measurement $y_{ss}$)

\[ c_A + c_B + c_C = \frac{y_{ss}}{RT} \tag{12.47} \]

and two nonlinear equations

\[ k_1 c_A = k_{-1} c_B c_C \tag{12.48} \]

\[ k_2 c_B^2 = k_{-2} c_C \tag{12.49} \]

Here $k_1$, $k_{-1}$ and $k_2$, $k_{-2}$ denote the forward and reverse rate constants of reactions (12.40a) and (12.40b), respectively; that is, they correspond to $k_1$, $k_2$ and $k_3$, $k_4$ in equations (12.40c) and (12.42). Solving for the steady-state solution using equations (12.47)-(12.49):

\[ c_C = \frac{k_2}{k_{-2}} c_B^2 = K_2 c_B^2 \tag{12.50} \]

\[ c_A = \frac{k_{-1} k_2}{k_1 k_{-2}} c_B^3 = \frac{K_2}{K_1} c_B^3 \tag{12.51} \]

\[ 0 = \frac{K_2}{K_1} c_B^3 + K_2 c_B^2 + c_B - \frac{y_{ss}}{RT} \tag{12.52} \]

Figure 12.15: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

Figure 12.16: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Descartes’ rule of signs states that for polynomials with real coefficients, the number of positive, real roots is either the number of sign changes between consecutive coefficients or less than this number by an even integer. Since equilibrium constants and the steady-state measurement are positive, the coefficients of equation (12.52) change sign exactly once, so the equation has exactly one positive root. Thus there is only one physically realizable steady state. MHE is a natural estimation tool for this system since its incorporation of constraints can prevent the estimator from converging to unphysical steady states.
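The sign-change argument can be checked numerically. The Python sketch below counts sign changes in the coefficients of equation (12.52) and locates the positive root by bisection. It is an illustration under stated assumptions: the helper names are hypothetical, the equilibrium constants are formed from equation (12.40c) as $K_1 = 0.5/0.05$ and $K_2 = 0.2/0.01$, and $y_{ss}/RT$ is taken as the sum of the actual steady-state concentrations listed in Table 12.2.

```python
def sign_changes(coeffs):
    """Number of sign changes between consecutive nonzero coefficients."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


K1, K2 = 0.5 / 0.05, 0.2 / 0.01          # equilibrium constants (assumed from 12.40c)
TOTAL = 0.01241 + 0.1837 + 0.6753        # y_ss / RT from the actual values in Table 12.2
COEFFS = [K2 / K1, K2, 1.0, -TOTAL]      # cubic (12.52), highest power first


def cubic(c_b):
    """Left-hand side of equation (12.52), evaluated by Horner's rule."""
    return ((COEFFS[0] * c_b + COEFFS[1]) * c_b + COEFFS[2]) * c_b + COEFFS[3]


n_changes = sign_changes(COEFFS)          # one sign change, so one positive root
c_b = bisect_root(cubic, 0.0, 1.0)
c_c = K2 * c_b ** 2                       # equation (12.50)
c_a = (K2 / K1) * c_b ** 3                # equation (12.51)
```

Under these assumptions the positive root reproduces the actual steady state of Table 12.2 to within rounding. The EKF values in the same table also satisfy equations (12.50) and (12.51) closely with a negative $c_B$, which is consistent with the filter settling on an unphysical solution of the same steady-state relations.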

Component    Predicted EKF Steady State    Actual Steady State
A            −0.0122                       0.0224
B            −0.1364                       0.2006
C             1.1746                       0.6411

Table 12.3: EKF steady-state behavior, no measurement or state noise

CSTR

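From first principles, the model for a well-mixed, isothermal CSTR is

\[ \dot{x} = \frac{Q_f}{V_R} c_f - \frac{Q_o}{V_R} x + \nu^T r(x) \tag{12.53} \]

\[ c_f = \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.54} \]

\[ x_0 = \begin{bmatrix} 0.5 & 0.05 & 0 \end{bmatrix}^T \tag{12.55} \]

\[ Q_f = Q_o = 1 \tag{12.56} \]

\[ V_R = 100 \tag{12.57} \]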

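We consider state estimation with the following measurement and parameters:

\[ y_k = \begin{bmatrix} RT & RT & RT \end{bmatrix} x_k \tag{12.58a} \]

\[ \Delta t = t_{k+1} - t_k = 0.25 \tag{12.58b} \]

\[ \Pi_0 = \mathrm{diag}(4^2, 4^2, 4^2) \tag{12.58c} \]

\[ G_k = \mathrm{diag}(1, 1, 1) \tag{12.58d} \]

\[ Q_k = \mathrm{diag}(0.001^2, 0.001^2, 0.001^2) \tag{12.58e} \]

\[ R_k = 0.25^2 \tag{12.58f} \]

\[ \bar{x}_0 = \begin{bmatrix} 0 & 0 & 3.5 \end{bmatrix}^T \tag{12.58g} \]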

Again, the initial guess, $\bar{x}_0$, is poor. The actual plant experiences $N(0, Q_k)$ noise in the state and $N(0, R_k)$ noise in the measurements. We now examine the estimation performance of both the EKF and MHE for this system.

Figure 12.17 demonstrates that, similarly to the batch case, the EKF appears to converge to an incorrect steady-state estimate. Calculating the EKF steady state via equations (12.33), assuming no state or measurement noise, confirms this observation and yields the results in Table 12.3. Some steady-state analysis of the system sheds light on the cause of this phenomenon. Assuming no noise in the steady-state measurement $y_{ss}$, the system has one linear steady-state measurement

\[ c_A + c_B + c_C = \frac{y_{ss}}{RT} \tag{12.59} \]

and one linear combination resulting from $\rho$, the null space of the stoichiometric matrix

\[ \rho = \begin{bmatrix} 3 & 1 & 2 \end{bmatrix} \tag{12.60} \]

\[ 3c_A + c_B + 2c_C = 3c_{Af} + c_{Bf} + 2c_{Cf} \tag{12.61} \]

Therefore the steady-state calculation is a nonlinear problem, and this system satisfies both conditions required for EKF failure.

Figure 12.17: (a) concentrations of A, B, and C versus time; (b) pressure versus time.
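A short check shows why the vector in equation (12.60) removes the reaction rates from the CSTR balance and leaves the linear relationship (12.61). The Python sketch below uses hypothetical helper names and arbitrary test values for the state and the rates; only the stoichiometry, the feed, and the ratio $Q_f / V_R = 0.01$ are taken from the text.

```python
NU = [[-1, 1, 1],    # A <-> B + C
      [0, -2, 1]]    # 2B <-> C
NULL_VECTOR = [3, 1, 2]           # equation (12.60)
C_FEED = [0.5, 0.05, 0.0]         # equation (12.54)
Q_OVER_V = 1.0 / 100.0            # Q_f / V_R with Q_f = Q_o, equations (12.56)-(12.57)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def invariant(c):
    """Linear combination 3 c_A + c_B + 2 c_C."""
    return dot(NULL_VECTOR, c)


def invariant_rate(x, rates):
    """Time derivative of invariant(x) under the CSTR model (12.53)."""
    dxdt = [Q_OVER_V * (cf - xi) + sum(NU[i][j] * rates[i] for i in range(len(NU)))
            for j, (cf, xi) in enumerate(zip(C_FEED, x))]
    return invariant(dxdt)


# Arbitrary state and arbitrary reaction rates: the rates drop out of the result.
x_test, rates_test = [0.1, 0.2, 0.3], [7.0, -3.0]
lhs = invariant_rate(x_test, rates_test)
rhs = Q_OVER_V * (invariant(C_FEED) - invariant(x_test))
```

Because each row of $\nu$ is orthogonal to the vector, the combination evolves as a linear first-order balance regardless of the rate expressions, and at steady state it equals its feed value, 1.55 for the feed in equation (12.54).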

Figure 12.18 presents the EKF estimation results for implementation of a clipping strategy. Although clipping eliminates estimation error, this strategy causes a lengthy period of overestimation of the pressure, in some cases by two orders of magnitude.

Figure 12.19 presents the results of applying MHE. For these results, we have constrained the state to prevent estimation of negative concentrations. These figures demonstrate that MHE swiftly converges to the correct state estimates.

Figure 12.18: Clipped extended Kalman filter results: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) pressure estimates.

12.7.3 Example 3

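Reconsider the batch model given in section 12.7.2, but with the following updated parameters

\[ k = \begin{bmatrix} 0.5 & 0.4 & 0.2 & 0.1 \end{bmatrix}^T \tag{12.62a} \]

\[ R_k = 0.1^2 \tag{12.62b} \]

and new measurement

\[ y_k = \begin{bmatrix} -1 & 1 & 1 \end{bmatrix} x_k \tag{12.63} \]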

Figure 12.19: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Note that the measurement has no physical meaning. Solving for the steady-state solution in terms of $c_B$ yields

\[ 0 = -\frac{K_2}{K_1} c_B^3 + K_2 c_B^2 + c_B - y_{ss} \tag{12.64} \]

Again using Descartes’ rule of signs and taking into account the specified parameters, equation (12.64) has two positive roots and one negative root. In contrast to the previous example, there are multiple physically realizable steady states. We now examine the effect of poor initial conditions upon the estimation behavior of the EKF and MHE.
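The root structure of equation (12.64) can be illustrated numerically. In the Python sketch below the equilibrium constants follow from equation (12.62a) as $K_1 = 0.5/0.4$ and $K_2 = 0.2/0.1$, whereas the steady-state measurement `Y_SS = 0.5` is a hypothetical value chosen only for illustration (the text does not report $y_{ss}$). The helper `real_roots` is likewise an assumption: it scans a grid for sign changes and refines each bracket by bisection.

```python
K1, K2 = 0.5 / 0.4, 0.2 / 0.1    # equilibrium constants from equation (12.62a)
Y_SS = 0.5                        # hypothetical steady-state measurement (illustration only)


def cubic(c_b, y_ss=Y_SS):
    """Right-hand side of equation (12.64)."""
    return -(K2 / K1) * c_b ** 3 + K2 * c_b ** 2 + c_b - y_ss


def real_roots(f, lo=-5.0, hi=5.0, n=1000, tol=1e-12):
    """Real roots of f on [lo, hi]: grid scan for sign changes, then bisection."""
    roots = []
    width = (hi - lo) / n
    for i in range(n):
        a, b = lo + i * width, lo + (i + 1) * width
        fa, fb = f(a), f(b)
        if fa * fb < 0:
            while b - a > tol:
                m = 0.5 * (a + b)
                fm = f(m)
                if fm * fa > 0:
                    a, fa = m, fm
                else:
                    b = m
            roots.append(0.5 * (a + b))
    return roots


roots = real_roots(cubic)
n_positive = sum(1 for r in roots if r > 0)
n_negative = sum(1 for r in roots if r < 0)
```

For this illustrative measurement the cubic has two positive roots and one negative root, in agreement with the statement above, so constraining the estimates to be nonnegative does not by itself single out a unique steady state.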

Table 12.4 presents the a priori initial conditions for state estimation. Comparison of Figures 12.20 and 12.21 demonstrates that given a poor estimate of the initial state, the EKF cannot reconstruct the evolution of the state while MHE can. Figures 12.22 and 12.23 show that given an even poorer estimate of the initial state, both the EKF and MHE fail to reconstruct the evolution of the state. To improve the quality of the estimates, we constrain the concentrations in the estimators so that

\[ 0 \le c_j \le 4.5, \qquad j = A, B, C \tag{12.65} \]

Figures          $\bar{x}_0$
12.20, 12.21     [3  0.1  3]^T
12.22-12.27      [4  0  4]^T

Table 12.4: A priori initial conditions for state estimation

Figure 12.20: (a) concentrations of A, B, and C versus time; (b) measurement versus time.

Figure 12.21: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) pressure estimates.

Figures 12.24 and 12.25 demonstrate that with this extra knowledge, MHE converges to the true state estimates while the clipped EKF estimates are trapped on the constraint. Finally, we relax the concentration constraints to

\[ 0 \le c_j \le 5.5, \qquad j = A, B, C \tag{12.66} \]

Not surprisingly, the clipped EKF estimates remain trapped on the constraint, as shown in Figure 12.26. The quality of the MHE estimates is a function of the estimation horizon, as seen in Figure 12.27. If the estimation horizon is too short, the MHE estimates are pinned against the state constraint; increasing the horizon remedies this problem. For short horizons, we suspect that the data in the estimation horizon cannot overcome the biasing of the arrival cost approximation (with the smoothing scheme), hence resulting in state estimates pinned against the constraint. Changing arrival cost approximations (e.g. switching from the smoothing scheme to a uniform prior) when constraints are active may constitute one way of addressing this problem without having to increase the estimation horizon.

Figure 12.22: (a) concentrations of A, B, and C versus time; (b) measurement versus time.

Table 12.5 summarizes the estimation results examined in this section.

12.7.4 Computational Expense

Table 12.6 summarizes the average computational expense per time step for each of the examples presented in this chapter. All computations were performed in GNU Octave (http://www.octave.org/) on a 2.0-GHz processor. MHE computations were performed using the NMPC toolbox (http://www.che.wisc.edu/~tenny/nmpc/). Not surprisingly, MHE requires substantially more computational time than the EKF. This increase results because

1. MHE employs optimization while the EKF uses a one-step linearization, and

2. MHE calculates sensitivities over a trajectory of states whereas the discrete EKF calculates only a single sensitivity.

Figure 12.23: Moving-horizon estimation results, states constrained to x ≥ 0, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates.

Figure 12.24: Clipped extended Kalman filter results, states clipped to 0 ≤ x ≤ 4.5: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) estimates.

12.8 Conclusions

Virtually all chemical engineering systems contain nonlinear dynamics and/or state constraints. The need to incorporate this information into state estimation is illustrated by the examples presented in this chapter. These examples demonstrate that even with perfect concordance between the model and the physical plant, it is possible for the nominal EKF to fail to converge to the true state when

1. the system model and measurement are such that multiple states satisfy the steady-state measurement, and

2. the estimator is given a poor initial guess of the state.

Figure 12.25: Moving-horizon estimation results, states constrained to 0 ≤ x ≤ 4.5, smoothing initial covariance update, and horizon length of 2.5 time units (N = 11 measurements): (a) evolution of the actual (solid line) and MHE updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates.

Given the same estimator tuning, model, and measurements as the EKF, MHE provides improved state estimation and greater robustness to poor guesses of the initial state. These benefits arise because MHE incorporates physical state constraints into an optimization, accurately uses the nonlinear model, and optimizes over a trajectory of states and measurements.

The issue of global versus local optimization and the selection of an arrival cost also have substantial impact on the behavior of MHE. If one could implement a global optimization strategy in real time, approximating the arrival cost with a uniform prior and making the estimation horizon reasonably long is preferable to an approximate multivariate normal arrival cost because of the latter’s biasing effect on the state estimates. With local optimization, our results indicate that multivariate normal approximations to the arrival cost combined with judicious use of constraints can prevent multiple optima in the estimator and generate acceptable estimator performance.

Figure 12.26: Clipped extended Kalman filter results, states clipped to 0 ≤ x ≤ 5.5: (a) evolution of the actual (solid line) and clipped EKF updated (dashed line) concentrations; (b) evolution of the actual (solid line), measured (points), and clipped EKF updated (dashed line) estimates.

One potential pitfall of employing local optimization is the inability to identify multiple modes in the a posteriori distribution. Example 3 of this chapter illustrates this pitfall perfectly: attraction of MHE estimates to a mode in the infeasible region leads to state estimates trapped on constraints even though another mode lies within the feasible region. To overcome this difficulty, we propose that Monte Carlo particle filters may prove useful in estimating the a posteriori distribution. These filters present one method of identifying the appearance and disappearance of small numbers of local optima in the a posteriori distribution, but do not provide a reasonable framework for accurately reconstructing the mode of this distribution. Using particle filters to estimate the arrival cost in MHE presents one manner of better approximating the mode of the a posteriori distribution. Additionally, identifying local optima in the arrival cost distribution should yield better initial guesses for the local MHE optimization.

Figure 12.27: Moving-horizon estimation results, states constrained to 0 ≤ x ≤ 5.5, and smoothing initial covariance update: (a) effect of horizon length on the evolution of the actual (solid line) and MHE updated (dashed line) C concentration; (b) evolution of the actual (solid line), measured (points), and MHE updated (dashed line) estimates. Values of N on the plots correspond to the horizon length in time units.

It is reasonable to expect that more complicated models with less restrictive assumptions than the ones proposed here may yield multiple optima corresponding to both physically realizable and unrealizable states. Since MHE permits incorporation of constraints into its optimization, it is the natural choice for preventing estimation of physically unrealizable states. Since MHE employs a trajectory of measurements as opposed to measurements at only a single time, it is better suited than the EKF for distinguishing among the remaining physically realizable states.

Estimator   x0            Constraints     Horizon Length             Estimates Converge?
EKF         [3 0.1 3]^T   x ≥ 0           NA                         No
MHE         [3 0.1 3]^T   x ≥ 0           2.5 time units (N = 11)    Yes
EKF         [4 0 4]^T     x ≥ 0           NA                         No
MHE         [4 0 4]^T     x ≥ 0           2.5 time units (N = 11)    No
EKF         [4 0 4]^T     0 ≤ x ≤ 4.5     NA                         No
MHE         [4 0 4]^T     0 ≤ x ≤ 4.5     2.5 time units (N = 11)    Yes
EKF         [4 0 4]^T     0 ≤ x ≤ 5.5     NA                         No
MHE         [4 0 4]^T     0 ≤ x ≤ 5.5     2.5 time units (N = 11)    No
MHE         [4 0 4]^T     0 ≤ x ≤ 5.5     5 time units (N = 21)      No
MHE         [4 0 4]^T     0 ≤ x ≤ 5.5     10 time units (N = 41)     Yes

Table 12.5: Effects of a priori initial conditions, constraints, and horizon length on state estimation. N denotes the number of measurements in the estimation horizon.

Example   Estimator   Horizon Length   Average CPU Time per Time Step (sec)
12.7.2    EKF         N = 1            0.003
12.7.2    MHE         N = 11           0.737
12.7.2    EKF         N = 1            0.005
12.7.2    MHE         N = 11           0.676
12.7.3    EKF         N = 1            0.006
12.7.3    MHE         N = 11           1.756
12.7.3    MHE         N = 21           4.712
12.7.3    MHE         N = 41           6.899

Table 12.6: Comparison of MHE and EKF computational expense. N denotes the number of measurements in the estimation horizon.


From Equation   To Equation   Manipulation of Boxed Quantity
(12.76)         (12.77)       P(a|b) = P(a,b) / P(b)
(12.77)         (12.78)       P(a,b,c) = P(a,b|c) P(c)
(12.78)         (12.79)       P(a,b|c) = P(a|b,c) P(b|c)
(12.79)         (12.80)       1 by the Markov property
(12.80)         (12.81)       P(a,b) / P(b) = P(a|b)

The filtering formulation of MHE is thus

\[
p(x_{T-N+1},\ldots,x_T \mid y_0,\ldots,y_T) =
\frac{P(x_{T-N+1} \mid y_0,\ldots,y_{T-N})}{P(y_{T-N+1},\ldots,y_T \mid y_0,\ldots,y_{T-N})}
\left(\prod_{k=T-N+1}^{T-1} P(x_{k+1} \mid x_k)\right)
\left(\prod_{k=T-N+1}^{T} P(y_k \mid x_k)\right)
\tag{12.83}
\]

12.9.3 Equivalence of the Full Information and Least Squares Formulations

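Consider the model given by equation (12.1). This model assumes that each w_k and v_k is normally distributed, and that the matrix G(x_k,u_k) has full column rank. We would like to calculate the maximum likelihood estimate:

\begin{align*}
& \arg\max_{x_0,\ldots,x_T} P(x_0,\ldots,x_T \mid y_0,\ldots,y_T) \tag{12.84} \\
&= \arg\min_{x_0,\ldots,x_T} -\log P(x_0,\ldots,x_T \mid y_0,\ldots,y_T) \tag{12.85} \\
&= \arg\min_{x_0,\ldots,x_T} -\log \left[ P(x_0) \left(\prod_{k=0}^{T-1} P(x_{k+1} \mid x_k)\right) \left(\prod_{k=0}^{T} P(y_k \mid x_k)\right) \right] \tag{12.86} \\
&= \arg\min_{x_0,\ldots,x_T} -\log P(x_0) - \sum_{k=0}^{T-1} \log P(x_{k+1} \mid x_k) - \sum_{k=0}^{T} \log P(y_k \mid x_k) \tag{12.87}
\end{align*}

The normalizing factor P(y_0,\ldots,y_T) does not depend on the states, so it is dropped in (12.86) without changing the minimizer.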

We can calculate the conditional probabilities in equation (12.87) by first rewriting the joint distributions as functions of independent random variables

\begin{align*}
(x_{k+1}, x_k) &= f(w_k, x_k) \tag{12.88} \\
(y_k, x_k) &= f(v_k, x_k) \tag{12.89}
\end{align*}

These density calculations are presented in the following subsections.

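Calculation of P(x_{k+1}|x_k)

Given the model (12.1), derive the density P(x_{k+1}|x_k) under the assumption that G(x_k,u_k) has full column rank. We derive this density by writing the joint density P(x_{k+1},x_k) as a function of the joint density P(w_k,x_k). For this conversion to hold, we must show that

1. (w_k,x_k) can be uniquely written in terms of (x_{k+1},x_k). We trivially have

\[ x_k = x_k \tag{12.90} \]

Also, we have

\begin{align*}
x_{k+1} &= F(x_k,u_k) + G(x_k,u_k) w_k \tag{12.91} \\
G(x_k,u_k) w_k &= x_{k+1} - F(x_k,u_k) \tag{12.92} \\
G(x_k,u_k)^T G(x_k,u_k) w_k &= G(x_k,u_k)^T \left(x_{k+1} - F(x_k,u_k)\right) \tag{12.93} \\
w_k &= \left(G(x_k,u_k)^T G(x_k,u_k)\right)^{-1} G(x_k,u_k)^T \left(x_{k+1} - F(x_k,u_k)\right) \tag{12.94}
\end{align*}

Since G(x_k,u_k) has full column rank, equation (12.94) has a unique solution.

2. We must show that the following matrix has full rank:

\begin{align*}
H_1 &= \begin{bmatrix}
\dfrac{\partial x_{k+1}}{\partial w_k^T} & \dfrac{\partial x_{k+1}}{\partial x_k^T} \\[1ex]
\dfrac{\partial x_k}{\partial w_k^T} & \dfrac{\partial x_k}{\partial x_k^T}
\end{bmatrix} \tag{12.95} \\
&= \begin{bmatrix}
G(x_k,u_k) & \dfrac{\partial F(x_k,u_k)}{\partial x_k^T} + \dfrac{\partial G(x_k,u_k)}{\partial x_k^T} w_k \\[1ex]
0 & I
\end{bmatrix} \tag{12.96}
\end{align*}

Clearly H_1 has full rank since G(x_k,u_k) has full column rank.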

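Since these conditions hold, the inverse function theorem tells us that

\begin{align*}
P(x_{k+1},x_k) &= P(w_k,x_k)\,|H_1(x_k,w_k)|^{-1} \tag{12.97} \\
&= P(w_k) P(x_k)\,|H_1(x_k,w_k)|^{-1} \quad \text{(independence of } w_k \text{ and } x_k\text{)} \tag{12.98} \\
&= P\left(w_k = \left(G(x_k,u_k)^T G(x_k,u_k)\right)^{-1} G(x_k,u_k)^T \left(x_{k+1} - F(x_k,u_k)\right)\right) P(x_k)\,|H_1(x_k,w_k)|^{-1} \tag{12.99}
\end{align*}

Now solve for the desired conditional density:

\begin{align*}
P(x_{k+1} \mid x_k) &= \frac{P(x_{k+1},x_k)}{P(x_k)} \tag{12.100} \\
&= P\left(w_k = \left(G(x_k,u_k)^T G(x_k,u_k)\right)^{-1} G(x_k,u_k)^T \left(x_{k+1} - F(x_k,u_k)\right)\right) |H_1(x_k,w_k)|^{-1} \tag{12.101}
\end{align*}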
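The inversion in equation (12.94) can be checked numerically. The following is a minimal sketch, not the implementation used in this work: the function name `recover_state_noise` and the example matrices are hypothetical, and the model value F(x_k,u_k) is passed in as a precomputed vector.

```python
import numpy as np

def recover_state_noise(x_next, f_val, G):
    """Solve G w = x_next - f_val for w via the normal equations, as in (12.94)."""
    G = np.asarray(G, dtype=float)
    if np.linalg.matrix_rank(G) < G.shape[1]:
        raise ValueError("G must have full column rank for a unique solution")
    residual = np.asarray(x_next, dtype=float) - np.asarray(f_val, dtype=float)
    # (G^T G) w = G^T (x_next - f_val); G^T G is invertible when G has full column rank
    return np.linalg.solve(G.T @ G, G.T @ residual)

# Hypothetical example: three states driven by two noise components
G = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
w_true = np.array([0.3, -0.1])
f_val = np.array([1.0, 2.0, 3.0])     # stands in for F(x_k, u_k)
x_next = f_val + G @ w_true           # equation (12.91)
w = recover_state_noise(x_next, f_val, G)
```

If G loses column rank, G^T G becomes singular and the noise can no longer be recovered uniquely, which is why the sketch checks the rank first.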


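Calculation of P(y_k|x_k)

Given the model (12.1), derive the density P(y_k|x_k). We derive this density by writing the joint density P(y_k,x_k) as a function of the joint density P(v_k,x_k). For this conversion to hold, we must show that

1. (v_k,x_k) can be uniquely written in terms of (y_k,x_k). We clearly have

\begin{align*}
v_k &= y_k - h(x_k) \tag{12.102} \\
x_k &= x_k \tag{12.103}
\end{align*}

2. We must show that the following matrix has full rank:

\begin{align*}
H_2 &= \begin{bmatrix}
\dfrac{\partial y_k}{\partial v_k^T} & \dfrac{\partial y_k}{\partial x_k^T} \\[1ex]
\dfrac{\partial x_k}{\partial v_k^T} & \dfrac{\partial x_k}{\partial x_k^T}
\end{bmatrix} \tag{12.104} \\
&= \begin{bmatrix}
I & \dfrac{\partial h(x_k)}{\partial x_k^T} \\[1ex]
0 & I
\end{bmatrix} \tag{12.105}
\end{align*}

Clearly H_2 has full rank.

Since these conditions hold, the inverse function theorem tells us that

\begin{align*}
P(y_k,x_k) &= P(v_k,x_k)\,|H_2(x_k)|^{-1} \tag{12.106} \\
&= P(v_k) P(x_k) \quad \text{(independence of } v_k \text{ and } x_k\text{)} \tag{12.107} \\
&= P(v_k = y_k - h(x_k))\,P(x_k) \tag{12.108}
\end{align*}

Now solve for the desired conditional density:

\begin{align*}
P(y_k \mid x_k) &= \frac{P(y_k,x_k)}{P(x_k)} \tag{12.109} \\
&= P(v_k = y_k - h(x_k)) \tag{12.110}
\end{align*}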

Derivation of the Least Squares Problem

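We have assumed that both w_k and v_k are normally distributed. N(m,P)-distributed multivariate normals have probability densities of the form

\[
P(x) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left[-\frac{1}{2}(x-m)^T P^{-1} (x-m)\right] \tag{12.111}
\]

where n is the number of elements of the variable x. Therefore

\begin{align*}
\arg\max_{x_k,x_{k+1}} P(x_{k+1} \mid x_k) &= \arg\min_{x_k,x_{k+1}} -\log P(x_{k+1} \mid x_k) \tag{12.112} \\
&= \arg\min_{x_k,x_{k+1}} -\log\left(P(w_k)\,|H_1(x_k,w_k)|^{-1}\right) \tag{12.113} \\
&= \arg\min_{x_k,x_{k+1}} \frac{1}{2} w_k^T Q_k^{-1} w_k + \log\left(|H_1(x_k,w_k)|\right) \tag{12.114} \\
&\quad \text{s.t.: } x_{k+1} = F(x_k,u_k) + G(x_k,u_k) w_k \tag{12.115}
\end{align*}

and

\begin{align*}
\arg\max_{x_k} P(y_k \mid x_k) &= \arg\min_{x_k} -\log P(y_k \mid x_k) \tag{12.116} \\
&= \arg\min_{x_k} -\log P(v_k) \tag{12.117} \\
&= \arg\min_{x_k} \frac{1}{2} v_k^T R_k^{-1} v_k \tag{12.118} \\
&\quad \text{s.t.: } y_k = h(x_k) + v_k \tag{12.119}
\end{align*}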

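Plugging these values into equation (12.87) and multiplying the objective by two yields the minimization

\begin{align*}
\Phi_T = \min_{x_0,\ldots,x_T}\; & \Gamma(x_0) + \sum_{k=0}^{T-1} \left( w_k^T Q_k^{-1} w_k + 2\log|H_1(x_k,w_k)| \right) + \sum_{k=0}^{T} v_k^T R_k^{-1} v_k \tag{12.120a} \\
\text{s.t.: } & \Gamma(x_0) = (x_0 - \bar{x}_0)^T \Pi^{-1} (x_0 - \bar{x}_0) \tag{12.120b} \\
& x_{k+1} = F(x_k,u_k) + G(x_k,u_k) w_k \tag{12.120c} \\
& y_k = h(x_k) + v_k \tag{12.120d}
\end{align*}

If G(x_k,u_k) is not a function of x_k, then the determinant |H_1(x_k,w_k)| is constant and optimization (12.120) becomes a pure least-squares problem.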
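For the pure least-squares case, the objective can be evaluated directly for a candidate state trajectory. The sketch below is illustrative only and rests on stated assumptions: G is taken as the identity (so the determinant term is constant and omitted), the covariances are time-invariant, and the names `full_information_cost`, `F`, and `h` are hypothetical placeholders for a user-supplied model.

```python
import numpy as np

def full_information_cost(xs, ys, F, h, x_bar0, Pi_inv, Q_inv, R_inv):
    """Evaluate Gamma(x0) + sum_k w_k' Q^-1 w_k + sum_k v_k' R^-1 v_k, assuming G = I."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    e0 = xs[0] - x_bar0
    cost = e0 @ Pi_inv @ e0                 # arrival penalty on the initial state
    for k in range(len(xs) - 1):
        w = xs[k + 1] - F(xs[k])            # state noise implied by the trajectory
        cost += w @ Q_inv @ w
    for k in range(len(xs)):
        v = ys[k] - h(xs[k])                # measurement residual
        cost += v @ R_inv @ v
    return float(cost)

# Hypothetical scalar example: x_{k+1} = 0.5 x_k + w_k, y_k = x_k + v_k
F = lambda x: 0.5 * x
h = lambda x: x
I1 = np.eye(1)
ys = np.array([[1.0], [0.5], [0.25]])
exact = np.array([[1.0], [0.5], [0.25]])
perturbed = np.array([[1.0], [0.6], [0.25]])
cost_exact = full_information_cost(exact, ys, F, h, np.array([1.0]), I1, I1, I1)
cost_perturbed = full_information_cost(perturbed, ys, F, h, np.array([1.0]), I1, I1, I1)
```

A trajectory that satisfies the model and matches the data exactly incurs zero cost; any perturbation is penalized through both the implied state noise and the measurement residual. An optimizer would minimize this function over the trajectory.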

12.9.4 Evolution of a Nonlinear Probability Density

We are interested in determining formulas for the evolution of the a posteriori probability density for the system

\begin{align*}
x_{k+1} &= \begin{bmatrix}
\dfrac{x_{k,1}}{2k\Delta t\,x_{k,1} + 1} \\[2ex]
x_{k,2} + \dfrac{k\Delta t\,x_{k,1}^2}{2k\Delta t\,x_{k,1} + 1}
\end{bmatrix} + w_k \tag{12.121a} \\
y_k &= \begin{bmatrix} 1 & 1 \end{bmatrix} x_k + v_k \tag{12.121b}
\end{align*}

We view future states (x_k’s) as functions of the random variables with known statistics (x_0, w_k’s, and v_k’s). First update the a priori estimate, \bar{x}_0, with the first measurement, y_0, by

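1. writing the joint probability density P(x_0,y_0) as a function of P(x_0,v_0)

\begin{align*}
P(x_0,y_0) &= P(x_0,v_0) \begin{vmatrix} I & 0 \\ C & I \end{vmatrix}^{-1} \tag{12.122a} \\
&= P(x_0) P(v_0) \quad (x_0 \text{ and } v_0 \text{ are independent}) \tag{12.122b}
\end{align*}

2. calculating the conditional probability density P(x_0|y_0)

\begin{align*}
P(x_0 \mid y_0) &= \frac{P(x_0,y_0)}{P(y_0)} \tag{12.123a} \\
&= \frac{P(x_0,y_0)}{\int_{-\infty}^{\infty} P(x_0,y_0)\,dx_0} \tag{12.123b} \\
&= \frac{P(x_0)\,P_{v_0}(y_0 - Cx_0)}{\int_{-\infty}^{\infty} P(x_0)\,P_{v_0}(y_0 - Cx_0)\,dx_0} \tag{12.123c} \\
&= \frac{\exp\left[-\frac{1}{2}(x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) - \frac{1}{2}(y_0 - Cx_0)^T R_k^{-1} (y_0 - Cx_0)\right]}{\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}(x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) - \frac{1}{2}(y_0 - Cx_0)^T R_k^{-1} (y_0 - Cx_0)\right] dx_0} \tag{12.123d}
\end{align*}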

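Now propagate P(x_0|y_0) to the next measurement time to obtain P(x_1|y_0):

\begin{align*}
P(x_1,w_0 \mid y_0) &= P(x_0,w_0 \mid y_0) \begin{vmatrix}
\dfrac{\partial x_1}{\partial x_0^T} & \dfrac{\partial x_1}{\partial w_0^T} \\[1ex]
\dfrac{\partial w_0}{\partial x_0^T} & \dfrac{\partial w_0}{\partial w_0^T}
\end{vmatrix}^{-1} \tag{12.124a} \\
&= P(x_0,w_0 \mid y_0) \left(2k\Delta t\,x_{0,1} + 1\right)^2 \tag{12.124b} \\
&= P(x_0 \mid y_0) P(w_0) \left(2k\Delta t\,x_{0,1} + 1\right)^2 \tag{12.124c}
\end{align*}

\begin{align*}
P(x_1 \mid y_0) &= \int_{-\infty}^{\infty} P(x_1,w_0 \mid y_0)\,dw_0 \tag{12.125} \\
&= \int_{-\infty}^{\infty} P(x_0 \mid y_0) P(w_0) \left(2k\Delta t\,x_{0,1} + 1\right)^2 dw_0 \tag{12.126}
\end{align*}

Updating with the measurement y_1 in the same manner as for y_0 gives

\begin{align*}
P(x_1 \mid y_0,y_1) &= \frac{P(x_1 \mid y_0)\,P_{v_1}(y_1 - Cx_1)}{\int_{-\infty}^{\infty} P(x_1 \mid y_0)\,P_{v_1}(y_1 - Cx_1)\,dx_1} \tag{12.127a} \\
&= \frac{\int_{-\infty}^{\infty} \Omega_1\,dw_0}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Omega_1\,dw_0\,dx_1} \tag{12.127b}
\end{align*}

in which

\[
\Omega_1 = \int_{-\infty}^{\infty} \left(2k\Delta t\,x_{0,1} + 1\right)^2 \exp\left[-\frac{1}{2}\left( (x_0 - \bar{x}_0)^T \Pi_0^{-1} (x_0 - \bar{x}_0) + w_0^T Q_k^{-1} w_0 + \sum_{j=0}^{1} v_j^T R_k^{-1} v_j \right)\right] dw_0 \tag{12.128}
\]

For future times, it is straightforward to derive equations (12.25) and (12.26).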
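The measurement update in equation (12.123c) can be approximated by evaluating the unnormalized density on a grid and normalizing numerically. The following sketch is only an illustration under stated assumptions: a two-state system with scalar measurement y = Cx + v, a normal prior, and hypothetical names (`grid_measurement_update`, the grid limits, and the numerical values). Because this first update is linear and Gaussian, its mean can be compared against the Kalman filter update as a sanity check.

```python
import numpy as np

def grid_measurement_update(grid1, grid2, x_bar, Pi, y, C, R):
    """Approximate P(x0 | y0) on a rectangular grid for a scalar measurement."""
    X1, X2 = np.meshgrid(grid1, grid2, indexing="ij")
    pts = np.stack([X1.ravel(), X2.ravel()], axis=1)        # grid points, shape (n, 2)
    dx = pts - x_bar
    prior_exp = -0.5 * np.einsum("ni,ij,nj->n", dx, np.linalg.inv(Pi), dx)
    innov = y - pts @ C                                      # y0 - C x0 at each point
    like_exp = -0.5 * innov**2 / R
    unnorm = np.exp(prior_exp + like_exp)                    # numerator of (12.123d)
    cell = (grid1[1] - grid1[0]) * (grid2[1] - grid2[0])     # area of one grid cell
    post = unnorm / (unnorm.sum() * cell)                    # Riemann-sum normalization
    mean = (pts * post[:, None]).sum(axis=0) * cell
    return post.reshape(X1.shape), mean

# Hypothetical values for illustration
grid = np.linspace(-6.0, 6.0, 241)
x_bar = np.array([0.0, 0.0])
Pi = np.eye(2)
C = np.array([1.0, 1.0])
R = 1.0
y = 1.5
post, mean = grid_measurement_update(grid, grid, x_bar, Pi, y, C, R)

# Kalman filter mean for the same linear-Gaussian update
K = Pi @ C / (C @ Pi @ C + R)
kalman_mean = x_bar + K * (y - C @ x_bar)
```

Grid evaluation scales poorly with state dimension, so this approach is practical only for low-dimensional illustrations such as this two-state example.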


Notation

A_k          state sensitivity at time t_k
C_k          linearization of the measurement function h(x_k)
c_f          inlet concentration vector
c_j          concentration of species j
F(x_k,u_k)   solution to a first principles, differential equation model
f(x)         reconstructed distribution using density estimation
G(x_k,u_k)   full column rank matrix
h            window width for density estimation
h(x_k,t_k)   model prediction of the measurement at time t_k
K            kernel function for density estimation
k            reaction rate vector
k_j          jth reaction rate
L            filter gain matrix
N            MHE horizon length
N_s          number of Monte Carlo samples
n            number of measurements
n_m          number of measurements that can be written as a linear combination of states
N(m,P)       normal distribution with mean m and covariance P
P            probability
P_j          partial pressure of species j
Q            covariance of the state noise w_k
Q_f          volumetric flow rate into the reactor
Q_o          effluent volumetric flow rate
q_j          weight of the jth Monte Carlo sample
R            covariance of the measurement noise v_k
R            ideal gas constant
r(x)         reaction rate vector
T            reactor temperature
t            time
u_k          input at time t_k
V_R          reactor volume
v_k          N(0,R_k) measurement noise at time t_k
w_k          N(0,Q_k) state noise at time t_k
x            state
x_k          state at time t_k
x_{j|k}      estimated state at time t_j given measurements to time t_k
\bar{x}_0    a priori estimate of x_0
x_k          estimated state at time t_k
y_k          measurement at time t_k
y_ss         steady-state measurement
∆t           time increment
ε            arbitrary constant
η            nullity of the stoichiometric matrix ν
ν            stoichiometric matrix
Π            covariance matrix for the a priori state estimate \bar{x}_0
Φ_k          objective function value at time t_k
ρ            rank of the stoichiometric matrix ν
ρ            null space of the stoichiometric matrix ν

Chapter 13

Closed Loop Performance Using Moving-Horizon Estimation

We now turn our attention to the effect of local optima in the estimator under closed-loop control. Figure 13.1 details the interaction between the process, sensor, estimator, target calculation, and regulator. For this section, we use the nonlinear model-predictive control (NMPC) regulator and target calculation contained in the NMPC toolbox [146, 149, 150], and consider the effect of employing either the extended Kalman filter (EKF) or moving-horizon estimation (MHE) as the estimator.

13.1 Regulator

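The goal of the regulator is to drive the state to its set point. Nonlinear model-predictive control solves the on-line optimization

\begin{align*}
\min_{u_k,x_k}\; & \sum_{k=0}^{N_c-1} \underbrace{L(x_k,u_k)}_{\text{Stage cost}} + \underbrace{x_{N_c}^T P x_{N_c}}_{\text{Tail cost}} \tag{13.1} \\
& x_{k+1} = F(x_k,u_k) \tag{13.2} \\
& H x_k \le h, \quad D u_k \le d, \quad M_u \Delta u_k \le m_u \tag{13.3}
\end{align*}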

in which

• u_k is the input vector at time t_k;

• x_k is the state vector at time t_k;

• N_c is the control horizon length;

• P is the penalty for the terminal state in the control horizon;

• H and h are the state constraint matrix and vector, respectively;

• D and d are the input constraint matrix and vector, respectively; and

• M_u and m_u are the change in input constraint matrix and vector, respectively.

Figure 13.1: General diagram of closed-loop control for the model-predictive control framework. The goal of control is to drive the process measurements (y_k’s) to a desired measurement and input set point (y_set and u_set, respectively). Model-predictive control requires models for both the process and sensor. The estimator uses these models along with process measurement y_k and input u_k information to estimate the state x_k and disturbances d_k. The target calculation determines the state and input targets (x_t and u_t) given the state estimates. The regulator uses these targets and the state estimate to calculate the next input to the process.

The subscript k denotes that the measurement time is t_k. Here, we use the expectation of the stochastic model (12.1a) as the desired control model. Also, one generally assumes that the control horizon is sufficiently long so that the linear tail cost adequately approximates the infinite horizon solution.
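To make the structure of objective (13.1) concrete, the following sketch evaluates the cost of a given input sequence by rolling the model (13.2) forward. It is not the NMPC toolbox implementation: the function name `nmpc_cost` is hypothetical, the stage cost is assumed to be quadratic, L(x,u) = x'Qx + u'Ru in deviation variables, and the constraints (13.3) are not enforced.

```python
import numpy as np

def nmpc_cost(x0, inputs, F, Q, R, P):
    """Sum an assumed quadratic stage cost over the horizon and add the tail cost."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for u in np.asarray(inputs, dtype=float):
        cost += x @ Q @ x + u @ R @ u      # stage cost L(x_k, u_k)
        x = F(x, u)                        # model prediction, equation (13.2)
    return float(cost + x @ P @ x)         # tail cost on the terminal state

# Hypothetical scalar model x_{k+1} = 0.5 x_k + u_k with horizon N_c = 2
F = lambda x, u: 0.5 * x + u
I1 = np.eye(1)
cost = nmpc_cost(np.array([1.0]), np.zeros((2, 1)), F, I1, I1, I1)
```

In a full regulator, an optimizer would minimize this function over the input sequence subject to the state, input, and rate-of-change constraints.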

13.2 Disturbance Models for Nonlinear Models

For offset free control, we must account for discrepancies between the plant and the model. The general strategy for doing so requires augmentation of the state with a disturbance model. In general, one models the disturbance as a random walk that influences either manipulated inputs or the observed outputs as so:

\begin{align*}
x_{k+1} &= F(x_k, u_k + X_u d_k) + G w_k \tag{13.4} \\
y_k &= h(x_k) + X_y d_k + v_k \tag{13.5} \\
d_{k+1} &= d_k + \xi_k \tag{13.6} \\
w_k &\sim N(0,Q) \tag{13.7} \\
v_k &\sim N(0,R) \tag{13.8} \\
\xi_k &\sim N(0,Q_d) \tag{13.9}
\end{align*}

in which

• dk is the integrated disturbance vector at time tk,

• wk is the state noise vector at time tk,

• G is the full-column rank state noise matrix,

• Xu is the input disturbance matrix,

• Xy is the output disturbance matrix,

• w_k is a N(0,Q) noise at time t_k (N(0,Q) denotes a normal distribution with mean 0 and covariance Q),

• F (xk,uk) is the solution to a first principles, differential equation model,

• yk is the system measurement at time tk,

• h is a (possibly) nonlinear function of x,

• vk is a N (0,R) noise at time tk, and

• ξk is a N (0,Qd) noise at time tk.

Such a model implies that d_k is stochastic in nature. For output disturbance models, d_k should remain roughly constant over the estimation horizon; otherwise, the tacit assumption is that the system is not modeled sufficiently well, and so increasing the estimation horizon has no tangible benefit because the model predictions are not reliable.
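The augmented model (13.4) through (13.9) can be simulated directly to see how the integrated disturbance enters the measurement. The sketch below is a simplified illustration under stated assumptions: a pure output disturbance (X_u = 0, X_y = I), G = I, and hypothetical names and numerical values throughout.

```python
import numpy as np

def simulate_output_disturbance(F, h, x0, d0, inputs, Q, R, Qd, rng):
    """Simulate the state, a random-walk output disturbance, and the measurements."""
    x = np.asarray(x0, dtype=float)
    d = np.asarray(d0, dtype=float)
    ys, ds = [], []
    for u in inputs:
        v = rng.multivariate_normal(np.zeros(len(R)), R)
        ys.append(h(x) + d + v)                 # (13.5) with X_y = I
        ds.append(d.copy())
        w = rng.multivariate_normal(np.zeros(len(Q)), Q)
        xi = rng.multivariate_normal(np.zeros(len(Qd)), Qd)
        x = F(x, u) + w                         # (13.4) with X_u = 0 and G = I
        d = d + xi                              # (13.6): random-walk disturbance
    return np.array(ys), np.array(ds)

# Hypothetical scalar example
F = lambda x, u: 0.9 * x + u
h = lambda x: x
rng = np.random.default_rng(0)
ys, ds = simulate_output_disturbance(
    F, h, [1.0], [2.0], np.zeros((5, 1)),
    Q=0.01 * np.eye(1), R=0.01 * np.eye(1), Qd=0.25 * np.eye(1), rng=rng)
```

Setting Qd to zero makes d_k constant, which corresponds to the case described above in which the disturbance remains fixed over the estimation horizon.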

Figure 13.2: Exothermic CSTR diagram. [Diagram: reaction A → B at reactor temperature T, with feed concentration c_Af, feed temperature T_f, and coolant temperature T_c.]

Steady State   c_A (mol/l)   Output T (K)   Disturbance d (K)
1              0.851         326.2          23.8
2              0.583         344.4          5.6
3              0.177         371.8          −21.8

Table 13.1: Model Steady States for a Plant with T_c = 300 K, T = 350 K

13.2.1 Plant-model Mismatch: Exothermic CSTR Example

We consider the exothermic CSTR shown in Figure 13.2. This example was motivated by a similar example in Tenny [146]. The state, input, and measurement are

\begin{align*}
x &= \begin{bmatrix} c_A \\ T \end{bmatrix} \tag{13.11} \\
u &= T_c \tag{13.12} \\
230\ \text{K} &\le u_k \le 427\ \text{K} \tag{13.13} \\
|\Delta u_k| &\le 15\ \text{K} \tag{13.14} \\
y_k &= \begin{bmatrix} 0 & 1 \end{bmatrix} x_k + d_k \tag{13.15}
\end{align*}

in which c_A is the concentration of species A, T is the reactor temperature, and T_c is the coolant temperature. We induce a small mismatch in activation energy between the plant and the model. Here, the output disturbance model generates multiple steady states in the estimator for a given range of the input T_c as seen in Figure 13.3. Table 13.1 presents the exact values of these optima. The question of interest, then, is whether or not these optima affect the overall control performance of the system.

Figure 13.3: Steady states for the Exothermic CSTR example. [Plot: output T (K) versus input T_c (K) for the model and the plant, with the set point and the offsets ±d_k marked.]

Figure 13.4: Exothermic CSTR feed disturbance. [Plot: feed concentration (mol/l) and feed temperature (K) versus time (hr).]

We consider a disturbance in the feed given by Figure 13.4. We examine the closed-loop performance given the following estimators:

1. EKF (Q_d = 0.25)

2. MHE with N = 2, smoothing update (Q_d = 0.25)

3. MHE with N = 10, no initial penalty (uMHE; Q_d = 10^{-8})

4. MHE with N = 10, constant initial penalty (cMHE; Q_d = 0.25, P_{T−N|T} = diag(1, 10^2, 10^2))

We use nonlinear MPC with a prediction horizon of N_c = 60 and sampling interval ∆t = 0.05 hours. The controller penalty matrices are

\[ Q = \mathrm{diag}(0, 4), \quad R = 2 \tag{13.16} \]

Figure 13.5 presents the results of this comparison. The EKF causes plant ignition, whereas each MHE is able to successfully reject the disturbance without igniting the plant. MHE with a longer estimation horizon provides better disturbance rejection than MHE with a shorter horizon. Also, the output and input behavior of MHE with N = 2 and the EKF appear roughly identical through the first simulated hour, but the estimated states c_A and T are slightly different, thus explaining the disparate closed-loop performances. Finally, the two MHE’s with N = 10 provide very similar input and output trajectories even though the estimated states are substantially different. In fact, the two apparently have different steady-state attractors (compare the estimates in Figure 13.5 with the steady states of Table 13.1).

Note that we did not present results for MHE with N = 10 and the smoothing update. For this particular case, the smoothing update leads to very large penalties on the a priori estimate of the initial state in the horizon, and subsequently yields very poor estimation.

Of course, the current benchmark in disturbance rejection is linear MPC. Therefore, we linearize the model around unstable steady state 2, employ an output disturbance model with the same tuning as above, and use a Kalman filter for state estimation. Figure 13.6 compares the best nonlinear results, MHE with N = 10, to the linear MPC result. Nonlinear MPC appears to provide little if any improvement over linear MPC. Additionally, the computational expense of nonlinear MPC is at least two orders of magnitude greater than that of linear MPC.

13.2.2 Maximum Yield Example

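Consider the CSTR in Figure 13.7 in which we would like to maximize the yield of the intermediate B. This example was motivated by a similar example in Tenny [146]. The state, inputs, disturbance model, measurement, and state-evolution equations are given by

\begin{align*}
x &= \begin{bmatrix} c_A & c_B \end{bmatrix}^T \tag{13.17} \\
u &= \begin{bmatrix} T_c & c_{Af} \end{bmatrix}^T \tag{13.18} \\
u_k &= \begin{bmatrix} T_c \\ c_{Af} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} d_k \tag{13.19} \\
y_k &= \begin{bmatrix} 0 & 1 \end{bmatrix} x_k \tag{13.20} \\
\frac{dc_A}{dt} &= \frac{F}{V}(c_{A,f} - c_A) - k_{1,0} \exp\left(\frac{E_1}{RT}\right) c_A \tag{13.21} \\
\frac{dc_B}{dt} &= k_1 c_A - k_{2,0} \exp\left(\frac{E_2}{RT}\right) c_B - \frac{F}{V} c_B \tag{13.22}
\end{align*}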

Figure 13.5: Exothermic CSTR results: rejection of a feed disturbance using an output disturbance model. [Plots versus time (hr): output T (K), input T_c (K), concentration c_A (mol/l), and temperature T (K) for the EKF, MHE N = 2, uMHE N = 10, and cMHE N = 10.]

Figure 13.6: Exothermic CSTR: Comparison of best nonlinear results to linear MPC results. [Plots versus time (hr): output T (K) with set point, and input T_c (K), for LMPC and cMHE N = 10.]

Due to plant-model mismatch, we opt to operate at a set point less than the true maximum for a given c_Af value. Given an input disturbance on the coolant temperature T_c, multiple optima again arise in the estimator, as demonstrated by Figure 13.8. For this example, we consider the temporary disturbance in the measurement given by Figure 13.9. We examine the closed loop performance given both the EKF and MHE. For the estimator, we choose the tuning parameters

\[ \Pi_0 = \mathrm{diag}(0.1^2, 0.1^2, 1), \quad Q_d = 1, \quad R = 10^{-8} \tag{13.24} \]

with no state noise. The MHE implementation uses a short estimation horizon of N = 5 and the smoothing update. For the controller, we use a prediction horizon of N_c = 60 with a time increment of ∆t = 0.05 hours. The controller penalty matrices are

\[ Q = \mathrm{diag}(0, 400), \quad R = \mathrm{diag}(0.5, 50) \tag{13.25} \]

Figure 13.10 presents the results of this example. MHE successfully rejects the disturbance and returns the output to set point, whereas the EKF estimates cause a target calculation failure that results in a shutdown of the process. The estimated state c_A and disturbance demonstrate that the EKF experiences considerable difficulties resolving the sudden changes in the measurement caused by the output. Ultimately these difficulties lead to the failure of the target calculation.

Figure 13.7: Maximum yield CSTR [Diagram: reactions A → B → C at reactor temperature T, with labels T_set, T_f, c_Af, c_Af,set, and c_b.]

Parameter   Value
F           100.
V           100.
k_{1,0}     7.2 × 10^10
k_{2,0}     5.2 × 10^10
E_1/R       8750
E_2/R       9700

Table 13.2: Maximum yield CSTR parameters.

Figure 13.8: Maximum yield CSTR steady states [Plot: output c_B (mol/l) versus input T_c (K) for the models and the plant, with the set point marked.]

Figure 13.9: Maximum yield CSTR: temporary output disturbance [Plot: output disturbance d_k (mol/l) versus time (hr).]

Figure 13.10: Maximum yield CSTR results: (a) measurement c_B, (b) estimated state c_A, and (c) estimated disturbance. EKF and MHE denote extended Kalman filter and moving-horizon estimator, respectively. [Plots versus time (hr); panel (a) marks the set point and the time at which the target calculation fails.]

13.3 Conclusions

In this chapter, we demonstrated how integrated disturbances used in conjunction with nonlinear models could induce multiple optima in the estimator. The two examples clearly illustrated that the quality of feedback depends on the quality of the state estimate, since MHE exhibited superior performance to the EKF when used in conjunction with nonlinear MPC. Additionally, we observed that increasing the MHE horizon length led to improved closed-loop performance at the expense of increased computational burden. Finally, for the exothermic CSTR example, we obtained no significant improvement in nonlinear over linear control for disturbance rejection. This result provides a preliminary indication that the expected improvement in disturbance rejection given a nonlinear model requires a better disturbance model.

Notation

D   input constraint matrix
d   input constraint vector
H   state constraint matrix
h   state constraint vector
M   change in input constraint matrix
m   change in input constraint vector
N   control horizon length
P   penalty matrix for the terminal state in the control horizon
u   input vector
x   state vector

Chapter 14

Conclusions

This thesis has addressed improving and applying stochastic and deterministic methods for modeling chemically reacting systems. The three primary focuses of this thesis were simulating and using stochastic simulations, deriving and applying deterministic population balance models, and applying and improving moving-horizon state estimation. In this chapter, we briefly recap the primary contributions made to each of these topics and outline future avenues of research.

Chapters 4 through 8 considered stochastic models with an emphasis on chemical kinetics: how to efficiently simulate such models, and how to maximize the utility of these models. For these models, exact simulation methods can be computationally expensive because the computational expense scales linearly with the number of reaction events. Additionally, little is currently known about how to efficiently extract information from these models (the so-called systems-level tasks). Chapter 4 considered approximations for more efficient simulation of stochastic chemical kinetics models governed by the discrete master equation. By first partitioning reactions into sets of fast and slow reactions and then making either an equilibrium, Langevin, or deterministic approximation for the fast reactions, we were able to derive coupled master equations that approximated the complete master equation. These derivations led to simulation strategies that can significantly decrease the computational expense of evaluating these models while still accurately reconstructing moments of the exact probability distribution. Chapter 5 considered biased approximations for mean sensitivities of the discrete master equation given only simulation data. Here, we proposed a suitable approximation (first-order error with respect to the mean) that required insignificant computational effort in comparison to finite difference methods. These approximate sensitivities enabled efficient execution of systems-level tasks because optimization algorithms generally converge without exact gradients. In Chapter 6, we investigated methods for computing unbiased mean sensitivities. We first explained why calculating unbiased mean sensitivities for simulations governed by discrete master equations is difficult: namely, the interchange of the differentiation and expectation operators required to compute the sensitivity is not valid for a finite number of Monte Carlo reconstructions. To overcome this problem, we applied smoothed perturbation analysis to evaluate sensitivities for discrete-time, state-dependent Markov chain models. To account for the effect of parameters on the timing of continuous events, we introduced the novel technique of smoothing by time integration. These two methods (smoothed perturbation analysis and smoothing by integration) can be combined to estimate sensitivities for the problem of interest, simulations of stochastic chemical kinetics governed by the discrete master equation. However, problems arise in implementing the smoothed perturbation analysis, making this method more expensive to evaluate than the biased sensitivity estimates proposed in the previous chapter. In Chapter 7, we proposed a novel method for calculating exact sensitivities for simulations of stochastic differential equations. For this case, the simulated sample paths are continuous, so we calculated sensitivities by simply differentiating the sample paths with respect to the parameters. We also demonstrated how these sensitivities could be used to efficiently perform systems-level tasks, including steady-state analysis and parameter estimation. However, the results demonstrated little improvement over finite difference methods for fixed-time step integration schemes. Finally, Chapter 8 applied many of the techniques for simulating and using discrete simulations to model batch crystallization systems. Here the primary contribution consisted of demonstrating that stochastic simulation provides a flexible solution technique for examining many possible reaction mechanisms. A second contribution was showing that optimization of the stochastic model is feasible and requires relatively few evaluations of the model.

We see many areas for future work in the area of stochastic simulation, including:

• robust software packages that can

1. adaptively partition reactions into fast and slow subsets, applying appropriate approximations for each subset, and

2. adaptively control the error at each step;

• rigorous error analysis for all of the approximations in comparison to the solution of the original master equation; and

• efficient methods for evaluating unbiased estimates of mean sensitivities for the discrete master equation governing stochastic chemical kinetics.

Chapters 9 through 11 examined population balance models for virus infections. Currently, most modelers focus either solely on the extracellular or intracellular levels, even though many in vitro and in vivo experiments involve interactions between the two levels. Population balance models can incorporate both levels of information, but solving these models numerically can prove computationally expensive. In Chapter 9, we first derived a population balance model that incorporated intracellular and extracellular levels of information. These models permit differentiating between cells in the population. We then compared this model to other simpler models, such as extracellular models and models that assume all cells in the population are identical. The results demonstrated that the cell population balance models can more intuitively account for experimentally-observed phenomena than these simpler models, such as multiple rounds of infection and pharmacokinetic delays associated with drug treatments of infections. Chapter 10 considered modeling experimental data from the focal infection system for the vesicular stomatitis virus. Here, our emphasis was understanding the dynamics of multiple rounds of virus infection and antiviral host response. For host cells without an antiviral response, namely baby hamster kidney cells, extracellular models adequately described the dynamics contained in the experimental measurements. In this case, the model suggested that an initial condition effect possibly resulting from the experimental technique led to salient features of the experimental data. For host cells with an interferon antiviral response, in this case murine astrocytoma cells, an age-segregated population balance model best described the experimental measurements. Here, the model suggested intracellular production rates of both virus and interferon. However, combinations of parameters fit the data equally well. Consequently, additional measurements are required to uniquely determine all parameters in the model. Chapter 11 revisited the formulation of the population balance model and proposed a decomposition for solving these models when flow of information is restricted from the extracellular to intracellular level. As demonstrated by the examples, this decomposition permits efficient and accurate solution of the population balance model. Additionally, the model results can be used to predict population-level measurements of intracellular species.

Our work in modeling the focal infection system serves as a first step in providing a quantitative understanding of multiple rounds of both viral infection and host antiviral response. Additional experimental measurements such as microarray data or using reporter genes to detect interferon up-regulation should provide further constraints to the developed model and necessitate future model modification. We expect future iterations of additional experiments, measurements, and modeling to elucidate an even better comprehensive understanding of both viral infections and cell-cell signaling. At the same time, we also expect the experiments to eventually out-pace our current capability to solve adequate models. For example, the decomposition technique for solving population balance models presented in Chapter 11 cannot account for a graded antiviral response resulting from differing exposures of host cells to interferon; rather, cells are either completely resistant or completely susceptible to viral infection. Accounting for such a graded response requires more efficient methods for solving coupled integro-partial differential equations than the ones presented in this thesis.

Chapters 12 and 13 considered the state estimation problem of determining the maximum a posteriori state given dynamic process measurements and nonlinear system models. The current industrial standard for this estimation problem, the extended Kalman filter, is computationally efficient but treats the a posteriori distribution as approximately normal. This approximation is not appropriate for multimodal a posteriori distributions, but it is not clear if such distributions arise in chemically reacting systems. In Chapter 12, we examined different probabilistic observers that approximately solve this problem, namely the extended Kalman filter, Monte Carlo observers, and moving-horizon estimation. We outlined conditions in which multiple modes can appear in the a posteriori distribution, and demonstrated that the judicious use of constraints and nonlinear optimization as employed by moving-horizon estimation can lead to significantly better state estimates than the extended Kalman filter. Additionally, we proposed that Monte Carlo methods could be used to provide improved estimates of the arrival cost in the moving-horizon formulation. In Chapter 13, we considered the performance of both moving-horizon estimation and the extended Kalman filter in closed-loop feedback control. Here, using moving-horizon estimation provided better closed-loop performance than using the extended Kalman filter for cases in which the estimation problem exhibited multiple optima. However, we saw little difference in the performance of nonlinear control versus linear control for the case of disturbance rejection. We attribute this phenomenon to the lack of a properly-tuned disturbance model.

The primary areas for future work in moving-horizon estimation include

• exploring the benefit of using Monte Carlo observers to approximate the arrival cost function;

• distinguishing between local optima arising in the estimation problem, ideally through application of global optimization;

• reducing the computational expense of the optimization required to solve the moving-horizon estimation problem; and

• improving the estimation performance by accurately identifying covariance matrices from experimental data.

In conclusion, this thesis has addressed both stochastic and deterministic models for chemically reacting systems, as well as how to best extract information from these models for purposes other than pure simulation. We believe that the simulation and systems-level tasks developed here should prove useful in modeling and understanding dynamic physical systems. Additionally, we hope that the modeling of multiple rounds of viral infections and host antiviral response will provide an integrated, quantitative understanding of how these infections propagate, and how to best control this propagation.


Bibliography[1] N. R. Abu-Absi, A. Zamamiri, J. Kacmar, S. J. Balogh, and F. Srienc. Automated flow

cytometry for acquisition of time-dependent population data. Cytometry, 51A(2):87–96,February 2003.

[2] D. L. Alspach and H. W. Sorenson. Nonlinear Bayesian estimation using Gaussian sumapproximations. IEEE Transactions on Automatic Control, AC-17(4):439–448, 1972.

[3] A. Arkin, J. Ross, and H. McAdams. Stochastic kinetic analysis of developmental path-way bifurcation in phage lambda-infected Escherichia coli cells. Genetics, 149(4):1633–1648, August 1998.

[4] A. Armaou and I. G. Kevrekidis. Optimal switching policies using coarse timesteppers.In Proceedings of the IEEE Conference on Decision and Control, Maui, Hawaii, December2003.

[5] A. Armaou, C. I. Siettos, and I. G. Kevrekidis. Time-steppers and control of microscopicdistributed processes. Accepted for publication in Int. J. Robust Nonlinear Control, 2003.

[6] J. E. Bailey and D. F. Ollis. Biochemical Engineering Fundamentals. McGraw-Hill, NewYork, 1986.

[7] L. A. Ball, C. R. Pringle, B. Flanagan, V. P. Perepelitsa, and G. W. Wertz. Phenotypicconsequences of rearranging the P, M, and G genes of vesicular stomatitis virus. J. Virol.,73(6):4705–4712, June 1999.

[8] R. Bandyopadhyaya, R. Kumar, K. S. Gandhi, and D. Ramkrishna. Modeling of precipi-tation in reverse micellar systems. Langmuir, 13:3610–3620, 1997.

[9] J. Bell, B. Lichty, and D. Stojdl. Getting oncolytic virus therapies off the ground. Cancercell, 4(1):7–11, July 2003.

[10] W. E. Bentley, B. Kebede, T. Franey, and M.-Y. Wang. Segregated characterization ofrecombinant epoxide hydrolase synthesis via the baculovirus/insect cell expression sys-tem. Chemical Engineering Science, 49(24A):4133–4141, December 1994.

[11] R. B. Bird, W. E. Stewart, and E. N. Lightfoot. Transport Phenomena. John Wiley & Sons,New York, 1960.

282

[12] E. Bølviken, P. J. Acklam, N. Christopherson, and J.-M. Størdal. Monte Carlo filters fornon-linear state estimation. Automatica, 37(2):177–183, February 2001.

[13] S. Bonhoeffer, R. M. May, G. M. Shaw, and M. A. Nowak. Virus dynamics and drugtherapy. Proc. Natl. Acad. Sci. USA, 94(13):6971–6976, June 1997.

[14] G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. Addison–WesleyPublishing Company, Reading, Massachusetts, 1st edition, 1973.

[15] P. N. Brown, A. C. Hindmarsh, and L. R. Petzold. Using Krylov methods in the solu-tion of large-scale differential-algebraic systems. SIAM Journal on Scientific Computing,15(6):1467–1488, November 1994.

[16] Y. Cao, D. T. Gillespie, and L. R. Petzold. The slow-scale stochastic simulation algorithm.Journal of Chemical Physics, 122(1):014116, January 2005.

[17] M. Caracotsios and W. E. Stewart. Sensitivity analysis of initial value problems withmixed ODEs and algebraic equations. Computers & Chemical Engineering, 9(4):359–365,1985.

[18] C. G. Cassandras and S. Lafortune. Introduction to Discrete Event Systems. The Kluwer In-ternational Series in Engineering and Computer Science. Kluwer Academic Publishers,Boston, MA, 1999.

[19] C. G. Cassandras, Y. Wardi, B. Melamed, G. Sun, and C. G. Panayiotou. Perturbationanalysis for on-line control and optimization of stochastic fluid models. IEEE Transac-tions on Automatic Control, AC-47(8):1234–1248, 2002.

[20] M. Chaves and E. Sontag. State-estimators for chemical reaction networks of Feinberg-Horn-Jackson zero deficiency type. European Journal of Control, 8(4):343–359, 2002.

[21] C.-T. Chen. Linear System Theory and Design. Oxford University Press, 3rd edition, 1999.

[22] W. S. Chen, S. Ungarala, B. Bakshi, and P. Goel. Bayesian rectification of nonlinear dy-namic processes by the weighted bootstrap. In AIChE Annual Meeting, Reno, Nevada,2001.

[23] D. K. Dacol and H. Rabitz. Sensitivity analysis of stochastic kinetic models. J. Math.Phys., 25(9):2716–2727, September 1984.

[24] W. M. Deen. Analysis of Transport Phenomena. Topics in chemical engineering. OxfordUniversity Press, Inc., New York, 1998.

[25] T. O. Drews, R. D. Braatz, and R. C. Alkire. Parameter sensitivity analysis of MonteCarlo simulations of copper electrodeposition with multiple additives. Journal of TheElectrochemical Society, 150(11):C807–C812, November 2003.

283

[26] K. A. Duca, V. Lam, I. Keren, E. E. Endler, G. J. Letchworth, I. S. Novella, and J. Yin.Quantifying viral propagation in vitro: Toward a method for characterization of complexphenotypes. Biotechnology Progress, 17(6):1156–1165, November–December 2001.

[27] M. Eigen, C. K. Biebricher, M. Gebinoga, and W. C. Gardiner. The hypercycle. Cou-pling of RNA and protein biosynthesis in the infection cycle of an RNA bacteriophage.Biochemistry, 30(46):11005–11018, November 1991.

[28] E. E. Endler, K. A. Duca, P. F. Nealey, G. M. Whitesides, and J. Yin. Propagation ofviruses on micropatterned host cells. Biotechnology and Bioengineering, 17(6):1156–1165,November–December 2003.

[29] D. Endy, D. Kong, and J. Yin. Intracellular kinetics of a growing virus: A geneticallystructured simulation for bacteriophage T7. Biotechnology and Bioengineering, 55(2):375–389, July 1997.

[30] D. Endy and J. Yin. Toward antiviral strategies that resist viral escape. AntimicrobialAgents and Chemotherapy, 44(4):1097–1099, April 2000.

[31] A. M. Fendrick, A. S. Monto, B. Nightengale, and M. Sarnes. The economic burden ofnon-influenza-related viral repiratory tract infection in the United States. Archives ofInternal Medicine, 163(4):487–494, February 2003.

[32] R. J. Field and R. M. Noyes. Oscillations in chemical systems. IV. Limit cycle behavior ina model of a real chemical reaction. Journal of Chemical Physics, 60(5):1877–1884, March1974.

[33] A. P. Fordyce and J. B. Rawlings. A segregated fermentation model for growth anddifferentiation of Bacillus licheniformis. AIChE Journal, 42(11):3241–3252, November 1996.

[34] J. Fort. A comment on amplification and spread of viruses in a growing plaque. Journalof Theoretical Biology, 214(3):515–518, February 2002.

[35] J. Fort and V. Mendez. Time-delayed spread of viruses in growing plaques. Phys. Rev.Lett., 89(17):178101–1–178101–4, October 2002.

[36] A. G. Fredrickson, D. Ramkrishna, and H. M. Tsuchiya. Statistics and dynamics of pro-caryotic cell populations. Mathematical Biosciences, 1:327–374, 1967.

[37] M. Fu and J.-Q. Hu. Conditional Monte Carlo: Gradient Estimation and Optimization Appli-cations. The Kluwer International Series in Engineering and Computer Science. KluwerAcademic Publishers, Boston, MA, 1997.

[38] M. A. Gallivan. Modeling and Control of Epitaxial Thin Film Growth. PhD thesis, CaliforniaInstitute of Technology, 2003.

284

[39] M. A. Gallivan and R. M. Murray. Model reduction and system identification for masterequation control systems. In Proceedings of the American Control Conference, pages 3561–3566, Denver, Colorado, June 2003.

[40] T. C. Gard. Introduction to Stochastic Differential Equations. Marcel Dekker, Inc., 1988.

[41] C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the NaturalSciences. Springer-Verlag, Berlin, Germany, 2nd edition, 1990.

[42] A. Genz and R. E. Kass. A collection of numerical integration software for Bayesian anal-ysis. Available from http://www.sci.wsu.edu/math/faculty/genz/homepage, 1998.

[43] M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems withmany species and many channels. Journal of Physical Chemistry A, 104:1876–1889, 2000.

[44] M. A. Giedlin, D. N. Cook, and J. Thomas W. Dubensky. Vesicular stomatitis virus: Anexciting new therapeutic oncolytic virus candidate for cancer or just another chapterfrom Field’s Virology? Cancer Cell, 4(4):241–243, October 2003.

[45] D. T. Gillespie. A general method for numerically simulating the stochastic time evolu-tion of coupled chemical reactions. Journal of Computational Physics, 22:403–434, 1976.

[46] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal ofPhysical Chemistry, 81:2340–2361, 1977.

[47] D. T. Gillespie. Markov Processes: An Introduction for Physical Scientists. Academic Press,Inc., 1992.

[48] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A,188:404–425, 1992.

[49] D. T. Gillespie. The chemical Langevin equation. Journal of Chemical Physics, 113(1):297–306, 2000.

[50] D. T. Gillespie. Approximate accelerated stochastic simulation of chemically reactingsystems. Journal of Chemical Physics, 115(4):1716–1733, July 2001.

[51] D. T. Gillespie and L. R. Petzold. Improved leap-size selection for accelerated stochasticsimulation. Journal of Chemical Physics, 119(16):8229–8234, October 2003.

[52] J. R. Gooch and M. J. Hounslow. Monte Carlo simulation of size-enlargement mecha-nisms in crystallization. AIChE Journal, 42(7):1864–1874, 1996.

[53] N. Gordon, D. Salmond, and A. Smith. Novel approach to nonlinear/non-GaussianBayesian state estimation. IEE Proceedings F-Radar and Signal Processing, 140(2):107–113,April 1993.

285

[54] N. Grandvaux, B. R. tenOever, M. J. Servant, and J. Hiscott. The interferon antiviralresponse: from viral invasion to evasion. Current Opinion in Infectious Diseases, 15(3):259–267, June 2002.

[55] R. Gudi, S. Shah, and M. Gray. Multirate state and parameter estimation in an antibi-otic fermentation with delayed measurements. Biotechnology and Bioengineering, 44:1271–1278, 1994.

[56] E. L. Haseltine, D. B. Patience, and J. B. Rawlings. On the stochastic simulation of par-ticulate systems. Accepted for publication in ChE. Sci., 2004.

[57] E. L. Haseltine and J. B. Rawlings. Approximate simulation of coupled fast and slowreactions for stochastic chemical kinetics. Journal of Chemical Physics, 117(15):6903–7390,October 2002.

[58] E. L. Haseltine and J. B. Rawlings. A critical evaluation of extended Kalman filteringand moving horizon estimation. Technical Report 2002–03, TWMCC, Department ofChemical Engineering, University of Wisconsin-Madison, August 2002.

[59] E. L. Haseltine and J. B. Rawlings. A critical evaluation of extended Kalman filteringand moving horizon estimation. Accepted for publication in Industrial & EngineeringChemistry Research, 2004.

[60] E. L. Haseltine, J. B. Rawlings, and J. Yin. Dynamics of viral infections: Incorporatingboth the intracellular and extracellular levels. Computers & Chemical Engineering, 2005.In press.

[61] J. He, H. Zhang, J. Chen, and Y. Yang. Monte Carlo simulation of kinetics and chainlength distributions in living free-radical polymerization. Macromolecules, 30(25):8010–8018, December 15 1997.

[62] A. V. M. Herz, S. Bonhoeffer, R. M. Anderson, R. M. May, and M. A. Nowak. Viraldynamics in vivo: Limitations on estimates of intracellular delay and virus decay. Proc.Natl. Acad. Sci. USA, 93:7427–7251, July 1996.

[63] Y.-C. Ho and X.-R. Cao. Perturbation Analysis of Discrete Event Dynamic Systems. KluwerAcademic Press, Boston, 1991.

[64] J. J. Holland, L. P. Villarreal, and M. Breindl. Factors involved in the generation andreplication of rhabdovirus defective T particles. J. Virol., 17(3):805–815, March 1976.

[65] H. M. Hulburt and S. Katz. Some problems in particle technology: A statistical mechan-ical formulation. Chemical Engineering Science, 19:555–574, 1964.

[66] Y. Husimi, K. Nishigaki, Y. Kinoshita, and T. Tanaka. Cellstat - a continuous culturesystem of a bacteriophage for the study of the mutation rate and the selection process ofthe DNA level. Review of Scientific Instruments, 53(4):517–522, 1982.

286

[67] F. J. Isaacs, J. Hasty, C. R. Cantor, and J. J. Collins. Prediction and measurement of anautoregulatory genetic module. Proc. Natl. Acad. Sci. USA, 100(13):7714–7719, June 2003.

[68] A. P. J. Jansen. Monte Carlo simulations of chemical reactions on a surface with time-dependent reaction-rate constants. Comput. Phys. Commun., 86:1–12, 1995.

[69] J. A. M. Janssen. The elimination of fast variables in complex chemical reacions. II.Mesoscopic level (reducible case). J. Stat. Phys., 57(1/2):171–185, 1989.

[70] J. A. M. Janssen. The elimination of fast variables in complex chemical reacions. III.Mesoscopic level (irreducible case). J. Stat. Phys., 57(1/2):187–198, 1989.

[71] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York,1970.

[72] D. G. Kendall. Stochastic processes and population growth. Journal of the Royal StatisticalSociety: Series B, 11:230–264, 1949.

[73] A. Knijnenburg and U. Kreischer. Discrete simulation of replication of a RNA-bacteriophage prototype system. In K. Bellman, editor, Molecular Genetics InformationSystems: Modelling and Simulation, pages 267–290, Berlin, 1983. Akademie-Verlag.

[74] D. Kong and J. Yin. Whole-virus vaccine development by continuous-culture on a com-plementing host. Biotechnology, 13(6):583–586, June 1995.

[75] E. Kreyszig. Advanced Engineering Mathematics. John Wiley & Sons, New York, 8th edi-tion, 1999.

[76] T. G. Kurtz. The relationship between stochastic and deterministic models for chemicalreactions. Journal of Chemical Physics, 57(7):2976–2978, 1972.

[77] V. Lam, K. A. Duca, and J. Yin. Arrested spread of vesicular stomatitis virus infections invitro depends on interferon-mediated antiviral activity. Biotech. Bioeng., In press, 2005.

[78] I. J. Laurenzi. Stochastic Processes in Biological and Biochemical Kinetics. PhD thesis, Uni-versity of Pennsylvania, October 2002.

[79] I. J. Laurenzi and S. L. Diamond. Monte Carlo simulation of the heterotypic aggregationkinetics of platelets and neutrophils. Biophysical Journal, 77:1733–1746, 1999.

[80] P. Licari and J. E. Bailey. Modeling the population dynamics of baculovirus-infectedinsect cells: Optimizing infection strategies for enhanced recombinant protein yields.Biotechnology and Bioengineering, 39(4):432–441, February 1992.

[81] Y. Lou and P. D. Christofides. Estimation and control of surface roughness in thin filmgrowth using kinetic Monte-Carlo models. Chemical Engineering Science, 58(14):3115–3129, July 2003.

287

[82] Y. Lou and P. D. Christofides. Feedback control of growth rate and surface roughness inthin film growth. AIChE Journal, 49(8):2099–2113, August 2003.

[83] S. E. Luria. General Virology. John Wiley & Sons, New York, 1953.

[84] D. L. Ma, R. D. Braatz, and D. K. Tafti. Compartmental modeling of multidimensionalcrystallization. International Journal of Modern Physics B, 16(1–2):383–390, January 2002.

[85] A. G. Makeev, D. Maroudas, A. Z. Panagiotopoulos, and I. G. Kevrekidis. Coasrse bi-furcation analysis of kinetic Monte Carlo simulations: A lattice-gas model with lateralinteractions. Journal of Chemical Physics, 117(18):8229–8240, November 2002.

[86] S. Manjunath, K. S. Gandhi, R. Kumar, and D. Ramkrishna. Precipitation in smallsystems–I. Stochastic analysis. Chemical Engineering Science, 49(9):1451–1463, 1994.

[87] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cellpopulation balance models: I. Finite difference methods. Computers & Chemical Engi-neering, 25(11–12):1411–1440, November 2001.

[88] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cellpopulation balance models. II. Spectral methods. Computers & Chemical Engineering,25(11–12):1441–1462, November 2001.

[89] N. V. Mantzaris, P. Daoutidis, and F. Srienc. Numerical solution of multi-variable cellpopulation balance models. III. Finite element methods. Computers & Chemical Engineer-ing, 25(11–12):1463–1481, November 2001.

[90] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model pre-dictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.

[91] H. McAdams and A. Arkin. Simulation of prokaryotic genetic circuits. Annu. Rev. Bio-phys. Bio., 27:199–224, 1998.

[92] B. J. McCoy. A new population balance model for crystal size distributions: re-versible, size-dependent growth and dissolution. Journal of Colloid and Interface Science,240(1):139–149, 2001.

[93] S. A. Middlebrooks. Modelling and Control of Silicon and Germanium Thin Film ChemicalVapor Deposition. PhD thesis, University of Wisconsin–Madison, 2001.

[94] S. Munir and V. Kapur. Regulation of host cell transcriptional physiology by the avianpneumovirus provides key insights into host-pathogen interactions. Journal of Virology,77(8):4899–4910, 2003.

[95] A. C. Nathwani, R. Benjamin, A. W. Nienhuis, and A. M. Davidoff. Current status andprospects for gene therapy. Vox Sanguinis, 87(2):73–81, August 2004.

288

[96] J. C. Nichol and H. F. Deutsch. Biophysical studies of blood plasma proteins. VII. Sep-aration of γ-globulin from the sera of various animals. Journal of the American ChemicalSociety, 70(1):80–83, January 1948.

[97] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.

[98] M. A. Nowak and R. M. May. Virus Dynamics: Mathematical Principles of Immunology andVirology. Oxford University Press, 2000.

[99] B. A. Ogunnaike and W. H. Ray. Process Dynamics, Modeling, and Control. Oxford Uni-versity Press, New York, 1994.

[100] A. S. Perelson. Modelling viral and immune system dynamics. Nature Reviews Immunol-ogy, 2(1):28–36, January 2002.

[101] A. S. Perelson, A. U. Neumann, M. Markowitz, J. M. Leonard, and D. D. Ho. HIV-1dynamic in vivo: Virion clearance rate, infected cell life-span, and viral generation time.Science, 271(5255):1582–1586, March 1996.

[102] J. S. Porterfield, D. C. Burke, and A. C. Allison. An estimate of the molecular weightof interferon as measured by its rate of diffusion through agar. Virology, 12(2):197–203,October 1960.

[103] V. Prasad, M. Schley, L. P. Russo, and B. W. Bequette. Product property and productionrate control of styrene polymerization. Journal of Process Control, 12(3):353–372, 2002.

[104] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C.Cambridge University Press, Cambridge, 2nd edition, 1992.

[105] S. Raimondeau, P. Aghalayam, A. B. Mhadeshwar, and D. G. Vlachos. Parameter opti-mization of molecular models: Application to surface kinetics. Industrial and EngineeringChemistry Research, 42(6):1174–1183, March 2003.

[106] D. Ramkrishna. Analysis of population balance—IV: The precise connection betweenMonte Carlo simulation and population balances. Chemical Engineering Science, 36:1203–1209, 1981.

[107] D. Ramkrishna. Population Balances. Academic Press, San Deigo, 2000.

[108] D. Ramkrishna and J. D. Borwanker. A puristic analysis of population balance–I. Chem-ical Engineering Science, 28:1423–1435, 1973.

[109] D. Ramkrishna and J. D. Borwanker. A puristic analysis of population balance–II. Chem-ical Engineering Science, 29:1711–1721, 1974.

[110] A. D. Randolph and M. A. Larson. Transient and steady-state size distributions in con-tinuous mixed suspension crystallizers. AIChE Journal, 8(5):639–645, 1962.

289

[111] A. D. Randolph and E. T. White. Modeling size dispersion in the prediction of crystal-size distribution. Chemical Engineering Science, 32:1067–1076, 1977.

[112] C. V. Rao. Moving Horizon Strategies for the Constrained Monitoring and Control of NonlinearDiscrete-Time Systems. PhD thesis, University of Wisconsin–Madison, 2000.

[113] C. V. Rao and A. P. Arkin. Stochastic chemical kinetics and the quasi-steady-stateassumption: Application to the Gillespie algorithm. Journal of Chemical Physics,118(11):4999–5010, March 2003.

[114] C. V. Rao and J. B. Rawlings. Constrained process monitoring: moving-horizon ap-proach. AIChE Journal, 48(1):97–109, January 2002.

[115] C. V. Rao, J. B. Rawlings, and J. H. Lee. Constrained linear state estimation – a movinghorizon approach. Automatica, 37(10):1619–1628, 2001.

[116] C. V. Rao, J. B. Rawlings, and D. Q. Mayne. Constrained state estimation for nonlineardiscrete-time systems: stability and moving horizon approximations. IEEE Transactionson Automatic Control, 48(2):246–258, February 2003.

[117] M. Rathinam, L. R. Petzold, Y. Cao, and D. T. Gillespie. Stiffness in stochastic chem-ically reacting systems: The implicit tau-leaping method. Journal of Chemical Physics,119(24):12784–12794, December 2003.

[118] J. Rawlings. Tutorial overview of model predictive control. IEEE Control Systems Maga-zine, 20:38–52, 2000.

[119] J. B. Rawlings. Tutorial: Model predictive control technology. In Proceedings of the Amer-ican Control Conference, San Diego, CA, pages 662–676, 1999.

[120] J. B. Rawlings and J. G. Ekerdt. Chemical Reactor Analysis and Design Fundamentals. NobHill Publishing, Madison, WI, 2002.

[121] J. B. Rawlings, W. R. Witkowski, and J. W. Eaton. Modelling and control of crystallizers.Powder Technology, 69:3–9, 1992.

[122] B. Reddy and J. Yin. Quantitative intracellular kinetics of HIV type 1. AIDS Research andHuman Retroviruses, 15(3):273–283, February 1999.

[123] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen. Stochastic stability of the discrete-timeextended Kalman filter. IEEE Transactions on Automatic Control, 44(4):714–728, April 1999.

[124] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen. Stochastic stability of the continuous-timeextended Kalman filter. IEE Proceedings-Control Theory and Applications, 147(1):45–52,January 2000.

290

[125] K. Reif and R. Unbehauen. The extended Kalman filter as an exponential observer fornonlinear systems. IEEE Transactions on Signal Processing, 47(8):2324–2328, August 1999.

[126] H. Resat, H. S. Wiley, and D. A. Dixon. Probability-weighted dynamic Monte Carlomethod for reaction kinetics simulations. Journal of Physical Chemistry B, 105(44):11026–11034, 2001.

[127] R. G. Rice and D. D. Do. Applied mathematics and modeling for chemical engineers. WileySeries in Chemical Engineering. John Wiley & Sons, Inc., New York, 1995.

[128] D. G. Robertson, J. H. Lee, and J. B. Rawlings. A moving horizon-based approach forleast-squares state estimation. AIChE Journal, 42(8):2209–2224, August 1996.

[129] J. K. Rose and M. A. Whitt. Fundamental Virology, chapter Rhabdoviridae: The viruses andtheir replication, pages 665–688. Lippincott Williams & Wilkins, fourth edition, 2001.

[130] S. M. Ross. A first course in probability. Prentice Hall, Upper Saddle River, N. J., 5thedition, 1998.

[131] P. Royston. A remark on algorithm AS 181: The w-test for normality. Appl. Stat.,44(4):547–551, 1995.

[132] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, Inc., New York, third edi-tion, 1976.

[133] C. E. Samuel. Antiviral actions of interferons. Clinical Microbiology Reviews, 14(4):778–809, October 2001.

[134] A. Schwienhorst, B. F. Lindemann, and M. Eigen. Growth kinetics of a bacteriophage incontinuous culture. Biotechnology and Bioengineering, 50(2):217–221, April 1996.

[135] G. C. Sen. Viruses and interferons. Annu. Rev. Microbiol., 55:255–281, 2001.

[136] B. H. Shah, D. Ramkrishna, and J. D. Borwanker. Simulation of particulate systems usingthe concept of the interval of quiescence. AIChE Journal, 23(6):897–904, 1977.

[137] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete sam-ples). Biometrika, 52(3–4):591–611, 1965.

[138] C. I. Siettos, A. Armaou, A. G. Makeev, and I. G. Kevrekidis. Microscopic/stochastictimesteppers and “coarse” control: A KMC example. AIChE Journal, 49(7):1922, 2003.

[139] C. I. Siettos, D. Maroudas, and I. G. Kevrekidis. Coarse bifurcation diagrams via micro-scopic simulators: A state-feedback control-based approach. Accepted for publicationin Int. J. Bif. Chaos, 2003.

[140] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall,New York, 1986.

291

[141] M. Soroush. State and parameter estimations and their applications in process control.Computers & Chemical Engineering, 23(2):229–245, December 1998.

[142] J. C. Spall. Estimation via Markov chain Monte Carlo. IEEE Control Systems Magazine,23(2):34–45, April 2003.

[143] R. Srivastava, L. You, J. Summers, and J. Yin. Stochastic vs. deterministic modeling ofintracellular viral kinetics. Journal of Theoretical Biology, 218(3):309–321, October 2002.

[144] R. F. Stengel. Optimal Control and Estimation. Dover Publications, Inc., 1994.

[145] W. E. Stewart, M. Caracotsios, and J. P. Sørensen. Computer-Aided Modelling of ReactiveSystems. 1999. In preparation.

[146] M. Tenny. Computational Strategies for Nonlinear Model Predictive Control. PhD thesis,University of Wisconsin–Madison, 2002.

[147] M. J. Tenny and J. B. Rawlings. State estimation strategies for nonlinear model predictivecontrol. AIChE Annual Meeting, Reno, November 2001.

[148] M. J. Tenny and J. B. Rawlings. Efficient moving horizon estimation and nonlinear modelpredictive control. In Proceedings of the American Control Conference, pages 4475–4480,Anchorage, Alaska, May 2002.

[149] M. J. Tenny, J. B. Rawlings, and S. J. Wright. Closed-loop behavior of nonlinear modelpredictive control. AIChE Journal, 50(9):2142–2154, September 2004.

[150] M. J. Tenny, S. J. Wright, and J. B. Rawlings. Nonlinear model predictive control viafeasibility-perturbed sequential quadratic programming. Computational Optimization andApplications, 28(1):87–121, April 2004.

[151] J. Tramper, E. J. Vandenend, C. D. Degooijer, R. Kompier, F. L. J. Vanlier, M. Usmany,and J. M. Vlak. Production of baculovirus in a continuous insect-cell culture - Bioreactordesign, operation, and modeling. Annals of the New York Academy of Sciences, 589:423–430,May 1990.

[152] M. L. Tyler and M. Morari. Stability of constrained moving horizon estimation schemes.Preprint AUT96–18, Automatic Control Laboratory, Swiss Federal Institute of Technol-ogy, 1996.

[153] H. A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Number 13 inCambridge Monographs on Applied and Computational Mathematics. Cambridge Uni-versity Press, New York, NY, 2003.

[154] N. G. van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier Science Pub-lishers, Amsterdam, The Netherlands, 2nd edition, 1992.

292

[155] J. Villadsen and M. L. Michelsen. Solution of Differential Equation Models by PolynomialApproximation. Prentice-Hall, Englewood Cliffs New Jersey, 1978.

[156] D. G. Vlachos. Instabilities in homogeneous nonisothermal reactors: Comparison ofdeterministic and Monte Carlo simulations. Journal of Chemical Physics, 102(4):1781–1790,1995.

[157] M. O. Vlad and A. Pop. A physical interpretation of age-dependent master equations.Physica A, 155(2):276–310, 1989.

[158] R. R. Wagner and A. S. Huang. Inhibition of RNA and interferon synthesis in Krebs-2cells infected with vesicular stomatitis virus. Virology, 28(1):1–10, January 1966.

[159] Y. Wang and E. D. Sontag. Output-to-state stability and detectability of nonlinear sys-tems. Systems & Control Letters, 29:279–290, 1997.

[160] B. R. Ware, T. Raj, W. H. Flygare, J. A. Lesnaw, and M. E. Reichmann. Molecular weightsof vesicular stomatitis virus and its defective particles by laser light-scattering spec-troscopy. J. Virol., 11(1):141–145, January 1973.

[161] G. W. Wertz and J. S. Youngner. Interferon production and inhibition of host synthesisin cells infected with vesicular stomatitis virus. J. Virol., 6(4):476–484, October 1970.

[162] D. O. White and F. J. Fenner. Medical Virology. Academic Press, fourth edition edition,1994.

[163] D. I. Wilson, M. Agarwal, and D. Rippin. Experiences implementing the extendedKalman filter on an industrial batch reactor. Computers & Chemical Engineering,22(11):1653–1672, 1998.

[164] D. Wodarz and M. A. Nowak. Mathematical models of HIV pathogenesis and treatment.BioEssays, 24(12):1178–1187, 2002.

[165] J. Yin and J. S. McCaskill. Replication of viruses in a growing plaque: a reaction-diffusionmodel. Biophysical Journal, 61(6):1540–1549, June 1992.

[166] L. You and J. Yin. Amplification and spread of viruses in a growing plaque. Journal ofTheoretical Biology, 200(4):365–373, 1999.

[167] H. Yu and C. G. Cassandras. Perturbation analysis for production control and optimiza-tion of manufacturing systems. Automatica, 40(6):945–956, June 2004.


Vita

Eric Lynn Haseltine was born in Kingsport, Tennessee to Doug and Lydia Haseltine. In June 1995, he graduated as valedictorian of his class from Dobyns-Bennett High School in Kingsport. In May 1999, he graduated summa cum laude with departmental honors from Clemson University with a Bachelor of Science degree in Chemical Engineering. His undergraduate education included three cooperative education rotations and one summer internship with the Eastman Chemical Company in Kingsport, TN. In the fall of 1999, he began his graduate studies under the direction of James B. Rawlings in the Department of Chemical Engineering at the University of Wisconsin-Madison. After surviving six Wisconsin winters, he will be heading for a warmer climate as a post-doctoral fellow at the California Institute of Technology in Pasadena, California.

Permanent Address: 3909 Hemlock Park Dr., Kingsport, TN 37663

This dissertation was prepared with LaTeX 2ε¹ by the author.

¹ This particular University of Wisconsin compliant style was carved from The University of Texas at Austin styles as written by Dinesh Das (LaTeX 2ε), Khe–Sing The (LaTeX), and John Eaton (LaTeX). Knives and chisels wielded by John Campbell and Rock Matthews.
