This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright


Metabolic Engineering with power-law and linear-logarithmic systems

Alberto Marin-Sanguino a,*, Nestor V. Torres b, Eduardo R. Mendoza c,d, Dieter Oesterhelt a

a Max Planck Institute of Biochemistry, Department of Membrane Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Bayern, Germany
b Grupo de Tecnologia Bioquimica, Departamento de Bioquimica y Biologia Molecular, Facultad de Biologia, Universidad de La Laguna, 38206 La Laguna, Tenerife, Islas Canarias, Spain
c Department of Computer Science, University of the Philippines Diliman, Diliman, Quezon City 1101, Philippines
d Physics Department and Center for Nano-Science, Ludwig Maximillians University, Geschwister-Scholl-Platz 1, 80539 Munich, Germany

Article info

Article history: Received 30 June 2008; Received in revised form 9 December 2008; Accepted 16 December 2008; Available online 11 January 2009

Keywords: Generalized mass action; S-system; Linlog; (Log)linear; Metabolic Engineering; Biochemical Systems Theory

Abstract

Metabolic Engineering aims to improve the performance of biotechnological processes through rational manipulation rather than random mutagenesis of the organisms involved. Such a strategy can only succeed when a mathematical model of the target process is available. Simplifying assumptions are often needed to cope with the complexity of such models in an efficient way, and the choice of such assumptions often leads to models that fall within a certain structural template or formalism. The most popular formalisms can be grouped in two categories: power-law and linear-logarithmic. As optimization and analysis of a model strongly depend on its structure, most methods in Metabolic Engineering have been defined within a given formalism and never used in any other.

In this work, the four most commonly used formalisms (two power-law and two linear-logarithmic) are placed in a common framework defined within Biochemical Systems Theory. This framework defines every model as matrix equations in terms of the same parameters, enabling the formulation of a common steady state analysis and providing means for translating models and methods from one formalism to another. Several Metabolic Engineering methods are analysed here and shown to be variants of a single equation. In particular, two problem solving philosophies are compared: the application of the design equation and the solution of constrained optimization problems. Generalizing the design equation to all the formalisms shows it to be interchangeable with the direct solution of the rate law in matrix form. Furthermore, optimization approaches are concluded to be preferable since they speed the exploration of the feasible space, implement a better specification of the problem and exclude unrealistic results.

Beyond consolidating existing knowledge and enabling comparison, the systematic approach adopted here can fill the gaps between the different methods and combine their strengths.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Metabolic Engineering aims to improve the performance of biotechnological systems through rational manipulation, as opposed to random mutagenesis or trial and error methods. In order to achieve such a goal for complex systems, a mathematical model is needed. Simplifying assumptions are often needed to cope with the complexity of such models in an efficient way, and the choice of such assumptions often leads to models that fall within a certain structural template or formalism. In addition to the advantages of simplification, developing systematic formalisms results in structurally homogeneous models for which standardized algorithms can be developed. This has encouraged the search for fast approximate methods that take advantage of structural regularities. These methods usually provide results that are close to the accurate solutions found by slower general purpose methods [27], and furthermore enable an interactive exploration of the problem. Several methods have been proposed in which an equation can be used to calculate the manipulations needed to obtain a certain steady state [9,3,12]. To achieve the same goal, optimization approaches have also been proposed [30,24,13,15].

The most popular formalisms can be grouped in two categories: power-law [31] and linear-logarithmic [25]. In power-law models, rates and variables are linearized in logarithmic axes; in other words, they become linear in a log–log plot. The power-law formalism includes several variants, of which Generalized Mass Action (GMA) and S-systems are the most common. Linear-logarithmic models are based on a mixed linearization in which the reaction rates, and optionally some variables, stay in a Cartesian axis while the rest of the variables are transformed into a logarithmic scale. There are two variants of the linear-logarithmic formalism, namely (log)linear and linlog. Although all these model types can be obtained through a wide variety of methods [7,14,34,31], they are all based on the same information, and their ability to portray different kinetic laws has been explained in terms of approximation theory by deriving their fundamental equations through Taylor series [22]. In fact, even such a popular kinetic representation as the Hill equation has been shown to be a particular case of a Taylor series [22].

* Corresponding author. E-mail addresses: [email protected], [email protected] (A. Marin-Sanguino), [email protected] (N.V. Torres), [email protected] (E.R. Mendoza), [email protected] (D. Oesterhelt).

Mathematical Biosciences 218 (2009) 50–58. 0025-5564/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.mbs.2008.12.010

The first obstacle to overcome when comparing or combining different formalisms is the existence of different notations. Power-law models are normally presented within the notation specific to Biochemical Systems Theory (BST) while linear-logarithmic models follow that of Metabolic Control Analysis (MCA). These two frameworks have parallel histories which have been converging during the last 15 years and have been shown to be deeply related [1]. MCA, originally intended to be used without a dynamic model [10], is mainly based on matrix relations between local sensitivities (elasticities) and global sensitivities (response and control coefficients) [16]. Furthermore, the classification of the variables within its scope favors their separation according to biological properties (enzymes, metabolites, effectors, etc.). BST, on the other hand, favors the explicit formulation of dynamic models and classifies variables according to their mathematical role, depending on whether they are constant (independent variables) or time dependent (dependent variables). An enzyme concentration can thus be an independent variable, if it is considered to be constant during the considered time scale, or a dependent variable, if its synthesis and degradation are featured in the model. The MCA approach yields equations with a more immediate biological interpretation, but these have to be modified whenever the biological assumptions change (interactions among enzymes, moiety conservations, etc.). This has led to many slightly different interpretations of the basic formalism [28]. BST keeps a unified formulation that remains consistent independently of the choices made by the modeler [31,32]. Despite their differences, these two frameworks have led to developments within the field of Metabolic Engineering that are sometimes equivalent and sometimes complementary. Given the potential benefit of combining such developments, and since a unified notation for MCA and BST is not to be expected in the near future, this work will use BST as a unifying framework. The reason for such a choice is that the less intuitive variable grouping is largely compensated by the flexibility and generality gained in return. Furthermore, BST goes beyond approximate rates as it can deal with exact representations through detailed mechanistic modelling [20] or recasting arbitrary non-linear functions [19,6,20].

In the next section, the four considered formalisms will be presented in a consistent manner such that all the rate laws of every formalism will depend on the same parameters. This will enable us to present some basic results from BST and MCA in a form that holds for all cases. Finally, a brief overview of the similarities and differences between the power-law and linear-logarithmic formalisms will be presented. This will establish a single framework in which different methods can be not only compared, but also combined.

2. Theoretical framework

Mathematical models in Metabolic Engineering are often given as systems of differential equations of the form:

\[ \dot{x}_d = N \cdot v, \tag{1} \]

where $N$ is the stoichiometric matrix. The reaction rates, transport fluxes, etc. collected in vector $v$ normally depend on many different factors, some involved in the dynamics of the system (dependent variables, $x_d$) while others remain constant (independent variables, $x_i$). The subindices $d$ and $i$ will be used for dependent and independent variables in subsequent formulas.

It is important to note that there can be dependencies among the rows of $N$ when there are conservation relations. In such cases, the derivatives of some dependent variables can be written as a linear function of the rest [16] and eliminated from the system. From now on, we will assume that the stoichiometric matrix has been reduced in this way, which greatly simplifies the notation without any loss of generality.

The complexity of $v = f(x_d, x_i)$ can seriously compromise the mathematical tractability of the problem. For this reason, it is often simplified to a non-mechanistic, approximate function with a standard structure, such that the resulting equations comply with the specifications of a given formalism.

2.1. Derivation of the formalisms

In the following sections, the different rate laws will be consistently derived making use of Taylor series. As a result, they will all be defined around a chosen reference state (usually a steady state) as a function of the variables of the system and two kinds of parameters: kinetic orders and rate constants. Kinetic orders characterize the response of rates to changes in the variables and are defined as:

\[ f_{i,j} = \left.\frac{\partial \ln v_i}{\partial \ln x_j}\right|_0 = \left.\frac{\partial v_i}{\partial x_j}\,\frac{x_j}{v_i}\right|_0, \tag{2} \]

where subindex 0 indicates the value of a magnitude in the steady state. All the kinetic orders of a model can be grouped in a matrix $F$.

The rate constants can be grouped as a vector $c$. They will have a different definition for every formalism but will always be determined by the rate values at the chosen reference state.

To achieve a compact notation, vectors are often collected in diagonal matrices. These are composed of zeros save for their main diagonal, which contains the elements of the corresponding vector. Each of these matrices will be represented with the same letter as the vector but capitalized, $V = \mathrm{diag}(v)$.
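As a minimal sketch of the definition in Eq. (2), a kinetic order can be estimated numerically as the slope of the rate in log–log coordinates. The Michaelis–Menten rate law, its parameter values and the function names below are assumptions chosen only for illustration; for this rate law the kinetic order has the known closed form $K_m/(K_m + x)$, which gives 0.5 at $x_0 = 1$.

```python
import math

def kinetic_order(rate, x0, rel_step=1e-6):
    """Central-difference estimate of d ln(v) / d ln(x) at x0 (Eq. 2)."""
    up = x0 * (1.0 + rel_step)
    down = x0 * (1.0 - rel_step)
    return (math.log(rate(up)) - math.log(rate(down))) / (math.log(up) - math.log(down))

def michaelis_menten(x, vmax=1.0, km=1.0):
    # Hypothetical rate law, used only to have something to differentiate.
    return vmax * x / (km + x)

# Kinetic order at the reference state x0 = 1; analytically km / (km + x0).
f = kinetic_order(michaelis_menten, x0=1.0)
print(f)
```

The same helper can be applied to each rate and each variable in turn to fill the matrix $F$ of a model whose mechanistic rate laws are known.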

2.1.1. GMA

The GMA formalism is based on a log–log Taylor series:

\[ \ln\frac{v_i}{v_i^0} \simeq \left.\frac{\partial \ln v_i}{\partial \ln x_1}\right|_0 \ln\frac{x_1}{|x_1|_0} + \cdots + \left.\frac{\partial \ln v_i}{\partial \ln x_n}\right|_0 \ln\frac{x_n}{|x_n|_0}, \tag{3} \]

where a super- or subindex 0 indicates the value of a magnitude in the steady state. The rate law is therefore linear in the log–log space:

\[ w = F_d y_d + F_i y_i + g, \tag{4} \]

where $w = \ln v$, $y = \ln x$ and $g = \ln c$. Here, $\ln$ of a vector $u$ denotes the vector with components $\ln u_i$. The rate becomes a power-law by undoing the logarithmic transformation:

\[ v_i = c_i\, x_1^{f_{i,1}} \cdots x_n^{f_{i,n}}, \tag{5} \]

where $c_i = \frac{|v_i|_0}{|x_1|_0^{f_{i,1}} \cdots |x_n|_0^{f_{i,n}}}$. This expression simplifies to $c_i = |v_i|_0$ when the variables are normalized by their steady state values. This normalization does not alter any of the properties of the system that will be discussed in this paper, but it prevents some forms of robustness analysis. In particular, the information about robustness to changes in the kinetic orders is lost for the reference state.

The rate law can be substituted in Eq. (1) to obtain a GMA model.
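The construction of a GMA model from Eqs. (1) and (5) can be sketched as follows. The three-step pathway, its kinetic orders, the reference state and the function names are all hypothetical values chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical linear pathway x3 -> x1 -> x2 -> (out); x3 is held constant.
N = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])          # stoichiometry of the dependent variables
F = np.array([[0.0, -0.5, 1.0],           # v1: inhibited by x2, driven by x3
              [0.5, 0.0, 0.0],            # v2: depends on x1
              [0.0, 0.8, 0.0]])           # v3: depends on x2
x_ref = np.array([2.0, 4.0, 1.0])         # reference state (x1, x2, x3)
v_ref = np.array([1.0, 1.0, 1.0])         # reference fluxes

# Rate constants from Eq. (5): c_i = v_i0 / prod_j x_j0 ** f_ij
c = v_ref / np.prod(x_ref ** F, axis=1)

def gma_rates(x):
    """Power-law rates v_i = c_i * prod_j x_j ** f_ij."""
    return c * np.prod(x ** F, axis=1)

def gma_rhs(x):
    """Right-hand side of Eq. (1) for the dependent variables."""
    return N @ gma_rates(x)

print(gma_rates(x_ref))   # reproduces the reference fluxes
print(gma_rhs(x_ref))     # zero net change at the reference steady state
```

Because the rate constants are fixed by the reference state, the model reproduces the reference fluxes exactly, and any departure from them comes only from the kinetic orders.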

2.1.2. S-systems

S-systems are a particular case of GMA in which there are only two fluxes per equation:

\[ \dot{x}_d = v^+ - v^-. \tag{6} \]

Both terms are power-laws:

\[ v_i^+ = \alpha_i\, x_1^{g_{i,1}} \cdots x_n^{g_{i,n}}, \tag{7} \]

\[ v_i^- = \beta_i\, x_1^{h_{i,1}} \cdots x_n^{h_{i,n}}. \tag{8} \]

The kinetic orders of these aggregate fluxes can be obtained from those of the individual fluxes, and will be grouped in matrices $G$ and $H$. For a given GMA, the corresponding S-system can be straightforwardly computed by aggregating all the synthesis and degradation fluxes for every variable [18,2]. This aggregation procedure is a particular case of a mathematical operation called condensation [13]. Both the elimination of conserved moieties and flux aggregation are examples of condensation and are carried out in a similar fashion. If we define two matrices $N^+$ and $N^-$ such that $N = N^+ - N^-$, then the aggregated fluxes will be:

\[ v^+ = N^+ v, \tag{9} \]

\[ v^- = N^- v. \tag{10} \]

From these, link matrices [16] can be built to transform the GMA parameters into those of the S-system:

\[ S(v^+, v) = (V_0^+)^{-1} N^+ V_0, \tag{11} \]

\[ S(v^-, v) = (V_0^-)^{-1} N^- V_0, \tag{12} \]

to obtain:

\[ G = S(v^+, v)\, F, \tag{13} \]

\[ H = S(v^-, v)\, F, \tag{14} \]

\[ a - w_0^+ = S(v^+, v)(g - w_0), \tag{15} \]

\[ b - w_0^- = S(v^-, v)(g - w_0), \tag{16} \]

where, following the same conventions as above, $w^+ = \ln v^+$, $w^- = \ln v^-$, $a = \ln \alpha$ and $b = \ln \beta$.
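The aggregation in Eqs. (11)–(14) amounts to a flux-weighted average of kinetic orders. The sketch below illustrates it for a hypothetical branch point in which one metabolite is produced by a single reaction and consumed by two. The network, the numbers and the helper name `link_matrix` are assumptions made for the example.

```python
import numpy as np

# Hypothetical branch point: x1 is made by v1 and consumed by v2 and v3.
# Columns of F: (x1, x2), where x2 is an independent variable feeding v1.
N_plus = np.array([[1.0, 0.0, 0.0]])
N_minus = np.array([[0.0, 1.0, 1.0]])
F = np.array([[0.0, 1.0],
              [0.5, 0.0],
              [1.0, 0.0]])
v0 = np.array([2.0, 1.0, 1.0])            # reference fluxes, (N_plus - N_minus) @ v0 = 0

def link_matrix(N_side, v0):
    """S(v_side, v) = diag(N_side v0)^-1 N_side diag(v0), as in Eqs. (11)-(12)."""
    v_side = N_side @ v0
    return np.diag(1.0 / v_side) @ N_side @ np.diag(v0)

G = link_matrix(N_plus, v0) @ F           # Eq. (13)
H = link_matrix(N_minus, v0) @ F          # Eq. (14)
print(G)
print(H)
```

In this example the aggregated degradation order with respect to x1 is the average of 0.5 and 1.0 weighted by the two equal outgoing fluxes, so the S-system no longer distinguishes how the flux splits at the branch.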

2.1.3. (Log)linear models

(Log)linear models are also based on Taylor series [25], derived in a different way. From the Taylor expansion of $\ln x$ it follows that:

\[ \ln\frac{x}{x_0} \simeq \frac{x - x_0}{x_0}, \tag{17} \]

and taking derivatives:

\[ \frac{d}{dt}\ln\frac{x}{x_0} \simeq \frac{d}{dt}\left(\frac{x - x_0}{x_0}\right) = \frac{1}{x_0}\frac{dx}{dt}. \tag{18} \]

So Eq. (1) is equivalent to:

\[ \frac{d}{dt}\left(\ln x_d - \ln |x_d|_0\right) = |X_D|_0^{-1} N v, \tag{19} \]

where $|X_D|_0 = \mathrm{diag}(|x_d|_0)$.

Also using Eq. (17), we can eliminate logarithms from the left-hand side of the log–log Taylor series in Eq. (3) and obtain another law for the rates:

\[ \frac{v_i}{v_i^0} \simeq 1 + \left.\frac{\partial \ln v_i}{\partial \ln x_1}\right|_0 \ln\frac{x_1}{|x_1|_0} + \cdots + \left.\frac{\partial \ln v_i}{\partial \ln x_n}\right|_0 \ln\frac{x_n}{|x_n|_0}. \tag{20} \]

Now the Taylor series is no longer in log–log space but in linear-log space. Using the definition of kinetic orders and logarithmic variables, this equation can be put in matrix form:

\[ v \simeq C\left[\mathbf{1} + F(y - y_0)\right], \tag{21} \]

where $C = \mathrm{diag}(c)$ and $\mathbf{1}$ is a vector of ones. As happens in a normalized GMA, the rate constants of a (log)linear model are equal to the steady state fluxes, $c_i = |v_i|_0$.

Substituting the rate law in (19) yields:

\[ \frac{d}{dt}\left(y_d - |y_d|_0\right) = |X_D|_0^{-1} N C\left[\mathbf{1} + F(y - y_0)\right]. \tag{22} \]

2.1.4. Linlog models

Linlog kinetics is a variant of the (log)linear formalism that has been enthusiastically reviewed as the only approach that combines all the properties that are desirable in an approximate kinetic format for metabolic modelling [5], although it has also received some criticism [22,17]. The rate laws of a linlog model are very similar to the (log)linear case. However, whereas the (log)linear rate law includes the logarithms of the enzyme levels as independent variables, the linlog version sets multiplicative terms that can be integrated in the rate constant. Therefore, Eq. (21) holds true for a linlog system when the rate constants are redefined as $c_i = |v_i|_0 \frac{e_i}{|e_i|_0}$. Since the derivatives are not transformed, the scaling factor $|X_D|_0^{-1}$ is not needed and the equations of the system are:

\[ \frac{d}{dt} x_d = N C\left[\mathbf{1} + F(y - y_0)\right]. \tag{23} \]

In the steady state, as long as the enzymes remain constant, both (log)linear and linlog formalisms are equivalent.
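The rate law of Eq. (21), with the linlog redefinition of the rate constants, can be sketched in a few lines. The kinetic orders, reference values and the function name `linlog_rates` are hypothetical. Passing no enzyme ratio corresponds to the (log)linear constants $c_i = |v_i|_0$.

```python
import numpy as np

F = np.array([[0.0, -0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])           # hypothetical kinetic orders
v_ref = np.array([1.0, 1.0, 1.0])         # reference fluxes
x_ref = np.array([2.0, 4.0, 1.0])         # reference concentrations

def linlog_rates(x, enzyme_ratio=None):
    """Eq. (21) with c_i = v_i0 * e_i / e_i0; a ratio of 1 gives c_i = v_i0."""
    if enzyme_ratio is None:
        enzyme_ratio = np.ones_like(v_ref)
    c = v_ref * enzyme_ratio
    y_shift = np.log(x) - np.log(x_ref)   # y - y0
    return c * (1.0 + F @ y_shift)

print(linlog_rates(x_ref))                               # reference fluxes
print(linlog_rates(x_ref, np.array([2.0, 1.0, 1.0])))    # first enzyme doubled
```

With the enzymes at their reference levels the two formalisms give the same rates, which is the equivalence stated above for the steady state.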

2.1.5. Summary

The main similarities and differences between the power-law and linear-logarithmic approximations can be seen in Fig. 1, which shows a plot of a hyperbolic rate law together with its power-law and linlog approximations at the operating point X = 1. There are three distinctive regions in the curve. All the rates are equal at the reference point and practically equivalent in the shaded region around it (X0 ± 50% in this case). For high values of x the hyperbolic rate law reaches saturation, while both the power-law and the linear-logarithmic approximations, although both concave downward, tend to plus infinity. The power-law grows, however, faster than the linlog, due to the well-known identity (valid for any $a > 0$):

\[ \lim_{x \to +\infty} \frac{1}{x^a} \log_b x = 0. \tag{24} \]

Therefore, the linear-logarithmic approximation will represent saturation more accurately for high values of the variable, while it will fall short of representing linear or polynomial curves. The situation is inverted for lower values of x, where the linear-logarithmic rate tends rapidly to minus infinity.
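The three regimes described above can be reproduced numerically for the function $V = X/(X+1)$ at the operating point X = 1. The sketch below uses only the closed-form kinetic order of this function, $1/(X+1)$, and the sample points are chosen arbitrarily for illustration.

```python
import math

X0 = 1.0
V0 = X0 / (X0 + 1.0)          # reference rate
F = 1.0 / (X0 + 1.0)          # kinetic order of X / (X + 1) at X0

def hyperbolic(x):
    return x / (x + 1.0)

def power_law(x):
    return V0 * (x / X0) ** F

def linlog(x):
    return V0 * (1.0 + F * math.log(x / X0))

# Columns: x, hyperbolic rate, power-law approximation, linlog approximation.
for x in (0.01, 0.5, 1.0, 1.5, 100.0):
    print(f"{x:7.2f} {hyperbolic(x):8.4f} {power_law(x):8.4f} {linlog(x):8.4f}")
```

Inside the X0 ± 50% window the three columns stay close. At the high end the linlog value overshoots the saturating rate by less than the power-law does, and at the low end the linlog value becomes negative while the power-law stays positive.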

GMA models include linear kinetics and stoichiometric models as special cases. Unlike GMA systems, which preserve the original flux structure, S-systems focus on the species dynamics. This often results in more accurate dynamic behavior in terms of state variables and a more convenient structure of the equations, but comes at the cost of losing information about fluxes at the branching points [2]. Rigorous analyses of the mathematical properties of power-law models [29] have shown that both GMA and S-systems reproduce non-linear phenomena such as limit cycles and chaos. These phenomena cannot occur in (log)linear systems due to their linear structure, which in turn makes it possible to obtain analytical solutions. The mathematical properties of linlog equations have not yet been studied.

Fig. 1. Comparison of the power-law and linear-logarithmic approximations to a Michaelis–Menten like function $V = \frac{X}{X+1}$. The operating point, X = 1, is marked with a discontinuous line. Both approximations are practically identical within the shadowed area, X0 ± 50%, and present different behaviors out of it.

Due to the assumptions used in the Taylor series, working with relative concentrations is optional with power-law models and mandatory in the linear-logarithmic case. Such a normalization does not affect any of the properties discussed in this paper but, when applied in any of the four formalisms, leads to losing the information on robustness with respect to changes in the kinetic orders in the reference state.

2.2. General framework

Now that all the rate laws are written in terms of the same parameters, the steady state analysis can be presented in a form that holds for all the formalisms in the reference steady state. It is important to note that most of the applications of sensitivities in Metabolic Engineering, such as the ones discussed here, are based on using their numerical values in the reference steady state. For a more detailed discussion on how the sensitivities diverge from one formalism to another upon leaving the reference state, see [22].

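In the steady state, the right-hand side of Eq. (1) is equal to zero so, differentiating with respect to $x_i$:

\[ N\left(\frac{\partial v}{\partial x_d}\frac{\partial x_d}{\partial x_i} + \frac{\partial v}{\partial x_i}\right) = 0. \tag{25} \]

Defining:

\[ L(x_d, x_i) = \frac{\partial \ln x_d}{\partial \ln x_i} = \frac{\partial y_d}{\partial y_i}, \tag{26} \]

Eq. (25) becomes:

\[ N V \left(F_d\, L(x_d, x_i) + F_i\right) = 0, \tag{27} \]

where $V$ is a diagonal matrix that contains the vector $v$. Since we have assumed that any necessary reduction in $N$ has been performed, it is full rank and the square matrix $N V F_d$ can be inverted:

\[ L(x_d, x_i) = -(N V F_d)^{-1} N V F_i. \tag{28} \]

Analogously, we can define:

\[ L(v, x_i) = \frac{\partial \ln v}{\partial \ln x_i} = \frac{\partial w}{\partial y_i}, \tag{29} \]

where $L(v, x_i)$ and $L(x_d, x_i)$ are known as logarithmic gains and describe the response of the system as a whole when an independent variable is modified.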

To find the response of the system to changes in the rate constants, Eq. (1) can be differentiated with respect to $c$, which leads to:

\[ S(x_d, c) = -(N V F_d)^{-1} N V, \tag{30} \]

\[ S(v, c) = F_d\, S(x_d, c) + I, \tag{31} \]

where $S(x_d, c)$ and $S(v, c)$ are the sensitivities of the system with respect to the rate constants.

The logarithmic gains and sensitivities can be related to one another by the equations:

\[ L(x_d, x_i) = S(x_d, c)\, F_i, \tag{32} \]

\[ L(v, x_i) = S(v, c)\, F_i, \tag{33} \]

which follow from the definitions of the matrices and the well-known chain rule of differentiation.

Other relations that hold in the steady state are, among others, the orthogonality conditions:

\[ S(x_d, c)\, \mathbf{1} = 0, \tag{34} \]

\[ S(v, c)\, \mathbf{1} = \mathbf{1}, \tag{35} \]

\[ S(x_d, c)\, F_d = -I, \tag{36} \]

\[ S(v, c)\, F_d = 0. \tag{37} \]

These equations are part of the axiomatic definition of linear algebra [21] but are often cited as Control Theorems [10].
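Eqs. (30)–(37) can be checked numerically on a small example. The sketch below assumes a hypothetical three-step pathway with one independent variable. The matrices are illustrative values only, and `numpy.linalg.solve` is used in place of an explicit inverse.

```python
import numpy as np

# Hypothetical pathway x3 -> x1 -> x2 -> (out) with x3 independent.
N = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
F_d = np.array([[0.0, -0.5],
                [0.5, 0.0],
                [0.0, 0.8]])
F_i = np.array([[1.0], [0.0], [0.0]])
V = np.diag([1.0, 1.0, 1.0])              # reference fluxes on the diagonal

NV = N @ V
S_x = -np.linalg.solve(NV @ F_d, NV)      # Eq. (30)
S_v = F_d @ S_x + np.eye(3)               # Eq. (31)
L_x = S_x @ F_i                           # Eq. (32)
L_v = S_v @ F_i                           # Eq. (33)
ones = np.ones(3)

print(np.allclose(S_x @ ones, 0.0))           # Eq. (34)
print(np.allclose(S_v @ ones, ones))          # Eq. (35)
print(np.allclose(S_x @ F_d, -np.eye(2)))     # Eq. (36)
print(np.allclose(S_v @ F_d, 0.0))            # Eq. (37)
```

For this unbranched pathway all entries of `L_v` coincide, as expected, since every flux must change by the same factor when the substrate is perturbed.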

3. Design equations

By combining some steady state conditions with the particular rate laws of a certain formalism, an equation can be obtained that predicts how changing some variables will affect the rest of the system. Such an equation has been called a design equation due to its potential use as a tool for Metabolic Engineering. In this section we show how the design equation was derived for linear-logarithmic models [3] in a way that can be used to find counterparts for any other formalism.

3.1. Linear logarithmic models

Premultiplying Eq. (21) by $S(v, c)$ and substituting Eqs. (37) and (33) we obtain:

\[ S(v, c)\left(C^{-1} v - \mathbf{1}\right) = L(v, x_i)\left(y_i - |y_i|_0\right). \tag{38} \]

Also, by premultiplying Eq. (21) by $S(x_d, c)$ and substituting Eqs. (36) and (32):

\[ S(x_d, c)\left(C^{-1} v - \mathbf{1}\right) = -\left(y_d - |y_d|_0\right) + L(x_d, x_i)\left(y_i - |y_i|_0\right). \tag{39} \]

Eqs. (38) and (39) can be merged into a single design equation as in [3]. Here we will keep them separate for comparison with other formalisms. They relate all the variables, extending a previous version within MCA [11]. In the specific case of (log)linear models, all the controllable variables (enzymes, external metabolites) will be included in vector $y_i$. In the case of linlog models, the system is supposed to be manipulated through changes in the enzymes, which are included in $c$.

3.2. GMA

The same operations shown above for (log)linear systems can be performed on the GMA rate law in logarithmic form (4). Premultiplying by $S(v, c)$ and substituting Eqs. (37) and (33) we obtain:

\[ S(v, c)(w - g) = L(v, x_i)\, y_i, \tag{40} \]

where we have eliminated the dependent variables and obtained a constraint for the fluxes in the steady state.

The same can be done for the variables by premultiplying by $S(x_d, c)$ and substituting Eqs. (36) and (32):

\[ S(x_d, c)(w - g) = -y_d + L(x_d, x_i)\, y_i. \tag{41} \]

In order to compare the design equations for GMA models (Eqs. (40) and (41)) and their linear-logarithmic versions (Eqs. (38) and (39)) it is convenient to normalize the variables by their steady state values. This is equivalent to shifting the origin of coordinates to the reference state in the logarithmic space, $y' = y - y_0$. As was shown in the derivation of the formalisms, linear-logarithmic rates are based on such a normalization, which can optionally be performed in power-law models. After the change of variables, the right-hand side of the equations is the same for both formalisms. Regarding the left-hand sides, for every appearance of $\frac{v_i - |v_i|_0}{|v_i|_0}$ in the linear-logarithmic version, we find $\ln\frac{v_i}{|v_i|_0}$ for the GMA. This is precisely Eq. (17), the approximation we used to derive the (log)linear rates. Therefore, both versions will be equivalent in the reference state and have a similar behavior around it.

3.3. S-systems

It is well known that, in spite of their non-linear dynamics, the steady-state solutions of S-systems follow a linear equation [31]. It is, however, interesting to compare the steady-state solution for S-systems with the above-mentioned design equations. Equating the synthesis and degradation fluxes of Eqs. (7) and (8) and taking logarithms, the steady state equation for S-systems can be written as:

\[ A_d y_d + A_i y_i - c = 0, \tag{42} \]

where $A = G - H$ and $c_i = \ln\left(\frac{\beta_i}{\alpha_i}\right)$, with $\alpha$ and $\beta$ being the rate constants of the synthesis and degradation fluxes, respectively. Eq. (42) can be rewritten as:

\[ S(x_d, \alpha)\, c = -y_d + L(x_d, x_i)\, y_i, \tag{43} \]

where

\[ L(x_d, x_i) = -A_d^{-1} A_i, \qquad S(x_d, \alpha) = -A_d^{-1}. \]

By substituting Eqs. (13)–(16), these definitions of $L(x_d, x_i)$ and $S(x_d, \alpha)$ can be written in terms of $F$ and $c$, being equivalent to those derived for the general case. The same procedure can show in a straightforward manner that Eq. (43) is analogous to Eqs. (41) and (39) in the reference state. Due to the special characteristics of S-systems, Eq. (43) is not only a necessary but also a sufficient condition for the steady state, so (40) is not needed for S-systems.
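The linearity of the S-system steady state can be illustrated with a one-variable example. The sketch below starts from the balance $\alpha\, x^{G} = \beta\, x^{H}$, takes logarithms and solves the resulting linear system. The parameter values and the names `steady_state` and `b` are assumptions made for illustration, with `b` holding $\ln(\beta/\alpha)$.

```python
import numpy as np

# Hypothetical one-variable S-system: dx1/dt = alpha * x2**1.0 - beta * x1**0.5,
# where x2 is an independent variable.
alpha = np.array([2.0])
beta = np.array([1.0])
G_d, G_i = np.array([[0.0]]), np.array([[1.0]])
H_d, H_i = np.array([[0.5]]), np.array([[0.0]])

A_d = G_d - H_d
A_i = G_i - H_i
b = np.log(beta / alpha)                  # from ln(alpha) + G y = ln(beta) + H y

def steady_state(x_i):
    """Solve A_d y_d = b - A_i y_i and return x_d = exp(y_d)."""
    y_i = np.log(x_i)
    y_d = np.linalg.solve(A_d, b - A_i @ y_i)
    return np.exp(y_d)

L = -np.linalg.solve(A_d, A_i)            # logarithmic gains, -A_d^-1 A_i

x1 = steady_state(np.array([1.0]))
print(x1, alpha * 1.0 - beta * x1 ** 0.5)  # steady state and its net rate
print(L)
```

No iteration is needed: a single linear solve gives the steady state for any value of the independent variable, which is what makes S-systems attractive for optimization.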

3.4. An alternative solution

In order to use the design equation, the flux distribution in the desired steady state must be known beforehand. Such a distribution can be obtained by choosing values for some fluxes and solving Eq. (1) for the rest [3] or by using optimization methods [12]. The next step is to choose the values for the dependent variables and calculate the rest using the design equations. Of course, the number of independent variables, including enzyme levels, external metabolites, etc., creates an underdetermined system. This was solved in the original formulation of the design equation by solving for a fixed subset of the control variables, the enzyme profiles, while leaving the external metabolites in the right-hand side of the equation as values to be chosen before the equation is solved. This, together with the formal requirements to formulate a linlog model, guarantees a solution but limits the potential of the method significantly. A biotechnological system may be controlled through the modification of many different variables: concentrations of external metabolites, substrate flux into a chemostat, etc. Any such variable can be present in a model derived according to the guidelines shown above. The design equation can therefore be solved for a mixed set of variables, cellular or extracellular. This can be done by collecting a subset of independent variables such that its matrix of kinetic orders is a non-singular square matrix and solving the rate law directly. Such variables will be referred to as bound variables, $x_i^b$, since their values will be determined by the rest, the free variables $x_i^f$, which will remain on the right-hand side.

However, with the above-mentioned information available and known rate laws, there is no real need to derive the design equations, since the rate equations themselves can be solved [12], obtaining the same result. For instance, for a GMA system, Eq. (4) can be solved to obtain:

\[ y_i^b = -\left(F_i^b\right)^{-1}\left[F_i^f y_i^f + F_d y_d + g - w\right], \tag{44} \]

where vector $y_i^f$ contains the logarithms of the free independent variables, $y_i^b$ contains those of the bound variables, and $F_i^f$ and $F_i^b$ are the corresponding matrices of kinetic orders.

An equivalent approach can be used for the (log)linear case. Since linlog models are assumed to be controlled through their enzyme levels alone (rate constants), we can make use of the properties of diagonal matrices to solve for the reciprocals of $c$:

\[ C^{-1}\mathbf{1} = V^{-1}\left[\mathbf{1} + F(y - y_0)\right]. \tag{45} \]

Therefore, the rate law of every formalism is equivalent to the design equation for the purposes considered in this work.
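Solving the GMA rate law directly for a set of bound variables can be sketched as follows. The example splits the independent variables of Eq. (4) into free and bound blocks, $w = F_d y_d + F^f y^f + F^b y^b + g$, and solves for $y^b$. The choice of one enzyme per reaction as bound variables, the numerical values and the function name are assumptions made for illustration.

```python
import numpy as np

F_d = np.array([[0.0, -0.5],
                [0.5, 0.0],
                [0.0, 0.8]])                  # hypothetical kinetic orders, dependent variables
F_f = np.array([[1.0], [0.0], [0.0]])         # free variable: external substrate acting on v1
F_b = np.eye(3)                               # bound variables: one enzyme per reaction, order 1
g = np.zeros(3)                               # normalized model, ln(c_i) = 0

def bound_variables(w_target, y_d, y_f):
    """Solve w = F_d y_d + F_f y_f + F_b y_b + g for y_b."""
    return np.linalg.solve(F_b, w_target - g - F_d @ y_d - F_f @ y_f)

w_target = np.log(np.array([2.0, 2.0, 2.0]))  # double every flux
y_d = np.zeros(2)                             # keep metabolites at their reference levels
y_f = np.log(np.array([2.0]))                 # substrate is doubled by the experimenter
y_b = bound_variables(w_target, y_d, y_f)
print(np.exp(y_b))                            # required fold changes in the enzymes
```

In this toy case the first enzyme needs no change because the doubled substrate already doubles the first flux, while the other two enzymes must be doubled. Nothing in the calculation prevents a solution from landing outside realistic ranges, which is the limitation that motivates the optimization approach.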

4. Optimization

The main drawback of the methods described above is the need to specify a flux distribution a priori and to explore the whole space of possible solutions manually. Besides, the design process is often subject to additional constraints, some general and some problem dependent, that cannot be included in the design equation. As has been discussed elsewhere [12], even when the design or rate equation is guaranteed to have a solution for every choice of fluxes and metabolites, this solution lies in an unbounded domain that may go beyond the validity of the approximation or just fall out of the biologically realistic range for the variables. An example of this will be shown below for a specific model.

A more general approach that addresses all these issues is optimization. Optimization in Metabolic Engineering can often be summarized as finding the solution to the following problem:

\[ \begin{array}{lr} \max \text{ or } \min \quad \text{Yield or Cost} & (46) \\ \text{subject to:} & \\ \quad \text{operation in steady state} & (47) \\ \quad \text{metabolic and physico-chemical constraints} & (48) \\ \quad \text{cell viability.} & (49) \end{array} \]

In this generic representation, (46) usually targets a flux or a yield, which changes as a consequence of variations in different variables of the system that can be controlled by the experimenter. Some examples are enzyme levels that can be increased by introducing extra copies of a gene, fermentation conditions that can be adjusted externally, etc. The optimization must occur under several constraints. The first set (47) ensures that the system will operate under steady-state conditions. Other constraints (48) are imposed to retain the system within a physically and chemically feasible state and so that the total protein or metabolite levels do not impede cell growth. Yet other constraints (49) guarantee that no metabolites are depleted below minimal required levels or accumulate to toxic concentrations; they are normally implemented by imposing upper and lower bounds on the corresponding variables. These sets of constraints are designed to allow sustained operation of the system.

The exact mathematical formulation depends on the chosen formalism, since the steady state condition will include the equations of the model equated to zero. In general, optimization problems can be extremely difficult to solve unless the structure of the problem belongs to some particular cases. One such case is linear programming, which can be solved very efficiently even for large scale systems. Linear programming has been successfully used for biotechnological problems in some of the considered formalisms. Since the design equations are themselves linear, we will explore the possibility of using them or the equivalent rate equations to define linear programming problems.

4.1. S-systems

Among the formalisms considered, the S-system was the first for which an optimization strategy was devised, due to the simplicity of its steady state equations [30]. The Indirect Optimization Method (IOM) [24] is based on approximating any non-linear system as an S-system, which is then optimized. This method has been successfully applied to a wide variety of systems including a whole industrial plant [8]. In the case of an S-system, both the flux expressions and the steady state condition are linear in the logarithmic space. Additional constraints (48) would have to be approximated by a power-law before being included, in order to keep them linear in the logarithmic space.

4.2. GMA

Since Eq. (1) is linear in the Cartesian space while Eq. (4) is linear in logarithmic space, it is impossible to consider fluxes and variables within a single set of linear equations. This limits the possibilities of optimization to cases in which the flux distribution is fixed, as in the case of the design equation. Optimization in this case is used to search within the space defined by the rate law (or design equation) as shown in [12].

4.3. (Log)linear

Since the rate laws for (log)linear systems in Eq. (21) are linear in Cartesian space, they can be included together with the mass balance Eq. (1) in an optimization program:

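\[
\begin{aligned}
&\max \text{ or } \min \quad \text{Yield or Cost}\\
&\text{subject to:}\\
&\qquad N \cdot v = 0,\\
&\qquad v = C\,(1 + F y),\\
&\qquad y^{L} \le y \le y^{U}.
\end{aligned}
\]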

Thus, the IOM can be used with a loglin representation simply by replacing the steady-state condition of the S-system with the loglin rate and mass balance equations.

4.4. Linlog

The inclusion of enzyme levels in matrix C in Eq. (21) creates non-linearities as long as both enzymes and fluxes are considered to be variables. For this reason, an optimal enzyme profile can only be found if a fixed flux distribution has already been established. This creates an obvious parallelism with GMA optimization, the use of design equations and the solution of the rate laws. Extracting the enzyme profile as a vector from matrix C in Eq. (21), we can set up the linear program:

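\[
\begin{aligned}
&\max \quad C^{-1} 1\\
&\text{subject to:}\\
&\qquad C^{-1} 1 = V^{-1} V_0 \left[\, 1 + F\,(y - y_0) \,\right],\\
&\qquad y^{L} \le y \le y^{U},
\end{aligned}
\]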

where $V_0$ and $V$ are diagonal matrices containing the basal and desired fluxes, respectively, and $C^{-1}1$ is the vector of reciprocals of $c$. This yields the enzyme profile that achieves the desired flux distribution with the least amount of protein, which is a desirable goal. It is noteworthy that, since the rate or the design equation can only be written linearly as a function of the reciprocals of the enzyme concentrations, the linear program has to be written as the maximization of such reciprocals. Since the values are always positive and non-zero, the minimum of the function and the maximum of its reciprocal will coincide. Other objectives for optimization could be minimizing or maximizing single variables or ratios of metabolites, which are linear functions of their logarithms. In order to build an objective function such as the sum of several metabolites, we would have to approximate it with a power-law or a linear-logarithmic expression.
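The equality constraint of the linlog program can be illustrated with a minimal sketch. It evaluates the right-hand side $V^{-1} V_0 [1 + F(y - y_0)]$ row by row for a fixed flux distribution, which shows that the reciprocal enzyme levels are an affine function of $y$. The function name, the two-reaction matrix `F` and all numerical values are hypothetical and chosen only for illustration; `y` and `y0` are assumed to be the transformed (logarithmic) metabolite variables.

```python
def reciprocal_enzyme_levels(v_desired, v_basal, F, y, y0):
    """Evaluate V^{-1} V0 [1 + F (y - y0)] row by row.

    v_desired, v_basal: desired and basal fluxes (diagonals of V and V0).
    F: list of rows of the matrix F (hypothetical values in this sketch).
    y, y0: transformed metabolite variables at the new and basal states.
    """
    recips = []
    for v, v0, row in zip(v_desired, v_basal, F):
        bracket = 1.0 + sum(f * (yj - y0j) for f, yj, y0j in zip(row, y, y0))
        recips.append((v0 / v) * bracket)
    return recips


# Hypothetical two-reaction, two-metabolite system; every flux is doubled.
v_basal = [1.0, 1.0]
v_desired = [2.0, 2.0]
F = [[0.5, 0.0],
     [-0.25, 1.0]]
y0 = [0.0, 0.0]

at_basal = reciprocal_enzyme_levels(v_desired, v_basal, F, y0, y0)
shifted = reciprocal_enzyme_levels(v_desired, v_basal, F, [0.2, 0.0], y0)

print("reciprocals at basal metabolites:", at_basal)
print("reciprocals after shifting y1:   ", shifted)
print("enzyme levels after shifting y1: ", [1.0 / r for r in shifted])
```

With unchanged metabolites, doubling every flux requires doubling every enzyme (reciprocals of 0.5). A reciprocal that reaches zero or becomes negative would signal that the chosen metabolite levels lie outside the range in which the approximation is usable, which is what the bounds on $y$ are meant to prevent.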

5. An example

Although the performance of the methods discussed above is problem dependent, an example is presented here to clarify the concepts involved. The GMA model used [1] was obtained from a previous version [4], formulated with traditional Michaelis–Menten kinetics to explain experimental data, and has often been used as a benchmark system [15,24,26,13]. Furthermore, a version of this model has also been adapted to a linear-logarithmic formalism [25].

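\[
\begin{aligned}
\dot X_1 &= v_{in} - v_{HK},\\
\dot X_2 &= v_{HK} - v_{PFK} - v_{POL},\\
\dot X_3 &= v_{PFK} - v_{GAPD} - \tfrac{1}{2}\, v_{GOL},\\
\dot X_4 &= 2\, v_{GAPD} - v_{PK},\\
\dot X_5 &= 2\, v_{GAPD} + v_{PK} - v_{HK} - v_{PFK} - v_{POL} - v_{ATP}.
\end{aligned}
\tag{50}
\]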

The model has already been translated to BST and MCA parameters in [1]. The GMA equations are:

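\[
\begin{aligned}
v_{in} &= 0.8122\, X_2^{-0.2344} X_6,\\
v_{HK} &= 2.8632\, X_1^{0.7464} X_5^{0.0243} X_7,\\
v_{PFK} &= 0.5232\, X_2^{0.7318} X_5^{-0.3941} X_8,\\
v_{GAPD} &= 0.011\, X_3^{0.6159} X_5^{0.1308} X_9 X_{14}^{-0.6088},\\
v_{PK} &= 0.0945\, X_3^{0.05} X_4^{0.533} X_5^{-0.0822} X_{10},\\
v_{POL} &= 0.0009\, X_2^{8.6107} X_{11},\\
v_{GOL} &= 0.0945\, X_3^{0.05} X_4^{0.533} X_5^{-0.0822} X_{12},\\
v_{ATP} &= X_5 X_{13},
\end{aligned}
\tag{51}
\]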

where $X_6$ to $X_{13}$ are multipliers for the enzymes corresponding to each step. These variables can be interpreted as ratios with respect to the basal enzyme levels. The corresponding S-system can be found in [23]. The linear-logarithmic formulations use the same values for the kinetic orders, while the rate constants have to be adjusted according to the definitions derived above. The loglinear model keeps the same definition for the independent variables, while the linlog model has them included in the rate constants.

In order to increase the ethanol production ($v_{PK}$) by a factor of 3 using the design equations, the rest of the desired flux distribution has to be determined. The mass balance equations in the steady state have three degrees of freedom, so two more fluxes can be arbitrarily chosen. In this example, the fluxes at the branching points, $v_{POL}$ and $v_{GOL}$, will be kept constant. The rest of the fluxes in the model have to be calculated from these and substituted into the design equation together with the desired levels for the metabolites. In order to preserve the homeostasis of the cell, the metabolite levels will be kept within a certain range around the basal ones. Pyruvate, being the substrate for the last reaction of the model, will be increased to the limit, while all the other metabolites will be decreased, except ATP, which will remain constant. The equations will be solved for different ranges in order to evaluate the performance of the method at increasingly large distances from the operating point. For a given margin ($m$), the metabolite levels will be set to:


\begin{align}
|X_i| &= (1 - m)\,|X_i|_0, \qquad i = 1, \ldots, 3, \tag{52}\\
|X_4| &= (1 + m)\,|X_4|_0, \tag{53}\\
|X_5| &= |X_5|_0. \tag{54}
\end{align}

Once fluxes and metabolite levels have been chosen for the desired steady state, the equations are solved. The design equations for a given formalism yield the same results as directly solving the rate equations.
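The choice of the desired steady state can be sketched in a few lines. The first function below solves the mass balances of Eq. (50) for the remaining fluxes once $v_{PK}$, $v_{POL}$ and $v_{GOL}$ have been fixed, and the second applies Eqs. (52)–(54) to a set of basal metabolite levels. The function names are hypothetical, and the basal values used in the example are placeholders chosen for readability, not the basal state of the model in [1].

```python
def flux_distribution(v_pk, v_pol, v_gol):
    """Solve the steady-state mass balances of Eq. (50) for the other fluxes."""
    v_gapd = v_pk / 2.0                                   # X4: 2 v_GAPD - v_PK = 0
    v_pfk = v_gapd + 0.5 * v_gol                          # X3 balance
    v_hk = v_pfk + v_pol                                  # X2 balance
    v_in = v_hk                                           # X1 balance
    v_atp = 2.0 * v_gapd + v_pk - v_hk - v_pfk - v_pol    # X5 balance
    return {"in": v_in, "HK": v_hk, "PFK": v_pfk, "GAPD": v_gapd,
            "PK": v_pk, "POL": v_pol, "GOL": v_gol, "ATP": v_atp}


def metabolite_targets(basal, m):
    """Eqs. (52)-(54): X1..X3 decreased, X4 increased, X5 unchanged."""
    x1, x2, x3, x4, x5 = basal
    return [(1 - m) * x1, (1 - m) * x2, (1 - m) * x3, (1 + m) * x4, x5]


# Placeholder basal fluxes (hypothetical, for illustration only).
v_pk0, v_pol0, v_gol0 = 30.0, 0.5, 2.0

# Triple the ethanol flux and keep both branch fluxes at their basal values.
fluxes = flux_distribution(3.0 * v_pk0, v_pol0, v_gol0)
targets = metabolite_targets([1.0, 1.0, 1.0, 1.0, 1.0], m=0.1)

print(fluxes)
print(targets)
```

The resulting fluxes and metabolite levels are the inputs that the design equation, or equivalently the rate laws, then turn into an enzyme profile.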

Fig. 2 shows the predicted enzyme profiles for different margins of metabolite variation. The predicted enzyme profiles are very similar except for one of the enzymes. It can be seen that the linlog prediction for x11 turns negative for metabolite changes above ±11%. This is a consequence of the linear-logarithmic linearization, in which low values of the dependent variable result in negative rates, as shown in Fig. 1 and observed in [33]. This is an example of how the unconstrained solutions of the design equation can fall outside the range in which the formalism is valid. This undesired outcome can be prevented through constrained optimization, which can also search the whole feasible space for the best option.
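The sign change of the linlog prediction can be reproduced with a minimal sketch. It assumes that both rate laws are written relative to the basal state (all basal values normalized to 1), that $v_{POL}$ is kept at its basal value, and that the linlog rate takes the usual form $v/v_0 = (e/e_0)\,[1 + \varepsilon \ln(X/X_0)]$, which is an assumption about the form of Eq. (21). The kinetic order 8.6107 is taken from Eq. (51); the function names are hypothetical.

```python
import math

F_POL_X2 = 8.6107  # kinetic order of v_POL with respect to X2, from Eq. (51)


def gma_enzyme_fold(flux_fold, order, met_fold):
    """Power-law rate law solved for the enzyme multiplier (relative to basal)."""
    return flux_fold / met_fold ** order


def linlog_enzyme_fold(flux_fold, order, met_fold):
    """Linlog rate law solved for the enzyme multiplier (relative to basal)."""
    return flux_fold / (1.0 + order * math.log(met_fold))


def linlog_sign_change_margin(order):
    """Margin m at which 1 + order * ln(1 - m) reaches zero."""
    return 1.0 - math.exp(-1.0 / order)


# X2 is decreased by the margin m, as in Eq. (52); v_POL stays at its basal value.
for m in (0.05, 0.10, 0.12, 0.20):
    gma = gma_enzyme_fold(1.0, F_POL_X2, 1.0 - m)
    linlog = linlog_enzyme_fold(1.0, F_POL_X2, 1.0 - m)
    print(f"m={m:.2f}  GMA x11={gma:7.3f}  linlog x11={linlog:8.3f}")

print(f"linlog x11 changes sign at m = {linlog_sign_change_margin(F_POL_X2):.4f}")
```

Under these assumptions the linlog bracket vanishes at a margin of about 0.11, in line with the ±11% quoted in the text, while the power-law expression stays positive for any margin. Evaluated at a metabolite level of 1.25 instead, the same two expressions give approximately 0.1464 (GMA) and 0.3423 (linlog), which coincide with the X11 entries of Tables 4 and 3.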

Tables 1 and 2 summarize the results of applying constrained optimization techniques to this problem. For this approach, the IOM was applied to the S-system and (log)linear models. The objective function was taken to be Vpol, the values of the dependent variables were constrained to be within a margin of ±50% of their basal values, and the independent variables were allowed up to threefold changes. The immediate outcome of this setup is a greater degree of flexibility, since the margins and fluxes are no longer fixed and the best option within the limits will be obtained for every metabolite.

For linlog and GMA, linear methods can only be used if the flux distribution is determined a priori. Once the distribution had been chosen as described above for the application of the design equations, an optimization run was used to find the enzyme profile that achieved the desired flux distribution with the minimum amount of total protein. The metabolites were constrained within a ±25% limit (see Tables 3 and 4).

[Figure 2: eight panels, one for each enzyme multiplier (x6 to x13), each plotted against the metabolite margin from 0 to 0.2.]

Fig. 2. Profiles obtained from the design equations in linlog (asterisks) and GMA (circles).

Table 1
Optimization results using IOM (S-system). The results predicted by the optimization are compared to those obtained by substitution of the optimal enzyme profile in the original system.

Variable        Predicted    Real       Deviation (%)
X1              1.3004       1.3110     0.8091
X2              1.0000       1.0001     0.0080
X3              1.3071       1.3218     1.1261
X4              1.2996       1.3359     2.7973
X5              1.0000       1.0000     0.0015
X6              3.0000       –
X7              2.4662       –
X8              3.0000       –
X9              2.5438       –
X10             2.5731       –
X11             3.0000       –
X12             2.5724       –
X13             3.0000       –
Total enzyme
Flux            90.3381      90.3387    <10^-5

6. Conclusions

The design equation for a given formalism has been shown to be equivalent to the rate law in matrix form, as it provides the same results and requires the same information. The interest of this equivalence is not purely theoretical, since working directly with rate laws is more intuitive and flexible. However, even if alternating the use of mass balances and rate laws (or design equations) is a useful way of exploring the possibilities of biochemical systems, our analysis suggests that both expressions should be used simultaneously within an optimization problem whenever possible. This can be done for S-systems and (log)linear systems and allows a much faster and more exhaustive exploration of the state space. Even for GMA and linlog systems, in which mass balances and rate laws cannot be formulated together in a linear program, optimization offers some advantages over the direct use of the rate law. First, optimization can be used to explore the rate law itself. Second, the optimization problem can include additional constraints that may be relevant in the design process. Let us consider the example of a pathway such as the pentose phosphate pathway, in which a single enzyme participates in several reactions. In such a case, the design equation can no longer be solved in its original formulation because of the dependencies that arise among the enzymes. This would not be a problem for optimization, since an extra set of constraints can be defined to bind the fluxes catalysed by the same enzyme. Finally, some theoretically feasible solutions are impossible to attain in practice because of excessive accumulation of substrates, excessive protein requirements or, in the case of linear-logarithmic models, the presence of negative concentrations. Such solutions can be discarded by defining appropriate boundaries in an optimization problem.

Placing all the methods in a common framework has revealed many more similarities than expected. Nearly equivalent methods have been published in parallel, as in [12,3], without the similarities between the approaches being apparent, owing to differences in notation.

By deriving the design equations for all four formalisms and showing their equivalence to the rate laws used in the different IOM methods, it has been shown that the analysed methods, which were derived independently, are all variations of the same relation, first formulated as Eq. (43) in a completely different context. On the one hand, each of these proposed methods is an improvement, since it extends the set of available tools; on the other hand, none of the proposed alternatives can claim to be superior to the others. This is so because all the methods are equivalent in the reference state and diverge from there following different assumptions, the validity of which is completely problem dependent. Hence the need for analysing their common mathematical foundation. It is also noteworthy that the ultimate sources of any difference between the predictions of the different methods are not the methods themselves but their formalism of choice.

Finding the best approximate model for a given case is more an art than a science, in which many factors have to be balanced. The tradeoff between accuracy and convenient structure does not have a single answer. It is therefore impossible to state a priori which formalism is more accurate or convenient in a general sense. For almost 40 years, power-law models have been built for many kinds of systems, their limits tested and a wide variety of techniques developed for their analysis. Linear-logarithmic models are relatively recent and not so thoroughly tested, but they are receiving enough attention to expect that their scope and applicability will soon become apparent.

Acknowledgments

Part of this work was supported by a research grant from the Spanish Ministry of Science and Education, Ref. BIO2005-08898-C02-02. AMS was funded by a postdoctoral grant from the Max Planck Society.

Table 2
Optimization results using IOM (loglin). The results predicted by the optimization are compared to those obtained by substitution of the optimal enzyme profile in the original system.

Variable        Predicted    Real       Deviation (%)
X1              1.2645       1.2717     0.5723
X2              1.0001       1.0001     0.0054
X3              1.2799       1.2911     0.8728
X4              1.3596       1.4100     3.7059
X5              1.0000       1.0000     0.0041
X6              3.0000       –
X7              2.5188       –
X8              3.0000       –
X9              2.5770       –
X10             2.5142       –
X11             3.0000       –
X12             2.5129       –
X13             3.0000       –
Total enzyme
Flux            63.1949      90.3397    30.0475

Table 3
Optimization results for the linlog model. The results predicted by the optimization are compared to those obtained by substitution of the optimal enzyme profile in the original system.

Variable        Predicted    Real       Deviation
X1              1.2500       1.2270     1.8748%
X2              1.2500       1.2408     0.7437%
X3              1.2500       1.2295     1.6680%
X4              1.2500       1.2399     0.8151%
X5              1.0000       0.9949     0.5095%
X6              3.0465       –
X7              2.4753       –
X8              2.4833       –
X9              2.6377       –
X10             2.6539       –
X11             0.3423       –
X12             0.8846       –
X13             3.1277       –
Total enzyme    17.6513
Flux            90.3444      88.0904

Table 4
Optimization results for the GMA model with unrestricted enzyme levels. The results predicted by the optimization are compared to those obtained by substitution of the optimal enzyme profile in the original system.

Variable        Predicted    Real       Deviation
X1              1.2500       1.2408     0.7399%
X2              1.2500       1.2541     0.3235%
X3              1.2500       1.2400     0.8104%
X4              1.2500       1.2476     0.1887%
X5              1.0000       0.9910     0.9126%
X6              3.0421       –
X7              2.4446       –
X8              2.4536       –
X9              2.6150       –
X10             2.6332       –
X11             0.1464       –
X12             0.8777       –
X13             3.1277       –
Total enzyme    17.3403
Flux            90.3444      87.7385

References

[1] R. Curto, A. Sorribas, M. Cascante, Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model definition and nomenclature, Math. Biosci. 130 (1) (1995) 25.

[2] P. De Atauri, R. Curto, J. Puigjaner, A. Cornish-Bowden, M. Cascante, Advantages and disadvantages of aggregating fluxes into synthetic and degradative fluxes when modelling metabolic pathways, Eur. J. Biochem. 265 (2) (1999) 671.

[3] Diana Visser, Joseph J. Heijnen, Dynamic simulation and metabolic re-design of a branched pathway using linlog kinetics, Metab. Eng. 5 (2003) 164.

[4] J. Galazzo, J. Bailey, Fermentation pathway kinetics and metabolic flux control in suspended and immobilized Saccharomyces cerevisiae, Enzyme Microb. Technol. (12) (1990) 162.

[5] J.J. Heijnen, Approximative kinetic formats used in metabolic network modeling, Biotechnol. Bioeng. 91 (5) (2005) 534.

[6] B. Hernandez-Bermejo, V. Fairen, L. Brenig, Algebraic recasting of nonlinear systems of ODEs into universal formats, J. Phys. A (31) (1998) 2415–2430.

[7] B. Hernandez-Bermejo, V. Fairen, A. Sorribas, Power-law modeling based on least-squares minimization criteria, Math. Biosci. 161 (1–2) (1999) 83.

[8] Julio Vera, Carmen G. Moles, Julio Banga, Néstor V. Torres, Simultaneous design and control non-linear optimization through linear programming. Application to a wastewater treatment plant, AIChE J. 49 (12) (2003) 3173.

[9] H. Kacser, L. Acerenza, A universal method for achieving increases in metabolite production, Eur. J. Biochem. 216 (2) (1993) 361.

[10] H. Kacser, J.A. Burns, The control of flux, Symp. Soc. Exp. Biol. 27 (1973) 65.

[11] J.C. Liao, J. Delgado, Flux calculation using metabolic control constraints, Biotechnol. Prog. 14 (4) (1998) 554.

[12] A. Marin-Sanguino, N.V. Torres, Optimization of biochemical systems by linear programming and general mass action model representations, Math. Biosci. 184 (2) (2003) 187.

[13] A. Marin-Sanguino, E.O. Voit, C. Gonzalez-Alcon, N.V. Torres, Optimization of biotechnological systems through geometric programming, Theor. Biol. Med. Model. 4 (38) (2007).

[14] I.E. Nikerel, W.A. van Winden, W.M. van Gulik, J.J. Heijnen, A method for estimation of elasticities in metabolic networks using steady state and dynamic metabolomics data and linlog kinetics, BMC Bioinformatics 7 (2006) 540.

[15] P. Polisetty, E. Gatzke, E. Voit, Yield optimization of regulated metabolic systems using deterministic branch-and-reduce methods, Biotechnol. Bioeng. 99 (5) (2008) 1154.

[16] C. Reder, Metabolic control theory: a structural approach, J. Theor. Biol. 135 (2) (1988) 175.

[17] R. Rosario, E. Mendoza, E. Voit, Challenges in lin-log modelling of glycolysis in Lactococcus lactis, Syst. Biol. IET 2 (3) (2008) 136.

[18] M. Savageau, Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions, J. Theor. Biol. 25 (3) (1969) 365.

[19] M. Savageau, E. Voit, Recasting nonlinear differential equations as S-systems: a canonical nonlinear form, Math. Biosci. (87) (1987) 83.

[20] M.A. Savageau, Michaelis–Menten mechanism reconsidered: implications of fractal kinetics, J. Theor. Biol. 176 (1) (1995) 115.

[21] M.A. Savageau, A. Sorribas, Constraints among molecular and systemic properties: implications for physiological genetics, J. Theor. Biol. 141 (1) (1989) 93.

[22] A. Sorribas, B. Hernandez-Bermejo, E. Vilaprinyo, R. Alves, Cooperativity and saturation in biochemical networks: a saturable formalism using Taylor series approximations, Biotechnol. Bioeng. 97 (5) (2007) 1259.

[23] N. Torres, E. Voit, Pathway Analysis and Optimization in Metabolic Engineering, Cambridge University, Cambridge, 2002.

[24] N. Torres, E. Voit, C. Glez-Alcon, F. Rodriguez, An indirect optimization method for biochemical systems. Description of method and application to ethanol, glycerol and carbohydrate production in Saccharomyces cerevisiae, Biotechnol. Bioeng. 5 (55) (1997) 758.

[25] Vassily Hatzimanikatis, James E. Bailey, Effects of spatiotemporal variations on metabolic control: approximate analysis using (log)linear kinetic models, Biotechnol. Bioeng. 54 (2) (1997) 91.

[26] J. Vera, P. de Atauri, M. Cascante, N. Torres, Multicriteria optimization of biochemical systems by linear programming: application to production of ethanol by Saccharomyces cerevisiae, Biotechnol. Bioeng. 83 (3) (2003) 335.

[27] J. Vera, N.V. Torres, C.G. Moles, J. Banga, Integrated nonlinear optimization of bioprocesses via linear programming, AIChE J. 49 (12) (2003) 3173.

[28] D. Visser, J.J. Heijnen, The mathematics of metabolic control analysis revisited, Metab. Eng. 4 (2) (2002) 114.

[29] E. Voit, Canonical Nonlinear Modeling: S-System Approach to Understanding Complexity, Van Nostrand Reinhold, New York, 1991.

[30] E. Voit, Optimization in integrated biochemical systems, Biotechnol. Bioeng. (40) (1992) 572.

[31] E. Voit, Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists, Cambridge University, Cambridge, 2000.

[32] E. Voit, J. Schwacke, Systems Biology: Principles, Methods, and Concepts, Chapter Understanding through Modeling, CRC, Boca Raton, FL, 2006.

[33] F. Wang, C. Ko, E. Voit, Kinetic modeling using S-systems and lin-log approaches, Biochem. Eng. J. 33 (3) (2007) 238.

[34] L. Wu, W. Wang, W.A. van Winden, W.M. van Gulik, J.J. Heijnen, A new framework for the estimation of control parameters in metabolic pathways using lin-log kinetics, Eur. J. Biochem. 271 (16) (2004) 3348.
