cg11 metabolic networks

Constraint-Based Modeling of Metabolic Networks. Based on: “Genome-scale models of microbial cells: Evaluating the consequences of constraints”, Price et al. (2004). Tomer Shlomi, School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel. January 2006.

  • Constraint-Based Modeling of Metabolic Networks

    Based on: "Genome-scale models of microbial cells: Evaluating the consequences of constraints", Price et al. (2004)

    Tomer Shlomi, School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

    January 2006

  • Outline

    Metabolism and metabolic networks
    Kinetic models vs. constraint-based modeling
    Flux Balance Analysis
    Exploring the solution space
    Altering phenotypic potential: gene knockouts

  • Cellular Metabolism

    The essence of life
    Catabolism and anabolism
    The metabolic core: production of energy, anaerobic and aerobic metabolism
    Probably the best understood of all cellular networks: metabolic, PPI, regulatory, signaling
    Tremendous importance in:
      Medicine: antibiotics, metabolic disorders, liver disorders, heart disorders
      Bioengineering: efficient production of biological products

  • Metabolites and Biochemical Reactions

    Metabolite: an organic substance, e.g. glucose, oxygen
    Biochemical reaction: the process in which two or more molecules (reactants) interact, usually with the help of an enzyme, and produce a product
    Example: Glucose + ATP → Glucose-6-Phosphate + ADP (enzyme: glucokinase)

  • Kinetic Models

    Dynamics of metabolic behavior over time
      Metabolite concentrations
      Enzyme concentrations
    Enzyme activity rate depends on enzyme concentrations and metabolite concentrations
    Solved using a set of differential equations

    Impossible to model large-scale networks
      Requires specific enzyme rate data
      Too complicated

  • Constraint Based Modeling

    Provides a steady-state description of metabolic behavior
    A single, constant flux rate for each reaction
    Ignores metabolite concentrations
    Independent of enzyme activity rates
    Assumes a set of constraints on reaction fluxes
    Genome-scale models

    Flux rate: μmol / (mg · h)

  • Constraint Based Modeling

    Find a steady-state flux distribution through all biochemical reactions, under the constraints:
      Mass balance: metabolite production and consumption rates are equal
      Thermodynamic: irreversibility of reactions
      Enzymatic capacity: bounds on enzyme rates
      Availability of nutrients

  • Metabolic Networks

    [Diagram: network reconstruction. Labels: genome annotation, biochemistry, cell physiology, inferred reactions, metabolic network, analytical methods]

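  • Mathematical Representation

    Stoichiometric matrix: network topology with the stoichiometry of the biochemical reactions

    Example column for glucokinase (Glucose + ATP → Glucose-6-Phosphate + ADP):

      Metabolite    Glucokinase
      Glucose       -1
      ATP           -1
      G-6-P         +1
      ADP           +1

    Constraints and the resulting solution space:

      Constraint       Equation            Solution space
      Mass balance     $Sv = 0$            Subspace of $\mathbb{R}^n$
      Thermodynamic    $v_i > 0$           Convex cone
      Capacity         $v_i < v_{max}$     Bounded convex cone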
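The mass-balance constraint can be illustrated with a minimal sketch. The glucokinase column comes from the slide; the two-metabolite toy network, the flux values and the helper name `is_steady_state` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Column of the stoichiometric matrix for glucokinase:
# Glucose + ATP -> Glucose-6-Phosphate + ADP
metabolites = ["Glucose", "ATP", "G6P", "ADP"]
glucokinase = np.array([-1, -1, 1, 1])

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  1, -1],   # B
])

def is_steady_state(S, v, tol=1e-9):
    """Return True if the flux vector v satisfies mass balance S v = 0."""
    return bool(np.all(np.abs(S @ np.asarray(v, dtype=float)) <= tol))

balanced = [10, 4, 6, 10]     # production of A and B equals consumption
unbalanced = [10, 4, 4, 10]   # A accumulates, B is depleted

print(is_steady_state(S, balanced))
print(is_steady_state(S, unbalanced))
```

Each column of `S` describes one reaction and each row one metabolite, so `S @ v` is the net production rate of every metabolite under the flux vector `v`.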

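  • Growth Medium Constraints

    Exchange reactions enable the uptake of nutrients from the media and the secretion of waste products

    Each exchange reaction (G-Ex, O-Ex, CO2-Ex) is a column of the stoichiometric matrix with a single entry of 1 in the row of its metabolite (Glucose, Oxygen, CO2)

      Exchange flux    Lower bound    Upper bound
      Oxygen           0              Inf
      Glucose          0              2.5
      CO2              -Inf           0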

  • Determination of Likely Physiological States

    How to identify plausible physiological states?
      Optimization methods
        Maximal biomass production rate
        Minimal ATP production rate
        Minimal nutrient uptake rate
      Exploring the solution space
        Extreme pathways
        Elementary modes

  • Outline: Optimization Methods

    Predicting the metabolic state of a wild-type strain
      Flux Balance Analysis (FBA)

    Predicting the metabolic state after a gene knockout
      Minimization Of Metabolic Adjustment
      Regulatory On/Off Minimization

  • Biomass Production Optimization

    Metabolic demands of precursors and cofactors required for 1 g of biomass of E. coli
    Classes of macromolecules:
      Amino acids, carbohydrates
      Ribonucleotides, deoxyribonucleotides
      Lipids, phospholipids
      Sterol, fatty acids
    These precursors are removed from the metabolic network in the corresponding ratios
    We define a growth reaction:
      $Z = 41.2570\,v_{ATP} - 3.547\,v_{NADH} + 18.225\,v_{NADPH} + \dots$

  • Biomass Composition Issues

    Varies across different organisms
    Depends on the growth medium
    Depends on the growth rate
    The optimum does not change much with changes in composition within a class of macromolecules
    The optimum does change if the relative composition of the major macromolecules changes

  • Flux Balance Analysis (FBA)

    Successfully predicts:
      Growth rates
      Nutrient uptake rates
      Byproduct secretion rates

    Solved using Linear Programming (LP):
      $\max\ v_{gro}$ (maximize growth)
      s.t. $Sv = 0$ (mass balance constraints)
      $v_{min} \le v \le v_{max}$ (capacity constraints)

    Finds the flux distribution with maximal growth rate
    Fell et al. (1986), Varma and Palsson (1993)
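The FBA linear program can be sketched on a small example. This is a minimal illustration, assuming SciPy's `linprog` as the LP solver; the toy network, its bounds and the variable names are hypothetical and not taken from the slides.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([
    [1, -1, -1,  0],
    [0,  1,  1, -1],
])
bounds = [(0, 10), (0, 10), (0, 10), (0, 1000)]  # (v_min, v_max) per reaction

# linprog minimizes, so growth (v4) is maximized by minimizing -v4.
c = np.array([0, 0, 0, -1])
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)

max_growth = -res.fun
print(res.status, max_growth)
print(res.x)  # one optimal flux distribution; others may exist
```

In this toy network growth is limited by the uptake bound on `v1`, and the split between `v2` and `v3` is not determined by the optimum, which is a small instance of the alternative optima discussed later in the talk.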

  • FBA Example (1)

  • FBA Example (2)

  • FBA Example (2)

  • Linear Programming Basics (1)

  • Linear Programming Basics (2)

  • Linear Programming Basics (3)

  • Linear Programming: Types of Solutions (1)

  • Linear Programming: Types of Solutions (2)

  • Linear Programming Algorithms

    Simplex
      Used in practice
      Does not guarantee polynomial running time
    Interior point
      Worst-case running time is polynomial

  • Phenotype Predictions: Evolving Growth Rate

  • Exploring the Convex Solution Space

  • Alternative Optima

    The optimal FBA solution is not unique

    [Figure labels: one solution, optimal solutions, near-optimal solutions]

    Basic solutions enumeration, MILP (Lee et al., 2000)
    Flux variability analysis (Mahadevan et al., 2003)
    Hit and run sampling (Almaas et al., 2004)
    Uniform random sampling (Wiback et al., 2004)

  • What Do Multiple Solutions Represent?

    Some of the solutions probably do not represent biologically meaningful metabolic behaviors, as there are missing constraints
    Previous studies tackled this problem by:
      Incorporating additional constraints: regulatory constraints (Covert et al., 2004)
      Looking for reactions for which new constraints may significantly reduce the solution space (Wiback et al., 2004)

    [Figure labels: FBA solution space, meaningful solutions]

  • Interpretations of Metabolic Space

    Effect of exogenous factors: the metabolic space corresponds to growth in a medium under various external conditions that are beyond the model's scope, such as stress or temperature
    Heterogeneity within a population: the metabolic space represents heterogeneous metabolic behaviors of individuals within a cell population (Mahadevan et al., 2003; Price et al., 2004)
    Alternative evolutionary paths: the metabolic space represents different metabolic states attainable through different evolutionary paths (Mahadevan et al., 2003; Fong et al., 2004)

    The three interpretations are obviously not mutually exclusive

  • Alternative Optima: Basic Solutions Enumeration

    Lee et al., 2000
    Basic solutions: metabolic states with a minimal number of non-zero fluxes
    Different solutions differ in at least a single zero flux
    Uses Mixed Integer Linear Programming
    The optimization is formulated to identify new solutions that differ from the previous ones
    Applicable only to small-scale models

  • Alternative Optima: Flux Variability Analysis

    Mahadevan et al., 2003
    Find metabolic states with extreme values of fluxes
    Use linear programming to minimize and maximize the flux through each reaction while satisfying all constraints

    $\max / \min\ v_i$ (maximize or minimize the flux through reaction $i$)
    s.t. $Sv = 0$ (mass balance constraints)
    $v_{min} \le v \le v_{max}$ (capacity constraints)
    $v_{gro} = v_{opt}$ (growth fixed at its maximal rate)
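Flux variability analysis can be sketched as a loop of small LPs. This is a minimal illustration, assuming SciPy's `linprog`; the toy network, the helper `solve` and the small tolerance on the growth constraint are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([
    [1, -1, -1,  0],
    [0,  1,  1, -1],
])
bounds = [(0, 10), (0, 10), (0, 10), (0, 1000)]
growth = 3  # index of the growth reaction

def solve(c, bounds):
    """Minimize c @ v subject to S v = 0 and the given flux bounds."""
    return linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)

# Step 1: plain FBA gives the optimal growth rate v_opt.
c_growth = np.zeros(S.shape[1])
c_growth[growth] = -1
v_opt = -solve(c_growth, bounds).fun

# Step 2: hold growth at v_opt (minus a tiny tolerance against round-off),
# then minimize and maximize each flux in turn.
fva_bounds = list(bounds)
fva_bounds[growth] = (v_opt - 1e-9, bounds[growth][1])
flux_ranges = []
for i in range(S.shape[1]):
    c = np.zeros(S.shape[1])
    c[i] = 1
    lowest = solve(c, fva_bounds).fun
    highest = -solve(-c, fva_bounds).fun
    flux_ranges.append((lowest, highest))

for i, (lowest, highest) in enumerate(flux_ranges, start=1):
    print(f"v{i}: [{lowest:.3f}, {highest:.3f}]")
```

Reactions whose range collapses to a single value are fully determined by the optimum, while wide ranges (here the two parallel routes) mark where alternative optima differ.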

  • Alternative Optima: Hit and Run Sampling

    Almaas et al., 2004
    Based on a random walk inside the solution space polytope
    Choose an arbitrary solution
    Iteratively make a step in a random direction
    Bounce off the walls of the polytope in random directions
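A random walk of this kind can be sketched as follows. The slide describes bouncing off the walls; this sketch instead uses the common hit-and-run variant that draws a uniform point along the feasible chord in a random direction. The triangle (two parallel fluxes `v2`, `v3` sharing an uptake limit of 10) and the function name `hit_and_run` are assumptions for illustration.

```python
import numpy as np

def hit_and_run(A, b, x0, n_samples, rng):
    """Sample points from the bounded polytope {x : A x <= b}, starting at x0."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        # Random unit direction.
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)
        # A (x + t d) <= b  is equivalent to  t * (A d) <= b - A x.
        Ad = A @ d
        slack = b - A @ x
        t_hi = np.min(slack[Ad > 1e-12] / Ad[Ad > 1e-12])
        t_lo = np.max(slack[Ad < -1e-12] / Ad[Ad < -1e-12])
        # Move to a uniformly chosen point on the feasible chord.
        x = x + rng.uniform(t_lo, t_hi) * d
        samples.append(x.copy())
    return np.array(samples)

# Toy flux polytope: v2 >= 0, v3 >= 0, v2 + v3 <= 10.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([10.0, 0.0, 0.0])

rng = np.random.default_rng(0)
samples = hit_and_run(A, b, x0=[1.0, 1.0], n_samples=1000, rng=rng)
print(samples.mean(axis=0))
```

The sketch assumes the polytope is bounded and full-dimensional; a real flux space also has the equality constraints $Sv = 0$, which would require sampling within the null space of $S$.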

  • Alternative Optima: Uniform Random Sampling

    Wiback et al., 2004
    The problem of uniformly sampling a high-dimensional polytope is NP-hard
    Find a tight parallelepiped that bounds the polytope
    Randomly sample solutions from the parallelepiped
    Can be used to estimate the volume of the polytope
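The bounding-shape idea can be sketched with rejection sampling. As a simplifying assumption, the bounding parallelepiped here is an axis-aligned box, and the polytope is a hypothetical two-flux triangle; the function names are illustrative only.

```python
import random

def in_polytope(v2, v3):
    """Toy flux polytope: v2 >= 0, v3 >= 0, v2 + v3 <= 10 (a triangle)."""
    return v2 >= 0 and v3 >= 0 and v2 + v3 <= 10

def sample_and_estimate_volume(n_draws, seed=0):
    """Draw uniformly from the bounding box [0, 10] x [0, 10], keep the points
    that fall inside the polytope, and estimate the polytope's volume from
    the acceptance fraction."""
    rng = random.Random(seed)
    box_volume = 10.0 * 10.0
    accepted = []
    for _ in range(n_draws):
        point = (rng.uniform(0, 10), rng.uniform(0, 10))
        if in_polytope(*point):
            accepted.append(point)
    volume = box_volume * len(accepted) / n_draws
    return accepted, volume

samples, volume = sample_and_estimate_volume(20000)
print(len(samples), volume)
```

The accepted points are uniform over the polytope, and the estimate should land near the triangle's true area of 50. The acceptance fraction drops quickly as the dimension grows, which is why a tight bounding shape matters.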

  • Topological Methods

    Network-based pathways:
      Extreme Pathways (Schilling et al., 1999)
      Elementary Flux Modes (Schuster et al., 1999)
    Decomposing flux distributions into extreme pathways
    Extreme pathways defining phenotypic phase planes
    Uniform random sampling

    Not biased by a statement of an objective

  • Extreme Pathways and Elementary Flux Modes

    Unique set of vectors that spans a solution space
    Each consists of a minimum number of reactions
    Extreme Pathways are systemically independent (convex basis vectors)

  • Extreme Pathways and Elementary Flux Modes

    Inherent redundancy in metabolic networks (Price et al., 2002)
    Robustness to gene deletion and changes in gene expression (Stelling et al., 2002)
    Enzyme subsets (correlated reaction sets) in yeast (Papin et al., 2002)
    Design strains (Carlson et al., 2002)
    Assign functions to genes (Forster et al., 2002)

  • Altering Phenotypic Potential: Gene Knockouts

  • Altering Phenotypic Potential: Gene Knockouts

    Minimization Of Metabolic Adjustment (MOMA) (Segre et al., 2002)
      The flux distribution after a knockout is close to the wild-type's state under the Euclidean norm
    Regulatory On/Off Minimization (ROOM) (Shlomi et al., 2005)
      Minimize the number of Boolean flux changes from the wild-type's state

  • Altering Phenotypic Potential

    Explaining gene dispensability (Papp et al., 2004)
      Only 32% of yeast genes contribute to biomass production in rich media
      Considered one arbitrary optimal growth solution

    OptKnock: identify gene deletions that generate a desired phenotype (Burgard et al., 2003)
    OptStrain: identify strains which can generate desired phenotypes by adding/deleting genes (Pharkya et al., 2004)

  • Modeling Gene Knockouts

    Gene knockout → enzyme knockout → reaction knockout
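A reaction knockout can be sketched by forcing the reaction's flux bounds to zero and re-running FBA. This is a minimal illustration, assuming SciPy's `linprog`; the toy network and the helper `max_growth` are hypothetical, and the gene-to-enzyme-to-reaction mapping is collapsed to reaction indices.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([
    [1, -1, -1,  0],
    [0,  1,  1, -1],
])
wild_type_bounds = [(0, 10), (0, 10), (0, 10), (0, 1000)]

def max_growth(bounds, knocked_out=()):
    """FBA growth rate after forcing the knocked-out reactions to zero flux."""
    bounds = [(0, 0) if i in knocked_out else b for i, b in enumerate(bounds)]
    c = np.array([0, 0, 0, -1])  # maximize v4 by minimizing -v4
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return -res.fun

growth_wt = max_growth(wild_type_bounds)
growth_no_v2 = max_growth(wild_type_bounds, knocked_out={1})      # v3 can compensate
growth_no_uptake = max_growth(wild_type_bounds, knocked_out={0})  # nothing enters the cell
print(growth_wt, growth_no_v2, growth_no_uptake)
```

In this toy case the knockout of `v2` is dispensable because the parallel route carries the flux, while the loss of uptake abolishes growth.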

  • Cellular Adaptation to Genetic and Environmental Perturbations

    Transient changes in expression levels in hundreds of genes (Gasch 2000, Ideker 2001)
    Convergence to an expression steady-state close to the wild-type (Gasch 2000, Daran 2004, Braun 2004)
    Drop in growth rates followed by a gradual increase (Fong 2004)

  • Regulatory On/Off Minimization (ROOM)

    Predicts the metabolic steady-state following the adaptation to the knockout
    Assumes the organism adapts by minimizing the set of regulatory changes
    Boolean regulatory change → Boolean flux change
    Finds a flux distribution with a minimal number of Boolean flux changes

  • ROOM: Implementation

    Solved using Mixed Integer Linear Programming (MILP)
    Boolean variable $y_i$: $y_i = 1$ if and only if flux $v_i$ changes from its wild-type value $w_i$

    $\min \sum_i y_i$ (minimize changes)
    s.t. $v_i - y_i (v_{max,i} - w_i) \le w_i$ (distance constraints)
    $v_i - y_i (v_{min,i} - w_i) \ge w_i$ (distance constraints)
    $Sv = 0$ (mass balance constraints)
    $v_j = 0,\ j \in G$ (knockout constraints)

    MILP is NP-hard
      Relax Boolean constraints: solve using LP
      Relax strict constraint of proximity to wild-type
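The strict form of the ROOM formulation (no tolerance around the wild-type fluxes) can be sketched as a small MILP. This is an illustration under stated assumptions: it uses `scipy.optimize.milp` (available in SciPy 1.9 and later), and the toy network, the wild-type flux vector `w` and the knocked-out reaction are hypothetical.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([[1, -1, -1, 0], [0, 1, 1, -1]], dtype=float)
v_min = np.zeros(4)
v_max = np.array([10.0, 10.0, 10.0, 1000.0])
w = np.array([10.0, 10.0, 0.0, 10.0])   # assumed wild-type flux distribution
knockout = [1]                          # reaction v2 is knocked out

m, n = S.shape
I = np.eye(n)

# Decision vector x = [v_1..v_n, y_1..y_n]; the objective is sum(y).
c = np.concatenate([np.zeros(n), np.ones(n)])

constraints = [
    # Mass balance: S v = 0
    LinearConstraint(np.hstack([S, np.zeros((m, n))]), np.zeros(m), np.zeros(m)),
    # Distance constraint: v_i - y_i (v_max_i - w_i) <= w_i
    LinearConstraint(np.hstack([I, -np.diag(v_max - w)]), np.full(n, -np.inf), w),
    # Distance constraint: v_i - y_i (v_min_i - w_i) >= w_i
    LinearConstraint(np.hstack([I, -np.diag(v_min - w)]), w, np.full(n, np.inf)),
]

lb = np.concatenate([v_min, np.zeros(n)])
ub = np.concatenate([v_max, np.ones(n)])
ub[knockout] = 0.0                      # knockout constraint: v_j = 0

integrality = np.concatenate([np.zeros(n), np.ones(n)])  # y_i restricted to 0/1
res = milp(c, constraints=constraints, integrality=integrality, bounds=Bounds(lb, ub))

v, y = res.x[:n], np.round(res.x[n:])
print("fluxes:", v)
print("changed:", y)
```

In this toy case the minimum is two flux changes: the knocked-out reaction switches off and the parallel route switches on, so the growth flux stays at its wild-type value. Dropping the `integrality` argument gives the LP relaxation mentioned on the slide.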

  • Example Network

  • ROOM's Implicit Growth Rate Maximization

    ROOM implicitly attempts to maintain the maximal possible growth rate of the wild-type organism
    A change in growth requires numerous changes in fluxes

    [Figure labels: metabolites M1, M2, ..., Mn; growth reaction; biomass]

  • Intracellular Flux Measurements

    Intracellular flux measurements in E. coli central carbon metabolism
    Obtained using NMR spectroscopy in 13C labeling experiments

    5 knockouts: pyk, pgi, zwf, gnd, ppc in the Glycolysis and Pentose Phosphate pathways
    Glucose-limited and ammonia-limited media

    FBA wild-type predictions above 90% accuracy

    Emmerling, M. et al. (2002), Hua, Q. et al. (2003), Jiao, Z. et al. (2003), Peng et al. (2004)

  • Knockout Flux Predictions

    ROOM flux predictions are significantly more accurate than MOMA and FBA in 5 out of 9 experiments
    ROOM steady-state growth rate predictions are significantly more accurate than MOMA

  • ROOM vs. MOMA

    ROOM predicts the metabolic steady-state after adaptation
      Provides accurate flux predictions
      Preserves flux linearity
      Finds alternative pathways
      Predicts steady-state growth rates
    MOMA predicts transient metabolic states following the knockout
      Provides more accurate transient growth rates

  • Additional Constraints

    Transcriptional regulatory constraints (Covert et al., 2002)
      Boolean representation of the regulatory network
      Used to predict growth and changes in expression levels, and to simulate courses of batch cultures

    Energy balance analysis (Beard et al., 2002)
      Loops are not feasible according to thermodynamic principles, resulting in a non-convex solution space

  • Additional Constraints: Slow Changes in the Environment

    Timescales of cellular processes are shorter than those of the surrounding environment
    Generate dynamic curves to simulate batch experiments (Varma et al., 1994)

  • Thank you for listening

    Questions

    Optimization methods are used for several purposes,

    A pathway is a metabolic state which satisfies stoichiometric and thermodynamic constraints.

    Extreme pathways (EPs) and elementary flux modes (EMs) are both unique sets of pathways. Both types of pathways are minimal, i.e. there is no other pathway that uses a subset of the reactions. Only extreme pathways are systemically independent: no EP can be described as a combination of the others. The EPs form the basis of the convex space.

    Each metabolic state can be described as a non-negative combination of EPs.
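The decomposition into extreme pathways can be illustrated with a minimal sketch. The toy network, its two extreme pathways and the flux vector are hypothetical, and SciPy's `nnls` (non-negative least squares) is used here only as one convenient way to recover the non-negative weights.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([[1, -1, -1, 0], [0, 1, 1, -1]], dtype=float)

# Its two extreme pathways (one per route from A to B), stored as columns.
P = np.array([
    [1, 1, 0, 1],   # uptake -> v2 -> growth
    [1, 0, 1, 1],   # uptake -> v3 -> growth
], dtype=float).T

v = np.array([10, 4, 6, 10], dtype=float)   # a steady-state flux distribution

# Find weights alpha >= 0 that minimize ||P alpha - v||.
alpha, residual = nnls(P, v)
print(alpha, residual)
```

A residual of (numerically) zero means `v` is exactly a non-negative combination of the two pathways, with `alpha` giving the weight of each route.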

    EPs and EMs are the same when all exchange fluxes are unidirectional.

    Example network.

    There was an ongoing debate on which method is better at describing the phenotypic potential of the network. In a recent (and not so convincing) paper by the main supporters of both methods it was agreed that the main advantage of EPs is that there are far fewer of them in a typical network and that they have a mathematical justification, being the basis of the convex space. EMs are suitable for studying network properties such as redundancy, as they include all minimal pathways.

    Both methods were used in numerous papers to study different network properties:

    The redundancy of the network in producing different nutrients (amino acids) was studied by Price et al. They estimated the redundancy by the number of pathways that involved the synthesis of the nutrients.

    Robustness of the network to gene deletion was studied by Stelling et al. They estimated robustness to gene knockouts as the percentage of EMs that are still feasible after genes are knocked out. The percentage of EMs shows high correlation with measurements of lethality. Furthermore, they have shown that the maximal growth is robust to a reduction in the number of feasible pathways.

    Papin et al. used EPs to find sets of reactions that are always activated together. These sets are called enzyme subsets.

    Predicting the metabolic state of an organism that undergoes a gene knockout in the lab is harder than predicting the metabolic state of a wild-type strain.

    Minimization Of Metabolic Adjustment (MOMA), developed by Segre et al., is an optimization method which assumes that the metabolic state of the knocked-out organism should be close to that of the wild-type strain, as the mutated organism did not evolve to maximize its growth. MOMA uses a Euclidean metric to measure the distance between the metabolic states of the wild-type and knocked-out strains.
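The Euclidean-distance idea behind MOMA can be sketched as a small quadratic program. This is an illustration only: it uses SciPy's general-purpose SLSQP solver rather than a dedicated QP solver, and the toy network, the wild-type flux vector `w` and the knocked-out reaction are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy network (rows: metabolites A, B; columns: reactions).
# v1: -> A (uptake), v2: A -> B, v3: A -> B (alternative route), v4: B -> (growth)
S = np.array([[1, -1, -1, 0], [0, 1, 1, -1]], dtype=float)
bounds = [(0, 10), (0, 10), (0, 10), (0, 1000)]
w = np.array([10.0, 10.0, 0.0, 10.0])   # assumed wild-type flux distribution
knocked_out = 1                         # reaction v2 is knocked out

# Equality constraints: mass balance S v = 0 plus v_j = 0 for the knockout.
knockout_row = np.zeros((1, S.shape[1]))
knockout_row[0, knocked_out] = 1.0
A_eq = np.vstack([S, knockout_row])

res = minimize(
    fun=lambda v: np.sum((v - w) ** 2),   # squared Euclidean distance to w
    x0=np.zeros(S.shape[1]),
    jac=lambda v: 2.0 * (v - w),
    bounds=bounds,
    constraints=[{"type": "eq", "fun": lambda v: A_eq @ v, "jac": lambda v: A_eq}],
    method="SLSQP",
    options={"ftol": 1e-12},
)
v_moma = res.x
print(np.round(v_moma, 3))
```

In this toy case the closest feasible state carries a growth flux of 20/3, lower than the wild-type value of 10 that the network could still reach through the alternative route, which illustrates that MOMA does not assume optimal growth after the knockout.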

    Regulatory On/Off Minimization (ROOM) is based on the same assumption (that the metabolic state of the knocked-out strain should be close to that of the wild-type strain), but measures the distance with a different metric, based on the number of Boolean flux changes from the metabolic state of the wild-type strain.

    Papp et al. used FBA to study metabolic states after the knockouts of dispensable genes (genes with no observable change in growth following their knockout). They found that only 32% of the genes carry a non-zero flux that contributes to biomass production in rich media. In our own work we found that, if alternative non-optimal FBA solutions are considered, 90% of the genes may contribute in rich media. Therefore, their functional role can be found in rich media without the need to look at other media.

    OptKnock is an optimization method developed by Burgard et al. which identifies sets of genes whose knockout may cause the overproduction of desired nutrients (amino acids). This method uses FBA or MOMA to predict the metabolic state of the knocked-out strain.

    OptStrain is a newer method which identifies a specific strain, along with a set of genes that can be added to or removed from its genome, for the same biotechnological purposes.

    There are various experiments showing that following a gene knockout or some kind of environmental perturbation there are large-scale changes in gene expression levels. For example Ideker..

    However, these experiments and others have shown that following an adaptation period the cell converges to a steady-state which is close to that of the wild-type strain. For example, the figure on the left (taken from Gasch et al.) shows the average expression ratio following a perturbation in the form of a stressful environment: sharp changes in expression levels are followed by a steady-state that is close to the wild-type.

    A recent experiment by Fong et al. has shown that the growth rate of the organism drops immediately following the gene knockout, then gradually increases and reaches a steady-state growth rate which is close to or even higher than that of the wild-type.

    Based on these findings, we've developed the algorithm Regulatory On/Off Minimization to predict the metabolic steady-state following the adaptation of the knocked-out strain to the knockout.

    ROOM is based on the assumption that an organism adapting to a gene knockout minimizes the set of regulatory changes that it makes. Assigning an equal cost to each regulatory change, ROOM tries to minimize the total cost of adapting to the knockout.

    Now, since regulatory constraints are not explicitly incorporated into the metabolic model, ROOM identifies Boolean regulatory changes implicitly by identifying Boolean changes in flux. Therefore ROOM aims to find a feasible flux distribution with a minimal number of Boolean flux changes from the flux distribution of the wild-type strain.

    ROOM is implemented using Mixed Integer Linear Programming (MILP), an optimization framework similar to LP that allows variables to be restricted to integer values.

    We define Boolean (0/1 integer) variables $y_i$ that specify whether the $i$-th flux changes from the wild-type's flux distribution. In order to minimize the number of flux changes from the wild-type's flux distribution, ROOM is formulated to minimize the sum of the $y_i$'s.

    To constrain the $i$-th flux to its wild-type value if and only if $y_i$ equals zero, we use the two constraints labeled distance constraints. When $y_i$ equals zero, the distance constraints fix $v_i$ to its wild-type value $w_i$, and when $y_i$ equals one, the distance constraints do not impose new constraints on $v_i$.

    MILP is NP-hard, meaning that the worst-case running time of known solvers grows exponentially with the size of the problem. Since the size of the problem is determined by the number of constraints, which is in the order of hundreds, solving it exactly can be computationally intractable.

    In our analysis we have used two relaxation methods that provide a reasonable running time:

    1. One uses Linear Programming by relaxing the Boolean constraints and allowing the Boolean variables to take any value between zero and one.

    2. The other relaxes the proximity to the wild-type flux distribution, looking for a flux distribution that minimizes only the number of significant flux changes.

    Wild-type's solution.

    MOMA's solution diverges flux.

    ROOM's solution finds a short alternative pathway. ROOM preserves linearity of flow.

    ROOM's solution is with maximal growth.

    To understand this, let's go back to the pseudo growth reaction that represents the growth rate of the organism. The growth reaction drains out from the cell the metabolites that are required for growth. The number of such metabolites is in the order of tens. Any change in growth rate, represented as a change in flux through this reaction, requires many more flux changes to preserve the mass balance constraints. So ROOM, by using a metric that minimizes the number of flux changes, implicitly prefers solutions that do not reduce the growth rate.

    Therefore, although the organism did not evolve to maximize its growth under all possible knockout configurations, the evolved regulatory mechanisms that cope with the knockout by minimizing the number of regulatory changes may work to that effect.

    To compare the flux predictions obtained by ROOM, MOMA and FBA on a real metabolic network, we searched the literature for experimental flux measurements in E. coli's central carbon metabolism.

    We've found experimental flux measurements in 4 different knocked-out strains in the Glycolysis and Pentose-Phosphate pathways, as shown in the figure. The flux measurements were done on different glucose-limited and ammonia-limited media. All measurements were obtained in experiments using NMR spectroscopy with isotope-labeled carbon sources.

    For each experiment, we start by applying FBA to predict the flux distribution of the wild-type strain. The accuracy of the predictions was above 90% for all experiments. We note that the accuracy is calculated as the Pearson correlation between the predicted and measured fluxes.

    Comparing the flux predictions obtained by ROOM with MOMA and FBA for the knocked-out strains, we find that in 4 out of 8 experiments ROOM flux predictions were significantly more accurate than MOMA and FBA. This is shown in the left graph, which shows the accuracy of the predictions obtained by ROOM (red), MOMA (green) and FBA (blue). Only in 1 out of the 8 experiments is MOMA's prediction significantly more accurate than ROOM's (we will discuss this later on).
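The accuracy measure mentioned in these notes, the Pearson correlation between predicted and measured fluxes, can be sketched in a few lines. The flux values below are hypothetical and are not data from the experiments discussed in the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flux values, for illustration only.
measured = [10.0, 4.0, 6.0, 10.0]
predicted = [10.0, 5.0, 5.0, 10.0]

accuracy = pearson(predicted, measured)
print(round(accuracy, 3))
```

A value close to 1 means the predicted fluxes rise and fall with the measured ones; note that the correlation is insensitive to a uniform scaling of all predicted fluxes.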

    Furthermore, we've found that the growth rate predictions obtained by ROOM and FBA are significantly more accurate than MOMA's in 4 out of the 8 experiments. The graph on the right shows the error in the growth rate prediction as obtained by the different algorithms. We see that ROOM's and FBA's errors are less than 15% in all cases, while MOMA's error (in green) reaches 50% and 90%.

    Both ROOM and MOMA predict a flux distribution of the knocked-out organism starting from a specific flux distribution of the wild-type strain. We note that starting from alternative FBA solutions for the wild-type gives almost the same results that we present here.