modeling biological networks lecture3

Upload: genomius

Post on 10-Apr-2018


  • 8/8/2019 Modeling Biological Networks Lecture3

    1/69

    Dr. Carlo Cosentino, Carnegie Mellon University, Pittsburgh, 2008

    Modeling Biological Networks

    Dr. Carlo Cosentino
    School of Computer and Biomedical Engineering

    Department of Experimental and Clinical Medicine
    Università degli Studi Magna Graecia

    Catanzaro, [email protected]
    http://bioingegneria.unicz.it/~cosentino

    Outline

    Classification of biological networks

    Modeling metabolic networks

    Modeling gene regulatory networks

    Inferring gene regulatory networks

    Types of Biological Network

    Several different kinds of biological network can be distinguished at the molecular level

    Gene regulatory

    Metabolic

    Signal transduction

    Protein-protein interaction

    Moreover, other networks can be considered as we move to different description levels, e.g.

    Immunological

    Ecological

    Here we will focus exclusively on molecular processes that take place within the cell

    Goals

    A major challenge consists in identifying with reasonable accuracy the complex macromolecular interactions at the gene, metabolite and protein levels

    Once identified, the network model can be used to

    simulate the process it represents

    predict the features of its dynamical behavior

    extrapolate cellular phenotypes

    Graphs

    A very useful formal tool for describing and visualizing biological networks is the graph

    A graph, or undirected graph, is an ordered pair G=(V,E), where V is the set of vertices, or nodes, and E is a set of unordered pairs of distinct vertices, called edges or lines

    For each edge {u,v}, the nodes u and v are said to be adjacent

    We have a directed graph, or digraph, if E is a set of ordered pairs

    In digraphs, the in-degree, k_in (out-degree, k_out), of a node is the number of edges incident to (from) that node

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
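The in-degree and out-degree definitions above can be sketched in a few lines of Python. The edge list is a hypothetical toy digraph, not taken from the lecture, and the helper name `degrees` is an assumption made for illustration.

```python
from collections import defaultdict

def degrees(edges):
    """Return (k_in, k_out) for a digraph given as (source, target) pairs."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    for source, target in edges:
        k_out[source] += 1   # one more edge leaving `source`
        k_in[target] += 1    # one more edge entering `target`
    return dict(k_in), dict(k_out)

# Hypothetical toy digraph: A -> B, A -> C, B -> C
edges = [("A", "B"), ("A", "C"), ("B", "C")]
k_in, k_out = degrees(edges)
print("in-degree:", k_in)
print("out-degree:", k_out)
```

Nodes with no incoming (or outgoing) edges simply do not appear in the corresponding dictionary, which keeps the sketch short.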

    Topological Characteristics

    The degree distribution, P(k), gives the probability that a selected node has exactly k links

    It allows us to distinguish between different classes of networks (see next slide)

    The clustering coefficient of a node I, C_I, measures the aggregation of its adjacent nodes (number of triangles passing through node I)

    C(k) is the average clustering coefficient of all nodes with k links

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
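As a rough illustration of these two quantities, the sketch below computes P(k) and the clustering coefficient on a small undirected graph. It assumes the usual normalization C_I = 2 n_I / (k_I (k_I - 1)), where n_I is the number of links among the k_I neighbours of node I; the slide only gives the verbal definition, so this formula and the toy adjacency dictionary are assumptions.

```python
from collections import Counter
from itertools import combinations

def clustering(adj, node):
    """Clustering coefficient: fraction of neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no pair of neighbours, so no triangle is possible
    links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def degree_distribution(adj):
    """P(k): fraction of nodes having exactly k links."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in counts.items()}

# Hypothetical undirected graph: a triangle a-b-c plus a pendant node d
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering(adj, "a"), degree_distribution(adj))
```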

    Erdős-Rényi Random Networks

    The Erdős-Rényi model of a random network starts with N nodes and connects each pair of nodes with probability p

    The degree follows a Poisson distribution, thus many nodes have the same number of links (close to the average degree)

    The tail decreases exponentially, which indicates that nodes with k very different from the average are rare

    The clustering coefficient is independent of a node's degree

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
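A minimal sketch of the construction just described, using only the Python standard library. The function name, the values of N and p, and the seed are illustrative assumptions; the point is that the observed mean degree lands close to (N-1)p.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """Connect each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

adj = erdos_renyi(n=500, p=0.02)
degs = [len(nbrs) for nbrs in adj.values()]
mean_k = sum(degs) / len(degs)
print("mean degree:", mean_k, "expected:", (500 - 1) * 0.02)
```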

    Scale-Free Networks

    Scale-free networks are characterized by a power-law degree distribution

    The probability that a node has k links follows $P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent

    The probability that a node is highly connected is statistically more significant than in a random graph

    In the Barabási-Albert model, at each time point a node with M links is added to the network, which connects to an already existing node i with probability $\Pi_i = k_i / \sum_j k_j$

    The underlying mechanism is that nodes with many links have a higher probability of getting more (this is also referred to as preferential attachment)

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
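Preferential attachment can be sketched as follows. This is a simplified illustration, not the authors' implementation: the starting core of m+1 fully connected nodes and the "stub list" sampling trick are assumptions chosen to keep the code short. Each node appears in the stub list once per link, so a uniform draw from the list selects node i with probability k_i divided by the sum of all degrees.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a network to n nodes; each new node attaches to m distinct existing nodes."""
    rng = random.Random(seed)
    # Assumed starting core: m + 1 fully connected nodes
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # One entry per link end, so sampling is proportional to degree
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend([t, new])
    return adj

adj = barabasi_albert(n=200, m=2)
degs = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
print("largest degrees:", degs[:5])
```

Printing the largest degrees is usually enough to see a few hubs emerge, in contrast with the narrow degree range of a random graph of the same size.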

    Hierarchical Networks

    A hierarchical structure arises in systems that combine modularity and scale-free topology

    The hierarchical model is based on the replication of a small cluster of four nodes (the central ones)

    The external nodes of the replicas are linked to the central node of the original cluster

    The resulting network has a power-law degree distribution, thus it is scale-free

    The average clustering coefficient scales with the degree following $C(k) \sim k^{-1}$

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004

    Graphs of Biological Networks

    Depending on the kind of biological network, the edges and nodes of the graph have different meanings

    Metabolic network

    nodes: metabolic products, edge: a reaction transforming A into B

    Transcriptional regulation network (protein-DNA)

    nodes: genes and proteins, edge: a TF regulates a gene

    Protein-protein network

    nodes: proteins, edge: interaction between proteins

    Gene regulatory networks (functional association network)

    nodes: genes, edge: expressions of A and B are correlated

    Topology of Biological Networks

    An extensive commentary was published by Albert in 2005, reviewing the literature on the topology of different kinds of biological networks

    Experimental evidence is reviewed for metabolic, transcriptional regulatory, signal transduction and functional association networks

    All of the considered networks approximately exhibit a power-law degree distribution, at least for the in-degree or for the out-degree

    For instance, transcriptional regulation networks exhibit a scale-free out-degree distribution, signifying the potential of transcription factors to regulate multiple targets

    On the other hand, their in-degree is a more restricted exponential function, suggesting that combinatorial regulation by several TFs is less frequent

    Albert, Scale-free networks in cell biology, Journal of Cell Science 118(21), 4947-4957, 2005

    PP Interaction Network in Yeast

    This network is based on yeast two-hybrid experiments

    A few highly connected nodes (hubs) hold the network together

    The color of a node indicates the phenotypic effect deriving from removing the corresponding protein

    red: lethal

    green: non-lethal

    orange: slow growth

    yellow: unknown

    Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004

    Outline

    Classification of biological networks

    Modeling metabolic networks

    Modeling gene regulatory networks

    Inferring gene regulatory networks

    Metabolic Reactions

    Living cells require energy and material for

    building up membranes

    storing molecules

    replenishing enzymes

    replication and repair of DNA

    movement

    Metabolic reactions can be divided into two categories

    Catabolic reactions: breakdown of complex compounds to get energy and building blocks

    Anabolic reactions: assembling of the compounds used by the cellular mechanisms

    Basic Concepts of Metabolism

    Historically, metabolism is the part of cell functioning that has been studied most thoroughly over the last decades

    This implies that several well-assessed mathematical tools exist for describing this kind of network

    Enzyme kinetics investigates the dynamic properties of the individual reactions in isolation

    Stoichiometric analysis deals with the balance of compound production and degradation at the network level

    Metabolic control analysis describes the effect of perturbations in the network, in terms of changes of metabolite concentrations

    Most of the tools used in the quantitative study of metabolic networks can also be applied to other types of networks

    Glycolysis

    We will exploit the case study of glycolysis in yeast in order to illustrate the theoretical concepts introduced hereafter

    The pathway shown below is part of the glycolysis process

    Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94, 121-163, 2001

    List of Reactions

    v1: hexokinase

    v2: consumption of glucose-6-phosphate in other pathways

    v3: phosphoglucoisomerase

    v4: phosphofructokinase

    v5: aldolase

    v6: ATP production in lower glycolysis

    v7: ATP consumption in other pathways

    v8: adenylate kinase

    ODE Model of Glycolysis

    The system of ODEs describing the pathway is

    ODE Model with Constant Glucose

    The kinetic rates as functions of reactants can be derived by applying the models presented in the previous lecture

    Model Parameters

    Stoichiometric Analysis

    The basic elements considered in stoichiometric analysis of metabolic networks are

    The concentrations of the various species

    The reactions or transport processes affecting such concentrations

    The stoichiometric coefficients denote the proportion of substrate and product molecules involved in a reaction

    For instance, if we consider the reaction

    $S_1 + S_2 \rightarrow 2P$

    the stoichiometric coefficients of $S_1$, $S_2$, $P$ are -1, -1, 2 respectively

    Stoichiometric Analysis

    The change of concentrations in time can be described by means of ODEs

    For the simple reaction above we have

    $\frac{dS_1}{dt} = -v, \quad \frac{dS_2}{dt} = -v, \quad \frac{dP}{dt} = 2v$

    This means that the degradation of $S_1$ with rate v is accompanied by the degradation of $S_2$ with the same rate and by the production of P with a double rate

    Stoichiometric Matrix

    In general, for a system of m substances and r reactions, the system dynamics are described by

    $\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j, \quad i = 1, \dots, m$

    The number $n_{ij}$ is the stoichiometric coefficient of the i-th metabolite in the j-th reaction

    For the sake of simplicity, we assume that the changes of concentrations are only due to reactions (i.e. we neglect the effect of convection or diffusion)

    We can then define the stoichiometric matrix

    $N = \{n_{ij}\}, \quad i = 1, \dots, m, \quad j = 1, \dots, r$

    in which columns correspond to reactions and rows to concentration variations
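To make the row/column convention concrete, here is a small sketch for a single reaction in which S1 and S2 are consumed at rate v and P is produced at twice that rate. The mass-action rate law v = k S1 S2, the rate constant, and the initial concentrations are assumptions introduced only for the example; the explicit Euler integrator is likewise just the simplest choice.

```python
# Stoichiometric matrix: rows are species (S1, S2, P), the single column is the reaction
N = [[-1], [-1], [2]]

def rates(conc, k=1.0):
    """Assumed mass-action rate law v = k * S1 * S2 (not given in the lecture)."""
    s1, s2, _p = conc
    return [k * s1 * s2]

def derivative(conc):
    """dS_i/dt = sum_j n_ij * v_j."""
    v = rates(conc)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]

def euler(conc, dt=0.01, steps=500):
    """Explicit Euler integration of the stoichiometric ODE system."""
    for _ in range(steps):
        d = derivative(conc)
        conc = [c + dt * dc for c, dc in zip(conc, d)]
    return conc

start = [1.0, 0.5, 0.0]
final = euler(start)
print(derivative(start), final)
```

Because P gains two molecules for every S1 lost, the quantity 2*S1 + P stays constant along the trajectory, which is a quick sanity check on the signs in N.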

    Stoichiometric Model

    The mathematical description of the metabolic network can be given in matrix form as

    $\frac{dS}{dt} = N v$

    where

    $S = (S_1, \dots, S_m)^T$ is the vector of concentration values

    $v = (v_1, \dots, v_r)^T$ is the vector of reaction rates

    If the system is at steady state (that is, $dS_i/dt = 0$ for $i = 1, \dots, m$) we can also define the vector of steady-state fluxes, $J = (J_1, \dots, J_r)^T$

    Finally, the model involves a certain number of parameters, thus we can also define a parameter vector, $p = (p_1, p_2, \dots)^T$

    Stoichiometric Model of Glycolysis

    For the glycolysis model we have

    Analysis of the Stoichiometric Matrix

    A relevant piece of information that can be readily derived from the matrix N is which combinations of individual fluxes are possible at steady state

    The system of algebraic equations $N v = 0$ admits a nontrivial solution only if rank(N) < r
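The admissible steady-state flux combinations are the vectors in the null space of N. The sketch below computes a basis with exact rational Gauss-Jordan elimination from the standard library. The four-reaction network is hypothetical (it is not the glycolysis model), and the function name `null_space` is an assumption.

```python
from fractions import Fraction

def null_space(matrix):
    """Basis of {v : N v = 0}, via Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis

# Hypothetical network: v1: -> A, v2: A -> B, v3: B ->, v4: A ->
N = [
    [1, -1, 0, -1],   # A
    [0, 1, -1, 0],    # B
]
K = null_space(N)
print([[int(x) for x in vec] for vec in K])
```

Here rank(N) = 2 and r = 4, so the basis has two vectors. In both of them the entries for v2 and v3 coincide, reflecting that A -> B -> forms an unbranched stretch.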

    An Example

    Let us consider the simple network

    The stoichiometric matrix is N=(1 1 1)

    and the steady-state fluxes are described by the linear combination

    Null Rates at Steady State

    For the glycolysis model we have r=8 and rank(N)=5, thus a basis of the null space of N is composed of three vectors

    Note that the entries in the last row are all zero; this means that the net rate for that reaction is null at steady state

    Hence, at steady state we can neglect the reaction v8

    Unbranched Pathways

    Another property that can be readily derived is the presence of unbranched pathways

    In this case, the net rate of all the reactions in the pathway must be equal

    The entries for the second and third reaction in the matrix K (whose columns form a basis of the null space of N) are always equal

    This implies that the fluxes through reactions 2 and 3 must be equal at steady state

    Elementary Flux Modes

    A pathway can be defined as a set of metabolic reactions linked by common metabolites

    It is not straightforward to recognize pathways in metabolic maps that have been reconstructed from experimental evidence

    This problem is formalized in the concept of finding the Elementary Flux Modes (EFMs)

    The aim is to find the admissible direct routes for producing a certain metabolite starting from another one

    To get an idea of the usefulness of such mathematical methods, we can have a glimpse at a typical whole-organism-scale metabolic network

    Metabolic Network in Yeast

    Palsson, Systems Biology: Properties of Reconstructed Networks, 2006

    Elementary Flux Modes

    Without going into the mathematical details, we can gain further insight by looking at the elementary flux modes of two simple networks

    A factor that greatly influences the EFMs is the reversibility of the single reactions

    Applications of EFM Analysis

    EFMs can be used to

    infer the range of metabolic pathways in the network

    test a set of enzymes for production of a desired compound, and to find the most convenient pathway

    reconstruct metabolism from annotated genome sequences and analyze the effects of enzyme deficiency

    reduce drug effects and identify drug targets

    Flux Balance Analysis

    Flux Balance Analysis (FBA) deals with the problem of finding the operative modes of metabolic networks subject to three kinds of constraints

    1) The operative mode is assumed to be at steady state

    2) The operative mode must respect the (ir)reversibility of the reactions

    3) The enzyme catalytic activity in each reaction is limited to an admissible range, i.e. $\alpha_i \le v_i \le \beta_i$

    Additional constraints may be imposed by biomass composition or other external conditions

    Flux Balance Constrained Optimization

    Such constraints confine the steady-state fluxes to a feasible set, but usually do not yield a unique solution

    Hence, the determination of a particular metabolic flux distribution can be cast as a linear optimization problem

    Maximize an objective function

    subject to the constraints given above
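Written out, the optimization problem might take the following form. The objective on the original slide did not survive extraction, so the linear objective with weight vector c is an assumption (a common choice is to weight the fluxes contributing to biomass), and the bound symbols are placeholder names for the lower and upper limits of each flux. Irreversibility of reaction i corresponds to setting its lower bound to zero.

```latex
\begin{aligned}
\max_{v} \quad & Z = c^{T} v \\
\text{subject to} \quad & N v = 0 \\
& \alpha_i \le v_i \le \beta_i, \qquad i = 1, \dots, r
\end{aligned}
```

Since both the objective and the constraints are linear in v, any standard linear programming solver can be used once N, c and the bounds are specified.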

    Conservation Relations

    If a substance is neither added to nor removed from the reaction system, its total concentration remains constant

    This property can be derived by analyzing the null space of $N^T$, defined by the matrix G such that

    $G N = 0$

    The latter implies

    $G \dot{S} = G N v = 0 \;\Rightarrow\; G S = \text{const}$

    The dimension of the null space is m-rank(N)

    Conservation in Glycolysis

    For the glycolysis example we have

    which means the sum of concentrations of AMP, ADP, ATP remains constant

    The conservation relations can be used to simplify the dynamical model: the algebraic equations that express the conservation constraints allow some variables to be written as functions of the others
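The adenylate conservation can be checked numerically on a reduced toy system. The sketch keeps only two reactions, ATP -> ADP and 2 ADP <-> ATP + AMP, with assumed mass-action kinetics and made-up rate constants; it is not the glycolysis model of the lecture. It verifies that G = (1, 1, 1) satisfies G N = 0 and that the total adenylate pool stays constant during an Euler simulation.

```python
# Rows: ATP, ADP, AMP. Columns: ATP consumption (ATP -> ADP), adenylate kinase (2 ADP -> ATP + AMP)
N = [
    [-1, 1],
    [1, -2],
    [0, 1],
]
G = [1, 1, 1]  # candidate conservation relation: ATP + ADP + AMP

def left_product(g, n):
    """Row vector g times matrix n."""
    return [sum(g[i] * n[i][j] for i in range(len(n))) for j in range(len(n[0]))]

def simulate(conc, dt=0.001, steps=2000, k7=1.0, k8f=2.0, k8r=1.0):
    """Euler integration with assumed mass-action rates (constants are illustrative)."""
    for _ in range(steps):
        atp, adp, amp = conc
        v = [k7 * atp, k8f * adp * adp - k8r * atp * amp]
        conc = [c + dt * sum(N[i][j] * v[j] for j in range(2)) for i, c in enumerate(conc)]
    return conc

start = [2.0, 1.0, 0.5]
final = simulate(start)
print(left_product(G, N), sum(start), sum(final))
```

With the pool known, one of the three concentrations can be eliminated, for instance AMP = total - ATP - ADP, reducing the model by one differential equation.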

    Metabolic Control Analysis

    Metabolic Control Analysis (MCA) deals with the sensitivity of the steady-state properties of the network to small parameter changes

    It can also be applied to models of other kinds of network, like signaling pathways or gene expression

    Issues addressed by MCA

    Predict properties of the network from knowledge of individual components

    Find which specific step has the greatest influence on a flux or steady-state concentration or reaction rate

    Find which is the best target reaction to treat a metabolic disorder

    These questions are very relevant in biotechnological production processes and health care

    Basic Concepts of MCA

    The relations between steady-state properties and model parameters are usually highly nonlinear

    There is no general theory predicting the effect of large parameter changes

    The MCA approach deals with small parameter changes

    Under this assumption, the model can be approximated, in the neighborhood of the steady state, with a linear one

    Given the linearized model it is possible to derive some indexes describing the properties mentioned above, e.g. elasticity coefficients, control coefficients, response coefficients
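As a hedged illustration of a control coefficient, the sketch below takes a hypothetical two-step pathway S0 -> X -> P with rates v1 = k1 (S0 - X/q) and v2 = k2 X, and estimates the scaled sensitivity of the steady-state flux J to each rate constant by a small relative perturbation. The pathway, its parameters, and the finite-difference definition used here are assumptions for the example, not material from the lecture.

```python
def steady_state_flux(k1, k2, s0=1.0, q=2.0):
    """Flux at the steady state of v1 = k1*(s0 - x/q), v2 = k2*x (solved in closed form)."""
    x = k1 * s0 / (k2 + k1 / q)
    return k2 * x

def control_coefficient(param_index, params, rel_step=1e-6):
    """Scaled sensitivity (dJ/J) / (dk/k), estimated with a small relative perturbation."""
    base = steady_state_flux(*params)
    bumped = list(params)
    bumped[param_index] *= 1.0 + rel_step
    return (steady_state_flux(*bumped) - base) / base / rel_step

params = [1.0, 3.0]  # hypothetical k1, k2
c1 = control_coefficient(0, params)
c2 = control_coefficient(1, params)
print(c1, c2, c1 + c2)
```

In this example the first step holds most of the control over the flux, and the two coefficients add up to approximately one, which is the behaviour expected from the summation property of flux control coefficients.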

    Outline

    Classification of biological networks

    Modeling metabolic networks

    Modeling gene regulatory networks

    Inferring gene regulatory networks

    Gene Regulatory Networks

    A protein synthesized from a gene can serve as a transcription factor for another gene, as an enzyme catalyzing a metabolic reaction, or as a component of a signal transduction pathway

    Apart from DNA transcription regulation, gene expression may be controlled during RNA processing and transport, RNA translation, and the post-translational modification of proteins

    Therefore, gene regulatory networks (GRNs) involve interactions between DNA, RNA, proteins and other molecules

    A suitable way to manage this complexity may consist of using functional association networks

    In these networks the edges of the corresponding graph do not represent chemical interactions, but functional influences of one gene on the other

    Example of a GRN

    A toy regulatory network of three genes is depicted in the cartoon below

    De Jong, Modeling and simulation of genetic regulatory systems: a literature review, INRIA RR-4032, 2000

    Modeling GRNs

    In what follows we will present an overview of the models used to describe GRNs

    Two main issues have to be taken into account when choosing a modeling framework

    Computational requirements for simulation

    Available methods for inferring the network topology

    Bayesian Networks

    In the formalism of Bayesian Networks, the structure of a genetic regulatory system is modeled by a directed acyclic graph $G = \langle V, E \rangle$

    The vertices $i \in V$, $i = 1, \dots, n$, represent gene expression levels and correspond to random variables $X_i$

    For each $X_i$, a conditional distribution $p(X_i \mid \mathrm{parents}(X_i))$ is defined, where $\mathrm{parents}(X_i)$ denotes the direct regulators of i

    The graph G and the set of conditional distributions uniquely specify a joint probability distribution $p(X)$

    Independency in BN

    If $X_i$ is independent of Y given Z, where Y and Z are sets of variables, we can state a conditional independency

    $i(X_i; Y \mid Z)$

    For every node i in G,

    $i(X_i; \text{non-descendants}(X_i) \mid \mathrm{parents}(X_i))$

    Hence, the joint probability distribution can be decomposed into

    $p(X) = \prod_{i=1}^{n} p(X_i \mid \mathrm{parents}(X_i))$
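The factorization can be illustrated on a tiny network with binary variables. The three-gene structure (X1 regulating X2 and X3) and all probability values below are made up for the example; only the product-of-conditionals formula comes from the text.

```python
from itertools import product

# Hypothetical structure: X1 regulates X2 and X3
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",)}

# cpt[node][parent values] = P(node = 1 | parents); the numbers are illustrative
cpt = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.1, (1,): 0.8},
    "X3": {(0,): 0.5, (1,): 0.2},
}

def joint(assignment):
    """p(X) as the product over nodes of p(X_i | parents(X_i))."""
    prob = 1.0
    for node, pa in parents.items():
        p_on = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_on if assignment[node] == 1 else 1.0 - p_on
    return prob

states = [dict(zip(parents, values)) for values in product((0, 1), repeat=len(parents))]
total = sum(joint(s) for s in states)
print(joint({"X1": 1, "X2": 1, "X3": 0}), total)
```

Only 1 + 2 + 2 = 5 numbers are needed to specify the joint distribution here, instead of the 7 required for an unconstrained distribution over three binary variables.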

    Example of BN

    Here we illustrate the formulation of the BN model for a simple network

    Two graphs are said to be equivalent if they imply the same set of independencies; they cannot be distinguished by observation on X

    De Jong, Modeling and simulation of genetic regulatory systems: a literature review, INRIA RR-4032, 2000

    Features of BNs

    There is no need to specify a single value for each parameter of the model, but rather a distribution over the admissible range of values is assigned

    This characteristic helps in avoiding overfitting, which is common in the presence of a small data set and a large number of parameters

    It is a statistical modeling approach, which nicely fits the stochastic nature of biological systems

    BNs are static models, although it is possible to take into account dynamical aspects through an extension of this theory, namely dynamic Bayesian networks (DBNs)

    Boolean Networks

    In the framework of Boolean Networks, the expression level of a gene can attain only two values, that is active (on, 1) or inactive (off, 0)

    Accordingly, the interactions between elements of the network are represented by Boolean functions

    Smolen, Baxter, Byrne, Mathematical modeling of gene networks, Neuron 26, 567-580, 2000
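A minimal sketch of a Boolean network simulation follows. The three update rules are hypothetical (they are not the network from the cited figure), and a synchronous update scheme, in which all genes switch at the same time, is assumed.

```python
def step(state):
    """Synchronous update with hypothetical rules: A' = NOT C, B' = A, C' = A AND B."""
    a, b, c = state
    return (int(not c), a, int(a and b))

def trajectory(state, max_steps=20):
    """Iterate until a state repeats; return the visited states and the repeated one."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state)
    return seen, state

seen, repeated = trajectory((1, 0, 0))
for s in seen:
    print(s)
print("returns to:", repeated)
```

With only 2^n possible states, every trajectory must eventually fall into a fixed point or a cycle; for these rules the chosen initial state lies on a cycle.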

    Features of Boolean Networks

    Deterministic description

    Very easy to build the model and to simulate it, even for very large networks

    They provide only a coarse-grained description of the network behavior, thus they are not useful for a more detailed analysis of the regulatory mechanisms

    ODE Models

    We have seen that the mechanistic ODE approach has been widely exploited since the beginning of the last century for modeling biochemical reactions

    When the order of the system increases, classical nonlinear ODE models become hardly tractable, in terms of parametric analysis, numerical simulation and especially identification

    In order to overcome these limitations, alternative modeling approaches have been devised for application to biological networks

    Power-Law Models

    The basic concept underlying power-law models is the approximation of classical ODE models by means of a uniform mathematical structure

    S-Systems

    S-systems are a particular class of power-law models in which fluxes are aggregated

    $\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j(t)^{g_{i,j}} - \beta_i \prod_{j=1}^{n} X_j(t)^{h_{i,j}}$
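The S-system structure, one aggregated production term minus one aggregated degradation term per variable, each a product of power laws, can be simulated with a few lines of code. The rate constants (called alpha and beta here), the kinetic orders g and h, and the two-variable example are all assumptions chosen so that the steady state is easy to check by hand.

```python
import math

def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    dx = []
    for i in range(len(x)):
        production = alpha[i] * math.prod(xj ** g[i][j] for j, xj in enumerate(x))
        degradation = beta[i] * math.prod(xj ** h[i][j] for j, xj in enumerate(x))
        dx.append(production - degradation)
    return dx

# Hypothetical two-variable system: X2 inhibits the production of X1, X1 drives X2
alpha, beta = [2.0, 1.0], [1.0, 1.0]
g = [[0.0, -1.0], [1.0, 0.0]]   # dX1/dt = 2/X2 - X1
h = [[1.0, 0.0], [0.0, 1.0]]    # dX2/dt = X1 - X2

x = [1.0, 1.0]
dt = 0.01
for _ in range(3000):            # explicit Euler up to t = 30
    dx = s_system_rhs(x, alpha, beta, g, h)
    x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
print(x)
```

For this parameter choice both variables settle at the square root of 2, where 2/X2 = X1 and X1 = X2 hold together.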

    Features of S-Systems

    S-systems feature low computational requirements

    Their structural homogeneity makes it easy to identify the model parameters from steady-state data by means of logarithmic linearization

    Generalized aggregation may introduce a loss of accuracy

    Violation of biochemical fluxes concentration

    It may conceal important structural features of the network
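The logarithmic linearization mentioned above can be made explicit. Assume the S-system is written with one production term, with rate constant $\alpha_i$ and kinetic orders $g_{ij}$, and one degradation term, with rate constant $\beta_i$ and kinetic orders $h_{ij}$ (symbol names are assumptions where the original notation was lost). At steady state the two terms balance, and taking logarithms gives equations that are linear in $y_j = \ln X_j$:

```latex
\begin{aligned}
\alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} &= \beta_i \prod_{j=1}^{n} X_j^{h_{ij}} \\
\ln \alpha_i + \sum_{j=1}^{n} g_{ij}\, y_j &= \ln \beta_i + \sum_{j=1}^{n} h_{ij}\, y_j \\
\sum_{j=1}^{n} \left( g_{ij} - h_{ij} \right) y_j &= \ln \frac{\beta_i}{\alpha_i},
\qquad i = 1, \dots, n
\end{aligned}
```

Each measured steady state therefore contributes one linear equation per variable, so the unknowns can be estimated with ordinary linear regression rather than nonlinear fitting.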

    Piecewise-Linear Models

    Another class of approximate models based on ODEs is that of piecewise-linear (PWL) models

    The basic idea is to approximate sigmoidal curves through step functions

    The model takes the general form

    where

    and the functions $b_{il}(\cdot)$ are Boolean-valued regulation functions expressed in terms of step functions

    Casey, De Jong, Gouzé, J. Math. Biol. 52, 27-56, 2006

    Features of PWL Models

    Numerical simulation studies have shown that PWL models properly approximate the behavior of the corresponding original nonlinear ones

    A drawback of this class of systems is that their behavior is very difficult to analyze from a rigorous point of view

    PWL models, indeed, can exhibit singular steady states, that is, equilibrium points lying on the threshold surfaces

    Moreover, it is known that the stability of switching systems cannot be reduced to the analysis of the stability of the linear systems in each sub-space

    Outline

    Classification of biological networks

    Modeling metabolic networks

    Modeling gene regulatory networks

    Inferring gene regulatory networks

    Inferring Bayesian Networks

    In order to reverse engineer a Bayesian network model of a gene network, we must find the directed acyclic graph that best describes the data

    To do this, a scoring function is chosen, in order to evaluate the candidate graphs G with respect to the data set D

    The score can be defined using Bayes' rule

    $P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$

    If the topology of the network is partially known, the a priori knowledge can be included in P(G)

    The most popular scores are the Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalence (BDe)

    They incorporate a penalty for complexity to cope with overfitting

    Inferring Bayesian Networks

    The evaluation of all possible networks involves checking all possible combinations of interactions among the nodes

    This problem is NP-hard, therefore heuristic methods are used, like the greedy hill-climbing approach, the Markov Chain Monte Carlo method, or Simulated Annealing

    A software tool for inferring both BNs and DBNs is Banjo, developed by the group of Hartemink (http://www.cs.duke.edu/~amink/software/banjo)

    Yu et al, Advances to Bayesian network inference for generating causal networks from observational biological data, Bioinformatics 20: 3594-3603, 2004

    Information-Theoretic Approaches

    Information-theoretic approaches use a generalization of the Pearson correlation coefficient used in hierarchical clustering, namely the Mutual Information (MI), which is computed as

    $MI(X;Y) = H(X) + H(Y) - H(X,Y)$

    where the marginal and joint entropy are defined, respectively, as

    $H(X) = -\sum_{x \in X} p(x) \log p(x)$

    $H(X,Y) = -\sum_{x \in X,\, y \in Y} p(x,y) \log p(x,y)$
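These definitions translate directly into code when expression levels have been discretized. The sketch below estimates the probabilities from sample frequencies and uses base-2 logarithms; both choices, and the toy samples, are assumptions, since the text does not fix the logarithm base or an estimator.

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical entropy in bits, with probabilities estimated as relative frequencies."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical discretized expression profiles (0 = low, 1 = high)
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]   # identical to gene_a
gene_c = [0, 1, 0, 1]   # unrelated to gene_a in this sample

print(mutual_information(gene_a, gene_b))
print(mutual_information(gene_a, gene_c))
```

The first pair is perfectly dependent and the second pair carries no shared information in this sample; swapping the arguments gives the same value, which is why MI alone cannot orient an edge.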

    Information-Theoretic Approaches

    From the definitions above it follows that

    MI becomes zero if the two variables are statistically independent

    A high value of MI indicates that the variables are non-randomly associated with each other

    $MI_{ij} = MI_{ji}$, therefore the resulting reconstructed graph is undirected

    An important characteristic is that, since the approach is based on the independence of samples, it is not suitable for application to time series (it can be applied only to steady-state data sets)

    A software tool based on mutual information theory is ARACNE, described in Basso et al, Reverse engineering of regulatory networks in human B cells, Nature Genetics 37(4): 382-390, 2005

    Inference of ODE Models


    The identification of the structure and parameters of mechanistic nonlinear ODE models is a very demanding task for nontrivial networks, both from a theoretical point of view and in terms of computational requirements

    A feasible approach is based on the use of linearized dynamical models, which yield good results when applied to data sets obtained through perturbation experiments

    Several methods have been developed by the groups of Gardner and di Bernardo, dealing with both steady-state (NIR, MNI) and time-series (TSNI) data

    Time-Series Network Identification


    The TSNI algorithm is based on the linearized model

    The data set consists of the expression levels of N genes, sampled at M time points with a fixed sampling interval

    The experimental data are derived from perturbation experiments (e.g. by treatment with a compound or gene overexpression/downregulation)

    A linear regression algorithm is used to estimate the coefficients of the dynamical matrix, a_ij, and those of the input matrix, b_i

    A nonzero coefficient a_ij indicates an edge in the (directed) graph between nodes i and j, whereas a nonzero b_i indicates that node i is directly affected by the perturbation

    [Equation not recovered: linearized model, indexed by i = 1, ..., N and k = 1, ..., M]

    Bansal, Della Gatta, di Bernardo, Bioinformatics 22: 815-822, 2006
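    The regression step can be illustrated with a minimal sketch. It is not the TSNI algorithm itself: it assumes a discrete-time surrogate model x[k+1] = A x[k] + b u[k], noise-free simulated data, and an ordinary least-squares fit with NumPy. The three-gene network, the thresholds and all function names are hypothetical.

```python
import numpy as np


def simulate(a, b, u, x0):
    """Simulate x[k+1] = a @ x[k] + b * u[k]; returns an (M, N) trajectory."""
    xs = [np.asarray(x0, dtype=float)]
    for u_k in u[:-1]:
        xs.append(a @ xs[-1] + b * u_k)
    return np.array(xs)


def fit_linear_model(x, u):
    """Least-squares estimate of (a, b) from an (M, N) trajectory x and input u."""
    n_genes = x.shape[1]
    regressors = np.column_stack([x[:-1], u[:-1]])   # one row [x[k], u[k]] per step
    theta, *_ = np.linalg.lstsq(regressors, x[1:], rcond=None)
    return theta[:n_genes].T, theta[n_genes]


# Hypothetical cascade: gene 1 activates gene 2, gene 2 inhibits gene 3.
a_true = np.array([[0.8, 0.0, 0.0],
                   [0.3, 0.6, 0.0],
                   [0.0, -0.4, 0.5]])
b_true = np.array([1.0, 0.0, 0.0])     # the perturbation acts on gene 1 only
u = np.ones(20)                        # constant perturbation, M = 20 time points
x = simulate(a_true, b_true, u, x0=np.zeros(3))

a_hat, b_hat = fit_linear_model(x, u)
edges = np.abs(a_hat) > 1e-6           # edges[i, j]: gene j influences gene i
targets = np.abs(b_hat) > 1e-6         # genes directly hit by the perturbation
```

    On noisy measurements the estimated coefficients are never exactly zero, so a plain threshold like the one above would have to be replaced by a statistical criterion.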

    Features of TSNI


    For small networks (tens of genes), TSNI is able to correctly infer the network structure

    Besides topological inference, ODE-based methods are also well suited for uncovering unknown targets of perturbations, even in complex networks

    It is not possible to exploit prior knowledge about the network topology, because this would require the exact knowledge of non-physical parameters

    LMI-based Inference Approach


    The basic idea is to improve linear ODE-based methods by exploiting available prior knowledge about the network topology (as in BNs)

    The identification of the parameters a_ij, b_ij is cast as a convex optimization problem, in the form of linear matrix inequalities (LMIs)

    This formulation makes it possible to reduce the admissible solution space by assigning sign constraints to the coefficients corresponding to known interactions

    [Figure: example four-node network (x1, x2, x3, x4) with its 4x4 interaction table, in which some entries are marked "?"; legend: activation, inhibition]

    Cosentino et al, IET Systems Biology 1(3): 164-173, 2007

    Features of the LMI-based Approach


    Numerical tests show that exploitation of prior knowledge greatly improves the reconstruction performance

    The method can exploit qualitative a priori knowledge, as well as quantitative information

    Such knowledge is exploited within the reconstruction, not for a posteriori evaluation

    The optimization problem is convex, therefore the optimal solution, in terms of data interpolation, can always be found

    The latter feature, on the other hand, implies a higher tendency to overfitting

    Hard to apply to large-scale networks (more than 100 nodes), due to the computational load deriving from the high number of constraints

    Choice of the Inference Algorithms


    In a recent study, Bansal et al compared the performance obtained using different modeling formalisms (BNs, MI, hierarchical clustering, ODE-based models)

    Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

    Results on Experimental Data Sets


    Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

    Results Discussion


    The different techniques considered in the review infer networks that overlap by only 10% in the best case

    Furthermore, the edges predicted by more than one method are not more accurate than those inferred by a single one

    On the other hand, taking the union of the interactions found by all the methods would yield an even larger number of false positives

    Local perturbation experiments (i.e. affecting one or a few genes) seem to yield better results than global ones (perturbations on a high number of genes)

    Remarks on Inference Algorithms


    A relevant issue, common to all inference algorithms, is that the problem is very often underdetermined

    All modeling formalisms, indeed, involve a large number of parameters, whereas the number of samples is usually limited (curse of dimensionality)
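    The effect of having more unknown coefficients than samples can be seen on a toy linear regression. The sketch below uses NumPy and randomly generated, hypothetical data (6 regulators, 4 samples); it shows that two different coefficient vectors reproduce the same measurements exactly, so the data alone cannot single out the true network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 6, 4                       # fewer samples than unknowns

z = rng.normal(size=(n_samples, n_genes))       # hypothetical expression samples
a_true = np.array([0.8, 0.0, -0.5, 0.0, 0.0, 0.3])
y = z @ a_true                                  # noise-free response of one target gene

# Minimum-norm solution returned by least squares.
a_min_norm, *_ = np.linalg.lstsq(z, y, rcond=None)

# The last right singular vector lies in the null space of z (rank 4 < 6 columns),
# so adding any multiple of it leaves the fit unchanged.
_, _, vt = np.linalg.svd(z)
a_other = a_min_norm + 2.0 * vt[-1]

rank = np.linalg.matrix_rank(z)
fit_min_norm = np.allclose(z @ a_min_norm, y)
fit_other = np.allclose(z @ a_other, y)
```

    Both a_min_norm and a_other fit the data, and in general neither coincides with a_true; additional data or prior knowledge is needed to discriminate among them.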

    Possible solutions

    Devise methods to exploit different data sets

    Reduce the dimensionality of the problem, via data preprocessing, e.g.

    clustering algorithms

    elimination of statistically non-expressed nodes

    Concluding Remarks


    Regardless of the adopted formalism, good inference performance can be achieved only by exploiting the available prior knowledge from the biological literature

    Despite the great interest in the topological characterization of biological networks, much has still to be done in terms of exploitation of such features in the inference process

    Several other approaches exist, both for modeling and inferring biological networks (discrete events, formal languages, machine learning methods, etc.)

    References


    Klipp et al, Systems Biology in Practice, Wiley-VCH, 2005

    Palsson, Systems Biology: Properties of Reconstructed Networks, Cambridge University Press, 2006

    Barabasi, Oltvai, Network Biology: Understanding the Cell's Functional Organization, Nature Reviews Genetics 5: 101-113, 2004

    Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94: 121-163, 2001

    De Jong, Modeling and regulation of genetic regulatory systems, INRIA - RR4032, 2000

    Smolen, Baxter, Byrne, Mathematical model of gene networks, Neuron 26: 567-580, 2000

    Casey et al, Piecewise linear Models of Genetic Regulatory Networks, Equilibria and their Stability, J. Math. Biol. 52: 27-56, 2006

    Bansal et al, Inference of gene regulatory networks and compound mode of action from time course gene expression profiles, Bioinformatics 22: 815-822, 2006

    Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

    Cosentino et al, Linear Matrix Inequalities Approach to Reconstruction of Biological Networks, IET Systems Biology 1(3): 164-173, 2007