
Modeling Biological Networks

Dr. Carlo Cosentino

School of Computer and Biomedical Engineering

Department of Experimental and Clinical Medicine

Università degli Studi Magna Graecia, Catanzaro

[email protected]

bioingegneria.unicz.it/~cosentino

Carnegie Mellon University, Pittsburgh, 2008




Outline

Classification of biological networks

Modeling metabolic networks

Modeling gene regulatory networks

Inferring gene regulatory networks




Types of Biological Network

Several different kinds of biological network can be distinguished at the molecular level

Gene regulatory

Metabolic

Signal transduction

Protein-protein interaction

Moreover, other networks can be considered as we move to different description levels, e.g.

Immunological

Ecological

Here we will focus exclusively on molecular processes that take place within the cell




Goals

A major challenge consists in identifying with reasonable accuracy the complex macromolecular interactions at the gene, metabolite and protein levels

Once identified, the network model can be used to

simulate the process it represents

predict the features of its dynamical behavior

extrapolate cellular phenotypes




Graphs

A very useful formal tool for describing and visualizing biological networks is the graph

A graph, or undirected graph, is an ordered pair G = (V, E), where V is the set of vertices, or nodes, and E is a set of unordered pairs of distinct vertices, called edges or lines

For each edge {u, v}, the nodes u and v are said to be adjacent

We have a directed graph, or digraph, if E is a set of ordered pairs

In digraphs, the in-degree, k_in (out-degree, k_out), of a node is the number of edges directed into (out of) that node

Barabási, Oltvai, Network Biology: Understanding the Cell's Functional Organization, Nature Reviews Genetics 5(2), 101-113, 2004
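As a minimal sketch of the in-degree and out-degree definitions above, the following Python snippet counts k_in and k_out for a small digraph. The edge list (a transcription factor "TF" regulating three genes) and the function name `degrees` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def degrees(edges):
    """Return (k_in, k_out) dictionaries for a digraph given as (source, target) pairs."""
    k_in = defaultdict(int)
    k_out = defaultdict(int)
    for u, v in edges:
        k_out[u] += 1  # the edge leaves u
        k_in[v] += 1   # the edge enters v
        # make sure both endpoints appear in both dictionaries
        k_in[u] += 0
        k_out[v] += 0
    return dict(k_in), dict(k_out)


# Hypothetical toy digraph: TF regulates g1, g2, g3; g1 also regulates g2
edges = [("TF", "g1"), ("TF", "g2"), ("TF", "g3"), ("g1", "g2")]
k_in, k_out = degrees(edges)
print("in-degree: ", k_in)
print("out-degree:", k_out)
```

Here the regulator has a large out-degree and zero in-degree, while the targets have small in-degrees.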




Topological Characteristics

The degree distribution, P(k), gives the probability that a selected node has exactly k links

It allows us to distinguish between different classes of networks (see next slide)

The clustering coefficient of a node I, C_I, measures the aggregation of its neighbors (number of triangles passing through node I)

C(k) is the average clustering coefficient of all nodes with k links
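The two quantities above can be computed directly from an adjacency structure. The sketch below assumes the usual normalization C_I = 2 n_I / (k_I (k_I - 1)), where n_I is the number of links among the k_I neighbors of node I (equivalently, the number of triangles through I); the four-node graph and the function names are hypothetical.

```python
def clustering_coefficient(adj, node):
    """C_I = 2 n_I / (k_I (k_I - 1)); n_I counts links among the neighbors of node."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0  # no pair of neighbors, hence no possible triangle
    links = sum(1 for u in neigh for v in neigh if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def degree_distribution(adj):
    """P(k): fraction of nodes that have exactly k links."""
    counts = {}
    for node in adj:
        k = len(adj[node])
        counts[k] = counts.get(k, 0) + 1
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Hypothetical undirected graph: triangle A-B-C plus a pendant node D attached to A
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(degree_distribution(adj))
print({n: clustering_coefficient(adj, n) for n in adj})
```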





Erdős-Rényi Random Networks

The Erdős-Rényi model of a random network starts with N nodes and connects each pair of nodes with probability p

The degree follows (approximately, for large N) a Poisson distribution, thus many nodes have the same number of links (close to the average degree)

The tail decreases exponentially, which indicates that nodes with k very different from the average are rare

The clustering coefficient is independent of a node's degree
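A short simulation makes the narrow degree distribution visible. This is only a sketch under illustrative assumptions: the values N = 500 and p = 0.02, the fixed seed, and the function name `erdos_renyi` are not taken from the lecture.

```python
import random


def erdos_renyi(n, p, seed=0):
    """Connect each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


n, p = 500, 0.02
adj = erdos_renyi(n, p)
degs = [len(nb) for nb in adj.values()]
mean_k = sum(degs) / n
print("expected average degree:", p * (n - 1))
print("observed average degree:", mean_k)
print("largest degree:", max(degs))
```

The observed average stays close to p(N - 1), and even the most connected node has a degree of the same order as the average, in contrast with the hubs of scale-free networks.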





Scale-Free Networks

Scale-free networks are characterized by a power-law degree distribution

The probability that a node has k links follows $P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent

The probability that a node is highly connected is statistically more significant than in a random graph

In the Barabási-Albert model, at each time point a node with M links is added to the network, which connects to an already existing node I with probability $\Pi_I = k_I / \sum_J k_J$

The underlying mechanism is that nodes with many links have a higher probability of getting more (this is also referred to as preferential attachment)
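Preferential attachment can be sketched in a few lines. The snippet below is an illustration, not a reference implementation: it assumes a fully connected seed of m + 1 nodes and samples targets from a list in which every node appears once per link end, so that a node is chosen with probability proportional to its degree. The names `barabasi_albert` and `stubs` are hypothetical.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches m links preferentially."""
    rng = random.Random(seed)
    # assumed seed network: a fully connected core of m + 1 nodes
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # every node appears in 'stubs' once per link end, i.e. k_I times
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # probability k_I / sum_J k_J
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


n, m = 2000, 2
adj = barabasi_albert(n, m)
degs = [len(nb) for nb in adj.values()]
print("average degree:", sum(degs) / n)
print("largest degree:", max(degs))
```

Although the average degree stays close to 2m, a few early nodes accumulate far more links and act as hubs.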





Hierarchical Networks

A hierarchical structure arises in systems that combine modularity and scale-free topology

The hierarchical model is based on the replication of a small cluster of four nodes (the central ones)

The external nodes of the replicas are linked to the central node of the original cluster

The resulting network has a power-law degree distribution, thus it is scale-free

The average clustering coefficient scales with the degree following $C(k) \sim k^{-1}$





Graphs of Biological Networks

Depending on the kind of biological network, the edges and nodes of the graph have different meanings

Metabolic network

nodes: metabolic products, edge: a reaction transforming A into B

Transcriptional regulation network (protein-DNA)

nodes: genes and proteins, edge: a TF regulates a gene

Protein-protein network

nodes: proteins, edge: interaction between proteins

Gene regulatory networks (functional association network)

nodes: genes, edge: expressions of A and B are correlated




Topology of Biological Networks

An extensive commentary was published by Albert in 2005, reviewing the literature on the topology of different kinds of biological networks

Experimental evidence is reviewed for metabolic, transcriptional regulatory, signal transduction and functional association networks

All of the considered networks approximately exhibit a power-law degree distribution, at least for the in-degree or for the out-degree

For instance, transcriptional regulation networks exhibit a scale-free out-degree distribution, signifying the potential of transcription factors to regulate multiple targets

On the other hand, their in-degree is a more restricted exponential function, suggesting that combinatorial regulation by several TFs is less frequent

Albert, Scale-free networks in cell biology, Journal of Cell Science 118(21), 4947-4957, 2005




Protein-Protein Interaction Network in Yeast

This network is based on yeast two-hybrid experiments

A few highly connected nodes (hubs) hold the network together

The color of a node indicates the phenotypic effect of removing the corresponding protein

red: lethal

green: non-lethal

orange: slow growth

yellow: unknown




Metabolic Reactions

Living cells require energy and material for

building up membranes

storing molecules

replenishing enzymes

replication and repair of DNA

movement

Metabolic reactions can be divided into two categories

Catabolic reactions: breakdown of complex compounds to obtain energy and building blocks

Anabolic reactions: assembly of the compounds used by the cellular mechanisms




Basic Concepts of Metabolism

Historically, metabolism is the part of cell functioning that has been studied most thoroughly during the last decades

This implies that several well-established mathematical tools exist for describing this kind of network

Enzyme kinetics investigates the dynamic properties of the individual reactions in isolation

Stoichiometric analysis deals with the balance of compound production and degradation at the network level

Metabolic control analysis describes the effect of perturbations in the network, in terms of changes in metabolite concentrations

Most of the tools used in the quantitative study of metabolic networks can also be applied to other types of networks




Glycolysis

We will use the case study of glycolysis in yeast to illustrate the theoretical concepts introduced hereafter

The pathway shown below is part of the glycolysis process

Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94, 121-163, 2001

List of Reactions

v1: hexokinase

v2: consumption of glucose-6-phosphate in other pathways

v3: phosphoglucoisomerase

v4: phosphofructokinase

v5: aldolase

v6: ATP production in lower glycolysis

v7: ATP consumption in other pathways

v8: adenylate kinase




ODE Model of Glycolysis

The system of ODEs describing the pathway is




ODE Model with Constant Glucose

The kinetic rates as functions of the reactants can be derived by applying the models presented in the previous lecture

Model Parameters




Stoichiometric Analysis

The basic elements considered in the stoichiometric analysis of metabolic networks are

The concentrations of the various species

The reactions or transport processes affecting such concentrations

The stoichiometric coefficients denote the proportion of substrate and product molecules involved in a reaction

For instance, if we consider the reaction

$S_1 + S_2 \rightarrow 2P$

the stoichiometric coefficients of $S_1$, $S_2$, $P$ are $-1$, $-1$, $2$ respectively




Stoichiometric Analysis

The change of concentrations in time can be described by means of ODEs

For the simple reaction above we have

$\frac{dS_1}{dt} = \frac{dS_2}{dt} = -v, \qquad \frac{dP}{dt} = 2v$

This means that the degradation of $S_1$ with rate v is accompanied by the degradation of $S_2$ with the same rate and by the production of P with a double rate




Stoichiometric Matrix

In general, for a system of m substances and r reactions, the system dynamics are described by

$\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j, \qquad i = 1, \dots, m$

The number $n_{ij}$ is the stoichiometric coefficient of the i-th metabolite in the j-th reaction

For the sake of simplicity, we assume that the changes of concentrations are only due to reactions (i.e. we neglect the effect of convection or diffusion)

We can then define the stoichiometric matrix

$N = \{n_{ij}\}, \qquad i = 1, \dots, m, \quad j = 1, \dots, r$

in which columns correspond to reactions and rows to concentration variations




Stoichiometric Model

The mathematical description of the metabolic network can be given in matrix form as

$\frac{dS}{dt} = N v$

where

$S = (S_1, \dots, S_m)^T$ is the vector of concentration values

$v = (v_1, \dots, v_r)^T$ is the vector of reaction rates

If the system is at steady state (that is, $dS_i/dt = 0$ for $i = 1, \dots, m$) we can also define the vector of steady-state fluxes, $J = (J_1, \dots, J_r)^T$

Finally, the model involves a certain number of parameters, thus we can also define a parameter vector, $p = (p_1, p_2, \dots)^T$
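The matrix form dS/dt = N v translates almost literally into code. The sketch below integrates a single reaction that consumes S1 and S2 and produces two molecules of P, with stoichiometric column (-1, -1, 2); the mass-action rate law v = k S1 S2, the rate constant, the initial concentrations and the explicit Euler scheme are all assumptions made for illustration.

```python
def simulate(N, rate_fn, S0, dt=0.001, steps=5000):
    """Integrate dS/dt = N v(S) with the explicit Euler method."""
    S = list(S0)
    for _ in range(steps):
        v = rate_fn(S)
        # row i of N times v gives the net production rate of species i
        dS = [sum(N[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]
        S = [s + dt * d for s, d in zip(S, dS)]
    return S


# rows: S1, S2, P; single column: the reaction S1 + S2 -> 2P
N = [[-1], [-1], [2]]


def rates(S, k=1.0):
    # assumed mass-action kinetics, v = k * S1 * S2
    return [k * S[0] * S[1]]


S0 = [1.0, 0.5, 0.0]
S_end = simulate(N, rates, S0)
print("final concentrations (S1, S2, P):", S_end)
```

With these values S2 is the limiting substrate: it is almost exhausted at the end of the run, while P approaches twice the initial amount of S2.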




Stoichiometric Model of Glycolysis

For the glycolysis model we have




Analysis of the Stoichiometric Matrix

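A relevant piece of information that can be readily derived from the matrix N is which combinations of individual fluxes are possible at steady state

The system of algebraic equations $N v = 0$ admits a nontrivial solution only if rank(N) < r

In that case the steady-state fluxes lie in the null space of N, which is spanned by the r - rank(N) columns of a matrix K such that $N K = 0$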
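A basis of the null space can be obtained by Gauss-Jordan elimination. The sketch below uses exact fractions from the standard library; the toy network (a single metabolite produced by v1 and consumed by v2 and v3, hence N = (1 -1 -1)) is an assumption chosen for illustration, and `null_space` is a hypothetical helper name.

```python
from fractions import Fraction


def null_space(N):
    """Basis of {v : N v = 0}, computed by Gauss-Jordan elimination on exact fractions."""
    rows = [[Fraction(x) for x in row] for row in N]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for fc in free:
        # set one free flux to 1 and solve for the pivot fluxes
        vec = [Fraction(0)] * n_cols
        vec[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][fc]
        basis.append(vec)
    return basis


N = [[1, -1, -1]]  # assumed toy network: v1 produces S, v2 and v3 consume it
K = null_space(N)
for k in K:
    print([int(x) for x in k])
```

For this network r = 3 and rank(N) = 1, so the basis has two vectors; every steady-state flux distribution is a linear combination of them.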




An Example

Let us consider the simple network

The stoichiometric matrix is N=(1 1 1)

and the steady-state fluxes are described by the linear combination




Null Rates at SteadyState

For the glycolysis model we have r = 8 and rank(N) = 5, thus a basis of the null space of N is composed of three vectors

Note that the entries in the last row are all zero; this means that the net rate for that reaction is null at steady state

Hence, at steady state we can neglect the reaction v8




Unbranched Pathways

Another property that can be readily derived is the presence of unbranched pathways

In this case, the net rates of all the reactions in the pathway must be equal

The entries for the second and third reaction in the matrix K are always equal

This implies that the fluxes through reactions 2 and 3 must be equal at steady state




Elementary Flux Modes

A pathway can be defined as a set of metabolic reactions linked by common metabolites

It is not straightforward to recognize pathways in metabolic maps that have been reconstructed from experimental evidence

This problem is formalized in the concept of finding the Elementary Flux Modes (EFMs)

The aim is to find the admissible direct routes for producing a certain metabolite starting from another one

To get an idea of the usefulness of such mathematical methods, we can have a glimpse at a typical whole-organism-scale metabolic network




Metabolic Network in Yeast

Palsson, Systems Biology: Properties of Reconstructed Networks, 2006




Elementary Flux Modes

Without going into the mathematical details, we can gain further insight by looking at the elementary flux modes of two simple networks

A factor that greatly influences the EFMs is the reversibility of the single reactions




Applications of EFM Analysis

EFMs can be used to

infer the range of metabolic pathways in the network

test a set of enzymes for production of a desired compound, and find the most convenient pathway

reconstruct metabolism from annotated genome sequences and analyze the effects of enzyme deficiency

reduce drug effects and identify drug targets




Flux Balance Analysis

Flux Balance Analysis (FBA) deals with the problem of finding the operative modes of metabolic networks subject to three kinds of constraints

1) The operative mode is assumed to be at steady state

2) The operative mode must respect the (ir)reversibility of the reactions

3) The enzyme catalytic activity in each reaction is limited to an admissible range, i.e. $\alpha_i \le v_i \le \beta_i$

Additional constraints may be imposed by biomass composition or other external conditions




Flux Balance Constrained Optimization

Such constraints confine the steady-state fluxes to a feasible set, but usually do not yield a unique solution

Hence, the determination of a particular metabolic flux distribution can be cast as a linear optimization problem

Maximize an objective function

subject to the constraints given above
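As a sketch of how such a problem is commonly written, one may take a linear objective with weights $c_i$ (an assumption here; a typical choice is the flux towards biomass), with N the stoichiometric matrix, v the vector of the r fluxes, and $\alpha_i$, $\beta_i$ the lower and upper bounds of the admissible range of flux i:

```latex
\begin{aligned}
\max_{v} \quad & Z = \sum_{i=1}^{r} c_i v_i \\
\text{subject to} \quad & N v = 0, \\
& \alpha_i \le v_i \le \beta_i, \qquad i = 1, \dots, r
\end{aligned}
```

In this form, irreversibility of reaction i corresponds to setting $\alpha_i = 0$, so that all three kinds of constraints appear as linear equalities or inequalities.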




Conservation Relations

If a substance is neither added to nor removed from the reaction system, its total concentration remains constant

This property can be derived by analyzing the null space of $N^T$, defined by the matrix G such that

$G N = 0$

The latter implies

$G \frac{dS}{dt} = G N v = 0 \;\Rightarrow\; G S = \text{const}$

The dimension of the null space is m - rank(N)
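A conservation relation can be checked numerically. The sketch below assumes a toy adenylate system with three species (ATP, ADP, AMP) and three reactions (ATP production from ADP, ATP consumption to ADP, and adenylate kinase ATP + AMP <-> 2 ADP); the rate laws, constants and initial values are hypothetical. It verifies that G = (1 1 1) satisfies G N = 0 and that G S stays constant along a simulated trajectory.

```python
# rows: ATP, ADP, AMP; columns: production, consumption, adenylate kinase
N = [
    [1, -1, -1],
    [-1, 1, 2],
    [0, 0, -1],
]
G = [1, 1, 1]  # candidate conservation relation: ATP + ADP + AMP


def times_N(G, N):
    """Row vector G multiplied by the matrix N."""
    return [sum(G[i] * N[i][j] for i in range(len(N))) for j in range(len(N[0]))]


def rates(S, k6=1.0, k7=0.5, k8f=2.0, k8r=1.0):
    # assumed mass-action rate laws, for illustration only
    atp, adp, amp = S
    return [k6 * adp, k7 * atp, k8f * atp * amp - k8r * adp ** 2]


S = [2.0, 1.0, 0.5]
total_start = sum(g * s for g, s in zip(G, S))
dt = 0.001
for _ in range(10000):
    v = rates(S)
    S = [s + dt * sum(N[i][j] * v[j] for j in range(3)) for i, s in enumerate(S)]
total_end = sum(g * s for g, s in zip(G, S))

print("G N =", times_N(G, N))
print("G S at start and end:", total_start, total_end)
```

Because G S is constant, one of the three concentrations can be eliminated from the ODE model and recovered algebraically from the other two.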



Conservation in Glycolysis

For the glycolysis example we have

which means the sum of the concentrations of AMP, ADP and ATP remains constant

The conservation relations can be used to simplify the dynamical model, by exploiting the algebraic equations of the conservation constraints to express some variables as functions of the others



Metabolic Control Analysis

Metabolic Control Analysis (MCA) deals with the sensitivity of the steady-state properties of the network to small parameter changes

It can also be applied to models of other kinds of networks, like signaling pathways or gene expression

Issues addressed by MCA

Predict properties of the network from knowledge of individual components

Find which specific step has the greatest influence on a flux, a steady-state concentration or a reaction rate

Find which is the best target reaction to treat a metabolic disorder

These questions are very relevant in biotechnological production processes and health care



Basic Concepts of MCA

The relations between steady-state properties and model parameters are usually highly nonlinear

There is no general theory predicting the effect of large parameter changes

The MCA approach deals with small parameter changes

Under this assumption, the model can be approximated, in the neighborhood of the steady state, with a linear one

Given the linearized model, it is possible to derive some indexes describing the properties mentioned above, e.g. elasticity coefficients, control coefficients, response coefficients



Gene Regulatory Networks

A protein synthesized from a gene can serve as a transcription factor for another gene, as an enzyme catalyzing a metabolic reaction, or as a component of a signal transduction pathway

Apart from DNA transcription regulation, gene expression may be controlled during RNA processing and transport, RNA translation, and the post-translational modification of proteins

Therefore, gene regulatory networks (GRNs) involve interactions between DNA, RNA, proteins and other molecules

A suitable way to manage this complexity may consist of using functional association networks

In these networks the edges of the corresponding graph do not represent chemical interactions, but functional influences of one gene on another



Example of a GRN

A toy regulatory network of three genes is depicted in the cartoon below

De Jong, Modeling and regulation of genetic regulatory systems, INRIA - RR4032, 2000



Modeling GRNs

In what follows we will present an overview of the models used to describe GRNs

Two main issues have to be taken into account when choosing a modeling framework

Computational requirements for simulation

Available methods for inferring the network topology



Bayesian Networks

In the formalism of Bayesian Networks, the structure of a genetic regulatory system is modeled by a directed acyclic graph G = (V, E)

The vertices $i \in V$, $i = 1, \dots, n$, represent gene expression levels and correspond to random variables $X_i$

For each $X_i$, a conditional distribution $p(X_i \mid \text{parents}(X_i))$ is defined, where parents($X_i$) denotes the direct regulators of i

The graph G and the set of conditional distributions uniquely specify a joint probability distribution p(X)



Independency in BN

If $X_i$ is independent of Y given Z, where Y and Z are sets of variables, we can state a conditional independency

$i(X_i; Y \mid Z)$

For every node i in G,

$i(X_i; \text{non-descendants}(X_i) \mid \text{parents}(X_i))$

Hence, the joint probability distribution can be decomposed into

$p(X) = \prod_{i=1}^{n} p(X_i \mid \text{parents}(X_i))$
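The factorization can be illustrated on a very small network. In the sketch below the three binary variables, the graph X1 -> X2, X1 -> X3, X2 -> X3 and all the conditional probabilities are hypothetical values; only the product formula comes from the slide.

```python
from itertools import product

# hypothetical directed acyclic graph: X1 -> X2, X1 -> X3, X2 -> X3
parents = {"X1": (), "X2": ("X1",), "X3": ("X1", "X2")}

# cpt[node][values of the parents] = P(node = 1 | parents); illustrative numbers
cpt = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.1, (1,): 0.8},
    "X3": {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9},
}


def joint(assignment):
    """p(X) as the product of p(X_i | parents(X_i)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


example = {"X1": 1, "X2": 1, "X3": 0}
total = sum(
    joint(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
print("p(X1=1, X2=1, X3=0) =", joint(example))
print("sum over all states  =", total)
```

The joint distribution over 2^3 states is specified by only 1 + 2 + 4 = 7 numbers here; the saving grows quickly when each gene has few regulators.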



Example of BN

Here we illustrate the formulation of the BN model for a simple network

Two graphs are said to be equivalent if they imply the same set of independencies; they cannot be distinguished by observation on X

De Jong, Modeling and regulation of genetic regulatory systems, INRIA - RR4032, 2000



Features of BNs

There is no need to specify a single value for each parameter of the model; rather, a distribution over the admissible range of values is assigned

This characteristic helps in avoiding overfitting, which is common in the presence of a small data set and a large number of parameters

It is a statistical modeling approach, which nicely fits the stochastic nature of biological systems

BNs are static models, although it is possible to take into account dynamical aspects through an extension of this theory, namely dynamic Bayesian networks (DBNs)



Boolean Networks

In the framework of Boolean Networks, the expression level of a gene can attain only two values, that is active (on, 1) or inactive (off, 0)

Accordingly, the interactions between elements of the network are represented by Boolean functions

Smolen, Baxter, Byrne, Mathematical model of gene networks, Neuron 26, 567-580, 2000
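A Boolean network is straightforward to simulate. The sketch below assumes a hypothetical three-gene circuit (A is repressed by C, B is activated by A, C requires both A and B) and a synchronous update, in which all genes switch at the same time; neither the circuit nor the update scheme is taken from the cited paper.

```python
def step(state):
    """Synchronous update of the hypothetical circuit A' = NOT C, B' = A, C' = A AND B."""
    a, b, c = state
    return (int(not c), a, int(a and b))


def trajectory(state, max_steps=20):
    """Iterate until a state repeats; return the visited states and the repeated one."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state)
    return seen, state


seen, repeated = trajectory((0, 0, 0))
for s in seen:
    print(s)
print("returns to", repeated)
```

Starting from all genes off, this circuit never settles into a fixed point: the negative feedback from C to A produces a cyclic attractor.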



Features of Boolean Networks

Deterministic description

Very easy to build the model and to simulate it, even for very large networks

They provide only a coarse-grained description of the network behavior, thus they are not useful for a more detailed analysis of the regulatory mechanisms



ODE Models

We have seen that the mechanistic ODE approach has been widely exploited since the beginning of the last century for modeling biochemical reactions

When the order of the system increases, classical nonlinear ODE models become hardly tractable in terms of parametric analysis, numerical simulation and, especially, identification

In order to overcome these limitations, alternative modeling approaches have been devised for application to biological networks

Power-Law Models

The basic concept underlying power-law models is the approximation of classical ODE models by means of a uniform mathematical structure

S-Systems

S-systems are a particular class of power-law models in which fluxes are aggregated

$\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j(t)^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j(t)^{h_{ij}}, \qquad i = 1, \dots, n$
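The uniform structure of an S-system means that a single function can evaluate the right-hand side of any model of this class. The sketch below assumes a hypothetical two-variable system in which X2 inhibits the production of X1 and X1 activates the production of X2; rate constants, kinetic orders and the Euler integration are illustrative choices.

```python
def s_system_rhs(X, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij."""
    n = len(X)
    dX = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= X[j] ** g[i][j]
            degradation *= X[j] ** h[i][j]
        dX.append(production - degradation)
    return dX


# hypothetical parameters: X2 inhibits X1 production, X1 activates X2 production
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, -0.5], [0.5, 0.0]]
h = [[0.5, 0.0], [0.0, 0.5]]

X = [1.0, 1.0]
dt = 0.01
for _ in range(5000):
    X = [x + dt * d for x, d in zip(X, s_system_rhs(X, alpha, beta, g, h))]
print("state after integration:", X)
```

For these parameters the steady-state conditions become linear in the logarithms of the variables and give X1 = X2 = 2, which the simulation approaches; this is the logarithmic linearization that makes parameter identification from steady-state data easy.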

Features of S-Systems

S-systems feature low computational requirements

Their structural homogeneity allows the model parameters to be easily identified from steady-state data by means of logarithmic linearization

Generalized aggregation may introduce a loss of accuracy

Violation of biochemical fluxes concentration

It may conceal important structural features of the network

Piecewise-Linear Models

Another class of approximate models based on ODEs is that of piecewise-linear (PWL) models

The basic idea is to approximate sigmoidal curves through step functions

The model takes the general form

$\frac{dx_i}{dt} = f_i(x) - \gamma_i x_i$

where

$f_i(x) = \sum_{l} \kappa_{il} \, b_{il}(x)$

and the functions $b_{il}(\cdot)$ are Boolean-valued regulation functions expressed in terms of step functions

Casey, De Jong, Gouzé, J. Math. Biol. 52, 27-56, 2006
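A PWL model can be sketched with a pair of mutually inhibiting genes. In the snippet below each gene is synthesized at a constant rate kappa while its repressor is below a threshold theta, and is degraded linearly at rate gamma; the circuit, the parameter values and the Euler integration are assumptions made for illustration.

```python
def step_minus(x, theta):
    """Decreasing step function: 1 below the threshold, 0 otherwise."""
    return 1.0 if x < theta else 0.0


def pwl_rhs(x, kappa=2.0, gamma=1.0, theta=1.0):
    # hypothetical two-gene circuit with mutual inhibition
    x1, x2 = x
    return [
        kappa * step_minus(x2, theta) - gamma * x1,
        kappa * step_minus(x1, theta) - gamma * x2,
    ]


def simulate(x0, dt=0.01, steps=2000):
    x = list(x0)
    for _ in range(steps):
        x = [xi + dt * di for xi, di in zip(x, pwl_rhs(x))]
    return x


end_a = simulate([1.5, 0.2])
end_b = simulate([0.2, 1.5])
print("start with gene 1 high:", end_a)
print("start with gene 2 high:", end_b)
```

Within each region delimited by the thresholds the dynamics are linear, and the two initial conditions converge to two different steady states (gene 1 on and gene 2 off, or the opposite).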

Features of PWL Models




Numerical simulation studies have shown that PWL models properly approximate the behavior of the corresponding original nonlinear ones

A drawback of this class of systems is that their behavior is very difficult to analyze from a rigorous point of view

PWL models, indeed, can exhibit singular steady states, that is, equilibrium points lying on the threshold surfaces

Moreover, it is known that the stability of switching systems cannot be reduced to the analysis of the stability of the linear systems in each sub-space







Inferring Bayesian Networks




In order to reverse engineer a Bayesian network model of a gene network, we must find the directed acyclic graph that best describes the data

To do this, a scoring function is chosen, in order to evaluate the candidate graphs G with respect to the data set D

The score can be defined using Bayes' rule

$P(G \mid D) = \frac{P(D \mid G) \, P(G)}{P(D)}$

If the topology of the network is partially known, the a priori knowledge can be included in P(G)

The most popular scores are the Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalence (BDe)

They incorporate a penalty for complexity to cope with overfitting

Inferring Bayesian Networks




The evaluation of all possible networks involves checking all possible combinations of interactions among the nodes

This problem is NP-hard, therefore heuristic methods are used, like the greedy hill-climbing approach, the Markov Chain Monte Carlo method, or Simulated Annealing

A software tool for inferring both BNs and DBNs is Banjo, developed by the group of Hartemink (http://www.cs.duke.edu/~amink/software/banjo)

Yu et al, Advances to Bayesian network inference for generating causal networks from observational biological data, Bioinformatics 20: 3594-3603, 2004

Information-Theoretic Approaches


Information-theoretic approaches use a generalization of the Pearson correlation coefficient used in hierarchical clustering, namely the Mutual Information (MI), which is computed as

$MI(X; Y) = H(X) + H(Y) - H(X, Y)$

where the marginal and joint entropy are defined, respectively, as

$H(X) = -\sum_{x \in X} p(x) \log p(x)$

$H(X, Y) = -\sum_{x \in X, \, y \in Y} p(x, y) \log p(x, y)$
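For discretized expression data the definitions above can be evaluated with simple frequency counts. The sketch below is a plug-in estimate under stated assumptions: probabilities are replaced by empirical frequencies, logarithms are taken in base 2 (so MI is in bits), and the short binary profiles are made-up examples.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in estimate of H from the empirical frequencies of the samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# made-up discretized expression profiles (0 = low, 1 = high)
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]   # identical to gene_a
gene_c = [0, 1, 0, 1]   # independent of gene_a in this sample

print("MI(a, b) =", mutual_information(gene_a, gene_b))
print("MI(a, c) =", mutual_information(gene_a, gene_c))
```

The measure is symmetric in its two arguments, which is why the inferred graph carries no edge direction.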

Information-Theoretic Approaches


From the definitions above it follows that

MI becomes zero if the two variables are statistically independent

A high value of MI indicates that the variables are non-randomly associated with each other

$MI_{ij} = MI_{ji}$, therefore the resulting reconstructed graph is undirected

An important characteristic is that, since the approach is based on the independence of samples, it is not suitable for application to time series (it can be applied only to steady-state data sets)

A software tool based on mutual information is ARACNE, described in Basso et al, Reverse engineering of regulatory networks in human B cells, Nature Genetics 37(4): 382-390, 2005

Inference of ODE Models




The identification of the structure and parameters of mechanistic nonlinear ODE models is a very demanding task for nontrivial networks, both from a theoretical point of view and in terms of computational requirements

A feasible approach is based on the use of linearized dynamical models, which yield good results when applied to data sets obtained through perturbation experiments

Several methods have been developed by the groups of Gardner and di Bernardo, dealing both with steady-state (NIR, MNI) and time-series data (TSNI)

Time-Series Network Identification


The TSNI algorithm is based on the linearized model

$\dot{x}_i(t_k) = \sum_{j=1}^{N} a_{ij} x_j(t_k) + b_i u(t_k), \qquad i = 1, \dots, N, \quad k = 1, \dots, M$

where $x_i$ is the expression level of gene i and u is the applied perturbation

The data set consists of the expression levels of N genes, sampled at M time points with a fixed sampling interval

The experimental data are derived from perturbation experiments (e.g. by treatment with a compound or gene overexpression/downregulation)

A linear regression algorithm is used to estimate the coefficients of the dynamical matrix, $a_{ij}$, and those of the input matrix, $b_i$

A nonzero coefficient $a_{ij}$ indicates an edge in the (directed) graph from node j to node i, whereas a nonzero $b_i$ indicates that node i is directly affected by the perturbation

Bansal, Della Gatta, di Bernardo, Bioinformatics 22: 815-822, 2006
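The regression step can be sketched on synthetic data. The snippet below is not the TSNI implementation: it generates a noise-free time series from a known two-gene linear model with a constant perturbation u = 1, approximates the derivatives by finite differences, and recovers the coefficients by ordinary least squares through the normal equations. All numerical values and helper names are hypothetical.

```python
def solve(A, y):
    """Solve the square system A z = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(Phi, y):
    """Ordinary least squares via the normal equations (Phi^T Phi) z = Phi^T y."""
    p = len(Phi[0])
    gram = [[sum(row[a] * row[b] for row in Phi) for b in range(p)] for a in range(p)]
    rhs = [sum(row[a] * yk for row, yk in zip(Phi, y)) for a in range(p)]
    return solve(gram, rhs)


# hypothetical ground truth: gene 1 is hit by the perturbation and activates gene 2
A_true = [[-1.0, 0.0], [0.8, -0.5]]
b_true = [1.0, 0.0]
dt, n_points = 0.1, 60

x = [0.0, 0.0]
traj = [x]
for _ in range(n_points):
    dx = [sum(A_true[i][j] * x[j] for j in range(2)) + b_true[i] for i in range(2)]
    x = [x[i] + dt * dx[i] for i in range(2)]
    traj.append(x)

# regressors: expression levels and the constant input u = 1
Phi = [[xk[0], xk[1], 1.0] for xk in traj[:-1]]
estimates = []
for i in range(2):
    deriv = [(traj[k + 1][i] - traj[k][i]) / dt for k in range(n_points)]
    estimates.append(least_squares(Phi, deriv))

for i, (a1, a2, b) in enumerate(estimates, start=1):
    print(f"gene {i}: a_{i}1 = {a1:.3f}, a_{i}2 = {a2:.3f}, b_{i} = {b:.3f}")
```

With noise-free data the true coefficients are recovered; with real measurements the number of samples is typically smaller than the number of unknowns and the raw regression has to be complemented by dimensionality reduction or regularization.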

Features of TSNI




For small networks (tens of genes), TSNI is able to correctly infer the network structure

Besides topological inference, ODE-based methods are also well suited for uncovering unknown targets of perturbations, even in complex networks

It is not possible to exploit prior knowledge about the network topology, because this would require the exact knowledge of non-physical parameters

LMI-based Inference Approach




The basic idea is to improve linear ODE-based methods by exploiting available prior knowledge about the network topology (as in BNs)

The identification of the parameters $a_{ij}$, $b_{ij}$ is cast as a convex optimization problem, in the form of linear matrix inequalities (LMIs)

This formulation makes it possible to reduce the admissible solution space by assigning sign constraints to the coefficients corresponding to known interactions

[Figure: example network with nodes x1, x2, x3, x4 and the corresponding table of sign constraints on the coefficients, with unknown entries marked "?"; legend: activation, inhibition]

Cosentino et al, IET Systems Biology 1(3): 164-173, 2007

Features of the LMI-based Approach




Numerical tests show that exploitation of prior knowledge greatly improves the reconstruction performance

The method can exploit qualitative a priori knowledge, as well as quantitative information

Such knowledge is exploited within the reconstruction, not for a posteriori evaluation

The optimization problem is convex, therefore the optimal solution, in terms of data interpolation, can always be found

The latter feature, on the other hand, implies a higher tendency to overfitting

Hard to apply to large-scale networks (more than 100 nodes), due to the computational load deriving from the high number of constraints

Choice of the Inference Algorithms




In a recent study, Bansal et al compared the performance obtained using different modeling formalisms (BNs, MI, hierarchical clustering, ODE-based models)

Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

Results on Experimental Data Sets




Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

Results Discussion




The different techniques considered in the review infer networks that overlap by only 10% in the best case

Furthermore, the edges predicted by more than one method are not more accurate than those inferred by a single one

On the other hand, taking the union of the interactions found by all the methods would yield an even larger number of false positives

Local perturbation experiments (i.e. affecting one or a few genes) seem to yield better results than global ones (perturbations of a large number of genes)

Remarks on Inference Algorithms




A relevant issue, common to all inference algorithms, is that the problem is very often underdetermined

All modeling formalisms, indeed, involve a large number of parameters, whereas the number of samples is usually limited (curse of dimensionality)

Possible solutions

Devise methods to exploit different data sets

Reduce the dimensionality of the problem, via data preprocessing, e.g.

clustering algorithms

elimination of statistically non-expressed nodes

Concluding Remarks




Regardless of the adopted formalism, good inference performance can be achieved only by exploiting the available prior knowledge from the biological literature

Despite the great interest in the topological characterization of biological networks, much has still to be done in terms of exploiting such features in the inference process

Several other approaches exist, both for modeling and inferring biological networks (discrete events, formal languages, machine learning methods, etc.)

References




Klipp et al, Systems Biology in Practice, Wiley-VCH, 2005

Palsson, Systems Biology: Properties of Reconstructed Networks, Cambridge University Press, 2006

Barabási, Oltvai, Network Biology: Understanding the Cell's Functional Organization, Nature Reviews Genetics 5(2), 101-113, 2004

Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94, 121-163, 2001

De Jong, Modeling and regulation of genetic regulatory systems, INRIA - RR4032, 2000

Smolen, Baxter, Byrne, Mathematical model of gene networks, Neuron 26, 567-580, 2000

Casey et al, Piecewise-linear Models of Genetic Regulatory Networks: Equilibria and their Stability, J. Math. Biol. 52, 27-56, 2006

Bansal et al, Inference of gene regulatory networks and compound mode of action from time course gene expression profiles, Bioinformatics 22: 815-822, 2006

Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007

Cosentino et al, Linear Matrix Inequalities Approach to Reconstruction of Biological Networks, IET Systems Biology 1(3): 164-173, 2007
