18. lecture ws 2004/05bioinformatics iii1 v18 metabolic networks - introduction different levels for...

18. Lecture WS 2004/05

Bioinformatics III 1

V18 Metabolic Networks - Introduction

Different levels for describing metabolic networks by computational methods:

- classical biochemical pathways (glycolysis, TCA cycle, ...

- stoichiometric modelling (flux balance analysis): theoretical capabilities of an

integrated cellular process, feasible metabolic flux distributions

- automatic decomposition of metabolic networks

(elementary nodes, extreme pathways ...)

- kinetic modelling (E-Cell ...) problem: general lack of kinetic information

on the dynamics and regulation of cellular metabolism



KEGG database

The KEGG PATHWAY

database (http://www.genome

. jp/kegg/pathway.html) is a

collection of graphical

diagrams (KEGG pathway

maps) representing molecular

interaction networks in various

cellular processes.

Each reference pathway is

manually drawn and updated

with the notation shown left.

Organism-specific pathways

(green-colored pathways) are

computationally generated

based on the KO assignment

in individual genomes.

http://www.genome.jp/





http://www.genome.jp/kegg/document/help_bget_ko.html



Citrate Cycle (TCA cycle) in E.coli



Citrate Cycle (TCA cycle)

Citrate cycle (TCA cycle) - Escherichia coli K-12 MG1655 Citrate cycle (TCA cycle) - Helicobacter pylori 26695



EcoCyc Database

E.coli genome contains 4.7 million DNA bases.

How can we characterize the functional complement of E.coli and according to

what criteria can we compare the biochemical networks of two organisms?

EcoCyc contains the metabolic map of E.coli defined as the set of all known

pathways, reactions and enzymes of E.coli small-molecule metabolism.

Analyze

- the connectivity relationships of the metabolic network

- its partitioning into pathways

- enzyme activation and inhibition

- repetition and multiplicity of elements such as enzymes, reactions, and substrates.

Ouzonis, Karp, Genome Res. 10, 568 (2000)



EcoCyc Analysis of E.coli Metabolism

E.coli genome contains 4391 predicted genes, of which 4288 code for proteins.

676 of these genes form 607 enzymes of E.coli small-molecule metabolism.

Of those enzymes, 311 are protein complexes, 296 are monomers.

Organization of protein complexes. Distribution of subunit counts for all EcoCyc protein complexes. The predominance of monomers, dimers, and tetramers is obvious




ReactionsEcoCyc describes 905 metabolic reactions that are catalyzed by E. coli.

Of these reactions, 161 are not involved in small-molecule metabolism,

e.g. they participate in macromolecule metabolism such as DNA replication and

tRNA charging.

Of the remaining 744 reactions, 569 have been assigned to at least one pathway.

The next figures show an overview diagram of E. coli metabolism. Each node in

the diagram represents a single metabolite whose chemical class is encoded by

the shape of the node. Each blue line represents a single bioreaction. The white

lines connect multiple occurrences of the same metabolite in the diagram.




Reactions

(A) This version of the overview shows all interconnections between occurren-ces of the same metabolite to communicate the complexity of the interconnections in the metabolic network.




Reactions


(B) In this version many of the meta-bolite interconnec-tions have been removed to simplify the diagram; those reaction steps for which an enzyme that catalyzes the reaction is known to have a physiologi-cally relevant acti-vator or inhibitor are highlighted.



Reactions

The number of reactions (744) and the number of enzymes (607) differ ...

WHY??

(1) there is no one-to-one mapping between enzymes and reactions –

some enzymes catalyze multiple reactions, and some reactions are catalyzed

by multiple enzymes.

(2) for some reactions known to be catalyzed by E.coli, the enzyme has not yet

been identified.




Assignment of EC numbers

Of the 3399 reactions defined in the ENZYME database (version 22.0),

604 occur in E.coli.

This means that the remaining 301 reactions of E.coli do not have assigned EC

numbers.

The number of EC class reactions present in E. coli against the total number of EC reaction types. The blue bars signify the percent contribution of each class for all known reactions in E. coli; the green bars signify the percent coverage of the EC classes in the known reactions in EcoCyc.

Due to the apparently finer classification of classes 1-3, the two measures display an inverse relationship: More reactions in E. coli belong to classes 1-3, although they represent a smaller percentage of reactions listed in the EC hierarchy. Ouzonis, Karp, Genome Res. 10, 568 (2000)



Compounds

The 744 reactions of E.coli small-molecule metabolism involve a total of 791

different substrates.

On average, each reaction contains 4.0 substrates.

Number of reactions containing varying numbers of substrates (reactants plus products).





Each distinct substrate occurs in an average of 2.1 reactions.

Compounds



Pathways

EcoCyc describes 131 pathways:

energy metabolism

nucleotide and amino acid biosynthesis

secondary metabolism

Pathways vary in length from a

single reaction step to 16 steps

with an average of 5.4 steps.

Length distribution of EcoCyc pathways




Pathways

However, there is no precise

biological definition of a pathway.

The partitioning of the metabolic

network into pathways (including

the well-known examples of

biochemical pathways) is

somehow arbitrary.

These decisions of course also

affect the distribution of pathway

lengths.




Combine Properties of Basic Entitites

The power of EcoCyc is the abilitity to formulate queries based on relationships

between the basic entities.

Some necessary computational definitions:

A pathway P is a pathway of small-molecule metabolism if it is both

(1) not a signal-transduction pathway, and

(2) not a super-pathway (connected set of small pathways).

A reaction R is a reaction of small-molecule metabolism if either

(1) R is a member of a pathway of small-molecule metabolism, or

(2) none of the substrates of R are macromolecules such as proteins, tRNA or

DNA, and R is not a transport reaction.




Combine Properties of Basic Entitites

More computational definitions:

An enzyme E is one of small-molecule metabolism if

E catalyzes a reaction of small-molecule metabolism.

A side reaction is a reaction substrate that is not a main compound.

A substrate is a reaction product or reactant.

Many other metabolic modelling approaches use special rule sets

( computer science, languages).




Enzyme Modulation

An enzymatic reaction is a type of EcoCyc object that represents the pairing

of an enzyme with a reaction catalyzed by that enzyme.

EcoCyc contains extensive information on the modulation of E.coli enzymes with

respect to particular reactions:

- activators and inhibitors of the enzyme,

- cofactors required by the enzyme

- alternative substrates that the enzyme will accept.

Of the 805 enzymatic-reaction objects within EcoCyc, physiologically relevant

activators are known for 22, physiologically relevant inhibitors are known for 80.

327 (almost half) require a cofactor or prosthetic group.




Enzyme Modulation




Protein Subunits

A unique property of EcoCyc is that it explicitly encodes the subunit organization of

proteins.

Therefore, one can ask questions such as:

Are protein subunits encoded by neighboring genes?

Interestingly, this is the case for > 80% of known heteromeric enzymes.




Reactions Catalyzed by More Than one Enzyme

Diagram showing the number of reactions

that are catalyzed by one or more enzymes.

Most reactions are catalyzed by one enzyme,

some by two, and very few by more than two

enzymes.

For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc.

What may be the reasons for isozyme redundancy?

(2) the reaction is easily „invented“; therefore, there is more than one protein family

that is independently able to perform the catalysis (convergence).

(1) the enzymes that catalyze the same reaction are homologs and have

duplicated (or were obtained by horizontal gene transfer),

acquiring some specificity but retaining the same mechanism (divergence)




Enzymes that catalyze more than one reaction

Genome predictions usually assign a single enzymatic function.

However, E.coli is known to contain many multifunctional enzymes.

Of the 607 E.coli enzymes, 100 are multifunctional, either having the same active

site and different substrate specificities or different active sites.

Number of enzymes that catalyze one or

more reactions. Most enzymes catalyze

one reaction; some are multifunctional.

The enzymes that catalyze 7 and 9 reactions are purine nucleoside phosphorylase

and nucleoside diphosphate kinase.

Take-home message: The high proportion of multifunctional enzymes implies that

the genome projects significantly underpredict multifunctional enzymes!




Reactions participating in more than one pathway

The 99 reactions belonging to multiple

pathways appear to be the intersection

points in the complex network of chemical

processes in the cell.

E.g. the reaction present in 6 pathways corresponds to the reaction catalyzed by

malate dehydrogenase, a central enzyme in cellular metabolism.

Ouzonis, Karp,

Genome Res. 10, 568 (2000)



Implications of EcoCyc Analysis

Although 30% of E.coli genes remain unidentified, enzymes are the best studied

and easily identifiable class of proteins.

Therefore, few new enzymes can be expected to be discovered.

The metabolic map presented may be 90% complete.

Implication for metabolic maps derived from automatic genome annotation:

automatic annotation does generally not identify multifunctional proteins.

The network complexity may therefore be underestimated.

EcoCyc results often cannot be obtained from protein or nucleic acid sequence

databases because they store protein functions using text descriptions.

E.g. sequence databases don‘t include precise information about subunit

organization of proteins.




Large-scale structure: Metabolic networks are scale-free Attributes of generic network structures.

a, Representative structure of the

network generated by the Erdös–Rényi

network model. b, The network

connectivity can be characterized by the

probability, P(k), that a node has k links.

For a random network P(k) peaks

strongly at k = <k> and decays

exponentially for large k (i.e., P(k) e-k

for k >> <k> and k << <k> ).

c, In the scale-free network most nodes

have only a few links, but a few nodes,

called hubs (dark), have a very large

number of links.

d, P(k) for a scale-free network has no

well-defined peak, and for large k it

decays as a power-law, P(k) k-,

appearing as a straight line with slope -

on a log–log plot. Jeong et al. Nature 407, 651 (2000)



Connectivity distributions P(k) for substrates

a, Archaeoglobus fulgidus (archae);

b, E. coli (bacterium);

c, Caenorhabditis elegans (eukaryote),

shown on a log–log plot, counting

separately the incoming (In) and

outgoing links (Out) for each substrate.

kin (kout) corresponds to the number of

reactions in which a substrate

participates as a product (educt).

d, The connectivity distribution

averaged over 43 organisms.

Jeong et al. Nature 407, 651 (2000)



Properties of metabolic networks

a, The histogram of the biochemical pathway

lengths, l, in E. coli.

b, The average path length (diameter) for

each of the 43 organisms.

c, d, Average number of incoming links (c) or

outgoing links (d) per node for each

organism.

e, The effect of substrate removal on the

metabolic network diameter of E. coli. In the

top curve (red) the most connected substrates

are removed first. In the bottom curve (green)

nodes are removed randomly. M = 60

corresponds to 8% of the total number of

substrates in found in E. coli.

The horizontal axis in b– d denotes the

number of nodes in each organism. b–d,

Archaea (magenta), bacteria (green) and

eukaryotes (blue) are shown. Jeong et al. Nature 407, 651 (2000)



Conclusions about large-scale structure

In a cell or microorganism, the processes that generate mass, energy, information

transfer and cell-fate specification are seamlessly integrated through a complex

network of cellular constituents and reactions.

A systematic comparative mathematical analysis of the metabolic networks of 43

organisms representing all 3 domains of life showed that, despite significant

variation in their individual constituents and pathways, these metabolic networks

have the same topological scaling properties and show striking similarities to the

inherent organization of complex non-biological systems.

This may indicate that metabolic organization is not only identical for all living

organisms, but also complies with the design principles of robust and error-tolerant

scale-free networks, and may represent a common blueprint for the large-scale

organization of interactions among all cellular constituents.

Jeong et al. Nature 407, 651 (2000)



Development of the network-based pathway paradigm

Papin et al. TIBS 28, 250 (2003)

(a) With advanced biochemical tech-

niques, years of research have led to the

precise characterization of individual

reactions. As a result, the complete

stoichiometries of many metabolic

reactions have been characterized.

(b) Most of these reactions have been

grouped into `traditional pathways' (e.g.

glycolysis) that do not account for

cofactors and byproducts in a way that

lends itself to a mathematical description.

However, with sequenced and annotated

genomes, models can be made that

account for many metabolic reactions in

an organism.

(c) Subsequently, network-based,

mathematically defined pathways

can be analyzed that account for a

complete network (black and gray

arrows correspond to active and

inactive reactions).



Stoichiometric matrix

Stoichiometric matrix:

A matrix with reaction stochio-

metries as columns and

metabolite participations as

rows.

The stochiometric matrix is an

important part of the in silico

model.

With the matrix, the methods of

extreme pathway and

elementary mode analyses can

be used to generate a unique

set of pathways P1, P2, and P3

(see future lecture).

Papin et al. TIBS 28, 250 (2003)



Flux balancingAny chemical reaction requires mass conservation.

Therefore one may analyze metabolic systems by requiring mass conservation.

Only required: knowledge about stoichiometry of metabolic pathways and

metabolic demands

For each metabolite:

Under steady-state conditions, the mass balance constraints in a metabolic

network can be represented mathematically by the matrix equation:

S · v = 0

where the matrix S is the m n stoichiometric matrix,

m = the number of metabolites and n = the number of reactions in the network.

The vector v represents all fluxes in the metabolic network, including the internal

fluxes, transport fluxes and the growth flux.

)( dtransporteuseddegradeddsynthesizei

i VVVVdt

dXv



Flux balance analysis

Since the number of metabolites is generally smaller than the number of reactions

(m < n) the flux-balance equation is typically underdetermined.

Therefore there are generally multiple feasible flux distributions that satisfy the mass

balance constraints.

The set of solutions are confined to the nullspace of matrix S.

To find the „true“ biological flux in cells ( e.g. Heinzle, Huber, UdS) one needs

additional (experimental) information,

or one may impose constraints

on the magnitude of each individual metabolic flux.

The intersection of the nullspace and the region defined by those linear inequalities

defines a region in flux space = the feasible set of fluxes.

iii v



Feasible solution set for a metabolic reaction network

(A) The steady-state operation of the metabolic network is restricted to the region

within a cone, defined as the feasible set. The feasible set contains all flux vectors

that satisfy the physicochemical constrains. Thus, the feasible set defines the

capabilities of the metabolic network. All feasible metabolic flux distributions lie

within the feasible set, and

(B) in the limiting case, where all constraints on the metabolic network are known,

such as the enzyme kinetics and gene regulation, the feasible set may be reduced

to a single point. This single point must lie within the feasible set.

Edwards & Palsson PNAS 97, 5528 (2000)



E.coli in silico

Edwards & Palsson

PNAS 97, 5528 (2000)

Best studied cellular system: E. coli.

In 2000, Edwards & Palsson constructed an in silico representation of E.coli

metabolism. Involves lot of manual work!

- genome of E.coli MG1655 is completely sequenced,

- biochemical literature, genomic information, metabolic databases EcoCyc,

KEGG.

Because of long history of E.coli research, there is biochemical or genetic

evidence for every metabolic reaction included in the in silico representation,

and in most cases, there exists both.



Edwards & Palsson

PNAS 97, 5528 (2000)

Genes included in in silico model of E.coli



E.coli in silico

Edwards & Palsson, PNAS 97, 5528 (2000)

Define i = 0 for irreversible internal fluxes,

i = - for reversible internal fluxes (use biochemical literature)

Transport fluxes for PO42-, NH3, CO2, SO4

2-, K+, Na+ was unrestrained.

For other metabolites

except for those that are able to leave the metabolic network (i.e. acetate, ethanol,

lactate, succinate, formate, pyruvate etc.)

Find particular metabolic flux distribution with feasible set by linear programming.

LP finds a solution that minimizes a particular metabolic objective (subject to the

imposed constraints) –Z where

maxii vv 0

vcii vcZ

In fact, the method finds the solution that maximizes fluxes = gives maximal

biomass.



E.coli in silico

Edwards & Palsson

PNAS 97, 5528 (2000)

Examine changes in the metabolic capabilities caused by hypothetical gene

deletions.

To simulate a gene deletion, the flux through the corresponding enzymatic

reaction was restricted to zero.

Compare optimal value of mutant (Zmutant) to the „wild-type“ objective Z

to determine the systemic effect of the gene deletion.

Z

Zmutant



Gene deletions in E. coli MG1655 central intermediary metabolism

Maximal biomass yields on glucose for all possible single gene deletions in the central

metabolic pathways (gycolysis, pentose phosphate pathway (PPP), TCA, respiration).

The results were generated in a simulated aerobic environment with glucose as the carbon

source. The transport fluxes were constrained as follows: glucose = 10 mmol/g-dry weight

(DW) per h; oxygen = 15 mmol/g-DW per h.

The maximal yields were calculated by using FBA with the objective of maximizing growth.

The biomass yields are normalized with respect to the results for the full metabolic

genotype. The yellow bars represent gene deletions that reduced the maximal biomass yield

to less than 95% of the in silico wild type.




Interpretation of gene deletion results

The essential gene products were involved in the 3-carbon stage of glycolysis, 3

reactions of the TCA cycle, and several points within the PPP.

The remainder of the central metabolic genes could be removed while E.coli in

silico maintained the potential to support cellular growth.

This suggests that a large number of the central metabolic genes can be removed

without eliminating the capability of the metabolic network to support growth under

the conditions considered.




E.coli in silico

Edwards & Palsson

PNAS 97, 5528 (2000)

+ and – means growth or no growth.

means that suppressor mutations have

been observed that allow the mutant

strain to grow.

glc: glucose, gl: glycerol, succ:

succinate, ac: acetate.

In 68 of 79 cases, the prediction is

consistent with exp. predictions.

Red and yellow circles are the predicted

mutants that eliminate or reduce growth.



Rerouting of metabolic fluxes

(Black) Flux distribution for the wild-type.

(Red) zwf- mutant. Biomass yield is 99% of

wild-type result.

(Blue) zwf- pnt- mutant. Biomass yield is 92% of

wildtype result. The solid lines represent

enzymes that are being used, with the

corresponding flux value noted.

Note how E.coli in silico circumvents removal of

one critical reaction (red arrow) by increasing

the flux through the alternative G6P P6P

reaction.




SummaryFBA analysis constructs the optimal network utilization simply using

stoichiometry of metabolic reactions and capacity constraints.

For E.coli the in silico results are consistent with experimental data.

FBA shows that in the E.coli metabolic network there are relatively few critical

gene products in central metabolism.

However, the the ability to adjust to different environments (growth conditions) may

be dimished by gene deletions.

FBA identifies „the best“ the cell can do, not how the cell actually behaves under a

given set of conditions. Here, survival was equated with growth.

FBA does not directly consider regulation or regulatory constraints on the

metabolic network. This can be treated separately (see future lecture).


18. lecture ws 2004/05bioinformatics iii1 v18 metabolic networks - introduction different levels for...

Documents

coli genome

metabolic reactions

enzymes of e

metabolic map of e

coli metabolisme

coli smallmolecule metabolism

ecocyc analysis of e

overview diagram of