Instituto de Biotecnología, Universidad Nacional Autónoma de México
A network perspective on the evolution of metabolism by gene duplication
J. Javier Díaz-Mejía, Ernesto Pérez-Rueda & Lorenzo Segovia
NetSci 2007 NY, USA
How have metabolic networks originated and evolved?
http://genomebiology.com/2007/8/2/R26
Gene duplication is recognized as a main source of biological variation and innovation
Example of duplication: Lactate dehydrogenase (LDH), EC: 1.1.1.28, and Malate dehydrogenase (MDH), EC: 1.1.1.37; both reactions use NAD+/NADH
Enzyme Commission reaction classification:
1.- Oxidoreductases
2.- Transferases
3.- Hydrolases
4.- Lyases
5.- Isomerases
6.- Ligases
Two pioneer models linking gene duplication and evolution of metabolism
“stepwise” (Horowitz, 1945)
“patchwork” (Jensen, 1976)
[Figure: Metabolic pathway 1 (a, b, c, d) and Metabolic pathway 2 (e, f, g), with reactions labelled by the illustrative EC numbers 2.3.4.5, 1.2.2.1, 6.3.1.1, 2.3.7.8 and 4.5.6.7]
            stepwise      patchwork
distance    consecutive   distant
chemistry   dissimilar    similar
Peptidoglycan biosynthesis: stepwise or patchwork?
1.- UDP-N-acetylmuramate + L-alanine + ATP → UDP-N-acetylmuramoyl-L-alanine (6.3.2.8)
2.- UDP-N-acetylmuramoyl-L-alanine + D-glutamate + ATP → UDP-N-acetylmuramoyl-L-alanyl-D-glutamate (6.3.2.9)
3.- UDP-N-acetylmuramoyl-L-alanyl-D-glutamate + meso-diaminopimelate + ATP → UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-meso-2,6-diaminoheptanedioate (6.3.2.13)
4.- UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-meso-2,6-diaminoheptanedioate + D-alanyl-D-alanine + ATP → UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-meso-2,6-diaminoheptanedioate-D-alanyl-D-alanine (6.3.2.15)
            stepwise      patchwork
distance    consecutive   distant
chemistry   dissimilar    similar
Z-score: $Z_i = \dfrac{N^{\mathrm{real}}_i - \langle N^{\mathrm{rand}}_i \rangle}{\mathrm{std}(N^{\mathrm{rand}}_i)}$
[Figure axes: Reaction type 1 (EC:a.b.-.-) vs. Reaction type 2 (EC:w.x.-.-)]
The origin of several preferentially coupled reactions could be explained by both stepwise and patchwork
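The Z-score used to call a pair of reaction types preferentially coupled can be sketched as follows. This is a minimal illustration, assuming that "std" means the sample standard deviation and that the random counts come from some set of randomized networks; the function name and the toy numbers are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def z_score(n_real, n_rand):
    """Z = (N_real - <N_rand>) / std(N_rand) for one pair of reaction types.

    n_real: count observed in the real network.
    n_rand: counts of the same pair in randomized networks (two or more).
    Assumption: std is the sample standard deviation.
    """
    spread = stdev(n_rand)
    if spread == 0:
        raise ValueError("random counts have zero variance; Z is undefined")
    return (n_real - mean(n_rand)) / spread


# Toy numbers (hypothetical): 12 couplings observed, 4, 6 and 8 in random networks.
example = z_score(12, [4, 6, 8])
```

A large positive value means the coupling is seen more often than in the randomized networks, a large negative value less often.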
Question:
Do both the distance and the chemical similarity between reactions influence the retention of duplicates?
... forget the names of the models
Methodology
Detection of duplicates comparing enzyme sequences
1880 EC numbers, 4500 sequences
EC:1.1.1.35, EC:1.1.1.100, EC:4.2.1.17
Determining the Minimal Path Length (MPL)
[Figure: enzyme network with nodes E1, E2, E3, E4, E6, E7 and E8 linked through metabolites a, b, c and d, and a table listing the MPL of each enzyme pair]
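The Minimal Path Length between enzymes can be computed with a breadth-first search. The sketch below is only an illustration: it assumes an undirected, unweighted enzyme network, and the toy adjacency is hypothetical, loosely reusing the node names of the slide rather than reproducing its figure.

```python
from collections import deque


def minimal_path_lengths(adjacency, source):
    """Breadth-first search: minimal number of edges from source to each node.

    adjacency: dict mapping a node to an iterable of its neighbours.
    Nodes that cannot be reached from source are absent from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical toy network (undirected): E2-E1, E1-E4, E1-E6, E4-E3.
TOY_NETWORK = {
    "E1": ["E2", "E4", "E6"],
    "E2": ["E1"],
    "E3": ["E4"],
    "E4": ["E1", "E3"],
    "E6": ["E1"],
}
mpl_from_e2 = minimal_path_lengths(TOY_NETWORK, "E2")
```

Running the search once per enzyme gives the MPL for every enzyme pair.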
The preferential coupling of reactions partially explains the increased retention of duplicates between closer reactions
[Figure: rewiring of the real network into a null model (functionally similar); axes: Reaction type 1 (EC:a.b.-.-) vs. Reaction type 2 (EC:w.x.-.-)]
[Figure: retention of duplicates (%), from 0 to 40, vs. distance between nodes (enzymes), from 1 to 8 and for all distances, in the real network and in the null models]
ALL: all-against-all reactions
CDR: chemically dissimilar reactions
CSR: chemically similar reactions
[Figure: rewiring of the real network into the null model (Maslov-Sneppen)]
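The retention of duplicates at each distance can be tallied as the percentage of enzyme pairs at that distance whose members are duplicates. This is a minimal sketch under that assumed definition; the data structures and toy values are hypothetical.

```python
from collections import defaultdict


def retention_by_distance(pair_distance, duplicate_pairs):
    """Percentage of enzyme pairs at each distance that are duplicates.

    pair_distance: dict mapping frozenset({enzyme_a, enzyme_b}) -> distance.
    duplicate_pairs: set of frozensets whose two enzymes are duplicates.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pair, distance in pair_distance.items():
        totals[distance] += 1
        if pair in duplicate_pairs:
            hits[distance] += 1
    return {d: 100.0 * hits[d] / totals[d] for d in sorted(totals)}


# Hypothetical toy input: four pairs at distance 1, two pairs at distance 2.
toy_distances = {
    frozenset(("E1", "E2")): 1,
    frozenset(("E1", "E4")): 1,
    frozenset(("E1", "E6")): 1,
    frozenset(("E3", "E4")): 1,
    frozenset(("E2", "E4")): 2,
    frozenset(("E2", "E6")): 2,
}
toy_duplicates = {frozenset(("E1", "E2"))}
toy_retention = retention_by_distance(toy_distances, toy_duplicates)
```

The same function applied to rewired networks gives the null-model values to compare against.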
The increased retention of duplicates between closer reactions is reflected in lower evolutionary distances within modules
1.- Detection of functional modules
2.- A greater retention of duplicates between pathways implies a lower evolutionary distance (ED)
[Colour scale: ED values 1.00, 0.67, 0.33, 0.00]
3.- Significance of ED values: Z-score against random shuffling of protein domain content
[Colour scale: Z-score > 3, 2, 1, 0, -1, -2, < -3]
Summary
• In metabolic networks, the closer two reactions are, the greater the probability (~2-3 fold) that their enzymes are duplicates
This can be partially explained by the preferential biochemical coupling of reactions
This is reflected in (or caused by) a high retention of duplicates within modules
• Retention of duplicates between chemically similar reactions is greater (~7 fold) than between chemically dissimilar ones. In both cases, however, the observed frequencies are significantly greater than expected
• These two properties are additive. Hence, the retention of duplicates catalyzing consecutive, chemically similar reactions is ~35%
[Figure: reactions labelled with the EC numbers A.B.c.d, A.B.e.f and A.B.g.h]
Conclusions
• In silico modeling of the origin and evolution of metabolism is improved by the inclusion of specific functional constraints, such as the preferential biochemical coupling of reactions
• We suggest that the stepwise and patchwork models are not independent of each other: in fact, the network perspective enables us to reconcile and combine these models
Acknowledgments
Lic. Gerardo May (Univ. Aut. Yucatán, México)
Dr. L. Segovia’s lab (UNAM, México)
Dr. Sergio Encarnación (UNAM, México)
Dr. A-L Barabási’s lab (Univ. Notre Dame)
Dr. Virginia Walbot (Univ. of Stanford)
Sponsors
National Science and Technology Council (México)
UNAM Graduate Student Office
More details: http://genomebiology.com/2007/8/2/[email protected]
This phenomenon is characteristic of enzymatic networks
[Figure: retention of duplicates (%) vs. distance between proteins (transcription factor to regulated gene), from 1 to 8 and for all distances, in the gene transcriptional regulatory network from E. coli; Shen-Orr SS et al. (2002) Nat Genet]
[Figure: retention of duplicates (%) vs. distance between proteins, from 1 to 8 and for all distances, in the protein-protein interaction network from E. coli; Butland G et al. (2005) Nature]
ALL: all interactions
EC-EC: enzyme-enzyme interactions
P-P: non-enzymatic interactions
Some basic network topological properties
• Nodes and Edges
• Minimal Path Length
• Modularity
[Figure: Mexico City’s subway network]
Peptidoglycan biosynthesis: stepwise or patchwork?
[Figure: E. coli K12 genes murE, murF, mraY, ftsW, murD, murG, murC, ddlB, ddlA and folC]
Four consecutive ATP-dependent ligations convert UDP-N-acetylmuramate into UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-meso-2,6-diaminoheptanedioate-D-alanyl-D-alanine: 6.3.2.8 (adds L-alanine), 6.3.2.9 (adds D-glutamate), 6.3.2.13 (adds meso-diaminopimelate) and 6.3.2.15 (adds D-alanyl-D-alanine)
From a network perspective, the traditional stepwise vs. patchwork models are conceptually flawed
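[Figure: three connected pathways sharing 5-phosphoribosyl 1-pyrophosphate]
5-phosphoribosyl 1-pyrophosphate biosynthesis I: PrsA (EC:2.7.6.1), D-ribose-5-phosphate + ATP → 5-phosphoribosyl 1-pyrophosphate + AMP
purine nucleotides de novo biosynthesis I: PurF (EC:2.4.2.14), 5-phosphoribosyl 1-pyrophosphate + L-glutamine + H2O → 5-phosphoribosylamine + L-glutamate + Pi
salvage pathways of guanine, xanthine, and their nucleosides: Gpt (EC:2.4.2.22), 5-phosphoribosyl 1-pyrophosphate + xanthine → xanthosine-5-phosphate + Pi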
Retention of duplicates as groups and single entities
Fatty acids metabolism
[Figure: two parallel reaction cycles. DEGRADATION acts on acyl-CoA intermediates (R-CH2-CH2-CO-SCoA, R-CH=CH-CO-SCoA, R-CHOH-CH2-CO-SCoA, R-CO-CH2-CO-SCoA), shortens the chain to R(n-2), is labelled with CoA, FAD, FADH, H2O, NAD and NADH, and feeds ATP synthesis through Acetyl-CoA. BIOSYNTHESIS acts on the matching acyl-[ACP] intermediates, extends the chain to R(n+2), is labelled with FAD, FADH, H2O, NADP and NADPH, and feeds phospholipid biosynthesis. EC numbers shown: 1.1.1.35, 1.1.1.100, 4.2.1.61, 4.2.1.17, 1.3.99.3, 1.3.1.9, 2.3.1.16, 2.3.1.41, 6.2.1.3, 6.2.1.20]
Both groups and single duplicates are significantly retained
[Figure: retention of duplicates (%), from 0 to 100, for cases (I) to (V) of gene duplication vs. no gene duplication among enzymes E1 to E6 and their duplicates E2', E3', E4' and E5', in the EcoCyc, EcoKegg, MetaCyc and RefKegg datasets]
Null models generation (Maslov-Sneppen)
[Figure: rewiring of the real network into the Maslov-Sneppen model]
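A Maslov-Sneppen style null model is commonly built by repeated degree-preserving edge swaps. The sketch below illustrates that general idea for an undirected simple graph; the function names, the retry limit and the ring-shaped toy network are assumptions made for illustration, not details taken from the study.

```python
import random


def degree_sequence(edges):
    """Degree of every node in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Swap edge ends pairwise: (a, b), (c, d) -> (a, d), (c, b).

    A swap is rejected if it would create a self-loop or a duplicate edge,
    so every node keeps its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumed cap so the loop always terminates
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: the swap would create a self-loop or no change
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Hypothetical toy network: a ring of eight nodes.
ring = [(i, (i + 1) % 8) for i in range(8)]
rewired = rewire_preserving_degrees(ring, n_swaps=20, seed=1)
```

Because only edge ends are exchanged, hubs stay hubs while the identity of each node's neighbours is randomized.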
New null models now include the preferential biochemical coupling of reactions
[Figure: rewiring of the real network into the null model, with nodes labelled EC:1.1.1.5, EC:1.1.4.7, EC:1.1.1.5, EC:2.1.1.1, EC:2.1.4.3 and EC:3.5.4.1 in both networks]
Hub influence on gene duplication
[Figure: enzyme recruitment rate (%) vs. distance between nodes (enzymes), in four panels: EcoKegg, EcoCyc, MetaCyc and RefKegg]
Metabolic networks can be represented by diverse graph types
[Figure: the same reactions drawn as compound-centric, enzyme-centric and bipartite graphs; compounds G6P, 6PGL, GPG, R5P, X5P, NADP+, NADPH and H2O; enzymes zwf, pgl, gnd and rpe]
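The three graph types can be derived from one reaction table. The sketch below is illustrative only: the substrate and product assigned to each enzyme are an assumption read off the slide's labels (G6P, 6PGL, GPG, R5P, X5P), and the cofactors NADP+, NADPH and H2O are left out to keep the example small.

```python
# Enzyme -> (substrates, products). Assumed assignment using the slide's labels;
# cofactors are omitted.
REACTIONS = {
    "zwf": (["G6P"], ["6PGL"]),
    "pgl": (["6PGL"], ["GPG"]),
    "gnd": (["GPG"], ["R5P"]),
    "rpe": (["R5P"], ["X5P"]),
}


def bipartite_edges(reactions):
    """Two node kinds: compound -> enzyme and enzyme -> compound edges."""
    edges = set()
    for enzyme, (substrates, products) in reactions.items():
        edges.update((s, enzyme) for s in substrates)
        edges.update((enzyme, p) for p in products)
    return edges


def compound_centric(reactions):
    """Nodes are compounds; an edge links each substrate to each product."""
    return {
        (s, p)
        for substrates, products in reactions.values()
        for s in substrates
        for p in products
    }


def enzyme_centric(reactions):
    """Nodes are enzymes; e1 -> e2 when a product of e1 is a substrate of e2."""
    return {
        (e1, e2)
        for e1, (_, products) in reactions.items()
        for e2, (substrates, _) in reactions.items()
        if e1 != e2 and set(products) & set(substrates)
    }
```

In the enzyme-centric form, consecutive reactions are exactly the enzyme pairs at distance 1.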
In silico models have successfully simulated the growth of networks by gene duplication
Barabási and Oltvai (2004) Nat Rev Genet
• Duplication, inheritance, divergence
• Scale-free networks have been generated this way, but the potential functionality of such networks is not assessed
Pastor-Satorras et al. (2003) J Theor Biol
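Network growth by duplication, inheritance and divergence can be sketched generically as follows. This is not the model of any of the cited papers: the seed network, the single retention probability `p_keep` and the function name are assumptions chosen to keep the illustration short.

```python
import random


def duplication_divergence(n_nodes, p_keep=0.5, seed=0):
    """Grow an undirected network by node duplication and divergence.

    Each step copies a randomly chosen node (duplication); the copy inherits
    each of the parent's links with probability p_keep (inheritance) and
    loses the others (divergence). Copies may end up with no links.
    """
    rng = random.Random(seed)
    neighbours = {0: {1}, 1: {0}}  # assumed seed network: one edge
    while len(neighbours) < n_nodes:
        new = len(neighbours)
        parent = rng.randrange(new)
        inherited = {nb for nb in neighbours[parent] if rng.random() < p_keep}
        neighbours[new] = set(inherited)
        for nb in inherited:
            neighbours[nb].add(new)
    return neighbours


toy_growth = duplication_divergence(50, p_keep=0.6, seed=42)
```

A model of this kind reproduces topology only; it says nothing about whether the resulting network could function metabolically.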
In silico models have successfully simulated the growth of networks by gene duplication
Pfeiffer, Soyer and Bonhoeffer (2005) PLoS Biol
• Duplication, inheritance, divergence
• Multifunctional enzymes and transporters
• Potential biomass production
• Reaction coupling better fits connectivity properties of real networks (existence of hubs)
Metabolite coupling is significant in metabolic networks
Becker, Price and Palsson (2006) BMC Bioinformatics
• There are biases in the coupling of specific metabolites
• These biases follow a power-law distribution
Some emerging topological properties of networks
Barabási and Oltvai (2004) Nat Rev Genet
• scale free
• clustering
• hierarchical
Essentiality and damage in metabolic networks
Papp et al. (2004) Nature
Lemke et al. (2004) Bioinformatics
The small world in large networks
Watts and Strogatz (1998) Nature
• Short distance between nodes
• High clustering coefficient
Clustering: $C_i = \dfrac{2 n_i}{k_i (k_i - 1)}$
$n_i$: direct edges between the neighbors of $i$
$k_i$: number of neighbors of $i$
[Figure: random and small-world networks, with the worked values $C = \frac{20}{5(4)} = 1$ and $C = \frac{1}{5(4)} = 0.05$]
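The clustering coefficient can be computed directly from its definition, C_i = 2 n_i / (k_i (k_i - 1)). The sketch below assumes an undirected network stored as a dictionary of neighbour sets; the two toy networks are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(neighbours, i):
    """C_i = 2 * n_i / (k_i * (k_i - 1)).

    k_i: number of neighbours of node i.
    n_i: number of edges that directly link two neighbours of i.
    Returns 0.0 when i has fewer than two neighbours (C is then undefined).
    """
    nbrs = neighbours[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    n_i = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
    return 2 * n_i / (k * (k - 1))


# Toy case 1: node 0 has five neighbours that are all linked to each other,
# so n_i = 10 and C = 20 / (5 * 4) = 1.
clique = {n: {m for m in range(6) if m != n} for n in range(6)}
# Toy case 2: a star, where no two neighbours of node 0 are linked, so C = 0.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
```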
The analysis of biological systems from a network perspective has increased greatly in recent years
Some topological properties of metabolic networks
• Scale free: Jeong et al. (2000) Nature
• Modularity: Ravasz et al. (2002) Science
• small world
• universality
• scale free
• hub elimination
• modularity