doi:10.1016/j.jmb.2006.02.019 J. Mol. Biol. (xxxx) xx, 1–20
ARTICLE IN PRESS
Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks
M. Madan Babu1,2*, Sarah A. Teichmann1 and L. Aravind2*
1National Center for Biotechnology Information, National Institutes of Health, MD 20894, USA
2MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
0022-2836/$ - see front matter © 2006 Published by Elsevier Ltd.
Abbreviations used: LSI, lifestyle similarity index.
E-mail addresses of the corresponding authors: [email protected]; aravi[email protected]
The structure of complex transcriptional regulatory networks has been studied extensively in certain model organisms. However, the evolutionary dynamics of these networks across organisms, which would reveal important principles of adaptive regulatory changes, are poorly understood. We use the known transcriptional regulatory network of Escherichia coli to analyse the conservation patterns of this network across 175 prokaryotic genomes, and predict components of the regulatory networks for these organisms. We observe that transcription factors are typically less conserved than their target genes and evolve independently of them, with different organisms evolving distinct repertoires of transcription factors responding to specific signals. We show that prokaryotic transcriptional regulatory networks have evolved principally through widespread tinkering of transcriptional interactions at the local level by embedding orthologous genes in different types of regulatory motifs. Different transcription factors have emerged independently as dominant regulatory hubs in various organisms, suggesting that they have convergently acquired similar network structures approximating a scale-free topology. We note that organisms with similar lifestyles across a wide phylogenetic range tend to conserve equivalent interactions and network motifs. Thus, organism-specific optimal network designs appear to have evolved due to selection for specific transcription factors and transcriptional interactions, allowing responses to prevalent environmental stimuli. The methods for biological network analysis introduced here can be applied generally to study other networks, and these predictions can be used to guide specific experiments.
© 2006 Published by Elsevier Ltd.
Keywords: transcriptional regulatory network; evolution; network; network motif; regulation
*Corresponding authors
Introduction
Of the several steps at which the flow of information from a gene to its protein product is controlled, regulation at the transcriptional level is a fundamental mechanism observed in all organisms. This form of regulation is typically mediated by a DNA-binding protein (transcription factor) that binds to target sites in the genome and, either singly or in combination with other factors, regulates the expression of one or more target genes. The sum total of such transcriptional interactions in an organism can be conceptualised as a network, and is termed the transcriptional regulatory network.1–11 In such a network, nodes represent genes and edges represent regulatory interactions. Studies on the transcriptional regulatory network at an abstract level have shown that such networks have architectures resembling scale-free networks, with striking structural and topological similarity to other networks from biological and non-biological systems. They are characterized by the recurrence of small patterns of interconnections, called network motifs,3–5,12–16 which were first defined in Escherichia coli,3 and were subsequently found in yeast and other organisms.2,4,12
Even though the general structural properties of transcriptional networks are well understood, there are several fundamental questions regarding the provenance and evolution of transcriptional regulatory networks that remain unanswered: What are the trends of conservation of transcription factors, target genes and regulatory interactions in the network? How do interactions that specify topologically equivalent motifs within the network evolve? How does the global structure of the network evolve? We addressed these questions by using the experimentally determined transcriptional regulatory network of E. coli as a reference network,3,17 and performing a comparative genomic analysis to predict components of the regulatory network for 175 prokaryotes with completely sequenced genomes, from diverse lineages of the bacterial and archaeal kingdoms (a list of the genomes is provided as Supplementary Data).
Reconstruction of transcriptional regulatory networks
While there has been considerable progress in unravelling the regulatory networks of various model organisms such as E. coli, the extrapolation of this information to poorly studied organisms, whose complete genome sequences are now available, remains a major challenge. Target genes for a transcription factor can be identified by using sequence profiles of known binding sites across different organisms and by arriving at a set of genes with conserved regulatory sequences. However, this method requires prior knowledge about binding sites, and is applicable only to closely related genomes, since orthologous transcription factors may regulate orthologous target genes through very divergent binding sites in distantly related organisms.18–21
Alternatively, using an experimentally characterized transcriptional network as template, one can infer transcriptional targets of a regulator in a genome of interest by identifying orthologues of transcription factors and their target genes. It is now generally accepted that in the majority of cases, orthologous transcription factors regulate orthologous target genes. This procedure of transferring information about transcriptional regulation from a genome with known regulatory interactions to another genome by identifying orthologous proteins was assessed recently by Yu et al.,22 and was found to be a fairly robust method for predicting such interactions in eukaryotes. In fact, similar approaches, based on orthologue detection using a bi-directional best-hit procedure,23–28 have been developed successfully to transfer information on interactions to other organisms and have proved to be useful in predicting new interactions.
Detecting orthologues is a non-trivial process. After testing various orthologue detection procedures (e.g. bi-directional best-hit and best hits with defined e-value cut-offs), we arrived at a hybrid procedure, which was used to identify orthologous proteins in a genome. Using this approach, we predicted components of the transcriptional networks. With the best-characterized transcriptional regulatory network currently available, that of E. coli with 755 genes (112 transcription factors) and 1295 transcriptional interactions, as a template, we used our orthologue detection procedure and predicted transcriptional interaction networks, for the first time, for 175 prokaryotic genomes (Figure 1; Supplementary Data M1, M2 and S3). Our method is based on identifying the orthologues of E. coli transcription factors and the orthologues of E. coli target genes in each of the prokaryote genomes using protein sequence comparisons, without considering conservation of the transcription factor-binding sites in the DNA. We chose the orthology-based method rather than using binding site information because using binding sites (i) would place an additional constraint and would drastically reduce the size of the E. coli network that we considered (only a few transcription factors in the network have enough experimentally characterized binding site information to build a reliable profile) and (ii) would limit the number of genomes that can be compared, as reliable detection of these short binding sites in distantly related organisms is not possible, and may hence bias our analysis.20 It should be noted that the method we develop can, in principle, be applied to any reference network, and we chose the E. coli network because it is the most comprehensive when compared to networks available for other prokaryotes.
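The transfer step described above can be illustrated with a minimal sketch. It assumes that an orthologue map from reference genes to genes of the target genome has already been computed; the hybrid orthologue detection procedure itself is not reproduced here. The function name `transfer_interactions` and all gene identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Interaction = Tuple[str, str]  # (transcription factor, target gene)


def transfer_interactions(
    reference: List[Interaction],
    orthologues: Dict[str, str],
) -> List[Interaction]:
    """Keep a reference interaction only if both partners have an orthologue.

    `orthologues` maps a reference gene to its orthologue in the genome of
    interest; genes without a detected orthologue are simply absent.
    """
    predicted = []
    for tf, target in reference:
        if tf in orthologues and target in orthologues:
            predicted.append((orthologues[tf], orthologues[target]))
    return predicted


# Toy reference network and orthologue map (hypothetical identifiers).
reference_network = [
    ("tfA", "gene1"),
    ("tfA", "gene2"),
    ("tfB", "gene2"),
    ("tfB", "gene3"),
]
orthologue_map = {"tfA": "q_tfA", "gene1": "q_1", "gene2": "q_2", "gene3": "q_3"}

# tfB has no orthologue, so both of its interactions are dropped.
predicted_network = transfer_interactions(reference_network, orthologue_map)
print(predicted_network)
```

In this sketch a target gene can be conserved while all of its reference regulators are missing, which is the situation in which a non-orthologous regulator may have taken over in the other genome.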
The transcriptional regulatory network for E. coli is shown in Figure 1, along with the networks for a Gram-positive mammalian pathogen, Bacillus anthracis and a free-living organism, Streptomyces coelicolor, which were reconstructed using the E. coli network as the reference network. We wish to emphasize that we do not predict new regulatory interactions that have been gained in the other organisms, and no current computational methods allow us to do so in the absence of other external information such as gene expression data. However, we can still predict potentially conserved interactions, apart from conserved regulatory components and genes, which can shed light on network evolution (see Supplementary Data S17).
Figure 1. Known transcriptional regulatory network for E. coli and reconstructed transcriptional regulatory networks for a pathogen, Bacillus anthracis and a free-living organism, Streptomyces coelicolor. Transcription factors and target genes are represented as red and blue circles, and transcriptional interactions are represented as black lines. The transcription factors are ordered according to their connectivity to emphasize the scale-free-like structure of the network at the global level, where there are few dominant regulatory hubs that control many target genes and many transcription factors that regulate few target genes. Information about the number of transcription factors, target genes, regulatory interactions and regulatory network motifs is provided at the right.2,4 This Figure illustrates how we can reconstruct transcriptional networks for genomes, including those of organisms that are poorly characterised but important. The table below shows the number of transcription factors that have a DNA-binding domain belonging to a particular family. It is clear by comparing the numbers that each genome has evolved its own set of transcriptional regulators, and hence regulatory interactions, by using the same DNA-binding domains to different extents.

Orthologue detection, which is the foundation for our network reconstruction procedure, can be confounded by rapid duplication, divergence and loss of genes. Hence, in this study, we assessed our procedure using the expression data available for Vibrio cholerae and the known regulatory network of Bacillus subtilis. We studied the extent to which target genes with the same set of known and predicted transcription factors have similar expression profiles.29 We found that co-regulated genes in E. coli, for which the transcriptional network is known, and in V. cholerae, for which predictions were based on the reconstructed network, tend to be strongly co-expressed (Supplementary Data S2). Ideally, one would want to carry out such an assessment for as many genomes as possible; however, the limited availability of meaningful gene expression data for other organisms restricts this analysis to V. cholerae. This result supports the validity of reconstructing transcriptional networks by inferring regulatory interactions between orthologous transcription factors and orthologous target genes in prokaryotes. Additionally, the experimentally determined transcriptional regulatory network of B. subtilis (a bacterium very distantly related to our reference organism E. coli) shows a good degree of congruence with the interactions predicted by our analysis, thereby lending support to the validity of our reconstruction procedure (Supplementary Data S11; note that the experimentally determined B. subtilis network is far from complete and is much smaller than the E. coli network and, hence, this comparison should not be treated as a gold standard to get false positive and false negative estimates).
Results and Discussion
Evolution of genes and their regulatory interactions
Transcription factors evolve rapidly and independently of their target genes
To assess the evolutionary trends in the network conservation, we asked if transcriptional regulators and their targets are conserved differentially in evolution. When we quantified the extent of conservation of the 755 genes in the 175 genomes, we found that transcription factors are less conserved across genomes during evolution than their target genes (Figure 2(a)). We assessed the significance of this observed bias in the conservation patterns by simulating network evolution (Supplementary Data M3), and found it to be statistically significant (p < 10⁻⁴). This suggests that the evolution of major phenotypic differences between organisms is a consequence of replacements of transcription factors, which provide regulatory inputs, rather than changes in the gene repertoire itself. The above observation is consistent with the fact that metabolic enzymes, which constitute a major set of the target genes and thus the metabolic network, are well conserved across the different organisms.30
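One way such a significance test could be set up is sketched below. This is an assumed null model for illustration, not the procedure of Supplementary Data M3: genes are retained at random, irrespective of whether they are transcription factors, and the empirical p-value is the fraction of trials that retain as few transcription factors as observed. The gene counts in the usage lines are hypothetical, apart from the 112 transcription factors among 755 genes quoted in the text.

```python
import random


def neutral_removal_p_value(n_tf, n_target, kept_tf, kept_target,
                            trials=10_000, seed=0):
    """Empirical p-value for retaining `kept_tf` or fewer transcription factors.

    Under the assumed null model, the same total number of genes is retained,
    but which genes survive is chosen uniformly at random.
    """
    rng = random.Random(seed)
    genes = [True] * n_tf + [False] * n_target  # True marks a transcription factor
    n_kept = kept_tf + kept_target
    hits = 0
    for _ in range(trials):
        sample = rng.sample(genes, n_kept)
        if sum(sample) <= kept_tf:  # booleans sum to the number of retained TFs
            hits += 1
    return hits / trials


# Hypothetical genome retaining 20 of 112 transcription factors and
# 300 of the remaining 643 genes of the reference network.
p_biased = neutral_removal_p_value(112, 643, kept_tf=20, kept_target=300, trials=2000)

# Hypothetical genome retaining transcription factors roughly in proportion.
p_neutral = neutral_removal_p_value(112, 643, kept_tf=48, kept_target=272, trials=2000)
print(p_biased, p_neutral)
```

With a finite number of trials the smallest p-value that can be resolved is 1/trials, so a bound such as p < 10⁻⁴ requires at least 10,000 trials.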
Because a regulatory interaction involves a transcription factor and its target gene, one might expect them to be present or absent as pairs. Indeed, studies on the conservation of physically interacting proteins across the different proteomes suggest that proteins forming a complex, especially interacting pairs, tend to be conserved or lost in a concerted manner.30,31 To test this, we first created two sets of gene pairs using the transcriptional regulatory network: (i) the interacting set, which consisted of all pairs of genes that are known to interact in the transcriptional network and (ii) the non-interacting set, which consisted of all possible pairs of genes in the network that are known not to interact in the original network. We then analysed the conservation pattern across the 175 organisms for all the pairs of genes in both sets. If interacting proteins are preferentially conserved or lost, one would expect the trends obtained for the two sets to be vastly different. On the other hand, if interacting proteins are conserved to the same extent as any pair of non-interacting proteins, one would expect the trends obtained for the two sets to be similar. Interestingly, in the transcriptional regulatory network studied here, the relative conservation of pairs from the two sets is close to 1 across all genomes, suggesting that there is no strong preference for pairs of interacting transcription factors and target genes to be conserved in the network (Supplementary Data M6d; and Figure 2(b)). This observation suggests that the forces of natural selection act relatively independently to retain or discard transcription factors and their targets, as opposed to protein pairs that are involved in physical interactions.
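The comparison of the two sets can be sketched as follows. The exact definition of relative conservation is given in the Supplementary Data and is not reproduced here; the sketch assumes it is the ratio of the fraction of interacting pairs with both genes present to the same fraction for non-interacting pairs. Function names and the toy data are hypothetical.

```python
from itertools import combinations


def pair_conservation(pairs, present):
    """Fraction of gene pairs for which both genes are present in a genome."""
    pairs = list(pairs)
    kept = sum(1 for a, b in pairs if a in present and b in present)
    return kept / len(pairs)


def relative_pair_conservation(interactions, genes, present):
    """Conservation of interacting pairs relative to non-interacting pairs.

    Assumes at least one non-interacting pair is conserved; otherwise the
    ratio is undefined and a ZeroDivisionError is raised.
    """
    interacting = {frozenset(pair) for pair in interactions}
    non_interacting = [
        (a, b)
        for a, b in combinations(sorted(genes), 2)
        if frozenset((a, b)) not in interacting
    ]
    return (pair_conservation(interactions, present)
            / pair_conservation(non_interacting, present))


# Toy network: five genes, three interactions, one genome lacking gene C.
genes = {"A", "B", "C", "D", "E"}
interactions = [("A", "B"), ("A", "C"), ("D", "E")]
present = {"A", "B", "D", "E"}

ratio = relative_pair_conservation(interactions, genes, present)
print(round(ratio, 3))
```

A ratio near 1 for a genome corresponds to the situation reported in the text, in which interacting pairs are retained no more often than arbitrary pairs of genes.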
Organisms have evolved their own set of transcription factors
Several bacteria, such as B. anthracis and S. coelicolor (Figure 1), have few transcription factors with orthologues in the E. coli reference network. However, these genomes are relatively large and must contain a complex transcriptional regulatory network. Hence, we searched these proteomes with sensitive sequence profile methods to identify proteins with DNA-binding domains typically observed in known transcription factors.33 In most genomes, we could identify other transcription factors belonging to the same repertoire of DNA-binding domain families as in E. coli (Figure 1), but not orthologous to the E. coli proteins. This observation indicates that new lineage-specific transcription factors, and thereby new interactions, are gained constantly during evolution. However, our current level of understanding prevents a systematic prediction of the newly gained interactions (Supplementary Data S14). For small genomes, typically those of obligate parasites with low fractions of transcription factors, no additional transcription factors were detectable (Supplementary Data S4). Consistent with recent observations by van Nimwegen, Ranea et al. and by us,34–36 the increase in the number of regulatory proteins with genome size is non-linear (Figure 2(c)). This implies that as genome size increases, a greater than proportional increase in the numbers of transcription factors is required for controlling the newly added genes. This tendency might correlate with the need to regulate specialized groups of genes individually, or with the need for integration of distinct inputs and introducing more layers in the regulatory hierarchy of the metabolically or organizationally complex bacteria with large genomes.36
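A non-linear scaling of this kind is commonly summarised by a power law y = a·x^b, where an exponent b greater than 1 corresponds to a greater than proportional increase. The sketch below estimates a and b by least squares on log-transformed values. It is only an illustration of the fitting idea: the function name is hypothetical and the input values are synthetic, not data from this study.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + b * log(x); returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
         / sum((u - mean_x) ** 2 for u in log_x))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic example: genome sizes (number of genes) and transcription factor
# counts generated from y = 1e-5 * x**2, so the fit should recover b = 2.
genome_sizes = [500, 1000, 2000, 4000, 8000]
tf_counts = [1e-5 * x ** 2 for x in genome_sizes]

a, b = fit_power_law(genome_sizes, tf_counts)
print(a, b)
```

Fitting in log-log space weights relative rather than absolute errors, which is one reasonable choice when genome sizes span more than an order of magnitude.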
These observations, taken together with the higher degree of conservation of target genes, suggest certain general principles that operate in the evolution of transcriptional networks. (1) In small parasitic genomes, transcription factors have been lost due to the absence of selective pressures for regulation. (2) In larger genomes, target genes are often controlled by additional regulators or regulators that are non-orthologous, so that genes are responsive to a variety of different inputs that are typically dependent on the environmental niche of the organism. For example, in a number of phylogenetically distant free-living bacteria, including proteobacteria, B. subtilis and Streptomyces, there is an expansion of so-called one-component transcription factors of the LysR family, which sense a wide range of small-molecule ligands. However, there are no homologues of such transcription factors in any of their close relatives, which are obligate pathogens. This is consistent with the need in all of these evolutionarily weakly related, free-living, physiologically complex bacteria to be able to sense the same set of metabolites in their environment.

Figure 2. Conservation of the transcription regulatory network and regulatory interactions. (a) For each of the 175 genomes, the fraction of target genes conserved (x-axis) is plotted against the fraction of transcription factors conserved (y-axis). The diagonal, shown in green, represents equal conservation of transcription factors and target genes, and every point represents a genome. The graph shows that more target genes are conserved than transcription factors in the genomes considered (y = 1.1173x − 16.609, R² = 0.9223, p < 10⁻⁴). The significance of this trend was assessed by simulating network evolution, which involved neutral removal of different sets of genes 10,000 times (Supplementary Data).
Organisms with similar lifestyles preserve homologous regulatory interactions
Given that the conservation of transcription factors and target genes is poorly correlated, we sought to address the extent to which transcriptional interactions are conserved across different organisms. To do this, we developed an algorithm to deduce the relationship between different genomes based on conservation of specific sets of transcriptional interactions (Supplementary Data M5). We first represented the presence or the absence of genes and interactions of the reference E. coli transcriptional network in each of the 175 prokaryotic genomes as a binary interaction conservation profile. The organisms were then hierarchically clustered on the basis of their interaction conservation profiles (Figure 3(a)).
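The two steps can be sketched on toy data as follows. The algorithm of Supplementary Data M5 is not reproduced here: the sketch encodes only interactions (not individual genes) in the profile, and the Hamming distance and naive average-linkage clustering are assumptions chosen for illustration. All names and identifiers are hypothetical.

```python
def conservation_profile(reference, present):
    """1 if both partners of a reference interaction are present, else 0."""
    return [1 if tf in present and tg in present else 0 for tf, tg in reference]


def hamming(p, q):
    """Fraction of positions at which two binary profiles differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def average_linkage(profiles):
    """Naive agglomerative clustering; returns the merge tree as nested tuples."""
    clusters = {name: [name] for name in profiles}  # cluster label -> genomes
    while len(clusters) > 1:
        labels = list(clusters)
        best = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Average distance over all genome pairs across the two clusters.
                d = sum(hamming(profiles[x], profiles[y])
                        for x in clusters[a] for y in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Toy reference network and gene content of four hypothetical genomes.
reference = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g2"), ("tfC", "g3")]
genomes = {
    "soil_1": {"tfA", "tfB", "tfC", "g1", "g2", "g3"},
    "soil_2": {"tfA", "tfB", "g1", "g2", "g3"},
    "parasite_1": {"g2"},
    "parasite_2": {"g2", "g3"},
}

profiles = {name: conservation_profile(reference, genes)
            for name, genes in genomes.items()}
tree = average_linkage(profiles)
print(tree)
```

For 175 genomes and 1295 interactions, an established clustering library would be the more practical choice; the cubic-time loop above is only meant to make the procedure explicit.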
The clustering of organisms based on the interactions present and depicted in Figure 3(b) reveals that transcriptional interactions appear to be shaped by a set of disparate forces. At low to moderate evolutionary distance from the reference organism E. coli, namely proteobacteria, the clustering generally mirrors organism-based relationships constructed using highly conserved gene sequences,37 suggesting that the regulatory network retains a noticeable phylogenetic signal. Beyond the proteobacteria, a number of other effects appear to dominate upon the background of the weak phylogenetic signal. These include the similarities in genome size (Supplementary Data S15)38,39 and ecological adaptations:40,41 for example, bacteria with comparable genome size, such as several species of Bacillus, Corynebacterium and Mycobacterium, whose principal habitat is the soil, form a cluster (Figure 3(b)). Likewise, the obligate or intracellular parasites from diverse bacterial clades, namely Mycoplasma, Rickettsiae and Chlamydiae, group together in this analysis.

Figure 2 (legend continued). (b) For each of the 175 genomes, the fraction of genes conserved (x-axis) is plotted against the fraction of pairs of genes conserved (y-axis) in the interacting set (black) and the non-interacting set (green). If interacting pairs of genes are more conserved, one would expect significant differences between the trends obtained for the two sets. The plot shows that the trends are very similar (the average relative conservation of gene pairs across 175 organisms from the two sets is 0.95), suggesting that pairs of interacting genes are conserved to an extent comparable with that of any pair of non-interacting genes. (c) The number of predicted transcription factors (y-axis) increases non-linearly with the increase in genome size (x-axis), indicating that new transcription factors evolve as genome size increases. A power-law equation y = 9e(−0.5x1.75) fits the data best (R² = 0.83).

This observation motivated us to investigate whether our finding that organisms with similar lifestyles conserve similar interactions was a general principle, or was limited to anecdotal examples. First, we systematically defined the lifestyle of each organism as a combination of four properties: oxygen requirement, optimal growth temperature, environmental condition, and whether it is a pathogen. For example, E. coli K12 would belong to the lifestyle class called facultative:mesophilic:host-associated:no. Then, we compared the similarity between organisms belonging to different lifestyle classes as a function of the interactions they have in common (Supplementary Data S12 and M10). Each element in the matrix shown in Figure 4(a) corresponds to the normalized average similarity in the interaction content across all pairs of organisms belonging to the lifestyle classes considered. The values along the diagonal, which reflect the average similarity among organisms belonging to the same lifestyle class, tend to be much greater than the off-diagonal elements. This lends support to our finding that organisms with similar lifestyles do indeed tend to conserve similar interactions.

We defined an index, called the lifestyle similarity index (LSI), to measure the strength of this trend. The LSI is calculated as the ratio of the average similarity among the diagonal elements to the average similarity of the off-diagonal elements. LSI values greater than 1 mean that organisms within the same lifestyle class have more similarity of interactions compared to organisms belonging to a different lifestyle class. Amongst the organisms included in our study, we obtained an LSI value that is well above 1 and is statistically significant (LSI = 1.42; p < 10⁻³; Z-score = 2.96). This shows that organisms belonging to the same lifestyle have a significantly higher number of interactions in common in comparison to organisms from other lifestyle classes. The enrichment in similarity in some off-diagonal elements, corresponding to organisms belonging to different lifestyle classes, is associated primarily with certain mesophilic organisms. This arises due to the fact that these lifestyle classes contain organisms with large genomes that are phylogenetically related to other specialized organisms spanning the entire spectrum of lifestyle classes. For example, B. subtilis, a mesophile, may be related to a thermophile like Bacillus stearothermophilus, a pathogen like B. anthracis, an anaerobe like Thermoanaerobacter tengcongensis and microaerophiles such as Lactobacillus. We emphasize here that each phylogenetic group does have organisms with different lifestyles and, for each lifestyle class that we define, there are organisms belonging to different phylogenetic groups. Thus, the LSI values are not overly biased by evolutionary relatedness for a lifestyle class or by the enrichment of a particular lifestyle class within a phylogenetic group (Supplementary Data S13). This proposition applies also to a subsequent analysis of lifestyle classes and motif conservation, described later.
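The LSI as defined above, the ratio of the mean diagonal element to the mean off-diagonal element of the class-by-class similarity matrix, can be computed in a few lines. The sketch assumes the matrix of average similarities between lifestyle classes is already available; the function name and the matrix values are hypothetical and are not results from this study.

```python
def lifestyle_similarity_index(matrix):
    """Ratio of the mean diagonal element to the mean off-diagonal element.

    `matrix[i][j]` is the average similarity in interaction content between
    organisms of lifestyle class i and organisms of lifestyle class j.
    """
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    mean_diagonal = sum(diagonal) / len(diagonal)
    mean_off_diagonal = sum(off_diagonal) / len(off_diagonal)
    return mean_diagonal / mean_off_diagonal


# Hypothetical similarity matrix for three lifestyle classes.
similarity = [
    [0.60, 0.40, 0.30],
    [0.40, 0.70, 0.50],
    [0.30, 0.50, 0.80],
]

lsi = lifestyle_similarity_index(similarity)
print(round(lsi, 2))
```

A value above 1, as in this toy matrix, indicates that organisms sharing a lifestyle class are on average more similar to each other than to organisms from other classes.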
Thus, on the basis of the information about conservation of specific transcription factors and their interactions, we were able to potentially reconstruct the presence of transcriptional response pathways similar to those in the reference network in various other genomes, and understand their evolution (Supplementary Data S5, S10, M4, M5). In combination with other context-based methods,42,43 these reconstructions could aid in probing poorly characterized or experimentally intractable organisms, including key human pathogens. For example, important pathogens like Yersinia pestis, Pseudomonas syringae and a nitrogen-fixing symbiont of the soybean plant, Bradyrhizobium japonicum, have conserved the GntR, KdgR, ExuR and UxuR transcriptional regulators. These regulators can sense different hexuronates and hexuronides, suggesting the conservation of the pathway that can catabolize these compounds. Examination of target gene conservation shows that the target genes are indeed conserved, thus allowing us to predict the presence of these response pathways and their transcriptional regulators. Extending our approach with other reference networks in future could identify potential targets for therapeutic intervention in unrelated pathogens that share a similar ecological niche.
Figure 3. Regulatory interaction conservation in genomes. (a) A method to analyse conservation of regulatory interactions in the different genomes. (b) Genomes were clustered according to the interactions they conserve. The Figure shows that: (i) genomes in the same phylogenetic group generally cluster together; (ii) parasitic genomes cluster together; and (iii) genomes with similar lifestyle but belonging to different phylogenetic groups cluster together. This suggests that interactions are gained or lost depending on the organism's lifestyle and the environmental conditions in which it lives. Reconstructed transcriptional network for a representative organism from each group is given at the side.

Evolution of local network structure

At the local level, the transcription regulatory network shows recurrent topological patterns of interconnections called network motifs, which can be viewed as building blocks of the network.2–4 Such network motifs have been suggested to carry out specific information processing tasks and hence dictate specific patterns of gene expression.6,44 For example, the function of feed-forward motifs is to respond only to persistent signals, and that of single-input motifs is to bring about a quick and coordinated change in gene expression. We investigated whether natural selection acts at the level of motifs and whether the interactions, which are conserved across organisms, correspond to network motifs in the reference network.
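As an illustration of what identifying a motif involves, the sketch below enumerates feed-forward motifs in a directed network, taking a feed-forward motif to be three distinct genes X, Y and Z with interactions X→Y, Y→Z and X→Z. This is a minimal example on a toy edge list with hypothetical names, not the motif detection procedure used for the reference network.

```python
from collections import defaultdict


def feed_forward_motifs(edges):
    """Return all (x, y, z) with edges x->y, y->z and x->z among distinct genes."""
    targets = defaultdict(set)
    for regulator, target in edges:
        if regulator != target:  # autoregulation is not part of the motif
            targets[regulator].add(target)
    motifs = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            for z in sorted(targets.get(y, ())):
                # z must differ from x and also be a direct target of x.
                if z != x and z in targets[x]:
                    motifs.append((x, y, z))
    return motifs


# Toy network: X regulates Y and Z, Y regulates Z and W, X autoregulates.
toy_edges = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Y", "W"), ("X", "X")]

motifs = feed_forward_motifs(toy_edges)
print(motifs)
```

Given such a list for the reference network, a motif can be scored as conserved in another genome when all three genes and all three interactions are predicted to be present there.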
In the E. coli network, there are 277 feed-forward motifs, 43 single-input motifs and 70 multiple-input motifs. To evaluate motif conservation in the network, we devised a new method (Supplementary Data M8; and Figure 5) that clusters topologically equivalent motifs according to their conservation profile. First, we generated a motif conservation profile for each genome, and then carried out a two-way clustering procedure: a k-means clustering of all motifs with similar distribution profile across genomes, followed by a hierarchical clustering of genomes on the basis of their motif conservation profile (Supplementary Data S6 and S7).
Interactions within motifs are conserved to the same extent as other interactions
Contrary to our expectation, the analysis showed that there is no significant conservation of whole motifs when compared to random networks of similar size. Interestingly, within coherent monophyletic lineages of prokaryotes with different lifestyles, particular network motifs may not be conserved, whereas genomes belonging to unrelated phylogenetic groups but with similar lifestyle conserve these motifs. For example, Fnr (a global regulator, activated during low levels of oxygen), NarL (transcriptional regulator of a two-component signal transduction system) and NuoN (subunit of the NADH dehydrogenase complex I) form a feed-forward motif,
Figure 4. Organisms with similar lifestyles show similarity in interaction and motif content. Each element in the matrix represents (a) the average similarity in the interaction content or (b) the average similarity in the motif content among all pairs of organisms belonging to the two lifestyle classes considered. The lifestyle similarity index (LSI) is the ratio of average similarity between organisms in the same lifestyle class to the average similarity between organisms belonging to a different lifestyle class, i.e. the ratio of the average value of the diagonal elements to the average value of the off-diagonal elements. LSI > 1 suggests that organisms with similar lifestyles tend to show similar interactions or network motif contents. The LSI values are 1.42 and 1.34 for the similarity based on (a) interactions and (b) network motifs. The matrix shown is not symmetric because each element has been row-normalized to highlight the trend. This is done for illustrative purposes only, and will not affect the LSI values. (For details, see Supplementary Data S12 and M10.)
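The definition of the LSI given in the legend to Figure 4 can be written out as a short calculation. The sketch below is an illustration under stated assumptions, not the method of Supplementary Data M10. The pairwise similarity scores, the organism names and the lifestyle labels (`class_A`, `class_B`) are hypothetical, and each lifestyle class is assumed to contain at least two organisms so that a within-class average exists.

```python
from itertools import product

def class_similarity_matrix(similarity, lifestyle):
    """Average pairwise similarity between every pair of lifestyle classes.

    similarity: dict mapping frozenset({org1, org2}) -> similarity score
    lifestyle:  dict mapping organism -> lifestyle class
    """
    classes = sorted(set(lifestyle.values()))
    members = {c: sorted(o for o in lifestyle if lifestyle[o] == c) for c in classes}
    matrix = []
    for a in classes:
        row = []
        for b in classes:
            # All ordered pairs of distinct organisms drawn from the two classes.
            pairs = [(x, y) for x, y in product(members[a], members[b]) if x != y]
            row.append(sum(similarity[frozenset(p)] for p in pairs) / len(pairs))
        matrix.append(row)
    return classes, matrix

def lifestyle_similarity_index(matrix):
    """Mean of the diagonal elements divided by the mean of the off-diagonal elements."""
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return (sum(diagonal) / len(diagonal)) / (sum(off_diagonal) / len(off_diagonal))

# Hypothetical toy data: four organisms in two lifestyle classes.
lifestyle = {"orgA1": "class_A", "orgA2": "class_A", "orgB1": "class_B", "orgB2": "class_B"}
similarity = {
    frozenset({"orgA1", "orgA2"}): 0.8,
    frozenset({"orgB1", "orgB2"}): 0.6,
    frozenset({"orgA1", "orgB1"}): 0.5,
    frozenset({"orgA1", "orgB2"}): 0.4,
    frozenset({"orgA2", "orgB1"}): 0.3,
    frozenset({"orgA2", "orgB2"}): 0.2,
}

classes, matrix = class_similarity_matrix(similarity, lifestyle)
lsi = lifestyle_similarity_index(matrix)
print(classes, matrix, lsi)
```

An LSI above 1, as in this toy case, means that organisms within a class resemble each other more than organisms from different classes. The significance of an observed value would still have to be assessed against a randomised assignment of lifestyles.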
Figure 5. Network motif conservation in genomes. (a) A procedure to build and cluster motif conservation profiles to identify topologically equivalent motifs that are conserved in the different genomes. (b) Cluster diagram of the 277 feed-
which is not conserved in other γ-proteobacteria, whereas it is found in several distantly related genomes (Figure 6(a) and (b)). These results suggested that organisms belonging to the same lifestyle have conserved similar network motifs. To test the generality of this observation, we computed the similarity in motif conservation among organisms belonging to different lifestyles (Supplementary Data S12). Figure 4(b), which represents the similarity in network motif conservation among organisms belonging to different lifestyles, again shows that the diagonal elements (average similarity in motif content among organisms within the same lifestyle class) are much greater than the off-diagonal elements (average similarity in motif content among organisms belonging to different lifestyle classes). In this case, the LSI value is 1.34 ($p < 3 \times 10^{-3}$; Z-score = 2.83), showing that organisms with the same lifestyle share common motifs when compared to organisms of different lifestyles. This suggests that, apart from the phylogenetic component in retaining interactions, organisms with similar lifestyles tend to conserve network motifs as a general principle.
Though motifs tend to be conserved within lifestyle classes, we sought to know whether interactions in motifs are conserved preferentially compared to other interactions in the network regardless of lifestyle class. To determine this, we computed a motif conservation index (C.I.) for each organism (Supplementary Data M9). C.I. is defined as the logarithm of the ratio of the fraction of conserved interactions that forms a motif in E. coli to the fraction of all interactions conserved. We then carried out network simulation experiments where we: (a) selected for interactions in motifs; (b) neutrally selected interactions in the network; and (c) selected against interactions in motifs and obtained the trends for C.I. As shown in Figure 6(c), the observed trend of conservation index for the 175 genomes is closest to a model of unbiased removal of interactions, rather than preferential conservation or loss of interactions in motifs. Thus, we find that there is no preferential conservation of whole motifs or parts thereof (Supplementary Data S8). These results indicate that interactions in motifs and whole motifs are not conserved preferentially when compared to other interactions in the network, which was unexpected. All these findings suggest that motif formation is dynamic during evolution, and can easily be reshaped during adaptation to changes in lifestyle.

forward motif conservation profile for the 175 genomes. The genomes are shown as rows and the different motifs are shown as columns. If all constituents of a motif are present in a genome of interest, then the particular cell is coloured red, or in shades of red to blue reflecting the amount of conservation (blue for completely absent). This Figure illustrates that different feed-forward motifs have been conserved to various extents in the different genomes. (c) and (d) Cluster diagrams of 43 single-input motifs and 70 multiple-input motifs in the 175 genomes with the same representation as feed-forward motifs. These parts of the Figure illustrate that network motifs are not conserved as blocks but are conserved to various extents in the different genomes. A representation like this can be extremely helpful in finding sets of motifs that have similar conservation profiles. Two-way clustering, as done here, also allows us to identify different genomes that have conserved similar sets of network motifs. Such information can be helpful in understanding gene expression patterns in poorly characterised organisms. (High-resolution images are available at http://www.mrc-lmb.cam.ac.uk/genomes/madanm/evdy/.)

Orthologous genes can come under the control of different motifs in different organisms

Careful examination of partially conserved motifs provided a possible answer as to why interactions in motifs are not specifically conserved. Our analysis revealed that orthologous genes can become part of different types of motifs in the regulatory network of different organisms by losing or gaining specific transcription factors (Figure 6(d); and Supplementary Data S9). This observation implies that an orthologous gene in a different organism may acquire a different pattern of gene expression in order to adapt better to changing environments. In this context, it is interesting to note that a more recent study by Dekel and Alon45 has demonstrated that E. coli strains grown in different lactose environments optimize levels of LacZ expression to increase their growth rate by keeping the cost of production low.46 Thus, our results lend strong support (at a genomic level across many organisms) to the idea that different organisms arrive at different solutions by tinkering with specific regulatory interactions to optimize expression levels rather than porting whole blocks of pre-existing transcriptional interactions. For example, a gene that may need to be regulated tightly (as in a feed-forward motif) in one organism might need to be regulated directly and quickly (as in a single-input motif) in a different genome due to changes in lifestyle, development or environment. Specific cases are shown in Figure 6(d) (a complete list for every genome can be obtained from the supplementary website; see Supplementary Data). An example of this is E. coli, which is adapted to a life with fixed aerobic and anaerobic phases, and will not express the fumarate reductase genes (FrdB and FrdC, which convert fumarate to succinate under anaerobic conditions to derive energy) unless there is a persistent signal for lack of oxygen (through a feed-forward motif involving both Fnr and NarL). In contrast, Haemophilus influenzae, which encounters rapid redox fluctuations during host infection and needs to regulate the fumarate reductase genes more quickly than E. coli, appears to depend solely on Fnr for the response (by employing a single-input motif).
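The change of regulatory context described above for the fumarate reductase genes (a feed-forward motif in E. coli, a single-input motif in H. influenzae) can be made concrete with a small sketch. The classification rules below are a deliberately simplified, per-target reading of the three motif types and are an assumption of this example, as is the direction of the Fnr to NarL edge. The helper names `regulatory_context` and `lose_factor` are hypothetical.

```python
def regulatory_context(target, edges):
    """Classify how `target` is regulated, given a set of (regulator, regulated) edges.

    Simplified rules assumed for this sketch:
      SIM - exactly one regulator controls the target
      FFM - several regulators, at least one of which also regulates another
      MIM - several regulators with no regulation between them
    """
    regulators = sorted(tf for tf, gene in edges if gene == target)
    if not regulators:
        return "unregulated"
    if len(regulators) == 1:
        return "SIM"
    linked = any((a, b) in edges for a in regulators for b in regulators if a != b)
    return "FFM" if linked else "MIM"

def lose_factor(edges, lost_tf):
    """Network after the loss of one transcription factor and all of its interactions."""
    return {(tf, gene) for tf, gene in edges if lost_tf not in (tf, gene)}

# E. coli-like context: Fnr and NarL both regulate FrdB, and Fnr regulates NarL
# (edge direction assumed for illustration).
ecoli_like = {("Fnr", "NarL"), ("Fnr", "FrdB"), ("NarL", "FrdB")}

# Context after losing NarL, as proposed for H. influenzae.
without_narl = lose_factor(ecoli_like, "NarL")

# A hypothetical multiple-input context with two unlinked regulators.
two_inputs = {("tfX", "geneY"), ("tfZ", "geneY")}

print(regulatory_context("FrdB", ecoli_like))    # FFM
print(regulatory_context("FrdB", without_narl))  # SIM
print(regulatory_context("geneY", two_inputs))   # MIM
```

Losing a single transcription factor is enough to move the same target gene from one motif type to another, which is the kind of local tinkering the text describes.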
Figure 6. Evolution of local network structure. (a) A feed-forward motif formed by Fnr, NarL and NuoN genes in E. coli is completely conserved in a closely related genome, Salmonella typhi, but not in other γ-proteobacterial genomes. Fnr is a regulatory hub and is not always conserved in genomes within the same phylogenetic group. This suggests that condition-specific regulatory hubs can either be lost or displaced by a non-orthologous protein in closely related genomes. (b) Distantly related organisms that have conserved all interactions in the regulatory motif and that have conserved the regulatory hub, Fnr. (c) This is a plot of fraction of genes conserved (x-axis) against conservation index (y-axis), which is a measure of the extent to which interactions in a motif are conserved. The conservation index (C.I.) is calculated as the logarithm of the ratio of the fraction of conserved interactions that forms a motif in E. coli to the fraction of all interactions conserved. C.I. > 0 means that interactions in motifs are selected for; C.I. close to 0 suggests total
In this context, we note that studies of duplicated genes within the transcriptional regulatory network of E. coli and yeast have shown that network motifs have not evolved by duplication of whole ancestral modules.47,48 These findings hint that the same interaction could have existed in different regulatory contexts in ancestral genomes, which is in agreement with our findings. Our conclusions are consistent with the results of a recent study showing the lack of conservation of higher order network modules, which are semi-independent units larger than network motifs.49 This type of flexibility in the regulatory context of a gene in a particular prokaryotic cell might be viewed as analogous to the presence of multiple distinct pathways for context specific regulation of target genes across different cell types within a multi-cellular organism.
Evolution of global network structure
We next sought to address how the fine-scale, independent evolution of interactions in network motifs affects the global structure of the network. Transcriptional regulatory networks have a hierarchical structure best approximated by the scale-free network model, in which the outgoing connectivity of transcription factors follows a power law $y = ax^{-\gamma}$, where $y$ is the number of transcription factors, $x$ is the number of target genes and $\gamma$ is the scale-free exponent. In other words, there are few transcription factors that regulate many target genes. These influential transcription factors play the role of regulatory hubs in the network.
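To make the power-law description concrete, the sketch below tabulates the outgoing connectivity distribution of a network and estimates $a$ and $\gamma$ by ordinary least squares on the log-transformed values. This is one simple way to fit $y = ax^{-\gamma}$ and is assumed here for illustration; the fitting procedure actually used in the study is not specified in this section. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def out_degree_distribution(edges):
    """Map x (number of target genes) -> y (number of transcription factors with x targets)."""
    targets_per_tf = Counter(tf for tf, _ in set(edges))
    return Counter(targets_per_tf.values())

def fit_power_law(distribution):
    """Least-squares fit of log y = log a - gamma * log x; returns (a, gamma).

    Needs at least two distinct x values, otherwise the slope is undefined.
    """
    xs = [math.log(x) for x in distribution]
    ys = [math.log(distribution[x]) for x in distribution]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(mean_y - slope * mean_x), -slope

# A tiny hypothetical network: tf1 regulates two genes, tf2 regulates one.
toy_edges = {("tf1", "g1"), ("tf1", "g2"), ("tf2", "g1")}
toy_distribution = out_degree_distribution(toy_edges)

# A distribution constructed to follow y = 64 * x**-2 exactly.
ideal = {1: 64, 2: 16, 4: 4, 8: 1}
a, gamma = fit_power_law(ideal)

print(dict(toy_distribution))
print(a, gamma)
```

For the constructed distribution the fit recovers $a = 64$ and $\gamma = 2$ up to rounding. With real networks, a regression on log-transformed counts is sensitive to sparsely populated high-connectivity bins, so the estimate should be read as approximate.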
Transcriptional regulatory hubs evolve like other transcription factors in the network
It was postulated that scale-free behaviour should hold good for randomly selected parts of networks of this form.14 In fact, the reconstructed transcriptional networks for a majority of the genomes follow a power law distribution in their outgoing connectivity (Figure 7(a); and Supplementary Data S16). We then
asked if this is because transcription factors occupying hubs in the network are preferentially conserved. Earlier studies on protein–protein interaction networks have suggested that the proteins with the most interactions tend to be more conserved across organisms.50,51 We explored this possible link between connectedness of a hub and its conservation by comparing the observed retention of regulatory interactions to simulated models of network evolution where transcription factors are lost in an unbiased manner, and models where hubs are preferentially conserved or lost. The observed pattern is closest to the unbiased removal of transcription factors and is independent of the number of their target genes in the E. coli network. This implies that there is no correlation between the degree of connectedness of a particular transcription factor and its conservation across genomes, as shown in Figure 7(b). Table 1 provides specific examples of transcription factors with many target genes that are poorly conserved across genomes and transcription factors with low connectivity that are conserved in many genomes.

A possible explanation for the absence of any bias for conservation of influential transcription factors is suggested by our recent investigation of regulatory hubs in yeast. Most of these regulatory hubs are condition-specific, i.e. they regulate many genes only in particular conditions, but remain silent in other conditions.52,53 Most transcription factors in prokaryotes are condition-specific, in that they respond to a specific environmental signal.54 If the lifestyle of the organism does not involve a particular condition, then that regulatory protein can be lost. For example, in the opportunistic pathogen P. aeruginosa, which actively utilizes phenolic compounds, the transcriptional regulators MhpR, HcaR and FeaR can sense the compounds and activate target genes that encode enzymes involved in their catabolism. However, more obligate pathogens like Staphylococcus aureus and Campylobacter jejuni, which do not typically face phenolic compounds in their natural niches, lack both the regulators and their target genes for the

neutrality towards maintenance of motifs and C.I. < 0 suggests selection against such interactions. The C.I. trends obtained by carrying out network simulation where interactions in motifs were selected for, neutrally selected and selected against are shown in blue, green and red, respectively. The plot of C.I. for the 175 genomes (in black) shows that interactions that form the network motif are not selected for or against in evolution and are conserved to the same extent as any other interaction in the network. This implies that evolution acts by tinkering to form different motifs using the same interaction in different genomes. (d) Analysis of partially conserved motifs revealed that by losing (or gaining) specific transcription factors, orthologous genes in different genomes could be expressed in different ways according to specific requirements. Genes in a feed-forward motif (FFM) in E. coli can be regulated as a part of a single-input module (SIM) by losing a transcription factor (shown in grey). Feed-forward motif regulation ensures that target gene expression is not sensitive to fluctuations in input signals. In contrast, SIM regulation ensures expression of target genes as long as there is some input signal, like small molecules. Genes in a multiple-input motif (MIM) in E. coli can be regulated as a single-input motif (SIM) by losing a transcription factor. Regulation as a part of an MIM ensures that target genes are transcribed only when two input signals cross particular thresholds independently. Thus, by losing a transcription factor, genes that are tightly regulated can be regulated in a simple manner. Genes regulated as a part of an FFM can be regulated as a part of an MIM by the loss of a transcription factor. Thus, evolution tinkers with specific regulatory interactions to create different motifs in genomes when orthologous genes need to be expressed differently. Generically, the Figure illustrates that by bringing about temporal expression of specific transcription factors, the same target gene can follow different patterns of gene expression, be it in a different cell type or in different organisms.
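The conservation index (C.I.) used in Figure 6(c) can be illustrated numerically. The sketch below adopts one reading of the definition, which should be treated as an assumption: the numerator is the fraction of motif-forming interactions of the reference network that are conserved in a genome, and the denominator is the fraction of all reference interactions that are conserved, so that unbiased loss gives a value near zero. The natural logarithm is also an assumption, since the base is not stated, and the edge labels are hypothetical.

```python
import math

def conservation_index(all_edges, motif_edges, conserved_edges):
    """log( fraction of motif interactions conserved / fraction of all interactions conserved ).

    Undefined (raises ValueError) if no motif interaction is conserved.
    """
    motif_fraction = len(motif_edges & conserved_edges) / len(motif_edges)
    overall_fraction = len(all_edges & conserved_edges) / len(all_edges)
    return math.log(motif_fraction / overall_fraction)

# Hypothetical reference network of ten interactions, four of which lie in motifs.
all_edges = {f"e{i}" for i in range(1, 11)}
motif_edges = {"e1", "e2", "e3", "e4"}

# Genome 1 keeps three of four motif interactions but only half of all interactions.
favoured = conservation_index(all_edges, motif_edges, {"e1", "e2", "e3", "e5", "e6"})

# Genome 2 keeps half of the motif interactions and half of all interactions.
neutral = conservation_index(all_edges, motif_edges, {"e1", "e2", "e5", "e6", "e7"})

# Genome 3 keeps one of four motif interactions but half of all interactions.
disfavoured = conservation_index(all_edges, motif_edges, {"e1", "e5", "e6", "e7", "e8"})

print(favoured, neutral, disfavoured)
```

Under this reading the three toy genomes give a positive, a zero and a negative index, matching the three regimes named in the legend (selection for, neutrality, selection against).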
Figure 7. Evolution of global network structure. (a) The fraction of genes conserved (x-axis) is plotted against the scale-free exponent value ($\gamma$) of the conserved networks (y-axis) for the different genomes, in black. $\gamma$ is the scale-free exponent in the power law equation $y = ax^{-\gamma}$, where $y$ is the number of transcription factors and $x$ is the number of target genes. The average value of $\gamma$ obtained after simulating a neutral model of network evolution 10,000 times is shown in green. This plot shows that for genomes that conserve about 30–60% of the genes, the exponent value ($\gamma$) is higher than in random networks, suggesting a more pronounced hub-like behaviour. (b) The fraction of transcriptional regulatory interactions conserved in the network (x-axis) is plotted against the fraction of the genes conserved (y-axis) for each of the 175 genomes ($y = 0.007x^{2} - 1.7843x + 100$, $R^{2} = 0.95$). To understand what the observed trend means, we simulated the process of network evolution by incorporating different rules: (i) remove genes with low connectivity first, implying selection to conserve highly connected transcription factors, shown in blue; (ii) remove genes neutrally with no special emphasis on connectivity, implying neutral evolution, shown in green; and (iii) remove nodes with high connectivity first, implying selection against highly connected regulatory proteins, shown in red. We find that the observed trend is more similar to the trend that we obtain while simulating the neutral condition. (c) Reconstructed transcriptional regulatory networks for H. influenzae and B. pertussis are shown to illustrate that the scale-free-like structure is still conserved, even though regulatory hubs can be lost or replaced. For example, the distribution for the outgoing connection for the conserved network in H. influenzae is $y = 214.8x^{-0.49}$ even though a regulatory hub NarL is lost.
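The three removal rules listed in the legend to Figure 7(b) can be sketched as a small simulation. This is a simplified illustration rather than the simulation used in the study: it removes transcription factors only, tracks the fraction of interactions that remain after each loss, and uses a seeded random order for the neutral rule. The rule labels (`hubs_kept`, `neutral`, `hubs_lost`) and the toy network are hypothetical.

```python
import random

def removal_order(out_degree, rule, rng):
    """Order in which transcription factors are lost under one of three rules."""
    tfs = sorted(out_degree)
    if rule == "neutral":
        rng.shuffle(tfs)  # loss is independent of connectivity
        return tfs
    # "hubs_kept": poorly connected factors are lost first (hubs are conserved).
    # "hubs_lost": highly connected factors are lost first.
    return sorted(tfs, key=lambda tf: out_degree[tf], reverse=(rule == "hubs_lost"))

def retention_curve(edges, rule, seed=0):
    """Fraction of interactions retained after each successive loss of a transcription factor."""
    edges = set(edges)
    out_degree = {}
    for tf, _ in edges:
        out_degree[tf] = out_degree.get(tf, 0) + 1
    remaining = set(edges)
    curve = []
    for lost in removal_order(out_degree, rule, random.Random(seed)):
        # Losing a factor removes every interaction it takes part in.
        remaining = {(tf, gene) for tf, gene in remaining if lost not in (tf, gene)}
        curve.append(len(remaining) / len(edges))
    return curve

# Hypothetical toy network: one hub (tfA, six targets) and two minor regulators.
toy_edges = (
    {("tfA", f"g{i}") for i in range(1, 7)}
    | {("tfB", "g1"), ("tfB", "g2")}
    | {("tfC", "g3")}
)

hubs_kept = retention_curve(toy_edges, "hubs_kept")
neutral = retention_curve(toy_edges, "neutral", seed=1)
hubs_lost = retention_curve(toy_edges, "hubs_lost")

print(hubs_kept)
print(neutral)
print(hubs_lost)
```

The neutral curve always lies between the two biased curves, which is what makes the comparison with observed retention informative. Averaging the neutral rule over many random seeds would give a smoother reference trend.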
utilization of these aromatic compounds. These observations suggest an important principle of adaptation at the regulatory level: the adaptive value of an orthologous transcription factor might be different across different organisms, depending on the environment, thereby affecting the number of genes regulated by it and the overall network structure. These findings support an interesting
Table 1. Conservation and connectivity profile for transcription factors

A. Proteins with few target genes (connectivity, k ≤ 15) but conserved in more than 50% of the genomes studied

Transcription factor  GI number  Connectivity (k)  Conservation (% of genomes)  Function
Ada                   16130150   5                 74.86                        Transcriptional regulator of DNA repair
BirA                  16131807   5                 89.71                        Biotin operon repressor
MarR                  33347561   7                 54.29                        Repressor of multiple antibiotic resistance (Mar) operon
DnaA                  16131570   10                87.43                        Transcriptional regulator of house-keeping genes
CpxR                  16131752   14                55.43                        Regulator of a 2CST^a
OmpR                  16131282   14                57.14                        Regulator of adaptive response (2CST^a)
NarP                  16130130   15                60.57                        Regulator of aerobic respiration (2CST^a)

B. Proteins with many target genes (connectivity, k > 15) but conserved in less than 50% of the genomes studied

Transcription factor  GI number  Connectivity (k)  Conservation (% of genomes)  Function
FhlA                  16130638   16                34.29                        Formate hydrogen lyase activator
Rob                   16132213   17                24.00                        Transcriptional activator for antibiotic resistance
CysB                  16129236   18                25.14                        Regulator of cysteine biosynthesis
FruR                  16128073   25                21.14                        Fructose operon repressor
Hns                   16129198   26                14.29                        General regulator
PurR                  16129616   29                33.14                        Purine biosynthesis repressor
Fis                   16131149   34                28.00                        Regulator of rRNA and tRNA operons
Lrp                   16128856   53                44.57                        Leucine responsive regulatory protein
NarL                  16129184   66                36.00                        Regulator of anaerobic respiration
ArcA                  16132218   70                40.00                        Regulator of aerobic respiration
Ihf                   16129668   96                37.71                        General regulator
Fnr                   16129295   110               49.14                        Transcriptional regulator of aerobic, anaerobic respiration and osmotic balance

^a 2CST, two-component signal transduction system.
hypothesis proposed by Wagner on adaptation, evolvability and robustness in living systems55 and an elegant work by Hittinger et al., where they report gene inactivation and loss as being associated with adaptation to new ecological niches in yeast.56
The binding affinity and specificity of a transcription factor and its target site can be affected by relatively small changes in the DNA-binding interface of the transcription factor, or in the binding site.57,58 As a result, DNA-binding domains could evolve new target sites relatively easily, resulting in rapid de novo emergence of new transcriptional interactions. This could be another reason for the high level of variability of transcriptional regulatory interactions in evolution. Protein–protein interactions, on the other hand, involve larger interfaces and hence more mutations are needed to alter them. Indeed, Maslov et al.59 have shown that paralogous proteins in yeast that participate in transcriptional regulatory interactions evolve new interactions rapidly when compared to paralogous proteins that participate in protein–protein interactions.
Scale-free structure emerges independently during evolution
Our analysis of the reconstructed networks reveals that even though particular regulatory hubs may be lost or possibly replaced, as shown in Figure 7(c) for Haemophilus influenzae and Bordetella pertussis, the distribution of outgoing connectivity is still best approximated by a power-law function. It is evident from Figure 7(a) that the exponent ($\gamma$ in the fit $y = ax^{-\gamma}$) increases in genomes where the fraction of conserved nodes is less than 60%, and this is statistically significant (Supplementary Data M7). This implies that a hierarchical structure, with a connectivity distribution approximated by a power law, is still retained in poorly conserved networks (Figure 6(c)). The fact that the reconstructed networks are still power-law-like in their distribution, but with a different exponent value when compared to random networks of a similar size, suggests selection for different proteins to be regulatory hubs. This may result in the independent emergence of new network structures that resemble scale-free networks. However, for phylogenetically close genomes that have conserved more than 60% of the genes, the values of the exponent are similar to that of the ancestral network (Figure 7(a)).

To further investigate the prevalence of network topology resembling a scale-free structure in other bacteria, we analysed the extent to which the structure of the regulatory network changes in B. subtilis. For this, we compiled information about the known transcriptional regulatory network of B. subtilis from DBTBS.60 By comparing it with the E. coli network, we found that even though they share a similar set of target genes, the individual transcription factors have been replaced to
Figure 8. Comparison of the known B. subtilis and the E. coli transcriptional regulatory network. Regulatory hubs for each network are shown separately. The number in parentheses represents the number of target genes for the transcription factor. Relevant information and the outgoing connectivity distribution are given below each network. (a) The B. subtilis network has a scale-free structure with CcpA as its major regulatory hub. The closest hit in E. coli is the CytR protein (cytidine repressor belonging to the LacI family of repressors), which has only ten target genes. CcpA is the major carbon catabolite repressor protein that interacts with its co-repressor Hpr-P protein and controls genes involved in carbon metabolism. (b) The E. coli network has Crp as its major regulatory hub and it has a scale-free structure. Crp is a functional analogue of the CcpA protein in B. subtilis. It regulates genes involved in carbon metabolism but is not evolutionarily related to CcpA. There is no Crp orthologue in B. subtilis and this organism does not sense cAMP like E. coli. The closest match to Crp from E. coli in B. subtilis is Fnr, which is known to regulate ten target genes in B. subtilis. Thus, we see how two organisms with the ability to carry out specific carbon metabolism have evolved their own set of regulatory hubs and maintained the scale-free structure. This supports the argument that the scale-free-like structure evolved independently in the two genomes.
Figure 9. Evolution of network structure at three levels of organization. (a) At the level of genes and interactions, transcription factors tend to evolve more rapidly than their target genes. This, coupled with the observation that different genomes evolve their own transcription factors, may mean that they sense and respond to different signals in their changing environments. At the level of regulatory interactions, organisms with similar lifestyles conserve similar regulatory interactions, indicating the strong influence of environment on gene regulation. (b) At the local level, interactions in motifs evolve like any other interaction in the network. By losing or gaining individual transcription factors, orthologous genes can come under the influence of different motifs and may hence code for a different pattern of gene expression. Even though motifs are not conserved as whole units, organisms with similar lifestyles do retain similar motifs. (c) At the global level, regulatory hubs that are lifestyle-specific are lost as rapidly as other transcription factors in the network. Even though hubs can be lost or replaced, organisms with different lifestyles evolve a similar scale-free structure where different proteins emerge as hubs, as dictated by their lifestyle. All observations suggest that transcriptional regulatory networks in prokaryotes are very flexible and adapt rapidly to changes in environment by tinkering with individual interactions to arrive at an optimal design.
a considerable extent. However, the overall power law-like distribution in the outgoing connectivity of the network has been maintained in the B. subtilis network. This comparison revealed that different proteins act as regulatory hubs in the two genomes. For example, CcpA and Crp are the two regulatory hubs in B. subtilis and E. coli, respectively, controlling many genes involved in carbon metabolism. Both have very different modes of regulation and are not evolutionarily related, suggesting independent innovation of regulatory hubs to regulate orthologous target genes. We find also that proteins that are regulatory hubs in the E. coli reference network are either absent or regulate very few target genes in other organisms. These observations provide additional support for the suggestion that the hierarchical structure of these networks has converged to an architecture similar to scale-free networks, albeit with independently recruited regulatory hubs (Figure 8).
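The notions of outgoing connectivity and regulatory hubs used above can be illustrated with a minimal sketch. It assumes a network stored as (transcription factor, target gene) pairs; the function names, the toy edges and the 20% cut-off for calling a transcription factor a hub are illustrative assumptions, not the definitions used in this study.

```python
from collections import Counter

def out_degrees(interactions):
    """Count distinct target genes per transcription factor."""
    targets = {}
    for tf, target in interactions:
        targets.setdefault(tf, set()).add(target)
    return {tf: len(ts) for tf, ts in targets.items()}

def top_hubs(interactions, fraction=0.2):
    """Return the most highly connected TFs (top `fraction`, at least one).

    The 20% default is an assumed cut-off chosen for illustration only.
    """
    deg = out_degrees(interactions)
    ranked = sorted(deg, key=lambda tf: (-deg[tf], tf))
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

def degree_histogram(interactions):
    """Map out-degree k to the number of TFs with exactly k targets."""
    return dict(Counter(out_degrees(interactions).values()))

# Toy edges invented for illustration; not taken from RegulonDB or DBTBS.
toy = [("crp", "g1"), ("crp", "g2"), ("crp", "g3"), ("crp", "g4"),
       ("fnr", "g2"), ("fnr", "g5"),
       ("araC", "g6"), ("lacI", "g7"), ("purR", "g8")]

hubs = top_hubs(toy)
histogram = degree_histogram(toy)
```

In a power law-like network, the histogram is dominated by transcription factors with few targets, with a small number of highly connected hubs, as in this toy case.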
† http://www.ncbi.nlm.nih.gov/
‡ http://genome-www5.stanford.edu/
Conclusions
We present the first comprehensive analysis of the evolution of transcriptional regulatory networks at three distinct levels of organization by comparing the conservation of an experimentally established reference network of 1295 interactions across 175 microbial genomes (computationally analyzing ~500,000 protein sequences). At the level of individual genes, we show that target genes are more conserved across genomes than transcription factors, and that the conservation of a target gene and that of its transcription factor are uncoupled, unlike the tight correlation of conservation patterns in physically interacting protein pairs. Each organism has evolved its own set of transcription factors, suggesting that a major factor in adaptation to new environments is the emergence of distinct repertoires of transcription factors, which probably integrate new inputs (Figure 9(a)). At the local level, there is no preferential conservation of interactions within network motifs, or of whole network motifs. However, it appears that by losing or conserving specific transcriptional regulators, orthologous genes in different genomes can be incorporated within different regulatory contexts and can thereby easily exhibit different patterns of gene expression. We find also that organisms with similar lifestyles have similar motif content and interactions in their conserved networks (Figure 9(b)). Thus, natural selection appears to tinker with individual interactions to arrive at an optimal design for a given organism. At the level of global network topology, we see that conservation of transcription factors is independent of the number of target genes they regulate, and depends on the lifestyle of the organism rather than the phylogenetic distance from E. coli (Figure 9(c)). Additionally, a comparison of the known regulatory networks of E. coli and B. subtilis confirms the above observations and reveals that different proteins act as regulatory hubs in the two genomes. This, coupled with our observation that the reconstructed networks have different power law exponents, suggests that the overall tendency towards a scale-free like behaviour is an emergent property in evolution.
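One simple way to compare the conserved interactions of two genomes, in the spirit of the lifestyle comparison summarized above, is a set-overlap score. The sketch below uses the Jaccard index as an assumed, generic similarity measure; it is not the statistic used in this study (see Supplementary Data), and the genome labels and edges are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_similarity(conserved):
    """Score every unordered pair of genomes by shared conserved interactions.

    `conserved` maps a genome label to the set of reference (TF, target)
    interactions retained in that genome.
    """
    names = sorted(conserved)
    return {(x, y): jaccard(conserved[x], conserved[y])
            for i, x in enumerate(names) for y in names[i + 1:]}

# Hypothetical genomes and interactions, for illustration only.
conserved = {
    "genomeA": {("crp", "lacZ"), ("fnr", "narG")},
    "genomeB": {("crp", "lacZ"), ("fnr", "narG"), ("araC", "araB")},
    "genomeC": {("araC", "araB")},
}

sims = pairwise_similarity(conserved)
```

Under this assumed measure, genomes that retain largely the same reference interactions score close to 1, and genomes that share none score 0.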
With advances in large-scale experimental studies to identify transcriptional regulatory networks, we believe that the general methods we present here will be useful in studying any set of networks and genomes. These results and predictions can serve as a scaffold for experimental studies on transcriptional control in poorly characterized genomes, and could be relevant for designing ChIP-chip experiments for pathogens and in engineering regulatory interactions for organisms with biotechnological value.
Materials and Methods
Detailed descriptions of the methods are given in the Supplementary Data.
Dataset
The E. coli transcriptional regulatory network consisting of 112 transcription factors, 711 target genes and 1295 regulatory interactions was assembled from RegulonDB,17 and from data in the literature.3,33 Protein sequences for the 176 completely sequenced organisms were downloaded from the NCBI website†. The expression dataset for benchmarking predicted regulatory interactions was downloaded from the Stanford Microarray Database‡. The known transcriptional regulatory network for B. subtilis was obtained from DBTBS.60
Information about lifestyle for the organisms was obtained from the NCBI genome information website and from Bergey's Manual of Bacteriology.
Identification and analysis of conserved transcriptional regulatory networks
The E. coli regulatory network was used as the reference network to study network evolution in the 175 other organisms. The detailed algorithm to reconstruct transcriptional networks and the procedure to identify orthologous genes are available as Supplementary Data (methods M1 and M2). Validations for the interaction transfer procedure are available as Supplementary Data (S2 and S11). Network motifs were identified using standard algorithms. The algorithm to compare networks and the procedures of the relevant statistical tests to measure significance are available as Supplementary Data (M5 to M10). The procedure to compare lifestyle similarity and motif or interaction conservation for the 175 organisms is described in Supplementary Data (S12).
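As a rough illustration of orthology-based interaction transfer, the sketch below keeps a reference interaction only when both the transcription factor and the target gene have an orthologue in the other genome. This rule is an assumed simplification; the actual procedure is the one described in the Supplementary Data (methods M1 and M2). The function names, gene pairs and locus tags are hypothetical.

```python
def transfer_interactions(reference, orthologs):
    """Map reference (TF, target) edges onto another genome.

    `orthologs` maps a reference gene to its orthologue in the other genome.
    An edge is kept only if both of its genes have an orthologue (assumed rule).
    """
    return [(orthologs[tf], orthologs[tg])
            for tf, tg in reference
            if tf in orthologs and tg in orthologs]

def conserved_fraction(genes, orthologs):
    """Fraction of the given reference genes that have an orthologue."""
    genes = set(genes)
    return len(genes & set(orthologs)) / len(genes) if genes else 0.0

# Toy reference edges and a hypothetical orthologue table (invented locus tags).
reference = [("crp", "lacZ"), ("crp", "araB"), ("lacI", "lacZ"), ("araC", "araB")]
orthologs = {"crp": "X_0001", "lacZ": "X_0002", "araB": "X_0003"}

predicted = transfer_interactions(reference, orthologs)
tf_conservation = conserved_fraction({tf for tf, _ in reference}, orthologs)
target_conservation = conserved_fraction({tg for _, tg in reference}, orthologs)
```

In this toy case one of three transcription factors and both target genes are retained, which mirrors the kind of comparison between transcription factor and target gene conservation made in the text.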
Uncited Reference
32.
Acknowledgements
M.M.B. and L.A. gratefully acknowledge the Intramural Research Program of the National Institutes of Health, USA for funding their research. M.M.B. acknowledges the MRC Laboratory of Molecular Biology, Trinity College, Cambridge, Cambridge Commonwealth Trust and the National Institutes of Health Visitor Program for financial support. We thank Dr Nakai and Yuko Makita for sending us information on the B. subtilis network. We thank Drs N. Luscombe, C. Chothia, P. TenWolde, L. LoConte, L. M. Iyer, V. Pisupati, S. Balaji, S. Maslau and members of our groups for reading the manuscript.
Supplementary Data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2006.02.019
Supplementary methods, information and the predictions are available at: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/evdy/
References
1. Barabasi, A. L. & Oltvai, Z. N. (2004). Network biology: understanding the cell’s functional organization. Nature Rev. Genet. 5, 101–113.
2. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. & Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science, 298, 824–827.
3. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet. 31, 64–68.
4. Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K. et al. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298, 799–804.
5. Guelzim, N., Bottani, S., Bourgine, P. & Kepes, F. (2002). Topological and causal structure of the yeast transcriptional regulatory network. Nature Genet. 31, 60–63.
6. Kalir, S. & Alon, U. (2004). Using a quantitative blueprint to reprogram the dynamics of the flagella gene network. Cell, 117, 713–720.
7. McAdams, H. H., Srinivasan, B. & Arkin, A. P. (2004). The evolution of genetic regulatory systems in bacteria. Nature Rev. Genet. 5, 169–178.
8. Wall, M. E., Hlavacek, W. S. & Savageau, M. A. (2004). Design of gene circuits: lessons from bacteria. Nature Rev. Genet. 5, 34–42.
9. Carroll, S. (2005). Endless Forms Most Beautiful: The New Science of Evo Devo and the Making of the Animal Kingdom, W.W. Norton & Co., ???.
10. Harbison, C. T., Gordon, D. B., Lee, T. I., Rinaldi, N. J., Macisaac, K. D., Danford, T. W. et al. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104.
11. Madan Babu, M., Luscombe, N. M., Aravind, L., Gerstein, M. & Teichmann, S. A. (2004). Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14, 283–291.
12. Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I. et al. (2004). Superfamilies of evolved and designed networks. Science, 303, 1538–1542.
13. Albert, R., Jeong, H. & Barabasi, A. L. (2000). Error and attack tolerance of complex networks. Nature, 406, 378–382.
14. Barabasi, A. L. & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
15. Oltvai, Z. N. & Barabasi, A. L. (2002). Systems biology. Life’s complexity pyramid. Science, 298, 763–764.
16. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabasi, A. L. (2002). Hierarchical organization of modularity in metabolic networks. Science, 297, 1551–1555.
17. Salgado, H., Gama-Castro, S., Martinez-Antonio, A., Diaz-Peredo, E., Sanchez-Solano, F., Peralta-Gil, M. et al. (2004). RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucl. Acids Res. 32, D303–D306.
18. Alkema, W. B., Lenhard, B. & Wasserman, W. W. (2004). Regulog analysis: detection of conserved regulatory networks across bacteria: application to Staphylococcus aureus. Genome Res. 14, 1362–1373.
19. Rajewsky, N., Socci, N. D., Zapotocky, M. & Siggia, E. D. (2002). The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons. Genome Res. 12, 298–308.
20. McGuire, A. M., Hughes, J. D. & Church, G. M. (2000). Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res. 10, 744–757.
21. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. (2003). Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423, 241–254.
22. Yu, H., Luscombe, N. M., Lu, H. X., Zhu, X., Xia, Y., Han, J. D. et al. (2004). Annotation transfer between genomes: protein-protein interologs and protein–DNA regulogs. Genome Res. 14, 1107–1118.
23. Kelley, B. P., Yuan, B., Lewitter, F., Sharan, R., Stockwell, B. R. & Ideker, T. (2004). PathBLAST: a tool for alignment of protein interaction networks. Nucl. Acids Res. 32, W83–W88.
24. Mazurie, A., Bottani, S. & Vergassola, M. (2005). An evolutionary and functional assessment of regulatory network motifs. Genome Biol. 6, R35.
25. Rodionov, D. A., Dubchak, I., Arkin, A., Alm, E. & Gelfand, M. S. (2004). Reconstruction of regulatory and metabolic pathways in metal-reducing delta-proteobacteria. Genome Biol. 5, R90.
26. Rodionov, D. A., Vitreschak, A. G., Mironov, A. A. & Gelfand, M. S. (2002). Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J. Biol. Chem. 277, 48949–48959.
27. Sharan, R., Suthram, S., Kelley, R. M., Kuhn, T., McCuine, S., Uetz, P. et al. (2005). Conserved patterns of protein interaction in multiple species. Proc. Natl Acad. Sci. USA, 102, 1974–1979.
28. Espinosa, V., Gonzalez, A. D., Vasconcelos, A. T., Huerta, A. M. & Collado-Vides, J. (2005). Comparative studies of transcriptional regulation mechanisms in a group of eight gamma-proteobacterial genomes. J. Mol. Biol. 354, 184–199.
29. Herrgard, M. J., Covert, M. W. & Palsson, B. O. (2004). Reconstruction of microbial transcriptional regulatory networks. Curr. Opin. Biotechnol. 15, 70–77.
30. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabasi, A. L. (2000). The large-scale organization of metabolic networks. Nature, 407, 651–654.
31. Pagel, P., Mewes, H. W. & Frishman, D. (2004). Conservation of protein-protein interactions—lessons from ascomycota. Trends Genet. 20, 72–76.
32. Aravind, L., Watanabe, H., Lipman, D. J. & Koonin, E. V. (2000). Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc. Natl Acad. Sci. USA, 97, 11319–11324.
33. Madan Babu, M. & Teichmann, S. A. (2003). Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucl. Acids Res. 31, 1234–1244.
34. van Nimwegen, E. (2003). Scaling laws in the functional content of genomes. Trends Genet. 19, 479–484.
35. Ranea, J. A., Buchan, D. W., Thornton, J. M. & Orengo, C. A. (2004). Evolution of protein superfamilies and bacterial genome size. J. Mol. Biol. 336, 871–887.
36. Aravind, L., Anantharaman, V., Balaji, S., Babu, M. M. & Iyer, L. M. (2005). The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. 29, 231–262.
37. Hedges, S. B. (2002). The origin and evolution of model organisms. Nature Rev. Genet. 3, 838–849.
38. Andersson, J. O. & Andersson, S. G. (1999). Insights into the evolutionary process of genome degradation. Curr. Opin. Genet. Dev. 9, 664–671.
39. Cerdeno-Tarraga, A., Thomson, N. & Parkhill, J. (2004). Pathogens in decay. Nature Rev. Microbiol. 2, 774–775.
40. Sallstrom, B. & Andersson, S. G. (2005). Genome reduction in the alpha-Proteobacteria. Curr. Opin. Microbiol. 8, 579–585.
41. Thomson, N., Bentley, S., Holden, M. & Parkhill, J. (2003). Fitting the niche by genomic adaptation. Nature Rev. Microbiol. 1, 92–93.
42. Morett, E., Korbel, J. O., Rajan, E., Saab-Rincon, G., Olvera, L., Olvera, M. et al. (2003). Systematic discovery of analogous enzymes in thiamin biosynthesis. Nature Biotechnol. 21, 790–795.
43. Eisenberg, D., Marcotte, E. M., Xenarios, I. & Yeates, T. O. (2000). Protein function in the post-genomic era. Nature, 405, 823–826.
44. Mangan, S. & Alon, U. (2003). Structure and function of the feed-forward loop network motif. Proc. Natl Acad. Sci. USA, 100, 11980–11985.
45. Dekel, E. & Alon, U. (2005). Optimality and evolutionary tuning of the expression level of a protein. Nature, 436, 588–592.
46. Elena, S. F. & Lenski, R. E. (2003). Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nature Rev. Genet. 4, 457–469.
47. Teichmann, S. A. & Madan Babu, M. (2004). Gene regulatory network growth by duplication. Nature Genet. 36, 492–496.
48. Conant, G. C. & Wagner, A. (2003). Convergent evolution of gene circuits. Nature Genet. 34, 264–266.
49. Snel, B. & Huynen, M. A. (2004). Quantifying modularity in the evolution of biomolecular systems. Genome Res. 14, 391–397.
50. Yu, H., Greenbaum, D., Xin Lu, H., Zhu, X. & Gerstein, M. (2004). Genomic analysis of essentiality within protein networks. Trends Genet. 20, 227–231.
51. Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature, 411, 41–42.
52. Luscombe, N. M., Babu, M. M., Yu, H., Snyder, M., Teichmann, S. A. & Gerstein, M. (2004). Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 431, 308–312.
53. Balazsi, G., Barabasi, A. L. & Oltvai, Z. N. (2005). Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc. Natl Acad. Sci. USA, 102, 7841–7846.
54. Martinez-Antonio, A., Salgado, H., Gama-Castro, S., Gutierrez-Rios, R. M., Jimenez-Jacinto, V. & Collado-Vides, J. (2003). Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach. Biotechnol. Bioeng. 84, 743–749.
55. Wagner, A. (2005). Robustness and Evolvability in Living Systems.
56. Hittinger, C. T., Rokas, A. & Carroll, S. B. (2004). Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proc. Natl Acad. Sci. USA, 101, 14144–14149.
57. Luscombe, N. M. & Thornton, J. M. (2002). Protein–DNA interactions: amino acid conservation and the effects of mutations on binding specificity. J. Mol. Biol. 320, 991–1009.
58. Mirny, L. A. & Gelfand, M. S. (2002). Structural analysis of conserved base pairs in protein-DNA complexes. Nucl. Acids Res. 30, 1704–1711.
59. Maslov, S., Sneppen, K., Eriksen, K. A. & Yan, K. K. (2004). Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evol. Biol. 4, 9.
60. Makita, Y., Nakao, M., Ogasawara, N. & Nakai, K. (2004). DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucl. Acids Res. 32, D75–D77.
Edited by R. Ebright
(Received 1 December 2005; received in revised form 6 February 2006; accepted 7 February 2006)