1 Comparative Genomics of Pseudomonas
Kristoffer Kiil, Tim T. Binnewies, Hanni Willenbrock, Susse Kirkelund Hansen, Lei Yang, Lars Jelsbak, David W. Ussery, and Carsten Friis
1.1 Introduction
The genus Pseudomonas covers one of the most diverse and ecologically significant groups of bacteria. Members of the genus are found in large numbers in a wide range of environmental niches, such as terrestrial and marine environments, as well as in association with plants and animals. This almost universal distribution of Pseudomonas suggests a remarkable degree of genomic diversity and genetic adaptability.
Pseudomonas aeruginosa is a motile Gram-negative rod-shaped bacterium. It differs from other members of the Pseudomonas genus because of its potential pathogenicity for human beings and other mammals. P. aeruginosa participates in infections in immunocompromised individuals, such as patients suffering from AIDS, cancer, burn wounds and cystic fibrosis (CF) [1–3]. P. aeruginosa infections are normally difficult to eradicate.
P. aeruginosa is a generalist with a diverse metabolic competence. It can utilize simple small molecules as well as complex organic compounds as carbon sources and is capable of growing aerobically as well as anaerobically using NO3 or arginine as respiratory electron acceptors. P. aeruginosa grows over a wide range of temperatures, from common environmental temperatures up to 42 °C, with an optimum temperature at 37 °C.
The first completed genome of P. aeruginosa is that of PAO1, published in 2000 [4]. It is 6.3 Mbp and contains 5570 open reading frames. A significant number (8.4%) of P. aeruginosa genes are predicted to be involved in regulation, which at the time of publication of the genome was the largest fraction of regulators among sequenced bacterial genomes. The large genome size and remarkable gene complexity of P. aeruginosa enable it to adapt and thrive in different environmental conditions and hosts. In addition to PAO1, PA14 as well as nine other clinical P. aeruginosa strains are currently available either as completed genomes or as unfinished sequences. Genome comparison analysis of PAO1 and five different clinical P. aeruginosa isolates (four of them CF isolates) shows that about 80% of the PAO1 genome is conserved among other strains [5]. More than 30 relatively large regions of the PAO1 genome are found to be nonconserved in the five clinical strains. Those regions include phages, the pyoverdine biosynthesis locus, genes encoding a putative type I secretion system and a putative restriction modification system.
P. aeruginosa is an opportunistic pathogen capable of producing a wide variety of virulence factors, including lipopolysaccharides, flagellum, type IV pili, proteases, exotoxins, pyocyanin, exopolysaccharides, type III secretion, etc. Many of the extracellular virulence factors have been shown to be regulated by quorum-sensing signals [6,7]. Biofilm formation can also be regarded as a virulence factor [8]. Acute infections with P. aeruginosa can be life threatening, resulting in severe tissue damage and septicemia (i.e. bloodstream invasion). Although the pathogenicity of P. aeruginosa is typically characterized by a high level of toxin production [9], several important exceptions are known, particularly in certain chronic P. aeruginosa infections. These include infections of CF patients, where P. aeruginosa develops genetic adaptations during long-term persistence, in which virulence factors are normally selected against [10–12]. This shows that P. aeruginosa is capable of adopting distinct strategies for different types of infections.
Another remarkable and unfortunate characteristic of P. aeruginosa is its tolerance to many antimicrobial drugs. It has a number of inherent antibiotic-resistance mechanisms. These include an AmpC β-lactamase that can be induced by β-lactams, which makes it inherently resistant to cephalothin and ampicillin [13]. P. aeruginosa also has efflux pumps such as MexAB–OprM, which reduce its susceptibility to many antibiotics. MexAB–OprM removes β-lactams, chloramphenicol, fluoroquinolones and novobiocin, as well as various dyes and detergents [14]. Finally, it has been shown that through mutation, P. aeruginosa is capable of developing resistance to antibiotics that the strain is not inherently resistant to, such as aminoglycosides and colistin [15].
Pseudomonas. Model Organism, Pathogen, Cell Factory. Edited by Bernd H.A. Rehm. Copyright © 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31914-5
1.1.1 Other Species of Pseudomonas
In addition to P. aeruginosa PAO1, several other Pseudomonas genomes have been sequenced (Table 1.1). P. putida, for instance, is a ubiquitous bacterium frequently isolated from soil (particularly polluted soil), the rhizosphere and water. P. putida is a paradigm of a metabolically versatile saprophytic soil bacterium, and the best characterized strain, KT2440 [16,17], has become a model bacterium worldwide, both in laboratory studies and for the development of biotechnological applications. These include bioremediation of contaminated sites [18,19], biocatalysis for the production of useful chemicals [20], and the potential development of new biopesticides and plant growth promoters that protect the plant rhizosphere. Strain KT2440 is a plasmid-free derivative of the original isolate, designated P. arvilla strain mt-2 [21], the natural host of the archetype TOL plasmid pWW0 [16], and subsequently reclassified as P. putida mt-2 [22]. It has maintained its ability to survive in its natural environment and is the first Gram-negative soil bacterium to have been certified as a biosafety host for expression of foreign genes [23]. The genome of strain KT2440 was sequenced in 2004 [24]; the sequence information is continuously providing new insight into the biology of an adaptable and metabolically versatile group of soil bacteria, and it is facilitating the increased use of this organism for biotechnological purposes.

Table 1.1 Feature overview of currently sequenced Pseudomonas genomes.

Organism                            Genome size (bp)  No. contigs (a)  G+C content (%)  Percent coding (%)  No. genes (b)  Accession        Reference
P. aeruginosa 2192                  6826253           82               66.2             85                  5546           NZ_AAKW00000000  [32]
P. aeruginosa C3719                 6146998           124              66.5             86                  5065           NZ_AAKV00000000  [32]
P. aeruginosa LES                   6601757           2                66.3             80                  5326                            [33]
P. aeruginosa PA7                   6663529           147              66.4             84                  5309           NZ_AAQE00000000  [32]
P. aeruginosa PACS2                 6492423           1                66.3             86                  5317           NZ_AAQW00000000  [32]
P. aeruginosa PAO1                  6264404           –                66.6             89                  5568           AE004091         [32]
P. aeruginosa UCBPP-PA14            6537648           –                66.3             89                  5892           CP000438         [32]
P. entomophila L48                  5888780           –                64.2             88                  5134           CT573326         [32]
P. fluorescens Pf-5                 7074893           –                63.3             88                  6137           CP000076         [32]
P. fluorescens PfO-1                6438405           –                60.5             89                  5736           CP000094         [32]
P. mendocina ymp                    5072807           –                64.7             89                  4594           CP000680         [32]
P. putida F1                        5925059           120              61.9             88                  4735           NZ_AALM00000000  [32]
P. putida GB-1                      6053195           30               62.0             90                  4890           NZ_AAXR00000000  [32]
P. putida KT2440                    6181863           –                61.5             86                  5350           AE015451         [32]
P. putida W619                      5748550           62               61.5             89                  4624           NZ_AAVY00000000  [32]
P. stutzeri A1501                   4567418           –                63.9             89                  4128           CP000304         [32]
P. syringae pv. phaseolicola 1448A  5928787           –                57.9             84                  4984           CP000058         [32]
P. syringae pv. syringae B728a      6093698           –                59.2             87                  5089           CP000075         [32]
P. syringae pv. tomato str. DC3000  6397126           –                58.3             85                  5470           AE016853         [32]

(a) The numbers correspond to the number of contigs for unassembled genomes, while a dash signifies a completely assembled genome.
(b) For assembled genomes this is the number of proteins annotated in the official NCBI release; for the unassembled genomes this is the number of predicted genes from the EasyGene runs.

P. syringae is a plant pathogen that can infect a wide range of plant species [25]. P. syringae strains are assigned to different pathovars on the basis of their distinctive pathogenicity and ability to infect different plant species. More than 50 different pathovars exist, and completed genome sequences of P. syringae strains from three different pathovars (tomato, syringae and phaseolicola) are available for analysis [26–28] (Table 1.1). P. syringae pv. tomato is the causal agent of bacterial speck disease on tomato and Arabidopsis, while P. syringae pv. phaseolicola and P. syringae pv. syringae cause halo blight and brown spot disease on bean plants, respectively.
Other Pseudomonas species are nonpathogenic plant-associated bacteria that exhibit plant growth-promoting properties [29]. For example, P. fluorescens Pf-5 and P. fluorescens PfO-1 are commensal bacteria that colonize plant surface environments and produce various secondary metabolites that suppress the growth of soil-borne plant pathogens. These bacteria have potential roles as biological disease control agents in agricultural settings. P. stutzeri strain A1501 is able to colonize and infect rice roots, and is widely used as a rice inoculant in China [30]. It can fix nitrogen and may provide rice plants with fixed nitrogen and hence promote plant growth.
P. entomophila is an entomopathogenic bacterium that is highly pathogenic for a variety of insects from different orders [31].
1.1.2 Obtaining Sequence Data on Pseudomonas
In total, 19 Pseudomonas genomes were downloaded from online resources: 18 from the National Center for Biotechnology Information (NCBI) [32] and one produced by the Pathogen Sequencing Group at the Sanger Institute [33]. An overview of the genomes is given in Table 1.1. Of the 19 genomes, 11 were completely sequenced and assembled at the time of download, and were obtained with the standard NCBI annotation describing the location of genes within the genomes. The remaining eight genomes were still under assembly at the time and thus had little in the way of official annotation. To identify the position of genes in these genomes we applied the EasyGene [34] method for gene finding in bacterial genomes.
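The basic per-genome features reported in Table 1.1 (genome size, number of contigs and G+C content) can be recomputed from a downloaded FASTA file. The following is a minimal sketch, not the procedure used for the table; the function names and the toy input are hypothetical, and percent coding and gene counts are not covered because they depend on the annotation or on a gene finder.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def genome_summary(handle):
    """Return total length, number of records (contigs) and G+C percentage."""
    records = list(read_fasta(handle))
    total = sum(len(seq) for _, seq in records)
    gc = sum(seq.count("G") + seq.count("C") for _, seq in records)
    return {
        "size_bp": total,
        "contigs": len(records),
        "gc_percent": round(100.0 * gc / total, 1) if total else 0.0,
    }


# Toy input standing in for a downloaded genome file (hypothetical data).
toy = StringIO(">contig1\nGGCCATGC\n>contig2\nGGCC\n")
summary = genome_summary(toy)
```

For a real genome the `StringIO` object would be replaced by an open file handle; a completely assembled chromosome yields a single record, which Table 1.1 shows as a dash.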
1.2 Pan/Core Genome of Pseudomonas
With an increasing number of fully sequenced Pseudomonas genomes available, it becomes possible to investigate the genetic characteristics that define the Pseudomonas genus. For instance, what fraction of the genome is conserved throughout all sequenced Pseudomonas?
The "core genome" of Pseudomonas is defined as containing those genes that are present in all strains of Pseudomonas, i.e. the minimum set of genes required for a bacterium to be considered part of the Pseudomonas genus. In contrast, the "pan-genome" is defined as containing any gene present in any strain of Pseudomonas. A histogram of the sizes of the core and pan-genomes of Pseudomonas, similar to that previously done for Escherichia coli [35], is shown in Figure 1.1. Figure 1.1 was constructed by continuously adding to a list of genes, beginning with the whole genome of P. aeruginosa 2192 and then examining other Pseudomonas genomes, adding any genes not already present in the list, in an order based on evolutionary distance to P. aeruginosa 2192. To determine whether a given gene is present in an organism or not, the "50/50" rule was applied to a BLAST alignment [36] at the protein level. The "50/50" rule states that to be considered conserved across two organisms, at least 50% of the length of the protein should show a minimum of 50% sequence identity. The number of genes not present in any of the previous strains is shown for each strain with a black column. Another column shows the number of new gene families, illustrating that the new genes are not just functional orthologs, but indeed do add new functionality.

Figure 1.1 Pseudomonas pan/core genome. The core genome is defined as comprising all genes present in all sequenced genomes of Pseudomonas, while the pan-genome encompasses all genes present in any strain. The figure shows how the number of genes in the core genome declines and the number in the pan-genome increases with each newly added sequenced Pseudomonas genome. Each genome is represented by two bars: one solid, giving the number of new proteins not found in any of the previous strains (i.e. any of the strains to the left), and one light grey, representing the number of unique protein families. The genomes are presented in an order roughly equal to evolutionary distance from P. aeruginosa 2192.

While the core genome is obviously bounded in size, the pan-genome is theoretically infinite or at least can be considered to be very large. As more and more strains within the P. aeruginosa species are sequenced and added to Figure 1.1, the number of genes in the core genome slowly drops until it reaches what appears to be a stable level after four or five genomes have been added; this level describes the P. aeruginosa core genome. When one begins to add non-P. aeruginosa genomes to the plot, the core genome starts to drop again, now stabilizing at a significantly lower level, which describes the core genome for the whole Pseudomonas genus. Of course, the exact values for the sizes of both the P. aeruginosa and the Pseudomonas genus core genomes are subject to change as even more fully sequenced Pseudomonas genomes become available. In particular, the genus core genome is almost certain to drop further as strains from lineages not yet represented are sequenced.
Analysis of the pan/core genomics of an organism is of great importance for describing at the genetic level what constitutes a taxonomic group, e.g. a "Pseudomonas bacterium". There is a considerable difference (about 10-fold) in size between the core and pan-genomes, and even just within P. aeruginosa the pan-genome is twice the size of the core genome, with many of the genes unique to one particular strain not being merely functional orthologs. While perhaps a trivial observation, it nevertheless has a great impact on determining whether a given gene exists in Pseudomonas or not. One should be careful before making statements to that effect, or at the very least take the trouble to specify precisely in which particular isolate one believes the gene to be present or absent. The genetic diversity of bacteria is quite vast even at the species level and much more so at the genus level.
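The "50/50" rule and the stepwise accumulation behind a pan/core plot can be sketched in a few lines. This is an illustrative simplification, not the pipeline used for Figure 1.1: it assumes that coverage is measured against the query protein length, that identity is measured over the aligned region, and that each genome has already been reduced to a set of gene-family identifiers (in the chapter, presence is decided by BLAST against the growing gene list). All names and the toy data are hypothetical.

```python
def is_conserved(aligned_length, identical_positions, query_length):
    """Apply the '50/50' rule to a single protein-level BLAST hit.

    Assumption: coverage = aligned_length / query_length and
    identity = identical_positions / aligned_length.
    """
    if query_length == 0 or aligned_length == 0:
        return False
    coverage = aligned_length / query_length
    identity = identical_positions / aligned_length
    return coverage >= 0.5 and identity >= 0.5


def pan_core_curve(genomes):
    """Accumulate pan- and core-genome sizes over an ordered list of genomes.

    `genomes` is a list of (name, set_of_gene_family_ids) in the order in
    which genomes are added to the plot. Returns one row per genome:
    (name, new_families, pan_size, core_size).
    """
    pan, core, rows = set(), None, []
    for name, families in genomes:
        new = families - pan                 # families not seen in earlier genomes
        pan |= families                      # pan-genome can only grow
        core = set(families) if core is None else core & families  # core can only shrink
        rows.append((name, len(new), len(pan), len(core)))
    return rows


# Toy data: three genomes described by integer family identifiers.
toy = [("A", {1, 2, 3, 4}), ("B", {1, 2, 3, 5}), ("C", {1, 2, 6})]
rows = pan_core_curve(toy)
```

The order of the input list matters for the per-genome counts of new families, which is why the chapter fixes an order based on evolutionary distance from the starting strain; the final pan and core sizes do not depend on the order.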
1.3 Phylogeny of Pseudomonas
In comparative genomics, phylogeny is almost always a principal component of the analysis, either as the actual subject of study or as the underlying structure that explains most of the data. This also holds for the study of pan-genomics. There are several ways to assess the phylogeny of a given group of strains or species. One of the oldest and most common is to compare the sequences of the 16S ribosomal RNA (rRNA) genes. Figure 1.2 is an example of such a tree.
The tree in Figure 1.2 was made by first extracting the 16S rRNA genes using the hidden Markov model-based RNAmmer program [37], then aligning them using ClustalW [38]. The phylogeny was reconstructed from the multiple alignment using the BIONJ algorithm [39]. This last step was performed using the phylowin program [40]. To get a sense of the robustness of the tree it was also bootstrapped. From the tree we can derive some simple conclusions. The four P. aeruginosa strains are very closely related and it seems that P. aeruginosa is a more "well-defined" species than many of the other Pseudomonas, at least from a 16S perspective. Also, P. stutzeri and P. mendocina seem to be quite distinct from the other species, while P. entomophila L48 is quite closely related to the P. putida strains. In fact, the 16S rRNA of P. putida W619 is less similar to that of the other P. putida strains than that of P. entomophila L48 is. Even more remarkably, while P. fluorescens PfO-1 has a distance to the other species that is comparable to that seen among the other Pseudomonas species, P. fluorescens Pf-5 is so similar to the P. syringae strains that it might be mistaken for one if the taxonomists were not there to say otherwise.

Figure 1.2 16S phylogenetic tree of the Pseudomonas genus. The tree is based on similarities in the sequence of the 16S rRNA and shows the evolutionary distance between the strains. Study of the 16S rRNA is a common way of establishing evolutionary distance, but it is not the only way, nor is it guaranteed to give the best results.

When confronted with this kind of data, the proper question to ask is whether the 16S rRNA comparison actually tells us what we want to know. To address this question we turn to a comparison of the σ factors of our Pseudomonas strains. The σ factors, the essential ones at least, happen to have some of the same traits that make the 16S rRNA a good phylogenetic marker: they interact with numerous genes, making changes relatively infrequent and uptake by horizontal gene transfer improbable; also, the housekeeping σ factor σ70 is ubiquitous.
Studying the RpoD subtree in Figure 1.3, we see again that the two P. fluorescens strains cluster with the P. syringae strains and that P. entomophila L48 clusters with the P. putida strains. We also see, however, that the resolution is very poor on the RpoD subtree. If we instead look at the FliA and RpoH clades, which have higher resolution, we see that the two P. fluorescens strains and P. entomophila L48 split out from the P. syringae and P. putida clades, although they still seem to be closely related.
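Distance-based tree methods such as BIONJ start from a matrix of pairwise distances computed on the multiple alignment. As a hedged illustration of that first step only (it does not implement BIONJ, bootstrapping or any correction for multiple substitutions), the sketch below computes the uncorrected p-distance, i.e. the fraction of differing sites, between aligned sequences. The strain names and sequences are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return a symmetric dict {(name1, name2): distance} for an alignment."""
    names = sorted(alignment)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Toy alignment (hypothetical sequences, not real 16S rRNA data).
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAAC-TTT",
}
matrix = distance_matrix(toy_alignment)
```

The same function applies unchanged to aligned protein sequences, such as the σ factors discussed above, since it only compares characters column by column.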
1.4 Blast Atlas of Pseudomonas Genomes
The "Blast atlas" is a visualization method that shows a vast amount of data in one plot by taking a reference genome and mapping the conservation of each protein-encoding gene along the chromosome. An example of a Blast atlas is given in Figure 1.4, using P. aeruginosa UCBPP-PA14 as the reference genome, compared to 18 other Pseudomonas genomes, as well as the UniProt database (outer circle in black). The atlas is constructed largely as described previously [41], with the sole refinement being the introduction of a continuous color scale to more accurately describe the BLAST hits.
As for Figure 1.1, we used the "50/50" rule (see above) as a requirement for what constitutes a conserved protein. However, in this case, since we are visualizing BLAST results along a sequence, it is possible to display more information from the BLAST report. In this context, any protein that is well conserved across the Pseudomonas genus will appear as a strongly colored band on the atlas, while proteins that are weakly conserved, or present only in the reference strain and absent in the other proteomes, will result in weak bands or outright gaps in the alignment circles. However, because the BLAST alignments are done at the protein level, gaps will also appear for genomic regions that are not translated, e.g. rRNA islands. Thus, while weak and strong bands can be interpreted directly, it is prudent to cross-reference any complete gap with the genomic annotations, which are also given in Figure 1.4, before drawing any conclusions either way.

Figure 1.3 Phylogenetic tree of the Pseudomonas genus based on σ factor similarities. As σ factors generally show high degrees of conservation, and σ70 in particular is also ubiquitous, they are useful, like the 16S rRNA, as a measure of evolutionary distance. While the σ factor tree can be said to be generally neither more nor less accurate than the 16S tree of Figure 1.2, it provides an alternate view.

Figure 1.4 Blast atlas of P. aeruginosa UCBPP-PA14 versus 18 other Pseudomonas strains. The innermost circles show properties related to the base composition of the UCBPP-PA14 genome as well as the locations of the genes. Starting from the red colored rings and moving outwards, the colors represent BLAST identities to other Pseudomonas genomes. In general, the evolutionary distance from UCBPP-PA14 increases as one moves from the innermost circles to the outermost, e.g. the bright red circles all represent various P. aeruginosa strains, while the blue show alignments to P. putida strains, etc. The strength of the color bands in each circle indicates the degree of conservation in the other strains of that area of the UCBPP-PA14 genome. BLAST alignments are performed at the amino acid level and only for proteins. Thus, anything that is not translated (e.g. rRNA) has not been aligned and is colored grey. A larger and zoomable version of the atlas is available online: http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/zoomatlas/.

The innermost circle is the AT content, with the turquoise color indicating that this genome is GC rich. In this circle, deep red marks regions that are more than three standard deviations more AT-rich than the chromosomal average, and dark turquoise similarly marks GC-rich regions in the chromosome. The second innermost circle is the GC skew (the bias of Gs towards one strand or the other, averaged over a 10 000-bp window). The third circle shows the global repeats, which represent the best match of a 100-bp piece of DNA, centered at a given position in the chromosome, when searched against the entire chromosome. The fourth circle represents the genes, with blue for protein-encoding genes oriented clockwise and red for genes on the other strand (counterclockwise). The three lanes before the individual proteomes start are DNA structural features, based on the physicochemical properties of the DNA helix. From innermost to outermost they are the "Position Preference" (a measure related to the rigidity of the DNA [42]), the "Stacking Energy" (which states the energy needed to de-stack, i.e. melt, the DNA [43]) and, finally, the "Intrinsic curvature" (which describes the preference of the DNA helix towards bending) [44]. It is also possible to zoom into specific regions of interest, and a web-based "zoomable atlas" for P. aeruginosa UCBPP-PA14 can be found online, as well as for several other bacterial genomes (http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/zoomatlas/).
One characteristic that distinguishes pathogens from nonpathogens is the presence of pathogenic gene islands. Since the Blast atlas is adept at identifying genetic regions in the reference genome that show either an abnormally high or low degree of conservation, it should be able to pick up on the presence of pathogenic islands. For this reason, the P. aeruginosa UCBPP-PA14 strain was chosen to serve as the reference genome, as this strain is famous for its high degree of virulence and is known to be much more virulent than, for example, P. aeruginosa PAO1 [45].
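The GC skew lane of the atlas can be approximated with a simple windowed calculation. The sketch below is an assumption-laden illustration: it uses the common definition skew = (G - C) / (G + C) and non-overlapping windows, whereas the atlas software may use a sliding window or a different normalization. The function name and the toy sequence are hypothetical.

```python
def gc_skew(sequence, window=10_000, step=10_000):
    """Return a list of (window_start, skew) pairs along a sequence.

    Assumption: skew = (G - C) / (G + C) within each window; windows
    without any G or C are reported as 0.0.
    """
    seq = sequence.upper()
    out = []
    # max(..., 1) ensures at least one (possibly short) window is reported.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        out.append((start, (g - c) / (g + c) if g + c else 0.0))
    return out


# Toy sequence with a G-rich first half and a C-rich second half.
toy_skew = gc_skew("GGGCCCCG", window=4, step=4)
```

On a real chromosome the sign of the skew typically changes near the origin and terminus of replication, which is why this lane is useful for orienting the atlas.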
1.4.1 Region 5 243 000–5 361 000
The atlas of UCBPP-PA14 reveals many regions of potential interest. For example, a large region around 5250 kbp shows only partial similarity to other P. aeruginosa genomes and practically no similarity to more distantly related Pseudomonas (see the zoom in Figure 1.5). The region also shows some interesting structural properties, and an examination of the annotations reveals that the majority of the genes present are located on the direct strand. The region is the previously described pathogenicity island PAPI-1, which is a cluster of more than 100 genes [46]. Some of the genes within PAPI-1 are homologous to known genes with virulence functions in other human and plant bacterial pathogens, and mutations in a number of these PAPI-1 genes result in the attenuation of PA14 virulence in both plant and animal infection models [46]. PAPI-1 is located next to a tRNA(Lys) gene, which presumably provides an attachment site for integration of the island after its acquisition.

Figure 1.5 Zoom on a large region around 5250 kbp of the Blast atlas of UCBPP-PA14 shown in Figure 1.4. Each of the linear colored bars in this plot reflects one of the circles in the circular atlas; for further explanation of the colors, refer to the description in Figure 1.4 and the description in the main text. This region shows little similarity to other P. aeruginosa genomes and less similarity to more distantly related Pseudomonas. The region is the previously described pathogenicity island PAPI-1, which is a cluster of more than 100 genes, some of which are homologous to known genes with virulence functions in other human and plant bacterial pathogens.

It was recently demonstrated experimentally that PAPI-1 is a mobile genetic element [47]. The island can be excised from the PA14 genome, transferred and integrated into the genomes of P. aeruginosa strains such as PAO1 and other strains that do not harbor the island naturally [47]. Although the molecular mechanism of PAPI-1 transfer remains incompletely understood, PAPI-1 carries several genes predicted to encode proteins that could function in a conjugative transfer mechanism [47]. As the PAPI-1 island carries several virulence determinants and is capable of spreading among P. aeruginosa strains, it could potentially contribute to the evolution of variants with enhanced pathogenicity due to an increased virulence gene repertoire.
Hybridization studies and microarray genomotyping of P. aeruginosa strains of diverse environmental origin have shown the presence of PAPI-1 sequences in a significant fraction of the strains, although most of these isolates appear to carry only a portion of the island [45,46]. This pattern of PAPI-1 sequence distribution, which is also evident in Figure 1.5, is probably due to progressive accumulation of mutations or deletions of genes that are either unnecessary, deleterious or provide no selective advantage to the recipient after acquisition of the island.
1.4.2 Region 713 000–785 000
Another region revealed by the Blast atlas, this time for its high degree of conservation, is found around 713 kbp (see the zoom in Figure 1.6). This region contains 36 genes involved in "Translation, ribosomal structure and biogenesis". These are mainly ribosomal proteins. The region also contains a ribosomal RNA operon and several tRNAs. The region obviously plays an important role in transcription and translation in the organism, and the high degree of conservation is understandable.
Another observation in line with the biological significance of this island is found if one looks at the bars displaying the structural parameters: "Intrinsic curvature", "Stacking Energy" and "Position Preference". These not only reveal the unusual base composition of the rRNA genes (which must in turn give rise to unusual DNA structures), but also show characteristic structures in the intergenic regions between the genes. These are related to promoter activity, since the DNA must destack and unwind for the RNA polymerase to bind and initiate transcription. The characteristic structures facilitate this, and are indicators of strong promoters and high levels of expression for the associated genes.

Figure 1.6 Zoom on a genetic island around 750 kbp of the Blast atlas of UCBPP-PA14 shown in Figure 1.4. Each of the linear colored bars in this plot reflects one of the circles in the circular atlas; for further explanation of the colors, refer to the description in Figure 1.4 and the description in the main text. The island is identified by its high degree of conservation across the Pseudomonas genus, and contains 36 genes related to translation, ribosomal structure and biogenesis. The region also contains an rRNA operon (visible by the light blue bars in the annotation line) as well as several tRNAs. Notice that the apparent lack of conservation for the rRNA operon is an artifact of the BLAST alignment (which is only performed for translated sequences) and not an accurate representation of the biology.

1.5 Functional Categories
Are there differences, on a broad level, between the distributions of functional genes in the various species of Pseudomonas? This question can now be evaluated based on the set of genomes sequenced to date. For each genome, the fraction of genes in each of several different functional categories was determined. A bar diagram of the distribution of Clusters of Orthologous Groups (COG) functional category assignments [48], obtained from the Integrated Microbial Genomes (IMG) system [49] for Pseudomonas, is shown in Figure 1.7. At the time of writing, the IMG contained assignment information on 16 of the 19 genomes given in Table 1.1. The number of genes with an assignment varied from 3700 (the lowest) for P. mendocina ymp to 4811 (the highest) for P. aeruginosa 2192. In general, the COG assignments cover roughly 75% of the proteins for each organism in Table 1.1.
As can be seen from Figure 1.7, the great majority of the assigned genes (around 90%) fall into a category with a described function, while approximately 10% have been labeled function unknown. (An assignment of function unknown is different from a protein not having a COG assignment. A protein assigned function unknown is known to exist, although its function is unknown. A protein having no assignment might not be a protein at all!) On the whole the organisms show a high degree of functional similarity at this broad level, although a slightly larger difference is observed when comparing across species than within species. Based on these results, there appear to be no major differences in the broad distribution of functional categories across the Pseudomonas genomes.

Figure 1.7 The percentage of proteins assigned to COG categories by organism. The figure shows data for the 16 strains of Pseudomonas found in the IMG database (see text). Although on the whole the functional breakdown of the various strains is very similar, the similarity appears slightly higher within each species than when comparing across species.
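The per-genome percentages plotted in a figure of this kind can be derived from a table of protein-to-category assignments. The sketch below is a minimal illustration under stated assumptions: each protein carries at most one COG category letter, proteins without an assignment are stored as None and excluded from the denominator, and the input dictionary is hypothetical rather than an IMG export.

```python
from collections import Counter


def cog_distribution(assignments):
    """Percentage of COG-assigned proteins in each functional category.

    `assignments` maps protein id -> single-letter COG category, or None
    when the protein has no COG assignment. Unassigned proteins are left
    out, so the percentages sum to 100 over the assigned proteins only.
    """
    counts = Counter(cat for cat in assignments.values() if cat is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in sorted(counts.items())}


# Toy genome: J = translation, K = transcription, S = function unknown.
toy_assignments = {"p1": "J", "p2": "J", "p3": "K", "p4": "S", "p5": None}
distribution = cog_distribution(toy_assignments)
coverage = 100.0 * sum(c is not None for c in toy_assignments.values()) / len(toy_assignments)
```

Keeping the coverage (the share of proteins with any assignment) separate from the category percentages mirrors the distinction drawn above between "function unknown" and "no assignment".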
1.6 Codon Usage and Expression
Elucidation of the physical and biological properties of highly expressed genes as wellas the examination of codon usage preferences have been addressed by a numberof studies [50–52]. Figure 1.8 demonstrates how the average codon usage in
Figure 1.8 Codon usage in P. aeruginosa PAO1. Genomic codonusage (blue) versus highly expressed genes (red).
1.6 Codon Usage and Expression j17
P. aeruginosa PAO1 differs from the codon usage preference of highly expressedgenes. This so-called codon usage bias represents an evolutionary strategy tomodulate gene expression and hencemathematical formulations of the codon usagebias have been widely used to predict gene expression on a genomic scale [1].While the first genomes examined in this way – E. coli and Saccharomyces cerevisiae
– provided strong evidence of high translational codon usage bias [52], recent studiesreport bacterial genomes with little codon usage bias [53,54], often these species havegenomes with extreme AT or GC content. Thus, the dominating codon bias inP. aeruginosa PAO1 is not translational. Here, the codon usage bias is much morecorrelated with GC content than with translational bias [54]. However, a translationalcodon adaptation index (tCAI) may still be useful for estimation of gene expressionlevels and prediction of highly expressed genes in P. aeruginosa PAO1 (Figure 1.9).The codon bias in other Pseudomonas species may be more or less translational thanin PAO1 (Table 1.2).Obviously, the tCAI measure is only able to predict highly expressed proteins
(translated genes) since this measure is based on codon usage bias. Therefore,this method does not consider tRNAs, ribosomal RNAs and other noncodingRNAs. However, transcription of DNA is also highly influenced by DNA stackingand flexibility. Consequently, on a more global scale, gene expression may beregulated by specific promoters that are sensitive to DNA structural properties [2,55].For example, the �Position Preference� measure (Figure 1.10) is a DNA structuralmeasure that was originally derived for eukaryotes using chicken DNA and is atrinucleotide model of nucleosome-positioning patterns. It reflects the preference ofa given trinucleotide for being found in a region where the DNAminor groove faces
Figure 1.9 Venn diagrams of the 10% most highly expressedgenes in P. aeruginosa PAO1 as identified by several methods. ThetCAI and dCAI circles represent the predictions made by thetranslational and dominating CAIs, respectively, while PP is the�Position Preference� trinucleotide model and Expr is measuredmicroarray expression values. The figure is modified fromRefs. [1,2].
Table 1.2 Correlations between translational CAI (tCAI) and dominating CAI (dCAI).

Organism                 Correlation tCAI versus dCAI
P. aeruginosa PAO1       0.42484871
P. entomophila L48       0.52157109
P. fluorescens Pf-5      0.04194573
P. fluorescens PfO-1     0.49853369
P. putida KT2440         0.76612008
P. syringae 1448A        0.77669291
P. syringae B728a        0.65205812
P. syringae DC3000       0.64478432
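Both indices compared in Table 1.2 are codon adaptation indices. As a rough illustration, the following Python sketch computes a CAI in the sense of Sharp and Li [52]: each codon receives a weight relative to the most frequent synonymous codon in a reference gene set, and the CAI of a gene is the geometric mean of these weights. The function names, the toy sequences and the 0.5 pseudocount for unseen codons are assumptions made for this example. The choice of reference set, which is what would distinguish a translational from a dominating CAI, is left to the caller and is not shown.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(seq):
    """Split an in-frame coding sequence into triplets (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon = its count / count of the best synonymous codon in the reference set."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa].append(codon)
    weights = {}
    for aa, synonymous in by_aa.items():
        if aa == "*" or len(synonymous) == 1:
            continue  # stop codons, Met and Trp carry no information about codon choice
        best = max(counts[c] for c in synonymous)
        for c in synonymous:
            # Assumption: a pseudocount of 0.5 keeps unseen codons from forcing the CAI to zero.
            weights[c] = max(counts[c], 0.5) / max(best, 0.5)
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over all informative codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set (hypothetical): leucine is encoded three times by CTG and once by CTA.
toy_weights = relative_adaptiveness(["CTGCTGCTGCTA"])
print(cai("CTGCTG", toy_weights))  # gene using only the preferred codon
print(cai("CTGCTA", toy_weights))  # gene mixing preferred and rarer codons
```

Given CAI values for every gene under two different reference sets, a per-genome correlation such as those in Table 1.2 could then be computed with any standard correlation routine.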
Figure 1.10 Expression atlas of P. aeruginosa PAO1. The figure shows a list of measurements relevant for expression in relation to genomic position. From the top (outer ring) and down: the averages of expression values for each gene obtained from microarray experiments; the "Position Preference" of the DNA towards chromatin binding (i.e. a measure of DNA flexibility); estimated values for the tCAI. See text for further explanation of methods.
either towards or away from the nucleosome histone core [42]. When absolute values are used, the position preference value also becomes a measure of anisotropic DNA flexibility. Consequently, the "Position Preference" measure also describes a more general structural property of DNA, i.e. how easily it can be wrapped around chromatin proteins. As a result, "Position Preference" can be used for prediction of highly expressed genes. This makes sense because regions of DNA that are not condensed into chromatin are more accessible to the RNA polymerase. Although the "Position Preference" measure is less efficient than the tCAI measure in predicting highly expressed genes in P. aeruginosa PAO1 (see figure in [2]), chromosomal regions with low "Position Preference" do seem to coincide with regions rich in highly expressed genes, in particular the regions containing the ribosomal RNAs (Figure 1.10).
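A trinucleotide measure of this kind is typically turned into a profile along the chromosome by looking up a value for every overlapping trinucleotide, taking its absolute value and averaging over a sliding window. The Python sketch below illustrates only that procedure. The values in TOY_VALUES are made-up placeholders and are not the published values of Ref. [42]; the window length and the function name are likewise assumptions. Strand symmetry between a trinucleotide and its reverse complement is ignored here.

```python
def position_preference_profile(seq, tri_values, window=5):
    """Mean absolute trinucleotide value in each sliding window along seq.

    tri_values maps trinucleotides to signed preference values;
    trinucleotides missing from the table contribute 0.0.
    """
    seq = seq.upper()
    per_site = [abs(tri_values.get(seq[i:i + 3], 0.0)) for i in range(len(seq) - 2)]
    return [sum(per_site[i:i + window]) / window
            for i in range(len(per_site) - window + 1)]


# Placeholder values for illustration only (hypothetical, not taken from the literature).
TOY_VALUES = {"AAA": -0.3, "TTT": -0.3, "GCC": 0.4, "GGC": 0.4}

profile = position_preference_profile("AAAGCC", TOY_VALUES, window=2)
print(profile)
```

Under the interpretation given in the text, windows with low averages would correspond to regions that are less readily wrapped around chromatin proteins and hence more accessible.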
1.7 Future Outlook
Only a few years ago, sequencing an entire Pseudomonas genome in one contiguous piece would literally cost millions of dollars and take more than a year to complete, with an additional year or two to annotate it and write up the manuscript. At the time of writing, it is possible to sequence a Pseudomonas genome for less than US$10 000, taking less than a day to finish, with an overnight run on the computers to annotate it. To put things in perspective, if the goal of sequencing a human genome for US$1000 or less is achieved, as it likely will be within a year or two [56,57], then sequencing a Pseudomonas genome will cost in the range of US$2 (a Pseudomonas genome of about 6 Mbp is roughly 500 times smaller than the human genome), i.e. it will take perhaps less time and money than a cup of coffee at a nice restaurant! As the cost of sequencing genomes falls, it will become possible to routinely run projects that, instead of obtaining the sequence of just one particular genome from a well-characterized strain, obtain multiple genomes (perhaps hundreds or even thousands) from entire strain collections. This revolution has the potential for a fantastic explosion of data (and perhaps also an explosion of understanding) in Pseudomonas genomics. However, there is also the potential for difficult times for many older traditional microbiologists, who have spent the past several decades studying their "favorite gene" in one particular Pseudomonas strain and who might not be prepared for so much data; indeed, too much data for many microbiologists. This need not be the case, however, since there is now more than ever a need for knowledge of traditional methods, including physiology, in order to put the pieces back together again, in a sense, and understand the whole (very complicated) system.
The challenge in the future will be to develop robust new methods to deal with the enormous amount of valuable sequence information. The beginnings of a new field of study can already be seen: it is now possible to estimate the "pan-genome" and "core genome" of several different Pseudomonas genomes, as has been hinted at in this chapter. Perhaps in the future, comparative genomics will move towards comparative "pan-genomics", where one compares the "pan-genome" and "core genome" of species (or other taxonomic groups) against each other, to determine which genes seem to be commonly conserved within groups or ecological niches and which sets of "additional genes" are often found in different environments.
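In set terms, the pan-genome of a group of genomes is the union of their gene families and the core genome is the intersection. The Python sketch below shows this bookkeeping, together with how both sizes change as genomes are added one at a time. It assumes that genes have already been clustered into families upstream (for example by sequence similarity); the strain names and family identifiers are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan-genome, core genome) for a dict mapping genome name -> set of gene families."""
    families = [set(f) for f in genomes.values()]
    if not families:
        return set(), set()
    pan = set().union(*families)          # families found in at least one genome
    core = set.intersection(*families)    # families found in every genome
    return pan, core


def accumulate(genomes, order):
    """Pan- and core-genome sizes after adding each genome in the given order."""
    pan, core, rows = set(), None, []
    for name in order:
        fam = set(genomes[name])
        pan |= fam
        core = fam if core is None else core & fam
        rows.append((name, len(pan), len(core)))
    return rows


# Toy input (hypothetical strains and gene family identifiers).
toy = {
    "strainA": {"fam1", "fam2", "fam3"},
    "strainB": {"fam1", "fam2", "fam4"},
    "strainC": {"fam1", "fam5"},
}

pan, core = pan_and_core(toy)
accessory = pan - core  # the "additional genes" not shared by all genomes
print(sorted(pan), sorted(core), sorted(accessory))
print(accumulate(toy, ["strainA", "strainB", "strainC"]))
```

Comparing two taxonomic groups in the "pan-genomics" sense would then amount to running the same set operations on the pan- and core genomes of each group.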
References
1 Willenbrock, H., Friis, C., Friis, A.S. and Ussery, D.W. (2006) An environmental signature for 323 microbial genomes based on codon adaptation indices. Genome Biol, 7, R114.
2 Willenbrock, H. and Ussery, D.W. (2007) Prediction of highly expressed genes in microbes based on chromatin accessibility. BMC Mol Biol, 8, 11.
3 Lyczak, J.B., Cannon, C.L. and Pier, G.B. (2000) Establishment of Pseudomonas aeruginosa infection: lessons from a versatile opportunist. Microbes Infect, 2, 1051–1060.
4 Stover, C.K., Pham, X.Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., Brinkman, F.S., Hufnagle, W.O., Kowalik, D.J., Lagrou, M. et al. (2000) Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature, 406, 959–964.
5 Klockgether, J., Würdemann, D., Wiehlmann, L., Binnewies, T.T., Ussery, D.W. and Tümmler, B. (2008) Genome diversity of Pseudomonas aeruginosa, in Pseudomonas: Genetics and Molecular Biology (ed. P. Cornelis), Horizon Press, Norwich, in press.
6 Brint, J.M. and Ohman, D.E. (1995) Synthesis of multiple exoproducts in Pseudomonas aeruginosa is under the control of RhlR-RhlI, another set of regulators in strain PAO1 with homology to the autoinducer-responsive LuxR-LuxI family. J Bacteriol, 177, 7155–7163.
7 Pearson, J.P., Pesci, E.C. and Iglewski, B.H. (1997) Roles of Pseudomonas aeruginosa las and rhl quorum-sensing systems in control of elastase and rhamnolipid biosynthesis genes. J Bacteriol, 179, 5756–5767.
8 Van Alst, N.E., Picardo, K.F., Iglewski, B.H. and Haidaris, C.G. (2007) Nitrate sensing and metabolism modulate motility, biofilm formation, and virulence in Pseudomonas aeruginosa. Infect Immun, 75, 3780–3790.
9 Furukawa, S., Kuchma, S.L. and O'Toole, G.A. (2006) Keeping their options open: acute versus persistent infections. J Bacteriol, 188, 1211–1217.
10 Smith, E.E., Buckley, D.G., Wu, Z., Saenphimmachak, C., Hoffman, L.R., D'Argenio, D.A., Miller, S.I., Ramsey, B.W., Speert, D.P., Moskowitz, S.M. et al. (2006) Genetic adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. Proc Natl Acad Sci USA, 103, 8487–8492.
11 Lee, B., Haagensen, J.A., Ciofu, O., Andersen, J.B., Hoiby, N. and Molin, S. (2005) Heterogeneity of biofilms formed by nonmucoid Pseudomonas aeruginosa isolates from patients with cystic fibrosis. J Clin Microbiol, 43, 5247–5255.
12 Jelsbak, L., Johansen, H.K., Frost, A.L., Thogersen, R., Thomsen, L.E., Ciofu, O., Yang, L., Haagensen, J.A., Hoiby, N. and Molin, S. (2007) Molecular epidemiology and dynamics of Pseudomonas aeruginosa populations in lungs of cystic fibrosis patients. Infect Immun, 75, 2214–2224.
13 Livermore, D.M. (1995) β-Lactamases in laboratory and clinical resistance. Clin Microbiol Rev, 8, 557–584.
14 Poole, K. (2001) Multidrug efflux pumps and antimicrobial resistance in Pseudomonas aeruginosa and related organisms. J Mol Microbiol Biotechnol, 3, 255–264.
15 Livermore, D.M. (2002) Multiple mechanisms of antimicrobial resistance in Pseudomonas aeruginosa: our worst nightmare? Clin Infect Dis, 34, 634–640.
16 Franklin, F.C., Bagdasarian, M., Bagdasarian, M.M. and Timmis, K.N. (1981) Molecular and functional analysis of the TOL plasmid pWWO from Pseudomonas putida and cloning of genes for the entire regulated aromatic ring meta cleavage pathway. Proc Natl Acad Sci USA, 78, 7458–7462.
17 Regenhardt, D., Heuer, H., Heim, S., Fernandez, D.U., Strompl, C., Moore, E.R. and Timmis, K.N. (2002) Pedigree and taxonomic credentials of Pseudomonas putida strain KT2440. Environ Microbiol, 4, 912–915.
18 Dejonghe, W., Boon, N., Seghers, D., Top, E.M. and Verstraete, W. (2001) Bioaugmentation of soils by increasing microbial richness: missing links. Environ Microbiol, 3, 649–657.
19 Timmis, K.N., Steffan, R.J. and Unterman, R. (1994) Designing microorganisms for the treatment of toxic wastes. Annu Rev Microbiol, 48, 525–557.
20 Lehrbach, P.R., Zeyer, J., Reineke, W., Knackmuss, H.J. and Timmis, K.N. (1984) Enzyme recruitment in vitro: use of cloned genes to extend the range of haloaromatics degraded by Pseudomonas sp. strain B13. J Bacteriol, 158, 1025–1032.
21 Nozaki, M., Kagamiyama, H. and Hayaishi, O. (1963) Crystallization and some properties of metapyrocatechase. Biochem Biophys Res Commun, 11, 65–69.
22 Nakazawa, T. (2002) Travels of a Pseudomonas, from Japan around the world. Environ Microbiol, 4, 782–786.
23 Federal Register (1982) Certified Host–Vector Systems. 17197.
24 Nelson, K.E., Weinel, C., Paulsen, I.T., Dodson, R.J., Hilbert, H., Martins dos Santos, V.A., Fouts, D.E., Gill, S.R., Pop, M., Holmes, M. et al. (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol, 4, 799–808.
25 Hirano, S.S. and Upper, C.D. (2000) Bacteria in the leaf ecosystem with emphasis on Pseudomonas syringae – a pathogen, ice nucleus, and epiphyte. Microbiol Mol Biol Rev, 64, 624–653.
26 Joardar, V., Lindeberg, M., Jackson, R.W., Selengut, J., Dodson, R., Brinkac, L.M., Daugherty, S.C., Deboy, R., Durkin, A.S., Giglio, M.G. et al. (2005) Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J Bacteriol, 187, 6488–6498.
27 Feil, H., Feil, W.S., Chain, P., Larimer, F., DiBartolo, G., Copeland, A., Lykidis, A., Trong, S., Nolan, M., Goltsman, E. et al. (2005) Comparison of the complete genome sequences of Pseudomonas syringae pv. syringae B728a and pv. tomato DC3000. Proc Natl Acad Sci USA, 102, 11064–11069.
28 Buell, C.R., Joardar, V., Lindeberg, M., Selengut, J., Paulsen, I.T., Gwinn, M.L., Dodson, R.J., Deboy, R.T., Durkin, A.S., Kolonay, J.F. et al. (2003) The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci USA, 100, 10181–10186.
29 Haas, D. and Defago, G. (2005) Biological control of soil-borne pathogens by fluorescent pseudomonads. Nat Rev Microbiol, 3, 307–319.
30 Lalucat, J., Bennasar, A., Bosch, R., Garcia-Valdes, E. and Palleroni, N.J. (2006) Biology of Pseudomonas stutzeri. Microbiol Mol Biol Rev, 70, 510–547.
31 Vodovar, N., Vallenet, D., Cruveiller, S., Rouy, Z., Barbe, V., Acosta, C., Cattolico, L., Jubin, C., Lajus, A., Segurens, B. et al. (2006) Complete genome sequence of the entomopathogenic and metabolically versatile soil bacterium Pseudomonas entomophila. Nat Biotechnol, 24, 673–679.
32 NCBI Genome Project. http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi.
33 Pathogen Sequencing Group at the Sanger Institute. ftp://ftp.sanger.ac.uk/pub/pathogens/pae.
34 Larsen, T.S. and Krogh, A. (2003) EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics, 4, 21.
35 Binnewies, T., Hallin, P.F., Wassenaar, T.M. and Ussery, D.W. (2007) Tools for comparison of bacterial genomes, in Comparative Genomics and Bioinformatics for the Microbiologist (ed. D. Field), Horizon Press, Norwich, pp. 00–00.
36 WU-BLAST. http://blast.wustl.edu.
37 Lagesen, K., Hallin, P., Rodland, E.A., Staerfeldt, H.H., Rognes, T. and Ussery, D.W. (2007) RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res, 35, 3100–3108.
38 Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22, 4673–4680.
39 Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol, 14, 685–695.
40 Galtier, N., Gouy, M. and Gautier, C. (1996) SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci, 12, 543–548.
41 Hallin, P.F., Binnewies, T.T. and Ussery, D.W. (2004) Genome update: chromosome atlases. Microbiology, 150, 3091–3093.
42 Satchwell, S.C., Drew, H.R. and Travers, A.A. (1986) Sequence periodicities in chicken nucleosome core DNA. J Mol Biol, 191, 659–675.
43 Ornstein, R., Rein, R., Breen, D. and Macelroy, R. (1978) An optimized potential function for the calculation of nucleic acid interaction energies, I – base stacking. Biopolymers, 17, 2341–2360.
44 Shpigelman, E.S., Trifonov, E.N. and Bolshoy, A. (1993) CURVATURE: software for the analysis of curved DNA. Comput Appl Biosci, 9, 435–440.
45 Lee, D.G., Urbach, J.M., Wu, G., Liberati, N.T., Feinbaum, R.L., Miyata, S., Diggins, L.T., He, J., Saucier, M., Deziel, E. et al. (2006) Genomic analysis reveals that Pseudomonas aeruginosa virulence is combinatorial. Genome Biol, 7, R90.
46 He, J., Baldini, R.L., Deziel, E., Saucier, M., Zhang, Q., Liberati, N.T., Lee, D., Urbach, J., Goodman, H.M. and Rahme, L.G. (2004) The broad host range pathogen Pseudomonas aeruginosa strain PA14 carries two pathogenicity islands harboring plant and animal virulence genes. Proc Natl Acad Sci USA, 101, 2530–2535.
47 Qiu, X., Gurkar, A.U. and Lory, S. (2006) Interstrain transfer of the large pathogenicity island (PAPI-1) of Pseudomonas aeruginosa. Proc Natl Acad Sci USA, 103, 19830–19835.
48 Tatusov, R.L., Koonin, E.V. and Lipman, D.J. (1997) A genomic perspective on protein families. Science, 278, 631–637.
49 Markowitz, V.M., Korzeniewski, F., Palaniappan, K., Szeto, E., Werner, G., Padki, A., Zhao, X., Dubchak, I., Hugenholtz, P., Anderson, I. et al. (2006) The Integrated Microbial Genomes (IMG) system. Nucleic Acids Res, 34, D344–D348.
50 Raghava, G.P. and Han, J.H. (2005) Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics, 6, 59.
51 Karlin, S., Barnett, M.J., Campbell, A.M., Fisher, R.F. and Mrazek, J. (2003) Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci USA, 100, 7313–7318.
52 Sharp, P.M. and Li, W.H. (1987) The Codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res, 15, 1281–1295.
53 Carbone, A., Kepes, F. and Zinovyev, A. (2005) Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol, 22, 547–561.
54 Carbone, A., Zinovyev, A. and Kepes, F. (2003) Codon adaptation index as a measure of dominating codon bias. Bioinformatics, 19, 2005–2015.
55 Willenbrock, H. and Ussery, D.W. (2004) Chromatin architecture and gene expression in Escherichia coli. Genome Biol, 5, 252.
56 Service, R.F. (2006) Gene sequencing. The race for the $1000 genome. Science, 311, 1544–1546.
57 The Archon X-PRIZE for Genomics. http://www.xprize.org/xprizes/genomics_x_prize.html.