
1 Comparative Genomics of Pseudomonas

Kristoffer Kiil, Tim T. Binnewies, Hanni Willenbrock, Susse Kirkelund Hansen, Lei Yang, Lars Jelsbak, David W. Ussery, and Carsten Friis

Pseudomonas. Model Organism, Pathogen, Cell Factory. Edited by Bernd H.A. Rehm. Copyright © 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31914-5

1.1 Introduction

The genus Pseudomonas covers one of the most diverse and ecologically significant groups of bacteria. Members of the genus are found in large numbers in a wide range of environmental niches, such as terrestrial and marine environments, as well as in association with plants and animals. This almost universal distribution of Pseudomonas suggests a remarkable degree of genomic diversity and genetic adaptability.

Pseudomonas aeruginosa is a motile, Gram-negative, rod-shaped bacterium. It differs from other members of the Pseudomonas genus in its potential pathogenicity for humans and other mammals. P. aeruginosa causes infections in immunocompromised individuals, such as patients suffering from AIDS, cancer, burn wounds and cystic fibrosis (CF) [1–3]. P. aeruginosa infections are typically difficult to eradicate.

P. aeruginosa is a generalist with diverse metabolic competence. It can utilize simple small molecules as well as complex organic compounds as carbon sources, and it is capable of growing aerobically as well as anaerobically, either by respiration with nitrate (NO3−) as the terminal electron acceptor or by fermentation of arginine. P. aeruginosa grows over a wide range of temperatures, from common environmental temperatures up to 42 °C, with an optimum at 37 °C.

The first completed genome of P. aeruginosa, that of strain PAO1, was published in 2000 [4]. It is 6.3 Mbp and contains 5570 open reading frames. A significant fraction (8.4%) of P. aeruginosa genes are predicted to be involved in regulation, which at the time the genome was published was the largest fraction of regulators among sequenced bacterial genomes. The large genome size and remarkable genetic complexity of P. aeruginosa enable it to adapt to and thrive in different environmental conditions and hosts. In addition to PAO1, PA14 as well as nine other clinical P. aeruginosa strains are currently available, either as completed genomes or as unfinished sequences. A genome comparison of PAO1 and five different clinical P. aeruginosa isolates (four of them CF isolates) shows that about 80% of the PAO1 genome is conserved among the other strains [5]. More than 30 relatively large regions of the PAO1 genome were found to be nonconserved in the five clinical strains. These regions include phages, the pyoverdine biosynthesis locus, genes encoding a putative type I secretion system and a putative restriction-modification system.

P. aeruginosa is an opportunistic pathogen capable of producing a wide variety of virulence factors, including lipopolysaccharides, flagella, type IV pili, proteases, exotoxins, pyocyanin, exopolysaccharides and a type III secretion system. Many of the extracellular virulence factors have been shown to be regulated by quorum-sensing signals [6,7]. Biofilm formation can also be regarded as a virulence factor [8]. Acute infections with P. aeruginosa can be life threatening, resulting in severe tissue damage and septicemia (i.e. bloodstream invasion). Although the pathogenicity of P. aeruginosa is typically characterized by a high level of toxin production [9], several important exceptions are known, particularly in certain chronic infections. These include infections of CF patients, where P. aeruginosa acquires genetic adaptations during long-term persistence in which virulence factors are normally selected against [10–12]. This suggests that P. aeruginosa is capable of adopting distinct strategies for different types of infection.

Another remarkable and unfortunate characteristic of P. aeruginosa is its tolerance to many antimicrobial drugs. It has a number of inherent antibiotic-resistance mechanisms, including an AmpC β-lactamase that can be induced by β-lactams, which makes it inherently resistant to cephalothin and ampicillin [13]. P. aeruginosa also has efflux pumps such as MexAB–OprM, which expel many antibiotics from the cell. MexAB–OprM removes β-lactams, chloramphenicol, fluoroquinolones and novobiocin, as well as various dyes and detergents [14]. Finally, it has been shown that P. aeruginosa can develop resistance through mutation to antibiotics to which the strain is not inherently resistant, such as aminoglycosides and colistin [15].

1.1.1 Other Species of Pseudomonas

In addition to P. aeruginosa PAO1, several other Pseudomonas genomes have been sequenced (Table 1.1). P. putida, for instance, is a ubiquitous bacterium frequently isolated from soil (particularly polluted soil), the rhizosphere and water. P. putida is a paradigm of a metabolically versatile saprophytic soil bacterium, and its best-characterized strain, KT2440 [16,17], has become a model bacterium worldwide, both in laboratory studies and for the development of biotechnological applications. These include bioremediation of contaminated sites [18,19], biocatalysis for the production of useful chemicals [20], and the potential development of new biopesticides and plant growth promoters that act as protective agents in the plant rhizosphere. Strain KT2440 is a plasmid-free derivative of the original isolate, designated P. arvilla strain mt-2 [21], the natural host of the archetypal TOL plasmid pWW0 [16], and subsequently reclassified as P. putida mt-2 [22]. It has maintained its ability to survive in its natural environment and is the first Gram-negative soil bacterium to have been certified as a biosafety host for the expression of foreign genes [23]. The genome of strain KT2440 was sequenced in 2004 [24]; the sequence information is continuously providing new insight into the biology of an adaptable and metabolically versatile group of soil bacteria, and it is facilitating the increased use of this organism for biotechnological purposes.

P. syringae is a plant pathogen that can infect a wide range of plant species [25]. P. syringae strains are assigned to different pathovars on the basis of their distinctive pathogenicity and their ability to infect different plant species. More than 50 different pathovars exist, and complete genome sequences of P. syringae strains from three different pathovars (tomato, syringae and phaseolicola) are available for analysis [26–28] (Table 1.1). P. syringae pv. tomato is the causal agent of bacterial speck disease on tomato and Arabidopsis, while P. syringae pv. phaseolicola and P. syringae pv. syringae cause halo blight and brown spot disease on bean plants, respectively.

Other Pseudomonas species are nonpathogenic plant-associated bacteria that exhibit plant growth-promoting properties [29]. For example, P. fluorescens Pf-5 and P. fluorescens PfO-1 are commensal bacteria that colonize plant surface environments and produce various secondary metabolites that suppress the growth of soil-borne plant pathogens. These bacteria have potential roles as biological disease control agents in agricultural settings. P. stutzeri strain A1501 is able to colonize and infect rice roots, and is widely used as a rice inoculant in China [30]. It can fix nitrogen and may provide rice plants with fixed nitrogen, thereby promoting plant growth. P. entomophila is an entomopathogenic bacterium that is highly pathogenic for a variety of insects from different orders [31].

Table 1.1 Feature overview of currently sequenced Pseudomonas genomes.

Organism | Genome size (bp) | No. contigs(a) | G+C content (%) | Coding (%) | No. genes(b) | Accession [reference]
P. aeruginosa 2192 | 6826253 | 82 | 66.2 | 85 | 5546 | NZ_AAKW00000000 [32]
P. aeruginosa C3719 | 6146998 | 124 | 66.5 | 86 | 5065 | NZ_AAKV00000000 [32]
P. aeruginosa LES | 6601757 | 2 | 66.3 | 80 | 5326 | [33]
P. aeruginosa PA7 | 6663529 | 147 | 66.4 | 84 | 5309 | NZ_AAQE00000000 [32]
P. aeruginosa PACS2 | 6492423 | 1 | 66.3 | 86 | 5317 | NZ_AAQW00000000 [32]
P. aeruginosa PAO1 | 6264404 | – | 66.6 | 89 | 5568 | AE004091 [32]
P. aeruginosa UCBPP-PA14 | 6537648 | – | 66.3 | 89 | 5892 | CP000438 [32]
P. entomophila L48 | 5888780 | – | 64.2 | 88 | 5134 | CT573326 [32]
P. fluorescens Pf-5 | 7074893 | – | 63.3 | 88 | 6137 | CP000076 [32]
P. fluorescens PfO-1 | 6438405 | – | 60.5 | 89 | 5736 | CP000094 [32]
P. mendocina ymp | 5072807 | – | 64.7 | 89 | 4594 | CP000680 [32]
P. putida F1 | 5925059 | 120 | 61.9 | 88 | 4735 | NZ_AALM00000000 [32]
P. putida GB-1 | 6053195 | 30 | 62.0 | 90 | 4890 | NZ_AAXR00000000 [32]
P. putida KT2440 | 6181863 | – | 61.5 | 86 | 5350 | AE015451 [32]
P. putida W619 | 5748550 | 62 | 61.5 | 89 | 4624 | NZ_AAVY00000000 [32]
P. stutzeri A1501 | 4567418 | – | 63.9 | 89 | 4128 | CP000304 [32]
P. syringae pv. phaseolicola 1448A | 5928787 | – | 57.9 | 84 | 4984 | CP000058 [32]
P. syringae pv. syringae B728a | 6093698 | – | 59.2 | 87 | 5089 | CP000075 [32]
P. syringae pv. tomato str. DC3000 | 6397126 | – | 58.3 | 85 | 5470 | AE016853 [32]

(a) The numbers correspond to the number of contigs for unassembled genomes, while a dash signifies a completely assembled genome.
(b) For assembled genomes this is the number of proteins annotated in the official NCBI release; for the unassembled genomes this is the number of genes predicted by the EasyGene runs.

1.1.2 Obtaining Sequence Data on Pseudomonas

In total, 19 Pseudomonas genomes were downloaded from online resources: 18 from the National Center for Biotechnology Information (NCBI) [32], and one produced by the Pathogen Sequencing Group at the Sanger Institute [33]. An overview of the genomes is given in Table 1.1. Of the 19 genomes, 11 were completely sequenced and assembled at the time of download and came with standard NCBI annotation describing the location of genes within the genomes. The remaining eight genomes, however, were still being assembled at the time and thus had little in the way of official annotation. To identify the positions of genes in these genomes, we applied the EasyGene method [34] for gene finding in bacterial genomes.
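EasyGene itself is a hidden Markov model-based gene finder that also scores the statistical significance of each prediction, and it is not reproduced here. As a much cruder illustration of what locating genes in an unannotated sequence involves, the sketch below scans the three forward reading frames for stretches that begin at an ATG start codon and end at the first in-frame stop codon. The function name `naive_orfs` and the `min_codons` cut-off are illustrative assumptions; the sketch ignores the reverse strand, alternative start codons (GTG, TTG) and the significance scoring that a real gene finder provides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def naive_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Hypothetical illustration only, not EasyGene. `end` is exclusive and
    includes the stop codon; `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "AAA" * 5 + "TAA"
    print(naive_orfs(toy, min_codons=5))
```

A genome-scale gene finder would additionally have to weigh overlapping candidates on both strands, which is exactly where a probabilistic model such as EasyGene earns its keep.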

1.2 Pan/Core Genome of Pseudomonas

With an increasing number of fully sequenced Pseudomonas genomes available, it becomes possible to investigate the genetic characteristics that define the Pseudomonas genus. For instance, what fraction of the genome is conserved throughout all sequenced Pseudomonas?

The "core genome" of Pseudomonas is defined as the genes that are present in all strains of Pseudomonas, i.e. the minimum set of genes required for a bacterium to be considered part of the Pseudomonas genus. In contrast, the "pan-genome" is defined as containing any gene present in any strain of Pseudomonas. A bar chart of the sizes of the core and pan-genomes of Pseudomonas, similar to that previously made for Escherichia coli [35], is shown in Figure 1.1. Figure 1.1 was constructed by continuously adding to a list of genes, beginning with the whole genome of P. aeruginosa 2192 and then examining the other Pseudomonas genomes in an order based on evolutionary distance to P. aeruginosa 2192, adding any genes not already present in the list. To determine whether a given gene is present in an organism or not, the "50/50" rule was applied to a BLAST alignment [36] at the protein level. The "50/50" rule states that, to be considered conserved across two organisms, at least 50% of the length of the protein must show a minimum of 50% sequence identity. The number of genes not present in any of the previous strains is shown for each strain with a black column. Another column shows the number of new gene families, illustrating that the new genes are not just functional orthologs but do indeed add new functionality.

Figure 1.1 Pseudomonas pan/core genome. The core genome is defined as comprising all genes present in all sequenced genomes of Pseudomonas, while the pan-genome encompasses all genes present in any strain. The figure shows how the number of genes in the core genome declines, and that in the pan-genome increases, as each newly sequenced Pseudomonas genome is added. Each genome is represented by two bars: one solid, giving the number of new proteins not found in any of the previous strains (i.e. any of the strains to the left), and one light grey, representing the number of unique protein families. The genomes are presented in an order roughly equal to evolutionary distance from P. aeruginosa 2192.

While the core genome is obviously restricted in size, the pan-genome is theoretically infinite, or can at least be considered very large. As more and more strains within the P. aeruginosa species are sequenced and added to Figure 1.1, the number of genes in the core genome slowly drops until it reaches what appears to be a stable level after four or five genomes, describing the P. aeruginosa core genome. When non-P. aeruginosa genomes are added to the plot, the core genome starts to drop again, now stabilizing at a significantly lower level, which describes the core genome of the whole Pseudomonas genus. Of course, the exact sizes of both the P. aeruginosa and the Pseudomonas genus core genomes are subject to change as more fully sequenced Pseudomonas genomes become available. In particular, the genus core genome is almost certain to drop further as strains from lineages not hitherto represented are sequenced.

Analysis of the pan/core genome of an organism is of great importance for describing, at the genetic level, what constitutes a taxonomic group, e.g. "Pseudomonas bacteria". There is a considerable difference (about 10-fold) in size between the core and pan-genomes, and even within P. aeruginosa alone the pan-genome is twice the size of the core genome, with many of the genes unique to one particular strain not being merely functional orthologs. While perhaps a trivial observation, this nevertheless has a great impact on determining whether a given gene exists in Pseudomonas or not. One should be careful before making statements to that effect, or at the very least take the trouble to specify precisely in which particular isolate one believes the gene to be present or absent. The genetic diversity of bacteria is vast even at the species level, and much more so at the genus level.
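The accumulation behind Figure 1.1 can be sketched in a few lines. This is a minimal illustration under two stated assumptions: the "50/50" test is applied to BLAST protein hits summarized by hypothetical fields (`query_len`, `aln_len`, `pident`, which could, for example, be filled from the `qlen`, `length` and `pident` columns of BLAST+ tabular output), and genes have already been grouped into shared family identifiers, so that presence and absence reduce to set operations. Walking through the genomes in a fixed order, each step records the number of new genes, the running pan-genome size (a union) and the running core-genome size (an intersection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    """Hypothetical summary of one BLAST protein-protein hit."""
    query_len: int   # length of the query protein (amino acids)
    aln_len: int     # length of the aligned region (amino acids)
    pident: float    # percent identity over the aligned region


def passes_50_50(hit: ProteinHit) -> bool:
    """'50/50' rule: at least 50% of the query length aligned at >= 50% identity."""
    return hit.aln_len >= 0.5 * hit.query_len and hit.pident >= 50.0


def pan_core_curve(genomes):
    """genomes: ordered list of (name, set of gene-family IDs).

    Returns one row per genome: (name, new genes, pan size, core size).
    """
    pan: set = set()
    core = None
    rows = []
    for name, families in genomes:
        new = families - pan          # not seen in any earlier genome
        pan |= families               # pan-genome: union so far
        core = set(families) if core is None else core & families  # intersection so far
        rows.append((name, len(new), len(pan), len(core)))
    return rows


if __name__ == "__main__":
    toy = [("A", {1, 2, 3, 4}), ("B", {2, 3, 4, 5}), ("C", {3, 4, 6})]
    for row in pan_core_curve(toy):
        print(row)
```

Ordering the input by evolutionary distance from the first genome, as was done for Figure 1.1, is left to the caller; the core size can only shrink and the pan size can only grow, whatever the order.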

1.3 Phylogeny of Pseudomonas

In comparative genomics, phylogeny is almost always a principal component of the analysis, either as the actual subject of study or as the underlying structure that explains most of the data. In the study of pan-genomics, phylogeny can be used to explain much of the data. There are several ways to assess the phylogeny of a given group of strains or species. One of the oldest and most common is to compare the sequences of the 16S ribosomal RNA genes. Figure 1.2 is an example of such a tree.

The tree in Figure 1.2 was made by first extracting the 16S rRNA genes using the hidden Markov model-based RNAmmer program [37], then aligning them using ClustalW [38]. The phylogeny was reconstructed from the multiple alignment using the BIONJ algorithm [39], a step performed with the phylowin program [40]. To get a sense of the robustness of the tree, it was also bootstrapped. From the tree we can derive some simple conclusions. The four P. aeruginosa strains are very closely related, and P. aeruginosa appears to be a more "well-defined" species than many of the other Pseudomonas, at least from a 16S perspective. Also, P. stutzeri and P. mendocina seem to be quite distinct from the other species, while P. entomophila L48 is quite closely related to the P. putida strains. In fact, the 16S rRNA of P. putida W619 is less similar to that of the other P. putida strains than is the 16S rRNA of P. entomophila L48. More interesting still, while P. fluorescens PfO-1 has a distance to the other species comparable to that seen among the other Pseudomonas species, P. fluorescens Pf-5 is so similar to the P. syringae strains that it might be mistaken for one if the taxonomists were not there to say otherwise.

Figure 1.2 16S phylogenetic tree of the Pseudomonas genus. The tree is based on similarities in the sequence of the 16S ribosomal RNA and shows the evolutionary distances between the strains. Study of the 16S ribosomal RNA is a common way of establishing evolutionary distance, but it is not the only way, nor is it guaranteed to give the best results.

When confronted with this kind of data, the proper question to ask is whether the 16S rRNA comparison actually tells us what we want to know. To address this question, we turn to a comparison of the σ factors of our Pseudomonas strains. The σ factors, at least the essential ones, share some of the traits that make 16S rRNA a good phylogenetic marker: they interact with numerous genes, which makes changes relatively infrequent and acquisition by horizontal gene transfer improbable; in addition, the housekeeping σ factor σ70 is ubiquitous.

Studying the RpoD subtree in Figure 1.3, we again see that the two P. fluorescens strains cluster with the P. syringae strains and that P. entomophila L48 clusters with the P. putida strains. We also see, however, that the resolution of the RpoD subtree is very poor. If we instead look at the FliA and RpoH clades, which have higher resolution, we see that the two P. fluorescens strains and P. entomophila L48 split out from the P. syringae and P. putida clades, although they still seem to be closely related.
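The distance-based reconstruction step can be illustrated with a short sketch. It is not BIONJ: it computes uncorrected p-distances from an already aligned set of sequences and joins them with the classic neighbor-joining algorithm of Saitou and Nei, of which BIONJ is a refinement that additionally weights the distance updates by their estimated variances. A bootstrap replicate is obtained by resampling alignment columns with replacement and rebuilding the tree; tallying how often each clade recurs across replicates, which yields the support values, is not shown. All function names are illustrative.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aln: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric matrix (dict of dicts, no diagonal) of pairwise p-distances."""
    d = {name: {} for name in aln}
    for a, b in combinations(aln, 2):
        d[a][b] = d[b][a] = p_distance(aln[a], aln[b])
    return d


def neighbor_joining(dist: dict[str, dict[str, float]]) -> str:
    """Classic neighbor joining; returns an unrooted tree in Newick format."""
    d = {k: dict(v) for k, v in dist.items()}  # work on a copy
    while len(d) > 2:
        n = len(d)
        r = {k: sum(d[k].values()) for k in d}  # row sums
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        a, b = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))  # branch a -> new node
        lb = d[a][b] - la                                    # branch b -> new node
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        du = {k: 0.5 * (d[a][k] + d[b][k] - d[a][b]) for k in d if k not in (a, b)}
        del d[a], d[b]
        for k in d:
            del d[k][a], d[k][b]
            d[k][u] = du[k]
        d[u] = du
    x, y = d
    return f"({x},{y}:{d[x][y]:.4f});"


def bootstrap_alignment(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}
```

For a 16S alignment `aln`, `neighbor_joining(distance_matrix(aln))` gives a single tree, and repeating the call on many `bootstrap_alignment(aln, rng)` replicates gives the trees from which bootstrap support would be counted. Real analyses usually also correct p-distances for multiple substitutions (e.g. Jukes–Cantor) before tree building.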

Figure 1.3 Phylogenetic tree of the Pseudomonas genus based on σ factor similarities. As σ factors generally show high degrees of conservation, and σ70 in particular is also ubiquitous, they are, like the 16S ribosomal RNA, useful as a measure of evolutionary distance. While the σ factor tree is generally neither more nor less accurate than the 16S tree of Figure 1.2, it provides an alternative view.

1.4 Blast Atlas of Pseudomonas Genomes

The "Blast atlas" is a visualization method that shows a vast amount of data in one plot by taking a reference genome and mapping the conservation of each protein-encoding gene along the chromosome. An example of a BLAST atlas is given in Figure 1.4, using P. aeruginosa UCBPP-PA14 as the reference genome, compared to 18 other Pseudomonas genomes as well as the UniProt database (outer circle, in black). The atlas is constructed largely as described previously [41], the sole refinement being the introduction of a continuous color scale to describe the BLAST hits more accurately.

As for Figure 1.1, we used the "50/50" rule (see above) as the requirement for what constitutes a conserved protein. In this case, however, since we are visualizing BLAST results along a sequence, it is possible to display more information from the BLAST report. Any protein that is well conserved across the Pseudomonas genus will appear as a strongly colored band on the atlas, while proteins that are weakly conserved, or present only in the reference strain and absent in the other proteomes, will result in weak bands or outright gaps in the alignment circles. However, because the BLAST alignments are done at the protein level, gaps will also appear for genomic regions that are not translated, e.g. rRNA islands. Thus, while weak and strong bands can be interpreted directly, it is prudent to cross-reference any complete gap with the genome annotations, which are also given in Figure 1.4, before drawing any conclusions either way.

Figure 1.4 Blast atlas of P. aeruginosa UCBPP-PA14 versus 18 other Pseudomonas strains. The innermost circles show properties related to the base composition of the UCBPP-PA14 genome as well as the locations of the genes. Starting from the red rings and moving outwards, the colors represent BLAST identities to other Pseudomonas genomes. In general, the evolutionary distance from UCBPP-PA14 increases from the innermost circles to the outermost; e.g. the bright red circles all represent various P. aeruginosa strains, while the blue circles show alignments to P. putida strains, etc. The strength of the color bands in each circle indicates how well that area of the UCBPP-PA14 genome is conserved in the other strains. BLAST alignments are performed at the amino acid level and only for proteins. Thus, anything not translated (e.g. rRNA) has not been aligned and is colored grey. A larger and zoomable version of the atlas is available online: http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/zoomatlas/.

The innermost circle shows the AT content; its predominantly turquoise color indicates that this genome is GC rich. Deep red marks regions that are more than three standard deviations more AT-rich than the chromosomal average, and dark turquoise similarly marks GC-rich regions of the chromosome. The second innermost circle is the GC skew (the bias of Gs towards one strand or the other, averaged over a 10 000-bp window). The third circle shows the global repeats, representing the best match of a 100-bp piece of DNA, centered at a given position in the chromosome, when searched against the entire chromosome. The fourth circle represents the genes, with blue for protein-encoding genes oriented clockwise and red for genes on the other strand (counterclockwise). The three lanes before the individual proteomes start are DNA structural features, based on the physicochemical properties of the DNA helix. From innermost to outermost, they are the "Position Preference" (a measure related to the rigidity of the DNA [42]), the "Stacking Energy" (the energy needed to de-stack, i.e. melt, the DNA [43]) and, finally, the "Intrinsic curvature" (which describes the preference of the DNA helix for bending) [44]. It is also possible to zoom into specific regions of interest: a web-based "zoomable atlas" for P. aeruginosa UCBPP-PA14, as well as for several other bacterial genomes, can be found online (http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/zoomatlas/).

One characteristic that distinguishes pathogens from nonpathogens is the presence of pathogenicity islands. Since the Blast atlas is adept at identifying genetic regions in the reference genome that show either an abnormally high or an abnormally low degree of conservation, it should be able to pick up the presence of pathogenicity islands. For this reason, P. aeruginosa UCBPP-PA14 was chosen as the reference genome, as this strain is known for its high degree of virulence and is much more virulent than, for example, P. aeruginosa PAO1 [45].
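The two base-composition lanes described above can be approximated with a short sketch. GC skew in a window is conventionally computed as (G − C)/(G + C); the AT lane can be mimicked by computing the AT fraction per window and flagging windows that lie more than three standard deviations from the chromosome-wide mean. The window and step sizes are parameters of the sketch (the text specifies 10 000 bp only for the GC skew), and the function names are hypothetical; this is not the GenomeAtlas implementation.

```python
from statistics import mean, pstdev


def window_stats(seq: str, window: int, step: int):
    """Return a list of (start, AT fraction, GC skew) for each full window."""
    seq = seq.upper()
    stats = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        a, t, g, c = (w.count(base) for base in "ATGC")
        skew = (g - c) / (g + c) if g + c else 0.0  # GC skew = (G - C)/(G + C)
        stats.append((start, (a + t) / window, skew))
    return stats


def flag_composition_outliers(stats, n_sd: float = 3.0):
    """Label windows whose AT fraction deviates by more than n_sd SDs from the mean."""
    if not stats:
        return []
    at = [s[1] for s in stats]
    mu, sd = mean(at), pstdev(at)
    flags = []
    if sd == 0:
        return flags  # uniform composition: nothing stands out
    for start, at_frac, _ in stats:
        if at_frac > mu + n_sd * sd:
            flags.append((start, "AT-rich"))
        elif at_frac < mu - n_sd * sd:
            flags.append((start, "GC-rich"))
    return flags


if __name__ == "__main__":
    genome = "GCGCGCGCAT" * 40 + "A" * 10  # toy GC-rich sequence with one AT-rich stretch
    print(flag_composition_outliers(window_stats(genome, window=10, step=10)))
```

Using the population standard deviation over all windows mirrors the "chromosomal average" framing of the text; a sliding window (step smaller than window) gives a smoother trace at the cost of correlated neighbouring values.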

1.4.1 Region 5 243 000–5 361 000

The atlas of UCBPP-PA14 reveals many regions of potential interest. For example, a large region around 5250 kbp shows only partial similarity to other P. aeruginosa genomes and practically no similarity to more distantly related Pseudomonas (see the zoom in Figure 1.5). The region also shows some interesting structural properties, and an examination of the annotations reveals that the majority of the genes present are located on the direct strand. This region is the previously described pathogenicity island PAPI-1, a cluster of more than 100 genes [46]. Some of the genes within PAPI-1 are homologous to known genes with virulence functions in other human and plant bacterial pathogens, and mutations in a number of these PAPI-1 genes result in attenuation of PA14 virulence in both plant and animal infection models [46]. PAPI-1 is located next to a tRNALys gene, which presumably provides an attachment site for integration of the island after its acquisition.

Figure 1.5 Zoom on a large region around 5250 kbp of the Blast atlas of UCBPP-PA14 shown in Figure 1.4. Each of the linear colored bars in this plot reflects one of the circles in the circular atlas; for further explanation of the colors, refer to the description in Figure 1.4 and the main text. This region shows little similarity to other P. aeruginosa genomes and even less similarity to more distantly related Pseudomonas. The region is the previously described pathogenicity island PAPI-1, a cluster of more than 100 genes, some of which are homologous to known genes with virulence functions in other human and plant bacterial pathogens.

It was recently demonstrated experimentally that PAPI-1 is a mobile genetic element [47]. The island can be excised from the PA14 genome, transferred, and integrated into the genomes of P. aeruginosa strains, such as PAO1, that do not naturally harbor it [47]. Although the molecular mechanism of PAPI-1 transfer remains incompletely understood, PAPI-1 carries several genes predicted to encode proteins that could function in a conjugative transfer mechanism [47]. As the PAPI-1 island carries several virulence determinants and is capable of spreading among P. aeruginosa strains, it could potentially contribute to the evolution of variants with enhanced pathogenicity owing to an increased virulence gene repertoire.

Hybridization studies and microarray genomotyping of different P. aeruginosa strains of diverse environmental origin have shown the presence of PAPI-1 sequences in a significant fraction of the strains, although most of these isolates appear to carry only a portion of the island [45,46]. This pattern of PAPI-1 sequence distribution, which is also evident in Figure 1.5, is probably due to the progressive accumulation of mutations or deletions of genes that are either unnecessary, deleterious or of no selective advantage to the recipient after acquisition of the island.
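The visual search for islands such as PAPI-1 can also be approximated programmatically. As a hypothetical sketch, assume each protein-coding gene of the reference genome, in chromosome order, has been given a conservation score, for instance the fraction of the other genomes in which it passes the "50/50" test. Runs of at least `min_genes` consecutive genes scoring below `threshold` are then candidate islands. Both cut-offs are arbitrary illustration values, not parameters used for Figure 1.4, and any candidate would still need to be inspected against the annotation.

```python
def low_conservation_runs(scores, threshold=0.2, min_genes=10):
    """Return (first, last) gene indices of runs of at least min_genes
    consecutive genes whose conservation score is below threshold."""
    runs = []
    start = None
    # A trailing +inf sentinel closes a run that reaches the last gene.
    for i, score in enumerate(list(scores) + [float("inf")]):
        if score < threshold:
            if start is None:
                start = i          # a low-conservation run begins
        elif start is not None:
            if i - start >= min_genes:
                runs.append((start, i - 1))
            start = None           # run ended (kept only if long enough)
    return runs


if __name__ == "__main__":
    toy = [1.0] * 5 + [0.0] * 12 + [1.0] * 3 + [0.1] * 4
    print(low_conservation_runs(toy))
```

Because the scores are per protein-coding gene, untranslated features such as rRNA operons never enter this calculation, which sidesteps the grey-gap artifact described for the circular atlas.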

1.4.2 Region 713 000–785 000

Another region revealed by the Blast atlas, this time for its high degree of conservation, is found around 713 kbp (see the zoom in Figure 1.6). This region contains 36 genes involved in "Translation, ribosomal structure and biogenesis", mainly ribosomal proteins. The region also contains a ribosomal RNA operon and several tRNAs. The region clearly plays an important role in transcription and translation in the organism, so its high degree of conservation is understandable.

A further observation in line with the biological significance of this island emerges from the bars displaying the structural parameters "Intrinsic curvature", "Stacking Energy" and "Position Preference". These not only reveal the unusual base composition of the rRNA genes (which in turn must give rise to unusual DNA structures), but also show characteristic structures in the intergenic regions between the genes. These structures are related to promoter activity, since the DNA at a promoter must destack and unwind for RNA polymerase to initiate transcription. The characteristic structures facilitate this and are indicators of strong promoters and high levels of expression for the associated genes.

1.5Functional Categories

Are there differences, on a broad level, between the distributions of functional genesin the various different species of Pseudomonas? This question can now be evaluated

14j 1 Comparative Genomics of Pseudomonas

Figu

re1.6Zoo

mon

agene

ticisland

arou

nd75

0kbpof

theBlastatlasof

UCBPP

-PA14

show

nin

Figu

re1.4.Each

ofthelin

ear

coloredbarsinthisplot

reflectson

eofthecirclesinthecircularatlas;forfurtherexplanationofthecolors,refertothedescription

inFigu

re1.4an

dthedescriptionin

themaintext.T

heisland

isiden

tifiedby

itshigh

degree

ofconservatio

nacross

the

Pseudomonasgenu

s,an

dcontains

36gene

srelatedtotran

slation,ribo

somalstructurean

dbiog

enesis.The

region

also

contains

arRNAop

eron

(visibleby

thelight

blue

bars

inthean

notatio

nlin

e)as

wellasseveraltRNAs.Noticethat

theap

parent

lack

ofconservatio

nforthe

rRNAop

eron

isan

artifacto

fthe

BLA

STalignm

ent(which

ison

lype

rformed

fortranslatedsequ

ences)an

dno

tan

accurate

represen

tatio

nof

thebiolog

y.

1.5 Functional Categories j15

based on the set of genomes sequenced to date. For each genome, the fraction of genes in each of several different functional categories was determined. A bar diagram of the distribution of Clusters of Orthologous Groups (COGs) functional category assignments [48], obtained from the Integrated Microbial Genomes (IMG) system [49] for Pseudomonas, is shown in Figure 1.7. At the time of writing, IMG contained assignment information on 16 of the 19 genomes given in Table 1.1. The number of genes that could be found for each strain varied from 3700 (the lowest), identified for P. mendocina YMP, to 4811 (the highest), for P. aeruginosa 2192. In general, the COG assignments cover roughly 75% of the proteins of each organism in Table 1.1.

As can be seen from Figure 1.7, the great majority of the COG-assigned genes (around 90%) have some form of functional assignment, while approximately 10% have been labeled function unknown. (An assignment of function unknown is different from a protein not having a COG assignment. A protein assigned function unknown is known to exist, although its function is unknown. A protein having no assignment might not be a protein at all!) On the whole, the organisms show a high degree of functional similarity at this broad level, although the differences are slightly larger when comparing across species than within a species. Based on these results, there appear to be no major differences in the broad distribution of functional categories across the Pseudomonas genomes.

Figure 1.7 The percentage of proteins assigned to COG categories by organism. The figure shows data for the 16 Pseudomonas genomes found in the IMG database (see text). Although on the whole the functional breakdown of the various strains is very similar, the similarity appears slightly higher within each species than when comparing across species.
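The per-genome breakdown plotted in Figure 1.7 can be sketched as a simple count of proteins per COG category, keeping "no COG assignment" separate from category S ("function unknown"), as the text stresses. This is a minimal Python sketch, not the IMG procedure itself: it assumes one genome's assignments are available as a mapping from hypothetical protein IDs to strings of COG category letters, and a protein carrying several letters is counted once in each, so the category percentages can sum to more than 100.

```python
from collections import Counter

# COG category letters used in the toy example below:
# "J" = translation, ribosomal structure and biogenesis; "S" = function unknown.


def cog_summary(assignments: dict[str, str]) -> tuple[float, dict[str, float]]:
    """Summarize COG functional categories for one genome.

    assignments maps a protein ID to a string of COG category letters;
    an empty string means the protein has no COG assignment at all
    (which is not the same as being assigned "S", function unknown).
    Returns (percentage of proteins with any COG assignment,
             {category letter: percentage of COG-assigned proteins}).
    """
    if not assignments:
        return 0.0, {}
    assigned = [set(cats) for cats in assignments.values() if cats]
    coverage = 100.0 * len(assigned) / len(assignments)
    counts = Counter(letter for cats in assigned for letter in cats)
    n = len(assigned)
    percent = {letter: 100.0 * k / n for letter, k in counts.items()}
    return coverage, percent


# Toy data with hypothetical protein IDs (not real IMG records).
coverage, percent = cog_summary({"p1": "J", "p2": "S", "p3": "J", "p4": ""})
print(coverage)                 # 75.0
print(round(percent["S"], 1))   # 33.3
```

Running this once per genome and placing the resulting percentages side by side gives the kind of table a bar diagram like Figure 1.7 is drawn from.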

1.6 Codon Usage and Expression

Elucidation of the physical and biological properties of highly expressed genes, as well as the examination of codon usage preferences, has been addressed by a number of studies [50–52]. Figure 1.8 shows how the average codon usage in P. aeruginosa PAO1 differs from the codon usage preference of highly expressed genes. This so-called codon usage bias represents an evolutionary strategy to modulate gene expression, and hence mathematical formulations of the codon usage bias have been widely used to predict gene expression on a genomic scale [1].

Figure 1.8 Codon usage in P. aeruginosa PAO1. Genomic codon usage (blue) versus highly expressed genes (red).

While the first genomes examined in this way, those of E. coli and Saccharomyces cerevisiae, provided strong evidence of high translational codon usage bias [52], recent studies report bacterial genomes with little translational codon usage bias [53,54]; these species often have genomes with extreme AT or GC content. P. aeruginosa PAO1, with its GC-rich genome (about 66% GC), is such a case: its dominating codon bias is not translational, and is much more strongly correlated with GC content than with translational bias [54]. However, a translational codon adaptation index (tCAI) may still be useful for estimating gene expression levels and predicting highly expressed genes in P. aeruginosa PAO1 (Figure 1.9). The codon bias in other Pseudomonas species may be more or less translational than in PAO1, as indicated by the correlations between the tCAI and the dominating CAI (dCAI) in Table 1.2.

Figure 1.9 Venn diagrams of the 10% most highly expressed genes in P. aeruginosa PAO1 as identified by several methods. The tCAI and dCAI circles represent the predictions made by the translational and dominating CAIs, respectively, while PP is the "Position Preference" trinucleotide model and Expr denotes measured microarray expression values. The figure is modified from Refs. [1,2].

Table 1.2 Correlations between translational CAI (tCAI) and dominating CAI (dCAI).

Organism                 Correlation, tCAI versus dCAI
P. aeruginosa PAO1       0.42484871
P. entomophila L48       0.52157109
P. fluorescens Pf-5      0.04194573
P. fluorescens PfO-1     0.49853369
P. putida KT2440         0.76612008
P. syringae 1448A        0.77669291
P. syringae B728a        0.65205812
P. syringae DC3000       0.64478432

Because the tCAI measure is based on codon usage bias, it can only predict highly expressed proteins (translated genes); it does not consider tRNAs, ribosomal RNAs and other noncoding RNAs. However, transcription of DNA is also strongly influenced by DNA stacking and flexibility. Consequently, on a more global scale, gene expression may be regulated by specific promoters that are sensitive to DNA structural properties [2,55]. For example, the "Position Preference" measure (Figure 1.10) is a DNA structural measure that was originally derived for eukaryotes using chicken DNA; it is a trinucleotide model of nucleosome-positioning patterns. It reflects the preference of a given trinucleotide for being found in a region where the DNA minor groove faces either towards or away from the nucleosome histone core [42]. When absolute values are used, the position preference value is also a measure of anisotropic DNA flexibility. The "Position Preference" measure therefore also describes a more general structural property of DNA, namely how easily it can be wrapped around chromatin proteins, and as a result it can be used to predict highly expressed genes. This makes sense because regions of DNA that are not condensed into chromatin are more accessible to RNA polymerase. Although the "Position Preference" measure is less efficient than the tCAI measure in predicting highly expressed genes in P. aeruginosa PAO1 (see [2]), chromosomal regions with low "Position Preference" do seem to coincide with regions concentrated in highly expressed genes, in particular the regions containing the ribosomal RNAs (Figure 1.10).

Figure 1.10 Expression atlas of P. aeruginosa PAO1. The figure shows several measurements relevant for expression in relation to genomic position. From the top (outer ring) and down: the averages of expression values for each gene obtained from microarray experiments; the "Position Preference" of the DNA towards chromatin binding (i.e. a measure of DNA flexibility); estimated values for the tCAI. See text for further explanation of methods.
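The CAI measures discussed in this section build on the Codon Adaptation Index of Sharp and Li [52]. Each codon is given a relative adaptiveness w, which is its count in a reference set of genes divided by the count of its most frequent synonymous codon. The CAI of a gene is then the geometric mean of w over the gene's codons. The minimal Python sketch below rests on several assumptions of ours: the standard genetic code, Met, Trp and stop codons skipped, and a pseudocount of 0.5 for codons absent from the reference set. The function names are hypothetical, and this is not the exact pipeline used in [1,2]. What separates the two indices is the choice of reference set. A translational CAI would use genes assumed to be highly expressed, for example ribosomal protein genes. The dCAI of Carbone et al. [54] instead derives its reference set iteratively from the genome's dominating codon bias.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))
# Met and Trp have a single codon and stops are not translated, so they
# carry no information about synonymous codon choice.
EXCLUDED = {"M", "W", "*"}


def codons_of(seq):
    """Yield in-frame codons of a coding sequence, skipping non-ACGT codons."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in GENETIC_CODE:
            yield codon


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w(codon) = count / count of its most frequent synonym in the reference set.

    Codons absent from the reference set get `pseudocount` instead of zero
    (an assumption of this sketch), so that log(w) is always defined.
    """
    counts = Counter(c for gene in reference_genes for c in codons_of(gene))

    def count(codon):
        return counts.get(codon, 0) or pseudocount

    best = {}
    for codon, aa in GENETIC_CODE.items():
        if aa not in EXCLUDED:
            best[aa] = max(best.get(aa, 0.0), count(codon))
    return {codon: count(codon) / best[aa]
            for codon, aa in GENETIC_CODE.items() if aa not in EXCLUDED}


def cai(gene, w):
    """CAI of one gene: geometric mean of w over its informative codons."""
    logs = [math.log(w[c]) for c in codons_of(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy example with a made-up reference set (not real Pseudomonas data).
w = relative_adaptiveness(["GCTGCTGCC"])
print(round(cai("GCTGCC", w), 4))  # 0.7071
```

Ranking all genes by `cai` and taking the top 10% would give a prediction set of the kind compared in the Venn diagrams of Figure 1.9. Correlating two such indices gene by gene is the kind of comparison summarized in Table 1.2.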

1.7 Future Outlook

Only a few years ago, sequencing an entire Pseudomonas genome in one contiguous piece would literally cost millions of dollars and take more than a year to complete, with an additional year or two to annotate it and write up the manuscript. At the time of writing, it is possible to sequence a Pseudomonas genome for less than US$10 000, taking less than a day to finish, with an overnight run on the computers to annotate it. To put things in perspective, if the goal of sequencing a human genome for US$1000 or less is achieved, as seems likely within a year or two [56,57], then sequencing a Pseudomonas genome, which is roughly 500 times smaller (about 6 Mbp versus about 3 Gbp), will cost in the range of US$2. In other words, it will take perhaps less time and money than a cup of coffee at a nice restaurant! As sequencing genomes becomes less expensive, it will become possible to routinely run projects that, instead of obtaining the sequence of just one particular genome from a well-characterized strain, aim to obtain multiple genomes (perhaps hundreds or even thousands) from entire strain collections. This revolution has the potential for a fantastic explosion of data (and perhaps also an explosion of understanding) in our knowledge of Pseudomonas genomics. However, it could also bring difficult times for many older traditional microbiologists, who have spent the past several decades studying their "favorite gene" in one particular Pseudomonas strain and who might not be prepared for so much data; indeed, it is too much data for many microbiologists. This need not be the case, however, since there is now more than ever a need for knowledge of traditional methods, including physiology, in order to put the pieces back together again, in a sense, and understand the whole (very complicated) system.

The challenge in the future will be to develop robust new methods to deal with the enormous amount of valuable sequence information. The beginnings of a new field of study can already be seen: it is now possible to estimate the "pan-genome" and "core genome" of a set of Pseudomonas genomes, as has been hinted at in this chapter. Perhaps in the future, comparative genomics will move towards comparative "pan-genomics", where one compares the "pan-genome" and "core genome" of species (or other taxonomic groups) against each other, to determine which genes seem to be commonly conserved within groups or ecological niches and which sets of "additional genes" are often found in different environments.
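The core genome of a set of genomes is the set of gene families present in every genome. The pan-genome is the set of gene families present in at least one of them. Once genes have been grouped into families (for example by BLAST-based clustering, a step not shown here), both reduce to set operations, as the minimal Python sketch below illustrates. The strain names and family IDs are hypothetical placeholders. Adding genomes one at a time, as `accumulation_curve` does, shows the core shrinking and the pan-genome growing, which is one common way of presenting such estimates.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core genome, pan-genome) as sets of gene-family IDs."""
    families = list(genomes.values())
    if not families:
        return set(), set()
    core = set.intersection(*families)  # families found in every genome
    pan = set.union(*families)          # families found in at least one genome
    return core, pan


def accumulation_curve(genomes: dict[str, set[str]],
                       order: list[str]) -> list[tuple[str, int, int]]:
    """Core and pan-genome sizes as genomes are added in the given order."""
    core, pan, curve = None, set(), []
    for name in order:
        fams = genomes[name]
        core = set(fams) if core is None else core & fams
        pan |= fams
        curve.append((name, len(core), len(pan)))
    return curve


# Hypothetical gene-family assignments, for illustration only.
genomes = {
    "strain_A": {"f1", "f2", "f3", "f4"},
    "strain_B": {"f1", "f2", "f3", "f5"},
    "strain_C": {"f1", "f2", "f6"},
}
core, pan = core_and_pan(genomes)
print(sorted(core))      # ['f1', 'f2']
print(len(pan))          # 6
print(sorted(pan - core))  # ['f3', 'f4', 'f5', 'f6'], the "additional genes"
```

Comparing the core and pan-genomes of two groups of strains (for example, clinical versus environmental isolates) would then be a matter of further set operations on these results.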

References

1 Willenbrock, H., Friis, C., Friis, A.S. and Ussery, D.W. (2006) An environmental signature for 323 microbial genomes based on codon adaptation indices. Genome Biol, 7, R114.

2 Willenbrock, H. and Ussery, D.W. (2007) Prediction of highly expressed genes in microbes based on chromatin accessibility. BMC Mol Biol, 8, 11.

3 Lyczak, J.B., Cannon, C.L. and Pier, G.B. (2000) Establishment of Pseudomonas aeruginosa infection: lessons from a versatile opportunist. Microbes Infect, 2, 1051–1060.

4 Stover, C.K., Pham, X.Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., Brinkman, F.S., Hufnagle, W.O., Kowalik, D.J., Lagrou, M. et al. (2000) Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature, 406, 959–964.

5 Klockgether, J., Würdemann, D., Wiehlmann, L., Binnewies, T.T., Ussery, D.W. and Tümmler, B. (2008) Genome diversity of Pseudomonas aeruginosa, in Pseudomonas: Genetics and Molecular Biology (ed. P. Cornelis), Horizon Press, Norwich, in press.

6 Brint, J.M. and Ohman, D.E. (1995) Synthesis of multiple exoproducts in Pseudomonas aeruginosa is under the control of RhlR-RhlI, another set of regulators in strain PAO1 with homology to the autoinducer-responsive LuxR-LuxI family. J Bacteriol, 177, 7155–7163.

7 Pearson, J.P., Pesci, E.C. and Iglewski, B.H. (1997) Roles of Pseudomonas aeruginosa las and rhl quorum-sensing systems in control of elastase and rhamnolipid biosynthesis genes. J Bacteriol, 179, 5756–5767.

8 Van Alst, N.E., Picardo, K.F., Iglewski, B.H. and Haidaris, C.G. (2007) Nitrate sensing and metabolism modulate motility, biofilm formation, and virulence in Pseudomonas aeruginosa. Infect Immun, 75, 3780–3790.

9 Furukawa, S., Kuchma, S.L. and O'Toole, G.A. (2006) Keeping their options open: acute versus persistent infections. J Bacteriol, 188, 1211–1217.

10 Smith, E.E., Buckley, D.G., Wu, Z., Saenphimmachak, C., Hoffman, L.R., D'Argenio, D.A., Miller, S.I., Ramsey, B.W., Speert, D.P., Moskowitz, S.M. et al. (2006) Genetic adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. Proc Natl Acad Sci USA, 103, 8487–8492.

11 Lee, B., Haagensen, J.A., Ciofu, O., Andersen, J.B., Hoiby, N. and Molin, S. (2005) Heterogeneity of biofilms formed by nonmucoid Pseudomonas aeruginosa isolates from patients with cystic fibrosis. J Clin Microbiol, 43, 5247–5255.

12 Jelsbak, L., Johansen, H.K., Frost, A.L., Thogersen, R., Thomsen, L.E., Ciofu, O., Yang, L., Haagensen, J.A., Hoiby, N. and Molin, S. (2007) Molecular epidemiology and dynamics of Pseudomonas aeruginosa populations in lungs of cystic fibrosis patients. Infect Immun, 75, 2214–2224.

13 Livermore, D.M. (1995) β-Lactamases in laboratory and clinical resistance. Clin Microbiol Rev, 8, 557–584.

14 Poole, K. (2001) Multidrug efflux pumps and antimicrobial resistance in Pseudomonas aeruginosa and related organisms. J Mol Microbiol Biotechnol, 3, 255–264.

15 Livermore, D.M. (2002) Multiple mechanisms of antimicrobial resistance in Pseudomonas aeruginosa: our worst nightmare? Clin Infect Dis, 34, 634–640.

16 Franklin, F.C., Bagdasarian, M., Bagdasarian, M.M. and Timmis, K.N. (1981) Molecular and functional analysis of the TOL plasmid pWWO from Pseudomonas putida and cloning of genes for the entire regulated aromatic ring meta cleavage pathway. Proc Natl Acad Sci USA, 78, 7458–7462.

17 Regenhardt, D., Heuer, H., Heim, S., Fernandez, D.U., Strompl, C., Moore, E.R. and Timmis, K.N. (2002) Pedigree and taxonomic credentials of Pseudomonas putida strain KT2440. Environ Microbiol, 4, 912–915.

18 Dejonghe, W., Boon, N., Seghers, D., Top, E.M. and Verstraete, W. (2001) Bioaugmentation of soils by increasing microbial richness: missing links. Environ Microbiol, 3, 649–657.

19 Timmis, K.N., Steffan, R.J. and Unterman, R. (1994) Designing microorganisms for the treatment of toxic wastes. Annu Rev Microbiol, 48, 525–557.

20 Lehrbach, P.R., Zeyer, J., Reineke, W., Knackmuss, H.J. and Timmis, K.N. (1984) Enzyme recruitment in vitro: use of cloned genes to extend the range of haloaromatics degraded by Pseudomonas sp. strain B13. J Bacteriol, 158, 1025–1032.

21 Nozaki, M., Kagamiyama, H. and Hayaishi, O. (1963) Crystallization and some properties of metapyrocatechase. Biochem Biophys Res Commun, 11, 65–69.

22 Nakazawa, T. (2002) Travels of a Pseudomonas, from Japan around the world. Environ Microbiol, 4, 782–786.

23 Federal Register (1982) Certified Host–Vector Systems. 17197.

24 Nelson, K.E., Weinel, C., Paulsen, I.T., Dodson, R.J., Hilbert, H., Martins dos Santos, V.A., Fouts, D.E., Gill, S.R., Pop, M., Holmes, M. et al. (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol, 4, 799–808.

25 Hirano, S.S. and Upper, C.D. (2000) Bacteria in the leaf ecosystem with emphasis on Pseudomonas syringae – a pathogen, ice nucleus, and epiphyte. Microbiol Mol Biol Rev, 64, 624–653.

26 Joardar, V., Lindeberg, M., Jackson, R.W., Selengut, J., Dodson, R., Brinkac, L.M., Daugherty, S.C., Deboy, R., Durkin, A.S., Giglio, M.G. et al. (2005) Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J Bacteriol, 187, 6488–6498.

27 Feil, H., Feil, W.S., Chain, P., Larimer, F., DiBartolo, G., Copeland, A., Lykidis, A., Trong, S., Nolan, M., Goltsman, E. et al. (2005) Comparison of the complete genome sequences of Pseudomonas syringae pv. syringae B728a and pv. tomato DC3000. Proc Natl Acad Sci USA, 102, 11064–11069.

28 Buell, C.R., Joardar, V., Lindeberg, M., Selengut, J., Paulsen, I.T., Gwinn, M.L., Dodson, R.J., Deboy, R.T., Durkin, A.S., Kolonay, J.F. et al. (2003) The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci USA, 100, 10181–10186.

29 Haas, D. and Defago, G. (2005) Biological control of soil-borne pathogens by fluorescent pseudomonads. Nat Rev Microbiol, 3, 307–319.

30 Lalucat, J., Bennasar, A., Bosch, R., Garcia-Valdes, E. and Palleroni, N.J. (2006) Biology of Pseudomonas stutzeri. Microbiol Mol Biol Rev, 70, 510–547.

31 Vodovar, N., Vallenet, D., Cruveiller, S., Rouy, Z., Barbe, V., Acosta, C., Cattolico, L., Jubin, C., Lajus, A., Segurens, B. et al. (2006) Complete genome sequence of the entomopathogenic and metabolically versatile soil bacterium Pseudomonas entomophila. Nat Biotechnol, 24, 673–679.

32 NCBI Genome Project. http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi.


33 Pathogen Sequencing Group at the Sanger Institute. ftp://ftp.sanger.ac.uk/pub/pathogens/pae.

34 Larsen, T.S. and Krogh, A. (2003) EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics, 4, 21.

35 Binnewies, T., Hallin, P.F., Wassenaar, T.M. and Ussery, D.W. (2007) Tools for comparison of bacterial genomes, in Comparative Genomics and Bioinformatics for the Microbiologist (ed. D. Field), Horizon Press, Norwich, pp. 00–00.

36 WU-BLAST. http://blast.wustl.edu.

37 Lagesen, K., Hallin, P., Rodland, E.A., Staerfeldt, H.H., Rognes, T. and Ussery, D.W. (2007) RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res, 35, 3100–3108.

38 Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22, 4673–4680.

39 Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol, 14, 685–695.

40 Galtier, N., Gouy, M. and Gautier, C. (1996) SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci, 12, 543–548.

41 Hallin, P.F., Binnewies, T.T. and Ussery, D.W. (2004) Genome update: chromosome atlases. Microbiology, 150, 3091–3093.

42 Satchwell, S.C., Drew, H.R. and Travers, A.A. (1986) Sequence periodicities in chicken nucleosome core DNA. J Mol Biol, 191, 659–675.

43 Ornstein, R., Rein, R., Breen, D. and Macelroy, R. (1978) An optimized potential function for the calculation of nucleic acid interaction energies, I – base stacking. Biopolymers, 17, 2341–2360.

44 Shpigelman, E.S., Trifonov, E.N. and Bolshoy, A. (1993) CURVATURE: software for the analysis of curved DNA. Comput Appl Biosci, 9, 435–440.

45 Lee, D.G., Urbach, J.M., Wu, G., Liberati, N.T., Feinbaum, R.L., Miyata, S., Diggins, L.T., He, J., Saucier, M., Deziel, E. et al. (2006) Genomic analysis reveals that Pseudomonas aeruginosa virulence is combinatorial. Genome Biol, 7, R90.

46 He, J., Baldini, R.L., Deziel, E., Saucier, M., Zhang, Q., Liberati, N.T., Lee, D., Urbach, J., Goodman, H.M. and Rahme, L.G. (2004) The broad host range pathogen Pseudomonas aeruginosa strain PA14 carries two pathogenicity islands harboring plant and animal virulence genes. Proc Natl Acad Sci USA, 101, 2530–2535.

47 Qiu, X., Gurkar, A.U. and Lory, S. (2006) Interstrain transfer of the large pathogenicity island (PAPI-1) of Pseudomonas aeruginosa. Proc Natl Acad Sci USA, 103, 19830–19835.

48 Tatusov, R.L., Koonin, E.V. and Lipman, D.J. (1997) A genomic perspective on protein families. Science, 278, 631–637.

49 Markowitz, V.M., Korzeniewski, F., Palaniappan, K., Szeto, E., Werner, G., Padki, A., Zhao, X., Dubchak, I., Hugenholtz, P., Anderson, I. et al. (2006) The Integrated Microbial Genomes (IMG) system. Nucleic Acids Res, 34, D344–D348.

50 Raghava, G.P. and Han, J.H. (2005) Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics, 6, 59.

51 Karlin, S., Barnett, M.J., Campbell, A.M., Fisher, R.F. and Mrazek, J. (2003) Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci USA, 100, 7313–7318.

52 Sharp, P.M. and Li, W.H. (1987) The Codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res, 15, 1281–1295.


53 Carbone, A., Kepes, F. and Zinovyev, A. (2005) Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol, 22, 547–561.

54 Carbone, A., Zinovyev, A. and Kepes, F. (2003) Codon adaptation index as a measure of dominating codon bias. Bioinformatics, 19, 2005–2015.

55 Willenbrock, H. and Ussery, D.W. (2004) Chromatin architecture and gene expression in Escherichia coli. Genome Biol, 5, 252.

56 Service, R.F. (2006) Gene sequencing. The race for the $1000 genome. Science, 311, 1544–1546.

57 The Archon X-PRIZE for Genomics. http://www.xprize.org/xprizes/genomics_x_prize.html.
