

Phylogeny Based on 16S rRNA/DNA

Erko Stackebrandt, DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

Polyphasic systematics of prokaryotes is guided by the results of comparative analysis of the evolutionarily conserved 16S ribosomal ribonucleic acid (rRNA) genes. Certain genes coding for ‘housekeeping’ proteins support the phylogenetic outline, but most of these genes are either not ubiquitous or evolve too rapidly to embrace all higher taxa. Dendrograms of phylogenetic relatedness show the order in which organisms evolved in time, thus providing the framework for their classification. The delineation and circumscription of taxa should not be based solely on the topology of dendrograms; it is advisable to include a wide range of molecular, chemical and metabolic properties, which need to be assessed and evaluated in the light of novel taxonomic information.

Introduction

In contrast to more highly evolved eukaryotes, in which complex morphologies visibly reflect their evolutionary history, the microscopic and ultrastructural features of microorganisms cannot be used to deduce the way in which the prokaryotes and the morphologically simple eukaryotic forms evolved. Before 1960, taxonomists were unable to appreciate the complexity of microbial systematics and to recognize that groups based on superficial properties alone did not necessarily reflect those that arose through evolutionary processes. However, at this same time (when most microbiologists had resigned themselves to the belief that the true course of prokaryote phylogeny could never be unravelled), concepts were developed that turned out to change our view of the interrelatedness of living organisms. See also: Systematics: Historical Overview

Advanced article

Article Contents

• Introduction
• Semantic Macromolecules: A Basis for Phylogenetic Studies
• Sequence Determination, Sequence Alignment and Determination of Sequence Similarities
• Recognition of the Higher Taxa of Prokaryotes
• Polyphasic Approach to Bacterial Systematics
• The Taxonomic Rank ‘Species’ in Bacteriology

Online posting date: 15th September 2009

ELS subject area: Microbiology

How to cite: Stackebrandt, Erko (September 2009) Phylogeny Based on 16S rRNA/DNA. In: Encyclopedia of Life Sciences (ELS). John Wiley & Sons, Ltd: Chichester. DOI: 10.1002/9780470015902.a0000462.pub2

ENCYCLOPEDIA OF LIFE SCIENCES © 2009, John Wiley & Sons, Ltd. www.els.net

Semantic Macromolecules: A Basis for Phylogenetic Studies

Biological molecules can be classified into three categories – semantides, episemantides and asemantides – according to the information they carry (Zuckerkandl and Pauling, 1965). At the highest level are the semantides, which are information-carrying molecules. From the evolutionary origin of the first cell to contemporary cells, information on reproduction, behaviour, survival, maintenance, etc., has been laid down in blueprints and passed in semantides from one generation to the next. See also: Cell Macromolecules; Molecular Evolution; Semantides and Modern Bacterial Systematics

Three semantides have been defined according to their role in the cell. Following the central dogma of information flow, deoxyribonucleic acid (DNA) is the primary semantide, ribonucleic acid (RNA) is the secondary semantide and proteins are the tertiary semantides. As changes in the primary structure of DNA occur by an ongoing process of random mutation, breaking–rejoining reactions and selection, the composition of this molecule is constantly changing and this information is passed through messenger RNA to the proteins. The sequences of these molecules are the historical record of evolution, and the determination of their primary structure provides a powerful means by which evolutionary relationships can be measured. In essence, two organisms whose versions of a given semantide differ by only a few changes (mutations affecting nucleotide order or amino acid positions) are more closely related to each other than organisms in which a higher number of changes has accumulated. Thus these molecules can be considered as chronometers. As different genes are subjected to different rates of change (same frequency of mutation but different level of manifestation of changes), each cell possesses a variety of chronometers from which different evolutionary events can be determined. Slow-running clocks will reflect early evolutionary events such as the separation of the main lines of descent, whereas fast-running clocks will reflect more recent events, such as the more precise separation of genera and species. Comparative analysis of one particular homologous semantide from many organisms will allow the determination of the order in which this gene or protein evolved, thus unravelling its family tree. See also: Molecular Biology: The Central Dogma; Molecular Evolution: Introduction; Mutations and New Variation: Overview
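The chronometer idea described above (fewer accumulated changes imply closer relatedness) can be illustrated with a minimal Python sketch. The three fragments are invented for illustration and are not real 16S rRNA data; the names `count_differences`, `fragments` and `organism_A` to `organism_C` are hypothetical, and the sequences are assumed to be already aligned to equal length.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences of equal length differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical aligned fragments of one homologous gene (invented, not real data).
fragments = {
    "organism_A": "GATTACAGGCTA",
    "organism_B": "GATTACAGGCTT",
    "organism_C": "GTTTGCAGACTT",
}

reference = "organism_A"
# Number of changes separating every other organism from the reference.
differences = {
    name: count_differences(fragments[reference], seq)
    for name, seq in fragments.items()
    if name != reference
}
# The organism with the fewest changes is taken as the closest relative.
closest = min(differences, key=differences.get)
print(differences, closest)
```

A raw difference count of this kind is only a starting point; it ignores multiple substitutions at the same position and assumes that the alignment itself is correct.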

Episemantic molecules are synthesized under the control of tertiary semantides; examples are adenosine triphosphate (ATP), carotenoids and chemotaxonomic markers. These molecules are not phylogenetic markers per se, but experience has shown that groups of organisms that form a phylogenetic cluster can, in most cases, be recognized by the chemical composition of markers such as peptidoglycan, lipids, fatty acids, isoprenoid quinones, polyamines and mycolic acids.

Asemantic molecules are molecules that are not produced by the organisms themselves and therefore do not express any of the information that the organism contains. For example, molecules such as exogenously supplied vitamins, phosphate ions, oxygen, viruses, etc. cannot be used in the reconstruction of evolutionary events. See also: Molecular Phylogeny Reconstruction

16S rRNA and the gene coding for it are reliable homologous molecules

The most useful molecular chronometers used today for the determination of phylogenetic relationships are the 16S ribosomal (r)RNA genes and their gene products, the 16S rRNAs (Figure 1) (Woese, 1987). In most organisms these genes occur in multiple copies per cell, but very slow-growing species may have only a single copy. The 16S rRNA gene is part of an rrn operon, usually located at the 5′ terminus, followed by the larger 23S rRNA gene and the small 5S rRNA gene. These genes are separated by spacers, some of which contain genes for transfer RNA. Some exceptions occur in which the large rRNA genes are separated and not part of the same rrn operon. During transcription, the pre-rRNA is folded under the influence of the ribosomal proteins into tertiary and quaternary structures. This is the basis for the maturation process, in which the 5′ and 3′ flanking regions of the rRNA genes are digested by specific enzymes. The mature RNA and the ribosomal proteins are assembled to form the two ribosomal subunits of which each ribosome is composed. The small 30S ribosomal subunit contains the 16S rRNA and 21 proteins, whereas the larger 50S subunit contains the 23S rRNA, the 5S rRNA and 32 proteins. Fully assembled ribosomes are part of the translation process, in which the genetic information, channelled through the ribosomes via transcribed messenger RNA, is translated into polypeptides and proteins. It can be deduced from the universality of the genetic code that such a translation process, although primitive in early evolution, was already functioning during the early stages of life. Thus, the 16S rRNA genes, as well as the genes coding for other components of the ribosomes, are presumably derived from a common ancestor and are homologous molecules. See also: Bacterial Ribosomes: Assembly; Ribosomal RNA; rRNA Structure

Figure 1 Secondary structure of a 16S rRNA molecule based on the Escherichia coli structure (Maidak et al., 1994; available in the public domain from the Ribosomal Database Project). Highly variable regions are red; highly conserved stretches are green. Binding sites of primers used in PCR amplification of the rRNA gene are blue, with the direction of amplification indicated by arrows. The other nucleotides are black. Primer labels shown in the figure: 27f, 342r, 357f, 519f, 536f, 907r, 926f, 1100r, 1114f and 1392r; the 5′ and 3′ ends are marked.

Several criteria have been identified which make the 16S rRNAs and their genes the most widely studied phylogenetic markers: (1) the function of ribosomes has not changed for about 3.8 billion years, (2) the 16S rRNA genes are universally present among all cellular life forms, (3) the size of 1540 nucleotides makes them easy to analyse, (4) the primary structure is an alternating sequence of invariant, more or less conserved and highly variable regions and (5) lateral gene transfer is a rare event and has been noticed mainly between closely related strains. Other homologous molecules have been sequenced, for example, genes coding for 23S and 5S rRNA, ribosomal proteins and, increasingly, the so-called ‘housekeeping’ genes that are constitutively expressed due to their constant requirement by the cell. By January 2009 more than 640 000 16S rRNA gene sequences had been deposited in public databases (e.g. http://www.arb-silva.de/documentation/), 3% of which originate from cultured prokaryotes. The database of the ‘Living Tree Project’ (http://www.arb-silva.de/projects/living-tree/) compiles the almost complete 16S rRNA gene sequences of more than 7000 type strains of prokaryotes and is thus the most complete database of Bacteria and Archaea. The vast majority of 16S rRNA gene sequences (>600 000) in public databases originate from as yet uncultured prokaryotes. These sequences are usually short (80–800 nucleotides), recovered mostly from clones and gradient gel bands of amplified rrn genes of environmental DNA. See also: Bacterial Ribosomes; Evolutionary Developmental Biology: Homologous Regulatory Genes and Processes; Housekeeping Genes; Ribosome Structure and Shape

Sequence Determination, Sequence Alignment and Determination of Sequence Similarities

Progress in the elucidation of phylogenetic relationships parallels the development of sequencing methods. Comparative sequence analysis of 16S rRNA fragments was introduced by Carl Woese and collaborators more than 30 years ago (Woese and Fox, 1977). The demanding and expensive sequence analysis of T1-generated 16S rRNA oligonucleotides (16S/18S rRNA cataloguing), applied by only a few scientists, led to the discovery of the kingdom Archaebacteria (now Archaea) as a third main line of evolutionary descent and to the first main outline of the evolution of prokaryotes. Ten years later, with the introduction of reverse transcriptase sequence analysis, sequence analysis of amplified copies of rRNA developed into a routine method. Today, rRNA genes are amplified by the polymerase chain reaction (PCR) and this approach is used worldwide in identification and diagnosis of bacterial species and in assessing the phylogenetic position of novel strains and the composition of complex microbial communities. See also: Archaea; Polymerase Chain Reaction (PCR)

The main reason for the routine application of the PCR method is the presence of a set of conserved nucleotide stretches, which are scattered over the rRNA genes and serve as target sites for oligonucleotide primers (usually 14–20 bases in length) (Figure 1); these primers are used for amplification and subsequent sequence analysis. Thus, a set of not more than 7 primers is usually sufficient to analyse the almost complete gene sequence of a wide spectrum of phylogenetically diverse organisms. For shorter amplificates, such as those recovered from gradient gels (e.g. DGGE – Denaturing Gradient Gel Electrophoresis), specific primers are constructed. Sequence analysis can be performed on both purified nucleic acid preparations and crude extracts of bacterial cells. The multiple copies of rrn operons per cell (up to 14) may carry slight variations in length and/or composition of nucleotides, mostly in the variable regions of their primary structure. This heterogeneity may cause problems in the elucidation of the sequence when bulk amplificates instead of cloned PCR amplificates are sequenced. See also: Gel Electrophoresis

Today, the amplified DNA fragments are sequenced directly by applying the chain termination method, in which nucleotide analogues, such as dideoxynucleotides, compete with conventional nucleotides and, where they are incorporated at random, cause base-specific termination of the elongation products. This results in populations of single-stranded (ss)DNA fragments of different lengths sharing a common 5′ end (the primer). Automated DNA sequence analysis is most conveniently carried out as a linear PCR cycle sequencing reaction. Novel sequencing techniques have been introduced (e.g. Solexa, http://www3.appliedbiosystems.com/; 454, http://www.454.com/), which have revolutionized the speed and lowered the costs of full genome sequences. With progress towards sequencing the genomes of all prokaryotic type strains (http://www.asm.org/), high-quality sequences of all rrn operons and other genes of taxonomic and phylogenetic relevance will become available in the near future. The sequences are then aligned, which means that homologous nucleotides derived from a common position within the ancestral sequence are arranged in columns, and consequently are recognized as being identical or different (Stackebrandt and Rainey, 1995). See also: DNA Sequencing
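To make the notion of an alignment concrete, the following Python sketch arranges a few already aligned sequences in columns and labels each column as identical or different. The sequences are invented for illustration, the function name `classify_columns` is hypothetical, and a gap character `-` is simply treated as one more symbol, which is a simplifying assumption.

```python
def classify_columns(alignment):
    """Label each alignment column 'identical' or 'different'.

    `alignment` is a list of equal-length strings, one row per organism;
    each column holds the nucleotides assumed to be homologous.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*alignment):  # iterate over columns, not rows
        labels.append("identical" if len(set(column)) == 1 else "different")
    return labels


# Hypothetical alignment of three short fragments (not real 16S rRNA data).
rows = ["ACGT-A", "ACGTTA", "ACCTTA"]
labels = classify_columns(rows)
print(labels)
```

Runs of identical columns correspond to the conserved stretches used as primer target sites, whereas the differing columns carry the phylogenetic signal.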

Phylogenetic relationships can be assessed by pairwise similarities. One hundred per cent similarity between a pair of 16S rRNA gene sequences indicates very high relatedness, if not identity, of the investigated organisms. The lower the value, the more distantly related the compared organisms. If, however, the number of organisms is too large and the sequences differ significantly in length and composition, the respective similarity matrix cannot be interpreted meaningfully. In this case, phylogenetic relationships can be visualized graphically by using algorithms that transform the similarity values into dissimilarity values and compensate for superimposed (multiple) substitutions. These phylogenetic distances form the basis for phylogenetic trees or dendrograms. The most widely applied treeing methods in the past were distance methods (e.g. neighbour-joining) but other approaches, such as maximum parsimony and maximum likelihood methods, are now mandatory in scientific communications. Tree topologies are best tested by comparing the evolution of different phylogenetic markers with a similar degree of sequence conservation. The results of comparative analyses of other conserved molecules responsible for central functions, such as elongation factors (associated with ribosomes and necessary for the elongation of the growing peptide), the β subunit of adenosine triphosphatase (ATPase) (used to catalyse the breakdown of ATP to ADP + Pi) and RNA polymerases (necessary for the growth of mRNA strands during transcription) (Ludwig et al., 1993; Richert et al., 2007; Adekambi et al., 2008), do in general support the branching pattern of organisms within major lines of descent as inferred by 16S rRNA gene analysis (Figure 2). For population studies on groups of phylogenetically closely related strains, genes other than 16S rRNA are sequenced to determine the path of their evolution more precisely (Gevers et al., 2005). See also: Classification; Molecular Phylogeny Reconstruction; Universal Tree of Life
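The transformation of similarity values into corrected dissimilarity values can be sketched in Python. The article does not name a specific correction; the Jukes-Cantor formula, d = -(3/4) ln(1 - (4/3)p) with p the observed fraction of differing positions, is used here only as one commonly applied example. The sequences and the names `similarity`, `jukes_cantor_distance` and `strain_1` to `strain_3` are hypothetical, and columns containing a gap are ignored as a simplifying assumption.

```python
import math
from itertools import combinations


def similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions, ignoring columns with a gap in either row."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a == b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(sim: float) -> float:
    """Convert a similarity value into a Jukes-Cantor corrected distance."""
    p = 1.0 - sim  # observed dissimilarity
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical aligned fragments (invented for illustration).
aligned = {
    "strain_1": "ACGTACGTACGTACGTACGT",
    "strain_2": "ACGTACGTACGTACGTACGA",
    "strain_3": "ACGAACGTTCGTACGTACGA",
}

# Pairwise distance table of the kind that a distance method such as
# neighbour-joining would take as input.
distances = {
    (name_a, name_b): jukes_cantor_distance(similarity(seq_a, seq_b))
    for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2)
}
for pair, dist in distances.items():
    print(pair, round(dist, 4))
```

The corrected distance is always at least as large as the observed dissimilarity, and the gap between the two widens as sequences diverge, which is the compensation for superimposed substitutions mentioned above.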

Recognition of the Higher Taxa of Prokaryotes

Prokaryotes do not constitute a coherent phylogenetic group of organisms but form two lineages, the Archaebacteria (now Archaea) and the Eubacteria (now Bacteria), each of them being as unrelated to the 18S rRNA lineage, which represents the nuclear component of the eukaryotes, as they are to each other (Figure 3) (Woese and Fox, 1977). Considering that the root of the tree is placed within the eubacterial lineage, the eukaryotes and Archaebacteria appear to be specifically related. The distinctness of the three major rRNA lines of descent was later confirmed by analyses of the genes coding for some proteins involved in the regulation of translation and in energy-yielding processes. The concept of these primary kingdoms contrasted dramatically with the traditional, more widely accepted five-kingdom classification, describing four eukaryotic kingdoms (animals, plants, fungi and protozoa) and a single prokaryotic kingdom (bacteria). However, in phylogenetic, molecular and cellular terms the eukaryotic kingdoms are much more closely related to each other than any of them is to either of the prokaryotic kingdoms, and the five-kingdom concept does not recognize the fundamental differences that distinguish the two groups of prokaryotes. See also: Progenotes

The relationships among prokaryotes and the origin of eukaryotic cellular organization from a prokaryotic ancestor are among the major unresolved problems in the history of life, and several theories of eukaryotic origins and their phylogenetic relationships with prokaryotes are under discussion. Inferring deep divergences in the history of life is difficult. As McInerney and Wilkinson (2005) state, ‘there has been enough time for signal to be overwritten as noise and for systematic biases to accumulate in the sequence data that are usually used to reconstruct phylogeny. There has been enough time for hidden paralogy and horizontal gene transfer (HGT), both of which can yield incorrect species or genome trees even when the gene trees are correctly inferred.’ HGT in early evolution was sufficiently extensive to ‘call into question the existence of a genomic phylogeny, and some theories postulate the occurrence of genomic fusion events, which are not accommodated in phylogenetic trees’. Hence, early phylogenetic events and the emergence of deeply rooting lineages cannot be resolved by a minute fraction of the genome such as the rrn operons or components thereof.

At a higher level of relatedness, for example at and above the level of phyla, these molecules have demonstrated their valuable potential to support the establishment of a hierarchic systematic system. Nevertheless, with the release of genome sequences and the development of the phylotaxon concept, one should be prepared to witness major changes at all hierarchic levels. See also: Prokaryotic Systematics: a Theoretical Overview

The domain concept

The three-kingdom classification was subsequently replaced by the Domain concept (Woese et al., 1990). This taxon has been created at the highest taxonomic rank to highlight the importance of the tripartite division of the living world. The suffix ‘bacteria’ was omitted in the names of the two prokaryotic domains to reflect their exclusive evolutionary relationship. Instead the terms archaea, bacteria and eukarya have been proposed. See also: History of Taxonomy


Figure 2 Phylogenetic distribution of peptidoglycan amino acids as a salient chemotaxonomic property. Type strains of Microbacterium (M.) with ornithine in their peptidoglycan are indicated in red, those with lysine are indicated in blue. Neighbour-joining dendrograms of relationship of genes coding for 16S rRNA (1A), rpoB (1B), recA (1C), gyrB (1D), ppk (1E) and a concatenated dataset (1F), containing the gene sequences that formed the basis for genes displayed in Figure 1A–1E. Bootstrap values of >30% are indicated at relevant branching points. Agromyces albus DSM 15934T served as a root. Scale bar indicates 2% inferred nucleotide changes. Reprinted from Richert et al. (2007), www.elsevier.com/locate/sam. Copyright (2007) with permission from Elsevier.


Sequence comparisons provide neutral genotypic measurements but they do not reveal differences in phenotype. If, however, molecular analyses point towards the existence of major evolutionary groups, one would expect to find additional profound differences at the molecular level that are expressed at the phenotypic level. The most straightforward approach in the determination of differences at the DNA level is analysis of whole genomes, but though more than 2500 genome sequences are listed in public databases (e.g. http://www.genomesonline.org/), the information is still too sparse to provide information on the large set of organisms required for meaningful analysis. Presently, besides the comparison of a homologous gene, it is the presence of certain phenotypic traits that confirms the separateness of the two prokaryotic domains. Characters shared between members of two of the three highest taxa are of no use for placing strains in a phylogenetically correct taxon, but certain characters have been identified that are domain-specific and hence are of diagnostic value. Among others, these are the chemical linkages and compositions of amino acids and sugars in the cell wall (peptidoglycan (bacteria) versus pseudomurein, proteinaceous and polysaccharide walls (archaea)), ester-linked (bacteria) versus ether-linked (archaea) lipids, the modification pattern of transfer ribonucleic acids (tRNAs) and the resistance to antibiotics (Danson et al., 1992). See also: Archaeal Cell Walls; Bacterial Cell Wall; Transfer RNA Modification

Both of the two prokaryotic domains contain a number of more or less well-separated sublines of descent, formerly named the phyla or divisions. With the introduction of the taxon domain, within the archaea the phyla were described as kingdoms. Most of them are evolutionarily more ancient than the presently described eukaryotic kingdoms but it can be assumed that the number of kingdoms within the domain eukarya will increase significantly once the lower eukaryotes have been investigated by extensive phylogenetic analysis.

The kingdoms of the archaea

Although archaea are traditionally considered to be extremophiles, nonextremophile members of this domain are abundant in terrestrial and aquatic ecosystems, and contribute significantly to ecosystem function (Barns et al., 1996). Two main lines, defined as kingdoms, have so far been recognized within the domain Archaea: the Euryarchaeota and the Crenarchaeota. These taxa embrace most of the cultured and well-investigated species of archaea. The first kingdom contains the physiologically defined methanogenic organisms. Related to them are nonmethanogenic taxa such as the wall-less thermoplasmas, the extreme halophilic and alkaliphilic archaea and the peculiar thermophilic Archaeoglobus fulgidus, an organism capable of forming methane and reducing sulfate. Also included are the thermophilic species Thermococcus celer and the symbiotic methanogens that serve as hydrogen sinks in many protozoa. The second kingdom, Crenarchaeota, contains the hyperthermophilic archaea, some of which grow optimally at temperatures up to 105°C and most of which require elemental sulfur for optimal growth. This taxon is broadened phylogenetically by an enormous number of as yet uncultured organisms from geothermal hot springs but also from low to moderate temperature, marine and terrestrial environments, including the deep sea. The tentative third kingdom Korarchaeota contains only a few uncultured hot spring and deep-sea vent organisms, which share features of both of the main kingdoms. As derived from 16S rRNA gene sequences, it may represent evolutionarily primitive life forms. Another tentative kingdom is Nanoarchaeota, containing the peculiar species Nanoarchaeum equitans, which is an obligate symbiont on the archaeon Ignicoccus. Other recently detected archaeal organisms, such as the Archaeal Richmond Mine Acidophilic Nanoorganisms (ARMAN), are only distantly related to any of these groups. See also: Archaea: Diversity

Figure 3 The phylogenetic tripartition of living organisms based on the analysis of genes coding for small subunit rRNA (16S and 18S rRNA). Within the prokaryotic domains most of the main lines of descent, called kingdoms in the domain Archaea and phyla in the domain Bacteria, are indicated. The origin of the evolutionary lineages appears to be located close to the branching point of the lineage of the Bacteria (indicated by a circle). The bar represents 5 nucleotide substitutions per 100 nucleotides. Labels shown in the figure: Domain Archaea, with Kingdom Euryarchaeota (Methanobacterium, Methanosarcina, Methanococcus, Thermococcus), Kingdom Crenarchaeota (Sulfolobus, Pyrodictium, Thermofilum) and Kingdom Korarchaeota (pJP78); Domain Bacteria (Thermotoga, Aquifex, Thermus, Chlorobium, Bacteroides, Proteobacteria, Fibrobacter, Gram-positives, Cyanobacteria); Domain Eukarya (Giardia, Microsporidia, higher evolved Eukarya).

The phyla within the domain bacteria

The phylogenetic structure of the domain bacteria is much more complex than that of the archaea and a formal kingdom structure has not yet been proposed. The reason for the delay can be explained by the large number of phyla, some of which are separated by such small internode distances that the precise order of their emergence in evolution cannot be deduced (Olsen et al., 1994). The relationships within the individual lines are often unexpected and support for these groupings from the sharing of properties other than similarities in conserved genes is rare. With the introduction of molecular diversity studies, the richness of bacterial diversity can now be fully appreciated. The most surprising outcome of the nonculture studies was the recognition that, by relying on cultured organisms, microbiologists had probably missed the largest part of metabolic, chemical and phylogenetic diversity. Although in 1987 the tree of prokaryotic life consisted of 12 phyla defined by 16S rRNA similarities, only 10 years later 12 of the 36 recognized phyla were defined by noncultured organisms (candidate phyla). In 2007 the number of candidate phyla (70) was more than twice that of phyla of cultured prokaryotes (30).

A few phyla – that is, those embracing Gram-positive bacteria, the Proteobacteria, the Bacteroides-Cytophaga taxon, the spirochetes and the cyanobacteria – embrace the majority of described species, and conventional isolation techniques will enrich members of these taxa. Consequently, these phyla show a complex phylogenetic structure, attracting considerable taxonomic attention in species-rich genera such as Streptomyces, Bacillus or Pseudomonas. Of the species-poorer phyla, some contain well-recognized taxa, such as Chloroflexus, Chlorobium and Chlamydia, but most of the taxa that have been identified to represent new and phylogenetically ancient phyla have been isolated during the past 20 years, such as the orders Aquificales, Thermotogales, Planctomycetales, Gemmatimonadales, Verrucomicrobiales and the genera Fibrobacter, Acidobacterium, Deinococcus and Thermus and their relatives. Among these, the vast diversity of as yet uncultured acidimicrobia and the planctomycetes with their anaerobic ammonium oxidizing relatives (ANAMMOX) have attracted the attention of physiologists and ecologists. See also: Aquificales; Proteobacteria; The Genus Bacteroides; Thermotogales

The most convincing branching pattern is that of the Gram-positive bacteria. The presence of two major subdivisions correlates nicely with the distribution of the DNA G+C content of their members (i.e. the low G+C ‘clostridial’ Firmicutes subline and the high G+C ‘actinobacteria’ subline). The class Actinobacteria was the first example of a fully hierarchical, phylogeny-based classification system above the genus level within the domain bacteria. Here, a rich pattern of chemotaxonomic properties has facilitated the circumscription of genera. Within the Firmicutes, the taxonomic structure of the aerobic sporeformers (Bacillales) has been revised, whereas significant differences still exist between the formal taxonomic structure and the phylogenetic placement of anaerobic sporeformers (Clostridiales). The situation within the class Proteobacteria has been largely resolved in the latest edition of Bergey’s Manual (Garrity et al., 2005). These organisms lack the rich spectrum of chemotaxonomic properties found among the Gram-positive bacteria and only broad taxonomic groups can be circumscribed by the structure of pigments, polyamine patterns, ubiquinone types, fatty acid composition, chemical composition of lipid A and the core region of the lipopolysaccharides. Chemotaxonomy is only slowly gaining ground in those taxa in which morphological and physiological properties have traditionally played the major role in defining genera. See also: Actinobacteria; Gram-type Positive Bacteria
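For readers unfamiliar with the G+C measure mentioned for the two Gram-positive sublines, the following minimal Python sketch shows how a G+C percentage is computed from a nucleotide string. The function name `gc_content` and the toy sequences are illustrative assumptions; in practice the value refers to genomic DNA, and the text gives no numerical boundary between ‘low’ and ‘high’ G+C, so none is assumed here.

```python
def gc_content(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; any other
    symbol (for example N) is ignored. This is an illustrative
    helper, not a method described in the article.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy fragments only; real comparisons use whole-genome values.
print(gc_content("GGCCAATT"))
print(gc_content("atgcNN"))
```

Ambiguous symbols are skipped rather than counted as mismatches, which is one possible convention among several.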

Polyphasic Approach to Bacterial Systematics

Microbiologists should be aware that the available phylogenetic branching pattern reflects the actual situation in nature quite incompletely. Phylogenetic reconstructions are based on inferred homologies from only a few molecules and thus can be considered, at best, an approximation (Stackebrandt et al., 2002). The gradually emerging 16S rRNA tree is probably best considered as presenting a hypothesis about relationships which should be tested by other data. To maintain stability in systematics the description of the two most important levels in taxonomy, the species and the genus, should be based on a combination of properties; this strategy is referred to as the ‘polyphasic approach to taxonomy’ (Vandamme et al., 1996; Rosello-Mora and Amann, 2001). In this approach the phylogenetic branching pattern serves as an aid to recognize clusters of phylogenetically related strains but the delineation of phylogenetically neighbouring clusters is predominantly made on the basis of morphology, biochemical properties and episemantic molecules. These include the chemical structure of certain cell constituents, such as fatty acids, lipids, peptidoglycan, polyamines, isoprenoid quinones, but also selected nucleotides of the 16S rRNA and other gene sequences. See also: Bacterial Origins

Above the genus level (family → order → class → kingdom (phylum)) the availability of phenotypic properties for the circumscription of taxa is poor and descriptions are based mainly on sequence information (Stackebrandt, 2006). The tree mirrors the presence of certain categories, for example, many of the deeply rooting phyla within the domain bacteria, which emerge consistently, no matter which molecule and method are used. However, the isolated position of these groups may disappear as more organisms are investigated. The main advantage of the phylogenetic system lies in its stability; only the rank of a taxon (either vertically or horizontally) within a hierarchical structure, but not its place within a higher taxon, will change.

The Taxonomic Rank ‘Species’ in Bacteriology

The biological species concept (BSC), which was originally formulated for ‘higher’ eukaryotes, describes groups of interbreeding natural populations which are reproductively isolated from other such groups. Gene flow does occur between prokaryotes but it occurs by different mechanisms and has different effects compared to those in sexual eukaryotes (Cohan, 2002; Gogarten et al., 2002). Prokaryotes do not produce gametes and do not undergo meiosis. Consequently, any concept such as the BSC which requires these characteristics (i.e. reproduction linked to sex) cannot be applied. Gene transfer among individuals of different bacteria varies in frequency and the rate of recombination may vary among loci encoding proteins of different types. Prokaryotic species may evolve in a defined ecological niche but their enormous ability to move around in the environment and to explore new niches makes them prone to new kinds of competition, stress, exchange of genetic material and hence to rapid changes of the genotype and phenotype at the strain level (Achtman and Wagner, 2008). See also: Bacterial Genetic Exchange; Species Concepts

These characteristics mean that prokaryotic taxonomists face the problem of seeing how bacterial population genetics can provide a consistent biological species concept for prokaryotes. To define bacterial species, bacterial taxonomists have traditionally used approaches based on overall resemblance (i.e. phenetic relationship). Methods have been used to analyse a group of strains for many unweighted characters, applying statistical analysis (numerical phenetic taxonomy) to identify phenotypically homogeneous clusters which, by convention, are then called ‘species’. The higher the overall resemblance among strains of a species and the more clearly these properties discriminate it from other species, the more convincing the validity of the description. Although the status of a ‘species’ is difficult to justify from a broad conceptual viewpoint, this label does play a vital role in clinical settings, the food-processing industry, agriculture, bioremediation, environmental science, biosafety and especially in public health, where in the case of prokaryotic pathogens, species are historically defined on the basis of the disease they cause, regardless of other ecological or evolutionary considerations. As a consequence, species demarcation in prokaryotes is not defined by a theory-based concept, and tends to be more arbitrary, rooted in practical necessity. The status of ‘species’ confers little evolutionary and ecological meaning in prokaryotes in general.

With the development of DNA:DNA reassociation assays, in which genomic similarity was measured, a new quality of assessment of relatedness was introduced. Nucleic acid pairing assays have the advantage that they theoretically sample a very large number of genomic characters, that is, the reassociation between short DNA fragments. Bias is introduced by the presence of large plasmids and differences in genome size of the pairing partner DNA, and different methods may give higher or lower values for the same strains. All methods are subject to significant experimental error, typically in the region of a few per cent, but these deviations will not change the general picture of genomic relatedness (Rosello-Mora, 2005). The greatest problem, however, is that it is not possible to build a cumulative database for hybridization values.

The current definition of a prokaryotic species is a cluster of highly related strains as revealed by the quantitative assessment of similarity amongst the primary structure of their DNA; the strains must also be phenotypically similar. A guideline of 70% or greater DNA:DNA reassociation with high thermal stability of heterologous duplexes has been recommended as the boundary value for strains in the same species (Wayne et al., 1987). A value of 70% DNA similarity as determined by hybridization corresponds to about 96% genome identity as determined by sequence analysis. This level is based on the empirical observation that strains of bacteria which are highly related phenotypically often share at least this amount of DNA:DNA reassociation. See also: Cot Analysis: Single-copy versus Repetitive DNA

The level of DNA:DNA reassociation between strains of established taxonomic species can be greater than 70% but is seldom much lower. For example, the two species Neisseria gonorrhoeae and Neisseria meningitidis share over 74% DNA:DNA reassociation but are kept as separate species because they can be phenotypically distinguished and because they cause different important diseases. Another well-known case is the pair of genera Escherichia and Shigella, whose species cannot be distinguished genomically, except for a few genes that cause their different pathogenicities. Strains of a species that share less than 70% DNA similarity are not reclassified as new species when the new taxon lacks any phenotypic properties that would allow taxonomists to recognize the entity as a new species.

Where does rRNA gene sequence similarity fit into this discussion? Among highly related strains the correlation between DNA:DNA similarity and rRNA gene similarity is not linear; in many examples there is actually very little correlation. A compilation of data obtained with the same pairs of randomly selected strains has shown that despite higher than 99% sequence similarity the corresponding DNA:DNA similarity was lower than 40% (Stackebrandt and Goebel, 1994). This lack of correspondence is not surprising when it is considered that the overall nucleotide similarity of the genome is compared, and that this represents a mosaic of slowly to rapidly evolving genes with single genes usually slowly evolving. It can be assumed that evolutionary changes occurring at the DNA level will only be mirrored by changes at the rRNA gene level after millions of years. Calculations for 16S rRNA gene sequences of prokaryotic symbionts, for which the age of the host and thus the approximate time of invasion of the host by the symbiont is known, reveal that 40–50 million years are required before the 16S rRNA primary structure is altered by 1% (approximately 15 nucleotides) (Ochman et al., 1999). During such a period, prokaryotic strains may alter significantly with respect to their physiological and biochemical properties. Experience has shown that the vast majority of intraspecies 16S rRNA gene sequence similarities are 100%. See also: Coevolution: Host–Parasite
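The calibration quoted from Ochman et al. (1999) lends itself to a rough back-of-the-envelope calculation. The Python sketch below is an illustration only: the function names are hypothetical, the gene length of 1500 nucleotides is an assumption implied by the statement that 1% corresponds to approximately 15 nucleotides, and the observed difference is simply treated as having accumulated at 40–50 million years per 1%. Because the 16S rRNA gene evolves at different rates in different groups, the result should be read as an order-of-magnitude estimate at best.

```python
def differing_positions(similarity_percent, gene_length=1500):
    """Approximate number of differing nucleotides for a given
    16S rRNA gene sequence similarity (gene_length is an assumption)."""
    difference = 100.0 - similarity_percent
    return round(gene_length * difference / 100.0)


def divergence_time_range(similarity_percent, myr_per_percent=(40.0, 50.0)):
    """Return a (low, high) time estimate in million years, using the
    40-50 million years per 1% sequence change quoted in the text."""
    if not 0.0 <= similarity_percent <= 100.0:
        raise ValueError("similarity must lie between 0 and 100")
    difference = 100.0 - similarity_percent
    low, high = myr_per_percent
    return difference * low, difference * high


for similarity in (99.0, 97.0):
    low, high = divergence_time_range(similarity)
    print(f"{similarity}% similarity: about {differing_positions(similarity)} "
          f"differing positions, roughly {low:.0f}-{high:.0f} million years")
```

Under these assumptions, a 1% difference corresponds to 15 positions and 40–50 million years, which simply restates the figures given above; larger differences scale linearly.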

Another phenomenon that prohibits the sole use of 16S rRNA gene sequence similarity to define species is the finding that this gene evolves at a different rate in different groups of prokaryotes. The time that elapsed for changing the 16S rRNA gene primary structure of two species of genus A by 2% may be significantly different from the time that is needed for a 2% change in the homologous molecule of two species of genus B. It is therefore expected that these two genera also differ in the extent of their overall taxonomic properties. See also: Molecular Evolution: Patterns and Rates

At the beginning of the 1990s, as the avalanche of 16S rRNA gene sequences was being set off, it could be demonstrated on the basis of a limited dataset that DNA reassociation values were lower than 70% at and below a threshold value of about 97.0% gene sequence similarity. To reduce the workload involved in DNA–DNA reassociation experiments, it was suggested to perform reassociation experiments only for those strains which shared gene sequence similarities higher than about 97.0% (Stackebrandt and Goebel, 1994). Having been cited more than 2000 times since 1994 (http://www.scopus.com/scopus/home.url), the demarcation value indeed turned out to be a guideline for scientists and referees. A survey of correlation data in 2005 led to the recommendation to use a 16S rRNA gene sequence similarity threshold value of 98.5%, at which DNA–DNA reassociation experiments should be mandatory for testing the genomic uniqueness of (a) novel isolate(s) (Stackebrandt and Ebers, 2006).
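The screening logic described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published procedure: the function names are hypothetical, the sequences are assumed to be already aligned to equal length, and the decision to skip gapped columns is one possible convention. The 98.5% default follows Stackebrandt and Ebers (2006), the alternative of about 97.0% follows Stackebrandt and Goebel (1994), and the 70% DNA:DNA reassociation guideline follows Wayne et al. (1987).

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions between two pre-aligned
    sequences; columns with a gap ('-') in either sequence are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def ddh_required(similarity_16s, threshold=98.5):
    """True if DNA-DNA reassociation should be carried out, i.e. the
    16S rRNA gene similarity reaches the chosen threshold."""
    return similarity_16s >= threshold


def same_species(ddh_percent, phenotypically_similar, ddh_cutoff=70.0):
    """Apply the 70% reassociation guideline together with the
    requirement that the strains are phenotypically similar."""
    return ddh_percent >= ddh_cutoff and phenotypically_similar


# Toy 10-nucleotide alignment; real 16S genes are far longer.
similarity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(similarity, ddh_required(similarity))
# Strains above 70% reassociation that remain phenotypically distinct.
print(same_species(74.0, phenotypically_similar=False))
```

The last call mirrors the situation of Neisseria gonorrhoeae and Neisseria meningitidis: a reassociation value above 70% alone does not merge two taxa when they can be distinguished phenotypically.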

One cannot expect a single gene, the 16S rRNA gene, to solve all problems in prokaryotic systematics and taxonomy. Nevertheless, this molecule has had (and still has) a greater impact on our understanding of the history of living forms on the planet Earth than any other molecule analysed. rRNA and the rRNA-encoding genes have laid the path for molecular evolution, molecular phylogeny, molecular classification, molecular identification and diagnosis and for molecular microbial ecology. Recent insights into the phylogenetic structure of highly related strains of a species by multi-locus sequence analysis have revealed a subspecies-level organization that traditional taxonomy fails to see. See also: Molecular Ecology; Molecular Evolution: Techniques

References

Achtman M and Wagner M (2008) Microbial diversity and the genetic nature of microbial species. Nature Reviews Microbiology 6: 431–440.

Adekambi T, Shinnick TM, Raoult DL and Drancourt M (2008) Complete rpoB gene sequencing as a suitable supplement to DNA–DNA hybridization for bacterial species and genus delineation. International Journal of Systematic and Evolutionary Microbiology 58: 1807–1814.

Barns SM, Delwiche CF, Palmer JD and Pace NR (1996) Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proceedings of the National Academy of Sciences of the USA 93: 9188–9193.

Cohan FM (2002) What are bacterial species? Annual Review of Microbiology 56: 457–487.

Danson MJ, Hough DW and Lunt GG (1992) The Archaebacteria: Biochemistry and Biotechnology. Biochemical Society Symposium 58. London: Portland Press.

Garrity GM, Brenner DJ, Krieg NR and Staley JT (2005) Bergey’s Manual of Systematic Bacteriology, vol. 2. The Proteobacteria. New York: Springer.

Gevers D, Cohan FM, Lawrence JG et al. (2005) Opinion: re-evaluating prokaryotic species. Nature Reviews Microbiology 3: 733–739.

Gogarten JP, Doolittle WF and Lawrence JG (2002) Prokaryotic evolution in light of gene transfer. Molecular Biology and Evolution 19: 2226–2238.

Ludwig W, Neumaier J, Klugbauer N et al. (1993) Phylogenetic relationships of bacteria based on comparative sequence analysis of elongation factor Tu and ATP-synthase β-subunit genes. Antonie van Leeuwenhoek 64: 285–305.

Maidak BL, Larsen N, McCaughey MJ et al. (1994) The Ribosomal Database Project. Nucleic Acids Research 22: 3485–3487.

McInerney JO and Wilkinson M (2005) New methods ring changes for the tree of life. Trends in Ecology and Evolution 20: 105–107.

Ochman H, Elwyn S and Moran NA (1999) Calibrating bacterial evolution. Proceedings of the National Academy of Sciences of the USA 96: 12638–12643.

Olsen GJ, Woese CR and Overbeek R (1994) The winds of (evolutionary) change: breathing new life into microbiology. Journal of Bacteriology 176: 1–6.

Richert K, Brambilla E and Stackebrandt E (2007) The phylogenetic significance of peptidoglycan types: Molecular analysis of the genera Microbacterium and Aureobacterium based upon sequence comparison of gyrB, rpoB, recA and ppk and 16S rRNA genes. Systematic and Applied Microbiology 30: 102–108.


Rosello-Mora R (2005) DNA–DNA reassociation methods applied to microbial taxonomy and their critical evaluation. In: Stackebrandt E (ed.) Molecular Identification, Systematics, and Population Structure of Prokaryotes, pp. 23–50. Heidelberg: Springer.

Rosello-Mora R and Amann R (2001) The species concept for prokaryotes. FEMS Microbiology Reviews 25: 39–67.

Stackebrandt E (2006) Defining taxonomic ranks. In: Dworkin M, Falkow S, Rosenberg E, Schleifer KH and Stackebrandt E (eds) The Prokaryotes, 2nd edn, vol. 1. New York: Springer.

Stackebrandt E and Ebers J (2006) Taxonomic parameters revisited: tarnished gold standards. Microbiology Today 4: 152–155.

Stackebrandt E and Goebel BM (1994) A place for DNA–DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44: 846–849.

Stackebrandt E, Frederiksen W, Garrity GM et al. (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. International Journal of Systematic and Evolutionary Microbiology 52: 1043–1047.

Stackebrandt E and Rainey FA (1995) Partial and complete 16S rDNA sequences, their use in generation of 16S rDNA phylogenetic trees and their implications in molecular ecological studies. In: Akkermans ADL, van Elsas JD and de Bruijn FJ (eds) Molecular Microbial Ecology Manual. Amsterdam: Kluwer Academic Publishers.

Vandamme P, Pot B, Gillis M et al. (1996) Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiological Reviews 60: 407–438.

Wayne L, Brenner DJ, Colwell RR et al. (1987) International Committee on Systematic Bacteriology: report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37: 463–464.

Woese CR (1987) Bacterial evolution. Microbiological Reviews 51: 221–271.

Woese CR and Fox GE (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences of the USA 74: 5088–5090.

Woese CR, Kandler O and Wheelis ML (1990) Towards a natural system of organisms. Proposal for the domains archaea, bacteria, and eucarya. Proceedings of the National Academy of Sciences of the USA 87: 4576–4579.

Zuckerkandl E and Pauling L (1965) Molecules as documents of evolutionary history. Journal of Theoretical Biology 8: 357–366.

Further Reading

Atlas RM (1997) Microbial systematics, evolution, phylogeny and classification. In: Atlas RM (ed.) Principles of Microbiology, 2nd edn, pp. 888–951. Chicago: Wm.C. Brown Publishers.

Doolittle WF (1999) Phylogenetic classification and the universal tree. Science 284: 2124–2129.

Dworkin M, Falkow S, Rosenberg E, Schleifer K-H and Stackebrandt E (eds) (2006) The Prokaryotes. A Handbook on the Biology of Bacteria: Ecophysiology, Isolation, Identification, Applications, 3rd edn. New York: Springer.

Golding GB and Gupta RS (1995) Protein-based phylogenies support a chimeric origin for the eukaryotic genome. Molecular Biology and Evolution 12: 1–6.

Goodfellow M and O’Donnell AG (1993) Handbook of New Bacterial Systematics. London: Academic Press.

Konstantinidis KT and Tiedje JM (2005) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187: 6258–6264.

Lake JA and Rivera MC (2004) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Molecular Biology and Evolution 21: 681–690.

Li W-H and Graur D (1991) Fundamentals of Molecular Evolution. Sunderland: Sinauer Associates.

Ludwig W and Klenk H-P (2001) Overview: a phylogenetic backbone and taxonomic framework for prokaryotic systematics. In: Boone DR, Castenholz RW and Garrity GM (eds) Bergey’s Manual of Systematic Bacteriology, 2nd edn, vol. 1, pp. 49–65. New York: Springer.

Pace NR (1997) A molecular view of microbial diversity and the biosphere. Science 276: 734–740.

Priest FG, Ramos-Cormenzana A and Tindall BJ (eds) (1995) Bacterial Diversity and Systematics. New York: Plenum Press.

Stackebrandt E (1992) Unifying phylogeny and phenotypic properties. In: Balows A, Truper HG, Dworkin M, Harder W and Schleifer KH (eds) The Prokaryotes, A Handbook on the Biology of Bacteria: Ecophysiology, Isolation, Identification, Applications, 2nd edn. New York: Springer.

Woese CR, Gutell R, Gupta R and Noller H (1983) Detailed analysis of the higher order structure of 16S-like ribosomal ribonucleic acids. Microbiological Reviews 47: 621–669.

Yarza P, Richter M, Peplies J et al. (2008) The All-Species Living Tree project: A 16S rRNA-based phylogenetic tree of all sequenced type strains. Systematic and Applied Microbiology 31: 241–250.

Zeigler DR (2003) Gene sequences useful for predicting relatedness of whole genomes in bacteria. International Journal of Systematic and Evolutionary Microbiology 53: 1893–1900.
