Phylodynamics: The use of phylogenetics in epidemiology
Conor Meehan, Unit of Mycobacteriology, Biomedical Sciences

Post on 28-Jun-2018



Bacteria/Virus diversity
Image sources: Nasir and Caetano-Anollés, Science Advances 2015; https://en.wikipedia.org/wiki/Three-domain_system
(no good 2-domain image, see Tom Williams' work)

Beginnings of epidemiology

John Snow
Father of infectious diseases epidemiology
  Demographic information
  Scientific methods
Cholera outbreak in London 1854

Cholera in London
Dot map and Voronoi diagram
  Centered on pump
  Near cesspit
Showed connection between water companies and fatalities
  Didn't lead to acceptance
Germ theory 1864

Molecular epidemiology

Terminology
Molecular epidemiology
  The interface between molecular biology and epidemiology
  Contribution of genetic and environmental factors to pathogen spread
Phylodynamics
  The interface between evolutionary biology and molecular epidemiology
  Estimating pathogen evolutionary/epidemiological parameters from phylogenies
    Mutation rates
    Transmission rates and chains (R0)
    Population dynamics

Genome-based phylogenetics/phylodynamics
Phylogeny programs built with assumption of single gene input
  All data present
Whole genome data can be input in 2 ways:
  SNP alignment
    Have an ascertainment bias
    Selected only the variable sites
    Break the calculations
  Whole genome alignments
    Large amounts of data
    If combined with large number of taxa can be computationally too expensive

The power of code
Large datasets
  Require lots of steps and computing power
  Manual processing of thousands of genomes is not feasible
UNIX pipelines
  Looping on multiple sets (folders/files)
  E.g. Assemble all genomes in the same way
Coding language
  Python/Perl/C++/others
  Processing pipelines
  E.g. Create whole genome alignments and inputs for tree building
Cloud/Server computing
  Server usage requires UNIX knowledge
  Amazon, CyVerse, university/national
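The "looping on multiple sets" idea can be sketched in Python. This is a minimal illustration, not the pipeline used in the talk: the file-naming pattern (`*_R1.fastq.gz` / `*_R2.fastq.gz`) and the tool name `assemble_genome` are hypothetical placeholders, and the function only builds the commands rather than running them.

```python
from pathlib import Path


def build_commands(read_dir, out_dir, tool="assemble_genome"):
    """Return one command (as an argument list) per paired-end sample.

    The tool name and the _R1/_R2 naming scheme are assumptions made
    for illustration; substitute whatever your own pipeline uses.
    """
    suffix = "_R1.fastq.gz"
    commands = []
    for r1 in sorted(Path(read_dir).glob("*" + suffix)):
        sample = r1.name[: -len(suffix)]
        r2 = r1.with_name(sample + "_R2.fastq.gz")
        if not r2.exists():
            continue  # skip samples whose mate file is missing
        # Every sample gets exactly the same command shape,
        # so all genomes are processed in the same way.
        commands.append([tool, str(r1), str(r2), str(Path(out_dir) / sample)])
    return commands
```

Each returned list could then be handed to `subprocess.run` on a server, which keeps the processing identical across thousands of genomes.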

Whole genome conversion / DNA reconstitution method

Whole genome alignment is split into variable sites and constant sites
  Variable sites: SNP alignment
  Constant sites: counts of A, C, G, T
Phylogenetic method: single site calculation x count of base = complete calculations
Leaché et al. Syst Biol 2015; Stamatakis ascertainment bias correction in RAxML
(assumes no recombination)
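The split described above can be sketched as follows. This is an illustrative helper, not code from RAxML or from the cited papers: it takes an in-memory alignment, keeps the variable columns as a SNP alignment, and tallies how many constant columns are A, C, G or T. Treating any column with more than one distinct character (including gaps or N) as variable is a simplifying assumption.

```python
def split_alignment(seqs):
    """Split an alignment into a SNP alignment and constant-site counts.

    seqs: dict mapping sample name -> aligned sequence (all equal length).
    Returns (snp_alignment, constant_counts).
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all sequences must have the same aligned length")

    constant_counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    variable_columns = []
    for i in range(length):
        column = {seqs[name][i].upper() for name in names}
        if len(column) == 1:
            base = column.pop()
            if base in constant_counts:  # ignore all-gap or all-N columns
                constant_counts[base] += 1
        else:
            variable_columns.append(i)

    snp_alignment = {
        name: "".join(seqs[name][i] for i in variable_columns) for name in names
    }
    return snp_alignment, constant_counts
```

The per-base counts are the information the SNP alignment alone has lost, which is why an ascertainment bias correction needs them alongside the variable sites.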

Whole genome phylodynamics: The use of phylogenetics in epidemiology

Source tracking in MRSA

MRSA
Methicillin-resistant Staphylococcus aureus
  Any SA resistant to β-lactam antibiotics
Primarily hospital related infections (HA-MRSA)
  Now also found in the community (CA-MRSA)
  Also in livestock and domestic animals (LA-MRSA)
Primarily wound infection
  Surgical and non-surgical
  Skin to skin/infected object transmission

Setting
Baby unit in UK hospital
Infection control unit screens all babies for S. aureus carriage using a nasopharyngeal swab
'Outbreak' was detected with 3 patients colonized with bacteria with the same antibiotic resistance profile
Reviewed microbiological records for other patients with S. aureus with same resistance profile: 8 more patients matched

Outbreak tracking

Extension to contacts

Extension to healthcare workers

Joining of classical and molecular epidemiology
WGS contributed to:
  Identifying the extent of the outbreak within hospital and community
  Identifying the hospital worker who likely re-introduced MRSA after deep cleaning of the ward
Followed by treatment: end of outbreak?

Determining transmission routes in HIV

Where did HIV come from?
Lentiviruses are known to infect several species of primates in sub-Saharan Africa
Tree constructed containing sequences from SIV and both HIV-1 and HIV-2
HIV-1 likely arose in western equatorial Africa
HIV-2 likely arose in West Africa
  Primarily confined there too
Wertheim and Worobey, PLOS Comp Bio 2009

When did HIV get to humans?
Early estimations of HIV-1 divergence from SIVcpz dated this event as ~1960 (Li et al. 1988, Mol. Biol. Evol.)
Reanalysis found that this estimation used too simple a model of nucleotide sequence evolution
  HIV analysis usually estimated under the GTR model
  Dates spread throughout early 20th century
Wertheim and Worobey, PLOS Comp Bio 2009

How did HIV get to humans?
An early hypothesis, outlined in 'The River' by Edward Hooper (1999), suggested a contaminated oral polio vaccine (OPV) used in the 1950s
The other leading hypothesis is that blood-to-blood transmission occurred from butchered primate meat to hunters
Molecular evidence was gathered by Sharp et al. (2001, Phil. Trans. R. Soc. Lond. B) to review these claims
  OPV trial chimpanzees were not the same as those suggested to be the reservoir for SIV
  The origins were dated as ~1931, not the 1950s
Although the bushmeat hypothesis cannot be directly proven, phylogenetic and molecular analyses argue strongly against the OPV hypothesis

How does HIV transmit from human to human? (Quiz time)

HIV transmission routes
Sexual contact
  Anal (1.43%; (0.62%/0.11%))
  Vaginal (0.08%; 0.04%)
  Oral (extremely low but not zero)
Blood-borne
  Unsterilized pre-used needles (0.15-10%; context dependent)
  Blood transfusions (90%)
Mother to child
  15-30% from pregnancy/delivery
  5-20% from breastfeeding
Needs to contact the blood, can't pass through epithelial cells
Risk factors can increase/decrease
  Viral load, other STIs, tearing, anti-retroviral treatment

Phylogenetics and criminal prosecution of HIV transmission
Intentional or negligent transmission of HIV can result in charges of assault, manslaughter or murder in several countries
Two things often must be proven for this:
  The defendant was reckless
  The defendant infected the complainant
In the UK it was required that scientific evidence must be used to prove infection, even if a plea of 'guilty' was entered
  Phylogenetics is often used in this step
Phylogenetics is often required to prove recklessness too
  Time of infection must be after the defendant became aware of their status and before the complainant became aware of the defendant's status

Phylogenetics and criminal prosecution of HIV transmission
First used in 1990 in a case of a dentist infecting several patients, though this case never went to court
First used in a criminal rape case in Sweden in 1992, though directionality was not determined
In 2002 phylogenetic analysis was used to uphold a conviction during an appeal by a gastroenterologist in the 2nd degree murder charge of his girlfriend, after the analysis had been found to meet standards of evidence admissibility

Phylogenetics and criminal prosecution of HIV transmission
Lemey et al. "Molecular testing of multiple HIV-1 transmissions in a criminal case", AIDS 19(15), 2005
One suspect and six victims
2 samples from each person, anonymously labelled and sequenced for pol and env fragments
30 controls taken from local hospital, fitting as closely as possible the age, risk and geographical parameters of the suspect/victims and from around the same time as the alleged transmission
Phylogenetic trees built under ML using 3 methods and also using Bayesian inference
  Sites known to confer drug resistance were excluded to prevent clustering based on drug regimes

Evidence of HIV transmission
Demonstrated grouping of suspect and victim samples, monophyletic to the exclusion of controls
No inference was made about directionality (usually indicated by paraphyletic relationship of source sequences around recipient sequences in a time tree)
Cannot rule out case of both suspect and victim infected by a 3rd person or suspect infecting a person who infected victims
Local control selection is critical

Estimating epidemic origins in Influenza

Influenza
Seasonal infection
  Fever, muscle pains, headache, coughing, nasal discharge
  250-500k deaths a year
Caused by Influenza virus
  Three types (A-C)
  A causes all pandemics
Serotypes based on hemagglutinin (H/HA) and neuraminidase (N/NA)
  E.g. Influenza A H1N1 ("Spanish flu" or "Swine flu")

Tracking an influenza outbreak
The H1N1 (swine flu) influenza strain was first identified in April 2009
Within a few months it reached pandemic proportions
Phylogeographic analysis
  Where did the outbreak start?
  How and when did it spread?

Tracking an influenza outbreak
Lemey et al. (2009), "Reconstructing the initial global spread of a human influenza pandemic", PLOS Currents
242 sequences
  HA and NA gene sequences
  40 locations worldwide
  30th March to 12th July 2009
Bayesian framework
  HKY+gamma model
  Relaxed molecular clock
  BSSVS model of spatial diffusion
    Bayesian stochastic search variable selection
    7 discrete locations as priors
    Allow MCMC to assign location probabilities to internal nodes

Origin and spread of H1N1
Phylogenetic reconstruction indicates Mexico as the likely origin of the virus
Several USA strains were seeded early in the outbreak
Most European lineages came from USA strains, not Mexico
Lemey et al. PLoS Curr. 2009

The emergence of Mycobacterium ulcerans in Africa

Mycobacterium genus
The genus Mycobacterium includes many important human pathogens
  M. tuberculosis (TB)
  M. ulcerans (Buruli ulcer)
  M. leprae (Leprosy)
All other mycobacteria are termed nontuberculous mycobacteria (NTMs)
  Primarily environmental
  Many emerging opportunistic pathogens

Mycobacterium ulcerans
Causative agent of Buruli Ulcer (BU)
~6000 cases a year (declining)
Causes skin ulceration and sometimes bone involvement
M. ulcerans produces a toxin, mycolactone, which damages tissue
WARNING: photos!

Mycobacterium ulcerans geographic spread

Mycobacterium ulcerans transmission
Currently unknown
Not directly human to human
Proximity to slow flowing/stagnant water
Prevailing hypothesis:
  Environmental species that infects after microtrauma
Do humans play a role in the spread of MU?

Aims and dataset
What is the population structure and evolutionary history of MU in Africa?
Vandelannoote et al. GBE. 2017
165 isolates
  1964-2012
  Most endemic African countries
  Papua New Guinea outgroup
Illumina reads assembled with Snippy pipeline
SNP alignment
  Recombination free
  9,193 SNPs

Maximum likelihood reconstruction
RAxML v8.2
  GTRCAT with DNA reconstitution ascertainment bias correction (Stamatakis method)
Root to Tip distance calculations and correlation with TreeStat and R
MRCA: 12226
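The root-to-tip correlation mentioned above is, at heart, a linear regression of each tip's distance from the root against its sampling date. The sketch below is a plain-Python illustration of that idea, not the TreeStat/R workflow from the study; it assumes the distances have already been extracted from a rooted tree. The slope estimates the substitution rate and the x-intercept gives a rough date for the root.

```python
from math import sqrt


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r, root_date), where root_date is the
    x-intercept, i.e. the date at which the fitted distance is zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))

    slope = sxy / sxx                      # substitutions per site per year
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)              # strength of the temporal signal
    root_date = -intercept / slope
    return slope, intercept, r, root_date
```

A weak or negative correlation would suggest too little temporal signal for dating, which is one reason to check it before moving on to a Bayesian clock analysis.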

Bayesian reconstruction
BEAST2
Tested clock and population model combinations (Path sampling)
  Uncorrelated lognormal and constant coalescent found to be best
Tested for time signal with permutation tests and prior adjustments
Mutation rate:
  6.32E-8 /site/year [3.90E-8 - 8.84E-8]
  0.33 SNPs/chromosome/year [0.20 - 0.46]
Introductions coincide with colonisation
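The two rates on this slide are the same quantity in different units: the per-chromosome rate is the per-site rate multiplied by the number of sites. The short sketch below makes that conversion explicit. The number of sites is not given on the slide, so the value it prints is back-calculated from the two reported rates and should be read as an implication of the arithmetic, not as a figure from the study.

```python
def per_genome_rate(rate_per_site, n_sites):
    """Convert substitutions/site/year into SNPs/chromosome/year."""
    return rate_per_site * n_sites


def implied_sites(rate_per_site, rate_per_genome):
    """Back-calculate the number of sites linking the two reported rates."""
    return rate_per_genome / rate_per_site


if __name__ == "__main__":
    sites = implied_sites(6.32e-8, 0.33)
    print(f"implied number of sites: {sites:.3g}")
    print(f"years per SNP per chromosome: {1 / 0.33:.2f}")
```

At roughly one SNP per chromosome every three years, two isolates sampled a few years apart may be indistinguishable, which is what "slow evolving" means in practice for outbreak resolution.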

The spread of Mycobacterium ulcerans in Africa
Slow evolving bacterium
  One of the slowest rates recorded
  Clonal expansion with no recombination
Multiple introductions
  Each major lineage introduced separately
  Likely began in South-East Asia (still to be confirmed)
  Exact place of first introduction into Africa not known (perhaps central)
Potentially spread by humans
  Infected Africans moved to new area during 'Scramble for Africa'
  Shed into water which then infects new hosts
Will treatment of humans reduce the environmental population too?
  Population modelling suggests yes

Estimating transmission dynamics of Ebola

Estimating the rate of infection of Ebola
The 2013 West African Ebola virus epidemic spread primarily through Guinea, Sierra Leone and Liberia and killed over 11,000 people
Estimated that strain began at a funeral in Guinea in December 2013
Phylogenetic analysis shows MRCA of the outbreak to be late February 2014 with 2 strains introduced to Sierra Leone
Stephen K. Gire et al. Science 2014;345:1369-1372

Estimating the rate of infection of Ebola
Multiple birth-death model approaches were used on the Sierra Leone sequences to estimate epidemiological parameters across a Bayesian phylogeny of the sequences
Here, birth is the rate of transmission from an infectious person and death is the rate of becoming non-infectious through recovery or death
Stadler T et al. PLOS Currents Outbreaks. 2014

Estimating the rate of infection of Ebola
R0: ~2.18 (range 1.24 - 3.55)
Incubation time: ~4.92 days
Infectious period: ~2.58 days
Thus, on average 2 people will be infected by every infected individual
This is low when compared to some other common pathogens. E.g.:
  Influenza: 2-3
  HIV: 2-5
  Measles: 12-18
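With birth and death defined as above, the link between the rates and the reported numbers can be sketched in a few lines. This assumes the simplest constant-rate birth-death model, in which R0 is the birth rate divided by the death rate and the mean infectious period is the reciprocal of the death rate; the study itself compared several birth-death variants, so treat this as an illustration of the relationship rather than a reconstruction of its analysis.

```python
def basic_reproduction_number(birth_rate, death_rate):
    """R0 = transmission (birth) rate / becoming-non-infectious (death) rate."""
    return birth_rate / death_rate


def infectious_period(death_rate):
    """Mean time spent infectious, in the same time unit as the rate."""
    return 1.0 / death_rate


if __name__ == "__main__":
    # Back-calculated from the slide's values (per-day rates).
    death = 1.0 / 2.58          # infectious period of ~2.58 days
    birth = 2.18 * death        # chosen so that R0 is ~2.18
    print(f"birth rate ~ {birth:.3f} per day")
    print(f"R0 ~ {basic_reproduction_number(birth, death):.2f}")
```

An epidemic grows only while the birth rate exceeds the death rate, that is, while R0 is above 1.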

Conor Meehan

Bacterial evolution and classification
DID YOU JUST ASSUME MY HISTORY?

Lateral gene transfer and phylogenetics

Lateral gene transfer (LGT)
Also called horizontal gene transfer (HGT)
First observed between pneumococci in mice
3 main ways:
  Transformation
    Uptake of naked DNA
    Often limited to specific environmental cues
    Estimated ~1% of known species
  Conjugation
    Involves the transfer of plasmids
    Many plasmids are highly promiscuous
  Transduction
    Involves an intermediate phage
Rampant evidence in nearly all prokaryotic genomes

[Embedded page: Nature Reviews Microbiology, September 2005, Volume 3, p. 712. © Nature Publishing Group. www.nature.com/reviews/micro]

Figure 1 | The process of horizontal gene transfer. A schematic outlining the stages through which DNA must go on its journey from donor to recipient bacteria. The process begins with DNA in a potential donor cell becoming available and ends when this DNA becomes a functional part of a recipient cell's genome.
  1 Entry into the transfer process (donor): release of naked DNA; packaging into phage particle; presence of pac sites; interaction with mating-pair formation apparatus; integration of plasmid into chromosome
  2 Selection of recipient: uptake sequences in DNA; binding of naked DNA; surface exclusion; phage receptor specificity; pilus specificity
  3 Uptake + successful entry (recipient): restriction; antirestriction systems; selection against restriction sites
  4 Establishment: replication; integration; homologous recombination; illegitimate recombination

COMPETENCE: The ability of bacteria to take up extracellular DNA.

For natural transformation to occur, bacterial cells must first develop a regulated physiological state of COMPETENCE, which has been found to involve approximately 20 to 50 proteins. With the exception of Neisseria gonorrhoeae, most naturally transformable bacteria develop time-limited competence in response to specific environmental conditions such as altered growth conditions, nutrient access, cell density (by quorum sensing) or starvation. The proportion of bacteria that develop competence in a bacterial population might range from near zero to almost 100%. As the growth environments and factors that regulate competence development vary between bacterial species and strains [6], there is no universal approach to determine if a given bacterial isolate can develop competence as a part of its life cycle. To the extent investigated, the proportion of bacteria found to be naturally transformable is approximately 1% of the validly described bacterial species [7]. The ability to take up naked DNA by natural transformation has been detected in archaea and divergent subdivisions (phyla) of bacteria, including representatives of the Gram-positive bacteria, cyanobacteria, Thermus spp., Deinococcus spp., green sulphur bacteria and many other Gram-negative bacteria [8,9]. Many human pathogenic bacteria, including representatives of the genera Campylobacter, Haemophilus, Helicobacter, Neisseria, Pseudomonas, Staphylococcus and Streptococcus, are naturally transformable [9]. The conserved ability to acquire DNA molecules by natural transformation among a broad range of bacteria indicates that the genetic trait is functionally important in the environment, enabling access to DNA as a source of nutrients or genetic information. Prerequisites for natural transformation include the release and persistence of extracellular DNA, the presence of competent bacterial cells and the ability of translocated chromosomal DNA to be stabilized by integration into the bacterial genome or the ability of translocated plasmid DNA to integrate or recircularize into self-replicating plasmids (FIG. 2).

Release of extracellular DNA in the environment. Natural transformation relies on bacterial exposure to extracellular DNA molecules in the environment. DNA continually enters the environment upon release from decomposing cells, disrupted cells or viral particles, or through excretion from living cells. The release of intact DNA from decomposing cells depends on the activity and location of nucleases and reactive chemicals. Active excretion of DNA has been reported for many genera of bacteria, including Acinetobacter, Alcaligenes, Azotobacter, Bacillus, Flavobacterium, Micrococcus, Pseudomonas and Streptococcus [8–10]. For instance, extracellular DNA has been found at concentrations of up to 1–3 µg per ml in liquid cultures of an Acinetobacter sp. and Bacillus subtilis [11] and up to 780 µg per ml in cultures of the environmental isolate Pseudomonas aeruginosa KYU-1 (REF. 12). Recently, extracellular DNA has been identified as an important component in biofilm formation [13]. Nevertheless, the extent of, and role of, active release of DNA by bacteria in natural, nutrient-limited habitats remains to be fully understood.

Passive release of DNA from dead bacteria occurs after self-induced lysis, a process that results in broken cell walls and membranes and the subsequent exposure to, and release of, cytoplasmic contents, including DNA, in the environment [14]. Pathogenic microorganisms can also undergo lysis caused either by the host immune system or the antibiotic treatment of infections. From studies of 14C-labelled Escherichia coli, it has been estimated that between 95% and 100% of the bacterial DNA is released after contact with the immune system [15]. Most of this DNA is probably degraded by DNases present in human serum and plasma. In one study, the mean DNase activity of 50 patients destroyed 90% of the added DNA of Haemophilus influenzae within a few minutes [16]. A different study, however, reported longer persistence times for both chromosomal and plasmid DNA in serum [17]: large plasmids and chromosomal DNA were substantially degraded after a 4-hour exposure of a serum-sensitive E. coli strain, but smaller plasmids (pBR322 and

LGT consequences
Xenologs
  New function
  Orthologous replacement
Phylogenetics
  Breaks everything
  Species trees from gene trees
    Ensure it's not a xenolog first
What is the true history of an organism?
  Several sub-histories?
What is the unit of reproduction?
  Philosophical, yet important for mapping evolution

Detecting LGT: homology approach
Detection of LGT is a very difficult problem
  These are some suggestions, each with their flaws
Homology-based
  Bi-directional best hits (BBH)
  HGTector (Zhu et al. BMC Genomics 2014)
Issues
  Database coverage
  Distant LGT difficult to find
  Phylogenetically unaware
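The bi-directional best hit idea can be sketched as follows. This is a generic illustration, not HGTector's algorithm: it assumes similarity search results are already available as (query, subject, score) tuples for each direction, and the gene names in the usage note are invented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

For example, if gene a1 best matches b1 and b1 best matches a1, the pair is kept; if a2 best matches b2 but b2 prefers a1, the a2/b2 pair is dropped. The listed issues still apply: the result is only as good as the database searched and carries no phylogenetic information.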

Detecting LGT: phylogenetic congruence approach
Leigh et al. GBE 2011 (DOI: 10.1093/gbe/evr050)
Is the species tree a reasonable tree given the gene alignment?
Collect all reasonable trees given the gene alignment
  Bootstrap replicates/Bayesian posterior distribution often used
  Should be all trees
Is the species tree within this reasonable set?
  AU test
Issues:
  Limited or ancient (e.g. prespeciation) LGT in a mostly vertical history will not be detected
  Sampling issues
  What is the species tree?
    Can you ever be sure a marker has not been transferred?
    Often circular reasoning
Network thinking

What is a microbial species?

Do we need microbial species?
Perhaps not
Useful to clinicians
Useful for counting organisms in an environment or relating abundances to changes
Useful for discussing projects etc.
"I look at the term species as one arbitrarily given for the sake of convenience to a set of individuals resembling each other" (Darwin, 1859)

Species as clusters
Species concepts are generally based on the notion that organisms comprise distinct clusters in nature
This means that there is not a continuum of genotypes and/or phenotypes
However, clusters can form under random birth/death models
  A species should presumably be a cluster that is formed by some process, not just random drift
Any gaps between clusters should not be due to sampling bias or error
  Probably the biggest problem for proving clustering

Defining a species
The Biological Species Concept is the most often used definition of a species (or at least most generally known)
  States that a species is a group where members can produce fertile offspring through mating
  Works for (most) animals and plants
  Excludes all asexual organisms
Cohan's ecotype model
  States that an asexual clonal species can form by mutations that allow it to outcompete others and thus selective sweeps occur
  LGT is allowed in the model to initiate a selective sweep but not to shape long-term cohesiveness
Recombination has been shown to contribute more to diversification than point mutations in some bacteria

Defining microbial species
In prokaryotes, species were originally defined by >70% similarity in a standardized DNA–DNA hybridisation experiment
  Makes some bacterial species as diverse as vertebrate orders
Now, a species is often defined by two organisms having at least 97% identical 16S sequences
  Single gene's evolutionary history as basis
  Multiple copies with large differences are possible
Can also use shared orthologous genes using:
  Concatenated trees
  Average Nucleotide Identity (ANI) ≥95%
  Genome-to-Genome Distance (GGD) ≥70%
  Genomic-Signature-Delta-Difference (GS-DD) δ<δ*
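The cut-offs above can be applied side by side to see where criteria disagree. The sketch below is illustrative only: it encodes the 16S, ANI and GGD thresholds as listed on the slide, assumes each measurement is a percentage where higher means more similar, and leaves out GS-DD because its threshold δ* is not a fixed number.

```python
# Thresholds as listed on the slide (percent; higher = more similar).
THRESHOLDS = {"16S": 97.0, "ANI": 95.0, "GGD": 70.0}


def same_species_calls(measurements):
    """Return, per criterion, whether a pair of genomes passes its cut-off.

    measurements: dict mapping criterion name -> percentage value.
    Only criteria present in THRESHOLDS are accepted.
    """
    calls = {}
    for name, value in measurements.items():
        if name not in THRESHOLDS:
            raise KeyError(f"no threshold defined for {name!r}")
        calls[name] = value >= THRESHOLDS[name]
    return calls
```

A pair with 99.1% 16S identity but 93.0% ANI would pass the first criterion and fail the second, which is exactly the kind of conflict between species definitions that the following slides discuss.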

Problem with concatenated approaches
[Figure panels: 16S; Concatenated]
Meehan and Beiko. GBE. 2014

ANI

[Embedded page: Doolittle and Zhaxybayeva, Genome Research, p. 746, www.genome.org]

investigators think that an ANI of 99% would match more closely to phenotypic diversity among species of animals and plants (Konstantinidis and Tiedje 2005), and perhaps even this is not stringent enough (Fig. 2).

Whatever species definition we adopt, there remains the problem of coupling to some underlying species concept(s) that rationalizes its methods and cut-off values. As Gevers et al. (2006) lament, "any effort to produce a robust species definition is hindered by the lack of a solid theoretical basis explaining the effect of biological processes on cohesion within and divergence between species." Possible cohesive forces are addressed in the next sections, but it is worth mentioning here that two recent formulations of prokaryotic species concepts appear to be (deliberately) so general that, like de Querioz's general lineage concept, they finesse the concept–definition coupling. The first is Staley's "genomic-phylogenetic species concept" (Staley 2006), and the second is a "metapopulation lineage" formulation endorsed by Achtman and Wagner (2008). These latter authors, acknowledging a debt to and quoting de Querioz, claim that "unlike other species concepts, metapopulation lineages do not have to be phenotypically distinguishable, or diagnosable, or monophyletic, or reproductively isolated, or ecologically divergent, to be species. They only have to be evolving separately from other lineages. Microbes that form distinct groups owing to a cohesive force are metapopulation lineages and thus form species, whereas microbes without limits imposed by a cohesive force do not."

This way of thinking embodies the spirit of what one hopes to capture with a species concept. But we must again point out that by giving up all methods of detecting or quantifying "cohesive forces," such bare bones species concepts cannot be used to answer any questions we might have about species in general, such as how many there are, what their population sizes are, and whether they are cosmopolitan or endemic.

Clustered diversity and its meaning for species
Basic to any notion of species is that in nature they comprise discrete clusters of organisms, defined genomically and phenomically: that genome/phenome space is not uniformly filled by a seamless spectrum of intergrading types. As Konstantinidis et al. (2006) note, "an important issue that remains unresolved is whether bacteria exhibit a genetic continuum in nature. . ."

It is necessary to recall here that even the simplest random birth and death model of replicating lineages will produce

Figure 2. Comparison of average nucleotide identities (ANI) with gene content. 773 genomes available in NCBI's RefSeq database were initially clustered using 16S rRNA identity of at least 97% as a guide to form groups. A dozen clusters were selected (list of genomes within each cluster is available in Supplemental Table 1). For genomes within each cluster, pairwise ANI was calculated essentially as described in Konstantinidis and Tiedje (2005). Shared genes for each pair of genomes were identified as reciprocal top-scoring BLASTP matches (E-value < 0.001, z = 20,000,000). The proportion of shared genes was calculated as a ratio of the number of shared genes over the average number of genes in two genomes. Each ORF in a genome was assigned to a functional category according to the Clusters of Orthologous Groups (COG) database (August 2005 release), and three selected categories are depicted in this figure: categories J, P, and Q in COG category one-letter designation. Note that genomes of the E. coli/Shigella group have similar ANI values, but dramatically varying gene content. Some groups form tight clusters (e.g., Legionella spp.), while others exhibit a continuum of ANI/shared genes values (e.g., Burkholderia spp.). The clustering also exhibits a large variability in the number of shared genes if genes are considered by functional category.

Species | Suggested action | New name
M. conceptionense, M. senegalense | fusion | M. senegalense
M. chimaera, M. intracellulare, M. yongonense | subspeciation | M. intracellulare subsp. intracellulare, subsp. chimaera, subsp. yongonense
M. engbaekii, M. hiberniae | fusion | M. hiberniae
M. austroafricanum, M. vanbaalenii | fusion | M. austroafricanum
M. marinum, M. pseudoshottsii | fusion | M. marinum

Fuses many species defined in other ways (11/134 Mycobacterium species)
  Perhaps fuses too many
Doolittle and Zhaxybayeva. Genome Res. 2009
Tortoli, Fedrizzi, Meehan et al. Submitted

Microbiomes and species
We cannot ask the simple question 'Who is there?' without defining the who (species?)
Metagenome data has allowed us to somewhat overcome the sampling bias
  Minor populations
  Continuum?
Community microbial ecology raises many questions
  Community or assemblage?
    Boon, Meehan et al. FEMS Microbiol Rev 2014
  Host and microbe effects on each other's evolution?
Can begin to ask what a unit of diversity is
  A gene or cell or community?

Conor Meehan