research article replacing and additive horizontal gene transfer

12
Research article Replacing and Additive Horizontal Gene Transfer in Streptococcus Sang Chul Choi, 1 Matthew D. Rasmussen, 1 Melissa J. Hubisz, 1 Ilan Gronau, 1 Michael J. Stanhope, 2 and Adam Siepel *,1 1 Department of Biological Statistics and Computational Biology, Cornell University 2 Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University *Corresponding author: E-mail: [email protected]. Associate editor: Daniel Falush Abstract The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage (replacing HGT) and events that result in the addition of substantial new genomic material (additive HGT). Here in, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY–SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY–SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seen to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes. Key words: bacterial evolutionary genomics, recombination, Streptococcus pyogenes, Streptococcus dysgalactiae. Introduction During the last 20 years, it has become increasingly apparent that Horizontal Gene Transfer (HGT) has played a major role in genomic evolution, particularly among bacteria and other microbes (Smith et al. 1992; Maynard-Smith et al. 1993; Ochman et al. 2000; Koonin et al. 2001). Indeed, it is likely that HGT has been sufficiently prevalent throughout evolu- tionary history that no present-day gene can trace an unbro- ken history of vertical descent to a common ancestor for all species (Zhaxybayeva and Doolittle 2011), meaning that no single phylogenetic marker can be used to reconstruct a uni- versal tree of life. For HGT to occur in bacteria, DNA must first be trans- ferred from a donor to a recipient cell by transformation, transduction, or conjugation, then must be integrated into the recipient cell’s genome (setting aside the case of replication-proficient plasmids). This typically occurs in one of two major ways: either the new sequence replaces a ho- mologous sequence through the process of homologous re- combination (similar to gene conversionin sexually reproducing organisms) or it is acquired through an additive (nonreplacing) integration process (Thomas and Nielsen 2005). Although the mechanisms underlying these replacingand additiveevents overlap (e.g., additive HGTs can be associated with homologous recombination at flanking regions), replacing and additive HGTs nevertheless leave distinct molecular evolutionary signatures (fig. 1). The rates at which these processes occur depend on many factors, including the co-occurrence of bacterial species in the envi- ronment, the degree of competence of recipient cells, se- quence similarity between the donor and recipient, and barriers to conjugation such as surface exclusion. In addition, the persistence of horizontally transferred DNA segments in present-day species depends strongly on the subsequent ef- fects of natural selection. Not surprisingly, rates of HGT seem appear to vary considerably across groups of bacteria (e.g., Feil et al. 2001). HGT in the Streptococcus genus is of particular interest. This group of Gram-positive bacteria contains several human pathogens, including Streptococcus pyogenes (SPY; the cause of Group A streptococcal infections, including pharyngitis, impetigo, cellulitis, necrotizing fasciitis, and rheumatic fever), S. pneumoniae (bacterial pneumonia), S. agalactiae (neonatal sepsis), S. mutans (dental caries), and S. dysgalactiae ß The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 29(11):3309–3320 doi:10.1093/molbev/mss138 Advance Access publication May 21, 2012 3309 Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418 by guest on 07 February 2018

Upload: nguyencong

Post on 31-Dec-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Research article Replacing and Additive Horizontal Gene Transfer

Research

articleReplacing and Additive Horizontal Gene Transfer inStreptococcus

Sang Chul Choi,1 Matthew D. Rasmussen,1 Melissa J. Hubisz,1 Ilan Gronau,1 Michael J. Stanhope,2 andAdam Siepel*,1

1Department of Biological Statistics and Computational Biology, Cornell University2Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University

*Corresponding author: E-mail: [email protected].

Associate editor: Daniel Falush

Abstract

The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies havedifferentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs fromanother lineage (“replacing HGT”) and events that result in the addition of substantial new genomic material (“additive HGT”).Here in, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of theimportant human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S.dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we findevidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange betweenSPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY–SDE gene flow, favoring theSPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. Wefind much stronger evidence for SPY–SDE gene flow among replacing than among additive transfers, suggesting a primaryinfluence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes arecorrelated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected byadditive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent withprevious findings, whereas replacing HGTs seen to influence a more diverse set of genes. Additive transfers are also found to beassociated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenicbacterial genomes.

Key words: bacterial evolutionary genomics, recombination, Streptococcus pyogenes, Streptococcus dysgalactiae.

IntroductionDuring the last 20 years, it has become increasingly apparentthat Horizontal Gene Transfer (HGT) has played a major rolein genomic evolution, particularly among bacteria and othermicrobes (Smith et al. 1992; Maynard-Smith et al. 1993;Ochman et al. 2000; Koonin et al. 2001). Indeed, it is likelythat HGT has been sufficiently prevalent throughout evolu-tionary history that no present-day gene can trace an unbro-ken history of vertical descent to a common ancestor for allspecies (Zhaxybayeva and Doolittle 2011), meaning that nosingle phylogenetic marker can be used to reconstruct a uni-versal tree of life.

For HGT to occur in bacteria, DNA must first be trans-ferred from a donor to a recipient cell by transformation,transduction, or conjugation, then must be integrated intothe recipient cell’s genome (setting aside the case ofreplication-proficient plasmids). This typically occurs in oneof two major ways: either the new sequence replaces a ho-mologous sequence through the process of homologous re-combination (similar to “gene conversion” in sexuallyreproducing organisms) or it is acquired through an additive(nonreplacing) integration process (Thomas and Nielsen

2005). Although the mechanisms underlying these“replacing” and “additive” events overlap (e.g., additiveHGTs can be associated with homologous recombination atflanking regions), replacing and additive HGTs neverthelessleave distinct molecular evolutionary signatures (fig. 1). Therates at which these processes occur depend on many factors,including the co-occurrence of bacterial species in the envi-ronment, the degree of competence of recipient cells, se-quence similarity between the donor and recipient, andbarriers to conjugation such as surface exclusion. In addition,the persistence of horizontally transferred DNA segments inpresent-day species depends strongly on the subsequent ef-fects of natural selection. Not surprisingly, rates of HGT seemappear to vary considerably across groups of bacteria (e.g.,Feil et al. 2001).

HGT in the Streptococcus genus is of particular interest.This group of Gram-positive bacteria contains several humanpathogens, including Streptococcus pyogenes (SPY; the causeof Group A streptococcal infections, including pharyngitis,impetigo, cellulitis, necrotizing fasciitis, and rheumaticfever), S. pneumoniae (bacterial pneumonia), S. agalactiae(neonatal sepsis), S. mutans (dental caries), and S. dysgalactiae

� The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, pleasee-mail: [email protected]

Mol. Biol. Evol. 29(11):3309–3320 doi:10.1093/molbev/mss138 Advance Access publication May 21, 2012 3309Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 2: Research article Replacing and Additive Horizontal Gene Transfer

ssp. equisimilis (SDE; cellulitis, peritonitis, pneumonia, andother infections), as well as agricultural pathogens, such asS. uberis and S. dysgalactiae ssp. dysgalactiae (SDD; mastitis incows, ewes, and goats), S. equi ssp. equi (strangles in horses),and S. canis (various infections in dogs and other animals).Many of these species are concentrated in the predominantlybeta-hemolytic pyogenic division (Facklam 2002). These spe-cies have adapted to a remarkable variety of distinct ecolog-ical niches and display high rates of recombination and HGT,as well as substantial evidence of positive selection (Feil et al.2001; Marri et al. 2006; Anisimova et al. 2007; Lefebure andStanhope 2007; Suzuki et al. 2011). Complete genome se-quences are now available for several species in the pyogenicdivision (e.g., Ferretti et al. 2001; Holden et al. 2009;Shimomura et al. 2011; Suzuki et al. 2011), providing a richresource for the study of the genomic evolution of pathogenicbacteria.

Several studies have focused in particular on the issue ofHGT between the human-colonizing species SPY and SDE.Until recently, SDE was considered to be primarily a commen-sal organism (Vandamme et al. 1996), unlike SPY, but it hasbecome increasingly apparent that it also has an importantpathogenetic role, with a disease spectrum similar to that ofSPY (Brandt and Spellerberg 2009). Shared virulence genes arewell documented between the two species (e.g., Davies,McMillan, Beiko et al. 2007), genetic exchange presumablyhaving been enabled by both their close evolutionary rela-tionship and shared ecological niches. In recent years, evi-dence has emerged for more widespread SPY–SDE geneflow (Kalia and Bessen 2004; Davies et al; 2005, Davies,McMillan, Beiko et al. 2007; Davies, McMillan, Domselaaret al. 2007; Ahmad et al. 2009; McMillan et al. 2010; Jensenand Kilian 2012). An early study of seven housekeeping genesused for multilocus sequence typing (MLST) (Kalia et al.2001) additionally claimed a pronounced bias in the directionof gene flow, favoring the SPY-to-SDE direction. However, thisstudy was eventually retracted due to the authors’ inability toreplicate their findings and concerns about sample contam-ination (Kalia et al. 2009). Subsequent analyses of the sameloci have not supported SPY-to-SDE gene flow (Ahmad et al.2009; McMillan et al. 2010), and examples have been found ofgene flow in the opposite (SDE-to-SPY) direction (Sachseet al. 2002). In any case, these studies have all been limitedto small numbers of loci, and questions remain open aboutgenome-wide patterns of gene flow between SPY and SDE.These issues are of particular interest because of the possibilitythat increasing virulence of SDE may derive, at least in part,from gene flow from SPY.

Most analyses of HGT have been based on relatively simplephylogenetic methods that make use of incongruity of in-ferred gene trees across loci (e.g., Kalia and Bessen 2004;Ahmad et al. 2009). These methods are sensitive to errorsin phylogenetic reconstruction and ignore information frombranch lengths and phylogenetic correlation at adjacent loci.Recently, alternative, model-based approaches have been pro-posed for statistical inference of replacing HGTs, assumed toderive from homologous recombination (Didelot and Falush2007; Didelot et al. 2010). Using Bayesian principles andMarkov chain Monte Carlo (MCMC) techniques, these meth-ods allow for uncertainty in the phylogeny, make use ofbranch lengths and correlations across loci, and allow for in-ference not only of individual recombination events but alsoof parameters describing global rates and patterns of recom-bination. In addition, several recent methods have been in-troduced to detect HGT in a phylogenetic framework, byparsimoniously reconciling reconstructed gene trees with agiven species tree, allowing for duplication, transfer, and lossevents (Merkle et al. 2010; David and Alm 2011; Doyon et al.2011; Doyon et al. 2012). These new model- and parsimony-based methods are complementary in many respects, andcould be particularly useful in combination.

In this article, we take a fresh look at the issue of HGT inthe pyogenic division, making use of newly available completegenome sequences for SPY, SDE, and the related species SDD,as well as new statistical and parsimony-based methods for

A

B

FIG. 1. Replacing and additive HGT. (A) Foreign DNA can be added to arecipient genome by a replacing HGT (left) or an additive HGT (right).(B) These types of transfers produce distinct phylogenetic signatures.In a replacing HGT, a segment of DNA is effectively overwritten by ahomologous segment from another species, which causes a lineage inthe phylogeny to be replaced by a transferred lineage. This type oftransfer can be identified in gene-tree/species-tree reconciliation bythe appearance of coinciding transfer and loss events (left tree). Anadditive HGT, on the other hand, leads to a transfer event that is notpaired with a loss event (right tree). Of course, parallel losses can causean additive HGT to appear similar to a replacing HGT. Therefore, in ouranalysis, we conservatively require that at least one descendant species(here, B) contains genes that both descend from (b1) and do not de-scend from (b2) a transferred gene when inferring an additive HGTevent. It is worth noting that similar biological processes may contributeto both types of events—for example, an additive HGT may be associ-ated with homologous recombination in flanking regions. Our interesthere is in distinguishing between evolutionary processes that do(additive HGT) and do not (replacing HGT) tend to alter the size andgene composition of a genome.

3310

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 3: Research article Replacing and Additive Horizontal Gene Transfer

analysis. The combined use of these methods allows us toexamine both replacing and additive HGTs and compare theirgenome-wide effects. We find evidence for abundant geneflow within SPY, within SDE, and between SPY and SDE andfor greatly reduced exchange between SPY and SDD. In addi-tion, our genome-wide analysis supports a pronounced pref-erence for the SPY-to-SDE direction in SPY–SDE gene flow.We find that this property is much more evident for replacingthan for additive transfers. We also examine correlations ofgene transfer events with functional categories and positiveselection. Our results have been made available as tracks inour recently released Streptococcus genome browser.

Materials and Methods

Genome Sequences and Alignments

Our primary analysis was based on five complete genomesequences, including two representatives of SPY (accessionsNC_004070 [Beres et al. 2002; SPY1] and NC_008024 [Bereset al. 2006; SPY2]), two of SDE (CP002215 [Suzuki et al. 2011;SDE1] and NC_012891 [Shimomura et al. 2011; SDE2]),and one of SDD (CM001076 [Suzuki et al. 2011]). In addition,we used S. equi ssp. equi (SEE) strain 4047 (NC_012471[Holden et al. 2009]) as an outgroup in our parsimony-basedanalysis (supplementary table S1, Supplementary Materialonline). Our primary data set includes only 2 of the 14 publiclyavailable genome sequences for SPY. The reason for this re-striction is that our evolutionary analyses require awell-defined “clonal frame,” or species phylogeny, thatapplies genome wide and, therefore, is not well suited forsimultaneous analysis of multiple genomes from the samespecies (assuming non-negligible levels of intraspecies recom-bination). Instead, we tested the robustness of our results tothe choice of SPY genomes by repeating several of our anal-yses using 10 different pairs of genomes in place of SPY1 andSPY2, making sure to include representatives of all 11 sero-types for which genome sequences are publicly available (seeResults).

For the model-based analysis of replacing transfers, weobtained high-quality alignments of SPY1, SPY2, SDE1,SDE2, and SDD using the pipeline detailed by Didelotand coworkers (Didelot and Falush 2007; Didelot et al.2010). Briefly, we first aligned these five genomes usingprogressiveMauve v2.3.1 (Darling et al. 2004, 2010) with de-fault options (supplementary fig. 1, Supplementary Materialonline), and then used stripSubsetLCBs (from theClonalOrigin package; Didelot et al. 2010) to identify 276blocks of sequence alignments that exceed a length thresholdof 1,500 bp. This conservative threshold was selected to avoidbiases from boundary effects in short alignment blocks basedon a preliminary analysis. Two blocks containing long gapswere excluded from the analysis. The final set of 274 align-ment blocks contained 1,155,016 columns with relatively fewgaps (0.2% of characters) and covered 1,106 of the 1,951 genesin the SPY1 genome. Note that this alignment procedureeffectively focused on the “core genome” by discardingregions that were not conserved among the five bacterial

individuals in question. The same pipeline was applied tothe 10 alternative pairs of SPY genomes with similar results.

For the parsimony-based analysis, we began with 3,408gene families representing all six genomes (including SEE),as described by Suzuki et al (2011). Approximately one-thirdof these families (1,094) are represented by a single gene froma single genome. These families do not contribute to transferevents among the analyzed genomes and were thus removedfrom the analysis. Among the remaining 2,314 gene familyclusters, approximately half (1,066) contain a single gene fromeach of the six genomes, whereas the remaining familiesranged from as few as two genes (415 families) to as manyas 59 (one family) (supplementary fig. S2, SupplementaryMaterial online). We aligned protein sequences correspond-ing to each family using MUSCLE (Edgar 2004) and then ob-tained nucleotide alignments by reverse translation, using thegenomic sequences as a guide. Next we identified likely intra-genic recombinations using the single breakpoint recombina-tion (SBP) method from the HyPhy package (KosakovskyPond et al. 2006). Based on the Akaike InformationCriterion, SBP identified at least one topology-altering recom-bination in 1,094 (47.5%) gene families. These genes weresplit into two putative nonrecombining gene fragments forsubsequent analysis. We later performed a similar analysis inwhich we allowed for as many as eight breakpoints per geneand observed little change in our results (SupplementaryMaterial online). Note that the alignments described herein are representative of the “pan genome,” including regionsthat frequently turn over (“dispensable” regions), in additionto the “core genome” analyzed by the model-based approach(e.g., Tettelin et al. 2005; Lefebure and Stanhope 2007).

Model-Based Analysis of Replacing HGTsClonalFrame and ClonalOriginTo study replacing HGTs, we made use of the ClonalOriginprogram recently developed by Didelot et al. (2010). Given aset of alignment blocks, ClonalOrigin samples from the pos-terior distribution of recombinant graphs using a MCMC al-gorithm. A recombinant graph consists of a pre-estimatedrooted phylogeny with branch lengths (a “clonal frame”),augmented by a set of recombinant edges. Each recombinantedge is defined by the two points along the branches of theclonal frame that correspond to the donor and recipient inthe hypothesized recombination event and by the start andend of the affected genomic interval. The donor can predatethe recipient to allow for a delayed coalescence, but it cannotpostdate it (supplementary fig. S3, Supplementary Materialonline). Another program, called ClonalFrame (Didelot andFalush 2007), can be used to infer the phylogeny that serves asthe basis of the recombinant graph. ClonalFrame makes useof a similar, but somewhat simpler, probabilistic model.

Fitting the ModelFollowing Didelot et al. (2010), we used ClonalFrame (v1.1) toestimate a clonal frame for our five-way alignments, runningthe algorithm for 104 burn-in iterations followed by another104 sampling iterations. A series of seven additional samplingruns with random initialization converged to essentially

3311

Horizontal Gene Transfer in Streptococcus . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 4: Research article Replacing and Additive Horizontal Gene Transfer

identical estimates, indicating stable convergence. We thenused an initial run of ClonalOrigin (subversion r19) to esti-mate global parameters including the mutation rate, recom-bination rate, and average recombinant tract length. Weapplied the method separately to each of the 274 blocks,in all cases using 106 burn-in iterations, 107 sampling itera-tions, and subsampling every 105 iterations. A global estimatefor each parameter was then obtained by taking the block--length-weighted median of the block-specific posteriormeans, which down-weighed less reliable estimates obtainedfrom shorter alignment blocks.

Sampling Recombinant GraphsWe ran ClonalOrigin a second time on all alignment blocks,this time fixing the clonal frame and global model parametersat their pre-estimated values, and collected samples of recom-binant graphs across the genome. For each block, we ran thesampler for 107 burn-in iterations and 108 sampling iterations,subsampling every 105 iterations to obtain 1,001 recombinantgraphs per alignment block. We performed a replicate of thetwo stages of the ClonalOrigin inference to ensure adequateconvergence of the sampler.

Summary statistics of recombinant graphsWe used various summary statistics to describe the sampledrecombinant graphs. First, we recorded the relative frequen-cies of the 105 possible rooted trees at each site in the align-ments. Second, we recorded the frequencies of sampledrecombinant edges, grouping them by the associated donorand recipient branches in the clonal frame (Didelot et al.2010), and took ratios of these sampled counts with corre-sponding expected values under the assumed prior distribu-tion. Third, we recorded the frequencies of these edges typesper site. Finally, we defined a general, branch-independentrecombination intensity at each site by summing the poste-rior probabilities of all recombinant edges. In some cases, wealso computed intensities for particular types of replacingtransfers, such as those between the SPY and SDE clades orthose producing topologies different from the clonal frame(Supplementary Material online).

Significance TestingWe used simulations to assess the significance of theprior-normalized counts of recombinant edges. First, we gen-erated 100 replicate data sets under the prior model, with thesame block number and block lengths as the real data. Wethen applied our entire inference procedure to these data setsand computed the same ratios of posterior estimates to priorexpectations as for the real data. We then compared theratios estimated from real data with this empirical null distri-bution. Empirical one-sided P values were computed as thefraction of null estimates greater or less than the values esti-mated from real data. This approach had the advantage ofnot only assessing the significance of departures from priorexpectations but also correcting for any systematic biasesimposed by ClonalOrigin (Supplementary Material online).

To compare the rates of SPY-to-SDE and SDE-to-SPY geneflow, we computed, for each of the 1,001 genome-wide col-lections of recombinant graphs, the number of recombinant

edges from any of the three SPY branches (SPY1, SPY2, andSPY) to any of the three SDE branches (SDE1, SDE2, and SDE)and the number of recombinant edges in the reverse direc-tion. We then took the ratio of these two counts. This gave us1,001 samples from the posterior distribution of the ratio ofSPY-to-SDE and SDE-to-SPY recombinant edges. These pos-terior samples were compared with the prior expected valueof the ratio of SPY-to-SDE to SDE-to-SPY counts.

Parsimony-Based AnalysisMowgliTo study additive HGTs, we used a parsimony-based methodfor reconciling gene and species trees called Mowgli (Doyonet al. 2011). Given a gene tree and a species tree, Mowgli findsa reconciliation scenario that minimizes the number of geneduplication, loss, and transfer (DLT) events, considering treetopologies with branching orders (“labeled histories”;Edwards 1970). We estimated unrooted gene trees from thenucleotide alignments for each of our putative nonrecombin-ing gene fragments using RAxML (Stamatakis 2006), thenapplied Mowgli to each of these trees. We used the tree to-pology estimated by ClonalFrame, augmented with the SEEoutgroup, as the species tree in this analysis. Since Mowglirequired rooted gene trees, we ran Mowgli for all possiblerootings of each gene tree, and chose the scenario that pro-duced the minimum DLT score. In the case of multiple mostparsimonious reconciliations, one was chosen uniformly atrandom.

Distinguishing between Replacing and Additive HGTsAlthough Mowgli’s evolutionary model does not distinguishbetween additive and replacing HGTs, the two kinds of eventscan be separated, to a degree, in a subsequent step. In theabsence of prediction error, this separation could be accom-plished by simply assuming a replacing HGT wheneveran inferred transfer coincided exactly with an inferred lossin the recipient lineage. However, loss events are difficultto place accurately because they tend to be pushed to-ward the root of the tree due to errors in phylogeny recon-struction or parallel losses in multiple lineages. Therefore,we classified a transfer event as additive whenever therecipient species or a descendant contained both a genedescended from the transferred copy and a gene not de-scended from that copy. All other transfers were consideredreplacing HGTs. This approach is conservative about inferringadditive HGTs (supplementary fig. S4, SupplementaryMaterial online).

Statistical EnrichmentAs with replacing HGTs, we classified transfer events by donorand recipient branch and recorded the number of inferredadditive HGTs of each type. We compared these numbers toexpectations based on simulations that assumed constantrates of DLT across the species tree, given the total numberof events inferred across all 2,314 gene families divided by thetotal branch lengths of the trees (Supplementary Materialonline).

3312

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 5: Research article Replacing and Additive Horizontal Gene Transfer

Gene Category Associations and Putative VirulenceGenes

We assigned genes to gene ontology (GO) categories by com-paring the Streptococcus genes with bacterial proteins fromthe Uniref90 database using BLASTP and then assigningthe same GO classification as the target gene of the uniProtGOA database if the match had an E value of <1.0� 10�5.Putative virulence genes were identified using the methodsdescribed by Suzuki et al. (2011). A gene family was assigneda given classification if any of its genes was assigned thatclassification. To test for associations with replacing genetransfers we used the recombination intensity estimated foreach gene, and for additive gene transfers we used thenumber of additive transfers inferred for each family. Totest for significance, we performed a Mann–Whitney U-test(MWU) of the values (recombination intensities or numbersof transfers) associated with a given category verses thevalues for the all other genes/families. We used theBenjamini and Hochberg (1995) method to correct for mul-tiple comparisons.

Results

Clonal Frame and Global Parameter Estimates

By applying ClonalFrame to our five-way genomic alignments,we obtained a phylogeny in which the two SPY samplesgrouped together, as did the two SDE samples, and inwhich SDD and SDE formed a clade, with SPY as an outgroup(fig. 2). This tree was consistent with previous results from 16SrRNA sequences (Facklam 2002) and with general patternsobserved genome wide (Suzuki et al. 2011). The estimatedlevels of interspecies divergence were substantial, at 0.48substitution per site for SDD and SDE and 0.66 substitutionsper site for SPY and SDD/SDE. An initial analysis withClonalOrigin (see Materials and Methods) produced an esti-mated per-site population-scaled mutation rate of y= 0.081(interquartile range across blocks: [0.067–0.094]), a per-sitepopulation-scaled recombination rate of r= 0.012 (0.006–0.019), and an average recombinant tract length of �= 744(346–2848) bp. The full distributions of these quantitiesacross alignment blocks are shown in supplementary figureS5, Supplementary Material online. A second run of theMCMC sampler produced nearly identical results. For com-parison, Didelot et al. (2010) obtained estimates of y= 0.044,r= 0.017, and �= 236 for the Bacillus cereus group. Our esti-mates implied a ratio of r/y= 0.15, considerably lower thanthe estimate of Didelot et al. (2010) of r/y= 0.405 for B.cereus, suggesting increased rates of mutation and/or de-creased rates of recombination after controlling for differ-ences in effective population size. We found that estimatesof the average recombinant tract length, � , were quite sen-sitive to our threshold for minimum block length, with theinclusion of short alignments producing much larger esti-mates due to edge effects. We experimented with a rangeof thresholds and selected a value (1,500 bp) at which theestimates stabilized.

Model-Based Analysis of Replacing HGTs

In a second round of analysis with ClonalOrigin, we obtainedsamples from an approximate posterior distribution of re-combinant graphs along the genome, conditional on ourfive-way alignments and the previously estimated parameters.(A recombinant graph is modeled by ClonalOrigin as theclonal frame augmented by zero or more recombinantedges; see Materials and Methods.) This distribution indicatedthat a majority of sites in the genome (66%) were topologi-cally consistent with the clonal frame but that there was alsosubstantial support for various alternative tree topologies(supplementary table S2, Supplementary Material online).The second most frequent tree topology (9.3% of the sites)had SPY and SDE as sister clades and SDD as an outgroup,providing an initial indication of gene transfer between SPYand SDE. No other single tree topology appeared with a fre-quency of � 4%.

To gain further insight into rates and patterns of HGT, weestimated the rates of occurrence of various types of recom-binant edges, grouping them by their donor and recipientedges (see Materials and Methods). There are 81 possibleedge pairs associated with our nine-branch rooted phylogeny,but because donor edges cannot postdate recipient edgesunder the model, 21 of these pairs are prohibited, leaving60 possible recombinant edge types. Each of these viableedge types corresponds to a class of replacing HGTs, including

subst/site

FIG. 2. Clonal frame inferred for the five genomes. This phylogeny wasinferred using the ClonalFrame program (Didelot and Falush 2007).Branch lengths are in units of expected substitutions per site and aredrawn to scale in the horizontal dimension. The labels for the ancestralnodes of the tree and the branches immediately ancestral to them(SPY, SDE, and SD) are used throughout the article. The outgroup spe-cies, SEE, was not part of the estimated clonal frame but was used in theparsimony-based analysis.

3313

Horizontal Gene Transfer in Streptococcus . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 6: Research article Replacing and Additive Horizontal Gene Transfer

not only transfers from the donor to the recipient edgebut also transfers from any (sufficiently old) descendant ofthe donor to the recipient (due to the possibility of delayedcoalescence). Since all recombinant edges are not equallylikely a priori, we summarized the estimated rates by takingratios with respect to their prior expectations under theClonalOrigin model. Experiments conducted on simulateddata demonstrated that our approach had reasonablepower to detect recombination scenarios that deviatedfrom the prior, with some underestimation of the degree ofdeviation due to the use of the prior in the inference proce-dure (Supplementary Material online and supplementaryfigs. S6 and S7).

We summarized these normalized rates in a heatmap(fig. 3A; supplementary tables S3 and S4, SupplementaryMaterial online) and found that most cells fell in theblue-to-white range, corresponding to rates of occurrenceless than or equal to what was expected under the prior. Afew edge types were strongly under-represented (blue), suchas those between the SDD branch and the branches of theSPY clade (P< 0.01 based on simulations; see Materials andMethods), probably due to ecological isolation of human andstrictly veterinary pathogens. Edges from the branch at theroot were also under-represented, suggesting a deficiencyof long delays in coalescence relative to the prior. In con-trast, the recombinant edges between the two SDE genomes,and between the two SPY genomes, were significantlyover-represented (P< 0.01), consistent with previous evi-dence for abundant intraspecies recombination in

Streptococcus (Feil et al. 2001). In addition, we found a pro-nounced enrichment for recombinant edges between the SPYand SDE clades, as has been reported previously (Sachse et al.2002; Kalia and Bessen 2004; Davies, McMillan, Beiko et al.2007; Davies, McMillan, Domselaar et al. 2007). We also ob-served a slight enrichment for SDE-to-SDD edges.

Although SPY–SDE edges occurred at elevated rates inboth directions, they showed a pronounced directional asym-metry, with edges from SPY to SDE occurring at up to fourtimes the expected rate, whereas those in the reverse direc-tion occurred at only about twice the expected rate. By rea-nalyzing the sampled recombinant graphs, we were able toobtain a Bayesian posterior mean estimate of 2.0 (95% cred-ible interval: 1.8–2.2) for the ratio of the numbers ofSPY-to-SDE to SDE-to-SPY recombinant edges, comparedwith a ratio of 1.4 implied by the prior (the prior ratio islarger than one because of differences in branch lengths).None of the sampled values was as small as the prior expectedratio, indicating strong support for a preference for theSPY-to-SDE direction. In addition, we counted genes havinghigh probability of recombination (>0.6) for various types ofrecombinant edges (supplementary table S5, SupplementaryMaterial online), and found that many more genes showedstrong evidence of SPY-to-SDE than of SDE-to-SPY recombi-nant edges (e.g., 59 vs. 11 genes, when the entire SPY and SDEclades are considered).

We generated new tracks for our recently releasedStreptococcus Genome Browser (http://strep-genome.bscb.cornell.edu) that summarize the results of this model-based

A B

FIG. 3. Heatmaps showing rates of replacing and additive transfers. Each cell of the heat map represents the base-2 logarithm of the ratio of theestimated number of recombination events to its prior expectation, for the corresponding donor (y axis) and recipient (x axis) branches. Cells in blackindicate prohibited transfer events. (A) Replacing transfers, as inferred by the model-based approach. The plotted values represent average values acrosssampled recombinant graphs. The prior considers the clonal frame with branch lengths (fig. 2) and the global estimates of the three populationparameters. Prohibited transfer events are ones for which the recipient branch is strictly older than the donor branch. An asterisk indicates statisticalsignificance (P< 0.01). (B) Additive transfers, as inferred by the parsimony-based approach. The plotted log ratios reflect total numbers of inferredadditive transfers across all gene families. The prior considers the clonal frame (with branch lengths) and the total number of events of each typeinferred by the analysis. Prohibited transfer events are ones for which the recipient and donor branch do not share a common time interval. Significancelevels are denoted by asterisks: one for P< 0.01, two for P< 0.005, and three for P< 0.001. See Materials and Methods for details.

3314

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 7: Research article Replacing and Additive Horizontal Gene Transfer

analysis alongside known genes, alignments, and other anno-tations (fig. 4). These tracks can be used to inspect loci ofinterest and to compare the results of our model- andparsimony-based analyses (later). They can also be queriedand intersected with other tracks using the University ofCalifornia, Santa Cruz (UCSC) Table Browser.

Parsimony-Based Analysis

To gain further insight into HGT in Streptococcus, we esti-mated gene trees for 2,314 gene families from six genomes(including the SEE outgroup), obtained parsimonious recon-ciliations of these gene trees with the clonal frame (fig. 2)

using the Mowgli program (Doyon et al. 2011), and parti-tioned the inferred transfers into replacing and additiveHGTs (see Materials and Methods). This analysis producedestimated numbers of five types of events for each branch ofthe phylogeny (gene duplications, losses, replacing and addi-tive transfers, and “appearances,” most of which are probablytransfers from phylogenetically distant species) and an esti-mated number of genes at each ancestral node (fig. 5). Weobserved large numbers of appearances on all branches of thetree, suggesting a steady influx of genes into the clade by HGT.Indeed, appearance events accounted for 1,957 of the 3,432(57%) gene additions (by duplication, transfer, or appearance;

A

B

FIG. 4. Genome Browser tracks. Our inferences of replacing and additive horizontal gene transfers are summarized in new tracks in the StreptococcusGenome Browser (Suzuki et al. 2011). (A) The main browser display, with gene annotations for the selected reference genome (S. pyogenes strainMGAS315) shown in red and putative virulence genes highlighted in blue. The third track from top (in black) shows the putative non-recombining genefragments analyzed by Mowgli. The next series of tracks shows the results of the ClonalOrigin analysis of replacing gene transfers. Each track in this seriescorresponds to a recipient lineage in the phylogeny (fig. 2) and describes the posterior probabilities along the genome of recombinant edges from allpossible donor lineages (shown in different colors; see key). Here the putative virulence gene SpyM3_0465, a dipeptidase, shows strong evidence of arecombinant edge from the SPY to the SDE lineage, as well as some evidence of a SPY! SPY2 edge. The genome-wide multiple alignment obtainedwith Mauve is shown at bottom. (B) The gene tree and its reconciliation displayed after clicking on the gene fragment highlighted in orange. Note thatthe most parsimonious reconciliation of the tree estimated by RAxML involves a single SPY! SDE replacing HGT; however, this tree is also consistentwith the combined influence of SPY! SDE and SPY! SPY2 replacing transfers, as inferred under the richer statistical model of ClonalOrigin.

3315

Horizontal Gene Transfer in Streptococcus . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 8: Research article Replacing and Additive Horizontal Gene Transfer

loss events are excluded in this study). The somewhat re-duced numbers of genes in SPY (particularly in SPY1) wereprimarily explained by reduced rates of gene appearance,rather than by increased loss or other factors. The estimatednumbers of ancestral genes tended to decrease toward theroot of the tree, but this likely reflects under estimation dueto parallel losses of some genes (especially on the branchesbeneath the root) rather than a true increase in gene numberover evolutionary time. An exception was the branch leadingto the ancestor of the two SPY individuals, which was en-riched for gene loss events, perhaps associated with nicheadaptation. High rates of loss also occurred on the branchesto SDE1, SDE2, and SDD. Duplications were relatively infre-quent overall, but, as has been noted previously (Marri et al.2006), they were substantially enriched on external branchesof the phylogeny.

We conservatively identified 290 of the 1,218 (non appear-ance) transfers (23.8%) as additive HGTs. As with the repla-cing HGTs evaluated in the model-based analysis, thesepredicted additive transfers were significantly enrichedwithin the SDE clade, and they were weakly enrichedwithin the SPY clade (fig. 3B and supplementary fig. S8,Supplementary Material online). They were also significantlyenriched between SDE and SDD, with a preference for theSDE-to-SDD direction, and significantly depleted betweenSDD and SPY. However, unlike the replacing HGTs fromthe previous section, these additive transfers were not signif-icantly enriched between SPY and SDE. They also did notexhibit a pronounced directional asymmetry between SPYand SDE, except between the ancestral SPY and SDE branches,where they were significantly depleted in both directions, butmore strongly from SDE-to-SPY. It is worth noting, however,that this apparent asymmetry in depletion could in fact re-flect an asymmetry in enrichment, as the normalization inthis case reflects average rates across the tree, and an inflatedaverage could cause a global shift in the heatmap towarddepletion (blue). Other differences also make it difficult tocompare the model- and parsimony-based analyses—for ex-ample, the first considered the core genome only, whereas the

second considered the pan genome; and the two types ofanalysis might have quite different power for events affectingdifferent portions of the phylogeny. Nevertheless, we findmuch clearer evidence that homologous recombination,rather than additive transfer, has driven the apparent geneflow between SPY and SDE, particularly in the SPY-to-SDEdirection. Notably, we compared the replacing HGTs identi-fied in the parsimony-based analysis with those from themodel-based analysis, and found reasonable concordance,despite substantial differences between the data sets andmethods (Supplementary Material online).

Functional Categories of Transferred Genes

To gain insight into the functional impact of HGT, we as-signed genes to various functional categories and looked forstatistical associations between these categories and pre-dicted gene transfer events (see Materials and Methods).First, we partitioned all genes into “virulence” and“nonvirulence” categories, considering genes whose homo-logs in other genera of bacteria exhibit putative virulencephenotypes (based on VFDB) to be putative virulencegenes (see Suzuki et al. 2011). Among gene families forwhich we inferred additive HGTs, putative virulence geneswere significantly enriched (P = 1.33� 10�11). Moreover, pu-tative virulence genes associated with additive HGTs showeda pronounced enrichment for SPY-to-SDE events (25.0% ofthe events compared with 12.1% for all genes at the level ofthe entire SPY and SDE clades, P = 0.01, Fisher’s exact test;supplementary table S6, Supplementary Material online).Transfers in the reverse direction (SDE-to-SPY) were alsoover represented (17.5% vs. 10.6%) but not significantly(P = 0.11). Replacing gene transfers, on the other hand, werenot significantly enriched for putative virulence genes, andthose putative virulence genes that did exhibit strongevidence of replacing transfers were not enriched for SDE/SPY transfers (supplementary table S5, SupplementaryMaterial online). We also found a significant association be-tween putative virulence genes and gene duplications(P = 7.13� 10�3). Not surprisingly, families associated with

SDE1

SDE2

SDD

SPY1

SPY2

SEE

2095

1979

1746

1762

1858

16621475

1865

2107

2001

2070

A197 D21 L182T33 R168

A192 D14 L42T36 R117

A279 D0 L9T1 R0

A146 D1 L135T3 R1

A81 D2 L104T17 R100

A103 D1 L214T6 R20

A134 D19 L102T33 R119

A255 D73 L231T78 R170

A343 D90 L73T54 R112

A227 D36 L201T29 R121

FIG. 5. Distribution of inferred gene duplication, loss, and transfer events across the six-species phylogeny. Numbers at nodes represent gene counts forextant species and estimated counts for ancestral species. Numbers on branches indicate inferred gene appearances (A*), duplications (D*), losses (L*),additive transfers (T*), and replacing transfers (R*). Transfer events are recorded on recipient branches (donors are not indicated). Singleton families(containing a single gene) are included as appearance events on external branches of the phylogeny.

3316

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 9: Research article Replacing and Additive Horizontal Gene Transfer

bacteriophages (based on an annotated “phage” product forleast one gene in the family) were strongly enriched for ad-ditive HGTs; 88 of 288 (30.6%) of phage families exhibitedadditive HGTs compared with only 269 of all 3,408 (7.8%)families (P< 1.6� 10�33).

Next, we assigned GO terms to all genes and searched forsignificant associations with transfer events. Several GO cat-egories having to do with transposition, recombination, andDNA integration were significantly associated with additivegene transfers (table 1), consistent with observations in otherbacterial groups (e.g., Liu et al. 2009). Essentially the samecategories were enriched when phage genes were excluded.Replacing transfers, on the other hand, showed much weaker,and quite different, functional associations compared withadditive transfers. These associations were similar for repla-cing transfers inferred by the model- and parsimony-basedmethods (supplementary tables S7 and S8, SupplementaryMaterial online).

Positive Selection and Gene Transfer

We tested for evidence of positive selection within the genefamilies and for associations between positive selection andvarious types of events. We performed a likelihood ratio testusing the “sites models” M1a and M2b from the PhylogeneticAnalysis by Maximum Likelihood (PAML) package (Yang2007) on each of the 2,991 gene fragment families withthree or more genes. After controlling for multiple compar-isons, 31 distinct families showed evidence of positive selec-tion (false discovery rate< 5%). Among high-confidence genetrees (bootstrap values of > 90% on every branch), families

with duplication events were slightly enriched for evidence ofpositive selection (P< 0.046, MWU) as were families withadditive transfers (P< 0.002, MWU). These results heldeven when additional recombination break points were in-ferred (Supplementary Material online). Of the 31 familieswith significant positive selection, 13 families were part ofthe core genome analyzed by ClonalOrigin. Replacing trans-fers from SDE to SPY and from SPY to SDE were slightlyenriched for positive selection (P< 0.005 and P< 0.01,MWU). For 11 of the 13 (85%) families showing evidence ofpositive selection, the SPY-to-SDE direction had greater in-tensity than the SDE-to-SPY direction, consistent with overalldirectional bias (83.3%). The same directional bias was notseen for additive transfers between the two clades.

Comparison with Previous Results for HousekeepingGenes

Several previous studies have considered the issue of geneflow between SPY and SDE based on MLST data for smallnumbers of housekeeping genes (e.g., Ahmad et al. 2009;McMillan et al. 2010; Jensen and Kilian, 2012). We comparedour results with those of Ahmad et al. (2009), who performedone of the few MLST studies to address the issue of direction-ality in gene flow. Six of the seven genes examined by Ahmadet al. (2009) (gki, gtr, murI, mutS, recP, and xpt) were includedin both our model- and parsimony-based analyses, whereasthe seventh (yqiL; atoB in SDE) was excluded from ourmodel-based analysis, owing to elevated sequence divergencebetween SPY and SDE.

Table 1. GO Enrichments for Additive Gene Transfers.

Pa qb Countc GO Term Description

7e-61 3e-58 67 GO:0006313 Transposition, DNA mediated

3e-58 7e-56 59 GO:0004803 Transposase activity

1e-30 2e-28 52 GO:0015074 DNA Integration

1e-28 1e-26 23 GO:0032196 Transposition

1e-13 1e-11 527 GO:0003677 DNA binding

2e-12 2e-10 86 GO:0006310 DNA recombination

1e-11 7e-10 197 GO:0003676 Nucleic acid binding

3e-09 1e-07 36 GO:0016987 Sigma factor activity

3e-09 1e-07 36 GO:0006352 Transcription initiation

3e-08 1e-06 142 GO:0043565 Sequence-specific DNA binding

4e-07 1e-05 12 GO:0008170 N-methyltransferase activity

1e-06 4e-05 12 GO:0019867 Outer membrane

2e-06 5e-05 25 GO:0007059 Chromosome segregation

1e-05 4e-04 15 GO:0006306 DNA methylation

2e-05 6e-04 15 GO:0006470 Protein amino acid dephosphorylation

2e-04 4e-03 76 GO:0005618 Cell wall

3e-04 7e-03 25 GO:0008236 Serine-type peptidase activity

3e-04 7e-03 65 GO:0009986 Cell surface

5e-04 1e-02 10 GO:0003872 6-phosphofructokinase activity

1e-03 2e-02 11 GO:0009307 DNA restriction–modification system

2e-03 5e-02 84 GO:0006974 Response to DNA damage stimulus

a P value based on a Mann–Whitney U-test (see Materials and Methods).

b Corresponding false discovery rate estimated by the Benjamini–Hochberg method. All categories having at least 10 genes and q� 0.05 are displayed.

c Number of genes assigned to category.

3317

Horizontal Gene Transfer in Streptococcus . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 10: Research article Replacing and Additive Horizontal Gene Transfer

We observed general concordance with Ahmad et al.(2009) in the genes predicted to have undergone HGT.Specifically, we found a strong signal for SPY–SDE HGT inmutS, recP, and xpt, a weaker signal in gki, and no evidence forHGT in gtr and murI, in agreement with Ahmad et al. (2009).(Two of the genes showing evidence of HGT, gki, and mutS,are annotated as putative virulence genes.) Our model-and parsimony-based analyses were also concordant forall of these genes. The recombinant edges sampled byClonalOrigin in gki and mutS (and, to a lesser extent, xpt)showed a clear bias for the SPY-to-SDE direction, whereasboth directions were sampled at similar rates in recP. In con-trast, Ahmad et al. (2009) found some evidence for gene flowin the opposite direction, from SDE to SPY. However, thisdiscrepancy may be due to limitations in methods byAhmad et al. (see Discussion).

Alternative SPY Genomes

To examine the robustness of our results to the choice of SPYgenomes, we repeated our model-based analysis 10 additionaltimes, in each case substituting two different SPY genomesfor SPY1 and SPY2, whereas leaving SDE1, SDE2, and SDDunchanged. These experiments made use of 12 of the 14publicly available SPY genomes and covered all 11 repre-sented serotypes (supplementary tables S9 and S10,Supplementary Material online). For each replicated dataset, we applied our entire analysis pipeline, including multiplealignment, processing and filtering of alignment blocks, esti-mation of a clonal frame, and two rounds of ClonalOriginanalysis (see Materials and Methods). The estimated align-ments were qualitatively similar in all cases, and all clonalframes showed a high degree of concordance, as did theglobal parameters estimated by ClonalOrigin (supplementarytable S10 and supplementary fig. S9, Supplementary Materialonline). In addition, the heatmaps summarizing themodel-based analysis all strongly resembled the one infigure 3A (supplementary fig. S10, Supplementary Materialonline). Because the finding of directional asymmetry inSPY/SDE gene flow was of particular interest, we repeatedour analysis to estimate the ratio of the numbers ofSPY-to-SDE to SDE-to-SPY recombinant edges under boththe prior and posterior distributions for all replicate datasets. We found the resulting estimates to be highly consistentwith prior means ranging from 1.37 to 1.47 and posteriormeans from 1.81 to 2.05 (supplementary table S11,Supplementary Material online). In all cases, all posterior sam-ples of this ratio exceeded the expected value under the prior,indicating strong evidence in the data for a bias in gene flowfavoring the SPY-to-SDE direction. Notably, because thesereplicate analyses involved a reapplication of the entire anal-ysis pipeline, they allow to a degree for statistical uncertaintyin the estimation of the alignment and the clonal frame, aswell as in the ClonalOrigin analysis. We performed a similarvalidation of our parsimony-based analysis, substituting twoalternative SPY genomes for SPY1 and SPY2 (supplementaryfig. S11, Supplementary Material online), and also observedgeneral agreement with the primary analysis in this case.

DiscussionThere has been a great deal of interest for decades in theprocess of HGT and the manner in which it has influencedphylogenetic relationships, particularly among bacteria(Koonin et al. 2001). Most previous studies, however, haveeither simply examined patterns of phylogenetic discordance,without differentiating between additive and replacing HGTs(e.g., Lerat et al. 2005), or have focused specifically on theprocess of homologous recombination, which is primarily as-sociated with replacing HGTs (Didelot and Falush 2007;Didelot et al. 2010). To our knowledge, this is the firststudy to examine both processes on a genome-wide scale,using both model- and parsimony-based methods.

We have focused our analysis on SPY and two of its closerelatives, SDE and SDD, a human commensal-like organismand a strict veterinary pathogen, respectively. We find strongevidence of gene flow both within and between the SPY andSDE groups. Events between SPY and SDE are more promi-nent among inferred replacing than among inferred additivetransfers, suggesting that they may be driven by homologousrecombination, although we acknowledge several challengesin comparing inferences from the model- and parsimony-based analyses. We also find strong support for an asymmetryin replacing gene transfers between SPY and SDE, with apreference for the SPY-to-SDE direction. The situation withSDD is more complex, with somewhat elevated rates ofSDE-to-SDD transfer (more pronounced for additive thanfor replacing HGTs), mixed evidence for SDD-to-SDE transfer,and generally reduced rates of transfer between SDD and SPY,perhaps due to a combination of genetic divergence and dif-ferent ecological niches. These findings hold regardless of theparticular SPY genomes selected for the analyses. We also findthat putative virulence genes are significantly associated withadditive, but not replacing, gene transfers, as are genes underpositive selection. Genes that have undergone replacingtransfers between SPY and SDE are also enriched for positiveselection. A central component of our study is the use ofsimulations to demonstrate that our methods have goodpower and accuracy in the detection of both types of transferevents.

In comparing replacing and additive HGTs, it is worthbearing in mind that these events are likely to be stronglycorrelated with the core and dispensable portions of thegenome. In part, this reflects an ascertainment bias in theinference procedures used to detect these events. Methodsfor detecting recombination (replacing transfers) tend to relyon large-scale alignments spanning multiple syntenic loci togain power. As a result, these methods are effectively limitedto application in the core genome, even though homologousrecombination probably also occurs between genes that areless widely shared across species. In contrast, phylogeneticreconciliation methods can be applied to the entire pangenome, as in this study. In addition, true additive transferevents will be enriched in the dispensable genome, by defini-tion, and it is likely that homologous recombination occurs athigher rates in the core genome (which is enriched for closehomologs between species). For all these reasons, it is likely

3318

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 11: Research article Replacing and Additive Horizontal Gene Transfer

that some apparent differences between the two types oftransfer events simply reflect general differences betweenthe core and dispensable portions of the genome, ratherthan properties specifically associated with modes of genetransfer.

Our main goal in this study was to statistically characterizeglobal patterns of HGT in the pyogenic division, and ourpower for reconstructing the histories of individual geneswas necessarily limited by the small number of genomes an-alyzed. Nevertheless, we were able to compare our results forsix genes with those of Ahmad et al. (2009) in some detail. Weobserved remarkably good concordance with Ahmad et al.(2009) in predictions of which genes had experienced HGTbut poorer agreement in the inferred directionality of geneflow. However, Ahmad et al. (2009) based their conclusionson a simple clustering analysis that used pairwise comparisonsof SPY and SDE sequences. Our model-based approach, whichadditionally made use of the SDD outgroup and phylogeneticsignals at flanking sites (as well as in the genes themselves),might offer improved power in addressing the issue of direc-tionality, despite our limited number of samples. Of course, itis also possible that the collection of genomes we examinedand the one analyzed by Ahmad et al. (2009) simply hadquite different histories of HGT at the few genes at which acomparison was possible.

The question of the mechanism responsible for the asym-metric gene flow between SPY and SDE remains open.Possible mechanisms include differences in the sizes or demo-graphic structure of commingling populations of cells or dif-ferences in barriers to gene transfer (Thomas and Nielsen2005). One possibility that has been raised is that SPY is in-herently less capable than SDE of acquiring DNA by homol-ogous recombination, for example, due to differences in therestriction modification system (RMS), heteroduplex forma-tion, and/or mismatch repair. Interestingly, although most ofSPY genome contain both types I and II RMSs, the two SDEgenomes studied here is contain only type II RMSs (Robertset al. 2010).

As sequence data become available for many more closelyand distantly related organisms, it will become increasinglyimportant to devise improved methods that consider bothadditive and replacing HGT. As noted earlier, the currentstatistical models of recombination, such as ClonalOrigin,are effectively limited to considering orthologs, with onecopy per species, in the core genome. The phylogenetic rec-onciliation methods, on the other hand, fail to allow for multi-locus gene transfer events and ignore information frombranch lengths. We have shown that it is possible, to adegree, to extend these methods to distinguish between ad-ditive and replacing transfers, but our methods are limited bytheir dependency on heuristic rules and parsimony assump-tions. It may be possible to develop integrated statisticalmodels that consider both types of gene transfers and pro-duce unbiased estimates of the relative rates at which theseevents occur. Another extension worth considering is directmodeling of incomplete lineage sorting (ILS), which is likely tobe increasingly important as larger phylogenies are consid-ered, especially given the large effective population sizes of

many bacteria. ILS has recently been integrated into modelsfor duplication and loss (Rasmussen and Kellis 2012) but hasyet to be considered together with gene transfer. We expectthat the combination of improved models and richer datasets will allow for a much more detailed understanding of theimportant process of HGT.

Supplementary MaterialSupplementary material, tables S1–S11, and figures S1–S11are available at Molecular Biology and Evolution online(http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Xavier Didelot for assistance withClonalOrigin. This study was funded by the NationalInstitute of Allergy and Infectious Disease, US NationalInstitutes of Health, under grant number AI073368-01A2(to M.J.S. and A.S.). Additional support was provided byNational Science Foundation CAREER Award DBI-0644111and a David and Lucile Packard Fellowship for Science andEngineering (to A.S.).

ReferencesAhmad Y, Gertz RE, Li Z, et al. (11 co-authors). 2009. Genetic relation-

ships deduced from emm and multilocus sequence typing of inva-sive Streptococcus dysgalactiae subsp. equisimilis and S. canisrecovered from isolates collected in the United States. J ClinMicrobiol. 47:2046–2054.

Anisimova M, Bielawski J, Dunn K, Yang Z. 2007. Phylogenomic analysisof natural selection pressure in Streptococcus genomes. BMC EvolBiol. 7:154.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: apractical and powerful approach to multiple testing. J Roy Stat Soc B.57:289–300.

Beres SB, Richter EW, Nagiec MJ, Sumby P, Porcella SF, DeLeo FR, MusserJM. 2006. Molecular genetic anatomy of inter- and intraserotypevariation in the human bacterial pathogen group A Streptococcus.Proc Natl Acad Sci U S A. 103:7059–7064.

Beres SB, Sylva GL, Barbian KD, et al. (16 co-authors). 2002. Genomesequence of a serotype M3 strain of Group A Streptococcus:phage-encoded toxins, the high-virulence phenotype, and cloneemergence. Proc Natl Acad Sci U S A. 99:10078–10083.

Brandt CM, Spellerberg B. 2009. Human infections due to Streptococcusdysgalactiae subspecies equisimilis. Clin Infect Dis. 49:766–772.

Darling ACE, Mau B, Blattner FR, Perna NT. 2004. Mauve: multiplealignment of conserved genomic sequence with rearrangements.Genome Res. 14:1394–1403.

Darling AE, Mau B, Perna NT. 2010. progressiveMauve: multiple genomealignment with gene gain, loss and rearrangement. PLoS One. 5:e11147.

David LA, Alm EJ. 2011. Rapid evolutionary innovation during anarchaean genetic expansion. Nature 469:93–96.

Davies MR, McMillan DJ, Beiko RG, Barroso V, Geffers R, Sriprakash KS,Chhatwal GS. 2007. Virulence profiling of Streptococcus dysgalactiaesubspecies equisimilis isolated from infected humans reveals 2 dis-tinct genetic lineages that do not segregate with their phenotypes orpropensity to cause diseases. Clin Infect Dis. 44:1442–1454.

3319

Horizontal Gene Transfer in Streptococcus . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018

Page 12: Research article Replacing and Additive Horizontal Gene Transfer

Davies MR, McMillan DJ, Domselaar GHV, Jones MK, Sriprakash KS.2007. Phage 3396 from a Streptococcus dysgalactiae subsp. equisimilispathovar may have its origins in Streptococcus pyogenes. J Bacteriol.189:2646–2652.

Davies MR, Tran TN, McMillan DJ, Gardiner DL, Currie BJ, Sriprakash KS.2005. Inter-species genetic movement may blur the epidemiologyof streptococcal diseases in endemic regions. Microbes Infect. 7:1128–1138.

Didelot X, Falush D. 2007. Inference of bacterial microevolution usingmultilocus sequence data. Genetics 175:1251–1266.

Didelot X, Lawson D, Darling A, Falush D. 2010. Inference of homologousrecombination in bacteria using whole-genome sequences. Genetics186:1435–1449.

Doyon JP, Hamel S, Chauve C. 2012. An efficient method for exploringthe space of gene tree/species tree reconciliations in a probabilisticframework. IEEE/ACM Trans Comput Biol Bioinform. 9:26–39.

Doyon JP, Ranwez V, Daubin V, Berry V. 2011. Models, algorithmsand programs for phylogeny reconciliation. Brief Bioinformatics. 12:392–400.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accu-racy and high throughput. Nucleic Acids Res. 32:1792–1797.

Edwards AWF. 1970. Estimation of the branch points of a branchingdiffusion process. J Roy Stat Soc B. 32:155–174.

Facklam R. 2002. What happened to the streptococci: overview of tax-onomic and nomenclature changes. Clin Microbiol Rev. 15:613–630.

Feil EJ, Holmes EC, Bessen DE, et al. (12 co-authors). 2001.Recombination within natural populations of pathogenic bacteria:short-term empirical estimates and long-term phylogenetic conse-quences. Proc Natl Acad Sci U S A. 98:182–187.

Ferretti JJ, McShan WM, Ajdic D, et al. (23 co-authors). 2001. Completegenome sequence of an M1 strain of Streptococcus pyogenes. ProcNatl Acad Sci U S A. 98:4658–4663.

Holden MTG, Heather Z, Paillot R, et al. (26 co-authors). 2009. Genomicevidence for the evolution of Streptococcus equi: host restriction,increased virulence, and genetic exchange with human pathogens.PLoS Pathog. 5:e1000346.

Jensen A, Kilian M. 2012. Delineation of Streptococcus dysgalactiae,its subspecies, and its clinical and phylogenetic relationship toStreptococcus pyogenes. J Clin Microbiol. 50:113–126.

Kalia A, Bessen DE. 2004. Natural selection and evolution of streptococ-cal virulence genes involved in tissue-specific adaptations. J Bacteriol.186:110–121.

Kalia A, Enright MC, Spratt BG, Bessen DE. 2001. (Retracted) Directionalgene movement from human-pathogenic to commensal-like strep-tococci. Infect Immun. 69:4858–4869.

Kalia A, Enright MC, Spratt BG, Bessen DE. 2009. Retraction. directionalgene movement from human-pathogenic to commensal-like strep-tococci. Infect Immun. 77:4688.

Koonin EV, Makarova KS, Aravind L. 2001. Horizontal gene transfer inprokaryotes: quantification and classification. Annu Rev Microbiol.55:709–742.

Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW.2006. GARD: a genetic algorithm for recombination detection.Bioinformatics 22:3096–3098.

Lefebure T, Stanhope MJ. 2007. Evolution of the core and pan-genomeof Streptococcus: positive selection, recombination, and genomecomposition. Genome Biol. 8:R71.

Lerat E, Daubin V, Ochman H, Moran NA. 2005. Evolutionary origins ofgenomic repertoires in bacteria. PLoS Biol. 3:e130.

Liu M, Siezen RJ, Nauta A. 2009. In silico prediction of horizontal gene

transfer events in Lactobacillus bulgaricus and Streptococcus thermo-

philus reveals protocooperation in yogurt manufacturing. Appl

Environ Microbiol. 75:4120–4129.

Marri PR, Hao W, Golding GB. 2006. Gene gain and gene loss in

Streptococcus: is it driven by habitat? Mol Biol Evol. 23:2379–2391.

Maynard-Smith J, Smith NH, O’Rourke M, Spratt BG. 1993. How clonal

are bacteria? Proc Natl Acad Sci USA. 90:4384–4388.

McMillan DJ, Bessen DE, Pinho M, Ford C, Hall GS, Melo-Cristino J,

Ramirez M. 2010. Population genetics of Streptococcus dysgalactiae

subspecies equisimilis reveals widely dispersed clones and extensive

recombination. PLoS One. 5:e11741.

Merkle D, Middendorf M, Wieseke N. 2010. A parameter-adaptive dy-

namic programming approach for inferring cophylogenies. BMC

Bioinformatics 11(Suppl 1), S60.

Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and

the nature of bacterial innovation. Nature 405:299–304.

Rasmussen MD, Kellis M. 2012. A unified model of gene duplica-

tion, loss, and coalescence using a locus tree. Genome Res. In

revision.

Roberts RJ, Vincze T, Posfai J, Macelis D. 2010. REBASE–a database for

DNA restriction and modification: enzymes, genes and genomes.

Nucleic Acids Res. 38:D234–D236.

Sachse S, Seidel P, Gerlach D, Gunther E, Rodel J, Straube E, Schmidt KH.

2002. Superantigen-like gene(s) in human pathogenic Streptococcus

dysgalactiae subsp equisimilis: genomic localisation of the gene

encoding streptococcal pyrogenic exotoxin G (speG(dys)). FEMS

Immunol Med Microbiol. 34:159–167.

Shimomura Y, Okumura K, Murayama SY, Yagi J, Ubukata K, Kirikae T,

Miyoshi-Akiyama T. 2011. Complete genome sequencing and anal-

ysis of a lancefield Group G Streptococcus dysgalactiae subsp. equi-

similis strain causing streptococcal toxic shock syndrome (STSS).

BMC Genomics 12:17.

Smith MW, Feng DF, Doolittle RF. 1992. Evolution by acquisition:

the case for horizontal gene transfers. Trends Biochem Sci. 17:

489–493.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylo-

genetic analyses with thousands of taxa and mixed models.

Bioinformatics 22:2688–2690.

Suzuki H, Lefebure T, Hubisz MJ, Bitar PP, Lang P, Siepel A, Stanhope MJ.

2011. Comparative genomic analysis of the Streptococcus dysgalac-

tiae species group: gene content, molecular adaptation, and pro-

moter evolution. Genome Biol Evol. 3:168–185.

Tettelin H, Masignani V, Cieslewicz MJ, et al. (46 co-authors). 2005.

Genome analysis of multiple pathogenic isolates of Streptococcus

agalactiae: implications for the microbial “pan-genome.” Proc Natl

Acad Sci U S A. 102:13950–13955.

Thomas CM, Nielsen KM. 2005. Mechanisms of, and barriers to,

horizontal gene transfer between bacteria. Nat Rev Microbiol. 3:

711–721.

Vandamme P, Pot B, Falsen E, Kersters K, Devriese LA. 1996. Taxonomic

study of lancefield streptococcal groups C, G, and L (Streptococcus

dysgalactiae) and proposal of S. dysgalactiae subsp. equisimilis subsp.

nov. Int J Syst Bacteriol. 46:774–781.

Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood.

Mol Biol Evol. 24:1586–1591.

Zhaxybayeva O, Doolittle WF. 2011. Lateral gene transfer. Curr Biol. 21:

R242–R246.

3320

Choi et al. . doi:10.1093/molbev/mss138 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/29/11/3309/1143418by gueston 07 February 2018