INVESTIGATION

Evolution of the Insertion-Deletion Mutation Rate Across the Tree of Life

Way Sung,*,1,2 Matthew S. Ackerman,†,1 Marcus M. Dillon,‡ Thomas G. Platt,†,** Clay Fuqua,† Vaughn S. Cooper,‡,** and Michael Lynch†

*Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina 28223, †Department of Biology, Indiana University, Bloomington, Indiana 47405, ‡Microbiology Graduate Program, University of New Hampshire, Durham, New Hampshire 03824, §Division of Biology, Kansas State University, Manhattan, Kansas 66506, and **Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pennsylvania 15219

ORCID ID: 0000-0002-8336-8913 (W.S.)

ABSTRACT Mutations are the ultimate source of variation used for evolutionary adaptation, while also being predominantly deleterious and a source of genetic disorders. Understanding the rate of insertion-deletion mutations (indels) is essential to understanding evolutionary processes, especially in coding regions, where such mutations can disrupt production of essential proteins. Using direct estimates of indel rates from 14 phylogenetically diverse eukaryotic and bacterial species, along with measures of standing variation in such species, we obtain results that imply an inverse relationship of mutation rate and effective population size. These results, which corroborate earlier observations on the base-substitution mutation rate, appear most compatible with the hypothesis that natural selection reduces mutation rates per effective genome to the point at which the power of random genetic drift (approximated by the inverse of effective population size) becomes overwhelming. Given the substantial differences in DNA metabolism pathways that give rise to these two types of mutations, this consistency of results raises the possibility that refinement of other molecular and cellular traits may be inversely related to species-specific levels of random genetic drift.

KEYWORDS insertion-deletion mutation rate; mutation-rate evolution; drift barrier; mutation accumulation

Mutations are a double-edged sword in all organisms, constituting the ultimate source of variation used for evolutionary adaptation, while also being predominantly deleterious and a source of genetic disorders. Hence, researchers have long sought the primary factors governing mutation-rate evolution. Some have argued that the mutation rate of an organism reflects a balance between the deleterious effect of mutations and physiological limitations, with further refinement of replication fidelity limiting the speed of DNA synthesis necessary for efficient daughter-cell production (Drake 1991; Sniegowski et al. 2000). However, replication fidelity can be improved without a significant decrease in doubling time (Loh et al. 2010), and prokaryotes undergo high cell-division rates and have low mutation rates (Drake 1991; Lynch 2010), suggesting that replication fidelity does not limit the rate of daughter-cell production. Furthermore, because there is no negative correlation between cell-division rate and genome size (Mira et al. 2001; Vieira-Silva et al. 2010), and the reverse may even be true in bacteria (Lynch and Marinov 2015), cell-division rates do not appear to be limited by the amount of DNA synthesized. Thus, alternative forces may govern mutation-rate evolution.

A general relationship describing mutation-rate variation was proposed by Drake et al. (1998), who suggested that the mutation rate per nucleotide site scales inversely with genome size in bacteria and unicellular eukaryotes, such that there is a constant 0.003 mutations per haploid genome per cell division. However, as direct estimates of mutation rates for additional organisms became available, the general relationship between genome size and mutation rate became less apparent, even when scaled to the number of cell divisions per generation in multicellular species (Lynch 2010).

Copyright © 2016 Sung et al.
doi: 10.1534/g3.116.030890
Manuscript received May 8, 2016; accepted for publication June 13, 2016; published Early Online June 15, 2016.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.030890/-/DC1
1 These authors contributed equally to this work.
2 Corresponding author: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223. E-mail: waysung@indiana.edu

Volume 6 | August 2016 | 2583



In a previous analysis, we found a relationship between the base-substitution mutation rate per site per generation (u_bs) multiplied by the amount of functional DNA in a genome (G_e, approximated by proteome size) and the power of random genetic drift, which is inversely proportional to the effective population size (N_e) (Sung et al. 2012a). Because mutations are generally deleterious, this finding suggested that selection operates to reduce genome-wide mutation rates by refining DNA replication fidelity and repair until further improvements are too inconsequential to overcome the power of random genetic drift (Sniegowski and Raynes 2013). This result is consistent with the drift-barrier hypothesis (DBH), which proposes that natural selection operates to improve molecular and cellular traits until the selective advantage of a beneficial mutation refining the trait is so minuscule that the probability of it being fixed is essentially the same as that for neutral mutations (Lynch 2011; Sung et al. 2012a).
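The drift-barrier argument can be illustrated numerically. The sketch below is not taken from this study: it assumes Kimura's classical diffusion approximation for the fixation probability of a new mutation in a diploid population whose census and effective sizes are equal, and the function names are hypothetical. It shows that once the selective advantage s falls well below 1/N_e, the fixation probability is nearly indistinguishable from the neutral value 1/(2N_e).

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Fixation probability of a new beneficial or neutral mutation (s >= 0).

    Assumes Kimura's diffusion approximation for a diploid population with
    census size equal to N_e and initial frequency 1/(2 N_e).
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)  # neutral expectation
    # expm1 keeps precision when s or 4 * N_e * s is very small
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def relative_to_neutral(s: float, n_e: float) -> float:
    """Ratio of the fixation probability to the neutral value."""
    return fixation_probability(s, n_e) / fixation_probability(0.0, n_e)


if __name__ == "__main__":
    n_e = 1e4  # illustrative effective population size (assumed value)
    for s in (1e-8, 1e-6, 1e-4, 1e-2):
        print(f"s = {s:g}: fixation probability is "
              f"{relative_to_neutral(s, n_e):.3f} x neutral")
```

Under these assumptions, a refinement with s far below 1/N_e is effectively invisible to selection, which is the barrier the DBH describes.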

While the negative correlation between u_bs G_e and N_e is consistent with expectations from population-genetic theory, there is a potential issue of circularity when correlating these factors, as the estimation of N_e relies indirectly on the estimation of u_bs (Sung et al. 2012a). Although we presented an analysis suggesting that the correlated parameters are not likely to be the primary factor in the observed relationship (Sung et al. 2012a), and provide another one here (Supplemental Material, File S1), a more independent analysis is desirable, and given the amount of data that has accumulated, it is time to go beyond a study that simply considers base-substitution mutations. Here, we present the rate of insertion-deletion mutation (indel) events (u_id) per site per generation across eight eukaryotic and seven bacterial species, while also providing genome-wide estimates of u_bs and u_id from three new bacterial mutation-accumulation studies. These data continue to support a negative correlation between the genome-wide mutation rate and N_e.

The DBH postulates that genetic drift determines the limit of adaptive molecular refinement that can be achieved for any trait, including those that determine the rate of indels. Indels are a class of mutations separate from base substitutions, differing in how they originate. Indels generally arise from strand slippage or double-strand breaks, whereas base-substitution mutations originate primarily from base misincorporation or biochemical alteration. Furthermore, there are major differences in how the two mutation types are repaired. Base-substitution mutations are often reversed by enzymes such as DNA photolyases and alkyl transferases, which do not require DNA incision and synthesis (Sancar et al. 2004), or are identified by glycosylases in base-excision repair (BER) pathways and repaired by incision and DNA-gap filling (Krokan and Bjoras 2013). On the other hand, indel mutations are not surveyed by BER, but are repaired primarily by nucleotide-excision repair (NER), which has broad substrate specificity and is used to excise bulky lesions arising from the insertion or deletion of nucleotides (Morita et al. 2010). Although the mismatch-repair (MMR) pathway can operate on both base-substitution mutations and indels, MMR-deficient strains of Escherichia coli and Caenorhabditis elegans exhibit a significantly greater elevation of the indel mutation rate relative to that for base substitutions, providing further evidence for the differential treatment of mutation types by DNA-repair pathways (Denver et al. 2005; Lee et al. 2012). Furthermore, depending on the type of mismatch and local sequence context, the error rates of different polymerases are highly variable between indel and base-substitution mutations (McCulloch and Kunkel 2008; Kunkel 2009; Sung et al. 2015). In summary, because the enzymes influencing base-substitution and indel mutation rates differ (and shared enzymes differ in the spectrum of repaired premutations), a focus on the indel mutation rate provides a means of testing the validity of the DBH that is substantially independent biologically (and essentially fully independent in terms of investigator sampling) of that used to extrapolate measures of the power of random genetic drift.

Selection operates to refine DNA replication fidelity and repair when the genome-wide deleterious load confers a discernible fitness disadvantage on an organism (Kimura 1967, 1983; Lynch 2010), and the contributions of indel and base-substitution mutations to genome-wide deleterious load differ in two ways. First, the effects of base substitutions in coding regions are highly variable (Eyre-Walker and Keightley 2007), and some base substitutions may not have any effect on organismal fitness, which may create some uncertainties in quantifying the effective genome size (G_e), thereby reducing the correlation observed between u_bs G_e and N_e (Sung et al. 2012a). On the other hand, most indel mutations that arise in protein-coding genes will generate a frame-shift mutation, interfering with gene function and having a direct effect on organismal fitness. Because such indels are generally deleterious, selection is then expected to more efficiently fine-tune the rate at which indels arise, and if the DBH holds true, this should yield a close correlation between u_id G_e and N_e. Second, base substitutions are generally limited to single nucleotides, while indels may involve many base pairs. Although this might suggest that indels have a larger effect than base substitutions, single-base-pair indels and gene-sized indels both result in gene disruption, thus generating more similar fitness effects regardless of the indel length. In fact, single-base-pair indels in coding DNA may generate malformed gene products that require degradation, which might be even more harmful than entire gene deletions. Because the number of indel events, and not the size of indels, determines the genome-wide deleterious burden, we define the parameter u_id to be the number of indel mutation events per site per generation, and use this parameter to test the DBH.

MATERIALS AND METHODS

To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. Whole-genome sequencing (WGS) has greatly improved our ability to estimate such parameters. Highly accurate measurements of u_bs and u_id can be obtained through WGS of mutation-accumulation (MA) lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a, 2012b, 2015; Schrider et al. 2013). Along with data from prior MA studies, this study contains MA data from four new MA experiments. For new bacterial MA species, 100 independent MA lines were initiated from a single founder colony. The new strains used were as follows: Agrobacterium tumefaciens str. C58, Staphylococcus epidermidis ATCC 12228, and Vibrio cholerae 2740-80.

Depending on the speed of growth, a single colony from each MA line was isolated and transferred to a fresh plate every 1–3 d over the course of the experiment. The bottlenecking process ensures that mutations accumulate in an effectively neutral fashion (Kibota and Lynch 1996). After each transfer, the original plate was retained as a backup plate at 4°. If the destination plate was contaminated, or if a single colony could not be picked, a single colony was transferred from the last 4° backup plate.

To estimate the generation times that occurred between each transfer, every 2 wk, an entire colony from five randomly selected MA lines was transferred to 1× PBS saline buffer. These were vortexed, serially diluted, and replated. Cell density was calculated from viable cell counts in both the growth conditions used throughout the bottleneck process, as well as growth conditions at 4°. The total number of generations for each MA line was calculated by the average number of cell divisions per transfer multiplied by the total number of transfers. If backup plates were used, the average number of cell divisions at 4° was used in place of the average number of cell divisions per bottleneck at standard growth temperatures.
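As a rough illustration of this bookkeeping, the following sketch converts a viable cell count into cell divisions and sums generations over transfers. The text does not state how counts were converted to divisions; the log2 rule (a colony founded by one cell reaches N cells after log2 N doublings) is an assumption here, and all function names and numbers are hypothetical.

```python
import math


def divisions_per_transfer(cells_per_colony: float) -> float:
    """Cell divisions needed for one founder cell to reach the colony size.

    Assumes synchronous binary fission with no cell death (an assumption,
    not a statement of the authors' procedure).
    """
    return math.log2(cells_per_colony)


def total_generations(mean_divisions: float,
                      n_transfers: int,
                      n_backup_transfers: int = 0,
                      mean_divisions_backup: float = 0.0) -> float:
    """Generations for one MA line.

    Transfers made from a 4-degree backup plate use the backup-plate average
    in place of the standard-growth average, as described in the text.
    """
    standard = (n_transfers - n_backup_transfers) * mean_divisions
    backup = n_backup_transfers * mean_divisions_backup
    return standard + backup


if __name__ == "__main__":
    per_transfer = divisions_per_transfer(2 ** 26)  # hypothetical colony size
    print(per_transfer)                              # divisions per transfer
    print(total_generations(per_transfer, 200))      # hypothetical 200 transfers
```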

The average numbers of cell divisions across the MA experiments are as follows (Dataset S1): A. tumefaciens, 5819; Bacillus subtilis, 5078 (Sung et al. 2015); E. coli, 4246 (Lee et al. 2012); Mesoplasma florum, 2351 (Sung et al. 2012a); S. epidermidis, 7170; and V. cholerae, 6453. The average number of generations used for reanalysis of the C. elegans MA study was 250 (Denver et al. 2009) (Dataset S2).

DNA extraction of MA lines was done using the Wizard DNA extraction kit (Promega), or lysis media (CTAB or SDS) followed by phenol/chloroform extractions, to Illumina library standards. Then, 101-bp paired-end Illumina (Illumina Hi-Seq platform) sequencing was applied to randomly selected MA lines of A. tumefaciens, S. epidermidis, and V. cholerae. Each MA line was sequenced to a coverage depth of 100×, with an average library fragment size (distance between paired-end reads) of 175 bp. The paired-end reads for each MA line were individually mapped against the reference genome (assembly and annotation available from the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov) using two separate alignment algorithms: BWA v0.7.4 (Li and Durbin 2009) and NOVOALIGN v2.08.02 (available at www.novocraft.com). The resulting pileup files were converted to SAM format using SAMTOOLS v0.1.18 (Li et al. 2009). Using in-house perl scripts, the alignment information was further parsed to generate forward and reverse mapping information at each site, resulting in a configuration of eight numbers for each line (A, a, C, c, G, g, T, and t), corresponding to the number of reads mapped at each genomic position in the reference sequence. A separate file was also generated to display sites that had indel calls from the two alignment algorithms. Mutation calling was performed using a consensus method (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Lee et al. 2012; Sung et al. 2012a, 2012b, 2015).
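The eight-number configuration can be sketched as follows. The in-house perl scripts are not reproduced in the text, so this Python function is only an illustration. It assumes the read-bases column of the samtools text pileup format, in which `.` and `,` denote forward- and reverse-strand matches to the reference, upper- and lower-case letters denote forward- and reverse-strand mismatches, `^` plus one character marks a read start, `$` marks a read end, and `+n`/`-n` introduce indel sequences. The function name is hypothetical.

```python
def strand_counts(ref_base: str, read_bases: str) -> dict:
    """Count forward (upper-case) and reverse (lower-case) reads per base.

    Returns the eight numbers A, a, C, c, G, g, T, t for one pileup site.
    Indel annotations, read-boundary markers, deletions (*), and N calls
    are skipped rather than counted.
    """
    counts = {key: 0 for key in "AaCcGgTt"}
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":            # read start: skip the mapping-quality char too
            i += 2
            continue
        if ch in "+-":           # indel annotation: sign, length, then sequence
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            indel_length = int(read_bases[i + 1:j])
            i = j + indel_length
            continue
        if ch == "." and ref in counts:
            counts[ref] += 1             # forward-strand match to reference
        elif ch == "," and ref.lower() in counts:
            counts[ref.lower()] += 1     # reverse-strand match to reference
        elif ch in counts:
            counts[ch] += 1              # mismatch; case encodes the strand
        i += 1
    return counts


if __name__ == "__main__":
    print(strand_counts("A", ".,^F.Tt$,+2AG.-1c"))
```

Keeping the strands separate is what allows a consensus caller to require support for a variant from both forward and reverse reads.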

A random subset of base-substitution mutations called using these methods has been previously validated in E. coli and B. subtilis MA lines using fluorescent sequencing technology at the Indiana Molecular Biology Institute at Indiana University (Lee et al. 2012; Sung et al. 2015) (Dataset S3).

To verify indel mutations, we designed 38 primer sets to PCR amplify 300–500 bp regions surrounding the putative indel mutation in the B. subtilis MA lines (Dataset S4). All short indels (29/29; <10 bp) were directly confirmed using standard fluorescent sequencing technology. Two out of nine large indels (>10 bp) were confirmed through sizing of the PCR product on gel electrophoresis. The remaining seven large indels did not amplify. For all cases, the indel was also confirmed to be absent in one other line without the mutation.

To calculate the base-substitution mutation rate per cell division for each line, we used the following equation:

$$u_{bs} = \frac{m}{nT}$$

where u_bs is the base-substitution mutation rate (per nucleotide site per generation), m is the number of observed base substitutions, n is the number of nucleotide sites analyzed, and T is the number of generations that occurred in the mutation-accumulation study. The SE for an individual line is calculated using (Denver et al. 2004, 2009):

$$SE_x = \sqrt{\frac{u_{bs}}{nT}}$$

The total SE of the base-substitution mutation rate is given by the SD of the mutation rates across all lines (s) divided by the square root of the number of lines analyzed (N):

$$SE_{pooled} = \frac{s}{\sqrt{N}}$$

The same calculation was used to calculate the indel mutation rate, with u_bs replaced with u_id.
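These three quantities are simple enough to sketch directly. In the following illustration, the function names and the example counts are hypothetical, and the use of the sample SD (n - 1 denominator) for s is an assumption, since the text does not specify which SD estimator was used.

```python
import math
import statistics


def mutation_rate(m: int, n: int, t: float) -> float:
    """u = m / (n T): mutations per site per generation for one MA line."""
    return m / (n * t)


def line_se(u: float, n: int, t: float) -> float:
    """SE for an individual line: sqrt(u / (n T))."""
    return math.sqrt(u / (n * t))


def pooled_se(rates: list) -> float:
    """SE across lines: SD of per-line rates divided by sqrt(number of lines).

    Uses the sample SD (an assumption; see lead-in).
    """
    return statistics.stdev(rates) / math.sqrt(len(rates))


if __name__ == "__main__":
    # Hypothetical line: 8 substitutions, 5 Mb analyzed, 5000 generations
    u = mutation_rate(8, 5_000_000, 5000)
    print(u, line_se(u, 5_000_000, 5000))
    # Hypothetical per-line rates for three lines
    print(pooled_se([2.9e-10, 3.2e-10, 3.5e-10]))
```

The same functions apply to indels by passing the number of indel events as m.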

Data availability

Illumina DNA sequences for the MA lines used in this study are deposited under the following Bioprojects: A. tumefaciens, PRJNA256312; B. subtilis, PRJNA256312; M. florum, PRJNA256337; S. epidermidis, PRJNA256338; and V. cholerae, PRJNA256339.

File S1 contains detailed descriptions of eukaryotic u_id estimates, as well as calculations for G_e, G_nc, θ_s, π_s, and phylogenetic independent contrasts for both eukaryotic and prokaryotic organisms. Figure S1 contains average depth of sequencing coverage for each MA line in A. tumefaciens, S. epidermidis, and V. cholerae. Figure S2 displays the similarity in θ_s when increasing the number of unique alleles analyzed. Figure S3 shows the frequency distribution of mutant calls across MA lines. Table S1 contains the calculation for the estimated limit of selection to fix antimutators. Figure S4, Figure S5, Figure S6, and Table S2 contain statistical support for the DBH. Dataset S1, Dataset S2, Dataset S3, and Dataset S4 contain single nucleotide polymorphisms and indels for prokaryotic and eukaryotic organisms generated in this study.

RESULTS

To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. WGS has greatly improved our ability to estimate such parameters. Highly accurate measurements of u_bs and u_id can be obtained through WGS of MA lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a, 2012b, 2015; Schrider et al. 2013).

The power of genetic drift is related to the inverse of the effective population size [1/N_e for haploids, 1/(2N_e) for diploids]. Under the assumption of neutrality, the effective population size (N_e) can be estimated from the average nucleotide heterozygosity at silent sites in natural populations (π_s), or as a function of the number of segregating sites in the population (θ_s), both of which lead to expected values equal to 4N_e u_bs in diploids and 2N_e u_bs in haploids (Kimura 1983). For most organisms analyzed in this study, enough WGS data were available to allow calculation of species-specific θ_s values (see File S1 and Table 1). For the remaining species, we pooled large available multilocus-sequence studies to estimate π_s. In all cases, we set the estimates of θ_s or π_s equal to 4N_e u_bs in diploids (2N_e u_bs in haploids) and solved for N_e by factoring out u_bs. Because this calculation only involves u_bs, the estimate of N_e is uninfluenced by sampling error in u_id, thus providing an independent trait measurement by which to test the DBH (see File S1 for further evaluation of the nonindependence issue).
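The step of solving for N_e can be written out as a short sketch. The function name is hypothetical, and the example values are the A. tumefaciens and Arabidopsis thaliana entries of Table 1 as read here (with the decimal points restored), so they should be treated as illustrative rather than authoritative.

```python
def effective_population_size(theta_s: float, u_bs: float, ploidy: str) -> float:
    """Solve theta_s = 4 N_e u_bs (diploid) or 2 N_e u_bs (haploid) for N_e."""
    if ploidy == "diploid":
        factor = 4.0
    elif ploidy == "haploid":
        factor = 2.0
    else:
        raise ValueError("ploidy must be 'haploid' or 'diploid'")
    return theta_s / (factor * u_bs)


if __name__ == "__main__":
    # A. tumefaciens (haploid): theta_s = 0.200, u_bs = 2.92e-10
    print(effective_population_size(0.200, 2.92e-10, "haploid"))
    # Arabidopsis thaliana (diploid): silent-site diversity 0.008, u_bs = 69.50e-10
    print(effective_population_size(0.008, 69.50e-10, "diploid"))
```

With these inputs the first call gives roughly 3.4 × 10^8 and the second roughly 2.9 × 10^5, in line with the N_e column of Table 1.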

To provide additional data for testing whether the power of genetic drift constrains the lower limit of indel mutation-rate evolution, we performed MA experiments in A. tumefaciens str. C58, S. epidermidis ATCC 12228, and V. cholerae 2740-80. Each bacterial MA experiment was initiated from multiple lines derived from a single progenitor colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively [see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively]. Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines; Dataset S1). The average sequencing coverage depth is greater than 20× per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50× per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of u_bs and u_id. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with u_bs and u_id shown in Table 1.

To test the DBH, we combined u_bs and u_id from the three bacterial species analyzed in this study with u_bs and u_id from four bacterial and eight eukaryotic MA WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human derived from WGS of parent-offspring trios. u_id includes all indel events in each of the 15 study species (see File S1). Due to the highly repetitive DNA sequence in eukaryotic genomes, the number of large indel events (>9 bp) in eukaryotes may be downwardly biased when using WGS methods. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events only account for 15.0% of total indel events across the study bacteria (76/506; Dataset S4), suggesting that any underestimation of the number of large indel events should only have a small effect on u_id.

To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied u_id with G_e, approximating the latter by the proteome size of that organism. A plot of the logs of the two parameters, u_id G_e and N_e, against one another yields a strong negative correlation across all of cellular life (Figure 1A; r² = 0.89). Because the power of genetic drift is inversely proportional to N_e, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and N_e (Whitney and Garland 2010). However, the relationship between N_e and u_id G_e remains robust even after phylogenetic correction (Figure 2, A and B; r² = 0.83), indicating that the correlation between N_e and u_id G_e reflects a true biological phenomenon across the Tree of Life.
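The log-log relationship can be re-derived approximately from the values in Table 1. The sketch below is an ordinary least-squares fit on log10-transformed values without phylogenetic correction; it is not the authors' analysis code. The tuples are the G_e, u_id, and N_e columns as read from Table 1 with decimal points restored, which is an assumption of this illustration.

```python
import math

# (label, G_e [x 1e7 sites], u_id [x 1e-10 per site per generation], N_e [x 1e6])
TABLE1 = [
    ("Agt", 0.50, 0.30, 342.47), ("Bs", 0.36, 1.20, 61.19),
    ("Ec", 0.39, 0.37, 179.60), ("Mf", 0.07, 23.10, 1.07),
    ("Pa", 0.59, 0.14, 210.70), ("Se", 0.21, 1.13, 35.14),
    ("Vc", 0.34, 0.18, 478.26), ("At", 4.21, 11.20, 0.29),
    ("Ce", 2.50, 6.69, 0.54), ("Cr", 3.92, 0.44, 43.31),
    ("Dm", 2.32, 4.61, 0.86), ("Hs", 3.65, 18.20, 0.02),
    ("Mm", 3.55, 3.10, 1.77), ("Pt", 5.68, 0.04, 101.80),
    ("Sc", 0.87, 0.92, 7.78),
]


def log_log_fit(rows):
    """Least-squares fit of log10(u_id * G_e) on log10(N_e).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(ne * 1e6) for _, _, _, ne in rows]
    ys = [math.log10((uid * 1e-10) * (ge * 1e7)) for _, ge, uid, _ in rows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    slope, intercept, r2 = log_log_fit(TABLE1)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r^2 = {r2:.2f}")
```

Because the table values are rounded, the fit should be expected to give a negative slope and an r² close to, but not necessarily identical with, the reported value.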

Table 1 Effective genome size (G_e), indel events per site per generation (u_id), base-substitution mutation rate per generation (u_bs), θ_s (or π_s) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size (N_e) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details)

Units: G_e and G_c + G_nc, ×10^7 sites; u_id, ×10^-10 events per site per generation; u_bs, ×10^-10 per site per generation; N_e, ×10^6.

Species                     Label  G_e    G_c+G_nc  u_id      u_bs        θ_s or π_s  N_e
Prokaryotes
Agrobacterium tumefaciens   Agt    0.50   0.57      0.30      2.92        0.200       342.47
Bacillus subtilis           Bs     0.36   0.43      1.20^d    3.35^d      0.041       61.19
Escherichia coli            Ec     0.39   0.46      0.37^e    2.00^e      0.071       179.60
Mesoplasma florum           Mf     0.07   0.08      23.10^f   97.80^f     0.021       1.07
Pseudomonas aeruginosa      Pa     0.59   0.67      0.14^g    0.79^g      0.033       210.70
Staphylococcus epidermidis  Se     0.21   0.26      1.13      7.40        0.052       35.14
Vibrio cholerae             Vc     0.34   0.39      0.18      1.15        0.110       478.26
Eukaryotes
Arabidopsis thaliana        At     4.21   5.55^a    11.20^h   69.50^h,p   0.008       0.29
Caenorhabditis elegans      Ce     2.50   6.37^b    6.69^i    14.50^q     0.003       0.54
Chlamydomonas reinhardtii   Cr     3.92   5.51      0.44^j    3.80^j      0.032       43.31
Drosophila melanogaster     Dm     2.32   8.86^c    4.61^k    51.65^k     0.018       0.86
Homo sapiens                Hs     3.65   21.75^b   18.20^l   135.13^l    0.001       0.02
Mus musculus                Mm     3.55   27.17^b   3.10^m    54.00^m     0.004       1.77
Paramecium tetraurelia      Pt     5.68   7.28      0.04^n    0.19^n      0.008       101.80
Saccharomyces cerevisiae    Sc     0.87   1.02^b    0.92^o    2.63^o      0.004       7.78

G_c + G_nc is the effective genome size when including the total amount of coding (G_c) and noncoding DNA (G_nc) that is estimated to be under purifying selection. Footnotes in u_id and u_bs indicate data sources (rates pooled when multiple data sources are available), and when absent indicate data generated in this study (see Materials and Methods).
a Haudry et al. (2013).
b Siepel et al. (2005).
c Halligan et al. (2004).
d Sung et al. (2015).
e Lee et al. (2012).
f Sung et al. (2012a).
g Sung et al. (2012b).
h Ossowski et al. (2010).
i Lipinski et al. (2011).
j Sung et al. (2012a); Ness et al. (2015).
k Schrider et al. (2013).
l Conrad et al. (2011); O'Roak et al. (2011, 2012); Kong et al. (2012); Campbell and Eichler (2013); Wang and Zhu (2014); The 1000 Genomes Project Consortium (2015).
m Uchimura et al. (2015).
n Sung et al. (2012b).
o Lynch et al. (2008); Zhu et al. (2014).
p Ossowski et al. (2010); Yang et al. (2015).
q Lipinski et al. (2011).


DISCUSSION

Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, and also to consider alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters N_e, u_bs, u_id, and G_e, and then elaborate on the significance and implications of the relationship between u_id G_e and N_e for our understanding of molecular evolution.

First, we address the estimation of N_e, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of N_e must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of N_e derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation (u_id G_e) and long-term N_e, i.e., would operate against our ability to detect the expected signal of the DBH.

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby linked sites. Thus, to minimize sampling error, wherever possible we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of Neu (the product of Ne and the mutation rate u) from observations on silent sites (Watterson 1975). The utilization of an average θs across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated θs across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of Ne estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.
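To make the segregating-sites estimator concrete, the following is a minimal Python sketch of the standard Watterson (1975) estimator expressed per site. The function name, argument names, and example numbers are illustrative assumptions, not values or code from this study.

```python
def watterson_theta_per_site(num_segregating_sites, num_sequences, num_sites):
    """Watterson's estimator of theta per site.

    theta_W = S / (a_n * L), where S is the number of segregating sites,
    L is the number of sites surveyed, and a_n = sum_{i=1}^{n-1} 1/i
    for a sample of n sequences.
    """
    if num_sequences < 2:
        raise ValueError("at least two sequences are required")
    if num_sites <= 0:
        raise ValueError("num_sites must be positive")
    # Harmonic-number correction for the sample size
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / (a_n * num_sites)


# Hypothetical example: 11 segregating silent sites among 4 isolates
# across 1000 silent sites.
theta_s = watterson_theta_per_site(11, 4, 1000)
```

Because a_n grows only logarithmically with the number of isolates, most of the reduction in variance in such a scheme would come from surveying many sites genome-wide rather than from adding isolates.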

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and Ne could in part be spurious artifacts resulting from the use of estimates of Ne obtained by dividing measures of standing variation at silent sites by ubs (Sung et al. 2012a). If the sampling variance of ubs is substantial enough, this could lead to a negative correlation between the observed ubs and extrapolated Ne estimates, and if there were a sampling covariance between ubs and uid, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6, and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of ubs from WGS-MA studies is not large enough to explain the negative correlation previously seen between ubs and Ne estimates. Because ubs and uid are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of ubs would have to exceed that of the log-scaled values of Ne by two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa, and the parametric values of mutation rates and Ne were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of πs and ubs (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that ubs and uid decline evolutionarily as Ne increases.
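The extrapolation at issue can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard neutral-equilibrium expectations (silent-site diversity of approximately 2Neu in haploids and 4Neu in diploids); the function name and the example values are hypothetical and are not taken from Table 1.

```python
def extrapolated_ne(silent_site_diversity, u_bs, ploidy):
    """Extrapolate Ne from silent-site diversity and the base-substitution rate.

    Assumes diversity ~ 2*Ne*u for haploids and ~ 4*Ne*u for diploids,
    so Ne = diversity / (factor * u_bs).
    """
    factors = {1: 2.0, 2: 4.0}
    if ploidy not in factors:
        raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")
    return silent_site_diversity / (factors[ploidy] * u_bs)


# Hypothetical haploid example: diversity 0.02, u_bs = 1e-10 per site per generation
ne_true = extrapolated_ne(0.02, 1e-10, ploidy=1)

# If sampling error inflated the u_bs estimate twofold, the extrapolated Ne
# would be halved, which is the source of the potential spurious correlation.
ne_biased = extrapolated_ne(0.02, 2e-10, ploidy=1)
```

The sketch only shows the direction of the artifact: an overestimate of ubs mechanically produces an underestimate of Ne. Whether the effect is large enough to matter depends on the sampling variance of ubs, which is the quantity examined in File S1.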

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites (uid) and the number of sites under selective constraint in the genome (Ge, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness (Ge) scales

Figure 1 Relationship between the rate of indel events per generation per effective genome (uidGe) and effective population size (Ne). (A) Regression: log10(uidGe) = 2.23 (0.48) - 0.73 (0.07) log10(Ne) (r^2 = 0.89, P = 6.81 × 10^-8, df = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size (Gc + Gnc) for eukaryotic organisms. Regression: log10[uid(Gc + Gnc)] = 3.49 (0.66) - 0.87 (0.09) log10(Ne) (r^2 = 0.87, P = 3.13 × 10^-7, df = 13).
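As a reading aid for the Figure 1A regression, the sketch below evaluates the fitted line at a given Ne. It assumes an intercept of 2.23 and a slope of -0.73 as read from the caption; the function name and the example Ne are illustrative, and the output is a point prediction that ignores the reported standard errors.

```python
import math


def predicted_log10_uid_ge(ne, intercept=2.23, slope=-0.73):
    """Point prediction of log10(uid * Ge) from the Figure 1A regression line.

    The default coefficients are assumed from the figure caption; the
    standard errors of the estimates are not propagated here.
    """
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return intercept + slope * math.log10(ne)


# Hypothetical example: a species with Ne = 1e8
log_rate = predicted_log10_uid_ge(1e8)
indels_per_effective_genome = 10 ** log_rate
```

On this line, each tenfold increase in Ne corresponds to a reduction in the genome-wide indel rate by a factor of about 10^0.73, i.e., roughly fivefold.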

Volume 6 August 2016 | Drift Constrains Indel Rate | 2587

differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint (Gnc), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint (Gnc; see File S1) with that of coding DNA (Gc), we find that uid(Gc + Gnc) and Ne remain highly correlated (Figure 1B, r^2 = 0.87), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation, primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with Ne in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from 10^-11 in Paramecium tetraurelia (Sung et al. 2012b) to 10^-8 in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with Ne remains significant when confined to unicellular species (Figure 1, r^2 = 0.66, P = 0.003).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but that the mutation rate is just at the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations, and if beneficial mutations fix with higher probabilities, a positive association between the mutation rate and Ne seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and Ne, for which we are unaware of any evidence).

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power

Figure 2 Relationship between the rate of indel events per generation per effective genome (uidGe) and effective population size (Ne) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts, performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is uidGe = -0.60 (0.07) Ne (r^2 = 0.83, P = 1.28 × 10^-6, df = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.

2588 | W Sung et al

of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by stΔUid, with ΔUid representing the change in the genome-wide indel mutation rate with respect to the population mean rate, s being the average reduction in fitness per mutation (Lynch 2010), and t being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). ΔUid can be approximated by the change in the indel mutation rate over the effective genome, or ΔuidGe (Lynch 2011). By setting stΔuidGe equal to the power of random genetic drift [1/Ne for haploids, 1/(2Ne) for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient (s = 0.01) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every 2 generations in sexually outcrossing species (t = 2) (Lynch 2010), solving stΔuidGe = 1/Ne [= 1/(2Ne) for diploids] for Δuid suggests that the average antimutator must reduce the indel mutation rate by greater than 0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of s and t are not well known, and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency. Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
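The threshold calculation described above can be written out as a short Python sketch. It solves s·t·Δuid·Ge = 1/Ne (haploids) or 1/(2Ne) (diploids) for Δuid; the function name and the example values of Ne and Ge are hypothetical, and the defaults s = 0.01 and t = 2 are the values quoted in the text for sexually outcrossing species.

```python
def min_antimutator_reduction(ne, ge, s=0.01, t=2.0, ploidy=2):
    """Smallest per-site reduction in the indel rate (delta u_id) that
    selection can promote, from s * t * delta_u_id * Ge = power of drift.

    Power of drift is taken as 1/Ne for haploids and 1/(2*Ne) for diploids.
    """
    if ploidy == 1:
        drift = 1.0 / ne
    elif ploidy == 2:
        drift = 1.0 / (2.0 * ne)
    else:
        raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")
    return drift / (s * t * ge)


# Hypothetical diploid example: Ne = 1e6, Ge = 1e7 constrained sites
delta_u = min_antimutator_reduction(ne=1e6, ge=1e7)

# Expressed relative to a hypothetical current indel rate of 1e-10 per site
fractional_reduction = delta_u / 1e-10
```

Because the threshold scales as 1/(Ne·Ge), species with larger effective population sizes or larger effective genomes can, under these assumptions, respond to selection for smaller improvements in fidelity.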

Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between uid and Ne similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi and Buckley 2014; Lassalle et al. 2015), so in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and Ne appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the 1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with uid ranging from 1.8 to 11.9% of ubs, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both ubs and uid scale similarly with changes in Ne (Figure 3, r^2 = 0.89). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

ACKNOWLEDGMENTS

Support was provided by the Multidisciplinary University Research Initiative awards W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.

LITERATURE CITED

Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.

Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.

Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.

Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.

Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.

Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation (uid) and the base-substitution mutation rate per site per generation (ubs). Regression: log10(uid) = -1.56 (0.74) + 0.91 (0.08) log10(ubs) (r^2 = 0.90, P = 4.13 × 10^-8, df = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.


Doroghazi, J. R., and D. H. Buckley, 2014 Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.

Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.

Drake, J. W., B. Charlesworth, D. Charlesworth, and J. F. Crow, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–34.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b: Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu.

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.


Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320, author reply 320–321.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: E2310–E2318.

Communicating editor: S. I. Wright



In a previous analysis, we found a relationship between the base-substitution mutation rate per site per generation (ubs), multiplied by the amount of functional DNA in a genome (Ge, approximated by proteome size), and the power of random genetic drift, which is inversely proportional to the effective population size (Ne) (Sung et al. 2012a). Because mutations are generally deleterious, this finding suggested that selection operates to reduce genome-wide mutation rates by refining DNA replication fidelity and repair until further improvements are too inconsequential to overcome the power of random genetic drift (Sniegowski and Raynes 2013). This result is consistent with the drift-barrier hypothesis (DBH), which proposes that natural selection operates to improve molecular and cellular traits until the selective advantage of a beneficial mutation refining the trait is so minuscule that the probability of it being fixed is essentially the same as that for neutral mutations (Lynch 2011; Sung et al. 2012a).

While the negative correlation between ubsGe and Ne is consistent with expectations from population-genetic theory, there is a potential issue of circularity when correlating these factors, as the estimation of Ne relies indirectly on the estimation of ubs (Sung et al. 2012a). Although we presented an analysis suggesting that the correlated parameters are not likely to be the primary factor in the observed relationship (Sung et al. 2012a), and provide another one here (Supplemental Material, File S1), a more independent analysis is desirable, and given the amount of data that has accumulated, it is time to go beyond a study that simply considers base-substitution mutations. Here, we present the rate of insertion-deletion mutation (indel) events (uid) per site per generation across eight eukaryotic and seven bacterial species, while also providing genome-wide estimates of ubs and uid from three new bacterial mutation-accumulation studies. These data continue to support a negative correlation between the genome-wide mutation rate and Ne.

The DBH postulates that genetic drift determines the limit of adaptive molecular refinement that can be achieved for any trait, including those that determine the rate of indels. Indels are a class of mutations separate from base substitutions, differing in how they originate. Indels generally arise from strand slippage or double-strand breaks, whereas base-substitution mutations originate primarily from base misincorporation or biochemical alteration. Furthermore, there are major differences in how the two mutation types are repaired. Base-substitution mutations are often reversed by enzymes such as DNA photolyases and alkyl transferases, which do not require DNA incision and synthesis (Sancar et al. 2004), or are identified by glycosylases in base-excision repair (BER) pathways and repaired by incision and DNA-gap filling (Krokan and Bjoras 2013). On the other hand, indel mutations are not surveyed by BER, but are repaired primarily by nucleotide-excision repair (NER), which has broad substrate specificity and is used to excise bulky lesions arising from the insertion or deletion of nucleotides (Morita et al. 2010). Although the mismatch-repair (MMR) pathway can operate on both base-substitution mutations and indels, MMR-deficient strains of Escherichia coli and Caenorhabditis elegans exhibit a significantly greater elevation of the indel mutation rate relative to that for base substitutions, providing further evidence for the differential treatment of mutation types by DNA-repair pathways (Denver et al. 2005; Lee et al. 2012). Furthermore, depending on the type of mismatch and local sequence context, the error rates of different polymerases are highly variable between indel and base-substitution mutations (McCulloch and Kunkel 2008; Kunkel 2009; Sung et al. 2015). In summary, because the enzymes influencing base-substitution and indel mutation rates differ (and shared enzymes differ in the spectrum of repaired premutations), a focus on the indel mutation rate provides a means of testing the validity of the DBH that is substantially independent biologically (and essentially fully independent in terms of investigator sampling) of that used to extrapolate measures of the power of random genetic drift.

Selection operates to refine DNA replication fidelity and repair when the genome-wide deleterious load confers a discernible fitness disadvantage on an organism (Kimura 1967, 1983; Lynch 2010), and the contributions of indel and base-substitution mutations to genome-wide deleterious load differ in two ways. First, the effects of base substitutions in coding regions are highly variable (Eyre-Walker and Keightley 2007), and some base substitutions may not have any effect on organismal fitness, which may create some uncertainties in quantifying the effective genome size (Ge), thereby reducing the correlation observed between ubsGe and Ne (Sung et al. 2012a). On the other hand, most indel mutations that arise in protein-coding genes will generate a frameshift mutation, interfering with gene function and having a direct effect on organismal fitness. Because such indels are generally deleterious, selection is then expected to more efficiently fine-tune the rate at which indels arise, and if the DBH holds true, this should yield a close correlation between uidGe and Ne. Second, base substitutions are generally limited to single nucleotides, while indels may involve many base pairs. Although this might suggest that indels have a larger effect than base substitutions, single-base-pair indels and gene-sized indels both result in gene disruption, thus generating more similar fitness effects regardless of the indel length. In fact, single-base-pair indels in coding DNA may generate malformed gene products that require degradation, which might be even more harmful than entire gene deletions. Because the number of indel events, and not the size of indels, determines the genome-wide deleterious burden, we define the parameter uid to be the number of indel mutation events per site per generation, and use this parameter to test the DBH.

MATERIALS AND METHODS

To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. Whole-genome sequencing (WGS) has greatly improved our ability to estimate such parameters. Highly accurate measurements of ubs and uid can be obtained through WGS of mutation-accumulation (MA) lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a, 2012b, 2015; Schrider et al. 2013). Along with data from prior MA studies, this study contains MA data from four new MA experiments. For new bacterial MA species, 100 independent MA lines were initiated from a single founder colony. The new strains used were as follows: Agrobacterium tumefaciens str. C58, Staphylococcus epidermidis ATCC 12228, and Vibrio cholerae 2740-80.

Depending on the speed of growth, a single colony from each MA line was isolated and transferred to a fresh plate every 1–3 d over the course of the experiment. The bottlenecking process ensures that mutations accumulate in an effectively neutral fashion (Kibota and Lynch 1996). After each transfer, the original plate was retained as a backup plate at 4°C. If the destination plate was contaminated, or if a single colony could not be picked, a single colony was transferred from the last 4°C backup plate.

To estimate the generation times that occurred between each transfer, every 2 wk an entire colony from five randomly selected MA lines was transferred to 1× PBS saline buffer. These were vortexed, serially


diluted, and replated. Cell density was calculated from viable cell counts under both the growth conditions used throughout the bottleneck process and the growth conditions at 4°C. The total number of generations for each MA line was calculated as the average number of cell divisions per transfer multiplied by the total number of transfers. If backup plates were used, the average number of cell divisions at 4°C was used in place of the average number of cell divisions per bottleneck at standard growth temperatures.
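The bookkeeping described above can be sketched in Python as follows. Two points are assumptions rather than details stated in the text: that the number of cell divisions per transfer is taken as log2 of the viable cell count of a colony founded by a single cell, and that backup-plate transfers simply substitute the 4°C value for those transfers. Function names and example numbers are hypothetical.

```python
import math


def divisions_per_transfer(viable_cells_per_colony):
    """Cell divisions needed to grow one founding cell into a colony.

    Assumes binary fission from a single cell, so divisions = log2(N).
    """
    if viable_cells_per_colony < 1:
        raise ValueError("colony must contain at least one viable cell")
    return math.log2(viable_cells_per_colony)


def total_generations(mean_divisions, num_transfers,
                      mean_divisions_backup=0.0, num_backup_transfers=0):
    """Total generations for one MA line.

    Transfers made from standard plates contribute mean_divisions each;
    transfers made from 4 degree backup plates contribute
    mean_divisions_backup each (an assumed reading of the protocol).
    """
    if num_backup_transfers > num_transfers:
        raise ValueError("backup transfers cannot exceed total transfers")
    standard = num_transfers - num_backup_transfers
    return mean_divisions * standard + mean_divisions_backup * num_backup_transfers


# Hypothetical example: colonies of 2**25 viable cells, 200 transfers,
# 10 of which came from backup plates averaging 27 divisions.
per_transfer = divisions_per_transfer(2 ** 25)
generations = total_generations(per_transfer, 200, 27.0, 10)
```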

The average numbers of cell divisions across the MA experiments are as follows (Dataset S1): A. tumefaciens, 5819; Bacillus subtilis, 5078 (Sung et al. 2015); E. coli, 4246 (Lee et al. 2012); Mesoplasma florum, 2351 (Sung et al. 2012a); S. epidermidis, 7170; and V. cholerae, 6453. The average number of generations used for reanalysis of the C. elegans MA study was 250 (Denver et al. 2009) (Dataset S2).

DNA extraction of MA lines was done using the Wizard DNA extraction kit (Promega) or lysis media (CTAB or SDS) followed by phenol/chloroform extractions to Illumina library standards. Then, 101-bp paired-end Illumina (Illumina HiSeq platform) sequencing was applied to randomly selected MA lines of A. tumefaciens, S. epidermidis, and V. cholerae. Each MA line was sequenced to a coverage depth of 100×, with an average library fragment size (distance between paired-end reads) of 175 bp. The paired-end reads for each MA line were individually mapped against the reference genome (assembly and annotation available from the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov) using two separate alignment algorithms: BWA v0.7.4 (Li and Durbin 2009) and NOVOALIGN v2.08.02 (available at www.novocraft.com). The resulting pileup files were converted to SAM format using SAMTOOLS v0.1.18 (Li et al. 2009). Using in-house perl scripts, the alignment information was further parsed to generate forward and reverse mapping information at each site, resulting in a configuration of eight numbers for each line (A, a, C, c, G, g, T, and t), corresponding to the number of reads mapped at each genomic position in the reference sequence. A separate file was also generated to display sites that had indel calls from the two alignment algorithms. Mutation calling was performed using a consensus method (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Lee et al. 2012; Sung et al. 2012a,b, 2015).
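As an illustration of the eight-number configuration, the sketch below tallies strand-specific base counts from the read-bases column of a samtools pileup line. It is not the in-house perl code, which is not shown in the text; the function name is hypothetical, and the parser relies only on the standard pileup conventions (`.` and `,` for forward and reverse matches to the reference, uppercase and lowercase letters for forward and reverse mismatches, `^` plus one quality character at a read start, `$` at a read end, `*` for a deleted base, and `+N`/`-N` followed by N bases for indels).

```python
import re

_INDEL_LENGTH = re.compile(r"[+-](\d+)")


def strand_counts(ref_base: str, read_bases: str) -> dict:
    """Count forward (uppercase) and reverse (lowercase) base calls at one site."""
    counts = {key: 0 for key in "AaCcGgTt"}
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read start marker; the next character encodes mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel annotation: sign, length, then that many inserted/deleted bases.
            match = _INDEL_LENGTH.match(read_bases, i)
            if match:
                i = match.end() + int(match.group(1))
                continue
        if ch == ".":
            key = ref
        elif ch == ",":
            key = ref.lower()
        else:
            key = ch
        if key in counts:  # skips '$', '*', N/n, and anything unrecognized
            counts[key] += 1
        i += 1
    return counts


# Hypothetical pileup string at a site whose reference base is A.
print(strand_counts("A", "..,^F.+2AG,T$t*"))
```

Keeping the two strands separate is what allows a downstream consensus caller to require support from both forward and reverse reads; the thresholds used by the authors are not given here, so none are implemented.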

A random subset of base-substitution mutations called using these methods has been previously validated in E. coli and B. subtilis MA lines using fluorescent sequencing technology at the Indiana Molecular Biology Institute at Indiana University (Lee et al. 2012; Sung et al. 2015) (Dataset S3).

To verify indel mutations, we designed 38 primer sets to PCR amplify 300–500 bp regions surrounding the putative indel mutation in the B. subtilis MA lines (Dataset S4). All 29/29 short indels (<10 bp) were directly confirmed using standard fluorescent sequencing technology. Two out of nine large indels (>10 bp) were confirmed through sizing of the PCR product on gel electrophoresis. The remaining seven large indels did not amplify. For all cases, the indel was also confirmed to be absent in one other line without the mutation.

To calculate the base-substitution mutation rate per cell division for each line, we used the following equation:

$$u_{bs} = \frac{m}{nT}$$

where ubs is the base-substitution mutation rate (per nucleotide site per generation), m is the number of observed base substitutions, n is the number of nucleotide sites analyzed, and T is the number of generations that occurred in the mutation-accumulation study. The SE for an individual line is calculated using (Denver et al. 2004, 2009):

$$SE_{x} = \sqrt{\frac{u_{bs}}{nT}}$$

The total SE of the base-substitution mutation rate is given by the SD of the mutation rates across all lines (s) divided by the square root of the number of lines analyzed (N):

$$SE_{pooled} = \frac{s}{\sqrt{N}}$$

The same calculation was used to calculate the indel mutation rate, with ubs replaced with uid.
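The three formulas above translate directly into code. The following is a minimal sketch with hypothetical function names and made-up line data; it assumes that the SD across lines is the sample SD (N − 1 in the denominator) and that the pooled rate is the unweighted mean of the per-line rates, neither of which is spelled out in the text.

```python
import math
import statistics


def mutation_rate(m: int, n: int, t: float) -> float:
    """u = m / (n * T): mutations per site per generation for one MA line."""
    return m / (n * t)


def line_se(u: float, n: int, t: float) -> float:
    """SE for an individual line: sqrt(u / (n * T))."""
    return math.sqrt(u / (n * t))


def pooled_rate_and_se(lines):
    """Mean rate across lines and SE_pooled = s / sqrt(N).

    `lines` is a list of (m, n, T) tuples, one per MA line.
    """
    rates = [mutation_rate(m, n, t) for m, n, t in lines]
    s = statistics.stdev(rates)  # sample SD; an assumption of this sketch
    return statistics.mean(rates), s / math.sqrt(len(rates))


# Hypothetical lines: (observed mutations, sites analyzed, generations).
example_lines = [(10, 5_000_000, 5000), (15, 5_000_000, 5000)]
for m, n, t in example_lines:
    u = mutation_rate(m, n, t)
    print(f"u = {u:.2e}, SE = {line_se(u, n, t):.2e}")
print(pooled_rate_and_se(example_lines))
```

The same functions apply to the indel rate by passing indel counts as m.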

Data availability

Illumina DNA sequences for the MA lines used in this study are deposited under the following Bioprojects: A. tumefaciens, PRJNA256312; B. subtilis, PRJNA256312; M. florum, PRJNA256337; S. epidermidis, PRJNA256338; and V. cholerae, PRJNA256339.

File S1 contains detailed descriptions of eukaryotic uid estimates, as well as calculations for Ge, Gnc, θs, πs, and phylogenetic independent contrasts for both eukaryotic and prokaryotic organisms. Figure S1 contains average depth of sequencing coverage for each MA line in A. tumefaciens, S. epidermidis, and V. cholerae. Figure S2 displays the similarity in θs when increasing the number of unique alleles analyzed. Figure S3 shows the frequency distribution of mutant calls across MA lines. Table S1 contains the calculation for the estimated limit of selection to fix antimutators. Figure S4, Figure S5, Figure S6, and Table S2 contain statistical support for the DBH. Dataset S1, Dataset S2, Dataset S3, and Dataset S4 contain single nucleotide polymorphisms and indels for prokaryotic and eukaryotic organisms generated in this study.

RESULTS

To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. WGS has greatly improved our ability to estimate such parameters. Highly accurate measurements of ubs and uid can be obtained through WGS of MA lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a,b, 2015; Schrider et al. 2013).

The power of genetic drift is related to the inverse of the effective population size [1/Ne for haploids, 1/(2Ne) for diploids]. Under the assumption of neutrality, the effective population size (Ne) can be estimated from the average nucleotide heterozygosity at silent sites in natural populations (πs), or as a function of the number of segregating sites in the population (θs), both of which lead to expected values equal to 4Neubs in diploids and 2Neubs in haploids (Kimura 1983). For most organisms analyzed in this study, enough WGS data were available to allow calculation of species-specific θs values (see File S1 and Table 1). For the remaining species, we pooled large available multilocus-sequence studies to estimate πs. In all cases, we set the estimates of θs or πs equal to 4Neubs in diploids (2Neubs in haploids) and solved for Ne by factoring out ubs. Because this calculation only involves ubs, the estimate of Ne is uninfluenced by sampling error in uid, thus providing an independent trait measurement by which to test the DBH (see File S1 for further evaluation of the nonindependence issue).
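Solving the diversity relationship for Ne is a one-line calculation, sketched below. The function name is hypothetical, and the two example inputs are the A. tumefaciens and Arabidopsis thaliana rows of Table 1 as read here (silent-site diversity of 0.200 and 0.008, ubs of 2.92 × 10⁻¹⁰ and 69.50 × 10⁻¹⁰); treat those decimal placements as this sketch's reading of the table.

```python
def effective_population_size(diversity: float, u_bs: float, ploidy: int) -> float:
    """Solve diversity = 2*Ne*u_bs (haploid) or 4*Ne*u_bs (diploid) for Ne."""
    if ploidy == 1:
        factor = 2.0
    elif ploidy == 2:
        factor = 4.0
    else:
        raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")
    return diversity / (factor * u_bs)


# A. tumefaciens (haploid) and A. thaliana (diploid), values from Table 1.
ne_agt = effective_population_size(0.200, 2.92e-10, ploidy=1)
ne_at = effective_population_size(0.008, 69.50e-10, ploidy=2)
print(f"A. tumefaciens: Ne = {ne_agt / 1e6:.2f} x 10^6")
print(f"A. thaliana:    Ne = {ne_at / 1e6:.2f} x 10^6")
```

Because uid never enters the function, sampling error in the indel rate cannot leak into the Ne estimate, which is the independence property the paragraph above relies on.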

To provide additional data for testing whether the power of genetic drift constrains the lower limit of indel mutation-rate evolution, we performed MA experiments in A. tumefaciens str. C58, S. epidermidis ATCC 12228, and V. cholerae 2740-80. Each bacterial MA experiment was initiated from multiple lines derived from a single progenitor

Volume 6 August 2016 | Drift Constrains Indel Rate | 2585

colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively [see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively]. Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines; Dataset S1). The average sequencing coverage depth is greater than 20× per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50× per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of ubs and uid. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with ubs and uid shown in Table 1.

To test the DBH, we combined ubs and uid from the three bacterial species analyzed in this study with ubs and uid from four bacterial and eight eukaryotic MA WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human, derived from WGS of parent-offspring trios. uid includes all indel events in each of the 15 study species (see File S1). Due to the highly repetitive DNA sequence in eukaryotic genomes, the number of large indel events (>9 bp) in eukaryotes may be downwardly biased when using WGS methods. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events only account for 15.0% of total indel events across the study bacteria (76/506; Dataset S4), suggesting that any underestimation of the number of large indel events should only have a small effect on uid.

To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied uid by Ge, approximating the latter by the proteome size of that organism. A plot of the logs of the two parameters, uidGe and Ne, against one another yields a strong negative correlation across all of cellular life (Figure 1A; r² = 0.89). Because the power of genetic drift is inversely proportional to Ne, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and Ne (Whitney and Garland 2010). However, the relationship between Ne and uidGe remains robust even after phylogenetic correction (Figure 2, A and B; r² = 0.83), indicating that the correlation between Ne and uidGe reflects a true biological phenomenon across the Tree of Life.
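The log-log relationship can be checked with an ordinary least-squares fit. The sketch below uses only the Python standard library; the tuples are uid, Ge, and Ne transcribed from Table 1 (in units of 10⁻¹⁰, 10⁷ sites, and 10⁶, with decimal points placed as read here), and the helper name `ols` is hypothetical. It does not attempt the phylogenetic correction, which requires the tree and the contrast software cited in Figure 2.

```python
import math

# (label, u_id [x 1e-10], G_e [x 1e7 sites], N_e [x 1e6]) from Table 1.
TABLE_1 = [
    ("Agt", 0.30, 0.50, 342.47), ("Bs", 1.20, 0.36, 61.19),
    ("Ec", 0.37, 0.39, 179.60), ("Mf", 23.10, 0.07, 1.07),
    ("Pa", 0.14, 0.59, 210.70), ("Se", 1.13, 0.21, 35.14),
    ("Vc", 0.18, 0.34, 478.26), ("At", 11.20, 4.21, 0.29),
    ("Ce", 6.69, 2.50, 0.54), ("Cr", 0.44, 3.92, 43.31),
    ("Dm", 4.61, 2.32, 0.86), ("Hs", 18.20, 3.65, 0.02),
    ("Mm", 3.10, 3.55, 1.77), ("Pt", 0.04, 5.68, 101.80),
    ("Sc", 0.92, 0.87, 7.78),
]


def ols(xs, ys):
    """Least-squares slope, intercept, and r^2 for y regressed on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


log_ne = [math.log10(ne * 1e6) for _, _, _, ne in TABLE_1]
log_load = [math.log10(u * 1e-10 * g * 1e7) for _, u, g, _ in TABLE_1]
slope, intercept, r2 = ols(log_ne, log_load)
print(f"log10(uid*Ge) = {intercept:.2f} + ({slope:.2f}) * log10(Ne), r^2 = {r2:.2f}")
```

With these inputs the fit should come out close to the regression reported for Figure 1A (a slope of about −0.73); small differences are expected because the table values are rounded.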

Table 1 Effective genome size (Ge), indel events per site per generation (uid), base-substitution mutation rate per generation (ubs), θs (or πs) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size (Ne) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details)

Units: Ge and Gc + Gnc, × 10⁷ sites; uid, × 10⁻¹⁰ events per site per generation; ubs, × 10⁻¹⁰ per site per generation; Ne, × 10⁶. Letters in brackets are footnotes.

Species                      Label  Ge    Gc + Gnc   uid        ubs           θs or πs  Ne
Prokaryotes
Agrobacterium tumefaciens    Agt    0.50  0.57       0.30       2.92          0.200     342.47
Bacillus subtilis            Bs     0.36  0.43       1.20 [d]   3.35 [d]      0.041     61.19
Escherichia coli             Ec     0.39  0.46       0.37 [e]   2.00 [e]      0.071     179.60
Mesoplasma florum            Mf     0.07  0.08       23.10 [f]  97.80 [f]     0.021     1.07
Pseudomonas aeruginosa       Pa     0.59  0.67       0.14 [g]   0.79 [g]      0.033     210.70
Staphylococcus epidermidis   Se     0.21  0.26       1.13       7.40          0.052     35.14
Vibrio cholerae              Vc     0.34  0.39       0.18       1.15          0.110     478.26
Eukaryotes
Arabidopsis thaliana         At     4.21  5.55 [a]   11.20 [h]  69.50 [h,p]   0.008     0.29
Caenorhabditis elegans       Ce     2.50  6.37 [b]   6.69 [i]   14.50 [q]     0.003     0.54
Chlamydomonas reinhardtii    Cr     3.92  5.51       0.44 [j]   3.80 [j]      0.032     43.31
Drosophila melanogaster      Dm     2.32  8.86 [c]   4.61 [k]   51.65 [k]     0.018     0.86
Homo sapiens                 Hs     3.65  21.75 [b]  18.20 [l]  135.13 [l]    0.001     0.02
Mus musculus                 Mm     3.55  27.17 [b]  3.10 [m]   54.00 [m]     0.004     1.77
Paramecium tetraurelia       Pt     5.68  7.28       0.04 [n]   0.19 [n]      0.008     101.80
Saccharomyces cerevisiae     Sc     0.87  1.02 [b]   0.92 [o]   2.63 [o]      0.004     7.78

Gc + Gnc is the effective genome size when including the total amount of coding (Gc) and noncoding DNA (Gnc) that is estimated to be under purifying selection. Footnotes in uid and ubs indicate data sources (rates pooled when multiple data sources are available), and when absent indicate data generated in this study (see Materials and Methods).
[a] Haudry et al. (2013).
[b] Siepel et al. (2005).
[c] Halligan et al. (2004).
[d] Sung et al. (2015).
[e] Lee et al. (2012).
[f] Sung et al. (2012a).
[g] Sung et al. (2012b).
[h] Ossowski et al. (2010).
[i] Lipinski et al. (2011).
[j] Sung et al. (2012a); Ness et al. (2015).
[k] Schrider et al. (2013).
[l] Conrad et al. (2011); O'Roak et al. (2011, 2012); Kong et al. (2012); Campbell and Eichler (2013); Wang and Zhu (2014); The 1000 Genomes Project Consortium (2015).
[m] Uchimura et al. (2015).
[n] Sung et al. (2012b).
[o] Lynch et al. (2008); Zhu et al. (2014).
[p] Ossowski et al. (2010); Yang et al. (2015).
[q] Lipinski et al. (2011).


DISCUSSION

Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, as well as alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters Ne, ubs, uid, and Ge, and then elaborate on the significance and implications of the relationship between uidGe and Ne for our understanding of molecular evolution.

First, we address the estimation of Ne, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of Ne must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of Ne derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation (uidGe) and long-term Ne, i.e., would operate against our ability to detect the expected signal of the DBH.

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby linked sites. Thus, to minimize sampling error, wherever possible we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of Neu from observations on silent sites (Watterson 1975). The utilization of an average θs across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated θs across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of Ne estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.
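For readers unfamiliar with the segregating-sites estimator, the sketch below implements the standard Watterson (1975) form, θ = S / (a_n L) with a_n = Σ 1/i for i = 1 to n − 1, where S is the number of segregating sites, n the number of sampled sequences, and L the number of sites surveyed. The function names and the toy alignment are hypothetical, and the input is assumed to have already been restricted to silent sites; the filtering steps used for the estimates in Table 1 are described in File S1 and are not reproduced here.

```python
def count_segregating_sites(sequences) -> int:
    """Number of alignment columns with more than one allele."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(n_segregating: int, n_sequences: int, n_sites: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L)."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return n_segregating / (a_n * n_sites)


# Toy alignment of three isolates (hypothetical data).
isolates = ["ACGT", "ACGA", "TCGT"]
s = count_segregating_sites(isolates)
print(s, watterson_theta(s, len(isolates), len(isolates[0])))
```

Because S is summed over every surveyed site, a localized sweep that removes variation from one region changes the genome-wide estimate only in proportion to that region's share of L, which is the variance-reduction argument made above.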

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and Ne could in part be spurious artifacts resulting from the use of estimates of Ne obtained by dividing measures of standing variation at silent sites by ubs (Sung et al. 2012a). If the sampling variance of ubs is substantial enough, this could lead to a negative correlation between the observed ubs and extrapolated Ne estimates, and if there were a sampling covariance between ubs and uid, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6, and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of ubs from WGS-MA studies is not large enough to explain the negative correlation previously seen between ubs and Ne estimates. Because ubs and uid are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of ubs would have to exceed that of the log-scaled values of Ne by two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa and the parametric values of mutation rates and Ne were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of πs and ubs (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that ubs and uid decline evolutionarily as Ne increases.
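The artifact at issue can be made concrete with a small simulation. The sketch below is an illustration written for this purpose, not the analysis in File S1: every parameter value and name is hypothetical. It gives all simulated species the same true mutation rate, adds sampling noise to the estimated rate, derives Ne by dividing a diversity value by twice that noisy rate (the haploid case), and reports the resulting correlation on the log scale.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spurious_correlation(n_species, sd_log_u, sd_log_diversity, seed=1):
    """Correlation between log10(estimated u) and log10(derived Ne).

    The true mutation rate is identical in every species, so any
    correlation arises purely from sampling noise in u shared by both axes.
    """
    rng = random.Random(seed)
    true_log_u, mean_log_diversity = -9.5, -2.0  # hypothetical values
    log_u_hat, log_ne_hat = [], []
    for _ in range(n_species):
        log_u = true_log_u + rng.gauss(0.0, sd_log_u)
        log_div = mean_log_diversity + rng.gauss(0.0, sd_log_diversity)
        log_u_hat.append(log_u)
        log_ne_hat.append(log_div - math.log10(2.0) - log_u)
    return pearson(log_u_hat, log_ne_hat)


# Noise in u dominates: a strong artifact appears.
print(spurious_correlation(500, sd_log_u=0.3, sd_log_diversity=0.0))
# Noise in u is small relative to the spread in diversity: the artifact is weak.
print(spurious_correlation(500, sd_log_u=0.05, sd_log_diversity=1.0))
```

The contrast between the two calls mirrors the argument in the text: a spurious negative correlation requires the sampling variance of the rate estimate to be large relative to the other sources of spread in the derived Ne values.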

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites (uid) and the number of sites under selective constraint in the genome (Ge, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness (Ge) scales

Figure 1 Relationship between the rate of indel events per generation per effective genome (uidGe) and effective population size (Ne). (A) Regression: log10(uidGe) = 2.23(0.48) − 0.73(0.07)log10Ne (r² = 0.89, P = 6.81 × 10⁻⁸, df = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size (Gc + Gnc) for eukaryotic organisms. Regression: log10[uid(Gc + Gnc)] = 3.49(0.66) − 0.87(0.09)log10Ne (r² = 0.87, P = 3.13 × 10⁻⁷, df = 13).


differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint (Gnc), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint (Gnc; see File S1) with that of coding DNA (Gc), we find that uid(Gc + Gnc) and Ne remain highly correlated (Figure 1B; r² = 0.87), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation, primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with Ne in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from 10⁻¹¹ in Paramecium tetraurelia (Sung et al. 2012b) to 10⁻⁸ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with Ne remains significant when confined to unicellular species (Figure 1; r² = 0.66, P = 0.003).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but just the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations, and if beneficial, fix with higher probabilities, a positive association between the mutation rate and Ne seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and Ne, for which we are unaware of any evidence).

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power

Figure 2 Relationship between indel events per site per generation (uidGe) and effective population size (Ne) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is uidGe = −0.60(0.07)Ne (r² = 0.83, P = 1.28 × 10⁻⁶, df = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.


of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by stΔUid, with ΔUid representing the change in the genome-wide indel mutation rate with respect to the population mean rate, s being the average reduction in fitness per mutation (Lynch 2010), and t being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). ΔUid can be approximated by the change in the indel mutation rate over the effective genome, or ΔuidGe (Lynch 2011). By setting stΔuidGe equal to the power of random genetic drift [1/Ne for haploids, 1/(2Ne) for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient (s = 0.01) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every 2 generations in sexually outcrossing species (t = 2) (Lynch 2010), solving stΔuidGe = 1/Ne [= 1/(2Ne) for diploids] for Δuid suggests that the average antimutator must reduce the indel mutation rate by greater than 0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of s and t are not well known and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency. Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
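The threshold calculation described above amounts to solving s · t · Δuid · Ge = 1/Ne (or 1/(2Ne) for diploids) for Δuid. The sketch below does this for a single species; the function names are hypothetical, the defaults s = 0.01 and t = 2 are the values quoted in the text, and the inputs are the Drosophila melanogaster row of Table 1 as read here (uid = 4.61 × 10⁻¹⁰, Ge = 2.32 × 10⁷ sites, Ne = 0.86 × 10⁶). Table S1 may use species-specific values of s and t that this sketch does not reproduce.

```python
def drift_power(ne: float, ploidy: int) -> float:
    """Power of random genetic drift: 1/Ne (haploid) or 1/(2Ne) (diploid)."""
    if ploidy == 1:
        return 1.0 / ne
    if ploidy == 2:
        return 1.0 / (2.0 * ne)
    raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")


def min_rate_reduction(ne: float, g_e: float, ploidy: int,
                       s: float = 0.01, t: float = 2.0) -> float:
    """Smallest per-site change in u_id whose selective effect equals drift.

    Solves s * t * delta_u * G_e = drift_power(Ne) for delta_u.
    """
    return drift_power(ne, ploidy) / (s * t * g_e)


# D. melanogaster inputs from Table 1 (diploid, sexually outcrossing).
u_id, g_e, ne = 4.61e-10, 2.32e7, 0.86e6
delta_u = min_rate_reduction(ne, g_e, ploidy=2)
print(f"delta u_id = {delta_u:.2e} per site per generation")
print(f"relative reduction = {delta_u / u_id:.2%}")
```

For this example the required reduction is a fraction of a percent of the current indel rate, in line with the range quoted above; because the result scales inversely with s and t, uncertainty in either parameter carries over directly to the threshold.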

Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between uid and Ne similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi and Buckley 2014; Lassalle et al. 2015), so in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and Ne appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the 1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with uid ranging from 18 to 119 of ubs, presumably because of the elevated deleterious effects of indel mutations). Yet despite these differences, both ubs and uid scale similarly with changes in Ne (Figure 3; r² = 0.89). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

ACKNOWLEDGMENTS

Support was provided by the Multidisciplinary University Research Initiative Award W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health Awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research. W.S., M.A., M.D., and T.P. performed the research. W.S. and M.A. analyzed the data, and W.S., M.A., and M.L. wrote the paper.

LITERATURE CITED

Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.
Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.
Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.
Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.
Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.
Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.
Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation (uid) and the base-substitution mutation rate per site per generation (ubs). Regression: log10(uid) = −1.56(0.74) + 0.91(0.08)log10ubs (r² = 0.90, P = 4.13 × 10⁻⁸, df = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.


Doroghazi J R and D H Buckley 2014 Intraspecies comparison ofStreptomyces pratensis genomes reveals high levels of recombination andgene conservation between strains of disparate geographic origin BMCGenomics 15 970

Drake J W 1991 A constant rate of spontaneous mutation in DNA-basedmicrobes Proc Natl Acad Sci USA 88 7160ndash7164

Drake J W B Charlesworth D Charlesworth and J F Crow 1998 Ratesof spontaneous mutation Genetics 148 1667ndash1686

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–24.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 COMPARE, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.

2590 | W Sung et al

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320, author reply 320–321.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: E2310–E2318.

Communicating editor: S. I. Wright

Volume 6 August 2016 | Drift Constrains Indel Rate | 2591


diluted and replated. Cell density was calculated from viable cell counts in both the growth conditions used throughout the bottleneck process and the growth conditions at 4°. The total number of generations for each MA line was calculated as the average number of cell divisions per transfer multiplied by the total number of transfers. If backup plates were used, the average number of cell divisions at 4° was used in place of the average number of cell divisions per bottleneck at standard growth temperatures.

The average numbers of cell divisions across the MA experiments are as follows (Dataset S1): A. tumefaciens, 5819; Bacillus subtilis, 5078 (Sung et al. 2015); E. coli, 4246 (Lee et al. 2012); Mesoplasma florum, 2351 (Sung et al. 2012a); S. epidermidis, 7170; and V. cholerae, 6453. The average number of generations used for reanalysis of the C. elegans MA study was 250 (Denver et al. 2009) (Dataset S2).
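The generation count described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that each colony grows from a single cell by binary fission, so that the number of divisions per transfer is log2 of the viable cell count; that assumption, the function names, and the example numbers are hypothetical and are not taken from the study.

```python
import math

def divisions_per_transfer(cells_per_colony):
    """Divisions needed to grow one cell into a colony (assumes binary fission from a single cell)."""
    return math.log2(cells_per_colony)

def total_generations(cells_per_colony, n_transfers):
    """Generations for one MA line: divisions per transfer multiplied by the number of transfers."""
    return divisions_per_transfer(cells_per_colony) * n_transfers

# Hypothetical example: colonies of 2**26 viable cells, 200 transfers
print(total_generations(2**26, 200))
```

If backup plates were used for some transfers, the same function could be called with the cell count measured under the 4° conditions for those transfers and the results summed.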

DNA extraction of MA lines was done using the Wizard DNA extraction kit (Promega) or lysis media (CTAB or SDS) followed by phenol/chloroform extractions to Illumina library standards. Then, 101-bp paired-end Illumina (Illumina Hi-Seq platform) sequencing was applied to randomly selected MA lines of A. tumefaciens, S. epidermidis, and V. cholerae. Each MA line was sequenced to a coverage depth of 100×, with an average library fragment size (distance between paired-end reads) of 175 bp. The paired-end reads for each MA line were individually mapped against the reference genome (assembly and annotation available from the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov) using two separate alignment algorithms: BWA v0.7.4 (Li and Durbin 2009) and NOVOALIGN v2.08.02 (available at www.novocraft.com). The resulting pileup files were converted to SAM format using SAMTOOLS v0.1.18 (Li et al. 2009). Using in-house perl scripts, the alignment information was further parsed to generate forward and reverse mapping information at each site, resulting in a configuration of eight numbers for each line (A, a, C, c, G, g, T, and t), corresponding to the number of reads mapped at each genomic position in the reference sequence. A separate file was also generated to display sites that had indel calls from the two alignment algorithms. Mutation calling was performed using a consensus method (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Lee et al. 2012; Sung et al. 2012a,b, 2015).
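The eight-number site configuration can be illustrated with a short Python sketch. The in-house perl scripts are not shown in the text, so this is only one plausible way to obtain the counts: it assumes the input is the read-bases column of a samtools pileup line, where "." and "," denote reference matches on the forward and reverse strand, uppercase and lowercase letters denote mismatches on the forward and reverse strand, "^" is followed by a mapping-quality character, and "+n"/"-n" introduce indels of length n. The function name is hypothetical.

```python
def strand_counts(ref_base, read_bases):
    """Count forward (uppercase) and reverse (lowercase) base calls at one site."""
    counts = dict.fromkeys("AaCcGgTt", 0)
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2  # skip the read-start marker and its mapping-quality character
            continue
        if ch in "+-":
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])  # skip the length digits and the indel sequence
            continue
        if ch == ".":
            key = ref          # forward-strand match to the reference
        elif ch == ",":
            key = ref.lower()  # reverse-strand match to the reference
        else:
            key = ch           # mismatch, or a symbol such as "$" or "*" that is ignored below
        if key in counts:
            counts[key] += 1
        i += 1
    return counts

# Example site with reference base A
print(strand_counts("A", "..,,^F.t$T+2AG,-1c*"))
```

Indel calls are deliberately skipped here, mirroring the text's statement that indel sites were written to a separate file.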

A random subset of base-substitution mutations called using these methods has been previously validated in E. coli and B. subtilis MA lines using fluorescent sequencing technology at the Indiana Molecular Biology Institute at Indiana University (Lee et al. 2012; Sung et al. 2015) (Dataset S3).

To verify indel mutations, we designed 38 primer sets to PCR amplify 300–500 bp regions surrounding the putative indel mutation in the B. subtilis MA lines (Dataset S4). All 29 of 29 short indels (<10 bp) were directly confirmed using standard fluorescent sequencing technology. Two out of nine large indels (>10 bp) were confirmed through sizing of the PCR product on gel electrophoresis. The remaining seven large indels did not amplify. For all cases, the indel was also confirmed to be absent in one other line without the mutation.

To calculate the base-substitution mutation rate per cell division for each line, we used the following equation:

$$u_{bs} = \frac{m}{nT}$$

where ubs is the base-substitution mutation rate (per nucleotide site per generation), m is the number of observed base substitutions, n is the number of nucleotide sites analyzed, and T is the number of generations that occurred in the mutation-accumulation study. The SE for an individual line is calculated using (Denver et al. 2004, 2009):

$$\mathrm{SE}_{x} = \sqrt{\frac{u_{bs}}{nT}}$$

The total SE of the base-substitution mutation rate is given by the SD of the mutation rates across all lines (s) divided by the square root of the number of lines analyzed (N):

$$\mathrm{SE}_{pooled} = \frac{s}{\sqrt{N}}$$

The same calculation was used to calculate the indel mutation rate, with ubs replaced with uid.
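As a worked illustration of these three formulas, the following Python sketch computes the per-line rate, the per-line SE, and the pooled SE. The function names and input numbers are hypothetical, and the sketch assumes that the SD across lines is the sample SD (n − 1 denominator), which the text does not specify.

```python
import math
import statistics

def mutation_rate(m, n, T):
    """Mutations per site per generation: m observed mutations over n sites and T generations."""
    return m / (n * T)

def line_se(u, n, T):
    """Standard error of the rate for a single MA line."""
    return math.sqrt(u / (n * T))

def pooled_se(rates):
    """SD of per-line rates divided by sqrt(number of lines); sample SD is an assumption."""
    return statistics.stdev(rates) / math.sqrt(len(rates))

# Hypothetical line: 10 mutations, 5 Mb of analyzed sites, 5000 generations
u = mutation_rate(10, 5_000_000, 5000)
print(u, line_se(u, 5_000_000, 5000))

# Hypothetical per-line rates for three lines
print(pooled_se([1e-10, 2e-10, 3e-10]))
```

The same functions apply to indels by passing the number of observed indel events as m.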

Data availability

Illumina DNA sequences for the MA lines used in this study are deposited under the following Bioprojects: A. tumefaciens, PRJNA256312; B. subtilis, PRJNA256312; M. florum, PRJNA256337; S. epidermidis, PRJNA256338; and V. cholerae, PRJNA256339.

File S1 contains detailed descriptions of eukaryotic uid estimates, as well as calculations for Ge, Gnc, θs, πs, and phylogenetic independent contrasts for both eukaryotic and prokaryotic organisms. Figure S1 contains average depth of sequencing coverage for each MA line in A. tumefaciens, S. epidermidis, and V. cholerae. Figure S2 displays the similarity in θs when increasing the number of unique alleles analyzed. Figure S3 shows the frequency distribution of mutant calls across MA lines. Table S1 contains the calculation for the estimated limit of selection to fix antimutators. Figure S4, Figure S5, Figure S6, and Table S2 contain statistical support for the DBH. Dataset S1, Dataset S2, Dataset S3, and Dataset S4 contain single nucleotide polymorphisms and indels for prokaryotic and eukaryotic organisms generated in this study.

RESULTS

To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. WGS has greatly improved our ability to estimate such parameters. Highly accurate measurements of ubs and uid can be obtained through WGS of MA lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a,b, 2015; Schrider et al. 2013).

The power of genetic drift is related to the inverse of the effective population size [1/Ne for haploids, 1/(2Ne) for diploids]. Under the assumption of neutrality, the effective population size (Ne) can be estimated from the average nucleotide heterozygosity at silent sites in natural populations (πs), or as a function of the number of segregating sites in the population (θs), both of which lead to expected values equal to 4Neubs in diploids and 2Neubs in haploids (Kimura 1983). For most organisms analyzed in this study, enough WGS data were available to allow calculation of species-specific θs values (see File S1 and Table 1). For the remaining species, we pooled large available multilocus-sequence studies to estimate πs. In all cases, we set the estimates of θs or πs equal to 4Neubs in diploids (2Neubs in haploids), and solved for Ne by factoring out ubs. Because this calculation only involves ubs, the estimate of Ne is uninfluenced by sampling error in uid, thus providing an independent trait measurement by which to test the DBH (see File S1 for further evaluation of the nonindependence issue).
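The calculation of Ne from silent-site diversity can be written out as a short Python sketch. The function name is hypothetical, and the example uses the A. tumefaciens values as read from Table 1 (silent-site diversity 0.200, ubs = 2.92 × 10⁻¹⁰, treated as haploid), so the decimal placement of those inputs is an assumption of this sketch.

```python
def effective_population_size(diversity, u_bs, ploidy):
    """Solve diversity = 2*Ne*u (haploid) or 4*Ne*u (diploid) for Ne."""
    if ploidy == "haploid":
        factor = 2
    elif ploidy == "diploid":
        factor = 4
    else:
        raise ValueError("ploidy must be 'haploid' or 'diploid'")
    return diversity / (factor * u_bs)

# A. tumefaciens values as read from Table 1 (assumed decimal placement)
ne = effective_population_size(0.200, 2.92e-10, "haploid")
print(ne / 1e6)  # Ne in units of 10^6
```

Because only the base-substitution rate enters this calculation, an error in the indel rate leaves the Ne estimate unchanged, which is the independence property the text relies on.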

To provide additional data for testing whether the power of genetic drift constrains the lower limit of indel mutation-rate evolution, we performed MA experiments in A. tumefaciens str. C58, S. epidermidis ATCC 12228, and V. cholerae 2740-80. Each bacterial MA experiment was initiated from multiple lines derived from a single progenitor colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively [see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively]. Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines; Dataset S1). The average sequencing coverage depth is greater than 20× per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50× per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of ubs and uid. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with ubs and uid shown in Table 1.

To test the DBH, we combined ubs and uid from the three bacterial species analyzed in this study with ubs and uid from four bacterial and eight eukaryotic MA WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human derived from WGS of parent-offspring trios. uid includes all indel events in each of the 15 study species (see File S1). Due to the highly repetitive DNA sequence in eukaryotic genomes, the number of large indel events (>9 bp) in eukaryotes may be downwardly biased when using WGS methods. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events only account for 15.0% of total indel events across the study bacteria (76/506; Dataset S4), suggesting that any underestimation of the number of large indel events should only have a small effect on uid.

To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied uid with Ge, approximating the latter by the proteome size of that organism. A plot of the logs of the two parameters, uidGe and Ne, against one another yields a strong negative correlation across all of cellular life (Figure 1A; r² = 0.89). Because the power of genetic drift is inversely proportional to Ne, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and Ne (Whitney and Garland 2010). However, the relationship between Ne and uidGe remains robust even after phylogenetic correction (Figure 2, A and B; r² = 0.83), indicating that the correlation between Ne and uidGe reflects a true biological phenomenon across the Tree of Life.
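The log-log relationship described above can be reproduced approximately with an ordinary least-squares fit. The Python sketch below is illustrative only: the values are transcribed from Table 1 with the decimal placement assumed, the variable and function names are hypothetical, and no phylogenetic correction is applied, so the fitted coefficients need not match the published regression exactly.

```python
import math

# (label, Ge in 10^7 sites, uid in 10^-10 per site per generation, Ne in 10^6),
# transcribed from Table 1; decimal placement is an assumption of this sketch
TABLE1 = [
    ("Agt", 0.50, 0.30, 342.47), ("Bs", 0.36, 1.20, 61.19), ("Ec", 0.39, 0.37, 179.60),
    ("Mf", 0.07, 23.10, 1.07), ("Pa", 0.59, 0.14, 210.70), ("Se", 0.21, 1.13, 35.14),
    ("Vc", 0.34, 0.18, 478.26), ("At", 4.21, 11.20, 0.29), ("Ce", 2.50, 6.69, 0.54),
    ("Cr", 3.92, 0.44, 43.31), ("Dm", 2.32, 4.61, 0.86), ("Hs", 3.65, 18.20, 0.02),
    ("Mm", 3.55, 3.10, 0.177), ("Pt", 5.68, 0.04, 101.80), ("Sc", 0.87, 0.92, 7.78),
]

def log_points(rows):
    """Return log10(Ne) and log10(uid * Ge) for each species."""
    xs = [math.log10(ne * 1e6) for _, _, _, ne in rows]
    ys = [math.log10(uid * 1e-10 * ge * 1e7) for _, ge, uid, _ in rows]
    return xs, ys

def ols(xs, ys):
    """Ordinary least squares of y on x; returns slope, intercept, and r squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

slope, intercept, r2 = ols(*log_points(TABLE1))
print(slope, intercept, r2)
```

A negative slope on this scale is what the drift-barrier interpretation predicts: species with larger Ne carry a lower genome-wide indel rate.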

Table 1 Effective genome size (Ge), indel events per site per generation (uid), base-substitution mutation rate per generation (ubs), θs (or πs denoted by ) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size (Ne) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details)

| Species | Label | Ge (× 10⁷ sites) | Gc + Gnc (× 10⁷ sites) | uid (× 10⁻¹⁰ per site per generation) | ubs (× 10⁻¹⁰ events per site per generation) | θs or πs | Ne (× 10⁶) |
|---|---|---|---|---|---|---|---|
| Prokaryotes | | | | | | | |
| Agrobacterium tumefaciens | Agt | 0.50 | 0.57 | 0.30 | 2.92 | 0.200 | 342.47 |
| Bacillus subtilis | Bs | 0.36 | 0.43 | 1.20 [d] | 3.35 [d] | 0.041 | 61.19 |
| Escherichia coli | Ec | 0.39 | 0.46 | 0.37 [e] | 2.00 [e] | 0.071 | 179.60 |
| Mesoplasma florum | Mf | 0.07 | 0.08 | 23.10 [f] | 97.80 [f] | 0.021 | 1.07 |
| Pseudomonas aeruginosa | Pa | 0.59 | 0.67 | 0.14 [g] | 0.79 [g] | 0.033 | 210.70 |
| Staphylococcus epidermidis | Se | 0.21 | 0.26 | 1.13 | 7.40 | 0.052 | 35.14 |
| Vibrio cholerae | Vc | 0.34 | 0.39 | 0.18 | 1.15 | 0.110 | 478.26 |
| Eukaryotes | | | | | | | |
| Arabidopsis thaliana | At | 4.21 | 5.55 [a] | 11.20 [h] | 69.50 [h,p] | 0.008 | 0.29 |
| Caenorhabditis elegans | Ce | 2.50 | 6.37 [b] | 6.69 [i] | 14.50 [q] | 0.003 | 0.54 |
| Chlamydomonas reinhardtii | Cr | 3.92 | 5.51 | 0.44 [j] | 3.80 [j] | 0.032 | 43.31 |
| Drosophila melanogaster | Dm | 2.32 | 8.86 [c] | 4.61 [k] | 51.65 [k] | 0.018 | 0.86 |
| Homo sapiens | Hs | 3.65 | 21.75 [b] | 18.20 [l] | 135.13 [l] | 0.001 | 0.02 |
| Mus musculus | Mm | 3.55 | 27.17 [b] | 3.10 [m] | 54.00 [m] | 0.004 | 0.177 |
| Paramecium tetraurelia | Pt | 5.68 | 7.28 | 0.04 [n] | 0.19 [n] | 0.008 | 101.80 |
| Saccharomyces cerevisiae | Sc | 0.87 | 1.02 [b] | 0.92 [o] | 2.63 [o] | 0.004 | 7.78 |

Gc + Gnc is the effective genome size when including the total amount of coding (Gc) and noncoding DNA (Gnc) that is estimated to be under purifying selection. Footnotes in uid and ubs indicate data sources (rates pooled when multiple data sources are available), and when absent indicate data generated in this study (see Materials and Methods).

[a] Haudry et al. (2013)
[b] Siepel et al. (2005)
[c] Halligan et al. (2004)
[d] Sung et al. (2015)
[e] Lee et al. (2012)
[f] Sung et al. (2012a)
[g] Sung et al. (2012b)
[h] Ossowski et al. (2010)
[i] Lipinski et al. (2011)
[j] Sung et al. (2012a), Ness et al. (2015)
[k] Schrider et al. (2013)
[l] Conrad et al. (2011), O'Roak et al. (2011, 2012), Kong et al. (2012), Campbell and Eichler (2013), Wang and Zhu (2014), The 1000 Genomes Project Consortium (2015)
[m] Uchimura et al. (2015)
[n] Sung et al. (2012b)
[o] Lynch et al. (2008), Zhu et al. (2014)
[p] Ossowski et al. (2010), Yang et al. (2015)
[q] Lipinski et al. (2011)


DISCUSSION

Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, and also to consider alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters Ne, ubs, uid, and Ge, and then elaborate on the significance and implications of the relationship between uidGe and Ne for our understanding of molecular evolution.

First, we address the estimation of Ne, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of Ne must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, this long-term mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of Ne derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation (uidGe) and long-term Ne, i.e., would operate against our ability to detect the expected signal of the DBH.

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby linked sites. Thus, to minimize sampling error, wherever possible, we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of Neu from observations on silent sites (Watterson 1975). The utilization of an average θs across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated θs across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of Ne estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and Ne could, in part, be spurious artifacts resulting from the use of estimates of Ne obtained by dividing measures of standing variation at silent sites by ubs (Sung et al. 2012a). If the sampling variance of ubs is substantial enough, this could lead to a negative correlation between the observed ubs and extrapolated Ne estimates, and if there were a sampling covariance between ubs and uid, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6, and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of ubs from WGS-MA studies is not large enough to explain the negative correlation previously seen between ubs and Ne estimates. Because ubs and uid are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of ubs would have to exceed that of the log-scaled values of Ne by two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa and the parametric values of mutation rates and Ne were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of πs and ubs (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that ubs and uid decline evolutionarily as Ne increases.

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites (uid) and the number of sites under selective constraint in the genome (Ge, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness (Ge) scales differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint (Gnc), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint (Gnc; see File S1) with that of coding DNA (Gc), we find that uid(Gc + Gnc) and Ne remain highly correlated (Figure 1B; r² = 0.87), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

Figure 1 Relationship between the rate of indel events per generation per effective genome (uidGe) and effective population size (Ne). (A) Regression: log10(uidGe) = 2.23(0.48) − 0.73(0.07)log10Ne (r² = 0.89, P = 6.81 × 10⁻⁸, df = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size (Gc + Gnc) for eukaryotic organisms. Regression: log10[uid(Gc + Gnc)] = 3.49(0.66) − 0.87(0.09)log10Ne (r² = 0.87, P = 3.13 × 10⁻⁷, df = 13).

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with Ne in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from 10⁻¹¹ in Paramecium tetraurelia (Sung et al. 2012b) to 10⁻⁸ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with Ne remains significant when confined to unicellular species (Figure 1; r² = 0.66, P = 0.003).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but just the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations, and if beneficial mutations fix with higher probabilities, a positive association between the mutation rate and Ne seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and Ne, for which we are unaware of any evidence).

Figure 2 Relationship between indel events per site per generation (uidGe) and effective population size (Ne) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is uidGe = −0.60(0.07)Ne (r² = 0.83, P = 1.28 × 10⁻⁶, df = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by stΔUid, with ΔUid representing the change in the genome-wide indel mutation rate with respect to the population mean rate, s being the average reduction in fitness per mutation (Lynch 2010), and t being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). ΔUid can be approximated by the change in the indel mutation rate over the effective genome, or ΔuidGe (Lynch 2011). By setting stΔuidGe equal to the power of random genetic drift [1/Ne for haploids, 1/(2Ne) for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient (s = 0.01) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every 2 generations in sexually outcrossing species (t = 2) (Lynch 2010), solving stΔuidGe = 1/Ne [= 1/(2Ne) for diploids] for Δuid suggests that the average antimutator must reduce the indel mutation rate by greater than 0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of s and t are not well known and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency. Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
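The threshold calculation in this paragraph can be made concrete with a small Python sketch that solves stΔuidGe = 1/Ne (or 1/(2Ne) for diploids) for Δuid and expresses it as a fraction of uid. The function name is hypothetical, and the example inputs are the Drosophila melanogaster values as read from Table 1 with assumed decimal placement (uid = 4.61 × 10⁻¹⁰, Ge = 2.32 × 10⁷, Ne = 0.86 × 10⁶), together with the s = 0.01 and t = 2 quoted in the text; it is not a reproduction of Table S1.

```python
def min_antimutator_effect(ne, ge, s=0.01, t=2, diploid=True):
    """Smallest change in the per-site indel rate whose selective effect equals the power of drift."""
    drift = 1 / (2 * ne) if diploid else 1 / ne
    return drift / (s * t * ge)

# D. melanogaster values as read from Table 1 (assumed decimal placement)
u_id = 4.61e-10
delta = min_antimutator_effect(ne=0.86e6, ge=2.32e7)
print(delta, delta / u_id)  # absolute threshold and threshold as a fraction of uid
```

For asexual species a larger t would apply, since a modifier stays linked to its background for longer, which lowers the threshold accordingly.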

Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between uid and Ne similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi et al. 2014; Lassalle et al. 2015), so in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and $N_e$ appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the 1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with $u_{id}$ ranging from 18 to 119 of $u_{bs}$, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both $u_{bs}$ and $u_{id}$ scale similarly with changes in $N_e$ (Figure 3, $r^2 = 0.89$). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

ACKNOWLEDGMENTS

Support was provided by the Multidisciplinary University Research Initiative Award W911NF-09-1-0444 and from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health Awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant no. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.

LITERATURE CITED

Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.

Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.

Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.

Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.

Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.

Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation ($u_{id}$) and the base-substitution mutation rate per site per generation ($u_{bs}$). Regression: $\log_{10}(u_{id}) = -1.56(0.74) + 0.91(0.08)\log_{10}u_{bs}$ ($r^2 = 0.90$, $P = 4.13 \times 10^{-8}$, df = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.

Volume 6 August 2016 | Drift Constrains Indel Rate | 2589

Doroghazi, J. R., and D. H. Buckley, 2014 Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.

Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.

Drake, J. W., B. Charlesworth, D. Charlesworth, and J. F. Crow, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–24.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: e2774–e2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.

2590 | W Sung et al

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320, author reply 320–311.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: e2310–e2318.

Communicating editor: S. I. Wright



colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively (see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively). Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines; Dataset S1). The average sequencing coverage depth is greater than 20× per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50× per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of $u_{bs}$ and $u_{id}$. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with $u_{bs}$ and $u_{id}$ shown in Table 1.

To test the DBH, we combined $u_{bs}$ and $u_{id}$ from the three bacterial species analyzed in this study with $u_{bs}$ and $u_{id}$ from four bacterial and eight eukaryotic MA-WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human derived from WGS of parent-offspring trios. $u_{id}$ includes all indel events in each of the 15 study species (see File S1). Due to the highly repetitive DNA sequence in eukaryotic genomes, the number of large indel events ( 9 bp) in eukaryotes may be downwardly biased when using WGS methods. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events only account for 15.0% of total indel events across the study bacteria (76/506; Dataset S4), suggesting that any underestimation of the number of large indel events should only have a small effect on $u_{id}$.

To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied $u_{id}$ by $G_e$, approximating the latter by the proteome size of that organism. A plot of the logs of the two parameters $u_{id}G_e$ and $N_e$ against one another yields a strong negative correlation across all of cellular life (Figure 1A, $r^2 = 0.89$). Because the power of genetic drift is inversely proportional to $N_e$, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and $N_e$ (Whitney and Garland 2010). However, the relationship between $N_e$ and $u_{id}G_e$ remains robust even after phylogenetic correction (Figure 2, A and B, $r^2 = 0.83$), indicating that the correlation between $N_e$ and $u_{id}G_e$ reflects a true biological phenomenon across the Tree of Life.
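The log-log regression described here can be sketched in a few lines of Python. This is a minimal ordinary least-squares fit, not the authors' analysis pipeline: the helper `ols` and the `TABLE1` list are hypothetical names, and the numbers are the $G_e$, $u_{id}$, and $N_e$ values as transcribed from Table 1, with decimal placement assumed from the column units.

```python
import math

# (label, G_e in 1e7 sites, u_id in 1e-10 per site per generation, N_e in 1e6)
TABLE1 = [
    ("Agt", 0.50, 0.30, 342.47), ("Bs", 0.36, 1.20, 61.19),
    ("Ec", 0.39, 0.37, 179.60), ("Mf", 0.07, 23.10, 1.07),
    ("Pa", 0.59, 0.14, 210.70), ("Se", 0.21, 1.13, 35.14),
    ("Vc", 0.34, 0.18, 478.26), ("At", 4.21, 11.20, 0.29),
    ("Ce", 2.50, 6.69, 0.54), ("Cr", 3.92, 0.44, 43.31),
    ("Dm", 2.32, 4.61, 0.86), ("Hs", 3.65, 18.20, 0.02),
    ("Mm", 3.55, 3.10, 1.77), ("Pt", 5.68, 0.04, 101.80),
    ("Sc", 0.87, 0.92, 7.78),
]


def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns r^2 too."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# x = log10(N_e), y = log10(u_id * G_e), both converted to absolute units.
xs = [math.log10(ne * 1e6) for _, _, _, ne in TABLE1]
ys = [math.log10(uid * 1e-10 * ge * 1e7) for _, ge, uid, _ in TABLE1]

slope, intercept, r2 = ols(xs, ys)
print(f"log10(u_id * G_e) = {intercept:.2f} + ({slope:.2f}) * log10(N_e), r^2 = {r2:.2f}")
```

With these inputs the fit should land close to the slope of $-0.73$ and the $r^2$ of 0.89 reported for Figure 1A. The sketch ignores phylogenetic nonindependence, which the paper addresses separately with independent contrasts.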

Table 1 Effective genome size ($G_e$), indel events per site per generation ($u_{id}$), base-substitution mutation rate per generation ($u_{bs}$), $\theta_s$ (or $\pi_s$) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size ($N_e$) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details)

| Species | Label | $G_e$ ($\times 10^{7}$ sites) | $G_c + G_{nc}$ ($\times 10^{7}$ sites) | $u_{id}$ ($\times 10^{-10}$ per site per generation) | $u_{bs}$ ($\times 10^{-10}$ events per site per generation) | $\theta_s$ or $\pi_s$ | $N_e$ ($\times 10^{6}$) |
|---|---|---|---|---|---|---|---|
| Prokaryotes | | | | | | | |
| Agrobacterium tumefaciens | Agt | 0.50 | 0.57 | 0.30 | 2.92 | 0.200 | 342.47 |
| Bacillus subtilis | Bs | 0.36 | 0.43 | 1.20^d | 3.35^d | 0.041 | 61.19 |
| Escherichia coli | Ec | 0.39 | 0.46 | 0.37^e | 2.00^e | 0.071 | 179.60 |
| Mesoplasma florum | Mf | 0.07 | 0.08 | 23.10^f | 97.80^f | 0.021 | 1.07 |
| Pseudomonas aeruginosa | Pa | 0.59 | 0.67 | 0.14^g | 0.79^g | 0.033 | 210.70 |
| Staphylococcus epidermidis | Se | 0.21 | 0.26 | 1.13 | 7.40 | 0.052 | 35.14 |
| Vibrio cholerae | Vc | 0.34 | 0.39 | 0.18 | 1.15 | 0.110 | 478.26 |
| Eukaryotes | | | | | | | |
| Arabidopsis thaliana | At | 4.21 | 5.55^a | 11.20^h | 69.50^h,p | 0.008 | 0.29 |
| Caenorhabditis elegans | Ce | 2.50 | 6.37^b | 6.69^i | 14.50^q | 0.003 | 0.54 |
| Chlamydomonas reinhardtii | Cr | 3.92 | 5.51 | 0.44^j | 3.80^j | 0.032 | 43.31 |
| Drosophila melanogaster | Dm | 2.32 | 8.86^c | 4.61^k | 51.65^k | 0.018 | 0.86 |
| Homo sapiens | Hs | 3.65 | 21.75^b | 18.20^l | 135.13^l | 0.001 | 0.02 |
| Mus musculus | Mm | 3.55 | 27.17^b | 3.10^m | 54.00^m | 0.004 | 1.77 |
| Paramecium tetraurelia | Pt | 5.68 | 7.28 | 0.04^n | 0.19^n | 0.008 | 101.80 |
| Saccharomyces cerevisiae | Sc | 0.87 | 1.02^b | 0.92^o | 2.63^o | 0.004 | 7.78 |

$G_c + G_{nc}$ is the effective genome size when including the total amount of coding ($G_c$) and noncoding DNA ($G_{nc}$) that is estimated to be under purifying selection. Footnotes in $u_{id}$ and $u_{bs}$ indicate data sources (rates pooled when multiple data sources are available), and when absent indicate data generated in this study (see Materials and Methods).

a Haudry et al. (2013).
b Siepel et al. (2005).
c Halligan et al. (2004).
d Sung et al. (2015).
e Lee et al. (2012).
f Sung et al. (2012a).
g Sung et al. (2012b).
h Ossowski et al. (2010).
i Lipinski et al. (2011).
j Sung et al. (2012a); Ness et al. (2015).
k Schrider et al. (2013).
l Conrad et al. (2011); O'Roak et al. (2011, 2012); Kong et al. (2012); Campbell and Eichler (2013); Wang and Zhu (2014); The 1000 Genomes Project Consortium (2015).
m Uchimura et al. (2015).
n Sung et al. (2012b).
o Lynch et al. (2008); Zhu et al. (2014).
p Ossowski et al. (2010); Yang et al. (2015).
q Lipinski et al. (2011).


DISCUSSION

Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, and also alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters $N_e$, $u_{bs}$, $u_{id}$, and $G_e$, and then elaborate on the significance and implications of the relationship between $u_{id}G_e$ and $N_e$ for our understanding of molecular evolution.

First, we address the estimation of $N_e$, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of $N_e$ must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of $N_e$ derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation ($u_{id}G_e$) and long-term $N_e$, i.e., would operate against our ability to detect the expected signal of the DBH.

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby linked sites. Thus, to minimize sampling error wherever possible, we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of $N_e u$ from observations on silent sites (Watterson 1975). The utilization of an average $\theta_s$ across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated $\theta_s$ across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of $N_e$ estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.
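To make the estimation chain concrete, the sketch below computes Watterson's per-site estimator from a count of segregating sites and then converts silent-site diversity into $N_e$. It assumes the standard neutral expectations $\theta = 2N_e u$ for haploids and $\theta = 4N_e u$ for diploids; the exact procedure used in the study is described in File S1, so the function names and the toy sample (10 segregating sites, 5 sequences, 1000 sites) are illustrative assumptions only.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Watterson's estimator per site: S / (a_n * L),
    where a_n = sum of 1/i for i = 1 .. n-1."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


def effective_population_size(theta, u_bs, diploid):
    """N_e from silent-site diversity, assuming theta = 4*N_e*u for
    diploids and theta = 2*N_e*u for haploids."""
    return theta / ((4.0 if diploid else 2.0) * u_bs)


# Toy sample (hypothetical numbers): 10 segregating sites among
# 5 sequences aligned over 1000 silent sites.
theta_example = watterson_theta(segregating_sites=10, n_sequences=5, n_sites=1000)

# E. coli values from Table 1: theta_s = 0.071, u_bs = 2.00e-10 (haploid).
ne_ecoli = effective_population_size(theta=0.071, u_bs=2.00e-10, diploid=False)

print(f"theta_W = {theta_example:.4f} per site")
print(f"N_e (E. coli) = {ne_ecoli:.3g}")
```

The E. coli result is within about 1% of the tabulated $N_e$; the small gap is consistent with rounding of $\theta_s$ in Table 1.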

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and $N_e$ could in part be spurious artifacts resulting from the use of estimates of $N_e$ obtained by dividing measures of standing variation at silent sites by $u_{bs}$ (Sung et al. 2012a). If the sampling variance of $u_{bs}$ is substantial enough, this could lead to a negative correlation between the observed $u_{bs}$ and extrapolated $N_e$ estimates, and if there were a sampling covariance between $u_{bs}$ and $u_{id}$, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6, and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of $u_{bs}$ from WGS-MA studies is not large enough to explain the negative correlation previously seen between $u_{bs}$ and $N_e$ estimates. Because $u_{bs}$ and $u_{id}$ are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of $u_{bs}$ would have to exceed that of the log-scaled values of $N_e$ by two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa, and the parametric values of mutation rates and $N_e$ were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of $\pi_s$ and $u_{bs}$ (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that $u_{bs}$ and $u_{id}$ decline evolutionarily as $N_e$ increases.
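The logic of this artifact can be explored with a small simulation. The sketch below is not the supplemental analysis: it assumes, purely for illustration, that true $\log_{10}u_{bs}$ and true $\log_{10}N_e$ are independent normal variables (so there is no real relationship), that $\theta = 2N_e u$, and that the only error is log-normal sampling noise in the estimate of $u_{bs}$. The means, standard deviations, number of simulated taxa, and function names are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def artifact_correlation(sd_sampling, sd_evolutionary=1.0, n_taxa=20000, seed=1):
    """Correlation between log10 estimated u_bs and log10 extrapolated N_e
    when the true u_bs and true N_e are unrelated."""
    rng = random.Random(seed)
    log_u_hat, log_ne_hat = [], []
    for _ in range(n_taxa):
        log_u = rng.gauss(-9.5, sd_evolutionary)  # true mutation rate
        log_ne = rng.gauss(7.0, sd_evolutionary)  # true N_e, independent of u
        noise = rng.gauss(0.0, sd_sampling)       # sampling error in u_bs
        log_theta = math.log10(2.0) + log_ne + log_u  # theta = 2 * N_e * u
        est_log_u = log_u + noise
        log_u_hat.append(est_log_u)
        # Extrapolated N_e = theta / (2 * estimated u_bs)
        log_ne_hat.append(log_theta - math.log10(2.0) - est_log_u)
    return pearson(log_u_hat, log_ne_hat)


r_small = artifact_correlation(sd_sampling=0.1)  # sampling sd << evolutionary sd
r_large = artifact_correlation(sd_sampling=3.0)  # sampling sd >> evolutionary sd
print(f"small sampling error: r = {r_small:.3f}")
print(f"large sampling error: r = {r_large:.3f}")
```

Under these assumptions a strong spurious negative correlation appears only when the sampling variance is large relative to the evolutionary variance, which is the condition the text argues is not met for WGS-MA estimates.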

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites ($u_{id}$) and the number of sites under selective constraint in the genome ($G_e$, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness ($G_e$) scales differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint ($G_{nc}$), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint ($G_{nc}$, see File S1) with that of coding DNA ($G_c$), we find that $u_{id}(G_c + G_{nc})$ and $N_e$ remain highly correlated (Figure 1B, $r^2 = 0.87$), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

Figure 1 Relationship between the rate of indel events per generation per effective genome ($u_{id}G_e$) and effective population size ($N_e$). (A) Regression: $\log_{10}(u_{id}G_e) = 2.23(0.48) - 0.73(0.07)\log_{10}N_e$ ($r^2 = 0.89$, $P = 6.81 \times 10^{-8}$, df = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size ($G_c + G_{nc}$) for eukaryotic organisms. Regression: $\log_{10}[u_{id}(G_c + G_{nc})] = 3.49(0.66) - 0.87(0.09)\log_{10}N_e$ ($r^2 = 0.87$, $P = 3.13 \times 10^{-7}$, df = 13).

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with $N_e$ in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from $10^{-11}$ in Paramecium tetraurelia (Sung et al. 2012b) to $10^{-8}$ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with $N_e$ remains significant when confined to unicellular species (Figure 1, $r^2 = 0.66$, $P = 0.003$).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but just the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations and, if beneficial, fix with higher probabilities, a positive association between the mutation rate and $N_e$ seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and $N_e$, for which we are unaware of any evidence).

The DBH proposes that new alleles that reduce the genome-wideindel mutation rate (ie anti-mutators) can be promoted by selectiononly if they provide a significant enough advantage to offset the power

Figure 2 Relationship between indel events per site per generation (uidGe) and effective population size (Ne) after phylogenetic correction (A)Standardized phylogenetically independent contrasts performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al1993) with branch lengths of 10 The regression equation of the contrasts through the origin is uidGe = ndash060(007)Ne (r2 = 083 P = 128 middot 1026df = 13) with SE in parentheses (B) Phylogenetic tree showing the relationship between organisms

2588 | W Sung et al

of genetic drift The average selective effect of an antimutator or muta-tor allele (which operate opposite to each other) can be approximatedby stΔUid with ΔUid representing the change in the genome-wide indelmutation rate with respect to the population mean rate s being theaverage reduction in fitness per mutation (Lynch 2010) and t being thenumber of generations a mutation remains associated with its mutatorgenetic background (Lynch 2011) ΔUid can be approximated by thechange in the indel mutation rate over the effective genome or ΔuidGe

(Lynch 2011) By setting stΔuidGe equal to the power of random geneticdrift [1Ne for haploids 1(2Ne) for diploids] we can acquire somesense of the average reduction in the indelmutation rate that is requiredfor the power of selection to exceed power of genetic drift Usingestimates of an average value of the selective coefficient (s = 001)(Lynch et al 1999 Eyre-Walker and Keightley 2007) and assumingthat free recombination unlinks mutation-rate modifier alleles fromtheir background every 2 generations in sexually outcrossing species(t = 2) (Lynch 2010) solving stΔuidGe = 1Ne [= 1(2Ne) for dip-loids] for Δuid suggests that the average antimutator must reduce theindel mutation rate by greater than01ndash1 in most organisms (TableS1) in order to be promoted by selection One major limitation of thiskind of analysis is that values of s and t are not well known and arelikely vary across organisms A second and equally important caveat isthat the prior analysis assumes that mutator and antimutator allelesarise with equal frequency Owing to the high level of refinement of thereplication and repair machinery it seems much more likely that mu-tations involving the components of such machinery will increaserather than decrease the mutation rate This will push the equilibriummutation rate to higher levels than expected (Lynch 2008) althoughwithout quantitative information on such bias it is difficult to deter-mine the exact position at which the mutation rate will stall

Finallywenote thatbecause recombinationunlinksalleles fromtheirgenetic background the capacity of selection to enhance replicationfidelity is ultimately a function of the recombination rate (Kimura 1967Lynch 2008) Thus it may be viewed as surprising that bacteria whichdo not undergo meiotic recombination exhibit a relationship between

uid and Ne similar to that in eukaryotic species engaging in periodic toregular meiosis (Figure 1 A and B) It should be noted however thatbacterial recombination occurs through multiple mechanisms (trans-formation conjugation andor transduction) Many bacterial speciesare known to naturally undergo high rates of recombination with ratiosof recombination to mutation rates frequently being comparable tothose in multicellular eukaryotes (Feil and Spratt 2001 Lynch 2007Doroghazi et al 2014 Lassalle et al 2015) so in this sense comparablebehavior of bacterial and eukaryotic species is not unexpected

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and $N_e$ appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the 1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with $u_{id}$ ranging from 1.8 to 11.9% of $u_{bs}$, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both $u_{bs}$ and $u_{id}$ scale similarly with changes in $N_e$ (Figure 3; $r^2 = 0.89$). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

ACKNOWLEDGMENTS
Support was provided by Multidisciplinary University Research Initiative Award W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health Awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.

LITERATURE CITED
Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.

Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.

Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.

Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.

Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.

Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation ($u_{id}$) and the base-substitution mutation rate per site per generation ($u_{bs}$). Regression: $\log_{10}(u_{id}) = -1.56\,(0.74) + 0.91\,(0.08)\log_{10} u_{bs}$ ($r^2 = 0.90$, $P = 4.13 \times 10^{-8}$, df = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.

Volume 6 August 2016 | Drift Constrains Indel Rate | 2589

Doroghazi, J. R., and D. H. Buckley, 2014 Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.

Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.

Drake, J. W., B. Charlesworth, D. Charlesworth, and J. F. Crow, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–34.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu.

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.

2590 | W Sung et al

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320; author reply 320–321.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: E2310–E2318.

Communicating editor: S. I. Wright


DISCUSSION
Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, and also alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters $N_e$, $u_{bs}$, $u_{id}$, and $G_e$, and then elaborate on the significance and implications of the relationship between $u_{id} G_e$ and $N_e$ for our understanding of molecular evolution.

First, we address the estimation of $N_e$, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of $N_e$ must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of $N_e$ derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation ($u_{id} G_e$) and long-term $N_e$, i.e., would operate against our ability to detect the expected signal of the DBH.
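The point about the harmonic mean can be made concrete with a small sketch. The population sizes below are hypothetical and serve only to show how strongly a single bottleneck pulls a harmonic mean below the arithmetic mean.

```python
from statistics import harmonic_mean, mean

# Hypothetical sizes over ten generations: nine large, one bottleneck.
sizes = [1_000_000] * 9 + [100]

arithmetic = mean(sizes)
long_term = harmonic_mean(sizes)

print(f"arithmetic mean: {arithmetic:,.0f}")
print(f"harmonic mean:   {long_term:,.0f}")
```

In this example the arithmetic mean stays near 900,000 while the harmonic mean drops to roughly 1,000, which is why a long-term $N_e$ can be far smaller than the population size observed at most time points.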

Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby linked sites. Thus, to minimize sampling error wherever possible, we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of $N_e u$ from observations on silent sites (Watterson 1975). The utilization of an average $\theta_s$ across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated $\theta_s$ across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of $N_e$ estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.
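As an illustration of the segregating-sites approach, the sketch below implements Watterson's (1975) estimator and converts it to an effective population size. It assumes the standard neutral expectations, theta = 2 N_e u for haploids and 4 N_e u for diploids. The function names and sample values are hypothetical, and the study's actual procedure is the one described in its Materials and Methods.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Watterson's (1975) estimator of theta per site from segregating sites."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    # a_n is the harmonic number over 1 .. n-1.
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


def effective_population_size(theta, u_bs, diploid=False):
    """Assumes theta = 2*N_e*u (haploid) or 4*N_e*u (diploid) at neutral sites."""
    return theta / ((4.0 if diploid else 2.0) * u_bs)


# Hypothetical sample: 10 isolates, 10,000 silent sites, 100 of them segregating.
theta_s = watterson_theta(segregating_sites=100, n_sequences=10, n_sites=10_000)
n_e = effective_population_size(theta_s, u_bs=1e-9)
print(f"theta_s = {theta_s:.5f}, N_e = {n_e:.3g}")
```

Dividing by the harmonic number corrects for the fact that larger samples reveal more segregating sites, so estimates from differently sized isolate collections remain comparable.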

Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and $N_e$ could in part be spurious artifacts resulting from the use of estimates of $N_e$ obtained by dividing measures of standing variation at silent sites by $u_{bs}$ (Sung et al. 2012a). If the sampling variance of $u_{bs}$ is substantial enough, this could lead to a negative correlation between the observed $u_{bs}$ and extrapolated $N_e$ estimates, and if there were a sampling covariance between $u_{bs}$ and $u_{id}$, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6, and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of $u_{bs}$ from WGS-MA studies is not large enough to explain the negative correlation previously seen between $u_{bs}$ and $N_e$ estimates. Because $u_{bs}$ and $u_{id}$ are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of $u_{bs}$ would have to exceed that of the log-scaled values of $N_e$ by two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa, and the parametric values of mutation rates and $N_e$ were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of $\pi_s$ and $u_{bs}$ (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that $u_{bs}$ and $u_{id}$ decline evolutionarily as $N_e$ increases.
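The concern about spurious correlation can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: true log10 mutation rates and log10 effective population sizes are drawn independently with unit variance, silent-site diversity is known without error, and only the mutation-rate estimate carries sampling noise. It is not a reproduction of the File S1 analyses.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spurious_correlation(noise_sd, n_taxa=2000, seed=1):
    """Correlate log10(observed u_bs) with log10(N_e) inferred as pi_s / (2 * u_obs).

    True log10(u_bs) and log10(N_e) are independent, so any correlation in the
    output is produced by sampling error in u_bs alone.
    """
    rng = random.Random(seed)
    log_u_obs, log_ne_est = [], []
    for _ in range(n_taxa):
        log_u = rng.gauss(-9.0, 1.0)    # true mutation rate (log10)
        log_ne = rng.gauss(6.0, 1.0)    # true effective size (log10)
        err = rng.gauss(0.0, noise_sd)  # sampling error in the u_bs estimate
        # With pi_s = 2 * N_e * u known exactly,
        # log10(N_e_est) = log10(pi_s / 2) - log10(u_obs) = log_ne - err.
        log_u_obs.append(log_u + err)
        log_ne_est.append(log_ne - err)
    return pearson(log_u_obs, log_ne_est)


print(f"small sampling error: r = {spurious_correlation(0.1):+.2f}")
print(f"large sampling error: r = {spurious_correlation(1.0):+.2f}")
```

Under these assumptions the expected correlation is $-\sigma^2/(1+\sigma^2)$, where $\sigma$ is the sampling standard deviation on the log scale. The artifact is therefore negligible unless sampling error is comparable in size to the true among-taxon variation.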

Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites ($u_{id}$) and the number of sites under selective constraint in the genome ($G_e$, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness ($G_e$) scales differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint ($G_{nc}$), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint ($G_{nc}$; see File S1) with that of coding DNA ($G_c$), we find that $u_{id}(G_c + G_{nc})$ and $N_e$ remain highly correlated (Figure 1B; $r^2 = 0.87$), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

Figure 1 Relationship between the rate of indel events per generation per effective genome ($u_{id} G_e$) and effective population size ($N_e$). (A) Regression: $\log_{10}(u_{id} G_e) = 2.23\,(0.48) - 0.73\,(0.07)\log_{10} N_e$ ($r^2 = 0.89$, $P = 6.81 \times 10^{-8}$, df = 13), with SE of parameter estimates shown in parentheses. Blue circles represent bacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1. The full list of indel events for analyzed organisms is presented in Dataset S4. Chromosomal distributions of indel events at each site across all mutation-accumulation experiments are shown in Figure S1, A and B. (B) Relationship when adding the number of estimated noncoding sites under purifying selection into the effective genome size ($G_c + G_{nc}$) for eukaryotic organisms. Regression: $\log_{10}[u_{id}(G_c + G_{nc})] = 3.49\,(0.66) - 0.87\,(0.09)\log_{10} N_e$ ($r^2 = 0.87$, $P = 3.13 \times 10^{-7}$, df = 13).
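To give a feel for the scaling in Figure 1A, the following sketch evaluates the fitted line at a few effective population sizes. It assumes the Figure 1A point estimates are an intercept of 2.23 and a slope of -0.73 on log10 axes, and it ignores their standard errors. The function name is hypothetical.

```python
import math

# Point estimates assumed from the Figure 1A regression (standard errors ignored).
INTERCEPT = 2.23
SLOPE = -0.73


def predicted_indel_rate_per_genome(n_e):
    """Predicted u_id * G_e for a given effective population size."""
    return 10.0 ** (INTERCEPT + SLOPE * math.log10(n_e))


for n_e in (1e4, 1e6, 1e8):
    print(f"N_e = {n_e:.0e}: u_id*G_e ~ {predicted_indel_rate_per_genome(n_e):.2e}")

# A tenfold increase in N_e multiplies the predicted rate by 10**SLOPE.
fold_change = predicted_indel_rate_per_genome(1e7) / predicted_indel_rate_per_genome(1e6)
print(f"fold change per tenfold N_e: {fold_change:.2f}")
```

Because the slope is shallower than -1, a tenfold increase in $N_e$ lowers the predicted genome-wide indel rate by less than tenfold under this fit.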

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with $N_e$ in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from $10^{-11}$ in Paramecium tetraurelia (Sung et al. 2012b) to $10^{-8}$ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with $N_e$ remains significant when confined to unicellular species (Figure 1; $r^2 = 0.66$, $P = 0.003$).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but only at the level attainable under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations and, if beneficial, fix with higher probabilities, a positive association between the mutation rate and $N_e$ seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and $N_e$, for which we are unaware of any evidence).

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by $s t \Delta U_{id}$, with $\Delta U_{id}$ representing the change in the genome-wide indel mutation rate with respect to the population mean rate, $s$ being the average reduction in fitness per mutation (Lynch 2010), and $t$ being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). $\Delta U_{id}$ can be approximated by the change in the indel mutation rate over the effective genome, or $\Delta u_{id} G_e$ (Lynch 2011).

Figure 2 Relationship between indel events per site per generation ($u_{id} G_e$) and effective population size ($N_e$) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is $u_{id} G_e = -0.60\,(0.07)\,N_e$ ($r^2 = 0.83$, $P = 1.28 \times 10^{-6}$, df = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.
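The Figure 2 caption reports a regression of standardized contrasts through the origin. The sketch below shows only that final fitting step, which is used because the sign of each independent contrast is arbitrary. The contrasts here are invented for illustration. In the study they were produced by Compare and the PDAP module in Mesquite, which this code does not attempt to reproduce.

```python
def slope_through_origin(x_contrasts, y_contrasts):
    """Least-squares slope for y = b * x with no intercept term."""
    sxx = sum(x * x for x in x_contrasts)
    if sxx == 0:
        raise ValueError("x contrasts are all zero")
    sxy = sum(x * y for x, y in zip(x_contrasts, y_contrasts))
    return sxy / sxx


# Hypothetical standardized contrasts (not the study's data).
x = [0.8, -1.1, 0.4, 1.5, -0.6]
y = [-0.5, 0.7, -0.2, -0.9, 0.3]
b = slope_through_origin(x, y)
print(f"slope through origin: {b:.2f}")
```

Flipping the sign of any matched pair of contrasts leaves both sums unchanged, so the fitted slope does not depend on the arbitrary direction in which each contrast was taken.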

Finallywenote thatbecause recombinationunlinksalleles fromtheirgenetic background the capacity of selection to enhance replicationfidelity is ultimately a function of the recombination rate (Kimura 1967Lynch 2008) Thus it may be viewed as surprising that bacteria whichdo not undergo meiotic recombination exhibit a relationship between

uid and Ne similar to that in eukaryotic species engaging in periodic toregular meiosis (Figure 1 A and B) It should be noted however thatbacterial recombination occurs through multiple mechanisms (trans-formation conjugation andor transduction) Many bacterial speciesare known to naturally undergo high rates of recombination with ratiosof recombination to mutation rates frequently being comparable tothose in multicellular eukaryotes (Feil and Spratt 2001 Lynch 2007Doroghazi et al 2014 Lassalle et al 2015) so in this sense comparablebehavior of bacterial and eukaryotic species is not unexpected

In summary as in our previous work on the base-substitutionmutation rate (Sung et al 2012a) the strong correlation between thegenome-wide indel rate and Ne appears not to be a statistical artifactMoreover among various hypotheses that have been suggested formutation-rate evolution the DBH appears to provide the most com-patible explanation for the 1000-fold range of variation of this traitacross the Tree of Life As noted above the molecular mechanisms thatgenerate and resolve base-substitution and indel mutations differ in anumber of ways and the rate of occurrence of these two types ofmutations differ by one to two orders of magnitude (with uid rangingfrom 18 to 119 of ubs presumably because of the elevated deleteriouseffects of indel mutations) Yet despite these differences both ubs anduid scale similarly with changes inNe (Figure 3 r2 = 089) Because theforces of mutation selection and drift apply to all biological traits themaximum achievable level of refinement for other fundamental cellulartraits may also be influenced by the drift barrier

ACKNOWLEDGMENTSSupport was provided by the Multidisciplinary University ResearchInitiative Award W911NF-09-1-0444 and from the US ArmyResearch Office to M L P Foster H Tang and S Finkel andW911NF-14-1-0411 to M L P Foster J McKinlay and J T Lennonby CAREER award DEB-0845851 from the National Science Foun-dation to V C and by National Institutes of Health AwardsF32-GM103164 to WS and R01-GM036827 to M L and W KThomas This material is based upon work supported by theNational Science Foundation under grant no CNS-0521433 CNS-0723054 and ABI-1062432 to Indiana UniversityAuthor contributions WS CF VC and ML designed theresearch WS MA MD and TP performed the research WSand MA analyzed the data and WS MA and ML wrote thepaper

LITERATURE CITEDBrandvain Y and S I Wright 2016 The limits of natural selection in a

nonequilibrium world Trends Genet 32 201ndash210Campbell C D and E E Eichler 2013 Properties and rates of germline

mutations in humans Trends Genet 29 575ndash584Charlesworth B 2009 Fundamental concepts in genetics effective popu-

lation size and patterns of molecular evolution and variation Nat RevGenet 10 195ndash205

Conrad D F J E Keebler M A DePristo S J Lindsay Y Zhang et al2011 Variation in genome-wide mutation rates within and betweenhuman families Nat Genet 43 712ndash714

Denver D R K Morris M Lynch and W K Thomas 2004 Highmutation rate and predominance of insertions in the Caenorhabditiselegans nuclear genome Nature 430 679ndash682

Denver D R S Feinberg S Estes W K Thomas and M Lynch 2005 Muta-tion rates spectra and hotspots in mismatch repair-deficient Caenorhabditiselegans Genetics 170 107ndash113

Denver D R P C Dolan L J Wilhelm W Sung J I Lucas-Lledo et al2009 A genome-wide view of Caenorhabditis elegans base-substitutionmutation processes Proc Natl Acad Sci USA 106 16310ndash16314

Figure 3 Relationship between the rate of indel events per site pergeneration (uid) and the base-substitution mutation rate per site pergeneration (ubs) Regression log10(uid) = ndash156(074) + 091(008)log10ubs (r2 = 090 P = 413 middot 1028 df = 13) SE measurementsare shown in parentheses Blue circles represent eubacteria red circlesmulticellular eukaryotes and black circles unicellular eukaryotes withall data summarized in Table 1

Volume 6 August 2016 | Drift Constrains Indel Rate | 2589

Doroghazi, J. R., and D. H. Buckley, 2014 Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.

Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.

Drake, J. W., B. Charlesworth, D. Charlesworth, and J. F. Crow, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–24.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: e2774–e2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu/.

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.

2590 | W Sung et al

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320, author reply 320–311.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: e2310–e2318.

Communicating editor: S. I. Wright


differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint ($G_{nc}$), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint ($G_{nc}$; see File S1) with that of coding DNA ($G_c$), we find that $u_{id}(G_c + G_{nc})$ and $N_e$ remain highly correlated (Figure 1B; $r^2 = 0.87$), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.

We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation, primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the Introduction, we provided arguments as to why selection for replication speed appears unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.

A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with $N_e$ in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from $10^{-11}$ in Paramecium tetraurelia (Sung et al. 2012b) to $10^{-8}$ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with $N_e$ remains significant when confined to unicellular species (Figure 1; $r^2 = 0.66$, $P = 0.003$).

A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but only at the highest level attainable under the prevailing level of random genetic drift, which varies substantially among lineages.

That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations and, if beneficial, fix with higher probabilities, a positive association between the mutation rate and $N_e$ seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and $N_e$, for which we are unaware of any evidence).

Figure 2 Relationship between indel events per site per generation ($u_{id}G_e$) and effective population size ($N_e$) after phylogenetic correction. (A) Standardized phylogenetically independent contrasts performed using Compare (Martins 2004) and the PDAP module in Mesquite (Garland et al. 1993), with branch lengths of 1.0. The regression equation of the contrasts through the origin is $u_{id}G_e = -0.60(0.07)N_e$ ($r^2 = 0.83$, $P = 1.28 \times 10^{-6}$, d.f. = 13), with SE in parentheses. (B) Phylogenetic tree showing the relationship between organisms.

The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by $st\Delta U_{id}$, with $\Delta U_{id}$ representing the change in the genome-wide indel mutation rate with respect to the population mean rate, $s$ being the average reduction in fitness per mutation (Lynch 2010), and $t$ being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). $\Delta U_{id}$ can be approximated by the change in the indel mutation rate over the effective genome, or $\Delta u_{id}G_e$ (Lynch 2011). By setting $st\Delta u_{id}G_e$ equal to the power of random genetic drift [$1/N_e$ for haploids, $1/(2N_e)$ for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient ($s = 0.01$) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every 2 generations in sexually outcrossing species ($t = 2$) (Lynch 2010), solving $st\Delta u_{id}G_e = 1/N_e$ [$= 1/(2N_e)$ for diploids] for $\Delta u_{id}$ suggests that the average antimutator must reduce the indel mutation rate by greater than 0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of $s$ and $t$ are not well known and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency. Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
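The threshold calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example values of $N_e$, $G_e$, and $u_{id}$ are hypothetical, and the defaults $s = 0.01$ and $t = 2$ are simply the values quoted in the text (the text motivates $t = 2$ for sexually outcrossing species only, so other values may be more appropriate elsewhere).

```python
def min_indel_rate_reduction(n_e, g_e, s=0.01, t=2.0, diploid=False):
    """Per-site change in the indel rate (delta u_id) at the drift barrier.

    Solves s * t * delta_u * g_e = 1/n_e for haploids,
    or s * t * delta_u * g_e = 1/(2 * n_e) for diploids.
    """
    if min(n_e, g_e, s, t) <= 0:
        raise ValueError("n_e, g_e, s, and t must all be positive")
    # Power of random genetic drift for the given ploidy.
    drift = 1.0 / (2.0 * n_e) if diploid else 1.0 / n_e
    return drift / (s * t * g_e)


def relative_reduction(delta_u, u_id):
    """Express delta_u as a fraction of the current per-site indel rate."""
    if u_id <= 0:
        raise ValueError("u_id must be positive")
    return delta_u / u_id


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula;
    # they are not estimates for any species in the study.
    delta_u = min_indel_rate_reduction(n_e=1e6, g_e=2.5e7, diploid=True)
    print("threshold delta u_id:", delta_u)
    print("as a fraction of u_id:", relative_reduction(delta_u, u_id=1e-10))
```

Because the threshold scales as $1/(N_e G_e)$, doubling either the effective population size or the effective genome size halves the smallest rate change that selection can act on, which is the scaling the DBH relies on.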

Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between $u_{id}$ and $N_e$ similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi and Buckley 2014; Lassalle et al. 2015), so in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.

In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and $N_e$ appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the 1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with $u_{id}$ ranging from 1.8 to 11.9% of $u_{bs}$, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both $u_{bs}$ and $u_{id}$ scale similarly with changes in $N_e$ (Figure 3; $r^2 = 0.89$). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.

ACKNOWLEDGMENTS

Support was provided by Multidisciplinary University Research Initiative Award W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health Awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.

Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.

LITERATURE CITED

Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.

Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.

Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.

Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.

Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.

Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation ($u_{id}$) and the base-substitution mutation rate per site per generation ($u_{bs}$). Regression: $\log_{10}(u_{id}) = -1.56(0.74) + 0.91(0.08)\log_{10}u_{bs}$ ($r^2 = 0.90$, $P = 4.13 \times 10^{-8}$, d.f. = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.
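To make the Figure 3 regression concrete, the sketch below evaluates the fitted line for a given base-substitution rate. It uses only the point estimates of the intercept and slope reported in the caption and ignores their standard errors, so the result is a rough central prediction at best; the function name and the constant names are illustrative and do not come from the paper.

```python
import math

# Point estimates from the Figure 3 caption (standard errors are ignored here).
INTERCEPT = -1.56
SLOPE = 0.91


def predicted_indel_rate(u_bs):
    """Predicted indel rate per site per generation from the fitted line.

    Implements log10(u_id) = INTERCEPT + SLOPE * log10(u_bs).
    """
    if u_bs <= 0:
        raise ValueError("u_bs must be positive")
    return 10.0 ** (INTERCEPT + SLOPE * math.log10(u_bs))


if __name__ == "__main__":
    # Hypothetical base-substitution rates spanning the range discussed in the text.
    for u_bs in (1e-11, 1e-10, 1e-9, 1e-8):
        print(u_bs, predicted_indel_rate(u_bs))
```

A slope below 1 on the log-log scale means that the predicted indel rate rises more slowly than the base-substitution rate, so the ratio of the two shrinks as the base-substitution rate increases.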


Page 7: Evolution of the Insertion-Deletion Mutation Rate Across …lynchlab/PDF/Lynch244.pdf · Evolution of the Insertion-Deletion Mutation Rate Across the Tree of Life Way Sung,*,

of genetic drift The average selective effect of an antimutator or muta-tor allele (which operate opposite to each other) can be approximatedby stΔUid with ΔUid representing the change in the genome-wide indelmutation rate with respect to the population mean rate s being theaverage reduction in fitness per mutation (Lynch 2010) and t being thenumber of generations a mutation remains associated with its mutatorgenetic background (Lynch 2011) ΔUid can be approximated by thechange in the indel mutation rate over the effective genome or ΔuidGe

(Lynch 2011) By setting stΔuidGe equal to the power of random geneticdrift [1Ne for haploids 1(2Ne) for diploids] we can acquire somesense of the average reduction in the indelmutation rate that is requiredfor the power of selection to exceed power of genetic drift Usingestimates of an average value of the selective coefficient (s = 001)(Lynch et al 1999 Eyre-Walker and Keightley 2007) and assumingthat free recombination unlinks mutation-rate modifier alleles fromtheir background every 2 generations in sexually outcrossing species(t = 2) (Lynch 2010) solving stΔuidGe = 1Ne [= 1(2Ne) for dip-loids] for Δuid suggests that the average antimutator must reduce theindel mutation rate by greater than01ndash1 in most organisms (TableS1) in order to be promoted by selection One major limitation of thiskind of analysis is that values of s and t are not well known and arelikely vary across organisms A second and equally important caveat isthat the prior analysis assumes that mutator and antimutator allelesarise with equal frequency Owing to the high level of refinement of thereplication and repair machinery it seems much more likely that mu-tations involving the components of such machinery will increaserather than decrease the mutation rate This will push the equilibriummutation rate to higher levels than expected (Lynch 2008) althoughwithout quantitative information on such bias it is difficult to deter-mine the exact position at which the mutation rate will stall

Finallywenote thatbecause recombinationunlinksalleles fromtheirgenetic background the capacity of selection to enhance replicationfidelity is ultimately a function of the recombination rate (Kimura 1967Lynch 2008) Thus it may be viewed as surprising that bacteria whichdo not undergo meiotic recombination exhibit a relationship between

uid and Ne similar to that in eukaryotic species engaging in periodic toregular meiosis (Figure 1 A and B) It should be noted however thatbacterial recombination occurs through multiple mechanisms (trans-formation conjugation andor transduction) Many bacterial speciesare known to naturally undergo high rates of recombination with ratiosof recombination to mutation rates frequently being comparable tothose in multicellular eukaryotes (Feil and Spratt 2001 Lynch 2007Doroghazi et al 2014 Lassalle et al 2015) so in this sense comparablebehavior of bacterial and eukaryotic species is not unexpected

In summary as in our previous work on the base-substitutionmutation rate (Sung et al 2012a) the strong correlation between thegenome-wide indel rate and Ne appears not to be a statistical artifactMoreover among various hypotheses that have been suggested formutation-rate evolution the DBH appears to provide the most com-patible explanation for the 1000-fold range of variation of this traitacross the Tree of Life As noted above the molecular mechanisms thatgenerate and resolve base-substitution and indel mutations differ in anumber of ways and the rate of occurrence of these two types ofmutations differ by one to two orders of magnitude (with uid rangingfrom 18 to 119 of ubs presumably because of the elevated deleteriouseffects of indel mutations) Yet despite these differences both ubs anduid scale similarly with changes inNe (Figure 3 r2 = 089) Because theforces of mutation selection and drift apply to all biological traits themaximum achievable level of refinement for other fundamental cellulartraits may also be influenced by the drift barrier

ACKNOWLEDGMENTSSupport was provided by the Multidisciplinary University ResearchInitiative Award W911NF-09-1-0444 and from the US ArmyResearch Office to M L P Foster H Tang and S Finkel andW911NF-14-1-0411 to M L P Foster J McKinlay and J T Lennonby CAREER award DEB-0845851 from the National Science Foun-dation to V C and by National Institutes of Health AwardsF32-GM103164 to WS and R01-GM036827 to M L and W KThomas This material is based upon work supported by theNational Science Foundation under grant no CNS-0521433 CNS-0723054 and ABI-1062432 to Indiana UniversityAuthor contributions WS CF VC and ML designed theresearch WS MA MD and TP performed the research WSand MA analyzed the data and WS MA and ML wrote thepaper

LITERATURE CITED

Brandvain, Y., and S. I. Wright, 2016 The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.

Campbell, C. D., and E. E. Eichler, 2013 Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.

Charlesworth, B., 2009 Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.

Conrad, D. F., J. E. Keebler, M. A. DePristo, S. J. Lindsay, Y. Zhang et al., 2011 Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.

Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas, 2004 High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.

Denver, D. R., S. Feinberg, S. Estes, W. K. Thomas, and M. Lynch, 2005 Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.

Denver, D. R., P. C. Dolan, L. J. Wilhelm, W. Sung, J. I. Lucas-Lledo et al., 2009 A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.

Figure 3 Relationship between the rate of indel events per site per generation (uid) and the base-substitution mutation rate per site per generation (ubs). Regression: log10(uid) = -1.56 (0.74) + 0.91 (0.08) log10(ubs) (r2 = 0.90, P = 4.13 × 10^-8, df = 13). SE measurements are shown in parentheses. Blue circles represent eubacteria, red circles multicellular eukaryotes, and black circles unicellular eukaryotes, with all data summarized in Table 1.

Volume 6 August 2016 | Drift Constrains Indel Rate | 2589

Doroghazi, J. R., and D. H. Buckley, 2014 Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.

Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.

Drake, J. W., B. Charlesworth, D. Charlesworth, and J. F. Crow, 1998 Rates of spontaneous mutation. Genetics 148: 1667–1686.

Eyre-Walker, A., and P. D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.

Feil, E. J., and B. G. Spratt, 2001 Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.

Fu, Y. X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.

Fu, Y. X., and W. H. Li, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

Garland, T., A. W. Dickerman, C. M. Janis, and J. A. Jones, 1993 Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.

Halligan, D. L., and P. D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.

Halligan, D. L., A. Eyre-Walker, P. Andolfatto, and P. D. Keightley, 2004 Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.

Haudry, A., A. E. Platts, E. Vello, D. R. Hoen, M. Leclercq et al., 2013 An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Johnson, T., and N. H. Barton, 2002 The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.

Karasov, T., P. W. Messer, and D. A. Petrov, 2010 Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.

Kibota, T. T., and M. Lynch, 1996 Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.

Kimura, M., 1967 On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–24.

Kimura, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

Kong, A., M. L. Frigge, G. Masson, S. Besenbacher, P. Sulem et al., 2012 Rate of de novo mutations and the importance of father's age to disease risk. Nature 488: 471–475.

Krokan, H. E., and M. Bjoras, 2013 Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.

Kunkel, T. A., 2009 Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.

Lassalle, F., S. Perian, T. Bataillon, X. Nesme, L. Duret et al., 2015 GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.

Leigh, E. G., Jr., 1970 Natural selection and mutability. Am. Nat. 104: 301–305.

Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Lipinski, K. J., J. C. Farslow, K. A. Fitzpatrick, M. Lynch, V. Katju et al., 2011 High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.

Loh, E., J. J. Salk, and L. A. Loeb, 2010 Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.

Lynch, M., 2007 The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts.

Lynch, M., 2008 The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.

Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.

Lynch, M., 2011 The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.

Lynch, M., and G. K. Marinov, 2015 The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.

Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Spontaneous deleterious mutation. Evolution 53: 645–663.

Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.

Martins, E. P., 2004 Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu.

McCulloch, S. D., and T. A. Kunkel, 2008 The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.

Mira, A., H. Ochman, and N. A. Moran, 2001 Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.

Morita, R., S. Nakane, A. Shimada, M. Inoue, H. Iino et al., 2010 Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.

Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Ness, R. W., S. A. Kraemer, N. Colegrave, and P. D. Keightley, 2015 Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.

O'Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz et al., 2011 Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.

O'Roak, B. J., L. Vives, S. Girirajan, E. Karakoc, N. Krumm et al., 2012 Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.

Orr, H. A., 2000 The rate of adaptation in asexuals. Genetics 155: 961–968.

Ossowski, S., K. Schneeberger, J. I. Lucas-Lledo, N. Warthmann, R. M. Clark et al., 2010 The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.

Sancar, A., L. A. Lindsey-Boltz, K. Unsal-Kacmaz, and S. Linn, 2004 Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.

Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou et al., 2005 Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.

Sniegowski, P., and Y. Raynes, 2013 Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.

Sniegowski, P. D., P. J. Gerrish, T. Johnson, and A. Shaver, 2000 The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.

Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012a Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.

Sung, W., A. E. Tucker, T. G. Doak, E. Choi, W. K. Thomas et al., 2012b Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.

Sung, W., M. S. Ackerman, J. F. Gout, S. F. Miller, E. Williams et al., 2015 Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.


Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

The 1000 Genomes Project Consortium, 2015 A global reference for human genetic variation. Nature 526: 68–74.

Uchimura, A., M. Higuchi, Y. Minakuchi, M. Ohno, A. Toyoda et al., 2015 Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.

Vieira-Silva, S., M. Touchon, and E. P. Rocha, 2010 No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320, author reply 320–311.

Wang, H., and X. Zhu, 2014 De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.

Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

Whitney, K. D., and T. Garland, Jr., 2010 Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.

Yang, S., L. Wang, J. Huang, X. Zhang, Y. Yuan et al., 2015 Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Zhu, Y. O., M. L. Siegal, D. W. Hall, and D. A. Petrov, 2014 Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: E2310–E2318.

Communicating editor: S. I. Wright

