

RESEARCH ARTICLE SUMMARY

MITOCHONDRIAL DNA

Germline selection shapes human mitochondrial DNA diversity

Wei Wei, Salih Tuna, Michael J. Keogh, Katherine R. Smith*, Timothy J. Aitman, Phil L. Beales, David L. Bennett, Daniel P. Gale, Maria A. K. Bitner-Glindzicz†, Graeme C. Black, Paul Brennan, Perry Elliott, Frances A. Flinter, R. Andres Floto, Henry Houlden, Melita Irving, Ania Koziell, Eamonn R. Maher, Hugh S. Markus, Nicholas W. Morrell, William G. Newman, Irene Roberts, John A. Sayer, Kenneth G. C. Smith, Jenny C. Taylor, Hugh Watkins, Andrew R. Webster, Andrew O. M. Wilkie, Catherine Williamson, NIHR BioResource–Rare Diseases‡, 100,000 Genomes Project–Rare Diseases Pilot‡, Sofie Ashford, Christopher J. Penkett, Kathleen E. Stirrups, Augusto Rendon*, Willem H. Ouwehand§, John R. Bradley§, F. Lucy Raymond§, Mark Caulfield*, Ernest Turro¶, Patrick F. Chinnery¶

INTRODUCTION: Only 2.4% of the 16.5-kb mitochondrial DNA (mtDNA) genome shows homoplasmic variation at >1% frequency in humans. Migration patterns have contributed to geographic differences in the frequency of common genetic variants, but population genetic evidence indicates that selection shapes the evolving mtDNA phylogeny. The mechanism and timing of this process are not clear.

Unlike the nuclear genome, mtDNA is maternally transmitted and there are many copies in each cell. Initially, a new genetic variant affects only a proportion of the mtDNA (heteroplasmy). During female germ cell development, a reduction in the amount of mtDNA per cell causes a “genetic bottleneck,” which leads to rapid segregation of mtDNA molecules and different levels of heteroplasmy between siblings. Although heteroplasmy is primarily governed by random genetic drift, there is evidence of selection occurring during this process in animals. Yet it has been difficult to demonstrate this convincingly in humans.

RATIONALE: To determine whether there is selection for or against heteroplasmic mtDNA variants during transmission, we studied 12,975 whole-genome sequences, including 1526 mother–offspring pairs of which 45.1% had heteroplasmy affecting >1% of mtDNA molecules. Harnessing both the mtDNA and nuclear genome sequences, we then determined whether the nuclear genetic background influenced mtDNA heteroplasmy, validating our findings in another 40,325 individuals.

RESULTS: Previously unknown mtDNA variants were less likely to be inherited than known variants, in which the level of heteroplasmy tended to increase on transmission. Variants in the ribosomal RNA genes were less likely to be transmitted, whereas variants in the noncoding displacement (D)–loop were more likely to be transmitted. MtDNA variants predicted to affect the protein sequence tended to have lower heteroplasmy levels than synonymous variants. In 12,975 individuals, we identified a correlation between the location of heteroplasmic sites and known D-loop polymorphisms, including the absence of variants in critical sites required for mtDNA transcription and replication. We defined 206 unrelated individuals for which the nuclear and mitochondrial genomes were from different human populations. In these individuals, new population-specific heteroplasmies were more likely to match the nuclear genetic ancestry than the mitochondrial genome on which the mutations occurred. These findings were independently replicated in 654 additional unrelated individuals.

CONCLUSION: The characteristics of mtDNA in the human population are shaped by selective forces acting on heteroplasmy within the female germ line and are influenced by the nuclear genetic background. The signature of selection can be seen over one generation, ensuring consistency between these two independent genetic systems.

RESEARCH

Wei et al., Science 364, 749 (2019) 24 May 2019 1 of 1

The list of author affiliations is available in the full article online.
*These authors contributed equally to this work.
†Deceased.
‡NIHR BioResource–Rare Diseases and the 100,000 Genomes Project–Rare Diseases Pilot collaborators and affiliations are listed in the supplementary materials.
§These authors contributed equally to this work.
¶Corresponding author. Email: [email protected] (P.F.C.); [email protected] (E.T.)
Cite this article as W. Wei et al., Science 364, eaau6520 (2019). DOI: 10.1126/science.aau6520

[Figure labels. Mitochondrial transmission of a single variant (many generations): wild-type; low-level, intermediate-level, and high-level heteroplasmy. Heteroplasmic variants in different nuclear-mtDNA genomes (many generations): nuclear-mtDNA genome matched individuals; nuclear-mtDNA genome mismatched individuals; wild-type mtDNA; mutant mtDNA; heteroplasmy matching nuclear genome; heteroplasmy not matching nuclear genome. Comparison of heteroplasmic variants in mothers and their offspring (circos plot): D-loop, RNR1, RNR2, tRNA genes, protein-coding genes; positive and negative heteroplasmic shifts; de novo, positive shift; lost, negative shift; transmitted, positive shift; transmitted, negative shift; gray bars, frequency in population.]

Germline selection of human mitochondrial DNA is shaped by the nuclear genome. (Top left) Initially, new mtDNA variants are heteroplasmic, with the proportion changing during maternal transmission. (Bottom left) Selection during the transmission of mtDNA heteroplasmy. Transmitted variants are more likely to have been seen before as homoplasmic polymorphisms. (Right) New haplogroup-specific variants are more likely to match the nuclear genetic ancestry than the mtDNA ancestry.

ON OUR WEBSITE: Read the full article at http://dx.doi.org/10.1126/science.aau6520

Downloaded from http://science.sciencemag.org/ on July 8, 2020


RESEARCH ARTICLE

MITOCHONDRIAL DNA

Germline selection shapes human mitochondrial DNA diversity

Wei Wei1,2, Salih Tuna3,4, Michael J. Keogh1, Katherine R. Smith5*, Timothy J. Aitman6,7, Phil L. Beales8,9, David L. Bennett10, Daniel P. Gale11, Maria A. K. Bitner-Glindzicz8,9,12†, Graeme C. Black13,14, Paul Brennan15,16,17, Perry Elliott18,19, Frances A. Flinter20,21, R. Andres Floto22,23,24, Henry Houlden25, Melita Irving21, Ania Koziell26,27, Eamonn R. Maher28,29, Hugh S. Markus30, Nicholas W. Morrell4,22, William G. Newman13,14, Irene Roberts31,32,33, John A. Sayer16,34, Kenneth G. C. Smith22, Jenny C. Taylor33,35, Hugh Watkins35,36,37, Andrew R. Webster38,39, Andrew O. M. Wilkie33,37,40, Catherine Williamson41,42, NIHR BioResource–Rare Diseases‡, 100,000 Genomes Project–Rare Diseases Pilot‡, Sofie Ashford4,43, Christopher J. Penkett3,4, Kathleen E. Stirrups3,4, Augusto Rendon3,5*, Willem H. Ouwehand3,4,44,45,46§, John R. Bradley4,22,24,29,47§, F. Lucy Raymond4,28§, Mark Caulfield5,48*, Ernest Turro3,4,49¶, Patrick F. Chinnery1,2,4¶

1 Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.
2 Medical Research Council Mitochondrial Biology Unit, Cambridge Biomedical Campus, Cambridge, UK.
3 Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.
4 NIHR BioResource, Cambridge University Hospitals NHS Foundation, Cambridge Biomedical Campus, Cambridge, UK.
5 Genomics England, Charterhouse Square, London, UK.
6 MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, UK.
7 Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.
8 Genetics and Genomic Medicine Programme, UCL Great Ormond Street Institute of Child Health, London, UK.
9 Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK.
10 The Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford, UK.
11 UCL Centre for Nephrology, University College London, London, UK.
12 UCL Great Ormond Street Institute of Child Health, University College London, London, UK.
13 Evolution and Genomic Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
14 Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Universities Foundation NHS Trust, Manchester, UK.
15 Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK.
16 Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK.
17 Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK.
18 UCL Institute of Cardiovascular Science, University College London, London, UK.
19 Barts Heart Centre, St Bartholomew’s Hospital, Barts Health NHS Trust, London, UK.
20 Guy’s and St Thomas’ Hospital, Guy’s and St Thomas’ NHS Foundation Trust, London, UK.
21 Clinical Genetics Department, Guy’s and St Thomas’ NHS Foundation Trust, London, UK.
22 Department of Medicine, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.
23 Royal Papworth Hospital NHS Foundation Trust, Cambridge, UK.
24 Addenbrookes Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
25 Department of Molecular Neuroscience, UCL Institute of Neurology, London, UK.
26 Department of Experimental Immunobiology, King’s College London, London, UK.
27 Department of Paediatric Nephrology, Evelina London Children’s Hospital, Guy’s and St Thomas’ NHS Foundation Trust, London, UK.
28 Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.
29 NIHR Cambridge Biomedical Research Centre, Cambridge Biomedical Campus, Cambridge, UK.
30 Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK.
31 MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
32 Department of Paediatrics, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
33 NIHR Oxford Biomedical Research Centre, Oxford University Hospitals Trust, Oxford, UK.
34 Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK.
35 Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
36 Department of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK.
37 Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
38 Moorfields Eye Hospital NHS Foundation Trust, London, UK.
39 UCL Institute of Ophthalmology, University College London, London, UK.
40 MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK.
41 Division of Women’s Health, King’s College London, London, UK.
42 Institute of Reproductive and Developmental Biology, Surgery and Cancer, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, UK.
43 Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
44 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
45 NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge, UK.
46 British Heart Foundation Cambridge Centre of Excellence, University of Cambridge, Cambridge, UK.
47 Department of Renal Medicine, Addenbrookes Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
48 William Harvey Research Institute, NIHR Biomedical Research Centre at Barts, Queen Mary University of London, London, UK.
49 MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, Cambridge, UK.

*These authors contributed equally to this work.
†Deceased.
‡NIHR BioResource–Rare Diseases and 100,000 Genomes Project–Rare Diseases Pilot collaborators and affiliations are listed in the supplementary materials.
§These authors contributed equally to this work.
¶Corresponding author. Email: [email protected] (P.F.C.); [email protected] (E.T.)

Wei et al., Science 364, eaau6520 (2019) 24 May 2019

Approximately 2.4% of the human mitochondrial DNA (mtDNA) genome exhibits common homoplasmic genetic variation. We analyzed 12,975 whole-genome sequences to show that 45.1% of individuals from 1526 mother–offspring pairs harbor a mixed population of mtDNA (heteroplasmy), but the propensity for maternal transmission differs across the mitochondrial genome. Over one generation, we observed selection both for and against variants in specific genomic regions; known variants were more likely to be transmitted than previously unknown variants. However, new heteroplasmies were more likely to match the nuclear genetic ancestry as opposed to the ancestry of the mitochondrial genome on which the mutations occurred, a finding we validated in 40,325 individuals. Thus, human mtDNA at the population level is shaped by selective forces within the female germ line under nuclear genetic control, which ensures consistency between the two independent genetic lineages.

Primarily inherited from the maternal line, the 16.5-kb human mitochondrial DNA (mtDNA) genome acquired mutations sequentially after modern humans emerged out of Africa (1–3). Pedigree and phylogenetic analyses have estimated a de novo mtDNA nucleotide substitution rate of ~10^−8 substitutions per base pair per year (4). However, from 30,506 mitochondrial genome sequences from across the globe (5), only 2.4% of nucleotides show genetic variation with >1% frequency within a population (Fig. 1). Although the idea has been the subject of debate (6, 7), selection might contribute to the nonrandom distribution of common variants across the mitochondrial genome in the human population.

Heteroplasmic mtDNA variants are common and maternally inherited

We analyzed high-depth mtDNA sequences from 1526 mother–offspring pairs (mean depth in the mothers = 1880×, range = 249× to 7454×; mean depth in the offspring = 1901×, range = 259× to 7475×; mothers versus offspring, P = 0.49, two-sample t test) (fig. S1). We called homoplasmic and heteroplasmic mtDNA variants from whole-blood DNA sequence data (8, 9) and filtered out heteroplasmic calls that are likely attributable to errors (9–11). We identified a mixed population of mtDNA (heteroplasmic variants) with a heteroplasmic variant allele frequency (VAF) >1% with high confidence in 47.8% of mothers (1043 heteroplasmic variants at 812 sites) and 42.5% of offspring (893 heteroplasmic variants at 693 sites) (Fig. 1, table S1, and data S1). In 22 individuals for which the whole genome was independently sequenced twice, the heteroplasmic mtDNA calls were 96.4% concordant (fig. S2) (9). As expected (12, 13), there was a small but significant positive correlation between the number of heteroplasmic variants and the mother’s age [P = 6.42 × 10^−11, R^2 = 0.17, 95% confidence interval (CI) = 0.12 to 0.23, Pearson’s correlation] (fig. S3), with mothers having more heteroplasmic variants than offspring (mean number in the mothers = 0.68, range = 0 to 6; mean number in the offspring = 0.58, range = 0 to 4; P = 0.002, effect size = 0.68, Wilcoxon rank sum test) (Fig. 2A).

Fig. 1. Circos plot of mitochondrial heteroplasmic variants identified in 1526 mother–offspring pairs. Circles from the outside to the inside indicate the following: (i) position of a variant on the mtDNA (removed regions are represented by red crosses); (ii) minor allele frequency for common variants (MAF >1%) derived from 30,506 NCBI mtDNA sequences (5) (the radial axis corresponds to the MAF); (iii) phastCons100way scores from UCSC (47) (the radial axis corresponds to the degree of conservation); (iv) heteroplasmic variants identified in the mothers (the radial axis corresponds to the HF); (v) regions corresponding to the different mtDNA genes (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs); and (vi) heteroplasmic variants identified in the offspring (the radial axis corresponds to the HF).

We defined three categories of heteroplasmic variants: (i) transmitted or inherited, if the variant was present in the mother and the offspring and was heteroplasmic in at least one of the two; (ii) lost, if the heteroplasmic variant was present in the mother but not detectable in the offspring; and (iii) de novo, if the heteroplasmic variant was present in the offspring but not detectable in the mother (table S1) (9). Because our sequencing and bioinformatics pipeline may have missed very low level heteroplasmies (<1% VAF), lost and de novo variants could potentially be present at very low levels in, respectively, the offspring’s and mother’s germ line. The heteroplasmic fraction (HF) of transmitted heteroplasmic variants (mean HF = 19.5%, SD = 13.9%) was significantly higher than the HF of lost variants (mean HF = 5.6%, SD = 6.3%) in the mothers (P < 2.2 × 10^−16, effect size = 4.24, Wilcoxon rank sum test), and the HF of inherited heteroplasmic variants (mean HF = 19.8%, SD = 14.1%) was significantly higher than that of de novo variants (mean HF = 6.2%, SD = 7.4%) in the offspring (P < 2.2 × 10^−16, effect size = 4.06, Wilcoxon rank sum test) (Fig. 2B and table S1). The HF of transmitted variants in the offspring strongly correlated with the corresponding maternal level (P = 1.52 × 10^−93, R^2 = 0.79, 95% CI = 0.75 to 0.82, Pearson’s correlation) (Fig. 2C).

In total, 477 de novo heteroplasmic variants not seen in the mother were observed at >1% HF in the offspring, consistent with previous estimates (13). To ensure that these data were not due to technical errors, we determined whether any heteroplasmic variants in the offspring were also present in their fathers. Among 313 father–offspring pairs, the offspring harbored 196 heteroplasmic variants with HF >1%, and only 1 of these was also observed in the corresponding father. This was a common population variant [population minor allele frequency (MAF) = 25.8% (5)] in the displacement (D)–loop region (m.152T>C), which was homoplasmic in the father and had an HF of 12.4% in his child. The alternate allele was not detected in the mother, which suggests that this region is a recurrent site of mutation or that the variant is conceivably due to the paternal transmission of mtDNA.

The difference between HF in mothers and their offspring can be measured in percentage points (Fig. 2D) (14). This metric is limited by the difference between the HF of the offspring and that of the mother, as well as by the boundaries 0 and 100%. The magnitude of the percentage change differs from the magnitude of the fold change in VAF. For example, a change from 50 to 55% would be given the same value as a change from 1 to 6%, even though the latter implies a sixfold increase in the proportion of mtDNA carrying the alternate allele. We therefore studied the log2 ratio of HF between offspring and mothers after imputation of HF values below 1% to our detection threshold of 1% [subsequently termed the heteroplasmy shift (HS)] (Fig. 2, E and F) (9), which shrank HSs toward zero only when the true HF in either the mother or the offspring was below 1%.

Overall, there was no significant difference between the number of heteroplasmic variants with a positive (n = 731) and a negative (n = 798) HS (P = 0.091, binomial test). The HS distribution around zero was moderately symmetric and yielded a marginal P value for asymmetry (P = 0.05, one-sample t test) (Fig. 2, D and E), consistent with random segregation of mitochondria during meiosis (14, 15). All of the HSs were <6 in magnitude, corresponding to a <64-fold increase or decrease in HF across one generation, with three exceptions. De novo variants at m.57T>C (HF = 99.3%), m.8993T>G (HF = 82.1%), and m.14459G>A (HF = 93.6%) were detected in three unrelated offspring and were not present in the corresponding mothers (figs. S4 to S6). m.14459G>A is a nonsynonymous (NS) variant in ND6 that, on the basis of evidence from previously published pedigrees (16, 17), causes Leber hereditary optic neuropathy and Leigh syndrome (dystonia). m.8993T>G is a NS variant in ATP6 (L156R) that has been observed on many independent occasions in Leigh syndrome or neurogenic ataxia with retinitis pigmentosa (17–19). Although these extreme HSs could reflect differences in the mechanism of transmission for pathogenic mtDNA mutations (20), ascertainment is a more likely explanation because participants with childhood-onset neurodegenerative diseases were recruited as part of this study (21). Ascertainment bias is not likely to explain the de novo occurrence of m.57T>C, but these findings indicate that extreme HSs at moderate HFs are not typical among humans.

Fig. 2. Transmission of heteroplasmic mtDNA variants in 1526 mother–offspring pairs. (A) Frequency distribution of heteroplasmic variants in mothers and offspring. (B) Distribution of HF in mothers and offspring. (C) Scatter plot of logit(HF) in transmitted heteroplasmic variants between mothers and offspring (R^2 = 0.79, P = 1.52 × 10^−93, Pearson’s correlation). (D) (Left) Difference in the percentage shift of HF between offspring and the corresponding mothers (HF_offspring − HF_mother) ordered by the degree of shift. (Right) Distribution of the difference of the percentage shift of HF between offspring and the corresponding mothers (HF_offspring − HF_mother). (E) (Left) Log2 ratio of HF difference between offspring and the corresponding mothers ordered by the degree of the ratio. Three variants with HS > 6 are shown in the inset. (Right) Distribution of log2 ratio of HF difference between offspring and the corresponding mothers. (F) Log2 ratio of HF difference between offspring and the corresponding mothers aligned to the whole mitochondrial DNA sequence. mtDNA regions are shown at the bottom bar in different colors (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs).

As expected, the noncoding D-loop had the highest substitution frequency (7.64 × 10^−5 per base per genome per transmission) of all the regions in the mitochondrial genome (Fig. 3A and table S2) (13). In total, we observed 16 of 57 previously defined (5) pathogenic mutations in the 1526 mother–offspring pairs (Fig. 3B). After excluding m.14459G>A and m.8993T>G, where the extreme HS likely reflects ascertainment bias, the mean HS for the remaining 14 pathogenic mutations was not significantly different from zero (P = 0.22, one-sample t test) nor from the mean HS for the remaining 1076 nonpathogenic variants (P = 0.11, two-sample t test). Thus, overall we did not see a strong signature of selection for or against pathogenic alleles, although our statistical analysis does not preclude that a subset of the observed pathogenic alleles may be under selection. Notably, only three mothers carried the most common heteroplasmic pathogenic mutation, m.3243A>G (22), each with a low HF (5.2, 3.6, and 1.7%) that decreased to levels below our detection threshold in two of the three corresponding offspring (3.9, <1, and <1%). Six of the 16 pathogenic mutations were not detectable in the mothers, yielding a de novo mutation rate for known pathogenic mutations of 393 per 100,000 live births (95% CI = 144 to 854), which is ~3.7-fold higher than the previously reported rate (23).

To gain insight into possible mutational mechanisms, we determined the trinucleotide mutational signature. As shown previously, C>T and T>C substitutions were the most common type of substitution in homoplasmic variants (5) and cancer somatic mtDNA mutations (24). For heteroplasmic variants, C>T and T>C substitutions were also predominant, which suggests that germline transmission shapes the mutational signatures seen in homoplasmic variants at the population level. However, we also observed a small but significant excess of C>A, C>G, T>A, and T>G substitutions (P < 2.2 × 10^−16, odds ratio = 0.36, 95% CI = 0.29 to 0.44, Fisher’s exact test) (fig. S7). Additionally, de novo mutations were more likely to involve a CpG-containing trinucleotide (P = 3.01 × 10^−6, odds ratio = 0.50, 95% CI = 0.38 to 0.66, Fisher’s exact test) (Fig. 3C). Although controversial (25), this could be because methylation of NpCpG sites on the mtDNA genome predisposes to de novo mtDNA mutations, as seen in the nuclear genome.
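The variant categories and the heteroplasmy shift (HS) defined earlier in this section can be illustrated with a short Python sketch. It is a minimal illustration rather than the authors' pipeline: the function names, the representation of HF as a fraction between 0 and 1, and the treatment of any value below the 1% detection threshold as "not detectable" are assumptions made here, and the sketch does not model the requirement that a transmitted variant be heteroplasmic in at least one member of the pair.

```python
import math

DETECTION_THRESHOLD = 0.01  # 1% detection threshold stated in the text


def classify_variant(hf_mother, hf_offspring, threshold=DETECTION_THRESHOLD):
    """Assign a variant to one of the three categories defined in the text.

    Assumption: HF values are fractions in [0, 1], and a value below
    `threshold` is treated as "not detectable".
    """
    in_mother = hf_mother >= threshold
    in_offspring = hf_offspring >= threshold
    if in_mother and in_offspring:
        return "transmitted"
    if in_mother:
        return "lost"
    if in_offspring:
        return "de novo"
    return "not detected"


def heteroplasmy_shift(hf_mother, hf_offspring, threshold=DETECTION_THRESHOLD):
    """log2(HF_offspring / HF_mother), with values below the threshold raised to it."""
    mother = max(hf_mother, threshold)
    offspring = max(hf_offspring, threshold)
    return math.log2(offspring / mother)


if __name__ == "__main__":
    # Example pairs built from values quoted in the text (as fractions).
    examples = [
        ("50% -> 55%", 0.50, 0.55),
        ("1% -> 6%", 0.01, 0.06),
        ("m.3243A>G, 5.2% -> 3.9%", 0.052, 0.039),
        ("m.3243A>G, 3.6% -> <1%", 0.036, 0.0),
        ("m.57T>C de novo, 99.3%", 0.0, 0.993),
    ]
    for label, hf_m, hf_o in examples:
        shift = heteroplasmy_shift(hf_m, hf_o)
        print(label, classify_variant(hf_m, hf_o), round(shift, 2))
```

Under these assumptions, a change from 50 to 55% gives an HS near 0.14, whereas a change from 1 to 6% gives an HS near 2.58, which is the contrast the log2 ratio is meant to capture. A de novo variant observed at 99.3% with an undetectable maternal level gives an HS above 6, in line with the exceptions described above.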

Known mtDNA variants are more likely to be transmitted than previously unknown variants

We then compared known heteroplasmic variants (i.e., those seen before in the general population) and those not previously observed. Variants were considered previously unknown if they were absent from the 1000 Genomes datasets and the Single Nucleotide Polymorphism Database (dbSNP) and were seen in at most one individual among 30,506 NCBI mtDNA sequences (5). Previously unidentified heteroplasmic variants were 4.7-fold less commonly transmitted from mother to offspring than known variants (P = 3.55 × 10^−13, odds ratio = 2.60, 95% CI = 1.97 to 3.45, Fisher’s exact test), and the HS for transmitted known variants was more likely to be positive (P = 0.0002, probability = 0.40, 95% CI = 0.35 to 0.45, binomial test) (Fig. 3, D and E). Also, the transmitted heteroplasmic variants were more likely to affect known haplogroup-defining sites (26) than the lost and de novo heteroplasmic variants (P = 7.86 × 10^−11, odds ratio = 0.40, 95% CI = 0.30 to 0.53, and P = 0.0016, odds ratio = 0.62, 95% CI = 0.46 to 0.84, respectively, Fisher’s exact test) (Fig. 3F). This suggests that factors may modulate the transmission of mtDNA heteroplasmy within the female germ line over a single generation and influence the likelihood that variants become established within human mtDNA populations. Because heteroplasmic variants are acquired throughout life, they must be removed at transmission to offspring at a higher rate than they appear de novo; otherwise, each generation would be accompanied by an expected increase in potentially deleterious heteroplasmic variants (27). Correspondingly, the number of previously unknown variants present in the mother but not transmitted (lost variants) exceeded the number of de novo unknown variants detected in the offspring (P = 7.93 × 10^−7, probability = 0.62, 95% CI = 0.57 to 0.67, binomial test) (Fig. 3D), in part reflecting the accumulation of heteroplasmic variants with increasing maternal age (fig. S3).
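The binomial tests and proportion confidence intervals quoted here and in the preceding section can in principle be recomputed from raw counts. The sketch below is an assumption about how such values might be obtained, since the article does not state the software or the interval method in this section: it uses a two-sided exact binomial test against a probability of 0.5 and a Clopper–Pearson interval. It is applied to counts given earlier (731 positive versus 798 negative HSs, and 6 de novo pathogenic mutations). Dividing the latter by the 1526 pairs is an inference that reproduces the reported 393 per 100,000, and all function names are illustrative.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial probability mass function, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k < 0:
        return 0.0
    return min(1.0, sum(math.exp(binom_logpmf(i, n, p)) for i in range(k + 1)))


def two_sided_binomial_test_half(k, n):
    """Two-sided exact test of H0: p = 0.5 (doubling the smaller tail is
    only valid for this symmetric null hypothesis)."""
    return min(1.0, 2.0 * binom_cdf(min(k, n - k), n, 0.5))


def clopper_pearson(k, n, alpha=0.05, iterations=100):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""

    def solve(increasing_function):
        # Bisection for the p at which an increasing function crosses zero.
        low, high = 0.0, 1.0
        for _ in range(iterations):
            mid = (low + high) / 2
            if increasing_function(mid) < 0:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    # Lower bound: P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else solve(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, which decreases with p.
    upper = 1.0 if k == n else solve(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


if __name__ == "__main__":
    p_value = two_sided_binomial_test_half(731, 731 + 798)
    print(f"positive versus negative HS: P = {p_value:.3f}")

    k, n = 6, 1526  # assumption: denominator is the number of pairs
    low, high = clopper_pearson(k, n)
    print(f"rate per 100,000: {1e5 * k / n:.0f} "
          f"(95% CI {1e5 * low:.0f} to {1e5 * high:.0f})")
```

The printed values can be compared with the figures quoted in the text (P = 0.091; 393 per 100,000 with a 95% CI of 144 to 854). The same two functions could be applied to the other binomial comparisons if the underlying counts were available.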

Selection for and against heteroplasmy in different genomic regions

We analyzed different functional regions of the genome and found evidence indicating region-specific selection for or against heteroplasmic variants. The distributions of HF in the 1526 mother–offspring pairs were significantly different between the D-loop, ribosomal RNA (rRNA), tRNA, and coding regions (Fig. 4A and table S3). Within the coding region, the NS and synonymous (SS) variants also had different distributions (P = 2.74 × 10^−5, Kolmogorov-Smirnov test). The NS/SS ratio was greater for the heteroplasmic variants than for the homoplasmic variants (P = 3.98 × 10^−24, odds ratio = 1.91, 95% CI = 1.68 to 2.18, Fisher’s exact test), and the de novo and lost heteroplasmic variants had a higher NS/SS than the transmitted variants (transmitted versus de novo: P = 0.0056, odds ratio = 1.69, 95% CI = 1.15 to 2.48; transmitted versus lost: P = 0.01, odds ratio = 1.57, 95% CI = 1.10 to 2.24, Fisher’s exact test) (Fig. 4B). The heteroplasmic variants were more often in conserved sites than the homoplasmic variants (P = 3.71 × 10^−77, odds ratio = 3.21, 95% CI = 2.86 to 3.60, Fisher’s exact test), and the transmitted heteroplasmic variants were less conserved than the de novo (P = 0.0018, odds ratio = 1.62, 95% CI = 1.19 to 2.22, Fisher’s exact test) and lost (P = 9.60 × 10^−9, odds ratio = 2.25, 95% CI = 1.69 to 3.03, Fisher’s exact test) heteroplasmic variants (Fig. 4C). Also, heteroplasmic variants with a positive HS were less conserved than those with a negative HS (P = 0.03, odds ratio = 1.28, 95% CI = 1.01 to 1.61, Fisher’s exact test). Variants in the rRNA genes were more likely to show a decrease in the heteroplasmy level on transmission than an increase (P = 1.00 × 10^−4, probability = 0.65, 95% CI = 0.57 to 0.72, binomial test) (Fig. 4D), and the mean HS was significantly less than zero (P = 8.21 × 10^−5, d = 0.30, one-sample t test) (Fig. 4E).

To understand the determinants of transmission of heteroplasmic variants with a reduced risk of confounding, we used multivariable logistic regression to model the probability of transmission across all 1526 mother–offspring pairs (9). We modeled the transmission probability of a variant as a function of its HF in the mother, the identity of the mitochondrial genome region containing it, and its status as known or previously unknown (Figs. 3D and 4, F to H) (9). The probability that a heteroplasmic variant in the mother was transmitted to her offspring was associated with its HF in the mother (P < 2.2 × 10^−16, coefficient estimate = 1.17, SD = 0.08, logistic regression) (Fig. 4F). Variants in the D-loop were more likely to be transmitted (P = 0.04, coefficient estimate = 0.39, SD = 0.19, logistic regression) than average, and those in the rRNA were less likely to be transmitted (P = 0.0026, coefficient estimate = −0.94, SD = 0.31, logistic regression) than average (Fig. 4G). The previously unrecognized variants were less likely to be transmitted than the known variants (P = 0.028, coefficient estimate = 0.43, SD = 0.19, logistic regression) (Fig. 3D), even after accounting for all other covariates, including HF in the mothers.
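Logistic regression coefficients are on the log-odds scale, so it can help to convert them into odds ratios. The sketch below does this for three of the estimates quoted above, assuming that the value reported as "SD" is the standard error of the coefficient and that a normal (Wald) approximation is adequate; neither assumption is confirmed by the text, and the function name is illustrative.

```python
from math import erfc, exp, sqrt

def wald_summary(coefficient, standard_error, z_crit=1.96):
    """Odds ratio, approximate 95% CI, and two-sided Wald P for a logit coefficient."""
    z = coefficient / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    low = exp(coefficient - z_crit * standard_error)
    high = exp(coefficient + z_crit * standard_error)
    return exp(coefficient), (low, high), p_value

# Estimates quoted in the text; the reported "SD" is ASSUMED to be a standard error.
reported = [
    ("HF in mother (logit scale)", 1.17, 0.08),
    ("D-loop", 0.39, 0.19),
    ("rRNA", -0.94, 0.31),
]
for label, coef, se in reported:
    odds, (low, high), p = wald_summary(coef, se)
    print(f"{label}: OR = {odds:.2f}, 95% CI = {low:.2f} to {high:.2f}, P = {p:.2g}")
```

Under these assumptions the Wald P values for the D-loop and rRNA terms come out close to the reported 0.04 and 0.0026, which is consistent with the quoted "SD" values behaving like standard errors.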

Heteroplasmic variants in the noncoding D-loop

To cast light on the possible effects of selection on the noncoding D-loop, we derived a high-resolution map of heteroplasmic variants in 12,975 individuals, which included the 1526 mother–offspring pairs (mean mtDNA genome depth = 1832×, SD = 945×; mean depth of D-loop = 1569×, SD = 819×) (Fig. 5, A to C, and fig. S8) (9). We found an association between the homoplasmic allele frequency among 30,506 NCBI mtDNA sequences and the proportion of

Wei et al., Science 364, eaau6520 (2019) 24 May 2019 4 of 12


individuals heteroplasmic for the same allele (P < 2.2 × 10^−16, logistic regression) (Fig. 5, A to C), similar to that previously observed (5). Of the 17 regions in the D-loop (Fig. 5C, bottom, purple and orange bars), 2 had a significantly greater number of heteroplasmic variants than expected by chance. These regions correspond to the proposed replication fork barrier associated with the D-loop termination sequence (MT-TAS2) (28) and MT-CSB1 (MT-TAS2: P = 4.5 × 10^−11, odds ratio = 0.40, 95% CI = 0.30 to 0.54; MT-CSB1: P = 7.0 × 10^−6, odds ratio = 0.39, 95% CI = 0.24 to 0.61, Fisher’s exact test versus remainder of the D-loop).

To help us understand the evolution of the D-loop, we identified all heteroplasmic variants


[Fig. 3 graphic not recoverable from the extracted text. Surviving labels: 1/log2(rate) by mtDNA region (D-loop, CYB, ND6, ND5, ND4, ND4L, ND3, CO3, ATP6, ATP8, CO2, CO1, ND2, ND1, RNR2, RNR1); HF (%) in mother and offspring for pathogenic mutations (m.1555A>G, m.3243A>G, m.3460G>A, m.3946G>A, m.4136A>G, m.5631G>A, m.8344A>G, m.8993T>G, m.9185T>C, m.10197G>A, m.14459G>A, m.15498G>A); frequency (%) of CpG and non-CpG sites for expected, homoplasmic, transmitted, lost, and de novo variants; number of heteroplasmic variants, known versus previously unknown; probability of increase or decrease against log2(HFoffs / HFmum); frequency (%) of haplogroup and non-haplogroup markers for transmitted, de novo, and lost variants.]

Fig. 3. Characteristics of the heteroplasmic mtDNA variants in 1526 mother–offspring pairs. (A) Mutation rate of mtDNA genomic regions was estimated using 477 de novo heteroplasmic variants from 1526 mother–offspring pairs detected at HF > 1%. Vertical axes represent 1/log2(mutation rate) per base per mother–child transmission. mtDNA genomic regions are labeled and illustrated in different colors (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs). All tRNAs were combined to estimate the tRNA mutation rate. [Note that this is the raw number of new mutations per base pair per transmission detected at HF > 1% in the offspring and does not factor in the detection threshold nor segregation because current models assume neutrality (13, 48), which we later show is not the case.] (B) Pathogenic mutations were observed in 1526 mother–offspring pairs. Each dot represents the HF in the mothers (blue) and the corresponding offspring (pink); the direction of each arrow shows positive (→) or negative (←) HS; the length of the arrow between each pair of points represents the change in HF (orange, transmitted heteroplasmic variants; gray, de novo and lost heteroplasmic variants). (C) Frequency of heteroplasmic variants at CpG and non-CpG islands. (D) Previously unidentified versus known transmitted, lost, and de novo heteroplasmic variants. (E) Distribution of HS between offspring and corresponding mothers in transmitted known and previously unknown heteroplasmic variants. (F) Frequency of haplogroup-defining variants in transmitted, lost, and de novo heteroplasmic variants. The transmitted heteroplasmic variants were more likely to affect known haplogroup-defining variants on the world mtDNA phylogeny than the lost and de novo heteroplasmic variants (P = 7.86 × 10^−11 and 0.0016, respectively, Fisher’s exact test). *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
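The categories used in Fig. 3 (transmitted, lost, de novo) and the heteroplasmy shift can be sketched in a few lines. This is a hedged illustration only: it assumes the HF > 1% detection threshold applies symmetrically to mother and offspring, and it takes HS to be the log2 ratio suggested by the Fig. 3E axis label, log2(HFoffs / HFmum). The authors' exact rules are given in their supplementary methods (9), and the function names here are hypothetical.

```python
from math import log2

def classify_variant(hf_mother, hf_offspring, threshold=0.01):
    """Label a variant in one mother-offspring pair by where it exceeds the HF threshold."""
    in_mother = hf_mother > threshold
    in_offspring = hf_offspring > threshold
    if in_mother and in_offspring:
        return "transmitted"
    if in_mother:
        return "lost"
    if in_offspring:
        return "de novo"
    return "undetected"

def heteroplasmy_shift(hf_mother, hf_offspring):
    """ASSUMED definition: log2 ratio of offspring to maternal HF (positive = increase)."""
    return log2(hf_offspring / hf_mother)

# Hypothetical heteroplasmic fractions (proportions, not percentages).
pairs = [(0.10, 0.20), (0.05, 0.0), (0.0, 0.03)]
for hf_mum, hf_offs in pairs:
    label = classify_variant(hf_mum, hf_offs)
    shift = heteroplasmy_shift(hf_mum, hf_offs) if label == "transmitted" else None
    print(label, shift)
```

The shift is only computed for transmitted variants, because the ratio is undefined when either fraction is zero.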


[Fig. 4 graphic not recoverable from the extracted text. Surviving labels: cumulative fraction of heteroplasmic variants against heteroplasmic fraction (%) for D-loop, SS, NS, rRNA, and tRNA; NS/SS ratio for all homoplasmic, all heteroplasmic, transmitted, lost, and de novo variants; frequency (%) of conserved and non-conserved sites; number of positive and negative heteroplasmic shifts per region; density of normalized heteroplasmic shifts per region; frequency (%) of lost and transmitted heteroplasmies by HF in mother (%) and by region (D-loop, coding region, rRNA, tRNA); true positive rate against false positive rate, AUC = 0.857.]

Fig. 4. Evidence of selection during the transmission of mtDNA heteroplasmy in 1526 mother–offspring pairs. (A) Cumulative distributions of HF in mothers and offspring within each mtDNA region. Vertical lines between two curves indicate the greatest distance between specific regions (D-loop, SS, NS, rRNA, and tRNA) (P values in table S3). (B) Ratio of NS and SS variants for observed homoplasmic polymorphisms; total heteroplasmic variants; and transmitted, lost, and de novo heteroplasmic variants. (C) Frequency of heteroplasmic variants affecting conserved and nonconserved sites. (D) Number of heteroplasmies showing an increased or decreased HF in each mtDNA region. Left-pointing arrows indicate that the number increasing was less than the number decreasing. Right-pointing arrows indicate that the number increasing was greater than the number decreasing. (E) Histograms of HS in each mtDNA region with fitted kernel density curves. (F) Bar plot of the frequency of transmitted heteroplasmic variants by bins of HF in the mothers. (G) Frequency of transmitted heteroplasmic variants in each mtDNA region, along with 95% CIs. (H) Receiver operating characteristic curve for the logistic regression model of transmission [area under the curve (AUC) = 0.857]. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.


not identified on mtDNA phylogenies across a subset of 10,210 unrelated individuals from the original dataset (9). Five of these heteroplasmic variants were shared by more than one individual and were present exclusively in people within a particular haplogroup (Fig. 5, D and E). One variant (m.16237A>T) was present in multiple individuals from two different branches of the phylogeny (L0a1&2 and M35b2) (Fig. 5, D and E). Compared with homoplasmic sequences from across the world (5), only m.299C>A was observed previously as a homoplasmic variant (in 3 of 30,506 individuals), each time on the R30b1 haplogroup background. This suggests that, in our study, individuals heteroplasmic for m.299C>A


Fig. 5. Distribution of heteroplasmic variants in the mtDNA D-loop region. (A) MAF of homoplasmic single-nucleotide polymorphisms observed in 30,506 NCBI mtDNA sequences, with an expanded axis to show MAF < 10% at the bottom. (B) Trend of PhastCons scores shown across the mtDNA D-loop region. (C) HFs observed in 12,975 mtDNA sequences in the D-loop region. MT-TAS2 and MT-CSB1 are shadowed in light purple. MT-LSP is shadowed in light orange. Corresponding known subregions of the mtDNA D-loop are shown at the bottom. MT-3H, mt3 H-strand control element; MT-3L, L-strand control element; MT-4H, mt4 H-strand control element; MT-7SDNA, 7S DNA; MT-CSB1, conserved sequence block 1; MT-CSB2, conserved sequence block 2; MT-CSB3, conserved sequence block 3; MT-HPR, replication primer; MT-HSP1, major H-strand promoter; MT-HV1, hypervariable segment 1; MT-HV2, hypervariable segment 2; MT-HV3, hypervariable segment 3; MT-LSP, L-strand promoter; MT-OHR, H-strand origins; MT-OHR57, H-strand origin; MT-TAS, termination-associated sequence; MT-TAS2, extended termination-associated sequence; MT-TFH, MT-TFL, MT-TFX, MT-TFY, mtTF1 binding site (49). (D) Trinucleotide mutational signature of heteroplasmic variants in the D-loop region in 12,975 mtDNA sequences. Different colored bars represent the six types of substitutions. Labeled heteroplasmic variants are included for the bars circled in red. (E) Simplified mtDNA phylogenetic tree showing six heteroplasmic variants (see main text). Variants are shown in red; haplogroups are in blue. Pie chart sizes are proportional to the number of samples (shown at the bottom) belonging to the corresponding haplogroup in 10,210 unrelated mtDNA sequences. The proportion of samples carrying each heteroplasmic variant within the same haplogroup is shown in yellow. (F) HF of six heteroplasmic variants shared by more than one individual belonging to the same haplogroup.

[Fig. 5 graphic not recoverable from the extracted text. Surviving labels: frequency in population (%), phastCons score, and HF (%) plotted along D-loop positions, with MT-TAS2, MT-CSB1, and MT-LSP highlighted; frequency in D-loop region (%) for C->A, C->G, C->T, T->A, T->C, and T->G substitutions; variants and haplogroups 193A->C (W4a1), 299C->A (R30b1), 432A->C (I2c), 576A->C (N1a1a1a2), 16237A->T (L0a1a/d and L0a2a; M35b2), and 16306C->G (N1a2); HF (%) of the shared variants.]


(Fig. 5F) descended from the same maternal ancestor as the three homoplasmic individuals seen previously (5) and belonged to a closely related maternal lineage that had not yet reached fixation. These recurrent heteroplasmies contributed to the distinct trinucleotide mutational signature of the D-loop (P = 2.3 × 10^−137, Stouffer’s method for combining Fisher P values), which involves prominent noncanonical substitutions and is consistent with the conclusion that the homoplasmic trinucleotide mutational signature of mtDNA is shaped by germline transmission of heteroplasmic variants (Fig. 5D and fig. S9).

We observed an absence of low-level heteroplasmic variants in critical sites required for the initiation of mtDNA transcription and replication. These zones include several conserved sequence boxes and the light-strand promoter (MT-LSP: P = 7.7 × 10^−18, odds ratio = 10.12, 95% CI = 5.43 to 20.31, Fisher’s exact test), which are required for mtDNA transcription and mtDNA replication (29). Certain regions with no known function (30) (e.g., 16,400 to 16,500 bp; Fig. 5C) also had a complete lack of low-level heteroplasmic variants, which suggests that an intact sequence at these regions is essential for mitochondrial function and perhaps genome propagation. The coordinates of the conserved and nonconserved regions provide a guide for functional studies of the mtDNA D-loop, which has been incompletely characterized to date.

The nuclear genetic background influences the heteroplasmy landscape

Most of the ~1500 known mitochondrial proteins are synthesized from the nuclear genome, including the majority of polypeptide subunits of the oxidative phosphorylation system and the machinery required to replicate and transcribe the mitochondrial genome in situ (1). Selection for or against specific mtDNA variants must therefore occur in the context of a specific nuclear genetic background. To explore this, we identified 12,933 individuals for whom a confident mtDNA haplogroup could be predicted (fig. S10). We compared the haplogroup of each individual with the corresponding nuclear genetic ancestry and identified three distinct groups of individuals: (i) a haplogroup-matched group (n = 11,867, 91.7%), in which the mtDNA haplogroup was concordant with the nuclear ancestry; (ii) a haplogroup-mismatched group (n = 295, 2.3%), in which the nuclear ancestry and mtDNA were from different human populations; and (iii) a group for which the nuclear ancestry could not be reliably determined (n = 771, 6.0%) (Fig. 6, A and B, and fig. S10). Subsequent analyses focused on the haplogroup-matched and -mismatched groups (9).

In the matched group, 8159 heteroplasmic variants at 3854 of the 16,569 distinct sites on the mitochondrial genome were present, and in the mismatched group, 195 heteroplasmic variants at 163 distinct sites were present. The mean number of heteroplasmic variants and mean HF were not statistically different between the matched and mismatched groups (fig. S11). Next, we studied distinct heteroplasmic sites in 10,179 of the 12,933 individuals who were not related on the basis of their nuclear genome (9414 in the matched group, 217 in the mismatched group, and 548 in the third group). Distinct heteroplasmic sites were more likely to affect known haplogroup-specific sites (26) than the rest of the mitochondrial genome (P < 2.2 × 10^−16, Fisher’s exact test), particularly within the mismatched group (P = 0.001, odds ratio = 1.70, 95% CI = 1.22 to 2.36, Fisher’s exact test) (Fig. 6C).

We extracted 2641 haplogroup-specific variants present in only one superpopulation (European, Asian, or African) on the world mtDNA phylogeny (26). We built a predictive model of transmission of these variants using logistic regression in 9385 unrelated individuals of European or Asian nuclear ancestry, using 2215 European-specific (n = 940) and Asian-specific (n = 1275) variants on the mtDNA phylogeny and omitting the African group because of its diversity and small number (figs. S10 and S12) (9). We included the superpopulation and the logit population allele frequency as covariates. We also included a dummy variable indicating whether the variant matched the mitochondrial ancestry of the individual carrying the variant. Finally, for the matched and mismatched groups, we included a separate variable indicating whether the variant superpopulation matched the nuclear ancestry of the individual who carried the variant.

We fitted the model to 768 heteroplasmic variants in 9179 unrelated matched individuals and 30 heteroplasmic variants in 206 unrelated mismatched individuals (9). The heteroplasmic variants in the mismatched group were significantly more likely to match the ancestry of the nuclear genetic background than the mtDNA background on which the heteroplasmy occurred (P = 2.9 × 10^−4, coefficient estimate = 0.85, SD = 0.24, logistic regression) (Fig. 6D and table S4). These findings suggest that the previously unidentified mtDNA variants underwent selection to match the nuclear genome. Given the high mutation rate of the mitochondrial genome and the patterns we observed over one generation, the selective process is likely to occur within the female germ line.

To independently validate this finding, we repeated our analysis with an additional 40,325 whole-genome sequences from participants recruited through the Genomics England 100,000 Genomes Rare Disease Main Programme (9). There were 36,038 individuals in a haplogroup-matched group, 1098 in a haplogroup-mismatched group, and 3124 in a group for which the nuclear ancestry could not be reliably determined (figs. S12 and S13). As before, we focused on the European- and Asian-specific variants observed in 23,931 unrelated European and Asian individuals. We fitted the same logistic regression model to 1942 heteroplasmic variants in 23,277 unrelated matched individuals and 67 heteroplasmic variants in 654 unrelated individuals for whom the nuclear and mtDNA had a different ancestral origin. Again, the heteroplasmic variants in the mismatched group were more likely to match the ancestry of the nuclear genetic background than the ancestral background of the mtDNA on which the heteroplasmy occurred (P = 1.33 × 10^−3, coefficient estimate = 0.47, SE = 0.15, logistic regression) (Fig. 6D and table S4). An inverse-weighted meta-analysis of the discovery and validation cohorts yielded a significant association across the two datasets (P = 3.3 × 10^−6, coefficient estimate = 0.59, SE = 0.13). To gain a better understanding of the underlying mechanisms, we studied the gene location and HF of 97 heteroplasmic variants identified in the mismatched groups across both the discovery and validation studies. Potentially functional variants were found in the noncoding region and RNA genes and also included 14 NS protein-coding variants in the MT-ATP, MT-COX, MT-CYB, and MT-ND regions (fig. S14). This raises the possibility that differences in oxidative phosphorylation and adenosine triphosphate (ATP) synthesis are responsible for the association we observed.
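The meta-analysis of the discovery and validation cohorts can be approximated from the rounded numbers quoted above. The sketch below assumes a fixed-effect, inverse-variance-weighted combination and treats the discovery "SD" as a standard error; the text does not spell out either point, and the function name is illustrative.

```python
from math import erfc, sqrt

def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect pooled estimate, its standard error, and a two-sided Wald P value."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    p_value = erfc(abs(pooled / pooled_se) / sqrt(2))
    return pooled, pooled_se, p_value

# Rounded coefficient estimates quoted in the text: discovery, then validation.
pooled, pooled_se, p_value = inverse_variance_meta([0.85, 0.47], [0.24, 0.15])
print(f"pooled = {pooled:.2f}, SE = {pooled_se:.2f}, P = {p_value:.1e}")
```

With these inputs the pooled estimate is about 0.58 with a standard error of about 0.13, close to the reported 0.59 and 0.13. Small differences, including in the P value, are expected because the inputs are rounded to two decimal places.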

Discussion

Several explanations have been proposed for the high substitution rate of the noncoding mtDNA D-loop, including a high intrinsic mutation rate and/or a permissive sequence relative to the coding regions (30). The segregation of mtDNA heteroplasmy likely plays a role in shaping D-loop population polymorphisms by a mechanism operating within the female germ line. Similar findings have been seen in Drosophila, where D-loop variants “selfishly” drive segregation favoring a specific mtDNA genotype (31). These observations have implications for the development of mitochondrial transfer techniques for preventing the inheritance of severe pathogenic mtDNA mutations in humans (32, 33). After mitochondrial transfer, ~15% of human embryonic stem cell lines show reversion to the original mtDNA genotype (33–35). The reasons for this are not fully understood, but the selective propagation of D-loop heteroplasmy is a plausible explanation. Our findings implicate the nuclear genome in this process. This places greater emphasis on matching both nuclear and mtDNA backgrounds when selecting potential mitochondrial donors in order to minimize the possibility of nuclear-mitochondrial incompatibility after mitochondrial transfer.

In cases of heteroplasmic mtDNA, one allele can be preferentially copied or can segregate to high levels in a population of daughter cells. This can lead to changes in mtDNA allele frequency during the lifetime of an individual cell, tissue, or organism through genetic drift (36, 37). A high level of mtDNA content buffers fluctuations in allele frequency. However, if the number of copies falls below a certain threshold, a “genetic bottleneck” occurs, increasing the possibility of large changes in allele frequency.

There is a ~1000-fold reduction in cellular mtDNA content during human germ cell development (38); this phase is followed by a period of intense proliferation and migration when germ cells migrate to form the developing gonad (39). This process depends on oxidative phosphorylation and is accompanied by a massive increase


in mtDNA levels (38). Under these conditions, selection against variants that compromise mitochondrial ATP synthesis will occur. On the other hand, variants that promote mtDNA replication will have an advantage, potentially explaining the preferential transmission of specific D-loop variants. Subtle selective pressures will have maximal impact at this time, so the nuclear genetic influence we observed will most likely come into play during this critical period of development. Rather than being a direct analysis of the germ line, our study was based on whole-blood DNA, so it is possible that tissue-specific differences in heteroplasmy come into play. However, our analysis indicates that, at the


Fig. 6. Characteristics of heteroplasmic variants in the nuclear ancestry and mtDNA ancestry matched and mismatched groups. (A) Schematic showing how individuals with matched (MG; red border) and mismatched (MMG; green border) nuclear and mtDNA genomes arise over generations. Red and gray mtDNAs represent two different hypothetical populations. (B) (I) Projection of the nuclear genotypes at common SNPs onto the two leading principal components (PC1 and PC2) computed with the 1000 Genomes dataset, with individuals colored by their assigned nuclear ancestry: Asian (blue), African (green), European (red), and other (orange). (II to IV) Individuals represented by blue, green, and red symbols are also shown in panels II, III, and IV, respectively, where they are colored by their mitochondrial ancestries. Stars indicate that the mitochondrial ancestry does not match the nuclear ancestry. (C) Proportion of haplogroup-defining variants in the matched group (MG) and mismatched group (MMG) in 9631 mtDNA sequences from unrelated individuals. The expected proportion is shown at left. Distinct heteroplasmic sites were more likely to affect known haplogroup-defining variants (25) than the rest of the mitochondrial genome compared with that expected by chance (P < 2.2 × 10^−16, Fisher’s exact test). This bias was stronger in the mismatched group than the matched group (P = 0.001, Fisher’s exact test). (D) Heatmaps showing the density of observed heteroplasmic mtDNA haplogroup-specific variants in the observation (left) and validation (right) datasets. The matched (top) and mismatched (bottom) groups are broken down by the nuclear ancestry of the carrier and the major haplogroup of the variants. The width of each column is proportional to the number of variants defining each of the two major haplogroups (Asian and European). Within each heatmap, the height of each row is proportional to the number of individuals having each nuclear ancestry. The density of heteroplasmic variants in each cell determines its color.

[Fig. 6 graphic not recoverable from the extracted text. Surviving labels: PC1 against PC2 for all individuals and for the Asian, African, and European nuclear genomes, colored by European, African, or Asian mtDNA; haplogroup and non-haplogroup markers for expected, MG, and MMG; mean proportion of Asian and European markers observed across individuals, by nuclear genome ancestry (European, Asian), for the matched and mismatched groups in the discovery and validation datasets.]


population level, human mtDNA is influenced by selective forces acting within the female germ line and modulated by the nuclear genetic background. These forces are apparent within one generation and ensure consistency between the two independent genetic systems, shaping the current world mtDNA phylogeny.

Materials and methods

Participants, approvals, and sequence acquisition

The primary data were obtained by whole-genome sequencing (WGS) from whole-blood DNA from 13,037 individuals in the NIHR BioResource–Rare Diseases and 100,000 Genomes Project Pilot studies (table S5) (21). After quality control (QC) [see below and (9)], 12,975 samples (including 1526 mother–offspring pairs) were included in this study. For demographics, see (9). Ethical approval was provided by the East of England Cambridge South national research ethics committee under reference number 13/EE/0325. WGS was performed using the Illumina TruSeq DNA PCR-Free sample preparation kit (Illumina, Inc.) and an Illumina HiSeq 2500 sequencer, generating a mean depth of 45× (range from 34× to 72×) and greater than 15× for at least 95% of the reference human genome (fig. S8A).

Extracting mitochondrial sequences, quality control, and variant detection

WGS reads were aligned to the Genome Reference Consortium human genome build 37 (GRCh37) using Isaac Genome Alignment Software (version 01.14; Illumina, Inc.). Reads aligning to the mitochondrial genome were extracted from each BAM file and analyzed using MToolBox (v1.0) (8, 9). Variant Call Files (VCFs) and the merged VCF were normalized with bcftools and vt (40–42), and duplicated variants were dropped with vt. The final VCF was annotated using the Variant Effect Predictor (VEP) (43). Further QC was carried out as described (9). Potential DNA cross-contamination was investigated using verifyBamID (44) in the nuclear genome and mtDNA variant calls (9).

Determining matched and mismatched groups

The pairwise relatedness and nuclear ancestry were estimated using nuclear genetic markers as described (9). MtDNA haplogroup assignment was performed using HaploGrep2 (26, 45). We then compared the mtDNA phylogenetic haplogroup with the nuclear genetic ancestry in the same individual and identified three distinct groups of individuals as described in the text.

Defining previously unknown variants

Variants were considered to be previously unknown if they were absent from 1000 Genomes datasets and dbSNP and were seen in no more than one individual among 30,506 NCBI mtDNA sequences (5).
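Expressed as code, this rule is a single predicate over three lookups. The sketch below is illustrative only: the argument names and the example records are hypothetical, and how each database was actually queried is described in the supplementary methods (9), not here.

```python
def is_previously_unknown(in_1000_genomes, in_dbsnp, ncbi_carriers):
    """Stated rule: absent from both databases and seen in at most one NCBI sequence."""
    return (not in_1000_genomes) and (not in_dbsnp) and ncbi_carriers <= 1

# Hypothetical variant records: (label, in 1000 Genomes, in dbSNP, carriers
# among the 30,506 NCBI mtDNA sequences).
records = [
    ("variant_a", False, False, 0),
    ("variant_b", False, False, 1),
    ("variant_c", False, True, 0),
    ("variant_d", False, False, 2),
]
for label, in_1kg, in_dbsnp, carriers in records:
    status = "previously unknown" if is_previously_unknown(in_1kg, in_dbsnp, carriers) else "known"
    print(label, status)
```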

MtDNA mutational spectra and signature

Mutational spectra were derived from the reference and alternative alleles as described (24, 46).

Probability of maternal mtDNA transmission

We modeled the probability of transmission of heteroplasmic variants observed in the mothers using the following logistic regression model

$$\operatorname{logit} P(y_{ijl} = 1) = \alpha + \beta_1 \mathbf{1}_{j=1} + \beta_2 \mathbf{1}_{j=2} + \beta_3 \mathbf{1}_{j=3} + \beta_4 \mathbf{1}_{j=4} + \gamma w_{ijl} + \eta z_{ijl}$$

where $y_{ijl} = 1$ if the $l$th variant within mitochondrial genomic region $j$ in mother $i$ was transmitted and is zero otherwise; $j = 0, 1, 2, 3,$ or $4$ denote the coding, D-loop, rRNA, tRNA, and remainder sequences, respectively; $w_{ijl}$ is the logit of the HF of the $l$th variant within mitochondrial genomic region $j$ in mother $i$; and $z_{ijl} = 1$ if the $l$th variant within mitochondrial genomic region $j$ in mother $i$ was observed in no individuals from the 1000 Genomes datasets, dbSNP, and at most one individual among 30,506 NCBI mtDNAs; otherwise, it was equal to zero.
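To make the structure of this model concrete, the sketch below evaluates its linear predictor for a single variant. The coding region (j = 0) acts as the baseline absorbed into the intercept, each other region contributes its own coefficient, and the remaining terms are the logit of the maternal HF and the previously-unknown indicator. All coefficient values are hypothetical placeholders chosen for illustration, not the fitted estimates, and the function and parameter names are assumptions.

```python
from math import exp, log

# Region codes as defined for the model: j = 0 (coding) is the baseline.
REGIONS = {"coding": 0, "D-loop": 1, "rRNA": 2, "tRNA": 3, "remainder": 4}

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return log(p / (1.0 - p))

def transmission_probability(hf_mother, region, previously_unknown,
                             alpha, betas, gamma, eta):
    """Evaluate the logistic model; betas holds the coefficients for j = 1..4."""
    j = REGIONS[region]
    region_effect = betas[j - 1] if j > 0 else 0.0
    linear = (alpha + region_effect
              + gamma * logit(hf_mother)
              + eta * (1.0 if previously_unknown else 0.0))
    return 1.0 / (1.0 + exp(-linear))

# HYPOTHETICAL coefficients, for illustration only (not the fitted values).
params = dict(alpha=2.0, betas=(0.4, -0.9, 0.0, 0.0), gamma=1.2, eta=-0.4)
print(transmission_probability(0.05, "D-loop", False, **params))
print(transmission_probability(0.05, "rRNA", True, **params))
```

Because HF enters on the logit scale, a change from 1% to 2% shifts the predicted log-odds by roughly as much as a change from 33% to 50%, which suits proportions that span several orders of magnitude.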

Homoplasmic allele frequency in the population and heteroplasmic variants

We fitted a logistic regression model to explore the relationship between the homoplasmic allele frequency in the general population and the rate at which individuals who are not homoplasmic for the alternate allele are heteroplasmic (9).

Selecting haplogroup-specific variants on the mtDNA phylogenetic tree

We extracted 4476 single-nucleotide variants (SNVs) present on the mtDNA phylogenetic tree (26), then focused on SNVs either present in only one superpopulation (European, Asian, or African) or present in two superpopulations but commonly seen in one population (>1%) and not seen (or seen extremely rarely) in the other population in 17,520 mtDNAs (5). This selected 2641 haplogroup-specific variants, including 426 African variants, 1275 Asian variants, and 940 European variants.

Nuclear genome ancestry and mtDNA heteroplasmic variants

We modeled the presence or absence of a heteroplasmic variant in a particular individual using logistic regression. We considered only the 2215 mtDNA variants from more than 4000 haplogroup-specific variants that are present exclusively in European or Asian branches of the world mtDNA phylogeny (26), as this allows unambiguous assignment of mitochondrial ancestry to each variant. To avoid the potential for bias induced by recent shared ancestry between individuals, we considered only the 9385 unrelated Asian or European individuals in matched and mismatched groups. We fitted the following logistic regression model

$$\operatorname{logit} P(y_{ij} = 1) = \alpha + \beta x_j + \gamma v_j + \eta \mathbf{1}_{z_i = x_j} + \omega \mathbf{1}_{x_j = w_i \cap z_i = w_i} + \psi \mathbf{1}_{x_j = w_i \cap z_i \neq w_i}$$

where $y_{ij} = 1$ if variant $j$ is heteroplasmic in individual $i$ and is zero otherwise; $x_j = 1$ if the ancestry of variant $j$ is European and is zero otherwise; $v_j$ is the logit of the homoplasmic allele frequency of variant $j$ in 30,506 NCBI samples; $z_i = 0$, 1, or 2 depending on whether the mitochondrial ancestry of individual $i$ is Asian, European, or African, respectively; and $w_i = 1$ if the nuclear ancestry of individual $i$ is European and is zero otherwise. The indicator variable $\mathbf{1}$ evaluates to 1 if the conditions in its subscript are met and to zero otherwise.
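The three indicator terms are easier to follow once written out for one individual and one variant. The sketch below builds them from the definitions in the text and evaluates the linear predictor. The coefficient values are hypothetical placeholders, and the dictionary keys and function names are chosen for readability rather than taken from the paper.

```python
import math

# Coding of mitochondrial ancestry z as given in the text.
MITO_CODE = {"Asian": 0, "European": 1, "African": 2}


def design_terms(variant_european: bool, mito_ancestry: str, nuclear_european: bool):
    """Return x and the three indicator terms for one (individual, variant) pair."""
    x = 1 if variant_european else 0        # ancestry of the variant
    z = MITO_CODE[mito_ancestry]            # mitochondrial ancestry of the individual
    w = 1 if nuclear_european else 0        # nuclear ancestry of the individual
    return {
        "x": x,
        # Variant ancestry matches the individual's mitochondrial ancestry.
        "mito_matches_variant": int(z == x),
        # Variant matches nuclear ancestry, and mtDNA matches it too.
        "nuclear_and_mito_match": int(x == w and z == w),
        # Variant matches nuclear ancestry, but mtDNA does not (mismatched).
        "nuclear_match_mito_mismatch": int(x == w and z != w),
    }


def heteroplasmy_probability(terms, v, alpha, beta, gamma, eta, omega, psi):
    """Inverse-logit of the linear predictor; v is the logit allele frequency."""
    lp = (alpha + beta * terms["x"] + gamma * v
          + eta * terms["mito_matches_variant"]
          + omega * terms["nuclear_and_mito_match"]
          + psi * terms["nuclear_match_mito_mismatch"])
    return 1.0 / (1.0 + math.exp(-lp))


if __name__ == "__main__":
    # Asian-specific variant in a person with European mtDNA and Asian nuclear ancestry.
    print(design_terms(False, "European", False))
```

In this parameterization the last term isolates the case of interest, namely a variant whose ancestry agrees with the nuclear genome while the mitochondrial genome it arose on comes from a different population.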

Validation dataset

We repeated the nuclear-mtDNA ancestry analysis in 42,799 WGS from whole-blood DNA in the Genomics England 100,000 Genomes Rare Disease Main Programme, aligned to GRCh37 and/or hg38 using the same bioinformatics pipeline. See (9) for details.

REFERENCES AND NOTES

1. S. B. Vafai, V. K. Mootha, Mitochondrial disorders as windows into an ancient organelle. Nature 491, 374–383 (2012). doi: 10.1038/nature11707; pmid: 23151580

2. D. C. Wallace, Mitochondrial DNA variation in human radiation and disease. Cell 163, 33–38 (2015). doi: 10.1016/j.cell.2015.08.067; pmid: 26406369

3. J. B. Stewart, P. F. Chinnery, The dynamics of mitochondrial DNA heteroplasmy: Implications for human health and disease. Nat. Rev. Genet. 16, 530–542 (2015). doi: 10.1038/nrg3966; pmid: 26281784

4. P. Soares et al., Correcting for purifying selection: An improved human mitochondrial molecular clock. Am. J. Hum. Genet. 84, 740–759 (2009). doi: 10.1016/j.ajhg.2009.05.001; pmid: 19500773

5. W. Wei, A. Gomez-Duran, G. Hudson, P. F. Chinnery, Correction: Background sequence characteristics influence the occurrence and severity of disease-causing mtDNA mutations. PLOS Genet. 14, e1007364 (2018). doi: 10.1371/journal.pgen.1007364; pmid: 29727451

6. B. Cavadas et al., Fine Time Scaling of Purifying Selection on Human Nonsynonymous mtDNA Mutations Based on the Worldwide Population Tree and Mother-Child Pairs. Hum. Mutat. 36, 1100–1111 (2015). doi: 10.1002/humu.22849; pmid: 26252938

7. E. Ruiz-Pesini, D. Mishmar, M. Brandon, V. Procaccio, D. C. Wallace, Effects of purifying and adaptive selection on regional variation in human mtDNA. Science 303, 223–226 (2004). doi: 10.1126/science.1088434; pmid: 14716012

8. C. Calabrese et al., MToolBox: A highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing. Bioinformatics 30, 3115–3117 (2014). doi: 10.1093/bioinformatics/btu483; pmid: 25028726

9. See supplementary materials and methods.

10. M. Gerstung, E. Papaemmanuil, P. J. Campbell, Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014). doi: 10.1093/bioinformatics/btt750; pmid: 24443148

11. M. Gerstung et al., Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat. Commun. 3, 811 (2012). doi: 10.1038/ncomms1814; pmid: 22549840

12. M. Wachsmuth, A. Hübner, M. Li, B. Madea, M. Stoneking, Age-Related and Heteroplasmy-Related Variation in Human mtDNA Copy Number. PLOS Genet. 12, e1005939 (2016). doi: 10.1371/journal.pgen.1005939; pmid: 26978189

13. B. Rebolledo-Jaramillo et al., Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc. Natl. Acad. Sci. U.S.A. 111, 15474–15479 (2014). doi: 10.1073/pnas.1409328111; pmid: 25313049

14. M. Li et al., Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: Support for a variable-size bottleneck. Genome Res. 26, 417–426 (2016). doi: 10.1101/gr.203216.115; pmid: 26916109

15. P. F. Chinnery et al., The inheritance of mitochondrial DNA heteroplasmy: Random drift, selection or both? Trends Genet. 16, 500–505 (2000). doi: 10.1016/S0168-9525(00)02120-X; pmid: 11074292


16. D. M. Kirby, S. G. Kahler, M. L. Freckmann, D. Reddihough, D. R. Thorburn, Leigh disease caused by the mitochondrial DNA G14459A mutation in unrelated families. Ann. Neurol. 48, 102–104 (2000). doi: 10.1002/1531-8249(200007)48:1<102:AID-ANA15>3.0.CO;2-M; pmid: 10894222

17. J. M. Shoffner et al., Leber’s hereditary optic neuropathy plus dystonia is caused by a mitochondrial DNA point mutation. Ann. Neurol. 38, 163–169 (1995). doi: 10.1002/ana.410380207; pmid: 7654063

18. I. J. Holt, A. E. Harding, R. K. Petty, J. A. Morgan-Hughes, A new mitochondrial disease associated with mitochondrial DNA heteroplasmy. Am. J. Hum. Genet. 46, 428–433 (1990). pmid: 2137962

19. Y. Tatuch et al., Heteroplasmic mtDNA mutation (T→G) at 8993 can cause Leigh disease when the percentage of abnormal mtDNA is high. Am. J. Hum. Genet. 50, 852–858 (1992). pmid: 1550128

20. I. J. Wilson et al., Mitochondrial DNA sequence characteristics modulate the size of the genetic bottleneck. Hum. Mol. Genet. 25, 1031–1041 (2016). doi: 10.1093/hmg/ddv626; pmid: 26740552

21. The NIHR BioResource, on behalf of the 100,000 Genomes Project, Whole-genome sequencing of rare disease patients in a national healthcare system. bioRxiv 507244 [Preprint]. 1 January 2019. doi: 10.1101/507244

22. G. S. Gorman et al., Prevalence of nuclear and mitochondrial DNA mutations related to adult mitochondrial disease. Ann. Neurol. 77, 753–759 (2015). doi: 10.1002/ana.24362; pmid: 25652200

23. H. R. Elliott, D. C. Samuels, J. A. Eden, C. L. Relton, P. F. Chinnery, Pathogenic mitochondrial DNA mutations are common in the general population. Am. J. Hum. Genet. 83, 254–260 (2008). doi: 10.1016/j.ajhg.2008.07.004; pmid: 18674747

24. Y. S. Ju et al., Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014). doi: 10.7554/eLife.02935; pmid: 25271376

25. Y. Matsuda, I. Hanasaki, R. Iwao, H. Yamaguchi, T. Niimi, Estimation of diffusive states from single-particle trajectory in heterogeneous medium using machine-learning methods. Phys. Chem. Chem. Phys. 20, 24099–24108 (2018). doi: 10.1039/C8CP02566E; pmid: 30204178

26. M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009). doi: 10.1002/humu.20921; pmid: 18853457

27. H. J. Muller, The relation of recombination to mutational advance. Mutat. Res. 106, 2–9 (1964). doi: 10.1016/0027-5107(64)90047-8; pmid: 14195748

28. Y. Shi et al., Mitochondrial transcription termination factor 1 directs polar replication fork pausing. Nucleic Acids Res. 44, 5732–5742 (2016). doi: 10.1093/nar/gkw302; pmid: 27112570

29. M. Falkenberg et al., Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat. Genet. 31, 289–294 (2002). doi: 10.1038/ng909; pmid: 12068295

30. T. J. Nicholls, M. Minczuk, In D-loop: 40 years of mitochondrial 7S DNA. Exp. Gerontol. 56, 175–181 (2014). doi: 10.1016/j.exger.2014.03.027; pmid: 24709344

31. H. Ma, P. H. O’Farrell, Selfish drive can trump function when animal mitochondrial genomes compete. Nat. Genet. 48, 798–802 (2016). doi: 10.1038/ng.3587; pmid: 27270106

32. H. Ma et al., Metabolic rescue in pluripotent cells from patients with mtDNA disease. Nature 524, 234–238 (2015). doi: 10.1038/nature14546; pmid: 26176921

33. L. A. Hyslop et al., Towards clinical application of pronuclear transfer to prevent mitochondrial DNA disease. Nature 534, 383–386 (2016). doi: 10.1038/nature18303; pmid: 27281217

34. M. Yamada et al., Genetic Drift Can Compromise Mitochondrial Replacement by Nuclear Transfer in Human Oocytes. Cell Stem Cell 18, 749–754 (2016). doi: 10.1016/j.stem.2016.04.001; pmid: 27212703

35. E. Kang et al., Mitochondrial replacement in human oocytes carrying pathogenic mitochondrial DNA mutations. Nature 540, 270–275 (2016). doi: 10.1038/nature20592; pmid: 27919073

36. P. F. Chinnery, D. C. Samuels, Relaxed replication of mtDNA: A model with implications for the expression of disease. Am. J. Hum. Genet. 64, 1158–1165 (1999). doi: 10.1086/302311; pmid: 10090901

37. P. F. Chinnery, D. C. Samuels, J. Elson, D. M. Turnbull, Accumulation of mitochondrial DNA mutations in ageing, cancer, and mitochondrial disease: Is there a common mechanism? Lancet 360, 1323–1325 (2002). doi: 10.1016/S0140-6736(02)11310-9; pmid: 12414225

38. V. I. Floros et al., Segregation of mitochondrial DNA heteroplasmy through a developmental genetic bottleneck in human embryos. Nat. Cell Biol. 20, 144–151 (2018). doi: 10.1038/s41556-017-0017-8; pmid: 29335530

39. M. Ginsburg, M. H. L. Snow, A. McLaren, Primordial germ cells in the mouse embryo during gastrulation. Development 110, 521–528 (1990). pmid: 2133553

40. H. Li, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011). doi: 10.1093/bioinformatics/btr509; pmid: 21903627

41. H. Li et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). doi: 10.1093/bioinformatics/btp352; pmid: 19505943

42. A. Tan, G. R. Abecasis, H. M. Kang, Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015). doi: 10.1093/bioinformatics/btv112; pmid: 25701572

43. W. McLaren et al., The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016). doi: 10.1186/s13059-016-0974-4; pmid: 27268795

44. G. Jun et al., Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012). doi: 10.1016/j.ajhg.2012.09.004; pmid: 23103226

45. H. Weissensteiner et al., HaploGrep 2: Mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016). doi: 10.1093/nar/gkw233; pmid: 27084951

46. L. B. Alexandrov et al., Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). doi: 10.1038/nature12477; pmid: 23945592

47. W. J. Kent et al., The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002). doi: 10.1101/gr.229102; pmid: 12045153

48. M. D. Hendy, M. D. Woodhams, A. Dodd, Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate. Biol. Lett. 5, 397–400 (2009). doi: 10.1098/rsbl.2009.0104; pmid: 19324622

49. MITOMAP: A Human Mitochondrial Genome Database (2008); www.mitomap.org.

ACKNOWLEDGMENTS

We gratefully acknowledge the patients, families, and health care professionals involved in the NIHR BioResource–Rare Diseases and 100,000 Genomes projects. We thank N. S. Jones for critical comments on an early draft of the manuscript.

Funding: This study makes use of data generated by the NIHR BioResource and the Genomics England Rare Diseases pilot projects. Genotype and phenotype data of both projects are part of the 100,000 Genomes Project. The main source of funding for the BioResource and Genomics England is provided by the National Institute for Health Research of England (NIHR) (www.nihr.ac.uk). This work was also made possible by funding from the UK Medical Research Council (MRC) to create the UK Clinical Genomics Data Centre. P.F.C. is a Wellcome Trust Principal Research Fellow (101876/Z/13/Z and 212219/Z/18/Z) and a NIHR Senior Investigator who receives support from the Medical Research Council Mitochondrial Biology Unit (MC_UU_00015/9), the Evelyn Trust, and the NIHR Biomedical Research Centre based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge. W.H.O. is a NIHR Senior Investigator, and his laboratory receives support from the British Heart Foundation, Bristol-Myers Squibb, European Commission, MRC, NHS Blood and Transplant, Rosetrees Trust, and the NIHR Biomedical Research Centre based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge. M.C. is an NIHR Senior Investigator and is funded by the NIHR Biomedical Research Centre at St Bartholomew’s Hospital. Je.T., Jo.T., S.P., and A.O.M.W. are funded by the NIHR Biomedical Research Centre, Oxford. This work was supported in part by Wellcome Trust grant 090532/Z/09/Z. R.H. is funded by Wellcome Trust grants 201064/Z/16/Z, 109915/Z/15/Z, and 203105/Z/16/Z; MRC UK grant MR/N025431/1; ERC grant 309548; and Newton Fund MR/N027302/1. J.S. is funded by MRC UK grant MR/M012212/1. A.M., G.A., and A.W. are funded by the Moorfields Eye Charity. G.A. and A.W. are funded by the RP Fighting Blindness. All Moorfields Eye Hospital and Institute of Ophthalmology authors are funded by the UCL Institute of Ophthalmology and Moorfields NIHR Biomedical Resource Centre. The Bristol NIHR Biomedical Research Centre provided infrastructure for BioResource activities in Bristol. Additional NIHR Biomedical Research Centres that contributed include Imperial College Healthcare NHS Trust BRC, Guy’s and St Thomas’ NHS Foundation Trust and King’s College London BRC. The authors listed also represent NephroS, the UK study of Nephrotic Syndrome. A.L. is a British Heart Foundation Senior Basic Science Research Fellow (FS/13/48/30453). D.L.B., A.C.T., N.V.Z., and M.I.M. are members of the DOLORisk consortium funded by the European Commission Horizon 2020 (ID633491). A.C.T. is a member of the International Diabetic Neuropathy Consortium and the Novo Nordisk Foundation (NNF14SA0006). D.L.B. is a Wellcome clinical scientist (202747/Z/16/Z). A.R.W. is supported by the NIHR-BRC of UCL Institute of Ophthalmology and Moorfields Eye Hospital. I.R. and E.L. are supported by the NIHR Translational Research Collaboration–Rare Diseases. H.J.B. works for the Netherlands CardioVascular Research Initiative (CVON). T.K.B. is sponsored by the NHSBT and British Society of Haematology. K.G.C.S. holds a Wellcome Investigator Award, MRC Programme Grant (MR/L019027/1). M.I.M. is a Wellcome Senior Investigator (supported by Wellcome grants 090532 and 0938381). P.H.D. receives funding from ICP Support. H.S.M. receives support from BHF Programme grant RG/16/4/32218. N.C. is partially funded by Imperial College NIHR BRC. M.R.W. holds a NIHR award to the NIHR Imperial Clinical Research Facility at Imperial College Healthcare NHS Trust. P.Y.W.M. is supported by grants from MRC UK (G1002570), Fight for Sight (1570/1571 and 24TP171), and NIHR (IS-BRC-1215-20002). R.H. is a Wellcome Trust Investigator (109915/Z/15/Z) who receives support from the Wellcome Centre for Mitochondrial Research (203105/Z/16/Z), Medical Research Council (UK) (MR/N025431/1), the European Research Council (309548), the Wellcome Trust Pathfinder Scheme (201064/Z/16/Z), the Newton Fund (UK/Turkey, MR/N027302/1), and the European Union H2020–Research and Innovation Actions (SC1-PM-03-2017, Solve-RD). K.F. and C.V.G. were supported by the Research Council of the University of Leuven (BOF KU Leuven, Belgium; OT/14/098). J.S.W. is funded by the Wellcome Trust (107469/Z/15/Z) and the NIHR Cardiovascular Biomedical Research Unit at Royal Brompton and Harefield NHS Foundation Trust and Imperial College London. G.A. is funded by NIHR-Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology, Fight for Sight (UK) Early Career Investigator Award, Moorfields Eye Hospital Special Trustees, Moorfields Eye Charity, Foundation Fighting Blindness (USA), and Retinitis Pigmentosa Fighting Blindness. M.C.S. holds a MRC Clinical Research Training Fellowship (grant MR/R002363/1). M.A.Ku. holds a NIHR Research Professorship NIHR-RP-2016-07-019 and Wellcome Intermediate Fellowship 098524/Z/12/A. J.Whi. is a recipient of a Cancer Research UK Cambridge Cancer Centre Clinical Research Training Fellowship. A.J.M. has received funding from a Medical Research Council Senior Clinical Fellowship (MR/L006340/1). D.P.G. is funded by the MRC; Kidney Research UK; and St Peters Trust for Kidney, Bladder and Prostate Research. S.A.J. is funded by Kids Kidney Research. C.L. received funding from a MRC Clinical Research Training Fellowship (MR/J011711/1). K.D. is a HSST trainee supported by Health Education England. C.Had. was funded through a Ph.D. Fellowship by the NIHR Translational Research Collaboration Rare Diseases. M.J.D. receives funding from the Wellcome Trust (WT098519MA). K.J.M. is supported by the Northern Counties Kidney Research Fund. Some of the work performed by E.L.M. was carried out at University College London Hospitals/University College London, which received a proportion of funding from the Department of Health’s NIHR Biomedical Research Centres funding scheme. K.C.G. is a holder of NIHR–BRC funding. P.L.B. is a NIHR Senior Investigator. This research was partly funded by the NIHR Great Ormond Street Hospital Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health.

Author contributions: Study design: P.F.C., F.L.R., M.C., W.H.O., E.T., and W.W. Data analysis: E.T., W.W., S.T., and M.J.K. Writing: P.F.C., E.T., and W.W. Experimental and analytical supervision: P.F.C. and E.T. Project supervision: P.F.C. and E.T. The remaining authors contributed to the recruitment of participants, sample logistics, and initial data preparation.

Competing interests: M.I.M. serves on advisory panels for Pfizer, NovoNordisk, and Zoe Global; has received honoraria from Pfizer, NovoNordisk, and Eli Lilly; has stock options in Zoe Global; and has received research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. T.J.A. has received consultancy payments from AstraZeneca within the past 5 years and has received speaker honoraria from Illumina. K.J.M. previously received funding for research from and is currently on the scientific advisory board of Gemini Therapeutics. M.C.S. received travel and accommodation funds from NovoNordisk. D.M.L. serves on advisory boards for Agios, Novartis, and Cerus. A.M.K. had no competing interests at the time of the study but after study completion received an educational grant from CSL Behring to attend the ISTH meeting in Berlin in 2017. C.V.G. holds the Bayer and Norbert Heimburger (CSL Behring) chairs.

Data and materials availability: Heteroplasmy data for the mother–child pairs are provided in data S1. Whole-genome sequence data from the NIHR BioResource–Rare Diseases project can be found in the European Genome-phenome Archive (EGA) at the EMBL European Bioinformatics Institute (BPD: EGAD00001004519, CSVD: EGAD00001004513, HCM: EGAD00001004514, ICP: EGAD00001004515, IRD: EGAD00001004520, MPMT: EGAD00001004521, NDD: EGAD00001004522, NPD: EGAD00001004516, PAH: EGAD00001004525, PID: EGAD00001004523, PMG: EGAD00001004517, SMD: EGAD00001004524, SRNS: EGAD00001004518; see table S5 for the disease abbreviations). Whole-genome sequence data from the UK Biobank samples are available through a data release process overseen by UK Biobank (www.ukbiobank.ac.uk/). Whole-genome sequence data from the participants enrolled in 100,000 Genomes Project can be accessed via Genomics England Limited following the procedure outlined at www.genomicsengland.co.uk/about-gecip/joining-research-community/.

SUPPLEMENTARY MATERIALS

science.sciencemag.org/content/364/6442/eaau6520/suppl/DC1
Materials and Methods
Collaborator Names and Affiliations
Figs. S1 to S15
Tables S1 to S5
References (50–58)
Data S1

3 July 2018; resubmitted 22 January 2019
Accepted 3 April 2019
10.1126/science.aau6520


Germline selection shapes human mitochondrial DNA diversity

Wei Wei, Salih Tuna, Michael J. Keogh, Katherine R. Smith, Timothy J. Aitman, Phil L. Beales, David L. Bennett, Daniel P. Gale, Maria A. K. Bitner-Glindzicz, Graeme C. Black, Paul Brennan, Perry Elliott, Frances A. Flinter, R. Andres Floto, Henry Houlden, Melita Irving, Ania Koziell, Eamonn R. Maher, Hugh S. Markus, Nicholas W. Morrell, William G. Newman, Irene Roberts, John A. Sayer, Kenneth G. C. Smith, Jenny C. Taylor, Hugh Watkins, Andrew R. Webster, Andrew O. M. Wilkie, Catherine Williamson, NIHR BioResource-Rare Diseases, 100,000 Genomes Project-Rare Diseases Pilot, Sofie Ashford, Christopher J. Penkett, Kathleen E. Stirrups, Augusto Rendon, Willem H. Ouwehand, John R. Bradley, F. Lucy Raymond, Mark Caulfield, Ernest Turro and Patrick F. Chinnery

Science 364 (6442), eaau6520. DOI: 10.1126/science.aau6520

Heteroplasmy incidence in mitochondrial DNA

In humans, mitochondrial DNA (mtDNA) is predominantly maternally inherited. mtDNA is under selection to prevent heteroplasmy, the transmission of multiple genetic variants into the next generation. Wei et al. explored human mtDNA sequences to determine mtDNA genome structure, selection, and transmission. Whole-genome sequencing revealed that about 45% of individuals carry heteroplasmic mtDNA sequences at levels greater than 1% of their total mtDNA. Furthermore, studies of more than 1500 mother-offspring pairs indicated that the female line selected which mtDNA variants were passed on to children. This effect was influenced by the mother's nuclear genetic background. Thus, mtDNA is under selection at specific loci in the human germ line.

Science, this issue p. eaau6520

ARTICLE TOOLS: http://science.sciencemag.org/content/364/6442/eaau6520

SUPPLEMENTARY MATERIALS: http://science.sciencemag.org/content/suppl/2019/05/22/364.6442.eaau6520.DC1

RELATED CONTENT: http://stm.sciencemag.org/content/scitransmed/4/118/118ra10.full
http://stm.sciencemag.org/content/scitransmed/8/326/326rv3.full

REFERENCES: This article cites 58 articles, 8 of which you can access for free. http://science.sciencemag.org/content/364/6442/eaau6520#BIBL

PERMISSIONS: http://www.sciencemag.org/help/reprints-and-permissions

Use of this article is subject to the Terms of Service.

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science is a registered trademark of AAAS.

Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
