Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females

G. David Poznik et al., Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. Science 341, 562 (2013). DOI: 10.1126/science.1237619

This copy is for your personal, non-commercial use only.

The following resources related to this article are available online at www.sciencemag.org (this information is current as of August 7, 2013):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/341/6145/562.full.html

Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2013/08/01/341.6145.562.DC1.html

A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/341/6145/562.full.html#related

This article cites 46 articles, 22 of which can be accessed free: http://www.sciencemag.org/content/341/6145/562.full.html#ref-list-1

This article has been cited by 1 article hosted by HighWire Press; see: http://www.sciencemag.org/content/341/6145/562.full.html#related-urls

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2013 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.



Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females

G. David Poznik,1,2 Brenna M. Henn,3,4 Muh-Ching Yee,3 Elzbieta Sliwerska,5 Ghia M. Euskirchen,3 Alice A. Lin,6 Michael Snyder,3 Lluis Quintana-Murci,7,8 Jeffrey M. Kidd,3,5 Peter A. Underhill,3 Carlos D. Bustamante3*

The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (TMRCA) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome TMRCA to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

The Y chromosome contains the longest stretch of nonrecombining DNA in the human genome and is therefore a powerful tool with which to study human history. Estimates of the time to the most recent common ancestor (TMRCA) of the Y chromosome have differed by a factor of about 2 from TMRCA estimates for the mitochondrial genome. Y-chromosome coalescence time has been estimated in the range of 50 to 115 thousand years (ky) (1–3), although larger values have been reported (4, 5), whereas estimates for mitochondrial DNA (mtDNA) range from 150 to 240 ky (3, 6, 7). However, the quality and quantity of data available for these two uniparental loci have differed substantially. Whereas the complete mitochondrial genome has been resequenced thousands of times (6, 8), fully sequenced diverse Y chromosomes have only recently become available. Previous estimates of the Y-chromosome TMRCA relied on short resequenced segments, rapidly mutating microsatellites, or single-nucleotide polymorphisms (SNPs) ascertained in a small panel of individuals and then genotyped in a global panel. These approaches likely underestimate genetic diversity and, consequently, TMRCA (9).

We sequenced the complete Y chromosomes of 69 males from seven globally diverse populations of the Human Genome Diversity Panel (HGDP) and two additional African populations: San (Bushmen) from Namibia, Mbuti Pygmies from the Democratic Republic of Congo, Baka Pygmies and Nzebi from Gabon, Mozabite Berbers from Algeria, Pashtuns (Pathan) from Pakistan, Cambodians, Yakut from Siberia, and Mayans from Mexico (fig. S1). Individuals were selected without regard to their Y-chromosome haplogroups.

The Y-chromosome reference sequence is 59.36 Mb, but this includes a 30-Mb stretch of constitutive heterochromatin on the q arm, a 3-Mb centromere, 2.65-Mb and 330-kb telomeric pseudoautosomal regions (PAR) that recombine with the X chromosome, and eight smaller gaps. We mapped reads to the remaining 22.98 Mb of assembled reference sequence, which consists of three sequence classes defined by their complexity and degree of homology to the X chromosome (10): X-degenerate, X-transposed, and ampliconic. Both the high degree of self-identity within the ampliconic tracts and the X-chromosome homology of the X-transposed region render portions of the Y chromosome ill suited for short-read sequencing. To address this, we constructed filters that reduced the data to 9.99 million sites (11) (Fig. 1 and fig. S2). We then implemented a haploid model expectation-maximization algorithm to call genotypes (11).

1Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA, USA.
2Department of Statistics, Stanford University, Stanford, CA, USA.
3Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
4Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA.
5Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
6Department of Psychiatry, Stanford University, Stanford, CA, USA.
7Institut Pasteur, Unit of Human Evolutionary Genetics, 75015 Paris, France.
8Centre National de la Recherche Scientifique, URA3012, 75015 Paris, France.

*Corresponding author. E-mail: [email protected]

[Figure 1: two tracks plotted against position (Mb, ticks 2 to 29): filtered depth EWMA (axis 0 to 500) and (MQ0 / unfiltered depth) EWMA (axis 0.0 to 1.0). Legend: depth filter, MQ0 ratio filter, exclusion mask, inclusion mask, compatible site, incompatible site. An ideogram spanning 0 Mb to 59.36 Mb marks X-degenerate, X-transposed, ampliconic, heterochromatic, pseudoautosomal, and other sequence.]

Fig. 1. Callability mask for the Y chromosome. Exponentially weighted moving averages of read depth (blue line) and the proportion of reads mapping ambiguously (MQ0 ratio; violet line) versus physical position. Regions with values outside the envelopes defined by the dashed lines (depth) or dotted lines (MQ0) were flagged (blue and violet boxes) and merged for exclusion (gray boxes). The complement (black boxes) defines the regions within which reliable genotype calls can be made. Below, a scatter plot indicates the positions of all observed SNVs. Those incompatible with the inferred phylogenetic tree (red) are uniformly distributed. The X-degenerate regions yield quality sequence data, ampliconic sequences tend to fail both filters, and mapping quality is poor in the X-transposed region.
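The filtering idea in Fig. 1 (smooth depth and MQ0 ratio with an exponentially weighted moving average, then exclude regions that leave an envelope) can be sketched as follows. This is an illustration only, not the authors' pipeline: the window values, the smoothing factor `alpha`, the depth bounds, and the single upper bound on the MQ0 ratio are all hypothetical choices made for the example.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average of a sequence."""
    smoothed = []
    current = None
    for v in values:
        # First value seeds the average; later values are blended in.
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def callable_windows(depth, mq0_ratio, depth_bounds, mq0_max, alpha=0.5):
    """Return one boolean per window: True where both smoothed tracks pass."""
    lo, hi = depth_bounds
    smoothed_depth = ewma(depth, alpha)
    smoothed_mq0 = ewma(mq0_ratio, alpha)
    return [lo <= d <= hi and m <= mq0_max
            for d, m in zip(smoothed_depth, smoothed_mq0)]


# Toy per-window values (hypothetical, not taken from the paper).
depth = [200, 210, 205, 900, 950, 220, 200, 195]
mq0 = [0.01, 0.02, 0.01, 0.60, 0.70, 0.05, 0.02, 0.01]
mask = callable_windows(depth, mq0, depth_bounds=(100, 300), mq0_max=0.1)
```

Because the average carries memory, windows just after a high-depth, ambiguously mapped stretch are also excluded until the smoothed values fall back inside the envelope.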

We identified 11,640 single-nucleotide variants (SNVs) (fig. S3). A total of 2293 (19.7%) are present in dbSNP (v135), and we assigned haplogroups on the basis of the 390 (3.4%) present in the International Society of Genetic Genealogy (ISOGG) database (12) (fig. S4). At SNVs, median haploid coverage was 3.1x (interquartile range 2.6 to 3.8x) (table S1 and fig. S5), and sequence validation suggests a genotype calling error rate on the order of 0.1% (11).

Because mutations accumulate over time along a single lengthy haplotype (13), the male-specific region of the Y chromosome provides power for phylogenetic inference. We constructed a maximum likelihood tree from 11,640 SNVs using the Tamura-Nei nucleotide substitution model (Fig. 2) and, in agreement with (14), observe strong bootstrap support (500 replicates) for the major haplogroup branching points. The tree both recapitulates and adds resolution to the previously inferred Y-chromosome phylogeny (fig. S6), and it characterizes branch lengths free of ascertainment bias. We identify extraordinary depth within Africa, including lineages sampled from the San hunter-gatherers that coalesce just short of the root of the entire tree. This stands in contrast to a tree from autosomal SNP genotypes (15), wherein African branches were considerably shorter than others; genotyping arrays primarily rely on SNPs ascertained in European populations and therefore undersample diversity within Africa. Two regions of reduced branch length in our tree correspond to rapid expansions: the out-of-Africa event (downstream of F-M89) and the agriculture-catalyzed Bantu expansions (downstream of E-M2). Among the three hunter-gatherer populations, we find a relatively high number of B2 lineages. Within this haplogroup, six Baka B-M192 individuals form a distinct clade that does not correspond to extant definitions (11) (fig. S7). We estimate this previously uncharacterized structure to have arisen ~35 thousand years ago (kya).

We resolve the polytomy of the Y macro-haplogroup F (16) by determining the branching order of haplogroups G, H, and IJK (Fig. 2 and fig. S6). We identified a single variant (rs73614810, a C→T transition dubbed “M578”) for which haplogroup G retains the ancestral allele, whereas its brother clades (H and IJK) share the derived allele. Genotyping M578 in a diverse panel confirmed the finding (table S2). We thereby infer more recent common ancestry between hgH and hgIJK than between either and hgG. M578 defines an early diversification episode of the Y phylogeny in Eurasia (11).

[Figure 2: phylogenetic tree drawn against a horizontal scale from 0 to 1200. Leaves are labeled with the most derived mutation and the population (for example, A-P28 San, B-M192 Baka, E-P277 Nzebi, Q-M3 Maya); internal branches are labeled with defining variants such as BT-M42, CT-M168, F-M89, and HIJK-M578. Side labels group the leaves into haplogroups A, B, E, and FT (non-African).]

Fig. 2. Y-chromosome phylogeny inferred from genomic sequencing. This tree recapitulates the previously known topology of the Y-chromosome phylogeny; however, branch lengths are now free of ascertainment bias. Branches are drawn proportional to the number of derived SNVs. Internal branches are labeled with defining ISOGG variants inferred to have arisen on the branch. Leaves are colored by major haplogroup cluster and labeled with the most derived mutation observed and the population from which the individual was drawn. Previously uncharacterized structure within African hgB2 is indicated in orange. (Inset) Resolution of a polytomy was possible through the identification of a variant for which hgG retains the ancestral allele, whereas hgH and hgIJK share the derived allele.

To account for missing genotypes, we assigned each SNV to the root of the smallest subtree containing all carriers of one allele or the other and inferred that the allele specific to the subtree was derived (fig. S8). We used the chimpanzee Y-chromosome sequence to polarize 398 variants assigned to the deepest split, a task complicated by substantial structural divergence (11, 17).
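The subtree-assignment rule can be illustrated with a small sketch. It assumes a tree encoded as nested tuples of sample names and a dictionary of genotype calls in which `None` marks a missing call; the function names, the tree, and the genotypes are hypothetical and are not taken from the authors' code or data.

```python
def leaf_sets(tree):
    """Return (leaves of tree, leaf sets of every subtree, root included)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    leaves = frozenset()
    subtrees = []
    for child in tree:
        child_leaves, child_subtrees = leaf_sets(child)
        leaves |= child_leaves
        subtrees.extend(child_subtrees)
    subtrees.append(leaves)
    return leaves, subtrees


def assign_snv(tree, genotypes):
    """Assign an SNV to the smallest subtree holding all carriers of one allele.

    genotypes maps leaf name -> allele, or None when the call is missing.
    Returns (leaf set of the chosen subtree, allele inferred to be derived).
    """
    _, subtrees = leaf_sets(tree)
    best = None
    for allele in sorted({a for a in genotypes.values() if a is not None}):
        carriers = {leaf for leaf, a in genotypes.items() if a == allele}
        # The root contains every leaf, so a containing subtree always exists.
        smallest = min((s for s in subtrees if carriers <= s), key=len)
        if best is None or len(smallest) < len(best[0]):
            best = (smallest, allele)
    return best


# Hypothetical five-sample tree; s2 has no genotype call at this site.
tree = (((("s1", "s2"), "s3"), "s4"), "s5")
calls = {"s1": "T", "s2": None, "s3": "T", "s4": "C", "s5": "C"}
subtree, derived = assign_snv(tree, calls)
```

In this toy case the carriers of T fit inside the clade {s1, s2, s3}, whereas the carriers of C span the whole tree, so T is taken as derived on the branch leading to that clade and the uncalled sample s2 is covered by the assignment.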

We estimated the coalescence time of all Y chromosomes using both a molecular clock–based frequentist estimator and an empirical Bayes approach that uses a prior distribution of TMRCA from coalescent theory and conducts Markov chain simulation to estimate the likelihood of parameters given a set of DNA sequences (GENETREE) (11, 18) (Table 1). To directly compare the TMRCA of the Y chromosome to that of the mtDNA, we estimated their respective mutation rates by calibrating phylogeographic patterns from the initial peopling of the Americas, a recent human event with high-confidence archaeological dating.

Archaeological evidence indicates that humans first colonized the Americas ~15 kya via a rapid coastal migration that reached Monte Verde II in southern Chile by 14.6 kya (19). The two Native American Mayans represent Y-chromosome hgQ lineages, Q-M3 and Q-L54*(xM3), that likely diverged at about the same time as the initial peopling of the continents. Q is defined by the M242 mutation that arose in Asia. A descendent haplogroup, Q-L54, emerged in Siberia and is ancestral to Q-M3. Because the M3 mutation appears to be specific to the Americas (20), it likely occurred after the initial entry, and the prevalence of M3 in South America suggests that it emerged before the southward migratory wave. Consequently, the divergence between these two lineages provides an appropriate calibration point for the Y mutation rate. The large number of variants that have accumulated since divergence, 120 and 126, contrasts with the pedigree-based estimate of the Y-chromosome mutation rate, which is based on just 4 mutations (21). Using entry to the Americas as a calibration point, we estimate a mutation rate of 0.82 × 10^-9 per base pair (bp) per year [95% confidence interval (CI): 0.72 × 10^-9 to 0.92 × 10^-9/bp/year] (table S3). False negatives have minimal effect on this estimate due to the low probability, at 5.7x and 8.5x coverage, of observing fewer than two reads at a site (observed proportions: 3.1% and 0.6%) and due to the fact that the number of unobserved singletons possessed by one individual is offset by a similar number of Q doubletons unobserved in the same individual and thereby misclassified as singletons possessed by the other (11) (figs. S9 and S10). This calibration approach assumes approximate coincidence between the expansion throughout the Americas and the divergence of Q-M3 and Q-L54*(xM3), but we consider deviation from this assumption and identify a strict lower bound on the point of divergence using sequences from the 1000 Genomes Project (11). As a comparison point, we consider the out-of-Africa expansion of modern humans, which dates to approximately 50 kya (22) and yields a similar mutation rate of 0.79 × 10^-9/bp/year.
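As a rough check on the calibration arithmetic, the sketch below divides the average of the two lineage counts (120 and 126) by the number of callable sites and the calibration time. It assumes that all 9.99 million sites were callable in both individuals and that simple averaging is adequate; the full procedure is in the supplementary materials (11) and may differ. A second function gives the Poisson probability of seeing fewer than two reads at a site, which is an idealized model of coverage.

```python
import math

CALLABLE_SITES = 9.99e6      # sites passing the filters (from the text)
VARIANTS = (120, 126)        # variants accumulated on the two Q lineages
CALIBRATION_YEARS = 15_000   # approximate entry into the Americas


def mutation_rate(variant_counts, sites, years):
    """Mutations per bp per year, averaging the per-lineage counts."""
    mean_count = sum(variant_counts) / len(variant_counts)
    return mean_count / (sites * years)


def poisson_prob_fewer_than_two(mean_depth):
    """P(X < 2) for X ~ Poisson(mean_depth): a site covered by 0 or 1 reads."""
    return math.exp(-mean_depth) * (1 + mean_depth)


rate = mutation_rate(VARIANTS, CALLABLE_SITES, CALIBRATION_YEARS)
low_coverage = [poisson_prob_fewer_than_two(d) for d in (5.7, 8.5)]
```

Under these assumptions the rate comes out close to the reported 0.82 × 10^-9/bp/year. The Poisson probabilities (roughly 2% and 0.2%) are lower than the observed proportions of 3.1% and 0.6%, which would be expected if real coverage varies more than a Poisson model allows.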

We constructed an analogous pipeline for high coverage (>250x) mtDNA sequences from the 69 male samples and an additional 24 females from the seven HGDP populations (11) (fig. S11). As in the Y-chromosome analysis, we calibrated the mtDNA mutation rate using divergence within the Americas. We selected the pan-American hgA2, one of several initial founding haplogroups among Native Americans. The star-shaped phylogeny of hgA2 subclades suggests that its divergence was coincident with the rapid dispersal upon the initial colonization of the continents (23). Calibration on 108 previously analyzed hgA2 sequences (11) (fig. S12) yields a point estimate equivalent to that from our seven Mayan mtDNAs, but within a narrower confidence interval. From this within-human calibration, we estimate a mutation rate of 2.3 × 10^-8/bp/year (95% CI: 2.0 × 10^-8 to 2.5 × 10^-8/bp/year), higher than that from human-chimpanzee divergence but similar to other estimates using within-human calibration points (24, 25).
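To put the two per-site rates on a common footing, the following sketch converts each into an expected number of mutations per lineage over the ~15 ky calibration interval. The mtDNA length of 16,569 bp is an assumption (the length of the standard human mitochondrial reference sequence); the number of mtDNA sites actually analyzed in the study may differ.

```python
MT_RATE = 2.3e-8        # per bp per year (from the text)
Y_RATE = 0.82e-9        # per bp per year (from the text)
MT_LENGTH_BP = 16_569   # assumed: length of the human mtDNA reference
Y_SITES = 9.99e6        # callable Y-chromosome sites (from the text)
YEARS = 15_000          # approximate calibration time


def expected_mutations(rate, sites, years):
    """Expected mutations on one lineage: rate * sites * years."""
    return rate * sites * years


mt_expected = expected_mutations(MT_RATE, MT_LENGTH_BP, YEARS)
y_expected = expected_mutations(Y_RATE, Y_SITES, YEARS)
rate_ratio = MT_RATE / Y_RATE
```

Although the per-site mtDNA rate is nearly 30 times higher, the much longer callable Y sequence means a single Y lineage accumulates on the order of a hundred mutations over the interval, against a handful for one mtDNA lineage.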

The global TMRCA estimate for any locus constitutes an upper bound for the time of human population divergence under models without gene flow. We estimate the Y-chromosome TMRCA to be 138 ky (120 to 156 ky) and the mtDNA TMRCA to be 124 ky (99 to 148 ky) (Table 1) (11). Our mtDNA estimate is more recent than many previous studies, the majority of which used mutation rates extrapolated from between-species divergence. However, mtDNA mutation rates are subject to a time-dependent decline, with pedigree-based estimates on the faster end of the spectrum and species-based estimates on the slower. Because of this time dependency and the need to calibrate the Y and mtDNA in a comparable manner, it is more appropriate here to use within-human clade estimates of the mutation rate.
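Under a strict molecular clock, TMRCA and the mean number of mutations from root to tip are related by T = d / (μ L), where μ is the per-site rate and L the number of sites. The sketch below applies this generic relation to the Y-chromosome figures quoted in the text; it is not the frequentist estimator described in the supplementary materials (11), only a consistency check.

```python
Y_RATE = 0.82e-9    # per bp per year (from the text)
Y_SITES = 9.99e6    # callable sites (from the text)


def expected_root_to_tip(tmrca_years, rate, sites):
    """Mean mutations from root to tip implied by a given TMRCA."""
    return tmrca_years * rate * sites


def tmrca_from_mutations(mean_mutations, rate, sites):
    """Molecular-clock TMRCA in years: d / (mu * L)."""
    return mean_mutations / (rate * sites)


implied_mutations = expected_root_to_tip(138_000, Y_RATE, Y_SITES)
round_trip = tmrca_from_mutations(implied_mutations, Y_RATE, Y_SITES)
```

A TMRCA of 138 ky implies roughly 1100 mutations between the root and each tip, which is in line with the 0-to-1200 scale of the Fig. 2 tree.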

Table 1. TMRCA and Ne estimates for the Y chromosome and mtDNA. Pop., population.

                   Y chromosome                         mtDNA
Method             Pop.   n    TMRCA*          Ne       Pop.    n    TMRCA*          Ne
Molecular clock    All    69   139 (120–156)   4500†    All     93   124 (99–148)    9500†
GENETREE‡          San    6    128 (112–146)   3800     Nzebi   18   105 (91–119)    11,500
                   Baka   11   122 (106–137)   1800     Mbuti   6    121 (100–143)   3700

*Employs mutation rate estimated from within-human calibration point. Times measured in ky. †Uses Watterson’s estimator, θ̂w. ‡Each coalescent analysis restricted to a single population spanning the ancestral root (11).

[Figure 3: probability density (axis 0.000 to 0.010) versus time (ky, axis 0 to 800).]

Fig. 3. Similarity of TMRCA does not imply equivalent Ne of males and females. The TMRCA for a given locus is drawn from a predata (i.e., prior) distribution that is a function of Ne, generation time, sample size, and demographic history. Consider the distribution of possible TMRCAs for a set of 100 uniparental chromosomes. Although the Mbuti mtDNA Ne is twice as large as that of the Baka Y chromosome, the corresponding predata TMRCA distributions overlap considerably.

Rather than assume the mutation rate to be a known constant, we explicitly account for the uncertainty in its estimation by modeling each TMRCA as the ratio of two random variables. We estimate the ratio of the mtDNA TMRCA to that of the Y chromosome to be 0.90 (95% CI: 0.68 to 1.11) (fig. S13). If, as argued above, the divergence of the Y-chromosome Q lineages occurred at approximately the same time as that of the mtDNA A2 lineages, then the TMRCA ratio is invariant to the specific calibration time used. Regardless, the conclusion of parity is robust to possible discrepancy between the divergence times within the Americas (11). Using comparable calibration approaches, the Y and mtDNA coalescence times are not significantly different. This conclusion would hold whether or not an alternative approach would yield more definitive TMRCA estimates.
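The behavior of a ratio of two uncertain quantities can be explored with a small Monte Carlo sketch. It is a simplification and not the authors' model: it treats the two reported TMRCAs as independent normal variables whose standard deviations are backed out of the published 95% intervals, and the function names and seed are arbitrary.

```python
import random
import statistics


def sd_from_ci(lower, upper):
    """Standard deviation implied by a symmetric 95% normal interval."""
    return (upper - lower) / (2 * 1.96)


def simulate_ratio(num, den, n_draws=100_000, seed=1):
    """Simulate num/den for independent normal numerator and denominator.

    num and den are (point estimate, CI lower, CI upper) tuples in ky.
    Returns (median, 2.5th percentile, 97.5th percentile) of the ratio.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        t_num = rng.gauss(num[0], sd_from_ci(num[1], num[2]))
        t_den = rng.gauss(den[0], sd_from_ci(den[1], den[2]))
        ratios.append(t_num / t_den)
    ratios.sort()
    lower = ratios[int(0.025 * n_draws)]
    upper = ratios[int(0.975 * n_draws)]
    return statistics.median(ratios), lower, upper


# mtDNA TMRCA over Y-chromosome TMRCA, using the values quoted in the text.
median, lower, upper = simulate_ratio((124, 99, 148), (138, 120, 156))
```

Even under these crude assumptions the simulated ratio centers near 0.9 with an interval that spans 1, which matches the qualitative conclusion of parity.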

Our observation that the TMRCA of the Y chromosome is similar to that of the mtDNA does not imply that the effective population sizes (Ne) of males and females are similar. In fact, we observe a larger Ne in females than in males (Table 1). Although, due to its larger Ne, the distribution from which the mitochondrial TMRCA has been drawn is right-shifted with respect to that of the Y-chromosome TMRCA, the two distributions have large variances and overlap (Fig. 3).
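The overlap between the two predata distributions can be illustrated by simulating TMRCAs under a standard coalescent. The sketch assumes a constant-size haploid population and a 25-year generation time, both of which are simplifying assumptions made here; the paper's predata distribution also depends on demographic history. The Ne values are the Baka Y-chromosome and Mbuti mtDNA entries from Table 1.

```python
import random


def simulate_tmrca(ne, n_samples, generation_years, rng):
    """One TMRCA draw (in years) under a constant-size haploid coalescent.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k * (k - 1) / (2 * ne) per generation.
    """
    generations = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (2.0 * ne)
        generations += rng.expovariate(rate)
    return generations * generation_years


def tmrca_draws(ne, n_samples=100, generation_years=25, n_draws=2000, seed=7):
    """Sorted list of simulated TMRCAs for a sample of n_samples chromosomes."""
    rng = random.Random(seed)
    return sorted(simulate_tmrca(ne, n_samples, generation_years, rng)
                  for _ in range(n_draws))


baka_y = tmrca_draws(1800)     # Baka Y-chromosome Ne (Table 1)
mbuti_mt = tmrca_draws(3700)   # Mbuti mtDNA Ne (Table 1)

# Share of Baka Y draws that exceed the median Mbuti mtDNA draw.
mbuti_median = mbuti_mt[len(mbuti_mt) // 2]
overlap = sum(t > mbuti_median for t in baka_y) / len(baka_y)
```

The expected TMRCA scales as 2 Ne (1 - 1/n) generations, so doubling Ne doubles the mean, yet the long right tail of each distribution means a noticeable share of draws from the smaller population still exceed the median of the larger one.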

Dogma has held that the common ancestor of human patrilineal lineages, popularly referred to as the Y-chromosome “Adam,” lived considerably more recently than the common ancestor of female lineages, the so-called mitochondrial “Eve.” However, we conclude that the mitochondrial coalescence time is not substantially greater than that of the Y chromosome. Indeed, due to our moderate-coverage sequencing and the existence of additional rare divergent haplogroups, our analysis may yet underestimate the true Y-chromosome TMRCA.

References and Notes

1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Mol. Biol. Evol. 16, 1791–1798 (1999).
2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000).
3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Genetics 161, 447–459 (2002).
4. M. F. Hammer, Nature 378, 376–378 (1995).
5. F. Cruciani et al., Am. J. Hum. Genet. 88, 814–818 (2011).
6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Nature 408, 708–713 (2000).
7. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31–36 (1987).
8. P. A. Underhill, T. Kivisild, Annu. Rev. Genet. 41, 539–564 (2007).
9. M. A. Jobling, C. Tyler-Smith, Nat. Rev. Genet. 4, 598–612 (2003).
10. H. Skaletsky et al., Nature 423, 825–837 (2003).
11. Materials and methods are available as supplementary materials on Science Online.
12. ISOGG, International Society of Genetic Genealogy (2013); available at www.isogg.org/.
13. P. A. Underhill et al., Ann. Hum. Genet. 65, 43–62 (2001).
14. W. Wei et al., Genome Res. 23, 388–395 (2013).
15. J. Z. Li et al., Science 319, 1100–1104 (2008).
16. T. M. Karafet et al., Genome Res. 18, 830–838 (2008).
17. J. F. Hughes et al., Nature 463, 536–539 (2010).
18. R. C. Griffiths, S. Tavaré, Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994).
19. T. Goebel, M. R. Waters, D. H. O’Rourke, Science 319, 1497–1502 (2008).
20. M. C. Dulik et al., Am. J. Hum. Genet. 90, 229–246 (2012).
21. Y. Xue et al.; Asan, Curr. Biol. 19, 1453–1457 (2009).
22. R. G. Klein, Evol. Anthropol. 17, 267–281 (2008).
23. S. Kumar et al., BMC Evol. Biol. 11, 293 (2011).
24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Mol. Biol. Evol. 22, 1561–1568 (2005).
25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Mol. Biol. Evol. 26, 217–230 (2009).

Acknowledgments: We thank O. Cornejo, S. Gravel, D. Siegmund, and E. Tsang for helpful discussions; M. Sikora and H. Costa for mapping reads from Gabonese samples; and H. Cann for assistance with HGDP samples. This work was supported by National Library of Medicine training grant LM-07033 and NSF graduate research fellowship DGE-1147470 (G.D.P.); NIH grant 3R01HG003229 (B.M.H. and C.D.B.); NIH grant DP5OD009154 (J.M.K. and E.S.); and Institut Pasteur, a CNRS Maladies Infectieuses Émergentes Grant, and a Foundation Simone et Cino del Duca Research Grant (L.Q.M.). P.A.U. consulted for, P.A.U. and B.M.H. have stock in, and C.D.B. is on the advisory board of a project at 23andMe. C.D.B. is on the scientific advisory boards of Personalis, Inc.; InVitae (formerly Locus Development, Inc.); and Ancestry.com. M.S. is a scientific advisory member and founder of Personalis, a scientific advisory member for Genapsys Former, and a consultant for Illumina and Beckman Coulter Society for American Medical Pathology. B.M.H. formerly had a paid consulting relationship with Ancestry.com. Variants have been deposited to dbSNP (ss825679106–825690384). Individual level genetic data are available, through a data access agreement to respect the privacy of the participants for transfer of genetic data, by contacting C.D.B.

Supplementary Materials
www.sciencemag.org/cgi/content/full/341/6145/562/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
Data File S1
References (26–51)

11 March 2013; accepted 25 June 2013
10.1126/science.1237619

Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny

Paolo Francalacci,1* Laura Morelli,1† Andrea Angius,2,3 Riccardo Berutti,3,4 Frederic Reinier,3 Rossano Atzeni,3 Rosella Pilu,2 Fabio Busonero,2,5 Andrea Maschio,2,5 Ilenia Zara,3 Daria Sanna,1 Antonella Useli,1 Maria Francesca Urru,3 Marco Marcelli,3 Roberto Cusano,3 Manuela Oppo,3 Magdalena Zoledziewska,2,4 Maristella Pitzalis,2,4 Francesca Deidda,2,4 Eleonora Porcu,2,4,5 Fausto Poddie,4 Hyun Min Kang,5 Robert Lyons,6 Brendan Tarrier,6 Jennifer Bragg Gresham,6 Bingshan Li,7 Sergio Tofanelli,8 Santos Alonso,9 Mariano Dei,2 Sandra Lai,2 Antonella Mulas,2 Michael B. Whalen,2 Sergio Uzzau,4,10 Chris Jones,3 David Schlessinger,11 Gonçalo R. Abecasis,5 Serena Sanna,2 Carlo Sidore,2,4,5 Francesco Cucca2,4*

Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the origins of contemporary populations, but previous studies were hampered by partial genetic information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide polymorphisms, 6751 of which have not previously been observed. We constructed a MSY phylogenetic tree containing all main haplogroups found in Europe, along with many Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with archaeological data from the initial expansion of the Sardinian population ~7700 years ago. The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive timing of coalescence with other human populations. We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates.

New sequencing technologies have provided genomic data sets that can reconstruct past events in human evolution more accurately (1). Sequencing data from the male-specific portion of the Y chromosome (MSY) (2), because of its lack of recombination and low mutation, reversion, and recurrence rates, can be particularly informative for these evolutionary analyses (3, 4). Recently, high-coverage Y chromosome sequencing data from 36 males from different worldwide populations (5) assessed 6662 phylogenetically informative variants and estimated the timing of past events, including a putative coalescence time for modern humans of ~101,000 to 115,000 years ago.

MSY sequencing data reported to date still represent a relatively small number of individuals from a few populations. Furthermore, dating estimates are also affected by the calibration of the

1Dipartimento di Scienze della Natura e del Territorio, Università di Sassari, 07100 Sassari, Italy.
2Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy.
3Center for Advanced Studies, Research and Development in Sardinia (CRS4), Pula, Italy.
4Dipartimento di Scienze Biomediche, Università di Sassari, 07100 Sassari, Italy.
5Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
6DNA Sequencing Core, University of Michigan, Ann Arbor, MI 48109, USA.
7Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA.
8Dipartimento di Biologia, Università di Pisa, 56126 Pisa, Italy.
9Departamento de Genética, Antropología Física y Fisiología Animal, Universidad del País Vasco/Euskal Herriko Unibertsitatea, 48080 Bilbao, Spain.
10Porto Conte Ricerche, Località Tramariglio, Alghero, 07041 Sassari, Italy.
11Laboratory of Genetics, National Institute on Aging, Baltimore, MD 21224, USA.

*Corresponding author. E-mail: [email protected] (P.F.); [email protected] (F.C.)
†Laura Morelli prematurely passed away on 20 February 2013. This work is dedicated to her memory.

www.sciencemag.org/cgi/content/341/6145/562/DC1

Supplementary Materials for

Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females

G. David Poznik, Brenna M. Henn, Muh-Ching Yee, Elzbieta Sliwerska, Ghia M. Euskirchen, Alice A. Lin, Michael Snyder, Lluis Quintana-Murci, Jeffrey M. Kidd, Peter A. Underhill, Carlos D. Bustamante*

*Corresponding author. E-mail: [email protected]

Published 2 August 2013, Science 341, 562 (2013)

DOI: 10.1126/science.1237619

This PDF file includes:

Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
References

Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/cgi/content/full/341/6145/562/DC1)

Data File S1. Sample, phylogeny, and variant data (zipped archive).
Data File S2. Y chromosome genotype calls. To protect participant privacy, this zipped archive is available through a data access agreement (DAA) for transfer of genetic data by contacting C.D.B.
Data File S3. Y chromosome mapped sequencing reads. This BAM file is also available via the DAA described above. Mapping, quality score recalibration, and indel realignment are described in Materials and Methods.

Page 7: Sequencing y chromosomes resolves discrepancy in time to common ancestor of males versus females

2

Table of Contents

Materials and Methods
  Sequencing
  Genotypes
  Validation
  Phylogenetic Inference
  mtDNA Analysis
  Frequentist Estimation of TMRCA
  Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
  Predata Distribution of TMRCA

Supplementary Text
  Novel Y Chromosome Phylogenetic Structure
  Imputation
  Calibration and Mutation Rate Estimation
  Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
  Calibration Time
  Existence of Rare Yet More Basal Lineages
  Effective Population Size
  Additional Acknowledgements

Supplementary Figures
  Fig. S1. Map of populations.
  Fig. S2. Sequencing read mapping on Xq21.
  Fig. S3. Quality control and genotype calling on the Y chromosome.
  Fig. S4. Cross-tabulation of populations and Y haplogroups.
  Fig. S5. Call rate and mean sequencing coverage on the Y chromosome.
  Fig. S6. Y chromosome phylogenetic backbone.
  Fig. S7. Novel structure in Y hgB2.
  Fig. S8. Phylogeny-aware imputation.
  Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included.
  Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites.
  Fig. S11. mtDNA phylogeny.
  Fig. S12. mtDNA calibration tree.
  Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA.

Supplementary Tables
  Table S1. Y chromosome summary of samples.
  Table S2. M578 genotyping results.
  Table S3. Mutation rate point estimates.

Supplementary Data
  Data File S1. Sample, phylogeny, and variant data.
  Data File S2. Y chromosome genotype calls.
  Data File S3. Y chromosome mapped sequencing reads.

FTP Addresses and Accession Numbers for External Data
  Y Chromosome hgQ Sequences from the 1000 Genomes Project
  Complete mtDNA hgA2 Sequences: GenBank Accession Numbers

References and Notes


Materials and Methods

Sequencing

We prepared genomic libraries (26) from cell lines (HGDP) and blood (Gabonese), then sequenced the libraries on Illumina HiSeq 2000 machines at the Stanford Center for Genomics and Personalized Medicine. We used BWA (27) to map paired 101 bp reads to the GRCh37 human reference, removed PCR duplicates with Picard (28), and then used the Genome Analysis Toolkit (GATK) (29, 30) to recalibrate quality scores, perform local realignment around candidate indels, and compute genotype likelihoods.

Genotypes

Callability Mask

To learn directly from the read data the boundaries of the regions within which short-read sequencing could yield reliable variant calls, we calculated average filtered read depth across all samples in contiguous 1 kb windows and computed an exponentially-weighted moving average (EWMA) of these values (Fig. 1). Regions for which the EWMA deviated from a narrow envelope were identified as problematic. Those of depressed depth corresponded to ampliconic sequences, within which reads do not map uniquely and were thus filtered out. Regions of inflated depth corresponded to heterochromatin, where naïve application of standard genotype calling methods would give the impression of abundant heterozygosity due to the pileup of highly similar reads around the borders of unassembled regions.

After constructing the depth-based filter, we repeated this procedure for the MQ0 ratio, the proportion of unfiltered reads with fully ambiguous mapping. Although the X-transposed region showed no deviation in the depth-based mask, it failed the MQ0 ratio based mask. In females we found depressed read depth in the homologous region of the X chromosome (Fig. S2); we hypothesize that in males, each of whom possesses one X and one Y, there is an equal exchange of mismapped reads between the two chromosomes. The depth and MQ0 masks were merged and smoothed, leaving 10.45 Mb of sequence for downstream quality control.

Site-Level Quality Control

With the regional mask in hand, we defined a series of site-level quality control filters (Fig. S3A). Of the 22,974,737 mapped coordinates, 12,532,580 fell within the bounds of the regional exclusion mask. A further 129,411 were excluded due to an MQ0 ratio greater than or equal to 0.10, and 170,144 were excluded because more than 20 samples had missing genotypes, either due to an absence of sequencing reads or to a heterozygous maximum likelihood genotype (Fig. S3B).
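As a rough illustration of the depth-based callability mask described under Callability Mask, the following minimal sketch smooths per-window depths with an EWMA and flags windows whose smoothed value leaves an envelope. The smoothing weight `alpha` and the envelope bounds `lower` and `upper` are hypothetical parameters chosen for illustration; the text does not state the values actually used.

```python
def ewma(values, alpha):
    """Exponentially-weighted moving average of a sequence of window depths."""
    smoothed = []
    current = values[0]  # seed the average with the first window
    for v in values:
        current = alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def flag_windows(depths, lower, upper, alpha):
    """Return True for each window whose smoothed depth falls outside [lower, upper]."""
    return [not (lower <= s <= upper) for s in ewma(depths, alpha)]


# Toy data: 20 normal windows, 20 windows of inflated depth, 20 normal windows.
toy_depths = [10] * 20 + [40] * 20 + [10] * 20
toy_flags = flag_windows(toy_depths, lower=5, upper=15, alpha=0.5)
```

The same function could be applied to a per-window MQ0 ratio in place of depth, with the two sets of flags then merged.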
The remaining polymorphic sites had a median depth (across all samples) of 265, and we filtered out all sites whose depth was more than three median absolute deviations from this value, thus excluding 12,425 sites with depth above 371 and 141,512 with depth below 159 (Fig. S3C). Finally, we culled 547 sites with a heterozygous maximum likelihood genotype in more than seven samples (Fig. S3D). This left 9,988,118 callable sites. Of 432 ISOGG SNPs with observed variation in our data, 393 passed the regional and mapping quality filters, and of these, just one failed the missingness filter and a further two failed the depth filter.

Genotype Calling

To call genotypes, we implemented a haploid model EM algorithm that treated allele frequency as the latent variable and used the homozygous state genotype likelihoods calculated by GATK. Genotypes with a heterozygous maximum likelihood state were classified as missing because calls in such cases were found to be disproportionately incompatible with the inferred phylogeny.
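The haploid EM idea described under Genotype Calling can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sample contributes a hypothetical triple of diploid genotype likelihoods `(L_ref, L_het, L_alt)`, samples whose heterozygous likelihood is the maximum (or whose two homozygous likelihoods are equal, as with zero reads) are set to missing, and the allele frequency `p` is re-estimated from the posterior probabilities until it stabilizes.

```python
def call_haploid_site(gls, tol=1e-8, max_iter=1000):
    """EM estimate of alt-allele frequency and per-sample haploid calls at one site.

    gls: list of (L_ref, L_het, L_alt) likelihoods, one triple per sample.
    Returns (p, calls) where calls holds "ref", "alt", or None (missing).
    """
    def is_missing(lr, lh, la):
        # Heterozygous maximum likelihood state, or no information either way.
        return lh > max(lr, la) or lr == la

    usable = [(lr, la) for lr, lh, la in gls if not is_missing(lr, lh, la)]
    p = 0.5  # starting value for the alt-allele frequency
    if usable:
        for _ in range(max_iter):
            # E-step: posterior probability that each sample carries the alt allele.
            post = [p * la / (p * la + (1 - p) * lr) for lr, la in usable]
            # M-step: the frequency is the mean posterior.
            new_p = sum(post) / len(post)
            converged = abs(new_p - p) < tol
            p = new_p
            if converged:
                break

    calls = []
    for lr, lh, la in gls:
        if is_missing(lr, lh, la):
            calls.append(None)
        else:
            calls.append("alt" if p * la > (1 - p) * lr else "ref")
    return p, calls


# Toy likelihoods: eight reference-like samples, two alt-like, one het-like.
toy_gls = [(0.99, 0.005, 0.01)] * 8 + [(0.01, 0.005, 0.99)] * 2 + [(0.3, 0.6, 0.1)]
toy_p, toy_calls = call_haploid_site(toy_gls)
```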

Validation

The false positive rate is kept low primarily by the fact that GATK generally requires at least two reads of support to identify a site as variable. In addition, we exclude sites incompatible with the phylogeny. Though this filter discards some genuine homoplasic variants, the class is enriched for false positives, and we have chosen to err on the side of conservatism. We consider three means of validation.

Sanger Sequencing

We validated Y chromosome genotypes for the 29 male HGDP samples at 46 sites using a combination of targeted PCR and Sanger sequencing (3 sites), and exome capture followed by Illumina sequencing (43 sites). Validation failed to yield data for two genotypes, and we compared the remaining 1,245 genotypes to the main data set to find a concordance rate of 99.92%. Just one genotype was discordant (M150, hg19 position 21869519, in HGDP00462). The genotype had zero sequencing reads of support, and the individual had been imputed to carry the reference allele whereas the validation data indicated that this sample actually carries the non-reference allele. Only one other sample, the nearest neighbor to HGDP00462, also carried the non-reference allele, and this illustrates the fact that it is impossible to properly impute missing genotypes for sites otherwise identified as singletons (Supplementary Text, “Imputation” section).

Minimally Diverged Samples

We also consider private variation among minimally diverged individuals to argue that sequencing errors are minimized in our study. Specifically, we observe a cluster of five Baka hgB2 samples with just a handful of singletons per lineage. This group approximates a replication set and thus gives tight upper bounds on the false positive variant rate.

Haplogroup Assignments

All HGDP haplogroup assignments were consistent with prior ISOGG designations.

Phylogenetic Inference

We used MEGA5 (31) to construct maximum likelihood phylogenetic trees.


mtDNA Analysis

mtDNA Pipeline

To call mitochondrial haplogroups, we converted sequences from the GRCh37 to the rCRS coordinate system and imported them into HaploGrep (32), which draws on the Phylotree database (33). We explicitly used data presented in Table 1 of Behar et al. (34) to polarize alleles for variants assigned to the most ancient split, that between hgL0 and the rest of the tree (Fig. S11). Whereas the mutation rate on the Y chromosome is sufficiently low that we could regard base substitutions as unique events and simply discard sites that were incompatible with the phylogeny, excluding sites would have been inappropriate for the mitochondrial genome, in which a much higher mutation rate has led to considerable homoplasy. To account for this, we split sites with multiple substitutions into pseudo-sites, each of which constitutes a unique event. We discarded a few mutational hotspot sites with evidence for more than four unique substitution events.
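One way to picture the pseudo-site treatment of homoplasy described above is the sketch below. The input layout (a mapping from position to the branches on which substitutions were inferred), the pseudo-site naming scheme, and the toy positions and branch labels are all hypothetical; only the threshold of four events comes from the text.

```python
MAX_EVENTS = 4  # sites with more than four inferred events are discarded as hotspots


def split_into_pseudo_sites(site_events):
    """Split recurrent sites into one pseudo-site per inferred substitution event.

    site_events: dict mapping position -> list of branch labels, one per event.
    Returns (pseudo_sites, dropped_positions).
    """
    pseudo_sites = []
    dropped = []
    for pos, branches in sorted(site_events.items()):
        if len(branches) > MAX_EVENTS:
            dropped.append(pos)
            continue
        for k, branch in enumerate(branches, start=1):
            # Each pseudo-site carries exactly one event, so it fits the infinite sites model.
            pseudo_sites.append((f"{pos}.{k}", branch))
    return pseudo_sites, dropped


# Toy input: one unique site, one site with three events, one hotspot with five.
toy_events = {
    100: ["b0"],
    200: ["b1", "b7", "b9"],
    300: ["b1", "b2", "b3", "b4", "b5"],
}
toy_pseudo, toy_dropped = split_into_pseudo_sites(toy_events)
```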

Calibration Based on mtDNA hgA2

Since there are far fewer segregating sites in the mitochondrial genome, and we only had seven hgA2 lineages, we used 108 publicly available hgA2 Native American sequences to calibrate. Kumar et al. (23) list 568 accession numbers for mitochondrial genomes, 134 of which belong to hgA2 and are of American descent. We downloaded the subset of 108 entries that included the full mtDNA sequence and, along with the GRCh37 reference sequence, conducted a multiple alignment using MUSCLE (35). We then called haplogroups, built a tree (Fig. S12), assigned variants to branches, and resolved homoplasies as described above.

Frequentist Estimation of TMRCA

The Molecular Clock

Under the infinite sites model, mutations accumulate in a Poisson process of rate $\mu_l$, the locus-wide mutation rate. To estimate TMRCA, molecular clock approaches first estimate the mean number of derived mutations per lineage and then divide by an estimate of the mutation rate. For both the Y chromosome and the mtDNA, we estimate TMRCA with:

$$\hat{T} = \frac{\bar{D}}{\hat{\mu}_{ly}},$$

where $\bar{D}$ is the sample average of $\{D_i\}$, the inferred number of mutations accumulated by each lineage since the global MRCA:

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i.$$


We estimated the $\{D_i\}$ using a maximum likelihood phylogeny (Fig. 2), and we estimate the yearly mutation rate, $\mu_{ly}$, as:

$$\hat{\mu}_{ly} = \frac{\bar{C}}{t},$$

where $t$ is the known TMRCA of the calibration subclade and $\bar{C}$ is the sample average of $\{C_i\}$, the number of derived mutations acquired by each lineage since the common ancestor of the subtree:

$$\bar{C} = \frac{1}{n_c}\sum_{i=1}^{n_c} C_i.$$

Here $n_c$ is the number of individuals within the calibration subclade. $\hat{T}$ is therefore a scaled ratio of two random variables:

$$\hat{T} = t\,\frac{\bar{D}}{\bar{C}}.$$

TMRCA Confidence Intervals

From the frequentist perspective, we consider $T$ a fixed but unknown constant, and we are interested in the sampling variance of our estimator conditional on its true value. Since the calibration subtree is a small fraction of the total tree, $\bar{D}$ and $\bar{C}$ are approximately uncorrelated. This fact simplifies the expression for the standard deviation of a ratio of random variables, which is obtained using the δ method (36):

$$\sigma_{\hat{T}|T} \approx \frac{t}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_{\bar{C}}\right)^{2} + \sigma^{2}_{\bar{D}|T}}.$$

Since both $\bar{D}$ and $\bar{C}$ are sums of Poisson random variables with a large number of total events, each is well approximated by the normal distribution. Consequently, their ratio is also approximately normally distributed (37). Therefore, if we are able to compute $\sigma_{\bar{D}|T}$ and $\sigma_{\bar{C}}$, we can construct a confidence interval for $T$. We first consider $\sigma_{\bar{D}|T}$. The $\{D_i\}$ are identically Poisson distributed, but they are not independent due to the shared internal branches (3). Thus,

$$\sigma^{2}_{\bar{D}|T} = \mathrm{Var}[\bar{D}\,|\,T] = \frac{1}{n^{2}}\left[\sum_{i}\mathrm{Var}[D_i\,|\,T] + 2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\,|\,T]\right].$$

Since each $D_i$ is a Poisson random variable, its variance is equal to its mean. Now consider samples $i$ and $j$. The numbers of mutations that have accumulated in each since their MRCA are independent. However, they share all mutations possessed by their MRCA. Thus,

$$\mathrm{Cov}[D_i, D_j\,|\,T] = D_{ij},$$

where $D_{ij}$ is the number of derived variants possessed by the common ancestor of $i$ and $j$. Let $I$ denote the set of internal branches, and let $b_s$ and $b_l$ be the number of descendants and the length of a branch, $b$, respectively. Each internal branch will be shared by $b_s$ choose 2 pairs of individuals. Thus,

$$2\sum_{i}\sum_{j>i}\mathrm{Cov}[D_i, D_j\,|\,T] = 2\sum_{b\in I}\binom{b_s}{2}\,b_l = \sum_{b\in I} b_s(b_s-1)\,b_l,$$

which gives:

$$\sigma_{\bar{D}|T} = \frac{1}{n}\sqrt{\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l}.$$

An identical argument applies to $\sigma_{\bar{C}}$ within the calibration subtree. We, therefore, construct a 95% confidence interval for TMRCA as:

$$T = \hat{T} \pm z_{0.025}\cdot\sigma_{\hat{T}|T}$$

$$T = t\left[\frac{\bar{D}}{\bar{C}} \pm z_{0.025}\cdot\frac{1}{\bar{C}}\sqrt{\left(\frac{\bar{D}}{\bar{C}}\,\sigma_{\bar{C}}\right)^{2} + \frac{1}{n^{2}}\left(\sum_{i} D_i + \sum_{b\in I} b_s(b_s-1)\,b_l\right)}\,\right].$$

The bias of the point estimator is minimal (36).

Precision of TMRCA Estimation

The standard error for the mean estimate of a Poisson random variable with mean $\mu_l T$ is $\sqrt{\mu_l T / n}$, so the coefficient of variation (the ratio of the standard error to the mean) declines in proportion to $1/\sqrt{n \mu_l T}$. On the Y chromosome, $T$ is large and, because the non-recombining locus is so long, $\mu_l$ is quite large as well. Consequently, the standard error for estimating the mean branch length is relatively small, and the greater source of uncertainty lies in estimating the mutation rate, where the time intervals over which mutations have accumulated are shorter, and the number of lineages is smaller. However, $\mu_l$ is sufficiently large that we could derive a narrow confidence interval based solely on the two hgQ lineages we had sequenced. In contrast, for the mtDNA, the uncertainty due to $\sigma_{\bar{D}|T}$ exceeds that due to $\sigma_{\bar{C}}$.
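To make the confidence-interval construction above concrete, here is a minimal sketch that evaluates the point estimate and the δ-method interval for a toy tree. The function names, the `(b_s, b_l)` tuple layout for internal branches, the toy mutation counts, and the use of an arbitrary time unit (`t = 1.0`) are all illustrative assumptions; none of the numbers are taken from the study.

```python
import math


def sigma_mean(tip_counts, internal_branches):
    """Standard deviation of the mean tip-to-root mutation count.

    tip_counts: mutation counts D_i (or C_i), one per lineage.
    internal_branches: list of (b_s, b_l) = (number of descendants, branch length).
    """
    n = len(tip_counts)
    shared = sum(bs * (bs - 1) * bl for bs, bl in internal_branches)
    return math.sqrt(sum(tip_counts) + shared) / n


def tmrca_interval(D, D_branches, C, C_branches, t, z=1.959964):
    """Point estimate and approximate 95% interval for T = t * mean(D) / mean(C)."""
    D_bar = sum(D) / len(D)
    C_bar = sum(C) / len(C)
    sd_D = sigma_mean(D, D_branches)
    sd_C = sigma_mean(C, C_branches)
    T_hat = t * D_bar / C_bar
    sd_T = (t / C_bar) * math.sqrt((D_bar / C_bar * sd_C) ** 2 + sd_D ** 2)
    return T_hat, T_hat - z * sd_T, T_hat + z * sd_T


# Toy tree: four lineages in two cherries, whose shared internal branches
# carry 15 and 10 mutations; a two-lineage calibration subtree with no
# internal branches below its own root.
toy_D = [50, 52, 55, 51]
toy_D_branches = [(2, 15), (2, 10)]
toy_C = [5, 7]
toy_T, toy_lo, toy_hi = tmrca_interval(toy_D, toy_D_branches, toy_C, [], t=1.0)
```

With only two calibration lineages and few mutations, the toy interval is wide, which mirrors the point made above that the mutation rate estimate is typically the larger source of uncertainty.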


An Alternative Frequentist Estimator

An alternative frequentist estimator defines $\bar{D}$ as half the average mutational distance $d_{ij}$ between pairs of individuals that span the ancestral root (3):

$$\bar{D} = \frac{1}{2\,|L|\,|R|}\sum_{i\in L}\sum_{j\in R} d_{ij}.$$

Here, $L$ and $R$ represent the sets of individuals on the left and right sides of the root. This estimator is less well suited to our data set. We have four Y hgA individuals on the left side of the tree and 65 individuals on the right side. This partition-based approach effectively upweights information from the hgA samples, since all distances are measured with respect to a member of this clade. However, we have lower effective coverage on the internal branches of hgA than elsewhere in the tree. This is due to both the lower number of samples and the fact that hgA lineages are highly diverged. Consequently, these are exactly the samples for which false negatives are of greatest potential impact. For the sake of comparison, the TMRCA point estimates from this approach are 134 ky and 118 ky for the Y chromosome and mtDNA, respectively.

Estimating the Ratio of mtDNA TMRCA to Y TMRCA

To compare the TMRCA of the Y chromosome to that of the mtDNA, we estimate the ratio:

$$\gamma = \frac{T_m}{T_y} = \frac{t_m M}{t_y Y} = \tau R,$$

where we define $M$ and $Y$ as the fixed but unknown unscaled TMRCA of the mtDNA and Y, respectively, and $R$ as the ratio $M / Y$. The quantity $\tau = t_m / t_y$ is the ratio of coalescence times of the Native American lineages, mtDNA hgA2 and Y chromosome hgQ. Our estimator of $\gamma$ is:

$$\hat{\gamma} = \tau\hat{R} = \tau\,\frac{\hat{M}}{\hat{Y}},$$

where

$$\hat{M} = \bar{D}_m/\bar{C}_m, \qquad \hat{Y} = \bar{D}_y/\bar{C}_y, \qquad \hat{R} = \hat{M}/\hat{Y}.$$

The standard error is:

$$\sigma_{\hat{\gamma}|\gamma} = \tau\,\sigma_{\hat{R}|M,Y}.$$

Since $\hat{R}$ is the ratio of two random variables, its standard error is:

$$\sigma_{\hat{R}|M,Y} \approx \frac{1}{E[\hat{Y}|Y]}\sqrt{\left(\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}\,\sigma_{\hat{Y}|Y}\right)^{2} + \sigma^{2}_{\hat{M}|M} - 2\rho\,\sigma_{\hat{M}|M}\,\sigma_{\hat{Y}|Y}\,\frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}},$$

where $\rho = \mathrm{Corr}[\hat{M}|M, \hat{Y}|Y]$. We cannot disregard the correlation term in this case. If the TMRCA of male and female lineages are correlated, their estimates will be as well, though the correlation of the estimates would necessarily be less than that of the true values due to the uncertainty in both variables. Confidence bands for $\gamma$ are defined by:

$$\gamma = \tau\left[\frac{\hat{M}}{\hat{Y}} \pm z_{0.025}\cdot\sigma_{\hat{R}|M,Y}\right].$$

To assume zero correlation would be conservative, as positive correlation reduces the variance. We consider representative values of $\rho$ for the sake of comparison (Fig. S13). Again, the bias of the point estimator is minimal (36).

Empirical Bayesian Estimation of TMRCA and Ne: GENETREE

As distributed, GENETREE can handle only 99 sites per run, but we modified the source code to enable runs of several thousand SNPs. First, we perform a grid search to obtain a maximum likelihood estimate for the scaled mutation rate, θ = 2Neµlg, where µlg is the locus-wide per generation mutation rate. We then simulate the posterior distribution of TMRCA, conditional on this estimate. We restricted each analysis to a single population so that the assumption of exchangeability of lineages (38) would hold. As the TMRCA is determined by the deepest coalescence in a sample, we exclusively analyzed populations that sample from both sides of the tree (Fig. 2): the San and Baka for the Y chromosome and the Mbuti and Nzebi for the mitochondrial genome. Results from the Baka and Mbuti Pygmy populations are the most directly comparable (Table 1).

We excluded several lineages from the GENETREE analyses. In the Baka, we excluded three samples possessing high levels of autosomal identity by descent with another individual, as inferred with Illumina Omni SNP arrays. We also excluded six Baka hgE samples, as these likely represent West African agriculturalist lineages that introgressed into the Baka a few thousand years ago (39), in violation of the exchangeability assumption of coalescent theory. In the mitochondrial analysis we removed two Nzebi and one Mbuti because GENETREE does not allow for identical lineages. Point estimates for the Baka Y chromosomes reflect averages of multiple coalescent runs. Each run subsampled 1500 (of 2927) segregating sites to overcome computational limitations for the full dataset. Estimates for the Mbuti mtDNAs reflect averages of multiple coalescent runs, each with a different random seed, as these runs were more variable due to a smaller Poisson mean (nµl).


Coalescent theory measures time in units of Ne generations. To convert to years, we use the maximum likelihood estimate of θ, the gender-specific generation time (g; Table S3), and the Native American calibration estimate for µly, the locus-wide per year mutation rate:

$$N_e = \frac{\theta}{2\mu_{lg}} = \frac{\theta}{2g\mu_{ly}}$$

$$T_{MRCA} = T_c N_e\, g = \frac{T_c\,\theta}{2\mu_{ly}}$$

Here $T_c$ denotes the TMRCA in coalescent units (units of Ne generations).

GENETREE is suboptimal for our data set. Due to the exchangeability assumption and computational limitations, each analysis draws information from just a subset of the data. Because the full sequence data is highly informative about the underlying gene genealogy, very few random trees are compatible with it. This makes GENETREE a highly inefficient approach to estimating population genetic parameters. Thus, we emphasize the point estimates and confidence intervals derived from the frequentist approach.

Predata Distribution of TMRCA

For a constant population size, the TMRCA of a locus, measured in Ne generations, is given by:

$$T_{MRCA} = \sum_{i=2}^{n} T_i,$$

where $T_i$ is the time during which $i$ ancestral lineages of the sample existed. Coalescent theory (38) models $T_i$ as an exponential random variable with parameter:

$$\lambda_i = \binom{i}{2}.$$

To obtain the distributions presented in Fig. 3, we simulated five million draws of TMRCA for n = 100 lineages and scaled each value by a factor of Ne·g to convert to years.
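The predata simulation described above can be sketched in a few lines. This is an illustrative version that uses only the Python standard library and far fewer draws than the five million reported; the function names and the placeholder values `ne=1` and `g=1` (which leave the result in coalescent units) are assumptions for the example, not values from the study.

```python
import random
from math import comb


def draw_tmrca(n, rng):
    """One draw of TMRCA in coalescent units: the sum of T_i ~ Exp(rate = C(i, 2))."""
    return sum(rng.expovariate(comb(i, 2)) for i in range(2, n + 1))


def simulate_tmrca_years(n, ne, g, draws, seed=0):
    """Simulate TMRCA for n lineages and scale each draw by Ne * g to obtain years."""
    rng = random.Random(seed)
    return [draw_tmrca(n, rng) * ne * g for _ in range(draws)]


# Placeholder scaling (ne = 1, g = 1) keeps the toy output in coalescent units.
toy_draws = simulate_tmrca_years(n=100, ne=1, g=1, draws=5000)
toy_mean = sum(toy_draws) / len(toy_draws)
```

As a sanity check on such a simulation, the expected value in coalescent units is the sum of 1/C(i, 2) over i = 2..n, which equals 2(1 − 1/n), or 1.98 for n = 100.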


Supplementary Text

Novel Y Chromosome Phylogenetic Structure

Haplogroup B2

Within hgB2, we identify one clade and three additional lineages that represent previously uncharacterized structure (Figs. 2, S7). Each lineage represents an ancient divergence within the Y chromosome phylogeny and carries no known differentiating mutations downstream of M192 and Page72, which define hgB2b1. First, in the main text we describe a subclade of B2b1a that encompasses six Baka individuals. Previously, B2b1a2 was associated with the P70 variant, but because these six Baka individuals carry the ancestral allele for P70, we propose reassociating P70 with a new label, “B2b1a2a,” and labeling the new clade “B2b1a2b.” Second, B2b1b was previously associated with P6, but we have identified a Mbuti individual carrying the ancestral allele for this variant. Thus, we propose associating P6 with a new label, “B2b1b1,” and designating the new lineage “B2b1b2.” Finally, we identify two new lineages within B2b1a1. The individuals representing both of these lineages carry the ancestral T allele for the M169 variant that defines B2b1a1a, the only extant sublineage of B2b1a1 not represented.

Haplogroup F

Table S2 presents genotyping results for the M578 variant in a separate panel of individuals. The results confirm the (G, H, IJK) → (G, (H, IJK)) polytomy resolution. The demographic fates of hgG and hgHIJK were geographically asymmetric, with the spread zone of hgG (40) considerably more restricted than that of hgHIJK (Fig. S6). The latter now spans all continents, including Africa due to the back migration of some haplogroups (41).

Imputation

We used our phylogeny-aware algorithm (Fig. S8) to impute approximately 5.3 missing genotypes per Y chromosome variant site and a median of 826 per individual.

Imputation Limitations

It is not possible to impute singletons: when the carrier of a unique allele has zero reads of support, there is no evidence for variation at the site. Doubletons pose a similar problem. Let A and B be nearest neighbors in the phylogeny. Consider the case where, at a given site, A possesses an allele not observed in any other sample, and B has zero reads. It is impossible to distinguish whether the site is an A singleton or an A/B doubleton. However, conditional on one sample missing data at a particular site, our imputation strategy correctly imputes two thirds of tripletons; it fails only in the case where the lineage of the missing sample is the last to coalesce. For four lineages, there are 18 possible trees. Of these, twelve consist of stepwise coalescence, and the lineage with missing data is the most diverged in just three. Thus, we correctly impute five-sixths of quadrupletons.

Polarizing Variants on the Branch Spanning the Ancestral Root

Our method to infer the ancestral state at a given site was inapplicable to the 398 variants assigned to the most ancient (basal) split, as no outgroup for these branches was present within the data set. For these, we first conducted a LiftOver (42) to map GRCh37 coordinates to those of the chimpanzee reference (PanTro3). Due to the abundance of large-scale inversions between the two chromosomes (17), it was necessary to BLAT (43) 101 bp chunks of DNA surrounding each human variant to infer relative orientation. Ancestral states were thereby inferred for 322 variants, and those of the remaining 76, for which the corresponding chimpanzee allele could not be inferred, were randomly assigned in the corresponding proportion.

Homoplasy and the Infinite Sites Model

We deemed a SNV consistent with the tree when we observed no ancestral alleles in the subtree rooted at the branch to which the SNV was assigned. Most variants (11,279) were consistent with the tree, and we imputed missing genotypes for those that were. Sites incompatible with the phylogeny were uniformly distributed across the callable regions (Fig. 1) and were excluded from downstream analyses. Just 199 (of 361) incompatibilities were supported by more than one sequencing read. This lack of homoplasy on the Y chromosome justifies use of the infinite sites model.
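The fractions quoted under Imputation Limitations (two thirds of tripletons, five-sixths of quadrupletons) can be checked by brute force. The sketch below enumerates every labeled coalescent history for k lineages and counts those in which a designated lineage is the last to coalesce, the one case in which imputation fails. It relies on the standard coalescent property that all labeled histories are equally likely; the function name and the lineage labeling are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations


def count_histories(k, missing=0):
    """Count labeled coalescent histories for k lineages.

    Returns (total, bad), where `bad` counts histories in which lineage
    `missing` is still unmerged when only two clusters remain, i.e. it is
    the last lineage to coalesce.
    """
    def recurse(clusters):
        if len(clusters) == 2:
            return 1, int(frozenset([missing]) in clusters)
        total = bad = 0
        for a, b in combinations(clusters, 2):
            rest = [c for c in clusters if c not in (a, b)]
            t, m = recurse(rest + [a | b])  # merge the chosen pair and continue
            total += t
            bad += m
        return total, bad

    return recurse([frozenset([i]) for i in range(k)])


def imputable_fraction(k):
    """Fraction of histories in which the missing lineage is not the last to coalesce."""
    total, bad = count_histories(k)
    return 1 - Fraction(bad, total)
```

Calling `count_histories(4)` reproduces the 18 trees and the three failing cases mentioned above, and `imputable_fraction(3)` and `imputable_fraction(4)` give 2/3 and 5/6, respectively.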

Calibration and Mutation Rate Estimation

Mutation rate estimates are typically based on family pedigrees (14) or species phylogenies, such as the human-chimpanzee divergence (2, 3). However, just one pedigree-based rate is available for the Y chromosome, and, though the mutation process is highly stochastic, this rate is based on a single pedigree. Furthermore, precise alignment between the human Y chromosome and that of the chimpanzee is difficult due to extreme structural divergence. Finally, if the Y is subject to a time-dependent mutation rate, as is mtDNA (24, 25), then neither estimation approach is ideal for dating human population events.

Instead, we estimate mutation rates using a within-human calibration point, the initial migration into and expansion throughout the Americas. Well-dated archaeological sites include Paisley Cave in Oregon, which dates to 14.3 kya (19); Buttermilk Creek in Central Texas, at 13.2–15.5 kya (44); and Monte Verde II in Southern Chile, 14.6 kya (45). To date the expansion of genetic lineages unique to the Americas, we follow Goebel et al., who state that the most parsimonious estimate is that “humans colonized the Americas around 15 kya” (19). We show that a lack of parity between the expansion event and the divergence of lineages used for calibration would have minimal effect on the difference between the TMRCA of the Y and mtDNA if the divergences are within a few thousand years of one another (Fig. S13, Materials and Methods).


For reference and comparison, Table S3 summarizes mutation rate point estimates on four scales. The Y chromosome mutation rates are similar to previous autosomal phylogenetic-based mutation rates and extended pedigree-based rates, but they are almost two-fold higher than autosomal mutation rates based on trios (46).

Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation

We developed a method to estimate the variance in estimated TMRCA that is due to the stochastic nature of the mutation process (Materials and Methods, “Frequentist Estimation of TMRCA” section). Here we discuss the potential impact of bias due to sequencing error and modest sequencing coverage. We have estimated TMRCA by calculating the ratio of two quantities, divergence and the mutation rate, each of which depends on experimental measurements. The numerator is the average tip-to-root height of the tree, and we estimate the denominator as the ratio of average branch length within the calibration subtree to the calibration time. Data for each of the three measurements are imperfect. In this section, we consider potential biases in the first two, and we consider calibration time in the next section.

Tip-to-Root Height

We measure tip-to-root height as the total number of SNVs assigned to all branches separating an individual from the common ancestor of all individuals. This sum includes the singletons of the terminal branch and the shared variants on the internal branches. Two factors act in opposition to stretch and shrink an observed branch length with respect to its true value: sequencing error and the total sequencing coverage of the branch, which itself is influenced both by sequencing coverage of individuals and by sampling density of the clade rooted at the branch. The primary effect of sequencing error is to stretch terminal branches, as it is unlikely that random sequencing errors will cluster phylogenetically. We have demonstrated that genotype error is minimal (Materials and Methods, “Validation” section). Consequently, branch lengths are not significantly inflated by sequencing error.
Though modest sequencing coverage translates to unobserved variants near the tips of the tree, thereby shortening observed heights, the internal branches of the tree, which constitute the overwhelming majority of any tip-to-root path, have quite high coverage due to the superposition of sequencing from all descending lineages. Thus, most observed internal branch lengths cannot differ significantly from their true lengths. Fortunately, the most divergent sample with the longest terminal branch, the San individual in the hgA-M51 clade, had higher than average sequencing coverage (6.15×) and, consequently, call rate (0.985). We observed 1012 private variants in this individual, and we estimate approximately 22 false negatives, that is, unobserved variants with either a no-call genotype or just one sequencing read, an event insufficient to identify a site as variable. This worst-case scenario is less than 2% of the average tip-to-root height. We likely have very few false negatives in other individuals, even among those of lower coverage, since the lower coverage samples are clustered in the densely sampled portions of the tree, such as in hgE and portions of hgB, and the imputation strategy we have implemented enables these lineages to receive credit for variation that was detected in neighbors and that they can be inferred to possess. Finally, the maximum observed tip-to-root height (1188) could be considered a conservative upper bound on the true mean, and it differs from the observed mean by just 5%.

Branch Lengths in the Calibration Subtree

We now consider how sequencing coverage affects branch lengths in the Y chromosome hgQ subtree used to estimate the mutation rate. We sequenced Mayan HGDP00856, a representative of hgQ-M3, to 5.7× coverage and Mayan HGDP00877, whose haplogroup is labeled hgQ-L54*(xM3) because it carries the L54 mutation but is ancestral at the M3 SNP, to an average depth of 8.5×. Had we sequenced the two Mayan lineages to lower coverage, we would have artificially boosted TMRCA estimates by underestimating the mutation rate. However, haploid coverage for the Mayan samples is high enough that false negatives have little impact on our calibration. The rate of false negatives is dominated by sites in the terminal branches of the tree with either zero or one sequencing read for a sample. When an individual has zero or one read at a shared SNP, we can usually impute its genotype, but it is not possible to impute singletons or to distinguish a singleton from a doubleton in the presence of missing data (Supplementary Text, “Imputation” section). Although missing singletons and misclassified doubletons have little impact on total branch length from the tips to the root of the entire tree, they are quite important for calibration because singletons constitute a significant portion of branch length within the calibration subtree. In our study, the shared hgQ branch is of approximately the same length as the Q-M3 and Q-L54*(xM3) terminal branches. Consequently, no-call genotypes at singleton sites, which lead to missing singletons, are counterbalanced by no-call genotypes in the shared hgQ branch, which lead to doubletons misclassified as singletons.
This relies on the fact that at 5.7× and 8.5× coverage, the no-call rates on the doubleton and singleton branches are comparable. In general, a no-call due to the presence of just a single sequencing read is less likely to occur on the doubleton branch than on the singleton branch, but of the 9,988,118 callable sites only 194,966 (2.0%) and 23,989 (0.2%) are covered by just one read in HGDP00856 and HGDP00877, respectively.

To empirically estimate the false negative rate within the hgQ subtree used for calibration, we incorporated data from the 1000 Genomes Project (47). We downloaded genotype calls (VCF files) for 525 males from Phase 1, called haplogroups, and identified eleven individuals belonging to hgQ (footnote 1). We then downloaded aligned sequence data (BAM files) for these samples, converted from the GRCh37 to the hg19 reference, and applied our pipeline to the combined set of 80 individuals (Fig. S9). In the combined analysis, the branch shared by all hgQ lineages grew from 136 to 146 SNPs (footnote 2). One SNP had not been called in either HGDP sample (hg19 position 15825218), and nine SNPs were no-calls in HGDP00856: three due to the absence of reads, and six due to one erroneous read (of 4–10). With perfect data, these nine SNPs would have been classified as doubletons, but they were instead misclassified as HGDP00877 singletons. Thus, for HGDP00856, we can estimate the no-call rate within the hgQ subtree, β0 ≈ 6.8% (10 / 146). Partly because the coverage is higher, we observed no doubletons misclassified as singletons due to missingness in HGDP00877 (footnote 3). Thus, for HGDP00877, β0 ≈ 0.7% (1 / 146).

Footnote 1. A twelfth, NA19753, was sequenced using SOLiD. We did not include this sequence in our analysis since it is likely to have different error and mapping properties than those generated by Illumina technology.

Footnote 2. The exact length is 149, but the difference includes two SNPs that were on the borderline of the depth-based filter in the main study and a net of one SNP discarded due to homoplasy: two in the main study and one in the combined analysis.

Whereas on the shared doubleton branch the no-call rate should sufficiently inform the type 2 error rate (βd ≈ β0), the no-call rate does not provide complete information for the terminal branches since GATK, prudently, will most often not designate a site as variable if there is just one sequencing read with the alternative allele in the entire sample. Thus, to fully model the singleton type 2 error rate, βs, we must also consider the probability of observing just one read, β1, since when this occurs at a singleton site, a false negative will most often result. To do so, we computed the sequencing read depth distribution over all ten million callable sites for each sample. Scaling this empirical probability mass function by the number of singletons observed in the individual and censoring to discard the zero-read and one-read bins, we observe that when coverage exceeds 4×, the expected read-depth distribution among singletons closely mirrors the observed distribution (Fig. S10). This suggests that there are few false negatives at sites for which at least two sequencing reads are observed. Thus, βs ≈ β0 + β1. When a branch with false negative rate β has true length L and observed length Y, the number of unobserved variants, X, is given by:

$$X = \beta L = \frac{\beta}{1 - \beta}\, Y$$

The second equality follows because L = X + Y. On the HGDP00856 singleton branch, we have Y = 126 and, from the empirical read-depth distribution, β1 = 2.0%. Thus, βs ≈ β0 + β1 = 6.8% + 2.0% = 8.8%, which gives X ≈ 12.2 missing singletons. This is likely an overestimate because the no-call rate across all variable sites, 2.2% (Table S1), is lower than the empirical rate within the subtree, 6.8%. The branch shared by all hgQ-M3 lineages (branch 18 in Fig. S9) affords an opportunity to empirically check the singleton false negative rate for HGDP00856, since this individual should possess each of these variants. We had correctly called 16 of 17 in our main analysis. This suggests a singleton false negative rate for this sample of 1/17 = 5.9% (footnote 4), but the variance for this particular estimate is quite high since it is based on just 17 sites, so to be conservative, we use the value of 8.8% estimated above.

For HGDP00877, we have Y = 120 and β1 = 0.2%, which give βs ≈ 0.7% + 0.2% = 0.9%, and X ≈ 1.1 missing singletons. This prediction cannot be tested empirically with these data because the lineage is an outgroup to the two hgQ-L54*(xM3) sequences from the 1000 Genomes Project. As discussed above, there were nine doubletons previously classified as HGDP00877 singletons, so accounting for type 2 errors reduces this branch length by 7.9 (9 − 1.1). Putting these two together, we compute the average branch length since the MRCA of the two samples as 125 SNPs, which differs from the observed value of 123 by 1.6%. Thus, one might wish to scale our Y chromosome TMRCA estimates by a factor of 123 / 125 = 0.984. However, the effect of false negatives would be offset by false positives, should one or two exist, so we choose not to. False negatives are not an issue for mitochondria, where all sequences are complete.

Footnote 3. It is possible that one such SNP exists and is missing in all three hgQ-L54*(xM3) sequences, but this is a low probability event.

Footnote 4. The lone false negative occurred at hg19 position 22613361. Prior to imputation, we do make the correct call in the combined analysis, because one read was present, and it carried the derived A allele.
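The false-negative correction described above can be retraced numerically. The following is a minimal sketch that plugs the rounded values quoted in the text into X = β / (1 − β) · Y; the function and variable names are illustrative and do not come from the authors' pipeline.

```python
def unobserved_variants(observed_length, false_negative_rate):
    """Return X = beta / (1 - beta) * Y, the expected number of missed variants."""
    return false_negative_rate / (1.0 - false_negative_rate) * observed_length


# Values quoted in the text (rounded as printed there).
y_856, beta_s_856 = 126, 0.088  # HGDP00856: beta_0 + beta_1 = 6.8% + 2.0%
y_877, beta_s_877 = 120, 0.009  # HGDP00877: beta_0 + beta_1 = 0.7% + 0.2%
misclassified_doubletons = 9    # doubletons counted as HGDP00877 singletons

x_856 = unobserved_variants(y_856, beta_s_856)
x_877 = unobserved_variants(y_877, beta_s_877)

# HGDP00856 gains its missed singletons; HGDP00877 loses the misclassified
# doubletons but regains its own missed singletons.
corrected_856 = y_856 + x_856
corrected_877 = y_877 - (misclassified_doubletons - x_877)

observed_mean = (y_856 + y_877) / 2
corrected_mean = (corrected_856 + corrected_877) / 2

print(f"missing singletons: {x_856:.1f} (HGDP00856), {x_877:.1f} (HGDP00877)")
print(f"observed mean: {observed_mean:.0f}, corrected mean: {corrected_mean:.0f}")
print(f"scaling factor: {observed_mean / round(corrected_mean):.3f}")
```

Because the two corrections act in opposite directions on the two terminal branches, the corrected mean stays within a few SNPs of the observed mean, which is the cancellation the text relies on.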

Calibration Time

In light of the above, the largest potential source of bias is the calibration time: the dating of the arrival of humans into the Americas and the approximation of synchronicity of this arrival with phylogenetic divergences.

Timing of Expansion into the Americas

Archaeological dates for the time of first arrival in the Americas range from 14.3 to 16.5 ky. Goebel et al. (19) conclude that the most parsimonious estimate is that “humans colonized the Americas around 15 kya,” so we elect 15 ky as a reasonable figure for both the maternal and paternal loci. If the true divergence time of American lineages were 14.3 ky, one would need to scale down the TMRCA ranges we report by about 5%. Likewise, for 16.5 ky, an increase of 10% would be required. However, the specific number used will have no effect on the relative TMRCA estimates for the two loci, provided the divergences of the two loci were contemporaneous. We consider the case of unequal split times in Fig. S13 (Materials and Methods, “Estimating the Ratio of mtDNA TMRCA to Y TMRCA” subsection).

Y Chromosome Calibration Point

With 108 sampled lineages, the point of rapid expansion within the Americas among mtDNA hgA2 lineages is clear. However, the corresponding point within Y hgQ is less so. Though we have argued that M3 most likely occurred shortly subsequent to initial entry to the Americas, it remains possible that hgQ-M3 and hgQ-L54*(xM3) diverged within Siberia or Beringia. When we include lower coverage 1000 Genomes hgQ lineages, we observe a star-like diversification among the Q-M3 derived lineages (Fig. S9, below branch 18). It is possible that some subset of the 17 M3-equivalent mutations accumulated prior to entry (within Beringia, for example), as has been proposed for mtDNA founding lineages (48). However, 12 of the 13 sequenced individuals are from Mexico, and this sampling bias could obscure a more upstream initiation of the expansion. For example, it is possible that hgQ-M3 lineages within Greenland do not share all 17 of these mutations. Because just three sequences represent hgQ-L54*(xM3), the phylogenetic structure of this subhaplogroup remains largely unknown, but the root of the sampled hgQ-M3 lineages can be used to calculate a strict lower bound on the mutation rate, as entry to the Americas certainly happened no later than this point.
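The sensitivity to the assumed arrival date, discussed under "Timing of Expansion into the Americas", reduces to a single ratio. The sketch below assumes that each TMRCA estimate scales linearly with the calibration time, as it does when the mutation rate is obtained by dividing a fixed calibration branch length by that time; the function name is illustrative.

```python
REFERENCE_KY = 15.0  # calibration time adopted in the text


def tmrca_scale(alternative_ky, reference_ky=REFERENCE_KY):
    """Factor by which TMRCA estimates change under another calibration time."""
    return alternative_ky / reference_ky


for alternative_ky in (14.3, 16.5):
    change = 100.0 * (tmrca_scale(alternative_ky) - 1.0)
    print(f"{alternative_ky} ky: TMRCA estimates change by {change:+.1f}%")
```

If both loci are calibrated to the same arrival date, the same factor multiplies both TMRCA estimates, so it cancels in their ratio.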


The 1000 Genomes lineages are unsuitable for calibration because of their lower sequencing coverage (average = 2.9×; Supplementary Text, “Branch Lengths in the Calibration Subtree” subsection), so we are left with a single lineage from our sample, HGDP00856, for this lower bound calculation. Accounting for false negatives had little effect when two samples were used for calibration, as the degree to which the hgQ-M3 branch grew was offset by a corresponding shrinkage of the hgQ-L54*(xM3) branch due to the hgQ doubletons that were unobserved in HGDP00856 and thereby misclassified as HGDP00877 singletons. However, it is important to correct for type 2 errors when considering this lineage alone. In the main analysis, the observed length of the M3 lineage was 126 mutations. This breaks down to 16 observed M3-equivalent SNPs and 110 post-M3 SNPs. Using a singleton false negative rate of 8.8%, this translates to approximately 10.6 (0.088 × 110 / (1 − 0.088)) unobserved post-M3 SNPs, which gives a calibration length of 120.6 SNPs. This differs from the calibration used in the main text by 1.9%.
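The lower-bound arithmetic in the preceding paragraph can be checked directly. This sketch assumes that "the calibration used in the main text" refers to the observed mean branch length of 123 SNPs reported in the "Branch Lengths in the Calibration Subtree" subsection; that reading, and the variable names, are assumptions.

```python
post_m3_snps = 110            # observed post-M3 SNPs on the HGDP00856 lineage
false_negative_rate = 0.088   # singleton false negative rate used in the text
main_text_calibration = 123   # assumed: observed mean branch length (SNPs)

# Expected number of post-M3 SNPs that went unobserved.
unobserved = false_negative_rate * post_m3_snps / (1.0 - false_negative_rate)
calibration_length = post_m3_snps + unobserved
relative_difference = (main_text_calibration - calibration_length) / main_text_calibration

print(f"unobserved post-M3 SNPs: {unobserved:.1f}")
print(f"calibration length: {calibration_length:.1f} SNPs")
print(f"difference from main-text calibration: {100.0 * relative_difference:.1f}%")
```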

Existence of Rare Yet More Basal Lineages

We emphasize that the estimates we derive refer to the coalescence times within our sample. For the mitochondrial genome, we have likely sampled the most divergent branches in the tree (34). However, for the Y chromosome, our estimate of the TMRCA reaches only as far back as the A1b clade. Inclusion of samples from hgA1a or the newly discovered hgA0 (5) or hgA00 (49) would push the date further back. However, these haplogroups are very rare, and it is difficult to assess whether correspondingly divergent but singular mitochondrial genomes may also await discovery.

Effective Population Size

The Ne differences we observe between males and females are most likely due to a greater variance in reproductive success among males, a phenomenon influenced by cultural and demographic factors, such as the practice of polygyny (50). Both purifying and positive selection could also act to reduce the Ne along the linked regions of the Y chromosome. However, both forms of selection may have also acted on the mitochondrial genome. Additional information would be necessary before one could invoke natural selection as the primary cause of reduced male Ne, and the hypothesis is neither necessary nor sufficient.

Additional Acknowledgements

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1147470. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Fig. S1. Map of populations. We sampled Y chromosomes and mtDNAs from nine populations including Baka Pygmies from Gabon, Cambodians, Maya from Mexico’s Yucatán Peninsula, Mbuti Pygmies from the Democratic Republic of Congo, Mozabite Berbers from Algeria, Nzebi from Gabon, Pashtuns (Pathan) from Pakistan’s North-West Frontier Province, San from Namibia, and Yakut from Siberia.

[Figure S1 map legend: Baka, Cambodian, Maya, Mbuti, Mozabite, Nzebi, Pashtun, San, Yakut]


Fig. S2. Sequencing read mapping on Xq21. Total read depth and the depth of MQ0 reads are plotted for 24 HGDP females. Mean values in contiguous 5 kb windows are shown along chrXq21. Dashed gray lines indicate the region that corresponds to the “X-transposed” segment of the Y chromosome.

[Figure S2 plot. x-axis: chrX Position (Mb), 85 to 96. y-axis: Depth in HGDP Females, 0 to 300. Legend: Filtered Depth; MQ0 Depth. Annotation: Homologue of X-transposed Region.]


Fig. S3. Quality control and genotype calling on the Y chromosome. (A) Pipeline from Illumina sequencing to filtered variable sites. (B) Missingness distribution subsequent to imposition of regional mask (Fig. 1) and mapping quality filter. (C) Filtered depth versus physical position. Sites whose depth lay more than 3 median absolute deviations (MADs) above or below the median were filtered out. (D) Depth distributions with tranches defined by the number of samples with a heterozygous maximum likelihood genotype. Evidence for multiple alleles in 0–7 samples is likely due to random sequencing error, but sites with more than seven “het” samples exhibit inflated depth. We infer this to result from mismapping and filter these sites out. Filters in (B–D) were tuned on variable sites only. Numbers in (A) are chromosome-wide, without regard to variability.

[Figure S3 panels]

(A) Pipeline
Sequencing: Illumina HiSeq 2000
Mapping:
  Mapping to GRCh37: BWA
  PCR Duplicate Removal: Picard
  Quality Score Recalibration, Local Realignment, Genotype Likelihood Computation: GATK
Filtration (number of sites):
  Start: 22,974,737
  Regional Mask: −12,532,580
  Mapping Quality (MQ0/DP ≥ 0.10): −129,411
  No-calls (> 20): −170,144
  Depth (> 371 or < 159): −153,937
  MaxLik Het Count (> 7): −547
  Callable: 9,988,118
Calling: Haploid-model Expectation-Maximization algorithm; 11,640 Variant Sites

(B) Missingness Distribution. x-axis: Samples with Missing Data (0 to 35). y-axis: Sites.

(C) Depth vs. Physical Position. x-axis: Position (Mb). y-axis: Depth.

(D) Depth Stratified by Maximum Likelihood Het Counts. x-axis: Depth (100 to 600). y-axis: Density of Sites (0.00 to 0.35). Legend: 0−2 (n=11231), 3−5 (n=243), 6−7 (n=138), 8−10 (n=89), 11−12 (n=36), 13−34 (n=32).
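The site counts in panel (A) of Fig. S3 form a simple subtraction chain, and the following sketch tallies them to confirm that the filters account for the 9,988,118 callable sites. The numbers are those printed in the figure; the list structure and names are illustrative.

```python
start_sites = 22_974_737

# (filter name, number of sites removed), in the order shown in Fig. S3A.
filters = [
    ("Regional mask", 12_532_580),
    ("Mapping quality (MQ0/DP >= 0.10)", 129_411),
    ("No-calls (> 20 samples)", 170_144),
    ("Depth (> 371 or < 159)", 153_937),
    ("MaxLik het count (> 7)", 547),
]

remaining = start_sites
for name, removed in filters:
    remaining -= removed
    print(f"{name:<34} -{removed:>10,}  -> {remaining:>10,}")

print(f"callable sites: {remaining:,}")
```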


Fig. S4. Cross-tabulation of populations and Y haplogroups. We sampled predominantly from Africa and observe 23 autochthonous lineages along with 32 representatives of the Bantu hgE. In addition, we sample 14 Eurasian and Native American lineages.

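Haplogroup groups: African, autochthonous (A, B); African, Bantu (E); Non-African (G, H, L, N, O, Q, R). A dash marks an empty cell.

Population  Total    A    B    E    G    H    L    N    O    Q    R
All            69    4   19   32    2    1    1    5    2    2    1
San             6    3    3    -    -    -    -    -    -    -    -
Baka           20    1   13    6    -    -    -    -    -    -    -
Mbuti           5    -    2    3    -    -    -    -    -    -    -
Nzebi          20    -    1   19    -    -    -    -    -    -    -
Mozabite        4    -    -    4    -    -    -    -    -    -    -
Pashtun         4    -    -    -    2    -    1    -    -    -    1
Cambodian       4    -    -    -    -    1    -    1    2    -    -
Yakut           4    -    -    -    -    -    -    4    -    -    -
Maya            2    -    -    -    -    -    -    -    -    2    -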


Fig. S5. Call rate and mean sequencing coverage on the Y chromosome. (A) Distribution across samples of call rate, the percentage of variable sites for which a genotype was called prior to imputation. Samples are stratified by collection: HGDP (29 samples from 7 populations) and Gabon (40 samples from two populations). (B) Distribution of mean sequencing coverage among variable sites.

[Figure S5 plots. (A) y-axis: Call Rate, 0 to 100. (B) y-axis: Mean Coverage, 0 to 12. In both panels the x-axis is Samples, grouped as HGDP and Gabon.]


Fig. S6. Y chromosome phylogenetic backbone. Defining mutations for and geographic distribution of Y chromosome haplogroups (41) are indicated along with sample size (dark blue). Gray lineages were not sampled in this study. Light blue branches indicate the new structures introduced by our resolution of a polytomy within macro-haplogroup F.

[Figure S6 tree.
Tip labels: A1b, B, E, D, C, F*, G, HIJK*, H, I, J, T, L, Q, R, K*, M, S, N, O, P*.
Sample sizes: A1b 4, B 19, E 32, G 2, H 1, L 1, N 5, O 2, Q 2, R 1.
Branch labels with defining mutations: A1b-V221, BT-M42, B-M60, CT-M168, DE-YAP, D-M174, E-M40, CF-P143, C-M130, F-M89, G-M201, HIJK-M578, H-M69, IJK-M522, IJ-M429, K-M9, LT-P329, L-M20, T-M70, K(xLT)-M526, M-Page93, S-M230, NO-M214, N-M231, O-M175, P-M45, Q-M242, R-M207.
Geographic labels: Africa; Himalayas, East Asia; Mediterranean, Middle East, Somalia; Caucasus, Europe; East Asia; Siberia, Americas; South Asia, Roma; Eurasia; Turkey, Iran, South Asia; New Guinea, Oceania; Boreal Asia; Middle East, West/South Asia; Europe.]


Fig. S7. Novel structure in Y hgB2. Detail of the maximum likelihood phylogeny shown in Fig. 2. The African-specific haplogroup B2b is present at high frequency among hunter-gatherer populations in central and southern Africa, such as the Baka and San. Orange branches indicate novel phylogenetic structure not described in ISOGG. Italics indicate an extant branch label that we have proposed moving upstream, and orange text indicates a proposed new haplogroup label. Several of the newly described lineages have substantial branch lengths; B2b1a2b dates to approximately 35 kya.

[Figure S7 tree.
Scale: 0 to 1200.
Tips are labeled by haplogroup and population (for example, B-M192 Baka).
Internal branch labels: CT-M168, BT-M42, F-M89, HIJK-M578, K-M9, KxLT-M526, NO-M214, N-Page56, N-L708, O-P186, P-M45, Q-L54, G-P287, A-M6, A-M14, A-L419, E-M2/M180, E-P179, E-M191, E-L514, E-U175/P277, E-U290, E-P252, E-M183.
hgB2 labels: B2 (M182), B2a (M150), B2a1a (M109), B2b1 (M192), B2b1a, B2b1a1, B2b1a1b, B2b1a1c, B2b1a2, B2b1a2a, B2b1a2b, B2b1b, B2b1b1, B2b1b2.]


Fig. S8. Phylogeny-aware imputation. (A) Schematic of a phylogenetic tree. Asterisks indicate samples for which no genotype was called: { 1, 6, 14 }. By parsimony, the algorithm infers the T→G variant to have arisen on the branch incident upon node 11 and imputes missing data accordingly (white text on colored background). (B) Jalview (51) visualization of Y chromosome sequence in phylogenetic context. Assigning SNVs to branches enables hierarchical clustering of both variants (rows) and samples (columns). Phylogenetic branching patterns are clearly defined by specific sets of mutations.

[Figure S8 panels. (A) Tree schematic with nodes numbered 0 to 18; observed (Obs) and imputed (Imp) alleles at the tips are T or G, and asterisks mark no-calls. (B) Jalview visualization.]
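The parsimony step described in the Fig. S8 caption can be illustrated with a small example. This is a minimal sketch, not the authors' implementation: it assumes one biallelic site with a known ancestral allele, places the mutation on the branch that minimizes disagreement with the called genotypes, and breaks ties by taking the first best branch found. The tree, sample names, and function names are hypothetical.

```python
def leaves_below(node, children):
    """Return the set of tip names descending from `node` (a tip returns itself)."""
    kids = children.get(node)
    if not kids:
        return {node}
    tips = set()
    for kid in kids:
        tips |= leaves_below(kid, children)
    return tips


def impute_site(children, root, observed, ancestral, derived):
    """Place one mutation by parsimony and fill in missing genotypes (None)."""
    nodes = [root] + [kid for kids in children.values() for kid in kids]
    best_node, best_cost = None, None
    for node in nodes:
        below = leaves_below(node, children)
        # Count called tips that contradict a mutation on the branch above `node`.
        cost = sum(
            1
            for tip, allele in observed.items()
            if allele is not None
            and allele != (derived if tip in below else ancestral)
        )
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
    below = leaves_below(best_node, children)
    imputed = {
        tip: allele if allele is not None else (derived if tip in below else ancestral)
        for tip, allele in observed.items()
    }
    return best_node, imputed


# Hypothetical six-tip tree; None marks a no-call.
children = {"root": ["n1", "n2"], "n1": ["s0", "s1"], "n2": ["s2", "n3"],
            "n3": ["s3", "s4", "s5"]}
observed = {"s0": "T", "s1": None, "s2": "T", "s3": "G", "s4": None, "s5": "G"}

branch, imputed = impute_site(children, "root", observed, ancestral="T", derived="G")
print(branch, imputed)
```

In this toy case the derived allele is placed on the branch above `n3`, so the no-call at `s4` is imputed as derived and the one at `s1` as ancestral. A no-call at a true singleton tip cannot be recovered this way, which is consistent with the text's remark that singletons cannot be imputed.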


Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included. Y hgQ phylogeny derived by merging our data with 11 lower coverage sequences from Phase 1 of the 1000 Genomes Project. Haplogroups Q-L54*(xM3) and Q-M3 are indicated by different shades of blue. Each branch is labeled by an index and the number of SNPs assigned to the branch in brackets. Individuals are labeled by population, ID, and haplogroup. The two samples used for calibration in the main analyses are circled. Branch 18 indicates SNPs inferred to be shared by all of hgQ-M3, and branch 24 is shared by all of hgQ.

[Figure S9 tree. Scale: 0 to 300. Branches are listed below by index, with the number of assigned SNPs in brackets.]

Terminal branches (index, [SNPs], population, ID, haplogroup):
0. [45] MXL NA19682 Q-M3
1. [31] MXL NA19786 Q-M3
3. [35] MXL NA19732 Q-M3
5. [112] Maya HGDP00856 Q-M3
7. [89] MXL NA19735 Q-M3
9. [93] MXL NA19783 Q-M3
11. [59] MXL NA19729 Q-M3
13. [97] CLM HG01124 Q-M3
14. [89] MXL NA19664 Q-M3
15. [54] MXL NA19774 Q-L54
19. [91] Maya HGDP00877 Q-L54
20. [36] MXL NA19771 Q-L54
21. [61] MXL NA19795 Q-L54
25. [263] Pashtun HGDP00243 R-L657

Internal branches (index, label, [SNPs]):
2. [9]
4. [26]
6. [1]
10. [3]
18. Q-M3 [17]
22. [62]
23. Q-L54*(xM3) [21]
24. Q-L54 [149]


Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites. Sequencing depth distribution across all callable sites, scaled by the number of observed singletons (black). The observed depth distribution among singletons (red) indicates that sites with zero or one read are not identified as variable; however, the observed distribution is in line with the censored expectation, that is, the expectation conditional on the presence of two or more reads (blue).

[Figure S10 plot. x-axis: Sequencing Depth, 0 to 20. y-axis: Number of Singletons, 0 to 22. Legend: Expected; Observed; Censored Expectation.]
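One way to read the "censored expectation" in Fig. S10 is as the depth distribution renormalized over depths of two or more reads and then scaled by the number of observed singletons. The sketch below implements that reading. It is an assumption about the construction rather than the authors' code, and the Poisson distribution is only a stand-in for the empirical read-depth distribution, which is not reproduced in this text.

```python
import math


def poisson_pmf(mean, max_depth):
    """Stand-in depth distribution (assumption: the study used an empirical one)."""
    return [math.exp(-mean) * mean**d / math.factorial(d) for d in range(max_depth + 1)]


def censored_expectation(pmf, n_sites, min_reads=2):
    """Expected site counts per depth, conditional on at least `min_reads` reads."""
    kept_mass = sum(pmf[min_reads:])
    return [
        0.0 if depth < min_reads else n_sites * p / kept_mass
        for depth, p in enumerate(pmf)
    ]


n_singletons = 126                 # observed HGDP00856 singletons (from the text)
pmf = poisson_pmf(5.7, 40)         # 5.7x is the reported coverage of HGDP00856
expected = [n_singletons * p for p in pmf]
censored = censored_expectation(pmf, n_singletons)

for depth in range(0, 11):
    print(f"depth {depth:2d}: expected {expected[depth]:5.1f}, censored {censored[depth]:5.1f}")
```

Under this construction the censored curve has no mass at zero or one read and sums to the number of observed singletons, so it can be compared bin by bin with the observed distribution.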


Fig. S11. mtDNA phylogeny. We constructed the mtDNA tree from our samples in order to directly compare the TMRCA of this locus to that of the Y chromosome. Branch lengths are the number of derived SNVs. Internal branches are labeled by the haplogroups they define, and individuals are labeled by haplogroup and the population from which the individual was drawn. This mtDNA tree is concordant with previous constructions of the phylogeny from whole mitochondrial genomes (8).

[Figure S11 tree. Scale: 0 to 60 derived SNVs. Tips are labeled by haplogroup and population (for example, L0d1b2 San). Internal branch labels: L0, L1, L2, L3, M, CZ, D, N, A, W, R, U, HV, V, JT, J.]


Fig. S12. mtDNA calibration tree. Phylogeny constructed from 108 publicly available mtDNA hgA2 sequences (23). We used this clade to calibrate the mtDNA mutation rate based on divergence within the Americas.

[Figure S12 tree. Scale: 0 to 12. Tips are labeled by hgA2 subhaplogroup and population (for example, A2+64 Dogrib).]


Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA. (A) 95% confidence intervals for each locus (blue boxes) with point estimates (horizontal lines). (B) 95% confidence bands for γ, the ratio of mtDNA TMRCA to that of the Y chromosome, as a function of τ, the ratio of coalescence times for two Y hgQ lineages and 108 hgA2 mtDNAs. Point estimates are plotted as a solid line, and the estimate corresponding to concordant divergence times is indicated with a solid black point. Shading indicates the narrowing of the confidence bands as a function of potential positive correlation between estimates of TMRCA for the two loci.

[Panel A, "TMRCA Confidence Intervals": y-axis TMRCA (ky), 0 to 200; x-axis loci chrY and mtDNA. Panel B, "TMRCA Ratio": x-axis Calibration Ratio, 0.5 to 1.5; y-axis TMRCA(mtDNA) / TMRCA(Y), 0.0 to 2.0. Legend: Point Estimate; Point Estimate (Calibration Ratio = 1); 95% Confidence Bands (Correlation = 0); Correlation = 0.25; Correlation = 0.50; Correlation = 0.75.]
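The narrowing of the confidence bands with positive correlation, described in the Fig. S13 caption, can be illustrated with a first-order delta-method approximation for the ratio of two estimates. This is only a sketch under stated assumptions: the authors' own procedure is described in their Materials and Methods and is not reproduced here, and the function name `ratio_ci` and all input values below are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def ratio_ci(x, sx, y, sy, rho=0.0, z=1.96):
    """Approximate CI for x / y using the first-order delta method.

    x, y   : point estimates (hypothetical TMRCA values, same units)
    sx, sy : their standard errors
    rho    : assumed correlation between the two estimates
    z      : normal quantile (1.96 for an approximate 95% interval)
    """
    r = x / y
    # Relative variance of the ratio; the covariance term is subtracted,
    # so a positive correlation shrinks the variance.
    rel_var = (sx / x) ** 2 + (sy / y) ** 2 - 2.0 * rho * sx * sy / (x * y)
    se = abs(r) * math.sqrt(max(rel_var, 0.0))
    return r - z * se, r + z * se


# Hypothetical inputs, not values from the paper.
for rho in (0.0, 0.25, 0.50, 0.75):
    lo, hi = ratio_ci(120.0, 10.0, 130.0, 12.0, rho)
    print(f"rho={rho:.2f}  width={hi - lo:.3f}")
```

The printed widths decrease as `rho` increases, which matches the qualitative behavior of the shaded bands in panel B.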


Table S1. Y chromosome summary of samples. Identifier, population, most derived mutation observed, ISOGG haplogroup, percentage of sites at which a genotype call was made, mean sequencing coverage.

ID         Population Mutation   Haplogroup     Call Rate (%)  Mean Coverage
HGDP01029  San        A-P28      A1b1a1a1       98.1           6.44
HGDP00987  San        A-P262     A1b1a1a2b      93.5           3.73
0919       Baka       A-M14      A1b1a1         91.2           2.86
HGDP01036  San        A-M51      A1b1b2a        98.5           6.15
0920       Baka       B-M30      B2b1a1b        92.1           2.99
0909       Baka       B-M112     B2b            82.8           2.05
0918       Baka       B-M30      B2b1a1b        91.0           2.73
0912       Baka       B-M112     B2b            76.9           1.82
0927       Baka       B-M192     B2b1           86.3           2.36
0937       Baka       B-M211     B2b1a1c        94.1           3.12
0908       Baka       B-M211     B2b1a1c        88.2           2.46
HGDP00992  San        B-P70      B2b1a2         97.7           6.25
0917       Baka       B-M192     B2b1           94.4           3.48
0922       Baka       B-M192     B2b1           91.9           2.84
0904       Baka       B-M192     B2b1           98.4           5.13
0932       Baka       B-M192     B2b1           91.7           3.12
0925       Baka       B-M192     B2b1           90.8           2.77
0907       Baka       B-M112     B2b            92.7           2.90
HGDP00449  Mbuti      B-M192     B2b1           93.2           3.35
HGDP01032  San        B-P6       B2b1b          98.6           6.55
HGDP00991  San        B-P6       B2b1b          92.8           3.70
HGDP00462  Mbuti      B-Page18   B2a            94.1           3.88
0702       Nzebi      B-M109     B2a1a          90.3           2.62
HGDP01259  Mozabite   E-M183     E1b1b1b1a2     94.5           3.88
HGDP01258  Mozabite   E-M183     E1b1b1b1a2     93.4           3.81
HGDP01262  Mozabite   E-M183     E1b1b1b1a2     96.6           3.78
HGDP01264  Mozabite   E-M183     E1b1b1b1a2     90.4           2.58
HGDP00456  Mbuti      E-CTS8030  E1b1a1a1f1a1d  93.6           3.51
7030       Nzebi      E-CTS8030  E1b1a1a1f1a1d  94.2           3.18
0938       Baka       E-CTS8030  E1b1a1a1f1a1d  94.6           3.26
0906       Baka       E-CTS8030  E1b1a1a1f1a1d  98.8           6.33
0712       Nzebi      E-P252     E1b1a1a1f1a1   91.9           2.86
0914       Baka       E-CTS8030  E1b1a1a1f1a1d  93.1           3.10
0913       Baka       E-CTS8030  E1b1a1a1f1a1d  99.0           6.03
7005       Nzebi      E-CTS8030  E1b1a1a1f1a1d  79.2           1.87
7003       Nzebi      E-CTS8030  E1b1a1a1f1a1d  69.7           1.53
0713       Nzebi      E-CTS8030  E1b1a1a1f1a1d  90.2           2.58
HGDP01081  Mbuti      E-P252     E1b1a1a1f1a1   98.5           5.63
0711       Nzebi      E-M191     E1b1a1a1f1a    95.6           3.55
0928       Baka       E-L515     E1b1a1a1f1b    89.8           2.73
0716       Nzebi      E-P59      E1b1a1a1g1b    90.0           2.58
0708       Nzebi      E-P277     E1b1a1a1g1     99.3           12.93
7032       Nzebi      E-P277     E1b1a1a1g1     93.2           2.99
HGDP00474  Mbuti      E-P277     E1b1a1a1g1     90.7           2.62
0710       Nzebi      E-P277     E1b1a1a1g1     81.1           2.02
0705       Nzebi      E-P278.1   E1b1a1a1g1     85.4           2.46
0926       Baka       E-U290     E1b1a1a1g1a    81.5           2.10
0703       Nzebi      E-U290     E1b1a1a1g1a    83.9           2.22
0715       Nzebi      E-Z1725    E1b1a1a1g1a2   88.4           2.46
0701       Nzebi      E-M154     E1b1a1a1g1c    92.4           3.04
7019       Nzebi      E-P277     E1b1a1a1g1     83.9           2.23
0707       Nzebi      E-P277     E1b1a1a1g1     87.2           2.49
0709       Nzebi      E-P277     E1b1a1a1g1     92.1           2.92
7020       Nzebi      E-P277     E1b1a1a1g1     95.2           3.30
0714       Nzebi      E-P278.1   E1b1a1a1g1     92.3           2.89
HGDP00222  Pashtun    G-M406     G2a1c1         99.3           12.44
HGDP00213  Pashtun    G-M377     G2b            91.9           2.87
HGDP00720  Cambodian  H-M138     H1a3           98.7           5.67
HGDP00258  Pashtun    L-M357     L1c            95.7           3.97
HGDP00964  Yakut      N-L708     N1c1a1         92.0           2.94
HGDP00960  Yakut      N-L708     N1c1a1         92.9           3.08
HGDP00948  Yakut      N-L708     N1c1a1         94.7           3.51
HGDP00950  Yakut      N-L708     N1c1a1         94.1           3.72
HGDP00715  Cambodian  N-M231     N              96.8           3.93
HGDP00716  Cambodian  O-Page23   O3a2c1a        92.6           3.10
HGDP00711  Cambodian  O-M95      O2a1           93.2           3.50
HGDP00877  Maya       Q-L54      Q1a3a          99.0           8.45
HGDP00856  Maya       Q-M3       Q1a3a1         97.8           5.73
HGDP00243  Pashtun    R-L657     R1a1a1b2a1a    94.5           3.57


Table S2. M578 genotyping results. All samples from haplogroups A, B, C, D, E, and G possess the ancestral C allele, and all samples from haplogroups H, I, J, K, L, M, N, O, Q, R, S, and T possess the derived T allele. This validates the polytomy resolution (G, H, IJK) → (G, (H, IJK)). One individual from paragroup F-M89* (HGDP00528) also possesses the derived allele and should therefore be re-classified as hgHIJK* in light of the newly defined topology.

ID           Haplogroup    Population   Genotype
HGDP01406    A-M13         Bantu        –
turk209      A-M13         Turk         C
turk6256     A-M13         Turk         C
HGDP00988    A-M6          San          C
HGDP00931    B1-M236       Yoruba       C
HGDP00992    B2-M112       San          C
Bsk111 cell  C             Burusho      C
HGDP00029    C-M356        Brahui       C
HGDP00545    C-M38         Papuan       C
HGDP01310    C*            Dai          C
HGDP00758    C1-M8         Japanese     C
HGDP00104    C3-M217       Hazara       C
HGDP01214    D-M15         Daur         C
HGDP01183    D-M15         Yizu         C
HGDP00752    D-M55         Japanese     C
HGDP00757    D-M55         Japanese     C
HGDP01226    D-P47         Mongol       C
HGDP00944    E-M191*       Yoruba       C
HGDP00757    F2-M427       Lahu         C
HGDP01318    F2-M427       Lahu         C
HGDP00528    F3-M282       French       T
HGDP01152    G-L497        Italian      C
HGDP01050    G-L497        Pima         C
HGDP00213    G-M377        Pashtun      C
HGDP00222    G-M406        Pashtun      C
HGDP00725    G-M485*       Palestinian  C
HGDP01073    G-M527        Sardinian    C
HGDP00017    G-P15*        Brahui       C
HGDP00893    G-P16         Russian      –
HGDP00626    G-P19         Bedouin      C
HGDP00723    G-P303        Palestinian  C
HGDP00049    G1-M285       Brahui       C
HGDP00359    H-M197        Burusho      T
HGDP00720    H-M39         Cambodian    T
HGDP00041    H-M52         Brahui       T
HGDP00062    H-M52         Balochi      T
HGDP00254    H-M69*(xM52)  Pashtun      T
HGDP00428    H-M82         Burusho      T
HGDP00438    H-M82         Burusho      T
HGDP00319    H-M82         Kalash       T
HGDP00321    H-M82         Kalash       T
HGDP00326    H-M82         Kalash       T
HGDP00328    H-M82         Kalash       T
HGDP00224    H-M82         Pashtun      T
HGDP01066    I-M26         Sardinian    T
HGDP00627    J1-Page8      Bedouin      T
HGDP00555    K*-M9*        Papuan       T
HGDP00084    L-M20         Balochi      T
HGDP00789    M-Page93      Melanesian   T
HGDP01295    N-M231        Han          T
HGDP00821    O-M117        Han          T
HGDP01060    Q-M346        Pima         T
HGDP00033    R-M17         Brahui       T
Greek ne7    T-M184        Greek        T
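The consistency check described in the Table S2 caption can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the helper names `expected_allele` and `check_calls` are hypothetical, only a handful of rows from the table are included, and a missing call (shown as "–" in the table) is represented as `None`.

```python
# Haplogroup letters named in the Table S2 caption.
ANCESTRAL_HGS = set("ABCDEG")        # expected to carry the ancestral C allele
DERIVED_HGS = set("HIJKLMNOQRST")    # expected to carry the derived T allele


def expected_allele(haplogroup):
    """Return the M578 allele the caption predicts, or None if unspecified."""
    major = haplogroup[0]
    if major in ANCESTRAL_HGS:
        return "C"
    if major in DERIVED_HGS:
        return "T"
    return None  # e.g. haplogroup F, which the caption treats separately


def check_calls(rows):
    """Split samples into those contradicting the rule and those it does not cover."""
    discordant, unclassified = [], []
    for sample, haplogroup, call in rows:
        if call is None:  # no genotype call
            continue
        expected = expected_allele(haplogroup)
        if expected is None:
            unclassified.append(sample)
        elif expected != call:
            discordant.append(sample)
    return discordant, unclassified


# A small subset of Table S2 rows: (ID, haplogroup, genotype).
rows = [
    ("HGDP00988", "A-M6", "C"),
    ("HGDP00213", "G-M377", "C"),
    ("HGDP00893", "G-P16", None),
    ("HGDP00528", "F3-M282", "T"),
    ("HGDP00720", "H-M39", "T"),
    ("HGDP01060", "Q-M346", "T"),
]

discordant, unclassified = check_calls(rows)
print("discordant:", discordant)
print("not covered by the rule:", unclassified)
```

On this subset no sample contradicts the rule, and HGDP00528 is the only sample the rule does not cover, which is the individual the caption singles out for re-classification.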


Table S3. Mutation rate point estimates. We estimate Y chromosome and mtDNA mutation rates using entry to the Americas as a calibration event. Point estimates used in the analysis appear in bold. These are based on two divergent Y hgQ samples from within this study and 108 publicly available mtDNA hgA2 sequences (23). As the timing of the Out of Africa event is not known to great precision, the corresponding mutation rate estimates are included for comparison only.

Event              Source     n    T   M      P     µly ×10^3  µlg ×10^2  µby ×10^9  µbg ×10^8

Y Chromosome
Entry to Americas  [missing]  2    15  123.0  122   8.2        26         0.82       2.6
Out of Africa      Internal   14   50  393.5  127   7.9        25         0.79       2.5

Mitochondrial Genome (Full)
Entry to Americas  Internal   7    15  5.71   2620  0.38       1.0        23         61
Entry to Americas  External   108  15  5.66   2650  0.38       1.0        23         60
Out of Africa      Internal   39   50  16.51  3030  0.33       0.88       20         53

Mitochondrial Genome (Hypervariable Region Omitted)
Entry to Americas  Internal   7    15  3.57   4200  0.24       0.63       15         40
Entry to Americas  External   108  15  3.94   3810  0.26       0.70       17         44
Out of Africa      Internal   39   50  10.15  4920  0.20       0.54       13         34

Data
n: Number of lineages
T: Estimated time since divergence (ky)
M: Average number of mutations since divergence

Mutation Rate Measures
P: Mutation period (years / mutation): T / M
µly: Per year mutation rate for the locus: M / T
µlg: Per generation mutation rate for the locus: µly × g
µby: Per year mutation rate per bp: µly / L
µbg: Per generation mutation rate per bp: µly × (g / L)

Parameters
g: Average generation time
chrY: 31.5 years / generation
chrM: 26.5 years / generation
L: Locus length
chrY: 9.988 × 10^6 bp
chrM Full: 16,571 bp
chrM HVR omitted: 15,755 bp
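The definitions above are enough to recompute the derived columns of Table S3 from T, M, g, and L. The following is a minimal sketch that applies those formulas to two "Entry to Americas" rows (the Y chromosome row and the full mitochondrial genome row with n = 108); the names `Calibration` and `rate_measures` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    t_ky: float        # T: estimated time since divergence (ky)
    m: float           # M: average number of mutations since divergence
    gen_years: float   # g: average generation time (years / generation)
    locus_bp: float    # L: locus length (bp)


def rate_measures(c):
    """Apply the Table S3 definitions to one calibration row."""
    t_years = c.t_ky * 1000.0
    mu_ly = c.m / t_years                      # per year, per locus: M / T
    return {
        "P": t_years / c.m,                    # mutation period: T / M
        "mu_ly": mu_ly,
        "mu_lg": mu_ly * c.gen_years,          # per generation, per locus
        "mu_by": mu_ly / c.locus_bp,           # per year, per bp
        "mu_bg": mu_ly * c.gen_years / c.locus_bp,  # per generation, per bp
    }


# Inputs taken from Table S3 and its parameter list.
chr_y = rate_measures(Calibration(15, 123.0, 31.5, 9.988e6))
chr_m = rate_measures(Calibration(15, 5.66, 26.5, 16571))

for name, r in (("chrY", chr_y), ("chrM full", chr_m)):
    print(name,
          f"P={r['P']:.0f}",
          f"mu_ly x10^3={r['mu_ly'] * 1e3:.2f}",
          f"mu_lg x10^2={r['mu_lg'] * 1e2:.2f}",
          f"mu_by x10^9={r['mu_by'] * 1e9:.2f}",
          f"mu_bg x10^8={r['mu_bg'] * 1e8:.2f}")
```

The scaled outputs can be compared directly against the table columns, which report each rate multiplied by the power of ten given in the column header and rounded to two or three significant figures.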


Data File S1. Sample, phylogeny, and variant data. This zipped archive (available at www.sciencemag.org) includes a more detailed version of the phylogeny presented in Fig. 2; a BED file detailing the regions within which genotype calls were made; population and haplogroup data for sampled individuals; data for each branch of the phylogeny, including length (# of SNVs) and the set of individuals within the subtree rooted at the branch; data for each variant, including phylogenetic placement, hg19 coordinate, ancestral and derived alleles, name, and ss#; and mtDNA genotype calls.

Data File S2. Y chromosome genotype calls. To protect participant privacy, this zipped archive is available through a data access agreement (DAA) for transfer of genetic data by contacting C.D.B.

Data File S3. Y chromosome mapped sequencing reads. This BAM file is also available via the DAA described above. Mapping, quality score recalibration, and indel realignment are described in Materials and Methods.


FTP Addresses and Accession Numbers for External Data

Y Chromosome hgQ Sequences from the 1000 Genomes Project
Server: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/
Binary Sequence Alignment/Map Files:
HG01124/alignment/HG01124.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam
NA19664/alignment/NA19664.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19682/alignment/NA19682.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19729/alignment/NA19729.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19732/alignment/NA19732.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19735/alignment/NA19735.mapped.ILLUMINA.bwa.MXL.low_coverage.20130415.bam
NA19771/alignment/NA19771.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19774/alignment/NA19774.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19783/alignment/NA19783.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19786/alignment/NA19786.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19795/alignment/NA19795.mapped.ILLUMINA.bwa.MXL.low_coverage.20130415.bam

Complete mtDNA hgA2 Sequences: GenBank Accession Numbers AY195786.2 EF079873.1 EU095526.1 EU095528.1 EU095529.1 EU095530.1 EU095538.1 EU095552.1 EU095194.1 EU095195.1 EU095196.1 EU095197.1 EU095198.1 EU095199.1 EU095200.1 EU095201.1 EU095202.1 EU095204.1

EU095205.1 EU431081.1 EU431082.1 EU095545.2 EU431080.2 HQ012049.1 HQ012050.1 HQ012051.1 HQ012052.1 HQ012053.1 HQ012054.1 HQ012055.1 HQ012056.1 HQ012057.1 HQ012058.1 HQ012059.1 HQ012060.1 HQ012061.1

HQ012062.1 HQ012063.1 HQ012064.1 HQ012065.1 HQ012066.1 HQ012067.1 HQ012068.1 HQ012069.1 HQ012070.1 HQ012071.1 HQ012072.1 HQ012073.1 HQ012074.1 HQ012075.1 HQ012076.1 HQ012077.1 HQ012078.1 HQ012079.1

HQ012080.1 HQ012081.1 HQ012082.1 HQ012083.1 HQ012084.1 HQ012085.1 HQ012086.1 HQ012087.1 HQ012088.1 HQ012089.1 HQ012090.1 HQ012091.1 HQ012092.1 HQ012093.1 HQ012094.1 HQ012095.1 HQ012096.1 HQ012097.1

HQ012098.1 HQ012099.1 HQ012100.1 HQ012101.1 HQ012102.1 HQ012103.1 HQ012104.1 HQ012105.1 HQ012106.1 HQ012107.1 HQ012108.1 HQ012109.1 HQ012110.1 HQ012111.1 HQ012112.1 HQ012113.1 HQ012114.1 HQ012115.1

HQ012116.1 HQ012117.1 HQ012118.1 HQ012119.1 HQ012120.1 HQ012121.1 HQ012122.1 HQ012123.1 HQ012124.1 HQ012125.1 HQ012126.1 HQ012127.1 HQ012128.1 HQ012129.1 HQ012130.1 HQ012131.1 HQ012132.1 HQ012133.1


References

1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999). doi:10.1093/oxfordjournals.molbev.a026091 Medline

2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Recent common ancestry of human Y chromosomes: Evidence from DNA sequence data. Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000). doi:10.1073/pnas.97.13.7360 Medline

3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition. Genetics 161, 447–459 (2002). Medline

4. M. F. Hammer, A recent common ancestry for human Y chromosomes. Nature 378, 376–378 (1995). doi:10.1038/378376a0 Medline

5. F. Cruciani, B. Trombetta, A. Massaia, G. Destro-Bisol, D. Sellitto, R. Scozzari, A revised root for the human Y chromosomal phylogenetic tree: The origin of patrilineal diversity in Africa. Am. J. Hum. Genet. 88, 814–818 (2011). doi:10.1016/j.ajhg.2011.05.002 Medline

6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713 (2000). doi:10.1038/35047064 Medline

7. R. L. Cann, M. Stoneking, A. C. Wilson, Mitochondrial DNA and human evolution. Nature 325, 31–36 (1987). doi:10.1038/325031a0 Medline

8. P. A. Underhill, T. Kivisild, Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu. Rev. Genet. 41, 539–564 (2007). doi:10.1146/annurev.genet.41.110306.130407 Medline

9. M. A. Jobling, C. Tyler-Smith, The human Y chromosome: An evolutionary marker comes of age. Nat. Rev. Genet. 4, 598–612 (2003). doi:10.1038/nrg1124 Medline

10. H. Skaletsky, T. Kuroda-Kawaguchi, P. J. Minx, H. S. Cordum, L. Hillier, L. G. Brown, S. Repping, T. Pyntikova, J. Ali, T. Bieri, A. Chinwalla, A. Delehaunty, K. Delehaunty, H. Du, G. Fewell, L. Fulton, R. Fulton, T. Graves, S. F. Hou, P. Latrielle, S. Leonard, E. Mardis, R. Maupin, J. McPherson, T. Miner, W. Nash, C. Nguyen, P. Ozersky, K. Pepin, S. Rock, T. Rohlfing, K. Scott, B. Schultz, C. Strong, A. Tin-Wollam, S. P. Yang, R. H. Waterston, R. K. Wilson, S. Rozen, D. C. Page, The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825–837 (2003). doi:10.1038/nature01722 Medline

11. Materials and methods are available as supplementary material on Science Online.

12. ISOGG, International Society of Genetic Genealogy (2013) (available at http://www.isogg.org/).

13. P. A. Underhill, G. Passarino, A. A. Lin, P. Shen, M. Mirazón Lahr, R. A. Foley, P. J. Oefner, L. L. Cavalli-Sforza, The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65, 43–62 (2001). doi:10.1046/j.1469-1809.2001.6510043.x Medline


14. W. Wei, Q. Ayub, Y. Chen, S. McCarthy, Y. Hou, I. Carbone, Y. Xue, C. Tyler-Smith, A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Res. 23, 388–395 (2013). doi:10.1101/gr.143198.112 Medline

15. J. Z. Li, D. M. Absher, H. Tang, A. M. Southwick, A. M. Casto, S. Ramachandran, H. M. Cann, G. S. Barsh, M. Feldman, L. L. Cavalli-Sforza, R. M. Myers, Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008). doi:10.1126/science.1153717 Medline

16. T. M. Karafet, F. L. Mendez, M. B. Meilerman, P. A. Underhill, S. L. Zegura, M. F. Hammer, New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res. 18, 830–838 (2008). doi:10.1101/gr.7172008 Medline

17. J. F. Hughes, H. Skaletsky, T. Pyntikova, T. A. Graves, S. K. van Daalen, P. J. Minx, R. S. Fulton, S. D. McGrath, D. P. Locke, C. Friedman, B. J. Trask, E. R. Mardis, W. C. Warren, S. Repping, S. Rozen, R. K. Wilson, D. C. Page, Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature 463, 536–539 (2010). doi:10.1038/nature08700 Medline

18. R. C. Griffiths, S. Tavaré, Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994). doi:10.1098/rstb.1994.0079 Medline

19. T. Goebel, M. R. Waters, D. H. O’Rourke, The late Pleistocene dispersal of modern humans in the Americas. Science 319, 1497–1502 (2008). doi:10.1126/science.1153569 Medline

20. M. C. Dulik, S. I. Zhadanov, L. P. Osipova, A. Askapuli, L. Gau, O. Gokcumen, S. Rubinstein, T. G. Schurr, Mitochondrial DNA and Y chromosome variation provides evidence for a recent common ancestry between Native Americans and Indigenous Altaians. Am. J. Hum. Genet. 90, 229–246 (2012). doi:10.1016/j.ajhg.2011.12.014 Medline

21. Y. Xue, Q. Wang, Q. Long, B. L. Ng, H. Swerdlow, J. Burton, C. Skuce, R. Taylor, Z. Abdellah, Y. Zhao, D. G. MacArthur, M. A. Quail, N. P. Carter, H. Yang, C. Tyler-Smith; Asan, Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Curr. Biol. 19, 1453–1457 (2009). doi:10.1016/j.cub.2009.07.032 Medline

22. R. G. Klein, Out of Africa and the evolution of human behavior. Evol. Anthropol. 17, 267–281 (2008). doi:10.1002/evan.20181

23. S. Kumar, C. Bellis, M. Zlojutro, P. E. Melton, J. Blangero, J. E. Curran, Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins. BMC Evol. Biol. 11, 293 (2011). doi:10.1186/1471-2148-11-293 Medline

24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22, 1561–1568 (2005). doi:10.1093/molbev/msi145 Medline


25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Characterizing the time dependency of human mitochondrial DNA mutation rate estimates. Mol. Biol. Evol. 26, 217–230 (2009). doi:10.1093/molbev/msn244 Medline

26. D. R. Bentley, S. Balasubramanian, H. P. Swerdlow, G. P. Smith, J. Milton, C. G. Brown, K. P. Hall, D. J. Evers, C. L. Barnes, H. R. Bignell, J. M. Boutell, J. Bryant, R. J. Carter, R. Keira Cheetham, A. J. Cox, D. J. Ellis, M. R. Flatbush, N. A. Gormley, S. J. Humphray, L. J. Irving, M. S. Karbelashvili, S. M. Kirk, H. Li, X. Liu, K. S. Maisinger, L. J. Murray, B. Obradovic, T. Ost, M. L. Parkinson, M. R. Pratt, I. M. Rasolonjatovo, M. T. Reed, R. Rigatti, C. Rodighiero, M. T. Ross, A. Sabot, S. V. Sankar, A. Scally, G. P. Schroth, M. E. Smith, V. P. Smith, A. Spiridou, P. E. Torrance, S. S. Tzonev, E. H. Vermaas, K. Walter, X. Wu, L. Zhang, M. D. Alam, C. Anastasi, I. C. Aniebo, D. M. Bailey, I. R. Bancarz, S. Banerjee, S. G. Barbour, P. A. Baybayan, V. A. Benoit, K. F. Benson, C. Bevis, P. J. Black, A. Boodhun, J. S. Brennan, J. A. Bridgham, R. C. Brown, A. A. Brown, D. H. Buermann, A. A. Bundu, J. C. Burrows, N. P. Carter, N. Castillo, M. Chiara E Catenazzi, S. Chang, R. Neil Cooley, N. R. Crake, O. O. Dada, K. D. Diakoumakos, B. Dominguez-Fernandez, D. J. Earnshaw, U. C. Egbujor, D. W. Elmore, S. S. Etchin, M. R. Ewan, M. Fedurco, L. J. Fraser, K. V. Fuentes Fajardo, W. Scott Furey, D. George, K. J. Gietzen, C. P. Goddard, G. S. Golda, P. A. Granieri, D. E. Green, D. L. Gustafson, N. F. Hansen, K. Harnish, C. D. Haudenschild, N. I. Heyer, M. M. Hims, J. T. Ho, A. M. Horgan, K. Hoschler, S. Hurwitz, D. V. Ivanov, M. Q. Johnson, T. James, T. A. Huw Jones, G. D. Kang, T. H. Kerelska, A. D. Kersey, I. Khrebtukova, A. P. Kindwall, Z. Kingsbury, P. I. Kokko-Gonzales, A. Kumar, M. A. Laurent, C. T. Lawley, S. E. Lee, X. Lee, A. K. Liao, J. A. Loch, M. Lok, S. Luo, R. M. Mammen, J. W. Martin, P. G. McCauley, P. McNitt, P. Mehta, K. W. Moon, J. W. Mullens, T. Newington, Z. Ning, B. Ling Ng, S. M. Novo, M. J. O’Neill, M. A. Osborne, A. Osnowski, O. Ostadan, L. L. Paraschos, L. Pickering, A. C. Pike, A. C. Pike, D. Chris Pinkard, D. P. Pliskin, J. Podhasky, V. J. 
Quijano, C. Raczy, V. H. Rae, S. R. Rawlings, A. Chiva Rodriguez, P. M. Roe, J. Rogers, M. C. Rogert Bacigalupo, N. Romanov, A. Romieu, R. K. Roth, N. J. Rourke, S. T. Ruediger, E. Rusman, R. M. Sanches-Kuiper, M. R. Schenker, J. M. Seoane, R. J. Shaw, M. K. Shiver, S. W. Short, N. L. Sizto, J. P. Sluis, M. A. Smith, J. Ernest Sohna Sohna, E. J. Spence, K. Stevens, N. Sutton, L. Szajkowski, C. L. Tregidgo, G. Turcatti, S. Vandevondele, Y. Verhovsky, S. M. Virk, S. Wakelin, G. C. Walcott, J. Wang, G. J. Worsley, J. Yan, L. Yau, M. Zuerlein, J. Rogers, J. C. Mullikin, M. E. Hurles, N. J. McCooke, J. S. West, F. L. Oaks, P. L. Lundberg, D. Klenerman, R. Durbin, A. J. Smith, Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008). doi:10.1038/nature07517 Medline

27. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009). doi:10.1093/bioinformatics/btp324 Medline

28. A. Wysoker, K. Tibbetts, T. Fennell, Picard (2009) (available at http://picard.sourceforge.net/).

29. A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M. A. DePristo, The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). doi:10.1101/gr.107524.110 Medline


30. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. del Angel, M. A. Rivas, M. Hanna, A. McKenna, T. J. Fennell, A. M. Kernytsky, A. Y. Sivachenko, K. Cibulskis, S. B. Gabriel, D. Altshuler, M. J. Daly, A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). doi:10.1038/ng.806 Medline

31. K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011). doi:10.1093/molbev/msr121 Medline

32. A. Kloss-Brandstätter, D. Pacher, S. Schönherr, H. Weissensteiner, R. Binna, G. Specht, F. Kronenberg, HaploGrep: A fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25–32 (2011). doi:10.1002/humu.21382 Medline

33. M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009). doi:10.1002/humu.20921 Medline

34. D. M. Behar, M. van Oven, S. Rosset, M. Metspalu, E. L. Loogväli, N. M. Silva, T. Kivisild, A. Torroni, R. Villems, A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012). doi:10.1016/j.ajhg.2012.03.002 Medline

35. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). doi:10.1093/nar/gkh340 Medline

36. J. A. Rice, Mathematical Statistics and Data Analysis (Brooks/Cole, Belmont, CA, ed. 3rd, 2006), p. 166.

37. J. Hayya, D. Armstrong, N. Gressis, A note on the ratio of two normally distributed variables. Manage. Sci. 21, 1338–1341 (1975). doi:10.1287/mnsc.21.11.1338

38. J. F. C. Kingman, in Exchangeability in Probability and Statistics, G. Koch, F. Spizzichino, Eds. (North-Holland, Amsterdam, 1982), pp. 97–112.

39. C. de Filippo, C. Barbieri, M. Whitten, S. W. Mpoloka, E. D. Gunnarsdóttir, K. Bostoen, T. Nyambe, K. Beyer, H. Schreiber, P. de Knijff, D. Luiselli, M. Stoneking, B. Pakendorf, Y-chromosomal variation in sub-Saharan Africa: Insights into the history of Niger-Congo groups. Mol. Biol. Evol. 28, 1255–1269 (2011). doi:10.1093/molbev/msq312 Medline

40. S. Rootsi, N. M. Myres, A. A. Lin, M. Järve, R. J. King, I. Kutuev, V. M. Cabrera, E. K. Khusnutdinova, K. Varendi, H. Sahakyan, D. M. Behar, R. Khusainova, O. Balanovsky, E. Balanovska, P. Rudan, L. Yepiskoposyan, A. Bahmanimehr, S. Farjadian, A. Kushniarevich, R. J. Herrera, V. Grugni, V. Battaglia, C. Nici, F. Crobu, S. Karachanak, B. Hooshiar Kashani, M. Houshmand, M. H. Sanati, D. Toncheva, A. Lisa, O. Semino, J. Chiaroni, J. Di Cristofaro, R. Villems, T. Kivisild, P. A. Underhill, Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus. Eur. J. Hum. Genet. 20, 1275–1282 (2012). doi:10.1038/ejhg.2012.86 Medline


41. J. Chiaroni, P. A. Underhill, L. L. Cavalli-Sforza, Y chromosome diversity, human expansion, drift, and cultural evolution. Proc. Natl. Acad. Sci. U.S.A. 106, 20174–20179 (2009). doi:10.1073/pnas.0910803106 Medline

42. A. S. Hinrichs, D. Karolchik, R. Baertsch, G. P. Barber, G. Bejerano, H. Clawson, M. Diekhans, T. S. Furey, R. A. Harte, F. Hsu, J. Hillman-Jackson, R. M. Kuhn, J. S. Pedersen, A. Pohl, B. J. Raney, K. R. Rosenbloom, A. Siepel, K. E. Smith, C. W. Sugnet, A. Sultan-Qurraie, D. J. Thomas, H. Trumbower, R. J. Weber, M. Weirauch, A. S. Zweig, D. Haussler, W. J. Kent, The UCSC Genome Browser Database: Update 2006. Nucleic Acids Res. 34, D590–D598 (2006). doi:10.1093/nar/gkj144 Medline

43. W. J. Kent, BLAT: The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002). Medline

44. M. R. Waters, S. L. Forman, T. A. Jennings, L. C. Nordt, S. G. Driese, J. M. Feinberg, J. L. Keene, J. Halligan, A. Lindquist, J. Pierson, C. T. Hallmark, M. B. Collins, J. E. Wiederhold, The Buttermilk Creek complex and the origins of Clovis at the Debra L. Friedkin site, Texas. Science 331, 1599–1603 (2011). doi:10.1126/science.1201855 Medline

45. T. D. Dillehay, C. Ramírez, M. Pino, M. B. Collins, J. Rossen, J. D. Pino-Navarro, Monte Verde: Seaweed, food, medicine, and the peopling of South America. Science 320, 784–786 (2008). doi:10.1126/science.1156533 Medline

46. A. Scally, R. Durbin, Revising the human mutation rate: Implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012). doi:10.1038/nrg3295 Medline

47. 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012). doi:10.1038/nature11632 Medline

48. E. Tamm, T. Kivisild, M. Reidla, M. Metspalu, D. G. Smith, C. J. Mulligan, C. M. Bravi, O. Rickards, C. Martinez-Labarga, E. K. Khusnutdinova, S. A. Fedorova, M. V. Golubenko, V. A. Stepanov, M. A. Gubina, S. I. Zhadanov, L. P. Ossipova, L. Damba, M. I. Voevoda, J. E. Dipierri, R. Villems, R. S. Malhi, Beringian standstill and spread of Native American founders. PLoS ONE 2, e829 (2007). doi:10.1371/journal.pone.0000829 Medline

49. F. L. Mendez, T. Krahn, B. Schrack, A. M. Krahn, K. R. Veeramah, A. E. Woerner, F. L. Fomine, N. Bradman, M. G. Thomas, T. M. Karafet, M. F. Hammer, An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree. Am. J. Hum. Genet. 92, 454–459 (2013). doi:10.1016/j.ajhg.2013.02.002 Medline

50. G. Destro-Bisol, F. Donati, V. Coia, I. Boschi, F. Verginelli, A. Caglià, S. Tofanelli, G. Spedini, C. Capelli, Variation of female and male lineages in sub-Saharan populations: the importance of sociocultural factors. Mol. Biol. Evol. 21, 1673–1682 (2004). doi:10.1093/molbev/msh186 Medline

51. A. M. Waterhouse, J. B. Procter, D. M. Martin, M. Clamp, G. J. Barton, Jalview Version 2: A multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009). doi:10.1093/bioinformatics/btp033 Medline


Y Weigh In Again on Modern Humans
Rebecca L. Cann
Science 341, 465 (2013). DOI: 10.1126/science.1242899



www.sciencemag.org SCIENCE VOL 341 2 AUGUST 2013 465

PERSPECTIVES

frequencies. However, only one frequency proportional to the input OAM will survive in the total power of the scattered light when two opposite OAM waves are used as input. This analysis indicates that a Doppler shift should occur even with ordinary “non-twisted” light as the input, if only specific nonzero OAM components are detected in the scattered light. This additional test was carried out by Lavery et al., confirming the predictions.

The work of Lavery et al. builds on several previous results. A form of Doppler shift arising from the transverse translational motion of a scattering inhomogeneous surface is a well-known effect, which provides the basis for the so-called laser speckle velocimetry that allows for noncontact surface-speed measurements (7). The present study can be seen as a generalization of speckle velocimetry to the rotational case. Various examples of Doppler-like effects in the interaction of light with spinning particles or molecules have been reported (8), but these effects arise from SAM scattering, not OAM. Hence, they require the particles to be anisotropic (rather than inhomogeneous) and, because SAM is bounded by ħ, cannot be enhanced at will, in contrast to the case with OAM. Finally, a Doppler effect relying on OAM was demonstrated in an ad hoc setup involving transmission through spinning Dove prisms, which are truncated prisms used to invert and rotate images (9).

The rotational Doppler effect demonstrated by Lavery et al. could find application in noncontact remote measurement of angular speeds. Particularly fascinating would be the detection of astronomical object rotations by filtering specific OAM components in the detected radiation (10). However, high-OAM components in the light of distant sources are expected to be strongly attenuated, so the potential in this area will require further study.

References

1. M. P. J. Lavery, F. C. Speirits, S. M. Barnett, M. J. Padgett, Science 341, 537 (2013).

2. A. M. Yao, M. J. Padgett, Adv. Opt. Photon. 3, 161 (2011).

3. M. J. Padgett, R. Bowman, Nat. Photon. 5, 343 (2011).

4. A. Ambrosio, L. Marrucci, F. Borbone, A. Roviello, P. Maddalena, Nat. Commun. 3, 989 (2012).

5. N. Bozinovic et al., Science 340, 1545 (2013).

6. V. D’Ambrosio et al., Nat. Commun. 3, 961 (2012).

7. T. Asakura, N. Takai, Appl. Phys. 25, 179 (1981).

8. B. I. Bialynicki, B. Z. Bialynicka, in The Angular Momentum of Light, D. L. Andrews, M. Babiker, Eds. (Cambridge Univ. Press, Cambridge, 2012).

9. J. Courtial, D. Robertson, K. Dholakia, L. Allen, M. Padgett, Phys. Rev. Lett. 81, 4828 (1998).

10. F. Tamburini, B. Thidé, G. Molina-Terriza, G. Anzolin, Nat. Phys. 7, 195 (2011).

10.1126/science.1242097

Twisted light and the Doppler effect. Light carrying orbital angular momentum (OAM), represented by its helical-structured wavefront in orange, is reflected off a spinning disk. The disk’s surface roughness generates new OAM components in the scattered light. In this example, a single-helix wave with an OAM of ħ (one rotational quantum) is scattered into a triple-helix wave with an OAM of 3ħ. The scattered light then also acquires a Doppler frequency shift, represented here as a change of color to blue.

Y Weigh In Again on Modern Humans

GENETICS

Rebecca L. Cann

Sampling of the human Y chromosome eliminates the curious disparity in ages of our last common male and female ancestors.

Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii at Manoa, 1960 East-West Road, Honolulu, HI 96822, USA. E-mail: [email protected]

The age of the most recent man or woman from whom all living humans today descended has been the subject of considerable debate. It has been suggested that the date of our last common maternal ancestor could have been three times older than that of our last common paternal ancestor. Two papers in this issue independently redate our most recent common paternal ancestor and find that there is rather little or no disparity with the age of our common maternal ancestor. On page 565, Francalacci et al. (1) report their high-resolution sampling of 1204 Sardinian men, yielding 11,763 phylogenetically informative and male-specific single-nucleotide Y-chromosome polymorphisms (MSY-SNPs), and generate a putative estimate of 180,000 to 200,000 years for the point at which all these and other human paternal lineages coalesce. In a separate study on page 562, Poznik et al. (2) detail their methods using sequences from 69 males drawn from nine populations, covering 9.99 million loci on the Y, and conclude that the most recent common paternal ancestor lived 120,000 to 156,000 years ago. These papers further confirm an earlier sequencing study (3) of 36 male donors that pushed the ancestral Y back to 115,000 years before present (yr B.P.), using almost 6800 variants shared by two or more men. This is roughly the same as the dates derived on the basis of mitochondrial genome analysis for the most recent common maternal ancestor (4). So now it seems that a population giving rise to the strictly maternal and strictly paternal portions of our genomes could have produced individuals who found each other in the same space and time.

While the papers of Francalacci et al. and Poznik et al. are elegant, careful analyses, the general public is more familiar with mitochondrial and Y-chromosome analyses in the context of population-based comparisons for assigning parentage or assessing continental origin (so-called ancestry

Published by AAAS


Page 51: Sequencing y chromosomes resolves discrepancy in time to common ancestor of males versus females


testing). Maternally (i.e., mitochondrial) and paternally (male-specific regions of the Y) inherited genomes are used in ancestry testing because they present such simple models of genetic inheritance. But they can sometimes oversimplify the search for a genetic homeland and family tree. When using Y-chromosome markers, a grandfather to father to son pattern of inheritance can be easily reconstructed, both solving and raising issues regarding paternity, as seen in the case of the Jefferson family and Sally Hemings’s descendants (5). But SNP markers can undergo a type of genetic recombination called gene conversion that complicates simple Y-chromosome typing and the estimation of mutation rates, leading to spurious conclusions when extrapolated over generations. For example, Niederstätter et al. (6) recently found that efforts to look at bio-geographical origins were complicated by the fact that changes in a 37–base pair region of the Y chromosome occurred more often among thousands of Austrian men than could be accounted for by simple stepwise mutations.

For most biologists, the analysis of SNPs simply provides evidence of population subdivision in the branching patterns of our long-dead ancestors, and this can offer an overwhelming sense of our geographical roots that some will find appealing. However, for social scientists pondering the social consequences of such disclosures surrounding biological diversity in humans, there can be instant recoil at past misguided efforts to use genetics to justify racism. While some have looked at the genetic basis of disease susceptibility in the context of migrations of human populations (7), there is always the danger of confusing the effects of selection driven by the environment with the genetic history of the populations in question. Indeed, some researchers have concluded that human racial classification is a continuing social construct and not a biological reality at all (8).

On a grander scale of history, discovery bias has been a consistent problem in using SNPs (9), or paleoanthropology, to reconstruct the past. So it is good news that these two new papers provide fresh evidence, using between them a diverse set of data that will allow the consideration of alternative demographic models of hominin migration. The idea that culturally modern humans pulsed out of Africa only 50,000 to 60,000 years ago (10) is widely promulgated, although earlier proposed calibrations of mutation rates for the Y chromosome (11), as well as whole-genome analysis of mitochondrial DNA (12), were consistent with other models of more ancient possible migrations involving anatomically modern humans. Eventual ecological displacement of temperate zone archaic populations, both Neandertal and Denisovan, by modern humans has been put forward to account for discordancy in tool traditions from archaeological research at sites in central India (13), where microblade technology had been in use only since 45,000 years ago, as compared to sites in Africa, China, and Malaysia. In these latter locations, the hominin populations appeared to be culturally modern (in terms of the tools they were making) and well adapted to the emerging localities, suggesting that the time frames implied by a short Y chronology allow insufficient time for invasion and settlement. More broadly, marine isotope 5–calibrated dates in the range of 85,000 to 130,000 yr B.P. suggest that modern humans were on the move between tropical and subtropical zones during the periods when climate oscillated in the temperate regions they would later successfully reinvade, leaving us as their legacy.

[Figure: two map panels, A and B. Map labels: Neandertals, Denisovians?, Indian archaics, Modern humans, African archaics?, Unoccupied, Abandoned.]

Ancestors on the move. Using Y-chromosome data from modern human males to date the most recent common paternal ancestor will ultimately help to constrain demographic models of past hominin population locations and migrations. (A) Possible distributions of different populations at ~190,000 B.P. As described in the text, Y-chromosome analyses might help explain the timing of cultural changes seen with regard to microblade tool use and the disappearance of archaic forms of hominins from India, generating the distribution seen at 71,000 B.P. (B). [Adapted from (13)]

References

1. P. Francalacci et al., Science 341, 565 (2013).

2. G. David Poznik et al., Science 341, 562 (2013).

3. W. Wei et al., Genome Res. 23, 388 (2013).

4. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31 (1987).

5. E. A. Foster et al., Nature 396, 27 (1998).

6. H. Niederstätter et al., Forensic Sci. Int. Genet. 10.1016/j.fsigen.2013.05.010 (2013).

7. E. Corona et al., PLoS Genet. 9, e1003447 (2013).

8. E. Bonilla-Silva, Racism Without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States (Rowman and Littlefield, Lanham, MD, 2006).

9. A. Albrechtsen, F. C. Nielsen, R. Nielsen, Mol. Biol. Evol. 27, 2534 (2010).

10. R. Klein, The Human Career: Human Biological and Cultural Origins (Univ. of Chicago Press, Chicago, ed. 3, 2009).

11. F. Cruciani et al., Am. J. Hum. Genet. 88, 814 (2011).

12. A. Olivieri et al., Science 314, 1767 (2006).

13. S. Mishra, N. Chauhan, A. Singhvi, PLoS ONE 8, e69280 (2013).

Is There Social RNA?

MOLECULAR BIOLOGY

Peter Sarkies and Eric A. Miska

The idea that RNA can be transferred between organisms and function in communication and environmental sensing is discussed.

Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, UK. E-mail: [email protected]

Our understanding of the forms, functions, and movement of RNA continues to expand. Not only can RNA control gene expression by multiple mechanisms within a cell, it appears to travel outside the cell within an organism as well. This raises the interesting question of whether the RNA world extends beyond the boundaries of the organism. Can RNA traffic integrate an organism into its environment? Is there “social RNA”? Examining the mechanism of RNA interference (RNAi) may be a good route for seeking the answer.

In many eukaryotic cells, exposure to double-stranded RNA (dsRNA) can initiate an RNAi response that generates small interfering RNA (siRNA). These are potent silencing molecules that use base-pairing to recognize genes with sequence similarity to the original double-stranded trigger (1, 2). Moreover, many organisms, including mustard cress and roundworms, possess mechanisms to move siRNAs between tissues (3, 4). So far, research into the functions of RNAi has focused on its role within an organism, in antiviral defense or in silencing repetitive DNA sequences in the genome, for example. In the model nematode Caenorhabditis elegans, however, molecular mechanisms facilitate trafficking of functional RNA to and from cells. This extends the RNAi response outside of the cell and possibly even outside of the organism. The functional importance of either for C. elegans in the wild is still unknown. However, such roles could be investigated by analyzing, in their natural habitat, nematodes for which ecological characterization is more advanced than for the laboratory workhorse C. elegans.

One of the most remarkable features of RNAi in C. elegans is that feeding these animals dsRNA can silence endogenous genes. This response differs from nonspecific inflammatory responses to dsRNA in mammalian cells because only genes with matching sequence to the ingested dsRNA will be silenced (5, 6). A specific pathway that takes up long dsRNA (~200 to 500 base pairs) from the gut lumen involves the channel protein SID-2 (7, 8). Inside the cell, dsRNA is cleaved by the endoribonuclease Dicer (DCR-1) to generate ~23-nucleotide RNAs, known as siRNAs. siRNAs are bound by Argonaute proteins, and the resulting complex targets messenger RNA for degradation. In addition, RNA-dependent RNA polymerase enzymes amplify the trigger, thereby bolstering its silencing effect on target genes (9). Another dsRNA-selective channel, SID-1, subsequently allows the silencing RNA signal to spread throughout the animal (10). The signal can even reach the germ line, thus instigating a transgenerational response (11–13).

Exploiting the SID pathway enables the function of almost any gene to be examined simply by making a bacterial strain expressing dsRNA that matches the gene of interest, which makes it a great tool. But the broad implications of the response that this pathway elicits have not resonated widely, in part because its function in the normal C. elegans life cycle is mysterious. What possible use could there be for a pathway that takes up RNA from the environment and uses it to silence endogenous genes?

An attractively simple idea is that C. elegans might respond to dsRNA that is naturally produced by the bacteria it consumes. This would allow C. elegans to mount an RNAi response against bacterial RNAs that enter the gut. siRNAs produced by cleavage of a bacterial dsRNA trigger could target endogenous genes, redirecting gene expression programs in response to different diets. Although this is a plausible model, there is no clear evidence yet that it occurs. A noncoding RNA produced by certain Escherichia coli strains might cause gene expression changes via RNAi in C. elegans (14). However, mutations in SID pathway genes do not obviously compromise fitness under laboratory conditions, which suggests that “environmental RNAi” is not important for growth on E. coli in general. In the wild, C. elegans probably feeds on bacteria growing on rotting fruit (15) and therefore encounters multiple species of microbes, so deeper sampling of ecologically relevant bacteria might provide insight into the role of SID-encoding genes. Our understanding of the

Online: sciencemag.org. Podcast interview with author Eric Miska (http://scim.ag/ed_6145).

10.1126/science.1242899
