Genetic evidence for archaic admixture in Africa · biology-web.nmsu.edu/~houde/archaic homo admixture...


Genetic evidence for archaic admixture in Africa

Michael F. Hammer a,b,1, August E. Woerner a, Fernando L. Mendez b, Joseph C. Watkins c, and Jeffrey D. Wall d

a Arizona Research Laboratories Division of Biotechnology, b Department of Ecology and Evolutionary Biology, and c Mathematics Department, University of Arizona, Tucson, AZ 85721; and d Institute for Human Genetics, University of California, San Francisco, CA 94143

Edited by Ofer Bar-Yosef, Harvard University, Cambridge, MA, and approved July 27, 2011 (received for review June 13, 2011)

A long-debated question concerns the fate of archaic forms of the genus Homo: did they go extinct without interbreeding with anatomically modern humans, or are their genes present in contemporary populations? This question is typically focused on the genetic contribution of archaic forms outside of Africa. Here we use DNA sequence data gathered from 61 noncoding autosomal regions in a sample of three sub-Saharan African populations (Mandenka, Biaka, and San) to test models of African archaic admixture. We use two complementary approximate-likelihood approaches and a model of human evolution that involves recent population structure, with and without gene flow from an archaic population. Extensive simulation results reject the null model of no admixture and allow us to infer that contemporary African populations contain a small proportion of genetic material (~2%) that introgressed ~35 kya from an archaic population that split from the ancestors of anatomically modern humans ~700 kya. Three candidate regions showing deep haplotype divergence, unusual patterns of linkage disequilibrium, and small basal clade size are identified and the distributions of introgressive haplotypes surveyed in a sample of populations from across sub-Saharan Africa. One candidate locus with an unusual segment of DNA that extends for >31 kb on chromosome 4 seems to have introgressed into modern Africans from a now-extinct taxon that may have lived in central Africa. Taken together, our results suggest that polymorphisms present in extant populations introgressed via relatively recent interbreeding with hominin forms that diverged from the ancestors of modern humans in the Lower-Middle Pleistocene.

H. sapiens | hybridization

It is now well accepted that anatomically modern humans (AMH) originated in Africa and eventually dispersed to all inhabited parts of the world. What is not known is the extent to which the ancestral population that gave rise to AMH was genetically isolated, and whether archaic hominins made a genetic contribution to the modern human gene pool. Answering these questions has important implications for understanding the way in which adaptations associated with modern traits were assembled in the human genome: do the genes of AMH descend exclusively from a single isolated population, or do our genes descend from divergent ancestors that occupied different ecological niches over a wider geographical range across and outside of the African Pleistocene landscape?

The introgression debate is typically framed in terms of interbreeding between AMH and Neandertals in Europe or other archaic forms in Asia. The opportunity for such hybridizations may have existed between 90 and 30 kya, after early modern humans dispersed from Africa and before archaic forms went extinct in Eurasia (1–5). Recent genome-level analyses of ancient DNA suggest that a small amount of gene flow did occur from Neandertals into the ancestors of non-Africans sometime after AMH left Africa (6) and that an archaic “Denisovan” population contributed genetic material to the genomes of present-day Melanesians (7). Given recent fossil evidence, however, the greatest opportunity for introgression was in Africa, where AMH and various archaic forms coexisted for much longer than they did outside of Africa (5, 8–11). Indeed, the fossil record indicates that a variety of transitional forms with a mosaic of archaic and modern features lived over an extensive geographic area from Morocco to South Africa between 200 and 35 kya (12–15).

Although sequencing the Neandertal and Denisovan genomes has provided evidence that gene flow between archaic and modern humans is plausible, it has not aided efforts to determine the extent of introgression in African populations. Here we use a different strategy to address the question of ancient population structure in Africa. Using multilocus DNA sequence polymorphism data from extant Africans, we analyze patterns of divergence and linkage disequilibrium (LD) to detect the signature of archaic admixture (16–18). Application of this approach to publicly available sequence data from the Environmental Genome Project found evidence of ancient population structure in both African and non-African populations (19). However, analyses of diversity in and around coding regions may be complicated by the effects of recent natural selection, which might contribute to unusual patterns of polymorphism. In this study we use a large resequencing dataset that includes 61 noncoding regions on the autosomes to test whether patterns of neutral polymorphism in three contemporary sub-Saharan African populations are better explained by archaic admixture. Although whole-genome polymorphism data are now available from hundreds of samples (20), they do not include individuals from African hunter-gatherer populations, which serve as important reservoirs of human genetic diversity. Our study includes two such populations (Biaka Pygmies and San), along with an agricultural population from West Africa (Mandenka). We use a model of historically isolated subpopulations (17, 21) to predict patterns of nucleotide variation expected as a consequence of no admixture (null hypothesis) vs. low levels of admixture (alternative hypothesis). We apply two complementary coalescent-based approaches, a two-population and a three-population model, to test the null hypothesis, and then estimate three key parameters: the time of admixture (Ta), the ancestral split time (T0), and the admixture proportion (a).

Author contributions: M.F.H., A.E.W., F.L.M., J.C.W., and J.D.W. designed research; A.E.W. and J.D.W. performed research; A.E.W., F.L.M., J.C.W., and J.D.W. analyzed data; and M.F.H., A.E.W., F.L.M., J.C.W., and J.D.W. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1109300108/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1109300108 PNAS | September 13, 2011 | vol. 108 | no. 37 | 15123–15128

ANTHROPOLOGY

Results

Two-Population Model. In this approach we follow a two-step strategy (18). First we estimate demographic parameters of the null model using summary statistics that quantify recent African population structure. Using these model parameters, we then test the hypothesis of no admixture using a different summary statistic that is designed to detect low levels of genetic exchange between modern and archaic humans. The null model of recent African population structure without archaic admixture incorporates divergence, migration, and recent population growth (Fig. 1A). We calculate a composite likelihood of the summarized data on a grid of parameter values (18) (SI Materials and Methods). Parameter estimates, along with simulation-based 95% confidence intervals (CIs), are given in Table S1. Two patterns emerge from these analyses: estimates of the start of growth are very recent, and estimates of the population split time are relatively old. Although the recent growth estimates are consistent with results of previous studies (23, 24), the estimate of a divergence time that predates the origin of modern humans based on fossil data (450 kya, Biaka–Mandenka comparison) was unexpected. There are several possible explanations for this observation. First, it is possible that the true divergence time is old and that AMH evolved within the context of a geographically structured population. Alternatively, it is possible that the true divergence time is younger and that the old estimate arose either by chance or by bias caused by model misspecification (i.e., the true demographic model is different from Fig. 1A). Our simulations suggest that the large divergence estimate might happen by chance roughly 2% of the time, if the true demographic model (i.e., without admixture) were as in Fig. 1A (SI Materials and Methods and Table S2).

We then tested for archaic admixture using the estimated model parameters of the null model and a summary of LD (S*) that was specifically designed to be sensitive to archaic admixture (18, 19). The evidence for archaic admixture is extremely strong in the Biaka and the San (P < 10⁻⁴) but not in the Mandenka (P > 0.05). Quantile–quantile plots for the distribution of P values across loci are shown in Fig. S1.
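The two-step logic of the two-population approach (maximize a composite likelihood over a parameter grid under the null model, then judge an observed statistic against simulations from the fitted null) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' software: the per-locus log-likelihood callables, the grid, and the names `grid_mle` and `empirical_p_value` are hypothetical, and in practice each per-locus likelihood would itself be estimated from coalescent simulations of summary statistics.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Params = Dict[str, float]
LocusLogLik = Callable[[Params], float]  # hypothetical: params -> log-likelihood at one locus


def composite_log_likelihood(loci: List[LocusLogLik], params: Params) -> float:
    """Sum per-locus log-likelihoods, i.e. treat loci as independent (the composite assumption)."""
    return sum(loglik(params) for loglik in loci)


def grid_mle(loci: List[LocusLogLik], grid: Dict[str, List[float]]) -> Tuple[Params, float]:
    """Evaluate every combination of grid values; return the best point and its log-likelihood."""
    names = list(grid)
    best_params: Params = {}
    best_ll = -math.inf
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        ll = composite_log_likelihood(loci, params)
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll


def empirical_p_value(observed: float, null_draws: Iterable[float]) -> float:
    """One-sided P value: share of null simulations at least as large as the observed statistic.

    The +1 in numerator and denominator counts the observed data as one draw,
    so the estimate is never exactly zero.
    """
    draws = list(null_draws)
    exceed = sum(1 for x in draws if x >= observed)
    return (exceed + 1) / (len(draws) + 1)
```

In the study the grid spans the null model's demographic parameters (divergence, migration, growth); the sketch only shows the mechanics of the search and of a simulation-based P value such as the ~2% figure quoted above.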

Three-Population Model. To complement the first approach, we also implemented an approximate-likelihood method to estimate admixture parameters under a three-population isolation and migration model (Fig. 1B). Because this is a new inferential strategy, we explain our approach in some detail. In an isolation and admixture model (17) we expect to find loci with both deep haplotype divergence (reflecting a long period of isolation for those haplotypes that trace to different subpopulations) and elevated levels of LD (reflecting a reduced time for diverged haplotypes to recombine). If levels of admixture are low, then one class of haplotypes is expected to be at low frequency (i.e., a small basal clade). In other words, low levels of recent admixture with an archaic human population are likely to produce data with a small subsample of sequences that are highly diverged over an extended region of the chromosome. With this in mind, we developed our three summary statistics as follows. For each locus, we identify the two most diverged sequences and then define two groups, G1 and G2, by genetic similarity to the two designated sequences. From this we set our three statistics for approximate likelihood:

(i) D1, the fraction of polymorphisms that are shared between G1 and G2. D1 reflects the amount of recombination and thus is sensitive to the time of introgression, Ta.
(ii) D2, the ratio of the number of differences between the two distinguished sequences described above and the number of fixed sequence differences between human and chimpanzee. D2 reflects the relative time-depth of the genealogy and thus is sensitive to the archaic split time, T0.
(iii) D3, the size of the smaller of the groups, G1 and G2. D3 reflects the relative size of the two most basal clades and thus is sensitive to the amount of admixture, a.

Our approximate-likelihood protocol estimates the distribution of the summary statistics D1, D2, and D3 on the basis of the simulation of a large number of ancestral recombination graphs (ARGs). An important part of this protocol is the choice of tolerances or bin sizes δ1, δ2, and δ3 for their respective summary statistics. In general, we chose tolerances to maximize power for a = 1% (SI Materials and Methods).

We find that the data are significantly unlikely under the null model of no admixture (i.e., the likelihood ratio test yields a bootstrapped P value of 0.0493). We note that this result is conservative because it is based on estimates of recombination rate that are biased downward and a tolerance that is less powerful in regions of high recombination (see below). Interestingly, we find evidence for two separate peaks in the maximum-likelihood surface: (i) an older peak with an archaic split time T0 ~ 700 kya, a time of admixture Ta ~ 35 kya, and an admixture proportion a ~ 2%; and (ii) a more recent peak with T0 ~ 375 kya, Ta ~ 15 kya, and a ~ 0.5% (Fig. 2). Although our method has little power to infer the exact admixture proportion, we can place 95% CIs on the times of divergence (125 kya < T0 < 1.5 Mya) and admixture (Ta < 70 kya) (SI Materials and Methods). Note that T0 for the more recent peak is consistent with the Biaka–Mandenka split time estimates from the two-population model.
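The three summary statistics can be computed from a locus's phased haplotypes roughly as follows. This is an illustrative reading of the definitions above, not the authors' code. It assumes haplotypes are equal-length strings over the locus's segregating sites, takes a "shared polymorphism" to be a site that is polymorphic within both G1 and G2, sends ties in group assignment to G1, and uses a hypothetical function name, `d_statistics`.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def d_statistics(haplotypes: List[str], human_chimp_fixed_diffs: int) -> Tuple[float, float, int]:
    """Return (D1, D2, D3) for one locus; requires at least two haplotypes.

    haplotypes: one string per sequence, one character per segregating site.
    human_chimp_fixed_diffs: fixed human-chimpanzee differences at the same locus.
    """
    # Identify the two most diverged sequences (max() keeps the first pair on ties).
    i, j = max(combinations(range(len(haplotypes)), 2),
               key=lambda pair: hamming(haplotypes[pair[0]], haplotypes[pair[1]]))
    anchor1, anchor2 = haplotypes[i], haplotypes[j]

    # Assign every sequence to the group of the anchor it is closer to (ties go to G1).
    g1 = [h for h in haplotypes if hamming(h, anchor1) <= hamming(h, anchor2)]
    g2 = [h for h in haplotypes if hamming(h, anchor1) > hamming(h, anchor2)]

    def polymorphic(group: List[str], site: int) -> bool:
        return len({h[site] for h in group}) > 1

    n_sites = len(haplotypes[0])
    segregating = [s for s in range(n_sites) if polymorphic(haplotypes, s)]
    shared = sum(1 for s in segregating if polymorphic(g1, s) and polymorphic(g2, s))

    d1 = shared / len(segregating) if segregating else 0.0  # recombination signal, tracks Ta
    d2 = hamming(anchor1, anchor2) / human_chimp_fixed_diffs  # relative depth, tracks T0
    d3 = min(len(g1), len(g2))  # size of the smaller basal group, tracks a
    return d1, d2, d3
```

A locus with one deeply diverged sequence and no recombination between the groups gives D1 = 0 and D3 = 1, which is the small-basal-clade signature the text describes.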

Fig. 1. Schematic of the (A) two-population model and the (B) three-population model. Both demographic models test the fit of admixture with an archaic group (dotted lines) who split from the ancestors of modern humans at time T0 and a (%) of alleles introgressed into the modern gene pool at time Ta. The dashed lines represent all possible locations where admixture could occur. Both models begin with a single population of size Na, followed by a population split at time T1, with population growth beginning at times g1 and g2, and a constant symmetric migration rate M. For B, an additional population split at time T2 also occurs. This model also assumes that the ancestors of the San split first from those of the Mandenka and Biaka (22).

Fig. 2. Approximate likelihood profile based on 60 loci for time of introgression and archaic split time. A log-likelihood difference of 3.92 defines the 95% confidence region (using the χ² approximation). Likelihood estimates at each locus have at least 10 ARGs for both ψold and ψrecent.
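The confidence-region rule in the Fig. 2 caption amounts to thresholding a gridded log-likelihood surface. A minimal sketch, assuming the surface is stored as a dict keyed by hypothetical (Ta, T0) grid points in kya; the 3.92 threshold is the value given in the caption, and the function names are illustrative only.

```python
from typing import Dict, Set, Tuple

GridPoint = Tuple[float, float]  # (Ta in kya, T0 in kya), hypothetical layout


def confidence_region(loglik: Dict[GridPoint, float], threshold: float = 3.92) -> Set[GridPoint]:
    """Grid points whose log-likelihood lies within `threshold` of the maximum."""
    best = max(loglik.values())
    return {point for point, ll in loglik.items() if best - ll <= threshold}


def marginal_bounds(region: Set[GridPoint]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Smallest and largest Ta and T0 values inside the region (a bounding box around it)."""
    tas = [ta for ta, _ in region]
    t0s = [t0 for _, t0 in region]
    return (min(tas), max(tas)), (min(t0s), max(t0s))
```

With a multimodal surface like the one in Fig. 2, the retained set can contain both peaks, so the bounding box is a coarse summary rather than a connected region.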


Likelihood Ratios of Individual Loci. The two inferential methods can also assess the evidence for archaic admixture at individual loci (SI Materials and Methods). Both methods identify the same locus on chromosome 4 (4qMB179) as a strong candidate for archaic admixture (P < 5 × 10⁻⁴ for each method). Table 1 describes the three loci exhibiting the lowest P value in the three-population model. Of the six individuals in the minimum clades, four are Biaka (4qMB179, 18qMB60) and two are San (13qMB107). Although both inferential methods identified the 13qMB107 locus as a likely candidate, the result is much more significant for the three-population (P < 0.001) vs. two-population (P = 0.049) model (Table 1). We note that the power of the two-population approach is reduced when evidence of introgression is limited to a short tract of DNA (as in the case of 13qMB107, where it is found only in the first subset; Discussion). For 18qMB60, the two-population method excludes singletons from the S* analysis. If they were included, the P value for 18qMB60 would be below 0.01 (Table 1).

To address the question of whether some loci favor one maximum in the likelihood surface over the other (i.e., ψrecent vs. ψold), we compute the likelihood ratio (SI Materials and Methods) for each locus (Fig. S2). Notably, the four most extreme likelihood ratios include the three loci that individually favor ψold (Table 1).
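The per-locus comparison rests on two steps: estimating the likelihood of the observed (D1, D2, D3) at a parameter point as the fraction of simulated ARGs whose statistics fall within the tolerances δ1, δ2, δ3, and then forming a ratio of such likelihoods (for example ψold vs. ψrecent, or the overall maximum vs. the maximum under H0 as in Table 1). The sketch below assumes the simulations have already been run and summarized; the function names and parameter labels are hypothetical, and the bootstrap used for P values is not shown.

```python
import math
from typing import Dict, Iterable, Sequence


def approx_likelihood(observed: Sequence[float],
                      simulated: Sequence[Sequence[float]],
                      tolerances: Sequence[float]) -> float:
    """Fraction of simulated summary vectors within every per-statistic tolerance of the data."""
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    hits = sum(
        all(abs(s - o) <= d for s, o, d in zip(sim, observed, tolerances))
        for sim in simulated
    )
    return hits / len(simulated)


def likelihood_ratio(likelihoods: Dict[str, float], null_points: Iterable[str]) -> float:
    """Maximum likelihood over all parameter points divided by the maximum over null points."""
    numerator = max(likelihoods.values())
    denominator = max(likelihoods[p] for p in null_points)
    return numerator / denominator if denominator > 0 else math.inf
```

Wider tolerances accept more replicates and give smoother but less discriminating estimates, which is why the choice of δ values is tuned for power.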

Analysis of the 4qMB179 Region. We now turn to a focused analysis of the 4qMB179 region, a region characterized by no evidence of recombination between the major clades and deep haplotype divergence. In the ~20-kb region that was initially surveyed (Fig. 3A), we identified 20 SNPs (and one insertion) that separate three Biaka haplotypes (B1–B3; Table S3) from all of the remaining African sequences. To determine the full length of the unusual pattern of SNPs, we gathered additional DNA sequence data from all individuals in our panel (Fig. 3A) and identified a 31.4-kb region with 37 completely linked sites where the Biaka haplotypes are 0.3% diverged from the other sequences in our sample. Using a simple model of isolation followed by recent mixing, we next developed likelihood-based methods for estimating a split time and admixture time for the locus (SI Materials and Methods). We estimated an initial split time of 1.25 Mya (95% CI, 0.7–2.1 Mya) and an admixture time of 37 kya (95% CI, 1–137 kya) (Fig. S3).
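Locating the extent of a divergent haplotype can be framed as finding the sites whose alleles perfectly separate the divergent haplotypes from all others. The sketch below is one hypothetical way to do this, assuming phased haplotypes stored as strings, a known set of divergent haplotype indices (standing in for B1–B3), and a hypothetical function name; it is not the procedure reported by the authors.

```python
from typing import List, Sequence, Set, Tuple


def completely_linked_sites(positions: Sequence[int],
                            haplotypes: Sequence[str],
                            divergent: Set[int]) -> Tuple[List[int], int]:
    """Return positions of sites that perfectly partition `divergent` from the other haplotypes,
    plus the distance in bp from the first to the last such site."""
    linked = []
    for site, pos in enumerate(positions):
        div_alleles = {haplotypes[k][site] for k in divergent}
        other_alleles = {h[site] for k, h in enumerate(haplotypes) if k not in divergent}
        # Completely linked: one allele throughout the divergent group, a different one elsewhere.
        if len(div_alleles) == 1 and len(other_alleles) == 1 and div_alleles != other_alleles:
            linked.append(pos)
    span = linked[-1] - linked[0] if linked else 0
    return linked, span
```

A single recombinant haplotype breaks the perfect partition at a site, so the first and last linked sites mark where the unbroken divergent block ends in the sample.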

Geographic Surveys. A survey of the insertion that is diagnostic for the divergent haplotype at 4qMB179 (i.e., at position 179,598,847 in Table S3) in 502 individuals from West, East, central, and southern Africa reveals that it reaches its highest average frequency (3.6%) in Pygmy groups from west-central Africa (Fig. 4). The variant is also found at low average frequencies (0.8%) in some non-Pygmy groups from West and East Africa. An A/G mutation that marks the divergent haplotype at 18qMB60 shows a similar distribution (also reaching its highest average frequency in the Pygmy groups), although it is found at slightly lower frequencies than the variant at 4qMB179 (i.e., 1.6% vs. 3.6%, respectively). This variant is also found in some non-Pygmy groups, exhibiting similar average frequencies as the 4qMB179 variant in West Africans (0.8%), East Africans (0.8%), and southern Africans (0.5% vs. 0.0%, respectively) (Fig. 4). Interestingly, the G/A variant marking the divergent haplotype at 13qMB107 exhibits a somewhat different geographic distribution, reaching its highest average frequency in our sample of southern Africans (6.3%, and especially in the San at a frequency of 11.9%) rather than in central African Pygmies (average of 5.2%). However, it is important to note that its presence in our sample of central Africans is entirely limited to the Mbuti, where it has a frequency of 14.8%.
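Regional average frequencies like those above can be computed either by pooling chromosomes across populations or by averaging per-population frequencies. The text does not state which was used, so the sketch below shows both; the function names are hypothetical and the counts are supplied by the caller.

```python
from typing import Dict, Iterable, Tuple

Counts = Dict[str, Tuple[int, int]]  # population -> (variant copies, chromosomes surveyed)


def population_frequencies(counts: Counts) -> Dict[str, float]:
    """Per-population frequency of the variant."""
    return {pop: copies / chroms for pop, (copies, chroms) in counts.items()}


def regional_average(counts: Counts, populations: Iterable[str], pooled: bool = True) -> float:
    """Average frequency over a set of populations.

    pooled=True weights each population by sample size (total copies / total chromosomes);
    pooled=False gives every population equal weight.
    """
    pops = list(populations)
    if pooled:
        copies = sum(counts[p][0] for p in pops)
        chroms = sum(counts[p][1] for p in pops)
        return copies / chroms
    freqs = population_frequencies({p: counts[p] for p in pops})
    return sum(freqs.values()) / len(freqs)
```

The two averages diverge when a variant is concentrated in one small sample, which matters for variants such as the 13qMB107 allele that is confined to a single central African population.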

Table 1. Three loci that favor an alternative model

Locus | Likelihood ratio | P value | T0 (Mya) | Ta (kya) | D1 | D2 | D3 | S* P value
13qMB107 | 44.38 | <0.001 | 1 | 20 | 0.1 | 0.264 | 2 | 0.049
4qMB179 | 39.85 | <0.001 | 1.5 | 20 | 0 | 0.366 | 3 | <0.001
18qMB60 | 12.74 | 0.022 | 0.75 | 20 | 0 | 0.192 | 1 | >0.05†

The likelihood ratio is defined to be $\max_{\psi} L(\psi \mid \text{data}) \,/\, \max_{\psi \in H_0} L(\psi \mid \text{data})$, and the P value is determined with a parametric bootstrap. These values, along with the parameter values in columns 4–8, refer to results from the three-population model, whereas the S* P values in the last column refer to results from the two-population model.
†S* was originally calculated excluding all singletons from the analysis. When we recalculate S* including singletons, we obtain P < 0.01.

Discussion

Our inference methods reject the hypothesis that the ancestral population that gave rise to AMH in Africa was genetically isolated and point to several candidate regions that may have introgressed from an archaic source(s). For example, we identified a ~31.4-kb region within the 4qMB179 locus with highly diverged haplotypes, one of which is found at low frequency in several Pygmy groups in central Africa. We hypothesize that the unusual haplotype descends from an archaic DNA segment that entered the AMH population via admixture. The observed haplotype structure is highly unusual (P < 5 × 10⁻⁵), even when we account for recent population structure or uncertainty in the underlying recombination rate (Table S4). It is noteworthy that the two ends of the archaic haplotype correspond to recombinational hotspots in the 4qMB179 region (Fig. 3B), suggesting that an initially much longer block of archaic DNA was whittled down by frequent recombination in the hotspots.

Both inferential methods also identified the 13qMB107 locus as a likely introgression candidate; however, only ~7 kb of the surveyed region contains SNPs that are in high LD, all of which are found at the 5′ end of the sequenced region in two San individuals. To determine whether the length of the unusual pattern of SNPs extends beyond our sequenced region at 13qMB107, we examined public full genome sequence data (25). We identified a San individual (!Gubi) who carried one copy of the unusual 13qMB107 haplotype and noted a run of heterozygous sites that extended an additional ~7 kb to the 5′ side of our sequenced region. Like the case of 4qMB179, the two ends of the unusual haplotype correspond to recombinational hotspots, and analysis of 13qMB107 yields an estimated divergence time of ~1 Mya and a recent introgression time (~20 kya) (Table 1). The geographic distribution of the introgressive variant at 18qMB60, a third candidate identified in the three-population model (Table 1), is very similar to that of 4qMB179, albeit consistently found at lower frequencies (Fig. 4). On the other hand, the distribution of the introgressive variant at 13qMB107 is distinguished from that of the other two candidate loci by its presence in the San and the southern African Xhosa, as well as in Mbuti from the Democratic Republic of Congo. Interestingly, the Mbuti represent the only population in our survey that carries the introgressive variant at all three candidate loci, despite the fact that no Mbuti were represented in our initial sequencing survey. Given that the Mbuti population is known to be relatively isolated from other Pygmy and neighboring non-Pygmy populations (26), this suggests that central Africa may have been the homeland of a now-extinct archaic form that hybridized with modern humans.

We have relied on an indirect approach to detect ancient admixture in African populations because there are no African ancient DNA sequences to make direct comparisons with our candidate loci. As proof of principle that an indirect approach can be useful, we reexamined the RRM2P4 pseudogene on the X chromosome. Using a similar approximate-likelihood methodology, it was previously posited that a divergent allele at the pseudogene introgressed from an archaic taxon in Asia (27, 28). We compared human and Neandertal RRM2P4 sequences and found that the three derived sites that define the non-African basal lineage are shared with Neandertal (Fig. S4). Thus, we verified that this unusual human sequence, which is characterized by a deep haplotype divergence and a small basal clade, is indeed shared with an archaic form. Further genome-level (i.e., multilocus) analysis will also shed light on the process of archaic admixture, which is likely to be more complicated than we have modeled. For instance, the multimodal likelihood surface in Fig. 2 suggests that gene flow among strongly subdivided populations in Africa may characterize multiple stages of human evolution in Africa.

Our results are consistent with earlier inferences supporting the role of archaic admixture in sub-Saharan Africa based on analyses of coding regions (19) and the Xp21.1 noncoding region (16). Although our estimates of isolation and admixture dates are tentative, the results point to relatively recent genetic exchange with an unknown archaic hominin that diverged from the ancestors of modern humans in the Lower-Middle Pleistocene and remained isolated for several hundred thousand years. Despite a fragmentary African fossil record, there are plenty of candidates for the source(s) of this introgression. Beginning ~700 kya, fossil evidence from many parts of Africa indicates that Homo erectus was giving way to populations with larger brains, a change that was accompanied by several structural adjustments to the skull and postcranial skeleton (14). By ~200 kya, individuals with more modern skeletal morphology begin to appear in the African record (8, 14). Despite these signs of anatomical and behavioral innovation, hominins with a combination of archaic and modern features persist in the fossil record across sub-Saharan Africa and the Middle East until after ~35 kya (12, 14). Although there is currently a major debate about the meaning of this piecemeal or mosaic-like appearance of modern traits for taxonomic classification (12, 29), the evidence presented here and elsewhere suggests that long-separated hominin groups exchanged genes with forms that either were in the process of evolving fully modern features, or were already fully modern in appearance. The emerging geographic pattern of unusual variants discovered here suggests that one such introgression event may have taken place in central Africa (where there is a very poor fossil record). Interestingly, recent studies attest to the existence of Late Stone Age human remains with archaic features in Nigeria (Iwo Eleru) and the Democratic Republic of Congo (Ishango) (30–32). The observation that populations from many parts of the world, including Africa, show evidence of introgression of archaic variants (6, 16, 19) suggests that genetic exchange between morphologically divergent forms may be a common feature of human evolution. If so, hybridization may have played a key role in the de novo origin of some of our uniquely human traits (33).

Fig. 4. Frequency of introgressive variants within three sequenced regions in an expanded sample of ~500 sub-Saharan Africans (SI Materials and Methods). The filled bar represents the frequency of a variant marking the divergent haplotype at 4qMB179 (Left), 18qMB60 (Center), and 13qMB179 (Right) in each of 14 population samples. Each horizontal line on the bar charts represents a frequency of 5%.

Fig. 3. (A) Schematic of the original (filled bars) and extended sequence data (open bars) for the 4qMB179 locus. The unusual Biaka haplotype extends for ~31.4 kb between the vertical dotted lines. (B) Recombinational landscape as inferred from HapMap Phase I data.

15126 | www.pnas.org/cgi/doi/10.1073/pnas.1109300108 Hammer et al.


Materials and Methods

Samples and Regions Sequenced. DNA samples used in this study for resequencing were taken from publicly available cell lines administered by the Centre d’Étude du Polymorphisme Humain Human Genome Diversity Panel (34), whereas samples used for geographic surveys were described in refs. 22 and 27. Individual identifiers for each of these samples are described in Wall et al. (35). Our updated resequence dataset consists of 61 loci in autosomal intergenic regions (36), each ~20 kb of primarily single-copy noncoding (i.e., putatively nonfunctional) DNA in regions of medium or high recombination (ρ ≥ 0.9 cM/Mb) at least 100 kb away from the nearest gene (37). We used a locus trio design, sequencing three fragments of ~2 kb spaced evenly across the region (35). The sample was composed of ~16 individuals from each population (with the exception of the San, which included nine samples), nearly all of whom were males. Although the Hammer et al. (36) dataset includes 30 X-linked loci, we chose not to include them in the current analysis because of the much smaller number of X chromosomes and the need to make assumptions about sex-biased processes. Exact locations are available at http://hammerlab.biosci.arizona.edu/ArchadData/PNAS.archad.locusLocationInfo.xls, and genotyping assays and primers are available at http://hammerlab.biosci.arizona.edu/ArchadData/PNAS.primers.doc.

Inferential Approach. Here we implement two types of models, denoted “two-population” and “three-population” models (Fig. 1). Because the two-population model is simpler, it has the advantage of using a broader array of summary statistics and allows evaluation over a finer grid of parameter space. The more complex three-population model is much more computationally expensive, yet has the advantage of considering all three sampled populations simultaneously. Our approximate-likelihood approach allows us to investigate the entire grid of parameter space. In contrast, full-likelihood methods require computationally intensive techniques (e.g., Markov chain Monte Carlo) that limit analysis of parameter space to regions near local maxima, whereas Bayesian methods suffer from the need to use priors that may be poorly justified in this study.

Coalescent-Based Model Testing. Two-population model. For each pair of sub-Saharan African populations we consider the demographic model described in Fig. 1A and use a previously published composite-likelihood methodology (18, 19) to estimate parameters ψ = (g1, g2, T1, M) for growth, split time, and migration rate (see SI Materials and Methods for details of model and methodology). This method uses information from levels of diversity and the joint frequency spectrum, but not LD, for estimating (composite) likelihoods. For each pair of African populations, we then use the parameters estimated above as a null model and test for the presence of additional ancient population structure (19). If archaic admixture occurs at a locus, then “archaic” SNPs on introgressed sequences would be in strong pairwise LD. Simulations suggest that both the number of such SNPs and the total distance spanned by such SNPs are elevated when archaic admixture occurs (21). To exploit these two observations and to account for the effects of intragenic recombination (18), we calculate, for each locus, a statistic, S*, shown to be sensitive to archaic admixture (18). S* looks for population-specific SNPs that are in strong LD (e.g., squared correlation r² ≈ 1). We determine the significance of S* values from the actual data by running simulations using the previously estimated demographic parameters to obtain a distribution of S* values under the null hypothesis of no (archaic) admixture. Significantly high S* values are interpreted as departures from the null model in the direction of some unknown ancient population structure. The P values across loci are combined (assuming independence) using the method of Fisher (38).

Three-population model. The three-population model has nine parameters, three of which (Ta, T0, and a) are the key parameters for inference (SI Materials and Methods).
Ancestral recombination graphs (ARGs) under a grid of parameter space are created using the software tool ms (39) to approximate distributions under several tolerances or bin sizes. All inference is based on approximate-likelihood computations for the three key summary statistics, D1, D2, and D3, as described in SI Materials and Methods. Because the parameter space of our null model of no admixture is a subspace of the entire parameter space, we can make inference using a likelihood ratio test.
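The tolerance-based approximate likelihood and the likelihood ratio that these analyses rely on can be illustrated with a short Python sketch. It is not the authors' code: it assumes the simulated summary vectors (d1, d2, d3) for each locus are already available (for example, computed from ms output elsewhere), treats loci as independent, and keys parameter points as (Ta, T0, a) tuples with a = 0 marking the no-admixture null. All function names are hypothetical.

```python
import math


def locus_approx_likelihood(observed, simulated, tolerances):
    """Approximate likelihood of one locus under one parameter vector psi.

    observed:   (D1, D2, D3) computed from the real locus
    simulated:  list of (d1, d2, d3) computed from data simulated under psi
    tolerances: (delta1, delta2, delta3); a simulation "matches" when
                |d_j - D_j| < delta_j for every j
    Returns the fraction of simulations that match.
    """
    if not simulated:
        raise ValueError("no simulations supplied")
    hits = sum(
        all(abs(d - D) < tol for d, D, tol in zip(sim, observed, tolerances))
        for sim in simulated
    )
    return hits / len(simulated)


def total_log_likelihood(loci, tolerances):
    """Sum of per-locus log approximate likelihoods (loci treated as independent).

    loci: list of (observed, simulated) pairs, all simulated under the same psi.
    A locus with no matching simulation makes the total -inf.
    """
    total = 0.0
    for observed, simulated in loci:
        lik = locus_approx_likelihood(observed, simulated, tolerances)
        if lik == 0.0:
            return float("-inf")
        total += math.log(lik)
    return total


def log_likelihood_ratio(loglik_by_psi):
    """Best log-likelihood in the null space minus the overall best (<= 0).

    loglik_by_psi: dict mapping (Ta, T0, a) to a total log-likelihood;
    points with admixture proportion a == 0 form the null space.
    """
    null_best = max(ll for (ta, t0, a), ll in loglik_by_psi.items() if a == 0)
    return null_best - max(loglik_by_psi.values())
```

Working with summed logs rather than the raw product of per-locus likelihoods avoids floating-point underflow when many loci are combined.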

Approximate-likelihood surfaces are generated in two stages. First, we simulate 5,000 ARGs over a coarse grid of parameter space. This allows us to reduce the parameter space to the null space and to those values within the 99% CI of the coarse-grid estimates. We then run simulations using 100,000 ARGs for each parameter value, storing approximations of the summary statistic distribution using reduced tolerances. This allows us to perform bootstrap and goodness of fit (GOF) tests for larger tolerances by adding empirical probabilities from the simulations.
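The first-stage pruning of the grid might look like the following sketch. The text does not say how "within the 99% CI" was turned into a cutoff; the sketch assumes a likelihood-ratio cutoff of χ²(3 df, 0.99)/2 ≈ 5.67 log-likelihood units, and the (Ta, T0, a) point layout and function name are hypothetical.

```python
def refine_grid(coarse_loglik, cutoff=5.67):
    """Select the grid points to re-simulate after the coarse pass.

    coarse_loglik: dict mapping (Ta, T0, a) to the total log-likelihood from
                   the coarse (5,000-ARG) pass; a == 0 marks the null space
    cutoff:        log-likelihood drop that defines the retained region. The
                   default, chi2(3 df, 0.99) / 2 = 11.34 / 2, is an assumed
                   way of expressing "within the 99% CI".
    Returns the sorted list of points in the null space or within the cutoff.
    """
    best = max(coarse_loglik.values())
    return sorted(
        psi for psi, ll in coarse_loglik.items()
        if psi[2] == 0 or ll >= best - cutoff
    )
```

The null space is always retained so that the second-stage likelihood ratio compares refined estimates under both hypotheses.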

To test the null model of no admixture, we used tolerances of δ1 = 0.06, δ2 = 0.05, and δ3 = 2 to estimate the approximate likelihood for each locus using local-scale estimates of recombination rate, which yielded an estimate of Ta = 40 kya, T0 = 750 kya, and a = 1% and a log-likelihood ratio of −2.01. To estimate the significance of this value we drew 10,000 points from the maximum-likelihood location under H0 using our 3D histogram and tabulated the probability of observing a log-likelihood ratio as small as (or smaller than) −2.01 with an archaic split time no more recent than 750 kya.
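A minimal sketch of the bootstrap tabulation, assuming each bootstrap replicate has already been reduced to its log-likelihood ratio and to the archaic split time at its alternative-model maximum (drawing those replicates from the 3D histogram is not shown). The function name and input layout are hypothetical.

```python
def bootstrap_p_value(replicates, observed_llr, min_split_kya):
    """Parametric-bootstrap P value for the no-admixture test.

    replicates:    (llr, t0_kya) pairs, one per data set drawn under H0, where
                   llr is the H0 maximum minus the H1 maximum (<= 0) and t0_kya
                   is the archaic split time at that replicate's H1 maximum
    observed_llr:  log-likelihood ratio of the real data (here -2.01)
    min_split_kya: only replicates whose fitted split is at least this old
                   count as extreme (here 750 kya)
    """
    if not replicates:
        raise ValueError("no bootstrap replicates supplied")
    extreme = sum(1 for llr, t0 in replicates
                  if llr <= observed_llr and t0 >= min_split_kya)
    return extreme / len(replicates)
```

Conditioning on the fitted split time mirrors the text's requirement that a replicate be at least as extreme in both the likelihood ratio and the inferred age of the archaic lineage.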

To better characterize the alternative model we used a two-tiered approach. First we examined a more refined grid of parameter space, and then we ran two levels of simulations. In the initial pass we generated 5,000 ARGs per parameter value, and in the second pass we took all of the values within the 99% CI and computed 3D histograms of summary statistics for each parameter value in the manner described above. We then used a parametric bootstrap to address GOF as described in SI Materials and Methods.

Estimating recombination rates. The three-population methodology is highly sensitive to recombination rates. For this reason, we chose to estimate local-scale recombination and favored these estimates in our inference over the much larger-scale estimates of Kong et al. (37). To this end, we used Phase 2.1 (40, 41) with two qualitatively different strategies. The first uses HapMap Yoruba data (42), and the second estimates ρ using the major clade of each locus in our own resequencing data (SI Materials and Methods). We chose the major clade because recently introgressed archaic lineages, if they exist, serve as a barrier to recombination and thus will bias estimates of ρ downward.

Locus-specific analyses. To infer the split and admixture times for individual introgressive candidates, we calculate the probability of observing the number of completely correlated sites for the relevant population data (e.g., the Biaka for 4qMB179), assuming panmixia, as a function of the underlying scaled recombination parameter ρ (Table S4). To estimate admixture time (Ta), we first estimate the minimum length of the diverged haplotype. Using the genetic map of Kong et al. (37), we then estimated the total recombination rate for the diverged haplotype. Given an admixture event g generations ago, the distribution of lengths of inherited chromosomal segments roughly follows an exponential distribution with mean genetic distance 1/g.
It follows that, writing r for the total recombination rate (genetic length, in Morgans) estimated for the diverged haplotype, the maximum-likelihood estimate of the time of admixture is Ta = 1/r generations ago, with 95% CI (0.0253 Ta–3.69 Ta) generations. We assume a mean generation time of 25 y. Note that we have not accounted for the additional uncertainty in estimating ρ.
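The exponential model above yields both the point estimate and the interval multipliers in closed form. The following Python sketch reproduces the 0.0253 and 3.69 factors from the chi-square pivot for a single exponential observation. The function name and the choice to take the segment length in Morgans are assumptions.

```python
import math


def admixture_time_from_segment(length_morgans, generation_years=25.0):
    """Admixture time from the genetic length of one introgressed segment.

    Model (as in the text): a segment inherited from an admixture event g
    generations ago has length L ~ Exponential with mean 1/g Morgans.
    * Maximum-likelihood estimate from one segment: g_hat = 1 / L.
    * 2 * g * L is chi-square with 2 df, whose p-quantile is -2 ln(1 - p),
      so the 95% interval for g is [-ln(0.975) / L, -ln(0.025) / L],
      i.e. about [0.0253, 3.69] times g_hat.
    Returns (estimate, lower, upper) in years; ignores uncertainty in L.
    """
    if length_morgans <= 0:
        raise ValueError("segment length must be positive")
    g_hat = 1.0 / length_morgans
    lower = -math.log(0.975) / length_morgans
    upper = -math.log(0.025) / length_morgans
    return (g_hat * generation_years,
            lower * generation_years,
            upper * generation_years)
```

For a hypothetical 0.1-cM (0.001-Morgan) segment with 25-y generations, the estimate is 1,000 generations, or 25,000 years. The single-observation interval is very wide and asymmetric, which is why the multipliers span more than two orders of magnitude.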

We use polymorphism data along with an outgroup (orangutan) sequence to estimate the split time T0. We assume that exactly one archaic sequence was introduced into the modern gene pool, leaving the observed number of descendant sequence(s) in the divergent haplotypes. Our general approach was to tabulate polymorphic sites and fixed differences, noting whether the SNPs were polymorphic (or fixed) in the archaic or the modern sequences. We then run coalescent simulations (39) to estimate the likelihood of observing the actual numbers of fixed differences and polymorphisms of different categories, as a function of T0, Ne, μ (the mutation rate), ρ, and Ta. By simulating over a grid of values with increments of 0.25 Myr for T0, 1,000 for Ne, 1 × 10⁻⁹/bp for μ, 2.5 × 10⁻⁹ for ρ, and 0.04 Ne generations for Ta, we estimate a profile likelihood curve for T0. A total of 10⁴ replicates for each parameter combination were sufficient to accurately estimate the likelihood.
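Profiling T0 amounts to maximizing the simulated likelihood over the nuisance parameters at each T0 value. The sketch below assumes the per-combination likelihood estimates have already been computed, for instance as the fraction of the 10⁴ replicates that reproduce the observed counts, and are keyed by parameter tuples whose first element is T0. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def profile_likelihood(grid_likelihoods, focal_index=0):
    """Profile likelihood of one parameter over a simulation grid.

    grid_likelihoods: dict mapping a parameter tuple, e.g. (T0, Ne, mu, rho, Ta),
                      to its simulated likelihood estimate
    focal_index:      position of the profiled parameter (0 = T0 above)
    For each value of the focal parameter, keep the largest likelihood found
    over all combinations of the remaining (nuisance) parameters.
    """
    profile = defaultdict(float)  # likelihoods are >= 0, so 0.0 is a safe start
    for params, lik in grid_likelihoods.items():
        key = params[focal_index]
        profile[key] = max(profile[key], lik)
    return dict(sorted(profile.items()))
```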

ACKNOWLEDGMENTS. We thank J. Cahill and collaborators, including G. Destro-Bisol, T. Jenkins, H. Soodyall, and L. Louie, who donated DNA samples. This research was funded by National Science Foundation HOMINID Grant BCS-0423670 (to M.F.H. and J.D.W.).

1. Coppa A, Grün R, Stringer C, Eggins S, Vargiu R (2005) Newly recognized Pleistocene human teeth from Tabun Cave, Israel. J Hum Evol 49:301–315.

2. Grün R, et al. (2006) ESR and U-series analyses of enamel and dentine fragments of the Banyoles mandible. J Hum Evol 50:347–358.

3. Klein RG (2009) Darwin and the recent African origin of modern humans. Proc Natl Acad Sci USA 106:16007–16009.

4. Morwood MJ, et al. (2004) Archaeology and age of a new hominin from Flores in eastern Indonesia. Nature 431:1087–1091.

5. Stringer C (2007) The origin and dispersal of Homo sapiens: Our current state of knowledge. Rethinking the Human Revolution, eds Mellars P, Boyle K, Bar-Yosef O, Stringer C (McDonald Institute for Archaeological Research, Cambridge, UK).

6. Green RE, et al. (2010) A draft sequence of the Neandertal genome. Science 328:710–722.

7. Reich D, et al. (2010) Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053–1060.

8. McDougall I, Brown FH, Fleagle JG (2005) Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433:733–736.

Hammer et al. PNAS | September 13, 2011 | vol. 108 | no. 37 | 15127



9. Stringer C (2011) The chronological and evolutionary position of the Broken Hill cranium. Am J Phys Anthropol 144(Suppl 52):287.

10. Tattersall I (2003) Once we were not alone. Sci Am (Summer):20–27.

11. Wood B (2002) Hominid revelations from Chad. Nature 418:133–135.

12. Bräuer G (2008) The origin of modern anatomy: By speciation or intraspecific evolution? Evol Anthropol 17:22–37.

13. Klein RG (2000) The Earlier Stone Age of southern Africa. S Afr Archaeol Bull 55:107–122.

14. Rightmire GP (2009) Out of Africa: Modern human origins special feature: Middle and later Pleistocene hominins in Africa and Southwest Asia. Proc Natl Acad Sci USA 106:16046–16050.

15. Trinkaus E (2005) Early modern humans. Annu Rev Anthropol 34:207–230.

16. Garrigan D, Mobasher Z, Kingan SB, Wilder JA, Hammer MF (2005) Deep haplotype divergence and long-range linkage disequilibrium at xp21.1 provide evidence that humans descend from a structured ancestral population. Genetics 170:1849–1856.

17. Nordborg M (2001) On Detecting Ancient Admixture. Genes, Fossils and Behaviour: An Integrated Approach to Human Evolution, NATO Science Series: Life Sciences, eds Donnelly P, Foley RA (IOS Press, Amsterdam), Vol 310.

18. Plagnol V, Wall JD (2006) Possible ancestral structure in human populations. PLoS Genet 2:e105.

19. Wall JD, Lohmueller KE, Plagnol V (2009) Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol 26:1823–1827.

20. Durbin RM, et al.; 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073.

21. Wall JD (2000) Detecting ancient admixture in humans using sequence polymorphism data. Genetics 154:1271–1279.

22. Veeramah KR, et al. (2011) An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data. Mol Biol Evol, in press.

23. Cox MP, et al. (2009) Autosomal resequence data reveal Late Stone Age signals of population expansion in sub-Saharan African foraging and farming populations. PLoS ONE 4:e6366.

24. Voight BF, et al. (2005) Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci USA 102:18508–18513.

25. Schuster SC, et al. (2010) Complete Khoisan and Bantu genomes from southern Africa. Nature 463:943–947.

26. Patin E, et al. (2009) Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet 5:e1000448.

27. Cox MP, et al. (2008) Testing for archaic hominin admixture on the X chromosome: Model likelihoods for the modern human RRM2P4 region from summaries of genealogical topology under the structured coalescent. Genetics 178:427–437.

28. Garrigan D, Mobasher Z, Severson T, Wilder JA, Hammer MF (2005) Evidence for archaic Asian ancestry on the human X chromosome. Mol Biol Evol 22:189–192.

29. Tattersall I, Schwartz JH (2008) The morphological distinctiveness of Homo sapiens and its recognition in the fossil record: Clarifying the problem. Evol Anthropol 17:49–54.

30. Allsworth-Jones P, Harvati K, Stringer C (2010) The archaeological context of the Iwo Eleru cranium from Nigeria, and preliminary results of new morphometric studies. West African Archaeology, New Developments, New Perspectives, ed Allsworth-Jones P (British Archaeological Reports International Series S2164, Oxford), pp 29–42.

31. Crevecoeur I, Semal P, Cornelissen E, Brooks AS (2010) The Late Stone Age human remains from Ishango (Democratic Republic of Congo): Contribution to the study of the African Late Pleistocene modern human diversity. Am J Phys Anthropol 141(Suppl 50):87.

32. Stringer C, Harvati K, Allsworth-Jones P, Grün R, Adebayo Folorunso C (2010) New research on the Iwo Eleru cranium from Nigeria. Am J Phys Anthropol 141(Suppl 50):225–226.

33. Arnold ML, Sapir Y, Martin NH (2008) Review. Genetic exchange and the origin of adaptations: Prokaryotes to primates. Philos Trans R Soc Lond B Biol Sci 363:2813–2820.

34. Cann HM, et al. (2002) A human genome diversity cell line panel. Science 296:261–262.

35. Wall JD, et al. (2008) A novel DNA sequence database for analyzing human demographic history. Genome Res 18:1354–1361.

36. Hammer MF, et al. (2010) The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nat Genet 42:830–831.

37. Kong A, et al. (2002) A high-resolution recombination map of the human genome. Nat Genet 31:241–247.

38. Mosteller F, Fisher RA (1948) Questions and answers #14. The American Statistician 2:30–31.

39. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

40. Crawford DC, et al. (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 36:700–706.

41. Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213–2233.

42. Frazer KA, et al.; International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861.



Supporting Information
Hammer et al. 10.1073/pnas.1109300108

SI Materials and Methods

Two-Population Model. Estimating demographic parameters. For each pair of sub-Saharan African populations we consider the following demographic model: an ancestral panmictic population with effective population size Ne = 10⁴ splits at time T1 into two descendant panmictic populations, each also having Ne = 10⁴. Given a per-generation migration rate m, these descendant populations exchange migrants at the scaled migration rate M = 4Ne·m until the present day. The two populations undergo 100-fold population growth starting at times g1 and g2, respectively (Fig. 1A, main text).

We use previously published composite-likelihood methodology (1, 2) to estimate parameters ψ = (g1, g2, T1, M). This method uses information from levels of diversity and the joint frequency spectrum, but not linkage disequilibrium (LD), for estimating (composite) likelihoods. Likelihoods are calculated over a grid of parameter values, with increments of 2,000 y for g1 and g2, 5,000 y for T1, and 1 for M. Scaled recombination rates were assumed to be fixed within loci but to vary across loci; within-locus recombination rates were chosen from a Γ distribution with mean equal to half of the average mutation rate as estimated by θW (3). We ran 5 × 10⁵ simulations for each parameter combination.

First, we simulated 15 replicates under each of the following three scenarios: ψ1 = (0, 4, 450, 10), ψ2 = (0, 4, 35, 5), and ψ3 = (0, 4, 25, 0). The first scenario corresponds to the Mandenka-Biaka maximum-likelihood estimates from the data, whereas ψ2 and ψ3 are comparable parameter values (with the same values of g1 and g2) that produce roughly the same average value of FST. A summary of the simulation results is shown in Table S1. We note that the approximate 95% confidence intervals (CIs) (based on asymptotic likelihood assumptions) cover the true parameter value roughly 93% of the time (167 of 180), which suggests that CIs based on standard assumptions are reasonably accurate. For analyzing the actual data, we empirically determined a new likelihood-ratio cutoff value for estimating 95% CIs. This cutoff (which retains parameter values whose log-likelihoods lie within 2.8 of the maximum) has the correct coverage level for the simulated data.

Detecting archaic admixture. For each pair of African populations, we used the parameters estimated above as a null model and tested for the presence of additional ancient population structure (2). If archaic admixture occurs at a locus, then “archaic” SNPs on introgressed sequences would be in strong LD with each other. Simulations suggest that both the number of such SNPs and the total distance spanned by such SNPs are elevated when archaic admixture occurs (4). To exploit these two observations, and to account for the effects of intragenic recombination (1), we calculated, for each locus, a statistic, S*, shown to be sensitive to archaic admixture (1). S* looks for population-specific SNPs (excluding singletons) that are in strong LD with each other (e.g., squared correlation r² ≈ 1). We determine the significance of S* values from the actual data by running simulations using the previously estimated demographic parameters to obtain a distribution of S* values under the null hypothesis of no (archaic) admixture. Significantly high S* values are interpreted as departures from the null model in the direction of some unknown ancient population structure. We estimate P values for each locus by running 10⁴ simulations under the null model. The P values across loci were combined (assuming independence) using the method of Fisher (5).
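The per-locus test and its combination across loci can be sketched in plain Python. This is an illustration under stated assumptions, not the published implementation: S* itself is not computed here (the observed and null S* values are taken as given), the empirical P value uses an assumed (count + 1)/(n + 1) convention, and Fisher's method uses the closed-form chi-square tail for even degrees of freedom so that no statistics library is needed. Function names are hypothetical.

```python
import math


def empirical_p_value(observed_stat, null_stats):
    """Upper-tail P value of an observed S* value against null simulations.

    null_stats: S* values from data simulated under the fitted demographic
    model without archaic admixture. The (count + 1) / (n + 1) form, which
    avoids P = 0, is an assumed convention.
    """
    n_high = sum(1 for s in null_stats if s >= observed_stat)
    return (n_high + 1) / (len(null_stats) + 1)


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square(2k) under the joint null.

    For even degrees of freedom the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need one or more P values in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 0.0
    for i in range(len(p_values)):
        total += term               # adds (x/2)^i / i!
        term *= half_x / (i + 1)
    return math.exp(-half_x) * total
```

With a single P value the combined result equals that P value, which is a quick sanity check on the closed form.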

Three-Population Model. To more closely model our population sampling strategy, we introduce a second, more comprehensive three-population model (Fig. 1B, main text). Our goal is to estimate simultaneously the time of admixture (Ta), the ancestral split time (T0), and the admixture proportion (a). Our approach has several modeling assumptions, including that the San are ancestral to the Mandenka and Biaka (6), that the migration rate between all three populations is symmetric and constant, that recent population growth leads to a 100-fold increase in effective population size, and that generation time is 25 y. The model is specified by the parameters ψ = (NA, T1, T2, g1, g2, M, Ta, T0, a), where

• NA is the ancestral effective population size,
• T1 is the time when the San split from the Biaka-Mandenka,
• T2 is the time when the Biaka and Mandenka split,
• g1 is the time since the start of population growth in the San,
• g2 is the time since the start of population growth in the Biaka and Mandenka, and
• M is the scaled migration rate.

Summary Statistics. To identify candidate introgressed sequences, we adopt the following approach. For each locus, we cluster all sequences into two (putatively basal) groups, G1 and G2, as follows:

1. Identify the two most diverged sequences.
2. Assign the remaining sequences to one of two groups according to genetic similarity to the two individuals identified in step 1.
3. For a tie in step 2, calculate the average genetic distance between the target individual and all individuals in each group. Assign membership to the closer group. In case of a tie, assign group membership randomly.
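The three clustering steps above translate fairly directly into code. In this sketch, genetic distance is assumed to be the Hamming distance between aligned sequences, tied individuals are resolved only after all untied assignments are made (the ordering is not specified in the text), and the function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def split_into_groups(seqs, rng=None):
    """Cluster aligned sequences into two groups; returns (G1, G2) as index lists."""
    rng = rng or random.Random(0)  # seeded default keeps random tie-breaks reproducible
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    # Step 1: the two most diverged sequences seed the groups
    # (max() keeps the first such pair in index order).
    s1, s2 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: hamming(seqs[pair[0]], seqs[pair[1]]),
    )
    g1, g2, ties = [s1], [s2], []
    # Step 2: assign every other sequence to the closer seed.
    for k in range(n):
        if k in (s1, s2):
            continue
        d1, d2 = hamming(seqs[k], seqs[s1]), hamming(seqs[k], seqs[s2])
        if d1 < d2:
            g1.append(k)
        elif d2 < d1:
            g2.append(k)
        else:
            ties.append(k)
    # Step 3: break ties by mean distance to each group's current members,
    # then at random if the means are also equal.
    for k in ties:
        m1 = sum(hamming(seqs[k], seqs[i]) for i in g1) / len(g1)
        m2 = sum(hamming(seqs[k], seqs[i]) for i in g2) / len(g2)
        if m1 < m2:
            g1.append(k)
        elif m2 < m1:
            g2.append(k)
        else:
            (g1 if rng.random() < 0.5 else g2).append(k)
    return sorted(g1), sorted(g2)
```

For example, with the aligned toy sequences "AAAAAA", "AAAAAT", "TTTTTT", "TTTTTA", and "AAATTT", the last sequence is equidistant from the two seeds and is placed with the "A"-rich group by the step-3 averaging.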

Then, define the statistics:

• Kmax, the number of differences between the sequences chosen in step 1,
• Ss, the number of polymorphisms shared between the two groups,
• S, the total number of polymorphisms, and
• d, the number of fixed differences between human and chimpanzee sequence.
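Given the two groups, the per-locus counts and the ratios D1 = Ss/S, D2 = Kmax/d, and D3 = min(|G1|, |G2|) defined in this section can be computed as below. The sketch assumes that a polymorphism counts as "shared" when it is polymorphic within both groups, and computes Kmax as the largest pairwise distance (which is the pair that step 1 selects). Function names are hypothetical.

```python
def segregating_sites(seqs, idx):
    """Alignment columns that are polymorphic among the sequences listed in idx."""
    length = len(seqs[idx[0]])
    return {pos for pos in range(length) if len({seqs[i][pos] for i in idx}) > 1}


def summary_statistics(seqs, g1, g2, d):
    """Return (D1, D2, D3) for one locus.

    seqs: aligned sequences; g1, g2: index lists for the two groups;
    d: number of fixed human-chimpanzee differences at the locus.
    """
    everyone = list(g1) + list(g2)
    s_total = segregating_sites(seqs, everyone)                           # S
    s_shared = segregating_sites(seqs, g1) & segregating_sites(seqs, g2)  # Ss (assumed definition)
    k_max = max(                                                          # Kmax
        sum(a != b for a, b in zip(seqs[i], seqs[j]))
        for i in everyone for j in everyone if i < j
    )
    d1 = len(s_shared) / len(s_total) if s_total else 0.0
    d2 = k_max / d
    d3 = min(len(g1), len(g2))
    return d1, d2, d3
```

Intuitively, a deeply divergent minority clade (as expected after archaic introgression) pushes D1 down, D2 up, and D3 toward small values.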

We now define our summary statistics for inference: D1 = Ss/S, D2 = Kmax/d, and D3 = min{|G1|, |G2|}.

Because our null model of no admixture, H0, is a subspace of our alternative model of admixture, H1, we can make inference using likelihood ratio tests. Further, we can use χ²₃, the χ² distribution with three degrees of freedom, as the reference distribution for twice the difference in log-likelihood values under H0 and H1. This is a conservative approximation, however, because the null space represents a corner of our alternative space. Unless otherwise stated, P values are those that come from this approximation.

We approximate the likelihood of the summary statistics D = (D1, D2, D3) using tolerance levels δ = (δ1, δ2, δ3). Thus, for each set of model parameters ψ we estimate

$$\Pr_{\psi}\{\,|d_1 - D_1| < \delta_1,\ |d_2 - D_2| < \delta_2,\ |d_3 - D_3| < \delta_3\,\},$$

where d1, d2, and d3 are calculated from data simulated under the parameter values ψ. The initial tolerances were selected to maximize power for 1% admixture. Loci are assumed to be independent, so likelihoods for the full data are computed as the product of the 61 locus-specific likelihoods.

Hammer et al. www.pnas.org/cgi/content/short/1109300108 1 of 8

To simulate the data to compute approximate likelihoods, we first need to determine fine-scale estimates of recombination. To this end, we used Phase 2.1 (7, 8) with two qualitatively different strategies. In the first, we examined genotypes for our loci in 30 HapMap (9) Yoruba parent–offspring trios. For each trio we replaced genotype calls at SNPs showing Mendelian inconsistencies with “missing data.” Using parental genotypes, we constructed Phase files for each locus, adding an additional 10 kb of flanking data to mitigate any possible edge effects in our estimate of ρ. Using all individuals to estimate ρ, we used a 10,000-step burn-in and sampled 100,000 points in the posterior. We then estimated ρ for each locus according to the median per-SNP ρ estimates from Phase. To validate this approach we performed 100 coalescent simulations using ms (10) of a single Wright-Fisher population using a recombination rate of 1 cM/Mb over 40 kb of sequence, and then we ran Phase on these simulations. We compared mean and median summaries of both the per-SNP and the per-locus estimates and found that the median per-SNP estimate better recovered, although slightly underestimated, the simulated value.

In the second strategy we used Phase to estimate ρ using the major clade of each locus in our own resequencing data, with the same Phase parameters as above. This approach is, in general, less powerful because of the locus trio design. With only ~6 kb of data collected over a ~20-kb genomic window, the ability to infer recombination rates will be hampered by the small number of segregating sites. In addition, we are restricted to the assumption that ρ at the locus trio is constant across a 20-kb region. We used the same validation approach as in the first strategy, modified to run Phase on the major clade of each simulation using an archetype locus-trio design, and again found that the median per-SNP estimate better recovered the simulated value. Despite numerous attempts, Phase failed to complete on locus 1pMB4, and thus this locus was dropped from all subsequent analyses.

Rejecting the Null Hypothesis. We simulated ancestral recombination graphs (ARGs) over a grid of parameter values to estimate each locus’s approximate likelihood using the recombination rate estimates described above and tolerances δ1 = 0.06, δ2 = 0.05, and δ3 = 2. Parameter values ranged from 6,000 to 16,000 for NA, 60 to 120 kya for T1, 30 to 60 kya for T2, 20 to 40 kya for g1 and g2, 0 to 10 for M, 10 to 100 kya for Ta, 0.125 to 1.5 Mya for T0, and 0 to 8% for a.

To provide a coarse-grained likelihood surface, we generated 5,000 ARGs over a reduced grid of the parameter space. We used a goodness of fit (GOF) test to identify loci for finer-scale estimation. This yielded three loci (4qMB105, 16pMB17, and 13qMB64) with poor GOF (P < 0.05) across our entire parameter space, with P values of 0.022, 0.015, and 0.026, respectively. These three loci were then rerun using the major-clade fine-scale estimate of recombination, and all three exhibited improved GOF, with P values of 0.400, 0.145, and 0.526, respectively. Further, in the initial run, two additional loci, 13qMB107 and 18qMB73, had fine-scale estimates of recombination that were exceedingly high (ρ per locus of 147.97 and 118.73, respectively), leading to prohibitively long coalescent run times. For all subsequent simulations the major-clade estimate was used for these loci (ρ per locus of 95.46 and 107.28, respectively).

To obtain a more refined point estimate, we reduced the parameter space to the null space and to those values within the 99% CI of the coarse-grain estimates. We then ran simulations using 100,000 ARGs for each parameter value. In addition, we stored, for each parameter value, an approximation of the summary statistic distribution in a 3D histogram (for our three summaries) using reduced tolerances δ1 = 0.01, δ2 = 0.01, and δ3 = 0.

The result is a maximum-likelihood estimate of Ta = 40 kya, T0 = 750 kya, and a = 1% with a log-likelihood ratio of −2.01. To estimate the significance of this value we drew 10,000 points from the maximum-likelihood location under H0 using our 3D histogram and tabulated the probability of observing a log-likelihood ratio as small as (or smaller than) −2.01 with an archaic split time no more recent than 750 kya. The bootstrapped P value for this is 0.0493, allowing us to reject the null hypothesis. Although this P value is only marginally significant, as seen in the sections that follow, more refined analyses yield even smaller P values under the conservative χ² approximation of the likelihood ratio test.
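The conservative χ² approximation can be evaluated without a statistics library, because the chi-square upper tail with three degrees of freedom has a closed form. A sketch with a hypothetical function name, taking the log-likelihood ratio as reported here, that is, the null maximum minus the alternative maximum:

```python
import math


def chi2_3df_p_value(log_likelihood_ratio):
    """P value from the chi-square(3) approximation to the likelihood ratio test.

    log_likelihood_ratio: max loglik under H0 minus max loglik under H1 (<= 0).
    The statistic is x = -2 * LLR; for 3 degrees of freedom the upper tail is
        P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    x = -2.0 * log_likelihood_ratio
    if x <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
```

With this convention, a log-likelihood ratio of −4.14 gives a P value of about 0.04 and −5.02 gives one just under 0.02, consistent with the values reported for those ratios in the analyses that follow.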

Describing H1. We chose two different approaches to describing our alternative model. The first, and simplest, uses the minimum tolerance for each of the summary statistics, D1, D2, and D3, in turn, keeping the other two at their original tolerances. This gave us three sets of three likelihood profiles (one for each of the three admixture parameters). Minimizing the tolerance δ1 best restricted the parameter space. Under this method, the point estimates are T0 = 375 kya, Ta = 20 kya, and a = 2%, with a log-likelihood ratio of −4.14 (P = 0.04). Moreover, this method allowed us to estimate the following 95% CIs for T0, Ta, and a: 125 kya < T0 < 1.5 Mya, 0 < Ta < 70 kya, and 0 < a < 1.

There was one exception to this analysis. The log-likelihood difference between the parameter value for Ta = 100 kya (T0 = 1 Mya, a = 0.5%) and the maximum is −1.915, marginally inside of our CIs based on the χ² approximation. To assess the accuracy of this approximation, we drew 10,000 samples from this point in the alternative space and estimated the probability of observing a maximum log-likelihood ratio at or more extreme than −1.915 at an introgression time of at most 20 kya. The bootstrapped probability of this occurring by chance is 0.021, allowing us to place this single point in the alternative model outside of our 95% CI. As seen in Fig. 2 (main text) and Fig. S5, the alternative space is best described as multimodal.

Custom tolerances. From our bootstrap analysis, we found that locus-specific critical values are largely determined by the basal recombination rate. More precisely, loci with higher recombination rates required much smaller likelihood ratios to reject H0. To determine optimal tolerance values to discriminate between values in the parameter space, given a fixed number of ARGs, we chose at random 100 parameter values for each locus. For each pair of values, we evaluated tolerance levels from the minimal tolerance up to our original level of acceptable tolerance. We then asked: what level of tolerance maximizes our discriminatory power, given that 1% of the points will yield an observed likelihood of 0?

Applying these custom tolerances to our loci yielded pronounced evidence of two distinct maxima, T0 = 375 kya, Ta = 10 kya, and a = 0.5%, and T0 = 750 kya, Ta = 40 kya, and a = 2%, with essentially equal log-likelihood values of −468.48 and −468.67, respectively, the former yielding a log-likelihood ratio of −5.02 (P < 0.02). Four loci (1pMB101, 12qMB46, 5pMB35, and 5qMB123) had fewer than 10 ARGs that matched their empirical values in either of the maxima, and for these loci an additional 100,000 ARGs were generated, raising the minimum number of matching simulations to 10 for all loci. This slightly adjusted the likelihood surface, favoring instead the older archaic split time as the maximum-likelihood estimate (log-likelihoods of −468.78 and −468.51), giving a log-likelihood ratio of −5.00, P < 0.02 (Fig. 2, main text). The same strategy was used to raise the minimum number of matching ARGs to 20, this time favoring the local maxima T0 = 500 kya, Ta = 20 kya, and a = 2%, and T0 = 750 kya, Ta = 40 kya, and a = 2%, with the first of the two points moving perhaps more than expected.
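The top-up step, which adds ARGs for loci whose match counts fall below a floor, can be sketched as a loop. The callable `simulate_batch` is a hypothetical stand-in for running ms and counting matching simulations, and repeating batches of 100,000 until the floor is met is an assumed generalization of the single extra batch described above.

```python
def top_up_matches(simulate_batch, matches, n_sims, min_matches=10,
                   batch_size=100_000, max_batches=20):
    """Add ARG batches for one locus until at least min_matches simulations match.

    simulate_batch:  hypothetical callable; given a batch size, it simulates that
                     many ARGs and returns how many match the locus's data
    matches, n_sims: current match count and number of simulations so far
    Returns the updated (matches, n_sims); the likelihood estimate for the
    parameter point is then matches / n_sims. max_batches bounds the run time.
    """
    for _ in range(max_batches):
        if matches >= min_matches:
            break
        matches += simulate_batch(batch_size)
        n_sims += batch_size
    return matches, n_sims
```

Because the tolerances stay fixed, adding simulations reduces the sampling variance of very small likelihoods without changing what counts as a match.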



At least 20 matches. To test whether this movement was due to the sampling variance associated with estimating exceedingly small likelihoods, we designed an iterative variant of the above procedure to ensure that all loci have at least 20 matching ARGs for each point within our 95% CI based on the χ2 approximation. To accommodate this, we started from our initial set of custom tolerances and, rather than keeping the tolerances static and adding more simulations as needed, relaxed the tolerances for all loci having fewer than 20 matches and examined the minimum number of matching ARGs over the 95% confidence region. The 2D likelihood surface shows two distinct maxima: (T0 = 625 kya, Ta = 30 kya, a = 3%) and (T0 = 250 kya, Ta = 10 kya, a = 5%), the latter of which has estimated times similar to those found in our two-population approach.
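One way to picture the relaxation step is a loop that widens a locus's tolerance until enough simulated ARGs match. This sketch assumes a multiplicative relaxation factor and a caller-supplied match-counting function; neither the factor nor the function names come from the text, and the authors' actual update rule is not specified here.

```python
from typing import Callable, Sequence, Tuple

Tolerance = Tuple[float, ...]


def relax_tolerance(
    count_matches: Callable[[Tolerance], int],
    tol: Sequence[float],
    min_matches: int = 20,
    factor: float = 1.25,  # hypothetical relaxation step
    max_rounds: int = 100,
) -> Tuple[Tolerance, int]:
    """Widen every tolerance by `factor` until at least `min_matches` simulations match.

    A locus that already has enough matches is returned unchanged, mirroring the idea of
    relaxing tolerances only for loci with fewer than 20 matches.
    """
    current: Tolerance = tuple(tol)
    n = count_matches(current)
    rounds = 0
    while n < min_matches:
        if rounds >= max_rounds:
            raise RuntimeError("tolerance relaxation did not reach min_matches")
        current = tuple(t * factor for t in current)
        n = count_matches(current)
        rounds += 1
    return current, n


# Toy locus: one summary statistic, observed value 0.0, simulated values from -5.0 to 5.0.
simulated = [i * 0.1 for i in range(-50, 51)]


def count_toy(tol: Tolerance) -> int:
    """Number of toy simulations within the tolerance of the observed value 0.0."""
    return sum(1 for x in simulated if abs(x - 0.0) <= tol[0])


final_tol, n_matches = relax_tolerance(count_toy, (0.05,))
```

Relaxing the tolerance rather than adding simulations keeps the computational cost fixed, at the price of a coarser match criterion for the affected loci.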

Goodness of Fit. We used a parametric bootstrap to assess GOF. In particular, we drew 1,000 samples from our 3D histogram for each locus for both maxima in H1. We then estimated the likelihood of each of our 1,000 samples and calculated the probability of our empirical likelihood value under this distribution. This generated a probability value for each locus, and these probability values were combined using the method of Fisher (5) to give a single GOF P value for the data set. This procedure was run on the maxima from the at-least-20-matches analysis, yielding GOF P values of 0.059 and 0.071 for the earlier and later archaic split maxima, respectively. These P values are conservative, because any maximum we find is a maximum only with respect to our discretization of the parameter space; a finer discretization will likely yield higher maxima and, thus, an improved fit of the model. Uncertainties in our recombination rate estimates also influence the fit. Notably, results based on the deCODE recombination estimates, which are estimated over much larger physical distances, produced a substantially poorer fit (GOF P < 10⁻⁴).
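The two steps of this goodness-of-fit procedure, a per-locus bootstrap P value followed by Fisher's combination, can be sketched with the standard library. Two choices here are assumptions rather than details from the text: the per-locus P value is taken as the lower tail (bootstrap log-likelihoods at or below the observed one) with the common (k + 1)/(n + 1) correction so that no P value is exactly 0, and the combined P value uses the closed-form χ² survival function for 2k degrees of freedom.

```python
import math
from typing import Sequence


def bootstrap_p(observed_ll: float, bootstrap_ll: Sequence[float]) -> float:
    """Lower-tail bootstrap P value for one locus, with a +1 correction to avoid P = 0."""
    k = sum(1 for b in bootstrap_ll if b <= observed_ll)
    return (k + 1) / (len(bootstrap_ll) + 1)


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null.

    For even degrees of freedom 2k the upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term = 1.0
    series = 1.0
    for j in range(1, k):
        term *= half_x / j  # (x/2)^j / j!, built incrementally
        series += term
    return math.exp(-half_x) * series
```

For a single locus the combined value reduces to that locus's own P value, which is a quick sanity check; with many loci and very small P values, a log-space or library χ² survival function would avoid underflow in `math.exp`.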

Likelihood Ratios of Individual Loci. Although this approach examines the likelihood of the set of 61 loci together, it can also be used to evaluate whether a particular locus better fits the alternative model. To identify individual loci that are likely to harbor archaic lineages, we allow all nine parameters to vary freely among loci. In addition, rather than selecting points from the 3D histogram, which is defined only for our initial estimate of the 99% CI for all loci together, we selected our maxima and calculated bootstrapped P values from our original coarse scan of the parameter space. Table 1 (main text) describes the three loci exhibiting the lowest P values.

Describing Two Maxima. Throughout our attempts to describe the alternative space, we have seen pronounced evidence for two peaks in the likelihood surface: one with more recent times (ψrecent), with T0 ≈ 375 kya and Ta ≈ 15 kya, and the other with older times (ψold), with T0 ≈ 700 kya and Ta ≈ 35 kya. This raises the question: do some loci favor one maximum over the other, and if so, which ones? To address this, we computed the likelihood ratio

$$\frac{L(\psi_{\mathrm{old}} \mid \mathrm{data})}{L(\psi_{\mathrm{recent}} \mid \mathrm{data})}$$

for each locus (Fig. S2), using the approach guaranteeing at least 10 matching simulations for each locus. Notably, the three loci that individually favor H1 (Table 1, main text) are among the four loci with the most extreme likelihood ratios.
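If each locus's likelihood is estimated as a fraction of matching simulations, the ratio above reduces to a ratio of match fractions at the two maxima. The sketch below, with hypothetical locus names and invented counts, computes log10 ratios and picks the loci with the most extreme values; it assumes both points have at least one match, which the at-least-10-matches procedure guarantees.

```python
import math
from typing import Dict, List


def log10_ratio(k_old: int, n_old: int, k_recent: int, n_recent: int) -> float:
    """log10 of L(psi_old | data) / L(psi_recent | data), each estimated as matches / simulations."""
    if k_old <= 0 or k_recent <= 0:
        raise ValueError("both maxima need at least one matching simulation")
    return math.log10((k_old / n_old) / (k_recent / n_recent))


def most_extreme(ratios: Dict[str, float], top: int) -> List[str]:
    """Loci ranked by the magnitude of their log ratio, largest first."""
    return sorted(ratios, key=lambda locus: abs(ratios[locus]), reverse=True)[:top]


# Invented (matches, simulations) counts for three hypothetical loci at each maximum.
counts = {
    "locusA": ((40, 1000), (10, 1000)),
    "locusB": ((12, 1000), (15, 1000)),
    "locusC": ((5, 1000), (50, 1000)),
}
ratios = {name: log10_ratio(*old, *recent) for name, (old, recent) in counts.items()}
print(most_extreme(ratios, 2))  # ['locusC', 'locusA']
```

Positive values favor ψold and negative values favor ψrecent; ranking by magnitude identifies the loci that discriminate most strongly between the two peaks, regardless of direction.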

Genotyping Candidate Alleles. A sample of ≈500 individuals from 14 sub-Saharan African populations was genotyped at a single insertion and two SNPs that marked divergent alleles at the three loci exhibiting the lowest P values in the likelihood test described above. A 4-nt insertion (GCCA) at position 179598847 (hg18) within 4qMB179 was genotyped by allele-specific PCR. We obtained the DNA sequence of all samples containing the insertion to confirm heterozygosity. A G/A polymorphism at position 107495053 (hg18) within 13qMB107 was genotyped by PCR followed by restriction enzyme digestion (ApoI, NEB catalog no. R0566). An A/G polymorphism at position 60718922 (hg18) within 18qMB60 was genotyped by PCR followed by restriction enzyme digestion (DdeI, NEB catalog no. R0175).
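The restriction-digest assays work because one allele of each SNP creates or destroys an enzyme recognition site in the PCR product. The sketch below checks an amplicon for the recognition sequences of ApoI (RAATTY) and DdeI (CTNAG), expanding IUPAC ambiguity codes into a regular expression. The amplicon sequences are invented for illustration; the actual flanking sequence at 107495053 or 60718922, and which allele carries the site, are not given here.

```python
import re

# IUPAC nucleotide codes needed for the two recognition sites.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Recognition sequences (5' to 3'); both are palindromic, so scanning one strand suffices.
SITES = {"ApoI": "RAATTY", "DdeI": "CTNAG"}


def site_pattern(site: str) -> "re.Pattern[str]":
    """Translate an IUPAC recognition sequence into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in site))


def is_cut(enzyme: str, amplicon: str) -> bool:
    """True if the amplicon contains at least one recognition site for the enzyme."""
    return site_pattern(SITES[enzyme]).search(amplicon.upper()) is not None


# Hypothetical amplicons that differ at a single SNP.
print(is_cut("ApoI", "ccgaattcgg"), is_cut("ApoI", "ccgagttcgg"))  # True False
print(is_cut("DdeI", "acttagc"), is_cut("DdeI", "acttggc"))        # True False
```

In a real assay the genotype is read from the fragment pattern on a gel: homozygotes for the site-bearing allele show only cut fragments, homozygotes for the other allele show the uncut product, and heterozygotes show both.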

1. Plagnol V, Wall JD (2006) Possible ancestral structure in human populations. PLoS Genet 2:e105.

2. Wall JD, Lohmueller KE, Plagnol V (2009) Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol 26:1823–1827.

3. Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256–276.

4. Wall JD (2000) Detecting ancient admixture in humans using sequence polymorphism data. Genetics 154:1271–1279.

5. Mosteller F, Fisher RA (1948) Questions and answers #14. The American Statistician 2:30–31.

6. Wall JD, et al. (2008) A novel DNA sequence database for analyzing human demographic history. Genome Res 18:1354–1361.

7. Crawford DC, et al. (2004) Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat Genet 36:700–706.

8. Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213–2233.

9. Frazer KA, et al.; International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861.

10. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.


Fig. S1. Quantile–quantile plots of the P values for S*, calculated for each locus. Results are shown for the (A) Biaka and (B) San, in pairwise analyses with the Mandenka.
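A quantile–quantile plot of per-locus P values compares the sorted observed values with the quantiles expected if all loci were null, in which case P values are uniform on (0, 1). This sketch returns the plotting coordinates on the −log10 scale; the (i − 0.5)/n plotting positions are a common convention and an assumption here, not necessarily the one used for Fig. S1.

```python
import math
from typing import List, Sequence, Tuple


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """(expected, observed) pairs on the -log10 scale for a uniform QQ plot."""
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P values must lie in (0, 1]")
    n = len(pvalues)
    observed = sorted(pvalues)
    # The i-th smallest observed P value is paired with the expected uniform quantile (i + 0.5) / n.
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]


points = qq_points([0.9, 0.1, 0.5, 0.02])
```

Points lying well above the y = x line indicate loci with smaller P values than expected under the null.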

Fig. S2. The likelihood ratios, L(ψold | data)/L(ψrecent | data), for each locus at the two distinct maxima in H1. The loci 18qMB60, 13qMB107, and 4qMB179 all favor ψold, and all three loci individually favor H1.


Fig. S3. Schematic of a simple isolation model (A) and 4qMB179 profile likelihood curves (B) for estimates of the split time T0 and the admixture time Ta (in kya).

Fig. S4. (A) Sharing of SNPs between Neandertal and the divergent human lineage at RRM2P4. (B) Asterisk indicates sharing of the derived state in Neandertal and the divergent human lineage on the phylogenetic tree shown in Garrigan et al. (1).

1. Garrigan D, Mobasher Z, Severson T, Wilder JA, Hammer MF (2005) Evidence for archaic Asian ancestry on the human X chromosome. Mol Biol Evol 22:189–192.


Fig. S5. Likelihood profiles for the archaic admixture parameters: (A) amount of admixture, (B) archaic split time, and (C) time of introgression. The horizontal line represents the 95% CI cutoff based on the χ2 approximation. The Ta = 100 kya point was shown to lie outside our confidence region using a parametric bootstrap.
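The "marginally inside" statement for the Ta = 100 kya point can be checked numerically. Assuming natural-log likelihoods and a profile over one parameter (one degree of freedom), the 95% cutoff for ln L(θ) − ln L(θ̂) is minus half the 95% quantile of the χ² distribution with one degree of freedom (3.84), about −1.92. A one-degree-of-freedom χ² quantile is the square of a two-sided standard normal quantile, so the Python standard library suffices; the function names are illustrative.

```python
from statistics import NormalDist


def chi2_1df_quantile(level: float = 0.95) -> float:
    """Quantile of chi-square with 1 df: the square of the two-sided standard normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * z


def inside_profile_ci(log_lr: float, level: float = 0.95) -> bool:
    """log_lr = ln L(theta) - ln L(theta_hat), which is <= 0; inside if at or above -quantile / 2."""
    return log_lr >= -chi2_1df_quantile(level) / 2.0


print(round(chi2_1df_quantile(), 3))  # 3.841
print(inside_profile_ci(-1.915))      # True: just above the cutoff of about -1.921
```

The margin is only about 0.006 log-likelihood units, which is why a parametric bootstrap was a sensible check on the χ² approximation for this point.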

Table S1. Point estimates (simulation-based 95% CI) for the actual data

Parameter   Man–Bia         Man–San         Bia–San
g1 (kya)    0 (0–5.2)       0 (0–5.5)       10 (0–22)
g2 (kya)    4 (0–11)        2 (0–11)        4 (0–20)
T (kya)     450 (280–690)   100 (64–500)    55 (40–230)
M           10 (8.2–12)     3 (1.6–4.2)     1.5 (0–5.3)

Table S2. Mean values of parameter estimates on simulated data (g1 = 0, g2 = 4 kya)

Model           T1 = 25, M = 0   T1 = 35, M = 5   T1 = 450, M = 10
g1              0.9              2.3              2.4
g2              4.1              5.9              7.7
T1              25               44               580
M               1.1              4.3              9.6
Coverage* (%)   97               88               93

*Coverage denotes the fraction of times that the estimated 95% CIs contained the true parameter value.


Table S3. Biaka 4qMB179 haplotypes

Columns 1–54 correspond, in order, to the following positions:
1–10: 179717925, 179718025, 179718164, 179718283, 179718420, 179718491, 179718492, 179718621, 179718977, 179719030
11–20: 179719130, 179719260, 179727869, 179727870, 179727922, 179728023, 179728221, 179728254, 179728314, 179728525
21–30: 179728711, 179728796, 179728857, 179729161, 179729221, 179729231, 179729281, 179729444, 179729451, 179729541
31–40: 179736287, 179736363, 179736378, 179736411, 179736422, 179736437, 179736564, 179736762, 179736898, 179736963
41: 179737002
42–54: 179737179, 179737276, 179737345, 179737389, 179737426, 179737436, 179737452, 179737532, 179737646, 179737685, 179737719, 179738071, 179738125

Biaka Ht  n   1–10        11–20       21–30       31–40       41    42–54
              ACGTGCACCA  TACAAAGCAC  ACAATGCTAA  GGAGCCCTCT  -     GCGCACAGCGCCT
B1        1   C.T..NNNNN  ..TGG...T.  ..G....CG.  .CG......A  GCCA  TA.T.T..T...C
B2        1   C.T....N..  ..TGG...T.  ..G....C..  .CG......A  GCCA  TA...T..T..AC
B3        1   C.T....N..  ..TGG...T.  ..G....C..  .CG......A  GCCA  TA.T.T..T...C
B4*       9   .T....G..G  .....G...T  ...-.....G  A..A...C..  .     ......G......
B5†       1   .T....GT.G  .....G....  ...-.....G  A..A...C..  .     ......G......
B6        1   .T....G..G  .....G....  ...-.....G  A......C..  .     ......G......
B7        4   .T...TG..G  .....G....  ...-..T...  A.....T.G.  .     ..A.G..A.AT..
B8        3   ......G.-G  .....G....  ...-.AT...  A.....T...  .     ..A.G....AT..
B9†       1   ......G.NG  .....G....  ...-..T...  A......C..  .     ......G......
B10       1   ......G.-G  .....G....  G..-.....G  A...T..C..  .     ......G......
B11*      1   ......G.-G  .....GA...  ...-C.....  A....T.C..  .     ......G......
B12*      3   ......G.-G  .....G.T..  ...-......  A.........  .     .............
B13       2   .T.CA.G..G  .C...G....  .-.......G  A....T.C..  .     ......G.....N
B14       1   .T....G..G  C....G....  .-.......G  A......C..  .     ......G......
Bonobo        ACNTGCACCA  TACAAAGCAC  ACAATGCTAA  GGAGCCCTCT  -     GCGCCCAGCGCCT
Chimp         ACGTGCACCA  TACAAAGCAC  ACAATGCTAA  GGAGCCCTCT  -     GCKCCCAGCGCCT
Gorilla       ACGTGCGCCA  TAYRATGCAC  ACAATGATAA  GGAGCCCTCT  -     GCGCACAGCGCCT
Orang         ACGTGTGCCA  TANNNNGCAT  ACA-TGCTAA  GGAGCCCTCT  -     GCGCACAGTGCCT

*Shared with Mandenka and San.
†Shared with Mandenka.


Table S4. Probability of lb ≥ 37 for the Biaka data as a function of ρ

ρ/kb    Pr(lb ≥ 37)
0.00    0.271
0.25    5.8 × 10⁻³
0.50    3.7 × 10⁻⁴
0.75    3.0 × 10⁻⁵
1.00    8. × 10⁻⁶
1.25    1.4 × 10⁻⁶
1.50    ≪ 10⁻⁶

lb refers to the maximum number of pairwise congruent sites (1).

1. Wall JD, Lohmueller KE, Plagnol V (2009) Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol 26:1823–1827.
