Genome-wide patterns of population structure and admixture among Hispanic/Latino populations
Katarzyna Bryc(a,1), Christopher Velez(b,1), Tatiana Karafet(c), Andres Moreno-Estrada(a,d), Andy Reynolds(a), Adam Auton(a,2), Michael Hammer(c), Carlos D. Bustamante(a,d,3,4), and Harry Ostrer(b,3,4)
(a) Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14850; (b) Human Genetics Program, Department of Pediatrics, New York University School of Medicine, New York, NY 10016; (c) Arizona Research Laboratories Division of Biotechnology and Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721; and (d) Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
Hispanic/Latino populations possess a complex genetic structure that reflects recent admixture among and potentially ancient substructure within Native American, European, and West African source populations. Here, we quantify genome-wide patterns of SNP and haplotype variation among 100 individuals with ancestry from Ecuador, Colombia, Puerto Rico, and the Dominican Republic genotyped on the Illumina 610-Quad arrays and 112 Mexicans genotyped on Affymetrix 500K platform. Intersecting these data with previously collected high-density SNP data from 4,305 individuals, we use principal component analysis and clustering methods FRAPPE and STRUCTURE to investigate genome-wide patterns of African, European, and Native American population structure within and among Hispanic/Latino populations. Comparing autosomal, X and Y chromosome, and mtDNA variation, we find evidence of a significant sex bias in admixture proportions consistent with disproportionate contribution of European male and Native American female ancestry to present-day populations. We also find that patterns of linkage disequilibria in admixed Hispanic/Latino populations are largely affected by the admixture dynamics of the populations, with faster decay of LD in populations of higher African ancestry. Finally, using the locus-specific ancestry inference method LAMP, we reconstruct fine-scale chromosomal patterns of admixture. We document moderate power to differentiate among potential subcontinental source populations within the Native American, European, and African segments of the admixed Hispanic/Latino genomes. Our results suggest future genome-wide association scans in Hispanic/Latino populations may require correction for local genomic ancestry at a subcontinental scale when associating differences in the genome with disease risk, progression, and drug efficacy, as well as for admixture mapping.
human genomics | population genetics | Hispanic/Latino
This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "In the Light of Evolution IV: The Human Condition," held December 10–12, 2009, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and audio files of most presentations are available on the NAS Web site at www.nasonline.org/SACKLER_Human_Condition.
Author contributions: C.D.B. and H.O. designed research; K.B., C.V., T.K., and M.H. performed research; K.B., A.A., M.H., C.D.B., and H.O. contributed new reagents/analytic tools; K.B., T.K., A.M.-E., A.R., A.A., and C.D.B. analyzed data; and K.B., C.V., A.M.-E., C.D.B., and H.O. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: Genotype data from 100 Hispanic/Latinos have been deposited in the Gene Expression Omnibus (GEO) series record GSE21248.
(1) K.B. and C.V. contributed equally to this work.
(2) Present address: Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, UK.
(3) C.D.B. and H.O. contributed equally to this work.
(4) To whom correspondence may be addressed. E-mail: [email protected] or [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0914618107/DCSupplemental.
8954–8961 | PNAS | May 11, 2010 | vol. 107 | suppl. 2 | www.pnas.org/cgi/doi/10.1073/pnas.0914618107
The term “Hispanic/Latinos” refers to the ethnically diverse inhabitants of Latin America and to people of Latin American descent throughout the world. Present-day Hispanic/Latino populations exhibit complex population structure, with significant genetic contributions from Native American and European populations (primarily involving local indigenous populations and migrants from the Iberian peninsula and Southern Europe) as well as West Africans brought to the Americas through the trans-Atlantic slave trade (1, 2). These complex historical events have affected patterns of genetic and genomic variation within and among present-day Hispanic/Latino populations in a heterogeneous fashion, resulting in rich and varied ancestry within and among populations as well as marked differences in the contribution of European, Native American, and African ancestry to autosomal, X chromosome, and uniparentally inherited genomes.
Many key demographic variables differed among colonial Latin American populations, including the population size of the local pre-Columbian Native American population, the extent and rate at which European settlers displaced native populations, whether or not slavery was introduced in a given region, and, if so, the size and timing of introduction of the African slave populations. There were also strong differences in ancestry among social classes in colonial (and postcolonial) populations, with European ancestry often correlating with higher social standing. As a consequence, present-day Hispanic/Latino populations exhibit very large variation in ancestry proportions (as estimated from genetic data) not only across geographic regions (1, 2), but also within countries themselves (3, 4). In addition, the process of admixture was apparently sex-biased and preferentially occurred between European males and Amerindian and/or African females; this process has been shown to be remarkably consistent among countries and populations including Argentina (5), Ecuador (6), Mexico (7), Cuba (8), Brazil (9), Uruguay (10), Colombia (11), and Costa Rica (11).
The rich diversity of variation in ancestry among Hispanic/Latino populations, coupled with consistent differences among populations in the incidence of chronic heritable diseases, suggests that Hispanic/Latino populations may be very well suited for admixture mapping (12, 13). For example, differences in relative European ancestry proportions correlate with higher susceptibility in Puerto Ricans to asthma as compared with Mexicans (14). Data have also shown an increased risk of breast cancer in Latinas with greater European ancestry (15) and an interplay between African ancestry and cardiovascular disease and hypertension in Puerto Ricans from Boston (16). Hispanic/Latinos are also likely to play an increasingly important role in multi- and transethnic genetic studies of complex disease. Genome-wide scans have identified candidate markers for onset of type 2 diabetes in Mexican-Americans from Texas (17) as well as a region on chromosome 5 associated with asthma in Puerto Ricans (18).
Quantifying the relative contributions of ancestry, environment (including socio-economic status), and ancestry by environment interaction to disease outcome in diverse Hispanic/Latino populations will also be critical to applying a genomic perspective to the practice of medicine in the United States and in Latin America. For example, whereas European ancestry was associated with increased asthma susceptibility in Puerto Ricans (14), it was also shown that the effect was moderated by socio-economic status (19). This suggests that quantifying fine-scale patterns of genomic diversity among diverse U.S. and non-U.S. Hispanic/Latino populations may be critical to the efficient and effective design of medical and population genomic studies. A fine-scale population genomics perspective may also provide a powerful means for understanding the roles of ancestry, genetics, and environmental covariates on disease onset and severity (13).
Here, we introduce a larger, high-density SNP and haplotype dataset to investigate historical population genetics questions, such as variation in sex-biased ancestry and genome-wide admixture proportions within and among Latino populations, as well as to provide a genomic resource for the study of population substructure within putative European, African, and Native American source populations. Our dataset includes three Latino populations that are underrepresented in whole-genome analyses, namely, Dominicans, Colombians, and Ecuadorians, as well as Mexicans and Puerto Ricans, the two largest Hispanic/Latino ethnic groups in the United States. This allows comparison of patterns of population structure and ancestry across multiple U.S. Hispanic/Latino populations. Our dense SNP marker panel is formed by the intersection of two of the most commonly used genotyping platforms, allowing for the inclusion of dozens of Native American, African, and European populations for ancestry inference. Our work expands on high-density population-wide genotype data from the International HapMap Project (HapMap) (20, 21), the Human Genome Diversity Panel (HGDP) (22), and the Population Reference Sample (POPRES) (23) that have representation of Mexicans but not other Hispanic/Latino groups either from the Caribbean or from South America, with a resulting gap for analyzing admixture in those populations. This project, therefore, represents an important step toward comprehensive panels for US-based studies that can more accurately reflect the diversity within various Hispanic/Latino populations.
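The marker panel described above is the intersection of two genotyping platforms. A minimal sketch of that kind of intersection follows, assuming each platform's manifest is available as a mapping from SNP identifier to its two alleles (A/C/G/T only). The function names and the strand-handling rules (accepting complement-flipped alleles, dropping A/T and C/G SNPs) are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch: intersect two SNP panels by identifier.
# The strand rules below are a common convention, assumed here for illustration.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands."""
    a, b = alleles
    return COMPLEMENT[a] == b


def intersect_panels(panel_a, panel_b):
    """Return SNP ids present on both panels with compatible alleles.

    panel_a, panel_b: dict mapping SNP id -> (allele1, allele2).
    The result maps SNP id -> "same" or "flip" (flip = opposite strand).
    """
    shared = {}
    for snp_id in panel_a.keys() & panel_b.keys():
        alleles_a = frozenset(panel_a[snp_id])
        alleles_b = frozenset(panel_b[snp_id])
        if is_strand_ambiguous(panel_a[snp_id]):
            continue  # strands cannot be told apart, so drop the SNP
        if alleles_a == alleles_b:
            shared[snp_id] = "same"
        elif alleles_a == frozenset(COMPLEMENT[x] for x in alleles_b):
            shared[snp_id] = "flip"
        # otherwise the allele codes disagree and the SNP is dropped
    return shared
```

Only SNPs that survive such a filter on both platforms can be merged with the reference panels, which is why the merged panel is much smaller than either array.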
Results
Population Structure. We applied the clustering algorithm FRAPPE to investigate genetic structure among Hispanic/Latino individuals using a merged data set with over 5,000 individuals with European, African, and Native American ancestry genotyped across 73,901 SNPs common to the Affymetrix 500K array and the Illumina 610-Quad panel (Materials and Methods). FRAPPE implements a maximum likelihood method to infer the genetic ancestry of each individual, where the individuals are assumed to have originated from K ancestral clusters (24). The plots for K = 3 and K = 7 are shown in Fig. 1 and for all other values of K in Fig. S1. At K = 3, we observed clustering largely by Native American, African, and European ancestry, with the Hispanic/Latino populations showing genetic similarity with all of these populations. However, significant population differences exist, with the Dominicans and Puerto Ricans showing the highest levels of African ancestry (41.8% and 23.6% African, SDs 16% and 12%), whereas Mexicans and Ecuadorians show the lowest levels of African ancestry (5.6% and 7.3% African, SDs 2% and 5%) and the highest Native American ancestries (50.1% and 38.8% Native American, SDs 13% and 10%). We also found extensive variation in European, Native American, and African ancestry among individuals within each population. A clear example could be observed in the Mexican sample, in which ancestry proportions ranged from predominantly Native American to predominantly European (with generally low levels of African ancestry). Similar results were found in Colombians and Ecuadorians, whereas Dominicans and Puerto Ricans showed the greatest variation in African ancestry (Fig. 1). Interestingly, at K = 7, we were able to capture signals of continental substructure such as a Southwest to Northeast gradient in Europe and a Native American component that is absent in the two Amazonian indigenous populations (Karitiana and Surui) but that substantially contributes to all other studied Latino populations. We also note that several of the individuals from the Maya and Quechua Native American samples (and to a lesser extent Nahua and Pima) from the Human Genome Diversity Panel (CEPH-HGDP) show moderate levels of European admixture, consistent with previous studies of these populations (25). Interestingly, this is not the case for the Aymara and Quechua samples genotyped by Mao et al. (26).
We also undertook principal component analysis (PCA) of the autosomal genotype data from Hispanic/Latino and putative ancestral populations using the smartpca program from the software package eigenstrat (Fig. 2A) (27). The first two principal components of the PCA strongly support the notion that the three ancestral populations contributing to the Hispanic/Latino genomic diversity correspond exactly to Native American, European, and African ancestry. The Hispanic/Latino populations showed different profiles of ancestry, as exemplified by the fitting of ellipses to the covariance matrix of each population’s first two PCs (Fig. 2C). Subsequent PCs showed substructure within Africa, Native Americans, and Europeans (Fig. S2). PCA on the X chromosome markers (Fig. 2B) showed a similar pattern, although because there are only 1,500 markers, this PCA had greater variance, which is illustrated in the fitted ellipses as well (Fig. 2D).
We also ran the Bayesian clustering algorithm STRUCTURE in “assignment mode” (28), and used a training set of Europeans, Africans, and Native Americans to estimate ancestral allele frequencies and assess admixture proportions within and among the Hispanic/Latino populations. Using STRUCTURE analysis of the autosomes (Fig. 3, Upper) and the X chromosome (Fig. 3, Lower), we found that, again, Puerto Ricans and Dominicans showed the greatest proportion of African ancestry whereas Colombians, Ecuadorians, and Mexicans showed extensive variation in European and Native American ancestry among individuals. We calculated LD decay curves for all populations with at least 10 individuals, choosing subsets of 10 individuals, and averaging more than 100 random subsets of the data. Patterns of decay of LD were consistent with previously published results (25), with Native American populations showing the highest levels of LD and African populations the lowest (Fig. 4A). Interestingly, the Hispanic/Latino populations demonstrated rates of decay of LD that correlated strongly with the amount of Native American, European, and African ancestry (Fig. 4B). Specifically, the populations with the most Native American ancestry, Mexican and Ecuadorian, exhibited higher levels of linkage disequilibrium among SNP markers, whereas the populations with the highest proportions of African ancestry, the Dominican and Puerto Rican samples, had the lowest levels of LD.
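The LD decay calculation described above (genotype r² as a function of physical distance, computed on random subsets of 10 individuals and averaged across subsets) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline. The function names, the bin width, the distance cap, and the 0/1/2 genotype coding are assumptions made for the example, and the sketch pools pair-level r² values across all subsets before averaging within each distance bin.

```python
import random
from collections import defaultdict


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2).

    Returns None when either SNP is monomorphic in the sample.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov * cov / (var_x * var_y)


def ld_decay(genotypes, positions, bin_width=10_000, max_dist=80_000,
             subset_size=10, n_subsets=100, seed=0):
    """Mean r2 per distance bin, pooled over random subsets of individuals.

    genotypes: list of per-individual lists, one 0/1/2 value per SNP.
    positions: SNP coordinates in base pairs, sorted ascending.
    Returns {bin_start_bp: mean r2}.
    """
    rng = random.Random(seed)
    sums = defaultdict(float)
    counts = defaultdict(int)
    n_snps = len(positions)
    for _ in range(n_subsets):
        subset = rng.sample(genotypes, subset_size)
        columns = [[ind[j] for ind in subset] for j in range(n_snps)]
        for i in range(n_snps):
            for j in range(i + 1, n_snps):
                dist = positions[j] - positions[i]
                if dist > max_dist:
                    break  # positions are sorted, so later SNPs are farther
                r2 = genotype_r2(columns[i], columns[j])
                if r2 is not None:
                    bin_start = (dist // bin_width) * bin_width
                    sums[bin_start] += r2
                    counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Fixing the subset size keeps populations with different sample sizes comparable, because r² estimates are biased upward in small samples.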
Fig. 1. FRAPPE clustering illustrating the admixed ancestry of Hispanic/Latinos shown for K = 3 and K = 7. Individuals are shown as vertical bars colored in proportion to their estimated ancestry within each cluster. Native American populations are listed in order geographically, from North to South.
[Figure 1 not reproduced. Panels: K = 3 and K = 7; vertical axis 0.0 to 1.0; population groups: Hispanic/Latino, European, African, Native American.]
Fig. 2. Principal component analysis results of the Hispanic/Latino individuals with Europeans, Africans, and Native Americans. PC 1 vs. PC 2 scatter plots based on autosomal markers (Upper Left) and based on X chromosome markers (Upper Right). Ellipses are fitted to the PCA results on the autosomes (Lower Left) and to results from the X chromosome markers (Lower Right).
Fig. 3. Genome-wide and locus specific ancestry estimates for Mexicans, Ecuadorians, Colombians, Puerto Ricans, and Dominicans. Shown for K = 3, clustering of the Hispanic/Latino individuals on the autosomes (Top) and on the X chromosome (Bottom). Individuals are shown as vertical bars colored in proportion to their estimated ancestry within each cluster. Local ancestry at each locus is shown for each individual on chromosome 1 (Middle). The X chromosome shows greater Native American ancestry (blue) and greater variability in African ancestry (green), with reduced European ancestry (red).
[Figure 3 not reproduced. Panels: "Structure autosome ancestry" (0.0 to 1.0), "Chromosome 1 ancestry (Mb)" (0 to 250), and "Structure X chromosome ancestry" (0.0 to 1.0); populations: Mexico, Ecuador, Colombia, Puerto Rico, Dominican Republic; legend: European ancestry, African ancestry, Native American ancestry.]
Fig. 4. Linkage disequilibrium, genotype r² estimated by PLINK, by population as a function of physical distance (Mb). (Left) Native American, European, and African populations. (Right) Hispanic/Latino populations. Scale is the same.
[Figure 4 not reproduced. Panels: A, "Ancestral populations"; B, "Admixed populations"; horizontal axis "Physical distance (kb)", 0 to 80; vertical axis "Linkage disequilibrium".]
Locus-Specific Ancestry. To reconstruct local genomic ancestry at a fine scale, we used the ancestry deconvolution algorithm LAMP (29), allowing for three-way admixture, and focused on the four Hispanic/Latino populations genotyped on the Illumina 610-Quad platform: Dominicans, Colombians, Puerto Ricans, and Ecuadorians (Materials and Methods). Because this same SNP panel had also been genotyped across the HGDP samples (1,043 individuals from 53 populations), the merged data set containing more than 500,000 markers provided a unique resource for investigating the extent of subcontinental ancestry among diverse Hispanic/Latino populations. We found that individual average ancestries are in agreement with FRAPPE and STRUCTURE results in which Ecuadorians have the highest Native American proportions, followed by Colombians (showing greater European contribution), and with Puerto Ricans and Dominicans showing the highest African ancestry, especially Dominicans, who show very low contribution from Native Americans (Fig. 1). We also used the PCA-based methods of Bryc et al. (30) to infer ancestry at each locus for the samples genotyped on the Affymetrix 500K, which included more than 100 Mexican samples genotyped by the POPRES project (23) and diverse Native American populations genotyped by Mao et al. (26). The local admixture tracks for each individual are in large agreement with the genome-wide average ancestry proportions (Fig. 3, Middle).
To investigate the genetic relationships among admixed Hispanic/Latino populations and putative ancestral groups, we compared patterns of population divergence among the inferred segments of European, African, and Native American ancestry and corresponding putative source populations using Wright’s FST measure. Specifically, we used LAMP to reconstruct, for each individual in our data set, segments of European, African, and Native American ancestry across both the maximal SNP data set for all of the admixed and putative source population individuals (i.e., either the 610K Illumina for Puerto Rican, Ecuadorian, Colombian, and Dominican or 500K for Mexicans from Guadalajara) as well as ∼70 K SNPs common to both platforms. To calculate FST at a given SNP for a given pair of populations, we included only individuals with unambiguous ancestry assignment (i.e., individuals with two European-, two Native American–, or two African-origin chromosomes). One potential confounder for this analysis is that sample sizes differ substantially among subpopulations within major continental regions (e.g., in the Native American set, we have sample sizes that range from n = 7 for Colombian indigenous Americans in HGDP to n = 29 for Nahua from Mexico in the Mao et al. dataset). To minimize the potential bias of differences in sample size, we randomly selected n = 7 individuals from all potential subpopulations and recomputed Wright’s FST. As seen in Table 1, and consistent with historical records, our results show that African segments of the Hispanic/Latino populations are more closely related to the Bantu-speaking populations of West Africa than other populations. Specifically, we found that the Colombians and Ecuadorians are most closely related to the Kenyan Bantu populations, whereas the Puerto Ricans and Dominicans are closest to the Yoruba from Nigeria. Likewise, European segments show the lowest FST values when compared with Southwest European populations (individuals from Spain and Portugal), as well as French and Italian individuals. Native American segments of the Hispanic/Latino individuals show the least genetic differentiation with Mesoamerican (e.g., Maya and Nahua), Chibchan (e.g., Colombian), and Andean (e.g., Quechua) populations. The closest relationship is clearly observed between Mexicans from Guadalajara and Nahua indigenous individuals.
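The FST comparison described above rests on two steps: restricting each population to the same number of individuals (n = 7) and computing FST between a pair of populations. The sketch below illustrates both with a simple heterozygosity-based form of Wright's FST, (H_T − H_S) / H_T, pooled across SNPs as a ratio of sums. The paper does not state which FST estimator was used, so the estimator, the function names, and the genotype coding (0/1/2 copies of one allele, with None for genotypes masked because local ancestry was ambiguous) are assumptions for illustration only.

```python
import random


def allele_freq(genotypes, snp):
    """Frequency of the counted allele at one SNP, ignoring masked (None) calls."""
    calls = [ind[snp] for ind in genotypes if ind[snp] is not None]
    if not calls:
        return None
    return sum(calls) / (2 * len(calls))


def wright_fst(pop_a, pop_b):
    """Heterozygosity-based FST between two populations.

    pop_a, pop_b: lists of individuals, each a list of 0/1/2 genotypes
    (None = masked). Per-SNP terms are pooled as a ratio of sums.
    """
    num = 0.0
    den = 0.0
    for snp in range(len(pop_a[0])):
        p_a = allele_freq(pop_a, snp)
        p_b = allele_freq(pop_b, snp)
        if p_a is None or p_b is None:
            continue
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        num += h_total - h_within
        den += h_total
    return num / den if den > 0 else None


def equalized_fst(pop_a, pop_b, n=7, seed=0):
    """FST after randomly downsampling both populations to n individuals."""
    rng = random.Random(seed)
    return wright_fst(rng.sample(pop_a, n), rng.sample(pop_b, n))
```

Downsampling every population to the size of the smallest one trades precision for comparability: all pairwise values then carry the same small-sample bias.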
Sex Bias in Ancestry Contributions. We used the STRUCTURE ancestry estimates on the autosomes and X chromosome to estimate Native American, European, and African ancestry proportions of each Hispanic/Latino individual. We then compared the estimates of ancestry for each population on the autosomes vs. on the X chromosome (Fig. 5 and Figs. S3 and S4). Whereas the Native American ancestry was significantly higher on the X chromosome than on the autosomes (including those populations with reduced Native American ancestry, i.e., Puerto Ricans and Dominicans), the autosomal vs. X-chromosome difference was more attenuated with regard to African ancestry. This reduced deviation is present even in those Hispanic/Latino populations analyzed whose non-European ancestry was principally Native American in origin (i.e., Mexicans and Ecuadorians). Furthermore, greater Native American ancestry on the X chromosome in Puerto Ricans did not necessarily imply greater Amerindian ancestry on the autosomes. This finding is similar to those observed by analyzing fine-scale genomic patterns of population structure and admixture among African Americans, West Africans, and Europeans (31).
Finally, we used SNP and microsatellite genotyping to identify the canonical Y chromosome and mtDNA haplotypes for each of the Hispanic/Latino individuals that we genotyped. Details of the loci and classifications are found in Tables S1 and S2. We found an excess of European Y chromosome haplotypes and a higher proportion of Native American and African mtDNA haplotypes, consistent with previous studies (Fig. 6). In addition, we found several non-European Y chromosomal haplotypes with most likely origins from North Africa and the Middle East. We observed that African-derived haplotypes were the predominant origin of mtDNA in Dominicans (17 of 27 individuals), matching the greater African vs. Native American origins of this population on the autosomes and X-chromosomes. However, in Puerto Ricans we did not find evidence of a high African female contribution. The predominant Y chromosomal origins in the Puerto Ricans sampled were European and African; but, in contrast, 20 of 27 Puerto Rican individuals had mitochondrial haplotypes of Native American origin, suggesting a strong female Native American and male European and African sex-biased contribution. Overall, in all of the Hispanic/Latino populations that we analyzed, we found evidence of greater European ancestry on the Y chromosome and higher Native American ancestry on the mtDNA and X chromosome consistent with previous findings (5–11).
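Why higher ancestry on the X chromosome than on the autosomes points to a female-biased contribution can be illustrated with a simple model that is not part of the paper's analysis. Assume a single founding admixture event with equal numbers of males and females, where f is the fraction of founding females and m the fraction of founding males drawn from a given source population. Autosomes pass equally through both sexes, whereas two-thirds of X chromosomes are carried by females, so the expected ancestry is (f + m)/2 on the autosomes and (2f + m)/3 on the X chromosome, while mtDNA and the Y chromosome track f and m directly. The function names and numeric inputs below are hypothetical.

```python
def expected_ancestry(f, m):
    """Expected ancestry from one source under the simple single-pulse model.

    f, m: fractions of founding females and males from that source.
    """
    return {
        "autosomes": (f + m) / 2,
        "X": (2 * f + m) / 3,
        "mtDNA": f,
        "Y": m,
    }


def founding_fractions(autosomal, x_linked):
    """Invert the model: recover (f, m) from autosomal and X ancestry."""
    f = 3 * x_linked - 2 * autosomal
    m = 4 * autosomal - 3 * x_linked
    return f, m


# Hypothetical source contributing 80% of founding females and 20% of males.
example = expected_ancestry(f=0.8, m=0.2)
print(example)  # X ancestry exceeds autosomal ancestry whenever f > m
```

Under this model the X-minus-autosome difference equals (f − m)/6, so even a strong sex bias produces a modest gap. With noisy ancestry estimates the inverted fractions can fall outside the 0 to 1 range, which is one reason to treat such back-calculations as rough guides only.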
Discussion
Our work has important implications for understanding the population genetic history of Latin America as well as ancestry of United States–based Hispanic/Latino populations. As has been previously documented, we found large variation in the proportions of European, African, and Native American ancestry among Mexicans, Puerto Ricans, Dominicans, Ecuadorians, and Colombians, but also within each of these groups. These trends are a consequence of variation in rates of migration from ancestral European and African source populations as well as population density of Native Americans in pre-Columbian times (1). We found that Dominicans and Puerto Ricans in our study showed the highest levels of African ancestry, consistent with historical records. European settlers to island nations in the Caribbean basin largely displaced Native American populations by the early to mid 16th century and concurrently imported large African slave populations for large-scale colonial agricultural production (largely of sugar). In contrast, Colombia has wider geographic differences ranging from Caribbean coasts to Andean valleys and mountains, which could explain the enrichment of African ancestry in some individuals and not in others, likely representing the differences in origin within Colombia. Finally, Mexico and Ecuador are two continental countries that had high densities of Native Americans during pre-Columbian times; as expected, the individuals from these two countries show the highest degree of Native American ancestry. Our findings indicate that the involuntary migration of Africans through the slave trade has left a clear trace in Hispanic/Latino populations proximal to these routes.

Table 1. Ancestry-specific FST distances between Hispanic/Latino populations and different putative source populations

African segments of the genome
Source population          COL       DOM       ECU       PRI
Bantu Kenya             3.191%    1.564%    6.098%    2.500%
Bantu S. Africa         3.375%    1.476%    6.883%    2.543%
Biaka Pygmy             6.520%    4.657%   10.143%    5.761%
Mandenka                3.677%    1.419%    6.400%    2.384%
Mbuti Pygmy            11.217%    8.877%   14.702%   10.216%
YRI                     3.263%    0.913%    6.481%    2.176%

European segments of the genome
Source population          COL       DOM       ECU       PRI    Mexico
Adygei                  1.836%    1.560%    1.669%    1.811%    1.014%
Basque                  1.351%    1.128%    1.456%    1.530%    0.784%
European ESE            1.389%    1.071%    1.225%    1.392%    0.559%
Europe C                0.978%    0.691%    1.012%    1.062%    0.335%
Europe NNE              1.253%    0.919%    1.212%    1.345%    0.438%
Europe NW               1.240%    0.940%    1.100%    1.251%    0.442%
Europe S                1.033%    0.705%    1.005%    1.107%    0.193%
Europe SE               1.020%    0.775%    1.005%    1.181%    0.307%
Europe SW               0.863%    0.537%    0.838%    0.916%    0.122%
Europe W                1.080%    0.730%    1.104%    1.155%    0.265%
French                  0.880%    0.613%    0.799%    0.940%    0.270%
Italian                 0.885%    0.610%    0.845%    0.879%    0.271%
Orcadian                1.410%    1.093%    1.417%    1.508%    0.793%
Russian                 1.648%    1.413%    1.369%    1.820%    0.882%
Sardinian               1.550%    1.270%    1.607%    1.566%    0.852%
Tuscan                  1.050%    0.825%    0.925%    1.041%    0.336%

Native American segments of the genome
Source population          COL       DOM       ECU       PRI    Mexico
Aymara                  4.005%    5.142%    4.244%    5.872%    2.397%
Colombian               5.296%    5.868%    5.799%    6.618%    4.185%
Karitiana               9.099%    9.060%    9.178%   10.120%    8.197%
Maya                    4.724%    4.262%    5.446%    6.624%    1.417%
Nahua                   3.614%    3.601%    4.147%    4.795%    0.572%
Pima                    8.562%    9.310%    9.193%   10.578%    5.112%
Quechua                 3.432%    3.147%    3.079%    5.169%    2.086%
Surui                  13.803%   13.736%   13.765%   15.093%   11.061%

Results based on ∼70 K overlapping SNPs between Affymetrix and Illumina platforms and equalizing population sample sizes down to seven individuals per population.

From the FST analysis, we found that the high-density genotype data that we have collected are quite informative regarding the personal genetic ancestry of admixed Hispanic/Latino individuals. Specifically, we found that individuals differ dramatically within and among populations and that we can reliably identify subpopulations within major geographic regions (i.e., Europe, Africa, and the Americas) that exhibit lower pairwise FST (and, therefore, higher genetic similarity) to the inferred European, African, and Native American segments for the 212 individuals studied. We found, for example, that Nahua showed the lowest FST in Mexicans, consistent with the observation that the Nahua are one of the largest Native American populations in this region and are likely to have contributed to the genomes of admixed individuals in Mexico (as opposed, for instance, to the Mexican Pima who fall outside the Mesoamerican cultural region and show considerably higher levels of differentiation). We also
found that the lowest FST for the African regions of the Dominican and Puerto Rican genomes is with the Yoruba, a Bantu-speaking West African population that has been shown to be genetically similar to the African segments of African Americans sampled in the United States (30). Although we have limited Native American populations and Hispanic/Latino sample sizes, the differences in FST with different subcontinental populations suggest that there exists a reasonably strong signal of which present-day populations are most closely related to the ancestral populations that contributed ancestry to each of the Hispanic/Latino populations.
When comparing inferred continental ancestry of the X and Y
chromosomes and mitochondrial DNA vs. the autosomal genome, we observed an enrichment of European Y-chromosome vs. autosomal genetic material, and a greater percentage of both Native American and African ancestry on the X-chromosomes and mtDNA compared with the autosomes for the Hispanic/Latino individuals in this study. This suggests a predominance of European males and Native American/African females in the ancestral genetic pool of Latinos, consistent with previous studies. A particularly interesting observation from our work on sex-biased admixture is that the pattern exists not only within populations but among Hispanic/Latino populations as well. In all populations studied, there is an enrichment of Native American ancestry both on the X chromosome and mtDNA compared with the autosomes. This would suggest a greater female Native American contribution to the genome of Latinos. A different result was obtained in relation to African ancestry. We found a smaller difference between mean African ancestry on the X chromosome and the autosomes, compared with the difference in Native American ancestry. Furthermore, unlike in Native American ancestry, we found an overwhelming representation of Native American mtDNA haplogroups in Puerto Ricans, even though non-European ancestry on the autosomes was largely African.
Fig. 5. Boxplots comparing autosomal vs. X chromosome ancestry proportions by population, shown for European ancestry (Left), Native American ancestry (Center), and African ancestry (Right). Filled boxes correspond to autosomal ancestry estimates; open boxes show X chromosome ancestry estimates. Each plot shows the median (solid line) and the first and third quartiles (box); whiskers extend to the minimum/maximum values, or to the smallest value within 1.5 times the IQR from the first quartile. For each paired comparison of X chromosomes and autosomes, median Native American ancestries are consistently higher on the X chromosome in all Hispanic/Latino populations sampled, and European ancestries are lower across all populations.
Fig. 6. Comparison of mtDNA and Y chromosome haplotypes. Each individual is represented by a point within the triangle that represents the autosomal ancestry proportions. The most probable continental location for each individual’s haplotype is designated by the color of the point. The Y chromosome contains a disproportionate number of European haplotypes, whereas the mtDNA has a high proportion of Native American, slightly more African haplotypes and fewer European haplotypes, consistent with a sex bias toward greater European male and Native American/African female ancestry in the Hispanic/Latinos.
It is important to note that this observation does not necessarily undermine the model of sex-biased admixture between European males and African females in the founding of Hispanic/Latino populations, especially when one considers the predominance of European Y chromosomes in all groups studied. However, it suggests that admixture between European males and Amerindian/African females has been a complex process in the formation of the various Hispanic/Latino populations. Specifically, a reduced X vs. autosome mean African ancestry compared with Native American ancestry suggests a more balanced gender contribution in the Hispanic/Latino genome by individuals of African ancestry. In the case of Puerto Ricans, the only way that one can reconcile greater African ancestry on the X chromosome vs. what would be expected on mitochondrial data would be through transmission of X chromosomes independent of mitochondrial transmission, which is plausible biologically only via males. Caution, however, should be exercised before considering such conclusions as concrete; unlike X chromosomes, which can recombine and thus represent haplotypes derived from thousands of individuals, mitochondrial DNA represents just a sole distant ancestor among these thousands. Thus, a larger mtDNA sample would be necessary compared with X chromosomes to have similar confidence that a cohort would accurately reflect the presumed diversity of ancestry in the population as a whole.
The Y chromosomal results also demonstrate the insufficiency
of the paradigm of European males and Native American/African females to capture the complexity within the Latin American populations. For example, we find Y chromosomal haplotypes in Hispanic/Latinos with presumed origins in the Middle East and Northern Africa. Given that historical documentation suggests that most of the non-African and non–Native American contribution to admixed Hispanic/Latino populations is from Southwest Europe, this suggests that the contemporary populations inherited these Y chromosomes from Europeans who, in turn, were descended from Middle Eastern or North African men. Several historical events could have led to the acquisition by Europeans of non-European haplotypes, perhaps during the period of the Roman Empire when the Mediterranean Sea behaved as a conduit (not a physical barrier) between Europe, the Middle East, and North Africa or by Sephardic Jews or Moorish Muslims during the European Middle Ages/Islamic Golden Age. Alternatively, the presence of non-European Y chromosomal haplotypes originating from the Middle East and North Africa could represent the result of Iberian Jews and Muslims (themselves admixed) fleeing the peninsula for New World territories in response to discriminatory policies that strongly pressured both communities at the termination of the Reconquista. Essentially, the diversity of haplotypes in the Y chromosomes in Latinos reflects not only population dynamics from the 15th century onward, but also the historical trends of population movement occurring across the Atlantic during centuries prior.
The marked genetic heterogeneity of Latino populations shown
in this study, as previously suggested by other surveys of genetic ancestry (2, 26, 32), has important implications for the identification of disease-associated variants that differ markedly in frequency among parental populations. In their study of 13 Mestizo populations from Latin America, for example, Wang et al. (2008) suggested that admixture mapping in Hispanic/Latino populations may be feasible within a two-population admixture framework, since the mean African ancestry in Mestizo populations is typically low (<10%) (2). Although this is true for Hispanic/Latino populations with origins in the continental landmass of the Americas (such as the populations studied by Wang et al.), our results show that this may not apply to Latino populations with origins in the Caribbean, as their African ancestry proportion is considerably higher and is highly variable among individuals, suggesting an extensive three-way admixture and representing additional challenges for admixture mapping. Likewise, we find subtle but reproducible differences in subcontinental ancestry among Hispanic/Latino individuals, suggesting that even a three-way admixture model may not be sufficient to accurately model the dynamic population genetic history of these populations.
Another observation with important implications for designing
association studies is the large variation in individual admixture estimates within certain Latino populations (e.g., Mexicans, Colombians, and Ecuadorians). One could expect such an outcome when collecting samples from United States–based Latino communities, whose members may come from different locations within their countries of origin (e.g., Colombians and Ecuadorians). However, within the Mexican sample, which was collected in a single sampling location (i.e., Guadalajara, Mexico), we also observed large variation in European vs. Native American admixture proportions. Our findings are in agreement with previous studies on genetic ancestry from Mexico City (2, 33), supporting the idea that such urban agglomerations, in which a large number of epidemiological studies are likely to take place, continue to host a wide range of genetic variability among individuals who may self-identify as members of the same population. Therefore, particular attention should be paid to carefully matching representative cases and controls, as well as to carefully controlling for ancestry, when performing association studies using Hispanic/Latino populations. We hope that our dense genome-wide admixture analysis has allowed greater insight into the population dynamics of multiple Hispanic/Latino populations and that it will provide a resource for designing next-generation epidemiological studies in these communities, opening the possibility of better understanding the genetic makeup of this growing segment of the U.S. population.
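The comparison of X-chromosomal and autosomal ancestry in the discussion above rests on a simple expectation, which can be sketched as follows. The sketch assumes (this is not stated in the paper) a constant-contribution admixture model with equal numbers of females and males, in which a source population supplies a fraction `female_fraction` of the mothers and `male_fraction` of the fathers. Because females carry two X chromosomes and males one, female contributions are weighted 2:1 on the X but 1:1 on the autosomes. The function and parameter names are hypothetical, and the formula ignores generation-to-generation fluctuations.

```python
def expected_ancestry(female_fraction, male_fraction):
    """Approximate expected ancestry from one source population.

    female_fraction: share of founding mothers drawn from the source (0..1).
    male_fraction:   share of founding fathers drawn from the source (0..1).
    Returns (autosomal, x_chromosome) expected ancestry proportions.
    Assumption: constant sex-specific contributions and an equal sex ratio.
    """
    for name, value in (("female_fraction", female_fraction),
                        ("male_fraction", male_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Autosomes: half inherited through mothers, half through fathers.
    autosomal = (female_fraction + male_fraction) / 2.0
    # X chromosomes: two thirds reside in females, one third in males.
    x_chromosome = (2.0 * female_fraction + male_fraction) / 3.0
    return autosomal, x_chromosome


if __name__ == "__main__":
    # Hypothetical illustration: a source contributing mostly through females.
    auto, x = expected_ancestry(female_fraction=0.6, male_fraction=0.2)
    print(f"autosomal={auto:.3f}  X={x:.3f}  difference={x - auto:.3f}")
```

Under this approximation, a source that contributes mainly through females shows higher ancestry on the X chromosome than on the autosomes, whereas a sex-balanced contribution shows no difference.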
Materials and Methods
Datasets. We genotyped 100 individuals with ancestry from Puerto Rico, the Dominican Republic, Ecuador, and Colombia on Illumina 610K arrays. We extracted 400 European, 365 African American, and 112 Mexican samples from the GlaxoSmithKline POPRES project, which is a resource of nearly 6,000 control individuals from North America, Europe, and Asia genotyped on the Affymetrix GeneChip 500K Array Set (23). We randomly sampled 15 individuals from each European country where possible, or the maximum number of individuals available otherwise, to select the POPRES European individuals to be included in our study. Further description of sampling locations, genotyping, and data quality control are available elsewhere (23). We include 165 and 167 individuals from the HapMap project from the CEU and YRI populations, thinned to the same SNP set (21). We also include all European, Native American, and African individuals from the HGDP genotyped on Illumina 610K arrays (25). Finally, we include all Native American populations from the Mao et al. (2007) study genotyped on Affymetrix 500K arrays (26). For each dataset, we used annotation information to determine the strand on which the data were given and to map all Affymetrix and Illumina marker ids to corresponding dbSNP reference ids [rsids]. SNPs without valid rsids were excluded from analysis. Each dataset was then converted to the forward strand to facilitate merging of the data. Data from the various platforms were merged using the PLINK toolset, version 1.06 (34). Likewise, nonmissing genotype calls that showed disagreement between datasets were omitted. Demographic data for all individuals included in this study are available on GenBank. All samples were approved by institutional review board protocols from their respective studies.
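The strand conversion and discordance handling described above can be sketched as follows. This is a minimal illustration only, not the PLINK-based procedure the authors ran: genotypes are assumed to be two-letter strings such as "AG", strands are assumed to be annotated as "+" or "-", and "omitted" is interpreted here as setting a discordant call to missing. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_forward(genotype, strand):
    """Return the genotype on the forward strand.

    genotype: two-letter string such as "AG", or None when missing.
    strand:   "+" if the call is already on the forward strand, "-" otherwise.
    """
    if genotype is None:
        return None
    if strand == "+":
        return genotype
    if strand == "-":
        # Complement each allele; allele order within a genotype is arbitrary.
        return "".join(COMPLEMENT[base] for base in genotype)
    raise ValueError(f"unknown strand annotation: {strand!r}")


def merge_calls(call_a, call_b):
    """Merge two forward-strand calls for the same individual and SNP.

    A missing call defers to the other dataset. Two nonmissing calls must
    agree (ignoring allele order); otherwise the merged call is set to
    missing, which is one assumed reading of "omitted".
    """
    if call_a is None:
        return call_b
    if call_b is None:
        return call_a
    return call_a if sorted(call_a) == sorted(call_b) else None


if __name__ == "__main__":
    print(to_forward("AG", "-"))      # reverse-strand call flipped to forward
    print(merge_calls("AG", "GA"))    # concordant calls are kept
    print(merge_calls("AG", "AA"))    # discordant calls become missing
```

Note that A/T and C/G SNPs cannot be assigned a strand from the alleles alone, which is why the sketch relies on an explicit strand annotation, as the text describes.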
Data Quality Control. The HapMap II release 23, HGDP, Mao et al., and POPRES samples were genotyped and called according to their respective quality control procedures (21, 23, 25, 26). Our final merged dataset contains 73,901
SNPs with per-SNP genotype missingness <0.1 and per-individual missingness <0.05 across 5,104 individuals.
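The two missingness thresholds can be illustrated with a short sketch. It assumes a genotype matrix stored as a list of rows (one per individual) with `None` marking a missing call, and it assumes SNPs are filtered before individuals; the paper does not state the order, and the function names are hypothetical.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_missingness(genotypes, max_snp_missing=0.1, max_ind_missing=0.05):
    """Return (kept_individual_indices, kept_snp_indices).

    genotypes[i][j] is the call for individual i at SNP j, or None if missing.
    SNPs are filtered first; individual missingness is then computed over the
    retained SNPs only (an assumed order, not taken from the paper).
    """
    if not genotypes or not genotypes[0]:
        return [], []
    n_snps = len(genotypes[0])
    keep_snps = [
        j for j in range(n_snps)
        if missing_rate([row[j] for row in genotypes]) < max_snp_missing
    ]
    if not keep_snps:
        return [], []
    keep_inds = [
        i for i, row in enumerate(genotypes)
        if missing_rate([row[j] for j in keep_snps]) < max_ind_missing
    ]
    return keep_inds, keep_snps


if __name__ == "__main__":
    # Toy matrix: 4 individuals x 3 SNPs, genotypes coded 0/1/2.
    toy = [[0, 1, None], [1, None, None], [2, 0, 1], [0, 0, None]]
    print(filter_missingness(toy, max_snp_missing=0.5, max_ind_missing=0.5))
```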
Population Structure. We used the software FRAPPE, which implements an expectation-maximization algorithm for estimating individual membership in clusters (24). This algorithm is more computationally efficient than MCMC-based methods, allowing it to analyze many more markers than, for example, STRUCTURE (24, 28). After thinning markers to have r² < 0.5 in 50-SNP windows, shifted and recalculated every 5 SNPs, we ran FRAPPE on all 64,935 remaining markers for 5,000 iterations. We also assessed admixture proportions for the Hispanic/Latino individuals using STRUCTURE on a reduced dataset of 5,440 markers after filtering for MAF > 0.2 and requiring a minimum separation of 400 kb between markers. We used the F model with USEPOPINFO = 1 to update allele frequencies using only the ancestral individuals, with 5,000 burn-in and 5,000 iterations (28). We also used all 1,518 SNPs on the X chromosome for the same analysis of X chromosome ancestry. Principal component analysis was conducted on a dataset thinned to have r² < 0.8 in 50-SNP windows, leaving 69,212 SNPs, using the smartpca program from the EIGENSTRAT software. Ellipses were fitted following the means and 1 SD of the variance–covariance matrix of the PC1 and PC2 scores of each population.
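One plausible reading of the ellipse-fitting step is sketched below: for each population, the ellipse is centered on the mean PC1 and PC2 scores, its axes follow the eigenvectors of the sample variance–covariance matrix, and its semi-axes equal one standard deviation along each eigenvector. This interpretation, the use of the n − 1 denominator, and all names are assumptions; the authors' plotting code is not given.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 covariance (n - 1 denominator) of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sd_ellipse(points, n_sd=1.0, n_vertices=72):
    """Ellipse at n_sd standard deviations of the PC1/PC2 covariance."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(points)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major = n_sd * math.sqrt(max(half_trace + root, 0.0))
    minor = n_sd * math.sqrt(max(half_trace - root, 0.0))
    # Orientation of the major axis relative to the PC1 axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    outline = []
    for k in range(n_vertices):
        t = 2.0 * math.pi * k / n_vertices
        x = mx + major * math.cos(t) * math.cos(angle) - minor * math.sin(t) * math.sin(angle)
        y = my + major * math.cos(t) * math.sin(angle) + minor * math.sin(t) * math.cos(angle)
        outline.append((x, y))
    return {"center": (mx, my), "semi_axes": (major, minor),
            "angle": angle, "outline": outline}


if __name__ == "__main__":
    # Hypothetical PC1/PC2 scores for one population.
    scores = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 5.0)]
    ellipse = sd_ellipse(scores)
    print(ellipse["center"], ellipse["semi_axes"], ellipse["angle"])
```

Every vertex of the returned outline lies at a Mahalanobis distance of `n_sd` from the population mean, so the ellipse can be drawn directly over the PC1 vs. PC2 scatterplot.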
For local ancestry estimation, we used the software LAMP in LAMPANC mode, providing allele frequencies for the HGDP West Africans, Europeans, and Native Americans as ancestral populations (29). A total of 552,025 SNPs were included in the analysis, and configuration parameters were set as follows: mixture proportions (alpha) = 0.2, 0.4, 0.4; number of generations since admixture (g) = 20; recombination rate (r) = 1e-8; fraction of overlap
between adjacent windows (offset) = 0.2; and r² threshold (ldcutoff) = 0.1. Local ancestry estimation for the Mexican individuals was performed using the two-way PCA-based method described in Bryc et al. (30) for both the full Illumina 610K and the Affymetrix 500K datasets, in 10-SNP windows. Only Native Americans with <0.01 European ancestry (as estimated from FRAPPE results) were used as the ancestral Native American individuals within their respective datasets. FST was calculated between Native American, European, and African regions of the Hispanic/Latino individuals and the respective continental populations using a C++ implementation of Weir and Cockerham's weighted FST equations as previously published (35). To eliminate bias in estimation of FST due to European ancestry shown in some of the Native Americans, we also removed regions showing European ancestry within any of the Native Americans showing >0.01 European ancestry, using the same local ancestry estimation procedure as described for the Mexican individuals. Furthermore, to avoid any potentially confounding effect of sample size, we used a random sample of 7 individuals (the minimum sample size of the Native American populations) per non-Hispanic/Latino population to calculate pairwise FST. MAF was set at a threshold >0.1 in the populations compared by FST calculations.
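For reference, the Weir and Cockerham (35) estimator for biallelic SNPs can be sketched in a few lines. This is an independent Python illustration, not the C++ implementation used in the study. It assumes genotype counts per population as (n_AA, n_Aa, n_aa) and combines loci as a ratio of summed variance components; the function names are hypothetical.

```python
def wc_components(counts):
    """Variance components (a, b, c) for one biallelic SNP.

    counts: one (n_AA, n_Aa, n_aa) tuple per population.
    """
    r = len(counts)
    sizes = [aa + ab + bb for aa, ab, bb in counts]
    if r < 2 or min(sizes) < 1:
        raise ValueError("need at least two non-empty populations")
    n_total = float(sum(sizes))
    n_bar = n_total / r
    if n_bar <= 1.0:
        raise ValueError("mean sample size must exceed one individual")
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)
    # Per-population allele frequency and observed heterozygosity.
    freqs = [(2 * aa + ab) / (2.0 * n) for (aa, ab, _), n in zip(counts, sizes)]
    hets = [ab / float(n) for (_, ab, _), n in zip(counts, sizes)]
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / n_total
    h_bar = sum(n * h for n, h in zip(sizes, hets)) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, freqs)) / ((r - 1) * n_bar)
    core = p_bar * (1.0 - p_bar) - (r - 1.0) / r * s2
    a = (n_bar / n_c) * (s2 - (core - h_bar / 4.0) / (n_bar - 1.0))
    b = (n_bar / (n_bar - 1.0)) * (core - (2.0 * n_bar - 1.0) / (4.0 * n_bar) * h_bar)
    c = h_bar / 2.0
    return a, b, c


def wc_fst(loci):
    """Multilocus FST: sum of a over sum of (a + b + c) across SNPs."""
    numerator = 0.0
    denominator = 0.0
    for counts in loci:
        a, b, c = wc_components(counts)
        numerator += a
        denominator += a + b + c
    if denominator == 0.0:
        raise ValueError("no polymorphic loci")
    return numerator / denominator


if __name__ == "__main__":
    # Two hypothetical populations of 7 individuals each, two SNPs.
    loci = [[(7, 0, 0), (0, 0, 7)], [(2, 3, 2), (2, 3, 2)]]
    print(wc_fst(loci))
```

The estimator can be slightly negative when populations have identical allele frequencies, which is expected behavior and is usually read as no detectable differentiation.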
ACKNOWLEDGMENTS. We thank Mariano Rey for support of the project;Peter Gregersen, Carole Oddoux, and Annette Lee for technical assistance;and Marc Pybus for valuable programming support during part of theanalyses. This work was supported by the National Institutes of Health(Grant 1R01GM83606) as part of the National Institute of General MedicalSciences research funding programs.
1. Sans M (2000) Admixture studies in Latin America: From the 20th to the 21st century. Hum Biol 72:155–177.
2. Wang S, et al. (2008) Geographic patterns of genome admixture in Latin American Mestizos. PLoS Genet 4:e1000037.
3. Seldin MF, et al. (2007) Argentine population genetic structure: Large variance in Amerindian contribution. Am J Phys Anthropol 132:455–462.
4. Silva-Zolezzi I, et al. (2009) Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico. Proc Natl Acad Sci USA 106:8611–8616.
5. Dipierri JE, et al. (1998) Paternal directional mating in two Amerindian subpopulations located at different altitudes in northwestern Argentina. Hum Biol 70:1001–1010.
6. González-Andrade F, Sánchez D, González-Solórzano J, Gascón S, Martínez-Jarreta B (2007) Sex-specific genetic admixture of Mestizos, Amerindian Kichwas, and Afro-Ecuadorans from Ecuador. Hum Biol 79:51–77.
7. Green LD, Derr JN, Knight A (2000) mtDNA affinities of the peoples of North-Central Mexico. Am J Hum Genet 66:989–998.
8. Mendizabal I, et al. (2008) Genetic origin, admixture, and asymmetry in maternal and paternal human lineages in Cuba. BMC Evol Biol 8:213.
9. Marrero AR, et al. (2007) Pre- and post-Columbian gene and cultural continuity: The case of the Gaucho from southern Brazil. Hum Hered 64:160–171.
10. Sans M, et al. (2002) Unequal contributions of male and female gene pools from parental populations in the African descendants of the city of Melo, Uruguay. Am J Phys Anthropol 118:33–44.
11. Carvajal-Carmona LG, et al. (2003) Genetic demography of Antioquia (Colombia) and the Central Valley of Costa Rica. Hum Genet 112:534–541.
12. Smith MW, et al. (2001) Markers for mapping by admixture linkage disequilibrium in African American and Hispanic populations. Am J Hum Genet 69:1080–1094.
13. González Burchard E, et al. (2005) Latino populations: A unique opportunity for the study of race, genetics, and social environment in epidemiological research. Am J Public Health 95:2161–2168.
14. Salari K, et al. (2005) Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol 29:76–86.
15. Fejerman L, et al. (2008) Genetic ancestry and risk of breast cancer among U.S. Latinas. Cancer Res 68:9723–9728.
16. Lai CQ, et al. (2009) Population admixture associated with disease prevalence in the Boston Puerto Rican health study. Hum Genet 125:199–209.
17. Hayes MG, et al. (2007) Identification of type 2 diabetes genes in Mexican Americans through genome-wide association studies. Diabetes 56:3033–3044.
18. Choudhry S, et al. (2008) Genome-wide screen for asthma in Puerto Ricans: Evidence for association with 5q23 region. Hum Genet 123:455–468.
19. Choudhry S, et al. (2006) Ancestry-environment interactions and asthma risk among Puerto Ricans. Am J Respir Crit Care Med 174:1088–1093.
20. International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437:1299–1320.
21. Frazer KA, et al.; International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861.
22. Rosenberg NA, et al. (2002) Genetic structure of human populations. Science 298:2381–2385.
23. Nelson MR, et al. (2008) The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am J Hum Genet 83:347–358.
24. Tang H, Peng J, Wang P, Risch NJ (2005) Estimation of individual admixture: Analytical and study design considerations. Genet Epidemiol 28:289–301.
25. Jakobsson M, et al. (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451:998–1003.
26. Mao X, et al. (2007) A genomewide admixture mapping panel for Hispanic/Latino populations. Am J Hum Genet 80:1171–1178.
27. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2:e190.
28. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164:1567–1587.
29. Sankararaman S, Sridhar S, Kimmel G, Halperin E (2008) Estimating local ancestry in admixed populations. Am J Hum Genet 82:290–303.
30. Bryc K, et al. (2010) Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc Natl Acad Sci USA 107:786–791.
31. Lind JM, et al. (2007) Elevated male European and female African contributions to the genomes of African American individuals. Hum Genet 120:713–722.
32. Price AL, et al. (2007) A genomewide admixture map for Latino populations. Am J Hum Genet 80:1024–1036.
33. Martinez-Marignac VL, et al. (2007) Admixture in Mexico City: Implications for admixture mapping of type 2 diabetes genetic risk factors. Hum Genet 120:807–819.
34. Purcell S, Neale B, et al. (2007) PLINK: A toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 81:559–575.
35. Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.
Bryc et al. PNAS | May 11, 2010 | vol. 107 | suppl. 2 | 8961
Supporting Information
Bryc et al. 10.1073/pnas.0914618107
Bryc et al. www.pnas.org/cgi/content/short/0914618107 1 of 9
Fig. S1. Principal components 1–15 of all individuals in the merged dataset, colored by population.
Fig. S3. Individual scatterplots comparing autosomal vs. X chromosome ancestry proportions as estimated by STRUCTURE K = 3 using prior population information of ancestral populations. Expected autosome = X chromosome ancestry lines are shown in gray; fitted linear regression lines are in red.
[Fig. S2 panels: K = 2 through K = 7, each with a vertical axis from 0.0 to 1.0. Population labels along the horizontal axis: Mexico, ECU, COL, PRI, DOM, Sardinian, Tuscan, EuropeS, Italian, Adygei, Basque, EuropeSW, EuropeSE, French, EuropeW, EuropeC, EuropeNW, EuropeNNE, Orcadian, Russian, BiakaPygmy, MbutiPygmy, San, BantuSouthAfrica, BantuKenya, Mandenka, YRI, Pima, Nahua, Maya, Quechua, Aymara, Colombian, Karitiana, Surui.]
Fig. S2. FRAPPE clustering of the HL individuals as well as Europeans, Africans, and Native Americans, shown for K = 2 through K = 7.
Fig. S4. Individual scatterplots comparing autosomal vs. X chromosome ancestry proportions for each population. Expected autosome = X chromosome ancestry lines are shown in gray; fitted linear regression lines are in red.
Table S1. mtDNA haplogroup motifs, classification, and predictions
Sample | Prediction | mtDNA HG | HVR1 motif
COL00983 | Native American | A | 16111T, 16187T, 16223T, 16290T, 16319A, 16362C
COL04356 | Native American | A | 16111T, 16129A, 16223T, 16290T, 16319A
COL04506 | Native American | A | 16111T, 16223T, 16239T, 16290T, 16319A, 16362C
COL11917 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
COL12392 | Native American | A | 16111T, 16213A, 16223T, 16290T, 16319A, 16362C
COL12551 | Native American | A | 16111T, 16131C, 16209C, 16223T, 16290T, 16311C, 16319A
COL14057 | Native American | A | 16111T, 16223T, 16290T, 16319A
COL16011 | Native American | A | 16111T, 16213A, 16223T, 16290T, 16319A, 16362C
COL18873 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
COL24360 | Native American | A | 16111T, 16213A, 16223T, 16290T, 16319A, 16362C
COL09891 | Native American | B | 16183C, 16189C, 16217C, 16519C
COL13175 | Native American | B | 16097C, 16098G, 16183C, 16189C, 16217C, 16519C
COL17586 | Native American | B | 16182C, 16183C, 16189C, 16217C, 16519C
COL18861 | Native American | B | 16183C, 16189C, 16217C, 16519C
COL04248 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16428A, 16430G
COL04362 | Native American | C | 16189A, 16223T, 16291T, 16298C, 16325C, 16327T, 16519C
COL04595 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16390A
COL10678 | Native American | C | 16223T, 16298C, 16325C, 16327T
COL14721 | Native American | C | 16223T, 16298C, 16300G, 16325C, 16327T, 16526A
COL25102 | Native American | D | 16223T, 16241G, 16301T, 16311C, 16342C, 16362C
COL09416 | Europe | H (xH13) | 16145A, 16304C
COL11065 | Africa | L1 | 16017C, 16129A, 16163G, 16187T, 16189C, 16223T, 16278T, 16293G, 16294T, 16304C, 16311C, 16360T
COL18876 | Africa | L1 | 16111A, 16129A, 16163G, 16187T, 16189C, 16223T, 16278T, 16290T, 16293G, 16294T, 16311C, 16360T, 16519C
COL15717 | Africa | L2 | 16223T, 16264T, 16266T, 16278T, 16291T, 16390A
COL24480 | Africa | L2 | 16223T, 16278T, 16294T, 16309G, 16390A
COL17450 | Africa | L3 (xM,N) | 16124C, 16223T, 16256T
DOM10729 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
DOM10990 | Native American | A | 16092C, 16111T, 16223T, 16290T, 16319A, 16362C, 16519C
DOM17258 | Native American | A | 16111T, 16223T, 16266T, 16290T, 16293G, 16319A, 16362C
DOM22095 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C, 16519C
DOM10727 | Native American | B | 16173T, 16183C, 16189C, 16217C, 16352C, 16519C
DOM10984 | Europe | H (xH13) | 16093C, 16304G, 16311C
DOM11034 | Europe | H (xH13) | 16093C, 16304G, 16311C
DOM11098 | Europe | H (xH13) | 16519C
DOM10855 | Europe | K | 16048A, 16162C, 16192A, 16224C, 16311C, 16519C
DOM10751 | Africa | L0 | 16093G, 16189C, 16223T, 16287A, 16293T, 16301T, 16311C, 16355T, 16362C, 16399G
DOM10876 | Africa | L1 | 16037G, 16093C, 16129A, 16140C, 16184T, 16187T, 16189C, 16223T, 16278T, 16294T, 16301T, 16304C, 16311C, 16360T, 16519C
DOM10987 | Africa | L1 | 16017C, 16129A, 16163G, 16187T, 16189C, 16209C, 16223T, 16278T, 16293G, 16294T, 16311C, 16360T, 16519C
DOM10733 | Africa | L2 | 16131C, 16189C, 16223T, 16225T, 16234T, 16278T, 16294T, 16309G, 16390A
DOM10865 | Africa | L2 | 16114A, 16129A, 16213A, 16223T, 16278T, 16355T, 16362C, 16390A, 16465T
DOM10722 | Africa | L3 (xM,N) | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C
DOM10752 | Africa | L3 (xM,N) | 16223T, 16320T, 16519C
DOM10766 | Africa | L3 (xM,N) | 16124C, 16223T, 16278T
DOM10857 | Africa | L3 (xM,N) | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C
DOM10858 | Africa | L3 (xM,N) | 16172C, 16183C, 16189C, 16223T, 16320T, 16519C
DOM10980 | Africa | L3 (xM,N) | 16124C, 16223T, 16278T, 16311C, 16362C, 16519C
DOM11033 | Africa | L3 (xM,N) | 16223T, 16254G, 16320T, 16519C
DOM11038 | Africa | L3 (xM,N) | 16172C, 16183C, 16189C, 16223T, 16320T, 16335G, 16519C
DOM11113 | Africa | L3 (xM,N) | 16189C, 16223T, 16265T, 16465T
DOM15122 | Africa | L3 (xM,N) | 16124C, 16223T, 16278T, 16362C, 16527T
DOM17077 | Africa | L3 (xM,N) | 16209C, 16223T, 16292T, 16295T, 16311C, 16519C, 16527T
DOM18717 | Africa | L3 (xM,N) | 16209C, 16223T, 16292T, 16295T, 16311C, 16519C, 16527T
DOM10169 | Europe | T2 | 16126C, 16292T, 16294T, 16399G, 16519C
ECU09798 | Native American | A | 16111T, 16223T, 16319A, 16362C, 16519C
ECU09809 | Native American | A | 16111T, 16213A, 16223T, 16290T, 16319A, 16356C, 16362C
ECU10717 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C, 16519C
ECU12402 | Native American | A | 16111T, 16189C, 16223T, 16290T, 16319A, 16519C
ECU17286 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C, 16519C
ECU10017 | Native American | B | 16182C, 16183C, 16189C, 16217C, 16519C
ECU11035 | Native American | B | 16168T, 16172C, 16182C, 16183C, 16189C, 16217C, 16519C
ECU11036 | Native American | B | 16168T, 16172C, 16182C, 16183C, 16189C, 16217C, 16519C
ECU17270 | Native American | B | 16183C, 16189C, 16217C, 16519C
ECU19481 | Native American | B | 16182C, 16183C, 16189C, 16217C, 16519C
ECU20655 | Native American | B | 16183C, 16189C, 16217C, 16437C, 16519C
ECU24695 | Native American | B | 16157C, 16183C, 16189C, 16217C, 16468C, 16519C, 16536A
ECU11419 | Native American | C | 16183C, 16189C, 16223T, 16325C, 16327T
ECU13293 | Native American | C | 16154C, 16223T, 16298C, 16325C, 16327T, 16519C
ECU20753 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16519C
ECU13765 | Native American | D | 16189C, 16223T, 16325C, 16355T
ECU16094 | Native American | D | 16223T, 16325C, 16362C, 16519C
ECU17592 | Native American | D | 16142T, 16223T, 16325C, 16362C, 16519C
ECU20846 | Native American | D | 16142T, 16207G, 16223T, 16325C, 16362C, 16519C
ECU16073 | Africa | L3 (xM,N) | 16124C, 16223T, 16278T, 16362C, 16519C
PRT04522 | Native American | A | 16111T, 16223T, 16234T, 16290T, 16319A, 16362C, 16519C
PRT10430 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
PRT10785 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
PRT14630 | Native American | A | 16111T, 16223T, 16234T, 16290T, 16319A, 16362C, 16519C
PRT14724 | Native American | A | 16083T, 16111T, 16223T, 16256T, 16290T, 16319A, 16362C
PRT16090 | Native American | A | 16097C, 16098G, 16111T, 16189C, 16223T, 16319A, 16320T
PRT17383 | Native American | A | 16111T, 16223T, 16290T, 16319A, 16362C
PRT22009 | Native American | A | 16083T, 16111T, 16223T, 16256T, 16290T, 16319A, 16362C
PRT02987 | Native American | B | 16092C, 16182C, 16183C, 16189C, 16217C, 16249C, 16312G, 16344T, 16519C
PRT04387 | Native American | B | 16092C, 16182C, 16183C, 16189C, 16217C, 16249C, 16312G, 16344T, 16519C
PRT04510 | Native American | C | 16086C, 16183C, 16189C, 16223T, 16278T, 16298C, 16325C, 16327T
PRT10572 | Native American | C | 16086C, 16183C, 16189C, 16223T, 16278T, 16298C, 16325C, 16327T
PRT10742 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16519C
PRT10754 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16519C
PRT12663 | Native American | C | 16086C, 16183C, 16189C, 16223T, 16278T, 16298C, 16325C, 16327T
PRT15501 | Native American | C | 16086C, 16183C, 16189C, 16223T, 16278T, 16298C, 16325C, 16327T
PRT21237 | Native American | C | 16223T, 16298C, 16325C, 16327T
PRT21343 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16519C
PRT21827 | Native American | C | 16223T, 16298C, 16325C, 16327T, 16519C
PRT19673 | Native American | D | 16223T, 16325C, 16362C
PRT22087 | Europe | H13a | CRS
PRT17472 | Africa | L1 | 16129A, 16187T, 16189C, 16223T, 16230G, 16265C, 16278T, 16286A, 16294T, 16311C, 16519C, 16527T
PRT10760 | Africa | L3 (xM,N) | 16124C, 16223T, 16300G, 16319A
PRT13354 | Africa | L3 (xM,N) | 16111T, 16124C, 16223T, 16362C
PRT13360 | Africa | L3 (xM,N) | 16223T, 16327T
PRT16206 | Africa | L3 (xM,N) | 16124C, 16223T, 16256T, 16292T
PRT10923 | Europe | N1 | 16145A, 16176A, 16223T, 16390A, 16519C
Table S2. Y chromosome haplogroup motifs, classification, and predictions
Sample | Final Ancestries | TK-Origin (guess) | Distribution | Region (guess) | HG-New Nomenclature | DYS19 | DYS385-A | DYS385-B | DYS388 | DYS389ABCD | DYS389CD | DYS390 | DYS391 | DYS392 | DYS393 | DYS426 | DYS438 | DYS457 | TAGA (DYS439+4)
COL00983 | Middle East | Middle East, most probably-Jew | NorthAfrica_MiddleEast_Europe | North Africa | E-M123 | 13 | 15 | 19 | 12 | 30 | 12 | 23 | 10 | 11 | 13 | 11 | 10 | 14 | 16
COL04248 | Native American | Native American (NAM) | American | America | Q-M3 | 13 | 13 | 16 | 12 | 29 | 12 | 21 | 10 | 15 | 14 | 12 | 12 | 14 | 15
COL04356 | Africa | Africa | West Africa | Africa | E-P1 | 16 | 17 | 17 | 12 | 32 | 13 | 21 | 10 | 11 | 14 | 11 | 11 | 15 | 16
COL04362 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 15
COL04506 | Europe | Europe | Western Europe | Europe | R-M269 | 16 | 12 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 17
COL04595 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
COL09416 | Middle East | Middle East | South Asia | South Asia | L-M61 | 14 | 16 | 17 | 12 | 30 | 14 | 23 | 10 | 14 | 12 | 11 | 10 | 16 | 15
COL09891 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 31 | 14 | 24 | 12 | 13 | 13 | 12 | 12 | 15 | 17
COL10678 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 16
COL11065 | Africa | Africa | West Africa | Africa | E-P1 | 16 | 17 | 19 | 12 | 29 | 13 | 21 | 10 | 11 | 15 | 11 | 11 | 14 | 15
COL11917 | Europe | Europe | Western Europe | Europe | R-M269 | 15 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 15
COL12392 | Middle East | Europe, Middle East | Africa_maybeEurope | Africa | E-M35 (xM78,M81,M123) | 13 | 12 | 15 | 12 | 31 | 13 | 22 | 8 | 11 | 13 | 11 | 10 | 14 | 17
COL12551 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 28 | 12 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
COL13175 | Europe | Europe, most probably Spain | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 16 | 16 | 12 | 30 | 13 | 24 | 10 | 11 | 13 | 11 | 10 | 14 | 15
COL14057 | North Africa | North Africa, Middle East | NorthAfrica_Europe | North Africa | E-M81 | 13 | 13 | 14 | 12 | 30 | 14 | 24 | 9 | 11 | 13 | 11 | 10 | 14 | 14
COL14721 | Europe | Europe | MiddleEast_Europe_NorthAfrica | Europe | G-P15 | 15 | 14 | 14 | 13 | 29 | 12 | 22 | 10 | 11 | 14 | 11 | 10 | 16 | 16
COL15717 | Middle East | Middle East | MiddleEast_NorthAfrica | North Africa | J-P58 | 14 | 14 | 18 | 16 | 30 | 13 | 24 | 10 | 11 | 12 | 11 | 10 | 14 | 15
COL16011 | North Africa | North Africa, Middle East | NorthAfrica_Europe | North Africa | E-M81 | 13 | 13 | 14 | 12 | 30 | 14 | 24 | 9 | 11 | 13 | 11 | 10 | 14 | 14
COL17450 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 10 | 14 | 12 | 31 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
COL17586 | Europe | Europe | Western Europe | Europe | R-M269 | NA | 11 | 14 | NA | 28 | 12 | 24 | 10 | 13 | 13 | 12 | 12 | 15 | 16
COL18861 | North Africa | North Africa, Middle East | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 14 | 17 | 18 | 12 | 29 | 12 | 24 | 10 | 11 | 13 | 11 | 10 | 14 | 14
COL18873 | Native American | Native American (NAM) | American | America | Q-M3 | 13 | 14 | 15 | 12 | 29 | 13 | 23 | 10 | 14 | 13 | 12 | 11 | 14 | 15
COL18876 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 28 | 12 | 24 | 11 | 13 | 12 | 12 | 12 | 15 | 14
COL24360 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 12 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 11 | 15 | 16
COL24480 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
COL25102 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 17
DOM10169 | Europe | Europe | Europe | Europe | I-M223 | 16 | 14 | 15 | 13 | 29 | 12 | 23 | 10 | 12 | 14 | 11 | 10 | 15 | 14
DOM10722 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
DOM10727 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 12 | 13 | 12 | 30 | 13 | 24 | 11 | 13 | 13 | 12 | 11 | 15 | 15
DOM10729 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 14 | 12 | 12 | 15 | 16
DOM10733 | Native American | Native American (NAM) | American | America | Q-M3 | 13 | 16 | 18 | 12 | 31 | 13 | 24 | 11 | 16 | 13 | 12 | 10 | 14 | 16
DOM10751 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 13 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 17
DOM10752 | Europe | Europe (Italy?) | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 16 | 17 | 12 | 31 | 14 | 24 | 10 | 11 | 13 | 11 | 10 | 14 | 16
DOM10766 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 13 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 15
DOM10855 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 12 | 12 | 12 | 16 | 16
DOM10857 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 23 | 11 | 13 | 13 | 12 | 12 | 15 | 16
DOM10858 | Europe | Europe | Western Europe | Europe | R-M269 | 16 | 11 | 16 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 11 | 12 | 15 | 16
DOM10865 | Europe | Europe | Western Europe | Europe | R-M269 | 12 | 12 | 14 | 12 | 29 | 13 | 24 | 10 | 13 | 13 | 12 | 12 | 15 | 15
DOM10876 | Europe | Europe | Europe | Europe | I-P37 | 16 | 11 | 12 | 13 | 28 | 13 | 22 | 11 | 11 | 13 | 12 | 10 | 15 | 17
DOM10980 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 15
DOM10984 | Europe | Europe | MiddleEast_Europe_NorthAfrica | Europe | G-P15 | 15 | 13 | 14 | 12 | 29 | 12 | 23 | 11 | 10 | 14 | 11 | 9 | 16 | 16
DOM10987 | Middle East | Middle East | NorthAfrica_MiddleEast_Europe | North Africa | E-M123 | 13 | 16 | 17 | 12 | 32 | 13 | 25 | 10 | 11 | 13 | 11 | 8 | 14 | 16
DOM10990 | Africa | Africa | West Africa | Africa | E-P1 | 17 | 18 | 19 | 12 | 30 | 13 | 21 | 10 | 12 | 15 | 11 | 11 | 14 | 17
DOM11033 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 12 | 13 | 13 | 12 | 12 | 15 | 16
DOM11034 | Europe | Europe | Western Europe | Europe | R-M269 | 15 | 11 | 14 | 12 | 31 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
DOM11038 | Europe | Europe | Western Europe | Europe | R-M269 | 15 | 12 | 13 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 16
DOM11098 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
DOM11113 | Europe | Europe | MiddleEast_Europe_NorthAfrica | Europe | G-P15 | 16 | 13 | 13 | 12 | 30 | 12 | 22 | 10 | 11 | 14 | 11 | 11 | 15 | 16
DOM15122 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 25 | 10 | 13 | 14 | 12 | 12 | 15 | 15
DOM17077 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
DOM17258 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 29 | 13 | 24 | 10 | 13 | 13 | 12 | 12 | 15 | 15
DOM18717 | Africa | Africa | West Africa | Africa | E-P1 | 17 | 17 | 17 | 12 | 29 | 13 | 21 | 11 | 11 | 15 | 11 | 11 | 13 | 16
ECU09798 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
ECU09809 | Europe | Europe | Europe | Europe | I-P37 | 16 | 12 | 12 | 13 | 28 | 13 | 23 | 10 | 11 | 13 | 11 | 10 | 15 | 15
ECU10017 | Europe | Europe | Western Europe | Europe | R-M269 | 13 | 11 | 14 | 12 | 29 | 13 | 22 | 11 | 13 | 13 | 12 | 12 | 15 | 16
ECU10717 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 10 | 14 | 13 | 12 | 12 | 14 | 16
ECU11035 | Middle East | Middle East | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 16 | 16 | 12 | 30 | 13 | 24 | 11 | 11 | 13 | 11 | 10 | 14 | 17
ECU11036 | Middle East | Middle East | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 16 | 16 | 12 | 30 | 13 | 24 | 11 | 11 | 13 | 11 | 10 | 14 | 17
ECU11419 | Middle East | Middle East, most probably-Jew | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 14 | 16 | 12 | 29 | 13 | 23 | 10 | 11 | 13 | 11 | 10 | 14 | 16
ECU12402 | Middle East | Middle East | MiddleEast_NorthAfrica | North Africa | J-P58 | 14 | 13 | 18 | 15 | 30 | 14 | 23 | 10 | 11 | 13 | 11 | 10 | 14 | 16
ECU13293 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 13 | 12 | 30 | 14 | 25 | 11 | 14 | 13 | 12 | 12 | 15 | 15
ECU13765 | Europe | Europe | Western Europe | Europe | R-M269 | 15 | 11 | 14 | 12 | 29 | 13 | 25 | 11 | 13 | 13 | 12 | 12 | 15 | 16
ECU16073 | Europe | Europe | Europe | Europe | I-P37 | 17 | 12 | 12 | 13 | 28 | 13 | 23 | 10 | 11 | 13 | 11 | 10 | 15 | 15
ECU16094 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 10 | 13 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 14 | 16
ECU17270 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
ECU17286 | Middle East | Europe, Middle East | Europe_MiddleEast_NorthAfrica | Europe | J-M410 (xM67,M322) | 13 | 13 | 16 | 15 | 29 | 13 | 23 | 10 | 9 | 12 | 11 | 9 | 15 | 15
ECU17592 | Native American | Native American (NAM) | American | America | Q-M3 | 14 | 14 | 14 | 12 | 31 | 13 | 25 | 10 | 14 | 13 | 12 | 11 | 14 | 16
ECU19481 | North Africa | North Africa, Middle East | NorthAfrica_Europe | North Africa | E-M81 | 13 | 13 | 14 | 12 | 30 | 14 | 24 | 9 | 11 | 13 | 11 | 10 | 14 | 14
ECU20655 | Europe | Europe | Europe | Europe | I-P37 | 16 | 12 | 12 | 14 | 28 | 13 | 23 | 11 | 11 | 13 | 11 | 10 | 15 | 15
ECU20753 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | NA | 30 | 14 | 24 | 10 | 14 | 13 | 12 | 12 | 14 | 15
ECU20846 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 29 | 13 | 24 | 10 | 13 | 13 | 12 | 12 | 15 | 15
ECU24695 | Native American | Native American (NAM) | American | America | Q-M346 (xM3) | 13 | 14 | 17 | 12 | 29 | 13 | 24 | 10 | 14 | 13 | 12 | 11 | 14 | 15
PRT02987 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 29 | 13 | 23 | 10 | 13 | 13 | 12 | 12 | 15 | 16
PRT04387 | Europe | Europe | NEafrica_Europe_MiddleEast | North Africa | E-M78 | 13 | 17 | 18 | 12 | 29 | 12 | 25 | 10 | 11 | 12 | 11 | 10 | 14 | 15
PRT04510 | Africa | Africa | West Africa | Africa | E-P1 | 14 | 16 | 18 | 12 | 31 | 14 | 21 | 11 | 11 | 13 | 11 | 11 | 14 | 15
PRT04522 | North Africa | North Africa, Middle East | NorthAfrica_Europe | North Africa | E-M81 | 13 | 12 | 14 | 12 | 30 | 14 | 24 | 9 | 11 | 13 | 11 | 10 | 14 | 14
PRT10430 | Europe | Europe | Western Europe | Europe | R-P25 (xP297) | 16 | 12 | 14 | 14 | 30 | 14 | 23 | 11 | 13 | 13 | 12 | 11 | 15 | 15
PRT10572 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 25 | 11 | 13 | 13 | 12 | 12 | 15 | 15
PRT10742 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 10 | 13 | 13 | 12 | 12 | 16 | 17
PRT10754 | Africa | Africa | West Africa | Africa | E-P1 | 15 | 16 | 17 | 12 | 31 | 13 | 21 | 11 | 11 | 13 | 11 | 11 | 14 | 15
PRT10760 | Middle East | Middle East | Europe_MiddleEast_NorthAfrica | Europe | J-M67 (xM92,M322) | 14 | 14 | 15 | 15 | 30 | 13 | 23 | 10 | 11 | 13 | 11 | 10 | 15 | 15
PRT10785 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 28 | 13 | 23 | 11 | 13 | 13 | 12 | 11 | 15 | 15
PRT12663 | Africa | Africa | West Africa | Africa | E-P1 | 17 | 17 | 19 | 12 | 29 | 13 | 21 | 10 | 11 | 15 | 11 | 11 | 13 | 16
PRT13354 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 10 | 13 | 13 | 12 | 12 | 15 | 16
PRT13360 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 15 | 12 | 30 | 14 | 25 | 11 | 13 | 13 | 12 | 12 | 14 | 17
PRT14630 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 31 | 14 | 25 | 10 | 13 | 13 | 12 | 12 | 14 | 16
PRT14724 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 27 | 11 | 13 | 13 | 12 | 12 | 15 | 16
PRT15501 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 30 | 14 | 24 | 11 | 13 | 13 | 12 | 9 | 14 | 16
PRT16090 | Africa | Africa | West Africa | Africa | E-P1 | 17 | 18 | 18 | 12 | 31 | 14 | 21 | 10 | 11 | 13 | 11 | 11 | 13 | 16
PRT16206 | Europe | Europe | Western Europe | Europe | R-M269 | 15 | 11 | 14 | 12 | 29 | 14 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 16
PRT17383 | Europe | Europe | Western Europe | Europe | R-M269 | 14 | 11 | 14 | 12 | 29 | 13 | 24 | 11 | 13 | 13 | 12 | 12 | 15 | 15
PRT1
7472
Europe
Europe
Europe
Europe
I-P3
716
1112
1329
1423
1111
1311
1014
16PR
T196
73Eu
rope
Europe
WesternEu
rope
Europe
R-M
269
1511
1312
3013
2411
1413
1212
1516
PRT2
1237
Europe
Europe
Europe
Europe
I-P3
716
1112
1329
1423
1111
1311
1014
16PR
T213
43Eu
rope
Europe
WesternEu
rope
Europe
R-M
269
1411
1412
2913
2711
1313
1212
1516
PRT2
1827
Europe
Europe
WesternEu
rope
Europe
R-M
269
1411
1412
2913
2410
1313
1212
1617
PRT2
2009
Europe
Europe
WesternEu
rope
Europe
R-M
269
1411
1312
2913
2411
1313
1212
1516
PRT2
2087
Africa
Africa
WestA
frica
Africa
E-P1
1616
1714
3012
2110
1114
1110
1415