RESEARCH ARTICLE
The genetic assimilation in language borrowing inferred from Jing People
Xiufeng Huang1* | Qinghui Zhou1* | Xiaoyun Bin1 | Shu Lai1 | Chaowen Lin1 |
Rong Hu2,3 | Jiashun Xiao4 | Dajun Luo4 | Yingxiang Li4 | Lan-Hai Wei5 |
Hui-Yuan Yeh6 | Gang Chen4 | Chuan-Chao Wang2,3
1College of Basic Medical Sciences, Youjiang
Medical University for Nationalities, Baise,
Guangxi 533000, China
2Department of Anthropology and
Ethnology, Xiamen University, Xiamen
361005, China
3International Medical Anthropology Team,
Xiamen University, Xiamen 361005, China
4WeGene, Shenzhen 518040, China
5Institut National des Langues et
Civilisations Orientales, Paris 75214, France
6School of Humanities, Nanyang
Technological University, Nanyang 639798,
Singapore
Correspondence
Chuan-Chao Wang, No. 422, Siming South
Road, Xiamen, Fujian, China.
Email: [email protected]
and
Xiufeng Huang, No. 98, Chengxiang Road,
Baise, Guangxi, China.
Email: [email protected]
Funding information
The Construction Project for Promoting
Technological Innovation of Guangxi
Universities for the Laboratory of Physical
Characteristics of Guangxi Minorities,
Grant Number: Gui[2015]5; Nanqiang
Outstanding Young Talents Program of
Xiamen University
Abstract
Objectives: The Jing people are a recognized ethnic group in Guangxi, southwest China, whose
ancestors immigrated from Vietnam during the 16th century. They speak Vietnamese but with many
borrowings from Cantonese, Zhuang, and Mandarin. However, it is unclear whether large-scale
gene flow from surrounding populations into the Jing people accompanied their language change,
because very little genetic information is available for this population.
Materials and Methods: We collected blood samples from 37 Jing and 3 Han Chinese individuals
from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-
wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA),
ADMIXTURE analysis, f statistics, qpWave and qpAdm to infer the population genetic structure and
admixture.
Results: Our data revealed that the Jing people are genetically similar to the populations in
southwest China and mainland Southeast Asia. However, compared with Vietnamese, they show significant
evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be
around 35.5–43.0% in different Jing groups using southern Han Chinese as a proxy. The majority of
the paternal lineages of Jing people are most likely from surrounding East Asians.
Discussion: We conclude that the formation and language change of present-day Jing people
have involved genetic assimilation of surrounding East Asian populations. The language borrowing,
in this case, is not only a cultural phenomenon but has involved demic diffusion.
KEYWORDS
gene flow, Jing people, language borrowing, population admixture
1 | INTRODUCTION
The Jing people form a relatively small population that lives mostly on
the three islands of Wutou, Wanwei, and Shanxin at the southern tip
of Guangxi, off the southwestern coast of mainland China. They are
officially recognized as one of the 56 ethnic groups in China. The
historical literature records that the ancestors of the Jing people migrated from
northern Vietnam to southwest China at the beginning of the 16th
century (Olson, 1998). They now live among the Han Chinese
and Tai-Kadai speaking Zhuang people in nearby counties and towns.
The language that the Jing people speak is similar to Vietnamese but differs from it
in many respects, not only in pronunciation and vocabulary but also in
grammar. For example, the Jing language has adopted a tonal system from
Tai-Kadai languages (Wei, 2006).
*Xiufeng Huang and Qinghui Zhou contributed equally to this work.
© 2018 Wiley Periodicals, Inc. | wileyonlinelibrary.com/journal/ajpa | Am J Phys Anthropol. 2018;166:638–648.
Received: 9 November 2017 | Revised: 10 January 2018 | Accepted: 15 February 2018
DOI: 10.1002/ajpa.23449
This special population history makes the Jing people a very good
example for investigating the relationship between language borrowing and
genetic influence from surrounding populations. Forster and Renfrew
proposed that language change in an already-populated region may
require a minimum proportion of immigrant males, as reflected in the
strong association of languages with paternal Y chromosomes
but not with maternal mitochondrial DNA (mtDNA) (Forster and
Renfrew, 2011). Therefore, it is expected that both Vietnamese and
local populations have contributed to the gene pool of modern
Jing people, especially from the perspective of the Y chromosome. A striking
feature of the Y chromosomal profile of Vietnamese is the high
frequency of haplogroup O2b-M176 (updated name O1b2-M176),
ranging from 8.3 to 14%, which is mainly found among Japanese and
Korean people (Jin et al., 2003; Kim et al., 2011). The Y chromosomal
haplogroup frequencies of Vietnamese (Kim et al., 2011) and Kinh
Vietnamese (Poznik et al., 2016; https://www.yfull.com/tree/) are
summarized in Table 1. Zhong et al. genotyped the Y chromosomal SNPs of 45
Jing male individuals and found 91% belonging to the East and Southeast
Asian specific haplogroup O-M175, 4.4% belonging to C3-M217, 2.2%
belonging to the South Asian specific lineage H1-M370, and 2.2%
belonging to Q1a1a-M120 (Zhong et al., 2011). This Y chromosomal profile
suggests that Jing people may have some South Asian admixture, but it
cannot tease apart the Vietnamese and surrounding East Asian
sources of ancestry without genotyping downstream markers
of the basal East and Southeast Asian clade O-M175.
On the maternal side, Li et al. reported that the Kinh Vietnamese have
high frequencies of mtDNA haplogroups B (B*, 4.2%; B4,
10.5%; B5, 6.3%), F (F*, 10.4%; F1a, 18.8%), M7b (M7b*, 8.3%; M7b1,
8.3%), and R (R*, 2.1%; R9b, 6.3%) (Li et al., 2007). Pischedda et al.
reported that typical Southeast Asian mtDNA lineages predominate in
the Vietnamese; for example, haplogroup M7 comprises 20%, M(xD,C)
comprises 29%, R9’F reaches 27%, and haplogroup B accounts for 25%
(Pischedda et al., 2017). Using genomic data, Pischedda et al. also proposed
that the Vietnamese derive from admixture, about 800 years ago, between
an East Asian component from south China and a southern Asian
ancestral component represented by the Malay (Pischedda et al., 2017). Although a few
genome-wide studies of Vietnamese have been conducted so far,
genomic and mtDNA data for the Jing people have not been reported before.
In this study, we report genome-wide data of the Jing people for
the first time. We analyze about 600,000 genome-wide SNPs, including
18,963 Y-chromosome and 4,448 mtDNA phylogenetically relevant SNPs,
from 40 samples collected from three Jing groups and one Han Chinese
group on the three islands of Wutou, Wanwei, and Shanxin in
southern Guangxi. We aim to explore the genetic structure and admixture of
Jing populations and to shed light on language
change from a genetic perspective.
TABLE 1 The Y chromosomal haplogroup frequencies of Vietnamese and Kinh
Vietnamese
Haplogroup       SNP            Frequency
O2*              M122           2.08%
O2a*             M324           29.17%
O2a2             P201           27.08%
C2               M217           12.50%
O1a              M119           4.17%
O1b              P31            10.42%
O1b2a1a1         47z            4.17%
D1               M15            2.08%
K*               M9             2.08%
N                M231           2.08%
Kinh
Haplogroup       SNP            Frequency
O1b1a1a1a1a      M88            26.10%
O2a2a1a2         M7             13.00%
O2a1c1a          F11            8.70%
O1b1a1a1b        M1283          6.50%
O2a2b1a1a3a      F2188          4.30%
O2a2b1a1a4a      CTS5063        2.20%
O2a2b1a1a6       CTS1642        2.20%
O1a1a1           F446 (xF140)   4.30%
O2a1             L127.1         4.30%
O2a2b1a2a1a2     F634           4.30%
Q1a1a            M120           4.30%
O2a2b2a          F871           2.20%
O2a2b1a2b        F1725          2.20%
O2a2b1a2a1a1     F48            2.20%
N1b2a            M1811          2.20%
N1a2a            M128           2.20%
C2c1b            F845           2.20%
F                Y27277         2.20%
HUANG ET AL. | 639
2 | MATERIALS AND METHODS
2.1 | Sampling and genotyping
We collected blood samples from 24 unrelated Jing and 3 Han Chinese
individuals from Wanwei, 8 Jing individuals from Wutou, and 5 Jing
individuals from Shanxin. The sampling criteria were that
the participants' families had lived there and had no recorded intermarriage
with surrounding populations for at least three generations. Our
study was approved by the Ethical Committee of Youjiang Medical Uni-
versity for Nationalities. The study was conducted in accordance with
the human and ethical research principles of Youjiang Medical Univer-
sity for Nationalities. Informed consent was obtained from all individual
participants included in the study. Genomic DNA was extracted using
DP-318 Kit (Tiangen Biotechnology, Beijing). The DNA quality control
was carried out at the experimental centre of BGI-Shenzhen. Genotyp-
ing was performed on the Affymetrix WeGene V1 Arrays covering
596,744 SNPs at the WeGene genotyping centre, Shenzhen. The
WeGene V1 arrays were designed to identify all known paternal
Y-chromosome and maternal mtDNA lineages by adding 18,963
Y-chromosome and 4,448 mtDNA phylogenetically relevant SNPs to the
Infinium Global Screening Array (GSA) (Yao et al., 2017a,b). The
dataset generated during the current study can be downloaded from the
following link when the paper is published: http://pan.xmu.edu.cn/s/f4THZOEvSGs.
2.2 | Data merging
We merged our 40 samples with previously published populations
from International HapMap Project Phase 3 (International HapMap
Consortium, 2003), Human Genome Diversity Project (HGDP) (Li et al.,
2008), Simons Genome Diversity Project (SGDP) (Mallick et al., 2016),
and Vietnamese and Thai samples of the Asian Diversity Project (ADP)
(Liu et al., 2017), Tibetan samples from Lhasa and Yunnan province
(Beall et al., 2010; Wang et al., 2011), Archaic Altai Neanderthal (Prüfer
et al., 2014) and Denisovan genomes (Meyer et al., 2012), the 40,000-
year-old Tianyuan sample (Yang et al., 2017), and ancient West Eura-
sians (Jones et al., 2015; Lazaridis et al., 2014; Mathieson et al., 2015).
We finally generated a combined dataset covering 280,950 SNPs that
were used in subsequent analysis.
2.3 | Principal component analysis
We used smartpca (version: 13050), part of the EIGENSOFT package
(Patterson, Price, & Reich, 2006) to carry out Principal Component
Analysis (PCA). We did not perform any outlier removal iterations
(numoutlieriter: 0). We set all other options to the default. We assessed
statistical significance with a Tracy-Widom test using the twstats
program of EIGENSOFT. All of the first six principal components that we
discuss and plot in what follows were highly statistically significant
(P < 10^-12).
2.3.1 | f3-statistics
We computed statistics of the form f3 (Mbuti; X, Y) using the qp3Pop
program of ADMIXTOOLS (Patterson et al., 2012; Reich, Thangaraj,
Patterson, Price, & Singh, 2009), which measure the shared genetic
drift between populations X and Y since their separation from an Afri-
can outgroup Mbuti.
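As an illustration of what the outgroup f3 statistic measures, the following minimal sketch computes the naive estimator, the average over SNPs of (o - x)(o - y), from allele frequencies. It is not the qp3Pop implementation: it omits the finite-sample correction for the outgroup, and the function name and all frequencies are hypothetical values chosen for illustration.

```python
def outgroup_f3(freq_o, freq_x, freq_y):
    """Naive f3(O; X, Y): mean over SNPs of (o - x) * (o - y).

    Larger values indicate more genetic drift shared by X and Y
    after their split from the outgroup O.
    """
    if not (len(freq_o) == len(freq_x) == len(freq_y)):
        raise ValueError("frequency lists must have the same length")
    products = [(o - x) * (o - y) for o, x, y in zip(freq_o, freq_x, freq_y)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs (not data from this study).
outgroup = [0.10, 0.80, 0.50, 0.30]
pop_x = [0.40, 0.60, 0.70, 0.30]
pop_y = [0.45, 0.55, 0.65, 0.35]  # drifts together with pop_x
pop_z = [0.15, 0.85, 0.40, 0.60]  # drifts in a different direction

f3_xy = outgroup_f3(outgroup, pop_x, pop_y)
f3_xz = outgroup_f3(outgroup, pop_x, pop_z)
print(f3_xy, f3_xz)
```

In this toy example the pair (X, Y) yields the larger statistic, which is how a heat map of pairwise outgroup f3 values is read.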
2.3.2 | f4-statistics
We computed f4-statistics of the form f4(X, Y; Test, Outgroup) using
the qpDstat program of ADMIXTOOLS (Patterson et al., 2012; Reich
et al., 2009) with default parameters to show if population Test is sym-
metrically related to X and Y or shares an excess of alleles with either
of the two, with standard errors computed with a block jackknife.
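The logic of an f4 test with a block jackknife can be sketched as follows. This is a simplified illustration, not the qpDstat implementation: it assumes equal-sized contiguous SNP blocks (ADMIXTOOLS uses a weighted jackknife over genomic blocks), and the function names and allele frequencies are hypothetical.

```python
import math


def f4(a, b, c, d):
    """Average of (a - b) * (c - d) across SNPs, given allele-frequency lists."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def f4_with_z(a, b, c, d, n_blocks):
    """Return (f4, Z) using a delete-one-block jackknife over contiguous blocks."""
    n = len(a)
    size = n // n_blocks  # assumes n is a multiple of n_blocks
    estimate = f4(a, b, c, d)
    leave_one_out = []
    for i in range(n_blocks):
        keep = [j for j in range(n) if not (i * size <= j < (i + 1) * size)]
        subsets = [[pop[j] for j in keep] for pop in (a, b, c, d)]
        leave_one_out.append(f4(*subsets))
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((t - mean) ** 2 for t in leave_one_out)
    return estimate, estimate / math.sqrt(variance)


# Hypothetical allele frequencies at six SNPs (not data from this study).
pop_x = [0.5, 0.6, 0.3, 0.8, 0.4, 0.7]
pop_y = [0.4, 0.5, 0.2, 0.7, 0.3, 0.6]
test_pop = [0.6, 0.7, 0.4, 0.9, 0.5, 0.9]
outgroup = [0.2, 0.3, 0.3, 0.5, 0.2, 0.4]

value, z = f4_with_z(pop_x, pop_y, test_pop, outgroup, n_blocks=3)
print(value, z)
```

A positive Z indicates that Test shares more alleles with X than with Y, and swapping X and Y flips the sign. A threshold of |Z| > 3 is conventionally treated as significant.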
2.3.3 | qpWave analysis
We used the qpWave program of ADMIXTOOLS (Patterson et al.,
2012; Reich et al., 2009) with default parameters to test the number of
sources of ancestry that are needed to explain the variation of Jing and
Vietnamese populations. We used Mbuti, Druze, Bedouin, Kalash,
Papuan, Sardinian, Karitiana, Onge, Ulchi, and Eskimo_Sireniki as out-
groups because those groups are unlikely to have been affected by
recent gene flow with Jing and Vietnamese and might be differentially
related to the ancestral sources of Jing.
2.3.4 | qpAdm estimation
We used the qpAdm program of ADMIXTOOLS (Patterson et al., 2012;
Reich et al., 2009) with default parameters to estimate the admixture
proportions of tested populations with the proposed sources. We used
Vietnamese and Han as two sources and the same ten populations as
outgroups as in the above qpWave analysis.
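The idea behind a two-source mixture estimate can be sketched with ordinary least squares. This is a deliberately simplified stand-in for qpAdm, which fits a generalized least-squares model using the covariance of the statistics. Here we only assume that the target's statistics against a set of outgroups are a weighted average of the two sources' statistics. The function name and all numbers are hypothetical.

```python
def two_source_mixture(target, source1, source2):
    """Least-squares alpha such that target ~ alpha * source1 + (1 - alpha) * source2."""
    numerator = sum((t - s2) * (s1 - s2) for t, s1, s2 in zip(target, source1, source2))
    denominator = sum((s1 - s2) ** 2 for s1, s2 in zip(source1, source2))
    if denominator == 0:
        raise ValueError("the outgroups do not distinguish the two sources")
    return numerator / denominator


# Hypothetical f4-type statistics against four outgroup pairs (not study data).
vietnamese_stats = [0.010, 0.004, -0.002, 0.006]
han_stats = [0.002, 0.008, 0.003, 0.001]

# Simulate a target that is 60% Vietnamese-related and 40% Han-related.
true_alpha = 0.6
target_stats = [
    true_alpha * v + (1 - true_alpha) * h
    for v, h in zip(vietnamese_stats, han_stats)
]

alpha = two_source_mixture(target_stats, vietnamese_stats, han_stats)
print(alpha, 1 - alpha)
```

The estimate is only identifiable when the outgroups are differentially related to the two sources, which is why the outgroup set above was chosen to include populations with varied relationships to East Asians.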
2.4 | ADMIXTURE analysis
We carried out model-based clustering analysis using ADMIXTURE
1.30 (Alexander, Novembre, & Lange, 2009) by combining the
present-day worldwide populations with our 40 newly genotyped individuals.
We used PLINK v1.90 (Chang et al., 2015) to thin the dataset of
280,950 autosomal SNPs by removing SNPs in strong linkage
disequilibrium, employing a window of 200 SNPs advanced by 25 SNPs and an
r2 threshold of 0.4 (with the flag: --indep-pairwise 200 25 0.4). A total
of 157,880 SNPs remained for analysis after this procedure. We ran
ADMIXTURE with default 5-fold cross-validation (--cv=5), varying the
number of ancestral populations between K=2 and K=16 in 100
bootstraps with different random seeds. We used the unsupervised
ADMIXTURE approach, in which allele frequencies for non-admixed
ancestral populations are unknown and are computed during the
analysis. We used point estimation and terminated the block relaxation
algorithm when the objective function delta was <0.0001. We chose the best
run according to the highest log likelihood. We used cross-validation to
identify an “optimal” number of clusters. We observed the lowest CV
errors for K=14.
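Selecting K by cross-validation can be sketched as below, assuming the run logs contain lines of the form "CV error (K=14): 0.52051". The helper name and the error values in the example log are hypothetical and are not results from this study.

```python
import re

# Pattern for the cross-validation summary line assumed to be in each log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def lowest_cv_error(log_text):
    """Return (best_K, {K: cv_error}) parsed from concatenated log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    best_k = min(errors, key=errors.get)
    return best_k, errors


# Hypothetical log excerpt (illustrative values only).
example_log = """
CV error (K=12): 0.52110
CV error (K=13): 0.52074
CV error (K=14): 0.52051
CV error (K=15): 0.52066
"""

best_k, cv_errors = lowest_cv_error(example_log)
print(best_k, cv_errors[best_k])
```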
2.5 | Y chromosomal and mtDNA
haplogroup assignment
We assigned the Y chromosomal and mtDNA haplogroups using in-house
tools following the International Society of Genetic Genealogy Y-DNA
Haplogroup Tree 2016, Version: 1.87, Date: March 29, 2016,
http://www.isogg.org/tree/, March 30, 2016; and mtDNA tree Build 16 (van
Oven and Kayser, 2009), http://www.phylotree.org/.
3 | RESULTS
We first carried out the model-based ADMIXTURE clustering analysis
to get a broad overview of the worldwide population genetic structure.
We used cross-validation to identify an “optimal” number of clusters
and observed the lowest CV errors for K=14 (Figure 1, Supporting
Information Figure S1). At K=14, we observed five main components
in East Eurasia. One of these components is enriched in Melanesian
and Papuan but seldom seen in any other populations. The second
component is enriched in Yakut and also present in Tungusic and
Mongolic speaking groups in northern China. The third component is
enriched in Tibetans and also prevalent in Han Chinese and
Tibeto-Burman speaking groups. The fourth component was found
at its highest proportions in the populations living in south China and
Southeast Asia, such as Tai-Kadai speaking Dai and Thai, Austroasiatic
speaking Vietnamese, and Austronesian speaking Ami and Atayal. Our
newly genotyped Jing samples are genetically similar to Dai and
Vietnamese, with high proportions of this fourth component. The
Han Chinese samples collected from Wanwei fall outside the general
clustering pattern of Han Chinese but show great similarity with the
Jing people. The fifth component is enriched in Japanese and also
present in various groups in East Asia. The Thai and Cambodian also have
some of the component that is enriched in Gujarati Indians (GIH), but
the Jing samples do not seem to have this ancestry. The outgroup
f3-statistics of the form f3 (Mbuti; X, Y) are consistent with the patterns
observed in the above ADMIXTURE plot, suggesting a close genetic
proximity between different Jing groups and other southern China and
Southeast Asian populations, especially Ami, Atayal, Dai, and
Vietnamese (Figure 2).
FIGURE 1 The ADMIXTURE analysis of newly generated Jing and Han Chinese samples from Guangxi with other worldwide populations. We here only show the East Eurasian part of the plot at K=14 with the lowest CV errors
FIGURE 2 Shared genetic drift among populations, measured by Outgroup f3 statistics (Mbuti; X, Y). Lighter colors indicate more shared drift
FIGURE 3 Top two principal components of newly generated Jing and Han Chinese samples from Guangxi with other East Asian populations. CHB: Han Chinese in Beijing, China; CHD: Chinese in metropolitan Denver, CO, United States; JPT: Japanese in Tokyo, Japan; Han-NChina: Han Chinese in northern China
We next performed PCA using East and Southeast Asian
populations. We observed the following five genetic clusters: Japanese;
Tungusic and Mongolic-speaking groups; Tibetans; Han Chinese; and
Southeast Asians including Cambodian and Thai. The Vietnamese are
plotted between the Han Chinese and Southeast Asian clusters. Our
newly reported Jing samples lie right between Vietnamese and
Han Chinese in PC1 and PC2 (Figure 3). In PC3 and PC4, most of the
Jing samples overlap with Han Chinese (Figure 4). The PCA plots
suggest that Jing have an excess affinity with Han Chinese
compared with Vietnamese.
FIGURE 4 The third and fourth principal components of newly generated Jing and Han Chinese samples from Guangxi with other East Asian populations. CHB: Han Chinese in Beijing, China; CHD: Chinese in metropolitan Denver, CO, United States; JPT: Japanese in Tokyo, Japan; Han-NChina: Han Chinese in northern China
We then used qpWave to test whether Vietnamese and Jing are
homogeneous by determining the number of sources of ancestry that
are needed to explain the variation of the Vietnamese and Jing populations.
When the Vietnamese and the three Jing populations are analyzed
together as the tested groups, a minimum of two streams of ancestry
is needed to relate them to the outgroups: P = 1.342e-22 for rank 0,
which amounts to a test for a single source of ancestry; P = 0.610 for
rank 1, which amounts to a test for two streams of ancestry. We next
dropped each outgroup in turn to determine whether this pattern is driven by a
certain outgroup with extra affinity to the tested populations. The single
source of ancestry is rejected in all the outgroup-dropping tests (Table
2). Thus, the qpWave analysis strongly suggests that Vietnamese and
Jing are not derived from a single homogeneous population.
Motivated by the PCA and qpWave analyses, we computed
f4-statistics in the form of f4 (Vietnamese, Mbuti; Jing, Test), f4 (Test,
Mbuti; Jing, Vietnamese), and f4 (Jing, Mbuti; Vietnamese, Test) to
formally determine which populations share excess alleles with the Jing people
relative to Vietnamese (Table 3). We found that Vietnamese share significantly more
alleles with Jing than with many northern East Asian groups but
seem to be equally related to Jing and southern groups, such as Dai,
She, and Atayal. Most East Asian groups share significantly more alleles
with Jing than with Vietnamese, suggesting that there might have been gene flow
from other East Asians into the Jing people after their separation from
Vietnamese about 500 years ago. This signal is not driven by excess
affinity of Vietnamese with Archaic Humans, Upper Paleolithic
East Asians, West Eurasians, South Asians, or Oceanians, since we
did not find any significant negative Z-scores in f4 (Test, Mbuti;
Jing, Vietnamese) when using Altai Neandertal, Denisovan, Tianyuan,
Loschbour, Anatolia_Neolithic, Kotias, French, Balochi, Sindhi, Burusho,
GIH, Onge, and Papuan in the position of “Test” (Table 3). The Jing
people share significantly more alleles with Han (southern Han Chinese from
HGDP), CHD, She, Dai, and Ami than with Vietnamese, which implies that
the East Asian related gene flow into the Jing people was probably from
southern Han Chinese and other southern indigenous populations.
TABLE 2 The P values of the qpWave analysis suggest Vietnamese and Jing are not derived from a single homogeneous population
                    P value
Outgroup dropped    1 stream (rank 0)   2 streams (rank 1)
No drop             1.342E-22           0.610
Mbuti               6.222E-22           0.854
Druze               7.560E-23           0.468
Bedouin             1.539E-23           0.543
Kalash              2.024E-22           0.734
Papuan              8.499E-24           0.508
Sardinian           3.060E-23           0.618
Karitiana           2.078E-22           0.485
Onge                1.665E-21           0.768
Ulchi               2.063E-15           0.480
Eskimo_Sireniki     4.625E-19           0.498
TABLE 3 The Z scores of f4 (Test, Mbuti; Jing, Vietnamese), f4 (Jing, Mbuti; Vietnamese, Test) and f4 (Vietnamese, Mbuti; Jing, Test)
                    f4 (Test, Mbuti; Jing, Vietnamese)  f4 (Jing, Mbuti; Vietnamese, Test)  f4 (Vietnamese, Mbuti; Jing, Test)
Test                Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou
CHB                 9.445       3.787       6.476       2.488       4.072       4.198       9.694       5.670       8.463
CHD                 9.128       4.059       6.478       -5.614      -3.925      -3.471      4.330       2.176       4.347
Han-NChina          8.740       4.209       6.112       8.135       7.643       8.670       12.901      9.254       12.138
Tujia               8.702       3.818       6.191       -0.712      -0.084      0.177       4.915       3.237       5.015
Han                 8.664       4.145       6.173       -6.608      -5.696      -4.634      2.431       1.145       2.923
JPT                 8.656       3.518       6.265       12.120      12.500      12.729      16.022      10.661      14.044
Mongola             8.439       3.191       5.947       21.577      20.165      21.801      24.548      18.945      22.454
Hezhen              8.357       3.204       6.178       14.031      14.072      13.969      17.953      13.931      16.259
Japanese            8.348       3.323       6.103       9.699       9.832       10.056      13.628      9.759       12.614
Xibo                8.280       2.997       5.000       18.346      18.800      19.108      22.609      16.980      20.256
Daur                8.176       3.277       5.591       16.234      15.890      16.554      19.748      15.796      18.827
Tu                  8.172       4.197       5.279       21.747      19.555      22.160      24.924      18.680      22.799
She                 8.139       3.854       5.725       -4.488      -3.734      -3.266      1.304       0.594       2.024
Tibetan_Yunnan      8.107       3.357       5.717       21.609      19.989      21.171      24.511      17.095      20.482
Oroqen              8.077       3.785       5.899       16.867      15.611      16.566      20.013      16.224      18.417
Tibetan_Lhasa       7.763       3.590       5.921       27.183      24.430      25.809      29.926      21.856      25.502
Miao                7.683       3.144       5.818       -2.168      -1.258      -1.545      2.983       1.817       3.318
(Continues)
We estimated the proportion of ancestry derived from
surrounding East Asian populations in the Jing people. We used Vietnamese
and the southern Han Chinese from HGDP as two proxies of the
possible ancestral sources. The Jing people are estimated to derive
35.5% to 43.0% of their ancestry from southern Han related populations
(Table 4). The Jing people on Shanxin island have the lowest Han related
admixture, while those in Wanwei and Wutou have higher
proportions of the Han related component. This observation is consistent with the
geographic information that Shanxin is an isolated island where only Jing
people live, whereas Wanwei and Wutou have been connected with the
mainland since the 1960s, with Jing people living together with Han
and Zhuang people. The Han Chinese samples collected from Wanwei
are estimated to have the highest Han related ancestry, at about 53.7%,
with the remaining 46.3% being Vietnamese related ancestry (Table 4).
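To read the Han related proportions together with their uncertainty, approximate 95% intervals can be derived from the estimates and block-jackknife standard errors reported in Table 4. The sketch below assumes a normal approximation (estimate ± 1.96 standard errors), which is our simplification and not part of the qpAdm output; the variable and function names are illustrative.

```python
# Han related proportion and standard error for each group, from Table 4.
table4 = {
    "Jing_Wanwei": (0.430, 0.046),
    "Jing_Shanxin": (0.355, 0.080),
    "Jing_Wutou": (0.368, 0.061),
    "Han_Wanwei": (0.537, 0.095),
}


def normal_interval(estimate, std_err, z=1.96):
    """Approximate 95% interval under a normal approximation (an assumption)."""
    return estimate - z * std_err, estimate + z * std_err


intervals = {pop: normal_interval(est, se) for pop, (est, se) in table4.items()}

for pop, (low, high) in intervals.items():
    print(f"{pop}: {low:.3f} to {high:.3f}")
```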
TABLE 3 (Continued)
                    f4 (Test, Mbuti; Jing, Vietnamese)  f4 (Jing, Mbuti; Vietnamese, Test)  f4 (Vietnamese, Mbuti; Jing, Test)
Test                Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou
Naxi                7.620       3.743       5.199       10.861      10.274      11.483      14.224      11.018      13.547
Yakut               7.567       2.488       4.611       38.188      35.768      36.803      39.817      31.699      35.645
Yi                  7.551       3.081       5.895       11.292      10.972      11.213      15.095      10.929      13.662
Dai                 7.073       3.441       6.166       -4.353      -4.064      -4.348      0.619       0.104       1.438
Atayal.SGDP         5.806       1.941       4.507       -1.591      -0.845      -1.503      0.557       0.346       0.892
Thai                4.980       2.642       4.964       44.025      40.492      40.981      29.330      17.251      22.902
Lahu                4.879       2.716       4.825       6.111       5.180       5.496       8.413       6.378       8.230
Ami.SGDP            4.875       2.709       3.754       -5.112      -4.802      -4.516      -2.981      -2.773      -2.383
Cambodian           2.936       1.185       3.072       22.964      21.075      22.058      21.380      15.741      19.675
Altai Neandertal    0.653       -0.644      -0.698      100.000     100.000     100.000     100.000     100.000     100.000
Denisovan           0.640       0.098       -0.239      100.000     100.000     100.000     100.000     100.000     100.000
Loschbour           -1.117      -1.110      1.151       43.816      41.703      42.169      44.221      42.809      44.169
Anatolia_Neolithic  -1.408      -1.648      0.858       66.542      63.916      64.900      65.829      63.464      65.420
Kotias              -1.550      -1.566      -0.438      45.420      43.674      44.484      45.729      44.132      45.884
Tianyuan            1.238       -0.096      1.244       27.729      26.605      27.772      28.567      27.468      29.021
Onge.DG             -0.315      -1.937      1.586       39.445      38.259      38.064      40.340      37.736      39.588
Papuan              1.026       -0.372      0.980       41.744      40.688      41.403      42.628      41.003      42.788
Balochi             -0.787      -1.632      0.780       71.888      71.181      70.318      70.833      67.538      69.835
Sindhi              -0.200      -1.770      1.242       71.990      70.875      70.546      70.945      66.791      69.608
Burusho             0.760       -0.707      1.616       69.201      67.451      67.383      68.683      63.496      66.557
GIH                 -0.140      -1.530      1.205       68.723      67.702      67.254      67.100      62.974      65.846
French              -0.448      -1.222      1.248       66.323      63.970      64.861      66.009      62.988      65.077
We highlight the significant positive (Z > 3) and negative (Z < -3) values.
TABLE 4 Mixture proportions estimated in qpAdm using Vietnamese as one source population and southern Han Chinese as the other
                           Proportion
Test population   P        Vietnamese   Han      Std. err
Jing_Wanwei       0.151    0.570        0.430    0.046
Jing_Shanxin      0.140    0.645        0.355    0.080
Jing_Wutou        0.307    0.632        0.368    0.061
Han_Wanwei        0.399    0.463        0.537    0.095
“P” is the P value for rank 1, “proportion” refers to the proportion of gene flow from the two sources. The “std.err” is the standard error estimated using a Block Jackknife.
TABLE 5 Y chromosomal and mtDNA haplogroup assignments
ID      Sex     Population    mtDNA       Y chromosome
Han01   Female  Han_Wanwei    C7a1        -
Han02   Female  Han_Wanwei    Y1          -
Han03   Female  Han_Wanwei    F1g         -
Jing01  Female  Jing_Shanxin  F2a         -
Jing02  Female  Jing_Shanxin  R9b2        -
Jing03  Female  Jing_Shanxin  B5a         -
Jing04  Female  Jing_Shanxin  M7b1a1b     -
Jing05  Male    Jing_Shanxin  B5b1        O1b1a1a1a1a-M88
Jing06  Male    Jing_Wanwei   M7b1a1      O1a1a1a-F140
Jing07  Female  Jing_Wanwei   M8a2b       -
Jing08  Male    Jing_Wanwei   M12a2       O2a2b1a1a6-CTS1642
Jing09  Male    Jing_Wanwei   F2b         O2a2b1a1a6-CTS1642
Jing10  Female  Jing_Wanwei   F2b         -
Jing11  Male    Jing_Wanwei   B5a1        O2a2b1a1a6-CTS1642
Jing12  Male    Jing_Wanwei   F2b         O2a2b1a1a6-CTS1642
Jing13  Female  Jing_Wanwei   N9a6        -
Jing14  Male    Jing_Wanwei   F1a1a       O2a2b1a1a6-CTS1642
Jing15  Female  Jing_Wanwei   F2b         -
Jing16  Male    Jing_Wanwei   N9a6        O1a1a1a1-F78
Jing17  Male    Jing_Wanwei   F1a         O2a2b1a1a6-CTS1642
Jing18  Female  Jing_Wanwei   M8a2b       -
Jing19  Female  Jing_Wanwei   B4a1e       -
Jing20  Female  Jing_Wanwei   M7c2        -
Jing21  Male    Jing_Wanwei   B4a1e       O1b1a1a1-M1348, M1310
Jing22  Female  Jing_Wanwei   M7c2        -
Jing23  Male    Jing_Wanwei   B4          O2a1a-Page127, F964, F3143
Jing24  Male    Jing_Wanwei   D5a2a1      O2a1a-Page127, F964, F3143
Jing25  Male    Jing_Wanwei   M7b1a1a3    D1a1-M15
Jing26  Male    Jing_Wanwei   F3a1        N1c1a-M178
Jing27  Male    Jing_Wanwei   D5b         O2a2b1a1a-F8, F42
Jing28  Male    Jing_Wanwei   M7c2        O1a1a1a1-F78
Jing29  Male    Jing_Wanwei   F2b         O1a1a1a-F140
Jing30  Female  Jing_Wutou    F1g         -
Jing31  Male    Jing_Wutou    M7b1a1      O2a2a1a2-M7
Jing32  Female  Jing_Wutou    R9c1b1      -
Jing33  Female  Jing_Wutou    M71a1a      -
Jing34  Female  Jing_Wutou    B5a1        -
Jing35  Male    Jing_Wutou    M7b1a1      O2a1c1a-F11
Jing36  Female  Jing_Wutou    B4          -
Jing37  Female  Jing_Wutou    B5a1a       -
4 | DISCUSSION
The Jing people are a relatively small population that has lived mostly on the
three islands of Wutou, Wanwei, and Shanxin in southern Guangxi
since separating from the Vietnamese and migrating to China about
500 years ago. Their language has changed considerably compared with
Vietnamese by borrowing many words, constructions, and other features
from the surrounding Cantonese, Zhuang, and Mandarin languages.
However, it has been unclear whether their language change was accompanied by gene flow
from surrounding populations because of the very limited available genetic
information. Here, our genome-wide study of the Jing
people sheds light on language change from a
genetic perspective. Our results show that the Jing people are genetically
close to Vietnamese but also show significant evidence of
additional ancestry derived from southern Han Chinese and other southern
indigenous populations. The language borrowing of the Jing people is thus not only a
cultural phenomenon but has also involved gene flow. We note that
the Han Chinese in Wanwei look genetically more like Jing than
Han, which we suspect is because there are only a few
Han individuals in the Jing villages and they have intermarried with
Jing people.
On the paternal side, one interesting question is whether the language
transition in the Jing people follows the male-induced hypothesis that
Forster and Renfrew have proposed (Forster and Renfrew, 2011). If there
was male-dominated admixture from surrounding populations into Jing,
we would expect to find lineages that are common in surrounding
populations but rare in Vietnamese. We found a high frequency of
haplogroup O2a2b1a1a-F8, F42 in Jing people, including 5.3% of
O2a2b1a1a*-F8, F42 and 31.6% of O2a2b1a1a6-CTS1642 (Table 5).
Haplogroup O2a2b1a1a-F8, F42 is suggested to be one of the
three super-grandfather lineages of present-day Chinese that experienced
star-like expansions in the Neolithic about 5.4 thousand years ago (Yan
et al., 2014). The sublineage O2a2b1a1a6-CTS1642 reaches a high
frequency in Dai people in Xishuangbanna (23.1%) but a very low
frequency in Kinh Vietnamese (2.2%) (Poznik et al., 2016). Therefore, this
sublineage is more likely to have been introduced into the Jing people from
surrounding Tai-Kadai speaking populations than to have been carried from
Vietnam. However, without large-scale high-resolution genotyping,
we cannot rule out the possibility that some populations of Vietnam
also have high frequencies of the O2a2b1a1a6-CTS1642 lineage. The
second most frequent haplogroup is O1a1a1a-F140, which together with its
sublineage O1a1a1a1-F78 accounts for 21% of the Jing male samples (Table 5).
Haplogroup O1a1a1a is a sublineage of O1a-M119, which is prevalent
along the southeast coast of China, occurring at high frequencies in
Tai-Kadai speaking people and Taiwan aborigines (Wang & Li, 2013).
Haplogroup O1a1a1a-F140 is not observed in Kinh
Vietnamese. We also found the Hmong-Mien enriched lineage O2a2a1a2-M7 (Cai
et al., 2011), the central-eastern Chinese enriched lineage
O2a1c1a-F11 (Wang et al., 2013; Yao X et al., 2017), and the Tai-Kadai frequent
lineage O1b1a1a1a1a-M88 in Jing people, all of which have also been detected
in Kinh Vietnamese (Poznik et al., 2016). The maternal mtDNA lineages
of Jing people are consistent with the general profile of southern China
and Southeast Asia, with high frequencies of haplogroups B, F, and M7
(Li et al., 2007).
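The paternal lineage frequencies quoted above can be recomputed from the 19 Jing male samples listed in Table 5. The short sketch below is only a bookkeeping aid; the function names are ours, and matching clades by name prefix is an assumption that holds for the nested haplogroup names used in this table.

```python
# Y-chromosome haplogroups of the 19 Jing males in Table 5 (first marker only).
jing_male_haplogroups = [
    "O1b1a1a1a1a-M88", "O1a1a1a-F140", "O2a2b1a1a6-CTS1642",
    "O2a2b1a1a6-CTS1642", "O2a2b1a1a6-CTS1642", "O2a2b1a1a6-CTS1642",
    "O2a2b1a1a6-CTS1642", "O1a1a1a1-F78", "O2a2b1a1a6-CTS1642",
    "O1b1a1a1-M1348", "O2a1a-Page127", "O2a1a-Page127", "D1a1-M15",
    "N1c1a-M178", "O2a2b1a1a-F8", "O1a1a1a1-F78", "O1a1a1a-F140",
    "O2a2a1a2-M7", "O2a1c1a-F11",
]


def exact_frequency(labels, haplogroup):
    """Percentage of samples assigned exactly to `haplogroup` (name before '-')."""
    hits = sum(1 for label in labels if label.split("-")[0] == haplogroup)
    return 100 * hits / len(labels)


def clade_frequency(labels, clade):
    """Percentage of samples whose haplogroup name starts with `clade`."""
    hits = sum(1 for label in labels if label.split("-")[0].startswith(clade))
    return 100 * hits / len(labels)


cts1642 = exact_frequency(jing_male_haplogroups, "O2a2b1a1a6")
f8_star = exact_frequency(jing_male_haplogroups, "O2a2b1a1a")
f140_clade = clade_frequency(jing_male_haplogroups, "O1a1a1a")

print(round(cts1642, 1), round(f8_star, 1), round(f140_clade, 1))
```

With 19 males, 6 carriers of O2a2b1a1a6-CTS1642 give 31.6%, one O2a2b1a1a* sample gives 5.3%, and the 4 samples within O1a1a1a-F140 give about 21%.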
Language change is an interesting cultural process that might
be influenced by, and reflected in, population genetic admixture, as
suggested by Forster and Renfrew, who proposed that language change in an
already-populated region may need immigrant males, as reflected in the strong
association of languages with paternal Y chromosomes (Forster
and Renfrew, 2011). Previous studies were unable to identify
and distinguish the downstream lineages in closely related populations
because of the very limited number of phylogenetically relevant SNPs.
Recent next-generation sequencing of worldwide samples has yielded a
variety of novel SNPs, which have revolutionized the Y chromosomal
tree. We took advantage of microarray SNP genotyping technology
and classified the Jing samples into very detailed and informative
lineages. We found that the majority of paternal lineages in Jing people,
especially haplogroups O2a2b1a1a6-CTS1642 and O1a1a1a-F140, were
most likely introduced from surrounding southern Han Chinese or
Tai-Kadai speaking populations rather than being a genetic legacy from
Vietnamese 500 years ago. The genetic evidence in this study supports the
male-associated language change hypothesis regarding the formation
of the present-day Jing people and their language.
The data presented in the present study is the first genome-wide
dataset of Jing people generated to date, which is not only valuable in
anthropological studies, but also in other applied fields, such as forensic
identification, paternity tests, and medical research.
ORCID
Rong Hu http://orcid.org/0000-0002-3115-784X
Chuan-Chao Wang http://orcid.org/0000-0001-9628-0307
REFERENCES
Alexander, D. H., Novembre, J., & Lange, K. (2009). Fast model-based
estimation of ancestry in unrelated individuals. Genome Research, 19,
1655–1664.
Beall, C. M., Cavalleri, G. L., Deng, L., Elston, R. C., Gao, Y., Knight, J., . . .
Zheng, Y. T. (2010). Natural selection on EPAS1 (HIF2α) associated
with low hemoglobin concentration in Tibetan highlanders. Proceedings
of the National Academy of Sciences of the United States of America, 107,
11459–11464.
Cai, X., Qin, Z., Wen, B., Xu, S., Wang, Y., Lu, Y., . . . Li, H. (2011). Human
migration through bottlenecks from Southeast Asia into East Asia
during Last Glacial Maximum revealed by Y chromosomes. PLoS One,
6, e24282.
Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee,
J. J. (2015). Second-generation PLINK: Rising to the challenge of
larger and richer datasets. GigaScience, 4, 7. Available at:
https://www.cog-genomics.org/plink2.
Forster, P., & Renfrew, C. (2011). Mother tongue and Y chromosomes.
Science, 333, 1390–1391.
International HapMap Consortium. (2003). The International HapMap
Project. Nature, 426, 789–796.
Jin, H.-J., Kwak, K.-D., Hammer, M. F., Nakahori, Y., Shinka, T., Lee, J.-W., . . .
Kim, W. (2003). Y-chromosomal DNA haplogroups and their implications
for the dual origins of the Koreans. Human Genetics, 114, 27–35.
Jones, E. R., Gonzalez-Fortes, G., Connell, S., Siska, V., Eriksson, A., Marti-
niano, R., . . . Bradley, D. G. (2015). Upper Palaeolithic genomes reveal
deep roots of modern Eurasians. Nature Communications, 6, 8912.
Kim, S.-H., Kim, K.-C., Shin, D.-J., Jin, H.-J., Kwak, K.-D., Han, M.-S., . . .
Kim, W. (2011). High frequencies of Y-chromosome haplogroup O2b-
SRY465 lineages in Korea: A genetic perspective on the peopling of
Korea. Investigative Genetics, 2, 10.
Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., Kirsanow, K.,
. . . Krause, J. (2014). Ancient human genomes suggest three ancestral
populations for present-day Europeans. Nature, 513, 409–413.
Li, H., Cai, X., Winograd-Cort, E. R., Wen, B., Cheng, X., Qin, Z., . . . Jin, L.
(2007). Mitochondrial DNA diversity and population differentiation in
southern East Asia. American Journal of Physical Anthropology, 134,
481–488.
Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachan-
dran, S., . . . Myers, R. M. (2008). Worldwide human relationships inferred
from genome-wide patterns of variation. Science, 319, 1100–1104.
Liu, X., Lu, D., Saw, W.-Y., Shaw, P. J., Wangkumhang, P., Ngamphiw, C.,
. . . Teo, Y.-Y. (2017). Characterising private and shared signatures of
positive selection in 37 Asian populations. European Journal of Human
Genetics, 25, 499–508.
Mallick, S., Li, H., Lipson, M., Mathieson, I., Gymrek, M., Racimo, F., . . .
Reich, D. (2016). The Simons Genome Diversity Project: 300
genomes from 142 diverse populations. Nature, 538, 201–206.
Mathieson, I., Lazaridis, I., Rohland, N., Mallick, S., Patterson, N., Rooden-
berg, S. A., . . . Reich, D. (2015). Genome-wide patterns of selection
in 230 ancient Eurasians. Nature, 528, 499–503.
Meyer, M., Kircher, M., Gansauge, M. T., Li, H., Racimo, F., Mallick, S., . . .
Sudmant, P. H. (2012). A high-coverage genome sequence from an
archaic Denisovan individual. Science, 338, 222–226.
Olson, J. S. (1998). An ethnohistorical dictionary of China. Westport:
Greenwood Press.
Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y., . . . Reich,
D. (2012). Ancient admixture in human history. Genetics, 192, 1065–1093.
Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and
eigenanalysis. PLoS Genetics, 2, e190.
Pischedda, S., Barral-Arca, R., Gómez-Carballa, A., Pardo-Seco, J., Catelli,
M. L., Álvarez-Iglesias, V., . . . Salas, A. (2017). Phylogeographic and
genome-wide investigations of Vietnam ethnic groups reveal signa-
tures of complex historical demographic movements. Scientific
Reports, 7, 12630.
Poznik, G. D., Xue, Y., Mendez, F. L., Willems, T. F., Massaia, A., Wilson
Sayres, M. A., . . . Tyler-Smith, C. (2016). Punctuated bursts in human
male demography inferred from 1,244 worldwide Y-chromosome
sequences. Nature Genetics, 48, 593–599.
Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S.,
. . . Pääbo, S. (2014). The complete genome sequence of a Neander-
thal from the Altai Mountains. Nature, 505, 43–49.
Reich, D., Thangaraj, K., Patterson, N., Price, A. L., & Singh, L. (2009).
Reconstructing Indian population history. Nature, 461, 489–494.
van Oven, M., & Kayser, M. (2009). Updated comprehensive phyloge-
netic tree of global human mitochondrial DNA variation. Human
Mutation, 30, E386–E394.
Wang, B., Zhang, Y. B., Zhang, F., Lin, H., Wang, X., Wan, N., . . . Yu, J.
(2011). On the origin of Tibetans and their genetic basis in adapting
high-altitude environments. PLoS One, 6, e17002.
Wang, C. C., & Li, H. (2013). Inferring human history in East Asia from Y
chromosomes. Investigative Genetics, 4, 11.
Wang, C.-C., Yan, S., Qin, Z.-D., Lu, Y., Ding, Q.-L., Wei, L.-H., . . . Li, H.
(2013). Late Neolithic expansion of ancient Chinese revealed by Y
chromosome haplogroup O3a1c-002611. Journal of Systematics and
Evolution, 51, 280–286.
Wei, S. G. (2006). The variations of Chinese Jing Dialect. Journal of
Guangxi University for Nationalities, 28, 13–18.
Yan, S., Wang, C.-C., Zheng, H.-X., Wang, W., Qin, Z.-D., Wei, L.-H., . . .
Jin, L. (2014). Y chromosomes of 40% Chinese descend from three
Neolithic super-grandfathers. PLoS One, 9, e105691.
Yang, M. A., Gao, X., Theunert, C., Tong, H., Aximu-Petri, A., Nickel, B.,
. . . Fu, Q. (2017). 40,000-Year-old individual from Asia provides
insight into early population structure in Eurasia. Current Biology, 27,
3202–3208.e9.
Yao, H.-B., Tang, S., Yao, X., Yeh, H.-Y., Zhang, W., Xie, Z., . . . Wang, C.-
C. (2017a). The genetic admixture in Tibetan-Yi Corridor. American
Journal of Physical Anthropology, 164, 522–532.
Yao, X., Tang, S., Bian, B., Wu, X., Chen, G., & Wang, C. C. (2017b).
Improved phylogenetic resolution for Y-chromosome Haplogroup
O2a1c-002611. Scientific Reports, 7, 1146.
Zhong, H., Shi, H., Qi, X.-B., Duan, Z.-Y., Tan, P.-P., Jin, L., . . . Ma, R. Z.
(2011). Extended Y chromosome investigation suggests postglacial
migrations of modern humans into East Asia via the northern route.
Molecular Biology and Evolution, 28, 717–727.
SUPPORTING INFORMATION
Additional Supporting Information may be found online in the sup-
porting information tab for this article.
How to cite this article: Huang X, Zhou Q, Bin X, et al. The
genetic assimilation in language borrowing inferred from Jing
People. Am J Phys Anthropol. 2018;166:638–648.
https://doi.org/10.1002/ajpa.23449