phylogenetic structure of q-m378 subclade based on full y-chromosome sequencing

  • 8/11/2019 Phylogenetic Structure of Q-M378 Subclade Based On Full Y-Chromosome Sequencing


    The Russian Journal of Genetic Genealogy ( ): 5, 1, 2013 ISSN: 1920-2997 http://ru.rjgg.org RJGG


    __________________________________________________________

    Received: December 14, 2013; accepted: December 16, 2013; published: January 8, 2014. Correspondence: [email protected] [email protected]@yfull.com

    Phylogenetic Structure of Q-M378 Subclade Based On Full Y-Chromosome Sequencing

    Vladimir Gurianov¹, Leon Kull², Roman Sychev³, Vladimir Tagankin³, Vadim Urasin³

    ¹ The Q-L275 Research Project, Russia; ² Full Genomes Corporation, USA; ³ YFull research group, Russia.

    Abstract

    The Q-M378 subclade, which is downstream of haplogroup Q-L275, is marked by a wide area of distribution and a minor share in modern populations of Eurasia. The phylogenetic structure of the subclade known so far did not allow matching Y-chromosome SNPs to specific populations or reconstructing the possible directions of their past migrations.

    The conducted research enabled us to form a consistent phylogenetic structure of the Q-M378 subclade, validated by analysis of SNP and STR markers and based on full Y-chromosome sequencing data obtained with next-generation sequencers. As part of the research, new phylogenetic levels Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245) and Q-Y2200 (downstream of Q-Y2220) were defined.

    We also defined SNPs that may in future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200.

    The research also confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language speakers from Central Asia via Afghanistan and Iran to the West.

    Introduction

    The Q-M378 subclade¹, downstream of haplogroup Q-L275, is present in a number of populations in Europe, Southwest (Western) Asia² and South Asia³, and also in Central Asia all the way to Northwest China⁴.

    1 yDNA Haplogroup Q and its Subclades 2013 - http://www.isogg.org/tree/ISOGG_HapgrpQ.html. Hereinafter subclades are referenced in line with ISOGG (International Society of Genetic Genealogy) notation, specifying the single nucleotide polymorphism (SNP) typical of the respective subclade.
    2 Cinnioglu et al., Excavating Y-chromosome haplotype strata in Anatolia, 2003. Haplotypes 337-339 are positive for SNP M378 according to the predictor by Urasin (http://predictor.ydna.ru/). All samples belong to the Central Anatolian and East Anatolian regions of Turkey.
    3 Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/ (among the tested inhabitants of Pakistan, 2 out of 176, or 1.14%, were positive for SNP M378; SNP M378 was not identified among sample groups in India and East Asia).
    4 Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, first published online September 13, 2010, doi:10.1093/molbev/msq247 (among four populations of Uigurs from Xinjiang, one such person was found in each of two populations: 1 out of 71 and 1 out of 18).

    One of the peculiar features of the Q-M378 subclade is a relatively wide area of distribution (connected with migrations of ancestral populations of the Indo-European language family) combined with an extremely low percentage in almost all populations (modern ethnic groups) where it has been reported so far. The exception is the Jewish Diaspora (primarily Ashkenazi Jews), where the share of the Q-M378 subclade reaches 5.2 to 7 percent (Behar 2004⁵, Hammer 2009⁶). Therefore, Q-M378 is often associated with the Middle East. Meanwhile, a more comprehensive analysis of research data and publicly available data from commercial tests enables us to draw a conclusion about more complex and rather non-obvious correlations between carriers of this Y-chromosome mutation over the last millennium.

    5 Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM, Quintana-Murci L, Ostrer H, Skorecki K, Hammer MF. (2004). "Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations". Hum Genet 114(4): 354-365. doi:10.1007/s00439-003-1073-7. PMID 14740294
    6 Hammer MF, Behar DM, Karafet TM, et al. (November 2009). "Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood". Human Genetics 126(5): 707-717. doi:10.1007/s00439-009-0727-5. PMC2771134. PMID 19669163.

    The aim of this article is, based on available open-source data and data from the conducted research, to specify the phylogenetic structure of the Q-M378 subclade and to classify its major clusters (haplotypes grouped according to the following criteria: belonging to the lineage of a single SNP (single nucleotide polymorphism), phylogenetic similarity, and geographical distribution).

    Source data and methodology

    Data sets for comparison

    Data from the Personal Genome Project⁷ and the 1000 Genomes Project⁸ were used within the framework of the conducted research. Samples taken from these projects (Table 1) have PGP and HG prefixes respectively.

    7 http://www.personalgenomes.org/ See also: Ball, M.P., et al., A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences, 2012. 109(30): p. 11920-11927.
    8 http://www.1000genomes.org/ See also: 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56-65.

    Table 1. Information based on the data from The Personal Genome Project and 1000 Genomes Project.

    Sample code | Population | Verified origin
    HG03914 | Bengali (BEB) | Bangladesh
    HG03652 | Punjabi (PJL) | Pakistan (Lahore)
    HG03864 | Telugu (ITU) | India
    PGP130 | N/A | Northern Africa (Morocco)

    Samples HG03914, HG03652 and HG03864, which do not belong to the Q-M378 subclade, were used for comparison.

    Additionally, data from targeted Y-chromosome sequencing of five individuals tested at Full Genomes Corporation (FGC)⁹ were analyzed.

    9 https://www.fullgenomes.com/

    Table 2. Information based on test participants' data at Full Genomes Corporation.

    Sample code | Population | Verified origin
    AJ1 | Ashkenazi Jews | Eastern Europe
    AJ2 | Ashkenazi Jews | Eastern Europe
    Ar1 | Armenians | Eastern Turkey
    Ir1 | Iranians | Iran, Khuzestan province
    Kz1 | Kazakhs | Kazakhstan, kozha lineage


    Genotyping

    Data sets in BAM format (BAM/SAM Specification¹⁰) and, in the case of PGP130, in TSV¹¹ format were used for the research.

    Next-generation sequencing¹², performed by Full Genomes Corporation at the Beijing Genomics Institute using an Illumina HiSeq 2000 sequencer, is characterized by the following parameters: 50x coverage at a read length of 100 base pairs, with paired-end reads. Mapped coverage of about 23 million base pairs, out of the approximately 59 million base pairs present in a human Y chromosome, was obtained.
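    The sequencing figures above can be related by the standard Lander-Waterman approximation, mean coverage C = N * L / G. The sketch below assumes that the 50x depth refers to the roughly 23 Mbp of mapped Y sequence and that reads are placed uniformly; neither assumption is stated in the text, so the read counts are only an order-of-magnitude illustration.

    ```python
    # Back-of-envelope check of the sequencing figures quoted in the text.
    # Assumption: the 50x mean depth applies to the ~23 Mbp of mapped Y sequence,
    # and reads are placed uniformly (Lander-Waterman: C = N * L / G).

    MAPPED_BP = 23_000_000   # mapped Y-chromosome sequence (from the text)
    Y_TOTAL_BP = 59_000_000  # approximate length of the human Y chromosome (from the text)
    READ_LEN = 100           # read length in base pairs
    COVERAGE = 50            # mean depth of coverage


    def reads_needed(coverage: float, region_bp: int, read_len: int) -> float:
        """Number of reads N needed for mean coverage C over G bp: N = C * G / L."""
        return coverage * region_bp / read_len


    reads = reads_needed(COVERAGE, MAPPED_BP, READ_LEN)
    pairs = reads / 2  # paired-end sequencing: two reads per fragment
    fraction_mapped = MAPPED_BP / Y_TOTAL_BP

    print(f"reads: {reads:,.0f}, read pairs: {pairs:,.0f}, mapped fraction: {fraction_mapped:.0%}")
    ```

    Under these assumptions this gives about 11.5 million reads (5.75 million pairs) and a mapped fraction of roughly 39% of the Y chromosome.
    
    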

    Data processing and analysis

    Clusterization of Q-M378 subclade haplotypes (including haplotypes belonging to the upstream level Q-L275 and to downstream levels) was carried out by processing 222 haplotypes (67 STR markers¹³) obtained from public sources¹⁴. MURKA software¹⁵ was used to construct the phylogenetic tree.
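    The algorithm used inside MURKA is not described here, so the following is only an illustration of how STR haplotypes can be turned into a tree. It assumes a stepwise-mutation distance (sum of absolute repeat-count differences) and uses UPGMA average-linkage clustering as a stand-in technique; the four toy haplotypes are hypothetical and do not come from the 222 haplotypes analyzed.

    ```python
    from itertools import combinations

    # Hypothetical toy haplotypes: repeat counts at four STR markers (not real data).
    HAPLOTYPES = {
        "A": [13, 24, 14, 11],
        "B": [13, 24, 14, 12],
        "C": [14, 23, 15, 11],
        "D": [14, 23, 15, 12],
    }


    def str_distance(a, b):
        """Stepwise-mutation distance: sum of absolute repeat-count differences."""
        return sum(abs(x - y) for x, y in zip(a, b))


    def upgma(haplotypes):
        """Build a rooted tree (nested tuples) by UPGMA / average linkage."""
        clusters = {name: (name, 1) for name in haplotypes}  # label -> (subtree, size)
        dist = {
            frozenset((a, b)): str_distance(haplotypes[a], haplotypes[b])
            for a, b in combinations(haplotypes, 2)
        }
        while len(clusters) > 1:
            pair = min(dist, key=dist.get)  # closest pair of clusters
            a, b = sorted(pair)
            (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
            merged = f"({a},{b})"
            for other in clusters:  # size-weighted average distance to the merged cluster
                d_a = dist.pop(frozenset((a, other)))
                d_b = dist.pop(frozenset((b, other)))
                dist[frozenset((merged, other))] = (d_a * n_a + d_b * n_b) / (n_a + n_b)
            del dist[pair]
            clusters[merged] = ((tree_a, tree_b), n_a + n_b)
        tree, _size = next(iter(clusters.values()))
        return tree


    print(upgma(HAPLOTYPES))
    ```

    UPGMA assumes a roughly constant mutation rate across lineages; dedicated tools such as MURKA may use different tree-building methods.
    
    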

    Processing and analysis of full Y-chromosome sequencing data were performed using FGC software, along with software developed by the YFull research group¹⁶.

    Samples belonging to the Q-L275 subclade but lacking the M378 mutation were used as reference, along with samples of upstream and parallel subclades on a case-by-case basis. Each sample was genotyped both for SNPs discovered during the research and for SNPs included in the ISOGG list under the Q-L275 subclade and its downstream subclades.

    The criterion for discovery of a new SNP was the presence of the mutation in more than two samples, together with consistency of the new SNPs among themselves (inter se) and with previously known information on the phylogenetic structure of the respective subclade.

    10 An up-to-date version of the specification can be found at https://github.com/samtools/hts-specs
    11 TSV (Tab Separated Values) is a text format for storing and viewing tabular data.
    12 Behjati & Tarpey, What is next generation sequencing?, Arch Dis Child Educ Pract Ed 2013;98:236-238 doi:10.1136/archdischild-2013-304340 http://ep.bmj.com/content/98/6/236.full
    13 STR markers (short tandem repeats).
    14 Public projects data from the Family Tree DNA website: http://www.familytreedna.com/projects.aspx. Hereinafter haplotypes from this source are marked as follows: FTDNA kit and haplotype number.
    15 MURKA by Valery Zaporozhchenko (Research Center of Medical Genetics of the Russian Academy of Medical Sciences, Moscow, Russia). http://sourceforge.net/projects/phylomurka/
    16 http://www.yfull.com/
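    The count part of the discovery criterion can be sketched as a simple filter over variant carriers. This is a hypothetical helper, not the FGC or YFull software: it assumes each variant is represented by the set of samples carrying the derived allele, and it leaves the separate consistency check against the known tree to a later step.

    ```python
    def classify_variants(carriers, more_than=2):
        """Sort variants by how many samples carry the derived allele.

        carriers maps a variant name to the set of samples carrying it.
        Returns (candidate_snps, private, undecided):
          - candidate_snps: carried by more than `more_than` samples (the text's criterion);
          - private: carried by exactly one sample;
          - undecided: everything else, left for manual review.
        Consistency with the known tree still has to be checked separately.
        """
        candidates, private, undecided = [], [], []
        for name, samples in carriers.items():
            if len(samples) > more_than:
                candidates.append(name)
            elif len(samples) == 1:
                private.append(name)
            else:
                undecided.append(name)
        return candidates, private, undecided


    # Hypothetical variant names; sample codes are those of Table 2.
    example = {
        "var1": {"AJ1", "AJ2", "Ar1"},
        "var2": {"Ar1"},
        "var3": {"Ir1", "Kz1"},
    }
    print(classify_variants(example))
    ```
    
    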

    Results

    Clusterization of Q-M378 subclade based on SNP and STR-marker analysis

    Given that SNPs characterize the distribution of haplotypes into clusters more specifically, primary clusterization was made taking into account the known SNP data defining the sublevels of the Q-M378 subclade.

    There are three downstream subclades currently known¹⁷: Q-L245, Q-L301 and Q-L327. The L-prefixed SNPs defining these subclades were identified at the Family Tree DNA lab led by Dr. Thomas Krahn.

    The geography of Q-L245 distribution essentially repeats the geography of M378 distribution (except for Central and South Asia).

    The Q-L301 subclade is localized exclusively in Iran¹⁸. The simultaneous presence of two subclades, Q-L301 and Q-L245, among the autochthonous population of Iran and Iraq indicates a long period of residence of M378 mutation carriers among the people living in this region¹⁹,²⁰.
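    The regional frequencies cited in the footnotes rest on small samples (for example, 3 carriers out of 59 Khorasan Persians, 1 out of 11 Esfahan Persians, 2 out of 176 Pakistanis). A minimal sketch of how such point estimates can be accompanied by an uncertainty range is shown below; the Wilson score interval is our illustrative choice, not a method used in the cited studies.

    ```python
    from math import sqrt


    def carrier_frequency(k, n, z=1.96):
        """Return (p, lower, upper): the carrier share k/n and its Wilson score interval.

        z = 1.96 gives an approximate 95% interval.
        """
        p = k / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return p, max(0.0, centre - half), min(1.0, centre + half)


    # Counts taken from the footnotes.
    for label, k, n in [("Khorasan Persians", 3, 59), ("Esfahan Persians", 1, 11), ("Pakistan", 2, 176)]:
        p, lo, hi = carrier_frequency(k, n)
        print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
    ```

    With samples of this size the intervals span several percentage points, which is one reason the frequencies are best read as indicative.
    
    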

    L327 is a private SNP, represented by a single haplotype of a Portuguese man from the Azores²¹.

    Another private SNP²² is P306, found in one Indian. It was not, however, found among the tested representatives of the Q-M378 subclades (including Q-L301)²³.

    Until recently only two SNPs were acknowledged as downstream of L245²⁴: L272.1, detected in Europe (Sicily), and L315 (discovered in an East European Ashkenazi). SNP L619.2, discovered in two representatives of the Armenian Diaspora²⁵, is also located below L245. Moreover, the existence of Armenian Diaspora representatives who show no sign of this polymorphism²⁶ confirms that this SNP emerged relatively recently.

    17 Y-DNA Haplogroup Q and its Subclades 2013 - http://www.isogg.org/tree/ISOGG_HapgrpQ.html
    18 FTDNA kit 178026, M7540, M7949.
    19 Nadia Al-Zahery et al, In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq (2011). http://www.biomedcentral.com/1471-2148/11/288 This work has some data on Q haplotypes present in the Marsh Arabs (n=143) and Iraqis (n=154). Q-M378 has a frequency of 2.1% in the first case and 1.9% in the second one.
    20 Grugni et al., Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians (2012). DOI: 10.1371/journal.pone.0041252. Among those positive for SNP M378, the following ethnic groups are noteworthy: Khorasan Persians - 3 out of 59 (5.1%), Esfahan Persians - 1 out of 11 (9.1%), Lurs - 2 out of 50 (3.9%), Assyrians - 1 out of 39 (2.6%), Azerbaijani - 1 out of 63 (1.6%).
    21 FTDNA kit 13254.
    22 FTDNA kit N78873.
    23 FTDNA kit 178026, M7540, 193005, 95307 respectively.
    24 Both are private SNPs, i.e. found so far in a single carrier of such a mutation: L315 (FTDNA kit 51) and L272.1 (FTDNA kit 95307). L315 may not be stable, as it was positive in the HG02291 sample.

    Consequently, until very recently the Q-L245 subclade could not be clusterized using SNPs. Therefore, phylogenetic definitions and analysis of STR markers were used for clusterization. The low-variability marker DYF395S1²⁷ was used for clusterization (the approach was initially proposed by Q yDNA Project²⁸ administrator Rebekah A. Canada), which allowed the formation of stable clusters with clear geographical and ethnic reference.

    For example, the following clusters were highlighted using this approach.

    DYF395S1=14-17

    It includes four haplotypes: two Dagestanis (identifiers according to the cited publication²⁹: Avar Dag 511 and Kaitag Dag06 894), a Turk³⁰ and an Arab from Iraq³¹. The latter belongs to the legendary tribe of Quraysh (Adnan-Modar tribal self-definition).

    This cluster is located closer to the tree root L245 than any other and is apparently the nearest to the ancestral haplotype.

    DYF395S1=15-17

    It includes a whole group of haplotypes of people of various origins. The following subclusters can be distinguished within the cluster:

    - Central European (most ancestral lineages localized in Switzerland³², part of them linked to a Mennonite community);

    - North European (most ancestral lineages localized in the Netherlands³³);

    - Italian (including haplotypes with private SNP L272.1);

    - Armenian;

    - Southwest Asian.

    It should be noted that, by the DYF395S1=15-17 attribute, a number of haplotypes without the L245 mutation would also belong to the cluster, in particular haplotypes of the level described below as Q-Y2250, as well as haplotypes of levels Q-L327 and Q-P306. However, in line with the principle we adopted of giving SNPs priority during clusterization, we do not include them. This also implies that clusters DYF395S1=14-17 and/or 15-17 were already formed at the Q-M378 level. This hypothesis, however, can be made more specific only as the number of tested representatives of the cluster grows.

    DYF395S1=15-18

    DYF395S1=15-19

    These two clusters are represented exclusively by people of Jewish origin.

    Individual haplotypes with RecLOH (Recombinational Loss of Heterozygosity) in this part of the Y chromosome were not considered in this clusterization.

    SNPs corresponding to each of the STR-based clusters above are expected to be identified in further research.

    25 FTDNA kit E5340, 191379.
    26 FTDNA kit 173902, 178717.
    27 Vladislav Ryzhkov, Calculating time to the most recent common ancestor by separate panels of Y-STR markers, sorted by increasing mutation rate constants, The Russian Journal of Genetic Genealogy (Russian version): Vol. 3, No. 2, 2011, ISSN: 1920-2997 http://ru.rjgg.org
    28 Q yDNA Project http://www.familytreedna.com/public/yDNA_Q/
    29 Balanovsky et al, Parallel Evolution of Genes and Languages in the Caucasus Region. Molecular Biology and Evolution, 13 May 2011.
    30 FTDNA kit 303617.
    31 FTDNA kit 197506.
    32 The SCHACKE surname appeared in Germany at least as early as the 1600s and perhaps earlier. The JAGGI surname in Switzerland goes back much further. With this DNA Project we hope to learn more about our early ancestors and where our ancestors originated. Johann Christoffel SCHACKE, the paternal ancestor of most who carry the SHOCKEY surname, was born in Kirchheimbolanden, Pfalz, Germany in 1720 to Swiss parents. He arrived in Philadelphia PA in 1737. The Anglicized version of his name became John Christopher Shockey. He and his wife Barbara had nine children between 1739 and 1756, six sons and three daughters. After Barbara died John Christopher married Anna Marie COMPTON. John Christopher and Anna Marie had one son born in 1774 or 1775. This project hopes to help identify the descendants of the seven sons of John Christopher SHOCKEY as well as learn more about his Swiss ancestors and their related families from Germany and/or Switzerland. http://www.familytreedna.com/public/shockey-schacke/default.aspx
    33 Huff/Hough Surname Project - http://www.familytreedna.com/public/HOUGH/default.aspx A Dutchman named Derrick Pauluszen Hoff (1649-1730), who arrived in New Amsterdam (New York) no later than 1660, is considered to be the common ancestor of the family.
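    The DYF395S1-based clusterization described in this section can be sketched as grouping kits by their allele pair. The helper below is hypothetical and uses made-up kit ids. It also encodes an assumption not spelled out in the text: a pair with two identical values (e.g. 15-15) is treated as a possible RecLOH case and set aside, mirroring the exclusion of RecLOH haplotypes.

    ```python
    from collections import defaultdict


    def cluster_by_dyf395s1(haplotypes):
        """Group kits by their DYF395S1 allele pair, e.g. '15-17'.

        haplotypes maps a kit id to its two DYF395S1 values (order ignored).
        Assumption: identical values (e.g. 15-15) are treated as a possible
        RecLOH case and returned separately instead of being clustered.
        """
        clusters = defaultdict(list)
        excluded = []
        for kit, (first, second) in haplotypes.items():
            if first == second:
                excluded.append(kit)
                continue
            key = f"{min(first, second)}-{max(first, second)}"
            clusters[key].append(kit)
        return dict(clusters), excluded


    # Hypothetical kit ids with DYF395S1 values.
    kits = {"kit1": (14, 17), "kit2": (15, 17), "kit3": (17, 15), "kit4": (15, 15), "kit5": (15, 19)}
    print(cluster_by_dyf395s1(kits))
    ```

    As the text stresses, such STR-based groups are provisional; SNP evidence takes priority whenever it is available.
    
    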


    New phylogenetic structure of the Q-M378 subclade, upstream and parallel subclades

    As a result of processing and analysis of full Y-chromosome sequencing data, several new single nucleotide polymorphisms were discovered, and their positions on the Y chromosome (according to the human genome reference sequence hg19³⁴), as well as their phylogenetic placements on the SNP tree, were defined.

    The data on the new SNPs are summarized in Tables 3-5 along with Diagram 1, with SNP names given in Y notation³⁵ and Full Genomes Corporation notation³⁶.

    34 hg19 reference sequence, or GRCh37. See also: Human Genome Overview. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
    35 Y: SNP prefix according to YFull.
    36 FGC: SNP prefix according to Full Genomes Corporation.


    As can be seen from the above, the following previously undescribed levels were discovered below the L275 SNP level:

    1) Q-Y1150 level, downstream of Q-L275 and parallel to Q-M378. SNPs of this level were discovered in only three natives of Hindustan (HG03914, HG03652, HG03864)³⁷.

    2) Q-Y2250 level, downstream of Q-M378 and parallel to Q-L245. SNPs of this level (Table 3) were found in the Ir1 and Kz1 samples. Since Ir1 is positive for SNP L301 and Kz1 is negative for it, Q-L301 must be downstream of Q-Y2250. Private SNPs of the Kz1 sample are listed in Appendix 3; private SNPs of the Ir1 sample are listed in Appendix 7.

    3) Q-Y2220 level, downstream of Q-L245. This level combines the Jewish and Armenian clusters of Q-L245. All tested samples representing these clusters (AJ1, AJ2, Ar1) were positive for the SNPs of this level (see Table 4), whereas the PGP130 sample (Moroccan origin) was not.

    37 G.R. Magoon, R.H. Banks, C. Rottensteiner, B.E. Schrack, V.O. Tilroe, T. Robb, A.J. Grierson, Generation of high-resolution a priori Y-chromosome phylogenies using next-generation sequencing data, 2013, doi:10.1101/00802 (in preparation, preprint on bioRxiv.org).

    4) There is also a Q-Y2220 level parallel to Q-Y2200 (xQ-Y2200) that contains SNPs defining the Armenian segment of the DYF395S1=15-17 cluster. Since these SNPs were found in only one sample (Ar1), they have the status of private SNPs. Nevertheless, the following can be assumed with high probability:

    - some of these SNPs will be characteristic of a rather wide range of haplotypes of the DYF395S1=15-17 cluster;

    - the Q-L619.2 level will be downstream of Q-Y2220 (xQ-Y2200), since only some of the Armenians positive for SNP L245 belong to it; the Ar1 sample tested by us showed no sign of the L619.2 mutation.

    5) Q-Y2200 level, downstream of Q-Y2220. SNPs of this level define the Jewish cluster of Q-L245 (see Table 5). Private SNPs of samples AJ1 and AJ2 are listed in Appendices 5 and 6. In addition, both tested samples lacked the L315 mutation.
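    The placements in items 2 to 5 follow from comparing carrier sets: a SNP whose carriers form a strict subset of another SNP's carriers lies downstream of it (for example, L301 is carried by Ir1 but not Kz1, while the Y2250-level SNPs are carried by both). The sketch below shows that reasoning with a hypothetical helper; the carrier sets are read off the text, and "conflict" flags a partial overlap that cannot fit a single tree (recurrent mutation or a genotyping error).

    ```python
    def relation(a_carriers, b_carriers):
        """Describe how SNP a relates to SNP b, judged only by their carrier sets."""
        if a_carriers == b_carriers:
            return "same level"
        if a_carriers < b_carriers:    # strict subset: a is nested inside b
            return "downstream"
        if a_carriers > b_carriers:    # strict superset: a contains b
            return "upstream"
        if a_carriers.isdisjoint(b_carriers):
            return "parallel"          # separate branches
        return "conflict"              # partial overlap: incompatible with one tree


    # Carrier sets as described in the text.
    Y2250 = {"Ir1", "Kz1"}
    L301 = {"Ir1"}
    Y2220 = {"AJ1", "AJ2", "Ar1"}
    Y2200 = {"AJ1", "AJ2"}

    print(relation(L301, Y2250))   # L301 relative to Y2250
    print(relation(Y2200, Y2220))  # Y2200 relative to Y2220
    print(relation(Y2250, Y2220))  # Y2250 relative to Y2220
    ```
    
    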


    Table 3. Q-Y2250 level. New SNPs, downstream of positive SNP M378.

    Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC) or synonym

    7115834 C T Y2244 FGC4626

    6894323 C T Y2245 PR683

    3544336 C G Y2246 FGC4613

    2765038 T G Y2247 FGC4607

    4070598 G A Y2248 FGC4618

    4242831 A G Y2249 FGC4619

    4852955 G A Y2250 FGC4620

    6537988 A G Y2251 FGC4624

    6724553 C T Y2252

    8671530 A G Y2255 FGC4631

    10077457 T C Y2256 FGC4635

    15766997 A C Y2263 FGC4646

    18169503 A C Y2264 FGC4656

    18803364 C T Y2265 FGC4657

    18990293 A G Y2266 FGC4659

    22525954 AT A Y2268

    23956540 A T Y2269 FGC4675

    24452225 G C Y2270 FGC4676

    15684681 A T CTS4507

    13643442 T C FGC4638

    ___________________________

    Note: Y2268 is a deletion.
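    The table notes flag some entries as deletions or insertions. The allele columns appear to follow the VCF-style convention of including the preceding reference base, so the variant type can be read from the allele lengths. A minimal sketch under that assumption, using rows from Tables 3, 6 and 7:

    ```python
    def variant_type(ref: str, alt: str) -> str:
        """Classify a variant from its ancestral (ref) and derived (alt) alleles.

        Assumes indels are written with a shared leading base, e.g. AT -> A.
        """
        if len(ref) == len(alt) == 1:
            return "SNP"
        if len(ref) > len(alt):
            return "deletion"
        if len(ref) < len(alt):
            return "insertion"
        return "multi-nucleotide substitution"


    # Rows taken from Tables 3, 6 and 7.
    rows = [("Y2244", "C", "T"), ("Y2268", "AT", "A"), ("Y2069", "A", "AT"),
            ("Y2235", "ATC", "A"), ("Y2237", "G", "GA")]
    for name, ref, alt in rows:
        print(name, variant_type(ref, alt))
    ```
    
    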

    Table 4. Q-Y2220 level. New SNPs, downstream of positive SNP L245.

    Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)

    9408770 G T Y2220 FGC1904

    18051798 A C Y2209 FGC1917

    22017904 G T Y2202 FGC1925

    4914530 A G Y2229


    Table 5. Q-Y2200 level. New SNPs, downstream of positive SNP L245.

    Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)

    23646920 C T Y2196 FGC1934

    22953894 A G Y2197 FGC1933

    22825080 A G Y2198 FGC1932

    22588598 C T Y2200 FGC1929

    22471554 A T Y2201 FGC1928

    21277083 G A Y2203 FGC1923

    19425984 G A Y2206

    19053060 C T Y2207 FGC1919

    18207170 A G Y2208 FGC1918

    18046486 T C Y2210 FGC1916

    18043999 G A Y2211 FGC1915

    16994660 T A Y2212 FGC1914

    15834557 G A Y2213 FGC1912

    14385853 T G Y2215 FGC1911

    14353022 A C Y2216 FGC1910

    14184253 C A Y2218 FGC1909

    9892635 C T Y2219 FGC1906

    9401947 C A Y2221 FGC1903

    8662585 C A Y2224 FGC1899

    6949449 C T Y2225 FGC1897

    4606181 C T Y2231 FGC1890

    3995524 G A Y2232 FGC1888

    3148720 A G Y2233 FGC1886


    The placement of SNPs listed by ISOGG as SNPs under investigation was specified within the scope of this work: F108, F803, F815, F1082, F1126, F1169, F1213, F1337, F1349, F1528, F1537, F1594, F1734, F1780, F1836, F1839, F1858, F1875, F1974, F2023, F2145, F2230, F2313, F2343, F2440, F2628, F2657, F2777, F2851, F2877, F2894, F2934, F3084, F3121, F3193, F3207, F3389, F3621, F3680. On May 8, 2013 all of the above SNPs were classified by ISOGG as belonging to level L245 or below. Our analysis showed that this scheme needs to be modified. All of these SNPs except F1213, F1349, F1594, F1734, F1780, F1836, F1839, F2230 and F2877 belong to level Q-L275, as they are positive for samples HG03914, HG03652, HG03864, AJ1, AJ2, Ar1 and Ir1. The remaining SNPs are positive for all samples in the research that are positive for M378 and L245. Consequently, the first group is at the same level as Q-L275 and the remaining SNPs at the same level as Q-M378³⁸.
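    The placement logic above can be sketched as matching a SNP's carriers against the membership of each known level, restricted to the samples actually genotyped. The membership table is our reading of the text and is hypothetical in two respects: PGP130 is assumed to be L245-positive but Y2220-negative, and the tested set is assumed to be the seven samples named above.

    ```python
    # Hypothetical level memberships reconstructed from the text.
    CLADES = {
        "Q-L275": {"HG03914", "HG03652", "HG03864", "AJ1", "AJ2", "Ar1", "Ir1", "Kz1", "PGP130"},
        "Q-Y1150": {"HG03914", "HG03652", "HG03864"},
        "Q-M378": {"AJ1", "AJ2", "Ar1", "Ir1", "Kz1", "PGP130"},
        "Q-Y2250": {"Ir1", "Kz1"},
        "Q-L245": {"AJ1", "AJ2", "Ar1", "PGP130"},
        "Q-Y2220": {"AJ1", "AJ2", "Ar1"},
        "Q-Y2200": {"AJ1", "AJ2"},
    }

    # Assumed set of samples with genotype calls for these SNPs.
    TESTED = {"HG03914", "HG03652", "HG03864", "AJ1", "AJ2", "Ar1", "Ir1"}


    def place_snp(carriers, tested=TESTED, clades=CLADES):
        """Return every level whose tested members are exactly the SNP's tested carriers."""
        observed = carriers & tested
        return [level for level, members in clades.items() if members & tested == observed]


    print(place_snp(TESTED))                      # carried by all seven tested samples
    print(place_snp({"AJ1", "AJ2", "Ar1", "Ir1"}))  # carried by the M378-positive samples only
    print(place_snp({"AJ1", "AJ2", "Ar1"}))         # ambiguous without a PGP130 call
    ```

    The last call returns two levels: with PGP130 left out of the tested set, Q-L245 and Q-Y2220 cannot be told apart, which illustrates why reference samples from parallel branches matter.
    
    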

    In addition, a considerable number of new SNPs was discovered at the same level as L275, M378 and L245.

    For example, the following SNPs belong to level Q-L275 - Y1014-Y1022, Y1024-Y1057, Y1059-Y1069, Y1071-Y1137, Y1139, Y1142, Y1153, Y1160, Y1164, Y1166, Y1167, Y1169, Y1195, Y1220, Y1240, Y1978-Y1983, Y1985-Y1989, Y1991-Y1993, Y1995, Y1996-Y1997, Y2003, Y2005-Y2007, Y2009, Y2239, Y2243;

    to level Q-M378 - Y2012, Y2013, Y2016-Y2082, Y2084-Y2095, Y2097, Y2098, Y2113-Y2115, Y2226, Y2361 (Appendix 1, Table 6);

    to level Q-L245 - Y2116-Y2149, Y2195, Y2199, Y2204, Y2217, Y2222, Y2223, Y2235, Y2237 (Appendix 2, Table 7).

    At the moment these SNPs have no phylogenetic significance of their own, but it can be assigned to them later, after full sequencing of samples that belong to these levels but lack the SNP mutations defining the downstream levels.

    38 It should be noted that the FTDNA research team led by Dr. Thomas Krahn, with the participation of Q yDNA Project administrator Rebekah A. Canada, came to a similar conclusion earlier. The respective data can be found on the SNP tree draft version page of the Family Tree DNA website: http://ytree.ftdna.com/index.php?name=Draft&parent=31182976 No justification of these conclusions has been published, but presumably samples tested under the National Geographic Geno 2.0 project were used for the analysis.
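    The level lists above use compact range notation (Y2016-Y2082 and so on). The hypothetical helper below expands such a list into individual SNP names, accepting a range end written with or without the Y prefix, so the number of SNPs per level can be checked against the rows of Appendix Tables 6 and 7. The level strings are copied from the text.

    ```python
    import re

    M378_LEVEL = "Y2012, Y2013, Y2016-Y2082, Y2084-Y2095, Y2097, Y2098, Y2113-Y2115, Y2226, Y2361"
    L245_LEVEL = "Y2116-Y2149, Y2195, Y2199, Y2204, Y2217, Y2222, Y2223, Y2235, Y2237"


    def expand(spec):
        """Expand a comma-separated list such as 'Y2012, Y2016-Y2082' into SNP names."""
        names = []
        for item in spec.split(","):
            item = item.strip()
            match = re.fullmatch(r"Y(\d+)(?:-Y?(\d+))?", item)
            if match is None:
                raise ValueError(f"unrecognised item: {item!r}")
            start = int(match.group(1))
            end = int(match.group(2) or start)  # single SNP when there is no range end
            names.extend(f"Y{n}" for n in range(start, end + 1))
        return names


    print(len(expand(M378_LEVEL)), len(expand(L245_LEVEL)))
    ```

    Expanding the two lists gives 88 and 42 names respectively, matching the number of rows in Tables 6 and 7.
    
    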

    Summary

    The research demonstrated the high efficiency of full Y-chromosome sequencing for defining phylogenetic structure and allowed us to form a consistent phylogenetic structure of the Q-M378 subclade, confirmed by analysis of SNP and STR markers.

    As part of the research, new phylogenetic levels Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245) and Q-Y2200 (downstream of Q-Y2220) were defined. SNPs that may in future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also defined.

    The research confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language speakers from Central Asia via Afghanistan and Iran to the West. That said, the material currently at the researchers' disposal is not enough to form a complete picture of these migration processes. This task can be resolved in the near future as statistically significant data accumulate.

    Acknowledgements

    The authors wish to thank the following people, who assisted in preparing the article and conducting the research:

    Mikhail Edelstein (Russia), Askar Abdullin (Kazakhstan), Igor Bukharov (Russia), Nazaret Chitilian (Lebanon), Justin Allen Loe (United States), Gregory Magoon (United States)


    Appendix 1. Table 6. SNPs at the same level as M378.

    Position (hg19) | Ancestral value | Derived value | SNP name (Y) or synonym | SNP name (FGC)

    2806676 A G Y2012 FGC1770

    3111159 G C Y2013 FGC1758

    3815203 G C Y2016 FGC1774

    3929337 C A Y2017 FGC1988

    4234101 A G Y2018 FGC1775

    4332151 G A Y2019 FGC1776

    4634427 C A Y2020 FGC1777

    4775787 T C Y2021 FGC1779

    4778576 A G Y2022 FGC1780

    4783438 T C Y2023

    4961249 C A Y2024 FGC1781

    5011266 A G Y2025

    5266522 A G Y2026 FGC1782

    5496739 A C Y2027 FGC1783

    5687522 T A Y2028 FGC1784

    5751055 T G Y2029 FGC1785

    5872168 C T Y2226

    5963558 G A Y2030

    6085717 C A Y2031 FGC1788

    6430659 T G Y2032 FGC1789

    6617825 T C Y2033 FGC1790

    6618215 T C Y2034 FGC1791

    6746675 T C Y2035 FGC1792

    6774328 T C Y2036 FGC1793

    6986250 T C Y2037 FGC1794

    7045044 C T Y2038 FGC1795

    7071796 C G Y2039 FGC1796

    7094691 A G Y2040 FGC1797

    7159039 C G Y2041 FGC1798

    7160439 G A Y2042 FGC1799

    7339849 G T Y2043 FGC1801

    7431253 C T Y2044 FGC1803

    7437821 C G Y2045 FGC1804

    7550568 G C Y2046 FGC1805

    7652630 G A Y2047

    7778164 G A Y2048 FGC1807

    7856334 A G Y2049 FGC1808

    7952263 C T Y2050 FGC1809

    8067818 C G Y2051 FGC1810

    8681004 T C Y2052 FGC1812


    8682184 C T Y2053 FGC1813

    8821295 A G Y2054 FGC1814

    9074666 C T Y2055 FGC1815

    9170505 G T Y2056 FGC1817

    13127815 A G Y2057 FGC1818

    13928638 G C Y2058 FGC1820

    14017272 A G Y2059 FGC1825

    14193680 G A Y2060 FGC1827

    14293849 T A Y2061 FGC1830

    14435779 A G Y2062 FGC1833

    14540558 C T Y2063 FGC1834

    14674385 C T Y2064 FGC1835

    14733633 C A Y2065 FGC1836

    15498011 C A Y2066

    15521110 T C Y2067 FGC1838

    15699493 C T Y2068 FGC1841

    16217389 A AT Y2069

    16654310 C G Y2070 FGC1842

    16678163 C T Y2071 FGC1843

    17230548 G A Y2072 FGC1844

    17447489 C T Y2073 FGC1845

    17959860 A G Y2074 FGC1850

    18243302 C T Y2075 FGC1852

    18714407 C A Y2076 FGC1854

    18768735 G T Y2077

    18768736 C A Y2078

    18769454 A G Y2079 FGC1767

    18803642 T G Y2080 FGC1855

    18856911 G C Y2081 FGC1856

    19373808 A T Y2082 FGC1858

    21365952 G A Y2084 FGC1861

    21479863 G A Y2085 FGC1862

    21647670 G C Y2086 FGC1863

    21832029 C A Y2087 FGC1864

    22022365 A G Y2088 FGC1865

    22101157 C T Y2089 FGC1866

    22440644 G A Y2361

    22624047 G A Y2090 FGC1768

    22931328 T A Y2091 FGC1869

    23053626 A G Y2092 FGC1872

    23078557 G T Y2093 FGC1873

    23166596 T C Y2094 FGC1874

    23279919 G T Y2095 FGC1875

    23566714 C T Y2097 FGC1877


    23615574 AT A Y2098

    28516009 A T Y2113

    28593688 T C Y2114

    28687807 A G Y2115

    ________________________

    *Note: Y2098 is a deletion, Y2069 an insertion.

    Appendix 2. Table 7. SNPs at the same level as L245.

    Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)

    2794289 C G Y2116 FGC1987

    3127708 T C Y2117 FGC1771

    3709585 A C Y2118 FGC1773

    4502969 T C Y2119 FGC1759

    4671322 C A Y2120 FGC1778

    7219594 T C Y2121 FGC1800

    7408851 C A Y2122 FGC1802

    7590793 C T Y2123 FGC1806

    8614513 C G Y2124 FGC1811

    9144039 A T Y2223 FGC1901

    9382621 G T Y2222 FGC1902

    9798919 G A Y2125 FGC1816

    13956388 G A Y2126 FGC1821

    13982835 C T Y2127 FGC1823

    14012662 G A Y2128 FGC1824

    14045736 T C Y2129 FGC1826

    14202870 A G Y2130 FGC1828

    14285880 C G Y2131 FGC1829

    14296099 C A Y2217 FGC1831

    14402304 G A Y2132 FGC1832

    15569048 C T Y2133 FGC1839

    15614105 C G Y2134 FGC1840

    16519324 A G Y2135

    16757414 G GA Y2237

    17686482 T C Y2136 FGC1846

    17686883 A G Y2137 FGC1847

    17763793 T A Y2138 FGC1848

    17860015 G T Y2139 FGC1849

    18134822 T C Y2140 FGC1851

    18575106 G A Y2141 FGC1853

    19300050 C T Y2142 FGC1857

    21118566 T C Y2143 FGC1859

    22015887 C A Y2144 FGC1989


    22934317 ATC A Y2235

    23010582 C T Y2145 FGC1870

    23042385 C A Y2146 FGC1871

    23648959 T G Y2147 FGC1878

    23733052 A G Y2148 FGC1879

    28520821 A G Y2149

    28646637 C G Y2195 FGC1883

    22767464 G A Y2199 FGC1868

    21235857 A G Y2204 FGC1860

    ________________________

    *Note: Y2235 deletion, Y2237 insertion.

    Appendix 3. Table 8. Private SNPs for Kz1 sample.

    Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)

    2980949 T C YFS026208

    3027441 C A YFS026210 FGC4858

    3751684 G A YFS026242 FGC4859

    4164029 A G YFS026250 FGC4860

    4515848 G A YFS026257 FGC4862

    4714529 G T YFS026264 FGC4864

    5394870 T C YFS026279 FGC4865

    5398133 A T YFS026280 FGC4866

    6088200 T C YFS026301 FGC4867

    6675390 A G YFS026321 FGC4868

    7058898 G A YFS026329 FGC4869

    7208802 C T YFS026339 FGC4870

    7278041 G A YFS026340 FGC4871

    7704050 C T YFS026351 FGC4856

    7929100 A C YFS026356 FGC4872

    8268654 G A YFS026361 FGC4873

    8684090 G A YFS026366 FGC4874

    8714870 C T YFS026367 FGC4875

    9154952 G A YFS026372 FGC4876

    9990725 C G FGC4878

    13230336 G A FGC4879

    13313894 G C FGC4880

    13637299 G A FGC4881

    14599760 G A YFS026426 FGC4882

    15353330 C T YFS026439 FGC4883

    15540398 G A YFS026445 FGC4884

    15617600 G A YFS026447 FGC4885

    15656595 A C YFS026448

    15881099 G A YFS026457 FGC4886

    17344441 A G YFS026496 FGC4887

    17455705 C G YFS026499 FGC4888

    17619239 A C YFS026502 FGC4889

    18132430 T A YFS026506 FGC4890

    18205189 C A YFS026508 FGC4891

    18235952 C A YFS026509 FGC4892

    18427622 C T YFS026514 FGC4893

    18699065 G A YFS026522 FGC4894

    19119009 G A YFS026534 FGC4895

    21794826 T C YFS026585 FGC4896

    21824228 C T YFS026586 FGC4897

    22216997 C A YFS026594 FGC4898

    22263424 G T FGC4899

    22464918 G A YFS029304

    22470401 G T YFS029305 FGC4901

    22476862 T A FGC4902

    22779292 G A YFS026598 FGC4904

    22845858 T A YFS026600 FGC4905

    22980932 G A YFS026603 FGC4906

    23097922 G T YFS026606 FGC4907

    23188736 C T YFS026608 FGC4908

    23574588 G T YFS026618 FGC4909

    28577678 T G FGC4857

    28556325 T G YFS026709

    Appendix 4. Table 9. Private SNPs for Ar1 sample.

    Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)

    2837084 G A YFS030295

    4687602 C T YFS030307

    3264534 G T YFS030298

    3692600 G A YFS030300

    6849037 A G YFS030309

    7389018 T C YFS030314

    7809088 C T YFS030318 FGC2000

    8227956 C T YFS030321 FGC2001

    8310172 G A YFS030322 FGC2002

    8891034 A G YFS030324 FGC2003

    9455617 G C YFS030326 FGC2004

    9507128 G A YFS030327 FGC2005

    13207417 C T FGC2006

    13862984 G A YFS030335 FGC2007

    14037704 A G YFS030339 FGC2008

    14266100 G A YFS030343

    14271743 G T YFS030344 FGC2009

    14645998 A T YFS030350

    15487465 T C YFS030354 FGC2010

    15532493 G C YFS030355 FGC2011

    15562737 G A YFS030356 FGC2012

    15649426 C G YFS030357

    15949197 C T YFS030358 FGC2013

    16033272 G A YFS030359 FGC2014

    16914913 A T YFS030368

    17143642 G A YFS030370 FGC2015

    17264341 C T YFS030371 FGC2016

    17350212 G T YFS030372 FGC2017

    17468836 G A YFS030374 FGC2018

    17522056 C A YFS030375 FGC2019

    17547056 C T YFS030376 FGC1986

    17969724 T C YFS030377 FGC2020

    18005360 G A YFS030378 FGC2021

    18082500 T C YFS030379 FGC2022

    18143358 C T YFS030380

    18269281 T C YFS030381 FGC2023

    19295864 G A YFS030386 FGC2024

    19305808 C G YFS030387 FGC2025

    21920836 G T YFS030396 FGC2026

    22195671 T G YFS030398 FGC2027

    22546195 T C YFS030431 FGC2029

    23036871 A C YFS030432 FGC2030

    23193319 C G YFS030433 FGC2031

    23633830 T C YFS030434 FGC2032

    23749442 C G YFS030435 FGC2033

    23952561 G A YFS030438 FGC2034

    28546577 A G YFS030460 FGC2035

    28697215 C T YFS030463 FGC2036

    28728861 A G YFS030465 FGC2037

    28773229 G A YFS030466 FGC2038

    Appendix 5. Table 10. Private SNPs for AJ1 sample.

    Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)

    3014878 G C YFS028077

    3279492 T C YFS028084

    4705139 G A YFS028121

    4734829 G T YFS028122

    5007712 T C YFS028135

    6028097 T C YFS028158 FGC4835

    6671453 T A YFS028174

    6985833 G C YFS028180 FGC4836

    7116693 C G YFS028187 FGC4837

    13225084 C A FGC4839

    13227006 C T FGC4840

    14174284 C T YFS028277 FGC4841

    14683323 G A YFS028303

    15749472 C G YFS028328 FGC4842

    15911171 T A YFS028333 FGC4843

    17216758 C G YFS028365 FGC4844

    17842405 G A YFS028379 FGC4845

    18697269 A G YFS028399 FGC4846

    22541678 G A YFS028484

    22545510 G T YFS028485 FGC4850

    22809218 A T YFS028490 FGC4851

    22816094 C T YFS028491 FGC4852

    22989959 T C YFS028498 FGC4853

    23338485 T C YFS028509 FGC4854

    Appendix 6. Table 11. Private SNPs for AJ2 sample.

    Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)

    3085515 C A YFS030088 FGC1885

    4157714 C T YFS030093 FGC1889

    7357489 C T YFS030117 FGC1898

    8757232 C A YFS030130 FGC1900

    9761433 C T YFS030140 FGC1924

    16933881 C T YFS030164 FGC1913

    19228285 T C YFS030189 FGC1920

    21322098 A G YFS030210 FGC1924

    22128896 C T YFS030218 FGC1926

    22612418 A T YFS030247 FGC1930

    22720359 C T YFS030248 FGC1931

    Appendix 7. Table 12. Private SNPs for Ir1 sample.

    Position (hg19) Ancestral value Derived value SNP name (Y)

    2808294 G A YFS030486

    2848925 C T YFS030487

    3241019 G A YFS030493

    3331565 C T YFS030495

    3617298 G A YFS030498

    3905106 T C YFS030501

    3983695 G A YFS030503

    4048861 C G YFS030505

    4976524 T C YFS030521

    4976526 T C YFS030522

    5021496 G C YFS030523

    5219277 T A YFS030526

    5844571 C T YFS030529

    6531744 G A YFS030531

    7398730 T C YFS030543

    7685828 G T YFS030547

    7997281 G C YFS030548

    8350958 G A YFS030550

    8482074 C G YFS030551

    8874735 C A YFS030553

    9459692 A G YFS030555

    9832592 A G YFS030556

    14022660 C A YFS030564

    14273656 A G YFS030573

    14401614 C T YFS030575

    14532575 G T YFS030582

    14916116 G A YFS030585

    14996654 G A YFS030588

    15012864 C A YFS030589

    15240341 G C YFS030591

    15799031 G C YFS030596

    15933501 T A YFS030599

    16253494 C T YFS030602

    16280147 C T YFS030603

    16304710 T C YFS030604

    16875622 C T YFS030608

    17529042 G A YFS030616

    18106050 C T YFS030618

    18903761 A C YFS030626

    19157289 G A YFS030633

    19198307 A T YFS030634

    19526472 A C YFS030637

    21359025 C G YFS030656

    21567329 G A YFS030657

    22564450 C T YFS030684

    22621906 G T YFS030685

    22687343 A T YFS030686

    22910874 G A YFS030688

    23018638 T C YFS030689

    23054174 T G YFS030690

    23198785 A T YFS030691

    23435852 A C YFS030694

    24484883 T C YFS030706

    28759876 C T YFS030732

    17188634 T C YFS030609

    19001468 C T YFS030630

    20534862 T C YFS030645

    21599239 A G YFS030658

    21836635 A T YFS030661