Origins of the Indo-Europeans: Genetic Evidence

Author(s): Robert R. Sokal, Neal L. Oden and Barbara A. Thomson
Source: Proceedings of the National Academy of Sciences of the United States of America, Vol. 89, No. 16 (Aug. 15, 1992), pp. 7669-7673
Published by: National Academy of Sciences
Stable URL: http://www.jstor.org/stable/2360152


Proc. Natl. Acad. Sci. USA Vol. 89, pp. 7669-7673, August 1992 Population Biology

Origins of the Indo-Europeans: Genetic evidence (gene frequencies/Europe)

ROBERT R. SOKAL*†, NEAL L. ODEN‡, AND BARBARA A. THOMSON*

*Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245; and ‡Department of Preventive Medicine, Division of Epidemiology, Health Sciences Center, State University of New York, Stony Brook, NY 11794-8036

Contributed by Robert R. Sokal, May 22, 1992

ABSTRACT Two theories of the origins of the Indo-Europeans currently compete. M. Gimbutas believes that early Indo-Europeans entered southeastern Europe from the Pontic Steppes starting ca. 4500 B.C. and spread from there. C. Renfrew equates early Indo-Europeans with early farmers who entered southeastern Europe from Asia Minor ca. 7000 B.C. and spread through the continent. We tested genetic distance matrices for each of 25 systems in numerous Indo-European-speaking samples from Europe. To match each of these matrices, we created other distance matrices representing geography, language, time since origin of agriculture, Gimbutas' model, and Renfrew's model. The correlation between genetics and language is significant. Geography, when held constant, produces a markedly lower, yet still highly significant partial correlation between genetics and language, showing that more remains to be explained. However, none of the remaining three distances (time since origin of agriculture, Gimbutas' model, or Renfrew's model) reduces the partial correlation further. Thus, neither of the two theories appears able to explain the origin of the Indo-Europeans as gauged by the genetics-language correlation.

Almost all Europeans speak Indo-European (IE) languages, the only exceptions being Finns, Estonians, Hungarians, Turks, Basques, and Maltese. Where did IEs come from and how did they spread to most areas of Europe? Two theories of IE origins, derived from archaeological and linguistic evidence, currently predominate. The majority view is that of Marija Gimbutas (1-3) of the University of California, Los Angeles. She believes that early IEs entered southeastern Europe in three Kurgan culture waves from the Pontic Steppes starting ca. 4500 B.C. and spread from there. This view was challenged in 1987 by Colin Renfrew of Cambridge University (4). He equates early IEs with early farmers who entered southeastern Europe from Asia Minor ca. 7000 B.C. and spread through the continent by demic diffusion as proposed by Ammerman and Cavalli-Sforza (5). Genetic evidence from modern populations supports this model (6-8), justifying a subsequent test of Renfrew's theory. However, because Renfrew links his hypothesis with the origin of agriculture by demic diffusion, it becomes difficult to test the two hypotheses separately.

Here we examine whether the genetic evidence available from modern European populations favors one of the two hypotheses on IE origins. Our approach is to examine correlations between genetic and linguistic distances in Europe and to estimate the effects of various factors (geography, origin of agriculture) and hypothesized movements (Gimbutas' and Renfrew's models) on the magnitude of these correlations.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

We studied 25 genetic systems (erythrocyte antigens, plasma proteins, enzymes, histocompatibility alleles, immunoglobulins; Table 1) from 2111 IE-speaking samples in Europe. Details are specified elsewhere (10-14). We computed Prevosti's genetic distances (15, 16) (GEN) separately for the 27 to 479 localities (mean = 84) of each genetic system.
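As an illustration of the GEN matrices, the following minimal sketch computes Prevosti's distance in its usual single-system form, half the sum of absolute allele-frequency differences between two samples. The function names and the allele frequencies are hypothetical and are not taken from the paper's data; the paper's own computational details are given in refs. 10-16.

```python
def prevosti_distance(freqs_a, freqs_b):
    """Prevosti distance for one genetic system: half the sum of absolute
    allele-frequency differences between two samples."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both samples need the same alleles in the same order")
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def distance_matrix(samples):
    """Symmetric matrix of pairwise Prevosti distances between localities."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = prevosti_distance(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical three-allele frequencies at three localities (illustration only).
localities = [
    [0.28, 0.06, 0.66],
    [0.25, 0.10, 0.65],
    [0.20, 0.15, 0.65],
]
gen = distance_matrix(localities)
```

The distance ranges from 0 (identical frequencies) to 1 (no alleles shared), so matrices from different systems are on a common scale.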

Linguistic distances (LAN) were subjective estimates furnished by M. Ruhlen, based on his current classification of IE languages (17). A dendrogram (Fig. 1) resulting from UPGMA clustering (18) of the linguistic distance matrix shows the relations between the IE languages in that matrix. We computed great-circle geographic distances (GEO) between pairs of localities. The origin-of-agriculture distances (OOA) between any pair of points were described earlier (8). They sum distances from their respective starting times of agriculture back to their putative common agricultural origins.
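The great-circle distances (GEO) can be sketched with the haversine formula. This is one standard way of computing them, not necessarily the one used by the authors; the Earth radius constant and the function name are assumptions of the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the equator, used here only as a sanity check.
quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

Applying the function to every pair of localities in a genetic system yields a GEO matrix of the same dimension as that system's GEN matrix.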

The Renfrew hypothesis distance (REN) matrix was based on ref. 4 and discussions with Professor Renfrew. In his view, most of the introduction and subsequent diversification of the IE language families in Europe was concurrent with the spread of agriculture in the continent. Nevertheless, Renfrew explains the final branching into the major language families by a series of 10 so-called transitions illustrated in ref. 4 (figure 7.7). These transitions are associated with specific archaeological assemblages whose starting dates were entered on a map we smoothed by interpolation. Superimposed on this map (Fig. 2A) is a directed graph summarizing the directions and branching patterns of Renfrew's transitions. The REN between any pair of localities is their distance in time along the directed graph. If two localities are located in regions connected by different branches of the graph, the REN is computed by summing the time-distances along each branch to the point of their common origin. Suggestions by Professor Renfrew (personal communication) that some of these transitions might be wholly or partly acculturation rather than demic diffusion were tested by a sensitivity test aiming to maximize average GEN,LAN correlations. No genetic evidence for acculturation was found and the REN values were retained as described above.
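The rule for REN (time-distance along the directed graph, summed over both branches back to the point of common origin) can be sketched on a toy tree. The node names, the parent links, and the dates assigned to them are hypothetical; they do not reproduce Renfrew's 10 transitions or the interpolated map of Fig. 2A.

```python
def path_to_root(node, parent):
    """List of nodes from `node` back to the root of the directed graph."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def common_origin(a, b, parent):
    """Most recent node shared by the paths from a and b back to the root."""
    ancestors_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


def ren_distance(a, b, parent, date_bp):
    """Sum of the time-distances from a and b back to their common origin."""
    origin = common_origin(a, b, parent)
    return (date_bp[origin] - date_bp[a]) + (date_bp[origin] - date_bp[b])


# Hypothetical toy graph: each node points to the node it branched from.
parent = {"root": None, "se": "root", "central": "se",
          "west": "central", "east": "se"}
# Hypothetical starting dates in years before present (yr BP).
date_bp = {"root": 9000, "se": 8500, "central": 7250,
           "west": 6500, "east": 5900}

d_branches = ren_distance("west", "east", parent, date_bp)    # different branches
d_same = ren_distance("west", "central", parent, date_bp)     # same branch
```

In this toy graph, "west" and "east" join at "se", so their distance is (8500 - 6500) + (8500 - 5900) years.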

Abbreviation: IE, Indo-European.
†To whom reprint requests should be addressed.

Table 1. Matrix correlations between genetic and linguistic distances, and partial matrix correlations involving these distances and geographic, origin-of-agriculture, Renfrew, and Gimbutas distances

| System | N | GEN,LAN | .GEO | .GEO,OOA | .GEO,OOA,REN | .GEO,OOA,GIM |
|---|---|---|---|---|---|---|
| 1-2 ABO | 133 | 0.144*** | 0.001 | 0.067 | 0.067 | 0.027 |
| 2-5 MN | 179 | 0.024 | 0.039 | 0.040 | 0.057* | 0.106* |
| 2-7 MN | 51 | -0.002 | -0.142 | -0.065 | -0.050 | 0.016 |
| 3-1 P | 79 | -0.042 | -0.077 | -0.045 | -0.046 | -0.021 |
| 4-1 RHESUS | 479 | 0.054*** | 0.006 | -0.001 | 0.004 | 0.003 |
| 4-13 RHESUS | 74 | 0.114* | 0.057 | 0.086* | 0.087 | 0.065 |
| 4-19 RHESUS | 69 | 0.179*** | 0.164** | 0.173** | 0.178*** | 0.192*** |
| 5-1 LUTH | 27 | -0.029 | -0.015 | 0.010 | 0.012 | 0.019 |
| 6-1 KELL | 103 | 0.093 | 0.076 | 0.048 | 0.032 | 0.075 |
| 6-3 KELL | 30 | 0.027 | -0.054 | 0.025 | 0.027 | 0.061 |
| 7-1 ABHSE | 49 | 0.017 | -0.190 | -0.174 | -0.180 | -0.217 |
| 8-1 DUFFY | 81 | 0.109* | 0.073 | 0.084 | 0.080 | 0.110** |
| 36-1 HP | 147 | 0.133*** | 0.055* | 0.065* | 0.061* | 0.038 |
| 37-1 TF | 33 | 0.117* | 0.072 | 0.063 | 0.060 | 0.077 |
| 38-1 GC | 85 | 0.003 | 0.007 | -0.005 | -0.009 | 0.057* |
| 50-1-1 AP | 61 | 0.317*** | 0.249*** | 0.207* | 0.195* | 0.177* |
| 52 PGD | 39 | 0.122 | 0.043 | 0.045 | 0.049 | 0.005 |
| 53 PGM1 | 63 | 0.236*** | 0.077 | 0.000 | -0.013 | 0.048 |
| 56 AK | 58 | 0.005 | -0.119 | -0.062 | -0.073 | 0.025 |
| 63 ADA | 41 | 0.203*** | 0.059 | 0.058 | 0.031 | -0.010 |
| 65 TASTER | 52 | 0.359*** | 0.273*** | 0.291*** | 0.294*** | 0.219*** |
| 100 HLA-A | 60 | 0.408*** | 0.238** | 0.181*** | 0.177* | 0.184*** |
| 101/2 HLA-B | 60 | 0.455*** | 0.280*** | 0.216*** | 0.211*** | 0.231*** |
| 200 GM | 30 | 0.231*** | 0.080 | 0.052 | 0.075 | -0.011 |
| 201 KM | 28 | 0.246* | 0.213* | 0.215* | 0.184* | 0.196* |
| Average | | 0.141 | 0.059 | 0.063 | 0.060 | 0.067 |

Numbers preceding the system symbols, up to 65, are those assigned by Mourant et al. (9); those from 100 and above were assigned in our laboratory. N, number of localities sampled; GEN, LAN, GEO, OOA, REN, and GIM stand for genetic, linguistic, geographic, origin-of-agriculture, Renfrew, and Gimbutas distances, respectively; the pairwise correlation is indicated as GEN,LAN; in the interest of brevity, partial correlations, which all are of GEN against LAN with various other distances held constant, are indicated by a period followed by the constant variables. Thus, .GEO,OOA stands for $r_{GEN,LAN.GEO,OOA}$. The correlations are followed by significance symbols based on 249 permutations of rows and columns of one of the two distance or residual matrices. Significances are indicated as follows: *, 0.01 < P ≤ 0.05; **, 0.004 < P ≤ 0.01; ***, P = 0.004. The last probability is conservative, since it is the lowest we can demonstrate with 249 permutations. Had we carried out more permutations, we probably could have shown that many of the correlations marked by three asterisks are significant at P << 0.004. The significance for the average correlation is evaluated by Fisher's test for combining probabilities. In all cases P < 0.0001.

The Gimbutas distances (GIM) are based on a map (Fig. 2B) redrawn from one provided by Professor Gimbutas. It shows the regions reached by Kurganization waves 1 and/or 2 and 3. Distances between any pair of localities i and j are computed as

$$D_{ij} = q^{k_i} + q^{k_j} - q^{(k_i + k_j)}.$$

In this formula $q$ is the proportion of nonreplacement of resident genes by Kurgan genes, and $k_i$ and $k_j$ are the numbers of Kurganization waves received by localities i and j, respectively. Two localities are assigned $D_{ij} = 1$ ($k_i = k_j = 0$) in the un-Kurganized region (N in Fig. 2B), and $k_i = k_j = 100$ in the original Kurgan area (O in Fig. 2B). We do not, of course, know the value of $q$. We have been unable to learn from Gimbutas (refs. 1-3) how much actual population movement (versus cultural diffusion) is implied by her model. To resolve this dilemma, we iteratively solved for the maximum GEN,GIM correlation over all genetic systems, obtaining a value for $q$ of 0.54. We used this estimate for computing our GIM values.
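The GIM formula can be checked numerically with a few lines of code. The sketch below uses the estimate q = 0.54 reported in the text; the function name and the example wave counts are illustrative only.

```python
Q_ESTIMATE = 0.54  # value of q reported in the text


def gim_distance(k_i, k_j, q=Q_ESTIMATE):
    """D_ij = q**k_i + q**k_j - q**(k_i + k_j) for wave counts k_i and k_j."""
    return q ** k_i + q ** k_j - q ** (k_i + k_j)


unkurganized_pair = gim_distance(0, 0)      # both localities in region N
homeland_pair = gim_distance(100, 100)      # both localities in region O
one_wave_pair = gim_distance(1, 1)          # both localities received one wave
```

With k_i = k_j = 0 the expression reduces to 1 + 1 - 1 = 1, in agreement with the value assigned to the un-Kurganized region, and with k = 100 it is effectively zero. Because q**(k_i + k_j) is the product of q**k_i and q**k_j, any pair in which one locality received no wave also has a distance of 1.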

The six distance matrices were assembled for each genetic system. The five other distance matrices had to match the dimension and composition of the genetic distance matrix of each system. We applied Mantel's method (19, 20) to test the association between all pairs of distance matrices and Smouse-Long-Sokal tests (21) to yield partial matrix correlations. We evaluated significance by 249 permutations. The Smouse-Long-Sokal test extends Mantel's test to three or more matrices and tests whether an association between matrix A and B is significant when one or more matrices C, D, . . . are held constant. In this way we tested whether any correlation remained between GEN and LAN, once the correlation between these two variables due to one or more regressor variables was eliminated.
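A minimal sketch of the two tests follows, assuming a one-tailed permutation test on the product-moment correlation of the off-diagonal entries. The partial test is shown for a single matrix held constant, using residuals from simple linear regression; holding several matrices constant would need multiple regression. The function names and the toy coordinates are hypothetical, and the toy matrices are constructed so that both correlations equal 1.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permuted(matrix, order):
    """Apply the same permutation to the rows and columns of a matrix."""
    return [[matrix[i][j] for j in order] for i in order]


def residual_matrix(y, x):
    """Residuals of the entries of y after linear regression on those of x."""
    yv, xv = upper_triangle(y), upper_triangle(x)
    mx, my = sum(xv) / len(xv), sum(yv) / len(yv)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xv, yv))
             / sum((a - mx) ** 2 for a in xv))
    n = len(y)
    res = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = y[i][j] - (my + slope * (x[i][j] - mx))
            res[i][j] = res[j][i] = r
    return res


def mantel(a, b, permutations=249, seed=0):
    """Matrix correlation and its one-tailed permutation probability."""
    rng = random.Random(seed)
    fixed = upper_triangle(b)
    observed = pearson(upper_triangle(a), fixed)
    order = list(range(len(a)))
    as_extreme = 1  # the observed arrangement counts as one outcome
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(upper_triangle(permuted(a, order)), fixed) >= observed:
            as_extreme += 1
    return observed, as_extreme / (permutations + 1)


def partial_mantel(a, b, c, permutations=249, seed=0):
    """Correlation of a with b, with c held constant, via residual matrices."""
    return mantel(residual_matrix(a, c), residual_matrix(b, c),
                  permutations, seed)


# Hypothetical coordinates for six localities (not data from the paper).
xs = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
ys = [2.0, 0.0, 5.0, 1.0, 4.0, 3.0]
n = len(xs)
geo = [[abs(xs[i] - xs[j]) for j in range(n)] for i in range(n)]
other = [[abs(ys[i] - ys[j]) for j in range(n)] for i in range(n)]
gen = [[geo[i][j] + other[i][j] for j in range(n)] for i in range(n)]
lan = [[2.0 * gen[i][j] for j in range(n)] for i in range(n)]

r, p = mantel(gen, lan)
r_partial, p_partial = partial_mantel(gen, lan, geo)
```

With 249 permutations plus the observed arrangement, the smallest probability this procedure can report is 1/250 = 0.004, which is the value attached to the three-asterisk entries of Table 1.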

RESULTS

Among pairwise correlations, the average correlations involving GEN or LAN with other variables are low, except for LAN with GEO (0.480) and LAN with OOA (0.594). Only one correlation (LAN,GIM = -0.035) is nonsignificant by Fisher's test for combining probabilities (22). All three design matrices, OOA, REN, and GIM, are moderately related to GEO, ranging from 0.462 to 0.578. REN and GIM are also moderately related to OOA (0.342 and 0.222, respectively), but REN is only slightly correlated with GIM (0.231).
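Fisher's test for combining probabilities can be sketched as follows. The statistic -2 Σ ln P is referred to a chi-square distribution with 2k degrees of freedom for k independent tests; for an even number of degrees of freedom the upper-tail probability has a closed form, which is used here so that no statistical library is needed. The function name and the input probabilities are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: returns (-2 * sum(ln p), combined probability)."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)**i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-system probabilities (not values from the paper).
statistic, combined = fisher_combined_p([0.004, 0.20, 0.03, 0.50, 0.012])
```

A combined probability obtained in this way is what supports statements about the significance of an average correlation across the 25 genetic systems.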

In Table 1, we show only correlations involving the zero-order pair GEN,LAN. Most pairwise correlations GEN,LAN are positive, moderately high, and strongly significant. Because both genetics and language are spatially autocorrelated, we next calculated their partial correlations by holding GEO constant. The average correlation for GEN,LAN of 0.141 drops to 0.059, with 7 systems retaining significant partial correlations. These linear correlations between distance matrices are characteristically low, despite high significance. Partial correlations with added distance matrices held constant do not decrease further and continue strongly significant. Fig. 3 summarizes these relations for the average correlation. Fisher's tests for combining probabilities indicate very high significance (P << 0.001) for every average correlation in the graph. In the absence of correlation there should be an equal number of positive and negative coefficients. This null hypothesis can be firmly rejected by a sign test (22) (P < 0.025 for all correlations). We also tested by sign tests whether a significant number of genetic systems decrease their correlations as the number of distance matrices held constant increases. Holding GEO constant decreases the correlation significantly, but no further distance matrix has any effect. If either REN or GIM explains part of the genetics-language correlation, they should reduce it. They do not, nor do the OOA distances. This last observation also runs counter to Renfrew's theory.
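The sign test on the change caused by holding GEO constant can be reproduced from the GEN,LAN and .GEO columns of Table 1. The sketch below counts the systems whose correlation increases (that is, responds counter to expectation) and evaluates the one-tailed binomial probability of so few increases under a 50:50 null hypothesis. The variable and function names are illustrative.

```python
from math import comb

# GEN,LAN and .GEO columns of Table 1, in table order (25 systems).
GEN_LAN = [0.144, 0.024, -0.002, -0.042, 0.054, 0.114, 0.179, -0.029, 0.093,
           0.027, 0.017, 0.109, 0.133, 0.117, 0.003, 0.317, 0.122, 0.236,
           0.005, 0.203, 0.359, 0.408, 0.455, 0.231, 0.246]
GEO_HELD = [0.001, 0.039, -0.142, -0.077, 0.006, 0.057, 0.164, -0.015, 0.076,
            -0.054, -0.190, 0.073, 0.055, 0.072, 0.007, 0.249, 0.043, 0.077,
            -0.119, 0.059, 0.273, 0.238, 0.280, 0.080, 0.213]


def sign_test_lower_tail(successes, n):
    """P(X <= successes) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(successes + 1)) / 2 ** n


increases = sum(1 for before, after in zip(GEN_LAN, GEO_HELD) if after > before)
p_value = sign_test_lower_tail(increases, len(GEN_LAN))
```

Only 3 of the 25 systems show an increase, which corresponds to the entry "3***" on the first arrow of Fig. 3; the resulting probability is far below 0.005.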


[Figure 1: dendrogram. Axis title: "Linguistic distances clustered"; scale 16.00, 12.00, 8.00, 4.00, 0.00. Leaf labels in the order extracted: Portug., Galician, Spanish, Provenc., Fr.-Prov., French, Friulian, Ladin, Romansh, Italian, Aruman., Rumanian, (one label illegible), Greek, Latvian, Lithuan., Russian, Ukrain., Byelorus., Slovene, Ser.-Cro., Macedon., Bulgar., Polish, Sorbian, Czech, Slovak, Danish, Swedish, Norweg., Iceland., Faroese, German, Luxemb., Dutch, Frisian, English, Albanian, Irish, Sc. Gael., Breton, Welsh.]

FIG. 1. Dendrogram showing the results of UPGMA clustering (18) of the distances, furnished by M. Ruhlen, among 43 IE languages. Abscissa is in arbitrary units.

DISCUSSION

There is significant correlation between genetic and linguistic distances among IE speakers in Europe. This correlation is significantly reduced by keeping geographic distances constant, confirming earlier findings of spatial autocorrelation of both variables (11, 13, 14). The partial correlations remaining after geographic distances are held constant are still significant, yet none of the three distance matrices representing the hypotheses tested in this study (origin of agriculture, Renfrew, or Gimbutas) can further explain (i.e., reduce) the correlations. Earlier we demonstrated (8) that the hypothesis of the origin of agriculture by demic diffusion (5, 23, 24) explains genetic distances in modern European populations. When tested separately for IE speakers (unpublished results), this relationship is still true. However, the origin of agriculture is unable to explain the genetics-language correlations in Europe. Neither of the two contending hypotheses, Renfrew's or Gimbutas', contributes an additional explanatory element. Why might that be so?

Is a study of the correlations of genetic and linguistic distances of IE speakers the wrong approach? To contribute to an understanding of the origin of IE speakers, genetic distances must be correlated with linguistic distances. Such a relationship has been demonstrated for Europe (25) and elsewhere (see references in ref. 25). Such correlations occur because (i) the processes of geographic differentiation of populations and those leading to linguistic differentiation proceed in tandem; (ii) once established, linguistic differences serve as barriers to population mixing, reinforcing the genetics-language correlation; and (iii) introduction into an area of populations differentiated elsewhere will increase the genetics-language relation because these previously differentiated groups will differ with respect to both genetics and language. Of these, the first factor should be the major one. This is corroborated in the present data, where the only significant common factor is geography. The remaining significant partial correlation between language and genetics, after geography is held constant, indicates a relation between these two variables above and beyond that caused by their common spatial differentiation.

Do REN and GIM fail to remove any genetics-language correlation because our coding of the two models is incorrect? If there were no gene flow, genetics could not resolve the controversy. By basing his model on the demic diffusion theory of the origin of agriculture, Renfrew in effect admits gene flow. Yet neither OOA nor REN removes any genetics-language correlation. With respect to GIM, we note that the Indo-Europeanization of Iran and northwest India clearly involved population movements. We believe that Gimbutas' hypothesis of the Kurganization of southeastern Europe must imply an analogous process. We are supported in this by the outcome of our sensitivity test, which indicates population mixing. Thus our results support neither Renfrew's nor Gimbutas' theory. However, the significant partial correlations remaining after GEO, OOA, and REN or GIM have been held constant still require explanation and may hold the clue to IE origins.

The averaged GEN,LAN correlations are rather low. The highest pairwise correlation in Table 1 is only 0.455. Linear correlations of distance matrices generally tend to be on the low side, even with high statistical significance established by permutational methods. The averaged partial correlations of genetic distance with linguistic distance, with other distances including geography held constant, vary slightly around 0.06, depending on which other matrices are included. In Table 1 only a few systems show substantial correlations, the others being small and not significant. Not every locus will differentiate during the origins of the various populations. In a comparison of modern, racially diverse populations (Italians, Nigerians, and Japanese, as listed in ref. 26), Italians differed from the other two populations by as much as 0.2 in only 20.4% of the cases. Differentiation or diffusion involving these populations would be detectable in only a few loci. Since the genetic differences between the pre-IE population and the IEs surely were less than those among Italians, Japanese, and Nigerians, we should not expect many systems to show strong genetics-language correlation. Note that the high correlations subsumed in the low averages remain high as partial correlations also.

FIG. 2. Maps used for computing distances corresponding to the two theories of IE origins. (A) Map for computing Renfrew (REN) distances. The contours represent time intervals [years before present (yr BP)], as identified in the key (4500, 4950, 5600, 5900, 6500, 7250, 7750, 8500, 9000 yr BP). They were obtained from a map in which the starting dates of archaeological assemblages, which characterize Renfrew's 10 transitions (4), were interpolated to smooth the surface. The area has been subdivided into 10 numbered regions corresponding to the identically numbered transitions. A directed graph following the outline furnished by Renfrew has been superimposed on the map. (B) Map for computing Gimbutas (GIM) distances. The map shows outlines of regions in Europe that received none, one, or more of the Kurgan waves described by Gimbutas in refs. 1-3. Each region is labeled by the wave number it received. Regions that received more than one wave are marked by more than one numeral. Thus, area 123 received waves 1, 2, and 3. The region labeled O is the original home of the Kurgan people (the Sredni Stog and Yamna cultures); regions labeled N received none of the Kurgan waves. The map is based on hand-drawn originals by M. Gimbutas.

Our results imply that, while neither of the currently contending hypotheses of IE origins can be supported by the genetic evidence, there is significant residual correlation for some genetic systems that requires explanation. We shall not propose an alternative hypothesis of IE origins. However, the observed correlations invite exploratory data analysis of the population samples supporting the relation between language and genetics.

We examined the residuals from the regressions of GEN on GEO and LAN on GEO. We examined the highest 2% of the products of these residuals and mapped the pairs of localities represented by them for each of the five systems (50-1-1 AP, 65 TASTER, 100 HLA-A, 101/2 HLA-B, and 201 KM) showing the highest correlation between genetics and language. Paired positive (negative) deviations indicate areas more (less) distant genetically and linguistically than their geographic distances would predict. In maps for these five systems, paired positive deviations frequently involve Sardinia, which is quite distant genetically, and also linguistically, from nearby Mediterranean populations. Paired negative deviations heavily involve Iceland. Icelanders, being "displaced Scandinavians," are far less distant genetically and linguistically from Scandinavians and other Germanic speakers than their geographic distances indicate.
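The residual analysis can be sketched as follows: regress GEN on GEO and LAN on GEO over all pairs of localities, multiply the two residuals for each pair, and keep the largest products. The pair labels and distance values are hypothetical, and the treatment of the 2% cutoff (rounded, with at least one pair kept) is an assumption of the example.

```python
def linear_residuals(y, x):
    """Residuals of y after simple linear regression on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def extreme_residual_products(pairs, gen, lan, geo, fraction=0.02):
    """Pairs with the largest products of GEN and LAN residuals on GEO.

    Each returned row is (product, pair, gen_residual, lan_residual); the
    signs of the two residuals tell paired positive deviations from paired
    negative ones.
    """
    gen_res = linear_residuals(gen, geo)
    lan_res = linear_residuals(lan, geo)
    rows = [(g * l, pair, g, l) for pair, g, l in zip(pairs, gen_res, lan_res)]
    rows.sort(key=lambda row: row[0], reverse=True)
    keep = max(1, round(fraction * len(rows)))
    return rows[:keep]


# Hypothetical pairs of localities and distances (illustration only).
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
geo = [1.0, 2.0, 3.0, 4.0, 5.0]
gen = [1.0, 2.0, 8.0, 4.0, 5.0]
lan = [1.0, 2.0, 8.0, 4.0, 5.0]
top = extreme_residual_products(pairs, gen, lan, geo)
```

In this toy input the pair ("A", "D") is much farther apart in both GEN and LAN than its geographic distance predicts, so it is returned as a paired positive deviation.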

While the map patterns do not, unfortunately, suggest an alternative hypothesis of IE origins, since the relations they do indicate are far more recent (such as the settlement of Iceland), they suggest that the overall pattern of partial correlations might help us decide among competing hypotheses. If the IEs originated in situ by local differentiation only, there should be no significant partial correlation, since geography should fully explain the observed genetic and linguistic distances. This was not the case. If the genetics-language correlation were entirely due to the spread of populations accompanying the origin of agriculture, then the origin-of-agriculture model should suffice, or at least there should be some effect due to origin of agriculture. But we saw that origin-of-agriculture distances (OOA) cannot reduce the partial correlations remaining after geography has been held constant. If the IEs originated by a branching process outside or inside of Europe and the populations ancestral to the modern IE language families branched off at different times, and moved into different regions in Europe where they differentiated subsequently, they would yield a pattern such as was found by us. A phylogenetic tree structure would add additional similarities and distances to the data, above and beyond those engendered by local differentiation. These conclusions agree with earlier findings in our laboratory (13, 14, 27) that intrusion of populations differentiated elsewhere has contributed an important element to the association between genetics and language in Europe.

[Figure 3: flow diagram of average correlations. GEN,LAN (0.141***) to .GEO (0.059***): 3***. .GEO to .GEO,OOA (0.063***): 15 ns. .GEO,OOA to .GEO,OOA,REN (0.060***): 11 ns. .GEO,OOA to .GEO,OOA,GIM (0.067***): 15 ns. .GEO,OOA,REN to .GEO,OOA,REN,GIM (0.065***): 15 ns. .GEO,OOA,GIM to .GEO,OOA,REN,GIM: 9 ns. Key to small arrows: Renfrew correct; Gimbutas correct; neither correct.]

FIG. 3. Summary of results. The large arrows indicate successive steps in computing zero- to fourth-order partial correlations between genetic (GEN) and linguistic (LAN) distances. Other distances successively held constant are geography (GEO), origin of agriculture (OOA), Gimbutas (GIM), and Renfrew (REN). The numerical values at both ends of the large arrows are the average correlations from the bottom line of Table 1. They are all highly significant (P << 0.001). The numbers above the large arrows are the numbers of genetic systems (out of 25) that respond counter to expectations when an added distance matrix is held constant. The symbols following these numbers give the results of a one-tailed sign test (22) of the positive and negative changes to the correlations during the operation indicated by the arrow [ns (not significant), P > 0.05; ***, P < 0.005]. The three small arrows beneath each large arrow furnish predictions made by each theory concerning the behavior of the partial correlations. From the top down the arrows represent Renfrew's theory, Gimbutas' theory, and the assumption that neither theory is correct. A horizontal small arrow predicts no effect, a downward sloping small arrow predicts a reduction in the magnitude of the partial correlations, and a downward vertical small arrow predicts a reduction of the partial correlation to nonsignificance. The small arrows illustrate that the predictions of the Renfrew and Gimbutas theories are not borne out and that the outcomes are compatible with the prediction that neither theory is correct.

We thank Prof. Marija Gimbutas, Lord Renfrew, and Dr. Merritt Ruhlen for their collegial cooperation in this work. We are indebted to D. DiGiovanni, M.-J. Fortin, and C. Wilson for technical assistance. Part of the computation was carried out on the Cornell National Supercomputer Facility. This research was supported by National Science Foundation Grant BNS8918751 and National Institutes of Health Grant GM28262.

1. Gimbutas, M. (1973) J. Indo-Eur. Studies 1, 1-20.
2. Gimbutas, M. (1979) Arch. Suisses Anthropol. Gen. 43, 113-137.
3. Gimbutas, M. (1986) in Ethnogenese europäischer Völker, eds. Bernhard, W. & Kandler-Pálsson, A. (Fischer, Stuttgart, F.R.G.), pp. 5-20.
4. Renfrew, C. (1987) Archaeology and Language: The Puzzle of Indo-European Origins (Jonathan Cape, London).
5. Ammerman, A. J. & Cavalli-Sforza, L. L. (1984) The Neolithic Transition and the Genetics of Populations in Europe (Princeton Univ. Press, Princeton, NJ).
6. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. L. (1978) Science 201, 786-792.
7. Sokal, R. R. & Menozzi, P. (1982) Am. Nat. 119, 1-17.
8. Sokal, R. R., Oden, N. L. & Wilson, C. (1991) Nature (London) 351, 143-145.
9. Mourant, A. E., Kopeć, A. C. & Domaniewska-Sobczak, K. (1976) The Distribution of the Human Blood Groups (Oxford Univ. Press, London).
10. Derish, P. A. & Sokal, R. R. (1988) Hum. Biol. 60, 801-824.
11. Sokal, R. R. (1988) Proc. Natl. Acad. Sci. USA 85, 1722-1726.
12. Sokal, R. R., Oden, N. L. & Thomson, B. A. (1988) Am. J. Phys. Anthropol. 76, 337-361.
13. Sokal, R. R., Oden, N. L., Legendre, P., Fortin, M.-J., Kim, J. & Vaudor, A. (1989) Am. J. Phys. Anthropol. 79, 489-502.
14. Sokal, R. R., Harding, R. M. & Oden, N. L. (1989) Am. J. Phys. Anthropol. 80, 267-294.
15. Prevosti, A., Ocaña, J. & Alonso, G. (1975) Theor. Appl. Genet. 45, 231-241.
16. Wright, S. (1978) Evolution and the Genetics of Populations, Vol. 4: Variability Within and Among Populations (Univ. of Chicago Press, Chicago).
17. Ruhlen, M. (1991) A Guide to the World's Languages, Vol. 1: Classification; With a Postscript on Recent Developments (Stanford Univ. Press, Stanford, CA).
18. Sneath, P. H. A. & Sokal, R. R. (1973) Numerical Taxonomy (Freeman, San Francisco).
19. Mantel, N. (1967) Cancer Res. 27, 209-220.
20. Sokal, R. R. (1979) Syst. Zool. 28, 227-231.
21. Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Syst. Zool. 35, 627-632.
22. Sokal, R. R. & Rohlf, F. J. (1981) Biometry (Freeman, San Francisco), 2nd Ed.
23. Ammerman, A. J. & Cavalli-Sforza, L. L. (1973) in The Explanation of Culture Change, ed. Renfrew, C. (Duckworth, London), pp. 343-357.
24. Ammerman, A. J. & Cavalli-Sforza, L. L. (1979) in Transformations: Mathematical Approaches to Culture Change, eds. Renfrew, C. & Cooke, K. L. (Academic, New York), pp. 275-294.
25. Sokal, R. R., Oden, N. L., Legendre, P., Fortin, M.-J., Kim, J., Thomson, B. A., Vaudor, A., Harding, R. M. & Barbujani, G. (1990) Am. Nat. 135, 157-175.
26. Roychoudhury, A. K. & Nei, M. (1988) Human Polymorphic Genes: World Distribution (Oxford Univ. Press, New York).
27. Sokal, R. R. (1991) Annu. Rev. Anthropol. 20, 119-140.
