indo-european origins: a computer-simulation test of five hypotheses

24
AMERICAN JOURNAL. OF PHYSICAL ANTHROPOLOGY 96109-132 (1995) Indo-European Origins: A Computer-Simulation Test of Five Hypotheses GUIDO BARBUJANI, ROBERT R. SOW, AND NEAL L. ODEN Dipartimento di Scienze Statistiche, Universita di Bologna, I-40126 Bologna, Italy (G.B.); Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11 794-5245 (R.R.S., G.B.); The EMMES Corporation, Potomac, Maryland 20854 (N.L.O.) KEY WORDS Genetic variation, Demic diffusion, Language, Computer simulation ABSTRACT Allele frequency distributions were generated by computer simulation of five models of microevolution in European populations. Genetic distances calculated from these distributions were compared with observed genetic distances among Indo-European speakers. The simulated models dif- fer in complexity, but all incorporate random genetic drift and short-range gene flow (isolation by distance). The best correlations between observed and simulated data were obtained for two models where dispersal of Neolithic farmers from the Near East depends only on population growth. More com- plex models, where the timing of the farmers’ expansion is constrained by archaeological time data, fail to account for a larger fraction of the observed genetic variation; this is also the case for a model including late Neolithic migrations from the Pontic steppes. The genetic structure of current popula- tions speaking Indo-European languages seems therefore to largely reflect a Neolithic expansion. This is consistent with the hypothesis of a parallel spread of farming technologies and a proto-Indo-European language in the Neolithic. Allele-frequencygradients among Indo-European speakers may be due either to incomplete admixture between dispersing farmers, who presum- ably spoke proto-Indo-European, and pre-existing hunters and gatherers (as in the traditional demic diffusion hypothesis), or to founder effects during the farmers’ dispersal. By contrast, successive migrational waves from the East, if any, do not seem to have had genetic consequences detectable by the present comparison of observed and simulated allele frequencies. 0 1995 Wiley-Liss, Inc. Wide allele-frequency clines exist a t sev- eral loci in Europe (Menozzi et al., 1978; Sokal and Menozzi, 1982; Sokal et al., 1989a). Their extent is such that simple models of isolation by distance are unlikely to explain them. It is generally agreed that they result from a population expansion starting in Anatolia approximately 10,000 years ago (Menozzi et al., 1978; Sokal and Menozzi, 1982; Renfrew, 1987, 1991, 1992; Cavalli-Sforza, 1988; Sokal et al., 1991), most likely associated with the development of technologies for food production (Hassan, 1973; Zeven, 1980). Radiocarbon chronolo- gies of Mesolithic and Neolithic settlements indicate a westward spread of farming tech- nologies from the Near East, starting ap- proximately 8,000 BC (Ammerman and Cav- alli-Sforza, 1984; Renfrew, 1987). The observed gene-frequency gradients can then be explained by attributing the propagation of farming to the dispersal of early farmers, Received January 25,1994; accepted August 7,1994 Address reprint requests to Robert R. Sokal, Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245. 0 1995 WILEY-LISS, INC.

Upload: guido-barbujani

Post on 06-Jun-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Indo-European origins: A computer-simulation test of five hypotheses

AMERICAN JOURNAL. OF PHYSICAL ANTHROPOLOGY 96109-132 (1995)

Indo-European Origins: A Computer-Simulation Test of Five Hypotheses

GUIDO BARBUJANI, ROBERT R. S O W , AND NEAL L. ODEN Dipartimento di Scienze Statistiche, Universita di Bologna, I-40126 Bologna, Italy (G.B.); Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11 794-5245 (R.R.S., G.B.); The EMMES Corporation, Potomac, Maryland 20854 (N.L.O.)

KEY WORDS Genetic variation, Demic diffusion, Language, Computer simulation

ABSTRACT Allele frequency distributions were generated by computer simulation of five models of microevolution in European populations. Genetic distances calculated from these distributions were compared with observed genetic distances among Indo-European speakers. The simulated models dif- fer in complexity, but all incorporate random genetic drift and short-range gene flow (isolation by distance). The best correlations between observed and simulated data were obtained for two models where dispersal of Neolithic farmers from the Near East depends only on population growth. More com- plex models, where the timing of the farmers’ expansion is constrained by archaeological time data, fail to account for a larger fraction of the observed genetic variation; this is also the case for a model including late Neolithic migrations from the Pontic steppes. The genetic structure of current popula- tions speaking Indo-European languages seems therefore to largely reflect a Neolithic expansion. This is consistent with the hypothesis of a parallel spread of farming technologies and a proto-Indo-European language in the Neolithic. Allele-frequency gradients among Indo-European speakers may be due either to incomplete admixture between dispersing farmers, who presum- ably spoke proto-Indo-European, and pre-existing hunters and gatherers (as in the traditional demic diffusion hypothesis), or to founder effects during the farmers’ dispersal. By contrast, successive migrational waves from the East, if any, do not seem to have had genetic consequences detectable by the present comparison of observed and simulated allele frequencies. 0 1995 Wiley-Liss, Inc.

Wide allele-frequency clines exist a t sev- eral loci in Europe (Menozzi et al., 1978; Sokal and Menozzi, 1982; Sokal et al., 1989a). Their extent is such that simple models of isolation by distance are unlikely to explain them. It is generally agreed that they result from a population expansion starting in Anatolia approximately 10,000 years ago (Menozzi et al., 1978; Sokal and Menozzi, 1982; Renfrew, 1987, 1991, 1992; Cavalli-Sforza, 1988; Sokal et al., 1991), most likely associated with the development of technologies for food production (Hassan, 1973; Zeven, 1980). Radiocarbon chronolo-

gies of Mesolithic and Neolithic settlements indicate a westward spread of farming tech- nologies from the Near East, starting ap- proximately 8,000 BC (Ammerman and Cav- alli-Sforza, 1984; Renfrew, 1987). The observed gene-frequency gradients can then be explained by attributing the propagation of farming to the dispersal of early farmers,

Received January 25,1994; accepted August 7,1994 Address reprint requests to Robert R. Sokal, Department of

Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245.

0 1995 WILEY-LISS, INC.

Page 2: Indo-European origins: A computer-simulation test of five hypotheses

110 G. BARBUJANI ET AL.

who interbred only sparingly with the bands of hunters and gatherers whom they met in the process, and who had already colonized most of Europe (Ammerman and Cavalli- Sforza, 1984). Such a combination of demo- graphic growth, range expansion, and lim- ited admixture has been termed demic diffusion (Menozzi et al., 1978). Its expected consequences include correlations between allele frequencies and the dates of onset of agriculture (Sgaramella-Zonta and Cavalli- Sforza, 1973), which have actually been ob- served (Sokal et al., 1991).

The European regions where early (Neo- lithic) agriculturalists expanded correspond approximately to the current western range of Indo-European languages. This raises the question whether the cultural process of Indo-European diffusion was also deter- mined by the demographic processes accom- panying the spread of farming (Renfrew, 1987). The traditional view holds, on the contrary, that proto-Indo-European entered Europe not earlier than 4500 BC, through three migrational waves also coming from the East, namely from the Pontic steppes (Gimbutas, 1979, 1986). Although directed westwards, like demic diffusion, these waves were not associated with population increases comparable to those caused by the introduction of farming and animal breed- ing. Therefore, diffusion of Indo-European in the late Neolithic would imply that lan- guages spread more through cultural con- tacts (Zvelebil and Zvelebil, 1988) than by demographic processes (see Renfrew, 1989). As a corollary to this, association between patterns of genetic and linguistic variation should be limited and occasional among con- temporary Indo-European speakers.

Support for the view linking the spread of proto-Indo-European in Europe with demic diffusion comes from studies showing that patterns of linguistic and genetic diversity correspond in many European populations (Sokal, 1988; Cavalli-Sforza et al., 1988, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990, 1991; Bertranpetit and Cavalli-Sforza, 1991). The exceptions in- clude Basques (Piazza et al., 1988; Bertran- petit and Cavalli-Sforza, 1991 1, Hungarians (Barbujani et al., 1990), and Uralic-speakers (Guglielmino et al., 19901, that is to say,

groups whose current languages do not be- long to the Indo-European phylum.

Recent years have seen both a refinement of the hypotheses on Indo-European origins, and the emergence of contradictory data. Based on archaeological and linguistic evi- dence, Cavalli-Sforza (1988) and Renfrew (1991, 1992) argued that the common ele- ments recognized within the so-called Nos- tratic linguistic macrofamily (Kaiser and Shevoroshkin, 19881, including Indo-Euro- pean, Altaic, Afro-Asiatic, and Elamo-Dra- vidian, derive from a common biological ori- gin of most of their current speakers. It is then possible to interpret the current distri- bution of most Nostratic, and not only Indo- European, languages as a consequence of a multidirectional spread of agriculture. Re- cent genetic analyses agree with this view (Cavalli-Sforza et al., 1993; Barbujani and Pilastro, 1993). On the other hand, a model incorporating the effects of the origin of agri- culture and/or specifically the hypotheses of Renfrew and Gimbutas failed to explain a larger fraction of the correlations between genetic and linguistic distances than is ex- plained by the simple effects of geographic distances (Sokal et al., 1992).

Further evidence in favor of either hy- pothesis may be obtained by simulating their genetic consequences, and then com- paring them with the patterns of genetic variation observed in the field. In the only simulation study available so far, present- time genetic variation was demonstrated to be compatible both with a Neolithic origin of Indo-European speakers, and with later im- migration from the Pontic steppes (Rendine et al., 1986). However, the two hypotheses were not contrasted in that study, but com- bined. A relationship between current gene- frequency gradients and dispersal from the east was evident, and seems undisputable. However, when exactly, and by what type of process, proto-Indo-Europeans spread has not been convincingly ascertained. We are particularly interested in establishing if the demic diffusion of early farmers can, at least in principle, explain a large share of current genetic diversity among Indo-European speakers, or if additional processes must be included in the model to obtain a better fit with real data. Among these additional pro-

Page 3: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 111

cesses, we gave a special emphasis to the migratory waves postulated by Gimbutas.

In this study, we simulated microevolu- tionary scenarios of increasing complexity, from an unrealistically simple one to models including several archaeologically docu- mented migrations. We then calculated cor- relation coefficients between the simulated and real gene frequencies (or, more pre- cisely, between matrices of genetic distances calculated from each of them). We expected to observe an increasing agreement between real and simulated data as the models get more and more realistic. When an increase in the complexity of the model is not matched by an increase in the correlations, we conclude that the new factors included in that model do not improve our understand- ing of the phenomenon, and should there- fore be considered unnecessary. This does not automatically imply that these factors played no evolutionary role at all. But, if they did, evidence of their effects should be sought in data other than the currently available allele frequencies. A large data- base of European allele frequencies (see Sokal et al., 1989a) was analyzed for this purpose.

METHODS AND DATA Overview of the simulations

We carried out a series of computer simu- lations of five microevolutionary models. All models were based on a stepping-stone pop- ulation structure, consisting of a 60 X 37 regular lattice, superimposed onto the map of Europe. Each node of the lattice rep- resents a l-degree square quadrat. Of the 2,220 nodes of the lattice, 1,512 are land areas supporting a human population. Each node (population) is characterized by its ef- fective size (N,) and by the frequency of one allele (p). At each generation p undergoes random variation, representing the effects of genetic drift, which is a function of N,. Migration is allowed only between adjacent populations. The numbers of individuals mi- grating at each generation depend on popu- lation sizes, and on factors of resistance to migration, which are zero across plains, but greater than zero across mountain chains and seas. The differences among the five

models lie in parameters other than those described so far.

Each simulation experiment consisted of 440 iterations (representing generations) of a series of population processes. In this way, assuming a 25-year generation interval and non-overlapping generations, each experi- ment spans 11,000 years. The simulated al- lele frequencies were printed at generation 440, representing present time, for the local- ities corresponding to those for which allele- frequency data are available in a database of European allele frequencies (Sokal et al., 1989a), from which non-Indo-European speakers had been discarded. Matrices of Prevosti’s genetic distances (Prevosti et al., 1975) were calculated on both real and sim- ulated allele frequencies, and their degree of resemblance was evaluated by Mantel (1967) tests of matrix correlation.

Five FORTRAN programs were written, each corresponding to one of five models for the origins of Indo-Europeans described in the following section, and incorporating sub- routines developed by Press et al. (1986) and Manly (1991).

An outline of the models

IBD: Isolation by distance The first, clearly oversimplified, hypothe-

sis, is that Indo-European-speaking popula- tions evolved under conditions of isolation by distance (IBD model). Current patterns of genetic variation would then simply re- sult from the interaction between random fluctuations of allele frequencies in time, i.e., genetic drift, and dispersal of individu- als. Under isolation by distance, variations in population size affect only the impact of genetic drift-the larger the population, the smaller the allele-frequency fluctuations. Population growth, which occurs after gen- eration 40, does not prompt migratory move- ments. Thus, the IBD model neglects all gene flow processes other than those in which movements of individuals from their birthplaces are local and random (i.e., equally likely in all directions except for mi- gration resistance factors, see below).

Under the IBD model, the demographic increase that occurred in the Neolithic (and is detailed in the section Population Growth

Page 4: Indo-European origins: A computer-simulation test of five hypotheses

112 G. BARBUJANI ET AL

Among Farmers) was simulated without separating hunting-gathering and farming populations, i.e., as if all hunter-gatherers turned to agriculture a t 8,000 BC.

OAG: Isolation by distance plus effects of the origin of agriculture

Isolation by distance is the null hypothe- sis for human microevolution (Wijsman and Cavalli-Sforza, 1984). Therefore, all models that follow are not alternative to IBD. Rather, they incorporate it as a necessary, if not sufficient process, for determining the currently observed patterns of genetic varia- tion. OAG, the simplest such model, is one which combines IBD with the likely effects of the demographic processes following the origin of agriculture. Under this model, pop- ulations of hunter-gatherers initially occupy Europe and evolve under isolation by dis- tance. At a specific moment in time (8,000 BC, chosen on the basis of archaeological in- formation), a few populations in southern Anatolia turn to farming. This starts a local process of population growth in the areas where farming is being practiced, followed by dispersal outwards when local population densities have reached a certain threshold. In this way, migratory movements between farming communities are not necessarily symmetrical, as is reasonable to assume in many evolutionary scenarios (Rogers and Jorde, 1987). The rate of spread of farmers is driven by their intrinsic growth rate; it is constrained only by geographical factors such as mountain chains or bodies of water. No cultural transmission is simulated be- tween the hunter-gatherers and the farmers who immigrate into their regions. Only the farmers’ allele frequencies are eventually compared with observed matrices of genetic distances. In this way, the genetic conse- quences of this model are those that would be expected if hunter-gatherers were re- placed without admixture, i.e., became ex- tinct. The only microevolutionary role they play is to serve as a source population at the beginning of the Neolithic, for that small fraction in Anatolia of the total population that develops the new farming technologies.

OAC: Isolation by distance, plus effects of the origin of agriculture, and cultural transmission

Cultural transmission from farmers to hunter-gatherers may be built into the model, yielding what we call OAC; C stands for culture. Under this model, at all locali- ties some hunter-gatherers learn how to produce food, and therefore their alleles are transmitted across generations with greater efficiency. From the genetic standpoint, this is equivalent to a certain degree of admix- ture, whereby some genes of the hunter- gatherers contribute to the gene pool of the farmers. As a consequence, these genes spread at once with the genes of the farmers, and are thus carried into new localities. This is the Neolithic demic diffusion model, as originally proposed (Menozzi et al., 1978; Renfrew, 1987).

ATC: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, and archaeological time constraints

Under OAC, the spread of farmers from Anatolia into Europe is driven by their in- crease in numbers at each locality, which causes dispersal towards areas of lower pop- ulation density. Therefore, in the OAC model, the farming technologies spread at an approximately constant rate through space (as in Ammerman and Cavalli-Sforza, 1971). This is known to be an approximation (Barker, 1985). A further refinement of OAC considers archaeological time constraints (ATC). Under ATC, we use archaeological information about the likely date at which farming reached each specific site in Europe (see Sokal et al., 1991). In this way, the arrival of farmers into a new locality re- flects archaeologically documented cultural transformations; farmers spread at an irreg- ular rate, corresponding to the actual pro- cess as inferred from archaeological evi- dence. Incorporation of hunter-gatherers into each farming population occurs at the same rates and through the same processes as in OAC.

Page 5: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 113

GIM: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, archaeological time constraints, and late Neolithic migmtions

In the OAG, OAC, and ATC models, the first farmers are also considered the first speakers of proto-Indo-European. The alter- native hypothesis considers them as the Ne- olithic inhabitants of areas that were later invaded by the first proto-Indo-European speakers, the Kurgan people (Gimbutas, 1979, 1986). The three migrational waves postulated by Gimbutas are added to ATC in the GIM model, by simulating long-distance migratory movements between 4,250 BC and 2,900 BC. A number of successive population movements are added as well; presumably, they were independent from the spread of Indo-European, but are considered by Gim- butas (personal communication) relevant to an accurate description of human evolution in Europe.

Details on the simulation parameters and algorithms

The data matrix In all simulation cycles, a matrix of 60

columns by 37 rows was defined, each ele- ment in the matrix representing a square of edge length 1 degree in a Mercator projec- tion of Europe. The data matrix covers the area between 10 degrees of longitude West and 50 degrees East, and between 72 and 35 degrees of latitude North. Iceland is not in- cluded in this simulation.

Geography

Each of the 2,220 elements (= nodes or pixels or localities) of the data matrix con- tained an integer value, L, which was 1 for plains, 2 for mountains, 3 for seas, and 4 for the Black Sea. A local population was as- signed to each of the 1,512 land pixels.

Initial allele frequencies

Under all the models tested, for each land pixel a variable PHG was defined, represent-

ing the frequency of an allele at a polymor- phic locus in the hunting-gathering popula- tion, i.e., in the only type of population existing at the beginning of the simulation. From a mathematical standpoint, it does not make a difference if this locus is regarded as biallelic, or if it is considered multiallelic, since the fate of only one of its alleles is simulated in the following phases. Allele fre- quencies were drawn from a gamma distri- bution truncated at one (Nei, 19871, whose mean was fixed either a t 0.33 or at 0.50.

Initial population sizes

Estimates of population densities among current hunting-gathering tribes suggested to Rendine et al. (1986) that the effective size NHG of the hunting-gathering popula- tions of Europe should be approximately 300 in each of the 840 elementary areas of their simulation. To have the same population density in the 2,220 pixels of this simula- tion, NHG was fixed at 114. This corresponds to a population density of 0.04 individuals per square km, within the estimated range of population densities for hunter-gatherers in temperate climates (Hassan, 1981). In Rendine et al.’s (1986) model, the individu- als were considered as haploid, whereas here they are diploid. This may have caused a certain degree of divergence between the two models, as the drift variances are af- fected by the levels of ploidy of a population.

Genetic drift among hunter-gatherers

Non-overlapping generations were simu- lated. At each generation, a new allele fre- quency was drawn, for each locality, from a normal distribution whose mean was the al- lele frequencyp of the same population at the previous generation, and whose vari- ance wasp0 - p), divided by twice the effec- tive population size NHG (Nei, 1987). This represented the effect of sampling of alleles from one generation to the following, i.e., random genetic drift.

Dispersal of hunter-gatherers

Symmetrical dispersal occurred between adjacent pixels, once every generation fol-

Page 6: Indo-European origins: A computer-simulation test of five hypotheses

114 G. BARBUJANI ET AL.

TABLE 1. Factors of resistance to migration (RTW

Between a pixel in the- And an adjacent pixel in t h e Plains Mountains Sea Black Sea

Plains 0.00 Mountains 0.25 0.50 Sea 0.45 0.70 0.90 Black Sea 1.00 1.00 1.00 1.00

lowing drift; population sizes of the two lo- calities exchanging individuals did not change as a result of dispersal. Therefore, this study assumed a stepping-stone model (Kimura and Weiss, 19641, whose properties are discussed by Jorde (1980). We chose a dispersal rate m of 0.065 (as in Rendine et al., 1986), which means that at each genera- tion, after reproduction, 6.5% of the resident individuals could be replaced by immigrants from the adjacent pixels. However, physical obstacles in the pixels between which dis- persal occurred could reduce the number of migrants. Physical obstacles to migration were expressed by a factor of resistance to migration, RTM, detailed in Table 1. The average value of RTM, calculated between all suitable pairs of pixels, was 0.300. This means that, on the average, only 70% of the potentially dispersing individuals actually moved from their birthplace to an adjacent locality. To compensate for this, the dis- persal rate m was replaced by m' = 0.0651 0.700 = 0.0928. In other words, 9.28% of the individuals in a locality were potentially subject to migration elsewhere, and 6.5% ac- tually migrated, on the average. The num- ber N(AB) of individual hunter-gatherers moving from locality A t o B (and vice versa, from B to A) was

N(AB) = NHGm' ~ (1 - RTM(AB)) (1) 4

where the denominator refers to the number of adjacent populations in a stepping-stone model, and RTM(AB) depends on the envi- ronmental features a t localities A and B (Ta- ble 1). WhenNHG was fixed at 114, each pair of adjacent localities exchanged 2 effective individuals per generation. However, under the IBD model the hunting-gathering popu- lations increase in size, starting at genera-

tion 40; in this case, the number of migrants per generation increased accordingly. When, by contrast, hunter-gatherers decreased in numbers owing to the expansion of farmers, NAB) decreased as well until it reached zero. Because of the RTM factor, two locali- ties in the plains exchanged freely one- fourth of the migrants allowed at each gen- eration, whereas dispersal was reduced between populations separated by moun- tains or bodies of water, and for populations at the extremes of the simulated area. No dispersal was allowed across the Black Sea, to more carefully represent the population processes in the surrounding area (e.g., the migrations of Kurgan people), which oc- curred mostly by land movements.

The allele frequency after migration, pfHG, was then calculated as

P'HG = PNG[(~ - m') (1 - RTMI -

+ rn'PHGin (2)

where RTM is the average resistance to mi- gration between the pixel of interest and the nA adjacent pixels (1 < nA < 41, and pHGin is

Inception of farming in the nuclear zone

Under all models except IBD, the spread of farming starts with the splitting of some populations into two groups, one practicing agriculture, and the other still living in a hunting-gathering economy. At generation 40 (i.e., 10,000 years ago), 20 individuals turn to farming at each of six pixels in Ana- tolia, around the village of Catal Humk, where the oldest archaeological evidence of farming activities is situated (Redrew, 1991). Each group of 20 individuals repre- sents a random sample of the pre-existing hunting-gathering population at the same site; therefore, initially they have the same allele frequency: pF = PHG. However, from generation 41 they evolve in reproductive isolation, so that there will be two distinct populations of hunter-gatherers and farm- ers, HG and F, at those localities, with dis- tinct allele frequencies. In the absence of

Page 7: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 115

cultural transmission between groups, i.e., under the OAG model, the two groups coex- ist without any genetic exchange, at all lo- calities where farming communities have been established.

Population growth among farmers

The farming populations have access to a wider range of resources, and tend to in- crease in numbers (Hassan, 1973); 50-fold increases have been estimated by Ammer- man and Cavalli-Sforza (1984). We simu- lated a logistic increase, whose key parame- ter, the growth rate, generally referred to as r (Feller, 1940; Eisen, 1979; Keyfitz, 1977), was here called a, following Rendine et al. (1986). In the absence of reliable informa- tion on growth rates among early farmers, we tried a preliminary set of values, and chose a = 0.5, which gave us a rate of popu- lation increase compatible with the known rates of spread of farming in Europe, 1 km every year, on the average (Ammerman and Cavalli-Sforza, 1971). That value was em- ployed in all the simulations presented here. The equation calculating, at generation t and for each locality, the effective size NF of the farming population is

1) x (1 + a (1 -

where NF(t - 1) is the population size at the former generation, and 7,560 is the carrying capacity of the area, chosen for reasons anal- ogous to those that led us to choose NHG = 114. Since the effective population size is approximately one-third of the cen- sus size (Wright, 19691, this corresponds roughly to a farming population of 23,000 dwelling on a 1-degree-square quadrat of land, and to 20,000 farmers in each elemen- tary area of Rendine et al.’s (1986) simula- tion.

Dispersal of farmers and origin of farming outside the nuclear zone

Under the OAG and OAC models, the growth of farming populations prompts mi- gratory movements into neighboring locali- ties where farming has not yet started. Ex-

pansion towards localities a t the same latitude (i.e., with the same type of climate) was presumably more common than dis- persal northwards, towards more rigorous climates. This led us to subdivide the history of our simulated farming populations into four successive phases. Phase 1: Initially, the population is scarce, and simply tends to increase logistically, without sending emi- grants, but receiving immigrants from other farming populations at higher density, if any. Phase 2: When NF reaches a first threshold, T1, a few individuals begin to dis- perse longitudinally; this often entails colo- nization of a new site on the west, whereas gene flow is asymmetrical with the eastern neighbors, whose density is still higher. Phase 3: As the population size approaches the carrying capacity of 7,560, a second threshold, T2 is reached, and gene flow oc- curs also northwards and southwards, once again giving rise to a new population of farmers if the adjacent pixel to the north or south has not been colonized yet. The thresholds had to be fixed in a somewhat arbitrary manner. It seemed realistic to al- low for the first westwards dispersal two to three generations after inception of the farming economy, so as to roughly match the archaeologically documented rates of spread. T1 was fixed at 24 (corresponding to a census size of 72 individuals), whereas for T2 we chose 50% of the carrying capacity, i.e., 3,780. Phase 4: Finally, when adjacent populations have reached their equilibrium size of 7,560 effective individuals, the migra- tory exchanges become symmetrical. The general equation, expressing the number of farmers moving, say, from locality A to adja- cent locality B, N(AB), is

for NF(A) greater than the appropriate threshold, i.e., T1 for latitudinal and T2 for longitudinal movements, respectively. All relevant quantities have been defined for Equation 1. Since N,(A) is variable across generations, N(AB) varies too.

In this way, the input and output of genes, from and to the adjacent pixels of the map, could be different a t different times during

Page 8: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL. 116

the simulations. If the number of immi- grants from each locality is taken into ac- count in the calculation of the allele fre- quency of the immigrants, which we called pHGin in Equation 2 and which will be called pFin for farmers, an analogous formula gives the allele frequency after gene flow among farmers:

p'F = p F [ ( l - m') (1 - RTM)I + rn'PFin (5)

Figure 1 is an example of allele frequencies generated under the OAG and OAC models. Allele frequencies of specific localities were, of course, different in different realizations of the same process, and between OAG and OAC. What was constant, however, was the pattern of occupation of land areas by ex- panding farmers, because it depended only on population growth and dispersal parame- ters, which were kept constant across real- izations.

For the ATC and GIM models, by contrast, the spread of farming followed the pattern that can be inferred from archaeological evi- dence. Once a farming community exists at a certain locality, the exchange of genes with neighboring farming communities occurred in the same manner as described for the OAG and OAC models (Eq. 5). The differ- ence is in the establishment of new farming populations, which under ATC and GIM was controlled by a matrix of dates of origin of agriculture at each land pixel of the map (details on how archaeological information was processed for this purpose are in Sokal et al. (1991)). Therefore, when colonization of a new locality by the first farmers had to be simulated, eight effective founding individuals were sampled with replace- ment from the closest suitable locality. In a few cases, this required input of immigrants from localities that were not directly adja- cent to the one in which farming was start- ing. This was the only violation that we tolerated of the assumptions of the step- ping-stone model. Figure 2 is an example of allele frequencies generated under the ATC and GIM models. It shows that the spread of farmers is initially slower than simulated under the OAG and OAC models, especially along the northern shores of the Mediterranean Sea. By genera-

tion 200, however, the four models yield a similar pattern of land occupation. The only major exception is an area of north-western Alps, where archaeological evidence shows a delayed onset of farming activities (Sokal et al., 1991). Of course, such a delay was not predicted by the mechanism of farming expansion underlying our OAG and OAC models.

Pixels in the sea

A stepping-stone model does not allow for long-distance population movements, i.e., those associated with sailing, which are con- sidered important in the colonization of Eu- rope, and in the successive phases of agricul- tural dispersal (Renfrew, 1987). To simulate the effects of the movement of a few individ- uals across the seas, we chose to assign a pseudopopulation to each pixel located in the sea. Pseudopopulations did not undergo random fluctuation of allele frequencies, and did not increase in numbers. They in- cluded only the individuals dispersing from neighboring pixels (their number was deter- mined according to Equation 1)) whose de- scendants had the same allele frequencies, and at each generation proceeded one step forward in the dispersal process. In this way, the movement of a few individuals across the sea was simulated. This had little importance for the allele frequencies of

Fig. 1. Spread of farmers under the OAG and OAC models. The localities where farming is being practiced are indicated by letters representing allele frequency in farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Eight allele-frequency classes are defined, from a to h, each corresponding to an interval equal to 0.125 (a, p F < 0.125; b, 0.125 <p,0.250; c, 0.250 < p F < 0.375; etc.). Hyphens represent areas inhabited only by hunter-gatherers. The nuclear zone is delimited by a solid square. While the pattern of land occupation shown was constant for all the realizations of the mod- els, the allele frequencies depicted are those of a single run of OAC.

Fig. 2. Spread of farmers under the ATC and GIM models. The localities where farming is being practiced, based on archaeological information, are indicated by figures representing allele frequency of farmers, at gen- erations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC).

Allele-frequency classes are as in Figure 1. The pattern ofland occupation was again the same in all realizations of ATC and GIM, but the allele frequency shown is that of a single realization ofATC.

Page 9: Indo-European origins: A computer-simulation test of five hypotheses

1

t

Page 10: Indo-European origins: A computer-simulation test of five hypotheses

118 G. BARBUJANI ET AL.

hunter-gatherers and for those of farmers once the spread of farming through Europe was completed; however, it had an effect on the establishment of farming communities under the OAG and OAC models, as a few individuals could quickly reach distant lo- calities by sea. The resulting pattern of occu- pation of coastal regions corresponds well with the archaeological evidence (see Figs. 1 and 2).

Cultural contacts and admixture

Under the OAC, ATC, and GIM models, at each generation a certain number of hunter- gatherers adopted farming, if a farming community already existed at the locality where they lived. The likelihood of this cul- tural shift depended on the probability of contacts between farmers and hunter-gath- erers, and on a coefficient of acculturation which was called y by Rendine et al. (1986). The number S of individuals shifting to farming at each generation is related to the probability of contacts between farmers and hunter-gatherers. If farmers are NF at a given locality and at a given moment in time, their probability of meeting one of the NHG hunter-gatherers will represent a frac- tion equal to 2(NF x NHG) of all the (NF + NHG)' possible contacts. The proba- bility y that such a contact will result in acculturation has been estimated at 0.00024 (Rendine et al., 1986). Therefore, at each generation, the NF farmers will transmit their technologies to a number S of the hunter-gatherers estimated as

where all parameters have already been de- fined.

It can be argued that the change in a par- ticular quadrat of the number of hunter- gatherers per generation might better be modelled as proportional to the product of the number of hunter-gatherers and the number of farmers, that is

S = YNHGNF (7)

where y = 1.56250 x In this expres- sion, closely resembling a formula used by

Rendine et al. (1986), the value of y was adjusted so that resultant values of S had approximately the same magnitude as in Equation 6. However, we found, over a range of test conditions, that the results were only trivially affected. Therefore, we report results for Expression 6 without redo- ing the analysis for every test condition.

Disappearance of hunting-gathering populations

In the IBD model, hunter-gatherers adopt farming at generation 40, and thus all genes of Indo-European speakers come from the genetic pool of the hunting-gathering com- munities. In the OAG model, conversely, the hunter-gatherers go extinct, and thus all genes of Indo-European speakers derive from the genes of the few first farmers of Southern Anatolia. These are the two ex- treme models, as far as the origins of Indo- Europeans are concerned. Under the other models, a certain degree of admixture is sim- ulated between the two communities at each locality, reflecting a widespread view of hu- man evolution in Europe (Cavalli-Sforza and Piazza, 1993).

Under the OAC, ATC, and GIM models, the hunters and gatherers are considered to disappear from a certain locality when their number is such that S is less than 1. Be- cause acculturation starts after a phase of population buildup for farmers, the extinc- tion of the hunting-gathering communities also proceeds as a wave, from southeast to northwest, spreading in parallel with the farming economy, but several generations later.

Long-range migratory movements

In the GIM model, three major migratory waves of Kurgan people are supposed to have introduced Indo-European languages into Europe, around 4,250 BC, 3,400 BC, and 2,900 BC, respectively (Gimbutas, 1979, 1986).

In Gimbutas' (1979) view, the westward migrations of Kurgan people in Europe in- troduced a new patriarchal culture, charac- terized by horse-riding and new warfare techniques. These cultural changes were not associated with major innovations of the

Page 11: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 119

subsistence techniques. Most of the popula- tions of Europe by then were farmers. It is therefore highly unlikely that concomitant population growth could occur. We chose to represent these waves as a flow of genes from the purported source area, north of the Black Sea and west of the Caspian Sea. For the sake of simplicity, these movements were concentrated in one generation’s time (at generations 190,224, and 244), although each of them probably lasted two centuries (Gimbutas, 1979). For each movement, we simulated replacement of 20% of the genes in the target area, with genes coming from the source area. Probably this overempha- sizes the genetic consequences of the simu- lated migratory movements.

The invading Kurgan people were not the entire population of the area between the Black and Caspian sea moving en masse; rather, they were groups of individuals be- longing to semi-nomadic tribes (Gimbutas, 1979). Accordingly, we chose to simulate their contribution to the genetic pool of the ”invaded” populations as if they were com- ing from several populations in the appro- priate zone. The location of four such popu- lations was chosen at random, and then held constant in all 2,600 simulation cycles of the GIM model. The allele frequencies of the re- cipient populations (Fig. 3) were then recal- culated as if 20% of the pre-existing indi- viduals had been replaced by immigrant Kur- gan people.

In addition, Gimbutas (personal commu- nication) pointed out to us 12 other direc- tional and potentially migratory processes that may have been important in determin- ing the current linguistic population struc- ture of Europe. These processes are summa- rized in Figure 4, and were incorporated in the GIM model. Once again, for each of them we simulated replacement of 20% of the in- dividuals of the “target” area by individuals whose allele frequency was the average al- lele frequency in the “source” area.

Decrease of population mobility after the establishment of a farming economy

Once farming populations have reached the maximum size allowed by the programs, a reduction of mobility is simulated by sim- ply halving the migration rate m’, all other

parameters being constant. Gene flow is re- duced at generation 265, i.e., 3,375 years ago, by which time all suitable regions had been colonized by early agriculturalists. The parameters of the simulation are summa- rized in Table 2.

Gene-frequency data The simulated sets of allele frequencies

were compared with a database of allele fre- quencies, which had been analyzed in vari- ous studies on Europe (Sokal et al., 1988, 1989a,b, 1990, 1991, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990; Sokal, 19911, and had been continuously up- dated. The data corresponding to popula- tions speaking languages other than Indo- European (Basque, Finnish, Estonian, Lapp, Hungarian, Turkish: Ruhlen, 1987) were dis- carded.

Twenty-six genetic systems were consid- ered. Most of them corresponded to indepen- dent loci; exceptions are ABO, MN, and Rh, for which two (or three, for Rh) systems were independently considered, each resulting from typing of alleles by different sets of an- tisera. This convention has long been fol- lowed in studies on human variation (e.g., see Lewontin, 1972). Each system is indi- cated by a letter code, preceded by a number referring to Mourant’s coding system (Mou- rant et al., 1976), except for 100HLA-A, 101/2HLA-B, 200GM, and 201Kh4, whose numerical codes were assigned in our lab- oratory. Overall, 3,481 records, and 93 alle- les or haplotypes were considered.

The number of samples available for the 26 systems varied widely, ranging from a minimum of 27 (for 5-1 LUTHERAN), to a maximum of 762 (for 1-1 ABO). Genetic dif- ferences between localities were summa- rized by 26 matrices of Prevosti’s distances (Prevosti et al., 19751, separately calculated for each system. We shall refer to these ma- trices as observed distance matrices, as op- posed to the simulated ones, generated by computer under one of the five models tested. To properly compare the two sets of data, prior to calculating genetic distances we pooled the observed frequencies of all al- leles except the one whose average fre- quency was closest to 0.5.

Page 12: Indo-European origins: A computer-simulation test of five hypotheses

120 G. BARBUJANI ET AL.

Fig. 3. Migratory movements simulated under the GIM model. The four localities whose allele frequencies are averaged, to represent the allele frequencies of the migrating population, are marked by asterisks. The re- gions affected by the three migratory waves proposed by

Gimbutas are surrounded by solid lines; the zone where Basque is currently spoken is not supposed to have been affected by these migratory episodes. Figures refer to the generation at which each migratory wave was simu- lated.

Hypothesis testing For every one of the 26 genetic systems,

each of the five simulation programs was run 100 times, yielding 13,000 simulation cycles (or realizations) in all, 2,600 for each model. Every realization resulted from de- terministic movements of individuals across the map of Europe, and random allele-fre- quency fluctuations occurring during ini- tialization and from genetic drift each gen- eration. The latter were dictated by a random number generator, and gave rise to the various replicates at generation 440. A particular realization of the OAG process

adds OAG movements to those already re- quired by the IBD model, but random allele- frequency fluctuations are the same. This is also true for the other models, each incorpo- rating the previous ones. For all models, then, random change in allele frequencies was the same, for each pixel and each gener- ation, under all five models. It was not the same, however, for the hunting-gathering and for the farming populations of the same pixel. Thus, the 500 cycles of simulation for each locus fall naturally into 100 groups, each with five matched runs, in ascending model order. Because of the matching, we

Page 13: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 121

0

Fig. 4. Twelve potentially important migratory processes considered by Gimbutas (personal commu- nication) to have been relevant in European ethnohistory, each one represented by an arrow from a “source” region to a “target” region. Figures refer to the generation at which each population movement was simulated.

could use paired statistical tests to compare models. This provided a considerable in- crease in statistical power over unpaired comparisons.

After 440 generations in each computer cycle, simulated allele frequencies were sampled from localities chosen so as to match the locations of the samples of the observed allele-frequency database. Matri- ces of Prevosti’s distances were computed from the simulated data, so as to obtain 100 simulated matrices for each matrix of ob- served genetic distances (Prevosti et al., 1975). This measure of genetic distance was

chosen for consistency with previous studies (Sokal, 1988; Sokal et al., 1993).

Simulated and observed matrices were then compared pairwise by means of Man- tel’s test of matrix association (Mantel, 1967; Smouse et al., 1986). This test com- putes the equivalent of a correlation coeffi- cient between matrices, and evaluates its significance by constructing a null distribu- tion of the test statistic. A Monte Carlo pro- cedure is employed for this purpose; rows and columns of one matrix are repeatedly permuted at random, while the other matrix is kept constant, and the test statistic is re-

Page 14: Indo-European origins: A computer-simulation test of five hypotheses

122 G. BARBUJANI ET AL.

TABLE 2. Parameters defined in the simulation

L NHG

Environment: 1 = plains; 2 = mountains; 3 = seas; 4 = Black Sea Effective population size of hunter-gatherers (114 in all models but IBD, where it is allowed to

increase up to 7560. In the OAC, ATC, and GIM models it is then reduced as an effect of the cultural transmission of farming technologies)

Effective population size of farmers (Initially equal to 0, then allowed to increase up to 7560 in all models but IBD)

Frequency of one allele among hunter-gatherers (initially sampled from a gamma distribution) Frequency of one allele among farmers for all models except IBD (Initially undefined. For the six

pixels of the nuclear zone, pF = pHG at generation 40. In the other pixels, the initial pF value reflects the proportion of immigrating farmers and of hunter-gatherers who were incorporated into the farming population, under the different models)

NF

PHG PF

a Intrinsic growth rate of the farming populations Y m Migration rate RTMAB)

RTM

Acculturation rate, i.e., rate of assimilation of hunter-gatherers by farmers

Resistance to migration between localities A and B (see Table 1) Average resistance to migration between one pixel and its adjoining pixels

calculated each time, so as to yield the de- sired null distribution.

Because each observed matrix was com- pared with 100 simulated matrices for each model tested, a procedure was needed to combine all this information. We chose to compute average Mantel correlation coeffi- cients, and to calculate Fisher’s combined probabilities (Sokal and Rohlf, 1995) from the 100 individual probabilities for each model.

Since we are looking for positive associa- tion of observed and simulated data, and a negative correlation would have no biologi- cal meaning, all tests of significance were one-tailed. However, as a further control, we also counted the number of occurrences of negative correlations that would be signifi- cant if the test had been two-tailed. This allowed us to identify the models generating allele-frequency distributions departing widely from the observed distributions.

The main purpose of this study was to compare competing hypotheses on the origin of Indo-Europeans. Because the hypotheses can be ranked, by increasing complexity, from IBD through OAG, OAC, ATC, and GIM, four painvise tests of goodness of fit were carried out, OAG versus IBD, OAC ver- sus OAG, ATC versus OAC, and GIM versus ATC. This was done taking advantage of the paired design based on the same random seeds in the simulations. We employed two different procedures: 1) a paired-compari- sons t-test, where the test statistic was the difference between the average Mantel cor-

relation coefficients, and only positive dif- ferences (indicating an improvement of the fit for the more complex hypothesis) were considered significant; 2) Wilcoxon’s signed rank test (Sokal and Rohlf, 19951, a nonpa- rametric paired-comparisons test, once again considering significant only the cases in which the fit improved for the more com- plex hypothesis.

RESULTS The average Mantel correlations for each

genetic system (Table 3) yield numerous sig- nificant agreements between observed and simulated matrices of genetic distances, for all models, although fewest with IBD. The number of significant (P < 0.05) positive av- erage correlations is maximal for the ATC model (20/26), but it is not substantially lower for OAG, OAC, and GIM (respectively, 18, 17, and 16 systems). For IBD it is only 10/26. The numerical values of the correla- tions are low despite their high level of sta- tistical significance. This is characteristic for genetic distances and is because the Mantel correlations were constrained to be linear.

Next we examine the number of cases in which individual simulation realizations gave genetic distance matrices that would appear negatively associated (P s 0.05) with the observed one, if the test had been two-tailed. A substantial number of dis- agreements, between observed and simu- lated data is evident for IBD (6 at system 4-13 RHESUS), for ATC (8 in the 1-1 ABO

Page 15: Indo-European origins: A computer-simulation test of five hypotheses

123 INDO-EUROPEAN ORIGINS

TABLE 3. A) Average Mantel correlations of observed with simulated genetic distances and B) significance levels based on ont-tailed probabilities for positive correlation combined by Fisher's method'

~

System IBD OAG OAC An: GIM

A. Mantel correlations 1-1-mo 1-2x30 2-5-MN 2-7-MN

4-1RHESU 4-13RHES 4-19RHES 5-1LUTH 6-1-KELL 6-3-KELL 7-1ABHSE 8-1DUFFY

3-1-P

36-1-HP 37-1-TF 38-1-GC 50-1-1AP 52-PGD

56-AK 63-ADA 65-TASTE 100HLA-A 101-102 200-GM 201-KM

1-1-ABO 1-2-ABO 2-5-MN 2-7-MN 3-14' 4-1RHESU 4-13RHES 4-19RHES 5-1-LUTH 6-1-KELL 6-3-KELL 7-1ABHSE 8-1DWFY

53-PGM1

B. Significance levels

36-1-HP 37-1-TF 38-1-GC 50-1-1AP 52-PGD 53-PGM1 56-AK 63-ADA 65-TASTE 100HLA-A 101-102 200-GM 201-KM

0.01294 0.05699 0.00324

-0.02886 -0.03558

0.01311 -0.02547 -0.04258 -0.00590

-0.04254 0.01789

0.01781 0.00290 0,11991 0.00592

-0.01547 -0.00965

0.04775 -0.00845 -0.00696

0.00667 0.02802 0.0 14 17

-0.00854 0.01455

-0.02816

0.00000 0.00000 0.00339 1.00000 1.00000 0.00000 0.99975 0.99996 0.52240 0.47350 1.00000 0.00942 0.94188 0.00000 0.82795 0.98856 0.94062 0.00554 0.99819 0.97077 0.05673 0.00000 0.00107 0.97945 0.04825 0.99886 0.00000

0.14800 0.10831

-0.05627 -0.02236 -0.04556 -0.01457

-0.03158 -0.05392

0.06920

0.07960 0.06145 0.01064 0.04017 0.15415 0.08658

0.12963

0.13174 0.06676 0.27423 0.17960 0.13654 0.24013 0.34928 0.07813

0.00000 0.00000 1.00000 1 .ooooo 1.Ooooo 1.00000 0.00000 0.99980 1.00000 0.0oooo 0.00000 0.03488 0.00000 0.00000 0.00000 1.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0oooo 0.00000 0.00000 0.00000

-0.05739

-0.03417

0.15233 0.09718

-0.05352 -0.02091 -0.03411 -0.01428

0.06120 -0.02899 -0.04768

0.08378 0.01510 0.02805 0.05362 0.17941 0.09142

-0.05843 0.11903

-0.03457 0.14000 0.10858 0.26493 0.16922 0.15443 0.22784 0.33330 0.09279

0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 0.00000 0.99913 1.00000 o.ooooo 0.09492 0.00002 0.00000 0.00000 0.00000 1.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 o.ooooo 0.00000 0.00000 0.00000

0.13137 0.00411

-0.04100 -0.03127

0.02409 0.00674 0.03431

-0.01635 0.00777 0.03392

-0.01550 0.05623 0.01197 0.07459 0.08117

0.10520

0.10789 0.05081 0.07537 0.07373

.0.08977 0.12228 0.14102 0.05959

-0.01637

-0.04371

0.00000 0.00489 1.00000 1.00000 0.o0000 0.00198 0.00000 0.00001 0.22014 0.00001 0.94677 0.00000 0.01885 0.00000 0.00000 0.51374 0.00000 0.99974 0.00000 0.00000 0.00000 0.00000 0.00000 0.0oooo 0.00000 0.00000 0.00000

0.01887 0.03532

-0.03231 -0.03912

0.01112 0.03130 0.02375

0.02033 0.03330

0.05420

0.08002 0.00461 0.05178 0.06548 0.01603 0.06152

0.03099 0.04040 0.01311 0.09136 0.03480

-0.08314

-0.06665

-0.01077

-0.01301

-0.03732

0.00000 0.00000 1.00000 1.00000 0.00003 0.00000 0.00000 1.00000 0.04162 0.00385 1.00000 0.00000 0.99931 0.00000 0.79836 0.00000 0.00000 0.68047 0.00000 0.99442 0.00132 0.00000 0.06326 0.00000 0.00115 0.99918 0.00000 Overall probability

'Values below 0.05 show significant positive correlation between simulated and observed genetic distances

and 11 in the 2-5 MN systems), and for GIM (26 in the 1-1 ABO, and 8 in the 4-19 RHESUS systems).

The painvise comparisons by t-tests and by Wilcoxon's signed ranks test agree in their means and significances that the most

significant increase in the resemblance of observed and simulated genetic distances occurs between IBD and OAG. In the t-tests shown in Table 4,17 systems show a signifi- cant increase of correlation, whereas in only 7 cases does the similarity decrease. These

Page 16: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL

TABLE 4. Results of paired comparisons t-tests for differences between 5 Indo-European simulation hypotheses:

124

P ( H I ) - P (H2)’ ~ ~

System OAG - IBD OAC - OAG ATC - OAC

1-1-ABO 0.13506*** 0.00432 -0.02095 1-2-ABO 0.05133*** -0.01113 - 0.09307 2-5-MN 2-7-MN 0.00650 0.00145 -0.01037 3-1-P -0.00998 0.01145*** 0.05819*** 4-1RHESU -0.02767 0.00029 0.02102*** 4-13RHES 0.09467*** -0.00800 -0.02689 4-19RHES 0,01099 0.00259 0.01265 5-1-LUTH -0.04802 0.00624 0.05545*** 6-1-KELL 0.06171*** 0.00419 -0.04987 6-3-KELL 0.10399*** -0.04634 - 0.03060 7-1ABHSE -0.00717 0.01741*** 0.02818* 8-1DUFFY 0.03727*** 0.01345*** -0.04164 36-1-HP 0.03425** 0.02525** - 0.10482 37-1-TF 0.08065*** 0.00484 -0.01026 38-1-GC -0.04192 -0.00104 0.04206*** 50-1-1AP 0.13928*** -0.01061 -0.01383 52-PGD -0.08192 -0.00040 -0.00914 53-PGM1 0.14019*** 0.00826 -0.03212 56-AK 0.07372*** 0.04182*** -0.05777 63-ADA 0.26756*** -0.00930 -0.18956 65-TASTE 0.15158*** -0.01038 -0.09549 100HLA-A 0.12237*** 0.01789*** -0.06466 101-102 0.24867*** -0.01229 -0.10556 200-GM 0.33473*** -0.01597 -0.19229 201-KM 0.10630*** 0.01466 -0.03320 Mean difference 0.07402 0.00198 -0.03662

‘Asterisks indicate significant positive differences as follows: *0.05 3 P 2 0.01, **0.01 P P > 0.001, ***0.001

-0.05951 0.00275*** 0.01252**

P.

GIM - ATC

-0.11250 0.03121*** 0.00869*

-0.00785 -0.01296

-0.01056 -0.06679

-0.00061 -0.05115 -0.00203 -0.02275

0.00543 -0.07655

0.06816*** -0.03972

0.05974*** -0.04637 -0.06382 -0.04438 -0,03333 -0.07666 -0.03092 - 0.10622 -0.09691 -0.02660

0.02456***

0.01256

figures compare with 7 systems showing sig- nificantly improved correspondence and 10 showing decreased correspondence for OAC versus OAG, 6 and 19, and 5 and 19, respec- tively, for ATC versus OAC, and GIM versus ATC. These results are reflected in the mean differences shown at the bottom of Ta- ble 4.

Using Wilcoxon’s criterion, the results do not change much, so we do not feature them as a separate table. The numbers of systems showing significance increased and any de- creased resemblance between observed and simulated data are, respectively, 17 and 7 for OAG versus IBD; 10 and 9 for OAC ver- sus OAG; 5 and 19 for ATC versus OAC; and 5 and 18 for GIM versus ATC.

We conclude, as a result of all the tests, that similarity between observed and simu- lated genetic distances increases from IBD to OAG, and, to a lesser extent, from OAG to OAC. On the contrary, it decreases as mod- els are tested in which the spread of farmers is constrained by archaeological time data (ATC), or demographic processes occurring in the late Neolithic are added (GIM). Thus

although ATC shows a significant positive correlation for the largest number of sys- tems, on the average, correlations between observed and simulated genetic distance are higher for OAC and OAG.

To test the plausibility of our simulations we also calculated FsTvalues (Wright, 1978) for both our observed gene-frequency surfaces and the simulated surfaces. The median results over all genetic systems are 0.011780 for the observed surfaces, and 0.098273, 0.10399, 0.096562, 0.056609, and 0.003211, respectively, for the simulated surfaces of models IBD, OAG, OAC, ATC, and GIM. Although the FsTvalues of the observed surfaces overlapped only slightly those of the simulated surfaces, the median of the observed data falls within the bound- aries of the medians described by the mod- els. The latter fall into 3 groups by magni- tude of FsT. These are 1) IBD, OAG, OAC; 2) ATC; and 3) GIM. Thus, the clear superior- ity of OAG and OAC over IBD cannot be shown by FST since various patterns of local- ity differentiation can yield the same F,, value.

Page 17: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 125

DISCUSSION Which model matches observed

data best?

The IBD model assumes that, in the Neo- lithic, groups of hunter-gatherers and farm- ers were not separated. The former gradu- ally turned to farming, so that there was a genetic continuity between pre- and post- Neolithic populations in Europe, and Indo- European languages spread only by cultural transmission. Allele frequency patterns gen- erated under this model resemble poorly the patterns of genetic variation observed in contemporary populations, showing that this evolutionary hypothesis does not fit with the available genetic evidence.

Resemblance between observed and simu- lated patterns is much greater for the other four models, in which farmers evolve sepa- rately from hunter-gatherers, and processes of population expansion are important. The levels of resemblance, however, do not differ much among these four models. For in- stance, the GIM model, including several population processes occurring in the last 5,000 years, does not give higher correla- tions, or significant correlations at a higher number of loci, than the OAG model, where the demographic changes prompted by the origin of agriculture are simulated in a much rougher manner.

Actually, various results of this simula- tion study indicate that the fit of simple models, such OAG and OAC, is better than that of more complex models. The Mantel correlations between observed and simu- lated genetic distances would be negative and significant in only 21 of the 2,600 cases for OAG, and in 14 cases for OAG, had tests been two-tailed. These figures compare with 89 and 105 negative significant correlations for ATC and GIM, respectively. It seems, therefore, that nothing is added to our un- derstanding of the phenomena, if we add ar- chaeological time data to constrain the spread of Neolithic farmers, and even less so if we simulate population movements in the late Neolithic. The models where farmers disperse into new areas simply because of their numbers, which increase logistically, yield patterns showing a better agreement with the observed data.

Similarly, the pairwise comparison of models show a substantial increase of fit of the OAG over the IBD model (Table 41, whereas the elements included in the ATC and GIM simulations cause a slight but evi- dent departure from the patterns observed, making them poorer fits than OAG.

A first conclusion one may draw from the results of this simulation study is that two models account best for many aspects of the contemporary genetic structure of Indo- European-speaking populations of Europe. One is the demic diffusion model, as origi- nally put forward by Menozzi et al. (1978), and associated with linguistic evidence by Renfrew (1987). Under this model, here called OAC, the two forces driving microevo- lution in Europe were population growth de- termined by farming, and dispersal accom- panied by limited population admixture between early agriculturalists (possibly pro- to-Indo-European speakers) and preexisting hunters and gatherers. The other model, OAG, is a simplified version of the demic diffusion model, in which dispersal of farm- ers does not lead to any degree of admixture with hunter-gatherers.

How plausible is the OAG model? While the OAC model has already re-

ceived support from studies focussing on its genetic (Sokal et al., 1991; Cavalli-Sforza et al., 19931, as well as linguistic and archaeo- logical, aspects (reviewed in Renfrew, 1992), what we called OAG here has not been ana- lyzed in detail so far. An apparent problem with it is, how can a model not involving admixture account for the continent-wide clines observed in Europe?

Inspection of gene-frequency maps gener- ated in this study, at various moments in time, shows that founder effects are common while farmers disperse. Founder effects are due to the limited numbers of individuals who start the farming communities in new localities. In the OAG model as we simu- lated it, most farming communities start with 8 effective individuals; but even if this number were larger, the probability for the allele in question to be lost or fixed would be substantial. Loss of genetic variation through repeated founder effects has been invoked as the likely cause of clines in sev-

Page 18: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL. 126

era1 studies on natural populations of toads in Australia (Easteal, 1988) and aquatic in- vertebrates in Canada (Boileau et al., 1992). Theoretical work on the genetic effects of colonization of previously unoccupied locali- ties (Wade and McCauley, 1988) agrees with this view.

An additional factor, increasing the likeli- hood of clines even in the absence of admix- ture between farmers and hunter-gatherers, is the Black Sea. Archaeological evidence (e.g., see Renfrew, 1991) indicates that two waves of early farmers dispersed westwards and northwards from the Near East, with the Black Sea separating them (this is why we did not allow movement of individuals through it, but only along its coasts). The two waves later converged in eastern Eu- rope, after a period of independent evolu- tion. If the same allele had been lost, or fixed, in both groups of farmers, no particu- lar pattern would result; but if founder ef- fects had had opposite consequences in the two groups, the successive admixture would initially determine a steep cline, and succes- sive gene flow would smooth it, resulting in a wide gradient (Endler, 1977).

Even under OAG, therefore, a certain role of admixture is important. But admixture, under OAG, is between different groups of farmers, who were geographically separated in part of their evolutionary history, rather than between farmers and hunter-gatherers of the same area. This interpretation em- phasizes the role both of geographical fac- tors, such as distance between regions, and of cultural barriers between sympatric com- munities of farmers and hunters-gatherers. Indeed, physical barriers are often associ- ated with genetic and linguistic change, even between Indo-European speakers (Bar- bujani and Sokal, 1990, 1991), although other evolutionary mechanisms may also ac- count for that association (Barbujani, 1991).

Genetics and Kurgan waves Introducing the three migratory waves

postulated by Gimbutas (GIM model) into the simulation, not only does not increase the correlations, but somewhat reduces them. This means that the current patterns of allele frequencies among Indo-Europeans can be explained without resorting to the

migrations of Kurgan people. This study cannot establish whether or not these mi- gration events really occurred, but, if they occurred, they did not leave a significant mark on the allele frequencies of current populations.

Renfrew (1987) argued that the cultural transformations that led Gimbutas to hy- pothesize late-Neolithic migration waves could be due to cultural contacts instead, and equated the first Indo-Europeans with the first farmers. The extensive changes in ceramics, architecture, and metallurgy oc- curring in the late Neolithic are then attrib- uted to trading and imitation; long-distance migratory movements, if any, may have been marginal. Although not proved by our simulation, this view is fully compatible with it.

This study, therefore, agrees with the main views expressed by Menozzi et al. (19781, Rendine et al. (19861, Piazza (19931, and Cavalli-Sforza et al. (1993). By contrast, the emphasis laid by the same authors on late Neolithic migrations from the Pontic steppes (Cavalli-Sforza et al., 1993) does not find support in our simulations. Among the possible causes of this discrepancy, it may be that Mantel's correlations are not sensitive enough to recognize the effects of minor pro- cesses of gene flow, such as those presum- ably occurring in the late Neolithic. Alterna- tively, however, or in addition, one should consider the possibility that principal com- ponents associated with low eigenvalues re- flect, at least in part, artificial gradients due to data interpolation. This may be the case for areas where population samples are sparse, such as most of eastern Europe. For example, the Caucasus seems to show clinal variation in the first and third principal components of Cavalli-Sforza et al. (1993), but a detailed genetic study shows that clines are very uncommon there (Barbujani et al., 1994).

Our evaluation of the Gimbutas model should be revised if evidence could be pro- vided that the spread of the Kurgan people was accompanied by an increase in popula- tion sizes larger than that simulated by us. A certain level of ambiguity exists about this, as the movement of people from the Pontic steppes that Gimbutas (1979) hy-

Page 19: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 127

pothesized is called a “population expan- sion” by Cavalli-Sforza et al. (1993). These authors seem to suggest that, because of the warfare technologies associated with it, larger populations could be supported in the regions affected. This aspect remains to be explored, and we do not have evidence for or against this view. However, even if this had been the case, the increases in population sizes prompted by the beginning of food pro- duction seem to have been much larger than those associated with new war technologies (see Ammerman and Cavalli-Sforza, 1984). Unless European populations increased dra- matically in size between 6,000 and 5,000 years ago, as they did with the arrival of the new farming technologies, we conclude that the long-distance migrations postulated by Gimbutas remain an unnecessary element in the evolution of Indo-European-speaking populations, as reconstructed from the com- parison of theoretical models and gene-fre- quency data.

Besides, early farmers expanded into ar- eas of low population density, where few im- migrants could substantially modify the ge- netic build up of local populations; but this was not the case for late Neolithic groups, who invaded regions already occupied by large farming communities. Simulations of genetic processes based on the coalescent approach (see Hudson, 1990) show that pat- terns of genetic variation do not tend to change much after a demographic expansion (Harpending, 1994; Rogers and Jorde, 1995). Successive population movements can smooth out the gradients and blur some patterns, but are unlikely to leave a signifi- cant mark on allele frequencies.

Are parasites responsible for clines? Recent evolutionary models (reviewed in

Ladle, 1992) indicate that new genotypes entering an area could be resistant to the parasites that are already adapted to the common resident genotypes. The new geno- types would then increase in frequency, un- til the parasites adapt to them. A selective mechanism of this type, combined with gene flow, might have been important in deter- mining the European clines of allele fre- quencies; models may be envisaged whereby a form of frequency-dependent selection,

rather than limited admixture or founder effects, leads to clinal variation of gene fre- quencies (e.g., see Hedrick, 1986). It is in- triguing to note that the hypothesis of demic diffusion from the Near East was initially developed to account for clines a t the histo- compatibility loci, HLA-A and HLA-B (Menozzi et al., 1978; Sokal and Menozzi, 19821, and that the most significant evi- dence for clines spanning Eurasia has been found at the glyoxalase locus (Barbujani, 19871, which is linked with HLA on chromo- some 6, in a region of extensive linkage dise- quilibrium (Hedrick et al., 1986).

However, this view, although compatible with the gradients existing at the HLA and linked loci, can hardly account for the pat- terns of variation observed among Indo-Eu- ropean speakers at other, independently in- herited, loci. Had the resistance to parasites been the main cause of clines in Europe, one would expect isolation by distance patterns at most loci not involved in tissue recogni- tion, which is not the case (Sokal et al., 1989a; and this study). On the contrary, the nearly parallel gradients observed for many independent alleles suggest that an evolu- tionary pressure affecting the entire genome and not merely part of it, i.e., gene flow, played a major evolutionary role (Slatkin, 1985,1987).

Relation to other work Diakonov (1984; cited in Redrew, 1987)

listed what he called the essential questions concerning the origins of Indo-European speakers: Who migrated? Why? How many of them were there? Was it actually a migra- tion of people, or rather the transfer of a language from one population to another? The present study may contribute to an- swering some of these questions. Our results show that migrations in the late Neolithic, which have been inferred from changes in the material culture of eastern and central Europe (Gimbutas, 19791, are not reflected in the current genetic structure of Indo-Eu- ropean-speaking populations. Conversely, the correlations of observed and simulated data are positive and significant only if we simulate dispersal of farmers from the Le- vant by demic diffusion. The results of this study are, therefore, compatible with the

Page 20: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL. 128

view of identifying proto-Indo-European speakers with the first Neolithic farmers (Renfrew, 1987).

These findings appear at first glance to contradict the results of Sokal et al. (1992) who use the same genetic dataset. These au- thors concluded that geographic proximity explained a substantial amount of the ob- served correlation between genetic and Indo- European linguistic distances. However, after allowing for geographic distances, sta- tistically significant partial correlations re- main, which are not explained by distances describing the origin of agriculture by demic diffusion, Renfrew’s hypothesis (as de- scribed by his postulated transitions subse- quent to the origin of agriculture), or Gim- butas’ hypothesis. But note that the study by Sokal et al. (1992) was based on the spa- tial pattern of correlations between genetic and linguistic distances, whereas this study examines genetic variation patterns only. The simulations reported here do not con- sider the patterns of linguistic diversity ob- served in Europe today. Our findings are therefore compatible as well with a simpler model in which the observed genetic pat- terns reflect the process of demic diffusion accompanying the origin and spread of agri- culture in Europe, but these populations are not the proto-Indo-Europeans. Note that the new findings provide further support for the demic diffusion hypothesis of Ammerman and Cavalli-Sforza (1984). Statistical tests of this hypothesis were carried out by Sokal et al. (1991) using origin-of-agriculture dis- tances constructed from observed dates of the onset of the Neolithic. In the present study these distances were constructed from the simulation results using simple models €or the spread of farming populations. It is reassuring that these models yield results in agreement with observed genetic patterns as had already been noted by Rendine et al. (1986).

Our work also offers an answer to Dia- konov’s second question; presumably, proto- Indo-European speakers dispersed because their increase in numbers forced them to look for new suitable land. A study of genetic variation such as this cannot provide reli- able estimates of population sizes in the re-

mote past, and even less so of numbers of migrants. However, even a very small num- ber of dispersing individuals may yield pat- terns that correlate with the observed ones, as seen for the OAG model. The results of this study are compatible both with a com- plete replacement of pre-existing hunter- gatherers by Near Eastern farmers (the OAG model), and with the more conven- tional view that this replacement was only partial, and that hunter-gatherers contrib- uted to some extent to the genetic pool of Indo-European speaking populations (the OAC model). But the view whereby lan- guage replacement was largely independent of population movements (Zvelebil and Zvelebil, 1988) fails to account for the large- scale clinal patterns matching the direction of the spread of agriculture observed in Eu- rope, and therefore does not seem easy to reconcile with the available genetic evi- dence.

In principle, one could also envisage a sce- nario whereby expanding farmers deter- mined the main genetic characteristics of European populations, whereas Indo-Euro- pean languages spread in a later moment, and mainly by a cultural process. Although this cannot be ruled out, it does not seem the best explanation available for the current patterns of genetic and linguistic variation. The model of Neolithic demic diffusion pro- posed by Renfrew (1991) predicts the exist- ence of clines in three linguistic groups which are supposed to have expanded to- gether with Indo-European. Many such clines have actually been observed among speakers of Altaic and Elamo-Dravidian languages (Barbujani and Pilastro, 1993; Barbujani et al., 1994). Moreover, some of these clines disappear if different linguistic groups are jointly analyzed (Barbujani and Pilastro, 1993). Although not a proof, these findings suggest that linguistic affiliation is the key to deciphering gene-frequency pat- terns in much of Eurasia. In the areas where Indo-European, Elamo-Dravidian, and Al- taic languages are spoken, linguistic, ge- netic, and archaeological evidence can jointly be accounted for by a demic expan- sion from the Near East. Clearly, some lan- guages may also have changed by cultural

Page 21: Indo-European origins: A computer-simulation test of five hypotheses

129 INDO-EUROPEAN ORIGINS

contact (some examples are well docu- mented: Renfrew, 1991); however, the over- lap between large clines and linguistic areas suggests that cultural transmission had a lesser impact than demographic expansions.

CONCLUSIONS AND FUTURE STUDIES The replacement of a food-collecting econ-

omy by one of food production has been a complex process. In the Mediterranean area, for instance, farming replaced hunting- gathering very rapidly in certain regions, but gradually in other regions where the two economies coexisted for centuries (Barker, 1988). However, this does not seem to have deeply affected gene-frequency patterns, since the ATC model, where farmers spread at an irregular rate, did not result in a greater correspondence of observed and sim- ulated genetic distances than OAG and OAC. This may mean that the variable rate of spread of Neolithic agriculture depended on factors in the physical environment. Once these factors are incorporated in the model (OAG or OAC), simulated and observed al- lele frequency patterns resemble each other. At any rate, the models put forward and tested in our study should doubtless be re- garded as approximate. Archaeological, lin- guistic, and genetic studies of individual populations will certainly add details to our reconstruction of European history; complex models of the Indo-European expansion, such as the one outlined by Sherratt (1988), may be reformulated in such a way as to become comparable with genetic data. Nev- ertheless, this study indicates that not all hypotheses on the origins of Indo-Europeans account equally well for the available ge- netic evidence.

The hypotheses whereby Indo-Europeans entered Europe as the first farmers show the best fit. There seems to be no cogent reason to think that the farmers’ spread was due to factors other than their tendency to grow in numbers, thanks to the increased resources available. Both incomplete admix- ture with hunter-gatherers, and founder ef- fects occurring in the expansion of farmers, seem to account satisfactorily for the ob- served patterns of genetic differentiation

among Indo-European speakers. These models are not mutually exclusive, and it may well be that both phenomena were im- portant, a t different localities. Conversely, there is no evidence for a major evolutionary role of migratory phenomena occurring in the late Neolithic. These phenomena may have affected population sizes and allele fre- quencies on a local scale, but the large-scale structure of Indo-European speaking popu- lations seems basically to reflect Neolithic demic diffusion.

New archaeological evidence will cer- tainly be valuable for describing in detail times and modes of Neolithic expansions, on which our ideas are certainly simplistic at the moment. On the genetic side, new allele frequency data on previously neglected pop- ulations are unlikely to substantially alter the picture, since, for most loci, the data sets already include hundreds of samples. Rather, collection and analysis of mtDNA may offer a new perspective on human evo- lution in Europe, as it has already done for other areas, including Oceania (Stoneking et al., 1990) and the Americas (Ward et al., 1991,1993; Torroni et al., 1992; Wallace and Torroni, 1992).

ACKNOWLEDGMENTS This is contribution No. 913 in Ecology

and Evolution from the State University of New York at Stony Brook. We thank Prof. Marija Gimbutas and Lord Renfrew for their collegial cooperation in this work. We are indebted to Barbara A. Thomson for techni- cal assistance. Jeff Walker computed the 3’-statistics and Donna DiGiovanni word- processed the manuscript. Part of the com- putation was carried out on the Cornell National Supercomputer Facility. This research was supported by National Science Foundation grant BNS 9117350. This paper was prepared while Robert R. Sokal was a Fellow at the Center for Advanced Study in the Behavioral Sciences at Stanford, Cali- fornia. He is grateful for financial support provided by the National Science Founda- tion grant SES-9022192, and by sabbatical funds from his home institution. Guido Bar- bujani wishes to acknowledge fruitful dis-

Page 22: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL. 130

cussions with Professors Luca Cavalli- Diakonov IM (1984) On the original home of the speak- Sforza, H~~~ ~ ~ ~ ~ ~ ~ d i ~ ~ , ers of Indo-European. Soviet Anthropol. Archaeol. 23:

5-87. Easteal S (1988) Range expansion and its genetic conse- Italo Scardovi, and Michael Turelli.

quences in populations of the giant toad, Bufo rnari- nus. Evol. Biol. 23:4%84.

Eisen MM (1979) Mathematical Models in Cell Biology and cancer Chemotherapy, ~ ~ ~ l i ~ : springer, Clines. Princeton, New Jersey: Princeton University press.

(1940) On the logistic law of growth and its empirical verifications in biology, Ada Biotheor, 5: 51-66.

Gimbutas M (1979) The three of KurKan people into Old Europe, 4500-2500 B,C, Arch, Suisses An- thropol. Gen. 43~113-137.

Gimbutas M (1986) Remarks on the ethnogenesis of the Indo-Europeans in Europe. In Bernhard and A

R ~ ~ ~ ~ ~ ,

LITERATURE CITED Ammerman AJ, and Cavdli-Sforza LL (1971) Measur-

ing the rate Of spread Of in Europe' Man Endler JA (1977) Geographic Variation, Speciation, and 6:674-688.

Ammerman AJ, and Cavalli-Sforza LL (1984) The Neo- lithic Transition and the Genetics of Populations in Europe. Princeton, New Jersey: Princeton University Press.

Barbujani G (1987) Diversity of Some gene frequencies in European and Asian populations. 111. Spatial corre- logram analysis. Ann. Hum. Genet. 51:345-353.

us about human microevolution? Trends Ecol. Evol. 6:151-156.

Feller

BarbuJani (1991) What do languages

Barbujani G, Jacquez GM, and Ligi L (1990) Diversity of Kandler-P&son (eds.): Ethnogenese Europaischer Volker, Stuttgart: Gustav Fischer Verlag, pp. 5-20. Some gene frequencies in European and

tions. V. Steep multilocus clines. Am. J. Hum. Genet.

Barbujani G, Nasidze IS, and Whitehead GN (1994) Ge- netic diversity in the Caucasus. Hum. Biol. 66:639- 668.

Barbujani G, and Pilastro A (1993) Genetic evidence on origin and dispersal of human populations speaking

Acad. Sci. U. S. A. 90:4670-4673.

Guglielmino CR, Piazza A, Menozzi p, and Cavalli- Sforza LL (1990) Uralic genes in Europe. Am. J . Phys.

Harding RM, and Sokal RR (1988) Classification of the European language families by genetic distance. Proc. Natl, Acad. sci, u, s, A. 85,.937c9372,

growth in a low resolution mitochondria1 DNA mis- match distribution. Hum. Biol. 66:591-600.

growth during the Neolithic. Curr. Anthropol. 14: 535-543.

New York: Academic Press.

Hedrick pw (1986) Genetic P b o V h i s m in heteroge- neous environments: A decade later. Annu. Rev. Ecol. 'yst. 17t535-566.

Hedrick PW, Thomson G, and Klitz W (1986) Evolution- ary genetics: HJA as an system. In s Kar- lin and E Nevo (eds.): Evolutionary Processes and Theory. Orlando, Florida: Academic Press, pp. 503- '06.

Hudson RR (1990) Gene genealogies and the coalescent Process. In FutuWa and J AntonovicS (eds.): OX- ford Surveys in Evolutionary Biology, Vol. 7. Oxford, U K Oxford University Press, pp.

Jorde LB (1980) The genetic structure of subdivided hu- man Populations. A review. In JH Mieke and MH Crawford (eds.): Current Developments in Anthropo- logical Genetics, Theory and Methods. New York: Ple- numt pp' 135-208'

Kaiser M, and Shevoroshkin V (1988) Nostratic. Annu. Rev, Anthropol. 17:30%329.

Keyfitz N (1977) Introduction to the Mathematics of Populations with Revisions. Reading, UK: Addison- Wesley.

Kimura M, and Weiss GH (1964) The stepping stone model of Population structure and the decrease of ge- netic correlation with distance. Genetics 49561-576.

Ladle RJ (1992) Parasites and sex: Catching the red queen. Trends Ecol. Evol. 7:405408.

47:a67-875. 83:57-68.

languages of the Nostratic macrofamily, pro,., Natl, Harpending HC (1994) Signature Of ancient population

Barbujani G, and Sokal RR (1990) Zones of sharp ge- netic change in Europe are also language boundaries, HaMan FA (1973) On the mechanisms Of population Proc. Natl. Acad. Sci. U. S. A. 87:1816-1819.

Barbujani G, and Sokal RR (1991) Genetic population structure of Italy. XI. Physical and cultural barriers to gene flow. Am. J . Hum. Genet. 48:398-411.

Barker G (1985) Prehistoric Farming in Europe. Cam- bridge: Cambridge University Press.

Barker G (1988) Comment on "Archaeology and Lan- guage," by C. Renfrew. Cum. hthropol . 29:44H49.

Bertranpetit J, and Cavalli-Sforza LL (1991) A genetic reconstruction of the history of the population of the Iberian peninsula. Ann. Hum. Genet. 5551-67.

Boileau MG, Hebert PDN, and Schwartz SS (1992) Non- equilibrium gene frequency divergence: Persistent founder effects in natural populations. J. Evol. Biol. 5:25-39.

Cavalli-Sforza LL (1988) The Basque population and ancient migrations in Europe. Munibe 6:12%137.

Cavalli-Sforza LL, Menozzi p , and Piazza A (1993) ~~~i~ and human evolution, science 259: 639446.

Cavalli-Sforza LL, Minch E, and Mountain J L (1992) Coevolution of genes and languages revisited, P ~ c . Natl. Acad. Sci. U. S. A. 89:5620-5624.

Cavalli-Sforza LL, and Piazza A (1993) Human genomic diversity in Europe: A summaw of recent research and prospects for the future. Eur. J . Hum. Genet. l:3-18.

Cavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J (1988) Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Proc. Natl. Acad. Sci. U. S. A. 85:6002-6006.

FA (lg81)

Page 23: Indo-European origins: A computer-simulation test of five hypotheses

INDO-EUROPEAN ORIGINS 131

Lewontin RC (1972) The apportionment of human diver- sity. Evol. Biol. 6:381-398.

Manly BFJ (1991) Randomization and Monte Carlo Methods in Biology. London: Chapman and Hall.

Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res. 27: 209-220.

Menozzi P, Piazza A, and Cavalli-Sfona LL (1978) Syn- thetic maps of human gene frequencies in Europeans. Science 201:786-792.

Mourant AE, Kopec AC, and Domaniewska-Sobczak K (1976) The Distribution of the Human Blood Groups and Other Polymorphisms. Oxford Oxford University Press.

Nei M (1987) Molecular Evolutionary Genetics. New York Columbia University Press.

Piazza A (1993) Who are the Europeans? Science 260:

Piazza A, Cappello N, Olivetti E, and Rendine S (1988) The Basques in Europe: A genetic analysis. Munibe 6:168-176.

Press WH, Flannery BF, Teukolsky SA, and Vetterling WT (1986) Numerical Recipes. Cambridge: Cam- bridge University Press.

Prevosti A, Ocana J , and Alonso G (1975) Distances between populations of Drosophila suboscura based on chromosome arrangement frequencies. Theor. Appl. Genet. 45:231-241.

Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Sim- ulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128:

Renfrew C (1987) Archaeology and Language. London: Jonathan Cape.

Renfrew C (1989) Models of change in language and archaeology. Trans. Philol. SOC. 87t103-155.

Renfrew C (1991) Before Babel: Speculations on the ori- gins of linguistic diversity. Cambridge Arch. J . 1:s 23.

Renfrew C (1992) Archaeology, genetics and linguistic diversity. Man N.S. 27:445478.

Rogers AR, and Jorde LB (1987) The effect of non-ran- dom migration on genetic differences between popula- tions. Ann. Hum. Genet. 51:169-176.

Rogers AR, and Jorde LB (1995) Genetic evidence on modern human origins. Hum. Biol., in press.

Ruhlen M (1987) A Guide to the World’s Languages. Vol. 1: Classification. London: Edward Arnold.

Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A method for the detection of a demic cline. In NE Mor- ton (ed.): Genetic Structure of Populations. Honolulu: University of Hawaii Press, pp. 12%135.

Sherratt A (1988) Comment on -Archaeology and Lan- guage,” by C. Renfrew. Curr. Anthropol. 29:45-63.

Slatkin M (1985) Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393430.

Slatkin M (1987) Gene flow and the geographic struc- ture of natural populations. Science 236:787-792.

Smouse PE, Long JC, and Sokal RR (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35:627432.

1767-1769.

681-706.

Sokal RR (1988) Genetic, geographic, and linguistic dis- tances in Europe. Proc. Natl. Acad. Sci. U. S. A. 85:

Sokal RR (1991) Ancient movement patterns determine modern genetic variances in Europe. Hum. Biol. 63: 589-606.

Sokal RR, Harding RM, and Oden NL (1989a) Spatial patterns of human gene frequencies in Europe. Am. J . Phys. Anthropol. 80:267-294.

Sokal RR, Jacquez GM, Oden NL, DiGiovanni D, Fal- setti AB, McGee E, and Thomson BA (1993) Genetic relationships of European populations reflect their ethnohistorical affinities. Am. J. Phys. Anthropol. 91: 55-70.

Sokal RR, and Menozzi P (1982) Spatial autocorrelation of HLA frequencies in Europe support demic diffusion of early farmers. Am. Nat. 119:l-17.

Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Thomson BA, Vaudor A, Harding RM, and Barbujani G (1990) Genetics and language in European popula- tions. Am. Nat. 135:157-175.

Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J , and Vaudor A (1989b) Genetic differences among lan- guage families in Europe. Am. J . Phys. Anthropol. 79:489-502.

Sokal RR, Oden NL, and Thomson BA (1988) Genetic changes across language boundaries in Europe. Am. J. Phys. Anthropol. 76:337-361.

Sokal RR, Oden NL, and Thomson BA (1992) Origins of the Indo-Europeans: Genetic evidence. Proc. Natl. Acad. Sci. U. S. A. 89:7669-7673.

Sokal RR, Oden NL, and Wilson C (1991) Genetic evi- dence for the spread of agriculture in Europe by demic diffusion. Nature 351:143-145.

Sokal RR, and Rohlf FJ (1995) Biometry, 3rd ed. New

Bhatia K, and Wilson AC (1990) Geographic variation in human mitochondrial DNA from Papua New Guinea. Genetics 124:717-733.

Torroni A, Schurr TG, Yang C-C, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, and Wallace DC (1992) Na- tive American mitochondrial DNA analysis indicates that the American and NaDene populations were founded by two independent migrations. Genetics 130r153-162.

Wade MJ, and McCauley DE (1988) Extinction and re- colonization: Their effect on the genetic differentia- tion of populations. Evolution 42:995-1005.

Wallace DC, and Torroni A (1992) American Indian pre- history as written in the mitochondrial DNA A re- view. Hum. Biol. 64t403-416.

Ward RH, Frazier BL, Dew-Jager K, and Paabo S (1991) Extensive mitochondrial diversity within a single Am- erindian tribe. Proc. Natl. Acad. Sci. U. S. A. 88:872& 8724.

Ward RH, Redd A, Valencia D, Frazier BL, and Paabo S (1993) Genetic and linguistic differentiation in the Americas. Proc. Natl. Acad. Sci. U. S. A. 90: 1066% 10667.

Wijsman EM, and Cavalli-Sforza LL (1984) Migration and genetic population structure with special refer- ence to humans. Annu. Rev. Ecol. Syst. 15:279-301.

1722-1726.

Page 24: Indo-European origins: A computer-simulation test of five hypotheses

G. BARBUJANI ET AL. 132

Wright S (1969) Evolution and the Genetics of Popula- tions. Vol. 2. The Theory of Gene Frequencies. Chi- cago: University of Chicago Press.

Wright S (1978) Evolution and the Genetics of Popula- tions. Vol. 4. Variability within and among Natural Populations. Chicago: University of Chicago Press.

Zeven AC (1980) The spread of bread wheat over the old

world since the Neolithicum as indicated by its geno- type for hybrid necrosis. J. d’Agric. Trad. Bota. Appl. 27:25-53.

Zvelebil M, and Zvelebil KV (1988) Agricultural transi- tion and Indo-European dispersal. Antiquity 62574- 583.