AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 95:137-154 (1994)

Genetic Variation in North Africa and Eurasia: Neolithic Demic Diffusion vs. Paleolithic Colonisation

GUIDO BARBUJANI, ANDREA PILASTRO, SILVIA DE DOMENICO, AND COLIN RENFREW
Dipartimento di Scienze Statistiche, Università di Bologna, I-40126 Bologna, Italy (G.B.); Dipartimento di Biologia, Università di Padova, I-35121 Padova, Italy (G.B., A.P., S.D.D.); Department of Archaeology, University of Cambridge, Cambridge CB2 3DZ, UK (C.R.)

KEY WORDS Gene frequencies, Languages, Gene flow, Cultural transmission, Spatial autocorrelation

ABSTRACT The hypothesis that both genetic and linguistic similarities among Eurasian and North African populations are due to demic diffusion of neolithic farmers is tested against a wide database of allele frequencies. Demic diffusion of farming and languages from the Near East should have determined clines in areas defined by linguistic criteria; the alternative hypothesis of cultural transmission does not predict clines. Spatial autocorrelation analysis shows significant gradients in three of the four linguistic families supposedly affected by neolithic demic diffusion; the Afroasiatic family is the exception. Many such gradients are not observed when populations are jointly analyzed, regardless of linguistic classification. This is incompatible with the hypothesis that major cultural transformations in Eurasia (diffusion of related languages and spread of agriculture) took place without major demographic changes. The model of demic diffusion seems therefore to provide a mechanism explaining coevolution of linguistic and biological traits in much of the Old World. Archaeological, linguistic, and genetic evidence agree in suggesting a multidirectional process of gene flow from the Near East in the neolithic. However, the possibility should be envisaged that some allele frequency patterns can predate the neolithic and depend on the initial spread of Homo sapiens sapiens from Africa into Eurasia. © 1994 Wiley-Liss, Inc.

The transition from a hunter-gatherer economy to one of food production (the neolithic transition) was, in many regions of the world, one of the most momentous episodes in human history. It was associated in many cases with a very significant population increase (Hassan, 1973). Two contrasting models have been put forward for the inception of a farming economy in the Near East and in Europe in areas beyond the nuclear zone (Harlan, 1971), where the initial domestication of the basic food plants and livestock took place.

The first model, one of demic diffusion (Ammerman and Cavalli-Sforza, 1973, 1984), is based on the observation that when technological advances make food supplies abundant, population sizes increase. To support greater numbers of individuals, the population will then need to expand into new territories. If expansion is accompanied by limited admixture with previous residents, and if the new technologies are not passed to them, the immigrants and their descendants will be able to further increase in numbers and spread into adjacent areas. In the course of time, this will bring their genes to distant locations. A combination of population growth and gradual dispersal may have caused parallel cultural and genetic transformations in the neolithic. The technique of farming is then supposed to have been carried to new areas by the local movements of peasants from the farming region of high population density to adjacent areas of lower density where farming was not yet practised. In this way, the new farming technique may have spread across the land mass together with the population descended from the original farmers in the nuclear zone.

Received February 16, 1993; accepted March 21, 1994.
Address reprint requests to Guido Barbujani, Dipartimento di Biologia, via Trieste 75, I-35121 Padova, Italy.

In the alternative, acculturation model (Zvelebil and Zvelebil, 1988), the indigenous population outside the nuclear area acquires the farming techniques, along with the plant and animal domesticates, by contact. The indigenous population may have increased, without significant population input from the first farmers of the nuclear zone or their descendants.

Under the demic diffusion model, different phenomena of evolutionary relevance may have taken place, depending on whether or not the regions into which early farmers moved were already populated. If the expansion of farmers led to contacts with other groups, a certain degree of admixture presumably ensued. If, on the contrary, early farmers expanded into unoccupied areas, or displaced the previous residents without admixture, founder effects presumably occurred. In both cases, gradients of allele frequencies are to be expected (Sgaramella-Zonta and Cavalli-Sforza, 1973; Menozzi et al., 1978; Rendine et al., 1986; Cavalli-Sforza et al., 1993; for a theoretical treatment, see Endler, 1977). By contrast, spatial structuring of allele frequencies is not predicted by an acculturation model, or by a model in which gene flow is not accompanied by population growth (Menozzi et al., 1978; Cavalli-Sforza et al., 1993).
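The admixture case can be illustrated with a deliberately simple sketch. It is not the simulation model of the studies cited above: it assumes a single chain of localities, one allele, a fixed admixture fraction at every step of the advance, and no drift or founder effects. The function name `admixture_cline` and all parameter values are hypothetical.

```python
def admixture_cline(p_farmer, p_resident, admixture, n_demes):
    """Allele frequency along a chain of localities away from the nuclear zone.

    Toy assumption: at each step of the advance, the expanding population
    takes a fraction `admixture` of its genes from local residents.
    """
    freqs = [p_farmer]
    for _ in range(n_demes - 1):
        previous = freqs[-1]
        freqs.append((1 - admixture) * previous + admixture * p_resident)
    return freqs


# Hypothetical values: farmers start at 0.8, residents at 0.2, 25% admixture per step.
cline = admixture_cline(p_farmer=0.8, p_resident=0.2, admixture=0.25, n_demes=5)
print([round(p, 3) for p in cline])
```

Under these assumptions the frequency moves steadily from the farmers' value toward the residents' value with distance, which is the kind of gradient the text refers to. Setting `admixture` to 0 corresponds to displacement without admixture; in this toy model that gives a flat profile, because the founder effects invoked in the text for that case are not represented.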

The dates of origin of farming at various localities in Europe, as estimated from the archaeological record, correlate with the frequencies of several alleles in current European populations (Menozzi et al., 1978; Sokal and Menozzi, 1982; Sokal et al., 1991). This strongly suggests that the introduction of agriculture and animal breeding was not the result simply of cultural transmission but in fact involved significant population movements, very possibly in one or the other form of demic diffusion. Accordingly, a large fraction of the allele frequency gradients currently observed in Europe (Sokal et al., 1989a, 1992; Piazza, 1993) are interpreted as a consequence of the spread of alleles of Near Eastern origin. But little is known to date on the other regions bordering on the Near East.

Under the demic diffusion model, the dispersing farmers should have propagated their language or its derivative into the new territories, resulting in large-scale linguistic similarities. But if, on the contrary, farming spread mostly by acculturation, no major linguistic changes need be envisaged, although language change can occur in such cases without any significant movement of population (Ehret, 1988). The existence of large-scale language families whose distribution intersects with the nuclear farming area in the Near East gives support, along with other arguments, to the hypothesis that dispersing early farmers from the Near East brought with them not only the techniques of farming and the basic plant and animal domesticates, but also the proto-language ancestral to the languages of the families in question (Cavalli-Sforza, 1988; Renfrew, 1991). This has been argued for Europe and the Indoeuropean language family (Renfrew, 1987); for north Africa and the Afroasiatic languages; for southern Iran, India, and Pakistan (the hypothetical original distribution of the Elamo-Dravidian languages); and possibly for Turkestan and central Asia (the early Altaic languages). Such a hypothesis may cast light on two puzzling aspects of the large-scale structure of human populations. First, the clines described at various loci extend far beyond the limits of agricultural dispersal in Europe (Piazza et al., 1981; Piazza and Menozzi, 1983; Barbujani, 1987a, 1988; Cavalli-Sforza et al., 1993); second, large genetic differences are often observed at linguistic boundaries (Barbujani and Sokal, 1990; Barbujani et al., 1990, 1994; Guglielmino et al., 1990; Aguirre et al., 1991; Bertranpetit and Cavalli-Sforza, 1991).

The model of neolithic demic diffusion sets out to explain linguistic affinities of populations without resorting to undocumented migrations between them. Cavalli-Sforza and Renfrew envisage a gradual, multidirectional spread of neolithic farmers from the nuclear area in three (Cavalli-Sforza, 1988) or four (Renfrew, 1991) episodes. In each episode, the language of the first farmers replaced preexisting languages in the area in question, giving rise to a distinct language family: Indo-European, Afroasiatic, Elamo-Dravidian, and [in Renfrew's (1991) view] Altaic, respectively. For a theoretical outline of how a process of this kind may occur, see Sherratt and Sherratt (1988).

This proposal is in itself strengthened by the recognition of deep similarities among the four language families in question (Dolgopolsky, 1987; Kaiser and Shevoroshkin, 1988). The hypothetical language notionally ancestral to all of them has been termed Nostratic [the term Eurasiatic, proposed by Greenberg (1987), entails a comparable but different classification (Ruhlen, 1991, 1992)]. The Nostratic hypothesis thus classes the four language families in question within a larger, Nostratic, macrofamily. It suggests on linguistic grounds that the original homeland for the proto-Nostratic language will have been in the Near East, namely, in what, on quite independent archaeological arguments, has been identified as the area from which early farming spread.

The Nostratic hypothesis is not a necessary component of the view presented here, that the current distribution of these major language families is in large measure due to the dissemination of relevant proto-languages by demic diffusion. On the other hand, the geographical implications of our argument would lend some measure of support to the Nostratic view. Although the extent and significance of the relationships among the relevant language families (Morpurgo-Davies, 1989; Nocentini, 1993), and between languages and genes (Bateman et al., 1990), are still controversial, the idea of multidirectional demic diffusion in the neolithic can offer a clue to understanding current patterns of genetic variation in Eurasia and North Africa. In turn, the identification of clines within certain groups of linguistically related populations would provide biological support for the Nostratic hypothesis.

To test the hypothesis of demic diffusion of proto-Nostratic speakers, or NDD hypothesis, we studied genetic variation in groups of populations postulated to derive from a common ancestor living in the Near East (Renfrew, 1991). In a companion paper we show that these groups display larger genetic variances than comparable groups, and that their allele frequencies correlate with their geographic distance from the Near East (Barbujani and Pilastro, 1993). Here we employ spatial autocorrelation statistics to summarise geographic variation of allele frequencies, and we discuss the results obtained in the light of other biological, linguistic, and archaeological evidence.

DETAILS ON THE NDD MODEL

The four episodes of agricultural dispersal associated with farming origin in the Near East, as hypothesized by Renfrew, are seen in Figure 1. Such a dispersal was proposed as the main, certainly not the only, phenomenon resulting in the current levels of population differentiation. During the neolithic transition and after its end [dated in Europe at approximately 5,000 years ago (Renfrew, 1987)], other migratory movements took place and language replacement may have occurred for other reasons. Such processes certainly modified the simple and parallel patterns of linguistic and genetic variation expected under demic diffusion. But still, if the neolithic transition was important in determining the genetic population structure of a large section of the Old World, we expect to be able to recognize at least some of its consequences.

Before attempting to do so, it was necessary to correctly classify, with respect to the hypothesis being tested, some populations that adopted a new language independently of demic diffusion. Indeed, episodes of language replacement accompanied by limited genetic change [as in the three models listed by Renfrew (1989, 1992)] can decouple biological and linguistic evolution (Cavalli-Sforza et al., 1992).

Historical data on three comparatively recent episodes of language replacement (the spread of Indoeuropean languages in Asia, of the Turkic subgroup of Altaic languages in Western Asia, and of Arabic in North Africa) suggest that they did not entail major changes in the biological buildup of the populations involved (Renfrew, 1987, 1991). These were not cases of demic diffusion, but of elite dominance (see Renfrew, 1987), associated with gene flow only to a very limited extent. Instead of trying to guess the relative proportions of former residents and recent immigrants in each locality at the end of each episode, we chose to treat these as purely cultural transformations. Negligible demographic changes were assumed to have occurred while the languages changed. This is certainly an overly simplified assumption, but it is conservative. If the NDD model proves compatible with the data, that will be despite this assumption, certainly not because of it.

Fig. 1. Four waves of Neolithic agricultural dispersal, as proposed by Renfrew (1991). For the purposes of this paper, Indoeuropean speakers of Asia and Elamo-Dravidian speakers belong to the same group. Languages coded as follows: IE, Indoeuropean (Europe); AA, Afroasiatic; ID, Indoeuropean (Asia) and Elamo-Dravidian; AL, Altaic.

Accordingly, all current speakers of Indoeuropean or Elamo-Dravidian languages from Iran and the Indian subcontinent were considered to derive from farmers who immigrated along the Southeastern dispersal route (Fig. 1). In this way we assume that when Elamo-Dravidian languages were replaced by Indoeuropean around 2,000 BC, allele frequencies were only marginally affected. Similarly, Turkic speakers of Anatolia were considered as descendants of indigenous farmers who adopted Turkic only in the early second millennium AD; hence, they were not pooled with the other Altaic speakers. Finally, Arabic speakers of North Africa were excluded from analysis whenever a sample of Berber speakers was available at the same locality; but when this was not the case, their allele frequencies contributed to the Afroasiatic dataset. In addition, recent immigrants (e.g., Jews in Europe, Parsi in India) were excluded from analysis if they could not be assigned to their place of origin.

In this way, 13 groups of populations could be defined (Fig. 2). We shall refer to them by a two-letter code throughout this paper. The first group, NE, includes all populations of the Near East, regardless of the language they currently speak. For the purposes of this study, and in agreement with the most recent hypotheses on the area of origin of agriculture (Renfrew, 1991), the Near East extends up to 50 degrees of longitude East. These populations are considered to derive in their genetic makeup from local neolithic populations that did not emigrate and underwent only very limited admixture.

Under the NDD model, each of the following groups represents the descendants of a group of neolithic farmers who dispersed along one of the four routes of Figure 1 (Renfrew, 1991): IE, Indoeuropeans of Europe; ID, Indoeuropeans of Asia and Elamo-Dravidians; AA, Afroasiatic speakers; AL, Altaic speakers, excluding Turkic speakers of Anatolia. These groups will be jointly called NDD groups; the NDD model predicts allele frequency gradients within them, radiating away from the Near East.

Eight additional groups include populations not belonging to the Nostratic macrofamily, or belonging to it but not considered to owe their language to a process of demic diffusion from the Near East (Cavalli-Sforza, 1988; Renfrew, 1991). They are, respectively, speakers of Caucasian (CA), Uralic (UR), Sinotibetan (ST), Austric (AU), Chukchi-Kamchatkan (CK), and Eskimo-Aleut (ES) languages, plus two linguistic isolates, namely, the Basques (BA) and Hunza, the latter speaking Burushaski (BU). Some of these groups were used for comparison in a previous study (Barbujani and Pilastro, 1993) and will be jointly referred to as other linguistic families.

MATERIALS AND METHODS

Data

A database of Eurasian allele frequencies, previously including data on eight markers (Barbujani et al., 1990), was updated. Information was added on seven other markers and on African Afroasiatic-speaking populations. For haptoglobins, group-specific component, Duffy and Kell blood groups, we incorporated all data we could find through an extensive survey of the literature published between 1960 and 1991, including compilations of human allele frequencies (Mourant et al., 1976; Tills et al., 1983; Roychoudhury and Nei, 1988). Conversely, data on ABO, Rh, and MN blood groups were incorporated only for populations that were already present in the database having been typed for at least another marker, except for Afroasiatic speakers, of which we included all samples we could find.

Fig. 2. Language families in Eurasia and North Africa (Ruhlen, 1987). Languages coded as follows: BA, Basque; IE, Indoeuropean (Europe); AA, Afroasiatic; CA, Caucasian; UR, Uralic; ID, Indoeuropean (Asia) and Elamo-Dravidian; AL, Altaic; BU, Burushaski; ST, Sino-Tibetan; AU, Austric; ES, Eskimo-Aleut; CK, Chukchi-Kamchatkan. The putative area of origin of agriculture (where populations defined as NE are located) is in black.

TABLE 1. Numbers of samples considered

                                                                                   Locus¹
Population group                             ABO    Rh    MN  Kell Duffy    Hp    Gc   GLO   ESD   PGM   ACP    AK   ADA   PGD   GPT
Indoeuropean of Europe                       109    75    89   100    60   152    93    69    84    92    92    15    68    73    46
Indoeuropean of Asia and Elamo-Dravidian      39    35    41    21    23    48    31    20    48    45    38    45    25    38    10
Afroasiatic                                   33    31    37    30    24    30    16     9     5    12    20    19     6    20     4
Altaic                                        33    25    33    12     9    41    22    15    18    39    28    13    15    30    18
Near East                                     14    14    16    12    10    15     8     7     6    10    10     6    11     9     1
Uralic                                        12     9    17    17    16    17    17     7     7    12     9    12    10     9     5
Caucasic                                      10     9     8     7     5     6     4     7     7     3     8     3     3     3     2
Sinotibetan                                   24    17    16    17    12    24    13    13    13    16    12    11    10    11     8
Austric                                       18    10    16    14    14    23    23    10    16    20    11    13    13    28    12
Totals                                       292   225   274   230   173   356   226   157   205   249   228   197   161   225   106

¹Loci abbreviated as follows: haptoglobins, Hp; group-specific component, Gc; glyoxalase I, GLO; esterase D, ESD; phosphoglucomutase, PGM; acid phosphatase, ACP; adenylate kinase, AK; adenosine deaminase, ADA; phosphogluconate dehydrogenase, PGD; glutamic-pyruvic transaminase, GPT.

Linguistic classification

Each record of the updated database contained allele frequencies for one locus, the sample size, and the geographic coordinates of the sampling locality. Following Ruhlen (1991), each population was then assigned a language family code, unless it belonged to the Near East. Overall, 222 allele frequencies were available for the CA, CK, ES, BA, and BU groups. These data were insufficient for the statistical analysis described below within each group and had to be discarded. In addition, samples were discarded when their linguistic identity could not be defined with certainty. The database analysed includes 3,304 records (Table 1).
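The record count can be cross-checked against Table 1 by adding the per-locus totals in its bottom row. The short check below simply restates those printed totals; the variable names are illustrative.

```python
# Per-locus totals as printed in the bottom row of Table 1.
table1_totals = {
    "ABO": 292, "Rh": 225, "MN": 274, "Kell": 230, "Duffy": 173,
    "Hp": 356, "Gc": 226, "GLO": 157, "ESD": 205, "PGM": 249,
    "ACP": 228, "AK": 197, "ADA": 161, "PGD": 225, "GPT": 106,
}

n_records = sum(table1_totals.values())
print(n_records)  # 3304, the number of records stated in the text
```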

Spatial autocorrelation analysis

Spatial autocorrelation statistics (Sokal and Oden, 1978) summarize spatial patterns of genetic similarity. An arbitrary number of distance classes is defined for each set of data, and an autocorrelation coefficient (Moran's I in the present study) is calculated within each of them, representing the average level of genetic similarity between the pairs of populations separated by that distance. In this study, eight distance classes were generally employed (detailed in the caption to Table 2), but for a few markers and groups the distribution of sampling localities was such that different numbers of classes, or different class limits, had to be chosen.

In large samples, Moran's I values range from +1 (for perfect identity of all pairs of values in a distance class: positive autocorrelation) to -1 (when all pairs of values fall on opposite sides of the average: negative autocorrelation). Under a randomization hypothesis, the value expected in the absence of autocorrelation is close to 0. Criteria for assessing the significance of autocorrelation statistics are in Sokal and Oden (1978) and Oden (1984).
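As a rough illustration of how a correlogram is built, the sketch below computes Moran's I separately for each distance class using binary weights (a pair counts only in the class its distance falls into). It is a minimal sketch, not the authors' software: the use of great-circle (haversine) distances, the function names, and the toy data are assumptions, and no significance test is included.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def morans_i_by_class(coords, freqs, upper_limits_km):
    """Return one Moran's I per distance class (None if a class has no pairs).

    coords: list of (latitude, longitude); freqs: allele frequency per locality;
    upper_limits_km: increasing upper limits of the distance classes.
    """
    n = len(freqs)
    mean = sum(freqs) / n
    dev = [x - mean for x in freqs]
    sum_sq = sum(d * d for d in dev)
    cross = [0.0] * len(upper_limits_km)
    pairs = [0] * len(upper_limits_km)
    for i in range(n):
        for j in range(i + 1, n):
            dist = great_circle_km(*coords[i], *coords[j])
            for k, upper in enumerate(upper_limits_km):
                if dist <= upper:  # first class whose upper limit covers the pair
                    cross[k] += dev[i] * dev[j]
                    pairs[k] += 1
                    break
    if sum_sq == 0:
        return [None] * len(upper_limits_km)
    return [n * c / (p * sum_sq) if p else None for c, p in zip(cross, pairs)]


# Toy data: six localities one degree apart on the equator, with a linear gradient.
toy_coords = [(0.0, float(lon)) for lon in range(6)]
toy_freqs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(morans_i_by_class(toy_coords, toy_freqs, [150, 1000]))
```

With this toy gradient the coefficient is positive for neighbouring localities and negative for more distant pairs, the signature of a cline described in the following classification.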

Rare alleles were not considered. In addition, at each locus one polymorphic allele was discarded to reduce the effects of the negative correlations among allele frequencies. Overall, 19 independent alleles were analysed.

The set of autocorrelation coefficients cal- culated for one allele (correlogrum) allows an objective definition of the spatial modes of variation shown by that allele. Schemati- cally, the following classes of correlograms were recognised here (Sokal, 1979; Sokal and Wartenberg, 1981; Sokal et al., 1989a). All of them, except class 1, had to be overall significant by the Bonferroni criterion (Oden, 1984).

1. Nonsignificant: levels of genetic varia- tion statistically compatible with chance, no spatial structure apparent.

2. Cline: a decreasing series of coeffi- cients, from positive in the first distance classes, to negative at large distances. Clines represent the strongest genetic evi-

144 G. BARBUJANI ET AL

TABLE 2. Autocorrelation coefficients (Moran's 1) for 19 alleles in seven population groups

Distance class Allele 1 2 3 4 5 6 7 8 P

Indoeuropean of Europe and Near East ABO-IA .43 .30 ABO-ZB .61 .36 Rh-D .51 .41 MN-M .22 .05 Kell-K .04 .oo Duffy-a .17 .07 HP' .10 - .09 Gc' .21 -.01 GLO' .63 .ll ESD' .30 .ll PGM' .38 .03 A C P .40 .26 ACp6 .41 .16 AK' .37 .04 ADA' .33 .26 PGD" .08 .17 P G P .25 .12 GPT' .07 -.15 G P F -.01 -.09

Indoeuropean of Asia, Elamo-Dravidian, ABO-ZA .62 .16 ABO-ZB .29 .28 Rh-D .32 51

Kell-K -.30 .17 Duffy-a .22 .28 HP' .66 .41 Gc' .ll .02 GLO' .ll .08 ESD' .47 50 PGM' .27 .10 A C P .oo .08 ACPb .07 .27 AK' .33 .18

PGIP .20 .14 P G P .21 .15

ABO-ZA .89 .25 ABO-IB .37 .28 Rh-D -.02 -.lo MN-M .35 .12 Kell-K .78 - .33 Duffy-a .70 .24 HP' .26 .32 Gc' -.13 .17 GLO' .07 .49 PGM' .43 -.17 A C P .36 .40 A C P .38 .44 AK' .73 .28 ADA1 .20 - .06 PGD" .16 - .24 P G P .16 p.24

MN-M .ll p.25

ADA' -.32 -.04

Afroasiatic and Near East

Altaic and Near East ABO-ZA .75 .13 ABO-ZB .26 .40 Rh-D .87 .61 MN-M .31 .18 Kell-K .45 .40 Duffy-a .64 .68 HP' .36 .13 Gc' .22 .28 GLO' 1.00 1.00 ESD' 5 2 .55 PGM' 5 5 .45

.04

.23

.28

.04 -.03

.01

.03

.07

.05

.06

.01

.06

.09 -.12

.16

.04

.12

.01

.02

.12

.44

.37

.06 p.03

.12

.42 -.07

.01

.44 -.lo -.05

.03

.40

.06

.03

.03

.13

and Ne

.15

.03

.29 -.09 .31 .23 .09

-.02 .lo .09 .10 .25

-.12 .14 .14

-.13 .13 .61 .32 .07 .68 .13

-.22 .86 .62 .38

-.20 .07 .10 .03

-.01 -.03 -.03 -.03 -.03

.03

.08 -.lo -.07 -.12

.16 -.05

.04 -.05 -.05

.01

.15

.25

'ar East

-.04 - .07 -.20

5 6 p.18 -.lo

.08

.03

.03

.10

.02 -.02 - .08 -.07

.oo

.20

.22

.17 -.lo - .35

.19

-.13 -.02 - .09

.06

.01

.12 -.01 - .09 p.19 -.03 - .08 -.05 - .07

.08

.05 - .03 - .04

.01

.02

-.02 .05 .20 .09

p.09 -.08

.26 -.06

.19 -.03

.12 -.01

.08

.21 p.07 - .28

.28

.02

.01 -.06

.08

.01 -.08 -.03

.55 -.33 -.23 -.15 -.04 -.04 - .25 -.04 - .24 -.09 -.21 -.19 -.14 .05 -.02 .10 -.02 .10

-.06 .08 .27 .08 .68 .59 .37 .06 .42 .18 .80 1.00 .21 -.11 .04 -.17 .23 -.05 .60 .12 .12 .08

.08 .02 - .07 -.78 -.17 .43

.04 -.17

.02 -.07 -.06 p.20 .01 -.01

-.09 .07 -.05 - .30 -.lo -.30

.01 -.25 -.05 -.12 -.07 -.lo

.05 - . lo - .09 - .63 -.03 -.11 - .08 -.32 -.01 .09 p.05 .15

.03 -.l8 -.08 -.41 -.08 - .42

.oo -.01 - .09 .07

.10 -.06 - .02 -.28

.oo .03

.12 .02 -.14 -.61 -.06 - . lo

.02 -.21 -.02 -.30 -.19 -.27

. 00 -.04

.08 -.06

.06 -.05

-.05 -.31 -.32 -.08 -.09 -.lo

.04 -.18

.01 -.03

p.08 - .42 .05

-.14

-.18 -.18

-.19 -.02

.71 -.38

.59

.02 -32

-.27 -.75

.17

- .03

-.31

-.03 -.03

-.17 -.12 -.21 -.18

.17 - .96 -.31

-.84

.39

-.16 -1.00 -.64 - .34 -.07 -.08 -.06

.10 -.09 -.31 - .35 -.49 - .44

.09 -1.00 -.15 -.45

-.19 .21

-.34 -.06 -.04 -.19 -.a

.oo -.35 -.41 -.11

.03 -.12 - .26

.03

.02

.01

-.07 .08 .01

- .35 . 00

- 3 4 .~

-57

-.07

-.lo -.lo -.12 -.23 -.87 -.16 -.41

-.06

-.47

0.00016 0.00016 0.00016 0.00016

ns 0.0008 0.008 0.008 0.00016 0.00016 0.00016 0.00016 0.00016 0.00016 0.00016 0.00016 0.00016

ns ns

0.00016 0.00016 0.00016

ns ns ns

0.00016 ns

0.0016 0.00016

ns ns

0.00016 0.00016

ns 0.008 0.016

0.00016 0.00016

ns 0.00016 0.00016 0.0001 0.00016 0.0016

ns ns

0.00025 0.00025 0.05

ns ns ns

0.0008 0.00016 0.00016 0.00016 0.00016 0.00014 0.00016

ns 0.00014 0.00012 0.00016

Continued

GENETIC VARIATION IN NORTH AFRICA AND EURASIA 145

TABLE 2. Autocorrelation coefficients (Moran's I) for 19 alleles in seven population groups (continued)

Distance class ~

Allele 1

A C P ACPb AK' ADA' PGD" PGD" GPT' G P F

Sinotibetan ABO-I* ABO-IB Rh-D MN-M Kell-K HP' PGM'

ABO-IA A B O - I ~ MNM Duffy-a

Gc' ESD' PGM' PGD" P G P

Austric

HP'

Uralic MN-M Kell-K Duffy-a

Gc' HP'

.19 27 .56 .70 .32 .30

-.17 -.17

.05

.15 -.61 -.66

.13 1.00 2 3

.13

.17 - 2 2 -.07

2 7 .46

-21 .10

-24

.12

.61 5 0 .31 .85

2 3 4 5

.01

.15

.35

.78

.60

.42 -.07 -.06

.37

.57 2 5 .88 .46 .40

-.09 -.lo

.08

.21

.51

.68

.46

.40

.05

.05

21 .15 -.03 .38 -.05 -.31

-.16 2 0 .03 -25 .18 .01 -.lo 2 5 -.01

.89 .68 2 9 2 4 -.06 .01

-.67 - 2 0

.03 -.13

.40

.18

.49

.01

.03 -.lo

-.31 -.19 .16 -.04 2 6 -20

-.06 -.16 .06 .03 .oo .32

-.62 .68 .31 .ll .38 .35

-.14 .13

.03 -.06 -.02

.ll .lo .oo

.07 -50 2 8 -.09 -.02 .32

.53 2 4 -.02

-.05 .26 .05 .23 .22 .21

-.03 -.03

-.04 -.31 - .09 -.18

-10 -.40 -.15

-.19 .22 .33 .07 .13

- .04 -.02 -.05

.39

.03

.07 -.35

.28 -.03

6

-.04 .16

-.61 -.51 - .32 -.37

~

-.06 .08

-.18 .06

-.19 - .50 -.35

.05

.06 -21

.08

.05 - .21 -.18 -.06 -.05 -.02

.02 - .08 -.14

.03 p.87 -1.00

7

-.lo -.03 -.77 -21 -.18 -.11

__

-.16 .15 .09

- 2 2 -.19 - 5 0

.ll

.08 - 2 8 -.19 -.44 -.34 -21 -.09

.02 <:-.12

.07

- 6 2 P.70

-1.00 - .99

8 P

-.19 - .31

-.65 - .36 -.30

-.07 -.32

.12 -.41

-1.00

-.09 -24 -.45 -.07

-.84 - .77

-1.00

0.0008 0.00016 0.00014 0.00016 0.00016 0.00016

ns ns

ns ns ns ns ns

0.00014 ns

ns ns ns ns

0.00016 ns ns ns

0.00016 ns

ns 0.008 0.00014 0.008 n.nnni2

~

Significant autocorrelation coefficients are in boldtype. Pare upper limits of Bonferroni probabilities for the whole correlogram (Oden, 1984). Upper distance class limits are as follows: 1: 300, 2: 600, 3: 1,00", 4: 1,500, 5: 2,000, 6: 3,000, 7: 4,000, 8:.8,000 km; due to the distribution of sampling localities, for some alleles different numbers of classes had to be chosen. For seven classes, upper class limits are as follows: 1: 500,Z: 1,000,3: 1,500,4: 3,000,5: 4,000,6: 6,000, 7: 8,000 km; for six classes, upper class limits are as follows: 1: 500,Z: 1,000,3: 2,000,4: 3,000,5: 5,000, 6: 8,000 km; for five classes, upper class limits are as follows: 1: 500, 2: 1,000, 3: 1,500, 4 3,000, and 5: 8,000 km. Upper class limits for Sinotibetans are as follows: 1: 300,Z: 600, 3: 1,000, 4: 1,500, 5: 2,000, 6: 3,000, 7: 4,000 km.

dence for demic diffusion of early farmers in Europe (Menozzi et al., 1978; Ammerman and Cavalli-Sforza, 1984).

3. Long-distance differentiation: largest negative autocorrelation at the largest distances. When near populations resemble each other genetically, then a long-distance differentiation pattern is just a cline whose central part shows a somewhat irregular variation and will be classified as a cline in the following sections of this paper. We shall speak of long-distance differentiation only in the absence of positive autocorrelation at short distances.

4. Isolation by distance: a decrease of genetic similarity with distance, from positive to insignificant (Barbujani, 1987b). This is expected when only drift and short-range dispersal affect populations (Kimura and Weiss, 1964; Morton et al., 1968, 1971), within distances not much greater than individual dispersal distances (Yasuda and Morton, 1969).

5. Depression: highest negative autocorrelation in one of the central classes, genetic differences at the extremes of the range either smaller or nonsignificant.

6. Intrusion: highest positive autocorrelation in one of the intermediate distance classes, genetic differences at the extremes of the range insignificant.
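The verbal definitions above can be turned into a rough decision rule. The sketch below is an assumption-laden reading, not the authors' procedure: a cline is taken to be significant positive autocorrelation in the first class combined with the largest significant negative coefficient in the last class, the order of the checks is a choice made here, and the function name and return labels are hypothetical.

```python
def classify_correlogram(coefs, significant):
    """Label a correlogram from its coefficients and per-class significance flags.

    coefs: autocorrelation coefficients, ordered from the shortest to the
    longest distance class. significant: booleans of the same length.
    """
    n = len(coefs)
    if n < 3 or len(significant) != n:
        raise ValueError("need at least three classes and matching flags")

    pos = [i for i in range(n) if significant[i] and coefs[i] > 0]
    neg = [i for i in range(n) if significant[i] and coefs[i] < 0]
    if not pos and not neg:
        return "ns"

    last = n - 1
    most_neg = min(neg, key=lambda i: coefs[i]) if neg else None
    most_pos = max(pos, key=lambda i: coefs[i]) if pos else None

    # Largest negative value at the largest distances: cline if near
    # populations also resemble each other, otherwise long-distance
    # differentiation.
    if most_neg == last:
        return "cline" if 0 in pos else "long-distance differentiation"
    # Largest negative value in a central class.
    if most_neg is not None and 0 < most_neg < last:
        return "depression"
    if most_neg is None:
        # Similarity declines from positive to insignificant.
        if 0 in pos:
            return "isolation by distance"
        # Peak similarity at intermediate distances, extremes insignificant.
        if 0 < most_pos < last and last not in pos:
            return "intrusion"
    return "unclassified"
```

A correlogram such as `[0.3, 0.1, -0.05, -0.4]` with only the first and last coefficients significant would be labelled a cline under these rules.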

Only clines were considered consistent with the demic diffusion model. This is certainly a conservative choice with respect to the hypothesis being tested. First of all, a certain initial allele-frequency differential is necessary for any migratory process to determine a cline (see Sokal et al., 1989b); even under demic diffusion, clines can be established only if the genetic differences between the expanding and the recipient populations are larger than drift variances. In addition, drift and sampling variances add random noise to the autocorrelation patterns (Slatkin and Arter, 1991), and clinal distribution of allele frequencies may sometimes evolve into, or be described as, long-distance differentiation patterns. These patterns were not considered to support the NDD model. Basically, in this way we regarded as compatible with the NDD model only the patterns that could by no means be determined by random divergence of allele frequencies.

REFORMULATING THE NDD MODEL

For the purposes of this paper, the NDD model may now be restated as follows. Alleles of Near Eastern origin spread in the neolithic, following four major routes. As a consequence, one should observe clinal or quasi-clinal autocorrelation patterns when each of the IE, ID, AA, and AL population groups is analysed together with the NE populations; the last mentioned represent the current closest relatives of the populations that underwent demic diffusion. Clinal or quasi-clinal patterns should be uncommon in the other linguistic families, unless other demic diffusion processes took place there.

Joint analysis of the NDD groups with the Near Eastern populations is necessary to ensure that the clines detected, if any, are really consistent with the spread of individuals from the Near East. Indeed, a different statistical technique described several clines in which the genetic differences with the Near East increased at decreasing distances from there, among ST, CA, and UR speakers (Barbujani and Pilastro, 1993).

RESULTS

The spatial autocorrelation coefficients calculated are in Table 2; significances are two tailed. As expected, significant spatial structure is the rule rather than the exception. This is partly due to the large size of the regions considered, with populations at the extremes of the range being often separated by several thousand kilometers. Random genetic divergence (and negative autocorrelation) is to be expected when distances are so large. On the other hand, especially among the NDD language families, many alleles have similar frequencies in spatially close localities, as shown by the positive autocorrelation in the first distance class. This is the likely consequence of isolation by distance, causing a decline of population resemblance over limited distances (Yasuda and Morton, 1969; Wijsman and Cavalli-Sforza, 1984), as is to be expected under a wide range of evolutionary conditions. Exceptions are ST- and AU-speaking populations, where even at small distances genetic variation appears generally random.
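How a correlogram of this kind is built can be sketched as follows. The example assumes Moran's I as the autocorrelation coefficient (this section does not name the coefficient used) and takes a precomputed matrix of pairwise distances; the function name and argument layout are hypothetical.

```python
from bisect import bisect_left


def morans_i_by_class(freqs, dist, upper_limits):
    """Moran's I of allele frequencies for each distance class.

    freqs: allele frequency at each locality.
    dist: square matrix (list of lists) of pairwise distances.
    upper_limits: increasing upper limits of the distance classes
    (assumed inclusive).
    Returns one coefficient per class, or None for a class with no pairs.
    """
    n = len(freqs)
    mean = sum(freqs) / n
    z = [f - mean for f in freqs]
    ss = sum(v * v for v in z)
    if ss == 0:
        raise ValueError("allele frequencies do not vary")

    k = len(upper_limits)
    cross = [0.0] * k   # sum of z_i * z_j over the pairs of each class
    pairs = [0] * k     # number of locality pairs in each class
    for i in range(n):
        for j in range(i + 1, n):
            c = bisect_left(upper_limits, dist[i][j])
            if c < k:
                cross[c] += z[i] * z[j]
                pairs[c] += 1

    # I(class) = n * sum_pairs(z_i z_j) / (number of pairs * sum_i z_i^2)
    return [n * cross[c] / (pairs[c] * ss) if pairs[c] else None
            for c in range(k)]
```

With four equally spaced localities whose frequencies increase steadily (0.1, 0.2, 0.3, 0.4), the coefficient is positive in the first class and increasingly negative in the following ones, which is the signature of a cline.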

Among NDD language families, most patterns (56/71) significantly depart from randomness, as assessed by the Bonferroni criterion (Oden, 1984). Clines are observed for 9 out of 19 alleles in the IE group, 7 of 17 in ID, 3 of 16 in AA, and 12 of 19 in AL (Table 3). An example of the relationship between longitude and allele frequencies is shown in Figure 3, where a clinal decrease of the frequency of the ADA2 allele with distance from the Near East is evident in two linguistic groups. When the frequencies of this allele were studied regardless of linguistic classification, the overall pattern did not appear clinal (Barbujani, 1987a) because of the opposite trends among ID- and AL-speaking populations. Among the other families, conversely, there is some evidence of patterns compatible with demic diffusion processes in the UR, but not in the ST and AU groups.
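The Bonferroni criterion for a whole correlogram can be illustrated with a short sketch. It assumes the usual form of the approximation, in which a correlogram with k distance classes is significant at level alpha when at least one coefficient is significant at alpha/k, so that k times the smallest probability is an upper limit for the overall probability. The function name is hypothetical.

```python
def bonferroni_correlogram(p_values, alpha=0.05):
    """Overall significance of a correlogram from its per-class probabilities.

    Returns (upper_limit, is_significant), where upper_limit is
    min(1, k * smallest p) and k is the number of distance classes.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one distance class is required")
    p_min = min(p_values)
    upper_limit = min(1.0, k * p_min)
    return upper_limit, p_min <= alpha / k
```

For a five-class correlogram with probabilities 0.2, 0.004, 0.5, 0.03 and 0.6, the upper limit is about 0.02 and the correlogram is declared significant at the 5% level, since 0.004 is below 0.05/5.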

Statistically perfect, or nearly so, clines prevail for IE speakers (nine alleles), whereas a discontinuous decrease of autocorrelation with distance is common in the AL group (eight alleles), perhaps reflecting a more irregular distribution of sampling localities. In the ID group, both kinds of clinal patterns are observed. The loci showing gradients differ among language families. Among the 15 alleles showing at least one significant gradient in the NDD groups, only for ESD is variation clinal and significant in all the groups tested.

Although a discussion of the possible adaptive values of these polymorphisms would be out of place here, this does not suggest that gene-frequency gradients are due to selection affecting specific alleles in specific environments, but rather it points to gene flow processes affecting all the genome, and evident only at certain loci (Slatkin, 1985, 1987).

TABLE 3. A summary of spatial autocorrelation results: Nostratic demic diffusion groups

                                        Population group
          Indoeuropean            Indoeuropean of Asia,   Afroasiatic             Altaic
          of Europe               Elamo-Dravidian,        and Near East           and Near East
Allele    and Near East           and Near East

ABO-IA    Depression              Cline                   Depression              Cline
ABO-IB    Cline                   Cline                   Depression              Cline
Rh-D      Cline                   Cline                   ns                      Cline
MN-M      Cline                   ns                      Cline                   Cline
Kell-K    ns                      ns                      Isolation by distance   Long dist. diff.
Duffy-a   Depression              ns                      Depression              Cline
Hp1       Isolation by distance   Cline                   Cline                   Depression
Gc1       Depression              ns                      Long dist. diff.        ns
GLO1      Depression              Cline                   ns                      Cline
ESD1      Cline                   Cline                   -                       Cline
PGM1      Cline                   ns                      ns                      Cline
ACPa      Cline                   ns                      Depression              Long dist. diff.
ACPb      Cline                   Depression              Depression              Long dist. diff.
AK1       Depression              Cline                   Cline                   Cline
ADA2      Cline                   ns                      ns                      Cline
PGDa      Depression              Depression              ns                      Cline
PGDc      Cline                   Depression              ns                      Cline
GPT1      ns                      -                       -                       ns
GPT2      ns                      -                       -                       ns

The spatial patterns among AU speakers do not depart from randomness for any of the 11 alleles tested (summarized in Table 4). Only the distribution of Hp variants among ST speakers corresponds to the expectations of a model of demic diffusion. A greater number of significant spatial patterns is detected among UR speakers, where, however, only five loci could be tested.

DISCUSSION

Spatial autocorrelation analysis among speakers of languages whose distribution is here attributed to the NDD process reveals a strong patterning of genetic variation. Genetic variation appears largely consistent with a process of multidirectional demic diffusion: Nearly half of the alleles studied show clinal patterns. This seems remarkable because, as mentioned earlier, gene flow can determine a cline only in the presence of a substantial initial allele frequency difference between populations. Computer simulations suggest that such a difference could be close to 40% (Sokal et al., 1989b), that is, larger than that observed at most loci between current black and white populations (see Roychoudhury and Nei, 1988). Therefore, under the NDD model, clines could not occur in more than a limited fraction of the genome, and even less so in the absence of a population expansion process.
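Why an initial allele-frequency difference is needed can be seen in a deliberately simple deterministic sketch. It is not the simulation model of Sokal et al. (1989b): it assumes that at each step of the expansion the advancing population keeps a fixed fraction of immigrant genes and takes the rest from residents who all share one allele frequency. All names and parameter values are hypothetical.

```python
import math


def admixture_cline(p_source, p_resident, farmer_fraction, n_steps):
    """Expected allele frequency in the expanding population after each step.

    At every step the frequency becomes
    farmer_fraction * previous + (1 - farmer_fraction) * p_resident,
    so the gradient is proportional to (p_source - p_resident).
    """
    if not 0.0 <= farmer_fraction <= 1.0:
        raise ValueError("farmer_fraction must lie between 0 and 1")
    freqs = [p_source]
    for _ in range(n_steps):
        freqs.append(farmer_fraction * freqs[-1]
                     + (1.0 - farmer_fraction) * p_resident)
    return freqs


def sampling_sd(p, n_individuals):
    """Binomial standard deviation of an allele frequency estimated
    from n_individuals diploid individuals."""
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))
```

With equal frequencies in the two populations the expected profile is flat whatever the admixture rate, whereas a difference of 0.4 (for instance 0.6 against 0.2, with 80% immigrant genes per step) gives a steady decline from 0.6 towards 0.2. Whether such a decline is detectable then depends on how it compares with the noise measured by `sampling_sd`.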

An alternative explanation for the gradients observed, still consistent with demic diffusion, is a series of founder effects occurring in a phase of population expansion not accompanied by admixture, but followed by local gene flow. The genetic consequences of such a process have not been modelled in detail, but empirical studies in natural populations whose history is known show gradients that were doubtless determined in this way (Easteal, 1985, 1988). Computer simulations (unpublished results by Barbujani, Sokal, and Oden) confirm that some allele frequency gradients in Europe are compatible with founder effects during a population expansion. At the present stage, therefore, a role of founder effects cannot be ruled out, but it remains largely to be explored.

In the NDD language families, eight significant clines are apparent at the AK, ADA, PGD, and GPT loci, which did not show clinal variation in a study where populations were jointly analysed by spatial autocorrelation methods, regardless of linguistic classification (Barbujani, 1987a). The patterns shown by AL-speaking populations differ little from those observed among IE and ID, thus supporting the view whereby Altaic languages should be included among those that were propagated by demic diffusion (Renfrew, 1991).

Fig. 3. Plot of ADA2 allele frequencies in the Near East (solid stars), and in the IE (open circles), ID (open triangles), AA (solid circles), and AL (solid triangles) groups. Clinal variation consistent with the spread of the ADA2 allele from the Near East is evident among IE and AL speakers. Horizontal axis: longitude (0 to 150); vertical axis: allele frequency (0.00 to 0.25).

TABLE 4. A summary of spatial autocorrelation results: other language families

Population group
Allele Sinotibetan Austric Uralic

ABO-IA ns ABO-IB ns Rh-D ns MN-M ns Kell-K ns Duffy-a -

Hp1 Gc1 ESD1 - PGM1 ns PGDa - PGDc -

Cline -

ns ns

ns -

-

ns Long dist. diff. Isolation by dist.

ns ns

ns Long dist. diff.

- - - ns

Cline Cline Cline Cline - - - -

Although it is impossible to quantify exactly the departure from a model not including major population movements from the Near East, these results are incompatible with both random variation and pure isolation by distance, that is, the two likely microevolutionary scenarios associated with cultural diffusion of farming from the nuclear zones.

Previous studies showed that genetic variation is larger in the NDD groups than in four other families and in the Near East, as should be expected if the former evolved through incomplete admixture between genetically heterogeneous populations. Also, a direct proportionality was demonstrated between genetic distances and geographic distances from the putative place of origin of farming, the Near East (Barbujani and Pilastro, 1993). These findings, along with the evidence here provided by spatial autocorrelation analysis, correspond to the expectations of a model whereby the current patterns of genetic and linguistic resemblance largely reflect the centrifugal spread of neolithic farmers from the Near East, or the NDD model.

The evidence supporting the NDD model is very strong for the IE, ID, and AL groups, less so for AA speakers. These populations also showed the least marked, although significant, correlation between genetic differences and geographic distances from the Levant (Barbujani and Pilastro, 1993). Methodological reasons, namely, the limited number of data points available and their irregular distribution in space, may have contributed to conceal a significant genetic structure. However, this argument applies to the ST and AU populations as well and, furthermore, cannot be tested. Therefore, based both on this spatial autocorrelation analysis and on the results of a comparison of genetic and geographic distances, there seems to be only scant evidence for demic diffusion in the area where AA languages are currently spoken.

According to Ruhlen (personal communications, cited in Renfrew, 1991), AA languages spread from Africa into Asia, and not vice versa. The few clines observed in this and in the previous study, although significant, would then be due to phenomena other than neolithic demic diffusion. One such phenomenon could be the expansion of Arabs, in historical times. That expansion, however, is unlikely to have caused the clines here described, since few Arabic-speaking populations of North Africa were considered. The vast majority of African samples in the AA group, on the contrary, spoke Berber or Cushitic. A drawback of this was that most AA populations in this study occupy marginal zones in the AA-speaking region. These zones may have been affected only marginally by demic diffusion, and their inhabitants' genetic pool may include a large fraction of genes coming from previous Paleolithic residents. In addition, somatic characteristics and allele frequencies at the Rh, Gm, and HLA loci among Cushitic speakers suggest substantial Caucasoid admixture (Excoffier et al., 1987).

Accordingly, at least three views on the diffusion of AA languages are consistent with the results of this study: (1) AA languages spread through demic diffusion in the neolithic, but the populations speaking these languages underwent major demographic transformations, and the clines resulting from demic diffusion are now apparent only at a limited set of loci; (2) AA languages spread from Africa by a yet-to-define demographic process that led to the establishment of a limited number of gradients; (3) AA languages spread mostly through cultural contacts, either from Africa or from Asia, and the clines identified in the AA-speaking area are due to microevolutionary phenomena that have nothing to do with linguistic evolution.

Two papers that we were not aware of in the phase of development of this study suggest that views (2) and (3) may be more likely than (1). According to Starostin's (1990) glottochronological calculations, under the assumption of a constant rate of linguistic divergence, Proto-Afroasiatic should have separated about 15,000 years ago from the other Nostratic proto-languages, that is, much earlier than posited by the NDD model. Conversely, the estimated times of separation among Proto-Indoeuropean, Proto-Elamo-Dravidian, and Proto-Altaic agree with the dates of demic diffusion estimated by archaeologists (Starostin, 1990). Recent linguistic findings, therefore, show a surprising agreement with the results of our analysis of genetic variation.
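The logic of dating a linguistic split under a constant rate of divergence can be illustrated with the classical textbook form of glottochronology. This sketch is not Starostin's (1990) procedure, whose details are not given here; it assumes the standard relation t = ln(c) / (2 ln(r)), with c the fraction of shared cognates and r the retention rate per millennium, and it uses the conventional value r = 0.86 for a 100-item word list as a default. Function names are hypothetical.

```python
import math


def divergence_time_millennia(shared_fraction, retention_rate=0.86):
    """Time since separation, in millennia, under a constant retention rate.

    shared_fraction: fraction of basic-vocabulary items still cognate
    between the two languages (0 < c <= 1).
    retention_rate: fraction retained by one lineage per millennium
    (assumed value; 0 < r < 1).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))


def expected_shared_fraction(time_millennia, retention_rate=0.86):
    """Inverse relation: cognate fraction expected after a given time."""
    return retention_rate ** (2.0 * time_millennia)
```

Under these assumptions a separation of 15 millennia leaves only a very small expected fraction of shared basic vocabulary, which is one reason why estimates at such time depths are regarded as uncertain.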

Other genetic observations are consistent as well. Mitochondrial DNA data show evidence of two population expansions, in Europeans, and in Arabs and sub-Saharan Africans, respectively, which occurred recently relative to the whole history of Homo sapiens sapiens (Templeton, 1993). The large genetic differences between these groups suggest that the two expansions have been largely independent.

Among the linguistic families other than AA, IE, ID, and AL, most patterns of genetic variation appear to be random; UR speakers are a possible exception. The quasi-clinal patterns observed among them, however, may largely reflect East-West differences between populations that share common ancestors (whose existence is documented by linguistic similarities), but who then evolved in virtual independence, being separated at present by five to eight thousand kilometers; indeed, genetic distances between the UR and the NE groups do not correlate with the respective geographic distances (Barbujani and Pilastro, 1993). The few clinal patterns observed in a former study among AU speakers (Barbujani and Pilastro, 1993) are not confirmed by this spatial autocorrelation analysis.

These findings are in contrast with what is observed in other continents. Native American populations are markedly differentiated, but only in North America do they show large-scale genetic structure (Suarez et al., 1985; Schurr et al., 1990; O'Rourke et al., 1992; Torroni et al., 1992). In addition, with very few exceptions (Barrantes et al., 1990), genetic and linguistic differences appear largely uncorrelated (Chakraborty, 1976; Chakraborty et al., 1976; Salzano et al., 1977; Murillo et al., 1977), or negatively correlated (Spuhler, 1972), which does not suggest coevolution of biological traits and languages. The overall picture emerging is one in which isolating mechanisms have caused founder effects and favoured random genetic drift, with processes of gene flow playing a comparatively minor evolutionary role (see also Cavalli-Sforza et al., 1993). In sub-Saharan Africa, conversely, genetic variation appears spatially random at the few loci studied on a sufficient scale, but genetic and linguistic distances correlate (Excoffier et al., 1987, 1991), which points to long-distance displacement of linguistically related groups. Eurasia is clearly the continent where the tightest relationships are evident among different kinds of population markers, genetic and cultural, and between them and geography.

Is the NDD model the only possible explanation for these relationships? Local gene flow is expected to generate only the patterns that we classified as isolation by distance (see Wijsman and Cavalli-Sforza, 1984; Barbujani, 1987a). To determine a significant autocorrelation structure over such a vast area, massive demographic phenomena (or, less likely, adaptive pressures) appear necessary. However, no single recent, archaeologically or historically documented evolutionary process seems to have had the potential for establishing clines over entire continents. Some or all Eurasian populations have been affected by other demographic processes, such as long-range migration (Gimbutas, 1979; Anthony, 1986), dispersal not associated with the origin of agriculture (Sokal, 1991), and local extinctions and recolonizations, all of them potentially affecting genetic and/or linguistic diversity. However, it would be difficult to explain how processes such as these, independently occurring in distinct areas, could eventually yield such a strong correspondence between patterns of linguistic and biological variation on a continental scale. Unless one evolutionary pressure predominated, and largely determined both genetic and linguistic differentiation, languages and allele frequencies should be only poorly associated. It seems reasonable to conclude that phenomena occurring after the neolithic may well account for the genetic characteristics of specific populations (see, e.g., Cavalli-Sforza and Piazza, 1993), or for clines at a few individual loci, but the overall population structure of Eurasia has probably been determined in the neolithic, or earlier.

The possibility should then be considered that the clines here described originated earlier than 10,000 years ago. Despite some controversies in the interpretation of single pieces of evidence (Excoffier and Langaney, 1989; Maddison, 1991; Templeton, 1992), genetic and fossil data show some agreement in indicating that Homo sapiens sapiens originated in Africa (Wainscoat et al., 1986; Cann et al., 1987; Stringer and Andrews, 1988; Rouhani, 1989; Vigilant et al., 1991; Livingstone, 1992; Templeton, 1993) and was present in the Near East approximately 90,000 years ago (Valladas et al., 1988). This means that two groups of humans dispersed from the Levant at different times, along similar routes. The first group was composed of Paleolithic hunter-gatherers coming north from Africa, who colonized Eurasia between 90 and 20 thousand years ago (Renfrew, 1992a); the second group was composed of the first neolithic farmers, who started expanding in Eurasia roughly 10 thousand years ago (Renfrew, 1987, 1992a). Both dispersal waves could have left a persistent mark in the genetic structure of contemporary populations, in the form of clines. The first wave could have done that through a series of founder effects, the second wave either through the same mechanism or because of admixture with the populations encountered during demic diffusion, or by a mixture of the two.

Schematically, there seem to be two classes of evidence in favour of a Paleolithic origin of the gradients we described:

1. Some gradients encompass the area where UR languages are spoken, which also belong to the Nostratic macrofamily (Kaiser and Shevoroshkin, 1988), but are not considered to have undergone neolithic demic diffusion.

2. Evidence of gradients is weakest (if still significant under certain statistical criteria; Barbujani and Pilastro, 1993) among AA. Some implications of this finding have already been discussed. The clines observed nowadays at the MN, Hp, and AK loci in AA-speaking populations may then depend on processes preceding neolithic demic diffusion.

On the contrary, three types of considerations support a neolithic origin of the clines here described:

1. If these clines are not due to the farmers' dispersal, an alternative explanation should be found for the parallelism between patterns of genetic and linguistic variation, demonstrated by many authors in the Old World (Cavalli-Sforza et al., 1988, 1992; Sokal, 1988; see also Barbujani, 1991 and Renfrew, 1992b), including Africa (Excoffier et al., 1987). A Paleolithic origin of present-time language families, paralleling the initial dispersal of Homo sapiens sapiens, seems highly unlikely. Actually, critics of the models of coevolution between language and genes support a more recent origin of current language families (see, e.g., Coleman, 1988, for Indoeuropean, and Callaghan, 1990).

2. The populations speaking ST and AU languages do not belong to the clines, in agreement with the NDD model, and in contrast with the likely consequences of the spread of hunter-gatherers in Asia, which cannot have been affected by linguistic barriers established much later.

3. As stated earlier, there is ample archaeological evidence of cultural diffusion processes in the neolithic, which does not prove demic diffusion, but is perfectly consistent with it (Ammerman and Cavalli-Sforza, 1984; Renfrew, 1987, and references therein).

Of course, it is possible that both paleolithic and neolithic processes determined the strong patterning of genetic variation that is evident in contemporary populations of the NDD groups. Neolithic demic diffusion may have both reinforced some genetic effects of previous Paleolithic migrations and may have contributed to concealing others.

As Langaney et al. (1992) pointed out, reconciling human phylogeny with archaeological and linguistic evidence is no easy goal, and one that no single study is likely to achieve. At this stage, it may be worthwhile to reconsider the NDD model, which we had to put forward in a certainly oversimplified way, to allow testing of its predictions.

If the AA-speaking populations are considered not to have expanded through demic diffusion in the neolithic, one of the two main biological objections to the NDD model is removed. Genetic data are then fully compatible with demic diffusion in three areas of Eurasia, regardless of the statistical technique employed for the analysis. These areas represent linguistic units, which further corroborates the already abundant evidence for coevolution of biological and cultural traits (Sokal, 1988; Cavalli-Sforza et al., 1988, 1992). Whether or not this multidirectional demic diffusion accounts for the general pattern of linguistic relatedness in Eurasia is open to some doubt. Indeed, not all language families comprised in the Nostratic macrofamily may owe their current distribution to neolithic demic diffusion. At present, a strong case can be made for parallel diffusion of farming and language among IE, AL, and ID speakers, by means of a process that involved major demographic changes (and not purely cultural transformations).


The results of this study are therefore consistent with the view that current biological and linguistic characteristics of most Eurasian populations largely depend on a single dispersal phenomenon. Neolithic farmers diffusing from the Near East presumably brought their languages and their genes into new territories, thus establishing continentwide clines that often end at language-family boundaries. This pattern is still recognizable despite successive processes of population subdivision, drift, and gene flow that locally altered it. The status of Afroasiatic-speaking populations will need further studies to be defined, but linguistic and genetic evidence agree in suggesting that their evolutionary history might have been different.

ACKNOWLEDGMENTS

We thank Robert R. Sokal, Luca Cavalli-Sforza, Laurent Excoffier, and Michael Turelli for stimulating discussion across the years. Funds for this study were provided by the Italian Ministry for the University and Scientific Research (MURST).

LITERATURE CITED

Aguirre A, Vicario A, Mazon LI, Estomba A, Martinez de Pancorbo M, Arrieta Pico V, Perez Elortondo F, and Lostao CM (1991) Are the Basques a single and a unique population? Am. J. Hum. Genet. 49:450-458.

Ammerman AJ, and Cavalli-Sforza LL (1973) A population model for the diffusion of early farming in Europe. In C Renfrew (ed.): The Explanation of Culture Change: Models in Prehistory. London: Duckworth, pp. 348-358.

Ammerman AJ, and Cavalli-Sforza LL (1984) The Neolithic Transition and the Genetics of Populations in Europe. Princeton, NJ: Princeton University Press.

Anthony DW (1986) The “Kurgan culture,” Indo-European origins, and the domestication of the horse: A reconsideration. Curr. Anthropol. 27:291-313.

Barbujani G (1987a) Diversity of some gene frequencies in European and Asian populations. III. Spatial correlogram analysis. Ann. Hum. Genet. 51:345-353.

Barbujani G (1987b) Autocorrelation of gene frequencies under isolation by distance. Genetics 117:777-782.

Barbujani G (1988) Diversity of some gene frequencies in European and Asian populations. IV. Genetic population structure assessed by the variogram. Ann. Hum. Genet. 52:215-225.

Barbujani G (1991) What do languages tell us on human microevolution? Trends Ecol. Evol. 6:151-155.

Barbujani G, and Pilastro A (1993) Genetic evidence on origin and dispersal of human populations speaking languages of the Nostratic macrofamily. Proc. Natl. Acad. Sci. USA 90:4670-4673.

Barbujani G, and Sokal RR (1990) Zones of sharp genetic change in Europe are also linguistic boundaries. Proc. Natl. Acad. Sci. USA 87:1816-1819.

Barbujani G, Jacquez GM, and Ligi L (1990) Diversity of some gene frequencies in European and Asian populations. V. Steep multilocus clines. Am. J. Hum. Genet. 47:867-875.

Barbujani G, Nasidze IS, and Whitehead GW (1994) Genetic diversity in the Caucasus. Hum. Biol. 66:639-668.

Barrantes R, Smouse PE, Mohrenweiser HW, Gershowitz H, Azofeifa J, Arias TD, and Neel JV (1990) Microevolution in lower Central America: Genetic characterization of the Chibcha-speaking group of Costa Rica, and a consensus taxonomy based on genetic and linguistic affinity. Am. J. Hum. Genet. 46:63-84.

Bateman R, Goddard I, O'Grady R, Funk VA, Mooi R, Kress WJ, and Cannell P (1990) Speaking of forked tongues. Curr. Anthropol. 31:1-13.

Bertranpetit J, and Cavalli-Sforza LL (1991) A genetic reconstruction of the history of the population of the Iberian peninsula. Ann. Hum. Genet. 55:51-67.

Callaghan CA (1990) Comment on “Speaking of forked tongues,” by Bateman et al. Curr. Anthropol. 31:15-16.

Cann RL, Stoneking M, and Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325:31-36.

Cavalli-Sforza LL (1988) The Basque population and ancient migrations in Europe. Munibe 6:129-137.

Cavalli-Sforza LL, and Piazza A (1993) Human genomic diversity in Europe: A summary of recent research and prospects for the future. Eur. J. Hum. Genet. 1:3-18.

Cavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J (1988) Reconstruction of human evolution: Bringing together genetic, archaeological, and linguistic data. Proc. Natl. Acad. Sci. USA 85:6002-6006.

Cavalli-Sforza LL, Minch E, and Mountain JL (1992) Coevolution of genes and language revisited. Proc. Natl. Acad. Sci. USA 89:5620-5624.

Cavalli-Sforza LL, Menozzi P, and Piazza A (1993) Demic expansions and human evolution. Science 259:639-646.

Chakraborty R (1976) Cultural, language and geographical correlates of genetic variability in Andean Indians. Nature 264:350-352.

Chakraborty R, Blanco R, Rothhammer F, and Llop E (1976) Genetic variability in Chilean Indian populations and its association with geography, language and culture. Soc. Biol. 23:73-81.

Coleman C (1988) Comment on “Archaeology and Language. The Puzzle of Indo-European Origins” by C. Renfrew. Curr. Anthropol. 29:449-453.

Dolgopolsky AB (1987) The Indoeuropean homeland and lexical contacts of Proto-Indo-European with other languages. Mediterr. Lang. Rev. 3:7-31.

Easteal S (1985) The ecological genetics of introduced populations of the giant toad, Bufo marinus. III. Geographical patterns of variation. Evolution 39:1065-1075.

Easteal S (1988) Range expansion and its genetic consequences in populations of the giant toad, Bufo marinus. Evol. Biol. 23:49-84.

Ehret C (1988) Language change and the material correlates of language and ethnic shift. Antiquity 62:564-573.

Endler JA (1977) Geographic Variation, Speciation, and Clines. Princeton, NJ: Princeton University Press.

Excoffier L, and Langaney A (1989) Origin and differentiation of human mitochondrial DNA. Am. J. Hum. Genet. 44:73-85.

Excoffier L, Pellegrini B, Sanchez-Mazas A, Simon C, and Langaney A (1987) Genetics and history of sub-Saharan Africa. Yearb. Phys. Anthropol. 30:151-194.

Excoffier L, Harding RM, Sokal RR, Pellegrini B, and Sanchez-Mazas A (1991) Spatial differentiation of RH and GM haplotype frequencies in sub-Saharan Africa and its relation to linguistic affinities. Hum. Biol. 63:273-307.

Gimbutas M (1979) Three waves of the Kurgan people into Old Europe, 4500-2500 B.C. Arch. Suisses Anthropol. Gen. 43:113-137.

Greenberg JH (1987) Language in the Americas. Stanford, CA: Stanford University Press.

Guglielmino CR, Piazza A, Menozzi P, and Cavalli-Sforza LL (1990) Uralic genes in Europe. Am. J. Phys. Anthropol. 83:57-68.

Harlan JR (1971) Agricultural origins: Centers and noncenters. Science 174:468-474.

Hassan FA (1973) On mechanisms of population growth during the Neolithic. Curr. Anthropol. 14:535-542.

Kaiser M, and Shevoroshkin V (1988) Nostratic. Annu. Rev. Anthropol. 17:309-329.

Kimura M, and Weiss GH (1964) The stepping-stone model of population structure and the decrease of genetic correlation with distance. Genetics 49:561-576.

Langaney A, Roessli D, Hubert van Blyenburgh N, and Dard P (1992) Do most human populations descend from phylogenetic trees? Hum. Evol. 7:47-61.

Livingstone FB (1992) Gene flow in the Pleistocene. Hum. Biol. 64:67-80.

Maddison D (1991) African origin of human mitochondrial DNA reexamined. Syst. Zool. 40:355-363.

Menozzi P, Piazza A, and Cavalli-Sforza LL (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786-792.

Morpurgo-Davies A (1989) Comment on “Models of change in language and archaeology,” by C. Renfrew. Trans. Philol. Soc. 87:165-171.

Morton NE, Yee S, Harris DE, and Lew R (1971) Bioassay of kinship. Theor. Pop. Biol. 2:507-524.

Morton NE, Miki C, and Yee S (1968) Bioassay of population structure under isolation by distance. Am. J. Hum. Genet. 20:411-419.

Mourant AE, Kopec AC, and Domaniewska-Sobczak K (1976) The Distribution of Human Blood Groups and Other Polymorphisms. Oxford: Oxford University Press.

Murillo F, Rothhammer F, and Llop E (1977) The Chipaya of Bolivia: Dermatoglyphics and ethnic rela- tionships. Am. J. Phys. Anthropol. 46:45-50.

Nocentini A (1993) Power and limits of the genetic clas- sification of languages. Mankind Quart. 33:265-281.

Oden NL (1984) Assessing the significance of a spatial correlogram. Geogr. Anal. 16:1-16.

O’Rourke DH, Mobarry A, and Suarez BK (1992) Patterns of genetic variation in native America. Hum. Biol. 64:417-434.

Piazza A (1993) Who are the Europeans? Science 260:1767-1769.

Piazza A, and Menozzi P (1983) Geographic variation in human gene frequencies. In J Felsenstein (ed.): Numerical Taxonomy. Berlin: Springer, pp. 444-450.

Piazza A, Menozzi P, and Cavalli-Sforza LL (1981) Synthetic gene frequency maps of man and selective effects of climate. Proc. Natl. Acad. Sci. USA 78:2638-2642.

Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Simulation and separation by principal components of multiple demic expansions in Europe. Am. Natur. 128:681-706.

Renfrew C (1987) Archaeology and Language. The Puzzle of Indo-European Origins. London: Jonathan Cape.

Renfrew C (1989) Models of change in language and archaeology. Trans. Philol. Soc. 87:103-155.

Renfrew C (1991) Before Babel: Speculations on the origins of linguistic diversity. Cambridge Archaeol. J. 1:3-23.

Renfrew C (1992a) World languages and human dispersals: A minimalist view. In JA Hall and IC Jarvie (eds.): Transition to Modernity. Cambridge: Cambridge University Press, pp. 11-68.

Renfrew C (1992b) Archaeology, genetics, and linguistic diversity. Man 27:445-478.

Rouhani S (1989) Molecular genetics and the patterns of human evolution: Plausible and implausible models. In P Mellars and CB Stringer (eds.): The Human Revolution: Behavioural and Biological Perspectives on the Origin of Modern Humans. Edinburgh: Edinburgh University Press, pp. 49-61.

Roychoudhury AK, and Nei M (1988) Human Polymorphic Genes. World Distribution. Oxford: Oxford University Press.

Ruhlen M (1991) A Guide to the World’s Languages, 2nd ed. Vol. 1: Classification. London: Edward Arnold.

Ruhlen M (1992) An overview of genetic classification. In JA Hawkins and M Gell-Mann (eds.): The Evolution of Human Languages. Redwood City, CA: Addison-Wesley, pp. 159-189.

Salzano FM, Neel JV, Gershowitz H, and Migliazza EC (1977) Intra- and intertribal genetic variation within a linguistic group: The Ge-speaking Indians of Brazil. Am. J. Phys. Anthropol. 47:337-348.

Schurr TG, Ballinger SW, Gan W, Hodge JA, Merriwether DA, Lawrence DN, Knowler WC, Weiss KM, and Wallace DC (1990) Amerindian mitochondrial DNAs have rare Asian mutations at high frequencies, suggesting that they derived from four primary maternal lineages. Am. J. Hum. Genet. 46:613-623.

Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A method for the detection of a demic cline. In NE Morton (ed.): Genetic Structure of Populations. Honolulu: University of Hawaii Press, pp. 128-135.

Sherratt A, and Sherratt S (1988) The archaeology of Indo-European, an alternative view. Antiquity 62:584-595.

Slatkin M (1985) Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393-430.

Slatkin M (1987) Gene flow and the geographic structure of natural populations. Science 236:787-792.

Slatkin M, and Arter HE (1991) Spatial autocorrelation methods in population genetics. Am. Natur. 138:499-517.

Sokal RR (1979) Ecological parameters inferred from spatial correlograms. In GP Patil and M Rosenzweig (eds.): Contemporary Quantitative Ecology and Related Ecometrics. Fairland, MD: International Co-operative Publishing House, pp. 167-196.

Sokal RR (1988) Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. USA 85:1722-1726.

Sokal RR (1991) Ancient movement patterns determine modern genetic variances in Europe. Hum. Biol. 63:589-606.

Sokal RR, and Menozzi P (1982) Spatial autocorrelation of HLA frequencies in Europe supports demic diffusion of early farmers. Am. Natur. 119:1-17.

Sokal RR, and Oden NL (1978) Spatial autocorrelation in biology. 1. Methodology. Biol. J. Linn. Soc. 10:229-249.

Sokal RR, and Wartenberg DA (1981) Space and population structure. In D Griffith and R McKinnon (eds.): Dynamic Spatial Models. Alphen aan den Rijn, The Netherlands: Sijthoff and Noordhoff, pp. 186-213.

Sokal RR, Harding RM, and Oden NL (1989a) Spatial patterns of human gene frequencies in Europe. Am. J. Phys. Anthropol. 80:267-294.

Sokal RR, Jacquez GM, and Wooten MC (1989b) Spatial autocorrelation analysis of migration and selection. Genetics 121:845-855.

Sokal RR, Oden NL, and Wilson C (1991) Genetic evidence for the spread of agriculture in Europe by demic diffusion. Nature 351:143-145.

Sokal RR, Oden NL, and Thomson BA (1992) Origins of Indo-Europeans: Genetic evidence. Proc. Natl. Acad. Sci. USA 89:7669-7673.

Spuhler JN (1972) Genetic, linguistic, and geographical distances in native North America. In JS Weiner and J Huizinga (eds.): The Assessment of Population Affinities in Man. Oxford: Clarendon Press, pp. 72-95.

Starostin SA (1990) A statistical evaluation of the time-depth and subgrouping of the Nostratic macrofamily. In R Dawkins and J Diamond (eds.): Evolution: From Molecules to Culture. Cold Spring Harbor, NY: Cold Spring Harbor Press, p. 33.

Stringer CB, and Andrews P (1988) Genetic and fossil evidence for the origin of modern humans. Science 239:1263-1268.

Suarez BK, Crouse JD, and O’Rourke DH (1985) Genetic variation in North Amerindian populations: The geography of gene frequencies. Am. J. Phys. Anthropol. 67:217-232.

Templeton AR (1992) Human origins and analysis of mitochondrial DNA sequences. Science 255:737.

Templeton AR (1993) The “Eve” hypothesis: A genetic critique and reanalysis. Am. Anthropol. 95:51-72.

Tills D, Kopec AC, and Tills RE (1983) The Distribution of Human Blood Groups and Other Polymorphisms. Suppl. 1. Oxford: Oxford University Press.

Torroni A, Schurr TG, Yang CC, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, and Wallace DC (1992) Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics 130:153-162.

Valladas H, Reyss JL, Joron JL, Valladas G, Bar-Yosef O, and Vandermeersch B (1988) Thermoluminescence dating of Mousterian ‘Proto-Cro-Magnon’ remains from Israel and the origin of modern man. Nature 331:614-616.

Vigilant L, Stoneking M, Harpending H, Hawkes K, and Wilson AC (1991) African populations and the evolution of human mitochondrial DNA. Science 253:1503-1507.

Wainscoat JS, Hill AVS, Boyce AL, Flint J, Hernandez M, Thein SL, Old JM, Lynch JR, Falusi AG, Weatherall DJ, and Clegg JB (1986) Evolutionary relationship of human populations from an analysis of human DNA polymorphism. Nature 319:491-493.

Wijsman EM, and Cavalli-Sforza LL (1984) Migration and genetic population structure with special reference to humans. Annu. Rev. Ecol. Syst. 15:279-301.

Yasuda N, and Morton NE (1969) Studies on human population structure. In JF Crow and JV Neel (eds.): Proceedings of the 3rd International Congress of Human Genetics. Baltimore, MD: Johns Hopkins Press, pp. 249-265.

Zvelebil M, and Zvelebil KV (1988) Agricultural transition and Indo-European dispersal. Antiquity 62:574-583.