
Genetic Diversity and Genetic Burden in Humans

Henry Harpending (corresponding author)

Department of Anthropology, University of Utah

Salt Lake City UT 84112 USA

phone: 801 581 3776

fax: 801 581 6252

email: [email protected]

Gregory Cochran

Department of Anthropology, University of Utah

April 2005

Abstract

We discuss categories of genetic diversity in humans. Neutral diversity, population differences

in frequencies of genetic markers that we think are invisible to natural selection, provides

a passive record of population history but is otherwise of little interest in human biology.

Genetic variation related to disease can be separated into mutational noise and variation

due to selection, either ongoing selection or the effects of a past environment.

We distinguish consequences of genetic diversity for fitness, relevant to evolution, and

consequences for well-being, relevant to medicine and public health. We call genetic varia-

tion that causes impairment of health or well-being of individual humans “apparent genetic

burden” and variation that has effects on fitness but not well-being “unapparent genetic

burden.” We use “burden” to distinguish these notions from the classical concept of “ge-

netic load” that refers to effects on population fitness, a concept formulated by Morton et al.

(1956).

We distinguish adapted genes and adapted genotypes: an adapted gene is a gene that

increases fitness of its bearer either in heterozygous or homozygous state or both, while an

adapted genotype is a genotype that increases fitness of its bearer but is not transmitted

intact to future generations. Balanced polymorphisms in which the heterozygote is superior

in fitness may generate most adapted genotypes. In the face of major rapid environmental

change adapted genotypes appear first but over time they are replaced by adapted genes.

The presence of adapted genotypes is a good indication of recent environmental change:

for example there are apparently many polymorphisms in domestic animals of this nature,

responses to domestication, and many fewer in wild animals (and in humans).

Keywords: genetic burden, mutation, rapid selection, genetic polymorphism, selective sweep,

Ashkenazi Jews

Introduction

Much of human genetic diversity is generally thought to be neutral, that is invisible to

natural selection. Neutral diversity is of interest because it contains a kind of record of

human history. The global patterns of human neutral diversity are now well known and

understood as a consequence of DNA typing and sequencing technologies that have been

applied to samples from many human populations.

Diversity that is not neutral, selected diversity, may also be important for understanding

history, but a more important question in many fields is about how much variation in health

and well-being is a consequence of mutation and natural selection.

Much selected diversity must be just random damage, deleterious mutations that are

eventually removed by natural selection. Many rare genetic disorders are probably in this

category. Since the highest known spontaneous mutation rates for disorders like this are on

the order of $10^{-4}$, anything with a total fitness cost greater than $10^{-4}$ is almost certainly not

in this category.

Many polymorphic systems that have health and fitness consequences are leftovers, adap-

tations to past environments. For example sickle cell heterozygotes enjoy some protection

from falciparum malaria. Malaria can be eliminated in a region instantly in evolutionary

time, but the genetic variation that evolved in response to the malaria persists as detrimental

or lethal in the new environment.

Neutral diversity

Neutral diversity is DNA diversity that is invisible to natural selection. Much of our DNA

is apparently “junk” in which mutations have no biological consequences. Even synonymous

mutations in coding regions may be close to neutral but apparent exceptions are known.

Neutral diversity is then of only indirect evolutionary and medical interest. On the other

hand it does provide a passive record of population history precisely because it is unnoticed

by natural selection: there is information about past population movements and episodes of

growth and decline in it. Neutral diversity also provides markers in the genome useful for

genetic counseling and for searching for nearby functional genes in linkage disequilibrium.

Discussions of neutral genetic diversity distinguish diversity within and between popu-

lations. Diversity within populations refers directly or indirectly to the average amount of

difference between random DNA sequences chosen from that population. This can be mea-

sured by the mean pairwise number of sequence differences in a collection of sequences, by

the squared difference in length for repeat polymorphisms, by heterozygosity for collections

of simple markers, and so on.

Within-population diversity is greatest in sub-Saharan Africa and gradually declines

away from Africa into the New World. For a collection of repeat polymorphisms (Eller, 1999)

diversity declines approximately 15 to 20% from Africa to northeast Asia and as much as 30%

into the New World. The interpretation is usually that the population of Africa is “older”,

that is, that it was demographically successful earlier, while populations outside Africa

are primarily derived from migrants from the edge of the population of sub-Saharan African

ancestors (Eswaran, 2002). Interestingly the increased diversity in Africa was not apparent

until genetic systems like STRs became available that were not so subject to ascertainment

bias. In the standard encyclopedic summary of classical marker polymorphisms (Cavalli-

Sforza et al., 1994) there is no trace of it. Classical markers were mostly discovered in

Europeans, biasing the sample toward those polymorphic in Europeans.

The standard way of expressing diversity between populations is some variant of a statistic

called Fst. We pick pairs of genes or sequences from within populations, measure how

different they are on average, and call this Ho. We then pick random pairs from the whole

sample without regard to population membership, measure their average difference, and call

this He. The statistic

$$F_{st} = \frac{H_e - H_o}{H_e} \qquad (1)$$

can then be interpreted as a measure of inter-population diversity because it describes the

fraction of total diversity He that is between populations. For a collection of human groups

from all over the earth this statistic is usually 0.10 to 0.15. This has been well known for

decades (Lewontin, 1972) and has remained unchanged with the advent of new large datasets.
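As a rough illustration of equation (1), the sketch below computes Fst for a single two-allele marker from population allele frequencies. The function names and the example frequencies are hypothetical; populations are weighted equally and pairs are drawn with replacement, which is a reasonable approximation only for large samples.

```python
def heterozygosity(p):
    """Chance that two randomly drawn alleles differ at a two-allele marker
    whose first allele has frequency p (drawing with replacement)."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Fst = (He - Ho) / He for equally weighted populations.

    freqs: allele frequency of the same marker in each population
    (hypothetical input, not data from the paper).
    """
    # Ho: average difference between pairs drawn within a population.
    ho = sum(heterozygosity(p) for p in freqs) / len(freqs)
    # He: average difference between pairs drawn from the pooled sample.
    pooled = sum(freqs) / len(freqs)
    he = heterozygosity(pooled)
    return (he - ho) / he


# Three made-up populations with frequencies 0.3, 0.5 and 0.7.
print(round(fst([0.3, 0.5, 0.7]), 3))  # 0.107
```

In this made-up case the frequencies span 0.3 to 0.7, yet Fst is only about 0.11, inside the 0.10 to 0.15 range typical of worldwide human samples.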

Lewontin emphasized in his article that 1/8 is a small number, that most neutral genetic

diversity is within populations, and therefore that human group differences were small and

minor. It is not clear why 1/8 should be considered small: the relative differences among

human populations implied by a kinship of 1/8 are the same as the relative differences

among sets of half siblings. Most of us do not think that kinship between half siblings or

grandparents and grandchildren is trivially small. (Note that in behavioral ecology genetic

similarity is often discussed in terms of the coefficient of relationship while in genetics the

coefficient of kinship is more frequently used. In the simple case the coefficient of relationship

is twice the coefficient of kinship (Bulmer, 1994), that is 1/4 between half sibs.)

As Edwards points out (Edwards, 2003), Fst statistics do not take into account the

correlation structure of gene-frequencies in groups. Correlated differences in the frequencies

of alleles that influence a phenotypic trait can cause arbitrarily large differences in that

trait, even while the great majority of genetic diversity is intra-population. These correlated

gene-frequency differences are exactly what we expect from natural selection, but they also

accumulate under pure random drift.

A consequence of human neutral diversity is that it is possible to assign an individual to

an ethnic group correctly nearly every time when a reasonable number of variable sites, 50

to 100, have been typed.
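A minimal simulation can make the assignment claim concrete. It assumes two hypothetical populations, A and B, whose allele frequencies are 0.65 and 0.35 at every typed site (a per-site Fst of about 0.09), statistically independent sites, and a simple log-likelihood-ratio rule. None of these details come from a real dataset.

```python
import math
import random


def simulate_assignment(n_loci=100, p_a=0.65, p_b=0.35,
                        n_individuals=2000, seed=1):
    """Fraction of simulated diploid individuals assigned to the correct
    population. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    # Evidence contributed by each allele copy; positive favors population A.
    llr_allele = math.log(p_a / p_b)
    llr_other = math.log((1.0 - p_a) / (1.0 - p_b))
    correct = 0
    for i in range(n_individuals):
        from_a = (i % 2 == 0)          # alternate true origin A, B, A, B, ...
        p = p_a if from_a else p_b
        score = 0.0
        for _ in range(2 * n_loci):    # two allele copies per site
            score += llr_allele if rng.random() < p else llr_other
        if (score > 0.0) == from_a:    # assign to A when the total favors A
            correct += 1
    return correct / n_individuals


for n in (10, 50, 100):
    print(n, simulate_assignment(n_loci=n))
```

Under these assumptions each site is only weakly informative, but the evidence adds up across sites, so accuracy rises quickly with the number typed.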

The pattern of neutral differences between populations in our species is one of isolation by

distance: populations close to each other on the ground are genetically similar to each other,

while distant populations are more genetically dissimilar at neutral loci. This means that

there is a high correlation between measures of geographical distance and genetic distance

between populations. This has been a surprise to many because the global pattern can be

very different for external appearance. Dark skin color, for example, is found in Africa and

again in Australia and much of Oceania but is not so common in populations in between.

This similarity in skin color is not reflected in the neutral genome, suggesting that skin color

differences are caused by natural selection rather than passive genetic similarity. Similarly

there are small “pygmoid” people in Africa whose appearance is similar to that of groups in

the Indian Ocean, Australia, and Oceania. There is no evidence from neutral genes of any

population kinship of these groups, so again the natural inference is that the morphology

has been generated and maintained by selection.

Diversity generated by random mutation

Many genetic diseases and disorders are thought to be consequences of random damage to

genetic material. In general the total impairment of fitness due to mutation at a locus is,

at equilibrium, close to the mutation rate at the locus. Therefore any condition whose total

effect on population fitness is greater than $10^{-4}$ is almost certainly not purely mutation

driven since this is the upper limit of known mutation rates.
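The equilibrium claim can be checked numerically in one simple case. The sketch below assumes a fully recessive harmful allele, random mating, and a hypothetical mutation rate u and selection coefficient s; it is an illustration of the principle, not a calculation taken from the paper.

```python
def equilibrium_load(u, s, generations=20000):
    """Iterate selection, then mutation, for a fully recessive harmful allele
    and return the resulting loss of mean fitness, s * q**2.

    u: mutation rate per generation toward the harmful allele (assumed)
    s: fitness reduction of homozygotes for the harmful allele (assumed)
    """
    q = 0.0
    for _ in range(generations):
        mean_w = 1.0 - s * q * q
        q = q * (1.0 - s * q) / mean_w   # selection removes homozygotes
        q = q + u * (1.0 - q)            # new mutations of the normal allele
    return s * q * q


for s in (1.0, 0.1):
    print(s, equilibrium_load(1e-4, s))
```

In this model the harmful allele settles near a frequency of sqrt(u/s), so the fitness loss s*q^2 stays close to u whether the mutation is lethal or only mildly harmful.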

Sometimes, either accidentally or deliberately, we lump a number of different disease enti-

ties with specific individual causes into a broad symptomatic category: such broad category

diseases can be considerably more common.

For example, consider congenital deafness, which is made up of many different diseases,

basically anything that in some way interferes with hearing before birth. In the past, perhaps

as many as one in 500 individuals were born deaf. About half of those cases were caused

by prenatal infection, mostly rubella (now prevented by vaccination). The other half were

caused by many different mutations: tens of loci have been identified, although a single locus,

connexin-26, accounts for 40% of genetic deafness. Among Caucasians, a single connexin-26

mutation dominates (35delG): there is reason to think that it became common due to

heterozygote advantage, probably as a defense against some skin disease. About 15% of the

many mutations causing deafness are syndromic—that is, they cause other symptoms as well

as deafness. Waardenburg syndrome causes a white forelock as well as deafness, and other

mutations causing syndromic deafness can have much more serious effects.

Even with a historical fitness load of some $10^{-3}$, not particularly high, and even though it

is a broad category rather than a specific disease entity, most congenital deafness was most

likely a result of pathogen pressures rather than mutational noise. Part was in the form

of direct infection (rubella) and part in the form of the long-term consequences of a selective

response to pathogens, the common connexin-26 mutations. Although we need to be careful

to avoid lumping that leads to this kind of imprecise disease classification, it still seems

that we must look to sources other than mutational pressure to explain familiar disorders

with large effects on population fitness, such as diabetes, obstructive arterial disease, or

schizophrenia. Parenthetically, the existence of a syndromic subset of a given disease is

probably a sign that it is really a broad category rather than a specific disease entity.

The extent to which minor discomforts, aches and pains, and idiosyncratic ill health reflect

random mutational damage is not known. If an average rate of deleterious mutation per

gene per generation is $10^{-5}$, and there are slightly fewer than $10^5$ genes in the human genome,

an average gamete may carry fewer than one new deleterious mutation. Many of these may

have no apparent consequences for fitness in the current environment. On the other hand

this damage accumulates. The Morton, Crow, and Muller genetic load theory attempted

to estimate how much random deleterious noisy diversity we each carried by looking at the

increase of morbidity and mortality with inbreeding in humans: random mutational dam-

age, mostly recessive, should be “revealed” by inbreeding while other diversity, for example

that maintained by higher heterozygote fitness, should not increase much with inbreeding.

Unfortunately the overall results of those efforts were inconclusive: there were deleterious

effects of inbreeding but the slope of the regression of fitness loss on amount of inbreeding

was squarely between the predictions of the deleterious mutants model and the heterozygote

advantage model.

Selectively maintained diversity

Advantageous genes

Although mutation is the ultimate source of all genetic diversity, persistence of genetic varia-

tion that affects health and fitness must usually reflect natural selection. We will distinguish

between selection for advantageous genes and selection for advantageous genotypes. An

advantageous gene is a gene that increases fitness of its bearer either in heterozygous or

homozygous state or both. An advantageous genotype is a genotype that is not directly

transmitted to offspring, for example the heterozygous genotype of the sickle cell polymor-

phism in malarial environments.

Persistence into adulthood of the ability to synthesize lactase is apparently an advan-

tageous gene among dairying people, especially when there is consumption of unfermented

milk. Since those unable to digest lactose suffer from gastrointestinal upset in such environ-

ments, lactase persistence is likely advantageous in both the homozygous and heterozygous

states. Lactase persistence is an advantageous gene, in the sense that it is unconditionally

favored by selection in a certain environment, that of consumption of unfermented dairy

foods.

Another unconditionally advantageous gene may be the CCR5 ∆32 variant that protects

individuals from HIV progression to AIDS (and probably became common in European

populations because it gave protection against smallpox). There are no known ill effects of

this gene, even though it seems to break a part of the immune system. Presumably in an

environment of hyperendemic HIV infection the gene would go to fixation.

We don’t know many cases of such purely advantageous genes, perhaps because they go

to fixation so rapidly that it is difficult to find any in the transient stage of going to fixation:

certainly the more advantageous such a gene is the more rapidly it will displace any other

alleles at the locus.
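To give a sense of how strongly the speed of replacement depends on the size of the advantage, the sketch below counts generations for a favored allele to rise from 1% to 99%. It assumes a deliberately simple model in which each copy of the allele multiplies fitness by (1 + s), with a hypothetical advantage s and a very large population; the start and end frequencies are arbitrary choices.

```python
def generations_to_sweep(s, start=0.01, end=0.99):
    """Generations for a favored allele to rise from `start` to `end`
    when carriers of each copy leave (1 + s) times as many descendants.
    The model and all parameter values are illustrative assumptions."""
    p = start
    t = 0
    while p < end:
        p = p * (1.0 + s) / (1.0 + s * p)   # one generation of selection
        t += 1
    return t


for s in (0.01, 0.05, 0.2):
    print(s, generations_to_sweep(s))
```

In this model the transit time is roughly inversely proportional to s, so an allele with a large advantage spends few generations at the intermediate frequencies where it would be detected as a polymorphism.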

Advantageous genotypes

The best understood advantageous genotype in humans is the carrier state of the sickle cell

gene in areas of hyperendemic falciparum malaria. Carriers of a single sickling mutation

(HbS) of the hemoglobin beta chain enjoy a fitness advantage while bearers of two copies

of HbS are severely ill and usually die in early childhood in the absence of modern medical

care. The heterozygote HbA/HbS is an advantageous genotype in the sense that bearers

of that genotype enjoy health and fitness advantages in that environment. The result is

a balanced polymorphism, in which selection maintains the genetic diversity at the locus.

As HbA becomes more common in a population, more and more HbS alleles are spending

lifetimes in heterozygous HbA/HbS genotypes and the fitness, and hence the frequency, of HbS

increases. Similarly as HbA declines in frequency, perhaps because of drift, the fitness of HbS

alleles decreases. At the equilibrium, the fitness of each allele is exactly the same, indeed

this is the definition of equilibrium.

The biology in this case is that heterozygotes have slightly impaired red blood cells, and

the impairment hurts the malaria more than it hurts the individual. The net effect of this

adaptation to malaria on population fitness is unfortunately not great. Consider a simple

numerical example: in some environment saturated with falciparum malaria the fitness of

wild-type HbA/HbA homozygotes is 1, the fitness of HbA/HbS heterozygotes is 1.2, and

the fitness of those HbS/HbS homozygotes, born with sickling disease, is 0. In the absence

of the sickling gene the population fitness is 1 since everyone is HbA/HbA, while with the

polymorphism the equilibrium frequency of the wild-type allele is 1.2/1.4 ≈ 0.86 and the

average population fitness is only 1.03, a trivial increase.
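The numbers in this example can be reproduced with a few lines of code. The sketch below uses the standard equilibrium formula for heterozygote advantage under random mating; the function names are hypothetical, and the fitness values are the ones given in the text.

```python
def overdominant_equilibrium(w_aa, w_as, w_ss):
    """Equilibrium frequency of HbA and mean fitness at that point,
    assuming random mating and heterozygote advantage."""
    p = (w_as - w_ss) / ((w_as - w_aa) + (w_as - w_ss))
    q = 1.0 - p
    mean_w = p * p * w_aa + 2.0 * p * q * w_as + q * q * w_ss
    return p, mean_w


def marginal_fitnesses(p, w_aa, w_as, w_ss):
    """Average fitness of an HbA allele and of an HbS allele,
    given the partner allele each is expected to be paired with."""
    q = 1.0 - p
    return p * w_aa + q * w_as, p * w_as + q * w_ss


p, mean_w = overdominant_equilibrium(1.0, 1.2, 0.0)
w_a, w_s = marginal_fitnesses(p, 1.0, 1.2, 0.0)
print(round(p, 3), round(mean_w, 3))  # 0.857 1.029
print(round(w_a, 3), round(w_s, 3))   # 1.029 1.029
```

The second line of output shows the two alleles having the same average fitness at the equilibrium frequency, which is the condition described in the preceding paragraph.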

In this population the infant mortality rate due to sickle cell disease is $0.14^2 \approx 0.02$ or 20

per thousand live births. The tragedy of this adaptation is that malaria can disappear with

environmental change but the genetic adaptation to malaria persists for centuries, initially

causing the deaths soon after birth of about 2% of all babies born but declining slowly

afterward.
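How slowly the cost declines can be sketched with the same numerical example. The code assumes that once malaria is gone carriers have the same fitness as HbA/HbA individuals, that HbS/HbS fitness stays at 0, that mating is random, and that there is no migration; the 25-year generation length is only an illustrative assumption.

```python
def hbs_frequency_after(generations, q0=0.2 / 1.4):
    """HbS allele frequency after the given number of malaria-free generations.

    q0 is the equilibrium HbS frequency from the numerical example (about 0.14).
    With only HbS/HbS removed, one generation of selection gives q' = q / (1 + q).
    """
    q = q0
    for _ in range(generations):
        q = q / (1.0 + q)
    return q


YEARS_PER_GENERATION = 25  # assumed, for illustration only
for t in (0, 4, 10, 20):
    q = hbs_frequency_after(t)
    # q * q is the fraction of births that are HbS/HbS homozygotes.
    print(t * YEARS_PER_GENERATION, "years:", round(1000 * q * q, 1), "per thousand births")
```

The recursion has the closed form q_t = q0 / (1 + t*q0), so the allele frequency falls only hyperbolically: under these assumptions it takes about seven generations just to halve it.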

What if mutation and natural selection hit upon a gene that had the same physiological

consequences as the HbA/HbS heterozygous state? Such a gene would quickly proceed to

fixation in the malarial environment and population fitness would increase by 20% rather

than 3%. An advantageous gene at a locus must usually be superior to an advantageous

genotype but more difficult for mutation and natural selection to “discover.” In general

balanced polymorphisms, advantageous genotypes, are the first responses to drastic envi-

ronmental change, and they are later replaced by advantageous genes. The most familiar

examples of balanced polymorphism in humans are those responsive to malaria, especially

falciparum malaria. Deadly malaria is both a very strong and a new selective agent, and

indeed the presence of numerous balanced polymorphisms is testimony to its novelty.

We expect to find advantageous genotypes in populations that have been exposed to

recent strong selection, and that seems to be the pattern (Orr, 2005). Many are known in

domesticated animals, many fewer in wild animals.

Polymorphisms in domesticates

We know of a number of examples of genes of strong effect in domesticated animals, and in

many cases we understand the selective pressures involved.

Myostatin is a protein that regulates and limits muscle growth, and we find several

different high-frequency myostatin mutations in some breeds of beef cattle, such as Belgian

Blue, Piedmontese, and South Devon. These myostatin mutations render the gene inactive

and result in a phenotype known as double muscling, which increases muscle mass—the

target of selection in these breeds. Homozygotes have calving difficulties, and thus these

myostatin mutations produce adapted genotypes but are not adapted genes in our usage

(McPherron and Lee, 1997; Grobet et al., 1997).

Pigs too have at least one prominent gene of strong effect, a mutated ryanodine receptor

that has a significant effect on carcass lean content (Wendt et al., 2000). It seems to have

become common in the 1970s, as breeders attempted to adjust to changing market tastes

for less lard and more lean meat. Homozygotes for the gene have poorer meat quality than

normal pigs, and are extremely susceptible to stress. Again, this is a mutation that produces

an adapted genotype in heterozygotes at the cost of terribly maladapted homozygotes.

Another clear example is hyperkalemic periodic paralysis in quarter horses, characterized

by sporadic attacks of muscle tremors, weakness, and collapse. This is a dominant muscle

disorder caused by a mutant allele of the skeletal muscle form of the sodium channel, found

only in descendants of the quarter horse Impressive (Naylor, 1997; Cannon et al., 1995). It

has spread rapidly since it originated in 1968, and it now exists in approximately 100,000

quarter horses. It produces a muscular phenotype that has been selected by show judges: of

the top 15 halter horses in 1992, 13 were descendants of Impressive.

Perhaps the most dramatic example, in terms of a significant life-history change, is the set of

mutations causing twinning in domestic sheep. Twinning is rare in wild sheep and is still

rare in most breeds of domesticated sheep. It reduced fitness in typical past environments,

since it was difficult for the ewe to take care of two offspring. But in modern conditions,

where sheep experience very favorable environments and considerable human intervention,

twinning increases fitness, and now several different mutations that induce twinning are

found at polymorphic frequencies in some breeds of sheep. We now know of four different

mutations of the same X-linked gene (bone morphogenetic protein 15) that cause twinning

in heterozygotes and sterility in homozygotes (in Inverdale, Hanna, Belclare, and Galway

sheep). We know of another twinning mutation involving the bone morphogenetic protein

1B receptor (Booroola sheep - homozygotes have triplets!) (Davis, 2005; Galloway et al.,

2000; McNatty et al., 2001). This pattern, multiple mutations in the same enzyme path, is

common in cases of recent strong selection. We see the same thing in malaria defenses such

as HbS and the thalassemias, which tweak the hemoglobin molecule in different ways. There

is also a broader clustering, mutations aimed at a common physiological target, including

the malaria-defense examples like G6PD, which changes the environment inside the red cell,

and Melanesian ovalocytosis, which changes the red cell membrane.

In terms of those searching for genotypic correlates of disease in humans, advantageous

genotypes would appear as “major genes” affecting some inherited disease. Random diversity

consequent to deleterious mutation would lead to rare genes that would likely be geograph-

ically local. In fact much of human gene hunting has turned up rare local mutants but very

few major genes of large effect (Orr, 2005). The simple prediction from evolutionary theory

is that there aren’t very many, and those that are present in our population are responses to

environmental change that is both recent and severe. Many such major genes are known in

domestic animals because, we think, the domestication process and later selection have been

precisely the kind of new environment of strong selection that leads initially to the evolution

of advantageous genotypes and only later to the evolution of advantageous genes.

Leftover diversity

As we discussed above, one of the great tragedies of the HbS response to hyperendemic

falciparum malaria is the long persistence of the genetic response when the external agent,

malaria, is eliminated. Thousands of premature deaths and compromised lives in North

America are caused, indirectly, by malaria since they are the costs of an adaptation no

longer relevant in a malaria-free environment. The genetic diversity is “leftover” from a past

environment.

There are large numbers of humans on earth, we are mobile, and we occupy a wide range

of environments. For all these reasons we are especially prone to epidemic infectious diseases.

Many known polymorphisms in our species are thought to be responses to infectious disease,

essentially again “leftovers” from past environments to the extent that we have managed to

control or eradicate the pathogens.

Infections

The importance of expensive genetic defenses to infection varies geographically, depending

on the historical impact of infectious disease pressure. On a worldwide basis, they account

for the majority of cases of genetic disease. Genetic defenses against falciparum malaria

are far and away the most important part of this story. Wherever falciparum malaria has

existed for a long time, mainly the tropical areas of the Old World, we find many expensive

genetic defenses against malaria, and those defenses account for the great preponderance of

all genetic disease in populations originating in those regions. The most important are sickle-

cell (HbS), HbC, HbE, alpha- and beta- thalassemia, Melanesian ovalocytosis, and G6PD

deficiency. They are far more common than “noise” genetic diseases caused by mutational

pressure. For example, about 250,000 children are born with sickle cell anemia each year

worldwide, while about 5,000 boys are born with Duchenne’s muscular dystrophy, one of the

most common mutation-driven genetic diseases (WHO, 1994). These malaria defenses give

heterozygote advantage while causing problems of varying severity in homozygotes. They

are not adapted genes, but instead produce adapted genotypes. This sort of simple, sloppy

adaptation is atypical of species near equilibrium with the selective environment. Normally,

adaptations involve a number of genes that work together in a coordinated way. We think

that this evolutionary sloppiness exists because falciparum malaria, as we know it today,

has not been around very long; perhaps as little as 4,000 years (Carter and Mendis, 2002).

The same appears to be true of the anti-malaria genetic defenses. For example, the main

African variety of G6PD deficiency is roughly 2500 years old (Sabeti et al., 2002). The end

of the ice age and increased population density resulting from the spread of agriculture seem

likely to have favored the spread of this virulent form of malaria. This trend was particularly

unpleasant in Africa, where mosquito strains evolved that prefer humans to animals, which

greatly facilitated malaria transmission. Vivax malaria is milder, propagates over a wider

range of conditions, and is likely much older than falciparum malaria. There is at least one

genetic defense that completely prevents infection—the Duffy negative allele—which does

not appear to cause disease. Duffy negative, which is near fixation among central Africans,

is thus a good example of an adapted gene.

There are a number of other genetic diseases that clearly have been favored by selec-

tion and seem likely to have given protection against some pathogen other than falciparum

malaria. Cystic fibrosis (CF), the most common serious genetic disease among Europeans,

alters the cystic fibrosis transmembrane regulator (CFTR) protein. There is good reason

to believe that it has reached its present high frequency through natural selection (Slatkin

and Bertorelle, 2001). Salmonella typhi, the agent for typhoid fever, uses CFTR to enter

intestinal cells, and inefficiently infects cells heterozygous for the main European mutation,

deltaF508 (Pier et al., 1998). That mutation is apparently considerably older than the malaria

defenses. Typhoid has had a smaller impact on human fitness than falciparum malaria, but

it may have been around longer. The typhoid carrier state, caused by a persistent infection

of the gall bladder in a few percent of those infected, would have allowed typhoid propagation

at low population density.

The common European hemochromatosis mutation (C282Y) probably works in a similar

way, altering a pathogen receptor. The HFE protein is normally expressed in intestinal crypt

cells, where it regulates iron absorption. The C282Y-mutant form of HFE fails to reach the

cell surface. In homozygotes this sometimes leads to iron overload and organ damage such as

cirrhosis of the liver. At one time the general opinion was that increased iron absorption in

hemochromatosis carriers yielded heterozygote advantage, but this was always dubious, since

the C282Y HFE is only common in northwestern Europe, a region not particularly prone

to anemia. Recent studies (Hunt and Zeng, 2004) show that heterozygotes show only a tiny

increase in iron absorption. More likely the HFE protein served as the entry port for some

intestinal pathogen (Rochette et al., 1999) which was thwarted by the C282Y mutation.

Many intestinal pathogens have an important role in child mortality and thus impact

fitness. Some of these likely defenses appear to involve other anti-pathogen mechanisms,

for example upregulating inflammatory mechanisms. Familial Mediterranean Fever (FMF),

quite common among populations originating in the Middle East, is caused by a number of

defective alleles of pyrin, a protein that down-regulates granulocyte-mediated inflammation.

Mutant pyrin alleles can result in harmful fever and inflammation in homozygotes, but it is

quite easy to believe that unleashing granulocytes might protect heterozygotes from some

pathogen (Online Mendelian Inheritance in Man: OMIM, MEFV).

In a similar vein, alpha-1-antitrypsin (AAT) is a protease inhibitor of leukocyte elastase,

and thus deficient AAT alleles might protect heterozygotes against some pathogen. Of course

homozygotes (and heterozygotes to a lesser degree) run very significant risks of emphysema.

Two different low-activity alleles of AAT reach polymorphic frequencies in Europeans (Online

Mendelian Inheritance in Man: OMIM, AAT).

Congenital deafness is caused by many different mutations (over 100 have been identified)

and is thus, for the most part, a good example of a broad syndrome caused by mutational

pressure. Altogether perhaps 1 in 1500 children have some form of genetic deafness. Most

of the individual mutations are rare, since in the past deaf individuals had very low fitness.

However, mutations of a single gene (connexin-26) account for about 40% of congenital deaf-

ness. In Europeans, the main mutation is 35delG, which has a single origin approximately

10,000 years ago. Other populations have their own characteristic connexin-26 mutation:

R143W in Africa, 167delT among Ashkenazi Jews, 235delC among Japanese and Koreans.

Somehow selection has increased the frequency of certain connexin-26 mutations in a num-

ber of populations, despite the severe negative effects in homozygotes. There is evidence

that these mutations affect the skin (Meyer et al., 2002), resulting in a somewhat thicker

epidermis and saltier sweat, which may act as a barrier to pathogens.

Social and sexual selection

While the majority of our genetic burden from adapted genotypes seems to consist of re-

sponses to infectious agents, recent strong social or sexual selection ought to lead to the

same kind of transient outcomes. An interesting case is that of the Ashkenazi Jews, who are

burdened with an array of inherited disease, especially recessive disease (Risch et al., 2003).

We have calculated (Cochran and Harpending, 2005) that fewer than half of contemporary

Ashkenazi Jews bear none of the ethnic-specific mutations. While the best-known of these

disorders is Tay-Sachs disease, there are several others that affect the same metabolic path-

way. Another cluster of Ashkenazi diseases is the “DNA repair cluster” including mutations

in BRCA1 and BRCA2. This may be a misnomer, since these genes are directly involved in

early brain growth and development and their role in DNA repair may be secondary.

While the presence of the large number of genetic disorders among Ashkenazi has often

been attributed to genetic drift due to a severe bottleneck in their history, several lines

of evidence show that there never was any bottleneck. First, the only evidence for such

a bottleneck is the presence of the disorders: there is no independent record of any such

bottleneck in Ashkenazi history. There is not even any hint of a bottleneck. Second, we were

able to obtain data on allele frequencies of a large number of polymorphisms and examine

overall population genetic diversity of several European and Middle Eastern populations.

While several Middle Eastern groups showed heterozygosity reduction implying either a

bottleneck or a long interval of small size, Ashkenazi showed no such loss of diversity. Any

bottleneck severe enough to have led to the elevated frequency of even one of the Ashkenazi

disorders would have left a signature of diversity loss, and there is no such signature. Finally,

the clustering of the disorders in a few pathways argues against the bottleneck and drift hypothesis,

since drift that led to elevated frequencies of deleterious mutations would not have acted

in only a few specific biochemical pathways.

There has been speculation that these traits are responses to selection by infectious

disease, following the model of the sickle cell gene and falciparum malaria. Adaptation to

tuberculosis was one model that had some currency but failed to find support from family

studies. At any rate the infectious disease hypothesis is extremely implausible since none of

the Ashkenazi disorders rose to appreciable frequencies in their neighbors who lived, literally,

across the street. The selective pressure must have been in some sense social.

Ashkenazi history may furnish clues about what the selective social environment was.

From about the ninth to the seventeenth century they were a nearly completely endogamous

group with extreme occupational specialization in finance, trade, and management. The

amount of gene flow outward from the population is unknown, but there was almost none into

it. Only after this time, as the demographically successful population outgrew its specialized

niche, did Ashkenazi branch out into trades, shopkeeping, and occasionally even farming. In

societies prior to the demographic transition of the eighteenth century and continuing to the

present there was a positive correlation between wealth and Darwinian fitness everywhere

anyone has looked. The extreme occupational specialization of this population, together with the

possibility that occupational success led to wealth and to differential fitness, is the likely context

of strong selection that led to the presence of the numerous Ashkenazi genetic disorders. A

simple implication is that heterozygotes for these largely recessive disorders will be better

at whatever skills or abilities were the target of selection (Cochran and Harpending, 2005).

Detecting ongoing evolution directly from the gene

While past studies of ongoing evolution have been dominated by reasoning from the pheno-

type to the locus, as in sickle cell anemia or lactose tolerance, it has become possible in the

last few years to reason instead from characteristics of the gene directly. There is a lively

literature on this theme (Tishkoff et al., 2001; Slatkin and Rannala, 2000). We will not

review this literature here but will instead describe several particularly interesting examples

of ongoing selective sweeps inferred purely from the pattern of variation at the loci (Wang

et al., 2004; Ding et al., 2002; Harpending and Cochran, 2002; Hardy et al., 2005; Stefans-

son et al., 2005; Mekel-Bobrov et al., 2005; Evans et al., 2005). While similar patterns are

known for genes modulating response to falciparum malaria, all these are happening in genes

involved in behavior and central nervous system function.

At any genetic locus extant alleles or variants are tips of a tree of descent, called a

coalescent tree. The depth of this tree, that is the time back to the most recent common

ancestor at the locus, varies from locus to locus because coalescence is a random process. Most nuclear

loci seem to coalesce between 1 and 2 million years ago, while haploid loci like mtDNA or

the non-recombining part of the Y chromosome (nrY) coalesce much more recently.
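
As a rough guide to why haploid loci are expected to be shallower, the standard neutral coalescent gives the expected time to the most recent common ancestor of a sample. The symbols below are assumptions introduced for illustration and are not from the text: N_e is a constant diploid effective population size, n is the number of sampled copies, and the sex ratio is taken to be equal.

```latex
% Expected depth of the gene tree, in generations, under the neutral coalescent
E[T_{\mathrm{MRCA}}] = 4 N_e \left(1 - \frac{1}{n}\right) \quad \text{(autosomal locus)}
\qquad
E[T_{\mathrm{MRCA}}] = N_e \left(1 - \frac{1}{n}\right) \quad \text{(mtDNA or nrY)}
```

Under these assumptions a uniparentally inherited haploid locus has one quarter the effective number of copies of an autosomal locus, so its expected depth is about one quarter as great. With illustrative values of N_e = 10,000 and 25 years per generation, the autosomal expectation for a large sample is close to 4 x 10,000 x 25 = 1,000,000 years. The variance around these expectations is large, which is why depth varies so much from locus to locus.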

[Figure 1 about here.]

Figure 1 shows a typical gene tree from a neutrally evolving locus, in the left panel, and

one from a locus in which a selective sweep has occurred, in the right panel. The dots on

the branches represent mutations that have occurred in the history of the locus. “Selective

sweep” refers to the rapid spread of an advantageous new allele: in figure 1 the subtree on the

right side of the right tree has undergone such a sweep. At some time in the recent (recent

with respect to the scale of the total tree) past an advantageous variant has appeared on

that part of the tree. Many of the alleles at the locus are of the recent type (shown in red in

the figure), they share a recent common ancestor, and they are not very different from each

other. This pattern of allelic descent is called a “star phylogeny”: a recent common ancestor

and little or no differentiation since that ancestor.

A gene tree is of course not directly observable but many important properties of the

tree can be inferred from extant alleles. For example if we had samples from the tree in the

right panel we would notice that one subclade had a high frequency but very little diversity

within the clade. Compare, for example, the red clade in the left and right panels of the

figure: on the left the red clade alleles would be different from each other because there are

old branches that separate them. With the history shown in the right panel the red alleles

would be all very similar to each other. These differences are conventionally measured by

the mean pairwise sequence difference (MPD) among all possible pairs of alleles. A clade at

high frequency with low MPD strongly suggests that a selective sweep is occurring.
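
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the studies cited: the toy alignment, the function names, and the rule that the first site marks clade membership are all assumptions made for the example.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_difference(seqs):
    """Mean pairwise difference (MPD) over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def clade_summary(sample, in_clade):
    """Return (frequency of the clade in the sample, MPD among clade members)."""
    clade = [s for s in sample if in_clade(s)]
    return len(clade) / len(sample), mean_pairwise_difference(clade)


# Hypothetical toy alignment: a 'T' at the first site marks the candidate clade.
sample = [
    "TAAAAA",
    "TAAAAA",
    "TAAAAC",
    "TAAAAA",
    "AGCTAA",
    "ACCATA",
]

freq, mpd_clade = clade_summary(sample, lambda s: s[0] == "T")
mpd_all = mean_pairwise_difference(sample)
print(f"clade frequency = {freq:.2f}, clade MPD = {mpd_clade:.2f}, overall MPD = {mpd_all:.2f}")
```

In this toy sample the marked clade makes up two thirds of the alleles yet has a much lower MPD than the sample as a whole, which is the pattern the text associates with a sweep. Real data would also require a judgment about how low an MPD is surprising at a given frequency.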

A second consequence of the history shown in the right panel is that much of whatever

sequence diversity due to mutation is present in the red clade occurs as singletons. Any

mutation in the history of the sample since the sweep began will most likely have a single

descendant in the sample. In the tree in the left panel, many mutations have more than one

descendant in the sample. We can evaluate the extent of “starness” in a clade by examining

the ratio of the normalized number of segregating sites, that is mutations, to the average

number of pairwise sequence differences. The familiar Tajima D statistic is a normalized

difference between these two numbers, designed to assess statistical significance.
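
A minimal sketch of the Tajima D calculation follows, using the standard constants from Tajima's 1989 formulation. It computes the difference form of the statistic rather than the ratio mentioned above. The function name and the two toy samples are assumptions made for illustration; sequences are assumed to be aligned and of equal length, with no missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.

    Returns None when the sample has no segregating sites.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (variable) sites in the sample
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return None

    # pi: mean number of pairwise sequence differences
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    ) / n_pairs

    # Normalizing constants that depend only on the sample size
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between pi and the normalized number of segregating sites,
    # divided by an estimate of its standard deviation
    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


# Hypothetical star-like sample: every mutation is a singleton.
star = ["AAAAAA", "CAAAAA", "ACAAAA", "AACAAA", "AAACAA", "AAAACA"]
# Hypothetical sample with two old, well-separated clades.
deep = ["AAAAAA", "AAAAAA", "AAAAAA", "CCCCCA", "CCCCCA", "CCCCCA"]

d_star = tajimas_d(star)
d_deep = tajimas_d(deep)
print(d_star, d_deep)
```

Both toy samples have five segregating sites, but the star-like sample gives a negative D because its mutations are singletons, while the sample with two old clades gives a positive D.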

A third consequence of a recent selective sweep is that there is linkage disequilibrium

between the favored type and neighboring parts of the chromosome. Since all the copies

share a recent common ancestor, little time has elapsed for recombination to erase the initial

disequilibrium. Scanning the genome for regions of high disequilibrium is a simple method

for detecting likely targets of recent selection.
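
Linkage disequilibrium between two sites can be quantified from phased haplotypes with the usual coefficient D and its normalized form r squared. The sketch below assumes biallelic sites and phased data; the function name and the toy haplotypes are hypothetical, and a genome scan would repeat this calculation over many pairs of sites.

```python
def ld_stats(haplotypes, i, j):
    """Return (D, r_squared) between sites i and j of phased haplotypes.

    Alleles on the first haplotype are used as the reference alleles.
    r_squared is NaN if either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    ref_i = haplotypes[0][i]
    ref_j = haplotypes[0][j]

    # Frequencies of the reference alleles and of the haplotype carrying both
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n

    # D: excess of the joint frequency over the product of allele frequencies
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical haplotypes: complete association between the two sites.
linked = ["AT", "AT", "GC", "GC"]
# Hypothetical haplotypes: all four combinations equally common.
unlinked = ["AT", "AC", "GT", "GC"]

d_linked, r2_linked = ld_stats(linked, 0, 1)
d_unlinked, r2_unlinked = ld_stats(unlinked, 0, 1)
print(d_linked, r2_linked, d_unlinked, r2_unlinked)
```

In the first sample the alleles at the two sites always occur together, so r squared takes its maximum value of 1. In the second sample the sites are independent and both D and r squared are 0.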

There is a repeat polymorphism in the human D4 dopamine receptor gene (DRD4) that is associated

with variation in personality or behavior of its bearers. Each repeat is 48 bp in length,

corresponding to a change of 16 amino acids in the length of the gene product. The common

worldwide variant has 4 repeats (4R), while the 7R variant is undergoing a sweep, as inferred

from reduced SNP diversity within the variant and high linkage disequilibrium. There

is little linkage disequilibrium around the ancestral 4R variant.

The literature suggests that carriers of 7R may be at elevated risk of childhood attention-

deficit hyperactivity disorder without the neurological impairments that often are found with

ADHD (Swanson et al., 2000). The bearers may also be more impulsive, more risk-seeking,

and less altruistic in experimental games.

The MAPT locus has two major haplotypes that have been separate for an estimated

three million years. They are distinct because one is an inversion, but maintenance of two

clades for such a long time is highly unlikely. The recent evolutionary success of the H2

clade in Europeans, where it is spreading rapidly, as well as the lack of diversity within the

clade suggest that it was a Neanderthal allele introduced into the modern human population

invading Europe (Hardy et al., 2005). This locus, when damaged, is implicated in tangle

disorders of the brain.

A similar pattern is found in both Microcephalin (MCPH1) and ASPM, related genes

that determine brain size in mammals. In each case a haplotype seemingly only distantly

related to the others at the locus is undergoing strong positive selection. The sweep in

Microcephalin appears to have started about 40,000 years ago, i.e., at the time modern

humans entered Europe. The sweep in ASPM is apparently much more recent, about 6,000

years ago.

While the effects of these variants are not yet well understood, it is striking that the first

and best-described human genes undergoing vigorous selective sweeps are genes involved

in behavior and central nervous system development. A conventional impression from an-

thropology textbooks is that modern humans appeared about 40,000 years ago and have

remained essentially unchanged since then. These examples show that evolution is ongoing

in our species, especially evolution of the brain. There is even a suggestion in the literature

that Microcephalin regulates BRCA1, one of the loci prominent in our discussion of Ashke-

nazi Jewish evolution. In other words ongoing evolution of the brain and the particular turn

it happened to take among northern European Jews are almost certainly parts of the same

story (Xu et al., 2004).

Consequences of genetic diversity

The viewpoints of an evolutionary biologist and of a health professional on genetic diversity

are quite different. While biologists are interested in differences in fitness and in ongoing evo-

lution, health professionals are interested in well-being of individuals. These two categories

may often not overlap very much.

A medical disease is some trait that impairs well-being or shortens lifespan or both.

Interestingly, the impairment of well-being may be to the bearer of the disease (tuberculosis,

cancer) or even to others (sociopathy, bad breath). A disease in the evolutionary sense is a

trait that lowers fitness. There are then medical diseases that do not impair fitness or some

that, in the case for example of sociopathy, elevate fitness. There are, conversely, disorders

in the strict evolutionary sense, like chastity, left-handedness (Aggleton et al., 1994), or male

homosexuality, that are not considered to be medical diseases. There is speculation in the

literature of human evolutionary ecology that male homosexuality has a genetic basis and

that it is maintained by an inclusive fitness effect in which males may improve the fitness of

their relatives by giving them resources. However there is no support in the data available

(Bobrow and Bailey, 2001) for such behavior. The very weak genetic influence on male

homosexuality might as easily be explained by genetic variation in resistance to a pathogen

that may cause the trait directly.

We can think of genetic burden as the net contribution of genetic diversity to disease

in either sense. This burden may be apparent, meaning that it is responsible for medical

disease, or it may be unapparent, meaning that it does not lead to a diminished quality of

human life. For example sickle cell anemia causes compromised and prematurely-terminated

lives in hundreds of thousands of people because it is expressed after birth: an apparently

healthy baby falters. Melanesian ovalocytosis is a parallel adaptation to malaria in parts

of the Pacific, and a homozygote has never been observed. While the homozygous state is

apparently lethal, it is expressed so early in development that its consequences for human

well-being, that is of the mother and the family, are small. The apparent burden of the sickle

cell polymorphism is great while the burden of ovalocytosis is mostly unapparent.

From this perspective a major goal of prenatal diagnosis and selective abortion is to

convert apparent burden to unapparent burden. The ethical problems surrounding this field

are complex, of course, but from the viewpoint of allocating burden, human intervention in

some ways mirrors evolutionary processes.

References

Aggleton, J., Bland, J., Kentridge, R., Neave, N., 1994. Handedness and longevity: archival

study of cricketers. British Medical Journal 309, 1681–1684.

Bobrow, D., Bailey, J., 2001. Is male homosexuality maintained via kin selection? Evolution

and Human Behavior 22, 361–368.

Bulmer, M., 1994. Theoretical Evolutionary Ecology. Sinauer, Sunderland, MA.

Cannon, S., Hayward, L., Beech, J., Brown, R., Jr., 1995. Sodium channel inactivation is impaired

in equine hyperkalemic periodic paralysis. J. Neurophysiol. 73, 1892–1899.

Carter, R., Mendis, K., 2002. Evolutionary and historical aspects of the burden of malaria.

Clinical Microbiology Reviews 15, 564–594.

Cavalli-Sforza, L. L., Menozzi, P., Piazza, A., 1994. The History and Geography of Human

Genes. Princeton Univ. Press, Princeton, NJ.

Cochran, G., Harpending, H., 2005. The natural history of Ashkenazi intelligence. Jour.

Biosoc. Sci. Published online.

Davis, G., 2005. Major genes affecting ovulation in sheep. Genet. Sel. Evol. 37, 511–523.

Ding, Y.-C., Chi, H.-C., Grady, D., Morishima, A., Kidd, J., Kidd, K., Flodman, P., Spence,

M., Schuck, S., Swanson, J., Zhang, Y.-P., Moyzis, R., 2002. Evidence of positive selection

acting at the human dopamine receptor D4 gene locus. Proc. Nat. Acad. Sci. USA 99,

309–314.

Edwards, A., 2003. Human genetic diversity: Lewontin’s fallacy. Bioessays pp. 798–801.

Eller, E., 1999. Population substructure and isolation by distance in three continental

regions. Amer. J. Phys. Anth. 108, 147–159.

Eswaran, V., 2002. A diffusion wave out of Africa—the mechanism of the modern human

revolution? Current Anthropology 43, 749–774.

Evans, P., Gilbert, S., Mekel-Bobrov, N., Vallender, E., Anderson, J., Vaez-Azizi, L.,

Tishkoff, S., Hudson, R., Lahn, B., 2005. Microcephalin, a gene regulating brain size,

continues to evolve adaptively in humans. Science 309, 1717–1720.

Galloway, S., McNatty, K., Cambridge, L., Laitinen, M., Juengel, J., Jokiranta, T., McLaren,

R., Luiro, K., Dodds, K., Montgomery, G., Beattie, A., Davis, G., Ritvos, O., 2000.

Mutations in an oocyte-derived growth factor gene (BMP15) cause increased ovulation

rate and infertility in a dosage-sensitive manner. Nature Genetics 25, 279–283.

Grobet, L., Martin, L., Poncelet, D., Brouwers, B., Riquet, J., Schoeberlein, A., Dunner, S.,

Menissier, F., Massaband, J., Fries, R., Hanset, R., Georges, M., 1997. A deletion in the

bovine myostatin gene causes the double-muscled phenotype in cattle. Nature Genetics

17, 71–74.

Hardy, J., Pittman, A., Myers, A., Gwinn-Hardy, K., Fung, H., de Silva, R., Hutton, M.,

Duckworth, J., 2005. Evidence suggesting that Homo neanderthalensis contributed the

H2 MAPT haplotype to Homo sapiens. Biochem. Soc. Trans. 33, 582–585.

Harpending, H., Cochran, G., 2002. In our genes. Proc. Nat. Acad. Sci. USA 99, 10–12.

Hunt, J. R., Zeng, H., 2004. Iron absorption by heterozygous carriers of the HFE C282Y

mutation associated with hemochromatosis. American Journal of Clinical Nutrition

80, 924–931.

Lewontin, R. C., 1972. The apportionment of human diversity. In: Hecht, M. (Ed.), Evolu-

tionary Biology, Appleton–Century–Crofts, New York, volume 6, pp. 381–398.

McNatty, K., Juengel, J., Wilson, T., Galloway, S., Davis, G., 2001. Genetic mutations

influencing ovulation rate in sheep. Reprod. Fertil. Dev. 13, 549–555.

McPherron, A., Lee, S.-J., 1997. Double muscling in cattle due to mutations in the myostatin

gene. Proc. Natl. Acad. Sci. U.S.A. pp. 12457–12461.

Mekel-Bobrov, N., Gilbert, S., Evans, P., Vallender, E., Anderson, J., Hudson, R., Tishkoff,

S., Lahn, B., 2005. Ongoing adaptive evolution of ASPM, a brain size determinant in

Homo sapiens. Science 309, 1720–1722.

Meyer, C., Amedofu, G., Brandner, J., Pohland, D., Timmann, C., Horstmann, R., 2002.

Selection for deafness? Nature Medicine 8, 1332–1333.

Morton, N. E., Crow, J. F., Muller, H. J., 1956. An estimate of the mutational damage in

man from data on consanguineous marriages. Proc. Nat. Acad. Sci USA 42, 855–863.

Naylor, J., 1997. Hyperkalemic periodic paralysis in quarter horses. Vet. Clin. North Am.

Equine Pract. 13, 129–144.

Online Mendelian Inheritance in Man (OMIM), AAT. Protease inhibitor 1: PI and Alpha-1-antitrypsin.

March 17, 2004, MIM +107400.

Online Mendelian Inheritance in Man (OMIM), MEFV. Familial Mediterranean fever gene. June 30,

2004, MIM *608107.

Orr, H. A., 2005. The genetic theory of adaptation: a brief history. Nature Reviews Genetics

6, 119–127.

Pier, G., Grout, M., Zaidi, T., Meluleni, G., Mueschenborn, S., Banting, G., Ratcliff, R.,

Evans, M., Colledge, W., 1998. Salmonella typhi uses CFTR to enter intestinal epithelial

cells. Nature 393, 79–82.

Risch, N., Tang, H., Katzenstein, H., Ekstein, J., 2003. Geographic distribution of disease

mutations in the Ashkenazi Jewish population supports genetic drift over selection.

American Journal of Human Genetics 72, 812–822.

Rochette, J., Pointon, J., Fisher, C., Perera, G., Arambepola, M., Aricchi, D. K., Silva, S. D.,

Vandwalle, J. L., Monti, J., Old, J., Merryweather-Clarke, A., Weatherall, D., Robson, K.,

1999. Multicentric origin of hemochromatosis gene (HFE) mutations. Amer. J. Hum.

Genet. 64, 1056–1062.

Sabeti, P., Reich, D., Higgins, J., Levine, H., Richter, D., Schaffner, S., Gabriel, S., Platko,

J., Patterson, N., McDonald, G., Ackerman, H., Campbell, S., Altshuler, D., Cooper, R.,

Kwiatkowski, D., Ward, R., Lander, E., 2002. Detecting recent positive selection in the

human genome from haplotype structure. Nature 419.

Slatkin, M., Bertorelle, G., 2001. The use of intraallelic variability for testing neutrality and

estimating population growth rate. Genetics 158, 865–874.

Slatkin, M., Rannala, B., 2000. Estimating allele age. Annual Review of Genomics and

Human Genetics 1, 225–249.

Stefansson, H., Helgason, A., Thorleifsson, G., Steinthorsdottir, V., Masson, G., Barnard, J.,

Baker, A., Jonasdottir, A., Ingason, A., Gudnadottir, V., Desnica, N., Hicks, A., Gylfason,

A., Gudbjartsson, D., Jonsdottir, G., Sainz, J., Agnarsson, K., Birgisdottir, B., Ghosh, S.,

Olafsdottir, A., Cazier, J., Kristjansson, K., Frigge, M., Thorgeirsson, T., Gulcher, J.,

Kong, A., Stefansson, K., 2005. A common inversion under selection in Europeans. Nature

Genet. 37, 129–137.

Swanson, J., Oosterlaan, J., Murias, M., Schuck, S., Flodman, P., Spence, A., Wasdell, M.,

Ding, Y., Chi, H.-C., Smith, M., Mann, M., Carlson, C., Kennedy, J., Sergeant, J., Leung,

P., Zhang, Y.-P., Sadeh, A., Chen, C., Whalen, C., Babb, K., Moyzis, R., Posner, M., 2000.

ADHD children with a 7-repeat allele of the DRD4 gene have extreme behavior but normal

performance on critical neuropsychological tests of attention. Proc. Nat. Acad. Sci. USA 97, 4574–4579.

Tishkoff, S., Varkonyi, R., Cahinhinan, N., Abbes, S., Argyropoulos, G., Destro-Bisol,

G., Drousiotou, A., Dangerfield, B., Lefranc, G., Loiselet, J., Piro, A., Stoneking, M.,

Tagarelli, A., Tagarelli, G., Touma, E., Williams, S., Clark, A., 2001. Haplotype diversity

and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial

resistance. Science 293, 455–462.

Wang, E., Ding, Y.-C., Flodman, P., Kidd, J., Kidd, K., Grady, D., Ryder, O., Spence,

M., Swanson, J., Moyzis, R., 2004. The genetic architecture of selection and the human

dopamine receptor D4 (DRD4) gene locus. Am. J. Hum. Genet. 74, 931–944.

Wendt, M., Bickhardt, K., Herzog, A., Fischer, A., Martens, H., Richter, T., 2000. Porcine

stress syndrome and PSE meat: clinical symptoms, pathogenesis, etiology and animal

rights aspects. Berl. Munch. Tierarztl. Wochenschr. 113, 117–190.

WHO, 1994. Guidelines for control of haemoglobin disorders. Technical Report

WHO/HDP/HB/GL/94.1, World Health Organization.

Xu, X., Lee, J., Stern, D., 2004. Microcephalin is a DNA damage response protein involved

in regulation of CHK1 and BRCA1. J. Biol. Chem. 279, 34091–34094.

Figure 1: Typical trees of descent of alleles at a neutral locus, in the left panel, and at a locus where a selective sweep is occurring, in the right panel.
