Classical Population Genetics Genetic Variation at one Locus with 2 Alleles Source: Theory of Population Genetics and Evolutionary Ecology , Jonathan Roughgarden, Prentice Hall, Upper Saddle River, NJ, 1996 reprint of 1979 edition, Part One, pp17-100

Upload: aubrey-leonard

Post on 13-Dec-2015


Page 1:

Classical Population Genetics

Genetic Variation at one Locus with 2 Alleles

Source: Theory of Population Genetics and Evolutionary Ecology, Jonathan Roughgarden, Prentice Hall, Upper Saddle River, NJ, 1996 reprint of 1979 edition, Part One, pp17-100

Page 2:

Consider a population with two Alleles, A and a.

Possible Genotypes: AA, Aa, and aa

Suppose that we have a population of size $N$ (usually a large number).

Distribution of genotypes:

$N_{AA}$ = number of AA homozygotes

$N_{Aa}$ = number of heterozygotes

$N_{aa}$ = number of aa homozygotes

$N = N_{AA} + N_{Aa} + N_{aa}$

Page 3:

Two important frequencies for us to consider

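Genotype frequencies:

$$D = \frac{N_{AA}}{N}, \qquad H = \frac{N_{Aa}}{N}, \qquad R = \frac{N_{aa}}{N}$$

Gene frequencies:

$$p = \frac{2N_{AA} + N_{Aa}}{2N}, \qquad q = \frac{N_{Aa} + 2N_{aa}}{2N}$$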

These are important relationships; be sure that you understand them.
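A minimal Python sketch of these two sets of frequencies. The function names and the genotype counts are hypothetical, chosen only for illustration; exact fractions are used so the relationships can be checked without rounding.

```python
from fractions import Fraction

def genotype_frequencies(n_AA, n_Aa, n_aa):
    """Return (D, H, R): the frequencies of genotypes AA, Aa and aa."""
    n = n_AA + n_Aa + n_aa
    return Fraction(n_AA, n), Fraction(n_Aa, n), Fraction(n_aa, n)

def gene_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): the frequencies of alleles A and a.

    Every individual carries two gene copies, so there are 2N copies in all.
    """
    n = n_AA + n_Aa + n_aa
    p = Fraction(2 * n_AA + n_Aa, 2 * n)
    q = Fraction(n_Aa + 2 * n_aa, 2 * n)
    return p, q

# Hypothetical counts (not from the slides): 50 AA, 30 Aa, 20 aa.
D, H, R = genotype_frequencies(50, 30, 20)
p, q = gene_frequencies(50, 30, 20)
print(D, H, R)  # 1/2 3/10 1/5
print(p, q)     # 13/20 7/20
```

Note that the two sets of frequencies are linked: p equals D + H/2 and q equals R + H/2, so p + q = 1.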

Page 4:

Hardy – Weinberg Law:

If we assume no external forces or processes, within one generation,

$D \to p^2$, $H \to 2pq$, $R \to q^2$

and these frequencies remain stable for all future generations.

Page 5:

What assumptions are being made:

1. Individuals of different genotypes do not differ in fertility.

2. Random union of gametes.

Page 6:

3. All individuals, regardless of genotype, have an equal likelihood of survival from gamete to adulthood.

Page 7:

An example to illustrate what is being said by the law:

Suppose an aquarium owner purchases a variety of fish with two alleles that determine their fin color.

A = red fin

a = blue fin

In the shipment the owner receives, 75% of the fish have red fins, 25% have blue fins, and none have purple fins. What will be the eventual distribution of fin colors in the aquarium?

After one generation:

Note: Aa = purple fin

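$$p_0 = \frac{3}{4}, \qquad q_0 = \frac{1}{4}$$

$$D = p_0^2 = \frac{9}{16}, \qquad H = 2p_0q_0 = \frac{6}{16} = \frac{3}{8}, \qquad R = q_0^2 = \frac{1}{16}$$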
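The aquarium arithmetic can be reproduced with a short Python sketch. It assumes, as the example does, that the shipment contains no heterozygotes, so the starting frequency of A is simply the fraction of red-finned fish; the function name `hardy_weinberg` is a hypothetical helper.

```python
from fractions import Fraction

def hardy_weinberg(p):
    """Return (D, H, R) after one generation of random union of gametes."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

# 75% AA (red), 0% Aa (purple), 25% aa (blue), so p0 = 3/4 + 0/2 = 3/4.
p0 = Fraction(3, 4)
D, H, R = hardy_weinberg(p0)
print(D, H, R)  # 9/16 3/8 1/16
```

So the aquarium eventually holds 9/16 red, 3/8 purple, and 1/16 blue fish, provided the assumptions of the law hold.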

Page 8:

Proof of the Law:

Because of random union of gametes:

$\mathrm{prob}(AA) = p \cdot p = p^2$

$\mathrm{prob}(Aa) = \mathrm{prob}(aA) = p \cdot q$

or $\mathrm{prob}(\text{heterozygote}) = 2pq$

$\mathrm{prob}(aa) = q \cdot q = q^2$

Note: gamete frequencies at start are $p$ and $q$.*

At this point we use the third assumption: zygotes of every genotype are equally likely to survive to the adult stage and produce gametes for the next generation, so the genotype frequencies among the adults equal those among the zygotes.

Thus, $D = p^2$, $H = 2pq$, $R = q^2$.

* Gametes are haploid, so any information about the genotype distribution of the previous diploid population is lost.
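One step left implicit here is why these frequencies then remain stable. A short check, using the relation $p = D + H/2$ that follows from the definitions of the gene frequencies: the frequency of A in the gamete pool produced by the new adults is

$$p' = D + \tfrac{1}{2}H = p^2 + pq = p\,(p + q) = p.$$

The next generation is therefore again formed from gametes with frequencies $p$ and $q$, and it has the same genotype frequencies $p^2$, $2pq$, $q^2$.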

Page 9:

What is missing?

1. Natural selection

2. Differential fertility and/or survival

3. Mutation

4. Immigration from other populations

5. Genetic drift

Page 10:

Are any assumptions unnecessary?

1. Random mating also produces the same results. It is just slightly more complex to show than the random-union case.

2. The requirement of distinct generations is not necessary. However, this assumption makes the algebra easier.

3. If the distribution of genotypes differs between the sexes, the stable position does not emerge until two generations have passed (assuming that all other assumptions hold, in particular the survival one).

Page 11:

Enter Natural Selection:

Consider

Survival rates: $l_{AA}$, $l_{Aa}$, $l_{aa}$

Fertility rates: $m_{AA}$, $m_{Aa}$, $m_{aa}$

Let $W_{AA} = l_{AA}\, m_{AA}$, $W_{Aa} = l_{Aa}\, m_{Aa}$, $W_{aa} = l_{aa}\, m_{aa}$.

Page 12:

Go back to slides 2 and 3, and we can derive the number of gametes in the population at time $t + 1$:

# from AA adults = $2\, W_{AA}\, p_t^2\, N_t$

# from Aa adults = $2\, W_{Aa} \cdot 2 p_t q_t\, N_t$

# from aa adults = $2\, W_{aa}\, q_t^2\, N_t$

The total population size at time $t+1$ is one half the sum of these three quantities.

$$N_{t+1} = (W_{AA}\, p_t^2 + 2\, W_{Aa}\, p_t q_t + W_{aa}\, q_t^2)\, N_t$$

An equation such as this is called a difference equation.

Page 13:

This is an example of a “fast” evolutionary change (< 40 years). It was caused by industrial pollution in the area of Birmingham, England. Before the pollution, most of these moths had a light coloration that was difficult to see against the lichen on the trees growing in the area. After the pollution, the lichen died and the bark became black, so the light-colored insects became easy prey. “Selection pressure” therefore favored the dark-colored moths.

Page 14:

The difference equation for the population size leads to these two absolutely essential difference equations for the gene frequencies:

$$p_{t+1} = \frac{(p_t W_{AA} + q_t W_{Aa})\, p_t}{p_t^2 W_{AA} + 2 p_t q_t W_{Aa} + q_t^2 W_{aa}}$$

$$q_{t+1} = 1 - p_{t+1}$$
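The step from the gamete counts to the equation for $p_{t+1}$ can be sketched as follows. This is a reconstruction from the counts given earlier, assuming that every gamete from an AA adult carries A and that half of the gametes from an Aa adult carry A:

$$\text{A gametes at } t+1 \;=\; 2 W_{AA}\, p_t^2 N_t + \tfrac{1}{2} \cdot 2 W_{Aa} \cdot 2 p_t q_t N_t \;=\; 2 N_t\, p_t\, (p_t W_{AA} + q_t W_{Aa})$$

$$\text{all gametes at } t+1 \;=\; 2 N_{t+1} \;=\; 2 N_t\, (p_t^2 W_{AA} + 2 p_t q_t W_{Aa} + q_t^2 W_{aa})$$

Dividing the first quantity by the second, the factor $2N_t$ cancels and what remains is the frequency of A among the gametes that form generation $t+1$.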

So what?

These equations coupled with the difference equation for the population size allow us to assign different fertility and survival rates to the existing three genotypes and model how the gene pool and population size change as a result.
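As a rough illustration, one generation of this coupled model can be written as a single Python function. The name `step` and the starting values (p = 1/2, N = 1000) are assumptions made for the sketch; the W values are the ones used in the worked example on slide 16.

```python
from fractions import Fraction

def step(p, n, W_AA, W_Aa, W_aa):
    """Advance the allele frequency p and population size n by one generation."""
    q = 1 - p
    # Mean absolute fitness: the factor multiplying N_t in the size equation.
    w_bar = W_AA * p * p + 2 * W_Aa * p * q + W_aa * q * q
    n_next = w_bar * n
    p_next = p * (p * W_AA + q * W_Aa) / w_bar
    return p_next, n_next

# Hypothetical start: p = 1/2, N = 1000, with W_AA = 75, W_Aa = 25, W_aa = 5.
p1, n1 = step(Fraction(1, 2), 1000, 75, 25, 5)
print(p1, n1)  # 10/13 32500
```

Calling `step` repeatedly, feeding each result back in, traces how the gene pool and the population size change together.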

Question: Is this absolutely the way things will turn out?

Page 15:

One last notational adjustment to make matters a little simpler.

We eliminate the preponderance of W’s from the equation by multiplying all three by a suitable constant. This leaves the equation for $p_{t+1}$ unchanged, because the constant cancels between numerator and denominator. We “normalize” by choosing one of the W’s, say $W_{AA}$, to become 1, which means dividing all three W’s by $W_{AA}$. Thus,

$w_{AA} = W_{AA}/W_{AA} = 1$

$w_{Aa} = W_{Aa}/W_{AA}$

$w_{aa} = W_{aa}/W_{AA}$

Note that we denote these normalized values with a small, italicized w.

Page 16:

And, FINALLY, we define the selectivity coefficients:

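$s_{AA} = 1 - w_{AA}$

$s_{Aa} = 1 - w_{Aa}$

$s_{aa} = 1 - w_{aa}$

Notice that, in general, these coefficients measure selection against a genotype. A value of 0 means there is no selection against that genotype, and a positive value means that the genotype contributes proportionally less to the gene pool.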

Example:

$m_{AA} = 100$, $m_{Aa} = 50$, $m_{aa} = 25$

$l_{AA} = 3/4$, $l_{Aa} = 1/2$, $l_{aa} = 1/5$

Then,

$W_{AA} = (100)(3/4) = 75$, $W_{Aa} = (50)(1/2) = 25$, $W_{aa} = (25)(1/5) = 5$

$w_{AA} = 75/75 = 1$, $w_{Aa} = 25/75 = 1/3$, $w_{aa} = 5/75 = 1/15$

$s_{AA} = 0$, $s_{Aa} = 2/3$, $s_{aa} = 14/15$
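The same chain of definitions, from fertility and survival rates to the selectivity coefficients, can be checked with a few lines of Python. The dictionary names are hypothetical; the numbers are those of the example above.

```python
from fractions import Fraction

fertility = {"AA": 100, "Aa": 50, "aa": 25}                             # m
survival = {"AA": Fraction(3, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 5)}  # l

# Absolute fitness W = l * m for each genotype.
W = {g: survival[g] * fertility[g] for g in fertility}
# Relative fitness w, normalized so that w_AA = 1.
w = {g: W[g] / W["AA"] for g in W}
# Selectivity coefficient s = 1 - w.
s = {g: 1 - w[g] for g in w}

print(W["AA"], W["Aa"], W["aa"])  # 75 25 5
print(w["AA"], w["Aa"], w["aa"])  # 1 1/3 1/15
print(s["AA"], s["Aa"], s["aa"])  # 0 2/3 14/15
```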

Page 17:

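With all of these substitutions we finally have an expression for $p_{t+1}$ that is “manageable”:

$$p_{t+1} = \frac{(p_t + q_t w_{Aa})\, p_t}{p_t^2 + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}}$$

or

$$p_{t+1} = \frac{p_t\, (1 - s_{Aa} q_t)}{1 - q_t\, (2 s_{Aa} p_t + s_{aa} q_t)}$$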

The simulations that follow all used the first form of the difference equation.

We will consider:

1. Selection against a dominant allele

2. Selection against a recessive allele

3. Heterozygote superiority

Page 18:

HWApprox(p, wdd, wdr, wrr, n, q, i, pp, pn, qp, qn, hw) ≔
  Prog
    q ≔ 1 - p
    i ≔ 0
    pp ≔ p
    qp ≔ q
    pn ≔ p
    qn ≔ q
    hw ≔ []
    Loop
      If i > n
         RETURN hw
      hw ≔ APPEND(hw, [[i, pn]])
      pn ≔ dp(pp, qp, wdd, wdr, wrr)
      qn ≔ 1 - pn
      pp ≔ pn
      qp ≔ qn
      i ≔ i + 1

dp(p, q, wdd, wdr, wrr) ≔ p·(p·wdd + q·wdr) / (p^2·wdd + 2·p·q·wdr + q^2·wrr)

Writing a program to implement this model is quite straightforward. This program is written in the functional programming language of the Derive® Computer Algebra System.
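For readers without Derive, here is a rough Python equivalent. It is a sketch that mirrors the logic of the Derive program rather than a line-by-line port: the names `dp` and the fitness arguments are kept, `hw_approx` is a Python-style renaming of HWApprox, and the helper variables of the Derive version are dropped.

```python
def dp(p, q, wdd, wdr, wrr):
    """One generation of selection: the next frequency of the dominant allele."""
    return p * (p * wdd + q * wdr) / (p * p * wdd + 2 * p * q * wdr + q * q * wrr)

def hw_approx(p, wdd, wdr, wrr, n):
    """Return [(generation, p)] for generations 0 through n."""
    history = []
    for i in range(n + 1):
        history.append((i, p))            # record before advancing, as in Derive
        p = dp(p, 1 - p, wdd, wdr, wrr)
    return history

# Selection against the dominant allele, with the parameters of the next slide.
path = hw_approx(0.9, 0.8, 0.8, 1.0, 150)
for generation, freq in path[::25]:
    print(generation, round(freq, 4))
```

The returned list of (generation, frequency) pairs can be passed to any plotting tool to draw curves like those on the following slides.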

Page 19:

Selection Against the Dominant Allele

$p_0 = 0.9$, $w_{AA} = 0.8$, $w_{Aa} = 0.8$, $w_{aa} = 1$

Note that even though the recessive allele initially made up only 10% of the gene pool, in approximately 70 generations it makes up essentially the entire gene pool.

Page 20:

Selection Against the Recessive Allele

$p_0 = 0.1$, $w_{AA} = 1$, $w_{Aa} = 1$, $w_{aa} = 0.8$

The end result is expected, but there is a qualitative difference. In the former case the decline of the majority gene started slowly and then accelerated. Here the initial decline is rapid and then the rate slows down.

Page 21:

Selection in Favor of Heterozygote

Selection against the recessive is four times that against dominant

$p_0 = 0.9$ and $0.5$; $w_{AA} = 0.9$, $w_{Aa} = 1$, $w_{aa} = 0.6$

Note that in each of the cases (in fact, in all cases except $p_0 = 0$ or $1$) the dominant allele will eventually make up 80% of the gene pool and the recessive will make up 20%. This result is called a stable equilibrium.

Page 22:

Finally a Highly Unusual Result

Selection against the Heterozygote

$p_0 = 0.55$, $0.5$, and $0.45$; $w_{AA} = 1$, $w_{Aa} = 0.8$, $w_{aa} = 1$

Notice that if both alleles start out with 50% of the gene pool, then that percentage will persist. However, if the percentage wanders off 50%, the majority allele will become the entire gene pool and the other will become extinct. Thus 50% is called an unstable equilibrium.

Page 23:

Of the four scenarios that we considered, three resulted in the elimination of one of the alleles. Only the case of selection in favor of the heterozygote resulted in a mixed gene pool.

Thus, in the presence of natural selection (we will see later in this lecture what a powerful force this can be), this is the only case where genetic variation is maintained. This maintained variation is called polymorphism.

The other cases fix on one or the other of the alleles.
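The four outcomes can be reproduced with a short, self-contained Python sketch. The function names and the choice of 2000 generations are assumptions made for illustration; the starting frequencies and fitness values are those given on the four simulation slides.

```python
def next_p(p, w_AA, w_Aa, w_aa):
    """One generation of selection on the frequency p of allele A."""
    q = 1 - p
    return p * (p * w_AA + q * w_Aa) / (p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa)

def run(p, w_AA, w_Aa, w_aa, generations):
    """Iterate the difference equation and return the final frequency of A."""
    for _ in range(generations):
        p = next_p(p, w_AA, w_Aa, w_aa)
    return p

# (p0, w_AA, w_Aa, w_aa) for each scenario.
scenarios = {
    "against dominant": (0.9, 0.8, 0.8, 1.0),
    "against recessive": (0.1, 1.0, 1.0, 0.8),
    "heterozygote favored": (0.9, 0.9, 1.0, 0.6),
    "heterozygote disfavored": (0.55, 1.0, 0.8, 1.0),
}
results = {name: run(*args, generations=2000) for name, args in scenarios.items()}
for name, p_final in results.items():
    print(f"{name}: p = {p_final:.4f}")
```

Only the heterozygote-favored run settles at an interior value; the others head toward p = 0 or p = 1. Selection against the recessive approaches p = 1 slowly, because the rare recessive allele is mostly hidden in heterozygotes.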

Page 24:

Selection in Favor of Heterozygote

Selection against the recessive is four times that against dominant

$p_0 = 0.9$ and $0.5$; $w_{AA} = 0.9$, $w_{Aa} = 1$, $w_{aa} = 0.6$

Note that in each of the cases (in fact, in all cases except $p_0 = 0$ or $1$) the dominant allele will eventually make up 80% of the gene pool and the recessive will make up 20%. This result is called a stable equilibrium. Can we determine what this equilibrium will be?

Page 25:

More notation (Mathematicians love it!!)

$\hat{p}$ = equilibrium value for the frequency of A alleles

What do we mean by equilibrium?

When equilibrium is achieved then the frequency of the alleles stays stable.

$p_{t+1} = p_t$ for all $t >$ some $t_0$

And of course,

$q_{t+1} = 1 - p_{t+1} = 1 - p_t = q_t$

On the previous slide this happens around generation 50. So, $t_0 \approx 50$.

Let’s see if we can predict $\hat{p}$. Recall, $\hat{p} \neq 0, 1$.

Page 26:

We start with the definition of equilibrium:

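$p_{t+1} = p_t$

Earlier we saw that in the presence of natural selection,

$$p_{t+1} = \frac{(p_t w_{AA} + q_t w_{Aa})\, p_t}{p_t^2 w_{AA} + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}}$$

Setting $p_{t+1} = p_t$ and dividing both sides by $p_t$ (allowed since $p_t \neq 0$), this means that

$$p_t w_{AA} + q_t w_{Aa} = p_t^2 w_{AA} + 2 p_t q_t w_{Aa} + q_t^2 w_{aa}$$

for all $t > t_0$. Or, at the equilibrium value,

$$\hat{p}\, w_{AA} + (1 - \hat{p})\, w_{Aa} = \hat{p}^2 w_{AA} + 2 \hat{p}\, (1 - \hat{p})\, w_{Aa} + (1 - \hat{p})^2 w_{aa}$$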

Page 27:

Some simple, but messy, algebra gives us the following result.

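$$\hat{p} = \frac{w_{Aa} - w_{aa}}{(w_{Aa} - w_{AA}) + (w_{Aa} - w_{aa})}$$

Or, with the fitnesses normalized so that $w_{Aa} = 1$ (so that $s_{AA} = 1 - w_{AA}$ and $s_{aa} = 1 - w_{aa}$),

$$\hat{p} = \frac{s_{aa}}{s_{AA} + s_{aa}}$$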
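One way to carry out the "simple, but messy" algebra, starting from the equilibrium condition and writing $\hat{q} = 1 - \hat{p}$ (a sketch that assumes $\hat{q} \neq 0$):

$$\hat{p}\, w_{AA} + \hat{q}\, w_{Aa} = \hat{p}^2 w_{AA} + 2 \hat{p} \hat{q}\, w_{Aa} + \hat{q}^2 w_{aa}$$

Move everything to the left and use $\hat{p} - \hat{p}^2 = \hat{p}\hat{q}$:

$$\hat{p}\hat{q}\, w_{AA} + \hat{q}\,(1 - 2\hat{p})\, w_{Aa} - \hat{q}^2 w_{aa} = 0$$

Divide by $\hat{q}$ and use $1 - 2\hat{p} = \hat{q} - \hat{p}$:

$$\hat{q}\,(w_{Aa} - w_{aa}) = \hat{p}\,(w_{Aa} - w_{AA})$$

Substituting $\hat{q} = 1 - \hat{p}$ and solving for $\hat{p}$ gives

$$\hat{p} = \frac{w_{Aa} - w_{aa}}{(w_{Aa} - w_{AA}) + (w_{Aa} - w_{aa})}.$$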

In our example:

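$w_{AA} = 0.9$, $w_{Aa} = 1$, $w_{aa} = 0.6$

So,

$$\hat{p} = \frac{1 - 0.6}{(1 - 0.9) + (1 - 0.6)} = \frac{0.4}{0.1 + 0.4} = \frac{0.4}{0.5} = 0.8$$

$$\hat{q} = 1 - \hat{p} = 1 - 0.8 = 0.2$$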
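The prediction can also be checked numerically. The sketch below, with hypothetical function names, computes the equilibrium from the formula using exact fractions and then confirms that one more generation of selection leaves it unchanged.

```python
from fractions import Fraction

def p_hat(w_AA, w_Aa, w_aa):
    """Predicted interior equilibrium frequency of allele A."""
    return (w_Aa - w_aa) / ((w_Aa - w_AA) + (w_Aa - w_aa))

def next_p(p, w_AA, w_Aa, w_aa):
    """One generation of selection on the frequency p of allele A."""
    q = 1 - p
    return p * (p * w_AA + q * w_Aa) / (p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa)

fitness = (Fraction(9, 10), Fraction(1), Fraction(3, 5))  # w_AA, w_Aa, w_aa
equilibrium = p_hat(*fitness)
print(equilibrium)                     # 4/5
print(next_p(equilibrium, *fitness))   # 4/5
```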

Page 28:

Experimental evidence:

ST and CH are names of blocks of genes in Drosophila pseudoobscura. Because of a chromosomal feature called inversion, the genes in each block are held together and function as two alleles at a single locus.

Solid line: simulated path for p

Dashed lines: 95% confidence limits

Vertical bars: experimental data

Results correctly predicted the equilibrium and the dynamics of the approach to equilibrium.

Page 29:

But, what about mutation?

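Ordinarily it works this way: A mutates to a at rate $u$, and a mutates back to A at rate $v$.

$$A \underset{v}{\overset{u}{\rightleftharpoons}} a$$

We are going to “stack the deck” in favor of mutation and assume

$$A \overset{u}{\longrightarrow} a$$

i.e. we assume $v = 0$.

In the absence of any selection our difference equation becomes

$p_{t+1} = (1 - u)\, p_t$

This is just the difference equation for exponential decay.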

Page 30:

Look at the time axis! This process is much slower than our simulations of natural selection, which took anywhere from 1 generation (pure Hardy-Weinberg) to about 15,000 generations to drop from p = .9 to p = .1.

Page 31:

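To calculate the predicted time to move from $p_0$ to $p_t$, begin with:

$$\begin{aligned}
p_1 &= (1-u)\, p_0 \\
p_2 &= (1-u)\, p_1 = (1-u)^2 p_0 \\
p_3 &= (1-u)\, p_2 = (1-u)^3 p_0 \\
&\;\;\vdots \\
p_t &= (1-u)\, p_{t-1} = (1-u)^t p_0
\end{aligned}$$

Rearrange the bottom line as:

$$\frac{p_t}{p_0} = (1-u)^t$$

Take the log of both sides and solve for $t$. This yields

$$\log\!\left(\frac{p_t}{p_0}\right) = t \log(1-u) \quad\Longrightarrow\quad t = \frac{\log(p_t/p_0)}{\log(1-u)}$$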

Page 32:

Let’s calculate the time to move from $p_0 = 0.9$ to $p_t = 0.1$ for the first curve on the graph shown two slides previously, i.e. $u = 10^{-5} = 0.00001$.

$$t = \frac{\log(0.1/0.9)}{\log(1 - 10^{-5})} = \frac{\log(0.11111)}{\log(0.99999)} = 219{,}721 \text{ generations}$$

Mathematical note: Since this quantity is the quotient of two logarithms, logarithms to any base give the same numerical result; i.e., we can use either the log10 or the ln button on our calculator, or even log2 if we care to.
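The same calculation in Python, as a small sketch. The function name is hypothetical; `math.log1p(-u)` is used for log(1 − u) only because it is a little more accurate when u is tiny, and plain `math.log(1 - u)` gives the same answer to the precision shown.

```python
import math

def generations_to_reach(p0, pt, u):
    """Generations for mutation alone (rate u, A to a) to move p from p0 to pt."""
    return math.log(pt / p0) / math.log1p(-u)

t = generations_to_reach(0.9, 0.1, 1e-5)
print(round(t))  # 219721

# Cross-check with the closed form p_t = (1 - u)^t * p_0.
print(0.9 * (1 - 1e-5) ** t)
```

The cross-check should print a value very close to 0.1.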

Extra Credit Project: Use a spreadsheet or write a computer program to generate the graphs that were shown two slides previously.

Page 33:

In general, mutation has little effect if selection is at work.

If selection is virtually neutral, say s < .001, then mutation can have an effect, but it is slow.

However, recurrent mutation cannot be totally disregarded.

• Recurrent mutation tends to maintain a supply of genetic variation for selection to act upon.

• Even if selection is tending to eliminate one allele, recurrent mutation tends to maintain its presence in the gene pool. Thus, if the environment changes to a situation that is more favorable to the allele that was being selected against, that allele is still available.

• Mutation is the ultimate source of genetic variation.

Page 34:

Sometimes mutation may oppose selection.

Suppose selection is against A

$w_{AA} = 1 - s$, $w_{Aa} = 1 - s$ (A is dominant), $w_{aa} = 1$

However, also assume $v > 0$, i.e. there is recurrent mutation of a to A at a rate $v$. Then, it can be shown that

$$\hat{p} \approx \frac{v}{s}$$

On the other hand, if A is recessive,

$w_{AA} = 1 - s$, $w_{Aa} = 1$, $w_{aa} = 1$

we have

$$\hat{p} \approx \sqrt{\frac{v}{s}}$$

If A is recessive, mutation maintains a much higher frequency than if it is dominant.
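A numerical sketch of this balance between mutation and selection. Several things here are assumptions made for illustration: the values s = 0.1 and v = 10⁻⁵, the function names, and the ordering within a generation (selection first, then mutation of a to A).

```python
import math

def next_p(p, w_AA, w_Aa, w_aa, v):
    """One generation: selection on p, then mutation of a to A at rate v."""
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    p_selected = p * (p * w_AA + q * w_Aa) / w_bar
    return p_selected + v * (1 - p_selected)

def equilibrium(w_AA, w_Aa, w_aa, v, p=0.5, generations=50_000):
    """Iterate long enough for p to settle near its equilibrium value."""
    for _ in range(generations):
        p = next_p(p, w_AA, w_Aa, w_aa, v)
    return p

s, v = 0.1, 1e-5  # hypothetical selection coefficient and mutation rate
p_dominant = equilibrium(1 - s, 1 - s, 1.0, v)   # A dominant, selected against
p_recessive = equilibrium(1 - s, 1.0, 1.0, v)    # A recessive, selected against

print(p_dominant, v / s)
print(p_recessive, math.sqrt(v / s))
```

With these values the recessive case settles near 0.01, about a hundred times the frequency maintained in the dominant case (near 0.0001).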

Page 35:

Genetic Drift

So far every model we have considered has been a deterministic model, i.e. everything is set in motion on a predetermined path. Chance has been ignored.

But, chance does play a role!

• In the sea urchin model, gametes can wash out to sea.

• Some types of individual may produce more offspring than others.

• Survival rates may vary.

A theory involving chance is called a stochastic theory.

Instead of getting a single number, we get a probability distribution over several states.

Page 36:

Two sources for chance occurrences:

1. Changing environment

2. Internal to the population: they would occur even in a fixed environment.

“Genetic drift” refers to all chance events internal to the population.

Example:

Suppose we start with a large population and p = ½ .

From the gamete pool draw 4 gametes (a small sample).

The sample could be 2 & 2 relative to the alleles.

It could also be 3 & 1, or 1 & 3, or 0 & 4, or 4 & 0.

Suppose 3 & 1 is the distribution in our sample; then p has moved from ½ to ¾ without any selective pressures. This is called “sampling error.”

NOTE: Sampling error becomes more pronounced as the population size decreases.
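How likely is each of these samples? Assuming the four gametes are drawn independently from a very large pool, the number of A alleles in the sample follows a binomial distribution. A short sketch, with a hypothetical function name:

```python
from fractions import Fraction
from math import comb

def sample_distribution(n, p):
    """P(k copies of A among n gametes drawn at random), for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

dist = sample_distribution(4, Fraction(1, 2))
for k, prob in enumerate(dist):
    print(f"{k} A & {4 - k} a: {prob}")
# 0 A & 4 a: 1/16
# 1 A & 3 a: 1/4
# 2 A & 2 a: 3/8
# 3 A & 1 a: 1/4
# 4 A & 0 a: 1/16
```

So the sample keeps p at exactly ½ only 3/8 of the time; in the remaining 5/8 of cases the allele frequency changes by chance alone.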

Page 37:

Experimental evidence of Genetic Drift

Kerr and Wright (1954) sampled a population of Drosophila melanogaster heterozygotes. They constructed 96 groups of 4 males and 4 females. In each generation they randomly selected 4 males and 4 females from each group’s offspring to be the parents of the next generation, and so on. The following are their data.

Page 38:

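Note the “U” shape of the later histograms of the frequency distributions. This shape is characteristic of genetic drift in small populations: over time, more and more of the groups become fixed for one allele or the other.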
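A simulation can suggest where the “U” shape comes from. The sketch below is an idealized sampling model, not a reconstruction of the actual experiment: it assumes each of the 96 groups carries 2 × 8 = 16 gene copies, starts at p = ½, and forms each new generation by drawing 16 copies at random from its own gene pool. The function name, the seed, and the number of generations are all arbitrary choices.

```python
import random
from collections import Counter

def simulate_drift(num_groups=96, copies=16, generations=20, seed=0):
    """Return the number of A copies in each group after the given generations."""
    rng = random.Random(seed)
    counts = [copies // 2] * num_groups  # every group starts at p = 1/2
    for _ in range(generations):
        # Each new copy is A with probability equal to the group's current p.
        counts = [
            sum(rng.random() < c / copies for _ in range(copies))
            for c in counts
        ]
    return counts

histogram = Counter(simulate_drift(generations=20))
for k in range(17):
    print(f"{k:2d} copies of A: {histogram.get(k, 0)} groups")
```

Once a group reaches 0 or 16 copies it can never leave, so groups accumulate at the two ends of the histogram as the generations pass.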