basic principles of population genetics lecture 4

Basic Principles of Population Genetics, Lecture 4. This slide show follows closely Chapter 1 of Lange's book. Prepared by Dan Geiger. Background Readings: Chapter 1, Mathematical and Statistical Methods for Genetic Analysis, 1997, Kenneth Lange.

Upload: rhonda-gates

Post on 02-Jan-2016


Page 1: Basic Principles of Population Genetics Lecture 4


Basic Principles of Population Genetics

Lecture 4

This slide show follows closely Chapter 1 of Lange's book. Prepared by Dan Geiger.

Background Readings: Chapter 1, Mathematical and Statistical Methods for Genetic Analysis, 1997, Kenneth Lange.

Page 2: Basic Principles of Population Genetics Lecture 4

2

Founders’ allele frequency

[Figure: pedigree with founders 1 and 2 and individual 3, with two-locus genotypes A1/A2, B1/B2 for individual 1, A'1/A'2, B'1/B'2 for individual 2, and A''1/A''2, B''1/B''2 for individual 3.]

In order to write down the likelihood function of the data given a pedigree structure and a recombination value $\theta$, one needs to specify the probability of the possible genotypes of each founder. Assuming random mating, we have

$$\Pr(G_1, G_2) = \Pr(A_1/A_2,\, B_1/B_2)\,\Pr(A'_1/A'_2,\, B'_1/B'_2).$$

The likelihood function also consists of transmission matrices, which depend on $\theta$, and penetrance matrices, to be discussed later.

Page 3: Basic Principles of Population Genetics Lecture 4

3

Hardy-Weinberg and Linkage Equilibria

The task at hand is to establish a theoretical basis for specifying the probability $\Pr(A_1/A_2,\, B_1/B_2)$ of a multilocus genotype from allele frequencies. We will derive, under various assumptions, the following two rules, which are widely used in genetic analysis (linkage and association) and which ease computations a great deal. Of course, the assumptions are not satisfied in all genetic analyses.

Hardy-Weinberg (HW) Equilibrium: $\Pr(A_1/A_2) = P_{A_1} \cdot P_{A_2}$, namely, the probability of an ordered genotype $A_1/A_2$ is the product of the frequencies of the alleles constituting that genotype.

Linkage Equilibrium: $\Pr(A_1 B_1) = P_{A_1} \cdot P_{B_1}$, namely, the probability of a haplotype $A_1, B_1$ is the product of the frequencies of the alleles constituting that haplotype.

[Figure: the two haplotypes $A_1 B_1$ and $A_2 B_2$.]

These rules imply: $\Pr(A_1/A_2,\, B_1/B_2) = P_{A_1} \cdot P_{A_2} \cdot P_{B_1} \cdot P_{B_2}$
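The product rule can be illustrated with a small sketch. The function name `multilocus_genotype_prob`, the dictionary layout, and the allele frequencies below are assumptions chosen for illustration; they are not taken from the slides.

```python
def multilocus_genotype_prob(allele_freqs, genotype):
    """Probability of an ordered multilocus genotype, assuming both
    Hardy-Weinberg equilibrium and linkage equilibrium."""
    prob = 1.0
    for locus, (first, second) in genotype.items():
        # HW: an ordered single-locus genotype is a product of allele frequencies.
        # Linkage equilibrium: the loci then multiply independently.
        prob *= allele_freqs[locus][first] * allele_freqs[locus][second]
    return prob


# Illustrative (made-up) allele frequencies for two loci A and B.
freqs = {"A": {"A1": 0.3, "A2": 0.7}, "B": {"B1": 0.6, "B2": 0.4}}
genotype = {"A": ("A1", "A2"), "B": ("B1", "B2")}
prob = multilocus_genotype_prob(freqs, genotype)
print(prob)
```

Here `prob` is the product of the four allele frequencies, as in the rule above.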

Page 4: Basic Principles of Population Genetics Lecture 4

4

A simple setup to study HW equilibrium

Consider a bi-allelic locus A with alleles $A_1$, $A_2$.

Let $u$, $v$, and $w$ be the frequencies of the unordered genotypes $A_1/A_1$, $A_1/A_2$, $A_2/A_2$. Clearly, $u+v+w=1$.

How are these frequencies related to the allele frequencies $p_1$ and $p_2$ of $A_1$ and $A_2$, respectively? Answer: $p_1 = u + \tfrac12 v$ and $p_2 = \tfrac12 v + w$.

But the Hardy-Weinberg equilibrium states that also

$$u = p_1^2, \qquad v = 2p_1p_2, \qquad w = p_2^2, \qquad u+v+w = (p_1+p_2)^2 = 1.$$

(The factor 2 appears because $A_1/A_2$ genotypes are not ordered.)

Clearly these relations do not hold for arbitrary frequencies $u, v, w$; they hold only for values in the image of this polynomial mapping.
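As a rough illustration of these relations, the sketch below computes allele frequencies from genotype frequencies and checks whether a triple $(u, v, w)$ lies in the image of the HW mapping. The function names and the tolerance are assumptions made for this example.

```python
def allele_freqs(u, v, w):
    """Allele frequencies (p1, p2) from unordered genotype frequencies."""
    return u + 0.5 * v, 0.5 * v + w


def hw_genotype_freqs(p1, p2):
    """Genotype frequencies (u, v, w) predicted by Hardy-Weinberg."""
    return p1 ** 2, 2 * p1 * p2, p2 ** 2


def is_hw(u, v, w, tol=1e-9):
    """True if (u, v, w) matches the HW prediction from its own allele frequencies."""
    expected = hw_genotype_freqs(*allele_freqs(u, v, w))
    return all(abs(x - y) <= tol for x, y in zip((u, v, w), expected))


print(allele_freqs(0.5, 0.2, 0.3))
print(is_hw(0.5, 0.2, 0.3))     # arbitrary frequencies
print(is_hw(0.36, 0.48, 0.16))  # frequencies built from p1 = 0.6, p2 = 0.4
```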

Page 5: Basic Principles of Population Genetics Lecture 4

5

Assumptions made to Justify HW

1. Infinite population size
2. Discrete generations
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes

HW equilibrium can be shown to hold under more relaxed sets of assumptions as well. These assumptions are clearly not universal.

Page 6: Basic Principles of Population Genetics Lecture 4

6

What happens after one generation?

| Mating type (unordered genotypes) | Nature of offspring and segregation ratios | Frequency of mates |
|---|---|---|
| A1/A1 x A1/A1 | A1/A1 | $u^2$ |
| A1/A1 x A1/A2 | ½ A1/A1 + ½ A1/A2 | $2uv$ |
| A1/A1 x A2/A2 | A1/A2 | $2uw$ |
| A1/A2 x A1/A2 | ¼ A1/A1 + ½ A1/A2 + ¼ A2/A2 | $v^2$ |
| A1/A2 x A2/A2 | ½ A1/A2 + ½ A2/A2 | $2vw$ |
| A2/A2 x A2/A2 | A2/A2 | $w^2$ |
| Total | | $(u+v+w)^2 = 1$ |

Frequency of A1/A1 after one generation: $u' = u^2 + \tfrac12(2uv) + \tfrac14 v^2 = (u + \tfrac12 v)^2 = p_1^2$
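The mating table can be turned into a small calculation. The sketch below encodes the six unordered mating types and their segregation ratios and sums the offspring contributions; the dictionary `SEGREGATION`, the function name, and the starting frequencies are illustrative assumptions. Exact fractions are used so the result can be compared with $p_1^2$, $2p_1p_2$, $p_2^2$ without rounding.

```python
from fractions import Fraction as F

# Offspring ratios (A1/A1, A1/A2, A2/A2) for each unordered mating type.
SEGREGATION = {
    ("A1/A1", "A1/A1"): (F(1), F(0), F(0)),
    ("A1/A1", "A1/A2"): (F(1, 2), F(1, 2), F(0)),
    ("A1/A1", "A2/A2"): (F(0), F(1), F(0)),
    ("A1/A2", "A1/A2"): (F(1, 4), F(1, 2), F(1, 4)),
    ("A1/A2", "A2/A2"): (F(0), F(1, 2), F(1, 2)),
    ("A2/A2", "A2/A2"): (F(0), F(0), F(1)),
}


def next_generation(u, v, w):
    """Genotype frequencies (u', v', w') after one round of random mating."""
    freq = {"A1/A1": u, "A1/A2": v, "A2/A2": w}
    offspring = [F(0), F(0), F(0)]
    for (g1, g2), ratios in SEGREGATION.items():
        # Unordered matings between different genotypes carry a factor 2.
        mating_freq = freq[g1] * freq[g2] * (1 if g1 == g2 else 2)
        for k, ratio in enumerate(ratios):
            offspring[k] += mating_freq * ratio
    return tuple(offspring)


# Made-up starting frequencies; here p1 = 3/5 and p2 = 2/5.
u1, v1, w1 = next_generation(F(1, 2), F(1, 5), F(3, 10))
print(u1, v1, w1)
```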

Page 7: Basic Principles of Population Genetics Lecture 4

7

After one generation …

So, after one generation the genotype frequencies $u, v, w$ change to $u', v', w'$ as follows (using the previous table):

Frequency of A1/A1: $u' = u^2 + uv + \tfrac14 v^2 = (u + \tfrac12 v)^2 = p_1^2$

Frequency of A1/A2: $v' = uv + 2uw + \tfrac12 v^2 + vw = 2(u + \tfrac12 v)(\tfrac12 v + w) = 2p_1p_2$

Frequency of A2/A2: $w' = \tfrac14 v^2 + vw + w^2 = (\tfrac12 v + w)^2 = p_2^2$

Hardy-Weinberg seems to be established after one generation, but $u', v', w'$ are frequencies for the second generation, while $p_1$ and $p_2$ are defined as the allele frequencies of the first generation. Are these also the allele frequencies of the second generation?

Yes! Because $p'_1 = u' + \tfrac12 v' = p_1^2 + p_1p_2 = p_1$, and similarly $p'_2 = p_2$.

Page 8: Basic Principles of Population Genetics Lecture 4

8

After yet another generation …

Have we reached equilibrium? Let's look at one more generation and see that genotype frequencies are now fixed.

Frequency of A1/A1: $u'' = (u' + \tfrac12 v')^2 = (p_1^2 + p_1p_2)^2 = p_1^2$

Frequency of A1/A2: $v'' = 2(u' + \tfrac12 v')(\tfrac12 v' + w') = 2(p_1^2 + p_1p_2)(p_2^2 + p_1p_2) = 2p_1p_2$

Frequency of A2/A2: $w'' = (\tfrac12 v' + w')^2 = (p_2^2 + p_1p_2)^2 = p_2^2$

Hardy-Weinberg is indeed established after one generation; allele and genotype frequencies do not change under the assumptions we have made. Can you trace where each assumption is used?
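A quick numerical check of the fixed-point claim, as a sketch: the update rule below uses the closed forms $(u+\tfrac12 v)^2$, $2(u+\tfrac12 v)(\tfrac12 v+w)$, $(\tfrac12 v+w)^2$, and the starting frequencies are made up for illustration.

```python
from fractions import Fraction as F


def next_generation(u, v, w):
    """One generation of random mating, written via the allele frequencies."""
    p1 = u + v / 2
    p2 = v / 2 + w
    return p1 * p1, 2 * p1 * p2, p2 * p2


gen0 = (F(1, 2), F(1, 5), F(3, 10))  # arbitrary starting point, not in HW
gen1 = next_generation(*gen0)        # HW proportions appear here
gen2 = next_generation(*gen1)        # and no longer change

print(gen0, gen1, gen2)
```

With exact fractions, the second and third generations come out identical, while the first differs from them.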

Page 9: Basic Principles of Population Genetics Lecture 4

9

Use of Assumptions in the derivation

1. Infinite population size
2. Discrete generations (mating amongst ith generation members only)
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes

| Mating type (unordered genotypes) | Nature of offspring and segregation ratios | Frequency of mates |
|---|---|---|
| A1/A1 x A1/A2 | ½ A1/A1 + ½ A1/A2 | $2uv$ |

Segregation ratios assume 1, 2, 3, 7.

Frequency formula of A1/A1 after one generation, $u^2 + \tfrac12(2uv) + \tfrac14 v^2$, assumes 4, 5, 6.

Page 10: Basic Principles of Population Genetics Lecture 4

10

An alternative justification

Previously, we started with arbitrary genotype frequencies $u, v, w$ and showed that they are modified after one generation to satisfy HW equilibrium.

Now we start with arbitrary allele frequencies $p_1$ and $p_2$. Random mating is equivalent to random pairing of alleles; each person contributes one allele with the prescribed frequencies.

So the frequency of A1/A1 in the new generation is $p_1^2$, that of A1/A2 is $2p_1p_2$, and that of A2/A2 is $p_2^2$. Argument completed?

The frequency $p'_1$ of A1 in the new generation equals $p_1^2 + \tfrac12(2p_1p_2) = p_1$, and the frequency $p'_2$ of A2 in the new generation equals $p_2^2 + \tfrac12(2p_1p_2) = p_2$. So after one generation the allele frequencies are unchanged and the genotype frequencies satisfy HW equilibrium.

Exercise: Generalize the argument to k-allelic loci.
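One possible way to explore the k-allele exercise numerically is sketched below. It pairs alleles at random and then recovers the allele frequencies from the resulting genotype frequencies. The function names and the three-allele example are assumptions for illustration, not a worked solution from the source.

```python
from fractions import Fraction as F
from itertools import combinations_with_replacement


def random_pairing(p):
    """Unordered genotype frequencies when alleles with frequencies p pair at random."""
    genotypes = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        # Homozygotes get p_i^2, heterozygotes get 2 p_i p_j.
        genotypes[(i, j)] = p[i] * p[i] if i == j else 2 * p[i] * p[j]
    return genotypes


def allele_freqs_from_genotypes(genotypes, k):
    """Each genotype contributes half of its frequency to each of its two alleles."""
    p = [F(0)] * k
    for (i, j), freq in genotypes.items():
        p[i] += freq / 2
        p[j] += freq / 2
    return p


p = [F(1, 2), F(1, 3), F(1, 6)]  # made-up frequencies for k = 3 alleles
genotypes = random_pairing(p)
p_next = allele_freqs_from_genotypes(genotypes, len(p))
print(p_next)
```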

Page 11: Basic Principles of Population Genetics Lecture 4

11

HW equilibrium at X-linked loci

Consider an allele at an X-linked locus. At generation $n$, let $q_n$ denote that allele's frequency in females and $r_n$ denote that allele's frequency in males. More explicitly,

$$r_n = \frac{\text{Number of X-chromosomes in males having the allele}}{\text{Total number of X-chromosomes in males}}$$

Questions:
• What is the frequency $p_n$ of the allele in the population?
• Does $p_n$ converge, and to which value $p$?
• Do $q_n$ and $r_n$ converge to the same value?

Page 12: Basic Principles of Population Genetics Lecture 4

12

Argument Outline

Assuming an equal number of males and females, we have $p_n = \tfrac23 q_n + \tfrac13 r_n$ for every $n$, because females carry two X chromosomes and males carry one.

Let $p = p_0 = \tfrac23 q_0 + \tfrac13 r_0$. We will now show that both $q_n$ and $r_n$ converge quickly to $p$ (but not in one generation as before).

Having shown this claim, the female genotype frequency of A1/A1 must be $p^2$, that of A1/A2 is $2p(1-p)$, and that of A2/A2 is $(1-p)^2$, satisfying HW equilibrium.

For males, genotypes A1 and A2 have frequencies $p$ and $1-p$.

Page 13: Basic Principles of Population Genetics Lecture 4

13

The recursion equations

Because a male always gets his X chromosome from his mother, and his mother precedes him by one generation,

$$r_n = q_{n-1} \qquad \text{(Eq. 1.1)}$$

Similarly, females get half their X-chromosomes from females and half from males,

$$q_n = \tfrac12 q_{n-1} + \tfrac12 r_{n-1} \qquad \text{(Eq. 1.2)}$$

Eqs. 1.1 and 1.2 imply:

$$\tfrac23 q_n + \tfrac13 r_n = \tfrac23\left(\tfrac12 q_{n-1} + \tfrac12 r_{n-1}\right) + \tfrac13 q_{n-1} = \tfrac23 q_{n-1} + \tfrac13 r_{n-1}$$

It follows that the allele frequency $p_n = \tfrac23 q_n + \tfrac13 r_n$ never changes and remains equal to $p_0 = p$. To see that $q_n$ converges to $p$, we need to relate the difference $q_n - p$ to the difference $q_{n-1} - p$.
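The two recursion equations can be iterated directly. The sketch below is an illustration with made-up starting frequencies; it tracks $(q_n, r_n)$ and the pooled frequency $\tfrac23 q_n + \tfrac13 r_n$. The function names are assumptions.

```python
from fractions import Fraction as F


def x_linked_step(q_prev, r_prev):
    """Apply Eq. 1.2 and Eq. 1.1: returns (q_n, r_n) from (q_{n-1}, r_{n-1})."""
    return (q_prev + r_prev) / 2, q_prev


def pooled(q, r):
    """Population allele frequency, weighting females 2/3 and males 1/3."""
    return F(2, 3) * q + F(1, 3) * r


q, r = F(9, 10), F(3, 10)  # illustrative starting frequencies
p0 = pooled(q, r)
history = [(q, r)]
for _ in range(10):
    q, r = x_linked_step(q, r)
    history.append((q, r))

for n, (qn, rn) in enumerate(history):
    print(n, float(qn), float(rn), pooled(qn, rn))
```

The last column stays constant across generations, while the female and male frequencies oscillate toward it.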

Page 14: Basic Principles of Population Genetics Lecture 4

14

The fixed point solution

$$\begin{aligned}
q_n - p &= q_n - \tfrac32 p + \tfrac12 p \\
&= \tfrac12 q_{n-1} + \tfrac12 r_{n-1} - \tfrac32\left(\tfrac23 q_{n-1} + \tfrac13 r_{n-1}\right) + \tfrac12 p \\
&= -\tfrac12 q_{n-1} + \tfrac12 p \qquad \text{(just cancel terms)} \\
&= -\tfrac12\,(q_{n-1} - p)
\end{aligned}$$

Continuing in this manner,

$$q_n - p = -\tfrac12 (q_{n-1} - p) = \left(-\tfrac12\right)^2 (q_{n-2} - p) = \dots = \left(-\tfrac12\right)^n (q_0 - p) \to 0$$

So in each step the difference diminishes by half and $q_n$ approaches $p$ in a zigzag manner. Hence, $r_n = q_{n-1}$ also converges to $p$. What does this mean?

Having shown this claim, the female genotype frequency of A1/A1 must be $p^2$, that of A1/A2 is $2p(1-p)$, and that of A2/A2 is $(1-p)^2$, satisfying HW equilibrium. For males, genotypes A1 and A2 have frequencies $p$ and $1-p$. HW equilibrium is not reached in one generation, but it is approached fast: after 5 generations the deviation has shrunk by a factor of $2^5 = 32$.
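The geometric decay with alternating sign can be checked against the iterated recursion. This is a sketch with assumed starting values; `female_freq` is a hypothetical helper that iterates $q_n = \tfrac12 q_{n-1} + \tfrac12 r_{n-1}$, $r_n = q_{n-1}$.

```python
from fractions import Fraction as F


def female_freq(n, q0, r0):
    """Female allele frequency q_n obtained by iterating the recursion n times."""
    q, r = q0, r0
    for _ in range(n):
        q, r = (q + r) / 2, q
    return q


q0, r0 = F(9, 10), F(3, 10)  # illustrative values
p = F(2, 3) * q0 + F(1, 3) * r0
deviations = [female_freq(n, q0, r0) - p for n in range(6)]

# Each deviation should equal (-1/2)^n times the initial one.
for n, d in enumerate(deviations):
    print(n, d, F(-1, 2) ** n * (q0 - p))
```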

Page 15: Basic Principles of Population Genetics Lecture 4

15

Linkage equilibrium

Let $A_i$ be an allele at locus A with frequency $p_i$. Let $B_j$ be an allele at locus B with frequency $q_j$. Denote the recombination fractions between these loci by $\theta_f$ and $\theta_m$ for females and males, respectively. Let $\theta = (\theta_f + \theta_m)/2$.

Linkage equilibrium means that $\Pr(A_i B_j) = p_i q_j$.

[Figure: two haplotypes, $A_i B_j$ and $A'_i B'_j$.]

We use the same assumptions employed earlier to demonstrate linkage equilibrium, namely, to show that $P_n(A_i B_j)$ converges to $p_i q_j$ at a rate that is fastest when the recombination fraction is largest.

Page 16: Basic Principles of Population Genetics Lecture 4

16

Convergence Proof

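$$\begin{aligned}
P_n(A_iB_j) &= \tfrac12\,[\text{gamete from female}] + \tfrac12\,[\text{gamete from male}] \\
&= \tfrac12\left[(1-\theta_f)P_{n-1}(A_iB_j) + \theta_f\, p_iq_j\right] + \tfrac12\left[(1-\theta_m)P_{n-1}(A_iB_j) + \theta_m\, p_iq_j\right] \\
&= (1-\theta)P_{n-1}(A_iB_j) + \theta\, p_iq_j
\end{aligned}$$

In each bracket, the first term is the contribution of gametes without recombination and the second term is the contribution of recombinant gametes.

So, $P_n(A_iB_j) - p_iq_j = (1-\theta)\,[P_{n-1}(A_iB_j) - p_iq_j] = \dots = (1-\theta)^n\,[P_0(A_iB_j) - p_iq_j]$.

In short, we have established that the deviation from linkage equilibrium after $n$ generations is $(1-\theta)^n$ times the initial deviation. For loci on different chromosomes ($\theta = \tfrac12$), the deviation from linkage equilibrium is halved each generation. For close loci with small $\theta$, convergence is slow.

Exercise: Repeat this analysis for three loci (Problem 7, with guidance, in Kenneth Lange's book).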
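The decay of the deviation from linkage equilibrium can be sketched numerically. The code iterates the sex-averaged recursion for the haplotype frequency and compares it with the closed form, a factor $(1-\theta)$ per generation with $\theta$ the average of the female and male recombination fractions. Function names and the numeric values are assumptions for illustration.

```python
def iterate_haplotype_freq(P0, pi, qj, theta_f, theta_m, n):
    """Iterate the recursion for the haplotype frequency P_n(Ai Bj)."""
    P = P0
    for _ in range(n):
        from_female = (1 - theta_f) * P + theta_f * pi * qj
        from_male = (1 - theta_m) * P + theta_m * pi * qj
        P = 0.5 * from_female + 0.5 * from_male
    return P


def closed_form(P0, pi, qj, theta_f, theta_m, n):
    """p_i q_j plus the initial deviation shrunk by (1 - theta)^n."""
    theta = (theta_f + theta_m) / 2
    return pi * qj + (1 - theta) ** n * (P0 - pi * qj)


# Unlinked loci: the deviation halves every generation.
unlinked = iterate_haplotype_freq(0.5, 0.5, 0.5, 0.5, 0.5, 3)
# Tightly linked loci: the deviation decays slowly.
linked = iterate_haplotype_freq(0.5, 0.5, 0.5, 0.02, 0.01, 50)
print(unlinked, linked)
```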

Page 17: Basic Principles of Population Genetics Lecture 4

17

Ramifications for Association studies

Many diseases are thought to have been caused by a single random mutation that survived and propagated to offspring, generation after generation.

Would we see association in random population samples?

Suppose there is a close marker:

[Figure: a marker locus next to the mutated locus.]

After $n$ generations, the deviation from linkage equilibrium between the marker and the mutated locus has shrunk by the factor $(1-\theta)^n$, where $\theta$ is the recombination fraction between them.

If the mutation happened many generations ago, no significant trace will remain: haplotype frequencies will have reached linkage equilibrium! We need a combination of close markers and a recent disease allele. Association studies of this kind are also called linkage disequilibrium mapping, or LD mapping for short.

Page 18: Basic Principles of Population Genetics Lecture 4

18

Selection and Fitness

The fitness of a genotype is the expected genetic contribution of that genotype to the next generation, that is, the expected number of offspring to which it contributes an allele. Let the fitness of the three genotypes of an autosomal bi-allelic locus be denoted by $w_{A/A}$, $w_{A/a}$ and $w_{a/a}$.

If $p_n$ and $q_n$ are the allele frequencies of A and a, then the average fitness under HW equilibrium is $w_{A/A}\,p_n^2 + w_{A/a}\,2p_nq_n + w_{a/a}\,q_n^2$.

Conventions: Since only the ratios of fitness of the various genotypes matter, namely, $w_{A/A}/w_{A/a}$ and $w_{a/a}/w_{A/a}$, we arbitrarily set $w_{A/a} = 1$ and define $w_{A/A} = 1-r$, $w_{a/a} = 1-s$, where $r \le 1$ and $s \le 1$.

Interpretation: When $s = r = 0$, there is no selection. When $r$ is negative, A/A has an advantage over A/a; similarly for negative $s$. When $r$ is positive (it must then be a fraction), A/A has a disadvantage relative to A/a. When both $s$ and $r$ are positive, there is a heterozygous advantage.

Page 19: Basic Principles of Population Genetics Lecture 4

19

Assuming selection exists …

Our goal is to study the equilibrium of allele frequencies under various selection possibilities (namely, different values for $r$ and $s$).

In our new notation, the average fitness $\bar w_n$ at generation $n$, summed over the genotypes A/A, A/a and a/a, is given by

$$\bar w_n = (1-r)p_n^2 + 2p_nq_n + (1-s)q_n^2 = 1 - rp_n^2 - sq_n^2$$

To find the equilibrium we study the difference $\Delta p_n = p_{n+1} - p_n$.

First, note that $p_{n+1} = [(1-r)p_n^2 + p_nq_n] / \bar w_n$. Then

$$\begin{aligned}
\Delta p_n = p_{n+1} - p_n &= \frac{(1-r)p_n^2 + p_nq_n}{\bar w_n} - p_n \\
&= \frac{(1-r)p_n^2 + p_nq_n - (1 - rp_n^2 - sq_n^2)\,p_n}{\bar w_n} \\
&= \frac{p_nq_n\,[\,s - (r+s)\,p_n\,]}{\bar w_n}
\end{aligned}$$
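As a sanity check on this algebra, the sketch below computes the change in allele frequency in two ways: directly from the update $p_{n+1} = [(1-r)p_n^2 + p_nq_n]/\bar w_n$ with $\bar w_n = 1 - rp_n^2 - sq_n^2$, and from the factored form $p_nq_n[s-(r+s)p_n]/\bar w_n$. Function names and test values are illustrative assumptions.

```python
def mean_fitness(p, r, s):
    """Average fitness with w_AA = 1 - r, w_Aa = 1, w_aa = 1 - s."""
    q = 1.0 - p
    return 1.0 - r * p * p - s * q * q


def next_p(p, r, s):
    """Frequency of allele A in the next generation."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / mean_fitness(p, r, s)


def delta_p(p, r, s):
    """Factored form of the one-generation change in the frequency of A."""
    q = 1.0 - p
    return p * q * (s - (r + s) * p) / mean_fitness(p, r, s)


cases = [(0.3, 0.2, 0.1), (0.7, -0.1, 0.4), (0.5, 0.3, -0.2)]
for p, r, s in cases:
    print(p, r, s, next_p(p, r, s) - p, delta_p(p, r, s))
```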

Page 20: Basic Principles of Population Genetics Lecture 4

20

Interpretation when r > 0 and s ≤ 0

We just derived $\Delta p_n = [\,p_nq_n\,(s - (r+s)\,p_n)\,] / \bar w_n$.

Convergence occurs when $\Delta p_n = 0$, namely, when $p_n = 0$, $p_n = 1$ (i.e., $q_n = 0$) or $p_n = s/(r+s)$. Where should it converge to?

Claim: When $r > 0$ and $s \le 0$, $p_n \to 0$, i.e., allele A disappears. In the opposite case ($r \le 0$ and $s > 0$), allele a should be driven to extinction. (Why is this extinction process sometimes halted in real life?)

Proof: When $r > 0$ and $s \le 0$, the linear function $g(p) = s - (r+s)\,p$ satisfies $g(0) = s \le 0$ and $g(1) = -r < 0$, hence it is negative on $(0,1)$.

Thus $p_n > p_{n+1}$, and so $p_n$ decreases monotonically and must approach 0 at equilibrium. The other case is similar.
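The monotone decline can be seen in a short simulation. This is a sketch under assumed parameter values ($r = 0.2$, $s = -0.1$, starting frequency 0.9); the update rule is $p_{n+1} = [(1-r)p_n^2 + p_nq_n]/(1 - rp_n^2 - sq_n^2)$.

```python
def next_p(p, r, s):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    mean_fitness = 1.0 - r * p * p - s * q * q
    return ((1.0 - r) * p * p + p * q) / mean_fitness


def trajectory(p0, r, s, generations):
    """List of allele frequencies p_0, p_1, ..., p_generations."""
    path = [p0]
    for _ in range(generations):
        path.append(next_p(path[-1], r, s))
    return path


# Illustrative values with r > 0 and s <= 0: allele A should be lost.
path = trajectory(0.9, r=0.2, s=-0.1, generations=500)
print(path[0], path[10], path[100], path[-1])
```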

Page 21: Basic Principles of Population Genetics Lecture 4

21

when r and s have the same sign

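$$\begin{aligned}
p_{n+1} - \frac{s}{r+s} &= p_n + \Delta p_n - \frac{s}{r+s} \\
&= \left(p_n - \frac{s}{r+s}\right) + \frac{p_nq_n\,[\,s - (r+s)\,p_n\,]}{1 - rp_n^2 - sq_n^2} \\
&= \frac{1 - rp_n^2 - sq_n^2 - (r+s)\,p_nq_n}{1 - rp_n^2 - sq_n^2}\left(p_n - \frac{s}{r+s}\right) \\
&= \frac{1 - rp_n - sq_n}{1 - rp_n^2 - sq_n^2}\left(p_n - \frac{s}{r+s}\right) \\
&= \lambda(p_n)\left(p_n - \frac{s}{r+s}\right)
\end{aligned}$$

where $\lambda(p_n) = (1 - rp_n - sq_n)/(1 - rp_n^2 - sq_n^2)$.

Conclusion I (for negative sign): If $r$ and $s$ are negative, $\lambda(p_n) > 1$, so $p_n \to 1$ for $p_0$ above $s/(r+s)$, and $p_n \to 0$ for $p_0$ below $s/(r+s)$. In other words, $s/(r+s)$ is an unstable equilibrium.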
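The instability for negative $r$ and $s$ can be illustrated by starting slightly above and slightly below $s/(r+s)$ and iterating the selection update $p_{n+1} = [(1-r)p_n^2 + p_nq_n]/(1 - rp_n^2 - sq_n^2)$. The parameter values and the helper name `limit` are assumptions made for this sketch.

```python
def next_p(p, r, s):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / (1.0 - r * p * p - s * q * q)


def limit(p0, r, s, generations=2000):
    """Allele frequency after many generations, starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, r, s)
    return p


r, s = -0.3, -0.1          # both homozygotes fitter than the heterozygote
p_star = s / (r + s)       # interior equilibrium, here 0.25
above = limit(p_star + 0.05, r, s)
below = limit(p_star - 0.05, r, s)
print(p_star, above, below)
```

Starting above the interior equilibrium drives the frequency to 1, and starting below drives it to 0.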

Page 22: Basic Principles of Population Genetics Lecture 4

22

when r and s are both positive

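If both $r$ and $s$ are positive (heterozygous advantage), then

$$0 \le \lambda(p_n) = \frac{1 - rp_n - sq_n}{1 - rp_n^2 - sq_n^2} < 1$$

Hence $p_n - \dfrac{s}{r+s}$ has a constant sign and declines in magnitude.

Conclusion II: If both $r$ and $s$ are positive, $p_n \to s/(r+s)$ and this point is a stable equilibrium.

Conclusion III (rate of convergence): If $p_0 \approx s/(r+s)$, namely the starting point is near equilibrium, then

$$\lambda(p_n) \approx \lambda\!\left(\frac{s}{r+s}\right) = \frac{r+s-2rs}{r+s-rs}$$

and we get (locally) a geometric convergence

$$p_n - \frac{s}{r+s} \approx \left(\frac{r+s-2rs}{r+s-rs}\right)^{n}\left(p_0 - \frac{s}{r+s}\right)$$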
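A numerical sketch of the stable case: with both $r$ and $s$ positive, iterating the selection update should approach $s/(r+s)$, and near that point the deviation should shrink by roughly $(r+s-2rs)/(r+s-rs)$ per generation. The multiplier is written here as a function `lam`, i.e. $(1 - rp - sq)/(1 - rp^2 - sq^2)$; names and parameter values are illustrative assumptions.

```python
def lam(p, r, s):
    """Multiplier of the deviation from s/(r+s) over one generation."""
    q = 1.0 - p
    return (1.0 - r * p - s * q) / (1.0 - r * p * p - s * q * q)


def next_p(p, r, s):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / (1.0 - r * p * p - s * q * q)


r, s = 0.3, 0.1            # heterozygote has the highest fitness
p_star = s / (r + s)       # predicted stable equilibrium
p = 0.9
for _ in range(500):
    p = next_p(p, r, s)

local_rate = (r + s - 2 * r * s) / (r + s - r * s)
print(p, p_star, lam(p_star, r, s), local_rate)
```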

Page 23: Basic Principles of Population Genetics Lecture 4

23

Heterozygous advantage

If we observe a recessive disease that is maintained at high frequency, how can we explain it? Intuition says that it should disappear.

However, if the A/a genotype has an advantage over the other genotypes, then the defective allele would be kept around.

Technically, if both $r$ and $s$ are positive, then the A/a genotype has the highest fitness.

The best evidence for such a phenomenon is sickle cell anemia. In some parts of Africa, this anemia, despite being a recessive disease, is kept at high frequency. It turns out that the A/a genotype appears to provide protection against malaria! (So it has high fitness in swamp-like areas.)

Page 24: Basic Principles of Population Genetics Lecture 4

24

Sickle cell anemia

Medical Encyclopedia: Red blood cells, sickle cell

Sickle cell anemia is an inherited autosomal recessive blood disease in which the red blood cells produce abnormal pigment (hemoglobin). The abnormal hemoglobin causes deformity of the red blood cells into crescent or sickle shapes, as seen in this photomicrograph.

The sickle cell mutation is a single nucleotide substitution (A → T) at codon 6 in the beta-hemoglobin gene, resulting in the following substitution of amino acids: GAG (Glu) → GTG (Val).

Source (Edited): http://www.nlm.nih.gov/medlineplus/ency/imagepages/1212.htm

Page 25: Basic Principles of Population Genetics Lecture 4

25

Facts about Sickle Cell Disease

• Sickle cell disease is much more common in certain ethnic groups, affecting approximately one out of every 500 African Americans.

• Because people with sickle trait were more likely to survive malaria outbreaks in Africa than those with normal hemoglobin, it is believed that this genetically aberrant hemoglobin evolved as a protection against malaria.

• Although sickle cell disease is inherited and present at birth, symptoms usually don't occur until after 4 months of age.

• Sickle cell anemia may become life-threatening when damaged red blood cells break down (and in other circumstances). Repeated crises can cause damage to the kidneys, lungs, bones, eyes, and central nervous system.

• Blocked blood vessels and damaged organs can cause acute painful episodes. These painful crises occur in almost all patients at some point in their lives. Some patients have one episode every few years, while others have many episodes per year. The crises can be severe enough to require admission to the hospital for pain control.

Page 26: Basic Principles of Population Genetics Lecture 4

26

Balance of Mutation and Selection

Most mutations are neutral or deleterious. We discuss the balance between deleterious mutations and selection. Let $\mu$ denote the mutation rate from the normal allele a to the mutated allele A. Suppose the equilibrium frequency of allele A is $p$ and that of a is $q = 1-p$.

When is a balance achieved between selection (say, preferring a) and mutation that changes allele a to allele A? At equilibrium, the frequencies $p$ and $q$ must satisfy the condition

$$q = \frac{(1-s)\,q^2 + pq}{1 - rp^2 - sq^2}\,(1-\mu)$$

where:
• the left-hand side $q$ is the probability of allele a;
• the numerator counts the offspring contributing allele a (from slide 19);
• the denominator is the total expected number of offspring (from slide 19);
• the factor $1-\mu$ is the rate at which allele a does not mutate to allele A.

Page 27: Basic Principles of Population Genetics Lecture 4

27

Example for a Recessive Disease

Now suppose we have a recessive disease caused by genotype A/A. The frequency of A is $p$. Assume $r > 0$ (for allele A) and $s = 0$ (for allele a). Thus the heterozygous genotype a/A has a fitness advantage over the doubly diseased genotype A/A and has fitness equal to that of the doubly normal genotype a/a.

How much of the disease allele A will stay in the population?

The general equilibrium condition is

$$q = \frac{(1-s)\,q^2 + pq}{1 - rp^2 - sq^2}\,(1-\mu)$$

For a recessive disease ($r > 0$, $s = 0$) it becomes

$$q = \frac{q^2 + pq}{1 - rp^2}\,(1-\mu) = \frac{q}{1 - rp^2}\,(1-\mu)$$

This yields $1 - rp^2 = 1 - \mu$ and thus $p^2 = \mu/r$, so a balance is achieved that retains both alleles. When $\mu$ is larger, there are more A alleles and $p$ increases. When $r$ (with $0 < r < 1$) gets larger, the fitness of A/A decreases and so $p$ decreases.
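The balance $p^2 = \mu/r$ can be checked by iterating the recessive-disease recursion $q \mapsto (1-\mu)\,q/(1 - rp^2)$ until it settles. This is a sketch; the function names are assumptions, and the mutation rate used is deliberately large so that the iteration converges quickly, not a realistic value.

```python
import math


def next_q(q, r, mu):
    """One generation of selection against A/A (s = 0) followed by a -> A mutation."""
    p = 1.0 - q
    return (1.0 - mu) * q / (1.0 - r * p * p)


def equilibrium_p(r, mu, q0=0.5, generations=2000):
    """Frequency of the disease allele A after many generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, r, mu)
    return 1.0 - q


r, mu = 0.5, 0.01  # illustrative values only
p_iterated = equilibrium_p(r, mu)
p_formula = math.sqrt(mu / r)
print(p_iterated, p_formula)
```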

Page 28: Basic Principles of Population Genetics Lecture 4

28

Founder effect and Genetic Drift

[Figure: simulated allele frequency (vertical axis, 0 to 1000) against generation (horizontal axis, 0 to 1000) for ten alleles; two curves are labeled Allele 10 and Allele 5. Source: Gideon Greenspan.]

After 800 generations, by simulation, from the ten alleles only two remain: numbers 5 and 7.
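Genetic drift of this kind can be reproduced with a very small simulation. The sketch below assumes a Wright-Fisher style model in which each generation resamples a fixed number of gene copies from the previous one; the slides do not say how the plotted simulation was done, so the model, the population of 1000 copies, the ten equally frequent starting alleles, and the function name `wright_fisher` are all assumptions.

```python
import random


def wright_fisher(counts, generations, rng):
    """Resample a fixed number of gene copies each generation.

    counts[i] is the number of copies of allele i; returns the list of
    count vectors for generation 0, 1, ..., generations.
    """
    n_copies = sum(counts)
    alleles = list(range(len(counts)))
    history = [list(counts)]
    for _ in range(generations):
        # Each new copy picks its parent allele in proportion to current counts.
        sample = rng.choices(alleles, weights=history[-1], k=n_copies)
        history.append([sample.count(a) for a in alleles])
    return history


rng = random.Random(0)  # fixed seed so the run is repeatable
history = wright_fisher([100] * 10, 300, rng)
surviving = [sum(count > 0 for count in gen) for gen in history]
print(surviving[0], surviving[100], surviving[-1])
```

Once an allele's count reaches zero it can never be sampled again, so the number of surviving alleles can only fall over time, which is the pattern the figure describes.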