lecture 20: tests of neutrality november 6, 2015

Lecture 20: Tests of Neutrality

November 6, 2015

Last Time

Mutation and selection

Infinite alleles and stepwise mutation models

Introduction to neutral theory

Today

Sequence data and quantification of variation

Infinite sites model

Nucleotide diversity (π)

Sequence-based tests of neutrality

Ewens-Watterson Test

Tajima’s D

Hudson-Kreitman-Aguade

Synonymous versus Nonsynonymous substitutions

McDonald-Kreitman

The main power of neutral theory is that it provides a theoretical expectation for genetic variation in the absence of selection.

Equilibrium Heterozygosity under IAM

Frequencies of individual alleles are constantly changing

Balance between loss and gain is maintained

4Neμ >> 1: mutation predominates; new mutants persist and H is high

4Neμ << 1: drift dominates; new mutants are quickly eliminated and H is low
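As a rough numerical illustration of these two regimes, the sketch below evaluates the standard infinite-alleles equilibrium H = 4Neμ / (4Neμ + 1). The function name and the example population sizes are illustrative assumptions, not values taken from the lecture.

```python
def equilibrium_heterozygosity(ne, mu):
    """Equilibrium heterozygosity under the infinite alleles model.

    ne: effective population size; mu: per-locus mutation rate per generation.
    """
    theta = 4 * ne * mu
    return theta / (theta + 1)


if __name__ == "__main__":
    mu = 1e-5  # hypothetical mutation rate
    for ne in (100, 25_000, 1_000_000):  # hypothetical population sizes
        print(ne, round(equilibrium_heterozygosity(ne, mu), 4))
```

When 4Neμ is far below 1 the result is close to 4Neμ itself (drift dominates); when it is far above 1 the result approaches 1 (mutation dominates).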

Effects of Population Size on Expected Heterozygosity Under Infinite Alleles Model (μ = 10^-5)

Rapid approach to equilibrium in small populations

Higher heterozygosity with less drift
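The two observations above can be explored with the usual infinite-alleles recursion for homozygosity, F' = (1 - μ)^2 [1/(2N) + (1 - 1/(2N)) F], with H = 1 - F. This is a minimal sketch: the recursion is the standard textbook form rather than something stated on the slide, and the function names and population sizes are assumptions chosen for illustration.

```python
def heterozygosity_trajectory(n, mu, generations, h0=0.0):
    """Return H for each generation, starting from heterozygosity h0."""
    f = 1.0 - h0
    trajectory = [h0]
    for _ in range(generations):
        # Homozygosity recursion under drift (1/(2N)) and mutation (mu).
        f = (1 - mu) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)
        trajectory.append(1.0 - f)
    return trajectory


def generations_to_reach(n, mu, fraction=0.95, max_generations=10_000_000):
    """Generations needed, starting from H = 0, to reach `fraction` of equilibrium H.

    Returns (generations, equilibrium_H), where equilibrium_H is the fixed
    point of the recursion above.
    """
    a = (1 - mu) ** 2 / (2 * n)
    b = (1 - mu) ** 2 * (1 - 1 / (2 * n))
    h_eq = 1.0 - a / (1.0 - b)
    f, t = 1.0, 0
    while 1.0 - f < fraction * h_eq and t < max_generations:
        f = a + b * f
        t += 1
    return t, h_eq


if __name__ == "__main__":
    for n in (1_000, 100_000):  # hypothetical population sizes
        t, h_eq = generations_to_reach(n, 1e-5)
        print(f"N={n}: equilibrium H={h_eq:.4f}, reached 95% after {t} generations")
```

With the same mutation rate, the smaller population settles sooner but at a lower heterozygosity, which is the pattern the two bullet points describe.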

Fate of Alleles in Mutation-Drift Balance

Time to fixation of a new mutation is much longer than time to loss

Generations from birth to fixation

Time between fixation events
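For a sense of scale, the sketch below uses the standard diffusion approximations for a new neutral mutation (commonly attributed to Kimura and Ohta): about 4Ne generations to fixation for the mutations that do fix, about 2(Ne/N) ln(2N) generations to loss for those that are lost, and an average of 1/μ generations between successive fixations. These formulas are not given on the slide, and the function name and example values are assumptions.

```python
import math


def neutral_allele_times(n, ne, mu):
    """Approximate waiting times (in generations) for a new neutral mutation.

    n: census size; ne: effective size; mu: neutral mutation rate per generation.
    """
    return {
        "generations_to_fixation": 4 * ne,
        "generations_to_loss": 2 * (ne / n) * math.log(2 * n),
        "generations_between_fixations": 1 / mu,
    }


if __name__ == "__main__":
    # Hypothetical population: N = Ne = 1000, mu = 1e-5
    for name, value in neutral_allele_times(1000, 1000, 1e-5).items():
        print(f"{name}: {value:.1f}")
```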

Fate of Alleles in Mutation-Drift-Selection Balance

Purifying Selection

Neutrality

Balancing Selection/Overdominance

Which case will have the most alleles on average at any given time?

What will this depend upon?

Highest HE?

Assume you take a sample of 100 alleles from a large (but finite) population in mutation-drift equilibrium.

What is the expected distribution of allele frequencies in your sample under neutrality and the Infinite Alleles Model?

[Figure: three candidate histograms, A, B, and C. x-axis: Number of Observations of Allele; y-axis: Number of Alleles]

Allele Frequency Distributions

Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time

Comparison of observed to expected distribution provides evidence of departure from Infinite Alleles model

Depends on f, effective population size, and mutation rate

[Figure: allele frequency distributions. Black: predicted from neutral theory; white: observed (hypothetical). Hartl and Clark 2007]

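Ewens Sampling Formula

Population mutation rate, an index of the variability of the population:

$\theta = 4N_e\mu$

Probability that the next sampled allele is new, given that i alleles have already been sampled:

$\frac{\theta}{\theta + i}$

Probability of sampling a new allele on the first sample:

$\frac{\theta}{\theta + 0} = 1$

Probability of observing a new allele after sampling one allele:

$\frac{\theta}{\theta + 1} = H_e$

Probability of sampling a new allele on the third and fourth samples:

$\frac{\theta}{\theta + 2}$ and $\frac{\theta}{\theta + 3}$

Expected number of different alleles (k) in a sample of 2N alleles is:

$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$

Example: Expected number of alleles in a sample of 4:

$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$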
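The expected number of alleles is the sum of the successive "new allele" probabilities, one term per sampled gene copy. The sketch below computes that sum directly; the function name and the θ values tried are illustrative assumptions.

```python
def expected_num_alleles(theta, sample_size):
    """Expected number of distinct alleles in a sample of `sample_size` gene copies.

    Sums theta / (theta + i) for i = 0 .. sample_size - 1; theta must be > 0.
    """
    return sum(theta / (theta + i) for i in range(sample_size))


if __name__ == "__main__":
    # Sample of 4 gene copies, as in the example, for a few hypothetical thetas.
    for theta in (0.1, 1.0, 10.0):
        print(theta, round(expected_num_alleles(theta, 4), 3))
```

For a sample of 4 the function adds exactly the four terms of the example: 1, θ/(θ+1), θ/(θ+2), and θ/(θ+3).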

Ewens Sampling Formula

Predicts the number of different alleles that should be observed in a given sample size if neutrality prevails under the Infinite Alleles Model

Small θ: E(n) approaches 1

Large θ: E(n) approaches 2N

θ can be predicted from the number of observed alleles for a given sample size

Can also predict expected homozygosity (fe) under this model

$E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$

where E(n) is the expected number of different alleles in a sample of N diploid individuals, and $\theta = 4N_e\mu$.

$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$
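Because the expected number of alleles increases steadily with θ, θ can be estimated by finding the value at which the expectation equals the observed allele count. The sketch below does this by bisection and then converts the estimate to an expected homozygosity with fe = 1/(θ + 1). The function names and search bounds are assumptions; the 15 alleles in 89 chromosomes come from the Ewens-Watterson example in this lecture.

```python
import math


def expected_num_alleles(theta, sample_size):
    """Sum of theta / (theta + i) for i = 0 .. sample_size - 1 (theta > 0)."""
    return sum(theta / (theta + i) for i in range(sample_size))


def estimate_theta(k_observed, sample_size, lo=1e-9, hi=1e9, iterations=200):
    """Find theta such that the expected allele number equals k_observed.

    Uses bisection on a log scale; requires 1 < k_observed < sample_size.
    """
    if not 1 < k_observed < sample_size:
        raise ValueError("k_observed must lie strictly between 1 and sample_size")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if expected_num_alleles(mid, sample_size) < k_observed:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def expected_homozygosity(theta):
    """Equilibrium expected homozygosity, 1 / (theta + 1)."""
    return 1 / (theta + 1)


if __name__ == "__main__":
    theta_hat = estimate_theta(15, 89)
    print("theta estimate:", round(theta_hat, 3))
    print("expected homozygosity:", round(expected_homozygosity(theta_hat), 3))
```

This population-level value is only a rough guide. The test itself relies on simulated or tabulated values for the given sample size and allele number.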

Ewens-Watterson Test

Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies

Comparison of allele frequency distributions

fe comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers

$f_{HW} = \sum_i p_i^2$, where p_i is the observed frequency of allele i

Ewens-Watterson Test Example

Drosophila pseudoobscura collected from winery

Xanthine dehydrogenase alleles

15 alleles observed in 89 chromosomes

fHW = 0.366

Generated fe by simulation: mean 0.168

[Figure: distribution of simulated fe values. Hartl and Clark 2007]

How would you interpret this result?
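One way to generate a null distribution of fe is to simulate samples under the infinite alleles model and keep only those with the observed number of alleles. The sketch below is a simple rejection sampler built on a Hoppe urn. It is an illustrative scheme under stated assumptions, not necessarily the procedure behind the published tables or the figure from Hartl and Clark. It relies on the fact that, conditional on the number of alleles, the neutral sampling distribution does not depend on θ, so θ is chosen only to make acceptance likely. All function names, the replicate count, and the seed are hypothetical.

```python
import math
import random


def expected_num_alleles(theta, n):
    return sum(theta / (theta + i) for i in range(n))


def theta_matching_k(k, n, lo=1e-9, hi=1e9, iterations=200):
    """Theta at which the expected allele number equals k (log-scale bisection)."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if expected_num_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def hoppe_urn_counts(theta, n, rng):
    """Draw one neutral sample of n gene copies; return the count of each allele."""
    labels = []
    n_alleles = 0
    for i in range(n):
        if rng.random() < theta / (theta + i):
            labels.append(n_alleles)  # new allele
            n_alleles += 1
        else:
            labels.append(labels[rng.randrange(i)])  # copy an earlier gene copy
    counts = [0] * n_alleles
    for label in labels:
        counts[label] += 1
    return counts


def simulate_homozygosity(k, n, reps, seed=0):
    """Simulated homozygosity values for neutral samples with exactly k alleles."""
    rng = random.Random(seed)
    theta = theta_matching_k(k, n)
    values = []
    while len(values) < reps:
        counts = hoppe_urn_counts(theta, n, rng)
        if len(counts) == k:  # keep only samples with the observed allele number
            values.append(sum((c / n) ** 2 for c in counts))
    return values


if __name__ == "__main__":
    sims = simulate_homozygosity(k=15, n=89, reps=1000, seed=1)
    observed = 0.366
    print("mean simulated fe:", sum(sims) / len(sims))
    print("fraction of simulations >= observed:",
          sum(f >= observed for f in sims) / len(sims))
```

The last printed value plays the role of a one-sided p-value for excess homozygosity.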

Most Loci Look Neutral According to Ewens-Watterson Test

[Figure: expected homozygosity (fe). Hartl and Clark 2007]

DNA Sequence Polymorphisms

DNA sequence is the ultimate view of standing genetic variation: no hidden alleles

Is this really true?

What about back mutation?

Signatures of past evolution are contained in DNA sequence

Neutral theory presents null model

Departures due to:

Selection

Demographic events

- Bottlenecks, founder effects

- Population admixture

Sequence Alignment

Necessary first step for comparing sequences within and between species

Many different algorithms

Tradeoff of speed and accuracy

Quantifying Divergence of Sequences

Nucleotide diversity (π) is the average number of pairwise differences between sequences

$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$

where

N is the number of sequences in the sample,

p_i and p_j are the frequencies of sequences i and j in the sample,

and

π_ij is the proportion of sites that differ between sequences i and j

Sample Calculation of π

[Figure: alignment of three sequences (A, B, C), 35 sites long]

A->B, 1 difference

A->C, 1 difference

B->C, 2 differences

$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$

$\pi = \frac{3}{2}\left[(0.33)(0.33)(1/35) + (0.33)(0.33)(1/35) + (0.33)(0.33)(2/35)\right] = 0.01867$

On average, there are 18.67 polymorphisms per kb between pairs of haplotypes in the population
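The arithmetic above can be reproduced with a few lines of code. This sketch follows the slide's calculation as written: each of the three pairs (A-B, A-C, B-C) is counted once, and each frequency is rounded to 0.33. The function and variable names are assumptions. The second value computed shows the alternative convention in which the sum runs over ordered pairs (i, j) and (j, i), which doubles the result.

```python
from itertools import combinations


def nucleotide_diversity(freqs, pairwise_props, n_sequences):
    """N/(N-1) times the sum over unordered pairs of p_i * p_j * pi_ij.

    freqs: mapping of sequence name to frequency.
    pairwise_props: mapping of (name_i, name_j), names in sorted order, to the
        proportion of sites that differ between the two sequences.
    """
    total = sum(
        freqs[i] * freqs[j] * pairwise_props[(i, j)]
        for i, j in combinations(sorted(freqs), 2)
    )
    return n_sequences / (n_sequences - 1) * total


m = 35  # alignment length in sites
freqs = {"A": 0.33, "B": 0.33, "C": 0.33}  # rounded as on the slide
pairwise = {("A", "B"): 1 / m, ("A", "C"): 1 / m, ("B", "C"): 2 / m}

pi_unordered = nucleotide_diversity(freqs, pairwise, 3)
pi_ordered = 2 * pi_unordered  # ordered-pair convention

if __name__ == "__main__":
    print(round(pi_unordered, 5), round(pi_ordered, 5))
```

Which convention a given formula intends is worth checking before comparing π values across sources.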

Tajima’s D Statistic

Infinite Sites Model: each new mutation affects a new site in a sequence

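Expected number of polymorphic sites in all sequences:

$E(S) = a_1\theta$, with $a_1 = \sum_{i=1}^{n-1} \frac{1}{i}$

where n is the number of different sequences compared

$E(m\pi) = \theta$

where m is the length of the sequence, and $\theta = 4N_e\mu$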

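Sample Calculation of S

[Figure: alignment of three sequences (A, B, C), 35 sites long]

Two polymorphic sites: S = 2

$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = \frac{1}{1} + \frac{1}{2} = 1.5$

$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$

$\pi = 0.01867$

$\theta_\pi = m\pi = (35)(0.01867) = 0.65$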
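The counting steps in this calculation can be sketched in code. The slide does not give the sequences themselves, so the three 35-site sequences below are hypothetical, constructed only so that A and B differ at one site, A and C at one site, and B and C at two. The function names are likewise assumptions.

```python
def mutate(seq, position, new_base):
    """Return a copy of seq with the base at `position` replaced."""
    assert seq[position] != new_base
    return seq[:position] + new_base + seq[position + 1:]


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def harmonic_a1(n):
    """a1 = sum of 1/i for i = 1 .. n - 1, where n is the number of sequences."""
    return sum(1 / i for i in range(1, n))


def pairwise_differences(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)


# Hypothetical sequences matching the pattern of differences on the slide.
seq_a = ("ACGT" * 9)[:35]
seq_b = mutate(seq_a, 9, "T")
seq_c = mutate(seq_a, 24, "G")
seqs = [seq_a, seq_b, seq_c]

s = segregating_sites(seqs)
a1 = harmonic_a1(len(seqs))
theta_s = s / a1

if __name__ == "__main__":
    print("S =", s, "a1 =", a1, "theta_S =", round(theta_s, 2))
```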

Tajima’s D Statistic

Two different ways of estimating same parameter:

$\theta_\pi = m\pi$

$\theta_S = \frac{S}{a_1}$

Deviation of these two indicates deviation from neutral expectations

$d = \theta_\pi - \theta_S$

$D = \frac{d}{\sqrt{V(d)}}$

where V(d) is variance of d
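The slide does not spell out V(d). The sketch below assumes the usual estimate, V(d) = e1 S + e2 S(S - 1), with constants that depend only on the number of sequences n. Here `theta_pi` is the mean number of pairwise differences per pair of sequences (mπ in the notation above), and the function name is hypothetical. The variance term is zero for n = 3, where every polymorphic site is necessarily a singleton, so the sketch requires at least four sequences.

```python
import math


def tajimas_d(n, s, theta_pi):
    """Tajima's D from n sequences, s segregating sites, and theta_pi.

    theta_pi is the mean number of pairwise differences per sequence pair.
    """
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance")
    if s < 1:
        raise ValueError("need at least one segregating site")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = theta_pi - s / a1               # d = theta_pi - theta_S
    variance = e1 * s + e2 * s * (s - 1)
    return d / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical inputs: 10 sequences, 16 segregating sites,
    # mean pairwise difference of 3.888.
    print(round(tajimas_d(10, 16, 3.888), 3))
```

A negative value means θ_π falls below θ_S (an excess of rare variants); a positive value means the reverse.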

Tajima’s D Expectations

D=0: Neutrality

D>0

Balancing Selection: Divergence of alleles (π) increases

OR

Bottleneck: S decreases

D<0

Purifying or Positive Selection: Divergence of alleles decreases

OR

Population expansion: Many low frequency alleles cause low average divergence

$d = \theta_\pi - \theta_S$

Balancing Selection

[Figure: balancing selection, showing a ‘balanced’ mutation and a neutral mutation. Slide adapted from Yoav Gilad]

Should increase nucleotide diversity (π)

Decreases polymorphic sites (S) initially

D>0

$d = \theta_\pi - \theta_S$

Recent Bottleneck

Rare alleles are lost

Polymorphic sites (S) more severely affected than nucleotide diversity (π)

D>0

[Figure: panel label: Standard neutral model]

$d = \theta_\pi - \theta_S$

Positive Selection and Purifying Selection

[Figure: sweep and recovery of S and π through time, showing an advantageous mutation and a neutral mutation; panel label: Standard neutral model]

Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially

S recovers due to mutation

π recovers slowly: insensitive to rare alleles

D<0

Often two main haplotypes, some rare alleles

$d = \theta_\pi - \theta_S$

Rapid Population Growth will also result in an excess of rare alleles even for neutral loci


[Figure: genealogy through time with a rapid population size increase]

$\theta = 4N_e\mu$

Most alleles are rare

Nucleotide diversity (π) depressed

Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large

D<0

$d = \theta_\pi - \theta_S$

How do we distinguish these two forms of divergence (selection vs demography)?
