Lecture 20: Tests of Neutrality
November 6, 2015
Last Time
Mutation and selection
Infinite alleles and stepwise mutation models
Introduction to neutral theory
Today
Sequence data and quantification of variation
Infinite sites model
Nucleotide diversity (π)
Sequence-based tests of neutrality
Ewens-Watterson Test
Tajima’s D
Hudson-Kreitman-Aguade
Synonymous versus Nonsynonymous substitutions
McDonald-Kreitman
The main power of neutral theory is that it provides a theoretical expectation for genetic variation in the absence of selection.
Equilibrium Heterozygosity under IAM
Frequencies of individual alleles are constantly changing
Balance between loss and gain is maintained
4Neμ>>1: mutation predominates, new mutants persist, H is high
4Neμ<<1: drift dominates: new mutants quickly eliminated, H is low
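The two limiting cases above can be illustrated numerically. This is a minimal Python sketch, assuming the standard infinite alleles equilibrium H = 4Neμ / (4Neμ + 1); the function name and the population sizes are hypothetical choices for illustration, with μ = 1e-5.

```python
def equilibrium_heterozygosity(ne, mu):
    """Equilibrium heterozygosity under the infinite alleles model (assumed form)."""
    theta = 4 * ne * mu
    return theta / (theta + 1)

MU = 1e-5  # illustrative mutation rate
for ne in (100, 10_000, 1_000_000):
    theta = 4 * ne * MU
    h = equilibrium_heterozygosity(ne, MU)
    print(f"Ne={ne:>9,}  4Ne*mu={theta:8.3f}  H={h:.4f}")
```

When 4Neμ is far below 1 the result is close to 4Neμ itself (drift dominates, H is low); when it is far above 1 the result approaches 1 (mutation predominates, H is high).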
Effects of Population Size on Expected Heterozygosity Under Infinite Alleles Model (μ = 10^-5)
Rapid approach to equilibrium in small populations
Higher heterozygosity with less drift
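The approach to equilibrium can be sketched with a simple recursion. This assumes the standard infinite alleles recursion for homozygosity, F(t+1) = (1 - μ)^2 [1/(2Ne) + (1 - 1/(2Ne)) F(t)], with H = 1 - F and a population that starts with no variation; the function names and the 1000-generation horizon are hypothetical.

```python
def heterozygosity_trajectory(ne, mu, generations, h0=0.0):
    """Iterate the assumed IAM recursion and return H for each generation."""
    f = 1.0 - h0  # homozygosity
    trajectory = [1.0 - f]
    for _ in range(generations):
        f = (1.0 - mu) ** 2 * (1.0 / (2 * ne) + (1.0 - 1.0 / (2 * ne)) * f)
        trajectory.append(1.0 - f)
    return trajectory

def fraction_of_equilibrium(ne, mu, generations):
    """H after the given number of generations, relative to theta / (theta + 1)."""
    theta = 4 * ne * mu
    h_final = heterozygosity_trajectory(ne, mu, generations)[-1]
    return h_final / (theta / (theta + 1))

for ne in (100, 10_000):
    frac = fraction_of_equilibrium(ne, 1e-5, 1000)
    print(f"Ne={ne}: fraction of equilibrium H reached after 1000 generations = {frac:.3f}")
```

Under these assumptions the small population is nearly at its (low) equilibrium after 1000 generations, while the large population is still far from its (higher) equilibrium.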
Fate of Alleles in Mutation-Drift Balance
Time to fixation of a new mutation is much longer than time to loss
Generations from birth to fixation
Time between fixation events
Fate of Alleles in Mutation-Drift-Selection Balance
Purifying Selection
Neutrality
Balancing Selection/Overdominance
Which case will have the most alleles on average at any given time?
What will this depend upon?
Highest HE?
Assume you take a sample of 100 alleles from a large (but finite) population in mutation-drift
equilibrium.
[Figure: three candidate histograms, panels A, B, and C. x-axis: Number of Observations of Allele (2 to 10); y-axis: Number of Alleles (2 to 10)]
What is the expected distribution of allele frequencies in your sample under neutrality and the Infinite Alleles
Model?
Allele Frequency Distributions
Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time
Comparison of observed to expected distribution provides evidence of departure from Infinite Alleles model
Depends on f, effective population size, and mutation rate
Hartl and Clark 2007
Black: Predicted from Neutral Theory
White: Observed (hypothetical)
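Ewens Sampling Formula
Population mutation rate: index of variability of population: $\theta = 4N_e\mu$
Probability the i-th sampled allele is new given i alleles already sampled: $\frac{\theta}{\theta + i}$
Probability of sampling a new allele on the first sample: $\frac{\theta}{\theta + 0} = 1$
Probability of observing a new allele after sampling one allele: $\frac{\theta}{\theta + 1} = H_e$
Probability of sampling a new allele on the third and fourth samples: $\frac{\theta}{\theta + 2}$ and $\frac{\theta}{\theta + 3}$
Expected number of different alleles (k) in a sample of 2N alleles is:
$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$
Example: Expected number of alleles in a sample of 4:
$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$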
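The expected number of alleles is a direct sum over the per-draw probabilities of sampling a new allele. A short Python sketch, assuming θ = 4Neμ; the function name and the value θ = 1 are hypothetical choices for the sample-of-4 example.

```python
def expected_alleles(theta, n_genes):
    """E(k): sum of theta / (theta + i) for i = 0 .. n_genes - 1."""
    return sum(theta / (theta + i) for i in range(n_genes))

# Sample of 4 gene copies with an illustrative theta of 1:
# 1 + 1/2 + 1/3 + 1/4
print(round(expected_alleles(1.0, 4), 4))
```

The first term is always 1, because the first gene copy sampled is necessarily a new allele.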
Ewens Sampling Formula
Predicts number of different alleles that should be observed in a given sample size if neutrality prevails under Infinite Alleles Model
Small θ, E(n) approaches 1
Large θ, E(n) approaches 2N
θ can be predicted from number of observed alleles for given sample size
Can also predict expected homozygosity (fe) under this model
$E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$
where E(n) is the expected number of different alleles in a sample of N diploid individuals, and $\theta = 4N_e\mu$.
$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$
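Because the expected number of alleles increases steadily with θ, θ can be recovered from an observed allele count by a numerical search. The sketch below uses bisection, which is one possible approach and not necessarily the one used in the lecture; the function names, search bounds, and the example of 5 alleles in 20 gene copies are hypothetical.

```python
def expected_alleles(theta, n_genes):
    """Expected number of different alleles in a sample of n_genes gene copies."""
    return sum(theta / (theta + i) for i in range(n_genes))

def estimate_theta(k_obs, n_genes, lo=1e-9, hi=1e6):
    """Find theta such that expected_alleles(theta, n_genes) equals k_obs."""
    if not 1 < k_obs < n_genes:
        raise ValueError("k_obs must lie strictly between 1 and n_genes")
    for _ in range(200):  # bisection; the interval halves each step
        mid = (lo + hi) / 2
        if expected_alleles(mid, n_genes) < k_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def expected_homozygosity(theta):
    """f_e = 1 / (theta + 1)."""
    return 1.0 / (theta + 1.0)

theta_hat = estimate_theta(k_obs=5, n_genes=20)  # illustrative values
print(f"theta = {theta_hat:.3f}, f_e = {expected_homozygosity(theta_hat):.3f}")
```

The limits on the slide can be checked the same way: a very small θ gives an expected count near 1, and a very large θ gives a count near the number of gene copies sampled.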
Ewens-Watterson Test
Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies
Comparison of allele frequency distributions
fe comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers
$f_{HW} = \sum_i p_i^2$
Ewens-Watterson Test Example
Drosophila pseudoobscura collected from winery
Xanthine dehydrogenase alleles
15 alleles observed in 89 chromosomes
fHW = 0.366
Generated fe by simulation: mean 0.168
[Figure: distribution of fe. Hartl and Clark 2007]
How would you interpret this result?
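One way to generate a null distribution of homozygosity for this example is to simulate neutral samples and keep only those with the observed number of alleles. The sketch below is an assumption-laden illustration and not the procedure used by Hartl and Clark: it draws samples with a Hoppe urn (each new gene copy is a new allele with probability θ/(θ + i)), uses rejection sampling on k, and sets θ = 4.9 purely so that samples with 15 alleles are reasonably common. All function names, the seed, and the number of replicates are hypothetical.

```python
import random

def hoppe_urn_counts(n, theta, rng):
    """Allele counts for n gene copies drawn under the infinite alleles model."""
    genes = []   # allele label carried by each sampled gene copy
    counts = []  # counts[a] = number of copies of allele a
    for i in range(n):
        if rng.random() < theta / (theta + i):
            counts.append(1)                   # new allele
            genes.append(len(counts) - 1)
        else:
            allele = genes[rng.randrange(i)]   # copy a previously sampled gene
            counts[allele] += 1
            genes.append(allele)
    return counts

def simulate_homozygosity(n, k, theta, reps, seed=1, max_draws=1_000_000):
    """Homozygosity (sum of squared frequencies) for neutral samples with exactly k alleles."""
    rng = random.Random(seed)
    values = []
    for _ in range(max_draws):
        counts = hoppe_urn_counts(n, theta, rng)
        if len(counts) == k:                   # rejection step: condition on k
            values.append(sum((c / n) ** 2 for c in counts))
            if len(values) == reps:
                break
    return values

sims = simulate_homozygosity(n=89, k=15, theta=4.9, reps=300)
mean_f = sum(sims) / len(sims)
tail = sum(f >= 0.366 for f in sims) / len(sims)
print(f"mean simulated homozygosity = {mean_f:.3f}")
print(f"fraction of simulations with homozygosity >= 0.366: {tail:.3f}")
```

The fraction of simulated values at or above the observed 0.366 gives a rough sense of how unusual the observed homozygosity is under neutrality.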
Most Loci Look Neutral According to Ewens-Watterson Test
[Figure: y-axis: Expected Homozygosity (fe). Hartl and Clark 2007]
DNA Sequence Polymorphisms
DNA sequence is ultimate view of standing genetic variation: no hidden alleles
Is this really true?
What about back mutation?
Signatures of past evolution are contained in DNA sequence
Neutral theory presents null model
Departures due to:
Selection
Demographic events
- Bottlenecks, founder effects
- Population admixture
Sequence Alignment
Necessary first step for comparing sequences within and between species
Many different algorithms
Tradeoff of speed and accuracy
Quantifying Divergence of Sequences
Nucleotide diversity (π) is average number of pairwise differences between sequences
$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$
where
N is number of sequences in sample,
$p_i$ and $p_j$ are frequency of sequences i and j in the sample,
and
$\pi_{ij}$ is the proportion of sites that differ between sequences i and j
Sample Calculation of π
A->B, 1 difference
A->C, 1 difference
B->C, 2 differences
[Figure: alignment of sequences A, B, and C over 35 sites]
$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij} = \frac{3}{2}\left[(0.33)(0.33)(1/35) + (0.33)(0.33)(1/35) + (0.33)(0.33)(2/35)\right] = 0.01867$
On average, there are 18.67 polymorphisms per kb between pairs of haplotypes in the population
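The arithmetic of the sample calculation can be reproduced in a few lines. This sketch makes one assumption explicit: the sum is taken once over each unordered pair of sequences (A-B, A-C, B-C), as in the worked calculation; a convention that sums over ordered pairs would give twice this value. The function name and the data layout are hypothetical.

```python
def nucleotide_diversity(freqs, pair_diffs, n_seqs):
    """N/(N-1) * sum over listed pairs of p_i * p_j * pi_ij.

    freqs: dict mapping sequence name to its frequency in the sample
    pair_diffs: dict mapping (name_i, name_j) to the proportion of differing sites;
                each unordered pair is listed once (assumption, see above)
    """
    total = sum(freqs[i] * freqs[j] * d for (i, j), d in pair_diffs.items())
    return n_seqs / (n_seqs - 1) * total

freqs = {"A": 0.33, "B": 0.33, "C": 0.33}   # frequencies as rounded on the slide
length = 35                                 # sites in the alignment
pair_diffs = {("A", "B"): 1 / length,
              ("A", "C"): 1 / length,
              ("B", "C"): 2 / length}

pi = nucleotide_diversity(freqs, pair_diffs, n_seqs=3)
print(round(pi, 5))           # per-site value
print(round(pi * 1000, 2))    # expressed per kb
```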
Tajima’s D Statistic
Infinite Sites Model: each new mutation affects a new site in a sequence
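Expected number of polymorphic sites in all sequences:
$E(S) = a_1\theta$, with $a_1 = \sum_{i=1}^{n-1} \frac{1}{i}$
where n is number of different sequences compared and $\theta = 4N_e\mu$
Expected nucleotide diversity:
$E(\pi) = \frac{\theta}{m}$
where m is length of sequence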
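Sample Calculation of S
Two polymorphic sites
S = 2
[Figure: alignment of sequences A, B, and C over 35 sites]
$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = \frac{1}{1} + \frac{1}{2} = 1.5$
$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$
$\pi = 0.01867$
$\theta_\pi = m\pi = (35)(0.01867) = 0.65$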
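The two estimates in this sample calculation can be recomputed from the slide's inputs (n = 3 sequences, S = 2, m = 35 sites, π = 0.01867). A minimal Python sketch; the function and variable names are hypothetical, and the inputs are taken from the slide as given.

```python
def harmonic_a1(n):
    """a1 = sum of 1/i for i = 1 .. n - 1, where n is the number of sequences."""
    return sum(1.0 / i for i in range(1, n))

n, S, m, pi = 3, 2, 35, 0.01867   # values from the sample calculation

a1 = harmonic_a1(n)
theta_s = S / a1        # estimate based on the number of polymorphic sites
theta_pi = m * pi       # estimate based on nucleotide diversity
difference = theta_pi - theta_s

print(f"a1 = {a1:.2f}, theta_S = {theta_s:.2f}, theta_pi = {theta_pi:.2f}")
print(f"theta_pi - theta_S = {difference:.2f}")
```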
Tajima’s D Statistic
Two different ways of estimating same parameter:
$\theta_\pi = m\pi$
$\theta_S = \frac{S}{a_1}$
Deviation of these two indicates deviation from neutral expectations
$d = \theta_\pi - \theta_S$
$D = \frac{d}{\sqrt{V(d)}}$
where V(d) is variance of d
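The slide does not spell out V(d). The sketch below assumes the variance estimator from Tajima (1989), V(d) = e1 S + e2 S(S - 1), with constants that depend only on the number of sequences n. The function name and the example inputs (10 sequences, 10 polymorphic sites, θπ of 2.0) are hypothetical and chosen only for illustration.

```python
import math

def tajimas_d(theta_pi, s, n):
    """Tajima's D from theta_pi (per-sequence), polymorphic sites s, and n sequences.

    Assumes Tajima's (1989) variance estimator; requires n >= 4 because both
    variance constants are zero when n = 3.
    """
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 polymorphic site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = theta_pi - s / a1                    # difference between the two estimates
    var_d = e1 * s + e2 * s * (s - 1)        # estimated variance of d
    return d / math.sqrt(var_d)

print(round(tajimas_d(theta_pi=2.0, s=10, n=10), 3))   # theta_pi below S/a1: negative D
print(round(tajimas_d(theta_pi=5.0, s=10, n=10), 3))   # theta_pi above S/a1: positive D
```

When the two estimates agree, d is zero and so is D, whatever the variance.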
Tajima’s D Expectations
D=0: Neutrality
D>0
Balancing Selection: Divergence of alleles (π) increases
OR
Bottleneck: S decreases
D<0
Purifying or Positive Selection: Divergence of alleles decreases
OR
Population expansion: Many low frequency alleles cause low average divergence
$d = \theta_\pi - \theta_S$
Balancing Selection
[Figure: balancing selection; labels: ‘balanced’ mutation, neutral mutation. Slide adapted from Yoav Gilad]
Should increase nucleotide diversity (π)
Decreases polymorphic sites (S) initially
D>0
$d = \theta_\pi - \theta_S$
Recent Bottleneck
Rare alleles are lost
Polymorphic sites (S) more severely affected than nucleotide diversity (π)
D>0
Standard neutral model
$d = \theta_\pi - \theta_S$
Positive Selection and Purifying Selection
[Figure: selective sweep and recovery over time; labels: advantageous mutation, neutral mutation, sweep, recovery, S, Time. Slide adapted from Yoav Gilad]
Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially
S recovers due to mutation
π recovers slowly: insensitive to rare alleles
D<0
$d = \theta_\pi - \theta_S$
Standard neutral model
Often two main haplotypes, some rare alleles
Rapid Population Growth will also result in an excess of rare alleles even for neutral loci
[Figure: rapid population size increase over time. Slide adapted from Yoav Gilad]
$\theta = 4N_e\mu$
Most alleles are rare
Nucleotide diversity (π) depressed
Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large
D<0
$d = \theta_\pi - \theta_S$
How do we distinguish these two forms of divergence (selection vs demography)?