Non-stationary population genetic models with selection: Theory and Inference
Scott Williamson
and Carlos Bustamante
Cornell University
Inferring natural selection from samples
• Statistical tests of the neutral theory (lots)
• Methods for detecting selective sweeps (lots)
• Parametric inference: estimating selection parameters, etc.
• Quantification of selective constraint, deleterious mutation
The demography problem
• Many existing methods assume random mating, constant population size
• These assumptions don’t apply in most natural populations
• The effect of demography can mimic the effect of natural selection
Natural selection and population growth
• Inferring selection from the frequency spectrum while correcting for demography
• The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?
The frequency spectrum: an example
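Site:             163   975  1972  2188  3529  4424  4961  5286  7019
Sequence 1:         A     G     G     C     T     T     A     A     A
Sequence 2:         A     T     G     C     T     C     G     A     A
Sequence 3:         G     T     G     T     T     C     A     C     G
Sequence 4:         A     G     G     C     T     C     A     A     G
Sequence 5:         A     G     A     C     C     C     G     A     A
Frequency class:    1     2     1     1     1     4     2     1     3
(Each allele is marked as ancestral or derived on the slide; the frequency class of a site is its number of derived copies.)
[Figure: "The frequency spectrum". Count of sites vs. frequency class (1-4).]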
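The example alignment can be turned into its frequency spectrum in a few lines. This is a minimal sketch: the ancestral states are not recoverable from the transcript, so the `ancestral` string below is an assumption, chosen so that the derived-allele counts reproduce the frequency classes listed on the slide. The function names are illustrative.

```python
from collections import Counter

# The five sequences from the slide, one character per segregating site.
sequences = [
    "AGGCTTAAA",
    "ATGCTCGAA",
    "GTGTTCACG",
    "AGGCTCAAG",
    "AGACCCGAA",
]

# ASSUMPTION: ancestral state at each site, picked to match the slide's classes.
ancestral = "AGGCTTAAG"

def frequency_classes(seqs, anc):
    """Number of derived (non-ancestral) copies at each site."""
    return [sum(s[k] != anc[k] for s in seqs) for k in range(len(anc))]

def frequency_spectrum(classes):
    """Count of sites in each frequency class."""
    return dict(sorted(Counter(classes).items()))

classes = frequency_classes(sequences, ancestral)
spectrum = frequency_spectrum(classes)
print(classes)
print(spectrum)
```

With these inputs, five sites fall in class 1, two in class 2, and one each in classes 3 and 4.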
Natural selection and the frequency spectrum
[Figure: equilibrium neutral and positively selected frequency spectra. Count vs. frequency class (1-9); series: Neutral, 2Ns = 2.]
Natural selection and the frequency spectrum
[Figure: equilibrium neutral and negatively selected frequency spectra. Count vs. frequency class (1-9); series: Neutral, 2Ns = -2.]
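The equilibrium spectra compared on these slides can be approximated numerically. The sketch below assumes the standard Poisson Random Field stationary density, written per unit mutation rate, and integrates it against the binomial sampling probability with a plain midpoint rule. Here `gamma` stands for the slides' 2Ns, the function names are illustrative, and n = 10 is an assumption made to match the nine frequency classes on the axis.

```python
import math

def equilibrium_density(q, gamma):
    """Assumed stationary density of derived-allele frequency q (per unit mutation rate)."""
    if abs(gamma) < 1e-12:
        return 1.0 / q  # neutral limit
    num = 1.0 - math.exp(-2.0 * gamma * (1.0 - q))
    den = (1.0 - math.exp(-2.0 * gamma)) * q * (1.0 - q)
    return num / den

def expected_spectrum(n, gamma, steps=2000):
    """Normalized expected frequency spectrum for a sample of size n."""
    h = 1.0 / steps
    expected = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            q = (k + 0.5) * h  # midpoint rule avoids the endpoints q = 0 and q = 1
            total += math.comb(n, i) * q**i * (1.0 - q)**(n - i) * equilibrium_density(q, gamma)
        expected.append(total * h)
    norm = sum(expected)
    return [e / norm for e in expected]

neutral = expected_spectrum(10, 0.0)
positive = expected_spectrum(10, 2.0)
negative = expected_spectrum(10, -2.0)
print([round(p, 3) for p in neutral])
print([round(p, 3) for p in positive])
print([round(p, 3) for p in negative])
```

Relative to the neutral spectrum, positive selection moves weight toward high-frequency classes and negative selection moves weight toward singletons.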
Natural selection vs. demography
[Figure: non-stationary neutral and equilibrium selected frequency spectra. Count vs. frequency class (1-9); series: Population growth, neutral; Equilibrium, 2Ns = -2.]
How do we distinguish selection from demography?
McDonald-Kreitman approach:
• Use a priori information to classify changes as “neutral” (e.g. synonymous, non-coding) or “potentially selected” (e.g. non-synonymous)
• Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample
• Potentially selected sites are compared to the neutral standard
Can we develop a neutral standard for the frequency spectrum?
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra. Count vs. frequency class (1-9); series: Putatively neutral, Potentially selected.]
This talk:
• Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites
• Biologically meaningful measure of the difference between the two spectra
A model-based approach:
1. Fit a neutral demographic model to estimate demographic parameters
2. Given those parameter estimates, fit a selective demographic model to estimate selection parameters, test hypotheses
Requirements:
1. Demographic model
2. Frequency spectrum predictions from the model under neutrality
3. Frequency spectrum predictions from the model subject to natural selection
Theory: population growth model
[Figure: 2-epoch model. Population size vs. time: ancestral size N_A, then current size N_C up to now.]
ν = N_A/N_C
Model parameters: τ, ν
Theory: predicting the frequency spectrum
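Definitions:
x_i  Number of sites in frequency class i
n  Sample size
f(q,t;Θ)  Distribution of allele frequency, q, at time t, under model parameters Θ
Predictions:
\[ E[x_i;\Theta] = \int_0^1 \binom{n}{i} q^i (1-q)^{n-i} f(q,t;\Theta)\,dq \]
\[ P(i,n;\Theta) = \frac{E[x_i;\Theta]}{\sum_{j=1}^{n-1} E[x_j;\Theta]} \]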
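As a quick check on these predictions, consider the equilibrium neutral case. Assuming the standard stationary density f(q) proportional to 1/q (an assumption here; it is not stated on this slide), the sampling integral reduces to a Beta function:

\[ E[x_i] \propto \binom{n}{i} \int_0^1 q^{i-1} (1-q)^{n-i}\,dq = \binom{n}{i} \frac{(i-1)!\,(n-i)!}{n!} = \frac{1}{i}, \qquad P(i,n) = \frac{1/i}{\sum_{j=1}^{n-1} 1/j} \]

So under this assumption singletons are the most common class and the expected count falls off as 1/i. The mutation-rate factor cancels in P(i,n).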
Theory: the distribution of allele frequency
Poisson Random Field approach (Sawyer and Hartl 1992):
• Use single-locus diffusion theory to predict the distribution of allele-frequency
• If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites
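To get f, we need to solve the diffusion equation:
\[ \frac{\partial}{\partial t} f(q,t;\Theta) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[V(q;\Theta)\, f(q,t;\Theta)\right] - \frac{\partial}{\partial q}\left[M(q;\Theta)\, f(q,t;\Theta)\right] \]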
Theory: time-dependent solution, neutral case
The forward equation under neutrality:
\[ \frac{\partial}{\partial t} f(q,t) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[q(1-q)\, f(q,t)\right] \]
Kimura’s (1964) solution, given some initial allele frequency, p:
\[ \phi(q,t \mid p) = \sum_{i=1}^{\infty} \frac{(2i+1)\left(1-(1-2p)^2\right)}{i(i+1)}\, C_{i-1}^{3/2}(1-2p)\, C_{i-1}^{3/2}(1-2q)\, e^{-\frac{1}{2} i(i+1) t} \]
where C_{i-1}^{3/2} is a Gegenbauer polynomial
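Kimura's series can be evaluated directly once the Gegenbauer polynomials are available. The sketch below assumes the series has coefficient (2i+1)(1-(1-2p)^2)/(i(i+1)), Gegenbauer polynomials of order 3/2, and decay rate i(i+1)/2 with time in units of 2N generations. The function names and the truncation at 60 terms are illustrative choices; the series converges slowly for very small t.

```python
import math

def gegenbauer_3_2(k, x):
    """Gegenbauer polynomial C_k^(3/2)(x) via its three-term recurrence."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, 3.0 * x
    for m in range(2, k + 1):
        # m*C_m = (2m+1)*x*C_{m-1} - (m+1)*C_{m-2} for order 3/2
        prev, curr = curr, ((2 * m + 1) * x * curr - (m + 1) * prev) / m
    return curr

def kimura_phi(q, t, p, terms=60):
    """Truncated series for the density of frequency q at time t, starting from p."""
    total = 0.0
    for i in range(1, terms + 1):
        coeff = (2 * i + 1) * (1.0 - (1.0 - 2.0 * p) ** 2) / (i * (i + 1))
        total += (coeff
                  * gegenbauer_3_2(i - 1, 1.0 - 2.0 * p)
                  * gegenbauer_3_2(i - 1, 1.0 - 2.0 * q)
                  * math.exp(-0.5 * i * (i + 1) * t))
    return total

# At large t only the first term matters: phi is close to 6 p (1 - p) exp(-t).
value = kimura_phi(0.4, 5.0, 0.3)
leading = 6.0 * 0.3 * 0.7 * math.exp(-5.0)
print(value, leading)
```

The comparison with the leading term is a convenient sanity check, since the density becomes flat in q and decays as exp(-t) once the higher modes have died out.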
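Applying Kimura’s solution to the 2-epoch model
Distribution of allele frequency:
\[ f(q;\tau,\nu) = f_A(q;\tau,\nu) + f_C(q;\tau,\nu) \]
Ancestral mutations:
\[ f_A(q;\tau,\nu) = \int_0^1 \phi(q,\tau \mid p)\,\frac{\nu}{p}\,dp \]
Mutations arising since the size change:
\[ f_C(q;\tau,\nu) = N_C \int_0^{\tau} \phi\!\left(q,t \,\Big|\, \tfrac{1}{2N_C}\right) dt \]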
Theory: time-dependent solution, neutral case
[Figure: expected frequency spectrum after a change in population size (ν = 0.01). P(i, n; τ, 0.01) vs. frequency class (1-9).]
Theory: time-dependent solution, neutral case
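Multinomial likelihood:
\[ \ell(\tau,\nu \mid \mathbf{x}) = \ln\left[\left(\sum_{i=1}^{n-1} x_i\right)!\right] - \sum_{i=1}^{n-1} \ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i,n;\tau,\nu) \]
Maximum likelihood estimates of τ and ν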
Likelihood ratio test of population growth
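The multinomial log-likelihood is simple to evaluate once the class probabilities are in hand. This is a minimal sketch: the counts are hypothetical, and the equilibrium neutral probabilities (proportional to 1/i) stand in for P(i, n; τ, ν), which would come from the 2-epoch model in the actual analysis. The function names are illustrative.

```python
import math

def multinomial_loglik(counts, probs):
    """ln[(sum x_i)!] - sum ln(x_i!) + sum x_i ln P_i."""
    total = sum(counts)
    loglik = math.lgamma(total + 1)          # ln(total!)
    for x, p in zip(counts, probs):
        loglik -= math.lgamma(x + 1)         # ln(x_i!)
        if x > 0:
            loglik += x * math.log(p)
    return loglik

def neutral_equilibrium_probs(n):
    """Stand-in class probabilities, proportional to 1/i for i = 1..n-1."""
    weights = [1.0 / i for i in range(1, n)]
    norm = sum(weights)
    return [w / norm for w in weights]

# HYPOTHETICAL counts for a sample of n = 6 (five frequency classes).
counts = [12, 5, 4, 2, 2]
probs = neutral_equilibrium_probs(6)
print(multinomial_loglik(counts, probs))
```

Maximizing this function over the demographic parameters, with the probabilities recomputed at each step, gives the MLEs; twice the gain in log-likelihood over the equilibrium model is the test statistic for population growth.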
Theory: time-dependent solution, selected case
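The forward equation with selection:
\[ \frac{\partial}{\partial t} f_S(q,t;\gamma) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[q(1-q)\, f_S(q,t;\gamma)\right] - \gamma\,\frac{\partial}{\partial q}\left[q(1-q)\, f_S(q,t;\gamma)\right] \]
where γ = 2N_C s
Initial condition:
\[ f_S(q,0;\gamma) = \nu\,\frac{1-e^{-2\nu\gamma(1-q)}}{1-e^{-2\nu\gamma}}\,\frac{1}{q(1-q)} \]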
Theory: time-dependent solution, selected case
1. Numerically solve the forward equation using the Crank-Nicolson finite differencing scheme
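2. Use this approximation of f to evaluate the likelihood function:
\[ \ell(\gamma,\tau,\nu \mid \mathbf{x}) = \ln\left[\left(\sum_{i=1}^{n-1} x_i\right)!\right] - \sum_{i=1}^{n-1} \ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i,n;\gamma,\tau,\nu) \]
3. Fix τ and ν to their MLEs from the neutral data
4. Optimize the likelihood for γ. Likelihood ratio test of neutrality:
\[ \mathrm{LRT} = 2\left[\ell(\hat{\gamma},\hat{\tau},\hat{\nu} \mid \mathbf{x}) - \ell(0,\hat{\tau},\hat{\nu} \mid \mathbf{x})\right] \]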
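Step 1 can be illustrated with a compact Crank-Nicolson solver. This is a sketch under stated assumptions, not the authors' implementation: it discretizes the drift term as (1/2) d²/dq²[q(1-q) f] and the selection term as -γ d/dq[q(1-q) f] with central differences on a uniform grid, and it omits the inflow of new mutations, so it only transports an initial density forward in time. The function name, grid size, and time step are illustrative.

```python
import math

def crank_nicolson(f0, gamma, t_end, dt):
    """Advance f on interior grid nodes q_j = j*h (j = 1..m-1) from time 0 to t_end."""
    m = len(f0) + 1                      # number of grid intervals on [0, 1]
    h = 1.0 / m
    v = [(j * h) * (1.0 - j * h) for j in range(m + 1)]   # q(1-q), zero at both ends
    # Tridiagonal operator A: (A f)_j = lower*f[j-1] + diag*f[j] + upper*f[j+1]
    lower = [v[j - 1] * (0.5 / h**2 + gamma / (2 * h)) for j in range(1, m)]
    diag = [-v[j] / h**2 for j in range(1, m)]
    upper = [v[j + 1] * (0.5 / h**2 - gamma / (2 * h)) for j in range(1, m)]
    k = m - 1
    f = list(f0)
    for _ in range(round(t_end / dt)):
        # Right-hand side: (I + dt/2 A) f
        rhs = []
        for j in range(k):
            val = (1.0 + 0.5 * dt * diag[j]) * f[j]
            if j > 0:
                val += 0.5 * dt * lower[j] * f[j - 1]
            if j < k - 1:
                val += 0.5 * dt * upper[j] * f[j + 1]
            rhs.append(val)
        # Solve (I - dt/2 A) f_new = rhs with the Thomas algorithm
        a = [-0.5 * dt * x for x in lower]
        b = [1.0 - 0.5 * dt * x for x in diag]
        c = [-0.5 * dt * x for x in upper]
        cp = [0.0] * k
        dp = [0.0] * k
        cp[0] = c[0] / b[0]
        dp[0] = rhs[0] / b[0]
        for j in range(1, k):
            denom = b[j] - a[j] * cp[j - 1]
            cp[j] = c[j] / denom
            dp[j] = (rhs[j] - a[j] * dp[j - 1]) / denom
        f[k - 1] = dp[k - 1]
        for j in range(k - 2, -1, -1):
            f[j] = dp[j] - cp[j] * f[j + 1]
    return f

# Neutral check: a flat density is an eigenfunction and should decay as exp(-t).
f_end = crank_nicolson([1.0] * 99, gamma=0.0, t_end=1.0, dt=0.01)
print(f_end[49], math.exp(-1.0))
```

The flat-density check works because (1/2) d²/dq²[q(1-q)] = -1, so a uniform f is an exact eigenfunction of the neutral operator. It is the same kind of comparison with known neutral behavior that the talk uses to validate the numerical solution.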
Theory: time-dependent solution, selected case
How can we be sure that the numerical solution actually works?
• Von Neumann stability analysis: solution is unconditionally stable
• Numerical solution converges to the stationary distribution after ~4NC generations
• Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other
Human Polymorphism Data
• From Stephens et al. (2001)
• 80 individuals, geographically diverse ancestry
• 313 genes, 720 kb sequenced
• ~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)
Results for non-coding changes, assuming neutrality
Model                 MLEs                   ln(L)
2-epoch               τ = 0.016, ν = 0.13    -5674.6
Equilibrium neutral                          -6046.6 (P ≈ 0, d.f. 2)
Goodness-of-fit                              -5608.3 (P = 0.54, d.f. 76)
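The test of population growth follows directly from the two log-likelihoods in the table. The sketch below assumes the usual chi-square approximation with the 2 degrees of freedom reported on the slide; for 2 degrees of freedom the upper tail probability is exp(-x/2), so no statistics library is needed. Variable names are illustrative.

```python
import math

# Log-likelihoods reported on the slide
lnL_two_epoch = -5674.6
lnL_equilibrium = -6046.6

# Likelihood ratio statistic: twice the gain in log-likelihood
lrt = 2.0 * (lnL_two_epoch - lnL_equilibrium)

# ASSUMPTION: chi-square reference distribution with d.f. = 2,
# whose survival function is exp(-x / 2)
p_value = math.exp(-lrt / 2.0)

print(lrt, p_value)
```

The statistic is about 744, which is why the P-value for the equilibrium neutral model is effectively zero.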
Results for non-synonymous changes, categorized by Grantham’s distance
Category       S     γ       P-value
conservative   136   -2.24   0.52
moderate       137   -6.08   0.07
radical        107   -8.44   0.02
all nonsyn     380   -4.88   0.10
Ongoing work and future directions
1. Simulate, simulate, simulate
• How robust is the method to different types of demographic forces?
• How does linkage among some sites affect the analysis?
• How does estimation error affect the LRTs?
2. Numerical solution for different demographic scenarios (e.g. bottleneck, population structure)
3. Variable selective effects among new mutations
The McDonald-Kreitman test
S_n  Number of non-synonymous segregating sites
D_n  Number of non-synonymous fixed differences
S_s  Number of synonymous segregating sites
D_s  Number of synonymous fixed differences
\[ \frac{S_n}{D_n} < \frac{S_s}{D_s} \]  Adaptive evolution
\[ \frac{S_n}{D_n} > \frac{S_s}{D_s} \]  Negative selection
Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
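The direction of the McDonald-Kreitman comparison can be sketched as follows. The counts are hypothetical, the function name is illustrative, and the sketch only reports which way the ratios differ; a real analysis would also test the 2x2 table for significance.

```python
def mk_direction(sn, dn, ss, ds):
    """Compare S_n/D_n with S_s/D_s by cross-multiplication (avoids division by zero)."""
    lhs = sn * ds
    rhs = ss * dn
    if lhs < rhs:
        return "adaptive evolution"    # excess of non-synonymous fixed differences
    if lhs > rhs:
        return "negative selection"    # excess of non-synonymous polymorphism
    return "no departure"

# HYPOTHETICAL counts: (S_n, D_n, S_s, D_s)
print(mk_direction(10, 30, 40, 40))
print(mk_direction(30, 10, 40, 40))
```

Cross-multiplying keeps the comparison well defined even when one of the fixed-difference counts is zero.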
Demography and the McDonald-Kreitman test
• Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001)
• However, under some demographic scenarios it’s possible to misidentify the type of selection
• Weak negative selection with population growth
  When the population size is small, non-synonymous deleterious mutations might be fixed by drift.
  Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism).
\[ \frac{S_n}{D_n} < \frac{S_s}{D_s} \]
Demography and the McDonald-Kreitman test
• Over what range of parameter values might you misidentify negative selection as adaptive evolution?
• How large is the effect?
Eyre-Walker (2002):
• Addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection
• Assumed that levels of polymorphism and fixation rates changed instantaneously with population size
Demography and the McDonald-Kreitman test
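\[ E[S_n] = \theta_n \int_0^1 \left[1 - q^n - (1-q)^n\right] f(q;\gamma,\tau,\nu)\,dq \]
\[ E[S_s] = \theta_s \int_0^1 \left[1 - q^n - (1-q)^n\right] f(q;0,\tau,\nu)\,dq \]
[Equations for E[D_n] and E[D_s]: expected numbers of fixed differences, as functions of t_div and of f(q; γ, τ, ν) and f(q; 0, τ, ν), respectively.]
where t_div is the divergence time, measured in 2N_C generations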
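As a consistency check on the segregating-site expectation, take the equilibrium neutral case. Assuming the stationary neutral density 1/q per unit mutation rate (an assumption; the slide does not state it), the integral reduces to a harmonic sum:

\[ E[S_s] = \theta_s \int_0^1 \frac{1 - q^n - (1-q)^n}{q}\,dq = \theta_s \left( \sum_{i=1}^{n} \frac{1}{i} - \frac{1}{n} \right) = \theta_s \sum_{i=1}^{n-1} \frac{1}{i} \]

Under that assumption this is the familiar expectation for the number of segregating sites in a sample of size n at equilibrium, so departures from it in the non-stationary model reflect the demographic history.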
Demography and the McDonald-Kreitman test
\[ \mathrm{NI} = \frac{S_n D_s}{S_s D_n} \]
[Figure: expected neutrality index (NI, axis from 1 to 10) vs. ν = N_A/N_C (axis from 0.01 to 1), in four panels: τ = 1, tdiv = 4; τ = 1, tdiv = 10; τ = 0.1, tdiv = 4; τ = 0.1, tdiv = 10.]
Demography and the McDonald-Kreitman test: Preliminary results
1. It is possible to misidentify negative selection for some parameter combinations
2. But…the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect
Summary
1. Model-based approach to correcting for demography while inferring selection
• Evidence for very recent population growth in humans
• Reasonable estimates of selection parameters for classes of non-synonymous changes
2. McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought
3. Numerical methods for solving the diffusion are fast, accurate, and fun!
Acknowledgements
Collaborator: Carlos Bustamante
Data: Genaissance Pharmaceuticals
Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop