![Page 1: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/1.jpg)
Non-stationary population genetic models with selection: Theory and Inference
Scott Williamson
and Carlos Bustamante
Cornell University
![Page 2: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/2.jpg)
Inferring natural selection from samples
• Statistical tests of the neutral theory (lots)
• Methods for detecting selective sweeps (lots)
• Parametric inference: estimating selection parameters, etc.
• Quantification of selective constraint, deleterious mutation
![Page 3: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/3.jpg)
The demography problem
• Many existing methods assume random mating, constant population size
• These assumptions don’t apply in most natural populations
• The effect of demography can mimic the effect of natural selection
![Page 4: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/4.jpg)
Natural selection and population growth
• Inferring selection from the frequency spectrum while correcting for demography
• The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?
![Page 5: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/5.jpg)
The frequency spectrum: an example
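Site: 163, 975, 1972, 2188, 3529, 4424, 4961, 5286, 7019
Sequence 1: A G G C T T A A A
Sequence 2: A T G C T C G A A
Sequence 3: G T G T T C A C G
Sequence 4: A G G C T C A A G
Sequence 5: A G A C C C G A A
Frequency class: 1 2 1 1 1 4 2 1 3
(Each allele in the alignment is marked as ancestral or derived; the frequency class of a site is the number of sequences carrying the derived allele.)
[Figure: the frequency spectrum, a histogram of count (1 to 5) against frequency class (1 to 4).]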
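The tabulation in this example can be sketched in a few lines of Python. The ancestral states below are an assumption: they are inferred from the frequency classes listed for each site, since the ancestral/derived marking did not survive in the text. The function names are illustrative only.

```python
from collections import Counter

# Five sequences at the nine variable sites of the example.
sequences = [
    "AGGCTTAAA",
    "ATGCTCGAA",
    "GTGTTCACG",
    "AGGCTCAAG",
    "AGACCCGAA",
]
# Assumed ancestral allele at each site (inferred from the listed frequency classes).
ancestral = "AGGCTTAAG"

def derived_counts(seqs, anc):
    """Number of sequences carrying a non-ancestral allele at each site."""
    return [sum(1 for s in seqs if s[j] != anc[j]) for j in range(len(anc))]

def frequency_spectrum(counts, n):
    """x_i for i = 1..n-1: number of sites whose derived allele occurs i times."""
    tally = Counter(counts)
    return [tally.get(i, 0) for i in range(1, n)]

counts = derived_counts(sequences, ancestral)
spectrum = frequency_spectrum(counts, len(sequences))
print(counts)
print(spectrum)
```

The resulting list `spectrum` holds the counts for frequency classes 1 to 4, which is what the histogram plots.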
![Page 6: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/6.jpg)
Natural selection and the frequency spectrum
[Figure: equilibrium neutral and positively selected (2Ns = 2) frequency spectra; count against frequency class (1 to 9).]
![Page 7: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/7.jpg)
Natural selection and the frequency spectrum
[Figure: equilibrium neutral and negatively selected (2Ns = -2) frequency spectra; count against frequency class (1 to 9).]
![Page 8: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/8.jpg)
Natural selection vs. demography
[Figure: non-stationary neutral (population growth) and equilibrium selected (2Ns = -2) frequency spectra; count against frequency class (1 to 9).]
![Page 9: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/9.jpg)
How do we distinguish selection from demography?
McDonald-Kreitman approach:
• Use a priori information to classify changes as “neutral” (e.g. synonymous, non-coding) or “potentially selected” (e.g. non-synonymous)
• Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample
• Potentially selected sites are compared to the neutral standard
Can we develop a neutral standard for the frequency spectrum?
![Page 10: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/10.jpg)
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra for putatively neutral and potentially selected sites; count against frequency class (1 to 9).]
This talk:
• Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites
• Biologically meaningful measure of the difference between the two spectra
![Page 11: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/11.jpg)
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra for putatively neutral and potentially selected sites; count against frequency class (1 to 9).]
A model-based approach:
1. Fit a neutral demographic model to estimate demographic parameters
2. Given those parameter estimates, fit a selective demographic model to estimate selection parameters and test hypotheses
![Page 12: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/12.jpg)
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra for putatively neutral and potentially selected sites; count against frequency class (1 to 9).]
Requirements:
1. Demographic model
2. Frequency spectrum predictions from the model under neutrality
3. Frequency spectrum predictions from the model subject to natural selection
![Page 13: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/13.jpg)
Theory: population growth model
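2-epoch model
[Figure: population size against time; the population has ancestral size $N_A$ and changes to the current size $N_C$ before the present ("now").]
$\nu = N_A/N_C$
Model parameters: $\tau$ (time since the size change), $\nu$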
![Page 14: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/14.jpg)
Theory: predicting the frequency spectrum
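Definitions:
$n$: sample size
$x_i$: number of sites in frequency class $i$
$f(q, t; \nu)$: distribution of allele frequency, $q$, at time $t$
Predictions:
$$E[x_i; \tau, \nu] = \int_0^1 \binom{n}{i} q^i (1-q)^{n-i} f(q, \tau; \nu)\, dq$$
$$P(i, n; \tau, \nu) = \frac{E[x_i; \tau, \nu]}{\sum_{j=1}^{n-1} E[x_j; \tau, \nu]}$$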
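As a check on these predictions, the integral can be evaluated numerically for the one case with a simple closed form: the equilibrium neutral density $f(q) = \theta/q$, for which $E[x_i] = \theta/i$. The sketch below is a minimal illustration under that assumption, not the non-stationary density used in the talk; `expected_counts`, `class_probabilities`, the midpoint rule, and the grid size are illustrative choices.

```python
import math

def expected_counts(f, n, grid=20000):
    """E[x_i] = integral over (0,1) of C(n,i) q^i (1-q)^(n-i) f(q) dq, by the midpoint rule."""
    h = 1.0 / grid
    out = []
    for i in range(1, n):
        c = math.comb(n, i)
        total = 0.0
        for k in range(grid):
            q = (k + 0.5) * h
            total += c * q**i * (1.0 - q) ** (n - i) * f(q)
        out.append(total * h)
    return out

def class_probabilities(expected):
    """P(i, n): each expected count divided by the sum over classes 1..n-1."""
    s = sum(expected)
    return [e / s for e in expected]

theta = 1.0                       # assumed mutation parameter; it cancels in P(i, n)
neutral = lambda q: theta / q     # equilibrium neutral density (assumption of this sketch)
ex = expected_counts(neutral, 10)
probs = class_probabilities(ex)
print(ex)
print(probs)
```

Because $P(i, n)$ is a ratio of expectations, any constant factor in $f$ drops out, which is why only the shape of the density matters for the likelihood.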
![Page 15: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/15.jpg)
Theory: the distribution of allele frequency
Poisson Random Field approach (Sawyer and Hartl 1992):
• Use single-locus diffusion theory to predict the distribution of allele-frequency
• If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites
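To get f, we need to solve the diffusion equation:
$$\frac{\partial}{\partial t} f(q,t) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[V(q)\, f(q,t)\right] - \frac{\partial}{\partial q}\left[M(q)\, f(q,t)\right]$$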
![Page 16: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/16.jpg)
Theory: time-dependent solution, neutral case
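The forward equation under neutrality:
$$\frac{\partial}{\partial t} f(q,t) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[q(1-q)\, f(q,t)\right]$$
Kimura’s (1964) solution, given some initial allele frequency, p:
$$\phi(q,t \mid p) = \sum_{i=1}^{\infty} \frac{(2i+1)\left(1-(1-2p)^2\right)}{i(i+1)}\, C^{3/2}_{i-1}(1-2p)\, C^{3/2}_{i-1}(1-2q)\, e^{-\frac{1}{2} i(i+1) t}$$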
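A truncated version of Kimura's series is straightforward to evaluate. The sketch below assumes time is measured in units for which the neutral forward equation has the coefficient 1/2 shown on this slide, and it uses the standard three-term recurrence for the Gegenbauer polynomials $C^{3/2}_k$. The function names and the truncation at 200 terms are illustrative; the series converges slowly for very small t.

```python
import math

def gegenbauer_3_2(k_max, x):
    """Values C_0(x), ..., C_k_max(x) of the Gegenbauer polynomials with index 3/2."""
    vals = [1.0, 3.0 * x]
    for k in range(2, k_max + 1):
        # Recurrence: k C_k = (2k + 1) x C_{k-1} - (k + 1) C_{k-2}
        vals.append(((2 * k + 1) * x * vals[k - 1] - (k + 1) * vals[k - 2]) / k)
    return vals[: k_max + 1]

def kimura_phi(q, t, p, terms=200):
    """Truncated series for the neutral transition density phi(q, t | p)."""
    r, z = 1.0 - 2.0 * p, 1.0 - 2.0 * q
    cr = gegenbauer_3_2(terms - 1, r)
    cz = gegenbauer_3_2(terms - 1, z)
    total = 0.0
    for i in range(1, terms + 1):
        coeff = (2 * i + 1) * (1.0 - r * r) / (i * (i + 1))
        total += coeff * cr[i - 1] * cz[i - 1] * math.exp(-0.5 * i * (i + 1) * t)
    return total

# At large t only the first term survives, giving 6 p (1 - p) exp(-t), flat in q.
value = kimura_phi(0.3, 5.0, 0.5)
print(value)
```

The large-t limit is a convenient sanity check because the density becomes flat in q and decays at the rate of the leading eigenvalue.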
![Page 17: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/17.jpg)
Theory: time-dependent solution, neutral case
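Applying Kimura’s (1964) solution, $\phi(q, t \mid p)$, to the 2-epoch model
Distribution of allele frequency: $f(q, \tau; \nu) = f_A(q, \tau; \nu) + f_C(q, \tau; \nu)$
• $f_A$ (ancestral mutations): $\phi(q, \tau \mid p)$ integrated over the initial allele frequency, $p$, from 0 to 1
• $f_C$ (mutations arising at initial frequency $1/(2N_C)$): $\phi(q, t \mid 1/(2N_C))$ integrated over time, $t$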
![Page 18: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/18.jpg)
Theory: time-dependent solution, neutral case
[Figure: expected frequency spectrum after a change in population size, with one model parameter set to 0.01; $P(i, n; \cdot, 0.01)$ (0.2 to 0.8) against frequency class (1 to 9).]
![Page 19: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/19.jpg)
Theory: time-dependent solution, neutral case
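Multinomial likelihood:
$$\ell(\tau, \nu \mid x) = \ln\left[\left(\sum_{i=1}^{n-1} x_i\right)!\right] - \sum_{i=1}^{n-1} \ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i, n; \tau, \nu)$$
Maximum likelihood estimates of $\tau$ and $\nu$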
Likelihood ratio test of population growth
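The multinomial log-likelihood and the likelihood ratio statistic can be sketched as follows. This is a minimal illustration, assuming the class probabilities have already been computed from some model. The equilibrium neutral probabilities (proportional to 1/i) serve as the null model here, and the observed counts are the toy spectrum from the earlier five-sequence example. The function names are illustrative only.

```python
import math

def log_likelihood(x, probs):
    """Multinomial log-likelihood of counts x given class probabilities probs."""
    ll = math.lgamma(sum(x) + 1)                          # ln of (total count)!
    for xi, pi in zip(x, probs):
        ll += xi * math.log(pi) - math.lgamma(xi + 1)    # x_i ln P_i - ln(x_i!)
    return ll

def neutral_equilibrium_probs(n):
    """Class probabilities proportional to 1/i for i = 1..n-1."""
    w = [1.0 / i for i in range(1, n)]
    s = sum(w)
    return [v / s for v in w]

def lrt_statistic(ll_alternative, ll_null):
    """Twice the difference in maximized log-likelihoods."""
    return 2.0 * (ll_alternative - ll_null)

x = [5, 2, 1, 1]                                          # toy spectrum, n = 5
ll_null = log_likelihood(x, neutral_equilibrium_probs(5))
print(ll_null)
```

In practice the alternative log-likelihood would come from maximizing over the demographic parameters, for example with a grid search or a general-purpose optimizer.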
![Page 20: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/20.jpg)
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra for putatively neutral and potentially selected sites; count against frequency class (1 to 9).]
Requirements:
1. Demographic model
2. Frequency spectrum predictions from the model under neutrality
3. Frequency spectrum predictions from the model subject to natural selection
![Page 21: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/21.jpg)
Theory: time-dependent solution, selected case
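The forward equation with selection:
$$\frac{\partial}{\partial t} f_S(q,t;\gamma) = \frac{1}{2}\frac{\partial^2}{\partial q^2}\left[q(1-q)\, f_S(q,t;\gamma)\right] - \gamma\,\frac{\partial}{\partial q}\left[q(1-q)\, f_S(q,t;\gamma)\right]$$
where $\gamma = 2N_C s$
Initial condition (the stationary distribution in the ancestral population, where the scaled selection coefficient is $2N_A s = \nu\gamma$):
$$f_S(q,0;\gamma) \propto \frac{1 - e^{-2\nu\gamma(1-q)}}{\left(1 - e^{-2\nu\gamma}\right) q(1-q)}$$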
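To see how selection reshapes the spectrum, the stationary density can be pushed through binomial sampling. The sketch below assumes the standard equilibrium form $\theta\,(1 - e^{-2\gamma(1-q)}) / [(1 - e^{-2\gamma})\, q(1-q)]$ in a population of constant size, which is a simplification of the ancestral-population initial condition above. The function names, grid size, and choice of n = 10 are illustrative.

```python
import math

def stationary_density(q, gamma, theta=1.0):
    """Equilibrium density of derived-allele frequency with scaled selection gamma."""
    if abs(gamma) < 1e-9:
        return theta / q                                  # neutral limit
    num = 1.0 - math.exp(-2.0 * gamma * (1.0 - q))
    den = (1.0 - math.exp(-2.0 * gamma)) * q * (1.0 - q)
    return theta * num / den

def spectrum_probs(gamma, n, grid=5000):
    """P(i, n) for i = 1..n-1, integrating by the midpoint rule."""
    h = 1.0 / grid
    ex = []
    for i in range(1, n):
        c = math.comb(n, i)
        total = 0.0
        for k in range(grid):
            q = (k + 0.5) * h
            total += c * q**i * (1.0 - q) ** (n - i) * stationary_density(q, gamma)
        ex.append(total * h)
    s = sum(ex)
    return [e / s for e in ex]

neutral = spectrum_probs(0.0, 10)
positive = spectrum_probs(2.0, 10)     # 2Ns = 2
negative = spectrum_probs(-2.0, 10)    # 2Ns = -2
print(neutral[0], positive[0], negative[0])
```

Negative selection raises the proportion of singletons relative to neutrality and positive selection lowers it, which is the qualitative contrast drawn in the earlier equilibrium figures.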
![Page 22: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/22.jpg)
Theory: time-dependent solution, selected case
1. Numerically solve the forward equation using the Crank-Nicolson finite differencing scheme
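2. Use this approximation of f to evaluate the likelihood function:
$$\ell(\gamma, \tau, \nu \mid x) = \ln\left[\left(\sum_{i=1}^{n-1} x_i\right)!\right] - \sum_{i=1}^{n-1} \ln(x_i!) + \sum_{i=1}^{n-1} x_i \ln P(i, n; \gamma, \tau, \nu)$$
3. Fix $\tau$ and $\nu$ to their MLEs from the neutral data
4. Optimize the likelihood for $\gamma$. Likelihood ratio test of neutrality:
$$\mathrm{LRT} = 2\left[\ell(\hat{\gamma}, \hat{\tau}, \hat{\nu} \mid x) - \ell(0, \hat{\tau}, \hat{\nu} \mid x)\right]$$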
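Step 1 can be illustrated with a small Crank-Nicolson solver. This is a sketch under stated assumptions, not the authors' implementation: it evolves the bounded quantity $g = q(1-q) f$, which satisfies $\partial g/\partial t = q(1-q)\left[\tfrac{1}{2} g'' - \gamma g'\right]$. It holds $g(0) = \theta$ and $g(1) = 0$ fixed, the values taken by the stationary solution, and starts from the ancestral stationary distribution with $\nu = 0.1$. The grid size, time step, and all function names are illustrative choices.

```python
import math

def stationary_g(q, gamma, theta=1.0):
    """g(q) = q(1-q) f(q) for the stationary density with scaled selection gamma."""
    if abs(gamma) < 1e-9:
        return theta * (1.0 - q)
    return theta * (1.0 - math.exp(-2.0 * gamma * (1.0 - q))) / (1.0 - math.exp(-2.0 * gamma))

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for j in range(1, n):
        m = diag[j] - lower[j] * c[j - 1]
        c[j] = upper[j] / m
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        x[j] = d[j] - c[j] * x[j + 1]
    return x

def crank_nicolson(g0, gamma, t_end, dt, theta=1.0):
    """Advance g on the grid q_j = j/m from time 0 to t_end."""
    m = len(g0) - 1
    h = 1.0 / m
    g = list(g0)
    g[0], g[m] = theta, 0.0                       # assumed Dirichlet boundary values
    lo, di, up = [], [], []
    for j in range(1, m):                         # central-difference operator L
        a = (j * h) * (1.0 - j * h)
        lo.append(a * (0.5 / h**2 + gamma / (2.0 * h)))
        di.append(-a / h**2)
        up.append(a * (0.5 / h**2 - gamma / (2.0 * h)))
    r = 0.5 * dt
    a_lo = [-r * v for v in lo]
    a_di = [1.0 - r * v for v in di]
    a_up = [-r * v for v in up]
    for _ in range(int(round(t_end / dt))):
        # Right-hand side: (I + r L) applied to the current solution
        rhs = [g[k + 1] + r * (lo[k] * g[k] + di[k] * g[k + 1] + up[k] * g[k + 2])
               for k in range(m - 1)]
        rhs[0] += r * lo[0] * g[0]                # known boundary values at the new time
        rhs[-1] += r * up[-1] * g[m]
        g[1:m] = solve_tridiagonal(a_lo, a_di, a_up, rhs)
    return g

m, gamma, nu = 100, -2.0, 0.1
grid = [j / m for j in range(m + 1)]
g_start = [stationary_g(q, nu * gamma, nu) for q in grid]   # ancestral equilibrium
g_final = crank_nicolson(g_start, gamma, t_end=12.0, dt=0.01)
g_target = [stationary_g(q, gamma) for q in grid]
print(max(abs(a - b) for a, b in zip(g_final, g_target)))
```

Working with $g$ rather than $f$ avoids the $1/q$ singularity at the boundary. Running to large t and comparing with the known stationary solution is a simple convergence check of the kind described on the next slide.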
![Page 23: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/23.jpg)
Theory: time-dependent solution, selected case
How can we be sure that the numerical solution actually works?
• Von Neumann stability analysis: solution is unconditionally stable
• Numerical solution converges to the stationary distribution after ~4NC generations
• Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other
![Page 24: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/24.jpg)
Human Polymorphism Data
• From Stephens et al. (2001)
• 80 individuals, geographically diverse ancestry
• 313 genes, 720 kb sequenced
• ~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)
![Page 25: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/25.jpg)
Results for non-coding changes, assuming neutrality
| Model | MLEs (two model parameters) | ln(L) |
| --- | --- | --- |
| 2-epoch | 0.016, 0.13 | -5674.6 |
| Equilibrium neutral | | -6046.6 (P ≈ 0, d.f. 2) |
| Goodness-of-fit | | -5608.3 (P = 0.54, d.f. 76) |
![Page 26: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/26.jpg)
Results for non-synonymous changes, categorized by Grantham’s distance
| Category | S | $\hat{\gamma}$ | P-value |
| --- | --- | --- | --- |
| conservative | 136 | -2.24 | 0.52 |
| moderate | 137 | -6.08 | 0.07 |
| radical | 107 | -8.44 | 0.02 |
| all nonsyn | 380 | -4.88 | 0.10 |
![Page 27: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/27.jpg)
Ongoing work and future directions
1. Simulate, simulate, simulate
• How robust is the method to different types of demographic forces?
• How does linkage among some sites affect the analysis?
• How does estimation error affect the LRTs?
2. Numerical solution for different demographic scenarios (e.g. bottleneck, population structure)
3. Variable selective effects among new mutations
![Page 28: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/28.jpg)
The McDonald-Kreitman test
Sn Number of non-synonymous segregating sites
Dn Number of non-synonymous fixed differences
Ss Number of synonymous segregating sites
Ds Number of synonymous fixed differences
$S_n / D_n < S_s / D_s$: Adaptive evolution
$S_n / D_n > S_s / D_s$: Negative selection
Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
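The comparison of the two ratios can be summarized as a single number, the neutrality index of Rand and Kann (1996), $NI = (S_n D_s)/(S_s D_n)$. The sketch below computes it for hypothetical counts, which are made up for illustration and are not data from the talk. A real analysis would also test the 2x2 table for significance.

```python
def neutrality_index(s_n, d_n, s_s, d_s):
    """NI = (S_n / D_n) / (S_s / D_s) = (S_n * D_s) / (S_s * D_n)."""
    return (s_n * d_s) / (s_s * d_n)

def mk_pattern(ni):
    """Direction of departure from the neutral expectation NI = 1."""
    if ni < 1.0:
        return "adaptive evolution"      # S_n/D_n < S_s/D_s
    if ni > 1.0:
        return "negative selection"      # S_n/D_n > S_s/D_s
    return "neutral expectation"

# Hypothetical counts, for illustration only.
ni = neutrality_index(s_n=10, d_n=20, s_s=20, d_s=20)
print(ni, mk_pattern(ni))
```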
![Page 29: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/29.jpg)
Demography and the McDonald-Kreitman test
• Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001)
• However, under some demographic scenarios it’s possible to misidentify the type of selection
• Weak negative selection with population growth
When the population size is small, non-synonymous deleterious mutations might be fixed by drift
Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism)
$S_n / D_n < S_s / D_s$ (the pattern expected under adaptive evolution)
![Page 30: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/30.jpg)
Demography and the McDonald-Kreitman test
• Over what range of parameter values might you misidentify negative selection as adaptive evolution?
• How large is the effect?
Eyre-Walker (2002):
• Addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection
• Assumed that levels of polymorphism and fixation rates changed instantaneously with population size
![Page 31: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/31.jpg)
Demography and the McDonald-Kreitman test
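$$E[S_n] = \int_0^1 \left[1 - q^n - (1-q)^n\right] f_n(q, \tau; \gamma, \nu)\, dq$$
$$E[S_s] = \int_0^1 \left[1 - q^n - (1-q)^n\right] f_s(q, \tau; 0, \nu)\, dq$$
[Equations for $E[D_n]$ and $E[D_s]$ are not recoverable from the extracted text; both depend on $t_{div}$ and on the corresponding allele frequency distribution.]
where $t_{div}$ is the divergence time, measured in $2N_C$ generations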
![Page 32: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/32.jpg)
Demography and the McDonald-Kreitman test
$$NI = \frac{S_n D_s}{S_s D_n}$$
[Figure: expected neutrality index (NI, axis ticks at 1 and 10) against $N_A/N_C$ (axis ticks at 0.01, 0.1 and 1), in four panels: a model parameter set to 1 or 0.1, combined with $t_{div} = 4$ or $t_{div} = 10$.]
![Page 33: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/33.jpg)
Demography and the McDonald-Kreitman test: Preliminary results
1. It is possible to misidentify negative selection for some parameter combinations
2. But…the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect
![Page 34: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/34.jpg)
Summary
1. Model-based approach to correcting for demography while inferring selection
• Evidence for very recent population growth in humans
• Reasonable estimates of selection parameters for classes of non-synonymous changes
2. McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought
3. Numerical methods for solving the diffusion are fast, accurate, and fun!
![Page 35: Non-stationary population genetic models with selection: Theory and Inference](https://reader036.vdocuments.mx/reader036/viewer/2022081520/56814a46550346895db761e6/html5/thumbnails/35.jpg)
Acknowledgements
Collaborator: Carlos Bustamante
Data: Genaissance Pharmaceuticals
Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop