on the use of markov chains and perron frobenuis …on the use of markov chains and perron frobenuis...
TRANSCRIPT
On the use of Markov chains and PerronFrobenuis Theorem in Population Genetics
Ola HossjerDept. of MathematicsStockholm University
June 2015
Motivation and Outline
I Conservation biology: Protect genetic diversity in order toI Prevent inbreedingI Keep species viableI Improve environmental adjustment of species
I Outline:I Populations with substructureI Formulas for long term rate of loss of genetic variants
Effective population size Ne
I Population of size N
I Ne : Size of ideal population with same rate of loss of geneticvariants as studied population (smaller Ne → faster rate)
I Short term protection rule: Ne ≥ 50
I Ne/N varies between species
Genetic drift for size N populationI Discrete time t = 0, 1, 2, . . ..
I Genes g = 1, . . . , 2N.
I Two variants white and black
I νtg nr. of offspring of g , time t.
I Freq. of black, time t + 1, is
Xt+1 = 12N |{g ; g is black at t + 1}|
= 12N
∑g ;g black at time t νtg .
I {Xt} Markov chain, state space
X = {0, 12N , . . . ,
2N−12N , 1}
= {0} ∪ {1} ∪ { 12N , . . . ,
2N−12N }
= X0 ∪ X1 ∪ X2.
Absorbing states: X0, X1
Transient states: X2
2N = 10,Xt = 0.4,
Xt+1 = 0.5,νt2 = νt5 = νt6 = 0,νt1 = νt3 = νt4 = νt9 = νt,10 = 1,νt8 = 2,νt7 = 3.
Ideal population: Wright-Fisher (WF) model1Children pick parental genes independently;
{νtg}2Ng=1 ∼ Mult
(2N;
1
2N, . . . ,
1
2N
),
so that the transition kernel of {Xt} is
P = (P(x , y))x,y∈X ,
P(x , y) = P(Xt+1 = y |Xt = x) =(
2N2Ny
)x2Ny (1− x)2N(1−y).
Feller (1951) showed that the eigenvalues of P are
λ1 = λ2 = 1, λj =(2N − 1)(2N − 2) · · · (2N − j + 2)
(2N)j−2, j = 3, . . . , 2N+1,
so that the largest non-unit eigenvalue is
λ = λ3 = 1− 1
2N. (1)
It gives the asymtotic rate of fixation of one variant;
limt→∞
P(Xt ∈ X2)
λt= C , (0 < C <∞).
1Fisher (1921), Wright (1931).
Cannings modelCannings (1974) showed more generally that
λ1 = λ2 = 1, λj = E
(j−1∏g=1
νtg
), j = 3, . . . , 2N + 1,
if {νtg}2Ng=1 are exchangeable, so that in particular,
λ = λ3 = E (νt1νt2) = 1 + Cov(νt1, νt2) = 1− p,
with coalescence probability
p = −Cov(νt1, νt2)
= E [νt1(νt1−1)]2N−1
= 2N · E[(νt1
2
)]/(
2N2
)= P (two offspring have the same parent) .
Eigenvalue effective size2 NeE = size of a WF population with fixationrate λ:
λ = 1− 1
2NeE=⇒ NeE =
1
2(1− λ)=
N − 12
E [νt1(νt1 − 1)].
2Crow (1954), Ewens (1982).
Gene diversities
Introduce (predicted) gene diversities at time t = 0, 1, 2, . . .
Ht = 2Xt(1− Xt),= P(two genes picked with repl. have diff. variants|Xt)
ht = E (Ht)= P(two genes picked with repl. have diff. variants).
It can be shown thatht+1 = λht .
Hence gene diversities
ht = λth0 = (1− p)th0, (2)
tend to zero at the same multiplicative rate as non-fixationprobabilities P(Xt ∈ X2).
But gene diversities and coalescence probabilities (p) are easier toanalyze theoretically!!
Structured populationDivide into s subpopulations i = 1, . . . , s. Let
2Ni = number of genes in subpop. i , (∑
i Ni = N)Xti = fraction of black variants in subpop. i at time t,νtkig = nr. of offspring of gene g of subpop. k at time
t that end up in subpop. i at time t + 1,exchangeable for g = 1, . . . , 2Nk .
This gives dynamics
Xt+1,i =1
2Ni
s∑k=1
∑g ;g∈ subpop. k at time t,
g is black
νtkig .
Find, under suitable conditions,
NeE =1
2(1− λ),
where λ is rate of fixation, so that for some 0 < C <∞,
limt→∞
P(non-fixation at time t)
λt= C .
Structured population, contd.Let
Xt = (Xt1, . . . ,Xts).
If reproduction is time invariant, {Xt} is Markov chain with state space
X = {0, 12N1
, . . . , 2N1−12N1
, 1} × . . .× {0, 12Ns
, . . . , 2Ns−12Ns
, 1}= X0 ∪ X1 ∪ X2.
where
X0 = {(0, . . . , 0)},X1 = {(1, . . . , 1)},X2 = X \ (X0 ∪ X1),
and
P = (P(x, y))x,y∈X ,P(x, y) = P(Xt+1 = y|Xt = x),
λ = 3rd largest eigenvalue of P.
[c]
Gene diversities for structured populationIntroduce (predicted) gene diversities3 at time t = 0, 1, 2, . . .
Htij = Xti (1− Xtj) + (1− Xti )Xtj ,= P(two genes picked with repl. from i and j have diff. variants|Xt)
htij = E (Htij)= P(two genes picked with repl. from i and j have diff. variants)
between all pairs of subpopulations i and j . The column vector
ht = vec((htij)
si,j=1
)with s2 predicted gene diversities satisfies
ht+1 = Aht =⇒ ht = Ath0, (3)
whereA = (Aij,kl)ij,kl∈{1,...,s}×{1,...,s}
is a square matrix of order s2, and4
λ = λmax(A). (4)
3Maruyama (1970), Felsenstein (1972), Nei (1973).4By Perron-Frobenius Theorem applied to P.
Backward migration and coalescence theory to find A
If children pick parental subpopulations independently, with
Bik = probability by which genes in i pickparental subpopulations from k ∈ {1, . . . , s},
pijk = coalescence probability within k= P(two genes from i , j with parents in k , have same parent)
= Nk
2NiNjBikBjk
(E(νtki1(νtki1−1))
1− 12Ni
){i=j}
E (νtki1νtkj1){i 6=j}.
Then (3) holds, with
Aij,kl =
(1− 1
2Ni
){i=j}(
1− pijk
1− 12Nk
){k=l}
BikBjl . (5)
Motivation of (3) and (5)
We have that
ht+1,ij = P(genes from i and j at time t + 1 have different variants)= P(different genes picked from i and j at time t + 1)∑
k,l ·P(gene from i has parent from k at time t)
·P(gene from j has parent from l at time t)·P(different parents from k and l)·P(different variants of the different parents at time t)
= (1− 12Ni
){i=j}∑k,l BikBjl(1− pijk){k=l} · ht,kl/(1− 1
2Nk){k=l}
=∑
k,l Aij,klht,kl .
with Aij ,kl as in (5). In vector form this writes
ht+1 = Atht .
Computation of NeE
Subpop 1Subpop 2
Subpop 3
Subpop 4
Subpop 5
N4=400
N1=200
N3=50
N5=100
N2=400
5
2
10
3
4
252
5
(Ni )si=1 = (200, 400, 50, 400, 100),
N =∑s
i=1 Ni = 1150,
(Bik)si,k=1 =
0.94 0.05 0 0.01 00.0125 0.9825 0.005 0 00 0.1 0.82 0.08 00.005 0 0.0075 0.9875 00 0 0 0.05 0.95
,
Repr. = {(νtkig )2Nkg=1}
sk=1
∼ Mult(2Ni ;
Bi12N1, . . . , Bi1
2N1, . . . , Bis
2Ns, . . . , Bis
2Ns
)independently for i = 1, . . . , s,
pijk = 1/(2Nk),
NeE = 970,
Gene diversity effective size NeGLet
Wij = P(choose gene pair from i and j),
and collect them into row vector of length s2:
W = vec ((Wij)1≤i,j≤s)′ .
Predicted gene diversity for two randomly sampled genes at time t, is
ht = P(the two genes have different variants)=
∑1≤i,j≤s Wijhtij
= Wht
= WAth0.
It follows from (1) and (2), that for Wright-Fisher model
ht =
(1− 1
2N
)t
· h0. (6)
Gene diversity effective size over time interval [0, t] solves (6), i.e.
NeG ([0, t]) =1
2
[1−
(hth0
)1/t] t→∞→ 1
2(1− λ)= NeE .
Local/global NeG and NeE
Constant subpop. sizes
0 20 40 60 80 100
020
040
0600
800
100
0120
0
Time
Eff
ecti
veSiz
e
Subpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit
Solid - with replacementDashed - without replacement
Local bottleneck in 1
0 20 40 60 80 100
020
0400
600
800
100
0
Time
Eff
ecti
veSiz
eSubpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit
Solid - with replacementDashed - without replacement
Blocked migration 1-2
0 20 40 60 80 100
020
040
0600
800
100
0120
0
Time
Eff
ecti
veSiz
e
Subpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit
Solid - with replacementDashed - without replacement
Horizontal: NeE
Solid: t → NeG ([0, t]) for whole population and subpopulationsGlobal weights: Wij = 1/s2
Local weights, subpopulation k : Wij = 1{(i ,j)=(k,k)}
Proof of (4)
We have that
ht = Wht = WAth0 = Cλmax(A)t + o(λmax(A)t
)(7)
as t →∞. But also
ht =∑
ij Wijhtij=
∑ij WijE [Xti (1− Xtj) + Xtj(1− Xti )]
= E [φ(Xt)]= E [E (φ(Xt)|X0)]
=∑
x,y π(x)P(t)(x, y)φ(y),
(8)
where
φ(x) =∑
i ,j Wij [xi (1− xj) + xj(1− xi )] ,
π(x) = P(X0 = x),
Pt =(P(t)(x, y); x, y ∈ X
).
Perron-FrobeniusBlock decompose transition matrix as
P =
1 0 00 1 0
P20 P21 P22
=⇒ Pt =
1 0 00 1 0
P(t)20 P
(t)21 Pt
22
,
Since P22 is non-negative, irreducible and aperiodic, it has uniquelargest eigenvalue λ = λ3, and therefore
P(t)(x, y) = λtr(x)l(y) + o(λt), x, y ∈ X2, (9)
wherel = (l(x); x ∈ X2)r = (r(x); x ∈ X2)′
are left and right eigenvectors of P2 with eigenvalue λ andcomponents
l(x) > 0,r(x) > 0,
(10)
for all x ∈ X2.
Proof of (4), contd.
Use (8), (9), (10) and the fact that
φ(x) = 0, x ∈ X0 ∪ X1,φ(x) > 0, x ∈ X2 (if all Wij > 0),π(x) > 0, for some x ∈ X2,
to conclude
ht = λt∑
x,y∈X2π(x)r(x)l(y)φ(y) + o(λt)
= λt∑
x∈X2π(x)r(x) ·
∑y∈X2
l(y)φ(y) + o(λt)
= Cλt + o(λt),
with C > 0. Combining this with (7), we finally deduce
λ = λmax(A).
Other topicsI Various types of structure (geographic, age, sex, combinations, . . . ).
I Computer program GESP (Olsson et al, 2015).
I Large population asymptotics:
NeE =N
C1+ o(N) = NeC + o(N) as N →∞,
where NeC is coalescence effective size5 and C1 a coalescence rate.
I Small migration asymptotics:
NeE =N
C2B+ o(B−1) as B → 0,
where B = long term rate of subpopulation change in ancestral line.
I Leading right eigenvector of A to assess subpopulationdifferentiation6
5Nordborg and Krone (2002), Sjodin et al. (2005).6FST of Wright (1943), GST of Nei (1973).
References
Hossjer, O. (2011). Coalescence theory for a general class of structured populations with fast migration. Advancesin Probability Theory 43(4), 1027-1047.
Hossjer, O., Jorde, P.E. and Ryman, N. (2013). Quasi equilibrium approximations of the fixation index underneutrality: The island model. Theoretical Population Biology 84, 9-24.
Ryman, N., Allendorf, F.W., Jorde P.E., Laikre, L. and Hossjer, O. (2013). Samples from subdivided populationsyield biased estimates of effective size that overestimate the rate of loss of genetic variation. Molecular EcologyResources 14, 87-99.
Olsson, F. Hossjer, O., Laikre, L. and Ryman, N. (2013). Characteristics of the variance effective population sizeover time using an age structured model with variable size. Theoretical Population Biology 90, 91-103.
Hossjer, O. (2014). Spatial autocorrelation for subdivided populations with invariant migration schemes.Methodology and Computing in Applied Probability 16(4), 777-810. DOI 10.1007/s11009-013-9321-3.
Hossjer, O. and Ryman, N. (2014). Quasi equilibrium, variance effective population size and fixation index formodels with spatial structure. Journal of Mathematical Biology 69(5), 1057-1128. DOI 10.1007/s00285-013-0728-9.
Hossjer, O., Olsson, F., Laikre, L. and Ryman, N. (2014). A new general analytical approach for modeling patternsof genetic differentiation and effective size of subdivided populations over time. Mathematical Biosciences 258,113-133. DOI: 10.1016/j.mbs.2014.10.001.
Hossjer, O., Olsson, F., Laikre, L. and Ryman, N. (2015). Metapopulation Inbreeding Dynamics, Effective Size andSubpopulation Differentiation - a General Analytical Approach for Diploid Organisms. Theoretical PopulationBiology 102, 40-59. DOI:10.1016/j.tpb.2015.03.006.
Hossjer, O. (2015). On the eigenvalue effective size in structured populations. To appear in Journal ofMathematical Biology. Available online, DOI 10.1007/s00285-014-0832-5.
THANKS!