on the use of markov chains and perron frobenuis …on the use of markov chains and perron frobenuis...

On the use of Markov chains and PerronFrobenuis Theorem in Population Genetics

Ola HossjerDept. of MathematicsStockholm University

June 2015

Motivation and Outline

I Conservation biology: Protect genetic diversity in order toI Prevent inbreedingI Keep species viableI Improve environmental adjustment of species

I Outline:I Populations with substructureI Formulas for long term rate of loss of genetic variants

Effective population size Ne

I Population of size N

I Ne : Size of ideal population with same rate of loss of geneticvariants as studied population (smaller Ne → faster rate)

I Short term protection rule: Ne ≥ 50

I Ne/N varies between species

Genetic drift for size N populationI Discrete time t = 0, 1, 2, . . ..

I Genes g = 1, . . . , 2N.

I Two variants white and black

I νtg nr. of offspring of g , time t.

I Freq. of black, time t + 1, is

Xt+1 = 12N |{g ; g is black at t + 1}|

= 12N

∑g ;g black at time t νtg .

I {Xt} Markov chain, state space

X = {0, 12N , . . . ,

2N−12N , 1}

= {0} ∪ {1} ∪ { 12N , . . . ,

2N−12N }

= X0 ∪ X1 ∪ X2.

Absorbing states: X0, X1

Transient states: X2

2N = 10,Xt = 0.4,

Xt+1 = 0.5,νt2 = νt5 = νt6 = 0,νt1 = νt3 = νt4 = νt9 = νt,10 = 1,νt8 = 2,νt7 = 3.

Ideal population: Wright-Fisher (WF) model1Children pick parental genes independently;

{νtg}2Ng=1 ∼ Mult

(2N;

1

2N, . . . ,

1

2N

),

so that the transition kernel of {Xt} is

P = (P(x , y))x,y∈X ,

P(x , y) = P(Xt+1 = y |Xt = x) =(

2N2Ny

)x2Ny (1− x)2N(1−y).

Feller (1951) showed that the eigenvalues of P are

λ1 = λ2 = 1, λj =(2N − 1)(2N − 2) · · · (2N − j + 2)

(2N)j−2, j = 3, . . . , 2N+1,

so that the largest non-unit eigenvalue is

λ = λ3 = 1− 1

2N. (1)

It gives the asymtotic rate of fixation of one variant;

limt→∞

P(Xt ∈ X2)

λt= C , (0 < C <∞).

1Fisher (1921), Wright (1931).

Cannings modelCannings (1974) showed more generally that

λ1 = λ2 = 1, λj = E

(j−1∏g=1

νtg

), j = 3, . . . , 2N + 1,

if {νtg}2Ng=1 are exchangeable, so that in particular,

λ = λ3 = E (νt1νt2) = 1 + Cov(νt1, νt2) = 1− p,

with coalescence probability

p = −Cov(νt1, νt2)

= E [νt1(νt1−1)]2N−1

= 2N · E[(νt1

2

)]/(

2N2

)= P (two offspring have the same parent) .

Eigenvalue effective size2 NeE = size of a WF population with fixationrate λ:

λ = 1− 1

2NeE=⇒ NeE =

1

2(1− λ)=

N − 12

E [νt1(νt1 − 1)].

2Crow (1954), Ewens (1982).

Gene diversities

Introduce (predicted) gene diversities at time t = 0, 1, 2, . . .

Ht = 2Xt(1− Xt),= P(two genes picked with repl. have diff. variants|Xt)

ht = E (Ht)= P(two genes picked with repl. have diff. variants).

It can be shown thatht+1 = λht .

Hence gene diversities

ht = λth0 = (1− p)th0, (2)

tend to zero at the same multiplicative rate as non-fixationprobabilities P(Xt ∈ X2).

But gene diversities and coalescence probabilities (p) are easier toanalyze theoretically!!

Structured populationDivide into s subpopulations i = 1, . . . , s. Let

2Ni = number of genes in subpop. i , (∑

i Ni = N)Xti = fraction of black variants in subpop. i at time t,νtkig = nr. of offspring of gene g of subpop. k at time

t that end up in subpop. i at time t + 1,exchangeable for g = 1, . . . , 2Nk .

This gives dynamics

Xt+1,i =1

2Ni

s∑k=1

∑g ;g∈ subpop. k at time t,

g is black

νtkig .

Find, under suitable conditions,

NeE =1

2(1− λ),

where λ is rate of fixation, so that for some 0 < C <∞,

limt→∞

P(non-fixation at time t)

λt= C .

Structured population, contd.Let

Xt = (Xt1, . . . ,Xts).

If reproduction is time invariant, {Xt} is Markov chain with state space

X = {0, 12N1

, . . . , 2N1−12N1

, 1} × . . .× {0, 12Ns

, . . . , 2Ns−12Ns

, 1}= X0 ∪ X1 ∪ X2.

where

X0 = {(0, . . . , 0)},X1 = {(1, . . . , 1)},X2 = X \ (X0 ∪ X1),

and

P = (P(x, y))x,y∈X ,P(x, y) = P(Xt+1 = y|Xt = x),

λ = 3rd largest eigenvalue of P.

[c]

Gene diversities for structured populationIntroduce (predicted) gene diversities3 at time t = 0, 1, 2, . . .

Htij = Xti (1− Xtj) + (1− Xti )Xtj ,= P(two genes picked with repl. from i and j have diff. variants|Xt)

htij = E (Htij)= P(two genes picked with repl. from i and j have diff. variants)

between all pairs of subpopulations i and j . The column vector

ht = vec((htij)

si,j=1

)with s2 predicted gene diversities satisfies

ht+1 = Aht =⇒ ht = Ath0, (3)

whereA = (Aij,kl)ij,kl∈{1,...,s}×{1,...,s}

is a square matrix of order s2, and4

λ = λmax(A). (4)

3Maruyama (1970), Felsenstein (1972), Nei (1973).4By Perron-Frobenius Theorem applied to P.

Backward migration and coalescence theory to find A

If children pick parental subpopulations independently, with

Bik = probability by which genes in i pickparental subpopulations from k ∈ {1, . . . , s},

pijk = coalescence probability within k= P(two genes from i , j with parents in k , have same parent)

= Nk

2NiNjBikBjk

(E(νtki1(νtki1−1))

1− 12Ni

){i=j}

E (νtki1νtkj1){i 6=j}.

Then (3) holds, with

Aij,kl =

(1− 1

2Ni

){i=j}(

1− pijk

1− 12Nk

){k=l}

BikBjl . (5)

Motivation of (3) and (5)

We have that

ht+1,ij = P(genes from i and j at time t + 1 have different variants)= P(different genes picked from i and j at time t + 1)∑

k,l ·P(gene from i has parent from k at time t)

·P(gene from j has parent from l at time t)·P(different parents from k and l)·P(different variants of the different parents at time t)

= (1− 12Ni

){i=j}∑k,l BikBjl(1− pijk){k=l} · ht,kl/(1− 1

2Nk){k=l}

=∑

k,l Aij,klht,kl .

with Aij ,kl as in (5). In vector form this writes

ht+1 = Atht .

Computation of NeE

Subpop 1Subpop 2

Subpop 3

Subpop 4

Subpop 5

N4=400

N1=200

N3=50

N5=100

N2=400

5

2

10

3

4

252

5

(Ni )si=1 = (200, 400, 50, 400, 100),

N =∑s

i=1 Ni = 1150,

(Bik)si,k=1 =

0.94 0.05 0 0.01 00.0125 0.9825 0.005 0 00 0.1 0.82 0.08 00.005 0 0.0075 0.9875 00 0 0 0.05 0.95

,

Repr. = {(νtkig )2Nkg=1}

sk=1

∼ Mult(2Ni ;

Bi12N1, . . . , Bi1

2N1, . . . , Bis

2Ns, . . . , Bis

2Ns

)independently for i = 1, . . . , s,

pijk = 1/(2Nk),

NeE = 970,

Gene diversity effective size NeGLet

Wij = P(choose gene pair from i and j),

and collect them into row vector of length s2:

W = vec ((Wij)1≤i,j≤s)′ .

Predicted gene diversity for two randomly sampled genes at time t, is

ht = P(the two genes have different variants)=

∑1≤i,j≤s Wijhtij

= Wht

= WAth0.

It follows from (1) and (2), that for Wright-Fisher model

ht =

(1− 1

2N

)t

· h0. (6)

Gene diversity effective size over time interval [0, t] solves (6), i.e.

NeG ([0, t]) =1

2

[1−

(hth0

)1/t] t→∞→ 1

2(1− λ)= NeE .

Local/global NeG and NeE

Constant subpop. sizes

0 20 40 60 80 100

020

040

0600

800

100

0120

0

Time

Eff

ecti

veSiz

e

Subpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit

Solid - with replacementDashed - without replacement

Local bottleneck in 1

0 20 40 60 80 100

020

0400

600

800

100

0

Time

Eff

ecti

veSiz

eSubpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit


Blocked migration 1-2

0 20 40 60 80 100

020

040

0600

800

100

0120

0

Time

Eff

ecti

veSiz

e

Subpop 1Subpop 2Subpop 3Subpop 4Subpop 5Subpop 1,2,3,4,5Asymptotic Limit


Horizontal: NeE

Solid: t → NeG ([0, t]) for whole population and subpopulationsGlobal weights: Wij = 1/s2

Local weights, subpopulation k : Wij = 1{(i ,j)=(k,k)}

Proof of (4)

We have that

ht = Wht = WAth0 = Cλmax(A)t + o(λmax(A)t

)(7)

as t →∞. But also

ht =∑

ij Wijhtij=

∑ij WijE [Xti (1− Xtj) + Xtj(1− Xti )]

= E [φ(Xt)]= E [E (φ(Xt)|X0)]

=∑

x,y π(x)P(t)(x, y)φ(y),

(8)

where

φ(x) =∑

i ,j Wij [xi (1− xj) + xj(1− xi )] ,

π(x) = P(X0 = x),

Pt =(P(t)(x, y); x, y ∈ X

).

Perron-FrobeniusBlock decompose transition matrix as

P =

1 0 00 1 0

P20 P21 P22

=⇒ Pt =

1 0 00 1 0

P(t)20 P

(t)21 Pt

22

,

Since P22 is non-negative, irreducible and aperiodic, it has uniquelargest eigenvalue λ = λ3, and therefore

P(t)(x, y) = λtr(x)l(y) + o(λt), x, y ∈ X2, (9)

wherel = (l(x); x ∈ X2)r = (r(x); x ∈ X2)′

are left and right eigenvectors of P2 with eigenvalue λ andcomponents

l(x) > 0,r(x) > 0,

(10)

for all x ∈ X2.

Proof of (4), contd.

Use (8), (9), (10) and the fact that

φ(x) = 0, x ∈ X0 ∪ X1,φ(x) > 0, x ∈ X2 (if all Wij > 0),π(x) > 0, for some x ∈ X2,

to conclude

ht = λt∑

x,y∈X2π(x)r(x)l(y)φ(y) + o(λt)

= λt∑

x∈X2π(x)r(x) ·

∑y∈X2

l(y)φ(y) + o(λt)

= Cλt + o(λt),

with C > 0. Combining this with (7), we finally deduce

λ = λmax(A).

Other topicsI Various types of structure (geographic, age, sex, combinations, . . . ).

I Computer program GESP (Olsson et al, 2015).

I Large population asymptotics:

NeE =N

C1+ o(N) = NeC + o(N) as N →∞,

where NeC is coalescence effective size5 and C1 a coalescence rate.

I Small migration asymptotics:

NeE =N

C2B+ o(B−1) as B → 0,

where B = long term rate of subpopulation change in ancestral line.

I Leading right eigenvector of A to assess subpopulationdifferentiation6

5Nordborg and Krone (2002), Sjodin et al. (2005).6FST of Wright (1943), GST of Nei (1973).

References

Hossjer, O. (2011). Coalescence theory for a general class of structured populations with fast migration. Advancesin Probability Theory 43(4), 1027-1047.

Hossjer, O., Jorde, P.E. and Ryman, N. (2013). Quasi equilibrium approximations of the fixation index underneutrality: The island model. Theoretical Population Biology 84, 9-24.

Ryman, N., Allendorf, F.W., Jorde P.E., Laikre, L. and Hossjer, O. (2013). Samples from subdivided populationsyield biased estimates of effective size that overestimate the rate of loss of genetic variation. Molecular EcologyResources 14, 87-99.

Olsson, F. Hossjer, O., Laikre, L. and Ryman, N. (2013). Characteristics of the variance effective population sizeover time using an age structured model with variable size. Theoretical Population Biology 90, 91-103.

Hossjer, O. (2014). Spatial autocorrelation for subdivided populations with invariant migration schemes.Methodology and Computing in Applied Probability 16(4), 777-810. DOI 10.1007/s11009-013-9321-3.

Hossjer, O. and Ryman, N. (2014). Quasi equilibrium, variance effective population size and fixation index formodels with spatial structure. Journal of Mathematical Biology 69(5), 1057-1128. DOI 10.1007/s00285-013-0728-9.

Hossjer, O., Olsson, F., Laikre, L. and Ryman, N. (2014). A new general analytical approach for modeling patternsof genetic differentiation and effective size of subdivided populations over time. Mathematical Biosciences 258,113-133. DOI: 10.1016/j.mbs.2014.10.001.

Hossjer, O., Olsson, F., Laikre, L. and Ryman, N. (2015). Metapopulation Inbreeding Dynamics, Effective Size andSubpopulation Differentiation - a General Analytical Approach for Diploid Organisms. Theoretical PopulationBiology 102, 40-59. DOI:10.1016/j.tpb.2015.03.006.

Hossjer, O. (2015). On the eigenvalue effective size in structured populations. To appear in Journal ofMathematical Biology. Available online, DOI 10.1007/s00285-014-0832-5.

THANKS!

on the use of markov chains and perron frobenuis …on the use of markov chains and perron frobenuis...

Documents