
Upload: avis-palmer

Post on 18-Dec-2015


N-gene Coalescent Problems

• Probability of the 1st success after waiting t, given a time-constant, a ~ p, of success

04/18/23 Comp 790– Continuous-Time Coalescence 1

$$\mathrm{Exp}(a,t) = a e^{-at}$$

$$E(\mathrm{Exp}(a,t)) = \frac{1}{a}$$

$$\mathrm{Var}(\mathrm{Exp}(a,t)) = \frac{1}{a^2}$$
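
As a quick sanity check of the mean and variance above, the following sketch draws exponential waiting times and compares the sample moments with 1/a and 1/a². The rate a = 3.0, the sample count, and the helper name sample_waiting_times are illustrative assumptions, not part of the slides.

```python
import random
import statistics


def sample_waiting_times(a, n_samples, seed=0):
    """Draw n_samples waiting times from Exp(a) with a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.expovariate(a) for _ in range(n_samples)]


a = 3.0  # assumed rate, chosen only for illustration
samples = sample_waiting_times(a, 200_000)

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print("mean:", sample_mean, "expected:", 1 / a)
print("variance:", sample_var, "expected:", 1 / a**2)
```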

Review N-genes

• Likelihood k genes have a distinct lineage is:

$$\frac{2N-1}{2N}\cdot\frac{2N-2}{2N}\cdots\frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right)$$

The 1st gene can choose its parent freely, but the next k−1 must choose from the remaining genes without a child.

• Manipulating a little:

$$\prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right) = 1-\sum_{i=1}^{k-1}\frac{i}{2N}+O\!\left(\frac{1}{N^2}\right) = 1-\binom{k}{2}\frac{1}{2N}+O\!\left(\frac{1}{N^2}\right)$$

• Where, for large N, 1/N² is negligible

Approx N-gene Coalescence

• Approximate probability k genes have different parents:

$$1-\binom{k}{2}\frac{1}{2N}$$

• The probability two or more have a common parent:

$$1-\left(1-\binom{k}{2}\frac{1}{2N}\right) = \binom{k}{2}\frac{1}{2N}$$

• Lineages staying distinct for j − 1 generations and then coalescing in generation j leads to a geometric distribution, with

$$p = \binom{k}{2}\frac{1}{2N}, \qquad P(N=j) \approx \left(1-\binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$

Recall that the 2-gene case had a similar form, but with 1 in place of the combinatorial term. Here the combinatorial term accounts for all possible k-choose-2 pairs, which are treated independently.

Impact of Approximation

• Approximation is not “proper” for all values of k < 2N:

$$1-\binom{k}{2}\frac{1}{2N} = 1-\frac{k(k-1)}{4N} < 0 \quad \text{for } k > \frac{\sqrt{16N+1}+1}{2}$$

• Considering the following values of N (k is the smallest sample size at which the approximation becomes negative)

N      10     100     1000     10000     100000     1000000
k       7      21       64       201        633        2001
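
The table can be reproduced from the bound k > (√(16N+1) + 1)/2. The sketch below uses integer square roots to avoid floating-point rounding; the function name first_negative_k is a hypothetical helper introduced here.

```python
from math import isqrt


def first_negative_k(n):
    """Smallest integer k with 1 - k(k-1)/(4N) < 0, i.e. k(k-1) > 4N.

    floor((1 + sqrt(16N+1)) / 2) equals (1 + isqrt(16N+1)) // 2, and the
    first k strictly above the bound is that floor plus one.
    """
    return (1 + isqrt(16 * n + 1)) // 2 + 1


population_sizes = (10, 100, 1000, 10_000, 100_000, 1_000_000)
thresholds = [first_negative_k(n) for n in population_sizes]

for n, k in zip(population_sizes, thresholds):
    print(f"N = {n:>7}  k = {k}")
```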

Fix N and Vary k

• Comparing the actual to the approximation
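
One way to make this comparison concrete is to tabulate the exact product against the approximation for a fixed population. This is a minimal sketch; 2N = 200 and the list of k values are assumed purely for illustration, and the function names are hypothetical.

```python
from math import comb


def p_distinct_exact(k, two_n):
    """Exact probability that k genes all have different parents."""
    p = 1.0
    for i in range(1, k):
        p *= 1 - i / two_n
    return p


def p_distinct_approx(k, two_n):
    """First-order approximation 1 - C(k,2)/(2N); can go negative for large k."""
    return 1 - comb(k, 2) / two_n


two_n = 200  # assumed population of gene copies, for illustration only
for k in (2, 3, 5, 10, 20, 30):
    exact = p_distinct_exact(k, two_n)
    approx = p_distinct_approx(k, two_n)
    print(f"k = {k:>2}  exact = {exact:.4f}  approx = {approx:.4f}")
```

For small k the two columns agree closely; as k grows the approximation falls below the exact value and eventually becomes negative.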


Concrete Example

• In a population of 2N = 10 the probability that 3 genes have one ancestor in the previous generation is:

$$\frac{1}{10}\cdot\frac{1}{10} = \frac{1}{100}$$

The 1st gene can choose its parent freely, while the next 2 must choose the same one.

• The probability that all 3 have a different ancestor is:

$$\frac{10}{10}\cdot\frac{9}{10}\cdot\frac{8}{10} = \frac{72}{100}$$

The 1st gene can choose its parent from the 10, while the next 2 must choose from the remainder.

• The remaining probability is that the 3 genes have two parents in the previous generation:

$$1-\frac{1}{100}-\frac{72}{100} = \frac{27}{100}$$

Example Continued

• The probability that 2 or more genes have common parents in the previous generation is:

$$\frac{27}{100}+\frac{1}{100} = \frac{28}{100}$$

The probability that 2 have a common parent plus the probability that all 3 have a common parent.

• By our approximation term the probability that two or more genes share a common parent is:

$$\binom{3}{2}\frac{1}{10} = \frac{3}{10}$$

Error in approximation for k = 3, 2N = 10:

$$\text{error} = \frac{3}{10}-\frac{28}{100} = \frac{2}{100}$$

• Leads to a MRCA estimate (the expected waiting time 1/p, in generations) of

$$\frac{1}{p} = \frac{1}{\binom{3}{2}\frac{1}{10}} = \frac{10}{3} = 3.33$$
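
The arithmetic of this worked example can be checked with exact rational numbers. The sketch below only restates the numbers above; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

two_n = 10  # population of gene copies from the example
k = 3       # sample size from the example

# All k genes share one parent: the first chooses freely, the rest must match.
p_one_parent = Fraction(1, two_n) ** (k - 1)

# All k genes have different parents.
p_all_distinct = Fraction(1)
for i in range(1, k):
    p_all_distinct *= Fraction(two_n - i, two_n)

# Exactly two parents is whatever probability remains.
p_two_parents = 1 - p_one_parent - p_all_distinct

p_coalesce_exact = p_one_parent + p_two_parents
p_coalesce_approx = Fraction(comb(k, 2), two_n)
error = p_coalesce_approx - p_coalesce_exact
expected_wait = 1 / p_coalesce_approx  # in generations

print(p_one_parent, p_all_distinct, p_two_parents)
print(p_coalesce_exact, p_coalesce_approx, error, float(expected_wait))
```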

For Large N and Small k

• For 2N > 100, the agreement improves, so long as k << 2N

• The advantage of the approximation is that it fits the “form” of a geometric distribution, and thus can be generalized to a continuous-time model


Continuous-time Coalescent

• In the Wright-Fisher model time is measured in discrete units, generations.

• A continuous-time approximation is conceptually more useful and, via the given approximation, computationally simple

• Moreover, a continuous model can be constructed that is independent of the population size (2N), so long as our sample size, k, is much smaller (one of those rare cases where a small sample size simplifies matters)

• The only time we will need to consider population size (2N) is when we want to convert from time back into generations.


Continuous-time Derivation

• As before, let $t = \frac{j}{2N}$, where j is now time measured in generations

• It follows that j = 2Nt translates continuous time, t, back into generations j. In practice floor(2Nt) is used to assign a discrete generation number.

• The waiting time, $T_k^c$, for k genes to have k − 1 or fewer ancestors is exponentially distributed, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, derived from t = j/2N, M = 2N and $p = \binom{k}{2}/2N$

• Giving:

$$P\left(T_k^c \le t\right) = 1-e^{-\binom{k}{2}t}$$

The probability that k genes will have k−1 or fewer ancestors by time t, that is, that the first coalescence happens at some time less than or equal to t
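
To see why the geometric waiting time turns into an exponential one, the sketch below compares the discrete CDF 1 − (1 − p)^j, with p = C(k,2)/2N and j = floor(2Nt), against 1 − e^(−C(k,2)t). The values k = 4, t = 0.25 and the two population sizes are assumptions chosen for illustration, and the function names are hypothetical.

```python
from math import comb, exp, floor


def geometric_cdf(k, two_n, t):
    """P(first coalescence within floor(2N t) generations) in the discrete model."""
    p = comb(k, 2) / two_n
    j = floor(two_n * t)
    return 1 - (1 - p) ** j


def exponential_cdf(k, t):
    """Continuous-time limit P(T_k <= t) = 1 - exp(-C(k,2) t)."""
    return 1 - exp(-comb(k, 2) * t)


k, t = 4, 0.25  # assumed sample size and time, for illustration only
for two_n in (100, 20_000):
    gap = abs(geometric_cdf(k, two_n, t) - exponential_cdf(k, t))
    print(f"2N = {two_n:>6}  geometric = {geometric_cdf(k, two_n, t):.5f}  "
          f"exponential = {exponential_cdf(k, t):.5f}  gap = {gap:.5f}")
```

The gap shrinks as 2N grows with k held fixed, which is the regime k << 2N the slides assume.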

Visualization

• Plots of $P\left(T_k^c \le t\right)$ for k = 3, 4, 5, 6

[Figure: curves of $P\left(T_k^c \le t\right)$ against t, labeled k=3, k=4, k=5, k=6]

Continuous Coalescent Time Scale

• In the continuous-time model, the time constant is a measure of ancestral population size, with the original at time 0, ½ the original at time 0.5, and ¼ at 1.0

[Figure: genes labeled 1 to 6; axis t marked at 0.0, 0.5, 1.0, 1.3, with the corresponding values 0, N, 2N, 2.6N on an axis labeled "Population size"]

A Coalescent Model

• The continuous coalescent lends itself to generative models

• The following algorithm constructs a plausible genealogy for n genes

• This model is backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture

1. Start with k = n genes
2. Simulate the waiting time, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, to the next event
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs
4. Merge i and j into one gene and decrease the sample size by one, k ← k − 1
5. Repeat from step 2 while k > 1
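
The five steps above can be sketched as follows. This is one possible reading of the algorithm, not the course's own code: the function name simulate_genealogy, the event-tuple layout, and the labeling of internal nodes as n+1, n+2, … are assumptions made for illustration.

```python
import random
from math import comb


def simulate_genealogy(n, seed=None):
    """Simulate one coalescent genealogy for n genes.

    Returns a list of merge events (time, child_a, child_b, parent),
    with time measured in coalescent units (multiply by 2N for generations).
    """
    rng = random.Random(seed)
    lineages = list(range(1, n + 1))  # step 1: start with k = n genes
    next_label = n + 1                # label for the next ancestor (assumed scheme)
    t = 0.0
    events = []
    while len(lineages) > 1:          # step 5: repeat while k > 1
        k = len(lineages)
        t += rng.expovariate(float(comb(k, 2)))  # step 2: T_k ~ Exp(C(k,2))
        i, j = rng.sample(range(k), 2)           # step 3: uniform random pair
        a, b = lineages[i], lineages[j]
        # step 4: merge the pair into one ancestor, so k decreases by one
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(next_label)
        events.append((t, a, b, next_label))
        next_label += 1
    return events


for time, a, b, parent in simulate_genealogy(6, seed=1):
    print(f"t = {time:.3f}: {a} and {b} merge into {parent}")
```

The time of the last event is the height of the tree, so repeated calls give samples of the tree height for a sample of size n.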

Properties of a Coalescent Tree

• The height, $H_n$, of the tree is the sum of time epochs, $T_j$, during which there are j = n, n−1, n−2, … , 2 ancestors.

• The distribution of $H_n$ amounts to a convolution of the exponential variables whose result is:

$$P\left(H_n \le t\right) = \sum_{k=1}^{n} e^{-\binom{k}{2}t}\,(-1)^{k-1}(2k-1)\frac{F(k)}{G(k)}$$

• Where

$$F(k) = n(n-1)(n-2)\cdots(n-k+1), \qquad G(k) = n(n+1)(n+2)\cdots(n+k-1)$$

• With

$$E(H_n) = \sum_{j=2}^{n} E(T_j) = 2\sum_{j=2}^{n}\frac{1}{j(j-1)} = 2\left(1-\frac{1}{n}\right)$$

$$\mathrm{Var}(H_n) = \sum_{j=2}^{n}\mathrm{Var}(T_j) = 4\sum_{j=2}^{n}\frac{1}{j^2(j-1)^2}$$

As n → ∞, $E(H_n)$ → 2, and, if n = 2, $E(H_2)$ = 1.

Thus, the waiting time for n genes to find their common ancestor is less than twice the time for 2!
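
These tree-height formulas can be evaluated directly. The sketch below computes E(H_n) and Var(H_n) as exact fractions and evaluates the alternating sum for P(H_n ≤ t); the function names are hypothetical, and the floating-point sum is only expected to be reliable for modest n because the terms alternate in sign.

```python
from fractions import Fraction
from math import comb, exp


def expected_height(n):
    """E(H_n) = sum over j = 2..n of 2 / (j (j - 1))."""
    return sum(Fraction(2, j * (j - 1)) for j in range(2, n + 1))


def variance_height(n):
    """Var(H_n) = sum over j = 2..n of 4 / (j^2 (j - 1)^2)."""
    return sum(Fraction(4, (j * (j - 1)) ** 2) for j in range(2, n + 1))


def height_cdf(n, t):
    """P(H_n <= t) from the alternating sum with F(k) and G(k)."""
    total = 0.0
    f = 1  # running product F(k) = n (n-1) ... (n-k+1)
    g = 1  # running product G(k) = n (n+1) ... (n+k-1)
    for k in range(1, n + 1):
        f *= n - k + 1
        g *= n + k - 1
        total += exp(-comb(k, 2) * t) * (-1) ** (k - 1) * (2 * k - 1) * f / g
    return total


for n in (2, 5, 10, 100):
    print(n, float(expected_height(n)), float(variance_height(n)))
print(height_cdf(5, 1.0))
```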
