evolutionary genetics: part 5 coalescent simulations · 2013-01-07 · population genetics: 4...

Post on 27-Jun-2020


Evolutionary Genetics: Part 5

Coalescent simulations

S. peruvianum

S. chilense

Winter Semester 2012-2013

Prof Aurélien Tellier

FG Populationsgenetik

Color code:

Red: important result or definition

Purple: exercise to do

Green: some bits of maths

Population genetics: 4 evolutionary forces

Random genomic processes (mutation, duplication, recombination, gene conversion)

Natural selection

Random demographic process (drift)

Random spatial process (migration)

Molecular diversity

Simulating sequence data

How to simulate?


Algorithm to generate sequence data

• Set k = n, where n is the sample size

• Draw an exponentially distributed waiting time with rate k(k-1+θ)/2

• With probability (k-1)/(k-1+θ) the event is a coalescent event, and with probability θ/(k-1+θ) it is a mutation

• If a coalescent event occurs, choose a pair of lineages to coalesce; k then becomes k-1

• If a mutation event occurs, choose a lineage to mutate; k is unchanged

• Repeat all this until k = 1
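The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by ms: the function name `simulate_sample` is hypothetical, and it assumes an infinite-sites model (every mutation hits a new site), which the algorithm itself does not state. Each lineage is stored as the set of sampled sequences that descend from it, so a mutation on a lineage is carried by exactly those sequences.

```python
import random

def simulate_sample(n, theta, seed=None):
    """Simulate one genealogy with mutations for a sample of n sequences.

    Returns (tmrca, haplotypes). Time is measured in units where a single
    pair of lineages coalesces at rate 1. Haplotypes are rows of 0/1
    values, one column per mutation (infinite-sites assumption).
    """
    rng = random.Random(seed)
    # Each lineage is the set of sampled sequences it is ancestral to.
    lineages = [frozenset([i]) for i in range(n)]
    mutations = []  # each mutation is the set of sequences carrying it
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next event, total rate k(k-1+theta)/2.
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)
        if rng.random() < (k - 1) / (k - 1 + theta):
            # Coalescent event: merge a random pair, k becomes k-1.
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
        else:
            # Mutation event: pick a random lineage, k is unchanged.
            mutations.append(rng.choice(lineages))
    haplotypes = [[1 if s in m else 0 for m in mutations] for s in range(n)]
    return t, haplotypes

if __name__ == "__main__":
    tmrca, haps = simulate_sample(4, 5.0, seed=1)
    print("time to the most recent common ancestor:", tmrca)
    for row in haps:
        print("".join(str(x) for x in row))
```

Running it with different seeds gives different trees and different numbers of segregating sites, which is the point of the exercise that follows.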

Simulations 1

What is θ?
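As a hint for this question: in ms, θ is the scaled mutation rate 4*N0*μ, where N0 is the diploid population size and μ is the neutral mutation rate per generation for the whole locus. The small sketch below only illustrates the arithmetic; the function name and the numerical values are hypothetical and are not taken from the course.

```python
def theta_for_locus(n0, mu_per_site, n_sites):
    """Scaled mutation rate theta = 4 * N0 * mu for a locus of n_sites sites."""
    mu_locus = mu_per_site * n_sites  # mutation rate of the whole locus
    return 4 * n0 * mu_locus

if __name__ == "__main__":
    # Hypothetical values: N0 = 10,000, 1e-8 mutations per site per generation,
    # and a locus of 1,000 sites.
    print(theta_for_locus(10_000, 1e-8, 1_000))
```

Different combinations of N0 and μ give the same θ, so θ alone does not tell you the population size.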

Do you see the same numbers? Why?

pdf(file="constant_tree.pdf")

dev.off()

4 -t 5 -T > treefile.tre

Simulations 1: neutral and constant size

Simulations 2: neutral and expansion

t1 = 0.5 = time at which the expansion starts in the past

x = 0.1 = the population in the past is 0.1*N0

Present population size = N0

Ancestral population size = x*N0

Time t1 of expansion, in units of 4N0 generations

Do you see a problem? What is N0?
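Because ms measures time in units of 4N0 generations, a time such as t1 = 0.5 only becomes a number of generations once a value of N0 is assumed. The sketch below shows the conversion in both directions; the function names and the value N0 = 10,000 are hypothetical and serve only as an illustration.

```python
def ms_time_to_generations(t, n0):
    """Convert a time in units of 4*N0 generations into generations."""
    return t * 4 * n0

def generations_to_ms_time(generations, n0):
    """Convert a number of generations into units of 4*N0 generations."""
    return generations / (4 * n0)

if __name__ == "__main__":
    n0 = 10_000  # hypothetical present-day population size
    print(ms_time_to_generations(0.5, n0))     # t1 = 0.5 expressed in generations
    print(generations_to_ms_time(20_000, n0))  # and back again
```

The same t1 corresponds to very different numbers of generations for different N0, which is why N0 has to be estimated or assumed before the timing of the expansion can be interpreted.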


-eN 0.5 0.1

0.5 = time at which the expansion starts in the past

0.1 = the population in the past is 0.1*N0

-eN 0.05 0.1 -T > expansion.tre

Simulations 2: trees of expansion

pdf(file="expansion-tree.pdf")

dev.off()

expansion.tre

Simulations 3: crash or bottleneck?

For a crash:

./ms 10 4 -t 5 -eN 0.5 5

Present population size = N0

Ancestral population size = x*N0

Time t1 of the size change, in units of 4N0 generations


For a bottleneck:

./ms 10 4 -t 5 -eN 0.5 0.25 -eN 0.75 2

Present population size = N0

Ancestral population size = x2*N0

Time t1

Time t2

Bottleneck population size = x1*N0

-eN t1 x1 -eN t2 x2
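To check how a list of -eN options is read, it can help to write out the population size history they imply. In ms, each -eN t x sets the population size to x*N0 at time t, going backward in time. The sketch below is only an illustration: the function name `relative_size_at` and the list-of-pairs representation are hypothetical.

```python
def relative_size_at(t, events):
    """Return N(t)/N0 at time t (backward in time, units of 4*N0 generations).

    `events` is a list of (time, x) pairs, one per -eN option.
    """
    size = 1.0  # the present population size is N0
    for t_event, x in sorted(events):
        if t >= t_event:
            size = x  # the most recent event at or before t sets the size
    return size

if __name__ == "__main__":
    crash = [(0.5, 5.0)]                     # -eN 0.5 5
    bottleneck = [(0.5, 0.25), (0.75, 2.0)]  # -eN 0.5 0.25 -eN 0.75 2
    for t in (0.2, 0.6, 1.0):
        print(t, relative_size_at(t, crash), relative_size_at(t, bottleneck))
```

Read backward in time, the bottleneck example starts at N0, drops to 0.25*N0 between t1 and t2, and is 2*N0 before t2.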


Exercise

Summarize the ms output


Then save the output in a file:

> test1.out


Now using R

Load the file:

test <- read.table("test1.out", header=FALSE)

Then draw graphs:

pdf(file="summary_neutral_constant.pdf")

hist(test[,2], main="Theta_Pi Tajima")

hist(test[,4], main="Theta_Watterson")

hist(test[,6], main="Tajima D")

dev.off()

Then do the same for an expansion, decline or bottleneck
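To see what the plotted statistics measure, here is a minimal sketch that computes θπ, θW and Tajima's D from one sample using the standard formulas (Tajima 1989). It assumes the sample is given as a list of rows of 0/1 values, one row per sequence and one column per segregating site. That input format and the function name `summary_statistics` are assumptions made for this example, not the program used in the exercise.

```python
import math

def summary_statistics(haplotypes):
    """Return (theta_pi, theta_w, tajima_d) for a 0/1 haplotype matrix."""
    n = len(haplotypes)
    # Number of derived alleles (1s) at each site; keep segregating sites only.
    counts = [sum(site) for site in zip(*haplotypes)]
    counts = [j for j in counts if 0 < j < n]
    s = len(counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Mean number of pairwise differences.
    theta_pi = sum(2.0 * j * (n - j) for j in counts) / (n * (n - 1))
    # Watterson's estimator from the number of segregating sites.
    theta_w = s / a1
    # Variance terms of Tajima's D.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return theta_pi, theta_w, float("nan")  # D is undefined here
    return theta_pi, theta_w, (theta_pi - theta_w) / math.sqrt(variance)

if __name__ == "__main__":
    sample = [[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
    print(summary_statistics(sample))
```

Tajima's D is the normalized difference between the two estimates of θ, so it is close to zero on average under the neutral constant-size model and shifts when the demography changes.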


Final simulations

• Use msmsplay on your computer

• The command line is similar

• You can see the site frequency spectrum directly

• Can you compare the site frequency spectrum with the values of Tajima's D?

• Let's simulate the neutral model, an expansion and a decline

• What differences do we see?
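For the comparison asked for above, the site frequency spectrum can also be computed by hand. The sketch below assumes the sample is a list of rows of 0/1 values (1 = derived allele) and compares the observed counts with the neutral constant-size expectation, in which the expected number of sites where the derived allele appears i times is θ/i. The function names are hypothetical and this is not how msmsplay computes its display.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 is the number of sites with i derived alleles."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):
        j = sum(site)
        if 0 < j < n:  # ignore sites that do not segregate in the sample
            sfs[j - 1] += 1
    return sfs

def expected_neutral_sfs(n, theta):
    """Expected SFS under the neutral constant-size model: theta / i."""
    return [theta / i for i in range(1, n)]

if __name__ == "__main__":
    sample = [[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
    print(site_frequency_spectrum(sample))
    print(expected_neutral_sfs(4, 6.0))
```

An excess of low-frequency variants relative to θ/i tends to go with negative values of Tajima's D, and a deficit with positive values.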

Some data analysis

• Use datasets:

• Use DnaSP to calculate the usual statistics:

• Diversity: θW, θπ

• Site frequency spectrum

• Tajima's D

• What do you conclude from these data?

• Do you have an idea of the past demography of these populations?

• Why do you need several independent loci?
