genealogical trees, coalescent theory, and the analysis of ... › ~epxing › cbml › coalescent1...
TRANSCRIPT
![Page 1: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/1.jpg)
Genealogical trees, coalescent theory,and the analysis of genetic
polymorphisms
Magnus Nordborg
University of Southern California
1
![Page 2: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/2.jpg)
The importance of history
• Genetic polymorphism data represent the outcome ofa single, highly complex, non-repeatable evolutionaryhistory
• Traditional analysis methods cannot take this intoaccount
• The stochastic process known as “the coalescent”presents a coherent statistical framework foranalyzing genetic polymorphism data
2
![Page 3: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/3.jpg)
The importance of history: mutations arerandom
G T
G T
T TG G GGGG
MRCA
3
![Page 4: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/4.jpg)
The importance of history: trees are random
4
![Page 5: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/5.jpg)
Modeling genetic polymorphism
At a minimum, models must include:
• coalescence (who begat whom, and when)
• mutation
• recombination
5
![Page 6: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/6.jpg)
Recombinationmakes it possiblefor linked sites to
have differentgenealogies
induced trees
breakpoint
recombination
coalescence
6
![Page 7: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/7.jpg)
What is the coalescent?
• The coalescent is a stochastic process that iswell-suited for modeling polymorphism data
• It is a natural extension to classical populationgenetics models
7
![Page 8: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/8.jpg)
Coalescence: picking parents
N = 10th
e pa
st
n = 3
T(3)
T(2)
8
![Page 9: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/9.jpg)
The rate of coalescence
The rate at which lineages find each other depends on:
• The population size: the per-generation probability ofcoalescence is ∝ 1/N
• The number of lineages: the rate of coalescence whenthere are k lineages is
(k2
)• A number of other demographic factors, such as
inbreeding, age structure, and the variance inreproductive success
Because the per-generation probability of coalescence is onthe order of 1/N , we use a continuous-time approximationwhere time is measured in units of N generations
9
![Page 10: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/10.jpg)
Mutation
• Selectively neutral mutations are added to thebranches of the tree afterwards according to a ratethat depends on the per-generation probability ofmutation
• The expected number of mutations on a branchdepends on its length — the expected number ofmutations on the tree depends on the total branchlength of the tree
• Any mutation model can be used
10
![Page 11: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/11.jpg)
Recombination
• Recombination breaks up lineages according to a ratethat depends on the per-generation probability ofrecombination
• There will be more recombination in the genealogy ofa longer chromosomal segment
• Any recombination model can be used
• The coalescent with recombination generates arandom graph — or a forest of trees
11
![Page 12: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/12.jpg)
A graph or a forest. . .
induced trees
breakpoint
recombination
coalescence
12
![Page 13: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/13.jpg)
A walk through tree space
0 0.2 0.4 0.6 0.8 1
1.5
2
2.5
3
3.5
4
chromosomal position
time
to M
RCA
13
![Page 14: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/14.jpg)
The trees are correlated
0 0.2 0.4 0.6 0.8 1
1.5
2
2.5
3
3.5
4
1 3 2 654 621543
14
![Page 15: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/15.jpg)
The trees are correlated
0 0.2 0.4 0.6 0.8 1
1.5
2
2.5
3
3.5
4
13 2 654 621543 621543
15
![Page 16: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/16.jpg)
Recombination is common
0 0.2 0.4 0.6 0.8 1
1.5
2
2.5
3
3.5
4
this may be 10 kb!
these arejunctions
these are mutations
16
![Page 17: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/17.jpg)
Recombination is as common as mutation
• If 1 cM ∼ 1 Mb, then the probability ofrecombination per bp per generation is ∼ 10−8
• The probability of mutation per bp per generation isestimated to be at most 10−8
• It follows that a sample of sequences will contain asmany junctions as polymorphisms
17
![Page 18: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/18.jpg)
Genealogical graphs can in general not bereconstructed
• Even with infinitely many polymorphisms, asubstantial fraction of all junctions would not bedetected
• In reality, there are clearly too few polymorphisms perjunction to estimate the graph
• Remember: a phylogenetic algorithm will alwaysreconstruct a tree, regardless of whether there existsa tree to be reconstructed. . .
18
![Page 19: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/19.jpg)
We do not in general wish to reconstructgenealogical graphs
• Population genetics is not phylogenetics!
• Gene genealogies are of no interest per se — they arerandom outcomes of an underlying evolutionaryprocess, and are of interest only insofar as theycontain information about this process
19
![Page 20: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/20.jpg)
Gene trees and species trees
Phylogenetic methods estimatespecies trees by estimating genetrees; they are appropriate if andonly if the latter are stronglycorrelated with the former
20
![Page 21: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/21.jpg)
Phylogenetic methods are not applicable towithin-species data
Africa Europe Asia
Africa
Africa Europe Asia
Africa
Africa Europe Asia
Africa
a) Out-of-AfricaModel
b) MultiregionalModel
c) CandelabraModel
millionyears ago
1
0
0.5
migration
Africans
Schematic version of the human mtDNA
tree
non-Africans
• We must consider the likelihood of the data underalternative models
21
![Page 22: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/22.jpg)
A likelihood framework
Phylogenetics:L = P(D|G, µ)
Population genetics:
L =∑G
P(D|G, µ)P(G, α)
Here D is the data, G the genealogy, µ the mutationmodel, and α the demographic model
Note that G is a nuisance parameter in population genetics
22
![Page 23: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/23.jpg)
Uses of the coalescent
• A mathematical modeling tool
• A simulation tool for hypothesis testing andexploratory data analysis
• The basis for full likelihood inference
23
![Page 24: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/24.jpg)
The simplicity and elegance of the coalescentprocess makes it a powerful modeling tool
At least for the standard coalescent, it is often possible toderive results analytically
• Estimators and test, e.g., Tajima’s D statistic
• Illuminating theoretical results, e.g., the probabilitythat a sample of size n contains the MRCA of theentire population is
n− 1n + 1
24
![Page 25: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/25.jpg)
Almost any scenario can be simulated usingthe coalescent
• Coalescent simulations are enormously more efficientthan classical methods
• Simulated data can be compared with real data — orused to evaluate the feasibility of a study before it iscarried out
25
![Page 26: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/26.jpg)
Example: ancient Neanderthal mtDNA
986 modern humans
Neanderthalts Te
Tr• Modern humans
monophyletic
• Tr > 4Te
Does this prove that Neanderthals and modern humans didnot interbreed?
26
![Page 27: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/27.jpg)
Example: ancient Neanderthal mtDNA
986 modern humans
Neanderthalts Te
Tr
Med
iterra
nean
Assuming that they didinterbreed, what is theprobability of getting atree like the oneobserved just bychance?
Coalescent simulations showed that this probability is higheven for large amounts of interbreeding
27
![Page 28: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/28.jpg)
Full likelihood analysis
• In principle possible
• In practice difficult
• Unless major breakthroughs are made, not likely to beapplicable to genomic polymorphism data
28
![Page 29: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/29.jpg)
What is the main insight from coalescenttheory?
That very large numbers of loci arerequired to answer most questions!
29
![Page 30: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/30.jpg)
Population Genomics is upon us!
• Data sets containing 100’s and 1000’s of loci alreadyexist
• Within 10 years, it seems likely that whole-genomecomparisons between species will be common, andthat we will have whole genome sequences from1000’s of humans
30
![Page 31: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/31.jpg)
Less assumptions — more data
We will be able to use empirically estimated distributions oftest statistics rather than theoretically predicted ones
200 300 400 500 600position in kb
-2
-1
0
1
2
D
31
![Page 32: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/32.jpg)
Selective sweeps
• Fixation of new alleles leaves a footprint in thepattern of genomic variation
• Can we find the genes that “make us human”
Selection
Advantageous variant
32
![Page 33: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/33.jpg)
How many genes?
33
![Page 34: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/34.jpg)
Teosinte to corn: < 10,000 years; five genes?
teosinte maize maize with tb1 mutation
34
![Page 35: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/35.jpg)
What’s the use polymorphism data?
• Whole-genome properties
– demographic (sensu lato) history
– molecular evolution
– genetic mechanisms
• The history of individual loci — selection
– divergence between human and other primates
– traces of selection within the last million years
35
![Page 36: Genealogical trees, coalescent theory, and the analysis of ... › ~epxing › CBML › coalescent1 › gpl2002.pdf · Genealogical trees, coalescent theory, and the analysis of genetic](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f04979e7e708231d40ebbe0/html5/thumbnails/36.jpg)
The history and future of multi-locusmethods
36