Phylogeny Tree Reconstruction
Post on 19-Dec-2015
![Page 1: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/1.jpg)
Phylogeny Tree Reconstruction
[Figure: tree diagrams with leaves labeled 1 through 5]
![Page 2: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/2.jpg)
Final Exam
• 24-hour, take-home exam
• More straightforward questions than in the homeworks
• Please email Michael and Serafim by Friday with your preferred day to take the exam
• Exam starts Sunday, …, Thursday noon; ends Monday, ..., Friday noon
![Page 3: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/3.jpg)
Number of labeled unrooted tree topologies
• How many possibilities are there for leaf 4?
[Figure: unrooted tree on leaves 1, 2, 3, with candidate positions for leaf 4]
![Page 4: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/4.jpg)
Number of labeled unrooted tree topologies
• How many possibilities are there for leaf 4?
For the 4th leaf, there are 3 possibilities
[Figure: unrooted tree with leaves 1, 2, 3, 4]
![Page 5: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/5.jpg)
Number of labeled unrooted tree topologies
• How many possibilities are there for leaf 5?
For the 5th leaf, there are 5 possibilities
[Figure: unrooted tree with leaves 1, 2, 3, 4, 5]
![Page 6: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/6.jpg)
Number of labeled unrooted tree topologies
• How many possibilities are there for leaf 6?
For the 6th leaf, there are 7 possibilities
[Figure: unrooted tree with leaves 1, 2, 3, 4, 5]
![Page 7: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/7.jpg)
Number of labeled unrooted tree topologies
• How many possibilities are there for leaf n?
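For the nth leaf, there are 2n – 5 possibilities: an unrooted binary tree on n – 1 leaves has 2(n – 1) – 3 = 2n – 5 edges, and the new leaf can attach to any one of them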
[Figure: unrooted tree with leaves 1, 2, 3, 4, 5]
![Page 8: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/8.jpg)
Number of labeled unrooted tree topologies
• #unrooted trees for n taxa: $(2n-5)(2n-7)\cdots 3 \cdot 1 = (2n-5)! / [2^{n-3}(n-3)!]$
• #rooted trees for n taxa: $(2n-3)(2n-5)(2n-7)\cdots 3 = (2n-3)! / [2^{n-2}(n-2)!]$
[Figure: unrooted tree with leaves 1, 2, 3, 4, 5]
N = 10: #unrooted: 2,027,025; #rooted: 34,459,425
N = 30: #unrooted: $8.7 \times 10^{36}$; #rooted: $4.95 \times 10^{38}$
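The counts above can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not from the slides, and the formulas are assumed to apply for n ≥ 3 (unrooted) and n ≥ 2 (rooted).

```python
def double_factorial_odd(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n):
    """Number of labeled unrooted binary topologies on n leaves (n >= 3)."""
    return double_factorial_odd(2 * n - 5)


def num_rooted_trees(n):
    """Number of labeled rooted binary topologies on n leaves (n >= 2)."""
    return double_factorial_odd(2 * n - 3)


if __name__ == "__main__":
    for n in (4, 5, 6, 10, 30):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Python integers have arbitrary precision, so the N = 30 values are computed exactly rather than as floating-point approximations.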
![Page 9: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/9.jpg)
Search through tree topologies: Branch and Bound
Observation: adding an edge to an existing tree can only increase the parsimony cost or leave it unchanged
Enumerate all unrooted trees with at most N leaves, encoded as a vector of counters:
$[i_3][i_5][i_7] \dots [i_{2N-5}]$
where each $i_k$ can take values from 0 (no edge) to k
At each point keep C = smallest cost so far for a complete tree
Start B&B with tree [1][0][0]……[0]
Whenever cost of current tree T is > C, then:
• T is not optimal
• Any tree extending T with more edges is not optimal
• Increment by 1 the rightmost nonzero counter
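The pruning idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the slide's exact counter scheme: it uses recursion over edge lists instead of the counter vector, scores trees with Fitch's small-parsimony algorithm (a choice made here, the slide does not name a scoring routine), and all function names are hypothetical. Leaves are numbered 0..N–1 and internal nodes get ids from N upward.

```python
def fitch_column(adj, node, parent, seqs, col):
    """Fitch pass: return (state set, change count) for the subtree below node."""
    children = [c for c in adj[node] if c != parent]
    if not children:  # leaf: its observed character
        return {seqs[node][col]}, 0
    (s1, c1), (s2, c2) = [fitch_column(adj, c, node, seqs, col) for c in children]
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disjoint sets cost one change


def parsimony_cost(edges, seqs):
    """Parsimony cost of an unrooted binary tree given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    start = adj[0][0]  # root the tree on the edge leading to leaf 0
    total = 0
    for col in range(len(seqs[0])):
        states, changes = fitch_column(adj, start, 0, seqs, col)
        total += changes + (seqs[0][col] not in states)
    return total


def branch_and_bound(seqs):
    """Return (best cost, edge list) over all unrooted topologies; needs >= 3 taxa."""
    n = len(seqs)
    best = [float("inf"), None]  # C and the tree that achieved it

    def extend(edges, next_leaf, next_id):
        cost = parsimony_cost(edges, seqs)
        if cost > best[0]:
            return  # prune: no extension of this tree can beat C
        if next_leaf == n:
            best[0], best[1] = cost, edges  # complete tree, update C
            return
        for i, (u, v) in enumerate(edges):
            # Attach the next leaf in the middle of edge (u, v).
            w = next_id
            grown = edges[:i] + edges[i + 1:] + [(u, w), (w, v), (w, next_leaf)]
            extend(grown, next_leaf + 1, next_id + 1)

    extend([(n, 0), (n, 1), (n, 2)], 3, n + 1)  # the single 3-leaf tree
    return best[0], best[1]


if __name__ == "__main__":
    print(branch_and_bound(["AA", "AA", "CC", "CC"]))
```

Each partial tree with k leaves has 2k – 3 edges, so the loop over `edges` plays the role of one counter in the vector above.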
![Page 10: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/10.jpg)
Bootstrapping to get the best trees
Main outline of algorithm
1. Select random columns from a multiple alignment – one column can then appear several times
2. Build a phylogenetic tree based on the random sample from (1)
3. Repeat (1), (2) many (say, 1000) times
4. Output the tree that is constructed most frequently
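The four steps above can be sketched as follows. This is a hedged illustration: `build_tree` is a hypothetical user-supplied function (any tree-building method) that must return a hashable, canonical representation of the topology, and the sketch assumes each replicate draws as many columns as the original alignment has.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Step 1: draw columns with replacement; a column may appear several times."""
    m = len(alignment[0])
    picks = [rng.randrange(m) for _ in range(m)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def bootstrap(alignment, build_tree, replicates=1000, seed=0):
    """Steps 2-4: build a tree per replicate and return (most frequent tree, count)."""
    rng = random.Random(seed)
    counts = Counter(
        build_tree(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Toy stand-in for a real tree builder: report which sequence best matches
    # the first one (hypothetical, for demonstration only).
    def closest_to_first(aln):
        def matches(s):
            return sum(a == b for a, b in zip(aln[0], s))
        return max(range(1, len(aln)), key=lambda i: matches(aln[i]))

    print(bootstrap(["ACGTAC", "ACGTTC", "TTGACC"], closest_to_first, replicates=100))
```

The count returned alongside the winning tree, divided by the number of replicates, gives a rough measure of how strongly the data support it.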
![Page 11: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/11.jpg)
Probabilistic Methods
A more refined measure of evolution along a tree than parsimony
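$P(x_1, x_2, x_{root} \mid t_1, t_2) = P(x_{root})\, P(x_1 \mid t_1, x_{root})\, P(x_2 \mid t_2, x_{root})$
If we use Jukes-Cantor, for example, and $x_1 = x_{root} = A$, $x_2 = C$, $t_1 = t_2 = 1$,
$= p_A \cdot \tfrac{1}{4}(1 + 3e^{-4\alpha}) \cdot \tfrac{1}{4}(1 - e^{-4\alpha}) = (\tfrac{1}{4})^3 (1 + 3e^{-4\alpha})(1 - e^{-4\alpha})$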
[Figure: root xroot with children x1 and x2 on branches of length t1 and t2]
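The two-leaf example can be evaluated numerically with a short sketch. It assumes the Jukes-Cantor transition probabilities ¼(1 + 3e^{-4αt}) for no change and ¼(1 – e^{-4αt}) for a change, with a uniform root distribution of ¼; the function names and the value α = 0.1 are illustrative only.

```python
import math


def jc_prob(a, b, t, alpha):
    """Jukes-Cantor probability of observing b after time t, starting from a."""
    decay = math.exp(-4.0 * alpha * t)
    if a == b:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def two_leaf_joint(x1, x2, xroot, t1, t2, alpha):
    """P(x1, x2, xroot | t1, t2) with a uniform prior of 1/4 on the root."""
    return 0.25 * jc_prob(xroot, x1, t1, alpha) * jc_prob(xroot, x2, t2, alpha)


if __name__ == "__main__":
    alpha = 0.1  # illustrative rate, not from the slides
    print(two_leaf_joint("A", "C", "A", 1.0, 1.0, alpha))
```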
![Page 12: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/12.jpg)
Probabilistic Methods
• If we know all internal labels $x_u$,
$P(x_1, x_2, \dots, x_N, x_{N+1}, \dots, x_{2N-1} \mid T, t) = P(x_{root}) \prod_{j \neq root} P(x_j \mid x_{parent(j)}, t_{j,parent(j)})$
• Usually we don’t know the internal labels, therefore
$P(x_1, x_2, \dots, x_N \mid T, t) = \sum_{x_{N+1}} \sum_{x_{N+2}} \cdots \sum_{x_{2N-1}} P(x_1, x_2, \dots, x_{2N-1} \mid T, t)$
[Figure: tree with root $x_{root} = x_{2N-1}$, leaves $x_1, x_2, \dots, x_N$, and an internal node $x_u$]
![Page 13: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/13.jpg)
Computing the Likelihood of a Tree
• Define $P(L_k \mid a)$: probability of subtree rooted at $x_k$, given that $x_k = a$
• Then, $P(L_k \mid a) = \left(\sum_b P(L_i \mid b)\, P(b \mid a, t_{ki})\right)\left(\sum_c P(L_j \mid c)\, P(c \mid a, t_{kj})\right)$
[Figure: node $x_k$ with daughter nodes $x_i$ and $x_j$ on branches of length $t_{ki}$ and $t_{kj}$]
![Page 14: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/14.jpg)
Felsenstein’s Likelihood Algorithm
To calculate $P(x_1, x_2, \dots, x_N \mid T, t)$
Initialization: Set k = 2N – 1
Recursion: Compute $P(L_k \mid a)$ for all a
• If k is a leaf node: Set $P(L_k \mid a) = 1(a = x_k)$
• If k is not a leaf node:
1. Compute $P(L_i \mid b)$, $P(L_j \mid b)$ for all b, for daughter nodes i, j
2. Set $P(L_k \mid a) = \sum_{b,c} P(b \mid a, t_{ki})\, P(L_i \mid b)\, P(c \mid a, t_{kj})\, P(L_j \mid c)$
Termination:
Likelihood at this column = $P(x_1, x_2, \dots, x_N \mid T, t) = \sum_a P(L_{2N-1} \mid a)\, P(a)$
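The recursion above can be sketched in Python for a single alignment column. The assumptions are made here and are not from the slides: the substitution model is Jukes-Cantor with a uniform root distribution, a leaf is written as a one-letter string, an internal node as a tuple `(left, t_left, right, t_right)`, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(a, b, t, alpha):
    """Jukes-Cantor P(b | a, t)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


def subtree_likelihood(node, alpha):
    """Return {a: P(L_k | a)} for the subtree rooted at node."""
    if isinstance(node, str):
        # Leaf: indicator 1(a = x_k).
        return {a: 1.0 if a == node else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    like_left = subtree_likelihood(left, alpha)
    like_right = subtree_likelihood(right, alpha)
    result = {}
    for a in BASES:
        sum_left = sum(jc_prob(a, b, t_left, alpha) * like_left[b] for b in BASES)
        sum_right = sum(jc_prob(a, c, t_right, alpha) * like_right[c] for c in BASES)
        result[a] = sum_left * sum_right
    return result


def column_likelihood(tree, alpha):
    """Termination step: sum over root states with P(a) = 1/4."""
    return sum(0.25 * p for p in subtree_likelihood(tree, alpha).values())


if __name__ == "__main__":
    tree = (("A", 0.5, "A", 0.5), 1.0, "C", 1.5)  # three leaves, illustrative lengths
    print(column_likelihood(tree, alpha=0.1))
```

Because each node is visited once and does a constant amount of work per pair of states, the cost is linear in the number of nodes, in contrast to summing over every assignment of internal labels.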
![Page 15: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/15.jpg)
Probabilistic Methods
Given M (ungapped) alignment columns of N sequences,
• Define likelihood of a tree:
$L(T, t) = P(\text{Data} \mid T, t) = \prod_{m=1}^{M} P(x_{1m}, \dots, x_{Nm} \mid T, t)$
Maximum Likelihood Reconstruction:
• Given data X = (xij), find a topology T and length vector t that maximize likelihood L(T, t)
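Since L(T, t) is a product of M per-column probabilities, it is usually accumulated as a sum of logarithms to avoid floating-point underflow. A minimal sketch, where `column_likelihood` is a hypothetical callable that returns P(column | T, t) for a fixed tree and branch lengths:

```python
import math


def log_likelihood(columns, column_likelihood):
    """Return log L(T, t) as the sum of per-column log probabilities."""
    return sum(math.log(column_likelihood(col)) for col in columns)


if __name__ == "__main__":
    # Toy stand-in: every column gets probability 0.01 (illustrative only).
    columns = ["AAC", "ACC", "GGG"]
    print(log_likelihood(columns, lambda col: 0.01))
```

Maximizing the log-likelihood gives the same T and t as maximizing L(T, t), because the logarithm is monotonic.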
![Page 16: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/16.jpg)
Some new sequencing technologies
![Page 17: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/17.jpg)
Molecular Inversion Probes
![Page 18: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/18.jpg)
Molecular Inversion Probes
![Page 19: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/19.jpg)
Single Molecule Array for Genotyping—Solexa
![Page 20: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/20.jpg)
Nanopore Sequencing
http://www.mcb.harvard.edu/branton/index.htm
![Page 21: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/21.jpg)
Nanopore Sequencing
http://www.mcb.harvard.edu/branton/index.htm
![Page 22: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/22.jpg)
Nanopore Sequencing—Assembly
• Resulting reads are likely to look different from Sanger reads:
  • Long (perhaps 10,000 bp to 1,000,000 bp)
  • High error rate (perhaps 10% – 30%)
  • Two colors?
    • A / CTG
    • AT / CG
    • AG / CT
• How can we assemble under such conditions?
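To make the two-color idea concrete, the sketch below projects a DNA string onto a two-letter alphabet for each of the partitions listed above. It assumes "two colors" means the read only distinguishes one group of bases from the rest; the function name and the 0/1 encoding are illustrative.

```python
def two_color(read, group):
    """Map each base to '0' if it belongs to group, otherwise '1'."""
    return "".join("0" if base in group else "1" for base in read)


if __name__ == "__main__":
    read = "ACGTTGCA"
    for group in ("A", "AT", "AG"):
        print(group, two_color(read, group))
```

Many different DNA sequences map to the same two-color string, which is part of why assembly from such reads is an open question.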
![Page 23: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/23.jpg)
Pyrosequencing
![Page 24: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/24.jpg)
Pyrosequencing on a chip
Mostafa Ronaghi, Stanford Genome Technologies Center
454 Life Sciences
![Page 25: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/25.jpg)
Pyrosequencing Signal
![Page 26: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/26.jpg)
Pyrosequencing—Assembly
• Resulting reads are likely to look different from Sanger reads:
  • Short (currently 100 to 200 bp)
  • Low error rates, except in homopolymeric runs (AAA…, CCC…, etc.)
  • Currently, it is not known how to do paired reads on a chip
![Page 27: Phylogeny Tree Reconstruction 1 4 3 2 5 1 4 2 3 5](https://reader034.vdocuments.mx/reader034/viewer/2022051415/56649d385503460f94a118e5/html5/thumbnails/27.jpg)
Polony Sequencing