![Page 1: Phylogeny and Molecular Evolution - BGUbccg161/wiki.files/phylogeny...• Bioinformatics Algorithms book by Phillip Compeau and Pavel Pvzner –all book photos shown in this lecture](https://reader033.vdocuments.mx/reader033/viewer/2022042622/5fa3f868f2d035363b1ee327/html5/thumbnails/1.jpg)
Phylogeny and Molecular Evolution
Distance-Based Phylogeny
Credit
• Serafim Batzoglou (UPGMA slides) http://www.stanford.edu/class/cs262/Slides
• Notes by Nir Friedman, Dan Geiger, Shlomo Moran, Ron Shamir, Sagi Snir, Michal Ziv-Ukelson
• Durbin et al.
• Jones and Pevzner’s lecture notes
• Bioinformatics Algorithms book by Phillip Compeau and Pavel Pevzner (all book photos shown in this lecture are from there)
Phylogenetic Trees
Phylogenetic Trees are Unordered
Phylogenetic Trees Can Be Rooted or Unrooted
Types of Tree Reconstruction
• Character-based
  • Input is a multiple alignment of the sequences at the leaves; the goal is to find the topology that best explains the evolution of the leaf sequences via mutations.
• Distance-based
  • Input is a matrix of pairwise distances between species.
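The slides do not say how the distance matrix is obtained. One simple option, assumed here purely for illustration, is the p-distance: the fraction of aligned columns at which two sequences differ. In this sketch the function name `p_distance_matrix` and the toy alignment are hypothetical; gap characters are treated like any other symbol, and no evolutionary-model correction is applied.

```python
from itertools import combinations


def p_distance_matrix(alignment: list[str]) -> list[list[float]]:
    """Pairwise p-distances: fraction of aligned columns where two rows differ."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        mismatches = sum(a != b for a, b in zip(alignment[i], alignment[j]))
        D[i][j] = D[j][i] = mismatches / length  # symmetric, zero diagonal
    return D


# Hypothetical toy alignment, not taken from the lecture
alignment = ["ACGTAC", "ACGTTC", "AGGTTC"]
D = p_distance_matrix(alignment)
```

The resulting `D` is the kind of input a distance-based method expects; more careful pipelines usually correct p-distances for multiple substitutions at the same site.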
Distance-Based Tree Reconstruction
Distances in Trees
• Edges may have weights, which reflect:
• the number of mutations on the evolutionary path from one species to another, or
• an estimate of the time it took one species to evolve into another.
• In a tree T with n leaves, we often compute the length of the path between leaves i and j, denoted $d_{ij}(T)$.
• $d_{ij}$ is the distance between i and j: the sum of the weights of the edges on the path between i and j.
Distances in Trees (cont’d)
For i = 1 and j = 4, the distance is:
$d(1,4) = 12 + 13 + 14 + 17 + 13 = 69$
[Figure: weighted tree with leaves i = 1 and j = 4 marked]
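A minimal sketch of computing $d_{ij}(T)$: store the tree as an adjacency map and walk the unique path from i to j, summing edge weights along the way. The tree is hypothetical. Only the path weights 12, 13, 14, 17, 13 come from the example above; the internal node names `a` to `d` and the side branches to leaves 2 and 3 are invented.

```python
from collections import defaultdict


def build_tree(edges):
    """Undirected weighted tree as an adjacency map: node -> {neighbor: weight}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def path_length(adj, i, j):
    """d_ij(T): sum of edge weights on the unique path between nodes i and j."""
    stack = [(i, None, 0)]  # (node, node we came from, distance from i)
    while stack:
        node, came_from, dist = stack.pop()
        if node == j:
            return dist
        for nbr, w in adj[node].items():
            if nbr != came_from:  # in a tree, never stepping back avoids cycles
                stack.append((nbr, node, dist + w))
    raise KeyError(f"no path from {i!r} to {j!r}")


# Hypothetical tree: the path 1-a-b-c-d-4 uses the weights from the slide;
# internal nodes a..d and the branches to leaves 2 and 3 are invented.
edges = [(1, "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17),
         ("d", 4, 13), ("a", 2, 5), ("c", 3, 8)]
tree = build_tree(edges)
print(path_length(tree, 1, 4))  # 69
```

Because a tree has exactly one path between any two nodes, the first time the walk reaches j its accumulated weight is $d_{ij}(T)$.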
Additive Distance Matrices
A distance matrix D is called ADDITIVE if there exists a tree T with $d_{ij}(T) = D_{ij}$ for every pair of leaves i, j.
Additive Distance Matrices
Is this matrix additive???
Additive Distance Matrices
A distance matrix is called ADDITIVE if there exists a tree T with $d_{ij}(T) = D_{ij}$, and NONADDITIVE otherwise.
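The slides leave the question "is this matrix additive?" to the figures. One standard test, not covered in the slides, is the four-point condition: a symmetric, non-negative matrix with zero diagonal is additive exactly when, for every four leaves i, j, k, l, the two largest of $D_{ij}+D_{kl}$, $D_{ik}+D_{jl}$, $D_{il}+D_{jk}$ are equal. The sketch below checks this by brute force in $O(n^4)$ time; allowing repeated indices also enforces the triangle inequality. The function name and both matrices are hypothetical.

```python
from itertools import product


def is_additive(D, tol=1e-9):
    """Four-point condition: for all i, j, k, l the two largest of
    D[i][j]+D[k][l], D[i][k]+D[j][l], D[i][l]+D[j][k] must be equal."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:
            return False
        for j in range(n):
            if D[i][j] < -tol or abs(D[i][j] - D[j][i]) > tol:
                return False
    # Repeated indices are included on purpose: they enforce the triangle inequality.
    for i, j, k, l in product(range(n), repeat=4):
        sums = sorted((D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True


# Hypothetical matrices. D_tree comes from the tree a:1, b:2 joined at u;
# c:3, d:4 joined at v; internal edge u-v of length 5.
D_tree = [[0, 3, 9, 10],
          [3, 0, 10, 11],
          [9, 10, 0, 7],
          [10, 11, 7, 0]]
# Same matrix with d(b, c) changed from 10 to 12.
D_other = [[0, 3, 9, 10],
           [3, 0, 12, 11],
           [9, 12, 0, 7],
           [10, 11, 7, 0]]
print(is_additive(D_tree), is_additive(D_other))  # True False
```

For the second matrix, the quadruple (a, b, c, d) gives sums 10, 20 and 22; the two largest differ, so no tree can reproduce it.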
Additive Matrices Have a Simple Tree Fitting
Distance-Based Phylogeny Problem
• Goal: reconstruct an evolutionary tree from a distance matrix
• Input: n × n distance matrix D
• Output: weighted unrooted (or rooted) tree T with n leaves fitting D, i.e. $d_{ij}(T) = D_{ij}$ for all i, j
• If D is additive, this problem has a solution, and there are simple algorithms to solve it (we will not learn them in class)
• However, D is usually not additive
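To make "T fits D" concrete, here is a sketch that computes $d_{ij}(T)$ for every pair of leaves (one depth-first walk per leaf) and compares the result with D entry by entry. The edge-list format, the function names `leaf_distances` and `fits`, and the example tree are assumptions for illustration.

```python
from collections import defaultdict


def leaf_distances(edges, leaves):
    """Matrix of d_ij(T) for the given leaf order, via one depth-first walk per leaf."""
    adj = defaultdict(dict)
    for u, v, w in edges:  # edges are (node, node, weight) triples
        adj[u][v] = adj[v][u] = w
    index = {leaf: k for k, leaf in enumerate(leaves)}
    n = len(leaves)
    dist = [[0.0] * n for _ in range(n)]
    for src in leaves:
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node in index:
                dist[index[src]][index[node]] = d
            for nbr, w in adj[node].items():
                if nbr != parent:
                    stack.append((nbr, node, d + w))
    return dist


def fits(edges, leaves, D, tol=1e-9):
    """True if the tree's leaf-to-leaf path lengths reproduce D."""
    T = leaf_distances(edges, leaves)
    n = len(leaves)
    return all(abs(T[i][j] - D[i][j]) <= tol for i in range(n) for j in range(n))


# Hypothetical tree: leaves a, b hang off u; c, d hang off v; u-v has length 5
edges = [("a", "u", 1), ("b", "u", 2), ("u", "v", 5), ("c", "v", 3), ("d", "v", 4)]
D = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(fits(edges, ["a", "b", "c", "d"], D))  # True
```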
Rooted Ultrametric Trees
UPGMA: Unweighted Pair Group Method with Arithmetic Mean
• UPGMA is a clustering algorithm that:
  • Computes the distance between clusters as the average pairwise distance
  • Assigns a height to every vertex in the tree, effectively assuming the presence of a molecular clock and dating every vertex
  • Assumes the matrix D is ultrametric (additive and consistent with a molecular clock), in which case the generated tree fits D
  • If D is not ultrametric, UPGMA generates a heuristic solution that does not fit D exactly
  • Produces ultrametric trees: all leaves are equidistant from the root.
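The leaf distances of an ultrametric tree satisfy the three-point condition: in any three leaves, the two largest pairwise distances are equal. The check below is a sketch; the function name and both matrices are hypothetical. The second matrix comes from a tree without a molecular clock, so it is additive but not ultrametric.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Three-point condition: in every triple of leaves, the two largest
    pairwise distances are equal."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:
            return False
        for j in range(n):
            if D[i][j] < -tol or abs(D[i][j] - D[j][i]) > tol:
                return False
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if largest - second > tol:
            return False
    return True


# Hypothetical clock-like tree: A and B merge at height 1, C joins at height 3.
D_clock = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
# Hypothetical additive matrix from a tree without a molecular clock.
D_no_clock = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(is_ultrametric(D_clock), is_ultrametric(D_no_clock))  # True False
```

In an ultrametric tree the distance between two leaves is twice the height of their lowest common ancestor, which is why the two largest distances in any triple coincide.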
Clustering in UPGMA
Given two disjoint clusters $C_i$, $C_j$ of sequences,
$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\ q \in C_j} d_{pq}$$
Note that if $C_k = C_i \cup C_j$, then its distance to another cluster $C_l$ is:
$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$
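The update rule can be checked numerically: averaging over all pairs of the merged cluster gives the same value as the size-weighted mean of $d_{il}$ and $d_{jl}$, because $C_i$ contributes $|C_i|\,|C_l|$ pairs and $C_j$ contributes $|C_j|\,|C_l|$ pairs. The function names and the matrix are hypothetical.

```python
from itertools import product


def cluster_distance(D, Ci, Cj):
    """Average pairwise distance d_ij between disjoint clusters of leaf indices."""
    return sum(D[p][q] for p, q in product(Ci, Cj)) / (len(Ci) * len(Cj))


def merged_distance(d_il, d_jl, size_i, size_j):
    """d_kl for C_k = C_i ∪ C_j, computed only from the old cluster distances."""
    return (d_il * size_i + d_jl * size_j) / (size_i + size_j)


# Hypothetical 4-leaf matrix
D = [[0, 2, 6, 7],
     [2, 0, 6, 9],
     [6, 6, 0, 11],
     [7, 9, 11, 0]]
Ci, Cj, Cl = [0, 1], [2], [3]
direct = cluster_distance(D, Ci + Cj, Cl)
updated = merged_distance(cluster_distance(D, Ci, Cl), cluster_distance(D, Cj, Cl),
                          len(Ci), len(Cj))
print(direct, updated)  # 9.0 9.0
```

This is what lets UPGMA update distances in constant time per cluster pair instead of re-averaging over all original sequences.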
UPGMA Algorithm
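Initialization:
Assign each sequence $x_i$ to its own cluster $C_i$ (clusters of size 1)
Define one leaf per sequence, at height 0
Iteration:
Find two clusters $C_i$, $C_j$ such that $d_{ij}$ is minimal
Let $C_k = C_i \cup C_j$ and compute $d_{kl}$ for every remaining cluster $C_l$
Define a node connecting $C_i$ and $C_j$, and place it at height $d_{ij}/2$
Delete $C_i$ and $C_j$, and add $C_k$ to the set of clusters
Termination:
When two clusters $C_i$, $C_j$ remain, place the root at height $d_{ij}/2$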
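The steps above can be sketched in Python as follows. This is a minimal, unoptimised version (roughly $O(n^3)$): clusters are tracked by integer ids, distances are updated with the size-weighted formula from the previous slide, and each internal node is returned as a `(left, right, height)` tuple. The tuple representation and the name `upgma` are our own choices, not from the slides, and ties between equal distances are broken arbitrarily.

```python
def upgma(D, names):
    """Minimal UPGMA sketch. Leaves are names (height 0); each internal node
    is a tuple (left_subtree, right_subtree, height)."""
    clusters = {k: (name, 1) for k, name in enumerate(names)}  # id -> (subtree, size)
    dist = {frozenset((i, j)): float(D[i][j])
            for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        k, next_id = next_id, next_id + 1
        # Distance from C_k = C_i ∪ C_j to every remaining cluster C_l
        for l in clusters:
            d_il = dist.pop(frozenset((i, l)))
            d_jl = dist.pop(frozenset((j, l)))
            dist[frozenset((k, l))] = (d_il * size_i + d_jl * size_j) / (size_i + size_j)
        # New node at height d_ij / 2; the final merge of two clusters places the root
        clusters[k] = ((tree_i, tree_j, d_ij / 2), size_i + size_j)
    root, _ = next(iter(clusters.values()))
    return root


print(upgma([[0, 2, 6], [2, 0, 6], [6, 6, 0]], ["A", "B", "C"]))
# ('C', ('A', 'B', 1.0), 3.0)
```

Branch lengths, if needed, are the height of a node minus the height of its child (leaves have height 0).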
UPGMA Algorithm (cont’d)
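[Figure: points 1 to 5 being merged into clusters, and the UPGMA tree built so far, with leaves in the order 1, 4, 2, 3, 5]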
Example
UPGMA’s Weakness
• The algorithm produces an ultrametric tree: the distance from the root to any leaf is the same.
• UPGMA assumes a constant molecular
clock: all species represented by the
leaves in the tree are assumed to
accumulate mutations (and thus evolve)
at the same rate. This is one of the
major pitfalls of UPGMA.
UPGMA’s Weakness: Example
[Figure: the correct tree (left) and the tree produced by UPGMA (right) for leaves 1 to 4]
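The same failure can be reproduced on a hypothetical four-leaf tree (not the one in the slide's figure) in which A and B are sisters, C and D are sisters, the internal edge has length 1, and B and D sit on long branches (4 and 5) while A and C sit on short ones (1). Its distance matrix is additive but not ultrametric. The sketch below, with the hypothetical name `upgma_merge_order`, recomputes average-linkage distances directly from D each round, which yields the same merges as UPGMA and is fine for tiny inputs.

```python
from itertools import combinations


def upgma_merge_order(D, names):
    """Clusters UPGMA joins, in order, using average linkage recomputed from D."""
    clusters = [[k] for k in range(len(names))]
    merges = []

    def avg(A, B):
        return sum(D[p][q] for p in A for q in B) / (len(A) * len(B))

    while len(clusters) > 1:
        A, B = min(combinations(clusters, 2), key=lambda ab: avg(*ab))
        clusters.remove(A)
        clusters.remove(B)
        clusters.append(A + B)
        merges.append(sorted(names[k] for k in A + B))
    return merges


# Hypothetical non-clock tree: A:1 and B:4 join at x, C:1 and D:5 join at y, x-y = 1
D = [[0, 5, 3, 7],
     [5, 0, 6, 10],
     [3, 6, 0, 6],
     [7, 10, 6, 0]]
print(upgma_merge_order(D, ["A", "B", "C", "D"]))
# [['A', 'C'], ['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
```

UPGMA first joins A and C because their short branches make them look closest, so it never recovers the true sister pairs {A, B} and {C, D}.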