Effects of Rooting on Phylogenic Algorithms
Margareta Ackerman
Joint work with
David Loker and Dan Brown
Hierarchical Clustering & Phylogeny
Phylogeny meets Hierarchical Clustering
Phylogeny is an application of hierarchical clustering.
They are closely related!
Unfortunately, there is a disconnect between these fields.
Bridging the Gap
A step towards bridging the gap:
We bring techniques from cluster analysis to the study of phylogenetic algorithms.
We apply a recent framework for clustering algorithm selection to phylogeny [(Ackerman, Ben-David, and Loker, ‘10), (Ackerman, Ben-David, and Loker, ‘10), (Ackerman & Ben-David, IJCAI ‘11), (Zadeh and Ben-David, ‘09)]
Selecting Phylogenetic Algorithms
Given the same input, different phylogenetic algorithms can produce radically different results.
How should a user decide which algorithm to use?
Framework for Selecting Phylogenetic Algorithms
This framework lets a user apply prior knowledge to select an algorithm:
• Identify properties that distinguish between the input-output behaviour of different clustering paradigms.
• The properties should be:
  1) Intuitive and “user-friendly”
  2) Useful for distinguishing clustering algorithms
Outline
• Rooting Phylogenetic Trees
• Formal Framework
• Properties of Hierarchical Algorithms
• Analysis of Linkage-Based Algorithms
• Analysis of Neighbor Joining
• Conclusions and Future Directions
How to Root Phylogenetic Trees?
A common solution:
Introduce distant taxa (elements), called an outgroup, and root the tree where the outgroup connects with the ingroup.
[Figure: example tree with outgroup E]
When Rooting Changes the Ingroup
The addition of an outgroup can CHANGE the topology of the ingroup.
[Figure: ingroup topology after adding outgroup E]
This Happens in Practice!
Empirical studies demonstrate that, with some algorithms, ingroup topology can be disrupted when an outgroup is added [(Holland et al., ‘03), (Shavit et al., ‘07), (Lin et al., ‘02), (Slack et al., ‘03)].
We perform a theoretical analysis of this phenomenon, proving that some algorithms are immune to this problem while others are highly volatile.
Previous Work
Independently of our work, it was shown that with BME (balanced minimum evolution), the ingroup topology can change arbitrarily when an outlier is added (Cueto and Matsen, 2010).
Our Contributions
• Linkage-based algorithms (including UPGMA) do not change the ingroup topology when the outgroup is sufficiently far away.
• Using Neighbor Joining, the ingroup topology is affected by outgroups even if the outgroup is arbitrarily far away.
Formal Setup
C_i is a cluster in a dendrogram D if there exists a node in the dendrogram such that C_i is the set of its leaf descendants.
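The cluster definition can be made concrete with a small sketch. As an assumption for illustration only, a dendrogram is encoded as nested Python tuples whose leaves are string labels, and each leaf counts as its own descendant, so singletons are clusters too. The helper names `leaves` and `clusters` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a dendrogram is a leaf label or a tuple of child dendrograms.
Dendrogram = Union[str, Tuple["Dendrogram", ...]]


def leaves(node: Dendrogram) -> FrozenSet[str]:
    """Leaf descendants of a node (a leaf is treated as its own descendant)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clusters(node: Dendrogram) -> Set[FrozenSet[str]]:
    """Every cluster of the dendrogram: the leaf set of each node."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clusters(child)
    return found


D = (("a", "b"), ("c", ("d", "e")))
print(frozenset({"d", "e"}) in clusters(D))  # True: {d, e} is the leaf set of a node
print(frozenset({"c", "d"}) in clusters(D))  # False: no node has exactly these leaves
```

A binary dendrogram over n elements has 2n - 1 clusters under this reading (n singletons plus n - 1 internal nodes), so the example tree over five leaves has nine.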
Formal Setup
C = {C_1, …, C_k} is a clustering in a dendrogram D if
– C_i is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are pairwise disjoint.
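Under the same hypothetical nested-tuple encoding (redefined here so the snippet runs on its own), checking whether a collection of sets is a clustering in D is a membership test plus a pairwise-disjointness test. `is_clustering_in` is an illustrative name, not something from the source.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple, Union

Dendrogram = Union[str, Tuple["Dendrogram", ...]]


def leaves(node: Dendrogram) -> FrozenSet[str]:
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clusters(node: Dendrogram) -> Set[FrozenSet[str]]:
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clusters(child)
    return found


def is_clustering_in(candidate: Iterable[Iterable[str]], D: Dendrogram) -> bool:
    """True iff every C_i is a cluster of D and the C_i are pairwise disjoint."""
    parts = [frozenset(c) for c in candidate]
    available = clusters(D)
    all_are_clusters = all(part in available for part in parts)
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    return all_are_clusters and pairwise_disjoint


D = (("a", "b"), ("c", ("d", "e")))
print(is_clustering_in([{"a", "b"}, {"d", "e"}], D))            # True
print(is_clustering_in([{"a", "b"}, {"a", "b", "c", "d", "e"}], D))  # False: overlap
print(is_clustering_in([{"b", "c"}], D))                         # False: not a cluster
```

The definition does not require the clusters to cover all of X, so the sketch deliberately does not check coverage.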
Formal Setup
A hierarchical clustering algorithm A maps
Input: a data set X with a distance function d, denoted (X,d)
to
Output: a dendrogram of X
The distance between Y ⊆ X and Z ⊆ X is the length of the minimum edge between them:
$d(Y,Z) = \min_{y \in Y,\, z \in Z} d(y,z)$
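The set distance is a one-line minimum over all cross pairs. A minimal sketch, assuming the distance function is passed in as a callable; `set_distance` and the one-dimensional example data are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def set_distance(Y: Iterable[T], Z: Iterable[T], d: Callable[[T, T], float]) -> float:
    """d(Y, Z): length of the shortest edge between a point of Y and a point of Z."""
    zs = list(Z)  # materialise Z so it can be scanned once for every y
    return min(d(y, z) for y in Y for z in zs)


# Hypothetical example: points on a line with d(x, y) = |x - y|.
line = lambda x, y: abs(x - y)
print(set_distance({0, 1}, {4, 7}, line))  # 3, the edge between 1 and 4
```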
Unaffected by an Outgroup
Given a data set (X ∪ O, d) and algorithm A, X is unaffected by O if A(X, d) is a sub-dendrogram of A(X ∪ O, d). Otherwise, X is affected by O.
[Figure: A(X,d), A(O,d), and A(X ∪ O,d)]
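The “unaffected” test can be sketched by comparing cluster sets. This rests on an assumption: we read “A(X,d) is a sub-dendrogram of A(X ∪ O,d)” as “every cluster of A(X,d), X itself included, is also a cluster of A(X ∪ O,d)”, which pins down the subtree rooted at X when both dendrograms are binary. The trees below are hand-written stand-ins for algorithm outputs, and `is_unaffected` is a hypothetical helper.

```python
from typing import FrozenSet, Set, Tuple, Union

Dendrogram = Union[str, Tuple["Dendrogram", ...]]


def leaves(node: Dendrogram) -> FrozenSet[str]:
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clusters(node: Dendrogram) -> Set[FrozenSet[str]]:
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clusters(child)
    return found


def is_unaffected(ingroup_tree: Dendrogram, full_tree: Dendrogram) -> bool:
    """Assumed reading of the sub-dendrogram relation: cluster-set containment."""
    return clusters(ingroup_tree) <= clusters(full_tree)


A_X = (("a", "b"), "c")                        # stands in for A(X, d)
A_XO_kept = ((("a", "b"), "c"), ("o1", "o2"))  # ingroup topology preserved
A_XO_changed = (("a", ("b", "c")), "o1")       # the outgroup disrupted the ingroup
print(is_unaffected(A_X, A_XO_kept))     # True
print(is_unaffected(A_X, A_XO_changed))  # False: {a, b} is no longer a cluster
```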
Outgroup Independence
Algorithm A is outgroup-independent if, for any data sets (X, d) and (O, d’), X is unaffected by O whenever (X,d) and (O,d’) are sufficiently far apart.
[Figure: ingroup X and outgroup O]
[Figure: A(X,d), A(O,d’), and A(X ∪ O,d*), where d* puts (X,d) and (O,d’) sufficiently far apart]
Outgroup Volatility
An algorithm A is outgroup volatile if, for any data set (X,d) and any constant c, there exists (O,d’) with the distance between X and O at least c such that X is affected by O.
If this holds even when O is restricted to a singleton, then A is outlier volatile.
We use the following general result to show that linkage-based algorithms are outgroup independent.
Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup independent.
Locality
If we select a cluster from the dendrogram and run the algorithm on the data underlying this cluster, we obtain a result that is consistent with the original dendrogram, i.e., the result is a sub-dendrogram of the original dendrogram.
[Figure: D = A(X,d) and D’ = A(X’,d) for the cluster X’ = {x1, …, x4}]
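Outer Consistency
[Figure: a clustering C in A(X,d); C on dataset (X,d) and on dataset (X,d’) after an outer-consistent change]
An outer-consistent change of d with respect to C keeps the distances within each cluster of C unchanged and does not decrease the distances between different clusters of C.
If A is outer-consistent, then A(X,d’) will also include the clustering C.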
2-Richness
Given any pair of data sets (X, d) and (X’, d’), there exists d* over X ∪ X’, extending d and d’, so that X and X’ are the children of the root in A(X ∪ X’, d*).
[Figure: (X,d), (X’,d’), and the combined dendrogram A(X ∪ X’, d*) whose root has children X and X’]
Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup independent.
Proof: We want to show that, given any (X,d) and (O,d’), if the data sets are placed sufficiently far apart, then A(X,d) is a sub-dendrogram of A(X ∪ O, d*).
[Figure: (X,d), (O,d’), (X ∪ O,d’’), A(X,d), and A(X ∪ O,d*)]
Proof (continued): First, apply 2-richness. Given (X,d) and (O,d’), there exists d’’ over X ∪ O, extending d and d’, so that X and O are the children of the root of A(X ∪ O,d’’).
[Figure: X and O as the children of the root of A(X ∪ O,d’’), with c marking the distance between X and O]
Proof (continued): Let c be the largest distance under d’’ between a point of X and a point of O, and let d* be any distance function extending d and d’ where the minimum distance between X and O is at least c.
Then d* is an outer-consistent change of d’’ with respect to the clustering {X, O}, so by outer-consistency, X and O are the children of the root of A(X ∪ O,d*).
[Figure: A(X ∪ O,d*), with X and O at distance at least c]
Proof (continued): Finally, X is a cluster of A(X ∪ O,d*) and d* restricted to X is d, so by locality, A(X,d) is a sub-dendrogram of A(X ∪ O,d*).
Therefore, whenever (X,d) and (O,d’) are sufficiently far apart, X is unaffected by O.
[Figure: A(X,d) as the subtree rooted at X in A(X ∪ O,d*)]
Linkage-Based Algorithm
• Create a leaf node for every element of X.
• Repeat the following until a single tree remains:
  – Consider the clusters represented by the remaining root nodes.
  – Merge the closest pair of clusters by assigning them a common parent node.
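The loop above can be sketched with a pluggable linkage function. This is an illustrative implementation, not code from the talk: elements are string labels, dendrograms are nested pairs, ties are broken by taking the first closest pair found (the slides do not specify tie-breaking), and single linkage is plugged in only as an example. `linkage_based` and the data are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple, Union

Dendrogram = Union[str, Tuple["Dendrogram", "Dendrogram"]]
Distance = Callable[[str, str], float]
Linkage = Callable[[FrozenSet[str], FrozenSet[str], Distance], float]


def single_linkage(A: FrozenSet[str], B: FrozenSet[str], d: Distance) -> float:
    return min(d(a, b) for a in A for b in B)


def linkage_based(X: List[str], d: Distance, linkage: Linkage) -> Dendrogram:
    # One leaf node per element; each root is stored with the cluster it represents.
    roots: List[Tuple[Dendrogram, FrozenSet[str]]] = [(x, frozenset([x])) for x in X]
    while len(roots) > 1:  # until a single tree remains
        # Closest pair of current clusters under the linkage (first pair wins ties).
        i, j = min(
            combinations(range(len(roots)), 2),
            key=lambda pair: linkage(roots[pair[0]][1], roots[pair[1]][1], d),
        )
        (tree_i, set_i), (tree_j, set_j) = roots[i], roots[j]
        # Merge the two roots by giving them a common parent node.
        roots = [root for k, root in enumerate(roots) if k not in (i, j)]
        roots.append(((tree_i, tree_j), set_i | set_j))
    return roots[0][0]


# Hypothetical data: four points on a line.
position = {"a": 0, "b": 1, "c": 5, "d": 7}
d = lambda x, y: abs(position[x] - position[y])
print(linkage_based(list(position), d, single_linkage))  # (('a', 'b'), ('c', 'd'))
```

Swapping `single_linkage` for another linkage function changes the algorithm without touching the loop, which is exactly the point of the next slide.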
Examples of Linkage-Based Algorithms
• The choice of linkage function distinguishes between different linkage-based algorithms.
• Examples of common linkage functions:
  – UPGMA: average between-cluster distance
  – Single-linkage: shortest between-cluster distance
  – Complete-linkage: maximum between-cluster distance
[Figure: two clusters X1 and X2]
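The three linkage functions listed above differ only in how they aggregate the pairwise distances between two clusters. A minimal sketch with hypothetical one-dimensional clusters X1 and X2 and d(x, y) = |x - y|:

```python
from typing import Callable, Collection

Distance = Callable[[float, float], float]


def upgma_linkage(A: Collection[float], B: Collection[float], d: Distance) -> float:
    """Average between-cluster distance."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))


def single_linkage(A: Collection[float], B: Collection[float], d: Distance) -> float:
    """Shortest between-cluster distance."""
    return min(d(a, b) for a in A for b in B)


def complete_linkage(A: Collection[float], B: Collection[float], d: Distance) -> float:
    """Maximum between-cluster distance."""
    return max(d(a, b) for a in A for b in B)


# Hypothetical clusters on a line; the cross distances are 4, 6, 3, and 5.
X1, X2 = [0, 1], [4, 6]
line = lambda x, y: abs(x - y)
for name, f in [("UPGMA", upgma_linkage), ("single", single_linkage), ("complete", complete_linkage)]:
    print(name, f(X1, X2, line))
# prints: UPGMA 4.5 / single 3 / complete 6
```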
Theorem: All linkage-based algorithms are outgroup independent.
Proof: We can show that all linkage-based algorithms are 2-rich, outer-consistent, and local. The result follows from the previous theorem.
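A small numerical check of the theorem for UPGMA, under stated assumptions: the data are hypothetical points on a line, `far_apart` builds one particular d* that extends d and d’ and sets every ingroup-outgroup distance to the constant c, and “unaffected” is tested by cluster-set containment (our reading of the sub-dendrogram relation). It illustrates the statement on one example; it is not a proof.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set, Tuple, Union

Dendrogram = Union[str, Tuple["Dendrogram", "Dendrogram"]]
Distance = Callable[[str, str], float]


def average_linkage(A: FrozenSet[str], B: FrozenSet[str], d: Distance) -> float:  # UPGMA
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))


def linkage_based(X: List[str], d: Distance) -> Dendrogram:
    roots = [(x, frozenset([x])) for x in X]
    while len(roots) > 1:
        i, j = min(combinations(range(len(roots)), 2),
                   key=lambda p: average_linkage(roots[p[0]][1], roots[p[1]][1], d))
        (ti, si), (tj, sj) = roots[i], roots[j]
        roots = [r for k, r in enumerate(roots) if k not in (i, j)] + [((ti, tj), si | sj)]
    return roots[0][0]


def clusters(node: Dendrogram) -> Set[FrozenSet[str]]:
    if isinstance(node, str):
        return {frozenset([node])}
    found: Set[FrozenSet[str]] = set()
    for child in node:
        found |= clusters(child)
    found.add(frozenset().union(*found))  # this node's own leaf set
    return found


def far_apart(d: Distance, d_prime: Distance, X: Set[str], c: float) -> Distance:
    """One hypothetical d*: agrees with d on X, with d' on O, and equals c across."""
    def d_star(x: str, y: str) -> float:
        if (x in X) != (y in X):
            return c
        return d(x, y) if x in X else d_prime(x, y)
    return d_star


pos_X = {"x1": 0, "x2": 1, "x3": 5, "x4": 7}
pos_O = {"o1": 0, "o2": 3}
d = lambda x, y: abs(pos_X[x] - pos_X[y])
d_prime = lambda x, y: abs(pos_O[x] - pos_O[y])

ingroup_tree = linkage_based(list(pos_X), d)
full_tree = linkage_based(list(pos_X) + list(pos_O), far_apart(d, d_prime, set(pos_X), c=100))
print(ingroup_tree)                                   # (('x1', 'x2'), ('x3', 'x4'))
print(clusters(ingroup_tree) <= clusters(full_tree))  # True: X is unaffected by O
```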
Neighbor Joining
• The most widely used distance-based method for phylogenetic reconstruction
• Works well in practice
• If there is a tree that fits the distance matrix (i.e., the distances are additive), NJ will find it
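For reference, here is a topology-only sketch of neighbor joining using the standard selection criterion of Saitou and Nei, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of i’s distances to all other current nodes. It is an illustration, not the implementation analyzed in the talk: branch lengths are omitted, ties go to the first pair found, and the distances come from a hypothetical four-leaf additive tree, so NJ should recover the split {A, B} | {C, D}.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def neighbor_joining(labels: List[str], D: Dict[FrozenSet[str], float]) -> Tree:
    """Return NJ's joins as nested pairs; the final pair is an arbitrary rooting
    of NJ's unrooted output."""
    dist: Dict[FrozenSet[Tree], float] = dict(D)
    d = lambda u, v: dist[frozenset((u, v))]
    nodes: List[Tree] = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {u: sum(d(u, v) for v in nodes if v != u) for u in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(p[0], p[1]) - r[p[0]] - r[p[1]])
        joined = (i, j)
        for k in nodes:
            if k != i and k != j:
                # Distance from the new internal node to each remaining node.
                dist[frozenset((joined, k))] = (d(i, k) + d(j, k) - d(i, j)) / 2
        nodes = [k for k in nodes if k != i and k != j] + [joined]
    return (nodes[0], nodes[1])


# Hypothetical additive tree: A and B hang off one internal node (lengths 1 and 2),
# C and D off another (lengths 1 and 2), and the internal nodes are 3 apart.
raw = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
       ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3}
D = {frozenset(pair): float(v) for pair, v in raw.items()}
print(neighbor_joining(["A", "B", "C", "D"], D))  # (('A', 'B'), ('C', 'D'))
```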
Theorem: Neighbor Joining is outlier volatile.
This remains the case even when the distances within the ingroup are additive.
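A short calculation shows why distance alone does not protect NJ’s ingroup; it is an illustrative sketch of the first NJ step only, not the proof behind the theorem. Assume a single outlier o with d*(x,o) = M + e_x for each x ∈ X, so n = |X| + 1 taxa in total; write r^X_x for the sum of x’s distances within X and S for the sum of the offsets e_x.

```latex
% Assumed setup: d^*(x,o) = M + e_x,\; n = |X| + 1,\;
% r^X_x = \sum_{y \in X} d(x,y),\; S = \sum_{x \in X} e_x,\; r_x = r^X_x + M + e_x,\; r_o = |X|\,M + S
\begin{aligned}
Q(x,y) &= (n-2)\,d(x,y) - r_x - r_y
        = (n-2)\,d(x,y) - r^X_x - r^X_y - e_x - e_y - 2M, && x, y \in X,\\
Q(x,o) &= (n-2)(M + e_x) - (r^X_x + M + e_x) - (|X|\,M + S)
        = (n-3)\,e_x - r^X_x - S - 2M, && x \in X.
\end{aligned}
```

Every entry of Q carries the same term -2M, so the first pair NJ joins does not depend on M at all: only the offsets e_x matter, however large M is made.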
Outgroups can lead to arbitrary dendrograms
Theorem: Given any data set (X,d) and any dendrogram D over X, there exist a set of outliers O and a distance function d* over X ∪ O extending d, where d*(X,O) can be arbitrarily large, such that NJ(X ∪ O, d*)|X = D.
[Figure: A(X,d) and A(X ∪ O,d*)|X]
Conclusions
• Present a formal framework for the analysis of the effects of outgroups on the ingroup topology for computationally efficient hierarchical algorithms
• Prove that all linkage-based algorithms, which include UPGMA, are outgroup independent
• Prove that NJ is outgroup volatile
• This only addresses rooting: we do not claim that UPGMA is in general better than NJ.
Future Work
• How should outgroups be chosen for rooting NJ?
• Perform a similar analysis of likelihood methods