Barking Up the Wrong Treelength

Kevin Liu, Serita Nelesen, Sindhu Raghavan, C. Randal Linder, and Tandy Warnow

IEEE/ACM TCBB 2009

Minimizing Treelength

Generalized

Input: set S of sequences and a function f(s, s') for the edit distance between sequences s and s'

Output: A tree T, leaf-labelled by set S, with additional sequences labelling the internal nodes of T, so as to minimize treelength (total edit distance on the edges of the tree)

Fixed Tree variant
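Once a tree and its internal-node sequences are given, treelength is just the sum of f over the edges. The sketch below illustrates that definition only; it is not POY's algorithm. It assumes a unit-cost edit distance for f, and the names `edit_distance`, `treelength`, `labels`, and `edges` are hypothetical.

```python
def edit_distance(s, t, indel=1, mismatch=1):
    """Edit distance by dynamic programming (unit costs by default)."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel]
        for j in range(1, len(t) + 1):
            sub = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + indel, cur[j - 1] + indel))
        prev = cur
    return prev[-1]


def treelength(edges, labels, f=edit_distance):
    """Total edit distance over the edges of a fully labelled tree."""
    return sum(f(labels[u], labels[v]) for u, v in edges)


# Hypothetical four-leaf tree ((A,B),(C,D)) with internal nodes X and Y.
labels = {"A": "ACGT", "B": "ACGA", "C": "ACT", "D": "AGT",
          "X": "ACGT", "Y": "ACT"}
edges = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]

print(treelength(edges, labels))
```

The optimization problem is harder than this scoring step: in the fixed tree variant the internal labels must be chosen, and in the generalized problem the tree must be chosen as well.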

POY

POY (from the American Museum of Natural History, Ward Wheeler and colleagues) is the main software for this.

Minimizing treelength is also known as “Direct Optimization”

POY has passionate adherents who believe in treelength

POY also has been heavily criticized

POY

Input: set S of sequences (unaligned), gap-open cost, gap-extend cost, and transition/transversion ratio

Default settings for gap-open and gap-extend in POY are “simple” (gap-open cost is 0)

POY can also be used to score a fixed input tree under the desired treelength definition.

Ogden and Rosenberg 2007

Ogden and Rosenberg's study compared POY 3.0 to MP(ClustalW)

Model conditions – mostly 16 taxa (some 64-taxon trees), K2P substitution model, short gaps (expected length 4)

Optimization problem – multiple edit distances, all based on simple gap penalties (gap-open cost is 0)

Performance metrics

Tree errors

Alignment errors

No mention of treelength

Result: MP(ClustalW) much more accurate than POY

O&R concluded that Treelength is BAD!

The O&R simulation study showed that POY alignments were worse than ClustalW alignments more than 99% of the time, and that POY trees were less accurate than MP(ClustalW) trees on average.

“Therefore, traditional multiple sequence alignment approaches appear to vastly outperform direct optimization-like approaches in terms of alignment accuracy, at least for the data sets and parameter settings that have been examined thus far.” Ogden and Rosenberg 2007

Treelength is BAD!

“Although our data represents a fairly simple case, for data sets similar to these the traditional two-step approach will almost always give a more accurate alignment and will most likely recover equally or more accurate phylogenetic relationships than direct optimization as implemented in POY.” Ogden and Rosenberg 2007

Our question

Does minimizing treelength work poorly in general,

or

is it minimizing treelength under simple gap penalties that works poorly?

Gap penalties

Simple: a gap of length k costs kC

Affine: a gap of length k costs C_open + k C_extend

Other types of penalties are possible
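A minimal sketch of the two penalty families, to show where they differ. The function names and the numeric costs are illustrative assumptions, not settings taken from POY.

```python
def simple_gap_cost(k, c):
    """Simple penalty: a gap of length k costs k * c."""
    return k * c


def affine_gap_cost(k, c_open, c_extend):
    """Affine penalty: a gap of length k costs c_open + k * c_extend."""
    return c_open + k * c_extend if k > 0 else 0


# One gap of length 6 versus two gaps of length 3 (6 gapped positions each way).
one_long_simple = simple_gap_cost(6, c=1)
two_short_simple = 2 * simple_gap_cost(3, c=1)
one_long_affine = affine_gap_cost(6, c_open=4, c_extend=1)
two_short_affine = 2 * affine_gap_cost(3, c_open=4, c_extend=1)

print(one_long_simple, two_short_simple)   # equal under the simple penalty
print(one_long_affine, two_short_affine)   # one long gap is cheaper under affine
```

Under a simple penalty only the number of gapped positions matters, so splitting a gap into pieces is free. Under an affine penalty each additional gap pays the opening cost again.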

“Treelength not so bad!” (paraphrasing Liu et al. 2009)

Liu et al. 2009 show treelength can be a good criterion, if based upon an affine gap penalty

We developed POY*: a version of POY which uses a particular affine gap penalty and a particular starting tree

Our Study 2008

Our study compares POY 4.0 to multiple methods

Model conditions – 25 and 100 taxa, GTR+Gamma for the substitution model, short and long gaps

Optimization problem – multiple edit distances, based upon both simple and affine gap penalties

Results

Tree error

Alignment error

Treelength

Gap cost functions we studied

Simple1 – all mismatches and indels cost 1

Simple2 – indels cost 2, transversions cost 2, and transitions cost 1

Affine – gap of length k costs 4 + k, transversions cost 2, and transitions cost 1
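To make the three cost functions concrete, the sketch below scores one given pairwise alignment under each of them. It scores a fixed alignment rather than searching for the cheapest one, and it assumes DNA rows over A, C, G, T with "-" for gaps and no column that is a gap in both rows. The names `SCHEMES`, `is_transition`, and `alignment_cost` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Costs as listed on the slide; gap of length k costs gap_open + k * gap_extend.
SCHEMES = {
    "simple1": dict(transition=1, transversion=1, gap_open=0, gap_extend=1),
    "simple2": dict(transition=1, transversion=2, gap_open=0, gap_extend=2),
    "affine": dict(transition=1, transversion=2, gap_open=4, gap_extend=1),
}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def alignment_cost(row1, row2, scheme):
    """Cost of a given pairwise alignment under one of the three schemes."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    p = SCHEMES[scheme]
    cost = 0
    open_gap = None  # 1 or 2 while inside a gap run in that row, else None
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            row = 1 if a == "-" else 2
            if open_gap != row:  # a new gap run starts here
                cost += p["gap_open"]
                open_gap = row
            cost += p["gap_extend"]
        else:
            open_gap = None
            if a != b:
                cost += p["transition"] if is_transition(a, b) else p["transversion"]
    return cost


# Two transitions (C/T, A/G), one transversion (C/A), one gap of length 2.
row1 = "ACGTTAC"
row2 = "AT--TGA"
for name in SCHEMES:
    print(name, alignment_cost(row1, row2, name))
```

On this example the same alignment costs 5, 8, and 10 under Simple1, Simple2, and Affine respectively; only the affine scheme charges extra for opening the gap.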

Simulation Study Overview

Model trees

Birth-death

Deviation from ultrametricity

Sequence evolution

Estimation of trees and alignments

Statistics


Model trees

Sequence evolution

GTR model of evolution from Tree of Life project

Gamma-distributed rates across sites

Gap model

Estimation of trees and alignments

Statistics


Model trees

Sequence evolution

Estimation of trees and alignments

POY

POY* - POY with particular starting tree (Probtree, using a particular Affine gap penalty

Several two-phase methods (best alignments followed by MP and ML)

PS (POY-score) on various trees

Statistics


Model trees

Sequence evolution

Estimation of trees and alignments

Statistics

1. Alignment error

2. Tree error

3. Treelength under each gap cost function
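The slides do not say which alignment error measure was used. One commonly used choice, sketched here as an assumption, is the fraction of residue pairs aligned in the true alignment that the estimated alignment fails to align (often called the SP-FN rate). The function names and the toy alignments are hypothetical.

```python
from itertools import combinations


def homologous_pairs(alignment):
    """Set of residue pairs that share a column.

    alignment: list of equal-length gapped strings, one per sequence.
    Each residue is identified as (sequence index, position in that sequence).
    """
    pairs = set()
    counters = [0] * len(alignment)
    for column in zip(*alignment):
        present = []
        for i, ch in enumerate(column):
            if ch != "-":
                present.append((i, counters[i]))
                counters[i] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_fn_rate(true_alignment, estimated_alignment):
    """Fraction of true homologous pairs missing from the estimate."""
    true_pairs = homologous_pairs(true_alignment)
    if not true_pairs:
        return 0.0
    missing = true_pairs - homologous_pairs(estimated_alignment)
    return len(missing) / len(true_pairs)


true_aln = ["AC-G", "A-TG"]
est_aln = ["ACG-", "A-TG"]  # misplaces the final G of the first sequence
print(sp_fn_rate(true_aln, est_aln))
```

Both alignments must contain the same sequences in the same order, since residues are matched by sequence index and ungapped position.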

Simulation Study Model Conditions

4 model conditions

80 replicate datasets apiece

Different numbers of taxa allow us to explore taxonomic sampling effects

Results – Alignment Errors

Simple vs. affine penalties

Note: story changes for affine penalties, especially on long gap event distribution

Alignment Error: ClustalW vs. POY*

POY* is better than ClustalW over 50% of the time in (b), and 90% of the time under (a)

Compare with Ogden and Rosenberg, who find ClustalW better than POY 99.9% of the time

Results – Alignment Errors

PS is POY used to estimate alignments on various trees

Note: PS produces worse alignments than ClustalW if simple gap cost functions are used, even if applied to the true tree

Tree error

POY and POY* both use the same gap penalty (affine)

Results shown on 100 taxon short gap simulated datasets (results for other models similar)

Tree Error



Tree error



How well does POY solve its optimization problem?

We examine the treelength found by POY for various model conditions

We let treelength be defined by simple1, simple2, or affine

We compare treelengths found by POY to treelengths achievable in each model condition (as produced by scoring the true tree and other trees)

Results – Simple Treelength Criteria

Results – Affine Treelength Criterion

Results - Treelengths

POY search finds short trees for simple gap penalties, but not for affine

Can we propose a better POY search for affine penalties?

POY*

How well does POY solve its optimization problem?

Simple gap penalties: excellent performance

Affine gap penalties: poor performance

But POY* optimizes both well.

The difference is just the starting tree.

Is it a good idea to optimize treelength?

Simple gap penalties: NO! Worse trees and worse alignments.

Affine gap penalties: Let’s see.

POY vs. POY* using affine gap

Insights

Simple gap penalties were a main cause behind Ogden and Rosenberg's findings

Unable to obtain accurate POY alignments and trees under a simple treelength criterion

Using affine penalties, POY*:

Obtains alignments that are more accurate than ClustalW on 90% of long gap datasets, 75% of medium, 55% of short

Has tree accuracy that is comparable to the best two-phase method (ML on good alignments)

But poorer alignments than the best alignment methods (e.g., Probtree)

Conclusions

Distinguish between the optimization problem and the heuristic methods used for that problem

The treelength optimization criterion chosen has a significant impact on the tree and alignment error

Alignments and trees from simple criteria aren't competitive relative to two-phase methods, and improving simple criteria treelengths doesn't get better trees

Affine criteria story is still open

Can we find shorter trees than two-phase trees?

How accurate are such shorter trees?

Questions?
