Mareike Fischer
How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
Future Directions in Phylogenetic Methods and Models, 17 – 21 Dec 07
The Problem
Given: Sequence of characters (e.g. DNA)
Wanted: Reconstruction of the ‘true’ tree
Solution: Maximum Parsimony, Maximum Likelihood, etc.
But: Is the sequence long enough for a reliable reconstruction?
Previous Approaches
1. Churchill, von Haeseler, Navidi (1992)
• 4 taxa scenario
• Observations:
The probability of reconstructing the true tree increases with the length of the interior edge.
“Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
[Figure: reconstruction probability (Rec. Prob.) plotted against interior edge length (int. edge); annotation: more characters.]
Previous Approaches
2. Yang (1998)
• 4 taxa scenario, interior edge ‘fixed’ at 5% of tree length
• 5 different tree shapes were investigated
• Observations:
‘Farris Zone’: MP better
‘Felsenstein Zone’: ML better
The optimal length for the interior edge ranges between 0.015 and 0.025.
[Figure: reconstruction probability (Rec. Prob.) plotted against tree length.]
Our Approach
• Limitation: Most previous approaches are based on simulations.
• Our approach: Mathematical analysis of influence of branch lengths on tree reconstruction.
• We investigate MP first and consider other methods afterwards.
Already known
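Steel and Székely (2002):
[Figure: four-taxon tree with interior edge of length x and four pendant edges of length y.]
Here, the number k of characters needed to reconstruct the true tree grows at rate [rate expression not preserved].
But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?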
Our Approach
Setting: 4 taxa, pendant edges of length px (with p>1), short interior edge of length x, 2-state symmetric model.
[Figure: four-taxon tree with interior edge of length x and four pendant edges of length px.]
Main Result
For ‘reliable’ MP reconstruction:
• k grows at least at rate $p^2$.
• For the optimal value of x, k grows at rate $p^2$.
Idea of Proof: 1. Applying the CLT
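Set $X_i$ i.i.d. (one per character, with values in $\{1, 0, -1\}$), and $Z_k := \sum_{i=1}^{k} X_i$. Then (by CLT) $Z_k$ is approximately normally distributed with mean $\mu_k$ and standard deviation $\sigma_k$.
Note that the true tree $T_1$ will be favored over $T_2$ if and only if $Z_k > 0$.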
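Spelled out, under the assumption that each $X_i$ takes values in $\{1, 0, -1\}$ and that $\mu$ and $\sigma^2$ (symbols not on the slide) denote the mean and variance of a single $X_i$, the CLT step reads roughly as follows:

$$
\begin{aligned}
\mu &= P(X_1 = 1) - P(X_1 = -1), \qquad \sigma^2 = P(X_1 = 1) + P(X_1 = -1) - \mu^2,\\
\mu_k &= k\,\mu, \qquad \sigma_k = \sqrt{k}\,\sigma,\\
P(Z_k > 0) &\approx \Phi\!\left(\frac{\mu_k}{\sigma_k}\right) = \Phi\!\left(\sqrt{k}\,\frac{\mu}{\sigma}\right).
\end{aligned}
$$

Requiring $P(Z_k > 0) \ge \Phi(z)$ for a chosen threshold $z$ then translates into $k \gtrsim z^2 \sigma^2 / \mu^2$, so a small ratio $\mu/\sigma$ means that many characters are needed.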
Idea of Proof: 2. The Hadamard Representation
Since the $X_i$ are i.i.d., $\mu_k$ and $\sigma_k$ depend only on k and the probabilities $P(X_1 = 1)$ and $P(X_1 = -1)$.
These probabilities can be calculated using the ‘Hadamard Representation’: [formulas not preserved]
(Here, $\theta = e^{-2x}$.)
Note that $P(X_1 = 1)$ and $P(X_1 = -1)$ depend only on x and p.
Thus, for fixed p, the ratio [expression not preserved] can be used to find a value of x that minimizes k.
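The search for the minimizing x can be sketched numerically. The sketch below is not taken from the slides: it assumes that $X_1 = 1$ for a character splitting the taxa as 12|34 (the true tree $T_1$), $X_1 = -1$ for the split 13|24 (supporting $T_2$), and $X_1 = 0$ otherwise. The closed forms come from expanding the pattern probabilities in powers of $e^{-2\cdot\text{length}}$ for this particular tree. The function names, the threshold `z` and the search grid are hypothetical choices.

```python
import math

def pattern_probs(x, p):
    """Return (P(X=1), P(X=-1)) for the true tree 12|34.

    Assumption: X = 1 for a character with split 12|34 (supports T1),
    X = -1 for split 13|24 (supports T2), X = 0 otherwise.
    """
    a2 = math.exp(-4.0 * p * x)                    # (e^{-2px})^2, pendant edges
    one_minus_theta = -math.expm1(-2.0 * x)        # 1 - theta, interior edge
    p_minus = math.expm1(-4.0 * p * x) ** 2 / 8.0  # (1 - a2)^2 / 8
    mu = a2 * one_minus_theta / 2.0                # P(X=1) - P(X=-1)
    return p_minus + mu, p_minus

def chars_needed(x, p, z=1.645):
    """CLT estimate of k such that P(Z_k > 0) is about Phi(z)."""
    p_plus, p_minus = pattern_probs(x, p)
    mu = p_plus - p_minus
    var = p_plus + p_minus - mu * mu
    return z * z * var / (mu * mu)

def min_chars(p, z=1.645, n=2000):
    """Grid search over x = s / p with s log-spaced in [1e-4, 1]; returns (k, x)."""
    grid = (10.0 ** (-4.0 + 4.0 * i / (n - 1)) / p for i in range(n))
    return min((chars_needed(x, p, z), x) for x in grid)

for p in (10, 20, 40):
    k, x = min_chars(p)
    print(f"p={p}: x*={x:.5f}, k~{k:.0f}, k/p^2={k / p**2:.2f}")
```

Under these assumptions the printed ratio k/p^2 should vary only slowly with p, which is the quadratic growth stated in the main result. The sketch only compares $T_1$ with $T_2$; the third topology is ignored.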
Summary and Extension
• For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate $p^2$.
• Can other methods do better (e.g. rate p)? No! [Can be shown using the ‘Hellinger distance’.]
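To give a feel for the Hellinger argument, the sketch below computes the squared Hellinger distance between the site-pattern distributions of the true tree 12|34 and the competing tree 13|24, both with the edge lengths of the setting above. It is an illustration, not the proof from the talk. It relies on the standard fact that two distributions at squared Hellinger distance $H^2$ cannot be told apart reliably from far fewer than order $1/H^2$ independent samples, whatever method is used. The function names, the leaf indexing and the sample values of x are hypothetical.

```python
import itertools
import math

T1 = ((0, 1), (2, 3))  # true tree 12|34 (leaves indexed 0..3)
T2 = ((0, 2), (1, 3))  # competing tree 13|24

def pattern_distribution(x, p, cherries):
    """Site-pattern probabilities under the 2-state symmetric model.

    cherries: two pairs of leaf indices, one pair per interior node.
    Interior edge length x, pendant edge length p * x.
    """
    flip_int = (1.0 - math.exp(-2.0 * x)) / 2.0      # change prob., interior edge
    flip_pen = (1.0 - math.exp(-2.0 * p * x)) / 2.0  # change prob., pendant edge

    def edge(s, t, flip):
        return flip if s != t else 1.0 - flip

    dist = {}
    for leaves in itertools.product((0, 1), repeat=4):
        total = 0.0
        # Sum over the states u, v of the two interior nodes (uniform root at u).
        for u, v in itertools.product((0, 1), repeat=2):
            prob = 0.5 * edge(u, v, flip_int)
            for leaf in cherries[0]:
                prob *= edge(u, leaves[leaf], flip_pen)
            for leaf in cherries[1]:
                prob *= edge(v, leaves[leaf], flip_pen)
            total += prob
        dist[leaves] = total
    return dist

def hellinger_sq(d1, d2):
    """Squared Hellinger distance, H^2 = (1/2) * sum (sqrt(p) - sqrt(q))^2."""
    return 0.5 * sum((math.sqrt(d1[k]) - math.sqrt(d2[k])) ** 2 for k in d1)

for p in (10, 20, 40):
    x = 0.1 / p  # arbitrary illustrative value of x
    h2 = hellinger_sq(pattern_distribution(x, p, T1),
                      pattern_distribution(x, p, T2))
    print(f"p={p}: H^2={h2:.3e}, 1/H^2={1.0 / h2:.0f}")
```

The brute-force sum over interior states is deliberate: it avoids relying on any closed form and so can serve as an independent check of pattern probabilities obtained another way.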
Outlook
Questions for future work:
• What happens when you approach the ‘Felsenstein Zone’?
• What happens in general with different tree shapes or more taxa?
Thanks…
… to my supervisor Mike Steel,
… to the Newton Institute for organizing this great conference,
… to the Allan Wilson Centre for financing my research,
… to YOU for listening or at least waking up early enough to read this message.
The only true tree…
… is a Christmas tree.
(And it does not even require reconstruction!)
Merry Christmas!