perfect phylogeny mle for phylogeny lecture 14


Upload: jola

Post on 11-Jan-2016

Page 1: Perfect Phylogeny   MLE for Phylogeny   Lecture 14


Perfect Phylogeny MLE for Phylogeny

Lecture 14

Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1

Note (January 2005): I added slides on parsimony at the end of the lecture.
Page 2: Perfect Phylogeny   MLE for Phylogeny   Lecture 14


Final Exam Details

The final exam will take place on Thursday, 3.2.04, at 09:00, in Taub 4.

Allowed material: course and tutorial slides, plus the textbooks of the course (Durbin et al., Setubal & Meidanis, Gusfield).

Page 3: Perfect Phylogeny   MLE for Phylogeny   Lecture 14


2. The perfect phylogeny problem

A character is assumed to be a property that distinguishes between species (e.g. dental structure).

A character's state is a value of the character (e.g. human dental structure).

Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.

Page 4: Perfect Phylogeny   MLE for Phylogeny   Lecture 14


Characters as Colorings

A coloring of a tree T=(V,E) is a mapping C: V → [set of colors].

A partial coloring of T is a mapping defined on a subset of the vertices U ⊆ V:

C: U → [set of colors]

[Figure: an example vertex subset U]

Note: after the lecture I changed the presentation of "convexity".
Page 5: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Characters as Colorings (2)

Each character defines a (partial) coloring of the corresponding phylogenetic tree:

Species ≡ Vertices
States ≡ Colors

Page 6: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Convex Colorings (and Characters)

Let T=(V,E) be a partially colored tree, and let d be a color. The d-carrier is the minimal subtree of T containing all vertices colored d.

Definition: A (partial/total) coloring of a tree is convex iff its d-carriers are mutually disjoint.
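The definition can be checked directly on a small tree. The following is a minimal sketch, assuming the tree is given as an adjacency dictionary and the partial coloring as a dictionary from colored vertices to colors; the names `carrier` and `is_convex` are illustrative and not part of the lecture.

```python
from collections import deque

def carrier(adj, colored):
    """Vertices of the minimal subtree containing all vertices in `colored`."""
    colored = list(colored)
    root = colored[0]
    # Breadth-first search from one colored vertex records a parent for every vertex.
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    # The carrier is the union of the paths from each colored vertex back to `root`.
    sub = set()
    for v in colored:
        while v is not None and v not in sub:
            sub.add(v)
            v = parent[v]
    return sub

def is_convex(adj, coloring):
    """True iff the carriers of all colors are mutually disjoint."""
    by_color = {}
    for v, color in coloring.items():
        by_color.setdefault(color, []).append(v)
    seen = set()
    for vertices in by_color.values():
        sub = carrier(adj, vertices)
        if sub & seen:
            return False
        seen |= sub
    return True

# A path a - b - c - d used as a toy tree.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(is_convex(path, {"a": "R", "b": "R", "c": "B", "d": "B"}))  # True
print(is_convex(path, {"a": "R", "b": "B", "c": "R"}))            # False
```

In the second call the R-carrier is the path a, b, c, which contains the B-colored vertex b, so the carriers intersect.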

Page 7: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Convexity ⇔ Homoplasy Freedom

A character is homoplasy free (avoids reversal and convergence transitions)

⇔

The corresponding (partial) coloring is convex

Page 8: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

The Perfect Phylogeny Problem

Input: a set of species and many characters.

Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?

(This is always possible for one character.)

Page 9: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

The Perfect Phylogeny Problem (pure graph theoretic setting)

Input: Partial colorings (C1,…,Ck) of a set of vertices U (in the example: 3 total colorings, left, center, and right, each using two colors).

Problem: Is there a tree T=(V,E) s.t. U ⊆ V and, for i=1,…,k, Ci is a convex (partial) coloring of T?

[Figure: an example vertex set shown under three total colorings, each using the two colors R and B]

NP-hard in general; in P for some special cases.

Page 10: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Perfect Phylogeny for a 0-1 Matrix

Rows correspond to objects, columns to characters.

Each character has two states: 0 (does not exist) or 1 (exists).

A tree T is a perfect phylogeny for the matrix iff it has the following properties:

A. Each of the n objects corresponds to a leaf of T.

B. Each of the m characters labels exactly one edge of T.

C. Object p has character i ⇔ i labels an edge on the path from p to the root.

Note: [B and C] ⇒ [each character is convex on T]

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   0

[Figure: the corresponding tree, with leaves A, B, C, D, E and edges labeled C1, C2, C3, C4, C5]

Note: consider associating each character with the LCA of the leaves that have it (instead of with an edge). The construction of the tree in the efficient algorithm would then be given directly by the linked lists.
Page 11: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Perfect Phylogeny for a 0-1 Matrix

By the definition, for each character C there is one edge on which it is converted from 0 to 1. In the tree below, the edge on which character C2 is converted to 1 is marked. The resulting tree is convex for this character.

(Only column C2 of the matrix is shown.)

    C2
A    1
B    0
C    1
D    0
E    1

[Figure: tree with leaves A, B, C, D, E; the edge labeled C2 is marked]

Page 12: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

The (Binary) Perfect Phylogeny Problem

Problem: Given a 0-1 matrix M, determine if it has a perfect phylogeny in which the root has 0 for all characters, and construct one if it does. (Note: edges are labeled by characters; an edge labeled i represents changing character i's state from 0 to 1.) As we show below, the answer is yes for our matrix:

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   0

[Figure: a perfect phylogeny for this matrix, with leaves A, B, C, D, E and edges labeled C1, C2, C3, C4, C5]

Page 13: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Efficient algorithm for the Binary Perfect Phylogeny Problem

Definition: Given a 0-1 matrix M, Ok={j:Mjk=1}, i.e. Ok is the set of objects that have character Ck.

Theorem: M has a perfect phylogenetic tree iff the sets {Oi} are laminar, i.e. for all i, j, either Oi and Oj are disjoint, or one includes the other.

Laminar:

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   0

Not laminar:

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   1
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   1
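The laminarity condition of the theorem can be tested directly on the two example matrices. This is a minimal sketch that compares every pair of sets; the function names `object_sets` and `is_laminar` are illustrative, and the pairwise check is a naive one rather than the O(mn) method of the later slides.

```python
from itertools import combinations

def object_sets(matrix, objects):
    """O_k = set of objects (rows) that have a 1 in column k."""
    n_chars = len(matrix[0])
    return [{obj for obj, row in zip(objects, matrix) if row[k] == 1}
            for k in range(n_chars)]

def is_laminar(sets):
    """True iff every two sets are disjoint or one includes the other."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

objects = ["A", "B", "C", "D", "E"]
laminar_matrix = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
non_laminar_matrix = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
print(is_laminar(object_sets(laminar_matrix, objects)))      # True
print(is_laminar(object_sets(non_laminar_matrix, objects)))  # False
```

In the second matrix, O5 = {B, C, E} and O1 = {A, C} share C, yet neither includes the other.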

Page 14: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Proof

(⇒) Assume M has a perfect phylogeny, and let Ci, Cj be given.

Consider the edges labeled Ci and Cj.

Case 1: There is a root-to-leaf path containing both edges. Then one of Oi, Oj includes the other (C2 and C1 below).

Case 2: Not case 1. Then Oi and Oj are disjoint (C2 and C3).

[Figure: the perfect phylogeny of the example matrix, with leaves A, B, C, D, E and edges labeled C1, C2, C3, C4, C5]

Page 15: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Proof (cont.)

(⇐) Assume that for all i, j, either Oi and Oj are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree.

Basis: one character. Then there are at most two kinds of objects, one with and one without this character.

    C1
A    1
B    0

[Figure: the corresponding tree, with leaves A and B and one edge labeled C1]

Page 16: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Proof (cont.)

Induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O1 is not contained in Oj for j > 1.

Let S1 be the set of objects j for which Mj1 = 1, and let S2 be the remaining objects. Then each character belongs to objects in S1 or in S2, but not both (prove!). By induction there are trees T1 and T2 for S1 and S2. Combining them as below gives the desired tree.

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    1   0   0   0   0

[Figure: T1 and T2 joined at a common root, with the edge leading to T1 labeled 1]
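The induction can be turned into a recursive construction. The sketch below assumes the character sets are laminar and non-empty; the function name `build_tree` and the nested-list tree representation are illustrative choices, not the lecture's.

```python
def build_tree(objects, char_sets):
    """Return a tree as a list of children of the current node.

    A child is either an object name (a leaf) or a pair (character, subtree),
    standing for an edge labeled by that character that leads to the subtree.
    `char_sets` maps each character to its non-empty object set (assumed laminar).
    """
    if not char_sets:
        return sorted(objects)  # no characters left: all objects hang off this node
    # A largest set cannot be properly contained in any other set.
    top = max(char_sets, key=lambda c: len(char_sets[c]))
    s1 = char_sets[top]
    s2 = set(objects) - s1
    rest = {c: s for c, s in char_sets.items() if c != top}
    inside = {c: s for c, s in rest.items() if s <= s1}         # characters within S1
    outside = {c: s for c, s in rest.items() if not (s <= s1)}  # by laminarity, within S2
    # Edge `top` leads to T1; the children of T2 are attached at the same root.
    return [(top, build_tree(s1, inside))] + build_tree(s2, outside)

# The matrix of this slide, written as object sets.
char_sets = {
    "C1": {"A", "C", "E"},
    "C2": {"A", "C"},
    "C3": {"B", "D"},
    "C4": {"D"},
    "C5": {"C"},
}
tree = build_tree({"A", "B", "C", "D", "E"}, char_sets)
print(tree)
```

For this input the root has two labeled edges, C1 and C3; below C1 sits the leaf E and an edge C2, and so on down to the leaves.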

Page 17: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

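Efficient Implementation (1)

1. Sort the columns (characters) by decreasing value when considered as binary numbers. (Time complexity: O(mn), using radix sort.)

Claim: If the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj.

Proof: If Oi were a proper subset of Oj, every 1 in column i would also appear in column j, and column j would have at least one additional 1, so its value would be larger. Hence a positive difference between the values of columns i and j means the 1's in Oi are not all covered by the 1's in Oj.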

Original matrix:

    C1  C2  C3  C4  C5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   0

After sorting the columns:

    C2  C1  C3  C5  C4
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   1   0
D    0   0   1   0   1
E    1   0   0   0   0
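The sorting step can be sketched as follows, reading each column top to bottom as a binary number. For brevity this sketch uses Python's built-in comparison sort instead of the radix sort mentioned on the slide, so it does not achieve the O(mn) bound; the name `sort_columns` is illustrative.

```python
def sort_columns(matrix, names):
    """Reorder columns by decreasing binary value (first row = most significant bit)."""
    def value(k):
        v = 0
        for row in matrix:
            v = 2 * v + row[k]
        return v
    order = sorted(range(len(names)), key=value, reverse=True)
    sorted_matrix = [[row[k] for k in order] for row in matrix]
    return sorted_matrix, [names[k] for k in order]

matrix = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
sorted_matrix, sorted_names = sort_columns(matrix, ["C1", "C2", "C3", "C4", "C5"])
print(sorted_names)  # ['C2', 'C1', 'C3', 'C5', 'C4']
```

The column values here are C1 = 20, C2 = 21, C3 = 10, C4 = 2, and C5 = 4, which gives the order shown on the slide.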

Page 18: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Efficient Implementation (2)

2. Make a backwards linked list of the 1's in each row (the leftmost 1 in each row points at itself). Time complexity: O(mn).

    C2  C1  C3  C5  C4
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   1   0
D    0   0   1   0   1
E    1   0   0   0   0

Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. Can be checked in O(mn) time.

Page 19: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Examples

Laminar:

    C2  C1  C3  C5  C4
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   1   0
D    0   0   1   0   1
E    1   0   0   0   0

Not laminar:

    C2  C1  C3  C5  C4
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   1   0
D    0   0   1   0   1
E    1   0   1   1   0
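The link-based test from the claim can be sketched on these two examples. The code assumes the columns are already sorted; `backward_links` and `is_laminar_sorted` are illustrative names, and columns are referred to by their position (0 is the leftmost).

```python
def backward_links(matrix):
    """links[r][k] = column of the previous 1 in row r, or k itself for the leftmost 1."""
    links = []
    for row in matrix:
        row_links, prev = {}, None
        for k, bit in enumerate(row):
            if bit == 1:
                row_links[k] = k if prev is None else prev
                prev = k
        links.append(row_links)
    return links

def is_laminar_sorted(matrix):
    """Check that, in every column, all links point at the same column."""
    links = backward_links(matrix)
    for k in range(len(matrix[0])):
        targets = {row_links[k] for row_links in links if k in row_links}
        if len(targets) > 1:
            return False
    return True

laminar_example = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
non_laminar_example = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
]
print(is_laminar_sorted(laminar_example))      # True
print(is_laminar_sorted(non_laminar_example))  # False
```

In the second matrix the third column has links pointing at itself (rows B and D) and at the first column (row E), so the test fails.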

Page 20: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Efficient Implementation (3)

3. When the matrix is laminar, the tree edges corresponding to characters are defined by the backwards links in the matrix.

    C2  C1  C3  C5  C4
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   1   0
D    0   0   1   0   1
E    1   0   0   0   0

[Figure: the resulting tree, with leaves A, B, C, D, E and edges labeled C1, C2, C3, C4, C5]

The remaining edges and leaves are determined by the characters of each object. This needs O(mn) time.
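Reading the tree off the sorted matrix can be sketched as a parent map. This sketch assumes the matrix is laminar with sorted columns; it names the node below the edge labeled Ck after the character itself, and `"root"` is a hypothetical name for the root.

```python
def tree_from_sorted_matrix(matrix, char_names, object_names):
    """Return a parent map {node: parent} for a laminar, column-sorted 0-1 matrix."""
    parent = {}
    for obj, row in zip(object_names, matrix):
        prev = "root"
        for k, bit in enumerate(row):
            if bit == 1:
                # Backward link: the previous 1 in this row, or the root for the leftmost 1.
                # For a laminar matrix every row yields the same parent for a character.
                parent[char_names[k]] = prev
                prev = char_names[k]
        parent[obj] = prev  # the leaf hangs below the rightmost 1 of its row
    return parent

sorted_example = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
parents = tree_from_sorted_matrix(
    sorted_example,
    ["C2", "C1", "C3", "C5", "C4"],
    ["A", "B", "C", "D", "E"],
)
for node, par in sorted(parents.items()):
    print(node, "<-", par)
```

For the example this gives C2 and C3 below the root, C1 below C2, C5 below C1, and C4 below C3, with the leaves A, B, C, D, E attached below C1, C3, C5, C4, and C2 respectively.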

Page 21: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

A scenario where Maximum Parsimony (and Perfect Phylogeny) are misleading

Consider a model with 4 letters (DNA), where the probability of a substitution is proportional to time.

In the following topology, 2 and 3 are likely to be like the origin, but 1 and 4 can be different. In this case, Maximum Parsimony is misleading.

[Figure: four-leaf tree with leaves 1, 2, 3, 4 and nodes labeled A]

Page 22: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Parsimony may be useless/misleading

For leaves 1 and 4 there are 4 combinations of substitutions. In the first three, all three topologies obtain the same parsimony score. In the fourth, a wrong topology scores best.

Case   Letters at leaves 1, 4   Effect
I      A, A                     Uninformative
II     A, G                     Uninformative
III    C, G                     Uninformative
IV     G, G                     Misinformative

[Figure: four-leaf tree with leaves 1, 2, 3, 4 and nodes labeled A]

Page 23: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Parsimony may be useless: Case I

Leaves 1 and 4: A, A. Leaves 2 and 3: A, A.

Topology       Score
(1,2),(3,4)    0
(1,3),(2,4)    0
(1,4),(2,3)    0

Page 24: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Parsimony may be useless: Case II

Leaves 1 and 4: G, A. Leaves 2 and 3: A, A.

Topology       Score
(1,2),(3,4)    1
(1,3),(2,4)    1
(1,4),(2,3)    1

Page 25: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Parsimony may be useless: Case III

Leaves 1 and 4: G, C. Leaves 2 and 3: A, A.

Topology       Score
(1,2),(3,4)    2
(1,3),(2,4)    2
(1,4),(2,3)    2

Page 26: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Parsimony may be misleading: Case IV

Leaves 1 and 4: C, C. Leaves 2 and 3: A, A.

Topology       Score
(1,2),(3,4)    2
(1,3),(2,4)    2
(1,4),(2,3)    1
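The scores of the four cases can be reproduced with a small counting routine. The sketch below uses Fitch's set-intersection rule on a quartet, which is an assumption here (the slides do not name a scoring procedure); `quartet_score` and `scores` are illustrative names.

```python
def quartet_score(left_pair, right_pair):
    """Parsimony score of one site on the quartet tree (left_pair),(right_pair)."""
    def join(x, y):
        # Fitch rule: intersect if possible (no change), otherwise unite (one change).
        common = x & y
        return (common, 0) if common else (x | y, 1)
    (a, b), (c, d) = left_pair, right_pair
    left, cost_left = join({a}, {b})
    right, cost_right = join({c}, {d})
    _, cost_mid = join(left, right)
    return cost_left + cost_right + cost_mid

def scores(s1, s2, s3, s4):
    """Scores of the three topologies (1,2),(3,4); (1,3),(2,4); (1,4),(2,3)."""
    return (quartet_score((s1, s2), (s3, s4)),
            quartet_score((s1, s3), (s2, s4)),
            quartet_score((s1, s4), (s2, s3)))

print(scores("A", "A", "A", "A"))  # Case I:   (0, 0, 0)
print(scores("G", "A", "A", "A"))  # Case II:  (1, 1, 1)
print(scores("G", "A", "A", "C"))  # Case III: (2, 2, 2)
print(scores("C", "A", "A", "C"))  # Case IV:  (2, 2, 1)
```

Only in the last case do the topologies differ, and the lowest score goes to the topology that pairs leaves 1 and 4.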

Page 27: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

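Parsimony may be misleading

[Figure: two scenarios on the four-leaf tree with origin A and leaves 1 = C, 2 = C, 3 = A, 4 = A. In the first, the internal nodes are C and A (a change on the central edge); in the second, both internal nodes are A (parallel changes on the pendant edges to 1 and 2).]

Parsimony will infer the topology correctly only in the rare case of a change on the central edge, or in the even rarer case of a parallel change from A to C on the pendant edges to 1 and 2.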

Page 28: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

3. Maximum Likelihood Approach

Consider the phylogenetic tree to be a stochastic process.

[Figure: example tree with leaf sequences AGA, GGA, AAA, AAG and internal node sequences AAA, AGA, AAA]

The likelihood of a transition from character a to character b is given by the parameters θb|a. The likelihood of a letter a at the root is qa.

Given the complete tree, its probability is defined by the values of the θb|a's and the qa's.
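For a single site, the probability of a complete tree is the root probability times one transition parameter per edge. The sketch below assumes sites are treated independently and uses made-up parameter values; `theta[(b, a)]` stands for the transition parameter from a to b, `q[a]` for the root probability, and all names are illustrative.

```python
def complete_tree_probability(root, children, letter, q, theta):
    """q[root letter] times the product over edges of theta[(child letter, parent letter)]."""
    prob = q[letter[root]]
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            prob *= theta[(letter[child], letter[node])]
            stack.append(child)
    return prob

alphabet = "ACGT"
q = {a: 0.25 for a in alphabet}  # hypothetical uniform root distribution
theta = {(b, a): 0.7 if a == b else 0.1  # hypothetical transition parameters
         for a in alphabet for b in alphabet}

# Root r with internal children u, v; leaves x1, x2 under u and x3, x4 under v.
children = {"r": ["u", "v"], "u": ["x1", "x2"], "v": ["x3", "x4"]}
letter = {"r": "A", "u": "A", "v": "A", "x1": "A", "x2": "G", "x3": "A", "x4": "A"}
print(complete_tree_probability("r", children, letter, q, theta))  # about 0.0042
```

Here five edges keep the letter A (0.7 each) and one edge changes A to G (0.1), so the product is 0.25 × 0.7⁵ × 0.1.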

Page 29: Perfect Phylogeny   MLE for Phylogeny   Lecture 14

Maximum Likelihood Approach (2)

When the data consists only of the leaf sequences (but the topology is fixed):

[Figure: tree with only the leaf sequences AGA, GGA, AAA, AAG shown]

Write down the likelihood of the data (leaf sequences) given the tree. Use EM to estimate the θb|a parameters.

When the tree is not given: search for the tree that maximizes Prob(data | Tree, EM parameter estimates).
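When only the leaves are observed, the likelihood of one site sums the complete-tree probability over all letters at the internal nodes. The sketch below computes that sum bottom-up (Felsenstein's pruning recursion, named here as an assumption since the slides do not specify a method); it does not implement the EM estimation, and the parameter values are made up.

```python
def leaf_likelihood(root, children, leaf_letter, q, theta, alphabet="ACGT"):
    """P(leaf letters | tree, q, theta) for a single site."""
    def below(node):
        # below(node)[a] = P(leaf letters under node | node carries letter a)
        if node in leaf_letter:
            return {a: 1.0 if a == leaf_letter[node] else 0.0 for a in alphabet}
        result = {a: 1.0 for a in alphabet}
        for child in children[node]:
            child_table = below(child)
            for a in alphabet:
                result[a] *= sum(theta[(b, a)] * child_table[b] for b in alphabet)
        return result
    table = below(root)
    return sum(q[a] * table[a] for a in alphabet)

q = {a: 0.25 for a in "ACGT"}  # hypothetical uniform root distribution
theta = {(b, a): 0.7 if a == b else 0.1  # hypothetical transition parameters
         for a in "ACGT" for b in "ACGT"}

# Smallest example: a root with two leaves, both observed as A.
children = {"r": ["x", "y"]}
print(leaf_likelihood("r", children, {"x": "A", "y": "A"}, q, theta))  # about 0.13
```

A tree search would evaluate this quantity, multiplied over sites, for each candidate topology with its estimated parameters.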