linear programming for phylogenetic reconstruction based …stelo/cpm/cpm05/cpm05_10_2_tang.pdf ·...

Page 1: Linear Programming for Phylogenetic Reconstruction Based …stelo/cpm/cpm05/cpm05_10_2_Tang.pdf · 2005-07-08 · Linear Programming for Phylogenetic Reconstruction Based on Gene

Linear Programming for

Phylogenetic Reconstruction

Based on Gene Rearrangements

Jijun Tang

[email protected]

Department of Computer Science and Engineering

University of South Carolina

– p. 1/30

Acknowledgment

• Joint work with Bernard Moret (University of New Mexico).

• Supported by the National Science Foundation and the University of South Carolina.

– p. 2/30

Overview

• Introduction to gene-order data

• GRAPPA and the computational challenge

• Linear programming setup

• Experimental design

• Experimental results

• Conclusions

– p. 3/30

What Is A Phylogeny?

• The evolutionary history of a group of organisms

• Usually takes the form of a tree:
  • Modern organisms are placed at the leaves

  • Edges denote evolutionary relationships

– p. 4/30

Example

– p. 5/30

Gene-Order Data

• A chromosome can be represented by an ordering of signed genes:
  • Linear or circular

• The sign of a gene represents its orientation

• The gene order can be rearranged by evolutionary events such as:
  • Inversion, transposition and inverted transposition

  • Deletion and insertion
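The three order-changing events listed above can be illustrated with a minimal Python sketch on signed gene orders. It treats a genome as a linear tuple (for a circular genome, pick any starting point first); the function names and the 0-based, half-open segment indices are our own conventions, not taken from GRAPPA.

```python
from typing import Tuple

Genome = Tuple[int, ...]  # signed gene order, e.g. (1, -3, 2)


def inversion(g: Genome, i: int, j: int) -> Genome:
    """Reverse the segment g[i:j] and flip the sign (orientation) of every gene in it."""
    return g[:i] + tuple(-x for x in reversed(g[i:j])) + g[j:]


def transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Cut out g[i:j] and reinsert it before position k of the remaining genome."""
    segment = g[i:j]
    rest = g[:i] + g[j:]
    return rest[:k] + segment + rest[k:]


def inverted_transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Like a transposition, but the moved segment is also inverted."""
    segment = tuple(-x for x in reversed(g[i:j]))
    rest = g[:i] + g[j:]
    return rest[:k] + segment + rest[k:]
```

For example, with `g = (1, 2, 3, 4, 5, 6, 7, 8)`, `inversion(g, 1, 4)` gives `(1, -4, -3, -2, 5, 6, 7, 8)`, and `inverted_transposition(g, 1, 4, 4)` gives `(1, 5, 6, 7, -4, -3, -2, 8)`.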

– p. 6/30

Gene-Order Rearrangements

[Figure: a circular genome with genes 1 to 8 and the genomes produced from it by an inversion, a transposition, and an inverted transposition.]

– p. 7/30

Reconstruction Methods

• Distance-based methods: neighbor-joining and its variants

• Bayesian method: Badger

• Maximum parsimony based on encoding: MPBE, MPME

• Direct optimization methods: BPAnalysis, GRAPPA, MGR

– p. 8/30

Direct Optimization Methods

• Goal: to reconstruct the phylogeny with the minimum number of rearrangement events

• Computationally hard even for only three genomes:
  • The median problem for three genomes is NP-hard under general distance definitions

  • Median problem: find the gene order of the median genome that minimizes the sum of the distances from the median to the three genomes
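To make the median-of-three problem concrete, here is a brute-force Python sketch. The slides do not fix the distance, so this assumes breakpoint distance on signed circular genomes because it is short to implement. It enumerates every candidate genome, so it is only feasible for a handful of genes, which illustrates why exact medians are so expensive. All function names are hypothetical.

```python
from itertools import permutations, product
from typing import FrozenSet, Iterator, Optional, Tuple

Genome = Tuple[int, ...]  # signed circular gene order over genes 1..n


def adjacencies(g: Genome) -> FrozenSet[Tuple[int, int]]:
    """Adjacencies of a signed circular genome. Read from the other strand, the
    pair (a, b) appears as (-b, -a), so each pair is stored in a canonical form."""
    n = len(g)
    return frozenset(
        min((g[i], g[(i + 1) % n]), (-g[(i + 1) % n], -g[i])) for i in range(n)
    )


def breakpoint_distance(g: Genome, h: Genome) -> int:
    """Number of adjacencies of g missing from h (same gene set assumed)."""
    return len(g) - len(adjacencies(g) & adjacencies(h))


def all_circular_genomes(n: int) -> Iterator[Genome]:
    """Every signed circular genome on genes 1..n, written starting with +1
    (this fixes both the rotation and the reading strand)."""
    for perm in permutations(range(2, n + 1)):
        for signs in product((1, -1), repeat=n - 1):
            yield (1,) + tuple(s * x for s, x in zip(signs, perm))


def brute_force_median(g1: Genome, g2: Genome, g3: Genome) -> Tuple[Optional[Genome], int]:
    """A genome minimizing the total breakpoint distance to g1, g2 and g3."""
    best: Optional[Genome] = None
    best_score = -1
    for m in all_circular_genomes(len(g1)):
        score = sum(breakpoint_distance(m, g) for g in (g1, g2, g3))
        if best is None or score < best_score:
            best, best_score = m, score
    return best, best_score
```

For `g1 = (1, 2, 3, 4)`, `g2 = (1, -2, 3, 4)` and `g3 = (1, 2, -3, 4)`, the pairwise distances are 2, 2 and 3. Half their sum, 3.5, bounds the median score from below, so the best possible integer score is 4, which `g1` itself attains.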

– p. 9/30

Reconstruction Example

[Figure: an example phylogeny of genomes with 12 signed genes, with the rearrangement events labeled along its edges.]

– p. 10/30

GRAPPA

• Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms

• Started as an effort to reimplement the BPAnalysis of Sankoff and Blanchette

• Used algorithmic techniques to improve the speed:
  • A tightened lower bound to discard bad trees before scoring them

  • Profiling, cache awareness, etc.

– p. 11/30

Algorithm Outline

• Consider each tree topology in turn

• For each tree:
  • Test the lower bound; if it exceeds the best score so far, continue to the next tree

  • Initialize the internal nodes by some means

  • Compute medians of three iteratively until no change occurs

• Return the lowest score tree
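The outline above is an exhaustive search with a bounding test. The Python sketch below enumerates unrooted binary topologies by stepwise addition (there are (2N − 5)!! of them, about 6.5 × 10^8 for N = 12) and skips any tree whose lower bound already exceeds the best score. `lower_bound` and `score` are hypothetical callables standing in for GRAPPA's bound and its iterative median scoring, which are not shown.

```python
from typing import Callable, Iterator, List, Tuple

Edge = Tuple[int, int]
Tree = List[Edge]


def all_topologies(n_leaves: int) -> Iterator[Tree]:
    """Every unrooted binary tree on leaves 0..n_leaves-1 (n_leaves >= 3), built by
    inserting leaf k on each edge of every tree on leaves 0..k-1.
    Internal nodes get ids n_leaves, n_leaves + 1, ..."""
    def grow(tree: Tree, leaf: int, internal: int) -> Iterator[Tree]:
        if leaf == n_leaves:
            yield tree
            return
        for i, (u, v) in enumerate(tree):
            # subdivide edge (u, v) with a new internal node and hang the new leaf on it
            new_tree = tree[:i] + tree[i + 1:] + [(u, internal), (internal, v), (internal, leaf)]
            yield from grow(new_tree, leaf + 1, internal + 1)

    star = [(0, n_leaves), (1, n_leaves), (2, n_leaves)]
    yield from grow(star, 3, n_leaves + 1)


def exhaustive_search(
    n_leaves: int,
    lower_bound: Callable[[Tree], float],
    score: Callable[[Tree], float],
) -> Tuple[float, List[Tree]]:
    """Return the lowest score and every tree that attains it."""
    best_score = float("inf")
    best_trees: List[Tree] = []
    for tree in all_topologies(n_leaves):
        if lower_bound(tree) > best_score:
            continue  # this tree cannot beat (or tie) the best tree found so far
        s = score(tree)
        if s < best_score:
            best_score, best_trees = s, [tree]
        elif s == best_score:
            best_trees.append(tree)
    return best_score, best_trees
```

With a trivial score such as `len` (the number of edges, 2N − 3 for every tree), all 15 topologies on 5 leaves tie at 7; the tighter the lower bound, the more calls to the expensive `score` are skipped.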

– p. 12/30

Scoring a Tree

[Figure: example of scoring a tree; the recoverable labels are genomes A, B and C and nodes 1 to 5.]

– p. 13/30

Computational Challenge

• Scoring a tree is very expensive

• When the genomes are distant, a single median may take days or months to solve

• Scoring a tree requires solving median problems iteratively, until the internal nodes stop changing

• Can we find the tree score without solving the median problems?

– p. 14/30

Linear Programming Approach

• Goal: minimize the tree length

• What do we know?
  • The pairwise distance matrix

  • A given tree topology

• Approach:
  • Find useful constraints

  • Use linear programming to minimize the tree length

– p. 15/30

Median Problem

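[Figure: three genomes 1, 2 and 3 connected to their median genome 0, labeled with the pairwise distances $d_{12}$, $d_{13}$, $d_{23}$ and the median distances $d_{01}$, $d_{02}$, $d_{03}$.]

$$d_{01} + d_{02} + d_{03} \ge \frac{d_{12} + d_{23} + d_{13}}{2}$$

(This follows by summing the triangle inequalities $d_{12} \le d_{01} + d_{02}$, $d_{13} \le d_{01} + d_{03}$ and $d_{23} \le d_{02} + d_{03}$.)

In more than 98% of cases we have equality:

$$d_{01} + d_{02} + d_{03} = \frac{d_{12} + d_{23} + d_{13}}{2}$$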

– p. 16/30

Constraint on Internal Node

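[Figure: an internal node M whose neighbors are A, B and C; the three edges incident to M have lengths $d_k$, $d_{k+1}$ and $d_{k+2}$, and the distances between the neighbors are $d_{A,B}$, $d_{A,C}$ and $d_{B,C}$.]

$$\forall M:\quad d_k + d_{k+1} + d_{k+2} = \frac{d_{A,B} + d_{A,C} + d_{B,C}}{2}$$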

– p. 17/30

Equations

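[Figure: a tree with leaves $1, 2, \ldots, N-1, N$ and internal nodes $N+1, N+2, \ldots, 2N-3, 2N-2$; the edge lengths are labeled $d_1, \ldots, d_{2N-3}$, and distances between neighbors of the same internal node are labeled $d_{1,2}$, $d_{1,N+2}$, $d_{2,N+2}$, ..., $d_{2N-3,N-1}$, $d_{2N-3,N}$, $d_{N-1,N}$.]

$$d_1 + d_2 + d_3 = \frac{d_{1,2} + d_{2,N+2} + d_{1,N+2}}{2}$$

$$\cdots$$

$$d_{2N-5} + d_{2N-4} + d_{2N-3} = \frac{d_{2N-3,N-1} + d_{N-1,N} + d_{2N-3,N}}{2}$$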

– p. 18/30

Problems

[Figure: the same labeled tree as on the Equations slide.]

• There are ≈ 5N variables, but only N − 2 equations

• There are many (and redundant) triangle inequalities

– p. 19/30

Inequalities

• We want to pick a minimum number of inequalities that covers all the variables

• We know only the distance matrix and the tree topology

• Our choice: for each pair of genomes, find the two shortest paths from one to the other, and build one inequality for each path

– p. 20/30

Inequalities

[Figure: the same labeled tree as on the Equations slide.]

$$d_{1,2} \le d_1 + d_3$$

$$d_{N-1,N} \le d_{2N-4} + d_{2N-3}$$

$$\cdots$$

$$d_{1,N-1} \le d_{1,N+2} + \cdots + d_{2N-3,N-1}$$

$$d_{1,N-1} \le d_{1,N+2} + \cdots + d_{2N-5} + d_{2N-4}$$

– p. 21/30

Sum-up

• Examine every tree

• For each tree (with N genomes):
  • Minimize the sum of the 2N − 3 edge lengths

  • ≈ 5N variables in total

  • N − 2 equations, < 2N(N − 1) inequalities

• These numbers are relatively small if N < 20

• Use lp_solve to find the length of the tree

• Return tree(s) with the minimum length
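The per-tree step can be sketched in Python. This is a simplified, hypothetical version of the constraint system, not the authors' code: it keeps one equality per internal node, but builds only one inequality per distance (each distance is at most the length of the tree path between its endpoints) instead of one per shortest path, and it calls SciPy's `scipy.optimize.linprog` (assumed installed) in place of lp_solve. `build_lp`, `path_edges` and `tree_length` are our own names; leaves are numbered from 0 and internal nodes after them.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Row = Dict[int, float]  # sparse row: variable index -> coefficient


def path_edges(adj: Dict[int, List[Tuple[int, int]]], src: int, dst: int) -> List[int]:
    """Indices of the edges on the unique tree path from src to dst (BFS)."""
    parent = {src: (src, -1)}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr, e in adj[node]:
            if nbr not in parent:
                parent[nbr] = (node, e)
                queue.append(nbr)
    path, node = [], dst
    while node != src:
        node, e = parent[node]
        path.append(e)
    return path


def build_lp(edges: List[Edge], n_leaves: int, dist: Dict[Tuple[int, int], float]):
    """LP for one topology as (c, A_ub, b_ub, A_eq, b_eq). dist[(i, j)], i < j, is the
    leaf distance matrix. Variables: one length per edge, then one distance variable
    per neighbor pair of an internal node that is not a pair of leaves."""
    adj: Dict[int, List[Tuple[int, int]]] = {}
    for k, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, k))
        adj.setdefault(v, []).append((u, k))
    n_edges = len(edges)
    pair_var: Dict[Tuple[int, int], int] = {}
    eq_rows: List[Row] = []
    eq_rhs: List[float] = []

    # Internal node M with neighbors A, B, C:  d_MA + d_MB + d_MC - (D_AB + D_AC + D_BC)/2 = 0
    for m, nbrs in adj.items():
        if m < n_leaves:
            continue
        row: Row = {k: 1.0 for _, k in nbrs}
        rhs = 0.0
        for (a, _), (b, _) in combinations(nbrs, 2):
            x, y = min(a, b), max(a, b)
            if y < n_leaves:  # two leaves: D_xy is known, so it moves to the right-hand side
                rhs += dist[(x, y)] / 2
            else:  # involves an internal node: D_xy is an LP variable
                idx = pair_var.setdefault((x, y), n_edges + len(pair_var))
                row[idx] = -0.5
        eq_rows.append(row)
        eq_rhs.append(rhs)

    # A distance never exceeds the length of the tree path between its endpoints.
    ub_rows: List[Row] = []
    ub_rhs: List[float] = []
    for (i, j), d_ij in dist.items():  # leaf pairs: -(path length) <= -D_ij
        ub_rows.append({k: -1.0 for k in path_edges(adj, i, j)})
        ub_rhs.append(-d_ij)
    for (x, y), idx in pair_var.items():  # D_xy - (path length) <= 0
        row = {k: -1.0 for k in path_edges(adj, x, y)}
        row[idx] = 1.0
        ub_rows.append(row)
        ub_rhs.append(0.0)

    n_vars = n_edges + len(pair_var)

    def dense(r: Row) -> List[float]:
        return [r.get(k, 0.0) for k in range(n_vars)]

    c = [1.0] * n_edges + [0.0] * len(pair_var)  # minimize the total edge length
    return c, [dense(r) for r in ub_rows], ub_rhs, [dense(r) for r in eq_rows], eq_rhs


def tree_length(edges: List[Edge], n_leaves: int, dist: Dict[Tuple[int, int], float]):
    """Solve the LP with SciPy (>= 1.6 assumed); returns None if it is infeasible."""
    from scipy.optimize import linprog

    c, A_ub, b_ub, A_eq, b_eq = build_lp(edges, n_leaves, dist)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun if res.status == 0 else None
```

For the quartet `[(0, 4), (1, 4), (4, 5), (2, 5), (3, 5)]` with the additive distances produced by edge lengths 1, 2, 3, 4, 5 (for example D01 = 3 and D23 = 9), the equalities force d0 + d1 = 3 and d3 + d4 = 9, and the inequalities for pairs (0, 2) and (1, 3) force the middle edge to be at least 3, so `tree_length` should return about 15, the true tree length. Calling it for every topology and keeping the minimum mirrors the loop on this slide.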

– p. 22/30

Experimental Design

• Real datasets: limited samples

• Simulation:
  • Generate a tree (the true tree) from different topology models: uniform, birth-death, ...

  • Assign edge lengths based on the expected evolutionary rate

  • Assign gene content to each genome based on the edge length

  • Use GRAPPA to find a tree (the inferred tree)

• Compare the inferred tree with the true tree to determine the accuracy
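The "uniform" topology model in the first simulation step can be sampled by random stepwise addition: each new leaf subdivides an edge chosen uniformly at random, and since every topology arises from exactly one sequence of choices, all topologies are equally likely. A minimal Python sketch follows; `random_uniform_topology` is a hypothetical name, and the birth-death model is not shown.

```python
import random
from typing import List, Tuple


def random_uniform_topology(n_leaves: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Uniformly random unrooted binary tree on leaves 0..n_leaves-1 (n_leaves >= 3),
    as an edge list; internal nodes get ids n_leaves, n_leaves + 1, ..."""
    tree = [(0, n_leaves), (1, n_leaves), (2, n_leaves)]  # star on the first three leaves
    internal = n_leaves + 1
    for leaf in range(3, n_leaves):
        u, v = tree.pop(rng.randrange(len(tree)))  # edge to subdivide
        tree += [(u, internal), (internal, v), (internal, leaf)]
        internal += 1
    return tree
```

For N = 12 this yields 2N − 3 = 21 edges and N − 2 = 10 internal nodes of degree 3.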

– p. 23/30

Topological Accuracy

• False positive (FP): an edge that is in the inferred tree but not in the true tree

• False negative (FN): an edge that is in the true tree but not in the inferred tree

Goal: to minimize FP and FN
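Comparing edges of trees with unlabeled internal nodes is usually done through the leaf bipartitions (splits) that each internal edge induces. The Python sketch below counts FP and FN that way; dividing by N − 3, the number of internal edges of a binary tree, to get rates is a common convention and our assumption about how the plotted rates are normalized. The function names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def splits(edges: List[Edge], n_leaves: int) -> Set[FrozenSet[int]]:
    """Nontrivial bipartitions induced by internal edges. Each split is stored as
    the side that does not contain leaf 0, so equal splits compare equal."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = set()
    for u, v in edges:
        if u < n_leaves or v < n_leaves:
            continue  # an edge to a leaf induces only a trivial split
        side, stack, seen = set(), [v], {u, v}
        while stack:  # collect the leaves reachable from v without crossing (u, v)
            x = stack.pop()
            if x < n_leaves:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if 0 in side:
            side = set(range(n_leaves)) - side
        result.add(frozenset(side))
    return result


def fp_fn(true_edges: List[Edge], inferred_edges: List[Edge], n_leaves: int) -> Tuple[int, int]:
    """(false positives, false negatives) between an inferred tree and the true tree."""
    true_s, inf_s = splits(true_edges, n_leaves), splits(inferred_edges, n_leaves)
    return len(inf_s - true_s), len(true_s - inf_s)


def fp_fn_rates(true_edges: List[Edge], inferred_edges: List[Edge], n_leaves: int) -> Tuple[float, float]:
    """FP and FN divided by the N - 3 internal edges of a binary tree."""
    fp, fn = fp_fn(true_edges, inferred_edges, n_leaves)
    return fp / (n_leaves - 3), fn / (n_leaves - 3)
```

Two different quartets, such as 01|23 and 02|13, share no nontrivial split, so both FP and FN are 1.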

– p. 24/30

Simulation Details

• Number of genomes (N): 12

• Number of genes (n): 200, 500 and 1000

• Expected number of events on each edge: 0.05n to 0.15n

• Tree topologies: uniform and birth-death

• Datasets for each combination: 10
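Generating the leaf genomes for these settings can be sketched in Python as follows. Several choices here are assumptions, not details from the slides: only inversions are simulated (other events are left out), each edge receives a number of events drawn uniformly from [0.05n, 0.15n] and rounded (the slides give only the range of the expected count), and evolution starts from the identity genome at a chosen internal node. `simulate` and `random_inversion` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Genome = Tuple[int, ...]


def random_inversion(g: Genome, rng: random.Random) -> Genome:
    """Invert a randomly chosen nonempty segment g[i:j]."""
    i, j = sorted(rng.sample(range(len(g) + 1), 2))
    return g[:i] + tuple(-x for x in reversed(g[i:j])) + g[j:]


def simulate(edges: List[Tuple[int, int]], root: int, n_genes: int,
             rng: random.Random, low: float = 0.05, high: float = 0.15) -> Dict[int, Genome]:
    """Evolve the identity genome from `root` outward along every tree edge.
    Each edge gets round(uniform(low * n, high * n)) random inversions."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    genomes = {root: tuple(range(1, n_genes + 1))}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in genomes:
                continue  # already visited (this is the edge back toward the root)
            g = genomes[u]
            for _ in range(round(rng.uniform(low * n_genes, high * n_genes))):
                g = random_inversion(g, rng)
            genomes[v] = g
            stack.append(v)
    return genomes
```

The returned dictionary holds a genome for every node; the leaf entries form the simulated dataset, and the true tree is kept for the accuracy comparison.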

– p. 25/30

FN (500 genes, BD tree)

[Plot: FN rate (n = 500) versus r for NJ and LP; r runs from 72 down to 24, FN rate from 0 to 20.]

– p. 26/30

FP (500 genes, BD tree)

[Plot: FP rate (n = 500) versus r for NJ and LP; r runs from 72 down to 24, FP rate from 0 to 20.]

– p. 27/30

FN (1000 genes, uniform tree)

[Plot: FN rate (n = 1000) versus r for NJ and LP; r runs from 144 down to 48, FN rate from 0 to 25.]

– p. 28/30

FP (1000 genes, uniform tree)

[Plot: FP rate (n = 1000) versus r for NJ and LP; r runs from 144 down to 48, FP rate from 0 to 25.]

– p. 29/30

Conclusion

• Linear programming gives us a new and accurate method for difficult datasets

• Can be applied to any distance

• Has potential to be used for large and complex genomes

• Can be extended to solve the median problems

– p. 30/30