DCJUC: A Maximum Parsimony Simulator for Constructing Phylogenetic Tree of Genomes with Unequal Contents
Zhaoming Yin
Bader-Polo Joint Group Meeting, Nov 11, 2013
Contribution
• Research Aspect
- A framework for finding the maximum parsimony tree from input genomes with unequal gene content.
- Proved that adequate subgraph theory applies to unequal-content data, which reduces the search space.
- Provides a benchmark for the HPC community.
• Engineering Aspect
- Implemented software with many state-of-the-art features, such as the supertree method, the GAS initialization method, and spectral partitioning.
- The software produces a tree that carries not only the topology but also the type and number of the different evolutionary events (visualization!).
Why Is the Phylogenetic Tree Problem Hard?
• For N genomes, there are (2N-5)!! possible unrooted binary tree topologies.
• For each topology, we need to compute at least one median; the number of possible median orders is (g-2)!!, where g is the number of genes.
• To validate each possible median, if the gene content has duplications, the problem is NP-hard.
• So the complexity of computing the MP tree for genomes with unequal contents is:
NP-hard over NP-hard over NP-hard!
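To get a feel for how quickly the topology count grows, here is a minimal sketch. It assumes the standard count of unrooted binary trees on N labeled leaves, (2N-5)!!; the function names are illustrative and do not come from the DCJUC software.

```python
def double_factorial(k):
    """Return k!! = k * (k-2) * (k-4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_topologies(n_genomes):
    """Number of unrooted binary tree topologies on n_genomes labeled leaves."""
    if n_genomes < 3:
        raise ValueError("need at least 3 genomes")
    return double_factorial(2 * n_genomes - 5)


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, num_unrooted_topologies(n))
```

Even before any median is computed, exhaustive enumeration of topologies is out of reach for more than a handful of genomes, which is the motivation for partitioning the input first.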
Phylogenetic Tree
This picture presents the phylogeny of the “12 Drosophila.”
From http://insects.eugenes.org/species
Maximum Parsimony Concept
[Figure: alternative tree topologies on genomes 1 to 6]
Of all possible topologies, the maximum parsimony tree is the one with the minimum total tree length.
Genome Rearrangement
http://ai.stanford.edu/~serafim/CS374_2006/presentations/lecture17.ppt
Genome Rearrangement
In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were 99% similar, yet these nearly identical gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution.
1 2 3 4 5 6 7 8 9 10
Inversion: 1 2 -6 -5 -4 -3 7 8 9 10
Transposition: 1 2 7 8 3 4 5 6 9 10
Inverted Transposition: 1 2 7 8 -6 -5 -4 -3 9 10
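The three operations can be sketched on a signed gene order stored as a Python list. This is an illustrative sketch, not code from the DCJUC software; the function names and the half-open index convention (`i`, `j`, `k`) are assumptions made here.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Move the segment genome[i:j] after genome[j:k], reversed and sign-flipped."""
    return genome[:i] + genome[j:k] + [-g for g in reversed(genome[i:j])] + genome[k:]


if __name__ == "__main__":
    identity = list(range(1, 11))
    print(inversion(identity, 2, 6))                  # genes 3..6 inverted
    print(transposition(identity, 2, 6, 8))           # genes 3..6 moved after 7, 8
    print(inverted_transposition(identity, 2, 6, 8))  # moved and inverted
```

Applied to the identity order 1 to 10 with the segment 3 to 6, the three calls reproduce the three rearranged gene orders listed on the slide.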
Genome Median Computation
[Figure: tree diagrams with nodes labeled 1 to 6]
Genome Median Computation
[Figure: tree diagram with nodes labeled 1 to 6]
Input genomes: 1,2,3   1,-3,-2   -2,-1,3
Candidate median 1,2,3 = 2 moves
Candidate median 2,-1,3 = 5 moves
…
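Scoring a candidate median means summing its distances to the three input genomes. The sketch below does this by brute force, under assumptions made here rather than taken from the slides: genomes are linear signed permutations, the only allowed move is an inversion, and the distance is found by breadth-first search (feasible only for tiny genomes). Because the distance model is assumed, scores for other candidates need not match the move counts on the slide.

```python
from collections import deque
from itertools import permutations, product


def inversion_neighbors(genome):
    """Yield every genome reachable from `genome` (a tuple) by one inversion."""
    n = len(genome)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield genome[:i] + tuple(-g for g in reversed(genome[i:j])) + genome[j:]


def inversion_distance(src, dst):
    """Minimum number of inversions turning src into dst, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        for nxt in inversion_neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("genomes do not share the same gene set")


def median_score(candidate, genomes):
    """Sum of inversion distances from the candidate median to each input genome."""
    return sum(inversion_distance(candidate, g) for g in genomes)


def best_median_score(genomes):
    """Exhaustively score every signed permutation and return the minimum score."""
    n = len(genomes[0])
    candidates = (
        tuple(s * g for s, g in zip(signs, order))
        for order in permutations(range(1, n + 1))
        for signs in product((1, -1), repeat=n)
    )
    return min(median_score(c, genomes) for c in candidates)


if __name__ == "__main__":
    inputs = [(1, 2, 3), (1, -3, -2), (-2, -1, 3)]
    print(median_score((1, 2, 3), inputs))  # 2, matching the first candidate above
    print(best_median_score(inputs))
```

The exhaustive search is only there to show what a median solver has to minimize; the number of candidates grows far too fast for this to be usable on real genomes.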
Step 1: Spectral Partition
Step 2: Compute MP Tree for Each Sub-Disk
Step 2-1: How to Compute Median (BNB)
[Figure: eight successive views of a graph on vertices 1 to 8]
Step 2-2: How to Compute Median (LK)
[Figure: sequence of search steps ending in "stop"]
Step 2-2: How to Evaluate Median
[Figure: median genome (med) connected to genomes 1, 2, and 3]
1, 2, 3, 3, 4, 6, 5
1, 2, 3, 4, 3, 6, 5
1, 2, 3, 4, 6, 3, 5
1, 2, 5, 4, 6, 3, 3
Median score: Dis(m,1) + Dis(m,2) + Dis(m,3)
Step 2-2: How to Evaluate Median
1, 2, 3, 3, 4, 6, 5
1, 2, 3, 4, 3, 5
Find a mapping first (NP-hard), dis = 1
1, 2, 3, 3, 4, 6, 5
-2, -1, 3, 3, 4, 5
Complete the loss (polynomial), dis = 2
1, 2, 3, 4, 6, 5
-2, -1, 3, 4, 6, 5
Compute DCJ (polynomial), dis = 3
1, 2, 3, 4, 6, 5
1, 2, 3, 4, 6, 5
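The last stage, computing the DCJ distance once both genomes have the same gene content, is the polynomial part. A minimal sketch follows, under assumptions that are not stated on the slides: each genome is a single circular chromosome, every gene appears exactly once in both genomes, and the distance is n minus the number of cycles in the adjacency (breakpoint) graph. The function names are illustrative.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbor extremity in a circular genome.

    An extremity is (gene, 't') for the tail or (gene, 'h') for the head.
    A gene with positive sign is read tail to head, a negative one head to tail.
    """
    adj = {}
    n = len(genome)
    for idx in range(n):
        a, b = genome[idx], genome[(idx + 1) % n]
        right_of_a = (abs(a), "h") if a > 0 else (abs(a), "t")
        left_of_b = (abs(b), "t") if b > 0 else (abs(b), "h")
        adj[right_of_a] = left_of_b
        adj[left_of_b] = right_of_a
    return adj


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for two circular genomes over the same genes."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same genes, once each")
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    unvisited = set(adj_a)
    cycles = 0
    while unvisited:
        start = unvisited.pop()
        cycles += 1
        cur = start
        while True:
            via_a = adj_a[cur]        # follow the adjacency of genome A
            unvisited.discard(via_a)
            cur = adj_b[via_a]        # then the adjacency of genome B
            if cur == start:
                break
            unvisited.discard(cur)
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([1, 2, 3, 4, 6, 5], [1, 2, 3, 4, 6, 5]))
    print(dcj_distance_circular([1, 2, 3, 4, 6, 5], [-2, -1, 3, 4, 6, 5]))
```

The duplicate mapping and loss completion stages are deliberately left out: the first is the NP-hard part, and this sketch only covers the case where those two stages have already produced equal-content genomes.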
Step 3: Merge Disks
Decomposition of the disks
Construct a tree for each disk
Merge the trees using a specific consensus method: strict, majority, etc.
Disambiguation
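The strict and majority rules named above can be sketched on trees represented as sets of clades. This is a simplified illustration under assumptions made here: every tree is given as a set of `frozenset` clades over the same leaf set, whereas merging disks with only partially overlapping genomes needs a proper supertree method. The function names are hypothetical.

```python
from collections import Counter


def clade_counts(trees):
    """Count in how many trees each clade (a frozenset of leaf names) appears."""
    return Counter(clade for tree in trees for clade in set(tree))


def strict_consensus(trees):
    """Keep only the clades present in every input tree."""
    counts = clade_counts(trees)
    return {clade for clade, k in counts.items() if k == len(trees)}


def majority_consensus(trees):
    """Keep the clades present in more than half of the input trees."""
    counts = clade_counts(trees)
    return {clade for clade, k in counts.items() if 2 * k > len(trees)}


if __name__ == "__main__":
    t1 = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
    t2 = {frozenset("AB"), frozenset("DE"), frozenset("CDE")}
    t3 = {frozenset("AC"), frozenset("ABC"), frozenset("DE")}
    print(sorted("".join(sorted(c)) for c in strict_consensus([t1, t2, t3])))
    print(sorted("".join(sorted(c)) for c in majority_consensus([t1, t2, t3])))
```

Clades that fail the chosen rule collapse into unresolved nodes, which is one reason a disambiguation step is still needed after the merge.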
Step 4: Initialization
[Figure: tree with nodes 1 to 6, with a new node X inserted on the edge between nodes 1 and 2; edges labeled b, c, d, e]
Init by insertion, which is local.
Init by prospection, which is global.
Step 5: Iterative Refinement
[Figure: tree with leaves 1 to 4 and internal nodes a and b]
Review
• Step 1: Spectral partition
• Step 2: Subtree construction
• Step 3: Supertree merge
• Step 4: Initialization of the complete tree using the General Adequate Subgraph (GAS) method.
• Step 5: Iterative refinement until the complete tree converges.
Result—Simulated Data
seed
#Theta + #gamma + #phi operations
We know the total number of evolutionary events in the model tree.
We grow our own tree
Result--Accuracy
% of duplication: 0.1
% of loss: 0.1
Theta is the % of inversion.
There are 8 species, so the tree has 2*8-3 = 13 edges.
So the average accuracy is ~90%.
Result – Real Data
SCRaMbLE Matrix
• We can represent a SCRaMbLEd strain by its vector.
• The sign gives the orientation.
• The color encodes the position in the synthetic chromosome.
Result – Real Data
#inversion:#insertion/deletion:#duplication
Parallel Method [Bader 05]
Parallel search
Load Balancing
Experimental Results (Parallel)
Why Many-core BnB?
• There are already many distributed-memory MIP BnB frameworks (PICO, PEBBL, ALPS, COIN-OR).
• Load balancing in distributed BnB relies heavily on ramp-up; run-time load balancing is not efficient.
• But today's petaflops machines are mostly hybrid systems (distributed + many-core or accelerators).
Experimental Results (Intel Phi knapsack)
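For readers unfamiliar with the benchmark, 0/1 knapsack is a classic branch-and-bound workload. The sketch below is a plain sequential version, with an explicit stack and a fractional-relaxation upper bound. It is not the many-core implementation measured in the talk, and the function name and argument layout are assumptions made here.

```python
def knapsack_bnb(values, weights, capacity):
    """Solve 0/1 knapsack by depth-first branch and bound; weights must be positive."""
    # Sort by value density so the fractional bound can be computed greedily.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def upper_bound(idx, value, room):
        """Value of the fractional relaxation over the remaining items."""
        for v, w in items[idx:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0
    stack = [(0, 0, capacity)]  # (next item index, value so far, remaining capacity)
    while stack:
        idx, value, room = stack.pop()
        best = max(best, value)  # every node on the stack is a feasible partial choice
        if idx == n or upper_bound(idx, value, room) <= best:
            continue             # leaf reached, or the subtree cannot beat the incumbent
        v, w = items[idx]
        stack.append((idx + 1, value, room))              # branch: skip the item
        if w <= room:
            stack.append((idx + 1, value + v, room - w))  # branch: take the item
    return best


if __name__ == "__main__":
    print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))
```

In a parallel version the stack is the shared work pool, so how nodes are distributed and how the incumbent `best` is propagated determine the load balance.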