Guide Trees and Progressive Multiple Sequence Alignment
James A. Foster and Luke Sheneman. 1 October 2008. Initiative for Bioinformatics and Evolutionary Studies (IBEST).
![Page 1: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/1.jpg)
James A. Foster and Luke Sheneman
1 October 2008
INITIATIVE FOR BIOINFORMATICS AND EVOLUTIONARY STUDIES (IBEST)
Guide Trees and Progressive Multiple Sequence Alignment
![Page 2: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/2.jpg)
Multiple Sequence Alignment
Abstract representation of sequence homology
Homologous molecular characters (nucleotides/residues) organized in columns
Gaps (-) represent sequence indels
![Page 3: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/3.jpg)
Multiple Sequence Alignment
Many bioinformatics analyses depend on MSA.
First step in inferring phylogenetic trees
  MSA technique is at least as important as inference method and model parameters (Morrison & Ellis, 1997)
Structural and functional sequence analyses
![Page 4: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/4.jpg)
Progressive Alignment
Idea: align “closely related” sequences first, two at a time with “optimal” subalignments (dynamic programming)
Problem: once a gap, always a gap
Advantage: fast
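The pairwise dynamic programming step can be sketched as a plain Needleman-Wunsch global alignment. This is only a minimal illustration: the function name and the scoring values (match, mismatch, linear gap cost) are assumptions chosen for the example, not the parameters used by Clustal, Mafft, Muscle, or the authors.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap cost (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

In a progressive aligner the same recurrence is applied to profiles (sub-alignments) rather than single sequences, and gaps placed at one merge are carried unchanged into every later merge.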
![Page 5: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/5.jpg)
Guide Trees and Alignment Quality
How important is it to find “good” guide trees?
How much time should be spent looking for “better” guide trees?
![Page 6: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/6.jpg)
Hypothesis
Guide trees that are closer to the true phylogeny lead to better sequence alignments
Guide trees that are further from the true tree produce less accurate alignments.
The effect is measurable.
The correlation is significant.
![Page 7: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/7.jpg)
Previous Work
Folk wisdom, intuition: it matters, a lot
  Basis for Clustal, and most other pMSA implementations
Nelesen et al. (PSB ’08): doesn’t matter, much
  No strong correlation
  No large effect
Edgar (2004): bad trees are sometimes better
  UPGMA guide trees are ultrametric but outperform NJ
![Page 8: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/8.jpg)
Experimental Design: strategy
For both natural data and simulation data, with reliable alignments and phylogenies:
Explore the space of possible guide trees, moving outward from the “true tree”
Use each tree as a guide tree, perform pMSA
Compare quality of resulting alignment with known optimal value
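The strategy above can be read as a simple loop. The sketch below is only a skeleton under stated assumptions: `walk_from_true_tree` and its callable parameters `perturb`, `align`, and `score` are hypothetical names standing in for a tree rearrangement, a pMSA run, and an alignment comparison; none of them is the authors' code.

```python
def walk_from_true_tree(true_tree, sequences, reference, perturb, align, score, steps):
    """Record alignment quality at each step away from the true tree.

    perturb(tree) -> tree          hypothetical: one random rearrangement
    align(sequences, tree) -> msa  hypothetical: progressive alignment run
    score(reference, msa) -> num   hypothetical: alignment quality measure
    """
    results = []
    tree = true_tree
    for step in range(steps + 1):
        if step > 0:
            tree = perturb(tree)  # move one more step away from the true tree
        alignment = align(sequences, tree)
        results.append((step, score(reference, alignment)))
    return results
```

The step count is only an upper bound on the real distance from the true tree, since a random rearrangement can undo an earlier one.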
![Page 9: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/9.jpg)
Experimental Design: Naturally Evolved Case
![Page 10: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/10.jpg)
Experimental Design: Degrading Guide Trees
Random Nearest Neighbor Interchange (NNI)
  Swaps two neighboring internal branches
Random Tree Bisect/Reconnect (TBR)
  Randomly bisect tree
  Randomly reconnect the two resulting subtrees
Images: hyphy.org
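A random NNI step can be sketched on a rooted binary tree stored as nested 2-tuples with string leaves. The representation and the function names (`nni_neighbors`, `random_nni`, `leaves`) are assumptions made for this illustration; the rooted encoding can list the same unrooted topology more than once, and TBR is not covered.

```python
import random


def nni_neighbors(tree):
    """Return all rooted trees one NNI away from `tree` (nested 2-tuples)."""
    if isinstance(tree, str):
        return []
    left, right = tree
    out = []
    # Exchange a grandchild with its "uncle" across an internal edge.
    if not isinstance(left, str):
        a, b = left
        out.append(((a, right), b))  # b swapped with right
        out.append(((right, b), a))  # a swapped with right
    if not isinstance(right, str):
        c, d = right
        out.append((c, (left, d)))   # c swapped with left
        out.append((d, (c, left)))   # d swapped with left
    # Moves on edges deeper in the tree, keeping the sibling subtree fixed.
    out.extend((sub, right) for sub in nni_neighbors(left))
    out.extend((left, sub) for sub in nni_neighbors(right))
    return out


def random_nni(tree, rng=random):
    """Apply one randomly chosen NNI; return the tree unchanged if none exists."""
    moves = nni_neighbors(tree)
    return rng.choice(moves) if moves else tree


def leaves(tree):
    """List the leaf labels of a nested-tuple tree, left to right."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


start = (("A", "B"), ("C", "D"))
print(random_nni(start, random.Random(0)))
```

Applying `random_nni` repeatedly gives a random walk in tree space, which is one way to degrade a guide tree step by step.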
![Page 11: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/11.jpg)
TreeBASE (“natural”) Input Datasets
![Page 12: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/12.jpg)
Experimental Design: Simulated Evolution Case
![Page 13: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/13.jpg)
![Page 14: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/14.jpg)
![Page 15: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/15.jpg)
![Page 16: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/16.jpg)
![Page 17: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/17.jpg)
![Page 18: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/18.jpg)
Conclusions
Statistically significant correlation between guide tree quality and alignment quality
  Independent of tree transformation operator
  Independent of alignment distance metric
But very small absolute change in quality
  Non-linear / logarithmic
Largest alignment quality effect 5-10 steps from phylogeny
The lesson: it helps to improve a really good guide tree; otherwise it helps, but only a little
![Page 19: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/19.jpg)
Acknowledgements
Dr. Luke Sheneman (mostly his slides!)
Faculty, staff, and students of BCB
Jason Evans
Darin Rokyta
Funding sources:
  NIH P20 RR16454
  NIH NCRR 1P20 RR16448
  NSF EPS 00809035
![Page 20: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/20.jpg)
Experimental Design: metrics
Â = pmsa(S, T)
  where S is the set of input sequences
  where T is the guide tree
  (hidden parameters: pairwise algorithm, tie breaking strategy)
AQ = CompareAlignments(A*, Â)
  QSCORE(A*, Â) -> TC-error, SP-error
  Nelesen had a nicer metric: error of estimated phylogeny
Tdist = TreeDistance(T*, T)
  Upper bound estimate of edit distance via NNI or TBR
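SP-error and TC-error can be illustrated with a small comparison of a reference alignment A* and a test alignment Â. This is a sketch of the usual definitions (fraction of reference residue pairs, and of reference columns, that the test alignment fails to reproduce), not the QSCORE program itself. The function names are hypothetical, both alignments are assumed to list the same sequences in the same order, and the reference is assumed to contain at least one aligned pair.

```python
def column_maps(alignment):
    """For each column, map sequence number -> residue index (gaps skipped)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = {}
        for s, ch in enumerate(col):
            if ch != "-":
                entry[s] = counters[s]
                counters[s] += 1
        columns.append(entry)
    return columns


def aligned_pairs(columns):
    """Set of residue pairs that share a column."""
    pairs = set()
    for entry in columns:
        items = sorted(entry.items())
        for x in range(len(items)):
            for y in range(x + 1, len(items)):
                pairs.add((items[x], items[y]))
    return pairs


def sp_tc_error(reference, test):
    """Return (SP-error, TC-error) of `test` measured against `reference`."""
    ref_cols = column_maps(reference)
    test_cols = column_maps(test)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Only reference columns holding at least two residues are scored for TC.
    ref_multi = [frozenset(c.items()) for c in ref_cols if len(c) >= 2]
    test_set = {frozenset(c.items()) for c in test_cols}
    tc = sum(c in test_set for c in ref_multi) / len(ref_multi)
    return 1 - sp, 1 - tc


print(sp_tc_error(["AC-GT", "ACAGT"], ["ACG-T", "ACAGT"]))
```

In the usage line, one of the four reference columns is misplaced in the test alignment, so both error values come out as one quarter.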
![Page 21: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/21.jpg)
Alternative Scoring metric
Idea: “quality” of an alignment is distance from the phylogeny it produces to the “true” phylogeny
AQ = KTreeDist(ML_est(A*), ML_est(Â))
  ML_est(A): maximum likelihood estimate of the phylogeny behind MSA A (we used RAxML)
  KTreeDist(T1, T2): scales T2 to T1, measures Branch Length Distance (Soria-Carrasco et al. 07; Kuhner & Felsenstein 94)
Data sets: from L1 sequences in mammals, bats, humans, hand-aligned A*
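The scaling-then-distance idea behind KTreeDist can be sketched as follows. This is a simplified reading, not the Ktreedist program: each tree is assumed to be given as a dict from bipartition (a frozenset of leaf names on one side of a branch) to branch length, a branch missing from one tree counts as length 0, and the scale factor K is taken as the least-squares value that brings T2 closest to T1. The function name is hypothetical.

```python
import math


def k_tree_score(t1, t2):
    """Scale t2 towards t1 by a least-squares factor K, then return
    (K, branch length distance between t1 and the scaled t2)."""
    splits = set(t1) | set(t2)
    num = sum(t1.get(s, 0.0) * t2.get(s, 0.0) for s in splits)
    den = sum(t2.get(s, 0.0) ** 2 for s in splits)
    k = num / den  # assumes t2 has at least one non-zero branch length
    bld = math.sqrt(
        sum((t1.get(s, 0.0) - k * t2.get(s, 0.0)) ** 2 for s in splits)
    )
    return k, bld


ab, abc = frozenset("AB"), frozenset("ABC")
# Same topology, t2 is t1 with every branch halved: the distance is ~0 after scaling.
print(k_tree_score({ab: 0.2, abc: 0.4}, {ab: 0.1, abc: 0.2}))
```

Because of the scaling, two trees that differ only in overall rate score as essentially identical, so the measure reflects topology and relative branch lengths rather than total divergence.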
![Page 22: Guide Trees and Progressive Multiple Sequence Alignment](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56815dc7550346895dcbf341/html5/thumbnails/22.jpg)
All methods are pretty good
[Figure: K tree distance (ML to true), axis from 0.000 to 1.000. Data sets: hard, Megabat, XIST, Misc. Methods: Clustal, Mafft (f), Mafft (s), Muscle (f), Muscle (s)]