phylogenetic workflows
DESCRIPTION
Phylogenetic Workflows: Tree Building and Post-tree Analyses
Given at the Dept for Ecology and Evolutionary Biology, University of Arizona in 2011
A phylogenetic workflow example showcasing iPlant Cyberinfrastructure
TRANSCRIPT
Phylogenetic Workflows: Tree Building and Post-tree Analyses
Naim Matasci
The iPlant Collaborative
Plant Biology 2011
August 6-10, 2011
Why is the tree of life important?
“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”
Nothing in biology makes sense except in the light of evolution.
T. G. Dobzhansky
Scalability
Ackerly, 2009; J. Felsenstein, ca. 1980; Ranger Cluster at TACC
iPlant Tree of Life Grand Challenge
Large phylogenetic inference: Building a tree of life for up to 500,000 green plants
Tree Visualization: Scalable visualization for small to large trees
Data Assembly and Integration: Acquiring, organizing and processing the data
Taxonomic Intelligence: Sorting out different names for the same species
Tree Reconciliation: Resolving discordant gene and species trees
Trait Evolution: Using trees to understand how traits evolved
Ancestral state of Hawaiian lobelioids
Lobelia niihauensis (Image: David Eickhoff)
Cyanea leptostegia (Image: Karl Magnacca)
(Schluter et al. 1997, Paradis 2004)
Continuous Ancestral Character Estimation
?
Obtain sequences
• GetSeq
Align sequences
• Muscle
Build Tree
• FastTree (aML)
• Ninja (NJ)
• PHYLIP (MP, NJ, ML)
• RAxML (ML)
Visualize Tree
• iPlant Tree Viewer
Integrate Data
• Lopper
• TNRS
Run Analysis
• CACE
• DACE
• Contrast
• OUch
• Picante
• Penalized likelihood
>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTGCTCAGCCTGGAGTTCCACCCGAAGAAGCGGGGGCCGCGGTAGCTGCGGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAACCTTGATCGTTACAAAGGGCGATGCTACAACATAGAGCCCGTTGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTATCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTACTGCGTATTGTAAAACTTTCCAAGGACCGCCTCATGGGATCCAAGTTGAAAGAGATAAATTGAACAAGTATGGTCGTCCCTTGCTGGGATGTACTATTAAACCTAAATTGGGGTTATCGGCTAAAAACTACGGTAGAGCAGTTTATGAATGTCTACGCGGTGGGCTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGACCGTTTCGTATTTTGTGCCGAAGCAATTTTTAAAGCACAGTCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCTACTGCAGGTACATGTGAAGAAATGATGAAAAGGGCTATATTT
>gi|1835227|emb|Z83136.1| S.foetidissimum rbcL gene
AAGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTGACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCAGGGGCCGCGGTAGCTGCCGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTNGCTGGAGAAGAAAATCAATATATTGCTTATGTAGCTTATCCTTTAGACCTYTTTGAAGAAGGTTCTGTTACTAATATGTKNACTTCCATTGTGGGGAATGTATTTGGGTTCAAAGCCCTGCGTGCTTTACGTCTGGAAGATCTGCGAATCCCTCCTGCGTATTCTAAAACTTTCCAAGGACCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAACAAGTACGGTCGTCCCCTGTTGGGATGTACTATTAAACCTAAATTGGGGTTATCTGCTAAAAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTGGACTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGATCGTTTCTTATTTTGTGCCGAAGCACTTTATAAAGCACAGGCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCT
>gi|1834456|emb|Z83132.1| G.urceolata rbcL gene
AACTAAAGCGGGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTAACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCGGGGGCCGCCGTAGCTGCCGAATCCTCCACTGGTACATGGACAACTGTGTGGACCGACGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTGGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTACCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTGTTGCGTATGCTAAAACTTTCCAAGGGCCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAATAAGTATGGTCGTCCCCTG
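The records above are in FASTA format with old-style GenBank "gi" headers. As a rough illustration of how such records can be read, here is a minimal Python sketch. The function names `parse_fasta` and `split_gi_header` are hypothetical, and the example sequence is a truncated copy of the first record, wrapped over two lines.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_gi_header(header):
    """Split a 'gi|number|db|accession| description' header into fields."""
    ids, _, description = header.partition(" ")
    fields = ids.split("|")
    return {
        "gi": fields[1],
        "db": fields[2],
        "accession": fields[3],
        "description": description.strip(),
    }


# Truncated copy of the first record, for illustration only.
example = """>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACC
AAAGATACTGATATCTTGGCAGCAT
"""

records = list(parse_fasta(example))
info = split_gi_header(records[0][0])
print(info["accession"], info["description"], len(records[0][1]))
```

Sequence lines are joined without validation, so IUPAC ambiguity codes such as Y, K and N in the records pass through unchanged.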
Get Sequences
• Retrieves nucleotide and amino acid sequences from NCBI's GenBank
• Automatically includes species name and taxon ID
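The slides do not show how GetSeq talks to GenBank. One plausible route, given here only as an assumption, is NCBI's public E-utilities `efetch` endpoint. The sketch below only builds the request URL for the three accessions shown earlier and sends nothing; `efetch_fasta_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint; assumed here, not confirmed as GetSeq's backend.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="nuccore"):
    """Build an efetch URL that requests FASTA text for the given accessions."""
    params = {
        "db": db,                    # "nuccore" for nucleotides, "protein" for amino acids
        "id": ",".join(accessions),  # comma-separated list of identifiers
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


url = efetch_fasta_url(["Z83147.1", "Z83136.1", "Z83132.1"])
print(url)
```

Passing the URL to `urllib.request.urlopen` would download the records. NCBI asks clients to limit request rates, so a real tool would batch and throttle its calls.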
Get sequences DEMO
Muscle DEMO
Improved Tree Building Tools
NINJA/WINDJAMMER (Travis Wheeler)
Neighbor-Joining implementation that can analyze > 200K species
Six-day run time reduced 32-fold to 4.5 hours for the 220K-species data set
Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set
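For readers unfamiliar with neighbor joining, the following is a minimal Python sketch of the textbook O(n^3) algorithm. It is not NINJA's optimized implementation, which the slides do not describe. The five-taxon distance matrix is a small made-up example, and the internal node labels (`node1`, `node2`, ...) are arbitrary.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining on a dict-of-dicts distance matrix.

    Returns (joins, last_edge): each join is
    (child_a, length_a, child_b, length_b, new_node), and last_edge
    connects the two nodes that remain at the end.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy we can extend
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"node{len(joins) + 1}"
        d[u] = {}
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, len_a, b, len_b, u))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Made-up additive distances for five taxa, for illustration only.
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: dict(zip(names, row)) for x, row in zip(names, rows)}

joins, last_edge = neighbor_joining(names, dist)
for join in joins:
    print(join)
print(last_edge)
```

Each pass scans every remaining pair, which is what makes the naive version impractical at the 200K-species scale mentioned above.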
RAxML-Light (Alexandros Stamatakis)
Large Scale Maximum Likelihood implementation
55K Tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
RAxML DEMO
Tree Visualization
• > 500K Taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata driven display of information
iPlant Tree Viewer
http://portnoy.iplantcollaborative.org/
Live tree view demo
Obstacles
Lopper DEMO
Lobelia kauaensis
Lobelia villosa
Galeatella gloria-montis
Trematolobelia kauaiensis
Trematolobelia macrostachys
Lobelia hypoleuca
Neowimmeria yuccoides
Lobelia niihauensis
Brighamia insignis
Brighamia rockii
Delissea rhytidosperma
Delissea subcordata
Cyanea acuminata
Cyanea hirtella
Cyanea coriacea
Delissea leptostegia
Clermontia kakeana
Clermontia parviflora
Clermontia arborescens
Clermontia fauriei
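The slides do not spell out what Lopper does internally. Assuming its role is to trim a large tree down to a list of taxa like the one above, the idea can be sketched in Python as follows. The nested-tuple tree, its topology, and the helper names `prune` and `to_newick` are all made up for illustration, and branch lengths are ignored.

```python
def prune(tree, keep):
    """Restrict a nested-tuple tree to the leaves in `keep`.

    Leaves are strings and internal nodes are tuples of children.
    Returns None when no leaf of the subtree is kept.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


def to_newick(tree):
    """Serialize a nested-tuple tree as Newick text without the final ';'."""
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical topology; it does not reflect the real lobelioid phylogeny.
tree = (
    (("Brighamia insignis", "Brighamia rockii"),
     ("Delissea rhytidosperma", "Delissea subcordata")),
    (("Cyanea acuminata", "Cyanea hirtella"), "Clermontia kakeana"),
)
keep = {"Brighamia insignis", "Cyanea acuminata", "Cyanea hirtella"}

pruned = prune(tree, keep)
newick = to_newick(pruned) + ";"
print(newick)
```

A version that keeps branch lengths would add the length of each collapsed node to its surviving child, so that path lengths between the retained tips are preserved.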
The TNRS: A Taxonomic Name Resolution Service for Plants
Tonight from 5:30 - 7:30 in Exhibit Hall A.
Poster number P21011.
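To make the name-resolution problem concrete, here is a minimal Python sketch that normalizes a submitted name, checks a synonym table, and falls back to fuzzy matching with the standard library's `difflib`. It is not the TNRS algorithm. The accepted-name list and the synonym entry are illustrative assumptions, not taxonomic authority, and `resolve` is a hypothetical function name.

```python
import difflib

# Illustrative reference data only; a real service would query a curated database.
ACCEPTED = [
    "Lobelia niihauensis",
    "Cyanea leptostegia",
    "Brighamia insignis",
    "Clermontia fauriei",
]
SYNONYMS = {"Delissea leptostegia": "Cyanea leptostegia"}  # assumed for the example


def resolve(name, accepted=ACCEPTED, synonyms=SYNONYMS, cutoff=0.85):
    """Return (resolved_name, method) for a submitted scientific name."""
    # Normalize whitespace and capitalization: "Genus epithet".
    cleaned = " ".join(name.split())
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    if cleaned in synonyms:
        return synonyms[cleaned], "synonym"
    if cleaned in accepted:
        return cleaned, "exact"
    # Fall back to approximate string matching for misspellings.
    close = difflib.get_close_matches(cleaned, accepted, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "unresolved"


for submitted in ["lobelia  Niihauensis", "Lobelia nihauensis",
                  "Delissea leptostegia", "Quercus alba"]:
    print(submitted, "->", resolve(submitted))
```

The `cutoff` threshold trades missed misspellings against false matches between similar epithets, so it would need tuning against real data.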
CACE DEMO
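The slides do not show how CACE computes its estimates. As a rough illustration of continuous ancestral character estimation, the Python sketch below assumes a Brownian motion model and estimates the root state by the standard pruning recursion: each node's value is the average of its children's values weighted by inverse branch length. The tree, branch lengths and trait values are made up, and `root_state` is a hypothetical function name.

```python
def root_state(node, traits):
    """Estimate the trait value at `node` under Brownian motion.

    A node is (label, branch_length) for a tip or
    (list_of_children, branch_length) for an internal node.
    Returns (estimate, effective_branch_length); branch lengths must be > 0
    for every node except the root.
    """
    content, length = node
    if isinstance(content, str):
        return traits[content], length
    weighted_sum = 0.0
    weight_total = 0.0
    for child in content:
        value, v = root_state(child, traits)
        weighted_sum += value / v   # weight each child by 1 / branch length
        weight_total += 1.0 / v
    estimate = weighted_sum / weight_total
    # Uncertainty in the estimate acts like extra length on the parent branch.
    return estimate, length + 1.0 / weight_total


# Hypothetical three-tip tree ((A:1,B:1):1,C:2) with made-up trait values.
tree = ([([("A", 1.0), ("B", 1.0)], 1.0), ("C", 2.0)], 0.0)
traits = {"A": 2.0, "B": 4.0, "C": 9.0}

estimate, _ = root_state(tree, traits)
print(round(estimate, 4))
```

At the root this recursion gives the maximum likelihood estimate under Brownian motion. Values computed at other internal nodes use only their descendants, so a full reconstruction would need an additional pass or rerooting.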