Phylogenetic Workflows: Tree Building and Post-tree Analyses. Naim Matasci, The iPlant Collaborative. Plant Biology 2011, August 6-10, 2011.

Upload: naim-matasci

Post on 09-May-2015


DESCRIPTION

Phylogenetic Workflows: Tree Building and Post-tree Analyses. Given at the Dept for Ecology and Evolutionary Biology, University of Arizona in 2011. A phylogenetic workflow example showcasing iPlant Cyberinfrastructure.

TRANSCRIPT

Page 1: Phylogenetic Workflows

Phylogenetic Workflows: Tree Building and Post-tree Analyses

Naim Matasci
The iPlant Collaborative

Plant Biology 2011
August 6-10, 2011

Page 2: Phylogenetic Workflows

Why is the tree of life important?

“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”

Page 3: Phylogenetic Workflows

Nothing in biology makes sense except in the light of evolution.

T. G. Dobzhansky

Page 4: Phylogenetic Workflows

Scalability

Ackerly, 2009; J. Felsenstein, ca. 1980; Ranger Cluster at TACC

Page 5: Phylogenetic Workflows

iPlant Tree of Life Grand Challenge

Large phylogenetic inference
Building a tree of life for up to 500,000 green plants

Tree Visualization
Scalable visualization for small to large trees

Data Assembly and Integration
Acquisition, organization and processing of the data

Taxonomic Intelligence
Sorting out different names for the same species

Tree Reconciliation
Resolving discordant gene and species trees

Trait Evolution
Using trees to understand how traits evolved

Page 6: Phylogenetic Workflows

Ancestral state of Hawaiian lobelioids

Lobelia niihauensis (Image: David Eickhoff)

Cyanea leptostegia (Image: Karl Magnacca)

Page 8: Phylogenetic Workflows

(Schluter et al. 1997, Paradis 2004)

Continuous Ancestral Character Estimation

?
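As a rough illustration of what estimating a continuous ancestral character involves, the sketch below runs a single pruning pass over a small tree, in the style of independent contrasts under a Brownian motion model. At each internal node it takes a weighted average of the two descendant values, giving more weight to shorter branches, and records one standardized contrast. The value it returns for the root is the usual Brownian motion estimate of the root state. The tree encoding, taxon names, trait values and function names are all made up for this example, branch lengths are assumed to be positive, and this is not necessarily what the CACE or Contrast tools run internally.

```python
import math

# A node is (content, branch_length). For a leaf, content is the taxon
# name; for an internal node, content is a (left, right) pair of nodes.

def prune(node, traits, contrasts):
    """Return (estimated value, adjusted branch length) for `node`,
    appending one standardized contrast per internal node."""
    content, length = node
    if isinstance(content, str):           # leaf: use the observed value
        return traits[content], length
    left, right = content
    x1, v1 = prune(left, traits, contrasts)
    x2, v2 = prune(right, traits, contrasts)
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    # Weighted average: the descendant on the shorter branch counts more.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to account for estimation error.
    return value, length + v1 * v2 / (v1 + v2)

def root_state(tree, traits):
    """Estimate the trait value at the root and collect the contrasts."""
    contrasts = []
    value, _ = prune(tree, traits, contrasts)
    return value, contrasts

# Hypothetical three-taxon tree ((A:1,B:1):1,C:2) with made-up trait values.
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
tree = ((ab, ("C", 2.0)), 0.0)
traits = {"A": 2.0, "B": 4.0, "C": 9.0}

estimate, contrasts = root_state(tree, traits)
print(round(estimate, 4), [round(c, 4) for c in contrasts])
```

Values computed this way for internal nodes other than the root use only the data below each node, so they are local estimates rather than full reconstructions.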

Page 9: Phylogenetic Workflows

Obtain sequences
• GetSeq

Align sequences
• Muscle

Build Tree
• FastTree (aML)
• Ninja (NJ)
• PHYLIP (MP, NJ, ML)
• RAxML (ML)

Visualize Tree
• iPlant Tree Viewer

Integrate Data
• Lopper
• TNRS

Run Analysis
• CACE
• DACE
• Contrast
• OUch
• Picante
• Penalized likelihood

Page 11: Phylogenetic Workflows

>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTGCTCAGCCTGGAGTTCCACCCGAAGAAGCGGGGGCCGCGGTAGCTGCGGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAACCTTGATCGTTACAAAGGGCGATGCTACAACATAGAGCCCGTTGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTATCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTACTGCGTATTGTAAAACTTTCCAAGGACCGCCTCATGGGATCCAAGTTGAAAGAGATAAATTGAACAAGTATGGTCGTCCCTTGCTGGGATGTACTATTAAACCTAAATTGGGGTTATCGGCTAAAAACTACGGTAGAGCAGTTTATGAATGTCTACGCGGTGGGCTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGACCGTTTCGTATTTTGTGCCGAAGCAATTTTTAAAGCACAGTCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCTACTGCAGGTACATGTGAAGAAATGATGAAAAGGGCTATATTT

>gi|1835227|emb|Z83136.1| S.foetidissimum rbcL gene
AAGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTGACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCAGGGGCCGCGGTAGCTGCCGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTNGCTGGAGAAGAAAATCAATATATTGCTTATGTAGCTTATCCTTTAGACCTYTTTGAAGAAGGTTCTGTTACTAATATGTKNACTTCCATTGTGGGGAATGTATTTGGGTTCAAAGCCCTGCGTGCTTTACGTCTGGAAGATCTGCGAATCCCTCCTGCGTATTCTAAAACTTTCCAAGGACCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAACAAGTACGGTCGTCCCCTGTTGGGATGTACTATTAAACCTAAATTGGGGTTATCTGCTAAAAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTGGACTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGATCGTTTCTTATTTTGTGCCGAAGCACTTTATAAAGCACAGGCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCT

>gi|1834456|emb|Z83132.1| G.urceolata rbcL gene
AACTAAAGCGGGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTAACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCGGGGGCCGCCGTAGCTGCCGAATCCTCCACTGGTACATGGACAACTGTGTGGACCGACGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTGGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTACCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTGTTGCGTATGCTAAAACTTTCCAAGGGCCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAATAAGTATGGTCGTCCCCTG

Page 12: Phylogenetic Workflows

Get Sequences

• Retrieves nucleotide and amino acid sequences from NCBI's GenBank

• Automatically includes species name and taxon ID
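The slides do not show how GetSeq works internally, so the following is only a sketch of the same idea using NCBI's public E-utilities `efetch` endpoint and a small FASTA reader. The function names are invented for this example, and the header parser assumes the older `gi|number|db|accession| description` layout seen in the rbcL records; other header styles fall through to a generic split.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nuccore"):
    """Build an E-utilities URL that returns one record as FASTA text."""
    query = urlencode({"db": db, "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"

def split_header(header):
    """Split a header line (without '>') into (identifier, description)."""
    fields = header.split("|")
    if len(fields) >= 5 and fields[0] == "gi":
        return fields[3], fields[4].strip()
    # Fallback: first token is the identifier, the rest is free text.
    ident, _, rest = header.partition(" ")
    return ident, rest

def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield split_header(header) + ("".join(chunks),)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)            # sequence may span several lines
    if header is not None:
        yield split_header(header) + ("".join(chunks),)

example = ">gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene\nTTATTATACTCC\nTGAATAYGAAAC\n"
for accession, description, sequence in parse_fasta(example):
    print(accession, description, len(sequence))
print(efetch_url("Z83147.1"))
```

The URL can be fetched with any HTTP client; the example only builds it so that it runs without network access.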

Page 13: Phylogenetic Workflows

Get sequences DEMO

Page 23: Phylogenetic Workflows

Muscle DEMO
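The demo itself runs inside the iPlant environment. For readers working at a command line, a minimal sketch of the equivalent step is given below, assuming a MUSCLE 3.x executable named `muscle` is on the PATH. The `-in` and `-out` options are the 3.x ones; later major versions use different option names, so check the help output of the installed version. The file names and function names are placeholders.

```python
import shutil
import subprocess

def muscle_command(in_fasta, out_fasta, executable="muscle"):
    """Argument list for a MUSCLE 3.x run (-in and -out are 3.x options)."""
    return [executable, "-in", in_fasta, "-out", out_fasta]

def align(in_fasta, out_fasta):
    """Run MUSCLE if it is installed; return True when it was run."""
    cmd = muscle_command(in_fasta, out_fasta)
    if shutil.which(cmd[0]) is None:
        return False                      # MUSCLE not found on this machine
    subprocess.run(cmd, check=True)       # raises if MUSCLE reports an error
    return True

# Placeholder file names: unaligned input, aligned output.
print(muscle_command("rbcL_unaligned.fasta", "rbcL_aligned.fasta"))
```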

Page 31: Phylogenetic Workflows

Improved Tree Building Tools

NINJA/WINDJAMMER (Travis Wheeler)

Neighbor-Joining implementation that can analyze > 200K species

Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set

Two- to three-day run time reduced 1,800-fold to 2 minutes for the distance matrix calculation on the 220K set

RAxML-Light (Alexandros Stamatakis)

Large-scale Maximum Likelihood implementation

55K Tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)
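For orientation, the textbook neighbor-joining procedure can be sketched in a few lines. The version below is the naive cubic-time algorithm, not NINJA's optimized implementation, and is only meant to show what is being computed: at each step the pair minimizing the Q criterion is joined and replaced by a new node. The function name, the return format and the five-taxon distance matrix are illustrative choices, not taken from the slides.

```python
from itertools import combinations

def neighbor_joining(names, matrix):
    """Naive neighbor joining.

    Returns (joins, final_edge): joins is a list of
    (node_a, node_b, length_a, length_b), and final_edge is
    (node_a, node_b, length) connecting the last two nodes.
    """
    nodes = list(names)
    dist = {(a, b): matrix[i][j]
            for i, a in enumerate(names)
            for j, b in enumerate(names) if i != j}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {a: sum(dist[a, b] for b in nodes if b != a) for a in nodes}
        # Choose the pair with the smallest Q value.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p] - total[p[0]] - total[p[1]])
        len_a = 0.5 * dist[a, b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = dist[a, b] - len_a
        joins.append((a, b, len_a, len_b))
        new = f"({a},{b})"
        # Distances from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d = 0.5 * (dist[a, c] + dist[b, c] - dist[a, b])
                dist[new, c] = dist[c, new] = d
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return joins, (a, b, dist[a, b])

# Small made-up distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
joins, final_edge = neighbor_joining(names, matrix)
for join in joins:
    print(join)
print(final_edge)
```

Every iteration rescans all pairs, which is what makes the naive version impractical for data sets of the size quoted above.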

Page 32: Phylogenetic Workflows

RAxML DEMO

Page 40: Phylogenetic Workflows

Tree Visualization

• > 500K Taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata driven display of information

Page 41: Phylogenetic Workflows

iPlant Tree Viewer

http://portnoy.iplantcollaborative.org/

Page 42: Phylogenetic Workflows

Live tree view demo

Page 50: Phylogenetic Workflows

Obstacles

Page 51: Phylogenetic Workflows

Lopper DEMO

Page 64: Phylogenetic Workflows

Lobelia kauaensis
Lobelia villosa
Galeatella gloria-montis
Trematolobelia kauaiensis
Trematolobelia macrostachys
Lobelia hypoleuca
Neowimmeria yuccoides
Lobelia niihauensis
Brighamia insignis
Brighamia rockii
Delissea rhytidosperma
Delissea subcordata
Cyanea acuminata
Cyanea hirtella
Cyanea coriacea
Delissea leptostegia
Clermontia kakeana
Clermontia parviflora
Clermontia arborescens
Clermontia fauriei

Page 65: Phylogenetic Workflows

The TNRS: A Taxonomic Name Resolution Service for Plants

Tonight from 5:30 - 7:30 in Exhibit Hall A. Poster number P21011.
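To give a feel for one part of name resolution, the sketch below matches a possibly misspelled name against a short reference list using approximate string matching from Python's standard library. This is not the TNRS algorithm, and it does not handle synonyms, which need a curated taxonomic source. The function name, the 0.9 cutoff and the three-name reference list (taken from species named in this talk) are choices made for the example.

```python
from difflib import get_close_matches

def resolve_name(name, reference_names, cutoff=0.9):
    """Return the closest reference name, or None if nothing is close enough."""
    # Compare in lower case, but report the reference spelling.
    lookup = {ref.lower(): ref for ref in reference_names}
    hits = get_close_matches(name.strip().lower(), list(lookup),
                             n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

reference = ["Lobelia niihauensis", "Brighamia insignis", "Cyanea leptostegia"]

for query in ["Lobelia nihauensis", "brighamia insignis ", "Quercus alba"]:
    print(repr(query), "->", resolve_name(query, reference))
```

A high cutoff keeps unrelated names from being matched; lowering it tolerates more typos at the cost of more false matches.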

Page 67: Phylogenetic Workflows

CACE DEMO
