EukRef
A community effort towards phylogenetic-based curation
of ribosomal databases for environmental sequencing
Javier del Campo
Laura Parfrey
Motivation
Integrate expert views on taxonomy into public database resources
Improve resources for high-throughput sequence annotation
Aims
• Catalyze experts in protist taxonomy to engage in curation and validation of a ribosomal DNA marker gene database for eukaryotic lineages across the tree of life.
• Synthesize the efforts of individual curators to produce a phylogenetically curated ribosomal DNA marker gene database for eukaryotes.
• Use the improved reference database to characterize the environmental distribution of eukaryotic microbes from large-scale HTES datasets.
High-throughput environmental sequence (HTES) analysis of eukaryotes
1) HTES 18S rDNA sequence retrieval
2) Reference database annotation
3) Community analysis using classification
Starting reference database 18S phylogeny
Use the phylogeny to improve classification
Reference database 18S phylogeny after curation
Integrate environmental metadata
Where was this sequence isolated?
• Fresh water or marine?
• Aerobic or anoxic?
• Host information? (symbiotic clades)
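The three questions above could be captured as one record per reference sequence. The following is a minimal sketch only: the class name `EnvMetadata`, its field names, the accession strings, and the `habitat_counts` helper are all hypothetical and are not part of the pipeline described in these slides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnvMetadata:
    """Hypothetical per-sequence environmental metadata record."""
    accession: str                  # placeholder sequence identifier
    habitat: Optional[str] = None   # e.g. "fresh water" or "marine"
    oxygen: Optional[str] = None    # e.g. "aerobic" or "anoxic"
    host: Optional[str] = None      # host name for symbiotic clades, if known


def habitat_counts(records: List[EnvMetadata]) -> Counter:
    """Count sequences per habitat; missing values are tallied as 'unknown'."""
    return Counter(r.habitat or "unknown" for r in records)


# Made-up example records, for illustration only.
records = [
    EnvMetadata("SEQ001", habitat="marine", oxygen="aerobic"),
    EnvMetadata("SEQ002", habitat="fresh water", oxygen="anoxic"),
    EnvMetadata("SEQ003", habitat="marine", host="hypothetical host"),
    EnvMetadata("SEQ004"),
]
counts = habitat_counts(records)
```

Keeping every field optional reflects that many public sequences carry incomplete isolation information.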
A manually curated reference DB: the opisthos example
[Figure: HTS reads, axis from 0 to 3500]
Outputs for each group
• Set of curated sequences
– Files with chimeric sequences and short sequences
• Alignment of these sequences
• Phylogenetic tree
• Database
– Full classification (unlimited ranks)
– Environmental metadata
• Open access after a 1-year embargo (if desired)
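One of the outputs listed above is a separate file of short sequences. A minimal sketch of how such a split might be done is shown here, assuming plain FASTA input. The function names and the 500-base threshold are illustrative assumptions, not values from the slides, and chimera detection is not covered because it requires a dedicated tool.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def split_by_length(
    records: Dict[str, str], min_length: int = 500  # assumed cutoff
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate sequences into (kept, short) around a minimum length."""
    kept = {h: s for h, s in records.items() if len(s) >= min_length}
    short = {h: s for h, s in records.items() if len(s) < min_length}
    return kept, short


# Made-up sequences, for illustration only.
example = [">seq_a", "ACGT" * 200, ">seq_b", "ACGT" * 10]
kept, short = split_by_length(parse_fasta(example))
```

The `short` dictionary could then be written to its own file so that the excluded sequences stay available for review.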
18S reference DB curation pipeline (simplified)
18S reference DB curation pipeline
Workshop Outputs
• A refined curation pipeline, associated computational tools, and curation instructions
• Reference databases for individual lineages
• Synthesis of classification for each group
After the workshop…
• Continued curation.
• Recruit new curators using refined tools.
• Coordinate with other groups.
• Integrate data from different curation efforts (into a cohesive database).
• Data sharing and distribution.
Acknowledgments
Thank you!
Advisers and participants