ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this

Upload: adina-chuang-howe

Posted on 03-Jul-2015

DESCRIPTION

Talk at 2014 ANL Soil Metagenomics Meeting

TRANSCRIPT

Page 1: ANL Soil Metagenomics 2014 Soil Reference Database - Let's do this
Page 2

Is it time for a (community) effort towards a soil reference database?

Erick Cardenas, James Cole, Maude David, Aaron Garoutte, Adina Howe, Janet Jansson, Dave Myrold, James Tiedje, and you?

A modified version of the slides will be available after the presentation: http://www.slideshare.net/adinachuanghowe

Page 3

The most important hands in soil microbiology

Page 4

Significance of a soil-specific reference

• Need a standardized resource to connect sequencing data at different levels

• Integrate sequencing data towards soil health and productivity

• Broadly enable “connecting the dots”

Genes / Organisms / Communities / Ecosystems

Page 5

Soil metagenomic challenges

• The amount we know…

• Incredible microbial diversity

• Spatial heterogeneity

• Complex dynamics

• Lack of reference genomes (bacterial, archaeal, fungal)

Page 6

HUMAN MICROBIOME PROJECT

Page 7

Lessons from HMP

• 2009 Goals:

– Take advantage of high-throughput technologies to characterize the human microbiome in a large number of samples

– Determine whether there are associations between changes in the microbiome and health or disease

– Provide a standardized data resource and new technological approaches to enable such studies to be undertaken broadly in the scientific community

Page 8

HMP metagenomic challenges

Soil

• Incredible microbial diversity

• Spatial heterogeneity

• Complex dynamics

• Lack of reference genomes (bacterial, archaeal, fungal)

HMP

• Microbial diversity

• Individual variation

• Complex host-associated dynamics

• Lack of reference genomes?

Page 9

The HMP reference genome effort

• Add at least 900-3,000 additional reference bacterial genome sequences to public databases

• Thorough representation of domains and major body sites

Page 10

Not only sequencing... but access to data

Currently, over 1000 bacterial genomes at various stages of sequencing

Page 11

Tools: Opening doors broadly

MetaPhlAn, Nature Methods 9, 811-814 (2012)

Nature Reviews Genetics 15, 577-584 (2014)

Vital et al., mBio 5 (2014)

Page 12

Another example: GEBA (Genomic Encyclopedia of Bacteria and Archaea)

Comparison of:

• rRNA tree of life

• genome sequences in the DSMZ culture collection

Are there any general benefits that come from this "phylogeny-driven" approach?

Page 13

Impact of “targeted” sequencing to improve references

Higher rate of discovery and characterization of new gene families

New ways to link distantly related homologs that would otherwise go undetected

Significant phylogenetic expansions of known protein families

Enrichment of genetic diversity

Page 14

Can a similar strategy benefit soil studies?

Page 15

What could we use it for?

• Target isolation and sequencing efforts; creation of a “most wanted” list

• Soil-specific framework for larger-scale sequencing and proteomic efforts to identify taxonomic and functional information

• Genome-centric investigation of soil genomes (e.g., distribution of shared genes among soil phyla); development of improved biomarkers for high-throughput assays

• Providing data to tool developers to make bioinformatics/visualization easier for soil-specific studies
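One way to build the “most wanted” list in the first bullet is to rank taxa that are abundant in soil metagenomes but have no close sequenced genome. The sketch below assumes each taxon has already been summarized as a mean relative abundance and a percent identity to its best reference match; the `TaxonObservation` record, the `most_wanted` function, and the 97% cutoff are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TaxonObservation:
    """Hypothetical summary of one taxon (e.g., an OTU) seen in soil metagenomes."""
    name: str
    mean_relative_abundance: float  # averaged across samples, 0..1
    best_reference_identity: float  # % identity to the closest sequenced genome


def most_wanted(taxa: Iterable[TaxonObservation],
                identity_cutoff: float = 97.0,
                top_n: int = 10) -> List[str]:
    """Rank abundant taxa that lack a close reference genome.

    identity_cutoff is an assumed threshold: taxa whose best match to any
    sequenced reference falls below it are treated as unrepresented.
    """
    unrepresented = [t for t in taxa if t.best_reference_identity < identity_cutoff]
    # Most abundant unrepresented taxa first: these become sequencing targets.
    unrepresented.sort(key=lambda t: t.mean_relative_abundance, reverse=True)
    return [t.name for t in unrepresented[:top_n]]


# Illustrative input values only.
example = [
    TaxonObservation("OTU_1", 0.05, 99.1),  # already has a close reference
    TaxonObservation("OTU_2", 0.08, 88.0),
    TaxonObservation("OTU_3", 0.02, 91.5),
]
print(most_wanted(example))  # ['OTU_2', 'OTU_3']
```

Ranking purely by abundance is one possible priority; a real list might also weight how phylogenetically novel a taxon is.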

Page 16

What are the challenges?

• How do we define a soil organism?

– Originated from soil?

– 16S rRNA gene sequence matches one from soil?

– What level of finishing is adequate?
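The three questions above could be turned into an explicit, testable rule. This is a minimal sketch assuming a genome qualifies if its isolation source mentions soil or its aligned 16S rRNA gene matches a soil-derived 16S sequence at or above a cutoff, and if it also meets a minimum finishing level. The `Candidate` record, the `Finishing` levels, the substring test on `isolation_source`, and the 97% default are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Finishing(IntEnum):
    """Illustrative assembly levels, ordered from least to most finished."""
    DRAFT = 1
    HIGH_QUALITY_DRAFT = 2
    FINISHED = 3


@dataclass
class Candidate:
    """Hypothetical genome record; field names are assumptions."""
    isolation_source: str  # free-text habitat metadata
    aligned_16s: str       # 16S rRNA gene, already aligned to the references
    finishing: Finishing


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences, skipping gap columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        matches += int(a == b)
    return 100.0 * matches / compared if compared else 0.0


def is_soil_organism(c: Candidate, soil_16s_refs: Iterable[str],
                     identity_cutoff: float = 97.0,
                     min_finishing: Finishing = Finishing.DRAFT) -> bool:
    """(Origin from soil OR 16S match to a soil sequence) AND adequate finishing."""
    from_soil = "soil" in c.isolation_source.lower()
    matches_soil_16s = any(
        percent_identity(c.aligned_16s, ref) >= identity_cutoff
        for ref in soil_16s_refs
    )
    return (from_soil or matches_soil_16s) and c.finishing >= min_finishing


# Example: a rhizosphere isolate whose 16S matches a soil-derived sequence.
isolate = Candidate("rhizosphere", "ACGT-ACGTA", Finishing.DRAFT)
refs = ["ACGTTACGTA"]
print(is_soil_organism(isolate, refs))                                    # True
print(is_soil_organism(isolate, refs, min_finishing=Finishing.FINISHED))  # False
```

The example shows why the 16S criterion matters: the isolation source text never says "soil", yet the isolate still qualifies through its sequence match.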

Page 17

What are the challenges?

• What is the most critical/practical metadata?

– Soil location

– Soil taxonomy

– Links to RefSeq IDs

– Is the strain available and where?
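A minimal sketch of how the four metadata items above might be captured as one record, with a check for which items are still missing. The `SoilGenomeRecord` class, its field names, and the choice of latitude/longitude and a USDA Soil Taxonomy order as representations are assumptions for illustration, not an agreed standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SoilGenomeRecord:
    """Hypothetical minimal metadata record for one soil reference genome."""
    organism: str
    # Soil location as (latitude, longitude) in decimal degrees.
    location: Optional[Tuple[float, float]] = None
    # Soil classification, e.g. a USDA Soil Taxonomy order such as "Mollisols".
    soil_taxonomy: Optional[str] = None
    # Links to RefSeq accessions for the genome's sequences.
    refseq_ids: List[str] = field(default_factory=list)
    # Where the strain can be obtained (culture collection identifiers).
    strain_sources: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of the four metadata items that are still empty."""
        checks = {
            "location": self.location is not None,
            "soil_taxonomy": bool(self.soil_taxonomy),
            "refseq_ids": bool(self.refseq_ids),
            "strain_sources": bool(self.strain_sources),
        }
        return [name for name, present in checks.items() if not present]


# Arbitrary illustrative values, not a real genome or site.
rec = SoilGenomeRecord("Example bacterium", location=(42.7, -84.5),
                       soil_taxonomy="Alfisols")
print(rec.missing_fields())  # ['refseq_ids', 'strain_sources']
```

A completeness check like this could also feed a tiered curation scheme, where records with more fields filled in sit at a higher tier.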

Page 18

What are the challenges?

• Who to include?

– Fungi! Archaea!

Page 19

What are the challenges?

• Expert curators?

– You?

– Tiered hierarchy of curation levels

Page 20

Some initial efforts

RefSoil (2011): Erick Cardenas, Aaron Garoutte, Adina Howe, Jim Tiedje

Bacterial genomes retrieved from the GOLD database, …, and …; those associated with soil habitats were selected

Manually curated to exclude obligate human pathogens and extremophiles

Databases can be biased and redundant
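The RefSoil selection steps above (soil-habitat filter, manual exclusion of obligate human pathogens and extremophiles, and handling redundant entries across databases) can be sketched as a simple filter. This assumes a hypothetical, simplified `GenomeEntry` record: it is not GOLD's actual schema, the boolean flags stand in for the curators' manual decisions, and deduplicating by normalized organism name is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GenomeEntry:
    """Hypothetical, simplified genome record; not the actual GOLD schema."""
    organism: str
    habitat: str
    obligate_human_pathogen: bool = False  # stands in for a curator's decision
    extremophile: bool = False             # stands in for a curator's decision


def select_refsoil(entries: Iterable[GenomeEntry]) -> List[GenomeEntry]:
    """Keep soil-associated genomes, drop excluded groups, remove duplicates."""
    seen = set()
    selected = []
    for e in entries:
        if "soil" not in e.habitat.lower():
            continue  # not associated with a soil habitat
        if e.obligate_human_pathogen or e.extremophile:
            continue  # excluded during manual curation
        key = e.organism.strip().lower()
        if key in seen:
            continue  # same organism retrieved from more than one database
        seen.add(key)
        selected.append(e)
    return selected


# Illustrative entries only.
entries = [
    GenomeEntry("Strain A", "Agricultural soil"),
    GenomeEntry("strain a ", "Soil"),  # redundant copy from another database
    GenomeEntry("Strain B", "Human gut", obligate_human_pathogen=True),
    GenomeEntry("Strain C", "Desert soil", extremophile=True),
    GenomeEntry("Strain D", "Forest soil"),
]
print([e.organism for e in select_refsoil(entries)])  # ['Strain A', 'Strain D']
```

Deduplication only addresses redundancy; the bias noted on the slide (uneven sampling across phyla) would need a separate correction.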

Proteobacteria, 267

Firmicutes, 92

Actinobacteria, 75

Bacteroidetes, 12

Cyanobacteria, 7

Tenericutes, 5

Acidobacteria, 5

Other, 29

492 organisms, 19 phyla
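As a quick consistency check on the RefSoil breakdown, the slide's phylum counts can be tallied. The counts are copied from the slide; the percentage column is simple arithmetic added here.

```python
# Phylum counts as listed on the RefSoil slide.
refsoil_counts = {
    "Proteobacteria": 267,
    "Firmicutes": 92,
    "Actinobacteria": 75,
    "Bacteroidetes": 12,
    "Cyanobacteria": 7,
    "Tenericutes": 5,
    "Acidobacteria": 5,
    "Other": 29,
}

total = sum(refsoil_counts.values())
named_phyla = len(refsoil_counts) - 1  # every key except "Other"

print(f"{total} organisms")
for phylum, n in refsoil_counts.items():
    # Share of the whole set, rounded to one decimal place.
    print(f"{phylum}: {n} ({100 * n / total:.1f}%)")
```

The counts sum to 492, matching the slide total, and Proteobacteria make up about 54% of the set. Seven phyla are named individually, so if the 19-phylum total counts each phylum once, the "Other" slice spans the remaining 12.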

Page 21

NCBI Reference Genomes described as originating from soil

Proteobacteria

Actinobacteria

Firmicutes

Bacteroidetes

Cyanobacteria

Acidobacteria

Page 22

Protein Models for Functions: FOAM Database

Nucl. Acids Res. (2014), doi: 10.1093/nar/gku702

Page 23

Some Motivation

60 terrestrial NEON sites distributed across 20 ecoclimatic domains

Terrestrial-scale streaming of large amounts of data, including sequencing data for each site

Page 24

If you’d like to contribute

• Join the breakout session Thursday evening (6-7 pm)

• Know someone with genomes or a database? Let us know! Want to contribute? Have an opinion? Have funding?

Adina Howe