Metagenomics 2015: Module 1, Part 1
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 1: Introduction to Metagenomics
Robert Beiko (@rob_beiko)
Module 1 · bioinformatics.ca
(source: en.wikipedia.org)
Avery–MacLeod–McCarty experiment (source: en.wikipedia.org)
Course overview
• Module 1: Introduction – definitions, approaches, considerations
• Module 2: Marker genes – measuring community diversity
• Module 3: Metagenome taxonomy – classifying and binning sequence reads
• Module 4: Metagenome function – databases and pathways
• Module 5: Metatranscriptomics – data, taxonomy, function
• Module 6: Biomarker discovery
General Learning Objectives
At the end of this workshop, you will be able to:
• Define the objectives of different types of metagenomic projects
• Process raw data files using appropriate quality control
• Run standard pipelines for marker-gene, metagenome and metatranscriptome datasets
• Analyze results using statistical and network approaches
• Recognize the technical limitations of metagenomic studies
Learning objectives of Module 1
You will be able to:
• Apply key terms in metagenomics, for example microbial communities, OTUs, metadata
• Define the objectives of a metagenomic experiment, with appropriate choice of technology
• Interpret the contents of sequence files
• Acquire data from online resources and reference databases
Defining Metagenomics
• Microbiome: attributed to Joshua Lederberg by Hooper and Gordon (2001): “the collective genome of our indigenous microbes (microflora), the idea being that a comprehensive genetic view of Homo sapiens as a life-form should include the genes in our microbiome”
  – Also used to mean microbiota, the set of microorganisms found in a particular setting
• Metagenome: Handelsman et al. (1998): “…advances in molecular biology and eukaryotic genomics, which have laid the groundwork for cloning and functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”
  – Does not encompass marker-gene surveys (e.g., 16S), although this report says it does.
The big picture
Explore the relationship between microbes and their habitat.
To accomplish this, we use a series of experimental and computational techniques to make inferences about the community:
- Marker genes
- Metagenomes
- Metatranscriptomes
- Metaproteomes
- Metametabolomes
- “Culturomes”
Why metagenomics?
• The “great plate count anomaly”: <1% of organisms across many habitats are culturable (reviewed in Amann et al., 1995: PMID 7535888) – CONTROVERSIAL; probably not true for habitats such as human body sites
• In any event, it would be nearly impossible to culture ALL constituents of a given microbiome sample (apart from trivially simple ones)
• Metagenomics offers an effective (if imperfect) way to profile the structure and function of microbial communities
The Human Microbiome
Human gut microbiome: 2-3 million genes
Typically >160 “species” at any given sampling time
Host: ~25,000 genes
Qin et al., Nature (2010)
A Brief History of Metagenomics and Things Like It
1970s
1955: insulin protein sequence
1960: pyrimidine tract sequencing via depurination
1965: Atlas of Protein Sequence and Structure (Eck, Dayhoff)
Frederick Sanger, Margaret Dayhoff (source: en.wikipedia.org)
Staden (1979)
“The continuing rapid fall in the cost of computer components is making it possible for most DNA sequencing laboratories to have their own small computer. The fact that DNA sequencing is now a fast procedure, and the availability of computers gives the possibility of more efficient overall strategies for sequence determination.”
T4 genome map: Wood and Revel, 1976
PhiX174 phage genome: Sanger et al., 1977
1980s
Norm Pace (source: http://pacelab.colorado.edu)
1980: “Dr. Dayhoff established an on-line computer database and a sophisticated retrieval system, accessable by phone to outside users, in September 1980” http://www.dayhoff.cc/MODBiography.html
Octopus Spring: Stahl et al., 1985
1990s
Jo Handelsman (source: en.wikipedia.org)
2000s
Oded Béjà (source: rbni.technion.ac.il); Jill Banfield (source: ourenvironment.berkeley.edu)
2010s
“The microbiome of”: roller derby, kissing, mobile phones, beer, Irish rugby players
Jessica Green
Rob Knight
(Very) high-level workflows
The big picture
Microbial sample → Generate “Meta-omic” data → Process data (QC, etc.) → Analysis
Marker genes
Extract DNA → Amplify with targeted primers → Filter errors, build clusters → Diversity analysis
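To make the final “Diversity analysis” step concrete, here is a minimal Python sketch that computes observed richness and the Shannon index, H' = -Σ p_i ln(p_i), from per-OTU read counts for a single sample. The function names and counts are hypothetical, and the Shannon index is only one of many diversity measures; this is not the workshop's pipeline.

```python
import math
from typing import Mapping


def observed_richness(counts: Mapping[str, int]) -> int:
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total  # relative abundance of this OTU
            h -= p * math.log(p)
    return h


# Hypothetical OTU counts for one sample (OTU_4 was not observed)
sample = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20, "OTU_4": 0}
print(observed_richness(sample))          # 3
print(round(shannon_index(sample), 3))    # 1.03
```

Richness ignores abundances entirely, whereas H' rises both with more OTUs and with a more even spread of reads among them.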
Metagenomes
Extract DNA → Sequence random fragments → QC, assemble, annotate → Diversity, function analysis
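The “QC” step commonly includes trimming low-quality read ends. The sketch below shows one simple way to do this, assuming Phred+33 quality strings and a hypothetical sliding-window rule (cut the read where a 4-base window first averages below Q20). The window size, threshold, and function names are illustrative assumptions, not settings from this workshop.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def sliding_window_trim(
    seq: str, qual: str, window: int = 4, min_mean_q: float = 20.0
) -> tuple[str, str]:
    """Truncate the read at the start of the first window whose mean quality
    falls below min_mean_q. Reads shorter than one window are returned unchanged."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```

Cutting at the start of the failing window is deliberately conservative; other trimmers cut at different positions or trim from both ends.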
Metatranscriptomes
Extract RNA, subtract rRNA → Sequence cDNA → QC → Gene expression, function
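Laboratory rRNA subtraction is rarely complete, so residual rRNA reads are often also removed computationally before expression analysis. The toy sketch below flags a read as rRNA when at least half of its k-mers (from either strand) occur in an rRNA reference set. The k-mer rule, the 0.5 cutoff, k = 5, and all names and sequences are hypothetical simplifications of what dedicated rRNA-filtering tools do.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_rrna_index(references: list[str], k: int) -> set[str]:
    """k-mers from both strands of each rRNA reference sequence."""
    index: set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
        index |= kmers(revcomp(ref), k)
    return index


def is_rrna(read: str, index: set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Flag a read when at least min_fraction of its distinct k-mers are in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= min_fraction


def remove_rrna(reads: list[str], index: set[str], k: int) -> list[str]:
    """Keep only reads that are not flagged as rRNA."""
    return [r for r in reads if not is_rrna(r, index, k)]


# Hypothetical toy reference and reads (far shorter than real rRNA genes)
K = 5
index = build_rrna_index(["ACGTTGCAACGTAGCTAGGCTA"], K)
reads = ["TTGCAACGTAGC", "GCTACGTTGCAA", "TTTTTTTTTTTT"]
print(remove_rrna(reads, index, K))  # ['TTTTTTTTTTTT']
```

Indexing both strands matters because a read can come from either strand of the rRNA; the second toy read is the reverse complement of the first.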
Scaling up
Metadata
Langille et al., Microbiome (2014)
Examples of “Metagenomics”
Remediation of C. difficile infection: Lawley et al., PLoS Pathogens (2012)
Analysis of membrane proteins in the GOS dataset: Patel et al., Genome Res (2010)
Metagenomic/metatranscriptomic analysis of acid mine drainage (AMD): Hua et al., ISME J (2015); draft genomes at MG-RAST
Metabolites and microbes in bacterial vaginosis: Srinivasan et al., Genome Res (2010)
Impact of low-dose penicillin on mouse development – Cox et al., Cell (2014)
Sequencing technologies
Sanger, Ion Torrent, Roche 454, Illumina *Seq, Pacific Biosciences, Nanopore
Resources
16S
Greengenes: McDonald et al. ISME J (2012)
SILVA: Quast et al. NAR (2013)
rrnDB: Stoddard et al. NAR (2014)
RDP II: Cole et al. NAR (2013)
Genomes
PATRIC, GenBank Genomes, GOLD, Ensembl Genomes
“Metagenomes”
EBI Metagenomics, MG-RAST, HMP DACC
Function
KEGG
UniProtKB
CARD
Gene Ontology
Major concerns in metagenomic analysis
Data Quality
• Sequencing errors
  – Introduced in workup
  – Error rates, error type (PacBio: 10%, random; Illumina: 0.1%, substitution)
• Chimeras
  – Amplification artifacts, cloning of restriction fragments
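The error rates above are what FASTQ quality scores encode: a Phred score Q corresponds to a per-base error probability p via Q = -10 log10(p), so 0.1% is Q30 and 10% is Q10. Assuming the common Phred+33 encoding, the sketch below parses one hypothetical FASTQ record and sums its per-base error probabilities into an expected number of errors per read, a quantity some read filters apply a threshold to. The record and the function names are illustrative.

```python
import math


def error_prob(q: float) -> float:
    """Per-base error probability for Phred score q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred_from_prob(p: float) -> float:
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def parse_fastq_record(lines: list[str]) -> tuple[str, str, str]:
    """Split one four-line FASTQ record into (read_id, sequence, quality)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], seq, qual


def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, assuming Phred+33 encoding."""
    return sum(error_prob(ord(ch) - offset) for ch in qual)


print(round(phred_from_prob(0.001)), round(phred_from_prob(0.1)))  # 30 10

# Hypothetical record: nine bases at Q40 ('I') and one at Q20 ('5')
record = ["@read_1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIII5\n"]
read_id, seq, qual = parse_fastq_record(record)
print(read_id, round(expected_errors(qual), 4))  # read_1 0.0109
```

A single Q20 base contributes more expected error than nine Q40 bases combined, which is why filtering on expected errors is stricter than filtering on mean quality.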
Comparability / Reproducibility
• 16S: different variable (V) regions give different results
• Different sequencing platforms / sampling conditions ALSO give different results
  – Eisen paper about different recoveries under different conditions
• Workflow complexity / plethora of tools
[Figure credited to Morgan Langille; labels: Reference, Old, “Middle-aged”, Young, “Useless, not published”]
Linkage and resolution
• Strain-level diversity in metagenomes will often be missed by amplicon (esp. short-read) and shotgun approaches
• This may be especially important when comparing samples
• Should you assemble metagenomic reads? What are the assumptions?
16S is not the only option!!
Martiny et al., Environ Microbiol (2009)
Ribosomal internal transcribed spacer (ITS) regions
Taxonomy and OTUs
RDP taxonomic predictions + taxonomy in general
OTUs – arbitrary, quasi-phylogenetic
[Figure: OTU-picking approaches; labels: Seed sequences, ???, De novo, 97%]
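De novo OTU picking at 97% identity is often done greedily: each sequence joins the first existing OTU whose centroid it matches at ≥97% identity, or else founds a new OTU. The sketch below is a simplified illustration, assuming equal-length, pre-aligned sequences so that identity is just the fraction of matching positions. Real clustering tools align sequences and differ in ordering and tie-breaking rules, and the names here are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[int]]:
    """Greedy centroid clustering. Each OTU is a list of input indices whose
    first element is the centroid (the sequence that founded the OTU)."""
    otus: list[list[int]] = []
    for i, s in enumerate(seqs):
        for otu in otus:
            if identity(seqs[otu[0]], s) >= threshold:
                otu.append(i)  # join the first centroid that is close enough
                break
        else:
            otus.append([i])  # no centroid within threshold: start a new OTU
    return otus


# Hypothetical 100-nt sequences, assumed already sorted by abundance
seqs = [
    "A" * 100,
    "C" + "A" * 99,        # 99% identical to seqs[0]
    "C" * 5 + "A" * 95,    # 95% identical to seqs[0]: founds a new OTU
    "C" * 4 + "A" * 96,    # 96% to seqs[0], 99% to seqs[2]
]
print(greedy_otus(seqs))  # [[0, 1], [2, 3]]
```

Because assignment depends on input order, processing the same sequences in a different order can yield different OTUs, one reason OTUs are described above as arbitrary.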
Functional annotation problems
CAFA (Radivojac et al., Nat Meth 2013)
Misannotations across databases (Schnoes et al., PLoS Comp Biol 2009)
Coverage vs accuracy
We are on a Coffee Break & Networking Session