metagenomics 2015 module1-part1 · course overview • module 1: introduction – definitions,...

Page 1: Metagenomics 2015 Module1-Part1 · Course overview • Module 1: Introduction – definitions, approaches, considerations • Module 2: Marker genes – measuring community diversity

Canadian Bioinformatics Workshops

www.bioinformatics.ca

Module 1: Introduction to Metagenomics

Robert Beiko

Rob Beiko [email protected] @rob_beiko

Module 1 bioinformatics.ca en.wikipedia.org

Avery–MacLeod–McCarty experiment

en.wikipedia.org

Course overview

•  Module 1: Introduction – definitions, approaches, considerations
•  Module 2: Marker genes – measuring community diversity
•  Module 3: Metagenome taxonomy – classifying and binning sequence reads
•  Module 4: Metagenome function – databases and pathways
•  Module 5: Metatranscriptomics: data, taxonomy, function
•  Module 6: Biomarker discovery

General Learning Objectives

At the end of this workshop, you will be able to:
•  Define the objectives of different types of metagenomic projects
•  Process raw data files using appropriate quality control
•  Run standard pipelines for marker-gene, metagenome and metatranscriptome datasets
•  Analyze results using statistical and network approaches
•  Recognize the technical limitations of metagenomic studies

Learning objectives of Module 1

You will be able to:
•  Apply key terms in metagenomics, for example microbial communities, OTUs, metadata
•  Define the objectives of a metagenomic experiment, with appropriate choice of technology
•  Interpret the contents of sequence files
•  Acquire data from online resources and reference databases

Defining Metagenomics

•  Microbiome: Attributed to Joshua Lederberg by Hooper and Gordon (2001): “the collective genome of our indigenous microbes (microflora), the idea being that a comprehensive genetic view of Homo sapiens as a life-form should include the genes in our microbiome”
•  Is also used to mean microbiota, the set of microorganisms found in a particular setting
•  Metagenome: Handelsman et al. (1998) “…advances in molecular biology and eukaryotic genomics, which have laid the groundwork for cloning and functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”
•  Does not encompass marker-gene surveys (e.g., 16S)

This report says it does.

The big picture

Explore the relationship between microbes and their habitat

To accomplish this, we use a series of experimental and computational techniques to make inferences about the community:
-  Marker genes
-  Metagenomes
-  Metatranscriptomes
-  Metaproteomes
-  Metametabolomes
-  “Culturomes”

Why metagenomics?

•  The “great plate count anomaly”: <1% of organisms across many habitats are culturable (reviewed in Amann et al., 1995: PMID 7535888) – CONTROVERSIAL; probably not true for habitats such as human body sites

•  In any event, it would be nearly impossible to culture ALL constituents of a given microbiome sample (apart from trivially simple ones)

•  Metagenomics offers an effective (if imperfect) way to profile the structure and function of microbial communities

The Human Microbiome

Human gut microbiome: 2-3 million genes

Typically > 160 “species” at any given sampling time

Host: ~25,000 genes

Qin et al., Nature (2010)

A Brief History of Metagenomics and Things Like It

1970s

1955: insulin protein sequence

1960: pyrimidine tract sequencing via depurination

1965: Atlas of Protein Sequence and Structure (Eck, Dayhoff)

Frederick Sanger, Margaret Dayhoff en.wikipedia.org

Staden (1979)

“The continuing rapid fall in the cost of computer components is making it possible for most DNA sequencing laboratories to have their own small computer. The fact that DNA sequencing is now a fast procedure, and the availability of computers gives the possibility of more efficient overall strategies for sequence determination.”

T4 genome map: Wood and Revel, 1976

PhiX174 phage genome: Sanger et al., 1977

1980s

Norm Pace http://pacelab.colorado.edu

1980: “Dr. Dayhoff established an on-line computer database and a sophisticated retrieval system, accessable by phone to outside users, in September 1980” http://www.dayhoff.cc/MODBiography.html

Octopus Spring: Stahl et al., 1985

1990s

Jo Handelsman en.wikipedia.org

2000s

Oded Béjà rbni.technion.ac.il

Jill Banfield ourenvironment.berkeley.edu

2010s

“The microbiome of”:
•  Roller derby
•  Kissing
•  Mobile phones
•  Beer
•  Irish rugby players

Jessica Green

Rob Knight

(Very) high-level workflows

The big picture

Microbial sample → Generate “Meta-omic” data → Process data (QC, etc.) → Analysis

Marker genes

Extract DNA → Amplify with targeted primers → Filter errors, build clusters → Diversity analysis
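
The last step of the marker-gene workflow, diversity analysis, can be illustrated with a small sketch. It assumes reads have already been filtered and assigned to clusters, and it uses observed richness and the Shannon index as example diversity measures. The slide does not name a specific measure, so both the choice of index and the cluster labels below are illustrative assumptions.

```python
import math
from collections import Counter


def observed_richness(counts):
    """Number of clusters with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over clusters with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical cluster assignments for the reads of one sample.
read_clusters = ["otu1", "otu1", "otu2", "otu3", "otu3", "otu3", "otu1", "otu2"]
counts = list(Counter(read_clusters).values())

print("richness:", observed_richness(counts))
print("Shannon H:", round(shannon_diversity(counts), 3))
```

A perfectly even community of four clusters gives H = ln(4), and a sample dominated by a single cluster gives H = 0, which is why the index is often read as a combined measure of richness and evenness.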

Metagenomes

Extract DNA → Sequence random fragments → QC, assemble, annotate → Diversity, function analysis

Metatranscriptomes

Extract RNA, subtract rRNA → Sequence cDNA → QC → Gene expression, function

Scaling up

Metadata

Langille et al., Microbiome (2014)

Examples of “Metagenomics”

Remediation of C. difficile infection: Lawley et al., PLoS Pathogens (2012)

Analysis of membrane proteins in the GOS dataset: Patel et al., Genome Res (2010)

Metagenomic / metatranscriptomic acid mine drainage (AMD) analysis: Hua et al., ISME J (2015)

Draft genomes at MG-RAST

Metabolites and microbes in bacterial vaginosis: Srinivasan et al., Genome Res (2010)

Impact of low-dose penicillin on mouse development – Cox et al., Cell (2014)

Sequencing technologies

Sanger
Ion Torrent
Roche 454
Illumina *Seq
Pacific Biosciences
Nanopore

Resources

16S

Greengenes: McDonald et al. ISME J (2012)

SILVA: Quast et al. NAR (2013)

rrnDB: Stoddard et al. NAR (2014)

RDP II: Cole et al. NAR (2013)

Genomes

PATRIC
GenBank Genomes
GOLD
Ensembl Genomes

“Metagenomes”

EBI metagenomics
MG-RAST
HMP DACC

Function

KEGG

UniProtKB

CARD

Gene Ontology

Major concerns in metagenomic analysis

Data Quality

•  Sequencing errors
–  Introduced in workup
–  Error rates, error type (PacBio: 10% random; Illumina: 0.1% substitution)
•  Chimeras
–  Amplification artifacts, cloning of restriction fragments
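
To make the quoted error rates concrete, the sketch below estimates how many errors to expect in a single read and how likely a read is to be error-free. It assumes errors are independent across bases with a constant per-base rate, which is a simplification. The read lengths are illustrative assumptions, not values from the slides.

```python
def expected_errors(read_length, per_base_error_rate):
    """Expected number of erroneous bases in one read (constant, independent rate)."""
    return read_length * per_base_error_rate


def prob_error_free(read_length, per_base_error_rate):
    """Probability that a read contains no errors under the same assumptions."""
    return (1.0 - per_base_error_rate) ** read_length


# Error rates from the slide; read lengths are hypothetical examples.
platforms = {
    "PacBio": {"read_length": 10_000, "error_rate": 0.10},
    "Illumina": {"read_length": 250, "error_rate": 0.001},
}

for name, p in platforms.items():
    n_err = expected_errors(p["read_length"], p["error_rate"])
    p_clean = prob_error_free(p["read_length"], p["error_rate"])
    print(f"{name}: ~{n_err:.2f} expected errors per read, "
          f"P(error-free) = {p_clean:.4f}")
```

Under these assumptions, even a low per-base rate leaves a substantial fraction of short reads with at least one error, which is one reason error filtering precedes clustering in marker-gene workflows.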

Comparability / Reproducibility

•  16S: different V regions give different results
•  Different sequencing platforms / sampling conditions ALSO give different results
–  Eisen paper about different recoveries under different conditions
•  Workflow complexity / plethora of tools

[Figure credit: Morgan Langille (“Useless, not published”). Labels: Young, “Middle-aged”, Old, Reference]

Linkage and resolution

•  Strain-level diversity in metagenomes will often be missed by amplicon (esp. short-read) and shotgun approaches

•  This may be especially important between samples

•  Should you assemble metagenomic reads? What are the assumptions?

16S is not the only option!!

Martiny et al. (2009) Env Micro

Ribosomal internal transcribed spacer (ITS) regions

Taxonomy and OTUs

RDP taxonomic predictions + taxonomy in general

OTUs – arbitrary, quasi-phylogenetic

Seed sequences

???

De novo

97%
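
One way to see why OTUs are described as arbitrary is a toy version of de novo, seed-based clustering at a 97% threshold. The sketch below is a simplified greedy scheme and not the algorithm of any particular tool. It assumes sequences are already aligned and of equal length, so identity is just the fraction of matching positions. The function names and example sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first seed it matches, else make it a new seed."""
    seeds = []        # one representative (seed) sequence per OTU
    assignment = []   # OTU index for each input sequence, in input order
    for seq in sequences:
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                assignment.append(i)
                break
        else:
            # No existing seed is similar enough: this sequence seeds a new OTU.
            seeds.append(seq)
            assignment.append(len(seeds) - 1)
    return seeds, assignment


# Hypothetical 100-base sequences.
base = "ACGT" * 25
near = "TT" + base[2:]        # 2 mismatches to base: 98% identity
far = "N" * 8 + base[8:]      # 8 mismatches to base: 92% identity

seeds, assignment = greedy_otus([base, near, far])
print(len(seeds), assignment)
```

Because each sequence joins the first seed it matches, the resulting OTUs depend on both the threshold and the input order, so a different ordering of the same reads can produce different seeds and cluster boundaries.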

Functional annotation problems

CAFA (Radivojac et al., Nat Meth 2013)

Misannotations across databases (Schnoes et al., PLoS Comp Biol 2009)

Coverage vs accuracy

We are on a Coffee Break & Networking Session