[2013.10.29] albertsen genomics metagenomics

Genomics and Metagenomics Mads Albertsen Introduction to community systems microbiology 2013 CENTER FOR MICROBIAL COMMUNITIES

Upload: madsalbertsen

Post on 06-May-2015

TRANSCRIPT

Page 1: [2013.10.29] albertsen genomics metagenomics

Genomics and Metagenomics

Mads Albertsen

Introduction to community systems microbiology 2013

CENTER FOR MICROBIAL COMMUNITIES

Page 2: [2013.10.29] albertsen genomics metagenomics

Agenda

Genomics
• Introduction
• Assembly
• Validation
• Metabolic reconstruction (SM @ Thursday)

Metagenomics
• History
• Pitfalls
• Potentials

CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY

Page 3: [2013.10.29] albertsen genomics metagenomics

Introduction

Genome = Parts list of a single organism

Page 4: [2013.10.29] albertsen genomics metagenomics

Introduction

How to get from sequenced DNA to metabolic model?

Page 5: [2013.10.29] albertsen genomics metagenomics

Introduction

Wet lab work: Extract DNA → Shear DNA → Sequence

Bioinformatics: Reads (50-500 bp) → Assembly → Contigs (1-100 kbp) → Scaffolding → Scaffolds (contigs joined by runs of N, hopefully Mbp)

Page 6: [2013.10.29] albertsen genomics metagenomics

Definitions

Read: A sequenced piece of DNA (50-500 bp)

Paired-end read: Sequencing both ends of a short DNA fragment (300-600 bp insert)

Mate-pair read: Sequencing both ends of a long DNA fragment (> 1 kbp insert)

Insert size: The length of the DNA fragment

Contig: A set of overlapping DNA segments that represents a consensus region of DNA

Scaffold: Contigs separated by gaps of known length (gaps shown as N)

Coverage: The number of times a specific position in the genome is covered by reads (e.g. 4x)
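The coverage definition can be made concrete with a small sketch. The function names and the numbers below are illustrative assumptions, not taken from the slides: one function gives the average coverage expected from a sequencing run, the other counts how many reads cover a single position.

```python
def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def position_coverage(read_starts, read_length_bp, position):
    """Number of reads that cover one position (0-based coordinates)."""
    return sum(1 for start in read_starts
               if start <= position < start + read_length_bp)


# Hypothetical run: 80,000 reads of 150 bp on a 3 Mbp genome.
print(mean_coverage(80_000, 150, 3_000_000))
# Hypothetical read start positions; four of the five reads cover position 120.
print(position_coverage([0, 50, 90, 100, 300], 150, 120))
```

The first number is a genome-wide average, while the second is what the "4x" in the definition refers to: the depth at one specific position.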

Page 7: [2013.10.29] albertsen genomics metagenomics

Assembly

Genome → Fragment → Sequence → Assemble

[Figure labels: Paired-end reads; Contig 1, Contig 10, Contig 19; Scaffold 1, Scaffold 2]

Inspiration: http://goo.gl/VOZVVg

Page 8: [2013.10.29] albertsen genomics metagenomics

Assembly

Genome (3,000,000 letters) → Sequencing → Reads (50-500 letters each) → Assembly → Genome (3,000,000 letters)

Page 9: [2013.10.29] albertsen genomics metagenomics

Assembly

“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, ...”

Dickens, Charles. A Tale of Two Cities. 1859. London: Chapman & Hall

Example: http://goo.gl/nMWDAk
Velvet example courtesy of J. Leipzig 2010

Page 10: [2013.10.29] albertsen genomics metagenomics

Assembly

Way too much data to make all vs. all comparison

Page 11: [2013.10.29] albertsen genomics metagenomics

Assembly

Step 1: Convert reads into kmers

Reads        Kmers (k = 3)
theageofwi   the hea eag age geo eof ofw fwi
sthebestof   sth the heb ebe bes est sto tof
astheageof   ast sth the hea eag age geo eof
worstoftim   wor ors rst sto tof oft fti tim
imesitwast   ime mes esi sit itw twa was ast
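Step 1 can be sketched in a few lines. This is a minimal illustration of the idea, not the code of any particular assembler, and the function name `kmers` is an assumption made for the example.

```python
def kmers(read, k):
    """Return every substring of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# The five reads from the slide.
reads = ["theageofwi", "sthebestof", "astheageof", "worstoftim", "imesitwast"]
for read in reads:
    print(read, " ".join(kmers(read, 3)))
```

A read of length L always gives L - k + 1 kmers, so each 10-letter read gives 8 kmers at k = 3.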

Page 12: [2013.10.29] albertsen genomics metagenomics

Assembly

ast sth the hea eag age geo eof

Step 2: Join kmers with k-1 overlap

Page 13: [2013.10.29] albertsen genomics metagenomics

Assembly

Step 2: Join kmers with k-1 overlap

Do the same for all reads…

[Figure: graph joining the kmers of all five reads; shared kmers such as "the", "sto", "tof" and "ast" link the reads together]
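Joining kmers for all reads can be sketched as a graph in which each kmer is a node and an edge connects two kmers that follow each other in a read (and therefore overlap by k-1 letters). This is a simplified sketch under that assumption; real assemblers such as Velvet add much more (error handling, reverse complements, coverage tracking), and the names `kmers` and `build_graph` are made up for the example.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every substring of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Map each kmer to the set of kmers that directly follow it in some read."""
    successors = defaultdict(set)
    for read in reads:
        parts = kmers(read, k)
        for left, right in zip(parts, parts[1:]):
            successors[left].add(right)
    return successors


reads = ["theageofwi", "sthebestof", "astheageof", "worstoftim", "imesitwast"]
graph = build_graph(reads, 3)
# "the" is followed by "hea" in two reads and by "heb" in another: a branch point.
print(sorted(graph["the"]))
```

Kmers with more than one successor are where the graph branches, which is what limits contig length in the next step.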

Page 14: [2013.10.29] albertsen genomics metagenomics

Assembly

Step 3: Simplify the graph
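One way to picture the simplification is to merge every unbranched path of kmers into a single string. The sketch below assumes the graph is given as a mapping from each kmer to the set of kmers that follow it; the function name `simplify` and the toy graph are illustrative, and real assemblers also remove error-induced tips and bubbles, which is not shown here.

```python
from collections import defaultdict


def simplify(successors):
    """Merge every unbranched path into one contig string.

    Cycles without any branch point are not reported in this sketch.
    """
    predecessors = defaultdict(set)
    nodes = set(successors)
    for left, rights in successors.items():
        for right in rights:
            predecessors[right].add(left)
            nodes.add(right)

    def out_degree(node):
        return len(successors.get(node, ()))

    contigs = []
    for node in nodes:
        preds = predecessors.get(node, set())
        # Skip nodes that sit in the middle of an unbranched path.
        if len(preds) == 1 and out_degree(next(iter(preds))) == 1:
            continue
        contig, current = node, node
        while out_degree(current) == 1:
            (following,) = successors[current]
            if len(predecessors[following]) != 1:
                break
            contig += following[-1]  # overlap is k-1, so only one new letter
            current = following
        contigs.append(contig)
    return sorted(contigs)


# Toy graph: "the" branches into two paths, each of which collapses to one contig.
toy = {"the": {"hea", "heb"}, "hea": {"eag"}, "eag": {"age"},
       "heb": {"ebe"}, "ebe": {"bes"}}
print(simplify(toy))
```

The branch at "the" stays a separate short contig, while the two paths after it are each merged into a longer one.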

Page 15: [2013.10.29] albertsen genomics metagenomics

Assembly

[Figure: the simplified graph drawn over the Dickens quote. Unbranched paths collapse into contigs: "It was the", "be", "wor", "st", "of times", "age", "epoch", "wisdom", "foolishness", "belief", "incredulity"]

= Contigs

Page 16: [2013.10.29] albertsen genomics metagenomics

Assembly

What is the minimum kmer size that results in a single contig?

Kmer = 3

Page 17: [2013.10.29] albertsen genomics metagenomics

Assembly

What is the minimum kmer size that results in a single contig?

Kmer = 3

Kmer = 10

Itwasthebestoftimesitwastheworstoftimesitwastheageofwisdomitwastheageoffoolishnessitwastheepochofbeliefitwastheepochofincredulity

Page 18: [2013.10.29] albertsen genomics metagenomics

Assembly

Repeat = repeated DNA sequence that can’t be spanned by reads

Page 19: [2013.10.29] albertsen genomics metagenomics

Assembly

Why not just increase the kmer size?

Page 20: [2013.10.29] albertsen genomics metagenomics

Assembly

Errors!

Read: theageofwi

Kmer = 3: the hea eag age geo eof ofw fwi
Kmers with errors = 3/8

Kmer = 9: theageofw heageofwi
Kmers with errors = 2/2
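The 3/8 and 2/2 counts follow from simple arithmetic: a single sequencing error spoils every kmer that overlaps it. The sketch below assumes one error in the middle of a 10-letter read; the function name and the error position are illustrative choices, not from the slides.

```python
def kmers_with_error(read_length, k, error_pos):
    """Return (kmers containing the error, total kmers) for one error in a read."""
    total = read_length - k + 1
    affected = sum(1 for start in range(total)
                   if start <= error_pos < start + k)
    return affected, total


# One error at position 4 (0-based) of a 10-letter read.
for k in (3, 9):
    affected, total = kmers_with_error(10, k, 4)
    print(f"k = {k}: {affected}/{total} kmers contain the error")
```

With short kmers most of the read is still usable; with long kmers one error can spoil every kmer from that read, which is why the kmer size cannot simply be increased.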

Page 21: [2013.10.29] albertsen genomics metagenomics

Validation

I’ve assembled my 4.3 Mbp genome into 25 scaffolds with an N50 of 553 kbp.

Is it a good assembly?

Page 22: [2013.10.29] albertsen genomics metagenomics

Validation

N50
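The slide only names N50, so the sketch below uses the common definition: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running sum first reaches half of the total assembly length. The scaffold lengths in the example are made up for illustration.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 needs at least one scaffold length")


# Hypothetical assembly (lengths in kbp): total 290, half is 145.
print(n50([80, 70, 50, 40, 30, 20]))
```

N50 describes contiguity only; it says nothing about whether the scaffolds are correct or complete, which is why the following slides look at repeats and essential genes instead.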

Page 23: [2013.10.29] albertsen genomics metagenomics

Validation

Estimating repeat content

Page 24: [2013.10.29] albertsen genomics metagenomics

Validation

4 repeats in 2 copies each

Page 25: [2013.10.29] albertsen genomics metagenomics

Validation

4 repeats in 2 copies each

How could I close this genome?

Page 26: [2013.10.29] albertsen genomics metagenomics

Validation

How complete is the genome?

Page 27: [2013.10.29] albertsen genomics metagenomics

Validation

100-106 Essential single copy genes

(can also be used to identify contamination)

[Figure: survey of essential single copy genes across sequenced phyla (axes: Genes, Phyla)]
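How the essential single copy genes give a completeness estimate, and a hint of contamination, can be sketched as follows. The marker names and the function are hypothetical; in practice the genes are usually detected with HMM searches and the marker set holds about 100 genes.

```python
from collections import Counter


def completeness_and_duplication(found_markers, marker_set):
    """Fraction of marker genes present, and fraction present more than once."""
    counts = Counter(gene for gene in found_markers if gene in marker_set)
    present = len(counts)
    duplicated = sum(1 for count in counts.values() if count > 1)
    return present / len(marker_set), duplicated / len(marker_set)


# Hypothetical marker set of four genes; "recA" was found twice in the assembly.
markers = {"rpoB", "recA", "gyrA", "dnaK"}
found = ["rpoB", "recA", "recA", "gyrA", "unrelated_gene"]
print(completeness_and_duplication(found, markers))
```

A high share of duplicated single copy genes suggests that sequence from more than one organism ended up in the assembly.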

Page 28: [2013.10.29] albertsen genomics metagenomics

Validation

Inspect the assembly

Page 29: [2013.10.29] albertsen genomics metagenomics

Validation

• N50 does not make much sense
• Repeat content versus the number of scaffolds
• Calculate the percentage of essential genes
• Inspect the assembly

Page 30: [2013.10.29] albertsen genomics metagenomics

Metabolic reconstruction

4.3 Mbp genome… and so what?

(@Thursday)

Page 31: [2013.10.29] albertsen genomics metagenomics

Metagenomics

Mads Albertsen

Introduction to community systems microbiology 2013

CENTER FOR MICROBIAL COMMUNITIES

Page 32: [2013.10.29] albertsen genomics metagenomics

Introduction

Genome = Parts list of a single organism

Page 33: [2013.10.29] albertsen genomics metagenomics

Introduction

Metagenome = Parts list of the community

Photo: D. Kunkel; color, E. Latypova

Page 34: [2013.10.29] albertsen genomics metagenomics

Introduction

”...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”

- J. Handelsman et al., 1998

Page 35: [2013.10.29] albertsen genomics metagenomics

Introduction

PubMed: metagenom*[Title/Abstract]

”...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”

- J. Handelsman et al., 1998

Page 36: [2013.10.29] albertsen genomics metagenomics

Introduction

”...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”

- J. Handelsman et al., 1998

PubMed: metagenom*[Title/Abstract]

Sequencing costs

http://www.genome.gov/sequencingcosts/

Page 37: [2013.10.29] albertsen genomics metagenomics

Introduction

Metagenomics ≠ Amplicon sequencing

Page 38: [2013.10.29] albertsen genomics metagenomics

Sequencing and assembly

≈3,000,000 bp per genome

≈1000+ bp contigs

150 bp reads

Page 39: [2013.10.29] albertsen genomics metagenomics

Assigning information

Contigs

Function

Taxonomy

Databases

Binning

Page 40: [2013.10.29] albertsen genomics metagenomics

What has metagenomics been used for?

Exploration

Rusch et al., 2007 PLoS Biology
• 6.3 Gbp of sequence (2 x human genomes, 2000 x bacterial genomes)
• Most sequences were novel compared to the databases

Qin et al., 2010 Nature
• 127 human gut metagenomes
• 600 Gbp sequence (200 x human genomes)
• 3.3 million genes identified
• Minimal gut metagenome defined

Page 41: [2013.10.29] albertsen genomics metagenomics

What has metagenomics been used for?

Comparative

Dinsdale et al., 2008 Nature
• A characteristic microbial fingerprint for each of the nine different ecosystem types

Specific functions

Hess et al., 2011 Science
• Identified 27,755 putative carbohydrate-active genes from a cow rumen metagenome
• Expressed 90 candidates of which 57% had enzymatic activity against cellulosic substrates

Page 42: [2013.10.29] albertsen genomics metagenomics

What has metagenomics been used for?

Extracting genomes

Garcia Martin et al., 2006 Nat. Biotechnol.
• Genome extraction from low complexity metagenome
• Candidatus Accumulibacter phosphatis
• The first genome of a polyphosphate accumulating organism (PAO) with a major role in enhanced biological phosphorus removal

Albertsen et al., 2013 Nat. Biotechnol.
• Genome extraction of low abundant species (< 0.1%) from metagenomes
• First complete TM7 genome
• Access to genomes of the “uncultured majority”

Page 43: [2013.10.29] albertsen genomics metagenomics

Pitfalls

Page 44: [2013.10.29] albertsen genomics metagenomics

Metagenomics made easy

Great resources – but use with care

Page 45: [2013.10.29] albertsen genomics metagenomics

MG-RAST example

Contigs

Page 46: [2013.10.29] albertsen genomics metagenomics

Dataset overview

Page 47: [2013.10.29] albertsen genomics metagenomics

Function Taxonomy

Taxonomy and Function overview

Page 48: [2013.10.29] albertsen genomics metagenomics

Compare with other samples

Samples Functional categories

Page 49: [2013.10.29] albertsen genomics metagenomics

Pitfalls

You always get billions of data points!

Page 50: [2013.10.29] albertsen genomics metagenomics

Pitfalls

Is your DNA extraction OK?... and the samples you want to compare with?
Did you sequence enough?
Did you know the GC bias of your protocol?
Did you normalize for sequencing depth?
Did you use the same sequencing platform?
Assembly = data not quantitative!
Are you comparing assembled data with reads?
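The question about sequencing depth can be illustrated with a small sketch. It assumes the simplest possible normalization, converting raw read counts to fractions of the sample total; the gene names and counts are invented, and other normalization schemes exist.

```python
def relative_abundance(counts):
    """Convert raw read counts per category into fractions of the sample total."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical samples: B was sequenced ten times deeper than A.
sample_a = {"geneX": 200, "geneY": 800}
sample_b = {"geneX": 2000, "geneY": 8000}

# The raw counts differ tenfold, the normalized values do not.
print(relative_abundance(sample_a))
print(relative_abundance(sample_b))
```

Comparing the raw counts would suggest a tenfold difference between the samples that is purely an effect of sequencing depth.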

Page 51: [2013.10.29] albertsen genomics metagenomics

Databases

Contigs

Databases

...you only see what is in the database

Annotated metagenome

Page 52: [2013.10.29] albertsen genomics metagenomics

What is in the databases?

Finished genomes in IMG vs. Greengenes 16S rRNA database

          Genomes   16S
Phyla     29        90
Class     46        249
Order     100       405
Species   1268      99322*

*97% clustering

Note: only including 1 strain per species

Page 53: [2013.10.29] albertsen genomics metagenomics

MG-RAST example

Contigs

650,000 EBPR proteins with taxonomy assigned

How similar are they to the genomes in the database?

Page 54: [2013.10.29] albertsen genomics metagenomics

Sludge microbes vs. Database genomes

650,000 EBPR proteins

Note: not abundance weighted

Page 55: [2013.10.29] albertsen genomics metagenomics

Sludge microbes vs. Database genomes

650,000 EBPR proteins

1,260,000 Human gut

Qin et al., 2010 Nature
RAST ID: 4448044.3

Note: not abundance weighted

Page 56: [2013.10.29] albertsen genomics metagenomics

Sludge microbes vs. Database genomes

The 7 genera with most EBPR proteins assigned

Page 57: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

What is the effect of not having closely related genomes in the database?

1. Remove a genome from the database

2. Search the removed genome against the database

Page 58: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

Best hit

Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5

Accumulibacter phosphatis

blastp

Related genomes

4326 proteins

Page 59: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

Best hit

Accumulibacter phosphatis

blastp

Related genomes

4326 proteins

Azoarcus

Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5

Page 60: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

MEGAN LCA

Accumulibacter phosphatis

blastp

Lowest common ancestor (LCA) approach:
Hit 1: Beta-proteobacteria 80% ID
Hit 2: Gamma-proteobacteria 79% ID
Hit 3: Actinobacteria 59% ID

Assigned to Proteobacteria
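The LCA example can be sketched as follows. This is an illustration of the idea, not MEGAN's implementation: the lineages are written out by hand, and the rule "keep only hits scoring at least 90% of the best hit" is an assumption made here, with percent identity standing in for the hit score. It is that filter that lets the weak Actinobacteria hit be ignored, so the protein lands on Proteobacteria rather than Bacteria.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each listed from root to leaf)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None


def lca_assign(hits, top_fraction=0.9):
    """Assign a protein to the LCA of its near-best hits.

    hits: list of (lineage, score) pairs; top_fraction is a hypothetical cutoff.
    """
    best = max(score for _, score in hits)
    kept = [lineage for lineage, score in hits if score >= top_fraction * best]
    return lowest_common_ancestor(kept)


hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria"], 80),
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria"], 79),
    (["Bacteria", "Actinobacteria"], 59),
]
print(lca_assign(hits))                    # near-best hits only
print(lca_assign(hits, top_fraction=0.0))  # all hits considered
```

The approach is deliberately conservative: when the close hits disagree, the protein is moved up the tree instead of being given to the single best hit.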

Related genomes

4326 proteins

Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5

Page 61: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

MEGAN LCA

Accumulibacter phosphatis

blastp

Genus

No hits 261

Bacteria 325

Proteobacteria 860

Beta- 853

Rhodocyclaceae 1149

4326 proteins:
• 27% correctly classified on genus level
• 54% not assigned the correct class
• 101 genera identified

Related genomes

Lowest common ancestor (LCA) approach:
Hit 1: Beta-proteobacteria 80% ID
Hit 2: Gamma-proteobacteria 79% ID
Hit 3: Actinobacteria 59% ID

Assigned to Proteobacteria

4326 proteins

Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5

Page 62: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

MEGAN LCA

Nitrospira defluvii

Bacteria 1268
Nitrospirae 3

blastp

Related genomes

4268 proteins:
• 1% correctly classified on phylum level

Phylum

Page 63: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

MEGAN LCA + KEGG

Nitrospira defluvii

blastp

Related genomes

Bacteria 1268
Nitrospirae 3

What about function?

Page 64: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

MEGAN LCA + KEGG

Nitrospira defluvii

blastp

Related genomes

Bacteria 1268
Nitrospirae 3

Page 65: [2013.10.29] albertsen genomics metagenomics

Effect of missing genomes

Nitrospira defluvii

blastp

Related genomes

MEGAN LCA + KEGG

Bacteria 1268
Nitrospirae 3

Page 66: [2013.10.29] albertsen genomics metagenomics

Implication of missing genomes

Function A

Function B

Function C

Function D

Page 67: [2013.10.29] albertsen genomics metagenomics

Pitfalls

You always get billions of data points!

Page 68: [2013.10.29] albertsen genomics metagenomics

Metagenomics

“If you want to understand the ecosystem you need to understand the individual species in the ecosystem”

Page 69: [2013.10.29] albertsen genomics metagenomics

Metagenomics

Lion + Eagle ≠ Flying Lion

Page 70: [2013.10.29] albertsen genomics metagenomics

Potentials

Page 71: [2013.10.29] albertsen genomics metagenomics

Who - when, where and why?

Page 72: [2013.10.29] albertsen genomics metagenomics

Culturing

How do we get the genomes?

Few microorganisms can be easily cultured (<<5%)

Microorganisms need to be studied in their environment

Page 73: [2013.10.29] albertsen genomics metagenomics

How do we get the genomes?

What you think you study What you actually study

Page 74: [2013.10.29] albertsen genomics metagenomics

How do we get the genomes?

Culturing
• Few microorganisms can be easily cultured (<<5%)
• Microorganisms need to be studied in their environment

Single cell genomics
• Only routinely performed in specialized labs
• Very incomplete genomes (mean 40%, range 10-90%)

https://www.bigelow.org/

Page 75: [2013.10.29] albertsen genomics metagenomics

How do we get the genomes?

Culturing
• Few microorganisms can be easily cultured (<<5%)
• Microorganisms need to be studied in their environment

Single cell genomics
• Only routinely performed in specialized labs
• Very incomplete genomes (mean 40%, range 10-90%)

Metagenomics

https://www.bigelow.org/

Page 76: [2013.10.29] albertsen genomics metagenomics

Metagenomics

100++ abundant species (≈3 Mbp each)

DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp)

Why not full genomes?

Page 77: [2013.10.29] albertsen genomics metagenomics

Metagenomics

100++ abundant species (≈3 Mbp each)

DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp)

Why not full genomes?

1. Micro-diversity

2. Separation of genomes (Binning)

Page 78: [2013.10.29] albertsen genomics metagenomics

Extracting genomes

Not 1 strain, but many closely related strains:

AAAAAAAAAAAAAA
AAAAAAAAATAAAA
AAAAAAAAACAAAA

Assembly

What you get:

AAAAAAAAA
TAAAA
CAAAA
AAAAA

Page 79: [2013.10.29] albertsen genomics metagenomics

Low micro-diversity

High micro-diversity

Short term enrichment

Extracting genomes

Page 80: [2013.10.29] albertsen genomics metagenomics

Metagenomics

100++ abundant species (≈3 Mbp each)

DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp)

Why not full genomes?

1. Micro-diversity

2. Separation of genomes (Binning)

Page 81: [2013.10.29] albertsen genomics metagenomics

Genomic signatures:
- GC / Codon usage
- Tetranucleotide frequency + statistical method

Complex sample

PhD student

”Binning”

Binning
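The genomic signatures named on this slide can be computed directly from a contig sequence. The sketch below is a minimal illustration: it counts tetranucleotides on one strand only, whereas binning tools usually also merge each tetramer with its reverse complement, and it leaves out the statistical method that would group contigs by these profiles. The function names are assumptions.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible tetranucleotides."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    tetramers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Windows containing other letters (e.g. N) are ignored.
    total = sum(counts[t] for t in tetramers)
    return {t: (counts[t] / total if total else 0.0) for t in tetramers}


contig = "ACGTACGT"  # toy sequence; real contigs are 1 kbp or longer
print(gc_content(contig))
print(tetranucleotide_frequencies(contig)["ACGT"])
```

Each contig becomes a vector of 256 numbers, and contigs from the same genome tend to have similar vectors. On short contigs the counts are small and noisy.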

Page 82: [2013.10.29] albertsen genomics metagenomics

Genomic signatures:
- GC / Codon usage
- Tetranucleotide frequency + statistical method

Complex sample

PhD student

”Binning”

Problems:
- Short pieces of sequence (1-10 kbp)
- Local sequence divergence

Binning

Page 83: [2013.10.29] albertsen genomics metagenomics

Sequence composition-independent binning

[Figure: abundance of the genomes in Sample 1 and in Sample 2]

Binning

Page 84: [2013.10.29] albertsen genomics metagenomics

Sequence composition-independent binning

[Figure: abundance in Sample 1 and in Sample 2, combined into one plot of Abundance Sample 1 (x-axis) against Abundance Sample 2 (y-axis)]

Binning
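The idea behind composition-independent binning is that all contigs from one genome share that genome's abundance in every sample, so they land close together when coverage in Sample 1 is plotted against coverage in Sample 2. The sketch below groups contigs with a simple greedy distance cutoff in log space. That cutoff, the function name and the coverage values are assumptions for illustration; this is not the procedure of Albertsen et al., 2013.

```python
import math


def bin_by_coverage(contigs, tolerance=0.25):
    """Group contigs whose (log10) coverage in two samples is similar.

    contigs: dict of name -> (coverage in sample 1, coverage in sample 2),
    with all coverages > 0. tolerance is a hypothetical distance cutoff.
    """
    bins = []  # list of (center point, member names)
    for name, (cov1, cov2) in sorted(contigs.items()):
        point = (math.log10(cov1), math.log10(cov2))
        for center, members in bins:
            if math.dist(point, center) <= tolerance:
                members.append(name)
                break
        else:
            bins.append((point, [name]))
    return [members for _, members in bins]


# Hypothetical coverages: c1/c2 are abundant in sample 1, c3/c4 in sample 2.
contigs = {"c1": (100, 10), "c2": (110, 9), "c3": (5, 50), "c4": (6, 55)}
print(bin_by_coverage(contigs))
```

Two genomes with the same abundance in one sample can only be separated if their abundance differs in the other sample.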

Page 85: [2013.10.29] albertsen genomics metagenomics

1. Reduce micro-diversity

2. Use multiple related samples

[Figure: Abundance Sample 1 (x-axis) against Abundance Sample 2 (y-axis)]

Binning

Page 86: [2013.10.29] albertsen genomics metagenomics

1. Reduce micro-diversity

2. Use multiple related samples

[Figure: two plots of Abundance Sample 1 (x-axis) against Abundance Sample 2 (y-axis)]

Binning

Page 87: [2013.10.29] albertsen genomics metagenomics

H. Daims & C. Dorninger, DOME, University of Vienna

• Nitrospira enrichment running for years

• 3 dominant species

• No micro-diversity

Binning

Page 88: [2013.10.29] albertsen genomics metagenomics

Short term enrichment

Full-scale EBPR plant

SBR reactor

Days

1. Reduction of (micro)-diversity

Albertsen et al., 2013 Nat. Biotech.

Page 89: [2013.10.29] albertsen genomics metagenomics

Short term enrichment

Full-scale EBPR plant

SBR reactor

2. Two different DNA extraction methods

Page 90: [2013.10.29] albertsen genomics metagenomics

Colored using a set of 100 phylogenetic marker genes

Page 91: [2013.10.29] albertsen genomics metagenomics

Colored using a set of 100 phylogenetic marker genes

TM7-1 (1.6%)

TM7-2 (0.7%)

TM7-3 (0.2%)

TM7-4 (0.06%)

Page 92: [2013.10.29] albertsen genomics metagenomics

Zoom on target

TM7-2 (0.7%)

Colored using a set of 100 phylogenetic marker genes

Page 93: [2013.10.29] albertsen genomics metagenomics

Zoom on target

PC2

PC1

TM7-2

PCA on genomic signatures

TM7-2 (0.7%)

Colored using a set of 100 phylogenetic marker genes

Page 94: [2013.10.29] albertsen genomics metagenomics

Colored using a set of 100 phylogenetic marker genes

TM7-1 (1.6%)

Candidate phylum TM7

Saccharibacteria

Candidatus Saccharimonas aalborgensis

Page 95: [2013.10.29] albertsen genomics metagenomics

Phyla

Genes (HMM models)

Essential single copy genes

Assembly inspection

Genome validation

Page 96: [2013.10.29] albertsen genomics metagenomics

http://madsalbertsen.github.io/multi-metagenome/
Short: goo.gl/0ctA3

• Guides
• Workflow scripts
• Example data
• All the code
• Recommendations

Multi-metagenome

Page 97: [2013.10.29] albertsen genomics metagenomics

...add more samples!

Complex samples

S. M. Karst, AAU

Page 98: [2013.10.29] albertsen genomics metagenomics

It’s just a potential!

..and a poorly translated description of it.

Page 99: [2013.10.29] albertsen genomics metagenomics

Metabolites

Proteins

mRNA

DNA

Metabolomics

Metaproteomics

Metatranscriptomics

Metagenomics

Data integration

In Situ methods

Community structure Microbial functions

Extraction

P-Removal:

N-Removal:

-Removal:

Foaming:

Ethanol production:

Microbial needs

Ecology

Understanding ecosystems

Page 100: [2013.10.29] albertsen genomics metagenomics

G.W. Tyson

Per H. Nielsen
Simon J. McIlroy
Søren M. Karst
EB group

C. Dorninger
H. Daims
M. Wagner
P. Hugenholtz

University of Vienna

University of Queensland

Questions? @MadsAlbertsen85

[email protected]