microbial bioinformatics

Microbial Bioinformatics Keith A. Crandall, PhD, FAAS, FLS Director, Computational Biology Institute Director, GW Genomics Core Co-Director, Informatics, Clinical and Translational Science Institute CN Co-Director, Institute for Biomedical Sciences Genomics and Bioinformatics Program Professor, Department of Biostatistics and Bioinformatics, GWSPH Professor, Department of Biological Sciences, CCAS Research Associate, Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution


Post on 21-Dec-2021

TRANSCRIPT

Page 1: Microbial Bioinformatics

Microbial Bioinformatics

Keith A. Crandall, PhD, FAAS, FLS
Director, Computational Biology Institute
Director, GW Genomics Core
Co-Director, Informatics, Clinical and Translational Science Institute CN
Co-Director, Institute for Biomedical Sciences Genomics and Bioinformatics Program
Professor, Department of Biostatistics and Bioinformatics, GWSPH
Professor, Department of Biological Sciences, CCAS
Research Associate, Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution

Page 2: Microbial Bioinformatics

16S rRNA Sequencing Timeline

Page 3: Microbial Bioinformatics

Microbial NGS Amplicon (targeted) sequencing

Page 4: Microbial Bioinformatics

• Gold standard for bacteria and archaea (16S rRNA): variable (loops) and conserved (stems) regions

• Fungi (ITS)

• Protozoa (18S rRNA)

Microbial NGS Amplicon (targeted) sequencing

16S rRNA

Page 5: Microbial Bioinformatics

Microbial NGS
From microbial taxonomic profiles to biological questions

[Figure: taxonomic profiles at the phylum and genus level for samples S1, S2, S3]

Phyla            Sample1  Sample2  Sample3
Actinobacteria   18.8     7.9      8.9
Firmicutes       44.8     21.4     38.3
Fusobacteria     3.4      2.2      4.8
Proteobacteria   28.2     67.1     44.1
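One way to move from a profile table like this to a biological question ("which samples resemble each other?") is a between-sample dissimilarity. The sketch below computes Bray-Curtis dissimilarity on the phylum percentages from the slide. The choice of Bray-Curtis and the names `PROFILES` and `bray_curtis` are illustrative assumptions, not something the slides prescribe.

```python
# Phylum-level relative abundances (%) copied from the slide table.
PHYLA = ["Actinobacteria", "Firmicutes", "Fusobacteria", "Proteobacteria"]
PROFILES = {
    "Sample1": [18.8, 44.8, 3.4, 28.2],
    "Sample2": [7.9, 21.4, 2.2, 67.1],
    "Sample3": [8.9, 38.3, 4.8, 44.1],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Pairwise comparison of every sample against every other sample.
names = sorted(PROFILES)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = bray_curtis(PROFILES[first], PROFILES[second])
        print(f"{first} vs {second}: {d:.3f}")
```

On these numbers Sample1 is closer to Sample3 than to Sample2, which matches the visual impression that Sample2 is dominated by Proteobacteria.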

Page 6: Microbial Bioinformatics

Microbiome Analyses - Metagenomics

Page 7: Microbial Bioinformatics

16S - Metataxonomy

Page 8: Microbial Bioinformatics

16S – Advantages vs Disadvantages?

● Advantages

○ Cost

○ Samples

○ Ease of analysis

○ Reference databases

○ PCR based -> lower starting DNA template

● Disadvantages

○ Only a single locus

○ No functional information

○ Often not discriminatory at the species level – or even genus level

○ No strain differentiation

○ No pathogenicity inferences

○ No drug resistance inferences

Page 9: Microbial Bioinformatics

16S - Cost

Page 10: Microbial Bioinformatics

Approach

Page 11: Microbial Bioinformatics

What does an Illumina library need to look like?

[Figure: double-stranded library molecule, top strand 5’ to 3’, bottom strand 3’ to 5’]

p5 | Index2 | Rd1 seq primer | 16S amplicon insert | Rd2 seq primer | Index1 | p7

Page 12: Microbial Bioinformatics

Making amplicon libraries: 2-step PCR edition

[Figure: 16S gene, both strands shown 5’ to 3’, with the Rd1 primer overhang and Rd2 primer overhang]

*DNA is synthesized in the 5’ to 3’ direction

Page 13: Microbial Bioinformatics

Making amplicon libraries: 2-step PCR edition

[Figure: PCR amplicon, both strands shown 5’ to 3’, with the Rd1 primer overhang and Rd2 primer overhang; resulting product]

*DNA is synthesized in the 5’ to 3’ direction

Page 14: Microbial Bioinformatics

Making amplicon libraries: 2-step PCR edition

[Figure: PCR amplicon with the Rd1 primer overhang and Rd2 primer overhang; p5 + Index2 added at one end, Index1 + p7 at the other]

*DNA is synthesized in the 5’ to 3’ direction

Page 15: Microbial Bioinformatics

[Figure: finished library molecule, top strand 5’ to 3’, bottom strand 3’ to 5’]

p5 | Index2 | Rd1 seq primer | 16S amplicon insert | Rd2 seq primer | Index1 | p7

Page 16: Microbial Bioinformatics

Making amplicon libraries: 1-step PCR edition

[Figure: 16S gene, both strands shown 5’ to 3’; one primer carries p5, Index2 and misc. sequence, the other carries Index1, p7 and misc. sequence]

*DNA is synthesized in the 5’ to 3’ direction

Page 17: Microbial Bioinformatics

[Figure: finished library molecule, top strand 5’ to 3’, bottom strand 3’ to 5’]

p5 | Index2 | Misc. seqs | 16S amplicon insert | Misc. seqs | Index1 | p7

Misc. seq + gene-specific primer region used as custom sequencing primer

Page 18: Microbial Bioinformatics

One step PCR Primer Structure
SB501 - Forward primer option

AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA

● Adapter - Allows binding to the flow cell
● SB501 - Barcoded Primer - Different for every primer
● Pad - Boost the primer melting temperature
● Link - Anticomplementary to known sequences
● V4f - 16S V4 region forward primer
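The five parts listed above can be made concrete by slicing the SB501 primer string. The slide names the parts but not their lengths, so the segment lengths in `LAYOUT` (29, 8, 10, 2, 19) are an assumption based on the common dual-index V4 design; check them against your own primer sheet. The helper names are hypothetical.

```python
from itertools import product

# Full SB501 forward primer, copied from the slide.
PRIMER = "AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA"

# ASSUMPTION: segment lengths are not given on the slide.
LAYOUT = [("adapter", 29), ("barcode", 8), ("pad", 10), ("link", 2), ("V4f", 19)]

# IUPAC nucleotide codes; the V4f primer contains the degenerate base M (A or C).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def split_primer(primer, layout):
    """Cut a primer into named segments according to (name, length) pairs."""
    if sum(length for _, length in layout) != len(primer):
        raise ValueError("layout lengths do not add up to the primer length")
    parts, start = {}, 0
    for name, length in layout:
        parts[name] = primer[start:start + length]
        start += length
    return parts


def expand_degenerate(seq):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


parts = split_primer(PRIMER, LAYOUT)
for name, segment in parts.items():
    print(f"{name:8s} {segment}")
print(expand_degenerate(parts["V4f"]))
```

Under the assumed layout only the barcode segment changes between primers; the adapter, pad, link and V4f stay fixed.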

Page 19: Microbial Bioinformatics

How many PCR steps?

One-step PCR
● PROS
○ Fewer steps
○ Less optimization
○ Less possibility for contamination
● CONS
○ Fewer options for optimization
○ Less sensitive
○ Expensive/less stable primers

Two-step PCR
● PROS
○ Well-established
○ Highly sensitive
○ Cheaper primers
● CONS
○ Possibility of amplicon contamination
○ Higher possibility for user error/contamination
○ More steps
○ More optimization

Page 20: Microbial Bioinformatics

Don’t Trust Your Data

Page 21: Microbial Bioinformatics

Tools & Databases
● mothur (mothur.org) – full 16S analysis suite
● QIIME (qiime.org) – full 16S analysis suite
● MG-RAST server (metagenomics.anl.gov) – 16S and WGS
● PathoScope (GitHub) – 16S and WGS
● CloVR (clovr.org) – 16S and WGS
● Animalcules (R Shiny) – downstream hypothesis testing
● DADA2 – 16S analysis suite, etc.

● Ribosomal Database Project (RDP)
● GreenGenes
● SILVA (arb-silva.de)

Page 22: Microbial Bioinformatics

Basic Analysis Steps
● Remove all those adapters you put on for sequencing!
● Remove unwanted reads and sequencing and PCR error
○ Read length, error score (remember fastq!)
● Assemble paired ends to make a contig
● Map contigs against a reference library
● Call taxa

● Characterize Diversity (alpha
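The "read length, error score" filtering step can be sketched in a few lines. This is a minimal illustration, not what any of the listed tools does internally: it assumes four-line FASTQ records with Phred+33 quality encoding, and the thresholds `min_len` and `min_mean_q` are arbitrary example values.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Average Phred score of a quality string (ASSUMES Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def quality_filter(records, min_len=8, min_mean_q=20.0):
    """Keep reads that are long enough and have a high enough mean quality."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual


# Toy input: read1 passes, read2 has low quality ('#' = Q2), read3 is too short.
EXAMPLE = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGTAC\n+\n##########\n"
    "@read3\nACGT\n+\nIIII\n"
)
kept = [header for header, _, _ in quality_filter(read_fastq(io.StringIO(EXAMPLE)))]
print(kept)
```

In practice the thresholds depend on the amplicon length and the run quality, so inspect the quality profiles before fixing them.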

Page 23: Microbial Bioinformatics

QIIME2 Workflow

Page 24: Microbial Bioinformatics

From 16S rRNA fastq files to table of microbial abundance and taxonomy

#ASV ID sample1  sample2  sample3  sample4  sample5  sample6  sample7  sample8  sample9  sample10  taxonomy
ASV1    23408    7345     38       1947     1066     82761    2679     1681     1135     1650      Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia/Shigella
ASV2    149      174      21237    2619     2344     58       61       26       2232     60        Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella
ASV3    68       141      0        0        7        0        0        0        28       18        Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Proteus
ASV4    11829    14760    1586     27       26       2084     41       1314     993      103       Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV5    1395     0        551      2895     1010     1259     191      39       176      2003      Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus
ASV6    0        218      0        0        0        0        0        0        104      0         Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella
ASV7    353      39       12       58       12       22       37       0        30       17        Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; Enterococcus
ASV8    0        0        2625     13431    55640    67       13       19       2414     502       Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV9    0        0        0        5537     2332     25       18       20       19       1133      Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Pluralibacter
ASV10   3984     0        128      1538     341      297      94       12       54       1170      Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; Actinotignum
ASV11   74       7268     0        0        0        0        0        0        129      0         Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus
ASV12   56       63       29       23       91       12       38       0        512      648       Firmicutes; Bacilli; Bacillales; Staphylococcaceae; Staphylococcus
ASV13   0        0        0        0        0        0        0        0        0        0         Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Citrobacter
ASV14   0        0        0        17       0        8        7        0        46       0         Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus
ASV15   403      0        133      721      288      278      0        0        0        323       Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus
ASV16   409      0        20       101      0        50       0        0        52       445       Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV17   374      17       0        0        0        28       17       0        114      16        Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Lactococcus
ASV18   0        0        0        0        0        48       0        507      0        0         Bacteroidetes; Bacteroidia; Bacteroidales; Prevotellaceae; Prevotella_7
ASV19   0        0        0        0        0        0        0        0        0        0         Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; Gardnerella
ASV20   50       0        0        22       0        0        69       0        183      26        Firmicutes; Negativicutes; Selenomonadales; Veillonellaceae; Veillonella
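A count table like this is usually converted to relative abundances before samples are compared, and summarized with an alpha-diversity index. The sketch below does both for the sample1 column, using only the 20 ASVs shown on the slide (a real table would have many more rows). The Shannon index is one common choice, picked here as an assumption; the function names are illustrative.

```python
import math

# sample1 column of the ASV table above (only the 20 ASVs shown on the slide).
SAMPLE1_COUNTS = {
    "ASV1": 23408, "ASV2": 149, "ASV3": 68, "ASV4": 11829, "ASV5": 1395,
    "ASV6": 0, "ASV7": 353, "ASV8": 0, "ASV9": 0, "ASV10": 3984,
    "ASV11": 74, "ASV12": 56, "ASV13": 0, "ASV14": 0, "ASV15": 403,
    "ASV16": 409, "ASV17": 374, "ASV18": 0, "ASV19": 0, "ASV20": 50,
}


def relative_abundance(counts):
    """Convert raw read counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {asv: n / total for asv, n in counts.items()}


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over ASVs with non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


rel = relative_abundance(SAMPLE1_COUNTS)
top = max(rel, key=rel.get)
print(f"most abundant: {top} ({rel[top]:.1%})")
print(f"Shannon H: {shannon(SAMPLE1_COUNTS):.3f}")
```

Because sequencing depth differs between samples, raw counts should not be compared directly; proportions or rarefied counts are the usual starting point.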

Page 25: Microbial Bioinformatics

Power Considerations for Experimental Design

Page 26: Microbial Bioinformatics

Experimental Considerations – Sample Storage

Page 27: Microbial Bioinformatics

Experimental Considerations – Extraction Method

Page 28: Microbial Bioinformatics

Bias From Analysis Approaches
● OTUs vs ASVs (operational taxonomic units, amplicon sequence variants)
● Bioinformatics pipeline
● Reference database

● Lots to worry about!

Page 29: Microbial Bioinformatics

Operational Taxonomic Units
● Why no species?
● Same 16S, different genomes
● Same species, different 16S
● OTUs are clusters of sequences that are within a small x% genetic distance from one another (typically 3%)
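The 3% distance rule can be illustrated with a toy greedy clustering. This is a simplified sketch under strong assumptions: sequences are already aligned and of equal length, identity is a plain per-position match fraction, and each sequence joins the first centroid it matches at 97% or better. Production tools use more careful algorithms, so treat `greedy_otus` as a hypothetical teaching helper.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid is >= threshold identical."""
    centroids, otus = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(seq)
                break
        else:
            # No centroid is close enough: this sequence founds a new OTU.
            centroids.append(seq)
            otus.append([seq])
    return otus


# Toy data: build variants of a 100-nt sequence by substituting bases.
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq, positions):
    chars = list(seq)
    for pos in positions:
        chars[pos] = SWAP[chars[pos]]
    return "".join(chars)


base = "ACGT" * 25
close = mutate(base, [3, 50])          # 2 mismatches: 98% identity
far = mutate(base, range(0, 100, 10))  # 10 mismatches: 90% identity
otus = greedy_otus([base, close, far])
print([len(otu) for otu in otus])
```

Note that the result of greedy clustering depends on input order, which is one reason OTU definitions are not consistent across studies.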

Page 30: Microbial Bioinformatics

mothur
● QC
● Cluster sequences with 97% identity
● Form OTUs
● Classify OTUs
● Taxonomy table output

Page 31: Microbial Bioinformatics

How do you classify reads?
● Align to a reference database
● SILVA is the most popular and has collected data for over 20 years
● >600 million sequences

Page 32: Microbial Bioinformatics

DADA2 Pipeline - ASVs

● More taxonomic Resolution

● ASVs are consistent

Callahan et al. Nature Methods 2016

Page 33: Microbial Bioinformatics

DADA2 will model sequencing error!
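The raw material for any error model is the per-base quality score in the fastq file. The sketch below only shows the standard Phred conversion, p = 10^(-Q/10), and the "expected errors" of a read (the sum of those probabilities), which is the quantity behind the `maxEE` filter in DADA2's `filterAndTrim`. It is not DADA2's learned error model, and it assumes Phred+33 encoding.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities for a quality string (ASSUMES Phred+33)."""
    return sum(phred_to_error_prob(ord(ch) - offset) for ch in qual)


# 'I' encodes Q40 (1 error in 10,000 calls); '#' encodes Q2 (about 63% error).
good_read = "I" * 10
poor_read = "#" * 10
print(expected_errors(good_read))
print(expected_errors(poor_read))
```

A threshold on expected errors penalizes long reads with many mediocre bases, which a simple mean-quality cutoff can miss.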

Page 34: Microbial Bioinformatics

Resolution and Accuracy

Abundance predictions in DADA2 (ASV) are more accurate than with mothur (OTUs)

Page 35: Microbial Bioinformatics

Summary
● 16S data are informative for a diversity of questions in microbiome research
● They have an extreme cost advantage for analyzing large numbers of samples
● One needs to take care in sample collection, storage, DNA extraction, PCR, data analyses, and reference databases to obtain accurate and replicable results
● There are a wide variety of tools available for QC and taxonomic assignment of 16S data. Then one needs to move to R for further statistical analyses.

Page 36: Microbial Bioinformatics

Tutorials!!

● QIIME2

● mothur

● DADA2