microbial bioinformatics

Post on 21-Dec-2021

Microbial Bioinformatics

Keith A. Crandall, PhD, FAAS, FLS

Director, Computational Biology Institute

Director, GW Genomics Core

Co-Director, Informatics, Clinical and Translational Science Institute CN

Co-Director, Institute for Biomedical Sciences Genomics and Bioinformatics Program

Professor, Department of Biostatistics and Bioinformatics, GWSPH

Professor, Department of Biological Sciences, CCAS

Research Associate, Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution

16S rRNA Sequencing Timeline

Microbial NGS Amplicon (targeted) sequencing

• Gold standard for bacteria and archaea (16S rRNA): variable (loops) and conserved (stems) regions

• Fungi (ITS)

• Protozoa (18S rRNA)

Microbial NGS Amplicon (targeted) sequencing

16S rRNA

Microbial NGS

From microbial taxonomic profiles to biological questions

Figure panels: Phyla, Genera (samples S1, S2, S3)

Phyla            Sample1  Sample2  Sample3
Actinobacteria   18.8     7.9      8.9
Firmicutes       44.8     21.4     38.3
Fusobacteria     3.4      2.2      4.8
Proteobacteria   28.2     67.1     44.1
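One way to move from a taxonomic profile to a question is to ask which phylum dominates each sample and how much of the community is not covered by the listed phyla. The sketch below assumes the values in the table are percentages of all reads in each sample (the slide does not state units); the function names are illustrative only.

```python
# Phylum-level values copied from the table; assumed to be percentages.
profiles = {
    "Sample1": {"Actinobacteria": 18.8, "Firmicutes": 44.8,
                "Fusobacteria": 3.4, "Proteobacteria": 28.2},
    "Sample2": {"Actinobacteria": 7.9, "Firmicutes": 21.4,
                "Fusobacteria": 2.2, "Proteobacteria": 67.1},
    "Sample3": {"Actinobacteria": 8.9, "Firmicutes": 38.3,
                "Fusobacteria": 4.8, "Proteobacteria": 44.1},
}


def dominant_phylum(profile):
    """Return the phylum with the largest value in one sample."""
    return max(profile, key=profile.get)


def unlisted_percent(profile):
    """Percentage not accounted for by the listed phyla (assumes a 100% total)."""
    return round(100.0 - sum(profile.values()), 1)


summary = {
    sample: (dominant_phylum(profile), unlisted_percent(profile))
    for sample, profile in profiles.items()
}

for sample, (phylum, rest) in summary.items():
    print(f"{sample}: dominant phylum {phylum}, {rest}% in other phyla")
```

Under that assumption, Sample1 is Firmicutes-dominated while Sample2 and Sample3 are Proteobacteria-dominated, which is the kind of contrast a biological question would start from.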

Microbiome Analyses - Metagenomics

16S - Metataxonomy

16S – Advantages vs Disadvantages?

● Advantages

○ Cost

○ Samples

○ Ease of analysis

○ Reference databases

○ PCR based -> lower starting DNA template

● Disadvantages

○ Only a single locus

○ No functional information

○ Often not discriminatory at the species level – or even genus level

○ No strain differentiation

○ No pathogenicity inferences

○ No drug resistance inferences

16S - Cost

Approach

What does an Illumina library need to look like?

Library structure, 5’ to 3’: p5 | Index2 | Rd1 seq primer | 16S amplicon insert | Rd2 seq primer | Index1 | p7

Making amplicon libraries: 2-step PCR edition

[Figure: 16S gene with primers carrying the Rd1 primer overhang and the Rd2 primer overhang]

*DNA is synthesized in the 5’ to 3’ direction

Making amplicon libraries: 2-step PCR edition

[Figure: PCR amplicon with the Rd1 primer overhang and the Rd2 primer overhang, and the resulting product]

*DNA is synthesized in the 5’ to 3’ direction

Making amplicon libraries: 2-step PCR edition

[Figure: PCR amplicon with Rd1 and Rd2 primer overhangs; second-round primers labeled p5 Index2 and Index1 p7]

Final library, 5’ to 3’: p5 | Index2 | Rd1 seq primer | 16S amplicon insert | Rd2 seq primer | Index1 | p7

*DNA is synthesized in the 5’ to 3’ direction

Making amplicon libraries: 1-step PCR edition

[Figure: 16S gene with primers labeled p5, Index2, misc. and Index1, p7, misc.]

Final library, 5’ to 3’: p5 | Index2 | Misc. seqs | 16S amplicon insert | Misc. seqs | Index1 | p7

Misc. seq + gene-specific primer region used as custom sequencing primer

*DNA is synthesized in the 5’ to 3’ direction

One step PCR Primer Structure

SB501 - Forward primer option

AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA

● Adapter - Allows binding to the flow cell

● SB501 - Barcoded Primer - Different for every primer

● Pad - Boost the primer melting temperature

● Link - Anticomplementary to known sequences

● V4f - 16S V4 region forward primer
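The color coding that marked each segment on the slide is lost in this transcript. The sketch below splits the SB501 forward primer into the five named parts using segment lengths that are an assumption (29, 8, 10, 2 and 19 nt), chosen to match the widely used dual-index V4 primer layout; they are not stated in the slides. The helper name split_primer is hypothetical.

```python
PRIMER = "AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA"

# (segment name, length in nt). The lengths are an assumption, not from the slides.
SEGMENT_LENGTHS = [
    ("Adapter", 29),      # binds the flow cell
    ("SB501 index", 8),   # barcode, different for every primer
    ("Pad", 10),          # raises the primer melting temperature
    ("Link", 2),          # anticomplementary to known sequences
    ("V4f", 19),          # 16S V4 forward primer; M is the IUPAC code for A or C
]


def split_primer(primer, segment_lengths):
    """Cut a primer into consecutive named segments of the given lengths."""
    if sum(length for _, length in segment_lengths) != len(primer):
        raise ValueError("segment lengths do not add up to the primer length")
    parts = {}
    position = 0
    for name, length in segment_lengths:
        parts[name] = primer[position:position + length]
        position += length
    return parts


segments = split_primer(PRIMER, SEGMENT_LENGTHS)
for name, sequence in segments.items():
    print(f"{name:12s} {sequence}")
```

Only the index segment changes between barcoded primers, so the same function could be reused to check a whole primer plate, provided the assumed lengths hold for that design.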

How many PCR steps?

One-step PCR

● PROS

○ Fewer steps

○ Less optimization

○ Less possibility for contamination

● CONS

○ Fewer options for optimization

○ Less sensitive

○ Expensive/less stable primers

Two-step PCR

● PROS

○ Well-established

○ Highly sensitive

○ Cheaper primers

● CONS

○ Possibility of amplicon contamination

○ Higher possibility for user error/contamination

○ More steps

○ More optimization

Don’t Trust Your Data

Tools & Databases

● Mothur (mothur.org) – full 16S analysis suite

● QIIME (qiime.org) – full 16S analysis suite

● MG-RAST server (metagenomics.anl.gov) – 16S and WGS

● PathoScope (GitHub) – 16S and WGS

● CloVR (clovr.org) – 16S and WGS

● Animalcules (R Shiny) – downstream hypothesis testing

● DADA2 – 16S analysis suite, etc.

● Ribosomal Database Project (RDP)

● GreenGenes

● SILVA (arb-silva.de)

Basic Analysis Steps

● Remove all those adapters you put on for sequencing!

● Remove unwanted reads and sequencing and PCR error

○ Read length, error score (remember fastq!)

● Assemble paired ends to make a contig

● Map contigs against a reference library

● Call taxa

● Characterize Diversity (alpha
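The "read length, error score" filter in the steps above can be sketched in a few lines. This is a minimal illustration, assuming four-line FASTQ records with Phred+33 quality encoding; the thresholds (min_length=10, min_mean_q=20) and the toy reads are made up for the example and are not recommendations from the slides. Production pipelines such as QIIME2, mothur or DADA2 use their own, more careful filters.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoded qualities."""
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def parse_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for start in range(0, len(lines), 4):
        header, sequence, _plus, quality = lines[start:start + 4]
        yield header[1:], sequence, quality


def filter_reads(records, min_length=10, min_mean_q=20):
    """Keep reads that are long enough and have a high enough mean quality."""
    return [
        (name, sequence)
        for name, sequence, quality in records
        if len(sequence) >= min_length and mean_phred(quality) >= min_mean_q
    ]


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
TOY_FASTQ = """@read_good
ACGTACGTACGT
+
IIIIIIIIIIII
@read_short
ACGT
+
IIII
@read_lowq
ACGTACGTACGT
+
############
"""

kept = filter_reads(parse_fastq(TOY_FASTQ))
print(kept)
```

Only read_good survives here: read_short fails the length cutoff and read_lowq fails the quality cutoff.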

QIIME2 Workflow

From 16S rRNA fastq files to table of microbial abundance and taxonomy

#ASV ID sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 taxonomy
ASV1 23408 7345 38 1947 1066 82761 2679 1681 1135 1650 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia/Shigella
ASV2 149 174 21237 2619 2344 58 61 26 2232 60 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella
ASV3 68 141 0 0 7 0 0 0 28 18 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Proteus
ASV4 11829 14760 1586 27 26 2084 41 1314 993 103 Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV5 1395 0 551 2895 1010 1259 191 39 176 2003 Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus
ASV6 0 218 0 0 0 0 0 0 104 0 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella
ASV7 353 39 12 58 12 22 37 0 30 17 Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; Enterococcus
ASV8 0 0 2625 13431 55640 67 13 19 2414 502 Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV9 0 0 0 5537 2332 25 18 20 19 1133 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Pluralibacter
ASV10 3984 0 128 1538 341 297 94 12 54 1170 Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; Actinotignum
ASV11 74 7268 0 0 0 0 0 0 129 0 Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus
ASV12 56 63 29 23 91 12 38 0 512 648 Firmicutes; Bacilli; Bacillales; Staphylococcaceae; Staphylococcus
ASV13 0 0 0 0 0 0 0 0 0 0 Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Citrobacter
ASV14 0 0 0 17 0 8 7 0 46 0 Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus
ASV15 403 0 133 721 288 278 0 0 0 323 Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus
ASV16 409 0 20 101 0 50 0 0 52 445 Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
ASV17 374 17 0 0 0 28 17 0 114 16 Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Lactococcus
ASV18 0 0 0 0 0 48 0 507 0 0 Bacteroidetes; Bacteroidia; Bacteroidales; Prevotellaceae; Prevotella_7
ASV19 0 0 0 0 0 0 0 0 0 0 Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; Gardnerella
ASV20 50 0 0 22 0 0 69 0 183 26 Firmicutes; Negativicutes; Selenomonadales; Veillonellaceae; Veillonella
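Several ASVs in this table share a genus (for example ASV2 and ASV6 are both Klebsiella), so a common next step is to collapse counts by genus and summarize each sample. The sketch below uses only five rows and the first three samples from the table, so the resulting numbers illustrate the calculation and are not the real diversity of those samples. The Shannon index is used here as one example of an alpha-diversity measure; the function names are illustrative.

```python
import math
from collections import defaultdict

# (ASV id, counts for sample1..sample3, genus): a subset copied from the table.
ROWS = [
    ("ASV1", [23408, 7345, 38], "Escherichia/Shigella"),
    ("ASV2", [149, 174, 21237], "Klebsiella"),
    ("ASV4", [11829, 14760, 1586], "Streptococcus"),
    ("ASV6", [0, 218, 0], "Klebsiella"),
    ("ASV8", [0, 0, 2625], "Streptococcus"),
]


def collapse_by_genus(rows):
    """Sum ASV counts that share a genus label, sample by sample."""
    n_samples = len(rows[0][1])
    totals = defaultdict(lambda: [0] * n_samples)
    for _asv, counts, genus in rows:
        for index, count in enumerate(counts):
            totals[genus][index] += count
    return dict(totals)


def shannon(counts):
    """Shannon diversity (natural log) of one sample's counts."""
    total = sum(counts)
    proportions = [count / total for count in counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


genus_counts = collapse_by_genus(ROWS)

# Transpose so that each tuple holds one sample's genus-level counts.
per_sample = list(zip(*genus_counts.values()))
for number, counts in enumerate(per_sample, start=1):
    print(f"sample{number}: Shannon = {shannon(counts):.3f}")
```

For real analyses the full table would be used, usually after rarefaction or another normalization step, which this sketch leaves out.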

Power Considerations for Experimental Design

Experimental Considerations – Sample Storage

Experimental Considerations – Extraction Method

Bias From Analysis Approaches

● OTUs vs ASVs (operational taxonomic units, amplicon sequence variants)

● Bioinformatics pipeline

● Reference database

● Lots to worry about!

Operational Taxonomic Units

● Why no species?

● Same 16S, different genomes

● Same species, different 16S

● OTUs are clusters of sequences that are within a small x% genetic distance from one another (typically 3%)
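The 3% rule can be illustrated with a toy greedy clustering. This is a simplified sketch, not the algorithm used by mothur or any other tool: it assumes equal-length, pre-aligned sequences, measures distance as the fraction of mismatched positions, and only guarantees that each member is within 3% of its cluster's first sequence (the centroid), not of every other member. The helper mutate and the toy sequences are made up for the example.

```python
def mismatch_fraction(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, max_distance=0.03):
    """Assign each sequence to the first centroid within max_distance."""
    centroids = []
    labels = []
    for sequence in sequences:
        for index, centroid in enumerate(centroids):
            if mismatch_fraction(sequence, centroid) <= max_distance:
                labels.append(index)
                break
        else:
            # No centroid was close enough, so this sequence founds a new OTU.
            centroids.append(sequence)
            labels.append(len(centroids) - 1)
    return labels


def mutate(sequence, positions):
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for position in positions:
        bases[position] = swap[bases[position]]
    return "".join(bases)


reference = "ACGT" * 25                         # 100 nt toy sequence
near = mutate(reference, [10, 50])              # 2% different: same OTU
far = mutate(reference, range(0, 100, 10))      # 10% different: new OTU

otu_labels = greedy_otus([reference, near, far, reference])
print(otu_labels)
```

The order of the input matters in a greedy scheme like this, which is one reason OTU assignments can differ between pipelines.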

mothur

● QC

● Cluster sequences with 97% identity

● Form OTUs

● Classify OTUs

● Taxonomy table output

How do you classify reads?

● Align to a reference database

● SILVA is the most popular and has collected data for over 20 years

● >600 million sequences

DADA2 Pipeline - ASVs

● More taxonomic Resolution

● ASVs are consistent

Callahan et al. Nature Methods 2016

DADA2 will model sequencing error!

Resolution and Accuracy

Abundance predictions in DADA2 (ASV) are more accurate than with mothur (OTUs)

Summary

● 16S data are informative for a diversity of questions in microbiome research

● They have an extreme cost advantage for analyzing large numbers of samples

● One needs to take care in sample collection, storage, DNA extraction, PCR, data analyses, and reference databases to obtain accurate and replicable results

● There is a wide variety of tools available for QC and taxonomic assignment of 16S data. Then one needs to move to R for further statistical analyses.

Tutorials!!

● QIIME2

● mothur

● DADA2
