Page 1: Bioinformatics tools for the diagnostic laboratory -  T.Seemann - Antimicrobials 2016 - Melb, AU - sat 27 feb 2016

Bioinformatic tools for the diagnostic laboratory

A/Prof Torsten Seemann

Victorian Life Sciences Computation Initiative (VLSCI)
Microbiological Diagnostic Unit Public Health Laboratory (MDU PHL)

Doherty Applied Microbial Genomics (DAMG)
The University of Melbourne

ASA 2016 - Melbourne, AU - Sat 27 Feb 2016

Doherty Applied Microbial Genomics

Lead bioinformatician ♥ microbial genomics

Whole genome sequencing

The currency of genomics

Reads

Reads are stored in FASTQ files

Genome
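
Since reads arrive as FASTQ, here is a minimal sketch of reading them. It assumes the common unwrapped four-line layout (`@name`, sequence, `+`, quality) and Phred+33 quality encoding. The function names are illustrative only; in practice an established parser (for example Biopython's `SeqIO` with the `"fastq"` format) is the safer choice.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at {header.strip()!r}")
        yield FastqRecord(header[1:].strip(), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]


example = "@read1\nACGT\n+\nII#5\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, phred_scores(record.quality))
# read1 ACGT [40, 40, 2, 20]
```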

Types of sequence reads

100 - 300 bp (paired)

100 - 400 bp

5,000 - 15,000+ bp

5,000 - 50,000+ bp

What data do we really have?

Isolate genome
Sequenced reads

Other isolates in sequencing run

Contamination
Sequencing adaptors
Spike-in controls, e.g. phiX

Unsequenced regions

Do we have enough data?

∷ Depth: expressed as fold-coverage of the genome, e.g. 25x means each base was sequenced 25 times (on average)

∷ Coverage: the % of the genome sequenced with depth > 0

25x
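
Both definitions above can be computed directly. This is a minimal sketch that assumes a per-base depth profile is already available (for example from `samtools depth -a`, which also reports zero-depth positions). The function names and the numbers in the usage lines are illustrative only, and the read-count estimate ignores duplicates and unmapped reads.

```python
def expected_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Back-of-envelope depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def depth_and_coverage(per_base_depth: list[int]) -> tuple[float, float]:
    """Return (mean depth, % of genome positions with depth > 0)."""
    if not per_base_depth:
        raise ValueError("empty depth profile")
    length = len(per_base_depth)
    mean_depth = sum(per_base_depth) / length
    covered = sum(1 for d in per_base_depth if d > 0)
    return mean_depth, 100.0 * covered / length


print(expected_depth(500_000, 250, 5_000_000))  # 25.0, i.e. "25x"
print(depth_and_coverage([0, 10, 30, 40, 20, 0, 25, 35, 40, 50]))  # (25.0, 80.0)
```

The second example shows why both numbers matter: a respectable 25x mean depth can still leave a fifth of the genome unsequenced.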

Genome data itself is of limited value.

Needs “extra” information

□ location: Australia 37.8S,145.0E
□ date: 2015, 2015-07-20
□ source: human, 60-year-old male, faecal swab
□ etc.

Metadata
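
Metadata like this is far more useful when it is captured in a structured, machine-checkable form. This is a sketch that assumes the coordinate style shown above (`37.8S,145.0E`) and ISO 8601 dates at year, month or day precision. The record layout and function names are hypothetical and do not follow any particular standard schema.

```python
import re
from dataclasses import dataclass


@dataclass
class IsolateMetadata:  # hypothetical layout, not a standard schema
    latitude: float   # decimal degrees, negative = south
    longitude: float  # decimal degrees, negative = west
    date: str         # ISO 8601 at year, month or day precision
    source: str


COORDS = re.compile(r"^\s*(\d+(?:\.\d+)?)([NS])\s*,\s*(\d+(?:\.\d+)?)([EW])\s*$")
DATE = re.compile(r"^\d{4}(?:-\d{2}(?:-\d{2})?)?$")


def parse_lat_lon(text: str) -> tuple[float, float]:
    """Parse a '37.8S,145.0E' style string into signed decimal degrees."""
    m = COORDS.match(text)
    if m is None:
        raise ValueError(f"unrecognised coordinates: {text!r}")
    lat = float(m.group(1)) * (-1 if m.group(2) == "S" else 1)
    lon = float(m.group(3)) * (-1 if m.group(4) == "W" else 1)
    return lat, lon


def make_record(coords: str, date: str, source: str) -> IsolateMetadata:
    """Validate the date format and coordinates, then build a record."""
    if DATE.match(date) is None:
        raise ValueError(f"date is not YYYY[-MM[-DD]]: {date!r}")
    lat, lon = parse_lat_lon(coords)
    return IsolateMetadata(lat, lon, date, source)


print(make_record("37.8S,145.0E", "2015-07-20", "human faecal swab"))
```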

Got my reads, now what?

Two options

∷ De novo genome assembly: reconstruct the original sequence from the reads alone, like a giant jigsaw puzzle (“create”)

∷ Align to reference: identify where each read fits on a related genome; reads cannot always be placed uniquely (“compare”)

De novo genome assembly

Amplified DNA

Shear DNA

Sequenced reads

Overlaps

Layout

Consensus ↠ “Contigs”
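
The overlap, layout and consensus steps can be illustrated with a toy greedy assembler. This is a sketch only. It assumes error-free reads from a single strand and no repeats longer than the reads. Production short-read assemblers such as SPAdes or Velvet use de Bruijn graphs rather than all-against-all read overlaps.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy overlap-layout-consensus: repeatedly merge the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads wholly contained in another read: they add no layout information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]  # consensus of the pair
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


reads = ["ATGCGTA", "CGTACCT", "ACCTTAG", "TTAGGCA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTTAGGCA']
```

Greedy merging is easy to follow but can join the wrong reads when the genome contains repeats, which is exactly the problem shown next.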

The effect of read length

250 bp - Illumina - $200
8000 bp - PacBio - $2000

The problem with repeats

Repeat copy 1
Repeat copy 2

Collapsed repeat consensus

1 locus

4 contigs

Align to reference

Seven short 4 bp reads: AGTC TTAC GGGA CTTT TAGG TTTA ATAG

Aligned to 31 bp reference: AGTCTTTATTATAGGGAGCCATAGCTTTACA

AGTC TAGG ATAG TTAC
TTTA GGGA CTTT

Eight short 4 bp reads: AGTC TTAC GGGA CTTT TAGG TTTA ATAG TTAT

Aligned to 31 bp reference: AGTCTTTATTATAGGGAGCCATAGCTTTACA

AGTC TAGG ATAG TTAC
TTTA GGGA CTTT TTAT TTAT

Ambiguous alignment

D’oh!
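
The ambiguity can be reproduced by scanning the reference for every exact occurrence of each read. This is a sketch that assumes exact matching on one strand. Real aligners tolerate mismatches, search both strands, and use indexes rather than a linear scan. The function name is illustrative.

```python
def place_reads(reference: str, reads: list[str]) -> dict[str, list[int]]:
    """Return every 0-based offset at which each read matches the reference exactly."""
    placements: dict[str, list[int]] = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            start = reference.find(read, start + 1)  # keep scanning for further copies
        placements[read] = hits
    return placements


reference = "AGTCTTTATTATAGGGAGCCATAGCTTTACA"  # the 31 bp reference above
reads = ["AGTC", "TTAC", "GGGA", "CTTT", "TAGG", "TTTA", "ATAG", "TTAT"]
for read, hits in place_reads(reference, reads).items():
    status = "unmapped" if not hits else "unique" if len(hits) == 1 else "ambiguous"
    print(read, hits, status)
```

`TTAT` matches at offsets 5 and 8, which is the ambiguity illustrated above. An exhaustive scan of this toy reference also finds two exact hits each for `CTTT`, `TTTA` and `ATAG`, so these reads cannot be uniquely placed either. Real aligners typically flag such multi-mapping reads with a low mapping quality (MAPQ).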

Best practice

■ Use both approaches
□ reference-based + de novo

■ Best of both worlds
□ and worst of both worlds: interpretation is non-trivial

■ Still need
□ good epidemiology, metadata and domain knowledge!

The one true assay?

Applications of WGS

∷ Diagnostics: species ⇒ subspecies ⇒ strain identification; in silico antibiogram and virulence profile

∷ Surveillance: in silico genotyping (MLST, serotyping, VNTR, MLVA); what’s lurking in our hospital/community?

∷ Forensics: outbreak detection; source tracking

Isolate identification

∷ Can be done in seconds

∷ Directly from reads (or a subset)

∷ Scan against an index of unique k-mers (oligos)

∷ Species-level accurate (on average)

∷ Great for quality control!

Kraken, MetaPhlAn, One Codex
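
The "index of unique k-mers" idea can be sketched in a few lines. This is a simplification. It keeps only k-mers that occur in exactly one species and classifies a read by majority vote. Classifiers such as Kraken instead resolve shared k-mers to a lowest common ancestor in a taxonomy and use much longer k-mers. The genomes, species names and `k=4` here are made up for illustration.

```python
from collections import Counter
from typing import Optional


def build_index(genomes: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer to a species, discarding k-mers seen in more than one species."""
    owner: dict[str, str] = {}
    shared: set[str] = set()
    for species, genome in genomes.items():
        for i in range(len(genome) - k + 1):
            kmer = genome[i:i + k]
            if kmer in owner and owner[kmer] != species:
                shared.add(kmer)  # not diagnostic: occurs in two species
            else:
                owner[kmer] = species
    return {kmer: sp for kmer, sp in owner.items() if kmer not in shared}


def classify(read: str, index: dict[str, str], k: int) -> Optional[str]:
    """Assign a read to the species whose unique k-mers it hits most often."""
    votes = Counter(
        index[read[i:i + k]] for i in range(len(read) - k + 1) if read[i:i + k] in index
    )
    return votes.most_common(1)[0][0] if votes else None


# Toy "genomes"; a real index is built from a whole reference database.
index = build_index({"species_A": "ACGTACGGTCA", "species_B": "TTGACCGTAAG"}, k=4)
print(classify("GTACGGT", index, k=4))  # species_A
print(classify("GACCGTA", index, k=4))  # species_B
```

Because each read is classified independently, the proportion of reads assigned to an unexpected species is a quick contamination check, which is why this approach suits quality control.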

One Codex example metagenome output

Antibiogram

∷ The “resistome”

∷ Resistance-specific genes: we have good databases of these; easy to identify to the exact allele, e.g. blaNDM-9

∷ New alleles conferring resistance: databases are poor (exceptions include M. tb); novel mechanisms arrive de novo

ResFinder, CARD, ARG-Annot

SRST2, ABRicate

ABRicate example: E. faecium output

START    END      GENE         COVERAGE     COVERAGE_MAP     GAPS  %COVERAGE  %IDENTITY
7140     7902     erm(B)       1-762/762    ========/======  1     100.00     99.08
8627     9421     aph(3')-III  1-795/795    ===============  0     100.00     100.00
11040    11948    ant(6)-Ia    1-345/909    =====..........  0     35.00      100.00
15456    16257    lnu(B)       1-804/804    ========/======  2     99.75      99.63
573128   575046   tet(M)       1-1920/1920  ========/======  1     99.95      99.95
770130   770792   VanR-B       1-663/663    ===============  0     100.00     99.25
770792   772135   VanS-B       1-1344/1344  ===============  0     100.00     99.63
772306   773112   VanY-B       1-807/807    ===============  0     100.00     100.00
773130   773957   VanW-B       1-828/828    ===============  0     100.00     97.58
773954   774925   VanH-B       1-972/972    ===============  0     100.00     99.38
774918   775946   VanA-B       1-1029/1029  ===============  0     100.00     98.93
775952   776560   VanX-B       1-609/609    ===============  0     100.00     96.72
2352083  2352631  aac(6')-Ii   1-549/549    ===============  0     100.00     99.64
2789984  2791462  msr(C)       1-1479/1479  ===============  0     100.00     98.92
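
Output like the table above usually needs triage, because a partial hit such as ant(6)-Ia (35% coverage) may not mean the gene is intact. This is a sketch of such a filter. The column names are taken from the table. Actual ABRicate output is tab-separated and other versions include additional columns, so the code looks columns up by header name rather than by position. The 90% cut-offs are arbitrary placeholders, not ABRicate defaults.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    gaps: int
    pct_coverage: float
    pct_identity: float


def parse_hits(text: str) -> list[Hit]:
    """Parse whitespace-separated rows, locating columns by their header names."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    col = {name: i for i, name in enumerate(lines[0].split())}
    hits = []
    for line in lines[1:]:
        f = line.split()
        hits.append(Hit(f[col["GENE"]], int(f[col["GAPS"]]),
                        float(f[col["%COVERAGE"]]), float(f[col["%IDENTITY"]])))
    return hits


def flag_incomplete(hits: list[Hit], min_cov: float = 90.0, min_id: float = 90.0) -> list[str]:
    """Genes whose hit falls below either (hypothetical) cut-off and needs a closer look."""
    return [h.gene for h in hits if h.pct_coverage < min_cov or h.pct_identity < min_id]


table = """
START END GENE COVERAGE COVERAGE_MAP GAPS %COVERAGE %IDENTITY
7140 7902 erm(B) 1-762/762 ========/====== 1 100.00 99.08
11040 11948 ant(6)-Ia 1-345/909 =====.......... 0 35.00 100.00
775952 776560 VanX-B 1-609/609 =============== 0 100.00 96.72
"""
print(flag_incomplete(parse_hits(table)))  # ['ant(6)-Ia']
```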

Virulence profile

∷ The “virulome”

∷ Curated databases: known virulence genes; pathogenicity islands

∷ Caveats: variable representation across organisms

VirulenceFinder, VFDB, MvirDB, ViPR, PAI DB

Backward compatibility

MLST, Resistome, Virulome, NG-MAST

MLVA, VNTR

Serotyping, Phage typing

PFGE

SRST2, mlst, ngmaster, lissero, and many more!
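
In silico MLST reduces to two lookups: find which known allele of each locus is present, then look the allele profile up as a sequence type (ST). Tools such as mlst and SRST2 do this against PubMLST schemes and handle inexact matches and reverse complements. This sketch uses exact substring search only, and the three-locus scheme, alleles and ST numbers are entirely made up.

```python
from typing import Optional


def call_st(assembly: str,
            alleles: dict[str, dict[str, str]],
            profiles: dict[tuple[str, ...], int]) -> tuple[dict[str, str], Optional[int]]:
    """Exact-match each locus's known alleles, then look the allele profile up as an ST."""
    calls = {}
    for locus, known in alleles.items():  # dict order = scheme locus order
        found = [number for number, seq in known.items() if seq in assembly]
        if len(found) == 1:
            calls[locus] = found[0]
        else:
            calls[locus] = "-" if not found else "?"  # missing/novel vs ambiguous
    profile = tuple(calls[locus] for locus in alleles)
    return calls, profiles.get(profile)  # None if the profile is not in the scheme


# Hypothetical three-locus scheme for illustration only.
alleles = {
    "locusA": {"1": "ACGTAC", "2": "ACGTTC"},
    "locusB": {"1": "GGATCC", "2": "GGTTCC"},
    "locusC": {"1": "TTGCAA", "2": "TTGGAA"},
}
profiles = {("1", "2", "1"): 101, ("2", "2", "2"): 202}
assembly = "NNNACGTACNNNGGTTCCNNNTTGCAANNN"
print(call_st(assembly, alleles, profiles))
# ({'locusA': '1', 'locusB': '2', 'locusC': '1'}, 101)
```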

When typing lets us down

Typing resolution

Focus on a small “informative” section

Genotype shows isolates are related

D’oh!

Exploiting the whole genome

A familiar tree

Every SNP is sacred

∷ Chocolate bar tree: branches were based on phenotypic attributes (size, colour, filling, texture, ingredients, flavour)

∷ Genomic trees: want to use every part of the genome sequence; need to find all differences between isolates

Finding differences

Reference
AGTCTGATTAGCTTAGCTTGTAGCGCTATATTAT

Reads
AGTCTGATTAGCTTAGAT
      ATTAGCTTAGATTGTAG
           CTTAGATTGTAGC-C
    TGATTAGCTTAGATTGTAGC-CTATAT
        TAGCTTAGATTGTAGC-CTATATT
             TAGATTGTAGC-CTATATTA
             TAGATTGTAGC-CTATATTAT

SNP (reference C, reads A) and deletion (reference G missing from reads)

Snippy, VarScan, SAMtools, GATK, and many more!
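
Once reads are aligned, the core idea of finding a SNP or a deletion is a per-column vote against the reference. This is a sketch using the reads shown above. Their start offsets were worked out by matching each read against the reference, and `-` marks a deleted base. The depth and fraction thresholds are arbitrary. Real callers such as those listed above also weigh base quality, mapping quality and other evidence.

```python
from collections import Counter


def call_variants(reference: str, pileup: list[tuple[int, str]],
                  min_depth: int = 2, min_fraction: float = 0.8) -> list[tuple]:
    """Majority-vote caller over pre-aligned reads; '-' in a read marks a deleted base."""
    columns = [Counter() for _ in reference]
    for offset, read in pileup:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    calls = []
    for pos, (ref_base, counts) in enumerate(zip(reference, columns)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, n = counts.most_common(1)[0]
        if base != ref_base and n / depth >= min_fraction:
            kind = "DEL" if base == "-" else "SNP"
            calls.append((pos + 1, kind, ref_base, base, n, depth))  # 1-based position
    return calls


reference = "AGTCTGATTAGCTTAGCTTGTAGCGCTATATTAT"
pileup = [  # (0-based start on the reference, aligned read)
    (0, "AGTCTGATTAGCTTAGAT"),
    (6, "ATTAGCTTAGATTGTAG"),
    (11, "CTTAGATTGTAGC-C"),
    (4, "TGATTAGCTTAGATTGTAGC-CTATAT"),
    (8, "TAGCTTAGATTGTAGC-CTATATT"),
    (13, "TAGATTGTAGC-CTATATTA"),
    (13, "TAGATTGTAGC-CTATATTAT"),
]
for call in call_variants(reference, pileup):
    print(call)
# (17, 'SNP', 'C', 'A', 7, 7)
# (25, 'DEL', 'G', '-', 5, 5)
```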

SNP distance matrix
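
A SNP distance matrix records, for every pair of isolates, the number of aligned positions at which they differ. This is a sketch that assumes a core-genome alignment already exists (for example from a reference-based pipeline). Positions where either isolate has a gap or `N` are skipped pair by pair, which is one of several reasonable conventions. The isolate names and sequences are invented.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both isolates have a called base and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(1 for x, y in zip(a, b) if x in "ACGT" and y in "ACGT" and x != y)


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distances, keyed by (isolate, isolate)."""
    names = list(alignment)
    dist = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = snp_distance(alignment[p], alignment[q])
    return dist


# Toy core-genome alignment; 'N' marks an uncalled base and is ignored.
alignment = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACTTACNA"}
dist = distance_matrix(alignment)
for p in alignment:
    print(p, *(dist[(p, q)] for q in alignment))
# iso1 0 1 2
# iso2 1 0 1
# iso3 2 1 0
```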

Annotated tree

∷ 1 SNP resolution

∷ Distinguishes clades within genotypes

∷ Interpretation is not straightforward

10 SNPs

L. monocytogenes

Same tree!

Dendrogram

Spanning

Radial
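
If the "Spanning" view is taken to be a minimum spanning tree, it can be built from the same SNP distances. The method connects each isolate to its nearest already-connected neighbour (Prim's algorithm). This is a sketch with invented isolate names and distances. When several edges tie, different tools may draw different but equally minimal trees.

```python
def minimum_spanning_tree(names: list[str],
                          dist: dict[tuple[str, str], int]) -> list[tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of isolates weighted by SNP distance."""
    if not names:
        return []
    in_tree = [names[0]]  # a list (not a set) keeps tie-breaking deterministic
    edges = []
    while len(in_tree) < len(names):
        # cheapest edge from any connected isolate to any unconnected one
        p, q = min(((p, q) for p in in_tree for q in names if q not in in_tree),
                   key=lambda edge: dist[edge])
        edges.append((p, q, dist[(p, q)]))
        in_tree.append(q)
    return edges


pairs = {("A", "B"): 2, ("A", "C"): 7, ("A", "D"): 9,
         ("B", "C"): 3, ("B", "D"): 8, ("C", "D"): 4}
dist = {**pairs, **{(q, p): d for (p, q), d in pairs.items()}}  # make symmetric
print(minimum_spanning_tree(["A", "B", "C", "D"], dist))
# [('A', 'B', 2), ('B', 'C', 3), ('C', 'D', 4)]
```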

Reference based analysis

∷ Implies you have a “close” reference: need to be careful with draft genomes

∷ Very sensitive: single mutation precision

∷ May not be complete: ignores novel DNA in your isolate

Inferring transmission

∷ Identical sequence does not imply transmission

∷ Easier to rule out than in

The pan genome

Align all your isolate genomes

Find “common” segments

The core genome

The core is common to all genomes and has similar sequence.

Example pan genome

Roary, LS-BSR, OrthoMCL, Degust

Rows are genomes, columns are genes.
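
With the pan genome held as a gene presence/absence matrix (rows are genomes, columns are genes), the core/accessory split comes down to counting genomes per gene. This is a sketch that assumes genes have already been clustered into families by a tool such as Roary. `core_fraction=1.0` means strictly all genomes, whereas Roary, for example, reports genes present in at least 99% of isolates as core. The gene and genome names are illustrative only.

```python
def split_pan_genome(presence: dict[str, set[str]],
                     core_fraction: float = 1.0) -> tuple[set[str], set[str]]:
    """Classify gene clusters as core (in >= core_fraction of genomes) or accessory."""
    n_genomes = len(presence)
    all_genes = set().union(*presence.values())
    core, accessory = set(), set()
    for gene in all_genes:
        carriers = sum(gene in genes for genes in presence.values())
        (core if carriers >= core_fraction * n_genomes else accessory).add(gene)
    return core, accessory


# One entry per genome (row); set members are the gene clusters (columns) it carries.
presence = {
    "genome1": {"dnaA", "gyrB", "blaZ"},
    "genome2": {"dnaA", "gyrB"},
    "genome3": {"dnaA", "gyrB", "tetM"},
}
core, accessory = split_pan_genome(presence)
print(sorted(core), sorted(accessory))  # ['dnaA', 'gyrB'] ['blaZ', 'tetM']
```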

Core

∷ Common DNA
∷ Vertical evolution
∷ Genotyping
∷ Phylogenetics

Accessory

∷ Novel DNA
∷ Lateral transfer
∷ Plasmids
∷ Mobile elements
∷ Partly unexploited

Progress at MDU-PHL

Traditional workflow

Modern workflow

Nullarbor

∷ Software pipeline: goes from “reads to report”; cloud image available (mGVL)

∷ Under active development: used at MDU-PHL for the past year for routine jobs; also used by USA CDC Enterics, FSS Qld, and for research

∷ National access programme underway

null arbor = “no trees”

Doherty Applied Microbial Genomics

■ Non-profit service available
□ fixed price per isolate

■ Genome sequencing
□ Illumina NextSeq 500

■ Bioinformatics analysis
□ Nullarbor

■ Report
□ QC, typing, resistome, phylogeny
□ plus your raw data

Sharing is caring

Open science

∷ Crowd-sourcing demonstrably works: EHEC outbreak 2011; Ebola, MERS, Zika

∷ But only if people share: sequencing data, metadata, and software source code for analysis

GenomeTrakr

∷ International cooperation: led by FDA + NCBI; >20 collaborating institutes incl. UK PHE, DK DTU, MX; Salmonella and Listeria

∷ Public SRA BioProject #183844: real-time submission of WGS reads; nightly updates of phylogenomic trees; contains ~25,000 Salmonella strains

“GenomeTrakka”

∷ A shared online system for all Australian labs: upload samples; automated standard/specific analyses; simple reports and visualization; easy submission to international archives (SRA)

∷ Access control: each lab controls its own data; jurisdictions can share data in national outbreaks

Final thoughts

Does WGS deliver?

Yes!

Bioinformatics, Epidemiology, Technology, Microbiology

This means scientists, not just software

Domain expertise

Always changing...

Acknowledgements

Ben Howden
Tim Stinear

Dieter Bulach
Jason Kwong

Anders G da Silva

Contact

tseemann.github.io

@torstenseemann

The End

Thank you for listening.