Making Use of NGS Data: From Reads to Trees and Annotations

João André Carriço, PhD
Microbiology Institute / Institute for Molecular Medicine, Faculty of Medicine, University of Lisbon, Portugal
http://im.fm.ul.pt | http://imm.fm.ul.pt | http://www.joaocarrico.info
WORKSHOP 24: NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL

Uploaded by joao-andre-carrico on 14-Jan-2017




Conflicts of interest

Nothing to disclose


Disclaimer

This presentation is not intended to cover all available software or databases (that would take several weeks or months).

I'll present what I use or intend to use in the near future.

I gladly accept any suggestions of tools to include in similar presentations in the future.

It is meant to be interactive, so ask away during the presentation.


Summary
• What is in the reads (FASTQ files)
• Available databases
  - Virulence factor and AMR databases
  - Sequence-based typing databases: PubMLST.org / EnteroBase
• High-throughput sequencing data analysis (freeware): Prokka, Roary, Nullarbor, Microreact.org, PHYLOViZ
• Commercial solutions: BioNumerics 7.5, CLC Genomics Workbench (CLC bio), Ridom SeqSphere+


What is in the reads (FASTQ files)?

Sequenced reads can come from:
• the isolate genome*
• other isolates in the sequencing run
• contamination

* Chromosome + plasmids + phages

Slide source: Nick Loman
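Each read in a FASTQ file is a four-line record: a header starting with `@`, the base calls, a `+` separator and one quality character per base. The minimal Python sketch below reads such records and decodes the qualities. It assumes uncompressed files without line-wrapped sequences and the Phred+33 (Sanger / Illumina 1.8+) quality encoding. Gzipped files, which are the usual case, could be opened with `gzip.open(path, "rt")` and passed in the same way.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # read ID: first word of the header, without "@"
    seq: str   # base calls
    qual: str  # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (exactly 4 lines per record)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord((header[1:].split() or [""])[0], seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming the Phred+33 (Sanger / Illumina 1.8+) encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


if __name__ == "__main__":
    demo = io.StringIO("@read1 extra\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
    for rec in read_fastq(demo):
        print(rec.name, rec.seq, mean_phred(rec.qual))
```

With Phred+33, `I` decodes to Q40 and `!` to Q0. The demo therefore prints a mean of 40.0 for `read1` and 0.0 for `read2`.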


Databases


Virulence Factor (VF) Databases
• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-Base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)

To know more: presentation in the "Controversies in interpreting whole genome sequence data" session: http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases


Antibiotic Resistance Databases
• Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)
• Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)
• Integrall: the integron database (http://integrall.bio.ua.pt/)
• (…)
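Screening an assembly against CARD or a similar database often comes down to aligning contigs to the reference genes and keeping the hits that pass identity and coverage cut-offs. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) from contigs searched against a resistance-gene FASTA. It also assumes a dictionary of reference gene lengths. The gene names are hypothetical. The 90% identity and 80% coverage thresholds are illustrative assumptions, not values recommended by any of the databases listed.

```python
import csv
import io
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    contig: str
    gene: str
    identity: float  # BLAST pident, in percent
    coverage: float  # percent of the reference gene spanned by the alignment


def filter_hits(blast_lines: Iterable[str], gene_lengths: Dict[str, int],
                min_identity: float = 90.0, min_coverage: float = 80.0) -> List[Hit]:
    """Keep BLAST -outfmt 6 hits that pass identity and gene-coverage cut-offs.

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row:
            continue
        contig, gene, pident = row[0], row[1], float(row[2])
        sstart, send = int(row[8]), int(row[9])
        # Subject coordinates are reversed for minus-strand hits, hence abs().
        # Each hit is judged alone: a gene split across two contigs is not merged.
        span = abs(send - sstart) + 1
        coverage = 100.0 * span / gene_lengths[gene]
        if pident >= min_identity and coverage >= min_coverage:
            kept.append(Hit(contig, gene, pident, coverage))
    return kept


if __name__ == "__main__":
    # Hypothetical reference genes and lengths, for illustration only.
    lengths = {"geneX": 1000, "geneY": 800}
    tab = ("contig1\tgeneX\t99.5\t1000\t5\t0\t1\t1000\t1\t1000\t0.0\t1800\n"
           "contig2\tgeneY\t95.0\t300\t15\t0\t10\t309\t300\t1\t1e-50\t500\n")
    for hit in filter_hits(io.StringIO(tab), lengths):
        print(hit)
```

In the demo only `contig1` is reported. The `geneY` hit covers 300 of 800 bases (37.5%), which falls below the coverage cut-off.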


Sequence-Based Typing: PubMLST / BIGSdb

http://www.pubmlst.org

http://bigsdb.web.pasteur.fr/
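In MLST-style typing, each locus is given an allele number. The resulting allelic profile is then looked up to obtain a sequence type (ST). The same idea extends to the cgMLST/wgMLST schemes hosted in PubMLST/BIGSdb. Below is a toy lookup in Python. The three loci, allele numbers and STs are hypothetical and do not belong to any real scheme.

```python
from typing import Dict, Optional, Tuple

# Hypothetical 3-locus scheme; real schemes (e.g. 7-gene MLST, cgMLST) have more loci.
LOCI: Tuple[str, ...] = ("locA", "locB", "locC")

# Allelic profile (allele numbers in LOCI order) -> sequence type. Invented values.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def assign_st(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a complete profile, or None if a locus is missing or the profile is new."""
    if any(locus not in alleles for locus in LOCI):
        return None  # incomplete profile, e.g. a locus broken across contigs
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)  # None here means a candidate novel ST


if __name__ == "__main__":
    print(assign_st({"locA": 1, "locB": 2, "locC": 1}))
    print(assign_st({"locA": 9, "locB": 9, "locC": 9}))
```

A complete profile that is not yet in the table is the kind of result that would be submitted to the scheme curators for a new ST.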


Sequence-Based Typing: EnteroBase

Slide by @happy_khan

Martin Sergeant, Mark Achtman, Nabil-Fareed Alikhan, Zhemin Zhou


Sequenced my strain… now what?

To know more: http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now

Workflow diagram:
• Reads (FASTQ files) → de novo assembler → contigs (FASTA files)
• Contigs → Prokka → annotated contigs (GBK/GFF files)
• Annotated contigs → Roary: pan-genome analysis
• Other tools in the diagram: EnteroBase and BIGSdb; Nullarbor; PHYLOViZ (tree + metadata visualization); Microreact.org (tree + metadata + visualization)
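Between assembly and annotation, a quick look at the contigs helps catch failed or contaminated assemblies. The minimal sketch below computes the number of contigs, the total length and the N50. It assumes plain, uncompressed FASTA, and the example sequences are made up.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word after '>') to its length; handles wrapped lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the shortest contig among the longest contigs holding at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    fasta = [">c1 len=8", "AAAA", "AAAA", ">c2", "CCC", ">c3", "GG"]
    lens = read_fasta_lengths(fasta)
    print(len(lens), sum(lens.values()), n50(list(lens.values())))
```

The demo prints `3 13 8`. That is three contigs with 13 bases in total, and the single 8 bp contig already holds more than half of them.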


Prokka: genome annotation made easy, by Torsten Seemann (slides by Torsten)

Genome annotation: adding biological information to the sequence by describing its features.

To know more: http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013

Available at: https://github.com/tseemann/prokka
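Prokka writes its annotation in several formats. Its GFF3 file, which is the input Roary expects, holds the feature table followed by a `##FASTA` section containing the contig sequences. The minimal sketch below counts annotated features by type and reads the attribute column. The example lines are invented and only mimic the general shape of Prokka output.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value'); percent-escapes are left as-is."""
    return dict(item.split("=", 1) for item in column9.split(";") if "=" in item)


def count_features(gff_lines: Iterable[str]) -> Counter:
    """Count features by type (column 3), stopping at the embedded ##FASTA section."""
    counts: Counter = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # directives and comments
        cols = line.split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    # Invented example lines in the general shape of a Prokka GFF3 file.
    gff = [
        "##gff-version 3",
        "contig1\tProdigal:2.6\tCDS\t1\t300\t.\t+\t0\tID=X_00001;product=hypothetical protein",
        "contig1\tAragorn:1.2\ttRNA\t400\t480\t.\t-\t.\tID=X_00002;product=tRNA-Ala",
        "##FASTA",
        ">contig1",
        "ACGT",
    ]
    print(count_features(gff))
    print(parse_attributes(gff[1].split("\t")[8])["product"])
```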


Roary: pan-genome analysis, by Andrew Page. Available at: https://sangerpathogens.github.io/Roary/

(Diagram: the core genome and the accessory genome together make up the pan-genome.)


Roary

Inputs: annotated de novo assemblies (GFF files)
• Typically from the annotation pipeline

Outputs:
• Spreadsheet with presence and absence of genes
• Multi-FASTA alignment of core genes, so you can build a tree without a reference
• Multi-FASTA alignments for each gene
• Plots for the open/closed genome, unique genes
• Integrates with Phandango so you can visualise all structural variation
• QC report from Kraken to help identify suspect samples

(Slide by Andrew Page)


Roary outputs
• Core (n or n-1 strains)
• Soft-core (n-2 or n-3 strains)
• Shell (8(?) to n-3 strains)
• Cloud (< 8(?) strains)

Core genome: core + soft-core
Accessory genome: shell + cloud
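These categories can be expressed as a function of the share of isolates that carry each gene. As far as we know, Roary's summary statistics use percentage cut-offs (core ≥ 99%, soft-core ≥ 95%, shell ≥ 15%, cloud < 15%), and this sketch takes those as defaults. The slide's absolute counts, which are marked with question marks, are not encoded. The gene-to-isolates dictionary stands in for Roary's presence/absence spreadsheet, and the gene and isolate names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_gene(n_present: int, n_isolates: int,
                  core_pct: int = 99, soft_core_pct: int = 95, shell_pct: int = 15) -> str:
    """Assign a gene to core / soft-core / shell / cloud by the percentage of isolates carrying it."""
    pct_x_n = 100 * n_present  # compare 100*k >= p*n in integers to avoid rounding issues
    if pct_x_n >= core_pct * n_isolates:
        return "core"
    if pct_x_n >= soft_core_pct * n_isolates:
        return "soft-core"
    if pct_x_n >= shell_pct * n_isolates:
        return "shell"
    return "cloud"


def summarise(presence: Dict[str, Set[str]], isolates: Set[str]) -> Counter:
    """Count genes per category from a gene -> {isolates carrying it} mapping."""
    n = len(isolates)
    return Counter(classify_gene(len(carriers & isolates), n)
                   for carriers in presence.values())


if __name__ == "__main__":
    isolates = {"A", "B", "C", "D"}
    presence = {"geneX": {"A", "B", "C", "D"}, "geneY": {"A"}}  # hypothetical genes
    counts = summarise(presence, isolates)
    core_genome = counts["core"] + counts["soft-core"]
    accessory_genome = counts["shell"] + counts["cloud"]
    print(counts, core_genome, accessory_genome)
```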


Roary outputs

iCANDY plot of gene presence and absence in the accessory genome: S. Weltevreden and public S. enterica genomes

(Slide by Andrew Page)


Nullarbor: complete pipeline from reads to reports, by Torsten Seemann

The objective is to automate analysis for everyday use in public health labs and research settings.

It uses a lot of software and distills their outputs.

Available at: https://github.com/tseemann/nullarbor
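Nullarbor runs on a list of isolates and their paired-end read files. The helper below builds a tab-separated sheet of isolate ID, R1 path and R2 path from a list of file names. It assumes the `<isolate>_R1.fastq.gz` / `<isolate>_R2.fastq.gz` naming; Illumina's default `_R1_001` suffix would need a different pattern. The three-column layout is also an assumption, to be checked against the Nullarbor README before use.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Assumed naming: <isolate>_R1.fastq.gz / <isolate>_R2.fastq.gz (also .fq, with or without .gz).
READ_RE = re.compile(r"^(?P<iso>.+)_R(?P<mate>[12])\.f(?:ast)?q(?:\.gz)?$")


def pair_reads(paths: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Group paired-end files into (isolate, R1, R2) rows; isolates lacking a mate are skipped."""
    mates: Dict[str, Dict[str, str]] = {}
    for p in paths:
        m = READ_RE.match(Path(p).name)
        if m:
            mates.setdefault(m.group("iso"), {})[m.group("mate")] = p
    return [(iso, pair["1"], pair["2"])
            for iso, pair in sorted(mates.items())
            if "1" in pair and "2" in pair]


def format_sample_sheet(rows: List[Tuple[str, str, str]]) -> str:
    """Render rows as tab-separated lines: isolate ID, R1 path, R2 path."""
    return "".join(f"{iso}\t{r1}\t{r2}\n" for iso, r1, r2 in rows)


if __name__ == "__main__":
    files = ["/data/S1_R1.fastq.gz", "/data/S1_R2.fastq.gz", "/data/S2_R1.fq", "/data/notes.txt"]
    print(format_sample_sheet(pair_reads(files)), end="")
```

In the demo, `S2` is silently dropped because it has no R2 file. In real use it is worth reporting such isolates rather than skipping them.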


Nullarbor

Slide by Torsten Seemann


Nullarbor

From: https://github.com/tseemann/nullarbor


Some Nullarbor outputs in the report

Slides by Torsten Seemann


PHYLOViZ: www.phyloviz.net


PHYLOViZ

Inputs:
• Tab-separated text files (profiles)
• FASTA files
• Automatic database retrieval (MLST)

Outputs:
• goeBURST and goeBURST MST
• Link quality assessment
• High-quality images

Can be easily applied to:
• MLST / cgMLST / wgMLST
• MLVA
• SNP data*
• Gene presence/absence
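goeBURST builds a minimum spanning tree over allelic profiles. It uses the number of differing loci as the distance, together with its own tie-breaking rules. The sketch below is a plain Kruskal MST over those Hamming distances. It does not implement goeBURST's tie-break rules (equal-distance links are taken in input order), so the result can differ from PHYLOViZ whenever there are ties. The profiles are made up.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci with different allele numbers (profiles must have equal length)."""
    return sum(x != y for x, y in zip(a, b))


def mst(profiles: Dict[str, Sequence[int]]) -> List[Tuple[str, str, int]]:
    """Kruskal's MST over pairwise Hamming distances; ties keep input order (no goeBURST rules)."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(((hamming(profiles[a], profiles[b]), a, b)
                    for a, b in combinations(profiles, 2)),
                   key=lambda e: e[0])  # stable sort: equal distances stay in input order
    tree = []
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # the link joins two different components
            parent[ra] = rb
            tree.append((a, b, dist))
    return tree


if __name__ == "__main__":
    # Made-up 4-locus profiles.
    profiles = {"ST1": (1, 1, 1, 1), "ST2": (1, 1, 1, 2), "ST3": (1, 1, 2, 2), "ST4": (5, 5, 5, 5)}
    for link in mst(profiles):
        print(link)
```

For the demo profiles, the tree links ST1-ST2 and ST2-ST3 at distance 1 and ST1-ST4 at distance 4.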


PHYLOViZ 2.0

New features:
• Hierarchical clustering
• Neighbor-Joining
• Project saving
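Neighbor-Joining repeatedly joins the pair of nodes that minimises the Q-criterion and then recomputes distances to the new node. Below is a textbook (Saitou-Nei) sketch that returns a Newick string. It is not PHYLOViZ's own implementation. It assumes a symmetric matrix with at least two taxa, does no validation, and runs in O(n³) time, so it suits small examples only.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Saitou-Nei neighbor-joining on a symmetric distance matrix; returns Newick."""
    # Dict-of-dicts keyed by node label, so joined nodes can be added easily;
    # a joined node's label is its own Newick subtree string.
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    # Join the last two nodes; the whole remaining distance goes on one branch.
    return f"({a}:{d[a][b]:g},{b}:0);"


if __name__ == "__main__":
    D = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    print(neighbor_joining(list("abcde"), D))
```

For this small five-taxon matrix the demo prints `((c:4,(a:2,b:3):3):2,(d:2,e:1):0);`. Branch lengths are formatted with `:g`, which keeps six significant digits.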


PHYLOViZ Online: available at http://online.phyloviz.net
• Web-based version of PHYLOViZ
• Allows users to create their own datasets, save them and share them (privately or publicly)
• REST API available
• Scalable to thousands of nodes
• Tree analysis tools: interactive distance matrix, NLV graph


PHYLOViZ Online

Slide by @happy_khan


PHYLOViZ Online


PHYLOViZ Online

Screenshots: NLV graph, tree cut-off, full MST
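A tree cut-off splits the MST into groups by dropping links longer than a threshold. The resulting groups are the same as single-linkage clusters at that threshold. This sketch therefore computes them directly from pairwise Hamming distances between allelic profiles, using a union-find structure. It illustrates the idea only and is not PHYLOViZ Online's code. The profiles and thresholds are made up.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def clusters_at(profiles: Dict[str, Sequence[int]], threshold: int) -> List[List[str]]:
    """Single-linkage clusters: profiles joined by any chain of links with distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        dist = sum(x != y for x, y in zip(profiles[a], profiles[b]))  # differing loci
        if dist <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    # Largest clusters first; ties broken by the first member's name.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


if __name__ == "__main__":
    profiles = {"ST1": (1, 1, 1, 1), "ST2": (1, 1, 1, 2), "ST3": (1, 1, 2, 2), "ST4": (5, 5, 5, 5)}
    for t in (0, 1, 4):
        print(t, clusters_at(profiles, t))
```

Raising the threshold merges clusters. At 1, ST1 to ST3 form one group and ST4 stays alone. At 4, everything is a single cluster.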


microreact.org


microreact.org


microreact.org

Screenshot annotations: create selections; change tree options


microreact.org: available at http://microreact.org/

To know more: presentation "Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones", in the session "Harnessing whole genome sequence data for public health applications":

http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones
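Microreact combines a tree with a metadata table whose identifier column must match the tree's leaf labels. Mismatched IDs are a common stumbling block. The sketch below extracts leaf labels from a Newick string with a simple regular expression (quoted labels are not handled) and reports IDs that appear on only one side. The `id` column name is an assumption here; check the Microreact documentation for the columns it expects.

```python
import csv
import io
import re
from typing import Set, Tuple

# A leaf label follows "(" or "," and runs until whitespace, ":", ",", "(", ")" or ";".
# Internal node labels follow ")" and are therefore not captured.
LEAF_RE = re.compile(r"[(,]\s*([^\s:,();]+)")


def leaf_labels(newick: str) -> Set[str]:
    return set(LEAF_RE.findall(newick))


def check_ids(newick: str, metadata_csv: str, id_column: str = "id") -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the metadata, leaves only in the tree)."""
    leaves = leaf_labels(newick)
    ids = {row[id_column] for row in csv.DictReader(io.StringIO(metadata_csv))}
    return ids - leaves, leaves - ids


if __name__ == "__main__":
    tree = "((A:1,B:2):0.5,C:3);"
    meta = "id,latitude,longitude\nA,38.7,-9.1\nB,51.5,-0.1\nD,0,0\n"  # made-up rows
    only_meta, only_tree = check_ids(tree, meta)
    print("metadata without leaf:", only_meta, "leaves without metadata:", only_tree)
```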


Meet the Experts (on Twitter, in order of appearance)


Commercial solutions

• Ridom SeqSphere+: http://www.ridom.de/seqsphere/
• Applied Maths BioNumerics 7.6: http://www.applied-maths.com/bionumerics
• CLC bio Genomics Workbench: http://www.clcbio.com/blog/clc-genomics-workbench-7-5/


Take-home messages

• Huge variety of software and database solutions

• There is no single One-Size-Fits-All solution (job security for bioinformaticians)

• Different questions require different approaches

• Always question the results and data provenance


More references/presentations

ECCMID 2015 Meet-the-Expert session on "What bioinformatic tools should I use for analysis of High Throughput Sequencing data for molecular diagnostics?"

Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools

João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015


Acknowledgments

UMMI members: Bruno Gonçalves, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)

Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC); Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC); Fiona Brinkman (SFU); William Hsiao (BCCDC)