ECCMID 2016 - How to build actionable virulome databases


Upload: joao-andre-carrico

Post on 14-Jan-2017


Page 1

Assessing virulence from genomic data - which virulome database?

João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon; Twitter: @jacarrico

Session SY024: Controversies in interpreting whole genome sequence data, 26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016

Page 2

How can we design actionable virulome databases?


Page 3

What is a virulence factor?

Virulence factors: a class of gene products that
• help pathogens invade the host and evade the host's specific defense mechanisms
• enhance the pathogen's potential to cause disease

Page 4

What is a virulence factor?

Virulence factors (examples):
• Bacterial toxins (endotoxins and exotoxins)
• Adherence factors (pili)
• Cell surface carbohydrates and proteins that protect a bacterium (streptococcal M protein)
• Hydrolytic enzymes that may contribute to the pathogenicity of the bacterium (hyaluronidase)
• Factors that compete with the host for nutrient uptake (siderophores)

Sources: VFDB; Medical Microbiology, 4th edition (http://www.ncbi.nlm.nih.gov/books/NBK7627/)

Page 5

Too much –ome will kill you…

[Figure: Virulome; Core genome; Accessory genome; Mobilome]

Page 6

“Virulome” databases
• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)

Criteria for choice: databases focused mainly on virulence factors (as defined earlier); antibiotic resistance databases (CARD, ARDB, ARGO, RAC, …) are excluded.

Page 7

VFDB

• Created to facilitate the screening of high-throughput sequencing (HTS) data

Database last update: Tue Feb 23 22:05:25 2016

Page 8

PATRIC VF (Pathosystems Resource Integration Center)

• 6 NIAID priority genera:
  • Mycobacterium
  • Salmonella
  • Escherichia
  • Shigella
  • Listeria
  • Bartonella
• 1572 VFs
• 1071 articles
• Use of a controlled vocabulary
• Integrates VFDB and Victors VF information
• PATRIC supports:
  • Genome annotation
  • Comparative genomics
  • Transcriptomics
  • Pathways
  • Host-pathogen interaction
  • Disease-related information
• Database last update: March 2016

Page 9

Victors

• 5177 virulence factors
• 126 pathogens:

  Class           Species   VFs
  Gram-positive        15   1160
  Gram-negative        36   3488
  Virus                54    179
  Parasites            13    105
  Fungi                 8    245

• Last DB update: 27/8/2014

Page 10

PHI-base

• Pathogenicity, virulence and effector genes from:
  • Fungal pathogens
  • Oomycete pathogens
  • Bacterial pathogens
• Hosts:
  • Animal
  • Plant
  • Fungal
  • Insect

Page 11

MvirDB

• Biodefense focused
• Last update: 2007(?)
• Data still available for download

Page 12

Greatest strengths

All the databases have:
• manually curated data
• links to the original publications

However, manual curation is also a major caveat, because the process is hard to sustain.

Page 13

How to use these resources

• Querying annotations on the website
• Selecting the species of interest and browsing the website
• BLAST queries with DNA or protein sequences

Page 14

How to use these resources

• Download the gene/protein databases and use them as templates for searching your own data
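Here is a minimal sketch of this download-and-search workflow. The Python below screens an assembly against a downloaded VF nucleotide FASTA using k-mer containment, meaning the fraction of a VF gene's k-mers that also occur in the assembly on either strand. It is a deliberately simple stand-in for the BLAST-style alignment search these databases are usually paired with. The function names, the k-mer size and the 0.9 threshold are illustrative assumptions, not settings taken from any of the databases.

```python
from typing import Dict, Iterator, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            # Keep only the first word of the header as the identifier.
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def screen(vf_fasta: str, genome_fasta: str, k: int = 21,
           min_containment: float = 0.9) -> Dict[str, float]:
    """Report VF genes whose k-mers are (almost) all present in the genome."""
    genome_kmers: Set[str] = set()
    for _, contig in read_fasta(genome_fasta):
        # Index both strands so genes on the reverse strand are found too.
        genome_kmers |= kmers(contig, k) | kmers(reverse_complement(contig), k)
    hits: Dict[str, float] = {}
    for name, gene in read_fasta(vf_fasta):
        gene_kmers = kmers(gene, k)
        if not gene_kmers:  # gene shorter than k: nothing to compare
            continue
        containment = len(gene_kmers & genome_kmers) / len(gene_kmers)
        if containment >= min_containment:
            hits[name] = round(containment, 3)
    return hits
```

For example, `screen(open("vf_genes.fasta").read(), open("assembly.fasta").read())` returns a dict mapping VF gene identifiers to containment scores. The file names are placeholders. Alignment-based tools remain preferable when divergent alleles or partial genes matter.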

Page 15

How to use these resources

• MVLST/MLST-v (multi-virulence-locus sequence typing)
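MVLST applies the MLST principle to virulence genes. Each locus sequence is given an allele number, and the combination of allele numbers (the allelic profile) defines a sequence type (ST). The sketch below illustrates that logic with exact-match allele calling. The loci, allele sequences and profile table are hypothetical toy data; real schemes hold full-length, curated allele sequences.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy scheme: allele sequence -> allele number, per virulence locus.
ALLELES: Dict[str, Dict[str, int]] = {
    "vfLocus1": {"ATGAAACGT": 1, "ATGAAACGA": 2},
    "vfLocus2": {"ATGCCCGGT": 1, "ATGCCTGGT": 2},
    "vfLocus3": {"ATGTTTAAA": 1},
}
LOCI = ("vfLocus1", "vfLocus2", "vfLocus3")

# Hypothetical profile table: allelic profile -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {(1, 1, 1): 1, (2, 1, 1): 2, (2, 2, 1): 3}

Profile = Tuple[Optional[int], ...]


def call_alleles(sequences: Dict[str, str]) -> Profile:
    """Exact-match allele calling; None means the locus is missing or the allele is new."""
    return tuple(ALLELES[locus].get(sequences.get(locus, "").upper()) for locus in LOCI)


def assign_st(sequences: Dict[str, str]) -> Tuple[Profile, Optional[int]]:
    """Return the allelic profile and its ST (None if incomplete or not yet defined)."""
    profile = call_alleles(sequences)
    if None in profile:
        return profile, None
    return profile, PROFILES.get(profile)
```

A `None` in the returned profile flags a missing locus or a novel allele, which would need curation before an ST can be assigned.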

Page 16

How to use these resources

With HTS, several core genome/whole genome MLST (cg/wgMLST) schemas are becoming available or are being developed:
• Neisseria spp.
• Campylobacter spp.
• Staphylococcus aureus
• Legionella pneumophila
• Listeria monocytogenes
• Enterococcus faecium
• Mycobacterium tuberculosis
• Acinetobacter baumannii
• Salmonella enterica
• E. coli
• …

Loci in these schemas can be annotated with, or linked to, the virulence factor DBs, enabling automatic allele annotation through these systems:
• SeqSphere+
• http://pubmlst.org/
• http://bigsdb.web.pasteur.fr/
• https://enterobase.warwick.ac.uk/
• BioNumerics 7.5
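The annotation step described above can be sketched as a join between a schema's loci and VF database entries. In this illustration the link table `LOCUS_TO_VF` is assumed to have been built beforehand, for example by matching each locus's reference allele against a VF database. The locus names, VF identifiers and categories are hypothetical. None of this reflects the internal data model of the systems listed.

```python
from typing import Dict, List, Optional

# Hypothetical link table: schema locus -> VF database annotation.
LOCUS_TO_VF: Dict[str, Dict[str, str]] = {
    "locus_0001": {"vf_id": "VF-A", "gene": "hly", "category": "toxin"},
    "locus_0002": {"vf_id": "VF-B", "gene": "inlA", "category": "invasion"},
}

AlleleCalls = Dict[str, Optional[int]]  # locus -> allele number (None = not called)


def annotate_profile(calls: AlleleCalls) -> List[Dict[str, object]]:
    """List the VF-linked loci in one isolate's allele calls, with their allele numbers."""
    report = []
    for locus, allele in sorted(calls.items()):
        vf = LOCUS_TO_VF.get(locus)
        if vf is not None and allele is not None:
            report.append({"locus": locus, "allele": allele, **vf})
    return report


def vf_allele_diversity(isolates: Dict[str, AlleleCalls]) -> Dict[str, int]:
    """Count distinct alleles per VF-linked locus across a set of isolates."""
    seen: Dict[str, set] = {}
    for calls in isolates.values():
        for locus, allele in calls.items():
            if locus in LOCUS_TO_VF and allele is not None:
                seen.setdefault(locus, set()).add(allele)
    return {locus: len(alleles) for locus, alleles in sorted(seen.items())}
```

Because the link is made once per locus rather than once per genome, every new allele called at a VF-linked locus inherits the annotation automatically.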

Page 17

Back to the title

So far we have seen what is available.

How can we design actionable virulome databases?

Actionable: able to be done or acted on; having practical value (New Oxford American Dictionary)

Page 18

Bioinformatics needs

Available databases still lack interfaces for programmatic access. RESTful APIs would allow:

▪ easy automated querying from scripts, without the need for web interfaces or downloads
▪ database updates by authorized groups (a distributed curation effort)

APIs: Application Programming Interfaces
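To make the RESTful idea concrete, here is a minimal Python sketch of a read/write VF endpoint built only on the standard library. Anyone can `GET` records, optionally filtered by species, while `PUT` updates require a bearer token held by an authorized curator group. The routes (`/vfs`, `/vfs/<id>`), record fields and tokens are hypothetical. None of the databases above is claimed to expose this API, and a production service would add persistence, HTTPS and real authentication.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory store; a real resource would sit on a curated database.
VF_RECORDS = {
    "VF-A": {"id": "VF-A", "gene": "hly", "species": "Listeria monocytogenes"},
    "VF-B": {"id": "VF-B", "gene": "inlA", "species": "Listeria monocytogenes"},
}
# Hypothetical tokens issued to authorized curator groups.
CURATOR_TOKENS = {"token-curator-group-1"}


class VFHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _path_parts(self):
        return [p for p in urlparse(self.path).path.split("/") if p]

    def do_GET(self):
        # GET /vfs?species=...  -> list of matching records
        # GET /vfs/<id>         -> a single record
        parts = self._path_parts()
        if parts == ["vfs"]:
            species = parse_qs(urlparse(self.path).query).get("species", [None])[0]
            hits = [r for r in VF_RECORDS.values()
                    if species is None or r["species"] == species]
            self._send_json(200, hits)
        elif len(parts) == 2 and parts[0] == "vfs" and parts[1] in VF_RECORDS:
            self._send_json(200, VF_RECORDS[parts[1]])
        else:
            self._send_json(404, {"error": "not found"})

    def do_PUT(self):
        # PUT /vfs/<id> with a JSON object body creates or replaces a record,
        # but only when the request carries a curator token.
        raw = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        auth = self.headers.get("Authorization", "")
        parts = self._path_parts()
        if auth not in {"Bearer " + t for t in CURATOR_TOKENS}:
            self._send_json(401, {"error": "curator token required"})
            return
        if len(parts) != 2 or parts[0] != "vfs":
            self._send_json(404, {"error": "not found"})
            return
        try:
            record = json.loads(raw)
        except ValueError:
            record = None
        if not isinstance(record, dict):
            self._send_json(400, {"error": "body must be a JSON object"})
            return
        record["id"] = parts[1]
        VF_RECORDS[parts[1]] = record
        self._send_json(200, record)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server(port=0):
    """Serve on localhost in a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), VFHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A script could then fetch, for example, `http://127.0.0.1:<port>/vfs?species=Listeria%20monocytogenes` with any HTTP client instead of scraping a web page or downloading a full dump.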

Page 19

Bioinformatics needs

Existing DBs reuse each other's datasets without true database interoperability. There is a need for common ontologies (controlled vocabularies already exist but are not used by all).

Ontologies and computer-readable data formats (JSON-LD or RDF) can enable true database interoperability, allowing bioinformaticians to extract the targeted information with a single query that reaches multiple databases.
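Here is a toy illustration of how shared ontology terms enable cross-database queries. Two hypothetical exports use different local field names, but each maps them through its JSON-LD `@context` to the same IRIs, so records can be merged on the shared terms. The `expand` function implements only a simplified subset of JSON-LD expansion (plain term-to-IRI mappings). The `http://example.org/vf-ontology#` namespace is a placeholder; a real pipeline would use a full JSON-LD processor and community ontology terms.

```python
from typing import Dict, List, Tuple

# Placeholder vocabulary namespace; real data would reuse community ontology IRIs.
VF = "http://example.org/vf-ontology#"

# Two hypothetical exports describing the same gene with different local keys.
DB_A = {
    "@context": {"gene": VF + "geneName", "organism": VF + "organism",
                 "category": VF + "vfCategory"},
    "@graph": [{"gene": "hly", "organism": "Listeria monocytogenes",
                "category": "toxin"}],
}
DB_B = {
    "@context": {"locus": VF + "geneName", "taxon": VF + "organism",
                 "curatedBy": VF + "curator"},
    "@graph": [{"locus": "hly", "taxon": "Listeria monocytogenes",
                "curatedBy": "group-B"}],
}


def expand(doc: dict) -> List[dict]:
    """Rewrite each node's keys as full IRIs using the document's @context
    (simplified: plain term -> IRI mappings only, no nested contexts or types)."""
    ctx = doc["@context"]
    return [{ctx.get(key, key): value for key, value in node.items()}
            for node in doc["@graph"]]


def merge(*docs: dict) -> Dict[Tuple[str, str], dict]:
    """Merge nodes from several sources on the shared (organism, gene) IRIs."""
    merged: Dict[Tuple[str, str], dict] = {}
    for doc in docs:
        for node in expand(doc):
            key = (node[VF + "organism"], node[VF + "geneName"])
            merged.setdefault(key, {}).update(node)
    return merged
```

The merge works only because both sources agree on the IRIs for "gene name" and "organism". The local key names never need to match.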

Page 20

Controlled vocabularies and Ontologies

Trends Microbiol 17, 279–285 (2009).

Page 21

Sustainability needs

Major problems of databases:
• Manual curation is still a necessity
• Academic model for the sustainability of a resource: lack of funding leads to “dead” databases

Page 22

Take-home messages

• Existing virulome databases provide a wealth of data on VFs.
• A large part of the available VF data overlaps between DBs; the overlap depends largely on when each database was last updated and on what was included.
• They are always a work in progress, relying heavily on manual curation.
• Novel HTS-based techniques such as cg/wgMLST can use these databases to annotate schemas and provide a much richer picture of VF diversity at the DNA/protein level.
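One simple way to quantify the overlap between databases is the pairwise Jaccard index, |A ∩ B| / |A ∪ B|, over the sets of VFs each database contains. The sketch below assumes VFs can be matched by a normalized gene name. That is a strong simplification: synonyms and allelic variants would need sequence-level matching. The gene sets used to exercise it are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def normalize(name: str) -> str:
    """Crude matching key, so that e.g. 'HLY' and 'hly ' count as the same VF."""
    return name.strip().lower()


def pairwise_jaccard(dbs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index |A & B| / |A | B| for every pair of databases."""
    norm = {db: {normalize(g) for g in genes} for db, genes in dbs.items()}
    result: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(norm), 2):
        union = norm[a] | norm[b]
        result[(a, b)] = len(norm[a] & norm[b]) / len(union) if union else 0.0
    return result
```

Rerunning such a comparison after each database release is one way to see how differences in update dates affect the overlap.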

Page 23

Acknowledgments

UMMI members: Mário Ramirez, José Melo-Cristino

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster); Stefan Niemann (Research Center Borstel); Keith Jolley, James Bray and Martin Maiden (Univ. Oxford); Joerg Rothganger (RIDOM); Hannes Pouseele (Applied Maths)

Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC); Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC); Fiona Brinkman (SFU); William Hsiao (BCCDC)