From biological data to clinical applications: positioning a digital infrastructure for the future of biomedicine

Michel Dumontier, Ph.D.

Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University

Professeur Associé, Université Laval

Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering

DERI::Digital Infrastructure for Biomedicine

Upload: michel-dumontier

Post on 07-May-2015

DESCRIPTION

In the quest to translate the results of life science research into effective clinical applications, many are now turning their attention to, and trying to make sense of, the large and rapidly growing body of biological and biomedical data. Keeping on top of the daily flood of new information, whether the latest clinical reviews, scientific reports, or raw data, is an ever-present and widely recognized challenge. Limited access to structured, integrated and citable data restricts our ability to exploit a rich source of scientific knowledge for clinical and translational research. With the dual goals of increasing our understanding of how living systems respond to chemical agents and translating our combined knowledge into clinical applications, I will discuss our efforts to leverage Semantic Web technologies to facilitate the formulation, publication, integration, and discovery of biological facts, expert knowledge and services of value to pharmaceutical and clinical research and, more recently, to the patient-centric delivery of health care.

TRANSCRIPT


Page 5

uncovering a sufficient amount of evidence to support/refute a hypothesis is becoming increasingly difficult

it requires a lot of digging around

Page 6

continuous growth in research literature

Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html

Page 7

access to increasing amounts of biomedical data

Page 8

access to the most effective software to predict, compare and evaluate

Page 9

ultimately, we answer questions by building sophisticated workflows

Page 10

What if we could automatically answer a question using available data and services?

Page 11

The Semantic Web is the new global web of knowledge

It involves standards for publishing, sharing and querying facts, expert knowledge and services

It is a scalable approach to the discovery of independently formulated and distributed knowledge

Page 12

Link all the data!!!

Page 13

something you can search, lookup, link to, query for

and check consistency and veracity of

Page 14

an emerging linked data network

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 15

Life Science Data Contributors

• Bio2RDF
• Chem2Bio2RDF
• LODD (HCLS)

Page 16

• > 40 biological datasets from independent providers
• > 3 billion triples

Page 17

linked data for the life sciences

An Open Source Project for the Provision of Scalable, Decentralized Data with Global Mirroring and Customizable Query Resolution

Francois Belleau, Laval University
Marc-Alexandre Nolin, Laval University
Peter Ansell, Queensland University of Technology
Michel Dumontier, Carleton University

Page 18

Bio2RDF resources are identified using IRIs

• Data providers’ record identifiers are maintained from source
  http://bio2rdf.org/namespace:identifier

• E.g.: DrugBank’s resource IRI for Leucovorin
  http://bio2rdf.org/drugbank:DB00650
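The IRI pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Bio2RDF code: the helper names `bio2rdf_iri` and `split_bio2rdf_iri` are assumptions made for this example, and only the `http://bio2rdf.org/namespace:identifier` pattern and the DrugBank identifier come from the slide.

```python
from typing import Tuple

BIO2RDF_BASE = "http://bio2rdf.org/"  # base IRI as shown on the slide


def bio2rdf_iri(namespace: str, identifier: str) -> str:
    """Build an IRI of the form http://bio2rdf.org/namespace:identifier."""
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must be non-empty")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def split_bio2rdf_iri(iri: str) -> Tuple[str, str]:
    """Recover (namespace, identifier) from an IRI built with the pattern above."""
    if not iri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF-style IRI: {iri}")
    # Split on the first colon after the base, so identifiers may contain colons.
    namespace, sep, identifier = iri[len(BIO2RDF_BASE):].partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"missing namespace or identifier: {iri}")
    return namespace, identifier


leucovorin = bio2rdf_iri("drugbank", "DB00650")
print(leucovorin)
print(split_bio2rdf_iri(leucovorin))
```

Keeping the provider's own record identifier in the IRI means a record can be located without a lookup table, which is the point the slide makes.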

Page 19

vocabulary and resource namespaces are used to describe auxiliary resources

• Vocabulary namespaces are used for dataset-specific types and predicates
  http://bio2rdf.org/drugbank_vocabulary:Drug

• Entities arising from n-ary relations are identified in the resource namespace
  http://bio2rdf.org/drugbank_resource:DB00440_DB00650
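The two auxiliary namespaces can be sketched the same way. The function names below are hypothetical, and joining the participating record identifiers with an underscore is an assumption inferred from the single example on the slide (`DB00440_DB00650`); the actual Bio2RDF scripts may construct these identifiers differently.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"  # base IRI as shown on the slides


def vocabulary_iri(dataset: str, term: str) -> str:
    """IRI for a dataset-specific type or predicate."""
    return f"{BIO2RDF_BASE}{dataset}_vocabulary:{term}"


def relation_resource_iri(dataset: str, *record_ids: str) -> str:
    """IRI for an entity that stands for an n-ary relation among records.

    Assumption: participant identifiers are joined with '_' in the given order.
    """
    if len(record_ids) < 2:
        raise ValueError("a relation needs at least two participating records")
    return f"{BIO2RDF_BASE}{dataset}_resource:{'_'.join(record_ids)}"


print(vocabulary_iri("drugbank", "Drug"))
print(relation_resource_iri("drugbank", "DB00440", "DB00650"))
```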

Page 21

Every Bio2RDF dataset now contains provenance metadata

Page 22

Bio2RDF types include biological, information content & processual entities

CTD: Chemical, Disease, Chemical-Disease Interaction, Chemical-Gene Interaction
Entrez Gene: Gene, Model Organism, Publication
HGNC: Accession Number, Gene, Gene Symbol
iRefIndex: Protein Complex, Protein Interaction
MGI: Gene Marker, Gene Symbol
PharmGKB: Association, Disease, Drug, Gene
SGD: Enzyme, Pathway, Protein, RNA, Reaction, Location, Experiment

Page 23

Heterogeneous biological data on the semantic web is difficult to query

Question: Find all proteins that interact with beta amyloid (uniprot:P05067)

    SELECT * WHERE {
      ?protein a bio2rdf:Protein .
      ?protein bio2rdf:interacts_with uniprot:P05067 .
    }

Protein types: UniProt Protein, PDB Protein, iRefIndex Protein?

Interaction types: Physical interaction? Pathway interaction? Genetic interaction?

Page 24

Uncertainty in what is being said with a simple triple

imagine a statement between two types, C1 and C2

    C1 R C2
    nucleus part-of cell

does it mean

• For every C1 there is a C2 that is related by R?
• For every C2 there is a C1 that is related by R?
• For some C1, there is a C2 that is related by R, or vice versa?
• Every C1 is a kind of C2? or vice versa?
• C1s and C2s are the same kind?
• There is no C1 that is also a C2?

we need to commit to a particular meaning that can be universally interpreted – this formalization will then hold across datasets
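One way to make the candidate readings on this slide concrete is to write each as a first-order formula. The rendering below is a sketch for illustration, not notation taken from the talk; C_1 and C_2 are read as unary predicates and R as a binary relation.

```latex
\begin{align*}
\text{(1) every } C_1 \text{ has some } C_2: \quad
  & \forall x\,\bigl(C_1(x) \rightarrow \exists y\,(C_2(y) \land R(x,y))\bigr) \\
\text{(2) every } C_2 \text{ has some } C_1: \quad
  & \forall y\,\bigl(C_2(y) \rightarrow \exists x\,(C_1(x) \land R(x,y))\bigr) \\
\text{(3) some } C_1 \text{ and some } C_2: \quad
  & \exists x\,\exists y\,\bigl(C_1(x) \land C_2(y) \land R(x,y)\bigr) \\
\text{(4) subsumption}: \quad
  & \forall x\,\bigl(C_1(x) \rightarrow C_2(x)\bigr) \\
\text{(5) equivalence}: \quad
  & \forall x\,\bigl(C_1(x) \leftrightarrow C_2(x)\bigr) \\
\text{(6) disjointness}: \quad
  & \neg\exists x\,\bigl(C_1(x) \land C_2(x)\bigr)
\end{align*}
```

For "nucleus part-of cell", reading (1) says every nucleus is part of some cell, while reading (2) says every cell has a nucleus as a part, which fails for cells that lack one. The readings are not interchangeable, which is why a single committed interpretation is needed.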

Page 25

From linked data to linked knowledge through syntactic and semantic normalization.

RDF-based Linked Data is a great first step, but it’s not enough.

Page 26

ontology as a strategy to formally represent and integrate knowledge

Page 27

Have you heard of OWL?

Page 28

The Web Ontology Language (OWL) Has Explicit Semantics

Can therefore be used to capture knowledge in a machine understandable way

Page 29

SIO provides an OWL ontology for the representation of diverse biomedical knowledge

Page 31

Semantic data integration, consistency checking and query answering over Bio2RDF with the Semanticscience Integrated Ontology (SIO)

Diagram (dataset, ontology, knowledge base):

• uniprot:P05067 is a uniprot:Protein
• refseq:NP_009225.1 is a refseq:Protein
• uniprot:Protein is a sio:protein
• refseq:Protein is a sio:protein

Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. To be presented at Bio-ontologies 2012.
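The effect of mapping dataset-specific types to a shared SIO class can be illustrated with a toy in-memory knowledge base. This is only a sketch of the subsumption idea: the dictionaries and function names are assumptions for this example, each class is assumed to have at most one parent, and a real deployment would rely on an OWL reasoner or SPARQL endpoint.

```python
from typing import Dict, List

# Type assertions from two datasets (identifiers as shown on the slide).
TYPE_OF: Dict[str, str] = {
    "uniprot:P05067": "uniprot:Protein",
    "refseq:NP_009225.1": "refseq:Protein",
}

# Dataset-specific classes mapped to one shared ontology class.
SUBCLASS_OF: Dict[str, str] = {
    "uniprot:Protein": "sio:protein",
    "refseq:Protein": "sio:protein",
}


def superclasses(cls: str) -> List[str]:
    """Follow 'is a' links upward and return every ancestor of cls."""
    ancestors: List[str] = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in ancestors:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def instances_of(target: str) -> List[str]:
    """Return individuals typed as target directly or through a superclass."""
    return sorted(
        individual
        for individual, cls in TYPE_OF.items()
        if cls == target or target in superclasses(cls)
    )


print(instances_of("uniprot:Protein"))  # only the UniProt record
print(instances_of("sio:protein"))      # records from both datasets
```

Querying for the dataset-specific class returns one record, while querying for the shared class returns records from both sources without the query having to name either dataset.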

Page 32

Use CTD & SGD to find all chemicals and proteins that participate in the same GO process

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX sio:  <http://semanticscience.org/resource/>

    SELECT *
    FROM <http://bio2rdf.org/ctd>
    WHERE {
      ?chemical a sio:SIO_010004 .             # 'chemical entity'
      ?chemical rdfs:label ?chemicalLabel .
      ?chemical sio:SIO_000062 ?process .      # 'is participant in'
      ?process rdfs:label ?processLabel .
      # federated part: evaluated at the SGD endpoint
      SERVICE <http://sgd.bio2rdf.org/sparql> {
        ?protein a sio:SIO_010043 .            # 'protein'
        ?protein sio:SIO_000062 ?process .
        ?gene sio:SIO_010078 ?protein .        # 'encodes'
        ?gene rdfs:label ?geneLabel .
      }
    }

Page 33

More sophisticated OWL-based Data Integration, Consistency Checking and Discovery

• Checking the consistency of semantic annotations [1]
  – Formalized semantic annotations in SBML models as OWL axioms. Automated reasoning uncovered inconsistencies in 16 models.
    • e.g. alpha-D-glucose phosphate is not the required ATP in an ATP-dependent reaction (GO + ChEBI + disjoint + closure axioms)

• Finding significant biomedical associations [2]
  – found significant associations between genes, drugs, diseases and pathways using DrugBank, PharmGKB, CTD, PID across categories of drugs (ChEBI, ATC, MeSH) and diseases (DO, MeSH)
  – 22,653 pathway-disease type associations (6,304 over; 16,349 under)
    • carcinosarcoma (DOID:4236) and Zidovudine Pathway (PharmGKB:PA165859361)
  – 13,826 pathway-chemical type associations (12,564 over; 1,262 under)
    • drug clopidogrel (CHEBI:37941) with Endothelin signaling pathway (PharmGKB:PA164728163)

1. Integrating systems biology models and biomedical ontologies. BMC Systems Biology. 2011. 5:124
2. Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. in press

http://pharmgkb-owl.googlecode.com

Page 34

Translational Medicine Requires Integration of Patient and Biomedical Data

Page 35

Integration of patient record data with Linked Open Data through the Translational Medicine Ontology

223 mappings: 60 TMO classes to 201 target classes from over 40 ontologies and 8 datasets

Page 36

Formalization of the Dubois AD diagnostic criteria for decision support

    # the panel is a textual entity
    dubois:panel2 a iao:IAO_0000300 .
    dubois:panel2 rdfs:label "Alzheimer Disease diagnostic criteria as reported in panel 2 of dubois et al - pubmed:17616482 [dubois:panel2]" .
    # the panel is about alzheimer disease
    dubois:panel2 iao:is_about diseasome:74 .
    # the panel is from the article
    dubois:panel2 ro:part_of <http://bio2rdf.org/pubmed:17616482> .
    # the panel is about diagnostic criterion
    dubois:panel2 iao:is_about tmo:TMO_0068 .

    # inclusion criterion
    dubois:10 rdfs:label "Proven AD autosomal dominant mutation within the immediate family [dubois:10]" ;
        a tmo:TMO_0069 ;
        ro:part_of dubois:panel2 ;
        iao:is_about diseasome:74 .

    # exclusion criterion
    dubois:16 rdfs:label "Major depression [dubois:16]" ;
        a tmo:TMO_0070 ;
        ro:part_of dubois:panel2 ;
        iao:is_about diseasome:74 .

Page 37

TMKB for pharmaceutical and clinical research, and health care

Pharmaceutical Research
• Which existing marketed drugs might potentially be re-purposed for AD because they are known to modulate genes that are implicated in the disease?
  – 57 compounds or classes of compounds that are used to treat 45 diseases, including AD, hyper/hypotension, diabetes and obesity

Clinical research
• Identify an AD clinical trial for a drug with a different mechanism of action (MOA) than the drug that the patient is currently taking
  – Of the 438 drugs linked to AD trials, only 58 are in active trials and only 2 (Doxorubicin and IL-2) have a documented MOA. 78 AD-associated drugs have an established MOA.

Health care
• Have any of my AD patients been treated for other neurological conditions as this might impact their diagnosis?
  – Patient 2 is also being treated for depression.

http://esw.w3.org/topic/HCLSIG/PharmaOntology/Queries

Page 38

Personal Health Lens

Observation: Patients often look up new/alternative drugs to treat their condition or alleviate side effects.

Opportunity: A patient-centric health care application that identifies contraindications for drugs mentioned on web pages using the patient’s own health data

Components:
• RDFized patient data
• Bio2RDF semantically annotated data
• SADI semantic web services to process the page and retrieve data
• SHARE automatic workflow composition
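The core check this slide describes, matching drugs mentioned on a page against the patient's own record, can be sketched as a simple set intersection. Everything below is a stand-in: the function name, the placeholder drug and condition names, and the in-memory dictionaries are assumptions for illustration, whereas the application described in the talk obtains this information from RDF data through SADI services composed by SHARE.

```python
import re
from typing import Dict, List, Set, Tuple


def find_contraindications(
    page_text: str,
    patient_facts: Set[str],
    contraindications: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (drug, patient fact) pairs where a mentioned drug conflicts
    with a condition or medication in the patient's record."""
    # Naive drug spotting: lowercase word match against the known drug names.
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    mentioned = sorted(words & set(contraindications))
    hits: List[Tuple[str, str]] = []
    for drug in mentioned:
        for fact in sorted(contraindications[drug] & patient_facts):
            hits.append((drug, fact))
    return hits


# Placeholder knowledge and patient data (hypothetical, not real drug facts).
KNOWLEDGE = {"druga": {"conditionx", "drugc"}, "drugb": {"conditiony"}}
PATIENT = {"conditionx", "drugd"}

page = "Forum post: DrugA or DrugB might help with sleep."
print(find_contraindications(page, PATIENT, KNOWLEDGE))
```

Only the pair involving the patient's recorded condition is reported, which mirrors the idea of personalizing the result to the individual rather than listing every known contraindication.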

Page 39

SADI enables discovery and access to Semantic Web Services

The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs

http://sadiframework.org

~700 bioinformatic services as of May 29, 2012

Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB

Page 43

The SADI+SHARE workflow and reasoning was personalized to YOUR medical data

Screenshot callouts: sources; contraindication; rationale; uses the patient’s data

Page 44

so how do we get at the supporting evidence?

Page 45

HyQue

HyQue is the Hypothesis query and evaluation system
• A platform for knowledge discovery
• Facilitates hypothesis formulation and evaluation
• Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
• Conforms to a simplified event-based model
• Supports evaluation against positive and negative findings
• Transparent and reproducible evidence prioritization
• Provenance across all elements of hypothesis testing
  – trace a hypothesis to its evaluation, including the data and rules used

HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.

Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.

Page 46

HyQue Architecture

Diagram labels: Services, Ontologies

Page 47

Event-based data model

HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the location of this event (e.g. located in nucleus, or under some genetic background)

Event
• ‘has agent’ agent
• ‘has target’ target
• ‘is located in’ location
• ‘is negated’ boolean

Currently supported events

1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
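The event model on this slide maps naturally onto a small record type. The sketch below is an illustration only: the class and field names are assumptions, HyQue itself represents events as RDF, and the `nucleus` location in the usage example is chosen for demonstration. The agent and target identifiers are the ones used in the deck's induction example.

```python
from dataclasses import dataclass
from typing import Optional

# The seven event types listed on the slide.
SUPPORTED_EVENT_TYPES = (
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
)


@dataclass(frozen=True)
class Event:
    """An event with an agent and a target, plus optional location and negation."""

    event_type: str
    agent: str                      # 'has agent'
    target: str                     # 'has target'
    location: Optional[str] = None  # 'is located in'
    negated: bool = False           # 'is negated'

    def __post_init__(self) -> None:
        # Reject event types outside the supported list.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")


gal1_induction = Event(
    "gene induction", agent="sgd:Gal4p", target="sgd:GAL1", location="nucleus"
)
print(gal1_induction)
```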

Page 48

HyQue domain rules CALCULATE a quantitative measure of evidence for an event

‘induce’ rule (maximum score: 5):
– Is event negated?
  • If yes, subtract 2
– Is event of type ‘induce’?
  • If yes, add 1; if no, subtract 1
– Is agent of type ‘protein’ or ‘RNA’?
  • If yes, add 1; if type ‘gene’, subtract 1
– Is target of type ‘gene’?
  • If yes, add 1; if no, subtract 1
– Does agent have known ‘transcription factor activity’?
  • If yes, add 1
– Is event located in the ‘nucleus’?
  • If yes, add 1; if no, subtract 1

Ontology terms shown on the slide: GO:0010628, CHEBI:36080, SO:0000236, GO:0003700, GO:0005634
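The ‘induce’ rule is plain arithmetic and can be written out directly. The function below is a sketch that follows the slide's point scheme step by step; its name, its plain-string arguments and its keyword-only signature are assumptions for illustration, since HyQue expresses these rules in SPIN over RDF data rather than in Python.

```python
def score_induce_event(
    *,
    event_type: str,
    agent_type: str,
    target_type: str,
    agent_has_tf_activity: bool,
    location: str,
    negated: bool,
) -> int:
    """Apply the 'induce' rule from the slide and return the evidence score."""
    score = 0
    if negated:
        score -= 2                                   # negated event
    score += 1 if event_type == "induce" else -1     # event type
    if agent_type in ("protein", "RNA"):
        score += 1                                   # expected agent type
    elif agent_type == "gene":
        score -= 1                                   # a gene as agent is penalized
    score += 1 if target_type == "gene" else -1      # target type
    if agent_has_tf_activity:
        score += 1                                   # transcription factor activity
    score += 1 if location == "nucleus" else -1      # event location
    return score


best = score_induce_event(
    event_type="induce",
    agent_type="protein",
    target_type="gene",
    agent_has_tf_activity=True,
    location="nucleus",
    negated=False,
)
print(best)  # 5, the maximum stated on the slide
```

The five positive checks each contribute one point, which is where the stated maximum of 5 comes from; the same fully supported event scores 3 if it is negated.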

Page 49

Combination of system and domain rules to retrieve and score data, and add new triples

Event - induction

    :e1 a go:0010628 ;
        hyque:agent sgd:Gal4p ;
        hyque:target sgd:GAL1 ;
        hyque:is_negated "0" .

SPIN induction rule

Page 50

Customization of rules/data sources will generate different evidence-based evaluations

Page 51

Reproducible eScience LOD for Hypothesis, Rules, Data and Evaluation

Page 53

A digital infrastructure for the future of biomedicine

• Semantic Web technologies offer a powerful integrative platform across facts, expert knowledge and services

• The ability to publish, link to, retrieve, check the consistency of, and query biomedical knowledge will yield an explosion of health-related applications.

• By formalizing biomedical data, we can integrate molecular to clinical data, and gain insight into how living systems respond to chemical agents
  – implications for drug discovery & delivery of health care

Page 54

Acknowledgements

Bio2RDF: Peter Ansell, Francois Belleau, Alison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keath, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe

HyQue: Alison Callahan

Lab: Glen Newton (NLP), Gordana Lenert (PGx), Dana Klassen @ DERI, Leonid Chepelev @ UoO, Natalia Villanueva-Rosales @ UoTexas, Xueying Chen @ IBM China, Mykola Konyk

OWL-Based Data Integration: Robert Hoehndorf, John Gennari, Sarah Wimalaratne, Bernard de Bono, Daniel Cook, and George Gkoutos

SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keath, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson

W3C HCLS: J Luciano, B Andersson, C Batchelor, O Bodenreider, T Clark, C Denney, C Domarew, T Gambet, L Harland, A Jentzsch, V Kashyap, P Kos, J Kozlovsky, T Lebo, SM Marshall, JP McCusker, DL McGuinness, C Ogbuji, E Pichler, R Powers, E Prud’hommeaux, M Samwald, L Schriml, PJ Tonellato, PL Whetzel, J Zhao, S Stephens, C Denney, J Luciano, J McGurk, Lynn Schriml, and Peter J. Tonellato.

Page 55

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier