Using ontologies to make sense of unstructured medical data

Nigam Shah, MBBS, PhD

Uploaded by pennie on 13-Jan-2016


Page 1

Using ontologies to make sense of unstructured medical data

Nigam Shah, MBBS, PhD

Page 2

NCBO (National Center for Biomedical Ontology): Key activities

• We create and maintain a library of biomedical ontologies.

• We build tools and Web services to enable the use of ontologies and their derivatives.

• We collaborate with scientific communities that develop and use ontologies.

Page 3

http://rest.bioontology.org

Ontology Services
• Download • Traverse • Search • Comment

Widgets
• Tree-view • Auto-complete • Graph-view

Annotation
• Term recognition

Data Access
• Fetch “data” annotated with a given term

Mapping Services
• Create • Download • Upload

Views

http://bioportal.bioontology.org
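The "Traverse" service walks an ontology's hierarchy. As a local illustration only (this is not the BioPortal REST API), the sketch below collects every ancestor of a term in a small is-a graph stored as a dict. The hierarchy `IS_A` and the function name `ancestors` are hypothetical and simplified, not taken from any real ontology release.

```python
from collections import deque

# Hypothetical, simplified is-a hierarchy (child -> parents).
# Not taken from any real ontology release.
IS_A = {
    "myocardial infarction": ["ischemic heart disease"],
    "ischemic heart disease": ["heart disease", "vascular disease"],
    "heart disease": ["cardiovascular disease"],
    "vascular disease": ["cardiovascular disease"],
    "cardiovascular disease": ["disease"],
    "disease": [],
}

def ancestors(term, is_a):
    """Return all ancestors of `term` (excluding the term itself) by breadth-first search."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already reached through another path; ontologies are DAGs, not trees
        seen.add(parent)
        queue.extend(is_a.get(parent, []))
    return seen

print(sorted(ancestors("myocardial infarction", IS_A)))
```

Because a term can have several parents, the `seen` set keeps shared ancestors such as "cardiovascular disease" from being visited twice.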

Page 4

Annotation service

Process textual metadata to automatically tag text with as many ontology terms as possible.

90 million calls, ~700 GB of data
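The annotation service tags free text with ontology terms. The NCBO Annotator's own matching engine is not reproduced here. Below is a minimal greedy longest-match sketch over a hypothetical lexicon that maps lowercase term strings to made-up concept IDs (`LEXICON` and `annotate` are illustrative names only).

```python
import re

# Hypothetical lexicon: lowercase term string -> made-up concept identifier.
LEXICON = {
    "abdominal pain": "C:0001",
    "pain": "C:0002",
    "myocardial infarction": "C:0003",
    "aspirin": "C:0004",
}

def annotate(text, lexicon):
    """Tag text with lexicon concepts using greedy longest match over word tokens.

    Returns a list of (start_token, end_token, matched_term, concept_id).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(term.split()) for term in lexicon)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so "abdominal pain" wins over "pain".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                annotations.append((i, i + n, candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return annotations

print(annotate("Patient reports abdominal pain; given aspirin.", LEXICON))
```

This sketch keeps only the longest match at each position. An annotator that aims to tag as many terms as possible might also report nested matches such as "pain" inside "abdominal pain".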

Page 5

Resource index

• PubMed Abstracts
• Adverse Events (AERS)
• GEO
• ⋮
• Clinical Trials
• DrugBank

Won 1st prize at the 2010 Semantic Web Challenge @ ISWC
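A resource index links ontology terms to the records (abstracts, trials, datasets) annotated with them. The sketch below assumes the annotations are already computed and uses hypothetical record IDs and a toy is-a map. It builds an inverted index in which each record is also filed under the ancestors of its terms, so a query for a general term also finds records tagged with more specific ones. This is one common design, not a description of NCBO's implementation.

```python
from collections import defaultdict

# Hypothetical annotations: record ID -> concepts recognized in that record.
ANNOTATIONS = {
    "pubmed:1": {"myocardial infarction"},
    "trial:7": {"heart disease"},
    "geo:3": {"asthma"},
}

# Hypothetical is-a map (child -> parents) used to expand each annotation upward.
IS_A = {
    "myocardial infarction": ["heart disease"],
    "heart disease": ["disease"],
    "asthma": ["lung disease"],
    "lung disease": ["disease"],
}

def with_ancestors(term):
    """Return the term together with all of its ancestors."""
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(IS_A.get(t, []))
    return result

def build_index(annotations):
    """Map every concept (direct or inferred through is-a) to the records it tags."""
    index = defaultdict(set)
    for record, concepts in annotations.items():
        for concept in concepts:
            for term in with_ancestors(concept):
                index[term].add(record)
    return index

index = build_index(ANNOTATIONS)
print(sorted(index["heart disease"]))  # tagged directly, or through a subclass
```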

Page 6

Creating Lexicons

Inputs: Term – 1 … Term – n, and Sentence in Clinical Note – 1 … Sentence in Clinical Note – m, passed through a frequency counter.

Output per term (Frequency: tf = term frequency, df = document frequency; Syntactic types: NN = noun, JJ = adjective, VP = verb phrase):

ID   Term     tf        df       NN     JJ     …   VP     …
ID   Term-1   150,879   90,000   0.90   0.05   …   0.03   …
ID   :
ID   Term-n
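One way to compute a table like this is sketched below. It rests on several assumptions that the slide does not state. Sentences arrive already tokenized and tagged with Penn Treebank part-of-speech tags. tf counts every occurrence of a term, and df counts the sentences that contain it. Each syntactic-type value is the fraction of a term's occurrences whose last token carries that tag. Because the sketch uses token-level tags, it reports VB-style tags rather than the phrase-level VP. The data and the function name `lexicon_stats` are hypothetical.

```python
from collections import Counter

# Hypothetical sentences, already tokenized and POS-tagged (Penn Treebank tags).
SENTENCES = [
    [("chest", "NN"), ("pain", "NN"), ("resolved", "VBD")],
    [("no", "DT"), ("pain", "NN"), ("today", "NN")],
    [("pain", "NN"), ("medication", "NN"), ("given", "VBN")],
    [("cold", "JJ"), ("extremities", "NNS")],
    [("has", "VBZ"), ("a", "DT"), ("cold", "NN")],
]

def lexicon_stats(terms, sentences):
    """Return {term: (tf, df, {tag: fraction of occurrences})}.

    tf: total occurrences across all sentences.
    df: number of sentences containing the term (sentences stand in for documents).
    """
    stats = {}
    for term in terms:
        words = term.lower().split()
        n = len(words)
        tf, df = 0, 0
        tags = Counter()
        for sentence in sentences:
            tokens = [tok.lower() for tok, _tag in sentence]
            hits = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]
            tf += len(hits)
            if hits:
                df += 1
            for i in hits:
                # Assumption: the tag of the term's last token stands for the whole term.
                tags[sentence[i + n - 1][1]] += 1
        fractions = {tag: count / tf for tag, count in tags.items()} if tf else {}
        stats[term] = (tf, df, fractions)
    return stats

for term, (tf, df, fractions) in lexicon_stats(["pain", "cold"], SENTENCES).items():
    print(term, tf, df, fractions)
```

Per-term statistics like these make it easy to spot a term such as "cold", whose meaning depends on whether it is used as a noun or an adjective.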

Page 7

Annotation Analytics

Analyzing tagged data for hypothesis generation in bioinformatics

Page 8

Generic GO based analysis routine

[Figure: Study Set drawn from a Reference set (Genome)]

• Get annotations for each gene in the study set.

• Count the occurrence of each annotation term in the study set.

• Count the occurrence of that term in some reference set (whole genome?).

• Compute a p-value for how surprising their overlap is.
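The slide does not name the statistical test. A common choice for "how surprising is the overlap" is the one-sided hypergeometric test, sketched below. Let N be the number of genes in the reference set and K the number annotated with the term. Let n be the number of genes in the study set and k the number of those annotated. Then p = P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K,i)·C(N−K,n−i) / C(N,n). The function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set, K: reference genes annotated with the term,
    n: genes in the study set,     k: study genes annotated with the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Made-up example: 8 of 50 study genes carry a term that 200 of 20,000 genes carry.
print(enrichment_p_value(k=8, n=50, K=200, N=20000))
```

Exact integer arithmetic avoids underflow in the binomial coefficients. When a study set is tested against many terms, the resulting p-values would normally also need a multiple-testing correction.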

Page 9

Annotation Analytics Landscape

Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs/Chemicals

Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ??

Health Indicator Warehouse datasets

Page 10

Open questions

1. Can we use something other than the GO?

2. Lack of annotations: even today, roughly 20% of genes lack any GO annotation.

3. Annotation bias: annotations with certain ontology terms are not independent of each other.

4. Lack of a systematic mechanism to define a level of abstraction.

Page 11

Profiling a set of Aging genes

[Figure: 261 Age-related genes against the Genome, profiled with the Disease Ontology (label: ~30% of genome)]

Page 12

Using ontologies other than GO

GO annotations: ERCC6 nucleoplasm; PARP1 protein N-terminus binding

Disease annotations: ERCC6 <disease term?>; PARP1 <disease term?>

Page 13

Enrichment Analysis with the DO

GO annotations (http://www.geneontology.org/GO.downloads.annotations.shtml):

ERCC6  GO:0005654  PMID:16107709
ERCC6  GO:0008094  PMID:16107709
PARP1  GO:0047485  PMID:16107709
ERCC6  GO:0005730  PMID:16107709
PARP1  GO:0003950  PMID:16107709

{ERCC6, PARP1} → PMID:16107709 (www.ncbi.nlm.nih.gov/pubmed/16107709)

NCBO Annotator: http://bioportal.bioontology.org

{ERCC6, PARP1} → {Cockayne syndrome, DNA damage}
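The chain on this slide is essentially a join: gene, then the PMID cited for it in the GO annotation file, then the disease terms the NCBO Annotator finds in that paper. The sketch below assumes both mappings are already loaded. The gene/GO/PMID rows and the resulting terms come from the slide. The Annotator step is stubbed with a hard-coded dict (`PMID_TERMS`, a hypothetical name) rather than a real service call.

```python
from collections import defaultdict

# (gene, GO term, PMID) rows as shown on the slide.
GO_ROWS = [
    ("ERCC6", "GO:0005654", "PMID:16107709"),
    ("ERCC6", "GO:0008094", "PMID:16107709"),
    ("PARP1", "GO:0047485", "PMID:16107709"),
    ("ERCC6", "GO:0005730", "PMID:16107709"),
    ("PARP1", "GO:0003950", "PMID:16107709"),
]

# Stub standing in for Annotator output: PMID -> terms recognized in that paper.
PMID_TERMS = {"PMID:16107709": {"Cockayne syndrome", "DNA damage"}}

def gene_to_terms(go_rows, pmid_terms):
    """Re-annotate each gene with the terms recognized in the papers cited for it."""
    gene_pmids = defaultdict(set)
    for gene, _go_id, pmid in go_rows:
        gene_pmids[gene].add(pmid)
    result = {}
    for gene, pmids in gene_pmids.items():
        terms = set()
        for pmid in pmids:
            terms |= pmid_terms.get(pmid, set())
        result[gene] = terms
    return result

for gene, terms in sorted(gene_to_terms(GO_ROWS, PMID_TERMS).items()):
    print(gene, sorted(terms))
```

The resulting gene-to-term sets have the same shape as GO annotations, so they can feed the same enrichment routine.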

Page 14

Annotation Analytics on EMR data

Analysis of tagged data from electronic health records

Page 15

Profiling patient sets

86k patient reports

ICD9 789.00 (Abdominal pain, unspecified site)

Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.

Page 16

Annotation (Clinical Text)

Page 17

Annotation Workflow

1. Creating clean lexicons: from the BioPortal knowledge graph, lexicons of Diseases, Procedures, and Drugs (Term – 1 … Term – n, with syntactic types and frequency).

2. Term recognition: the NCBO Annotator finds lexicon terms in each text clinical note.

3. Negation detection: NegEx patterns and NegEx rules mark recognized terms as negated (for example "no T4").

4. Generation of tagged data: for each patient in the cohort of interest (P1, P2, P3, …, Pn), alongside ICD9 codes, the recognized terms form a temporal series of tags, such as "T1, T2, no T4 … T5, T4, T3 … T4, T3, T1" or "T8, T9, T4 … T6, T8, T10".

5. Further analysis of the tagged data.
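NegEx flags a recognized term as negated when it falls within a short window after a negation trigger such as "no" or "denies". The real NegEx uses a curated trigger list and also handles pseudo-negations and termination terms. The sketch below keeps only a few illustrative triggers, a fixed five-token window, and per-sentence scope, all of which are simplifying assumptions. It emits tags in the slide's "no T4" style. `TRIGGERS`, `WINDOW`, `is_negated`, and `tag_note` are hypothetical names.

```python
import re

# Illustrative subset of pre-negation triggers (NegEx's full list is much longer).
TRIGGERS = ["no", "denies", "without", "negative for", "no evidence of"]
WINDOW = 5  # assumed number of tokens a trigger's scope extends forward

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def find_starts(tokens, phrase_tokens):
    """Start indices where phrase_tokens occurs contiguously in tokens."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]

def is_negated(sentence, term):
    """True if `term` starts within WINDOW tokens after any trigger in the sentence."""
    tokens = tokenize(sentence)
    starts = find_starts(tokens, tokenize(term))
    for trigger in TRIGGERS:
        trig = tokenize(trigger)
        for j in find_starts(tokens, trig):
            scope_start, scope_end = j + len(trig), j + len(trig) + WINDOW
            if any(scope_start <= s < scope_end for s in starts):
                return True
    return False

def tag_note(sentences, terms):
    """Return tags like 'nausea' or 'no chest pain', in sentence order."""
    tags = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for term in terms:
            if find_starts(tokens, tokenize(term)):
                tags.append(("no " if is_negated(sentence, term) else "") + term)
    return tags

note = ["Patient denies chest pain.", "Reports nausea after dinner."]
print(tag_note(note, ["chest pain", "nausea"]))
```

Because sentences are processed in note order, concatenating the tags of a patient's notes by date yields the kind of temporal tag series described on this slide.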

Page 18

Detecting the Vioxx Risk Signal

[Figure: RA (rheumatoid arthritis) Patients (14,079); Vioxx Patients (1,560); MI (myocardial infarction) Patients (1,827); Vioxx and MI (339); p-value < 1.3×10^-24]

ROR (reporting odds ratio) of 2.058, CI of [1.804, 2.349]; the χ² statistic has p-value < 10^-7.

ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = 0.06816.
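The first ROR and confidence interval on this slide can be reproduced with standard 2×2 formulas under one assumption: the 14,079 RA patients form the full cohort, and the Vioxx, MI, and overlap counts all fall inside it. The cells are then a = 339 (Vioxx and MI), b = 1,560 − 339 (Vioxx, no MI), c = 1,827 − 339 (MI, no Vioxx), and d = the remainder. ROR = ad/bc, and the 95% CI is exp(ln ROR ± 1.96·√(1/a + 1/b + 1/c + 1/d)). The 1-degree-of-freedom χ² p-value is erfc(√(χ²/2)). The slide does not say which χ² variant was used, so this sketch uses Pearson's statistic without continuity correction. Under these assumptions the output matches the reported 2.058 and [1.804, 2.349] to within rounding.

```python
from math import erfc, exp, log, sqrt

def ror_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc for a 2x2 table, with a Wald confidence interval on the log scale."""
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, exp(log(ror) - z * se), exp(log(ror) + z * se)

def chi_square_p(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))

# Counts from the slide, assuming every group is a subset of the 14,079 RA patients.
total, vioxx, mi, both = 14079, 1560, 1827, 339
a, b, c = both, vioxx - both, mi - both
d = total - a - b - c

ror, lo, hi = ror_with_ci(a, b, c, d)
stat, p = chi_square_p(a, b, c, d)
print(f"ROR={ror:.3f}, CI=[{lo:.3f}, {hi:.3f}], chi2={stat:.1f}, p={p:.2e}")
```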

Page 19

Detecting Adverse Events

Page 20

Detecting Adverse Events

Linear Space Features                          Logarithmic Space Features
Drug frequency                                 Drug frequency
Disease frequency                              Disease frequency
Observed drug-first fraction                   Observed co-mention count
Drug-first fraction z-score (fixed drug)       Co-mention count z-score (fixed drug)
Drug-first fraction z-score (fixed disease)    Co-mention count z-score (fixed disease)
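The slide lists these features but does not define them. The sketch below assumes one plausible reading. Each patient's tags are in time order. A drug and a disease are co-mentioned for a patient if both appear in that patient's series. "Drug first" means the drug's earliest mention precedes the disease's. A z-score "with the drug fixed" compares a pair's value with the same drug's values across all diseases it is co-mentioned with. The patient data and all names are hypothetical. The logarithmic-space columns presumably apply a log transform before scoring, and that step is omitted here.

```python
from statistics import mean, pstdev

# Hypothetical per-patient tag series, in time order (rofecoxib is the generic name of Vioxx).
PATIENTS = {
    "P1": ["rofecoxib", "arthritis", "myocardial infarction"],
    "P2": ["arthritis", "rofecoxib", "myocardial infarction"],
    "P3": ["myocardial infarction", "rofecoxib"],
    "P4": ["rofecoxib", "nausea"],
}
DRUGS = {"rofecoxib"}
DISEASES = {"arthritis", "myocardial infarction", "nausea"}

def pair_features(patients, drugs, diseases):
    """Return {(drug, disease): (co_mention_count, drug_first_fraction)}."""
    features = {}
    for drug in drugs:
        for disease in diseases:
            co, drug_first = 0, 0
            for series in patients.values():
                if drug in series and disease in series:
                    co += 1
                    # list.index gives the earliest mention of each tag.
                    if series.index(drug) < series.index(disease):
                        drug_first += 1
            if co:
                features[(drug, disease)] = (co, drug_first / co)
    return features

def z_score_fixed_drug(features, drug, disease, which=0):
    """z-score of one pair's feature among all pairs sharing the same drug.

    which=0 selects the co-mention count, which=1 the drug-first fraction.
    """
    values = [v[which] for (d, _), v in features.items() if d == drug]
    sd = pstdev(values)
    return 0.0 if sd == 0 else (features[(drug, disease)][which] - mean(values)) / sd

feats = pair_features(PATIENTS, DRUGS, DISEASES)
print(feats[("rofecoxib", "myocardial infarction")])
print(z_score_fixed_drug(feats, "rofecoxib", "myocardial infarction"))
```

The fixed-disease variants follow the same pattern, with the filter applied to the disease instead of the drug.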

Page 21

Detecting Adverse Events

Page 22

Detecting Off-label Use

Page 23

Annotation Analytics Landscape

Ontologies: SNOMED-CT, Gene Ontology, NCIT, ICD-9, Human Disease, Cell Type, MeSH, Drugs/Chemicals

Sets: Gene Sets, Grant Sets, Paper Sets, Patient Sets, Drug Sets, ⋮

Added to the landscape: Aging, EMRs

Health Indicator Warehouse datasets

What questions can we ask?

Page 24

Associations and outcomes

[Figure: matrix of Gene, Disease, Drug, Device, Procedure, Environment against each other; cells labeled Side effects, Off-label Indications, Enrichment]

What questions can we ask?

Page 25

Acknowledgements

• Paea LePendu • Yi Liu • Srinivasan Iyer • Steve Racunas • Anna Bauer-Mehren • Clement Jonquet • Rong Xu

• Mark Musen • NIH – NCBO funding • Mayo Team

• Hongfang Liu • Stephen Wu

• Sylvia Holland • Alex Skrenchuk

Page 26

Mining Annotations of Grants, Publications

Grants from 1972 to 2007, from 30 funding agencies

Publications from Medline (only “Journal articles”)

Page 27

Sponsorship and Allocation

Page 28

Who funds what