The Ontology for Biobanking Chris Stoeckert, Jie Zheng, and Mathias Brochhausen University of Pennsylvania and University of Arkansas for Medical Sciences CTS Ontology Workshop Charleston, SC Sept. 24-25, 2015

Page 1: The Ontology for Biobanking Chris Stoeckert, Jie Zheng, and Mathias Brochhausen University of Pennsylvania and University of Arkansas for Medical Sciences

The Ontology for Biobanking

Chris Stoeckert, Jie Zheng, and Mathias Brochhausen
University of Pennsylvania and University of Arkansas for Medical Sciences

CTS Ontology Workshop
Charleston, SC

Sept. 24-25, 2015

Page 2

Finding specimens and associated donor information across Biobanks is difficult to impossible

• Challenges:
  – Not all biobanks are the same, and their terminology needs will vary.
  – Different terminologies have been created (by biobanks and others) to cover needs. How do we leverage and share these valuable resources?
• Examples of motivating use cases:
  – What specimen types (DNA, RNA, Frozen Tissue in OCT) are available for research, and who is the contact person regarding access?
  – Identify cases and controls from a population of patients that have EDTA blood. Match these based on prescription and diagnosis data from the patient's EMR. Also match basic demographic data collected at the time of recruitment.
  – Are circulating tumor cells available from breast cancer participants expressing HER+, ER-, PR- status? What are the passage numbers for the cell lines? Number of freeze/thaw cycles?

Page 3

What do we gain with an Ontology for Biobanking?

• Pre-existing resources (e.g. CaTissue) only provide a partial solution.
  – The NCI Common Biorepository Model lacks robust definitions and addresses only meta-data about biospecimen collections, not individual specimen or participant data.
• The Ontology for Biomedical Investigations (OBI) and related OBO Foundry ontologies provide a basis for accommodating and integrating relevant terminologies.
  – OBI covers specimens but is interoperable with other OBO Foundry ontologies covering areas like disease (The Human Disease Ontology).
  – Provides mechanisms for referring to non-OBO Foundry ontologies and terminologies.
  – Using these, we have built an application ontology, the Ontology for Biobanking.

Page 4

Ontology for Biomedical Investigations

• OBI is about capturing all aspects of a biological and clinical investigation (investigation, assay, specimen, protocol, device, data, data analysis, etc.), which provides a semantic framework to model an investigation.
• Things to know:
  – a member of the OBO Foundry (has gone through a review process)
  – interoperable with other ontologies following OBO Foundry principles, such as the Gene Ontology (GO)
  – uses the Basic Formal Ontology (BFO) as its top-level ontology
  – uses the Information Artifact Ontology (IAO) for general information entities
• Details on OBI can be found at:
  – http://obi-ontology.org/
  – J Biomed Semantics. 2010. Modeling biomedical experimental processes with OBI. Ryan R Brinkman, Mélanie Courtot, Dirk Derom, Jennifer M Fostel, Yongqun He, Phillip Lord, James Malone, Helen Parkinson, Bjoern Peters, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Larisa N Soldatova, Christian J Stoeckert, Jr., Jessica A Turner, Jie Zheng, and the OBI consortium
• The release version of OBI is available on:
  – NCBO Bioportal website: http://bioportal.bioontology.org/ontologies/OBI
  – Ontobee website: http://www.ontobee.org/browser/index.php?o=OBI
• The link to the latest release version of OBI is:
  – http://purl.obolibrary.org/obo/obi.owl

Page 5

OBI high-level structure illustrates ontology integration. No mapping needed!

[Figure: "is a" hierarchy of high-level OBI classes, with a legend marking the source of each class (BFO, IAO, OBO, OBI). Classes shown: entity; continuant; occurrent; independent continuant; specifically dependent continuant; generically dependent continuant; process; quality; role; material entity; information content entity; planned process; data item; document; textual entity; plan specification; study design; investigation; assay; specimen collection; material processing; material component separation; material combination; material maintenance; processed material; specimen; processed specimen; organization; device; biological process (GO); gross anatomical part (CARO); organism (NCBI taxonomy); molecular entity (ChEBI).]

OBI developers have driven integration by pushing for common adoption of BFO2 classes, updated releases of IAO and RO, and agreement on overlapping terms.

Page 6

A strength of OBI is modeling the processes that connect biological source material to the data generated about it

[Figure: OBI assay model for the measurement of glucose concentration in blood.]

Page 7

OBI as basis for the Ontology for Biobanking (OBIB)

• Two independent efforts used OBI as a basis for capturing different aspects of biobanking.
  – OMIABIS: Biobank administration
  – Penn biobank ontology: patient to specimen tracking
  – These were merged without semantic conflicts into OBIB
• Not creating yet another new terminology. Instead we are leveraging the valuable work done by experts in areas we need.
  – Follow (OBO Foundry) best practices for term re-use
  – Extend this approach as an integrative framework for other terminologies
  – Use these as building blocks for including new terms

Page 8

Ontology for Biobanking (OBIB) is an Application Ontology based on OBI

[Figure: the "is a" hierarchy of high-level OBI classes (information content entity, planned process, data item, investigation, specimen collection, material component separation, plan specification, material combination, assay, document, textual entity, material processing, material entity, biological process (GO), material maintenance, processed material, specimen, gross anatomical part (CARO), organism (NCBI taxonomy), molecular entity (ChEBI), organization, device, processed specimen, study design, process, specifically dependent continuant, independent continuant, generically dependent continuant, quality, role), extended with biobanking terms. Legend: BFO, IAO, OBO, OBI, Biobank, OMIABIS. Terms added: postal address of biobank contact person; duration time of smoking; questionnaire; informed consent form; freezer; freezer rack; average daily use of cigarette datum; patient questionnaire; collection packet; nicotine; nicotine material; blood spot card; buffy coat specimen; smoking behavior; blood spotting; study subject survey data; medical record; patient registry data; biobank; biobank organization; fixed tissue specimen; contacting; fixation; longitudinal study design; identifier; bone marrow specimen; specimen freezing.]

Page 9

With OBIB, we can follow a person through enrollment, getting their history and vitals, and collecting a blood specimen.

Page 10

Current status of OBIB

• Summary statistics for version 2015-04-13:
  – 511 classes
  – 79 object properties
  – 38 annotation properties
• OBIB is open source and is available at:
  – https://github.com/biobanking/biobanking
  – NCBO Bioportal: http://bioportal.bioontology.org/ontologies/OBIB
  – Ontobee: http://www.ontobee.org/browser/index.php?o=OBIB
• OBIB development is being driven by biobank use cases.
  – Now from other biobanks!

Page 11

Penn Medicine Biobank competency question: Obtaining matched case/control cohorts

• Query: Generate lists of potential cases and potential controls for given criteria.

• Cases are patients with Type 2 diabetes that have taken a particular prescription statin on or around the time of recruitment/specimen collection and have an EDTA specimen available. In practice, "around the time of recruitment" was estimated by a prescription within 5 to 250 days prior to the date of recruitment.

• Controls have Type 2 diabetes and have no history of taking statins in any form and must have an EDTA specimen available. Controls are matched by gender, age at recruitment, and body mass index to the cases selected.

• Non-trivial because it requires ad-hoc integration across medical records, prescription orders, case report forms, and specimen inventories.
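The case and control criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PMBB implementation: the `Patient` record, its field names, and the `classify` helper are hypothetical, and the data are assumed to be already integrated into one record per patient. Matching controls to cases by gender, age at recruitment, and body mass index is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Patient:
    # Hypothetical, already-integrated view of one patient.
    patient_id: str
    has_type2_diabetes: bool
    has_edta_specimen: bool
    recruited_on: date
    # Statin name -> dates on which that statin was prescribed.
    statin_rx: Dict[str, List[date]] = field(default_factory=dict)


def classify(p: Patient, target_statin: str,
             min_days: int = 5, max_days: int = 250) -> Optional[str]:
    """Return "case", "control", or None if the patient fits neither list."""
    if not (p.has_type2_diabetes and p.has_edta_specimen):
        return None
    # Controls have no history of taking statins in any form.
    if not any(dates for dates in p.statin_rx.values()):
        return "control"
    # Cases have the target statin prescribed 5 to 250 days before recruitment.
    for rx_date in p.statin_rx.get(target_statin, []):
        days_before = (p.recruited_on - rx_date).days
        if min_days <= days_before <= max_days:
            return "case"
    return None


recruited = date(2015, 6, 1)
on_statin = Patient("P1", True, True, recruited, {"statin A": [date(2015, 3, 1)]})
no_statin = Patient("P2", True, True, recruited)
too_recent = Patient("P3", True, True, recruited, {"statin A": [date(2015, 5, 30)]})
```

Here `"statin A"` is a placeholder name. A patient on some other statin, or on the target statin outside the window, lands in neither list.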

Page 12

OBIB Model Integrates Medical Records, Case Report Forms, and Specimen Inventories

Red text indicates data in the resources to be instantiated in the ontology model.

Page 13

We can expand the electronic medical record to include ICD-9 codes for diagnoses and RxNorm for drugs.

Make use of other ontologies following OBO Foundry principles, such as the Human Disease Ontology and the Drug Ontology. These have internal mappings to UMLS and RxNorm, respectively, that can be used to search ICD codes and prescription orders.

Page 14

Using OBIB to find cohorts of Biobank specimens with the PMBB Carnival System

[Diagram: an application ontology is assembled from OBIB (Ontology for Biobanking), OBI, DRON (Drug Ontology), and DOID (Disease Ontology). Relational data and R2RML mapping files feed RDF conversion software, which produces RDF that is loaded into a graph database.]

1. With the help of local domain experts, OBO [1] ontology experts generate an ontology model using OBIB that includes the portions of OBO ontologies relevant to the data sources.

2. For each data source, local data experts reference the ontology model to create an R2RML [2] file to map the relational data and their domain knowledge to a graph format. They instantiate the OBIB model reflecting the naming convention they used for data instances that might be shared among other data sources. An RDF conversion tool uses the mapping file and the relational data to generate RDF triples.

3. The RDF data and any relevant OBO ontologies are loaded into a graph database. Data from the separate data sources are now related in accordance with the experts' domain knowledge via the ontologies.

4. Queries can be performed over the graph database by referencing the OBIB model. No specific knowledge about the structure or format of the original data is necessary. Any domain knowledge, standards conversions (e.g. SNOMED, ICD), or scientific knowledge in the OBO ontologies is available to be queried and reasoned over, even if not in the original data sources.

[1] OBO - Open Biological and Biomedical Ontologies. <http://www.obofoundry.org/>
[2] R2RML - RDB to RDF Mapping Language. <http://www.w3.org/2001/sw/rdb2rdf/>
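To make steps 2 to 4 concrete, here is a toy sketch in plain Python. It is not R2RML and not the Carnival System: the mapping functions stand in for an R2RML file, a Python set stands in for the graph database, and every IRI and predicate (`http://example.org/`, `ex:derivesFrom`, `ex:specimenType`, `ex:hasDiagnosisCode`) is a made-up placeholder rather than an OBIB or OBO term. The one idea it tries to show is that two independent sources become linked once both mappings use the same naming convention for patient instances.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # hypothetical namespace, not an OBO IRI


def patient_iri(patient_id: str) -> str:
    # Shared naming convention: every source mints the same IRI per patient.
    return f"{EX}patient/{patient_id}"


def map_specimens(rows: Iterable[Dict[str, str]]) -> Set[Triple]:
    """Stand-in for the mapping of a specimen inventory table."""
    triples: Set[Triple] = set()
    for row in rows:
        specimen = f"{EX}specimen/{row['specimen_id']}"
        triples.add((specimen, "ex:derivesFrom", patient_iri(row["patient_id"])))
        triples.add((specimen, "ex:specimenType", row["specimen_type"]))
    return triples


def map_diagnoses(rows: Iterable[Dict[str, str]]) -> Set[Triple]:
    """Stand-in for the mapping of a medical-record diagnosis table."""
    return {(patient_iri(r["patient_id"]), "ex:hasDiagnosisCode", r["icd9_code"])
            for r in rows}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Triple-pattern lookup; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def patients_with(graph: Set[Triple], icd9_code: str, specimen_type: str) -> List[str]:
    """Patients that have the diagnosis code and donated the specimen type."""
    diagnosed = {s for s, _, _ in match(graph, p="ex:hasDiagnosisCode", o=icd9_code)}
    specimens = {s for s, _, _ in match(graph, p="ex:specimenType", o=specimen_type)}
    donors = {o for s, _, o in match(graph, p="ex:derivesFrom") if s in specimens}
    return sorted(diagnosed & donors)


# Illustrative rows from two separate sources.
inventory = [
    {"specimen_id": "S1", "patient_id": "P1", "specimen_type": "EDTA blood"},
    {"specimen_id": "S2", "patient_id": "P3", "specimen_type": "EDTA blood"},
]
records = [
    {"patient_id": "P1", "icd9_code": "250.00"},
    {"patient_id": "P2", "icd9_code": "250.00"},
    {"patient_id": "P3", "icd9_code": "401.9"},
]
graph = map_specimens(inventory) | map_diagnoses(records)
```

The query in `patients_with` never mentions table or column names, which is the point of step 4. In a real deployment the lookups would be SPARQL queries phrased against OBIB classes and relations.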

Page 15

Future directions for OBIB: Provide a framework for collaboration

• Collaborating across biobanks
  – Promoting OBIB as a mechanism to find common semantics (based on reality, not what is stored in a database) between biobanks
  – Started a CTSA-based collaboration between Duke, Michigan, MUSC, Penn, UAMS.
  – Want to extend to others (NCI–BBRB)
• Collaborating with the Informed Consent Ontology
• Currently working on integrating Duke terminology
  – Identify common terms
    • Duke: sample <-> OBI: specimen
    • Duke: collect <-> OBI: collecting specimen from organism
  – Build Duke terms from OBI/OBIB terms and logical axioms (Defined classes!)
    • Duke: sample family = (submitted to OBI): collection of specimens that are the output of material processing of the same input.
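The "sample family" definition above amounts to grouping output specimens by the input they were processed from. A minimal sketch, assuming each material processing event is available as a hypothetical `(input_id, output_specimen_id)` pair; in the ontology itself this would be expressed as a defined class with logical axioms, not as code:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sample_families(events: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group output specimens by the input material they were derived from."""
    families: Dict[str, Set[str]] = defaultdict(set)
    for input_id, output_specimen_id in events:
        families[input_id].add(output_specimen_id)
    return dict(families)


# Illustrative identifiers only.
events = [
    ("blood-1", "plasma-1"),
    ("blood-1", "buffy-coat-1"),
    ("blood-2", "plasma-2"),
]
families = sample_families(events)
```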

Page 16

Future Directions for OBIB: Extend coverage of related terminologies through OBI

• How can we incorporate LOINC?
  – We might start with CHEM/Lab Class/Type in LOINC.
• For example:
  – 5792-7: Glucose [Mass/volume] in Urine by Test strip
  – 41653-7: Glucose [Mass/volume] in Capillary blood by Glucometer
  – LOINC 41653-7 is closely related to OBI_000418: measuring glucose concentration in blood serum (example shown earlier).
• Generalizing:
  – The LOINC component, system, and method correspond to an OBI analyte assay with its evaluant and specified input.
  – In this example, analyte = glucose, evaluant = blood, specified input = glucometer.
  – Note: the [Mass/volume] in LOINC is specified in the OBI output of measurement datum (measurement unit label = milligram per milliliter in this example).
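The correspondence above can be sketched as a small parser that splits a LOINC long name of the form "component [property] in system by method" and fills the slots of the OBI analyte assay pattern. This is a rough illustration: the regular expression only handles names shaped like the two examples here, and the `AnalyteAssay` record and its field names are hypothetical, not OBI terms or an OBI API.

```python
import re
from dataclasses import dataclass

# Matches names shaped like "Glucose [Mass/volume] in Urine by Test strip".
LOINC_NAME = re.compile(
    r"^(?P<component>.+?) \[(?P<prop>[^\]]+)\] in (?P<system>.+?) by (?P<method>.+)$"
)


@dataclass(frozen=True)
class AnalyteAssay:
    analyte: str          # from the LOINC component
    evaluant: str         # from the LOINC system
    specified_input: str  # from the LOINC method, following the slide
    datum_kind: str       # from the LOINC property, carried by the output datum


def loinc_to_assay(long_name: str) -> AnalyteAssay:
    m = LOINC_NAME.match(long_name.strip())
    if m is None:
        raise ValueError(f"unsupported LOINC name shape: {long_name!r}")
    return AnalyteAssay(
        analyte=m["component"].lower(),
        evaluant=m["system"].lower(),
        specified_input=m["method"].lower(),
        datum_kind=m["prop"].replace(" ", ""),
    )


urine = loinc_to_assay("Glucose [Mass/volume] in Urine by Test strip")
capillary = loinc_to_assay("Glucose [Mass/ volume] in Capillary blood by Glucometer")
```

A programmatic import along these lines would still need each slot resolved to an ontology class (for example a ChEBI molecular entity for the analyte), which this sketch does not attempt.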

Page 17

Acknowledgements

• Heather Williams (PMBB)
• David Birtwell (PMBB)
• OBI Consortium
• OBO Foundry
• Anna Maria Masci (Duke)
• Helena Ellis (Duke)

Page 18

CHEM/Lab class/type in LOINC fits Assay in OBI

Established design pattern for assay can be used to programmatically add new analyte assays (and other types).

Page 19

Hierarchy of OBI assay terms can provide CDISC desired level of granularity

[Figure: the "is a" hierarchy of high-level OBI classes (legend: BFO, IAO, OBO, OBI), with the assay branch expanded to show current classes: assay > analyte assay > clinical chemistry assay > measuring glucose conc. in blood serum.]

Page 20

Hierarchy of OBI assay terms can provide CDISC desired level of granularity

[Figure: the same "is a" hierarchy (legend: BFO, IAO, OBO, OBI, LOINC), now showing current and proposed classes under assay: analyte assay, clinical chemistry assay, clinical glucose assay, and measuring glucose conc. in blood serum, together with the LOINC terms 5792-7: Glucose [Mass/volume] in Urine by Test strip and 41653-7: Glucose [Mass/volume] in Capillary blood by Glucometer.]