The Ontology for Biobanking
Chris Stoeckert, Jie Zheng, and Mathias Brochhausen
University of Pennsylvania and University of Arkansas for Medical Sciences
CTS Ontology Workshop
Charleston, SC
Sept. 24-25, 2015
Finding specimens and associated donor information across Biobanks is difficult to impossible
• Challenges:
  – Not all biobanks are the same, and their terminology needs will vary.
  – Different terminologies have been created (by biobanks and others) to cover needs. How do we leverage and share these valuable resources?
• Examples of motivating use cases:
  – What specimen types (DNA, RNA, frozen tissue in OCT) are available for research, and who is the contact person regarding access?
  – Identify cases and controls from a population of patients that have EDTA blood. Match these based on prescription and diagnosis data from the patients' EMRs. Also match basic demographic data collected at the time of recruitment.
  – Are circulating tumor cells available from breast cancer participants expressing HER+, ER-, PR- status? What are the passage numbers for the cell lines? Number of freeze/thaw cycles?
What do we gain with an Ontology for Biobanking?
• Pre-existing resources (e.g. CaTissue) only provide a partial solution.
  – The NCI Common Biorepository Model lacks robust definitions and addresses only meta-data about biospecimen collections, not individual specimen or participant data.
• The Ontology for Biomedical Investigations (OBI) and related OBO Foundry ontologies provide a basis for accommodating and integrating relevant terminologies.
  – OBI covers specimens but is interoperable with other OBO Foundry ontologies covering areas like disease (the Human Disease Ontology).
  – Provides mechanisms for referring to non-OBO Foundry ontologies and terminologies.
  – Using these, we have built an application ontology, the Ontology for Biobanking.
Ontology for Biomedical Investigations
• OBI is about capturing all aspects of a biological and clinical investigation (investigation, assay, specimen, protocol, device, data, data analysis, etc.), which provides a semantic framework to model an investigation.
• Things to know:
  – a member of the OBO Foundry (has gone through a review process)
  – interoperable with other ontologies following OBO Foundry principles, such as the Gene Ontology (GO)
  – uses the Basic Formal Ontology (BFO) as its top level ontology
  – uses the Information Artifact Ontology (IAO) for general information entities
• Details on OBI can be found at:
  – http://obi-ontology.org/
  – J Biomed Semantics. 2010. Modeling biomedical experimental processes with OBI. Ryan R Brinkman, Mélanie Courtot, Dirk Derom, Jennifer M Fostel, Yongqun He, Phillip Lord, James Malone, Helen Parkinson, Bjoern Peters, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Larisa N Soldatova, Christian J Stoeckert, Jr., Jessica A Turner, Jie Zheng, and the OBI consortium
• The release version of OBI is available on:
  – NCBO Bioportal website: http://bioportal.bioontology.org/ontologies/OBI
  – Ontobee website: http://www.ontobee.org/browser/index.php?o=OBI
• The latest release version of OBI is available at:
  – http://purl.obolibrary.org/obo/obi.owl
OBI high level structure illustrates ontology integration. No mapping needed!
[Figure: is-a hierarchy of the OBI high-level structure, with a legend distinguishing BFO, IAO, OBO, and OBI classes. Classes shown: entity; continuant; occurrent; process; independent continuant; specifically dependent continuant; generically dependent continuant; material entity; quality; role; information content entity; data item; document; textual entity; plan specification; study design; planned process; investigation; assay; specimen collection; material processing; material component separation; material combination; material maintenance; processed material; specimen; processed specimen; organization; device; biological process (GO); gross anatomical part (CARO); organism (NCBI taxonomy); molecular entity (ChEBI).]
OBI developers have driven integration by pushing for common adoption of BFO2 classes, updated releases of IAO and RO, and agreement on overlapping terms.
A strength of OBI is modeling the processes that connect biological source material to the data generated about it
[Figure: OBI assay example, measurement of glucose concentration in blood.]
OBI as basis for the Ontology for Biobanking (OBIB)
• Two independent efforts used OBI as a basis for capturing different aspects of biobanking.
  – OMIABIS: Biobank administration
  – Penn biobank ontology: patient to specimen tracking
  – These were merged without semantic conflicts into OBIB
• Not creating yet another new terminology. Instead we are leveraging the valuable work done by experts in areas we need.
  – Follow (OBO Foundry) best practices for term re-use
  – Extend this approach as integrative framework for other terminologies
  – Use these as building blocks for including new terms
Ontology for Biobanking (OBIB) is an Application Ontology based on OBI
[Figure: the OBI high-level is-a hierarchy from the earlier slide, extended for biobanking, with a legend distinguishing BFO, IAO, OBO, OBI, Biobank, and OMIABIS terms. Additional terms shown: postal address of biobank contact person; duration time of smoking; questionnaire; patient questionnaire; informed consent form; average daily use of cigarette datum; study subject survey data; medical record; patient registry data; identifier; longitudinal study design; freezer; freezer rack; collection packet; blood spot card; nicotine; nicotine material; buffy coat specimen; bone marrow specimen; fixed tissue specimen; smoking behavior; blood spotting; specimen freezing; fixation; contacting; biobank; biobank organization.]
With OBIB, we can follow a person through enrollment, getting their history and vitals, and collecting a blood specimen.
Current status of OBIB
• Summary statistics for version 2015-04-13:
  – 511 classes
  – 79 object properties
  – 38 annotation properties
• OBIB is open source and is available at:
  – https://github.com/biobanking/biobanking
  – NCBO Bioportal: http://bioportal.bioontology.org/ontologies/OBIB
  – Ontobee: http://www.ontobee.org/browser/index.php?o=OBIB
• OBIB development is being driven by biobank use cases.
  – Now from other biobanks!
Penn Medicine Biobank competency question: Obtaining matched case/control cohorts
• Query: Generate lists of potential cases and potential controls for given criteria.
• Cases are patients with Type 2 diabetes that have taken a particular prescription statin on or around the time of recruitment/specimen collection and have an EDTA specimen available. In practice, "around the time of recruitment" was estimated by a prescription within 5 to 250 days prior to the date of recruitment.
• Controls have Type 2 diabetes and have no history of taking statins in any form and must have an EDTA specimen available. Controls are matched by gender, age at recruitment, and body mass index to the cases selected.
• Non-trivial because it requires ad-hoc integration across medical records, prescription orders, case report forms, and specimen inventories.
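The case and control criteria above can be written down as a small sketch. This is only an illustration of the selection logic, not the PMBB implementation: the `Participant` record, its field names, the greedy matching strategy, and the age and BMI tolerances are all assumptions made for the example. Only the 5 to 250 day prescription window comes from the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Participant:
    """Hypothetical, already-integrated view of one participant."""
    pid: str
    gender: str
    age_at_recruitment: int
    bmi: float
    recruited_on: date
    has_type2_diabetes: bool
    has_edta_specimen: bool
    # (drug name, prescription date) for every statin order on record
    statin_orders: List[Tuple[str, date]] = field(default_factory=list)


def is_case(p: Participant, statin: str, min_days: int = 5, max_days: int = 250) -> bool:
    """Type 2 diabetes, EDTA specimen, and the given statin prescribed
    5 to 250 days before recruitment (the window stated on the slide)."""
    if not (p.has_type2_diabetes and p.has_edta_specimen):
        return False
    return any(
        name == statin and min_days <= (p.recruited_on - ordered).days <= max_days
        for name, ordered in p.statin_orders
    )


def is_control(p: Participant) -> bool:
    """Type 2 diabetes, EDTA specimen, and no statin order of any kind."""
    return p.has_type2_diabetes and p.has_edta_specimen and not p.statin_orders


def match_controls(cases, controls, age_tol: int = 5, bmi_tol: float = 2.0) -> Dict[str, str]:
    """Greedy 1:1 matching on gender, age at recruitment, and BMI.
    The tolerances are illustrative assumptions, not values from the slide."""
    available = list(controls)
    pairs: Dict[str, str] = {}
    for case in cases:
        for candidate in available:
            if (
                candidate.gender == case.gender
                and abs(candidate.age_at_recruitment - case.age_at_recruitment) <= age_tol
                and abs(candidate.bmi - case.bmi) <= bmi_tol
            ):
                pairs[case.pid] = candidate.pid
                available.remove(candidate)  # each control is used at most once
                break
    return pairs
```

The hard part in practice is not this logic but producing the integrated `Participant` view in the first place, which is what the ontology model is meant to provide.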
OBIB Model Integrates Medical Records, Case Report Forms, and Specimen Inventories
Red text indicates data in the resources to be instantiated in the ontology model.
We can expand the electronic medical record to include ICD-9 codes for diagnoses and RxNorm for drugs.
Make use of other ontologies following OBO Foundry principles, such as the Human Disease Ontology and the Drug Ontology. These have internal mappings to UMLS and RxNorm, respectively, that can be used to search ICD codes and prescription orders.
Using OBIB to find cohorts of Biobank specimens with the PMBB Carnival System
[Figure: PMBB Carnival workflow. For each relational data source, RDF conversion software uses an R2RML mapping to produce RDF. The RDF is loaded into a graph database together with an application ontology built from OBIB (Ontology for Biobanking), DRON (Drug Ontology), and DOID (Disease Ontology). The numbered steps in the figure are:]
1. With the help of local domain experts, OBO [1] ontology experts generate an ontology model using OBIB that includes the portions of OBO ontologies relevant to the data sources.
2. For each data source, local data experts reference the ontology model to create an R2RML [2] file to map the relational data and their domain knowledge to a graph format. They instantiate the OBIB model reflecting the naming convention they used for data instances that might be shared among other data sources. An RDF conversion tool uses the mapping file and the relational data to generate RDF triples.
3. The RDF data and any relevant OBO ontologies are loaded into a graph database. Data from the separate data sources are now related in accordance with the experts' domain knowledge via the ontologies.
4. Queries can be performed over the graph database by referencing the OBIB model. No specific knowledge about the structure or format of the original data is necessary. Any domain knowledge, standards conversions (e.g. SNOMED, ICD), or scientific knowledge in the OBO ontologies is available to be queried and reasoned over, even if not in the original data sources.
[1] OBO - Open Biological and Biomedical Ontologies. <http://www.obofoundry.org/>
[2] R2RML - RDB to RDF Mapping Language. <http://www.w3.org/2001/sw/rdb2rdf/>
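The four steps above can be illustrated with a toy, standard-library-only sketch. It is not the Carnival system: the dictionaries in `MAPPINGS` are simplified stand-ins for R2RML files, a Python set of triples stands in for the graph database, and every table name, column name, `ex:` predicate, and `http://example.org/` IRI is hypothetical. The point is only that two separate tables become queryable together once both are mapped to the same participant IRIs.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace for instance IRIs

# Hypothetical, simplified stand-ins for R2RML mappings, one per data source.
MAPPINGS = [
    {
        "query": "SELECT specimen_id, donor_id, additive FROM specimen_inventory",
        "subject": EX + "specimen/{specimen_id}",
        "type": "ex:Specimen",
        "predicates": {
            "ex:collectedFrom": EX + "participant/{donor_id}",
            "ex:additive": "{additive}",
        },
    },
    {
        "query": "SELECT patient_id, diagnosis FROM emr_diagnosis",
        "subject": EX + "participant/{patient_id}",
        "type": "ex:Participant",
        "predicates": {"ex:diagnosis": "{diagnosis}"},
    },
]


def rows_to_triples(conn, mapping):
    """Step 2: turn each relational row into (subject, predicate, object) triples."""
    cur = conn.execute(mapping["query"])
    columns = [c[0] for c in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = mapping["subject"].format(**record)
        yield (subject, "rdf:type", mapping["type"])
        for predicate, template in mapping["predicates"].items():
            yield (subject, predicate, template.format(**record))


def match(graph, s=None, p=None, o=None):
    """Step 4: minimal triple-pattern lookup; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def build_demo_graph():
    """Step 3: load triples from both sources into one in-memory 'graph'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE specimen_inventory (specimen_id TEXT, donor_id TEXT, additive TEXT);
        CREATE TABLE emr_diagnosis (patient_id TEXT, diagnosis TEXT);
        INSERT INTO specimen_inventory VALUES ('S1', 'P1', 'EDTA'), ('S2', 'P2', 'heparin');
        INSERT INTO emr_diagnosis VALUES ('P1', 'type 2 diabetes'), ('P2', 'type 2 diabetes');
    """)
    graph = set()
    for mapping in MAPPINGS:
        graph.update(rows_to_triples(conn, mapping))
    conn.close()
    return graph


def participants_with(graph, diagnosis, additive):
    """Participants who have the diagnosis AND donated a specimen with the additive."""
    diagnosed = {s for s, _, _ in match(graph, p="ex:diagnosis", o=diagnosis)}
    specimens = {s for s, _, _ in match(graph, p="ex:additive", o=additive)}
    donors = {o for s, _, o in match(graph, p="ex:collectedFrom") if s in specimens}
    return sorted(diagnosed & donors)
```

Calling `participants_with(build_demo_graph(), "type 2 diabetes", "EDTA")` answers a question that spans the specimen inventory and the medical record without the caller knowing either table's layout. A real deployment would use an R2RML processor, an RDF store, and SPARQL in place of these stand-ins.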
Future directions for OBIB: Provide a framework for collaboration
• Collaborating across biobanks
  – Promoting OBIB as a mechanism to find common semantics (based on reality, not what is stored in a database) between biobanks
  – Started a CTSA-based collaboration between Duke, Michigan, MUSC, Penn, and UAMS.
  – Want to extend to others (NCI–BBRB)
• Collaborating with the Informed Consent Ontology
• Currently working on integrating Duke terminology
  – Identify common terms
    • Duke: sample <-> OBI: specimen
    • Duke: collect <-> OBI: collecting specimen from organism
  – Build Duke terms from OBI/OBIB terms and logical axioms (Defined classes!)
    • Duke: sample family = (submitted to OBI): collection of specimens that are the output of material processing of the same input.
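As a rough illustration of what a defined class buys, the "sample family" definition above can be applied mechanically: membership follows from the recorded processing events rather than from a hand-assigned label. The sketch below assumes hypothetical event records with `input` and `output` keys; it is not how OBI or Duke represent the data.

```python
from collections import defaultdict


def sample_families(processing_events):
    """Group output specimens by the input material they were processed from.

    Each event is a dict with hypothetical keys 'input' and 'output'.
    Specimens sharing an input form one 'sample family' under the
    definition quoted above.
    """
    families = defaultdict(set)
    for event in processing_events:
        families[event["input"]].add(event["output"])
    return dict(families)


events = [
    {"input": "blood-001", "output": "plasma-001"},
    {"input": "blood-001", "output": "buffy-coat-001"},
    {"input": "blood-002", "output": "plasma-002"},
]
families = sample_families(events)
```

In an OWL setting the same grouping would be inferred by a reasoner from the class definition; the dictionary here just makes the rule concrete.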
Future Directions for OBIB: Extend coverage of related terminologies through OBI
• How can we incorporate LOINC?
  – We might start with CHEM/Lab Class/Type in LOINC.
• For example:
  – 5792-7: Glucose [Mass/volume] in Urine by Test strip
  – 41653-7: Glucose [Mass/volume] in Capillary blood by Glucometer
  – LOINC 41653-7 is very related to OBI_000418: measuring glucose concentration in blood serum (example shown earlier).
• Generalizing:
  – LOINC: component system method ::: OBI: analyte assay with evaluant and specified input
  – In this example, analyte = glucose, evaluant = blood, specified input = glucometer
  – Note: the [Mass/volume] in LOINC is specified in the OBI output of measurement datum (measurement unit label = milligram per milliliter in this example).
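The generalization above amounts to a field-by-field correspondence, which can be sketched as a pair of record types. The class and field names below are made up for the example, and the mapping simply copies the LOINC parts across as lowercase labels; it does not resolve them to actual OBI, ChEBI, or UBERON terms, which a real mapping would need to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoincTerm:
    """The LOINC parts used on the slide (hypothetical field names)."""
    code: str
    component: str  # e.g. Glucose
    prop: str       # e.g. Mass/volume
    system: str     # e.g. Capillary blood
    method: str     # e.g. Glucometer


@dataclass(frozen=True)
class ObiAssayPattern:
    """Slots of the OBI analyte assay pattern as named on the slide."""
    analyte: str
    evaluant: str
    specified_input: str
    measured_property: str  # carried by the output measurement datum


def loinc_to_obi(term: LoincTerm) -> ObiAssayPattern:
    # component -> analyte, system -> evaluant, method -> specified input;
    # the LOINC property is recorded on the measurement datum, not the assay.
    return ObiAssayPattern(
        analyte=term.component.lower(),
        evaluant=term.system.lower(),
        specified_input=term.method.lower(),
        measured_property=term.prop,
    )


glucometer_glucose = LoincTerm("41653-7", "Glucose", "Mass/volume", "Capillary blood", "Glucometer")
pattern = loinc_to_obi(glucometer_glucose)
```

Because every CHEM/Lab term has the same parts, the same function could be run over a whole LOINC class to draft candidate assay terms for curator review.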
Acknowledgements
• Heather Williams (PMBB)
• David Birtwell (PMBB)
• OBI Consortium
• OBO Foundry
• Anna Maria Masci (Duke)
• Helena Ellis (Duke)
CHEM/Lab class/type in LOINC fits Assay in OBI
Established design pattern for assay can be used to programmatically add new analyte assays (and other types).
Hierarchy of OBI assay terms can provide CDISC desired level of granularity
[Figure: current classes. The OBI high-level is-a hierarchy (legend: BFO, IAO, OBO, OBI) with the assay branch expanded to show analyte assay, clinical chemistry assay, and measuring glucose conc. in blood serum.]
Hierarchy of OBI assay terms can provide CDISC desired level of granularity
[Figure: current and proposed classes. The same hierarchy (legend: BFO, IAO, OBO, OBI, LOINC) with the assay branch showing analyte assay, clinical chemistry assay, clinical glucose assay, measuring glucose conc. in blood serum, and the LOINC terms 5792-7: Glucose [Mass/volume] in Urine by Test strip and 41653-7: Glucose [Mass/volume] in Capillary blood by Glucometer.]