Ontology of imaging datasets as a prerequisite for ontologies of imaging
biomarkers
Bernard Gibaud
MediCIS, LTSI, U1099 Inserm, Faculté de médecine, Rennes
Ontology and Imaging Informatics Workshop, 23-25 June 2014, Amherst (NY)
Acknowledgements
• Former partners of the NeuroLOG project (supported by ANR)
• CrEDIBLE project (CNRS initiative for Big Data in science), and my colleagues from this project
• Former colleagues of the DICOM WG6 and WG23, especially David Clunie and Larry Tarbox
Gilles Kassel (Amiens), Michel Dojat (Grenoble), Bénédicte Batrancourt (Paris), Lynda Temal (Paris), Johan Montagnat and Alban Gaignard (Sophia-Antipolis)
Overview
• Introduction (scope and motivations)
• Part 1. Modeling datasets
• Part 2. Modeling dataset-related actions
• Part 3. Modeling imaging biomarkers
• Conclusion
Introduction
Scope and motivations
Imaging biomarkers
• Definition of biomarkers (Atkinson 2001)*
  – "characteristics that are objectively measured and evaluated as indicators of
    • normal biological processes,
    • pathological processes,
    • pharmaceutical responses to a therapeutic intervention"
• Definition of imaging biomarkers
  – Derived from medical images
* Clin Pharmacol & Ther. 2001 Mar;69(3):89-95. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Biomarkers Definitions Working Group.
Imaging biomarkers
• Of critical importance in research
  – Focused clinical research (e.g. controlled clinical trials)
  – Translational research
    • Link/correlate results obtained in various domains
    • Need to share them at a broad scale: federated imaging biobanks (incl. imaging biomarkers)
• Of critical importance in (future) care delivery
  – Involved in decision criteria (with other biomarkers)
    • Diagnosis
    • Therapy (prognosis)
  – Key aspect of a structured EHR / task planning
General framework
Reality → [Acquisition] → Images → [Processing] → Imaging biomarkers → [Decision] → Facts, Plans, etc.
• Reality: Human subject, Animal subject, Specimen, etc.
• Images: MR image, CT image, PET image, etc.
• Imaging biomarkers: Volume of anatomical structure, Fractal dimension, Mean reg. blood volume, Lesion load (MS), etc.
• Facts, Plans, etc.: Diagnosis of AD, Diagnosis of MS, Resp. to treatment, etc.
Importance of context
[Diagram relating the following elements]
• Scientific question to be answered, or Clinical question
• Set of required imaging biomarkers
• Decision
• Detailed imaging protocol
• Processing
• Acquisition
• Detailed subject/specimen preparation
Need for standards
• Standard vocabulary used as metadata to consistently refer to:
  – Imaged objects and phenomena (in reality)
  – Image acquisition and image processing artifacts (devices, software)
  – Images and any relevant datasets resulting from image processing
  – Imaging biomarkers
  – Context and motivation of acquisition / processing of images
• Standard formats for images
  – e.g. DICOM, NIfTI, TIFF, GIF, JPEG
What kind of vocabulary?
• What metadata structure?
  – Simple hierarchy of related terms: example *
  – Complex hierarchies of data items: example DICOM
    • DICOM Part 3: Information Object Definition / Module / Data Element (1260 pages)
    • DICOM Part 16: Terminology + Structured Report templates (1034 pages)
  – Formal vocabulary (ontology)
    • Set of related complementary markers: ontology versus data model?
• What method of development?
  – Defined as a standard (by a standards development body, e.g. DICOM)
  – or freely extendable by users? Example *
* Plant et al. New concepts for building vocabulary for cell image ontologies. BMC Bioinformatics 2011, 12:487.
Part 1. Modeling of datasets
Experience from the NeuroBase and NeuroLOG projects
Goals of the NeuroLOG project (mid-2007 to end-2010)
• To set up a federated system, allowing the sharing and re-use of:
  – Neuroimaging data (images and related technical, demographic and medical metadata)
  – Processing tools published by cooperating partners
• Ontology modeled according to the OntoSpec methodology *
  – Based on DOLCE
  – OntoSpec semi-formal document
* Kassel 2005
Example of OntoSpec representation
OntoNeuroLOG ontology
• Three modules in OWL (available on BioPortal since 09/2013)
  – Dataset processing (ONL-DP)
    • Includes an ontology of Datasets
  – MR dataset acquisition (ONL-MR-DA)
  – Mental state assessment (ONL-MSA)
• OntoSpec documents online as well
Ontology of datasets: scope
• Primarily focusing on (but not limited to) clinical imaging modalities (CT, MRI, etc.)
• Also considers "processed images", e.g. results of segmentation, registration, diffusion tensors, fiber tracts, etc.
Ontology of datasets: approach
• Based on DOLCE and a number of core ontologies
  – Language, IEC (Inscription, Expression & Conceptualization), D&M (Discourse & Message), etc.
• Considers Dataset as a Proposition (i.e. document content)
  – is expressed by an Expression (i.e. representation in a particular format)
  – is physically realized by an Inscription (e.g. one or more files)
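The three-level view above (content, format-specific representation, physical files) can be sketched as a small data structure. This is an illustrative sketch only, not the OntoNeuroLOG OWL model: the class and attribute names (`Inscription`, `Expression`, `Dataset`, `format_name`, `file_paths`) and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Inscription:
    """Physical realization of an expression, e.g. one or more files."""
    file_paths: Tuple[str, ...]  # hypothetical attribute


@dataclass(frozen=True)
class Expression:
    """Representation of a dataset in a particular format."""
    format_name: str           # e.g. "DICOM" or "NIfTI"
    realized_by: Inscription   # "is physically realized by"


@dataclass
class Dataset:
    """Dataset seen as a proposition, i.e. document content."""
    label: str
    expressed_by: List[Expression] = field(default_factory=list)


# One content, two formats, each realized by its own files (made-up names).
t1 = Dataset("T1-weighted MR dataset")
t1.expressed_by.append(
    Expression("DICOM", Inscription(("t1_0001.dcm", "t1_0002.dcm")))
)
t1.expressed_by.append(Expression("NIfTI", Inscription(("t1.nii",))))

formats = [e.format_name for e in t1.expressed_by]
```

Keeping the content separate from its expressions means that converting a file from one format to another adds an `Expression` without creating a new `Dataset`.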
Ontology of datasets: major choices
• Fundamentally composed of two parts *:
  – a Dataset set of values part (aggregate of atomic values)
  – a Dataset metadata part
• Datasets may be categorized along five classification axes, based on
  – Imaging modality (e.g. CT dataset, MR dataset, PET dataset)
  – Some processing that generates them (e.g. Segmentation, Registration)
  – What is being explored (e.g. Anatomical dataset, Functional dataset, Metabolic dataset)
  – Number of subjects that it characterizes (e.g. Single subject dataset, Multiple subjects dataset)
  – Reconstructed dataset or Non-reconstructed dataset (i.e. raw data)
* Temal et al., JBI 2008
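A minimal sketch of how the five classification axes could be carried as independent metadata fields and used for retrieval. The names `DatasetDescription`, `select`, `produced_by` and the catalog entries are assumptions made for illustration; the ontology itself expresses these axes as class hierarchies, not as record fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Modality(Enum):
    CT = "CT dataset"
    MR = "MR dataset"
    PET = "PET dataset"


class Exploration(Enum):
    ANATOMICAL = "Anatomical dataset"
    FUNCTIONAL = "Functional dataset"
    METABOLIC = "Metabolic dataset"


@dataclass(frozen=True)
class DatasetDescription:
    name: str
    modality: Optional[Modality]        # axis 1: imaging modality
    produced_by: Optional[str]          # axis 2: generating processing, if any
    exploration: Optional[Exploration]  # axis 3: what is being explored
    multiple_subjects: bool             # axis 4: single vs multiple subjects
    reconstructed: bool                 # axis 5: reconstructed vs raw data


def select(datasets: List[DatasetDescription], **criteria) -> List[DatasetDescription]:
    """Keep the datasets whose fields equal every given criterion."""
    return [
        d for d in datasets
        if all(getattr(d, key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog entries.
catalog = [
    DatasetDescription("t1", Modality.MR, None, Exploration.ANATOMICAL, False, True),
    DatasetDescription("t1_seg", Modality.MR, "Segmentation", Exploration.ANATOMICAL, False, True),
    DatasetDescription("pet_raw", Modality.PET, None, Exploration.METABOLIC, False, False),
]

mr_names = [d.name for d in select(catalog, modality=Modality.MR)]
raw_names = [d.name for d in select(catalog, reconstructed=False)]
```

Because the axes are orthogonal, a dataset is described by one value per axis and any combination of axes can serve as a query.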
Examples
[Figures: Taxonomy of MR dataset; Taxonomy of Parameter quantification dataset]
Dataset: unity criterion
• What determines the content of a dataset?
Example 1: images, i.e. sets of elementary data values (pixels / voxels)
  – ‘Basic 2D image’ (‘frame’ in DICOM jargon)
  – or ‘Set of 2D images’ (‘stack’ in DICOM jargon)
  – or ‘All images acquired in a single acquisition’ (‘multi-frame image’ in DICOM jargon)
Dataset: unity criterion
• What determines the content of a dataset?
Example 2: tractographic data
  – A ‘particular tract’ connecting 2 voxels
  – or ‘All tracts extracted by a tractographic algorithm’
Dataset: unity criterion
• Our (pragmatic) choice
  – Data values obtained during a single acquisition / processing
  – And pertaining to a single subject or group of subjects
• Why?
  – Strongly related to provenance
  – Facilitates datasets’ reuse
    • Subject-oriented retrieval
    • Single entry of image processing tools
• But …
  – May require further description of dataset structure
  – e.g. MR segmentation image with multiple ROIs
Datasets: identity criterion
• How to distinguish two dataset instances?
• Our (pragmatic) choice
  – Based on creation context (rather than actual data values)
  – Consequences
    • Two acquisitions always result in distinct datasets
    • Datasets are immutable: any dataset processing results in a new dataset
• In practice, how to identify and re-identify datasets?
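The two consequences above can be illustrated with a small sketch in which equality depends only on the creation context and instances cannot be modified. This is an assumption-laden illustration, not the NeuroLOG implementation: the `Dataset` class, the `rescale` function and the use of a random UUID as a stand-in for the creation context are all hypothetical, and the sketch does not answer the open question of how to re-identify datasets in practice.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Tuple
import uuid


@dataclass(frozen=True)
class Dataset:
    # Data values are excluded from comparison: identity is not value-based.
    values: Tuple[float, ...] = field(compare=False)
    # Stand-in for the creation context (one per acquisition or processing).
    creation_context: str = field(default_factory=lambda: uuid.uuid4().hex)


def rescale(source: Dataset, factor: float) -> Dataset:
    """Processing never edits its input; it returns a new dataset."""
    return Dataset(values=tuple(v * factor for v in source.values))


first = Dataset((1.0, 2.0))
second = Dataset((1.0, 2.0))       # same values, separate "acquisition"
copy_like = rescale(first, 1.0)    # same values, new creation context

try:
    first.values = (0.0,)          # immutability: assignment is refused
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
```

Here `first` and `second` compare as different although their values coincide, and `copy_like` is a distinct dataset even though the processing left the values unchanged.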
Ontology of datasets: still open issues
How do datasets’ values relate to each other?
  – Case 1: Multidimensional matrix: a 2D or 3D map of a given parameter (spatio-temporal map)
    Modeling with the notion of field *
    • The same measurement is made at every point of a sampling grid
    • Not limited to scalars, can also concern vectors and tensors
    First try made in the context of DICOM WG23 **
    Needs to be modeled / revisited as a real ontology
    • Measured qualities, e.g. MR intensity signal, proton density, density in HU
    • Scales of measurement
    • Sampling grids
    • Measurement of time-dependent phenomena
* W Kuhn, Core concepts of spatial information for transdisc. research, IJGIS 2012
** Abstract multidimensional image model, DICOM Part 19 (WG23), 2009
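A minimal sketch of the notion of field for Case 1: one measured quality sampled at every point of a regular grid, with values that may be scalars, vectors or tensors. The `Field` class and its attribute names are hypothetical and are not taken from the DICOM WG23 model or from Kuhn's paper; the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Index = Tuple[int, ...]


@dataclass
class Field:
    measured_quality: str         # e.g. "density in HU"
    grid_shape: Tuple[int, ...]   # number of samples along each dimension
    spacing: Tuple[float, ...]    # distance between samples, e.g. in mm
    value_shape: Tuple[int, ...]  # () scalar, (3,) vector, (3, 3) tensor
    samples: Dict[Index, object]  # one value per grid point

    def value_at(self, index: Index):
        """Return the measurement made at one grid point."""
        in_grid = len(index) == len(self.grid_shape) and all(
            0 <= i < n for i, n in zip(index, self.grid_shape)
        )
        if not in_grid:
            raise IndexError(f"{index} is outside the sampling grid")
        return self.samples[index]

    def position_of(self, index: Index) -> Tuple[float, ...]:
        """Map a grid index to coordinates relative to the grid origin."""
        return tuple(i * s for i, s in zip(index, self.spacing))


# A 2 x 2 scalar map with made-up values.
ct_slice = Field(
    measured_quality="density in HU",
    grid_shape=(2, 2),
    spacing=(0.5, 0.5),
    value_shape=(),
    samples={(0, 0): -1000, (0, 1): 40, (1, 0): 60, (1, 1): 300},
)
corner_value = ct_slice.value_at((1, 1))
corner_position = ct_slice.position_of((1, 1))
```

The measured quality, the sampling grid and the shape of each value are kept as separate attributes, which mirrors the items listed above as still needing proper ontological modeling.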
Ontology of datasets: still open issues
How do datasets’ values relate to each other?
  – Case 2: Network datasets: e.g. tractography, vascular tree
    Needs to model the semantics of nodes and links
  – Case 3: Combination of 4 x 4 matrices: e.g. result of image registration
  – Case 4: Meshes (e.g. surface of objects)
Ontology of datasets: still open issues
Regions of interest *
  – ROI (basic taxonomy based on shape)
    • To denote a subset of the pixels / voxels within a Dataset
  – ROI annotations
    • To relate a ROI to an object in reality
    • To associate a measurement referring to this object (via references to a quality and a quale)
  – ROI annotation collection
    • Several annotations made by the same agent in the same action
* Temal et al., JBI 2008
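The three ROI notions above can be sketched as nested records. This is a hedged illustration: the class names follow the slide, but every attribute name (`dataset_id`, `voxels`, `refers_to`, `quality`, `quale`, `agent`, `action`) and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

Voxel = Tuple[int, int, int]


@dataclass(frozen=True)
class ROI:
    """A subset of the voxels of one dataset."""
    dataset_id: str
    voxels: FrozenSet[Voxel]


@dataclass(frozen=True)
class ROIAnnotation:
    """Relates a ROI to an object in reality, optionally with a measurement."""
    roi: ROI
    refers_to: str                 # object in reality
    quality: Optional[str] = None  # what is measured on that object
    quale: Optional[str] = None    # the value taken by that quality


@dataclass
class ROIAnnotationCollection:
    """Annotations made by the same agent in the same action."""
    agent: str
    action: str
    annotations: List[ROIAnnotation] = field(default_factory=list)


# Made-up example: two voxels labelled as belonging to the hippocampus.
roi = ROI("t1_seg", frozenset({(10, 12, 7), (10, 13, 7)}))
session = ROIAnnotationCollection(agent="rater-01", action="manual segmentation")
session.annotations.append(
    ROIAnnotation(roi, refers_to="left hippocampus", quality="volume", quale="3.1 cm3")
)
annotated_objects = [a.refers_to for a in session.annotations]
```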
Part 2. Modeling of dataset-related actions
Experience from the NeuroLOG project
Ontology of dataset-related actions: scope
• Dataset acquisition
  – Action involving a subject who physically participates in the action (as affected)
• Dataset processing
  – Image processing actions that apply to Datasets and produce Datasets or Imaging biomarkers
Ontology of dataset acquisition: approach
• Based on DOLCE and a number of core ontologies
  – Action, Participant role, etc.
• Specifies participating entities and output, i.e.
  – has for instrument a Planned acquisition protocol
  – has for instrument a Dataset acquisition equipment
  – has for result a Dataset (i.e. produced as output dataset)
  – Example: MR dataset acquisition
Ontology of dataset processing: approach
• Based on DOLCE and a number of core ontologies
  – Action, Participant role, etc.
• Specifies constraints on input / output
  – has for data a Dataset (i.e. used as input dataset)
  – has for result a Dataset (i.e. produced as output dataset)
  – Example: Diffusion tensor calculation
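The input / output constraints above can be sketched as a check performed before a processing action produces its result. This is only an illustration: the `DatasetProcessing` class, its `run` method and the two category names used for the diffusion tensor example are assumptions, not terms taken from the ONL-DP module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    label: str
    category: str


@dataclass(frozen=True)
class DatasetProcessing:
    name: str
    input_category: str    # constraint on "has for data"
    output_category: str   # constraint on "has for result"

    def run(self, data: Dataset) -> Dataset:
        """Check the input constraint, then describe the output dataset."""
        if data.category != self.input_category:
            raise TypeError(
                f"{self.name} expects a {self.input_category}, got a {data.category}"
            )
        return Dataset(f"{self.name} of {data.label}", self.output_category)


# Hypothetical category names for the diffusion tensor example.
tensor_calculation = DatasetProcessing(
    name="Diffusion tensor calculation",
    input_category="Diffusion-weighted MR dataset",
    output_category="Diffusion tensor dataset",
)
dwi = Dataset("subject-01 DWI", "Diffusion-weighted MR dataset")
tensors = tensor_calculation.run(dwi)
```

Only the typing of input and output is modeled here; the numerical computation itself is deliberately left out.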
Ontology of dataset processing: major choices
• Major classes of Dataset processing, modeled as conceptual actions
  – Dataset arithmetical operation
  – Dataset transformation (e.g. Fourier, wavelet)
  – Filtering (e.g. convolution, mathematical morphology filtering)
  – Registration
  – Reconstruction
  – Resampling
  – Quantitative parameter estimation
  – Segmentation
  – Restoration
  – Mesh generation
  – Statistical analysis
Examples
[Figures: Taxonomy of Quantitative parameter estimation; Taxonomy of Registration]
Part 3. Modeling imaging biomarkers
Modeling imaging biomarkers
• Note: It is important to distinguish…
  – Imaging biomarker as result of a measurement
  – from its role in some medical decision (e.g. diagnosis, prognosis)
  – In this talk, we focus on the first only
• Main aspects to address
  – Measure
  – Relation to reality
  – Provenance
  – Context
Imaging biomarkers: measure
• Is the result of some measurement process (manual or implemented in image processing software)
• Indirectly involves a physical object under study, and / or a process under study (dynamic process or longitudinal process) in which this object participates
  – Note: This object is usually part of the image’s field of view
• Concerns a specific quality of this object, or of the process under study
  – Note: This quality may be a complex human construct (e.g. model-based: fractal dimension, gyrification index)
• Values chosen from a predefined scale of measurement
  – interval, ratio, ordinal, nominal (categorical)
Imaging biomarkers: relation to reality
• A Measurement of a quality borne by an object
• Or a Measurement of a temporal quality of the process under study
• (Simple) Examples
  – Volume of hippocampus (in cm³)
  – Speed of brain atrophy process – neuronal loss (in cm³/year)
  – Mean Fractional Anisotropy over uncinate fasciculus
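A minimal sketch of an imaging biomarker recorded as the measurement of a quality borne by an object or by a process, with an explicit scale of measurement. The `ImagingBiomarker` class, its attribute names and all numeric values are made up for illustration and carry no clinical meaning.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


@dataclass(frozen=True)
class ImagingBiomarker:
    quality: str      # what is measured
    bearer: str       # the object or process that bears the quality
    bearer_kind: str  # "object" or "process"
    value: float
    unit: str
    scale: Scale


# Quality of a process: volume lost per year between two time points.
volume_start, volume_end, years = 3.5, 3.0, 2.0   # made-up volumes in cm3
atrophy_speed = ImagingBiomarker(
    quality="speed",
    bearer="brain atrophy process",
    bearer_kind="process",
    value=(volume_start - volume_end) / years,
    unit="cm3/year",
    scale=Scale.RATIO,
)

# Quality of an object: mean of made-up FA samples inside a ROI.
fa_samples = [0.5, 0.25, 0.75]
mean_fa = ImagingBiomarker(
    quality="mean fractional anisotropy",
    bearer="uncinate fasciculus",
    bearer_kind="object",
    value=sum(fa_samples) / len(fa_samples),
    unit="dimensionless",
    scale=Scale.RATIO,
)
```

Recording the bearer and its kind next to the value keeps the link to reality explicit, so that a number is never stored without what it is a measurement of.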
Imaging biomarkers: provenance
• Execution of a program implementing some conceptual action
• Resources used for this execution (user, date, platform)
• Input data (datasets, ROIs, imaging biomarkers)
• Input parameters (if any)
• Some open issues
  – Complexity of image processing pipelines: need for description at several granularity levels
  – W3C PROV-O, but which upper-level ontology?
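A minimal sketch of how the provenance items above could be written as subject / predicate / object triples using W3C PROV-O terms (`prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith`, `prov:startedAtTime`). The `describe_execution` function, the `ex:` identifiers and the `ex:platform` and `ex:parameter` properties are hypothetical additions for the example. Plain tuples are used instead of an RDF library, and the sketch does not address the granularity or upper-level ontology issues.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def describe_execution(
    activity: str,
    biomarker: str,
    inputs: List[str],
    user: str,
    started: str,
    platform: str,
    parameters: Dict[str, str],
) -> List[Triple]:
    """Build provenance triples for one program execution."""
    triples: List[Triple] = [
        (activity, "rdf:type", "prov:Activity"),
        (biomarker, "rdf:type", "prov:Entity"),
        (biomarker, "prov:wasGeneratedBy", activity),
        (activity, "prov:wasAssociatedWith", user),
        (activity, "prov:startedAtTime", started),
        (activity, "ex:platform", platform),       # hypothetical property
    ]
    for item in inputs:
        triples.append((activity, "prov:used", item))
        triples.append((biomarker, "prov:wasDerivedFrom", item))
    for name, value in sorted(parameters.items()):
        # Hypothetical property for input parameters.
        triples.append((activity, "ex:parameter", f"{name}={value}"))
    return triples


# Made-up identifiers and values.
record = describe_execution(
    activity="ex:volumetry-run-1",
    biomarker="ex:hippocampus-volume-1",
    inputs=["ex:t1-dataset-1", "ex:hippocampus-roi-1"],
    user="ex:operator-1",
    started="2014-06-23T10:00:00Z",
    platform="ex:workstation-1",
    parameters={"smoothing": "none"},
)
```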
Imaging biomarkers: context
• Case 1: Relation to a research question
  – Measurement process is part of the execution of a research protocol
  – Context is provided by the research goal and protocol
• Case 2: Relation to a clinical question
  – Measurement process is part of the actions performed to answer the clinical question (possibly detailed via a protocol, and/or a report template)
  – Context is provided by the clinical question and associated clinical information
Conclusion: how to progress…
• Towards defining suitable ontologies …
  – Select / improve / complete relevant domain ontologies, e.g. OBI / IAO / OCRe / PATO / FMA / RadLex / QIBO / OntoNeuroLOG
  – Especially w.r.t. observation & measurement
  – Collaborate with DICOM
  – as well as the editors of important image processing software (FreeSurfer, FSL, SPM, 3D Slicer, etc.)
Conclusion: how to progress…
• Towards deployment and federation of imaging biobanks
  1. Set up semantic resources that complement (rather than replace) existing image repositories
     • Start with basic (image) dataset categories
     • Continue with image processing actions and imaging biomarkers
  2. Progressively evolve the image repositories to more closely follow the ontology (entities, relationships)
  3. Equip image processing pipelines to natively produce semantic annotations
Thank you for your attention
Extra slides
Ontology: 3-level structure
• Application ontology (called OntoNeuroLOG)
  • One Foundational ontology (DOLCE)
  • Several Core ontologies
  • Several Domain ontologies
Ontology: 3 representations
1. OntoSpec representation (Kassel, 2005)
  – Semi-formal notation (rich semantics)
  – Numerous axioms
2. OWL-Lite
  – Edited with PROTÉGÉ
  – Tailored to perform inferences with CORESE (search engine)
3. Federated relational schema
  – Entities and relations are closely linked to concepts and relations of the ontology