http://credible.i3s.unice.fr MI CNRS – 2013 review – Paris, 23/1/2014

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE (federation of distributed data and knowledge in biomedical imaging)

Data fusion, semantic alignment, distributed queries

Johan Montagnat, CNRS, I3S lab, Modalis team

on behalf of the CrEDIBLE consortium
– CNRS/UNS, I3S laboratory (UMR 7271), MODALIS team
– INRIA/CNRS/UNS, I3S laboratory (UMR 7271), Wimmics team
– INSERM U1099, LTSI laboratory, MediCIS team
– U. Picardie, MIS laboratory, Connaissances team
– CNRS/INSERM/INSA/U. Lyon 1, CREATIS laboratory (UMR 5220 / U1044)


Motivations
● Biomedical data
– High heterogeneity: images, clinical data, biomarkers, biology...
– Increasing amount / number of (open) sources – Big Data
● Large-scale medical studies (statistical medical studies, epidemiology...)
– Need for cross-factor analysis – Linked Data
● Data (re)analysis opportunities
● Translational research
● Centralized approaches encounter limitations
– Multiple data source kinds
– Large data volumes to transfer / archive / search
– Sensitive patient data / complex access control policies
– Need to adopt a uniform data model & format
● Data is de facto distributed over acquisition centers


Biomedical data mediation & federation
– Data federation through distributed querying and query rewriting
– Heterogeneous database schema mediation
– Medical data & metadata:
● raw data + models + processing results + models + provenance...

[Diagram: a Client has query-based access to data through a Federator (query decomposition, planning & results federation); the Federator sends remote sub-queries to one Mediator (ETL or query rewriting) per site, Site 1 and Site 2.]


Domain ontology-based federation

[Diagram: the same Client / Federator / Mediator architecture (Site 1, Site 2), with query-based access to data and remote sub-queries, now organised around a medical domain ontology (reference model) that supports data querying and data alignment.]


Challenges and expertise
● Challenges
– Representation of data semantics for heterogeneous data sources → biomedical ontology building
– Data federation → distributed query engine
– Data mediation → RDB2RDF, ontology alignment
● Partnership
– I3S/Modalis
– INRIA/I3S/Wimmics
– LTSI/MediCIS
– MIS/Connaissances
– CREATIS

[Diagram labels showing the expertise brought by the partners: Semantic reference, Semantic representation, Distributed query engine, Medical applications.]


Scientific networking – annual workshop
● Objectives
– Multidisciplinary workshop gathering experts in biomedical data representation, semantics, distribution, federation, integration...
● October 2012: data integration, ontologies, data models, reasoning
– Semantic models are now widely accepted, although ontology resources are not sufficient
– Existing systems are mostly centralized, but there is a stringent need to support multi-centric studies
● October 2013: data reuse, ontologies, mediation, federation, graphs
– Both horizontal and vertical data partition schemes need to be addressed
– Data models in use are often constructed bottom-up
– Expressivity of the query language is important
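
The two partition schemes mentioned for the October 2013 workshop can be illustrated with a minimal sketch. It assumes toy in-memory triples; the site names, subjects and predicates are hypothetical and not taken from the project. With horizontal partitioning each site holds the same kinds of facts about different subjects, so per-site answers are unioned; with vertical partitioning each site holds different properties of the same subjects, so per-site answers must be joined on the shared subject.

```python
# Toy illustration; all site contents and identifiers are hypothetical.
# A triple is (subject, predicate, object).

# Horizontal partitioning: same predicates, different subjects per site.
site_a = [("patient1", "age", 64), ("patient1", "diagnosis", "AD")]
site_b = [("patient2", "age", 71), ("patient2", "diagnosis", "MCI")]

# Vertical partitioning: same subjects, different predicates per site.
clinical_site = [("patient1", "age", 64), ("patient2", "age", 71)]
imaging_site = [("patient1", "scan", "mr_001"), ("patient2", "scan", "mr_002")]


def match(triples, predicate):
    """Return {subject: object} for triples carrying the given predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def query_horizontal(sites, predicate):
    """Union the per-site answers: each site contributes distinct subjects."""
    merged = {}
    for triples in sites:
        merged.update(match(triples, predicate))
    return merged


def query_vertical(left_site, left_pred, right_site, right_pred):
    """Join the per-site answers on the shared subject."""
    left = match(left_site, left_pred)
    right = match(right_site, right_pred)
    return {s: (left[s], right[s]) for s in left if s in right}


ages = query_horizontal([site_a, site_b], "age")
age_and_scan = query_vertical(clinical_site, "age", imaging_site, "scan")
```

A federation engine generally has to handle both cases at once, which is why the join step cannot be pushed entirely to a single site.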


Bibliography study
● Distributed (semantic) Query Processing
– Report CrEDIBLE-12-2, November 2012
– A. Gaignard PhD thesis, March 2013, U. Nice Sophia
● Relational data mediation
– Report I3S 2013-04-FR, November 2013
– Submitted to Journal of Web Semantics
● Ontology of scientific measures (data acquisition, metrology)
– Report CrEDIBLE-13-1, December 2013


Reference ontology
● 3-level structure: foundational (DOLCE), core, domain
● Domain-specific rules
– Inference abilities
● DataTop ontology
– Current focus on measurements
● Derived relational schema

[Diagram: ontology hierarchy rooted at Particular (Endurant, Perdurant), with concepts including Action, Conceptualization, Expression, Inscription, Dataset acquisitions, Dataset acquisition equipment, Dataset processings, Datasets, Studies, Examinations, Medical image expressions, Medical image files, Medical image formats, Subjects, Investigators, Centres, Languages.]


Ontology modules
● Modularized ontology to improve reuse and keep modules lightweight
– ONL-MR-DA: MR Dataset Acquisition
– ONL-DP: Data Processing
– ONL-MSA: Mental State Assessment
– OntoVIP: Medical Image Simulation
● Wide diffusion
– http://bioportal.bioontology.org/ontologies


Data query and federation engine
● KGRAM (Knowledge Graph Abstract Machine) semantic query engine:
– Full support of SPARQL 1.1
– Generic interface for heterogeneous backends
– Flexible architecture facilitating different deployment scenarios
● Mediation interface to access relational data
– Federated relational schema derived from the ontology


Distributed Query Processing
● Query federator decoupled from data sources
● Asynchronous querying of multiple data sources
● Query planning and parallel querying
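
As a rough illustration of asynchronous querying of multiple data sources, the sketch below sends the same sub-query to two sources concurrently and merges the answers. It is not KGRAM's API: the source contents, the function names `query_source` and `federate`, and the simulated latency are all assumptions made for the example.

```python
import asyncio

# Hypothetical in-memory "data sources"; a real federator would call remote endpoints.
SOURCES = {
    "site1": [{"name": "Bob Smith", "date": "1970-01-01"}],
    "site2": [
        {"name": "Alice Bob", "date": "1982-05-17"},
        {"name": "Carol", "date": "1990-09-30"},
    ],
}


async def query_source(source_id, name_fragment):
    """Simulate one remote sub-query; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    return [row for row in SOURCES[source_id] if name_fragment in row["name"]]


async def federate(name_fragment):
    """Query every source concurrently, then merge the per-source results."""
    tasks = [query_source(source_id, name_fragment) for source_id in SOURCES]
    per_source = await asyncio.gather(*tasks)
    return [row for rows in per_source for row in rows]


results = asyncio.run(federate("Bob"))
```

Because the federator only depends on the `query_source` coroutine, sources can be added or replaced without touching the merging logic, which is one way to read "federator decoupled from data sources".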


Distributed Query Processing
● KGRAM query processing
● Asynchronous execution

Query Q:

SELECT ?name ?date
WHERE {
  ?x foaf:name ?name .
  ?x dbpedia:birthDate ?date .
  FILTER (CONTAINS (?name, 'Bob'))
}

[Animated diagram, built up over twelve slides: the query Q, annotated with sub-queries Q1 and Q2, is submitted through the Interface to the KGRAM core, which is connected to Data source #1 and Data source #2; the successive frames show sub-queries Q1, Q2' and Q2''.]
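
One plausible reading of the decomposition of Q into Q1 and Q2 is one sub-query per triple pattern, with the federator joining the partial results on the shared variable `?x` and then applying the filter. That reading is an assumption, not something the slides state, and the sketch below uses hypothetical in-memory triples rather than a real SPARQL engine.

```python
# Hypothetical data: each source is a list of (subject, predicate, object) triples.
SOURCE_1 = [
    ("ex:p1", "foaf:name", "Bob Example"),
    ("ex:p2", "foaf:name", "Alice Example"),
]
SOURCE_2 = [
    ("ex:p1", "dbpedia:birthDate", "1970-01-01"),
    ("ex:p2", "dbpedia:birthDate", "1980-01-01"),
]


def evaluate_pattern(sources, predicate):
    """Sub-query for one triple pattern '?x <predicate> ?value' over all sources."""
    bindings = []
    for triples in sources:
        bindings.extend((s, o) for s, p, o in triples if p == predicate)
    return bindings


def federated_select(sources, name_fragment):
    """Join both sub-query results on ?x, then apply the CONTAINS filter."""
    names = evaluate_pattern(sources, "foaf:name")  # plays the role of Q1
    # Plays the role of Q2; dict() keeps one date per subject, enough for this toy data.
    dates = dict(evaluate_pattern(sources, "dbpedia:birthDate"))
    return [
        (name, dates[x])
        for x, name in names
        if x in dates and name_fragment in name
    ]


rows = federated_select([SOURCE_1, SOURCE_2], "Bob")
```

In this toy setup the filter is applied after the join; a real planner might instead push the filter into the sub-query sent to the source holding the names, to reduce the data transferred.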


Performance results
● Mixed (relational / semantic) stores querying

● FedBench standard benchmark


Exploitation in ANR GINSENG (epidemiology)

● Partnership with GINSENG consortium and Mnemotix SME

● Federation of heterogeneous epidemiology repositories
– Multiple epidemiology data acquisition networks
– Cross-correlation with “external” data (e.g. demographic: IGN)
● Mediation of the EHR (Electronic Health Record) data schema


Collaboration with Pitié-Salpêtrière

● CAC database
– Neurodegenerative diseases (esp. Alzheimer's)
– Mediation of relational CAC schema
● Comparative study of different relational-to-RDF mediation techniques
– R2RML mapping language
– Test case for various RDB2RDF conversion tools
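
To give a feel for what relational-to-RDF mediation involves, here is a minimal sketch that turns the rows of a table into triples, with one subject IRI per row and one predicate per column. It is neither R2RML syntax nor the behaviour of any specific RDB2RDF tool; the `patient` table, its columns and the base IRI are invented for illustration and are unrelated to the CAC schema.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI


def rows_to_triples(connection, table, key_column):
    """Map each row to triples: subject built from the key, one predicate per column."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; NULLs yield no triple
            triples.append((subject, f"<{BASE}{table}#{column}>", f'"{value}"'))
    return triples


# In-memory demo database with a hypothetical 'patient' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, diagnosis TEXT, age INTEGER)")
conn.execute("INSERT INTO patient VALUES (1, 'AD', 64)")
conn.execute("INSERT INTO patient VALUES (2, NULL, 71)")
triples = rows_to_triples(conn, "patient", "id")
```

A mapping language such as R2RML makes this kind of correspondence declarative, so that the subject template and the predicates can be aligned with a reference ontology instead of being derived mechanically from column names.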


Perspectives
● DataTop ontology development
– Data distribution (spatial and temporal) and data collections structure
– Alzheimer's disease
● Query engine architecture
– Code modularity for alternative optimization strategy implementation
● Query performance optimization
– Query plan improvement (especially when dealing with both horizontal and vertical partitioning)
● Mediation
– Dynamic mediation of data sources
– Integration of new medical data sources


Perspectives: collaborations

● Université Laval (Québec)
– Ontology design: Alzheimer's disease data
● ANR CONTINT 2013 BIOMIST
– Industrial system for large databases management (Product Lifecycle Management - PLM)
– Application to medical databases in the area of brain functions analysis
● France Life Imaging excellence infrastructure
– National-level infrastructure for medical data sharing and analysis


Publications
● Peer-reviewed
– Medical domain: iDash image informatics workshop, Sept. 2012; DCICTIA-MICCAI workshop, Oct. 2012
– Distributed Query Processing: Web Intelligence, Dec. 2012; Journal of Web Semantics (submitted)
– Mediation: Journal of Web Semantics (submitted)
● PhD theses
– A. Gaignard (March 2013, U. Nice Sophia Antipolis)
– N. Cerezo (December 2013, U. Nice Sophia Antipolis)
● Research reports
– Workshop summaries: #12-1, #14-1
– Annual reports: #12-3, #13-3
– Bibliography studies: #12-2, I3S/2013-04-FR


Conclusions
● Query-based data federation grounded on semantic web standards (SPARQL, RDF, RDFS)
– Emphasis on query language expressivity
– Support for both horizontal and vertical data partitioning
– Broad applicability (given that ontologies are available)
● Ontology-based
– Reference model for data alignment and query terms
● Currently using Extract-Transform-Load
– Static conversion of data sources into RDF format
– Work on dynamic data mediation ongoing
● Optimization work ongoing
– Flexible software architecture
– Distributed query processing optimization