Stanford '12: Intro to Ontology-Based Data Access for RDBMSs through Query Rewriting


DESCRIPTION

Seminar on Ontology Based Data Access for RDBMSs through query rewriting at Stanford's BMIR lab. 2012.

TRANSCRIPT

ONTOLOGY BASED DATA ACCESS
Architecture, Techniques and Systems

Mariano Rodríguez-Muro
KRDB Research Group, Free University of Bozen-Bolzano

BMIR, Stanford, February 2012

ONTOLOGIES Reasoning and Data

OBDA: Architecture, Techniques and Systems

Ontologies

• A formal conceptualization of a domain of interest
• They come in many different languages: RDFS, OBO, OWL 2, SWRL, etc.
• Uses
  • Documentation
  • Knowledge exchange
  • Discovering new knowledge
• Ontologies + Data…


Instance reasoning

• Instance reasoning
  • Infer new information about the data
  • Detect inconsistent data
  • Use inferred information for complex queries (e.g., SPARQL)
• Queries
  • Is :person/mariano an instance of :Mammal?
  • Retrieve all instances of :Mammal
  • SELECT ?x ?y WHERE { ?x a :Mammal; :hasAncestor ?y. ?y a :Mammal }
• Requirements
  • Fast execution
  • Efficient resource management
  • Big data, big ontologies


The usual workflow


[Diagram labels: Reasoner, Source, Application, Communication, Ontology, Inputs, Triples, Application Code]

Problem with approach
• Software complexity
• Duplication
• Data refreshing
• Data structure is lost (PKEYS, FOREIGN KEYS, information about the import procedure)


OBDA Models and Architecture


OBDA as an Architecture


[Diagram labels: Reasoner, Source, Application, Direct Communication, Ontology, OBDA Model, Inputs]

OBDA Models: Sources and Mappings

“A formal specification of the relationship between data in a data source and the vocabulary of the ontology”


[Diagram labels: OBDA Model, Source, Source Declaration, A set of mappings]

Mapping

“A tuple of 2 queries, one over the source and one over the ontology, with the same signature. Intuitively, a mapping associates the data specified by qs with the answers for qo.”


qs ⊆ qo

SELECT id FROM condition WHERE c_id = 3333
  ⊆ CardiacArrestPatient(?id) → q(?id)

Source tuple: id = (23)
Resulting triple: <23> rdf:type CardiacArrestPatient

Example OBDA model


SELECT id FROM condition WHERE c_id = 3333
  ⊆ CardiacArrestPatient(?id) → q(?id)

SELECT id, name, age, ssn FROM patient
  ⊆ Patient(?id) ^ name(?id, ?name) ^ age(?id, ?age) ^ ssn(?id, ?ssn) → q(?id, ?name, ?age, ?ssn)

Table: patient

id [PKEY]   name   age   ssn
12345       John   37    xxx-999
…           …      …     …

Table: condition

patient_id [FKEY]   c_id [FKEY]
12345               3333
…                   …


Triples obtained from the mappings:

<12345> rdf:type :Patient.
<12345> :name "John".
<12345> :age "37".
<12345> :ssn "xxx-999".
<12345> rdf:type :CardiacArrestPatient.
…
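To make the mapping idea concrete, here is a minimal sketch of how the two example mappings could be materialized into triples. It uses Python's sqlite3 module purely for illustration. The `MAPPINGS` encoding and the `materialize` helper are assumptions of this sketch, not OBDALib's API, and the first source query is written as `patient_id AS id` because the example `condition` table has no `id` column.

```python
import sqlite3

# Build the two example tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
    CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
    INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
    INSERT INTO condition VALUES (12345, 3333);
""")

# Hypothetical encoding of a mapping: a source query plus triple templates.
# Each template is (predicate, (kind, value)), where kind is "const" for a
# fixed class name or "col" for a column of the source query.
MAPPINGS = [
    ("SELECT patient_id AS id FROM condition WHERE c_id = 3333",
     [("rdf:type", ("const", ":CardiacArrestPatient"))]),
    ("SELECT id, name, age, ssn FROM patient",
     [("rdf:type", ("const", ":Patient")),
      (":name", ("col", "name")),
      (":age", ("col", "age")),
      (":ssn", ("col", "ssn"))]),
]

def materialize(connection, mappings):
    """Evaluate every source query and instantiate its triple templates."""
    triples = set()
    for sql, templates in mappings:
        for row in connection.execute(sql):
            subject = f"<{row['id']}>"
            for predicate, (kind, value) in templates:
                obj = value if kind == "const" else f'"{row[value]}"'
                triples.add((subject, predicate, obj))
    return triples

triples = materialize(conn, MAPPINGS)
for triple in sorted(triples):
    print(" ".join(triple) + ".")
```

In the virtual setting discussed later in the talk, these triples are never actually stored; the sketch only shows what the mappings mean.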

The Pay-off
• At least
  • The source is documented
  • Data handling can be done automatically (by the reasoner)
  • Reduced cost of application development and maintenance
  • The reasoner can analyze source and mappings to minimize the cost of inference
• The sweet spot
  • On-the-fly data access
  • Reasoning by query rewriting
  • Exploitation of efficient engines


QUERY REWRITING


Query Rewriting in a Nutshell

• Given a query Q, a TBox T and an OBDA model <D, M>, compute a query Q’ such that:

answer(Q, T, mat(D, M)) = answer(Q’, D)

where mat(D, M) is the collection of assertions resulting from “materializing” the mappings into ABox assertions (assertional triples)


Query Rewriting: An example


Ontology (TBox)

SubClassOf(:CardiacArrest :HeartCondition)
SubClassOf(:CardiacArrestPatient :Patient)
SubClassOf(:CardiacArrestPatient ObjectSomeValuesFrom(:affectedBy :CardiacArrest))

Query (SPARQL)

SELECT ?p ?name ?ssn WHERE {
  ?p a :Patient;
     :name ?name;
     :ssn ?ssn;
     :age ?age;
     :affectedBy [ a :HeartCondition ].
  FILTER (?age >= 21 && ?age <= 50)
}

Query Rewriting: An example


Rewritten query

SELECT ?p ?name ?ssn WHERE {
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :HeartCondition ].
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :CardiacArrest ].
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age; a :CardiacArrestPatient.
    FILTER (?age >= 21 && ?age <= 50) }
  UNION …
}
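The expansion into a union of queries can be sketched for the simplest case, atomic SubClassOf axioms. The snippet below is an illustrative assumption rather than Quest's algorithm: it replaces each class atom by every class the TBox says is subsumed by it and takes all combinations. The ObjectSomeValuesFrom axiom from the example, which is what lets the whole :affectedBy pattern collapse into :CardiacArrestPatient, is deliberately left out, and the names `subsumees` and `rewrite` are hypothetical.

```python
from itertools import product

# Atomic SubClassOf axioms from the example TBox, as (subclass, superclass).
SUBCLASS_OF = [
    (":CardiacArrest", ":HeartCondition"),
    (":CardiacArrestPatient", ":Patient"),
]

def subsumees(cls, axioms):
    """Return cls together with every class that is transitively a subclass of it."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def rewrite(class_atoms, axioms):
    """Expand a conjunction of (variable, class) atoms into a union of conjunctions."""
    variables = [var for var, _ in class_atoms]
    choices = [sorted(subsumees(cls, axioms)) for _, cls in class_atoms]
    return [list(zip(variables, combo)) for combo in product(*choices)]

# Class atoms of the example query: ?p is a Patient, ?c is a HeartCondition.
query = [("?p", ":Patient"), ("?c", ":HeartCondition")]
union = rewrite(query, SUBCLASS_OF)
for branch in union:
    print(" . ".join(f"{var} a {cls}" for var, cls in branch))
```

Two atoms with two alternatives each already give four branches, which hints at why the number of branches can grow exponentially with the number of atoms.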

Query Rewriting: An example


SQL query

SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50

Answer

?p      ?name   ?ssn
12345   John    xxx-999

“Fast execution even in the presence of millions of assertions”
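As a quick check of the final step, the following sketch runs a variant of the SQL query over the example tables using Python's sqlite3 module. It uses the alias `tc` consistently and projects `ssn` so that the columns match ?p, ?name and ?ssn. The second patient row is hypothetical and only there to show the age filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
    CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
    INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
    -- Hypothetical extra row: same condition, but outside the 21..50 age range.
    INSERT INTO patient VALUES (67890, 'Mary', 62, 'yyy-111');
    INSERT INTO condition VALUES (12345, 3333);
    INSERT INTO condition VALUES (67890, 3333);
""")

# The rewritten query touches only the source tables; no triples are stored.
REWRITTEN_SQL = """
    SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
    FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
    WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
"""

answers = conn.execute(REWRITTEN_SQL).fetchall()
print(answers)
```

The database engine does all the work here, which is the point of the approach: joins and filters are evaluated by the RDBMS with its own indexes and optimizer.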

That Simple?
• Warning: Query rewritings can easily grow exponentially.
• Effective query rewriting requires:
  • A highly efficient rewriting algorithm that is able to detect redundancy
  • Highly efficient SQL generation:
    • Detect redundant SQL (w.r.t. constraints and mappings)
    • Optimize individual SQL queries (w.r.t. constraints and mappings)
    • Generate optimal SQL (w.r.t. the database engine)
    • Deal with the impedance mismatch (URIs and literals vs. data values)
  • Database engine tuning (indexing, buffers, disk, etc.)
• Effective query rewriting gives you:
  • Fast system initialization
  • Small footprint
  • Fast query execution


Efficient Languages (for pure query rewriting)
• RDFS, DL-Lite, OWL 2 QL
• Datalog±
• DL-Lite/OWL 2 QL/Datalog± fragments of SWRL

Promising Languages (for combined approaches)
• EL++ and OWL 2 EL
• OWL-Horst and OWL 2 RL
• SWRL with limited recursion


SYSTEMS
OBDALib, OBDA Plugin for Protégé 4


OBDA as an Architecture


[Diagram labels: Ontology, Reasoner, OBDA Model, Source, Application, Communication, Inputs]

OBDALib
A Java library for:
• OBDA Model creation and manipulation
• OBDA Model persistence
• Interfaces for OBDA-capable reasoners
• SQL parsing and Datalog translation
• RDBMS metadata extraction libraries
• OBDA model materialization

In the near future:
• Automatic OBDA model generation (compatible with W3C’s RDB2RDF direct mapping)
• Support for W3C’s R2RML syntax


OBDA Plugin for Protégé 4

“A plugin to write and test OBDA models and interact with OBDA-capable reasoners”


OBDA Model tab and tools


OBDA Model synch

An EditorKitHook plugin to:
• Associate an OBDA model to the editor environment
• Synchronize OBDA models with OBDA-capable reasoners


DataQuery Tab


SYSTEMS Quest


Quest
An OBDA-capable reasoner with a focus on fast and efficient query answering over very large ontologies and volumes of data.

Features:
• Support for RDFS, OWL 2 QL and DL-Lite
• SPARQL
• On-the-fly reasoning based on query rewriting
  • Read-only “Virtual OBDA” mode
  • Read/Write “Triple-store” mode
• Generation of highly optimized SQL
• OWLAPI 3 and Protégé support


Quest in virtual mode


[Diagram labels: Ontology, Quest, OBDA Model, Source, Application, JDBC, Inputs. Supported sources: MySQL, PostgreSQL, DB2 and Oracle]

Data integration with Quest in virtual mode


[Diagram labels: Ontology, Quest, OBDA Model, Database Federator (e.g., Teiid), Application, JDBC, Inputs]

Read/Write triple-store mode


[Diagram labels: Ontology, Quest, Triples, JDBC Storage, Application, JDBC]

Storage is based on the Semantic Index technique (ISWC11, KR12)

The technique is based on a “smart index” computation that allows hierarchy inferences to be retrieved by means of interval queries (FAST SQL!)
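The interval idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class tree, the `class_assertion` table layout and the function names are hypothetical and are not Quest's actual storage schema. The sketch also assumes a tree-shaped hierarchy, so each class needs a single interval; hierarchies with multiple parents are not covered.

```python
import sqlite3

# Hypothetical class tree: parent -> list of direct subclasses.
HIERARCHY = {
    ":Condition": [":HeartCondition", ":LungCondition"],
    ":HeartCondition": [":CardiacArrest", ":Arrhythmia"],
    ":LungCondition": [],
    ":CardiacArrest": [],
    ":Arrhythmia": [],
}

def build_index(root, hierarchy):
    """Number classes depth-first; map each class to (own index, last descendant index)."""
    intervals = {}
    counter = 0

    def visit(cls):
        nonlocal counter
        counter += 1
        start = counter
        for child in hierarchy[cls]:
            visit(child)
        intervals[cls] = (start, counter)

    visit(root)
    return intervals

intervals = build_index(":Condition", HIERARCHY)

# Store only the asserted class memberships, keyed by the class index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_assertion (individual TEXT, idx INTEGER)")
asserted = [("<c1>", ":CardiacArrest"), ("<c2>", ":Arrhythmia"), ("<c3>", ":LungCondition")]
conn.executemany(
    "INSERT INTO class_assertion VALUES (?, ?)",
    [(individual, intervals[cls][0]) for individual, cls in asserted],
)

def instances_of(cls):
    """Retrieve asserted and inferred instances of cls with one range query."""
    low, high = intervals[cls]
    rows = conn.execute(
        "SELECT individual FROM class_assertion "
        "WHERE idx BETWEEN ? AND ? ORDER BY individual",
        (low, high),
    )
    return [row[0] for row in rows]

print(instances_of(":HeartCondition"))
```

No is-a closure is stored: subclass inferences come from the BETWEEN range, which an ordinary B-tree index on `idx` can serve.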

Performance in triple-store mode: Resource Index Experiments
• Input:
  • Ontology: The asserted is-a relations in obs_relation (for all RI ontologies)
  • Data: The annotations for ClinicalTrials.gov
  • Queries, e.g.:

SELECT ?x WHERE { ?x a :DNA_Repair_Gene; a :Antigen_Gene; a :Cancer_Gene. }


Performance in triple-store mode: Resource Index Experiments
• System setup costs:
  • Resource Index workflow:
    • Ontology Closure: X ?
    • CT annotation closure: 7 days (naïve), 40 mins (optimized)
    • Space requirements for CT: 16 GB + isa-closure: 70 GB
  • Using a naïve implementation of Quest’s reasoning technique for the RI:
    • Ontology Closure: 5 mins
    • CT annotation closure: none
    • Space requirements for CT: 16 GB
• Execution speed: roughly the same
• Potential to eliminate all _isa_annotation_tables and the closure of relation_isa.


DEMO


CONCLUSIONS


Summary
• OBDA as an architecture
  • Benefits: Software Complexity, Optimization and On-the-fly query answering
• Basics of query rewriting in OBDA
• Introduced
  • OBDALib
  • OBDA Plugin for Protégé
  • Quest
• Briefly mentioned the performance advantages of Quest’s reasoning technique


Where to go now?
• Resource index overhauling?
• Demos?
• More detail on the techniques?
• More details on the systems?
• Development and plugins for Protégé
• Projects?!
• You call it :)


THANK YOU
