ontology-based data access with existential rules
DESCRIPTION
TRANSCRIPT
Ontology-based Data Access
with Existential Rules
Marie-Laure Mugnier
Université Montpellier 2
RuleML 2012
Ontology
Query
Answers ? Knowledge Base
Data
Ontology-‐based Data Access (OBDA)
Patient records Medical ontology
« Patient P suffers from myeloid leukemia and has been prescribed drug D against hypertension »
Query: « find all cancer patients treated for high-blood pressure »
à Use ontological knowledge to infer all deducible answers
Ontology-‐based Data Access: What for?
■ Enrich the vocabulary of data sources à Abstract from a specific database schema
■ Relate the vocabulary of different data sources à Provide a unified view to the user
■ Allow inference of new facts à Allow data incompleteness
Outline
n The existential rule framework for OBDA rule-based, logic-based and graph-based n Decidability, complexity and algorithmic issues n A tool for combining decidable classes of rules
n Perspectives
Data / Facts
Relational Database RDF (Semantic Web)
rdf:type
ex:parent ex:A
F
rdf:type
ex:B
M
A B A C C ? … …
parentOf B …
Male Fem. A …
ex:C
Abstraction in first-order logic
∃x( parentOf(A,B) ∧ parentOf(A,C) ∧ parentOf(C,x) ∧ F(A) ∧ M(B) )
Etc.
ex:parent
Or in graphs / hypergraphs
1 2 A B p
1 2 C p
p 1
2
F M
Ontology (1)
Concepts Relations
+ properties on concepts and relations:
Human
Male Female Adult
Father Mother
sameFamilyAs
parentOf
ancestorOf uncleOf
siblingOf
brotherOf …
motherOf
The relation ancestorOf is transitive The inverse of the relation fatherOf is functional The concepts Male and Female are disjoint The relation siblingOf can be defined from the relation parentOf
Ontology (2)
Abstraction with rules in First-Order Logic
• Specialization relationships between concepts / relations ∀x (Male(x) à Human(x)) ∀x ∀y (parentOf(x,y) à ancestorOf(x,y))
• « ancestorOf is transitive » ∀x ∀y ∀z (ancestorOf(x,y) ∧ ancestorOf (y,z) à ancestorOf (x,z)) • « the inverse of fatherOf is functional » ∀x ∀y ∀z (fatherOf(y,x) ∧ fatherOf(z,x) à y = z) • « Male et Female are disjoint » ∀x (Male(x) ∧ Female(x) à ⊥) • Definition of siblingOf ∀x ∀y ∀z (parentOf(x,y) ∧ parentOf(x,z) à siblingOf(y,z)) ∀x ∀y (siblingOf(x,y) à ∃ z parentOf(z,x) ∧ parentOf(z,y))
« Value Invention »
Body Head
X, Y, Z : tuples of variables
Any conjunction of atoms (on variables and constants)
∀x ∀y (siblingOf(x,y) à ∃ z (parentOf(z,x) ∧ parentOf(z,y)))
§ Same as Tuple Generating Dependencies (TGDs) § See also Datalog+/- § Same as the logical translation of Conceptual Graph rules
∀X ∀Y ( B[X, Y] →∃Z H[X, Z] )
ExistenCal Rules
Simplified form: siblingOf(x,y) à parentOf(z,x) ∧ parentOf(z,y)
F = siblingOf(A,B)
The resulting fact is F’= F ∪ h(head) [with renaming existential variables of head ]
Value invenCon
R = ∀x ∀y (siblingOf(x,y) à ∃ z (parentOf(z,x) ∧ parentOf(z,y)))
F’= ∃ z0 (siblingOf(A,B) ∧ parentOf(z0,A) ∧ parentOf(z0,B))
h ={(x,A), (y,B)}
h: body à F
A rule bodyà head is applicable to a fact F if there is a homomorphism h from body to F
1 S
A
B 2
P
P2 S
1 2
2
1 x
y 1
z
1 S
A
B 2
P
P
2
2
1
1 z0
Wi
ExistenCal Rules cover « lightweight » DescripCon Logics
n New DLs tailored for ontology-based data access:
} DL-Lite EL Core of the « tractable profiles » of OWL2
q Existential rules are strictly more expressive:
Human ⊑ ∃parentOf _.Human
Human(x) à parentOf(y,x) ∧ Human (y)
siblingOf(x,y) à parentOf(z,x) ∧ parentOf(z,y) not expressible in DL
q Non-bounded predicate arity provides more flexibility: à direct correspondence with database relations à adding contextual information is easy
P
P2 S
1 2
2
1 x
y 1
z
Logical [and graphical] framework
Existential Rule: ∀X ∀Y ( B[X, Y] → ∃Z H[X, Z] )
Existential Rules
Equality Rules
Constraints
(Union of)
Conjunctive Query
Equality rule: ∀X (B[X] → x = e) with x,e var. or const. Negative constraint: ¬ B or ∀X (B[X] → ⊥) Positive constraint: same form as an existential rule
Knowledge Base
Facts
Answers ?
(∨) ∃X F[X]
Similar Framework: Datalog +/-‐
TGDs
EGDs
Negative Constraints
Knowledge Base
Database
[Cali Gottlob Lukasiewicz PODS 2009]
(Union of)
Conjunctive Query
Answers ?
The Conceptual Graph origins
n Conceptual Graphs introduced in [Sowa 76] [Sowa 84] n Specific research line by Montpellier’s group since 1992 Graph-based knowledge representation and reasoning
« Graph-‐Based Knowledge RepresentaFon: ComputaFonal FoundaFons of Conceptual Graphs », M. Chein & M.-‐L. Mugnier, Springer, 2009
Conceptual Graph vocabulary: 1. parFally (pre-‐)ordered set of concept types
[screenshots from CoGui, http://www.lirmm.fr/cogui]
Conceptual Graph vocabulary: 2. parFally (pre-‐)ordered set of relaCons with their signature [any relaFon arity allowed]
Logical translaCon of the preorders and signatures: p < q ∀x1…xk ( p(x1…xk) → q(x1…xk) ) Signature of r ∀x1…xk ( p(x1…xk) → ti1(x1)…tik(xk))
Basic Conceptual Graph
Allows to represent facts and conjuncCve queries
Logical translaCon (Φ) : existenCally closed conjuncCon of atoms
[more generally: total order on the edges incident to a relation node]
∃x ∃y (Girl(Eva) ∧ Child(x) ∧ Toy(y) ∧ Train(y) ∧ sisterOf(Eva,x) ∧ playWith(Eva,y) ∧ playWith(x,y))
Eva
x
y
Logical soundness [Sowa 84] and completeness [Chein Mugnier 92]: there is a homomorphism from Q to F iff Φ(Q) is entailed by Φ(F) and Φ(vocabulary)
Query Q
Fact F
The Basic CG fragment restricted to binary relaFons is equivalent to RDFS [Baget ISWC’05] [Baget+ ICCS’10]
Homomorphism (with vocabulary preorders integrated)
Richer fragments (nested graphs, rules, constraints, + negaCon, …)
¢ Rules : pairs of basic conceptual graphs
∀x ∀y (Human(x) ∧ Human(y) ∧ siblingOf(x,y) à ∃ z (Adult(z) ∧ parentOf(z,x) ∧ parentOf(z,y)))
¢ Sound and complete forward chaining and backward chaining mechanisms [Salvat Mugnier 1996]
¢ PosiFve and NegaFve constraints ¢ Several ways of combining rules and constraints [Baget Mugnier JAIR 2002]
Outline
n The existential rule framework for OBDA rule-based, logic-based and graph-based n Decidability, complexity and algorithmic issues n A tool for combining decidable classes of rules
n Perspectives
Let’s focus on standard existenCal rules
Given a KB K = (F, R) and a conjunctive query Q, is Q entailed by K ?
n Basic problem: Conjunctive Query Entailment
Existential Rules
Equality Rules
Constraints
Conjunctive Query
Knowledge Base
Facts
K = (F, R)
Q
Answers ?
Forward versus Backward chaining
F Q
R
F Q R
FC
BC
Fact saturation (« chase »)
Query rewriting
Human
p 2 1
22
Forward chaining may not halt
R = Human(x) à parentOf(y,x) ∧ Human(y)
∧ Human(y1) ∧ parentOf(y1, A)
∧ Human(y2) ∧ parentOf(y2, y1) Etc.
F = Human(A)
A
Human
[same non-halting trouble with backward chaining]
Human
2 1 p
Decidability Issues
n Entailment is not decidable
n Many decidable classes exhibited in databases and KR
n Three generic kinds of properties ensuring decidability:
- Saturation by Forward Chaining halts
- Query rewriting by Backward Chaining halts
- Saturation by Forward Chaining does not halt but the generated facts have a tree-like structure
(ParCal) inclusion map of decidable classes
Finite saturation
Tree-shaped saturation
Finite query rewriting
atomic body
frontier-1
weakly- guarded
weakly frontier-guarded
datalog
guarded
weakly- acyclic
acyclic GRD
wa-GRD jointly- acyclic
frontier- guarded
jointly-fg
glut-fg sticky-join
w-sticky-join
sticky
w-sticky
Inclusion dependency
domain-r.
(ParCal) inclusion map of decidable classes
domain-r.
atomic body
frontier-1
weakly- guarded
weakly frontier-guarded
datalog
guarded
weakly- acyclic
acyclic GRD
wa-GRD jointly- acyclic
frontier- guarded
jointly-fg
glut-fg sticky-join
w-sticky-join
sticky
w-sticky
Inclusion dependency
2003 2004
2011 2004,2008
2010
2010
2010
2010
2009
2011
2011
2010
2008 2010
2008
2009,2010
2009
1984 1970s
(ParCal) inclusion map of decidable classes
Finite saturation
Tree-shaped saturation
Finite query rewriting
atomic body
frontier-1
weakly- guarded
weakly frontier-guarded
datalog
guarded
weakly- acyclic
acyclic GRD
wa-GRD jointly- acyclic
frontier- guarded
jointly-fg
glut-fg sticky-join
w-sticky-join
sticky
w-sticky
Inclusion dependency
domain-r.
Datalog
No existential variables
(ParCal) inclusion map of decidable classes
Finite saturation
Tree-shaped saturation
Finite query rewriting
atomic body
frontier-1
weakly- guarded
weakly frontier-guarded
datalog
guarded
weakly- acyclic
acyclic GRD
wa-GRD jointly- acyclic
frontier- guarded
jointly-fg
glut-fg sticky-join
w-sticky-join
sticky
w-sticky
Inclusion dependency
domain-r.
Atomic- body
Body restricted to a single atom
E.g. Human(x) à parentOf(y,x) ∧ Human(y)
Guard only the frontier
Guard only affected variables (possibly mapped on new existentials)
Guard only affected variables from the frontier
Frontier: variables shared by the body and the head
Main classes with (infinite) tree-‐shaped saturaCon
guarded
[Cali+ KR’08]
weakly guarded
[Cali+ KR’08] The frontier has size 1
[Baget+ KR’10]
frontier guarded
[Baget+ KR’10]
weakly frontier guarded
[Baget+ IJCAI’09]
frontier 1
r(x,y) ∧ r(y,z) ∧ s(x,y,z) à r(y,u) ∧ r(z,u)
r(x,y) ∧ r(y,z) ∧ r(x,z) à r(z,u)
r(x,y) ∧ r(y,z) à r(y,u) ∧ r(z,u)
datalog
An atom in the body guards all variables from the body
Atomic-body
Complexity
n Combined complexity Input: F, R, Q Data complexity Input: F (R and Q are fixed)
n Desirable property in the context of large data:
polynomial data complexity
Decidable classes with polynomial data complexity
atomic body
frontier-1
weakly- guarded
weakly frontier-guarded
datalog
guarded
weakly- acyclic
acyclic GRD
wa-GRD jointly- acyclic
frontier- guarded
jointly-fg
glut-fg sticky-join
w-sticky-join
sticky
w-sticky domain-r.
• Interest of query rewriting mechanisms: does not make the data grow • However, the number of generated queries may be prohibitive in practice
Towards efficiency in pracCce
à Use indexing techniques to avoid the above kind of blow-‐up à Algorithms combining both forward and backward chaining à Rewrite into more compact kinds of queries
A
B1
B2
Bn
Q = A(x1) ∧ … ∧ A(xk)
Number of (conjuncFve) rewriFngs: nk
B1(x) à A(x) B2 (x) à B1(x) … Bn(x) à Bn-1(x)
Outline
n The existential rule framework for OBDA rule-based, logic-based and graph-based n Decidability, complexity and algorithmic issues n A tool for combining decidable classes of rules
n Perspectives
Union of decidable sets of rules
n Next question: is the union of two decidable sets of rules still decidable ?
practically: n can we safely merge several ontologies known to be decidable ? n can we build a decidable hybrid language from two languages whose
semantics can be expressed by decidable subsets of rules ?
n Bad news: Almost all classes are pairwise incompatible
n Next question:
which conditions on the interactions between rules ensure compatibility ?
R2 depends on R1 if applying R1 may trigger a new application of R2
F Body2
h Body
1 Head
1
i.e., there exists a fact F s.t. R1 is applicable to F but R2 is not and there is an application of R1 to F leading to F’ s.t. R2 is applicable to F’
Effective computation of dependencies with a unification test
A tool : the Graph of Rule Dependency
R1
R2
Piece-‐based unificaCon
n Existential variables make rule heads complex à unification is more complex too
R1= p(x) à r(x,y) ∧ r(y,z) ∧ r(z,x) R2 = r(u,v) ∧ r(v,u) à q(u)
1 2 r
2
1 p
r 1
2
r x
y
z
R1 1 2 r
u v R2
r
R2 does not depend on R1
q
R2 depends on R1 iff there is a « piece-unifier » of body(R2) with head(R1)
Atomic unification is not sufficient
2 1
R1 R2 : R1 « may trigger » R2 (R2 depends on R1) Rules
Combining decidable classes with the Graph of Rule Dependencies
Combining decidable classes with the Graph of Rule Dependencies
If GRD(R) is without circuit then R is both fes (thus bts) and fus
fes = finite fact saturation fus = finite query rewriting
bts = (possibly infinite) tree-shaped saturation
Datalog (fes)
wa (fes)
fes fg(bts)
ab (fus) fus
dr (fus)
Combining decidable classes with the Graph of Rule Dependencies
If all strongly connected components of GRD(R) are fes then R is fes The same holds for fus (but not for bts)
fes
Combining decidable classes with the Graph of Rule Dependencies
bts
Decidable
Datalog (fes)
wa (fes)
fg(bts)
ab (fus)
dr (fus)
fus
Let R1〉R2 be a partition of R s.t. no rule of R1 depends on a rule of R2 n If R1 is fes and R2 is bts, then R is bts n If R1 is bts and R2 is fus, then R1〉R2 is decidable
Datalog
wa
fes fg
ab fus
dr
Combining decidable classes with the Graph of Rule Dependencies
Recommended algorithm: Use FC-like algorithm on the bts subset à « saturated » fact F* Use query rewriting with the fus subset à rewritten set Q
Check if a query in Q maps to F*
bts
Conclusion -‐ PerspecCves
n An emerging rule-based framework suitable to OBDA
§ simple, § expressive § flexible
n Currently: § A quite clear picture of decidable classes of rules with complexity
analysis § Effervescence around new algorithmic techniques § First implementations for very specific subclasses
n Main challenge: scalability