Efficient Query Answering against Dynamic RDF Databases

François Goasdoué, Ioana Manolescu, Alexandra Roatiş
Université Paris-Sud & Inria Saclay (OAK project)
20 March 2013



Overview

EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 2 / 35

The Resource Description Framework

Basic Graph Pattern Queries

Contributions

Experiments

Related Work

Conclusion


The Resource Description Framework


The Resource Description Framework (RDF)

⊲ graph-based data model
⊲ W3C standard

RDF Graph:

⊲ set of triples: s p o, where s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L

U – URIs, L – literals (constants), B – blank nodes

the subject s has the property p with the value: the object o

⊲ built-in property: rdf:type

specify to which classes a resource belongs

Constructor         Triple         Relational notation
Class assertion     s rdf:type o   o(s)
Property assertion  s p o          p(s, o)
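The mapping from triples to relational notation in the table above can be sketched in a few lines of Python. This is only an illustration: the function name to_relational and the tuple encoding of triples are assumptions, not part of the talk.

```python
RDF_TYPE = "rdf:type"


def to_relational(triple):
    """Render an (s, p, o) triple in the relational notation of the table."""
    s, p, o = triple
    if p == RDF_TYPE:
        return f"{o}({s})"       # class assertion: s rdf:type o  ->  o(s)
    return f"{p}({s}, {o})"      # property assertion: s p o  ->  p(s, o)


class_assertion = to_relational(("book1", RDF_TYPE, "Book"))
property_assertion = to_relational(("book1", "writtenIn", "English"))
```

Here class_assertion is the unary atom for the class Book and property_assertion is the binary atom for the property writtenIn.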


Blank nodes

⊲ feature of RDF
⊲ support unknown URI/literal tokens

Example:

the country of _:b1 is Italy
the city of the same _:b1 is Genoa
the population of Genoa is an unspecified value _:b2

Running example

[Figure: RDF graph of the running example. Node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: hasTitle, hasAuthor, rdf:type, translatedTo, writtenIn, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range.]


RDF Schema (RDFS)

⊲ feature of RDF
⊲ enhance the descriptions in graphs
⊲ declare semantic constraints between classes and properties

Built-in properties:

⊲ subclass relationships: rdfs:subClassOf
⊲ subproperty relationships: rdfs:subPropertyOf
⊲ typing the first attribute (domain) of a property: rdfs:domain
⊲ typing the second attribute (range) of a property: rdfs:range

Constructor               Triple                  Relational notation
Subclass constraint       s rdfs:subClassOf o     s ⊆ o
Subproperty constraint    s rdfs:subPropertyOf o  s ⊆ o
Domain typing constraint  s rdfs:domain o         Π_domain(s) ⊆ o
Range typing constraint   s rdfs:range o          Π_range(s) ⊆ o



Open-world assumption and RDF entailment

The RDF data model is based on the open-world assumption.
→ deductive constraints – implicitly propagate tuples

Implicit triples → considered part of the graph – not explicitly present

Entailment – reasoning mechanism: a set of explicit triples & some entailment rules derive implicit information

Exhaustive application of entailment rules → saturation (a.k.a. closure)

The saturation of a graph is unique (up to blank node renaming).

Entailment is part of the RDF specification itself.

The semantics of an RDF graph is its saturation.


Entailment rules by example

1) book1 rdf:type Book, Book rdfs:subClassOf Publication
   entails: book1 rdf:type Publication

2) book1 writtenIn English, writtenIn rdfs:subPropertyOf hasLanguage
   entails: book1 hasLanguage English

3) book1 writtenIn English, writtenIn rdfs:domain Book
   entails: book1 rdf:type Book

4) book1 writtenIn English, writtenIn rdfs:range Language
   entails: English rdf:type Language
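The four entailment rules illustrated above can be applied exhaustively to obtain the saturation. The following Python sketch is a naive fixpoint, assuming triples are encoded as (s, p, o) tuples of strings; the function names entail_once and saturate are illustrative, and the quadratic pairing of triples is chosen for readability, not efficiency.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def entail_once(triples):
    """Apply the four instance-level rules once and return what they derive."""
    derived = set()
    for (s, p, o) in triples:
        for (s2, p2, o2) in triples:
            # 1) s rdf:type c, c rdfs:subClassOf c2  =>  s rdf:type c2
            if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                derived.add((s, RDF_TYPE, o2))
            # 2) s p o, p rdfs:subPropertyOf p2'  =>  s p2' o
            if p2 == SUBPROP and p == s2:
                derived.add((s, o2, o))
            # 3) s p o, p rdfs:domain c  =>  s rdf:type c
            if p2 == DOMAIN and p == s2:
                derived.add((s, RDF_TYPE, o2))
            # 4) s p o, p rdfs:range c  =>  o rdf:type c
            if p2 == RANGE and p == s2:
                derived.add((o, RDF_TYPE, o2))
    return derived


def saturate(triples):
    """Repeat rule application until no new triple appears (fixpoint)."""
    sat = set(triples)
    while True:
        new = entail_once(sat) - sat
        if not new:
            return sat
        sat |= new


# The triples used in the four examples above.
graph = {
    ("book1", RDF_TYPE, "Book"),
    ("book1", "writtenIn", "English"),
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
implicit = saturate(graph) - graph
```

On this small graph, implicit holds the triples derived by rules 1, 2 and 4; the triple derived by rule 3 is already explicit, so it does not show up in the difference.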


Basic Graph Pattern Queries


Basic Graph Pattern (BGP) Queries

⊲ subset of SPARQL

⊲ BGP – conjunction of triple patterns (or triples)

q(x) :- t1, ..., tα

ti = si pi oi, where si, pi ∈ U ∪ B ∪ V, oi ∈ U ∪ B ∪ V ∪ L

x ∈ V (distinguished variables)

query evaluation treats blank nodes in a query as non-distinguished variables

Example:

q(x, y) :- x hasAuthor z, x rdf:type y
≡
q(x, y) :- x hasAuthor _:b0, x rdf:type y
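The equivalence in the example can be checked with a small BGP evaluator. The Python sketch below assumes variables are written "?x" and triples are (s, p, o) tuples of strings; the function names and the toy graph (including book2) are made up for illustration. It treats blank nodes in the query like variables, as stated above.

```python
def is_var(term):
    # "?x" is a variable; a blank node in a query ("_:b0") is evaluated
    # like a non-distinguished variable.
    return term.startswith("?") or term.startswith("_:")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` maps onto `triple`, else None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def evaluate(head, body, triples):
    """Evaluate the conjunction `body` and project on the `head` variables."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for b in bindings:
            for triple in triples:
                b2 = match(pattern, triple, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical toy graph; book2 has a type but no author.
graph = {
    ("book1", "hasAuthor", '"Neil Gaiman"'),
    ("book1", "rdf:type", "Book"),
    ("book2", "rdf:type", "Book"),
}
with_variable = evaluate(
    ("?x", "?y"), [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")], graph)
with_blank = evaluate(
    ("?x", "?y"), [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")], graph)
```

Both queries return the same answer set, since _:b0 plays exactly the role of the variable z.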


Query answering

Problem:

query evaluation ≠ query answering

the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set

the (complete) answer set is obtained by evaluating the query against the graph’s saturation

Solution:

decouple RDF entailment from query evaluation

Perform a pre-processing step to deal with entailed triples:

⊲ on the database – data saturation
⊲ on the queries – query reformulation

Data saturation vs. Query reformulation

Data saturation

Advantages:

⊲ straightforward
⊲ easy to implement

Drawbacks:

⊲ computation time
⊲ additional storage space
⊲ must be recomputed upon database updates

Example:

the YAGO2 dataset doubles in size when computing the RDFS closure → 33M to 64M triples

Query reformulation

Advantages:

⊲ database saturation does not need to be (re)computed

Drawbacks:

⊲ every incoming query must be reformulated
⊲ reformulations can be prohibitively large
⊲ difficult to optimize

Example:

a single-atom query over YAGO2 can yield a union of > 300 000 queries


Contributions

1. The database (DB) fragment of RDF, extending previously studied fragments by the support of blank nodes

2. Novel BGP query answering techniques for this DB fragment, designed to work on top of any standard conjunctive query processor

(i) an efficient incremental RDF saturation maintenance algorithm

(ii) a novel reformulation-based query answering algorithm

3. Thorough performance comparison and analysis



The database (DB) fragment of RDF

⊲ restricts entailment to RDFS entailment
⊲ does not restrict graphs in any way

An RDF database: db = ⟨D, S⟩

D & S – disjoint sets of triples
D (RDF) – instance level → assertions
S (RDFS) – schema level → semantics

[Figure: db = ⟨D, S⟩ for the running example, drawn as two graphs. D, node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language; edge labels: hasTitle, hasAuthor (twice), rdf:type (twice), translatedTo, writtenIn. S, node labels: Book, _:b1, Language, writtenIn, hasLanguage, Publication; edge labels: rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]


Query reformulation algorithm

Reformulate(q, db)

⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the evaluations of these queries on db produces the correct answer

[Figure: schema graph of the running example. Node labels: Book, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: rdfs:subClassOf (twice), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]

Reformulate(q, db) =

q(x, y) :- x rdf:type y
∪ q(x, Publication) :- x rdf:type Publication
∪ q(x, Publication) :- x rdf:type Book
∪ q(x, Publication) :- x writtenIn z
∪ ...
∪ q(x, _:b1) :- x rdf:type _:b1
∪ ...

Blank nodes in the reformulated queries:

[Figure: schema graph of the running example, plus two rdf:type edges among the nodes Language, English, book1 and _:b1.]

q(x, _:b1) :- x rdf:type _:b1

Standard evaluation treats the blank node as a non-distinguished variable:

q(x, _:b1) :- x rdf:type _:b1 ≡ q(x, _:b1) :- x rdf:type z

Answer set: {⟨book1, _:b1⟩, ⟨English, _:b1⟩} – wrong answer

With non-standard evaluation the two queries are no longer equivalent:

q(x, _:b1) :- x rdf:type _:b1 ≢ q(x, _:b1) :- x rdf:type z

Answer set: {⟨book1, _:b1⟩} – correct answer

Hence, Reformulate(q, db):

⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
⊲ size of the output: O((6 ∗ #db²)^#q)
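To make the idea concrete, here is a much simplified Python sketch of reformulation. It is not the 13-rule algorithm of the talk: it only rewrites rdf:type atoms (through subclass, domain and range constraints, and by instantiating a class variable) and property atoms (through subproperty constraints). The function names, the "?x" variable syntax, the fresh variable naming and the placement of the _:b1 schema edge are all assumptions. The evaluator matches blank nodes in queries as constants, which is one way to read the non-standard evaluation above.

```python
RDF_TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def is_var(term):
    # Only "?x" terms are variables; a blank node such as "_:b1" that enters
    # a query from the schema is matched like a constant.
    return term.startswith("?")


def rewrites(atom, index, schema, classes):
    """Yield (new_atom, substitution) for one backward rule application."""
    s, p, o = atom
    if p == RDF_TYPE and is_var(o):
        for c in classes:                      # instantiate the class variable
            yield (s, p, c), {o: c}
    elif p == RDF_TYPE:
        for (a, q, b) in schema:
            if q == SUBCLASS and b == o:
                yield (s, RDF_TYPE, a), {}
            elif q == DOMAIN and b == o:
                yield (s, a, f"?_z{index}"), {}    # fresh variable per atom
            elif q == RANGE and b == o:
                yield (f"?_z{index}", a, s), {}
    elif not is_var(p):
        for (a, q, b) in schema:
            if q == SUBPROP and b == p:
                yield (s, a, o), {}


def reformulate(head, body, schema):
    """Fixpoint: collect every query reachable through the rewritings."""
    classes = {t for (a, q, b) in schema
               for t in ((a, b) if q == SUBCLASS
                         else (b,) if q in (DOMAIN, RANGE) else ())}
    seen, todo = set(), [(tuple(head), tuple(body))]
    while todo:
        query = todo.pop()
        if query in seen:
            continue
        seen.add(query)
        h, b = query
        for i, atom in enumerate(b):
            for new_atom, subst in rewrites(atom, i, schema, classes):
                sub = lambda t: subst.get(t, t)
                new_body = tuple(new_atom if j == i else tuple(map(sub, other))
                                 for j, other in enumerate(b))
                todo.append((tuple(map(sub, h)), new_body))
    return seen


def evaluate(head, body, triples):
    """Plain conjunctive evaluation over explicit triples only."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for b in bindings:
            for triple in triples:
                b2 = dict(b)
                if all(b2.setdefault(t, v) == v if is_var(t) else t == v
                       for t, v in zip(pattern, triple)):
                    extended.append(b2)
        bindings = extended
    return {tuple(b.get(t, t) for t in head) for b in bindings}


schema = {
    ("Book", SUBCLASS, "Publication"),
    ("Book", SUBCLASS, "_:b1"),            # assumed placement of the _:b1 edge
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
data = {
    ("book1", RDF_TYPE, "Book"),
    ("book1", "writtenIn", "English"),
    ("_:b0", RDF_TYPE, "Language"),
}
queries = reformulate(("?x", "?y"), [("?x", RDF_TYPE, "?y")], schema)
answers = set().union(*(evaluate(h, b, data) for h, b in queries))
```

Under these assumptions, the union of the evaluations over the explicit data contains ⟨book1, _:b1⟩ but not ⟨English, _:b1⟩, in line with the correct answer set shown above.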


Database saturation algorithm

Saturate(db)

⊲ fixpoint algorithm (4 saturation rules)
⊲ explicitly adds to db all its implicit triples
⊲ size of the output: O(#db²)
⊲ computation time: O(#db³)

Saturate(db) = db ∪ [Figure: implicit triples of the running example. Node labels: book1, Language, Publication, _:b1, English. Edge labels: rdf:type (three edges), hasLanguage.]

What about updates?


Saturation maintenance algorithm

Saturate+(db)

⊲ multiset variant of Saturate(db)
⊲ allows saturation maintenance upon updates

Saturate+(db) = db ∪ [Figure: multiset of derived triples of the running example. Node labels: Book, book1, Language, Publication, _:b1, English. Edge labels: rdf:type (four edges), hasLanguage.]


Example of instance insertion

[Figure: the saturated running-example database (schema and instance graphs), shown after the update with the new node French and additional hasLanguage, rdf:type and writtenIn edges.]

To insert the triple:

book1 writtenIn French

First saturate the triple using db:

[Figure: triples derived from the inserted triple. Node labels: book1, Language, Book, Publication, _:b1, French. Edge labels: rdf:type (four edges), hasLanguage.]

Then insert the explicit triple and the inferred ones in db.
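The insertion steps above suggest keeping, for every derived triple, a count of how many times it is derived. The Python sketch below illustrates that idea for instance-level updates only. It is an assumption-laden simplification, not the Saturate+ algorithm itself: the schema lookups are hand-written and assumed to be already closed (superclasses, superproperties, inherited domains and ranges), the _:b1 edge is assumed to be Book rdfs:subClassOf _:b1, and the class name SatM merely echoes the table name used in the experiments.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical, already-closed schema lookups for the running example.
SUPERCLASSES = {"Book": {"Publication", "_:b1"}}
SUPERPROPS = {"writtenIn": {"hasLanguage"}}
DOMAINS = {"writtenIn": {"Book", "Publication", "_:b1"}}
RANGES = {"writtenIn": {"Language"}}


def consequences(triple):
    """Triples entailed by one explicit triple and the closed schema."""
    s, p, o = triple
    if p == RDF_TYPE:
        return {(s, RDF_TYPE, c) for c in SUPERCLASSES.get(o, ())}
    out = {(s, q, o) for q in SUPERPROPS.get(p, ())}
    out |= {(s, RDF_TYPE, c) for c in DOMAINS.get(p, ())}
    out |= {(o, RDF_TYPE, c) for c in RANGES.get(p, ())}
    return out


class SatM:
    """Explicit triples plus a derivation count for each inferred triple."""

    def __init__(self):
        self.explicit = set()
        self.count = Counter()

    def insert(self, triple):
        if triple in self.explicit:
            return
        self.explicit.add(triple)
        for t in consequences(triple):      # saturate the new triple
            self.count[t] += 1

    def delete(self, triple):
        if triple not in self.explicit:
            return
        self.explicit.remove(triple)
        for t in consequences(triple):      # retract its derivations
            self.count[t] -= 1
            if self.count[t] == 0:
                del self.count[t]

    def triples(self):
        return self.explicit | set(self.count)


sat = SatM()
sat.insert(("book1", RDF_TYPE, "Book"))
sat.insert(("book1", "writtenIn", "English"))
sat.insert(("book1", "writtenIn", "French"))
after_insert = sat.triples()
publication_count = sat.count[("book1", RDF_TYPE, "Publication")]

sat.delete(("book1", "writtenIn", "French"))
after_delete = sat.triples()
```

With counts, deleting book1 writtenIn French removes French rdf:type Language (its only derivation is gone) while book1 rdf:type Publication survives, because other explicit triples still derive it.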


Example of schema deletion

[Figure: the saturated running-example database (schema and instance graphs), shown after the update; the rdfs:domain edge no longer appears in the schema graph.]

To delete the triple:

writtenIn rdfs:domain Book

First infer affected data triples using db:

[Figure: affected data triples. Node labels: book1, Book, Publication, _:b1. Edge labels: rdf:type (three edges).]

Then delete the explicit triple and the inferred ones from db.


Experiments

Experimental setup

• implementation in Java 1.6
• deployed on top of a PostgreSQL v8.5 server
• 6 indexes – all permutations of the (s, p, o) columns
• the spo index is clustering
• dictionary encoding

Graph characteristics and saturation times:

Graph                    Storage                       Barton        DBpedia     DBLP
#Schema                  in memory                     101           5,666       41
#Instance                Triple(s, p, o)               34 × 10^6     27 × 10^6   8.4 × 10^6
#Saturation              Sat(s, p, o)                  39 × 10^6     30 × 10^6   12 × 10^6
Saturation increase (%)                                14.91         10.65       41.05
#Multiset                SatM(s, p, o, isExp, count)   73.5 × 10^6   66 × 10^6   18.7 × 10^6
Multiset increase (%)                                  116.89        227.37      121.97
t_sat (s)                                              4,294         2,742       748
t_sat+ (s)                                             4,586         2,977       799

Query answering

• 26 hand-picked queries (between 1 and 10 triple patterns – 6 on average)
• similar query answering times on Sat and SatM

[Figure: query answering times; plot not recoverable from the transcript.]

Graph updates

• no impact on reformulation
• saturation needs to maintain SatM
• insertions & deletions
• updates of one triple on the data and the schema

[Figure: update times; plot not recoverable from the transcript.]

Page 72: Efficient Query Answering against Dynamic RDF Databases

Saturation thresholds

EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 27 / 35

Thesaturation thresholdof a queryq (st(q)):the smallest integern s.t.

n× tref (q) > n× t

sat(q) + tsat+

tref (q) – time to answerq throughreformulation(usingTriple)tsat(q) – time to answerq based onsaturation(usingSatM)tsat+ – time to saturatedb (createSatM)

Page 73: Efficient Query Answering against Dynamic RDF Databases

Related Work

Page 74: Efficient Query Answering against Dynamic RDF Databases

Outline of the positioning of our work

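Figure: related work positioned along two axes.

Axis 1 – query language expressive power: SPARQL, BGP queries, relational conjunctive queries.

Axis 2 – RDF fragment expressive power: DL, DB.

Works placed in the figure: [1, 3, 5], [4, 6, 7], [2], this work.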

[1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).

[2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).

[3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).

[4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).

[5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.

[6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).

[7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).

Page 75: Efficient Query Answering against Dynamic RDF Databases

Conclusion

Page 77: Efficient Query Answering against Dynamic RDF Databases

Conclusion

Summary:

⊲ RDF fragment (extending those studied in the literature)
⊲ novel saturation- and reformulation-based query answering techniques robust to instance and schema updates
⊲ algorithms directly deployable on top of any RDBMS
⊲ thorough performance comparison and analysis

Future work:

An automated strategy to choose between the two techniques:

Saturate+(db) / Reformulate(q, db)

Page 78: Efficient Query Answering against Dynamic RDF Databases

Thank you!

Figure: an RDF graph with nodes I, you, attention, Question, _:b1, _:b2, _:b3 and edges labelled thank, pay, ask (×3), rdf:type (×3).

Page 79: Efficient Query Answering against Dynamic RDF Databases

Open-world interpretation of RDFS constraints

Constraint interpretation:

⊲ closed-world assumption (CWA)
any fact not present in the database is assumed not to hold
database facts do not respect a constraint → inconsistency
R1 ⊆ R2 – any tuple in the relation R1 must also be in the relation R2

⊲ open-world assumption (OWA)
facts may hold even though they are not in the database
R1 ⊆ R2 – any tuple in the relation R1 is also in the relation R2

The RDF data model is based on OWA.
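
The two readings of the same constraint R1 ⊆ R2 can be contrasted in a few lines. This is only an illustration with unary relations modelled as Python sets; the function names and the sample data are hypothetical.

```python
def check_cwa(r1, r2):
    """CWA reading: R1 ⊆ R2 is an integrity constraint.

    Returns the tuples of R1 missing from R2; a non-empty result
    means the database is inconsistent.
    """
    return r1 - r2


def apply_owa(r1, r2):
    """OWA reading: R1 ⊆ R2 is an inference rule.

    Returns R2 completed with the tuples of R1, which are taken
    to hold in R2 even though they were not stored there.
    """
    return r2 | r1


students = {"alice", "bob"}   # R1
persons = {"alice"}           # R2, with the constraint students ⊆ persons
violations = check_cwa(students, persons)
entailed_persons = apply_owa(students, persons)
```

Under CWA the missing tuple is reported as a violation, while under OWA it becomes an implicit fact, which is how RDFS constraints produce implicit triples.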

Page 80: Efficient Query Answering against Dynamic RDF Databases

RDF meets Relational Database Management Systems (RDBMS)

RDF graphs: incomplete relational databases based on V-tables

V-tables: allow using variables in their tuples

using a variable multiple times allows expressing joins on unknown values

BGP query answering boils down to conjunctive query evaluation on a saturated database.
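
As a rough illustration of that last statement, the sketch below evaluates a BGP as a conjunctive query over a set of triples assumed to be already saturated. It is a naive nested-loop evaluator written for this example, not the SQL-based evaluation used on top of the RDBMS; the `?`-prefixed variable convention, the function names and the `ex:` data are assumptions.

```python
def is_var(term):
    # Assumption of this sketch: variables are strings starting with "?".
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable reused across patterns must keep the same value (join).
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, triples, binding=None):
    """Return all variable bindings satisfying every triple pattern."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    answers = []
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            answers.extend(evaluate(patterns[1:], triples, extended))
    return answers


# Saturated graph: (ex:alice rdf:type ex:Person) is assumed to be an implicit
# triple, entailed by ex:Student being a subclass of ex:Person.
saturated = {
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "ex:Person"),
}
query = [("?x", "rdf:type", "ex:Person"), ("?x", "ex:knows", "?y")]
answers = evaluate(query, saturated)
```

Evaluated on the explicit triples alone, the same query would miss the answer that relies on the implicit `rdf:type` triple.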

Page 81: Efficient Query Answering against Dynamic RDF Databases

Saturation (related work)

• J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.

• B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.

• C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.