
Foundations of RDF Databases

Claudio Gutierrez

Department of Computer Science, Universidad de Chile

European Semantic Web Conference - ESWC 2008

Joint Work With

• Renzo Angles
• Marcelo Arenas
• Carlos Hurtado
• Sergio Muñoz
• Jorge Pérez

C. Gutierrez – Foundations of RDF Databases - ESWC 2008


Inspired by…

To the memory of Alberto Mendelzon, database theoretician and Web enthusiast

Agenda

1. RDF and Databases
2. RDF and Database models
3. RDF Query Language
   – Requirements and Domains
   – Manifold Views
4. SPARQL
   – Syntax and Semantics
   – Complexity
   – Expressive Power

Disclaimers

More like a “Computer Science” than a “Web Science” talk

A particular view on the subject

Not a survey!

Apologies to Web Scientists

The base of the Semantic Web is RDF

“The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF)”

http://www.w3.org/2001/sw/

RDF Recommendation (1999)

• Language for representing information about resources in the Web
• Particularly metadata about Web resources
• Automation of processing: “RDF is intended for situations in which this information needs to be processed by applications, rather than only displayed to people”

Layers of the Semantic Web: A Data Processing perspective

[Figure: Semantic Web layer stack. Layers shown: Unicode, URI; XML + NS + xmlschema (Text + Links); RDF + rdfschema (entities + relations); Logic + Ontology vocabulary (Concepts + knowledge); Proof; Trust; Digital Signature.]

The Database Approach

• Manage huge volumes of data with logical precision
• Separate modeling from implementation levels

[Figure: RDF + DB]

As opposed to AI: DB primary concern is scalability. Then expressive power.

As opposed to IR: DB primary concern is precision. Then scalability (recall).

RDF Database Technology

[Figure: RDF database technology stack. Components shown: Applications, Services; APIs; Query language (SPARQL, SeRQL, RDQL, RQL, …); Data Structure: RDF Graphs; storage in a Native Data Store, an RDBMS (MySQL, MSQL, Oracle, DB2, Postgres), or Files.]

This Talk: Database Modeling Level

Hence leaving out:
• Visualization, APIs, Services, etc.
• Indexing, storing, transactions, etc.

But also leaving out: Updating / Constraints / Temporality / Optimization / Aggregation / Flexibility / etc.


Database Models: Codd's definition

• Data structures
• Query Language
• Integrity constraints

Evolution of Database Models

[Figure: evolution of database models, with RDF marked]

RDF Data Structure: three main blocks

• Graph (Triple) structure
• Blank Nodes: ∃ ?X ?Y ?Z
• RDFS Vocabulary: subClass, subProperty, type, domain, range, class, property

RDF Data Structure: the core

• Triple structure: set of statements
• Graph structure: linked network of statements

RDF Data Structure: Relational Tables (Triple) view

Subject | Predicate | Object
• Triples as tuples
• Tables of triples

Advantages:
+ Well studied and well understood
+ Reuse relational technologies

Problems (Questions):
- Why yet another syntax for the relational model?
- Was this the intended objective of RDF?
- Expressive power limitations
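The triple view can be sketched in a few lines of Python. This is only an illustration under assumptions: the data (`R1`, `R2`, the `name` and `email` predicates, and the example.org address) and the helper names `select` and `names_with_emails` are hypothetical, not taken from the talk. A set of triples plays the role of one three-column table, and a query is a selection on the predicate column followed by a self-join on the subject column.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical data: one "table" holding every statement.
TRIPLES = {
    Triple("R1", "name", "john"),
    Triple("R1", "email", "john@example.org"),
    Triple("R2", "name", "paul"),
}


def select(table, predicate):
    """Relational selection: keep the rows with the given predicate."""
    return {t for t in table if t.predicate == predicate}


def names_with_emails(table):
    """Self-join of the triple table on the subject column."""
    names = select(table, "name")
    emails = select(table, "email")
    return {
        (n.object, e.object)
        for n in names
        for e in emails
        if n.subject == e.subject
    }


RESULT = names_with_emails(TRIPLES)
```

Every additional triple pattern in a query costs another self-join of the same table, which is one way to read the "expressive power limitations" question: path-like queries of unbounded length do not fit a fixed number of joins.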

RDF Data Structure: Graph Database Model view

Graph Database Models:
• Data and/or schema are represented by graphs
• Query language able to capture main graph operations and properties
• Studied by DB community, but still not well understood

RDF Data Structure: Graph query languages

[Table: support for graph properties across graph query languages; cell values not shown.
Languages: G, G+, GraphLog, Gram, GraphDB, Lorel, F-G.
Properties: Adjacent Edges, Neighborhoods, Degree of a Node, Path, Fixed-length path, Distance, Diameter.]

Green light for graph features!

RDF Data Structure: Triple structure + Blank nodes

Complexity / Semantics issues:
• Deciding entailment becomes NP-complete.
• Deciding core is DP-complete.
• Semantics of querying not simple.
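Why blank nodes make entailment hard can be seen in a small sketch. It relies on the standard characterization of simple entailment (no vocabulary semantics): G entails H iff the blank nodes of H can be mapped to terms of G so that the image of H is contained in G. The code below is a brute-force search over all such assignments, so it is exponential in the number of blank nodes; the `_:` prefix convention, the function names and the sample graphs are assumptions of this sketch.

```python
from itertools import product


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def simple_entails(g, h):
    """True if some assignment of H's blank nodes to terms of G maps H into G."""
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of blank nodes to terms occurring in G.
    for values in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, values))
        image = {tuple(mu.get(t, t) for t in triple) for triple in h}
        if image <= g:
            return True
    return False


# Hypothetical graphs.
G = {("R1", "knows", "R2"), ("R2", "knows", "R3")}
H_CHAIN = {("_:a", "knows", "_:b"), ("_:b", "knows", "_:c")}
H_LOOP = {("_:a", "knows", "_:a")}
```

For a ground H the loop runs exactly once and the test degenerates to a subset check, which matches the remark that the ground fragment is the easy part.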

RDF Data Structure: Ground fragment

Good News: Blank nodes can be treated orthogonally to ground fragment.

More good news:
• Vocabulary can be reduced to { type, domain, range, subClassOf, subPropertyOf }
• Complex semantic rules and axioms can be avoided
• Structural (internal) constraints of the language can be separated from user-features, e.g. (Class, type, Resource)
• Features which do not add expressive power can be avoided, e.g. reflexivity of subClass and subProperty.

RDF Data Structure: A minimal fragment

[Figure: Graph (Triple) structure, Blank Nodes (∃ ?X ?Y ?Z), Vocabulary]

Vocabulary: {subClass, subProperty, type, domain, range}

Theorem: Simple proof system sound and complete for the semantics of RDF in this fragment. That is:

G |= F under RDF semantics iff G |= F under mRDF semantics

Theorem: Let G be a restricted graph in the fragment, and t a ground tuple. Deciding if G |= t can be done in time O(|G| log |G|).
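To give a feel for what a "simple proof system" over the five keywords looks like, here is a hedged sketch. It applies six well-known RDFS-style rules (transitivity of subClass and subProperty, type propagation along subClass, property propagation along subProperty, and typing through domain and range) until nothing new is derived. It is a simplification, not the proof system of the talk, and the naive fixpoint below is far from the O(|G| log |G|) bound stated in the theorem. All data and function names are hypothetical.

```python
SC, SP, TYPE, DOM, RANGE = "subClass", "subProperty", "type", "domain", "range"


def closure(graph):
    """Naive fixpoint of six inference rules over a set of ground triples."""
    closed = set(graph)
    while True:
        new = set()
        for (a, p, b) in closed:
            for (c, q, d) in closed:
                # (a sc b), (b sc d) => (a sc d); same for subProperty.
                if p == q and p in (SC, SP) and b == c:
                    new.add((a, p, d))
                # (a subClass b), (x type a) => (x type b)
                if p == SC and q == TYPE and d == a:
                    new.add((c, TYPE, b))
                # (a subProperty b), (x a y) => (x b y)
                if p == SP and q == a:
                    new.add((c, b, d))
                # (a domain b), (x a y) => (x type b)
                if p == DOM and q == a:
                    new.add((c, TYPE, b))
                # (a range b), (x a y) => (y type b)
                if p == RANGE and q == a:
                    new.add((d, TYPE, b))
        if new <= closed:
            return closed
        closed |= new


def entails(graph, triple):
    """Ground entailment check by membership in the closure."""
    return triple in closure(graph)


# Hypothetical schema and data.
G = {
    ("teaches", DOM, "Teacher"),
    ("teaches", RANGE, "Course"),
    ("lectures", SP, "teaches"),
    ("Teacher", SC, "Person"),
    ("ana", "lectures", "db101"),
}
```

Here `entails(G, ("ana", TYPE, "Person"))` holds after three rounds: the subProperty rule yields `(ana, teaches, db101)`, the domain rule yields `(ana, type, Teacher)`, and the subClass rule lifts that to `Person`.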


RDF Query Language: Social Networks domain

Chapter title | Use Case (local)
Looking for Social Structure | Directed to undirected binary relations; Remove relations
Attributes and Relations | Extract a subnetwork based on attributes; Group actors based on attributes; Selective grouping of actors based on attributes
Genealogies and Citations | Loop removal
Prestige | Discretize an attribute
Brokers and Bridges | Extract egonetwork of an actor; Remove relations between groups
Center and Periphery | Group multiple binary relations
Affiliations | Two-mode network to one-mode network
Friendship | Extract subnetwork by time
Cohesive Subgroups | Extract the subnetwork induced by cliques of size n; Build a hierarchy of cliques
Diffusion | Selective counting of neighbors; Operations between attributes; Change relation direction based on attributes
Ranking | Find triads by type

Subgraph family | Use Case (Global)
Paths and Cycles | Geodesics
Groups (k-neighbors, k-core, n-cliques, k-plex, etc.) | Detect cohesive subgroups; Egonetworks; Input Domain
Connected components | Connected components; Clustering; Bicomponents and brokers

RDF Query Language: Biology domain

Use Case | Graph Query
To connect pathways, metabolism of co-existing organisms | Graph composition
To combine multiple protein interaction graphs | Majority graph query
Find the difference in metabolisms between two microbes | Graph intersection, union, difference
Chemical structure associated with a node | Node matching
To construct pathways from individual reactions | Graph composition
Identify “important” paths from nutrients to chemical outputs | Shortest path queries
Observe multiple products are co-regulated | Least common ancestor
Find all products ultimately derived from a particular reaction | Transitive Closure
Kinase enzyme | Subgraph homomorphism
Chemical info retrieval | Subgraph isomorphism
To find biopathways graph motifs | Frequent subgraph recognition
Enzyme taxonomies | Subsumption testing

RDF Query Language: Web domain

Use Case | Graph Query
What is/are the most cited paper/s? | Degree of a node
What is the influence of article D? | Paths
All relatives of degree one of Alice | Adjacency
Are suspects A and B related? | Paths
What is the Erdös distance between author X and author Y? | Distance
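Three of the graph queries in this table (adjacency, degree of a node, distance) can be sketched directly over a set of triples. The sketch below is illustrative only: the co-authorship and citation data, the predicate names and the function names are all hypothetical. Distance is computed by breadth-first search over the graph, ignoring edge direction and labels.

```python
from collections import deque


def neighbors(triples, node):
    """Adjacency: nodes one edge away from `node`, ignoring direction."""
    outgoing = {o for (s, _, o) in triples if s == node}
    incoming = {s for (s, _, o) in triples if o == node}
    return outgoing | incoming


def distance(triples, source, target):
    """Length of a shortest undirected path, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors(triples, node) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def in_degree(triples, node, predicate):
    """Degree of a node: number of incoming edges with the given label."""
    return sum(1 for (_, p, o) in triples if p == predicate and o == node)


# Hypothetical co-authorship and citation data.
GRAPH = {
    ("erdos", "coauthor", "a"),
    ("a", "coauthor", "b"),
    ("b", "coauthor", "c"),
    ("p2", "cites", "p1"),
    ("p3", "cites", "p1"),
}
```

Adjacency and degree only look at the immediate surroundings of a node, while distance may have to traverse an unbounded part of the graph. That is the local versus global contrast the talk returns to under the database theoretician's view.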

RDF Query Language: Tagging domain

Tags: A tag is simply a word you use to describe a bookmark. Unlike folders, you make up tags when you need them and you can use as many as you like.

Minimalist design:
– Tags + Bundles (classes)
– No inheritance, no intersection, etc.
– Renaming

RDF Query Language: Standardization's view

SQL, XQuery and SPARQL: What's Wrong with this Picture?
Jim Melton (Oracle; XML Query Working Group, XML Coord. Group)
Sixth annual W3C Technical Plenary (March 2006)

• SQL: Great for finding data from tabular representations, can get complex when many tables are involved in a given query
• XQuery: Great for finding data in tree representations, can get complex when many relationships have to be traversed
• SPARQL: Good pattern matching paradigm, especially when relationships have to be used to answer a query

SPARQL = Sympathy Queen?

RDF Query Language: Logician's view

• RDF is the first level of a logical tower
• Emphasis on logic features of RDF model
• Keep an eye on extensions to more expressive logics
• Bad news: complexity issues

RDF Query Language: Developer's view

• How do we answer the most common queries?
• How do we cope with APIs and store developments?
• Design usually influenced by current programming and system tools.
• Not always concerned with scalability and long term.

RDF Query Language: Database theoretician's view

• RDF as a graph data model? (Graphs)
• RDF as a relational model? (Relations)

RDF Query Language: Database theoretician's view

Local vs. Global

Theorem (Gaifman). A property of graphs is expressible by a closed first order formula iff it is equivalent to a combination of properties of the form

$$\exists v_1 \cdots \exists v_s \Big( \bigwedge_{i<j} d(v_i, v_j) > 2r \;\wedge\; \bigwedge_{i} \psi^{(r)}(v_i) \Big)$$

where v1,…,vs denote vertices, d(x,y) denotes distance, and ψ^(r)(v) is a formula that only refers to vertices within distance r of v.

[Figure: vertices v1, …, v5, each with a neighborhood of radius r, pairwise more than 2r apart]

Want Local (relational) or global (graph) queries?

W3C Working Group's view

SPARQL (W3C Recommendation, 2008)
– Relational view of querying
– RDF = triples + blanks
– Pattern matching

Good News: there is a standard!

SPARQL Query (General Structure)

• Query Form: SELECT, ASK, CONSTRUCT, DESCRIBE
• Dataset Clause: FROM, FROM NAMED
• Where Clause (Graph Pattern): triple patterns combined with AND, UNION, OPTIONAL, FILTER

[Figure: general structure of a SPARQL query over a Dataset; labels shown include X Y Z, X Y, and TRUE - FALSE]

Outline

◮ Overview of syntax and semantics of SPARQL

◮ Formal semantics of SPARQL

◮ Complexity of the SPARQL evaluation problem

◮ Expressive Power of SPARQL

Core of the language: Example

SELECT ?Name ?Email
WHERE
{
  ?X :name ?Name .
  ?X :email ?Email
}

In general, in a query we have:

H ← P

◮ Head: processing of some variables.
◮ Body: pattern matching expression.

We focus on P.

But things can become more complex ...

Interesting features of pattern matching on graphs

◮ Grouping
◮ Optional parts
◮ Nesting
◮ Union of patterns
◮ Filtering
◮ ...

{ { P1
    P2
    OPTIONAL { P5 } }
  { P3
    P4
    OPTIONAL { P7
      OPTIONAL { P8 } } }
}
UNION
{ P9
  FILTER ( R ) }

A standard algebraic syntax

◮ Triple patterns: RDF triples + variables

original SPARQL syntax | algebraic syntax
?X :name "john" | (?X, name, john)

◮ Graph patterns: full parenthesized algebra

original SPARQL syntax | algebraic syntax
{ P1 P2 } | (P1 AND P2)
{ P1 OPTIONAL { P2 } } | (P1 OPT P2)
{ P1 } UNION { P2 } | (P1 UNION P2)
{ P1 FILTER ( R ) } | (P1 FILTER R)
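The correspondence between the two syntaxes is mechanical, which a short sketch can show. Assumptions of this sketch: an algebraic pattern is a nested tuple `(operator, left, right)`, leaves are opaque names such as `"P1"` (as in the table), and the function name `to_sparql` is hypothetical. Nested groups are always wrapped in braces, so the output may contain more braces than a hand-written query would.

```python
def to_sparql(pattern):
    """Render an algebraic pattern (nested tuples) in brace syntax."""
    if isinstance(pattern, str):
        return pattern  # opaque sub-pattern or filter condition name
    op, left, right = pattern
    lhs = to_sparql(left)
    if op == "FILTER":
        # The right operand of FILTER is a condition, not a pattern.
        return "{ " + lhs + " FILTER ( " + right + " ) }"
    rhs = to_sparql(right)
    if op == "AND":
        return "{ " + lhs + " " + rhs + " }"
    if op == "OPT":
        return "{ " + lhs + " OPTIONAL { " + rhs + " } }"
    if op == "UNION":
        return "{ " + lhs + " } UNION { " + rhs + " }"
    raise ValueError(f"unknown operator: {op}")


# ((P1 AND P2) OPT P5), written in the algebraic syntax.
NESTED = ("OPT", ("AND", "P1", "P2"), "P5")
```

The fully parenthesized form removes any doubt about operator scope, which is what makes it convenient for defining the semantics by structural recursion.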

A formal semantics for SPARQL

A formal approach is beneficial for:

◮ Providing the user an ultimate guide of language behavior
◮ Clarifying corner cases and making them explicit
◮ Helping and simplifying the implementation process
◮ Providing sound foundations

Desiderata for semantics:

◮ Compositional approach: The meaning of an expression is determined by the meaning of its parts and the way they are combined.
◮ Denotational approach: Meaning of expressions is formalized by assigning mathematical objects which describe the meaning.

Will present:

◮ A denotational and compositional semantics.
◮ A comparison of it with W3C Semantics of SPARQL

Mappings: building block for the semantics

Definition

A mapping is a partial function from variables to RDF terms.

The evaluation of a pattern results in a set of mappings.

Example (Relational view)

◮ Variables ↦ Attributes
◮ Mappings ↦ Tuples
◮ Set of mappings ↦ Tables

The semantics of triple patterns

Given an RDF graph and a triple pattern t

Definition

The evaluation of t is the set of mappings that

◮ make t match the graph
◮ have as domain the variables in t.

Example

graph:
(R1, name, john)
(R1, email, [email protected])
(R2, name, paul)

triple: (?X, name, ?Y)

evaluation:
    | ?X | ?Y
µ1: | R1 | john
µ2: | R2 | paul
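The definition translates almost word for word into code. In this sketch a mapping is a Python dict from variable names to terms, variables are strings starting with `?`, and the graph is a set of 3-tuples. The email address is a hypothetical stand-in, and the function names are assumptions of the sketch.

```python
def is_var(term):
    # Assumption of this sketch: variables are strings starting with "?".
    return term.startswith("?")


def eval_triple(pattern, graph):
    """Mappings whose domain is the variables of `pattern` and that
    turn `pattern` into a triple of `graph`."""
    results = []
    for triple in graph:
        mu = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must bind to the same term each time.
                if mu.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            results.append(mu)
    return results


# Hypothetical data mirroring the example graph.
GRAPH = {
    ("R1", "name", "john"),
    ("R1", "email", "john@example.org"),
    ("R2", "name", "paul"),
}

MAPPINGS = sorted(eval_triple(("?X", "name", "?Y"), GRAPH), key=lambda m: m["?X"])
```

`MAPPINGS` contains the two mappings µ1 and µ2 of the example. Because the graph is a set and each mapping binds every variable of the pattern, two different triples can never produce the same mapping, so the list has no duplicates.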

The bag semantics of triple patterns

Definition (Bag Semantics)

The evaluation of t is the multiset (bag) of mappings that

◮ make t match the graph
◮ have as domain the variables in t.

Bag Semantics

◮ Reflects real world practice
◮ Not well understood from a theoretical point of view
◮ For RDF/SPARQL, really set/bag semantics (sets for data input, bag for subsequent processing and output).

This talk will avoid bag semantics details.

Compatible mappings

Definition

Two mappings are compatible if they agree on their shared variables.

Example

        | ?X | ?Y   | ?Z                | ?V
µ1      | R1 | john |                   |
µ2      | R1 |      | [email protected] |
µ3      |    |      | [email protected] | R2
µ1 ∪ µ2 | R1 | john | [email protected] |
µ1 ∪ µ3 | R1 | john | [email protected] | R2

◮ µ2 and µ3 are not compatible
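Compatibility is a one-line check once mappings are dicts. In the sketch below the two email addresses are hypothetical stand-ins, chosen to be different so that µ2 and µ3 disagree on ?Z; the function names `compatible` and `merge` are assumptions of the sketch.

```python
def compatible(mu1, mu2):
    """True if the two mappings agree on every variable they share."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def merge(mu1, mu2):
    """Union of two compatible mappings."""
    if not compatible(mu1, mu2):
        raise ValueError("mappings are not compatible")
    return {**mu1, **mu2}


# Hypothetical values for the three mappings of the example.
MU1 = {"?X": "R1", "?Y": "john"}
MU2 = {"?X": "R1", "?Z": "john@example.org"}
MU3 = {"?Z": "paul@example.org", "?V": "R2"}
```

Note that µ1 and µ3 share no variable at all, so they are trivially compatible and their union is simply the concatenation of both rows.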

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

– C. Gutierrez - Foundations of RDF Databases - ESWC 2008 11 / 29

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

Join: M1 M2

◮ extending mappings in M1 with compatible mappings in M2

– C. Gutierrez - Foundations of RDF Databases - ESWC 2008 11 / 29

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

Join: M1 M2

◮ extending mappings in M1 with compatible mappings in M2

Difference: M1 r M2

◮ mappings in M1 that cannot be extended with mappings in M2

– C. Gutierrez - Foundations of RDF Databases - ESWC 2008 11 / 29

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

Join: M1 M2

◮ extending mappings in M1 with compatible mappings in M2

Difference: M1 r M2

◮ mappings in M1 that cannot be extended with mappings in M2

Union: M1 ∪M2

◮ mappings in M1 plus mappings in M2 (set theoretical union)

– C. Gutierrez - Foundations of RDF Databases - ESWC 2008 11 / 29

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

Join: M1 ⋈ M2

◮ extending mappings in M1 with compatible mappings in M2

Difference: M1 ∖ M2

◮ mappings in M1 that cannot be extended with mappings in M2

Union: M1 ∪ M2

◮ mappings in M1 plus mappings in M2 (set-theoretic union)

Definition

Left Outer Join: M1 ⟕ M2 = (M1 ⋈ M2) ∪ (M1 ∖ M2)
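The four operations can be sketched in Python as follows. The representation is an assumption made for this sketch: a mapping is a frozenset of (variable, value) pairs so that it can be stored in a Python set, and the helper names and sample values are hypothetical.

```python
def compatible(m1, m2):
    """Mappings are compatible if they agree on their shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(M1, M2):
    """Extend each mapping in M1 with every compatible mapping in M2."""
    return {m1 | m2 for m1 in M1 for m2 in M2 if compatible(m1, m2)}


def difference(M1, M2):
    """Mappings in M1 that are compatible with no mapping in M2."""
    return {m1 for m1 in M1 if not any(compatible(m1, m2) for m2 in M2)}


def union(M1, M2):
    """Plain set-theoretic union."""
    return M1 | M2


def left_outer_join(M1, M2):
    """(M1 join M2) union (M1 minus M2)."""
    return union(join(M1, M2), difference(M1, M2))


def mapping(d):
    """Hypothetical helper: turn a dict into a hashable mapping."""
    return frozenset(d.items())


M1 = {mapping({"?X": "R1", "?Y": "john"}), mapping({"?X": "R2", "?Y": "paul"})}
M2 = {mapping({"?X": "R1", "?E": "j@example.org"})}
result = left_outer_join(M1, M2)
```

The left outer join keeps every mapping of M1: those that can be extended come from the join, the rest come from the difference.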

Semantics of SPARQL operators

Compositional semantics at work:

Definition

Given graph patterns P1, P2 and an RDF graph D:

[[P1 AND P2]]D → [[P1]]D ⋈ [[P2]]D

[[P1 UNION P2]]D → [[P1]]D ∪ [[P2]]D

[[P1 OPT P2]]D → [[P1]]D ⟕ [[P2]]D

Simple example

Example

Graph:

(R1, name, john)
(R1, email, [email protected])
(R2, name, paul)

Pattern:

( (?X, name, ?Y) OPT (?X, email, ?E) )

Evaluation of (?X, name, ?Y):

?X   ?Y
R1   john
R2   paul

Evaluation of (?X, email, ?E):

?X   ?E
R1   [email protected]

Result:

?X   ?Y     ?E
R1   john   [email protected]    ◮ from the Join
R2   paul                        ◮ from the Difference

◮ The full result comes from the Union of both parts.
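The example above can be reproduced with a small recursive evaluator that follows the compositional definitions. This is a sketch under stated assumptions, not the talk's implementation: patterns are tagged tuples such as ("triple", s, p, o) and ("OPT", P1, P2), mappings are frozensets of (variable, value) pairs, and the e-mail literal is a hypothetical placeholder.

```python
def is_var(term):
    return term.startswith("?")


def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(M1, M2):
    return {m1 | m2 for m1 in M1 for m2 in M2 if compatible(m1, m2)}


def difference(M1, M2):
    return {m1 for m1 in M1 if not any(compatible(m1, m2) for m2 in M2)}


def eval_triple(tp, graph):
    """Mappings defined exactly on the variables of tp that send tp to a triple in graph."""
    result = set()
    for triple in graph:
        mu = {}
        for pat, val in zip(tp, triple):
            if is_var(pat):
                if mu.setdefault(pat, val) != val:  # same variable, different value
                    break
            elif pat != val:                        # constant does not match
                break
        else:
            result.add(frozenset(mu.items()))
    return result


def evaluate(pattern, graph):
    """Compositional evaluation of triple, AND, UNION and OPT patterns."""
    op = pattern[0]
    if op == "triple":
        return eval_triple(pattern[1:], graph)
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    if op == "AND":
        return join(left, right)
    if op == "UNION":
        return left | right
    if op == "OPT":
        return join(left, right) | difference(left, right)
    raise ValueError(f"unknown operator: {op}")


graph = {
    ("R1", "name", "john"),
    ("R1", "email", "john@example.org"),  # hypothetical literal
    ("R2", "name", "paul"),
}
pattern = ("OPT", ("triple", "?X", "name", "?Y"), ("triple", "?X", "email", "?E"))
answers = evaluate(pattern, graph)
```

Each operator is evaluated only from the results of its sub-patterns, which is what makes the semantics compositional.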

Semantics of FILTER patterns

In a pattern (P FILTER R), the filter expression R is a Boolean combination of atoms.

A mapping satisfies an atom:

◮ (?X = c) if it gives the value c to variable ?X

◮ (?X = ?Y) if it gives the same value to ?X and ?Y

◮ bound(?X) if it is defined for ?X

Definition

[[P FILTER R]] = { µ ∈ [[P]] : µ ⊨ R }, i.e., the set of mappings in [[P]] that satisfy R.

This makes sense only if var(R) ⊆ var(P) (safe filters).
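Filter satisfaction can be sketched as a recursive check over the filter expression. The encoding is an assumption of this sketch: filters are tagged tuples ("eq_const", "eq_var", "bound", "not", "and", "or"), mappings are dictionaries, and negation is read two-valued (a mapping satisfies ¬R exactly when it does not satisfy R), which is simpler than the error-aware evaluation in the W3C specification.

```python
def satisfies(mu, r):
    """Return True if mapping mu satisfies filter expression r."""
    kind = r[0]
    if kind == "bound":
        return r[1] in mu
    if kind == "eq_const":
        return r[1] in mu and mu[r[1]] == r[2]
    if kind == "eq_var":
        return r[1] in mu and r[2] in mu and mu[r[1]] == mu[r[2]]
    if kind == "not":
        return not satisfies(mu, r[1])
    if kind == "and":
        return satisfies(mu, r[1]) and satisfies(mu, r[2])
    if kind == "or":
        return satisfies(mu, r[1]) or satisfies(mu, r[2])
    raise ValueError(f"unknown filter atom: {kind}")


def filter_mappings(mappings, r):
    """Keep only the mappings that satisfy r."""
    return [mu for mu in mappings if satisfies(mu, r)]


# Hypothetical mappings for illustration.
mappings = [
    {"?X": "R1", "?Y": "john", "?E": "john@example.org"},
    {"?X": "R2", "?Y": "paul"},
]
without_email = filter_mappings(mappings, ("not", ("bound", "?E")))
named_john = filter_mappings(mappings, ("eq_const", "?Y", "john"))
```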


Complexity (The evaluation problem)

Input:

A mapping µ, a graph pattern P, and an RDF graph D.

Question:

Is the mapping in the evaluation of the pattern against the graph? Formally:

Is it true that µ ∈ [[P]]D?

Evaluation of simple patterns is polynomial.

Theorem

For patterns using only the AND and FILTER operators, the evaluation problem is polynomial:

O(size of the pattern × size of the graph).

Proof idea

◮ Check that the mapping makes every triple pattern match a triple in the graph.

◮ Then check that the mapping satisfies the FILTERs.
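The two steps of the proof idea can be sketched as a membership test. The sketch assumes the AND/FILTER pattern has been flattened into a list of triple patterns plus a list of safe filters, and that each filter is given as a Python predicate on the mapping; these representation choices and all names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def in_evaluation(mu, triple_patterns, filters, graph):
    """Decide whether mu belongs to the evaluation of an AND/FILTER pattern over graph."""
    # The mapping must be defined exactly on the variables of the pattern.
    variables = {t for tp in triple_patterns for t in tp if is_var(t)}
    if set(mu) != variables:
        return False
    # Step 1: every instantiated triple pattern must be a triple of the graph.
    # With the graph stored as a list, each lookup is a linear scan, which gives
    # the (size of pattern) x (size of graph) bound.
    for tp in triple_patterns:
        instantiated = tuple(mu[t] if is_var(t) else t for t in tp)
        if instantiated not in graph:
            return False
    # Step 2: the mapping must satisfy every filter.
    return all(f(mu) for f in filters)


graph = [
    ("R1", "name", "john"),
    ("R1", "email", "john@example.org"),  # hypothetical literal
    ("R2", "name", "paul"),
]
patterns = [("?X", "name", "?Y"), ("?X", "email", "?E")]
filters = [lambda mu: mu["?Y"] == "john"]

ok = in_evaluation(
    {"?X": "R1", "?Y": "john", "?E": "john@example.org"}, patterns, filters, graph
)
```

No search is needed because the candidate mapping is part of the input; the procedure only verifies it.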

Evaluation including UNION is NP-complete.

Theorem

For patterns using only the AND, FILTER and UNION operators, the evaluation problem is NP-complete.

Proof idea

◮ Reduction from 3SAT.

◮ A pattern encodes the propositional formula.

◮ ¬bound is used to encode negation.

In general: Evaluation problem is PSPACE-complete.

Theorem

For general patterns that include the OPT operator, the evaluation problem is PSPACE-complete.

Proof idea

◮ Reduction from QBF.

◮ A pattern encodes a quantified propositional formula:

∀x1 ∃y1 ∀x2 ∃y2 · · · ψ.

◮ Nested OPTs are used to encode quantifier alternation.

Data–complexity is polynomial

Theorem

When patterns are considered to be fixed (data complexity), the evaluation problem is in LOGSPACE.

Proof idea

From data–complexity of first–order logic.

Expressive Power of SPARQL

◮ A query is a function from the set of input data to the set of output data.

◮ The expressive power of a query language is given by the set of queries it can express.

Definition (Equivalence of languages)

Two query languages L1 and L2 have the same expressive power if they can express the same queries.

(If the languages operate over different kinds of input and output data, these have to be normalized first.)

Expressive Power of SPARQL

Three languages we will consider:

SPARQL

W3C Syntax and Semantics (as in the W3C Recommendation of 15 Jan 2008).

SPARQL-S

W3C Syntax and Semantics. Only safe filters allowed.

SPARQL-C

SPARQL with compositional semantics (as presented in this talk).

Expressive Power of SPARQL: Safe Patterns

What is the meaning of (P FILTER R) when var(R) ⊈ var(P) (non-safe filters)?

Example

Possible meanings of (?X name ?Y) FILTER (?Z > 3)

1. Non-defined variable ?Z. (Error, False, empty set)

2. All values of ?X, ?Y, ?Z such that the expression matches.

3. W3C uses the following:

◮ IF the expression is inside an optional, e.g.
P OPT ( (?X name ?Y) FILTER (?Z > 3) )
and variable ?Z occurs in P, THEN (2.)

◮ ELSE (1.)
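Whether a pattern uses only safe filters is a purely syntactic property, which the following sketch checks. It assumes a hypothetical tagged-tuple encoding: ("triple", s, p, o), ("AND" | "UNION" | "OPT", P1, P2) and ("FILTER", P, R), with filter expressions as nested tuples whose variables are strings starting with "?".

```python
def pattern_vars(p):
    """var(P): the variables occurring in the triple patterns of P."""
    if p[0] == "triple":
        return {t for t in p[1:] if t.startswith("?")}
    if p[0] == "FILTER":
        return pattern_vars(p[1])
    return pattern_vars(p[1]) | pattern_vars(p[2])


def filter_vars(r):
    """var(R): the variables occurring anywhere in a filter expression."""
    if isinstance(r, str):
        return {r} if r.startswith("?") else set()
    if isinstance(r, tuple):
        return set().union(*(filter_vars(x) for x in r))
    return set()  # constants such as numbers


def is_safe(p):
    """True if every sub-pattern (P FILTER R) satisfies var(R) being a subset of var(P)."""
    if p[0] == "triple":
        return True
    if p[0] == "FILTER":
        return filter_vars(p[2]) <= pattern_vars(p[1]) and is_safe(p[1])
    return is_safe(p[1]) and is_safe(p[2])


name = ("triple", "?X", "name", "?Y")
unsafe = ("FILTER", name, ("gt", "?Z", 3))            # ?Z does not occur in the pattern
safe = ("FILTER", name, ("eq_const", "?Y", "john"))
inside_opt = ("OPT", ("triple", "?X", "age", "?Z"), unsafe)
```

The pattern `inside_opt` mirrors the W3C special case: the filter is still non-safe syntactically, even though ?Z occurs in the left-hand side of the OPT.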

Expressive Power of SPARQL: Safe Patterns

◮ Patterns with non-safe filters are rare.

◮ Patterns with non-safe filters can be simulated by safe ones.

Why not avoid them?

Theorem

SPARQL and SPARQL-S have the same expressive power.

Proof idea

◮ There exists a generic procedure to translate non-safe queries into equivalent safe queries.

◮ It uses the case-by-case W3C evaluation rules for non-safe queries.

Expressive Power of SPARQL: Compositional semantics

◮ Compositional and denotational semantics are desirable.

◮ The W3C semantics of SPARQL has a complex three-level operational procedure for evaluating patterns.

Occam’s razor: Why not keep things simple and clean?

Theorem

SPARQL-S and SPARQL-C have the same expressive power.

Proof idea

The only non-trivial case is the semantics of patterns of the form (P1 OPT (P2 FILTER C)). Just check that both definitions coincide.

Expressive Power of SPARQL: Relational Algebra

Interesting but not surprising:

Theorem

SPARQL-C and Relational Algebra have the same expressive power.

Proof idea.

1. Use the known equivalence between Relational Algebra and a version of Datalog.

2. From SPARQL-C to Relational Algebra: the idea of the transformation was known, e.g., Cyganiak (to Relational Algebra), Polleres (to Datalog). It had to be extended to bag semantics.

3. From Datalog to SPARQL-C: the key issue is a sound translation of negation.

Expressive Power of SPARQL: Relational Algebra

Interesting and surprising:

Theorem

W3C SPARQL and Relational Algebra have the same expressive power.

Proof Idea.

Use previous results.

◮ Results hold for bag and set semantics.

Expressive Power of SPARQL: Some consequences

A. Domestic:

◮ Expressive power of SPARQL (limitations and potentialities) completely clarified.

◮ Negation (difference) expressible in SPARQL.

◮ Extension with ASK queries does not add expressive power.

◮ Could bring to SPARQL most of the machinery of basic SQL.

B. Foundational:

◮ SPARQL is a pattern matching version of SQL.

◮ Only local queries expressible in SPARQL.

◮ Still waiting for the third query paradigm: SQL/Tables, XQuery/Trees, ?/Graphs
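The claim that difference is expressible can be illustrated with OPT and FILTER ¬bound. The sketch below checks, on hypothetical data, that keeping the mappings of M1 ⟕ M2 in which ?E is unbound yields M1 ∖ M2. It assumes that ?E is bound by every mapping in M2 and by none in M1; mappings are frozensets of (variable, value) pairs.

```python
def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(M1, M2):
    return {m1 | m2 for m1 in M1 for m2 in M2 if compatible(m1, m2)}


def difference(M1, M2):
    return {m1 for m1 in M1 if not any(compatible(m1, m2) for m2 in M2)}


def left_outer_join(M1, M2):
    return join(M1, M2) | difference(M1, M2)


def not_bound(M, var):
    """Mappings of M satisfying the filter 'not bound(var)'."""
    return {m for m in M if var not in dict(m)}


# Evaluations of (?X, name, ?Y) and (?X, email, ?E) over a hypothetical graph.
M1 = {
    frozenset({"?X": "R1", "?Y": "john"}.items()),
    frozenset({"?X": "R2", "?Y": "paul"}.items()),
}
M2 = {frozenset({"?X": "R1", "?E": "john@example.org"}.items())}

# ((?X, name, ?Y) OPT (?X, email, ?E)) FILTER not bound(?E)
via_opt = not_bound(left_outer_join(M1, M2), "?E")
```

Here `via_opt` contains only the mapping for R2, that is, the resources with a name but no e-mail, which is the difference of the two evaluations.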

Conclusions, Final Thoughts

◮ RDF is becoming a very relevant data model. Develop it!

◮ We have a "marriage of convenience" with SPARQL. Learn to love it. RDF and SPARQL are our assets. Be patient!

◮ Simplify, simplify, simplify. Keep everything (but your hope) minimal!

◮ Do not stop the search for "El Dorado". Missing graph features will be needed!

◮ Do not reinvent the wheel. Before designing new features or extensions for SPARQL, check whether they were tried for SQL.

◮ SPARQL standardization + popularization of RDF + pervasiveness of social networks = explosive combination.


Comments, Questions, etc.

Thanks for your attention!

[email protected]
