Towards an RDF Validation Language based on Regular Expression Derivatives

Eric Prud'hommeaux, World Wide Web Consortium, MIT, Cambridge, MA, USA

Harold Solbrig, Mayo Clinic College of Medicine, Rochester, MN, USA

Jose Emilio Labra Gayo, WESO Research group, University of Oviedo, Spain

Sławek Staworko, LINKS, INRIA & CNRS, University of Lille, France

Post on 16-Jul-2015

Overview

Shape Expressions for RDF validation - Justification

Regular Shape Expressions

Axiomatic Semantics

Implementation based on Derivatives

Regular Shape Expression Schemas

Adapt Axiomatic Semantics to Schemas

Adapt Implementation based on Derivatives

Conclusions & Future work

Shape Expressions

Simple and intuitive language that can:

Describe the topology of RDF data

Validate that RDF instance data matches a shape

Two syntaxes:

Compact syntax (inspired by RelaxNG, Turtle and SPARQL)

RDF

Related to the W3C RDF Data Shapes Working Group

Example: RDF model of a Person

E-R diagram: Person, with foaf:age xsd:integer, foaf:name xsd:string (+) and a foaf:knows self-reference (0..*)

Shape Expressions schema:

<Person> {
  foaf:age xsd:integer
, foaf:name xsd:string+
, foaf:knows @<Person>*
}

Some RDF data:

:john foaf:age 23;
      foaf:name "John";
      foaf:knows :bob .

:bob foaf:age 34;
     foaf:name "Bob", "Robert" .

:mary foaf:age 50, 65 .

Why not SPARQL?

<Person> {
  foaf:age xsd:integer
, foaf:name xsd:string+
, foaf:knows @<Person>*
}

The same constraints in SPARQL (38 lines, against 5 for the schema):

ASK { { SELECT ?Person {
      ?Person foaf:age ?o .
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person {
      ?Person foaf:age ?o .
      FILTER ( isLiteral(?o) &&
               datatype(?o) = xsd:integer )
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c0) {
      ?Person foaf:name ?o .
    } GROUP BY ?Person HAVING (COUNT(*)>=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c1) {
      ?Person foaf:name ?o .
      FILTER (isLiteral(?o) &&
              datatype(?o) = xsd:string)
    } GROUP BY ?Person HAVING (COUNT(*)>=1) }
  FILTER (?Person_c0 = ?Person_c1)
  {
    { { SELECT ?Person (COUNT(*) AS ?Person_c2) {
          ?Person foaf:knows ?o .
        } GROUP BY ?Person }
      { SELECT ?Person (COUNT(*) AS ?Person_c3) {
          ?Person foaf:knows ?o .
          FILTER ((isIRI(?o) || isBlank(?o)))
        } GROUP BY ?Person HAVING (COUNT(*) >= 1) }
      FILTER (?Person_c2 = ?Person_c3)
    }
    UNION {
      SELECT ?Person {
        OPTIONAL { ?Person foaf:knows ?o }
        FILTER (!bound(?o))
      }
    }
  }
}

Regular Shape Expressions (RSEs)

Simplified version of Shape Expressions

Based on Regular Expressions

Sets of triples instead of list of characters

Interleave instead of concatenation

Abstract syntax
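A minimal sketch of how such an abstract syntax could be written down as a Python data type. It assumes the usual constructors for this kind of language (empty, epsilon, arc, Kleene star, interleave, alternative); the class and field names are illustrative and are not taken from the authors' Scala or Haskell code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Empty:
    """Matches no set of triples (assumed constructor)."""


@dataclass(frozen=True)
class Epsilon:
    """Matches only the empty set of triples."""


@dataclass(frozen=True)
class Arc:
    """Matches exactly one triple with this predicate and an object in `objects`."""
    predicate: str
    objects: FrozenSet[object]


@dataclass(frozen=True)
class Star:
    """Zero or more repetitions of `expr`."""
    expr: "RSE"


@dataclass(frozen=True)
class And:
    """Interleave: the triples are shared out between both sides, in any order."""
    left: "RSE"
    right: "RSE"


@dataclass(frozen=True)
class Or:
    """Alternative between two expressions."""
    left: "RSE"
    right: "RSE"


RSE = Union[Empty, Epsilon, Arc, Star, And, Or]

# The shape { :a ( 1 ), :b ( 1 2 )* } written as a tree
shape2 = And(Arc(":a", frozenset({1})),
             Star(Arc(":b", frozenset({1, 2}))))
```

Frozen dataclasses give structural equality and hashing for free, which is convenient when expressions are compared or simplified later.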

Shape Expressions vs RSEs*

Example 1 (Shape Expression, shown next to its RSE on the slide):

<Shape1> {
  foaf:age xsd:integer
, foaf:name xsd:string*
}

Example 2:

<Shape2> {
  :a ( 1 )
, :b ( 1 2 )*
}

* Note: We are considering a subset of Shape Expressions with closed shapes and inclusive or.

Cardinalities in RSEs

Cardinalities can be defined as:

Example:
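One common way to express cardinalities with the core operators is e+ = e ∥ e*, e? = e | ε, and a recursive unfolding for e{m,n}. The sketch below encodes that reading in Python; it is an assumption about the intended definitions, and the constructor and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Arc:
    predicate: str
    objects: frozenset


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class And:          # interleave
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def plus(e):
    """e+ : one occurrence interleaved with any number of further ones."""
    return And(e, Star(e))


def optional(e):
    """e? : either e or nothing."""
    return Or(e, Epsilon())


def repeat(e, m, n):
    """e{m,n} : between m and n occurrences, unfolded recursively."""
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    if n == 0:
        return Epsilon()
    if m == 0:
        return And(optional(e), repeat(e, 0, n - 1))
    return And(e, repeat(e, m - 1, n - 1))


# Hypothetical arc: a :b triple whose object is 1 or 2
b = Arc(":b", frozenset({1, 2}))
one_or_two_b = repeat(b, 1, 2)   # unfolds to b ∥ ((b | ε) ∥ ε)
```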

Shape of an RSE:

Example

Simplification rules

It is easy to show that the operators obey:
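Identities of this kind are often applied through "smart constructors" that simplify while building an expression. The sketch below assumes the standard identities ∅ ∥ e = ∅, ε ∥ e = e, ∅ | e = e and e | e = e; the exact list on the slide may differ, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Arc:
    predicate: str
    objects: frozenset


@dataclass(frozen=True)
class And:          # interleave
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def mk_and(a, b):
    """Build a ∥ b, applying ∅ ∥ e = ∅ and ε ∥ e = e on either side."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Epsilon):
        return b
    if isinstance(b, Epsilon):
        return a
    return And(a, b)


def mk_or(a, b):
    """Build a | b, applying ∅ | e = e and e | e = e."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    if a == b:
        return a
    return Or(a, b)


# Hypothetical arcs used only to exercise the constructors
a = Arc(":a", frozenset({1}))
b = Arc(":b", frozenset({1, 2}))
```

Simplifying eagerly keeps derivative expressions small, since most branches produced while matching collapse to ∅ straight away.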

Matching triples with RSEs

Example matching tree

Rules employed
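Read operationally, matching rules of this kind lead to a backtracking matcher: for an interleave, try every way of splitting the triples between the two sides. The sketch below is such a naive matcher, written under the assumption that the rules are the usual ones (ε matches the empty set, an arc matches a single triple, interleave splits the set, star repeats). It is not the authors' code and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Arc:
    predicate: str
    objects: frozenset


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class And:          # interleave
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def splits(triples):
    """Yield every way to divide distinct triples into two parts."""
    for k in range(len(triples) + 1):
        for left in combinations(triples, k):
            yield list(left), [t for t in triples if t not in left]


def matches(e, triples):
    """Backtracking match of expression e against a list of distinct triples."""
    if isinstance(e, Epsilon):
        return not triples
    if isinstance(e, Arc):
        return (len(triples) == 1
                and triples[0][1] == e.predicate
                and triples[0][2] in e.objects)
    if isinstance(e, Or):
        return matches(e.left, triples) or matches(e.right, triples)
    if isinstance(e, And):
        return any(matches(e.left, l) and matches(e.right, r)
                   for l, r in splits(triples))
    if isinstance(e, Star):
        if not triples:
            return True
        # A non-empty first part guarantees that the recursion terminates.
        return any(l and matches(e.expr, l) and matches(e, r)
                   for l, r in splits(triples))
    return False    # Empty matches nothing


shape2 = And(Arc(":a", frozenset({1})),
             Star(Arc(":b", frozenset({1, 2}))))
ok = [(":x", ":a", 1), (":x", ":b", 1), (":x", ":b", 2)]
```

The number of splits grows exponentially with the number of triples, which is the cost a derivative-based matcher tries to avoid in practice.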

Derivatives of RSEs

Brzozowski's algorithm (1964) developed for Regular Expressions

We adapted that algorithm to RSEs

Calculates the derivative of an RSE with respect to a triple t:

Definition:

Calculating the derivative: definitions

Matching using derivatives

Auxiliary function (nullable) that returns true if an RSE matches the empty graph

The matching relation can be expressed as:

Example trace:
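The steps above can be sketched in a few lines of Python: a nullable test, a derivative with respect to one triple, and matching as "derive by every triple, then check nullability". The derivative rules used here are the standard Brzozowski rules adapted to interleave, which is an assumption about what the slides showed; the names are illustrative, and no simplification is applied, so expressions grow quickly.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Arc:
    predicate: str
    objects: frozenset


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class And:          # interleave
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def nullable(e):
    """True if e matches the empty set of triples."""
    if isinstance(e, (Epsilon, Star)):
        return True
    if isinstance(e, And):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    return False    # Empty and Arc


def deriv(e, triple):
    """What remains to be matched after consuming `triple`."""
    _, p, o = triple
    if isinstance(e, Arc):
        return Epsilon() if p == e.predicate and o in e.objects else Empty()
    if isinstance(e, Star):
        return And(deriv(e.expr, triple), e)
    if isinstance(e, And):      # the triple is consumed by one side or the other
        return Or(And(deriv(e.left, triple), e.right),
                  And(e.left, deriv(e.right, triple)))
    if isinstance(e, Or):
        return Or(deriv(e.left, triple), deriv(e.right, triple))
    return Empty()  # Empty and Epsilon cannot consume a triple


def matches(e, triples):
    """e matches the triples iff its derivative by all of them is nullable."""
    return nullable(reduce(deriv, triples, e))


shape2 = And(Arc(":a", frozenset({1})),
             Star(Arc(":b", frozenset({1, 2}))))
ok = [(":x", ":a", 1), (":x", ":b", 1), (":x", ":b", 2)]
```

Because interleave is commutative, the order in which the triples are consumed does not change the result.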

Regular Shape Expression Schemas

Given a set of labels, an RSE schema is a function

where we extend RSEs to admit label references

Example 1:

Example 2:

<Person> {foaf:age xsd:integer

, foaf:knows @<Person>*}

Corresponds to:

From matching to typing

We extend previous definitions to include the notion of typing

A typing associates a label to a node in a context

Definitions on typings

The matching algorithm returns the typing in the context:

Matching RSEs Schemas

We define the matching of an RSE e with a set of triples as a partial function that returns a typing.

The function takes a typing context as argument

and we extend previous axiomatic definitions as...

Axiomatic definitions adapted to RSE Schemas

Derivative of an RSE in a typing context

We adapt previous definitions to typing contexts

where

Example:
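A simplified sketch of derivative-based matching against a schema, assuming that an arc may point to a label and that a (node, label) pair already present in the typing context is accepted, so recursive shapes terminate. Unlike the definitions on the slides, this version gathers the typing in a local accumulator instead of threading it through the derivative. All names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Epsilon:
    pass


@dataclass(frozen=True)
class Ref:
    label: str      # reference to another shape, as in @<Person>


@dataclass(frozen=True)
class Arc:
    predicate: str
    value: object   # a Ref, or a function object -> bool


@dataclass(frozen=True)
class Star:
    expr: object


@dataclass(frozen=True)
class And:          # interleave
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def nullable(e):
    if isinstance(e, (Epsilon, Star)):
        return True
    if isinstance(e, And):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    return False


def check(node, label, graph, schema, ctx=frozenset()):
    """Return a typing (set of (node, label) pairs) if node matches, else None."""
    if (node, label) in ctx:            # assumed further up the call chain
        return frozenset()
    ctx = ctx | {(node, label)}
    found = set()                       # typings established for neighbours

    def value_ok(arc, o):
        if isinstance(arc.value, Ref):
            typing = check(o, arc.value.label, graph, schema, ctx)
            if typing is None:
                return False
            found.update(typing)
            return True
        return arc.value(o)

    def deriv(e, triple):
        _, p, o = triple
        if isinstance(e, Arc):
            return Epsilon() if p == e.predicate and value_ok(e, o) else Empty()
        if isinstance(e, Star):
            return And(deriv(e.expr, triple), e)
        if isinstance(e, And):
            return Or(And(deriv(e.left, triple), e.right),
                      And(e.left, deriv(e.right, triple)))
        if isinstance(e, Or):
            return Or(deriv(e.left, triple), deriv(e.right, triple))
        return Empty()

    triples = [t for t in graph if t[0] == node]
    if nullable(reduce(deriv, triples, schema[label])):
        return frozenset(found | {(node, label)})
    return None


# <Person> { foaf:age xsd:integer, foaf:knows @<Person>* }
schema = {"Person": And(Arc("foaf:age", lambda o: isinstance(o, int)),
                        Star(Arc("foaf:knows", Ref("Person"))))}
graph = {(":john", "foaf:age", 23), (":john", "foaf:knows", ":bob"),
         (":bob", "foaf:age", 34), (":bob", "foaf:knows", ":john"),
         (":mary", "foaf:age", 50), (":mary", "foaf:age", 65)}
```

With this data, `check(":john", "Person", graph, schema)` yields a typing containing both `:john` and `:bob`, while `:mary` is rejected because she has two ages.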

Implementations

The algorithm has been implemented in Scala

Available at: http://labra.github.io/shexcala

We have also implemented a simplified prototype in Haskell that follows the paper's definitions

Available at: http://labra.github.io/Haws

An online version is also available at: http://rdfshape.weso.es

First experimental results

Comparison between derivatives (deriv) and backtracking (back)

Conclusions & Future work

Declarative algorithm to match Regular Shape Expressions

Based on equational reasoning

Theoretical complexity is unaffected. However, the derivatives algorithm behaves better than backtracking in practice

Future work:

Prove the correctness of the algorithm

Experimental results

Align this work with current RDF Data Shapes development

End of Presentation

SHACL vs RSEs

At this moment, SHACL is being defined by the RDF Data Shapes WG

Some differences:

Open Shapes (allow remaining triples)

Arcs check that there are no other arcs with the same predicate and different values

And operator instead of interleave

Inclusive vs Exclusive-or

Semantics of all these features is under discussion

Example of derivatives that don't match