Towards an RDF Validation Language based on Regular Expression Derivatives
Eric Prud'hommeaux
World Wide Web Consortium, MIT, Cambridge, MA, USA
Harold Solbrig
Mayo Clinic College of Medicine, Rochester, MN, USA
Jose Emilio Labra Gayo
WESO Research group, University of Oviedo, Spain
Sławek Staworko
LINKS, INRIA & CNRS, University of Lille, France
Overview
Shape Expressions for RDF validation - Justification
Regular Shape Expressions
  Axiomatic Semantics
  Implementation based on Derivatives
Regular Shape Expression Schemas
  Adapt Axiomatic Semantics to Schemas
  Adapt Implementation based on Derivatives
Conclusions & Future work
Shape Expressions
Simple and intuitive language that can:
  Describe the topology of RDF data
  Validate that RDF instance data matches a shape
Two syntaxes:
  Compact syntax (inspired by RelaxNG, Turtle and SPARQL)
  RDF
Related to the W3C RDF Data Shapes Working Group
Example: RDF model of a Person

E-R Diagram
Person: foaf:age xsd:integer; foaf:name xsd:string +; foaf:knows Person 0..*

Shape Expressions Schema
<Person> {
  foaf:age xsd:integer ,
  foaf:name xsd:string+ ,
  foaf:knows @<Person>*
}

Some RDF data
:john foaf:age 23;
      foaf:name "John";
      foaf:knows :bob .
:bob foaf:age 34;
     foaf:name "Bob", "Robert" .
:mary foaf:age 50, 65 .
Why not SPARQL?

Shape Expression:
<Person> {
  foaf:age xsd:integer ,
  foaf:name xsd:string+ ,
  foaf:knows @<Person>*
}

The same shape expressed in SPARQL:
ASK {
  { SELECT ?Person {
      ?Person foaf:age ?o .
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person {
      ?Person foaf:age ?o .
      FILTER ( isLiteral(?o) && datatype(?o) = xsd:integer )
    } GROUP BY ?Person HAVING (COUNT(*)=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c0) {
      ?Person foaf:name ?o .
    } GROUP BY ?Person HAVING (COUNT(*)>=1)
  }
  { SELECT ?Person (COUNT(*) AS ?Person_c1) {
      ?Person foaf:name ?o .
      FILTER (isLiteral(?o) && datatype(?o) = xsd:string)
    } GROUP BY ?Person HAVING (COUNT(*)>=1)
  }
  FILTER (?Person_c0 = ?Person_c1)
  { { { SELECT ?Person (COUNT(*) AS ?Person_c2) {
          ?Person foaf:knows ?o .
        } GROUP BY ?Person
      }
      { SELECT ?Person (COUNT(*) AS ?Person_c3) {
          ?Person foaf:knows ?o .
          FILTER ((isIRI(?o) || isBlank(?o)))
        } GROUP BY ?Person HAVING (COUNT(*) >= 1)
      }
      FILTER (?Person_c2 = ?Person_c3)
    }
    UNION {
      SELECT ?Person {
        OPTIONAL { ?Person foaf:knows ?o }
        FILTER (!bound(?o))
      }
    }
  }
}
Regular Shape Expressions (RSEs)
Simplified version of Shape Expressions
Based on Regular Expressions
  Sets of triples instead of lists of characters
  Interleave instead of concatenation
Abstract syntax
Shape Expressions vs RSEs*

Example 1 (Shape Expression | RSE):
<Shape1> {
  foaf:age xsd:integer ,
  foaf:name xsd:string*
}

Example 2:
<Shape2> {
  :a ( 1 ) ,
  :b ( 1 2 )*
}

* Note: We are considering a subset of Shape Expressions with Closed Shapes and inclusive Or
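The abstract syntax and the RSE column of these examples did not survive in the transcript. The block below is a hedged reconstruction, not the slide's own formula: it assumes the notation $p \to V$ for an arc with predicate $p$ whose object lies in the value set $V$, $\parallel$ for interleave, $\mid$ for inclusive or, and $^{*}$ for repetition.

```latex
\begin{align*}
e \;::=\;\; & \emptyset            && \text{matches no graph} \\
     \mid\; & \varepsilon          && \text{matches the empty graph} \\
     \mid\; & p \to V              && \text{one triple with predicate } p \text{ and object in } V \\
     \mid\; & e^{*}                && \text{zero or more repetitions} \\
     \mid\; & e_1 \mid e_2         && \text{inclusive or} \\
     \mid\; & e_1 \parallel e_2    && \text{interleave}
\end{align*}

\begin{align*}
\texttt{<Shape1>} &\;\leadsto\; \texttt{foaf:age} \to \texttt{xsd:integer}
      \;\parallel\; (\texttt{foaf:name} \to \texttt{xsd:string})^{*} \\
\texttt{<Shape2>} &\;\leadsto\; \texttt{:a} \to \{1\}
      \;\parallel\; (\texttt{:b} \to \{1, 2\})^{*}
\end{align*}
```

Under this reading, the comma of the compact syntax becomes interleave, because the triples of a node form a set and carry no order.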
Derivatives of RSEs
Brzozowski's algorithm (1964) developed for Regular Expressions
We adapted that algorithm to RSEs
Calculates the derivative of an RSE with respect to a triple t:
Definition:
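The definition itself is missing from the transcript. A plausible reconstruction, obtained by taking Brzozowski's rules and replacing concatenation with interleave, is sketched below. The symbols are assumptions: $\partial_t(e)$ is the derivative of $e$ with respect to the triple $t = \langle s, p', o \rangle$, $p \to V$ is an arc, and $\parallel$ is interleave.

```latex
\begin{align*}
\partial_t(\emptyset)          &= \emptyset \\
\partial_t(\varepsilon)        &= \emptyset \\
\partial_t(p \to V)            &= \begin{cases}
                                    \varepsilon & \text{if } p' = p \text{ and } o \in V \\
                                    \emptyset   & \text{otherwise}
                                  \end{cases} \\
\partial_t(e^{*})              &= \partial_t(e) \parallel e^{*} \\
\partial_t(e_1 \mid e_2)       &= \partial_t(e_1) \mid \partial_t(e_2) \\
\partial_t(e_1 \parallel e_2)  &= (\partial_t(e_1) \parallel e_2) \mid (e_1 \parallel \partial_t(e_2))
\end{align*}
```

The interleave rule has two alternatives because the consumed triple may belong to either operand, which is what distinguishes it from the rule for concatenation.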
Matching using derivatives
Auxiliary function that returns true if an RSE matches the empty graph
The matching relation can be expressed as:
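The formulas for this slide are not in the transcript, so here is a minimal Python sketch of the idea, assuming the usual derivative scheme: a graph matches an expression if, after taking the derivative for each triple in turn, the residual expression accepts the empty graph. All names (`Arc`, `Interleave`, `nullable`, `deriv`, `matches`) are hypothetical and do not come from the Scala or Haskell implementations; value sets stand in for the value constraints of arcs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, object]  # (subject, predicate, object)

@dataclass(frozen=True)
class Empty:            # matches no graph
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty graph
    pass

@dataclass(frozen=True)
class Arc:              # one triple with a fixed predicate and an allowed object
    predicate: str
    values: FrozenSet[object]

@dataclass(frozen=True)
class Star:
    inner: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Interleave:
    left: object
    right: object

def alt(a, b):
    """Smart constructor for Or that drops Empty operands."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Or(a, b)

def interleave(a, b):
    """Smart constructor for Interleave that simplifies Empty and Eps."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Interleave(a, b)

def nullable(e) -> bool:
    """True if e matches the empty graph."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Interleave):
        return nullable(e.left) and nullable(e.right)
    return False  # Empty and Arc

def deriv(e, t: Triple):
    """Residual expression after consuming triple t."""
    if isinstance(e, Arc):
        _, p, o = t
        return Eps() if p == e.predicate and o in e.values else Empty()
    if isinstance(e, Star):
        return interleave(deriv(e.inner, t), e)
    if isinstance(e, Or):
        return alt(deriv(e.left, t), deriv(e.right, t))
    if isinstance(e, Interleave):
        return alt(interleave(deriv(e.left, t), e.right),
                   interleave(e.left, deriv(e.right, t)))
    return Empty()  # Empty and Eps

def matches(e, triples: Iterable[Triple]) -> bool:
    """Consume the triples one by one, then test the residual for nullability."""
    for t in triples:
        e = deriv(e, t)
    return nullable(e)

# <Shape2> { :a (1), :b (1 2)* } encoded with interleave
shape2 = interleave(Arc(":a", frozenset({1})), Star(Arc(":b", frozenset({1, 2}))))
ok = matches(shape2, {(":n", ":a", 1), (":n", ":b", 1), (":n", ":b", 2)})
missing_a = matches(shape2, {(":n", ":b", 1)})
extra_arc = matches(shape2, {(":n", ":a", 1), (":n", ":c", 1)})
```

Because interleave is order-insensitive, the triples can be consumed in any order, which is why iterating over an unordered Python set is acceptable here. The last case fails because the sketch assumes closed shapes: a triple that no arc accepts drives the residual to `Empty`.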
Regular Shape Expression Schemas
Given a set of labels, an RSE schema is a function
where we extend RSEs to admit label references
Example 1:
Example 2:
<Person> {foaf:age xsd:integer
, foaf:knows @<Person>*}
Corresponds to:
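The right-hand side of this correspondence is not in the transcript. One plausible rendering, assuming the schema function is called $\delta$, arcs are written $p \to V$, interleave is written $\parallel$, and a label reference is written $@\textit{Person}$:

```latex
\delta(\textit{Person}) \;=\;
  \texttt{foaf:age} \to \texttt{xsd:integer}
  \;\parallel\;
  (\texttt{foaf:knows} \to @\textit{Person})^{*}
```

The label reference makes the schema recursive: the object of each foaf:knows arc must itself match the expression that $\delta$ assigns to Person.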
From matching to typing
We extend previous definitions to include the notion of typing
A typing associates a label to a node in a context
Definitions on typings
The matching algorithm returns the typing in the context:
Matching RSE Schemas
We define the matching of an RSE e with a set of triples as a partial function that returns a typing.
The function takes a typing context as argument
and we extend previous axiomatic definitions as...
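The extended definitions are not in the transcript. The Python sketch below illustrates one way the pieces could fit together, under several loudly stated assumptions: a typing is a set of (node, label) pairs, the context is the set of pairs currently assumed to hold (so recursive references terminate), Python types stand in for datatypes, and alternatives are tracked as a list of (residual, typing) states rather than with an Or node. The names (`Validator`, `check_node`, `Ref`) are hypothetical and this is a simplification, not the paper's definitions. The example graph adds a `:bob foaf:knows :john` triple, which is not in the slides, to exercise a cycle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Eps:              # matches only the empty graph
    pass

@dataclass(frozen=True)
class Ref:              # reference to a shape label, written @<label>
    label: str

@dataclass(frozen=True)
class Arc:              # value is a Python type (datatype stand-in) or a Ref
    predicate: str
    value: object

@dataclass(frozen=True)
class Star:
    inner: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Interleave:
    left: object
    right: object

def interleave(a, b):
    """Smart constructor that removes Eps operands."""
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Interleave(a, b)

def nullable(e) -> bool:
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, Or):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Interleave):
        return nullable(e.left) and nullable(e.right)
    return False  # Arc

class Validator:
    def __init__(self, graph, schema):
        self.graph = graph      # set of (subject, predicate, object)
        self.schema = schema    # dict: label -> expression

    def deriv(self, e, t, ctx):
        """List of (residual, typing) alternatives after consuming t."""
        if isinstance(e, Arc):
            _, p, o = t
            if p != e.predicate:
                return []
            if isinstance(e.value, Ref):
                ty = self.check_node(o, e.value.label, ctx)
                return [] if ty is None else [(Eps(), ty)]
            return [(Eps(), frozenset())] if isinstance(o, e.value) else []
        if isinstance(e, Star):
            return [(interleave(d, e), ty) for d, ty in self.deriv(e.inner, t, ctx)]
        if isinstance(e, Or):
            return self.deriv(e.left, t, ctx) + self.deriv(e.right, t, ctx)
        if isinstance(e, Interleave):
            return ([(interleave(d, e.right), ty) for d, ty in self.deriv(e.left, t, ctx)] +
                    [(interleave(e.left, d), ty) for d, ty in self.deriv(e.right, t, ctx)])
        return []  # Eps consumes nothing

    def check_node(self, node, label, ctx=frozenset()):
        """Return a typing if node matches label, else None (partial function)."""
        if (node, label) in ctx:            # already assumed: accept coinductively
            return frozenset({(node, label)})
        ctx = ctx | {(node, label)}
        states = [(self.schema[label], frozenset())]
        for t in (t for t in self.graph if t[0] == node):
            states = [(d, ty | ty2)
                      for e, ty in states
                      for d, ty2 in self.deriv(e, t, ctx)]
        for e, ty in states:
            if nullable(e):
                return ty | {(node, label)}
        return None

# <Person> { foaf:age xsd:integer, foaf:knows @<Person>* }
schema = {"Person": Interleave(Arc("foaf:age", int),
                               Star(Arc("foaf:knows", Ref("Person"))))}
graph = {
    (":john", "foaf:age", 23), (":john", "foaf:knows", ":bob"),
    (":bob", "foaf:age", 34), (":bob", "foaf:knows", ":john"),  # cycle added for illustration
    (":mary", "foaf:age", 50), (":mary", "foaf:age", 65),
}
validator = Validator(graph, schema)
john_typing = validator.check_node(":john", "Person")
mary_typing = validator.check_node(":mary", "Person")
```

Here `john_typing` contains both `(":john", "Person")` and `(":bob", "Person")`, since validating :john forces validating :bob, while `mary_typing` is `None` because the second foaf:age triple has no arc left to consume it. The list of states can grow exponentially in the worst case, so this sketch favours clarity over efficiency.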
Implementations
The algorithm has been implemented in Scala
Available at: http://labra.github.io/shexcala
We have also implemented a simplified prototype in Haskell that follows the paper's definitions
Available at: http://labra.github.io/Haws
An online version is also available at: http://rdfshape.weso.es
Conclusions & Future work
Declarative algorithm to match Regular Shape Expressions
  Based on equational reasoning
Theoretical complexity is unaffected
  However, the derivatives algorithm behaves better than backtracking in practice
Future work:
  Prove the correctness of the algorithm
  Experimental results
  Align this work with current RDF Data Shapes development
SHACL vs RSEs
At this moment, SHACL is being defined by the RDF Data Shapes WG
Some differences:
  Open Shapes (allow remaining triples)
  Arcs check that there are no other arcs with the same predicate and different values
  And operator instead of interleave
  Inclusive vs Exclusive-or
Semantics of all these features is under discussion