A Little SPARQL in your Analytics Dr. Neil Brittliff


Post on 22-Jan-2017


TRANSCRIPT

Page 1: A Little SPARQL in your Analytics

A Little SPARQL in your Analytics

Dr. Neil Brittliff

Page 2: A Little SPARQL in your Analytics

Introduction

The Semantic Web, as originally envisioned, is a system that enables machines to "understand" and respond to complex human requests based on their meaning. Such an "understanding" requires that the relevant information sources be semantically structured.

Tim Berners-Lee originally expressed the vision of the Semantic Web as follows:

I have a dream for the Web [in which computers] become capable of analysing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.

Page 3: A Little SPARQL in your Analytics

Some other thoughts…

"Are you a member of the SPARQL cult?"
Alex Karp, CEO, Palantir

"It's about graphs, not trees."
Pascal Hitzler, Professor and Director of Data Science at the Department of Computer Science and Engineering at Wright State University in Dayton, Ohio

"In the six degrees of separation, not all degrees are equal."
Malcolm Gladwell, The Tipping Point: How Little Things Can Make a Big Difference

Page 4: A Little SPARQL in your Analytics

The Talk Structure

Triples and RDF
◦ What is the Big Deal

Resource Description Framework (RDF)
◦ A formal way to represent information
◦ Some nomenclature
◦ Some data structures

SPARQL
◦ A language not dissimilar to SQL that can interrogate the Semantic Web

Ontological Representations
◦ A formal way to describe structure

Analytics
◦ Some SNA stuff

The Triple Store Implementations

Some of my stuff – Dealing with Massive RDF Lists

Page 5: A Little SPARQL in your Analytics

Data Platforms

Page 6: A Little SPARQL in your Analytics

Let's look back…

CODASYL

Hierarchical

Relational

Columnar

Key/Value

Document

Graph

Page 7: A Little SPARQL in your Analytics

The Triple

Subject: Resource or Blank
Predicate: Resource
Object: Resource, Literal or Blank

Together these three parts form a triple.

Note: The URL can identify data in the cloud or on premise.

Note: No need for NULLs
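To illustrate why a triple model needs no NULLs, here is a minimal Python sketch that holds triples in a set. The `Triple` type, the `objects` helper and the second person (`WILMA`) are assumptions made up for this illustration; they are not part of the talk or of any RDF library.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical data, loosely following the people/meeting example in this talk.
FRED = "http://www.example.org/people#fred"
WILMA = "http://www.example.org/people#wilma"  # made-up second person

GRAPH: Set[Triple] = {
    Triple(FRED, "p:GivenName", "fred"),
    Triple(FRED, "m:attending", "http://meetings.example.com/cal#m1"),
    Triple(WILMA, "p:GivenName", "wilma"),
    # Wilma attends no meeting: there is simply no m:attending triple for her.
}


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object stated for (subject, predicate); empty if none."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


print(objects(GRAPH, FRED, "m:attending"))
print(objects(GRAPH, WILMA, "m:attending"))  # empty set, no NULL placeholder
```

A missing value is represented by the absence of a statement, so the second lookup returns an empty set rather than a row containing NULL.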

Page 8: A Little SPARQL in your Analytics

RDF

• RDF (Resource Description Framework) is a flexible, schema-less data model

• Standards based – W3C

• RDF Syntax
  o N3/Turtle
  o XML

• Predefined RDF Structures
  o Bag
  o Seq
  o Alt
  o List

These are the only data structures

Page 9: A Little SPARQL in your Analytics

RDF Representation

    @prefix p: <http://www.example.org/personal_details#> .
    @prefix m: <http://www.example.org/meeting_organization#> .
    @prefix g: <http://www.another.example.org/geographical#> .

    <http://www.example.org/people#fred>
        p:GivenName "fred" ;
        p:hasEmail <mailto:[email protected]> ;
        m:attending <http://meetings.example.com/cal#m1> .

    <http://meetings.example.com/cal#m1>
        g:Location [ g:zip "02139" ; g:lat "14.124425" ; g:long "14.245" ] ;
        m:homePage <http://meetings.example.com/m1/hp> .

[Figure: the node http://meetings.example.com/cal#m1 linked by g:location to a blank node carrying g:zip "02139", g:lat "14.124425" and g:long "14.245". A callout labels the prefixed names as CURIEs.]

Page 10: A Little SPARQL in your Analytics

RDF Property Constants

Note: A predicate is also referred to as a property, typically when the object is a literal.

Property     Description                 Usage
rdf:first    First element in a list     rdf:Property
rdf:rest     Rest of the list            rdf:Property
rdf:_i       List sequence               rdf:Property
rdf:nil      End of the list             rdf:Resource

Note: If a predicate is a number (as in rdf:_1, rdf:_2), the number is preceded by an underscore.

Page 11: A Little SPARQL in your Analytics

Nodes and the Blank Nodes

[Figure: the same properties drawn twice, once on the named node genid:ARP11 and once on the blank node _:a1: sf:DOB "2010-09-29", sf:Phone-Mobile "555-1278-4343", sf:marial-status "single", sf:mode "car", sf:gender "F".]

    genid:ARP11 p:GivenName "fred" ;
        p:hasEmail <mailto:[email protected]> ;
        m:attending <http://meetings.example.com/cal#m1> .

Note: Blank nodes are not idempotent!

Page 12: A Little SPARQL in your Analytics

RDF Structures

    <rdf:Description rdf:about="http://science-fiction-book/uri">
      <ex:authors>
        <rdf:Bag>
          <rdf:li>Asimov</rdf:li>
          <rdf:li>Dick</rdf:li>
          <rdf:li>Heinlein</rdf:li>
        </rdf:Bag>
      </ex:authors>
    </rdf:Description>

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix d: <http://learningsparql.com/ns/data#> .

    d:myList d:contents _:b1 .
    _:b1 rdf:first "one" .
    _:b1 rdf:rest _:b2 .
    _:b2 rdf:first "two" .
    _:b2 rdf:rest _:b3 .
    _:b3 rdf:first "three" .
    _:b3 rdf:rest _:b4 .
    _:b4 rdf:first "four" .
    _:b4 rdf:rest _:b5 .
    _:b5 rdf:first "five" .
    _:b5 rdf:rest rdf:nil .

[Figure: the list drawn as a chain. d:myList points via d:contents to _:b1; each node _:b1 to _:b5 has an rdf:first arc to its value ("one" to "five") and an rdf:rest arc to the next node, with _:b5 ending at rdf:nil.]
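The rdf:first / rdf:rest structure is a singly linked list, and reading it back means following rdf:rest until rdf:nil is reached. The sketch below shows this in plain Python over tuples; the `list_items` helper is a name invented for this illustration and assumes each list node has exactly one rdf:first and one rdf:rest.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The five-element list from this slide, written as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("d:myList", "d:contents", "_:b1"),
    ("_:b1", "rdf:first", "one"), ("_:b1", "rdf:rest", "_:b2"),
    ("_:b2", "rdf:first", "two"), ("_:b2", "rdf:rest", "_:b3"),
    ("_:b3", "rdf:first", "three"), ("_:b3", "rdf:rest", "_:b4"),
    ("_:b4", "rdf:first", "four"), ("_:b4", "rdf:rest", "_:b5"),
    ("_:b5", "rdf:first", "five"), ("_:b5", "rdf:rest", "rdf:nil"),
]


def list_items(triples: List[Triple], head: str) -> List[str]:
    """Walk an RDF list from its head node and return the values in order."""
    first: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:first"}
    rest: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:rest"}
    items: List[str] = []
    node = head
    while node != "rdf:nil":  # rdf:nil marks the end of the list
        items.append(first[node])
        node = rest[node]
    return items


print(list_items(TRIPLES, "_:b1"))
```

Each element costs two triples and one hop, which is why very long RDF lists are expensive to store and to retrieve.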

Page 13: A Little SPARQL in your Analytics

SPARQL

SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation, followed by SPARQL 1.1 in March 2013.

Note: Source Wikipedia

Page 14: A Little SPARQL in your Analytics

The Language Structure

    PREFIX abc: <nul://sparql/exampleOntology#>

    SELECT ?capital ?country
    WHERE {
      ?x abc:cityname ?capital ;
         <nul://sparql/exampleOntology#isCapitalOf> ?y .
      ?y abc:countryname ?country ;
         abc:isInContinent abc:Africa .
    }

[Callouts on the slide label the parts of the query: CURIE, Resource, Fully Qualified, Triple, Variable.]

Note: It is a bit like an SQL SELECT statement

Page 15: A Little SPARQL in your Analytics

Construct

    CONSTRUCT { ?article dwc:articleTaxonName ?name }
    WHERE {
      ?x txn:hasWikipediaArticle ?article .
      ?x txn:scientificName ?name .
      ?x a txn:SpeciesConcept .
      ?x txn:kingdom "Plantae" .
    }
    LIMIT 10

Note: Always returns RDF (triples)!

Page 16: A Little SPARQL in your Analytics

Describe

    DESCRIBE ?x
    WHERE {
      ?x a txn:Occurrence .
      ?x dcterms:date "2010-09-29" .
    }
    LIMIT 10

[Figure: the node to describe (?x), typed txn:Occurrence and dated "2010-09-29".]

Note: Describe always returns RDF!

Page 17: A Little SPARQL in your Analytics

Ask

    ASK {
      ?x a txn:Occurrence .
      ?x dcterms:date "2010-09-29" .
    }

Note: Returns yes or no

Page 18: A Little SPARQL in your Analytics

Select

    SELECT ?person ?name ?email
    WHERE {
      ?person foaf:email ?email .
      ?person foaf:name ?name .
      ?person foaf:skill "internet" .
    }
    LIMIT 50

Note: Results are returned in a tabular format
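One way to picture how a SELECT produces its table is as a join: each triple pattern is matched against the data, and bindings for shared variables must agree. The following is a minimal Python sketch of that idea, not how any particular SPARQL engine is implemented. The data and the `match` and `select` helpers are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data; names and addresses are made up for this sketch.
DATA: List[Triple] = [
    ("ex:alice", "foaf:email", "mailto:alice@example.org"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:skill", "internet"),
    ("ex:bob", "foaf:email", "mailto:bob@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:skill", "cooking"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend a binding so that pattern equals triple; None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the old value
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant must match exactly
            return None
    return extended


def select(patterns: List[Triple], data: List[Triple],
           binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in data:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from select(patterns[1:], data, extended)


QUERY: List[Triple] = [
    ("?person", "foaf:email", "?email"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:skill", "internet"),
]
rows = [(b["?person"], b["?name"], b["?email"]) for b in select(QUERY, DATA)]
print(rows)
```

Each yielded binding corresponds to one row of the tabular result, with one column per selected variable.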

Page 19: A Little SPARQL in your Analytics

Results from the Select

person name email

<http://www.w3.org/People/karl/karl-foaf.xrdf#me> "Karl Dubost" <mailto:[email protected]>

<http://www.w3.org/People/card#amy> "Amy van der Hiel" <mailto:[email protected]>

<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:[email protected]>

<http://www.w3.org/People/card#dj> "Dean Jackson" <mailto:[email protected]>

<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:[email protected]>

<http://www.aaronsw.com/about.xrdf#aaronsw> "Aaron Swartz" <mailto:[email protected]>

<http://www.w3.org/People/card#i> "Timothy Berners-Lee" <mailto:[email protected]>

<http://www.w3.org/People/EM/contact#me> "Eric Miller" <mailto:[email protected]>

<http://www.w3.org/People/card#edd> "Edd Dumbill" <mailto:[email protected]>

<http://www.w3.org/People/card#dj> "Dean Jackson" <mailto:[email protected]>

<http://www.w3.org/People/card#libby> "Libby Miller" <mailto:[email protected]>

<http://www.w3.org/People/Connolly/#me> "Dan Connolly" <mailto:[email protected]>

Page 20: A Little SPARQL in your Analytics

Select - Optional

    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?name ?img ?hp ?loc
    WHERE {
      ?a a mo:MusicArtist ;
         foaf:name ?name .
      OPTIONAL { ?a foaf:img ?img }
      OPTIONAL { ?a foaf:homepage ?hp }
      OPTIONAL { ?a foaf:based_near ?loc }
    }

Page 21: A Little SPARQL in your Analytics

The Optional Clause Result

[Figure: mo:MusicArtist nodes carrying differing subsets of the foaf:name, foaf:img, foaf:homepage and foaf:based_near properties.]
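OPTIONAL behaves much like a left outer join: the required patterns decide which rows exist, and each optional pattern fills a column when it can and leaves it unbound otherwise. The Python sketch below illustrates this with made-up artists. The `optional_rows` helper is hypothetical and assumes every property is single-valued, which real data need not be.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Tuple[str, Optional[str], Optional[str], Optional[str]]

# Hypothetical data: two artists with different optional properties, one venue.
DATA: List[Triple] = [
    ("ex:a1", "rdf:type", "mo:MusicArtist"),
    ("ex:a1", "foaf:name", "Artist One"),
    ("ex:a1", "foaf:img", "http://example.org/a1.png"),
    ("ex:a1", "foaf:homepage", "http://example.org/a1"),
    ("ex:a2", "rdf:type", "mo:MusicArtist"),
    ("ex:a2", "foaf:name", "Artist Two"),
    ("ex:a2", "foaf:based_near", "ex:Dayton"),
    ("ex:v1", "rdf:type", "ex:Venue"),
    ("ex:v1", "foaf:name", "Some Venue"),
]


def optional_rows(data: List[Triple]) -> List[Row]:
    """Return (name, img, homepage, location); unbound optionals become None."""
    props: Dict[str, Dict[str, str]] = {}
    for s, p, o in data:
        props.setdefault(s, {})[p] = o  # assumption: one value per property
    rows: List[Row] = []
    for values in props.values():
        # Required part: must be a music artist and must have a name.
        if values.get("rdf:type") == "mo:MusicArtist" and "foaf:name" in values:
            rows.append((
                values["foaf:name"],
                values.get("foaf:img"),         # OPTIONAL
                values.get("foaf:homepage"),    # OPTIONAL
                values.get("foaf:based_near"),  # OPTIONAL
            ))
    return rows


for row in optional_rows(DATA):
    print(row)
```

An artist with no image still appears in the result; only the `?img` cell is left empty.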

Page 22: A Little SPARQL in your Analytics

Select - Filter

    PREFIX prop: <http://dbpedia.org/property/>

    ASK WHERE {
      <http://dbpedia.org/resource/Amazon_River> prop:length ?amazon .
      <http://dbpedia.org/resource/Nile> prop:length ?nile .
      FILTER(?amazon > ?nile)
    }

Note: Filters are applied after the triple patterns have been matched

Note: Filters can appear anywhere within the braces of the group they restrict

Page 23: A Little SPARQL in your Analytics

Alternatives - Union

    PREFIX go: <http://purl.org/obo/owl/GO#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX obo: <http://www.obofoundry.org/ro/ro.owl#>

    SELECT DISTINCT ?label (COUNT(*) AS ?count)
    WHERE {
      { ?process obo:part_of go:GO_0007165 }
      UNION
      { ?process rdfs:subClassOf go:GO_0007165 }
      ?process rdfs:label ?label
    }
    GROUP BY ?label
    ORDER BY DESC(?count)
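The combination of UNION, GROUP BY and COUNT can be pictured as three steps: collect the solutions of both branches, attach the label, then count per label and sort. The Python sketch below is a rough illustration over made-up data; the identifiers go:A, go:B and go:C and their labels are hypothetical, and it assumes one label per process.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]
TARGET = "go:GO_0007165"

# Hypothetical data; the three processes and their labels are made up.
DATA: List[Triple] = [
    ("go:A", "obo:part_of", TARGET),
    ("go:B", "rdfs:subClassOf", TARGET),
    ("go:C", "rdfs:subClassOf", TARGET),
    ("go:A", "rdfs:label", "alpha signalling"),
    ("go:B", "rdfs:label", "beta signalling"),
    ("go:C", "rdfs:label", "beta signalling"),
]


def branch(predicate: str) -> List[str]:
    """Solutions of one UNION branch: subjects linked to TARGET by predicate."""
    return [s for s, p, o in DATA if p == predicate and o == TARGET]


# UNION concatenates the solutions of both branches (a bag, not a set).
processes = branch("obo:part_of") + branch("rdfs:subClassOf")

# Join with the label pattern (assumption: one label per process).
labels = {s: o for s, p, o in DATA if p == "rdfs:label"}

# GROUP BY ?label with COUNT(*), then ORDER BY DESC(count).
ranked = Counter(labels[p] for p in processes if p in labels).most_common()
print(ranked)
```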

Page 24: A Little SPARQL in your Analytics

SPARQL - Update

    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    INSERT DATA {
      <http://example/book1> dc:title "A new book" ;
                             dc:creator "A.N.Other" .
    }

    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    DELETE DATA {
      <http://example/book2> dc:title "David Copperfield" ;
                             dc:creator "Edmund Wells" .
    }

Note: You can only insert and delete triples
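Because a graph is a set of triples, inserting and deleting reduce to set operations, and there is no in-place "update" of a value. A minimal Python sketch, assuming an in-memory set; the `insert_data` and `delete_data` names are invented for this illustration:

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def insert_data(graph: Set[Triple], triples: Iterable[Triple]) -> None:
    """Add triples; re-adding an existing triple changes nothing."""
    graph.update(triples)


def delete_data(graph: Set[Triple], triples: Iterable[Triple]) -> None:
    """Remove triples; removing an absent triple is silently ignored."""
    graph.difference_update(triples)


graph: Set[Triple] = set()
book1 = [
    ("http://example/book1", "dc:title", "A new book"),
    ("http://example/book1", "dc:creator", "A.N.Other"),
]
insert_data(graph, book1)
insert_data(graph, book1)  # second insert has no effect
delete_data(graph, [("http://example/book2", "dc:title", "David Copperfield")])
print(len(graph))

# Changing a title means deleting the old triple and inserting a new one.
delete_data(graph, [("http://example/book1", "dc:title", "A new book")])
insert_data(graph, [("http://example/book1", "dc:title", "A newer book")])
```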

Page 25: A Little SPARQL in your Analytics

Quads – The Graph

    CONSTRUCT {
      GRAPH :g { ?s :p ?o }                         # named graph
      { ?s :p ?o }                                  # default graph
      <http://purl.org/obo/owl/GO#> { :s ?p :o }    # graph named by a URI
    }
    WHERE { ... }

Note: A named graph can be seen as a schema in a relational database

Page 26: A Little SPARQL in your Analytics

SPARQL and Analytics

Page 27: A Little SPARQL in your Analytics

Property Paths

Syntax Form    Matches
uri            A URI or a prefixed name. A path of length one.
^elt           Inverse path (object to subject).
(elt)          A group path elt, brackets control precedence.
elt1 / elt2    A sequence path of elt1, followed by elt2.
elt1 ^ elt2    Shorthand for elt1 / ^elt2, that is elt1 followed by the inverse of elt2.
elt1 | elt2    An alternative path of elt1 or elt2 (all possibilities are tried).
elt*           A path of zero or more occurrences of elt.
elt+           A path of one or more occurrences of elt.
elt?           A path of zero or one elt.
elt{n,m}       A path between n and m occurrences of elt.
elt{n}         Exactly n occurrences of elt. A fixed length path.
elt{n,}        n or more occurrences of elt.
elt{,n}        Between 0 and n occurrences of elt.

    SELECT ?value WHERE {
      :list rdf:rest* [ rdf:first ?value ]
    }

Note: The [] tells the SPARQL parser that the two triples share a common resource.

Page 28: A Little SPARQL in your Analytics

Property Paths cont…

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX d: <http://learningsparql.com/ns/data#>

    SELECT ?item WHERE {
      d:myList d:contents/rdf:rest{2}/rdf:first ?item
    }

    -----------
    | item    |
    ===========
    | "three" |
    -----------
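A property path can be read as a recipe for walking the graph from a set of start nodes. The Python sketch below evaluates the sequence path `d:contents/rdf:rest{2}/rdf:first` and the zero-or-more form `rdf:rest*` over the five-element list used earlier in the talk. The `step`, `repeat` and `star` helpers are hypothetical names for this illustration, not functions of any SPARQL library.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("d:myList", "d:contents", "_:b1"),
    ("_:b1", "rdf:first", "one"), ("_:b1", "rdf:rest", "_:b2"),
    ("_:b2", "rdf:first", "two"), ("_:b2", "rdf:rest", "_:b3"),
    ("_:b3", "rdf:first", "three"), ("_:b3", "rdf:rest", "_:b4"),
    ("_:b4", "rdf:first", "four"), ("_:b4", "rdf:rest", "_:b5"),
    ("_:b5", "rdf:first", "five"), ("_:b5", "rdf:rest", "rdf:nil"),
]


def step(nodes: Set[str], predicate: str) -> Set[str]:
    """One hop: every object reachable from nodes via predicate."""
    return {o for s, p, o in TRIPLES if p == predicate and s in nodes}


def repeat(nodes: Set[str], predicate: str, n: int) -> Set[str]:
    """elt{n}: exactly n hops along predicate."""
    for _ in range(n):
        nodes = step(nodes, predicate)
    return nodes


def star(nodes: Set[str], predicate: str) -> Set[str]:
    """elt*: zero or more hops, so the start nodes are included."""
    seen, frontier = set(nodes), set(nodes)
    while frontier:
        frontier = step(frontier, predicate) - seen
        seen |= frontier
    return seen


head = step({"d:myList"}, "d:contents")
third = step(repeat(head, "rdf:rest", 2), "rdf:first")  # contents/rest{2}/first
every = step(star(head, "rdf:rest"), "rdf:first")       # contents/rest*/first
print(third, sorted(every))
```

Two rdf:rest hops from the head land on the third list node, which is why the query above returns "three".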

Page 29: A Little SPARQL in your Analytics

SPARQL and R

    # Install and load the SPARQL client and the network-analysis packages
    install.packages(c('SPARQL', 'igraph', 'network', 'ergm'), dependencies = TRUE)

    library(SPARQL)
    library(igraph)
    library(network)
    library(ergm)

    endpoint <- "http://live.dbpedia.org/sparql"

    sparql_prefix <- "PREFIX dbp: <http://dbpedia.org/property/>
                      PREFIX dc: <http://purl.org/dc/terms/>
                      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> "

    q <- paste(sparql_prefix,
      'SELECT ?actor ?movie ?director ?movie_date
       WHERE {
         ?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
         ?m rdfs:label ?movie . FILTER(LANG(?movie) = "en")
         ?m dbp:released ?movie_date . FILTER(DATATYPE(?movie_date) = xsd:date)
         ?m dbp:starring ?a . ?a rdfs:label ?actor . FILTER(LANG(?actor) = "en")
         ?m dbp:director ?d . ?d rdfs:label ?director . FILTER(LANG(?director) = "en")
       }')

    # SPARQL() returns a list; its $results element holds the data frame
    res <- SPARQL(endpoint, q)$results

Page 30: A Little SPARQL in your Analytics

Ego-centred network measures

Page 31: A Little SPARQL in your Analytics

Ontological Representations

Page 32: A Little SPARQL in your Analytics

RDFS

RDF Schema (or RDFS) defines classes and properties.

The resources in the RDFS vocabulary have URIrefs beginning with http://www.w3.org/2000/01/rdf-schema#

    ex:Vehicle rdf:type rdfs:Class .
    ex:Car rdfs:subClassOf ex:Vehicle .
    ex:Van rdfs:subClassOf ex:Vehicle .
    ex:Truck rdfs:subClassOf ex:Vehicle .
    ex:MiniVan rdfs:subClassOf ex:Van .
    ex:MiniVan rdfs:subClassOf ex:Car .

[Figure: class hierarchy with Vehicle (a Class) at the top; Van, Truck and Car beneath it; MiniVan beneath both Van and Car.]
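Since rdfs:subClassOf is transitive, a MiniVan is also a Vehicle even though no triple says so directly. The Python sketch below computes that closure for the hierarchy on this slide. The `superclasses` helper is a name made up for this illustration; a real RDFS reasoner does considerably more.

```python
from typing import List, Set, Tuple

# (subclass, superclass) pairs taken from the rdfs:subClassOf triples above.
SUBCLASS_OF: List[Tuple[str, str]] = [
    ("ex:Car", "ex:Vehicle"),
    ("ex:Van", "ex:Vehicle"),
    ("ex:Truck", "ex:Vehicle"),
    ("ex:MiniVan", "ex:Van"),
    ("ex:MiniVan", "ex:Car"),
]


def superclasses(cls: str) -> Set[str]:
    """Return every class reachable by following rdfs:subClassOf upward."""
    found: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child, parent in SUBCLASS_OF:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(superclasses("ex:MiniVan")))
```

The same traversal handles the multiple inheritance in the example, where MiniVan has two direct parents.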

Page 33: A Little SPARQL in your Analytics

OWL

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.

    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class>
          <owl:oneOf rdf:parseType="Collection">
            <owl:Thing rdf:about="#Tosca" />
            <owl:Thing rdf:about="#Salome" />
          </owl:oneOf>
        </owl:Class>
        <owl:Class>
          <owl:oneOf rdf:parseType="Collection">
            <owl:Thing rdf:about="#Turandot" />
            <owl:Thing rdf:about="#Tosca" />
          </owl:oneOf>
        </owl:Class>
      </owl:intersectionOf>
    </owl:Class>

OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. OWL has more facilities for expressing meaning and semantics than XML, RDF, and RDFS, and thus OWL goes beyond these languages in its ability to represent machine interpretable content on the Web.

Page 34: A Little SPARQL in your Analytics

Reification

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix : <http://example/ns#> .

    _:a rdf:subject <http://example.org/book/book1> .
    _:a rdf:predicate dc:title .
    _:a rdf:object "SPARQL" .
    _:a :saidBy "Alice" .

    _:b rdf:subject <http://example.org/book/book1> .
    _:b rdf:predicate dc:title .
    _:b rdf:object "SPARQL Tutorial" .
    _:b rdf:object "SPARQL Queries" .
    _:b :saidBy "Bob" .

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX : <http://example/ns#>

    SELECT ?book ?title
    WHERE {
      ?t rdf:subject ?book .
      ?t rdf:predicate dc:title .
      ?t rdf:object ?title .
      ?t :saidBy "Bob" .
    }

    book                               title
    <http://example.org/book/book1>    "SPARQL Tutorial"
    <http://example.org/book/book1>    "SPARQL Queries"
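Reification turns a statement into a node of its own, so that further statements (such as who said it) can be attached. The Python sketch below mimics the query on this slide over plain tuples. It assumes the claimed value is stored under rdf:object, the property the query matches on, and the `titles_said_by` helper is a hypothetical name.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
BOOK1 = "http://example.org/book/book1"

# Each blank node (_:a, _:b) stands for one reified statement about book1.
STATEMENTS: List[Triple] = [
    ("_:a", "rdf:subject", BOOK1), ("_:a", "rdf:predicate", "dc:title"),
    ("_:a", "rdf:object", "SPARQL"), ("_:a", ":saidBy", "Alice"),
    ("_:b", "rdf:subject", BOOK1), ("_:b", "rdf:predicate", "dc:title"),
    ("_:b", "rdf:object", "SPARQL Tutorial"),
    ("_:b", "rdf:object", "SPARQL Queries"),
    ("_:b", ":saidBy", "Bob"),
]


def titles_said_by(triples: List[Triple], speaker: str) -> List[Tuple[str, str]]:
    """Return (book, title) pairs from dc:title statements made by speaker."""
    by_node: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        by_node.setdefault(s, {}).setdefault(p, []).append(o)
    rows: List[Tuple[str, str]] = []
    for props in by_node.values():
        said = speaker in props.get(":saidBy", [])
        about_title = "dc:title" in props.get("rdf:predicate", [])
        if said and about_title:
            for book in props.get("rdf:subject", []):
                for title in props.get("rdf:object", []):
                    rows.append((book, title))
    return rows


print(titles_said_by(STATEMENTS, "Bob"))
```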

Page 35: A Little SPARQL in your Analytics

Ontologies

The Friend Of A Friend (FOAF) ontology

Project homepage: http://www.foaf-project.org/
Namespace: http://xmlns.com/foaf/0.1/
Typical prefix: foaf:
Documentation: http://xmlns.com/foaf/spec/

The Dublin Core (DC) ontology

Project homepage: http://dublincore.org/
Namespace: http://purl.org/dc/elements/1.1/ and http://purl.org/dc/terms/
Typical prefix: dc: and dcterms:
Documentation: http://dublincore.org/specifications/
Description: this is a lightweight RDFS vocabulary for describing generic metadata.

Page 36: A Little SPARQL in your Analytics

Ontologies cont…

VCARD

Project homepage: http://www.w3.org/TR/vcard-rdf/
Namespace: http://www.w3.org/2006/vcard/ns#
Typical prefix: vcard:
Documentation: http://www.w3.org/TR/vcard-rdf/

    <vcard:Individual rdf:about="http://example.com/me/corky">
      <vcard:fn>Corky Crystal</vcard:fn>
      <vcard:nickname>Corks</vcard:nickname>
      <vcard:hasEmail rdf:resource="mailto:[email protected]"/>
    </vcard:Individual>

Page 37: A Little SPARQL in your Analytics

Implementations

Aduna Sesame

Apache Jena

TDB

Page 38: A Little SPARQL in your Analytics

Aduna Sesame

◦ Multiple back-end relational database support
  ◦ MySQL
  ◦ PostgreSQL
  ◦ Oracle (provided by Oracle)
  ◦ Various third-party implementations
◦ Limited support for the property path expression
◦ Not great for massive graph retrievals
◦ Comes with a compliant REST interface
◦ Excellent Management Console

Page 39: A Little SPARQL in your Analytics

Sesame Back-ends

Ontotext GraphDB™

Ontotext GraphDB™ (formerly OWLIM) is a leading RDF Triplestore built on OWL (Web Ontology Language) standards, and fully compatible with the Sesame APIs. GraphDB handles massive loads, queries and OWL inferencing in real time. Ontotext offers three versions: GraphDB™ Lite, GraphDB™ Standard and GraphDB™ Enterprise.

CumulusRDF

CumulusRDF is an RDF store on a cloud-based architecture, fully compatible with the Sesame APIs. CumulusRDF provides a REST-based API with CRUD operations to manage RDF data. The current version uses Apache Cassandra as storage backend.

Systap Blazegraph™

Blazegraph™ (formerly known as Bigdata) is an enterprise graph database by Systap, LLC that provides a horizontally scaling, fully Sesame-compatible, storage and retrieval solution for very large volumes of RDF.

Page 40: A Little SPARQL in your Analytics

Apache Jena

◦ Multiple back-end relational database support
◦ Does support property paths
◦ OK on large graph retrievals

ARQ (SPARQL)
Query your RDF data using ARQ, a SPARQL 1.1 compliant engine. ARQ supports remote federated queries and free text search.

Fuseki
Expose your triples as a SPARQL end-point accessible over HTTP. Fuseki provides REST-style interaction with your RDF data.

Inference API
Reason over your data to expand and check the content of your triple store. Configure your own inference rules or use the built-in OWL and RDFS reasoners.

Note: Originally developed by Hewlett Packard

Page 41: A Little SPARQL in your Analytics

AllegroGraph

◦ Utilises Aduna Sesame Infrastructure
◦ Very Fast
◦ Supports 'free text'
◦ Supports Geospatial search

Page 42: A Little SPARQL in your Analytics

Oracle

Has two implementations:

◦ A relational back-end (part of the Spatial Pack)
  ◦ Not so good (not good for large graphs)
◦ Built on Oracle Big Data/NoSQL technology
◦ Utilises Apache Jena
◦ Gets Neil's thumbs up

"Oracle has nearly two decades of experience working with spatial and graph database technologies. We have combined this with cutting edge research from Oracle Labs to deliver advanced analytics for the NoSQL and Hadoop platform."

Oracle Big Data Spatial and Graph – Q&A with James Steiner, VP of product management

Melli Annamalai, PhD

Page 43: A Little SPARQL in your Analytics

Benchmark – Load Times

[Chart: series (Systap) Bigdata, (Sesame) Postgres, (Oracle) Spatial and (Sesame) File; one axis labelled "Time in Minutes"; tick values 20 to 220 and 0 to 80. The plotted values are not recoverable from the transcript.]

Page 44: A Little SPARQL in your Analytics

Benchmark – Retrieval Times

[Chart: series (Systap) Bigdata, (Sesame) Postgres, (Oracle) Spatial and (Sesame) File; axes labelled "Triples Retrieved – 1000s" and "Time in Minutes"; tick values 20 to 220 and 0 to 45. The plotted values are not recoverable from the transcript.]

Page 45: A Little SPARQL in your Analytics

My Stuff

Page 46: A Little SPARQL in your Analytics

Bongo's Goals

Focus on Tabular Data

◦ Develop an efficient RDF list structure
◦ Creation and Extraction
◦ Integrates with other RDF Implementations
◦ Property Path expression support
◦ Nice Thin and Thick GUIs with an accompanying Command Line Interface

[Diagram: a triple (Subject, Predicate, Object) annotated "Key – only for literal values, not resources".]

Page 47: A Little SPARQL in your Analytics

Retrieval Patterns

(○ = a specific value is supplied, Any = wildcard)

Subject   Predicate   Object   Description
○         Any         Any      Retrieve all the triples for a given subject
○         ○           Any      Retrieve all the triples for a given subject and predicate combination
Any       ○           Any      Retrieve all the triples for a given predicate
Any       Any         ○        Retrieve all the triples for a given object
Any       ○           ○        Retrieve all the triples for a specific object and predicate combination
○         Any         ○        Return all the triples for a given subject and object
Any       Any         Any      Return all the triples contained within a graph
○         ○           ○        Determine if a specific triple pattern exists within a graph
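All eight retrieval patterns are the same operation with different positions left as wildcards. The Python sketch below shows that with a linear scan over an in-memory list. The `retrieve` helper and the sample triples are hypothetical; a real triple store would typically answer each pattern from an index rather than by scanning.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
ANY = None  # wildcard marker: the position is not constrained

# Hypothetical sample graph for this sketch.
GRAPH: List[Triple] = [
    ("ex:fred", "p:GivenName", "fred"),
    ("ex:fred", "m:attending", "ex:m1"),
    ("ex:wilma", "p:GivenName", "wilma"),
    ("ex:wilma", "m:attending", "ex:m1"),
]


def retrieve(graph: List[Triple],
             subject: Optional[str] = ANY,
             predicate: Optional[str] = ANY,
             obj: Optional[str] = ANY) -> List[Triple]:
    """Return the triples matching the pattern; ANY matches every value."""
    return [
        t for t in graph
        if (subject is ANY or t[0] == subject)
        and (predicate is ANY or t[1] == predicate)
        and (obj is ANY or t[2] == obj)
    ]


print(retrieve(GRAPH, subject="ex:fred"))                    # ○ Any Any
print(retrieve(GRAPH, predicate="m:attending", obj="ex:m1")) # Any ○ ○
print(bool(retrieve(GRAPH, "ex:fred", "p:GivenName", "fred")))  # ○ ○ ○
```

With all three positions supplied the call acts as an existence check, and with none supplied it returns the whole graph.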

Page 48: A Little SPARQL in your Analytics

Bongo Architecture

[Diagram: the components shown are SPARQL, ARQ, the Bongo SPARQL Engine, a block labelled "Neil's Stuff", and the storage back-ends Cassandra, HBase and MapDB.]

Page 49: A Little SPARQL in your Analytics

Bongo – Thin Client

Page 50: A Little SPARQL in your Analytics

Bongo – Fat Client

Page 51: A Little SPARQL in your Analytics

Snail

Page 52: A Little SPARQL in your Analytics

Conclusion

This talk covered:

RDF
RDF Structures
The SPARQL language
SPARQL Analytical Tools
◦ Property Paths
◦ R Integration
Ontology and Ontological Support
Triple Store Implementations
My Stuff
◦ Bongo
◦ Snail

Note: Read Foundations of Semantic Web Technologies by Pascal Hitzler, Markus Krötzsch and Sebastian Rudolph

Page 53: A Little SPARQL in your Analytics

Easter Egg…

Christmas Halloween

Page 54: A Little SPARQL in your Analytics

Answer

25 dec = 31 oct (25 in decimal equals 31 in octal)
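The answer reads as a base conversion: 31 in octal is 3 × 8 + 1 = 25 in decimal. A two-line Python check using only built-in functions:

```python
christmas = int("25", 10)  # "25 dec": the digits 25 read in base 10
halloween = int("31", 8)   # "31 oct": the digits 31 read in base 8, 3*8 + 1

print(christmas == halloween)
print(oct(christmas))
```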