balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
DESCRIPTION
Presentation for the 5th International Workshop on Data Engineering meets the Semantic Web (DESWeb), in conjunction with ICDE 2014, Chicago, IL, USA, March 31, 2014, held by Kai Schlegel
TRANSCRIPT
![Page 1: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/1.jpg)
DESWeb 2014, ICDE 2014, Chicago IL, USA, March 31, 2014
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
Kai Schlegel, Florian Stegmaier, Sebastian Bayerl, Michael Granitzer, Harald Kosch
![Page 2: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/2.jpg)
2
Outline
Motivation
SPARQL Rewriting & Federation
Intermediate Results
Supported by the European Commission under the Seventh Framework Program
![Page 3: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/3.jpg)
3
“Linked Data is the heart of Semantic Web”
- W3C Semantic Web Group
![Page 4: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/4.jpg)
4
Huge Potential!
![Page 5: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/5.jpg)
5
Developing with Linked Open Data
![Page 6: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/6.jpg)
6
Motivation
• Easy access to Linked Data
  • Query Linked Open Data with SPARQL
  • Plethora of tools available
• Problems:
  • Business oriented
  • Complex setup
  • Maintenance
  • „Paper-only“
  • Not developer friendly
• Simple and „instant“ SPARQL Query Federation (-as-a-Service)
Nothing-as-a-Service
![Page 7: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/7.jpg)
7
Querying LOD
• How to get information about the German city „Passau“?
• Problem: LOD is not a single database!
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the SPARQL query is sent to de.dbpedia.org, which returns RDF]
de.dbpedia.org: Relations, Coordinates, Leader, etc.
What about the population?
![Page 8: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/8.jpg)
8
Distributed Querying!
• Problem: Selection of appropriate endpoints
• Send query to some endpoints and aggregate the results?
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the query is sent to de.dbpedia.org and linkedgeodata.org; linkedgeodata.org answers „WHAT ?“]
![Page 9: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/9.jpg)
9
Misunderstanding: Co-Referencing
• Problem: Different identifiers for the same semantic concept
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the query is sent to de.dbpedia.org and linkedgeodata.org; linkedgeodata.org answers „WHAT ?“]
Known problem in linguistics:
“It’s a spud!”
“What?”
“I mean potato!”
Co-Referencing: Multiple expressions refer to the same thing.
![Page 10: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/10.jpg)
10
Problem = Solution?
SPARQL-based crawling of co-reference information
Exploit co-reference information for
• accomplishing immediate SPARQL rewriting
• performing endpoint selection
• executing automatic query federation
Basic idea: Focusing distributed co-reference information
Main principle: Semantic entities over identifiers!
![Page 11: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/11.jpg)
11
Components
balloon toolsuite
![Page 12: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/12.jpg)
12
balloon Overflight
• SPARQL-based crawling of LOD endpoints
  • Query: Ask for subjects and objects that are related by a specific predicate
• Simplified global view on
  • Equivalence: owl:sameAs, skos:exactMatch, coref:coreferenceData, ...
• Graph database Neo4j
  • Equivalence cluster: Multiple synonym URIs representing the same semantic entity, including provenance
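The idea of an equivalence cluster can be illustrated with a small union-find sketch. This is not the balloon implementation (the slides state that clusters are kept in Neo4j); the `(subject, object, source)` tuple shape, the function name `build_clusters`, and the sample statements are assumptions made for illustration only.

```python
from collections import defaultdict


def build_clusters(statements):
    """Group URIs linked by equivalence statements into clusters.

    `statements` holds (subject, object, source) tuples, a hypothetical
    shape for crawled co-reference links. Returns a list of
    (set_of_uris, set_of_sources) pairs, one per cluster.
    """
    statements = list(statements)
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        # Path halving keeps lookups cheap on long sameAs chains.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    # Merge the two sides of every equivalence statement.
    for subj, obj, _source in statements:
        root_s, root_o = find(subj), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o

    members = defaultdict(set)
    for uri in list(parent):
        members[find(uri)].add(uri)

    # Provenance: which sources contributed statements to each cluster.
    provenance = defaultdict(set)
    for subj, _obj, source in statements:
        provenance[find(subj)].add(source)

    return [(members[root], provenance[root]) for root in members]


# Illustrative statements (assumed, not taken from the balloon dataset).
statements = [
    ("http://de.dbpedia.org/resource/Passau",
     "http://linkedgeodata.org/triplify/node240057351",
     "linkedgeodata.org"),
    ("http://linkedgeodata.org/triplify/node240057351",
     "http://rdf.freebase.com/ns/m.01h5td",
     "linkedgeodata.org"),
]
clusters = build_clusters(statements)
```

Because equivalence is treated as transitive here, two chained statements end up in one cluster of three URIs, which is the property the later rewriting steps rely on.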
![Page 13: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/13.jpg)
13
balloon Fusion
SPARQL Federation setup using co-reference information
SPARQL Transformation for each BGP (basic graph pattern)
1. Determine synonym URIs
2. Select suitable endpoints
3. Adapt sub-queries to endpoints
4. Federated querying
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
![Page 14: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/14.jpg)
14
1. Determine synonym URIs
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
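A minimal sketch of the synonym lookup, assuming the equivalence clusters are available as plain Python sets. The `CLUSTERS` list, the regular expression, and the function name `synonyms_for_query` are hypothetical; a real implementation would parse the query properly instead of using a regex, which can misread `<` and `>` comparisons inside FILTER expressions.

```python
import re

# Hypothetical in-memory stand-in for the equivalence-cluster store.
CLUSTERS = [
    {
        "http://de.dbpedia.org/resource/Passau",
        "http://rdf.freebase.com/ns/m.01h5td",
        "http://linkedgeodata.org/triplify/node240057351",
    },
]

# Matches IRIs written as <...>; a simplification of the SPARQL grammar.
URI_PATTERN = re.compile(r"<([^<>\s]+)>")


def synonyms_for_query(query, clusters=CLUSTERS):
    """Map every URI in the query to the sorted URIs of its cluster.

    URIs that belong to no known cluster map to themselves.
    """
    result = {}
    for uri in URI_PATTERN.findall(query):
        cluster = next((c for c in clusters if uri in c), {uri})
        result[uri] = sorted(cluster)
    return result


query = "SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }"
synonyms = synonyms_for_query(query)
```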
![Page 15: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/15.jpg)
15
2. Select suitable endpoints
• Provenance-based selection (PBS)
  • Endpoints which are involved in cluster composition
• Namespace-based selection (NBS)
  • Prefix and namespace matching of synonym URIs
Summarized: origin of co-reference information and origin of synonym URIs
![Page 16: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/16.jpg)
16
2. Select suitable endpoints (2)
Assumption:
• Provenance information only contains „linkedgeodata.org“ as co-reference origin
• Namespaces for Freebase and DBpedia available (datahub.io)
PBS: Linked-Geo-Data endpoint
NBS: DBpedia endpoint
NBS: Freebase endpoint
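The two selection strategies can be sketched as follows. This is a simplified illustration, not the balloon code: the function names, the lookup tables `SOURCE_TO_ENDPOINT` and `NAMESPACE_TO_ENDPOINT`, and the namespace prefixes are assumptions. The endpoint URLs are the ones that appear in the federated query later in the deck.

```python
# Hypothetical lookup tables; in the deck this metadata comes from the
# cluster provenance and from datahub.io.
SOURCE_TO_ENDPOINT = {
    "linkedgeodata.org": "http://linkedgeodata.org/sparql/",
}
NAMESPACE_TO_ENDPOINT = {
    "http://de.dbpedia.org/resource/": "http://dbpedia.org/sparql",
    "http://rdf.freebase.com/ns/": "http://www.freebase.com/base/sparql",
}


def provenance_based_selection(cluster_sources, source_to_endpoint):
    """PBS: endpoints that contributed co-reference statements to the cluster."""
    return {
        source_to_endpoint[source]
        for source in cluster_sources
        if source in source_to_endpoint
    }


def namespace_based_selection(synonym_uris, namespace_to_endpoint):
    """NBS: endpoints whose registered namespace prefixes a synonym URI.

    Returns a mapping from endpoint to the synonym URIs it is expected to know.
    """
    selected = {}
    for uri in synonym_uris:
        for namespace, endpoint in namespace_to_endpoint.items():
            if uri.startswith(namespace):
                selected.setdefault(endpoint, []).append(uri)
    return selected


synonym_uris = [
    "http://de.dbpedia.org/resource/Passau",
    "http://rdf.freebase.com/ns/m.01h5td",
    "http://linkedgeodata.org/triplify/node240057351",
]
pbs = provenance_based_selection({"linkedgeodata.org"}, SOURCE_TO_ENDPOINT)
nbs = namespace_based_selection(synonym_uris, NAMESPACE_TO_ENDPOINT)
```

Under the stated assumption, PBS yields only the Linked-Geo-Data endpoint, while NBS yields the DBpedia and Freebase endpoints, each paired with the one synonym URI from its own namespace.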
![Page 17: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/17.jpg)
17
3. Adapt sub-queries to endpoints
Original query:
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
NBS: Freebase endpoint
SELECT ?p ?o WHERE { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
NBS: DBpedia endpoint
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
PBS: Linked-Geo-Data endpoint
SELECT ?p ?o WHERE { { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. } UNION { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. } UNION { <http://de.dbpedia.org/resource/Passau> ?p ?o. } }
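The adaptation step can be sketched with plain string substitution. This is an illustrative simplification under stated assumptions: the function names are hypothetical, and real rewriting would operate on a parsed query rather than on text. An NBS endpoint receives the query with the one synonym from its own namespace; a PBS endpoint receives a UNION over all synonyms.

```python
def adapt_for_namespace_endpoint(query, original_uri, endpoint_uri):
    """NBS endpoint: swap the original URI for the synonym this endpoint knows."""
    return query.replace(f"<{original_uri}>", f"<{endpoint_uri}>")


def adapt_for_provenance_endpoint(projection, pattern, original_uri, synonym_uris):
    """PBS endpoint: build one UNION branch of the pattern per synonym URI."""
    branches = [
        "{ " + pattern.replace(f"<{original_uri}>", f"<{uri}>") + " }"
        for uri in synonym_uris
    ]
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(branches) + " }"


original = "http://de.dbpedia.org/resource/Passau"
pattern = f"<{original}> ?p ?o."
query = f"SELECT ?p ?o WHERE {{ {pattern} }}"

freebase_query = adapt_for_namespace_endpoint(
    query, original, "http://rdf.freebase.com/ns/m.01h5td"
)
union_query = adapt_for_provenance_endpoint(
    "?p ?o",
    pattern,
    original,
    [
        "http://rdf.freebase.com/ns/m.01h5td",
        "http://linkedgeodata.org/triplify/node240057351",
        original,
    ],
)
```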
![Page 18: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/18.jpg)
18
4. Federated Querying
• W3C SPARQL 1.1 Federated Query Extension (SERVICE)
  • (Partial) query can be executed against a remote SPARQL endpoint
  • Distributed sub-queries don’t contain SPARQL 1.1 features
SELECT ?p ?o WHERE { { SERVICE <http://dbpedia.org/sparql> { <http://de.dbpedia.org/resource/Passau> ?p ?o. } } UNION { SERVICE <http://www.freebase.com/base/sparql> { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. } } UNION { SERVICE <http://linkedgeodata.org/sparql/> { { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. } UNION { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. } UNION { <http://de.dbpedia.org/resource/Passau> ?p ?o. } } } }
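As a rough sketch of how such a federated query could be assembled, the helper below wraps each endpoint's pattern in its own SERVICE block and joins the blocks with UNION. The function name `federated_query` and its argument shape are assumptions, not the balloon API. Each SERVICE block is enclosed in its own group, since SPARQL only allows UNION between group graph patterns.

```python
def federated_query(projection, service_patterns):
    """Combine per-endpoint patterns into one SPARQL 1.1 federated query.

    `service_patterns` is a list of (endpoint_url, graph_pattern) pairs.
    """
    blocks = [
        f"{{ SERVICE <{endpoint}> {{ {pattern} }} }}"
        for endpoint, pattern in service_patterns
    ]
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(blocks) + " }"


passau = "<http://de.dbpedia.org/resource/Passau> ?p ?o."
freebase = "<http://rdf.freebase.com/ns/m.01h5td> ?p ?o."
lgd = "<http://linkedgeodata.org/triplify/node240057351> ?p ?o."

fed = federated_query(
    "?p ?o",
    [
        ("http://dbpedia.org/sparql", passau),
        ("http://www.freebase.com/base/sparql", freebase),
        (
            "http://linkedgeodata.org/sparql/",
            f"{{ {freebase} }} UNION {{ {lgd} }} UNION {{ {passau} }}",
        ),
    ],
)
```

The resulting string can be sent to any engine that supports the SPARQL 1.1 SERVICE keyword; the remote endpoints themselves only see plain patterns.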
![Page 19: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/19.jpg)
19
Optimizations (ongoing)
• Endpoint status check
  • Check routine in terms of availability and latency
• Minimize sub-queries
  • Group sub-queries with common endpoint
  • Push join to endpoint
• SPARQL features
  • Condense PBS UNION construct of synonym URIs
  • SPARQL 1.1 VALUES or FILTER with IN operator
  • Not well implemented in Linked Data endpoints
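Two of these optimizations can be sketched briefly: condensing the UNION of synonym URIs into a single SPARQL 1.1 VALUES clause, and grouping sub-queries that target the same endpoint. Both helpers, and the variable name `?s`, are hypothetical illustrations rather than the planned balloon implementation.

```python
from collections import defaultdict


def values_pattern(variable, synonym_uris, triple_template):
    """Replace a UNION over synonyms with one VALUES clause.

    `triple_template` must use `variable` where the URI would appear.
    """
    uris = " ".join(f"<{uri}>" for uri in synonym_uris)
    return f"VALUES {variable} {{ {uris} }} {triple_template}"


def group_by_endpoint(subqueries):
    """Collect the patterns of all sub-queries that share an endpoint."""
    grouped = defaultdict(list)
    for endpoint, pattern in subqueries:
        grouped[endpoint].append(pattern)
    return dict(grouped)


condensed = values_pattern(
    "?s",
    [
        "http://rdf.freebase.com/ns/m.01h5td",
        "http://de.dbpedia.org/resource/Passau",
    ],
    "?s ?p ?o .",
)
grouped = group_by_endpoint(
    [
        ("http://dbpedia.org/sparql", "?city ?p ?o ."),
        ("http://linkedgeodata.org/sparql/", "?node ?p ?o ."),
        ("http://dbpedia.org/sparql", "?city ?q ?r ."),
    ]
)
```

As the slide notes, VALUES requires SPARQL 1.1 support on the remote endpoint, so the UNION form remains the safer fallback.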
![Page 20: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/20.jpg)
20
balloon Overflight Results
![Page 21: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/21.jpg)
21
Results from a sounding balloon
![Page 22: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/22.jpg)
22
balloon toolsuite
![Page 23: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/23.jpg)
23
Statistics
• Datahub.io: Linked Open Data Cloud catalog
  • 337 datasets in total
  • 237 expose a SPARQL endpoint
  • 112 successfully queried for co-reference information
• Balloon dataset (first run)
  • 17.6M co-reference statements
  • 22.4M distinct URIs
  • 8.4M equivalence clusters (~ 2.68 identifiers per cluster)
• Pending analysis
  • Distribution of cluster sizes, number of different hosts per cluster
  • Main representative per cluster & false friends
![Page 24: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/24.jpg)
24
Open Source:
• Demo, information and sources available (MIT License)
• X as a Service
  • SPARQL Rewriting (HTTP API)
  • Query Federation (SPARQL)
http://schlegel.github.io/balloon
![Page 25: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/25.jpg)
25
Summary:
• SPARQL-based crawling of distributed co-reference information
• Exploit co-reference information for SPARQL federation
Single Point of Access
![Page 26: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.vdocuments.mx/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/26.jpg)
26
Any questions?
“Research is formalized curiosity. It is poking and prying with a purpose.” - Zora Neale Hurston