federated query processing universidad simón bolívar...

82
Dagstuhl Seminar “Federated Semantic Data Management”, 26-30 June 2017 Federated Query Processing over RDF Graphs Maria-Esther Vidal Fraunhofer IAIS, Germany Universidad Simón Bolívar, Venezuela 1

Upload: others

Post on 25-Aug-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Dagstuhl Seminar “Federated Semantic Data Management”, 26-30 June 2017

Federated Query Processing over RDF Graphs

Maria-Esther VidalFraunhofer IAIS, Germany

Universidad Simón Bolívar, Venezuela

1

Page 2: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Motivating Example

Vessels Vessel Observations

Ports

Vessels that have arrived at port of Funchal on 2017-06-21

2

Page 3: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Motivating ExampleDistributed Database

Vessels

Vessel Observations

Ports

mmsi country type

v1 ES tanker

v2 PT tanker

v3 US shipmmsi lat long date

v1 -26.3 56.9 2017-06-21

v1 30.8 70.8 2017-06-23

v2 -26.3 56.9 2017-06-23

v3 -26.3 56.9 2017-06-21

port lat long name

p1 -26.3 56.9 Funchal

p2 30.8 70.8 Lisbon

p3 40.9 89.8 Vigo

Interrelated databases distributed over a computer networkComponents are designed and tuned to work together.

3

Page 4: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Dimensions of Distributed Database Systems

Vessels

Vessel Observations

Ports

Greece

Canada World-Port Index USA

Distribution: Physical distribution of the data over multiple sites.

4

Page 5: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Dimensions of Distributed Database Systems

Vessels

Vessel Observations

Ports

Maritime Traffic

Evergreen Marine World-Port Index

Autonomy: Distribution of the control; degree to which individual systems can operate independently.

5

Page 6: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Dimensions of Distributed Database Systems

Vessels

Vessel Observations

Ports

Maritime Traffic

Evergreen Marine World-Port Index

CSV

JSON

RDF Heterogeneity: Different forms ranging from hardware, differences in network protocols, data models, query languages.

6

Page 7: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Dimensions of Distributed Database Systems [1]Distribution

AutonomyHeterogeneity

7[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer, 2011.

Page 8: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Agenda1) Distributed Data Management Systems

a) Client-Server Systemsb) Peer-to-Peer Systemsc) Federated Data Management Systems

2) Challenges in Federated Data Management3) Distributed SPARQL Systems 4) Conclusions

8

Page 9: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Distributed Data Management Systems

9

Page 10: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Client-Server SystemsDistribution

AutonomyHeterogeneity

Client-Server Systems: ● Clients run user

applications and interfaces.

● Servers run data management tasks, e.g., query processing and storage.

10

Page 11: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Client-Server Systems

Server Engine

Vessels

Vessel Observations

Ports

Client

Queries Query Answers

11

Page 12: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Peer-to-Peer SystemsDistribution

AutonomyHeterogeneity

Peer-to-Peer Systems: ● Massive distribution● May used different data

models.● Each system manages a

different dataset. ● Peers can

communicate.

12

Page 13: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Peer-to-Peer Systems

Vessels

Vessel Observations

Ports

Maritime Traffic

Evergreen Marine World-Port Index

Peer Peer

Peer

Peers communicate.

13

Page 14: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated Data Management (FDM) SystemsDistribution

AutonomyHeterogeneity

FDM Systems: ● Fully autonomous and

have no concept of cooperation.

● May use different data models.

● Each system manages a different database.

14

Page 15: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated Data Management (FDM) Systems

Vessels

Vessel Observations

Ports

Maritime Traffic

Evergreen Marine World-Port Index

Federated Engine

CSV RDF

JSON

15

Page 16: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges in Federated Data Management

16

Page 17: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

17

Given a query Q in a formal language, i.e., SPARQL● Identify the relevant data sources for Q (Source Selection)● Decompose Q into subqueries on relevant data sources (Query Decomposition)● Plan evaluation of subqueries against relevant data sources (Query Planning)● Merge data collected from relevant data sources (Query Execution)

Page 18: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDMVessels that have arrived at port of Funchal on 2017-06-21 SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

18

Page 19: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Example

SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

19

SELECT ?mmsi ?name ?flagWHERE { SERVICE C1 { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag.} SERVICE C2 { ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. } SERVICE C3 { ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }}

Page 20: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDMVessels that have arrived at port of Funchal on 2017-06-21 SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

20

Page 21: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDMVessels that have arrived at port of Funchal on 2017-06-21 SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

21

Page 22: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

22

Given a query Q in a formal language, i.e., SPARQL● Identify the relevant data sources for Q (Source Selection)● Decompose Q into subqueries on relevant data sources (Query Decomposition)● Plan evaluation of subqueries against relevant data sources (Query Planning)● Merge data collected from relevant data sources (Query Execution)

Page 23: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDMVessels that have arrived at port of Funchal on 2017-06-21 SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

23

Page 24: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

24

Given a query Q in a formal language, i.e., SPARQL● Identify the relevant data sources for Q (Source Selection)● Decompose Q into subqueries on relevant data sources (Query Decomposition)● Plan evaluation of subqueries against relevant data sources (Query Planning)● Merge data collected from relevant data sources (Query Execution)

Page 25: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi.?vessel bdo:flag ?flag.

?observ bdo:lat ?lat.?observ bdo:long ?long.?observ bdo:date "2017-06-21" .?observ bdo:mmsi ?mmsi.

?port bdo:lat ?lat.?port bdo:long ?long.?port bdo:name “Funchal”

25

Page 26: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

26

Given a query Q in a formal language, i.e., SPARQL● Identify the relevant data sources for Q (Source Selection)● Decompose Q into subqueries on relevant data sources (Query Decomposition)● Plan evaluation of subqueries against relevant data sources (Query Planning)● Merge data collected from relevant data sources (Query Execution)

Page 27: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi.?vessel bdo:flag ?flag.

?observ bdo:lat ?lat.?observ bdo:long ?long.?observ bdo:date "2017-06-21" .?observ bdo:mmsi ?mmsi.

?port bdo:lat ?lat.?port bdo:long ?long.?port bdo:name “Funchal”

CSV

RDF

JSON

MongoDB

MongoDB

Virtuoso

27

SELECT name, mmsi, flagFROM VESSELS

SELECT lat, long, mmsiFROM OBSERVATIONSWHERE date=”2017-06-21”

SELECT ?lat, ?long, ?mmsiWHERE {?port bdo:lat ?lat.?port bdo:long ?long.?port bdo:name “Funchal”}

Page 28: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Challenges for Query Processing in FDM

?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi.?vessel bdo:flag ?flag.

?observ bdo:lat ?lat.?observ bdo:long ?long.?observ bdo:date "2017-06-21" .?observ bdo:mmsi ?mmsi.

?port bdo:lat ?lat.?port bdo:long ?long.?port bdo:name “Funchal”

28

MappingintoSPARQL answers

MappingintoSPARQL answers

Operators consume tuples from children and pass output tuples to their parents.

Page 29: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Complexity of FDM Query Processing Tasks

NP-Hard Finding Optimal Plan for Distributed Data Sources [1]

Evaluation of SPARQL Queries is NP-Complete [2]

NP-Hard Finding Optimal Query Decomposition over Distributed RDF Data Sources [3]

29

[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer, 2011.[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3): 16:1-16:45 (2009)[3] Maria-Esther Vidal, Simón Castillo, Maribel Acosta, Gabriela Montoya, Guillermo Palma: On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. Trans. Large-Scale Data- and Knowledge-Centered Systems 25: 109-149 (2016)

Page 30: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Traditional Federated Query Processing

30

Source Selection and Query Decomposition

Query Optimizer

Query Execution Engine

Statistics Generator

CatalogTraditional Optimize-Then-ExecuteArchitecture

Query

Page 31: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Ideal Federated Management Systems

● Systems able to change their behavior by learning behavior of data providers.

● Receive information from the environment.● Use up-to-date information to change their behavior.● Keep iterating over time to adapt their behavior based

on the environment conditions.

31

Page 32: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Query Processing [4]

Adaptivity is needed:▪ Misestimated or missing statistics.▪ Unexpected correlations.▪ Unpredictable costs.▪ Dynamically changing data, workload, and source availability.▪ Changes at rates at which tuples arrive from sources

• Initial Delays.• Slow Delivery.• Bursty Arrivals.

32[4 ] Amol Deshpande, Zachary G. Ives, Vijayshankar Raman: Adaptive Query Processing. Foundations and Trends in Databases, (2007)

Page 33: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Federated Query Processing

33

Source Selection and Query Decomposition

Query Optimizer

Query Execution Engine

Statistics Generator

Catalog

Query

Re-optimize

Re-optimize the original plan on-the-fly according to the source conditions.

Page 34: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Levels of Adaptation

34

AdaptationAdaptation Levels

Source Selection

Query Execution

Page 35: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptation Levels

35

Source Selection: searching strategies to select the sources for answering a query according to the real-time source conditions:

● Schema changes● Source availability ● Data distribution changes

Page 36: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptation Levels

36

Query Execution: strategies to adapt query processing to the environment conditions. Adaptation can be at different granularities:

● Fine-grained adaptation of small processes, e.g., ○ Per tuple

● Coarse-grained is attempted for large processes, e.g.,○ Several queries are executed under certain

assumptions about the conditions of the environment

Page 37: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Query Processing Strategies

Adaptive Join Processing (Intra-Operator):● Adaptivity performed at tuple level. ● Query operators are able to adapt to the environment

changes, even in the context of a fixed query plan.Non-Pipelined Query Execution (Inter-Operator):● Sub-queries are re-scheduled based on uncertainties:

○ Execution cost of remote access, ○ Size of intermediate results, and ○ Unexpected delays.

37

Page 38: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Query Processing Strategies

Adaptive Join Processing (Intra-Operator):● Adaptivity performed at tuple level. ● Query operators are able to adapt to the environment

changes, even in the context of a fixed query plan.Non-Pipelined Query Execution (Inter-Operator):● Sub-queries are re-scheduled based on uncertainties:

○ Execution cost of remote access, ○ Size of intermediate results, and ○ Unexpected delays.

38

Page 39: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Distributed SPARQL Systems

39

Page 40: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

SPARQL Web-based Interfaces

40

SPARQL Endpoints: RESTful services that accept queries over HTTP written in the SPARQL query language and respecting the SPARQL protocol.

SELECT ?long WHERE { ?port <http://bigdataocean.eu/name> <http://bigdataocean.eu/Funchal> ?port <http://bigdataocean.eu/long> ?long.}

http://bigdataocean.eu/sparql?query=SELECT%20%3Flong%20WHERE%20%7B%20%3Fport%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fproperty%2Fname%3E%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fresource%2FFunchal%3E.%20%3Fport%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fproperty%2Flong%3E%20%3Flong%20%7D

Page 41: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

SPARQL 1.1 Federated Extension

41

SERVICE <http://bigdataocean.de/sparql> {SELECT ?long WHERE { ?port <http://bigdataocean.eu/name> <http://bigdataocean.eu/Funchal> ?port <http://bigdataocean.eu/long> ?long.}}

Q1

Q0

The result of evaluating Q0 corresponds to the result of evaluating Q1 in the RDF dataset accessible through the endpoint <http://bigdataocean.de/sparql>

Carlos Buil Aranda, Marcelo Arenas, Óscar Corcho, Axel Polleres: Federating queries in SPARQL 1.1: Syntax, semantics and evaluation. J. Web Sem. 18(1): 1-17 (2013)

Page 42: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

SPARQL 1.1 Federation Extension

42Carlos Buil Aranda, Marcelo Arenas, Óscar Corcho, Axel Polleres: Federating queries in SPARQL 1.1: Syntax, semantics and evaluation. J. Web Sem. 18(1): 1-17 (2013)

Page 43: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

SPARQL Web-based Interfaces

43

Triple Pattern Fragment:

Web-based access interface that allows for evaluating a triple pattern over an RDF dataset.

“Accessing a TPF interface is equivalent to accessing a SPARQL endpoint with queries that contain a single triple pattern only” [5]

[5] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem.( 2016)

Page 44: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Client-Server SPARQL Query Engines [5]

44

Server of Triple Pattern Fragments (TPFs)

Clients of Triple Pattern Fragments (TPFs)

TPF: Web-based interface to execute SPARQL triple patterns ?vessel bdo:mmsi ?mmsi

TPF Clients: Execute SPARQL queries over a TPF Server

SPARQL Query Q

[5] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem.( 2016)

Page 45: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Client for Triple Pattern Fragments

45

SPARQL Query Q

Query Optimizer

Adaptive Engine

Routing Policies

Input

Optimized plan

Results for QOutput

TPF Server

nLDE [6]: ● Optimization techniques

tailored for TPFs● Autonomous Eddies● Routing policies for

SPARQL

[6] Maribel Acosta, Maria-Esther Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. International Semantic Web Conference (2015)

Page 46: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Peer-to-Peer SPARQL Query Engines

46

SPARQL Query Q

Avalanche SPARQL Endpoint

Endpoint Directoryor Search Engine

Request Statistics

Distributed join

Avalanche [7]: ● Endpoints may

dereference data from other endpoints.

● On-line statistical information about relevant data sources.

● Queries are executed in a concurrent and distributed fashion.

[7] Cosmin Basca, Abraham Bernstein: Querying a messy web of data with Avalanche. J. Web Sem. (2014)

Page 47: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated SPARQL Query Engines

47

SPARQL endpoints: Web-access interfaces that allow for querying RDF data following SPARQL protocol.

Page 48: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated SPARQL Query Engines

48

Distribution

AutonomyHeterogeneity

Page 49: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated SPARQL Query Engines

49

Federated SPARQL Query Engine

ANAPSID[8]

SPLENDID [9]

Distribution

AutonomyHeterogeneity

[10]

[16]

Page 50: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Federated SPARQL Query Engines

50

Federated SPARQL Query Engine

Distribution

AutonomyHeterogeneity

LILAC[11] FEDRA[12]Fed-DESATUR[3]

MULDER[14]

Extensions

DAW[13]HIBISCUS[15]

ANAPSID[8]

SPLENDID [9]

[10]

[16]

Page 51: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection

51

Fine-Grained Adaptivity

ANAPSID SPLENDID

Coarse-Grained Adaptivity

No Adaptivity

Fed-DESATURMULDERDAW

HIBISCUS

LILAC FEDRA

Page 52: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection

52

Fine-Grained Adaptivity

Coarse-Grained Adaptivity

No Adaptivity

MULDER

Page 53: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection-FedX

SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

53

For triple pattern, (s p o): A query is sent to the SPARQL endpoints for identifying relevant sources

Page 54: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection-FedX

SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

54

Exclusive groups:Subqueries are executed by only one source

sq1sq2

sq3

sq4

sq5

Page 55: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection-FedX

55

sq1

sq2

sq3

sq4 sq5

Exclusive groups are executed first in the plan

Page 56: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection-Mulder

56

Sources are described in terms of Classes and properties.

Page 57: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Source Selection-Mulder

57

Sources are described in terms of Classes and properties.

Source descriptions are exploited during query decomposition.

SELECT ?mmsi ?name ?flagWHERE { ?vessel bdo:name ?name. ?vessel bdo:mmsi ?mmsi. ?vessel bdo:flag ?flag. ?observ bdo:lat ?lat. ?observ bdo:long ?long. ?observ bdo:date "2017-06-21" . ?observ bdo:mmsi ?mmsi. ?port bdo:lat ?lat. ?port bdo:long ?long. ?port bdo:name “Funchal” }

Page 58: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

?port bdo:lat ?lat .?port bdo:long ?long . ?port bdo:name "Funchal"

?vessel bdo:name ?name .?vessel bdo:mmsi ?mmsi .?vessel bdo:flag ?flag .

Query Planning-Mulder

?observation bdo:lat ?lat .?observation bdo:long ?long .?observation bdo:date "2017-06-21" .?observation bdo:mmsi ?vessel .

Page 59: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

59

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 60: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

60

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 61: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

61

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Adaptivity is a costly task

Page 62: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Query Execution

62

Fine-Grained Adaptivity

ANAPSIDSPLENDID

No Adaptivity

Page 63: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptivity During Query Execution

63

Fine-Grained Adaptivity

ANAPSIDSPLENDID

No Adaptivity

Fed-DESATURMULDER

DAW HIBISCUS

LILAC FEDRA

Page 64: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Intra-Operator Adaptivity

Intra-Operator Strategies:

● SPARQL operators able to detect when sources become blocked or data traffic is bursty

● Opportunistically produce results as quickly as data arrives from the sources

● Results are produced incrementally

64

JOIN

Page 65: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Inter-Operator AdaptivityInter-Operator Strategies:

● Produce an answer as soon as it is computed

● Can keep producing intermediate results even when data a source becomes blocked

65

JOIN JOIN

JOIN

Page 66: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Inter-Operator AdaptivityInter-Operator Strategies:

● Produce an answer as soon as it is computed

● Can keep producing intermediate results even when data a source becomes blocked

66

JOIN

JOIN

JOIN

Page 67: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

67

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 68: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

68

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 69: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

69

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 70: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

70

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Page 71: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Empirical Evaluation

71

FedBench Benchmark:● Ten RDF datasets (314K to 44M)

25 Queries:● Cross Domain (CD)● Linked Data (LD)● Life Science (LS)

Ten Complex (C) queries

Completeness versus Execution Time

Adaptivity is a costly task

Page 72: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Adaptive Query Processing

72

No Inter-Operator Strategy

Inter-Operator Strategy

Delays simulated with a Gamma distribution ( =1, =0.3)

Twenty-five queries against DBpedia; basic graph patterns with up to 15 triple patterns. Adaptive query processing strategies are able to speed up query execution in presence of delays

[6] M. Acosta, M.E. Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. ISWC 2015

Page 73: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Data Replication-Aware

73

Replication-Aware

ANAPSID

SPLENDID

No Replication

Fed-DESATURMULDER HIBISCUS

LILAC

FEDRA

DAW

Page 74: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Experimental Study

74

Each dataset is accessible through 10 endpoints. Each endpoint has data for 100 random queries.The same fragment is replicated in at least 3 endpointsWatDiv datasets: 50,000 queriesRest of the datasets: 10,000 queries

Diseasome: 72,445 SWDF: 198,797DBpedia Geo: 1,900,004LinkedMDB: 3,579,610WatDiv1: 104,532WatDiv100: 10,934,518

[11] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Decomposing federated queries in presence of replicated fragments. J. Web Sem. (2017)

Page 75: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Experimental Study

75

Each dataset is accessible through 10 endpoints. Each endpoint has data for 100 random queries.The same fragment is replicated in at least 3 endpointsWatDiv datasets: 50,000 queriesRest of the datasets: 10,000 queries

Diseasome: 72,445 SWDF: 198,797DBpedia Geo: 1,900,004LinkedMDB: 3,579,610WatDiv1: 104,532WatDiv100: 10,934,518

Page 76: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Experimental Study

76

Consider replication allows for producing complete answers while the execution time is reduced

Page 77: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Open Issues

77

Distribution

AutonomyHeterogeneity

Distribution

AutonomyHeterogeneity

Diverse types of heterogeneity:Data models, query capabilities,availability.

Page 78: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Open Issues

● Formal Definition

● Access Control● Security ● Privacy

78

● Data Evolution● Synchronization ● Big Data

Page 79: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

Conclusions

79

Page 80: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

References[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer (2011)[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3): 16:1-16:45 (2009)[3] Maria-Esther Vidal, Simón Castillo, Maribel Acosta, Gabriela Montoya, Guillermo Palma: On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. Trans. Large-Scale Data- and Knowledge-Centered Systems 25: 109-149 (2016)[4 ] Amol Deshpande, Zachary G. Ives, Vijayshankar Raman: Adaptive Query Processing. Foundations and Trends in Databases, (2007)[5] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem.( 2016)[6] Maribel Acosta, Maria-Esther Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. International Semantic Web Conference (2015)

80

Page 81: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

References[7] Cosmin Basca, Abraham Bernstein: Querying a messy web of data with Avalanche. J. Web Sem. (2014)[8] Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, Edna Ruckhaus: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. International Semantic Web Conference (2011)[9] Olaf Görlitz, Steffen Staab: SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. COLD (2011)[10] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, Michael Schmidt: FedX: Optimization Techniques for Federated Query Processing on Linked Data. International Semantic Web Conference (2011)[11] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Decomposing federated queries in presence of replicated fragments. J. Web Sem. (2017)[12] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Federated SPARQL Queries Processing with Replicated Fragments. International Semantic Web Conference (2015)

81

Page 82: Federated Query Processing Universidad Simón Bolívar ...materials.dagstuhl.de/files/17/17262/17262.Maria... · Complexity of FDM Query Processing Tasks NP-Hard Finding Optimal Plan

References[13] Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena F. Deus, Manfred Hauswirth: DAW: Duplicate-AWare Federated Query Processing over the Web of Data. International Semantic Web Conference (2013)[14] Kemele M. Endris, Mikhail Galkin, Ioanna Lytra, Mohamed Nadjib Mami, Maria-Esther Vidal, Sören Auer: MULDER: Querying the Linked Data Web by Bridging RDF Molecule Templates. International Conference on Database and Expert Systems Applications (2017)[15] Muhammad Saleem, Axel-Cyrille Ngonga Ngomo: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation. Extended Semantic Web Conference (2014)[16] SemaGrow: Optimizing federated SPARQL queries Angelos Charalambidis, Antonis Troumpoukis and Stasinos Konstantopoulos In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015)

82