Dagstuhl Seminar “Federated Semantic Data Management”, 26-30 June 2017
Federated Query Processing over RDF Graphs
Maria-Esther Vidal
Fraunhofer IAIS, Germany
Universidad Simón Bolívar, Venezuela
1
Motivating Example
[Figure: Vessels, Vessel Observations, Ports]
Vessels that have arrived at port of Funchal on 2017-06-21
2
Motivating Example: Distributed Database
Vessels
mmsi  country  type
v1    ES       tanker
v2    PT       tanker
v3    US       ship
Vessel Observations
mmsi  lat    long  date
v1    -26.3  56.9  2017-06-21
v1    30.8   70.8  2017-06-23
v2    -26.3  56.9  2017-06-23
v3    -26.3  56.9  2017-06-21
Ports
port  lat    long  name
p1    -26.3  56.9  Funchal
p2    30.8   70.8  Lisbon
p3    40.9   89.8  Vigo
Interrelated databases distributed over a computer network.
Components are designed and tuned to work together.
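The motivating question (vessels that arrived at the port of Funchal on 2017-06-21) can be answered by joining the three tables on position and mmsi. The following is a minimal sketch over the rows shown on the slide; the class names and the function `vessels_at_port` are illustrative assumptions, not part of the talk.

```python
from typing import NamedTuple


class Vessel(NamedTuple):
    mmsi: str
    country: str
    type: str


class Observation(NamedTuple):
    mmsi: str
    lat: float
    long: float
    date: str


class Port(NamedTuple):
    port: str
    lat: float
    long: float
    name: str


# Rows copied from the three tables of the motivating example.
VESSELS = [Vessel("v1", "ES", "tanker"), Vessel("v2", "PT", "tanker"),
           Vessel("v3", "US", "ship")]
OBSERVATIONS = [Observation("v1", -26.3, 56.9, "2017-06-21"),
                Observation("v1", 30.8, 70.8, "2017-06-23"),
                Observation("v2", -26.3, 56.9, "2017-06-23"),
                Observation("v3", -26.3, 56.9, "2017-06-21")]
PORTS = [Port("p1", -26.3, 56.9, "Funchal"), Port("p2", 30.8, 70.8, "Lisbon"),
         Port("p3", 40.9, 89.8, "Vigo")]


def vessels_at_port(port_name: str, date: str) -> list[Vessel]:
    """Join Ports, Vessel Observations and Vessels on (lat, long) and mmsi."""
    positions = {(p.lat, p.long) for p in PORTS if p.name == port_name}
    seen = {o.mmsi for o in OBSERVATIONS
            if o.date == date and (o.lat, o.long) in positions}
    return [v for v in VESSELS if v.mmsi in seen]


arrived = vessels_at_port("Funchal", "2017-06-21")
print([v.mmsi for v in arrived])
```

In a distributed setting each of the three lists would live at a different site, which is what makes the join non-trivial.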
3
Dimensions of Distributed Database Systems
[Figure: Vessels, Vessel Observations, Ports; Greece, Canada, World-Port Index, USA]
Distribution: Physical distribution of the data over multiple sites.
4
Dimensions of Distributed Database Systems
[Figure: Vessels, Vessel Observations, Ports; Maritime Traffic, Evergreen Marine, World-Port Index]
Autonomy: Distribution of the control; degree to which individual systems can operate independently.
5
Dimensions of Distributed Database Systems
[Figure: Vessels, Vessel Observations, Ports; Maritime Traffic, Evergreen Marine, World-Port Index; CSV, JSON, RDF]
Heterogeneity: Takes different forms, ranging from differences in hardware and network protocols to differences in data models and query languages.
6
Dimensions of Distributed Database Systems [1]
● Distribution
● Autonomy
● Heterogeneity
7
[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer, 2011.
Agenda
1) Distributed Data Management Systems
   a) Client-Server Systems
   b) Peer-to-Peer Systems
   c) Federated Data Management Systems
2) Challenges in Federated Data Management
3) Distributed SPARQL Systems
4) Conclusions
8
Distributed Data Management Systems
9
Client-Server Systems
[Figure: dimensions Distribution, Autonomy, Heterogeneity]
Client-Server Systems:
● Clients run user applications and interfaces.
● Servers run data management tasks, e.g., query processing and storage.
10
Client-Server Systems
[Figure: Client; Server Engine over Vessels, Vessel Observations, Ports; Queries; Query Answers]
11
Peer-to-Peer Systems
[Figure: dimensions Distribution, Autonomy, Heterogeneity]
Peer-to-Peer Systems:
● Massive distribution.
● May use different data models.
● Each system manages a different dataset.
● Peers can communicate.
12
Peer-to-Peer Systems
[Figure: three peers over Vessels, Vessel Observations, Ports; Maritime Traffic, Evergreen Marine, World-Port Index]
Peers communicate.
13
Federated Data Management (FDM) Systems
[Figure: dimensions Distribution, Autonomy, Heterogeneity]
FDM Systems:
● Fully autonomous and have no concept of cooperation.
● May use different data models.
● Each system manages a different database.
14
Federated Data Management (FDM) Systems
[Figure: Federated Engine over Vessels, Vessel Observations, Ports; Maritime Traffic, Evergreen Marine, World-Port Index; CSV, RDF, JSON]
15
Challenges in Federated Data Management
16
Challenges for Query Processing in FDM
17
Given a query Q in a formal language, e.g., SPARQL:
● Identify the relevant data sources for Q (Source Selection)
● Decompose Q into subqueries on relevant data sources (Query Decomposition)
● Plan evaluation of subqueries against relevant data sources (Query Planning)
● Merge data collected from relevant data sources (Query Execution)
Challenges for Query Processing in FDM
Vessels that have arrived at port of Funchal on 2017-06-21
SELECT ?mmsi ?name ?flag
WHERE {
  ?vessel bdo:name ?name .
  ?vessel bdo:mmsi ?mmsi .
  ?vessel bdo:flag ?flag .
  ?observ bdo:lat ?lat .
  ?observ bdo:long ?long .
  ?observ bdo:date "2017-06-21" .
  ?observ bdo:mmsi ?mmsi .
  ?port bdo:lat ?lat .
  ?port bdo:long ?long .
  ?port bdo:name "Funchal"
}
18
Example
19
The query rewritten with one SERVICE clause per source (C1, C2, C3):
SELECT ?mmsi ?name ?flag
WHERE {
  SERVICE C1 {
    ?vessel bdo:name ?name .
    ?vessel bdo:mmsi ?mmsi .
    ?vessel bdo:flag ?flag .
  }
  SERVICE C2 {
    ?observ bdo:lat ?lat .
    ?observ bdo:long ?long .
    ?observ bdo:date "2017-06-21" .
    ?observ bdo:mmsi ?mmsi .
  }
  SERVICE C3 {
    ?port bdo:lat ?lat .
    ?port bdo:long ?long .
    ?port bdo:name "Funchal"
  }
}
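The rewriting into SERVICE clauses can be sketched mechanically: group the triple patterns by the source that answers them and render one SERVICE block per group. The sketch below assumes the assignment of subject variables to sources is already known; `SOURCE_OF`, `decompose` and `to_service_query` are hypothetical names, and C1, C2, C3 stand in for endpoint IRIs as on the slide.

```python
from collections import defaultdict

# Triple patterns of the motivating query.
QUERY_PATTERNS = [
    ("?vessel", "bdo:name", "?name"),
    ("?vessel", "bdo:mmsi", "?mmsi"),
    ("?vessel", "bdo:flag", "?flag"),
    ("?observ", "bdo:lat", "?lat"),
    ("?observ", "bdo:long", "?long"),
    ("?observ", "bdo:date", '"2017-06-21"'),
    ("?observ", "bdo:mmsi", "?mmsi"),
    ("?port", "bdo:lat", "?lat"),
    ("?port", "bdo:long", "?long"),
    ("?port", "bdo:name", '"Funchal"'),
]

# Assumption: source selection already decided which source answers
# the patterns of each subject variable.
SOURCE_OF = {"?vessel": "C1", "?observ": "C2", "?port": "C3"}


def decompose(patterns, source_of):
    """Group triple patterns by the source assigned to their subject."""
    groups = defaultdict(list)
    for s, p, o in patterns:
        groups[source_of[s]].append((s, p, o))
    return dict(groups)


def to_service_query(projection, groups):
    """Render one SERVICE block per source."""
    lines = ["SELECT " + " ".join(projection), "WHERE {"]
    for source, triples in groups.items():
        lines.append(f"  SERVICE {source} {{")
        lines.extend(f"    {s} {p} {o} ." for s, p, o in triples)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


groups = decompose(QUERY_PATTERNS, SOURCE_OF)
federated_query = to_service_query(["?mmsi", "?name", "?flag"], groups)
print(federated_query)
```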
Challenges for Query Processing in FDM
Subquery over ?vessel:
  ?vessel bdo:name ?name .
  ?vessel bdo:mmsi ?mmsi .
  ?vessel bdo:flag ?flag .
Subquery over ?observ:
  ?observ bdo:lat ?lat .
  ?observ bdo:long ?long .
  ?observ bdo:date "2017-06-21" .
  ?observ bdo:mmsi ?mmsi .
Subquery over ?port:
  ?port bdo:lat ?lat .
  ?port bdo:long ?long .
  ?port bdo:name "Funchal"
25
Challenges for Query Processing in FDM
27
[Figure: source formats CSV, RDF, JSON; source engines MongoDB, MongoDB, Virtuoso]
Each subquery is translated into the query language of its source:
● Subquery over ?vessel:
  SELECT name, mmsi, flag
  FROM VESSELS
● Subquery over ?observ:
  SELECT lat, long, mmsi
  FROM OBSERVATIONS
  WHERE date = '2017-06-21'
● Subquery over ?port:
  SELECT ?lat ?long
  WHERE {
    ?port bdo:lat ?lat .
    ?port bdo:long ?long .
    ?port bdo:name "Funchal"
  }
Challenges for Query Processing in FDM
28
● Results from the sources: mapping into SPARQL answers.
● Operators consume tuples from children and pass output tuples to their parents.
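The statement that operators consume tuples from their children and pass output tuples to their parents describes the classic iterator model. A minimal sketch with Python generators follows; the operator names `scan`, `select` and `hash_join` and the sample rows are illustrative assumptions.

```python
def scan(rows):
    """Leaf operator: emits the tuples of one source."""
    for row in rows:
        yield row


def select(child, predicate):
    """Consumes tuples from its child and passes on those that qualify."""
    for row in child:
        if predicate(row):
            yield row


def hash_join(left, right, key):
    """Builds a hash table on the left child, then probes it with the right child."""
    table = {}
    for row in left:
        table.setdefault(row[key], []).append(row)
    for row in right:
        for match in table.get(row[key], []):
            yield {**match, **row}


VESSEL_ROWS = [{"mmsi": "v1", "flag": "ES"}, {"mmsi": "v2", "flag": "PT"},
               {"mmsi": "v3", "flag": "US"}]
OBSERVATION_ROWS = [{"mmsi": "v1", "date": "2017-06-21"},
                    {"mmsi": "v2", "date": "2017-06-23"},
                    {"mmsi": "v3", "date": "2017-06-21"}]

# The plan is a tree of operators; the root pulls tuples from its children.
plan = hash_join(
    scan(VESSEL_ROWS),
    select(scan(OBSERVATION_ROWS), lambda r: r["date"] == "2017-06-21"),
    key="mmsi",
)
results = list(plan)
print(results)
```

This join blocks until the left input is fully read, which is the behavior the adaptive operators discussed later try to avoid.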
Complexity of FDM Query Processing Tasks
● Finding an optimal plan for distributed data sources is NP-hard [1]
● Evaluation of SPARQL queries is NP-complete [2]
● Finding an optimal query decomposition over distributed RDF data sources is NP-hard [3]
29
[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer, 2011.
[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3): 16:1-16:45 (2009)
[3] Maria-Esther Vidal, Simón Castillo, Maribel Acosta, Gabriela Montoya, Guillermo Palma: On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. Trans. Large-Scale Data- and Knowledge-Centered Systems 25: 109-149 (2016)
Traditional Federated Query Processing
30
Traditional Optimize-Then-Execute Architecture
[Figure: Query; Source Selection and Query Decomposition; Query Optimizer; Query Execution Engine; Statistics Generator; Catalog]
Ideal Federated Management Systems
● Systems able to change their behavior by learning the behavior of data providers.
● Receive information from the environment.
● Use up-to-date information to change their behavior.
● Keep iterating over time to adapt their behavior based on the environment conditions.
31
Adaptive Query Processing [4]
Adaptivity is needed:
▪ Misestimated or missing statistics.
▪ Unexpected correlations.
▪ Unpredictable costs.
▪ Dynamically changing data, workload, and source availability.
▪ Changes in the rates at which tuples arrive from sources:
  • Initial Delays.
  • Slow Delivery.
  • Bursty Arrivals.
32
[4] Amol Deshpande, Zachary G. Ives, Vijayshankar Raman: Adaptive Query Processing. Foundations and Trends in Databases (2007)
Adaptive Federated Query Processing
33
[Figure: Query; Source Selection and Query Decomposition; Query Optimizer; Query Execution Engine; Statistics Generator; Catalog; Re-optimize]
Re-optimize the original plan on-the-fly according to the source conditions.
Levels of Adaptation
34
Adaptation Levels:
● Source Selection
● Query Execution
Adaptation Levels
35
Source Selection: searching strategies to select the sources for answering a query according to the real-time source conditions:
● Schema changes
● Source availability
● Data distribution changes
Adaptation Levels
36
Query Execution: strategies to adapt query processing to the environment conditions. Adaptation can be at different granularities:
● Fine-grained adaptation of small processes, e.g.,
  ○ Per tuple
● Coarse-grained adaptation of large processes, e.g.,
  ○ Several queries are executed under certain assumptions about the conditions of the environment
Adaptive Query Processing Strategies
Adaptive Join Processing (Intra-Operator):
● Adaptivity performed at tuple level.
● Query operators are able to adapt to the environment changes, even in the context of a fixed query plan.
Non-Pipelined Query Execution (Inter-Operator):
● Sub-queries are re-scheduled based on uncertainties:
  ○ Execution cost of remote access,
  ○ Size of intermediate results, and
  ○ Unexpected delays.
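Tuple-level adaptivity inside a join can be illustrated with a symmetric hash join, a well-known non-blocking operator that processes tuples in whatever order they arrive from either input. It is used here only as an illustration and is not claimed to be the operator of any engine cited in this talk; the arrival sequence is made up.

```python
def symmetric_hash_join(arrivals, key):
    """Join two inputs without blocking on either one.

    `arrivals` is an iterable of (side, row) pairs in arrival order,
    where side is "L" or "R". Each arriving row is inserted into the
    hash table of its own side and immediately probed against the
    hash table of the other side, so results are produced incrementally.
    """
    tables = {"L": {}, "R": {}}
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        tables[side].setdefault(row[key], []).append(row)
        for match in tables[other].get(row[key], []):
            left, right = (row, match) if side == "L" else (match, row)
            yield {**left, **right}


# Hypothetical interleaving: the right source answers first, then stalls.
ARRIVALS = [
    ("R", {"mmsi": "v1", "date": "2017-06-21"}),
    ("L", {"mmsi": "v1", "flag": "ES"}),
    ("L", {"mmsi": "v3", "flag": "US"}),
    ("R", {"mmsi": "v3", "date": "2017-06-21"}),
    ("R", {"mmsi": "v2", "date": "2017-06-23"}),
]

joined = list(symmetric_hash_join(ARRIVALS, key="mmsi"))
print(joined)
```

Because neither input has to be read completely before probing starts, a delay on one source does not prevent answers that depend only on already received tuples.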
37
Distributed SPARQL Systems
39
SPARQL Web-based Interfaces
40
SPARQL Endpoints: RESTful services that accept queries over HTTP, written in the SPARQL query language and respecting the SPARQL protocol.
SELECT ?long
WHERE {
  ?port <http://bigdataocean.eu/name> <http://bigdataocean.eu/Funchal> .
  ?port <http://bigdataocean.eu/long> ?long .
}
http://bigdataocean.eu/sparql?query=SELECT%20%3Flong%20WHERE%20%7B%20%3Fport%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fproperty%2Fname%3E%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fresource%2FFunchal%3E.%20%3Fport%20%3Chttp%3A%2F%2Fbigdataocean.eu%2Fproperty%2Flong%3E%20%3Flong%20%7D
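The request URL above is the percent-encoded query appended to the endpoint address as the `query` parameter. A small sketch with the Python standard library reproduces that encoding; the query string is the one obtained by decoding the URL on the slide (with `/property/` and `/resource/` in the IRIs), and `build_get_url` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote

ENDPOINT = "http://bigdataocean.eu/sparql"

# Query text as it appears when the URL on the slide is decoded.
QUERY = (
    "SELECT ?long WHERE { "
    "?port <http://bigdataocean.eu/property/name> "
    "<http://bigdataocean.eu/resource/Funchal>. "
    "?port <http://bigdataocean.eu/property/long> ?long }"
)


def build_get_url(endpoint: str, query: str) -> str:
    """Percent-encode the query and pass it as the `query` parameter."""
    return f"{endpoint}?query={quote(query, safe='')}"


url = build_get_url(ENDPOINT, QUERY)
print(url)
```

Decoding the part after `?query=` with `unquote` gives back the original query text.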
SPARQL 1.1 Federated Extension
41
Q0: SERVICE <http://bigdataocean.de/sparql> { Q1 }
Q1:
SELECT ?long
WHERE {
  ?port <http://bigdataocean.eu/name> <http://bigdataocean.eu/Funchal> .
  ?port <http://bigdataocean.eu/long> ?long .
}
The result of evaluating Q0 corresponds to the result of evaluating Q1 in the RDF dataset accessible through the endpoint <http://bigdataocean.de/sparql>.
Carlos Buil Aranda, Marcelo Arenas, Óscar Corcho, Axel Polleres: Federating queries in SPARQL 1.1: Syntax, semantics and evaluation. J. Web Sem. 18(1): 1-17 (2013)
SPARQL Web-based Interfaces
43
Triple Pattern Fragment:
Web-based access interface that allows for evaluating a triple pattern over an RDF dataset.
“Accessing a TPF interface is equivalent to accessing a SPARQL endpoint with queries that contain a single triple pattern only” [5]
[5] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. (2016)
Client-Server SPARQL Query Engines [5]
44
[Figure: Server of Triple Pattern Fragments (TPFs); Clients of Triple Pattern Fragments (TPFs)]
TPF: Web-based interface to execute SPARQL triple patterns, e.g., ?vessel bdo:mmsi ?mmsi
TPF Clients: Execute SPARQL queries over a TPF Server
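The division of labor between a TPF server and a TPF client can be sketched as follows: the server only answers single triple patterns, and the client evaluates a conjunction of patterns by issuing one request per instantiated pattern. This is a simplified in-memory sketch; real TPF interfaces are paged HTTP resources with metadata, and the data, `tpf_request` and `evaluate_bgp` are illustrative assumptions.

```python
# Hypothetical RDF data held by the server.
TRIPLES = [
    ("ex:v1", "bdo:mmsi", "v1"), ("ex:v1", "bdo:flag", "ES"),
    ("ex:v2", "bdo:mmsi", "v2"),
    ("ex:v3", "bdo:mmsi", "v3"), ("ex:v3", "bdo:flag", "US"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def tpf_request(pattern):
    """Server side: return every triple that matches one triple pattern."""
    return [t for t in TRIPLES
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def evaluate_bgp(patterns, binding=None):
    """Client side: nested-loop evaluation, one request per instantiated pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    # Replace already bound variables before asking the server.
    bound = tuple(binding.get(term, term) for term in first)
    for triple in tpf_request(bound):
        new_binding = dict(binding)
        consistent = True
        for term, value in zip(bound, triple):
            if is_var(term) and new_binding.setdefault(term, value) != value:
                consistent = False
        if consistent:
            yield from evaluate_bgp(rest, new_binding)


QUERY = [("?vessel", "bdo:mmsi", "?mmsi"), ("?vessel", "bdo:flag", "?flag")]
answers = list(evaluate_bgp(QUERY))
print(answers)
```

All join processing happens in the client, which keeps the server cheap to operate at the cost of more requests.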
Adaptive Client for Triple Pattern Fragments
45
[Figure: Input: SPARQL Query Q; Query Optimizer; Optimized plan; Adaptive Engine; Routing Policies; TPF Server; Output: Results for Q]
nLDE [6]:
● Optimization techniques tailored for TPFs
● Autonomous Eddies
● Routing policies for SPARQL
[6] Maribel Acosta, Maria-Esther Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. International Semantic Web Conference (2015)
Peer-to-Peer SPARQL Query Engines
46
[Figure: SPARQL Query Q; Avalanche SPARQL Endpoint; Endpoint Directory or Search Engine; Request Statistics; Distributed join]
Avalanche [7]:
● Endpoints may dereference data from other endpoints.
● On-line statistical information about relevant data sources.
● Queries are executed in a concurrent and distributed fashion.
[7] Cosmin Basca, Abraham Bernstein: Querying a messy web of data with Avalanche. J. Web Sem. (2014)
Federated SPARQL Query Engines
47
SPARQL endpoints: Web-access interfaces that allow for querying RDF data following the SPARQL protocol.
Federated SPARQL Query Engines
50
[Figure: dimensions Distribution, Autonomy, Heterogeneity]
Federated SPARQL Query Engines: ANAPSID [8], SPLENDID [9], FedX [10], SemaGrow [16]
Extensions: LILAC [11], FEDRA [12], Fed-DESATUR [3], MULDER [14], DAW [13], HIBISCUS [15]
Adaptivity During Source Selection
51
[Figure: engines classified by Fine-Grained Adaptivity, Coarse-Grained Adaptivity, or No Adaptivity: ANAPSID, SPLENDID, Fed-DESATUR, MULDER, DAW, HIBISCUS, LILAC, FEDRA]
Adaptivity During Source Selection-FedX
53
For each triple pattern (s p o), a query is sent to the SPARQL endpoints to identify the relevant sources.
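FedX performs this probing with SPARQL ASK queries, one per triple pattern and endpoint. The sketch below simulates the idea in memory: each endpoint is a set of triples and `ask` plays the role of the ASK query. The endpoint contents (including the vessel name) and the function names are hypothetical.

```python
# Hypothetical contents of three endpoints.
ENDPOINTS = {
    "C1": {("ex:v1", "bdo:mmsi", "v1"), ("ex:v1", "bdo:flag", "ES"),
           ("ex:v1", "bdo:name", "Vessel One")},
    "C2": {("ex:o1", "bdo:mmsi", "v1"), ("ex:o1", "bdo:lat", "-26.3"),
           ("ex:o1", "bdo:long", "56.9"), ("ex:o1", "bdo:date", "2017-06-21")},
    "C3": {("ex:p1", "bdo:lat", "-26.3"), ("ex:p1", "bdo:long", "56.9"),
           ("ex:p1", "bdo:name", "Funchal")},
}


def ask(triples, pattern) -> bool:
    """Stand-in for an ASK query: does any triple match the pattern?"""
    return any(
        all(p.startswith("?") or p == v for p, v in zip(pattern, triple))
        for triple in triples
    )


def select_sources(patterns, endpoints):
    """Map every triple pattern to the endpoints that can contribute answers."""
    return {
        pattern: [name for name, triples in endpoints.items() if ask(triples, pattern)]
        for pattern in patterns
    }


PATTERNS = [
    ("?vessel", "bdo:name", "?name"),
    ("?vessel", "bdo:flag", "?flag"),
    ("?observ", "bdo:mmsi", "?mmsi"),
    ("?port", "bdo:name", "Funchal"),
]
relevant = select_sources(PATTERNS, ENDPOINTS)
for pattern, sources in relevant.items():
    print(pattern, "->", sources)
```

Patterns with general predicates such as bdo:name match more than one endpoint, which is why source selection based only on single patterns can over-estimate the relevant sources.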
Adaptivity During Source Selection-FedX
54
Exclusive groups: Subqueries that are executed by only one source.
[Figure: the query partitioned into subqueries sq1, sq2, sq3, sq4, sq5]
Adaptivity During Source Selection-FedX
55
[Figure: query plan over subqueries sq1, sq2, sq3, sq4, sq5]
Exclusive groups are executed first in the plan.
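Forming exclusive groups and placing them first can be sketched on top of a source selection result. In the sketch below, `RELEVANT` is a hypothetical outcome of source selection; patterns whose single relevant source is the same are grouped, and the remaining patterns are scheduled after the groups. This shows only the ordering rule stated on the slide, not the full FedX join ordering.

```python
# Hypothetical source selection result: triple pattern -> relevant sources.
RELEVANT = {
    ("?vessel", "bdo:name", "?name"): ["C1", "C3"],
    ("?vessel", "bdo:flag", "?flag"): ["C1"],
    ("?observ", "bdo:date", '"2017-06-21"'): ["C2"],
    ("?observ", "bdo:mmsi", "?mmsi"): ["C1", "C2"],
    ("?port", "bdo:name", '"Funchal"'): ["C3"],
}


def exclusive_groups(relevant):
    """Group the patterns that have exactly one relevant source, per source."""
    groups, others = {}, []
    for pattern, sources in relevant.items():
        if len(sources) == 1:
            groups.setdefault(sources[0], []).append(pattern)
        else:
            others.append((pattern, sources))
    return groups, others


def build_plan(relevant):
    """Order the plan so that exclusive groups come before all other patterns."""
    groups, others = exclusive_groups(relevant)
    steps = [("exclusive", [source], patterns) for source, patterns in groups.items()]
    steps += [("single", sources, [pattern]) for pattern, sources in others]
    return steps


plan_steps = build_plan(RELEVANT)
for kind, sources, patterns in plan_steps:
    print(kind, sources, patterns)
```

An exclusive group can be shipped to its source as one subquery, so the join among its patterns is evaluated remotely rather than by the federated engine.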
Adaptivity During Source Selection-Mulder
56
Sources are described in terms of Classes and properties.
Source descriptions are exploited during query decomposition.
Query Planning-Mulder
Subquery over ?port:
  ?port bdo:lat ?lat .
  ?port bdo:long ?long .
  ?port bdo:name "Funchal"
Subquery over ?vessel:
  ?vessel bdo:name ?name .
  ?vessel bdo:mmsi ?mmsi .
  ?vessel bdo:flag ?flag .
Subquery over ?observation:
  ?observation bdo:lat ?lat .
  ?observation bdo:long ?long .
  ?observation bdo:date "2017-06-21" .
  ?observation bdo:mmsi ?vessel .
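Decomposition driven by source descriptions can be sketched as follows: each source is described by a class and its properties, the query is split into groups of patterns sharing a subject, and each group is matched against the descriptions whose properties cover its predicates. The class names and the shape of `SOURCE_DESCRIPTIONS` are assumptions made for illustration and do not reproduce MULDER's actual description format.

```python
from collections import defaultdict

# Hypothetical descriptions: one class with its properties per source.
SOURCE_DESCRIPTIONS = [
    {"source": "C1", "class": "bdo:Vessel",
     "properties": {"bdo:name", "bdo:mmsi", "bdo:flag"}},
    {"source": "C2", "class": "bdo:Observation",
     "properties": {"bdo:lat", "bdo:long", "bdo:date", "bdo:mmsi"}},
    {"source": "C3", "class": "bdo:Port",
     "properties": {"bdo:lat", "bdo:long", "bdo:name"}},
]

PATTERNS = [
    ("?vessel", "bdo:name", "?name"), ("?vessel", "bdo:mmsi", "?mmsi"),
    ("?vessel", "bdo:flag", "?flag"),
    ("?observ", "bdo:lat", "?lat"), ("?observ", "bdo:long", "?long"),
    ("?observ", "bdo:date", '"2017-06-21"'), ("?observ", "bdo:mmsi", "?mmsi"),
    ("?port", "bdo:lat", "?lat"), ("?port", "bdo:long", "?long"),
    ("?port", "bdo:name", '"Funchal"'),
]


def subject_groups(patterns):
    """Split the query into groups of triple patterns that share a subject."""
    groups = defaultdict(list)
    for s, p, o in patterns:
        groups[s].append((s, p, o))
    return dict(groups)


def match_sources(group, descriptions):
    """Sources whose described properties cover every predicate of the group."""
    predicates = {p for _, p, _ in group}
    return [d["source"] for d in descriptions if predicates <= d["properties"]]


decomposition = {
    subject: match_sources(group, SOURCE_DESCRIPTIONS)
    for subject, group in subject_groups(PATTERNS).items()
}
print(decomposition)
```

Matching whole groups instead of single patterns removes the ambiguity of shared predicates such as bdo:lat or bdo:name without contacting the sources.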
Empirical Evaluation
59
FedBench Benchmark:
● Ten RDF datasets (314K to 44M)
25 Queries:
● Cross Domain (CD)
● Linked Data (LD)
● Life Science (LS)
Ten Complex (C) queries
[Figure: Completeness versus Execution Time]
Adaptivity is a costly task
Adaptivity During Query Execution
63
[Figure: engines classified by Fine-Grained Adaptivity or No Adaptivity: ANAPSID, SPLENDID, Fed-DESATUR, MULDER, DAW, HIBISCUS, LILAC, FEDRA]
Intra-Operator Adaptivity
Intra-Operator Strategies:
● SPARQL operators able to detect when sources become blocked or data traffic is bursty
● Opportunistically produce results as quickly as data arrives from the sources
● Results are produced incrementally
64
Inter-Operator Adaptivity
Inter-Operator Strategies:
● Produce an answer as soon as it is computed
● Can keep producing intermediate results even when a data source becomes blocked
65
[Figure: plan with three JOIN operators]
Empirical Evaluation
67
FedBench Benchmark:
● Ten RDF datasets (314K to 44M)
25 Queries:
● Cross Domain (CD)
● Linked Data (LD)
● Life Science (LS)
Ten Complex (C) queries
[Figure: Completeness versus Execution Time]
Adaptivity is a costly task
Adaptive Query Processing
72
[Figure: No Inter-Operator Strategy versus Inter-Operator Strategy]
Delays simulated with a Gamma distribution (parameters 1 and 0.3).
Twenty-five queries against DBpedia; basic graph patterns with up to 15 triple patterns.
Adaptive query processing strategies are able to speed up query execution in the presence of delays.
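Simulated source delays of this kind can be drawn with the Python standard library. The sketch below assumes that the two reported values are the shape (1) and the scale (0.3) of the Gamma distribution and that delays are in seconds; the transcript does not state either, so both are assumptions, as is the helper name `simulate_delays`.

```python
import random


def simulate_delays(n: int, shape: float = 1.0, scale: float = 0.3,
                    seed: int = 42) -> list[float]:
    """Draw n delays from a Gamma distribution.

    Assumption: shape=1 and scale=0.3, so the expected delay is
    shape * scale = 0.3 time units.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


delays = simulate_delays(10_000)
mean_delay = sum(delays) / len(delays)
print(f"mean delay: {mean_delay:.3f}, max delay: {max(delays):.3f}")
```

In an experiment such a delay would be applied before each answer of a source, so that the engine observes slow and bursty arrivals.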
[6] M. Acosta, M.E. Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. ISWC 2015
Data Replication-Aware
73
[Figure: engines classified as Replication-Aware or No Replication: ANAPSID, SPLENDID, Fed-DESATUR, MULDER, HIBISCUS, LILAC, FEDRA, DAW]
Experimental Study
74
● Each dataset is accessible through 10 endpoints.
● Each endpoint has data for 100 random queries.
● The same fragment is replicated in at least 3 endpoints.
● WatDiv datasets: 50,000 queries
● Rest of the datasets: 10,000 queries
Dataset sizes:
Diseasome    72,445
SWDF         198,797
DBpedia Geo  1,900,004
LinkedMDB    3,579,610
WatDiv1      104,532
WatDiv100    10,934,518
[11] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Decomposing federated queries in presence of replicated fragments. J. Web Sem. (2017)
Experimental Study
76
Considering replication allows for producing complete answers while the execution time is reduced.
Open Issues
77
[Figure: dimensions Distribution, Autonomy, Heterogeneity]
Diverse types of heterogeneity: data models, query capabilities, availability.
Open Issues
● Formal Definition
● Access Control
● Security
● Privacy
● Data Evolution
● Synchronization
● Big Data
78
Conclusions
79
References
[1] Tamer Ozsu and Patrick Valduriez. Principles of Distributed Database Systems (Third Edition). Springer (2011)
[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3): 16:1-16:45 (2009)
[3] Maria-Esther Vidal, Simón Castillo, Maribel Acosta, Gabriela Montoya, Guillermo Palma: On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. Trans. Large-Scale Data- and Knowledge-Centered Systems 25: 109-149 (2016)
[4] Amol Deshpande, Zachary G. Ives, Vijayshankar Raman: Adaptive Query Processing. Foundations and Trends in Databases (2007)
[5] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. (2016)
[6] Maribel Acosta, Maria-Esther Vidal: Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data. International Semantic Web Conference (2015)
[7] Cosmin Basca, Abraham Bernstein: Querying a messy web of data with Avalanche. J. Web Sem. (2014)
[8] Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, Edna Ruckhaus: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. International Semantic Web Conference (2011)
[9] Olaf Görlitz, Steffen Staab: SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. COLD (2011)
[10] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, Michael Schmidt: FedX: Optimization Techniques for Federated Query Processing on Linked Data. International Semantic Web Conference (2011)
[11] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Decomposing federated queries in presence of replicated fragments. J. Web Sem. (2017)
[12] Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal: Federated SPARQL Queries Processing with Replicated Fragments. International Semantic Web Conference (2015)
[13] Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena F. Deus, Manfred Hauswirth: DAW: Duplicate-AWare Federated Query Processing over the Web of Data. International Semantic Web Conference (2013)
[14] Kemele M. Endris, Mikhail Galkin, Ioanna Lytra, Mohamed Nadjib Mami, Maria-Esther Vidal, Sören Auer: MULDER: Querying the Linked Data Web by Bridging RDF Molecule Templates. International Conference on Database and Expert Systems Applications (2017)
[15] Muhammad Saleem, Axel-Cyrille Ngonga Ngomo: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation. Extended Semantic Web Conference (2014)
[16] Angelos Charalambidis, Antonis Troumpoukis, Stasinos Konstantopoulos: SemaGrow: Optimizing federated SPARQL queries. International Conference on Semantic Systems (SEMANTiCS) (2015)
82