the impact of data caching of on query execution for linked data
Post on 29-Aug-2014
1.399 Views
Preview:
DESCRIPTION
TRANSCRIPT
The Impact ofData Caching on
Query Execution for Linked Data
Olaf Hartighttp://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research GroupHumboldt-Universität zu Berlin
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
SELECT DISTINCT ?i ?labelWHERE {
?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i .
OPTIONAL { ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") }}ORDER BY ?label
?
Can we query the Web of Dataas of it were a single,giant database?
Our approach: Link Traversal Based Query Execution [ISWC'09]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 3
Main Idea
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 4
Main Idea
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 5
query-localdataset
Main Idea
http://bob.name
?
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 6
query-localdataset
Main Idea
http://bob.name
?
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 7
query-localdataset
Main Idea
http://bob.name
?
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 8
query-localdataset
Main Idea
http://bob.name
?
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
“Descriptor object”
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 9
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
n
ame
project ?prj
Query
?prjName
know
s
http://bob.name
?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 10
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
knows
http://bob.name
http://alice.name
n
ame
project ?prj
Query
?prjName
know
s
http://bob.name
?acq n
ame
project ?prj
Query
?prjName
know
s
http://bob.name
?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 11
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http://alice.name
?acq
n
ame
project ?prj
Query
?prjName
know
s
http://bob.name
?acq
knows
http://bob.name
http://alice.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 12
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http
://al
ice.
nam
e
?
http://alice.name
?acq
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 13
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http
://al
ice.
nam
e
?
http://alice.name
?acq
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 14
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http
://al
ice.
nam
e
?
http://alice.name
?acq
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 15
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
n
ame
project ?prj
Query
know
s
?acq
?prjName
http://bob.name
http://alice.name
?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 16
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http://alice.name
?acq
n
ame
?prjName
know
s
http://bob.nameQuery
project ?prj?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 17
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
project http://.../AlicesPrj
http://alice.name
http://alice.name
?acq
n
ame
?prjName
know
s
http://bob.nameQuery
project ?prj?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 18
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http://alice.name http://.../AlicesPrj
?prj?acq
http://alice.name
?acq
n
ame
?prjName
know
s
http://bob.nameQuery
project ?prj?acq
project http://.../AlicesPrj
http://alice.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 19
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http://alice.name
?acq
n
ame
?prjName
know
s
http://bob.nameQuery
project ?prj?acq
http://alice.name http://.../AlicesPrj
?prj?acq
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 20
query-localdataset
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Main Idea
http://alice.name
?acq
n
ame
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
http://alice.name http://.../AlicesPrj
?prj?acq
http://.../AlicesPrj “ … “
?prjName?prj
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 21
query-localdataset
Main Idea
http://alice.name
?acq
n
ame
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
http://alice.name http://.../AlicesPrj
?prj?acq
http://.../AlicesPrj “ … “
?prjName?prj
http://alice.name
?acqhttp://.../AlicesPrj “ … “
?prjName?prj
● Intertwine query evaluation with traversal of data links
● We alternate between:● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data● Look up URIs in intermediate
solutions and add retrieved datato the query-local dataset
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 22
Characteristics
● Link traversal based query execution:● Evaluation on a continuously augmented dataset● Discovery of potentially relevant data during execution● Discovery driven by intermediate solutions
● Main advantage:● No need to know all data sources in advance
● Limitations:● Query has to contain a URI as a starting point● Ignores data that is not reachable* by the query execution
* formal definition in [LDOW'11a]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 23
query-localdataset
The Issue
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 24
query-localdataset
The Issue
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
http://bob.name?
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 25
query-localdataset
The Issue
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
knows
http://bob.name
http://alice.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 26
query-localdataset
The Issue
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
knows
http://bob.name
http://alice.name
?acq ?iLabel?i
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 27
query-localdataset
query-localdataset
The Issue
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
nam
e
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 28
query-localdataset
query-localdataset
Reusing the Query-Local Dataset
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
nam
e
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 29
query-localdataset
Reusing the Query-Local Dataset
lab
el
interest?i
Querykn
ows
?acq
?iLabel
http://bob.name
nam
e
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
knows
http://alice.name
http://bob.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 30
query-localdataset
Reusing the Query-Local Dataset
lab
el
interest?i
Querykn
ows
?iLabel
?acq
http://bob.name
nam
e
know
s
http://bob.nameQuery
project?acq
?prjName
?prj
http://alice.name
?acq
knows
http://bob.name
http://alice.name
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 31
Re-using the query-local dataset (a.k.a. data caching)may benefit
query performance + result completeness
Hypothesis
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 32
Contributions
● Systematic analysis of the impact of data caching● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
*see [LDOW'11a]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 33
Experiment – Scenario
● Information about thedistributed social network of FOAF profiles
● 5 types of queries
● Experiment Setup:● 20 persons● Sequential use➔ 100 queries
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 34
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
query execution time (in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Experiment – Single Query
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
number of query results
● no reuse experiment:● No data caching
● reuse per query experiment● Reuse of query-local dataset
for 3 executions of each query● Third execution measured
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 35
query execution time (in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Experiment – Single Query
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
number of query results
● no reuse experiment:● No data caching
● reuse per query experiment● Reuse of query-local dataset
for 3 executions of each query● Third execution measured
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 36
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Experiment – Single Query
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
query execution time (in seconds)number of query results
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 37
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Experiment – Single Query
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
query execution time (in seconds)number of query results
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 38
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Experiment – Single Query
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
query execution time (in seconds)number of query results
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 39
Experiment – Complete Sequence
number of query results
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
reuse all queries
query execution time(in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
● reuse all queries experiment:● Reuse of the query-local
dataset for the complete sequence of all 100 queries
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 40
Experiment – Complete Sequence
number of query results
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
reuse all queries
query execution time(in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 41
Experiment – Complete Sequence
number of query results
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
reuse all queries
query execution time(in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 42
Experiment – Complete Sequence
number of query results
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
reuse all queries
query execution time(in seconds)
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 43
Experiment – Complete Sequence
query execution time(in seconds)
number of query results
(Query No. 61)
(Query No. 62)
(Query No. 63)
(Query No. 64)
(Query No. 65)
ContactInfoDanBri
UnsetPropsDanBri
2ndDegree1DanBri
2ndDegree2DanBri
IncomingDanBri
0 10 20 30 40 50 60
0 10 20 30 40 50 60
no reuse reuse per query
reuse all queries
0,01 0,1 1 10 100
0,01 0,1 1 10 100
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 44
Outlook
● Requirements of a data cache:● Replacement mechanism● Coherency mechanism
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 45
Cache Replacement
● Cache full → remove descriptor objects
● Replacement strategy● Primary goal: maximize hit rate● Recency-based● Frequency-based● Function-based● Randomized
● Replacement process● Watermarks: high and low
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 46
Studying Cache Replacement?
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 47
Studying Cache Replacement?
“Web cache replacement in its general form seems to be a solved topic.”
S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 48
Studying Cache Replacement?
● 6 quad indexes* in main memory● Size grows linear in the number of quads
● Example (after reuse all queries experiment, 100 queries):● 905 descriptor objects, overall number of 745,756 triples● ca. 103 MB
➔ Available main memory is almost no limit
“Web cache replacement in its general form seems to be a solved topic.”
S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003
* as introduced in [LDOW'11b]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 49
Cache Coherency
● Data items in the cache may become inconsistent
● Strong cache consistency● Server validation● Client validation
● Weak cache consistency● Time to live (TTL)● Adaptive TTL
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 50
Client Validation
● Polling every time
● Enables strong cache consistency
● Conditional GET● Request with If-Modified-Since header● Possible response: 304 Not Modified
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 51
Client Validation
● Polling every time
● Enables strong cache consistency
● Conditional GET● Request with If-Modified-Since header● Possible response: 304 Not Modified
● Not supported by most Linked Data servers● Experiment based on the CKAN catalog of linked datasets
● 41 out of 154 example resources (26.6%) from 110 datasets
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 52
Time to Live (TTL)
● TTL field: life time estimation for each object
● Supported by HTTP response headers:● Expires● Cache-Control: max-age
● When TTL elapses, object is invalid
● Accessing an invalid object → re-retrieve object again● Conditional GET
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 53
Time to Live (TTL)
● TTL field: life time estimation for each object
● Supported by HTTP response headers:● Expires● Cache-Control: max-age
● When TTL elapses, object is invalid
● Accessing an invalid object → re-retrieve object again● Conditional GET
● Alternative (due to lack of support in Linked Data servers):● Assume a default TTL for each object● Ordinary GET
37.7%
37.0%
26.6%
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 54
Adaptive TTL
● Assumption:● The older an object, the less likely it is to be modified
● TTL is a percentage of the age:● Threshold = 10% ; age = 30 days → TTL = 3 days● Last verification: yesterday → invalidation in 2 days
● HTTP-based implementation:● Calculation of age: use Last-Modified response header● Verification with conditional GET
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 55
Adaptive TTL
● Assumption:● The older an object, the less likely it is to be modified
● TTL is a percentage of the age:● Threshold = 10% ; age = 30 days → TTL = 3 days● Last verification: yesterday → invalidation in 2 days
● HTTP-based implementation:● Calculation of age: use Last-Modified response header● Verification with conditional GET
● Alternative (due to lack of support in Linked Data servers):● Assume Last-Modified is time of first retrieval● Verification by comparing a response to the current version
35.1%
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 56
Summary
● Systematic analysis of the impact of data cache ● Theoretical foundation● Conceptual analysis● Empirical evaluation
● Main findings:● Additional results possible (for semantically similar queries)● Impact on performance may be positive but also negative
● Future work:● Analysis of caching strategies in our context● Main issue: invalidation
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 57
Backup Slides
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 58
Contributions
● Theoretical foundation (extension of the original definition)
● Reachability by a Dseed
-initialized execution of a BGP query b
● Dseed
-dependent solution for a BGP query b
● Reachability R(B) for a serial execution of B = b1 , … , b
n
➔ Each solution for bcur
is also R(B)-dependent solution for bcur
● Conceptual analysis of the impact of data caching
● Performance factor: p( bcur
, B ) = c( bcur
, [ ] ) – c( bcur
, B )
● Serendipity factor: s( bcur
, B ) = b( bcur
, B ) – b( bcur
, [ ] )
● Empirical verification of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 59
Query Template Contact
SELECT * WHERE { <PERSON> foaf:knows ?p .
OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname }
OPTIONAL { ?p foaf:birthday ?birthday }
OPTIONAL { ?p foaf:img ?img }
OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID }
}
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 60
Query Template UnsetProps
SELECT DISTINCT ?result ?resultLabel WHERE{
?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .?result rdfs:domain foaf:Person .
OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) )
<PERSON> foaf:knows ?var2 .?var2 ?result ?var3 .?result rdfs:label ?resultLabel .?result vs:term_status ?var1 .
}ORDER BY ?var1
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 61
Query Template Incoming
SELECT DISTINCT ?result WHERE{
?result foaf:knows <PERSON> .
OPTIONAL{
?result foaf:knows ?var1 .FILTER ( <PERSON> = ?var1 )<PERSON> foaf:knows ?result .
}FILTER ( !bound(?var1) )
}
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 62
Query Template 2ndDegree1
SELECT DISTINCT ?result WHERE{
<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )
?p1 foaf:knows ?result .FILTER ( <PERSON> != ?result )?p2 foaf:knows ?result .
OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )
}FILTER ( !bound(?knows) )
}
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 63
Query Template 2ndDegree2
SELECT DISTINCT ?result WHERE{
<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )
?result foaf:knows ?p1 .FILTER ( <PERSON> != ?result )?result foaf:knows ?p2 .
OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )
}FILTER ( !bound(?knows) )
}
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 64
Experiment – Single Query
● In the ideal case for Bupper
= [ bcur
, bcur
] :
● pupper
( bcur
, Bupper
) = c( bcur
, [ ] ) – c( bcur
, Bupper
) = c( bcur
, [ ] )
● supper
( bcur
, Bupper
) = b( bcur
, Bupper
) – b( bcur
, [ ] ) = 0
1 Averaged over all 100 queries
Experiment Avg.1 number of Query Results
(std.dev.)
Average1
Hit Rate
(std.dev.)
Avg.1 query Execution Time
(std.dev.)
no reuse27.37
(140.49)
0.849(0.205)
64.95 s(124.50)
reuse per query26.71
(148.77)
1(0)
0.02 s(0.07)
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 65
Experiment – Single Query
● Summary (measurement errors aside):● Same number of query results● Significant improvements in query performance
1 Averaged over all 100 queries
Experiment Avg.1 number of Query Results
(std.dev.)
Average1
Hit Rate
(std.dev.)
Avg.1 query Execution Time
(std.dev.)
no reuse27.37
(140.49)
0.849(0.205)
64.95 s(124.50)
reuse per query26.71
(148.77)
1(0)
0.02 s(0.07)
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 66
Experiment – Complete Sequence
● Summary:● Data cache may provide for additional query results● Impact on performance may be positive but also negative
1 Averaged over all 100 queries
Experiment Avg.1 number of Query Results
(std.dev.)
Average1
Hit Rate
(std.dev.)
Avg.1 query Execution Time
(std.dev.)
no reuse27.37
(140.49)
0.849(0.205)
64.95 s(124.50)
reuse per query26.71
(148.77)
1(0)
0.02 s(0.07)
reuse all queries44.87
(178.36)
0.991(0.053)
37.91 s(112.94)
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 67
Experiment – Complete Sequence
● Executing the query sequence in a random order results in measurements similar to the given order.
Experiment Avg.1 number of Query Results
(std.dev.)
Average1
Hit Rate
(std.dev.)
Avg.1 query Execution Time
(std.dev.)
no reuse27.37
(140.49)
0.849(0.205)
64.95 s(124.50)
reuse per query26.71
(148.77)
1(0)
0.02 s(0.07)
reuse all queries44.87
(178.36)
0.991(0.053)
37.91 s(112.94)
reuse all queries(random orders)
118.18(867.07)
0.992(0.016)
20.61 s(216.61)
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 68
These slides have been created byOlaf Hartig
http://olafhartig.de
This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)
top related