Continuously Updating Query Results over Real-Time Linked Data

Ruben Taelman - @rubensworks iMinds - Ghent University Continuously Updating Query Results over Real-Time Linked Data

Uploaded by ruben-taelman on 10-Feb-2017


TRANSCRIPT

Page 1: Continuously Updating Query Results over Real-Time Linked Data

Ruben Taelman - @rubensworks, iMinds - Ghent University

Continuously Updating Query Results over Real-Time Linked Data

Page 2

Dynamic Linked Data

E.g., a thermometer measures the temperature every minute:

“19,05°C” - 30-05-2016 11:00

“19,06°C” - 30-05-2016 11:01

“19,11°C” - 30-05-2016 11:02

“19,08°C” - 30-05-2016 11:03

…

Typically exposed as an RDF stream, i.e., a stream of <RDF triple, timestamp> pairs
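A minimal Python sketch of the thermometer example as an RDF stream, i.e., a sequence of (triple, timestamp) pairs. The identifiers `ex:thermometer1` and `ex:hasTemperature` are hypothetical (the slides do not name them), and the slide's decimal commas are written as dots.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Tuple

# Hypothetical identifiers; the slides do not name the sensor or property.
THERMOMETER = "ex:thermometer1"
HAS_TEMPERATURE = "ex:hasTemperature"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# One stream element: an RDF triple paired with its timestamp.
StreamElement = Tuple[Triple, datetime]


def thermometer_stream() -> Iterator[StreamElement]:
    """Yield the example measurements (one per minute) as <triple, timestamp> pairs."""
    readings = [
        ("19.05", datetime(2016, 5, 30, 11, 0)),
        ("19.06", datetime(2016, 5, 30, 11, 1)),
        ("19.11", datetime(2016, 5, 30, 11, 2)),
        ("19.08", datetime(2016, 5, 30, 11, 3)),
    ]
    for value, timestamp in readings:
        yield Triple(THERMOMETER, HAS_TEMPERATURE, f'"{value}"'), timestamp


def current_value(stream: List[StreamElement]) -> str:
    """Answer "What is the current temperature?" with the most recent element's object."""
    triple, _ = max(stream, key=lambda element: element[1])
    return triple.obj


stream = list(thermometer_stream())
print(len(stream), current_value(stream))  # 4 "19.08"
```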

Page 3

Querying continuous data

Clients send queries to the server, e.g., “What is the current temperature?”

Server continuously evaluates the queries

→ Server does all of the work

This is a cause of low public endpoint availability! ½ of public endpoints have an availability below 95% (Buil-Aranda 2013)

→ Clients just wait for results

Page 4

What if we moved continuous query evaluation to the client?

→ to lower server load

Page 5

Triple Pattern Fragments does this for static data!

Triple pattern fragments (TPF) (Verborgh 2016):

Servers can only respond to triple pattern queries

Clients need to evaluate queries locally

→ Lowers server complexity
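The division of labour can be sketched in Python: a toy in-memory "server" that only answers single triple patterns, and a client that evaluates a basic graph pattern by joining fragments locally (a simple nested-loop join, chosen here for brevity). `ToyTpfServer`, `evaluate_bgp` and the `ex:` data are hypothetical; a real TPF server returns paged fragments with metadata and hypermedia controls, which this sketch omits.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


class ToyTpfServer:
    """Hypothetical stand-in for a TPF server: it only answers single triple patterns."""

    def __init__(self, triples: Set[Triple]) -> None:
        self.triples = triples
        self.requests = 0  # number of fragment requests served

    def fragment(self, pattern: Pattern) -> List[Triple]:
        self.requests += 1
        return [t for t in self.triples
                if all(term.startswith("?") or term == value
                       for term, value in zip(pattern, t))]


def substitute(pattern: Pattern, binding: Binding) -> Pattern:
    """Replace already-bound variables in a pattern by their values."""
    s, p, o = pattern
    return (binding.get(s, s), binding.get(p, p), binding.get(o, o))


def match(pattern: Pattern, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend a binding with the pattern's variables, or return None on a conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate_bgp(server: ToyTpfServer, patterns: List[Pattern]) -> List[Binding]:
    """Client-side join: one fragment request per (partially bound) pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            bound = substitute(pattern, binding)
            for triple in server.fragment(bound):
                extended = match(bound, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


server = ToyTpfServer({
    ("ex:train1", "ex:station", "ex:X"),
    ("ex:train1", "ex:delay", "5"),
    ("ex:train2", "ex:station", "ex:Y"),
})
results = evaluate_bgp(server, [("?t", "ex:station", "ex:X"), ("?t", "ex:delay", "?d")])
print(results)  # [{'?t': 'ex:train1', '?d': '5'}]
```

The server never sees the join; it only answers two pattern requests, while all combining happens on the client.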

Can we do the same for dynamic data?

Page 6

Overview

Dynamic data representation

Query streamer engine

Evaluation

Page 7

Overview

Dynamic data representation

Query streamer engine

Evaluation

Page 8

Dynamic data representation

Expose dynamic data through the TPF interface

→ Represent dynamic data in RDF

We annotate dynamic data with the time during which it is valid

→ Clients can derive the time at which the data can change!
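A small Python sketch of that derivation, assuming each result knows the end-of-validity times of the annotated triples it was computed from: the earliest of those times is the first moment the result can become outdated. The function names and values are illustrative, not taken from the presented engine.

```python
from datetime import datetime
from typing import List, Optional


def next_change_time(final_times: List[datetime]) -> Optional[datetime]:
    """Earliest end of validity among the annotated data a result was derived from.

    Until this moment the result cannot become outdated; None means the result
    depends on no time-annotated (dynamic) data.
    """
    return min(final_times, default=None)


def needs_refresh(final_times: List[datetime], now: datetime) -> bool:
    """A client only has to re-evaluate once some of its input data has expired."""
    change = next_change_time(final_times)
    return change is not None and now >= change


# End times of validity, e.g. read from tmp:final annotations (values are illustrative).
finals = [datetime(2016, 5, 30, 9, 20), datetime(2016, 5, 30, 9, 25)]
print(next_change_time(finals))                             # 2016-05-30 09:20:00
print(needs_refresh(finals, datetime(2016, 5, 30, 9, 17)))  # False
print(needs_refresh(finals, datetime(2016, 5, 30, 9, 20)))  # True
```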

But how do we annotate data/triples with time?

Page 9

Annotation methods

Reification: outdated

Singleton properties (Nguyen 2014): instantiate predicates

Graphs: define a fourth element in the quad

Implicit graphs: TPF makes triples (de)referenceable
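To compare the four methods side by side, here is a Python sketch that emits the quads each one would use to attach an expiration time to the same triple. Several parts are assumptions: `tmp:expiration` is a hypothetical predicate, the singleton property name `m:plays_1` is an illustrative naming scheme, the `rdf:singletonPropertyOf` link follows the singleton property proposal, and the fragment URL shape (`?subject=…&predicate=…&object=…` on a made-up base URL) is only illustrative. Prefixed names stand in for full IRIs throughout.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# (subject, predicate, object, graph); "" stands for the default graph.
Quad = Tuple[str, str, str, str]

# Hypothetical annotation predicate (the slides' own example uses tmp:interval).
EXPIRES = "tmp:expiration"


def reification(s: str, p: str, o: str, t: str) -> List[Quad]:
    """Describe the triple through an rdf:Statement node and annotate that node."""
    st = "_:statement"
    return [(st, "rdf:type", "rdf:Statement", ""), (st, "rdf:subject", s, ""),
            (st, "rdf:predicate", p, ""), (st, "rdf:object", o, ""),
            (st, EXPIRES, t, "")]


def singleton_property(s: str, p: str, o: str, t: str) -> List[Quad]:
    """Instantiate a predicate unique to this triple and annotate the instance."""
    sp = p + "_1"  # hypothetical naming scheme for the singleton property
    return [(s, sp, o, ""), (sp, "rdf:singletonPropertyOf", p, ""),
            (sp, EXPIRES, t, "")]


def graph(s: str, p: str, o: str, t: str) -> List[Quad]:
    """Place the triple in its own graph (fourth quad element) and annotate the graph."""
    g = "_:g1"
    return [(s, p, o, g), (g, EXPIRES, t, "")]


def implicit_graph(s: str, p: str, o: str, t: str, base: str) -> List[Quad]:
    """Annotate the URL of the fully bound triple pattern fragment (illustrative URL shape)."""
    url = base + "?" + urlencode({"subject": s, "predicate": p, "object": o})
    return [(s, p, o, ""), (url, EXPIRES, t, "")]


args = ("radio:bbc-radio-1", "m:plays", "radio:jauz-netsky-higher",
        '"2016-05-30T09:20:00"^^xsd:dateTime')
for method in (reification, singleton_property, graph):
    print(method.__name__, len(method(*args)))
print(implicit_graph(*args, base="http://example.org/fragments")[1][0])
```

The quad counts make the slide's remarks concrete: reification needs the most triples per annotated fact, while graphs and implicit graphs add a single annotation triple.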

Page 10

Time labeling types

Time interval: start and end time of validity; good for maintaining a history of elements

Expiration time: end time of validity; when only the latest version is required
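The practical difference between the two labeling types can be sketched in Python: intervals keep older versions meaningful, so a past state can still be looked up, whereas an expiration time only says how long the single latest version stays valid. Treating intervals as half-open is an assumption made here so that consecutive intervals do not overlap; `radio:next-song` and its times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """Time interval labeling: start and end time of validity."""
    initial: datetime
    final: datetime

    def valid_at(self, t: datetime) -> bool:
        # Assumed half-open [initial, final) so consecutive intervals do not overlap.
        return self.initial <= t < self.final


def value_at(history: List[Tuple[str, Interval]], t: datetime) -> Optional[str]:
    """With intervals, old versions stay meaningful, so past states can be queried."""
    for value, interval in history:
        if interval.valid_at(t):
            return value
    return None


def latest_if_valid(latest: Tuple[str, datetime], now: datetime) -> Optional[str]:
    """With expiration times only the latest version is kept; valid until it expires."""
    value, expires = latest
    return value if now < expires else None


def at(hour: int, minute: int) -> datetime:
    return datetime(2016, 5, 30, hour, minute)


# The first entry follows the slides' example; the second is hypothetical.
history = [
    ("radio:jauz-netsky-higher", Interval(at(9, 15), at(9, 20))),
    ("radio:next-song", Interval(at(9, 20), at(9, 24))),
]
print(value_at(history, at(9, 17)))                                # radio:jauz-netsky-higher
print(value_at(history, at(9, 21)))                                # radio:next-song
print(latest_if_valid(("radio:next-song", at(9, 24)), at(9, 25)))  # None
```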

Page 11

Dynamic data example

radio:bbc-radio-1 m:plays radio:jauz-netsky-higher.

GRAPH _:g1 {
    radio:bbc-radio-1 m:plays radio:jauz-netsky-higher.
}
_:g1 tmp:interval _:interval_1.
_:interval_1 tmp:initial "2016-05-30T09:15:00"^^xsd:dateTime.
_:interval_1 tmp:final "2016-05-30T09:20:00"^^xsd:dateTime.

Graph-annotation: [ 9:15, 9:20 ]

Page 12

Overview

Dynamic data representation

Query streamer engine

Evaluation

Page 13

Query streamer engine

Page 14

Overview

Dynamic data representation

Query streamer engine

Evaluation

Page 15

Evaluating annotation methods

Measure query execution times throughout the query's duration

Query: “All trains with their delay in station X within the next hour”

Frequency: 10 seconds

Clients: 1

Engine: Query streamer

Annotation methods: singleton property, graph, implicit graph

Time labeling types: time interval, expiration time

Page 16

Evaluating annotation methods

[Figure panels: time interval | expiration time]

Page 17

Evaluating scalability

Measure server CPU usage for an increasing number of clients

Query: “All trains with their delay in station X within the next hour”

Frequency: 10 seconds

Clients: 1 → 200

Engines: Query streamer, C-SPARQL (Barbieri 2012) and CQELS (Le-Phuoc 2011)

Annotation method: graph

Time labeling type: expiration time

Page 18

Query Streamer has better scalability

Page 19

Query Streamer moves load from server to client

Page 20

Overview

Dynamic data representation

Annotate dynamic data with time

Query streamer engine

Client-side query engine

Dynamic data at TPF server

Evaluation

Annotation methods

Scalability
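The client-side engine over time-annotated TPF data can be pictured as a loop that re-evaluates the query exactly when the dynamic data behind the current results expires. The Python below is a simulation of that general idea under stated assumptions, not the query streamer's implementation: `evaluate` stands in for a client-side evaluation against a TPF server, the playlist is hypothetical, and instead of sleeping the loop jumps the clock to the next expiration.

```python
from datetime import datetime
from typing import Callable, List, Tuple

# A result row together with the end of validity of the dynamic data it came from.
Row = Tuple[str, datetime]


def stream_query(evaluate: Callable[[datetime], List[Row]],
                 start: datetime, until: datetime) -> List[Tuple[datetime, List[str]]]:
    """Re-evaluate a query whenever the dynamic data behind its results expires.

    `evaluate(now)` stands in for a client-side evaluation against a TPF server at
    time `now`. Rather than sleeping, this simulation jumps to the next expiration.
    """
    updates: List[Tuple[datetime, List[str]]] = []
    now = start
    while now < until:
        rows = evaluate(now)
        updates.append((now, [value for value, _ in rows]))
        upcoming = [expires for _, expires in rows if expires > now]
        if not upcoming:
            break  # no dynamic data left that could change the result
        now = min(upcoming)
    return updates


def at(hour: int, minute: int) -> datetime:
    return datetime(2016, 5, 30, hour, minute)


# Hypothetical playlist; each entry is (song, valid from, valid until).
PLAYLIST = [("radio:jauz-netsky-higher", at(9, 15), at(9, 20)),
            ("radio:next-song", at(9, 20), at(9, 24))]


def now_playing(now: datetime) -> List[Row]:
    return [(song, end) for song, begin, end in PLAYLIST if begin <= now < end]


for moment, songs in stream_query(now_playing, at(9, 15), at(9, 30)):
    print(moment.strftime("%H:%M"), songs)
```

When a result depends on no dynamic data, this sketch simply stops; a real client would need some fallback such as periodic polling, which is omitted here.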

Page 21

Conclusions

Further evaluation: different query types, …?

Solve the efficiency problem of time intervals?

Promising approach for improved scalability