

Evaluating Conjunctive Triple Pattern Queries over Large Structured

Overlay Networks

Erietta Liarou, Stratos Idreos, and Manolis Koubarakis 2006.

Waled Almukawi.

Albert-Ludwigs-Universität Freiburg

Rechnernetze und Telematik

Outline

Motivation
Terminologies
    Conjunctive Queries
    RDF
    DHT
System Model & Data Model
Query Chain Algorithm
Spread By Value Algorithm
Experiments
Future Work
References

Motivation

One of the most challenging and interesting open problems at the frontier of P2P and the Semantic Web is how to evaluate queries expressed in RDF query languages on top of P2P networks.

Previous work (RDFPeers) handled only atomic or disjunctive RDF triple patterns.

Conjunctive queries are interesting because they offer a flexible way to deal effectively with the functionality of existing RDF query languages.

Conjunctive Queries

Conjunctive queries are the fragment of first-order logic built from atomic formulas using conjunction (∧).

A conjunctive query q is a formula of the form:

?x1, ..., ?xn : (s1, p1, o1) ∧ (s2, p2, o2) ∧ ... ∧ (sn, pn, on)

where ?x1, ..., ?xn are the answer variables, each (si, pi, oi) is a triple pattern, and each variable ?xi appears in at least one triple pattern (si, pi, oi).

Conjunctive Queries (2)

Example:

Q:  ans(x, y) :- country(Id, x), capital(y, x), continent(x, "Europa")

Q': ans(x, y) :- (?x, "type", "country"), (?x, "has-capital", ?y), (?x, "continent", "Europa").
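To make the triple pattern form Q' concrete, the following is a minimal sketch of how such a conjunctive query could be evaluated against a small in-memory set of triples. The sample triples and the names is_var, match, and evaluate are assumptions made for illustration; they do not come from the slides or the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical sample data (not from the slides).
TRIPLES: List[Triple] = [
    ("greece", "type", "country"),
    ("greece", "has-capital", "athens"),
    ("greece", "continent", "Europa"),
    ("japan", "type", "country"),
    ("japan", "has-capital", "tokyo"),
    ("japan", "continent", "Asia"),
]

# Q' from the slide, written as a list of triple patterns.
QUERY: List[Triple] = [
    ("?x", "type", "country"),
    ("?x", "has-capital", "?y"),
    ("?x", "continent", "Europa"),
]


def is_var(term: str) -> bool:
    """Variables are written with a leading question mark."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(answer_vars: Sequence[str], patterns: Sequence[Triple],
             triples: Sequence[Triple]) -> List[Tuple[str, ...]]:
    """Evaluate the conjunction of `patterns`, one pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted({tuple(b[v] for v in answer_vars) for b in bindings})


print(evaluate(["?x", "?y"], QUERY, TRIPLES))  # [('greece', 'athens')]
```

Each pattern narrows the set of candidate bindings, which is the same incremental idea that the distributed algorithms later in the talk apply across several nodes.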


Resource Description Framework (RDF)

RDF is a framework for describing Web resources, for example their title, author, and content.

An RDF statement is the combination of a resource, a property, and a property value, known as the subject, predicate, and object of the statement.

[Figure: an RDF statement drawn as Subject, Predicate, Object. Subject: URIs or blank nodes. Object: URIs, literals, or blank nodes.]

Distributed Hash Tables (DHT)

DHTs are an important class of P2P networks that offer distributed hash table functionality and allow one to develop scalable, robust, and fault-tolerant distributed applications.

They distribute data over a large P2P network.

Any given item can be found quickly: a lookup in the DHT contacts O(log N) nodes, where N is the number of nodes in the network.

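System Model (ref: [4])

[Figure: Chord identifier ring with nodes N2, N6, N11, N17, N27, N33, N42, N48, N50, N56, N63, showing the finger tables of N42, N11, and N27 and a lookup for key 28.]

Finger table of N42

distance   key   succ.
N42+0      42    N42
N42+1      43    N48
N42+2      44    N48
N42+4      46    N48
N42+8      50    N50
N42+16     58    N63
N42+32     10    N11

Finger table of N11

distance   key   succ.
N11+0      11    N11
N11+1      12    N17
N11+2      13    N17
N11+4      15    N17
N11+8      19    N27
N11+16     27    N27
N11+32     43    N48

Finger table of N27

distance   key   succ.
N27+0      27    N27
N27+1      28    N33
N27+2      29    N33
N27+4      31    N33
N27+8      35    N42
N27+16     43    N48
N27+32     59    N63

Lookup of key 28, starting at N42:

N42 -> N11 (finger N42+32, key 10) -> N27 (finger N11+16, key 27) -> N33 (finger N27+1, key 28)

N33 is the node responsible for key 28.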
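The finger tables and the lookup of key 28 in the figure can be reproduced with a short simulation. This is a minimal sketch of Chord-style routing [4] that assumes a 6-bit identifier ring and a global view of all node identifiers (taken from the figure); a real node only knows its own fingers, and the function names are illustrative.

```python
from typing import List

M = 6                      # identifier bits, so the ring has 2**6 = 64 positions
RING = 2 ** M
NODES = sorted([2, 6, 11, 17, 27, 33, 42, 48, 50, 56, 63])  # from the figure


def successor(key: int) -> int:
    """First node whose identifier is >= key, wrapping around the ring."""
    key %= RING
    for node in NODES:
        if node >= key:
            return node
    return NODES[0]


def finger_table(node: int) -> List[int]:
    """Successors of node + 2**i for i = 0 .. M-1."""
    return [successor(node + 2 ** i) for i in range(M)]


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def lookup(start: int, key: int) -> List[int]:
    """Return the nodes visited while locating the node responsible for key."""
    path = [start]
    node = start
    while True:
        fingers = finger_table(node)
        succ = fingers[0]
        if in_half_open(key, node, succ):
            path.append(succ)          # succ is responsible for key
            return path
        # Forward to the closest finger that precedes the key.
        next_node = succ
        for finger in reversed(fingers):
            if in_open(finger, node, key):
                next_node = finger
                break
        path.append(next_node)
        node = next_node


print(finger_table(42))   # [48, 48, 48, 50, 63, 11]
print(lookup(42, 28))     # [42, 11, 27, 33]
```

Each hop at least halves the remaining distance to the key, which is where the O(log N) bound on contacted nodes comes from.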

Data Model

Each network node can describe in RDF the resources that it wants to make available to the rest of the network.

It creates and inserts RDF triples of the form (subject, predicate, object):

The subject of a triple identifies the resource that the statement is about.
The predicate identifies a property or a characteristic of the subject.
The object identifies the value of the property.

Query Chain Algorithm

The main idea of QC is that the query is evaluated by a chain of nodes.

Intermediate results flow through the nodes of this chain.

Finally, the last node in the chain delivers the result back to the node that submitted the query.

The query processing load is distributed as follows:

Split the query into the triple patterns that it consists of.
Evaluate each triple pattern at a different node.
Intermediate results, which partially satisfy the query, flow through these nodes.

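Query Chain Algorithm: indexing a new triple

[Figure: Chord ring with nodes N2, N6, N11, N27, N33, N42, N48, N50, N56, N63. Node x indexes a new triple (S, P, O) under three keys, T1 = Hash(S), T2 = Hash(P), and T3 = Hash(O), which map to the nodes r1, r2, and r3.]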
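The indexing step in the figure, one copy of the triple under each of Hash(S), Hash(P), and Hash(O), can be sketched as follows. This is an illustration under stated assumptions: the 6-bit ring, the use of SHA-1 reduced modulo 64, and the names chord_hash, successor, and index_triple are all hypothetical choices, and the node identifiers are the ones shown on the ring.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Triple = Tuple[str, str, str]

M = 6                                                # assumed toy ring size
NODES = [2, 6, 11, 27, 33, 42, 48, 50, 56, 63]       # sorted node identifiers


def chord_hash(value: str) -> int:
    """Map a string to a ring identifier (assumption: SHA-1 modulo 2**M)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(key: int) -> int:
    """First node with identifier >= key, wrapping around the ring."""
    index = bisect_left(NODES, key)
    return NODES[index % len(NODES)]


def index_triple(triple: Triple,
                 tables: DefaultDict[int, Set[Triple]]) -> List[int]:
    """Store the triple at the nodes responsible for its s, p, and o values."""
    responsible = []
    for value in triple:                  # subject, predicate, object
        node = successor(chord_hash(value))
        tables[node].add(triple)          # the node's local triple table
        responsible.append(node)
    return responsible


tables: DefaultDict[int, Set[Triple]] = defaultdict(set)
nodes = index_triple(("s1", "p1", "o1"), tables)
print(nodes)  # three node identifiers; the values depend on the hash
```

Storing the triple three times is what later lets a node answer a triple pattern from whichever constant the pattern contains.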

Query Chain Algorithm

[Figure: Chord ring with nodes N2, N6, N11, N27, N33, N42, N48, N50, N56, N63 and node x. The keys T1 = Hash(p1), T2 = Hash(p2), and T3 = Hash(o3) map to the nodes r1, r2, and r3. Caption: Indexing a new triple.]

Query Chain Algorithm: evaluating a query

q is ?x : (?x, p1, ?y), (?y, p2, ?z), (?z, p3, o3).

[Figure: the chain x, r1, r2, r3 for q, with messages of the form QEval(q, i, R', ip(x)).]

1) Node x sends the query to r1. r1 searches its triple table (TT) and evaluates t1 = (?x, p1, ?y). It finds the triple (s1, p1, s2), so L = {s1, s2} and R' = {s1, s2}. r1 sends QEval(q, 2, R', ip(x)) to r2.
2) r2 searches its TT and evaluates t2 = (?y, p2, ?z). It finds the triple (s2, p2, s3), so L = {s2, s3} and R' = {s1, s3}. r2 sends QEval(q, 3, R', ip(x)) to r3.
3) r3 searches its TT and evaluates t3 = (?z, p3, o3). It finds the triple (s3, p3, o3), so L = {s3} and R' = {s1}.
4) r3 delivers the answer to x with Answer(q, R').

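Query Chain Algorithm: local processing at each chain node

Node n receives QEval(q, i, R, ip(x)) and computes:

L = π_X(σ_F(TT))
R' = π_Y(R ⋈ L)

e.g. for q1 = (?s1, p1, o1): L = π_1(σ_{2=p1 ∧ 3=o1}(TT))

n then sends QEval(q, i+1, R', ip(x)) to the next node n'.

Every chain node therefore computes R' = π_Y(R ⋈ π_X(σ_F(TT))). The last node n_k, reached by QEval(q, k, R', ip(x)), does not forward the query but returns Answer(q, R').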
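The local step R' = π_Y(R ⋈ π_X(σ_F(TT))) can be traced on the three-node example from the preceding slides. The sketch below is an illustration, not the authors' implementation: relations are represented as lists of variable-to-value dictionaries, the chain starts from a relation holding one empty row, and Y is assumed to be the answer variables plus the variables still needed by later triple patterns. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def select_project(pattern: Triple, tt: Sequence[Triple]) -> List[Row]:
    """L = pi_X(sigma_F(TT)): keep matching triples, project onto variables."""
    rows: List[Row] = []
    for triple in tt:
        row: Row = {}
        ok = True
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                if row.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            rows.append(row)
    return rows


def natural_join(r: List[Row], l: List[Row]) -> List[Row]:
    """R join L on the variables the two relations share."""
    return [{**a, **b} for a in r for b in l
            if all(a[v] == b[v] for v in a.keys() & b.keys())]


def project(rows: List[Row], keep: set) -> List[Row]:
    """pi_Y: keep only the variables in `keep`, dropping duplicate rows."""
    result: List[Row] = []
    for row in rows:
        projected = {v: row[v] for v in row if v in keep}
        if projected not in result:
            result.append(projected)
    return result


def q_eval(query: Sequence[Triple], answer_vars: Sequence[str], i: int,
           r: List[Row], tt: Sequence[Triple]) -> List[Row]:
    """Local processing at chain node i (0-based): returns R'."""
    joined = natural_join(r, select_project(query[i], tt))
    needed = set(answer_vars)
    for pattern in query[i + 1:]:
        needed |= {t for t in pattern if is_var(t)}
    return project(joined, needed)


def run_chain(query, answer_vars, tables) -> List[List[Row]]:
    """Pass R' along the chain and record it after every node."""
    r: List[Row] = [{}]
    history = []
    for i, tt in enumerate(tables):
        r = q_eval(query, answer_vars, i, r, tt)
        history.append(r)
    return history


QUERY = [("?x", "p1", "?y"), ("?y", "p2", "?z"), ("?z", "p3", "o3")]
TABLES = [[("s1", "p1", "s2")], [("s2", "p2", "s3")], [("s3", "p3", "o3")]]
for step in run_chain(QUERY, ["?x"], TABLES):
    print(step)
```

With these tables the three intermediate relations hold the values {s1, s2}, {s1, s3}, and {s1}, which matches the R' sets shown on the evaluation slides.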

Spread By Value Algorithm

In SBV, each triple t = (s, p, o) is stored at the successor nodes of the identifiers Hash(s), Hash(p), Hash(o), Hash(s + p), Hash(s + o), Hash(p + o), and Hash(s + p + o), where + denotes concatenation.

While processing the query incrementally, SBV exploits the values of the matching triples found so far.
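The seven index identifiers of a triple can be enumerated directly. The sketch below assumes that + is plain string concatenation and reuses a toy 6-bit ring with SHA-1 reduced modulo 64; the node identifiers and the names sbv_index_keys and sbv_index_nodes are illustrative assumptions.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

M = 6                                                # assumed toy ring size
NODES = [2, 6, 11, 27, 33, 42, 48, 50, 56, 63]       # sorted node identifiers


def chord_hash(value: str) -> int:
    """Map a string to a ring identifier (assumption: SHA-1 modulo 2**M)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(key: int) -> int:
    """First node with identifier >= key, wrapping around the ring."""
    index = bisect_left(NODES, key)
    return NODES[index % len(NODES)]


def sbv_index_keys(triple: Triple) -> List[str]:
    """The seven strings that are hashed when a triple is indexed in SBV."""
    s, p, o = triple
    return [s, p, o, s + p, s + o, p + o, s + p + o]


def sbv_index_nodes(triple: Triple) -> Dict[str, int]:
    """Map each index key to the node that stores a copy of the triple."""
    return {key: successor(chord_hash(key)) for key in sbv_index_keys(triple)}


print(sbv_index_keys(("s1", "p1", "o1")))
# ['s1', 'p1', 'o1', 's1p1', 's1o1', 'p1o1', 's1p1o1']
```

Compared with the three keys used by QC, the extra combinations let a triple pattern with two or three constants be routed to a single, more specific node.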

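SBV Algorithm

[Figure: Chord ring with nodes N2, N6, N11, N27, N33, N42, N48, N50, N56, N63 and node x. The keys T1 = Hash(p1), T2 = Hash(s2+p2), T3 = Hash(s3+o3), and T4 = Hash(p1) map to nodes on the ring, among them r1, r2, and r3. Caption: Indexing a new triple.]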

SBV Algorithm: evaluating a query

q is ?x : (?x, p1, ?y), (?y, p2, ?z), (?z, p3, o3).

1) Node x sends QEval(q, V, u, IP(x)) to r1, and r1 evaluates (?x, p1, ?y).
2) r1 searches its TT and finds t1 = (s1, p1, s2) and t4 = (s1, p1, o1). This gives two valuations and two rewritten queries:
   v'a = {?x=s1, ?y=s2}, q'a = (s2, p2, ?z), (?z, p3, o3)
   v'b = {?x=s1, ?y=o1}, q'b = (o1, p2, ?z), (?z, p3, o3)
   r1 sends QEval(q'a, V, v'a, ip(x)) to r2 and QEval(q'b, V, v'b, ip(x)) to r4 (figure label: S(H(p2+o1))).
3) Each node searches its TT for matching triples. r2 finds (s2, p2, s3), which gives v''a = {?x=s1, ?y=s2, ?z=s3} and q''a = (s3, p3, o3), and sends QEval(q''a, V, v''a, ip(x)) to r3.
4) r3 finds (s3, p3, o3), which gives v'''a = {?x=s1, ?y=s2, ?z=s3}, and delivers the answer to IP(x) with Answer(q, V).
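The rewriting that r1 performs in this example can be sketched as a single local step: match the first triple pattern against the local triple table, extend the valuation, and substitute the new values into the remaining patterns. This is a minimal sketch with hypothetical names (sbv_step, routing_key); forming the next routing key by concatenating the constants of the next pattern is an assumption that is consistent with keys such as Hash(p1) and Hash(s2+p2) in the figures.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def sbv_step(query: Sequence[Triple], valuation: Valuation,
             tt: Sequence[Triple]) -> List[Tuple[List[Triple], Valuation]]:
    """Evaluate the first pattern locally; return one (rewritten query,
    extended valuation) pair per matching triple."""
    first, rest = query[0], query[1:]
    results = []
    for triple in tt:
        new_bindings: Valuation = {}
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                if new_bindings.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if not ok:
            continue
        # Substitute the values just found into the remaining patterns.
        rewritten = [tuple(new_bindings.get(t, t) for t in pattern)
                     for pattern in rest]
        results.append((rewritten, {**valuation, **new_bindings}))
    return results


def routing_key(pattern: Triple) -> str:
    """Assumed routing key: the constants of the pattern, concatenated."""
    return "".join(term for term in pattern if not is_var(term))


QUERY = [("?x", "p1", "?y"), ("?y", "p2", "?z"), ("?z", "p3", "o3")]
TT_R1 = [("s1", "p1", "s2"), ("s1", "p1", "o1")]   # t1 and t4 from the slide

for rewritten, valuation in sbv_step(QUERY, {}, TT_R1):
    print(valuation, rewritten, "-> key", routing_key(rewritten[0]))
```

Because every matching triple produces its own rewritten query, SBV fans out into several chains, whereas QC keeps a single chain and ships the intermediate relation along it.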

Comparing the query chains in QC and SBV

q is ?x : (s1, p1, ?x), (?x, p2, ?y), (?y, p3, o3).

?x has four matching values for (s1, p1, ?x).

?y has three matching values for each value of ?x.

[Figure: the query chains created for q by QC and by SBV.]
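The difference between the two chain shapes for this example can be worked out with a little arithmetic. The sketch below assumes that QC visits one node per triple pattern, and that SBV forwards one rewritten query per matching value, so the SBV figure is an upper bound (several rewritten queries may land on the same node). The function name chain_sizes is hypothetical.

```python
from typing import Sequence, Tuple


def chain_sizes(fanouts: Sequence[int]) -> Tuple[int, int]:
    """fanouts[i] is the number of matches each query finds at step i.

    Returns (nodes in the single QC chain,
             upper bound on nodes reached by the SBV chains).
    """
    qc_nodes = len(fanouts) + 1       # one node per triple pattern
    sbv_nodes = 1                     # node that evaluates the first pattern
    width = 1
    for fanout in fanouts:
        width *= fanout               # rewritten queries at the next level
        sbv_nodes += width
    return qc_nodes, sbv_nodes


# Four values for ?x, then three values of ?y for each value of ?x.
print(chain_sizes([4, 3]))  # (3, 17)
```

Under these assumptions QC uses a chain of 3 nodes, while SBV spreads the work over up to 1 + 4 + 12 = 17 nodes.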

Optimizing network traffic

Both algorithms use an IP cache (IPC) to significantly reduce network traffic.

Another effect of the IPC is that it reduces the routing load incurred by the nodes in the network.

Experiments

A simulator for a Chord network was implemented in Java.

Our metrics are:

1) The amount of network traffic that is created.
2) How well the query processing load and the storage load are distributed among the network nodes.

Experiments

The results of the experimental evaluation of the two algorithms are shown in the following figures.

[Figure: (a) Traffic cost, (b) IPC effect]

Experiments: query processing and storage load distribution

Future work

Choosing the order in which the triple patterns of a query are evaluated.

RDFPeers reasoning.

References

1) E. Liarou, S. Idreos, and M. Koubarakis. Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks. In I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. Aroyo.

2) E. Liarou, S. Idreos, and M. Koubarakis. Publish-Subscribe with RDF Data over Large Structured Overlay Networks. In DBISP2P '05.

3) Foundations of Databases. Addison-Wesley, 1995. ISBN 0-201-53771-0.

4) I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications.

5) E. Prud'hommeaux and A. Seaborne. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/, 2005.

Thank you for your Attention.
