Distributed Query-Sub-Query Presented by Noam Pettel 29/5/05

Motivation

Optimization of query evaluation in a peer-to-peer environment

Development of a distributed algorithm based on Query-Sub-Query technique for optimization of Datalog queries in a peer-to-peer environment

Implementation of the algorithm using the Active XML system

Outline

Datalog

Query-Sub-Query (QSQ)

Distributed Query-Sub-Query (dQSQ)

Implementation using AXML

Using dQSQ for Petri Nets

Example

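Input: the parent(x,y) relation

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

We are interested in the ancestor(x,y) relation

Typical query: “Give me all the ancestors of Andy”

[Figure: family tree. Alice is the parent of Joyce and Nancy; Joyce is the parent of Ruth and Lois; Lois is the parent of Andy and Mark.]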

Relational Database

A database composed of relations (tables)

Stores only explicit information

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

anc(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Alice  Lois
Alice  Mark
Alice  Andy
Alice  Ruth
Joyce  Mark
Joyce  Andy

Deductive Database

Explicit information

Rules that enable inferences based on the stored data

Datalog program (the second rule is recursive):

anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

Each rule has the form head ← body:

∀x,y (anc(x,y) ← parent(x,y))
∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
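To make the inference step concrete, here is a minimal Python sketch of a naive bottom-up evaluation of the two anc rules over the parent facts from the example. The function name `ancestors` and the set-of-pairs encoding are assumptions made for illustration; the slides do not prescribe an evaluation strategy at this point.

```python
# Facts of the parent(x, y) relation from the example slides.
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def ancestors(parent):
    """Naive bottom-up evaluation of the two anc rules until a fixpoint."""
    anc = set(parent)  # anc(x,y) :- parent(x,y)
    while True:
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived = {(x, y) for (x, z) in anc for (z2, y) in parent if z == z2}
        if derived <= anc:  # nothing new was derived: fixpoint reached
            return anc
        anc |= derived


ANC = ancestors(PARENT)
ANCESTORS_OF_ANDY = {x for (x, y) in ANC if y == "Andy"}
```

Note that this evaluation materializes the whole anc relation, including tuples that a query about a single person never needs.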

Query Evaluation

Query: q(y) :- anc(“Joyce”,y)

Goal: Compute the query with minimal data materialization

QSQ

Known technique for optimization of Datalog queries: Query-Sub-Query (QSQ)

QSQ rewrites the Datalog program according to the given query

QSQ is based on two main notions:
• Binding patterns
• Supplementary relations

Binding Patterns

For each relation, we consider adorned versions of the relation, based on which of its arguments are bound (b) and which are free (f)

For example, the adorned versions of anc are: anc_bb, anc_bf, anc_fb, anc_ff

anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
q(y) :- anc(“Joyce”,y)

Binding Patterns

anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(“Joyce”,y)

In anc_bf(“Joyce”,y), the first argument is bound to a constant (b) and the second is free (f)

The same relation may appear with different adornments in the Datalog program

Different adornments of the same relation are treated as different relations during the QSQ computation
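As a small illustration of how an adornment can be derived, the Python sketch below computes the binding pattern of one atom. The helper name `adornment` and the convention that constants are written in double quotes are assumptions made for this example only.

```python
def adornment(args, bound_vars=frozenset()):
    """Return the binding pattern of an atom as a string of 'b' and 'f'.

    An argument is bound ('b') if it is a constant (written in double
    quotes in this sketch) or a variable that is already bound; it is
    free ('f') otherwise.
    """
    return "".join(
        "b" if arg.startswith('"') or arg in bound_vars else "f"
        for arg in args
    )


# q(y) :- anc("Joyce", y): the constant binds the first argument.
QUERY_PATTERN = adornment(['"Joyce"', "y"])
# Body atom anc(x, z) when x is bound by the head of the rule.
BODY_PATTERN = adornment(["x", "z"], bound_vars={"x"})
# No binding information at all.
FREE_PATTERN = adornment(["x", "y"])
```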

Supplementary Relations

anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(x) :- anc_bf(“Joyce”,x)

For each adorned relation and each position in the body of a rule, we define a supplementary relation to accumulate the bindings relevant to that position

First rule: sup_10(x), sup_11(x,y)
Second rule: sup_20(x), sup_21(x,z), sup_22(x,y)

QSQ rewriting of the program:

sup_10(x) :- in_anc_bf(x)
sup_11(x,y) :- sup_10(x), parent(x,y)
anc_bf(x,y) :- sup_11(x,y)

sup_20(x) :- in_anc_bf(x)
sup_21(x,z) :- sup_20(x), anc_bf(x,z)
sup_22(x,y) :- sup_21(x,z), parent(z,y)
anc_bf(x,y) :- sup_22(x,y)
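The rewritten rules can be tried out with a short Python sketch. It evaluates them in simple rounds until nothing changes, which is an assumption made for brevity and not necessarily how a QSQ engine schedules its work. The function name `qsq_anc_bf` is hypothetical. Because the recursive call to anc_bf reuses the same bound variable x, no new input bindings are generated in this particular program.

```python
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def qsq_anc_bf(parent, bound_values):
    """Evaluate the rewritten rules round by round until a fixpoint."""
    in_anc_bf = set(bound_values)  # input bindings, e.g. {"Joyce"}
    anc_bf = set()
    while True:
        # Rewriting of the first rule.
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for x in sup_10 for (p, y) in parent if p == x}
        # Rewriting of the second rule.
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for (x, z) in anc_bf if x in sup_20}
        sup_22 = {(x, y) for (x, z) in sup_21 for (p, y) in parent if p == z}
        new_anc_bf = anc_bf | sup_11 | sup_22
        if new_anc_bf == anc_bf:  # no new tuples: fixpoint reached
            return anc_bf
        anc_bf = new_anc_bf


TUPLES = qsq_anc_bf(PARENT, {"Joyce"})
ANSWER = {y for (_, y) in TUPLES}
```

Only tuples whose first component is Joyce are ever materialized; nothing about Alice is computed.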

QSQ Example

Query: q(y) :- anc_bf(“Joyce”,y)

anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)

parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth

Contents of the supplementary relations:

sup_10(x): Joyce
sup_11(x,y): (Joyce, Lois), (Joyce, Ruth)
sup_20(x): Joyce
sup_21(x,z): (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
sup_22(x,y): (Joyce, Mark), (Joyce, Andy)

Query result: Lois, Ruth, Mark, Andy

Properties of QSQ

Compute the correct answer to the query

Materialize only a minimal set of tuples

Guaranteed to terminate

QSQ evaluations have nice properties!

Distributed Environment

Centralized Datalog program:

r1  r(x,y) :- a(x,y)
r2  r(x,y) :- s(x,z), t(z,y)
r3  s(x,y) :- r(x,y), b(y,z)
r4  t(x,y) :- c(x,y)

Distribution of the program between 3 peers:

R, hosting r, a
S, hosting s, b
T, hosting t, c

r1  r@R(x,y) :- a@R(x,y)
r2  r@R(x,y) :- s@S(x,z), t@T(z,y)
r3  s@S(x,y) :- r@R(x,y), b@S(y,z)
r4  t@T(x,y) :- c@T(x,y)

The rules at peer P are the rules where P is the peer of the head
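The placement rule can be sketched in a few lines of Python. The tuple encoding of rules and the function name `rules_by_peer` are assumptions made for illustration; only the relation and peer names come from the slide.

```python
from collections import defaultdict

# Each rule is (name, head, body); an atom is a (relation, peer) pair.
RULES = [
    ("r1", ("r", "R"), [("a", "R")]),
    ("r2", ("r", "R"), [("s", "S"), ("t", "T")]),
    ("r3", ("s", "S"), [("r", "R"), ("b", "S")]),
    ("r4", ("t", "T"), [("c", "T")]),
]


def rules_by_peer(rules):
    """Assign every rule to the peer that hosts the relation in its head."""
    placement = defaultdict(list)
    for name, (_, head_peer), _ in rules:
        placement[head_peer].append(name)
    return dict(placement)


PLACEMENT = rules_by_peer(RULES)
```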

Naïve Distributed Evaluation

Activation of remote relations

r2  r@R(x,y) :- s@S(x,z), t@T(z,y)

[Figure: R sends a request to S and to T, and each of them sends back a response]

AXML and Web Services make it very easy!

Termination Detection

We need to detect when the system reaches a fixpoint

Fixpoint is reached when no new facts can be derived at any peer

Termination detection is a standard problem in distributed computing

Termination Detection

The model:

Communication is asynchronous

Each message eventually arrives and is acknowledged

At some point, the site that started the query decides to check for termination

It calls all the sites that it directly invoked and asks them whether they have completed

These sites contact the sites they invoked, and so on…

Termination Detection

A site answers positively if:
• It is idle (cannot produce more data)
• All the data it has sent has been acknowledged
• All its successors believe the computation has terminated
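The three conditions can be expressed as a recursive check. The Python sketch below is an illustration under simplifying assumptions: the invoked sites form a tree (no cycles), the state of each site is read in one step, and the `Site` class with its fields is hypothetical. A real protocol also has to handle a site becoming active again after it has answered.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    idle: bool = True            # cannot produce more data
    unacked: int = 0             # messages sent but not yet acknowledged
    successors: list = field(default_factory=list)  # sites it invoked


def believes_terminated(site):
    """A site answers positively only if all three conditions hold."""
    return (
        site.idle
        and site.unacked == 0
        and all(believes_terminated(s) for s in site.successors)
    )


# R invoked S and T; T still waits for one acknowledgement.
s, t = Site("S"), Site("T", unacked=1)
r = Site("R", successors=[s, t])
BEFORE_ACK = believes_terminated(r)
t.unacked = 0  # the outstanding acknowledgement arrives
AFTER_ACK = believes_terminated(r)
```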

Termination Detection

r1  r@R(x,y) :- a@R(x,y)
r2  r@R(x,y) :- s@S(x,z), t@T(z,y)
r3  s@S(x,y) :- r@R(x,y), b@S(y,z)
r4  t@T(x,y) :- c@T(x,y)

[Figure: graph of the program over the relations r, a, s, b, t, c]

Build a graph to represent the distributed Datalog program

Recursions result in cycles in the graph

Use a spanning tree of the graph in order to decide termination
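One possible way to build such a graph and extract a spanning tree is sketched below in Python. The edge direction (from the head relation of a rule to each relation in its body) and the use of breadth-first search are assumptions made for this example; the slides do not say how the tree is chosen.

```python
from collections import deque

# Edges go from the head relation of each rule to its body relations.
EDGES = {
    "r": ["a", "s", "t"],  # rules r1 and r2
    "s": ["r", "b"],       # rule r3
    "t": ["c"],            # rule r4
    "a": [], "b": [], "c": [],
}


def spanning_tree(edges, root):
    """Breadth-first spanning tree rooted at `root`.

    Returns a map from each reached node to its tree parent, plus the
    edges that were left out of the tree.
    """
    parent = {root: None}
    non_tree = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt in parent:
                non_tree.append((node, nxt))  # already reached another way
            else:
                parent[nxt] = node
                queue.append(nxt)
    return parent, non_tree


TREE, NON_TREE = spanning_tree(EDGES, "r")
```

In this example the only edge left out of the tree is the one from s back to r, which closes the cycle created by the recursion between r and s.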

Distributed QSQ Rewriting

For each rule: The peer in the head of the rule starts the rewriting

When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation
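A rough Python sketch of this delegation follows. It assumes one reading of the scheme: sup_0 stays at the peer of the head, and the supplementary relation after the i-th body atom lives at the peer hosting that atom. The function names and the pair encoding of atoms are hypothetical.

```python
def place_supplementary(head, body):
    """Return [(sup_name, peer)] for one rule.

    Atoms are (relation, peer) pairs. sup_0 is kept at the head peer;
    sup_i is placed at the peer of the i-th body atom.
    """
    _, head_peer = head
    placement = [("sup_0", head_peer)]
    for i, (_, peer) in enumerate(body, start=1):
        placement.append((f"sup_{i}", peer))
    return placement


def delegations(head, body):
    """List the hops where the remainder of the rule changes peers."""
    peers = [peer for _, peer in place_supplementary(head, body)]
    return [(a, b) for a, b in zip(peers, peers[1:]) if a != b]


# r@R(x,y) :- s@S(x,z), t@T(z,y)
R2_PLACEMENT = place_supplementary(("r", "R"), [("s", "S"), ("t", "T")])
R2_HOPS = delegations(("r", "R"), [("s", "S"), ("t", "T")])
# r@R(x,y) :- a@R(x,y) needs no delegation at all.
R1_HOPS = delegations(("r", "R"), [("a", "R")])
```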

Distributed QSQ Rewriting

Centralized:

r_bf(x,y) :- s_bf(x,z), t_bf(z,y)

Supplementary relations: sup_0(x), sup_1(x,z), sup_2(x,y)

sup_0(x) :- in_r_bf(x)
sup_1(x,z) :- sup_0(x), s_bf(x,z)
sup_2(x,y) :- sup_1(x,z), t_bf(z,y)
r_bf(x,y) :- sup_2(x,y)

Distributed:

r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y)

Supplementary relations: sup_0@R(x), sup_1@S(x,z), sup_2@T(x,y)

R computes: sup_0@R(x) :- in_r_bf@R(x)

R sends to S: sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)

Distributed QSQ Rewriting

The rewriting is performed locally at each peer, without any global knowledge

Once the QSQ rewriting is complete, we start the QSQ computation process. It proceeds as in the centralized case, except that it calls remote services

Why Active XML?

AXML is a natural choice

An AXML document contains both explicit and implicit data, just like in Datalog

r@R(x,y) :- s@S(x,z), t@T(z,y)

<r>
  <t> <x>1</x> <y>2</y> </t>
  <t> <x>1</x> <y>3</y> </t>
  <sc>…

[Figure: service calls (<sc>) to the peers S and T, labeled as continuous services]

Implementation Steps

Given a distributed Datalog program and a query:
1. Transform the Datalog program to distributed QSQ
2. Transform the distributed QSQ to Active XML
3. Run!
4. Detect termination

Article

“Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!”

S. Abiteboul, Z. Abrams, S. Haar, T. Milo

PODS, June 2005

Datalog & P2P

Deductive databases were a hot topic in the late 80s

Research in this area led to beautiful results, with little industrial impact

Years later, with networks everywhere, recursive data management is becoming more essential

Datalog and QSQ become hot again!

Abstract

Diagnosis of distributed telecommunication systems

The problem can be modeled by Datalog

Can benefit from dQSQ

Petri Nets

[Figure: a Petri net, with a legend for place, marked place, transition and alarm symbol]

The marked places model the current state of the peer

A transition node is enabled iff all its parent nodes are marked

An enabled transition can fire and yield a new Petri net

If a transition fires, its alarm symbol is reported to the supervisor

For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3
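The firing rule can be illustrated with a short Python sketch. The `Transition` class and the alarm symbol "b" attached to transition i are assumptions made for this example; the transcript does not state which alarm that transition carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    alarm: str            # symbol reported to the supervisor on firing
    parents: frozenset    # places that must be marked
    children: frozenset   # places marked after firing


def enabled(transition, marking):
    """A transition is enabled iff all its parent places are marked."""
    return transition.parents <= marking


def fire(transition, marking):
    """Move the marking from the parents to the children; return the alarm."""
    if not enabled(transition, marking):
        raise ValueError(f"transition {transition.name} is not enabled")
    return (marking - transition.parents) | transition.children, transition.alarm


# Transition (i) with parent places 1 and 7 and child places 2 and 3.
T_I = Transition("i", "b", frozenset({1, 7}), frozenset({2, 3}))
NEW_MARKING, ALARM = fire(T_I, frozenset({1, 7}))
ENABLED_AFTER = enabled(T_I, NEW_MARKING)
```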

The Problem

The supervisor receives an alarm sequence (a1,p1),(a2,p2),…,(an,pn), where:
• ai is an alarm symbol
• pi is the peer that emitted the alarm

Due to asynchronous communication:
• There is no guarantee that alarms sent by different peers arrive in the order they were emitted
• We can only assume that the order of alarms is kept for each individual peer

Goal: Find an explanation for a given alarm sequence
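The ordering assumption can be stated as a simple check: an observed sequence is compatible with an emission order when both agree on the order of alarms for every individual peer. The Python sketch below illustrates only this property, not the diagnosis algorithm itself; the function names are hypothetical.

```python
def per_peer(sequence):
    """Project a sequence of (alarm, peer) pairs onto each peer, in order."""
    projection = {}
    for alarm, peer in sequence:
        projection.setdefault(peer, []).append(alarm)
    return projection


def consistent(observed, emitted):
    """True if `observed` may be received when alarms are emitted as `emitted`.

    Alarms of different peers may be interleaved arbitrarily, but the
    order of alarms of each individual peer must be preserved.
    """
    return per_peer(observed) == per_peer(emitted)


OBSERVED = [("b", "p1"), ("a", "p2"), ("c", "p1")]
# p2's alarm was emitted first but arrived second: still consistent.
REORDERED_OK = consistent(OBSERVED, [("a", "p2"), ("b", "p1"), ("c", "p1")])
# p1's own alarms are swapped: not consistent.
SWAPPED_BAD = consistent(OBSERVED, [("c", "p1"), ("b", "p1"), ("a", "p2")])
```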

Example

The set of shaded nodes in figure 2 is a diagnosis for the alarm sequence (b; p1), (a; p2), (c; p1).

From Petri Nets to dQSQ

Petri Nets can be modeled by Datalog and dQSQ

A set of relations and rules is defined at each peer

Each peer builds its own Datalog program using local information only, even if it has transitions to other peers

From Petri Nets to dQSQ

Here is a small part of the Datalog rules…

From Petri Nets to AXML

Translation steps from Petri Nets to Active XML:

Petri Net → (PNet2Datalog) → Datalog → (Datalog2QSQ) → QSQ → (QSQ2AXML) → AXML

The End