PDQ Poster


Uploaded by dbonto; posted on 09-Jul-2015.


DESCRIPTION

PDQ: Proof-driven Query Answering over Web-based Data

Abstract: The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence of web-based sources. It is: (i) constraint-aware, exploiting relationships between sources to rewrite an expensive query into a cheaper one; (ii) access-aware, abiding by any access restrictions known in the sources; and (iii) cost-aware, making use of any cost information that is available about services. PDQ takes the novel approach of generating query plans from proofs that a query is answerable. We demonstrate the use of PDQ and its effectiveness in generating low-cost plans.

TRANSCRIPT


Finding Plans from Proofs

PDQ: Proof-driven Query Answering over Web-based Data Michael Benedikt, Julien Leblay, Efthymia Tsamoura - Oxford University

Supported by EPSRC grant EP/H017690/1, Query-driven Data Acquisition from Web-based Data Sources

Project homepage: http://www.cs.ox.ac.uk/projects/pdq/

Contact: [email protected]

Example: online services for geographic information

r1: Places(id, name, type, coordinates, ...): information about places (e.g. cities, countries, continents, lakes).

r2: BelongsTo(source, target): containment between places, e.g. "China belongs to Asia".

r3: Countries(id, name, iso_code, ...): information about countries.

φ1: Places(x, y, "Country", ...) ↔ Countries(x, y, ...), i.e. the places of type Country are exactly the countries, with the same id and name.

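Query for countries in Asia. It is not answerable without considering constraints. As the accessibility axioms α1 to α3 encode, Places can only be accessed given a name and BelongsTo only given a source. Without φ1 there is therefore no way to enumerate the candidate countries. With φ1, the freely accessible Countries relation can supply them.

    SELECT p1.name
    FROM BelongsTo AS bt
    JOIN Places AS p1 ON p1.id = bt.source
    JOIN Places AS p2 ON p2.id = bt.target
    WHERE p1.type = 'Country' AND p2.name = 'Asia'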
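The following is a minimal Python sketch of how the query can still be answered once φ1 is taken into account. It follows the plan T1 to T5 that appears in the search space further down. The three access functions, their names and the rows they return are hypothetical stand-ins for the web services. Only the access patterns are taken from the accessibility axioms: Countries is free, Places is accessed by name, and BelongsTo is accessed by source.

```python
# Hypothetical stand-ins for the three web services; the rows are made up.
PLACES = [("p1", "China", "Country"), ("p2", "Asia", "Continent"),
          ("p3", "France", "Country"), ("p4", "Europe", "Continent")]
BELONGS_TO = [("p1", "p2"), ("p3", "p4")]
COUNTRIES = [("p1", "China", "CN"), ("p3", "France", "FR")]


def access_countries():
    """Free access: every Countries tuple."""
    return list(COUNTRIES)


def access_places(name):
    """Limited access: the Places tuples with the given name."""
    return [t for t in PLACES if t[1] == name]


def access_belongs_to(source):
    """Limited access: the BelongsTo tuples with the given source id."""
    return [t for t in BELONGS_TO if t[0] == source]


def countries_in(continent):
    t1 = access_countries()                   # T1 <= Countries <= {}
    t2 = access_places(continent)             # T2 <= Places <= ("Asia")
    t3 = [(c, p) for c in t1 for p in t2]     # T3 := T1 join T2 (no shared variables)
    sources = sorted({c[0] for c, _ in t3})   # pi_source(T3): the country ids
    t4 = [bt for s in sources for bt in access_belongs_to(s)]  # T4 <= BelongsTo
    # T5 := pi_name(T3 join T4): keep countries whose BelongsTo target is the continent.
    return sorted({c[1] for c, p in t3 for src, tgt in t4
                   if src == c[0] and tgt == p[0]})


print(countries_in("Asia"))  # ['China']
```

The plan never calls Places or BelongsTo without a value for their required argument. φ1 is what justifies using Countries in place of the first Places atom.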

Pre-processing creates an auxiliary schema by adding the relations InferredAccPlaces, InferredAccBelongsTo, InferredAccCountries and Accessible, together with the following constraints:

φ’1: InferredAccPlaces(x, y, "Country", ...) ↔ InferredAccCountries(x, y, ...)

α1: Accessible(y) ∧ Places(x, y, z, ...) → InferredAccPlaces(x, y, z, ...) ∧ Accessible(x) ∧ Accessible(z) ∧ ...

α2: Accessible(x) ∧ BelongsTo(x, y) → InferredAccBelongsTo(x, y) ∧ Accessible(y)

α3: Countries(x, y, z, ...) → InferredAccCountries(x, y, z, ...) ∧ Accessible(x) ∧ …

α4: …
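The accessibility axioms follow one pattern per access method. The body requires each input position to be Accessible. The head makes the tuple visible through the InferredAcc copy of the relation and marks every other position Accessible.

The small Python sketch below prints that pattern for the example's three access methods. The AccessMethod class, the fixed arities (the "..." columns are dropped) and the 0-based positions are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessMethod:
    relation: str
    arity: int
    inputs: Tuple[int, ...] = ()  # 0-based input positions; () means free access


def accessibility_axiom(m: AccessMethod) -> str:
    """Render the accessibility axiom of one access method as text.

    Body: Accessible(x) for each input position, plus the relation atom.
    Head: the InferredAcc copy of the atom, plus Accessible(x) for each
    remaining (output) position.
    """
    xs = [f"x{i}" for i in range(m.arity)]
    args = ", ".join(xs)
    body = [f"Accessible({xs[i]})" for i in m.inputs] + [f"{m.relation}({args})"]
    head = [f"InferredAcc{m.relation}({args})"]
    head += [f"Accessible({xs[i]})" for i in range(m.arity) if i not in m.inputs]
    return " ∧ ".join(body) + " → " + " ∧ ".join(head)


methods = [
    AccessMethod("Places", 3, inputs=(1,)),     # Places looked up by name
    AccessMethod("BelongsTo", 2, inputs=(0,)),  # BelongsTo looked up by source
    AccessMethod("Countries", 3),               # free access on Countries
]
for m in methods:
    print(accessibility_axiom(m))
```

Up to variable names and the truncated columns, the three printed lines are α1, α2 and α3.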

Context

Web data sources may have:
• overlapping information;
• access restrictions.

As a result:
• there may be no web query plan for a given user query;
• there may be many plans, using different sources, with different costs.

A planner therefore needs to reason about integrity constraints and access limitations.

PDQ

A system for determining a query plan in the presence of web-based sources. It is:
i. constraint-aware, exploiting relationships between sources;
ii. access-aware, abiding by access restrictions;
iii. cost-aware, making use of any available cost information.

Approach: generating query plans from proofs that a query is answerable.

Input:
• S = ⟨R, Σ⟩: a schema, where R is a set of relations with access methods (free, limited or inaccessible) and Σ is a set of integrity constraints (tuple-generating dependencies, TGDs).
• Q: a conjunctive query over S.
• f: a cost function on evaluation plans.

Output: Pbest, a plan with minimal cost.
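Here is one possible way to write down these inputs as Python data types. It is a sketch under conventions of our own, and all class and field names are hypothetical rather than PDQ's API:
• variables start with '?';
• an access method is the tuple of input positions, with () for free access and None for an inaccessible relation;
• a plan is simply a list of commands.

The example schema uses 3-column versions of Places and Countries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Term = str                            # '?x' is a variable, anything else a constant
Atom = Tuple[str, Tuple[Term, ...]]   # (relation, terms)


def is_var(t: Term) -> bool:
    return t.startswith("?")


@dataclass(frozen=True)
class TGD:
    """A tuple-generating dependency: body -> head."""
    body: Tuple[Atom, ...]
    head: Tuple[Atom, ...]

    def existential_vars(self) -> Set[Term]:
        """Head variables that do not occur in the body."""
        body_vars = {t for _, ts in self.body for t in ts if is_var(t)}
        return {t for _, ts in self.head for t in ts
                if is_var(t) and t not in body_vars}


@dataclass
class Schema:
    # relation -> input positions: () = free, (i, ...) = limited, None = inaccessible
    access: Dict[str, Optional[Tuple[int, ...]]]
    constraints: List[TGD] = field(default_factory=list)


@dataclass(frozen=True)
class ConjunctiveQuery:
    free_vars: Tuple[Term, ...]
    atoms: Tuple[Atom, ...]


Plan = List[str]                      # hypothetical: a plan as a list of commands
CostFunction = Callable[[Plan], float]


def f(plan: Plan) -> float:
    """Toy cost function (an assumption): one unit per command."""
    return float(len(plan))


# The running example; φ1 is split into one TGD per direction.
phi1 = [
    TGD((("Places", ("?x", "?y", "Country")),), (("Countries", ("?x", "?y", "?z")),)),
    TGD((("Countries", ("?x", "?y", "?z")),), (("Places", ("?x", "?y", "Country")),)),
]
S = Schema(access={"Places": (1,), "BelongsTo": (0,), "Countries": ()}, constraints=phi1)
Q = ConjunctiveQuery(("?name1",), (
    ("Places", ("?id1", "?name1", "Country")),
    ("Places", ("?id2", "Asia", "?t2")),
    ("BelongsTo", ("?id1", "?id2")),
))
```

The forward direction of φ1 has the existential variable ?z, the iso_code and other columns a country must have. This is why a chase over such TGDs has to invent fresh values.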

Step 1: Pre-processing. S is augmented with new relations and axioms that model the access restrictions. A goal query Qinferred is created over the relations of the augmented schema by replacing each relation R of Q with InferredAccR. Q is grounded (its variables are frozen into constants) to form the initial state of the plan search.
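Below is a minimal sketch of Step 1 for the example. It assumes atoms are (relation, terms) pairs, with variables written '?v' and Places cut to three columns. Qinferred renames every relation R of Q to InferredAccR. The initial state freezes each variable into a constant of the same name and marks the query's own constants as Accessible. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms); variables start with '?'


def is_var(term: str) -> bool:
    return term.startswith("?")


def inferred_query(atoms: List[Atom]) -> List[Atom]:
    """Qinferred: the atoms of Q with each relation R renamed to InferredAccR."""
    return [("InferredAcc" + rel, terms) for rel, terms in atoms]


def initial_state(atoms: List[Atom]) -> Set[Atom]:
    """Ground Q by freezing each variable ?v into the constant v (assumes no
    query constant is spelled like a variable name), then mark the query's
    own constants as Accessible."""
    facts = {(rel, tuple(t[1:] if is_var(t) else t for t in terms))
             for rel, terms in atoms}
    constants = {t for _, terms in atoms for t in terms if not is_var(t)}
    return facts | {("Accessible", (c,)) for c in constants}


Q = [  # the example query, with Places cut to three columns
    ("Places", ("?id1", "?name1", "Country")),
    ("Places", ("?id2", "Asia", "?t2")),
    ("BelongsTo", ("?id1", "?id2")),
]
for fact in sorted(initial_state(Q)):
    print(fact)
```

The printed facts match the poster's initial state: two Places facts, one BelongsTo fact, Accessible("Asia") and Accessible("Country").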

Step 2: Basic search step. Each state is closed under the firing of all rules other than the accessibility axioms αi (blue arrows).

Every possible firing of an accessibility axiom (red arrows) gives a new candidate state, which inherits all the facts of its ancestors.
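Step 2 can be sketched as a chase:
• fire the non-accessibility rules until nothing new follows;
• then create one child state for each way an accessibility axiom can fire.

The Python below is a simplified illustration, not PDQ's implementation. It uses a restricted chase, so a rule fires only if its head is not already satisfied. It uses labelled nulls such as _n0 for existential variables and 3-column versions of the example relations. It does not record plan commands.

```python
import itertools
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]    # (relation, terms)
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); variables start with '?'
_nulls = itertools.count()            # source of fresh labelled nulls


def is_var(t: str) -> bool:
    return t.startswith("?")


def matches(atoms: List[Atom], facts, b: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of binding b that maps all atoms into facts."""
    if not atoms:
        yield b
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for frel, fterms in facts:
        if frel != rel or len(fterms) != len(terms):
            continue
        nb = dict(b)
        if all((nb.setdefault(t, v) == v) if is_var(t) else (t == v)
               for t, v in zip(terms, fterms)):
            yield from matches(rest, facts, nb)


def active(rule: Rule, facts) -> List[Dict[str, str]]:
    """Body matches whose head is not yet satisfied (restricted chase)."""
    body, head = rule
    return [b for b in matches(body, facts, {})
            if next(matches(head, facts, b), None) is None]


def fire(rule: Rule, b: Dict[str, str]) -> Set[Atom]:
    """Instantiate the head, inventing a fresh null for each existential."""
    b = dict(b)
    for _, terms in rule[1]:
        for t in terms:
            if is_var(t) and t not in b:
                b[t] = f"_n{next(_nulls)}"
    return {(rel, tuple(b[t] if is_var(t) else t for t in terms)) for rel, terms in rule[1]}


def close(facts, rules: List[Rule]) -> FrozenSet[Atom]:
    """Fire rules until none is active (terminates for the example rules)."""
    facts = set(facts)
    while True:
        new: Set[Atom] = set()
        for rule in rules:
            for b in active(rule, facts):
                new |= fire(rule, b)
        if not new:
            return frozenset(facts)
        facts |= new


def children(state: FrozenSet[Atom], rules: List[Rule], axioms: List[Rule]):
    """One candidate state per active accessibility-axiom firing; each child
    inherits all facts of its parent and is closed again under rules."""
    for axiom in axioms:
        for b in active(axiom, state):
            yield close(state | fire(axiom, b), rules)


phi = [  # φ1 and φ’1, one TGD per direction
    ([("Places", ("?x", "?y", "Country"))], [("Countries", ("?x", "?y", "?z"))]),
    ([("Countries", ("?x", "?y", "?z"))], [("Places", ("?x", "?y", "Country"))]),
    ([("InferredAccPlaces", ("?x", "?y", "Country"))], [("InferredAccCountries", ("?x", "?y", "?z"))]),
    ([("InferredAccCountries", ("?x", "?y", "?z"))], [("InferredAccPlaces", ("?x", "?y", "Country"))]),
]
alpha = [  # α1, α2, α3
    ([("Accessible", ("?y",)), ("Places", ("?x", "?y", "?z"))],
     [("InferredAccPlaces", ("?x", "?y", "?z")), ("Accessible", ("?x",)), ("Accessible", ("?z",))]),
    ([("Accessible", ("?x",)), ("BelongsTo", ("?x", "?y"))],
     [("InferredAccBelongsTo", ("?x", "?y")), ("Accessible", ("?y",))]),
    ([("Countries", ("?x", "?y", "?z"))],
     [("InferredAccCountries", ("?x", "?y", "?z")), ("Accessible", ("?x",)),
      ("Accessible", ("?y",)), ("Accessible", ("?z",))]),
]
root = close({("Places", ("id1", "name1", "Country")), ("Places", ("id2", "Asia", "t2")),
              ("BelongsTo", ("id1", "id2")), ("Accessible", ("Asia",)),
              ("Accessible", ("Country",))}, phi)
kids = list(children(root, phi, alpha))  # one child via α1 on "Asia", one via α3
```

As in the poster's search space, the closed initial state has exactly two candidate children. In the α3 child, φ’1 has already derived InferredAccPlaces(id1, name1, "Country").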

Step 3: Plans and costs. Each new state gives a plan, to which a cost is assigned (orange circles).

If a state contains a match for Qinferred and its plan's cost is lower than that of the best plan found so far, it becomes the new best state.
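Step 3 amounts to a search that keeps the cheapest state matching Qinferred. The sketch below is generic over states. As an optimization not stated on the poster, it also assumes that a plan's cost never decreases as the plan grows. Under that assumption, any state already as expensive as the best one is skipped together with its descendants. The toy search space and its costs, including the AllPlaces command, are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")


def find_best_plan(root: S,
                   expand: Callable[[S], Iterable[S]],
                   is_match: Callable[[S], bool],
                   cost: Callable[[S], float]) -> Tuple[Optional[S], float]:
    """Depth-first search keeping the cheapest state that matches the goal.

    Assumes cost never decreases along a branch, so pruning is safe.
    """
    best, best_cost = None, float("inf")
    stack = [root]
    while stack:
        state = stack.pop()
        c = cost(state)
        if c >= best_cost:
            continue                      # cannot beat the best plan so far
        if is_match(state):
            best, best_cost = state, c    # new best state
            continue                      # extensions can only cost more
        stack.extend(expand(state))
    return best, best_cost


# Hypothetical toy search space: a state is a tuple of access commands.
COSTS = {"Countries": 10, "Places[Asia]": 5, "BelongsTo": 20, "AllPlaces": 50}


def expand(plan):
    return [plan + (cmd,) for cmd in COSTS if cmd not in plan]


def is_match(plan):
    s = set(plan)
    return {"Places[Asia]", "BelongsTo"} <= s and bool(s & {"Countries", "AllPlaces"})


def cost(plan):
    return sum(COSTS[cmd] for cmd in plan)


best, best_cost = find_best_plan((), expand, is_match, cost)
```

In a full planner, `expand` would be the accessibility-axiom branching of Step 2. `is_match` would be a homomorphism check of Qinferred into the state's facts.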

Queries over Web Data

Architecture & User Experience

• User interface for creating and editing schemas and queries.
• Interactive exploration of the planner's search space.
• Online execution of plans.
• User interface for creating and configuring planning sessions.

Figures: Dashboard; Architecture (Runtime and Planner components).

Figure: search space for the example query. Blue arrows are firings of φ1 and φ’1. Red arrows are firings of the accessibility axioms α1 to α3, where α3 models free access on Countries. Each state lists the facts it adds and the plan commands it generates.

Initial state: Places(id1, name1, "Country", …), Places(id2, "Asia", …), BelongsTo(id1, id2), Accessible("Asia"), Accessible("Country"). Firing φ1 adds Countries(id1, name1, c1, …).

Goal: Qinferred(name) ← InferredAccPlaces(id1, name1, "Country", …) ∧ InferredAccPlaces(id2, "Asia", …) ∧ InferredAccBelongsTo(id1, id2)

Branch starting with α3:
• α3: InferredAccCountries(id1, name1, c1, …), Accessible(id1), Accessible(name1), Accessible(c1); plan T1 ⇐ Countries ⇐ Ø. Then φ’1 adds InferredAccPlaces(id1, name1, "Country", …).
• α1: InferredAccPlaces(id2, "Asia", c2, …), Accessible(id2), Accessible(c2), …; plan T2 ⇐ Places ⇐ ("Asia"), T3 := T1 ⋈ T2.
• α2: InferredAccBelongsTo(id1, id2); plan T4 ⇐ BelongsTo ⇐ πsource(T3), T5 := πname(T3 ⋈ T4).

Branch starting with α1:
• α1: InferredAccPlaces(id2, "Asia", c2, …), Accessible(id2), Accessible(c2), …; plan T’1 ⇐ Places ⇐ ("Asia").
• α3: InferredAccCountries(id1, name1, c1, …), Accessible(id1), Accessible(name1), …; plan T’2 ⇐ Countries ⇐ Ø, T’3 := T’1 ⋈ T’2. Then φ’1 adds InferredAccPlaces(id1, name1, "Country", …).
• α2: InferredAccBelongsTo(id1, id2); plan T’4 ⇐ BelongsTo ⇐ πsource(T’3), T’5 := πname(T’3 ⋈ T’4).