Information Integration Using Logical Views

Jeffrey D. Ullman

Overview

Information Integration Systems

Global-as-view (Gav.) vs. Local-as-view (Lav.)

– Query Reformulation
– Specification of Source Description
– Adding new sources

Query Reformulation

Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema.

Given a query Q in terms of the mediator schema relations, and descriptions of the information sources,

find a query Q’ that uses only the source relations, such that

– Q’ ⊆ Q, and
– Q’ provides all possible answers to Q given the sources.

Solving Queries by Views

[Figure: mediator relations and source relations]

Query Rewriting Using Views

Query Containment: q’ ⊆ q ⇔ for every database D, q’(D) ⊆ q(D)

Query Equivalence: q’ = q ⇔ q’ ⊆ q ∧ q ⊆ q’

Given a query q and view definitions V = {v1, …, vn}:

q’ is an Equivalent Rewriting of q using V if

– q’ refers only to views in V, and
– q’ = q

q’ is a Maximally-Contained Rewriting of q using V if

– q’ refers only to views in V, and
– q’ ⊆ q, and
– there is no rewriting q1 such that q’ ⊆ q1 ⊆ q and q1 ≠ q’
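
As a rough illustration of the containment definition, the following Python sketch evaluates two conjunctive queries over one small database and compares their answer sets. The tuple-based query encoding, the `evaluate` helper, the relation `r`, and the sample data are all assumptions made for this example and do not come from the slides. Agreement on a single database is only a necessary condition: containment requires q’(D) ⊆ q(D) for every database D.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter (a convention of this sketch)."""
    return isinstance(term, str) and term[:1].isupper()


def evaluate(query, db):
    """Return the set of head tuples a conjunctive query produces on db.

    query = (head_vars, body), where body is a list of (predicate, args).
    db maps each predicate name to a set of tuples.
    """
    head, body = query
    bindings = [{}]                      # partial variable assignments
    for pred, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(pred, set()):
                candidate = dict(binding)
                consistent = len(row) == len(args)
                for arg, value in zip(args, row):
                    if not consistent:
                        break
                    if is_var(arg):
                        # bind the variable, or check it against its earlier value
                        consistent = candidate.setdefault(arg, value) == value
                    else:
                        consistent = arg == value   # constants must match exactly
                if consistent:
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical queries: q_prime asks for X on a 2-cycle, q for any X with an outgoing edge.
q_prime = (("X",), [("r", ("X", "Y")), ("r", ("Y", "X"))])
q = (("X",), [("r", ("X", "Y"))])

db = {"r": {(1, 2), (2, 1), (3, 4)}}

print(evaluate(q_prime, db) <= evaluate(q, db))   # subset test on this one database
```

A failed subset test on any database refutes containment; a successful one does not prove it.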

Computation Complexity

[Figure: polynomial-hierarchy complexity classes, indexed by level k; class symbols not recoverable]

Complexity of Query Containment

Conjunctive Queries (CQ) (NP-complete)

– Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
– Q2: p(X,Z) :- a(X,Y) & a(V,Z)

CQ’s With Negation (Π₂ᵖ-complete)

– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)

CQ’s With Arithmetic Comparison (Π₂ᵖ-complete)

– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y

Datalog Programs

– p(A,C) :- a(A,B) & b(B,C)
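
For the two plain conjunctive queries Q1 and Q2 on this slide, containment can be decided by searching for a containment mapping: Q1 ⊆ Q2 holds exactly when the variables of Q2 can be mapped to terms of Q1 so that the head of Q2 becomes the head of Q1 and every subgoal of Q2 becomes some subgoal of Q1. The sketch below is a brute-force version of that test; the tuple encoding and the function names are assumptions of this example. It tries every assignment of Q2's subgoals to Q1's subgoals, which is exponential in the query length, in line with the NP-completeness noted above. It does not handle negation or arithmetic comparisons.

```python
from itertools import product


def is_var(term):
    """Variables are strings starting with an uppercase letter (a convention of this sketch)."""
    return isinstance(term, str) and term[:1].isupper()


def extend_mapping(mapping, src_args, dst_args):
    """Extend mapping so that src_args is sent onto dst_args; return False on a conflict."""
    if len(src_args) != len(dst_args):
        return False
    for src, dst in zip(src_args, dst_args):
        if is_var(src):
            if mapping.setdefault(src, dst) != dst:
                return False            # variable already mapped to something else
        elif src != dst:
            return False                # a constant can only map to itself
    return True


def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    head1, body1 = q1
    head2, body2 = q2
    # Try every way of sending each subgoal of q2 to a subgoal of q1.
    for targets in product(body1, repeat=len(body2)):
        mapping = {}
        if not extend_mapping(mapping, head2, head1):
            continue
        if all(pred2 == pred1 and extend_mapping(mapping, args2, args1)
               for (pred2, args2), (pred1, args1) in zip(body2, targets)):
            return True
    return False


# Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
# Q2: p(X,Z) :- a(X,Y) & a(V,Z)
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])

print(contained_in(Q1, Q2))   # mapping X->X, Y->Y, V->Y, Z->Z exists
print(contained_in(Q2, Q1))   # Y of Q1 would need two different images
```

Here Q1 is contained in Q2 but not the other way around, because Q2 does not force the two `a` subgoals to share a middle value.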

Specification of Source Description

Views: resources used by the integrator to help answer queries

Gav: each mediator relation is defined as a view over the source relations

Lav: each source relation is defined as a view over the mediator relations

Information Integration Systems

Information Manifold (IM)

– AT&T
– Local-as-View (Lav)
– Description logic
– Source relations defined as views of mediator relations (a collection of global predicates)

Tsimmis

– Stanford and IBM
– Global-as-View (Gav)
– Mediator relations defined as views of source relations

IM Example

Global Predicates: Mediator relations

IM Example (Cont.)

Views: Source Relations

Query: “What are Sally’s phone and office?”

Mediator Relations

IM Example (Cont.)

Answer: Source Relations

Query reformulation: Bucket Algorithm (requires checking query containment, which is NP-complete in the query length)
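
The first phase of the Bucket Algorithm can be sketched as follows: for each subgoal of the query, collect the views whose definitions contain a matching subgoal. The slide's own relation and view definitions did not survive, so the mediator predicates (`emp`, `phone`, `office`, `mgr`, `dept`) and the views `v1`, `v2`, `v3` below are hypothetical, chosen to fit the "Sally's phone and office" query. The matching rule is deliberately conservative, view variables are assumed not to clash with query variables in a harmful way (a fuller version would rename them apart), and the second phase, which checks each candidate combination for containment in the query, is omitted.

```python
from itertools import product


def is_var(term):
    """Variables are strings starting with an uppercase letter (a convention of this sketch)."""
    return isinstance(term, str) and term[:1].isupper()


def match(goal, view_atom, view_head, query_head):
    """Map a view subgoal onto a query subgoal; return the substitution or None."""
    (pred, args), (view_pred, view_args) = goal, view_atom
    if pred != view_pred or len(args) != len(view_args):
        return None
    theta = {}                                   # view variable -> query term
    for q_term, v_term in zip(args, view_args):
        if not is_var(v_term):
            if q_term != v_term:                 # conservative: view constants must match exactly
                return None
            continue
        if theta.setdefault(v_term, q_term) != q_term:
            return None                          # view variable needed for two different terms
        if is_var(q_term) and q_term in query_head and v_term not in view_head:
            return None                          # the view does not export a value the query returns
    return theta


def build_buckets(query, views):
    """Return one bucket (list of instantiated view heads) per query subgoal."""
    query_head, query_body = query
    buckets = []
    for goal in query_body:
        bucket = []
        for name, (view_head, view_body) in views.items():
            for view_atom in view_body:
                theta = match(goal, view_atom, view_head, query_head)
                if theta is not None:
                    entry = (name, tuple(theta.get(h, h) for h in view_head))
                    if entry not in bucket:
                        bucket.append(entry)
        buckets.append(bucket)
    return buckets


# Hypothetical source descriptions (Lav): each view is defined over mediator relations.
views = {
    "v1": (("E", "P", "M"), [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "v2": (("E", "O", "D"), [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "v3": (("E", "P"), [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}
# Query: answer(P,O) :- phone(sally,P) & office(sally,O)
query = (("P", "O"), [("phone", ("sally", "P")), ("office", ("sally", "O"))])

buckets = build_buckets(query, views)
candidates = list(product(*buckets))             # one view per bucket
for candidate in candidates:
    print(candidate)
```

Each candidate is a conjunction of source atoms; a complete implementation would keep only those candidates that are contained in the query and return their union.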

Advantages and Disadvantages (IM)

Advantage: adding new sources

– Mediator (global predicates, source descriptions)
– Query processing

Disadvantage: query reformulation (Bucket algorithm)

Tsimmis

OEM (Object Exchange Model) and MSL (Mediator Specification Language)

Mediator Relations

Tsimmis Example

Exported OEM Objects

Query: “What are Sally’s phone and office?”

Mediator Relations

Source Relations

Advantages and Disadvantages (Tsimmis)

Advantage

– Query reformulation: rule unfolding

Disadvantages

– Mediation description
– Adding, removing, and modifying source descriptions
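
Rule unfolding in a Gav setting can be sketched as a substitution step: each mediator atom in the query is replaced by the body of the rule that defines it. The sketch below uses a Datalog-style encoding rather than Tsimmis's OEM and MSL, and the mediator relation `emp_info` and the source relations `src_phone` and `src_office` are hypothetical names. It assumes one non-recursive rule per mediator relation whose head arguments are distinct variables; several rules per relation would yield a union of unfolded queries instead.

```python
from itertools import count


def is_var(term):
    """Variables are strings starting with an uppercase letter (a convention of this sketch)."""
    return isinstance(term, str) and term[:1].isupper()


def unfold(query, rules):
    """Replace each mediator atom in the query body by the body of its defining rule.

    rules maps a mediator predicate to (head_vars, body) over source relations.
    """
    head, body = query
    fresh = count()
    new_body = []
    for pred, args in body:
        if pred not in rules:
            new_body.append((pred, args))        # already a source relation
            continue
        rule_head, rule_body = rules[pred]
        suffix = next(fresh)
        theta = dict(zip(rule_head, args))       # rule head variable -> query term

        def substitute(term):
            if not is_var(term):
                return term                      # constants are kept as they are
            # variables local to the rule body are renamed apart per use
            return theta.get(term, f"{term}_{suffix}")

        for rule_pred, rule_args in rule_body:
            new_body.append((rule_pred, tuple(substitute(t) for t in rule_args)))
    return head, new_body


# Hypothetical Gav rule: emp_info(N,P,O) :- src_phone(N,P) & src_office(N,O)
rules = {
    "emp_info": (("N", "P", "O"),
                 [("src_phone", ("N", "P")), ("src_office", ("N", "O"))]),
}
# Query: answer(P,O) :- emp_info(sally,P,O)
query = (("P", "O"), [("emp_info", ("sally", "P", "O"))])

print(unfold(query, rules))
```

No containment search is needed here, which is why query reformulation is listed as the advantage of this approach; the cost shows up instead when a source is added or changed and the mediator rules have to be rewritten.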

IM vs. Tsimmis

Query Reformulation

Adding Sources

Levels of Mediation

Semistructured Data

Constraints

Automatic Generation of Components (Wrappers and Mediators)