1
A Scalable Algorithm for Answering Queries Using Views
Rachel Pottinger
Qualifying Exam
October 29, 1999
Advisor: Alon Levy
2
Answering Queries Using Views
Problem: access views instead of original relations
Useful in data integration and query optimization
NP-Complete
Many papers on the subject
No empirical testing of algorithms
3
Data Integration: Query Reformulation
Data sources are pre-calculated views
Views are not complete
Get the most answers possible given the views
Many data sources
Car sale information
  Ford cars: dealer prices, sticker prices, inventory
  Cheap cars: prices, manufacturer
  Used cars: prices, dealer, year
4
Data Integration Example
Query: find the prices of cars that we can buy at cost
  Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
Views (defined over the database relations):
  V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
  V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
Maximally contained rewriting (union of the conjunctive rewritings):
  Q’1(cost) :- V1(cost, cost)
  Q’2(cost) :- V2(cost)
Variables in a view head (price1, price2, cost) are distinguished; variables that appear only in the body (car) are existential
5
Outline
Previous algorithms
  Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
  Inverse rules [Duschka, Genesereth, 1997]
Minimum Necessary Connections (MiniCon) Algorithm
Experimental evaluation
Extension to arithmetic comparisons
Conclusions and future work
7
Bucket Algorithm: Populating buckets
For each subgoal in the query, place relevant views in the subgoal’s bucket
Inputs:
  Q(x) :- r1(x,y) & r2(y,x)
  V1(a) :- r1(a,b)
  V2(d) :- r2(c,d)
  V3(f) :- r1(f,g) & r2(g,f)
Buckets:
  r1(x,y): V1(x), V3(x)
  r2(y,x): V2(x), V3(x)
8
Combining Buckets
For every combination in the Cartesian product of the buckets, check containment in the query
Buckets:
  r1(x,y): V1(x), V3(x)
  r2(y,x): V2(x), V3(x)
Candidate rewritings:
  Q’1(x) :- V1(x) & V2(x)
  Q’2(x) :- V1(x) & V3(x)
  Q’3(x) :- V3(x) & V2(x)
  Q’4(x) :- V3(x) & V3(x)
Bucket Algorithm will check all possible combinations
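The two bucket steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation that was measured: atoms are modelled as (predicate, args) tuples, every argument is treated as a variable, at most one entry per view is placed in each bucket, and the names `bucket_entry` and `populate_buckets` are hypothetical. The containment check on each candidate is deliberately left out.

```python
from itertools import product

# Atoms are (predicate, args); a rule is (head_atom, [body_atoms]).
QUERY = (("Q", ("x",)), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = [
    (("V1", ("a",)), [("r1", ("a", "b"))]),
    (("V2", ("d",)), [("r2", ("c", "d"))]),
    (("V3", ("f",)), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
]

def bucket_entry(subgoal, view, distinguished):
    """Return the view head to place in the subgoal's bucket, or None."""
    (vname, vhead), vbody = view
    pred, qargs = subgoal
    for vpred, vargs in vbody:
        if vpred != pred or len(vargs) != len(qargs):
            continue
        mapping = {}  # view variable -> query variable
        ok = True
        for v, q in zip(vargs, qargs):
            if mapping.setdefault(v, q) != q:
                ok = False  # simplification: do not equate query variables
            if q in distinguished and v not in vhead:
                ok = False  # a distinguished query variable must be exposed
        if ok:
            # Head variables not touched by the mapping get a fresh name.
            return (vname, tuple(mapping.get(v, v + "_fresh") for v in vhead))
    return None

def populate_buckets(query, views):
    (_, qhead), qbody = query
    buckets = []
    for subgoal in qbody:
        entries = [bucket_entry(subgoal, view, qhead) for view in views]
        buckets.append([e for e in entries if e is not None])
    return buckets

buckets = populate_buckets(QUERY, VIEWS)
# One candidate rewriting per element of the Cartesian product.
candidates = list(product(*buckets))
```

With the inputs from the slide, `candidates` holds the four combinations listed as Q’1 to Q’4; each would still need a containment check against the query.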
9
Inverse Rules
Part of the Infomaster system
Inverse rules show how to get database tuples from the views
Cannot be extended to interpreted predicates
Stops earlier than the Bucket Algorithm
10
Creating Inverse Rules
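Inputs:
  V1(a) :- r1(a,b)
  V2(d) :- r2(c,d)
  V3(f) :- r1(f,g) & r2(g,f)
Inverse Rules:
  IR1  r1(a, sfV1(a)) :- V1(a)
  IR2  r2(sfV2(d), d) :- V2(d)
  IR3  r1(f, sfV3(f)) :- V3(f)
  IR4  r2(sfV3(f), f) :- V3(f)
sfV1, sfV2, sfV3 are Skolem functions: each stands in for an existential variable of its view
For each V(X) :- r1(X1) & … & rn(Xn)
  for each j = 1, …, n form an inverse rule: rj(Xj) :- V(X), with every existential variable in Xj replaced by a Skolem function of X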
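The construction above can be illustrated with a short Python sketch. It assumes views are given as (head, body) pairs of (predicate, args) tuples and that every argument is a variable; the function name `inverse_rules` is hypothetical. Unlike the slide, which uses one Skolem function name per view, the sketch names one Skolem function per existential variable (for example `sfV3_g`) so that views with several existential variables stay unambiguous.

```python
def inverse_rules(view):
    """Build one inverse rule, as a string, per subgoal of the view."""
    (vname, vhead), vbody = view
    head_text = ", ".join(vhead)
    rules = []
    for pred, args in vbody:
        # Head variables are kept; existential variables become Skolem terms
        # over the head variables of the view.
        new_args = tuple(
            a if a in vhead else f"sf{vname}_{a}({head_text})" for a in args
        )
        rules.append(f"{pred}({', '.join(new_args)}) :- {vname}({head_text})")
    return rules

# V3(f) :- r1(f,g) & r2(g,f), taken from the slide.
V3 = (("V3", ("f",)), [("r1", ("f", "g")), ("r2", ("g", "f"))])
rules = inverse_rules(V3)
```

For V3 this yields the counterparts of IR3 and IR4, with `sfV3_g(f)` in place of `sfV3(f)`.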
11
Combining Inverse Rules
Query:
  Q(x) :- r1(x,y) & r2(y,x)
+ Inverse Rules:
  IR1  r1(a, sfV1(a)) :- V1(a)
  IR2  r2(sfV2(d), d) :- V2(d)
  IR3  r1(f, sfV3(f)) :- V3(f)
  IR4  r2(sfV3(f), f) :- V3(f)
+ Tuples:
  V1(g)
  V2(h)
  V3(j)
  V3(m)
= Expansion:
  r1(g, sfV1(g)), r2(sfV2(h), h),
  r1(j, sfV3(j)), r2(sfV3(j), j),
  r1(m, sfV3(m)), r2(sfV3(m), m)
At query time, query over rules
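As a rough illustration of this slide, the Python sketch below applies IR1 to IR4 to the four view tuples and then evaluates the query over the resulting facts. The encoding is an assumption made for the example: a Skolem term is a (function name, argument) pair, each inverse rule is a small function from a view tuple to a fact, and `expand` and `answer_query` are hypothetical names.

```python
# View extensions from the slide: V1(g), V2(h), V3(j), V3(m).
VIEW_TUPLES = {"V1": ["g"], "V2": ["h"], "V3": ["j", "m"]}

# Inverse rules IR1 to IR4 as functions from a view tuple to a fact.
INVERSE_RULES = [
    ("V1", lambda a: ("r1", a, ("sfV1", a))),  # IR1
    ("V2", lambda d: ("r2", ("sfV2", d), d)),  # IR2
    ("V3", lambda f: ("r1", f, ("sfV3", f))),  # IR3
    ("V3", lambda f: ("r2", ("sfV3", f), f)),  # IR4
]

def expand(view_tuples, rules):
    """Apply every inverse rule to every tuple of its view."""
    return [rule(t) for view, rule in rules for t in view_tuples[view]]

def answer_query(facts):
    """Evaluate Q(x) :- r1(x,y) & r2(y,x) over the expanded facts."""
    r1 = [(a, b) for pred, a, b in facts if pred == "r1"]
    r2 = {(a, b) for pred, a, b in facts if pred == "r2"}
    # Only real constants are answers; Skolem terms are never returned.
    return sorted(x for x, y in r1 if (y, x) in r2 and not isinstance(x, tuple))

facts = expand(VIEW_TUPLES, INVERSE_RULES)
answers = answer_query(facts)
```

Only the V3 tuples contribute answers here, because the Skolem terms `sfV1(g)` and `sfV2(h)` never join with each other.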
12
Unfolding rules before tuples
Q(x) :- r1(x,y) & r2(y,x)
  r1(x,y): IR1, IR3
  r2(y,x): IR2, IR4
Use unification to see if rewriting is contained in the query
No containment check necessary
13
The MiniCon Algorithm
Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)
Combine MCDs that only overlap on distinguished view variables
No containment check!
14
MiniCon Description Formation
Form all MiniCon Descriptions (MCDs) that map all query variables that have to be mapped together
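Inputs:
  Q(x) :- r1(x,y) & r2(y,x)
  V1(a) :- r1(a,b)
  V2(d) :- r2(c,d)
  V3(f) :- r1(f,g) & r2(g,f)
MCDs:
  view   mapping        subgoals mapped
  V3     x → f, y → g   1, 2
V1 and V2 produce no MCD: y would map to an existential variable (b or c), so a single view would have to cover both r1(x,y) and r2(y,x)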
15
MiniCon Combination
Take all combinations of MCDs that
  map disjoint sets of subgoals
  map all subgoals of the query
MCDs:
  view   mapping        subgoals mapped
  V3     x → f, y → g   1, 2
Rewriting: Q’(x) :- V3(x)
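The combination step can be sketched as a small search over sets of subgoal indices. This Python example is a simplified illustration, not the measured implementation: an MCD is assumed to be a (view, mapping, subgoals) triple, `combine` is a hypothetical name, and MCD formation itself is not shown.

```python
def combine(mcds, n_subgoals):
    """Return every list of MCDs whose subgoal sets are pairwise disjoint
    and together cover subgoals 1..n_subgoals."""
    all_goals = frozenset(range(1, n_subgoals + 1))
    results = []

    def extend(chosen, covered):
        if covered == all_goals:
            results.append(chosen)
            return
        # Always cover the smallest uncovered subgoal next, so each
        # combination is produced exactly once.
        target = min(all_goals - covered)
        for mcd in mcds:
            goals = mcd[2]
            if target in goals and not (goals & covered):
                extend(chosen + [mcd], covered | goals)

    extend([], frozenset())
    return results

# The single MCD from the slide: V3 maps x -> f, y -> g and covers subgoals 1, 2.
SLIDE_MCDS = [("V3", {"x": "f", "y": "g"}, frozenset({1, 2}))]
combos = combine(SLIDE_MCDS, 2)
```

For the slide's input there is exactly one combination, the one that gives Q’(x) :- V3(x). Because the subgoal sets are required to be disjoint and complete, no containment check is applied to the result.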
16
Experimental Evaluation
Tested performance and scale up of:
  Bucket Algorithm
  Inverse Rules extended with unification
  MiniCon Algorithm
MiniCon at least as good in all cases, much better in some
Show results for chain queries:
  Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)
17
Many Rewritings
Chain queries with 5 subgoals and all variables distinguished
[Figure: time (sec, 0 to 10) versus number of views (1 to 11) for MiniCon, Inverse, and Bucket]
18
Few rewritings, very structured query and views
Chain queries with 10 subgoals and 2 distinguished variables
[Figure: time (sec, 0 to 2) versus number of views (0 to 400) for MiniCon, Inverse, and Bucket]
19
Few rewritings, less structured views
Chain queries; 2 variables distinguished, query of length 12, views of lengths 2, 3, and 4
[Figure: time (sec, 0 to 2) versus number of views (0 to 150) for MiniCon and Inverse]
20
Extension: Interpreted Predicates
Problem is in general undecidable
We looked at subgoals of the form: var < constant or var > constant
If a query variable maps to an existential view variable, require that the view’s interpreted predicates imply the query’s
Ex: Q(x) :- r1(x,y), y > 17
    V1(a) :- r1(a,b), b > 18
Guaranteed to be sound
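For comparisons of the form var < constant or var > constant, the implication test reduces to comparing the two constants. The Python sketch below is a minimal illustration that assumes each comparison is an (operator, constant) pair over an ordered numeric domain; the name `implies` is hypothetical.

```python
def implies(view_cmp, query_cmp):
    """True if the view's comparison on a variable implies the query's."""
    (v_op, v_const), (q_op, q_const) = view_cmp, query_cmp
    if v_op != q_op:
        # A single lower bound never implies an upper bound, or vice versa.
        return False
    if v_op == ">":
        return v_const >= q_const  # b > 18 implies y > 17
    if v_op == "<":
        return v_const <= q_const  # b < 5 implies y < 9
    raise ValueError(f"unsupported operator: {v_op}")

# Example from the slide: V1 constrains b > 18, the query needs y > 17.
usable = implies((">", 18), (">", 17))
```

In the slide's example the test succeeds, so V1 remains usable even though b is existential in V1.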
21
Interpreted Predicate Results
Chain queries with all variables distinguished, 5 subgoals, and 5 variables constrained
[Figure: time (sec, 0 to 8) versus number of views (1 to 9) for MiniCon IP and MiniCon]
Chain queries with two distinguished variables, 10 subgoals, and 5 variables constrained
[Figure: time (sec, 0 to 1) versus number of views (0 to 400) for MiniCon IP and MiniCon]
22
Future Work
Query Optimization
  Look for the fastest answer to query
  Assume that all views are complete
  Require equivalent rewritings
  Need to allow overlap on subgoals mapped
A fuller comparison of interpreted predicates
23
Conclusions
Scalability of previous algorithms understood
MiniCon Algorithm invented
First experimental comparison of algorithms for answering queries using views
Extensions to binding patterns, interpreted predicates
New maximally contained rewriting form
24
Maximally contained Rewritings
Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if
For any database D, and extensions v1, …, vn of the views such that vi ⊆ Vi(D), 1 ≤ i ≤ n, Q’(v1, …, vn) ⊆ Q(D)
There is no other query Q1 such that
  (1) Q’(v1, …, vn) ⊆ Q1(v1, …, vn)
  (2) Q1(v1, …, vn) ⊆ Q(D), and there exists at least one database for which (1) is a strict subset
25
Containment Checks
Q1 ⊆ Q2 if the answer to Q1 is a subset of the answer to Q2 for every database
m is a containment mapping from Vars(Q2) to Vars(Q1) if
  m maps every subgoal in the body of Q2 to a subgoal in the body of Q1
  m maps the head of Q2 to the head of Q1
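A containment mapping can be found by a simple backtracking search. The Python sketch below is an illustration under assumptions: a query is a (head args, body) pair, body atoms are (predicate, args) tuples, every argument is a variable (no constants), and `extend` and `containment_mapping` are hypothetical names. The two test inputs are the expansions of Q’(x) :- V3(x) and Q’(x) :- V1(x) & V2(x) for the views V1(a) :- r1(a,b), V2(d) :- r2(c,d), V3(f) :- r1(f,g) & r2(g,f).

```python
def extend(mapping, args, targets):
    """Extend mapping so args map position-wise onto targets, or return None."""
    if len(args) != len(targets):
        return None
    new = dict(mapping)
    for a, t in zip(args, targets):
        if new.setdefault(a, t) != t:
            return None  # variable already mapped to something else
    return new

def containment_mapping(q2, q1):
    """Search for a containment mapping from Vars(q2) to Vars(q1)."""
    (head2, body2), (head1, body1) = q2, q1

    def search(i, mapping):
        if i == len(body2):
            return mapping
        pred, args = body2[i]
        for tpred, targs in body1:
            if tpred != pred:
                continue
            new = extend(mapping, args, targs)
            if new is not None:
                found = search(i + 1, new)
                if found is not None:
                    return found
        return None

    start = extend({}, head2, head1)  # the head must map to the head
    return None if start is None else search(0, start)

Q = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
EXP_V3 = (("x",), [("r1", ("x", "g")), ("r2", ("g", "x"))])
EXP_V1_V2 = (("x",), [("r1", ("x", "b")), ("r2", ("c", "x"))])

m_v3 = containment_mapping(Q, EXP_V3)
m_v1_v2 = containment_mapping(Q, EXP_V1_V2)
```

A mapping exists for the V3 expansion, so that rewriting is contained in Q. None exists for the V1 and V2 expansion, because y would have to map to both b and c.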
26
Inverse Rules With Unification
Find all Inverse Rules that match each query subgoal; place in bucket for that subgoal
For each rule in the first bucket
  For each other subgoal, i, attempt to unify the rules so far with all elements in the bucket for i
  If we cannot unify with anything in that bucket, break out of loop, otherwise, recurse
27
Correctness requirements
We need both soundness and completeness
A sound rewriting has a valid containment mapping from the variables of the query to the variables of the view
For completeness we need only to check rewritings of length less than or equal to that of the query