Post on 21-Dec-2015
Solving Failing Queries*)
Zbigniew W. Ras
University of North Carolina
Charlotte, N.C. 28223, USA
[email protected]@uncc.edu
Failing Query Problems
Problem 1. Given S(A) with hierarchical attributes and a query q(B) that returns an empty answer, how can one relax the query's constraints so that it returns a non-empty set of tuples?
Assumption: S(A) is an information system based on attributes from A, and q(B) is a query based on attributes from B. Query q(B) is not local for system S(A) if B ⊄ A.
Problem 2. Given S(A), which represents one of the sites of a distributed autonomous information system, and a non-local query q(B) submitted to S(A), where A ∩ B ≠ ∅, how can q(B) be modified so that it can be answered?
Failing Query Problem 1
Problem 1. [Cooperative Query Answering] [Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]
Attribute hierarchies:
age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
Example of a query: (age, 18) ∧ (salary, 40k)
Possible relaxations:
(age, young) ∧ (salary, 40k)
(age, 18) ∧ (salary, low)
(age, young) ∧ (salary, low)
Preference for relaxation: [1 - age, 2 - salary]
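The relaxation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`parent`, `relaxations`), the encoding of salaries in thousands, and the ordering rule (fewer generalized attributes first, then by preference rank) are hypothetical choices, not taken from the cited papers.

```python
from itertools import combinations

# Value ranges copied from the slide; salaries are in thousands (assumed encoding).
HIERARCHIES = {
    "age": {"young": range(18, 30), "middle-aged": range(30, 61), "old": range(61, 81)},
    "salary": {"low": range(10, 41), "medium": range(50, 71), "high": range(80, 101)},
}

def parent(groups, value):
    """Return the name of the group that contains a leaf value, or None."""
    for name, members in groups.items():
        if value in members:
            return name
    return None

def relaxations(query, hierarchies, preference):
    """List relaxed queries: fewer generalized attributes first, then by preference."""
    attrs = sorted(query, key=preference.index)
    relaxed_queries = []
    for size in range(1, len(attrs) + 1):
        for chosen in combinations(attrs, size):
            relaxed = dict(query)
            for attr in chosen:
                relaxed[attr] = parent(hierarchies[attr], query[attr])
            relaxed_queries.append(relaxed)
    return relaxed_queries

query = {"age": 18, "salary": 40}
result = relaxations(query, HIERARCHIES, preference=["age", "salary"])
for relaxed in result:
    print(relaxed)
```

With the preference [1 - age, 2 - salary], the sketch lists the three relaxations in the same order as the slide.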
     a        b    c    e
x1   a[1,2]   b1   c2   e1
x2   a[1,2]   b1   c2   e1
x3   a[1,1]   b1   c1   e1
x4   a[2,1]   b2   c2   e2
x5   a[1,1]   b2   c1   e2
Information System S1
q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q).
Solution: q can be generalized by QAS to q1 = a1*c1, which matches objects x3 and x5 in S1.
Question: Which of these two objects (x3 or x5) is closer to q?
Attribute a is hierarchical, with the structure (in Lisp-like notation) a(a1(a[1,1], a[1,2]), a2(a[2,1], …))
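A minimal sketch of this generalization, assuming hierarchical values are encoded as index tuples (a[1,2] as `(1, 2)`, a1 as `(1,)`) and that an object matches a query value when the query value is a prefix of the stored one. The helper names `matches`, `answer`, and `generalize` are illustrative, not part of QAS.

```python
# System S1 from the slide, with hierarchical values written as index tuples.
S1 = {
    "x1": {"a": (1, 2), "b": (1,), "c": (2,), "e": (1,)},
    "x2": {"a": (1, 2), "b": (1,), "c": (2,), "e": (1,)},
    "x3": {"a": (1, 1), "b": (1,), "c": (1,), "e": (1,)},
    "x4": {"a": (2, 1), "b": (2,), "c": (2,), "e": (2,)},
    "x5": {"a": (1, 1), "b": (2,), "c": (1,), "e": (2,)},
}

def matches(stored, wanted):
    """A stored value satisfies a query value if the query value is its prefix."""
    return stored[:len(wanted)] == wanted

def answer(system, query):
    """Return the sorted names of objects that satisfy every conjunct of the query."""
    return sorted(
        name for name, row in system.items()
        if all(matches(row[attr], value) for attr, value in query.items())
    )

def generalize(query, attr):
    """Move one attribute value one level up in its hierarchy."""
    relaxed = dict(query)
    relaxed[attr] = query[attr][:-1]
    return relaxed

q = {"a": (1, 2), "c": (1,)}   # q = a[1,2]*c1
q1 = generalize(q, "a")        # q1 = a1*c1

print(answer(S1, q))   # []
print(answer(S1, q1))  # ['x3', 'x5']
```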
Let k ≤ m. Then the distance δa is defined as:
δa(a[i(1), i(2), …, i(k)], a[j(1), j(2), …, j(m)]) = 1/2^n if [i(1) = j(1) ∧ … ∧ i(n) = j(n) ∧ [n = k ≤ m ∨ i(n+1) ≠ j(n+1)]], and 0 otherwise.
δ(xi, xj) = δa + δb + δc + δe
δ(q, x3) = ½ + 1 + 1 + 1 = 3½
δ(q, x5) = ½ + 1 + 1 + 1 = 3½
Result: x3 and x5 are equally close to q, so both are acceptable answers.
[Muslea, KDD'04] On-line, query-guided algorithm for relaxing failing DNF queries
Example. A = {Price, CPU, Display, Weight}.
Failing query: q(A) = [Price ≤ $2000] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17"] ∧ [Weight ≤ 3lbs].
Select a small, randomly chosen subset of the target DB to discover implicit relationships between values of attributes used in the query.
Discovered rules:
r1 = [[Price ≤ $2900] ∧ [Display ≥ 18"] ∧ [Weight ≤ 4lbs] → [CPU ≥ 2.5GHz]].
r2 = [[Price ≥ $3500] → [CPU ≥ 2.5GHz]].
r3 = …
A nearest-neighbor technique is used to identify which rule is most similar to the failing query. Assume that r1 is such a rule.
Relaxed query: [Price ≤ $2900] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17"] ∧ [Weight ≤ 4lbs].
Failing Query Problem 2
Problem 2. [Collaborative Query Answering] [Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska]
Example of a non-local query
Database: Flights(airline; departure time; arrival time; departure airport; arrival airport).
select * from Flights
where airline = "Delta"
and departure time = "morning"
and departure airport = "Charlotte"
and aircraft = "Boeing"
The query is non-local because aircraft is not an attribute of Flights.
Query Processing in Collaborative Systems
System S1:
     a        b    c    e
x1   a[1,2]   b1   c2   e1
x2   a[1,2]   b1   c2   e1
x3   a[1,1]   b1   c1   e1
x4   a[2,1]   b2   c2   e2
x5   a[1,1]   b2   c2   e2
System S:
     a    b        c    d
y1   a1   b2       c1   d1
y2   a2   b1       c2   d2
y3   a1   b[1,1]   c2   d2
y4   a1   b[1,1]   c2   d1
y5   a2   b2       c1   d2
q = a1*b1*e1 submitted to S fails, because attribute e is not in S (clearly b[1,1] is also b1).
Find the definition of e1 in S1: b1 → e1; c1 → e1; a[1,2] → e1.
Replacing e1 by its definition, q = a1*b1*e1 becomes a1*b1*(b1 + c1 + a[1,2]) = a1*b1 + a1*b1*c1 + a1*b1*a[1,2] = a1*b1.
Objects y3, y4 satisfy the query q.
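The substitution and simplification above can be sketched as follows. The encoding is an assumption made for illustration: a conjunctive term is a frozenset of (attribute, index tuple) literals, a DNF query is a list of such terms, and the helper names `substitute`, `absorb`, and `answer` are hypothetical. Treating the premises of the rules as the definition of e1 follows the slide; it is not a claim about how the cited systems implement it.

```python
# System S from the slide; b[1,1] is encoded as (1, 1), b1 as (1,).
S = {
    "y1": {"a": (1,), "b": (2,), "c": (1,), "d": (1,)},
    "y2": {"a": (2,), "b": (1,), "c": (2,), "d": (2,)},
    "y3": {"a": (1,), "b": (1, 1), "c": (2,), "d": (2,)},
    "y4": {"a": (1,), "b": (1, 1), "c": (2,), "d": (1,)},
    "y5": {"a": (2,), "b": (2,), "c": (1,), "d": (2,)},
}

A1, B1, E1 = ("a", (1,)), ("b", (1,)), ("e", (1,))

# Premises of the rules b1 -> e1, c1 -> e1, a[1,2] -> e1 found in S1.
DEFINITION_OF_E1 = [frozenset({B1}), frozenset({("c", (1,))}), frozenset({("a", (1, 2))})]

def substitute(term, literal, definition):
    """Replace a foreign literal by the disjunction of its defining terms."""
    rest = term - {literal}
    return [frozenset(rest | premise) for premise in definition]

def absorb(terms):
    """Apply the absorption law: drop every term that strictly contains another."""
    unique = set(terms)
    return [t for t in unique if not any(other < t for other in unique)]

def answer(system, terms):
    """Objects satisfying at least one term; prefix match handles granularity."""
    def holds(row, term):
        return all(a in row and row[a][:len(v)] == v for a, v in term)
    return sorted(name for name, row in system.items()
                  if any(holds(row, t) for t in terms))

q = frozenset({A1, B1, E1})                              # q = a1*b1*e1
rewritten = absorb(substitute(q, E1, DEFINITION_OF_E1))  # a1*b1
print(answer(S, rewritten))  # ['y3', 'y4']
```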
q = a[1,2]*b[1,1] submitted to S1 fails because of the granularity of b.
Find the definition of b[1,1] in S: a1*c2 → b[1,1].
Replacing b[1,1] by its definition, q = a[1,2]*b[1,1] becomes a[1,2]*a1*c2 = a[1,2]*c2.
Objects x1, x2 satisfy the query q.
Query Processing in Incomplete IS
X is a set of objects, A is a set of attributes, Va is the set of values of attribute a, where a ∈ A, and V = ∪{Va : a ∈ A}.
S = (X, A, V) is a partially incomplete information system of type λ if the following two conditions hold for any x ∈ X, a ∈ A:
- if aS(x) is defined, then [aS(x) ∈ Va or aS(x) = {(vi, pi): 1 ≤ i ≤ m}],
- if [aS(x) = {(vi, pi): 1 ≤ i ≤ m}], then [Σ i=1…m pi = 1 and (∀i)(pi ≥ λ)].
Also, if aS(x) = v, then the value v has the same meaning as {(v, 1)}.
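As a small illustration, the two conditions can be checked value by value. This sketch assumes a weighted value is stored as a dict from attribute values to exact fractions and a plain value as a string; the function name `is_admissible` and the sample thresholds are hypothetical.

```python
from fractions import Fraction as F

def is_admissible(value, domain, lam):
    """Check one attribute value against the type-lambda conditions.

    A plain value v is read as {(v, 1)}; a dict maps values to weights.
    """
    if not isinstance(value, dict):
        return value in domain
    return (
        all(v in domain for v in value)
        and sum(value.values()) == 1
        and all(p >= lam for p in value.values())
    )

Va = {"a1", "a2", "a3"}
weighted = {"a1": F(1, 3), "a2": F(2, 3)}

print(is_admissible(weighted, Va, F(1, 4)))  # True
print(is_admissible(weighted, Va, F(1, 2)))  # False, because 1/3 < 1/2
```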
X    a                    b                    c                    d    e
x1   (a1,1/3),(a2,2/3)    (b1,2/3),(b2,1/3)    c1                   d1   (e1,1/2),(e2,1/2)
x2   (a2,1/4),(a3,3/4)    (b1,1/3),(b2,2/3)    -                    d2   e1
x3   a1                   b2                   (c1,1/2),(c3,1/2)    d2   e3
x4   a3                   -                    c2                   d1   (e1,2/3),(e2,1/3)
x5   (a1,2/3),(a2,1/3)    b1                   c2                   -    e1
x6   a2                   b2                   c3                   d2   (e2,1/3),(e3,2/3)
x7   a2                   (b1,1/4),(b2,3/4)    (c1,1/3),(c2,2/3)    d2   e2
x8   a3                   b2                   c1                   d1   e3
(- marks a cell with no value given)
Incomplete Information System
Queries:
q1(a,b) = a1 * b1
q2(a,b) = a1 + b1
J(a1) = {(x1,1/3), (x3,1), (x5,2/3)}
J(b1) = {(x1,2/3), (x2,1/3), (x4,1/2), (x5,1), (x7,1/4)}
What about J(a1 * b1) = J(a1) ⊗ J(b1) and J(a1 + b1) = J(a1) ⊕ J(b1)?
Interpretations for ⊗ and ⊕
Assume that: J(a1) = {(xi, pi): i ∈ K} and J(b1) = {(xi, qi): i ∈ K}.
Interpretation T0
J(a1) ⊗0 J(b1) = {(xi, S1(pi, qi)): i ∈ K}, where S1(pi, qi) = [if max(pi, qi) = 1, then min(pi, qi), else 0].
J(a1) ⊕0 J(b1) = {(xi, S2(pi, qi)): i ∈ K}, where S2(pi, qi) = [if min(pi, qi) = 0, then max(pi, qi), else 1].
Interpretation T1
J(a1) ⊗1 J(b1) = {(xi, max{0, pi + qi − 1}): i ∈ K}
J(a1) ⊕1 J(b1) = {(xi, min{1, pi + qi}): i ∈ K}
Interpretation T2
J(a1) ⊗2 J(b1) = {(xi, [pi·qi]/[2 − (pi + qi − pi·qi)]): i ∈ K}
J(a1) ⊕2 J(b1) = {(xi, [pi + qi]/[1 + pi·qi]): i ∈ K}
Interpretation T3
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕3 J(b1) = {(xi, pi + qi − pi·qi): i ∈ K}
Interpretation T4
J(a1) ⊗4 J(b1) = {(xi, [pi·qi]/[pi + qi − pi·qi]): i ∈ K}
J(a1) ⊕4 J(b1) = {(xi, [pi + qi − 2·pi·qi]/[1 − pi·qi]): i ∈ K}
Fuzzy Interpretation T5
J(a1) ⊗5 J(b1) = {(xi, min{pi, qi}): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}
Another possible interpretation T
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}
Interpretations T0, T5, T satisfy the property: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
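To make the interpretations concrete, the sketch below evaluates a few of them on the sets J(a1) and J(b1) given earlier and tests the distributivity property on a small grid of weights. Assumptions made for illustration: objects missing from a set get weight 0, objects with resulting weight 0 are omitted, and exact fractions are used. The names `INTERPRETATIONS`, `combine`, and `is_distributive` are hypothetical.

```python
from fractions import Fraction as F

J_a1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_b1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}

# (conjunction, disjunction) pairs for some of the interpretations listed above.
INTERPRETATIONS = {
    "T1": (lambda p, q: max(F(0), p + q - 1), lambda p, q: min(F(1), p + q)),
    "T3": (lambda p, q: p * q, lambda p, q: p + q - p * q),
    "T5": (min, max),
    "T": (lambda p, q: p * q, max),
}

def combine(J1, J2, op):
    """Combine two weighted object sets pointwise; absent objects have weight 0."""
    objects = set(J1) | set(J2)
    combined = {x: op(J1.get(x, F(0)), J2.get(x, F(0))) for x in objects}
    return {x: w for x, w in combined.items() if w > 0}

def is_distributive(name, grid):
    """Test a*(b+c) = (a*b)+(a*c) for every triple of weights in the grid."""
    conj, disj = INTERPRETATIONS[name]
    return all(
        conj(p, disj(q, r)) == disj(conj(p, q), conj(p, r))
        for p in grid for q in grid for r in grid
    )

grid = [F(k, 4) for k in range(5)]
for name, (conj, disj) in INTERPRETATIONS.items():
    print(name, combine(J_a1, J_b1, conj), is_distributive(name, grid))
```

On this grid, T5 and T pass the distributivity test while T1 and T3 do not, which agrees with the property stated above for T5 and T. A grid test can only refute the property, not prove it.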
Incomplete IS [S2 is finer than S1]
Assume:
S1, S2 are partially incomplete IS of type λ.
The same objects are stored in both systems.
The same attributes are used to describe objects.
aS1(x) = {(a1i, p1i): 1 ≤ i ≤ m1}, aS2(x) = {(a2i, p2i): 1 ≤ i ≤ m2}
S2 is finer than S1 if:
(∀x ∈ X)(∀a ∈ A)[card(aS1(x)) ≥ card(aS2(x))]
(∀x ∈ X)(∀a ∈ A)[card(aS1(x)) = card(aS2(x)) → Σ i≠j |p2i − p2j| > Σ i≠j |p1i − p1j|]
Incomplete Information System
S2 finer than S1
The slide shows two versions of the system side by side (labels on the slide: S2, S1).

First table:
X    a                    b                    c                    d    e
x1   (a1,1/2),(a2,1/2)    -                    c1                   d1   -
x2   (a2,1/4),(a3,3/4)    (b1,1/3),(b2,2/3)    -                    d2   e1
x3   -                    b2                   (c1,1/2),(c3,1/2)    d2   e3
x4   a3                   -                    c2                   d1   (e1,2/3),(e2,1/3)
x5   (a1,1/2),(a2,1/2)    b1                   c2                   -    e1
x6   a2                   b2                   c3                   d2   (e2,1/3),(e3,2/3)
x7   a2                   (b1,1/4),(b2,3/4)    (c1,1/3),(c2,2/3)    d2   e2
x8   a3                   b2                   c1                   d1   e3

Second table:
X    a                    b                    c                    d    e
x1   (a1,1/3),(a2,2/3)    (b1,2/3),(b2,1/3)    c1                   d1   (e1,1/2),(e2,1/2)
x2   (a2,1/4),(a3,3/4)    (b1,1/3),(b2,2/3)    -                    d2   e1
x3   a1                   b2                   (c1,1/2),(c3,1/2)    d2   e3
x4   a3                   -                    c2                   d1   (e1,2/3),(e2,1/3)
x5   (a1,2/3),(a2,1/3)    b1                   c2                   -    e1
x6   a2                   b2                   c3                   d2   (e2,1/3),(e3,2/3)
x7   a2                   (b1,1/4),(b2,3/4)    (c1,1/3),(c2,2/3)    d2   e2
x8   a3                   b2                   c1                   d1   e3
(- marks a cell with no value given)
Failing Queries in Collaborative IS
Assume:
• Query q = q(B) is submitted to S = (X, A, V), where:
• B is the set of all attributes used in q
• A ∩ B ≠ ∅
• Attributes in B \ (A ∩ B) are foreign for S
• Two information systems can collaborate if they agree on the ontology of some of their common attributes
• The granularity of values of attributes used in a query q may differ from the granularity of values of the same attributes in S
Query q(B) can be processed at site S by discovering definitions of values of attributes from B \ (A ∩ B) at some of the remote sites for S.
With each certain rule discovered at a remote site, a number of additional rules can also be discovered.
Example
age (child (≤ 17), young (18, …, 29), middle-aged (30, …, 60), old (61, …, 80), senile (≥ 81))
salary (low (0, …, 40K), medium (50K, …, 70K), high (80K, …, 100K), very-high (> 100K))
(age, young) → (salary, 40K)
(age, young) → (salary, low)
(age, N) → (salary, 40K)
(age, N) → (salary, low)
S = (X, A, V) is the client site.
A = {a, b, d, …}, c ∉ A
Va = {a1, a2, a3}
Vb = {b[1,1], b[1,2], b[1,3], b[2,1], b[2,2], b[2,3], b[3,1], b[3,2], b[3,3]}
Vd = {d1, d2, d3}
Semantics of hierarchical attributes {a, b, c, d} used by S and systems collaborating with S:
• a(a1(a[1,1], a[1,2], a[1,3]), a2(a[2,1], a[2,2], a[2,3]), a3(a[3,1], a[3,2], a[3,3]))
• b(b1(b[1,1], b[1,2], b[1,3]), b2(b[2,1], b[2,2], b[2,3]), b3(b[3,1], b[3,2], b[3,3]))
• c(c1(c[1,1], c[1,2], c[1,3]), c2(c[2,1], c[2,2], c[2,3]), c3(c[3,1], c[3,2], c[3,3]))
• d(d1(d[1,1], d[1,2], d[1,3]), d2(d[2,1], d[2,2], d[2,3]), d3(d[3,1], d[3,2], d[3,3]))
Assume:
Query q = a[i,1]*b[i]*c[i,3]*d[i] is submitted to S.
q = a[i,1]*[b[i,1] + b[i,2] + b[i,3]]*c[i,3]*d[i]
  = [a[i,1]*b[i,1]*c[i,3]*d[i]] + [a[i,1]*b[i,2]*c[i,3]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]]
How to solve query q?
1. Generalize a[i,1] to a[i] and c[i,3] to c. The query has a new form:
q1 = a[i]*[b[i,1] + b[i,2] + b[i,3]]*d[i]
2. a. Objects matching q1 may satisfy q.
   b. Generalizations decrease the chance that retrieved objects will match query q.
Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.
S stores attribute values at the granularity a[i], b[i,j], d[i].
After generalization, the query has the form:
q1 = a[i]*b[i]*d[i] = [a[i]*b[i,1]*d[i]] + [a[i]*b[i,2]*d[i]] + [a[i]*b[i,3]*d[i]]
Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.
Assume that d[i]*b[i,1] → a[i,2] and d[i]*b[i,2] → c[i,3] are certain rules extracted at a remote site for S.
The first rule contradicts a[i,1], so the disjunct containing b[i,1] is dropped; the second rule makes c[i,3] redundant in the disjunct containing b[i,2].
We get q = [a[i,1]*b[i,2]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]], where the first disjunct is local and the second is non-local.
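The use of certain rules to prune and shorten the disjuncts of q can be sketched as below, with i fixed to 1 so the example is runnable. The encoding (a disjunct as a dict from attribute to index tuple, a rule as a premise dict plus one concluded literal) and the helper names `compatible`, `implies`, and `simplify` are assumptions made for illustration only.

```python
def compatible(u, v):
    """Two hierarchical values agree if one is a prefix of the other."""
    n = min(len(u), len(v))
    return u[:n] == v[:n]

def implies(specific, general):
    """A value implies every value on the path above it in the hierarchy."""
    return specific[:len(general)] == general

def simplify(disjuncts, rules):
    """Drop disjuncts contradicted by a certain rule; remove implied conjuncts."""
    kept = []
    for disjunct in disjuncts:
        term = dict(disjunct)
        contradicted = False
        for premise, (attr, value) in rules:
            if any(term.get(k) != v for k, v in premise.items()):
                continue  # the rule does not apply to this disjunct
            if attr not in term:
                continue
            if not compatible(term[attr], value):
                contradicted = True  # e.g. the rule yields a[1,2] but q asks for a[1,1]
                break
            if implies(value, term[attr]):
                del term[attr]  # the conjunct follows from the rule, so it is redundant
        if not contradicted:
            kept.append(term)
    return kept

# q = a[1,1]*b[1]*c[1,3]*d[1], with b[1] expanded into b[1,1] + b[1,2] + b[1,3].
disjuncts = [
    {"a": (1, 1), "b": (1, k), "c": (1, 3), "d": (1,)} for k in (1, 2, 3)
]
rules = [
    ({"d": (1,), "b": (1, 1)}, ("a", (1, 2))),  # d[1]*b[1,1] -> a[1,2]
    ({"d": (1,), "b": (1, 2)}, ("c", (1, 3))),  # d[1]*b[1,2] -> c[1,3]
]

simplified = simplify(disjuncts, rules)
for term in simplified:
    print(term)
```

The two remaining disjuncts correspond to a[1,1]*b[1,2]*d[1] and a[1,1]*b[1,3]*c[1,3]*d[1], matching the result on the slide for i = 1.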
q = q(a[3,1,3,2], b1, c2)
Possible generalization: q1 = q1(a3, b1, c2)
Rules extracted at remote sites which define any of the values below a[3] will help in solving q.
Rules describing values not belonging to {a[3,1], a[3,1,3], a[3,1,3,2]} are used to reduce the size of the query (to remove some conjuncts).
Questions?
Thank You