Solving Failing Queries *) Zbigniew W. Ras, University of North Carolina, Charlotte, N.C. 28223, USA, ras@uncc.edu

Post on 21-Dec-2015


Page 1: Solving Failing Queries *) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA ras@uncc.edu

Solving Failing Queries *)

Zbigniew W. Ras

University of North Carolina

Charlotte, N.C. 28223, USA

ras@uncc.edu

Failing Query Problems

Problem 1. Given S(A) with hierarchical attributes and a query q(B) that returns an empty answer, how can the query's constraints be relaxed so that it returns a non-empty set of tuples?

Assumption: S(A) is an information system based on attributes from A, and q(B) is a query based on attributes from B. Query q(B) is not local for system S(A) if B \ A ≠ ∅ (that is, B is not a subset of A).

Problem 2. Given S(A), which represents one of the sites of a distributed autonomous information system, and a non-local query q(B) submitted to S(A), where A ∩ B ≠ ∅, how can q(B) be modified so that it can be answered?

Failing Query Problem 1

Problem 1. [Cooperative Query Answering]
[Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]

Attribute hierarchies:

age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)

salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)

Example of a query:
(age, 18) ∧ (salary, 40k)

Possible relaxations:
(age, young) ∧ (salary, 40k)
(age, 18) ∧ (salary, low)
(age, young) ∧ (salary, low)

Preference for relaxation: [1 - age, 2 - salary]
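
The relaxations listed on this slide can be enumerated mechanically. The following is a minimal Python sketch, not the method of any of the cited papers: the `PARENT` table, the `relaxations` function, and the ordering rule (fewest generalized attributes first, ties broken by the stated preference) are assumptions made for illustration.

```python
from itertools import product

# Parent (one level up) of each query value in the hierarchies on the slide.
# Only the two values used by the example query are listed (an assumption).
PARENT = {
    "age": {18: "young"},
    "salary": {"40k": "low"},
}

def relaxations(query, preference):
    """Return relaxed queries, ordered by how many attributes were
    generalized and then by the preference list (assumed ordering rule)."""
    attrs = list(query)
    # Each attribute either keeps its value or moves up one level.
    options = [[query[a], PARENT[a][query[a]]] for a in attrs]
    candidates = []
    for combo in product(*options):
        relaxed = dict(zip(attrs, combo))
        if relaxed != query:          # skip the original, failing query
            candidates.append(relaxed)

    def rank(relaxed):
        changed = [a for a in attrs if relaxed[a] != query[a]]
        return (len(changed), [preference.index(a) for a in changed])

    return sorted(candidates, key=rank)

result = relaxations({"age": 18, "salary": "40k"}, ["age", "salary"])
for r in result:
    print(r)
```

With preference [age, salary], the sketch lists the relaxations in the same order as the slide.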

Failing Query Problem 1

Information System S1

X | a | b | c | e
x1 | a[1,2] | b1 | c2 | e1
x2 | a[1,2] | b1 | c2 | e1
x3 | a[1,1] | b1 | c1 | e1
x4 | a[2,1] | b2 | c2 | e2
x5 | a[1,1] | b2 | c1 | e2

Attribute a is hierarchical, with the following structure in Lisp-like notation: a(a1(a[1,1], a[1,2]), a2(a[2,1], …))

q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q).

Solution: q can be generalized by QAS to q1 = a1*c1, which matches objects x3 and x5 in S1.

Question: Which of these two objects (x3 or x5) is closer to q?

Failing Query Problem 1

Information System S1

X | a | b | c | e
x1 | a[1,2] | b1 | c2 | e1
x2 | a[1,2] | b1 | c2 | e1
x3 | a[1,1] | b1 | c1 | e1
x4 | a[2,1] | b2 | c2 | e2
x5 | a[1,1] | b2 | c1 | e2

q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q).

Question: Which of these two objects (x3 or x5) is closer to q?

Let k ≤ m. Then the distance:
δa(a[i(1), i(2), …, i(k)], a[j(1), j(2), …, j(m)]) = if [i(1) = j(1) ∧ … ∧ i(n) = j(n) ∧ [n = k ≤ m ∨ i(n+1) ≠ j(n+1)]], then 1/2^n, else 0

δ(xi, xj) = δa + δb + δc + δe

δ(q, x3) = ½ + 1 + 1 + 1 = 3½
δ(q, x5) = ½ + 1 + 1 + 1 = 3½

Result: both objects are equally close to q.
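
The term δa for the hierarchical attribute a can be sketched in a few lines of Python. This is an illustration under one assumption: n is read as the length of the longest common prefix of the two index paths. The function name `delta_a` is hypothetical, and the sketch covers only the δa term, since the slide does not spell out how δb, δc, and δe are obtained.

```python
from fractions import Fraction

def delta_a(u, v):
    """1/2**n when the index paths u and v share a common prefix of
    length n >= 1, otherwise 0 (n = longest common prefix, an assumption)."""
    n = 0
    for i, j in zip(u, v):
        if i != j:
            break
        n += 1
    return Fraction(1, 2 ** n) if n > 0 else Fraction(0)

d_q_x3 = delta_a((1, 2), (1, 1))  # a[1,2] in q against a[1,1] in x3
d_q_x4 = delta_a((1, 2), (2, 1))  # a[1,2] in q against a[2,1] in x4
print(d_q_x3, d_q_x4)
```

For x3 and x5 the value is ½, which agrees with the first term in both sums on the slide.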

Failing Query Problem 1

[Muslea, KDD'04]
On-line, query-guided algorithm for relaxing failing DNF queries

Example. A = {Price, CPU, Display, Weight}. Failing query:
q(A) = [Price ≤ $2000] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 3lbs].

A small, randomly chosen subset of the target DB is used to discover implicit relationships between the values of the attributes used in the query.

Discovered rules:
r1 = [[Price ≤ $2900] ∧ [Display ≥ 18''] ∧ [Weight ≤ 4lbs] → [CPU ≥ 2.5GHz]].
r2 = [[Price ≥ $3500] → [CPU ≥ 2.5GHz]].
r3 = …….

A nearest-neighbor technique is used to identify which rule is most similar to the failing query. Assume that r1 is such a rule.

Relaxed query:
[Price ≤ $2900] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 4lbs].
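
The last step, combining the failing query with the selected rule r1, can be illustrated as follows. This is a simplified Python sketch and not Muslea's algorithm: rule learning and nearest-neighbor selection are left out, and the comparison operators (≤ for Price and Weight, ≥ for CPU and Display) are assumptions, because the transcript lost them. The names `failing_query`, `rule_r1`, and `relax` are hypothetical.

```python
# A constraint is (operator, bound): "<=" is an upper bound, ">=" a lower bound.
failing_query = {
    "Price": ("<=", 2000),
    "CPU": (">=", 2.5),
    "Display": (">=", 17),
    "Weight": ("<=", 3),
}

# Rule r1, with premise and conclusion merged into one set of constraints.
rule_r1 = {
    "Price": ("<=", 2900),
    "Display": (">=", 18),
    "Weight": ("<=", 4),
    "CPU": (">=", 2.5),
}

def relax(query, rule):
    """For each attribute keep the looser of the query bound and the rule bound."""
    relaxed = {}
    for attr, (op, bound) in query.items():
        rule_op, rule_bound = rule.get(attr, (op, bound))
        if rule_op != op:
            relaxed[attr] = (op, bound)                   # not comparable: keep query bound
        elif op == "<=":
            relaxed[attr] = (op, max(bound, rule_bound))  # larger upper bound is looser
        else:
            relaxed[attr] = (op, min(bound, rule_bound))  # smaller lower bound is looser
    return relaxed

relaxed_query = relax(failing_query, rule_r1)
print(relaxed_query)
```

Under these assumptions the sketch reproduces the relaxed query on the slide: Price and Weight are loosened to the rule's values, while Display keeps 17'' because the rule's 18'' would be stricter.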

Failing Query Problem 2

Problem 2. [Collaborative Query Answering]
[Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska]

Example of a non-local query

Database:
Flights(airline, departure_time, arrival_time, departure_airport, arrival_airport).

select * from Flights
where airline = "Delta"
and departure_time = "morning"
and departure_airport = "Charlotte"
and aircraft = "Boeing"

The attribute aircraft is not part of Flights, which makes the query non-local.

Query Processing in Collaborative Systems

System S1

X | a | b | c | e
x1 | a[1,2] | b1 | c2 | e1
x2 | a[1,2] | b1 | c2 | e1
x3 | a[1,1] | b1 | c1 | e1
x4 | a[2,1] | b2 | c2 | e2
x5 | a[1,1] | b2 | c2 | e2

System S

X | a | b | c | d
y1 | a1 | b2 | c1 | d1
y2 | a2 | b1 | c2 | d2
y3 | a1 | b[1,1] | c2 | d2
y4 | a1 | b[1,1] | c2 | d1
y5 | a2 | b2 | c1 | d2

q = a1*b1*e1 submitted to S fails, because attribute e is not in S (clearly b[1,1] is also b1).

Find definitions of e1 in S1: b1 → e1; c1 → e1; a[1,2] → e1.

Replacing e1 by its definitions, q = a1*b1*e1 becomes a1*b1*(b1 + c1 + a[1,2]) = a1*b1 + a1*b1*c1 + a1*b1*a[1,2] = a1*b1.

Objects y3, y4 satisfy the query q.
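
The two steps on this slide (find definitions of e1 at the remote site S1, then answer the rewritten query at S) can be sketched in Python. The encoding is an assumption made for illustration: each value is a path in its attribute hierarchy, so b1 is (1,) and b[1,1] is (1, 1), and only single-value rule premises are searched. The function names are hypothetical.

```python
# Values are paths in an attribute hierarchy: b1 -> (1,), b[1,1] -> (1, 1).
S1 = {
    "x1": {"a": (1, 2), "b": (1,), "c": (2,), "e": (1,)},
    "x2": {"a": (1, 2), "b": (1,), "c": (2,), "e": (1,)},
    "x3": {"a": (1, 1), "b": (1,), "c": (1,), "e": (1,)},
    "x4": {"a": (2, 1), "b": (2,), "c": (2,), "e": (2,)},
    "x5": {"a": (1, 1), "b": (2,), "c": (2,), "e": (2,)},
}
S = {
    "y1": {"a": (1,), "b": (2,), "c": (1,), "d": (1,)},
    "y2": {"a": (2,), "b": (1,), "c": (2,), "d": (2,)},
    "y3": {"a": (1,), "b": (1, 1), "c": (2,), "d": (2,)},
    "y4": {"a": (1,), "b": (1, 1), "c": (2,), "d": (1,)},
    "y5": {"a": (2,), "b": (2,), "c": (1,), "d": (2,)},
}

def matches(value, term):
    """A stored value satisfies a term if it equals the term or lies below it."""
    return value[:len(term)] == term

def certain_definitions(system, target_attr, target_value):
    """Single-value premises (attr, value) whose every supporting object
    carries target_value for target_attr (certain rules)."""
    candidates = {(attr, val)
                  for row in system.values()
                  for attr, val in row.items() if attr != target_attr}
    rules = set()
    for attr, val in candidates:
        support = [row for row in system.values() if row[attr] == val]
        if all(row[target_attr] == target_value for row in support):
            rules.add((attr, val))
    return rules

def answer(system, conjuncts, alternatives):
    """Objects satisfying all local conjuncts and at least one alternative."""
    found = []
    for name, row in system.items():
        local_ok = all(matches(row[attr], term) for attr, term in conjuncts)
        defined_ok = any(attr in row and matches(row[attr], term)
                         for attr, term in alternatives)
        if local_ok and defined_ok:
            found.append(name)
    return sorted(found)

rules_e1 = certain_definitions(S1, "e", (1,))          # definitions of e1 in S1
answer_q = answer(S, [("a", (1,)), ("b", (1,))], rules_e1)
print(rules_e1, answer_q)
```

The sketch finds the same three definitions of e1 as the slide and returns y3 and y4 as the answer.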

Query Processing in Collaborative Systems

System S1

X | a | b | c | e
x1 | a[1,2] | b1 | c2 | e1
x2 | a[1,2] | b1 | c2 | e1
x3 | a[1,1] | b1 | c1 | e1
x4 | a[2,1] | b2 | c2 | e2
x5 | a[1,1] | b2 | c2 | e2

System S

X | a | b | c | d
y1 | a1 | b2 | c1 | d1
y2 | a2 | b1 | c2 | d2
y3 | a1 | b[1,1] | c2 | d2
y4 | a1 | b[1,1] | c2 | d1
y5 | a2 | b2 | c1 | d2

q = a[1,2]*b[1,1] submitted to S1 fails because of the granularity of b: S1 stores b only at the level of b1, b2.

Find a definition of b[1,1] in S: a1*c2 → b[1,1].

Replacing b[1,1] by its definition, q = a[1,2]*b[1,1] becomes a[1,2]*a1*c2 = a[1,2]*c2.

Objects x1, x2 satisfy the query q.

Failing Query Problem 2

Failing Query Problem 2

Query Processing in Incomplete IS

X is a set of objects, A is a set of attributes, Va is a set of values of attribute a, where a ∈ A, and V = ∪{Va : a ∈ A}.

S = (X, A, V) is a partially incomplete information system of type λ, if the following conditions hold for any x ∈ X, a ∈ A:

- if a_S(x) is defined, then [a_S(x) ∈ Va or a_S(x) = {(vi, pi): 1 ≤ i ≤ m}],

- if [a_S(x) = {(vi, pi): 1 ≤ i ≤ m}], then [Σ i=1…m pi = 1 and (∀i)(pi ≥ λ)].

- Also, if [a_S(x) = v], then the value v has the same meaning as {(v, 1)}.
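
The two conditions on a weighted attribute value can be checked directly. The following Python sketch assumes that a cell is stored either as a single value, as `None` when undefined, or as a dict from values to weights, and that the threshold in the second condition is the type λ (pi ≥ λ). The function name `is_valid_cell` and the storage format are hypothetical.

```python
from fractions import Fraction as F

def is_valid_cell(cell, lam):
    """True if the cell is allowed in a partially incomplete IS of type lam."""
    if cell is None or not isinstance(cell, dict):
        return True                      # undefined, or a single value v = {(v, 1)}
    weights = list(cell.values())
    # Weights must sum to 1 and each weight must reach the threshold lam.
    return sum(weights) == 1 and all(p >= lam for p in weights)

ok = is_valid_cell({"a1": F(1, 3), "a2": F(2, 3)}, F(1, 4))
too_small = is_valid_cell({"b1": F(1, 4), "b2": F(3, 4)}, F(1, 3))
print(ok, too_small)
```

Exact fractions are used so that the test for a sum equal to 1 is not affected by floating-point rounding.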

Incomplete Information System

X | a | b | c | d | e
x1 | (a1,1/3),(a2,2/3) | (b1,2/3),(b2,1/3) | c1 | d1 | (e1,1/2),(e2,1/2)
x2 | (a2,1/4),(a3,3/4) | (b1,1/3),(b2,2/3) | | d2 | e1
x3 | a1 | b2 | (c1,1/2),(c3,1/2) | d2 | e3
x4 | a3 | | c2 | d1 | (e1,2/3),(e2,1/3)
x5 | (a1,2/3),(a2,1/3) | b1 | c2 | | e1
x6 | a2 | b2 | c3 | d2 | (e2,1/3),(e3,2/3)
x7 | a2 | (b1,1/4),(b2,3/4) | (c1,1/3),(c2,2/3) | d2 | e2
x8 | a3 | b2 | c1 | d1 | e3

Queries:
q1(a,b) = a1 * b1
q2(a,b) = a1 + b1

J(a1) = {(x1,1/3), (x3,1), (x5,2/3)}
J(b1) = {(x1,2/3), (x2,1/3), (x4,1/2), (x5,1), (x7,1/4)}

What about
J(a1 * b1) = J(a1) ⊗ J(b1),
J(a1 + b1) = J(a1) ⊕ J(b1)?

Interpretations for ⊗ and ⊕

Assume that: J(a1) = {(xi, pi): i ∈ K} and J(b1) = {(xi, qi): i ∈ K}.

Interpretation T0

J(a1) ⊗0 J(b1) = {(xi, S1(pi, qi)): i ∈ K}, where S1(pi, qi) = [if max(pi, qi) = 1, then min(pi, qi), else 0].

J(a1) ⊕0 J(b1) = {(xi, S2(pi, qi)): i ∈ K}, where S2(pi, qi) = [if min(pi, qi) = 0, then max(pi, qi), else 1].

Interpretation T1

J(a1) ⊗1 J(b1) = {(xi, max{0, pi + qi - 1}): i ∈ K} and
J(a1) ⊕1 J(b1) = {(xi, min{1, pi + qi}): i ∈ K}.

Interpretation T2

J(a1) ⊗2 J(b1) = {(xi, [pi·qi]/[2 - (pi + qi - pi·qi)]): i ∈ K} and
J(a1) ⊕2 J(b1) = {(xi, [pi + qi]/[1 + pi·qi]): i ∈ K}.

Interpretations for ⊗ and ⊕

Interpretation T3
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕3 J(b1) = {(xi, pi + qi - pi·qi): i ∈ K}

Interpretation T4
J(a1) ⊗4 J(b1) = {(xi, [pi·qi]/[pi + qi - pi·qi]): i ∈ K}
J(a1) ⊕4 J(b1) = {(xi, [pi + qi - 2·pi·qi]/[1 - pi·qi]): i ∈ K}

Fuzzy Interpretation T5
J(a1) ⊗5 J(b1) = {(xi, min{pi, qi}): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}

Another possible interpretation T
J(a1) ⊗3 J(b1) = {(xi, pi·qi): i ∈ K}
J(a1) ⊕5 J(b1) = {(xi, max{pi, qi}): i ∈ K}

Interpretations T0, T5, T satisfy the property: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
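
Some of these interpretations can be applied to the sets J(a1) and J(b1) given earlier. The Python sketch below is an illustration under two assumptions: an object missing from one of the sets is given weight 0 there, and objects whose resulting weight is 0 are left out of the result. The names `CONJ`, `DISJ`, and `combine` are hypothetical, and only T1, T3, and T5 are included.

```python
from fractions import Fraction as F

J_a1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_b1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}

# Weight functions for conjunction and disjunction under T1, T3, T5.
CONJ = {
    "T1": lambda p, q: max(F(0), p + q - 1),
    "T3": lambda p, q: p * q,
    "T5": lambda p, q: min(p, q),
}
DISJ = {
    "T1": lambda p, q: min(F(1), p + q),
    "T3": lambda p, q: p + q - p * q,
    "T5": lambda p, q: max(p, q),
}

def combine(op, J1, J2):
    """Apply op objectwise; absent objects count as weight 0 (assumption)."""
    keys = sorted(set(J1) | set(J2))
    out = {x: op(J1.get(x, F(0)), J2.get(x, F(0))) for x in keys}
    return {x: w for x, w in out.items() if w > 0}

and_T1 = combine(CONJ["T1"], J_a1, J_b1)
and_T3 = combine(CONJ["T3"], J_a1, J_b1)
or_T5 = combine(DISJ["T5"], J_a1, J_b1)

# Distributivity check on a small grid of weights, for the mixed
# interpretation T (product, max) and, for contrast, for T3.
grid = [F(k, 4) for k in range(5)]
triples = [(a, b, c) for a in grid for b in grid for c in grid]
distributive_T = all(a * max(b, c) == max(a * b, a * c) for a, b, c in triples)
distributive_T3 = all(
    CONJ["T3"](a, DISJ["T3"](b, c))
    == DISJ["T3"](CONJ["T3"](a, b), CONJ["T3"](a, c))
    for a, b, c in triples
)
print(and_T1, and_T3, or_T5, distributive_T, distributive_T3)
```

On this grid the mixed interpretation T is distributive while T3 is not, which is in line with T3 being absent from the list above. The grid check covers only T and T3 and is an illustration, not a proof.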

Failing Query Problem 2

Incomplete IS [S2 is finer than S1]

Assume:

- S1, S2 are partially incomplete IS of type λ
- The same objects are stored in both systems
- The same attributes are used to describe objects
- a_S1(x) = {(a1i, p1i): 1 ≤ i ≤ m1}, a_S2(x) = {(a2i, p2i): 1 ≤ i ≤ m2}

Incomplete Information System

S2 is finer than S1 if:

(∀x ∈ X)(∀a ∈ A)[card(a_S1(x)) ≥ card(a_S2(x))]

(∀x ∈ X)(∀a ∈ A)[card(a_S1(x)) = card(a_S2(x)) → [Σ i≠j |p2i - p2j| > Σ i≠j |p1i - p1j|]]
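
For a single attribute value the comparison can be sketched as follows. The Python code is a literal, cell-level reading of the two conditions and rests on assumptions: a cell is a dict from values to weights, and the sums run over ordered pairs i ≠ j. The names `spread` and `cell_is_finer` and the example cells are hypothetical.

```python
from fractions import Fraction as F
from itertools import permutations

def spread(cell):
    """Sum of |p_i - p_j| over all ordered pairs i != j of weights in the cell."""
    return sum(abs(p - q) for p, q in permutations(cell.values(), 2))

def cell_is_finer(cell_s1, cell_s2):
    """True if cell_s2 (from S2) is finer than cell_s1 (from S1)."""
    if len(cell_s1) < len(cell_s2):
        return False                      # violates card(S1) >= card(S2)
    if len(cell_s1) > len(cell_s2):
        return True                       # strictly fewer candidate values in S2
    return spread(cell_s2) > spread(cell_s1)  # equal cardinality: compare spreads

uniform = {"a1": F(1, 2), "a2": F(1, 2)}
skewed = {"a1": F(1, 3), "a2": F(2, 3)}
single = {"a1": F(1)}
print(cell_is_finer(uniform, skewed), cell_is_finer(uniform, single))
```

Under this literal reading two identical cells are not finer than each other, because the inequality between the spreads is strict.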

S2 finer than S1

S1

X | a | b | c | d | e
x1 | (a1,1/2),(a2,1/2) | | c1 | d1 |
x2 | (a2,1/4),(a3,3/4) | (b1,1/3),(b2,2/3) | | d2 | e1
x3 | | b2 | (c1,1/2),(c3,1/2) | d2 | e3
x4 | a3 | | c2 | d1 | (e1,2/3),(e2,1/3)
x5 | (a1,1/2),(a2,1/2) | b1 | c2 | | e1
x6 | a2 | b2 | c3 | d2 | (e2,1/3),(e3,2/3)
x7 | a2 | (b1,1/4),(b2,3/4) | (c1,1/3),(c2,2/3) | d2 | e2
x8 | a3 | b2 | c1 | d1 | e3

S2

X | a | b | c | d | e
x1 | (a1,1/3),(a2,2/3) | (b1,2/3),(b2,1/3) | c1 | d1 | (e1,1/2),(e2,1/2)
x2 | (a2,1/4),(a3,3/4) | (b1,1/3),(b2,2/3) | | d2 | e1
x3 | a1 | b2 | (c1,1/2),(c3,1/2) | d2 | e3
x4 | a3 | | c2 | d1 | (e1,2/3),(e2,1/3)
x5 | (a1,2/3),(a2,1/3) | b1 | c2 | | e1
x6 | a2 | b2 | c3 | d2 | (e2,1/3),(e3,2/3)
x7 | a2 | (b1,1/4),(b2,3/4) | (c1,1/3),(c2,2/3) | d2 | e2
x8 | a3 | b2 | c1 | d1 | e3

Failing Queries in Collaborative IS

Assume:

• Query q = q(B) is submitted to S = (X, A, V), where:

• B is the set of all attributes used in q

• A ∩ B ≠ ∅

• Attributes in B \ (A ∩ B) are foreign for S

• Two information systems can collaborate if they agree on the ontology of some of their common attributes

• The granularity of values of attributes used in a query q may differ from the granularity of values of the same attributes in S

Failing Queries in Collaborative IS

Query q(B) can be processed at site S by discovering definitions of values of attributes from B \ (A ∩ B) at some of the remote sites for S.

With each certain rule discovered at a remote site, a number of additional rules can also be discovered.

Failing Query Problem 2

Example

age(child(≤ 17), young(18, …, 29), middle-aged(30, …, 60), old(61, …, 80), senile(≥ 81))

salary(low(0, …, 40K), medium(50K, …, 70K), high(80K, …, 100K), very-high(> 100K))

(age, young) ∧ (salary, 40K)

(age, young) ∧ (salary, low)

(age, N) ∧ (salary, 40K)

(age, N) ∧ (salary, low)

Failing Queries in Collaborative IS

S = (X, A, V) – client site

A = {a, b, d, …}, c ∉ A

Va={a1, a2, a3}, Vb={b1,1, b1,2, b1,3, b2,1, b2,2, b2,3, b3,1, b3,2, b3,3}

Vd={d1, d2, d3}

Semantics of hierarchical attributes {a, b, c, d} used by S and systems collaborating with S:

• a(a1[a1,1, a1,2, a1,3], a2[a2,1, a2,2, a2,3], a3[a3,1, a3,2, a3,3])

• b(b1[b1,1, b1,2, b1,3], b2[b2,1, b2,2, b2,3], b3[b3,1, b3,2, b3,3])

• c(c1 [c1,1, c1,2, c1,3], c2[c2,1, c2,2, c2,3], c3 [c3,1, c3,2, c3,3])

• d(d1 [d1,1, d1,2, d1,3], d2[d2,1, d2,2, d2,3], d3 [d3,1, d3,2, d3,3])

Assume:

Query q = ai,1 * bi * ci,3 * di is submitted to S.

q = ai,1 * [bi,1 + bi,2 + bi,3] * ci,3 * di =
[ai,1 * bi,1 * ci,3 * di] + [ai,1 * bi,2 * ci,3 * di] + [ai,1 * bi,3 * ci,3 * di]

How to solve query q?

1. Generalize ai,1 to ai and ci,3 to c. The query has a new form:

q1 = ai * [bi,1 + bi,2 + bi,3] * di

a. Objects matching q1 may satisfy q.
b. Generalizations decrease the chance that retrieved objects will match query q.

2. Check what values of attributes a and c are implied by di * bi,1, di * bi,2, or di * bi,3 at remote sites for S, and whether any of these rules have high confidence and support.

S: a[i], b[i,j], d[i]

q = ai,1 * [bi,1 + bi,2 + bi,3] * ci,3 * di =
[ai,1 * bi,1 * ci,3 * di] + [ai,1 * bi,2 * ci,3 * di] + [ai,1 * bi,3 * ci,3 * di]

How to solve query q?

1. Generalize ai,1 to ai and ci,3 to c. The query has a new form:

q1 = ai * bi * di = [ai * bi,1 * di] + [ai * bi,2 * di] + [ai * bi,3 * di]

2. Check what values of attributes a and c are implied by di * bi,1, di * bi,2, or di * bi,3 at remote sites for S, and whether any of these rules have high confidence and support.

Assume that di * bi,1 → ai,2 and di * bi,2 → ci,3 are certain rules extracted at a remote site for S.

The first disjunct of q is dropped, because di * bi,1 implies ai,2, which contradicts ai,1. In the second disjunct ci,3 is implied by di * bi,2 and can be removed.

We get q = [ai,1 * bi,2 * di] (local) + [ai,1 * bi,3 * ci,3 * di] (non-local)

S: a[i], b[i,j], d[i]
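
The use of the two certain rules on the expanded query can be sketched in Python. This is an illustration under stated assumptions: the index i is fixed to 1, values are stored as paths in the attribute hierarchy (so ai,1 is (1, 1) and di is (1,)), and two values of the same attribute are treated as contradictory when neither lies on the path to the other. The names `comparable` and `simplify` are hypothetical.

```python
def comparable(u, v):
    """True if one value path is a prefix of the other (same branch)."""
    k = min(len(u), len(v))
    return u[:k] == v[:k]

def simplify(dnf, rules):
    """Drop disjuncts contradicted by a certain rule and remove
    conjuncts that a certain rule already implies."""
    kept = []
    for conjunct in dnf:
        conj = dict(conjunct)
        contradicted = False
        for premise, (attr, value) in rules:
            fires = all(conj.get(a) == v for a, v in premise.items())
            if not fires or attr not in conj:
                continue
            if conj[attr] == value:
                del conj[attr]            # implied by the rule, hence redundant
            elif not comparable(conj[attr], value):
                contradicted = True       # e.g. a[i,1] against implied a[i,2]
                break
        if not contradicted:
            kept.append(conj)
    return kept

i = 1  # concrete index chosen for the illustration
base = {"a": (i, 1), "c": (i, 3), "d": (i,)}
q = [dict(base, b=(i, j)) for j in (1, 2, 3)]   # the three disjuncts of q
rules = [
    ({"d": (i,), "b": (i, 1)}, ("a", (i, 2))),  # di * bi,1 -> ai,2
    ({"d": (i,), "b": (i, 2)}, ("c", (i, 3))),  # di * bi,2 -> ci,3
]
simplified = simplify(q, rules)
print(simplified)
```

The result has two disjuncts, matching the slide: one without c, which uses only attributes stored in S, and one that still contains the foreign attribute c.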

Failing Query Problem 2

q = q(a[3,1,3,2], b1, c2)

Possible generalization:

q1 = q1(a3, b1, c2)

Rules extracted at remote sites which define any of the values below a[3] will help in solving q.

Rules describing values not belonging to {a[3,1], a[3,1,3], a[3,1,3,2]} are used to reduce the size of the query (to remove some conjuncts).

Questions?

Thank You