solving failing queries *) zbigniew w. ras university of north carolina charlotte, n.c. 28223, usa...

Post on 21-Dec-2015

TRANSCRIPT

  • Slide 1
  • Solving Failing Queries *) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA [email protected]
  • Slide 2
  • Failing Query Problems
    Problem 1. Given $S(A)$ with hierarchical attributes and a query $q(B)$ that returns an empty answer, how can one relax the query's constraints so that it returns a non-empty set of tuples?
    Assumption: $S(A)$ is an information system based on attributes from $A$, and $q(B)$ is a query based on attributes from $B$. Query $q(B)$ is not local for system $S(A)$ if $B \not\subseteq A$.
    Problem 2. Given $S(A)$, which represents one of the sites of a distributed autonomous information system, and a non-local query $q(B)$ submitted to $S(A)$, where $A \cap B \neq \emptyset$, how can $q(B)$ be modified so that it can be answered?
  • Slide 3
  • Failing Query Problem 1
    Problem 1. [Cooperative Query Answering] [Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]
    Attribute hierarchies:
      age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
      salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
    Example of a query: (age, 18) $\wedge$ (salary, 40k)
    Possible relaxations:
      (age, young) $\wedge$ (salary, 40k)
      (age, 18) $\wedge$ (salary, low)
      (age, young) $\wedge$ (salary, low)
    Preference for relaxation: [1 - age, 2 - salary]
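The relaxation order on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relaxations`, the `PARENT` table, and the 10k step inside the "low" and "high" salary ranges are not from the slides. Each query value is replaced by its parent concept, generalizing fewer attributes first and following the stated preference (age before salary).

```python
from itertools import combinations

# Leaf value -> parent concept, following the two hierarchies on the slide.
# Salaries are in thousands; the 10k step inside "low" and "high" is an assumption.
PARENT = {
    "age": {**{v: "young" for v in range(18, 30)},
            **{v: "middle-aged" for v in range(30, 61)},
            **{v: "old" for v in range(61, 81)}},
    "salary": {**{v: "low" for v in (10, 20, 30, 40)},
               **{v: "medium" for v in (50, 60, 70)},
               **{v: "high" for v in (80, 90, 100)}},
}

def relaxations(query, preference):
    """Yield relaxed copies of `query`: fewer generalized attributes first,
    and attributes earlier in `preference` before later ones."""
    for size in range(1, len(preference) + 1):
        for attrs in combinations(preference, size):
            relaxed = dict(query)
            for attr in attrs:
                relaxed[attr] = PARENT[attr][query[attr]]
            yield relaxed

query = {"age": 18, "salary": 40}
for relaxed in relaxations(query, ["age", "salary"]):
    print(relaxed)
```

With the preference list `["age", "salary"]`, the three relaxed queries come out in the same order as the "Possible relaxations" listed on the slide.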
  • Slide 4
  • Failing Query Problem 1
    Information System $S_1$:
           | a      | b  | c  | e
        x1 | a[1,2] | b1 | c2 | e1
        x2 |        | b1 | c2 | e1
        x3 | a[1,1] | b1 | c1 | e1
        x4 | a[2,1] | b2 | c2 | e2
        x5 | a[1,1] | b2 | c1 | e2
    Attribute $a$ is hierarchical, with the structure (in Lisp-like notation) $a(a_1(a_{[1,1]}, a_{[1,2]}), a_2(a_{[2,1]}, \ldots))$.
    $q = a_{[1,2]} * c_1$ submitted to $S_1$ fails (no objects in $S_1$ satisfy $q$).
    Solution: $q$ can be generalized by QAS (the query answering system) to $q_1 = a_1 * c_1$, which matches objects $x_3$ and $x_5$ in $S_1$.
    Question: Which of these two objects ($x_3$ or $x_5$) is closer to $q$?
  • Slide 5
  • Failing Query Problem 1
    Information System $S_1$:
           | a      | b  | c  | e
        x1 | a[1,2] | b1 | c2 | e1
        x2 |        | b1 | c2 | e1
        x3 | a[1,1] | b1 | c1 | e1
        x4 | a[2,1] | b2 | c2 | e2
        x5 | a[1,1] | b2 | c1 | e2
    $q = a_{[1,2]} * c_1$ submitted to $S_1$ fails (no objects in $S_1$ satisfy $q$).
    Question: Which of these two objects ($x_3$ or $x_5$) is closer to $q$?
    Let $k \le m$. Then the distance $\rho_a(a_{[i(1), i(2), \ldots, i(k)]}, a_{[j(1), j(2), \ldots, j(m)]})$ is $1/2^n$ if $[i(1) = j(1) \wedge \ldots \wedge i(n) = j(n) \wedge [n = k \le m \vee i(n+1) \neq j(n+1)]]$, and $0$ otherwise.
    $\rho(x_i, x_j) = \rho_a + \rho_b + \rho_c + \rho_e$
    $\rho(q, x_3) = \tfrac{1}{2} + 1 + 1 + 1 = 3\tfrac{1}{2}$
    $\rho(q, x_5) = \tfrac{1}{2} + 1 + 1 + 1 = 3\tfrac{1}{2}$
    Result: both are OK (the two objects are equally close to $q$).
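The per-attribute distance for the hierarchical attribute $a$ can be sketched as follows. This is a literal reading of the slide's formula and not a definitive implementation: index lists are written as tuples, $n$ is taken to be the length of the common index prefix, and the "else 0" branch is assumed to cover the case where the first indices already differ. The name `rho_a` is chosen here for illustration, and the sketch does not try to reproduce the terms for the non-hierarchical attributes $b$, $c$, $e$.

```python
def rho_a(i, j):
    """Distance between a[i(1),...,i(k)] and a[j(1),...,j(m)], read literally from
    the slide: 1/2**n for a common index prefix of length n >= 1, otherwise 0."""
    if len(i) > len(j):      # the slide assumes k <= m, so put the shorter one first
        i, j = j, i
    n = 0
    while n < len(i) and i[n] == j[n]:
        n += 1
    return 1 / 2**n if n >= 1 else 0.0

# a[1,2] (from the query) against a[1,1] (the value of a for x3 and x5)
print(rho_a((1, 2), (1, 1)))
```

Since $x_3$ and $x_5$ carry the same value $a_{[1,1]}$, this term cannot separate them, which is consistent with the tie reported on the slide.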
  • Slide 6
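  • [Muslea, KDD04] On-line, query-guided algorithm for relaxing failing DNF queries
    Example. $A$ = {Price, CPU, Display, Weight}.
    Failing query: $q(A) = [\text{Price} \le \$2000] \wedge [\text{CPU} \ge 2.5\,\text{GHz}] \wedge [\text{Display} \ge 17] \wedge [\text{Weight} \le 3\,\text{lbs}]$.
    A small, randomly chosen subset of the target DB is used to discover implicit relationships between the values of the attributes used in the query.
    Discovered rules:
      $r_1 = [\text{Price} \le \$2900] \wedge [\text{Display} \ge 18] \wedge [\text{Weight} \le 4\,\text{lbs}] \to [\text{CPU} \ge 2.5\,\text{GHz}]$
      $r_2 = [\text{Price} \ge \$3500] \to [\text{CPU} \ge 2.5\,\text{GHz}]$
      $r_3 = \ldots$
    A nearest-neighbor technique is used to identify which rule is most similar to the failing query. Assume that $r_1$ is such a rule.
    Relaxed query: $[\text{Price} \le \$2900] \wedge [\text{CPU} \ge 2.5\,\text{GHz}] \wedge [\text{Display} \ge 17] \wedge [\text{Weight} \le 4\,\text{lbs}]$.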
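The slide does not spell out how the selected rule turns into the relaxed query. One reading that reproduces the slide's result is to keep, for each attribute, the looser of the query bound and the rule bound. The sketch below follows that reading. It assumes the comparison operators lost in the transcript are "<=" for Price and Weight and ">=" for CPU and Display; the function name `relax_with_rule` and the dictionary encoding are illustrative, not Muslea's algorithm.

```python
def relax_with_rule(query, rule):
    """Per attribute, keep the looser of the query bound and the rule bound.
    Constraints are encoded as attr -> (operator, bound)."""
    relaxed = {}
    for attr, (op, bound) in query.items():
        if attr in rule and rule[attr][0] == op:
            rule_bound = rule[attr][1]
            # "<=" gets looser as the bound grows, ">=" as it shrinks
            bound = max(bound, rule_bound) if op == "<=" else min(bound, rule_bound)
        relaxed[attr] = (op, bound)
    return relaxed

failing_query = {"Price": ("<=", 2000), "CPU": (">=", 2.5),
                 "Display": (">=", 17), "Weight": ("<=", 3)}
r1 = {"Price": ("<=", 2900), "Display": (">=", 18),
      "Weight": ("<=", 4), "CPU": (">=", 2.5)}

print(relax_with_rule(failing_query, r1))
```

Under these assumptions the result is Price <= 2900, CPU >= 2.5, Display >= 17, Weight <= 4, which agrees with the relaxed query on the slide.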
  • Slide 7
  • Failing Query Problem 2
    Problem 2. [Collaborative Query Answering] [Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska]
    Example of a non-local query.
    Database: Flights(airline; departure_time; arrival_time; departure_airport; arrival_airport).
    select * from Flights where airline = "Delta" and departure_time = "morning" and departure_airport = "Charlotte" and aircraft = "Boeing"
    The attribute aircraft is not an attribute of Flights, which is what makes the query non-local.
  • Slide 8
  • Query Processing in Collaborative Systems
    System $S_1$:
           | a      | b  | c  | e
        x1 | a[1,2] | b1 | c2 | e1
        x2 |        | b1 | c2 | e1
        x3 | a[1,1] | b1 | c1 | e1
        x4 | a[2,1] | b2 | c2 | e2
        x5 | a[1,1] | b2 | c2 | e2
    System $S$:
           | a  | b      | c  | d
        y1 | a1 | b2     | c1 | d1
        y2 | a2 | b1     | c2 | d2
        y3 | a1 | b[1,1] | c2 | d2
        y4 | a1 |        | c2 | d1
        y5 | a2 | b2     | c1 | d2
    $q = a_1 * b_1 * e_1$ submitted to $S$ fails, because attribute $e$ is not in $S$ (clearly $b_{[1,1]}$ is also $b_1$).
    Find the definition of $e_1$ in $S_1$: $b_1 \to e_1$; $c_1 \to e_1$; $a_{[1,2]} \to e_1$.
    Replacing $e_1$ by this definition, $q = a_1 * b_1 * e_1$ becomes $a_1 * b_1 * (b_1 + c_1 + a_{[1,2]}) = a_1 * b_1 + a_1 * b_1 * c_1 + a_1 * b_1 * a_{[1,2]} = a_1 * b_1$.
    Objects $y_3$, $y_4$ satisfy the query $q$.
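The two steps on this slide (discover a definition of $e_1$ at the remote site $S_1$, then answer the rewritten query at $S$) can be sketched as below. Several things here are assumptions rather than the authors' procedure: values are encoded as index tuples (so $a_1$ is `(1,)` and $a_{[1,2]}$ is `(1, 2)`), only single-descriptor rules are searched, and the blank cells of the tables are treated as unknown values that may match any descriptor. That optimistic treatment is chosen because it reproduces the answer given on the slide. All function names are illustrative.

```python
# Values are index tuples: a1 -> (1,), a[1,2] -> (1, 2); None marks a blank cell.
S1 = {
    "x1": {"a": (1, 2), "b": (1,), "c": (2,), "e": (1,)},
    "x2": {"a": None,   "b": (1,), "c": (2,), "e": (1,)},
    "x3": {"a": (1, 1), "b": (1,), "c": (1,), "e": (1,)},
    "x4": {"a": (2, 1), "b": (2,), "c": (2,), "e": (2,)},
    "x5": {"a": (1, 1), "b": (2,), "c": (2,), "e": (2,)},
}
S = {
    "y1": {"a": (1,), "b": (2,),   "c": (1,), "d": (1,)},
    "y2": {"a": (2,), "b": (1,),   "c": (2,), "d": (2,)},
    "y3": {"a": (1,), "b": (1, 1), "c": (2,), "d": (2,)},
    "y4": {"a": (1,), "b": None,   "c": (2,), "d": (1,)},
    "y5": {"a": (2,), "b": (2,),   "c": (1,), "d": (2,)},
}

def matches(obj, attr, idx):
    """True if obj's value for attr equals idx or refines it; blanks match (assumption)."""
    value = obj.get(attr)
    return value is None or value[:len(idx)] == idx

def certain_rules(system, target_attr, target_idx):
    """Descriptors (attr, idx) such that every matching object has the target value."""
    candidates = {(attr, v) for obj in system.values()
                  for attr, v in obj.items()
                  if attr != target_attr and v is not None}
    rules = set()
    for attr, idx in candidates:
        hits = [o for o in system.values() if matches(o, attr, idx)]
        if hits and all(o[target_attr] == target_idx for o in hits):
            rules.add((attr, idx))
    return rules

def answer(system, local_terms, definition):
    """Objects matching every local term and at least one descriptor of the definition."""
    return sorted(name for name, obj in system.items()
                  if all(matches(obj, a, i) for a, i in local_terms)
                  and any(matches(obj, a, i) for a, i in definition))

e1_definition = certain_rules(S1, "e", (1,))
print(sorted(e1_definition))
print(answer(S, [("a", (1,)), ("b", (1,))], e1_definition))
```

The discovered descriptors correspond to $a_{[1,2]}$, $b_1$ and $c_1$, and the rewritten query returns $y_3$ and $y_4$. Because $b_1$ is already a local term, the disjunction adds nothing, which mirrors the absorption step $a_1 * b_1 + \ldots = a_1 * b_1$ on the slide.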
  • Slide 9
  • Query Processing in Collaborative Systems
    System $S_1$:
           | a      | b  | c  | e
        x1 | a[1,2] | b1 | c2 | e1
        x2 |        | b1 | c2 | e1
        x3 | a[1,1] | b1 | c1 | e1
        x4 | a[2,1] | b2 | c2 | e2
        x5 | a[1,1] | b2 | c2 | e2
    System $S$:
           | a  | b      | c  | d
        y1 | a1 | b2     | c1 | d1
        y2 | a2 | b1     | c2 | d2
        y3 | a1 | b[1,1] | c2 | d2
        y4 | a1 |        | c2 | d1
        y5 | a2 | b2     | c1 | d2
    $q = a_{[1,2]} * b_{[1,1]}$ submitted to $S_1$ fails because of the granularity of $b$.
    Find the definition of $b_{[1,1]}$ in $S$: $a_1 * c_2 \to b_{[1,1]}$.
    Replacing $b_{[1,1]}$ by this definition, $q = a_{[1,2]} * b_{[1,1]}$ becomes $a_{[1,2]} * a_1 * c_2 = a_{[1,2]} * c_2$.
    Objects $x_1$, $x_2$ satisfy the query $q$.
  • Slide 10
  • Failing Query Problem 2
  • Slide 11
  • Query Processing in Incomplete IS
    $X$ is a set of objects, $A$ is a set of attributes, $V_a$ is the set of values of attribute $a$, where $a \in A$, and $V = \bigcup \{V_a : a \in A\}$.
    $S = (X, A, V)$ is a partially incomplete information system of type $\lambda$ if the following two conditions hold for any $x \in X$, $a \in A$:
    - if $a_S(x)$ is defined, then $[a_S(x) \in V_a$ or $a_S(x) = \{(v_i, p_i) : 1 \le i \le m\}]$,
    - if $a_S(x) = \{(v_i, p_i) : 1 \le i \le m\}$, then $[\sum_{i=1}^{m} p_i = 1$ and $(\forall i)(p_i \ge \lambda)]$.
    Also, if $a_S(x) = v$, then the value $v$ has the same meaning as $\{(v, 1)\}$.
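The two conditions of this definition translate directly into a validity check. The sketch below is illustrative only: it assumes the type parameter is a threshold $\lambda$ on the weights, encodes a weighted value as a Python dict from value to weight, uses `None` for an undefined $a_S(x)$, and uses made-up sample data. The name `is_type_lambda` is not from the slides.

```python
from math import isclose

def is_type_lambda(system, domains, lam):
    """Check both conditions of the definition for every object and attribute."""
    for obj in system.values():
        for attr, value in obj.items():
            if value is None:                 # a_S(x) undefined: nothing to check
                continue
            if not isinstance(value, dict):
                value = {value: 1.0}          # a plain value v means {(v, 1)}
            if not set(value) <= domains[attr]:
                return False                  # a value outside V_a
            if not isclose(sum(value.values()), 1.0):
                return False                  # weights must sum to 1
            if any(p < lam for p in value.values()):
                return False                  # every weight must be at least lambda
    return True

# Hypothetical sample data, not taken from the slides.
domains = {"a": {"a1", "a2"}, "b": {"b1", "b2"}}
system = {
    "x1": {"a": {"a1": 0.75, "a2": 0.25}, "b": "b1"},
    "x2": {"a": "a2", "b": None},
}
print(is_type_lambda(system, domains, 0.25))
print(is_type_lambda(system, domains, 0.3))
```

With this sample data the system passes for a threshold of 0.25 but not for 0.3, because the weight 0.25 attached to a2 falls below the second threshold.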
  • Slide 12
  • Incomplete Information System
    Queries: $q_1(a,b) = a_1 * b_1$, $q_2(a,b) = a_1 + b_1$
    $J(a_1) = \{(x_1, 1/3), (x_3, 1), (x_5, 2/3)\}$
    $J(b_1) = \{(x_1, 2/3), (x_2, 1/3), (x_4, 1/2), (x_5, 1), (x_7, 1/4)\}$
    What about $J(a_1 * b_1) = J(a_1) \otimes J(b_1)$ and $J(a_1 + b_1) = J(a_1) \oplus J(b_1)$?
  • Slide 13
  • Interpretations for $\otimes$ and $\oplus$
    Assume that $J(a_1) = \{(x_i, p_i) : i \in K\}$ and $J(b_1) = \{(x_i, q_i) : i \in K\}$.
    Interpretation $T_0$:
      $J(a_1) \otimes_0 J(b_1) = \{(x_i, S_1(p_i, q_i)) : i \in K\}$, where $S_1(p_i, q_i) = [$if $\max(p_i, q_i) = 1$, then $\min(p_i, q_i)$, else $0]$.
      $J(a_1) \oplus_0 J(b_1) = \{(x_i, S_2(p_i, q_i)) : i \in K\}$, where $S_2(p_i, q_i) = [$if $\min(p_i, q_i) = 0$, then $\max(p_i, q_i)$, else $1]$.
    Interpretation $T_1$:
      $J(a_1) \otimes_1 J(b_1) = \{(x_i, \max\{0, p_i + q_i - 1\}) : i \in K\}$
      $J(a_1) \oplus_1 J(b_1) = \{(x_i, \min\{1, p_i + q_i\}) : i \in K\}$
    Interpretation $T_2$:
      $J(a_1) \otimes_2 J(b_1) = \{(x_i, [p_i q_i] / [2 - (p_i + q_i - p_i q_i)]) : i \in K\}$
      $J(a_1) \oplus_2 J(b_1) = \{(x_i, [p_i + q_i] / [1 + p_i q_i]) : i \in K\}$
  • Slide 14
  • Interpretations for $\otimes$ and $\oplus$
    Interpretation $T_3$:
      $J(a_1) \otimes_3 J(b_1) = \{(x_i, p_i q_i) : i \in K\}$
      $J(a_1) \oplus_3 J(b_1) = \{(x_i, p_i + q_i - p_i q_i) : i \in K\}$
    Interpretation $T_4$:
      $J(a_1) \otimes_4 J(b_1) = \{(x_i, [p_i q_i] / [p_i + q_i - p_i q_i]) : i \in K\}$
      $J(a_1) \oplus_4 J(b_1) = \{(x_i, [p_i + q_i - 2 p_i q_i] / [1 - p_i q_i]) : i \in K\}$
    Fuzzy interpretation $T_5$:
      $J(a_1) \otimes_5 J(b_1) = \{(x_i, \min\{p_i, q_i\}) : i \in K\}$
      $J(a_1) \oplus_5 J(b_1) = \{(x_i, \max\{p_i, q_i\}) : i \in K\}$
    Another possible interpretation $T$:
      $J(a_1) \otimes_3 J(b_1) = \{(x_i, p_i q_i) : i \in K\}$
      $J(a_1) \oplus_5 J(b_1) = \{(x_i, \max\{p_i, q_i\}) : i \in K\}$
    Interpretations $T_0$, $T_5$, $T$ satisfy the property: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$
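To make the interpretations $T_0$ to $T_5$ concrete, the sketch below applies each pair of operations to the weighted sets $J(a_1)$ and $J(b_1)$ from the "Incomplete Information System" slide. It rests on a few assumptions that the slides do not state: an object missing from one set is given weight 0 there, objects whose resulting weight is 0 are dropped from the output, and the two 0/0 corner cases of $T_4$ are set to 0 and 1. Exact fractions are used to avoid rounding. The names `INTERPRETATIONS` and `combine` are illustrative.

```python
from fractions import Fraction as F

# (conjunction, disjunction) pairs for T0..T5, following the slides' formulas.
INTERPRETATIONS = {
    "T0": (lambda p, q: min(p, q) if max(p, q) == 1 else F(0),
           lambda p, q: max(p, q) if min(p, q) == 0 else F(1)),
    "T1": (lambda p, q: max(F(0), p + q - 1),
           lambda p, q: min(F(1), p + q)),
    "T2": (lambda p, q: p * q / (2 - (p + q - p * q)),
           lambda p, q: (p + q) / (1 + p * q)),
    "T3": (lambda p, q: p * q,
           lambda p, q: p + q - p * q),
    # Assumption: the 0/0 cases (p = q = 0 and p = q = 1) are set to 0 and 1.
    "T4": (lambda p, q: p * q / (p + q - p * q) if (p or q) else F(0),
           lambda p, q: (p + q - 2 * p * q) / (1 - p * q) if p * q != 1 else F(1)),
    "T5": (min, max),
}

def combine(J1, J2, op):
    """Apply op object by object; a missing object counts as weight 0 (assumption)."""
    objects = sorted(set(J1) | set(J2))
    result = {x: op(J1.get(x, F(0)), J2.get(x, F(0))) for x in objects}
    return {x: w for x, w in result.items() if w != 0}

J_a1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_b1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}

for name, (conj, disj) in INTERPRETATIONS.items():
    print(name, "a1*b1:", combine(J_a1, J_b1, conj))
    print(name, "a1+b1:", combine(J_a1, J_b1, disj))
```

Under the product interpretation $T_3$, for example, only $x_1$ and $x_5$ remain in the answer to $a_1 * b_1$, with weights 2/9 and 2/3.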
  • Slide 15
  • Incomplete IS [$S_2$ is finer than $S_1$]
    Assume:
    - $S_1$, $S_2$ are partially incomplete IS of type $\lambda$
    - The same objects are stored in both systems
    - The same attributes are used to describe objects
    - $a_{S_1}(x) = \{(a_{1i}, p_{1i}) : 1 \le i \le m_1\}$, $a_{S_2}(x) = \{(a_{2i}, p_{2i}) : 1 \le i \le m_2\}$
  • Slide 16
  • Incomplete Information System
    $S_2$ is finer than $S_1$ if:
    - $(\forall x \in X)(\forall a \in A)[card(a_{S_1}(x)) \ge card(a_{S_2}(x))]$
    - $(\forall x \in X)(\forall a \in A)[[card(a_{S_1}(x)) = card(a_{S_2}(x))] \to [\sum_{i \neq j} |p_{2i} - p_{2j}| > \sum_{i \neq j} |p_{1i} - p_{1j}|]]$
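A possible check for the "finer than" relation is sketched below. The comparison symbols are lost in the transcript, so this follows one reading: $S_2$ never lists more candidate values than $S_1$, and wherever both list the same number, the weights in $S_2$ are more spread out. Systems are encoded as nested dicts (object, attribute, value, weight), the data is made up, and the names `spread` and `is_finer` are illustrative. Summing over unordered pairs instead of all $i \neq j$ halves both sides, so the comparison is unaffected.

```python
from itertools import combinations

def spread(dist):
    """Sum of |p_i - p_j| over unordered pairs of weights of one attribute value."""
    return sum(abs(p - q) for p, q in combinations(dist.values(), 2))

def is_finer(S2, S1):
    """One reading of the slide: S2 is finer than S1 (same objects, same attributes)."""
    for x in S1:
        for a in S1[x]:
            d1, d2 = S1[x][a], S2[x][a]
            if len(d1) < len(d2):
                return False       # S2 may not list more candidate values
            if len(d1) == len(d2) and not spread(d2) > spread(d1):
                return False       # equal size: S2 must be strictly more spread out
    return True

# Hypothetical sample data, not taken from the slides.
coarse = {"x1": {"a": {"a1": 0.5, "a2": 0.5}}}
sharper = {"x1": {"a": {"a1": 0.75, "a2": 0.25}}}
certain = {"x1": {"a": {"a1": 1.0}}}

print(is_finer(sharper, coarse))
print(is_finer(certain, coarse))
print(is_finer(coarse, sharper))
```

Note that under this literal reading an entry that is identical in both systems fails the strict inequality, so the relation only holds when every entry becomes sharper.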
  • Slide 17
  • $S_2$ finer than $S_1$ [figure: diagram of $S_2$ and $S_1$]
  • Slide 18
  • Failing Queries in Collaborative IS
    Assume: query $q = q(B)$ is submitted to $S = (X, A, V)$, where:
    - $B$ is the set of all attributes used in $q$
    - $A \cap B \neq \emptyset$
    - Attributes in $B \setminus (A \cap B)$ are foreign for $S$
    Two information systems can collaborate if they agree on the ontology of some of their common attributes.
    The granularity of values of attributes used in a query $q$ may differ from the granularity of values of the same attributes in $S$.
  • Slide 19
  • Failing Queries in Collaborative IS
    Query $q(B)$ can be processed at site $S$ by discovering definitions of values of attributes from $B \setminus (A \cap B)$ at some of the remote sites for $S$.
    With each certain rule discovered at a remote site, a number of additional rules can also be discovered.
  • Slide 20
  • Example
    age ( child ($\le 17$), young (18, …, 29), middle-aged (30, …, 60), old (61, …, 80), senile ($\ge 81$) )
    salary ( low (0, …, 40K), medium (50K, …, 70K), high (80K, …, 100K), very-high ($> 100$K) )
    (age, young) $\wedge$ (salary, 40K)
    (age, young) $\wedge$ (salary, low)
    (age, N) $\wedge$ (salary, 40K)
    (age, N) $\wedge$ (salary, low)
  • Slide 21
  • Failing Queries in Collaborative IS S = (X, A, V) client