dzielenie relacyjne (relational division) bazy i hurtownie danych, two1, 2010 dbdw-winter_2010-11

22
Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/

Upload: tristen-crosse

Post on 14-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Dzielenie relacyjne(Relational division)

Bazy i hurtownie danych, TWO1, 2010

https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/dbdw-winter_2010-11/

Bazy i hurtownie danych 2

References

1. V.M. Matos, R. Grasser, Assessing performance of the relational division operator. Data Base Management 2001, 22-20-30, 1-15

2. V.M. Matos, R. Grasser, A Simpler (and Better) SQL Approach to Relational Division, Journal of Information System Education 2002, 13 (2), 85-88.

21/11/2010

Bazy i hurtownie danych 3

Relational Division

The basic operators of the relational algebra:– Union (UNION)– Difference (MINUS/EXCEPT)– Cartesian product– Projection & selection (SELECT ... FROM...)

Additional operators added to the relational algebra: – Join most popular in practice– Rename (AS)– Intersection (INTERSECT)– Division

21/11/2010

Bazy i hurtownie danych 4

Relational Division

The division operator is less common than select-project-join queries, however, it is applicable to many common queries:– Find suppliers who supply all the engine parts– Find students who have taken all the core courses– Find customers who have ordered all items from a given line of products

The division operator can be also employed in data mining algorithms (e.g., generation of association rules)

21/11/2010

Bazy i hurtownie danych 5

Informal Definition

The division operator allows verifying whether or not a candidate subject is related to each of the values held in the base set.

The base set is called the divisor (or denominator T2[B]), and the table holding the subject’s data is called the dividend (or nominator T1[A, B]).

The expression T1[A, B]/T2[B] selects the A values from the dividend table T1[A, B], whose B values are a superset of those B values held in the divisor table T2[B].

21/11/2010

Bazy i hurtownie danych 6

Informal Definition

21/11/2010

T3[A] = T1[A,B] / T2[B]

Bazy i hurtownie danych 7

More Practical Example #1

Given the relations ORDERS[SID, PID, QTY] and PRODUCTS[PID, PRICE] find all the stores that ordered at least 10 of all products priced over 15 $

21/11/2010

SID PID QTY

s2 p2 25

s2 p3 20

s2 p4 20

s1 p1 25

s1 p4 4

s1 p3 12

s1 p2 30

s3 p4 12

s3 p2 14

ORDERS

PID PRICE

p1 10

p2 25

p3 18

p4 20

PRODUCTS

ORDERS[SID,PID] / PRODUCTS[PID] = { s2 }

Bazy i hurtownie danych 8

More Practical Example #2

Given the relations ORDERS[SID, PID, PMNT] and PRODUCTS[PID, PRICE] we focus on ordered products priced over 15$ and paid either with cash or a credit card. What would be the result of ORDERS[SID, PID, PMNT] / PRODUCTS[PID]?

21/11/2010

SID PID PMNT

s1 p1 cash

s1 p2 bank

s1 p2 credit

s1 p3 cash

s1 p4 bank

s1 p3 credit

s2 p2 cash

s2 p2 bank

s2 p3 credit

ORDERS

PID PRICE

p1 10

p2 25

p3 18

p4 14

PRODUCTS

ORDERS[SID,PID,PMNT] / PRODUCTS[PID] = { (s1, credit) }

Bazy i hurtownie danych 9

Formal Definition: Relational Algebra

Let’s assume that the numerator table T1 always consists of two columns A and B, and the denominator has only one B attribute. Then, the expression T1[A, B]/T2[B] is semantically equivalent to:

T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A]

21/11/2010

Bazy i hurtownie danych 10

Formal Definition: Relational Algebra

21/11/2010

Bazy i hurtownie danych 11

Formal Definition:Tuple-Calculus

Using relational tuple-calculus language, the division operator can be rephrased as follows:

T1[A, B]/T2[B] = { t1[A] / t1 T1 and for-all t2 (t2 T2 exists t3 (t3 T1 and

(t1[A] = t3[A]) and(t2[B] = t3[B]))) }

21/11/2010

Bazy i hurtownie danych 12

Formal Definition: NFNF Databases(Non First-Normal Form)

Assumption of a NFNF format (fields with sets of atomic values) results in a much simplified definition of the division operator in tuple-calculus (attribute *B in T1 and T2 is defined as a set of atomic values):

T1[A, B]/T2[B] = { t1[A] / t1 T1 and t2 T2 and t2[*B] t1[*B] }

21/11/2010

Bazy i hurtownie danych 13

Formal Definition: NFNF Databases

21/11/2010

Bazy i hurtownie danych 14

SQL Implementation: Q0

SELECT A FROM T1WHERE B IN (SELECT B FROM T2)GROUP BY AHAVING COUNT(*) = (SELECT COUNT(*) FROM T2)

21/11/2010

Bazy i hurtownie danych 15

SQL Implementation: Q1(Byzantine Method )

Based on the formal predicate calculus definition modified to fit SQL:– The universal quantifier for-all x (f(x)) replaced by not exists x

(not f(x))– The implication X Y replaced by (not(X) or Y)

T1[A, B]/T2[B] = { t1[A] / t1 T1 and not exists t2 (not(not( t2 T2)

or (exists t3 (t3 T1 and (t1[A] = t3[A]) and(t2[B] = t3[B]))))) }

21/11/2010

Bazy i hurtownie danych 16

SQL Implementation: Q1

Previous definition is equivalent (following De Morgan’s law not (P or Q) = not P and not Q ) to:

T1[A, B]/T2[B] = { t1[A] / t1 T1 and not exists t2 (( t2 T2) and

(not exists t3 (t3 T1 and (t1[A] = t3[A]) and(t2[B] = t3[B])))) }

21/11/2010

Bazy i hurtownie danych 17

SQL Implementation: Q1

SELECT DISTINCT x.A FROM T1 AS xWHERE NOT EXISTS (SELECT *

FROM T2 yWHERE NOT EXISTS (SELECT *

FROM T1 AS zWHERE (z.A=x.A) AND (z.B=y.B)))

21/11/2010

Bazy i hurtownie danych 18

SQL Implementation: Q2

Based on the algebraic definition of the division operator and broken into two steps:

SELECT DISTINCT y.A, z.B INTO T3FROM T1 AS y, T2 AS zWHERE NOT EXISTS (SELECT *

FROM T1WHERE (T1.A = y.A) AND (T1.B=z.B))

SELECT DISTINCT A FROM T1WHERE NOT EXISTS (SELECT *

FROM T3WHERE (T3.A=T1.A))

21/11/2010

T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A]

Bazy i hurtownie danych 19

SQL Implementation: Q3

Based on the definition for the NFNF and tuple-calculus:

SELECT DISTINCT x.AFROM T1 AS xWHERE (SELECT COUNT(*) FROM T2) =

(SELECT COUNT(*) FROM T1, T2 WHERE (T1.A=x.A) AND (T1.B=T2.B))

21/11/2010

Bazy i hurtownie danych 20

Zero Division

The divide operator is defined in such a way that T1[A,B]/T2[B] produces exactly all A values in T1 each time that T2[B] is empty.

An empty set would be a more appropriate answer this is how Q0 works.

21/11/2010

Bazy i hurtownie danych 21

Experimental Evaluation of Q0…Q3

Assume basic structure of tables (T1[A, B], T2[B], integer or char) Conduct an experiment with the following settings:

– number of A-values in T1 = 10 000,– number of B-values in T1 = 100,– number of B-values in T2 = 0, 20, 40, 60, 80, 100.

Use provided scripts to generate sample tables and to run specific queries (Q0…Q3).

Observer performance of specific queries (execution time – CPU time). Collect the observations in a tabular and graphical form.

21/11/2010

https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/dbdw-winter_2010-11/

Bazy i hurtownie danych 22

Checking Execution Time

21/11/2010

Turn SET STATISTICS TIME on (Tools Options)