
Relational Algebra for Bags

CSCI 4380 Database Systems

Monday, October 4, 2010


Bags

• A bag is a multi-set.

• For a set, {1,2,2,3} = {1,2,3}

• For a bag, {1,2,2,3} ≠ {1,2,3}

• There is no specific notation for bags that is universally accepted.

• A bag model for a relation means that a tuple may appear more than once.


Bags and relational algebra

• Database implementations allow relations to be defined as bags:

• there may be multiple copies of a tuple if a primary key is not defined.

• Relational algebra implemented in databases uses bag semantics.

• We will now extend relational algebra to bags.


Bag operations

• Suppose a tuple t appears n times in R and m times in S. Then:

• t appears n+m times in R∪S

• t appears min(n,m) times in R∩S

• t appears max(0, n-m) times in R-S
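
The three multiplicity rules can be checked with a small sketch. It assumes bags are modeled with Python's `collections.Counter`, mapping each tuple to its count; the relations `R` and `S` below are made-up data, not part of the lecture.

```python
from collections import Counter

# Bags modeled as Counters: each tuple maps to its multiplicity.
# R and S are hypothetical one-column relations.
R = Counter({("a",): 3, ("b",): 1})   # ("a",) appears n = 3 times
S = Counter({("a",): 1, ("c",): 2})   # ("a",) appears m = 1 time

union = R + S          # n + m copies of each tuple
intersection = R & S   # min(n, m) copies of each tuple
difference = R - S     # max(0, n - m) copies; counts never go negative
```

Here ("a",) ends up with 4 copies in the union, 1 in the intersection and 2 in the difference.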


Other operators

• Selection, projection, Cartesian product and join are extended in the usual way.

• In selection, each tuple that passes the condition is put in the output (no duplicate elimination).

• In projection, the columns are removed but there is no duplicate elimination.

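• In Cartesian product (R×S), all pairs of tuples from R and S are put in the output, so a tuple that appears n times in R paired with a tuple that appears m times in S appears n·m times.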

• Join is simply Cartesian product followed by bag selection.
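
As a rough illustration of these bag versions, the sketch below models a bag as a Python list of tuples, so duplicates are never dropped. The relations `R(A, B)` and `S(C, D)` and their contents are assumptions made up for the example.

```python
from itertools import product

# A bag is modeled as a list of tuples, so duplicates are kept.
R = [(1, "a"), (1, "a"), (2, "b")]   # hypothetical R(A, B)
S = [(1, "x"), (3, "y")]             # hypothetical S(C, D)

selected = [t for t in R if t[0] == 1]            # selection A=1, duplicates kept
projected = [(t[1],) for t in R]                  # projection on B, duplicates kept
cartesian = [r + s for r, s in product(R, S)]     # all pairs of tuples
joined = [t for t in cartesian if t[0] == t[2]]   # product followed by selection A=C
```

Because (1, a) occurs twice in R, the joined tuple (1, a, 1, x) also occurs twice in the output.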


New Operators

• Duplicate elimination (δ(R))

  • Removes duplicate tuples

• Extended projection (π(R)) can project

  • attributes in relation R in the usual way, but attributes can be repeated

  • constant values, which create a new column in which every tuple has the constant value

  • arithmetic and string operations involving attributes in R and constants

  • attributes can be renamed with the operation (➝).


Example

• πA+C➝E, B|D➝F, 2➝G,D,D (R), where | concatenates strings

R
A  B  C  D
1  a  6  c
3  d  4  e
5  a  1  c

Result
E  F   G  D  D
7  ac  2  c  c
7  de  2  e  e
6  ac  2  c  c
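
The example can be reproduced with a short sketch. It assumes each tuple of R is a Python dict keyed by attribute name; output rows are plain tuples in the column order (E, F, G, D, D), since a dict cannot hold the repeated column D twice.

```python
# R from the example, one dict per tuple.
R = [
    {"A": 1, "B": "a", "C": 6, "D": "c"},
    {"A": 3, "B": "d", "C": 4, "D": "e"},
    {"A": 5, "B": "a", "C": 1, "D": "c"},
]

# Columns: A+C as E, B concatenated with D as F, constant 2 as G, then D twice.
result = [(t["A"] + t["C"], t["B"] + t["D"], 2, t["D"], t["D"]) for t in R]
```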


Join

• The join operation we have seen up to now is called the inner join.

• The inner join R ⋈C S returns tuples from R and S that satisfy the join condition C.

• All other tuples are omitted.


Inner join

• The tuples (3, d, 4) in R and (8, z, 2) in S do not participate in the join, as they have no matching tuples in the other relation.

R
A  B  C
1  a  6
3  d  4
5  a  1

S
C  D  E
6  x  0
1  y  1
8  z  2
6  w  3

R ⋈ S
A  B  C  D  E
1  a  6  x  0
1  a  6  w  3
5  a  1  y  1


Outer join

• The unmatched tuples are included in the (full) outer join. Attributes with missing values are filled with null values (⊥).

R
A  B  C
1  a  6
3  d  4
5  a  1

S
C  D  E
6  x  0
1  y  1
8  z  2
6  w  3

R ⟗ S (full outer join)
A  B  C  D  E
1  a  6  x  0
1  a  6  w  3
5  a  1  y  1
3  d  4  ⊥  ⊥
⊥  ⊥  8  z  2


Left outer join

• Only the unmatched tuples from the left relation are added.

R
A  B  C
1  a  6
3  d  4
5  a  1

S
C  D  E
6  x  0
1  y  1
8  z  2
6  w  3

R ⟕ S (left outer join)
A  B  C  D  E
1  a  6  x  0
1  a  6  w  3
5  a  1  y  1
3  d  4  ⊥  ⊥


Right outer join

• Only the unmatched tuples from the right relation are added.

R
A  B  C
1  a  6
3  d  4
5  a  1

S
C  D  E
6  x  0
1  y  1
8  z  2
6  w  3

R ⟖ S (right outer join)
A  B  C  D  E
1  a  6  x  0
1  a  6  w  3
5  a  1  y  1
⊥  ⊥  8  z  2
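
The four join variants can be compared side by side in a minimal sketch. It assumes R(A,B,C) and S(C,D,E) are lists of tuples joined on the shared attribute C, and it uses Python's `None` in place of ⊥. The helper `join` and its flags `keep_left` and `keep_right` are hypothetical names chosen for this example.

```python
R = [(1, "a", 6), (3, "d", 4), (5, "a", 1)]                # R(A, B, C)
S = [(6, "x", 0), (1, "y", 1), (8, "z", 2), (6, "w", 3)]   # S(C, D, E)

def join(r_rows, s_rows, keep_left=False, keep_right=False):
    """Join on C; optionally pad unmatched tuples with None (standing in for ⊥)."""
    out = [(a, b, c, d, e)
           for (a, b, c) in r_rows
           for (c2, d, e) in s_rows if c == c2]
    if keep_left:   # add unmatched tuples of the left relation
        s_keys = {s[0] for s in s_rows}
        out += [(a, b, c, None, None) for (a, b, c) in r_rows if c not in s_keys]
    if keep_right:  # add unmatched tuples of the right relation
        r_keys = {r[2] for r in r_rows}
        out += [(None, None, c, d, e) for (c, d, e) in s_rows if c not in r_keys]
    return out

inner = join(R, S)
left = join(R, S, keep_left=True)
right = join(R, S, keep_right=True)
full = join(R, S, keep_left=True, keep_right=True)
```

The inner join has three tuples; the left and right outer joins add (3, d, 4, ⊥, ⊥) and (⊥, ⊥, 8, z, 2) respectively, and the full outer join adds both.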


Aggregate operators

• It is possible to find the sum, min, max, avg (and other functions) over all tuples, for an attribute or for the result of an arithmetic/string operation over the attributes

• Examples, given R(A,B,C,D,E):

• sum(A), min(B+C), max(C)

• min(D|E)


Aggregate example

• ϒsum(A), min(C) R

R
A  B  C  D
1  a  6  c
3  d  4  e
5  a  1  c

Result
sum(A)  min(C)
9       1


Grouping operator

• Instead of computing the aggregate for all the tuples, we can compute it for groups of tuples

• Group by attributes

• Compute aggregates for each group


Group by

• Group by B

A  B  C  D
1  a  6  c
3  d  4  e
5  a  1  c
1  d  3  c
3  f  2  e

• Aggregate sum(A), min(C) for each group

Group  Tuples                sum(A)  min(C)
B = a  (1 a 6 c), (5 a 1 c)  6       1
B = d  (3 d 4 e), (1 d 3 c)  4       3
B = f  (3 f 2 e)             3       2


Group by

R
A  B  C  D
1  a  6  c
3  d  4  e
5  a  1  c
1  d  3  c
3  f  2  e

• ϒB, sum(A), min(C)→MC R

B  sum(A)  MC
a  6       1
d  4       3
f  3       2


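Group by

• Group by A, D

A  B  C  D
1  a  6  c
3  d  4  e
5  a  1  c
1  d  3  c
3  f  2  e

• Aggregate sum(A*C) for each group

Group         Tuples                sum(A*C)
A = 1, D = c  (1 a 6 c), (1 d 3 c)  9
A = 5, D = c  (5 a 1 c)             5
A = 3, D = e  (3 d 4 e), (3 f 2 e)  18

• ϒA, D, sum(A*C)→AC R

A  D  AC
1  c  9
5  c  5
3  e  18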
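
Both grouping examples can be recomputed with a small sketch. It assumes R is a list of (A, B, C, D) tuples and uses dictionaries keyed by the grouping attributes; the names `by_b` and `by_ad` are illustrative only.

```python
from collections import defaultdict

R = [(1, "a", 6, "c"), (3, "d", 4, "e"), (5, "a", 1, "c"),
     (1, "d", 3, "c"), (3, "f", 2, "e")]   # R(A, B, C, D)

# Group by B, then compute sum(A) and min(C) per group.
groups = defaultdict(list)
for a, b, c, d in R:
    groups[b].append((a, c))
by_b = {b: (sum(a for a, _ in rows), min(c for _, c in rows))
        for b, rows in groups.items()}

# Group by (A, D), then compute sum(A*C) per group.
by_ad = defaultdict(int)
for a, b, c, d in R:
    by_ad[(a, d)] += a * c
```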


Datalog

• Datalog is a logic based query language that supports queries equivalent to the set oriented relational algebra

• no group by, aggregate operators

• set oriented, though bag-oriented extensions exist


Datalog

• A relation is represented as a predicate

• The attributes of the relation are arguments of the predicate

• A term is either a simple constant or a variable (variables are denoted by lower case letters)

• An atom is a predicate p(*) that contains terms for each of its arguments


Atoms

• Atoms with no variables are called facts.

• Atoms evaluate to true or false.

• A database contains a set of facts.

• Relation R below contains two facts:

• R(1,2,’S’)

• R(3,4,’A’)

R
A  B  C
1  2  ‘S’
3  4  ‘A’


Queries

• We will represent queries as new predicates defined using rules in terms of the predicates representing stored relations (also called the extensional database).

• A Datalog rule is of the form:

• A ← B1,...,Bn

• where A (head), B1,...,Bn (body) are atoms.

• All rules with the same predicate p in the head represent the definition of p.


Queries

• A rule of the form

• A ← B1,...,Bn

• is interpreted as follows:

• For all possible instances of the variables in B1,...,Bn that make B1 and ... and Bn true, return a tuple in A.

• For all tuples in A, there exists a set of variable substitutions in B1,...,Bn such that B1 and ... and Bn is true.


Safety

• A rule of the form:

• A ← B1,...,Bn

• is considered safe if every variable that appears in a negated atom also appears in a positive safe predicate.

• All predicates corresponding to database predicates are considered safe.

• If all the rules defining a new predicate p are safe, then p is also considered safe.


Example queries

• q(x,y,z) ← r(x,y,z)

• returns all the tuples in r

• q(x,y,z) ← r(x,y,z), z = 1

• returns all tuples in r where the third attribute is 1.

• q(x,y) ← r(x,y,_)

• returns all x,y from r ( _ is the don’t care symbol).
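
One way to read these three rules is as set comprehensions over the facts of r. The sketch below assumes r is a Python set of 3-tuples with made-up contents; since Datalog is set oriented, duplicates in the answer collapse.

```python
r = {(1, 2, 1), (1, 2, 3), (4, 5, 1)}   # hypothetical facts for r

q_all = {(x, y, z) for (x, y, z) in r}             # q(x,y,z) ← r(x,y,z)
q_sel = {(x, y, z) for (x, y, z) in r if z == 1}   # q(x,y,z) ← r(x,y,z), z = 1
q_proj = {(x, y) for (x, y, _) in r}               # q(x,y) ← r(x,y,_)
```

Note that `q_proj` contains (1, 2) only once even though two facts of r produce it.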


Relational Algebra to Datalog

• Selection: P = σX>1 R:

• P(x,y,z) ← R(x,y,z), x>1

• Projection: P = πA,B R:

• P(x,y) ← R(x,y,_)


Relational Algebra to Datalog

• Cartesian Product

• Example: given R(A,B,C), S(D,E)

• T = R×S is equivalent to

• T(x,y,z,w,g) ← R(x,y,z), S(w,g)

• Join: T = R ⋈C=D S

• T(x,y,z,w,g) ← R(x,y,z), S(w,g), z=w


Relational Algebra to Datalog

• Set union, P = R∪S

• P(x,y,z) ← R(x,y,z)

• P(x,y,z) ← S(x,y,z)

• Set intersection, P = R∩S

• P(x,y,z) ← R(x,y,z), S(x,y,z)

• Set difference, P = R-S

• P(x,y,z) ← R(x,y,z), not S(x,y,z)


Recursion

• It is possible to write recursive rules in Datalog; such queries cannot be expressed in relational algebra.

• Example:

• Given: parent(x,y) meaning x is a parent of y.

• ancestor(x,y) ← parent(x,y)

• ancestor(x,y) ← parent(x,z), ancestor(z,y).
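
The two ancestor rules can be evaluated by repeatedly applying them until no new facts appear. The sketch below uses this naive bottom-up strategy, which is one possible evaluation method and not necessarily the one intended in the lecture; the function name `ancestors` and the sample `parent` facts are assumptions.

```python
def ancestors(parent):
    """Naive bottom-up evaluation of the two ancestor rules."""
    ancestor = set(parent)   # ancestor(x,y) ← parent(x,y)
    while True:
        # ancestor(x,y) ← parent(x,z), ancestor(z,y)
        derived = {(x, y)
                   for (x, z) in parent
                   for (z2, y) in ancestor if z == z2}
        if derived <= ancestor:   # no new facts: a fixpoint is reached
            return ancestor
        ancestor |= derived

parent = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}   # hypothetical facts
result = ancestors(parent)
```

The loop always terminates because only finitely many pairs can be built from the constants in `parent`.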
