1 relational algebra basic operators relational algebra expression tree

53
1 Relational Algebra Basic operators Relational algebra expression tree

Post on 21-Dec-2015

243 views

Category:

Documents


4 download

TRANSCRIPT

1

Relational Algebra

Basic operators

Relational algebra expression tree

2

Queries

• Find all courses that “Mary” takes

• What happens behind the scene ?– Query processor figures out how to answer

the query efficiently.

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

3

Queries, behind the scene

Imperative query execution plan:

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

Declarative SQL query

Students Takes

sid=sid

sname

name=“Mary”

cid=cid

Courses

The optimizer chooses the best execution plan for a query

4

What is an “Algebra”

• A mathematical model consists of:– Operands --- variables or values from which

new values can be constructed.– Operators --- symbols denoting procedures

that construct new values from given values. i.e. the ability to produce new values.

5

What is Relational Algebra?

• Simply speaking, the relational algebra is a collection of operators that – take relations as their operands – and return a relation as their result. (closure

properties allows nested expression!)

6

closure

• In mathematics, a set is said to be closed under some operation if the operation on members of the set produces a member of the set. For example, the real numbers are closed under subtraction.

• Similarly, a set is said to be closed under a collection of operations if it is closed under each of the operations individually.

• In relational algebra, we say the result of any operator is still a relation, thus in a closure, which is the foundation for sub-queries.

7

What is Relational Algebra?

• Its operands are relations or variables that represent relations.

• Comprises a basic set of relational model operations (represented by operators) which are designed to do the most common things that we need to do with relations in a database.– Sequences of relational algebra operations form

relational algebra expressions whose result is also a relation.

• The result is an algebra that can be used as but not limited to a query language (at abstract level) for relations.

8

Basic idea

• There is a core relational algebra that has traditionally been thought of as the relational algebra.

• But there are several other operators we shall add to the core in order to support or model better the language SQL --- the principal language used in relational database systems.

9

Core Relational Algebra

• Two groups of operations– Set theoretic operations:

• Union, intersection, and difference.– Usual set operations, but require both operands have the same

relation schema.

– Relational model specific operations:• Selection: picking certain rows. σ• Projection: picking certain columns. π• Products: cartesian product. * (also a kind of set operation,

but more related to joins.)• joins: compositions of relations.• Division: /

Pronounced Bow tie

10

Union, Intersection, Set difference operators

• These operations are binary: each is applied to two relations.

• In order to apply these operators to relations, the relations have to be union compatible.

• Two relations R(A1, …, An) and S(B1,…,Bn) are union compatible if they have same degree n and domain(Ai)=domain(Bi), 1<=1<=n.

11

Union, Intersection, Set difference operators

• In this part, we assume R and S are union compatible.

• Notation: – Union: R S – Intersection RS– Set difference: R S

12

Union: R S

• The result of R S is a relation that includes all the tuples that are either in R or S or in both R and S.

• Duplicates are eliminated.

13

Union: R S

FN LN

Susan Shasha

Bill Bush

Fname Lname

Sussan Shasha

Mike Ford

studentinstructor

FN LN

Susan Shasha

Bill Bush

Mike Ford

Student instructor

14

Intersection RS

• The result of RS is a relation that includes all the tuples that are in both R and S.

15

Intersection RS

FN LN

Susan Shasha

Bill Bush

Fname Lname

Sussan Shasha

Mike Ford

studentinstructor

FN LN

Susan Shasha

Student instructor

16

Set difference: R S

• The result of R-S is a relation that includes all the tuples that are in R but not in S.

17

Set difference: R S

FN LN

Bill Bush

Fname Lname

Mike FordInstructor - student

Student - instructor

18

Remarks

• Intersection and Union are commutative operations.– R S = S R – R S = R S

• Intersection and Union are associative operations.– R (S T) =( R S) T– R (S T) =( R S) T– Set difference is not a commutative operation.– In general, R-S !=S-R

19

Select operation (selection)

• The SELECT operation is used to select a subset of tuples from a relation that satisfy a selection condition.

• Form of the operation: σC (R)

– Where R is a relational algebra expression. The simplest expression is the name of a database relation.

– The condition C is an arbitrary Boolean expression on the attributes of R.

20

• The boolean expression is formed using clauses (atomic comparisons) of the form A op B or A op c, where A, B are attribute names, c is a constant, and op is one of the comparison operators {=, <,>=,>,>=,!=}

• The resulting relation has the same attributes as R.

• The resulting relation includes only tuples in R whose attribute values satisfy the condition C.

21

select

• R1 <-σC (R2)

– C is a condition (as in “if” statements) that refers to attributes of R2.

– R1 is all those tuples of R2 that satisfy C.

22

Example

Relation tinyemployee:fname lname salaryJoe Reale 250Joe Bush 275Sue Gates 250Bob Bush 300

AllBush<- σ lname=“Bush”(tinyemployee):fname lname salaryJoe Bush 275

Bob Bush 300

23

Example

σ (DNO=4 and SALARY>2500) or (DNO=5 and salary>30000) (employee):

24

Remarks

• The degree (i.e. the number of attributes) of the resulting relation is the same as that of the input relation.

• The resulting relation is empty if no tuple satisfies the selection condition.

• The number of tuples of the resulting relation is less than or equal to that of the input relation.

• The select operation is commutative:• σC 2(σC1 (R)) =σC 1(σC2 (R))• A sequence of select operations can be combined into a

single select operation• σC 1(σC2 (R)) =σC 1 and C2 (R)

25

Project operation (projection)

• The project operation selects some of the columns of a table while discarding the other columns.

• Form of operation: πL(R)– R is a relational algebra expression, – L is a list of attributes from the attributes of relation R.– The resulting relation has only those attributes of R

specified in L and in the same order as they appear in the list. Eliminate duplicate tuples, if any so that is remains a mathematical set (set does not allow duplicated elements! However, we will have more discussion later about this point when we come to SQL’s query mechanism!).

26

ExampleRelation tinyemployee:

fname lname salaryJoe Reale 250Joe Bush 275Sue Gates 250Bob Bush 300

Lastnameandsalary<- π lname, salary(tinyemployee)lname salaryReale 250Bush 275Gates 250

Bush 300

27

• Be aware that if several employees sharing the same last name and salary, only one tuple will be kept in the resulting relation for duplicated tuples.

28

Remarks

• The degree of the resulting relation is the number of attributes in the attribute list.

• The number of tuples of the resulting relation is less than or equal to that of the input relation:

• If the attribute list L is a superkey of R, the number of tuples in the resulting relation equals to the number of tuples in R.

• Commutativity does not hold on projection.

29

The Cartesian product operation

• The Cartesian product (Cross product, Cross join) is a binary operation and the input queries do not have to be union compatible.

• Notation: R x S• Given R(A1,…, An) and S(B1,… Bm), the

schema of the relation Q resulting by the operation RXS is: Q(A1,…An, B1, B….m). It has n+m attributes.

• The instance of Q has one tuple for each combination of tuples – one from R and one from S.

30

Cartesian Product

• R3 <- R1 x R2– Pair each tuple t1 of R1 with each tuple t2 of R2.– Concatenation t1t2 is a tuple of R3.– Schema of R3 is the attributes of R1 and R2, in

order.– But beware attribute A of the same name in R1 and

R2: use R1.A and R2.A. Since relational model assume no two attributes with the same name can appear in a relation schema.

31

Example: R3 <- R1 X R2

R1( A, B )1 23 4

R2( B, C )5 67 89 10

R3( A, R1.B, R2.B, C )1 2 5 61 2 7 81 2 9 103 4 5 63 4 7 83 4 9 10

32

From Cartesian to join

• The Cartesian product is a meaningless operation on its own. It can combine related tuples from two relations if followed by the appropriate select operation (in which case it is called join operation).

• The join operation is used to select and combine related tuples from two relations into single tuples.

33

Theta-Join

• The most general form of the join operation is the Theta Join, which is similar to a Cartesian product followed by a selection.

• R3 <- R1 C R2

– Take the product R1 X R2.

– Then apply SELECTC to the result.

• As for SELECT, C can be any boolean-valued condition.– Historic versions of this operator allowed only A theta B, where

theta was =, <, >, etc.; hence the name “theta-join.”– Theta join A join that links tables based on a relationship other

than equality between two columns– Theta means unknown (not necessarily equal) as apposed to

natural join here.

34

Example

Sells( bar, beer, price ) Bars( name, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

BarInfo <- Sells JOIN Sells.bar = Bars.name Bars

BarInfo( bar, beer, price, name, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Joe’s Maple St.Sue’s Bud 2.50 Sue’s River Rd.Sue’s Coors3.00 Sue’s River Rd.

35

Natural Join

• A frequent type of join connects two relations by:– Equating attributes of the same name, and– Projecting out one copy of each pair of

equated attributes.

• Called natural join.

• Denoted R3 <- R1 R2.

36

ExampleSells( bar, beer, price ) Bars( bar, addr )

Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

BarInfo <- Sells BarsNote Bars.name has become Bars.bar to make the naturaljoin “work.”

BarInfo( bar, beer, price, addr )Joe’s Bud 2.50 Maple St.Joe’s Milller 2.75 Maple St.Sue’s Bud 2.50 River Rd.Sue’s Coors3.00 River Rd.

37

• A join can be specified between a relation and itself (self-join).

• One can think of this as joining two distinct copies of the relation, although only one relation actually exists.

38

Inner joins or outer joins

• The join operations we described so far are called inner joins, where only matching tuples (through join condition or selection) are kept in the result.

• Sometimes, we need query results in which tuples from two tables are to be combined by matching corresponding rows, but without losing any tuples for lack of matching values. To meet this need, a set of operations, called outer joins are introduced here.

39

Outerjoin

• Suppose we join R JOINC S.

• A tuple of R that has no tuple of S with which it joins is said to be dangling.

– Similarly for a tuple of S.

• Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

40

Example: Outerjoin

R = A B S = B C1 2 2 34 5 6 7

(1,2) joins with (2,3), but the other two tuplesare dangling.

R OUTERJOIN S = A B C1 2 34 5 NULLNULL 6 7

41

Three different kinds of outerjoin

• Full outerjoin

• left outerjoin

• Right outerjoin

A B C1 2 3

4 5 NULL

A B C1 2 3

NULL 6 7

42

Expressions in a Single Assignment

• There are multiple ways to express the same meaning in relational algebra.

• Example: the theta-join R3 <- R1 JOINC R2 use a single theta join operator, it is equivalent to : R3 <- SELECTC (R1 X R2), which involves two operators, i.e. product and select.

43

Precedence of relational operators

• Precedence of relational operators:1. Unary operators --- select, project, rename --- have

highest precedence, bind first.

2. Then come products and joins.

3. Then intersection.

4. Finally, union and set difference bind last.

You can always insert parentheses to force the order you desire.

44

Building Complex Expressions

• Algebras allow us to express sequences of operations in a natural way.

– Example: in arithmetic --- (x + 4)*(y - 3).

• Relational algebra allows the same.

• Three notations, just as in arithmetic:1. Sequences of assignment statements.

2. Expressions with several operators.

3. Expression trees.

45

Sequences of operations

• We can break complex relational algebra expressions into smaller ones by applying one or more operators at a time and naming the intermediate results to improve readability.

46

Expression Trees

• Leaves are operands --- either variables standing for relations or particular, constant relations.

• Interior nodes are operators, applied to their child or children.

47

Example

• Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3.

48

As a Tree:

Bars Sells

SELECTaddr = “Maple St.” SELECT price<3 AND beer=“Bud”

PROJECTname

RENAMER(name)

PROJECTbar

UNION

49

Example

• Using Sells(bar, beer, price), find the bars that sell two different beers at the same price.

• Strategy: by renaming, define a copy of Sells, called S(bar, beer1, price). The natural join of Sells and S consists of quadruples (bar, beer, beer1, price) such that the bar sells both beers at this price.

50

The Tree

Sells Sells

RENAMES(bar, beer1, price)

JOIN

PROJECTbar

SELECTbeer != beer1

51

Relation Schemas ( i.e. relation structures) for Interior Nodes in

expression tree• An expression tree defines a schema for

the relation associated with each interior node.

• Similarly, a sequence of assignments defines a schema for each relation on the left of the <- sign.

52

Schema-Defining Rules 1

• For union, intersection, and difference, the schemas of the two operands must be the same, so use that schema for the result.

• Selection: schema of the result is the same as the schema of the operand.

• Projection: list of attributes tells us the schema.

53

Schema-Defining Rules 2

• Product: the schema is the attributes of both relations.– Use R.A, etc., to distinguish two attributes

named A.

• Theta-join: same as product.

• Natural join: use attributes of both relations.– Shared attribute names are merged.