relational algebra - École polytechnique fédérale de ...lsir · relational algebra ¥formalism...

16
1 Relational Algebra Lecture 7 2 Outline Relational Algebra (Section 6.1)

Upload: vananh

Post on 11-Jun-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

1

Relational Algebra

Lecture 7

2

Outline

• Relational Algebra (Section 6.1)

3

Relational Algebra

• Formalism for creating new relations

from existing ones

• Its place in the big picture:

Declarative

query

language

Algebra Implementation

SQL,

relational calculus

Relational algebra

4

Relational Algebra• Five operators:

– Union: !

– Difference: -

– Selection: s

– Projection: P

– Cartesian Product: "

• Derived or auxiliary operators:– Intersection, complement

– Joins (natural,equi-join, theta join, semi-join)

– Renaming: r

5

1. Union and 2. Difference

• R1 ! R2

Example:

– ActiveEmployees ! RetiredEmployees

• R1 – R2

Example:

– AllEmployees ! RetiredEmployees

6

What about Intersection?

• It is a derived operatorR1 # R2 = R1 – (R1 – R2)

• Also expressed as a join (will see later)

Example

– UnionizedEmployees # RetiredEmployees

7

3. Selection

• Returns all tuples which satisfy a condition

• Notation: sc(R) (sometimes also !)

• Examples

– s Salary > 40000 (Employee)

– s name = “Smith” (Employee)

• The condition c can be =, <, $, >, %, <>

[in SQL: SELECT * FROM Employee

WHERE Salary > 40000]

condition

8

Selection Example

Employee

SSN Name DepartmentID Salary

999999999 John 1 30,000

777777777 Tony 1 32,000

888888888 Alice 2 45,000

SSN Name DepartmentID Salary

888888888 Alice 2 45,000

Find all employees with salary more than $40,000.s Salary > 40000 (Employee)

9

4. Projection

• Eliminates columns, then removes duplicates

• Notation: P A1,…,An (R) (sometimes !)

• Example:

– project to social-security number and names:

– P SSN, Name (Employee)

– Output schema: Answer(SSN, Name)

[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

attributes

10

Projection Example

Employee

SSN Name DepartmentID Salary

999999999 John 1 30,000

777777777 Tony 1 32,000

888888888 Alice 2 45,000

SSN Name

999999999 John

777777777 Tony

888888888 Alice

P SSN, Name (Employee)

11

5. Cartesian Product

• Combine each tuple in R1 with each tuple inR2

• Notation: R1 " R2

• Example:– Employee " Dependents

• Very rare in practice; mainly used to expressjoins

[In SQL: SELECT * FROM R1, R2]

12

Cartesian Product Example

Employee

Name SSN

John 999999999

Tony 777777777

Dependents

EmployeeSSN Dname

999999999 Emily

777777777 Joe

Employee x Dependents

Name SSN EmployeeSSN Dname

John 999999999 999999999 Emily

John 999999999 777777777 Joe

Tony 777777777 999999999 Emily

Tony 777777777 777777777 Joe

Note the output

13

Cartesian Product and SQLselect * from Employee, Dependents;

+------+-----------+-------------+-------+

| Name | SSN | EmployeeSSN | Dname |

+------+-----------+-------------+-------+

| John | 999999999 | 999999999 | Emily |

| Tony | 777777777 | 999999999 | Emily |

| John | 999999999 | 777777777 | Joe |

| Tony | 777777777 | 777777777 | Joe |

+------+-----------+-------------+-------+

select * from Employee, Dependents where SSN=EmpoyeeSSN;

+------+-----------+-------------+-------+

| Name | SSN | EmployeeSSN | Dname |

+------+-----------+--------------+-------+

| John | 999999999 | 999999999 | Emily |

| Tony | 777777777 | 777777777 | Joe |

+------+-----------+-------------+-------+

14

Relational Algebra• Five operators:

– Union: !

– Difference: -

– Selection: s

– Projection: P

– Cartesian Product: "

• Derived or auxiliary operators:– Intersection, complement

– Joins (natural,equi-join, theta join, semi-join)

– Renaming: r

15

Renaming

• Changes the schema, not the instance

• Schema: R(A1, …, An )

• Notation: r B1,…,Bn (R)

• Example:

– r LastName, SocSocNo (Employee)

– Output schema: Answer(LastName, SocSocNo)

[in SQL: SELECT Name AS LastName, SSN AS SocSocNo

FROM Employee]

16

Renaming Example

Employee

Name SSNJohn 999999999Tony 777777777

LastName SocSocNoJohn 999999999Tony 777777777

r LastName, SocSocNo (Employee)

17

Natural Join

• Notation: R1 ! R2

• Meaning: R1 ! R2 = PA(sC(R1 " R2))– Equality on common attributes names

• Where:– The selection sC checks equality of all common

attributes

– The projection eliminates the duplicate commonattributes

[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2WHERE R1.B = R2.BSchema: R1(A,B), R2(B,C)]

18

Natural Join Example

Employee

Name SSNJohn 999999999Tony 777777777

Dependents

SSN Dname999999999 Emily777777777 Joe

Name SSN DnameJohn 999999999 EmilyTony 777777777 Joe

Employee Dependents =

PName, SSN, Dname(s SSN=SSN2(Employee x r SSN2, Dname(Dependents))

19

Natural Join

• R= S=

• R ! S=

VZ

ZY

ZX

YX

BA

VZ

WV

UZ

CB

WVZ

VZY

UZY

VZX

UZX

CBA

20

Natural Join

• Given the schemas R(A, B, C, D), S(A, C,

E), what is the schema of R ! S ?

• Given R(A, B, C), S(D, E), what is R ! S ?

• Given R(A, B), S(A, B), what is R ! S ?

21

Theta Join

• A join that involves a predicate

• R1 ! q R2 = sq (R1 " R2)

• Here q can be any condition

22

Eq-join

• A theta join where q is an equality

R1 !A=B R2 = s A=B (R1 " R2)

Equality on two different attributes

• Example:

– Employee !SSN=SSN Dependents

• Most useful join in practice(difference to natural join?)

23

Semijoin

• R " S = P A1,…,An (R ! S)

• Where A1, …, An are the attributes in R

• Example:

– Employee " Dependents

24

EmployeeName EmpId DeptName

Harry 3415 Finance

Sally 2241 Sales

George 3401 Finance

Harriet 2202 Sales

DeptDeptName Manager

Sales Harriet

Production Charles

Semijoin - Example

Employee ! DeptName EmpId DeptName

Sally 2241 Sales

Harriet 2202 Sales

Only attributes of Employee are selected

25

Semijoins in Distributed

Databases• Semijoins are used in distributed databases

. . .. . .

NameSSN

Dname

. . .. . .

AgeSSNEmployee

Dependents

network

Employee !ssn=ssn (s age>71 (Dependents))

T = P SSN s age>71 (Dependents)R = Employee "!T

Answer = R ! Dependents

26

Complex RA Expressions

Person Purchase Person Product

sname=fred sname=gizmo

P pidP ssn

seller-ssn=ssn

pid=pid

buyer-ssn=ssn

P name

27

Application:

Query Rewriting for Optimization

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

Reserves Sailors

sid=sid

bid=100

sname

rating > 5(Scan;write to temp T1)

(Scan;write totemp T2)

The earlier we process selections, the fewer tuples we need to

manipulate higher up in the tree (predicate pushdown)

Disadvantages?

28

Algebraic Laws (Examples)

• Commutative and Associative Laws– R " S = S " R, R " (S " T) = (R " S) " T

– R S = S R, R (S T) = (R S) T

• Laws involving selection– s C AND C"(R) = s C(s C"(R)) = s C(R) " s C"(R)

– s C (R S) = s C (R) S• When C involves only attributes of R

• Laws involving projections– PM(PN(R)) = PM,N(R)

! ! ! ! ! !

! !

29

Operations on Bags

A bag = a set with repeated elements

All operations need to be defined carefully on bags

• {a,b,b,c}!{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}

• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}

• sC(R): preserve the number of occurrences

• PA(R): no duplicate elimination

• Cartesian product, join: no duplicate elimination

Important: Relational Engines work on bags, not sets!

30

Finally: RA has Limitations !

• Cannot compute “transitive closure”

• Find all direct and indirect relatives of Fred

• Cannot express in RA !!! Need to write Cprogram

SisterLouNancy

SpouseBillMary

CousinJoeMary

FatherMaryFred

RelationshipName2Name1

31

Conclusion

• Relational algebra is important for SQL

• We mainly need projects, selections,

joins

• Joins are typically time consuming for

large databases