Post on 14-Dec-2015

Relational Algebra

Tim Kaddoura

CS157A

Introduction

Relational query languages are languages for describing queries on a relational database

Three variants:
– Relational Algebra
– Relational Calculus
– SQL

Query languages vs. programming languages:
– Query languages support easy, efficient access to large data sets.

Some Facts

The relational model was introduced by Edgar 'Ted' Codd of IBM Research in 1970.

Concept of mathematical relation as the underlying basis.

The standard database model for most transactional databases today.

What is Algebra?

A language based on operators and a domain of values.

Operators map values taken from the domain into other domain values.

Hence, an expression involving operators and arguments produces a value in the domain.

Consider arithmetic operations +, - and * over integers.
– Algebra expressions: 2+3, (46-3)+3, (7*x)+(3*x)

Relational algebra:
– Domain: the set of all relations
– Expression: referred to as a query

Relational Algebra

Domain: set of relations

Basic operators: select, project, union, set difference, Cartesian product

Derived operators: set intersection, division, join

Procedural: a relational expression specifies a query by describing an algorithm for determining the result of an expression

Example Database

STUDENT(Id, Name, Password, Address)
FACULTY(Id, Name, DeptId, Password, Address)
COURSE(CrsCode, DeptId, CrsName, CreditHours)
REQUIRES(CrsCode, PrereqCrsCode, EnforcedSince)
CLASS(CrsCode, SectionNo, Semester, Year, Textbook, ClassTime, Enrollment, MaxEnrollment, ClassroomId, InstructorId)
CLASSROOM(ClassroomId, Seats)
TRANSCRIPT(StudId, CrsCode, SectionNo, Semester, Year, Grade)

Example Queries

– Find all the courses the student named ‘John Doe’ has completed.
– Find all students who are taking a course by Prof. Lee in ‘Fall 2005’.
– Find all courses taught by a faculty member from the ‘CS’ department in ‘Fall 2005’.
– Find all faculty who taught courses both in ‘Fall’ and ‘Spring’ 2005.
– Find all faculty who did not teach courses both in ‘Fall’ and ‘Spring’ 2005.
– Find all students with 4.0 GPA.
– Find all students who completed a course without taking one of its prerequisites.

Relational Algebra

A relation schema is given by R(A1,…,Ak), the name of the relation and the list of the attributes in the relation

A relation is a set of tuples that are valid instances of its schema

Relational algebra expressions take as input relations and produce as output new relations.

After each operation, the attributes of the participating relations are carried to the new relation. The attributes may be renamed, but their domain remains the same.
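
As a rough illustration of these definitions, a relation can be modeled as a schema plus a set of tuples, and an operator as a function from relations to relations. The `Relation` class, the `large_rooms` operator, and the sample rows below are assumptions made for this sketch; they are not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical model: a schema (attribute names) plus a set of tuples."""
    attrs: tuple
    rows: frozenset

    def __post_init__(self):
        # Every tuple must be a valid instance of the schema (same arity).
        for row in self.rows:
            if len(row) != len(self.attrs):
                raise ValueError(f"tuple {row} does not match schema {self.attrs}")


# CLASSROOM(ClassroomId, Seats) with made-up rows.
classroom = Relation(
    attrs=("ClassroomId", "Seats"),
    rows=frozenset({("R101", 40), ("R102", 120)}),
)


def large_rooms(rel: Relation) -> Relation:
    """An operator takes a relation as input and returns a new relation."""
    seats = rel.attrs.index("Seats")
    return Relation(rel.attrs, frozenset(r for r in rel.rows if r[seats] > 100))


result = large_rooms(classroom)
print(result.attrs, sorted(result.rows))
```

Because the output is again a `Relation` with the same attributes, it can be fed into further operators, which is what makes the expressions composable.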

Basic Operators

Unary (single relation) operators
– SELECT: σ_{selection-condition}
– PROJECT: π_{attribute-list}

Binary (two relation) operators
– UNION ( ∪ )
– SET DIFFERENCE ( - )
– CARTESIAN PRODUCT ( × )

Set Operators

Set operators connect two relations by set operations.

However, for a set operation R1 op R2 to be valid, R1 and R2 should be union-compatible, that is:
– R1 and R2 have the same number of columns
– The names of the attributes are the same in both relations
– Attributes with the same name in both relations have the same domain
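
The three conditions can be checked mechanically. The sketch below assumes a schema is a dict from attribute name to domain, with Python types standing in for domains; `union_compatible` and the sample schemas are hypothetical names chosen for illustration.

```python
def union_compatible(schema1, schema2):
    """Check the three union-compatibility conditions on two schemas.

    A schema is modeled as {attribute name: domain}; Python types stand in
    for domains (an assumption of this sketch).
    """
    same_arity = len(schema1) == len(schema2)
    same_names = set(schema1) == set(schema2)
    same_domains = same_names and all(schema1[a] == schema2[a] for a in schema1)
    return same_arity and same_names and same_domains


faculty_names = {"Name": str}
student_names = {"Name": str}
student_ids = {"Id": int}
transcript_ids = {"StudId": int}

print(union_compatible(faculty_names, student_names))
print(union_compatible(student_ids, transcript_ids))
```

The second pair has matching arity and domains but different attribute names, so it fails the check.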

Selection

SELECTION: σ_{selection-condition}(R)
– Select from R all tuples that satisfy the selection-condition
– The condition only refers to the attributes in R
– An atomic-selection-condition is of the form
  relation-attribute operation constant, or
  relation-attribute operation relation-attribute

A selection-condition is obtained by boolean combination of atomic selection conditions by means of connectives AND, OR, and NOT.

Selection

σ_{Enrollment>MaxEnrollment AND Enrollment>100}(CLASS)

σ_{Grade=‘A’}(TRANSCRIPT)

σ_{Year=2005 AND (Semester=‘Fall’ OR Semester=‘Spring’)}(CLASS)

σ_{Name like ‘%Lee’}(FACULTY)
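
Two of these selections can be tried out in Python. This is a minimal sketch that assumes a relation is a list of dicts and a selection condition is a predicate on one tuple; the `select` helper and the CLASS rows (course codes, enrollments) are made up for illustration.

```python
def select(rows, condition):
    """SELECT: keep the tuples of a relation that satisfy the condition."""
    return [row for row in rows if condition(row)]


# Made-up CLASS rows, restricted to the attributes the conditions use.
CLASS = [
    {"CrsCode": "CS157A", "Semester": "Fall", "Year": 2005,
     "Enrollment": 130, "MaxEnrollment": 120},
    {"CrsCode": "CS146", "Semester": "Spring", "Year": 2005,
     "Enrollment": 90, "MaxEnrollment": 100},
    {"CrsCode": "CS151", "Semester": "Fall", "Year": 2004,
     "Enrollment": 105, "MaxEnrollment": 110},
]

# Enrollment>MaxEnrollment AND Enrollment>100
overfull = select(
    CLASS,
    lambda r: r["Enrollment"] > r["MaxEnrollment"] and r["Enrollment"] > 100,
)

# Year=2005 AND (Semester='Fall' OR Semester='Spring')
in_2005 = select(
    CLASS,
    lambda r: r["Year"] == 2005 and (r["Semester"] == "Fall" or r["Semester"] == "Spring"),
)

print([r["CrsCode"] for r in overfull])
print([r["CrsCode"] for r in in_2005])
```

The first condition compares an attribute with another attribute and with a constant, covering both atomic forms.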

Projection

PROJECT: π_{attribute-list}(R)
– Return from R all tuples, but remove from all tuples any attribute that is not in the attribute list
– The attribute list can only refer to attributes in R

Examples:
π_{Id}(FACULTY)
π_{Id}(σ_{Name like ‘%Lee’}(FACULTY))
π_{CrsCode, Textbook}(CLASS)
π_{PrereqCrsCode}(σ_{CrsCode=‘CSCI4380’}(REQUIRES))
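
A projection can be sketched in a few lines. The version below assumes relations are lists of dicts; because a relation is a set of tuples, rows that become identical after projection are kept only once. The `project` helper and the REQUIRES rows are hypothetical.

```python
def project(rows, attrs):
    """PROJECT: keep only the listed attributes of every tuple."""
    result = []
    for row in rows:
        missing = [a for a in attrs if a not in row]
        if missing:
            # The attribute list may only refer to attributes of the relation.
            raise KeyError(f"attributes not in relation: {missing}")
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:  # a relation is a set: drop duplicates
            result.append(reduced)
    return result


# Made-up REQUIRES rows.
REQUIRES = [
    {"CrsCode": "CSCI4380", "PrereqCrsCode": "CSCI2300", "EnforcedSince": 2003},
    {"CrsCode": "CSCI4380", "PrereqCrsCode": "CSCI2400", "EnforcedSince": 2003},
    {"CrsCode": "CSCI4210", "PrereqCrsCode": "CSCI2300", "EnforcedSince": 2001},
]

# Projection applied to the result of a selection on CrsCode='CSCI4380'.
selected = [r for r in REQUIRES if r["CrsCode"] == "CSCI4380"]
prereqs = project(selected, ["PrereqCrsCode"])

# Projecting the whole relation collapses the repeated CSCI2300.
all_prereqs = project(REQUIRES, ["PrereqCrsCode"])

print(prereqs)
print(all_prereqs)
```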

Set Operators

Given two relations R1, R2 that are union-compatible, we have that
– R1 ∪ R2 returns the set of tuples that are in R1 or R2. [UNION]
– R1 ∩ R2 returns the set of tuples that are both in R1 and R2. [INTERSECTION]
– R1 - R2 returns the set of tuples that are in R1, but not in R2. [SET DIFFERENCE]

Note that set difference is the only negative operator, i.e. not in R2.
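
Python's built-in sets behave the same way, which makes the three operators easy to experiment with. The sketch assumes two union-compatible relations over a single attribute Name, modeled as sets of 1-tuples; the names themselves are made up. It also shows why intersection counts as a derived operator: it can be written with set difference alone.

```python
# Two union-compatible relations over the single attribute Name.
faculty_names = {("Lee",), ("Smith",), ("Garcia",)}
student_names = {("Doe",), ("Smith",), ("Chen",)}

union = faculty_names | student_names          # in R1 or R2
intersection = faculty_names & student_names   # in both R1 and R2
difference = faculty_names - student_names     # in R1 but not in R2

# Intersection expressed with set difference only: R1 - (R1 - R2).
derived_intersection = faculty_names - (faculty_names - student_names)

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(derived_intersection == intersection)
```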

Set Operators

π_{Name}(FACULTY) ∪ π_{Name}(STUDENT)
π_{Address}(FACULTY) ∩ π_{Address}(STUDENT)
π_{CrsCode}(CLASS) - π_{CrsCode}(TRANSCRIPT)

π_{Id}(STUDENT) - π_{StudId}(TRANSCRIPT)

Problem: The two relations in this case are not union-compatible (even though the number of attributes and their domains match, the names do not).

Solution for this?

Rename

Rename all the attributes in the relation.
– Given a relation with schema R(A1,…,An), the expression R[B1,…,Bn] is used to rename attribute A1 as B1, …, An as Bn.
– The rename operator does not change the domain of the attributes.
– The rename operator does not change the number of attributes in a relation (that can be done by the projection operation).

Rename

Corrected expression:
π_{Id}(STUDENT) - (π_{StudId}(TRANSCRIPT))[Id]

Example:
Temp := π_{InstructorId}(σ_{Year=2005 AND Semester=‘Spring’}(CLASS)) ∩ π_{InstructorId}(σ_{Year=2005 AND Semester=‘Fall’}(CLASS))
‘All faculty who taught courses both in ‘Fall’ and ‘Spring’ 2005.’

π_{Id}(FACULTY) - Temp[Id]
‘All faculty who did not teach courses both in ‘Fall’ and ‘Spring’ 2005.’
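
The effect of renaming before a set difference can be sketched as follows. The code assumes a relation is a pair (attribute names, set of tuples); `rename`, `difference`, and the student ids are hypothetical and only illustrate the idea.

```python
def rename(relation, new_attrs):
    """R[B1,...,Bn]: rename attributes positionally; tuples and arity are unchanged."""
    attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("rename must not change the number of attributes")
    return (tuple(new_attrs), rows)


def difference(r1, r2):
    """Set difference, allowed only when the attribute names match."""
    if r1[0] != r2[0]:
        raise ValueError(f"not union-compatible: {r1[0]} vs {r2[0]}")
    return (r1[0], r1[1] - r2[1])


student_ids = (("Id",), {(1,), (2,), (3,)})    # stands in for the Id projection of STUDENT
transcript_ids = (("StudId",), {(1,), (3,)})   # stands in for the StudId projection of TRANSCRIPT

# Without the rename the difference is rejected.
try:
    difference(student_ids, transcript_ids)
except ValueError as err:
    print("rejected:", err)

# With the rename the two relations are union-compatible.
no_transcript = difference(student_ids, rename(transcript_ids, ["Id"]))
print(no_transcript)
```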

Set Difference

Find all students with 4.0 GPA

or find students with all As (and at least one A grade)

or find students who never got anything other than an A

Temp1 = π_{StudId}(σ_{Grade≠‘A’}(TRANSCRIPT))

Students who got at least one non-A grade

Temp2 = π_{StudId}(TRANSCRIPT)

Students who got at least one grade

Result = Temp2 - Temp1
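
The steps can be traced on a small made-up TRANSCRIPT, reduced here to (StudId, CrsCode, Grade). Temp1 collects students with at least one non-A grade and Temp2 collects students with at least one grade; the sketch also shows why simply selecting the tuples with Grade 'A' is not enough.

```python
# Made-up TRANSCRIPT rows: (StudId, CrsCode, Grade).
TRANSCRIPT = [
    (1, "CS157A", "A"),
    (1, "CS146", "A"),
    (2, "CS157A", "A"),
    (2, "CS146", "B"),
    (3, "CS151", "C"),
]

# Temp1: students who got at least one non-A grade.
temp1 = {stud for (stud, crs, grade) in TRANSCRIPT if grade != "A"}

# Temp2: students who got at least one grade.
temp2 = {stud for (stud, crs, grade) in TRANSCRIPT}

# Result: students who never got anything other than an A.
result = temp2 - temp1

# Selecting Grade = 'A' alone also returns student 2, who has a B as well.
at_least_one_a = {stud for (stud, crs, grade) in TRANSCRIPT if grade == "A"}

print(sorted(temp1), sorted(temp2), sorted(result))
print(sorted(at_least_one_a))
```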

Cartesian Product

Given two sets, A={a1,a2,a3}, B={b1,b2}, the Cartesian product (A × B) returns the set of tuples

A × B = {(a1,b1), (a1,b2), (a2,b1), (a2,b2), (a3,b1), (a3,b2)}

The Cartesian product for relations is a generalization of this concept.

Cartesian Product

Given two relations with schema R1(A1,…,An) and R2(B1,…,Bm), let Temp = R1 × R2.

R1 × R2 returns a relation Temp with schema Temp(A1,…,An,B1,…,Bm) such that for all possible tuples t1 in R1 and t2 in R2, there exists a tuple t in Temp such that t is identical to t1 with respect to A1,…,An and t is identical to t2 with respect to B1,…,Bm.
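
This definition translates almost directly into code. The sketch assumes a relation is a pair (attribute names, set of tuples) and that the two schemas share no attribute names; `cartesian_product` and the sample values are hypothetical.

```python
from itertools import product


def cartesian_product(r1, r2):
    """Concatenate the schemas and pair every tuple of R1 with every tuple of R2."""
    attrs1, rows1 = r1
    attrs2, rows2 = r2
    return (attrs1 + attrs2, {t1 + t2 for t1, t2 in product(rows1, rows2)})


r1 = (("A1", "A2"), {(1, "x"), (2, "y")})
r2 = (("B1",), {("p",), ("q",), ("r",)})

temp = cartesian_product(r1, r2)
print(temp[0])
print(len(temp[1]))
```

The result has n + m attributes and |R1| * |R2| tuples, so the product grows quickly for large relations.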

Cartesian Product

Find all the courses student named ‘John Doe’ has completed.

π_{CrsCode}(σ_{Id=StudId}(TRANSCRIPT × σ_{Name=‘John Doe’}(STUDENT)))
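
Read from the inside out, the query selects the student, pairs that student with every transcript row, keeps the pairs whose ids agree, and projects the course code. The sketch below follows those steps on made-up rows, with relations modeled as lists of dicts and only the attributes the query needs.

```python
# Made-up rows, restricted to the attributes the query uses.
STUDENT = [
    {"Id": 1, "Name": "John Doe"},
    {"Id": 2, "Name": "Jane Roe"},
]
TRANSCRIPT = [
    {"StudId": 1, "CrsCode": "CS157A"},
    {"StudId": 1, "CrsCode": "CS146"},
    {"StudId": 2, "CrsCode": "CS157A"},
]

# Selection on STUDENT: Name = 'John Doe'.
john = [s for s in STUDENT if s["Name"] == "John Doe"]

# Cartesian product of TRANSCRIPT with the selected students.
pairs = [{**t, **s} for t in TRANSCRIPT for s in john]

# Selection on the product: Id = StudId.
matched = [p for p in pairs if p["Id"] == p["StudId"]]

# Projection on CrsCode.
courses = {p["CrsCode"] for p in matched}

print(len(pairs), sorted(courses))
```

Selecting the student before taking the product keeps the intermediate result small: three pairs here instead of six.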

Join

Join is a derived operator, obtained from a Cartesian product followed by a selection condition.

Given R1 ⋈_{join-condition} R2, where join-condition is a boolean combination of conditions of the form relation-attribute operation relation-attribute, where the attributes on the left (and right) hand side come from R1 (and R2).

R1 ⋈_{join-condition} R2 = σ_{join-condition}(R1 × R2)

Join

Output the names of all employees that earn more than their managers.

π_{Employee.Name}(Employee ⋈_{MngrId=Manager.Id AND Employee.Salary > Manager.Salary} Manager)

The join itself, before the projection, yields a table with the following attributes:

Employee.Name, Employee.Id, Employee.Salary, Manager.Name, Manager.Id, Manager.Salary
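
Since join is a selection over a Cartesian product, it can be sketched in two lines. The `theta_join` helper, the qualified attribute names used as dict keys, and all employee and manager rows are assumptions made for this illustration.

```python
def theta_join(r1, r2, condition):
    """Join as a derived operator: a selection applied to the Cartesian product."""
    pairs = [{**t1, **t2} for t1 in r1 for t2 in r2]
    return [t for t in pairs if condition(t)]


# Made-up rows.
Employee = [
    {"Employee.Name": "Ann", "Employee.Id": 1, "Employee.Salary": 90000, "MngrId": 10},
    {"Employee.Name": "Bob", "Employee.Id": 2, "Employee.Salary": 60000, "MngrId": 10},
    {"Employee.Name": "Cy", "Employee.Id": 3, "Employee.Salary": 70000, "MngrId": 11},
]
Manager = [
    {"Manager.Name": "Dee", "Manager.Id": 10, "Manager.Salary": 80000},
    {"Manager.Name": "Eve", "Manager.Id": 11, "Manager.Salary": 75000},
]

joined = theta_join(
    Employee,
    Manager,
    lambda t: t["MngrId"] == t["Manager.Id"]
    and t["Employee.Salary"] > t["Manager.Salary"],
)

# Final projection on Employee.Name.
names = sorted({t["Employee.Name"] for t in joined})
print(names)
```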

Division

o Given R1(A1,…,An,B1,…,Bm) and R2(B1,…,Bm)

R1/R2 is the “maximal” set of tuples from π_{A1,…,An}(R1) such that R1 contains all tuples in (R1 / R2) × R2.

o Division is often used for “for-all” queries, e.g. “Find tuples in R1 that are in a relation with ‘all’ tuples in R2.”

o Important to note that all the attributes in the dividing relation R2 must exist in the divided relation R1!

Relational Algebra - Division

R
A   B   C
a1  b1  c1
a2  b1  c1
a1  b2  c1
a1  b2  c2
a2  b1  c2
a1  b2  c3
a1  b2  c4
a1  b1  c5

S
C
c1
c2

V
B   C
b1  c1

Y
B   C
b1  c1
b2  c1

Find:
• R / S
• R / V
• R / Y
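
One way to check answers to this exercise is to build division from the basic operators, using the standard identity R / S = π_A(R) - π_A((π_A(R) × S) - R), where A is the list of attributes of R that do not appear in S. The `divide` function below is a sketch under that assumption, not a method taken from the slides; relations are passed as an attribute list plus a set of tuples.

```python
def divide(r_attrs, r_rows, s_attrs, s_rows):
    """R / S via projection, Cartesian product and set difference."""
    keep = [a for a in r_attrs if a not in s_attrs]   # attributes of the result
    keep_idx = [r_attrs.index(a) for a in keep]
    s_idx = [r_attrs.index(a) for a in s_attrs]       # where S's attributes sit in R

    def split(row):
        # Regroup a tuple of R as (result part, divisor part).
        return (tuple(row[i] for i in keep_idx), tuple(row[i] for i in s_idx))

    r_pairs = {split(row) for row in r_rows}
    candidates = {a for a, _ in r_pairs}                              # projection of R
    required = {(a, tuple(b)) for a in candidates for b in s_rows}    # candidates x S
    missing = required - r_pairs                                      # pairs R lacks
    disqualified = {a for a, _ in missing}
    return keep, candidates - disqualified


R_ATTRS = ["A", "B", "C"]
R_ROWS = {
    ("a1", "b1", "c1"), ("a2", "b1", "c1"), ("a1", "b2", "c1"), ("a1", "b2", "c2"),
    ("a2", "b1", "c2"), ("a1", "b2", "c3"), ("a1", "b2", "c4"), ("a1", "b1", "c5"),
}

r_div_s = divide(R_ATTRS, R_ROWS, ["C"], {("c1",), ("c2",)})
r_div_v = divide(R_ATTRS, R_ROWS, ["B", "C"], {("b1", "c1")})
r_div_y = divide(R_ATTRS, R_ROWS, ["B", "C"], {("b1", "c1"), ("b2", "c1")})

for name, (attrs, rows) in [("R / S", r_div_s), ("R / V", r_div_v), ("R / Y", r_div_y)]:
    print(name, attrs, sorted(rows))
```

The divisor's tuples must list their values in the same order as `s_attrs`; the function does not validate that every attribute of S exists in R beyond the lookup in `r_attrs`.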

Conclusion

Based on the concept of mathematical relation

Building block: a relation comprising attributes within domains

Tuples + Schema = Relation