cs 405g: introduction to database systems lecture 6: relational algebra instructor: chen qian

34
CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Upload: margaret-wheeler

Post on 18-Jan-2016

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

CS 405G: Introduction to Database Systems

Lecture 6: Relational AlgebraInstructor: Chen Qian

Page 2: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 2

Review

Informal Terms Formal Terms

Table Relation

Column Attribute/Domain

Row Tuple

Values in a column Domain

Table Definition Schema of a Relation

Populated Table Extension

Page 3: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 3

Update Operations on Relations

Update operations INSERT a tuple. DELETE a tuple. MODIFY a tuple.

Constraints should not be violated in updates

Page 4: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 4

Example

We have the following relational schemas Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character)

We have the following sequence of database update operations. (assume all tables are empty before we apply any operations)

INSERT<‘1234’, ‘John Smith’, ‘3.5> into Student

sid name gpa

1234 John Smith 3.5

Page 5: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 5

Example (Cont.)

INSERT<‘647’, ‘EECS’> into Courses

INSERT<‘1234’, ‘647’, ‘B’> into Enrolled

UPDATE the grade in the Enrolled tuple with sid = 1234 and cid = 647 to ‘A’.

DELETE the Enrolled tuple with sid 1234 and cid 647

sid name gpa

1234 John Smith 3.5

cid department

647 EECS

sid cid grade

1234 647 B

sid cid grade

1234 647 A

sid cid grade

Page 6: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 6

Exercise

INSERT<‘108’, ‘MATH’> into Courses

INSERT<‘1234’, ‘108’, ‘B’> into Enrolled

INSERT<‘1123’, ‘Mary Carter’, ‘3.8’> into Student

sid name gpa

1234 John Smith 3.5

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

cid department

647 EECS

sid cid grade

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

Page 7: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 7

Exercise (cont.) A little bit tricky INSERT<‘1125’, ‘Bob

Lee’, ‘good’> into Student Fail due to domain

constraint INSERT<‘1123’,

NULL, ‘B’> into Enrolled Fail due to entity

integrity INSERT

<‘1233’,’647’, ‘A’> into Enrolled Failed due to

referential integrity

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

Page 8: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 8

Exercise (cont.)

A more tricky one UPDATE the cid in

the tuple from Course where cid = 108 to 109

sid name gpa

1234 John Smith 3.5

1123 Mary Carter 3.8

cid department

647 EECS

108 MATH

sid cid grade

1234 108 B

cid department

647 EECS

109 MATH

sid cid grade

1234 109 B

Page 9: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 9

Update Operations on Relations

In case of integrity violation, several actions can be taken: Cancel the operation that causes the violation

(REJECT option) Perform the operation but inform the user of the

violation Trigger additional updates so the violation is

corrected (CASCADE option, SET NULL option) Execute a user-specified error-correction routine

Page 10: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Relational Query Languages

Query languages: Allow manipulation and retrieval of data from a database.

Relational model supports simple, powerful QLs: Strong formal foundation based on logic. Allows for much optimization.

Query Languages != programming languages! QLs not intended to be used for complex calculations

and inference (e.g. logical reasoning) QLs support easy, efficient access to large data sets.

04/21/23 10

Page 11: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for representing execution plans.

Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)

Understanding Algebra & Calculus is key to understanding SQL, query processing!

04/21/23 11

Page 12: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 12

Relational algebra

Core set of operators: Selection, projection, cross product, union, difference, and renaming

Additional, derived operators: Join, natural join, intersection, etc.

Compose operators to make complex queries

OPER

OPER

A language for querying relational databases based on operators:

Page 13: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 13

Selection

Input: a table R Notation: p R

p is called a selection condition/predicate Purpose: filter rows according to some criteria Output: same columns as R, but only rows of R that

satisfy p

Page 14: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 14

Selection example

Students with GPA higher than 3.0

GPA > 3.0 Student

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

1204 Susan Wong 22 3.4

1306 Kevin Kim 21 2.9

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

1204 Susan Wong 22 3.4

1306 Kevin Kim 21 2.9

GPA > 3.0

Page 15: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 15

More on selection

Selection predicate in general can include any column of R, constants, comparisons (=, >, etc.), and Boolean connectives (: and, : or, and ¬ : negation (not) ) Example: straight A students under 18 or over 21

GPA = 4.0 (age < 18 age > 21) Student

But you must be able to evaluate the predicate over a single row of the input table Example: student with the highest GPA

GPA >= all GPA in Student table Student

Page 16: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 16

Projection

Input: a table R Notation: πL R

L is a list of columns in R Purpose: select columns to output Output: same rows, but only the columns in L

Order of the rows is preserved Number of rows may be less (depends on where we have

duplicates or not)

Page 17: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 17

Projection example

ID’s and names of all students

πSID, name Student

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

1204 Susan Wong 22 3.4

1306 Kevin Kim 21 2.9

πSID, name

sid name

1234 John Smith

1123 Mary Carter

1011 Bob Lee

1204 Susan Wong

1306 Kevin Kim

Page 18: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 18

More on projection

Duplicate output rows are removed (by definition) Example: student ages

πage Student

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

1204 Susan Wong 22 3.4

1306 Kevin Kim 21 2.9

πage

age

21

22

22

22

21

Page 19: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 19

Cross product

Input: two tables R and S Notation: R × S Purpose: pairs rows from two tables Output: for each row r in R and each row s in S, output a

row rs (concatenation of r and s)

Page 20: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 20

Cross product example

Student × Enrollsid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

sid cid grade

1234 647 A

1123 108 A

sid name age gpa sid cid grade

1234 John Smith 21 3.5 1234 647 A

1123 Mary Carter 22 3.8 1234 647 A

1011 Bob Lee 22 2.6 1234 647 A

1234 John Smith 21 3.5 1123 108 A

1123 Mary Carter 22 3.8 1123 108 A

1011 Bob Lee 22 2.6 1123 108 A

×

Page 21: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

04/21/23 Chen Qian, University of Kentucky 21

A note on column ordering

The ordering of columns in a table is considered unimportant (as is the ordering of rows)

That means cross product is commutative, i.e.,R × S = S × R for any R and S

=sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

sid name gpa age

1234 John Smith 3.5 21

1123 Mary Carter 3.8 22

1011 Bob Lee 2.6 22

Page 22: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Derived operator: join

Input: two tables R and S Notation: R p S

p is called a join condition/predicate Purpose: relate rows from two tables according to some

criteria Output: for each row r in R and each row s in S, output a

row rs if r and s satisfy p

04/21/23 22

Shorthand for σp ( R X S )

Page 23: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Join example Info about students, plus CID’s of their courses

Student (Student.SID = Enroll.SID) Enroll

04/21/23 23

Use table_name. column_name syntaxto disambiguate identically named columns from different input tables

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

sid cid grade

1234 647 A

1123 108 A

sid name age gpa sid cid grade

1234 John Smith 21 3.5 1234 647 A

1123 Mary Carter 22 3.8 1234 647 A

1011 Bob Lee 22 2.6 1234 647 A

1234 John Smith 21 3.5 1123 108 A

1123 Mary Carter 22 3.8 1123 108 A

1011 Bob Lee 22 2.6 1123 108 A

Student.SID =

Enroll.SID

Page 24: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Derived operator: natural join

Input: two tables R and S Notation: R S Purpose: relate rows from two tables, and

Enforce equality on all common attributes Eliminate one copy of common attributes

04/21/23 24

Shorthand for πL ( R p S ), where p equates all attributes common to R and S L is the union of all attributes from R and S, with duplicate

attributes removed

Page 25: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Natural join example

Student Enroll = πL ( Student p Enroll )

= πSID, name, age, GPA, CID ( Student Student.SID = Enroll.SID Enroll )

04/21/23 25

sid name age gpa

1234 John Smith 21 3.5

1123 Mary Carter 22 3.8

1011 Bob Lee 22 2.6

sid cid grade

1234 647 A

1123 108 A

sid name age gpa sid cid grade

1234 John Smith 21 3.5 1234 647 A

1123 Mary Carter 22 3.8 1234 647 A

1011 Bob Lee 22 2.6 1234 647 A

1234 John Smith 21 3.5 1123 108 A

1123 Mary Carter 22 3.8 1123 108 A

1011 Bob Lee 22 2.6 1123 108 A

Page 26: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Union

Input: two tables R and S Notation: R S

R and S must have identical schema Output:

Has the same schema as R and S Contains all rows in R and all rows in S, with duplicate

rows eliminated

04/21/23 26

Page 27: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Difference

Input: two tables R and S Notation: R - S

R and S must have identical schema Output:

Has the same schema as R and S Contains all rows in R that are not found in S

04/21/23 27

Page 28: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Derived operator: intersection

Input: two tables R and S Notation: R \ S

R and S must have identical schema Output:

Has the same schema as R and S Contains all rows that are in both R and S

04/21/23 Jinze Liu @ University of Kentucky 28

Shorthand for R - ( R - S ) Also equivalent to S - ( S - R ) And to R S

Page 29: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Renaming

Input: a table R Notation: ρS R, ρ(A1, A2, …) R or ρS(A1, A2, …) R

Purpose: rename a table and/or its columns Output: a renamed table with the same rows as R Used to

Avoid confusion caused by identical column names Create identical columns names for natural joins

04/21/23 29

Page 30: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Renaming Example

Enroll1(SID1, CID1,Grade1) Enroll

04/21/23 30

sid cid grade

1234 647 A

1123 108 A

sid1 cid1 grade1

1234 647 A

1123 108 A

Enroll1(SID1, CID1,Grade1)

Page 31: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Review: Summary of core operators

Selection: Projection: Cross product: Union: Difference: Renaming:

Does not really add “processing” power

04/21/23 31

σp R

πL R

R X S

R SR - Sρ S(A1, A2, …) R

Page 32: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Review Summary of derived operators

Join: Natural join: Intersection:

04/21/23 32

R p S

R SR S

Many more Outer join, Division, Semijoin, anti-semijoin, …

Page 33: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

Red parts

04/21/23 33

pid of red parts

Catalog having red parts

Page 34: CS 405G: Introduction to Database Systems Lecture 6: Relational Algebra Instructor: Chen Qian

sid of suppliers who support Red parts

04/21/23 34

names of suppliers who support Red parts