Fundamentals/ICY: Databases 2013/14 WEEK 11 (relational operators & relational algebra) John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK

Page 1: Fundamentals/ICY: Databases 2013/14 WEEK 11 (relational operators &   relational algebra)

Fundamentals/ICY: Databases 2013/14

WEEK 11 (relational operators & relational algebra)

John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK

Relational Operators

See also textbooks and one part of the Week-11 Additional Notes.

NB: the extra Maths material is not given there.

Relational Database Operators

Relational algebra

Defines an abstract way of expressing the manipulation of tables using “relational operators”:

• SELECT

• PROJECT

• JOIN (various sorts)

• INTERSECT

• UNION

• DIFFERENCE

• PRODUCT

• ((DIVIDE))

Use of relational algebra operators on existing tables produces new tables.

Select [better name would be Select-Rows]

SQL: SELECT * FROM … WHERE …

Note: it’s the WHERE part that actually does the selection, according to a criterion.

Relational algebra notation in Additional Notes:

Result table is $\sigma_C(T)$, where T is the given table and C is the selection criterion.

More compact than SQL notation.

Avoids notation private to particular versions of particular programming languages.
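
A minimal Python sketch of Select-Rows, assuming a table is held as a list of dicts. The `products` table, its attribute names and the helper `select` are hypothetical illustrations, not taken from the Additional Notes.

```python
# Hypothetical table: each row is a dict keyed by attribute name.
products = [
    {"code": "P1", "descr": "hammer", "price": 9.95},
    {"code": "P2", "descr": "drill", "price": 49.00},
    {"code": "P3", "descr": "saw", "price": 14.50},
]


def select(table, criterion):
    """Return the rows of `table` for which `criterion(row)` is true."""
    return [row for row in table if criterion(row)]


# Rough analogue of: SELECT * FROM products WHERE price < 20
cheap = select(products, lambda row: row["price"] < 20)
print([row["code"] for row in cheap])
```

Every column survives; only the set of rows shrinks. The criterion plays the role of C and `products` the role of T.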

Select

Project [better name would be Select-Columns]

SQL: SELECT … column specs … FROM …

Relational algebra notation in Additional Notes:

Result table is $\pi_X(T)$, where T is the given table and X is the list of selected attributes (columns).
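
A minimal Python sketch of Select-Columns, again assuming a list-of-dicts table. The `staff` data and the helper `project` are hypothetical. The sketch removes duplicate result rows, as the set-based algebra does; a plain SQL SELECT keeps them unless DISTINCT is used.

```python
# Hypothetical table: each row is a dict keyed by attribute name.
staff = [
    {"sid": 1, "name": "Ann", "dept": "CS"},
    {"sid": 2, "name": "Bob", "dept": "CS"},
    {"sid": 3, "name": "Cy", "dept": "Maths"},
]


def project(table, attributes):
    """Keep only the named attributes, dropping duplicate result rows."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each result row appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Rough analogue of: SELECT DISTINCT dept FROM staff
depts = project(staff, ["dept"])
print(depts)
```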

Project

Relational Operators (continued)

Union and its All version

Intersect and its All version

Difference and its All version

The given tables must have compatible value domains.

Union

Intersect

Difference

Union, Intersection and Difference

SQL:

UNION, INTERSECT, EXCEPT (or MINUS)

UNION ALL, INTERSECT ALL, EXCEPT (or MINUS) ALL

Relational algebra notation for the non-ALL cases:

Result tables are $T_1 \cup T_2$, $T_1 \cap T_2$ and $T_1 - T_2$, where T1 and T2 are the given tables.

Maths of relations:

Result relations in the non-ALL cases are $R_1 \cup R_2$, $R_1 \cap R_2$ and $R_1 \setminus R_2$ (or $R_1 - R_2$), where R1 and R2 are the relations in the given tables.

Problem: relations don’t account for duplicate rows, so they don’t handle the ALL versions.
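
A small Python sketch of the contrast, under the assumption that a relation is modelled as a `set` of tuples and a table with duplicates as a `collections.Counter` (a bag). The one-column tables `t1` and `t2` are hypothetical.

```python
from collections import Counter

# Hypothetical one-column tables; t1 contains the row ("A",) twice.
t1 = [("A",), ("A",), ("B",)]
t2 = [("A",), ("C",)]

# Non-ALL versions: plain set operations, so duplicates disappear.
union = set(t1) | set(t2)
intersect = set(t1) & set(t2)
except_ = set(t1) - set(t2)

# ALL versions: bag operations that keep track of how often a row occurs.
b1, b2 = Counter(t1), Counter(t2)
union_all = b1 + b2      # occurrence counts are added
intersect_all = b1 & b2  # minimum of the two counts
except_all = b1 - b2     # counts are subtracted, never going below zero

print(sorted(union), sorted(union_all.elements()))
print(sorted(except_), sorted(except_all.elements()))
```

Here the set-based difference loses ("A",) altogether, while the bag-based one keeps one of the two copies. That is the information a relation cannot carry.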

Some “Math’l Relation Operations”: Set Operations Applied to Relations

Union of relations R and S:

$R \cup S$ = the set of tuples that are in R or S (or both).

NB: no repetitions created!

Intersection of relations R and S:

$R \cap S$ = the set of tuples that are in both R and S.

Difference of relations R and S:

$R - S = R \setminus S$ = the set of tuples that are in R but not S.

Mathematical Relation Operations: a contrast to Relational Operators

The math’l operations do NOT themselves require R and S to have similar tuples in order to be well-defined.

E.g., R could be binary and on integer sets, S could be ternary and on character-string sets.

But the corresponding relational operators do require the tables to have the same shape (same number of columns, same domains for corresponding columns).
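
A hedged Python sketch of this contrast. Relations are modelled as sets of tuples, and Python value types stand in for column domains; that stand-in, and the helper names `shape` and `relational_union`, are assumptions made only for illustration.

```python
R = {(1, 2), (3, 4)}                     # binary relation on integers
S = {("a", "b", "c"), ("d", "e", "f")}   # ternary relation on strings

# The mathematical union is well-defined even though the tuples differ in kind.
math_union = R | S


def shape(relation):
    """Collect the (type per position) signatures occurring in a relation."""
    return {tuple(type(value) for value in row) for row in relation}


def relational_union(t1, t2):
    """Union that insists both tables have the same shape, as the operator does."""
    if len(shape(t1 | t2)) > 1:
        raise ValueError("tables do not have the same shape")
    return t1 | t2


print(len(math_union))
print(relational_union({(1, 2)}, {(3, 4)}))
```

Calling `relational_union(R, S)` raises `ValueError`, whereas `R | S` quietly produces a set mixing pairs and triples.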

Relational Operators (continued)

Join (various types)

Allows us to join related rows from two or more tables

It’s an important feature of the relational database idea

Joining has been implicitly important in some of the Additional Notes, because of the use of multi-table queries and the use of WHERE to test for attribute equality between tables.

Relational Operators (continued)

Product or Cross Join

Yields a table containing all concatenations of whole rows from the first given table with whole rows from the second given table.

Product or Cross-Join

If second table also had a PRICE attribute, then the product would have a Table1.PRICE attr. and a Table2.PRICE attr.

So, I want to define the non-standard notion of “flattened Cartesian product” of two relations R and S.

I will notate it by the symbol $\underline{\times}$ (underlined multiplication symbol).

$R \,\underline{\times}\, S$ = the set of tuples that are the concatenations of members of R and members of S.

E.g., if <a,b,c> is in R and <d,e,f> is in S,

then <a,b,c> $\circ$ <d,e,f> = <a,b,c,d,e,f> is in $R \,\underline{\times}\, S$.

A standard alternative notation: $R \circ S$ (i.e., just “coerce” the concatenation symbol used for tuples).

Contd.

If A is the People relation and B is the Organizations relation, and

A has members of form <E156, ‘Sam’, ‘Finks’, I678> and

B has members of form <I459, ‘Dell’, ‘UK’>

THEN

$A \times B$ has members of form

< <E156, ‘Sam’, ‘Finks’, I678>, <I459, ‘Dell’, ‘UK’> >

BUT

$A \,\underline{\times}\, B$ has members of form

<E156, ‘Sam’, ‘Finks’, I678, I459, ‘Dell’, ‘UK’>
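
The difference between the ordinary Cartesian product and the flattened one can be sketched in Python, assuming relations are sets of tuples. The single People and Organizations members are the ones from the slide; the variable names are illustrative.

```python
from itertools import product

A = {("E156", "Sam", "Finks", "I678")}   # People
B = {("I459", "Dell", "UK")}             # Organizations

# Ordinary Cartesian product: each member is a PAIR of tuples.
cartesian = set(product(A, B))

# Flattened product: each member is the concatenation of the two tuples.
flattened = {a + b for a in A for b in B}

print(cartesian)
print(flattened)
```

The members of `cartesian` have length 2 (two nested tuples), while the members of `flattened` have length 7, which is the shape a table row needs.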

Product or Cross Join (continued)

SQL:

SELECT * FROM … two [or more] tables …

NB: it’s the mere listing of the tables that does the Product, but it’s possible also to write:

SELECT * FROM T1 CROSS JOIN T2 CROSS JOIN ...

Relational algebra notation:

Result table is $T_1 \times T_2$, where T1 and T2 are the given tables.

Maths of relations:

Result relation is $R_1 \,\underline{\times}\, R_2$, where R1 and R2 are the relations in the given tables.

Problem: relations don’t account for duplicates of rows.
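
A quick check of the two SQL spellings, sketched with Python's built-in `sqlite3` module and an in-memory database. The tables `t1` and `t2`, their columns and their contents are hypothetical; other database systems may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT, price REAL);
    CREATE TABLE t2 (b TEXT, price REAL);
    INSERT INTO t1 VALUES ('x', 1.0), ('y', 2.0);
    INSERT INTO t2 VALUES ('p', 5.0), ('q', 6.0), ('r', 7.0);
""")

# Merely listing the tables already produces the Product ...
listed = conn.execute("SELECT * FROM t1, t2").fetchall()
# ... and CROSS JOIN is the explicit spelling of the same thing.
crossed = conn.execute("SELECT * FROM t1 CROSS JOIN t2").fetchall()

print(len(listed), len(crossed))
conn.close()
```

Each result row has four values, including both price columns, and there are 2 × 3 rows in each result.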

Two Tables That Will Be Used to Illustrate Other Joins

Natural Join

SQL:

SELECT … all the attributes but including only one version of each shared one … FROM T1, T2 WHERE … explicit condition of equalities for ALL the shared attributes ...

SELECT * FROM T1 NATURAL JOIN T2;

Instead of using *, can choose columns, and can add a WHERE.

Relational algebra notation:

Result table is $T_1 \bowtie T_2$, where T1 and T2 are the given tables.

$\bowtie$ is the “bow tie” symbol.

Correspondence to your SQL experience:

SELECT staff.sid, office FROM staff, lecturing WHERE staff.sid = lecturing.sid;

Does a natural join (because sid is the only shared attribute) followed by a projection onto sid, office. (sid is qualified in the SELECT list because both listed tables have a sid column.)

SELECT staff.sid, office FROM staff, lecturing WHERE staff.sid = lecturing.sid AND year > 2001;

In effect, does a natural join followed by a further (row) selection followed by a projection.

SELECT sid, office FROM staff NATURAL JOIN lecturing WHERE year > 2001;

Does the same thing.
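
The claimed equivalence can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the rows, and the `module` column of `lecturing`, are invented for illustration. In the explicit-WHERE form, sid is written as staff.sid in the SELECT list, since SQLite rejects an unqualified sid there as ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (sid INTEGER, office TEXT);
    CREATE TABLE lecturing (sid INTEGER, module TEXT, year INTEGER);
    INSERT INTO staff VALUES (1, 'R101'), (2, 'R102'), (3, 'R103');
    INSERT INTO lecturing VALUES (1, 'DB', 2000), (1, 'AI', 2005), (2, 'DB', 2003);
""")

# Product + equality test on the shared attribute + further selection + projection.
explicit = conn.execute(
    "SELECT staff.sid, office FROM staff, lecturing "
    "WHERE staff.sid = lecturing.sid AND year > 2001"
).fetchall()

# The same query written with NATURAL JOIN.
natural = conn.execute(
    "SELECT sid, office FROM staff NATURAL JOIN lecturing WHERE year > 2001"
).fetchall()

print(sorted(explicit))
print(sorted(natural))
conn.close()
```

Staff member 3 lectures nothing, so has no row in either result.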

Natural Join (contd)

The common attributes (or columns) are called the join attributes (or columns): just the AGENT_CODE attribute in the above example.

Can be thought of as the result of a three-stage process:

1. The PRODUCT of the tables is created.

2. A SELECT is performed on the resulting table to yield only the rows for which the join-attribute values (e.g. AGENT_CODE values) are equal.

3. A PROJECT is now performed to yield a single copy of each join attribute, thereby eliminating duplicate columns.
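
The three stages can be sketched in Python, assuming tables are non-empty lists of dicts. The CUSTOMER and AGENT rows below are hypothetical, and a real DBMS would not normally build the whole product first; the sketch only mirrors the conceptual process.

```python
# Hypothetical tables sharing only the AGENT_CODE attribute.
customer = [
    {"CUS_CODE": "C1", "AGENT_CODE": 501},
    {"CUS_CODE": "C2", "AGENT_CODE": 502},
    {"CUS_CODE": "C3", "AGENT_CODE": 999},   # no matching agent
]
agent = [
    {"AGENT_CODE": 501, "AGENT_PHONE": "555-0101"},
    {"AGENT_CODE": 502, "AGENT_PHONE": "555-0102"},
    {"AGENT_CODE": 503, "AGENT_PHONE": "555-0103"},   # no matching customer
]


def natural_join(t1, t2):
    """Natural join of two non-empty list-of-dict tables, in three stages."""
    shared = [a for a in t1[0] if a in t2[0]]           # the join attributes
    # Stage 1: PRODUCT, every row of t1 paired with every row of t2.
    pairs = [(r1, r2) for r1 in t1 for r2 in t2]
    # Stage 2: SELECT the pairs that agree on all join attributes.
    matching = [(r1, r2) for r1, r2 in pairs
                if all(r1[a] == r2[a] for a in shared)]
    # Stage 3: PROJECT, merging each pair so a join attribute appears once.
    return [{**r1, **r2} for r1, r2 in matching]


joined = natural_join(customer, agent)
print(joined)
```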

Natural Join, Step 1: PRODUCT

Note the two AGENT_CODE columns

Natural Join, Step 2: SELECT to get equal agent codes in each row

Natural Join, Step 3: PROJECT to get just one agent column

Natural Join (continued)

A row in one of the given tables that does not match any row in the other given table on the join attributes does not lead to a row in the result table.

Note that if the two tables have no attributes in common, then every row of each table trivially matches every row of the other table!

So in this case the result is the PRODUCT (CROSS JOIN) of the two tables!!
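
This degenerate case can be observed in SQLite via Python's `sqlite3` module. The tables `colours` and `sizes` are hypothetical and deliberately share no column name; the behaviour shown is SQLite's, and other systems should be checked separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colours (colour_name TEXT);
    CREATE TABLE sizes (size_name TEXT);
    INSERT INTO colours VALUES ('red'), ('blue');
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');
""")

# No attribute is shared, so there is no equality condition to fail.
natural = conn.execute("SELECT * FROM colours NATURAL JOIN sizes").fetchall()
cross = conn.execute("SELECT * FROM colours CROSS JOIN sizes").fetchall()

print(len(natural), len(cross))
conn.close()
```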

Other Forms of Join

Equijoin

Links tables on the basis of an equality condition that compares SPECIFIED attributes of each table, rather than automatically taking the common attributes.

Result does not eliminate duplicate columns that are not involved in the join condition.

Theta join

Like equijoin but using a non-equality join condition.

Outer joins (left, right, and full)

Equijoin or theta join plus unmatched rows from left table, right table or both, padding them out with NULLs to fit the result table.

Equijoin and Theta Join (continued)

SQL:

SELECT * FROM T1, T2 WHERE … explicit join condition, stating (non)equality of the CHOSEN attributes ...

SELECT * FROM T1 JOIN T2 ON … such a condition …

SELECT * FROM T1 JOIN T2 USING (… some common attribs …) [for equijoin only]

Possible Rel’l Algebra notation (not in Addl Nts): $T_1 \bowtie_C T_2$, where C is the join condition.
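
A minimal Python sketch of a join driven by an arbitrary condition C, assuming tables are lists of tuples. The helper `theta_join` and the `items` and `budgets` data are hypothetical.

```python
# Hypothetical tables: (item, price) and (person, spending limit).
items = [("pen", 2), ("book", 15)]
budgets = [("Ann", 10), ("Bob", 20)]


def theta_join(t1, t2, condition):
    """Concatenate every pair of rows for which `condition(r1, r2)` holds."""
    return [r1 + r2 for r1 in t1 for r2 in t2 if condition(r1, r2)]


# Theta join: the condition is a non-equality comparison (price <= limit).
affordable = theta_join(items, budgets, lambda i, b: i[1] <= b[1])

# Equijoin: same mechanism with an equality test on chosen columns.
# Both compared columns stay in the result.
exact = theta_join([("pen", 10)], budgets, lambda i, b: i[1] == b[1])

print(affordable)
print(exact)
```

Unlike a natural join, nothing is projected away: the equijoin row carries the value 10 twice, once from each table.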

Outer Join of CUSTOMER and AGENT, using equal AGENT_CODE

Left outer

Uses all the rows in the CUSTOMER table, by doing equijoin on AGENT_CODE but also including NON-matching CUSTOMER rows.

Right outer

Uses all the rows in the AGENT table, doing equijoin on AGENT_CODE but also including NON-matching AGENT rows.

Full outer

Uses all the rows in the AGENT and CUSTOMER tables, doing equijoin on AGENT_CODE but also including NON-matching rows from each table.

= Union of Left Outer Join result and Right Outer Join result.
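
A Python sketch of the three outer joins, assuming relations are sets of tuples and using `None` in place of NULL. The CUSTOMER and AGENT rows are hypothetical, chosen so that each table has one non-matching row.

```python
customers = {("C1", 501), ("C2", 999)}     # (CUS_CODE, AGENT_CODE)
agents = {(501, "Alby"), (502, "Hahn")}    # (AGENT_CODE, AGENT_NAME)

# Equijoin on AGENT_CODE: concatenate rows whose agent codes are equal.
inner = {c + a for c in customers for a in agents if c[1] == a[0]}

# Rows of each table that took part in the equijoin.
matched_customers = {row[:2] for row in inner}
matched_agents = {row[2:] for row in inner}

# Pad the NON-matching rows with None so they fit the result table.
left = inner | {c + (None, None) for c in customers - matched_customers}
right = inner | {(None, None) + a for a in agents - matched_agents}
full = left | right   # union of the left and right outer join results

print(sorted(left, key=str))
print(sorted(right, key=str))
print(len(full))
```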

Left Outer Join

Same as an equijoin with the addition of the “extra”, last, row shown above

Right Outer Join:

Full Outer Join:

Would have the “extra” row of this table as well as the extra row of the Left Outer Join table

Outer Joins (continued)

SQL:

SELECT * FROM T1, T2 WHERE … explicit join condition …
UNION … a SELECT expression that gets the extra LEFT rows
UNION … a SELECT expression that gets the extra RIGHT rows

SELECT * FROM T1 LEFT/RIGHT/FULL JOIN T2 USING (… some shared attribs …) // ON … explicit join cond …

Relational algebra notation: Variants of bow tie symbol.
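
One possible way to spell out the UNION form for the left-outer case, checked against the built-in LEFT JOIN using Python's `sqlite3` module. The tables and rows are hypothetical, and the NOT IN subquery is just one of several ways to pick out the extra LEFT rows (it assumes agent_code is never NULL in agent).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cus_code TEXT, agent_code INTEGER);
    CREATE TABLE agent (agent_code INTEGER, agent_name TEXT);
    INSERT INTO customer VALUES ('C1', 501), ('C2', 999);
    INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
""")

built_in = conn.execute(
    "SELECT customer.cus_code, customer.agent_code, agent.agent_name "
    "FROM customer LEFT JOIN agent ON customer.agent_code = agent.agent_code"
).fetchall()

simulated = conn.execute(
    # The equijoin part ...
    "SELECT customer.cus_code, customer.agent_code, agent.agent_name "
    "FROM customer, agent WHERE customer.agent_code = agent.agent_code "
    "UNION "
    # ... plus the extra LEFT rows, padded with NULL.
    "SELECT cus_code, agent_code, NULL FROM customer "
    "WHERE agent_code NOT IN (SELECT agent_code FROM agent)"
).fetchall()

print(built_in)
print(simulated)
conn.close()
```

Note that UNION also removes duplicate rows, so the two forms can differ on tables that contain repeated rows.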

Note on SQL Outer Join Queries

Can do your own extra projection (= attrib selection) in the SELECT, and can add a WHERE.

E.g.:

SELECT …attribs … FROM T1 LEFT JOIN T2 USING (… some shared attribs …) WHERE … ;

Following stuff on DIVIDE is optional

Towards the DIVIDE operation

It’s analogous to the “integer division” of an integer T by an integer S, included in many programming languages.

T div S = the largest integer Q such that

$S \times Q \le T$

So 7 div 3 = 2, since 3 × 2 = 6 ≤ 7 but 3 × 3 = 9 > 7.

DIVIDE operation on DB tables

The only value of LOC that is associated in T with both values ‘A’ and ‘B’ of CODE is 5.

Simplest case: 2-col table by 1-col table

Tables shown: T, S and the result Q.

Divide

DIVIDE T by S: the attributes X1 … XM of table S must be some but not all of those of T.

Gives a table Q having the remaining attributes Y1 … YN of T.

Q holds the values of Y1 … YN that T associates with every row (X1 … XM) in S.

So the rows of the Product of S with Q form a subset of the rows (suitably re-ordered) of T, and Q is maximal in this respect (i.e., adding further rows to Q would stop the Product’s rows all being in T).

So Q is the largest table such that

$S \times Q \subseteq T$

using $\subseteq$ to mean: has some or all rows of.
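
A Python sketch of the simplest case (2-column table divided by 1-column table), assuming relations are sets of tuples. The contents of `T` and `S` are hypothetical, chosen so that, as in the slide's example, 5 is the only LOC associated with both ‘A’ and ‘B’.

```python
T = {("A", 5), ("B", 5), ("A", 9), ("C", 6), ("B", 6)}   # (CODE, LOC)
S = {("A",), ("B",)}                                      # (CODE)


def divide(t, s):
    """Rows (LOC,) that t associates with EVERY row (CODE,) of s."""
    candidates = {(loc,) for (_code, loc) in t}
    return {q for q in candidates if all(row + q in t for row in s)}


Q = divide(T, S)
print(Q)

# The (flattened) product of S with Q has only rows that are in T ...
product_rows = {s_row + q_row for s_row in S for q_row in Q}
print(product_rows <= T)
```

LOC 9 is dropped because T has no row (‘B’, 9), and LOC 6 because T has no row (‘A’, 6); adding either to the result would put a row into the product that is not in T.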

Divide (continued)

SQL:

Not standardly included. Effect can be simulated.

Possible Relational Algebra notation (not in Add’l Notes): $T_2 \div T_1$

Maths of relations:

Result relation R could be described as the maximal set R of tuples such that $R_1 \,\underline{\times}\, R \subseteq R_2$, where R1 and R2 are the relations in the given tables.
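
One common way to simulate DIVIDE in SQL is a double NOT EXISTS: keep a LOC if there is no CODE in S that T fails to pair with it. The sketch below runs that query through Python's `sqlite3` module; the tables `t` and `s` and their rows are hypothetical, and this is one possible simulation rather than the one intended by the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (code TEXT, loc INTEGER);
    CREATE TABLE s (code TEXT);
    INSERT INTO t VALUES ('A', 5), ('B', 5), ('A', 9), ('C', 6), ('B', 6);
    INSERT INTO s VALUES ('A'), ('B');
""")

quotient = conn.execute("""
    SELECT DISTINCT t_outer.loc
    FROM t AS t_outer
    WHERE NOT EXISTS (
        -- a CODE in s ...
        SELECT 1 FROM s
        WHERE NOT EXISTS (
            -- ... that t does not pair with this LOC
            SELECT 1 FROM t AS t_inner
            WHERE t_inner.loc = t_outer.loc
              AND t_inner.code = s.code
        )
    )
""").fetchall()

print(quotient)
conn.close()
```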