lecture 3/1 the relational model: relational algebra and...
TRANSCRIPT
Lecture 3/1 The Relational Model:
Relational Algebra and
Relational Calculus
• T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition,
Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).
• T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 4th edition,
Addison-Wesley, 2004. ISBN: 0-321-21025-5.
• R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th ed., Pearson, 2007, ISBN: 0-321-41506-X.
ITM661 – Database Systems
The Relation Model 2
Objectives Overview of the relational model, e.g., how tables
represent data.
The connection between mathematical relations and
relations in the relational model.
Properties of database relations
How to identify candidate, primary, and foreign keys.
The meaning of entity integrity and referential integrity.
How to form queries in relational algebra.
How relational calculus queries are expressed.
The purpose and advantages of views in relational
systems.
The Relation Model 3
An Example (Branch and Staff Relations) • A relation is a table with
columns and rows. Only applies to logical structure of
the database, not the physical
structure.
• An attribute is a named
column of a relation.
• A domain is a set of
allowable values for one or
more attributes.
• A tuple is a row of a
relation.
• A degree is a number of
attributes in a relation.
• A cardinality is a number of
tuples in a relation.
• A relational database is a
collection of normalized
relations.
The Relation Model 4
Terminology for Relational Model and Attribute Domain
Terminology
Attribute Domain
The Relation Model 5
Mathematical Relations (I) Mathematical definition of relation
Consider two sets D1 = {2, 4} and D2 = {1, 3, 5}.
Cartesian product is D1D2, the set of all ordered pairs, 1st element is member of D1 and 2nd element is member of D2.
Alternative way is to find all combinations of elements with first from D1 and second from D2.
D1D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
Another example: D3 = {“Mr. X” , “Ms. Y”} and
D4 = { “M”, “F” } (i.e., M=Male, F=Female)
D3D4 = { (“Mr. X”, “M”), (“Mr. X”, “F”),
(“Ms. Y”, “M”), (“Ms. Y”, “F”) }
The Relation Model 6
Mathematical Relations (II) Any subset of Cartesian product is a relation.
R = {(2, 1), (4, 1)}
May specify which pairs are in relation using some condition for selection. For example, the second element is 1
R = {(x, y) | x D1, y D2, and y = 1}
Using same sets, form another relation S, where first element is always twice the second.
S = {(x, y) | x D1, y D2, and x = 2y}
Only one ordered pair in the Cartesian Product satisfies this condition.
S = {(2, 1)}
The Relation Model 7
Mathematical Relations (III) Consider three sets D1, D2, and D3 with Cartesian
Product D1D2D3. For example
D1 = {1, 3} , D2 = {2, 4} , D3 = {5, 6}
D1D2D3 = {(1,2,5), (1,2,6), (1,4,5), (1,4,6),
(3,2,5), (3,2,6), (3,4,5), (3,4,6)}
Any subset of these ordered triples is a relation.
T = {(x, y, z) | x D1, y D2, z D3 and y = 2x
and z = 3y}
T = { (1, 2, 6) }
The Relation Model 8
Mathematical Relations (IV) To define a general relation on n domains…let D1,
D2, . . ., Dn be n sets with Cartesian product defined as
D1 D2 . . . Dn = {(d1, d2, . . . , dn) | d1 D1, d2 D2, . . . , dn Dn} usually written as
In defining relations we specify the sets, or domains, from which we chose values.
i
n
i
D1
The Relation Model 9
Database Relations and Their Properties Relation schema
Named relation defined by a set of attribute and domain name pairs.
Relational database schema Set of relation schemas, each with a distinct name.
Relation name is distinct from all other relations.
Each cell of relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
Values of an attribute are all from the same domain.
Order of attributes has no significance.
Each tuple is distinct; there are no duplicate tuples.
Order of tuples has no significance, theoretically.
The Relation Model 10
Relational Keys (I) Superkey
An attribute or a set of attributes that uniquely
identifies a tuple within a relation.
Candidate Key
A superkey (K) such that no proper subset is a
superkey within the relation.
In each tuple of R, the values of K uniquely
identify that tuple (uniqueness).
No proper subset of K has the uniqueness property
(irreducicility).
The Relation Model 11
Relational Keys (II) Primary Key
Candidate key selected to identify tuples uniquely
within relation.
Alternate Keys
Candidate keys that are not selected to be the
primary key.
Foreign Key
An attribute or set of attributes within one relation
that matches candidate key of some (possibly
same) relation.
The Relation Model 12
Relational Integrity (I) Null
Represents a value for an attribute that is currently
unknown or is not applicable for this tuple.
Deals with incomplete or exceptional data.
represents the absence of a value and is not the
same as zero or spaces, which are values.
The Relation Model 13
Relational Integrity (II) Entity Integrity
In a base relation, no attribute of a primary key can be null.
Referential Integrity
If foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null.
Enterprise Constraints
Additional rules specified by users or database administrators.
The Relation Model 14
Relational Algebra and Calculus Relational Algebra
Unary Relational Operations
Relational Algebra Operations From Set Theory
Binary Relational Operations
Additional Relational Operations
Examples of Queries in Relational Algebra
Relational Calculus
Tuple Relational Calculus
Domain Relational Calculus
The Relation Model 15
Relational Algebra and Calculus Relational algebra and relational calculus are formal
languages associated with the relational model.
Informally,
Relational algebra is a (high-level) procedural
language and
Relational calculus a non-procedural language.
However, formally both are equivalent to one
another.
A language that produces a relation that can be
derived using relational calculus is relationally
complete.
The Relation Model 16
Relational Algebra Relational algebra operations work on one or more
relations to define another relation without changing
the original relations.
Thus, both operands and results are relations, so
output from one operation can become input to
another operation.
This allows expressions to be nested, just as in
arithmetic. This property is called closure.
The Relation Model 17
Relational Algebra (Overview) Relational Algebra consists of several groups of operations
Unary Relational Operations
SELECT (symbol: σ (sigma))
PROJECT (symbol: π (pi))
RENAME (symbol: ρ (rho))
Relational Algebra Operations From Set Theory
UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – )
CARTESIAN PRODUCT ( x )
Binary Relational Operations
JOIN (several variations of JOIN exist)
DIVISION
Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS These compute summary of information: for example, SUM, COUNT,
AVG, MIN, MAX
The Relation Model 18
Relational Algebra There are 5 basic operations, in relational algebra, that
performs most of the data retrieval operations needed.
Selection
Projection
Cartesian Product
Union
Set Difference
Also operations that can be expressed by 5 basic operations.
Join
Intersection
Division
The Relation Model 19
Relational Algebra Operations
The Relation Model 20
Relational Algebra Operations
The Relation Model 21
An Example (Home Rental Database)
The Relation Model 22
An Example (Home Rental Database)
The Relation Model 23
An Example (Home Rental Database)
The Relation Model 24
Selection (or Restriction) spredicate(R)
Selection operation works on a single relation R and defines a
relation that contains only those tuples (rows) of R that satisfy the
specified condition (predicate).
Ex.: List all staff with a salary greater than 10,000.
ssalary> 10000 (Staff)
The Relation Model 25
Projection Pcol1, . . . , coln(R)
Projection operation works on a single relation R and defines a
relation that contains a vertical subset of R, extracting the values of
specified attributes and eliminating duplicates.
Ex.: Produce a list of salaries for all staff, showing only the StaffNo,
fName, lName, and salary details.
PstaffNo, fName, lName, salary(Staff)
The Relation Model 26
Cartesian Product
R S
– The Cartesian product operation defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.
– Ex.: List the names and comments of all renters who have viewed a property.
(PclientNo, fName, lName(Client))(PclientNo, propertyNo,comment (Viewing))
The Relation Model 27
Example - Cartesian Product and Selection
Use selection operation to extract those tuples where
Renter.Rno = Viewing.Rno.
sClient.clientNo=Viewing.clientNo
((PclientNo,fName,lName(Client))(PclientNo,propertyNo,comment(Viewing)))
Note that: Cartesian product and Selection can be reduced to a
single operation called a join.
The Relation Model 28
Union
R S
Union of two relations R and S defines a
relation that contains all the tuples of R, or
S, or both R and S, duplicate tuples being
eliminated.
R and S must be union-compatible.
If R and S have I and J tuples, respectively,
union is obtained by concatenating them into
one relation with a maximum of (I + J) tuples.
List all cities where there is either a branch
office or a property for rent.
Pcity(Branch) Pcity (PropertyForRent)
The Relation Model 29
Set Difference R – S
Define a relation consisting of the tuples that are in relation
R, but not in S.
R and S must be union-compatible.
List all cities where there is a branch office but no
properties for rent.
Pcity (Branch) – Pcity (PropertyForRent)
The Relation Model 30
Join Operations Join is a derivative of Cartesian product.
Equivalent to performing a selection, using the join
predicate as the selection formula, over the Cartesian
product of the two operand relations.
One of the most difficult operations to implement
efficiently in a relational DBMS and one of the
reasons why RDBMSs have intrinsic performance
problems.
The Relation Model 31
Join Operations There are various forms of join operation
Theta-join
Equi-join (a particular type of theta-join)
Natural join
Outer join
Semi-join
The Relation Model 32
Theta-join (q-join) R F S
Defines a relation that contains tuples
satisfying the predicate F from the Cartesian
product of R and S.
The predicate F is of the form R.ai q S.bi
where q may be one of the comparison
operators (<, < =, >, > =, =, ~ =).
The Relation Model 33
Theta-join (q-join) We can rewrite the theta-join in terms of the basic
Selection and Cartesian product operations.
R F S = sF(R S)
Degree of a theta-join is sum of the degrees of the
operand relations R and S. If predicate F contains
only equality (=), the term equi-join is used.
The Relation Model 34
Example - Equi-join List the names and comments of all clients who have
viewed a property for rent.
(PclientNo, fName, lName(Client)) Client.clientNo = Viewing.clientNo
(PclientNo, propertyNo, comment(Viewing))
The Relation Model 35
Natural Join R S
Natural join is an equi-join of the two relations R and S over
all common attributes x. One occurrence of each common
attribute is eliminated from the result.
List the names and comments of all clients who have
viewed a property for rent.
(PclientNo, fName, lName(Client)) (PclientNo, propertyNo, comment(Viewing))
The Relation Model 36
Outer Join Often in joining two relations, there is no matching
value in the join columns. To display rows in the
result that do not have matching values in the join
column, we use the outer join.
R S
The (left) outer join is a join in which tuples from
R that do not have matching values in the common
columns of S are also included in the result
relation.
The Relation Model 37
Example - Left Outer Join Produce a status report on property viewings.
PpropertyNo, street, city(PropertyForRent) Viewing
The Relation Model 38
Semi-join
R F S
The semi-join operation defines a relation that contains the tuples
of R that participate in the join of R with S.
Can rewrite Semijoin using Projection and Join:
R F S = PA(R F S)
List complete details of all staff who work at the branch in Partick.
Staff Staff.branchNo = Branch.branchNo and Branch.city = ‘Glasgow’ Branch
The Relation Model 39
Intersection R S
The intersection operation consists of
the set of all tuples that are in both R
and S.
R and S must be union-compatible.
Expressed using basic operations
R S = R – (R – S)
List all cities where there is both a branch
office and at least one property for rent.
Pcity(Branch) Pcity(PropertyForRent)
The Relation Model 40
Division R S
The division operation consists of the set of tuples
from R defined over the attributes C that match the
combination of every tuple in S.
Expressed using basic operations
T1 = PC(R)
T2 = PC(( ST1) – R)
T = T1 – T2
The Relation Model 41
Example - Division Identify all clients who have viewed all properties
with three rooms.
(PclientNo, propertyNo(Viewing))
(PpropertyNo(srooms = 3 (PropertyForRent)))
The Relation Model 42
Relational Algebra - Aggregate Function
Aggregate Function Operation
MAX Salary (EMPLOYEE) retrieves the maximum salary
value from the EMPLOYEE relation
MIN Salary (EMPLOYEE) retrieves the minimum Salary
value from the EMPLOYEE relation
SUM Salary (EMPLOYEE) retrieves the sum of the Salary
from the EMPLOYEE relation
COUNT SSN, AVERAGE Salary (EMPLOYEE) computes the
count (number) of employees and their average salary
Note: count just counts the number of rows, without
removing duplicates
The Relation Model 43
Relational Algebra - Aggregate Function
Group by Dno
The Relation Model 44
Relational Calculus Relational calculus query specifies what is to be
retrieved rather than how to retrieve it.
No description of how to evaluate a query.
In first-order logic (or predicate calculus), predicate
is a truth-valued function with arguments.
When we substitute values for the arguments,
function yields an expression, called a proposition,
which can be either true or false.
When applied to databases, relational calculus is in
two forms: tuple-oriented and domain-oriented.
The Relation Model 45
Relational Calculus
If a predicate contains a variable, as in ‘x is a
member of staff’, there must be a range for x.
When we substitute some values of this range for
x, the proposition may be true; for other values, it
may be false.
If P is a predicate, then we write the set of all x
such that P is true for x, as
{x | P(x)}
Predicates can be connected using (AND), (OR), and
~ (NOT)
The Relation Model 46
Tuple-oriented Relational Calculus Interested in finding tuples for which a predicate is
true. Based on use of tuple variables.
Tuple variable is a variable that ‘ranges over’ a
named relation: i.e., variable whose only permitted
values are tuples of the relation.
Specify range of a tuple variable S as the Staff
relation as:
Staff(S)
To find set of all tuples S such that P(S) is true:
{S | P(S)}
The Relation Model 47
Tuple-oriented Relational Calculus (Examples and quantifiers)
Examples of tuple-oriented relational calculus
To find details of all staffs earning more than £10,000:
{S | Staff(S) S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) S.salary > 10000}
Can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier $ (‘there exists’)
Universal quantifier " (‘for all’)
Tuple variables qualified by " or $ are called bound variables, otherwise called free variables.
The Relation Model 48
Existential quantifier (Tuple-oriented Relational Calculus)
Existential quantifier used in formulae that must
be true for at least one instance, such as:
{ S | Staff(S) ($B)(Branch(B)
(B.branchNo=S.branchNo)
B.city = ‘London’) }
Means ‘There exists a Branch tuple with same
branchNo as the branchNo of the current Staff
tuple, S, and is located in London’.
The Relation Model 49
Universal quantifier (Tuple-oriented Relational Calculus)
Universal quantifier is used in statements
about every instance, such as:
("B) (B.city ‘Paris’)
Means ‘For all Branch tuples, the address is
not in Paris’.
Can also use ~($B) (B.city=‘Paris’) which means ‘There are no branches with an address in Paris’.
The Relation Model 50
Tuple-oriented Relational Calculus Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 q Sj.a2
Si.a1 q c
Can recursively build up formulae from atoms:
An atom is a formula
If F1 and F2 are formulae, so are their conjunction, F1 F2;
disjunction, F1 F2; and negation, ~F1
If F is a formula with free variable X, then ($X)(F) and
("X)(F) are also formulae.
{ S1.a1, S2.a2, …, Sn.an | F(S1, S2, …, Sn) } General form
q is one of comparison operators (<,>, so on)
c is a constant.
The Relation Model 51
Tuple-oriented Relational Calculus (An example – I)
List the names of all managers who earn more than
£25,000.
{S.fName, S.lName | Staff(S)
S.position = ‘Manager’ S.salary > 25000}
List the staff who manage properties for rent in
Glasgow.
{S | Staff(S) ($P) (PropertyForRent(P)
(P.staffNo = S.staffNo) P.city = ‘Glasgow’)}
The Relation Model 52
Tuple-oriented Relational Calculus (An example – II)
List the names of staff who currently do not manage
any properties.
{S.fName, S.lName | Staff(S) (~($P)
(PropertyForRent(P)(S.staffNo = P.staffNo)))}
Or
{S.fName, S.lName | Staff(S) (("P)
(~PropertyForRent(P) ~(S.staffNo = P.staffNo)))}
The Relation Model 53
Tuple-oriented Relational Calculus (An example – III)
List the names of clients who have viewed a
property for rent in Glasgow.
{C.fName, C.lName | Client(C)
(($V)($P) (Viewing(V) Ù PropertyForRent(P)
(C.clientNo = V.clientNo)
(V.propertyNo=P.propertyNo)
P.city=‘Glasgow’ ) ) }
The Relation Model 54
Tuple-oriented Relational Calculus (An example – IV)
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, add restriction that all values in result
must be values in the domain of the expression.
Domain Relational Calculus
The Relation Model 55
Domain-oriented Relational Calculus
Uses variables that take values from domains instead
of tuples of relations.
If F(d1, d2, . . . , dn) stands for a formula composed of
atoms and d1, d2, . . . , dn represent domain variables,
then:
{d1, d2, . . . , dn | F(d1, d2, . . . , dn)}
is a general domain relational calculus expression.
The Relation Model 56
Domain-oriented Relational Calculus (An example – I)
Find the names of all managers who earn more than £25,000.
{fN, lN | ($sN, posn, sex, DOB, sal, bN)
(Staff (sN, fN, lN, posn, sex, DOB, sal, bN)
posn = ‘Manager’ sal > 25000)}
List the staff who manage properties for rent in Glasgow.
{sN, fN, lN, posn, sex, DOB, sal, bN |
($sN1,cty)(Staff(sN,fN,lN,posn,sex,DOB,sal,bN)
PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1)
(sN=sN1) cty=‘Glasgow’)}
The Relation Model 57
Domain-oriented Relational Calculus (An example – II)
List the names of staff who currently do not manage any properties for rent.
{fN, lN | ($sN)
(Staff(sN,fN,lN,posn,sex,DOB,sal,bN)
(~($sN1) (PropertyForRent(pN, st, cty, pc, typ,
rms, rnt, oN, sN1, bN1) (sN=sN1))))}
List the names of clients who have viewed a property for rent in Glasgow.
{fN, lN | ($cN, cN1, pN, pN1, cty)
(Client(cN, fN, lN,tel, pT, mR) Viewing(cN1, pN1, dt, cmt)
PropertyForRent(pN, st, cty, pc, typ, rms, rnt,oN, sN, bN)
(cN = cN1) (pN = pN1) cty = ‘Glasgow’)}
The Relation Model 58
Domain-oriented Relational Calculus
When domain relational calculus is restricted to safe
expressions, it is equivalent to tuple relational
calculus restricted to safe expressions, which is
equivalent to relational algebra.
This means every relational algebra expression has an
equivalent relational calculus expression, and vice
versa.
The Relation Model 59
Other Languages Transform-oriented languages are non-procedural
languages that use relations to transform input data into outputs (e.g. SQL).
Graphical languages provide the user with a picture or illustration of the structure of the relation. The user fills in an example of what is wanted and the system returns the required data in that format (e.g QBE).
Fourth-generation languages (4GLs) can create a complete customized application using a limited set of commands in a user-friendly, often menu-driven environment.
Some systems accept a form of natural language, sometimes called a fifth-generation language (5GL). This development is still in its infancy.
The Relation Model 60
Exercise The following tables form a part of a database held in a relational
DBMS: Hotel: (hotelNo, hotelName, hotelAddress)
Room: (roomNo, hotelNo, Type, Price)
Booking: (hotelNo, guestNo, dateFrom, dataTo, roomNo)
Guest: (guestNo, guestName, guestAddress)
Where the primary keys are underlined.
Generate the relational algebra, tuple-oriented and domain-oriented calculus for the following queries:
1. List all hotels.
2. List all single rooms with a price below 20 per night.
3. List the names and addresses of all guests.
4. List the price and type of all rooms at the Grosvenor Hotel.
5. List all guests currently staying at the Grovenor Hotel.
6. List the details of all rooms at the Grosvenor Hotel, including the name of the guest staying in the room, if the room is occupied.
7. List the guest details (guestNo, guestName, and guestAddress) of all guests staying at the Grosvenor Hotel.