the relational model - simon fraser university€¦ · a relational database consists of a...

Post on 26-Jul-2020

8 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

The Relational Model

What is the Relational Model

▪ Relations

▪ Domain Constraints

SQL Integrity Constraints Translating an ER diagram to the Relational

Model and SQL Views

A relational database consists of a collection of tables

▪ Each table has a unique name

▪ Each row represents a single entity or relationship

A table corresponds to the mathematical concept of a

set

▪ Note that the definition of a relation* is not the same as

the usual mathematical definition

▪ Relations correspond to tables

▪ Tuples correspond to rows

*per E.F. Codd

a relation is a set of tuples (d1, d2, ...,dn), where each element dj is amember of Dj, a data domain

The key construct in the relational model is a relation

▪ A DB is a collection of relations

Relations consist of a relation instance and a relation schema

▪ A relation instance corresponds to an actual table with a set of rows

▪ A relation schema consists of the column headings and domains of the table

A schema consists of

▪ The relation's name

▪ The name of each column in the table, also known as a field, or attribute

▪ The domain of each field

▪ Each domain has a set of associated values like a type

For example, the Customer relation

▪ Customer = (sin, firstName, lastName, age, income)

▪ The schema also specifies the domain of each attribute

sin firstName lastName age salary

111 Buffy Summers 23 43000.00

222 Xander Harris 22 6764.87

333 Rupert Giles 47 71098.65

444 Dawn Summers 17 4033.32

Relation instance

Relation schema: Customer = (sin, firstName, lastName, age, salary)

Each field (attribute) has a domain: char(11), char(20), char(20), integer, real

An instance is a (small) subset of the Cartesian product of the domains

A domain of a field is similar to the type of a variable

▪ The values that can appear in a field are restricted to the

field's domain

A relation instance is a subset of the Cartesian

product of the list of domains

▪ If a relation schema has domains D1 to Dn

▪ Where the domains are associated with fields f1 to fn

▪ A relation must be a subset of D1 D2 … Dn-1 Dn

▪ In practice a relation instance is usually some very small

subset of the Cartesian product

A relation instance is a set of tuples, or records

▪ Each tuple has the same number of fields as the schema

No two tuples are identical

▪ Relations are defined to be sets of unique tuples

▪ A set is a collection of distinct values

▪ In practice DBMS allow tables to have duplicate records in some circumstances

The order of the tuples is not important

The degree of a relation is the number of fields

▪ Also referred to as arity

▪ The degree of the Customer relation is 5

The cardinality of a relation instance is the number of tuples it contains

▪ The cardinality of the Customer relation instance is 4

A relational database is a collection of relations with distinct relation names

▪ A relational database schema is the collection of schemas for all relations in the database

Relational model

▪ Relation schema

▪ Relation instances (relations)

▪ Fields

▪ Tuples

Database Terminology

▪ Table schema

▪ Tables

▪ Columns or attributes

▪ Records or rows

SQL stands for Structured Query Language SQL is divided into two parts

▪ Data Manipulation Language (DML) which allows users to create, modify and query data

▪ Data Definition Language (DDL) which is used to define external and conceptual schemas

The DDL supports the creation, deletion and modification of tables

▪ Including the specification of domain constraints and other constraints

To create a table use the CREATE TABLE statement

▪ Specify the table name, field names and domains

CREATE TABLE Customer (

sin CHAR(11),

firstName CHAR(20),

lastName CHAR(20),

age INTEGER,

income REAL)

Question – is SQL case sensitive?

Answer – SQL keywords (createand table for example) are notcase sensitive.

Named objects (tables, columnsetc.) may be.

To insert a record into an existing table use the

INSERT statement

▪ The list of column names is optional

▪ If omitted the values must be in the same order as the columns

INSERT

INTO Customer(sin, firstName, lastName, age, income)

VALUES ('111', 'Sam', 'Spade', 23, 65234)

To delete a record use the DELETE statement

▪ The WHERE clause specifies the record(s) to be deleted

Be careful, the following SQL query deletes all

the records in a table

DELETE

FROM Customer

WHERE sin = '111'

DELETE

FROM Customer

Use the UPDATE statement to modify a record, or records, in a table

▪ Note that the WHERE statement is evaluated beforethe SET statement

Like DELETE the WHERE clause specifies which records are to be updated

UPDATE Customer

SET age = 37

WHERE sin = '111'

To delete a table use the DROP TABLE

statement

▪ This not only deletes all of the records but also

deletes the table schema

▪ http://xkcd.com/327/

DROP TABLE Customer

Columns can be added or removed to tables using the ALTER TABLE statement

▪ ADD to add a column and

▪ DROP to remove a column

ALTER TABLE Customer

ADD height INTEGER

ALTER TABLE Customer

DROP height

An integrity constraint restricts the data that can be stored in a DB

▪ To prevent invalid data being added to the DB

▪ e.g. two people with the same SIN or

▪ someone with a negative age

When a DB schema is defined, the associated integrity constraints should also be specified

A DBMS checks every update to ensure that it does not violate any integrity constraints

Domain Constraints

▪ Specified when tables are created by selecting the type of the data▪ e.g. age INTEGER

Key Constraints

▪ Identifies primary keys and other candidate keys

Foreign Key Constraints

▪ References primary keys of other tables

General constraints

A key constraint states that a minimal subset of the fields in a relation is unique

▪ i.e. a candidate key

SQL allows two sorts of key constraints The UNIQUE statement identifies candidate keys

▪ Records may not have duplicate values for the set of attributes specified in the UNIQUE statement

A PRIMARY KEY identifies the primary key

▪ Records may not have duplicate primary keys and

▪ May not have null values for primary key attributes

A primary key identifies the unique set of attributes that will be used to identify a record in a table

▪ In some cases there are obvious primary keys in the problem domain (e.g. SIN)

▪ In other cases, a primary key may be generated by the system

Candidate keys are identified to record the fact that a set of attributes should be unique

▪ Preventing duplicate data being entered into the table for this set of attributes

Assume that a patient can be uniquely identified by either SIN or MSP number

▪ SIN is chosen as the primary key

CREATE TABLE Patient (

sin CHAR(11),

msp CHAR(15),

firstName CHAR(20),

lastName CHAR(20),

age INTEGER )

CREATE TABLE Patient (

sin CHAR(11),

msp CHAR(15),

firstName CHAR(20),

lastName CHAR(20),

age INTEGER,

CONSTRAINT unique_msp UNIQUE (msp)

PRIMARY KEY (sin) )

Consider the abbreviated ERD shown below

▪ Takes is a many-to-many relationship, so a database table must be created for it (more on this later)

▪ It makes no sense for the Takes table to be allowed to contain the student ID of students who do not exist

The student ID in Takes should be identified as a foreign key, that references Student

takesStudent Course

A foreign key constraint references the primary key of another relation

The value of the foreign key attribute(s) must match a primary key in the referenced table

referencingrelationreferenced relation

TakesStudent Course

The attributes of the foreign key and the referenced primary key must be consistent

▪ The same number of attributes, with

▪ Compatible attribute types, but

▪ The attribute names may be different

A foreign key references the entire primary key

▪ Where the primary key is compound the foreign key must also be compound

▪ And not multiple single attribute foreign keys

CREATE TABLE Account(

accountNumber INTEGER,

type CHAR(5),

balance REAL,

custSIN CHAR(11),

PRIMARY KEY (accountNumber))

Customer Account

an account must be owned by a single customer

CREATE TABLE Account(

accountNumber INTEGER,

type CHAR(5),

balance REAL,

custSIN CHAR(11),

PRIMARY KEY (accountNumber),

CONSTRAINT fk_customer FOREIGN KEY

(custSIN) REFERENCES Customer )

owns

A DB may require constraints other than primary keys, foreign keys and domain constraints

▪ Limiting domain values to subsets of the domain▪ e.g. limit grade to the set of legal grades

▪ Or other constraints involving multiple attributes

SQL supports two kinds of general constraint

▪ Table constraints associated with a single table

▪ Assertions which may involve several tables and are checked when any of these tables are modified

These will be covered later in the course

Whenever a table with a constraint is modified the constraint must be checked

▪ A modification can be a deletion, a change (update) or an insertion of a record

▪ If a transaction violates a constraint what happens?

Primary key constraints

▪ A primary key constraint can be violated in two ways

▪ A record can be changed or inserted so that it duplicates the primary key of another record in the table or

▪ A record can be changed or inserted so that (one of) the primary key attribute(s) is null

▪ In either case the transaction is rejected

Consider these schema (the domains have been omitted for brevity)

▪ Customer = (sin, firstName, lastName, age, income)

▪ Account = (accountNumber, balance, type, sin)

▪ sin in Account is a foreign key referencing Customer

ownsCustomer

accountNumber

type

age

income sin

firstNamelastName

Account

balance

sin first last age salary

111 Buffy Summers 23 43000.00

222 Xander Harris 22 6764.87

333 Rupert Giles 47 71098.65

444 Dawn Summers 17 4033.32

number balance type sin

761 904.33 CHQ 111

856 1011.45 CHQ 333

903 12.05 CHQ 222

1042 10000.00 SAV 333

Inserting {409, 0, CHQ,555} into Account violatesthe foreign key on sin asthere is no sin 555 inCustomer

The insertion is rejected

To process the insertion aCustomer with a sin of 555must first be inserted intothe Customer table

Changing this record’s sinto 555 violates the foreignkey, again leading to thetransaction being rejected

sin first last age salary

111 Buffy Summers 23 43000.00

222 Xander Harris 22 6764.87

333 Rupert Giles 47 71098.65

444 Dawn Summers 17 4033.32

number balance type sin

761 904.33 CHQ 111

856 1011.45 CHQ 333

903 12.05 CHQ 222

1042 10000.00 SAV 333

Deleting Buffy violatesthe foreign key,because a record withthat sin exists in theAccount table

sin first last age salary

111 Buffy Summers 23 43000.00

222 Xander Harris 22 6764.87

333 Rupert Giles 47 71098.65

444 Dawn Summers 17 4033.32

number balance type sin

761 904.33 CHQ 111

856 1011.45 CHQ 333

903 12.05 CHQ 222

1042 10000.00 SAV 333

Updating Buffy's sin to666 also violates theforeign key, because arecord with the old sinis in the Account table

sin first last age salary

111 Buffy Summers 23 43000.00

222 Xander Harris 22 6764.87

333 Rupert Giles 47 71098.65

444 Dawn Summers 17 4033.32

number balance type sin

761 904.33 CHQ 111

856 1011.45 CHQ 333

903 12.05 CHQ 222

1042 10000.00 SAV 333

A deletion or update transaction in the referenced table may violate a foreign key

Different responses can be specified in SQL

▪ Reject the transaction (the default) – NO ACTION

▪ Delete or update the referencing record – CASCADE

▪ Set the referencing record's foreign key attribute(s) to null (only on deletion) – SET NULL

▪ Set the referencing record's foreign key attribute(s) to a default value (only on deletion) – SET DEFAULT

▪ The default value must be specified in the foreign key

The ER model is a high level design which can be translated into a relational schema

▪ And therefore into SQL

Each entity and relationship set can be represented by a unique relation

▪ Rows in an entity set table represent unique entities

▪ Rows in a relationship set table represent associations between entities

▪ Some relationship sets do not need to be represented by separate relations

▪ SQL constraints should be created to represent constraints identified in the ERD

A table that represents an entity set should have the following characteristics

▪ One column for each attribute

▪ The domain of each attribute should be known and specified in the table

▪ The primary key should be specified in the table

▪ Other constraints that have been identified outside the ER model should also be created where possible

Customer

age

income

sin

firstName

lastName

CREATE TABLE Customer (

sin CHAR(11),

firstName CHAR(20),

lastName CHAR(20),

age INTEGER,

income REAL,

PRIMARY KEY (sin) )

The attributes of a relationship set are

▪ The primary keys of the participating entity sets, and

▪ Any descriptive attributes of the relationship set

The mapping cardinalities of the entities involved in a relationship determine

▪ The primary key of the relationship set and

▪ Whether or not the relationship set needs to be represented as a separate table

Foreign keys should be created for the attributes derived from the participating entity sets

CREATE TABLE WorksIn (

sin CHAR(11),

branchName CHAR(30),

startDate DATETIME,

FOREIGN KEY (sin) REFERENCES Employee,

FOREIGN KEY (branchName) REFERENCES Branch,

PRIMARY KEY (sin, branchName) )

Employee Branch

startDate

salary

name

sinbudget

branchNamecity

worksIn

A separate table is required to represent a relationship set with no cardinality constraints

▪ i.e. relationships that are many to many

The primary key of the relationship set is the unionof the attributes derived from its entity sets

▪ A compound key made up of the primary keys of the participating entity sets

Attributes derived from its entity sets should be declared as foreign keys

CREATE TABLE Manages (

managerSIN CHAR(11),

subordinateSIN CHAR(11),

FOREIGN KEY (managerSIN) REFERENCES Employee,

FOREIGN KEY (subordinateSIN) REFERENCES Employee,

PRIMARY KEY (managerSIN, subordinateSIN) )

Employeemanages

sin

salary firstName

managersubordinate

lastName

Determine attributes for the table as for a relationship set without cardinality constraints

To determine the primary key of a binary relationship set

▪ If the relationship is one to one then the primary key of either entity set can be its primary key

▪ But it must be only one of the two primary keys

▪ If the relationship is many to one, or one to many the primary key of the many entity set is its primary key

▪ That is, the entity set whose entities can only appear once in the relationship set; the entity set with the key constraint

If the relationship set has no cardinality constraints the primary key is a compound key

▪ Union of the primary keys of the participating entity sets

If the relationship set has at least one cardinality constraint

▪ The primary key is taken from one of the entity sets with a many relationship

▪ That is, one of the entity sets whose entities can be involved in at most one relationship

A binary relationship does not require a table if at least one its entities has a key constraint

Add the relationship set's attributes to the table for the entity set that provided the primary key

▪ If the entity’s participation in the relationship is partial, some rows may not have data for these attributes

▪ Such rows will contain null for these fields, which wastes space, and has other undesirable properties

The attributes from the other entity sets involved in the relationship are specified as foreign keys

CREATE TABLE Employee (

sin CHAR(11),

name CHAR(40),

salary REAL,

branchName CHAR(30),

startDate DATETIME,

FOREIGN KEY (branchName) REFERENCES Branch,

PRIMARY KEY (sin) )

Employee Branch

startDate

salary

name

sinbudget

branchNameaddress

worksIn

Since an employee can onlywork in one branch there is stillonly one row for each employeein the Employee table

No table is created for theworksIn relationship

Where possible participation constraints should be included in a table specification

By declaring the attributes on which there is a foreign key as NOT NULL

▪ This approach only works when a relationship set is notrepresented in a separate table

▪ i.e. when the relationship is represented as a foreign key in a table that represents an entity set

▪ Specifying an attribute as NOT NULL forces every record of that entity set to have a value for the attribute

Some participation constraints must be modeled using assertions, triggers or other methods

sin and accountNumbercannot be null but theparticipation constraint isnot represented

CREATE TABLE Owns (

sin CHAR(11),

accountNumber INTEGER,

FOREIGN KEY (sin) REFERENCES Customer,

FOREIGN KEY (accountNumber) REFERENCES Account,

PRIMARY KEY (sin, accountNumber) )

Customer Accountowns

accountNumber

typebirthdate

income

sin

firstName

lastName

balance

Why not?

CREATE TABLE Employee (

sin CHAR(11),

name CHAR(40),

salary REAL,

branchName CHAR(30) NOT NULL,

startDate DATETIME,

FOREIGN KEY (branchName) REFERENCES Branch,

PRIMARY KEY (sin) )

Making branchName NOT NULLensures that every Employee mustwork in a branch

Employee Branch

startDate

salary

name

sinbudget

branchNameaddress

worksIn

CREATE TABLE WorksIn (

sin CHAR(11),

branchID INTEGER,

from DATETIME,

to DATETIME,

FOREIGN KEY (sin) REFERENCES Employee,

FOREIGN KEY (branchID) REFERENCES Branch,

PRIMARY KEY (sin, branchID, from, to) )

Employee Branch

Duration

worksIn

branchID

budgetsalary

sin

firstName

lastNamebranchName

from to

Note that a table is not requiredfor Duration as it would beredundant

All durations must appear inworksIn and Duration only hasprimary key attributes

This ERD requires the following tables

And an Employee table

Account with NOT NULL foreign key, sin, references Customer and represents Owns

Customer, with foreign key empSINto represent PersonalBanker

Employee

name

salarysin

personalbanker

Customer Accountowns

accountNumber

typebirthdate

income

sin

firstName

lastNamebalance

A weak entity set has the following characteristics

▪ Total participation in the identifying relationship

▪ A cardinality constraint with the identifying relationship

▪ A partial key

Therefore the weak entity set should:

▪ Include a foreign key (to its owner entity set), the attributes of which are part of the WES' primary key▪ Which prevents these attributes from being null

▪ Usually specify the foreign key as ON DELETE CASCADE

CREATE TABLE Room (

name CHAR(10),

squareFeet REAL,

taxNumber INTEGER,

PRIMARY KEY (name, taxNumber),

FOREIGN KEY (taxNumber) REFERENCES Building ON DELETE CASCADE)

House Roomcontains

namenumberstreet

city

squareFeet

taxNumber

There are two basic approaches to translating class hierarchies to tables

Create separate tables for the superclass and for each subclass

▪ The superclass table only contains attributes of the superclassentity

▪ The subclass tables contain their own attributes, and the primary key attributes of the superclass

▪ Superclass attributes are declared as both the primary key and a foreign key in the subclasses

▪ Cascade deletion of superclass records to subclasses

Or …

Create tables for the subclasses only

▪ The subclass tables contain their attributes, and all of the attributes of the superclass

▪ The primary key of the subclass is the primary key of the superclass

This assumes that there are no entities in the superclass that are not entities in a subclass

▪ i.e. That a coverage constraint exists

Otherwise coverage and overlap constraints must be specified using assertions

RentalProperty = (taxNumberFK, units, rate)

HeritageBuilding = (taxNumberFK, buildYear)

Building = (taxNumber, city, street, number, tax)

and …

Building

RentalProperty HeritageBuilding

rate

number

streetcity

units buildYearISA

tax

taxNumber

RentalProperty = (taxNumber, city, street, number, tax, units, rate)

HeritageBuilding = (taxNumber, city, street, number, tax, buildYear)

Building

RentalProperty HeritageBuilding

rate

number

streetcity

units buildYearISA

tax

taxNumber

and …

An aggregate entity is represented by the table defining the relationship set in the aggregation

The relationship set between the aggregate entity and the other entity has the following attributes:

▪ The primary key of the participating entity set, and

▪ The primary key of the relationship set that defines the aggregate entity, and

▪ Its own descriptive attributes, if any

The normal rules for determining primary keys and omitting tables apply

sponsorsDepartment Project

Employeemonitors

Create tables for the entity sets

Project Department Employee

In this example, each department,project pair is monitored by anemployee

And the relationship sets

sponsors monitors

There is a case where no table is required for an aggregate entity

▪ Even where there are no cardinality constraints

▪ If there is total participation between the aggregate entity and its relationship, and

▪ If the aggregate entity does not have any descriptive attributes

▪ Insert the attributes of the aggregate entity into the table representing the relationship with that entity

sponsorsDepartment Project

Employeemonitors

Create tables for the entity sets

Project Department Employee

If sponsors has no descriptiveattributes, create a monitorstable with the primary keys ofEmployee, Project andDepartment

A project, department pair must bemonitored by an employee, so eachsuch pair must appear in monitors

Views are not explicitly stored in a DB but are created as required from a view definition

▪ At least conceptually, if not necessarily in practice

Once defined, views may be referred to in the same way as a tables

View are useful for

▪ Convenience – users can access the data they require without referring to many tables

▪ Security – users can only access appropriate data

▪ Independence – views can help mask changes in the conceptual schema

Consider a DB containing these two tables:

▪ Branch =(branchName, managerSIN, budget)

▪ Employee = (sin, name, salary, birthDate, branchName)

▪ Where branchName in Employee is a foreign key

A view is needed to access the employee's SIN, name and manager’s SIN

CREATE VIEW Emp_Manager

(E.sin, E.name, B.managerSIN)

AS SELECT E.sin, E.name, B.managerSIN

FROM Branch B Employee E

WHERE B.branchName = E.branchName

Users should be able to interact with views as if they were base tables

▪ And without being aware of the distinction

▪ This goal is achieved when running queries on views

However updates to views may cause problems

▪ Particularly if a view is derived from multiple tables and does not include the primary keys of all tables

Generally, updates are only allowed on views derived from a single table

▪ And, even then, there are further restrictions

Two distinct tuples in a relation instance, cannot have identical values in all the fields of a superkey

▪ Or if two entities have the same superkey values, they must have the same values for their other attributes

▪ So must, in fact, be the same entity

We can describe this idea formally in this way

▪ A subset K of a relation R is a superkey of R if for all pairs of tuples t1 and t2 | t1 t2 then t1[K] t2[K]

▪ That is, where K is a set of attributes of a relation R▪ K is a superkey for R if for all pairs of different tuples there are no

two tuples with the same values for K

The notion of a superkey can be generalized and applied to smaller sets of attributes

▪ e.g. Vehicle = {vin, colour, model, capacity}▪ vin (vehicle identification number) is a superkey

▪ We can say that capacity is functionally determined by vin as there can only be one capacity for a particular vin

▪ The model and the capacity have the same relationship▪ A particular model can only have one capacity

▪ However, unlike the vin, the model does not functionally determine the colour

We call this relationship functional dependency

Denote functional dependencies as follows

▪ If R is a relation and if R and R we can express a

functional dependency as

▪ holds on R if for all pairs of tuples such that t1[] = t2[] then t1[] = t2[]

▪ A subset K of a relation R is a superkey if K R

In the vehicle example▪ vin {vin, colour, model, capacity}

▪ model {capacity}

means subset, proper subset

Why use functional dependencies?

▪ To specify constraints on relations

▪ To determine if a relation is legal given a set of functional dependencies

▪ To reason about whether or not the set of tables in a database is appropriate

▪ We will use functional dependencies to decompose tables into normal forms

We will discuss functional dependencies in more detail later in the course

top related