chapter 56 relational database design compiled by eddie moorcroft

21
Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Upload: roxanne-norman

Post on 29-Jan-2016

240 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56Relational Database Design

Compiled by Eddie Moorcroft

Page 2: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

The Basics

A relational database is a widely-used type of DBMS (Database Management System) Data is held in tables The relationships are in fact links between

common fields in different tables. One row of a table holds one record. Each column in a table holds one field or attribute.

Page 3: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Typical table

IMPORTANT Standard NotationBOOK(AccessionNo, DeweyCode, Title, Author,DatePublished) Entity name in uppercase Key field (unique identifier) is underlined All attributes are shown in brackets, separated by commas

N.B A PRIMARY key uniquely references each record SIMPLE key is based on one attribute (field) COMPOSITE key is based on two or more attributes (fields)

AccessionNumber

DeweyCode Title Author DatePublished

88 121.9 Let’s Cook Chan, C 1992

123 345.440 Electricity Glendenning, V 1995

300 345.440 Riders Cooper, J 1995

657 200.00 Greek in 3 weeks Stavros, G 1990

etc…

Page 4: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Linking database tables

Tables are linked through the use of a common field. This field must be a key field in another table.

It is known as a foreign key. Consider this library database relationship:

One possible model in standard database notationBORROWER(BorrowerID, Name, Address)BOOK(AccessionNo, DeweyCode, Title, Author, DatePublished, BorrowerID,

DateDue)

Consider possible flaws in this model and suggest a better one.

BORROWER BOOKborrows

Page 5: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Normalisation Normalisation is a process used to come up with the

best possible design for a relational database. Tables should be organised in such a way that:1. No data is unnecessarily duplicated2. Data is consistent throughout the database3. The structure of each table is flexible enough

to allow you to enter as many or as few items as you want to

4. The structure should enable a user to make all kinds of complex queries relating data from different tables

Page 6: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

First normal form

Definition: A table is in first normal form if it contains

no repeating attributes or groups of attributes. i.e. if data is repeated, then it should be held in

another table, with a relationship – see example following.

Page 7: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Consider the following un-normalised data. STUDENT (Admission number, Surname, Forename, Title, Address,

Telephone no., Gender, Date of birth, Course code, Course Title, Level of course, Date joined, Date left, Lecturer number, Lecturer name, Room number )

We see that information about each course is repeated for each student.

So use the model of two entities STUDENT and COURSE. STUDENT (Admission number, Surname, Forename, Title, Address,

Telephone no., Gender, Date of birth, Course code ) COURSE (Course Code, Course Title, Level of course, Date joined,

Date left, Lecturer number, Lecturer name, Room number ) A student can take several courses, and each course has several

students attending. The relationship can be represented by the entity-relationship diagram:

First normal form

STUDENT COURSEattends

Page 8: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Sample data to be held in a database

It is always useful to view a sample of the un-normalised data, as above. Often, it is worthwhile to put this data in an Excel spreadsheet, and reorder it by different fields, to see if data is repeating.

Student number

Student name

Date of birth

Sex Course number

Course name Lecturer number

Lecturer name

12345 Jones, P 20/05/1984 M ITA201 A Level ICT T345267 Moorcroft, E

22433 Phillips, K 16/02/1983 F ITA201 A Level ICT T345267 Moorcroft, E

22433 Phillips, K 16/02/1983 F MUA201 A Level Music T773351 Parry, D

22433 Phillips, K 16/02/1983 F SPA201 A Level Spanish T876541 n/a

66688 Smith, J 25/12/1984 M MDA201 A Level Maths T666758 Newton, I

66688 Smith, J 25/12/1984 M MUA201 A Level Music T773351 Parry, D

etc

Page 9: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

In the previous example, the data should be split into two tables, STUDENT and COURSE, in standard database notation: STUDENT(StudentNumber, StudentName, DateOfBirth,Sex) COURSE(CourseNumber,CourseName,LecturerNumber,

LecturerName)

Consider the problems of creating a relationship between these two tables. Remember a link has to made between common fields

Course number

Course name Lecturer number

Lecturer name

ITA201 A Level ICT T345267 Moorcroft, E

MDA201 A Level Maths T666758 Newton, I

MUA201 A Level Music T773351 Parry, D

SPA201 A Level Spanish

T876541 n/a

etc

Student number

Student name

Date of birth Sex

12345 Jones, P 20/05/1984 M

22433 Phillips, K 16/02/1983 F

66688 Smith, J 25/12/1984 M

etc

Page 10: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Creating the relationship

We need a common field, but the problem is that because this is a many-to-many relationship, whichever table we put the link field into, there needs to be more than one field. e.g. STUDENT (student number, student name, date of birth, sex,

course number)is no good because the student is doing several courses, so which one would be mentioned?

Similarly, COURSE (course number, course name, lecturer number, lecturer name, student number)is no good either because each course has a number of students taking it.

How about allowing space for 3 courses on each student record?STUDENT (student number, student name, date of birth, sex, course1, course2, course3)

Why is this not a good idea?

Page 11: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Creating the relationship What we have engineered is a repeating attribute –

unacceptable in 1st normal form. In other words, the field course number is repeated 3 times The table is therefore NOT in first normal form. It would be

represented in standard notation with a line over the repeating attribute:

STUDENT (student number, student name, date of birth, sex, course number)

To put the data into first normal form, the repeating attribute must be removed.

Page 12: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Creating the relationshipIn its place, the field course number becomes part of the primary key in the student table. The tables are now as follows:STUDENT (student number, student name, date of birth, sex, course number)

COURSE (course number, course name, lecturer number, lecturer name)

Discussion: What is a primary key? Why does course number have to be part of the primary key?

Page 13: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

The situation so far:Student number

Student name

Date of birth

Sex Course number

12345 Jones, P 20/05/1984 M ITA201

22433 Phillips, K 16/02/1983 F ITA201

22433 Phillips, K 16/02/1983 F MUA201

22433 Phillips, K 16/02/1983 F SPA201

66688 Smith, J 25/12/1984 M MDA201

66688 Smith, J 25/12/1984 M MUA201

etc

Course number

Course name Lecturer number

Lecturer name

ITA201 A Level ICT T345267 Moorcroft, E

MDA201 A Level Maths T666758 Newton, I

MUA201 A Level Music T773351 Parry, D

SPA201 A Level Spanish

T876541 n/a

etc

STUDENT

COURSE

Page 14: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Second Normal FormDefinition

Only applicable if table has a compound key A table is second normal form (2NF) if it is in

first normal form (1NF) and no column that is not part of the primary key is dependent on a portion of the primary key: This may be said as:

A table in 2NF contains no partial dependencies

Page 15: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

The STUDENT table the earlier slide (see above) is not in 2NF because, for example, student name is dependent only on student number and not on course number

To put the tables in 2NF we need to introduce a third table to link the two entities

Student number

Student name

Date of birth

Sex Course number

12345 Jones, P 20/05/1984 M ITA201

22433 Phillips, K 16/02/1983 F ITA201

22433 Phillips, K 16/02/1983 F MUA201

22433 Phillips, K 16/02/1983 F SPA201

66688 Smith, J 25/12/1984 M MDA201

66688 Smith, J 25/12/1984 M MUA201

etc

Page 16: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

The tables in 2nd Normal Form (2NF)

STUDENT (student number, student name, date of birth, sex)

STUDENT_ COURSE(student number, course number)

COURSE (course number, course name, lecturer number, lecturer name)

What we had with the two entities was a many-to-many relationship

This situation will always need a link table to create two one-to-many relationships

Page 17: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Student number

Student name

Date of birth

Sex

12345 Jones, P 20/05/1984 M

22433 Phillips, K 16/02/1983 F

66688 Smith, J 25/12/1984 M

etc

Course number

Course name Lecturer number

Lecturer name

ITA201 A Level ICT T345267 Moorcroft, E

MDA201 A Level Maths T666758 Newton, I

MUA201 A Level Music T773351 Parry, D

SPA201 A Level Spanish

T876541 n/a

etc

Student number

Course number

12345 ITA201

22433 ITA201

22433 MUA201

22433 SPA201

66688 MDA201

66688 MUA201

etc

STUDENT

COURSE

STUDENT_COURSE

Page 18: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Third Normal FormDefinition A table in third normal form (3NF) contains

no non-key dependencies

In other words, the data in each entity (table) is checked to see if it relates entirely to the primary key, and not to any other attribute (i.e. field)

Page 19: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

The tables in 3rd Normal Form (3NF)

The COURSES table Contains an attribute for lecturer number and also

one for lecturer name. Lecturer name is dependent on lecturer number

(not on course number) Therefore a new table should be created for

LECTURER

Page 20: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Normalised Files

STUDENT (student number, student name, date of birth, sex) STUDENT_COURSE (student number, course number) COURSE (course number, course name, lecturer number) LECTURER (lecturer number, lecturer name)

This is the optimum way of holding this data, with no duplication.

The tables in Relational Databases should be in Third Normal Form

Page 21: Chapter 56 Relational Database Design Compiled by Eddie Moorcroft

Chapter 56 - ICT5

Comparing flat-files with RDB

A relational database is able to create links between tables representing different entities such as STUDENT and COURSE, through the use of foreign keys.

A flat file system is not able to link tables It is only useful for very simple databases which contain information

about just one entity.

It is impossible to ‘normalise’ a database in a flat file system, since this involves correctly establishing links between tables

Flat-file systems do not have any of the sophisticated features of a full DBMS such as the ability to set individual user-access rights, or allow several people to access the database at the same time