normalisation. nokats dog club a club formed since 1973. keeps records of members and their dogs on...
TRANSCRIPT
Normalisation
NoKats Dog Club• A club formed since 1973.
• Keeps records of members and their dogs on index cards.
• Cards are managed by the secretary and used for reference e.g. to inform members of shows.
• Excellent system in the early days as there were only 100 or so members!
• Now membership has grown to well over 5000. This is proving a real headache for the secretary.
Customer Record for the NoKats Dog ClubRecord Number:Initial:Surname:Title:Sex:Postcode:Tel No:
Dog Details:
Name Sex DofB Breed Origen of Breed
Breed life expectancy
Woof!
Problems for the secretary• Takes a long time to find an individual member’s record as there are now
5000 of them.– Especially true if you use multiple search criteria.
• Takes even longer to find a dogs record.– Especially if the member’s details are unknown for some reason.
• Finding all dogs born after a certain date.
• Sometimes records need to be sorted into a different order.
• When records are filled in some details are recorded over and over again.
• Details for letters have to be manually transferred from the cards (i.e. no mail merging!!!)
• New records sometimes contain spelling mistakes or errors in details about the dog, such as life expectancy etc.
Why would a computerised system help?• New records can be monitored for errors using
validation rules.
• All records can be easily sorted by different fields and multiple fields.
• Easy to search and find records.
• Reports can be generated showing search results.
• Mail merge is possible thus making letter writing a lot easier.
The flat file db
ID Initial Last Name
Title Post code
Tel Dog Name
Gen DofB Breed Origin L/E Years
1 A Fish Mrs CV35QW 02476111111 Bongo M 21/08/99 Poodle China 5
1 A Fish Mrs CV35QW 02476111111 Hiccup F 08/08/98 Poodle China 5
1 A Fish Mrs CV35QW 02476111111 Rizla F 09/09/00 Poodle China 5
1 A Fish Mrs CV35QW 02476111111 Gov F 11/01/01 Alsatian Germany 10
2 C Here Mrs CV27RF 01788222222
3 D Lapidated Mr CV14RR 02476333333 Manic M 11/01/01 Poodle China 5
3 D Lapidated Mr CV14RR 02476333333 Blip F 02/02/01 Spaniel France 7
4 X Ray Ms CV12YY 02476444444
5 Y Nott Mr CV24TT 01788555555 Ruff M 08/08/00 Poodle China 5
5 Y Nott Mr CV24TT 01788555555 Addi M 10/02/00 Poodle China 5
5 Y Nott Mr CV24TT 01788555555 Catnip F 10/03/99 Poodle China 5
5 Y Nott Mr CV24TT 01788555555 Emmi F 11/03/01 Poodle China 5
5 Y Nott Mr CV24TT 01788555555 Gov F 11/01/01 Alsatian Germany 10
… … … … … … … … … … … …
… … … … … … … … … … … …
… … … … … … … … … … … …
Good points about flat files• Quick and easy to set up.
• Ideal for smaller databases.
• They provide many of the searching and sorting tools required by most users of databases.
Problems with flat files• Lots of repeated data makes the database larger.
• Searching becomes inefficient if using repeated data.
• Dogs don’t have their unique ID.– Can’t add a new breed until a member introduces one to the club.
• Lots of details are held over and over again.
• If one of the members changed their name ALL their records would need to be changed.
• If a member joined with lots of dogs their records would have to be added over and over again.– Adding an anomaly.
Improving the flat file design• Having two smaller tables is no different to
having one large one…as long as we link them.
• Each member can own many dogs.
• Each dog is owned by one member.
A table for members’details
A table for dogs’details
owns
is owned by
MemberID Initial Last Name Title Post code Tel
1 A Fish Mrs CV35QW 02476111111
2 C Here Mrs CV27RF 01788222222
3 D Lapidated Mr CV14RR 02476333333
4 X Ray Ms CV12YY 02476444444
5 Y Nott Mr CV24TT 01788555555
DogDogID Dog Name Gen DofB Breed Origin L/E Years ID
1 Bongo M 21/08/99 Poodle China 5 1
2 Hiccup F 08/08/98 Poodle China 5 1
3 Rizla F 09/09/00 Poodle China 5 1
4 Gov F 11/01/01 Alsatian Germany 10 1
5 Manic M 11/01/01 Poodle China 5 3
6 Blip F 02/02/01 Spaniel France 7 5
7 Ruff M 08/08/00 Poodle China 5 5
8 Addi M 10/02/00 Poodle China 5 5
9 Catnip F 10/03/99 Poodle China 5 5
10 Emmi F 11/03/01 Poodle China 5 5
11 Gov F 11/01/01 Alsatian Germany 10 5
How has it improved?• Each member’s details are now only stored once.
• Each dog is now identified by their own unique identity number.
• A new member now has their details entered only once into the database.
• Changes to their record only need to happen once.
• Records that need to be deleted only need to be deleted once.
Can we improve further?• At present, we still need to record dog details
repeatedly.
• We still can’t add a breed unless a member introduces one to the club.
• We can try splitting up the dog table into two…
Like this…
A table for members’
details
A table for dogs’details
owns
is owned by
A table for breeds’details
is a particular
appears in
• Each member can own many dogs• Each dog can be owned by only one owner• Each dog can only be one particular breed• Each breed can appear in many different dogs
What we have now
DogID Dog Name Gen DofB ID BreedID
1 Bongo M 21/08/99
1 1
2 Hiccup F 08/08/98
1 1
3 Rizla F 09/09/00
1 1
4 Gov F 11/01/01
1 2
5 Manic M 11/01/01
3 1
6 Blip F 02/02/01
3 3
7 Ruff M 08/08/00
5 1
8 Addi M 10/02/00
5 1
9 Catnip F 10/03/99
5 1
10 Emmi F 11/03/01
5 1
11 Gov F 11/01/01
5 2
BreedID Breed Origin L/E Years
1 Poodle China 5
2 Alsatian
Germany 10
3 Spaniel France 7
ID Initial Last Name Title Post code Tel
1 A Fish Mrs CV35QW 02476111111
2 C Here Mrs CV27RF 01788222222
3 D Lapidated Mr CV14RR 02476333333
4 X Ray Ms CV12YY 02476444444
5 Y Nott Mr CV24TT 01788555555
Even better improvements• Each member’s details are now only stored once.
• Each dog is now identified by their own unique identity number.
• A new member now has their details entered only once into the database.
• Changes to their record only need to happen once.
• Records that need to be deleted only need to be deleted once.
And now…
• We can store information about each breed only once.
• We can add a new breed to the database without needing a member to introduce a new breed to the club.
Question• Define a flat file.
A flat file is…
A flat file is a file where all the data in the database is stored in only one table. The rows in the table correspond to records whilst the columns correspond to fields.
Click to reveal
ER Diagrams• Entity Relationship Diagrams
• They give a quick idea of how relationships are formed within a database system.
• They can show:– One-to-One– One-to-Many– Many-to-Many
One-to-One
• A shop may sell CD’s and Fish.• The product table will store details of the items they
sell.• The _Info tables give information about the product.• A shop can sell one type of COD and that particular
type of COD can only have one description and price.
Product
CD_Info
Fish_Info
One-to-Many
• A owner can have many dogs
• A dog can be owned by only one owner
Owner
Pet
owns
is owned by
Many-to-Many
• A student can study many A-Levels
• An A-Level subject can have many students studying it.
Student
A-Level
studies
is studied by
Fixing a Many-to-Many• This:
• Converts to this:
• By adding a linking table which uses both original primary keys as its primary key.– Known as a compound primary key.
Primary Key = X XY Primary Key = Y
Entity A Entity BNew Table
Primary Key = X Primary Key = Y
Entity A Entity B
Our previous example of Many-to-Many
• Many-to-Many Problem Resolved!
• Each record in the Pupil_A-Level table relates to only one A-Level and one pupil. There are no duplicating records!
Pupil_ID Name
1 Smith
2 Jones
3 Ali
4 Potts
5 Kanu
6 Stay
A-Level_ID
Code
1 Maths
2 ICT
3 Eng
4 Graphics
5 Bus_Stud
6 History
Pupil_ID A-Level_ID
1 1
1 4
1 5
2 2
2 3
2 4
… …
… …
… …
Pupil Pupil_A-Level A-Level
Normalisation• There are 3 stages of normalisation that you need to
know about:– First Normal Form (1NF)– Second Normal Form (2NF)– Third Normal Form (3NF)
• Minimise redundant data
• Minimise the chance of making data inconsistent.– Think about flat files and how you have to change all the
records if a member’s name changes…you may miss one in the flat file…but not in a relational database!!
The Analysis Table• Create a table with 5 columns.
• This table will be used to help you work through the normalisation process.
UNF 1NF 2NF 3NF Name
Here you write down all the attributes which are contained in the flat file database.
Write down the attributes as they would appear in first normal form.
Write down the attributes as they would appear in second normal form.
Write down the attributes as they would appear in third normal form.
Here you put the names of the tables that will be in your 3NF relational database.
Example – DVD NOW!Member’s Record
ID number: 241
Surname: Smith
Address 1: 23 Jones Rd.
Address 2: Walforth
Phone: 123456
Joined: 10/01/99
DVD_ID DVDName DateDueBack Cert CertDescription
323 Roboteacher 11/08/04 18 18 and over only
6512 Harrytron 12/09/04 PG Parental Guidance
441 The Brainbox 30/09/04 PG Parental Guidance
DVD Now!
UNF• Un-normal form
• First stage…write down all the attributes that you need to store in your database in the first column of your analysis table.
• Underline the possible primary keys
• Put the repeating data together within brackets.
UNF
• Now we know what attributes we need to store…repeating attributes are in (brackets).
• Time to move onto first normal form!
UNF 1NF 2NF 3NF Name
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
Converting to 1NF• A table is in 1NF if it contains:
– No repeating groups.
• You need to copy the primary key of the non-repeating group into the table of the repeating group.– You will need to underline this foreign key as it will
form part of a compound primary key.
• Copy the attributes into their own tables under the 1NF column.
1NF
• You now have two groups of attributes. Not just the one (like in UNF).
• Basically, you have created two tables that are linked by the MemID.
UNF 1NF 2NF 3NF Name
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
Converting to 2NF• A database is in 2NF if it is in 1NF and the non-key attributes
depend ENTIRELY on the primary key.
• If any table in 1NF has only one primary key then it is already in 2NF and can be copied to the 2NF column as it is.
• You will need to identify those attributes that are not related to the compound key.– E.g. If I wanted to find out the title of a DVD and I had the DVD_ID I
would be able to find the DVD title. But If I gave you the Mem_ID I wouldn’t be able to find out the DVD title.
– Again, giving the DVD_ID I would be able to find the certificate rating for it.
• The only attribute in our example that requires some thought is ‘Due’. A DVD can only be due if a member has it, which means we need both Mem_ID and DVD_ID to find this out.
Converting to 2NF• What do you do now?
• Copy the non-key attributes that are dependant on DVD and put them in their own table.
• Copy part of the primary key that those non-key attributes depend upon from 1NF into the new table you just created in 2NF.– E.g. Copy DVD_ID attribute across. Underline this to show
that it is the primary key of the new group.
• Finally, copy across any attributes left over from the old group in 1NF into 2NF, into their own table.
2NF
• You now have two groups of attributes. Not just the one (like in UNF).
• Basically, you have created two tables that are linked by the MemID.
UNF 1NF 2NF 3NF Name
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
Due
DVD_ID
DVDName
Cert
CertDescription
You have fixed the many-to-many problem!
Converting to 3NF• Needs to be in 2NF first.
• Time to remove non-key dependencies– Attributes which are dependent on another
attribute that isn’t a primary key.
• For example:– The certificate description is dependent on the
certificate ID…but the certificate ID is not a primary key…it is an attribute of the DVD.
3NF
• This is now in 3NF because by definition it is in 2NF and doesn’t contain any non-key dependencies.
UNF 1NF 2NF 3NF Name
MemID
Surname
Address1
Address2
Phone
Join
(DVD_ID
DVDName
Due
Cert
CertDescription)
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
DVDName
Due
Cert
CertDescription
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
Due
DVD_ID
DVDName
Cert
CertDescription
MemID
Surname
Address1
Address2
Phone
Join
MemID
DVD_ID
Due
DVD_ID
DVDName
CertID*
CertID
CertDescription
MEMBER
MEMBER_DVD
DVD
CERTIFICATE
What have we ended up with?
• Member can rent many DVDs.• DVD can only be rented by a member at any one time.
• DVD can be rented many times.• A rental can only be for a particular DVD.
• Certificate can be on many DVDs.• DVD can only have one certificate.
Member DVDMember_DVD
Certificate
Rules to remember• 1NF
– No repeating attributes and primary key of non repeating data must be placed in table of repeating data.
• 2NF– Look for non-key attributes that depend upon all the
attributes in the compound primary keys i.e. does the attribute depend on both primary keys?
• 3NF– Look for non-key attributes that depend upon other non-
key attributes.
Try for yourself…• Normalise the following examples…
• The answers will be posted later in the VLE.
1. Bank Account Record
Customer ID: 43543
Surname: Jones
Postcode: BA3 6TH
Telephone: 654321
Account Number Account Type Description Balance
22442546 Gold Savings Account £10.00
44342322 Platinum Current Account £100.00
77164645 Silver Internet Account £500.00
Normalise this data using the Analysis Table.
Some of it has been done for you on the next slide.
1. Bank Account Record
UNF 1NF 2NF 3NF Name
Cust-ID
Surname
Postcode
Telephone
(AccountNumber
AccountType
Description
Balance)
Cust-ID
Surname
Postcode
Telephone
Cust-ID
AccountNumber
AccountType
Description
Balance
Cust-ID
Surname
Postcode
Telephone
Cust-ID
Surname
Postcode
Telephone
AcountType
Description
CUSTOMER
CUST_ACCOUNT
ACCOUNT_DESC
2. Student Record
Student ID: 1232223
Surname: Patel
Initial: A
DofB: 09/06/86
CourseNo CourseName TeacherID TeacherSurname TeacherInitial
1 Maths 23 Harris P
2 ICT 18 Dolinski A
3 Drama 23 Morse B
4 Business Studies
11 Woodhall S
Normalise this data using the Analysis Table.
Some of it has been done for you on the next slide.
2. Student Record
UNF 1NF 2NF 3NF Name
CourseNumber
CourseName
TeacherID*
STUDENT
STUDENT_COURSE
COURSE
TEACHER
3. Camera Parts• A database is to be kept that records all the parts that make up a particular model
of digital camera.
• Following an analysis of the problem, it is found that these details are to be kept about each individual camera:
– The camera model (which is unique for every camera)– The date the camera was released for sale.
• For each camera, the following details about every part needed to build it will be kept:
– A part number.– A part name.– The ID of the manufacturer of the part.– The name of the manufacturer of the part.
• Your job is to produce a normalised design for the database.
• HINT: If you are unsure of how to start why not try sketching out a typical record card for one camera and then fill it in with some made-up data. It will help you visualise the problem.
3. Camera Parts
UNF 1NF 2NF 3NF Name