
Upload: poppy-leonard

Post on 26-Dec-2015


Normalization of Tables

“Between two evils, choose neither; between two goods, choose both.” Tryon Edwards

Steps to E-R Transformation

1. Identify entities

2. Identify relationships

3. Determine relationship type

4. Determine level of participation

5. Assign an identifier for each entity

6. Draw completed E-R diagram

7. Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using rules provided)

8. Develop a list of all attributes of interest (not already listed) and systematically assign each to a table in such a way as to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies)

Tables

Database design is the process of separating information into multiple tables that are related to each other

Single table designs work only for the simplest of situations in which data integrity problems are easy to correct

Anomalies (abnormalities) often arise in single table designs as a result of inserting, deleting, or updating records

Some tables are better structured than others (i.e., result in fewer anomalies)

Redundancy

Unnecessary repetition or duplication of data increases likelihood of errors due to keying inconsistencies

Multi-valued Problems

Solution 1? Include all authors’ names in a single field

Difficult to search for a single author’s name or create an alphabetical list of authors

Multi-valued Problems

Solution 2? Add multiple columns, one for each value

Empty fields waste storage space

Awkward to search across fields (e.g., Any books by Snoopy? Must search Author1, Author2, etc.)

Necessitates the creation of a new column every time a book has an additional author

Multi-valued Problems

Solution 3? Add multiple rows, one for each value

Data about a book must be repeated as many times as there are authors of the book (this also creates redundancy, which leads to keying errors and wastes storage space with large files)

A count of the total # of books, or of the # from each publisher, would be wrong

Update Anomalies

To update an agent’s telephone number, each instance must be changed

If we miss an item or enter it incorrectly, we create an unreliable table

Sometimes previous errors propagate errors further

An update anomaly occurs when multiple record changes for a single attribute are necessary.
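A minimal sketch of an update anomaly, assuming a hypothetical single table customer_flat in which the agent's phone number is repeated on every customer row (all names and numbers are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_flat (customer TEXT, agent TEXT, agent_phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer_flat VALUES (?, ?, ?)",
    [
        ("Customer 1", "Agent A", "555-0100"),
        ("Customer 2", "Agent A", "555-0100"),
        ("Customer 3", "Agent B", "555-0199"),
    ],
)

# An incomplete update: only one of Agent A's two rows is changed.
conn.execute(
    "UPDATE customer_flat SET agent_phone = '555-0111' "
    "WHERE customer = 'Customer 1'"
)

# The table now holds two different phone numbers for the same agent.
phones = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT agent_phone FROM customer_flat "
        "WHERE agent = 'Agent A' ORDER BY agent_phone"
    )
]
print(phones)  # ['555-0100', '555-0111']
```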

Deletion Anomalies

What happens if a customer record is deleted?

What happens if an agent record is deleted?

A deletion anomaly occurs when the removal of a record results in the unintended loss of important information.
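The same kind of hypothetical customer_flat table (an illustration, not data from the slides) can show a deletion anomaly: removing the only customer of an agent also removes everything known about that agent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_flat (customer TEXT, agent TEXT, agent_phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer_flat VALUES (?, ?, ?)",
    [
        ("Customer 1", "Agent A", "555-0100"),
        ("Customer 2", "Agent A", "555-0100"),
        ("Customer 3", "Agent B", "555-0199"),  # Agent B's only customer
    ],
)

# Deleting the customer record silently deletes Agent B as well.
conn.execute("DELETE FROM customer_flat WHERE customer = 'Customer 3'")

agents_left = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT agent FROM customer_flat ORDER BY agent"
    )
]
print(agents_left)  # ['Agent A']
```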

Insertion Anomalies

What happens if we want to enter information regarding an agent for whom we do not have a customer?

Do we add null values (blanks) for the other fields?

An insertion anomaly occurs when there is not a reasonable place to assign attributes and attribute values to records.
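A small sketch of an insertion anomaly, assuming (hypothetically) that the customer name is the primary key of a single customer_flat table and must not be null. An agent who has no customer yet then cannot be recorded at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for illustration: the customer column identifies each row.
conn.execute(
    "CREATE TABLE customer_flat ("
    "customer TEXT PRIMARY KEY NOT NULL, agent TEXT, agent_phone TEXT)"
)

try:
    # A new agent with no customer: there is no value for the key column.
    conn.execute(
        "INSERT INTO customer_flat VALUES (NULL, 'Agent C', '555-0123')"
    )
    inserted = True
except sqlite3.IntegrityError:
    # The NOT NULL constraint on the key rejects the row.
    inserted = False

print(inserted)  # False
```

If the key column were allowed to be null instead, the insert would succeed but leave a row of blanks, which leads to the null problems described next.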

The Problem with Nulls

1. Nulls used in mathematical expressions: an unknown quantity leads to an unknown total value, which gives a misleading value for all inventory

Product ID  Product Description               Category      Price     Quantity  Total Value
801         Shur-Lock U-Lock                  Accessories   75.00
802         SpeedRite Cyclecomputer                         60.00     20        1,200.00
803         SteelHead Microshell Helmet       Accessories   40.00     40        1,600.00
804         SureStop 133-MB Brakes            Components    25.00     10        250.00
805         Diablo ATM Mountain Bike          Bikes         1,200.00
806         Ultravision Helmet Mount Mirrors                7.45      10        74.50
                                                                      Total:    3,124.50

2. Nulls used in aggregate functions: blanks exist under Category, but they cannot be counted because they don’t exist!

Category     Total Occurrences
(blank)      0
Accessories  2
Bikes        1
Components   1
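Both null problems can be reproduced with Python's built-in sqlite3 module. The sketch below loads the product rows shown in the slide; the table and column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT, "
    "category TEXT, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?, ?)",
    [
        (801, "Shur-Lock U-Lock", "Accessories", 75.00, None),
        (802, "SpeedRite Cyclecomputer", None, 60.00, 20),
        (803, "SteelHead Microshell Helmet", "Accessories", 40.00, 40),
        (804, "SureStop 133-MB Brakes", "Components", 25.00, 10),
        (805, "Diablo ATM Mountain Bike", "Bikes", 1200.00, None),
        (806, "Ultravision Helmet Mount Mirrors", None, 7.45, 10),
    ],
)

# 1. price * quantity is NULL when quantity is NULL, and SUM skips NULLs,
#    so products 801 and 805 contribute nothing to the inventory total.
total = conn.execute("SELECT SUM(price * quantity) FROM product").fetchone()[0]

# 2. COUNT(category) counts only non-NULL values, so the group of rows
#    with no category reports a count of 0.
counts = conn.execute(
    "SELECT category, COUNT(category) FROM product "
    "GROUP BY category ORDER BY category"
).fetchall()

print(round(total, 2))
print(counts)
```

The total comes out near 3,124.50 even though two products have an unknown quantity, and the uncategorized group shows a count of zero although two rows belong to it.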

Database Design Problems

Use of the relational database model removes some database anomalies

Further removal of database anomalies relies on a structured technique called normalization

Presence of some of these anomalies is sometimes justified in order to enhance performance

Thus, database design consists of balancing the art of design with the science of design

Normalization

The goal in database design is to create well-structured tables

Transform E-R models to tables following the rules provided

Assuring tables are well-structured with minimal problems (redundancy, multi-valued attributes, update anomalies, insertion anomalies, deletion anomalies) is achieved using a structured technique called normalization

Normalization is the structured decomposition of one table into two or more tables using a procedure designed to determine the most appropriate split

Normalization is our method of making sure the E-R design was correct in the first place

Normalization refers to a series of forms: we will cover 1NF to 3NF, which is usually sufficient. Note that there are also: 4NF, Boyce-Codd Normal Form (BCNF), Fifth Normal Form (5NF) and Domain-Key Normal Form (DKNF)

First Normal Form

A table is in first normal form if it meets the following criteria: the data are stored in a two-dimensional table with no two rows identical, and there are no repeating groups.

The following table is NOT in first normal form because it contains a multi-valued attribute (an attribute with more than one value in each row).

Member_ID  Memb_FName  Memb_LName  Hobbies
1          Rodney      Jones       hiking, cooking
3          Francine    Moire       golf, theatre, hiking
2          Anne        Abel        concerts

Handling multi-valued attributes: Incorrect Solutions

Member_ID  Memb_FName  Memb_LName  Hobby1    Hobby2   Hobby3
1          Rodney      Jones       hiking    cooking
3          Francine    Moire       golf      theatre  hiking
2          Anne        Abel        concerts

Member_ID  Memb_FName  Memb_LName  Hobbies
1          Rodney      Jones       hiking
1          Rodney      Jones       cooking
3          Francine    Moire       golf
3          Francine    Moire       theatre
3          Francine    Moire       hiking
2          Anne        Abel        concerts


Handling multi-valued attributes: Correct Solution

Member_ID  Memb_FName  Memb_LName
1          Rodney      Jones
3          Francine    Moire
2          Anne        Abel

Member_ID  Hobby
1          hiking
1          cooking
3          golf
3          theatre
3          hiking
2          concerts

Create another entity (table) to handle multiple instances of the repeating group. This second table is then linked to the original table with an identifier (i.e., foreign key). This solution has the following advantages:

no limit to the number of hobbies per member

no waste of disk space

searching becomes much easier within a column (e.g., who likes hiking?)
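The two-table solution can be sketched in sqlite3 as follows. The table names member and member_hobby and the lower-case column names are assumptions; the rows are the ones from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (member_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)"
)
# One row per (member, hobby) pair; member_id is the foreign key.
conn.execute(
    "CREATE TABLE member_hobby ("
    "member_id INTEGER REFERENCES member(member_id), hobby TEXT, "
    "PRIMARY KEY (member_id, hobby))"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?)",
    [(1, "Rodney", "Jones"), (3, "Francine", "Moire"), (2, "Anne", "Abel")],
)
conn.executemany(
    "INSERT INTO member_hobby VALUES (?, ?)",
    [(1, "hiking"), (1, "cooking"), (3, "golf"),
     (3, "theatre"), (3, "hiking"), (2, "concerts")],
)

# "Who likes hiking?" is now a search within a single column.
hikers = conn.execute(
    "SELECT m.fname, m.lname FROM member m "
    "JOIN member_hobby h ON h.member_id = m.member_id "
    "WHERE h.hobby = 'hiking' ORDER BY m.member_id"
).fetchall()
print(hikers)  # [('Rodney', 'Jones'), ('Francine', 'Moire')]
```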


Handling Repeating Groups

An attribute can have a group of several data entries. Repeating groups can be removed by creating another table which holds those attributes that repeat. This second table (validation table) is then linked to the original table with an identifier (i.e., foreign key).

Advantages: fewer characters in tables; reduces miskeying and update anomalies

Product_ID  Product_Name                      Category   Price
801         Shur-Lock U-Lock                  Accessory  75.00
802         SpeedRite Cyclecomputer           Component  60.00
803         SteelHead Microshell Helmet       Accessory  40.00
804         SureStop 133-MB Brakes            Component  25.00
805         Diablo ATM Mountain Bike          Bike       1,200.00
806         Ultravision Helmet Mount Mirrors  Accessory  7.45

Category_ID  Category
1            Accessory
2            Component
3            Bike

Product_ID  Product_Name                      Category  Price
801         Shur-Lock U-Lock                  1         75.00
802         SpeedRite Cyclecomputer           2         60.00
803         SteelHead Microshell Helmet       1         40.00
804         SureStop 133-MB Brakes            2         25.00
805         Diablo ATM Mountain Bike          3         1,200.00
806         Ultravision Helmet Mount Mirrors  1         7.45
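A brief sketch of the validation-table design in sqlite3, using the category codes from the slide. The table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (category_id INTEGER PRIMARY KEY, category TEXT)"
)
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT, "
    "category_id INTEGER REFERENCES category(category_id), price REAL)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(1, "Accessory"), (2, "Component"), (3, "Bike")],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (801, "Shur-Lock U-Lock", 1, 75.00),
        (802, "SpeedRite Cyclecomputer", 2, 60.00),
        (803, "SteelHead Microshell Helmet", 1, 40.00),
        (804, "SureStop 133-MB Brakes", 2, 25.00),
        (805, "Diablo ATM Mountain Bike", 3, 1200.00),
        (806, "Ultravision Helmet Mount Mirrors", 1, 7.45),
    ],
)

# The category name is typed once, in the validation table, and is
# recovered for each product through the foreign key.
accessories = [
    row[0]
    for row in conn.execute(
        "SELECT p.product_id FROM product p "
        "JOIN category c ON c.category_id = p.category_id "
        "WHERE c.category = 'Accessory' ORDER BY p.product_id"
    )
]
print(accessories)  # [801, 803, 806]
```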

Second Normal Form

A table is in second normal form if it meets the following criteria: the relation is in first normal form, and all nonkey attributes are functionally dependent on the entire primary key.

Applies only to tables that have a composite primary key.

In the following table, both the EmpID and Training (composite primary key) determine Date, whereas only EmpID (part of the primary key) determines Dept.

EmpID  Training  Date       Dept
1      Word      12-Sep-99  Oncology
3      Excel     14-Oct-99  Paediatrics
2      Excel     14-Oct-99  Renal
1      Access    23-Nov-99  Oncology

Removing Partial Dependencies

Remove partial dependencies by separating the relation into two relations. Reduces the problems of

update anomalies

delete anomalies

insert anomalies

redundancies

EmpID  Training  Date
1      Word      12-Sep-99
3      Excel     14-Oct-99
2      Excel     14-Oct-99
1      Access    23-Nov-99

EmpID  Dept
1      Oncology
2      Renal
3      Paediatrics
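The partial dependency and the split can be checked with a few lines of plain Python. The helper determines is a hypothetical name introduced for this sketch. Sample rows can only refute a dependency, never prove it, so a True result means only that the data are consistent with it.

```python
def determines(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


training = [
    {"EmpID": 1, "Training": "Word", "Date": "12-Sep-99", "Dept": "Oncology"},
    {"EmpID": 3, "Training": "Excel", "Date": "14-Oct-99", "Dept": "Paediatrics"},
    {"EmpID": 2, "Training": "Excel", "Date": "14-Oct-99", "Dept": "Renal"},
    {"EmpID": 1, "Training": "Access", "Date": "23-Nov-99", "Dept": "Oncology"},
]

# Dept is consistent with depending on EmpID alone (a partial dependency).
dept_by_emp = determines(training, ["EmpID"], ["Dept"])
# Date is not determined by EmpID alone: employee 1 has two dates.
date_by_emp = determines(training, ["EmpID"], ["Date"])

# Split into the two relations and confirm that nothing is lost.
training_dates = sorted((r["EmpID"], r["Training"], r["Date"]) for r in training)
emp_dept = sorted({(r["EmpID"], r["Dept"]) for r in training})
dept_of = dict(emp_dept)
rejoined = sorted((e, t, d, dept_of[e]) for e, t, d in training_dates)
original = sorted(
    (r["EmpID"], r["Training"], r["Date"], r["Dept"]) for r in training
)

print(dept_by_emp, date_by_emp, rejoined == original)  # True False True
```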


Third Normal Form

A table is in third normal form if it meets the following criteria: the relation is in second normal form, and no nonkey field is functionally dependent on another nonkey field.

The following table is in second normal form but NOT in third normal form because RegistrationFee depends on Member_ID (the primary key) only transitively: RegistrationFee is determined by Sport, a nonkey field.

Member_ID  Memb_FName  Memb_LName  Sport     RegistrationFee
1          Rodney      Jones       Swimming  $100
3          Francine    Moire       Tennis    $200
2          Anne        Abel        Tennis    $200
4          Goro        Azuma       Skiing    $150

Member ID → FName, LName, Lesson; Lesson → Cost

Removing non-key Transitive Dependencies

Remove transitive dependencies by placing the attributes involved in a new relational table. Reduces the problems of:

update anomalies

delete anomalies

insert anomalies

redundancies

MemberID  MembFName  MembLName  Sport
1         Rodney     Jones      1
3         Francine   Moire      2
2         Anne       Abel       2
4         Goro       Azuma      3

SportID  Sport     RegFee
1        Swimming  $100
2        Tennis    $200
3        Skiing    $150
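A sketch of the 3NF split in sqlite3. Table and column names are assumptions, Goro Azuma is linked to Skiing as in the unnormalized table, and the new fee of 250 is a made-up value used only to show that a fee change touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sport (sport_id INTEGER PRIMARY KEY, sport TEXT, reg_fee INTEGER)"
)
conn.execute(
    "CREATE TABLE member (member_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, "
    "sport_id INTEGER REFERENCES sport(sport_id))"
)
conn.executemany(
    "INSERT INTO sport VALUES (?, ?, ?)",
    [(1, "Swimming", 100), (2, "Tennis", 200), (3, "Skiing", 150)],
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?)",
    [(1, "Rodney", "Jones", 1), (3, "Francine", "Moire", 2),
     (2, "Anne", "Abel", 2), (4, "Goro", "Azuma", 3)],
)

# The fee lives in one place, so a change is a single-row update
# (250 is a hypothetical new fee).
conn.execute("UPDATE sport SET reg_fee = 250 WHERE sport = 'Tennis'")

fees = conn.execute(
    "SELECT m.member_id, s.reg_fee FROM member m "
    "JOIN sport s ON s.sport_id = m.sport_id ORDER BY m.member_id"
).fetchall()
print(fees)  # [(1, 100), (2, 250), (3, 250), (4, 150)]
```

Both tennis members see the new fee without any of their own rows being edited, which is the update anomaly that the split removes.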


Normalization Example: Video Store

A video rental shop tracks all of its information in one table. There are now 20,000 records in it. Is it possible to achieve a more efficient design? (They charge $10/movie/day.)

Cust_Name       Cust_address       Cust_Phone  Rental_date
Rodney Jones    23 Richmond St.    681-9854    15-Oct-99
Francine Moire  750-12 Kipps Lane  672-9999    4-Nov-99
Anne Abel       5 Sarnia Road      432-1120    3-Sep-99
Rodney Jones    23 Richmond St.    681-9854    22-Sep-99

(same rows, continued)

Video_1                Video_2               Video_3              VideoType_1  VideoType_2  VideoType3
Gone with the Wind     Braveheart            Mississippi Burning  Classic      Adventure    Adventure
Manhatten                                                         Comedy
Manhatten              The African Queen                          Comedy       Classic
Never Say Never Again  Silence of the Lambs                       Adventure    Horror

(same rows, continued)

Return_date  TotalPrice  Paid?
17-Oct-99    $60.00      yes
(blank)      (blank)     (blank)
4-Sep-99     $20.00      yes
26-Sep-99    $80.00      yes

VIDEO (Cust_name, Cust_address, Cust_phone, Rental_date, Video_1, Video_2, Video_3, VideoType_1, VideoType_2, VideoType3, Return_date, Total_Price, Paid?)

Is the Video Store in 1NF?

No. No attributes should form repeating groups; remove them by creating another table. There are repeating groups for videos and customers.

Cust_Num  Cust_Name       Cust_address       Cust_Phone
1         Rodney Jones    23 Richmond St.    681-9854
2         Francine Moire  750-12 Kipps Lane  672-9999
3         Anne Abel       5 Sarnia Road      432-1120

VideoNum  VideoName              VideoType
1         Gone with the Wind     Classic
2         Manhatten              Comedy
3         Never Say Never Again  Adventure
4         Braveheart             Adventure
5         Mississippi Burning    Adventure
6         The African Queen      Classic
7         Silence of the Lambs   Horror

CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)

VIDEO (VideoNum, VideoName, VideoType)

RENTAL (Cust_num, VideoNum, Rental_date, Return_date, TotalPrice, Paid?)

Cust_Num  VideoNum  Rental_date  Return_date  TotalPrice  Paid?
1         1,4,5     15-Oct-99    17-Oct-99    $60.00      yes
2         2         4-Nov-99
3         2,6       3-Sep-99     4-Sep-99     $20.00      yes
1         3,7       22-Sep-99    26-Sep-99    $80.00      yes

Video Store: 1NF (cont’d)

Have not yet removed all repeating groups - video is a multi-valued attribute - move to another table.

RentalNum  Cust_Num  Rental_date  Return_date  TotalPrice  Paid?
1          1         15-Oct-99    17-Oct-99    $60.00      yes
2          2         4-Nov-99
3          3         3-Sep-99     4-Sep-99     $20.00      yes
4          1         22-Sep-99    26-Sep-99    $80.00      yes

RentalNum  VideoNum
1          1
1          4
1          5
2          2
3          2
3          6
4          3
4          7


RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)

RENTALDETAILS (RentalNum, VideoNum)

The Video Store is now in 1NF

CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)

VIDEO (VideoNum, VideoName, VideoType)

RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)

RENTALDETAILS (RentalNum, VideoNum)


Is the Video Store in 2NF?

The only table that has a composite primary key has no other fields; therefore, yes.


Is the Video Store in 3NF?

Does each attribute in each table depend upon the primary key?


CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)

VIDEO (VideoNum, VideoName, VideoType)

RentalNum  Cust_Num  Rental_date
1          1         15-Oct-99
2          2         4-Nov-99
3          3         3-Sep-99
4          1         22-Sep-99

RENTAL (RentalNum, Cust_Num, Rental_date)

RentalNum  VideoNum  ReturnDate  Amt_Paid
1          1         16-Oct-99   $10
1          4         17-Oct-99   $20
1          5         16-Oct-99   $10
2          2         5-Nov-99    $10
3          2         4-Sep-99    0
3          6         6-Sep-99    0
4          3         24-Sep-99   $5
4          7         16-Sep-99   0

RENTALDETAILS (RentalNum, VideoNum, ReturnDate, Amt_Paid)

The Video Store is now in 3NF

Yes, because in each table, every attribute depends on the primary key and not on any nonkey attribute.
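The final four-table design can be sketched as a small sqlite3 schema. The lower-case table and column names and the column types are assumptions; the rows are copied from the tables in the example. The amount paid per rental is then derived by a query instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_num INTEGER PRIMARY KEY, cust_name TEXT,
    cust_address TEXT, cust_phone TEXT);
CREATE TABLE video (
    video_num INTEGER PRIMARY KEY, video_name TEXT, video_type TEXT);
CREATE TABLE rental (
    rental_num INTEGER PRIMARY KEY,
    cust_num INTEGER REFERENCES customer(cust_num),
    rental_date TEXT);
CREATE TABLE rental_details (
    rental_num INTEGER REFERENCES rental(rental_num),
    video_num INTEGER REFERENCES video(video_num),
    return_date TEXT, amt_paid INTEGER,
    PRIMARY KEY (rental_num, video_num));
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (1, "Rodney Jones", "23 Richmond St.", "681-9854"),
    (2, "Francine Moire", "750-12 Kipps Lane", "672-9999"),
    (3, "Anne Abel", "5 Sarnia Road", "432-1120"),
])
conn.executemany("INSERT INTO video VALUES (?, ?, ?)", [
    (1, "Gone with the Wind", "Classic"), (2, "Manhatten", "Comedy"),
    (3, "Never Say Never Again", "Adventure"), (4, "Braveheart", "Adventure"),
    (5, "Mississippi Burning", "Adventure"), (6, "The African Queen", "Classic"),
    (7, "Silence of the Lambs", "Horror"),
])
conn.executemany("INSERT INTO rental VALUES (?, ?, ?)", [
    (1, 1, "15-Oct-99"), (2, 2, "4-Nov-99"),
    (3, 3, "3-Sep-99"), (4, 1, "22-Sep-99"),
])
conn.executemany("INSERT INTO rental_details VALUES (?, ?, ?, ?)", [
    (1, 1, "16-Oct-99", 10), (1, 4, "17-Oct-99", 20), (1, 5, "16-Oct-99", 10),
    (2, 2, "5-Nov-99", 10), (3, 2, "4-Sep-99", 0), (3, 6, "6-Sep-99", 0),
    (4, 3, "24-Sep-99", 5), (4, 7, "16-Sep-99", 0),
])

# Total paid per rental is computed from the detail rows on demand.
paid_per_rental = conn.execute("""
    SELECT r.rental_num, c.cust_name, SUM(d.amt_paid)
    FROM rental r
    JOIN customer c ON c.cust_num = r.cust_num
    JOIN rental_details d ON d.rental_num = r.rental_num
    GROUP BY r.rental_num, c.cust_name
    ORDER BY r.rental_num
""").fetchall()
print(paid_per_rental)
# [(1, 'Rodney Jones', 40), (2, 'Francine Moire', 10),
#  (3, 'Anne Abel', 0), (4, 'Rodney Jones', 5)]
```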

Conflicting Goals of Design

Database design must reconcile the following requirements:

Design elegance requires that the design must adhere to design rules concerning nulls, derived attributes, redundancies, relationship types, etc.

Information requirements are dictated by the end users

Operational (transaction) speed requirements are also dictated by the end users

Clearly, an elegant database design that fails to address end user information requirements or one that forms the basis for an implementation whose use progresses at a snail's pace has little practical use.