Normalization of Tables
“Between two evils, choose neither; between two goods, choose both.” Tryon Edwards
Steps to E-R Transformation
1. Identify entities
2. Identify relationships
3. Determine relationship type
4. Determine level of participation
5. Assign an identifier for each entity
6. Draw completed E-R diagram
7. Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using rules provided)
8. Develop a list of all attributes of interest (not already listed) and systematically assign each to a table so as to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies)
Tables
Database design is the process of separating information into multiple tables that are related to each other
Single table designs work only for the simplest of situations in which data integrity problems are easy to correct
Anomalies (abnormalities) often arise in single table designs as a result of inserting, deleting, or updating records
Some tables are better structured than others (i.e., result in fewer anomalies)
Redundancy
Unnecessary repetition or duplication of data increases the likelihood of errors caused by keying inconsistencies
Multi-valued Problems
Solution 1? Include all authors’ names in a single field
Difficult to search for a single author’s name or to create an alphabetical list of authors
Multi-valued Problems
Solution 2? Add multiple columns, one for each value
Empty fields waste storage space
Awkward to search across fields (e.g., any books by Snoopy? Must search Author1, Author2, etc.)
Necessitates the creation of a new column every time a book has an additional author
Multi-valued Problems
Solution 3? Add multiple rows, one for each value
Data about a book must be repeated as many times as the book has authors (this redundancy leads to keying errors and wastes storage space in large files)
Counts of the total number of books, or of the books from each publisher, would be wrong
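Here is a small sketch of the Solution 3 counting problem, using Python's built-in sqlite3 module. The ISBNs, titles, publisher and second author are hypothetical sample data. Only the Snoopy search comes from the slide. Counting rows in the one-row-per-author table overstates the number of books. Storing each book once, with its authors in a separate linking table (called book_author here, an illustrative name), gives the right count and turns "any books by Snoopy?" into a single-column search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Solution 3: one row per author, so the book's data is repeated
CREATE TABLE book_flat (isbn TEXT, title TEXT, publisher TEXT, author TEXT);
INSERT INTO book_flat VALUES
  ('111', 'Book A', 'Pub X', 'Snoopy'),
  ('111', 'Book A', 'Pub X', 'Woodstock'),
  ('222', 'Book B', 'Pub X', 'Snoopy');

-- Alternative: each book stored once; authors in a separate linking table
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, publisher TEXT);
CREATE TABLE book_author (
  isbn TEXT REFERENCES book(isbn),
  author TEXT,
  PRIMARY KEY (isbn, author));
INSERT INTO book VALUES ('111', 'Book A', 'Pub X'), ('222', 'Book B', 'Pub X');
INSERT INTO book_author VALUES ('111', 'Snoopy'), ('111', 'Woodstock'), ('222', 'Snoopy');
""")

def count_by_publisher(table):
    """Number of rows per publisher (table is one of our own fixed names)."""
    sql = f"SELECT publisher, COUNT(*) FROM {table} GROUP BY publisher"
    return dict(con.execute(sql).fetchall())

flat_counts = count_by_publisher("book_flat")  # overcounts: Book A appears twice
book_counts = count_by_publisher("book")       # one row per title

# "Any books by Snoopy?" is a search on a single column.
snoopy_titles = [title for (title,) in con.execute(
    "SELECT b.title FROM book b JOIN book_author a ON a.isbn = b.isbn "
    "WHERE a.author = 'Snoopy' ORDER BY b.title")]
```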
Update Anomalies
To update an agent’s telephone number, each instance must be changed
If we miss an instance or enter it incorrectly, we create an unreliable table
Sometimes previous errors propagate further
An update anomaly occurs when multiple record changes for a single attribute are necessary.
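The transcript does not reproduce the agent/customer table used on these slides. This sketch therefore assumes a hypothetical customer_agent table with invented names and phone numbers, stored in SQLite through Python's sqlite3 module. It shows how missing one row during an update leaves an agent with two phone numbers. It also shows how a separate agent table keeps the number in exactly one row.

```python
import sqlite3

# Hypothetical single-table design: the agent's phone is stored on every customer row.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_agent (
  cust_id INTEGER PRIMARY KEY, cust_name TEXT,
  agent_name TEXT, agent_phone TEXT);
INSERT INTO customer_agent VALUES
  (1, 'Customer 1', 'Agent A', '555-0100'),
  (2, 'Customer 2', 'Agent A', '555-0100'),
  (3, 'Customer 3', 'Agent B', '555-0199');
""")

# A careless update changes only one of Agent A's rows.
con.execute("UPDATE customer_agent SET agent_phone = '555-0111' WHERE cust_id = 1")

# Agent A now has two different phone numbers: the table is unreliable.
phones_for_a = sorted(p for (p,) in con.execute(
    "SELECT DISTINCT agent_phone FROM customer_agent WHERE agent_name = 'Agent A'"))

# With a separate (hypothetical) agent table, the number lives in exactly one row.
con.executescript("""
CREATE TABLE agent (agent_name TEXT PRIMARY KEY, agent_phone TEXT);
INSERT INTO agent VALUES ('Agent A', '555-0100'), ('Agent B', '555-0199');
UPDATE agent SET agent_phone = '555-0111' WHERE agent_name = 'Agent A';
""")
rows_for_a = con.execute(
    "SELECT COUNT(*) FROM agent WHERE agent_name = 'Agent A'").fetchone()[0]
```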
Deletion Anomalies
What happens if a customer record is deleted? What happens if an agent record is deleted?
A deletion anomaly occurs when the removal of a record results in the unintended loss of important information.
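The following sketch assumes a hypothetical customer_agent table with invented names and numbers, in SQLite via Python's sqlite3. Agent B's phone number is stored only on the row of Agent B's single customer. Deleting that customer therefore also deletes the only record of the agent's phone number.

```python
import sqlite3

# Hypothetical single table: Agent B's phone appears only on customer 3's row.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_agent (
  cust_id INTEGER PRIMARY KEY, cust_name TEXT,
  agent_name TEXT, agent_phone TEXT);
INSERT INTO customer_agent VALUES
  (1, 'Customer 1', 'Agent A', '555-0100'),
  (2, 'Customer 2', 'Agent A', '555-0100'),
  (3, 'Customer 3', 'Agent B', '555-0199');
""")

def agent_phone(name):
    """Look up an agent's phone number, or None if no row mentions the agent."""
    row = con.execute(
        "SELECT agent_phone FROM customer_agent WHERE agent_name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

before = agent_phone("Agent B")
# Deleting Agent B's only customer...
con.execute("DELETE FROM customer_agent WHERE cust_id = 3")
# ...also deletes the only record of Agent B's phone number.
after = agent_phone("Agent B")
```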
Insertion Anomalies
What happens if we want to enter information regarding an agent for whom we do not have a customer?
Do we add null values (blanks) for the other fields?
An insertion anomaly occurs when there is not a reasonable place to assign attributes and attribute values to records.
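This sketch of the insertion anomaly assumes a hypothetical customer_agent table keyed by customer, with invented data, in SQLite via Python's sqlite3. A new agent without customers cannot be stored, because the customer key would have to be null. A separate agent table accepts the same agent immediately.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical single table keyed by customer.
CREATE TABLE customer_agent (
  cust_id TEXT PRIMARY KEY NOT NULL, cust_name TEXT,
  agent_name TEXT, agent_phone TEXT);
-- Hypothetical separate table for agents.
CREATE TABLE agent (agent_name TEXT PRIMARY KEY, agent_phone TEXT);
""")

def try_insert(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        con.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# A new agent with no customers yet: the customer key would have to be NULL.
single_table_ok = try_insert(
    "INSERT INTO customer_agent VALUES (?, ?, ?, ?)",
    (None, None, "Agent C", "555-0123"))
# In its own table, the agent can be recorded immediately.
agent_table_ok = try_insert(
    "INSERT INTO agent VALUES (?, ?)", ("Agent C", "555-0123"))
```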
The Problem with Nulls
1. Nulls used in mathematical expressions: an unknown quantity leads to an unknown total value, so the value of all inventory is misleading (the 3,124.50 total silently omits products 801 and 805)
Product ID  Product Description               Category     Price     Quantity  Total Value
801         Shur-Lock U-Lock                  Accessories  75.00     (null)    (null)
802         SpeedRite Cyclecomputer           (null)       60.00     20        1,200.00
803         SteelHead Microshell Helmet       Accessories  40.00     40        1,600.00
804         SureStop 133-MB Brakes            Components   25.00     10        250.00
805         Diablo ATM Mountain Bike          Bikes        1,200.00  (null)    (null)
806         Ultravision Helmet Mount Mirrors  (null)       7.45      10        74.50
Total: 3,124.50
2. Nulls used in aggregate functions: two products have a blank (null) category, yet counting the Category field reports 0 occurrences for that group, because null values are not counted
Category     Total Occurrences
(null)       0
Accessories  2
Bikes        1
Components   1
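Both null problems can be reproduced in SQLite through Python's sqlite3 module, using the product data from the slide with None for the blank cells. The column names are illustrative. Multiplying by a null quantity yields null, and SUM skips nulls, so the total silently omits products 801 and 805. COUNT(category) also skips nulls, so the null group reports 0 even though COUNT(*) finds two rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE product (
  product_id INTEGER PRIMARY KEY, description TEXT,
  category TEXT, price REAL, quantity INTEGER)""")
# Data from the slide; None stands for the blank (null) cells.
con.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", [
    (801, "Shur-Lock U-Lock", "Accessories", 75.00, None),
    (802, "SpeedRite Cyclecomputer", None, 60.00, 20),
    (803, "SteelHead Microshell Helmet", "Accessories", 40.00, 40),
    (804, "SureStop 133-MB Brakes", "Components", 25.00, 10),
    (805, "Diablo ATM Mountain Bike", "Bikes", 1200.00, None),
    (806, "Ultravision Helmet Mount Mirrors", None, 7.45, 10),
])

# price * NULL is NULL, and SUM ignores NULLs, so 801 and 805 drop out.
total_value = con.execute("SELECT SUM(price * quantity) FROM product").fetchone()[0]

# COUNT(category) ignores NULLs; COUNT(*) counts every row in the group.
by_category = con.execute("""
  SELECT category, COUNT(category), COUNT(*)
  FROM product GROUP BY category ORDER BY category""").fetchall()
```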
Database Design Problems
Use of the relational database model removes some database anomalies
Further removal of database anomalies relies on a structured technique called normalization
Presence of some of these anomalies is sometimes justified in order to enhance performance
Thus, database design consists of balancing the art of design with the science of design
Normalization
Goal in database design: to create well-structured tables
Transform E-R models to tables following the rules provided
Ensuring that tables are well-structured with minimal problems (redundancy, multi-valued attributes, update anomalies, insertion anomalies, deletion anomalies) is achieved using a structured technique called normalization
Normalization is the structured decomposition of one table into two or more tables using a procedure designed to determine the most appropriate split
Normalization is our method of making sure the E-R design was correct in the first place
Normalization refers to a series of normal forms; we will cover 1NF to 3NF, which is usually sufficient. Note that there are also Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), Fifth Normal Form (5NF) and Domain-Key Normal Form (DKNF)
First Normal Form
A table is in first normal form if it meets the following criteria: the data are stored in a two-dimensional table, no two rows are identical, and there are no repeating groups. The following table is NOT in first normal form because it contains a multi-valued attribute (an attribute with more than one value in a single row).
Member_ID  Memb_FName  Memb_LName  Hobbies
1          Rodney      Jones       hiking, cooking
3          Francine    Moire       golf, theatre, hiking
2          Anne        Abel        concerts
Handling multi-valued attributes: Incorrect Solutions
Member_ID  Memb_FName  Memb_LName  Hobby1    Hobby2    Hobby3
1          Rodney      Jones       hiking    cooking
3          Francine    Moire       golf      theatre   hiking
2          Anne        Abel        concerts
Member_ID  Memb_FName  Memb_LName  Hobbies
1          Rodney      Jones       hiking
1          Rodney      Jones       cooking
3          Francine    Moire       golf
3          Francine    Moire       theatre
3          Francine    Moire       hiking
2          Anne        Abel        concerts
Handling multi-valued attributes: Correct Solution
Member_ID  Memb_FName  Memb_LName
1          Rodney      Jones
3          Francine    Moire
2          Anne        Abel
Member_ID  Hobby
1          hiking
1          cooking
3          golf
3          theatre
3          hiking
2          concerts
Create another entity (table) to handle multiple instances of the repeating group. This second table is then linked to the original table with an identifier (i.e., foreign key). This solution has the following advantages:
No limit to the number of hobbies per member
No waste of disk space
Searching within a single column becomes much easier (e.g., who likes hiking?)
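The correct solution can be sketched in SQLite via Python's sqlite3 module, using the member and hobby data from the slide. The table and column names are illustrative. The composite primary key (member_id, hobby) is a design choice in this sketch that stops the same hobby from being recorded twice for one member.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE member (
  member_id INTEGER PRIMARY KEY, memb_fname TEXT, memb_lname TEXT);
-- One row per (member, hobby), linked back to MEMBER by a foreign key.
CREATE TABLE member_hobby (
  member_id INTEGER REFERENCES member(member_id),
  hobby TEXT,
  PRIMARY KEY (member_id, hobby));
INSERT INTO member VALUES (1, 'Rodney', 'Jones'), (3, 'Francine', 'Moire'),
                          (2, 'Anne', 'Abel');
INSERT INTO member_hobby VALUES (1, 'hiking'), (1, 'cooking'), (3, 'golf'),
                                (3, 'theatre'), (3, 'hiking'), (2, 'concerts');
""")

def members_with_hobby(hobby):
    """Who likes <hobby>? One search on one column, however many hobbies a member has."""
    return con.execute("""
        SELECT m.memb_fname, m.memb_lname
        FROM member m JOIN member_hobby h ON h.member_id = m.member_id
        WHERE h.hobby = ? ORDER BY m.member_id""", (hobby,)).fetchall()
```

For example, `members_with_hobby("hiking")` returns Rodney Jones and Francine Moire.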
Handling Repeating Groups
An attribute can have a group of several data entries. Repeating groups can be removed by creating another table that holds the attribute values that repeat. This second table (a validation table) is then linked to the original table with an identifier (i.e., a foreign key)
Advantages: fewer characters stored in tables; reduces miskeying and update anomalies
Product_ID  Product_Name                      Category   Price
801         Shur-Lock U-Lock                  Accessory  75.00
802         SpeedRite Cyclecomputer           Component  60.00
803         SteelHead Microshell Helmet       Accessory  40.00
804         SureStop 133-MB Brakes            Component  25.00
805         Diablo ATM Mountain Bike          Bike       1,200.00
806         Ultravision Helmet Mount Mirrors  Accessory  7.45
Category_ID  Category
1            Accessory
2            Component
3            Bike
Product_ID  Product_Name                      Category_ID  Price
801         Shur-Lock U-Lock                  1            75.00
802         SpeedRite Cyclecomputer           2            60.00
803         SteelHead Microshell Helmet       1            40.00
804         SureStop 133-MB Brakes            2            25.00
805         Diablo ATM Mountain Bike          3            1,200.00
806         Ultravision Helmet Mount Mirrors  1            7.45
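Here is a sketch of the category validation table in SQLite (through Python's sqlite3), using the product data from the slide. The table and column names, the extra product 807 and the renamed category are hypothetical. Foreign-key enforcement must be switched on, because SQLite leaves it off by default. With it on, a miskeyed category ID is rejected. Renaming a category is then a single-row update that every product sees.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
CREATE TABLE product (
  product_id INTEGER PRIMARY KEY,
  product_name TEXT,
  category_id INTEGER NOT NULL REFERENCES category(category_id),
  price REAL);
INSERT INTO category VALUES (1, 'Accessory'), (2, 'Component'), (3, 'Bike');
INSERT INTO product VALUES
  (801, 'Shur-Lock U-Lock', 1, 75.00),
  (802, 'SpeedRite Cyclecomputer', 2, 60.00),
  (803, 'SteelHead Microshell Helmet', 1, 40.00),
  (804, 'SureStop 133-MB Brakes', 2, 25.00),
  (805, 'Diablo ATM Mountain Bike', 3, 1200.00),
  (806, 'Ultravision Helmet Mount Mirrors', 1, 7.45);
""")

def product_categories():
    """Join PRODUCT to the validation table to recover each category name."""
    return con.execute("""
        SELECT p.product_id, c.category
        FROM product p JOIN category c ON c.category_id = p.category_id
        ORDER BY p.product_id""").fetchall()

# A category ID missing from the validation table is rejected (hypothetical product 807).
try:
    con.execute("INSERT INTO product VALUES (807, 'Hypothetical Bell', 4, 9.99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming a category (hypothetical change) is a one-row update.
con.execute("UPDATE category SET category = 'Accessories' WHERE category_id = 1")
```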
Second Normal Form
A table is in second normal form if it meets the following criteria: the relation is in first normal form, and all nonkey attributes are functionally dependent on the entire primary key.
Applies only to tables that have a composite primary key. In the following table, EmpID and Training together (the composite primary key) determine Date, whereas only EmpID (part of the primary key) determines Dept.
EmpID  Training  Date       Dept
1      Word      12-Sep-99  Oncology
3      Excel     14-Oct-99  Paediatrics
2      Excel     14-Oct-99  Renal
1      Access    23-Nov-99  Oncology
Removing Partial Dependencies
Remove partial dependencies by separating the relation into two relations. This reduces update, deletion and insertion anomalies as well as redundancy.
EmpID  Training  Date
1      Word      12-Sep-99
3      Excel     14-Oct-99
2      Excel     14-Oct-99
1      Access    23-Nov-99
EmpID  Dept
1      Oncology
2      Renal
3      Paediatrics
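Partial dependencies can be explored in plain Python with the EmpID/Training rows from the slide. The helper names fd_holds and project are illustrative. fd_holds only checks that the sample rows do not contradict a dependency. Sample data can refute a functional dependency but cannot prove one, which remains a business rule. Projecting onto (EmpID, Training, Date) and (EmpID, Dept) and joining back on EmpID reproduces the original four rows, so this split loses no information.

```python
# Rows from the EmpID/Training table on the slide.
ROWS = [
    {"EmpID": 1, "Training": "Word",   "Date": "12-Sep-99", "Dept": "Oncology"},
    {"EmpID": 3, "Training": "Excel",  "Date": "14-Oct-99", "Dept": "Paediatrics"},
    {"EmpID": 2, "Training": "Excel",  "Date": "14-Oct-99", "Dept": "Renal"},
    {"EmpID": 1, "Training": "Access", "Date": "23-Nov-99", "Dept": "Oncology"},
]

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples, preserving order."""
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in out:
            out.append(t)
    return out

training = project(ROWS, ["EmpID", "Training", "Date"])  # key: (EmpID, Training)
emp_dept = project(ROWS, ["EmpID", "Dept"])              # key: EmpID

# Natural join of the two projections on EmpID.
rejoined = sorted(
    (emp, course, date, dept)
    for (emp, course, date) in training
    for (emp2, dept) in emp_dept
    if emp == emp2
)
original = sorted(tuple(row.values()) for row in ROWS)
```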
Third Normal Form
A table is in third normal form if it meets the following criteria: the relation is in second normal form, and no nonkey field is functionally dependent on another nonkey field.
The following table is in second normal form but NOT in third normal form because RegistrationFee is determined by Sport, a nonkey attribute. Member_ID (the primary key) determines RegistrationFee only indirectly, through Sport (a transitive dependency).
Member_ID  Memb_FName  Memb_LName  Sport     RegistrationFee
1          Rodney      Jones       Swimming  $100
3          Francine    Moire       Tennis    $200
2          Anne        Abel        Tennis    $200
4          Goro        Azuma       Skiing    $150
Member ID → FName, LName, Lesson; Lesson → Cost
Removing non-key Transitive Dependencies
Remove transitive dependencies by placing the attributes involved in a new table. This reduces update, deletion and insertion anomalies as well as redundancy.
MemberID  MembFName  MembLName  SportID
1         Rodney     Jones      1
3         Francine   Moire      2
2         Anne       Abel       2
4         Goro       Azuma      3
SportID  Sport     RegFee
1        Swimming  $100
2        Tennis    $200
3        Skiing    $150
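Here is a sketch of the 3NF split in SQLite via Python's sqlite3, using the member and sport data from the slide. The column names are illustrative. The join rebuilds the original wide table through Member_ID → SportID → RegFee. A hypothetical change to the tennis fee is made in one SPORT row, yet it reaches every tennis player.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sport (sport_id INTEGER PRIMARY KEY, sport TEXT, reg_fee INTEGER);
CREATE TABLE member (
  member_id INTEGER PRIMARY KEY, memb_fname TEXT, memb_lname TEXT,
  sport_id INTEGER REFERENCES sport(sport_id));
INSERT INTO sport VALUES (1, 'Swimming', 100), (2, 'Tennis', 200), (3, 'Skiing', 150);
INSERT INTO member VALUES (1, 'Rodney', 'Jones', 1), (3, 'Francine', 'Moire', 2),
                          (2, 'Anne', 'Abel', 2), (4, 'Goro', 'Azuma', 3);
""")

def member_fees():
    """Rebuild the wide table with a join: member -> sport_id -> reg_fee."""
    return con.execute("""
        SELECT m.member_id, m.memb_fname, s.sport, s.reg_fee
        FROM member m JOIN sport s ON s.sport_id = m.sport_id
        ORDER BY m.member_id""").fetchall()

# Hypothetical fee change: one row in SPORT, seen by every tennis player.
con.execute("UPDATE sport SET reg_fee = 250 WHERE sport = 'Tennis'")
```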
Normalization Example: Video Store
A video rental shop tracks all of its information in one table, which now holds 20,000 records. Is it possible to achieve a more efficient design? (The shop charges $10/movie/day.)
Cust_Name       Cust_address       Cust_Phone  Rental_date  Video_1                Video_2               Video_3              VideoType_1  VideoType_2  VideoType3   Return_date  TotalPrice  Paid?
Rodney Jones    23 Richmond St.    681-9854    15-Oct-99    Gone with the Wind     Braveheart            Mississippi Burning  Classic      Adventure    Adventure    17-Oct-99    $60.00      yes
Francine Moire  750-12 Kipps Lane  672-9999    4-Nov-99     Manhattan                                                         Comedy
Anne Abel       5 Sarnia Road      432-1120    3-Sep-99     Manhattan              The African Queen                          Comedy       Classic                   4-Sep-99     $20.00      yes
Rodney Jones    23 Richmond St.    681-9854    22-Sep-99    Never Say Never Again  Silence of the Lambs                       Adventure    Horror                    26-Sep-99    $80.00      yes
VIDEO (Cust_name, Cust_address, Cust_phone, Rental_date, Video_1, Video_2, Video_3, VideoType_1, VideoType_2, VideoType3, Return_date, Total_Price, Paid?)
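TotalPrice in this single table is a derived attribute. Assume the number of rental days is simply the return date minus the rental date. Under that assumption, the $10/movie/day rate reproduces every recorded total, as this small Python check shows. The function and constant names are illustrative.

```python
from datetime import date

RATE_PER_MOVIE_PER_DAY = 10  # "$10/movie/day" from the slide

def total_price(rental_date, return_date, num_videos):
    """TotalPrice = movies x days x rate (days assumed = return date - rental date)."""
    days = (return_date - rental_date).days
    return num_videos * days * RATE_PER_MOVIE_PER_DAY

# The three returned rentals from the single VIDEO table (1999 dates).
checks = [
    total_price(date(1999, 10, 15), date(1999, 10, 17), 3),  # Rodney Jones, 3 videos
    total_price(date(1999, 9, 3), date(1999, 9, 4), 2),      # Anne Abel, 2 videos
    total_price(date(1999, 9, 22), date(1999, 9, 26), 2),    # Rodney Jones, 2 videos
]
```

Because the total can be recomputed from other columns, storing it invites inconsistency if a date or the list of videos is later corrected.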
Is the Video Store in 1NF?
No attributes should form repeating groups; remove them by creating another table. There are repeating groups for videos and customers.
Cust_Num  Cust_Name       Cust_address       Cust_Phone
1         Rodney Jones    23 Richmond St.    681-9854
2         Francine Moire  750-12 Kipps Lane  672-9999
3         Anne Abel       5 Sarnia Road      432-1120
VideoNum  VideoName              VideoType
1         Gone with the Wind     Classic
2         Manhattan              Comedy
3         Never Say Never Again  Adventure
4         Braveheart             Adventure
5         Mississippi Burning    Adventure
6         The African Queen      Classic
7         Silence of the Lambs   Horror
CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)
VIDEO (VideoNum, VideoName, VideoType)
RENTAL (Cust_Num, VideoNum, Rental_date, Return_date, TotalPrice, Paid?)
Cust_Num  VideoNum  Rental_date  Return_date  TotalPrice  Paid?
1         1,4,5     15-Oct-99    17-Oct-99    $60.00      yes
2         2         4-Nov-99
3         2,6       3-Sep-99     4-Sep-99     $20.00      yes
1         3,7       22-Sep-99    26-Sep-99    $80.00      yes
Video Store: 1NF (cont’d)
We have not yet removed all repeating groups: VideoNum is still a multi-valued attribute, so move it to another table.
RentalNum  Cust_Num  Rental_date  Return_date  TotalPrice  Paid?
1          1         15-Oct-99    17-Oct-99    $60.00      yes
2          2         4-Nov-99
3          3         3-Sep-99     4-Sep-99     $20.00      yes
4          1         22-Sep-99    26-Sep-99    $80.00      yes
RentalNum  VideoNum
1          1
1          4
1          5
2          2
3          2
3          6
4          3
4          7
RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)
RENTALDETAILS (RentalNum, VideoNum)
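The move from the multi-valued VideoNum column to RENTALDETAILS can be sketched in Python. First number each rental, then emit one (RentalNum, VideoNum) row per video. The input mirrors the first RENTAL table above. The function name split_rentals is illustrative.

```python
# RENTAL rows as first proposed: (Cust_Num, VideoNum list, Rental_date,
# Return_date, TotalPrice, Paid?), with VideoNum still multi-valued.
RENTALS = [
    (1, "1,4,5", "15-Oct-99", "17-Oct-99", 60.00, "yes"),
    (2, "2", "4-Nov-99", None, None, None),
    (3, "2,6", "3-Sep-99", "4-Sep-99", 20.00, "yes"),
    (1, "3,7", "22-Sep-99", "26-Sep-99", 80.00, "yes"),
]

def split_rentals(rows):
    """Give each rental a RentalNum and move each video to its own RENTALDETAILS row."""
    rental, details = [], []
    for rental_num, (cust_num, videos, *rest) in enumerate(rows, start=1):
        rental.append((rental_num, cust_num, *rest))
        for video in videos.split(","):
            details.append((rental_num, int(video)))
    return rental, details

rental, rental_details = split_rentals(RENTALS)
```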
The Video Store is now in 1NF
CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)
VIDEO (VideoNum, VideoName, VideoType)
RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)
RENTALDETAILS (RentalNum, VideoNum)
Is the Video Store in 2NF?
The only table that has a composite primary key (RENTALDETAILS) has no other fields; therefore, yes.
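Is the Video Store in 3NF?
Does each attribute in each table depend upon the primary key, and on nothing but the primary key? For example, TotalPrice in RENTAL can be calculated from Rental_date, Return_date and the number of videos rented ($10 per movie per day), so it depends on other attributes rather than on RentalNum alone.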
CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone)
VIDEO (VideoNum, VideoName, VideoType)
RentalNum  Cust_Num  Rental_date
1          1         15-Oct-99
2          2         4-Nov-99
3          3         3-Sep-99
4          1         22-Sep-99
RentalNum  VideoNum  ReturnDate  Amt_Paid
1          1         16-Oct-99   $10
1          4         17-Oct-99   $20
1          5         16-Oct-99   $10
2          2         5-Nov-99    $10
3          2         4-Sep-99    $0
3          6         6-Sep-99    $0
4          3         24-Sep-99   $5
4          7         16-Sep-99   $0
RENTAL (RentalNum, Cust_Num, Rental_date)
RENTALDETAILS (RentalNum, VideoNum, ReturnDate, Amt_Paid)
The Video Store is now in 3NF
Yes, because in each table every nonkey attribute depends on the primary key and not on any other nonkey attribute.
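The final 3NF design can be written as SQLite DDL and exercised through Python's sqlite3 module. Several choices here are assumptions for the sketch: column names are rewritten in snake_case, dates are stored as ISO strings, and amounts are stored as whole dollars. The rows themselves are the slide's data. The composite primary key on rental_details matches RENTALDETAILS (RentalNum, VideoNum). The query rebuilds each customer's total amount paid by joining the tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, cust_name TEXT,
                       cust_address TEXT, cust_phone TEXT);
CREATE TABLE video (video_num INTEGER PRIMARY KEY, video_name TEXT, video_type TEXT);
CREATE TABLE rental (rental_num INTEGER PRIMARY KEY,
                     cust_num INTEGER NOT NULL REFERENCES customer(cust_num),
                     rental_date TEXT);
CREATE TABLE rental_details (
  rental_num INTEGER REFERENCES rental(rental_num),
  video_num INTEGER REFERENCES video(video_num),
  return_date TEXT,
  amt_paid INTEGER,
  PRIMARY KEY (rental_num, video_num));
INSERT INTO customer VALUES
  (1, 'Rodney Jones', '23 Richmond St.', '681-9854'),
  (2, 'Francine Moire', '750-12 Kipps Lane', '672-9999'),
  (3, 'Anne Abel', '5 Sarnia Road', '432-1120');
INSERT INTO video VALUES
  (1, 'Gone with the Wind', 'Classic'), (2, 'Manhattan', 'Comedy'),
  (3, 'Never Say Never Again', 'Adventure'), (4, 'Braveheart', 'Adventure'),
  (5, 'Mississippi Burning', 'Adventure'), (6, 'The African Queen', 'Classic'),
  (7, 'Silence of the Lambs', 'Horror');
INSERT INTO rental VALUES (1, 1, '1999-10-15'), (2, 2, '1999-11-04'),
                          (3, 3, '1999-09-03'), (4, 1, '1999-09-22');
INSERT INTO rental_details VALUES
  (1, 1, '1999-10-16', 10), (1, 4, '1999-10-17', 20), (1, 5, '1999-10-16', 10),
  (2, 2, '1999-11-05', 10), (3, 2, '1999-09-04', 0), (3, 6, '1999-09-06', 0),
  (4, 3, '1999-09-24', 5), (4, 7, '1999-09-16', 0);
""")

def amount_paid_by_customer():
    """Total Amt_Paid per customer, rebuilt by joining the 3NF tables."""
    return con.execute("""
        SELECT c.cust_name, SUM(d.amt_paid)
        FROM customer c
        JOIN rental r ON r.cust_num = c.cust_num
        JOIN rental_details d ON d.rental_num = r.rental_num
        GROUP BY c.cust_num ORDER BY c.cust_num""").fetchall()

# The composite key stops the same video being entered twice in one rental.
try:
    con.execute("INSERT INTO rental_details VALUES (1, 1, '1999-10-18', 0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```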
Conflicting Goals of Design
Database design must reconcile the following requirements:
Design elegance requires that the design must adhere to design rules concerning nulls, derived attributes, redundancies, relationship types, etc.
Information requirements are dictated by the end users
Operational (transaction) speed requirements are also dictated by the end users
Clearly, an elegant database design that fails to address end-user information requirements, or one that forms the basis for an implementation that runs at a snail's pace, has little practical use.