
Normalization

Information Systems II, Ioan Despi

Informal approach

Building a database structure:

• A process of examining the data that is useful and necessary for an application

• Then breaking it down into a relatively simple row and column format

There are two points to understand about tables and columns that are the essence of any database:

1. Tables store data about an entity

An entity may be a person, a part in a machine, a book, or any other tangible or intangible object, but the primary consideration is that a table should contain data about only one thing.

2. Columns contain the attributes of an entity

Just as a table contains data about a single entity, each column should contain only one item of data about that entity.

Personal tricks:

1. If (for example) you’re creating a table of addresses, there is no point in having a single column contain the city, state and postal code when it is just as easy to create three columns and record each attribute separately.

2. I use the plural form of a noun for table names (Authors, Books, Orders, and so on) and a noun, or a noun and an adjective, for column names (FirstName, City).

3. If I’m coming up with names that require the word “and” or two nouns, it’s an indication that I haven’t gone far enough in breaking down the data.

An ugly table

StudentName | AdvisorName | CourseID1 | CourseDescription1 | CourseInstructorName1

Al Gore | Bill Clinton | VB1 | Intro to Visual Basic | Bruce Lee

Dan Quayle | George Bush | DAO1 | Intro to DAO Programming | Joe Killy

George Bush | Ronald Reagan | API1 | API Programming | Dan Ciuhan

Walter Mondale | Jimmy Carter | VB1 | Intro to Visual Basic | Bruce Lee

Problems with this structure:

1. Repeating groups

The CourseID, CourseDescription and CourseInstructorName columns are repeated for each class.

If a student needs a second or third class, you need to go back and modify the table design in order to record it.

Additionally, adding all those columns when most students would never use them wastes storage.

2. Delete anomalies

If you no longer wish to track Joe Killy’s Intro to DAO class, you would need to delete the entire row, losing the student (Dan Quayle), the advisor (George Bush) and the instructor (Joe Killy) along with it.

3. Insert anomalies

Perhaps the department head wishes to add a new class, “Intro to C++”, but hasn’t yet set up a schedule or even an instructor. What would you enter for the student, advisor and instructor names?

4. Inconsistent data

If, after entering these rows, you discover that Bruce Lee’s course is actually “Intro to Advanced Visual Basic”, you would need to examine all the rows and change each one individually to reflect the change.

This introduces the potential for errors if one of the changes is omitted or done incorrectly.
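The inconsistent-data and delete anomalies can be reproduced in a few lines. The sketch below is a minimal illustration, assuming Python’s built-in sqlite3 module and an in-memory database; the table name Ugly follows the slides, while the column types are assumptions.

```python
import sqlite3

# The flat "ugly" table from the slides, one course per student.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Ugly (
        StudentName           TEXT,
        AdvisorName           TEXT,
        CourseID1             TEXT,
        CourseDescription1    TEXT,
        CourseInstructorName1 TEXT
    )
""")
conn.executemany(
    "INSERT INTO Ugly VALUES (?, ?, ?, ?, ?)",
    [
        ("Al Gore", "Bill Clinton", "VB1", "Intro to Visual Basic", "Bruce Lee"),
        ("Dan Quayle", "George Bush", "DAO1", "Intro to DAO Programming", "Joe Killy"),
        ("George Bush", "Ronald Reagan", "API1", "API Programming", "Dan Ciuhan"),
        ("Walter Mondale", "Jimmy Carter", "VB1", "Intro to Visual Basic", "Bruce Lee"),
    ],
)

# Inconsistent data: an update that reaches only one of the two VB1 rows
# leaves the table disagreeing with itself about VB1's description.
conn.execute(
    "UPDATE Ugly SET CourseDescription1 = 'Intro to Advanced Visual Basic' "
    "WHERE StudentName = 'Al Gore'"
)
vb1_descriptions = {
    row[0]
    for row in conn.execute("SELECT CourseDescription1 FROM Ugly WHERE CourseID1 = 'VB1'")
}
print(sorted(vb1_descriptions))

# Delete anomaly: dropping the DAO1 class deletes the whole row, so the
# only record of instructor Joe Killy disappears with it.
conn.execute("DELETE FROM Ugly WHERE CourseID1 = 'DAO1'")
killy_rows = conn.execute(
    "SELECT COUNT(*) FROM Ugly WHERE CourseInstructorName1 = 'Joe Killy'"
).fetchone()[0]
print(killy_rows)  # 0
```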

As you can see, this single, simple flat table introduces a number of problems, all of which can be solved by normalizing the table design.

Normalization is the process of taking a wide table with many columns but few rows and redesigning it as several narrow tables with fewer columns but more rows.

A properly normalized design allows you to:

1. Use storage space efficiently

2. Eliminate redundant data

3. Reduce or eliminate inconsistent data

4. Ease the data maintenance burden

The rule:

you must be able to reconstruct the original flat view of the data

Relational database theorists have divided normalization into several rules, called normal forms:

First normal form (1NF): no repeating groups

Second normal form (2NF): 1NF + no nonkey attributes depend on a portion of the primary key

Third normal form (3NF): 2NF + no attributes depend on other nonkey attributes
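Each of these rules is phrased in terms of what an attribute “depends on”, i.e. functional dependencies: X → Y holds when any two rows that agree on X also agree on Y. 2NF forbids such a dependency whose left side is only part of the primary key; 3NF forbids one whose left side is a nonkey column. A minimal Python sketch of the test follows; the helper name fd_holds is hypothetical, and rows are assumed to be dictionaries keyed by column name.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, str]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if some rows agree on lhs but differ on rhs.

    A True result only means the sample does not violate lhs -> rhs;
    whether the dependency really holds is a business rule, not a data fact.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLS = ["StudentName", "AdvisorName", "CourseID1",
        "CourseDescription1", "CourseInstructorName1"]
ugly = [dict(zip(COLS, r)) for r in [
    ("Al Gore", "Bill Clinton", "VB1", "Intro to Visual Basic", "Bruce Lee"),
    ("Dan Quayle", "George Bush", "DAO1", "Intro to DAO Programming", "Joe Killy"),
    ("George Bush", "Ronald Reagan", "API1", "API Programming", "Dan Ciuhan"),
    ("Walter Mondale", "Jimmy Carter", "VB1", "Intro to Visual Basic", "Bruce Lee"),
]]

# The course ID determines the course's description and instructor...
print(fd_holds(ugly, ["CourseID1"], ["CourseDescription1", "CourseInstructorName1"]))  # True
# ...but the instructor does not determine the student (Bruce Lee has two).
print(fd_holds(ugly, ["CourseInstructorName1"], ["StudentName"]))  # False
```

Because a handful of sample rows cannot prove a rule that must hold for all future data, a check like this can only reveal violations; deciding which dependencies are real is still a design decision.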

1NF

A repeating group:

StudentName | AdvisorName | CourseID1 | CourseDescription1 | ...

Columns for course information have been duplicated to allow the student to take 3 courses.

The problem occurs when the student wants to take 4 or more courses.

The proper solution is to move the repeating group of columns to another table:

Ugly(StudentName, AdvisorName, CourseID1, CourseDescription1, CourseInstructorName1)

Students (StudentID, StudentName, AdvisorName)

StudentCourses (SCStudentID, SCCourseID, SCCourseDescription, SCCourseInstructorName)

The primary keys are StudentID in Students and the combination (SCStudentID, SCCourseID) in StudentCourses. The new field SCStudentID is a foreign key to the Students table.

We’ve divided the table so that a student can now take as many courses as needed, by removing the course information from the original table and creating two tables: one for the student information and one for the course list. The repeating group of columns in the original table is gone. We can still reconstruct the original table by joining the two new tables on the StudentID and SCStudentID columns.
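The 1NF split, and the rule that the flat view must still be recoverable, can be sketched with sqlite3. This is an illustration rather than the author’s schema: the StudentID values, the column types and the extra API1 enrollment for Al Gore are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT,
        AdvisorName TEXT
    );
    -- One row per (student, course) instead of CourseID1, CourseID2, ...
    CREATE TABLE StudentCourses (
        SCStudentID            INTEGER REFERENCES Students(StudentID),
        SCCourseID             TEXT,
        SCCourseDescription    TEXT,
        SCCourseInstructorName TEXT,
        PRIMARY KEY (SCStudentID, SCCourseID)
    );
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Al Gore", "Bill Clinton"),
    (2, "Dan Quayle", "George Bush"),
])
conn.executemany("INSERT INTO StudentCourses VALUES (?, ?, ?, ?)", [
    (1, "VB1", "Intro to Visual Basic", "Bruce Lee"),
    (2, "DAO1", "Intro to DAO Programming", "Joe Killy"),
    # A second course for a student is a new row, not a new column.
    (1, "API1", "API Programming", "Dan Ciuhan"),
])

# Reconstruct the flat view by joining on StudentID = SCStudentID.
flat = conn.execute("""
    SELECT s.StudentName, s.AdvisorName, sc.SCCourseID,
           sc.SCCourseDescription, sc.SCCourseInstructorName
    FROM Students AS s
    JOIN StudentCourses AS sc ON sc.SCStudentID = s.StudentID
    ORDER BY s.StudentID, sc.SCCourseID
""").fetchall()
for row in flat:
    print(row)
```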

2NF:

No nonkey attributes depend on a portion of the primary key

2NF really only applies to tables whose primary key consists of two or more columns.

The essence is that if some columns can be determined by only part of the primary key, they need to be in their own table.

StudentCourses (SCStudentID, SCCourseID, SCCourseDescription, SCCourseInstructorName)

The primary key is the combination: SCStudentID, SCCourseID

The columns SCCourseDescription, SCCourseInstructorName are only dependent on the SCCourseID column.

In other words, the description and instructor’s name will be the same regardless of the student.

To solve the problem, we split StudentCourses in two, which gives three tables in all:

Students (StudentID, StudentName, AdvisorName)

StudentCourses (SCStudentID, SCCourseID)

Courses (CourseID, CourseDescription, CourseInstructorName)

What we’ve done is move the course details into their own table, Courses.
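Because SCCourseID alone determines the description and instructor, this split loses no information: projecting the old StudentCourses onto the two new tables and joining them back yields exactly the original rows. A minimal sketch using Python sets as stand-in tables, assuming hypothetical StudentIDs 1 to 4 for the four students of the sample table:

```python
# StudentCourses before the 2NF split: (StudentID, CourseID, Description, Instructor).
# The key is (StudentID, CourseID), but Description and Instructor depend on CourseID alone.
student_courses_1nf = {
    (1, "VB1", "Intro to Visual Basic", "Bruce Lee"),
    (2, "DAO1", "Intro to DAO Programming", "Joe Killy"),
    (3, "API1", "API Programming", "Dan Ciuhan"),
    (4, "VB1", "Intro to Visual Basic", "Bruce Lee"),
}

# Project onto the two new tables; the set collapses the duplicated VB1 course row.
student_courses = {(sid, cid) for sid, cid, _, _ in student_courses_1nf}
courses = {(cid, desc, instr) for _, cid, desc, instr in student_courses_1nf}

# Join back on CourseID; a lossless split must reproduce the original rows.
rejoined = {
    (sid, cid, desc, instr)
    for sid, cid in student_courses
    for course_id, desc, instr in courses
    if cid == course_id
}
print(len(courses), rejoined == student_courses_1nf)  # 3 True
```

The VB1 description is now stored once instead of once per student, which is what removes the inconsistent-data problem for course details.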

The relationship between students and courses turns out to be a many-to-many relationship: each student can take many courses, and each course can have many students.

The StudentCourses table now contains only the two foreign keys to Students and Courses. Such a table is also called an intersection entity.
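An intersection entity is typically declared with a composite primary key made of its two foreign keys. The sketch below renders the three 2NF tables in SQLite; the NOT NULL constraints and the course ID CPP1 are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Students (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT NOT NULL,
        AdvisorName TEXT
    );
    CREATE TABLE Courses (
        CourseID             TEXT PRIMARY KEY,
        CourseDescription    TEXT NOT NULL,
        CourseInstructorName TEXT
    );
    -- Intersection entity: only the two foreign keys, which together form the key.
    CREATE TABLE StudentCourses (
        SCStudentID INTEGER NOT NULL REFERENCES Students(StudentID),
        SCCourseID  TEXT    NOT NULL REFERENCES Courses(CourseID),
        PRIMARY KEY (SCStudentID, SCCourseID)
    );
""")
conn.execute("INSERT INTO Students VALUES (1, 'Al Gore', 'Bill Clinton')")
conn.execute("INSERT INTO Courses VALUES ('VB1', 'Intro to Visual Basic', 'Bruce Lee')")
# The insert anomaly is gone: a course can exist before anyone enrolls in it.
conn.execute("INSERT INTO Courses VALUES ('CPP1', 'Intro to C++', NULL)")
conn.execute("INSERT INTO StudentCourses VALUES (1, 'VB1')")

try:
    # Enrolling in a course that does not exist violates the foreign key.
    conn.execute("INSERT INTO StudentCourses VALUES (1, 'XYZ9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```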

Let us add a little more detail to the sample tables to make them look more like the real world:

Students (StudentID, StudentName, StudentPhone, StudentAddress, StudentCity, StudentState, StudentZIP, AdvisorName, AdvisorPhone)

StudentCourses (SCStudentID, SCCourseID)

Courses (CourseID, CourseDescription, CourseInstructorName, CourseInstructorPhone)

3NF:

No attributes depend on other nonkey attributes

All the columns in the table contain data about the entity that is defined by the primary key.

The columns in the table must contain data about only one thing.

This is really an extension of 2NF: both are used to remove columns that belong in their own table.

To complete the normalization we need to look for columns that are not dependent on the primary key of the table.

Students table:

the advisor information is not dependent on the student: if the student leaves the school, the advisor’s name and phone number remain the same.

Courses table:

the same logic applies to the instructor information: the data for the instructor is not dependent on the primary key CourseID, since the instructor is unaffected if the course is dropped from the curriculum.
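The practical payoff of 3NF shows up on updates. The sketch below uses sqlite3 with hypothetical data (the second student, the AdvisorID value and the phone numbers are made up) and counts the rows touched when an advisor’s phone number changes, first in a Students table that still carries AdvisorName and AdvisorPhone, then after the advisor data has moved to its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before 3NF: AdvisorPhone depends on AdvisorName, a nonkey column.
    CREATE TABLE StudentsFlat (
        StudentID    INTEGER PRIMARY KEY,
        StudentName  TEXT,
        AdvisorName  TEXT,
        AdvisorPhone TEXT
    );
    -- After 3NF: advisor data lives in Advisors; Students keeps a foreign key.
    CREATE TABLE Advisors (
        AdvisorID    INTEGER PRIMARY KEY,
        AdvisorName  TEXT,
        AdvisorPhone TEXT
    );
    CREATE TABLE Students (
        StudentID        INTEGER PRIMARY KEY,
        StudentName      TEXT,
        StudentAdvisorID INTEGER REFERENCES Advisors(AdvisorID)
    );
    -- Hypothetical data: two students share one advisor.
    INSERT INTO StudentsFlat VALUES
        (1, 'Dan Quayle', 'George Bush', '555-0100'),
        (2, 'Student Two', 'George Bush', '555-0100');
    INSERT INTO Advisors VALUES (10, 'George Bush', '555-0100');
    INSERT INTO Students VALUES (1, 'Dan Quayle', 10), (2, 'Student Two', 10);
""")

# The advisor's phone number changes: how many rows must each design touch?
flat_rows = conn.execute(
    "UPDATE StudentsFlat SET AdvisorPhone = '555-0199' "
    "WHERE AdvisorName = 'George Bush'"
).rowcount
normalized_rows = conn.execute(
    "UPDATE Advisors SET AdvisorPhone = '555-0199' WHERE AdvisorID = 10"
).rowcount
print(flat_rows, normalized_rows)  # 2 1

# Every student of that advisor sees the new number through the join.
phone = conn.execute("""
    SELECT a.AdvisorPhone
    FROM Students AS s JOIN Advisors AS a ON a.AdvisorID = s.StudentAdvisorID
    WHERE s.StudentID = 2
""").fetchone()[0]
```

The same reasoning applies to moving instructor data out of Courses.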

Students (StudentID, StudentName, StudentPhone, StudentAddress, StudentCity, StudentState, StudentZIP, StudentAdvisorID)

StudentCourses (SCStudentID, SCCourseID)

Courses (CourseID, CourseDescription, CourseInstructorID)

Advisors (AdvisorID, AdvisorName, AdvisorPhone)

Instructors (InstructorID, InstructorName, InstructorPhone)
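The final 3NF design can be written as SQL DDL. The sketch below is one possible rendering in SQLite, not the author’s schema: the column types, the ID values and the use of inner joins are assumptions. The query at the end applies the rule that the original flat view must be reconstructable, rebuilding it from the five tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript("""
    CREATE TABLE Advisors (
        AdvisorID    INTEGER PRIMARY KEY,
        AdvisorName  TEXT NOT NULL,
        AdvisorPhone TEXT
    );
    CREATE TABLE Instructors (
        InstructorID    INTEGER PRIMARY KEY,
        InstructorName  TEXT NOT NULL,
        InstructorPhone TEXT
    );
    CREATE TABLE Students (
        StudentID        INTEGER PRIMARY KEY,
        StudentName      TEXT NOT NULL,
        StudentPhone     TEXT,
        StudentAddress   TEXT,
        StudentCity      TEXT,
        StudentState     TEXT,
        StudentZIP       TEXT,
        StudentAdvisorID INTEGER REFERENCES Advisors(AdvisorID)
    );
    CREATE TABLE Courses (
        CourseID           TEXT PRIMARY KEY,
        CourseDescription  TEXT NOT NULL,
        CourseInstructorID INTEGER REFERENCES Instructors(InstructorID)
    );
    CREATE TABLE StudentCourses (
        SCStudentID INTEGER REFERENCES Students(StudentID),
        SCCourseID  TEXT    REFERENCES Courses(CourseID),
        PRIMARY KEY (SCStudentID, SCCourseID)
    );

    INSERT INTO Advisors (AdvisorID, AdvisorName) VALUES (1, 'Bill Clinton'), (2, 'George Bush');
    INSERT INTO Instructors (InstructorID, InstructorName) VALUES (1, 'Bruce Lee'), (2, 'Joe Killy');
    INSERT INTO Students (StudentID, StudentName, StudentAdvisorID)
        VALUES (1, 'Al Gore', 1), (2, 'Dan Quayle', 2);
    INSERT INTO Courses VALUES ('VB1', 'Intro to Visual Basic', 1),
                               ('DAO1', 'Intro to DAO Programming', 2);
    INSERT INTO StudentCourses VALUES (1, 'VB1'), (2, 'DAO1');
""")

# Rebuild the original flat view: one row per student per course.
flat = conn.execute("""
    SELECT s.StudentName, a.AdvisorName, c.CourseID,
           c.CourseDescription, i.InstructorName
    FROM StudentCourses AS sc
    JOIN Students    AS s ON s.StudentID    = sc.SCStudentID
    JOIN Advisors    AS a ON a.AdvisorID    = s.StudentAdvisorID
    JOIN Courses     AS c ON c.CourseID     = sc.SCCourseID
    JOIN Instructors AS i ON i.InstructorID = c.CourseInstructorID
    ORDER BY s.StudentID
""").fetchall()
for row in flat:
    print(row)
```

Inner joins drop a student with no advisor or a course with no instructor; if StudentAdvisorID or CourseInstructorID may be NULL, LEFT JOIN keeps such rows in the reconstructed view.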
