09.02 normalization example
DBMS
Normalization Examples
Example 1
Album | Tracks | Artist | ArtistCountry
Abbey Road | Here Comes the Sun, Octopus's Garden, Something, etc. | Beatles | UK
Blonde on Blonde | Rainy Day Women, Sad Eyed Lady of the Lowlands, Stuck in Mobile with the Memphis Blues Again | Bob Dylan | US
This table is vulnerable to all three anomalies: insertion, deletion, and update. If ArtistCountry were required, it would be impossible to insert a new album without knowing the artist's country. If you deleted an album, you could accidentally remove all data about a given artist. Updating tracks would be difficult and error-prone because all of an album's tracks are listed together in a single cell.
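The deletion anomaly can be made concrete with a small sketch. It holds the table as a list of Python dictionaries, which is only an illustrative stand-in for a real database table; the helper name known_artists is hypothetical.

```python
# The unnormalized table from Example 1, one dictionary per row.
albums = [
    {"Album": "Abbey Road",
     "Tracks": "Here Comes the Sun, Octopus's Garden, Something",
     "Artist": "Beatles", "ArtistCountry": "UK"},
    {"Album": "Blonde on Blonde",
     "Tracks": "Rainy Day Women, Sad Eyed Lady of the Lowlands, "
               "Stuck in Mobile with the Memphis Blues Again",
     "Artist": "Bob Dylan", "ArtistCountry": "US"},
]

def known_artists(rows):
    """Return the set of artists the table still has data about."""
    return {row["Artist"] for row in rows}

# Deletion anomaly: removing the only Bob Dylan album also removes
# everything the table knew about Bob Dylan, including his country.
remaining = [row for row in albums if row["Album"] != "Blonde on Blonde"]

print(sorted(known_artists(albums)))
print(sorted(known_artists(remaining)))
```

After the delete, Bob Dylan no longer appears anywhere, even though the intent was only to remove one album.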
Converting a spreadsheet into a relational database is a common task for database developers. The task is not as straightforward as it might seem. Although you can often import data from a spreadsheet directly into a database management system, spreadsheets are almost never well designed for relational databases.
First Normal Form
The First Normal Form (1NF) involves getting rid of repeating groups or arrays. Each attribute should contain only a single value of a single type. This means a couple of things. First, all the values under an attribute should be about the same thing. An attribute called “Email,” for instance, should contain emails only, not phone or pager numbers. Second, each value stored under an attribute should be a single value, not an array or list of values. It would be wrong, for example, to store two or three emails for the same person separated by commas.
An entity is in First Normal Form if:
Every attribute represents only one value.
There are no repeating groups or arrays.
Each row is unique.
This Album table does not meet the criteria for First Normal Form. The main problem is in the Tracks column, which contains a list of songs rather than a single value. This makes it very difficult to locate information about any single song. One solution that often occurs to novice database developers is to enumerate a list of columns such as Track1, Track2, Track3, and so on, up to some arbitrary number of tracks. This also violates First Normal Form by creating a repeating group. Say, for argument's sake, you made 13 track columns. What happens to an album with 14 tracks? What if an album has only one or two tracks? Also consider what you would need to do to find any individual track: you would need to query 13 separate columns.
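The problems with the Track1, Track2, ... layout can be sketched in a few lines. The dictionary row and the helper names wide_row and has_track are hypothetical, chosen only to illustrate the repeating group; the limit of 13 comes from the text.

```python
MAX_TRACKS = 13  # the arbitrary column limit used in the text

def wide_row(album, tracks):
    """Build a row with columns Track1..Track13, padding unused ones with None."""
    if len(tracks) > MAX_TRACKS:
        raise ValueError(
            f"{album!r} has {len(tracks)} tracks but only {MAX_TRACKS} columns exist"
        )
    row = {"Album": album}
    for i in range(MAX_TRACKS):
        row[f"Track{i + 1}"] = tracks[i] if i < len(tracks) else None
    return row

def has_track(row, title):
    """Finding one song means checking every TrackN column in turn."""
    return any(row[f"Track{i}"] == title for i in range(1, MAX_TRACKS + 1))

row = wide_row("Abbey Road", ["Here Comes the Sun", "Octopus's Garden", "Something"])
empty_columns = sum(1 for i in range(1, MAX_TRACKS + 1) if row[f"Track{i}"] is None)

print(has_track(row, "Something"))
print(empty_columns)
```

A three-track album leaves ten columns empty, a 14-track album cannot be stored at all, and every lookup has to scan all 13 columns.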
Table in First Normal Form
AlbumTitle | Track | Artist | ArtistCountry
Abbey Road | Here Comes the Sun | Beatles | UK
Abbey Road | Octopus's Garden | Beatles | UK
Abbey Road | Something | Beatles | UK
Blonde on Blonde | Rainy Day Women | Bob Dylan | US
Blonde on Blonde | Sad Eyed Lady of the Lowlands | Bob Dylan | US
Blonde on Blonde | Stuck in Mobile with the Memphis Blues Again | Bob Dylan | US
First Normal Form is not sufficient. Every column contains a single value, and there are no arrays or repeating groups, but there is a great deal of redundancy.
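The step from the original table to First Normal Form can be sketched as follows. This is a minimal illustration that assumes track titles never contain a comma, so that splitting on commas is safe; the function name to_first_normal_form is hypothetical.

```python
# Rows of the original table: (Album, Tracks, Artist, ArtistCountry).
unnormalized = [
    ("Abbey Road",
     "Here Comes the Sun, Octopus's Garden, Something",
     "Beatles", "UK"),
    ("Blonde on Blonde",
     "Rainy Day Women, Sad Eyed Lady of the Lowlands, "
     "Stuck in Mobile with the Memphis Blues Again",
     "Bob Dylan", "US"),
]

def to_first_normal_form(rows):
    """Emit one row per track so every cell holds a single value."""
    result = []
    for album, tracks, artist, country in rows:
        # Assumption: a comma always separates two titles.
        for track in tracks.split(","):
            result.append((album, track.strip(), artist, country))
    return result

first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)

# The redundancy is now visible: the same (Artist, ArtistCountry)
# pair is repeated once per track.
beatles_copies = sum(1 for r in first_nf if (r[2], r[3]) == ("Beatles", "UK"))
print(beatles_copies)
```

Each of the six rows holds exactly one track, but the artist and country are stored three times per album.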
Second Normal Form
In the Album table, there are really at least two large subjects. One is the Album itself. The other is the Track.
The Artist information depends on the Track. (Think about an album with tracks by multiple artists.) To conform to the Second Normal Form, the two functional dependencies (the big themes) must be broken into separate entities.
To relate the Album entity to the Track entity, it is necessary to create a primary key for the Album entity that the Track entity can reference as a foreign key. It is also a good idea to give the Track entity a primary key of its own. Here is what the tables look like now:
AlbumKey | AlbumTitle
ABRD | Abbey Road
BLBL | Blonde on Blonde

TrackKey | Track | AlbumKey | Artist | ArtistCountry
HCTS | Here Comes the Sun | ABRD | Beatles | UK
OPGD | Octopus's Garden | ABRD | Beatles | UK
SMTH | Something | ABRD | Beatles | UK
RDWM | Rainy Day Women | BLBL | Bob Dylan | US
SELL | Sad Eyed Lady of the Lowlands | BLBL | Bob Dylan | US
SMMB | Stuck in Mobile with the Memphis Blues Again | BLBL | Bob Dylan | US
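One way to express these two tables and their key relationship is sketched below with Python's built-in sqlite3 module. The column types, the NOT NULL constraints, and the use of an in-memory SQLite database are assumptions made for the example; the slides do not prescribe a particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (
    AlbumKey   TEXT PRIMARY KEY,
    AlbumTitle TEXT NOT NULL
);
CREATE TABLE Track (
    TrackKey      TEXT PRIMARY KEY,
    Track         TEXT NOT NULL,
    AlbumKey      TEXT NOT NULL REFERENCES Album(AlbumKey),
    Artist        TEXT NOT NULL,
    ArtistCountry TEXT
);
""")

conn.executemany("INSERT INTO Album VALUES (?, ?)", [
    ("ABRD", "Abbey Road"),
    ("BLBL", "Blonde on Blonde"),
])
conn.executemany("INSERT INTO Track VALUES (?, ?, ?, ?, ?)", [
    ("HCTS", "Here Comes the Sun", "ABRD", "Beatles", "UK"),
    ("OPGD", "Octopus's Garden", "ABRD", "Beatles", "UK"),
    ("SMTH", "Something", "ABRD", "Beatles", "UK"),
    ("RDWM", "Rainy Day Women", "BLBL", "Bob Dylan", "US"),
    ("SELL", "Sad Eyed Lady of the Lowlands", "BLBL", "Bob Dylan", "US"),
    ("SMMB", "Stuck in Mobile with the Memphis Blues Again", "BLBL", "Bob Dylan", "US"),
])

# A track that points at a non-existent album key is rejected.
try:
    conn.execute(
        "INSERT INTO Track VALUES ('XXXX', 'Unknown song', 'NOPE', 'Nobody', NULL)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The join puts album titles back next to their tracks.
rows = conn.execute("""
    SELECT a.AlbumTitle, t.Track
    FROM Track AS t
    JOIN Album AS a ON a.AlbumKey = t.AlbumKey
    ORDER BY a.AlbumTitle, t.Track
""").fetchall()
for row in rows:
    print(row)
```

The foreign key only works if the AlbumKey values in Track match the keys in Album exactly, which is why the key spelling has to be identical in both tables.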
Third Normal Form
For an entity to be in Third Normal Form, it first has to be in Second Normal Form. Third Normal Form is about removing “transitive dependencies.” A transitive dependency describes an attribute that depends on another attribute, rather than on the primary key, for its meaning. The idea is that every attribute should directly describe the entity itself. If you have a Customer entity, every attribute should describe the customer. There shouldn’t be any attributes that describe another attribute.
While transitive dependencies may seem trivial, they add to redundancy and therefore open the door to update and other anomalies.
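A short sketch shows how such redundancy leads to an update anomaly. It uses the Bob Dylan rows of the Track table above, held as plain dictionaries for illustration; the change from "US" to "USA" and the helper name countries_per_artist are hypothetical.

```python
# Three rows of the Track table, where ArtistCountry is repeated per track.
tracks = [
    {"TrackKey": "RDWM", "Artist": "Bob Dylan", "ArtistCountry": "US"},
    {"TrackKey": "SELL", "Artist": "Bob Dylan", "ArtistCountry": "US"},
    {"TrackKey": "SMMB", "Artist": "Bob Dylan", "ArtistCountry": "US"},
]

# Someone edits the country on one row and forgets the other two.
tracks[0]["ArtistCountry"] = "USA"

def countries_per_artist(rows):
    """Collect every country value recorded for each artist."""
    result = {}
    for row in rows:
        result.setdefault(row["Artist"], set()).add(row["ArtistCountry"])
    return result

# Any artist with more than one recorded country is inconsistent.
conflicts = {
    artist: countries
    for artist, countries in countries_per_artist(tracks).items()
    if len(countries) > 1
}
print(conflicts)
```

Because the country is stored once per track rather than once per artist, a single missed row leaves the data contradicting itself.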
There is a transitive dependency in the Track table. ArtistCountry doesn’t describe the track; it describes the Artist. The solution, as usual, is to break out a separate table. Artist should be its own entity.
Album
AlbumKey | AlbumTitle
ABRD | Abbey Road
BLBL | Blonde on Blonde

Artist
ArtistKey | ArtistName | ArtistCountry
BTLS | Beatles | UK
BDLN | Bob Dylan | US
Track Table 3NF
TrackKey | Track | AlbumKey | ArtistKey
HCTS | Here Comes the Sun | ABRD | BTLS
OPGD | Octopus's Garden | ABRD | BTLS
SMTH | Something | ABRD | BTLS
RDWM | Rainy Day Women | BLBL | BDLN
SELL | Sad Eyed Lady of the Lowlands | BLBL | BDLN
SMMB | Stuck in Mobile with the Memphis Blues Again | BLBL | BDLN
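The three tables in Third Normal Form can be sketched as a schema using Python's built-in sqlite3 module. As before, the column types, constraints, and in-memory database are assumptions made for the example, and the change of country to "USA" is a hypothetical update used only to show that it now happens in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Album (
    AlbumKey   TEXT PRIMARY KEY,
    AlbumTitle TEXT NOT NULL
);
CREATE TABLE Artist (
    ArtistKey     TEXT PRIMARY KEY,
    ArtistName    TEXT NOT NULL,
    ArtistCountry TEXT
);
CREATE TABLE Track (
    TrackKey  TEXT PRIMARY KEY,
    Track     TEXT NOT NULL,
    AlbumKey  TEXT NOT NULL REFERENCES Album(AlbumKey),
    ArtistKey TEXT NOT NULL REFERENCES Artist(ArtistKey)
);
""")

conn.executemany("INSERT INTO Album VALUES (?, ?)", [
    ("ABRD", "Abbey Road"),
    ("BLBL", "Blonde on Blonde"),
])
conn.executemany("INSERT INTO Artist VALUES (?, ?, ?)", [
    ("BTLS", "Beatles", "UK"),
    ("BDLN", "Bob Dylan", "US"),
])
conn.executemany("INSERT INTO Track VALUES (?, ?, ?, ?)", [
    ("HCTS", "Here Comes the Sun", "ABRD", "BTLS"),
    ("OPGD", "Octopus's Garden", "ABRD", "BTLS"),
    ("SMTH", "Something", "ABRD", "BTLS"),
    ("RDWM", "Rainy Day Women", "BLBL", "BDLN"),
    ("SELL", "Sad Eyed Lady of the Lowlands", "BLBL", "BDLN"),
    ("SMMB", "Stuck in Mobile with the Memphis Blues Again", "BLBL", "BDLN"),
])

# The country is stored once per artist, so one UPDATE changes it everywhere.
conn.execute("UPDATE Artist SET ArtistCountry = 'USA' WHERE ArtistKey = 'BDLN'")

# Joining the three tables rebuilds the First Normal Form view on demand.
rows = conn.execute("""
    SELECT al.AlbumTitle, t.Track, ar.ArtistName, ar.ArtistCountry
    FROM Track AS t
    JOIN Album  AS al ON al.AlbumKey  = t.AlbumKey
    JOIN Artist AS ar ON ar.ArtistKey = t.ArtistKey
    ORDER BY t.TrackKey
""").fetchall()
for row in rows:
    print(row)
```

No information is lost by the decomposition: the join returns the same six rows as the First Normal Form table, and every Bob Dylan row reflects the single updated country value.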