Database Design & Normalization

Uploaded by: gabriel-brooks

Posted on 29-Dec-2015

Page 1

Database Design & Normalization

Page 2

Why?

Why? Why? Why? Why do we need to talk about database design?

Page 3

Let's start with an example.

Say you need a sales report like this:

Cust No.  Name    Address       Cat No.  Description  Unit Price  Date      Qty Sold  Actual Price  Extended Price
131       Jo Blo  13 May St     3A21     T-Shirt      12.49       03/01/98  45        10.00         450.00
179       Yo Yo   271 OK Ave    1B77     Sweats       15.00       01/03/98  12        15.00         180.00
212       Mu Mu   32 Saddle Rd  4X21     Pants        23.47       12/11/98  5         21.00         105.00
. . .

Page 4

What the uninitiated (read “amateur”) database designer tends to do is to build a relational table that mimics this report. That is, it has the same columns as this report. But what would we call this class? The best name would probably be something like “Sales” or “Sales Analysis.”

But . . .

Page 5

The problem is that we have three kinds of data in this report.

We have:

Data that describes a Customer (Cust No./Name/Address)

Data that describes a Product (Cat No./Description/Unit Price)

And data that describes a Sale (Date/Quantity/Actual and Extended Prices)

Compare this situation with all the earlier models we have looked at, and you'll see that Customer, Product, and Sale should each be a separate class . . .
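
A minimal sketch of that three-way split, using Python's built-in sqlite3 module. The table and column names (customer, product, sale, sale_id, and so on) are our own choices, not from the slides, and we assume the extended price is quantity times actual price, which matches every row of the sample report. The report itself then becomes a JOIN query instead of a stored table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_no  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    address  TEXT NOT NULL
);
CREATE TABLE product (
    cat_no      TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
-- One row per sale; customer and product facts are referenced, not copied.
CREATE TABLE sale (
    sale_id      INTEGER PRIMARY KEY,
    cust_no      INTEGER NOT NULL REFERENCES customer(cust_no),
    cat_no       TEXT    NOT NULL REFERENCES product(cat_no),
    date_sold    TEXT    NOT NULL,
    qty          INTEGER NOT NULL,
    actual_price REAL    NOT NULL
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (131, "Jo Blo", "13 May St"),
    (179, "Yo Yo", "271 OK Ave"),
    (212, "Mu Mu", "32 Saddle Rd"),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("3A21", "T-Shirt", 12.49),
    ("1B77", "Sweats", 15.00),
    ("4X21", "Pants", 23.47),
])
conn.executemany(
    "INSERT INTO sale (cust_no, cat_no, date_sold, qty, actual_price) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (131, "3A21", "03/01/98", 45, 10.00),
        (179, "1B77", "01/03/98", 12, 15.00),
        (212, "4X21", "12/11/98", 5, 21.00),
    ],
)

# The report is now a query; the extended price is computed, not stored.
REPORT_SQL = """
SELECT c.cust_no, c.name, c.address, p.cat_no, p.description, p.unit_price,
       s.date_sold, s.qty, s.actual_price, s.qty * s.actual_price AS extended_price
FROM sale s
JOIN customer c ON c.cust_no = s.cust_no
JOIN product  p ON p.cat_no  = s.cat_no
ORDER BY c.cust_no
"""
for row in conn.execute(REPORT_SQL):
    print(row)
```

Each customer and product fact is stored once, so changing one touches a single row. Computing the extended price in the query rather than storing it is one design choice among several.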

Page 6

The maintenance horror of the poorly designed database

A customer can buy several kinds of product, again and again. What if he changes his name? What if the price of a product is increased or decreased? What if a customer changes his address?
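
To make the address question concrete, here is a small sketch of an update anomaly (Python's sqlite3; the extra purchases by customer 131 and the new address are hypothetical). In the report-shaped table the address is copied onto every sale row, so an update that misses one row leaves the customer with two addresses; in a separate customer table the same change touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Amateur design: customer facts are repeated on every sale row.
conn.execute("""CREATE TABLE sales_report (
    cust_no INTEGER, name TEXT, address TEXT, cat_no TEXT, qty INTEGER)""")
# Hypothetical repeat purchases by customer 131.
conn.executemany("INSERT INTO sales_report VALUES (?, ?, ?, ?, ?)", [
    (131, "Jo Blo", "13 May St", "3A21", 45),
    (131, "Jo Blo", "13 May St", "1B77", 2),
    (131, "Jo Blo", "13 May St", "4X21", 1),
])

# The customer moves, but the update only reaches one of the copied rows.
conn.execute("UPDATE sales_report SET address = '99 New Rd' "
             "WHERE cust_no = 131 AND cat_no = '4X21'")
flat_addresses = {a for (a,) in conn.execute(
    "SELECT DISTINCT address FROM sales_report WHERE cust_no = 131")}
print(flat_addresses)  # two different addresses for one customer

# Normalized design: the address lives in exactly one row.
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customer VALUES (131, 'Jo Blo', '13 May St')")
cur = conn.execute("UPDATE customer SET address = '99 New Rd' WHERE cust_no = 131")
print(cur.rowcount)  # 1
```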

Page 7

What is the problem with the amateur's database design?

This structure does not allow our database to answer every query that could possibly be dreamed up against that data.

Some queries can be done, but only very inefficiently.
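
One example of a question the report-shaped table handles badly is "which catalog items have never been sold?". A product only appears in that table once someone buys it, so the answer is simply not stored. The sketch below (Python's sqlite3; the unsold hat, 9Z99, is a hypothetical product) shows that with a separate product table the question becomes one LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (cat_no TEXT PRIMARY KEY, description TEXT, unit_price REAL);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   cat_no TEXT REFERENCES product(cat_no), qty INTEGER);
INSERT INTO product VALUES ('3A21', 'T-Shirt', 12.49),
                           ('1B77', 'Sweats', 15.00),
                           ('4X21', 'Pants', 23.47),
                           ('9Z99', 'Hat', 8.00);   -- hypothetical, never sold
INSERT INTO sale (cat_no, qty) VALUES ('3A21', 45), ('1B77', 12), ('4X21', 5);
""")

# Products with no matching sale row come back with NULLs on the sale side.
unsold = [row[0] for row in conn.execute("""
    SELECT p.cat_no
    FROM product p LEFT JOIN sale s ON s.cat_no = p.cat_no
    WHERE s.sale_id IS NULL""")]
print(unsold)  # ['9Z99']
```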

Page 8

The “un-normalized” structure that mimicked the report will run into problems down the line, a few months or years later, when someone attempts to answer queries that the database designer did not foresee. I refer to these as:

“That most dreaded of all database phenomena, Unanticipated Queries”

Page 9

Normalization

The purpose of normalization is to make sure that each database table carries only the attributes that actually describe what the table is about.

Page 10

Normalization

Definition: Normalization is the process of structuring a relational database schema such that most ambiguity is removed. The stages of normalization are referred to as normal forms and progress from the least restrictive (First Normal Form) through the most restrictive (Fifth Normal Form). Generally, most database designers do not attempt to implement anything higher than Third Normal Form or Boyce-Codd Normal Form.

Page 11

A simpler explanation of normalization

There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

Page 12

Normal forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF).

In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed here.

Page 13

Normal form hierarchy

First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.

Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.

Finally, fourth normal form (4NF) has one additional requirement:
- Meet all the requirements of the third normal form.
- A relation is in 4NF if it has no multi-valued dependencies.

Page 14

1st NF

Eliminate duplicative columns from the same table.

Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Page 15

A classic example

A table within a human resources database that stores the manager-subordinate relationship.

For the purposes of our example, we'll impose the business rule that each manager may have one or more subordinates, while each subordinate may have only one manager.

Page 16

An intuitive table

Manager  Subordinate1  Subordinate2  Subordinate3  Subordinate4
Bob      Jim           Mary          Beth
Mary     Mike          Jason         Carol         Mark
Jim      Alan

Page 17

Why is it not even in 1st NF?

Recall the first rule imposed by 1NF: eliminate duplicative columns from the same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative.

Jim has only one subordinate, so the Subordinate2-Subordinate4 columns are simply wasted storage space.

Furthermore, Mary already has 4 subordinates. What happens if she takes on another employee? The whole table structure would require modification.

Page 18

A second bright idea

Let's try something like this:

Manager  Subordinates
Bob      Jim, Mary, Beth
Mary     Mike, Jason, Carol, Mark
Jim      Alan

This solution is closer, but it also falls short of the mark. The Subordinates column is still duplicative and non-atomic. What happens when we need to add or remove a subordinate? We need to read and write the entire list of subordinates stored for that manager. That's not a big deal in this situation, but what if one manager had one hundred employees? Also, it complicates the process of selecting data from the database in future queries.
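
A rough sketch of why the comma-separated column is awkward, using Python's sqlite3. The table name manager_list, the helpers manager_of and add_subordinate, and the new hire "Sue" are all hypothetical. SQL cannot see the individual names inside the string, so lookups happen in application code, and adding a single subordinate rewrites the whole list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The "second bright idea": one comma-separated string per manager.
conn.execute("CREATE TABLE manager_list (manager TEXT PRIMARY KEY, subordinates TEXT)")
conn.executemany("INSERT INTO manager_list VALUES (?, ?)", [
    ("Bob", "Jim, Mary, Beth"),
    ("Mary", "Mike, Jason, Carol, Mark"),
    ("Jim", "Alan"),
])

def manager_of(name):
    # SQL cannot look inside the string, so every row is fetched and split here.
    for manager, subs in conn.execute("SELECT manager, subordinates FROM manager_list"):
        if name in (s.strip() for s in subs.split(",")):
            return manager
    return None

def add_subordinate(manager, name):
    # Adding one person means reading the whole list, editing it, and writing it back.
    (subs,) = conn.execute(
        "SELECT subordinates FROM manager_list WHERE manager = ?", (manager,)
    ).fetchone()
    conn.execute(
        "UPDATE manager_list SET subordinates = ? WHERE manager = ?",
        (subs + ", " + name, manager),
    )

add_subordinate("Jim", "Sue")  # "Sue" is a hypothetical new hire
print(manager_of("Jason"), manager_of("Sue"))  # Mary Jim
```

With one row per manager-subordinate pair, the same two operations would be a plain SELECT ... WHERE subordinate = ? and a single INSERT.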

Page 19

Here is a table that satisfies the first rule of 1NF:

Manager  Subordinate
Bob      Jim
Bob      Mary
Bob      Beth
Mary     Mike
Mary     Jason
Mary     Carol
Mary     Mark
Jim      Alan

Page 20

Not finished yet

Now, what about the second rule: identify each row with a unique column or set of columns (the primary key)? You might take a look at the table above and suggest the use of the Subordinate column as a primary key. In fact, the Subordinate column is a good candidate for a primary key, because our business rules specified that each subordinate may have only one manager.

However, the data that we have chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee named Jim? How do we store his manager-subordinate relationship in the database?
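
The problem is easy to reproduce. In this sketch (Python's sqlite3; the table name reports_to and the second Jim, who reports to Mary, are hypothetical), making the subordinate's name the primary key does enforce the one-manager rule, but it also refuses to store a second, different employee who happens to share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subordinate name as primary key enforces "one manager per subordinate"...
conn.execute("CREATE TABLE reports_to (manager TEXT, subordinate TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO reports_to VALUES (?, ?)", [
    ("Bob", "Jim"), ("Bob", "Mary"), ("Bob", "Beth"),
    ("Mary", "Mike"), ("Mary", "Jason"), ("Mary", "Carol"), ("Mary", "Mark"),
    ("Jim", "Alan"),
])

# ...but a second, different employee who is also named Jim is rejected.
try:
    conn.execute("INSERT INTO reports_to VALUES ('Mary', 'Jim')")
    second_jim_stored = True
except sqlite3.IntegrityError as exc:
    second_jim_stored = False
    print("rejected:", exc)
```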

Page 21

Finally, the 1st NF

From Mike Chapple, Your Guide to Databases.

It's best to use a truly unique identifier (like an employee ID or SSN) as a primary key. Our final table would look like this:

Manager  Subordinate
182      143
182      201
182      123
201      156
201      041
201      187
201      196
143      202
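
Here is one way the ID-based design might be declared, again as a sqlite3 sketch. The slide shows only the IDs; the employee table pairing each ID with a name is our assumption, inferred by matching the ID table against the earlier name table (182 = Bob, 201 = Mary, 143 = Jim, and so on), and the new hire with ID 250 is hypothetical. IDs are stored as text so that 041 keeps its leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
conn.executescript("""
CREATE TABLE employee (
    emp_id TEXT PRIMARY KEY,
    name   TEXT NOT NULL          -- names need not be unique
);
CREATE TABLE reports_to (
    manager     TEXT NOT NULL REFERENCES employee(emp_id),
    subordinate TEXT PRIMARY KEY REFERENCES employee(emp_id)  -- one manager each
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("182", "Bob"), ("143", "Jim"), ("201", "Mary"), ("123", "Beth"),
    ("156", "Mike"), ("041", "Jason"), ("187", "Carol"), ("196", "Mark"),
    ("202", "Alan"),
])
conn.executemany("INSERT INTO reports_to VALUES (?, ?)", [
    ("182", "143"), ("182", "201"), ("182", "123"),
    ("201", "156"), ("201", "041"), ("201", "187"), ("201", "196"),
    ("143", "202"),
])

# A second Jim is no problem: he gets his own ID (250 is hypothetical).
conn.execute("INSERT INTO employee VALUES ('250', 'Jim')")
conn.execute("INSERT INTO reports_to VALUES ('201', '250')")

marys_team = [name for (name,) in conn.execute("""
    SELECT e.name
    FROM reports_to r JOIN employee e ON e.emp_id = r.subordinate
    WHERE r.manager = '201'
    ORDER BY e.name""")]
print(marys_team)  # ['Carol', 'Jason', 'Jim', 'Mark', 'Mike']
```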

Page 22

Towards 2NF

Definition: In order to be in Second Normal Form, a relation must first fulfill the requirements to be in First Normal Form. Additionally, each nonkey attribute in the relation must be functionally dependent upon the primary key.

Page 23

An example

Order #  Customer         Contact Person   Total
1        Acme Widgets     John Doe         $134.23
2        ABC Corporation  Fred Flintstone  $521.24
3        Acme Widgets     John Doe         $1042.42
4        Acme Widgets     John Doe         $928.53

The relation is in First Normal Form, but it breaks the second-normal-form rule listed earlier:

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
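
The idea of data that "applies to multiple rows" can be made precise with functional dependencies. Below is a small hypothetical helper, holds, that tests whether a dependency X → Y is consistent with a set of rows (no two rows agree on X but differ on Y), run on the order table above. Keep in mind that sample data can only disprove a dependency; whether Customer → Contact Person really holds is a business rule, not something the data proves.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs is consistent with rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The dependency fails as soon as two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

orders = [
    {"order": 1, "customer": "Acme Widgets", "contact": "John Doe", "total": 134.23},
    {"order": 2, "customer": "ABC Corporation", "contact": "Fred Flintstone", "total": 521.24},
    {"order": 3, "customer": "Acme Widgets", "contact": "John Doe", "total": 1042.42},
    {"order": 4, "customer": "Acme Widgets", "contact": "John Doe", "total": 928.53},
]

print(holds(orders, ("order",), ("customer", "contact", "total")))  # True: Order # is the key
print(holds(orders, ("customer",), ("contact",)))                   # True: contact follows customer
print(holds(orders, ("customer",), ("total",)))                     # False
```

The second check is the redundancy the slide points at: John Doe is repeated on every Acme Widgets order because the contact depends on the customer, not on the order.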

Page 24

Two tables to satisfy 2NF

Customer         Contact Person
Acme Widgets     John Doe
ABC Corporation  Fred Flintstone

Order #  Customer         Total
1        Acme Widgets     $134.23
2        ABC Corporation  $521.24
3        Acme Widgets     $1042.42
4        Acme Widgets     $928.53

Page 25

Comments

The creation of two separate tables eliminates the dependency problem experienced in the previous case.

In the first table, Contact Person is dependent upon the primary key, the customer name. The second table only includes the information unique to each order.

Someone interested in the contact person for each order could obtain this information by performing a JOIN operation.
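
A sketch of the two 2NF tables and that JOIN in SQLite, via Python's sqlite3. The column names are ours, the order table is named orders because ORDER is an SQL keyword, and totals are stored as REAL only to keep the example short; a production schema would more likely use an exact decimal or integer-cents type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer       TEXT PRIMARY KEY,
    contact_person TEXT NOT NULL
);
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    customer TEXT NOT NULL REFERENCES customer(customer),
    total    REAL NOT NULL
);
INSERT INTO customer VALUES ('Acme Widgets', 'John Doe'),
                            ('ABC Corporation', 'Fred Flintstone');
INSERT INTO orders VALUES (1, 'Acme Widgets', 134.23),
                          (2, 'ABC Corporation', 521.24),
                          (3, 'Acme Widgets', 1042.42),
                          (4, 'Acme Widgets', 928.53);
""")

# The JOIN rebuilds the original four-column view on demand.
joined = conn.execute("""
    SELECT o.order_no, o.customer, c.contact_person, o.total
    FROM orders o JOIN customer c ON c.customer = o.customer
    ORDER BY o.order_no""").fetchall()
for row in joined:
    print(row)
```

If Acme Widgets gets a new contact person, only one row in customer changes, and every order picks up the new name through the JOIN.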

Page 26

3rd NF

Definition: In order to be in Third Normal Form, a relation must first fulfill the requirements to be in Second Normal Form. Additionally, all attributes that are not directly dependent upon the primary key (for example, attributes that depend on another non-key attribute) must be eliminated.

Page 27

An example

Company          City      State  ZIP
Acme Widgets     New York  NY     10169
ABC Corporation  Miami     FL     33196
XYZ, Inc.        Columbia  MD     21046

In this example, the city and state are dependent upon the ZIP code. To place this table in 3NF, two separate tables would be created: one containing the company name and ZIP code, and the other containing city, state, and ZIP code pairings.
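
One way to check that the split loses nothing: project the table onto the two column sets and join them back on ZIP. In this plain-Python sketch (the helper name project is hypothetical, and relations are modeled as sets of tuples), the join reproduces the original rows exactly. That is what we expect, because ZIP, the column the two tables share, determines city and state in the second table.

```python
def project(rows, cols):
    """Projection: keep only cols, dropping duplicate rows (relations are sets)."""
    return {tuple(r[c] for c in cols) for r in rows}

companies = [
    {"company": "Acme Widgets", "city": "New York", "state": "NY", "zip": "10169"},
    {"company": "ABC Corporation", "city": "Miami", "state": "FL", "zip": "33196"},
    {"company": "XYZ, Inc.", "city": "Columbia", "state": "MD", "zip": "21046"},
]

company_zip = project(companies, ("company", "zip"))
zip_city_state = project(companies, ("zip", "city", "state"))

# Natural join on zip puts the two tables back together.
rejoined = {
    (company, city, state, z)
    for (company, z) in company_zip
    for (z2, city, state) in zip_city_state
    if z == z2
}
original = project(companies, ("company", "city", "state", "zip"))
print(rejoined == original)  # True: nothing was lost by splitting
```

ZIP codes are kept as strings so that codes with leading zeros survive unchanged.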

Page 28

To go or not to go higher?

This may seem overly complex for daily applications, and indeed it may be. Database designers should always keep in mind the tradeoffs between higher-level normal forms and the resource issues that complexity creates.

Page 29

An exercise (20 points)

Suppose you are responsible for analyzing a system whose data contains the following attributes:

S#: supplier number
SNAME: supplier name
CITY1: the city of the supplier
P#: part number
PNAME: part name
COLOR: part color
WEIGHT: part weight
CITY2: the city where the parts are stored
QTY: the quantity of the parts

In your analysis, you found that a part can be supplied by several suppliers. Please determine how many tables should be used and what the attributes of each table are.