
1/22/2009 1

Study the methods of first, second, third, Boyce-Codd, fourth and fifth normal forms for relational database design, in order to eliminate data redundancy and update anomalies.

Lecture 3 on Data Normalization


Normalization Theory

Refine the database design to eliminate anomalies (irregularities) that arise when manipulating the database


1NF, 2NF and 3NF

• Built around the concept of normal forms
– A relation in first normal form contains atomic values only
– All normalized relations are in 1NF
– 2NF relations are a subset of 1NF relations, 3NF relations are a subset of 2NF relations, and so on
– 3NF is more desirable than 2NF, and 2NF is more desirable than 1NF


BCNF, 4NF and 5NF (PJNF)

• Boyce-Codd Normal Form
– A stronger form of 3NF
– Every BCNF relation is also in 3NF, but some 3NF relations are not in BCNF

• 4NF and 5NF
– Defined more recently
– Deal with multi-valued dependency (MVD) and join dependency (JD)


Relationship between Normal Forms

Universe of relations ⊃ 1NF relations ⊃ 2NF relations ⊃ 3NF relations ⊃ BCNF relations ⊃ 4NF relations ⊃ 5NF/PJNF relations


First Normal Form

• A relation is in 1NF if each attribute contains only one value (not a set of values)

• The primary key (PK) cannot be null


First Normal Form

Relation STUDENT-A

S#    S-Name    Enrollments
S1    Brown     {C1 Math, C2 Chem, C3 Phys}
S2    Smith     {C2 Chem, C3 Phys, C4 Math}
S3    Brown     {C2 Chem, C3 Phys}

Is this relation in 1NF?


First Normal Form

Relation STUDENT-A

S#    S-Name    Enrollments
S1    Brown     {C1 Math, C2 Chem, C3 Phys}
S2    Smith     {C2 Chem, C3 Phys, C4 Math}
S3    Brown     {C2 Chem, C3 Phys}

• No! Elements in the domain Enrollments are not atomic

• Enrollments could be split into two domains: C# and C-Name


First Normal Form

• Enrollments is split into C# and C-Name

• Use S# and C# as a compound PK

• A student may attend several courses and a course may have several students

• So S# and C# have an m:n relationship

Relation STUDENT-B

S#    S-Name    C#    C-Name
S1    Brown     C1    Math
S1    Brown     C2    Chem
S1    Brown     C3    Phys
S2    Smith     C2    Chem
S2    Smith     C3    Phys
S2    Smith     C4    Math
S3    Brown     C2    Chem
S3    Brown     C3    Phys
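The step from STUDENT-A to STUDENT-B can be sketched in Python. This is a minimal illustration, assuming rows are represented as dicts and the non-atomic Enrollments value as a list of (C#, C-Name) pairs; the function name `to_1nf` is hypothetical.

```python
# STUDENT-A: Enrollments holds a list of (C#, C-Name) pairs per student,
# so its values are not atomic and the relation is not in 1NF.
student_a = [
    {"S#": "S1", "S-Name": "Brown",
     "Enrollments": [("C1", "Math"), ("C2", "Chem"), ("C3", "Phys")]},
    {"S#": "S2", "S-Name": "Smith",
     "Enrollments": [("C2", "Chem"), ("C3", "Phys"), ("C4", "Math")]},
    {"S#": "S3", "S-Name": "Brown",
     "Enrollments": [("C2", "Chem"), ("C3", "Phys")]},
]


def to_1nf(rows):
    """Hypothetical helper: emit one flat tuple per (student, course) pair."""
    flat = []
    for row in rows:
        for c_no, c_name in row["Enrollments"]:
            flat.append({"S#": row["S#"], "S-Name": row["S-Name"],
                         "C#": c_no, "C-Name": c_name})
    return flat


student_b = to_1nf(student_a)
for t in student_b:
    print(t["S#"], t["S-Name"], t["C#"], t["C-Name"])
```

Every value in `student_b` is atomic, and (S#, C#) is unique across its eight rows, which is why that pair can serve as the compound PK.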


Functional Dependency (FD)

• Attribute Y of relation R is functionally dependent on attribute X of R if and only if each value of X is associated with exactly one value of Y

• Denoted by X → Y

• In the relation STUDENT-B:
– S# → S-Name
– C# → C-Name
– S#, C# → S-Name, C-Name
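The definition can be tested against a concrete relation instance. The sketch below (hypothetical function `fd_holds`, rows as Python dicts) checks that each X-value is associated with exactly one Y-value. Note that a single instance can only refute an FD; whether X → Y actually holds is a rule about all legal states of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in this instance, each combination of lhs values is
    associated with exactly one combination of rhs values."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        # setdefault stores the first Y seen for this X and returns it
        if seen.setdefault(x, y) != y:
            return False
    return True


cols = ("S#", "S-Name", "C#", "C-Name")
student_b = [dict(zip(cols, t)) for t in [
    ("S1", "Brown", "C1", "Math"), ("S1", "Brown", "C2", "Chem"),
    ("S1", "Brown", "C3", "Phys"), ("S2", "Smith", "C2", "Chem"),
    ("S2", "Smith", "C3", "Phys"), ("S2", "Smith", "C4", "Math"),
    ("S3", "Brown", "C2", "Chem"), ("S3", "Brown", "C3", "Phys"),
]]

print(fd_holds(student_b, ["S#"], ["S-Name"]))  # True
print(fd_holds(student_b, ["C#"], ["C-Name"]))  # True
print(fd_holds(student_b, ["S-Name"], ["S#"]))  # False: two students named Brown
```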


Anomalies using 1NF

• 1NF relations require less complicated application logic to manipulate than unnormalized relations

• Anomalies in insert:
– Since the PK is composed of C# and S#, details of both the student and the course must be known before an entry can be inserted
– E.g., a new course cannot be added until at least one student is enrolled in it


Anomalies using 1NF

• Anomalies in delete:
– If all students attending a particular course are deleted, the course will no longer be found in the database

• Anomalies in update:
– Redundancy of S-Name and C-Name
– Increases storage space and the effort needed to modify a data item
– If a course is modified, all tuples containing that course must be updated


Second Normal Form

• A relation is in 2NF if it is in 1NF and every non-PK attribute is fully functionally dependent on the PK

• In the relation STUDENT-B
– PK: C#, S#
– Non-PK attributes: C-Name, S-Name
– C#, S# → S-Name
– S# → S-Name
– Since S-Name depends on only part of the PK, relation STUDENT-B is not in 2NF (likewise, C# → C-Name)
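Partial dependencies can be searched for mechanically by testing each proper subset of the PK against each non-PK attribute. The sketch below is instance-based (hypothetical helpers `fd_holds` and `partial_dependencies`, rows as dicts), so it only flags candidates that the real business rules must confirm.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if each lhs value combination maps to exactly one rhs combination."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows, pk, non_pk):
    """List (subset, attribute) pairs where a non-PK attribute appears to
    depend on a proper, non-empty subset of the PK in this instance."""
    found = []
    for size in range(1, len(pk)):
        for subset in combinations(pk, size):
            for attr in non_pk:
                if fd_holds(rows, list(subset), [attr]):
                    found.append((subset, attr))
    return found


cols = ("S#", "S-Name", "C#", "C-Name")
student_b = [dict(zip(cols, t)) for t in [
    ("S1", "Brown", "C1", "Math"), ("S1", "Brown", "C2", "Chem"),
    ("S1", "Brown", "C3", "Phys"), ("S2", "Smith", "C2", "Chem"),
    ("S2", "Smith", "C3", "Phys"), ("S2", "Smith", "C4", "Math"),
    ("S3", "Brown", "C2", "Chem"), ("S3", "Brown", "C3", "Phys"),
]]

print(partial_dependencies(student_b, ["S#", "C#"], ["S-Name", "C-Name"]))
# [(('S#',), 'S-Name'), (('C#',), 'C-Name')]
```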


Second Normal Form

• Relations STUDENT, COURSE and SC are all in 2NF, as none of them has a partial dependency

• The original information can be reconstructed by a natural join operation

Relation STUDENT

S#    S-Name
S1    Brown
S2    Smith
S3    Brown

Relation COURSE

C#    C-Name
C1    Math
C2    Chem
C3    Phys
C4    Math

Relation SC

S#    C#
S1    C1
S1    C2
S1    C3
S2    C2
S2    C3
S2    C4
S3    C2
S3    C3
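The decomposition and its reconstruction can be checked with two small relational operators. This is a sketch, assuming relations are lists of dicts; `project`, `natural_join` and `as_set` are hypothetical helpers, not a library API.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


cols = ("S#", "S-Name", "C#", "C-Name")
student_b = [dict(zip(cols, t)) for t in [
    ("S1", "Brown", "C1", "Math"), ("S1", "Brown", "C2", "Chem"),
    ("S1", "Brown", "C3", "Phys"), ("S2", "Smith", "C2", "Chem"),
    ("S2", "Smith", "C3", "Phys"), ("S2", "Smith", "C4", "Math"),
    ("S3", "Brown", "C2", "Chem"), ("S3", "Brown", "C3", "Phys"),
]]

student = project(student_b, ["S#", "S-Name"])
course = project(student_b, ["C#", "C-Name"])
sc = project(student_b, ["S#", "C#"])
print(len(student), len(course), len(sc))  # 3 4 8

rebuilt = natural_join(natural_join(sc, student), course)
print(as_set(rebuilt) == as_set(student_b))  # True
```

Renaming a course now touches a single COURSE tuple instead of every enrollment row that mentions it.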


Anomalies in 2NF

• Suppose we have a relation MANUFACTURE relating PRODUCT (P#), MACHINE (M#) and EMPLOYEE (E#), with the functional dependencies:
– P# → M#
– P# → E#
– M# → E#

• The tuple (P1, M1, E1) means that product P1 is manufactured on machine M1, which is operated by employee E1


Anomalies in 2NF

• Anomalies in insert:
– It is not possible to store the fact that a machine is operated by a particular employee without knowing at least one product produced on that machine

• Anomalies in delete:
– If an employee is fired, the facts about which machine he operated and which products that machine produced are also lost


Anomalies in 2NF

• Anomalies in update:
– If a different employee is assigned to operate a machine, every tuple for a product made on that machine has to be updated


Third Normal Form

• A relation is in 3NF if it is in 2NF and no non-PK attribute is transitively dependent on the PK

• In the MANUFACTURE relation:
– P# → M# and M# → E# imply P# → E#
– So P# → E# is a transitive dependency
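The transitive dependency can be derived with the standard attribute-closure algorithm, which computes X+ (every attribute determined by X) under a set of FDs. A minimal sketch, with FDs written as (lhs, rhs) pairs of Python sets; `closure` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return X+, every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already in X+, add its dependents
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two direct dependencies of MANUFACTURE; P# -> E# is not listed
fds = [({"P#"}, {"M#"}), ({"M#"}, {"E#"})]

print(sorted(closure({"P#"}, fds)))  # ['E#', 'M#', 'P#']: so P# -> E# follows
print(sorted(closure({"M#"}, fds)))  # ['E#', 'M#']: M# is not a key
```

Because E# is reached from the PK only through the non-key attribute M# (whose closure does not contain P#), P# → E# is transitive.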


Third Normal Form

Relation MANUFACTURE

P#    M#    E#
P1    M1    E1
P2    M2    E3
P3    M1    E1
P4    M1    E1
P5    M3    E2
P6    M4    E1

Relation R1

P#    M#
P1    M1
P2    M2
P3    M1
P4    M1
P5    M3
P6    M4

Relation R2

M#    E#
M1    E1
M2    E3
M3    E2
M4    E1

• No loss of information

• Insert, delete and update anomalies are eliminated
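Both claims can be checked directly: R1 ⋈ R2 reproduces MANUFACTURE, and giving a machine a new operator touches fewer tuples after the split. A sketch using hypothetical `project` / `natural_join` / `as_set` helpers on lists of dicts; the reassignment of machine M1 is an illustrative scenario.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


cols = ("P#", "M#", "E#")
manufacture = [dict(zip(cols, t)) for t in [
    ("P1", "M1", "E1"), ("P2", "M2", "E3"), ("P3", "M1", "E1"),
    ("P4", "M1", "E1"), ("P5", "M3", "E2"), ("P6", "M4", "E1"),
]]

r1 = project(manufacture, ["P#", "M#"])
r2 = project(manufacture, ["M#", "E#"])
print(len(r1), len(r2))  # 6 4
print(as_set(natural_join(r1, r2)) == as_set(manufacture))  # True

# Giving machine M1 a new operator: tuples that would have to change
print(sum(t["M#"] == "M1" for t in manufacture))  # 3 in MANUFACTURE
print(sum(t["M#"] == "M1" for t in r2))  # 1 in R2
```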


Boyce/Codd Normal Form

• A relation is in BCNF if and only if every determinant is a candidate key

• A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent
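A common way to apply this definition is to check, for each given nontrivial FD X → Y, that X is a superkey, i.e. that its attribute closure covers every attribute of the relation. The sketch below runs that test on the dependencies of the SJT example that follows; `closure` and `bcnf_violations` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Return X+, every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD such as {S, J} -> {J}: always allowed
        if closure(lhs, fds) != schema:
            bad.append((lhs, rhs))
    return bad


schema = {"S", "J", "T"}
fds = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]
print(bcnf_violations(schema, fds))  # [({'T'}, {'J'})]
```

Only T → J is reported, because T is a determinant but not a candidate key. For a relation produced by decomposition, the FDs that apply to it must first be projected from the original set.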


Boyce/Codd Normal Form

• There is a relation SJT with attributes S (student), J (subject) and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher.

Relation SJT

S        J          T
Smith    Math       Prof. White
Smith    Physics    Prof. Green
Jones    Math       Prof. White
Jones    Physics    Prof. Brown

1. For each subject (J), each student (S) of that subject is taught by only one teacher (T): FD: S, J → T

2. Each teacher (T) teaches only one subject (J): FD: T → J

3. Each subject (J) is taught by several teachers, so J does not functionally determine T


Boyce/Codd Normal Form

• There are two determinants among the functional dependencies: (S, J) and T

• Anomalies in delete:
– If the fact that Jones studies physics is deleted, the fact that Professor Brown teaches physics is also lost. This is because T is a determinant but not a candidate key


Boyce/Codd Normal Form

Relation ST

S        T
Smith    Prof. White
Smith    Prof. Green
Jones    Prof. White
Jones    Prof. Brown

Relation TJ

T              J
Prof. White    Math
Prof. Green    Physics
Prof. Brown    Physics

Relations ST (S, T) and TJ (T, J) are in BCNF because all determinants are candidate keys.
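Projecting SJT onto (S, T) and (T, J) and joining the projections back over T can be checked with a short sketch; `project`, `natural_join` and `as_set` are hypothetical helpers on lists of dicts.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


sjt = [dict(zip(("S", "J", "T"), t)) for t in [
    ("Smith", "Math", "Prof. White"), ("Smith", "Physics", "Prof. Green"),
    ("Jones", "Math", "Prof. White"), ("Jones", "Physics", "Prof. Brown"),
]]

st = project(sjt, ["S", "T"])
tj = project(sjt, ["T", "J"])
print(len(st), len(tj))  # 4 3
print(as_set(natural_join(st, tj)) == as_set(sjt))  # True: no tuples lost or added
```

The fact that Professor Brown teaches physics now lives in TJ, so removing Jones's enrollment no longer loses it. The trade-off is that the FD S, J → T can no longer be enforced within a single relation.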


Multi-valued Dependency

• Given a relation R with attributes A, B and C, the multi-valued dependency R.A →→ R.B holds if and only if the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value
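An equivalent tuple-level formulation of the same definition, writing \twoheadrightarrow for →→ and letting C stand for all attributes of R other than A and B:

```latex
R.A \twoheadrightarrow R.B \iff
\forall t_1, t_2 \in R:\; t_1[A] = t_2[A] \implies
\exists\, t_3 \in R:\; t_3[A] = t_1[A] \,\wedge\, t_3[B] = t_1[B] \,\wedge\, t_3[C] = t_2[C]
```

Swapping the roles of t_1 and t_2 yields a tuple with t_2's B-value and t_1's C-value, so every B-value for a given A-value is paired with every C-value for it.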


Fourth Normal Form

• A relation R is in 4NF if and only if, whenever there exists a multi-valued dependency (MVD) in R, say A →→ B, all attributes of R are also functionally dependent on A, i.e. A → X for every attribute X of R


Fourth Normal Form

Relation CTX (not in 4NF)

Course     Teacher        Text
Physics    Prof. Green    Basic Mechanics
Physics    Prof. Green    Principles of Optics
Physics    Prof. Brown    Basic Mechanics
Physics    Prof. Brown    Principles of Optics
Physics    Prof. Black    Basic Mechanics
Physics    Prof. Black    Principles of Optics
Math       Prof. White    Modern Algebra
Math       Prof. White    Projective Geometry


Fourth Normal Form

• A tuple (C, T, X) appears in CTX if and only if course C can be taught by teacher T and uses text X as a reference. For a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the constraint: if tuples (C, T1, X1) and (C, T2, X2) both appear, then tuples (C, T1, X2) and (C, T2, X1) both appear as well


Fourth Normal Form

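• CTX contains redundancy

• CTX is in BCNF, as it has no functional dependencies other than trivial ones (its only candidate key is the combination of all three attributes)

• But CTX is not in 4NF, as it involves an MVD (Course →→ Teacher, and likewise Course →→ Text) that is not an FD at all, let alone an FD in which the determinant is a candidate key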
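Whether CTX satisfies Course →→ Teacher can be checked on the instance: within each course, the (teacher, text) pairs must be the full cross product of that course's teachers and texts. A sketch with hypothetical helpers `mvd_holds` and `fd_holds` on lists of dicts; like any instance check, it can only refute a dependency, not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """True if each lhs value combination maps to exactly one rhs combination."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def mvd_holds(rows, a, b):
    """Check A ->> B: for each A-value, the (B, C) pairs present must be every
    combination of the B-values and C-values seen with it (C = the rest)."""
    c = [x for x in rows[0] if x not in a and x not in b]
    groups = {}
    for r in rows:
        key = tuple(r[x] for x in a)
        pair = (tuple(r[x] for x in b), tuple(r[x] for x in c))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        bs = {p[0] for p in pairs}
        cs = {p[1] for p in pairs}
        # pairs is a subset of bs x cs, so equal sizes mean equal sets
        if len(pairs) != len(bs) * len(cs):
            return False
    return True


cols = ("Course", "Teacher", "Text")
ctx = [dict(zip(cols, t)) for t in [
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
]]

print(mvd_holds(ctx, ["Course"], ["Teacher"]))  # True
print(fd_holds(ctx, ["Course"], ["Teacher"]))  # False: Physics has 3 teachers
# Dropping (Physics, Prof. Black, Principles of Optics) breaks the pattern
print(mvd_holds(ctx[:5] + ctx[6:], ["Course"], ["Teacher"]))  # False
```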


Anomalies in insert

• For example, to add the information that the physics course uses a new text called Advanced Mechanism, it is necessary to create three new tuples, one for each of the three teachers.


Fourth Normal Form

Relation CT

Course     Teacher
Physics    Prof. Green
Physics    Prof. Brown
Physics    Prof. Black
Math       Prof. White

Relation CX

Course     Text
Physics    Basic Mechanics
Physics    Principles of Optics
Math       Modern Algebra
Math       Projective Geometry

• 4NF is an improvement over BCNF, in that it eliminates another form of undesirable structure
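A sketch (hypothetical helpers on lists of dicts) confirming that CT ⋈ CX over Course rebuilds CTX, and that recording a new physics text costs one CX tuple where CTX needs one tuple per physics teacher:

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


cols = ("Course", "Teacher", "Text")
ctx = [dict(zip(cols, t)) for t in [
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
]]

ct = project(ctx, ["Course", "Teacher"])
cx = project(ctx, ["Course", "Text"])
print(len(ct), len(cx))  # 4 4
print(as_set(natural_join(ct, cx)) == as_set(ctx))  # True

# Insert: Physics also uses a new text. CX needs a single new tuple...
cx_new = cx + [{"Course": "Physics", "Text": "Advanced Mechanism"}]
# ...whereas the equivalent CTX gains one tuple per physics teacher
print(len(natural_join(ct, cx_new)) - len(ctx))  # 3
```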


Fifth Normal Form

• Join dependency: relation R satisfies the JD *(X, Y, …, Z) if and only if R is the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R

• A relation R is in 5NF/PJNF (projection-join normal form) if and only if every join dependency in R is implied by the candidate keys of R

• 5NF is the ultimate normal form with respect to projection and join
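The JD definition translates almost word for word into code: project R onto each component, join the projections, and compare the result with R. A sketch using hypothetical helpers (`project`, `natural_join`, `as_set`, `satisfies_jd`) on lists of dicts, applied to the SPJ instance in the example that follows:

```python
from functools import reduce


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


def satisfies_jd(rows, components):
    """R satisfies *(X, Y, ..., Z) iff R equals the join of its projections."""
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return as_set(joined) == as_set(rows)


cols = ("S#", "P#", "J#")
spj = [dict(zip(cols, t)) for t in [
    ("S1", "P1", "J2"), ("S1", "P2", "J1"),
    ("S2", "P1", "J1"), ("S1", "P1", "J1"),
]]

print(satisfies_jd(spj, [["S#", "P#"], ["P#", "J#"], ["J#", "S#"]]))  # True
print(satisfies_jd(spj, [["S#", "P#"], ["P#", "J#"]]))  # False
```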


Fifth Normal Form

Relation SPJ

S#    P#    J#
S1    P1    J2
S1    P2    J1
S2    P1    J1
S1    P1    J1

Projections of SPJ:

Relation SP

S#    P#
S1    P1
S1    P2
S2    P1

Relation JS

J#    S#
J2    S1
J1    S1
J1    S2

Relation PJ

P#    J#
P1    J2
P2    J1
P1    J1

Join of SP and PJ over P#

S#    P#    J#
S1    P1    J2
S1    P1    J1
S1    P2    J1
S2    P1    J2    (spurious)
S2    P1    J1

Joining this result with JS over (J#, S#) removes the spurious tuple and gives back SPJ.

• SPJ is the join of all three of its projections, not of any two!
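The two joins on this slide can be replayed with a sketch (hypothetical helpers on lists of dicts): joining SP and PJ over P# yields five tuples including the spurious (S2, P1, J2), and joining that result with JS over (J#, S#) gives back exactly SPJ.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


def as_set(rows):
    """Order-insensitive view of a relation for equality tests."""
    return {frozenset(r.items()) for r in rows}


cols = ("S#", "P#", "J#")
spj = [dict(zip(cols, t)) for t in [
    ("S1", "P1", "J2"), ("S1", "P2", "J1"),
    ("S2", "P1", "J1"), ("S1", "P1", "J1"),
]]

sp = project(spj, ["S#", "P#"])
pj = project(spj, ["P#", "J#"])
js = project(spj, ["J#", "S#"])

two_way = natural_join(sp, pj)  # join over P#
print(len(two_way))  # 5
print([(t["S#"], t["P#"], t["J#"]) for t in two_way if t not in spj])
# [('S2', 'P1', 'J2')]: the spurious tuple

three_way = natural_join(two_way, js)  # join over (J#, S#)
print(as_set(three_way) == as_set(spj))  # True
```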


Join dependency constraint

Condition: JD (join dependency) in relation R(S#, P#, J#)

Constraint: if (s, p) appears in R1(S#, P#), (p, j) appears in R2(P#, J#) and (j, s) appears in R3(J#, S#), then (s, p, j) appears in R(S#, P#, J#)


Connection Trap

Condition: without a JD (join dependency) in relation R(S#, P#, J#)

Connection trap: if (s, p) appears in R1(S#, P#), (p, j) appears in R2(P#, J#) and (j, s) appears in R3(J#, S#), then (s, p, j) may not exist in R(S#, P#, J#), so R1, R2 and R3 cannot safely be joined (connected) to infer it


Anomalies in insert with JD

If (S1, P1, J2), (S1, P2, J1) and (S2, P1, J1) are inserted, then (S1, P1, J1) must also be inserted.

On the other hand, if (S1, P1, J1) is deleted, then one of (S1, P1, J2), (S1, P2, J1) and (S2, P1, J1) must also be deleted.
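The insert rule can be replayed with a sketch (hypothetical helpers on lists of dicts; `forced_tuples` returns the tuples that the JD *(SP, PJ, JS) requires but the relation lacks):

```python
from functools import reduce


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                t = {**a, **b}
                if t not in out:
                    out.append(t)
    return out


JD = [["S#", "P#"], ["P#", "J#"], ["J#", "S#"]]


def forced_tuples(rows, components):
    """Tuples the JD forces into rows: join of projections minus rows."""
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return [t for t in joined if t not in rows]


cols = ("S#", "P#", "J#")
three = [dict(zip(cols, t)) for t in [
    ("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"),
]]
print([(t["S#"], t["P#"], t["J#"]) for t in forced_tuples(three, JD)])
# [('S1', 'P1', 'J1')]

four = three + [dict(zip(cols, ("S1", "P1", "J1")))]
print(forced_tuples(four, JD))  # []: the JD is satisfied
```

Removing (S1, P1, J1) from `four` leaves exactly `three`, for which `forced_tuples` reports (S1, P1, J1) again, so that tuple cannot be deleted on its own.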


Fifth Normal Form (5NF)

Relation SP

S#    P#
S1    P1
S1    P2
S2    P1

Relation JS

J#    S#
J2    S1
J1    S1
J1    S2

Relation PJ

P#    J#
P1    J2
P2    J1
P1    J1


Steps in normalization

1. Decompose all data structures that are not two-dimensional into flat 2D relations (unnormalized form → 1NF)

2. Eliminate any partial dependency (1NF → 2NF)

3. Eliminate any transitive dependency (2NF → 3NF)

4. Eliminate any remaining FD in which the determinant is not a candidate key (3NF → BCNF)

5. Eliminate any MVD that is not also an FD (BCNF → 4NF)

6. Eliminate any JD that is not implied by the candidate keys (4NF → 5NF/PJNF)


Lecture Summary

Normalization to 1NF, 2NF, 3NF, BCNF, 4NF and 5NF splits an unnormalized table into normalized tables, which eliminates data redundancy and update anomalies. Each higher normal form implies the lower ones.


Review Question

Explain the differences between Third Normal Form and Boyce-Codd Normal Form with respect to functional dependencies.

Why is Boyce-Codd Normal Form called a “strong” third normal form?

How can relations in Third Normal Form be normalized into Boyce-Codd Normal Form?


Tutorial Question: Describe and derive the unnormalized, first, second and third normal forms for the following unnormalized form, which contains 12 data fields, 4 of which are in a repeating group. Identify the functional dependencies of each normal form.

Class number: ___________________ Class name: ___________________

Location: ___________________ Begin date: ___________________

End date: ___________________ Instructor name: ___________________

Instructor address: ___________________ Instructor phone no: ___________________

Student number    Student name    Student address    Grade
………………..        …………….        ………………..         ………….
………………..        …………….        ………………..         ………….

where “…….” marks the repeating group of data in the record


Reading Assignment

Chapter 10 Functional Dependencies and Normalization for Relational Databases and Chapter 11 Relational Database Design Algorithms and Further Dependencies of “Fundamentals of Database Systems” fifth edition, by Elmasri & Navathe, Pearson, 2007.