4th Normal Form & Lossless Decomposition. By: Karen McVay, CS 157B


Upload: david-barton

Post on 17-Dec-2015


Page 1: 4th Normal Form & Lossless Decomposition

By: Karen McVay

CS 157B

Page 2: Review of NFs

1NF: All values of the columns are atomic. That is, they contain no repeating values.

2NF: The relation is in 1NF and every non-key column is fully dependent upon the primary key (no partial dependencies).

Page 3: Review of NFs Cont…

3NF: The relation is in 2NF and every non-key column is non-transitively dependent upon the primary key. In other words, all non-key attributes are functionally dependent only upon the primary key.

BCNF: A relation is in BCNF if every determinant is a candidate key. This is a stricter form of third normal form.

Determinant: an attribute (or set of attributes) on which some other attribute is fully functionally dependent.

Page 4: 4NF and Multivalued Dependencies

Some relations are in BCNF but still contain redundant data and suffer from update anomalies.

The next higher normal form is 4NF.

4NF is based on multivalued dependencies.

Page 5: Multivalued Dependencies

Consider a relation R with attributes X, Y, Z, where X, Y, Z are sets of attributes.

The multivalued dependency X --->> Y exists if, whenever two tuples with the same X value exist,

T1(x, y1, z1) and T2(x, y2, z2),

the two tuples

– T3(x, y1, z2) and T4(x, y2, z1) also exist.
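As a rough illustration of the definition above, the following Python sketch tests whether X --->> Y holds in a small relation stored as a set of tuples. The function name `mvd_holds` and the use of column positions to identify X and Y are assumptions made for this example; they do not come from the slides.

```python
from itertools import product


def mvd_holds(rows, x_idx, y_idx):
    """Return True if X --->> Y holds in rows (a set of equal-length tuples).

    x_idx and y_idx are tuples of column positions; Z is every other column.
    """
    if not rows:
        return True
    width = len(next(iter(rows)))
    z_idx = [i for i in range(width) if i not in x_idx and i not in y_idx]

    def pick(row, idx):
        return tuple(row[i] for i in idx)

    for t1, t2 in product(rows, repeat=2):
        if pick(t1, x_idx) != pick(t2, x_idx):
            continue
        # Combine X and Y from t1 with Z from t2; this tuple must also exist.
        mixed = list(t1)
        for i in z_idx:
            mixed[i] = t2[i]
        if tuple(mixed) not in rows:
            return False
    return True


r = {("x", "y1", "z1"), ("x", "y2", "z2")}
print(mvd_holds(r, (0,), (1,)))  # False: T3 and T4 are missing

r |= {("x", "y1", "z2"), ("x", "y2", "z1")}
print(mvd_holds(r, (0,), (1,)))  # True
```

Because the loop visits every ordered pair of tuples, both T3 and T4 are checked without writing the two cases separately.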

Page 6: Example

Suppose we have two one-to-many relationships:

Each employee may have many dependents.

Each employee may work on many projects.

For any employee, the dependents are completely independent of the projects:

– For a given value of ename, the values of pname are only determined by ename and not dname

– For a given value of ename, the values of dname are only determined by ename and not pname

– So, each dname is repeated for each pname, and vice versa

Page 7

Consider the relation EMP:

EMP(ename, pname, dname)

If (Smith, X, John) and (Smith, Y, Anna) exist, then (Smith, Y, John) and (Smith, X, Anna) also exist.

The MVD ename --->> pname | dname exists in EMP.

Note that EMP is in BCNF, yet there is a lot of redundancy in EMP.
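To make the redundancy concrete, here is a small Python sketch that builds EMP from two independent lists. The dictionaries `projects` and `dependents` are hypothetical helpers introduced for this example; only the Smith values come from the slide.

```python
from itertools import product

# Assumed sample data: Smith works on projects X and Y
# and has dependents John and Anna.
projects = {"Smith": ["X", "Y"]}
dependents = {"Smith": ["John", "Anna"]}

# Because pname and dname are independent, EMP must hold every combination.
emp = {
    (ename, pname, dname)
    for ename in projects
    for pname, dname in product(projects[ename], dependents[ename])
}

for row in sorted(emp):
    print(row)
print(len(emp))  # 4 rows to record 2 + 2 independent facts
```

With m projects and n dependents for one employee, the single table needs m * n rows, which is where the redundancy comes from.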

Page 8

We might have liked to have:

EMP
ename   pname   dname
Smith   X, Y    John, Anna

But 1NF does not permit multivalued attributes.

Page 9

So, instead of:

EMP
ename   pname   dname
Smith   X, Y    John, Anna

We have:

EMP
ename   pname   dname
Smith   X       John
Smith   Y       Anna
Smith   Y       John
Smith   X       Anna

Page 10: Decomposing an MVD without loss of information

Note that if X --->> Y | Z exists, then R can be decomposed into (X, Y) and (R - Y):

R(X, Y, Z)

Ra(X, Y)

Rb(X, Z)

And this is a lossless decomposition.

Page 11

As ename --->> pname | dname exists, EMP can be decomposed into EMPa and EMPb:

EMP(ename, pname, dname)

EMPa(ename, pname)

EMPb(ename, dname)

This is a lossless decomposition.
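A quick way to see that the decomposition loses nothing is to project EMP onto the two smaller relations and join them back. The sketch below does this with Python sets; the variable names `emp_a`, `emp_b` and `rejoined` are chosen for this example.

```python
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "Y", "John"),
    ("Smith", "X", "Anna"),
}

# Project EMP onto (ename, pname) and (ename, dname).
emp_a = {(e, p) for e, p, _ in emp}
emp_b = {(e, d) for e, _, d in emp}

# Natural join of the two projections on ename.
rejoined = {(e, p, d) for e, p in emp_a for e2, d in emp_b if e == e2}

print(len(emp_a), len(emp_b))  # 2 2
print(rejoined == emp)         # True
```

The join returns exactly the original four rows, so no information is lost and no spurious rows appear.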

Page 12: 4th Normal Form

A Boyce-Codd normal form relation is in fourth normal form if

(a) there is no multivalued dependency in the relation, or

(b) there are multivalued dependencies, but the attributes that are multivalued dependent on a specific attribute are dependent on each other.

Page 13: 4th Normal Form Cont…

This is best discussed through mathematical notation.

Assume the following relation:

R(a:pk1, b:pk2, c:pk3)

Recall that a relation is in BCNF if all its determinants are candidate keys; in other words, each determinant can be used as a primary key.

Relation R has only one determinant, (a, b, c), which is the composite primary key. Since the primary key is a candidate key, R is in BCNF.

Page 14: 4th Normal Form Cont…

Now R may or may not be in fourth normal form.

1. If R contains no multivalued dependency, then R is in fourth normal form.

2. Assume R has the following two multivalued dependencies:

a --->> b and a --->> c

In this case R is in fourth normal form if b and c are dependent on each other. However, if b and c are independent of each other, then R is not in fourth normal form and the relation has to be projected into two non-loss projections.

Page 15: Example

Consider a case of class enrollment. Each student can be enrolled in one or more classes and each class can contain one or more students.

Clearly, there is a many-to-many relationship between classes and students. This relationship can be represented by a Student/Class cross-reference table:

{StudentID, ClassID}

Page 16: Example Cont…

The key for this table is the combination of StudentID and ClassID. To avoid violating 2NF, all other information about each student and each class is stored in separate Student and Class tables, respectively.

Note that each StudentID determines not a unique ClassID, but a well-defined, finite set of values. This kind of behavior is referred to as a multivalued dependency of ClassID on StudentID.
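The idea that a StudentID determines a set of ClassID values can be sketched by grouping the cross-reference table. The identifiers `s1`, `s2`, `c1`, `c2` and the name `classes_of` are made up for this example.

```python
from collections import defaultdict

# Assumed sample rows of the {StudentID, ClassID} cross-reference table.
enrollment = {("s1", "c1"), ("s1", "c2"), ("s2", "c1")}

# Group the rows so that each StudentID maps to its set of ClassIDs.
classes_of = defaultdict(set)
for student_id, class_id in enrollment:
    classes_of[student_id].add(class_id)

print(sorted(classes_of["s1"]))  # ['c1', 'c2']
print(sorted(classes_of["s2"]))  # ['c1']
```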

Page 17: Example 2

Consider another example with two many-to-many relationships, between students and classes and between classes and teachers.

Students * ---- * Classes

Classes * ---- * Teachers

Also, a many-to-many relationship between students and teachers is implied.

Page 18: Example 2 Cont…

The combination of StudentID and TeacherID does not contain any additional information beyond the information implied by the student/class and class/teacher relationships.

Consequently, the student/class and class/teacher relationships are independent of each other; these relationships have no additional constraints. The following table is, then, in violation of 4NF:

{StudentID, ClassID, TeacherID}

Page 19: 4th NF and Anomalies

As an example of the anomalies that can occur, note that it is not possible to add a new class taught by some teacher without adding at least one student who is enrolled in this class.
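The anomaly disappears once the two relationships are stored separately. The following Python sketch assumes a decomposition into two tables, here called `student_class` and `class_teacher`; these names and the sample identifiers are hypothetical.

```python
# Assumed decomposition of {StudentID, ClassID, TeacherID}:
# each relationship lives in its own table.
student_class = {("s1", "c1")}
class_teacher = {("c1", "t1")}

# A new class c2 taught by t2 can be recorded without any student.
class_teacher.add(("c2", "t2"))

# The implied student/teacher pairs are recovered by joining on ClassID.
student_teacher = {
    (s, t)
    for s, c in student_class
    for c2, t in class_teacher
    if c == c2
}
print(student_teacher)  # {('s1', 't1')}
```

In the single three-column table, the row for class c2 would need a StudentID value that does not exist yet.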

Page 20: 4th Normal Form and Anomalies Cont…

Case 1:

Assume the following relation:

Employee(Eid:pk1, Language:pk2, Skill:pk3)

There is no multivalued dependency, therefore the relation is in fourth normal form.

Page 21: 4th Normal Form and Anomalies Cont…

Case 2:

Assume the following relation with multivalued dependencies:

Employee(Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Languages
Eid --->> Skills

Languages and Skills are dependent. This says an employee speaks several languages and has several skills. However, for each skill a specific language is used when that skill is practiced.

Page 22

Thus employee 100 speaks English when teaching but French when cooking. This relation is in fourth normal form and does not suffer from any anomalies.

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   French     Cooking
200   English    Cooking
200   Arabic     Singing
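One way to see why this table should stay in one piece is to try splitting it anyway. The Python sketch below projects the table onto (Eid, Language) and (Eid, Skill) and joins the projections back; the variable names are chosen for this example. Because each skill is tied to a specific language, the join invents combinations that were never stored, so under the tuple-based definition given earlier the independence required for a split does not hold in this instance.

```python
employee = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "French", "Cooking"),
    (200, "English", "Cooking"),
    (200, "Arabic", "Singing"),
}

# Hypothetical split into two projections.
emp_lang = {(e, lang) for e, lang, _ in employee}
emp_skill = {(e, skill) for e, _, skill in employee}

# Natural join on Eid.
joined = {(e, lang, skill) for e, lang in emp_lang for e2, skill in emp_skill if e == e2}

spurious = joined - employee
print(len(joined), len(spurious))              # 13 8
print((100, "English", "Cooking") in spurious)  # True
```

The eight spurious rows, such as employee 100 cooking in English, show that the split would be lossy here.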

Page 23: 4th Normal Form and Anomalies Cont…

Case 3:

Assume the following relation with multivalued dependencies:

Employee(Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Languages
Eid --->> Skills

Languages and Skills are independent.

Page 24: 4th Normal Form and Anomalies Cont…

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing

This relation is not in fourth normal form and suffers from all three types of anomalies.
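Independence of Languages and Skills means that, for each employee, the table holds every pairing of that employee's languages with that employee's skills. The sketch below checks this property; the function name `independent_per_employee` is an assumption for this example.

```python
from collections import defaultdict
from itertools import product

employee = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
}


def independent_per_employee(rows):
    """True if every employee's rows equal languages x skills for that employee."""
    langs, skills, seen = defaultdict(set), defaultdict(set), defaultdict(set)
    for eid, lang, skill in rows:
        langs[eid].add(lang)
        skills[eid].add(skill)
        seen[eid].add((lang, skill))
    return all(seen[e] == set(product(langs[e], skills[e])) for e in seen)


print(independent_per_employee(employee))  # True
```

Employee 100 needs four rows to record two languages and two skills, which is the redundancy behind the anomalies discussed next.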

Page 25

Insertion anomaly: To insert the row (200, English, Cooking) we have to insert two extra rows, (200, Arabic, Cooking) and (200, English, Singing), otherwise the database will be inconsistent. The table will then be as follows:

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
200   English    Cooking
200   Arabic     Cooking
200   English    Singing
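The extra rows can be derived mechanically. This Python sketch computes every row that the two independent multivalued facts require after the insertion; the helper name `complete` is an assumption for this example.

```python
from itertools import product


def complete(rows):
    """Return every (Eid, Language, Skill) combination implied by rows."""
    langs, skills = {}, {}
    for eid, lang, skill in rows:
        langs.setdefault(eid, set()).add(lang)
        skills.setdefault(eid, set()).add(skill)
    return {(e, l, s) for e in langs for l, s in product(langs[e], skills[e])}


employee = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
}

new_row = (200, "English", "Cooking")
required = complete(employee | {new_row})
extra = required - employee - {new_row}

print(sorted(extra))  # [(200, 'Arabic', 'Cooking'), (200, 'English', 'Singing')]
print(len(required))  # 8
```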

Page 26

Deletion anomaly: If employee 100 discontinues the politics skill we have to delete two rows, (100, Kurdish, Politics) and (100, English, Politics), otherwise the database will be inconsistent.

Eid   Language   Skill
100   English    Teaching
100   Kurdish    Politics
100   English    Politics
100   Kurdish    Teaching
200   Arabic     Singing
200   English    Cooking
200   Arabic     Cooking
200   English    Singing
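The following Python sketch selects the rows that must go and shows what happens when only one of them is deleted. The variable names are chosen for this example.

```python
employee = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
    (200, "English", "Cooking"),
    (200, "Arabic", "Cooking"),
    (200, "English", "Singing"),
}

# Dropping the skill means removing every row that mentions it for employee 100.
to_delete = {row for row in employee if row[0] == 100 and row[2] == "Politics"}
print(sorted(to_delete))
# [(100, 'English', 'Politics'), (100, 'Kurdish', 'Politics')]

# Deleting only one row leaves the skill recorded under the other language.
partial = employee - {(100, "Kurdish", "Politics")}
still_listed = any(e == 100 and s == "Politics" for e, _, s in partial)
print(still_listed)  # True
```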

Page 27: More Anomalies

Update anomaly: If employee 200 changes his skill from singing to dancing we have to make changes in more than one place.
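As a sketch of the update anomaly, the code below counts how many rows change in the single table and how many would change in a separate skill table. It assumes the eight-row table from the insertion example; the names `updated` and `emp_skill` are chosen for this example.

```python
employee = {
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
    (200, "English", "Cooking"),
    (200, "Arabic", "Cooking"),
    (200, "English", "Singing"),
}

# Single table: every (200, *, Singing) row has to be rewritten.
updated = {
    (e, lang, "Dancing") if (e == 200 and s == "Singing") else (e, lang, s)
    for e, lang, s in employee
}
print(len(employee - updated))  # 2 rows changed

# Separate (Eid, Skill) table: the same fact is stored once.
emp_skill = {(e, s) for e, _, s in employee}
print(sum(1 for e, s in emp_skill if e == 200 and s == "Singing"))  # 1
```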

Page 28

The relation is projected into the following two non-loss projections, which are in fourth normal form:

Employee_Language(Eid:pk1, Languages:pk2)

Eid   Language
100   English
100   Kurdish
200   Arabic

Page 29: Cont…

Employee_Skill(Eid:pk1, Skills:pk2)

Eid   Skill
100   Teaching
100   Politics
200   Singing
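Joining the two projections on Eid gives back the original five-row relation, and an insertion now touches one row per table. The Python sketch below illustrates both points using the table contents from the slides; the variable names are chosen for this example.

```python
employee_language = {(100, "English"), (100, "Kurdish"), (200, "Arabic")}
employee_skill = {(100, "Teaching"), (100, "Politics"), (200, "Singing")}

# Natural join on Eid rebuilds the original Employee relation.
employee = {
    (e, lang, skill)
    for e, lang in employee_language
    for e2, skill in employee_skill
    if e == e2
}
for row in sorted(employee):
    print(row)

# Recording that employee 200 speaks English and cooks takes one row per table;
# no extra combination rows are needed.
employee_language.add((200, "English"))
employee_skill.add((200, "Cooking"))
print(len(employee_language), len(employee_skill))  # 4 4
```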
