Praveen Kumar

CHAPTER 8

NORMALIZATION OF RELATIONAL SCHEMA

Normalization of a Relation Schema

Normalization of a Relation Schema refers to the process of:-

(a) Identifying those data dependencies in the schema which would cause anomalies during Insert, Update and Delete of data.

(b) Decomposing the schema into a set of sub-schemas, on the basis of the dependencies causing those anomalies.

The resulting schemas permit representation of the intended information, while keeping data redundancy to a minimum.

The process of normalization can be understood only after understanding the various types of data dependencies occurring in a database.

Concept of Functional Dependencies (FDs)

Let there be a Relation Schema R, comprising sets of attributes α, β & γ i.e. α ⊆ R, β ⊆ R & γ ⊆ R. A Functional Dependency α → β (read as “α determines β”) is said to be holding on the Schema R, if for every legal relation r(R) and for every tuple pair {t1, t2} ⊆ r, if t1[α] = t2[α] then it must satisfy t1[β] = t2[β]. This means that if any two tuples of relation r(R) agree on the values of α, then the two tuples must agree on the values of β.

α, the “Left Side” of the FD, is called the “Determinant”; and β, the “Right Side” of the FD, is called the “Dependent”.

Example: The statement that “Knowing the Registration-number of a Student, we can determine his/her Name” implies the FD Registration-number → Name. Here, Registration-number is the Determinant of the FD and Name is the Dependent.
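The definition can be tried out on a small relation instance. The following is a minimal sketch in Python, assuming tuples are represented as dictionaries; the function name fd_holds and the sample student rows are hypothetical and not part of the text. Note that such a check only shows whether one particular instance satisfies an FD; an FD holds on the schema only if every legal relation satisfies it.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        # A pair agreeing on the determinant but not on the dependent
        # violates the FD.
        if agree_lhs and not agree_rhs:
            return False
    return True

# Hypothetical sample data, not taken from the text.
students = [
    {"Registration_number": "R1", "Name": "Asha"},
    {"Registration_number": "R2", "Name": "Ravi"},
    {"Registration_number": "R3", "Name": "Asha"},
]

print(fd_holds(students, ["Registration_number"], ["Name"]))  # True
print(fd_holds(students, ["Name"], ["Registration_number"]))  # False
```

The second call fails because two students share the Name "Asha" but have different Registration-numbers.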

FDs holding on a Schema R

A set of FDs F is said to be holding on a schema R, if all FDs in F are satisfied by every

legal relation r(R).

Suppose there is a Relational Schema “Student” having the following FDs holding on it:-

{Roll_No} → Registration_No, Branch, Section, Name, Father_Name, Address, DOB

{Registration_No} → Roll_No, Branch, Section, Name, Father_Name, Address, DOB

{Name, Father_Name, DOB, Address} → Roll_No, Registration_No, Section, Branch

Trivial Functional Dependency

An FD α → β is called Trivial, if β ⊆ α. It is called trivial because such an FD would be satisfied by any relation r(R): if two tuples of a relation agree on the values of α, the tuples will definitely agree on the values of β, since β is a subset of α.

For Example, {Roll_No, Name} → Name represents a trivial FD.

Extraneous Attributes on the “Left Side” of an FD

Suppose an FD α → β is holding on a Schema R. An attribute A (A ∈ α) is said to be extraneous, if FD (α - A) → β also holds on R. This means that attribute A is not required in α to determine the value of β. The subset (α - A) is sufficient to determine the value of β.

Left-Irreducible FD

An FD α → β, holding on a schema R, is said to be left-irreducible, if there exists no proper subset of α which can determine β i.e. there exists no attribute A (A ∈ α) such that (α - A) → β holds on R.

Super Key (SK) of a Relation Schema

Suppose FD K → R holds on schema R, where K ⊆ R, then K forms a Super Key (SK) of Schema R. Any superset of a SK also forms a SK.

Example: Suppose we have a Schema Student (Roll_No, Registration_No, Branch, Section, Name, Father_Name, Address, DOB), and suppose the value of {Roll_No} is distinct and also the value of {Registration_No} is distinct for each student. Then {Roll_No} is a Super Key of Schema Student, and so is its superset {Roll_No, Branch}. Similarly, {Roll_No, Branch, Name} also forms a Super Key of Student. In this way, there can be many Super Keys of Student.

Prove that if K → R holds, then K forms a Super Key of R

Proof by Contradiction

Let us assume that two distinct tuples t1 and t2 in a legal relation r(R) agree on the values of K i.e. t1[K] = t2[K]. ------- (i)

Since K → R holds, (i) implies that t1[R] = t2[R], and thus t1 = t2.

But no two tuples in a legal relation r(R) can be equal, since a relation is defined as a set of tuples and in a set no two elements can be the same.

Thus our assumption (i) is wrong.

Thus, it is proved that no two distinct tuples in a legal relation r(R) can agree on the values of K; thus K is a Super Key of R. So, knowing the value of attribute set K, a tuple can be uniquely identified in a relation r.

Each Relation Schema R will have at least one default Super Key; that is, the set of all its

attributes i.e. the entire schema R itself.

Candidate Key (CK) of a Relation Schema R

Since a superset of a Super Key of R is also a Super Key of R, a Super Key may contain some extraneous attributes. Suppose K is a Super Key of R and E is a largest set of attributes that can be removed from K such that (K - E) is still a Super Key of R; then (K - E) forms a minimal Super Key of R. No proper subset of (K - E) will form a Super Key of R. This minimal Super Key is called a Candidate Key of R.

Alternatively, it can be stated that if K → R forms a left-irreducible FD holding on R, then K is called a Candidate Key of R.

For example, {Roll_No} and {Registration_No} will form Candidate Keys of schema “Student”. In addition, {Name, Father_Name, Address, DOB} also forms another Candidate Key.

Primary-Key

A Relation Schema R may have more than one Candidate Key. One of the Candidate Keys is chosen as the primary means to identify tuples uniquely in a relation r(R). This designated Candidate Key is called the Primary Key of R. Out of the three candidate keys in the above example, we may select {Roll_No} as the Primary Key of the Schema Student.

Prime Attributes or Key Attributes of a Relational Schema An attribute A of Relational Schema R (A ∈ R) is said to be a Prime Attribute or Key Attribute, if it forms part of any of its Candidate Keys. Let {K1, K2, …, Kn} be the complete set of candidate keys of R. Then the set of Prime Attributes of R = K’ = K1 ∪ K2 ∪ … ∪ Kn

Non-Prime Attributes of a Relational Schema The subset of R which does not form part of any of its Candidate Keys is called the set of its Non-Prime Attributes or Non-Key Attributes.

Logically implied FDs

Suppose F is the set of FDs holding on a Relation Schema R. There may be some FDs that can be logically inferred from F. These inferred FDs will also hold on every legal relation r(R). The set of such FDs, inferred from set F, is said to be logically implied by F.

Rules for the Inference of FDs

Suppose α, β, γ and δ are subsets of attributes of a Relation Schema R i.e. α ⊆ R and β ⊆ R, γ ⊆ R and δ ⊆ R

Armstrong’s Rules The inference rules 1..3 are Armstrong’s Rules:-

Rule 1 (Reflexivity Rule) If β ⊆ α then α → β holds on R.

Rule 2 (Augmentation Rule) If α → β holds on R, then γα → γβ holds on R.

Rule 3 (Transitivity Rule) If α → β and β → γ hold on R, then α → γ holds on R.

Additional Rules of Inference

Rule 4 (Union Rule) If α → β and α → γ hold on R, then α → βγ holds on R.

Rule 5 (Decomposition Rule) If α → βγ holds on R, then α → β and α → γ hold on R.

Rule 6 (Pseudo-Transitivity Rule) If α → β and γβ → δ hold on R, then αγ → δ holds on R.

Proofs of Inference Rules

1 Reflexivity Rule

Suppose β ⊆ α. Consider a relation r(R) and a tuple pair {t1, t2} ⊆ r such that

t1[α] = t2[α] ----------------- (i)

Since β ⊆ α, t1 and t2 will also satisfy

t1[β] = t2[β] ----------------- (ii)

From (i) and (ii), it is implied that α → β holds on R.

Thus, proved.

2 Augmentation Rule

We prove this rule by contradiction.

Suppose α → β holds on R.

We make an assumption that γα → γβ does not hold on R. ----(A)

Then there is a relation r(R) and a tuple pair {t1, t2} ⊆ r such that

t1[γα] = t2[γα] --------------- (i)

but

t1[γβ] ≠ t2[γβ] --------------- (ii)

From (i), it is implied that

t1[α] = t2[α] --------------- (iii)

t1[γ] = t2[γ] --------------- (iv)

Since α → β holds on R, (iii) implies

t1[β] = t2[β] --------------- (v)

From (iv) and (v), it is implied that

t1[γβ] = t2[γβ] --------------- (vi)

Since (vi) contradicts (ii), our assumption (A), that γα → γβ does not hold on R, is NOT CORRECT.

Thus, if α → β holds on R then γα → γβ also holds on R.

3 Transitivity Rule

Suppose α → β and β → γ hold on R.

Since α → β holds on R, for every relation r(R) and every tuple pair {t1, t2} ⊆ r which satisfies:-

t1[α] = t2[α] ----------------- (i)

we also have:-

t1[β] = t2[β] ----------------- (ii)

Since β → γ holds on R, for

t1[β] = t2[β] ----------------- (iii)

we also have:-

t1[γ] = t2[γ] ----------------- (iv)

From (i) and (iv), it is implied that α → γ holds on R.

4 Union Rule

Suppose α → β and α → γ hold on R.

Applying Augmentation Rule to α → β {augmenting by α}, it implies that

α → αβ ------------------------ (i)

Applying Augmentation Rule to α → γ {augmenting by β}, it implies that

αβ → βγ ------------------------ (ii)

Applying Transitivity Rule to (i) and (ii), it is implied that α → βγ holds on R.

5 Decomposition Rule

Suppose α → βγ holds on R. ------ (i)

By Reflexivity Rule, βγ → β -------- (ii)

Applying Transitivity Rule between (i) and (ii), α → β holds on R.

Similarly, we can prove that α → γ holds on R.

6 Pseudo-Transitivity Rule

Suppose α → β and γβ → δ hold on R.

Applying Augmentation Rule to α → β {augmenting by γ on both sides}, αγ → γβ holds on R.

Applying Transitivity Rule to αγ → γβ and γβ → δ, we get αγ → δ.

Trivial FD An FD α → β is said to be trivial, if β ⊆ α i.e. if the determinant is a superset of the dependent.

Example: The Functional Dependency conveyed by the statement, “Knowing the Name and Address of a Person, we can determine his Address” i.e. (Name, Address) → Address is obviously trivial.

Closure of FD Set

Suppose F is a set of FDs that holds on a Schema R, then the Closure of F, denoted by F+, is the complete set of FDs that includes F and all the FDs that are logically implied (inferred) by F. Any legal relation r(R) that satisfies F will also satisfy F+.

Algorithm to determine Closure F+ of an FD Set F

F+ = F;
Repeat
    Save-F+ = F+;
    To each FD f1 ∈ F+, apply Armstrong’s Rule of Reflexivity; and add the FDs so inferred to F+;
    To each FD f1 ∈ F+, apply Armstrong’s Rule of Augmentation; and add the FDs so inferred to F+;
    To each such pair of FDs as {α → β, β → γ} ⊆ F+, apply Armstrong’s Rule of Transitivity, and add FD α → γ to F+;
Until (F+ = Save-F+);

Cover of an FD Set

An FD set G is said to be the cover of another FD set F, if F ⊆ G+ i.e. all the FDs that are there in F are also there in the Closure of set G.

Equivalent Sets

Two FD sets F and G are said to be equivalent sets, if both form cover of each other i.e. F ⊆ G+ and G ⊆ F+, which implies F+ = G+. This means that two sets F and G are equivalent, if their closures are equal.

Extraneous FDs in a set

An FD f ∈ F is said to be extraneous, if its exclusion from F does not affect the Closure of F i.e. {F - f}+ = F+. Such FDs in a set are logically implied by other FDs in the set.

Example:-

Suppose we have an FD set F: {α → β, β → γ and α → γ}; then α → γ is said to be an extraneous FD, since it can be inferred by applying Armstrong’s Transitivity Rule to the other two FDs in the set. The extraneous FD can be eliminated from the set, without affecting the Closure of the set.

Extraneous attributes in the determinant of an FD

Suppose we have an FD α → β that holds on a schema R. An attribute A ∈ α is said to be extraneous, if (α - A) → β also holds on R.

Left-irreducible FDs

The left side of an FD (i.e. its determinant) is said to be irreducible, if it does not contain any extraneous attributes. Such FDs are known as left-irreducible FDs.

If α → R is a left-irreducible FD holding on the schema R, then α forms a Candidate Key of R.

Canonical Cover of an FD Set

An FD set Fc is said to be a Minimal Cover or Canonical Cover of FD set F, iff Fc is equivalent to F (i.e. F ⊆ Fc+ and Fc ⊆ F+) and it satisfies the following three conditions:-

(a) Each FD in Fc is in a Canonical Form i.e. has only one attribute on its right side.

(b) No FD in Fc has any extraneous attributes on its left side i.e. all the FDs in Fc are left-irreducible.

(c) None of the FDs in Fc is extraneous i.e. no FD in Fc is logically implied by the other FDs in the set Fc. This implies that no FD can be removed from the set Fc without changing its closure.

Algorithm to determine Canonical Cover of an FD Set F

Fc = F;
Repeat
    Save-Fc = Fc;
    To each FD in Fc of the form α → ABC (where A, B and C are attributes of Schema R), apply Decomposition Rule; and replace the FD by a set of FDs {α → A, α → B, α → C};
    For each FD (α → β) ∈ Fc and for each attribute A ∈ α
        if {{Fc - {α → β}} ∪ {(α - A) → β}}+ = Fc+
        then replace FD α → β by FD (α - A) → β in Fc;
    For each FD (α → β) ∈ Fc
        if {Fc - {α → β}}+ = Fc+
        then eliminate FD α → β from Fc;
Until (Fc = Save-Fc);

How does Canonical Cover of an FD set help to reduce the DBMS overheads?

Suppose F is the set of FDs holding on a schema R, then a relation r(R) would be legal only if it satisfies all the FDs in set F. Now, to determine whether a relation r(R) is legal or not, the DBMS has to check for the satisfaction of all the FDs in set F. On the other hand, if we determine a minimal Cover Fc of F, then we have to check for the satisfaction of a smaller and simpler set of FDs, since a relation r(R) that satisfies FD set Fc will also satisfy FD set F, both being equivalent sets. This will reduce DBMS overheads.

Can an FD Set have more than one Canonical Cover?

Yes, an FD set F can have more than one Canonical Cover, but all of those sets would be equivalent to each other; and in turn equivalent to F.

Example:

Determine Canonical Cover of an FD Set {A → BCD, B → CDA, C → ABD}

Step 1: Convert the FDs to their canonical form i.e. replace them by equivalent sets of FDs having only single attributes on their right side.

Fc : {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

Step 2: Remove extraneous attributes from the left side of all FDs. Here, every FD has only one attribute on its left side, and thus cannot contain any extraneous attribute.

Fc : {A → B, A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

Step 3: Remove extraneous FDs from the set.

A → B is implied by A → C and C → B; so A → B can be eliminated from the set.

So, now Fc : {A → C, A → D, B → C, B → D, B → A, C → A, C → B, C → D}

A → D is implied by A → C and C → D; so A → D can now be eliminated.

So, now Fc : {A → C, B → C, B → D, B → A, C → A, C → B, C → D}

B → C is implied by B → A and A → C; so B → C can now be eliminated.

So, now Fc : {A → C, B → D, B → A, C → A, C → B, C → D}

C → D is implied by C → B and B → D; so C → D can now be eliminated.

So, now Fc : {A → C, B → D, B → A, C → A, C → B}

C → A is implied by C → B and B → A; so C → A can now be eliminated.

So, now Fc : {A → C, B → D, B → A, C → B}

No more FDs can be eliminated; so {A → C, B → D, B → A, C → B} forms the Canonical Cover of F.

In the beginning of Step 3 above, had we eliminated A → C (since it was implied by A → B and B → C), the Canonical Cover would have been different. So, the Canonical Cover of an FD Set need not be unique.
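The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: FDs are represented as (left side, right side) pairs of attribute sets, implication is tested through attribute-set closure rather than by materializing Fc+, and the names closure and minimal_cover are hypothetical. Because the result depends on the order in which FDs are examined, the sketch may return a canonical cover different from the one derived by hand above.

```python
def closure(attrs, fds):
    """Set of attributes determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: decomposition rule, one attribute per right side.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset([a]))
            if fd not in cover:
                cover.append(fd)
    # Step 2: drop extraneous attributes from each left side.
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            smaller = lhs - {a}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # remove duplicates, keep order
    # Step 3: drop FDs that are implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

# F = {A -> BCD, B -> CDA, C -> ABD}, the example from the text.
F = [({"A"}, {"B", "C", "D"}),
     ({"B"}, {"C", "D", "A"}),
     ({"C"}, {"A", "B", "D"})]
Fc = minimal_cover(F)
for lhs, rhs in Fc:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

With this processing order the sketch keeps A → C, B → C, C → A, C → B and C → D, which is equivalent to F but differs from the hand-derived cover, illustrating that the Canonical Cover is not unique.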

Closure of an Attribute Set under F

Suppose α is a subset of Schema R i.e. α ⊆ R. And suppose F is the set of FDs holding on Schema R. Then, the Closure of Attribute Set α, denoted by α+, is the complete set of attributes that can be determined by α under the FD set F.

Algorithm to determine Closure of an Attribute Set under a set of FDs

Let α ⊆ schema R and F be the set of FDs holding on schema R.

The closure of α i.e. α+ under F can be determined as follows:-

α+ = α;
Repeat
    Save-α+ = α+;
    For each FD (β → γ) ∈ F
        if β ⊆ α+ then α+ = α+ ∪ γ;
Until (Save-α+ == α+);
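The algorithm translates almost line by line into Python. The following is a minimal sketch, assuming each FD is given as a (left side, right side) pair of attribute sets; the function name attribute_closure is hypothetical. The FD set used for illustration is {AB → C, A → D, D → E} on R(A, B, C, D, E).

```python
def attribute_closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FD set `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:                      # the Repeat ... Until loop
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# F = {AB -> C, A -> D, D -> E} on R(A, B, C, D, E)
F = [({"A", "B"}, {"C"}), ({"A"}, {"D"}), ({"D"}, {"E"})]
print(sorted(attribute_closure({"A"}, F)))       # ['A', 'D', 'E']
print(sorted(attribute_closure({"A", "B"}, F)))  # ['A', 'B', 'C', 'D', 'E']
```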

The Concept of “Attribute Set Closure” can be used to determine the following:-

(a) Whether an Attribute Set α is a Super Key of Relation Schema R (where α ⊆ R)

    Determine α+ under F.
    If α+ equals R, then α is a Super Key of R.

(b) Whether α → β holds on R (where α ⊆ R and β ⊆ R)

    Determine α+ under F.
    If β ⊆ α+, then α → β holds on R.

(c) To determine Closure of the FD Set F i.e. F+

    F+ := F;
    For each FD α → γ in F+
    Begin
        Determine α+ under F+;
        For each β ⊆ α+
            α → β holds; thus include it in F+
            i.e. F+ := F+ ∪ {α → β};
    End;
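Uses (a) and (b) can be sketched in Python as below, together with a brute-force search for Candidate Keys built on use (a). This is only an illustrative sketch: the helper names closure, is_super_key, implies and candidate_keys are hypothetical, FDs are assumed to be (left side, right side) pairs of attribute sets, and the candidate-key search examines every subset of R, so it is practical only for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute-set closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_super_key(attrs, schema, fds):
    # Use (a): attrs is a Super Key iff its closure is the whole schema.
    return closure(attrs, fds) == set(schema)

def implies(fds, lhs, rhs):
    # Use (b): lhs -> rhs holds iff rhs lies inside the closure of lhs.
    return set(rhs) <= closure(lhs, fds)

def candidate_keys(schema, fds):
    """Super Keys that contain no smaller Super Key (smallest-first search)."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue                # contains a key already found
            if is_super_key(candidate, schema, fds):
                keys.append(candidate)
    return keys

R = {"A", "B", "C", "D", "E"}
F = [({"A", "B"}, {"C"}), ({"A"}, {"D"}), ({"D"}, {"E"})]
print(is_super_key({"A", "B"}, R, F))               # True
print(implies(F, {"A"}, {"E"}))                     # True
print([sorted(k) for k in candidate_keys(R, F)])    # [['A', 'B']]
```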

Loss-Less-Join Decomposition of a Relation Schema R

Decomposition of a relation r(R) into r1(R1) and r2(R2) (such that R1 ∪ R2 = R) is said to be a loss-less-join decomposition, if it satisfies r1 * r2 = r i.e. the natural join of r1 and r2 should generate r, with no tuples eliminated and with no new tuples added. Such a decomposition is also called a Non-Additive Decomposition.

Example:-

Case I

Consider the following relation r on schema R (A, B, C) and its decomposition into r1 and r2.

r(R)
A    B    C
A1   B1   C1
A2   B2   C1
A1   B1   C2
A3   B2   C3
A1   B1   C3
A2   B2   C4

r1(R1)
A    B
A1   B1
A2   B2
A3   B2

r2(R2)
A    C
A1   C1
A2   C1
A1   C2
A3   C3
A1   C3
A2   C4

r1 * r2
A    B    C
A1   B1   C1
A1   B1   C2
A1   B1   C3
A2   B2   C1
A2   B2   C4
A3   B2   C3

It is a loss-less-join decomposition, since r1 * r2 = r

Case II

Now consider the following relation r and its decomposition into r1 and r2.

r(R)
A    B    C
A1   B1   C1
A2   B2   C1
A1   B2   C2
A3   B2   C3
A1   B1   C3
A2   B1   C4

r1(R1)
A    B
A1   B1
A2   B2
A1   B2
A3   B2
A2   B1

r2(R2)
A    C
A1   C1
A2   C1
A1   C2
A3   C3
A1   C3
A2   C4

r1 * r2
A    B    C
A1   B1   C1
A1   B1   C2    (extra)
A1   B1   C3
A2   B2   C1
A2   B2   C4    (extra)
A1   B2   C1    (extra)
A1   B2   C2
A1   B2   C3    (extra)
A3   B2   C3
A2   B1   C1    (extra)
A2   B1   C4

It is NOT a loss-less-join decomposition, since r1 * r2 ≠ r

Here r1 * r2 contains five additional tuples (marked "extra" above), which are not found in r.
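Both cases can be checked mechanically by projecting r onto (A, B) and (A, C) and joining the projections back. The following Python sketch assumes tuples are stored as dictionaries; the helper names project, natural_join, is_lossless_on and make are hypothetical. It tests one relation instance only, not the schema as a whole.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a dict."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two projections produced by `project`."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(sorted(merged.items())))
    return joined

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rows`."""
    whole = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == whole

def make(rows):
    return [dict(zip("ABC", row)) for row in rows]

case1 = make([("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B1", "C2"),
              ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B2", "C4")])
case2 = make([("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B2", "C2"),
              ("A3", "B2", "C3"), ("A1", "B1", "C3"), ("A2", "B1", "C4")])

print(is_lossless_on(case1, ["A", "B"], ["A", "C"]))  # True
print(is_lossless_on(case2, ["A", "B"], ["A", "C"]))  # False
```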

Condition for a Decomposition to be a Loss-less-join Decomposition

A decomposition of Relation Schema R into R1 and R2 (such that R1 ∪ R2 = R) will be a Loss-Less-Join (Non-Additive) Decomposition, if the common attributes of R1 and R2 (i.e. R1 ∩ R2) form a key of either R1 or R2 or both,

i.e. R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2

In the above example, in Case I, the relation r satisfies FD A → B. Thus, the common attribute of r1 and r2 i.e. {A} forms a key of R1 (A, B). That is why the decomposition is a loss-less-join decomposition; whereas in Case II, r satisfies neither A → B nor A → C, so such a condition does not hold; and that is why the decomposition is a Lossy (Additive) decomposition.

Heath’s Theorem If a relation schema R(α, β, γ) has an FD α → β holding on it, then it can have a loss-less-join decomposition:-

R1 (α, β) and R2 (α, γ).

Since R1 ∩ R2 = {α} → R1

N-ary Loss-less-join Decomposition

An N-ary decomposition of R into R1, R2, R3, …, Rn (such that R1 ∪ R2 ∪ R3 ∪ … ∪ Rn = R) is said to be a Loss-less-join (Non-Additive) Decomposition, if each of the decompositions Ri satisfies the following:-

Ri ∩ Rk → Ri
OR
Ri ∩ Rk → Rk

where Rk is the union of all decompositions R1, R2, R3, …, Rn, except Ri

Restriction of FDs to a Decomposition

Let F be the set of FDs holding on a relation schema R and let Ri ⊆ R be a decomposition of R. Then the restriction of F to decomposition Ri (denoted by Fi) is defined as:-

Fi = {(α → β) | (α → β) ∈ F+, α ⊆ Ri, β ⊆ Ri}

i.e. Fi is the set of FDs that belong to F+ AND involve only the attributes of Ri.

Dependency-Preserving Decomposition

Let F be the set of FDs holding on a schema R, having a decomposition (R1, R2, …, Rn) such that R1 ∪ R2 ∪ R3 ∪ … ∪ Rn = R.

Let {F1, F2, …, Fn} be the restrictions of F to R1, R2, …, Rn respectively.

Let F’ = F1 ∪ F2 ∪ … ∪ Fn

The decomposition (R1, R2, …, Rn) is said to be “Dependency-Preserving” if F’+ = F+ i.e. each FD of F+ must either be preserved in at least one of the decompositions or be logically implied by the FDs so preserved; else the decomposition is called non-dependency-preserving.

Example:-

Suppose a relation schema R (A, B, C) has FD set F holding on it:-

F: {A → B, B → C}

And let us consider the decomposition R1 (B, C) and R2 (A, B).

Restriction of F to R1 = F1 = {B → C}

Restriction of F to R2 = F2 = {A → B}

F’ = F1 ∪ F2 = {A → B, B → C}

Obviously, F’+ = F+; thus it is a Dependency-Preserving Decomposition.

Also R1 ∩ R2 = {B} is the primary key of R1 (since B → C), thus making it a loss-less-join decomposition.

Thus, this decomposition is a “Dependency-Preserving” and “Loss-less-join” Decomposition.

Now consider the Decomposition R1 (A, C) and R2 (A, B):-

Here, R1 ∩ R2 = {A} forms a primary key of both R1 and R2, making it a loss-less-join decomposition.

Restriction of F to R1 = F1 = {A → C}

Restriction of F to R2 = F2 = {A → B}

F’ = F1 ∪ F2 = {A → B, A → C}

Obviously, F’+ ≠ F+, since one of the FDs (i.e. B → C) is lost in the decomposition and is not logically implied by the FDs in the set F’. Thus, this decomposition is a loss-less-join decomposition but not a dependency-preserving decomposition.
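The two decompositions above can also be checked programmatically. The sketch below does not enumerate F+ or the restrictions Fi; instead it uses a closure-based test that is commonly given in database textbooks: for each FD α → β in F, it repeatedly extends α with whatever can be determined using only the attributes inside a single Ri, and then checks whether β has been reached. The function names closure and preserves_dependencies are hypothetical, and FDs are assumed to be (left side, right side) pairs of attribute sets.

```python
def closure(attrs, fds):
    """Attribute-set closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, parts):
    """True if every FD in `fds` is implied by the restrictions to `parts`."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes of this part determined by what we already
                # have inside this part.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True

# F = {A -> B, B -> C} on R(A, B, C)
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies(F, [{"B", "C"}, {"A", "B"}]))  # True
print(preserves_dependencies(F, [{"A", "C"}, {"A", "B"}]))  # False
```

The second call fails on B → C: within R2 (A, B) the attribute B determines nothing new, and B does not occur in R1 (A, C) at all.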

Why is it desirable that a decomposition should be dependency-preserving:-

For any decomposition, it is mandatory that it should be a loss-less-join decomposition to ensure consistency of the database; and it is also desirable (not mandatory) that it should be dependency-preserving, for the following reason:-

If the FDs are preserved in the decompositions, then the satisfaction of each FD can be verified within a single relation itself; else it would require a natural join of more than one relation to verify those FDs which are not preserved in the decomposition. A natural join operation to ascertain compliance with FDs would be too costly in terms of execution time and memory requirements. Sometimes, while normalizing a schema, it may not be possible to preserve all the FDs. But ensuring that a decomposition is a loss-less-join decomposition is mandatory, since no compromise is possible on the consistency of a database.

So, dependency-preservation is a desirable criterion, NOT A MANDATORY ONE. Whereas, ensuring Loss-Less-Join (Non-Additive Join) Decomposition is a mandatory criterion.

Algorithm to determine whether a Decomposition (R1, R2, R3, …, Rn) of R is a Dependency-Preserving Decomposition or not.

Let F be the set of FDs holding on R.

Compute Minimal Cover Fc of F.

If, for each FD α → β in Fc, αβ ⊆ Ri for some i (1 ≤ i ≤ n)

    Then it is a Dependency-Preserving Decomposition

    Else the test fails for some FD of Fc. Such an FD may still be logically implied by F’; only if it is not implied by F’ is the decomposition a Non-Dependency-Preserving Decomposition.

NORMALIZATION

First Normal Form (1 NF) A Schema R is said to be in first normal form (1 NF) if all

its attributes have only atomic domains i.e. domains of all attributes have only indivisible

values. Alternatively, it can be stated that for a Relation Schema R to be in 1 NF, all its

attributes should be “simple” and “single-valued”. Each field in each tuple of that relation

must have only one value from the respective domain or a “NULL” value.

Let there be a relation schema EMP (E#, E_Name, Salary, Tel_No)

An employee in EMP may have none/one/more-than-one Tel_No.

Then a Table (Relation) may be represented as :-

UN-NORMALIZED TABLE

E#    E_Name   Salary   Tel_No
001   Ajay     200000   {9810222777, 2449227, 9422230230}
002   Vijay    50000    {NULL}
003   Ram      100000   {9810345567}

The Field Tel_No in a tuple has a set of values. Such a Table is said to be Un-normalized and it is not in First Normal Form.

The above table can be transformed to First Normal Form as follows:-

NORMALIZED TABLE

E#    E_Name   Salary   Tel_No
001   Ajay     200000   9810222777
001   Ajay     200000   2449227
001   Ajay     200000   9422230230
002   Vijay    50000    NULL
003   Ram      100000   9810345567

A tuple in the Un-Normalized Table is replaced by as many tuples in the Normalized

Table as the number of telephones owned by the respective employee. Such a table is

called Normalized Table or Flat Table. It is in First Normal Form and has a lot of data

redundancy, which will be eliminated by further normalization of the Table.
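The flattening step can be sketched in Python as follows. This is an illustrative sketch only: the function name to_first_normal_form is hypothetical, the multi-valued field is modelled as a Python list, and None stands in for the NULL value.

```python
# The un-normalized EMP table; Tel_No holds a list of values.
unnormalized = [
    {"E#": "001", "E_Name": "Ajay", "Salary": 200000,
     "Tel_No": ["9810222777", "2449227", "9422230230"]},
    {"E#": "002", "E_Name": "Vijay", "Salary": 50000, "Tel_No": []},
    {"E#": "003", "E_Name": "Ram", "Salary": 100000,
     "Tel_No": ["9810345567"]},
]

def to_first_normal_form(rows, multi_valued):
    """Emit one flat tuple per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        values = row[multi_valued] or [None]   # no value -> one NULL tuple
        for value in values:
            flat.append({**row, multi_valued: value})
    return flat

emp_1nf = to_first_normal_form(unnormalized, "Tel_No")
for row in emp_1nf:
    print(row)
```

The employee with three telephone numbers contributes three tuples, so E_Name and Salary are repeated in each of them, which is the redundancy mentioned above.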

Full Functional Dependency Let there be a Relational Schema R with a Candidate Key K and a non-prime attribute A (K ⊆ R, A ∈ R). The Functional Dependency K → A is said to be a “Full Functional Dependency”, if attribute A cannot be determined by any proper subset of K i.e. there does not exist any K1 ⊂ K for which K1 → A holds.

Partial Functional Dependency Let there be a Relational Schema R with a Candidate Key K and a non-prime attribute A (K ⊆ R, A ∈ R). The Functional Dependency K → A is said to be a “Partial Functional Dependency”, if attribute A can be determined by a proper subset of K i.e. there exists K1 ⊂ K for which K1 → A holds.

Let R1 (A, B, C, D, E) be a Relation Schema with all its attributes (A..E) having only atomic domains; and let F: {AB → C, A → D, D → E} be the set of FDs holding on it.

By Armstrong’s Rule of Transitivity, {A → D, D → E} implies A → E.

So, A → E also holds on R1.

Since all attributes of R1 have only atomic domains, it is in 1 NF.

Since {A, B}+ = ABCDE, {A, B} forms a candidate key of this schema R1; and this is the only candidate key of R1.

So, A and B are the prime attributes of R1 and all other attributes i.e. C, D and E are non-prime attributes.

The non-prime attribute C is determined only by the full candidate key, since AB → C holds on R1. So, C is said to be fully functionally dependent on the candidate key and the FD AB → C is called a Full FD or Complete FD of R1.

However, the non-prime attributes D and E are determined by A alone, which is a proper subset of the candidate key {A, B}. Such a dependency is called a partial functional dependency; and such an FD causes certain Insert/Delete/Update anomalies, as demonstrated in the following example:-

Example:-

Consider a Schema SP1 (S#, P#, Sname, Scity, Status, Pname, Qty)

Where
S#     : Supplier Id number (Unique)
P#     : Part Id Number (Unique)
Sname  : Supplier Name
Scity  : Supplier City
Status : Supplier Status, which depends on Scity
Pname  : Part Name
Qty    : Quantity of a Part (P#) to be supplied by a Supplier (S#)

Suppose the following set of FDs holds on the Schema SP1:-

S# → Sname, Scity
Scity → Status
P# → Pname
{S#, P#} → Qty

An instance of a relation, defined on Schema SP1:-

S#   P#   Sname      Scity    Status   Pname         Qty
S1   P1   Avia       Mumbai   10       Aero-engine   5
S1   P2   Avia       Mumbai   10       Generator     5
S2   P1   Aero       Delhi    20       Aero-engine   2
S2   P3   Aero       Delhi    20       Altimeter     5
S3   P2   Air-supp   Mumbai   10       Generator     10
S3   P3   Air-supp   Mumbai   10       Altimeter     20

This relation, being in 1 NF, has the following Insert/Delete/Update-anomalies:-

(a) Insertion Anomalies:-

(i) Information about a supplier, like Sname, Scity can be inserted

only when the supplier is supplying at least one part.

(ii) Information about a Part like its Pname can be inserted only when

the part is being supplied by at least one supplier.

(iii) Information about a City like its Status can be inserted only when

there is at least one supplier from that city and the supplier is supplying at

least one part.

(b) Deletion Anomalies:-

(i) If a supplier is supplying only one part and that supply is concluded, then the tuple relating to that supply will be deleted. With the deletion of that tuple, we would lose complete information about the supplier i.e. its Name and City.

(ii) Suppose, a part is being supplied by only one supplier. On completion

of that supply, when the related tuple is deleted, we would lose the

information about the Name of that part.

(iii) Suppose, a city has only one supplier and that supplier is supplying

only one part. On deletion of the tuple of that particular supply, we would

lose the information about the Status of that City.

(c ) Update Anomalies There is a lot of unwanted data redundancy like:-

(i) Information about the Sname and Scity of a particular supplier will

be appearing as many times as the number of parts being supplied

by that supplier.

(ii) Information about the name of a particular part will be appearing

as many times as the number of supplies relating to that part.

(iii) Information about the Status of a City will be appearing as many

times as the number of supplies from the suppliers of that City.

The unwanted data redundancy requires the redundant information to be updated at multiple places. For example, when a Supplier moves from one city to another, the change has to be made in multiple tuples. An inconsistent update or a partial update will cause database inconsistency.

Second Normal Form (2 NF)

A Relation Schema R is said to be in Second Normal Form (2 NF), iff:-

(a) It is in 1 NF and

(b) Each non-prime attribute of R is fully functionally determined by the

candidate keys of R, that is, R does not involve any partial functional

dependencies.

The above relation schema R1 is not in 2 NF, since it involves a partial functional dependency A → DE.
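Partial functional dependencies can be detected by computing the closure of every proper subset of a candidate key and looking for non-prime attributes in it. The following Python sketch makes some assumptions: the candidate key and the set of non-prime attributes are supplied by the caller, only one candidate key is examined (a full 2 NF test would repeat this for every candidate key), and the names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute-set closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """Pairs (subset, attribute): a proper subset of `key` that determines
    a non-prime attribute."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):          # proper, non-empty subsets
        for subset in combinations(key, size):
            for a in sorted(closure(subset, fds) & set(non_prime)):
                found.append((set(subset), a))
    return found

# R1(A, B, C, D, E) with F = {AB -> C, A -> D, D -> E}, candidate key {A, B}
F = [({"A", "B"}, {"C"}), ({"A"}, {"D"}), ({"D"}, {"E"})]
print(partial_dependencies({"A", "B"}, {"C", "D", "E"}, F))
# [({'A'}, 'D'), ({'A'}, 'E')]
```

An empty result for every candidate key would mean that the schema has no partial functional dependencies.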

Decomposing a 1 NF Schema into 2 NF Schemas

Using Heath’s Theorem, R1 can be loss-less decomposed into R21 and R22.

R21 (A, D, E)    Primary Key {A}
    A → D, D → E
    By transitivity, A → E also holds on R21.

R22 (A, B, C)    Primary Key {A, B}
    Foreign Key {A} references R21
    AB → C

The decomposition of R1 into R21 and R22 is a Loss-less Join Decomposition since:-

R21 ∩ R22 → R21

Since R21 and R22 involve no partial functional dependencies, both are in 2 NF.

Now, let us consider Schema SP1

{S#, P#} is a candidate key of SP1 and this is the only candidate key of SP1.

So, S# and P# are prime attributes and all other attributes are non-prime.

Since S# → Scity and Scity → Status, S# → Status also holds.

SP1 involves the following partial FDs:-

S# → Sname, Scity, Status
AND
P# → Pname

Since SP1 has partial FDs, it is not in 2 NF.

SP1 can be decomposed into 2 NF Schemas S, P and SP2 in two steps as shown below:-

Step 1 Decompose SP1 into S and SP11, on the basis of partial FD S# → Sname, Scity, Status

S (S#, Sname, Scity, Status)    Primary Key {S#}
    S# → Sname, S# → Scity, Scity → Status
    By Transitivity, S# → Status

SP11 (S#, P#, Pname, Qty)    Primary Key {S#, P#}
    P# → Pname
    {S#, P#} → Qty

The above decomposition is a loss-less-join decomposition, since:-

S ∩ SP11 = {S#} → S, since S# → Sname, Scity, Status

The Schema S has no partial FD and so it is in 2 NF.

However, SP11 has a partial FD P# → Pname and it is still not in 2 NF.

Step 2 Decompose SP11 into P and SP2, on the basis of partial FD P# → Pname

P (P#, Pname)    Primary Key {P#}
    P# → Pname

SP2 (S#, P#, Qty)    Primary Key {S#, P#}
    {S#, P#} → Qty
    Foreign Key {S#} references S
    Foreign Key {P#} references P

The above decomposition is a loss-less-join decomposition, since:-

P ∩ SP2 = {P#} → P, since P# → Pname

Both P and SP2 have no partial FDs and are in 2 NF.

So, the 2 NF decomposition of SP1 is:-

S (S#, Sname, Scity, Status)    Primary Key {S#}
    S# → Sname, S# → Scity, Scity → Status
    By Transitivity, S# → Status

P (P#, Pname)    Primary Key {P#}
    P# → Pname

SP2 (S#, P#, Qty)    Primary Key {S#, P#}
    {S#, P#} → Qty
    Foreign Key {S#} references S
    Foreign Key {P#} references P

The projections of SP1 over S, P and SP2 are:-

S
S#   Sname      Scity    Status
S1   Avia       Mumbai   10
S2   Aero       Delhi    20
S3   Air-supp   Mumbai   10

P
P#   Pname
P1   Aero-engine
P2   Generator
P3   Altimeter

SP2
S#   P#   Qty
S1   P1   5
S1   P2   5
S2   P1   2
S2   P3   5
S3   P2   10
S3   P3   20
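These projections, and the fact that their natural join gives back the original SP1 instance, can be reproduced with a short Python sketch. The helper names project and natural_join are hypothetical, and tuples are modelled as dictionaries keyed by attribute name.

```python
COLUMNS = ["S#", "P#", "Sname", "Scity", "Status", "Pname", "Qty"]
SP1 = [dict(zip(COLUMNS, row)) for row in [
    ("S1", "P1", "Avia", "Mumbai", 10, "Aero-engine", 5),
    ("S1", "P2", "Avia", "Mumbai", 10, "Generator", 5),
    ("S2", "P1", "Aero", "Delhi", 20, "Aero-engine", 2),
    ("S2", "P3", "Aero", "Delhi", 20, "Altimeter", 5),
    ("S3", "P2", "Air-supp", "Mumbai", 10, "Generator", 10),
    ("S3", "P3", "Air-supp", "Mumbai", 10, "Altimeter", 20),
]]

def project(rows, attrs):
    """Projection onto `attrs`, dropping duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(r1, r2):
    """Join tuples that agree on all attributes they have in common."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                out.append({**t1, **t2})
    return out

S = project(SP1, ["S#", "Sname", "Scity", "Status"])
P = project(SP1, ["P#", "Pname"])
SP2 = project(SP1, ["S#", "P#", "Qty"])

rejoined = natural_join(natural_join(SP2, S), P)
same = (sorted(tuple(r[c] for c in COLUMNS) for r in rejoined)
        == sorted(tuple(r[c] for c in COLUMNS) for r in SP1))
print(len(S), len(P), len(SP2), same)  # 3 3 6 True
```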

The partial dependency related problems have been resolved, as explained

below:-

Insert:

Sname and Scity of a Supplier can now be inserted into table S, even when it is

not supplying even one part.

Pname of a part can now be inserted into table P, even when it is not being

supplied by any supplier.

Delete: When information about a supply is deleted from table SP2, we do not lose any

information about Sname, Scity or Pname.

Update

Information about Sname & Scity of a supplier now appears only in one tuple in

table S.

Information about Pname of a particular part now appears only in one tuple in

table P.

However, some anomalies still remain in the 2 NF Schema. For example, the Status of a city can be inserted in table S only when at least one supplier is available in that city. Also, if the only supplier from a city ceases to exist, we lose the information about the status of that city. And if multiple suppliers exist in a city, its status would appear in multiple tuples of table S. These anomalies will be eliminated in the next normal form.

In R21, the non-prime attribute E is dependent on the candidate key A, through another non-prime attribute D

i.e. A → D, D → E ⇒ A → E.

Such a Functional Dependency is called a Transitive Dependency.

Transitive Functional Dependency This refers to a situation wherein a non-

prime attribute of a Relation Schema R is dependent on its candidate key, through

another non-prime attribute of R. Such a FD also causes some

Insert/Delete/Update anomalies.

Third Normal Form (3NF)

A Relation Schema R is said to be in 3 NF, iff:-

(a) It is in 2 NF and

(b) Each non-prime attribute of R is non-transitively dependent on its

candidate keys i.e. R does not involve any Transitive Dependencies.

Thus, R22 is in 3 NF but R21 is not, since it involves a Transitive Dependency i.e. A → D, D → E ⇒ A → E.

Decomposition of a 2 NF Schema into 3 NF Schemas

R21 can be loss-less decomposed into R31 and R32.

R31 (D, E) Primary Key {D}
D → E

R32 (A, D) Primary Key {A}
A → D Foreign Key {D} references R31.

The decomposition of R21 into R31 and R32 is a Loss-less Join Decomposition since:-

R31 ∩ R32 = {D} → R31

R31 and R32 do not involve any Transitive Dependency; and are thus in 3 NF.

3 NF decomposition of R1:-

R31 (D, E) Primary Key {D}
D → E

R32 (A, D) Primary Key {A}
A → D Foreign Key {D} references R31.

R22 (A, B, C) Primary Key {A, B}
Foreign Key {A} references R32
AB → C

Let us now consider decompositions of SP1

The Schemas P and SP2 do not involve any Transitive Dependencies; and are thus already in 3NF. But, S has a Transitive Dependency i.e. S# → Scity, Scity → Status ⇒ S# → Status

This Transitive Dependency of Schema S can be eliminated by decomposing it on the basis of FD Scity → Status

STS (Scity, Status) Primary Key {Scity}
Scity → Status

SUPP (S#, Sname, Scity) Primary Key {S#}
Foreign Key {Scity} references STS
S# → Sname, Scity

Now, STS and SUPP do not have any Transitive Dependency; so both are in 3 NF.

The decomposition is loss-less, since:-

STS ∩ SUPP = {Scity} → STS, since Scity → Status
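The loss-less test used above (the common attributes of the two schemas must functionally determine one of them) can be sketched in Python. This is a minimal sketch, not part of the original notes: the helper names `closure` and `is_lossless_binary` are assumptions introduced for illustration, and FDs are written as pairs of Python sets.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (r1 ∩ r2) → r1 or (r1 ∩ r2) → r2 follows from `fds`."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


# FDs holding on S (S#, Sname, Scity, Status), as given in the text
FDS = [({"S#"}, {"Sname", "Scity"}), ({"Scity"}, {"Status"})]
STS = {"Scity", "Status"}
SUPP = {"S#", "Sname", "Scity"}

print(is_lossless_binary(STS, SUPP, FDS))
# A split with no common attribute cannot pass the test
print(is_lossless_binary({"S#", "Sname"}, {"Scity", "Status"}, FDS))
```

The first call corresponds to the decomposition of S into STS and SUPP; the second is a deliberately bad split added only for contrast.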

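Thus, the 3NF Decomposition of SP1 is:-

STS (Scity, Status) Primary Key {Scity}
Scity → Status

SUPP (S#, Sname, Scity) Primary Key {S#}
Foreign Key {Scity} references STS
S# → Sname, Scity

P (P#, Pname) Primary Key {P#}
P# → Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}
{S#, P#} → Qty Foreign Key {S#} references SUPP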

The projections of S over STS and SUPP are:-

SUPP
S#   Sname      Scity
S1   Avia       Mumbai
S2   Aero       Delhi
S3   Air-supp   Mumbai

STS
Scity    Status
Mumbai   10
Delhi    20

Now, all update anomalies have been resolved. Status of a particular city now

appears in only one tuple in table STS. Also, information about status of a city can

now be inserted irrespective of whether any supplier exists in that city or not.


BOYCE CODD NORMAL FORM (BCNF)

An alternate definition of a Relation Schema R to be in 3NF is as follows:-

3 NF A Relation Schema R is said to be in 3 NF, if each FD α→β holding on R satisfies

one of the following three conditions:-

(a) It is a trivial FD

OR (b) α is a Super Key of R

OR (c) Each attribute in the set (β – α ) is a prime attribute.

A Relation Schema in Third Normal Form (3 NF) may still be riddled with some

anomalies, under the situations when a schema has multiple candidate keys, which may

be composite and overlapping. Any relation, under such schema, may have some data

redundancies that would cause some update anomalies. For example, the schema:-

SP (S#, Sname, P#, Qty)

with FDs: {S#, P#} → Qty
          {Sname, P#} → Qty
          S# → Sname
          Sname → S#

holding on it.

The relation schema SP has two candidate keys i.e {S#, P#} and {Sname, P#}. Both the

candidate keys are composite and have one common attribute i.e. P#.

Set of Prime Attributes: {S#, Sname , P# }

Set of Non-Prime Attributes: { Qty}

The only non-prime attribute i.e. Qty is non-transitively and fully dependent on both the candidate keys. Thus, the schema SP is free of any partial dependencies or transitive dependencies and is thus in Third Normal Form (3NF).

This can also be verified from the fact that each FD satisfies one of the three conditions of the alternate definition of 3 NF: {S#, P#} and {Sname, P#} are Super Keys, while S# → Sname and Sname → S# have only prime attributes on their right sides.

Despite being in 3NF, any legal relation under the schema will have some data

redundancies, for example, the name of a particular supplier i.e. Sname will be repeated

as many times as the number of supplies being made by that supplier.

Thus, there is need to have a normal form, stronger than 3NF. The necessary solution is

provided by Boyce Codd Normal Form (BCNF).


BCNF A Relation Schema R is said to be in BCNF, if all non-trivial left-irreducible FDs that hold on R have only candidate keys as determinants. Alternately, we can state that a Relation Schema R would be in BCNF if each FD α→β holding on R satisfies one of the following two conditions:-

(a) It is a trivial FD

OR (b) α is a Super Key of R

The above two conditions, for BCNF, are same as the first two conditions of 3 NF. Thus,

if a schema is in BCNF, it must also be in 3NF. However, the third condition of 3 NF is

missing against the BCNF criteria, indicating that BCNF is more restrictive as compared

to 3NF. So, it is possible that a schema may be in 3NF but not in BCNF. Thus, BCNF is a

stronger normal form than 3NF.

Going by the definition of BCNF, SP is in 3 NF but not in BCNF, since the two FDs S# → Sname and Sname → S# are neither trivial nor have Super Keys on their left sides.

Alternatively, it can be stated that a Relational Schema R will be in BCNF if each non-trivial left-irreducible FD α→β, holding on R, has only a Candidate Key on its left side i.e. α must be a Candidate Key of R.
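The 3 NF and BCNF conditions stated above can be checked mechanically for the schema SP. The following is a minimal sketch under stated assumptions: the function names (`closure`, `candidate_keys`, `violations`) are illustrative, candidate keys are found by brute force (adequate only for small schemas), and only the listed FDs are tested, which is sufficient when checking the original schema against its own FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations(schema, fds, form):
    """FDs in `fds` that violate the given normal form ('3NF' or 'BCNF')."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue  # condition (a) trivial FD, or (b) lhs is a Super Key
        if form == "3NF" and (rhs - lhs) <= prime:
            continue  # condition (c): only prime attributes on the right
        bad.append((lhs, rhs))
    return bad


SP = {"S#", "Sname", "P#", "Qty"}
FDS = [
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
]

print(candidate_keys(SP, FDS))
print(violations(SP, FDS, "3NF"))   # no violating FD is reported
print(violations(SP, FDS, "BCNF"))  # the two FDs between S# and Sname
```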

Decomposition of SP into a BCNF schema

SP can be decomposed into BCNF Schemas, on the basis of the FDs that violate BCNF i.e. S# → Sname and Sname → S#. The resulting BCNF decompositions of SP will be:-

S (S#, Sname) Primary Key {S#} or {Sname}
S# → Sname, Sname → S#

SP1 (S#, P#, Qty) Primary Key {S#, P#}
Foreign Key {S#} references S
{S#, P#} → Qty

OR

S (S#, Sname) Primary Key {S#} or {Sname}
S# → Sname, Sname → S#

SP2 (Sname, P#, Qty) Primary Key {Sname, P#}
Foreign Key {Sname} references S
{Sname, P#} → Qty


It can be verified that the decompositions are loss-less-join decompositions.

Algorithm to decompose a Non-3NF Schema into a set of 3NF Schemas

Let R be a relation schema that is not in 3NF, and let F be the set of FDs holding on R.

Determine the Canonical Cover Fc of F.
Determine the set of Candidate Keys {K1, K2, …, Kn} of R.
K’ = R - {K1 ∪ K2 ∪ … ∪ Kn}; /* Set of Non-Prime Attributes of R */
S = {R}; /* S is a set of Relation Schemas */
WHILE (there exists a Non-3NF Schema Ri ∈ S) DO
    FOR (each Non-Trivial Left-Irreducible FD α→β holding on Ri) DO
        IF ((α is not a candidate key of Ri) AND ({β - α} ∩ K’ ≠ ∅))
        THEN S = {S - Ri} ∪ (Ri - β) ∪ (α, β);
        /* Replace the Schema Ri by two schemas (Ri - β) and (α, β) */

At the end, S would comprise a set of 3NF Schemas, equivalent to R.

Algorithm to decompose a non-BCNF Schema into a set of BCNF Schemas

Let R be a relation schema that is not in BCNF, and let F be the set of FDs holding on R.

Determine the Canonical Cover Fc of F.
S = {R}; /* S is a set of Relation Schemas */
WHILE (there exists a Non-BCNF Schema Ri ∈ S) DO
    FOR (each Non-Trivial Left-Irreducible FD α→β holding on Ri) DO
        IF (α is not a candidate key of Ri)
        THEN S = {S - Ri} ∪ (Ri - β) ∪ (α, β);
        /* Replace the Schema Ri by two schemas (Ri - β) and (α, β) */

At the end, S would comprise a set of BCNF Schemas, equivalent to R.

Example:- Decompose the following Schema into a set of BCNF Schemas.

SP ( S#, Sname, P#, Pname, Scity, Status, Qty )

The set of FDs holding on SP:-


S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname
{S#, P#} → Qty
{Sname, P#} → Qty

The candidate keys of SP are:- {S#, P#}, {Sname, P#}

It can be verified that all the FDs indicated above are non-trivial and left-irreducible; and the following FDs do not have a Candidate Key on the left side:-

S# → Sname, Scity
Sname → S#, Scity
Scity → Status
P# → Pname

Thus SP is not in BCNF.

Now, applying the above algorithm to convert SP into a BCNF Schema:-

Let S’ := { SP };

SP has an FD Scity → Status | ((Scity is not a candidate key of SP) && (Scity ∩ Status = ∅))

S’ := (S’ - SP) ∪ SP1 ∪ STS
where SP1 = ( S#, Sname, P#, Pname, Scity, Qty )
and STS = (Scity, Status)

Now, S’ = {SP1, STS}

Again, SP1 has an FD P# → Pname | ((P# is not a candidate key of SP1) && (P# ∩ Pname = ∅))

S’ := (S’ - SP1) ∪ SP2 ∪ P
where SP2 = ( S#, Sname, P#, Scity, Qty )
and P = (P#, Pname)

Now, S’ = {SP2, STS, P}

Still, SP2 has an FD S# → {Sname, Scity} | ((S# is not a candidate key of SP2) && (S# ∩ {Sname, Scity} = ∅))

S’ := (S’ - SP2) ∪ SP3 ∪ SUPP
where SP3 = ( S#, P#, Qty )
and SUPP = (S#, Sname, Scity)

Now, S’ = {SP3, STS, P, SUPP}

Finally, all the schemas in S’ are now in BCNF.

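Now, the BCNF equivalent schema of SP is:-

STS (Scity, Status) Primary Key {Scity}
Scity → Status

P (P#, Pname) Primary Key {P#}
P# → Pname

SUPP (S#, Sname, Scity) Primary Key {S#}
S# → {Sname, Scity}

SP3 (S#, P#, Qty) Primary Key {S#, P#}
Foreign Key {S#} references SUPP
{S#, P#} → Qty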

All the relation schemas in the above decomposition are free of any partial dependencies

and transitive dependencies. Also, all the FDs have only candidate keys (of the respective

schemas) as their determinants. Thus, all the relation schemas are in BCNF.

Also, the decomposition is loss-less join decomposition, since:-

SP3 ∩ P = {P#} → P
SP3 ∩ SUPP = {S#} → SUPP
SUPP ∩ STS = {Scity} → STS
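The BCNF decomposition algorithm and the worked example above can be sketched in Python. This is a hedged illustration rather than the notes' exact procedure: instead of computing a canonical cover, the sketch (with hypothetical helper names `find_violation` and `bcnf_decompose`) looks for a violating determinant α on each sub-schema by brute force over attribute subsets, takes β as everything α determines inside that sub-schema, and then applies the same replacement step, Ri by (Ri - β) and (α, β).

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on `schema`, else None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            alpha = set(combo)
            beta = (closure(alpha, fds) & schema) - alpha
            # Violation: alpha determines something, but not the whole schema
            if beta and not (alpha | beta) >= schema:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Repeatedly replace a non-BCNF schema Ri by (Ri - beta) and (alpha, beta)."""
    done, todo = [], [set(schema)]
    while todo:
        ri = todo.pop()
        found = find_violation(ri, fds)
        if found is None:
            done.append(ri)
        else:
            alpha, beta = found
            todo.append(ri - beta)
            todo.append(alpha | beta)
    return done


SP = {"S#", "Sname", "P#", "Pname", "Scity", "Status", "Qty"}
FDS = [
    ({"S#"}, {"Sname", "Scity"}),
    ({"Sname"}, {"S#", "Scity"}),
    ({"Scity"}, {"Status"}),
    ({"P#"}, {"Pname"}),
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
]

for schema in bcnf_decompose(SP, FDS):
    print(sorted(schema))
```

For this input the sketch arrives at the same four schemas as the worked example, although in general the result of a BCNF decomposition depends on the order in which violating FDs are picked.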

Is BCNF a stronger normal form than 3NF ?

Yes, a relation in 3NF may not be in BCNF. But, a relation in BCNF will definitely be in 3NF also, since BCNF is more restrictive as compared to 3 NF. For example, the schema SP (S#, Sname, P#, Qty) discussed earlier is in 3NF but not in BCNF, whereas the schemas SP3, P, STS and SUPP obtained above are all in BCNF and are therefore also in 3NF. Thus, BCNF is a stronger normal form than 3NF.

In fact, we can state that a relation schema in BCNF will be free of all those data

anomalies that can be eliminated on the basis of functional dependencies (FDs).


ABU’s Algorithm to determine whether a given Decomposition of a Relational Schema R is a Loss-less-join Decomposition or not.

ABU’s Algorithm can be used to determine whether a Decomposition *(R1, R2, …, Rn) of Schema R is a loss-less-join decomposition or not.

Let the Relational Schema R be of degree m i.e. R (A1, A2, …, Am)

Let F be the set of FDs holding on the Schema R.

The ABU’s Algorithm operates as follows:-

Step 1 Make a matrix M of size n x m, with column “j” corresponding to Attribute Aj (1 ≤ j ≤ m) and row “i” corresponding to a projection Ri (1 ≤ i ≤ n).

Step 2 Initialize the Matrix M as follows:-

for i := 1 to n do
    for j := 1 to m do
        if Aj ∈ Ri
        then M [i, j] := aj;
        else M [i, j] := bij;

Step 3 Repeat
    Save-M = M;
    For each FD (α→β) ∈ F
        if any two Rows of Matrix M match on the values of α
        then force those two rows to match on the values of β, by replacing “b” values by corresponding “a” values.
        (If the corresponding “a” value does not exist for a pair of cells to be matched, then replace both the cells by one of the corresponding “b” values.)
Until (M = Save-M);

Step 4 Test of Loss-less-join Decomposition:

if (any of the rows of M contains only “a” values)
then the decomposition is a loss-less-join decomposition
else it is a lossy decomposition.
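The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `chase_lossless` and the tuple encoding of the "a" and "b" symbols are choices made for this illustration, columns are numbered from 0, and when two symbols are equated the surviving symbol is renamed throughout its column (the usual way of keeping the matrix consistent).

```python
def chase_lossless(attrs, decomposition, fds):
    """ABU-style matrix test: True if the decomposition is a loss-less join."""
    col = {a: j for j, a in enumerate(attrs)}
    # Steps 1 and 2: ("a", j) where Aj is in Ri, otherwise ("b", i, j)
    m = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attrs)]
         for i, ri in enumerate(decomposition)]
    # Step 3: repeat until no FD changes the matrix
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for p in range(len(m)):
                for q in range(p + 1, len(m)):
                    if all(m[p][col[a]] == m[q][col[a]] for a in lhs):
                        for b in rhs:
                            j = col[b]
                            x, y = m[p][j], m[q][j]
                            if x != y:
                                keep = min(x, y)  # an "a" symbol sorts before any "b"
                                drop = y if keep == x else x
                                for row in m:
                                    if row[j] == drop:
                                        row[j] = keep
                                changed = True
    # Step 4: loss-less iff some row holds only "a" symbols
    return any(all(cell[0] == "a" for cell in row) for row in m)


# A small illustrative schema R(A, B, C) split into (A, B) and (B, C)
print(chase_lossless(["A", "B", "C"], [{"A", "B"}, {"B", "C"}], [({"B"}, {"C"})]))
print(chase_lossless(["A", "B", "C"], [{"A", "B"}, {"B", "C"}], []))
```

With the FD B → C the split is loss-less; without any FD the same split is lossy, since no row ever becomes all "a".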

Example:-

Using ABU’s Algorithm, determine whether the following decomposition of

SP (S#, Sname, Scity, Status, P#, Pname, Price, Qty) is a loss-less-join decomposition?

Decomposition:-


CS (Scity, Status)

SUPP(S#, Sname, Scity)

PART (P#, Pname, Price)

SPN (S#, P#, Qty)

FDs Holding on SP:-

S# → Sname, Scity
Scity → Status
P# → Pname, Price
{S#, P#} → Qty

Since S# → Scity and Scity → Status, thus S# → Status

Therefore, S# → Sname, Scity, Status

ABU’s Algorithm

Step 1 and Step 2: Make a matrix of size 4 x 8 and initialize it.

           S#    Sname  Scity  Status  P#    Pname  Price  Qty
           0     1      2      3       4     5      6      7
CS     0   b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1   a0    a1     a2     b13     b14   b15    b16    b17
PART   2   b20   b21    b22    b23     a4    a5     a6     b27
SPN    3   a0    b31    b32    b33     a4    b35    b36    a7

Step 3:

Applying the FD Scity → Status, rows 0 and 1 match on the value of Scity, so force these two rows to match on the value of Status. Thus replace b13 in row 1 by a3.

           0     1      2      3       4     5      6      7
CS     0   b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1   a0    a1     a2     a3      b14   b15    b16    b17
PART   2   b20   b21    b22    b23     a4    a5     a6     b27
SPN    3   a0    b31    b32    b33     a4    b35    b36    a7

Now, applying the FD P# → Pname, Price, rows 2 and 3 match on the value of P#, so force these two rows to match on the value of Pname and Price. Thus replace b35 in row 3 by a5 and replace b36 in row 3 by a6.

           0     1      2      3       4     5      6      7
CS     0   b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1   a0    a1     a2     a3      b14   b15    b16    b17
PART   2   b20   b21    b22    b23     a4    a5     a6     b27
SPN    3   a0    b31    b32    b33     a4    a5     a6     a7

Now, applying the FD S# → Sname, Scity, Status, rows 1 and 3 match on the value of S#, so force these two rows to match on the value of Sname, Scity and Status. Thus replace b31 in row 3 by a1; replace b32 in row 3 by a2; and replace b33 in row 3 by a3.

           0     1      2      3       4     5      6      7
CS     0   b00   b01    a2     a3      b04   b05    b06    b07
SUPP   1   a0    a1     a2     a3      b14   b15    b16    b17
PART   2   b20   b21    b22    b23     a4    a5     a6     b27
SPN    3   a0    a1     a2     a3      a4    a5     a6     a7

Step 4 Row 3 contains only “a” values; therefore the above decomposition is a loss-less-join decomposition of SP.


Multi-Valued Dependencies (MVDs and 4 NF)

Definition of MVD

A relation schema R (α, β, γ) is said to have multi-valued dependencies α →→ β (α multi-determines β) and α →→ γ (α multi-determines γ), if and only if for every legal relation r(R) and for each tuple-pair {t1, t2} ∈ r that satisfies t1[α] = t2[α], t1[β] ≠ t2[β] and t1[γ] ≠ t2[γ], there exists a tuple-pair {t3, t4} ∈ r that satisfies t3[α] = t4[α] = t1[α] = t2[α] and t3[β] = t1[β] & t4[β] = t2[β] and t3[γ] = t2[γ] & t4[γ] = t1[γ]. Non-Trivial MVDs always occur in pairs, like α →→ β and α →→ γ; both can be jointly denoted as α →→ β | γ.

Trivial MVD

An MVD α →→ β holding on a schema R is said to be trivial, iff:-

(a) β ⊆ α or

(b) α ∪ β = R

Non-Trivial MVDs occur only in pairs.

MVDs and a Loss-less Join Decomposition

Fagin’s Theorem

If a relation schema R (α, β, γ) has an MVD α →→ β | γ holding on it, then it can be loss-less decomposed into R1 (α, β) and R2 (α, γ).

Inference Rules for MVDs

1. Complementation Rule If α →→ β holds on a Schema R where (α ⊆ R, β ⊆ R), then α →→ R - (α ∪ β) will also hold on R.

This rule implies that all non-trivial MVDs will occur only in pairs.

2. Transitivity Rule If α →→ β and β →→ γ, then α →→ (γ - β)

3. Union Rule If α →→ β and α →→ γ, then α →→ β ∪ γ,
α →→ β - γ,
α →→ γ - β

4. Augmentation Rule If α →→ β, then αγ →→ βδ where δ ⊆ γ

5. Replication Rule If α → β, then α →→ β.

6. Coalescence Rule If α →→ β and δ → γ, where γ ⊆ β and δ ∩ β = ∅, then α → γ.

7. Mixed Transitivity Rule If α →→ β and β → γ, then α → (γ - β)

Problem Let R = (A, B, C, G, H, I) be a relation schema, with the following set of dependencies holding on it:-

D = { A →→ B, B →→ HI, CG → H }

Find whether the following dependencies are members of D+ :-

A →→ CGHI
A →→ HI
B → H
A →→ CG
A → H

Sol:

Since A →→ B, so by complementation A →→ (R - A - B) = CGHI

Since A →→ B and B →→ HI, so by transitivity A →→ HI - B = HI

Since B →→ HI and CG → H, where H ⊆ HI and HI ∩ CG = ∅, therefore, by Coalescence Rule, B → H

Since A →→ CGHI and A →→ HI, therefore by Union Rule A →→ CGHI - HI = CG

Since A →→ HI and CG → H, where H ⊆ HI and HI ∩ CG = ∅, therefore, by Coalescence Rule, A → H

Thus, all five dependencies are members of D+.

Problem Consider a Relational Schema R (A, B, C, D, E). Let the set of MVDs holding on R be { A →→ BC, B →→ CD and E →→ AD }. Determine its loss-less 4NF decomposition.

Solution

R (A, B, C, D, E)

M = { A →→ BC, B →→ CD, E →→ AD }

Since the MVD A →→ BC holds on R, it will also satisfy A →→ (R - BC) - A i.e. A →→ DE

Thus, by Fagin’s Theorem, R can be loss-less decomposed into:-

R1 (A, B, C)
R2 (A, D, E).

Similarly, since MVDs B →→ CD and E →→ AD hold on R, it can be proved that the following are also loss-less & 4 NF decompositions of R :-

R1 (B, C, D)
R2 (B, A, E)

OR

R1 (E, A, D)
R2 (E, B, C)


BCNF to 5 NF

A Relation Schema is said to be in Boyce Codd Normal Form (BCNF), if all non-trivial

left-irreducible Functional Dependencies (FDs), holding on the schema, have only its

Candidate Keys as their Determinants. Any relation defined on such a schema will be

free of all those data anomalies that can be eliminated on the basis of FDs. However,

there may still be some residual data redundancies persisting in BCNF relations, causing

insert/delete/update anomalies. So, we have to look beyond FDs, for the elimination of

such anomalies in BCNF Schemas.

A SCHEMA IN BCNF AND STILL HAVING SOME DATA REDUNDANCIES,

CAUSING ANOMALIES:-

Let us define a Schema CTX (Course, Teacher, Text) with the following

constraints:-

(a) A Course can be taught by more than one Teacher.

(b) A number of Text Books can be followed for teaching a Course.

(c) The set of Text Books followed for teaching a Course is determined only

by the Course taught and is completely independent of the Teacher

teaching it. Thus, the attributes Teacher and Text are completely

independent of each other.

There exists a one-to-many cardinality from Course to Teacher and also from Course to Text, but there is absolutely no relationship between Teacher & Text. This situation represents an MVD Course →→ Teacher | Text.

A Relation ctx, satisfying the above constraints:-

ctx:

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Vivek     Dietel
OS       Ravi      Dietel
OS       Vivek     Galvin
CO       Ram       Hamacher
CO       Shyam     M-mano
CO       Ram       M-mano
CO       Shyam     Hamacher

As indicated in ctx, the schema CTX does not have any non-trivial FDs. Thus, it is an

“All-Key” schema and all legal relations under this schema will be in BCNF. But, this

relation still has the following data anomalies:-

(a) The information that a particular Teacher is teaching a particular course is

represented as many times as the number of Text books followed for that

particular Course.

(b) The information that a particular Text Book is followed for a particular

Course is represented as many times as the number of Teachers teaching the

particular Course.

How to eliminate these data anomalies?

These anomalies are due to the non-trivial Multi Valued Dependencies (MVDs)

holding on the schema CTX.

Multi Valued Dependency (MVD)

Let there be a relation schema R (α, β, γ). It is said to have an MVD from α to β (denoted as α →→ β) and from α to γ (denoted as α →→ γ), if and only if for every legal relation r(R) it satisfies the following:-

(a) The set of β-values, matching a given {α-value, γ-value} pair, is dependent only on the α-value and is completely independent of the γ-value.

And

(b) The set of γ-values, matching a given {α-value, β-value} pair, is dependent only on the α-value and is independent of the β-value.

Alternately, we can state that a relation schema R (α, β, γ) is said to have multi-valued dependencies α →→ β and α →→ γ (both denoted by α →→ β | γ), if and only if for every legal relation r(R), and for a tuple-pair {t1, t2} ∈ r with t1[α] = t2[α], there exists a tuple-pair {t3, t4} ∈ r, which satisfies:-

t1[α] = t2[α] = t3[α] = t4[α]
and t1[β] = t3[β], t2[β] = t4[β],
and t1[γ] = t4[γ], t2[γ] = t3[γ]

As per this definition, the relation ctx satisfies the MVD Course →→ Teacher | Text

Since, {OS, Galvin} → {Ravi, Vivek} and {OS, Dietel} → {Ravi, Vivek}

So, the set of Teachers, teaching a particular Course, is dependent only on the Course taught and is completely independent of the Texts followed for the particular Course.

Similarly, {OS, Ravi} → {Galvin, Dietel} and {OS, Vivek} → {Galvin, Dietel}

So, the set of Texts, followed for a Course, depends only on the Course taught and is independent of the Teacher teaching it.
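The tuple-pair definition of an MVD can be tested directly on a relation instance. The sketch below is an illustration only: `satisfies_mvd` and `rel` are hypothetical helper names, and a relation is represented as a list of Python dictionaries. For every pair of tuples agreeing on α, it checks that the tuple taking its β-values from the first and its remaining values from the second is also present.

```python
def satisfies_mvd(rows, alpha, beta):
    """Check alpha ->-> beta on a relation given as a list of dicts."""
    attrs = sorted(rows[0])
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 agrees with t1 on beta and with t2 on everything else
                t3 = dict(t2)
                for b in beta:
                    t3[b] = t1[b]
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


def rel(*triples):
    """Build a relation over (Course, Teacher, Text) from value triples."""
    return [dict(zip(("Course", "Teacher", "Text"), t)) for t in triples]


ctx = rel(
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"),
    ("OS", "Ravi", "Dietel"), ("OS", "Vivek", "Galvin"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"),
    ("CO", "Ram", "M-mano"), ("CO", "Shyam", "Hamacher"),
)

print(satisfies_mvd(ctx, ["Course"], ["Teacher"]))
print(satisfies_mvd(ctx, ["Course"], ["Text"]))

# Dropping a single tuple from ctx breaks the MVD
broken = [r for r in ctx if (r["Teacher"], r["Text"]) != ("Vivek", "Galvin")]
print(satisfies_mvd(broken, ["Course"], ["Teacher"]))
```

Such a check only confirms that one particular relation satisfies the MVD; whether the MVD holds on the schema is a statement about every legal relation.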

Trivial MVD An MVD α →→ β, holding on a relation schema R, is said to be trivial, if:-

(a) β ⊆ α or

(b) α ∪ β = R

Such MVDs are termed to be trivial, since these are satisfied by every relation on a schema R.

MVD is a generalization of FD An FD α → β implies an MVD α →→ β, wherein the set of β values, matching a given α value, will be a singleton set.

Fagin’s Theorem If a relation schema R (α, β, γ) satisfies an MVD α →→ β | γ, then R can be loss-less decomposed into schemas R1 (α, β) and R2 (α, γ). This implies that any legal relation r, on the schema R (α, β, γ), will be equal to the equi-join of its projections on (α, β) and (α, γ); that is r = Π_{α,β}(r) * Π_{α,γ}(r)

Fourth Normal Form (4 NF)

A relation schema R is said to be in 4 NF, if and only if every MVD α →→ β holding on R satisfies either of the following two conditions:-

(a) It is a trivial MVD or

(b) α is a Super Key of R

Now, the schema CTX has the MVDs Course →→ Teacher and Course →→ Text. These are not trivial MVDs. Also, Course is not a Super Key of CTX, since CTX is an “All Key” Relation Schema. Thus, CTX is in BCNF; but not in 4 NF.

Non-loss Decomposition, based on MVDs

As per Fagin’s Theorem, a relation schema R (α, β, γ) satisfying MVDs α →→ β | γ can be loss-less decomposed into schemas R1 (α, β) and R2 (α, γ). So, CTX can be decomposed into CT and CX.

ct

COURSE   TEACHER
OS       Ravi
OS       Vivek
CO       Ram
CO       Shyam

cx

COURSE   TEXT
OS       Galvin
OS       Dietel
CO       Hamacher
CO       M-mano

As evident, ct * cx = ctx

The relations ct and cx are not satisfying any non-trivial MVDs; thus both CT & CX are

in 4 NF.

The relations are free of the data redundancies indicated above, which existed in ctx. The

information of a Teacher teaching a particular Course is now represented only in one

tuple in ct and the information regarding a Text being followed for a particular Course is

represented only at one place in cx.

A relation schema R in BCNF, not having any non-trivial MVDs holding on it, will be in

4 NF. But, it may still have some data anomalies, for example consider a schema CTX4,

with the following constraints:-

(a) A Course may be taught by a number of Teachers.

(b) A number of Text books may be followed for a Course.

(c) The set of Texts, followed for a Course, depend not only on the Course

but also on the Teacher teaching it. It means that each teacher teaching a

particular course may follow different sets of text books; the sets may be

overlapping.

(d) If a Teacher T1, teaching a Course C1, does not follow a Text X1, which is

being followed by another Teacher T2 to teach the course C1, then T1 must not

follow X1 for any other Course, which he may be teaching.

ctx4

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Vivek     Dietel
OS       Ravi      Dietel
CO       Ram       Hamacher
CO       Shyam     M-mano
CO       Shyam     Hamacher


So, the set of TEXT-values that occur matching a given {COURSE-value, TEACHER-value} pair in ctx4 depends not only on COURSE but also on TEACHER. So, it does not satisfy the MVD COURSE →→ TEACHER | TEXT. So, the schema CTX4 does not have any non-trivial MVDs holding on it. So, it is in 4 NF. But ctx4 still has some data redundancies, like the information about a teacher teaching a course appears as many times as the number of text books followed by that teacher for that Course.

Since CTX4 does not satisfy the MVD COURSE →→ TEACHER | TEXT, it can be verified that ct * cx ≠ ctx4

ct

COURSE   TEACHER
OS       Ravi
OS       Vivek
CO       Ram
CO       Shyam

cx

COURSE   TEXT
OS       Galvin
OS       Dietel
CO       Hamacher
CO       M-mano

ct * cx

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Ravi      Dietel
OS       Vivek     Galvin
OS       Vivek     Dietel
CO       Ram       Hamacher
CO       Ram       M-mano
CO       Shyam     M-mano
CO       Shyam     Hamacher

As verified above, ct * cx ≠ ctx4. ct * cx has two spurious tuples, (OS, Vivek, Galvin) and (CO, Ram, M-mano), which do not exist in ctx4. So, the decomposition of CTX4 into CT and CX is not a loss-less (non-additive) decomposition.
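The spurious tuples can be reproduced with a few set comprehensions. This is a small sketch in which relations are modelled as Python sets of tuples; the variable names are chosen for this illustration.

```python
# ctx4 as a set of (course, teacher, text) tuples, copied from the table above
ctx4 = {
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"), ("OS", "Ravi", "Dietel"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"), ("CO", "Shyam", "Hamacher"),
}

# Projections on (Course, Teacher) and (Course, Text)
ct = {(c, t) for c, t, x in ctx4}
cx = {(c, x) for c, t, x in ctx4}

# Natural join of the two projections on the common attribute Course
joined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

# Tuples produced by the join that were never in ctx4
spurious = joined - ctx4
print(sorted(spurious))
```

The join can only add tuples, never lose them, which is why a lossy decomposition shows up as extra (spurious) tuples.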


It may be feasible to eliminate these data redundancies of ctx4, on the basis of another

type of dependency, called Join Dependency (JD).

Example (Employee-Project-Department)

Suppose an organization has a set of Employees, a set of Departments and a set of

Projects which are being progressed at various departments with the following constraints

holding on the system:-

(a) Each Department could be working on many Projects.

(b) Each Project could be getting progressed at many Departments.

(c) Each employee could be working on many Projects in many Departments.

(d) The set of Departments in which an Employee is Working is determined by

the Employee alone and is completely independent of the set of Projects on

which that Employee is working.

(e) The set of Projects on which an Employee is working is determined by the

Employee alone and is completely independent of the set of Departments in

which the Employee is working.

A Sample Database, created on a schema with the above constraints, will be:-

EPD

E#   P#   D#
E1   P1   D3
E1   P2   D1
E1   P1   D1
E1   P2   D3
E4   P3   D2
E4   P1   D3
E4   P3   D3
E4   P1   D2

The above table does not have any non-trivial FD; and is thus in BCNF. However, it

has a lot of data redundancies; like the fact that Employee E1 is working on project P1

is reflected in two tuples. Similarly, there are many redundancies.

In the above table, we have:-

{E1, P1} → {D3, D1}
{E1, P2} → {D3, D1}

This implies that the set {D3, D1} is determined by E1 alone and does not change when P# is changed from P1 to P2.

However, when E# is changed from E1 to E4, the set of D#s changes as indicated below:-

{E4, P3} → {D2, D3}
{E4, P1} → {D2, D3}

Thus, the schema for this table satisfies E# →→ P# and E# →→ D#. This pair of MVDs is non-trivial. Thus, EPD is not in 4NF. It can be loss-less decomposed into EP (E#, P#) and ED (E#, D#) as shown below:-

EP

E#   P#
E1   P1
E1   P2
E4   P3
E4   P1

ED

E#   D#
E1   D3
E1   D1
E4   D2
E4   D3

It can be verified that EP*ED = EPD. Also, both EP and ED are free of the data

redundancies. Both EP and ED do not have any non-trivial MVD and are thus in 4NF.

Join Dependency (JD) A Relation Schema R is said to have a Join Dependency *(R1, R2, …, Rn), if and only if any legal relation r(R) is equal to the equi-join of its projections on R1, R2, …, Rn.

i.e. r = Π_{R1}(r) * Π_{R2}(r) * … * Π_{Rn}(r)

Trivial Join Dependency

A JD *(R1, R2, …, Rn) of a relation schema R is said to be trivial, if one of the projections (R1 … Rn) is equal to R itself.

An MVD is also a JD

An MVD α →→ β | γ on a relation schema R (α, β, γ) is also a JD *(αβ, αγ). This implies that a legal relation r(R) can be loss-less decomposed into its projections on αβ and αγ i.e.

r = Π_{α,β}(r) * Π_{α,γ}(r)

The relation schema CTX4 has a Join Dependency *(CT, TX, XC) where C: Course, T: Teacher and X: Text, which can be verified as follows:-


ct

COURSE   TEACHER
OS       Ravi
OS       Vivek
CO       Ram
CO       Shyam

tx

TEACHER   TEXT
Ravi      Galvin
Vivek     Dietel
Ravi      Dietel
Ram       Hamacher
Shyam     M-mano
Shyam     Hamacher

xc

TEXT       COURSE
Galvin     OS
Dietel     OS
Hamacher   CO
M-mano     CO

ct * tx * xc

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Ravi      Dietel
OS       Vivek     Dietel
CO       Ram       Hamacher
CO       Shyam     M-mano
CO       Shyam     Hamacher

Thus, ct * tx * xc = ctx4, thus CTX4 has a Join Dependency *(CT, TX, XC)

So, CTX4 can be loss-less decomposed into its projections on CT, TX and XC, which are

free of any data redundancies that existed in the relation CTX4.
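Whether a given relation satisfies a Join Dependency can be checked by projecting it on each component and joining the projections back. The following is a minimal sketch under stated assumptions: `project`, `join` and `satisfies_jd` are hypothetical helper names, a relation is a list of dictionaries, and the check concerns one relation instance only, not every legal relation of the schema.

```python
from functools import reduce


def project(rows, attrs):
    """Projection as a set of sorted (attribute, value) tuples; duplicates vanish."""
    return {tuple(sorted((a, r[a]) for a in attrs)) for r in rows}


def join(r, s):
    """Natural join of two relations in the representation used by project()."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out


def satisfies_jd(rows, components):
    """True if joining the projections on `components` gives back the relation."""
    whole = {tuple(sorted(r.items())) for r in rows}
    return reduce(join, (project(rows, c) for c in components)) == whole


ctx4 = [dict(zip(("Course", "Teacher", "Text"), t)) for t in [
    ("OS", "Ravi", "Galvin"), ("OS", "Vivek", "Dietel"), ("OS", "Ravi", "Dietel"),
    ("CO", "Ram", "Hamacher"), ("CO", "Shyam", "M-mano"), ("CO", "Shyam", "Hamacher"),
]]

# Three-way JD *(CT, TX, XC)
print(satisfies_jd(ctx4, [("Course", "Teacher"), ("Teacher", "Text"), ("Text", "Course")]))
# Two-way decomposition *(CT, CX)
print(satisfies_jd(ctx4, [("Course", "Teacher"), ("Course", "Text")]))
```

The first call corresponds to the three projections tabulated above; the second reproduces the lossy two-way split into CT and CX.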

Non-Trivial JD

A JD *(R1, R2, …, Rn) of a relation schema R is said to be trivial iff one of the projections in the JD is equal to R itself; such a JD holds on every schema. Any other JD is a non-trivial JD.


Fifth Normal Form (5 NF)

A relation schema R is said to be in 5 NF, if and only if any non-trivial Join Dependency

holding on R, is implied by its Candidate Keys.

The relation CTX4 has a JD *(CT,TX,XC) which is not implied by its Candidate Key

{C,T,X}. Thus, CTX4 is in 4 NF but not in 5 NF.

The relations CT, TX and XC do not have any non-trivial Join Dependencies, and are

thus in 5 NF.

Assuming that the Text Books followed for a Course are dependent not only on the Course but also on the Teacher teaching it, the Join Dependency *(CT, TX, XC) will hold on CTX4 only if the following is satisfied:-

“If a Teacher T1 teaching a Course C1 does not follow a Text Book X1, which is being followed by another Teacher teaching Course C1, then T1 must not follow X1 for any other Course that he may be teaching.”

A relation schema in 5 NF but still having some data redundancies

Example: A schema CTX5, with the following constraints:-

1. A Course may be taught by any number of Teachers and a Teacher

may teach any number of Courses.

2. A number of Text books may be followed for a Course and a Text

book may be followed for any number of courses.

3. A Teacher T1, teaching a Course C1, may not follow a Text Book

X1, which is being followed by another teacher teaching the

Course C1, but T1 may follow X1 while teaching another Course

say C2.

A relation under schema CTX5:-

ctx5

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Vivek     Milan
OS       Ravi      Milan
CO       Vivek     Hamacher
CO       Ram       Hamacher
CO       Vivek     Galvin


If we take its projections ct, tx and xc :-

ct

COURSE   TEACHER
OS       Ravi
OS       Vivek
CO       Vivek
CO       Ram

tx

TEACHER   TEXT
Ravi      Galvin
Vivek     Milan
Ravi      Milan
Vivek     Hamacher
Ram       Hamacher
Vivek     Galvin

xc

TEXT       COURSE
Galvin     OS
Milan      OS
Hamacher   CO
Galvin     CO

ct * tx * xc

COURSE   TEACHER   TEXT
OS       Ravi      Galvin
OS       Ravi      Milan
OS       Vivek     Galvin
OS       Vivek     Milan
CO       Vivek     Hamacher
CO       Vivek     Galvin
CO       Ram       Hamacher


ct * tx * xc has an additional tuple i.e. (OS, Vivek, Galvin), which does not exist in ctx5. Thus, the decomposition of CTX5 into its projections on CT, TX and XC is not a loss-less (non-additive) decomposition; the natural join of ct, tx and xc contains a spurious tuple. Thus, CTX5 does not have any non-trivial JD and, thus, it is in 5 NF.

CTX5, though in 5 NF, still has some data anomalies; like the information that ‘Milan is a text-book for OS’ is represented twice. These data anomalies cannot be eliminated on the basis of Functional Dependencies, Multi-Valued Dependencies or Join Dependencies.

What Next?

We reach a dead end, till a new type of dependency is discovered.

SIXTH NORMAL FORM (6 NF)

A Relation Schema R will be in 6NF if the only Join Dependencies holding on R are

trivial Join Dependencies.

Example: ACCOUNT (AN, BN, BAL)
AN → BN, BAL
Primary Key: {AN}

The only left-irreducible, non-trivial FD holding on ACCOUNT has the Primary Key on its LHS; thus the schema is at least in BCNF.

The FD AN → BN, BAL implies MVD AN →→ BN | BAL. Since the MVD has the Primary Key on its LHS, ACCOUNT is at least in 4NF.

The MVD AN →→ BN | BAL implies JD *({AN, BN}, {AN, BAL}). Since all the components of this JD are Super Keys of ACCOUNT, ACCOUNT is at least in 5NF.

But it is not in 6NF, since it has a non-trivial JD *({AN, BN}, {AN, BAL}).

For transformation to 6NF, ACCOUNT has to be decomposed on the basis of the non-trivial JD holding on it. So, ACCOUNT is decomposed into:-

ACCOUNT-BN (AN, BN) Primary Key (AN)
AN → BN

ACCOUNT-BAL (AN, BAL) Primary Key (AN)
AN → BAL

ACCOUNT-BN and ACCOUNT-BAL do not support any non-trivial JDs; and are thus in 6 NF.

We can conclude that a Schema in 6NF can comprise only its Primary Key and at most one non-key attribute.

How is 6NF superior to 5NF?

6NF is most suitable for Temporal Databases, which contain a time element. For example, if we want to introduce a time element in ACCOUNT, to indicate the Time and Date when BAL is valid, then the schema will be:-

ACCOUNT (AN, DATE, TIME, BN, BAL)
{AN, TIME, DATE} → BAL
AN → BN
Primary Key: {AN, TIME, DATE}

Since it has a partial FD AN → BN, it is not even in 2NF; it is only in 1NF.

To transform it into higher normal forms, it has to be decomposed with respect to AN → BN i.e.

ACCOUNT-BN (AN, BN) Primary Key (AN)
AN → BN

ACCOUNT-BAL (AN, DATE, TIME, BAL) Primary Key (AN, DATE, TIME)
{AN, DATE, TIME} → BAL

Since each of these schemas contains the Primary Key plus only one non-key attribute, both are in 6NF.

Taking another example:-

EMP (E#, E_NAME, SALARY, PROJ_NO) Primary Key (E#)
E# → E_NAME, SALARY, PROJ_NO

This schema is also in 5NF but not in 6NF.

The SALARY and PROJ_NO (the Project on which the employee works) will keep on changing. Suppose we want to record the durations during which different values of SALARY were valid, and the durations during which different values of PROJ_NO were valid for each employee; then the schema would need to be decomposed as follows:-

EMP_NAME (E#, E_NAME) Primary Key (E#)
EMP_SALARY (E#, FROM_DATE, TO_DATE, SALARY) Primary Key (E#, FROM_DATE, TO_DATE)
EMP_PROJ (E#, FROM_DATE, TO_DATE, PROJ_NO) Primary Key (E#, FROM_DATE, TO_DATE)

All these schemas are in 6NF.

Concept of Inclusion Dependencies

A Foreign Key constraint cannot be represented by an FD, MVD or JD. It can be represented using an Inclusion Dependency (ID).

Inclusion Dependency Suppose attribute set Y in Schema S is a Foreign Key (FK) referencing Primary Key X of Schema R, then this foreign key constraint can be represented by an Inclusion Dependency S.Y ⊆ R.X

This constraint specifies that for a given relation r(R), a relation s(S) would be valid only if it satisfies Π_Y(s(S)) ⊆ Π_X(r(R))

Suppose there are schemas ACCOUNT (AN, BN, BAL)

And BRANCH (BN, BC, ASSETS)

where BN is Foreign Key in ACCOUNT referencing BN in BRANCH.

This Foreign Key Constraint can be represented by the Inclusion Dependency

ACCOUNT.BN ⊆ BRANCH.BN
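An Inclusion Dependency of this kind can be checked on sample data by comparing projections. The sketch below is illustrative only: the function name `satisfies_inclusion` and all the sample rows (branch names, codes, balances) are hypothetical values that do not appear in the notes.

```python
def satisfies_inclusion(s_rows, y, r_rows, x):
    """Check S.Y ⊆ R.X: every Y-value combination in s occurs as an X-value combination in r."""
    referenced = {tuple(row[a] for a in x) for row in r_rows}
    return all(tuple(row[a] for a in y) in referenced for row in s_rows)


# Hypothetical sample relations on BRANCH (BN, BC, ASSETS) and ACCOUNT (AN, BN, BAL)
branch = [
    {"BN": "B1", "BC": "Delhi", "ASSETS": 500},
    {"BN": "B2", "BC": "Mumbai", "ASSETS": 900},
]
account = [
    {"AN": "A1", "BN": "B1", "BAL": 100},
    {"AN": "A2", "BN": "B2", "BAL": 250},
    {"AN": "A3", "BN": "B1", "BAL": 75},
]

print(satisfies_inclusion(account, ["BN"], branch, ["BN"]))

# An account that refers to a branch missing from BRANCH violates the dependency
dangling = account + [{"AN": "A4", "BN": "B9", "BAL": 10}]
print(satisfies_inclusion(dangling, ["BN"], branch, ["BN"]))
```

Passing the attribute lists separately for both sides allows the referencing and referenced attributes to have different names, as in the general form S.Y ⊆ R.X.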

Inference Rules of Inclusion Dependencies.

(i) Reflexivity: R.X ⊆ R.X

(ii) Attribute Correspondence: If R.X ⊆ S.Y and X = {A1, A2, …, An} and Y = {B1, B2, …, Bn} and Ai corresponds to Bi for 1 ≤ i ≤ n, then it will have R.Ai ⊆ S.Bi for all i.

(iii) Transitivity: If R.X ⊆ S.Y and S.Y ⊆ T.Z then it will have R.X ⊆ T.Z.
