normal is at ion 1

30
Notes of Database Management System Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 1 UNIT – III NORMALISATION Functional Dependencies A relational schema has attributes (A, B, C... Z) and that the whole database is described by a single universal relation called R = (A, B, C, ..., Z). This assumption means that every attribute in the database has a unique name. A functional dependency is a property of the semantics of the attributes in a relation. The semantics indicate how attributes relate to one another, and specify the functional dependencies between attributes. When a functional dependency is present, the dependency is specified as a constraint between the attributes. Consider a relation with attributes A and B, where attribute B is functionally dependent on attribute A. If we know the value of A and we examine the relation that holds this dependency, we will find only one value of B in all of the tuples that have a given value of A, at any moment in time. In the figure above, A is the determinant of B and B is the consequent of A. The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow in the functional dependency. The consequent of a fd is the attribute or group of attributes on the right-hand side of the arrow. Consider a relational schema

Upload: api-27570914

Post on 10-Apr-2015

417 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 1

UNIT – III

NORMALISATION

Functional Dependencies

A relational schema has attributes (A, B, C... Z) and that the whole database is described by a single universal relation

called R = (A, B, C, ..., Z). This assumption means that every attribute in the database has a unique name. A functional

dependency is a property of the semantics of the attributes in a relation. The semantics indicate how attributes relate to one

another, and specify the functional dependencies between attributes. When a functional dependency is present, the

dependency is specified as a constraint between the attributes.

Consider a relation with attributes A and B, where attribute B is functionally dependent

on attribute A. If we know the value of A and we examine the relation that holds this dependency, we will find only one

value of B in all of the tuples that have a given value of A, at any moment in time.

In the figure above, A is the determinant of B and B is the consequent of A.

The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow in

the functional dependency. The consequent of a fd is the attribute or group of attributes on the right-hand side of the

arrow.

Consider a relational schema

Page 2: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 2

The functional dependency staff# → position clearly holds on this relation instance. However, the reverse functional

dependency position → staff# clearly does not hold.

The relationship between staff# and position is 1:1 – for each staff member there is only one position. On the other

hand, the relationship between position and staff# is 1:M – there are several staff numbers associated with a given

position.

For the purposes of normalization we are interested in identifying functional dependencies between attributes of a

relation that have a 1:1 relationship.

A functional dependency is a property of a relational schema (its intension) and not a property of a

particular instance of the schema (extension).

Identify the functional dependencies that hold using the relation schema STAFFBRANCH

In order to identify the time invariant Fds, we need to clearly understand the semantics of the various attributes in each of

the relation schemas in question. For example, if we know that a staff member’s position and the branch at which they are

located determines their salary.

Inference Rules for Functional Dependencies

The set of all Fds that are implied by a given set S of Fds is called the closure of S, written S+.

The very first Armstrong is the person who gives a set of inference rules. The following are the six well-known

inference rules that apply to functional dependencies.

IR1: reflexive rule – if X ⊇ Y, then X → Y

IR2: augmentation rule – if X → Y, then XZ → YZ

IR3: transitive rule – if X → Y and Y → Z, then X → Z

IR4: projection rule – if X → YZ, then X → Y and X → Z

IR5: additive rule – if X → Y and X → Z, then X → YZ

IR6: pseudo transitive rule – if X → Y and YZ → W, then XZ → W

Page 3: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 3

The first three of these rules (IR1-IR3) are known as Armstrong’s Axioms and constitute a necessary and sufficient set of

inference rules for generating the closure of a set of functional dependencies.

The other rules derived from this are:

Given R = (A,B,C,D,E,F,G,H, I, J) and

F = {AB → E, AG → J, BE → I, E → G, GI → H}

Does F = { AB → GH} ?

Proof

1. AB → E, given in F

2. AB → AB, reflexive rule IR1

3. AB → B, projective rule IR4 from step 2

4. AB → BE, additive rule IR5 from steps 1 and 3

5. BE → I, given in F

6. AB → I, transitive rule IR3 from steps 4 and 5

7. E → G, given in F

8. AB → G, transitive rule IR3 from steps 1 and 7

9. AB → GI, additive rule IR5 from steps 6 and 8

10. GI → H, given in F

11. AB → H, transitive rule IR3 from steps 9 and 10

12. AB → GH, additive rule IR5 from steps 8 and 11 . Hence proven

Irreducible sets of dependencies

We define a set of Fds to be irreducible( Usually called minimal in the literature) if and only if it satisfies the

following three properties

1. The right hand side (the dependent) of every Fds in S involves just one attribute (that is, it is singleton set)

2. The left hand side (determinant) of every in S is irreducible in turn-meaning that no attribute can be

discarded from the determinant without changing the closure S+(that is, with out converting S into some set

not equivalent to S). We will say that such an Fd is left irreducible.

3. No Fd in S can be discarded from S without changing the closure S+(That is, without converting s into

some set not equivalent to S)

Eg. Relation R {A,B,C,D,E,F} satisfies the following Fds

AB → C , C → A , BC → D , ACD → B

BE → C

CE → FA

CF → BD

D → EF

Find an irreducible equivalent for this set of Fds?

Page 4: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 4

The solution is simple.

1. AB → C

2. C → A

3. BC → D

4. ACD → B

5. BE → C

6. CE → A

7. CE → F

8. CF → B

9. CF → D

10. D → E

11. D → F

Now:

• 2 implies CE → AE (By augmentation), CE → A (By decomposition), so we can drop 6.

• 8 implies CF → BC (By augmentation), by which 3 implies CF → D (By Transitivity), so we can drop 9.

• 8 implies ACF → AB (By augmentation), and 11 implies ACD → ACF (By augmentation), and so ACD → AB

(By Transitivity), and so ACD → B (By Decomposition), so we can drop 4.

No further reductions are possible, and so we are left with the following irreducible set:

1. AB → C

2. C → A

3. BC → D

5. BE → C

7. CE → F

8. CF → B

10. D → E

11. D → F

Eg. Given a relation schema R(A,B,C,D) and a set of FDs

A� BC

B� C

A� B

AB� C

AC� D

Compute an irreducible set of FDs that is equivalent to this set.

Page 5: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 5

Sol:

1) The first step is to rewrite the FDs such that each has a singleton right-hand side:

A�B

A�C [By Decomposition]

B�C

A�B

AB�C

AC�D

We observe immediately that is FD A�B occurs twice, so one occurrence can be eliminated.

2) Next, attribute C can be eliminated from the left-hand side of the FD AC�D, because we have

A�C, so A�AC by augmentation, and we are given AC�D, so A�D by transitivity; thus the C on

the left-hand side of AC�D is redundant.

So the remaining FDs are:

A�B

A�C

B�C

AB�C

A�D

3) Next, we observe that the FD AB�C can be eliminated, because again we have A�C, so AB�CB

by augmentation, so AB�C by decomposition. So the remaining FDs are:

A�B

A�C

B�C

A�D

4) Finally, the FD A�C is implied by the FDs A�B and B�C, so it can also be eliminated. We are

left with

A�B

B�C

A�D

This set is irreducible.

Page 6: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 6

Eg. Consider the Relation R(A,B,C,D,E,F,G,H,I,J) and a set of FDs

ABD�E , AB�G , B�F , C�J ,CJ�I ,G�H

Is this an irreducible set? What are the candidate keys?

Sol: 1) The set is not irreducible, since C�J and CJ�I together imply C�I.

2) The super key of this relation is (A, B, C, D, G, J) (i.e. the set of attributes mentioned on the

left-hand side of the given FDs). We can eliminate J from this set because C�J, and we eliminate

G because AB�G. Since none of the attributes A, B, C, D appears on the right-hand side of any

of the given FDs, it follows that (A,B,C,D) is a candidate key.

Full Functional Dependency – A functional dependency X�Y is a full functional dependency if removal

of any attribute A from X means that the dependency does not hold any more; i.e. for any attribute A є X,

(X – {A}) does not functionally determine Y.

E.g. consider the following InvLine relation.

INV_NUM LINENUM QTY INVDATE

Candidate Keys: {InvNum, LineNum}.

Qty and InvDate is fully functionally dependent on combination of InvNum and LineNum, because

removal of any attribute from candidate key does not determine any entity from the relation.

Trivial Dependency – A functional dependency X�Y is trivial if X כ Y.(i.e X is a super set of Y or Y is a

subset of X).

E.g. The FD AB�B and AB�A is a trivial dependency.

Partial dependency

A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a

multipart candidate key. Candidate keys: {InvNum, LineNum} InvDate is partially dependent on {InvNum, LineNum} as

InvNum is a determinant of InvDate and InvNum is part of a candidate key.

If there is some attribute that can be removed from A and the dependency still holds.

Page 7: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 7

PART - II

A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained

through a series of retrieval operations) from other data in the database.

The presence of a redundancy in a database may be decided upon the following factores

• an advantage: a reduction in the number of accesses necessary to.Obtain the derived information.

• a disadvantage: because of larger storage requirements, (but, usually at negligible cost) and the necessity to

carry out additional operations in order to keep the derived data consistent.

The serious problem with using the relations is the problem of update anomalies.

These can be classified in to

• Insertion anomalies

• Deletion anomalies

• Modification anomalies

Q. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with

examples.

Ans. EMP_DEPT

Insertion Anomalies

An "insertion anomaly" is a failure to place information about a new database entry into all the places in the database

where information about that new entry needs to be stored. In a properly normalized database, information about a new

entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a

new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed

additional insertions may be missed.

Eg. First Instance: - To insert a new employee tuple in to Emp_Dept table, we must include either the attribute values for

the department that the employee works for, or nulls (if the employee does not work for a department as yet). For example

to insert a new tuple for an employee who works in department no 5, we must enter the attribute values of department

number 5correctly so that they are consistent, with values for the department 5 in other tuples in emp_dept.

Page 8: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 8

Second Instance: - It is difficult to insert a new department that has no employees as yet in the emp_dept relation. The only

way to do this is to place null values in the attributes for the employee this causes a problem because SSN in the primary

key of emp_dept table and each tuple is supposed to represent an employee entity- not a department entity.

Moreover, when the first employee is assigned to that department, we do not need this tuple with null values anymore.

Deletion Anomalies

A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that

entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only

one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted

from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.

Eg. The problem of deletion anomaly is related to the second insertion anomaly situation which we have discussed earlier,

if we delete from emp_dept an employee tuple that happens to represent the last employee working for a particular

department, the information concerning that department is lost from the database.

Modification Anomalies

Eg. In Emp_Dept, if we change the value of one of the attribute of a particular department- say, the manager of

department 5-we must update the tuples of all employees who work in that department; other wise, the database will

become inconsistent. If we fail to update some tuples, the same department will be shown to have 2 different values for

manager in different employee tuple which would be wrong.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the

database. Properly normalized databases are much less susceptible to corruption than are unnormalized databases.

Redundant information not only wastes storage but makes updates more difficult since.

Normalization is the process of taking data from a problem and reducing it to a set of relations while

ensuring data integrity, eliminating data redundancy and minimizing the insertion, deletion and update

anomalies.

1. Data Integrity – All of the data in the database are consistent and satisfy all integrity constraints.

2. Data redundancy – If data in the database can be found in two different locations (direct redundancy) or if data can be

calculated from other data items (indirect redundancy) then the data is said to contain redundancy.

Data should only be stored once and avoid storing data that can be calculated from other

data already held in the database. During the process of normalization redundancy must be removed, but not at the expense

of breaking data integrity rules.

Basically the normal form of the data indicates how much redundancy is in that data. The normal forms

have a strict ordering.

a) 1NF

b) 2NF

c) 3NF

d) BCNF

e) 4NF

f) 5NF

Page 9: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 9

A normalized relational database provides several benefits:

• Elimination of redundant data storage.

• Close modeling of real world entities, processes, and their relationships.

• Structuring of data so that the model is flexible.

1st Normal Form (1NF)

Def: A table (relation) is in 1NF if

1. There are no duplicated rows in the table.

2. Each cell is single-valued (i.e., there are no repeating groups or arrays).

3. Entries in a column (attribute, field) are of the same kind.

Note: The order of the rows is immaterial; the order of the columns is immaterial. The requirement that there be no

duplicated rows in the table means that the table has a key (although the key might be made up of more than one

column—even, possibly, of all the columns).

So we come to the conclusion,

A relation is in 1NF if and only if all underlying domains contain atomic values only.

The first normal form deals only with the basic structure of the relation and does not resolve the problems of

redundant information or the anomalies discussed earlier. All relations discussed in these notes are in 1NF.

Page 10: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 10

Eg. The following is not in 1NF.

EmpNum EmpPhone EmpDegrees

1234 2345-987

3456 2367-890 BA,BSC

6789 2367-890 BSC,MSC,PhD

EmpDegrees is a multi valued field:

Employee 6789 has 3 degrees: BSC,PhD and MSC

Employee 3456 has 2 degrees: BA, BSC

To obtain 1NF relations we must, without loss of information, replace the above with two relations.

EMP_DETAIL

EmpNum EmpPhone

1234 2345-987

3456 2367-890

6789 2367-890

EMP_DEG

EmpNum EmpDegrees

1234

3456 BA

3456 BSC

6789 BSC

6789 MSC

6789 PhD

An outer join between Employee and EmployeeDegree will produce the information that the original table hold.

Page 11: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 11

Eg. No. of duplicate tuples

Eg. EMP_Proj

A relation is in first normal form if and only if, in every legal value of that relation every tuple contains one

value for each attribute

The above definition merely states that the relations are always in first normal form which is always correct.

However the relation that is only in first normal form has a structure those undesirable for a number of reasons.

First normal form (1NF) sets the very basic rules for an organized database:

• Eliminate duplicative columns from the same table.

• Create separate tables for each group of related data and identify each row with a unique column or set

of columns (the primary key).

Page 12: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 12

2nd Normal Form (2NF)

Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on the entire key.

The second normal form attempts to deal with the problems that are identified with the relation above that is in 1NF. The

aim of second normal form is to ensure that all information in one relation is only about one thing.

A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.

Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite)

key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial

dependencies."

The general requirements of 2NF:

• Remove subsets of data that apply to multiple rows of a table and place them in separate rows.

• Create relationships between these new tables and their predecessors through the use of foreign keys.

These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by

extracting it, placing it in new table(s) and creating relationships between those tables.

Eg. The concept of 2NF requires that all attributes that are not part of a candidate key be fully dependent on each

candidate key. If we consider the relation

student (sno, sname, cno, cname)

and the functional dependencies

sno � sname

cno � cname

and assume that (sno, cno) is the only candidate key (and therefore the primary key), the relation is not in 2NF

since sname and cname are not fully dependent on the key. The above relation suffers from the same anomalies

and repetition of information as discussed above since sname and cname will be repeated. To resolve these

difficulties we could remove those attributes from the relation that are not fully dependent on the candidate keys

of the relations. Therefore we decompose the relation into the following projections of the original relation:

S1 (sno, sname) , S2 (cno, cname) , SC (sno, cno)

Page 13: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 13

Eg. Online store that maintains customer information database.

At this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL

33157" entries twice each. After decomposition:

Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second

normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary

key from the ZIPs table) to create that relationship. Here's our new Customers table:

Page 14: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 14

This minimized the amount of redundant information stored within the database and our structure is in second

normal form.

Eg.

Stu_ID Class_ID Class_name

134-56-7890 M148 Math 148

123-45-7894 P113 Physics 113

534-98-9009 H151 History 151

134-56-7890 H151 History 151

AND

Transitive Dependency

A functional dependency X�Y in a relation schema R is a transitive dependency if there is a set of

attributes Z that is neither a candidate key nor a subset of any key of R, and both X �Z and Z�Y hold.

Consider attributes A, B, and C, and where A � B and B � C.

FDs are transitive, which means that we also have the FD, A � C.

We say that C is transitively dependent on A through B.

Eg. EmpNum � DeptNum

Emp_number Emp_email Dept_number Dept_name

DeptNum � DeptName

Emp_number Emp_email Dept_number Dept_name

DeptName is transitively dependent on EmpNum via DeptNum: EmpNum � DeptName

Page 15: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 15

3rd Normal Form (3NF)

Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

The basic requirements of 3NF are as follows

• Meet the requirements of 1NF and 2NF

• Remove columns that are not fully dependent upon the primary key.

Although transforming a relation that is not in 2NF into a number of relations that are in 2NF removes many of the

anomalies that appear in the relation that was not in 2NF, not all anomalies are removed and further normalization is

sometime needed to ensure further removal of anomalies. These anomalies arise because a 2NF relation may have

attributes that are not directly related to the thing that is being described by the candidate keys of the relation. Let us first

define the 3NF.

For E.g. The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT

is not in 3NF because of the transitive dependency of DMGRENO (and also DNAME) on ENO via DNUMBER. We can

normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2. Intuitively, we see that ED1

and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and

ED2 will recover the original relation EMP_DEPT without generating spurious tuples.

Page 16: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 16

Consider the following relation

subject (cno, cname, instructor, office)

cno � cname , cno � instructor , instructure � office

Assume that cname is not unique and therefore cno is the only candidate key. The following functional

dependencies exist.

We can derive cno � office from the above functional dependencies and therefore the above relation is in 2NF.

The relation is however not in 3NF since office is not directly dependent on cno. This transitive dependence is an

indication that the relation has information about more than one thing (viz. course and instructor) and should

therefore be decomposed. The primary difficulty with the above relation is that an instructor might be

responsible for several subjects and therefore his office address may need to be repeated many times. This leads

to all the problems that we identified at the beginning of this chapter. To overcome these difficulties we need to

decompose the above relation in the following two relations:

s (cno, cname, instructor) ins (instructor, office)

s is now in 3NF and so is ins.

Eg. Consider a table of Orders

Test for 1NF, 2NF and 3NF????

Ans. Our first requirement is that the table must satisfy the requirements of 1NF and 2NF. Are there any

duplicative columns? No. Do we have a primary key? Yes, the order number. Therefore, we satisfy the

requirements of 1NF. Are there any subsets of data that apply to multiple rows? No, so we also satisfy the

requirements of 2NF.

Now, are all of the columns fully dependent upon the primary key? The customer number varies with the order

number and it doesn't appear to depend upon any of the other fields. What about the unit price? This field could

be dependent upon the customer number in a situation where we charged each customer a set price. However,

looking at the data above, it appears we sometimes charge the same customer different prices. Therefore, the unit

price is fully dependent upon the order number. The quantity of items also varies from order to order. BUT, The

Page 17: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 17

total can be derived by multiplying the unit price by the quantity; therefore it's not fully dependent upon the

primary key. We must remove it from the table to comply with the third normal form:

Boyce-Codd Normal Form (BCNF)

A relation R is said to be in BCNF if whenever X � A , holds in R, and A is not in X, then X is a candidate

key for R.

Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That

is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.

It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in

BCNF and this happens only if

(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),

(b) there is more than one candidate key in the relation, and

(c) the keys are not disjoint, that is, some attributes in the keys are common.

The BCNF differs from the 3NF only when there are more than one candidate keys and the keys are

composite and overlapping.

So, a relation is said to be in the BCNF if and only if it is in the 3NF and every non-trivial, left-irreducible

functional dependency has a candidate key as its determinant. In more informal terms, a relation is in BCNF if it

is in 3NF and the only determinants are the candidate keys.

( REFER understandingnormalisation.pdf )

DESIRABLE PROPERTIES OF DECOMPOSITIONS

So far our approach has consisted of looking at individual relations and checking if they belong to 2NF, 3NF or

BCNF. If a relation was not in the normal form that was being checked for and we wished the relation to be

normalized to that normal form so that some of the anomalies can be eliminated, it was necessary to decompose

the relation in two or more relations. The process of decomposition of a relation R into a set of relations

R1,R2….RN was based on identifying different components and using that as a basis of decomposition. The

decomposed relations R1,R2….RN are projections of R and are of course not disjoint otherwise the glue holding

the information together would be lost. Decomposing relations in this way based on a recognize and split method

Page 18: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 18

is not a particularly sound approach since we do not even have a basis to determine that the original relation can

be constructed if necessary from the decomposed relations.

Desirable properties of decomposition are:

1. Attribute preservation

2. Lossless-join decomposition

3. Dependency preservation

4. Lack of redundancy

Attribute Preservation

This is a simple and an obvious requirement that involves preserving all the attributes that were there in the

relation that is being decomposed.

Lossless-Join Decomposition

In these notes so far we have normalized a number of relations by decomposing them. We decomposed a relation

intuitively. We need a better basis for deciding decompositions since intuition may not always be correct. We

illustrate how a careless decomposition may lead to problems including loss of information.

Consider the following relation

enrol (sno, cno, date-enrolled, room-No., instructor)

Suppose we decompose the above relation into two relations enrol1 and enrol2 as follows

enrol1 (sno, cno, date-enrolled) , enrol2 (date-enrolled, room-No., instructor)

There are problems with this decomposition but we wish to focus on one aspect at the moment. Let an instance

of the relation enrol be

and let the decomposed relations enrol1 and enrol2 be:

Page 19: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 19

All the information that was in the relation enrol appears to be still available in enrol1 and enrol2 but this is not

so. Suppose, we wanted to retrieve the student numbers of all students taking a course from Wilson, we would

need to join enrol1 and enrol2. The join would have 11 tuples as follows:

…………. So on.

The join contains a number of spurious tuples that were not in the original relation Enrol. Because of these

additional tuples, we have lost the information about which students take courses from WILSON. (Yes, we have

more tuples but less information because we are unable to say with certainty who is taking courses from

WILSON). Such decompositions are called lossy decompositions. A nonloss or lossless decomposition is that

which guarantees that the join will result in exactly the same relation as was decomposed. One might think that

there might be other ways of recovering the original relation from the decomposed relations but, sadly, no other

operators can recover the original relation if the join does not.

Page 20: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 20

We need to analyze why some decompositions are lossy. The common attribute in above decompositions was

Date-enrolled. The common attribute is the glue that gives us the ability to find the relationships between

different relations by joining the relations together. If the common attribute is not unique, the relationship

information is not preserved. If each tuple had a unique value of Date-enrolled, the problem of losing

information would not have existed. The problem arises because several enrolments may take place on the same

date.

Dependency Preservation

It is clear that decomposition must be lossless so that we do not lose any information from the relation that is

decomposed. Dependency preservation is another important requirement since a dependency is a constraint on

the database and if X � Yholds than we know that the two (sets) attributes are closely related and it would be

useful if both attributes appeared in the same relation so that the dependency can be checked easily. Let us

consider a relation R(A, B, C, D) that has the dependencies F that include the following:

A � B , A� C etc….

If we decompose the above relation into R1(A, B) and R2(B, C, D) the dependency A� C cannot be checked

(or preserved) by looking at only one relation. It is desirable that decompositions be such that each dependency

in F may be checked by looking at only one relation and that no joins need be computed for checking

dependencies. In some cases, it may not be possible to preserve each and every dependency in F but as long as

the dependencies that are preserved are equivalent to F, it should be sufficient.

Lack of Redundancy

We have discussed the problems of repetition of information in a database. Such repetition should be avoided as

much as possible.

Lossless-join, dependency preservation and lack of redundancy not always possible with BCNF. Lossless-join,

dependency preservation and lack of redundancy is always possible with 3NF.

Q. Define Boyce-Codd normal form. How it is differ from 3NF? Why it is considered a stronger form of 3NF?

Sol:

BCNF - A relation schema R is in BCNF if whenever a nontrivial functional dependency X� A holds in R, then X is a

super key of R.

3NF - A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X�A holds in

R, either

• X is a super key of R, or

• A is a prime attribute of R.

Difference – The formal definition of BCNF differs slightly from the definition of 3NF. The only difference between the

definitions of BCNF and 3NF is the second condition of 3NF (i.e. A is a prime attribute of R) which allows A to be prime,

is absent from BCNF. BCNF requires that every determinant is a candidate key and 3NF doesn’t require every determinant

to act as candidate key. So every relation which is in BCNF is also in 3NF but every relation which is in 3NF is not

necessarily in BCNF. Consider for instance a relation R(A, B, C) with the following dependency diagram.

Page 21: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 21

R

A B C

Here the candidate key of the relation is AB and also the super key of this relation is AB. So the relation is

in 3NF because we have the FDs AB�C and AB is a super key of R and the FD C�B and B is a prime

attribute of R. The relation is not in BCNF because C is not a super key of the relation and still the FD

C�B holds.

Why it is considered as a stronger normal form of 3NF

BCNF is considered as a stronger normal form of 3NF because 3NF relations still have insertion,

deletion and update anomalies but BCNF relations doesn’t have any insertion, deletion and update anomalies. In

BCNF every dependency is on the candidate key which is our goal of normalization and in 3NF it is not

required that every dependency is on candidate key.

For e.g., consider a relation Stud_Course with the following dependency diagram with the fact that a

particular student takes a course and has one instructor and a particular instructor teaches one course only.

Stud_Course

With the dependency diagram we have following FDs.

Studno, Courseno�Instrno and

Instrno�Courseno

Consider the following instance of the above relation:-

From this relation we have Instrno�Courseno i.e. instructor 99 teaches 1803, instructor 77 teaches 1903

and instructor 66 teaches 1803.

The super key and candidate key of this relation is Studno, Courseno.The relation is in 3NF

because we have FD Studno, Courseno�Instrno and Studno, Courseno is a super key and FD

Instrno�Courseno and Courseno is a prime attribute of this relation.

Page 22: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 22

This relation has the following insertion, deletion and update anomalies.

1. Insertion Anomaly – How do we add the fact that instructor 55 teaches course 2906.

2. Deletion Anomaly – If we delete all rows for course 1803 we will lose the information that

instructor 99 teaches student 121 and 66 teaches student 222.

3. Update Anomaly – If we update the Courseno of student 121 to be 1903 then we also have to

update his Instrno to be 77.

Now the relation is not in BCNF because we have the FD Instrno�Courseno and Instrno is not a super key

of this relation. So we decompose this relation into two relations Stud(Studno, Instrno) and Instr(Instrno,

Courseno).

Stud

Studno Instrno

Instr

Instrno courseno

Stud

Studno Instrno

121 99

121 77

222 66

222 77

Instr

Instrno Courseno

99 1803

77 1903

66 1803

Now these relations don’t have any insertion, deletion and update anomalies because every dependency is

on the super key. The natural join between these two relations produce the original relations.

Page 23: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 23

Q. Prove that any relation schema with two attributes is in BCNF?

Sol: Consider a relation schema R1 with two attributes A and B and with the following dependency

diagram.

R1

A B

Now with this dependency diagram we either have A�B or B�A.

Now we define BCNF as a relation schema R is in BCNF if whenever a nontrivial functional dependency

X�A holds in R, then X is a super key of R.

From the definition of BCNF the relation schema R1 is necessarily in BCNF because we either have A�B

with super key A or have B�A with super key B.

MULTIVALUED DEPENDENCIES

A multivalued dependency (MVD) on R, X ��Y , says that if two tuples of R agree on all the attributes of X, then their

components in Y may be swapped, and the result will be two tuples that are also in the relation.

i.e., for each value of X, the values of Y are independent of the values of R-X-Y.

Every FD is an MVD (promotion ):- If X �Y, then swapping Y ’s between two tuples that agree on X doesn’t change

the tuples. Therefore, the “new” tuples are surely in the relation, and we know X ��Y.

Consider the following relation that represents an entity employee that has one mutlivalued attribute proj:

emp (e#, dept, salary, proj)

We have so far considered normalization based on functional dependencies; dependencies that apply only to

single-valued facts. For example, implies only one dept value for each value of e#. Not all information in a

database is single-valued, for example, e# ����proj in an employee relation may be the list of all projects that the

employee is currently working on. Although e# determines the list of all projects that an employee is working

on, e# ����dept is not a functional dependency.

Eg. programmer (emp_name, qualifications, languages)

The above relation includes two multivalued attributes of entity programmer; qualifications and languages. There

are no functional dependencies.

The attributes qualifications and languages are assumed independent of each other. If we were to consider

qualifications and languages separate entities, we would have two relationships (one between employees and

qualifications and the other between employees and programming languages). Both the above relationships are

many-to-many i.e. one programmer could have several qualifications and may know several programming

languages. Also one qualification may be obtained by several programmers and one programming language may

be known to many programmers.

The above relation is therefore in 3NF (even in BCNF) but it still has some disadvantages. Suppose a

programmer has several qualifications (B.Sc, Dip. Comp. Sc, etc) and is proficient in several programming

languages; how should this information be represented? There are several possibilities.

Page 24: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 24

All these variations have some disadvantages. If the information is repeated we face the same problems of

repeated information and anomalies as we did when second or third normal form conditions are violated. If there

is no repetition, there are still some difficulties with search, insertions and deletions. For example, the role of

NULL values in the above relations is confusing.. These problems may be overcome by decomposing a relation

like that:-

Page 25: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 25

The concept of multivalued dependencies was developed to provide a basis for decomposition of relations like

the one above. Therefore if a relation like enrolment(sno, subject#) has a relationship between sno and subject#

in which sno uniquely determines the values of subject#, the dependence of subject# on sno is called a

trivial MVD since the relation enrolment cannot be decomposed any further. More formally, X�� Ya MVD is

called trivial MVD if either Y is a subset of X or X and Y together form the relation R. The MVD is trivial since

it results in no constraints being placed on the relation.

A MVD X��Y in R is called a trivial MVD if

a) Y is a subset of X or

b) X U Y = R.

For e.g., consider the relation EMP_PROJECTS.from table we have a trivial MVD ENAME ��PNAME.

Therefore a relation having non-trivial MVDs must have at least three attributes;

two of them multivalued. Non-trivial MVDs result in the relation having some constraints on it since all possible

combinations of the multivalue attributes are then required to be in the relation. Possible: ZX�� Y but

X�� ZY means X�� Y , X�� Z.

Multivalued Normalization --- Fourth Normal Form

Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

Consider an example of Programmer(Emp name, qualification, languages) and discussed the problems that may

arise if the relation is not normalised further. We also saw how the relation could be decomposed into P1(Emp

name, qualifications) and P2(Emp name, languages) to overcome these problems. The decomposed relations are

in fourth normal form (4NF) which we shall now define.

We are now ready to define 4NF. A relation R is in 4NF if, whenever a multivalued dependency X�Yholds then

either

(a) the dependency is trivial, or (b) X is a candidate key for R.

In fourth normal form, we have a relation that has information about only one entity. If a relation has more than

one multivalue attribute, we should decompose it to remove difficulties with multivalued facts.

EMP

ENAME PNAME DNAME

Page 26: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 26

EMP

ENAME PNAME DNAME

Kumar X Rakesh

Kumar Y Rao

Kumar X Rao

Kumar Y Rakesh

From the above fig we have ENAME�PNAME and ENAME ��DNAME (or ENAME �PNAME|DNAME)

holds in the EMP relation. The employee with ENAME ‘Kumar’ works on projects with PNAME ‘X’ and ‘Y’ and has two

dependents with DNAME ‘Rakesh’ and ‘Rao’.

This relation is not in 4NF because in the nontrivial MVDs ENAME��PNAME and ENAME��DNAME, ENAME is

not a super key of EMP. We decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS shown in fig below.

Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs ENAME��PNAME

in EMP_PROJECTS and ENAME��DNAME in EMP_DEPENDENTS are trivial MVDs. No other nontrivial MVDs

hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.

JOIN DEPENDENCIES

Notice that an MVD is a special case of a JD where n = 2. That is, a JD denoted as JD(R1, R2) implies a

MVD (R1 R2) �� (R1 – R2) (or, by symmetry, (R1 R2 �� (R2 – R1)). A join dependency JD(R1,

R2,…,Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2,…,Rn) is equal to

R. Such a dependency is called trivial because it has the nonadditive join property for any relation state r of R and hence

does not specify any constraint on R.

For e.g., consider the relation SUPPLY with JD(R1, R2, R3) shown below. From the relations we have

R3) = SUPPLY

Page 27: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 27

SUPPLY

SNAME PARTNAME PROJNAME

Kumar Bolt projX

Kumar Nut projY

Prakash Bolt projY

Ramesh Nut projZ

Prakash Nail projX

R1 R2

R3

Fifth Normal Form (5NF) (Project Join Normal Form) (PJNF) A relation schema R is in fifth normal form (5NF) (or project join normal form [PJNF]) with respect

to a set of F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1,

R2,…,Rn) in F+ (that is implied by F), every Ri is a super key of R.

OR

Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join

dependency in the table is a consequence of the candidate keys of the table. The fifth normal form deals with

join-dependencies which is a generalisation of the MVD. The aim of fifth normal form is to have relations that

cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.

A relation R satisfies join dependency (R1,R2,…..Rn) if and only if R is equal to the join of R1,R2,…..Rn where

are R1subsets of the set of attributes of R.

A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies at least one of the

following holds.

(a) (R1,R2,…..Rn) is a trivial join-dependency (that is, one of R1 is R) (b) Every R1 is a candidate key for R.

Page 28: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 28

For e.g., consider the SUPPLY relation shown above. The super keys of this relation are {SNAME, PARTNAME},

{SNAME, PROJNAME} and {PARTNAME, PROJNAME}. The decomposition of

R1(SNAME, PARTNAME), R2(SNAME, PROJNAME) and R3(PARTNAME, PROJNAME) is in 5NF

because we have a JD(R1, R2, R3) over R and every Ri is a super key of R.

Eg. An example of 5NF can be provided by the example below that deals with departments, subjects and

students.

The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and CP3000 which are taken by a

variety of students. No student takes all the subjects and no subject has all students enrolled in it and therefore all

three fields are needed to represent the information.

The above relation does not show MVDs since the attributes subject and student are not independent; they are

related to each other and the pairings have significant information in them. The relation can therefore not be

decomposed in two relations

(dept, subject), and (dept, student)

without loosing some important information. The relation can however be decomposed in the following three

relations.

(dept, subject), and (dept, student) , (subject, student)

and now it can be shown that this decomposition is lossless.

DOMAIN KEY NF (Free from modification anomalies)

A key uniquely identifies each row in a table. A domain is the set of permissible values

for an attribute. By enforcing key and domain restrictions, the database is assured of being free from modification

anomalies. Thus, it avoids all non-temporal anomalies. It's much easier to build a database in domain/key normal form than

it is to convert lesser databases which may contain numerous anomalies. However, successfully building a domain/key

normal form database remains a difficult task, even for experienced database programmers. Thus, while the domain/key

normal form eliminates the problems found in most databases, it tends to be the most costly normal form to achieve.

Page 29: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 29

Eg. A violation of DKNF occurs in the following table:

Wealthy Person

Wealthy Person Wealthy Person Type Net Worth in Dollars

Steve Eccentric Millionaire 124,543,621

Roderick Evil Billionaire 6,553,228,893

Katrina Eccentric Billionaire 8,829,462,998

Gary Evil Millionaire 495,565,211

(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of

wealthy people; the domain for Wealthy Person Type consists of the values 'Eccentric Millionaire', 'Eccentric Billionaire',

'Evil Millionaire', and 'Evil Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or

equal to 1,000,000.)

There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from the

other. The constraint dictates that an Eccentric Millionaire or Evil Millionaire will have a net worth of 1,000,000 to

999,999,999 inclusive, while an Eccentric Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or higher.

This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key

constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars combination does not make its

way into the database.

The DKNF violation could be eliminated by altering the Wealthy Person Type domain to make it consist of just two

values, 'Evil' and 'Eccentric' (the wealthy person's status as a millionaire or billionaire is implicit in their Net Worth in

Dollars, so no useful information is lost).

DKNF is frequently difficult to achieve in practice.

Page 30: Normal is at Ion 1

Notes of Database Management System

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 30

Denormalization Occasionally database designers choose a schema that has redundant information; that is, it is not

normalized. They use the redundancy to improve performance for specific applications. The penalty paid

for not using a normalized schema is the extra work (in terms of coding time and execution time) to keep

redundant data consistent. For instance, suppose that the name of an account holder has to be displayed

along with the account number and balance, every time the account is accessed. In our normalized schema,

this requires a join of account with depositor. One alternative to computing the join on the fly is to store a

relation containing all the attributes of account and depositor. This makes displaying the account

information faster. However, the balance information for an account is repeated for every person who owns

the account, and all copies must be updated by the application, whenever the account balance is updated.

The process of taking a normalized schema and making it non-normalized is called denormalization, and

designers use it to tune performance of systems to support time-critical operations.