CSE 480: Database Systems, Lecture 18: Normal Forms and Normalization

Upload: joella-fields

Post on 17-Jan-2016

Functional Dependencies

A functional dependency (FD) takes the form X → Y, where X and Y are subsets of the attributes of a relation

What does X → Y mean?

The values of attributes X determine the values of attributes Y;

The values of attributes Y depend on the values of attributes X;

Suppose t1 and t2 are two tuples in the relation. If t1 and t2 have the same values for attribute set X, then they must also have identical values for attribute set Y

Functional Dependencies

EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation)

{Ssn} → {Ename} is an FD

Ename depends on Ssn

{Pnumber} → {Pname, Plocation} is an FD

Pname and Plocation depend on Pnumber

Two rows with the same Pnumber must have the same values of Pname and Plocation

{Plocation} → {Pnumber} is not an FD

{Ename, Plocation} → {Pnumber} is not an FD
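The definition can be checked mechanically against a relation instance. The sketch below is illustrative only: the helper name `fd_holds` and the sample rows are made up, not taken from the lecture. Note that an instance can refute an FD but cannot prove it, since an FD is a constraint on every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical EMP_PROJ rows (values invented for illustration)
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Houston"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 5, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20, "Ename": "Wong",
     "Pname": "ProductX", "Plocation": "Houston"},
]

print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))                   # True
print(fd_holds(emp_proj, ["Pnumber"], ["Pname", "Plocation"]))  # True
print(fd_holds(emp_proj, ["Plocation"], ["Pnumber"]))           # False
print(fd_holds(emp_proj, ["Ename", "Plocation"], ["Pnumber"]))  # False
```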

Functional Dependencies

Graphical Representation of FDs:

FD1: {SSN, Pnumber} → {Hours}

FD2: {SSN} → {Ename}

FD3: {Pnumber} → {Pname, Plocation}

Functional Dependencies

A relation may contain many functional dependencies

– How to derive all of them?

Given a set of functional dependencies F of a relation R:

F = {AC → B, A → C, D → A}

– Does F entail AD → BC (i.e., is AD → BC also an FD of R)?

Inference Rules (Example)

Given F = {AC → B, A → C, D → A}

Does F entail AD → BC?

1. D → A (given in F)

2. AD → A (augmenting (1) with A)

3. A → C (given in F)

4. A → AC (augmenting (3) with A)

5. AC → B (given in F)

6. AC → BC (augmenting (5) with C)

7. A → BC (transitivity of (4) and (6))

8. AD → BC (transitivity of (2) and (7))
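The same question can be answered without a hand derivation by computing an attribute closure, a standard technique that the slide itself does not name: X → Y is entailed by F exactly when Y is contained in the closure of X under F. A minimal sketch, assuming single-letter attribute names so that a string like "AC" stands for the set {A, C}; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already in the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def entails(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [("AC", "B"), ("A", "C"), ("D", "A")]

print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D']
print(entails(F, "AD", "BC"))    # True
print(entails(F, "B", "A"))      # False
```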

Normal Forms and Normalization

Functional dependencies can help us analyze whether a relational schema is “good” or “bad”

In the relational model, we don’t say that a schema is good or bad; we say it is in 1NF, 2NF, 3NF, etc.

– Properties

– The higher the NF, the stricter the conditions placed on the schema

– A relation in a higher NF is also in every lower NF, but not vice versa

– A 3NF relation is in 2NF and 1NF (but not necessarily in 4NF or 5NF)

Normalization:

– The process of decomposing "bad" (lower normal form) relations by breaking up their attributes into smaller relations

First Normal Form

A schema is in 1NF if it permits only atomic (indivisible) attribute values

1NF disallows

– composite attributes

– multivalued attributes

The relational model itself prohibits relations that contain composite and multivalued attributes

– Therefore, all schemas in the relational model are at least in 1NF

Example

Relation is not in 1NF because it has a multivalued attribute (Dlocations)

Normalization into 1NF

3 strategies for normalization:

– Place the “offending” attributes in a separate relation: DEPARTMENT(Dname, Dnumber, Dmgr_ssn) and DEPTLOCATIONS(Dnumber, Dlocation)

– Change Dlocations into Dlocation and modify the primary key to (Dnumber, Dlocation): DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation)

– If the maximum number of locations per department is 3: DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dloc1, Dloc2, Dloc3)
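The first strategy (moving the multivalued attribute into its own relation) can be sketched as follows. The sample rows and variable names are invented for illustration; only the schema names come from the slide.

```python
# Hypothetical non-1NF rows: Dlocations holds a list of values
departments = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888",
     "Dlocations": ["Houston"]},
]

# DEPARTMENT(Dname, Dnumber, Dmgr_ssn): drop the multivalued attribute
department = [
    {k: v for k, v in d.items() if k != "Dlocations"} for d in departments
]

# DEPTLOCATIONS(Dnumber, Dlocation): one row per (department, location) pair
dept_locations = [
    {"Dnumber": d["Dnumber"], "Dlocation": loc}
    for d in departments
    for loc in d["Dlocations"]
]

for row in dept_locations:
    print(row)
```

Every value in both result relations is atomic, and the original rows can be recovered by joining on Dnumber.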

Is 1NF Sufficient?

Key of the relation is the combination of (Dnumber, Dlocation)

Relation is in 1NF, but there are redundancies:

– Two rows with the same Dnumber must have the same Dname and Dmgr_ssn (even though their Dlocations are different)

2NF (Motivating Example)

Functional dependencies

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)

– {Dnumber} → {Dname, Dmgr_ssn}

Consequence: two tuples with the same Dnumber but different Dlocation will have the same Dname and Dmgr_ssn, which leads to redundancy!

If {Dnumber} → {Dname, Dmgr_ssn} were not an FD, there would be no redundancy problem

2NF (Motivating Example)

This example suggests that if X → Y is an FD, where X is the key, then X’ → Y must not also be an FD of the same table for any proper subset X’ of X; otherwise, there will be redundancies in the table

– We say that X → Y must be a full FD

{Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)

{Dnumber} → {Dname, Dmgr_ssn}

Full versus Partial Dependencies

X → Y is a full FD if removing any attribute from X means the FD no longer holds

X → Y is a partial FD if there is an FD X’ → Y where X’ is a proper subset of X

Example:

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is a partial FD because {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema
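A full/partial test can be sketched with an attribute closure (a standard technique, not one the slides spell out). Dropping one attribute at a time is enough: if any proper subset of X determines Y, so does some subset obtained by removing a single attribute. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if no attribute can be dropped from lhs while still determining rhs."""
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)


# FDs of DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation) from the example
fds = [
    ({"Dnumber", "Dlocation"}, {"Dname", "Dmgr_ssn"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

print(is_full_fd({"Dnumber", "Dlocation"}, {"Dname", "Dmgr_ssn"}, fds))  # False (partial)
print(is_full_fd({"Dnumber"}, {"Dname", "Dmgr_ssn"}, fds))               # True (full)
```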

Prime versus NonPrime Attributes

Prime attribute:

– an attribute that is a member of some candidate key

– Example (from previous slide): Dnumber, Dlocation

Nonprime attribute:

– an attribute that is not a member of any candidate key

– Example (from previous slide): Dname, Dmgr_ssn
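For small schemas, the candidate keys (and hence the prime attributes) can be found by brute force: try attribute subsets in order of increasing size and keep those whose closure is the whole schema and that contain no smaller key. This is only a sketch with hypothetical function names; the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(k <= subset for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == attrs:
                keys.append(subset)
    return keys


attrs = {"Dname", "Dnumber", "Dmgr_ssn", "Dlocation"}
fds = [({"Dnumber"}, {"Dname", "Dmgr_ssn"})]

keys = candidate_keys(attrs, fds)
prime = set().union(*keys)
print(sorted(prime))          # ['Dlocation', 'Dnumber']
print(sorted(attrs - prime))  # ['Dmgr_ssn', 'Dname']
```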

2NF Definition

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the key of R

Since {Dnumber, Dlocation} is the key

– {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is an FD of the schema

– But {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema

The non-prime attributes are not fully functionally dependent on the key

So schema is not in 2NF

Example

FDs:

– {SSN, Pnumber} → {Hours, Ename, Pname, Plocation},

– {SSN} → {Ename},

– {Pnumber} → {Pname, Plocation}

Example

– {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds

– But {SSN, PNUMBER} → ENAME is a partial dependency since SSN → ENAME also holds

2NF

– Is {SSN, PNUMBER} → {Hours} a full FD? Yes

– Is {SSN, PNUMBER} → {Ename} a full FD? No

– Is {SSN, PNUMBER} → {Pname} a full FD? No

– Is {SSN, PNUMBER} → {Plocation} a full FD? No

Conclusion: The EMP_PROJ relation is not in 2NF

2NF normalization: take the “offending” FDs and create separate relations

Normalizing into 2NF

{SSN, Pnumber} → {Hours},

{SSN} → {Ename},

{Pnumber} → {Pname, Plocation}
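One way to turn these three FDs into a 2NF decomposition is sketched below: every proper subset of the key that determines non-prime attributes gets its own relation, and the key keeps whatever remains. This is a simplified illustration, not necessarily the procedure used in the lecture; it assumes the relation has a single candidate key, and `decompose_2nf` is a hypothetical name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(attrs, key, fds):
    """Split off attributes that depend on only part of the (single) key."""
    nonprime = attrs - key  # assumption: key is the only candidate key
    relations, moved = [], set()
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            part = set(part)
            dependents = (closure(part, fds) & nonprime) - moved
            if dependents:
                relations.append(part | dependents)
                moved |= dependents
    relations.append(key | (nonprime - moved))
    return relations


attrs = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
key = {"Ssn", "Pnumber"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

for rel in decompose_2nf(attrs, key, fds):
    print(sorted(rel))
```

For EMP_PROJ this yields one relation per FD: (Pnumber, Pname, Plocation), (Ssn, Ename), and (Ssn, Pnumber, Hours).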

Is 2NF sufficient?

Key is SSN

FDs:

– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}

– {Dnumber} → {Dname, Dmgr_ssn}

Is the table in 2NF?

– Yes, because every non-prime attribute is fully functionally dependent on the key

Is 2NF sufficient?

Are there still redundancies in the relation? Yes

– Two tuples with the same Dnumber have the same Dname and Dmgr_ssn

What is the “offending” FD that causes redundancy?

Is 2NF sufficient?

Functional dependencies:

– {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}

– {Dnumber} → {Dname, Dmgr_ssn}

Since Dnumber is not a key, you can have two rows with the same Dnumber. Hence their Dname and Dmgr_ssn must be the same => redundancy!

3NF

A relation schema R is in third normal form (3NF) if

– It is in 2NF and

– There is no non-prime attribute in R that is transitively dependent on the primary key

If X → Y and Y → Z are FDs, with X as the primary key, we consider Z to be transitively dependent on X only if Y is not a candidate key. If Y is a candidate key, then we do not consider this a transitive dependency problem

Example of 3NF

FDs:

– SSN → Ename, Bdate, Address, Dnumber

– SSN → Dnumber

– Dnumber → Dname, Dmgr_ssn

Dname is transitively dependent on the primary key SSN because SSN → Dnumber and Dnumber → Dname are FDs of the relation

– Therefore the relation is not in 3NF

Third Normal Form

Another way to check whether a relation is in 3NF (without checking for partial and transitive dependencies):

– A relation schema R is in 3NF if whenever a nontrivial FD X → A holds, either X is a superkey of R or A is a prime attribute of R
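This alternative test translates almost directly into code. The sketch below checks each listed FD attribute by attribute and reports those where the left side is not a superkey and the right-side attribute is not prime. Function names are hypothetical, the candidate-key search is brute force, and only the FDs passed in are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if not any(k <= subset for k in keys) and closure(subset, fds) == attrs:
                keys.append(subset)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attrs
        for a in sorted(rhs - lhs):  # skip trivial parts of the FD
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


attrs = {"Ssn", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

print(violations_3nf(attrs, fds))
# [(['Dnumber'], 'Dmgr_ssn'), (['Dnumber'], 'Dname')]
```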

3NF

FDs:

– SSN → Ename, Bdate, Address

– SSN → Dnumber

– Dnumber → Dname, Dmgr_ssn

But Dnumber is not a superkey, and Dname and Dmgr_ssn are not prime attributes

Therefore the relation is not in 3NF

Transitive dependency: SSN → Dnumber → {Dname, Dmgr_ssn}

Normalizing into 3NF

Take the “offending” FDs and create separate relations

Is 3NF enough to remove redundancy?

FDs:

– {Student, Course} → Instructor

– Instructor → Course

Relation is in 3NF (but there is still redundancy)

Assume every instructor teaches only 1 course

Key is (Student, Course)

Instructor → Course does not violate 3NF because Course is a prime attribute (it is a member of the key)

BCNF (Boyce-Codd Normal Form)

A relation schema R is in BCNF if whenever a nontrivial FD X → A holds in R, X is a superkey of R

FDs:

– {Student, Course} → Instructor

– Instructor → Course

Relation is not in BCNF because Instructor is not a superkey
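The BCNF condition is simpler to test than 3NF because no prime-attribute exception applies: X is a superkey exactly when the closure of X is the whole schema. A minimal sketch with hypothetical function names, checking only the FDs listed:

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]


attrs = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(bcnf_violations(attrs, fds))  # [({'Instructor'}, {'Course'})]
```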

Achieving BCNF by Decomposition

STUD_COURSE

– Key is {Student, Course}

COURSE_INSTRUCT

– Key is {Instructor}

– FD: Instructor → Course

Loses the FD: {Student, Course} → Instructor

– But no redundancy

Decomposition 1

Problem: decomposition does not result in lossless join (i.e., does not have nonadditive join property)

– i.e., spurious tuples may be generated

Decomposition 2

Dependency preserving? No

– loses the FD: {Student, Course} → Instructor

Lossless join? Yes

Decomposition 3

Dependency preserving? No

– loses the FD: {Student, Course} → Instructor

Lossless join? No
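Both properties can be tested mechanically. The sketch below uses two standard checks that the slides do not spell out: a binary decomposition (R1, R2) is lossless if the shared attributes determine R1 − R2 or R2 − R1, and an FD is preserved if its right side is reachable using only closures restricted to the individual relations. The three pairs listed are simply the three ways to split (Student, Course, Instructor) into two overlapping two-attribute relations; which of them corresponds to which numbered decomposition on the slides is not known here, because the figures are missing. All function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine r1 - r2 or r2 - r1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


def preserves(fd, decomposition, fds):
    """True if fd can be enforced using only the decomposed relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in decomposition:
            gained = closure(result & r, fds) & r
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
candidates = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]

for r1, r2 in candidates:
    print(sorted(r1), sorted(r2),
          "lossless:", lossless_binary(r1, r2, fds),
          "keeps {Student, Course} -> Instructor:",
          preserves(fds[0], [r1, r2], fds))
```

Only the split on Instructor is lossless, and none of the three preserves {Student, Course} → Instructor, which matches the trade-off the slides describe.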

Summary

1st normal form

– no composite/multivalued attributes in relations

2nd, 3rd, and Boyce-Codd normal forms

– Eliminate redundancies based on FDs

More normal forms (see textbook)

– 4th: deals with multivalued dependencies

– 5th: deals with join dependencies