Copyright © 2003-2012 Curt Hill Schema Refinement III 4th NF and 5th NF

Upload: holly-briggs

Post on 16-Jan-2016


TRANSCRIPT

Page 1:

Copyright © 2003-2012 Curt Hill

Schema Refinement III

4th NF and 5th NF

Page 2:

Now what?

• An example
• Consider a table that contains courses, instructors and textbooks
• There may be multiple instructors for multiple sections of the class
• There may be multiple textbooks as well
• Both instructors and textbooks come from a set of possibilities

Page 3:

Course/Instructor/Book

Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble

• Key is entire tuple
• Each instructor uses two books for the course
• There is a redundancy

Page 4:

Commentary

• There is redundancy that we should deal with
• The table is in BCNF
  – No examination of FDs will help us
• The two instructors and two textbooks are both determined by the course department and number
• This is an example of a multivalued dependency (MVD)

Page 5:

Commentary Again

• First normal form disallows repeating groups
• A repeating group is often a set
• A multivalued dependency is a set depending on an item
• Examples:
  – People working on many projects
  – Each of these people has many dependents

Page 6:

Examples

• In this example the course determines a set of instructors
• The course also determines a set of textbooks
• These two sets are independent
• If the sets are large we get plenty of redundancy and yet are still in BCNF
  – If we have every book connected to every instructor connected to the course
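To make the size of this redundancy concrete, here is a minimal sketch that counts rows with and without the split. The course with five instructors and three textbooks is hypothetical (it is not from the slides), and the function names are illustrative only.

```python
from itertools import product

def rows_in_one_table(instructors, books):
    """Rows needed when every instructor is paired with every book."""
    return len(list(product(instructors, books)))

def rows_after_split(instructors, books):
    """Rows needed when instructors and books are kept in separate tables."""
    return len(instructors) + len(books)

# Hypothetical course with 5 instructors and 3 textbooks (not from the slides).
instructors = [f"I{n}" for n in range(1, 6)]
books = ["Book A", "Book B", "Book C"]

print(rows_in_one_table(instructors, books))  # 15
print(rows_after_split(instructors, books))   # 8
```

The single table grows with the product of the two set sizes, while the split tables grow with their sum.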

Page 7:

Multivalued Dependency

• An MVD determines a set of values, not a single value
• Notation is two arrows
• Dept, Number →→ Instructor and
• Dept, Number →→ Book
• The correct decomposition is splitting instructor from book
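One way to check an MVD against actual rows is sketched below. It relies on the standard characterization that X →→ Y holds when, for every X value, the remaining columns form the full cross product of the Y values and the other values. The function name `mvd_holds` and the dictionary row layout are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, x, y):
    """Return True if the MVD x ->> y holds in rows (a list of dicts)."""
    if not rows:
        return True
    # z is every attribute that is in neither x nor y
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for r in rows:
        key = tuple(r[a] for a in x)
        groups[key].add((tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # Each x value must occur with every combination of y and z values.
        if pairs != set(product(y_values, z_values)):
            return False
    return True

COURSE = [
    {"Dept": "CIS", "Number": 385, "Instructor": 221, "Book": "Smith & Boss"},
    {"Dept": "CIS", "Number": 385, "Instructor": 221, "Book": "Noble"},
    {"Dept": "CIS", "Number": 385, "Instructor": 403, "Book": "Smith & Boss"},
    {"Dept": "CIS", "Number": 385, "Instructor": 403, "Book": "Noble"},
]

print(mvd_holds(COURSE, ["Dept", "Number"], ["Instructor"]))      # True
print(mvd_holds(COURSE[:3], ["Dept", "Number"], ["Instructor"]))  # False
```

Dropping one row breaks the cross product, so the second call reports that the MVD no longer holds.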

Page 8:

Course/Instructor/Book

Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble

Project into

Dept  Num  Instruct
CIS   385  221
CIS   385  403

Dept  Num  Book
CIS   385  Smith & Boss
CIS   385  Noble
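The projection shown here can be checked to be lossless with a short sketch: project the table two ways, join the pieces back, and compare with the original. The helpers `project`, `natural_join` and `as_set` are written for this illustration and are not part of the slides.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate rows."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in seen]

def natural_join(left, right):
    """Natural join of two lists of dict rows on their shared attributes."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined

def as_set(rows):
    """Order-independent view of a table, used for comparison."""
    return {frozenset(r.items()) for r in rows}

COURSE = [
    {"Dept": "CIS", "Number": 385, "Instructor": 221, "Book": "Smith & Boss"},
    {"Dept": "CIS", "Number": 385, "Instructor": 221, "Book": "Noble"},
    {"Dept": "CIS", "Number": 385, "Instructor": 403, "Book": "Smith & Boss"},
    {"Dept": "CIS", "Number": 385, "Instructor": 403, "Book": "Noble"},
]

teaches = project(COURSE, ["Dept", "Number", "Instructor"])
uses = project(COURSE, ["Dept", "Number", "Book"])

print(len(teaches), len(uses))                                # 2 2
print(as_set(natural_join(teaches, uses)) == as_set(COURSE))  # True
```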

Page 9:

Fourth Normal Form

• The above two tables are in 4th NF
• A table is in 4th NF if and only if
  – The table is in BCNF
  – Every nontrivial MVD is in fact an FD
• If there are no nontrivial MVDs then a BCNF table is also in 4NF

Page 10:

Another View of 4th NF

• If a relation is in 4th NF then for each MVD X →→ A one of the following must hold
• The MVD is trivial
  – A is part of X or
  – XA is the whole relation
• X is a superkey

Page 11:

Is this 4th NF?

Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble

• There are two MVDs
  – Dept, Number →→ Instructor
  – Dept, Number →→ Book
• Trivial MVDs? - No
• Dept, Number superkey? - No
• So the table is not in 4th NF

Page 12:

Is this 4th NF?

Dept  Num  Instruct
CIS   385  221
CIS   385  403

• There is one MVD
  – Dept, Num →→ Instruct
• Trivial MVD?
  – Yes, this is the whole relation
• So the table is in 4th NF
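The two questions asked of each MVD (is it trivial, and is its left side a superkey) can be written down as a small sketch. It works purely on attribute names and assumes the MVD is already known to hold; the function names are illustrative.

```python
def mvd_is_trivial(all_attrs, x, y):
    """X ->> Y is trivial if Y is part of X or X and Y cover the whole relation."""
    x, y = set(x), set(y)
    return y <= x or (x | y) == set(all_attrs)

def is_superkey(x, candidate_keys):
    """X is a superkey if it contains some candidate key."""
    return any(set(k) <= set(x) for k in candidate_keys)

def mvd_allowed_in_4nf(all_attrs, x, y, candidate_keys):
    """An MVD is compatible with 4NF if it is trivial or X is a superkey."""
    return mvd_is_trivial(all_attrs, x, y) or is_superkey(x, candidate_keys)

# Original table: the key is the entire tuple.
wide = ["Dept", "Number", "Instructor", "Book"]
print(mvd_allowed_in_4nf(wide, ["Dept", "Number"], ["Instructor"], [wide]))   # False

# Projected table: the MVD now covers the whole relation, so it is trivial.
narrow = ["Dept", "Num", "Instruct"]
print(mvd_allowed_in_4nf(narrow, ["Dept", "Num"], ["Instruct"], [narrow]))    # True
```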

Page 13:

Decomposability

• A strange thing happens:
• There are relations that cannot be lossless-join decomposed into two relations
• But they can be decomposed into a larger number of relations
• The following example shows a relation that can be decomposed into three but not two

Page 14:

Example

Relation A:

S  P  J
1  1  2
1  2  1
2  1  1
1  1  1

Page 15:

What about this?

• What is the key?
  – Entire tuple
• What MVDs?
  – Candidates are S →→ P, S →→ J and P →→ J, among others
  – None of them holds in A: S = 1 appears with (P, J) = (1, 2) and (2, 1) but not with (2, 2), and the others fail the same way
  – With no nontrivial MVDs, A is in 4th NF

Page 16:

Decomposition

• In the next slide we will see the table decomposed into tables of two fields
• However, no two of them can be joined into the original without extra rows
• All three of them can be joined into the original

Page 17:

Example Decomposed

A (original):

S  P  J
1  1  2
1  2  1
2  1  1
1  1  1

B:

S  J
1  2
1  1
2  1

C:

S  P
1  1
1  2
2  1

D:

P  J
1  2
2  1
1  1

Join of B and C (the row 1 2 2 is extra):

S  P  J
1  1  2
1  2  2
1  2  1
2  1  1
1  1  1

Join of B, C and D (the same as A):

S  P  J
1  1  2
1  2  1
2  1  1
1  1  1
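The behaviour of this example can be reproduced with a few set comprehensions. This is only a sketch: it takes A to be the four-row S, P, J relation and B, C, D to be its projections onto (S, J), (S, P) and (P, J), with tuples written in (S, P, J) order.

```python
# Relation A as a set of (S, P, J) tuples.
A = {(1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)}

# The three two-column projections.
B = {(s, j) for s, p, j in A}  # S, J
C = {(s, p) for s, p, j in A}  # S, P
D = {(p, j) for s, p, j in A}  # P, J

# Pairwise joins, each on the one shared column.
BC = {(s, p, j) for s, j in B for s2, p in C if s == s2}
CD = {(s, p, j) for s, p in C for p2, j in D if p == p2}
BD = {(s, p, j) for s, j in B for p, j2 in D if j == j2}

# Joining the third projection filters the spurious row back out.
BCD = {(s, p, j) for s, p, j in BC if (p, j) in D}

print(sorted(BC - A))  # [(1, 2, 2)]
print(sorted(CD - A))  # [(2, 1, 2)]
print(sorted(BD - A))  # [(2, 2, 1)]
print(BCD == A)        # True
```

Every pairwise join contains exactly one row that is not in A, while the three-way join gives A back.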

Page 18:

What Just Happened?

• A could not be lossless-join decomposed into any two of {B, C, D}
  – Joining just two of them produces extra rows
• It could be lossless-join decomposed into all three
• There is a join dependency between A and {B, C, D}
• There is no join dependency between any of
  – A and {B, C}
  – A and {B, D}
  – A and {C, D}

Page 19:

Join Dependencies

• A join dependency ⋈{R1, R2, …, RN} holds over R if R1, R2, …, RN is a lossless-join decomposition of R
  – In other words, joining R1, R2, …, RN gives R
• Notation: ⋈{R1, R2, …, RN}
• A JD is a generalization of MVDs
  – The MVD X →→ Y over R is the two-part JD ⋈{XY, X(R − Y)}
• In the previous example, the pairwise links between S, P and J, none of which is an MVD by itself, are expressed together as the join dependency ⋈{B, C, D}
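The sense in which a JD generalizes an MVD can be shown with a short sketch: an MVD X →→ Y over R corresponds to the two-component join dependency made of XY and X together with the remaining attributes. The helper name `mvd_as_jd` is an assumption for this illustration.

```python
def mvd_as_jd(all_attrs, x, y):
    """Return the two attribute sets of the JD equivalent to the MVD x ->> y."""
    x, y = set(x), set(y)
    rest = set(all_attrs) - x - y
    return [x | y, x | rest]

attrs = ["Dept", "Number", "Instructor", "Book"]
for component in mvd_as_jd(attrs, ["Dept", "Number"], ["Instructor"]):
    print(sorted(component))
# ['Dept', 'Instructor', 'Number']
# ['Book', 'Dept', 'Number']
```

The two components are exactly the two tables that the course example was projected into.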

Page 20:

Trivial Join Dependencies

• The join dependency ⋈{R1, R2, …, RN} on R is trivial iff
  – At least one of R1, R2, …, RN is the set of all attributes of R
  – In other words, there is a relation equivalent to R in the decomposition
• Joining R to any decomposition of R or its join reproduces the original

Page 21:

Implied Join Dependencies

• Suppose the join dependency ⋈{R1, R2, …, RN} holds on R
• This join dependency is implied by the candidate key(s) iff
  – Each of R1, R2, …, RN is a superkey for R

Page 22:

Fifth Normal Form

• 5th NF is also known as: Projection Join Normal Form (PJNF)

• A relation R is in 5th NF if and only if every non-trivial join dependency that is satisfied by R is implied by the candidate key(s) of R

Page 23:

Is this in 5th NF?

S  P  J
1  1  2
1  2  1
2  1  1
1  1  1

• There is a non-trivial join dependency, ⋈{B, C, D}
  – None of these are A
• This dependency is not implied by the only candidate key, SPJ
  – None of these contain SPJ
• No – not in 5NF
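The reasoning on this slide can be sketched as two attribute-level checks, following the definitions of "trivial" and "implied by the candidate keys" given on the earlier slides. The sketch assumes the join dependency is already known to hold; the function names are illustrative.

```python
def jd_is_trivial(all_attrs, components):
    """A JD is trivial if some component is the full attribute set."""
    return any(set(c) == set(all_attrs) for c in components)

def jd_implied_by_keys(components, candidate_keys):
    """Implied by the keys if every component contains a candidate key."""
    return all(
        any(set(k) <= set(c) for k in candidate_keys) for c in components
    )

def jd_violates_5nf(all_attrs, components, candidate_keys):
    """A JD that holds violates 5NF if it is neither trivial nor implied by the keys."""
    return (not jd_is_trivial(all_attrs, components)
            and not jd_implied_by_keys(components, candidate_keys))

attrs = ["S", "P", "J"]
components = [["S", "J"], ["S", "P"], ["P", "J"]]  # B, C, D
keys = [["S", "P", "J"]]                           # the only candidate key

print(jd_is_trivial(attrs, components))          # False
print(jd_implied_by_keys(components, keys))      # False
print(jd_violates_5nf(attrs, components, keys))  # True
```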

Page 24:

Is 5th NF the Ultimate?

• It is the ultimate that can be obtained with just projections
  – The guaranteed best in terms of a lack of anomalies that can be removed by projections
• Hence the name Projection Join Normal Form
• However, there may be some anomalies that cannot be eliminated with just projections

Page 25:

JDs and FDs

• FDs and MVDs have a set of inference rules
  – This allows us to reason about them
• JDs lack this set
• Thus finding JDs and using them to move to 5th NF has its problems
• We do have one tool

Page 26:

3NF and 5NF

• If a relation is in 3rd NF and each of its keys is atomic (a single attribute) then the relation is also in 5th NF
  – The same may be said of BCNF
• There may be 5th NF relations that do not have atomic keys
• When we can apply this we can determine the table is in 5th NF without any consideration of JDs

Page 27:

Denormalization

• The argument against making everything 5th NF:
  – Lots of separate relations
  – These relations become separate files
  – This means lots of I/O
• Since SQL cannot separate a relation from a file, the argument has some merit

Page 28:

Conclusion

• MVDs are much less common than FDs
• Thus tables that are in BCNF are very often in 4NF, and usually 5NF as well, because there are no nontrivial MVDs
• MVDs are also harder to observe and reason about
• Thus 3NF and BCNF are the most common normal forms