CS 1555 www.cs.pitt.edu/~nlf4/cs1555/ Normalization



Why aren't databases just a single table?

● Why do we use different relations?

● It seems simpler to just use a single giant table…

● More generally, how do we define a good or bad database table layout?

○ What makes a good database design good?

○ What makes a bad design bad?

Bad design example

● Problems?

Student_Enrollment

Name   ID      Major  Course     Grade
Alice  334322  CS     CS 441     3.45
Alice  334322  CS     CS 447     3.45
Alice  334322  CS     Math 230   3.45
Bob    546346  Math   Math 422   3.23
Bob    546346  Math   Stat 1000  3.23

Redundancy

Student_Enrollment

Name   ID      Major  Course     Grade
Alice  334322  CS     CS 441     3.95
Alice  334322  CS     CS 447     3.65
Alice  334322  CS     Math 230   3.55
Bob    546346  Math   Math 422   3.23
Bob    546346  Math   Stat 1000  3.23

Insertion anomaly

Student_Enrollment

Name   ID      Major   Course     Grade
Alice  334322  CS      CS 441     3.95
Alice  334322  CS      CS 447     3.65
Alice  334322  CS      Math 230   3.55
Bob    546346  Math    Math 422   3.23
Bob    546346  Math    Stat 1000  3.23
Mike   823485  German  NULL       NULL

Deletion anomaly

Student_Enrollment

Name   ID      Major   Course     Grade
Alice  334322  CS      CS 441     3.95
Alice  334322  CS      CS 447     3.65
Alice  334322  CS      Math 230   3.55
Bob    546346  Math    Math 422   3.23
Bob    546346  Math    Stat 1000  3.23
Mike   823485  German  NULL       NULL

Modification anomaly

Student_Enrollment

Name   ID      Major   Course     Grade
Alice  334322  CS      CS 441     3.95
Alice  334322  CS      CS 447     3.65
Alice  334322  CS      Math 230   3.55
Bob    546346  Math    Math 422   3.23
Bob    546346  Math    Stat 1000  3.23
Mike   823485  German  NULL       NULL

Student_Enrollment (Alice's major updated in only some of her rows)

Name   ID      Major   Course     Grade
Alice  334322  Math    CS 441     3.95
Alice  334322  Math    CS 447     3.65
Alice  334322  CS      Math 230   3.55
Bob    546346  Math    Math 422   3.23
Bob    546346  Math    Stat 1000  3.23

Why do these problems pop up?

● We're using this single table for many different purposes

○ Keep track of information about students overall

○ Keep track of enrollment information

○ We need to decompose this offending table into multiple smaller tables

■ The question is: how do we perform a good decomposition?

● Clearly, there are semantic relationships between attributes of the table

○ How can we formally recognize these relationships to create good database designs?

Functional dependencies

● Given:

○ R = {A1, A2, A3, …, An}

○ X ⊆ R

○ Y ⊆ R

● X → Y

○ Holds if the value(s) of X uniquely determine the value(s) of Y

■ X functionally determines Y

■ Y is functionally dependent on X

Functional dependencies

A  B  C  D
1  1  1  1
1  2  1  2
2  2  2  2
2  2  2  3
3  3  2  4
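A relation state can only rule dependencies out, so a quick check against the table above is a handy sanity test. The sketch below is illustrative and not part of the original slides; the helper name `may_hold` is an assumption. It reports whether the five tuples above are consistent with a candidate dependency X → Y.

```python
# Relation state from the table above (attributes A, B, C, D).
ROWS = [
    {"A": 1, "B": 1, "C": 1, "D": 1},
    {"A": 1, "B": 2, "C": 1, "D": 2},
    {"A": 2, "B": 2, "C": 2, "D": 2},
    {"A": 2, "B": 2, "C": 2, "D": 3},
    {"A": 3, "B": 3, "C": 2, "D": 4},
]


def may_hold(rows, lhs, rhs):
    """Return False if this state violates lhs -> rhs, True otherwise.

    True only means the state does not contradict the dependency;
    it does not prove that the dependency holds on the schema.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True


if __name__ == "__main__":
    for lhs, rhs in [(["A"], ["C"]), (["A"], ["B"]), (["D"], ["B"]), (["D"], ["A"])]:
        print(lhs, "->", rhs, may_hold(ROWS, lhs, rhs))
```

On this state, A → C and D → B are not contradicted, while A → B and D → A are ruled out.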

Functional dependencies are properties of the schema

● They are not a property of a particular relation state

○ Property of R, not r(R)

● They cannot be automatically inferred from r(R)

○ A state r(R) can show which dependencies may exist

○ And it can show which dependencies cannot exist

○ Must have knowledge of the semantics of the attributes of R for the full story

Functional dependency example

Student_Enrollment

Name      ID      Major  Course     Grade
Alice     334322  CS     CS 441     3.45
Alice     334322  CS     CS 447     3.45
Alice     334322  CS     Math 230   3.45
Bob       546346  Math   Math 422   3.23
Bob       546346  Math   Stat 1000  3.23
Charlie   783857  CS     CS 441     3.00
Danielle  971357  CS     Stat 1000  3.50

1st Normal form

● A relation is in 1NF if all of its attributes contain only single atomic values

Name   ID      Major  Course
Alice  334322  CS     {CS 441, CS 447, Math 230}
Bob    546346  Math   {Math 422, Stat 1000}

Important terms

● A superkey of a relation R = {A1, A2, A3, …, An} is a set of attributes S ⊆ R such that no two distinct tuples t1 and t2 can exist in any legal relation state with t1[S] = t2[S]

● A key is a superkey such that if any attribute is removed, it would no longer be a superkey

● Relations may have multiple keys, in which case they are called candidate keys

○ One candidate key is designated to be the primary key

● Any attribute of R that is a member of a candidate key is called a prime attribute

○ All other attributes are said to be non-prime

Partial functional dependencies

● A functional dependency X → Y is a partial dependency if some attribute A of X can be removed from X and the dependency will still hold

○ I.e., (X - {A}) → Y

○ If X → Y but there does not exist an A such that (X - {A}) → Y

■ Then Y is fully functionally dependent on X

Second normal form

● A relation R is in 2NF if it is in 1NF and every non-prime attribute of R is fully functionally dependent on every key of R
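As a rough illustration of the 2NF condition, the sketch below looks for partial dependencies in Student_Enrollment. The dependency set is an assumption made for this example (ID → {Name, Major} and {ID, Course} → Grade, which makes {ID, Course} the only key); the slides do not state it. The helper names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations

# Assumed dependencies for Student_Enrollment (not given on the slides).
FDS = [
    ({"ID"}, {"Name", "Major"}),
    ({"ID", "Course"}, {"Grade"}),
]
KEY = {"ID", "Course"}  # assumed to be the only candidate key


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, prime):
    """Map each proper, non-empty subset of key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - prime
            if determined:
                found[frozenset(subset)] = determined
    return found


if __name__ == "__main__":
    # A non-empty result means the relation is not in 2NF under the assumed FDs.
    print(partial_dependencies(KEY, FDS, prime=KEY))
```

Under these assumptions, ID alone determines Name and Major, so both depend only partially on the key and the table is not in 2NF.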

Transitive functional dependencies

● X → Y is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R and both X → Z and Z → Y hold

○ Consider EMP(Ssn, Ename, DeptID, MgrSsn)

Third normal form

● A relation R is in 3NF if it is in 2NF and every non-prime attribute of R is non-transitively dependent on every key of R

Example

Patient Hospital Doctor


Boyce Codd normal form

● For a relation R to be in BCNF, R must be in 3NF, and if X → Y holds in R, then one of the following must hold:

○ X → Y is trivial

■ X → Y is trivial if X ⊇ Y

    ● E.g.,

        ○ A → A

        ○ {A, B} → B

○ X is a superkey of R
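The BCNF condition can be checked mechanically once the dependencies are known. A minimal sketch, using the EMP schema from the transitive-dependency slide with an assumed dependency set (Ssn → {Ename, DeptID} and DeptID → MgrSsn; the slides name the schema but not its dependencies). It tests only the dependencies listed in F, which is enough to decide whether the whole relation R is in BCNF with respect to F. The function names are hypothetical.

```python
# Assumed dependencies for EMP(Ssn, Ename, DeptID, MgrSsn).
EMP = {"Ssn", "Ename", "DeptID", "MgrSsn"}
FDS = [
    ({"Ssn"}, {"Ename", "DeptID"}),
    ({"DeptID"}, {"MgrSsn"}),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the listed dependencies X -> Y that are non-trivial and whose X is not a superkey."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]


if __name__ == "__main__":
    print(bcnf_violations(EMP, FDS))
```

With these assumed dependencies, DeptID → MgrSsn is reported: DeptID is not a superkey of EMP, so EMP is not in BCNF.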

Fourth normal form … ?

● The universe of all relations is a superset of:

○ The set of 1NF relations

○ Which is a superset of:

    ■ The set of 2NF relations

    ■ Which is a superset of:

        ● The set of 3NF relations

        ● Which is a superset of:

            ○ The set of BCNF relations

            ○ Which is a superset of:

                ■ The set of 4NF relations

                ■ Which is a superset of:

                    ● The set of 5NF relations

                        ○ …

So what should we do with R = {Patient, Hospital, Doctor}?

Patient Hospital Doctor

● Decompose it into multiple relations to get it into BCNF

● Options?

○ {Doctor, Patient}, {Patient, Hospital}

○ {Doctor, Hospital}, {Patient, Hospital}

○ {Doctor, Hospital}, {Doctor, Patient}

● Which should we go with?

BCNF is not always dependency preserving

● If a functional dependency from R spans multiple tables in a decomposition of R, then that decomposition is not dependency preserving

○ Well, actually, that might be a bit too strict...

Inferred dependencies

● Consider:

○ Dept_no → MgrSsn

○ MgrSsn → Mgr_phone

○ so… Dept_no → Mgr_phone

■ Do we care if this is lost in a decomposition?

    ● No!

        ○ … as long as the other two remain

Inference rules for functional dependencies

Armstrong's inference rules (the first three rules; the others can be derived from them)

● Reflexive rule:

○ If X ⊇ Y, then X → Y

● Augmentation rule:

○ {X → Y} ⊧ XZ → YZ

● Transitive rule:

○ {X → Y, Y → Z} ⊧ X → Z

● Decomposition, or projective, rule:

○ {X → YZ} ⊧ X → Y

● Union, or additive, rule:

○ {X → Y, X → Z} ⊧ X → YZ

● Pseudotransitive rule:

○ {X → Y, WY → Z} ⊧ WX → Z

Given a set F of functional dependencies on R

● Any dependency that can be inferred from F using Armstrong's inference rules will hold on any r(R) that satisfies the dependencies of F

● Repeatedly applying Armstrong's inference rules on F will produce the set of all possible dependencies that can be inferred from F

○ The closure of F

■ Denoted F+
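F+ itself can be very large, so in practice one usually computes the closure of a set of attributes X under F (written X+) and relies on the fact that X → Y is in F+ exactly when Y ⊆ X+. A minimal sketch, assuming dependencies are stored as (left side, right side) pairs of sets; the names `closure` and `implies` are not from the slides. The example reuses the Dept_no, MgrSsn, Mgr_phone dependencies from the inferred-dependencies slide.

```python
FDS = [
    ({"Dept_no"}, {"MgrSsn"}),
    ({"MgrSsn"}, {"Mgr_phone"}),
]


def closure(attrs, fds):
    """Return X+, the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds (i.e. it is in F+)."""
    return set(rhs) <= closure(lhs, fds)


if __name__ == "__main__":
    print(closure({"Dept_no"}, FDS))
    print(implies(FDS, {"Dept_no"}, {"Mgr_phone"}))
```

The inferred dependency Dept_no → Mgr_phone falls out of the closure without being listed in F.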

Comparing sets of functional dependencies

● F and E are equivalent if E+ = F+

● F covers E if E ⊆ F+

○ E can be inferred from F

● Consider the informal definition of a minimum cover:

○ F is a minimum cover of E if E ⊆ F+, but this property would not hold if any dependency was removed from F
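Both definitions above reduce to attribute closures: F covers E when, for every X → Y in E, Y is contained in X+ computed under F. A small sketch under that reading; the function names and the sample sets F, E and G are made up for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if every dependency in e can be inferred from f (e is a subset of f+)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(f, e):
    """True if f and e cover each other, i.e. f+ equals e+."""
    return covers(f, e) and covers(e, f)


# Hypothetical dependency sets over attributes A, B, C.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
G = [({"A"}, {"B"})]

if __name__ == "__main__":
    print(equivalent(F, E))            # E only adds a dependency F already implies
    print(covers(F, G), covers(G, F))  # F covers G, but G cannot infer B -> C
```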

When decomposing relation R …

● With functional dependencies F

● We don't need to ensure that no dependency in F is split across multiple tables

○ We really just want to ensure that no dependency in a minimum cover of F is split across multiple tables

○ Though we can achieve what we want by ensuring that the union of functional dependencies on tables resulting from the decomposition of R is equivalent to F

Attribute preservation

● Let D = {R1, R2, R3, …, Rm} be a decomposition of R

○ The tables resulting from a decomposition of R

● D exhibits attribute preservation if:

○ R1 ∪ R2 ∪ … ∪ Rm = R

Example of a bad decomposition

Name   ID      Major  Course    Grade
Alice  334322  CS     CS 441    3.95
Alice  334322  CS     CS 447    3.65
Bob    546346  Math   Math 422  3.90
Bob    546346  Math   CS 447    2.75

Name   ID      Major
Alice  334322  CS
Bob    546346  Math

Course    Grade
CS 441    3.95
CS 447    3.65
Math 422  3.90
CS 447    2.75

Nonadditive decomposition

● Aka lossless decomposition

● Informally, can the decomposed tables be combined in some way to recover the original exactly?

○ How could we have performed a nonadditive decomposition of the previous example?

A better decomposition

Name   ID      Major
Alice  334322  CS
Bob    546346  Math

ID      Course    Grade
334322  CS 441    3.95
334322  CS 447    3.65
546346  Math 422  3.90
546346  CS 447    2.75
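One way to see the difference between the two decompositions is to project the original table and join the pieces back together. The sketch below is an illustration only; `project`, `natural_join` and `same_rows` are hypothetical helpers, and all values are kept as strings for simplicity.

```python
COLUMNS = ["Name", "ID", "Major", "Course", "Grade"]
ORIGINAL = [
    dict(zip(COLUMNS, values))
    for values in [
        ("Alice", "334322", "CS", "CS 441", "3.95"),
        ("Alice", "334322", "CS", "CS 447", "3.65"),
        ("Bob", "546346", "Math", "Math 422", "3.90"),
        ("Bob", "546346", "Math", "CS 447", "2.75"),
    ]
]


def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                merged = {**l_row, **r_row}
                if merged not in out:
                    out.append(merged)
    return out


def same_rows(a, b):
    """Compare two relations as sets of rows, ignoring row order."""
    def row_key(row):
        return sorted(row.items())
    return sorted(a, key=row_key) == sorted(b, key=row_key)


# Bad decomposition: no shared attribute, so the join pairs everything with everything.
BAD = natural_join(project(ORIGINAL, ["Name", "ID", "Major"]),
                   project(ORIGINAL, ["Course", "Grade"]))
# Better decomposition: ID links each enrollment back to its student.
GOOD = natural_join(project(ORIGINAL, ["Name", "ID", "Major"]),
                    project(ORIGINAL, ["ID", "Course", "Grade"]))

if __name__ == "__main__":
    print(len(BAD), same_rows(BAD, ORIGINAL))
    print(len(GOOD), same_rows(GOOD, ORIGINAL))
```

The first join produces extra (spurious) tuples, which is where the term "nonadditive" comes from; the second returns exactly the original four rows.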

A general test for nonadditive decomposition

● Need:

○ A universal relation R with k attributes

○ A decomposition of R, D = {R1, R2, …, Rm}

○ A set of functional dependencies F

● Create an m x k matrix

○ One row for each relation in the decomposition

○ One column for each attribute of the universal relation

● Set initial b values for each entry

● Set a values for each entry (i, j) where Ri contains attribute j

● For each dependency X → Y in F:

○ For all rows that have matching symbols for X:

■ Set the symbols for Y to match

    ● If any row has an a value, use that

    ● Otherwise, select a b value to use

○ Repeat until no changes are made for any dependency

Example test for nonadditive decomposition

● R = {Ssn, Ename, Pnumber, Pname, Plocation, Hours}

● D = {R1, R2}

○ R1 = {Ename, Plocation}

○ R2 = {Ssn, Pnumber, Hours, Pname, Plocation}

● F = { Ssn → Ename, Pnumber → {Pname, Plocation}, {Ssn, Pnumber} → Hours }

Initial b values:

    Ssn  Ename  Pnumber  Pname  Plocation  Hours
R1  b11  b12    b13      b14    b15        b16
R2  b21  b22    b23      b24    b25        b26

After setting a values:

    Ssn  Ename  Pnumber  Pname  Plocation  Hours
R1  b11  a2     b13      b14    a5         b16
R2  a1   b22    a3       a4     a5         a6

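Example 2 test for nonadditive decomposition

● R = {Ssn, Ename, Pnumber, Pname, Plocation, Hours}

● D = {R1, R2, R3}

○ R1 = {Ssn, Ename}

○ R2 = {Pnumber, Pname, Plocation}

○ R3 = {Ssn, Pnumber, Hours}

● F = { Ssn → Ename, Pnumber → {Pname, Plocation}, {Ssn, Pnumber} → Hours }

Initial b values:

    Ssn  Ename  Pnumber  Pname  Plocation  Hours
R1  b11  b12    b13      b14    b15        b16
R2  b21  b22    b23      b24    b25        b26
R3  b31  b32    b33      b34    b35        b36

After setting a values:

    Ssn  Ename  Pnumber  Pname  Plocation  Hours
R1  a1   a2     b13      b14    b15        b16
R2  b21  b22    a3       a4     a5         b26
R3  a1   b32    a3       b34    b35        a6

After applying the dependencies in F (b32, b34 and b35 become a2, a4 and a5):

    Ssn  Ename  Pnumber  Pname  Plocation  Hours
R3  a1   a2     a3       a4     a5         a6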
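The matrix test and the two examples above can be sketched in a few lines. This is one possible rendering, not the course's reference implementation: the function name `is_nonadditive` is hypothetical, each b symbol is stored as a ("b", row) tuple, and a single "a" string stands for the a symbol of whichever column it sits in. The slides stop at "repeat until no changes"; the final verdict used here (the decomposition is nonadditive when some row ends up with only a symbols) is the standard completion of this test. When two symbols are made equal, the sketch replaces them throughout the column so that earlier merges stay consistent.

```python
def is_nonadditive(attributes, decomposition, fds):
    """Matrix test: True if the decomposition has the nonadditive join property."""
    # Entry is "a" when Ri contains the attribute, else a b symbol unique to row i.
    matrix = [
        {attr: "a" if attr in ri else ("b", i) for attr in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows whose symbols agree on every attribute of X.
            groups = {}
            for row in matrix:
                groups.setdefault(tuple(row[a] for a in lhs), []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    symbols = {row[attr] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Prefer the a symbol; otherwise pick one of the b symbols.
                    target = "a" if "a" in symbols else min(symbols)
                    # Replace the merged symbols everywhere in this column.
                    for row in matrix:
                        if row[attr] in symbols:
                            row[attr] = target
                    changed = True
    return any(all(v == "a" for v in row.values()) for row in matrix)


R = ["Ssn", "Ename", "Pnumber", "Pname", "Plocation", "Hours"]
F = [
    (["Ssn"], ["Ename"]),
    (["Pnumber"], ["Pname", "Plocation"]),
    (["Ssn", "Pnumber"], ["Hours"]),
]
D1 = [{"Ename", "Plocation"}, {"Ssn", "Pnumber", "Hours", "Pname", "Plocation"}]
D2 = [{"Ssn", "Ename"}, {"Pnumber", "Pname", "Plocation"}, {"Ssn", "Pnumber", "Hours"}]

if __name__ == "__main__":
    print(is_nonadditive(R, D1, F))  # first example
    print(is_nonadditive(R, D2, F))  # second example
```

In the first example no dependency ever finds two rows that agree on its left side, so nothing changes and no row is all a's. In the second, the R3 row fills in with a2, a4 and a5, matching the last matrix above.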

The normalization process

● General rule:

○ We want table attributes to depend on the primary key, the whole primary key, and nothing but the primary key

● Normalization can be achieved via two methodologies:

○ Decomposition

■ Top-down approach

○ Synthesis

■ Bottom-up approach

Goals of the process

● BCNF

● Nonadditive joins

● Dependency preservation

● If we can't get all three, we'll have to settle for either:

○ Lack of dependency preservation

○ Some redundancy due to the use of 3NF


Algorithm for decomposition to BCNF

● Given:

○ A universal relation R

○ A set of functional dependencies F

● Set D := {R}

● While some schema Q in D is not in BCNF:

○ Find a functional dependency X → Y in Q that violates BCNF;

○ Replace Q in D by two relation schemas

■ D := D - {Q}

■ D := D ∪ {(Q – Y)}

■ D := D ∪ {(X ∪ Y)}
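The loop above can be sketched as follows. The slide does not say how to find a violating dependency inside a sub-schema Q, so this sketch makes an assumption: it tries every proper subset X of Q, computes X+ under F, and takes Y = (X+ ∩ Q) − X. That search is exponential in the size of Q and only suitable for small examples. The Student_Enrollment dependencies used in the demo are likewise assumed, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(q, fds):
    """Return (X, Y) where X -> Y is non-trivial in q and X is not a superkey of q, else None."""
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & q   # what X determines inside q
            y = frozenset(x_plus - x)
            if y and x_plus != q:
                return x, y
    return None


def bcnf_decompose(relation, fds):
    """Split relation until no sub-schema has a BCNF violation."""
    done, todo = [], [frozenset(relation)]
    while todo:
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(q)
        else:
            x, y = violation
            todo.append(q - y)   # Q - Y
            todo.append(x | y)   # X ∪ Y
    return done


# Assumed dependencies for Student_Enrollment (not given on the slides).
R = {"Name", "ID", "Major", "Course", "Grade"}
FDS = [({"ID"}, {"Name", "Major"}), ({"ID", "Course"}, {"Grade"})]

if __name__ == "__main__":
    for schema in bcnf_decompose(R, FDS):
        print(sorted(schema))
```

Under the assumed dependencies this yields {ID, Name, Major} and {ID, Course, Grade}, the same split as the "better decomposition" shown earlier.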


Algorithm for synthesis to 3NF

● Set D := {}

● Find a minimal cover G for F

● For each X that is the left side of a dependency in G:

○ D := D ∪ { X ∪ {A1} ∪ {A2} … ∪ {Ak} }

■ Where X → A1, X → A2, … , X → Ak are the only dependencies in G with X as left side

■ X is the key of this newly-added relation

● If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R

● Eliminate redundant relations in D
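A compact sketch of the synthesis steps, assuming the minimal cover G has already been computed and is passed in. "Contains a key of R" is tested by checking whether a schema's closure is all of R, and "redundant" is read as "contained in another schema"; both are interpretations rather than something the slide spells out. The function names and the A, B, C, D example are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(relation, fds):
    """Shrink the full attribute set to one key of relation."""
    key = set(relation)
    for attr in sorted(relation):
        if closure(key - {attr}, fds) >= set(relation):
            key.remove(attr)
    return frozenset(key)


def synthesize_3nf(relation, g):
    """Build relation schemas from a minimal cover g of (lhs, rhs) set pairs."""
    relation = frozenset(relation)
    # One schema per distinct left side, holding everything that side determines in g.
    by_lhs = {}
    for lhs, rhs in g:
        by_lhs.setdefault(frozenset(lhs), set()).update(rhs)
    d = [lhs | frozenset(rhs) for lhs, rhs in by_lhs.items()]
    # Add a key schema if no existing schema is a superkey of the relation.
    if not any(closure(schema, g) >= relation for schema in d):
        d.append(find_key(relation, g))
    # Drop schemas that are contained in another schema.
    kept = []
    for schema in sorted(set(d), key=len, reverse=True):
        if not any(schema <= other for other in kept):
            kept.append(schema)
    return kept


# Hypothetical example: neither A -> B nor C -> D yields a schema containing a key.
R = {"A", "B", "C", "D"}
G = [({"A"}, {"B"}), ({"C"}, {"D"})]

if __name__ == "__main__":
    for schema in synthesize_3nf(R, G):
        print(sorted(schema))
```

Here the two dependencies give {A, B} and {C, D}; since neither determines all of R, the key {A, C} is added as a third schema.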

Algorithm for finding the minimal cover of F

● For each dependency X → {A1, A2, … , An} in F:

○ Replace X → {A1, A2, … , An} with:

■ X → A1, X → A2, … , X → An

● For each functional dependency X → A in F:

○ For each attribute B that is an element of X:

■ if { {F – {X → A} } ∪ { ( X – {B} ) → A} } is equivalent to F:

● Replace X → A with ( X – {B} ) → A in F

● For each remaining functional dependency X → A in F:

○ If {F – {X → A} } is equivalent to F:

■ Remove X → A from F
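The three passes above translate fairly directly into code. In this sketch each equivalence test is replaced by a closure check: dropping B from the left side is safe when A is already in (X − {B})+ under the current set, and a dependency X → A is redundant when A is in X+ computed without it. The representation (a left-side frozenset paired with a single right-side attribute) and the example set F are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs frozenset, single attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, attr in fds:
            if lhs <= result and attr not in result:
                result.add(attr)
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds, given as (lhs set, rhs set) pairs."""
    # Pass 1: split right sides into single attributes (duplicates removed).
    g = list(dict.fromkeys(
        (frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)
    ))
    # Pass 2: drop left-side attributes that are not needed.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))
    # Pass 3: drop dependencies that the remaining ones already imply.
    for dep in list(g):
        rest = [d for d in g if d != dep]
        if dep[1] in closure(dep[0], rest):
            g = rest
    return g


# Hypothetical input: A -> C and AB -> C are both implied by A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"}), ({"A", "B"}, {"C"})]

if __name__ == "__main__":
    for lhs, attr in minimal_cover(F):
        print(sorted(lhs), "->", attr)
```

A set of dependencies can have more than one minimal cover; which one this returns depends on the order in which attributes and dependencies are visited.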


Algorithm for finding a key to R

● Set K := R

● For each attribute A in K:

○ Compute (K – {A})+ with respect to F;

○ if R ⊆ (K – {A})+:

■ Set K := K – {A}
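A short sketch of this key-finding loop, again using the assumed Student_Enrollment dependencies (ID → {Name, Major}, {ID, Course} → Grade), which the slides do not state. The function returns a single key; visiting attributes in a different order can produce a different key when a relation has several candidate keys.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(relation, fds):
    """Start from K = R and drop each attribute that the rest of K still determines."""
    key = set(relation)
    for attr in sorted(relation):  # sorted only to make the result repeatable
        if closure(key - {attr}, fds) >= set(relation):
            key.remove(attr)
    return key


# Assumed dependencies for Student_Enrollment (not given on the slides).
R = {"Name", "ID", "Major", "Course", "Grade"}
FDS = [({"ID"}, {"Name", "Major"}), ({"ID", "Course"}, {"Grade"})]

if __name__ == "__main__":
    print(sorted(find_key(R, FDS)))
```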
