Post on 14-Dec-2015


NORMALIZATION

Joe Meehean

REDUNDANCIES

- Repeated data in a database
  - wastes space
  - can cause modification anomalies
    - unexpected side effects when changing data
    - make building software on top of the DB difficult
- Normalization
  - the process of removing redundancies

MODIFICATION ANOMALIES

- Insert anomaly
  - extra data must be known to insert a row into a table
- Update anomaly
  - must change multiple rows to modify a single fact
- Deletion anomaly
  - deleting a row causes other data to be deleted
  - deletes more data than is necessary or desired

BAD COLLEGE DATABASE

- All data in 1 table

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

BAD COLLEGE DATABASE

- Insert anomaly
  - adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
  - cannot add Rush as a student until he enrolls

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

BAD COLLEGE DATABASE

- Update anomaly
  - if Emily changes her name to Emma, we need to change multiple rows

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

BAD COLLEGE DATABASE

- Delete anomaly
  - if Roger drops out of college and we delete him
  - we also delete the fact that there is an offering of DB in the spring

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB
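The update and delete anomalies on this single table can be reproduced in a few lines. This is only an illustrative sketch: the column names without spaces and the use of `None` for Roger's missing grade are assumptions made for the example, not part of the slides.

```python
COLUMNS = ("StdNo", "FirstName", "LastName", "OfferNo", "Term",
           "Year", "Grade", "CourseNo", "CourseDescr")

# The single-table "bad college database" from the slide.
rows = [dict(zip(COLUMNS, values)) for values in [
    ("S1", "Phil",  "Park", "O1", "Fall",   2011, "C-", "C1", "DB"),
    ("S1", "Phil",  "Park", "O2", "Fall",   2011, "B+", "C2", "OS"),
    ("S2", "Emily", "Blem", "O3", "Spring", 2012, "A+", "C3", "PL"),
    ("S2", "Emily", "Blem", "O2", "Fall",   2011, "B+", "C2", "OS"),
    ("S3", "Roger", "Cook", "O4", "Spring", 2014, None, "C1", "DB"),
]]

# Update anomaly: one fact (Emily's first name) is stored in several rows.
rows_to_touch = [r for r in rows if r["StdNo"] == "S2"]
print(len(rows_to_touch))  # 2

# Delete anomaly: removing Roger (S3) also removes the only record of offering O4.
offerings_before = {r["OfferNo"] for r in rows}
remaining = [r for r in rows if r["StdNo"] != "S3"]
lost_offerings = offerings_before - {r["OfferNo"] for r in remaining}
print(lost_offerings)  # {'O4'}
```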

FUNCTIONAL DEPENDENCIES (FDS)

- Constraint between 2 or more columns
- Represented by →
- X determines Y (X → Y) if there exists at most 1 value of Y for each value of X
  - like a mathematical function f(x) = y
  - the left-hand side (LHS) is called the determinant
  - e.g., StdNo determines the student's first name
    - StdNo → First Name
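The "at most 1 value of Y for each value of X" rule can be tested against sample rows. The sketch below uses a hypothetical helper, `fd_holds`, and a cut-down version of the college table. Sample data can only refute an FD; whether an FD really holds is a statement about the business rules, not about a few rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"StdNo": "S1", "FirstName": "Phil",  "OfferNo": "O1", "Grade": "C-"},
    {"StdNo": "S1", "FirstName": "Phil",  "OfferNo": "O2", "Grade": "B+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3", "Grade": "A+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O2", "Grade": "B+"},
]

print(fd_holds(rows, ["StdNo"], ["FirstName"]))         # True
print(fd_holds(rows, ["StdNo"], ["Grade"]))             # False
print(fd_holds(rows, ["StdNo", "OfferNo"], ["Grade"]))  # True
```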

ORGANIZING FDS

- Make a list
  - can condense the list by listing all dependent columns for a given determinant
  - e.g., StdNo → First Name, Last Name
- Determinants should be minimal
  - least number of columns required to determine the values of other columns
  - e.g., StdNo, First Name → Last Name is not minimal; StdNo alone determines Last Name

BAD COLLEGE DATABASE

- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
- StdNo, Offer No → Grade

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

IDENTIFYING FDS

- From the business narrative
  - Look for words like "unique"
    - e.g., “Each student has a unique student number, a first name, and a last name.”
  - Look for 1-M relationships
    - the child (M-side) is the determinant (LHS)
    - e.g., “Faculty teach many offerings.”
    - e.g., Offer No → Faculty Id

IDENTIFYING FDS

- From relational tables
  - FDs where the determinant (LHS) is not the PK or a candidate key
    - recall, a candidate key is a column (or columns) that uniquely identifies a row
    - e.g., Zip → State
  - Combined PKs
    - does 1 column determine the values of some other columns?
    - e.g., StdNo → First Name, Last Name

QUESTIONS?

NORMAL FORMS

- Normalization
  - removes redundancies in tables
  - removes modification anomalies
  - makes data easier to modify
- Normal form
  - rules about which functional dependencies (FDs) are allowed
  - each successive normal form removes FDs

NORMAL FORMS

[Diagram: the normal forms in order: 1NF, 2NF, 3NF/BCNF]


1ST NORMAL FORM

All relational tables are already in 1NF by definition

2ND NORMAL FORM

- Key columns
  - columns that are part (or all) of a candidate key
  - recall, a candidate key is a key that uniquely identifies a row
- Non-key columns
  - columns that are not part of a candidate key

2ND NORMAL FORM

- A table is in 2NF if each non-key column depends on
  - all candidate keys
  - NOT on any proper subset of any candidate key
  - check the functional dependencies (FDs)
- A 2NF violation
  - an FD where part of a key determines a non-key column

2ND NORMAL FORM

- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB
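Given a candidate key and a list of FDs, the 2NF check ("part of a key determines a non-key column") is mechanical. The sketch below assumes a single candidate key, (StdNo, OfferNo), and uses a hypothetical helper name, `partial_dependencies`; a table with several candidate keys would need the check repeated for each key.

```python
def partial_dependencies(candidate_key, fds):
    """Return FDs where a proper subset of the key determines non-key columns."""
    key = set(candidate_key)
    violations = []
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key
        # "<" on sets means proper subset: part of the key, but not all of it.
        if set(lhs) < key and non_key_rhs:
            violations.append((sorted(lhs), sorted(non_key_rhs)))
    return violations

fds = [
    ({"StdNo"}, {"FirstName", "LastName"}),
    ({"OfferNo"}, {"Term", "Year", "CourseNo", "CourseDescr"}),
    ({"StdNo", "OfferNo"}, {"Grade"}),
]

violations = partial_dependencies({"StdNo", "OfferNo"}, fds)
for lhs, rhs in violations:
    print(lhs, "->", rhs)
```

The full-key FD (StdNo, OfferNo → Grade) is not reported, because its determinant is the whole key rather than part of it.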

3RD NORMAL FORM

- A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
  - NOT on other non-key columns
  - e.g., Course No → Course Descr. is such a dependency
- 3NF violation
  - a non-key column on the right-hand side (RHS) AND
  - anything other than a candidate key on the LHS

3RD NORMAL FORM

- 3NF prohibits transitive dependencies
- Transitive dependencies
  - if A → B & B → C, then A → C
  - e.g., Offer No → Course No & Course No → Course Descr.
  - then Offer No → Course Descr.
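Transitive dependencies can be found by computing the closure of a set of columns: keep applying FDs until nothing new is added. The sketch below is the standard attribute-closure idea; the function name `closure` and the column spellings are choices made for this example.

```python
def closure(attrs, fds):
    """Return every column determined, directly or transitively, by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"OfferNo"}, {"CourseNo"}),
    ({"CourseNo"}, {"CourseDescr"}),
]

determined = closure({"OfferNo"}, fds)
print(sorted(determined))  # ['CourseDescr', 'CourseNo', 'OfferNo']
```

CourseDescr appears in the closure of OfferNo even though no listed FD connects them directly, which is exactly the derived FD Offer No → Course Descr.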

COMBINED 2NF & 3NF

- A table is in 3NF if each non-key column depends on
  - all candidate keys
  - whole candidate keys
  - and nothing but candidate keys

3RD NORMAL FORM

- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.
- 3NF violations
  - Course No → Course Descr.
  - Offer No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

BOYCE-CODD NORMAL FORM (BCNF)

- Revised, simpler version of 3NF
- Covers additional special cases
- A table is in BCNF if every determinant is a candidate key
- Violations are easy to detect
  - the determinant (LHS) is not a candidate key
  - e.g., StdNo → Last Name

BOYCE-CODD NORMAL FORM (BCNF)

- Excludes 2 redundancies that 3NF does not
  1. part of a key determines part of a key
  2. a non-key determines part of a key

BOYCE-CODD NORMAL FORM (BCNF)

StdNo  OfferNo  Email              EnrGrade
S1     O1       [email protected]  3.5
S1     O2       [email protected]  3.6
S2     O1       [email protected]  3.8
S2     O3       [email protected]  3.5

- BCNF violations
  - Email → StdNo
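One way to detect such a violation is to test whether each determinant reaches every column of the table through the FDs. The sketch below assumes only two FDs for this table, (StdNo, OfferNo) → Email, EnrGrade and Email → StdNo; the helper names are hypothetical. It tests for superkeys rather than minimal candidate keys, which is slightly more permissive than the wording on the slide.

```python
def closure(attrs, fds):
    """Return every column determined, directly or transitively, by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(columns, fds):
    """Return FDs whose determinant does not determine every column."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(columns)]

columns = {"StdNo", "OfferNo", "Email", "EnrGrade"}
fds = [
    ({"StdNo", "OfferNo"}, {"Email", "EnrGrade"}),  # assumed key FD
    ({"Email"}, {"StdNo"}),                         # the FD named on the slide
]

violations = bcnf_violations(columns, fds)
print(violations)  # [({'Email'}, {'StdNo'})]
```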

SIMPLE SYNTHESIS (BCNF)

- Convert tables into BCNF
  1. Eliminate extraneous columns from the LHS of FDs
  2. Remove derived (transitive) FDs
  3. Arrange FDs into groups by determinant
  4. For each FD group, make a table with the determinant as primary key
  5. Merge tables where one table includes all columns of the other table
     - choose the PK of one of the tables to be the PK of the new table
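Steps 2 through 4 of the procedure above can be sketched in code. This is a simplified illustration, not a complete synthesis algorithm: it assumes every FD has a single column on the right-hand side, skips step 1 (the determinants used here are already minimal) and step 5 (no table contains another), and all function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every column determined by attrs; each FD is (frozenset, column)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def remove_derived(fds):
    """Step 2: drop each FD that the remaining FDs already imply."""
    kept = list(fds)
    for fd in fds:
        others = [f for f in kept if f != fd]
        if fd[1] in closure(fd[0], others):
            kept = others
    return kept

def group_by_determinant(fds):
    """Steps 3 and 4: one table per determinant, keyed by that determinant."""
    tables = {}
    for lhs, rhs in fds:
        tables.setdefault(lhs, set(lhs)).add(rhs)
    return tables

def fd(lhs, rhs):
    return (frozenset(lhs), rhs)

fds = [
    fd({"StdNo"}, "FirstName"), fd({"StdNo"}, "LastName"),
    fd({"OfferNo"}, "Term"), fd({"OfferNo"}, "Year"),
    fd({"OfferNo"}, "CourseNo"), fd({"OfferNo"}, "CourseDescr"),
    fd({"StdNo", "OfferNo"}, "Grade"),
    fd({"CourseNo"}, "CourseDescr"),
]

kept = remove_derived(fds)
tables = group_by_determinant(kept)
for key, cols in tables.items():
    print(sorted(key), "->", sorted(cols))
```

With these FDs, the derived FD OfferNo → CourseDescr is dropped and four tables remain, one each for the determinants StdNo, OfferNo, (StdNo, OfferNo), and CourseNo.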

BAD COLLEGE DATABASE (1)

- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

BAD COLLEGE DATABASE (2)

- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

BAD COLLEGE DATABASE (3)

- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

BAD COLLEGE DATABASE (4)

StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  Offer No  Grade
S1     O1        --
S1     O2        B+
S2     O3        --
S2     O2        B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS

BAD COLLEGE DATABASE (5)

StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  Offer No  Grade
S1     O1        --
S1     O2        B+
S2     O3        --
S2     O2        B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS

IMPORTANCE OF NORMAL FORM VIOLATIONS

- We have the BCNF synthesis process
  - we can just make BCNF tables
  - why do we care about detecting NF violations?
- A DBA has 2 jobs
  - make new databases
  - maintain old ones
- Making new DBs requires using the BCNF synthesis process
- Maintaining old DBs requires detecting NF violations
  - perhaps made by other employees
  - detecting violations narrows the scope of DB redesign

QUESTIONS?

4TH NORMAL FORM (4NF)

- M-way relationships
  - associative entity types (weak entities)
  - multiple associations
  - primary key made of FKs from 3 or more tables
  - often represent important documents
    - glue multiple things together
    - e.g., invoice
  - can sometimes contain redundancies

4TH NORMAL FORM (4NF)

[Diagram: Enroll relationship connecting Student (StdNo, Name), Offering (OfferNo, Location), and Textbook (TextNo, TextTitle)]

4TH NORMAL FORM (4NF)

Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

MULTIVALUED DEPENDENCIES (MVDS)

- Given table R with columns X, Y, and Z
- X →→ Y
  - each X maps to a set of Ys (between 1 and M)
- X →→ Z
  - each X maps to a set of Zs (between 1 and M)
- Y & Z are independent
  - knowing Y doesn't tell you anything about Z and vice-versa
  - neither Y →→ Z nor Y → Z holds
  - neither Z →→ Y nor Z → Y holds
  - also, Y, V →→ Z does not hold, unless V →→ Z
- Every FD is an MVD
  - not every MVD is an FD

TRIVIAL MVDS

- MVD X →→ Y is trivial if
  - Y is a subset of X OR
  - X and Y are the only columns in the table OR
  - X → Y and X → Z
- e.g., has-job table (Employee#, Position#)
  - E# →→ P#
- e.g., offering table (Course Number, Section #, Faculty ID)
  - C#, S# →→ S#

MULTIVALUED DEPENDENCIES (MVDS)

- Non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
    - if Emily drops 242, it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil
CS242A   Emily
CS242A              Drozdek
CS242A              Weiss

MULTIVALUED DEPENDENCIES (MVDS)

- Non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
    - if Emily drops 242, it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil       Weiss
CS242A   Emily      Drozdek
CS242A   Phil       Drozdek
CS242A   Emily      Weiss

4TH NORMAL FORM (4NF)

- 4th normal form
  - table is in BCNF AND
  - all MVDs are trivial
- Detecting a violation
  - are there any MVDs?
  - are those MVDs non-trivial?

4TH NORMAL FORM (4NF)

- Resolving violations
  - X →→ Y
  - X →→ Z

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

splits into

X   Y
X1  Y1
X1  Y2

X   Z
X1  Z1
X1  Z2
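The split loses no information: joining the two smaller tables on X gives back exactly the original rows. A minimal sketch, using plain tuples and set comprehensions in place of real tables:

```python
# Original three-column table with X ->> Y and X ->> Z.
rows = {
    ("X1", "Y1", "Z1"),
    ("X1", "Y2", "Z2"),
    ("X1", "Y2", "Z1"),
    ("X1", "Y1", "Z2"),
}

# Split into two two-column tables (projections onto X,Y and X,Z).
xy = {(x, y) for x, y, z in rows}
xz = {(x, z) for x, y, z in rows}

# Natural join of the two projections on X.
rejoined = {(x, y, z) for x, y in xy for x2, z in xz if x == x2}

print(len(xy), len(xz))  # 2 2
print(rejoined == rows)  # True
```

Four rows of three columns become two tables of two rows each, and the redundancy (each Y repeated once per Z) disappears.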

MORE EXAMPLES

Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

- S →→ O & S →→ G ?
- O →→ G & O →→ S ?
- G →→ S & G →→ O ?

MORE EXAMPLES

Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

- S →→ O & S →→ G ?
  - no: Offering and Grade are not independent
- O →→ G & O →→ S ?
  - no: Grade and Student are not independent
- G →→ S & G →→ O ?
  - no: Student and Offering are not independent
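The independence question can be checked against the rows of a three-column table: X →→ Y | Z holds in the data when, for every X value, every Y seen with that X is paired with every Z seen with that X. The sketch below uses a hypothetical helper, `mvd_holds`, and, as with FDs, sample rows can refute an MVD but cannot prove it.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """True if, for each x value, the (y, z) pairs form a full cross product."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

grades = [
    {"Student": "Phil",  "Offering": "CS242A", "Grade": "A"},
    {"Student": "Phil",  "Offering": "CS370A", "Grade": "B"},
    {"Student": "Emily", "Offering": "CS242A", "Grade": "B"},
    {"Student": "Emily", "Offering": "CS370A", "Grade": "A"},
]

enroll = [
    {"OfferNo": "CS242A", "StudentNo": "Phil",  "TextNo": "Weiss"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Phil",  "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Weiss"},
]

print(mvd_holds(grades, "Student", "Offering", "Grade"))   # False
print(mvd_holds(grades, "Offering", "Grade", "Student"))   # False
print(mvd_holds(grades, "Grade", "Student", "Offering"))   # False
print(mvd_holds(enroll, "OfferNo", "StudentNo", "TextNo")) # True
```

The grade table fails all three checks because a grade belongs to a specific student and offering together, while the enroll table passes because students and textbooks of an offering vary independently.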

MORE EXAMPLES

- B →→ E & B →→ C
- Is this a trivial MVD?

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

MORE EXAMPLES

- B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - B → E & B → C do not hold
  - NO!!!

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

MORE EXAMPLES

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

splits into

Bank Branch  Employee
B3           Ann
B3           Terry

Bank Branch  Customer
B3           Ted
B3           Alfred

QUESTIONS?

QUIZ BREAK!!!

Part#  PQty  PDesc
P1     2     5mm bolt
P2     4     10mm nut
P3     2     5mm wrench
P4     4     8mm washer

- PQty →→ PDesc & PQty →→ Part# ?

QUIZ BREAK!!!

Loc #  Item            Managers
L1     XBox 360 250GB  Cindy
L1     Garmin GPS      Aaron
L1     XBox 360 250GB  Aaron
L1     Garmin GPS      Cindy


EXTRA 4NF SLIDES

4TH NORMAL FORM (4NF)

- Relationship independence
  - 2 relationships are independent if one cannot be derived from the other
  - knowing one relationship tells you nothing about the other

4TH NORMAL FORM (4NF)

Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

- 3 relationships
  - StdNo -- OfferNo
  - StdNo -- TextNo
  - OfferNo -- TextNo

4TH NORMAL FORM (4NF)

- StdNo -- OfferNo cannot be derived from the other 2
  - StdNo -- TextNo & TextNo -- OfferNo
  - the same textbook can be used for 2 offerings
- OfferNo -- TextNo cannot be derived from the other 2
  - OfferNo -- StdNo & StdNo -- TextNo
  - students use many textbooks, not all related to this offering
- StdNo -- TextNo can be derived
  - StdNo -- OfferNo & OfferNo -- TextNo
  - the offering number gives the set of texts a student needs

4TH NORMAL FORM (4NF)

- Multivalued dependencies (MVDs)
  - each X can map to a set of Ys and a set of Zs
  - generalization of functional dependencies
    - each X maps to one Y
    - each X maps to one Z
  - represented by X →→ Y | Z
  - every FD is an MVD
    - known as a trivial MVD
  - not every MVD is an FD

4TH NORMAL FORM (4NF)

- M-way tables sometimes introduce MVDs
  - X →→ Y
  - X →→ Z
  - X →→ Y | Z
  - Y and Z are independent
    - relationship X--Y is independent of relationship X--Z
- Not all M-way tables produce MVDs

4TH NORMAL FORM (4NF)

- MVD table redundancies
  - assume X1 maps to Y1 & Y2
  - and X1 maps to Z1 & Z2

X   Y   Z
X1  Y1
X1  Y2
X1      Z1
X1      Z2

4TH NORMAL FORM (4NF)

- Need to fill in the rest of the table

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

4TH NORMAL FORM (4NF)

- Rows below the line exist because relationship Y--Z can be derived from relationships X--Y & X--Z
- Rows below the line are redundant

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
----------
X1  Y2  Z1
X1  Y1  Z2

4TH NORMAL FORM (4NF)

Enroll Table

OfferNo  StdNo  TextNo
O1       S1     T1
O1       S2     T2
----------------------
O1       S2     T1
O1       S1     T2

- OfferNo →→ StdNo | TextNo
  - offerings map to many students
  - offerings can have many textbooks
- Rows below the line are redundant

4TH NORMAL FORM (4NF)

- 4NF definition
  - tables cannot contain any non-trivial MVDs
- Resolving 4NF violations
  - for each table with a non-trivial MVD
  - split the 3-column table into two 2-column tables
  - A, B, C goes to A, B & A, C

StdNo  OfferNo
S1     O1
S1     O2

OfferNo  TextNo
O1       T1
O1       T2
O2       T2
O2       T3