1
NORMALIZATION
Joe Meehean
2
REDUNDANCIES
- Repeated data in a database
  - wastes space
  - can cause modification anomalies
    - unexpected side effects when changing data
    - make building software on top of the DB difficult
- Normalization
  - the process of removing redundancies
3
MODIFICATION ANOMALIES
- Insert anomaly
  - extra data must be known to insert a row into a table
- Update anomaly
  - must change multiple rows to modify a single fact
- Deletion anomaly
  - deleting a row causes other data to be deleted
  - deletes more data than is necessary or desired
4
BAD COLLEGE DATABASE
All data in 1 table

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB
5
BAD COLLEGE DATABASE
- Insert anomaly
  - adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
  - cannot add Rush as a student until he enrolls
6
BAD COLLEGE DATABASE
- Update anomaly
  - if Emily changes her name to Emma
  - need to change multiple rows
7
BAD COLLEGE DATABASE
- Delete anomaly
  - if Roger drops out of college and we delete him
  - we also delete the fact that there is an offering of DB in the spring
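The update and delete anomalies can be reproduced with a small in-memory copy of the single-table design. This is only a sketch: the snake_case column names and the list-of-dicts representation are assumptions made for illustration, not part of the slides.

```python
# Hypothetical in-memory copy of the single-table design (column names are illustrative).
COLUMNS = ("std_no", "first", "last", "offer_no", "term", "year", "grade", "course_no", "descr")
rows = [
    ("S1", "Phil", "Park", "O1", "Fall", 2011, "C-", "C1", "DB"),
    ("S1", "Phil", "Park", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S2", "Emily", "Blem", "O3", "Spring", 2012, "A+", "C3", "PL"),
    ("S2", "Emily", "Blem", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S3", "Roger", "Cook", "O4", "Spring", 2014, None, "C1", "DB"),
]
table = [dict(zip(COLUMNS, r)) for r in rows]

# Update anomaly: one fact (Emily's first name) is stored in several rows.
rows_to_touch = [r for r in table if r["std_no"] == "S2"]
for r in rows_to_touch:
    r["first"] = "Emma"

# Deletion anomaly: removing Roger also removes the only record of offering O4.
offerings_before = {r["offer_no"] for r in table}
table_after = [r for r in table if r["std_no"] != "S3"]
offerings_after = {r["offer_no"] for r in table_after}
lost_offerings = offerings_before - offerings_after

print(len(rows_to_touch))  # number of rows changed for a single name change
print(lost_offerings)      # offerings that vanished together with the student
```

The insert anomaly shows up in the same structure: a row for a new student cannot be built without values for the offering columns.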
8
FUNCTIONAL DEPENDENCIES (FDS)
- Constraint between 2 or more columns
- Represented by →
- X determines Y (X → Y) if there exists at most 1 value of Y for each value of X
  - like a mathematical function f(x) = y
  - the left-hand side (LHS) is called the determinant
  - e.g., StdNo determines student first name: StdNo → First Name
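The "at most 1 value of Y for each value of X" rule can be checked mechanically against sample rows. The sketch below assumes rows are stored as dicts; the helper name `fd_holds` and the sample data are hypothetical. Sample data can only refute an FD, it cannot prove that the FD holds for every future row.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each value of the lhs columns maps to at most one value of the rhs columns."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # setdefault stores y the first time x is seen and returns the stored value afterwards
        if seen.setdefault(x, y) != y:
            return False
    return True

students = [
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O1"},
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O2"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3"},
]
print(fd_holds(students, ["StdNo"], ["FirstName"]))  # True
print(fd_holds(students, ["StdNo"], ["OfferNo"]))    # False: S1 maps to both O1 and O2
```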
9
ORGANIZING FDS
- Make a list
  - can condense the list by listing all dependent columns for a given determinant
  - e.g., StdNo → First Name, Last Name
- Determinants should be minimal
  - least # of columns required to determine the values of other columns
  - e.g., StdNo, First Name → Last Name is not minimal, because StdNo alone determines Last Name
10
BAD COLLEGE DATABASE
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
- StdNo, Offer No → Grade
11
IDENTIFYING FDS
From the business narrative
- Look for words like "unique"
  - e.g., "Each student has a unique student number, a first name, and a last name."
- Look for 1-M relationships
  - the child (M-side) is the determinant (LHS)
  - e.g., "Faculty teach many offerings."
  - e.g., Offer No → Faculty Id
12
IDENTIFYING FDS
From relational tables
- Look for FDs where the determinant (LHS) is not the PK or a candidate key
  - recall, a candidate key is the column(s) that uniquely identify a row
  - e.g., Zip → State
- Combined PKs
  - does 1 column determine the values of some other columns?
  - e.g., StdNo → First Name, Last Name
13
QUESTIONS?
14
NORMAL FORMS
- Normalization
  - removes redundancies in tables
  - removes modification anomalies
  - makes data easier to modify
- Normal form
  - rules about which functional dependencies (FDs) are allowed
  - each successive normal form removes FDs
15
NORMAL FORMS
- 1NF
- 2NF
- 3NF/BCNF
16
1ST NORMAL FORM
All relational tables are already in 1NF by definition
17
2ND NORMAL FORM
- Key columns
  - columns that are part (or all) of a candidate key
  - recall, a candidate key is a key that uniquely identifies a row
- Non-key columns
  - columns that are not part of a candidate key
18
2ND NORMAL FORM
- A table is in 2NF if each non-key column depends on whole candidate keys
  - NOT on a proper subset of any candidate key
  - check the functional dependencies (FDs)
- A 2NF violation
  - an FD where part of a key determines a non-key column
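The 2NF check above can be sketched as a filter over a list of FDs. The function name `partial_dependencies`, the tuple representation of FDs, and the unspaced column names are assumptions for illustration; the FDs themselves are the ones listed for the college table.

```python
def partial_dependencies(fds, candidate_keys):
    """Return FDs whose LHS is a proper subset of a candidate key and whose RHS has a non-key column."""
    key_columns = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key_columns
        # "<" on sets tests for a proper subset, i.e. only part of the key
        is_part_of_key = any(set(lhs) < set(key) for key in candidate_keys)
        if is_part_of_key and non_key_rhs:
            violations.append((lhs, sorted(non_key_rhs)))
    return violations

candidate_keys = [("StdNo", "OfferNo")]
fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo", "CourseDescr")),
    (("StdNo", "OfferNo"), ("Grade",)),
]
for lhs, rhs in partial_dependencies(fds, candidate_keys):
    print(lhs, "->", rhs)
```

Only the first two FDs are reported; StdNo, OfferNo → Grade uses the whole key and is allowed.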
19
2ND NORMAL FORM
2NF Violations
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB
20
3RD NORMAL FORM
- A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
  - NOT on other non-key columns
  - e.g., Course No → Course Descr. is not allowed
- 3NF violation
  - a non-key column on the right-hand side (RHS) AND
  - anything other than a candidate key on the LHS
21
3RD NORMAL FORM
- 3NF prohibits transitive dependencies
- Transitive dependencies
  - if A → B & B → C, then A → C
  - e.g., Offer No → Course No & Course No → Course Descr., then Offer No → Course Descr.
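Transitive dependencies can be found by computing an attribute closure: keep applying FDs until no new column is added. The sketch below is a standard closure loop; the function name and the tuple representation of FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Columns determined by attrs, found by applying FDs until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    (("OfferNo",), ("CourseNo",)),
    (("CourseNo",), ("CourseDescr",)),
]
# OfferNo reaches CourseDescr only through CourseNo, i.e. transitively.
print(sorted(closure({"OfferNo"}, fds)))  # ['CourseDescr', 'CourseNo', 'OfferNo']
```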
22
COMBINED 2NF & 3NF
A table is in 3NF if each non-key column depends on
- all candidate keys,
- whole candidate keys,
- and nothing but candidate keys
23
3RD NORMAL FORM
2NF Violations
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
3NF Violations
- Course No → Course Descr.
- Offer No → Course Descr.
24
BOYCE-CODD NORMAL FORM (BCNF)
- Revised, simpler version of 3NF
- Covers additional special cases
- A table is in BCNF if every determinant is a candidate key
- Violations are easy to detect
  - the determinant (LHS) is not a candidate key
  - e.g., StdNo → Last Name
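One way to sketch the BCNF test is to flag every FD whose determinant does not determine all columns of the table. The helper names below are hypothetical, the FD list is the one given for the college table, and the check tests for "determines every column" (a superkey), which matches "candidate key" only when determinants are already minimal.

```python
def closure(attrs, fds):
    """Columns determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(columns, fds):
    """Return the FDs whose determinant does not determine every column of the table."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(columns)]

columns = ["StdNo", "FirstName", "LastName", "OfferNo", "Term", "Year",
           "Grade", "CourseNo", "CourseDescr"]
fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo")),
    (("CourseNo",), ("CourseDescr",)),
    (("StdNo", "OfferNo"), ("Grade",)),
]
for lhs, rhs in bcnf_violations(columns, fds):
    print(lhs, "->", rhs)
```

Only StdNo, OfferNo → Grade passes, because its determinant is the key of the single table.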
25
BOYCE-CODD NORMAL FORM (BCNF)
Excludes 2 redundancies that 3NF does not
1. part of a key determines part of a key
2. a non-key determines part of a key
26
BOYCE-CODD NORMAL FORM (BCNF)
StdNo  OfferNo  Email              EnrGrade
S1     O1       [email protected]  3.5
S1     O2       [email protected]  3.6
S2     O1       [email protected]  3.8
S2     O3       [email protected]  3.5

BCNF Violations
- Email → StdNo
27
SIMPLE SYNTHESIS (BCNF)
Convert tables into BCNF
1. Eliminate extraneous columns from the LHS of FDs
2. Remove derived (transitive) FDs
3. Arrange FDs into groups by determinant
4. For each FD group, make a table with the determinant as primary key
5. Merge tables where one table includes all columns of the other table
   - choose the PK of one of the tables to be the PK of the new table
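Steps 2 through 4 of the procedure above can be sketched in a few lines. This is a simplified illustration, not a full implementation: it assumes each FD has a single column on the right-hand side, it skips steps 1 and 5, and all function and column names are hypothetical.

```python
def closure(attrs, fds):
    """Columns determined by attrs; each FD is (lhs_tuple, single_rhs_column)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def remove_derived(fds):
    """Step 2: drop FDs that the remaining FDs already imply."""
    kept = list(fds)
    for fd in list(fds):
        others = [f for f in kept if f != fd]
        lhs, rhs = fd
        if rhs in closure(lhs, others):
            kept = others
    return kept

def synthesize(fds):
    """Steps 3 and 4: group FDs by determinant, one table per group with the determinant as PK."""
    groups = {}
    for lhs, rhs in remove_derived(fds):
        groups.setdefault(lhs, []).append(rhs)
    return {lhs: list(lhs) + cols for lhs, cols in groups.items()}

fds = [
    (("StdNo",), "FirstName"), (("StdNo",), "LastName"),
    (("OfferNo",), "Term"), (("OfferNo",), "Year"),
    (("OfferNo",), "CourseNo"), (("OfferNo",), "CourseDescr"),
    (("StdNo", "OfferNo"), "Grade"),
    (("CourseNo",), "CourseDescr"),
]
for pk, cols in synthesize(fds).items():
    print("PK", pk, "columns", cols)
```

OfferNo → CourseDescr is dropped as derived, so the description ends up only in the table keyed by CourseNo.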
28
BAD COLLEGE DATABASE (1)
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.
30
BAD COLLEGE DATABASE (3)
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No
- StdNo, Offer No → Grade
- Course No → Course Descr.
31
BAD COLLEGE DATABASE (4)
StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  Offer No  Grade
S1     O1        --
S1     O2        B+
S2     O3        --
S2     O2        B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS
33
IMPORTANCE OF NORMAL FORM VIOLATIONS
- We have the BCNF synthesis process
  - we can just make BCNF tables
  - why do we care about detecting NF violations?
- A DBA has 2 jobs
  - make new databases
  - maintain old ones
- Making new DBs
  - requires using the BCNF synthesis process
- Maintaining old DBs
  - requires detecting NF violations
  - perhaps made by other employees
  - detecting violations narrows the scope of a DB redesign
34
QUESTIONS?
35
4TH NORMAL FORM (4NF)
- M-way relationships
  - associative entity types (weak entities)
  - multiple associations
  - primary key made of FKs from 3 or more tables
  - often represent important documents
    - glue multiple things together
    - e.g., an invoice
  - can sometimes contain redundancies
36
4TH NORMAL FORM (4NF)
Student (StdNo, Name)
Offering (OfferNo, Location)
Textbook (TextNo, TextTitle)
Enroll (M-way relationship among Student, Offering, and Textbook)
37
4TH NORMAL FORM (4NF)
Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3
38
MULTIVALUED DEPENDENCIES (MVDS)
Given table R with columns X, Y, and Z
- X →→ Y
  - each X maps to a set of Ys (between 1 and M)
- X →→ Z
  - each X maps to a set of Zs (between 1 and M)
- Y & Z are independent
  - knowing Y doesn't tell you anything about Z and vice-versa
  - Y ↛↛ Z & Y ↛ Z
  - Z ↛↛ Y & Z ↛ Y
  - also Y,V ↛↛ Z, unless V →→ Z
- Every FD is an MVD
  - not every MVD is an FD
39
TRIVIAL MVDS
MVD X →→ Y is trivial if
- Y is a subset of X, OR
- X and Y are the only columns in the table, OR
- X → Y and X → Z

e.g., has-job table (Employee#, Position#): E# →→ P#
e.g., offering table (Course Number, Section #, Faculty ID): C#, S# →→ S#
40
MULTIVALUED DEPENDENCIES (MVDS)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
  - if Emily drops 242 it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil
CS242A   Emily
CS242A              Drozdek
CS242A              Weiss
41
MULTIVALUED DEPENDENCIES (MVDS)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
  - if Emily drops 242 it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil       Weiss
CS242A   Emily      Drozdek
CS242A   Phil       Drozdek
CS242A   Emily      Weiss
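Whether sample rows are consistent with X →→ Y can be tested directly: for each X value, the table must contain every pairing of that X's Y values with its Z values. The sketch below assumes rows are dicts and treats every column outside X and Y as Z; the helper name `mvd_holds` is hypothetical.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Check X ->> Y on sample rows: each X value must pair every one of its Ys with every one of its Zs."""
    columns = list(rows[0].keys())
    z = [c for c in columns if c not in x and c not in y]
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in x)
        pair = (tuple(row[c] for c in y), tuple(row[c] for c in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

enroll = [
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Weiss"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Weiss"},
]
print(mvd_holds(enroll, ["OfferNo"], ["StudentNo"]))       # True
print(mvd_holds(enroll[:3], ["OfferNo"], ["StudentNo"]))   # False: Emily/Weiss pairing is missing
```

The four rows are exactly the cross product of two students and two textbooks, which is the redundancy the slide describes.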
42
4TH NORMAL FORM (4NF)
- 4th normal form
  - table is in BCNF AND
  - all MVDs are trivial
- Detecting a violation
  - are there any MVDs?
  - are those MVDs non-trivial?
43
4TH NORMAL FORM (4NF)
Resolving violations
- X →→ Y
- X →→ Z

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

is split into

X   Y
X1  Y1
X1  Y2

X   Z
X1  Z1
X1  Z2
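The split can be sketched as two projections, and joining them back on X shows that no information is lost. The helper names `project`, `decompose`, and `rejoin` are hypothetical, and rows are assumed to be dicts.

```python
def project(rows, cols):
    """Distinct tuples of the given columns (a relational projection)."""
    return {tuple(row[c] for c in cols) for row in rows}

def decompose(rows, x, y, z):
    """Split a table with X ->> Y and X ->> Z into (X, Y) and (X, Z)."""
    return project(rows, x + y), project(rows, x + z)

def rejoin(xy, xz, nx):
    """Natural join of the two projections on their first nx (shared X) columns."""
    return {a + b[nx:] for a in xy for b in xz if a[:nx] == b[:nx]}

table = [
    {"X": "X1", "Y": "Y1", "Z": "Z1"},
    {"X": "X1", "Y": "Y2", "Z": "Z2"},
    {"X": "X1", "Y": "Y2", "Z": "Z1"},
    {"X": "X1", "Y": "Y1", "Z": "Z2"},
]
xy, xz = decompose(table, ["X"], ["Y"], ["Z"])
print(sorted(xy))  # [('X1', 'Y1'), ('X1', 'Y2')]
print(sorted(xz))  # [('X1', 'Z1'), ('X1', 'Z2')]
print(rejoin(xy, xz, 1) == project(table, ["X", "Y", "Z"]))  # True
```

Four rows of three columns shrink to two tables of two rows each, and the join reproduces the original four rows.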
44
MORE EXAMPLES
Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

- S →→ O & S →→ G ?
- O →→ G & O →→ S ?
- G →→ S & G →→ O ?
45
MORE EXAMPLES
Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

- S →→ O & S →→ G ?
  - Offering and Grade are not independent
- O →→ G & O →→ S ?
  - Grade and Student are not independent
- G →→ S & G →→ O ?
  - Student and Offering are not independent
46
MORE EXAMPLES
- B →→ E & B →→ C
- Is this a trivial MVD?

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted
47
MORE EXAMPLES
- B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - B ↛ E & B ↛ C
  - NO!!!

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted
48
MORE EXAMPLES

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

Bank Branch  Employee
B3           Ann
B3           Terry

Bank Branch  Customer
B3           Ted
B3           Alfred
49
QUESTIONS?
50
QUIZ BREAK!!!
Part#  PQty  PDesc
P1     2     5mm bolt
P2     4     10mm nut
P3     2     5mm wrench
P4     4     8mm washer

PQty →→ PDesc & PQty →→ Part# ?
51
QUIZ BREAK!!!
Loc #  Item            Managers
L1     XBox 360 250GB  Cindy
L1     Garmin GPS      Aaron
L1     XBox 360 250GB  Aaron
L1     Garmin GPS      Cindy
52
EXTRA 4NF SLIDES
53
4TH NORMAL FORM (4NF)
- Relationship independence
  - 2 relationships are independent if one cannot be derived from the other
  - knowing one relationship tells you nothing about the other
54
4TH NORMAL FORM (4NF)
Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

3 relationships
- StdNo -- OfferNo
- StdNo -- TextNo
- OfferNo -- TextNo
55
4TH NORMAL FORM (4NF)
- StdNo -- OfferNo cannot be derived from the other 2 (StdNo -- TextNo & TextNo -- OfferNo)
  - the same textbook can be used for 2 offerings
- OfferNo -- TextNo cannot be derived from the other 2 (OfferNo -- StdNo & StdNo -- TextNo)
  - students use many textbooks, not all related to this offering
- StdNo -- TextNo can be derived from StdNo -- OfferNo & OfferNo -- TextNo
  - the offering number gives the set of texts a student needs
56
4TH NORMAL FORM (4NF)
- Multivalued Dependencies (MVDs)
  - each X can map to a set of Ys and a set of Zs
  - generalization of functional dependencies
    - each X maps to one Y
    - each X maps to one Z
  - represented by X →→ Y | Z
  - every FD is an MVD
    - known as a trivial MVD
  - not every MVD is an FD
57
4TH NORMAL FORM (4NF)
- M-way tables sometimes introduce MVDs
  - X →→ Y
  - X →→ Z
  - X →→ Y | Z
  - Y and Z are independent
    - relationship X -- Y is independent of relationship X -- Z
- Not all M-way tables produce MVDs
58
4TH NORMAL FORM (4NF)
- MVD table redundancies
  - assume X1 maps to Y1 & Y2 and X1 maps to Z1 & Z2

X   Y   Z
X1  Y1
X1  Y2
X1      Z1
X1      Z2
59
4TH NORMAL FORM (4NF)
- Need to fill in the rest of the table

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2
60
4TH NORMAL FORM (4NF)
- Rows below the line exist because relationship Y -- Z can be derived from relationships X -- Y & X -- Z
- Rows below the line are redundant

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
----------
X1  Y2  Z1
X1  Y1  Z2
61
4TH NORMAL FORM (4NF)
Enroll Table

OfferNo  StdNo  TextNo
O1       S1     T1
O1       S2     T2
-----------------------
O1       S2     T1
O1       S1     T2

- OfferNo →→ StdNo | TextNo
  - offerings map to many students
  - offerings can have many textbooks
- Rows below the line are redundant
62
4TH NORMAL FORM (4NF)
- 4NF definition
  - tables cannot contain any non-trivial MVDs
- Resolving 4NF violations
  - for each table with a non-trivial MVD
  - split the 3-column table into two 2-column tables
  - A,B,C goes to A,B & A,C

StdNo  OfferNo
S1     O1
S1     O2

OfferNo  TextNo
O1       T1
O1       T2
O2       T2
O2       T3