Copyright © 2006, Oracle. All rights reserved.
Chapter 7: Normalization of Relational Tables (Part I)
7.1 - 2
Outline
• Modification anomalies
• Functional dependencies
• Major normal forms
• Practical concerns
7.1 - 3
Modification Anomalies (anomalies that occur when data is modified)
• Definition
– Unexpected side effects that occur when changing the data in a table designed with excessive redundancy.
• Result of the side effects
– More data is inserted, updated, or deleted than intended
• Types
– Insertion anomaly
– Update anomaly
– Deletion anomaly
7.1 - 4
Example of a Poor Table Design (Big University Database)
• PK design: combination of StdSSN and OfferNo
Pros:
• Easier to query (no join is needed)
– enrollments of student S1 or S2
– students of offering O2
– students or offerings of course C2
Cons:
• Table has obvious redundancies (shown by colored blocks on the slide)
– Result: more difficult to change
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
S1 Seattle JUN O1 Fall 2006 3.5 C1 DB
S1 Seattle JUN O2 Fall 2006 3.3 C2 VB
S2 Bothell JUN O3 Spring 2007 3.1 C3 OO
S2 Bothell JUN O2 Fall 2006 3.4 C2 VB
7.1 - 5
Insertion Anomaly
Definition:
• In an insertion, extra data beyond the desired data must be added to the database.
Example:
• Cannot insert a new student without enrolling the student in an offering (because OfferNo is part of the PK)
– More column data must be inserted than desired
• Other examples?
• Why? Each row describes a student, an offering, a course, and an enrollment. The PK consists of StdSSN (denoting the student) and OfferNo (denoting the offering).
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
S1 Seattle JUN O1 Fall 2006 3.5 C1 DB
S1 Seattle JUN O2 Fall 2006 3.3 C2 VB
S2 Bothell JUN O3 Spring 2007 3.1 C3 OO
S2 Bothell JUN O2 Fall 2006 3.4 C2 VB
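The insertion anomaly can be reproduced in a few lines. The sketch below is an illustration only, not part of the original slides: it assumes Python's built-in sqlite3 module, a hypothetical table name UnivTable, and a made-up new student (S3, Tacoma). Because OfferNo is part of the primary key, a student-only row is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UnivTable (
        StdSSN   TEXT NOT NULL,
        StdCity  TEXT,
        StdClass TEXT,
        OfferNo  TEXT NOT NULL,  -- SQLite needs NOT NULL spelled out for PK columns
        OffTerm  TEXT,
        OffYear  INTEGER,
        EnrGrade REAL,
        CourseNo TEXT,
        CrsDesc  TEXT,
        PRIMARY KEY (StdSSN, OfferNo)
    )
""")

def try_insert_student_only(ssn, city, std_class):
    """Attempt to store a student who has no enrollment yet."""
    try:
        conn.execute(
            "INSERT INTO UnivTable (StdSSN, StdCity, StdClass) VALUES (?, ?, ?)",
            (ssn, city, std_class),
        )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Hypothetical student S3: no OfferNo is available, so the insert fails.
result = try_insert_student_only("S3", "Tacoma", "JUN")
row_count = conn.execute("SELECT COUNT(*) FROM UnivTable").fetchone()[0]
print(result, row_count)
```

The only way to make the insert succeed is to supply an OfferNo, which is exactly the "more column data than desired" described above.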
7.1 - 6
Update Anomaly
Definition:
• To modify a single fact, it may be necessary to change multiple rows.
Example:
• Changing a course description requires changing every enrollment row of that course
– Try changing C2's course description: both C2 rows must be updated.
• Other examples?
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
S1 Seattle JUN O1 Fall 2006 3.5 C1 DB
S1 Seattle JUN O2 Fall 2006 3.3 C2 VB
S2 Bothell JUN O3 Spring 2007 3.1 C3 OO
S2 Bothell JUN O2 Fall 2006 3.4 C2 VB
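A small sketch of the update anomaly using the four sample rows, assuming Python's sqlite3 module and a hypothetical table name UnivTable; the new description "Visual Basic" is made up for illustration. It shows that an update that misses one C2 row leaves two descriptions for the same course, and that the correct update has to touch two rows for one fact.

```python
import sqlite3

ROWS = [
    ("S1", "Seattle", "JUN", "O1", "Fall", 2006, 3.5, "C1", "DB"),
    ("S1", "Seattle", "JUN", "O2", "Fall", 2006, 3.3, "C2", "VB"),
    ("S2", "Bothell", "JUN", "O3", "Spring", 2007, 3.1, "C3", "OO"),
    ("S2", "Bothell", "JUN", "O2", "Fall", 2006, 3.4, "C2", "VB"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UnivTable (StdSSN, StdCity, StdClass, OfferNo, OffTerm, "
    "OffYear, EnrGrade, CourseNo, CrsDesc, PRIMARY KEY (StdSSN, OfferNo))"
)
conn.executemany("INSERT INTO UnivTable VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# A careless update that reaches only one of the C2 rows.
conn.execute(
    "UPDATE UnivTable SET CrsDesc = 'Visual Basic' "
    "WHERE CourseNo = 'C2' AND StdSSN = 'S1'"
)
distinct_descs = conn.execute(
    "SELECT COUNT(DISTINCT CrsDesc) FROM UnivTable WHERE CourseNo = 'C2'"
).fetchone()[0]

# The correct update must change every enrollment row of the course.
cur = conn.execute(
    "UPDATE UnivTable SET CrsDesc = 'Visual Basic' WHERE CourseNo = 'C2'"
)
rows_changed = cur.rowcount
print(distinct_descs, rows_changed)
```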
7.1 - 7
Deletion Anomaly
Definition:
• Deleting a row may inadvertently cause other data to be deleted.
Example:
• Removing the enrollment of student S2 in offering O3 also loses all information about offering O3 and course C3
• Other examples?
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
S1 Seattle JUN O1 Fall 2006 3.5 C1 DB
S1 Seattle JUN O2 Fall 2006 3.3 C2 VB
S2 Bothell JUN O3 Spring 2007 3.1 C3 OO
S2 Bothell JUN O2 Fall 2006 3.4 C2 VB
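The deletion example can be checked the same way. This is a sketch under the same assumptions as before (Python's sqlite3 module, hypothetical table name UnivTable); count_rows is a helper introduced only for this illustration.

```python
import sqlite3

ROWS = [
    ("S1", "Seattle", "JUN", "O1", "Fall", 2006, 3.5, "C1", "DB"),
    ("S1", "Seattle", "JUN", "O2", "Fall", 2006, 3.3, "C2", "VB"),
    ("S2", "Bothell", "JUN", "O3", "Spring", 2007, 3.1, "C3", "OO"),
    ("S2", "Bothell", "JUN", "O2", "Fall", 2006, 3.4, "C2", "VB"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UnivTable (StdSSN, StdCity, StdClass, OfferNo, OffTerm, "
    "OffYear, EnrGrade, CourseNo, CrsDesc, PRIMARY KEY (StdSSN, OfferNo))"
)
conn.executemany("INSERT INTO UnivTable VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

def count_rows(column, value):
    """Count rows whose column equals value (column names are fixed in this script)."""
    query = f"SELECT COUNT(*) FROM UnivTable WHERE {column} = ?"
    return conn.execute(query, (value,)).fetchone()[0]

before = (count_rows("OfferNo", "O3"), count_rows("CourseNo", "C3"))

# Intended change: drop only the enrollment of S2 in O3.
conn.execute("DELETE FROM UnivTable WHERE StdSSN = 'S2' AND OfferNo = 'O3'")

# Side effect: offering O3 and course C3 are no longer recorded anywhere.
after = (count_rows("OfferNo", "O3"), count_rows("CourseNo", "C3"))
s2_rows_left = count_rows("StdSSN", "S2")
print(before, after, s2_rows_left)
```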
7.1 - 8
Example of a Better Table Design (Big University Database: 4 Tables Denoting 4 Objects + FKs)
Table Name: Student
StdSSN StdLastName StdClass
S1 WELLS JUN
S2 NORBERT JUN
S3 KENDALL JUN
Table Name: Offering
OfferNo OffYear CourseNo
O1 MW C1
O2 MW C2
O3 MW C3
Table Name: Course
CourseNo CrsDesc
C1 DB
C2 VB
C3 OO
Table Name: Enrollment
StdSSN OfferNo EnrGrade
S1 O1 3.5
S1 O2 3.3
S2 O3 3.1
S2 O2 3.4
Insert anomaly? Update anomaly? Delete anomaly?
7.1 - 9
Normalization
• A good database design ensures that users can change the contents of a database without unexpected side effects (modification anomalies).
– The remedy is to modify the table design to remove the redundancies that cause the anomalies.
• Normalization
– The process of removing redundancies in a table so that the table is easier to modify.
7.1 - 10
Constraints of Database Content
• Value-based constraints: a comparison of a column to a constant
– Example: Age >= 21
• Value-neutral constraints: a comparison of columns (column to column)
– PK (entity integrity constraint)
— Constraint about the PK column of one or more rows
– FK (referential integrity constraint)
— Constraint about the parent PK and child FK of one or more rows
– Functional dependency
— Constraint about two or more columns of a table
7.1 - 11
Functional Dependency
"X determines Y" is denoted as X → Y
• For each X value, there is at most one Y value
• X: left-hand side (LHS) or determinant
• Y: right-hand side (RHS)
• Like a mathematical function: Y = f(X)
– f: like a table
– X: like the key of a table
– Y: like a column of a table
• Example: StdSSN → StdName
StdSSN → StdClass
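The "at most one Y value for each X value" condition can be written compactly. The notation below is a common textbook formulation rather than something taken from the slides: r stands for the set of rows in the table, and t[X] for the values of row t in the columns X.

```latex
X \rightarrow Y
\quad\Longleftrightarrow\quad
\forall\, t_1, t_2 \in r:\;
t_1[X] = t_2[X] \;\Longrightarrow\; t_1[Y] = t_2[Y]
```

Read this as: whenever two rows agree on X, they must also agree on Y.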
7.1 - 12
Functional Dependency (FD)
• Think about functional dependencies as identifying potential candidate keys
• X → Y denotes an FD between columns X and Y
– If X and Y are placed together in a table without other columns, X is a candidate key.
7.1 - 13
Functional Dependency Diagram and List
Table Scheme
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
Functional Dependency Diagram
[Figure: functional dependency diagram drawn over the table scheme]
List of Functional Dependencies
StdSSN, OfferNo → EnrGrade
StdSSN → StdCity, StdClass
OfferNo → OffTerm, OffYear, CourseNo, CrsDesc
CourseNo → CrsDesc
7.1 - 14
How to Identify Functional Dependencies
• Deriving from uniqueness statement
• Deriving from 1-M relationships
• Considering minimalism of an FD's LHS (determinant)
7.1 - 15
How to Identify Functional Dependencies
• Deriving from uniqueness statement
Example:
– A user may state that each course offering has a unique offering number along with the year and term of the offering: OfferNo → OffYear, OffTerm
7.1 - 16
How to Identify Functional Dependencies
Deriving from 1-M relationships
• For a 1-M relationship, an FD exists in
– the child-to-parent direction (not the parent-to-child direction)
– because each LHS value of an FD can be associated with at most one RHS value.
– Example: A faculty member teaches many offerings, but an offering is taught by one faculty member: OfferNo → FacNo
7.1 - 17
How to Identify Functional Dependencies
Minimalism of an FD's LHS (Determinant)
• The determinant of an FD (the columns appearing on the LHS of the FD)
– Must be minimal (cannot contain extra columns)
• One column vs. a combination of columns
– An FD in which the LHS contains more than one column may represent an M-N relationship.
– Example: OrdNo, ProdNo → OrdQty
Order quantity depends on the combination of order number and product number.
7.1 - 18
Eliminating FDs Using Sample Data
• An FD cannot exist if
– two rows of a table have the same value for the LHS but different values for the RHS of the FD
• An FD cannot be proven to exist only by examining the rows of a table.
• However, you can falsify an FD by examining the contents of a table.
– Use sample data to eliminate potential FDs
7.1 - 19
Eliminating FDs Using Sample Data
Disprove X → Y
• Find two rows that have the same X value but different Y values
• Example:
OfferNo → StdSSN (?)
StdSSN → OffYear (?)
StdSSN StdClass OfferNo OffYear EnrGrade CourseNo CrsDesc
S1 JUN O1 2006 3.5 C1 DB
S1 JUN O2 2006 3.3 C2 VB
S2 JUN O3 2007 3.1 C3 OO
S2 JUN O2 2006 3.4 C2 VB
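The falsification test is mechanical, so it is easy to sketch in code. The function name find_violation and the dictionary layout are assumptions made for this illustration. The function returns a pair of rows with the same LHS value but different RHS values, or None when the sample data contains no counterexample. A result of None does not prove the FD; it only means the sample failed to disprove it.

```python
COLUMNS = ("StdSSN", "StdClass", "OfferNo", "OffYear", "EnrGrade", "CourseNo", "CrsDesc")
SAMPLE = [
    dict(zip(COLUMNS, values))
    for values in [
        ("S1", "JUN", "O1", 2006, 3.5, "C1", "DB"),
        ("S1", "JUN", "O2", 2006, 3.3, "C2", "VB"),
        ("S2", "JUN", "O3", 2007, 3.1, "C3", "OO"),
        ("S2", "JUN", "O2", 2006, 3.4, "C2", "VB"),
    ]
]

def find_violation(rows, lhs, rhs):
    """Return two rows that agree on lhs but differ on rhs, or None."""
    first_seen = {}  # LHS value -> (RHS value, row where it was first seen)
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        if x in first_seen and first_seen[x][0] != y:
            return first_seen[x][1], row
        first_seen.setdefault(x, (y, row))
    return None

# OfferNo -> StdSSN is falsified: offering O2 appears with both S1 and S2.
offer_to_student = find_violation(SAMPLE, ["OfferNo"], ["StdSSN"])

# StdSSN -> OffYear is falsified: student S2 appears with 2007 and 2006.
student_to_year = find_violation(SAMPLE, ["StdSSN"], ["OffYear"])

# StdSSN -> StdClass is not falsified by this sample (which proves nothing).
student_to_class = find_violation(SAMPLE, ["StdSSN"], ["StdClass"])
```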
7.1 - 20
Normal Forms
• Normalization : the process of removing redundancies in a table so that the table is easier to modify
• A normal form is a rule about allowable FDs in tables.
• Each normal form removes certain kinds of redundancies.
• First normal form (1NF) is the starting point.
• Second normal form (2NF) is stronger (stricter) than 1NF.
– Only a subset of the 1NF tables is in 2NF.
• 3NF/BCNF is the most important in practice, because the normal forms above 3NF/BCNF involve other kinds of dependencies that are less common and more difficult to understand.
7.1 - 21
Relationships of Normal Forms
[Figure: nested boxes, listed from outermost to innermost]
1NF ⊃ 2NF ⊃ 3NF/BCNF ⊃ 4NF ⊃ 5NF ⊃ DKNF
7.1 - 22
First Normal Form (1NF)
• 1NF prohibits nesting or repeating data groups in a table
• Starting point of normalization for most relational DBMSs
– Most commercial DBMSs use 1NF tables
• A table not in 1NF is unnormalized or non-normalized.
7.1 - 23
First Normal Form
The table below is not normalized (not in 1NF)
• The table has 2 rows, each containing repeating groups (nested columns).
– The S1 row has 5 nested columns (OfferNo, OffYear, …)
– The S2 row has 5 nested columns (OfferNo, OffYear, …)
StdSSN StdClass OfferNo OffYear EnrGrade CourseNo CrsDesc
S1     JUN      O1      2006    3.5      C1       DB
                O2      2006    3.3      C2       VB
S2     JUN      O3      2007    3.1      C3       OO
                O2      2006    3.4      C2       VB
7.1 - 24
Convert to First Normal Form
• Replace each value of a repeating group with a row
• In each new row, copy the nonrepeating columns
– (S1, JUN) for row two with (O2, 2006, 3.3, C2, VB)
– (S2, JUN) for row four with (O2, 2006, 3.4, C2, VB)
• Redefine PK if necessary
StdSSN StdClass OfferNo OffYear EnrGrade CourseNo CrsDesc
S1 JUN O1 2006 3.5 C1 DB
S1 JUN O2 2006 3.3 C2 VB
S2 JUN O3 2007 3.1 C3 OO
S2 JUN O2 2006 3.4 C2 VB
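The conversion can be sketched as a flattening step. In this illustration the unnormalized table is modeled as a list of dictionaries in which the repeating group is a nested list; the key name "Enrollments" and the function name to_1nf are assumptions made for the sketch, not names from the slides.

```python
unnormalized = [
    {"StdSSN": "S1", "StdClass": "JUN", "Enrollments": [
        {"OfferNo": "O1", "OffYear": 2006, "EnrGrade": 3.5, "CourseNo": "C1", "CrsDesc": "DB"},
        {"OfferNo": "O2", "OffYear": 2006, "EnrGrade": 3.3, "CourseNo": "C2", "CrsDesc": "VB"},
    ]},
    {"StdSSN": "S2", "StdClass": "JUN", "Enrollments": [
        {"OfferNo": "O3", "OffYear": 2007, "EnrGrade": 3.1, "CourseNo": "C3", "CrsDesc": "OO"},
        {"OfferNo": "O2", "OffYear": 2006, "EnrGrade": 3.4, "CourseNo": "C2", "CrsDesc": "VB"},
    ]},
]

def to_1nf(nested_rows, group_key):
    """Emit one flat row per repeating-group value, copying the nonrepeating columns."""
    flat = []
    for row in nested_rows:
        nonrepeating = {k: v for k, v in row.items() if k != group_key}
        for member in row[group_key]:
            flat.append({**nonrepeating, **member})
    return flat

flat_rows = to_1nf(unnormalized, "Enrollments")

# StdSSN alone no longer identifies a row, so the PK is redefined
# as the combination (StdSSN, OfferNo).
pk_values = [(r["StdSSN"], r["OfferNo"]) for r in flat_rows]
```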
7.1 - 25
Second Normal Form (2NF)
• Goal of 2NF and 3NF
– Produce tables in which every key determines the other columns
• The definitions of 2NF and 3NF distinguish between key and nonkey columns.
– A column is a key column if it is a candidate key or part of a candidate key
– A nonkey column is any other column.
7.1 - 26
Second Normal Form (2NF)
• Partial dependency
– A nonkey column depends on a subset of the columns in a candidate key
(a part of a compound key → a nonkey column)
• A table is in 2NF if no partial dependency exists, that is, if
– the key contains only one column, or
– each nonkey column depends on all of the columns in every candidate key, not on a subset of the columns in any candidate key
7.1 - 27
Second Normal Form
• Violation of 2NF (a partial dependency exists)
– A part of a compound key → a nonkey column
– Only compound keys need to be checked
• A key containing only one column cannot violate 2NF (a table with a single-column key cannot violate 2NF)
• Steps for converting to 2NF
1. Analyze the FDs
2. Find the FDs that violate 2NF (FD1 and FD2 on the next slide)
3. Split the original table into smaller tables that satisfy the 2NF definition (move the columns of every violating FD into a new table)
7.1 - 28
Convert to Second Normal Form (Analyze FDs)
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
[Figure: FD diagram over the table scheme, with arrows labeled FD 1, FD 2, FD 3, FD 4]
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
S1 Seattle JUN O1 Fall 2006 3.5 C1 DB
S1 Seattle JUN O2 Fall 2006 3.3 C2 VB
S2 Bothell JUN O3 Spring 2007 3.1 C3 OO
S2 Bothell JUN O2 Fall 2006 3.4 C2 VB
PK = ?
Any partial dependency? FD1, FD2, FD3, FD4?
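Both questions on this slide can be answered mechanically from the FD list. The sketch below is one possible way to do it, not the method prescribed by the slides: it uses attribute closure (a standard technique that the slides do not name) to confirm that StdSSN plus OfferNo determines every column, then flags each listed FD whose LHS is a proper subset of that key. The function names are hypothetical, and only the FDs as listed are checked, not FDs derivable from them.

```python
ALL_COLUMNS = {
    "StdSSN", "StdCity", "StdClass", "OfferNo", "OffTerm",
    "OffYear", "EnrGrade", "CourseNo", "CrsDesc",
}

# Each FD is (LHS columns, RHS columns), taken from the list of FDs for this table.
FDS = [
    ({"StdSSN"}, {"StdCity", "StdClass"}),
    ({"OfferNo"}, {"OffTerm", "OffYear", "CourseNo", "CrsDesc"}),
    ({"StdSSN", "OfferNo"}, {"EnrGrade"}),
    ({"CourseNo"}, {"CrsDesc"}),
]

def closure(attrs, fds):
    """All columns determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Listed FDs whose LHS is a proper subset of the key and whose RHS has nonkey columns."""
    key = set(key)
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]

key = {"StdSSN", "OfferNo"}
is_key = closure(key, FDS) == ALL_COLUMNS
student_only = closure({"StdSSN"}, FDS)
violations = partial_dependencies(key, FDS)
```

With the sample FD list, the two violations are the FDs with StdSSN alone and OfferNo alone on the LHS, which matches the two FDs the conversion steps single out.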
7.1 - 29
Convert to Second Normal Form (Splitting the Original Table)
• Split the original table into smaller tables that satisfy the 2NF definition
– In each smaller table, the entire primary key should determine the nonkey columns
– The original table should be recoverable by using natural join operations on the smaller tables
– The FDs in the original table should be derivable from the FDs in the smaller tables.
• The splitting process uses the project operator of relational algebra
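The recoverability requirement can be checked on the sample rows. The sketch below implements projection and natural join in plain Python over lists of dictionaries; the function and variable names are assumptions made for this illustration. It splits the table along the two partial dependencies and verifies that joining the pieces gives back the original four rows.

```python
COLUMNS = ("StdSSN", "StdCity", "StdClass", "OfferNo", "OffTerm",
           "OffYear", "EnrGrade", "CourseNo", "CrsDesc")
TABLE = [
    dict(zip(COLUMNS, values))
    for values in [
        ("S1", "Seattle", "JUN", "O1", "Fall", 2006, 3.5, "C1", "DB"),
        ("S1", "Seattle", "JUN", "O2", "Fall", 2006, 3.3, "C2", "VB"),
        ("S2", "Bothell", "JUN", "O3", "Spring", 2007, 3.1, "C3", "OO"),
        ("S2", "Bothell", "JUN", "O2", "Fall", 2006, 3.4, "C2", "VB"),
    ]
]

def project(rows, columns):
    """Relational projection: keep the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Combine rows that agree on every column name the two tables share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[c] == right_row[c] for c in shared):
                joined.append({**left_row, **right_row})
    return joined

student_part = project(TABLE, ["StdSSN", "StdCity", "StdClass"])
offering_part = project(TABLE, ["OfferNo", "OffTerm", "OffYear", "CourseNo", "CrsDesc"])
enrollment_part = project(TABLE, ["StdSSN", "OfferNo", "EnrGrade"])

recovered = natural_join(natural_join(enrollment_part, student_part), offering_part)
is_recoverable = len(recovered) == len(TABLE) and all(row in recovered for row in TABLE)
```

Projection removes the duplicated student and offering facts (2 and 3 rows instead of 4), which is where the redundancy disappears.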
7.1 - 30
Convert to Second Normal Form
After splitting, add referential integrity constraints to connect the tables.
UnivTable1 (StdSSN, StdCity, StdClass)
UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc)
UnivTable3 (StdSSN, OfferNo, EnrGrade)
FOREIGN KEY (StdSSN) REFERENCES UnivTable1
FOREIGN KEY (OfferNo) REFERENCES UnivTable2
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc
7.1 - 31
Third Normal Form (3NF)
• A table is in 3NF if
– It is in 2NF (no partial dependency) and
– Each nonkey column depends only on candidate keys, not on other nonkey columns (no transitive dependency)
• Transitive dependency
– A nonkey column depends on other nonkey columns
– If A → B and B → C, then A → C. So A → C is a transitive dependency, and B → C causes a violation of 3NF.
– Example: OfferNo → CourseNo and CourseNo → CrsDesc, so OfferNo → CrsDesc
7.1 - 32
Convert to Third Normal Form
Consider UnivTable2
UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc)
OfferNo → CourseNo
CourseNo → CrsDesc
OfferNo → CrsDesc (transitive)
CourseNo → CrsDesc causes a violation of 3NF in UnivTable2
7.1 - 33
Convert to Third Normal Form
UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc)
is split into
UnivTable2-1 (CourseNo, CrsDesc)
UnivTable2-2 (OfferNo, OffTerm, OffYear, CourseNo)
FOREIGN KEY (CourseNo) REFERENCES UnivTable2-1
Steps for converting to 3NF
1. Find the FDs that violate 3NF
2. Split the original table into smaller tables that satisfy the 3NF definition (move the columns of every violating FD into a new table)
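A sketch of the 3NF split in executable form, assuming Python's sqlite3 module. The table names are written UnivTable2_1 and UnivTable2_2 because a hyphen is not valid in an unquoted SQL identifier, and the offering O4, course C9, and description "Visual Basic" are made up for the illustration. After the split, a course description is one fact stored in one row, and the foreign key keeps offerings tied to existing courses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE UnivTable2_1 (
        CourseNo TEXT PRIMARY KEY,
        CrsDesc  TEXT
    );
    CREATE TABLE UnivTable2_2 (
        OfferNo  TEXT PRIMARY KEY,
        OffTerm  TEXT,
        OffYear  INTEGER,
        CourseNo TEXT REFERENCES UnivTable2_1 (CourseNo)
    );
""")
conn.executemany("INSERT INTO UnivTable2_1 VALUES (?, ?)",
                 [("C1", "DB"), ("C2", "VB"), ("C3", "OO")])
conn.executemany("INSERT INTO UnivTable2_2 VALUES (?, ?, ?, ?)",
                 [("O1", "Fall", 2006, "C1"),
                  ("O2", "Fall", 2006, "C2"),
                  ("O3", "Spring", 2007, "C3")])

# Changing a course description now touches exactly one row.
cur = conn.execute("UPDATE UnivTable2_1 SET CrsDesc = 'Visual Basic' WHERE CourseNo = 'C2'")
rows_changed = cur.rowcount

# The referential integrity constraint rejects an offering of an unknown course.
try:
    conn.execute("INSERT INTO UnivTable2_2 VALUES ('O4', 'Fall', 2007, 'C9')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# A natural join on CourseNo recovers the columns of the original UnivTable2.
desc_for_o2 = conn.execute(
    "SELECT CrsDesc FROM UnivTable2_2 NATURAL JOIN UnivTable2_1 WHERE OfferNo = 'O2'"
).fetchone()[0]
```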
7.1 - 34
Self-Practice Homework
HW: Chapter 7, page 239, Questions 1, 2, 3, 14, 15, 24