Functional Dependencies and Normalization
Instructor: Mohamed Eltabakh
What to Cover
Functional Dependencies (FDs)
Closure of Functional Dependencies
Lossy & Lossless Decomposition
Normalization
Decomposing Relations

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName

Decomposition 1 (lossless)
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM

Decomposition 2 (lossy)
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM
Lossless vs. Lossy Decomposition
Assume R is divided into R1 and R2
Lossless Decomposition: R1 natural join R2 should create exactly R
Lossy Decomposition: R1 natural join R2 does not reproduce R exactly (typically it adds spurious records that were not in R)
Lossless Decomposition

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName

Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM

Student and Professor are a lossless decomposition of StudentProf (Student ⋈ Professor = StudentProf)
Lossy Decomposition

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName

Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM

Student and Professor are a lossy decomposition of StudentProf (Student ⋈ Professor != StudentProf)
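
The two decompositions of StudentProf can be compared by actually joining the pieces. The following is a minimal sketch, assuming relations are held as lists of Python dicts; the helper names `natural_join`, `project`, and `same_relation` are hypothetical and not part of the slides.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts (column -> value)."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            # Tuples combine only when they agree on every shared column.
            if all(t1[c] == t2[c] for c in common):
                joined.append({**t1, **t2})
    return joined


def project(rel, cols):
    """Projection onto cols with duplicate elimination."""
    seen = {tuple(t[c] for c in cols) for t in rel}
    return [dict(zip(cols, vals)) for vals in seen]


def same_relation(r1, r2):
    """Compare two relations as sets of tuples (row order is irrelevant)."""
    def as_set(rel):
        return {frozenset(t.items()) for t in rel}
    return as_set(r1) == as_set(r2)


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

professor = project(student_prof, ["pNumber", "pName"])
student_a = project(student_prof, ["sNumber", "sName", "pNumber"])  # keeps pNumber
student_b = project(student_prof, ["sNumber", "sName", "pName"])    # keeps pName

lossless = natural_join(student_a, professor)
lossy = natural_join(student_b, professor)

print(same_relation(lossless, student_prof))  # True
print(len(lossy))                             # 4
```

Joining on pName pairs every student with every professor named MM, so the lossy join returns four records instead of the original two.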
Goal: Ensure Lossless Decomposition
How to ensure lossless decomposition?
Answer: The common columns must be a candidate key in one of the two relations
Back to our example
StudentProf (sNumber, sName, pNumber, pName), FDs: pNumber → pName
Lossless: Student (sNumber, sName, pNumber) and Professor (pNumber, pName). The common column pNumber is a candidate key of Professor
Lossy: Student (sNumber, sName, pName) and Professor (pNumber, pName). The common column pName is not a candidate key of either relation
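
The rule above can be checked mechanically with attribute closure: if the common columns determine every column of one of the two relations, they are a key there. This is a minimal sketch; `closure` and `is_lossless_split` are hypothetical helper names, and FDs are assumed to be given as (left side, right side) pairs of attribute sets.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_split(r1, r2, fds):
    """True if the common columns of r1 and r2 are a superkey of r1 or of r2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


fds = [({"pNumber"}, {"pName"})]
professor = {"pNumber", "pName"}

print(is_lossless_split({"sNumber", "sName", "pNumber"}, professor, fds))  # True
print(is_lossless_split({"sNumber", "sName", "pName"}, professor, fds))    # False
```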
What to Cover
Functional Dependencies (FDs)
Closure of Functional Dependencies
Lossy & Lossless Decomposition
Normalization
Normalization
First Normal Form (1NF)
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Canonical Cover of FDs
Normalization
Set of rules to avoid “bad” schema design
Decide whether a particular relation R is in “good” form
If not, decompose R to be in a “good” form
Several levels of normalization: First Normal Form (1NF), BCNF, Third Normal Form (3NF), Fourth Normal Form (4NF)
If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized
First Normal Form (1NF)
An attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes)
Examples of non-atomic domains are multi-valued and composite attributes
A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic
We assume all relations are in 1NF
Boyce-Codd Normal Form (BCNF): Definition
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for every functional dependency in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Remember: Candidate keys are also superkeys
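
The definition can be turned into a small check. This is a sketch under stated assumptions: it tests only the FDs listed in F rather than all of F+ (a standard shortcut when testing R itself), and the names `closure` and `is_bcnf` are hypothetical. The sample relation uses the attributes sNumber, sName, pNumber, pName with the single FD pNumber → pName.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(rel, fds):
    """True if every listed FD is trivial or has a superkey on its left side."""
    rel = set(rel)
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial FD
        if not closure(lhs, fds) >= rel:
            return False  # left side is not a superkey
    return True


fds = [({"pNumber"}, {"pName"})]
student = {"sNumber", "sName", "pNumber", "pName"}

print(is_bcnf(student, fds))               # False
print(is_bcnf({"pNumber", "pName"}, fds))  # True
```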
BCNF: Example

Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
s3       Mike   p1       MM
(sNumber and sName are student info; pNumber and pName are professor info)

Is relation Student in BCNF given pNumber → pName? NO
It is not a trivial FD
pNumber is not a key in the Student relation
How to fix it and make it BCNF?
Decomposing a Schema into BCNF
If R is not in BCNF because of a non-trivial dependency α → β, then decompose R
R is decomposed into two relations:
R1 = (α ∪ β) -- α is a superkey in R1
R2 = (R - (β - α)) -- R2.α is a foreign key to R1.α
Example of BCNF Decomposition

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName

Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM

FOREIGN KEY: Student (pNumber) references Professor (pNumber)
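
The split rule R1 = (α ∪ β), R2 = (R - (β - α)) is plain set arithmetic. A minimal sketch, with `bcnf_split` as a hypothetical helper name, applied to the StudentProf schema:

```python
def bcnf_split(rel, alpha, beta):
    """Split rel on the violating FD alpha -> beta, following the slide's rule."""
    rel, alpha, beta = set(rel), set(alpha), set(beta)
    r1 = alpha | beta           # alpha is a superkey of r1
    r2 = rel - (beta - alpha)   # r2 keeps alpha as a foreign key into r1
    return r1, r2


student_prof = {"sNumber", "sName", "pNumber", "pName"}
professor, student = bcnf_split(student_prof, {"pNumber"}, {"pName"})

print(sorted(professor))  # ['pName', 'pNumber']
print(sorted(student))    # ['pNumber', 'sName', 'sNumber']
```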
What is Nice about this Decomposition?
R is decomposed into two relations:
R1 = (α ∪ β) -- α is a superkey in R1
R2 = (R - (β - α)) -- R2.α is a foreign key to R1.α
This decomposition is lossless (because R1 and R2 can be joined based on α, and α is unique in R1)
When you join R1 and R2 on α, you get R back without loss of information
StudentProf = Student ⋈ Professor

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName

Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM

The BCNF decomposition rule creates a lossless decomposition
Multi-Step Decomposition
Relation R and functional dependencies F
R = (customer_name, loan_number, branch_name, branch_city, assets, amount)
F = {branch_name → assets branch_city, loan_number → amount branch_name}
Is R in BCNF? NO
Based on branch_name → assets branch_city:
R1 = (branch_name, assets, branch_city)
R2 = (customer_name, loan_number, branch_name, amount)
Are R1 and R2 in BCNF? R2 is not
Divide R2 based on loan_number → amount branch_name:
R3 = (loan_number, amount, branch_name)
R4 = (customer_name, loan_number)
Final schema has R1, R3, R4
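
The repeated splitting can be sketched as a recursive procedure. This is an illustrative sketch, not the slides' own algorithm: the helper names are hypothetical, and violations inside a sub-relation are found by brute force over attribute subsets, which is only practical for small schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (alpha, beta) violating BCNF inside rel, or None if rel is in BCNF."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            alpha = set(combo)
            # Attributes of rel that alpha determines, excluding alpha itself.
            beta = (closure(alpha, fds) & rel) - alpha
            if beta and (alpha | beta) != rel:
                return alpha, beta  # non-trivial FD whose left side is not a superkey
    return None


def bcnf_decompose(rel, fds):
    """Split rel on violating FDs until every piece is in BCNF."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [set(rel)]
    alpha, beta = violation
    r1 = alpha | beta
    r2 = set(rel) - beta  # beta already excludes alpha
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


R = {"customer_name", "loan_number", "branch_name", "branch_city", "assets", "amount"}
F = [
    ({"branch_name"}, {"assets", "branch_city"}),
    ({"loan_number"}, {"amount", "branch_name"}),
]

for piece in bcnf_decompose(R, F):
    print(sorted(piece))
```

On this input the sketch produces the same three relations as the slide: R1, R3, and R4.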
What is NOT Nice about BCNF
Before decomposition, we had a set of functional dependencies (say F)
After decomposition, do we still have the same set of FDs, or did we lose something?
Dependency Preservation: after the decomposition, all FDs in F+ should be preserved
BCNF does not guarantee dependency preservation
Can we always find a decomposition that is both BCNF and dependency preserving?
No, such a decomposition may not exist
That is why we study a weaker normal form called third normal form (3NF)
Dependency Preserving
Assume R is decomposed to R1 and R2
Dependencies of R1 and R2 include:
Local dependencies α → β: all columns of α and β must be in a single relation
Global dependencies: use the transitivity property to form more FDs across the R1 and R2 relations
Do these dependencies match the ones in R?
Yes: dependency preserving
No: not dependency preserving
Example of Lost FD
Assume relation R(C, S, J, D, T, Q, V)
C is key, JT → C and SD → T
C → CSJDTQV (C is key) -- good for BCNF
JT → CSJDTQV (JT is key) -- good for BCNF
SD → T (SD is not a key) -- bad for BCNF
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)
Does C → CSJDTQV still exist?
Yes: C → CSJDQV (local), SD → T (local), C → CSJDQVT (global)
Example of Lost FD (Cont'd)
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)
Does SD → T still exist?
Yes: SD → T (local)
Example of Lost FD (Cont'd)
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T) (lossless and in BCNF)
Does JT → CSJDTQV still exist?
No, this one is lost (there is no way to get it from the local FDs)
Dependency Preservation Test
Assume R is decomposed into R1 and R2
The closure of FDs in R is F+
The FDs in R1 and R2 are FR1 (local dependencies in R1) and FR2 (local dependencies in R2), respectively
Then dependencies are preserved if: F+ = (FR1 ∪ FR2)+
Back to Our Example
Assume relation R(C, S, J, D, T, Q, V)
C is key, JT → C and SD → T
C → CSJDTQV (C is key) -- good for BCNF
JT → CSJDTQV (JT is key) -- good for BCNF
SD → T (SD is not a key) -- bad for BCNF
Decomposition: R1(C, S, J, D, Q, V) and R2(S, D, T)
F+ = {C → CSJDTQV, JT → CSJDTQV, SD → T}
FR1 = {C → CSJDQV} (local for R1)
FR2 = {SD → T} (local for R2)
FR1 ∪ FR2 = {C → CSJDQV, SD → T}
(FR1 ∪ FR2)+ = {C → CSJDQV, SD → T, C → T}
JT → C is still missing
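
Comparing F+ with (FR1 ∪ FR2)+ directly is expensive, so the sketch below swaps in the standard textbook shortcut instead: for one FD α → β, grow α using only what each decomposed relation can see, and check whether β is reached. The names `closure` and `is_preserved` are hypothetical, and attributes are assumed to be single characters so that a string such as "JT" stands for the set {J, T}.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if fd follows from the FDs that are local to the decomposed relations."""
    lhs, rhs = set(fd[0]), set(fd[1])
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            rel = set(rel)
            # What this relation alone lets us derive from the attributes found so far.
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [("C", "CSJDTQV"), ("JT", "C"), ("SD", "T")]
decomposition = ["CSJDQV", "SDT"]

print(is_preserved(("SD", "T"), decomposition, F))  # True
print(is_preserved(("C", "T"), decomposition, F))   # True
print(is_preserved(("JT", "C"), decomposition, F))  # False
```

C → T is recovered globally (C → SD inside R1, then SD → T inside R2), while JT → C cannot be rebuilt because J and T never appear together in one relation.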
Dependency Preservation
BCNF does not necessarily preserve FDs, but a 3NF decomposition that preserves all FDs always exists.
Normalization
First Normal Form (1NF)
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Canonical Cover of FDs
Third Normal Form: Motivation
There are some situations where BCNF is not dependency preserving
Solution: Define a weaker normal form, called Third Normal Form (3NF)
Allows some redundancy (we will see examples later)
But all FDs are preserved
There is always a lossless, dependency-preserving decomposition in 3NF
Normal Form: 3NF
Relation R is in 3NF if, for every FD α → β in F+, where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute in β - α is part of a candidate key (prime attribute)
In short: the L.H.S. is a superkey OR the R.H.S. consists of prime attributes
Testing for 3NF
Use attribute closure to check, for each dependency α → β, whether α is a superkey
If α is not a superkey, we have to verify that each attribute in (β - α) is contained in a candidate key of R
3NF: Example
Lot (ID, county, lotNum, area, price, taxRate)
Primary key: ID
Candidate key: <county, lotNum>
FDs: county → taxRate, area → price
Is relation Lot in 3NF? NO
Decomposition based on county → taxRate:
Lot (ID, county, lotNum, area, price)
County (county, taxRate)
Are relations Lot and County in 3NF? Lot is not
3NF: Example (Cont'd)
Lot (ID, county, lotNum, area, price)
County (county, taxRate)
Candidate key for Lot: <county, lotNum>
FDs: county → taxRate, area → price
Decompose Lot based on area → price:
Lot (ID, county, lotNum, area)
County (county, taxRate)
Area (area, price)
Is every relation in 3NF? YES
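
The 3NF test can be sketched on the Lot example. Several things here are assumptions rather than content from the slides: the helper names are hypothetical, the two keys are encoded as FDs (ID determines every other column, and <county, lotNum> determines ID), candidate keys are found by brute force, and the FDs of the final Lot relation are projected by hand.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(rel, fds):
    """All minimal attribute sets whose closure covers rel (brute force)."""
    rel = set(rel)
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= rel:
                keys.append(cand)
    return keys


def is_3nf(rel, fds):
    """Check each listed FD: trivial, superkey on the left, or prime on the right."""
    rel = set(rel)
    prime = set().union(*candidate_keys(rel, fds))
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs or closure(lhs, fds) >= rel or (rhs - lhs) <= prime:
            continue
        return False
    return True


lot = {"ID", "county", "lotNum", "area", "price", "taxRate"}
lot_fds = [
    ({"ID"}, {"county", "lotNum", "area", "price", "taxRate"}),  # primary key (assumed encoding)
    ({"county", "lotNum"}, {"ID"}),                              # candidate key (assumed encoding)
    ({"county"}, {"taxRate"}),
    ({"area"}, {"price"}),
]

# Final Lot relation with hand-projected FDs (an assumption).
lot_final = {"ID", "county", "lotNum", "area"}
lot_final_fds = [({"ID"}, {"county", "lotNum", "area"}), ({"county", "lotNum"}, {"ID"})]

print(is_3nf(lot, lot_fds))                                       # False
print(is_3nf(lot_final, lot_final_fds))                           # True
print(is_3nf({"county", "taxRate"}, [({"county"}, {"taxRate"})])) # True
print(is_3nf({"area", "price"}, [({"area"}, {"price"})]))         # True
```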
Comparison between 3NF & BCNF ?
If R is in BCNF, obviously R is in 3NF
If R is in 3NF, R may not be in BCNF
3NF allows some redundancy and is weaker than BCNF
3NF is a compromise to use when BCNF with good constraint enforcement is not achievable
Important: A lossless, dependency-preserving decomposition of R into a collection of 3NF relations is always possible!
Normalization
First Normal Form (1NF)
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Canonical Cover of FDs
Canonical Cover of FDs
Canonical Cover (Minimal Cover) = G
It is a minimal set of FDs that produces the same F+
There are no extra attributes in the L.H.S. or R.H.S. of any dependency in G
Given a set of FDs (F) with functional closure F+
The canonical cover of F is a minimal set of FDs (G), where G+ = F+
Every FD in the canonical cover is needed, otherwise some dependencies are lost
Example: Canonical Cover
Given F: A → B, ABCD → E, EF → GH, ACDF → EG
Then the canonical cover G: A → B, ACD → E, EF → GH
G is a minimal set of FDs that can generate F+
Computing the Canonical Cover
Given a set of functional dependencies F, how do we compute the canonical cover G?
Example: Canonical Cover (Let's Check L.H.S.)
Given F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}
Union step: {A → B, ABCD → E, EF → GH, ACDF → EG}
Test ABCD → E
Check A: {BCD}+ = {B, C, D}, so A cannot be deleted
Check B: {ACD}+ = {A, B, C, D, E}, so B can be deleted
Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}
Test ACD → E
Check C: {AD}+ = {A, B, D}, so C cannot be deleted
Check D: {AC}+ = {A, B, C}, so D cannot be deleted
Example: Canonical Cover (Let's Check L.H.S., Cont'd)
Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}
Test EF → GH
Check E: {F}+ = {F}, so E cannot be deleted
Check F: {E}+ = {E}, so F cannot be deleted
Test ACDF → EG
None of the L.H.S. attributes can be deleted
Example: Canonical Cover (Let's Check R.H.S.)
Now the set is: {A → B, ACD → E, EF → GH, ACDF → EG}
Test EF → GH
Check G: {EF}+ = {E, F, H}, so G cannot be deleted
Check H: {EF}+ = {E, F, G}, so H cannot be deleted
Test ACDF → EG
Check E: {ACDF}+ = {A, B, C, D, E, F, G, H}, so E can be deleted
Now the set is: {A → B, ACD → E, EF → GH, ACDF → G}
Example: Canonical Cover (Let's Check R.H.S., Cont'd)
Now the set is: {A → B, ACD → E, EF → GH, ACDF → G}
Test ACDF → G
Check G: {ACDF}+ = {A, B, C, D, E, F, G, H}, so G can be deleted
Now the set is: {A → B, ACD → E, EF → GH}
The canonical cover is: {A → B, ACD → E, EF → GH}
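
The walkthrough above can be automated. The sketch below uses a common variant of the procedure rather than the slides' exact order of steps: it first splits right-hand sides into single attributes, then removes extraneous left-hand attributes, then drops redundant FDs, and finally merges FDs with the same left side. The function names are hypothetical, and attributes are assumed to be single characters.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Return {left side: right side} for a minimal cover of fds."""
    # 1. Split every right-hand side into single attributes.
    work = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in work:
                work.append(fd)

    # 2. Drop left-hand attributes that the remaining ones make unnecessary.
    for i in range(len(work)):
        lhs, rhs = work[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))  # remove duplicates, keep order

    # 3. Drop FDs that already follow from the others.
    for fd in list(work):
        rest = [other for other in work if other != fd]
        if fd[1] <= closure(fd[0], rest):
            work = rest

    # 4. Merge FDs that share a left-hand side (the union step).
    merged = {}
    for lhs, rhs in work:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return merged


F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
cover = canonical_cover(F)

for lhs, rhs in cover.items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For this F the sketch arrives at the same cover as the slides: A → B, ACD → E, EF → GH.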
Canonical Cover
Used to find the smallest (minimal) set of FDs that have the same closure as the original set.
Used in the decomposition of relations to be in 3NF
The resulting decomposition is lossless and dependency preserving
Done with Normalization
First Normal Form (1NF)
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Canonical Cover of FDs
What You Learned
Data Models: Entity-Relationship Model & ERD, Relational Model
Conversion between the data models
Relational Algebra & Operators
Structured Query Language (SQL): DML (Data Manipulation Language), DDL (Data Definition Language)
What You Learned (Cont’d)
Advanced SQL: Triggers, Views, Cursors, Stored Procedures and Functions, PL/SQL
Functional Dependencies
Normalization Rules