lecture 5 - babanski.com · type = pc (product)) = ø . functional dependencies • set of relation...
TRANSCRIPT
Lecture 5
CS4411: Databases II
• Execrcises from Relational Algebra • Functional Dependencies
Agenda
Relational Algebra: Exercises
§ Select 𝜎<𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑠𝑠𝑠𝑐𝑠𝑠𝑠𝑠𝑠> 𝑅 • Project 𝜋<𝑎𝑠𝑠𝑎𝑠𝑎𝑎𝑠𝑠 𝑠𝑠𝑠𝑠> 𝑅 • Rename 𝜌<𝑠𝑠𝑛 𝑠𝑠𝑠𝑠𝑠𝑎> 𝑅 § Union 𝑅 ∪ 𝑆 § Intersection 𝑅 ∩ 𝑆 • Difference 𝑅 − 𝑆 • Cross product 𝑅 × 𝑆 • Join R ⋈<𝑗𝑠𝑠𝑠 𝑠𝑠𝑠𝑐𝑠𝑠𝑠𝑠𝑠> 𝑆 • Natural join 𝑅 ⋈ 𝑆 • Division 𝑅 ÷ 𝑆
Review of Operators
Exercises
Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:
• Union: 𝑅 ∪ 𝑆
If all the tuples of R and S are different, then the union has maximum:
n + m tuples.
The minimum number of tuples that can appear in the result occurs if every tuple of one relation also appears in the other. Then the union has:
max(n, m) tuples.
Exercises
Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:
• Natural join: 𝑅 ⋈ 𝑆
If all the tuples in one relation can pair successfully with all the tuples in the other relation, then the natural join has:
n * m tuples.
The minimum number of tuples that can appear in the result occurs if none of the tuples of one relation can pair successfully with all the tuples in the other relation. Then the natural join has:
zero tuples.
Exercises
Suppose relations R and S have n tuples and m tuples respectively. Give the minimum and maximum number of tuples that the results of the following expressions can have:
• 𝜎𝐶 𝑅 x S
If the condition C brings back all the tuples of R, then the cross product will contain: n * m tuples. If the condition C brings back none of the tuples of R, then the cross product will contain: zero tuples.
Exercises
We define r as the schema of R and s as the schema of S:
(𝑅 ⋈ 𝑆) πr
δ(πr∩s(S)) where δ is the duplicate-elimination operator 𝑅 ⋈
R –(R – πr(𝑅 ⋈ 𝑆))
Exercises
πr(𝑅 ⋈ 𝑆) R –
We define r as the schema of R
Exercises
• No manufacturer of PC's may also make laptops
πmaker(σtype = laptop(Product)) ∩
πmaker(σtype = pc(Product)) = ø
Functional Dependencies
• Set of relation schemas obtained by translating from an E/R diagram might need improvement – modeling with E/R diagrams is an art, not a science; relies
on experience and intuition – multiple alternative designs are possible, how to choose?
• Problems caused by redundant storage of info – wasted space – anomalies when updating, inserting or deleting tuples
Basic idea: replace a relation schema with a collection of "smaller" schemas. This is called decomposition.
Improving Relation Schemas
• There is a theory for systematically guiding the improvement of relational designs, called normalization
• Normalization uses the notion of integrity constraints (ICs) on the information – functional dependencies – multi-valued dependencies – referential integrity constraints
Improving Relation Schemas
A database schema is in First Normal Form if all tables are flat
Name GPA Courses
Alice 3.8
Bob 3.7
Carol 3.9
Math
DB
OS
DB
OS
Math
OS
Student Name GPA
Alice 3.8
Bob 3.7
Carol 3.9
Student
Course
Math
DB
OS Student Course
Alice Math
Carol Math
Alice DB
Bob DB
Alice OS
Carol OS
Takes
Course
May need to add keys
First Normal Form (1NF)
• Based on Functional Dependencies – 2nd Normal Form (obsolete) – 3rd Normal Form – Boyce Codd Normal Form (BCNF)
• Based on Multivalued Dependencies – 4th Normal Form
• Based on Join Dependencies – 5th Normal Form
Discuss next
Normal Forms
• A form of constraint (hence, part of the schema) • Finding them is part of the database design
Functional Dependencies
Functional Dependency:
A1, A2, …, An à B1, B2, …, Bm
Meaning: If two tuples agree on the attributes
then they must also agree on the attributes A1, A2, …, An
B1, B2, …, Bm
We say "functionally determine"
Definition: A1, ..., An à B1, ..., Bm holds in R if: "t, t’ Î R, (t.A1=t’.A1 Ù ... Ù t.An=t’.An Þ t.B1=t’.B1 Ù ... Ù t.Bm=t’.Bm )
A1 ... An B1 ... Bm
if t, t’ agree here
then t, t’ agree here
t
t’
R
Functional Dependencies
• This relation satisfies C→T but does not satisfy S→T
• In general, a schema in which for a given relation R there are two properties A,B such that A→B but A is not a key is problematic!
S C T
Bart Math Mrs. Krabappel
Lisa Math Mrs. Krabappel
Lisa Logic Ms. Hoover
Example
• EmpID à Name, Phone, Position • Position à Phone • but Phone à Position
EmplID Name Phone Position
E0045 Smith 1234 Clerk
E1881 John 9876 SalesRep
E1111 Smith 9876 SalesRep
E9999 Mary 1234 Lawyer
Example
FD’s are constraints: • On some instances they hold • On others they don’t
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Green Toys 99
Does this instance satisfy all the FDs ?
name à color category à department color, category à price
Example
Consider these FDs:
Product(name, category, color, department, price)
No: color, category à price doesn’t hold
FD’s are constraints: • On some instances they hold • On others they don’t
Does this instance satisfy all the FDs ?
name à color category à department color, category à price
Example
Consider these FDs:
Product(name, category, color, department, price)
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Black Toys 99
Gizmo Stationary Green Office-supp. 59
Yes!
If some FDs are satisfied, then others are satisfied too
If all these FDs are true: name à color category à department color, category à price
Then this FD also holds: name, category à price
Why ?? We say that the new FD is implied
Inference
Is equivalent to
Splitting rule and Combing rule
A1 ... An B1 ... Bm
A1, A2, …, An à B1, B2, …, Bm
A1, A2, …, An à B1 A1, A2, …, An à B2 . . . . . A1, A2, …, An à Bm
Inference
Trivial Rule
A1 … Am where i = 1, 2, ..., n
A1, A2, …, An à Ai
Inference
Ai à Ai
Transitive Closure Rule
If A1, A2, …, An à B1, B2, …, Bm
and B1, B2, …, Bm à C1, C2, …, Cp
then A1, A2, …, An à C1, C2, …, Cp
A1 … An B1 … Bm C1 ... Cp
Inference
From: 1. name à color 2. category à department 3. color, category à price
name, category à price To:
Example
Inferred FD Which Rule did we apply ?
4. name, category à name Trivial rule 5. name, category à color Transitivity on 4, 1 6. name, category à category Trivial rule 7. name, category à color, category Split/combine on 5, 6 8. name, category à price Transitivity on 3, 7
These inference rules are called Armstrong’s axioms:
ü Reflexivity: ü if X is a set of attributes of R and YÍX, then X®Y
ü Insertion (Augmentation): • if R satisfies X®Y for two sets of attributes X,Y of R, then
for every set of attributes Z in R it holds that XZ®YZ
ü Transitivity: • if X®Y and Y®Z both hold for a relation R, then X®Z also
holds
Armstrong's Axioms
For a set of dependencies F and another dependency X®Y, we say that X®Y can be deduced from F, denoted F⊢X®Y, if X®Y can be inferred from F using only Armstrong’s axioms
Armstrong's Axioms
The following inference rules follow from Armstrong’s axioms (and can be inferred from them): üUnification: ü If X®Y and X®Z both hold then X®YZ holds üSplit: ü If X®YZ holds then X®Y and X®Z both hold üPseudo-transitivity: ü If X®Y and YW®Z both hold then XW®Z holds
29
• No need for FD’s with > 1 attribute on right. – But sometimes convenient to combine FD’s as
a shorthand. – Example: name -> addr and name -> wife
becomes name -> addr wife • > 1 attribute on left may be essential!
– Example: store candy -> price
FD's with Multiple Attributes
• Given a schema R and a set of functional dependencies F: – A superkey of R is a set of attributes KÍR such that F ⊨ K®R – A key of R is a set of attributes KÍR such:
1. K is a superkey of R 2. No proper subset of K is a superkey of R
• A key is also called a minimal key or an admissible key
– As hinted before, keys are useful in identifying problematic schemas, e.g., the existence of a dependency X®Y in which X is not a key
Keys
K is a superkey for relation R if K functionally determines all of R.
Consumers(name, addr, candiesLiked, manf, favCandy) • {name, candiesLiked} is a superkey because together these
attributes determine all the other attributes. – name -> addr favCandy – candiesLiked -> manf
Example
• {name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey. – name doesn’t -> manf; – candiesLiked doesn’t -> addr.
• There are no other keys, but lots of superkeys. – Any superset of {name, candiesLiked}.
32
• Keys in E/R concern entities. • Keys in relations concern tuples. • Usually, one tuple corresponds to one entity,
so the ideas are the same. • But in poor relational designs, one entity can
become several tuples, so E/R keys and Relational keys are different.
E/R and Relational Keys
Relational key = {name candiesLiked} But in E/R, name is a key for Consumers, and candiesLiked is a key for Candies. Note: 2 tuples for Janeway entity and 2 tuples for Twizzlers entity.
name addr candiesLiked manf favCandy
Jane Voyager Twizzlers Hershey Smarties Jane Voyager Smarties Nestle Smarties Spock Enterprise Twizzlers Hershey Twizzlers
Example Data
Suppose schema was obtained from an E/R diagram. • If relation R came from an entity set E, then key for
R is the keys of E • If R came from a binary relationship from E1 to E2:
– many-many: key for R is the keys of E1 and E2 – many-one: key for R is the keys for E1 (but not the keys
for E2) – one-one: key for R is the keys for E1; another key for R is
the keys for E2
Discovering Keys
Consumers Candies Likes
Likes(consumer, candy) Favorite
Favorite(consumer, candy)
Married
husband wife
Married(husband, wife)
name addr name manf
Buddies
1 2
Buddies(name1, name2)
key: consumer candy
key: name1 name2
key: consumer
keys: husband or wife
Key Example
Example: “no two courses can meet in the same room at the same time” tells us: hour, room -> course Ultimately, FD's and keys come from the semantics of the application!!! • FD's and keys apply to the schema, not to
specific instantiations of the relations
More FD's from Application
• X+ : Closure of an attribute set X – The set of all attributes that are determined by X
• K: a key
– minimum set of attributes that determines all attributes
• F+ : Closure of a dependency set F – The set of all dependencies that are implied from F
• Fmin: a minimum cover of a dependency set F
– a minimum set of FDs that is equivalent to F
Some Important Concepts
• We already know that we have FD's A -> B and B -> C, then it is also true that A -> C. • What about a complete chain of such deductions? • Called dependency set closure
Closure of a Dependency Set
In general, we need to consider: • Dependency set closure (F+) • Attributes set closure (X+) • Attribute closure (A+)
F F+ f1
f2 f3
f Implies
The closures of Dependency set and Attributes sets are used to define criteria for the goodness of a decomposition
the size of F+ could be exponential in the size of F
Sets F1 and F2 of functional dependencies are considered equal if they have the same closure (i.e. F1+ = F2+)
• Given a relation schema R and a set F of FDs – is some FD logically implied by this set F?
• Example – R = {A,B,C,G,H,I} – F = {A ->B, A ->C, CG -> H, CG -> I, B -> H} – would A ->H be logically implied by F? – yes (you can prove this using the definition of FD)
Closure of F: F+ = all FDs logically implied by F • How to compute F+?
– we can use Armstrong's axioms
Closure of a Dependency Set
Given a set F of functional dependencies, any set of functional dependencies that’s equivalent to F is called a cover (basis)
– We limit the possibilities by requiring that each dependency has a single attribute on the right-hand side
– How many bases are there for a relation R with n functional dependencies in F?
mnCover (Basis)
Recall: Sets F1 and F2 of functional dependencies are considered equal if they have the same closure (i.e. F1+ = F2+)
A minimal cover (basis) for a relation R is a basis that satisfies the following conditions: • All functional dependencies in B have singleton right-hand
sides • If any functional dependency is removed from B, the result
is no longer a basis • If any left-hand side attribute is removed from a functional
dependency of B, the result is no longer a basis
Minimal cover/basis (Minimal set of FDs)
• Every set of FDs has an equivalent minimal set • There can be several equivalent minimal sets • There is no simple algorithm for computing a minimal set of FDs that is
equivalent to a set F of FDs
The closure of an attributes set has an important role in the design of relational schemas
– For example, K is a superkey of R iff K+=R – Thus, we need an algorithm to compute it.
Closure of an Attributes Set
A key K for a given relation R is a minimal set of attributes A1, A2, ..., An such that closure K+={A1, A2, ..., An}+ is the set of all attributes of R
Given a set of attributes X={A1, …, An} and a set of FDs F: The closure, X+ is the set of attributes B s.t. {A1, …, An} à B
In English: Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X
The following simple algorithm computes the closure XF
+ for a given dependency set F and attributes set X
AList := X
Repeat
For every Y®ZÎF do
If YÍAList then
AList := AListÈZ
Until no change to AList
Return AList
Closure of an Attributes Set
Compute the closure of X={A,B} for the dependency set F={A®C, BC®A, AC®D, CE®F }
– Initialization: AList={A,B}
– From A®C, we get AList={A,B,C}
– From AC®D, we get AList={A,B,C,D}
– The other dependencies do not add anything
– The closure of an attributes set XF+ = { A,B,C,D }
Execution Example
Suppose R(A,B,C,D,E,F) and the FD's are: F={AB ® C, BC ® AD, D ® E, CF ® B} We wish to test whether AB ® D follows from F We compute {A,B}+ which is {A,B,C,D,E}. Since D is a member of the closure, we conclude that it follows.
Closure of an Attributes Set
How to find if a set of attributes X is a superkey / key? • Compute X+ • If X+ = all attributes, then X is a superkey • Consider minimal superkeys (called keys) • keys are also called Candidate Keys (CK)
Note: there can be exponentially many candidate keys!
Computing Superkeys Keys
Definitions: • Prime Attribute - Attribute A of R that is Member of some Candidate Key X • Non-Prime Attribute - An Attribute that is not a Member of any Candidate Key X
K is a superkey for relation R if K functionally determines all of R.
R = Product(name, price, category, color)
What is the key ? Let's try (name, category) as a key and compute a closure (name, category) + = (name, category, price, color)
Hence (name, category) is a key
Computing Keys
name, category à price category à color
= the set R of all attributes Ø this is a superkey Ø this is a key, since neither name
nor category alone is a superkey
When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Schema refinement means removing the data anomalies.
Data Anomalies
1. Functional Dependency a) Amstrong’s axioms b) Attribute closure (A+) c) Dependency closure (F+) d) Minimum cover (Fmin)
2. Normal Forms a) BCNF b) 3NF
3. Decomposition a) Lossless join b) Dependency preserving
Conceptual design
Schemas Integrity Constraints
Normalization Review
2NF
3NF
BCNF (Boyce Codd)
1NF Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
Lossless Decomposition but not Dependency Preserving
Lossless Decomposition and Dependency Preserving
“Lousy Tables”
Remove Composite/ Multi-Value Attributes
“Wonderful Tables”
Normalization Review
R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Decompositions in General
• Can we get the data back correctly ? – Lossless decomposition – Lossy decomposition
• Can we recover the FD’s on the ‘big’ table from the FD’s on the small tables ? – Dependency-preserving decomposition
Problems with Decomposition
• Correct decomposition:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless Decomposition
• Incorrect decomposition: Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Price Category
19.99 Gadget
24.99 Camera
19.99 Camera
What’s wrong ??
Lossy Decomposition
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
OneClick 19.99 Camera
Gizmo 24.99 Camera
Gizmo 19.99 Camera
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
Theorem If A1, ..., An à B1, ..., Bm Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Example: name à price, hence the first decomposition is lossless
Note: don’t need necessarily A1, ..., An à C1, ..., Cp
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Correct Decomposition
A decomposition is lossless if we can recover:
Decompose
Recover
R'(A1, ..., An, B1, ..., Bm, C1, ..., Cp) == R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R' is in general larger than R. Must ensure R' = R
When decomposing a relation R, we want the decomposition to
– minimize redundancy – avoid loss of information – preserve functional dependencies (i.e., constraints) – ensure good query performance
These objectives can be conflicting! • Boyce-Codd normal form achieves some of these
Decomposition Goals
• Decomposition into Boyce Codd Normal Form (BCNF) – Losselss
• Decomposition into 3rd Normal Form (3NF)
– Lossless – Dependency preserving
Normal Forms
3NF
BCNF
4NF
In practice: - Aim for BNCF - Settle for 3NF (it is enough) - don’t overdo it!
A simple condition for removing anomalies from relations:
Equivalently: for any set of attributes X = { A1, ..., An } , either X+ = X or X+ = all attributes in R
A relation R is in BCNF if and only if:
If there is a non-trivial dependency A1, ..., An à B in R, then {A1, ..., An} is a superkey for R
Boyce-Codd Normal Form
In English:
Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R.
Consider the relation: X→A “X →A” Þ The 2nd tuple also has y2 in the third column ( redundancy ! ) Such a situation cannot arise in a BCNF relation: BCNF Þ X must be a key Þ we must have X→Y
Þ we must have “y1 = y2” (1) X→A Þ The two tuples have the same value for A (2) (1) & (2) Þ The two tuples are identical
Þ This situation cannot happen in a relation
X Y A x y1 y2 x y2 ?
Should be y2
Redundancy! Not in BCNF
BCNF is Desirable
Input: relation R + FDs for R Output: decomposition of R into BCNF relations with “lossless join”
Compute keys for R Repeat until all relations are in BCNF (no BCNF violations): Pick any R’ with A ® B that violates BCNF Decompose R’ into R1(A, B) and R2(A, rest) Compute FDs for R1 and R2 Compute keys for R1 and R2
BCNF Decomposition Algorithm
A’s rest B’s
R1 R2
Note: need to compute the FDs on R1, R2
Decompose:
A relation is in BCNF Þ every entry records a piece of information that cannot
be inferred (using only FDs) from the other entries in the relation instance
Þ No redundant information !
A relation R(ABC) • B→C: The value of B determines C, and the value of C
can be inferred from another tuple with the same B value Þ redundancy ! (not BCNF)
• A→BC: Although the value of A determines the values of B and C, we cannot infer their values from other tuples because no two tuples in R have the same value for A Þ no redundancy ! (BCNF)
BCNF is Desirable
FD: SSN à Name, City Key: {SSN, PhoneNumber} Is it in BCNF?
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
SSN+ ={SSN, Name, City} but no PhoneNumber
BCNF Example
NOT IN BNCF!!
Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield
SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234
SSN à Name, City
Let’s check anomalies: • Redundancy ? • Update ? • Delete ?
BCNF Example
IN BNCF!!
This FD is now good because it is the key
R(A,B,C,D,E) {A}+ = {A,B,C,D} ≠ {A,B,C,D,E}
R1(A,B,C,D) {C}+ = {C,D} ≠ {A,B,C,D}
R2(A,E) R11(C,D) R12(A,B,C)
{A} à {B,C} {C} à {D}
BCNF decomposition Example
Movie Bookings § Title name of movie § Theater, name of theaters showing the movie § City
FDs (not booking a movie into two theaters in the same city) : § Theater à city § Title,city à theater
Keys? Check for closure § {title, city}+ = R § {theatre, title}+ = R § {theatre,city}+ doesn't include title
Theaterà city violates BCNF
Theatre City title
BCNF: Dependency Preservation
Why city or theater are not keys?
Lets decomposed the table based on that violation
{theater, city} and {theater, title} This decomposition cannot handle Title,city à theater
Theatre city
guild Menlo Park
Theatre title
guild Antz
Theatre City title
guild M P Antz
Park M P Antz
Theatre City title
guild M P Antz
Theatre city
guild Menlo Park
Park Menlo Park
Theatre title
guild Antz
Park Antz
BCNF: Dependency Preservation
Tradeoff between BNCF and Dependency preservation
There is a BCNF violation, and we decompose.
No FDs
In BCNF we lost the FD: Company, Product à Unit
BCNF and Dependencies
R = (Unit, Company, Product) Unit à Company Company, Product à Unit
R1 = (Unit, Company) Unit à Company
R2 = (Unit, Product)
A simple condition for removing anomalies from relations:
Solution: 3rd Normal Form (3NF)
A relation R is in 3NF if and only if:
If there is a non-trivial dependency A1, ..., An à B in R, then {A1, ..., An} is a superkey for R,
or B is part of a key
This comes from BCNF
Informally: everything depends on the key or is in the key
The algorithm is complicated 1. Get a “minimal cover” of FDs 2. Find a lossless-join decomposition of R
(which might miss dependencies) 3. Add additional relations to the
decomposition to cover any missing FDs of the cover
• Result will be lossless, will be dependency-preserving 3NF; might not be BCNF
Decomposition into 3NF
• 3NF decomposition v.s. BCNF decomposition: – Use same decomposition steps, for a while – 3NF may stop decomposing, while BCNF continues
• Tradeoffs
– BCNF = no anomalies, but may lose some FDs – 3NF = keeps all FDs, but may have some anomalies
3NF and BCNF
• Consider the TEACH Relation: • TEACH(STUDENT, COURSE, INSTRUCTOR) • in 3NF but not in BCNF with
– FD1: {STUDENT, COURSE} à INSTRUCTOR – FD2: INSTRUCTOR à COURSE
• 3 Possible BNCF Decompositions of TEACH: – R1(STUDENT, INSTRUCTOR), R2(STUDENT, COURSE) – R1(COURSE, INSTRUCTOR), R2(COURSE, STUDENT) – R1(INSTRUCTOR, COURSE), R2 (INSTRUCTOR, STUDENT)
• All Three decompositions "lose" FD1! • 3NF is best since after Join, Recaptures FD1 and doesn’t
generate any spurious Tuples!
3NF and BCNF
Relational schemas may include additional dependency types
• Multivalued dependencies – Here, there may be multiple different values of Y for the same
values of X, yet the values of X fix the set of values for Y • Inclusion dependencies
– This types of dependency relates between the values of attributes in two relations in the schema
– E.g., in the train operation pS_Name (Serves) Í pS_Name (Station)
In this course, we focus on design considerations that follow from functional dependencies only
Additional Dependency Types
The multi-valued dependencies are: StudentID ->-> Phone Number StudentID ->-> Course
Multi-valued Dependencies (MVD)
Student ID Phone Number Course
981949 (519) 222-3344 CS4411
981949 (226) 231-1111 CS4411
981949 (519) 222-3344 CS3336
981949 (226) 231-1111 CS3336
• BCNF eliminates redundancy in each tuple but may leave redundancy among tuples in a relationship
• This typically happens if two many-many relationships (or in general: a combination of two types of facts) are represented in one relation
• Every street address is given 3 times and every title is repeated twice • This table does not violate BCNF but has redundancy among tuples.
MVD and Attribute Independence
Definition of Multi-valued Dependencies
Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An ->-> B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp
MVD and Attribute Independence
Exercises
Exercises
R(A,B,C,D,E,F) A → B B → A {B,C} → D C → E
B and C in this relation is a composite key that together determine D. Recall that a Candidate Key (CK) is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)
A candidate key is a minimal super key. Minimal means that you can’t remove any attribute from the key and still determine all other attributes.
The non-primary attributes (NP) are simply all attributes that are not part of any CK.
Exercises
R(A,B,C,D,E,F) A → B B → A {B,C} → D C → E
B and C in this relation is a composite key that together determine D. Is the relation in 2NF? Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In any of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF). Is the relation in 3NF? Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime (resulting in a transitive dependency because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus in 2NF). Is the relation in BCNF? Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given and check that each key (left hand side of the arrow) is in your list of CKs
violates 2NF
Exercises
R(A,B,C,D,E) A → {B,C} {B,C} → A,D D → E
3NF say that a non-prime must not determine another non-prime. Here is a problem: D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!
Exercises
R(A,B,C,D) with FDs: AB->C, C->D, and D->A Indicate all BNCF violations CàA, CàD, DàA, ABàD, ABà C, ACàD, BCàA, BCàD, BDàA, BDàC, CDàA, ABCàD, ABDàC, and BCDàA
keys are AB, BC, and BD Decompose into: (ACD), (BC) Then into: (AD), (CD), (BC)
A C
D
Thus, any dependency above that does not have one of these pairs on the left is a BCNF violation
B
Exercises
R(A,B,C,D) with FDs: AB->C, BC->D, and CD->A, AD->B Indicate all BNCF violations
ABàC, BCàD, CDàA, ADàB, ABàD, ADàC, BCàA, CDàB, ABCàD, ABDàC, ACDàB and BCDàA
keys are AB, AD, BC, CD In BNCF form!
C
D
A B