Decomposition Schema normalization pages.cs.wisc.edu/~paris/cs564-s18/lectures/lecture-07.pdf ·...
TRANSCRIPT
WHAT IS THIS LECTURE ABOUT?
• Bad schemas lead to redundancy
• To “correct” bad schemas: decompose relations
  – lossless-join
  – dependency preserving
• Desired normal forms
  – BCNF
  – 3NF
CS564 [Spring 2018] - Paris Koutris
DB DESIGN THEORY
• Helps us identify the “bad” schemas and improve them
  1. express constraints on the data: functional dependencies (FDs)
  2. use the FDs to decompose the relations
• The process, called normalization, obtains a schema in a “normal form” that guarantees certain properties
  – examples of normal forms: BCNF, 3NF, …
WHAT IS A DECOMPOSITION?
We decompose a relation R(A1, …, An) by creating
• R1(B1, …, Bm)
• R2(C1, …, Cl)
• where $\{B_1, \dots, B_m\} \cup \{C_1, \dots, C_l\} = \{A_1, \dots, A_n\}$

• The instance of R1 is the projection of R onto B1, …, Bm
• The instance of R2 is the projection of R onto C1, …, Cl
EXAMPLE: DECOMPOSITION

Original relation:

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

Projection onto (SSN, name, age):

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

Projection onto (SSN, phoneNumber):

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221
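
The two projections in this example can be reproduced with a few lines of Python. This is a minimal sketch, not code from the lecture: the relation is modeled as a list of dicts, and `project` is a hypothetical helper that keeps the listed attributes and drops duplicate tuples.

```python
# The original relation from the example, one dict per tuple.
person = [
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "608-374-8422"},
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "603-534-8399"},
    {"SSN": "123123645", "name": "John", "age": 30, "phoneNumber": "608-321-1163"},
    {"SSN": "384475687", "name": "Arun", "age": 20, "phoneNumber": "206-473-8221"},
]

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (set semantics)."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

r1 = project(person, ["SSN", "name", "age"])   # 3 tuples: Paris appears once
r2 = project(person, ["SSN", "phoneNumber"])   # 4 tuples: one per phone number
```

The duplicate elimination is what removes the redundancy: Paris's name and age are stored once in `r1` instead of once per phone number.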
DECOMPOSITION DESIDERATA
What should a good decomposition achieve?
1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance
EXAMPLE: INFORMATION LOSS

name   age  phoneNumber
Paris  24   608-374-8422
John   24   608-321-1163
Arun   20   206-473-8221

Decompose into: R1(name, age) and R2(age, phoneNumber)

R1:

name   age
Paris  24
John   24
Arun   20

R2:

age  phoneNumber
24   608-374-8422
24   608-321-1163
20   206-473-8221

We can’t figure out which phoneNumber corresponds to which person!
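
To see the information loss concretely, one can join the two projections back together and compare with the original. The sketch below assumes relations are lists of dicts; `project` and `natural_join` are illustrative helpers, not part of the lecture.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]

def natural_join(left, right):
    """Join two non-empty relations on every attribute name they share."""
    common = set(left[0]) & set(right[0])
    return [{**x, **y} for x in left for y in right
            if all(x[a] == y[a] for a in common)]

r = [
    {"name": "Paris", "age": 24, "phoneNumber": "608-374-8422"},
    {"name": "John", "age": 24, "phoneNumber": "608-321-1163"},
    {"name": "Arun", "age": 20, "phoneNumber": "206-473-8221"},
]
r1 = project(r, ["name", "age"])
r2 = project(r, ["age", "phoneNumber"])
recovered = natural_join(r1, r2)

# Tuples in the join that were never in the original relation.
spurious = [t for t in recovered if t not in r]
```

The join returns five tuples rather than three: Paris and John share age 24, so each is paired with both phone numbers. The "loss" shows up as extra tuples, not missing ones.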
LOSSLESS-JOIN DECOMPOSITION
R(A, B, C)
  ↓ decompose (projection)
R1(A, B), R2(B, C)
  ↓ recover (natural join)
R’(A, B, C)

A schema decomposition is lossless-join if for any initial instance R, R = R’.
A natural join is a join on the same attribute names.
A LOSSLESS-JOIN CRITERION
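Starting with:
• a relation R(A) + a set F of FDs
• a decomposition of R into R1(A1) and R2(A2)

we say that a decomposition is lossless-join if and only if at least one of the following FDs is in F+ (the closure of F):
1. $\mathbf{A}_1 \cap \mathbf{A}_2 \to \mathbf{A}_1$
2. $\mathbf{A}_1 \cap \mathbf{A}_2 \to \mathbf{A}_2$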
EXAMPLE
• relation R(A, B, C, D)
• FD $A \to B, C$

Lossless-join
• decomposition into R1(A, B, C) and R2(A, D)
• $\{A, B, C\} \cap \{A, D\} = \{A\}$
• For R1 we have indeed $A \to B, C$

Not lossless-join
• decomposition into R1(A, B, C) and R2(D)
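
The criterion can be tested mechanically with an attribute-closure computation: the FD from the shared attributes to A1 is in F+ exactly when A1 is contained in the closure of the shared attributes. The sketch below assumes FDs are written as `(lhs, rhs)` pairs of Python sets; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(attrs1, attrs2, fds):
    """Binary lossless-join test: the shared attributes determine A1 or A2."""
    shared_plus = closure(attrs1 & attrs2, fds)
    return attrs1 <= shared_plus or attrs2 <= shared_plus

fds = [({"A"}, {"B", "C"})]
good = is_lossless({"A", "B", "C"}, {"A", "D"}, fds)   # shared = {A}
bad = is_lossless({"A", "B", "C"}, {"D"}, fds)         # shared = {}
```

In the second decomposition the two relations share no attribute, so the closure of the empty set determines neither side and the test fails.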
DEPENDENCY PRESERVING
Given R and a set of FDs F, we decompose R into R1 and R2. Suppose:
– R1 has a set of FDs F1
– R2 has a set of FDs F2
– F1 and F2 are computed from F

A decomposition is dependency preserving if by enforcing F1 over R1 and F2 over R2, we can enforce F over R.
GOOD EXAMPLE
Person(SSN, name, age, canDrink)
• $\text{SSN} \to \text{name}, \text{age}$
• $\text{age} \to \text{canDrink}$

decomposes into
• R1(SSN, name, age)
  – $\text{SSN} \to \text{name}, \text{age}$
• R2(age, canDrink)
  – $\text{age} \to \text{canDrink}$
BAD EXAMPLE
R(A, B, C)
• $A \to B$
• $B, C \to A$

Decomposes into:
• R1(A, B)
  – $A \to B$
• R2(A, C)
  – no FDs here!!
R1:

A   B
a1  b
a2  b

R2:

A   C
a1  c
a2  c

Recover (natural join of R1 and R2):

A   B  C
a1  b  c
a2  b  c

The recovered table violates $B, C \to A$.
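
The violation can be confirmed on the instance itself. This sketch assumes tables are lists of dicts; `satisfies` is a hypothetical helper that checks whether an instance obeys one FD.

```python
def satisfies(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

r1 = [{"A": "a1", "B": "b"}, {"A": "a2", "B": "b"}]
r2 = [{"A": "a1", "C": "c"}, {"A": "a2", "C": "c"}]

# Natural join of R1 and R2 on their shared attribute A.
recovered = [{**x, **y} for x in r1 for y in r2 if x["A"] == y["A"]]

local_ok = satisfies(r1, ["A"], ["B"])                  # R1 obeys its own FD
global_ok = satisfies(recovered, ["B", "C"], ["A"])     # but the join breaks B,C -> A
```

Both R1 and R2 are legal on their own, yet the join contains two rows with the same (B, C) and different A. Enforcing the local FDs was not enough to enforce F.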
NORMAL FORMS
A normal form represents a “good” schema design:
• 1NF (flat tables / atomic values)
• 2NF
• 3NF
• BCNF
• 4NF
• …
(more restrictive going down the list)
BOYCE-CODD NORMAL FORM (BCNF)
A relation R is in BCNF if whenever $X \to B$ is a non-trivial FD, then X is a superkey in R.

Equivalent definition: for every attribute set X
• either $X^+ = X$
• or $X^+ = \text{all attributes}$
BCNF EXAMPLE 1

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

$\text{SSN} \to \text{name}, \text{age}$

• key = {SSN, phoneNumber}
• $\text{SSN} \to \text{name}, \text{age}$ is a “bad” FD
• The above relation is not in BCNF!
BCNF EXAMPLE 2

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

$\text{SSN} \to \text{name}, \text{age}$

• key = {SSN}
• The above relation is in BCNF!
BCNF EXAMPLE 3

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221

• key = {SSN, phoneNumber}
• The above relation is in BCNF!
• Is it possible that a binary relation is not in BCNF?
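
The equivalent definition (every attribute set X has X+ = X or X+ = all attributes) lends itself to a brute-force check of the three examples. The sketch below is illustrative only: FDs are `(lhs, rhs)` pairs of sets, and the enumeration is exponential in the number of attributes, so it suits small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Attribute sets X whose closure is neither X nor all attributes."""
    attrs = set(attrs)
    bad = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            if x_plus != x and x_plus != attrs:
                bad.append(x)
    return bad

fds = [({"SSN"}, {"name", "age"})]
example1 = bcnf_violations({"SSN", "name", "age", "phoneNumber"}, fds)
example2 = bcnf_violations({"SSN", "name", "age"}, fds)
example3 = bcnf_violations({"SSN", "phoneNumber"}, [])  # no FDs hold here
```

Only the first relation yields violations, and {SSN} is among them: its closure adds name and age but never reaches phoneNumber.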
BCNF DECOMPOSITION
• Find an FD that violates the BCNF condition: $A_1, A_2, \dots, A_n \to B_1, B_2, \dots, B_m$
• Decompose R into R1 and R2:
  – R1 = the A’s and the B’s
  – R2 = the A’s and the remaining attributes
• Continue until no BCNF violations are left
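
The procedure can be sketched recursively. The version below makes two assumptions that the slide does not spell out: it searches for a violating left-hand side by brute force, and it takes the B’s to be everything the A’s determine (the closure), a common variant. FDs are `(lhs, rhs)` pairs of sets and all names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some X violating BCNF, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split rel until no relation has a BCNF violation."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus              # the A's together with the B's
    r2 = x | (rel - x_plus)  # the A's together with the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

fds = [({"SSN"}, {"name", "age"})]
result = bcnf_decompose({"SSN", "name", "age", "phoneNumber"}, fds)
```

Restricting the closure to the attributes of the current relation is what lets the recursion reuse the original FDs instead of computing projected FD sets explicitly.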
EXAMPLE

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

• The FD $\text{SSN} \to \text{name}, \text{age}$ violates BCNF
• Split into two relations R1, R2 as follows:
  – R1(SSN, name, age)
  – R2(SSN, phoneNumber)
EXAMPLE CONT’D
R1(SSN, name, age), with FD $\text{SSN} \to \text{name}, \text{age}$:

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

R2(SSN, phoneNumber):

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221
BCNF DECOMPOSITION PROPERTIES
The BCNF decomposition:
– removes certain types of redundancy
– is lossless-join
– is not always dependency preserving
BCNF IS LOSSLESS-JOIN
Example: R(A, B, C) with $A \to B$ decomposes into R1(A, B) and R2(A, C)
• The BCNF decomposition always satisfies the lossless-join criterion: the shared attributes (here A) functionally determine all of R1!
BCNF IS NOT DEPENDENCY PRESERVING
R(A, B, C)
• $A \to B$
• $B, C \to A$

The BCNF decomposition is:
• R1(A, B) with FD $A \to B$
• R2(A, C) with no FDs

There may not exist any BCNF decomposition that is FD preserving!
BCNF EXAMPLE (1)
Books(author, gender, booktitle, genre, price)
• $\text{author} \to \text{gender}$
• $\text{booktitle} \to \text{genre}, \text{price}$

What is the candidate key?
• (author, booktitle) is the only one!

Is it in BCNF?
• No, because the left-hand side of both (non-trivial) FDs is not a superkey!
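
The claim that (author, booktitle) is the only candidate key can be verified by enumerating attribute sets in order of size and keeping the minimal ones whose closure covers the whole relation. A brute-force sketch with illustrative names, with FDs as `(lhs, rhs)` pairs of sets:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure contains every attribute."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if any(key <= x for key in keys):
                continue  # contains a smaller key, so x is not minimal
            if closure(x, fds) >= attrs:
                keys.append(x)
    return keys

book_fds = [({"author"}, {"gender"}), ({"booktitle"}, {"genre", "price"})]
keys = candidate_keys({"author", "gender", "booktitle", "genre", "price"}, book_fds)
```

Neither author nor booktitle appears on the right-hand side of any FD, so every key must contain both. That is why the search finds exactly one.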
BCNF EXAMPLE (2)
Books(author, gender, booktitle, genre, price)
• $\text{author} \to \text{gender}$
• $\text{booktitle} \to \text{genre}, \text{price}$

Splitting Books using the FD $\text{author} \to \text{gender}$:
• Author(author, gender)
  – FD: $\text{author} \to \text{gender}$
  – in BCNF!
• Books2(author, booktitle, genre, price)
  – FD: $\text{booktitle} \to \text{genre}, \text{price}$
  – not in BCNF!
BCNF EXAMPLE (3)
Books(author, gender, booktitle, genre, price)
• $\text{author} \to \text{gender}$
• $\text{booktitle} \to \text{genre}, \text{price}$

Splitting Books using the FD $\text{author} \to \text{gender}$:
• Author(author, gender)
  – FD: $\text{author} \to \text{gender}$
  – in BCNF!
• Splitting Books2(author, booktitle, genre, price):
  – BookInfo(booktitle, genre, price), FD: $\text{booktitle} \to \text{genre}, \text{price}$, in BCNF!
  – BookAuthor(author, booktitle), in BCNF!
3NF DEFINITION
A relation R is in 3NF if whenever $X \to A$, one of the following is true:
• $A \in X$ (trivial FD)
• X is a superkey
• A is part of some key of R (prime attribute)

BCNF implies 3NF!!
3NF CONT’D
• Example: R(A, B, C) with $A, B \to C$ and $C \to A$
  – is in 3NF. Why?
  – is not in BCNF. Why?
• Compromise used when BCNF not achievable: aim for BCNF and settle for 3NF
• Lossless-join and dependency preserving decomposition into a collection of 3NF relations is always possible!
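
The two “Why?” questions about R(A, B, C) can be worked through in code. This sketch assumes that checking the listed FDs is enough for a relation considered together with its own FD set; FDs are `(lhs, rhs)` pairs of sets and every function name is illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure contains every attribute."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= attrs:
                keys.append(x)
    return keys

def is_bcnf(attrs, fds):
    """Every non-trivial FD must have a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) >= attrs for lhs, rhs in fds)

def is_3nf(attrs, fds):
    """Like BCNF, but a non-superkey FD is allowed if it only adds prime attributes."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) >= attrs or (rhs - lhs) <= prime
               for lhs, rhs in fds)

attrs = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]
keys = candidate_keys(attrs, fds)   # {A, B} and {B, C}
in_3nf = is_3nf(attrs, fds)
in_bcnf = is_bcnf(attrs, fds)
```

The FD C → A has a non-superkey on the left, which rules out BCNF, but A belongs to the key {A, B}, so the third 3NF condition applies.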
3NF ALGORITHM
1. Apply the algorithm for BCNF decomposition until all relations are in 3NF (we can stop earlier than BCNF)
2. Compute a minimal basis F’ of F
3. For each non-preserved FD $X \to A$ in F’, add a new relation R(X, A)
3NF EXAMPLE (1)
Start with relation R(A, B, C, D) with FDs:
• $A \to D$
• $A, B \to C$
• $A, D \to C$
• $B \to C$
• $D \to A, B$

Step 1: find a BCNF decomposition
• R1(B, C)
• R2(A, B, D)
3NF EXAMPLE (2)
Start with relation R(A, B, C, D) with FDs:
• $A \to D$
• $A, B \to C$
• $A, D \to C$
• $B \to C$
• $D \to A, B$

Step 2: compute a minimal basis of the original set of FDs:
• $A \to D$
• $B \to C$
• $D \to A$
• $D \to B$
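
One common way to compute a minimal basis, sketched here under the assumption that the usual three-step procedure is intended: split right-hand sides into single attributes, drop left-hand-side attributes that are not needed, then drop FDs implied by the rest. Names and data layout are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def as_pairs(basis):
    """Convert (frozenset lhs, single attribute) items to (set, set) pairs."""
    return [(set(lhs), {a}) for lhs, a in basis]

def minimal_basis(fds):
    # 1. One attribute per right-hand side.
    basis = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    # 2. Drop left-hand-side attributes that the remaining ones make unnecessary.
    reduced = []
    for lhs, a in basis:
        x = set(lhs)
        for b in sorted(lhs):
            if len(x) > 1 and a in closure(x - {b}, as_pairs(basis)):
                x.discard(b)
        reduced.append((frozenset(x), a))
    reduced = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    # 3. Drop FDs that follow from the others.
    result = list(reduced)
    for fd in reduced:
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], as_pairs(rest)):
            result = rest
    return result

fds = [({"A"}, {"D"}), ({"A", "B"}, {"C"}), ({"A", "D"}, {"C"}),
       ({"B"}, {"C"}), ({"D"}, {"A", "B"})]
basis = minimal_basis(fds)
```

On this input, A, B → C shrinks to B → C, and A, D → C shrinks to D → C, which is then dropped because D → B and B → C already imply it. A minimal basis is not unique in general, so a different processing order can give a different but equivalent result.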
3NF EXAMPLE (3)
Start with relation R(A, B, C, D) with FDs:
• $A \to D$
• $A, B \to C$
• $A, D \to C$
• $B \to C$
• $D \to A, B$

Step 3: add a new relation for any FD in the basis that is not satisfied:
• all the dependencies in F’ are satisfied!
• the resulting decomposition R1, R2 is also BCNF!
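
Whether an FD survives a decomposition can be tested without materializing the projected FD sets: grow the left-hand side using only what each relation can see, and check whether the right-hand side is reached. This is a standard textbook test, offered as a sketch with illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(lhs, rhs, relations, fds):
    """True if lhs -> rhs follows from the FDs projected onto the relations."""
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            # What this relation alone can derive from the attributes it shares.
            gained = closure(reached & rel, fds) & rel
            if not gained <= reached:
                reached |= gained
                changed = True
    return set(rhs) <= reached

fds = [({"A"}, {"D"}), ({"A", "B"}, {"C"}), ({"A", "D"}, {"C"}),
       ({"B"}, {"C"}), ({"D"}, {"A", "B"})]
decomposition = [{"B", "C"}, {"A", "B", "D"}]
basis = [({"A"}, {"D"}), ({"B"}, {"C"}), ({"D"}, {"A"}), ({"D"}, {"B"})]
all_preserved = all(is_preserved(l, r, decomposition, fds) for l, r in basis)

# The earlier bad example: R(A, B, C) split into R1(A, B) and R2(A, C).
bad_fds = [({"A"}, {"B"}), ({"B", "C"}, {"A"})]
lost = not is_preserved({"B", "C"}, {"A"}, [{"A", "B"}, {"A", "C"}], bad_fds)
```

For R1(B, C) and R2(A, B, D) every FD in the minimal basis is preserved, so Step 3 adds nothing. In the R(A, B, C) example, B, C → A is lost because no single relation contains both B and C.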
IS NORMALIZATION ALWAYS GOOD?
• Example: suppose A and B are always used together, but normalization says they should be in different tables
  – decomposition might produce unacceptable performance loss
• Example: data warehouses
  – huge historical DBs, rarely updated after creation
  – joins expensive or impractical