cs 347: parallel and distributed data management notes02: distributed db design
DESCRIPTION
CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design. Hector Garcia-Molina. Chapter 5 Ozsu & Valduriez. Distributed DB Design. Top-down approach: - have DB… - how to split and allocate the sites - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/1.jpg)
CS 347 Notes 02 1
CS 347: Parallel and Distributed
Data ManagementNotes02: Distributed DB
Design
Hector Garcia-Molina
![Page 2: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/2.jpg)
CS 347 Notes 02 2
Distributed DB DesignTop-down approach: - have DB…
- how to split and allocate the sites
Multi-DBs (or bottom-up): no design issues!
Chapter 5Ozsu & Valduriez
![Page 3: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/3.jpg)
CS 347 Notes 02 3
Two issues in DDB design:
• Fragmentation• Allocation
Note: issues not independent, but will cover separately
![Page 4: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/4.jpg)
CS 347 Notes 02 4
Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *
from E from Ewhere loc=Sa where loc=Sband… and ...
![Page 5: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/5.jpg)
CS 347 Notes 02 5
Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *
from E from Ewhere loc=Sa where loc=Sband… and ...
Motivation: Two sites: Sa, Sb Qa QbSa Sb
![Page 6: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/6.jpg)
CS 347 Notes 02 6
• It does not take a rocket scientist to figure out fragmentation...
![Page 7: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/7.jpg)
CS 347 Notes 02 7
# NM Loc Sal E 5
78
Sa 10Sally Sb 25Tom Sa 15
Joe
# NM Loc Sal # NM Loc Sal58
Sa 10Tom Sa 15Joe 7 Sb 25Sally
..
..
....
F
At Sa At Sb
![Page 8: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/8.jpg)
CS 347 Notes 02 8
F = { F1, F2 }
F1 = loc=Sa E F2 = loc=Sb E
![Page 9: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/9.jpg)
CS 347 Notes 02 9
F = { F1, F2 }
F1 = loc=Sa E F2 = loc=Sb E
called primary horizontal fragmentation
![Page 10: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/10.jpg)
CS 347 Notes 02 10
Fragmentation• Horizontal Primary
depends on local attributesR Derived
depends on foreign relation
• VerticalR
![Page 11: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/11.jpg)
CS 347 Notes 02 11
Fragmentation• Horizontal Primary
depends on local attributesR Derived
depends on foreign relation
• VerticalR Fragmentation also called
Sharding
![Page 12: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/12.jpg)
CS 347 Notes 02 12
Three common horizontal partitioning techniques• Round robin• Hash partitioning• Range partitioning
![Page 13: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/13.jpg)
CS 347 Notes 02 13
• Round robinRD0 D1 D2t1 t1t2 t2t3 t3t4 t4... t5• Evenly distributes data• Good for scanning full relation• Not good for point or range queries
![Page 14: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/14.jpg)
CS 347 Notes 02 14
• Hash partitioningRD0 D1 D2t1h(k1)=2 t1t2h(k2)=0 t2t3h(k3)=0 t3t4h(k4)=1 t4...• Good for point queries on key; also for joins• Not good for range queries; point queries not on
key• If hash function good, even distribution
![Page 15: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/15.jpg)
CS 347 Notes 02 15
• Range partitioningRD0 D1 D2t1: A=5 t1t2: A=8 t2t3: A=2 t3t4: A=3 t4...• Good for some range queries on A• Need to select good vector: else unbalance
data skew execution skew
4 7
partitioningvector
V0 V1
![Page 16: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/16.jpg)
CS 347 Notes 02 16
Which are good fragmentations?Example:F = { F1, F2 }
F1 = sal<10 E F2 = sal>20 E
![Page 17: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/17.jpg)
CS 347 Notes 02 17
Which are good fragmentations?Example:F = { F1, F2 }
F1 = sal<10 E F2 = sal>20 E
Problem: Some tuples lost!
![Page 18: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/18.jpg)
CS 347 Notes 02 18
Which are good fragmentations?Second example:F = { F3, F4 }
F3 = sal<10 E F4 = sal>5 E
![Page 19: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/19.jpg)
CS 347 Notes 02 19
Which are good fragmentations?Second example:F = { F3, F4 }
F3 = sal<10 E F4 = sal>5 E
Tuples with 5 < sal < 10 are duplicated...
![Page 20: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/20.jpg)
CS 347 Notes 02 20
Prefer to deal with replication explicitly
Example: F = { F5, F6, F7 }
F5 = sal 5 E F6 = 5< sal <10
EF7 = sal 10 E
Then replicate F6 if convenient (part of allocation problem)
![Page 21: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/21.jpg)
CS 347 Notes 02 21
Desired properties for horizontal fragmentationR F ={ F1, F2, … }
(1) Completenesst R, Fi F such that t
Fi
![Page 22: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/22.jpg)
CS 347 Notes 02 22
(2) Disjointness t Fi, Fj such that tFj, i j, Fi, Fj F
(3) Reconstruction - ignore
![Page 23: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/23.jpg)
CS 347 Notes 02 23
How do we get completeness and disjointness?(1) Check it “manually”!
e.g., F1 = sal<10 E ; F2 = sal10 E
![Page 24: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/24.jpg)
CS 347 Notes 02 24
How do we get completeness and disjointness?(2) “Automatically” generate fragments
with these properties
Desired simple predicates Fragments
![Page 25: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/25.jpg)
CS 347 Notes 02 25
Example of generation• Say queries use predicates:
A<10, A>5, Loc = SA, Loc = SB
• Next: - generate “minterm” predicates
- eliminate useless ones
![Page 26: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/26.jpg)
CS 347 Notes 02 26
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB(8) A<10 ¬(A>5) ¬(Loc=SA)
¬(Loc=SB)
![Page 27: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/27.jpg)
CS 347 Notes 02 27
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB(8) A<10 ¬(A>5) ¬(Loc=SA)
¬(Loc=SB)
![Page 28: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/28.jpg)
CS 347 Notes 02 28
Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB(8) A<10 ¬(A>5) ¬(Loc=SA)
¬(Loc=SB)A 5
5 < A < 10
![Page 29: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/29.jpg)
CS 347 Notes 02 29
Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
![Page 30: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/30.jpg)
CS 347 Notes 02 30
Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)
A 10
![Page 31: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/31.jpg)
CS 347 Notes 02 31
Final fragments:
F2: 5 < A < 10 Loc=SA F3: 5 < A < 10 Loc=SB F6: A 5 Loc=SA F7: A 5 Loc=SB F10: A 10 Loc=SA F11: A 10 Loc=SB
![Page 32: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/32.jpg)
CS 347 Notes 02 32
Note: elimination of useless fragments depends on application semantics:
e.g.: if LOC could be SA, SB, we need to add fragments
F4: 5 <A <10 Loc SA Loc SB
F8: A 5 Loc SA Loc SB F12: A 10 Loc SA Loc SB
![Page 33: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/33.jpg)
CS 347 Notes 02 33
Why does this work?Predicates: p1 p2 p3 p4
p1 p2 p3 ¬ p4
¬ p1 ¬ p2 ¬ p3 ¬ p4
...
![Page 34: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/34.jpg)
CS 347 Notes 02 34
(1) Completeness: Take t Rpi(t) must be T or F!
Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F
Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4
![Page 35: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/35.jpg)
CS 347 Notes 02 35
(2) DisjointnessSay t Fragment p1 p2 ¬ p3 ¬ p4Then:
p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F
t cannot be in any other fragment!
![Page 36: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/36.jpg)
CS 347 Notes 02 36
Summary• Given simple predicates Pr= { p1, p2,..
pm }minterm predicates are
M={m | m = pk*, 1 k m }
where pk* is pk or is ¬ pk
pkPr
• Fragments m R for all m M are complete and disjoint
![Page 37: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/37.jpg)
Another Desired Fragmentation Property:Match Access Patterns
CS 347 Notes 02 37
data Adata B
data C
frequentlyaccessedtogether
try toplace insame
fragment
![Page 38: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/38.jpg)
CS 347 Notes 02 38
Return to example:E(#, NM, LOC, SAL,…)Common queries:
Qa: select * Qb: select * from E from E where LOC=Sa where
LOC=Sb and … and ...
![Page 39: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/39.jpg)
CS 347 Notes 02 39
Three choices:
(1) Pr = { } F1 ={ E }(2) Pr = {LOC=Sa, LOC=Sb}
F2={ loc=Sa E, loc=Sb E }(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}
F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E }
![Page 40: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/40.jpg)
CS 347 Notes 02 40
In other words:Loc=Sa sal < 10
Loc=Sa sal 10
Loc=Sb sal < 10
Loc=Sb sal 10
F1 F3F2
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
![Page 41: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/41.jpg)
CS 347 Notes 02 41
In other words:Loc=Sa sal < 10
Loc=Sa sal 10
Loc=Sb sal < 10
Loc=Sb sal 10
F1 F3F2
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
F2 is good…(not F1 , F3 )
![Page 42: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/42.jpg)
CS 347 Notes 02 42
Derived horizontal fragmentation
Example: E(#, NM, SAL, LOC) F={ E1, E2} by LOC J(#, DES,…)
Common query for project: [Given employee name,
list projects (s)he works in]
![Page 43: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/43.jpg)
CS 347 Notes 02 43
E1
(at Sa) (at Sb)
E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…
# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…
# Description5 work on 347 hw7 go to moon5 build table12 rest…
J
![Page 44: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/44.jpg)
CS 347 Notes 02 44
E1
(at Sa) (at Sb)
E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…
# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…
J1 J2
J1 = J E1 J2 = J E2
# Des5 work on 347 hw5 build table…
# Des7 go to moon12 rest…
![Page 45: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/45.jpg)
CS 347 Notes 02 45
Derived horizontal fragmentationR, F = { F1, F2, ... Fn}
S, D = {D1, D2, …Dn} where Di =S
Fi
Convention: R is owner S is member
F could beprimary or derived
![Page 46: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/46.jpg)
CS 347 Notes 02 46
• Checking completeness and disjointness of derived fragmentation
But no #= 33 in E1 nor in E2!
# Des…33 build chair…
Example: Say J is:
This J tuple will not be in J1 nor J2Fragmentation not complete
![Page 47: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/47.jpg)
CS 347 Notes 02 47
Need to enforcereferential integrity constraint:
join attr(#) of member relation
joint attr(#) of owner
relation
To get completeness
![Page 48: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/48.jpg)
CS 347 Notes 02 48
# NM Loc Sal5 Joe Sa 10…
# NM Loc Sal5 Fred Sb 20…
Example:
E1 E2
# Description5 day off…
# Description5 day off…
# Description5 day off…
J1
J
J2
Fragmentationis not
disjoint!
![Page 49: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/49.jpg)
CS 347 Notes 02 49
Join attribute(#) should bekey of owner relationTo get
disjointness
![Page 50: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/50.jpg)
CS 347 Notes 02 50
Summary: horizontal fragmentation• Type: primary, derived• Properties: completeness,
disjointness
![Page 51: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/51.jpg)
CS 347 Notes 02 51
Vertical fragmentation
E1
# NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15…
# NM Loc5 Joe Sa7 Sally Sb8 Fred Sa…
# Sal5 107 258 15…
E
E2
Example:
![Page 52: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/52.jpg)
CS 347 Notes 02 52
R[T] R1[T1] Ti T
Rn[Tn]
Just like normalization of relations
...
![Page 53: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/53.jpg)
CS 347 Notes 02 53
Properties: R[T] Ri[Ti]
(1) Completeness
U Ti = Tall i
![Page 54: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/54.jpg)
CS 347 Notes 02 54
(2) DisjointnessTi Tj = for all i,j ij
E(#,LOC,SAL)E1(#,LOC)
E2(SAL)
![Page 55: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/55.jpg)
CS 347 Notes 02 55
(2) DisjointnessTi Tj = for all i,j ij
E(#,LOC,SAL)E1(#,LOC)
E2(SAL)
Not a desirable property!!(could not reconstruct R!)
![Page 56: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/56.jpg)
CS 347 Notes 02 56
(3) Lossless join
Ri = Rall i
One way to achieve lossless join: Repeat key in all fragments, i.e.,
Key Ti for all i
![Page 57: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/57.jpg)
CS 347 Notes 02 57
How do we decide what attributes are grouped with which?
E1(#,NM,LOC)E2(#,SAL)
Example:E(#,NM,LOC,SAL) E1(#,NM)
E2(#,LOC)E3(#,SAL)?
![Page 58: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/58.jpg)
CS 347 Notes 02 58
A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -
Attribute affinity matrix
![Page 59: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/59.jpg)
CS 347 Notes 02 59
A1 A2 A3 A4 A5A1 - - - - -A2 50 - - - -A3 45 48 - - -A4 1 2 0 - -A5 0 0 4 75 -
Attribute affinity matrix
R1[K,A1,A2,A3] R2[K,A4,A5]
![Page 60: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/60.jpg)
CS 347 Notes 02 60
• Textbook (Ozsu & Valduriez) discusses– How to build affinity matrix– How to identify attribute clusters– How to partition relation
• You are not responsible for– Clustering and partitioning algorithms (i.e., Skip pages 135-145)
![Page 61: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/61.jpg)
CS 347 Notes 02 61
AllocationExample: E(#,NM,LOC,SAL)
F1 = loc=Sa E ; F2 = loc=Sb E Qa: select … where loc=Sa...Qb: select … where loc=Sb…
Site a Site b
Where do F1,F2 go?
?
![Page 62: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/62.jpg)
CS 347 Notes 02 62
Issues• Where do queries originate• What is communication cost?
and size of answers, relations,…
• What is storage capacity, cost at sites? and size of fragments?
• What is processing power at sites?
![Page 63: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/63.jpg)
CS 347 Notes 02 63
• What is query processing strategy?
– How are joins done?– Where are answers collected?
More Issues
![Page 64: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/64.jpg)
CS 347 Notes 02 64
• Cost of updating copies?• Writes and concurrency control?• ...
Do we replicate fragments?
![Page 65: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/65.jpg)
CS 347 Notes 02 65
Optimization problem:• What is best placement of fragments
and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...
• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...
![Page 66: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/66.jpg)
CS 347 Notes 02 66
Optimization problem:• What is best placement of fragments
and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...
• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...
This is an incredibly hard problem
![Page 67: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/67.jpg)
CS 347 Notes 02 67
Example: Single fragment F
Read cost: [ti MIN Cij]
i: Originating site of requestti: Read traffic at SiCij: Retrieval cost
Accessing fragment F at Sj from Si
ji=1
m
![Page 68: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/68.jpg)
CS 347 Notes 02 68
Scenario - Read cost1 2
.
. .
.
.
.3
i
C=inf
ci,3
ci,1 ci,2
Stream of readrequests for F ti REQ/SECC=inf
C=infF
F F
![Page 69: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/69.jpg)
CS 347 Notes 02 69
Write cost Xj ui C’ij
i: Originating site of requestj: Site being updatedXj: 0 if F not stored at Sj
1 if F stored at Sjui: Write traffic at Si C’ij: Write cost
Updating F at Sj from Si
i=1 j=1
m m
![Page 70: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/70.jpg)
CS 347 Notes 02 70
Scenario - write cost
Updates ui updates/sec. .
. .
. .i
F
F F
![Page 71: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/71.jpg)
CS 347 Notes 02 71
Storage cost:
Xi di
Xi: 0 if F not stored at Si1 if F stored at Si
di: storage cost at Si
i=1
m
![Page 72: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/72.jpg)
CS 347 Notes 02 72
Target function:
min [ti MIN Cij + Xj ui C’ij ]
+ Xi di
ji=1 j=1
i=1
m m
m
![Page 73: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/73.jpg)
CS 347 Notes 02 73
Can add more complications:
Examples: - Multiple fragments- Fragment sizes- Concurrency control cost
![Page 74: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/74.jpg)
Case Study: PNUTS• Where in the World is My Data?
Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011
• Distributed object/tuple store for Yahoo!
CS 347 Notes 02 74
![Page 75: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/75.jpg)
Case Study: PNUTS• Issue: Where to locate data• Issue: What and where to replicate
CS 347 Notes 02 75
![Page 76: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/76.jpg)
PNUTS Discussion
• Dynamic vs Static fragment placement
• Caching vs Replication
CS 347 Notes 02 76
![Page 77: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/77.jpg)
Policy Constraints• MIN_COPIES: The minimum number of
full replicas of the record that must exist.
• INCL_LIST: An inclusion list -- the locations where a full replica of the record must exist.
• EXCL_LIST: An exclusion list -- the locations where a full replica of the record cannot exist.
CS 347 Notes 02 77
![Page 78: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/78.jpg)
Example Rule
• Rule 1: • IF TABLE_NAME = "Users“
THENSET 'MIN_COPIES' = 2
CONSTRAINT_PRI = 0
CS 347 Notes 02 78
![Page 79: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/79.jpg)
Another Example Rule
• Rule 2:• IF TABLE_NAME = "Users" AND
FIELD STR('home location') = 'France‘THEN
SET 'MIN_COPIES' = 3 ANDSET 'EXCL LIST' = 'USWest,
USEast‘CONSTRAINT PRI = 1
CS 347 Notes 02 79
![Page 80: CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design](https://reader034.vdocuments.mx/reader034/viewer/2022051402/56815b1d550346895dc8d248/html5/thumbnails/80.jpg)
CS 347 Notes 02 80
Summary• Description of fragmentation• Good fragmentations• Design of fragmentation• Allocation