pmit-6102 advanced database systems
DESCRIPTION
PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 06 Distributed Database Design. Outline. Distributed Database Design Distributed Design Problem Distributed Design Issues Fragmentation Data Allocation. - PowerPoint PPT PresentationTRANSCRIPT
PMIT-6102Advanced Database Systems
By-Jesmin Akhter
Assistant Professor, IIT, Jahangirnagar University
Lecture 06 Distributed Database Design
Outline Distributed Database Design
Distributed Design Problem Distributed Design Issues
o Fragmentationo Data Allocation
Slide 3
The design of a distributed computer system involves making decisions on the placement of data and
programs across the sites of a computer network as well as possibly designing the network itself
In the case of distributed DBMSs, the distribution of applications involves two things: distribution of the distributed DBMS software distribution of the application programs that run on it.
Are not significant problem Assume a copy of the distributed DBMS software exists
at each site where data are stored Network has already been designed
We concentrate on distribution of dataSlide 4
Distributed Design Problem
Distribution Design Issues
The following set of interrelated questions covers the entire issue.
Why fragment at all?
How should we fragment?
How much should we fragment?
How to test correctness?
How should we allocate?
What is the necessary information for fragmentation and allocation?
Slide 5
Reasons for Fragmentation
The important issue is the appropriate unit of distribution. A relation is not a suitable unit, for a number of
reasons.o First, application views are usually subsets of relations. o Therefore, the locality of accesses of applications is
defined not on entire relations but on their subsetso Hence consider subsets of relations as distribution
units.
Slide 6
Reasons for Fragmentation
The relation is not replicated and is stored at only one site, results in an unnecessarily high volume of remote data
accesses
The relation is replicated at all or some of the sites where the applications reside. May has unnecessary replication, which causes problems in
executing updates may not be desirable if storage is limited.
Finally, the decomposition of a relation into fragments, each being treated as a unit, permits a number of transactions to execute concurrently. Thus fragmentation typically increases the level of concurrency
and therefore the system throughput.
Slide 7
Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones.
There are clearly two alternatives for this: dividing it horizontally or dividing it vertically.
Slide 8
Fragmentation Alternatives
Fragmentation Alternatives – Horizontal
PROJ1 : projects with budgets less than $200,000
PROJ2 : projects with budgets greater than or equal to $200,000
PROJ1
PNO PNAME BUDGET LOC
P3 CAD/CAM 250000 New York
P4 Maintenance 310000 Paris
P5 CAD/CAM 500000 Boston
PNO PNAME LOC
P1 Instrumentation 150000
Montreal
P2Database Develop. 135000 New York
BUDGET
PROJ2
New YorkNew York
PROJ
PNO PNAME BUDGET LOC
P1Instrumentation 150000 Montreal
P3 CAD/CAM 250000P2Database Develop. 135000
P4 Maintenance 310000 ParisP5 CAD/CAM 500000 Boston
Example of Horizontal Partitioning
Slide 9
Fragmentation Alternatives – Vertical
PROJ1: information about project budgets
PROJ2: information about project names and locations
PNO BUDGET
P1 150000
P3 250000P2 135000
P4 310000P5 500000
PNO PNAME LOC
P1 Instrumentation Montreal
P3 CAD/CAM New YorkP2Database Develop. New York
P4 Maintenance ParisP5 CAD/CAM Boston
PROJ1 PROJ2
New YorkNew York
PROJ
PNO PNAME BUDGET LOC
P1 Instrumentation 150000 Montreal
P3 CAD/CAM 250000P2 Database Develop.135000
P4 Maintenance 310000 ParisP5 CAD/CAM 500000 Boston
Example of Vertical PartitioningSlide 10
Correctness of Fragmentation
Completeness Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be found in some Ri
This property, which is identical to the lossless decomposition property of normalization
it ensures that the data in a global relation are mapped into fragments without any loss
Slide 11
Correctness of Fragmentation
Reconstruction If relation R is decomposed into fragments FR={R1,
R2, ..., Rn}, then there should exist some relational operator ∇ such that
R = Ri ,∇ The reconstructability of the relation from its fragments
ensures that constraints defined on the data in the form of dependencies are preserved.
Slide 12
Correctness of Fragmentation
Disjointness If relation R is horizontally decomposed into fragments
FR={R1, R2, ..., Rn}, and data item dj is in Rj, then dj should not be in any other fragment Rk (k ≠ j ).
This criterion ensures that the horizontal fragments are disjoint.
If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments (for reconstruction).
Therefore, in case of vertical partitioning, disjointness is defined only on the non-primary key attributes of a relation.
Slide 13
Allocation Alternatives
Non-replicated partitioned : each fragment resides at only one site
Replicated fully replicated : each fragment at each site partially replicated : each fragment at some of the sites
Slide 14
Information Requirements
The information needed for distribution design can be divided into four categories:
Database information
Application information
Communication network information
Computer system information
Slide 15
Fragmentation
Horizontal Fragmentation (HF)
Vertical Fragmentation (VF)
Hybrid Fragmentation (HF)
Slide 16
Horizontal Fragmentation (HF)
Horizontal fragmentation partitions a relation along its tuples. each fragment has a subset of the tuples of the relation.
There are two versions of horizontal partitioning: Primary horizontal fragmentation Derived horizontal fragmentation
Primary horizontal fragmentation of a relation is performed using predicates that are defined on that relation.
Derived horizontal fragmentation is the partitioning of a relation results from predicates being defined on another relation.
Slide 17
Information Requirements of HF
Database Information how the database relations are connected to one another,
especially with joins.
In the relational model, these relationships are also depicted as relations.
cardinality of each relation: card(R)
TITLE, SAL
SKILL
ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC
ENO, PNO, RESP, DUR
EMP PROJ
ASG
L1
L2 L3
Expression of Relationships Among Relations Using Links
Slide 18
Application Information simple predicates : Given R[A1, A2, …, An], a simple
predicate pj is
pj : Ai θValue
where θ {=,<,≤,>,≥,≠}, Value Di and Di is the domain of Ai.
For relation R we define Pr = {p1, p2, …,pm}
Example :PNAME = "Maintenance"
BUDGET ≤ 200000
Information Requirements of HF
Slide 19
Application Information minterm predicates : Given R and Pr = {p1, p2, …,pm}
define M = {m1,m2,…,mr} as
M = { mi | mi = pjPr pj* }, 1≤j≤m, 1≤i≤z
where pj* = pj or pj* = ¬(pj).
Information Requirements of HF
Example
m1: PNAME="Maintenance" BUDGET≤200000
m2: NOT(PNAME="Maintenance") BUDGET≤200000
m3: PNAME= "Maintenance" NOT(BUDGET≤200000)
m4: NOT(PNAME="Maintenance") NOT(BUDGET≤200000)
Slide 20
Application Information In terms of quantitative information about user applications,
we need to have two sets of data.
minterm selectivities: sel(mi)
o The number of tuples of the relation that would be accessed by a user query which is specified according to a given minterm predicate mi.
access frequencies: acc(qi)
o The frequency with which a user application qi accesses data.
o Access frequency for a minterm predicate can also be defined.
Information Requirements of HF
Slide 21
Primary Horizontal Fragmentation
Definition :A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema. Therefore, given relation R, its horizontal fragments are given by
Ri = Fi(R), 1 ≤ i ≤ w
where Fi is a selection formula used to obtain fragment Ri. If Fi is in conjunctive normal form, it is a minterm predicate (mi).
A horizontal fragment Ri of relation R consists of all the tuples of R which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates.
Set of horizontal fragments also referred to as minterm fragments.
Slide 22
Primary Horizontal Fragmentation PROJ1 = LOC=“Montreal” (PROJ)
PROJ2 = LOC=“New York” (PROJ)
PROJ3 = LOC=“Paris” (PROJ)
Slide 23Primary Horizontal Fragmentation of Relation PROJ
PHF – AlgorithmGiven: A relation R, the set of simple predicates Pr
Output: The set of fragments of R = {R1, R2,…,Rw} which obey the fragmentation rules.
Preliminaries :
Pr should be complete
Pr should be minimal
Slide 24
Completeness of Simple Predicates A set of simple predicates Pr is said to be complete if
and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr.
Example :Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.Find the budgets of projects at each location. (1)Find projects with budgets less than $200000. (2)
Slide 25
Completeness of Simple Predicates
According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}
which is not complete with respect to (2).
Modify
Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”, BUDGET≤200000,BUDGET>200000}
which is complete.
Slide 26
Minimality of Simple Predicates If a predicate influences how fragmentation is performed,
(i.e., causes a fragment f to be further fragmented into, say, fi and fj) then there should be at least one application that accesses fi and fj differently.
In other words, the simple predicate should be relevant in determining a fragmentation.
If all the predicates of a set Pr are relevant, then Pr is minimal.
Pi is relevant if and only if
acc(mi ) the access frequency of a minterm mi.
Slide 27
)(
)(
)(
)(
j
j
i
i
fcard
macc
fcard
macc
Minimality of Simple Predicates
Example :
Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
is minimal (in addition to being complete). However, if we add
PNAME = “Instrumentation”
then Pr is not minimal.
Slide 28
COM_MIN Algorithm
Given: a relation R and a set of simple predicates Pr
Output: a complete and minimal set of simple predicates Pr' for Pr
Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.
Slide 29
COM_MIN Algorithm Initialization :
find a pi Pr such that pi partitions R according to Rule 1
set Pr' = pi ; Pr Pr – {pi} ; F {fi}
Iteratively add predicates to Pr' until it is complete find a pj Pr such that pj partitions some fk defined
according to minterm predicate over Pr' according to Rule 1
set Pr' = Pr' {pj}; Pr Pr – {pj}; F F {fj}
if pk Pr' which is nonrelevant then
Pr' Pr' – {pk}
F F – {fk}
Slide 30
PHORIZONTAL AlgorithmMakes use of COM_MIN to perform fragmentation.
Input: a relation R and a set of simple predicates Pr
Output: a set of minterm predicates M according to which relation R is to be fragmented
Pr' COM_MIN (R,Pr) determine the set M of minterm predicates
determine the set I of implications among pi Pr‘
Iteratively eliminate the contradictory minterms from M
M M-mi
Slide 31
PHF – Example Two candidate relations : PAY and PROJ. Fragmentation of relation PAY
Application: Check the salary info and determine raise. Employee records kept at two sites application run at
two sites Simple predicates
p1 : SAL ≤ 30000
p2 : SAL > 30000
Pr = {p1,p2} which is complete and minimal Pr'=Pr
Minterm predicates
m1 : (SAL ≤ 30000)
m2 : NOT(SAL ≤ 30000) = (SAL > 30000)
Slide 32
PHF – Example
TITLE
Mech. Eng.
Programmer
SAL
27000
24000
PAY1 PAY2
TITLE
Elect. Eng.
Syst. Anal.
SAL
40000
34000
Slide 33
PHF – Example Fragmentation of relation PROJ
Applications:o Find the name and budget of projects given their location.
– Issued at three siteso Access project information according to budget
– one site accesses ≤200000 other accesses >200000 Simple predicates For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris” For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000 Pr = Pr' = {p1,p2,p3,p4,p5}
Slide 34
Fragmentation of relation PROJ continued
Minterm fragments left after elimination
m1 : (LOC = “Montreal”) (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) (BUDGET > 200000)
m3 : (LOC = “New York”) (BUDGET ≤ 200000)
m4 : (LOC = “New York”) (BUDGET > 200000)
m5 : (LOC = “Paris”) (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) (BUDGET > 200000)
PHF – Example
Slide 35
PHF – Example
PROJ1
PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC
P1 Instrumentation 150000 Montreal
P2 Database Develop.135000 New York
PROJ3
PROJ4 PROJ6
PNO PNAME BUDGET LOC
P3 CAD/CAM 250000 New York
PNO PNAME BUDGET LOC
MaintenanceP4 310000 Paris
The result of the primary horizontal fragmentation of PROJ forms six fragments FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4, PROJ5, PROJ} according to the minterm predicates M .
Since fragments PROJ2, and PROJ5 are empty, they are not depicted in Figure
Slide 36
Completeness
Since Pr' is complete and minimal, the selection predicates are complete
Reconstruction
If relation R is fragmented into FR = {R1,R2,…,Rr}
R = Ri FR Ri
Disjointness
Minterm predicates that form the basis of fragmentation should be mutually exclusive.
PHF – Correctness
Slide 37
Derived Horizontal Fragmentation
Defined on a member relation of a link according to a selection operation specified on its owner. Each link is an equijoin. Equijoin can be implemented by means of semijoins.
TITLE, SAL
SKILL
ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC
ENO, PNO, RESP, DUR
EMP PROJ
ASG
L1
L2 L3
Slide 38
DHF – DefinitionGiven a link L where owner(L)=S and member(L)=R, the derived horizontal fragments of R are defined as
Ri = R ⋉ Si, 1≤i≤w
where w is the maximum number of fragments that will be defined on R and
Si = Fi (S)
where Fi is the formula according to which the primary horizontal fragment Si is defined.
Slide 39
Given link L1 where owner(L1)=SKILL and member(L1)=EMP
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
whereSKILL1 = SAL≤30000(SKILL)
SKILL2 = SAL>30000(SKILL)
DHF – Example
ENO ENAME TITLE
E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE7 R. Davis Mech. Eng.
EMP1
ENO ENAME TITLE
E1 J. Doe Elect. Eng.
E2 M. Smith Syst. Anal.E5 B. Casey Syst. Anal.
EMP2
E6 L. Chu Elect. Eng.
E8 J. Jones Syst. Anal.Slide 40
DHF – Correctness Completeness
Referential integrityo ensures that the tuples of any fragment of the member
relation are also in the owner relation.o Let R be the member relation of a link whose owner is
relation S which is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join attribute between R and S. Then, for each tuple t of R, there should be a tuple t' of S such that
t[A] = t' [A]
Reconstruction Let a relation R with fragmentation FR = { R1,R2,……Rw }
Disjointness Simple join graphs between the owner and the member
fragments. Slide 41
Slide 42
DHF – Correctness
More difficult than horizontal, because more alternatives exist.
Example: In horizontal partitioning, if the total number of simple
predicates in Pr is n, there are 2n possible minterm predicates that can be defined on it.
some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered
In the case of vertical partitioning if a relation has m non-primary key attributes, the number of possible fragments is equal to B(m), which is the mth Bell number.
For large values of m;B(m)= approximately (mm) for m=10, B(m) =115,000, for m=15, B(m) =109, for m=30, B(m) = 1023
Vertical Fragmentation
Slide 43
In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.
In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree structured Partitioning.
Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
It has also been named mixed fragmentation or nested fragmentation.
Slide 44
Hybrid Fragmentation
Hybrid Fragmentation
R
HFHF
R1
VF VFVFVFVF
R11 R12 R21 R22 R23
R2
It is also called mixed fragmentation or nested fragmentation.
Slide 45
To reconstruct the original global relation in case of hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward by performing joins and unions.
The fragmentation is complete if the intermediate and leaf fragments are complete.
Similarly, disjointness is guaranteed if intermediate and leaf fragments are disjoint.
Slide 46
Correctness of Hybrid Fragmentation
Allocation Allocation Problem
Given
F = {F1, F2, …, Fn} fragments
S ={S1, S2, …, Sm} network sites
on which a set of applications Q = {q1, q2,…, qq} is running.
The allocation problem involves finding the “optimal” distribution of F to S. Optimality can be defined with respect to two measures:
Minimal costo The cost function consists of the cost of storing each Fi at a site Sj,o the cost of querying Fi at site Sj , the cost of updating Fi at all sites where it is
stored,o the cost of data communication.
Performanceo minimize the response time.o maximize the system throughput at each site.
Slide 47
General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint
Decision Variable
Allocation Model
xij 1 if fragment Fi is stored at site Sj
0 otherwise
Slide 48
Total Cost
Storage Cost (of fragment Fj at Sk)
We choose a different approach in our model of the database allocation problem (DAP) and specify it as consisting of the processing cost (PC) and the transmission cost (TC).
Thus the query processing cost (QPC) for application qi is:
processing component + transmission component
Allocation Model
(unit storage cost at Sk) (size of Fj) xjk
query processing cost all queries
cost of storing a fragment at a site
all fragments
all sites
Slide 49
Allocation Model Query Processing Cost
Processing component PC, consists of three cost factors the access cost (AC) + the integrity enforcement cost (IE) + the
concurrency control cost (CC) Access cost
o The first two terms calculate the number of accesses of user query qi to fragment Fj.
o We assume that the local costs of processing them are identical.o The summation gives the total number of accesses for all the
fragments referenced by qi. Multiplication by LPCk gives the cost of this access at site Sk.
o We again use xij to select only those cost values for the sites where fragments are stored.
Integrity enforcement and concurrency control costso Can be similarly calculated
(no. of update accesses+ no. of read accesses) all fragments
all sites xij local processing cost at a site
Slide 50
Query Processing Cost
Transmission component
cost of processing updates + cost of processing retrievals In update queries it is necessary to inform all the sites where replicas exist,
while in retrieval queries, it is sufficient to access only one of the copies. In addition, at the end of an update request, there is no data transmission
back to the originating site other than a confirmation message, whereas the retrieval-only queries may result in significant data transmission.
Cost of updates
Retrieval Cost
Allocation Model
update message cost all fragments
all sites
acknowledgment cost all fragments
all sites
minall sites
all fragments (cost of retrieval command cost of sending back the result)
Slide 51
Allocation Model Constraints
Response Time
execution time of query ≤ max. allowable response time for that query
Storage Constraint (for a site)
Processing constraint (for a site)
storage requirement of a fragment at that site
all fragments
storage capacity at that site
processing load of a query at that site all queries
processing capacity of that site
Slide 52
Thank You
Slide 53