Advanced Database Topics
Copyright © Ellis Cohen 2002-2005
Distributed Databases: Organization & Query Processing
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them, please see http://www.openlineconsult.com/db
Topics
Distributed Database Architecture
Location Transparency
Data Placement & Fragmentation
Distributed Query Processing
Distributed Databases
Distribution: Spreading data across multiple network nodes
Partitioning & Fragmentation: Distributing tables divided into vertical or horizontal parts
Replication: Replicating (parts of) tables across multiple nodes
Why would we want to distribute or replicate data?
Distribution & Replication
Distribution
– Integrate separate databases
– Decrease network latency by locating data near greatest demand
– Locate data within secure administrative boundaries
– Parallel processing
Replication
– Decreased network latency by placing replicas near multiple high demand sites
– High availability & reliability in face of failure
– More parallel processing
– Scalability (single copy no longer bottleneck)
– Disconnected operation
Date's 12 Rules for Integrated Distributed Systems
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction management
9. Hardware independence
10. Operating system independence
11. Network independence
12. DBMS independence (transparent heterogeneity)
To the user, a distributed DB system should look exactly like a non-distributed system
Design Issues for DDBMS's
• Keep track of data names & locations
• Decide what to fragment & replicate
• Decide placement (allocation, distribution) of objects, fragments & replicas
• Devise strategies for executing transactions & queries that access data from multiple sites
• Manage distributed transactions, including backup & recovery from
– individual site crashes
– communication link failures
• Decide which copy/copies of replicated data to access
• Maintain consistency of replicated data copies
Distributed Database Architectures
Architectures
– Multi-Database Architecture
• Appears to user as separate databases
– TP Monitor / Application Server Architecture
• Separate server to handle transaction management & other services (e.g. security)
– Federated Database Architecture
• Appears to user as a single database providing a global schema integrating disparate DB's
– Collaborating Database Architecture
• A collection of peer databases, which interconnect to one another, providing a global schema to users who connect to an individual peer
Heterogeneity
– Homogeneous: every site runs same type of DBMS
– Heterogeneous: different sites run different DBMS's (perhaps even non-relational ones)
Coordination
Coordination of a distributed transaction is managed by a coordinator, which resides at a single node
• Multi-Database Architecture: Client is the coordinator
• TP Monitor / Application Server Architecture: TP Monitor / App Server is the coordinator
• Federated Database Architecture: Federation Server is the coordinator
• Collaborating Database Architecture: The peer connected to by the client is the coordinator
Multi-Database Architecture
[Figure: a Client (the coordinator) connected directly to two DB Servers (the subordinates)]
Client acts as coordinator
• Issues queries directly to multiple DB servers (subordinates)
• Integrates the results
• Handles distributed transaction management (as well as it can)
Sub-query Distribution
Suppose a coordinator wants to execute the query that lists the project managed by the highest paid employee
SELECT * FROM Projs WHERE pmgr =
  (SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps))
If subordinate S1 holds the Projs table, and subordinate S2 holds the Emps table, then the coordinator will request S2 to execute the sub-query
SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)
The coordinator will get the result back (let's call it result), and then request S1 to execute (and return the results of) the sub-query
SELECT * FROM Projs WHERE pmgr = result
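The two-step exchange can be sketched in Python, with two in-memory SQLite databases standing in for the subordinates S1 and S2. This is only an illustration under assumptions: the sample rows, the extra columns and the function name coordinator_query are invented here, and a real coordinator would talk to remote servers rather than local connections.

```python
import sqlite3

# Two in-memory SQLite databases stand in for subordinates S1 and S2.
s1 = sqlite3.connect(":memory:")  # holds Projs
s2 = sqlite3.connect(":memory:")  # holds Emps

# Sample data is hypothetical.
s1.execute("CREATE TABLE Projs (pno INTEGER, pname TEXT, pmgr INTEGER)")
s1.executemany("INSERT INTO Projs VALUES (?, ?, ?)",
               [(1, "Alpha", 100), (2, "Beta", 200)])
s2.execute("CREATE TABLE Emps (empno INTEGER, ename TEXT, sal INTEGER)")
s2.executemany("INSERT INTO Emps VALUES (?, ?, ?)",
               [(100, "Ann", 5000), (200, "Bob", 9000)])


def coordinator_query():
    # Step 1: ask S2 for the employee number of the highest paid employee.
    row = s2.execute(
        "SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)"
    ).fetchone()
    result = row[0]
    # Step 2: pass that value to S1 as a parameter of the second sub-query.
    return s1.execute("SELECT * FROM Projs WHERE pmgr = ?", (result,)).fetchall()


projects = coordinator_query()
```

Only the single value result crosses between the two sites, so neither table has to be shipped.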
Sub-transactions
Imagine a coordinator C has started a transaction TC, and is executing a query as part of TC.
– The coordinator divides the query up into sub-queries, which it sends to various subordinates.
– It labels each sub-query with TC, the identity of the main transaction.
When a subordinate S is passed a sub-query
– If it has not yet seen the label TC, it creates a local transaction TS (called a sub-transaction), and associates TS with TC.
– If it has seen TC before, it looks up the corresponding TS.
In either case, S runs the sub-query as part of the local sub-transaction TS
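The bookkeeping a subordinate needs for this can be sketched as a lookup table from TC labels to local sub-transactions. The class Subordinate, its method names and the identifier format below are hypothetical, chosen only to illustrate the rule; no real DBMS interface is implied.

```python
import itertools


class Subordinate:
    """Hypothetical subordinate that maps global transaction labels to local ones."""

    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)
        self.sub_txns = {}   # global label TC -> local sub-transaction TS
        self.log = []        # (TS, sub-query) pairs, in arrival order

    def run_subquery(self, tc, subquery):
        # First time this TC is seen: create a local sub-transaction TS.
        if tc not in self.sub_txns:
            self.sub_txns[tc] = f"{self.name}-T{next(self._ids)}"
        ts = self.sub_txns[tc]
        # Either way, the sub-query runs as part of TS.
        self.log.append((ts, subquery))
        return ts


s = Subordinate("S1")
first = s.run_subquery("TC-42", "SELECT ...")
second = s.run_subquery("TC-42", "UPDATE ...")
other = s.run_subquery("TC-43", "SELECT ...")
```

Both sub-queries labelled TC-42 land in the same local sub-transaction, while TC-43 gets a new one.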
Transaction Manager
[Figure: a Client with a Transaction Manager, connected to two DB Servers that run the sub-transactions. API's & protocols standardized by X/Open]
Client acts as coordinator
• Uses Transaction Manager to handle distributed transaction management
Client still
• Issues queries directly to multiple DB servers
• Integrates the results
Distributed Transaction Management
Coordinator's transaction manager communicates with each subordinate (participating DB server)
Each subordinate manages its own sub-transactions
– Reflects queries performed by that subordinate on behalf of the parent transaction
– Enforces ACID requirements of the subordinate
– Enables independent recovery by each subordinate
Provides distributed concurrency control to ensure global serializability
Provides atomic commit protocol to ensure global atomicity & durability
The Distributed Commit Problem
• A distributed transaction which executes at multiple sites must either be committed at all sites or aborted at all sites
• Not acceptable for one sub-transaction to commit and one abort.
• If the coordinator just sends a COMMIT message to two subordinates S1 and S2
– S1 could get the COMMIT message and commit
– S2 could crash just before it gets the COMMIT message (and before writing any local sub-transaction state to stable storage), i.e. S2 is aborted
• Obviously a more complicated protocol is needed, which we will address later
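The failure scenario can be made concrete with a toy simulation. The Site class, the crashes_before_commit flag and naive_commit are hypothetical names used only for illustration; the sketch models just the naive "send COMMIT to everyone" approach, not any real commit protocol.

```python
class Site:
    """Hypothetical subordinate holding one sub-transaction."""

    def __init__(self, name, crashes_before_commit=False):
        self.name = name
        self.crashes_before_commit = crashes_before_commit
        self.state = "active"

    def receive_commit(self):
        if self.crashes_before_commit:
            # Nothing reached stable storage, so recovery aborts the sub-transaction.
            self.state = "aborted"
        else:
            self.state = "committed"


def naive_commit(sites):
    # The coordinator simply sends COMMIT to everyone, with no prior agreement.
    for site in sites:
        site.receive_commit()
    return {site.name: site.state for site in sites}


outcome = naive_commit([Site("S1"), Site("S2", crashes_before_commit=True)])
```

The outcome has S1 committed and S2 aborted, which is exactly the mixed result that is not acceptable.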
TP Monitor / Application Server
Client uses TP Monitor / App Server to execute transactions
Application Server may use load balancing to decide which Application Server should coordinate transaction
Transaction executing within App Server
• makes direct calls to multiple DB servers
• integrates the results
• Uses App Server's Transaction Mgr to handle distributed transaction management
[Figure: a Client connected to multiple App Servers, each of which sends sub-queries to multiple DB Servers]
Heterogeneity
Heterogeneous Databases
– Different data types
– Different SQL commands or syntax
– Different protocols
– Different embedded programming languages
– Different security mechanisms (authentication & access control)
– Different concurrency mechanisms
Heterogeneous Data Models
– Different names
– Different values (esp. units)
– Different constraints & derived values
Heterogeneity Transparency
Non-Transparent:
Client must deal with some or all aspects of database heterogeneity directly
Semi-Transparent:
Mapping layer hides most differences among databases
Coordinator may still be able to exploit differences (e.g. pass-through SQL)
Transparent:
Mapping layer hides differences among databases and among data models
[Figure: a Coordinator reaching a DB Server through a Mapping Layer]
Mapping Architecture
[Figure: a Coordinator reaching three DB Servers through a Mapping Layer]
Mapping layer may reside in
• Coordinator
• DB Server
• separate Gateway Server
Coordinator may be
• Client
• App Server
• DB Server
Federated Database Architecture
Federation Layer supports
• Transaction Management
• Heterogeneity Mapping Layer
• Global Schema supported by Distributed Query Processing
Federation Layer may be
• Software layer callable by client (i.e. extended transaction manager)
• Provided by separate Federated DB Server (e.g. extended TP Monitor)
• Integrated with DB server (i.e. Collaborating DB Architecture)
[Figure: a Client / App Server sends transactions and queries to the Federation Layer, which sends sub-transactions and sub-queries to the DB Servers]
Collaborating Database Architecture
[Figure: a Client connected to one of four interconnected DB Servers; the client could itself be an App/DB/Gateway Server]
Client can connect to one of a set of DB Servers
Connecting DB Server
• Provides global schema
• May choose a different DB Server to coordinate transaction (e.g. based on load balancing, or the one nearest the data)
Coordinating DB Server
• Handles distributed transaction management
• Handles distributed query management
DB Servers
• Appear homogeneous
• May themselves be Federated DBs or Gateway Servers
Collaborating DB servers generally communicate using a private protocol
Location Transparency
Location Transparency Requirements
1. DB objects must be able to reside and be created at multiple sites in a system
2. Each DB object must be able to be uniquely named by a transaction
3. The name for a DB object used by a transaction must enable the object to be located efficiently
4. It must be possible to write transaction code that will not need to be modified if either
• the transaction is executed at a different site
• the DB objects accessed are moved
Explicit Site Naming
SELECT * FROM [email protected]
If @ (as in Oracle) reflects the table's current location, this does not support the key transparency requirement.
However, if @ identifies the table's birth site, which then holds the table's forwarding location (where it is currently located, or which does further forwarding), the transparency is retained.
Security considerations
– In what security domain does the transaction run on the remote machine?
– What if the user currently running does not have an account on the remote machine?
Synonyms
joe@boston> create SYNONYM emp for [email protected]
joe@boston> SELECT * FROM emp
Is emp a
– Local synonym (can only be used by joe)?
– Part of joe's schema?
dilip@boston> SELECT * FROM joe.emp
Even if synonyms are automatically replicated on every machine, there is no guarantee of location transparency, because of naming conflicts
Location Transparency via Global Directory Management
Design a global directory hierarchy
Provides a separate naming scope for storing synonyms
joe@boston> CREATE PUBLIC GLOBAL DIRECTORY /stuff
joe@boston> CREATE PUBLIC DIRECTORY /stuff/empinfo
// invented syntax
joe@boston> CREATE PUBLIC GLOBAL SYNONYM /stuff/empinfo/emp FOR [email protected]
sam@podunk> SELECT * FROM /stuff/empinfo/emp
Where is the global directory stored?
– Centralized directory manager (name server) susceptible to bottlenecks and failures
– Needs to be replicated
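How a global directory gives location transparency can be sketched with a small in-memory name server. Everything here is an assumption made for illustration: the GlobalDirectory class, its methods, and the site names site-a and site-b are hypothetical, and a real directory would be replicated rather than held in one process.

```python
class GlobalDirectory:
    """Hypothetical name server mapping global paths to (table, site) pairs."""

    def __init__(self):
        self.directories = {"/"}
        self.synonyms = {}

    def create_directory(self, path):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.directories:
            raise KeyError(f"no such directory: {parent}")
        self.directories.add(path)

    def create_synonym(self, path, table, site):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.directories:
            raise KeyError(f"no such directory: {parent}")
        self.synonyms[path] = (table, site)

    def move(self, path, new_site):
        # Moving the object only changes the directory entry, not the queries.
        table, _ = self.synonyms[path]
        self.synonyms[path] = (table, new_site)

    def resolve(self, path):
        return self.synonyms[path]


d = GlobalDirectory()
d.create_directory("/stuff")
d.create_directory("/stuff/empinfo")
d.create_synonym("/stuff/empinfo/emp", "emp", "site-a")
before = d.resolve("/stuff/empinfo/emp")
d.move("/stuff/empinfo/emp", "site-b")
after = d.resolve("/stuff/empinfo/emp")
```

A query that names /stuff/empinfo/emp keeps working after the move, because only the directory entry changed.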
Data Placement & Fragmentation
Data Placement
Company HQ in Des Moines
Warehouses in SF, NY, Denver
SfCust( custid, addr )
NyCust( custid, addr )
DenverCust( custid, addr )
A. Place all 3 in Des Moines
B. Place SfCust in SF, NyCust in NY, DenverCust in Denver
C. Place SfCust in SF & Des Moines, NyCust in NY & Des Moines, DenverCust in Denver & Des Moines
How would you decide?
Data Fragmentation
Horizontal Fragmentation
• Each fragment is a subset of rows
• Rows do not overlap (else doing partial replication)
• Reconstruction by union
• Updates may require tuple migration
Vertical Fragmentation
• Each fragment is a subset of columns
• All fragments include primary key columns or share ROWIDs
• Reconstruction by join
• Updates do not require tuple migration
Why would you choose one or another of these approaches?
Rules for Data Fragmentation
Completeness: All the data of the global relation must be mapped to the fragments
Reconstruction: It must always be possible to reconstruct each global relation from its fragments
Disjointedness: If fragments are disjoint, then decisions about replication of data can be made somewhat separately from decisions about fragmentation
Horizontal Fragmentation
Create:
CREATE TABLE emp ( … )
  PARTITION (
    [email protected] WHERE deptno = 10,
    [email protected] WHERE deptno = 20,
    [email protected] WHERE deptno = 30,
    [email protected] OTHERWISE )
// invented syntax loosely based on Oracle
The predicates defining all the fragments should be complete and mutually exclusive (or else there is replication)
Reconstruct:
SELECT * FROM [email protected]
UNION
SELECT * FROM [email protected]
UNION
SELECT * FROM [email protected]
UNION
SELECT * FROM [email protected]
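Horizontal fragmentation and its reconstruction can be tried out with one in-memory SQLite database per fragment. This is a sketch under assumptions: the site names (site10, site20, site30, other), the reduced emp column list and the helper functions are all invented for the example.

```python
import sqlite3

# One SQLite database per fragment; the keys are hypothetical site names.
predicates = {"site10": 10, "site20": 20, "site30": 30}  # deptno handled by each site
sites = {name: sqlite3.connect(":memory:") for name in [*predicates, "other"]}
for db in sites.values():
    db.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")


def site_for(deptno):
    # Exactly one predicate matches each row; "other" plays the OTHERWISE role.
    for name, dept in predicates.items():
        if deptno == dept:
            return name
    return "other"


def insert(row):
    sites[site_for(row[2])].execute("INSERT INTO emp VALUES (?, ?, ?)", row)


def reconstruct():
    # Reconstruction is the union of all fragments.
    rows = []
    for db in sites.values():
        rows.extend(db.execute("SELECT * FROM emp").fetchall())
    return sorted(rows)


data = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 30), (4, "Di", 40)]
for r in data:
    insert(r)
all_rows = reconstruct()
```

Because the predicates are complete and mutually exclusive, every row is stored exactly once and the union returns the original relation.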
Fragmentation Transparency
SELECT ename, job FROM emp
WHERE sal > 50000
Implement as
SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000
Integrate decomposed queries via union
Fragmentation Transparency for Updates
UPDATE emp SET deptno = 30 WHERE empno = 6749;
// assumes you know deptno is currently 20;
// much more complicated otherwise
Implementing this update requires tuple migration
Implement as
SELECT * INTO anEmp FROM [email protected] WHERE empno = 6749;
DELETE FROM [email protected] WHERE empno = 6749;
INSERT INTO [email protected] VALUES ( 6749, anEmp.ename, anEmp.job, anEmp.mgr, anEmp.hiredate, anEmp.sal, anEmp.comm, 30 );
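Tuple migration can be sketched with two in-memory SQLite databases playing the deptno = 20 and deptno = 30 fragments. The reduced column list, the sample employee name and the migrate function are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical sites holding the deptno = 20 and deptno = 30 fragments.
frag20 = sqlite3.connect(":memory:")
frag30 = sqlite3.connect(":memory:")
for db in (frag20, frag30):
    db.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
frag20.execute("INSERT INTO emp VALUES (6749, 'Ann', 20)")


def migrate(empno, source, target, new_deptno):
    # Read the tuple from its current fragment.
    row = source.execute(
        "SELECT empno, ename FROM emp WHERE empno = ?", (empno,)).fetchone()
    if row is None:
        raise LookupError(f"employee {empno} not in source fragment")
    # Delete it there and insert it, with the new deptno, in the target fragment.
    source.execute("DELETE FROM emp WHERE empno = ?", (empno,))
    target.execute("INSERT INTO emp VALUES (?, ?, ?)", (row[0], row[1], new_deptno))


migrate(6749, frag20, frag30, 30)
left = frag20.execute("SELECT count(*) FROM emp").fetchone()[0]
moved = frag30.execute("SELECT * FROM emp").fetchall()
```

The delete and the insert happen at different sites, so in a real system they would have to run as sub-transactions of one distributed transaction.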
Vertical Fragmentation
Create:
CREATE TABLE emp ( empno int primary key, … )
  PARTITION (
    ename, job, mgr, deptno AS [email protected],
    hiredate AS [email protected],
    sal, comm AS [email protected] )
// invented syntax loosely based on Oracle
The columns defining all the fragments should be complete and mutually exclusive. All fragments automatically include the primary key empno to match up rows (or use some other mechanism, such as matching ROWIDs)
Reconstruct:
SELECT i.empno, i.ename, i.job, i.mgr, h.hiredate, a.sal, a.comm, i.deptno
FROM [email protected] i
  NATURAL JOIN [email protected] h
  NATURAL JOIN [email protected] a
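Reconstruction by join can be checked with SQLite. In this sketch a single in-memory database stands in for the three sites, and the fragment table names emp_info, emp_hire and emp_acct, as well as the sample row, are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # one database stands in for three sites
db.executescript("""
    CREATE TABLE emp_info (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, mgr INTEGER, deptno INTEGER);
    CREATE TABLE emp_hire (empno INTEGER PRIMARY KEY, hiredate TEXT);
    CREATE TABLE emp_acct (empno INTEGER PRIMARY KEY, sal INTEGER, comm INTEGER);
    INSERT INTO emp_info VALUES (1, 'Ann', 'DBA', NULL, 10);
    INSERT INTO emp_hire VALUES (1, '2001-05-01');
    INSERT INTO emp_acct VALUES (1, 9000, 0);
""")

# Reconstruct the global relation by joining the fragments on the shared key.
rows = db.execute("""
    SELECT i.empno, i.ename, i.job, i.mgr, h.hiredate, a.sal, a.comm, i.deptno
    FROM emp_info i
    JOIN emp_hire h ON h.empno = i.empno
    JOIN emp_acct a ON a.empno = i.empno
""").fetchall()
```

Each fragment carries empno, which is what lets the join match up the pieces of every row.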
Hybrid Fragmentation
CREATE TABLE emp ( empno int primary key, … )
  PARTITION (
    ename, job, mgr, deptno AS (
      [email protected] WHERE deptno = 10,
      [email protected] WHERE deptno = 20,
      [email protected] WHERE deptno = 30,
      [email protected] OTHERWISE ),
    hiredate AS [email protected],
    sal, comm AS [email protected] )
// invented syntax loosely based on Oracle
Data Placement Revisited
Company HQ in Des Moines
Warehouses in SF, NY, Denver
Cust( custid, addr, whse )
whse is 'SF', 'NY', or 'Denver'
A. Place Cust at Des Moines
B. Partition Cust by whse: SfCust@SF, NyCust@NY, DenverCust@Denver
C. Leave Cust at Des Moines and also partition as SfCust@SF, NyCust@NY & DenverCust@Denver
How would you decide?
Database Design Problem
Hard Optimization Problem (even w/o considering replication)
– Fragmentation: How to fragment tables
– Allocation/Placement: Where to place tables and fragments
Relative to minimizing/maximizing some cost function, e.g.
– minimize query response time
– maximize throughput
– must be approximate, since determining actual query plan is a separate optimization problem
Subject to constraints, e.g.
– Available storage, bandwidth, processing power, …
– Keep 90% of response time below X
Optimization Approach
Factors to Consider
The originating site(s) of queries/updates
Which attributes are accessed together
Which attributes & combinations of selection predicates are used from which sites, with which frequencies
Frequencies of updates that affect combinations of selection predicates
Data integration costs (costs of joins and unions for fragments) vs increase in parallelism
Costs of communication, concurrency control, security & integrity maintenance
Distributed Query Processing
Query processing
Based on algorithms that analyze queries and convert them into a series of data manipulation operations.
The problem
Deciding on a strategy for executing each query over the network in the most cost-effective way, however the cost is defined.
Main factors
I/O, CPU, Communication costs
Opportunity for pipelining & parallel operations
Distributed Query Example
Given tables
emp( empno, ename, deptno, sal, … ) at site S1 (largest)
project( pno, pname, mgr, … ) at site S2
dept( deptno, dname, loc ) at site S3 (smallest)
[Figure: three sites, with emp at S1, proj at S2 and dept at S3]
Sub-Querying & Shipping
Queries are executed via a combination of computing queries and shipping data.
For example, suppose we want to execute a query to find out the name of each project, along with its project manager & the name of that manager's department
SELECT pname, ename, dname
FROM project p, emp e, dept d
WHERE p.mgr = e.empno AND e.deptno = d.deptno
Consider Cost-based Alternatives
Alternative 1
Ship dept & project to S1
Process query at S1
Alternative 2
Ship emp & project to S3
Process query at S3
[Figure: one diagram per alternative, each showing emp at S1, proj at S2 and dept at S3]
Which one is better?
Evaluating Alternatives
Alternative 1
Ship dept & project to S1
Process query at S1
Alternative 2
Ship emp & project to S3
Process query at S3
In general, alternative #1 is better, because it involves shipping less information: emp, the largest table, stays where it is
But to really determine the best approach, you must consider
– Communication costs to S1 vs S3 (what if there is a slow line between S2 & S1?)
– Relative processing speeds and scheduling algorithms at S1 vs S3
– Size of result & location of coordinator
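The "shipping less information" argument can be made concrete with a rough calculation. The byte sizes below are invented for illustration; only their ordering (emp largest, dept smallest) comes from the example, and the sketch deliberately ignores line speeds, processing speeds and the size of the result.

```python
# Hypothetical table sizes in bytes; only their ordering (emp largest,
# dept smallest) comes from the example.
sizes = {"emp": 10_000_000, "project": 1_000_000, "dept": 10_000}
home = {"emp": "S1", "project": "S2", "dept": "S3"}


def bytes_shipped(target_site):
    # Every table not already at the target site must be shipped there.
    return sum(size for table, size in sizes.items() if home[table] != target_site)


alt1 = bytes_shipped("S1")  # ship dept & project to S1
alt2 = bytes_shipped("S3")  # ship emp & project to S3
```

With these assumed sizes, alternative 1 ships about a tenth of what alternative 2 ships, since project moves in both plans and the difference is emp versus dept.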
Intermixing Querying & Shipping
Rather than shipping base tables and performing a single query, it may make sense to
– do a query at one site
– ship the query results to another site
– do a query at that site, joining the results received with data available at that site
In general, a distributed query plan involves a (potentially lengthy) sequence of performing queries and shipping data (either base tables or query results)
Distributed Query Planning Example
For example, suppose we are only interested in projects, where the project manager makes more than 8000/month. For those projects, we want the name of the project, the name of the project manager & the name of that manager's department.
Process
SELECT pname, ename, dname
FROM project p, emp e, dept d
WHERE p.mgr = e.empno AND e.deptno = d.deptno
AND e.sal > 8000
[Figure: three sites, with emp at S1, proj at S2 and dept at S3]
If there are not very many employees who make > 8000, what's the best plan for executing this query?
Restrict before Ship
At S1, COMPUTE emplet AS
SELECT empno, ename, deptno FROM emp
WHERE sal > 8000
SHIP emplet FROM S1 TO S2
SHIP dept FROM S3 TO S2
At S2, COMPUTE
SELECT ename, dname, pname
FROM emplet e, dept d, project p
WHERE p.mgr = e.empno
AND e.deptno = d.deptno
[Figure: emplet is shipped from S1 and dept from S3 to S2, where the query is processed]
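The restrict-before-ship plan can be simulated with three in-memory SQLite databases standing in for S1, S2 and S3. The sample rows and the helpers make_site and ship are assumptions for this sketch; shipping is modelled simply as copying a result set into a new table at the target site.

```python
import sqlite3


def make_site(ddl, rows, table):
    db = sqlite3.connect(":memory:")
    db.execute(ddl)
    marks = ", ".join("?" * len(rows[0]))
    db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return db


def ship(rows, target, table, ddl):
    # "Shipping" copies a result set into a new table at the target site.
    target.execute(ddl)
    if rows:
        marks = ", ".join("?" * len(rows[0]))
        target.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return len(rows)


s1 = make_site("CREATE TABLE emp (empno, ename, deptno, sal)",
               [(1, "Ann", 10, 9000), (2, "Bob", 20, 4000), (3, "Cy", 10, 3000)], "emp")
s2 = make_site("CREATE TABLE project (pno, pname, mgr)",
               [(7, "Alpha", 1), (8, "Beta", 2)], "project")
s3 = make_site("CREATE TABLE dept (deptno, dname, loc)",
               [(10, "Sales", "NY"), (20, "Ops", "SF")], "dept")

# At S1, restrict and project emp before shipping it.
emplet = s1.execute("SELECT empno, ename, deptno FROM emp WHERE sal > 8000").fetchall()
shipped = ship(emplet, s2, "emplet", "CREATE TABLE emplet (empno, ename, deptno)")
ship(s3.execute("SELECT * FROM dept").fetchall(), s2, "dept",
     "CREATE TABLE dept (deptno, dname, loc)")

# At S2, join the shipped data with the local project table.
result = s2.execute("""
    SELECT e.ename, d.dname, p.pname
    FROM emplet e, dept d, project p
    WHERE p.mgr = e.empno AND e.deptno = d.deptno
""").fetchall()
```

Only one of the three emp rows leaves S1 in this sample, which is the point of restricting before shipping.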
Semijoins
A semijoin is
• a join between two (or more) tables where
• one of the tables is just used to restrict the result, but not provide any data
Example
List the names of employees whose departments are located in NY
SELECT e.ename FROM emp e, dept d
WHERE e.deptno = d.deptno AND d.loc = 'NY'
All the result data comes from the emp table
The dept table is joined with emp simply to restrict the tuples chosen from the emp table
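A semijoin can also be written with EXISTS, which makes it explicit that dept contributes no columns. The sketch below runs that form in SQLite on hypothetical sample rows; the table layouts follow the example tables, with emp reduced to three columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER);
    CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT);
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 10);
    INSERT INTO dept VALUES (10, 'Sales', 'NY'), (20, 'Ops', 'SF');
""")

# dept only restricts which emp rows qualify; no dept column is returned.
names = db.execute("""
    SELECT e.ename FROM emp e
    WHERE EXISTS (SELECT 1 FROM dept d
                  WHERE d.deptno = e.deptno AND d.loc = 'NY')
    ORDER BY e.empno
""").fetchall()
```

Only the employees in the NY department come back, and every returned value is taken from emp.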
Using Semijoins in Distributed Queries
[Figure: data Da at site Sa and data Db at site Sb, with arrows for steps 1 to 3 below]
1) Some data (generally the result of a query) is shipped from site Sa to site Sb
2) The shipped data is used in a semijoin with the data at Sb. This produces a subset of the data at Sb, restricted based on the data shipped from Sa
3) The result of the semijoin is shipped back to Sa, where it is combined with data already there
If S1 is the coordinator (where the results must end up), how can semijoins be used to produce a more efficient solution to the project manager query?
Using Semijoins
1. At S1, COMPUTE emplet AS
SELECT empno, ename, deptno
FROM emp
WHERE sal > 8000
At S1, COMPUTE empl AS
SELECT empno FROM emplet
SHIP empl FROM S1 TO S2
2. At S2, COMPUTE emproj AS
SELECT mgr AS pmgr, pname
FROM project, empl
WHERE mgr = empno
ORDER BY pmgr
SHIP emproj FROM S2 TO S1
Shipping empl to S2 limits the tuples from proj to be sent back to S1
3. At S3, COMPUTE deptlet AS
SELECT deptno, dname
FROM dept
SHIP deptlet FROM S3 TO S1
4. At S1, COMPUTE
SELECT pname, ename, dname
FROM emplet e, deptlet d, emproj p
WHERE e.deptno = d.deptno AND e.empno = p.pmgr
[Figure: emp at S1, proj at S2 and dept at S3, with numbered arrows for the shipments above]
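The semijoin plan for the project manager query can be traced in plain Python, counting how many tuples each shipment carries. The sample rows and the shipped dictionary are assumptions for this sketch, with lists standing in for the tables at S1, S2 and S3.

```python
# Hypothetical rows; the table layouts follow the example tables.
emp = [(1, "Ann", 10, 9000), (2, "Bob", 20, 4000)]            # at S1: empno, ename, deptno, sal
project = [(7, "Alpha", 1), (8, "Beta", 2), (9, "Gamma", 2)]  # at S2: pno, pname, mgr
dept = [(10, "Sales"), (20, "Ops")]                           # at S3: deptno, dname

shipped = {}  # name of shipped relation -> number of tuples sent

# Step 1 (S1): restrict emp, then project out just the join column.
emplet = [(no, name, dno) for no, name, dno, sal in emp if sal > 8000]
empl = {no for no, _, _ in emplet}
shipped["empl"] = len(empl)                                   # S1 -> S2

# Step 2 (S2): semijoin project with empl, ship only matching projects.
emproj = [(mgr, pname) for _, pname, mgr in project if mgr in empl]
shipped["emproj"] = len(emproj)                               # S2 -> S1

# Step 3 (S3): ship the needed dept columns.
deptlet = list(dept)
shipped["deptlet"] = len(deptlet)                             # S3 -> S1

# Step 4 (S1): final join of emplet, deptlet and emproj.
dname_of = dict(deptlet)
ename_of = {no: (name, dno) for no, name, dno in emplet}
result = [(pname, ename_of[mgr][0], dname_of[ename_of[mgr][1]])
          for mgr, pname in emproj]
```

Of the three project rows at S2, only the one managed by a qualifying employee is sent back to S1.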
Planning Alternatives
Result-Based or Stream-Based
– Result-Based: A site waits until it receives the entire result set shipped to it before it can use it in a query
– Stream-Based: A query at a site will use data streamed to it as it arrives from another site (also called pipelining)
Sequential or Parallel
– Sequential: A site ships data to (or requests data from) one other site at a time
– Parallel: A site can ship data to (or request data from) multiple sites in parallel
Streaming & Pipelining
At S3, COMPUTE deptlet AS
SELECT deptno, dname
FROM dept
SHIP deptlet FROM S3 TO S1
At S1, COMPUTE empdept AS
SELECT empno, ename, dname
FROM emp, deptlet
WHERE emp.deptno = deptlet.deptno
AND sal > 8000
ORDER BY empno
STREAM empdept FROM S1 TO S2
At S2, COMPUTE
SELECT p.pname, ed.ename, ed.dname
FROM project p, empdept ed
WHERE p.mgr = ed.empno
When would this approach be useful?
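Stream-based processing maps naturally onto Python generators: the consumer joins each row as it arrives instead of waiting for the whole result set. The functions stream_empdept and join_at_s2 and the sample rows are hypothetical, and no real network is involved.

```python
def stream_empdept():
    # S1 side: yields rows one at a time, in empno order, as they are produced.
    for row in [(1, "Ann", "Sales"), (4, "Dee", "Ops")]:
        yield row


def join_at_s2(stream, project):
    # S2 side: project is indexed by mgr; each streamed row is joined on arrival,
    # so the first result is available before the stream is exhausted.
    by_mgr = {}
    for pno, pname, mgr in project:
        by_mgr.setdefault(mgr, []).append(pname)
    for empno, ename, dname in stream:
        for pname in by_mgr.get(empno, []):
            yield (pname, ename, dname)


project = [(7, "Alpha", 1), (8, "Beta", 4), (9, "Gamma", 4)]
results = join_at_s2(stream_empdept(), project)
first = next(results)      # produced after consuming only one streamed row
rest = list(results)
```

This favors a "fastest first result" goal, since S2 can start answering before S1 has finished producing empdept.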
Parallelism & Streaming
Do in parallel
(a) At S1, COMPUTE empl AS
SELECT empno
FROM emp
WHERE sal > 8000
ORDER BY empno
STREAM empl FROM S1 TO S2
At S2, COMPUTE eproj AS
SELECT p.mgr AS pmgr, p.pname
FROM project p, empl e
WHERE e.empno = p.mgr
STREAM eproj FROM S2 TO S1
(b) At S1, COMPUTE dempl AS
SELECT DISTINCT deptno
FROM emp
WHERE sal > 8000
STREAM dempl FROM S1 TO S3
At S3, COMPUTE deptlet AS
SELECT d.deptno, d.dname
FROM dept d, dempl e
WHERE d.deptno = e.deptno
STREAM deptlet FROM S3 TO S1
Then, at S1, COMPUTE
SELECT pname, ename, dname
FROM emp e, eproj p, deptlet d
WHERE e.deptno = d.deptno AND e.empno = p.pmgr
[Figure: emp at S1, proj at S2 and dept at S3, with the two branches running in parallel]
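The two independent branches can be sketched with a thread pool, where each branch stands in for one round trip to a remote site. The sample rows and the function names branch_s2 and branch_s3 are assumptions; real parallelism would come from the sites working concurrently, which threads only imitate here.

```python
from concurrent.futures import ThreadPoolExecutor

emp = [(1, "Ann", 10, 9000), (2, "Bob", 20, 4000)]   # S1: empno, ename, deptno, sal
project = [(7, "Alpha", 1), (8, "Beta", 2)]          # S2: pno, pname, mgr
dept = [(10, "Sales"), (20, "Ops")]                  # S3: deptno, dname


def branch_s2():
    # S1 -> S2 -> S1: restrict project by the qualifying employee numbers.
    empl = {no for no, _, _, sal in emp if sal > 8000}
    return [(mgr, pname) for _, pname, mgr in project if mgr in empl]


def branch_s3():
    # S1 -> S3 -> S1: restrict dept by the qualifying department numbers.
    dempl = {dno for _, _, dno, sal in emp if sal > 8000}
    return [(dno, dname) for dno, dname in dept if dno in dempl]


with ThreadPoolExecutor(max_workers=2) as pool:
    f2, f3 = pool.submit(branch_s2), pool.submit(branch_s3)
    eproj, deptlet = f2.result(), f3.result()

# Final join at S1 once both branches have delivered their results.
dname_of = dict(deptlet)
result = [(pname, ename, dname_of[dno])
          for mgr, pname in eproj
          for no, ename, dno, _ in emp if no == mgr]
```

Neither branch depends on the other's output, which is what allows them to overlap.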
What's Best
Informally, we've talked about how query planning finds the best way to process the query, involving
• subqueries
• shipping/streaming
• parallel execution
But when we say "best", what do we actually mean?
Possible Query Plan Goals
Fastest complete result
Fastest first result
Minimize usage of specific resources
Combination of the above
Query Optimization
Build initial tree for query
– Build tree reflecting relational algebra corresponding to query
– Modify tree to account for fragmentation (more complex if distributed fragments overlap)
– Incorporate simplest ship operations into tree for accessing remote data
Perform global query optimization
– Apply transformation operators that produce an equivalent tree
– Account for pipelining & parallelism as well
– Use heuristic search algorithm (e.g. hill climbing, simulated annealing, genetic algorithms) to find best distributed query plan considering replicas
– Use cost function incorporating time taken by I/O, CPU & communication (best if statistics on size of relations & result sets are maintained)
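A cost function of this kind can be sketched as a weighted sum over estimated I/O, CPU and communication work. The PlanEstimate fields, the weights and the two candidate plans are invented for illustration, and the search is reduced to picking the cheapest of a fixed list rather than a heuristic search over transformed trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical per-plan estimates, e.g. derived from table statistics."""
    name: str
    pages_read: int        # I/O
    tuples_processed: int  # CPU
    bytes_shipped: int     # communication


def cost(plan, io_ms=1.0, cpu_ms=0.001, comm_ms=0.0001):
    # Weighted sum of the three cost components; the weights are assumptions.
    return (plan.pages_read * io_ms
            + plan.tuples_processed * cpu_ms
            + plan.bytes_shipped * comm_ms)


candidates = [
    PlanEstimate("ship base tables", 1000, 100_000, 11_000_000),
    PlanEstimate("restrict before ship", 1000, 100_000, 50_000),
]
best = min(candidates, key=cost)
```

With identical I/O and CPU estimates, the communication term decides between the two plans.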
Global vs Local Query Optimization
Global Optimization produces
– A set of decomposed queries to be sent to various DB servers
– Combined with ship/stream instructions
– All placed in a parallel/sequential control flow graph
Local Optimization
– Each local server determines best way to execute each decomposed query sent to it (though global optimization may generate preliminary plans)