

Advanced Database Topics

Copyright © Ellis Cohen 2002-2005

Distributed Databases: Organization & Query Processing

These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

For more information on how you may use them, please see http://www.openlineconsult.com/db


Topics

Distributed Database Architecture

Location Transparency

Data Placement & Fragmentation

Distributed Query Processing


Distributed Databases

Distribution
• Spreading data across multiple network nodes

Partitioning & Fragmentation
• Distributed tables divided into vertical or horizontal parts

Replication
• Replicating (parts of) tables across multiple nodes

Why would we want to distribute or replicate data?


Distribution & Replication

Distribution
• Integrate separate databases
• Decrease network latency by locating data near greatest demand
• Locate data within secure administrative boundaries
• Parallel processing

Replication
• Decreased network latency by placing replicas near multiple high demand sites
• High availability & reliability in face of failure
• More parallel processing
• Scalability (single copy no longer bottleneck)
• Disconnected operation


Date's 12 Rules for Integrated Distributed Systems

1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction management
9. Hardware independence
10. Operating system independence
11. Network independence
12. DBMS independence (transparent heterogeneity)

To the user, a distributed DB system should look exactly like a non-distributed system


Design Issues for DDBMS's

• Keep track of data names & locations
• Decide what to fragment & replicate
• Decide placement (allocation, distribution) of objects, fragments & replicas
• Devise strategies for executing transactions & queries that access data from multiple sites
• Manage distributed transactions, including backup & recovery from
  – individual site crashes
  – communication link failures
• Decide which copy/copies of replicated data to access
• Maintain consistency of replicated data copies


Distributed Database Architectures


Distributed Database Architectures

Architectures
– Multi-Database Architecture
  • Appears to user as separate databases
– TP Monitor / Application Server Architecture
  • Separate server to handle transaction management & other services (e.g. security)
– Federated Database Architecture
  • Appears to user as a single database providing a global schema integrating disparate DB's
– Collaborating Database Architecture
  • A collection of peer databases, which interconnect to one another, providing a global schema to users who connect to an individual peer

Heterogeneity
– Homogeneous: every site runs same type of DBMS
– Heterogeneous: different sites run different DBMS's (perhaps even non-relational ones)


Coordination

Coordination of a distributed transaction is managed by a coordinator, which resides at a single node

• Multi-Database Architecture: Client is the coordinator

• TP Monitor / Application Server Architecture: TP Monitor / App Server is the coordinator

• Federated Database Architecture: Federation Server is the coordinator

• Collaborating Database Architecture: The peer connected to by the client is the coordinator


Multi-Database Architecture

[Figure: a Client (the coordinator) connected directly to two DB Servers (the subordinates)]

Client acts as coordinator

• Issues queries directly to multiple DB servers (subordinates)

• Integrates the results

• Handles distributed transaction management (as well as it can)


Sub-query Distribution

Suppose a coordinator wants to execute the query that lists the project managed by the highest paid employee

SELECT * FROM Projs WHERE pmgr =
  (SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps))

If subordinate S1 holds the Projs table, and subordinate S2 holds the Emps table, then the coordinator will request S2 to execute the sub-query

SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)

The coordinator gets the result back (let's call it result), and then requests S1 to execute (and return the results of) the sub-query

SELECT * FROM Projs WHERE pmgr = result
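The two-step exchange above can be sketched in Python. This is only an illustration under assumptions: two in-memory SQLite databases stand in for the subordinates S1 and S2, the sample rows and the extra columns (pno, pname, ename) are invented, and the function name projects_of_top_earner is hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for subordinates S1 and S2 (an assumption
# made for illustration; real subordinates would be remote DB servers).
s1 = sqlite3.connect(":memory:")  # holds Projs
s2 = sqlite3.connect(":memory:")  # holds Emps

s1.execute("CREATE TABLE Projs (pno INTEGER, pname TEXT, pmgr INTEGER)")
s1.executemany("INSERT INTO Projs VALUES (?, ?, ?)",
               [(1, "Apollo", 100), (2, "Gemini", 200)])

s2.execute("CREATE TABLE Emps (empno INTEGER, ename TEXT, sal INTEGER)")
s2.executemany("INSERT INTO Emps VALUES (?, ?, ?)",
               [(100, "SMITH", 3000), (200, "JONES", 5000)])


def projects_of_top_earner(projs_site, emps_site):
    """Coordinator logic: run the inner sub-query at S2, then the outer at S1."""
    # Step 1: ask S2 for the employee number of the highest-paid employee.
    row = emps_site.execute(
        "SELECT empno FROM Emps WHERE sal = (SELECT max(sal) FROM Emps)"
    ).fetchone()
    if row is None:
        return []
    result = row[0]
    # Step 2: pass that value to S1 as a parameter of the outer query.
    return projs_site.execute(
        "SELECT * FROM Projs WHERE pmgr = ?", (result,)
    ).fetchall()


top_projects = projects_of_top_earner(s1, s2)
print(top_projects)
```

The sketch takes only the first row returned by S2; if several employees tie for the highest salary, a real coordinator would have to pass all of them on.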


Sub-transactions

Imagine a coordinator C has started a transaction TC, and is executing a query as part of TC.
– The coordinator divides the query up into sub-queries, which it sends to various subordinates.
– It labels each sub-query with TC, the identity of the main transaction.

When a subordinate S is passed a sub-query
– If it has not yet seen the label TC, it creates a local transaction TS (called a sub-transaction), and associates TS with TC.
– If it has seen TC before, it looks up the corresponding TS.

In either case, S runs the sub-query as part of the local sub-transaction TS
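The bookkeeping a subordinate does for sub-transactions can be sketched as a lookup table from TC labels to local transaction ids. The Subordinate class, its method names and the id format are hypothetical, and no real transaction is started; the sketch only shows the "create on first sight, look up afterwards" rule.

```python
import itertools


class Subordinate:
    """Hypothetical subordinate that maps global transaction ids to local ones."""

    def __init__(self, name):
        self.name = name
        self._local_ids = itertools.count(1)
        self.sub_txn_for = {}   # TC label -> local sub-transaction id TS
        self.log = []           # (TS, sub-query) pairs, in execution order

    def run_sub_query(self, tc, sub_query):
        # First time this subordinate sees TC: create a sub-transaction TS.
        if tc not in self.sub_txn_for:
            self.sub_txn_for[tc] = f"{self.name}-T{next(self._local_ids)}"
        ts = self.sub_txn_for[tc]
        # In either case the sub-query runs as part of TS.
        self.log.append((ts, sub_query))
        return ts


s = Subordinate("S2")
first = s.run_sub_query("TC-1", "SELECT max(sal) FROM Emps")
second = s.run_sub_query("TC-1", "SELECT empno FROM Emps WHERE sal = 5000")
other = s.run_sub_query("TC-2", "SELECT count(*) FROM Emps")
```

Both sub-queries labeled TC-1 end up in the same local sub-transaction, while TC-2 gets its own.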


Transaction Manager

[Figure: a Client with a Transaction Manager, connected to two DB Servers on which the sub-transactions run; APIs & protocols standardized by X/Open]

Client acts as coordinator

• Uses Transaction Manager to handle distributed transaction management

Client still

• Issues queries directly to multiple DB servers

• Integrates the results


Distributed Transaction Management

Coordinator's transaction manager communicates with each subordinate (participating DB server)

Each subordinate manages its own sub-transactions
– Reflects queries performed by that subordinate on behalf of the parent transaction
– Enforces ACID requirements of the subordinate
– Enables independent recovery by each subordinate

Provides distributed concurrency control to ensure global serializability

Provides atomic commit protocol to ensure global atomicity & durability


The Distributed Commit Problem

• A distributed transaction which executes at multiple sites must either be committed at all sites or aborted at all sites

• Not acceptable for one sub-transaction to commit and one abort.

• If the coordinator just sends a COMMIT message to two subordinates S1 and S2
  – S1 could get the COMMIT message and commit
  – S2 could crash just before it gets the COMMIT message (and before writing any local sub-transaction state to stable storage), i.e. S2's sub-transaction is aborted

• Obviously a more complicated protocol is needed, which we will address later
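The failure scenario can be replayed in a few lines of Python. This is a toy model, not a protocol: the Site class, the crashes_before_commit flag and naive_commit are all hypothetical names, and a crash is modeled simply as the sub-transaction ending up aborted.

```python
class Site:
    """Hypothetical subordinate holding one uncommitted sub-transaction."""

    def __init__(self, name, crashes_before_commit=False):
        self.name = name
        self.crashes_before_commit = crashes_before_commit
        self.state = "active"

    def receive_commit(self):
        if self.crashes_before_commit:
            # Nothing reached stable storage, so recovery rolls the work back.
            self.state = "aborted"
        else:
            self.state = "committed"


def naive_commit(sites):
    """Coordinator just sends COMMIT to everyone, with no prior agreement."""
    for site in sites:
        site.receive_commit()
    return {site.name: site.state for site in sites}


outcome = naive_commit([Site("S1"), Site("S2", crashes_before_commit=True)])
print(outcome)
# Global atomicity requires every site to end in the same state.
atomic = len(set(outcome.values())) == 1
```

With S2 crashing, the sites disagree about the outcome, which is exactly the situation the slide calls unacceptable.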


TP Monitor / Application Server

Client uses TP Monitor / App Server to execute transactions

Application Server may use load balancing to decide which Application Server should coordinate transaction

Transaction executing within App Server

• makes direct calls to multiple DB servers

• integrates the results

• Uses App Server's Transaction Mgr to handle distributed transaction management

[Figure: a Client connected to several App Servers, each of which sends sub-queries to several DB Servers]


Heterogeneity

Heterogeneous Databases
– Different data types
– Different SQL commands or syntax
– Different protocols
– Different embedded programming languages
– Different security mechanisms (authentication & access control)
– Different concurrency mechanisms

Heterogeneous Data Models
– Different names
– Different values (esp. units)
– Different constraints & derived values


Heterogeneity Transparency

Non-Transparent:
Client must deal with some or all aspects of database heterogeneity directly

Semi-Transparent:
Mapping layer hides most differences among databases
Coordinator may still be able to exploit differences (e.g. pass-through SQL)

Transparent:
Mapping layer hides differences among databases and among data models

[Figure: Coordinator above a Mapping Layer, which sits above a DB Server]


Mapping Architecture

[Figure: a Coordinator connected through a Mapping Layer to three DB Servers]

Mapping layer may reside in
• Coordinator
• DB Server
• separate Gateway Server

Coordinator may be
• Client
• App Server
• DB Server


Federated Database Architecture

Federation Layer supports
• Transaction Management
• Heterogeneity Mapping Layer
• Global Schema supported by

Federation Layer may be
• Software layer callable by client (i.e. extended transaction manager)
• Provided by separate Federated DB Server (e.g. extended TP Monitor)
• Integrated with DB server (i.e. Collaborating DB Architecture)

[Figure: a Client / App Server sends transactions and queries to the Federation Layer, which sends sub-transactions and sub-queries to the DB Servers]


Collaborating Database Architecture

[Figure: a Client connected to one of four interconnected DB Servers; the Client could itself be an App/DB/Gateway Server]

Client can connect to one of a set of DB Servers

Connecting DB Server
• Provides global schema
• May choose a different DB Server to coordinate the transaction (e.g. based on load balancing or on which one is nearest the data)

Coordinating DB Server
• Handles distributed transaction management
• Handles distributed query management

DB Servers
• Appear homogeneous
• May themselves be Federated DBs or Gateway Servers

Collaborating DB servers generally communicate using a private protocol


Location Transparency


Location Transparency Requirements

1. DB objects must be able to reside and be created at multiple sites in a system

2. Each DB object must be able to be uniquely named by a transaction

3. The name for a DB object used by a transaction must enable the object to be located efficiently

4. It must be possible to write transaction code that will not need to be modified if either
   • the transaction is executed at a different site
   • the DB objects accessed are moved


Explicit Site Naming

SELECT * FROM [email protected]

If @ (as in Oracle) reflects the table's current location, this does not support the key transparency requirement.

However, if @ identifies the table's birth site, which then holds the table's forwarding location (where it is currently located, or which does further forwarding), the transparency is retained.
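The birth-site idea can be sketched as a chain of forwarding pointers. Everything here is an assumption made for illustration: the Site class, the move and locate helpers, the hop limit, and the site names chicago and denver are invented, and real systems would also need to keep the pointers consistent under failures.

```python
class Site:
    """Hypothetical site that stores tables and forwarding pointers."""

    def __init__(self, name):
        self.name = name
        self.tables = set()      # tables currently stored here
        self.forward = {}        # table name -> site it moved to


def move(table, src, dst):
    """Move a table and leave a forwarding pointer at the old site."""
    src.tables.remove(table)
    dst.tables.add(table)
    src.forward[table] = dst


def locate(table, birth_site, max_hops=10):
    """Follow forwarding pointers starting at the table's birth site."""
    site = birth_site
    for _ in range(max_hops):
        if table in site.tables:
            return site.name
        if table not in site.forward:
            raise LookupError(f"{table} is unknown at {site.name}")
        site = site.forward[table]
    raise LookupError(f"too many forwarding hops for {table}")


boston, chicago, denver = Site("boston"), Site("chicago"), Site("denver")
boston.tables.add("emp")            # emp is born at boston
move("emp", boston, chicago)
move("emp", chicago, denver)
where = locate("emp", boston)       # the name still refers to the birth site
```

A query that names the birth site keeps working after both moves, at the cost of extra hops; a site could shorten the chain by updating its pointer once the final location is known.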

Security considerations
• In what security domain does the transaction run on the remote machine?
• What if the user currently running does not have an account on the remote machine?


Synonyms

joe@boston> CREATE SYNONYM emp FOR [email protected]

joe@boston> SELECT * FROM emp

Is emp a
– Local synonym [can only be used by joe?]
– Part of joe's schema?

dilip@boston> SELECT * FROM joe.emp

Even if synonyms are automatically replicated on every machine, there is no guarantee of location transparency because of naming conflicts


Location Transparency via Global Directory Management

Design a global directory hierarchy
Provides a separate naming scope for storing synonyms

// invented syntax
joe@boston> CREATE PUBLIC GLOBAL DIRECTORY /stuff
joe@boston> CREATE PUBLIC DIRECTORY /stuff/empinfo

joe@boston> CREATE PUBLIC GLOBAL SYNONYM /stuff/empinfo/emp FOR [email protected]

sam@podunk> SELECT * FROM /stuff/empinfo/emp

Where is the global directory stored?
– Centralized directory manager (name server) susceptible to bottlenecks and failures
– Needs to be replicated


Data Placement & Fragmentation


Data Placement

Company HQ in Des Moines
Warehouses in SF, NY, Denver

SfCust( custid, addr )
NyCust( custid, addr )
DenverCust( custid, addr )

A. Place all 3 in Des Moines
B. Place SfCust in SF, NyCust in NY, DenverCust in Denver
C. Place SfCust in SF & Des Moines, NyCust in NY & Des Moines, DenverCust in Denver & Des Moines

How would you decide?


Data Fragmentation

Horizontal Fragmentation
• Each fragment is a subset of rows
• Rows do not overlap (else doing partial replication)
• Reconstruction by union
• Updates may require tuple migration

Vertical Fragmentation
• Each fragment is a subset of columns
• All fragments include primary key columns or share ROWIDs
• Reconstruction by join
• Updates do not require tuple migration

Why would you choose one or another of these approaches?


Rules for Data Fragmentation

Completeness: All the data of the global relation must be mapped to the fragments

Reconstruction: It must always be possible to reconstruct each global relation from its fragments

Disjointness: If fragments are disjoint, then decisions about replication of data can be made somewhat separately from decisions about fragmentation


Horizontal Fragmentation

Create:
CREATE TABLE emp ( … ) PARTITION (
  [email protected] WHERE deptno = 10,
  [email protected] WHERE deptno = 20,
  [email protected] WHERE deptno = 30,
  [email protected] OTHERWISE )
// invented syntax loosely based on Oracle

The predicates defining all the fragments should be complete and mutually exclusive (or else there is replication)

Reconstruct:
SELECT * FROM [email protected] UNION
SELECT * FROM [email protected] UNION
SELECT * FROM [email protected] UNION
SELECT * FROM [email protected]
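A small Python sketch can make the "complete and mutually exclusive" requirement concrete. The rows, the fragment names (frag10, frag20, frag30, other) and the helper functions are assumptions for illustration; the predicates mirror the deptno conditions, with the last one playing the role of OTHERWISE.

```python
# Hypothetical emp rows: (empno, ename, deptno).
emp = [
    (7369, "SMITH", 20),
    (7499, "ALLEN", 30),
    (7782, "CLARK", 10),
    (7900, "JAMES", 40),
]

# One predicate per fragment; fragment names are made up for this sketch.
predicates = {
    "frag10": lambda row: row[2] == 10,
    "frag20": lambda row: row[2] == 20,
    "frag30": lambda row: row[2] == 30,
    "other": lambda row: row[2] not in (10, 20, 30),
}


def fragment(rows, preds):
    """Assign each row to every fragment whose predicate it satisfies."""
    return {name: [r for r in rows if p(r)] for name, p in preds.items()}


def reconstruct(frags):
    """Reconstruction by union (set union removes duplicates, like UNION)."""
    out = set()
    for rows in frags.values():
        out |= set(rows)
    return out


frags = fragment(emp, predicates)
# Complete: the union of the fragments gives back every row.
complete = reconstruct(frags) == set(emp)
# Disjoint: given completeness, no row was stored twice.
disjoint = sum(len(rows) for rows in frags.values()) == len(emp)
```

If two predicates overlapped, the disjointness check would fail, which corresponds to the "or else there is replication" remark.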


Fragmentation Transparency

SELECT ename, job FROM emp
WHERE sal > 50000

Implement as

SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000
UNION
SELECT ename, job FROM [email protected] WHERE sal > 50000

Integrate decomposed queries via union


Fragmentation Transparency for Updates

UPDATE emp SET deptno = 30 WHERE empno = 6749;

Implementing this update requires tuple migration

Implement as

// assumes you know deptno is currently 20;
// much more complicated otherwise

SELECT * INTO anEmp FROM [email protected] WHERE empno = 6749;

DELETE FROM [email protected] WHERE empno = 6749;

INSERT INTO [email protected] VALUES ( 6749, anEmp.ename, anEmp.job, anEmp.mgr, anEmp.hiredate, anEmp.sal, anEmp.comm, 30 );
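The migration steps can be tried out with two in-memory SQLite databases standing in for the sites that hold the deptno = 20 and deptno = 30 fragments. The reduced column list, the sample row and the migrate helper are assumptions for this sketch.

```python
import sqlite3

COLUMNS = "(empno INTEGER PRIMARY KEY, ename TEXT, sal INTEGER, deptno INTEGER)"

# Two in-memory databases stand in for the sites holding the deptno = 20 and
# deptno = 30 fragments (the reduced column list is an assumption).
site20 = sqlite3.connect(":memory:")
site30 = sqlite3.connect(":memory:")
for site in (site20, site30):
    site.execute(f"CREATE TABLE emp {COLUMNS}")
site20.execute("INSERT INTO emp VALUES (6749, 'FORD', 3000, 20)")


def migrate(empno, src, dst, new_deptno):
    """Move one row between fragments, changing deptno on the way."""
    row = src.execute(
        "SELECT empno, ename, sal FROM emp WHERE empno = ?", (empno,)
    ).fetchone()
    if row is None:
        raise LookupError(f"empno {empno} is not in the source fragment")
    src.execute("DELETE FROM emp WHERE empno = ?", (empno,))
    dst.execute("INSERT INTO emp VALUES (?, ?, ?, ?)", (*row, new_deptno))
    # A real system must make the DELETE and INSERT atomic across both sites;
    # here the two commits are independent, which is only safe in a sketch.
    src.commit()
    dst.commit()


migrate(6749, site20, site30, 30)
left = site20.execute("SELECT count(*) FROM emp").fetchone()[0]
moved = site30.execute("SELECT * FROM emp").fetchall()
```

The two independent commits at the end are the weak point: making them atomic is the distributed commit problem.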


Vertical Fragmentation

Create:
CREATE TABLE emp ( empno int primary key, … )
  PARTITION (
    ename, job, mgr, deptno AS [email protected],
    hiredate AS [email protected],
    sal, comm AS [email protected] )
// invented syntax loosely based on Oracle

The columns defining all the fragments should be complete and mutually exclusive. All fragments automatically include the primary key empno to match up rows (or use some other mechanism to match ROWIDs)

Reconstruct:
SELECT empno, i.ename, i.job, i.mgr, h.hiredate, a.sal, a.comm, i.deptno
FROM [email protected] i
  NATURAL JOIN [email protected] h
  NATURAL JOIN [email protected] a
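Reconstruction by join can be sketched in Python with rows as dictionaries. The sample rows, the fragment names (info, hire, acct), the reduced column set and both helper functions are assumptions. The merge below matches rows on empno, which gives the same result as the natural join as long as every fragment contains every key.

```python
# Hypothetical emp rows keyed by column name.
emp = [
    {"empno": 7369, "ename": "SMITH", "hiredate": "1980-12-17", "sal": 800},
    {"empno": 7499, "ename": "ALLEN", "hiredate": "1981-02-20", "sal": 1600},
]

# Column groups for each vertical fragment; fragment names are assumptions.
groups = {
    "info": ["ename"],
    "hire": ["hiredate"],
    "acct": ["sal"],
}


def split_vertically(rows, column_groups, key="empno"):
    """Project each row onto every column group, always keeping the key."""
    return {
        name: [{key: r[key], **{c: r[c] for c in cols}} for r in rows]
        for name, cols in column_groups.items()
    }


def join_on_key(frags, key="empno"):
    """Reconstruction by join: merge fragment rows that share a key value."""
    merged = {}
    for rows in frags.values():
        for r in rows:
            merged.setdefault(r[key], {}).update(r)
    return [merged[k] for k in sorted(merged)]


frags = split_vertically(emp, groups)
rebuilt = join_on_key(frags)
```

Because each fragment carries empno, an update to sal touches only the acct fragment and no row has to move, which is why vertical fragmentation needs no tuple migration.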


Hybrid Fragmentation

CREATE TABLE emp ( empno int primary key, … ) PARTITION (
  ename, job, mgr, deptno AS (
    [email protected] WHERE deptno = 10,
    [email protected] WHERE deptno = 20,
    [email protected] WHERE deptno = 30,
    [email protected] OTHERWISE ),
  hiredate AS [email protected],
  sal, comm AS [email protected] )
// invented syntax loosely based on Oracle


Data Placement Revisited

Company HQ in Des Moines
Warehouses in SF, NY, Denver

Cust( custid, addr, whse )
whse is 'SF', 'NY', or 'Denver'

A. Place Cust at Des Moines
B. Partition Cust by whse: SfCust@SF, NyCust@NY, DenverCust@Denver
C. Leave Cust at Des Moines and also partition as SfCust@SF, NyCust@NY & DenverCust@Denver

How would you decide?


Database Design Problem

Hard Optimization Problem (even w/o considering replication)
– Fragmentation: How to fragment tables
– Allocation/Placement: Where to place tables and fragments

Relative to minimizing/maximizing some cost function, e.g.
– minimize query response time
– maximize throughput
– must be approximate, since determining actual query plan is a separate optimization problem

Subject to constraints, e.g.
– Available storage, bandwidth, processing power, …
– Keep 90% of response time below X


Optimization Approach

Factors to Consider

The originating site(s) of queries/updates

Which attributes are accessed together

Which attributes & combinations of selection predicates are used from which sites, with which frequencies

Frequencies of updates that affect combinations of selection predicates

Data integration costs (costs of joins and unions for fragments) vs increase in parallelism

Costs of communication, concurrency control, security & integrity maintenance



Query processing
Based on algorithms that analyze queries and convert them into a series of data manipulation operations.

The problem
Deciding a strategy for executing each query over the network in the most cost effective way, however the cost is defined.

Main factors
• I/O, CPU, Communication costs
• Opportunity for pipelining & parallel operations


Distributed Query Example

Given tables
• emp( empno, ename, deptno, sal, … ) at site S1 (largest)
• project( pno, pname, mgr, … ) at site S2
• dept( deptno, dname, loc ) at site S3 (smallest)

[Figure: emp at S1, proj at S2, dept at S3]


Sub-Querying & Shipping

Queries are executed via a combination of computing queries and shipping data.

For example, suppose we want to execute a query to find out the name of each project, along with its project manager & the name of that manager's department

SELECT pname, ename, dname
FROM project p, emp e, dept d
WHERE p.mgr = e.empno AND e.deptno = d.deptno


Consider Cost-based Alternatives

Alternative 1
Ship dept & project to S1
Process query at S1

Alternative 2
Ship emp & project to S3
Process query at S3

[Figure: for each alternative, arrows showing the tables shipped among emp at S1, proj at S2 and dept at S3]

Which one is better?


Evaluating Alternatives

Alternative 1
Ship dept & project to S1
Process query at S1

Alternative 2
Ship emp & project to S3
Process query at S3

In general, alternative #1 is better, because it involves shipping less information

But to really determine the best approach, you must consider
– Communication costs to S1 vs S3 (what if slow line between S2 & S1)
– Relative processing speeds and scheduling algorithms at S1 vs S3
– Size of result & location of coordinator

[Figure: the two alternatives, with arrows showing the tables shipped among emp at S1, proj at S2 and dept at S3]
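The comparison of the two alternatives can be made concrete with a rough cost sketch. All numbers are invented: the slides only say that emp is the largest table and dept the smallest, so the byte sizes and link bandwidths below are hypothetical, as are the function names.

```python
# Hypothetical table sizes in bytes; the slides only say emp is the largest
# table and dept the smallest.
SIZE = {"emp": 10_000_000, "project": 1_000_000, "dept": 10_000}
HOME = {"emp": "S1", "project": "S2", "dept": "S3"}


def bytes_shipped(processing_site):
    """Total size of the tables that must be shipped to the processing site."""
    return sum(SIZE[t] for t, site in HOME.items() if site != processing_site)


def transfer_time(processing_site, bandwidth):
    """Time to ship sequentially, given bytes/second for each (src, dst) link."""
    return sum(
        SIZE[t] / bandwidth[(site, processing_site)]
        for t, site in HOME.items()
        if site != processing_site
    )


alt1 = bytes_shipped("S1")   # ship dept & project to S1
alt2 = bytes_shipped("S3")   # ship emp & project to S3

# A slow S2 -> S1 line can reverse the ranking that raw volume suggests.
links = {("S3", "S1"): 1e6, ("S2", "S1"): 1e3, ("S1", "S3"): 1e6, ("S2", "S3"): 1e6}
t1 = transfer_time("S1", links)
t2 = transfer_time("S3", links)
```

By volume alone alternative 1 ships far less, but with the assumed slow line between S2 and S1 the estimated transfer time favors alternative 2. Processing speed and the location of the coordinator are left out of this sketch.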


Intermixing Querying & Shipping

Rather than shipping base tables and performing a single query, it may make sense to
– do a query at one site
– ship the query results to another site
– do a query at that site, joining the results received with data available at that site

In general, a distributed query plan involves a (potentially lengthy) sequence of performing queries and shipping data (either base tables or query results)


Distributed Query Planning Example

For example, suppose we are only interested in projects where the project manager makes more than 8000/month. For those projects, we want the name of the project, the name of the project manager & the name of that manager's department.

Process
SELECT pname, ename, dname
FROM project p, emp e, dept d
WHERE p.mgr = e.empno AND e.deptno = d.deptno
AND e.sal > 8000

[Figure: emp at S1, proj at S2, dept at S3]

If there are not very many employees who make > 8000, what's the best plan for executing this query?


Restrict before Ship

At S1, COMPUTE emplet AS
SELECT empno, ename, deptno FROM emp
WHERE sal > 8000

SHIP emplet FROM S1 TO S2
SHIP dept FROM S3 TO S2

At S2, COMPUTE
SELECT ename, dname, pname
FROM emplet e, dept d, project p
WHERE p.mgr = e.empno
AND e.deptno = d.deptno

[Figure: emplet shipped from emp at S1, and dept shipped from S3, to proj at S2]
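The effect of restricting before shipping can be seen in a toy Python version of this plan. The table contents are invented, "shipping" is modeled as simply handing a list to the code that plays S2, and row counts stand in for communication cost.

```python
# Hypothetical contents of the three sites.
emp = [  # at S1: (empno, ename, deptno, sal)
    (1, "KING", 10, 9000),
    (2, "BLAKE", 20, 4000),
    (3, "CLARK", 10, 2500),
]
project = [("Apollo", 1), ("Gemini", 2)]        # at S2: (pname, mgr)
dept = [(10, "ACCOUNTING"), (20, "RESEARCH")]   # at S3: (deptno, dname)

# At S1: restrict (sal > 8000) and project away unused columns before shipping.
emplet = [(empno, ename, deptno) for empno, ename, deptno, sal in emp if sal > 8000]

# "Shipping" is modeled as just handing the lists to S2; count the rows sent.
rows_shipped = len(emplet) + len(dept)
rows_if_unrestricted = len(emp) + len(dept)

# At S2: join the shipped data with the local project table.
result = [
    (ename, dname, pname)
    for empno, ename, e_deptno in emplet
    for d_deptno, dname in dept
    for pname, mgr in project
    if mgr == empno and e_deptno == d_deptno
]
```

Note that emplet has to keep deptno, since the join with dept at S2 still needs it.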


Semijoins

A semijoin is

• a join between two (or more) tables where

• one of the tables is just used to restrict the result, but not provide any data

Example

List the names of employees whose departments are located in NY

SELECT e.ename FROM emp e, dept d
WHERE e.deptno = d.deptno AND d.loc = 'NY'

All the result data comes from the emp table

The dept table is joined with emp simply to restrict the tuples chosen from the emp table


Using Semijoins in Distributed Queries

[Figure: data Da at site Sa and data Db at site Sb, with arrows numbered 1 to 3 for the steps below]

1) Some data (generally the result of a query) is shipped from site Sa to site Sb

2) The shipped data is used in a semijoin with the data at Sb. This produces a subset of the data at Sb, restricted based on the data shipped from Sa

3) The result of the semijoin is shipped back to Sa, where it is combined with data already there
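The three steps can be sketched in Python. The data, the column names and the semijoin helper are assumptions for illustration; shipping is modeled as passing lists between the code that plays Sa and the code that plays Sb.

```python
def semijoin(local_rows, local_key, shipped_keys):
    """Keep the local rows whose join key appears in the shipped key set."""
    keys = set(shipped_keys)
    return [row for row in local_rows if row[local_key] in keys]


# Hypothetical data: Da lives at site Sa, Db at site Sb.
da = [{"empno": 1, "ename": "KING"}, {"empno": 4, "ename": "FORD"}]
db = [
    {"pname": "Apollo", "mgr": 1},
    {"pname": "Gemini", "mgr": 2},
    {"pname": "Mercury", "mgr": 4},
]

# 1) Sa ships only the join-column values to Sb.
shipped = [row["empno"] for row in da]
# 2) Sb semijoins its data with the shipped values.
reduced = semijoin(db, "mgr", shipped)
# 3) Sb ships the reduced rows back; Sa joins them with its own data.
by_empno = {row["empno"]: row for row in da}
joined = [(r["pname"], by_empno[r["mgr"]]["ename"]) for r in reduced]
```

Only the key values travel from Sa to Sb, and only the matching rows travel back, so the approach pays off when the shipped keys are small and the semijoin is selective.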

If S1 is the coordinator (where the results must end up), how can semijoins be used to produce a more efficient solution to the project manager query?


Using Semijoins

At S1, COMPUTE emplet AS
SELECT empno, ename, deptno
FROM emp
WHERE sal > 8000

At S1, COMPUTE empl AS
SELECT empno FROM emplet

SHIP empl FROM S1 TO S2

At S2, COMPUTE emproj AS
SELECT mgr AS pmgr, pname
FROM project, empl
WHERE mgr = empno
ORDER BY pmgr

SHIP emproj FROM S2 TO S1

At S3, COMPUTE deptlet AS
SELECT deptno, dname
FROM dept

SHIP deptlet FROM S3 TO S1

At S1, COMPUTE
SELECT pname, ename, dname
FROM emplet e, deptlet d, emproj p
WHERE e.deptno = d.deptno AND e.empno = p.pmgr

Shipping empl to S2 limits the tuples from proj to be sent back to S1

[Figure: emp at S1, proj at S2, dept at S3, with the shipping steps numbered 1 to 4]


Planning Alternatives

Result-Based or Stream-Based
– Result-Based: A site waits until it receives the entire result set shipped to it before it can use it in a query
– Stream-Based: A query at a site will use data streamed to it as it arrives from another site (also called pipelining)

Sequential or Parallel
– Sequential: A site ships data to (or requests data from) one other site at a time
– Parallel: A site can ship data to (or request data from) multiple sites in parallel
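The difference between result-based and stream-based processing shows up in the order of events. In this sketch a Python generator stands in for a remote site; the function names and the three-row result are assumptions, and no real network is involved.

```python
def remote_rows(events):
    """Hypothetical remote site: yields rows one at a time, logging each send."""
    for n in range(3):
        events.append(f"send {n}")
        yield n


def result_based(events):
    rows = list(remote_rows(events))      # wait for the entire result set
    for row in rows:
        events.append(f"use {row}")


def stream_based(events):
    for row in remote_rows(events):       # consume each row as it arrives
        events.append(f"use {row}")


batch_log, stream_log = [], []
result_based(batch_log)
stream_based(stream_log)
```

In batch_log every send precedes the first use, while in stream_log sends and uses alternate, so the first row is processed as soon as it arrives.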


Streaming & Pipelining

At S3, COMPUTE deptlet AS
SELECT deptno, dname
FROM dept

SHIP deptlet FROM S3 TO S1

At S1, COMPUTE empdept AS
SELECT empno, ename, dname
FROM emp, deptlet
WHERE emp.deptno = deptlet.deptno
AND sal > 8000
ORDER BY empno

STREAM empdept FROM S1 TO S2

At S2, COMPUTE
SELECT p.pname, ed.ename, ed.dname
FROM project p, empdept ed
WHERE p.mgr = ed.empno

[Figure: deptlet shipped from dept at S3 to emp at S1; empdept streamed from S1 to proj at S2]

When would this approach be useful?


Parallelism & Streaming

Do in parallel

Branch through S2:

At S1, COMPUTE empl AS
SELECT empno
FROM emp
WHERE sal > 8000
ORDER BY empno

STREAM empl FROM S1 TO S2

At S2, COMPUTE eproj AS
SELECT mgr AS pmgr, pname
FROM project p, empl e
WHERE e.empno = p.mgr

STREAM eproj FROM S2 TO S1

Branch through S3:

At S1, COMPUTE dempl AS
SELECT DISTINCT deptno
FROM emp
WHERE sal > 8000

STREAM dempl FROM S1 TO S3

At S3, COMPUTE deptlet AS
SELECT d.deptno, dname
FROM dept d, dempl e
WHERE d.deptno = e.deptno

STREAM deptlet FROM S3 TO S1

Then combine the two streams:

At S1, COMPUTE
SELECT pname, ename, dname
FROM emp e, eproj p, deptlet d
WHERE e.deptno = d.deptno AND e.empno = p.pmgr

[Figure: emp at S1 streams empl to proj at S2 and dempl to dept at S3; S2 and S3 stream eproj and deptlet back to S1]


What's Best

Informally, we've talked about how query planning finds the best way to process the query, involving
• subqueries
• shipping/streaming
• parallel execution

But when we say "best", what do we actually mean?


Possible Query Plan Goals

Fastest complete result

Fastest first result

Minimize resource usage of specific resources

Combination of the above


Query Optimization

Build initial tree for query
– Build tree reflecting relational algebra corresponding to query
– Modify tree to account for fragmentation (more complex if distributed fragments overlap)
– Incorporate simplest ship operations into tree for accessing remote data

Perform global query optimization
– Apply transformation operators that produce an equivalent tree
– Account for pipelining & parallelism as well
– Use heuristic search algorithm (e.g. hill climbing, simulated annealing, genetic algorithms) to find best distributed query plan considering replicas
– Use cost function incorporating time taken by I/O, CPU & communication (best if statistics on size of relations & result sets are maintained)


Global vs Local Query Optimization

Global Optimization produces
– A set of decomposed queries to be sent to various DB servers
– Combined with ship/stream instructions
– All placed in a parallel/sequential control flow graph

Local Optimization
– Each local server determines best way to execute each decomposed query sent to it (though global optimization may generate preliminary plans)
