database ppt

Parallel and Distributed Databases • CS263 Lecture 16

Upload: sandeep-dubey

Post on 24-Apr-2015


Category:

Education



DESCRIPTION

Database Presentation

TRANSCRIPT

Page 1: Database ppt

Parallel and Distributed Databases

• CS263 Lecture 16

Page 2: Database ppt
Page 3: Database ppt

LECTURE PLAN

Parallel DBMS - What and Why?

What is a Client/Server DBMS?

Why do we need Distributed DBMSs?

Date’s rules for a Distributed DBMS

Benefits of a Distributed DBMS

Issues associated with a Distributed DBMS

Disadvantages of a Distributed DBMS

Page 4: Database ppt

PARALLEL DATABASE SYSTEM

Page 5: Database ppt

PARALLEL DBMSs: WHY DO WE NEED THEM?

• More and More Data!

We have databases that hold very large amounts of data, on the order of 10^12 bytes:

1,000,000,000,000 bytes!

• Faster and Faster Access!

We have data applications that need to process data at very high speeds:

10,000s of transactions per second!

SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!

Page 6: Database ppt

PARALLEL DBMSs: BENEFITS OF A PARALLEL DBMS

INTERQUERY PARALLELISM

It is possible to process a number of transactions in parallel with each other.

Improves Throughput.

INTRAQUERY PARALLELISM

It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.

Improves Response Time.
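As a rough illustration of intraquery parallelism, the sketch below splits one scan into sub-tasks, runs them on a pool of workers and combines the partial results. The names `scan_partition` and `parallel_scan` are hypothetical, and Python threads only show the structure of the idea; they do not give true CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Sub-task: count the matching rows in one partition."""
    return sum(1 for row in partition if predicate(row))


def parallel_scan(rows, predicate, workers=4):
    """Split one scan into sub-tasks and combine the partial counts."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(
            lambda part: scan_partition(part, predicate), partitions
        )
        return sum(partial_counts)


# Hypothetical data: 10,000 integer "records", counting the even ones.
records = list(range(10_000))
even_count = parallel_scan(records, lambda r: r % 2 == 0)
```

Interquery parallelism would instead hand whole, independent transactions to the workers, which is why it raises throughput rather than shortening any single transaction.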

Page 7: Database ppt

PARALLEL DBMSs: HOW TO MEASURE THE BENEFITS

Speed-Up.

As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:

10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs

Scale-Up.

As you multiply resources by a certain factor, the size of a task that can be executed in a given time should increase by the same factor:

1 second to scan a DB of 1,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
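The two measures can be written as simple ratios. The sketch below uses the commonly quoted definitions (speed-up as the ratio of elapsed times for the same task, scale-up as the ratio of elapsed times when the task grows with the system); the function names are illustrative and the inputs are the figures from the slide.

```python
def speed_up(time_on_small_system, time_on_large_system):
    """Same task, more resources: how many times faster it runs."""
    return time_on_small_system / time_on_large_system


def scale_up(time_small_task_small_system, time_large_task_large_system):
    """Task and resources grow together: 1.0 means linear scale-up."""
    return time_small_task_small_system / time_large_task_large_system


# Slide figures: 10 s on 1 CPU versus 1 s on 10 CPUs, same 10,000 records.
observed_speed_up = speed_up(10.0, 1.0)

# Slide figures: 1,000 records on 1 CPU and 10,000 records on 10 CPUs,
# both taking 1 s.
observed_scale_up = scale_up(1.0, 1.0)
```

A speed-up equal to the resource factor (10 here) and a scale-up of 1.0 are the linear, ideal cases; smaller values indicate sub-linear behaviour.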

Page 8: Database ppt

PARALLEL DBMSs: SPEED-UP

[Figure: number of transactions/second plotted against number of CPUs, comparing linear speed-up (ideal) with sub-linear speed-up. Labelled points: 5 CPUs, 10 CPUs, 16 CPUs; 1000/sec, 2000/sec, 1600/sec.]

Page 9: Database ppt

PARALLEL DBMSs: SCALE-UP

[Figure: number of transactions/second plotted against number of CPUs and database size, comparing linear scale-up (ideal) with sub-linear scale-up. Labelled points: 5 CPUs with a 1 GB database, 10 CPUs with a 2 GB database; 1000/sec, 900/sec.]

Page 10: Database ppt

Shared Memory – Parallel Database Architecture

[Figure: six CPUs sharing a single MEMORY.]

Page 11: Database ppt

Shared Disk – Parallel Database Architecture

[Figure: six CPUs, each with its own memory (M).]

Page 12: Database ppt

Shared Nothing – Parallel Database Architecture

[Figure: five CPUs, each paired with its own memory (M).]

Page 13: Database ppt

MAINFRAME DATABASE SYSTEM

Page 14: Database ppt

[Figure: dumb terminals linked by a specialised network connection to a mainframe computer. Labels: presentation logic, business logic, data logic.]

Page 15: Database ppt

CLIENT/SERVER DATABASE SYSTEM

Page 16: Database ppt

CLIENT/SERVER DBMS: CLIENT PROCESS

Manages user interface

Accepts user data

Processes application/business logic

Generates database requests (SQL)

Transmits database requests to server

Receives results from server

Formats results according to application logic

Presents results to the user

Page 17: Database ppt

CLIENT/SERVER DBMS: SERVER PROCESS

Accepts database requests

Processes database requests

Performs integrity checks

Handles concurrent access

Optimises queries

Performs security checks

Enacts recovery routines

Transmits result of database request to client

Page 18: Database ppt

CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)

[Figure: clients #1, #2 and #3 exchange data requests and data responses with the database server. Labels: presentation logic, business logic, data logic.]

Page 19: Database ppt

CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)

[Figure: clients #1, #2 and #3 exchange data requests and data responses with the database server. Labels: presentation logic, business logic, data logic, PL/SQL.]

Page 20: Database ppt

DISTRIBUTED PROCESSING ARCHITECTURE

[Figure: clients on LANs at four sites (Leyton, Stratford, Barking, Leytonstone) connected over a wide area network to a single DBMS.]

Page 21: Database ppt

DISTRIBUTED DATABASE SYSTEM

Page 22: Database ppt

DISTRIBUTED DATABASES: WHAT IS A DISTRIBUTED DATABASE?

A distributed database system is a collection of logically related databases that co-operate in a transparent manner.

‘Transparent’ implies that each user within the system may access all of the data within all of the databases as if they were a single database.

There should be ‘location independence’: because the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.
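Location independence can be pictured with a toy model in which users name only a table and an internal catalog decides which site to read from. Everything here (the `DistributedDatabase` class, its methods, the sample table) is a hypothetical sketch, not how any particular DDBMS is built.

```python
class DistributedDatabase:
    """Toy model: users query by table name only, never by site."""

    def __init__(self):
        self._sites = {}    # site name -> {table name -> list of rows}
        self._catalog = {}  # table name -> site currently holding it

    def place(self, table, site, rows):
        """Store a table at a site and record its location."""
        self._sites.setdefault(site, {})[table] = list(rows)
        self._catalog[table] = site

    def move(self, table, new_site):
        """Relocate a table; only the catalog entry changes for users."""
        old_site = self._catalog[table]
        rows = self._sites[old_site].pop(table)
        self.place(table, new_site, rows)

    def select(self, table):
        """Read a table without the caller knowing where it lives."""
        return list(self._sites[self._catalog[table]][table])


db = DistributedDatabase()
db.place("account", "Stratford", [(200, "JONES"), (456, "KHAN")])
before = db.select("account")
db.move("account", "Barking")
after = db.select("account")  # same call, same answer, different site
```

The caller's `select("account")` is identical before and after the move, which is the property the slide calls location independence.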

Page 23: Database ppt

DISTRIBUTED DATABASE ARCHITECTURE

[Figure: four sites (Leytonstone, Stratford, Barking, Leyton), each with its own DBMS and local clients, connected by a wide area network.]

Page 24: Database ppt

M:N CLIENT/SERVER DBMS ARCHITECTURE

[Figure: clients #1, #2 and #3 connected to database servers #1 and #2.]

NOT TRANSPARENT!

Page 25: Database ppt

COMPONENTS OF A DDBMS

[Figure: Site 1 and Site 2 connected by a computer network. Components shown: GSC, DDBMS, DC, LDBMS, DB.]

LDBMS = Local DBMS
DC = Data Communications
GSC = Global Systems Catalog
DDBMS = Distributed DBMS

Page 26: Database ppt

DISTRIBUTED DATABASES: ADVANTAGES

• Reduced Communication Overhead

Most data access is local, which is less expensive and performs better.

• Improved Processing Power

Instead of one server handling the full database, we now have a collection of machines handling the same database.

• Removal of Reliance on a Central Site

If a server fails, the only part of the system affected is the relevant local site. The rest of the system remains functional and available.

Page 27: Database ppt

DISTRIBUTED DATABASES: ADVANTAGES

• Expandability

It is easier to accommodate growth in the size of the global (logical) database.

• Local autonomy

The database is brought nearer to its users. This can effect a cultural change, as it allows potentially greater control over local data.

Page 28: Database ppt

DISTRIBUTED DATABASES: DATE’S TWELVE RULES FOR A DDBMS

A distributed system looks exactly like a non-distributed system to the user!

– Local autonomy
– No reliance on a central site
– Continuous operation
– Location independence
– Fragmentation independence
– Replication independence
– Distributed query processing
– Distributed transaction processing
– Hardware independence
– Operating system independence
– Network independence
– Database independence

Page 29: Database ppt

DISTRIBUTED DATABASES: ISSUES

Data Allocation

Data Fragmentation

Distributed Catalogue Management

Distributed Transactions

Distributed Queries – (see chapter 20)

Page 30: Database ppt

DISTRIBUTED DATABASES: DATA ALLOCATION METRICS

Locality of reference: Is the data near the sites that need it?

Reliability and availability: Does the strategy improve fault tolerance and accessibility?

Performance: Does the strategy result in bottlenecks or under-utilisation of resources?

Storage costs: How does the strategy affect the availability and cost of data storage?

Communication costs: How much network traffic will result from the strategy?

Page 31: Database ppt

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES

CENTRALISED

Locality of Reference: Lowest
Reliability/Availability: Lowest
Storage Costs: Lowest
Performance: Unsatisfactory
Communication Costs: Highest

Page 32: Database ppt

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES

PARTITIONED/FRAGMENTED

Locality of Reference: High
Reliability/Availability: Low (item) – High (system)
Storage Costs: Lowest
Performance: Satisfactory
Communication Costs: Low

Page 33: Database ppt

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES

COMPLETE REPLICATION

Locality of Reference: Highest
Reliability/Availability: Highest
Storage Costs: Highest
Performance: High
Communication Costs: High (update) – Low (read)

Page 34: Database ppt

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES

SELECTIVE REPLICATION

Locality of Reference: High
Reliability/Availability: Low (item) – High (system)
Storage Costs: Average
Performance: Satisfactory
Communication Costs: Low

Page 35: Database ppt

DISTRIBUTED DATABASES: WHY FRAGMENT DATA?

Usage: Applications are usually interested in ‘views’, not whole relations.

Efficiency: It is more efficient if data is close to where it is frequently used.

Parallelism: It is possible to run several ‘sub-queries’ in parallel.

Security: Data not required by local applications is not stored at the local site.

Page 36: Database ppt

DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION

ACCOUNT  CUSTOMER  BRANCH     BALANCE
200      JONES     STRATFORD  1000.00
324      GRAY      BARKING     200.00
345      SMITH     STRATFORD    23.17
350      GREEN     BARKING     340.14
400      ONO       BARKING     500.00
456      KHAN      STRATFORD   333.00

Horizontal Fragmentation: Consists of a Restriction on a Relation.

e.g., σ branch = ‘Stratford’ (Account)

Page 37: Database ppt

DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION

STRATFORD BRANCH

ACCT NO.  CUSTOMER  BRANCH     BALANCE
200       JONES     STRATFORD  1000.00
345       SMITH     STRATFORD    23.17
456       KHAN      STRATFORD   333.00

BARKING BRANCH

ACCT NO.  CUSTOMER  BRANCH   BALANCE
324       GRAY      BARKING   200.00
350       GREEN     BARKING   340.14
400       ONO       BARKING   500.00
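The restriction that produces the Stratford and Barking fragments can be sketched in a few lines. The rows are the Account data from the slides; the helper name `horizontal_fragment` and the dictionary layout are assumptions made for the example.

```python
# Account relation from the slides, one dictionary per row.
ACCOUNT = [
    {"account": 200, "customer": "JONES", "branch": "STRATFORD", "balance": 1000.00},
    {"account": 324, "customer": "GRAY", "branch": "BARKING", "balance": 200.00},
    {"account": 345, "customer": "SMITH", "branch": "STRATFORD", "balance": 23.17},
    {"account": 350, "customer": "GREEN", "branch": "BARKING", "balance": 340.14},
    {"account": 400, "customer": "ONO", "branch": "BARKING", "balance": 500.00},
    {"account": 456, "customer": "KHAN", "branch": "STRATFORD", "balance": 333.00},
]


def horizontal_fragment(relation, predicate):
    """Restriction: keep whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


stratford = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "STRATFORD")
barking = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "BARKING")

# The original relation is recovered as the union of the fragments.
rebuilt = sorted(stratford + barking, key=lambda r: r["account"])
```

Because every row lands in exactly one fragment, the union of the fragments loses nothing and duplicates nothing.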

Page 38: Database ppt

DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION

S#   NAME   SITE       PHONE NO       LOGIN    PASSWORD
200  JONES  STRATFORD  0208-500-9000  JON200T  XXYY22
324  GRAY   BARKING    0208-545-7528  GRA324S  ZZEE56
456  KHAN   STRATFORD  0208-500-5821  KHA456T  KJTR78

Vertical Fragmentation: Consists of a Projection on a Relation.

e.g., Π S#, NAME, SITE, PHONE NO (Student)

Page 39: Database ppt

DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION

STUDENT ADMINISTRATION

S#   NAME   SITE       PHONE NO.
200  JONES  STRATFORD  0208-500-9000
324  GRAY   BARKING    0208-545-7528
456  KHAN   STRATFORD  0208-500-5821

NETWORK ADMINISTRATION

S#   LOGIN-ID  PASSWORD
200  JON200T   XXYY22
324  GRA324S   ZZEE56
456  KHA456T   KJTR78
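The projection that produces the two fragments, and the join on S# that puts them back together, can be sketched as follows. The rows come from the Student relation in the slides; the helper names `vertical_fragment` and `join_on_key` are hypothetical.

```python
# Student relation from the slides, one dictionary per row.
STUDENT = [
    {"S#": 200, "NAME": "JONES", "SITE": "STRATFORD",
     "PHONE NO": "0208-500-9000", "LOGIN": "JON200T", "PASSWORD": "XXYY22"},
    {"S#": 324, "NAME": "GRAY", "SITE": "BARKING",
     "PHONE NO": "0208-545-7528", "LOGIN": "GRA324S", "PASSWORD": "ZZEE56"},
    {"S#": 456, "NAME": "KHAN", "SITE": "STRATFORD",
     "PHONE NO": "0208-500-5821", "LOGIN": "KHA456T", "PASSWORD": "KJTR78"},
]


def vertical_fragment(relation, attributes):
    """Projection: keep only the named columns of every row."""
    return [{name: row[name] for name in attributes} for row in relation]


def join_on_key(left, right, key):
    """Rebuild rows by matching the two fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Both fragments keep the key S# so they can be joined again later.
student_admin = vertical_fragment(STUDENT, ["S#", "NAME", "SITE", "PHONE NO"])
network_admin = vertical_fragment(STUDENT, ["S#", "LOGIN", "PASSWORD"])
rebuilt = join_on_key(student_admin, network_admin, "S#")
```

Repeating the key in every vertical fragment is what makes the reconstruction possible; without S# in both fragments there would be no way to pair the rows.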

Page 40: Database ppt

DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT

• Centralised Global Catalog

One site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. This gives poor performance, creates a single point of failure and compromises site autonomy.

• Dispersed Catalog

There is no physical global catalog. Each time a remote data item is required, the catalogs from ALL other sites are examined for the item. This has severe performance penalties.

Page 41: Database ppt

DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT

• Replicated Global Catalog

Each site maintains its own global catalog. Although this greatly speeds up remote data location, it is very inefficient to maintain. Details of every data item added, changed or deleted locally have to be propagated to ALL other sites.

• Local-Master Catalog

Each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.

Page 42: Database ppt

DISTRIBUTED DATABASES: DISTRIBUTED TRANSACTIONS

[Figure: Stratford clients connect to the Stratford DBMS, which runs a global transaction as an atomic distributed transaction across the Stratford, Barking and Leyton DBMSs, each with its own database (Stratford DB, Barking DB, Leyton DB). Arrows (a), (b) and (c) mark the three parts of the transaction.]

Global Transaction

(a) Debit Stratford A/C £500
(b) Credit Barking A/C £350
(c) Credit Leyton A/C £150

Page 43: Database ppt

TWO-PHASE COMMIT (2PC) - OK

Page 44: Database ppt

TWO-PHASE COMMIT (2PC) - ABORT

‘Global Abort’
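The two outcomes can be sketched with a simplified coordinator: in phase one every site votes on its part of the transaction, and in phase two the coordinator commits everywhere only if all votes were yes. This is a minimal sketch under stated assumptions: the `Participant` class, the "no negative balance" voting rule and the opening balances are invented for the example, and logging, timeouts and failure recovery are left out.

```python
class Participant:
    """One site's DBMS, holding a single account balance."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self._pending = None

    def prepare(self, delta):
        """Phase 1: vote yes only if the change is valid locally."""
        if self.balance + delta < 0:  # assumed rule: no overdrafts
            return False
        self._pending = delta
        return True

    def commit(self):
        """Phase 2 (commit): make the prepared change permanent."""
        self.balance += self._pending
        self._pending = None

    def abort(self):
        """Phase 2 (abort): discard any prepared change."""
        self._pending = None


def two_phase_commit(changes):
    """changes: list of (participant, delta) pairs in one global transaction."""
    votes = [site.prepare(delta) for site, delta in changes]  # phase 1
    if all(votes):                                            # phase 2
        for site, _ in changes:
            site.commit()
        return "GLOBAL COMMIT"
    for site, _ in changes:
        site.abort()
    return "GLOBAL ABORT"


# Global transaction from the slides, with assumed opening balances.
stratford = Participant("Stratford", 1000)
barking = Participant("Barking", 0)
leyton = Participant("Leyton", 0)
outcome_ok = two_phase_commit([(stratford, -500), (barking, 350), (leyton, 150)])

# Same transfer from an account that cannot cover it: one "no" vote aborts all.
poor = Participant("Stratford", 100)
other = Participant("Barking", 0)
outcome_abort = two_phase_commit([(poor, -500), (other, 500)])
```

No site changes its balance until every site has voted, which is what keeps the distributed transaction atomic.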

Page 45: Database ppt

DISTRIBUTED DATABASES: DISADVANTAGES OF DDBMSs

Architectural complexity.

Cost.

Security.

Integrity control more difficult.

Lack of standards.

Lack of experience.

Database design more complex.