parallel databases pres
TRANSCRIPT
-
8/8/2019 Parallel Databases Pres
1/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallel Databases
-
8/8/2019 Parallel Databases Pres
2/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Introduction
A variety of hardware architectures allow multiple
computers to share and access to data. A parallel
database can allow access to a single database by
users on multiple machines, with increased
performance.
-
8/8/2019 Parallel Databases Pres
3/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Purpose
Databases are growing increasingly large
Large volumes of transaction data are collected and stored for
later analysis.
Multimedia objects like images are increasingly stored indatabases
-
8/8/2019 Parallel Databases Pres
4/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Large-scale parallel database
Large-scale parallel database systems increasingly
used for:
storing large volumes of data
providing high throughput for transaction processing
-
8/8/2019 Parallel Databases Pres
5/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Queries
Queries are expressed in high level language (SQL)
makes parallelization easier.
Different queries can be run in parallel with each other.Concurrency control takes care of conflicts.
Thus, databases naturally provide themselves to
parallelism.
-
8/8/2019 Parallel Databases Pres
6/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallelism in Databases
Data can be partitioned across multiple disks for
parallel I/O.
Individual relational operations
(e.g., sort, join, aggregation)
can be executed in parallel
database.
-
8/8/2019 Parallel Databases Pres
7/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallel Database Architectures
-
8/8/2019 Parallel Databases Pres
8/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallel Database Architectures
Shared memory
Shared disk
Shared nothing
Hierarchical
-
8/8/2019 Parallel Databases Pres
9/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallel Database Architectures
Shared memory -- processors share a common
memory (Extremely efficient communication between
processors).
Shared disk -- processors share a common disk
(Shared-disk systems can scale to a somewhat larger
number of processors, but communication between
processors is slower.)
-
8/8/2019 Parallel Databases Pres
10/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Shared nothing -- Each processor has exclusive
access to its main memory.
Hierarchical -- hybrid of the above architectures(Combines characteristics of shared-memory, shared-
disk, and shared-nothing architectures.)
-
8/8/2019 Parallel Databases Pres
11/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Parallel Level
A coarse-grain parallel machine
consists of a small number of
powerful processors
-
8/8/2019 Parallel Databases Pres
12/19
04/25/2005 Yan Huang - CSCI5330 DatabaseImplementation Parallel Database
Massively Parallel
A massively parallel orfine grain parallel machineutilizes thousands of smaller processors.
-
8/8/2019 Parallel Databases Pres
13/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Parallel System Performance Measure
Speedup: = small system elapsed time
large system elapsed time
Scaleup: = small system small problem elapsed time
big system big problem elapsed time
-
8/8/2019 Parallel Databases Pres
14/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Database Performance Measures
throughput --- the number of tasks that can becompleted in a given time interval.
response time --- the amount of time it takes to
complete a single task from the time it is submitted.
-
8/8/2019 Parallel Databases Pres
15/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Factors Limiting Speedup and Scaleup
Speedup and scaleup are often sublinear due to:
Startup costs: Cost of starting up multiple processes
may dominate computation time, if the degree of
parallelism is high.
Interference: Processes accessing shared resources(e.g.,system bus, disks, or locks) compete with each
other
Skew: Overall execution time determined by slowest of
parallely executing tasks.
-
8/8/2019 Parallel Databases Pres
16/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Parallel Database Issues
Data Partitioning
Parallel Query Processing
-
8/8/2019 Parallel Databases Pres
17/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
I/O Parallelism
Horizontal partitioning tuples of a relation are dividedamong many disks such that each tuple resides on one
disk.
Partitioning techniques (number of disks = n):
Round-robin Hash partitioning
Range partitioning
-
8/8/2019 Parallel Databases Pres
18/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Typical Database Query Types
Sequential scan
Point query
Range query
-
8/8/2019 Parallel Databases Pres
19/19
04/25/2005 Yan Huang - CSCI5330 Database
Implementation Parallel Database
Comparison of Partitioning Techniques
Round
Robin
Hashing Range
Sequential
Scan
Best/good
parallelism
Good Good
Point Query Difficult Good for hash key Good for range
vector
Range Query Difficult Difficult Good for range
vector