hash-based algorithms for operator load-balancing in database middleware systems

1
www.walsaip.uprm.edu W ALSAIP WALSAIP Supported By Hash-Based Algorithms For Operator Load-Balancing In Database Middleware Systems Angel L. Villalain-Garcia – M.S. Student Prof. Manuel Rodriguez – Advisor ADM Group, ECE Department, University of Puerto Rico, Email: [email protected] Mayagüez Campus Scientific applications are becoming highly dependant on Wide Area Network environments to access data and computing resources. Traditional database middleware systems have succeeded on integrating heterogeneous data sources but fail on scaling to WANs because they are focused on centralized architectures with static characteristics. Our goal is to design a distributed database middleware system that takes into consideration the dynamic characteristics of each host on the WAN to efficiently run queries on it. We call this system NetTraveler Our goal with NetTraveler is to design an efficient query execution environment for distributed and parallel query execution to improve response time based on the characteristics of the hosts on the WAN. Project Description 1 WAN Scenario 2 INTERNET Mobile D evices R elational D BM S (O racle,PostgreSQ L, etc) XM L R epositories O therD ata R epositories W ireless Sensors LAN Query Service Broker (QSB) Coordination of the Execution Process. •Interface exposed to the client •P2P behavior Information Gateway (IG) Interface for DB access Registration Server (RS) Coordinates federations and acts as a Catalog Manager. Data Synchronization Server (DSS) Acts as a data source when needed. Used for recovery of Used for recovery of queries. queries. Data Processing Server (DPS) Interface for grid services and sensors Oracle XM L Agenda Inform ation Gatew ay Inform ation Gatew ay Inform ation Gatew ay Q uery Service B roker R egistration Server DSS D ata Processing Server LAN W AN NetTraveler Architecture 3 Proposed Work 4 Traditional DBMS Plan Parser Semantic Analyzer Logical O ptimizer Physical Optim izer Q uery Plan Execution Centralized Query Optimizer Distributed and Parallel DBMS Plan Parser Semantic Analyzer Logical O ptimizer Physical Optimizer Q uery Plan Execution Physical Optimizer Q uery Plan Execution Decentralized Query Optimizer References 8 1. David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, and Rick Rasmussen. “The gamma database machine project”. pages 609–626, 1994. 2. Sumit Ganguly, Waqar Hasan, and Ravi Krishnamurthy. Query optimization for parallel execution. In SIGMOD ’92: Proceedings of the 1992 ACM SIGMOD international conference on Management of data, pages 9–18, New York, NY, USA, 1992. ACM Press. 3. Wei Hong and Michael Stonebraker. Optimization of parallel query execution plans in xprs. In PDIS ’91: Proceedings of the first international conference on Parallel and distributed information systems, pages 218–225, Los Alamitos, CA, USA, 1991. IEEE Computer Society Press. 4. Manuel Rodriguez-Martinez, Nick Roussopoulos, “MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources”, In Proceeding of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 213-224, May 2000. 5. Waqar Hasan,Daniela Florescu, Patrick Valduriez, Open Issues in Parallel Query Optimization,SIGMOD Rec., Vol. 25 No. 3 pp. 171-179, September 1996. Methodology 5 Use of different Java technology for implementing our proposed architecture: •Web Services, Axis •Data access using JDBC, Hibernate Future Work & Results 7 •Study Scheduling Effect and improvements •Hash Join operators and functionalities •Second Phase Optimizer Conventional DBMS are usually modelled as a centralized architecture with several pipelined steps. One important step is the Query Optimizer (QO). A QO is the process that determines the needed actions to solve a query. Image below represents the most important aspects of a QO and its output. Due to several reasons as cited on [5] centralized QO fails to scale to WAN environment. With that in mind we proposed the idea of a decentralized QO that could create plans that exploits parallel and distributed execution on WAN environment (see image below). Both the query optimizer and the query parallelization step would need information of the available resources on each host. Our basic assumption is that data sources are replicated, we just need to select the better host for each step of the query execution. The result is explained on the next figure. Q QSB 1 QSB 2 QSB 4 QSB 3 IG 3 IG 2 DSS 1 Client IG 1 R Q 1 Q 2 Q 3 Q 1 Q 2 Q 3 R 1 R 1 R 2 R 3 R 2 R 3 R Project Status 6 Currently the system provides an environment for remote query execution and the orchestration of the several services to bring up the result for the client in the way explain by the graphic shown below. Q QSB 1 QSB 2 QSB 4 QSB 3 IG 3 IG 2 DSS 1 Client IG 1 R Q Q Q R R R R Actual systems support for parallel query execution of operations and static distribution of the load across participating sites.

Upload: paxton

Post on 06-Jan-2016

42 views

Category:

Documents


1 download

DESCRIPTION

Client. Client. DSS 1. DSS 1. Q. Q. R. R. R. R. QSB 1. QSB 1. Q 1. Q 3. R 2. Q 2. R 3. R 1. QSB 2. QSB 2. QSB 3. QSB 3. QSB 4. QSB 4. 6. Q. Q 1. Q. Q 2. Q 3. Q. Project Status. R. R 1. R 2. R. R 3. R. Centralized Query Optimizer. Traditional DBMS Plan. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Hash-Based Algorithms For Operator Load-Balancing In Database Middleware Systems

www.walsaip.uprm.eduWALSAIP

WALSAIPSupported By

Hash-Based Algorithms For Operator Load-Balancing In Database Middleware Systems

Angel L. Villalain-Garcia – M.S. Student Prof. Manuel Rodriguez – AdvisorADM Group, ECE Department, University of Puerto Rico, Email: [email protected] Mayagüez Campus

Scientific applications are becoming highly dependant on Wide Area Network environments to access data and computing resources. Traditional database middleware systems have succeeded on integrating heterogeneous data sources but fail on scaling to WANs because they are focused on centralized architectures with static characteristics.

Our goal is to design a distributed database middleware system that takes into consideration the dynamic characteristics of each host on the WAN to efficiently run queries on it. We call this system NetTraveler

Our goal with NetTraveler is to design an efficient query execution environment for distributed and parallel query execution to improve response time based on the characteristics of the hosts on the WAN.

Project Description1

WAN Scenario2

INTERNET

Mobile Devices

Relational DBMS(Oracle, PostgreSQL,

etc)

XML Repositories

Other Data Repositories

Wireless Sensors

LAN

Query Service Broker (QSB)

Coordination of the Execution Process.•Interface exposed to the client•P2P behavior

Information Gateway (IG)

Interface for DB access

Registration Server (RS)

Coordinates federations and acts as a Catalog Manager.

Data Synchronization Server (DSS)

Acts as a data source when needed.•Used for recovery of queries.Used for recovery of queries.

Data Processing Server (DPS)

Interface for grid services and sensors

Oracle XML Agenda

InformationGateway

InformationGateway

InformationGateway

QueryServiceBroker

RegistrationServer

DSS

DataProcessing

Server

LAN

WAN

NetTraveler Architecture3

Proposed Work4

Traditional DBMS Plan

Parser

Semantic Analyzer

Logical Optimizer

Physical Optimizer

Query Plan Execution

Centralized Query Optimizer

Distributed and Parallel DBMS Plan

Parser

Semantic Analyzer

Logical Optimizer

Physical Optimizer

Query Plan Execution

Physical Optimizer

Query Plan Execution

Decentralized Query Optimizer

References8

1. David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, and Rick Rasmussen. “The gamma database machine project”. pages 609–626, 1994.

2. Sumit Ganguly, Waqar Hasan, and Ravi Krishnamurthy. Query optimization for parallel execution. In SIGMOD ’92: Proceedings of the 1992 ACM SIGMOD international conference on Management of data, pages 9–18, New York, NY, USA, 1992. ACM Press.

3. Wei Hong and Michael Stonebraker. Optimization of parallel query execution plans in xprs. In PDIS ’91: Proceedings of the first international conference on Parallel and distributed information systems, pages 218–225, Los Alamitos, CA, USA, 1991. IEEE Computer Society Press.

4. Manuel Rodriguez-Martinez, Nick Roussopoulos, “MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources”, In Proceeding of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 213-224, May 2000.

5. Waqar Hasan,Daniela Florescu, Patrick Valduriez, Open Issues in Parallel Query Optimization,SIGMOD Rec., Vol. 25 No. 3 pp. 171-179, September 1996.

Methodology5Use of different Java technology for implementing our proposed architecture:

•Web Services, Axis•Data access using JDBC, Hibernate

Future Work & Results7•Study Scheduling Effect and improvements•Hash Join operators and functionalities•Second Phase Optimizer

Conventional DBMS are usually modelled as a centralized architecture with several pipelined steps. One important step is the Query Optimizer (QO). A QO is the process that determines the needed actions to solve a query. Image below represents the most important aspects of a QO and its output.

Due to several reasons as cited on [5] centralized QO fails to scale to WAN environment. With that in mind we proposed the idea of a decentralized QO that could create plans that exploits parallel and distributed execution on WAN environment (see image below).

Both the query optimizer and the query parallelization step would need information of the available resources on each host.

Our basic assumption is that data sources are replicated, we just need to select the better host for each step of the query execution. The result is explained on the next figure.

Q

QSB1

QSB2 QSB4QSB3

IG3IG2

DSS1

Client

IG1

R

Q1

Q2

Q3

Q1 Q2 Q3

R1

R1 R2 R3

R2R3

R

Project Status6Currently the system provides an environment for remote query execution and the orchestration of the several services to bring up the result for the client in the way explain by the graphic shown below.

Q

QSB1

QSB2 QSB4QSB3

IG3IG2

DSS1

Client

IG1

R

Q Q QR R R

R

Actual systems support for parallel query execution of operations and static distribution of the load across participating sites.