query processing in distributed database

24
Distributed Query Optimization Bayu Adhi Tama, ST, MTI [email protected]

Upload: btama

Post on 26-Nov-2015

76 views

Category:

Documents


4 download

DESCRIPTION

Adapted from:Connolly, T and Carolyn Begg. Database Systems: A Practical Approach to Design, Implementation, and Management 4th Ed, Pearson Education Limited, 2005.

TRANSCRIPT

Page 1: Query Processing in Distributed Database

Distributed Query Optimization

Bayu Adhi Tama, ST, [email protected]

Page 2: Query Processing in Distributed Database

Distributed Query Optimization

Page 3: Query Processing in Distributed Database

Distributed Query Optimization

• Query decomposition: takes query expressed on global relations and performs partial optimization using centralized QO techniques. Output is some form of RAT based on global relations.

• Data localization: takes into account how data has been distributed. Replace global relations at leaves of RAT with their reconstruction algorithms.

Page 4: Query Processing in Distributed Database

Distributed Query Optimization

• Global optimization: uses statistical information to find a near-optimal execution plan. Output is execution strategy based on fragments with communication primitives added.

• Local optimization: Each local DBMS performs its own local optimization using centralized QO techniques.

Page 5: Query Processing in Distributed Database

Data Localization

• In QP, represent query as R.A.T. and, using transformation rules, restructure tree into equivalent form that improves processing.

• In DQP, need to consider data distribution. • Replace global relations at leaves of tree with

their reconstruction algorithms - RA operations that reconstruct global relations from fragments:– For horizontal fragmentation, reconstruction algorithm is

Union; – For vertical fragmentation, it is Join.

Page 6: Query Processing in Distributed Database

Data Localization

• Then use reduction techniques to generate simpler and optimized query.

• Consider reduction techniques for following types of fragmentation:– Primary horizontal fragmentation.– Vertical fragmentation.– Derived fragmentation.

Page 7: Query Processing in Distributed Database

Reduction for Primary Horizontal Fragmentation

• If selection predicate contradicts definition of fragment, this produces empty intermediate relation and operations can be eliminated.

• For join, commute join with union.• Then examine each individual join to

determine whether there are any useless joins that can be eliminated from result.

• A useless join exists if fragment predicates do not overlap.

Page 8: Query Processing in Distributed Database

Example 23.2 Reduction for PHF

SELECT *FROM Branch b, PropertyForRent pWHERE b.branchNo = p.branchNo AND p.type = ‘Flat’;

P1: branchNo=‘B003’ type=‘House’ (PropertyForRent)

P2:branchNo=‘B003’ type=‘Flat’ (PropertyForRent)

P3:branchNo!=‘B003’ (PropertyForRent)

B1: branchNo=‘B003’ (Branch)

B2: branchNo!=‘B003’ (Branch)

Page 9: Query Processing in Distributed Database

Example 23.2 Reduction for PHF

Page 10: Query Processing in Distributed Database

Example 23.2 Reduction for PHF

Page 11: Query Processing in Distributed Database

Example 23.2 Reduction for PHF

Page 12: Query Processing in Distributed Database

Reduction for Vertical Fragmentation

• Reduction for vertical fragmentation involves removing those vertical fragments that have no attributes in common with projection attributes, except the key of the relation.

Page 13: Query Processing in Distributed Database

Example 23.3 Reduction for Vertical Fragmentation

SELECT fName, lNameFROM Staff;

S1: staffNo, position, sex, DOB, salary(Staff)

S2: staffNo, fName, lName, branchNo (Staff)

Page 14: Query Processing in Distributed Database

Example 23.3 Reduction for Vertical Fragmentation

Page 15: Query Processing in Distributed Database

Reduction for Derived Fragmentation

• Use transformation rule that allows join and union to be commuted.

• Using knowledge that fragmentation for one relation is based on the other and, in commuting, some of the partial joins should be redundant.

Page 16: Query Processing in Distributed Database

Example 23.4 Reduction for Derived Fragmentation

SELECT *FROM Branch b, Client cWHERE b.branchNo = c.branchNo AND

b.branchNo = ‘B003’;

B1 = branchNo=‘B003’ (Branch)B2 = branchNo!=‘B003’ (Branch)Ci = Client branchNo Bi i = 1, 2

Page 17: Query Processing in Distributed Database

Example 23.4 Reduction for Derived Fragmentation

Page 18: Query Processing in Distributed Database

Global Optimization

• Objective of this layer is to take the reduced query plan for the data localization layer and find a near-optimal execution strategy.

• In distributed environment, speed of network has to be considered when comparing strategies.

• If know topology is that of WAN, could ignore all costs other than network costs.

• LAN typically much faster than WAN, but still slower than disk access.

Page 19: Query Processing in Distributed Database

Oracle’s DDBMS Functionality

• Oracle does not support type of fragmentation discussed previously, although DBA can distribute data to achieve similar effect.

• Thus, fragmentation transparency is not supported although location transparency is.

• Discuss:– connectivity– global database names and database links– transactions– referential integrity– heterogeneous distributed databases– Distributed QO.

Page 20: Query Processing in Distributed Database

Database Links

• Used to build distributed databases.• Defines a communication path from one Oracle

database to another (possibly non-Oracle) database.

• Acts as a type of remote login to remote database.

CREATE PUBLIC DATABASE LINKRENTALS.GLASGOW.NORTH.COM;

SELECT * FROM [email protected];UPDATE [email protected] salary = salary*1.05;

Page 21: Query Processing in Distributed Database

Heterogeneous Distributed Databases

• Here one of the local DBMSs is not Oracle.• Oracle Heterogeneous Services and a non-

Oracle system-specific agent can hide distribution and heterogeneity.

• Can be accessed through:– transparent gateways– generic connectivity.

Page 22: Query Processing in Distributed Database

Transparent Gateways

Page 23: Query Processing in Distributed Database

Generic Connectivity

Page 24: Query Processing in Distributed Database

Oracle Distributed Query Optimization

• A distributed query is decomposed by the local Oracle DBMS into a number of remote queries, which are sent to remote DBMS for execution.

• Remote DBMSs execute queries and send results back to local node.

• Local node then performs any necessary postprocessing and returns results to user.

• Only necessary data from remote tables are extracted, thereby reducing amount of data that needs to be transferred.