cse441w database systems - dr. ali r....
TRANSCRIPT
A. R. Hurson, 323 CS Building, hurson@mst.edu
Query Processing and Query Optimization in
Centralized Database Systems
Query processing is defined as the activities involved in parsing, validation, translation, optimization, and execution of a query.
The aims of query processing are to transform a query written in a high-level language such as SQL into a correct and efficient execution strategy expressed in a low-level language, and to execute that strategy to generate the result.
Database Systems
Query processing
Query processing involves three steps:
1. Parsing, validation, and translation
2. Optimization
3. Evaluation (execution)
Query processing
[Figure: query processing pipeline. Query -> Parser & Translator -> Internal Representation -> Optimizer (consults Statistics about data) -> Execution Plan -> Execution Engine (accesses the Database) -> Query Output]
Query processing: An Example
SELECT balance
FROM account
WHERE balance < 2500
Query processing: An Example
σ_{balance < 2500}(Π_{balance}(account))
or
Π_{balance}(σ_{balance < 2500}(account))
Note: there may be several ways to define and execute a query. It is the role of the optimizer to select an efficient one. The optimizer therefore needs to determine the different ways (plans) in which a query can be executed, estimate the execution cost of each plan, and then choose the most cost-effective plan for execution.
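As a minimal sketch of the point above, the two algebraic forms can be evaluated on a small in-memory relation to check that they return the same result. The sample rows, the extra attributes account_number and branch_name, and the helper names select and project are assumptions made for illustration; they do not come from the slides.

```python
# Toy relation: each tuple is a dict. All rows are hypothetical sample data.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 3400},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
    {"account_number": "A-217", "branch_name": "Brighton", "balance": 2500},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Relational projection: keep the named attributes and drop duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


def below_2500(t):
    return t["balance"] < 2500


# Plan 1: project first, then select.
plan1 = select(project(account, ["balance"]), below_2500)
# Plan 2: select first, then project.
plan2 = project(select(account, below_2500), ["balance"])

print(plan1 == plan2)
```

Both plans produce the same balances; they differ only in how many tuples and attributes each intermediate step has to handle, which is what the optimizer's cost estimate tries to capture.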
Query processing: An Example
Factors such as the number of disk accesses and CPU time must be taken into consideration to estimate the cost of a plan.
In large databases, however, disk accesses (the number of data block transfers) are usually the dominant cost factor. Hence, the number of block transfers can be used as a cost metric.
Query processing: An Example
To simplify the cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).
For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.
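The two levels of detail described above can be sketched as two small cost functions. The function names, the parameter names, and every number below (1,000 blocks, 100 microseconds per block transfer, 4,000 microseconds per seek) are hypothetical values chosen only to make the arithmetic visible.

```python
def simple_cost(blocks_transferred):
    """Simplified metric: every block transfer costs one unit."""
    return blocks_transferred


def refined_cost(blocks_transferred, seeks, transfer_time_us, seek_time_us):
    """Refined metric: charge each seek (random access) separately, in microseconds."""
    return blocks_transferred * transfer_time_us + seeks * seek_time_us


# Hypothetical relation of 1,000 blocks read in two different ways.
# Sequential scan: one seek to the first block, then 1,000 transfers.
sequential_scan = refined_cost(1000, seeks=1, transfer_time_us=100, seek_time_us=4000)
# Random reads: one seek before every block transfer.
random_reads = refined_cost(1000, seeks=1000, transfer_time_us=100, seek_time_us=4000)

print(simple_cost(1000), sequential_scan, random_reads)
```

Under the simplified metric both access patterns cost the same 1,000 units; the refined metric separates them, which is why distinguishing sequential from random I/O gives a more accurate estimate.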
Query processing: An Example
One also needs to distinguish between the number of data blocks being read and the number being written.
Techniques such as pipelining and parallelism can be applied to execute basic operations, if the underlying platform allows.
Different algorithms can be developed to execute basic operations.
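Pipelining can be illustrated with Python generators: each operator pulls one tuple at a time from its input, so no intermediate relation is materialized. This is only a sketch of the idea; the operator names scan, select, and project and the sample rows are assumptions, and the projection here keeps duplicates (as SQL does without DISTINCT).

```python
def scan(relation):
    """Leaf operator: yield the tuples of a stored relation one at a time."""
    for t in relation:
        yield t


def select(tuples, predicate):
    """Pass along only the tuples that satisfy the predicate."""
    for t in tuples:
        if predicate(t):
            yield t


def project(tuples, attributes):
    """Keep only the named attributes of each tuple (duplicates are kept)."""
    for t in tuples:
        yield {a: t[a] for a in attributes}


# Hypothetical sample data.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 3400},
    {"account_number": "A-215", "balance": 700},
]

# Nothing is computed until the pipeline is consumed by list().
pipeline = project(select(scan(account), lambda t: t["balance"] < 2500), ["balance"])
result = list(pipeline)
print(result)
```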
Query processing: An Example
[Figure: query tree for the example, with nodes account, Π_{balance}, and σ_{balance < 2500}]
Query optimization is the activity of choosing an efficient execution strategy for processing a query.
Query optimization can be done in two fashions: static or dynamic.
There are two choices in carrying out the first phases (i.e., parsing, validation, translation, and optimization) of query processing:
One option is to dynamically carry out the decomposition and optimization every time the query is run.
The alternative is static query optimization, where the query is parsed, validated, and optimized once.
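The difference between the two options can be sketched by counting how often the optimizer is invoked. Everything in this sketch is a stand-in: optimize does no real parsing or optimization, and the names run_dynamic, run_static, and plan_cache are hypothetical.

```python
stats = {"optimizer_calls": 0}


def optimize(query_text):
    """Stand-in for parsing, validation, translation, and optimization."""
    stats["optimizer_calls"] += 1
    return ("plan for", query_text)  # placeholder execution plan


def run_dynamic(query_text):
    """Dynamic: decompose and optimize again on every run."""
    return optimize(query_text)


plan_cache = {}


def run_static(query_text):
    """Static: optimize once, then reuse the stored plan."""
    if query_text not in plan_cache:
        plan_cache[query_text] = optimize(query_text)
    return plan_cache[query_text]


query = "SELECT balance FROM account WHERE balance < 2500"

for _ in range(3):
    run_dynamic(query)
dynamic_calls = stats["optimizer_calls"]

stats["optimizer_calls"] = 0
for _ in range(3):
    run_static(query)
static_calls = stats["optimizer_calls"]

print(dynamic_calls, static_calls)
```

The static variant pays the optimization cost once, while the dynamic variant pays it on every run but always works from the current state of the database.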
Query Optimization
In general, optimization is required in a relational database system if the system is expected to achieve acceptable objectives (e.g., performance).
It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.
Query Optimization
The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
An optimizer might actually do better than a human programmer, since:
Query Optimization
An optimizer has a wealth of information available to it that human programmers typically do not have.
If the database statistics change drastically, an optimizer may choose a different strategy.
An optimizer can potentially consider several strategies for a given request.
An optimizer is written by an expert.
Running Example
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)
Query Optimization: Running Example
Find the last names of employees born after 1957 and working on a project named "Aquarius".
SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'
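To try the running query, a minimal sketch using Python's built-in sqlite3 module follows. It creates only the columns the query touches, and all rows are hypothetical sample data rather than data from the slides. Dates are stored as ISO-format text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (Lname TEXT, Ssn TEXT PRIMARY KEY, Bdate TEXT);
    CREATE TABLE PROJECT  (Pname TEXT, Pnumber INTEGER PRIMARY KEY);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL);

    -- Hypothetical sample rows.
    INSERT INTO EMPLOYEE VALUES ('Smith',  '111', '1965-01-09'),
                                ('Wong',   '222', '1955-12-08'),
                                ('Zelaya', '333', '1968-07-19');
    INSERT INTO PROJECT  VALUES ('Aquarius', 1), ('Pisces', 2);
    INSERT INTO WORKS_ON VALUES ('111', 1, 10.0), ('222', 1, 20.0), ('333', 2, 15.0);
    """
)

rows = conn.execute(
    """
    SELECT Lname
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn
      AND Bdate > '1957-12-31'
    """
).fetchall()
last_names = [r[0] for r in rows]
conn.close()

print(last_names)
```

With this sample data only Smith qualifies: Wong works on Aquarius but was born before 1958, and Zelaya was born later but works on a different project.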
Query Optimization: Running Example
Initial query tree, written as an algebraic expression:
Π_{Lname}(σ_{Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'}((EMPLOYEE × WORKS_ON) × PROJECT))
Query Optimization: An Example
Executing the previous query tree generates a very large intermediate relation, because it performs Cartesian products on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products.
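A quick back-of-the-envelope calculation shows why this matters. The cardinalities and the number of tuples surviving each Select below are hypothetical; the slides give no table sizes.

```python
# Hypothetical cardinalities of the three base relations.
n_employee, n_works_on, n_project = 1000, 5000, 100

# Cartesian products first: every combination of tuples is generated.
product_first = n_employee * n_works_on * n_project

# Selects first: assume 300 employees were born after 1957 and exactly
# one project is named 'Aquarius' (both figures are assumptions).
selected_employees, selected_projects = 300, 1
selects_first = selected_employees * n_works_on * selected_projects

print(product_first, selects_first)
```

Under these assumed sizes, applying the Select operations first shrinks the product from 500,000,000 tuples to 1,500,000.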
Query Optimization: Running Example
Query tree after moving the Select operations down, written as an algebraic expression:
Π_{Lname}(σ_{Pnumber = Pno}(σ_{Essn = Ssn}(σ_{Bdate > '1957-12-31'}(EMPLOYEE) × WORKS_ON) × σ_{Pname = 'Aquarius'}(PROJECT)))
Query Optimization: Running Example
On closer observation, one should realize that just one tuple from PROJECT will be involved in the query. So it makes sense to switch the order of operations on the input relations.
Query Optimization: Running Example
Query tree after reordering the input relations, written as an algebraic expression:
Π_{Lname}(σ_{Essn = Ssn}(σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(PROJECT) × WORKS_ON) × σ_{Bdate > '1957-12-31'}(EMPLOYEE)))
Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation (where the Select compares attributes of the two inputs) with a Join operation.
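The equivalence behind this rewrite can be checked on toy data. The sketch below compares a Cartesian product followed by a Select against a hash join; the hash join is one possible join algorithm chosen for illustration and is not named in the slides, and the function names and sample rows are assumptions.

```python
from itertools import product


def product_then_select(left, right, left_key, right_key):
    """Select over a Cartesian product: build every pair, then filter."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if l[left_key] == r[right_key]
    ]


def hash_join(left, right, left_key, right_key):
    """Equijoin: index the left input by its key, then probe with the right."""
    index = {}
    for l in left:
        index.setdefault(l[left_key], []).append(l)
    return [{**l, **r} for r in right for l in index.get(r[right_key], [])]


# Hypothetical sample data.
projects = [{"Pname": "Aquarius", "Pnumber": 1}, {"Pname": "Pisces", "Pnumber": 2}]
works_on = [
    {"Essn": "111", "Pno": 1},
    {"Essn": "222", "Pno": 2},
    {"Essn": "333", "Pno": 1},
]


def by_essn(t):
    return t["Essn"]


via_product = sorted(product_then_select(projects, works_on, "Pnumber", "Pno"), key=by_essn)
via_join = sorted(hash_join(projects, works_on, "Pnumber", "Pno"), key=by_essn)

print(via_product == via_join)
```

Both produce the same tuples, but the product examines every pair of input tuples, while the hash join reads each input once.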
Query Optimization: Running Example
Query tree after replacing each Cartesian product and Select pair with a Join, written as an algebraic expression:
Π_{Lname}((σ_{Pname = 'Aquarius'}(PROJECT) ⋈_{Pnumber = Pno} WORKS_ON) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(EMPLOYEE))
Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping only the attributes that are needed for the correct execution of this query.
Query Optimization: Running Example
Query tree after moving the Project operations down, written as an algebraic expression:
Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname = 'Aquarius'}(PROJECT)) ⋈_{Pnumber = Pno} Π_{Essn, Pno}(WORKS_ON)) ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(EMPLOYEE)))
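To compare the first and last query trees of the running example, the sketch below evaluates both on a few hypothetical rows and reports the size of the largest intermediate result each one builds. The data, the function names initial_plan and final_plan, and the use of set-membership lookups to carry out the joins are all assumptions; the lookups suffice here only because just the join attributes survive the pushed-down projections.

```python
from itertools import product

# Hypothetical sample data, restricted to the attributes the query needs.
employee = [
    {"Lname": "Smith", "Ssn": "111", "Bdate": "1965-01-09"},
    {"Lname": "Wong", "Ssn": "222", "Bdate": "1955-12-08"},
    {"Lname": "Zelaya", "Ssn": "333", "Bdate": "1968-07-19"},
]
project_rel = [{"Pname": "Aquarius", "Pnumber": 1}, {"Pname": "Pisces", "Pnumber": 2}]
works_on = [{"Essn": "111", "Pno": 1}, {"Essn": "222", "Pno": 1}, {"Essn": "333", "Pno": 2}]


def initial_plan():
    """Cartesian products first, then one large Select, then Project."""
    triples = [{**e, **w, **p} for e, w, p in product(employee, works_on, project_rel)]
    kept = [
        t for t in triples
        if t["Pname"] == "Aquarius" and t["Pnumber"] == t["Pno"]
        and t["Essn"] == t["Ssn"] and t["Bdate"] > "1957-12-31"
    ]
    return sorted({t["Lname"] for t in kept}), len(triples)


def final_plan():
    """Selects and Projects pushed down; Joins replace the products."""
    # Project Pnumber from the selected PROJECT tuples.
    pnums = {p["Pnumber"] for p in project_rel if p["Pname"] == "Aquarius"}
    # Join with WORKS_ON on Pnumber = Pno, keeping only Essn.
    essns = {w["Essn"] for w in works_on if w["Pno"] in pnums}
    # Select on Bdate, then project Ssn and Lname from EMPLOYEE.
    young = [(e["Ssn"], e["Lname"]) for e in employee if e["Bdate"] > "1957-12-31"]
    # Join on Essn = Ssn, then project Lname.
    names = sorted({lname for ssn, lname in young if ssn in essns})
    return names, max(len(pnums), len(essns), len(young))


print(initial_plan())
print(final_plan())
```

Both plans return the same last names; on this toy data the initial plan builds 18 combined tuples, while the final plan never holds more than 2 tuples at any step.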
[Figure: the initial query tree and the final optimized query tree, repeated for comparison]
[Figure: phases of query processing. Query -> Query Decomposition (consults the System Catalog) -> Relational Algebra Expression -> Query Optimization (consults Database Statistics) -> Execution Plan -> Code Generation -> Runtime Execution (accesses the Database) -> Result]
Query Optimization: A Simple Example
S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...  ...  ...
Query processing is defined as the activitiesinvolved in parsing validation translationoptimization and execution of a query
The aims of query processing process are totransform a query written in a high-levellanguage SQL into a correct and efficientexecution strategy expressed in low-levellanguage and to execute the strategy togenerate the result
Database Systems
6
3
Query processingA query processing involves three stepsParsing validation and TranslationOptimizationEvaluation (execution)
Database Systems
4
Query processingQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Database Systems
5
Query processing An Example
Select balanceFrom accountWhere balance lt 2500
Database Systems
6
Query processing An Exampleσbalance lt2500 (Πbalance (account) )
orΠbalance(σbalance lt2500 (account) )
Note there might be different ways to define and execute a queryIt is the role of optimizer to select an efficient way to execute aquery Therefore the optimizer needs to determine differentways (plans) that one can execute a query determine theexecution cost of each plan and then choose the most costeffective plan for execution
Database Systems
7
Query processing An ExampleFactors such as number of accesses to the disks
and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the
number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
3
Query processingA query processing involves three stepsParsing validation and TranslationOptimizationEvaluation (execution)
Database Systems
4
Query processingQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Database Systems
5
Query processing An Example
Select balanceFrom accountWhere balance lt 2500
Database Systems
6
Query processing An Exampleσbalance lt2500 (Πbalance (account) )
orΠbalance(σbalance lt2500 (account) )
Note there might be different ways to define and execute a queryIt is the role of optimizer to select an efficient way to execute aquery Therefore the optimizer needs to determine differentways (plans) that one can execute a query determine theexecution cost of each plan and then choose the most costeffective plan for execution
Database Systems
7
Query processing An ExampleFactors such as number of accesses to the disks
and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the
number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
4
Query processingQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Database Systems
5
Query processing An Example
Select balanceFrom accountWhere balance lt 2500
Database Systems
6
Query processing An Exampleσbalance lt2500 (Πbalance (account) )
orΠbalance(σbalance lt2500 (account) )
Note there might be different ways to define and execute a queryIt is the role of optimizer to select an efficient way to execute aquery Therefore the optimizer needs to determine differentways (plans) that one can execute a query determine theexecution cost of each plan and then choose the most costeffective plan for execution
Database Systems
7
Query processing An ExampleFactors such as number of accesses to the disks
and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the
number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
5
Query processing An Example
Select balanceFrom accountWhere balance lt 2500
Database Systems
6
Query processing An Exampleσbalance lt2500 (Πbalance (account) )
orΠbalance(σbalance lt2500 (account) )
Note there might be different ways to define and execute a queryIt is the role of optimizer to select an efficient way to execute aquery Therefore the optimizer needs to determine differentways (plans) that one can execute a query determine theexecution cost of each plan and then choose the most costeffective plan for execution
Database Systems
7
Query processing An ExampleFactors such as number of accesses to the disks
and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the
number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Query Optimization: Running Example
[Query tree after reordering the input relations, written as an equivalent expression]
Π_{Lname}(
  σ_{Essn = Ssn}(
    σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(Project) × Works_on)
    × σ_{Bdate > '1957-12-31'}(Employee)))
Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.
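The following Python sketch contrasts the two formulations on tuples. A hash join is used here only as one possible join algorithm; the slides do not say which algorithm the engine uses. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict

def product_then_select(left, right, left_key, right_key):
    # Forms every pair, then filters: |left| * |right| comparisons.
    return [(l, r) for l in left for r in right if l[left_key] == r[right_key]]

def hash_join(left, right, left_key, right_key):
    # Build a hash table on one input, then probe it with the other,
    # so non-matching pairs are never formed.
    buckets = defaultdict(list)
    for l in left:
        buckets[l[left_key]].append(l)
    return [(l, r) for r in right for l in buckets.get(r[right_key], [])]

# Hypothetical sample data: the selected Project tuple and Works_on.
aquarius = [("Aquarius", 10)]                       # (Pname, Pnumber)
works_on = [("111", 10), ("222", 10), ("333", 20)]  # (Essn, Pno)

slow = product_then_select(aquarius, works_on, 1, 1)
fast = hash_join(aquarius, works_on, 1, 1)
```

Both calls produce the same two matching pairs. The difference is in the work done: the first examines every combination, while the second touches each input tuple once plus the matches it finds.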
Query Optimization: Running Example
[Query tree after replacing Cartesian product plus Select with Join, written as an equivalent expression]
Π_{Lname}(
  (σ_{Pname = 'Aquarius'}(Project) ⋈_{Pnumber = Pno} Works_on)
  ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(Employee))
Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.
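As a small illustration of keeping only the needed attributes, the Python sketch below selects the qualifying Employee tuples and then projects them onto Ssn and Lname, the only two attributes the rest of the plan uses. The helper names and the sample row are assumptions made for this example.

```python
def project(rows, attrs):
    """Keep only the named attributes of each row (rows are dicts)."""
    return [{a: row[a] for a in attrs} for row in rows]

# One hypothetical Employee tuple with all ten attributes of the schema.
employee = [
    {"Fname": "John", "Minit": "B", "Lname": "Smith", "Ssn": "111",
     "Bdate": "1965-01-09", "Address": "Houston", "Sex": "M",
     "Salary": 30000, "Super_ssn": "333", "Dno": 5},
]

def select_then_project(rows):
    # Select first (Bdate is still needed here), then drop unused attributes.
    young = [r for r in rows if r["Bdate"] > "1957-12-31"]
    return project(young, ["Ssn", "Lname"])
```

Each tuple passed up the tree now carries two values instead of ten. Note that the projection has to come after the selection on Bdate, since Bdate itself is discarded by it.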
Query Optimization: Running Example
[Query tree after moving the Project operations down, written as an equivalent expression]
Π_{Lname}(
  Π_{Essn}(
    Π_{Pnumber}(σ_{Pname = 'Aquarius'}(Project))
    ⋈_{Pnumber = Pno} Π_{Essn, Pno}(Works_on))
  ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(Employee)))
[Figure: the initial query tree and the final optimized query tree shown side by side for comparison]
[Figure: phases of query processing. Query -> Query Decomposition (uses System Catalog) -> Relational Algebra Expression -> Query Optimization (uses Database Statistics) -> Execution Plan -> Code Generation -> Runtime Execution (accesses Database) -> Result]
Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
7
Query processing An ExampleFactors such as number of accesses to the disks
and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the
number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
Query processing An ExampleTo simplify the cost estimation we can assume
that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to
distinguish the difference between sequentialIO and random IO as well
Database Systems
8
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
Query processing An ExampleOne also needs to distinguish between the
number of data blocks being read and writtenTechniques such as pipelining and parallelism
if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to
execute basic operations
Database Systems
9
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
10
Query processing An Example
Account
Πbalance
σbalance lt2500
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Database Systems
19
Query Optimization mdash Running Example
Database Systems
20
ΠLname
Works_on
times
Employee
σBdate gt lsquo1957-12-31rsquo
timesσEssn = Ssn
Project
σPname = lsquoAquariusrsquo
σPnumber = Pnno
Query Optimization mdash Running Example
By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations
Database Systems
21
Query Optimization mdash Running Example
Database Systems
22
ΠLname
times
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
σEssn = Ssn
Works_on
times
σPnumber = Pno
Query Optimization mdash Running Example
It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation
Database Systems
23
Query Optimization mdash Running Example
Database Systems
24
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquoWorks_on
Pnumber = Pno
Essn = Ssn
Query Optimization mdash Running Example
It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery
Database Systems
25
Query Optimization mdash Running Example
Database Systems
26
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
ΠLname
Employee
σBdate gt lsquo1957-12-31rsquo
Project
σPname = lsquoAquariusrsquo
Pnumber = Pno
Essn = Ssn
ΠEssnLnameΠSsn
Works_on
ΠEssnPnoΠPnumber
31
Database SystemsSystem CatalogQuery
Decomposition
Query Optimization Database Statics
Code Generation
Runtime Execution
Result
Database
Relational AlgebraExpression
Execution Plan
Query
28
Query Optimization mdash A Simple Example
S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens
SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull
SP
Database Systems
Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two
fashion Static or dynamic
Database Systems
15
There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing
One option is to dynamically carry out thedecomposition and optimization every time thequery is run
Alternative is static query optimization wherethe query is parsed validated and optimizedonce
16
Database Systems
13
Query OptimizationIn general optimization is required in such a
system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra
that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level
Database Systems
14
Query OptimizationThe overall goal of an optimization is to choose
an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a
human programmer since
Database Systems
15
Query OptimizationAn optimizer will have a wealth of information
available to it that human programmers typicallydo not haveIf the data base statistics changes drastically
then an optimizer may choose a differentstrategyOptimizer can potentially considers several
strategies for a given requestOptimizer is written by an expert
Database SystemsQuery Parser amp
TranslatorInternal
Representation
ExecutionPlan
QueryOutput
Optimizer
Statisticsabout data
ExecutionEngine
DATA BASE
Running Example
16
Database Systems
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
EMPLOYEE
DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation
DEPT_Location
Pname Pnumber Plocation Dnum
PROJECTEssn Pno Hours
WORKS_ON
DEPENDENTEssn Dependent_name Sex Bdate Relationship
Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working
on a project named ldquoAquariusrdquo
SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT
WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo
Database Systems
17
Query Optimization mdash Running Example
Database Systems
18
ΠLname
Employee Works_on
times Project
times
σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo
Query Optimization mdash An Example
Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations
on base relations before performing the Cartesianproducts
Query Optimization: Running Example
Query tree after moving the Select operations down:
Π Lname
  σ Pnumber = Pno
    ×
      σ Essn = Ssn
        ×
          σ Bdate > '1957-12-31'
            Employee
          Works_on
      σ Pname = 'Aquarius'
        Project
Query Optimization: Running Example
On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of operations on the input relations, so that the most restrictive Select is applied first.
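A rough way to see why the order matters is to add up the sizes of the intermediate Cartesian products under each order. The sketch below uses synthetic data and a hypothetical helper `cross_select`; the size of the products is used as a stand-in for cost, which is a simplification.

```python
from itertools import product


def cross_select(left, right, pred):
    """Cartesian product followed by a Select; also reports the product size."""
    pairs = [{**l, **r} for l, r in product(left, right)]
    return [row for row in pairs if pred(row)], len(pairs)


# Synthetic relations, invented for illustration.
employee = [{"Ssn": i, "Lname": f"E{i}", "Bdate": f"{1950 + i % 30}-06-15"}
            for i in range(100)]
works_on = [{"Essn": i, "Pno": i % 5} for i in range(100)]
project = [{"Pnumber": n, "Pname": name}
           for n, name in enumerate(["Aquarius", "P1", "P2", "P3", "P4"])]

young = [e for e in employee if e["Bdate"] > "1957-12-31"]
aquarius = [p for p in project if p["Pname"] == "Aquarius"]  # a single tuple

# Order 1: combine Employee with Works_on first, Project last.
step1, size1 = cross_select(young, works_on, lambda r: r["Essn"] == r["Ssn"])
result_employee_first, size2 = cross_select(
    step1, aquarius, lambda r: r["Pnumber"] == r["Pno"])
cost_employee_first = size1 + size2

# Order 2: combine Project with Works_on first, Employee last.
step1b, size1b = cross_select(aquarius, works_on, lambda r: r["Pnumber"] == r["Pno"])
result_project_first, size2b = cross_select(
    step1b, young, lambda r: r["Essn"] == r["Ssn"])
cost_project_first = size1b + size2b

names_a = sorted(r["Lname"] for r in result_employee_first)
names_b = sorted(r["Lname"] for r in result_project_first)
print(cost_employee_first, cost_project_first, names_a == names_b)
```

Starting from the single Aquarius tuple keeps the first product small, so the total number of intermediate tuples is lower even though the final answer is the same.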
Query Optimization: Running Example
Query tree after switching the order of the input relations:
Π Lname
  σ Essn = Ssn
    ×
      σ Pnumber = Pno
        ×
          σ Pname = 'Aquarius'
            Project
          Works_on
      σ Bdate > '1957-12-31'
        Employee
Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.
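The equivalence behind this rule can be checked on a small case. The sketch below compares a Cartesian product followed by a Select with a hash join, which is one possible join algorithm (the slides do not prescribe one); the function names and rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def product_then_select(left, right, left_key, right_key):
    """Build every pair, then keep the pairs whose keys match."""
    return [{**l, **r} for l, r in product(left, right)
            if l[left_key] == r[right_key]]


def hash_join(left, right, left_key, right_key):
    """Equijoin that never materializes the non-matching pairs."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[right_key]].append(r)      # build phase
    return [{**l, **r}                       # probe phase
            for l in left
            for r in buckets.get(l[left_key], [])]


# Hypothetical sample rows.
project = [{"Pname": "Aquarius", "Pnumber": 1}]
works_on = [{"Essn": "111", "Pno": 1},
            {"Essn": "222", "Pno": 2},
            {"Essn": "333", "Pno": 1}]

via_product = product_then_select(project, works_on, "Pnumber", "Pno")
via_join = hash_join(project, works_on, "Pnumber", "Pno")
print(via_product == via_join, len(via_join))
```

Both functions return the same tuples; the join simply avoids building the pairs that the Select would discard.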
Query Optimization: Running Example
Query tree after replacing each Cartesian product and Select pair with a Join:
Π Lname
  ⋈ Essn = Ssn
    ⋈ Pnumber = Pno
      σ Pname = 'Aquarius'
        Project
      Works_on
    σ Bdate > '1957-12-31'
      Employee
Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.
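A small Python sketch of this idea follows. The helpers `project`, `join`, and `values_carried` are hypothetical names, the rows are invented, and duplicates are not eliminated after a Project, which keeps the sketch short.

```python
def project(rows, attrs):
    """Relational Project: keep only the named attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


def join(left, right, left_key, right_key):
    """Simple nested-loop equijoin."""
    return [{**l, **r} for l in left for r in right
            if l[left_key] == r[right_key]]


def values_carried(rows):
    """Total number of attribute values held in a relation."""
    return sum(len(row) for row in rows)


# Hypothetical rows using the EMPLOYEE and WORKS_ON schemas.
employee = [
    {"Fname": "John", "Minit": "B", "Lname": "Smith", "Ssn": "111",
     "Bdate": "1965-01-09", "Address": "Houston", "Sex": "M",
     "Salary": 30000, "Super_ssn": "222", "Dno": 5},
    {"Fname": "Joyce", "Minit": "A", "Lname": "English", "Ssn": "333",
     "Bdate": "1972-07-31", "Address": "Houston", "Sex": "F",
     "Salary": 25000, "Super_ssn": "222", "Dno": 5},
]
works_on = [{"Essn": "111", "Pno": 10, "Hours": 5.0},
            {"Essn": "333", "Pno": 20, "Hours": 12.0}]

# Join on full tuples versus join on tuples trimmed to the needed attributes.
wide = join(works_on, employee, "Essn", "Ssn")
narrow = join(project(works_on, ["Essn"]),
              project(employee, ["Ssn", "Lname"]), "Essn", "Ssn")

lnames_wide = project(wide, ["Lname"])
lnames_narrow = project(narrow, ["Lname"])
print(values_carried(wide), values_carried(narrow), lnames_wide == lnames_narrow)
```

The final list of last names is identical, while the trimmed join carries far fewer attribute values through the intermediate result.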
Query Optimization: Running Example
Query tree after moving the Project operations down:
Π Lname
  ⋈ Essn = Ssn
    Π Essn
      ⋈ Pnumber = Pno
        Π Pnumber
          σ Pname = 'Aquarius'
            Project
        Π Essn, Pno
          Works_on
    Π Ssn, Lname
      σ Bdate > '1957-12-31'
        Employee
Summary: the initial query tree (two Cartesian products followed by a single Select) compared with the final optimized tree (Select and Project operations moved down, Cartesian products replaced by Joins).
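Figure: phases of query processing. The Query passes through Query Decomposition (using the System Catalog), which yields a Relational Algebra Expression; Query Optimization (using Database Statistics) yields an Execution Plan; Code Generation and Runtime Execution against the Database then produce the Result.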
Query Optimization: A Simple Example
S
S#  Sname  Status  City
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
...
There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization).
One option is dynamic query optimization: carry out the decomposition and optimization every time the query is run.
The alternative is static query optimization, where the query is parsed, validated, and optimized once.
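The difference between the two options can be sketched as a question of whether the plan is cached. The `QueryProcessor` class below is a hypothetical illustration, and its `_optimize` method is a placeholder that only counts how often it is called.

```python
class QueryProcessor:
    """Toy model of dynamic versus static optimization (illustrative only)."""

    def __init__(self, static: bool):
        self.static = static
        self.plan_cache = {}       # used only in static mode
        self.optimizer_calls = 0

    def _optimize(self, sql: str) -> str:
        # Placeholder for parsing, validation, translation, and optimization.
        self.optimizer_calls += 1
        return f"plan for: {sql}"

    def run(self, sql: str) -> str:
        if self.static:
            # Static: optimize once, then reuse the stored plan.
            if sql not in self.plan_cache:
                self.plan_cache[sql] = self._optimize(sql)
            return self.plan_cache[sql]
        # Dynamic: optimize again on every run.
        return self._optimize(sql)


query = "SELECT Lname FROM EMPLOYEE WHERE Bdate > '1957-12-31'"
dynamic = QueryProcessor(static=False)
static = QueryProcessor(static=True)
for _ in range(3):
    dynamic.run(query)
    static.run(query)
print(dynamic.optimizer_calls, static.optimizer_calls)
```

The dynamic processor pays the optimization cost on each run but always sees current statistics; the static one pays once, and its stored plan can become stale if the data changes.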
Query Optimization: A Simple Example
Get the names of suppliers who supply part P2:
    SELECT DISTINCT Sname
    FROM S, SP
    WHERE S.S# = SP.S#
    AND SP.P# = 'P2'
Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.
Query Optimization: A Simple Example
[Query tree for the unoptimized plan]
Π_{Sname}( σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}(S × SP) )
Tuples of S × SP with S.S# = SP.S# (partial listing):
S.S#  Sname  Status  City    SP.S#  P#  QTY
S1    Smith  20      London  S1     P1  300
S1    Smith  20      London  S1     P2  200
S1    Smith  20      London  S1     P3  400
S1    Smith  20      London  S1     P4  200
S1    Smith  20      London  S1     P5  100
S1    Smith  20      London  S1     P6  100
S2    Jones  10      Paris   S2     P1  300
S2    Jones  10      Paris   S2     P2  400
...
Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.
[Query tree for the optimized plan]
Π_{Sname}( (σ_{SP.P# = 'P2'}(SP)) ⋈_{S.S# = SP.S#} S )
Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then the second approach is clearly far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.
So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
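The comparison above can be reproduced with a few lines of arithmetic. The sketch below is only one possible accounting and not necessarily the one behind the slide's figures: it assumes the unoptimized plan reads every (S, SP) pair to form the product, writes the product out, and reads it back for the restriction, while the optimized plan scans SP once and S once. The function names are illustrative.

```python
# Tuple I/O estimates for the two plans; the accounting rules are assumptions.
CARD_S = 100        # tuples in S
CARD_SP = 10_000    # tuples in SP

def naive_plan_io(card_s: int, card_sp: int) -> int:
    """Product first, then restriction (assumed: product is written and re-read)."""
    product = card_s * card_sp
    build_reads = product      # every (S, SP) pair is read to form the product
    product_writes = product   # the product does not fit in memory, so it is written
    restrict_reads = product   # the restriction reads the product back
    return build_reads + product_writes + restrict_reads

def optimized_plan_io(card_s: int, card_sp: int) -> int:
    """Restriction first, then join (assumed: the 50 P2 tuples stay in memory)."""
    return card_sp + card_s    # scan SP once, then read S once

naive = naive_plan_io(CARD_S, CARD_SP)
optimized = optimized_plan_io(CARD_S, CARD_SP)
print(naive, optimized)
```

Under these assumptions the two totals agree with the "about 3,000,000" and "about 10,000" figures quoted above.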
Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.
Now we can build a query tree very easily:
Π_{Sname}( σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP) )
[Query tree, read bottom-up: S, SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result]
Optimization Process
Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))
and
(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
General rule: It is a good idea to perform the restriction before the join because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.
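The transformation above can be checked on a toy instance. The following is a minimal sketch, assuming relations are held as lists of dictionaries; the sample data and the helper names join and where are illustrative and not part of the course material.

```python
# Relations as lists of dicts; data and helper names are illustrative.
S = [{"S#": "S1", "Sname": "Smith"}, {"S#": "S2", "Sname": "Jones"}]
SP = [{"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"}, {"S#": "S2", "P#": "P2"}]

def join(a, b, key):
    """Equi-join on an attribute name shared by both relations."""
    return [{**x, **y} for x in a for y in b if x[key] == y[key]]

def where(rel, pred):
    """Restriction: keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def is_p2(t):
    return t["P#"] == "P2"

late = where(join(S, SP, "S#"), is_p2)     # (A Join B) WHERE restriction-on-B
early = join(S, where(SP, is_p2), "S#")    # A Join (B WHERE restriction-on-B)

# Same answer, but the early restriction feeds fewer tuples into the join.
print(late == early, len(SP), len(where(SP, is_p2)))
```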
Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
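The equivalence of the two WHERE conditions can be confirmed by enumerating all truth assignments. This is a small illustrative check; the function names are made up for the example.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)

# Exhaustive check over all 8 truth assignments of (p, q, r).
equivalent = all(
    original(*values) == conjunctive_normal_form(*values)
    for values in product([False, True], repeat=3)
)

def evaluate_conjuncts(conjuncts, row):
    """all() stops at the first conjunct that evaluates to False."""
    return all(conjunct(row) for conjunct in conjuncts)

print(equivalent)
```

The helper evaluate_conjuncts illustrates the point made above: once any conjunct is false, the remaining conjuncts need not be evaluated.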
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
General rule: A sequence of restrictions can be combined into a single restriction.
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
General rule: A sequence of projections can be combined into a single projection.
General rule: A restriction of a projection can be converted into a projection of a restriction.
Optimization Process
Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.
(SP Join P) [S#]
However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to
SP [S#]
Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.
Optimization Process
Rule 1: A conjunctive selection can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)
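Rules 1 to 3 can be spot-checked on a small relation. The sketch below assumes a relation is a list of dictionaries and that, for Rule 3, the outer attribute list is a subset of the inner one; the data and the helpers select and project are illustrative.

```python
E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 30, "c": "x"},
]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for t in rel:
        row = tuple((k, t[k]) for k in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

def theta1(t):
    return t["a"] >= 2

def theta2(t):
    return t["c"] == "x"

# Rule 1: one conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: the order of the selections does not matter.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["a"], project(["a", "b"], E)) == project(["a"], E)
print(rule1, rule2, rule3)
```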
Optimization Process
Rule 4: A selection combined with a Cartesian product is equivalent to a theta join.
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
Rule 5: The theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
For an attribute with i distinct values, assuming the values are uniformly distributed, the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) · |r(R)|
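As a worked illustration of the selectivity estimates above, the following sketch computes the expected number of matching tuples for an equality search. It assumes uniformly distributed values; the relation size of 10,000 tuples and the count of 50 distinct values are hypothetical numbers chosen for the example.

```python
def equality_selectivity(distinct_values: int) -> float:
    """Estimated selectivity of 'attribute = constant' under a uniform distribution."""
    return 1.0 / distinct_values

def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Estimated number of tuples that satisfy the equality search."""
    return cardinality / distinct_values

# Hypothetical relation with 10,000 tuples.
key_matches = estimated_matches(10_000, 10_000)   # key attribute: every value is distinct
dept_matches = estimated_matches(10_000, 50)      # attribute with 50 distinct values
print(key_matches, dept_matches)
```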
Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case).
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))
Optimization Process
Rule 10: Set union and set intersection are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Rule 11: Set union and set intersection are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Rule 12: The selection operation distributes over set union, set intersection, and set difference.
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
Rule 13: The projection operation distributes over set union.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general. For example, if E1 = {(1, a)} and E2 = {(1, b)}, then projecting E1 ∩ E2 on the first attribute gives the empty set, whereas the intersection of the two projections is {(1)}.
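The set-operation rules can be tried out with Python sets. The sketch below uses made-up relations of (a, b) tuples; it suggests that selection distributes over union, intersection, and difference, while projection can be pushed through a union but, on this data, not through an intersection.

```python
# Relations as sets of (a, b) tuples; the data is illustrative.
E1 = {(1, "x"), (2, "y")}
E2 = {(1, "z"), (2, "y")}

def select(pred, rel):
    return {t for t in rel if pred(t)}

def project_a(rel):
    """Projection on the first attribute only."""
    return {(t[0],) for t in rel}

def p(t):
    return t[0] == 1

sel_union = select(p, E1 | E2) == select(p, E1) | select(p, E2)
sel_inter = select(p, E1 & E2) == select(p, E1) & E2
sel_diff = select(p, E1 - E2) == select(p, E1) - E2
sel_union_one_side = select(p, E1 | E2) == select(p, E1) | E2   # expected to fail
proj_union = project_a(E1 | E2) == project_a(E1) | project_a(E2)
proj_inter = project_a(E1 & E2) == project_a(E1) & project_a(E2)  # fails on this data
print(sel_union, sel_inter, sel_diff, sel_union_one_side, proj_union, proj_inter)
```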
Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the existence of indexes or other access paths (to reduce I/O cost) and the physical clustering of records (to reduce I/O cost) come into play.
Optimization Process
So, in short, after scanning and parsing:
The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: each intermediate result is generated and stored as a relation.
Pipelined evaluation: several operations are combined so that the tuples produced by one operation are passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
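The two evaluation styles can be imitated with Python generators. This is only a sketch: the relations r and s, the join attribute k, and the helper names are assumptions made for the example. A generator yields one tuple at a time, which mirrors passing each join result directly to the projection.

```python
def join(r, s, key):
    """Generator: yields each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}

def project(rel, attrs):
    """Generator: consumes its input one tuple at a time."""
    for t in rel:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "p"}, {"k": 2, "a1": "q"}]
s = [{"k": 1, "a2": "u"}, {"k": 2, "a2": "v"}]

# Materialized: the whole join result is stored before the projection starts.
temp = list(join(r, s, "k"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined: the projection pulls tuples from the join one at a time.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
print(materialized == pipelined)
```

Both styles give the same answer; the pipelined version never holds the intermediate relation temp.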
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
    SELECT S.Sname
    FROM R, S
    WHERE R.Sid = S.Sid
    AND R.bid = 100 AND S.rating > 5
The query can be expressed as
Π_{Sname}( σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S) )
or, with the selections pushed below the join, as
Π_{Sname}( (σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)) )
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation. In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees for this query, annotated for pipelined execution. In the first tree, the selection σ_{bid = 100 ∧ rating > 5} and the projection Π_{Sname} are marked "on the fly". In the second tree, the projection Π_{Sname} is marked "on the fly".]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
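The two search methods can be sketched as follows, assuming the file is an in-memory list of records. The employees data and the function names are illustrative; a real system would probe disk blocks rather than list positions.

```python
def linear_search(records, attr, value):
    """Scan every record; works for unordered files and non-key attributes."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(records, attr, value):
    """Equality search on a key attribute; records must be sorted on attr."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if records[mid][attr] == value:
            return records[mid]
        if records[mid][attr] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Hypothetical file, ordered on the key attribute ssn.
employees = [
    {"ssn": 10, "name": "Smith", "dno": 5},
    {"ssn": 20, "name": "Jones", "dno": 4},
    {"ssn": 30, "name": "Wong", "dno": 5},
    {"ssn": 40, "name": "Blake", "dno": 1},
]
print(binary_search(employees, "ssn", 30), len(linear_search(employees, "dno", 5)))
```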
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent (or, for < and ≤, preceding) records in the file (note: in this case, the data is also sorted).
σ_{DNUMBER > 5}(DEPARTMENT)
Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the form
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the form
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
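Disjunctive selection by union of record pointers can be sketched with in-memory indexes. Here a "pointer" is simply a position in a list, the indexes are dictionaries built by a hypothetical build_index helper, and the employees data is made up.

```python
from collections import defaultdict

employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 40000},
    {"ssn": 4, "dno": 1, "salary": 55000},
]

def build_index(records, attr):
    """Map each attribute value to the set of record positions (pointers)."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        index[rec[attr]].add(pos)
    return index

dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# Selection with condition dno = 5 OR salary = 40000:
# union the pointer sets, then fetch each qualifying record exactly once.
pointers = dno_index.get(5, set()) | salary_index.get(40000, set())
result = [employees[pos] for pos in sorted(pointers)]
print([rec["ssn"] for rec in result])
```

Record 3 satisfies both conditions but is fetched only once, because the union is taken over pointers rather than over retrieved records.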
Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A = B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of the two relations.
The following algorithm performs the nested loop join operation:
    For each tr ∈ r do begin
        For each ts ∈ s do begin
            If tr.A Θ ts.B is true then add tr || ts to the result
        end
    end
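The nested loop algorithm translates almost line for line into Python. In this sketch tuples are Python tuples, a and b are the positions of the join attributes, and theta is any comparison function; these conventions and the sample data are assumptions made for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r[a] theta s[b] by exhaustive pairing."""
    result = []
    for tr in r:                          # outer loop
        for ts in s:                      # inner loop
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # concatenation tr || ts
    return result

r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y"), (1, "z")]

equi = nested_loop_join(r, s, 0, 0)                 # condition r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)    # condition r.A < s.B
print(equi, less)
```

Because the condition is just a function applied to every pair, the same loop handles any join condition, at the price of examining all pairs.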
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s, since every pair of tuples is examined.
In the best-case scenario, both relations fit into main memory, and hence we need only b_s + b_r block accesses.
If just one of the relations fits into main memory, then the cost is still b_s + b_r block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:
    For each block Br of r do begin
        For each block Bs of s do begin
            For each tr ∈ Br do begin
                For each ts ∈ Bs do begin
                    If tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve the block nested loop?
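The block access count can be observed directly by instrumenting the loops. In this sketch a relation is a list of blocks and a block is a list of tuples; the counter and the data are illustrative, and an equality condition is assumed for simplicity.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join that also counts how many blocks are read."""
    result, accesses = [], 0
    for block_r in r_blocks:
        accesses += 1                     # read one block of r
        for block_s in s_blocks:
            accesses += 1                 # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, accesses

r_blocks = [[(1,), (2,)], [(3,), (4,)]]                  # b_r = 2
s_blocks = [[(2, "x"), (4, "y")], [(5, "z")], [(1, "w")]]  # b_s = 3

result, accesses = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
print(sorted(result), accesses)
```

With b_r = 2 and b_s = 3 the counter matches b_r × b_s + b_r. Since b_r appears in both terms, the formula also suggests using the relation with fewer blocks as the outer relation.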
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
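A join through an access structure can be sketched with a dictionary standing in for the index on s.B. In a real system the index would already exist; here it is built inside the function only to keep the example self-contained, and the data is made up.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join that probes an index on s[b] instead of scanning s."""
    index = defaultdict(list)             # stand-in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                          # one record of r at a time
        for ts in index.get(tr[a], []):   # probe the access structure
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
joined = index_join(r, s, 0, 0)
print(joined)
```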
Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the values of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_s + b_r
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
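The merge step can be sketched with two list positions playing the role of the pointers. The sketch assumes both inputs are already sorted on the join attribute and that each group of equal-valued tuples fits in memory, as stated above; the data and names are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```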
Query Optimization: JOIN Operation
Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
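The hash join described above can be sketched with an in-memory table of buckets. The number of buckets, the use of Python's built-in hash, and the data are assumptions made for the example.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join: hash both inputs into shared buckets, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))   # bucket number -> (r tuples, s tuples)
    for tr in r:                              # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:            # same bucket does not imply same value
                    result.append(tr + ts)
    return result

r = [(1, "a"), (5, "b"), (2, "c")]
s = [(5, "x"), (1, "y"), (9, "z")]
hashed = sorted(hash_join(r, s, 0, 0))
print(hashed)
```

Values 1, 5, and 9 collide in the same bucket here, which is why the final equality test inside each bucket is still needed.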
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
A join with a complex join condition (i.e., a conjunctive or disjunctive condition) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
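Both duplicate-elimination strategies can be sketched in a few lines. Tuples are Python tuples and positions lists the attribute positions to keep; the data and function names are illustrative.

```python
def project_sort(rel, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[p] for p in positions) for t in rel)
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out

def project_hash(rel, positions):
    """Project and use a hash set to skip rows that were already produced."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[p] for p in positions)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Hypothetical employee tuples: (ssn, lname, dno). Projecting on dno drops the key.
employees = [(1, "Smith", 5), (2, "Wong", 5), (3, "Smith", 4)]
by_sort = project_sort(employees, [2])
by_hash = project_hash(employees, [2])
print(by_sort, by_hash)
```

The two versions return the same set of rows; the sort-based one returns them in sorted order, the hash-based one in first-seen order.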
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
    AND MGRSSN = SSN
    AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}( σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)) )
[Query tree: the projection at the root, the selection below it, and the Cartesian products of Project, Department, and Employee at the bottom]
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
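The figures above follow from multiplying the cardinalities and adding the tuple widths. The total size in the last line is not stated in the slides; it is derived here only to show the scale of the intermediate result.

```latex
\begin{align*}
|\text{Project} \times \text{Department} \times \text{Employee}|
  &= 100 \times 20 \times 5000 = 10^{7} \text{ tuples} \\
\text{tuple width} &= 100 + 50 + 150 = 300 \text{ bytes} \\
\text{total size} &= 10^{7} \times 300 = 3 \times 10^{9} \text{ bytes}
\end{align*}
```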
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}( ((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee )
[Query tree: Project is restricted on Plocation, joined with Department on Dnum = Dnumber, then joined with Employee on MGRSSN = SSN, and finally projected]
Query Optimization: A Simple Example
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves retrieving only 100 tuples and generates a relation with at most 50 tuples.
Projects the result of the last operation over Sname.
Query Optimization: A Simple Example
[Query tree: Π_{Sname} at the root, above the join ⋈_{S.S# = SP.S#} of S with σ_{SP.P# = 'P2'}(SP).]
Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, the second approach is clearly far faster than the first. In the first case we read and write about 3,000,000 tuples; in the second case we read about 10,000 tuples.
So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
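The gap between the two strategies can be illustrated on synthetic data of the same shape (100 suppliers, 10,000 shipments, 50 of them for part P2). This is a sketch under stated assumptions: the generated data, the function names, and the use of a dictionary lookup for the join are not from the slides, and the counters track tuples touched in memory rather than real disk I/O.

```python
from itertools import product

# Hypothetical data sized like the example.
S = [{"S#": f"S{i}", "Sname": f"name{i}"} for i in range(100)]
SP = [{"S#": f"S{i % 100}", "P#": "P2" if i < 50 else f"P{3 + i % 7}"}
      for i in range(10_000)]

def product_then_restrict(S, SP):
    """Form every (s, sp) pair first, then filter: touches |S| * |SP| pairs."""
    pairs, names = 0, set()
    for s, sp in product(S, SP):
        pairs += 1
        if s["S#"] == sp["S#"] and sp["P#"] == "P2":
            names.add(s["Sname"])
    return names, pairs

def restrict_then_join(S, SP):
    """Restrict SP to part P2 first, then join the small result with S."""
    restricted = [sp for sp in SP if sp["P#"] == "P2"]  # scans all of SP once
    by_id = {s["S#"]: s for s in S}                     # scans all of S once
    names = {by_id[sp["S#"]]["Sname"] for sp in restricted if sp["S#"] in by_id}
    return names, len(SP) + len(S), len(restricted)

slow_names, slow_pairs = product_then_restrict(S, SP)
fast_names, fast_reads, intermediate = restrict_then_join(S, SP)
print(slow_pairs, fast_reads, intermediate)
```

Both functions return the same supplier names; only the amount of intermediate work differs.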
Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.
Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = "P2"}(S ⋈_{S.S# = SP.S#} SP))
Optimization Process
[Query tree, bottom to top: S and SP feed Join (S.S# = SP.S#), then Restrict (SP.P# = 'P2'), then Project (Sname), which produces the result.]
Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:
Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)
(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)
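The transformation above can be sketched as a rewrite on a small query tree. The node classes Rel, Join, and Where and the function push_down are hypothetical names introduced for illustration; each restriction is assumed to refer to a single attribute, which keeps the test for "restriction-on-A" versus "restriction-on-B" simple.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset   # attribute names of the base relation

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Where:
    child: object
    attr: str          # the single attribute the restriction refers to
    pred: str          # textual form of the restriction, kept for display

def attrs(node):
    """Attributes available in the output of a tree node."""
    if isinstance(node, Rel):
        return node.attrs
    if isinstance(node, Join):
        return attrs(node.left) | attrs(node.right)
    return attrs(node.child)

def push_down(node):
    """Move each restriction below a join when it mentions only one join input."""
    if isinstance(node, Where):
        child = push_down(node.child)
        if isinstance(child, Join):
            in_left = node.attr in attrs(child.left)
            in_right = node.attr in attrs(child.right)
            if in_left and not in_right:
                return Join(push_down(Where(child.left, node.attr, node.pred)), child.right)
            if in_right and not in_left:
                return Join(child.left, push_down(Where(child.right, node.attr, node.pred)))
        return Where(child, node.attr, node.pred)
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right))
    return node

A = Rel("A", frozenset({"a"}))
B = Rel("B", frozenset({"b"}))
query = Where(Where(Join(A, B), "a", "a > 1"), "b", "b = 2")
rewritten = push_down(query)
print(rewritten)
```

The rewritten tree is (A WHERE a > 1) Join (B WHERE b = 2), matching the second transformation above.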
Optimization Process
General rule: it is a good idea to perform the restriction before the join because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.
Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
Optimization Process
General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
Optimization Process
General rule: a sequence of restrictions can be combined into a single restriction.
Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
Optimization Process
General rule: a sequence of projections can be combined into a single projection.
Optimization Process
General rule: a restriction applied to a projection can be converted into a projection applied to a restriction.
Optimization Process
Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.
(SP Join P) [S#]
However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to
SP [S#]
Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Optimization Process
Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Optimization Process
Rule 3: In a sequence of projections (cascade of projections), only the last (outermost) projection is needed.
Π_{L1}(Π_{L2}(... (Π_{Ln}(E)) ...)) = Π_{L1}(E)
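Rules 1 to 3 can be spot-checked on a toy relation. The sketch below is not a proof: the relation E, the predicates theta1 and theta2, and the helpers select and project are illustrative assumptions, and the Rule 3 check assumes the outer attribute list is contained in the inner one.

```python
# A toy relation as a list of dicts (hypothetical data).
E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 30, "c": "x"},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Pi: keep only attrs, removing duplicates while preserving order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

theta1 = lambda t: t["a"] >= 2
theta2 = lambda t: t["c"] == "x"

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: selections commute.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["a"], project(["a", "b"], E)) == project(["a"], E)
print(rule1, rule2, rule3)
```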
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1
[Query trees: ⋈_{θ} with inputs (E1, E2) is equivalent to ⋈_{θ} with inputs (E2, E1).]
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Query trees: the join of (E1 ⋈ E2) with E3 is equivalent to the join of E1 with (E2 ⋈ E3).]
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)| = |r(R)| / i
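The two selectivity estimates translate directly into a small helper. This is a minimal sketch: the function names are hypothetical, the uniform-distribution assumption is built in, and exact fractions are used only to avoid floating-point noise in the illustration.

```python
from fractions import Fraction

def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition on one attribute.

    cardinality     -- |r(R)|, the number of tuples in the relation
    distinct_values -- i, the number of distinct values of the attribute
    is_key          -- True when the attribute is a key (at most one match)
    """
    if cardinality <= 0:
        raise ValueError("relation must be non-empty")
    if is_key:
        return Fraction(1, cardinality)
    if not distinct_values:
        raise ValueError("need the number of distinct values for a non-key attribute")
    return Fraction(1, distinct_values)

def expected_matches(cardinality, **kwargs):
    """Estimated number of tuples satisfying the equality condition."""
    return equality_selectivity(cardinality, **kwargs) * cardinality

# Hypothetical statistics: 10,000 tuples, attribute with 200 distinct values.
print(expected_matches(10_000, distinct_values=200))
print(expected_matches(10_000, is_key=True))
```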
Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
[Query trees: σ_{θ0} above E1 ⋈_{θ} E2 is equivalent to ⋈_{θ} with σ_{θ0}(E1) and E2 as its inputs.]
Optimization Process
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))
Optimization Process
Rule 10: Set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
Optimization Process
Rule 12 (union):
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
Optimization Process
Rule 12 (intersection):
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
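Rule 12 can be spot-checked with Python sets standing in for relations. The sets E1 and E2 and the predicate p are hypothetical; the check illustrates the identities on one example and also shows the one-sided form failing for union.

```python
# Single-attribute "relations" represented as sets of values (hypothetical data).
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0   # selection predicate: keep even values

def select(pred, rel):
    """sigma over a set-valued relation."""
    return {t for t in rel if pred(t)}

checks = {
    "difference, both sides": select(p, E1 - E2) == select(p, E1) - select(p, E2),
    "difference, left only":  select(p, E1 - E2) == select(p, E1) - E2,
    "union, both sides":      select(p, E1 | E2) == select(p, E1) | select(p, E2),
    "union, left only":       select(p, E1 | E2) == select(p, E1) | E2,
    "intersection, both":     select(p, E1 & E2) == select(p, E1) & select(p, E2),
    "intersection, left":     select(p, E1 & E2) == select(p, E1) & E2,
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Only the "union, left only" form fails here, because the unfiltered E2 contributes tuples that do not satisfy p.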
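Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Note: for set intersection and set difference, equality does not hold in general; only Π_{L}(E1 ∩ E2) ⊆ Π_{L}(E1) ∩ Π_{L}(E2) and Π_{L}(E1 − E2) ⊇ Π_{L}(E1) − Π_{L}(E2) are guaranteed. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_{L}(E1 ∩ E2) is empty while Π_{L}(E1) ∩ Π_{L}(E2) = {(1)}.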
Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: the final stage of optimization involves constructing a set of candidate query plans and choosing "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: the intermediate result (relation) of each operation is generated and stored.
Pipelined evaluation: several operations are combined, so that tuples flow from one operation to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, when the join operation generates a tuple, that tuple is passed directly to the project operation for processing.
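The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. The relations r and s, the join attribute k, and the function names are hypothetical; duplicate elimination in the projection is left out to keep the sketch short.

```python
# Hypothetical relations sharing the join attribute k.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 1, "a2": 11}, {"k": 3, "a2": 12}]

def join(r, s):
    """Natural join on k, producing one result tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(attrs, tuples):
    """Projection that consumes its input tuple by tuple."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)

# Materialized evaluation: the join result is stored in full, then projected.
intermediate = list(join(r, s))
materialized = list(project(["a1", "a2"], intermediate))

# Pipelined evaluation: each joined tuple is passed straight to the projection,
# so no intermediate relation is ever stored.
pipelined = list(project(["a1", "a2"], join(r, s)))

print(materialized == pipelined, pipelined)
```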
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
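Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Query tree: Π_{Sname} at the root, above σ_{bid = 100 ∧ rating > 5}, above the join ⋈_{Sid = Sid} of R and S.]
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Query tree: Π_{Sname} at the root, above the join ⋈_{Sid = Sid} of σ_{bid = 100}(R) and σ_{rating > 5}(S).]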
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
[Query trees for the two plans above, annotated for pipelined execution: two operators are marked "on the fly" in the first plan and one in the second.]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case the data is not organized and no metadata is available).
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
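The two search methods can be compared by counting how many records each one inspects. This is a sketch on an in-memory list: the employees data and the probe counters are illustrative assumptions, and the probes stand in for record (or block) accesses in a real file.

```python
def linear_search(records, attr, value):
    """Scan every record until a match is found; works on unordered files."""
    probes = 0
    for rec in records:
        probes += 1
        if rec[attr] == value:
            return rec, probes
    return None, probes

def binary_search(records, attr, value):
    """Equality search on a key attribute; records must be ordered on attr."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][attr] == value:
            return records[mid], probes
        if records[mid][attr] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical file of 1,024 records ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(1, 1025)]

rec_lin, probes_lin = linear_search(employees, "ssn", 1000)
rec_bin, probes_bin = binary_search(employees, "ssn", 1000)
print(probes_lin, probes_bin)
```

The linear scan inspects every record up to the match, while the binary search needs a number of probes that grows only logarithmically with the file size.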
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: if the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (for < and ≤, all the preceding records). Note that in this case the data is also sorted.
σ_{DNUMBER > 5}(DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: if an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
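Disjunctive selection by union of record pointers can be sketched with dictionaries playing the role of indexes. Everything here is an illustrative assumption: the employees data, the build_index helper, and the use of list positions as record pointers.

```python
from collections import defaultdict

# Hypothetical file; a record's position in the list serves as its pointer.
employees = [
    {"name": "A", "dno": 5, "salary": 30000},
    {"name": "B", "dno": 4, "salary": 40000},
    {"name": "C", "dno": 5, "salary": 40000},
    {"name": "D", "dno": 1, "salary": 55000},
]

def build_index(records, attr):
    """Map each attribute value to the set of pointers of records holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# Selection with condition dno = 5 OR salary = 40000:
# probe each index, union the pointer sets, then fetch each record once.
pointers = dno_index.get(5, set()) | salary_index.get(40000, set())
result = [employees[rid] for rid in sorted(pointers)]
print([rec["name"] for rec in result])
```

Record C satisfies both conditions but is fetched only once, because the union is taken over pointers rather than over retrieved records.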
Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A = B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (pairs of tuples examined).
In the best-case scenario both relations fit into physical memory, and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in physical memory, then the cost is also bs + br block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
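The block access count of the block nested loop can be checked with a small simulation. This is a sketch under stated assumptions: relations are Python lists of tuples, to_blocks is a hypothetical helper that groups tuples into fixed-size blocks, and a counter stands in for real block reads.

```python
def to_blocks(tuples, block_size):
    """Group a relation's tuples into blocks of at most block_size tuples."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join block by block, counting every block read."""
    result, block_reads = [], 0
    for Br in r_blocks:              # each block of r is read once
        block_reads += 1
        for Bs in s_blocks:          # all of s is read once per block of r
            block_reads += 1
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append(tr + ts)   # tr || ts
    return result, block_reads

# Hypothetical relations: the first field of each tuple is the join attribute.
r = [(i, f"r{i}") for i in range(10)]
s = [(i % 5, f"s{i}") for i in range(8)]
r_blocks = to_blocks(r, 2)   # br = 5
s_blocks = to_blocks(s, 4)   # bs = 2

result, reads = block_nested_loop_join(r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0])
br, bs = len(r_blocks), len(s_blocks)
print(reads, br * bs + br, len(result))
```

Choosing the relation with fewer blocks as the outer relation reduces the br term, which is one common answer to the question of how to improve the algorithm.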
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
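The merge step can be sketched as follows, assuming both inputs are already sorted on their join attributes. The function name merge_join and the tuple layout are illustrative; as in the slide, the group of s tuples sharing one join value is assumed to fit in memory.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two relations that are already sorted on their join attributes.

    r, s         -- lists of tuples, sorted on positions key_r and key_s
    key_r, key_s -- positions of the join attributes within the tuples
    """
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1                      # advance the pointer into r
        elif a > b:
            j += 1                      # advance the pointer into s
        else:
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][key_r] == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```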
Query Optimization: JOIN Operation
Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
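The hash-join description can be sketched in memory as follows. This is an illustration rather than a disk-based implementation: the bucket count, the function name hash_join, and the use of Python's built-in hash as the hashing function are assumptions.

```python
def hash_join(r, s, key_r, key_s, n_buckets=4):
    """Equi-join by hashing both relations into the same set of buckets."""
    # Each bucket holds the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[key_r]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[key_s]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare explicitly.
                if tr[key_r] == ts[key_s]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c"), (6, "d")]
s = [(2, "x"), (5, "y"), (5, "z"), (9, "w")]
joined = sorted(hash_join(r, s, 0, 0))
print(joined)
```

Tuples with equal join values always land in the same bucket, so only tuples within one bucket need to be compared.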
Query Optimization: Complex JOIN Operation
The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
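Both duplicate-elimination strategies for projection can be sketched briefly. The function names and the sample employee tuples are hypothetical; attribute lists are given as tuple positions.

```python
from itertools import groupby

def project_hash(rel, positions):
    """Projection with duplicates removed through a hash set."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[i] for i in positions)
        if key not in seen:          # first time this projected tuple is seen
            seen.add(key)
            out.append(key)
    return out

def project_sort(rel, positions):
    """Projection with duplicates removed by sorting, then dropping adjacent repeats."""
    projected = sorted(tuple(t[i] for i in positions) for t in rel)
    return [key for key, _ in groupby(projected)]

# Hypothetical relation (ssn, name, dno); ssn is the key.
employee = [(1, "Smith", 5), (2, "Wong", 5), (3, "Zelaya", 4)]

print(project_hash(employee, [2]), project_sort(employee, [2]))
print(project_hash(employee, [0, 2]))   # includes the key, so no duplicates arise
```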
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
A hashing technique is another way to implement the union, intersection, and difference operations.
Query Optimization mdash A Simple Example
SP
σ (SPP = lsquoP2rsquo)
Database Systems
SS = SPS
S
ΠSname
35
Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance
measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples
So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic
Database Systems
36
Optimization ProcessCast the query into some internal representation
mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra
Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))
Database Systems
37
Optimization Process
S SP
Join (SS = SPS)
Restrict (SpP = lsquoP2rsquo)
Project (Sname)
Result
Database Systems
38
Optimization ProcessConvert the result of the previous step into a
canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example
Database Systems
39
Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))
(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
Database Systems
40
Optimization ProcessGeneral rule It is a good idea to perform
the restriction before the join becauseIt reduces the size of the input to the join
operationIt reduces the size of the output from the join
Database Systems
41
Optimization Process
WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)
Database Systems
42
Optimization ProcessGeneral rule Transform restriction condition
into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form
evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel
Database Systems
43
Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2
Database Systems
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization Process
Rule 11: The set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

Set union:
σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

Set intersection:
σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2
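The cases of Rule 12 can be checked on small sets. This is only a sketch: the relations are sets of integers and the predicate p (even values) is an arbitrary assumption chosen for the example.

```python
def select(relation, predicate):
    """Selection over a relation modelled as a Python set."""
    return {t for t in relation if predicate(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0  # hypothetical selection condition

# Set difference: both forms hold.
difference_ok = (select(E1 - E2, p) == select(E1, p) - select(E2, p)
                 == select(E1, p) - E2)

# Set intersection: both forms hold.
intersection_ok = (select(E1 & E2, p) == select(E1, p) & select(E2, p)
                   == select(E1, p) & E2)

# Set union: selection must be applied to both operands.
union_ok = select(E1 | E2, p) == select(E1, p) | select(E2, p)
union_one_sided = select(E1 | E2, p) == select(E1, p) | E2
```

The one-sided union form fails here because E2 contributes tuples (5 and 7) that do not satisfy p.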
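Optimization Process
Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations:

ΠL (E1 − E2) = (ΠL (E1)) − (ΠL (E2))
ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)
ΠL (E1 ∩ E2) = ΠL (E1) ∩ ΠL (E2)

The union form always holds. The intersection and difference forms are guaranteed when no two distinct tuples of E1 ∪ E2 agree on the attributes in L; in general they can fail.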
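A small check of Rule 13 on toy data suggests some care is needed: the union form holds, while the intersection and difference forms can fail when two different tuples share the same values on L. The relations and the `project` helper below are assumptions made for the example.

```python
def project(relation, positions):
    """Projection over a set of tuples; duplicates vanish because the result is a set."""
    return {tuple(t[i] for i in positions) for t in relation}

# Two tuples agree on the first attribute but differ on the second.
E1 = {(1, "a"), (2, "c")}
E2 = {(1, "b"), (2, "c")}
L = (0,)  # project on the first attribute only

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_holds = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_holds = project(E1 - E2, L) == project(E1, L) - project(E2, L)
```

Here only `union_holds` is true, so an optimizer would apply the other two forms only when L is known to identify tuples uniquely across both inputs.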
Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
existence of indexes or other access paths (to reduce I/O cost), and
physical clustering of records (to reduce I/O cost), …
Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: generation of an intermediate result (relation) for each operation.
Pipeline evaluation: combining several operations so that the output of one is passed directly to the next.
Assume we want to perform

Πa1,a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
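The difference between the two evaluation styles can be sketched with Python generators. The relations, the shared join attribute `k`, and the function names are assumptions for illustration; duplicate elimination in the projection is omitted to keep the sketch short.

```python
def join(r, s):
    """Natural join on the shared attribute 'k', producing one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}, {"k": 3, "a2": "z"}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both styles return the same tuples; the pipelined version never holds the full intermediate relation.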
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))

[Figure: query tree. ΠSname at the root, above σbid = 100 AND rating > 5, above the join ⋈Sid=Sid of R and S.]
ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))

[Figure: query tree. ΠSname at the root, above the join ⋈Sid=Sid of σbid = 100 (R) and σrating > 5 (S).]
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees above, annotated for pipelined execution. Operators marked "On the fly" process tuples as they are produced, without materializing an intermediate relation.]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
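The two search methods can be compared by counting how many records each one examines. This is a sketch under assumptions: an in-memory list stands in for the file, the 1000 sample records are made up, and a probe count stands in for block accesses.

```python
def linear_search(records, attr, value):
    """Examine every record and keep those that satisfy attr = value."""
    matches = [rec for rec in records if rec[attr] == value]
    return matches, len(records)          # every record is examined

def binary_search(sorted_records, attr, value):
    """Equality search on a key attribute on which the file is ordered."""
    low, high, probes = 0, len(sorted_records) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        current = sorted_records[mid][attr]
        if current == value:
            return sorted_records[mid], probes
        if current < value:
            low = mid + 1
        else:
            high = mid - 1
    return None, probes

# Hypothetical file of 1000 records ordered on the key attribute "ssn".
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(1000)]

linear_matches, linear_probes = linear_search(employees, "ssn", 777)
binary_match, binary_probes = binary_search(employees, "ssn", 777)
```

For 1000 ordered records the binary search needs at most 10 probes, while the linear search examines all 1000.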
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σSSN = 123456789 (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

σDNUMBER > 5 (DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:

σθ1∧θ2∧…∧θn (r)

Disjunctive selection: a disjunctive selection is of the following form:

σθ1∨θ2∨…∨θn (r)
Query Optimization: Search methods for Selection
Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
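Disjunctive selection by union of record pointers can be sketched as follows. The dictionary-based indexes, the sample EMPLOYEE tuples, and the condition DNO = 5 ∨ CITY = 'Houston' are assumptions for illustration; list positions play the role of record pointers.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[attr]].add(pointer)
    return index

# Hypothetical EMPLOYEE tuples.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Houston"},
    {"name": "Cy", "dno": 5, "city": "Houston"},
    {"name": "Dee", "dno": 1, "city": "Austin"},
]

dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# Each index is scanned for its own condition, then the pointer sets are united.
pointers = dno_index.get(5, set()) | city_index.get("Houston", set())
result = [employees[p] for p in sorted(pointers)]
```

Taking the union of pointer sets rather than of tuples means each qualifying record is fetched once, even when it satisfies several conditions.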
Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈A=B S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

r ⋈r.A Θ s.B s

where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
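A direct Python rendering of the nested loop algorithm, as a sketch: relations are lists of tuples, the join condition Θ is passed in as a function, and the comparison counter is an addition made here to show the nr × ns behaviour.

```python
def nested_loop_join(r, s, theta):
    """Compare every tuple of r with every tuple of s and keep matching pairs."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts (tuple concatenation)
    return result, comparisons

# Hypothetical relations; the join attribute is the first field of each tuple.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(1, "x"), (3, "y")]

result, comparisons = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
```

Because the condition is an arbitrary function, the same code works for any Θ, not just equality.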
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best case, both relations fit into main memory, and hence we need bs + br block accesses.
If only one of the relations fits in main memory, the cost is still bs + br block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
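The block-access count br × bs + br can be reproduced by a small simulation. This is a sketch with assumed parameters: relations are split into in-memory "blocks" of 4 tuples, and a counter is incremented each time a block would be read.

```python
def split_into_blocks(relation, tuples_per_block):
    """Cut a relation into consecutive blocks of at most tuples_per_block tuples."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    result, block_accesses = [], 0
    for block_r in r_blocks:
        block_accesses += 1           # read one block of r
        for block_s in s_blocks:
            block_accesses += 1       # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_accesses

# Hypothetical relations: 12 and 10 tuples, 4 tuples per block.
r = [(i, f"r{i}") for i in range(12)]
s = [(i, f"s{i}") for i in range(0, 20, 2)]
r_blocks = split_into_blocks(r, 4)
s_blocks = split_into_blocks(s, 4)
b_r, b_s = len(r_blocks), len(s_blocks)

result, block_accesses = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0])
```

With b_r = b_s = 3 the simulation counts 3 × 3 + 3 = 12 block accesses, matching the formula.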
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈A=B s
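A sketch of the join through an access structure: a Python dictionary stands in for the index on attribute B of s (an assumption; a real system would use an existing index or hash key rather than build one), and the sample tuples are made up.

```python
from collections import defaultdict

def build_index(s, b):
    """Stand-in for an existing index on attribute b of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, index_on_s, a):
    """For each tr in r, use the index to fetch the ts with tr[a] = ts[b]."""
    result = []
    for tr in r:
        for ts in index_on_s.get(tr[a], []):
            result.append({**tr, **ts})
    return result

r = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}, {"A": 2, "x": "r3"}]
s = [{"B": 2, "y": "s1"}, {"B": 3, "y": "s2"}, {"B": 2, "y": "s3"}]

result = index_join(r, build_index(s, "B"), "A")
```

Each tuple of r triggers one index lookup instead of a full scan of s.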
Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
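The merge step can be sketched as below, assuming both inputs are already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory. The function name and sample relations are illustrative.

```python
def merge_join(r, s, a, b):
    """Merge two relations sorted on positions a (in r) and b (in s)."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
result = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group, which is why each tuple is read once.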
Query Optimization: JOIN Operation
Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
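A minimal in-memory sketch of the hash-join idea: both relations are hashed into the same set of buckets with the same function, and only tuples that land in the same bucket are compared. The bucket count of 4 and the sample data are assumptions.

```python
def hash_join(r, s, a, b, num_buckets=4):
    # One pass over each relation hashes its tuples into shared buckets.
    buckets = [([], []) for _ in range(num_buckets)]
    for tr in r:
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    # Each bucket is examined on its own; different join values can share
    # a bucket, so the join condition is still checked explicitly.
    result = []
    for r_part, s_part in buckets:
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (9, "z")]
result = sorted(hash_join(r, s, 0, 0))
```

In this data the values 1, 5 and 9 all fall into the same bucket, which shows why matching within a bucket still needs the equality test.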
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:

r ⋈θ1∧θ2∧…∧θn s

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈θ1∨θ2∨…∨θn s

The join can be performed as the union of the tuples in the individual joins r ⋈θi s.
Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
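Both duplicate-elimination strategies for projection can be sketched briefly. The function names and the sample EMPLOYEE tuples are assumptions; a Python set plays the role of the hash table.

```python
def project_with_hashing(relation, attrs):
    """Drop duplicates by remembering the projected tuples already seen."""
    seen, result = set(), []
    for t in relation:
        projected = tuple(t[a] for a in attrs)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result

def project_with_sorting(relation, attrs):
    """Sort the projected tuples, then skip any tuple equal to its predecessor."""
    result = []
    for row in sorted(tuple(t[a] for a in attrs) for t in relation):
        if not result or result[-1] != row:
            result.append(row)
    return result

employee = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5}]

by_hash = project_with_hashing(employee, ["dno"])
by_sort = project_with_sorting(employee, ["dno"])
with_key = project_with_hashing(employee, ["ssn", "dno"])
```

When the attribute list includes the key `ssn`, no tuple is dropped, which is why that case needs no duplicate handling.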
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:

ΠPnumber,Dnum,Lname,Address,Bdate (σPlocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN (Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree for the expression above. ΠPnumber,Dnum,Lname,Address,Bdate at the root, above σPlocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN, above the Cartesian product of Project with the Cartesian product of Department and Employee.]
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
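The size of the Cartesian product follows directly from the figures given; the short calculation below only restates that arithmetic (the dictionary layout is an arbitrary choice).

```python
# (number of tuples, tuple size in bytes) as stated in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

result_tuples = 1
tuple_width = 0
for count, size in relations.values():
    result_tuples *= count    # cardinalities multiply in a Cartesian product
    tuple_width += size       # tuple widths add up

total_bytes = result_tuples * tuple_width
```

That is 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes, or about 3 × 10^9 bytes for the intermediate relation.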
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:

ΠPnumber,Dnum,Lname,Address,Bdate (((σPlocation='California' (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))
Query Optimization: An Example
[Figure: query tree for the expression above. ΠPnumber,Dnum,Lname,Address,Bdate at the root, above the join ⋈MGRSSN=SSN whose inputs are Employee and the join ⋈Dnum=Dnumber of σPlocation='California' (Project) with Department.]
Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first approach. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.
So a simple policy, doing the restriction and then the join instead of doing the product and then a restriction, sounds like a good heuristic.
Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.
Now we can build a query tree very easily:

Π(Sname) (σP# = "P2" (S ⋈S.S# = SP.S# SP))

[Figure: query tree. S and SP are joined (S.S# = SP.S#), then restricted (SP.P# = 'P2'), then projected (Sname) to give the result.]
Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
Optimization Process
General rule: it is a good idea to perform the restriction before the join, because
it reduces the size of the input to the join operation, and
it reduces the size of the output from the join.
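The effect of restricting before joining can be illustrated by counting the tuple comparisons a simple nested loop join performs. The relations A and B, the restriction on `city`, and the counter are assumptions made for this sketch.

```python
def join(a, b, counter):
    """Nested loop equijoin on attribute 'k'; counter[0] counts comparisons."""
    out = []
    for x in a:
        for y in b:
            counter[0] += 1
            if x["k"] == y["k"]:
                out.append({**x, **y})
    return out

# Hypothetical relations: 200 tuples each; 10 tuples of B satisfy the restriction.
A = [{"k": i, "a": i * 2} for i in range(200)]
B = [{"k": i, "city": "Rolla" if i % 20 == 0 else "Other"} for i in range(200)]
restriction_on_b = lambda t: t["city"] == "Rolla"

# (A Join B) WHERE restriction-on-B
late_count = [0]
late = [t for t in join(A, B, late_count) if restriction_on_b(t)]

# A Join (B WHERE restriction-on-B)
early_count = [0]
early = join(A, [t for t in B if restriction_on_b(t)], early_count)
```

Both forms return the same 10 tuples, but the second performs 2,000 comparisons instead of 40,000 on this data.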
Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: transform a restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
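The equivalence behind the conjunctive normal form rewrite can be verified by enumerating all truth assignments. The function names below are illustrative only.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)

def evaluate_conjuncts(conjuncts):
    """A CNF condition is false as soon as any single conjunct is false."""
    return all(conjuncts)

assignments = list(product([False, True], repeat=3))
equivalent = all(original(*v) == conjunctive_normal_form(*v) for v in assignments)

# With p = False, q = True, r = False the second conjunct (p OR r) is false,
# so the whole condition is false without any further work.
example = evaluate_conjuncts([False or True, False or False])
```

Since each conjunct can be evaluated on its own, the conjuncts are natural units of work for parallel evaluation.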
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.
Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be transformed into a single projection.
Optimization Process
General rule: a restriction of a projection can be converted into a projection of a restriction.
Optimization Process
Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]
Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):

σθ1∧θ2 (E) = σθ1 (σθ2 (E))
Optimization Process
Rule 2: The selection operation is commutative:

σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
Optimization Process
Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections):

ΠL1 (ΠL2 (… (ΠLn (E)) …)) = ΠL1 (E)
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

σθ (E1 × E2) = E1 ⋈θ E2

This can be extended to

σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Optimization Process
Rule 5: The theta join operation is commutative:

E1 ⋈θ E2 = E2 ⋈θ E1

[Figure: query trees for Rule 5. The join ⋈θ with inputs E1, E2 is equivalent to the join ⋈θ with inputs E2, E1.]
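Optimization Process
Rule 6: Natural join is associative:

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: query trees for Rule 6. Left tree: E1 and E2 are joined first, then the result is joined with E3. Right tree: E2 and E3 are joined first, then E1 is joined with the result.]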
Optimization Process
Rule 7: Theta join is associative in the following manner:

(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3.
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_s + b_r
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
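The merge step can be sketched in Python as below, assuming both inputs are in-memory lists already sorted on the join attribute. The function name merge_join and the data are illustrative; block-level I/O is not modeled.

```python
def merge_join(r, s, a, b):
    """Equi-join two lists that are already sorted on r.A and s.B."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                       # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                       # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for t_s in s[j:j_end]:
                    result.append({**r[i], **t_s})
                i += 1
            j = j_end
    return result

r_sorted = [{"a": 1}, {"a": 2}, {"a": 2}, {"a": 4}]
s_sorted = [{"b": 2}, {"b": 2}, {"b": 3}, {"b": 4}]
merged = merge_join(r_sorted, s_sorted, "a", "b")
```

Neither pointer ever moves backwards past a finished group, which is why each relation is scanned only once.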
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
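A small in-memory sketch of the hash-join idea in Python follows. The number of buckets, the use of Python's built-in hash, and the sample data are assumptions made for illustration only.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r.A = s.B by hashing both into shared buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for t_r in r:                                   # single pass over r
        buckets[hash(t_r[a]) % n_buckets][0].append(t_r)
    for t_s in s:                                   # single pass over s
        buckets[hash(t_s[b]) % n_buckets][1].append(t_s)
    result = []
    for r_part, s_part in buckets:                  # examine each bucket
        for t_r in r_part:
            for t_s in s_part:
                if t_r[a] == t_s[b]:                # different values can collide
                    result.append({**t_r, **t_s})
    return result

r = [{"a": k} for k in (1, 2, 5, 6)]
s = [{"b": k} for k in (2, 6, 6, 7)]
hashed = hash_join(r, s, "a", "b")
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared.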
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
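Both duplicate-elimination strategies can be sketched in a few lines of Python. The function names project_hash and project_sort and the sample relation are hypothetical, and tuples are modeled as dictionaries.

```python
from itertools import groupby

def project_hash(relation, attrs):
    """Project onto attrs, dropping duplicates with a hash set."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[x] for x in attrs)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

def project_sort(relation, attrs):
    """Project onto attrs, then sort so that duplicates become adjacent."""
    projected = sorted(tuple(t[x] for x in attrs) for t in relation)
    return [key for key, _ in groupby(projected)]

employees = [
    {"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5},
]
by_hash = project_hash(employees, ["dno"])
by_sort = project_sort(employees, ["dno"])
```

Projecting on ["ssn", "dno"] keeps all three tuples, since the key ssn is part of the attribute list and no duplicates can arise.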
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example
[Figure: initial query tree. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; below it: the Cartesian product (×) of Project with the Cartesian product (×) of Department and Employee.]
Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
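The figures quoted for the Cartesian product can be reproduced with a short calculation. The dictionary names below are illustrative; the numbers are the ones stated in the example.

```python
sizes_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinalities = {"Project": 100, "Department": 20, "Employee": 5000}

product_tuples = 1
for n in cardinalities.values():
    product_tuples *= n                      # 100 * 20 * 5000

tuple_bytes = sum(sizes_bytes.values())      # 100 + 50 + 150
total_bytes = product_tuples * tuple_bytes   # size of the intermediate relation
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes each, about 3 × 10^9 bytes, before the selection discards most of them.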
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example
[Figure: improved query tree. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: the join ⋈_{MGRSSN = SSN} whose inputs are Employee and the join ⋈_{Dnum = Dnumber} of Department with σ_{Plocation = 'California'}(Project).]
Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.
Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process
[Figure: query tree, evaluated bottom-up. Leaves: S and SP; then Join (S.S# = SP.S#); then Restrict (SP.P# = 'P2'); then Project (Sname); then Result.]
Optimization Process
Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process
General rule: It is a good idea to perform the restriction before the join because
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.
Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
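The equivalence of the two conditions follows from the distributive law, and it can be checked exhaustively over all truth assignments. The function names original and cnf below are illustrative only.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Compare the two forms on all 2^3 combinations of truth values.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
```

Note that this check covers two-valued logic only; SQL conditions involving NULL follow three-valued logic, which this sketch does not model.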
Optimization Process
General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form because
A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
General rule: A sequence of projections can be transformed into a single projection.

Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction.
Optimization Process
Finally, consider the following query:
Get the supplier numbers of suppliers who supply at least one part:
(SP Join P) [S#]
However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to
SP [S#]
Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process
Rule 2: Selection operations are commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
[Figure: a θ-join tree with children E1, E2 is equivalent to the θ-join tree with children E2, E1.]

Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: the tree joining (E1 ⋈ E2) with E3 is equivalent to the tree joining E1 with (E2 ⋈ E3).]

Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

For an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence, the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
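These estimates are simple enough to express as a small helper. The sketch below assumes uniformly distributed values for the non-key case; the function names and parameters are hypothetical and not part of the slides.

```python
def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition on one attribute."""
    if cardinality == 0:
        return 0.0
    if is_key:
        return 1 / cardinality        # at most one tuple can match
    return 1 / distinct_values        # uniform-distribution assumption

def estimated_matches(cardinality, **kwargs):
    """Estimated number of tuples satisfying the equality condition."""
    return equality_selectivity(cardinality, **kwargs) * cardinality

key_sel = equality_selectivity(5000, is_key=True)
dno_matches = estimated_matches(5000, distinct_values=20)
```

For a relation of 5,000 tuples and an attribute with 20 distinct values, the estimate is roughly 250 matching tuples.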
Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations (E1 in this case):
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
[Figure: the tree with σ_{θ0} above the θ-join of E1 and E2 is equivalent to the tree in which the θ-join takes σ_{θ0}(E1) and E2 as inputs.]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Optimization Process
Rule 12
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Optimization Process
Rule 12
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
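Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
For set intersection and set difference, the corresponding equalities do not hold in general, because two distinct tuples can agree on the attributes in L. Only the following containments are guaranteed:
Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) ⊇ Π_L(E1) − Π_L(E2)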
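A quick sanity check of Rules 12 and 13 on small hypothetical relations, using Python sets, is sketched below. The helpers select and project and the data are illustrative, not taken from the slides. On this data the projection identity holds for union but fails for intersection, which suggests that pushing a projection through intersection or difference needs care.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}

def project(cols, rel):
    return {tuple(t[c] for c in cols) for t in rel}

e1 = {(1, "a"), (2, "b"), (3, "c")}
e2 = {(1, "x"), (2, "b")}
p = lambda t: t[0] >= 2

# Rule 12: selection distributes over union, intersection and difference.
rule12_union = select(p, e1 | e2) == select(p, e1) | select(p, e2)
rule12_inter = select(p, e1 & e2) == select(p, e1) & e2
rule12_diff = select(p, e1 - e2) == select(p, e1) - e2

# Rule 13: projection distributes over union ...
rule13_union = project([0], e1 | e2) == project([0], e1) | project([0], e2)
# ... but not over intersection here: (1, "a") and (1, "x") are different
# tuples, yet both project to (1,).
rule13_inter = project([0], e1 & e2) == project([0], e1) & project([0], e2)
```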
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: The intermediate result (relation) of each operation is generated and stored before the next operation uses it.
Pipelined evaluation: Several operations are combined so that the output of one is passed directly to the next, without storing the intermediate relation.

Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
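The difference between the two evaluation styles can be illustrated with Python generators, which hand over one tuple at a time. This is only a sketch: the equi-join on the hypothetical attributes k and k2 stands in for r ⋈ s, and the data is made up.

```python
def join(r, s, a, b):
    """Generator: yields each joined tuple as soon as it is produced."""
    for t_r in r:
        for t_s in s:
            if t_r[a] == t_s[b]:
                yield {**t_r, **t_s}

def project(tuples, attrs):
    """Generator: consumes tuples one at a time, no intermediate relation."""
    for t in tuples:
        yield tuple(t[x] for x in attrs)

r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"a2": "x", "k2": 10}, {"a2": "y", "k2": 30}]

# Materialized evaluation: build the whole join result first.
temp = list(join(r, s, "k", "k2"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: the projection pulls tuples straight from the join.
pipelined = list(project(join(r, s, "k", "k2"), ["a1", "a2"]))
```

Both strategies produce the same result; the pipelined version simply never stores the list temp.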
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree. Root: Π_{Sname}; below it: σ_{bid = 100 ∧ rating > 5}; below it: the join on Sid = Sid of R and S.]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree. Root: Π_{Sname}; below it: the join on Sid = Sid of σ_{bid = 100}(R) and σ_{rating > 5}(S).]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the first query tree, with the selection σ_{bid = 100 ∧ rating > 5} and the projection Π_{Sname} both annotated "On the fly".]
[Figure: the second query tree, with the projection Π_{Sname} annotated "On the fly".]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
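The two search methods can be contrasted with a short Python sketch. It assumes the file is an in-memory list of dictionaries, ordered on the key attribute ssn; the function names and the data are hypothetical.

```python
def linear_search(records, attr, value):
    """Scan every record; works on unordered files and non-key attributes."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k = sorted_records[mid][key_attr]
        if k == value:
            return sorted_records[mid]
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Hypothetical file ordered on the key attribute ssn.
employees = [{"ssn": n, "dno": n % 3} for n in (11, 22, 33, 44, 55)]
in_dept_2 = linear_search(employees, "dno", 2)
found = binary_search(employees, "ssn", 44)
```

The linear search touches all records, while the binary search inspects roughly log2 of the number of records, at the price of requiring an ordered file and a key attribute.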
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or the preceding ones, for < and ≤). Note that in this case the data is also sorted.
σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then check whether each retrieved record satisfies the rest of the conditions.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If access paths exist for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
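The union-of-pointers idea, including the fallback to a linear scan, can be sketched as follows. Record pointers are modeled as list positions and each index as a dictionary; these choices, the function names, and the data are assumptions made for illustration.

```python
def build_index(relation, attr):
    """Map each attribute value to the set of record positions (pointers)."""
    index = {}
    for rid, t in enumerate(relation):
        index.setdefault(t[attr], set()).add(rid)
    return index

def disjunctive_select(relation, indexes, conditions):
    """Evaluate attr1 = v1 OR attr2 = v2 OR ... via a union of pointers."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [t for t in relation
                if any(t[attr] == v for attr, v in conditions)]
    rids = set()
    for attr, v in conditions:
        rids |= indexes[attr].get(v, set())     # union of record pointers
    return [relation[rid] for rid in sorted(rids)]

employees = [
    {"ssn": 1, "dno": 5, "sex": "F"},
    {"ssn": 2, "dno": 4, "sex": "M"},
    {"ssn": 3, "dno": 5, "sex": "M"},
]
indexes = {"dno": build_index(employees, "dno"),
           "ssn": build_index(employees, "ssn")}
via_pointers = disjunctive_select(employees, indexes, [("dno", 5), ("ssn", 2)])
via_scan = disjunctive_select(employees, indexes, [("dno", 4), ("sex", "F")])
```

The second call has no index on sex, so the whole relation is scanned even though dno is indexed.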
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
37
Optimization Process
S SP
Join (SS = SPS)
Restrict (SpP = lsquoP2rsquo)
Project (Sname)
Result
Database Systems
38
Optimization ProcessConvert the result of the previous step into a
canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example
Database Systems
39
Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))
(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
Database Systems
40
Optimization ProcessGeneral rule It is a good idea to perform
the restriction before the join becauseIt reduces the size of the input to the join
operationIt reduces the size of the output from the join
Database Systems
41
Optimization Process
WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)
Database Systems
42
Optimization ProcessGeneral rule Transform restriction condition
into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form
evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel
Database Systems
43
Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2
Database Systems
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))

Optimization Process
Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).
Π_L1(Π_L2(…(Π_Ln(E))…)) = Π_L1(E)

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to
σ_θ1(E1 ⋈_θ2 E2) = E1 ⋈_{θ1 ∧ θ2} E2
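
As a rough illustration of Rule 4, the following sketch compares a selection applied to a Cartesian product with a theta join computed directly. The helper names (`cartesian`, `select`, `theta_join`) and the sample relations are assumptions introduced here.

```python
from itertools import product

def cartesian(r, s):
    """Cartesian product; attribute names of r and s are assumed to be distinct."""
    return [{**a, **b} for a, b in product(r, s)]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    """Theta join: combine only the pairs that satisfy theta."""
    return [{**a, **b} for a in r for b in s if theta({**a, **b})]

# Hypothetical sample relations.
e1 = [{"a": 1}, {"a": 5}]
e2 = [{"b": 3}, {"b": 7}]
theta = lambda t: t["a"] < t["b"]

via_product = select(theta, cartesian(e1, e2))
via_join = theta_join(e1, e2, theta)
```

Both give the same result, but the join form never has to hold the full product as an intermediate relation.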

Optimization Process
Rule 5: Theta join operation is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
[Figure: join tree ⋈_θ with children E1, E2 and the equivalent tree with children E2, E1]

Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: left-deep join tree (E1 ⋈ E2) ⋈ E3 and the equivalent tree E1 ⋈ (E2 ⋈ E3)]

Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_θ1 E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_θ2 E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

For an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
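
The two selectivity cases can be turned into a small estimator. This is a minimal sketch under the uniform-distribution assumption; the function name `estimated_matches` and the sample numbers are hypothetical.

```python
def estimated_matches(cardinality, distinct_values=None):
    """Estimated number of tuples matching an equality condition.

    distinct_values=None is taken to mean the attribute is a key,
    so it has as many distinct values as the relation has tuples.
    """
    i = cardinality if distinct_values is None else distinct_values
    return cardinality / i  # (1 / i) * |r(R)|

on_key = estimated_matches(5000)          # equality on a key attribute
on_non_key = estimated_matches(5000, 50)  # attribute with 50 distinct values
```

With 5000 tuples, an equality search on a key is expected to return one tuple, while a search on an attribute with 50 distinct values is expected to return 100.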

Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):
σ_θ0(E1 ⋈_θ E2) = (σ_θ0(E1)) ⋈_θ E2
[Figure: query tree with σ_θ0 above the join ⋈_θ of E1 and E2, and the equivalent tree with σ_θ0 pushed below the join onto E1]
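
The effect of Rule 8 on the size of the join input can be seen in a small experiment. The sketch below is illustrative only; the relations, attribute names, and helpers are assumptions.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def join(r, s, theta):
    """Theta join; theta receives one tuple from each input."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

# Hypothetical data: 100 tuples in e1, 300 tuples in e2.
e1 = [{"sid": i, "rating": i % 10} for i in range(100)]
e2 = [{"rsid": i % 100, "bid": 100 + i % 3} for i in range(300)]

theta = lambda a, b: a["sid"] == b["rsid"]
theta0 = lambda t: t["rating"] > 5  # refers to attributes of e1 only

late = select(theta0, join(e1, e2, theta))    # selection after the join
reduced_e1 = select(theta0, e1)               # selection pushed below the join
early = join(reduced_e1, e2, theta)
```

Both plans return the same tuples, but the second one feeds only 40 of the 100 tuples of `e1` into the join.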

Optimization Process
Rule 9: The projection operation distributes over theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_L1(E1)) ⋈_θ (Π_L2(E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
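
The identities of Rule 12 can be checked with Python sets. In this sketch each integer stands for a one-attribute tuple; that encoding, the predicate, and the sample sets are assumptions made for brevity.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}

# Hypothetical union-compatible relations.
e1 = {1, 2, 3, 4, 5, 6}
e2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0

# Set difference: both forms hold.
diff_both = select(p, e1 - e2) == select(p, e1) - select(p, e2)
diff_left = select(p, e1 - e2) == select(p, e1) - e2

# Set union: selection must be applied to both operands.
union_both = select(p, e1 | e2) == select(p, e1) | select(p, e2)
union_left = select(p, e1 | e2) == select(p, e1) | e2

# Set intersection: both forms hold.
inter_both = select(p, e1 & e2) == select(p, e1) & select(p, e2)
inter_left = select(p, e1 & e2) == select(p, e1) & e2
```

Only `union_left` is false here: applying the selection to just one operand of a union lets unfiltered tuples of the other operand through.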

Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general. For example, if E1 = {(1, a)}, E2 = {(1, b)}, and L is the first attribute, then Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation before the next operation starts.
Pipelined evaluation: Several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
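
The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. The relations, the shared join attribute `k`, and the helper names are assumptions; duplicate elimination in the projection is omitted to keep the sketch short.

```python
def join(r, s):
    """Join on the shared attribute 'k', yielding one tuple at a time."""
    for a in r:
        for b in s:
            if a["k"] == b["k"]:
                yield {**a, **b}

def project(attrs, tuples):
    """Projection without duplicate elimination."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

# Hypothetical sample relations.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 3, "a2": 30}]

# Materialized: the whole join result is stored before the projection runs.
intermediate = list(join(r, s))
materialized = list(project(["a1", "a2"], intermediate))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(["a1", "a2"], join(r, s)))
```

The results are identical; the pipelined version simply never builds the `intermediate` list.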

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_Sname(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))
[Figure: query tree with Π_Sname at the root, above σ_{bid = 100 ∧ rating > 5}, above the join ⋈_{Sid=Sid} of R and S]

Π_Sname((σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S)))
[Figure: query tree with Π_Sname at the root, above the join ⋈_{Sid=Sid} of σ_{bid = 100}(R) and σ_{rating > 5}(S)]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with operators annotated "on the fly" where they are applied to tuples as they arrive from the operator below]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
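
The two search methods can be sketched as follows. The in-memory list of records stands in for a file, and the `employees` data and function names are assumptions for illustration.

```python
def linear_search(records, key, value):
    """Scan every record and keep those that satisfy key = value."""
    return [rec for rec in records if rec[key] == value]

def binary_search(sorted_records, key, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k = sorted_records[mid][key]
        if k == value:
            return sorted_records[mid]
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with that key value

# Hypothetical file ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(0, 1000, 7)]

found = binary_search(employees, "ssn", 49)
missing = binary_search(employees, "ssn", 50)
scanned = linear_search(employees, "ssn", 49)
```

The linear search inspects every record, while the binary search inspects roughly log2(n) of them but requires the ordering.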

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). (Note: in this case, the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: A conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: A disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
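
Disjunctive selection by union of record pointers can be sketched with in-memory indexes that map attribute values to record positions. The `build_index` helper, the use of list positions as record pointers, and the sample data are assumptions.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

# Hypothetical relation with an access path on both dno and city.
employee = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Rolla"},
    {"name": "Cy", "dno": 5, "city": "Ames"},
    {"name": "Di", "dno": 1, "city": "Ames"},
]
dno_index = build_index(employee, "dno")
city_index = build_index(employee, "city")

# dno = 5 OR city = 'Rolla': scan each index, then union the pointers.
pointers = dno_index.get(5, set()) | city_index.get("Rolla", set())
result = [employee[rid] for rid in sorted(pointers)]
```

Taking the union of pointers before fetching means each qualifying record is retrieved once, even if it satisfies several of the conditions.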

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
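
The nested loop algorithm above translates almost line for line into Python. This is a minimal sketch: the list-of-dicts relations, the `theta` parameter (defaulting to equality), and the sample data are assumptions.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Compare every tuple of r with every tuple of s (nr * ns comparisons)."""
    result = []
    for tr in r:                          # outer loop
        for ts in s:                      # inner loop
            if theta(tr[a], ts[b]):       # join condition tr.A theta ts.B
                result.append({**tr, **ts})   # tr || ts
    return result

# Hypothetical sample relations.
r = [{"A": 1}, {"A": 2}]
s = [{"B": 2}, {"B": 3}]

equi = nested_loop_join(r, s, "A", "B")
less_than = nested_loop_join(r, s, "A", "B", theta=operator.lt)
```

Because the condition is evaluated for every pair, the same code handles any comparison operator, not just equality.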

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns tuple-pair comparisons.
In the best-case scenario, both relations fit into the physical (main) memory, and hence we need bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical (main) memory, then the cost will also be bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.
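
The block access count can be confirmed by instrumenting a block nested loop join. In this sketch a "block" is simply a slice of a Python list and the counter is incremented once per simulated block read; the block size, the data, and the function names are assumptions.

```python
def blocks(relation, block_size):
    """Split a relation into fixed-size blocks."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    accesses = 0
    result = []
    for r_block in r_blocks:
        accesses += 1                 # read one block of r
        for s_block in s_blocks:
            accesses += 1             # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, accesses

# Hypothetical relations: 10 tuples each, 5 tuples per block, so br = bs = 2.
r = [{"A": i} for i in range(10)]
s = [{"B": i} for i in range(0, 20, 2)]

result, accesses = block_nested_loop_join(r, s, "A", "B", block_size=5)
```

Here the count is 2 × 2 + 2 = 6 block accesses, matching br × bs + br.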
How can we improve block nested loop?

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
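
A join that uses an access structure on s can be sketched with a dictionary standing in for the index on B. In a real system the index would already exist; building it inside the function, as done here, is purely an assumption to keep the example self-contained, as are the sample relations.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    # Stand-in for an existing index or hash key on attribute b of s.
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    # For each tuple of r, probe the index instead of scanning s.
    return [{**tr, **ts} for tr in r for ts in index.get(tr[a], [])]

# Hypothetical sample relations.
department = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Admin"}]
employee = [
    {"Lname": "Smith", "Dno": 5},
    {"Lname": "Wong", "Dno": 5},
    {"Lname": "Zelaya", "Dno": 4},
    {"Lname": "Borg", "Dno": 1},
]

joined = index_join(department, employee, "Dnumber", "Dno")
```

Each tuple of the outer relation costs one index probe rather than a full scan of the inner relation.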

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in the main memory.
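
The merge step can be sketched with two indices playing the role of the pointers. This is one possible formulation, assuming both inputs are already sorted on the join attributes and that each group of equal join values fits in memory; the sample relations are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on the join attributes a and b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

# Hypothetical sorted relations.
r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": 10}, {"B": 2, "y": 11}, {"B": 3, "y": 12}, {"B": 4, "y": 13}]

merged = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backwards past a group, which is why each relation is read only once.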

Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
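
A hash join along these lines can be sketched with an in-memory table of buckets. The number of buckets, the use of Python's built-in `hash`, and the sample relations are assumptions for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    # Each bucket holds the r tuples and the s tuples that hash to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():       # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                # different values may share a bucket
                    result.append({**tr, **ts})
    return result

# Hypothetical sample relations.
r = [{"A": i} for i in range(20)]
s = [{"B": i} for i in range(10, 30)]

joined = hash_join(r, s, "A", "B")
```

Matching tuples always land in the same bucket, so comparisons are confined to tuples within one bucket rather than across the whole relations.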

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_θi s.

Query Optimization: Project Operation
A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.

Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ Mgr_ssn = Ssn}(Project × (Department × Employee)))
[Figure: query tree with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ Mgr_ssn = Ssn}, above the Cartesian product of Project with the Cartesian product of Department and Employee]

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
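
The size of the Cartesian product follows directly from the figures given above: cardinalities multiply and tuple sizes add. A short calculation, using only the numbers stated in the example (the variable names are assumptions):

```python
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# Cardinality of a Cartesian product is the product of the cardinalities.
product_tuples = 1
for n in tuple_count.values():
    product_tuples *= n

# Each result tuple concatenates one tuple from every relation.
product_tuple_bytes = sum(tuple_bytes.values())

total_bytes = product_tuples * product_tuple_bytes
```

This gives 10,000,000 tuples of 300 bytes each, that is, about 3 × 10^9 bytes of intermediate data before the selection is even applied.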

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{Mgr_ssn = Ssn} Employee)
[Figure: query tree with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above the join ⋈_{Mgr_ssn = Ssn} of Employee with the join ⋈_{Dnum = Dnumber} of σ_{Plocation = 'California'}(Project) and Department]
Database Systems
43
Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2
Database Systems
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
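Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines n_r · n_s pairs of tuples.
In the best-case scenario both relations fit into main memory (the physical space), and hence we need only b_s + b_r block accesses.
In the worst case, when the buffer can hold only one block of each relation, s is scanned once for every tuple of r, which costs n_r · b_s + b_r block accesses.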
Query Optimization: JOIN Operation (nested loop)
If only one of the relations fits in main memory, use it as the inner relation; the cost is then still b_s + b_r block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r · b_s + b_r.
How can we improve the block nested loop?
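To make the two cost expressions concrete, the following sketch counts block accesses for both algorithms. It assumes a worst-case buffer that holds only one block of each relation at a time; the helper names (blocks, nested_loop_join, block_nested_loop_join) and the toy relations are illustrative assumptions, not part of the slides.

```python
def blocks(relation, block_size):
    """Split a list of tuples into blocks of at most block_size tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]

def nested_loop_join(r_blocks, s_blocks, theta):
    """Tuple-at-a-time nested loop: s is rescanned for every tuple of r."""
    result, accesses = [], 0
    for br in r_blocks:
        accesses += 1                  # read one block of r
        for tr in br:
            for bs in s_blocks:
                accesses += 1          # read one block of s
                result += [tr + ts for ts in bs if theta(tr, ts)]
    return result, accesses

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested loop: s is rescanned once per block of r."""
    result, accesses = [], 0
    for br in r_blocks:
        accesses += 1                  # read one block of r
        for bs in s_blocks:
            accesses += 1              # read one block of s
            result += [tr + ts for tr in br for ts in bs if theta(tr, ts)]
    return result, accesses

r = [(i,) for i in range(8)]           # n_r = 8
s = [(i % 4,) for i in range(12)]      # n_s = 12
rb, sb = blocks(r, 2), blocks(s, 3)    # b_r = 4, b_s = 4
equal = lambda tr, ts: tr[0] == ts[0]

res_nl, cost_nl = nested_loop_join(rb, sb, equal)
res_bnl, cost_bnl = block_nested_loop_join(rb, sb, equal)
```

With these sizes the counters come out as n_r · b_s + b_r = 36 for the tuple-at-a-time version and b_r · b_s + b_r = 20 for the block version, while both produce the same set of joined tuples.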
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is b_s + b_r, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
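A minimal sketch of the merge step, assuming both inputs are Python lists of tuples that are already sorted on their join attribute. The function name merge_join and the parameters key_r and key_s (positions of the join attributes) are hypothetical. The group of s tuples that share a join value is held in memory, which mirrors the assumption stated above.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join of two lists that are already sorted on their join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1                     # advance the pointer into r
        elif a > b:
            j += 1                     # advance the pointer into s
        else:
            # Find the group of s tuples that share this join value
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with this value against the whole group
            while i < len(r) and r[i][key_r] == a:
                result += [r[i] + s[k] for k in range(j, j_end)]
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

Each pointer only moves forward, so every tuple of r and s is read once.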
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
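The hash-join idea can be sketched as follows, with an in-memory dictionary standing in for the hash file. The function name hash_join, the bucket count n_buckets, and the sample tuples are assumptions made for illustration only.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n_buckets=4):
    """Hash both inputs with the same function, then join bucket by bucket."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (r tuples, s tuples)
    for tr in r:                              # single pass over r
        buckets[hash(tr[key_r]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(ts[key_s]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare explicitly
                if tr[key_r] == ts[key_s]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (5, "y"), (5, "z"), (9, "w")]
joined = sorted(hash_join(r, s, 0, 0))
```

With four buckets the values 1, 5 and 9 land in the same bucket, which shows why tuples in a bucket are only candidate matches until the join attributes are compared.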
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
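Both duplicate-elimination strategies can be illustrated briefly. The sketch below assumes relations are lists of tuples and attributes are addressed by position; the function names and the sample employee tuples are hypothetical.

```python
from itertools import groupby

def project(relation, attrs):
    """Keep only the listed attribute positions; duplicates may appear."""
    return [tuple(t[a] for a in attrs) for t in relation]

def dedup_by_sorting(rows):
    """Sort, then keep one row from each run of equal rows."""
    return [row for row, _ in groupby(sorted(rows))]

def dedup_by_hashing(rows):
    """Keep the first occurrence of each row, using a hash set."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# (ssn, lname, dno); ssn is the key
employee = [(1, "Smith", 5), (2, "Wong", 5), (3, "Smith", 4)]

with_key = project(employee, [0, 1])     # includes the key: no duplicates
without_key = project(employee, [1])     # no key: "Smith" appears twice
by_sorting = dedup_by_sorting(without_key)
by_hashing = dedup_by_hashing(without_key)
```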
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
        ×
            Project
            ×
                Department
                Employee
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
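The arithmetic behind these figures can be checked with a short calculation. The dictionary and variable names below are illustrative; the tuple sizes and counts are the ones given in the example.

```python
import math

tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple,
# so the cardinalities multiply and the tuple widths add up.
product_tuples = math.prod(tuple_count.values())
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, total_bytes)
```

The intermediate relation therefore holds 100 · 20 · 5000 = 10,000,000 tuples of 300 bytes each, about 3 GB, before the selection discards most of them.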
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
Query Optimization: An Example
Query tree (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MGRSSN = SSN}
        ⋈_{Dnum = Dnumber}
            σ_{Plocation = 'California'}
                Project
            Department
        Employee
Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))
(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
Optimization Process
General rule: It is a good idea to perform the restriction before the join because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.
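A small experiment illustrates the effect of restricting before joining. The relations A and B, the join and restrict helpers, and the restriction on_b are all made up for this sketch; the point is only the size of what the join has to process.

```python
def join(a, b, key_a, key_b):
    """Naive equi-join of two lists of tuples on the given positions."""
    return [ta + tb for ta in a for tb in b if ta[key_a] == tb[key_b]]

def restrict(relation, predicate):
    return [t for t in relation if predicate(t)]

A = [(i, f"a{i}") for i in range(100)]      # (key, payload)
B = [(i % 100, i) for i in range(1000)]     # (join key, value)
on_b = lambda t: t[1] < 50                  # restriction-on-B

# (A Join B) WHERE restriction-on-B
intermediate = join(A, B, 0, 0)
late = restrict(intermediate, lambda t: on_b(t[2:]))

# A Join (B WHERE restriction-on-B)
b_small = restrict(B, on_b)
early = join(A, b_small, 0, 0)
```

Both plans return the same 50 tuples, but the first one builds a 1000-tuple intermediate result, while the second feeds only 50 tuples of B into the join.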
Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
Optimization Process
General rule: Transform a restriction condition into an equivalent condition in conjunctive normal form because:
A condition that is in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where conjuncts can be evaluated in parallel.
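The equivalence of the two forms, and the early exit that conjunctive normal form allows, can be checked with a truth table. The helper names equivalent and evaluate_cnf are assumptions for this sketch.

```python
from itertools import product

def equivalent(f, g, n_vars):
    """Compare two boolean functions on every combination of truth values."""
    return all(f(*values) == g(*values)
               for values in product([False, True], repeat=n_vars))

original = lambda p, q, r: p or (q and r)
cnf = lambda p, q, r: (p or q) and (p or r)
same = equivalent(original, cnf, 3)

def evaluate_cnf(conjuncts, row):
    """all() stops at the first conjunct that evaluates to False."""
    return all(conjunct(row) for conjunct in conjuncts)

conjuncts = [lambda row: row["a"] > 1, lambda row: row["b"] < 5]
rejected = evaluate_cnf(conjuncts, {"a": 2, "b": 9})
accepted = evaluate_cnf(conjuncts, {"a": 2, "b": 3})
```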
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
General rule: A sequence of restrictions can be combined into a single restriction.
Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
General rule: A sequence of projections can be transformed into a single projection.
Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction, i.e., the order of the two operations can be interchanged.
Optimization Process
Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.
(SP Join P) [S]
However, we know that the part number in SP is a foreign key referencing P, so every SP tuple matches some tuple of P. Therefore the above query is semantically equivalent to
SP [S]
Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Optimization Process
Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection is needed; the others can be omitted (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
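Rules 1, 2 and 4 can be spot-checked on toy data. This is a check on one example, not a proof; relations are modeled as Python sets of tuples, and the helpers select, cartesian and theta_join, the relations E1 and E2, and the predicates theta1 and theta2 are assumptions for illustration.

```python
from itertools import product

def select(pred, rel):
    return {t for t in rel if pred(t)}

def cartesian(e1, e2):
    return {a + b for a, b in product(e1, e2)}

def theta_join(e1, e2, theta):
    return {a + b for a in e1 for b in e2 if theta(a + b)}

E1 = {(1, 10), (2, 20), (3, 30)}
E2 = {(1, "x"), (3, "y"), (4, "z")}
theta1 = lambda t: t[0] == t[2]     # compares an E1 attribute with an E2 attribute
theta2 = lambda t: t[1] > 10        # involves an E1 attribute only
both = lambda t: theta1(t) and theta2(t)
X = cartesian(E1, E2)

rule1 = select(both, X) == select(theta1, select(theta2, X))
rule2 = select(theta1, select(theta2, X)) == select(theta2, select(theta1, X))
rule4 = select(theta1, X) == theta_join(E1, E2, theta1)
rule4_ext = select(theta2, theta_join(E1, E2, theta1)) == theta_join(E1, E2, both)
```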
Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
(Query trees on the slide: ⋈_θ with children E1, E2 is equivalent to ⋈_θ with children E2, E1.)
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(Query trees on the slide: the join of (E1 ⋈ E2) with E3 is equivalent to the join of E1 with (E2 ⋈ E3).)
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, at most one tuple qualifies, so the selectivity is
s = 1 / |r(R)|
For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) · |r(R)|
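The estimate can be compared with a measured selectivity on synthetic data. The sketch assumes values are uniformly distributed; the function names and the generated employee relation (5000 tuples, 20 distinct department numbers) are illustrative assumptions.

```python
import math

def measured_selectivity(relation, predicate):
    """Fraction of tuples that actually satisfy the predicate."""
    return sum(1 for t in relation if predicate(t)) / len(relation)

def estimated_selectivity(distinct_values):
    """1 / i, assuming the i distinct values are uniformly distributed."""
    return 1 / distinct_values

def estimated_result_size(cardinality, distinct_values):
    """Expected number of tuples matching an equality condition."""
    return cardinality / distinct_values

# Synthetic relation: (ssn, dno) with 20 evenly spread dno values
employee = [(ssn, ssn % 20) for ssn in range(5000)]

actual = measured_selectivity(employee, lambda t: t[1] == 5)
estimate = estimated_selectivity(20)
size = estimated_result_size(len(employee), 20)
```

On skewed data the measured value would drift away from 1 / i, which is one reason real systems keep more detailed statistics.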
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
Optimization Process
Rule 8:
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
(Query trees on the slide: σ_{θ0} applied above ⋈_θ with children E1, E2 is equivalent to ⋈_θ with children σ_{θ0}(E1) and E2.)
Optimization Process
Rule 9: The projection operation distributes over theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations. For set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Optimization Process
Rule 12 (set union):
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Optimization Process
Rule 12 (set intersection):
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: in general, projection does not distribute over set intersection or set difference. For example, with E1 = {(1, a)}, E2 = {(1, b)} and L the first attribute, Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.
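The set-operation rules can be spot-checked with Python sets standing in for relations. This is a check on toy data rather than a proof; the helpers select and project, the relations E1 and E2, and the predicate p are made up. On this data the check suggests that projection distributes over union but, in general, not over intersection.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}

def project(positions, rel):
    return {tuple(t[i] for i in positions) for t in rel}

E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "b"), (2, "b"), (4, "d")}
p = lambda t: t[0] <= 2

# Rule 12: selection over difference, intersection and union
diff_both = select(p, E1 - E2) == select(p, E1) - select(p, E2)
diff_left = select(p, E1 - E2) == select(p, E1) - E2
inter_left = select(p, E1 & E2) == select(p, E1) & E2
union_both = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_left = select(p, E1 | E2) == select(p, E1) | E2     # does not hold

# Projection over union and intersection
proj_union = project([0], E1 | E2) == project([0], E1) | project([0], E2)
proj_inter = project([0], E1 & E2) == project([0], E1) & project([0], E2)
```

Here union_left is False because (4, "d") from E2 survives on the right-hand side, and proj_inter is False because (1,) appears in both projections although E1 and E2 share no tuple with first attribute 1.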
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
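Definition
Materialized evaluation: The result of each operation is generated as an intermediate relation, which is then used as input to the next operation.
Pipelined evaluation: Several operations are combined into a pipeline, with the results of one operation passed on to the next as they are produced.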
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
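Python generators give a convenient way to sketch the difference between the two styles. The code assumes a natural join on a hypothetical common attribute k and represents tuples as dictionaries; the function names and sample data are not from the slides.

```python
def join(r, s):
    """Natural join on the (assumed) common attribute "k"; yields one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(rows, attrs):
    """Project each incoming tuple as soon as it arrives."""
    for row in rows:
        yield tuple(row[a] for a in attrs)

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 3, "a2": 30}]

# Materialized evaluation: the whole join result is stored first
intermediate = list(join(r, s))
result_materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection
result_pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both variants return the same tuples; the pipelined one never holds the full join result in memory.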
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
Query tree (root first, children indented):
Π_{Sname}
    σ_{bid = 100 ∧ rating > 5}
        ⋈_{Sid = Sid}
            R
            S
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
Query tree (root first, children indented):
Π_{Sname}
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
Query trees annotated for pipelined execution (root first, children indented):
Π_{Sname} (on the fly)
    σ_{bid = 100 ∧ rating > 5} (on the fly)
        ⋈_{Sid = Sid}
            R
            S
Π_{Sname} (on the fly)
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
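The difference in search effort can be sketched as follows. The file is modeled as an in-memory list of dictionaries ordered on a hypothetical key attribute ssn; both functions return the number of records they examined so the two methods can be compared.

```python
def linear_search(records, key, value):
    """Examine every record in turn; works on unordered files."""
    matches, probes = [], 0
    for rec in records:
        probes += 1
        if rec[key] == value:
            matches.append(rec)
    return matches, probes

def binary_search(sorted_records, key, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi, probes = 0, len(sorted_records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid], probes
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

employees = [{"ssn": i * 10} for i in range(1000)]   # ordered on ssn

found_linear, probes_linear = linear_search(employees, "ssn", 7770)
found_binary, probes_binary = binary_search(employees, "ssn", 7770)
missing, _ = binary_search(employees, "ssn", 7775)
```

On 1000 records the linear scan examines all 1000, while the binary search needs at most 10 probes.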
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
40
Optimization ProcessGeneral rule It is a good idea to perform
the restriction before the join becauseIt reduces the size of the input to the join
operationIt reduces the size of the output from the join
Database Systems
41
Optimization Process
WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)
Database Systems
42
Optimization ProcessGeneral rule Transform restriction condition
into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form
evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel
Database Systems
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

Optimization Process
General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

Optimization Process
General rule: A sequence of projections can be transformed into a single projection.

Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction, that is, the order of the two operations can be exchanged.

Optimization Process
Finally, consider the following query: Get the supplier numbers who supply at least one part.
(SP Join P) [S]
However, we know that P is the foreign key in SP; therefore the above query is semantically equivalent to
SP [S]

Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σθ1∧θ2(E) = σθ1(σθ2(E))

Optimization Process
Rule 2: The selection operation is commutative.
σθ1(σθ2(E)) = σθ2(σθ1(E))

Optimization Process
Rule 3: A sequence of projections is the same as the final (outermost) projection operation alone (cascade of projections).
ΠL1(ΠL2(… (ΠLn(E))…)) = ΠL1(E)

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σθ(E1 × E2) = E1 ⋈θ E2
This can be extended to
σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2

Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈θ E2 = E2 ⋈θ E1
[Figure: query trees for E1 ⋈θ E2 and E2 ⋈θ E1]

Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: query trees for (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3)]

Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivity as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

Selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) |r(R)|

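As a worked illustration of the selectivity formulas, the sketch below estimates how many tuples an equality search returns. The relation size (5000 tuples) and the number of distinct values (20) are hypothetical figures chosen for the example, and the helper names are not from the slides. Exact fractions are used so the arithmetic is not blurred by floating-point rounding.

```python
from fractions import Fraction

def equality_selectivity(cardinality, distinct_values=None):
    """Selectivity of an equality condition, assuming uniformly distributed values.

    With distinct_values=None the attribute is treated as a key.
    """
    if distinct_values is None:
        return Fraction(1, cardinality)      # s = 1 / |r(R)|
    return Fraction(1, distinct_values)      # s = 1 / i

def estimated_tuples(cardinality, selectivity):
    # Expected size of the selection result: s * |r(R)|
    return selectivity * cardinality

n = 5000  # hypothetical |r(R)|
on_key = estimated_tuples(n, equality_selectivity(n))
on_20_values = estimated_tuples(n, equality_selectivity(n, distinct_values=20))
print(on_key, on_20_values)
```

A search on the key is expected to return a single tuple, while a search on the 20-valued attribute is expected to return 5000 / 20 tuples.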
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).
σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
[Figure: query trees for both sides of Rule 8, with σθ0 above the join on the left and σθ0 applied directly to E1 on the right]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))

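Rule 8 can be tried out on a small example. The sketch below is only an illustration: the relations `R` and `S`, their attribute names and their contents are made up, and `theta_join` and `select` are hypothetical helpers rather than any real library API. It applies the selection after the join and before the join and compares the results.

```python
def theta_join(e1, e2, theta):
    # Naive theta join: keep every concatenated pair that satisfies theta.
    return [{**a, **b} for a in e1 for b in e2 if theta(a, b)]

def select(e, pred):
    return [t for t in e if pred(t)]

R = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 101}, {"sid": 3, "bid": 100}]
S = [{"s_sid": 1, "rating": 7}, {"s_sid": 2, "rating": 9}, {"s_sid": 3, "rating": 4}]

def theta(a, b):
    return a["sid"] == b["s_sid"]

def theta0(t):
    # Refers only to attributes of R, as Rule 8 requires.
    return t["bid"] == 100

late = select(theta_join(R, S, theta), theta0)     # selection after the join
early = theta_join(select(R, theta0), S, theta)    # selection pushed below the join
print(late == early, len(select(R, theta0)))
```

Both orders produce the same tuples, but in the second the join only sees the two tuples of R that survive the selection, which is the saving the "restrict before join" rule aims for.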
Optimization Process
Rule 10: Set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection and set difference operations.
σp(E1 − E2) = σp(E1) − σp(E2)
σp(E1 − E2) = σp(E1) − E2

Optimization Process
Rule 12
σp(E1 ∪ E2) = σp(E1) ∪ σp(E2)
σp(E1 ∪ E2) ≠ σp(E1) ∪ E2

Optimization Process
Rule 12
σp(E1 ∩ E2) = σp(E1) ∩ σp(E2)
σp(E1 ∩ E2) = σp(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over the set union operation.
ΠL(E1 ∪ E2) = ΠL(E1) ∪ ΠL(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L.

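The behaviour of projection under the three set operations can be checked on a small example. The sketch below uses made-up relations `E1` and `E2` whose tuples share a value in the projected position but differ elsewhere; `project` is a hypothetical helper that treats a relation as a Python set of tuples. On this data the identity holds for union and fails for intersection and difference.

```python
def project(relation, positions):
    # Projection with duplicate elimination, since the result is a set.
    return {tuple(t[i] for i in positions) for t in relation}

E1 = {(1, "a"), (2, "b")}
E2 = {(1, "b"), (3, "c")}
L = (0,)  # project on the first attribute only

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
inter_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
diff_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)
print(union_ok, inter_ok, diff_ok)
```

The tuples (1, "a") and (1, "b") are different, so they are absent from the intersection, yet both project to the value 1; that is what breaks the last two identities.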
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
existence of indexes or other access paths (to reduce I/O cost), and
physical clustering of records (to reduce I/O cost), …

Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: Generation of an intermediate result (relation) that is stored before the next operation reads it.
Pipeline evaluation: Combining several operations so that each tuple produced by one operation is passed directly to the next.

Assume we want to perform
Πa1,a2 (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.

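The two evaluation styles can be contrasted with Python generators. This is a sketch under stated assumptions: the relations `r` and `s`, the shared join attribute `k`, and the helpers `join` and `project` are all invented for the illustration.

```python
def join(r, s):
    """Natural join on the shared attribute 'k', produced one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the full join result is stored before the projection starts.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
print(materialized == pipelined)
```

Both variants return the same relation; the pipelined one never holds the intermediate join result in memory as a whole.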
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

ΠSname (σbid=100 ∧ rating>5 (R ⋈Sid=Sid S))
[Figure: query tree with ΠSname at the root, the selection σbid=100 ∧ rating>5 below it, and the join of R and S on Sid = Sid at the bottom]

ΠSname ((σbid=100 R) ⋈Sid=Sid (σrating>5 S))
[Figure: query tree with ΠSname at the root, the join on Sid = Sid below it, and the selections σbid=100 on R and σrating>5 on S at the leaves]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with operators annotated "On the fly"]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σSSN=123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note: in this case the data is also sorted.
σDNUMBER>5 (DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σDNO=5 (EMPLOYEE)

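The equality and range searches on an ordered file can be sketched with Python's standard `bisect` module. The list `dnumbers` stands in for a DEPARTMENT file ordered on DNUMBER; its contents and the helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
dnumbers = [1, 3, 4, 5, 7, 8, 10]

def select_equal(keys, value):
    # Binary search for the single record with this key, if it exists.
    i = bisect_left(keys, value)
    return keys[i:i + 1] if i < len(keys) and keys[i] == value else []

def select_greater(keys, value):
    # Locate the boundary once, then read all subsequent records.
    return keys[bisect_right(keys, value):]

print(select_equal(dnumbers, 5), select_greater(dnumbers, 5))
```

The range search performs one logarithmic lookup and then a sequential read, which mirrors "find the record for the equality condition, then retrieve the subsequent records".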
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σθ1∧θ2∧…∧θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1∨θ2∨…∨θn (r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.

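Disjunctive selection by union of record pointers can be sketched as follows. The `employees` data, the dictionary-based indexes and the use of list positions as record pointers are all assumptions made for this illustration.

```python
employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 45000},
    {"ssn": 3, "dno": 5, "salary": 45000},
    {"ssn": 4, "dno": 1, "salary": 25000},
]

def build_index(records, attr):
    # Map each attribute value to the set of record pointers (list positions).
    index = {}
    for pointer, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(pointer)
    return index

dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# Selection dno = 5 OR salary = 45000: union the pointer sets, then fetch once.
pointers = dno_index.get(5, set()) | salary_index.get(45000, set())
result = [employees[p] for p in sorted(pointers)]
print([rec["ssn"] for rec in result])
```

Taking the union on pointers rather than on records means that a tuple satisfying both conditions is fetched only once.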
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈r.A Θ s.B s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end

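The nested loop algorithm translates almost line for line into Python. In this sketch the relations are lists of dictionaries, Θ is passed as a comparison function from the standard `operator` module, and the sample data is invented.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    result = []
    for tr in r:                            # outer loop over r
        for ts in s:                        # inner loop over s
            if theta(tr[a], ts[b]):         # test tr.A Θ ts.B
                result.append({**tr, **ts}) # tr || ts
    return result

r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}]
s = [{"B": 2, "y": "u"}, {"B": 3, "y": "v"}]

equi = nested_loop_join(r, s, "A", "B")               # Θ is =
less = nested_loop_join(r, s, "A", "B", operator.lt)  # Θ is <
print(len(equi), len(less))
```

Because every pair of tuples is tested, the same code works for any comparison Θ, not only for equality.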
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (the number of tuple pairs examined).
In the best-case scenario both relations fit into the physical space (main memory), and hence we need bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then bs + br block accesses will be the cost.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop in terms of the number of block accesses is br × bs + br.
How can we improve block nested loop?

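The block access count of the block nested loop join can be confirmed by instrumenting a small implementation. In this sketch a "block" is simply a Python list holding a fixed number of tuples, and `block_reads` is a counter added for the illustration; the data and block size are hypothetical.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads

def to_blocks(tuples, per_block):
    # Split a relation into consecutive blocks of per_block tuples.
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]

r_blocks = to_blocks([{"A": i} for i in range(6)], 2)   # br = 3
s_blocks = to_blocks([{"B": i} for i in range(8)], 2)   # bs = 4
result, reads = block_nested_loop_join(r_blocks, s_blocks, "A", "B")
print(reads, len(r_blocks) * len(s_blocks) + len(r_blocks))
```

Each block of r is read once and each block of s is read once per block of r, so the counter ends at br × bs + br.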
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈A=B s

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.

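The merge step described above can be sketched in Python as follows. The function assumes both inputs are already sorted on their join attributes and that the group of s tuples sharing one join value fits in memory, as the slide states; the sample data is invented.

```python
def merge_join(r, s, a, b):
    """Join two lists that are already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2}, {"B": 2}, {"B": 3}, {"B": 4}]
joined = merge_join(r, s, "A", "B")
print(len(joined))
```

Neither pointer ever moves backwards past a group boundary, which is why each relation is scanned only once.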
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.

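A minimal in-memory sketch of the hash join follows. The "hash file" is modelled as a dictionary of buckets, the hashing function is simply the join value modulo a hypothetical bucket count, and integer join values are assumed; none of these choices come from the slides.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    # Each bucket holds the r tuples and the s tuples that hashed to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                              # single pass over r
        buckets[tr[a] % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[ts[b] % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:            # different values can share a bucket
                    result.append({**tr, **ts})
    return result

r = [{"A": v} for v in (1, 2, 5, 9)]
s = [{"B": v} for v in (1, 5, 5, 7)]
joined = hash_join(r, s, "A", "B")
print(sorted((t["A"], t["B"]) for t in joined))
```

Only tuples in the same bucket are compared, and the final equality test is still needed because the values 1, 5 and 9 all land in one bucket here.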
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈θ1∧θ2∧…∧θn s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈θ1∨θ2∨…∨θn s
The join can be performed as the union of the tuples in the individual joins r ⋈θi s.

Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.

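Both duplicate-elimination strategies can be sketched briefly. The `employee` data and the helper names are hypothetical; `itertools.groupby` is used for the sort-based variant because it collapses adjacent equal rows.

```python
from itertools import groupby

def project_with_sort(relation, attrs):
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    return [key for key, _ in groupby(rows)]   # adjacent duplicates collapse

def project_with_hash(relation, attrs):
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:                    # hash lookup detects duplicates
            seen.add(row)
            out.append(row)
    return out

employee = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5}]
print(project_with_sort(employee, ["dno"]), project_with_hash(employee, ["dno"]))
```

Projecting on `dno` alone produces a duplicate that must be removed, whereas any attribute list containing the key `ssn` cannot produce one.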
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement union, intersection and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into:
ΠPnumber,Dnum,Lname,Address,Bdate (σPlocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN (Project × (Department × Employee)))

Query Optimization: An Example
[Figure: query tree with the projection at the root, the combined selection below it, and the Cartesian products Project × (Department × Employee) at the bottom]

Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department and Employee relations had tuple sizes of 100, 50 and 150 bytes and contained 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.

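The size of the Cartesian product follows directly from the figures given. The short calculation below reproduces it; the dictionary layout and variable names are only a convenient way to hold the numbers stated in the example.

```python
relations = {
    "Project": {"tuples": 100, "bytes": 100},
    "Department": {"tuples": 20, "bytes": 50},
    "Employee": {"tuples": 5000, "bytes": 150},
}

product_tuples = 1
tuple_bytes = 0
for stats in relations.values():
    product_tuples *= stats["tuples"]   # cardinalities multiply
    tuple_bytes += stats["bytes"]       # tuple widths add up

total_bytes = product_tuples * tuple_bytes
print(product_tuples, tuple_bytes, total_bytes)
```

The product has 100 × 20 × 5000 tuples of 100 + 50 + 150 bytes each, so the intermediate relation would occupy about 3 billion bytes before any selection is applied.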
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
ΠPnumber,Dnum,Lname,Address,Bdate (((σPlocation="California" (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))

Query Optimization: An Example
[Figure: query tree with the projection at the root, the join with Employee on MGRSSN = SSN below it, then the join with Department on Dnum = Dnumber, and the selection σPlocation="California" applied directly to Project]

- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
41
Optimization Process
WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)
Database Systems
42
Optimization ProcessGeneral rule Transform restriction condition
into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form
evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel
Database Systems
43
Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2
Database Systems
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
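The nested loop algorithm translates almost line for line into Python. This is a sketch, assuming relations are lists of tuples, the join attributes are given as column positions `a` and `b`, and Θ is passed as a comparison function; these conventions are illustrative.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b] by comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition tr.A theta ts.B
                result.append(tr + ts)    # tr || ts (tuple concatenation)
    return result

r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]
print(nested_loop_join(r, s, 0, 0))                # equi-join
print(nested_loop_join(r, s, 0, 0, operator.lt))   # theta join with <
```

Because the condition is an arbitrary function, the same loop handles any Θ, which is the main strength of the nested loop join.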
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns, the number of tuple pairs examined.
In the best case, both relations fit in the available physical (main) memory, and hence we need only bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If even one of the relations fits in main memory, the cost is still bs + br block accesses: keep that relation in memory and read the other relation once.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
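Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop join, in terms of the number of block accesses, is br × bs + br: each block of the outer relation r is read once (br accesses), and for each of those blocks the whole inner relation s is read (br × bs accesses).
How can we improve the block nested loop join?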
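To make the block-access count concrete, here is a small sketch of the block nested loop join that also counts block reads. It assumes each relation is given as a list of blocks (each block a list of tuples) and uses an equality condition for simplicity; the function name and the sample data are illustrative.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join on r[a] = s[b]; returns (result, number of block reads)."""
    result = []
    block_reads = 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

# br = 3 blocks of r, bs = 2 blocks of s.
r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]
s_blocks = [[(2, "a"), (3, "b")], [(5, "c"), (9, "d")]]
rows, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
print(rows, reads)
```

With br = 3 and bs = 2 the counter ends at 3 × 2 + 3 = 9, matching the formula. Using the relation with fewer blocks as the outer relation lowers the br term.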
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
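A sketch of this index-based join, assuming a Python dict plays the role of the index (or hash key) on attribute B of s. The helper names are hypothetical.

```python
from collections import defaultdict

def build_hash_index(s, b):
    """Access structure on s: maps each value of column b to its records."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index on s instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (2, "z")]
s = [(2, "p"), (3, "q"), (2, "w")]
print(index_join(r, build_hash_index(s, 0), 0))
```

Each tuple of r costs one index probe rather than a full scan of s, so the relation without the index is the natural choice for the outer loop.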
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
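The merge step can be sketched as below, assuming both inputs are lists of tuples already sorted on the join columns and that each group of equal join values fits in memory (the same assumption the slide makes). The function name is illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on their join columns a and b."""
    result = []
    i = j = 0                             # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer on r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer on s
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple of the group with every s tuple of the group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```

Neither pointer ever moves backwards past a finished group, which is what keeps the scan linear in the sizes of r and s.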
Query Optimization: JOIN Operation
Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
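A minimal in-memory sketch of the hash join described above. It assumes Python's built-in `hash` as the shared hashing function and a fixed, arbitrarily chosen number of buckets; both are illustrative choices, not part of the slides.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join on r[a] = s[b] using one shared set of hash buckets."""
    # Each bucket keeps the r records and the s records that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # different values can share a bucket
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (10, "z")]
print(sorted(hash_join(r, s, 0, 0)))
```

Records with equal join values always land in the same bucket, so matches can only be found within a bucket; the equality test is still needed because unequal values may collide.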
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
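Both ways of eliminating duplicates after a projection can be sketched briefly. The example assumes records are Python dicts and uses a `set` as the hashing technique; the function names are hypothetical.

```python
def project_sort(records, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(rec[a] for a in attrs) for rec in records)
    result = []
    for row in projected:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result

def project_hash(records, attrs):
    """Project and drop duplicates with a hash set, keeping first-seen order."""
    seen = set()
    result = []
    for rec in records:
        row = tuple(rec[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

employees = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5}]
print(project_sort(employees, ["dno"]))
print(project_hash(employees, ["dno"]))
```

Projecting on `["ssn", "dno"]` would need no duplicate elimination at all, since `ssn` is a key in this sample data.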
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree for the expression above (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes (about 3 GB in total).
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
Query Optimization: An Example
Query tree for the expression above (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = "California"}
        Project
      Department
    Employee
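The difference between the two plans for this example can be checked on toy data. The sketch below assumes reduced schemas, Project as (Pnumber, Plocation, Dnum), Department as (Dnumber, Mgrssn), and Employee as (Ssn, Lname), and projects only (Pnumber, Dnum, Lname); the function names and rows are made up for illustration.

```python
def naive_plan(project, department, employee):
    """Cartesian products first, then one big selection, then projection."""
    cartesian = [(p, d, e) for p in project for d in department for e in employee]
    result = {(p[0], p[2], e[1]) for p, d, e in cartesian
              if p[1] == "California" and p[2] == d[0] and d[1] == e[0]}
    return result, len(cartesian)        # size of the intermediate relation

def improved_plan(project, department, employee):
    """Selection first, then the two joins, then projection."""
    californian = [p for p in project if p[1] == "California"]
    with_dept = [(p, d) for p in californian for d in department if p[2] == d[0]]
    with_mgr = [(p, d, e) for p, d in with_dept for e in employee if d[1] == e[0]]
    result = {(p[0], p[2], e[1]) for p, d, e in with_mgr}
    largest = max(len(californian), len(with_dept), len(with_mgr))
    return result, largest               # size of the largest intermediate relation

project = [(1, "California", 10), (2, "Texas", 10), (3, "California", 20)]
department = [(10, 100), (20, 200)]
employee = [(100, "Smith"), (200, "Wong"), (300, "Lee")]
print(naive_plan(project, department, employee))
print(improved_plan(project, department, employee))
```

Both plans return the same answer, but the first materializes 3 × 2 × 3 = 18 combined tuples while the second never holds more than 2, which mirrors the 10-million-tuple blow-up discussed for the full-size relations.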
Optimization Process
General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because:
A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
Optimization Process
General rule: A sequence of restrictions can be combined into a single restriction.
Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
Optimization Process
General rule: A sequence of projections can be transformed into a single projection.
Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction.
Optimization Process
Finally, consider the following query:
Get the supplier numbers of suppliers who supply at least one part
(SP JOIN P) [S]
However, we know that P is a foreign key in SP; therefore, every SP tuple matches a tuple of P, and the above query is semantically equivalent to
SP [S]
Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Optimization Process
Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
The selectivity of an equality condition on an attribute with i distinct values is
s = (|r(R)| / i) / |r(R)| = 1 / i
assuming the values are uniformly distributed. Hence, the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
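These selectivity estimates are simple enough to compute directly. The sketch below assumes uniformly distributed values and uses exact fractions to avoid rounding; the function names and the sample sizes (5000 tuples, 20 distinct values) are illustrative.

```python
from fractions import Fraction

def key_equality_selectivity(n_tuples):
    """Equality on a key attribute: at most one of n_tuples matches."""
    return Fraction(1, n_tuples)

def equality_selectivity(distinct_values):
    """Equality on an attribute with i distinct values, assumed uniform: 1 / i."""
    return Fraction(1, distinct_values)

def estimated_size(n_tuples, selectivity):
    """Estimated number of tuples in the result: selectivity * |r(R)|."""
    return selectivity * n_tuples

# Hypothetical relation with 5000 tuples and 20 distinct values of the attribute.
print(estimated_size(5000, key_equality_selectivity(5000)))
print(estimated_size(5000, equality_selectivity(20)))
```

The optimizer would use such estimates to predict the size of intermediate relations; with skewed data the uniformity assumption can be far off, which is why real systems keep richer statistics.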
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case)
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
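Optimization Process
Rule 8
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
Query tree before the transformation (root at the top):
σ_{θ0}
  ⋈_{θ}
    E1
    E2
Query tree after the transformation (root at the top):
⋈_{θ}
  σ_{θ0}
    E1
  E2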
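Rule 8 can be checked on a small example, together with the work it saves. The sketch assumes relations are lists of tuples, a nested loop implementation of the theta join, and a selection condition that looks only at the E1 part of a tuple; all names and data are illustrative.

```python
def theta_join(e1, e2, theta):
    """Nested loop theta join; also reports how many tuple pairs were compared."""
    result = [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)]
    return result, len(e1) * len(e2)

def select_after_join(e1, e2, theta, theta0):
    """Left-hand side of Rule 8: join first, then select on the E1 attributes."""
    joined, pairs = theta_join(e1, e2, theta)
    width = len(e1[0]) if e1 else 0          # number of columns contributed by E1
    return [t for t in joined if theta0(t[:width])], pairs

def select_before_join(e1, e2, theta, theta0):
    """Right-hand side of Rule 8: select on E1 first, then join."""
    return theta_join([t for t in e1 if theta0(t)], e2, theta)

reserves = [(1, 100), (2, 101), (3, 100)]    # hypothetical (sid, bid)
sailors = [(1, 7), (2, 9), (3, 4)]           # hypothetical (sid, rating)
same_sid = lambda t1, t2: t1[0] == t2[0]
bid_is_100 = lambda t: t[1] == 100
print(select_after_join(reserves, sailors, same_sid, bid_is_100))
print(select_before_join(reserves, sailors, same_sid, bid_is_100))
```

Both sides produce the same tuples, but pushing the selection below the join reduces the pairs compared from 9 to 6 in this example.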
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
Optimization Process
Rule 12
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
Optimization Process
Rule 12
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
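The six forms of Rule 12 can be tested on a toy example. The sketch assumes relations are modelled as Python sets and the predicate p as a function; `check_rule_12` and its result keys are hypothetical names.

```python
def select(p, e):
    """Selection over a relation modelled as a Python set."""
    return {t for t in e if p(t)}

def check_rule_12(e1, e2, p):
    """Evaluate both sides of each form of Rule 12 and report whether they agree."""
    return {
        "diff_both":  select(p, e1 - e2) == select(p, e1) - select(p, e2),
        "diff_left":  select(p, e1 - e2) == select(p, e1) - e2,
        "union_both": select(p, e1 | e2) == select(p, e1) | select(p, e2),
        "union_left": select(p, e1 | e2) == select(p, e1) | e2,
        "inter_both": select(p, e1 & e2) == select(p, e1) & select(p, e2),
        "inter_left": select(p, e1 & e2) == select(p, e1) & e2,
    }

is_even = lambda t: t % 2 == 0
print(check_rule_12({1, 2, 3, 4}, {3, 4, 5, 6}, is_even))
```

On this data only the `union_left` form fails: applying the selection to E1 alone lets the unselected tuples of E2 into the result, which is why that form is written with ≠.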
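Optimization Process
Rule 13: The projection operation distributes over set union.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
The corresponding identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L. Only the following containments are guaranteed:
Π_{L}(E1 ∩ E2) ⊆ Π_{L}(E1) ∩ Π_{L}(E2)
Π_{L}(E1 − E2) ⊇ Π_{L}(E1) − Π_{L}(E2)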
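It is worth checking Rule 13 on a small example, since projection can merge tuples that differ outside L. The sketch below assumes relations are sets of tuples and L is a tuple of column positions; the names are illustrative. On this data the union form holds, while the intersection and difference forms do not.

```python
def project(e, attrs):
    """Projection over a relation modelled as a set of tuples."""
    return {tuple(t[a] for a in attrs) for t in e}

def check_rule_13(e1, e2, attrs):
    """Compare both sides of the projection identity for each set operation."""
    return {
        "union": project(e1 | e2, attrs) == project(e1, attrs) | project(e2, attrs),
        "intersection": project(e1 & e2, attrs) == project(e1, attrs) & project(e2, attrs),
        "difference": project(e1 - e2, attrs) == project(e1, attrs) - project(e2, attrs),
    }

# Two tuples that agree on column 0 but differ on column 1.
e1 = {(1, "a")}
e2 = {(1, "b")}
print(check_rule_13(e1, e2, (0,)))      # project on column 0 only
print(check_rule_13(e1, e2, (0, 1)))    # project on all columns
```

When L covers all the attributes, the projection merges nothing and all three forms agree; the failures appear only when distinct tuples collapse to the same projected value.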
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
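Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate (temporary) relation, which is then used as input to the next operation.
Pipelined evaluation: Several operations are combined so that the tuples produced by one operation are passed directly to the next, without storing the intermediate result.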
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the resulting relation, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
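The two evaluation strategies can be contrasted with Python generators. This is a sketch under assumptions: relations are lists of tuples, the join is an equi-join on the given column positions, and the projection does not eliminate duplicates; all names are illustrative.

```python
def join(r, s, a, b):
    """Yield joined tuples one at a time (equi-join on r[a] = s[b])."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield tr + ts

def project(tuples, attrs):
    """Yield the projection of each incoming tuple (no duplicate elimination)."""
    for t in tuples:
        yield tuple(t[i] for i in attrs)

def materialized(r, s, a, b, attrs):
    temp = list(join(r, s, a, b))          # whole intermediate relation is stored
    return list(project(temp, attrs))

def pipelined(r, s, a, b, attrs):
    # Each joined tuple flows straight into the projection; nothing is stored.
    return list(project(join(r, s, a, b), attrs))

r = [(1, "x"), (2, "y")]
s = [(1, 10), (2, 20), (2, 30)]
print(materialized(r, s, 0, 0, (1, 3)))
print(pipelined(r, s, 0, 0, (1, 3)))
```

Both functions return the same rows; the difference is that `materialized` holds the full join result in `temp`, while `pipelined` keeps only one joined tuple in flight at a time.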
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further, assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
Query tree (root at the top):
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
Query tree (root at the top):
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
In this case, articulate the way the previous query is going to be executed.
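Pipelined execution of the first plan (root at the top):
Π_{Sname} (on the fly)
  σ_{bid = 100 ∧ rating > 5} (on the fly)
    ⋈_{Sid = Sid}
      R
      S
Pipelined execution of the second plan (root at the top):
Π_{Sname} (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S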
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
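The two search methods can be sketched as follows, assuming the file is a Python list and, for the binary search, that it is ordered on the key and a parallel sorted list of keys is available. The function names and sample rows are illustrative.

```python
from bisect import bisect_left

def linear_search(records, pred):
    """Scan every record and keep those that satisfy the selection condition."""
    return [rec for rec in records if pred(rec)]

def binary_search(sorted_keys, records, value):
    """Equality search on the ordering key; returns the record or None."""
    pos = bisect_left(sorted_keys, value)
    if pos < len(sorted_keys) and sorted_keys[pos] == value:
        return records[pos]
    return None

# Hypothetical rows (key, name), ordered on the key.
records = [(101, "Ann"), (205, "Bob"), (333, "Cy")]
keys = [rec[0] for rec in records]
print(binary_search(keys, records, 205))
print(linear_search(records, lambda rec: rec[1].startswith("C")))
```

The linear search works for any condition but touches every record; the binary search needs the ordering on the key and inspects only about log2(n) positions.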
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
43
Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2
Database Systems
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Optimization Process
Rule 5: Theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1
[Figure: query trees for E1 ⋈_{θ} E2 and E2 ⋈_{θ} E1]
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: query trees for (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3)]
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
The selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
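The two selectivity estimates can be worked through numerically. This is a small sketch under the uniform-distribution assumption; the function names and the figures (10,000 tuples, 50 distinct values) are hypothetical and chosen only to make the arithmetic visible.

```python
from fractions import Fraction

def selectivity_key(cardinality):
    """Equality on a key attribute: at most one tuple out of |r(R)| matches."""
    return Fraction(1, cardinality)

def selectivity_nonkey(distinct_values):
    """Equality on an attribute with i distinct, uniformly distributed values."""
    return Fraction(1, distinct_values)

def estimated_matches(cardinality, selectivity):
    """Estimated size of the result: s * |r(R)|."""
    return selectivity * cardinality

n = 10_000  # assumed |r(R)|
print(estimated_matches(n, selectivity_key(n)))       # key attribute
print(estimated_matches(n, selectivity_nonkey(50)))   # attribute with 50 distinct values
```

With these assumed numbers the optimizer would expect one matching tuple for the key search and 200 for the non-key search.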
Optimization Process
Rule 8: Selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
Optimization Process
Rule 8
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2
[Figure: query trees before and after moving σ_{θ0} below the join, onto E1]
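The benefit of Rule 8 is easiest to see by counting intermediate tuples. The sketch below is illustrative only: `select`, `theta_join`, and the generated relations `E1` and `E2` are assumptions, with θ0 referring to an attribute of E1 alone.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    """Join result as merged dictionaries (attribute names are disjoint here)."""
    return [{**tr, **ts} for tr in r for ts in s if theta(tr, ts)]

# Hypothetical data: 100 tuples in E1, 300 in E2, three E2 tuples per sid.
E1 = [{"sid": i, "rating": i % 10} for i in range(100)]
E2 = [{"rsid": i % 100, "bid": 100 + i % 3} for i in range(300)]

theta = lambda tr, ts: tr["sid"] == ts["rsid"]   # join condition
theta0 = lambda t: t["rating"] > 5               # uses attributes of E1 only

# Selection after the join: the join produces all 300 tuples first.
full_join = theta_join(E1, E2, theta)
late = select(theta0, full_join)

# Selection pushed below the join: only 40 tuples of E1 take part in the join.
filtered_E1 = select(theta0, E1)
early = theta_join(filtered_E1, E2, theta)

print(len(full_join), len(filtered_E1), len(early), late == early)
```

Both plans return the same 120 tuples, but the second one joins a relation that is already reduced from 100 to 40 tuples.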
Optimization Process
Rule 9: The projection operation distributes over theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))
Optimization Process
Rule 10: Set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: Selection operation distributes over the set union, set intersection, and set difference operations.
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
Optimization Process
Rule 12
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
Optimization Process
Rule 12
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
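Rule 12 can be sanity-checked with Python sets. In this simplified sketch each relation is a set of integers standing in for tuples, and the predicate `p` (even values) is an arbitrary assumption.

```python
def select(pred, rel):
    """Selection over a relation modeled as a set."""
    return {t for t in rel if pred(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0

# Set difference: both forms hold.
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2

# Set intersection: both forms hold.
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2

# Set union: the selection must be applied to both operands.
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2

print(diff_ok, inter_ok, union_ok, union_one_sided)
```

The one-sided form fails for union because tuples of E2 that do not satisfy p (here 5 and 7) leak into the result.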
Optimization Process
Rule 13: Projection operation distributes over the set union operation.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Note: in general, projection does not distribute over set intersection or set difference. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_{L}(E1 ∩ E2) is empty, whereas Π_{L}(E1) ∩ Π_{L}(E2) = {(1)}.
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
existence of indexes or other access paths (to reduce I/O cost), and
physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation.
Pipeline evaluation: Several operations are combined, so that the output of one is passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
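The difference between the two evaluation styles can be sketched with Python generators. This is an illustration under simplifying assumptions: the join is a natural join on a single shared attribute named `k`, duplicates are not eliminated by the projection, and all names and data are hypothetical.

```python
def join(r, s):
    """Natural join on the shared attribute "k", yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Projection that consumes its input tuple by tuple (no duplicate removal)."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))

print(materialized == pipelined)
```

The results agree; the pipelined form simply never holds the intermediate relation in memory or on disk.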
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid=Sid} S))
[Figure: query tree with Π_{Sname} at the root, σ_{bid = 100 and rating > 5} below it, and the join R ⋈_{Sid=Sid} S at the bottom]
Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))
[Figure: query tree with Π_{Sname} at the root and the join on Sid = Sid below it, whose inputs are σ_{bid = 100} over R and σ_{rating > 5} over S]
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e. the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees of the previous query annotated for pipelined execution, with operators marked "On the fly"]
Cost of Plan
The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
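The gap between the two methods shows up when counting how many records each one examines. Below is a minimal sketch, assuming the file is an in-memory list ordered on a hypothetical key attribute `ssn`; the function names and the 1000-record file are illustrative.

```python
def linear_search(records, key, value):
    """Examine every record; returns (matches, records examined)."""
    examined, matches = 0, []
    for rec in records:
        examined += 1
        if rec[key] == value:
            matches.append(rec)
    return matches, examined

def binary_search(records, key, value):
    """Equality search on a key the file is ordered on; returns (record, probes)."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][key] == value:
            return records[mid], probes
        if records[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical file of 1000 records, ordered on the key attribute.
employee = [{"ssn": n} for n in range(0, 2000, 2)]

matches, examined = linear_search(employee, "ssn", 1234)
record, probes = binary_search(employee, "ssn", 1234)
print(examined, probes)
```

For 1000 records the linear search examines all of them, while the binary search needs at most 10 probes.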
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.
σ_{DNUMBER > 5}(DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
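The union-of-pointers idea can be sketched as follows. This is only an illustration: record pointers are modeled as list positions, each index is an in-memory dictionary built by the hypothetical helper `build_index`, and the `employee` rows are made up.

```python
from collections import defaultdict

def build_index(relation, attr):
    """Index on `attr`: attribute value -> set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, t in enumerate(relation):
        index[t[attr]].add(rid)
    return index

employee = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Rolla"},
    {"name": "Cy", "dno": 5, "city": "Salem"},
    {"name": "Di", "dno": 1, "city": "Salem"},
]
dno_index = build_index(employee, "dno")
city_index = build_index(employee, "city")

# Disjunctive selection: dno = 5 OR city = "Rolla".
# Each index is probed for its own condition; the pointer sets are then unioned.
pointers = dno_index.get(5, set()) | city_index.get("Rolla", set())

# Only now are the records themselves fetched, each one exactly once.
result = [employee[rid] for rid in sorted(pointers)]
print([t["name"] for t in result])
```

Working on pointers first means a tuple that satisfies both conditions (Ann here) is fetched only once.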
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e. join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
  For each ts ∈ s do begin
    If tr[A] Θ ts[B] is true then add tr || ts to the result
  end
end
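The nested loop algorithm translates almost line for line into Python. In this sketch relations are lists of dictionaries, Θ is passed in as a comparison function, and the concatenation tr || ts is represented as a pair; the function name and sample data are assumptions for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a THETA s.b by comparing every pair of tuples."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):       # join condition
                result.append((tr, ts))   # stands in for tr || ts
    return result

r = [{"A": 1}, {"A": 2}, {"A": 3}]
s = [{"B": 2}, {"B": 3}, {"B": 3}]

equi = nested_loop_join(r, s, "A", "B")               # A = B
less = nested_loop_join(r, s, "A", "B", operator.lt)  # A < B
print(len(equi), len(less))
```

Because the condition is just a function of the two tuples, the same loop handles equality and any other comparison.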
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (pairs of tuples examined).
In the best-case scenario, both relations fit into the main memory, and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the main memory, then bs + br block accesses will be the cost.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
  For each block Bs of s do begin
    For each tr ∈ Br do begin
      For each ts ∈ Bs do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
      end
    end
  end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
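The br × bs + br figure can be reproduced by counting block reads in a small simulation. This is a sketch under stated assumptions: a block is modeled as a list slice of `block_size` tuples, the buffer holds one block of each relation at a time, and the join is an equi-join; all names and data are hypothetical.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most `block_size` tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join that also counts how many blocks are read."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for Br in r_blocks:
        block_reads += 1                  # read one block of r
        for Bs in s_blocks:
            block_reads += 1              # read one block of s
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append((tr, ts))
    return result, block_reads

r = [{"A": i} for i in range(40)]         # br = 4 with 10 tuples per block
s = [{"B": i % 20} for i in range(60)]    # bs = 6 with 10 tuples per block
result, reads = block_nested_loop_join(r, s, "A", "B", block_size=10)

br, bs = len(blocks(r, 10)), len(blocks(s, 10))
print(reads, br * bs + br)
```

Each block of s is read once per block of r rather than once per tuple of r, which is where the saving over the tuple-at-a-time nested loop comes from.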
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
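A rough sketch of this method follows. The slide assumes the access structure on B already exists; here an in-memory dictionary built inside the hypothetical function `index_join` stands in for it, and the sample relations are made up.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r.a = s.b that probes an index on s.b once per tuple of r."""
    # Stand-in for an existing index or hash key on attribute b of s.
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)

    result = []
    for tr in r:                              # one record of r at a time
        for ts in index.get(tr[a], []):       # only the matching records of s
            result.append((tr, ts))
    return result

r = [{"A": 1}, {"A": 2}, {"A": 3}]
s = [{"B": 2}, {"B": 3}, {"B": 3}]
joined = index_join(r, s, "A", "B")
print(len(joined))
```

Unlike the nested loop, the inner relation is never scanned in full; each tuple of r touches only its matches.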
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in the main memory.
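The merge step can be sketched with two list positions playing the role of the two pointers. The code assumes both inputs are already sorted on the join attributes and that each group of equal join values fits in memory; the function name and data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on the join attributes."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                             # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                             # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s-tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append((r[i], ts))
                i += 1
            j = j_end
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2}, {"B": 2}, {"B": 3}, {"B": 4}]
joined = merge_join(r, s, "A", "B")
print(len(joined))
```

Neither pointer ever moves backwards past a finished group, which is why each relation is read only once.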
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
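The following is a minimal in-memory sketch of the hash-join idea. The hash file is modeled as a dictionary of buckets, each holding the r-records and s-records that hash to it; the function name, the number of buckets, and the data are assumptions for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r.a = s.b using one shared set of hash buckets."""
    buckets = defaultdict(lambda: ([], []))    # bucket -> (r-records, s-records)

    # A single pass through each file hashes its records into the buckets.
    for tr in r:
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    # Each bucket is examined on its own for matching join values.
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:             # same bucket does not imply same value
                    result.append((tr, ts))
    return result

r = [{"A": i} for i in range(10)]
s = [{"B": v} for v in (3, 3, 7, 12)]
joined = hash_join(r, s, "A", "B")
print(sorted((tr["A"], ts["B"]) for tr, ts in joined))
```

Records with different join values can share a bucket, so the equality test inside each bucket is still required.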
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e. conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
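Both duplicate-elimination strategies can be sketched in a few lines. The code below is illustrative only: projected rows are represented as Python tuples, `project_hash` uses a hash set, `project_sort` sorts and then drops adjacent duplicates, and the `employee` data is invented.

```python
from itertools import groupby

def project_hash(relation, attrs):
    """Projection with duplicate elimination using a hash set."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

def project_sort(relation, attrs):
    """Projection with duplicate elimination by sorting, then one linear scan."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    return [row for row, _ in groupby(rows)]   # adjacent equal rows collapse

employee = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 5},
    {"ssn": 3, "lname": "Smith", "dno": 5},
    {"ssn": 4, "lname": "Smith", "dno": 4},
]

no_key = project_hash(employee, ["lname", "dno"])     # duplicates possible
with_key = project_hash(employee, ["ssn", "lname"])   # key included, no duplicates
print(len(no_key), len(with_key))
```

When the attribute list contains the key (`ssn` here), every projected row is already distinct and the elimination step does no work.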
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, σ_{Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN} below it, and the Cartesian products Project × (Department × Employee) at the bottom]
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
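The figures quoted above follow from simple arithmetic, spelled out in the short sketch below. The variable names are illustrative; the sizes and cardinalities are the ones stated in the text.

```python
# (tuple size in bytes, number of tuples), as stated in the text
project = (100, 100)
department = (50, 20)
employee = (150, 5000)

# A Cartesian product multiplies cardinalities and adds tuple sizes.
tuples = project[1] * department[1] * employee[1]
tuple_bytes = project[0] + department[0] + employee[0]
total_bytes = tuples * tuple_bytes

print(tuples, tuple_bytes, total_bytes)
```

This gives 10,000,000 tuples of 300 bytes each, that is, about 3 × 10^9 bytes for an intermediate result most of which the selection then discards.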
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{MGRSSN=SSN} (Employee))
Query Optimization: An Example
[Figure: query tree in which σ_{Plocation = "California"} is applied to Project, the result is joined with Department on Dnum = Dnumber and then with Employee on MGRSSN = SSN, and Π_{Pnumber, Dnum, Lname, Address, Bdate} is at the root]
44
Optimization ProcessGeneral rule A sequence of restrictions can be
combined into a single restriction
Database Systems
45
Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]
Database Systems
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
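The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes and that each group of equal-valued tuples fits in memory. The function name `merge_join` and the sample data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on the join attributes."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

# Hypothetical sorted relations.
r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2}, {"B": 2}, {"B": 3}, {"B": 4}]

joined = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backwards, which is what keeps the scan linear in the size of the two relations.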
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
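The hash-join description above can be illustrated with the following sketch. It assumes an in-memory "hash file" made of a fixed number of buckets, Python's built-in `hash` as the hashing function, and an equality join condition; the name `hash_join` and the bucket count are arbitrary choices.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into the same buckets, then match within each bucket."""
    # Each bucket keeps the r records and the s records that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        for tr in r_part:
            for ts in s_part:
                # Records in one bucket may still differ in value, so compare.
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result

# Hypothetical sample relations.
r = [{"A": 1}, {"A": 5}, {"A": 9}]
s = [{"B": 5}, {"B": 9}, {"B": 2}]

joined = hash_join(r, s, "A", "B")
```

Records with equal join values always land in the same bucket, so only records within one bucket ever need to be compared.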
Query Optimization: Complex JOIN Operation
Nested-loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$
The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
Query Optimization: Project Operation
A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
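Both duplicate-elimination strategies can be sketched in a few lines. The function names `project_sorted` and `project_hashed` and the sample `employee` data are hypothetical; a Python `set` stands in for the hashing technique.

```python
def project_sorted(relation, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[x] for x in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result

def project_hashed(relation, attrs):
    """Project and drop duplicates with a hash set, keeping first-seen order."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[x] for x in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Hypothetical relation: Ssn is a key, Dno is not.
employee = [{"Ssn": 1, "Dno": 5}, {"Ssn": 2, "Dno": 4}, {"Ssn": 3, "Dno": 5}]

by_sort = project_sorted(employee, ["Dno"])
by_hash = project_hashed(employee, ["Dno"])
with_key = project_hashed(employee, ["Ssn", "Dno"])
```

Projecting on `Dno` alone produces a duplicate that must be removed, while any attribute list that includes the key `Ssn` cannot produce duplicates.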
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, ...)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into
$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(\sigma_{Plocation = \text{'California'} \,\land\, Dnum = Dnumber \,\land\, MGRSSN = SSN}(Project \times (Department \times Employee)))$
Query Optimization: An Example
[Figure: query tree of the expression above. The projection is the root, the selection is its child, and below it is the Cartesian product of Project with the Cartesian product of Department and Employee.]
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
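The figures quoted above follow from multiplying the cardinalities and adding the tuple sizes. The short check below uses only the numbers given on the slide; the variable names are illustrative, and the total size in bytes is a derived quantity that the slide does not state.

```python
import math

# Tuple sizes (bytes) and cardinalities as given in the example.
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add.
product_tuples = math.prod(tuple_count.values())
product_tuple_bytes = sum(tuple_bytes.values())
product_total_bytes = product_tuples * product_tuple_bytes
```

The intermediate relation therefore occupies about 3 × 10^9 bytes before any selection is applied.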
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$
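Query Optimization: An Example
[Figure: query tree of the expression above. The projection is the root; below it, the join on MGRSSN = SSN combines Employee with the result of the join on Dnum = Dnumber, whose inputs are Department and the selection Plocation = 'California' applied to Project.]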
Optimization Process
(A [projection-1]) [projection-2] can be converted into A [projection-2].
Optimization Process
General rule: A sequence of projections can be transformed into a single projection.
Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction.
Optimization Process
Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.
(SP JOIN P) [S]
However, we know that P is the foreign key in SP; therefore, the above query is semantically equivalent to
SP [S]
Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
$\sigma_{\theta_1 \land \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
Optimization Process
Rule 2: The selection operation is commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections).
$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
This can be extended to
$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \land \theta_2} E_2$
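Optimization Process
Rule 5: The theta join operation is commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
[Figure: the two equivalent join trees, with E1 and E2 as the children of the θ join in swapped order.]
Optimization Process
Rule 6: Natural join is associative.
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
[Figure: the two equivalent join trees, one joining E1 and E2 first and the other joining E2 and E3 first.]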
Optimization Process
Rule 7: Theta join is associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \land \theta_3} E_3 = E_1 \bowtie_{\theta_1 \land \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$
Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
$s = \dfrac{1}{|r(R)|}$
The selectivity on an attribute with i distinct values, assuming the values are uniformly distributed, is
$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$
Hence, the number of tuples that satisfy an equality search is
$\dfrac{1}{i} \, |r(R)|$
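As a worked example of these estimates, the sketch below computes both selectivities for a hypothetical relation of 10,000 tuples. The function names, the cardinality, and the figure of 50 distinct values are assumptions for illustration, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition on one attribute."""
    if is_key:
        return Fraction(1, cardinality)      # at most one tuple matches
    return Fraction(1, distinct_values)      # uniform-distribution assumption

def estimated_matches(cardinality, selectivity):
    """Expected number of tuples satisfying the condition."""
    return selectivity * cardinality

cardinality = 10_000                          # hypothetical |r(R)|
s_key = equality_selectivity(cardinality, is_key=True)
s_nonkey = equality_selectivity(cardinality, distinct_values=50)

matches_key = estimated_matches(cardinality, s_key)
matches_nonkey = estimated_matches(cardinality, s_nonkey)
```

Under these assumptions an equality search on the key is expected to return 1 tuple and a search on the non-key attribute about 200 tuples, which is the size the optimizer would assume for the intermediate relation.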
Optimization Process
Rule 8: The selection operation distributes over the theta join when all the attributes in the selection condition $\theta_0$ belong to only one of the relations ($E_1$ in this case):
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
[Figure: the two equivalent trees: the selection $\sigma_{\theta_0}$ applied above the θ join of E1 and E2, and the selection applied to E1 below the join.]
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$
Optimization Process
Rule 12
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \ne \sigma_p(E_1) \cup E_2$
Optimization Process
Rule 12
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
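Optimization Process
Rule 13: The projection operation distributes over the set union operation.
$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$
The corresponding equalities for set intersection and set difference do not hold in general. For example, if $E_1 = \{(1, a)\}$, $E_2 = \{(1, b)\}$, and L is the first attribute, then $\Pi_L(E_1 \cap E_2) = \emptyset$ while $\Pi_L(E_1) \cap \Pi_L(E_2) = \{(1)\}$.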
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
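The effect of performing selections early can be seen on a toy example. The sketch below is an illustration only: the relations `emp` and `dept`, their sizes, and the helper functions `select` and `equi_join` are made up, and the join is a plain nested loop.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def equi_join(r, s, a, b):
    """Nested-loop equi-join of two lists of dict tuples."""
    return [{**tr, **ts} for tr in r for ts in s if tr[a] == ts[b]]

# Hypothetical relations: 100 employees spread evenly over 5 departments.
emp = [{"Ssn": i, "Dno": i % 5} for i in range(100)]
dept = [{"Dnumber": d, "Dname": f"D{d}"} for d in range(5)]

def in_dept_3(t):
    return t["Dno"] == 3

# Selection applied after the join: the join sees all 100 employees.
joined_all = equi_join(emp, dept, "Dno", "Dnumber")
late = select(joined_all, in_dept_3)

# Selection applied first: the join sees only 20 employees.
filtered_emp = select(emp, in_dept_3)
early = equi_join(filtered_emp, dept, "Dno", "Dnumber")
```

Both orders return the same 20 tuples, but the early selection shrinks the join input from 100 tuples to 20, so the nested loop compares 100 pairs instead of 500.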
Definition
Materialized evaluation: generation of the intermediate result (relation) of each operation before the next operation is applied.
Pipelined evaluation: combining several operations so that the output of one is passed directly to the next.
Assume we want to perform
$\Pi_{a_1, a_2}(r \bowtie s)$
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, when the join operation generates a tuple, that tuple is passed directly to the project operation for processing.
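Python generators give a compact way to contrast the two evaluation styles. In this sketch the join is written as an explicit equi-join on the hypothetical attributes `k` and `k2` (the slide does not name the join attributes), and the function names `join` and `project` are illustrative.

```python
def join(r, s, a, b):
    """Generator: yields each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Generator: projects each incoming tuple onto attrs."""
    for t in tuples:
        yield tuple(t[x] for x in attrs)

# Hypothetical sample relations.
r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"a2": "x", "k2": 10}, {"a2": "y", "k2": 20}]

# Materialized evaluation: the whole join result is built before projecting.
intermediate = list(join(r, s, "k", "k2"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is stored.
pipelined = list(project(join(r, s, "k", "k2"), ["a1", "a2"]))
```

Both styles produce the same result; the difference is that the pipelined version never holds the full join result in memory.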
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
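$\Pi_{Sname}(\sigma_{bid = 100 \,\land\, rating > 5}(R \bowtie_{Sid = Sid} S))$
[Figure: query tree of this expression: the projection at the root, the selection below it, and the join of R and S on Sid = Sid at the bottom.]
$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$
[Figure: query tree of this expression: the projection at the root, the join on Sid = Sid below it, and the selections bid = 100 on R and rating > 5 on S beneath the join.]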
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees of the previous query, annotated for pipelined execution; two operators are marked "On the fly" in the first tree and one in the second.]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation in the plan.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, ... need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
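The two search methods can be compared on a small in-memory "file". This is a sketch under assumptions: the file is a Python list of records ordered on the key `Ssn`, the sample values are made up, and the standard-library `bisect_left` stands in for the binary search over the ordered key.

```python
from bisect import bisect_left

def linear_search(file, attr, value):
    """Test every record; works on any file and any attribute."""
    return [rec for rec in file if rec[attr] == value]

def binary_search(file, keys, value):
    """Equality search on the ordering key; keys[i] is the key of file[i]."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return file[pos]
    return None

# Hypothetical file, ordered on the key attribute Ssn.
file = [{"Ssn": k, "Lname": f"E{k}"} for k in (101, 205, 333, 478, 590)]
keys = [rec["Ssn"] for rec in file]

found = binary_search(file, keys, 333)
missing = binary_search(file, keys, 400)
scanned = linear_search(file, "Ssn", 333)
```

The binary search inspects on the order of log2(n) records, whereas the linear search inspects all n.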
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)
$\sigma_{SSN = 123456789}(EMPLOYEE)$
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or the preceding records, for < and ≤). (Note: in this case, the data is also sorted.)
$\sigma_{DNUMBER > 5}(DEPARTMENT)$
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
$\sigma_{DNO = 5}(EMPLOYEE)$
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \land \theta_2 \land \dots \land \theta_n}(r)$
Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to them.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
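The union-of-pointers technique can be sketched as follows, assuming each index maps an attribute value to a set of record positions (the "pointers") in an in-memory file. The names `build_index` and `disjunctive_select` and the sample `employee` records are hypothetical.

```python
from collections import defaultdict

def build_index(file, attr):
    """Index on attr: value -> set of record pointers (list positions)."""
    index = defaultdict(set)
    for pointer, record in enumerate(file):
        index[record[attr]].add(pointer)
    return index

def disjunctive_select(file, indexes, conditions):
    """conditions: list of (attr, value) equality conditions joined by OR."""
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of pointer sets
    return [file[p] for p in sorted(pointers)]        # fetch each record once

# Hypothetical file with an index on each attribute used in the condition.
employee = [
    {"Ssn": 1, "Dno": 5, "Sex": "F"},
    {"Ssn": 2, "Dno": 4, "Sex": "M"},
    {"Ssn": 3, "Dno": 5, "Sex": "M"},
    {"Ssn": 4, "Dno": 1, "Sex": "F"},
]
indexes = {"Dno": build_index(employee, "Dno"), "Sex": build_index(employee, "Sex")}

result = disjunctive_select(employee, indexes, [("Dno", 5), ("Sex", "F")])
```

Taking the union on pointers rather than on records means a record that satisfies several conditions is fetched only once.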
Query Optimization: JOIN Operation
Nested loop: For each record $t \in R$ (outer loop), retrieve every record $s \in S$ (inner loop), and then check the join condition $t[A] = s[B]$ of
$R \bowtie_{A=B} S$
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
$r \bowtie_{r.A \,\Theta\, s.B} s$
where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations, and that $b_r$ and $b_s$ are the numbers of blocks of each relation.
Optimization ProcessGeneral rule A sequence of projections can be
transferred into a single projection
46
Database Systems
47
Optimization ProcessGeneral rule A restriction and projection can
be converted into a projection and restriction
Database Systems
48
Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least
one part(SP Join P) [S]
However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to
SP [S]
Database Systems
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
- the existence of indexes or other access paths (to reduce I/O cost), and
- the physical clustering of records (to reduce I/O cost), …

Optimization Process
So, in short, after scanning and parsing:
- the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
- an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
- Heuristic rules
- Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
- Perform selection operations as early as possible.
- Perform projections early.
- It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition
Materialized evaluation: Generation of an intermediate result (relation) for each operation, which is then used as input to the next operation.
Pipelined evaluation: Combining several operations so that the tuples produced by one operation are passed directly to the next, without storing the intermediate result.

Assume we want to perform
Πa1, a2 (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.

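The difference between the two evaluation styles can be sketched with Python generators. This is an illustration under assumptions, not the mechanism of any particular DBMS: the functions `join` and `project`, the join attribute `k`, and the sample rows are all hypothetical.

```python
def join(r, s):
    """Natural join on the shared attribute 'k', produced one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(rows, attrs):
    """Projection without duplicate elimination, also produced lazily."""
    for row in rows:
        yield {a: row[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 1, "a2": 20}]

# Materialized evaluation: the join result is stored in full before projecting.
temp = list(join(r, s))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection;
# no intermediate relation is ever built.
pipelined = list(project(join(r, s), ["a1", "a2"]))

assert materialized == pipelined
```

Both strategies return the same tuples; the pipelined version simply never holds the full join result in `temp`.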
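Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

The query can be translated into
ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))
[Figure: query tree with ΠSname at the root, σbid = 100 AND rating > 5 below it, and the join ⋈Sid=Sid of R and S at the bottom.]

Pushing the selections below the join gives the equivalent expression
ΠSname ((σbid = 100 (R)) ⋈Sid=Sid (σrating > 5 (S)))
[Figure: query tree with ΠSname at the root, the join ⋈Sid=Sid below it, and σbid = 100 (R) and σrating > 5 (S) as its two inputs.]
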
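To see why the second expression is preferable, the two plans can be run on a few rows. The sketch below uses made-up sample data that follows the S and R schemas (only the attributes the query needs are kept); the helper `join_on_sid` is a hypothetical name.

```python
# Hypothetical sample data following the schemas S(Sid, Sname, rating, age)
# and R(Sid, bid, day, rname); only the attributes the query needs are kept.
S = [
    {"Sid": 1, "Sname": "Ann", "rating": 7},
    {"Sid": 2, "Sname": "Bob", "rating": 3},
    {"Sid": 3, "Sname": "Cy", "rating": 9},
]
R = [
    {"Sid": 1, "bid": 100},
    {"Sid": 2, "bid": 100},
    {"Sid": 3, "bid": 101},
    {"Sid": 1, "bid": 102},
]


def join_on_sid(left, right):
    """Equi-join on Sid; apart from Sid the column names do not clash."""
    return [{**l, **r} for l in left for r in right if l["Sid"] == r["Sid"]]


# Plan 1: join first, then apply the whole selection.
joined = join_on_sid(R, S)
plan1 = [t["Sname"] for t in joined if t["bid"] == 100 and t["rating"] > 5]

# Plan 2: push each selection below the join, then join the smaller inputs.
r_sel = [t for t in R if t["bid"] == 100]
s_sel = [t for t in S if t["rating"] > 5]
small_join = join_on_sid(r_sel, s_sel)
plan2 = [t["Sname"] for t in small_join]

assert plan1 == plan2 == ["Ann"]
print(len(joined), len(small_join))  # sizes of the two intermediate joins
```

On this data the first plan builds a four-tuple intermediate join and the second builds a one-tuple join, while both return the same answer.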
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, annotated for pipelined execution. The first tree (σbid = 100 AND rating > 5 and ΠSname above the join of R and S on Sid = Sid) carries two "On the fly" labels; the second tree (σbid = 100 and σrating > 5 below the join, ΠSname above it) carries one.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.

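The two search methods can be contrasted in a few lines. This is a sketch under assumptions: the file is an in-memory list, `linear_search` and `binary_search` are hypothetical helper names, and the employee records are made up. The binary search relies on `bisect_left` from Python's standard `bisect` module.

```python
from bisect import bisect_left


def linear_search(records, key, value):
    """Scan every record; works on unordered data with no index."""
    return [rec for rec in records if rec[key] == value]


def binary_search(sorted_keys, records, value):
    """Equality search on a key attribute the file is ordered on.

    sorted_keys[i] is the key of records[i]; at most one record matches.
    """
    pos = bisect_left(sorted_keys, value)
    if pos < len(sorted_keys) and sorted_keys[pos] == value:
        return records[pos]
    return None


employees = [{"ssn": n, "name": f"emp{n}"} for n in (101, 205, 317, 442, 560)]
keys = [e["ssn"] for e in employees]

assert linear_search(employees, "ssn", 317) == [{"ssn": 317, "name": "emp317"}]
assert binary_search(keys, employees, 317) == {"ssn": 317, "name": "emp317"}
assert binary_search(keys, employees, 300) is None
```

The linear scan inspects all n records, whereas the binary search inspects about log2(n) positions, but only because the file is ordered on the search key.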
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or the preceding ones, for < and ≤). (Note: in this case the data is also sorted.)
σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σθ1 ∧ θ2 ∧ … ∧ θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1 ∨ θ2 ∨ … ∨ θn (r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to each retrieved record.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.

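The union-of-pointers idea can be sketched as follows. Everything here is an assumption for illustration: the list position stands in for a record pointer, `build_index` is a hypothetical helper that plays the role of an existing index, and the EMPLOYEE rows (including the `city` attribute) are made up.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file: the list position plays the role of a record pointer.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Austin"},
    {"name": "Cy", "dno": 5, "city": "Austin"},
    {"name": "Di", "dno": 1, "city": "Boston"},
]


def build_index(records, attr):
    """Map each attribute value to the set of record pointers holding it."""
    index = defaultdict(set)
    for ptr, rec in enumerate(records):
        index[rec[attr]].add(ptr)
    return index


dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# sigma_{dno = 5 OR city = 'Austin'}(EMPLOYEE): probe each index, union the
# pointers, and only then fetch the records, so each record is read once.
pointers = dno_index.get(5, set()) | city_index.get("Austin", set())
result = [employees[p]["name"] for p in sorted(pointers)]

assert result == ["Ann", "Bob", "Cy"]
```

Taking the union on pointers rather than on fetched records means a record matching both conditions (here "Cy") is retrieved only once.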
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈r.A Θ s.B s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end

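A runnable counterpart of the nested loop algorithm might look like the sketch below. It assumes relations are lists of Python tuples and that join attributes are given as column positions; the function name `nested_loop_join` and the sample data are hypothetical. The comparison Θ is passed in as a function, so the same loop handles equality and inequality joins.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a theta s.b by comparing every pair of tuples."""
    result = []
    for tr in r:            # outer loop over r
        for ts in s:        # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts: concatenate the tuples
    return result


r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

equi = nested_loop_join(r, s, 0, 0)
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)

assert equi == [(2, "y", 2, "p"), (3, "z", 3, "q"), (3, "z", 3, "r")]
assert len(less) == 5
```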
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the worst case, the buffer holds only one block of each relation, so s is scanned once for every tuple of r and we need nr × bs + br block accesses.
In the best-case scenario, both relations fit into main memory, and hence we need only bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
If just one of the relations fits into main memory, it is used as the inner relation and the cost is again bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?

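The block access count can be confirmed with a small simulation. This is a sketch under assumptions: a "block" is just a slice of a Python list, every loop iteration over a block counts as one block access, and the names `blocks` and `block_nested_loop_join` are hypothetical.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into blocks of block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r.a = s.b; returns (result, number of block accesses)."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, accesses = [], 0
    for r_block in r_blocks:
        accesses += 1                      # read one block of r
        for s_block in s_blocks:
            accesses += 1                  # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, accesses


r = [(i, f"r{i}") for i in range(8)]         # 8 tuples -> br = 4 blocks of 2
s = [(i, f"s{i}") for i in range(0, 12, 2)]  # 6 tuples -> bs = 3 blocks of 2

result, accesses = block_nested_loop_join(r, s, 0, 0, block_size=2)
br, bs = 4, 3
assert accesses == br * bs + br            # 16 block accesses
assert len(result) == 4                    # keys 0, 2, 4, 6 match

# Swapping the roles makes the smaller relation the outer one: bs * br + bs.
_, swapped = block_nested_loop_join(s, r, 0, 0, block_size=2)
assert swapped == bs * br + bs             # 15 block accesses
```

One possible answer to the question on the slide follows from the last assertion: since the cost is (outer blocks × inner blocks) + outer blocks, using the smaller relation as the outer one lowers the count. Reading several outer blocks per pass, when more buffer space is available, would reduce the number of scans of the inner relation further.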
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈A=B s

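A minimal sketch of this access-structure join is shown below. It assumes an in-memory dictionary stands in for the index on s.B (a real system would use an existing on-disk index rather than build one per query); the name `index_nested_loop_join` and the data are hypothetical.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r.a = s.b using a dictionary as the index on s.b."""
    index = defaultdict(list)
    for ts in s:                 # stands in for an index that already exists on s.B
        index[ts[b]].append(ts)
    result = []
    for tr in r:                 # one pass over r
        for ts in index.get(tr[a], []):   # index probe instead of scanning s
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]

assert index_nested_loop_join(r, s, 0, 0) == [
    (2, "b", 2, "x"),
    (3, "c", 3, "y"),
    (3, "c", 3, "z"),
]
```

Each tuple of r triggers one index lookup rather than a full scan of s, which is where the saving over the plain nested loop comes from.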
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.

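The merge step can be sketched as follows, assuming both inputs are Python lists already sorted on the join column. The function name `merge_join` and the sample data are hypothetical; the slice `s[j:j_end]` corresponds to the group of tuples with equal join values that is assumed to fit in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join r.a = s.b; both inputs must already be sorted on the join attribute."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple carrying the value with that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

assert merge_join(r, s, 0, 0) == [
    (2, "b", 2, "x"), (2, "b", 2, "y"),
    (2, "c", 2, "x"), (2, "c", 2, "y"),
    (4, "d", 4, "w"),
]
```

Neither pointer ever moves backwards past a group, which is why each relation is read only once.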
Query Optimization: JOIN Operation
Hash join: The records of both files r and s are hashed to the same hash file using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.

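The bucket-based procedure can be sketched in memory as follows. This is an illustration under assumptions: Python's built-in `hash` modulo a bucket count plays the role of the hashing function, the "hash file" is a list of bucket pairs, and the name `hash_join`, the parameter `n_buckets`, and the data are hypothetical.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r.a = s.b by hashing both relations into the same buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # matches can only share a bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # same bucket does not imply equal keys
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]

assert sorted(hash_join(r, s, 0, 0)) == [
    (2, "b", 2, "x"),
    (3, "c", 3, "y"),
    (3, "c", 3, "z"),
]
```

The equality test inside each bucket is still required, because different join values can hash to the same bucket (here 1 and 5 do when `n_buckets` is 4). The output order depends on bucket order, so the check compares sorted results.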
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈θ1 ∧ θ2 ∧ … ∧ θn s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈θ1 ∨ θ2 ∨ … ∨ θn s
The join can be performed as the union of the tuples in the individual joins r ⋈θi s.

Query Optimization: Project Operation
A project operation Π<attribute-list> (R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.

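Both duplicate-elimination strategies can be sketched briefly. The function names `project_by_hashing` and `project_by_sorting` and the sample EMPLOYEE rows are assumptions for illustration; a Python `set` serves as the hash structure.

```python
def project_by_hashing(rows, attrs):
    """Projection with duplicate elimination using a hash set of seen tuples."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:          # hash lookup instead of comparing with all rows
            seen.add(t)
            result.append(t)
    return result


def project_by_sorting(rows, attrs):
    """Projection with duplicate elimination by sorting, then dropping neighbors."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result


employee = [
    {"ssn": 1, "lname": "Lee", "dno": 5},
    {"ssn": 2, "lname": "Kim", "dno": 4},
    {"ssn": 3, "lname": "Lee", "dno": 5},
]

assert project_by_hashing(employee, ["lname", "dno"]) == [("Lee", 5), ("Kim", 4)]
assert project_by_sorting(employee, ["lname", "dno"]) == [("Kim", 4), ("Lee", 5)]
# Including the key ssn makes every projected tuple distinct already.
assert len(project_by_hashing(employee, ["ssn", "lname"])) == 3
```

The hashing variant keeps the input order, while the sorting variant returns the tuples in sorted order; both return the same set of tuples.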
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into
ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = "California" and Dnum = Dnumber and MGRSSN = SSN (Project × (Department × Employee)))

[Figure: query tree for the expression above, with ΠPnumber, Dnum, Lname, Address, Bdate at the root, the selection σPlocation = "California" and Dnum = Dnumber and MGRSSN = SSN below it, and two Cartesian products (×) combining Project, Department, and Employee.]

Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).

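The arithmetic behind these figures is easy to reproduce. The short sketch below only restates the numbers given above; the dictionary names are made up for the illustration.

```python
tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

product_tuples = 1
for n in tuples.values():
    product_tuples *= n                          # 100 * 20 * 5000
product_tuple_bytes = sum(tuple_bytes.values())  # 100 + 50 + 150

assert product_tuples == 10_000_000
assert product_tuple_bytes == 300

# Size of the intermediate relation versus the three base relations together.
total_bytes = product_tuples * product_tuple_bytes
base_bytes = sum(tuples[name] * tuple_bytes[name] for name in tuples)
assert total_bytes == 3_000_000_000              # about 3 GB
assert base_bytes == 761_000                     # under 1 MB
```

The intermediate result is therefore several thousand times larger than the data it was built from, which is why the plan with Cartesian products should be avoided.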
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = "California" (Project)) ⋈Dnum = Dnumber (Department)) ⋈MGRSSN = SSN (Employee))

[Figure: query tree for the expression above, with ΠPnumber, Dnum, Lname, Address, Bdate at the root, the join ⋈MGRSSN = SSN with Employee below it, and the join ⋈Dnum = Dnumber of σPlocation = "California" (Project) with Department at the bottom.]
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5,000), each of 300 bytes (100 + 50 + 150).
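The size estimate can be reproduced with a few lines of arithmetic. The variable names are illustrative only; the figures are the ones stated in the example.

```python
# (tuple size in bytes, number of tuples) for each relation in the example
project, department, employee = (100, 100), (50, 20), (150, 5000)

tuples = project[1] * department[1] * employee[1]
tuple_bytes = project[0] + department[0] + employee[0]
total_bytes = tuples * tuple_bytes

print(tuples)        # 10000000
print(tuple_bytes)   # 300
print(total_bytes)   # 3000000000, i.e. about 3 GB of intermediate data
```

Almost all of this intermediate data is discarded by the selection that follows, which is why the Cartesian product should be avoided.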
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{Mgr_ssn = Ssn} Employee)
Query Optimization: An Example
Query tree for the improved expression (root first):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{Mgr_ssn = Ssn}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee
Optimization Process
Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.
(SP JOIN P) [S]
However, we know that P is a foreign key in SP, so every tuple of SP matches a tuple of P. Therefore the above query is semantically equivalent to:
SP [S]
Optimization Process
An equivalence rule states that expressions of two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Optimization Process
Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to:
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
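Rules 1, 2, and 4 can be checked on small relations. The sketch below assumes relations are lists of tuples and predicates are plain Python functions; the helper names and the sample data are hypothetical.

```python
from itertools import product

def select(pred, rel):
    """σ_pred(rel): keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def theta_join(theta, r, s):
    """r ⋈_theta s: concatenate each pair of tuples that satisfies theta."""
    return [a + b for a in r for b in s if theta(a + b)]

E1 = [(1, "x"), (2, "y"), (3, "z")]
E2 = [(2, 10), (3, 30), (4, 40)]

def theta1(t):
    return t[0] >= 2

def theta2(t):
    return t[1] != "z"

# Rule 1: a conjunctive selection equals a cascade of selections.
assert select(lambda t: theta1(t) and theta2(t), E1) == select(theta1, select(theta2, E1))

# Rule 2: selections commute.
assert select(theta1, select(theta2, E1)) == select(theta2, select(theta1, E1))

# Rule 4: selection over a Cartesian product equals a theta join.
def same_id(t):
    return t[0] == t[2]

cartesian = [a + b for a, b in product(E1, E2)]
assert select(same_id, cartesian) == theta_join(same_id, E1, E2)
print(theta_join(same_id, E1, E2))   # [(2, 'y', 2, 10), (3, 'z', 3, 30)]
```

A check on one data set is of course not a proof; it only illustrates what each rule claims.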
Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
In tree form, the two children of the ⋈_θ node are swapped.
Optimization Process
Rule 6: The natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Optimization Process
Rule 7: The theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so estimated selectivities are used as part of the statistical data to aid query optimization.
For an equality search on a key attribute, at most one tuple qualifies, so the selectivity is:
s = 1 / |r(R)|
For an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is:
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is:
(1 / i) × |r(R)| = |r(R)| / i
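The two selectivity estimates can be written down directly. This is a minimal sketch under the uniform-distribution assumption; the function names and the sample cardinality are hypothetical.

```python
def equality_selectivity(num_tuples, distinct_values=None):
    """Estimated selectivity of an equality condition on one attribute.

    distinct_values=None means the attribute is a key.
    """
    if distinct_values is None:
        return 1 / num_tuples          # at most one tuple matches a key value
    return 1 / distinct_values         # uniform-distribution assumption

def estimated_matches(num_tuples, distinct_values=None):
    """Estimated number of tuples satisfying the equality condition."""
    if distinct_values is None:
        return 1
    return num_tuples / distinct_values

n = 5000                                  # |r(R)|, an illustrative cardinality
print(equality_selectivity(n))            # 0.0002
print(estimated_matches(n, 20))           # 250.0
```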
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
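Optimization Process
Rule 8 (query trees):
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
Left tree: σ_{θ0} at the root, above the join ⋈_θ of E1 and E2.
Right tree: ⋈_θ at the root, joining σ_{θ0}(E1) with E2.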
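The benefit of Rule 8 shows up in the size of the intermediate result. The sketch below assumes relations are lists of tuples; the relation contents and helper names are hypothetical and chosen only to make the sizes easy to read.

```python
def theta_join(theta, r, s):
    """r ⋈_theta s for a two-argument join condition."""
    return [a + b for a in r for b in s if theta(a, b)]

E1 = [(i, i % 10) for i in range(100)]    # (id, category)
E2 = [(i, f"v{i}") for i in range(100)]   # (id, value)

def theta(a, b):                          # join condition
    return a[0] == b[0]

def theta0(a):                            # uses attributes of E1 only
    return a[1] == 3

# Selection applied after the join: the join produces 100 tuples first.
joined = theta_join(theta, E1, E2)
late = [t for t in joined if t[1] == 3]

# Selection pushed below the join: only 10 tuples of E1 reach the join.
filtered = [a for a in E1 if theta0(a)]
early = theta_join(theta, filtered, E2)

assert late == early
print(len(joined), len(filtered), len(early))   # 100 10 10
```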
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
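Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: the corresponding identities for set intersection and set difference, Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2) and Π_L(E1 − E2) = Π_L(E1) − Π_L(E2), do not hold in general. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.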
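The set-operation rules can be checked with Python's built-in sets. This is a sketch on hypothetical data: `sel` stands for σ_p with p being "first attribute ≤ 2", and `proj` stands for Π_L with L being the first attribute. The last check illustrates that projection does not, in general, distribute over intersection when L does not include a key.

```python
E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "z"), (2, "b"), (4, "d")}

def sel(rel):
    """σ_p with p: first attribute <= 2."""
    return {t for t in rel if t[0] <= 2}

def proj(rel):
    """Π_L with L = first attribute (duplicates vanish in a set)."""
    return {(t[0],) for t in rel}

# Selection and set difference
assert sel(E1 - E2) == sel(E1) - sel(E2) == sel(E1) - E2
# Selection and set union
assert sel(E1 | E2) == sel(E1) | sel(E2)
assert sel(E1 | E2) != sel(E1) | E2
# Selection and set intersection
assert sel(E1 & E2) == sel(E1) & sel(E2) == sel(E1) & E2

# Projection and set union
assert proj(E1 | E2) == proj(E1) | proj(E2)
# Projection and set intersection: the two sides differ on this data
assert proj(E1 & E2) == {(2,)}
assert proj(E1) & proj(E2) == {(1,), (2,)}
```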
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
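Definition
Materialized evaluation: the intermediate result (relation) of each operation is generated and stored before the next operation uses it.
Pipeline evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.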
Assume we want to perform:
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
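The difference between the two evaluation styles can be sketched with Python generators, which produce one tuple at a time. The relation contents and function names are hypothetical, and the join here is a simple equi-join on the first attribute.

```python
def join(r, s):
    """Equi-join on the first attribute, yielding one tuple at a time."""
    for a in r:
        for b in s:
            if a[0] == b[0]:
                yield a + b[1:]

def project(tuples, positions):
    """Projection on the given positions, consuming tuples as they arrive."""
    for t in tuples:
        yield tuple(t[p] for p in positions)

r = [(1, "a1"), (2, "a2")]
s = [(1, "b1"), (2, "b2"), (2, "b3")]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, [1, 2]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), [1, 2]))

assert materialized == pipelined
print(pipelined)   # [('a1', 'b1'), ('a2', 'b2'), ('a2', 'b3')]
```

Both styles return the same answer; the pipelined version never holds the full join result in memory.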
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
Query tree (root first):
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
Query tree (root first):
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
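Both formulations can be run against a small in-memory database to confirm that they return the same answer. This sketch uses Python's standard `sqlite3` module; the sample rows are invented for illustration. Note that SQLite's own optimizer decides the actual execution plan, so the two statements only mirror the two algebra expressions, not necessarily two different physical plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (Sid INTEGER, Sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE R (Sid INTEGER, bid INTEGER, day TEXT, rname TEXT);
    INSERT INTO S VALUES (1, 'Dustin', 7, 45.0);
    INSERT INTO S VALUES (2, 'Lubber', 3, 55.5);
    INSERT INTO S VALUES (3, 'Rusty', 10, 35.0);
    INSERT INTO R VALUES (1, 100, '2024-01-01', 'r1');
    INSERT INTO R VALUES (2, 100, '2024-01-02', 'r2');
    INSERT INTO R VALUES (3, 101, '2024-01-03', 'r3');
""")

# Selections written after the join, as in the first expression.
late = conn.execute("""
    SELECT S.Sname FROM R, S
    WHERE R.Sid = S.Sid AND R.bid = 100 AND S.rating > 5
""").fetchall()

# Selections written below the join, as in the second expression.
early = conn.execute("""
    SELECT S2.Sname
    FROM (SELECT * FROM R WHERE bid = 100) AS R2
    JOIN (SELECT * FROM S WHERE rating > 5) AS S2 ON R2.Sid = S2.Sid
""").fetchall()

assert late == early
print(late)   # [('Dustin',)]
```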
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
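Pipelined execution of the two plans (figure):
Plan 1: R ⋈_{Sid = Sid} S, with σ_{bid = 100 ∧ rating > 5} and Π_{Sname} applied on the fly to the join output.
Plan 2: (σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)), with Π_{Sname} applied on the fly to the join output.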
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
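The two basic search methods can be sketched as follows, assuming the file is an in-memory list of records (dictionaries). The function names and sample data are hypothetical.

```python
def linear_search(records, attr, value):
    """Scan every record; works on unordered files and non-key attributes."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(records, key, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if records[mid][key] == value:
            return records[mid]
        if records[mid][key] < value:
            lo = mid + 1               # the match, if any, is in the upper half
        else:
            hi = mid - 1               # the match, if any, is in the lower half
    return None

# File ordered on the key attribute Ssn (illustrative data)
employees = [
    {"Ssn": 101, "Dno": 5},
    {"Ssn": 205, "Dno": 4},
    {"Ssn": 333, "Dno": 5},
    {"Ssn": 480, "Dno": 1},
]
print(linear_search(employees, "Dno", 5))    # two records, Ssn 101 and 333
print(binary_search(employees, "Ssn", 333))  # {'Ssn': 333, 'Dno': 5}
```

The linear search inspects all n records, while the binary search inspects about log2(n) of them, but only because the file is ordered on the search attribute.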
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
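A range retrieval on an ordered file can be sketched with the standard `bisect` module. Here a sorted list of key values stands in for the primary index; the data and the function name are hypothetical.

```python
from bisect import bisect_right

# DEPARTMENT file ordered on the key DNUMBER (illustrative data)
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
index = [dnumber for dnumber, _ in departments]   # stand-in for the primary index

def greater_than(value):
    """σ_{DNUMBER > value}: locate the boundary, then read all subsequent records."""
    start = bisect_right(index, value)   # first position with DNUMBER > value
    return departments[start:]

print(greater_than(5))   # [(7, 'Sales'), (9, 'Support')]
```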
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (the data is clustered on that attribute).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
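Both strategies can be sketched with dictionary-based indexes, where a list position plays the role of a record pointer. The data, the attribute names, and the helper `build_index` are hypothetical.

```python
from collections import defaultdict

employees = [
    {"Ssn": 1, "Dno": 5, "Salary": 30000},
    {"Ssn": 2, "Dno": 4, "Salary": 40000},
    {"Ssn": 3, "Dno": 5, "Salary": 40000},
    {"Ssn": 4, "Dno": 1, "Salary": 55000},
]

def build_index(records, attr):
    """Map each attribute value to the set of record pointers holding it."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):   # list position acts as record pointer
        index[rec[attr]].add(pointer)
    return index

dno_index = build_index(employees, "Dno")
salary_index = build_index(employees, "Salary")

# σ_{Dno = 5 ∨ Salary = 40000}: union of the pointers found in both indexes
pointers = dno_index.get(5, set()) | salary_index.get(40000, set())
disjunctive = [employees[p] for p in sorted(pointers)]

# σ_{Dno = 5 ∧ Salary = 40000}: use one index, then test the remaining condition
conjunctive = [employees[p] for p in sorted(dno_index.get(5, set()))
               if employees[p]["Salary"] == 40000]

print([e["Ssn"] for e in disjunctive])   # [1, 2, 3]
print([e["Ssn"] for e in conjunctive])   # [3]
```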
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform:
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each t_r ∈ r do begin
  For each t_s ∈ s do begin
    If t_r.A Θ t_s.B is true then add t_r || t_s to the result
  end
end
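The nested loop algorithm translates almost line for line into Python. This is a sketch assuming relations are lists of tuples and join attributes are given as positions; the function name and the comparison counter are additions for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b]; also count the tuple pairs examined."""
    result = []
    comparisons = 0
    for tr in r:                       # outer loop over r
        for ts in s:                   # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts) # t_r || t_s
    return result, comparisons

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q")]
rows, comparisons = nested_loop_join(r, s, 0, 0)
print(rows)          # [(2, 'y', 2, 'p'), (3, 'z', 3, 'q')]
print(comparisons)   # 6
```

The counter equals |r| × |s|, independent of how many pairs actually match. Any comparison operator from the `operator` module, such as `operator.lt`, can be passed as `theta`.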
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s pairs of tuples examined.
In the best-case scenario both relations fit into main memory, and hence we need b_s + b_r block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in main memory and is used as the inner relation, then the cost is still b_s + b_r block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block B_r of r do begin
  For each block B_s of s do begin
    For each t_r ∈ B_r do begin
      For each t_s ∈ B_s do begin
        If t_r.A Θ t_s.B is true then add t_r || t_s to the result
      end
    end
  end
end
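A block nested loop join can be sketched by splitting each relation into fixed-size blocks and counting block reads. The block size, the helper names, and the data are hypothetical; a real system reads blocks from disk rather than slicing lists.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join on r[a] = s[b], counting simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                 # read one block of r
        for s_block in s_blocks:
            block_reads += 1             # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

r = [(i, f"r{i}") for i in range(8)]     # 4 blocks of 2 tuples
s = [(i, f"s{i}") for i in range(6)]     # 3 blocks of 2 tuples
rows, reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
print(len(rows), reads)   # 6 16
```

With 4 blocks in r and 3 blocks in s, the count is 4 × 3 + 4 = 16 block reads, since each block of s is read once per block of r rather than once per tuple of r.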
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve the block nested loop?
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
r ⋈_{A=B} s
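An index-based join can be sketched with a dictionary standing in for the access structure on s.B. In this sketch the index is built inside the function for self-containment; in a real system it is assumed to exist already. The names and data are hypothetical.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r[a] = s[b] using an index on s.B instead of scanning s."""
    index = defaultdict(list)            # stand-in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                         # one index lookup per tuple of r
        for ts in index.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (2, "y"), (4, "z")]
print(index_join(r, s, 0, 0))   # [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```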
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is:
b_s + b_r
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
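The merge step can be sketched as follows, assuming both relations are lists of tuples already sorted on their join attributes. The function name and data are hypothetical; the group of s tuples sharing one join value is held in memory, matching the assumption stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of two relations already sorted on r[a] and s[b]."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                       # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                       # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
for row in merge_join(r, s, 0, 0):
    print(row)
```

On this data the join produces five tuples: four for join value 2 (two tuples on each side) and one for join value 4.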
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
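The hash-join description can be sketched with a dictionary of buckets. The number of buckets, the function name, and the data are hypothetical. Tuples that land in the same bucket are only candidates, so the join condition is still checked inside each bucket.

```python
from collections import defaultdict

def hash_join(r, s, a, b, num_buckets=4):
    """Equi-join r[a] = s[b] by hashing both relations into shared buckets."""
    buckets = defaultdict(lambda: ([], []))     # bucket -> (tuples of r, tuples of s)
    for tr in r:                                # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():     # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:              # same bucket does not imply a match
                    result.append(tr + ts)
    return result

r = [(1, "a"), (5, "b"), (2, "c")]
s = [(5, "x"), (1, "y"), (9, "z")]
print(sorted(hash_join(r, s, 0, 0)))   # [(1, 'a', 1, 'y'), (5, 'b', 5, 'x')]
```

With four buckets, the values 1, 5, and 9 all fall into the same bucket, which shows why the equality test inside the bucket is needed.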
Query Optimization: Complex JOIN Operation
The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
A join with a complex join condition (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
49
Optimization ProcessAn equivalence rule says that expressions in different
forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression
Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics
Database Systems
50
Optimization ProcessRule 1 Conjunctive selection operations
(cascade of selections) can be deconstructedinto a sequence of individual selections
σθ1andθ2(E) = σθ1(σθ2(E))
Database Systems
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization: Search Methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search Methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search Methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
Query Optimization: Search Methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
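The conjunctive and disjunctive strategies can be sketched as below. This is an illustration only: an index is modeled as a dictionary from attribute value to a set of record positions (the "pointers"), and `build_index`, `conjunctive_select`, `disjunctive_select`, and the sample data are hypothetical names.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the set of record positions holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def conjunctive_select(records, index, indexed_value, other_predicates):
    """Retrieve via one indexed equality condition, then apply the rest."""
    candidates = sorted(index.get(indexed_value, set()))
    return [records[rid] for rid in candidates
            if all(pred(records[rid]) for pred in other_predicates)]


def disjunctive_select(records, indexed_conditions):
    """Union the pointers found for each (index, value) disjunct."""
    pointers = set()
    for index, value in indexed_conditions:
        pointers |= index.get(value, set())
    return [records[rid] for rid in sorted(pointers)]


# Hypothetical sample data.
employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 55000},
    {"ssn": 4, "dno": 1, "salary": 40000},
]
dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# dno = 5 AND salary > 40000: the index handles dno, the rest is filtered.
conj = conjunctive_select(employees, dno_index, 5,
                          [lambda e: e["salary"] > 40000])
# dno = 4 OR salary = 55000: both disjuncts have an index.
disj = disjunctive_select(employees, [(dno_index, 4), (salary_index, 55000)])
```

The disjunctive version only works because every disjunct has an index; a single unindexed disjunct would force a scan of all records.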
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume that b_r and b_s are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tuple t_r ∈ r do begin
    for each tuple t_s ∈ s do begin
        if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
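The nested loop algorithm above translates almost line by line into Python. In this sketch tuples are Python tuples, the join attributes are given as positions, and the comparison Θ is passed as a function; the name `nested_loop_join` and the sample relations are assumptions made for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Nested loop join: compare every tuple of r with every tuple of s.

    a and b are the positions of the join attributes in r and s.
    theta is the comparison (equality by default, any theta works).
    """
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts
    return result


# Hypothetical sample relations.
r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q"), (2, "z")]

equi = nested_loop_join(r, s, 0, 0)                    # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)       # r.A < s.B
```

Because the condition is just a function, the same loop handles equality, inequality, or any other comparison, which is why nested loop works regardless of the join condition.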
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s (pairs of tuples examined).
In the best-case scenario both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then the cost is still b_s + b_r block accesses: the relation that fits is kept in memory as the inner relation, and the other relation is read once.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block B_r of r do begin
    for each block B_s of s do begin
        for each tuple t_r ∈ B_r do begin
            for each tuple t_s ∈ B_s do begin
                if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve block nested loop?
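The block access count b_r × b_s + b_r can be checked with a small simulation. The sketch below assumes blocks are modeled as Python lists of tuples and that each visit to a block counts as one block access; `blocks`, `block_nested_loop_join`, and the sample relations are hypothetical.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into fixed-size blocks."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join by block nested loop; also counts block accesses."""
    result = []
    block_reads = 0
    for block_r in r_blocks:
        block_reads += 1              # read one block of r
        for block_s in s_blocks:
            block_reads += 1          # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical relations: 10 tuples in r, 6 tuples in s, 2 tuples per block.
r = [(i,) for i in range(10)]
s = [(i % 5, i) for i in range(6)]
r_blocks = blocks(r, 2)               # b_r = 5
s_blocks = blocks(s, 2)               # b_s = 3

joined, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
```

With b_r = 5 and b_s = 3 the counter reaches 5 × 3 + 5 = 20, matching the formula. Using the relation with fewer blocks as the outer relation reduces the b_r term.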
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
r ⋈_{A=B} s
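A sketch of this access-structure join, assuming the index on attribute B of s can be modeled as a dictionary from join value to matching tuples. The names `build_hash_index` and `index_join` and the sample relations are illustrative assumptions.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tuple of r, probe the index on s instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


# Hypothetical relations: employees (lname, dno), departments (dnumber, dname).
employees = [("Smith", 5), ("Wong", 4), ("Borg", 1)]
departments = [(5, "Research"), (4, "Administration")]

dept_index = build_hash_index(departments, 0)
joined = index_join(employees, dept_index, 1)
```

Each tuple of r costs one index probe rather than a full pass over s, which is where the saving over nested loop comes from.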
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_s + b_r
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
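The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory, as the slide requires. The function name `merge_join` and the sample relations are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result = []
    i = j = 0                          # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


# Hypothetical relations, sorted on the first attribute.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a completed group, so each relation is scanned once.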
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
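A minimal in-memory sketch of the hash-join idea described above. The hash file is modeled as a dictionary of buckets, each holding the r tuples and the s tuples that hashed to it; the bucket count, the function name `hash_join`, and the sample relations are assumptions for illustration.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join: hash both relations into shared buckets, then match."""
    # Each bucket holds (tuples from r, tuples from s).
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Tuples in one bucket share a hash value, not necessarily a join
        # value, so the join condition is still checked.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


# Hypothetical relations.
r = [(1, "a"), (5, "b"), (2, "c")]
s = [(5, "x"), (1, "y"), (9, "z"), (1, "w")]

joined = hash_join(r, s, 0, 0)
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared.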
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
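Both duplicate-elimination strategies can be sketched as below. Attribute lists are given as tuple positions, and `project`, `dedup_by_sorting`, `dedup_by_hashing`, and the sample relation are hypothetical names chosen for the illustration.

```python
def project(relation, positions):
    """Keep only the listed attribute positions (duplicates may appear)."""
    return [tuple(t[p] for p in positions) for t in relation]


def dedup_by_sorting(tuples):
    """Sort, then drop tuples equal to their predecessor."""
    result = []
    for t in sorted(tuples):
        if not result or result[-1] != t:
            result.append(t)
    return result


def dedup_by_hashing(tuples):
    """Keep a hash set of tuples already seen; emit each tuple once."""
    seen = set()
    result = []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical relation: (ssn, lname, dno), where ssn is the key.
employees = [(1, "Smith", 5), (2, "Wong", 4), (3, "Zelaya", 5)]

with_key = project(employees, (0, 2))     # includes the key: no duplicates
without_key = project(employees, (2,))    # dno only: duplicates appear
```

The sorting variant returns its output in sorted order, while the hashing variant preserves the order of first appearance.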
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
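Query Optimization: An Example
(Figure: query tree for the translated query. The projection Π_{Pnumber, Dnum, Lname, Address, Bdate} is at the root, the selection σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN} is below it, and two Cartesian products (×) combine Project, Department, and Employee at the leaves.)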
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
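The arithmetic behind these figures can be checked directly. The tuple sizes and cardinalities below are the ones stated on the slide; the variable names and the total-size line are added for illustration.

```python
import math

# Figures stated in the example.
tuple_size_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinality = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple, so the
# cardinalities multiply and the tuple widths add.
product_tuples = math.prod(cardinality.values())        # 100 * 20 * 5000
product_tuple_bytes = sum(tuple_size_bytes.values())    # 100 + 50 + 150
product_total_bytes = product_tuples * product_tuple_bytes
```

That is 10,000,000 tuples of 300 bytes each, or 3,000,000,000 bytes of intermediate result before the selection is applied.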
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
Query Optimization: An Example
(Figure: query tree for this expression. The projection is at the root; below it is the join on MGRSSN = SSN with Employee; below that is the join on Dnum = Dnumber with Department; σ_{Plocation = 'California'} is applied directly to Project.)
Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
Optimization Process
Rule 2: Selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
Optimization Process
Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
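Rules 1 to 3 can be checked on a small relation. The sketch below is an assumption-laden illustration: relations are lists of dictionaries, `select` and `project` are hypothetical helpers, and the `account` data (including the `branch` attribute) is invented for the check.

```python
def select(pred, relation):
    """σ_pred(relation)."""
    return [t for t in relation if pred(t)]


def project(attrs, relation):
    """Π_attrs(relation), with duplicates removed."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


# Hypothetical sample relation.
account = [
    {"acct": 1, "branch": "A", "balance": 500},
    {"acct": 2, "branch": "B", "balance": 3000},
    {"acct": 3, "branch": "A", "balance": 2000},
]
theta1 = lambda t: t["balance"] < 2500
theta2 = lambda t: t["branch"] == "A"

# Rule 1: one conjunctive selection versus a cascade of selections.
rule1_lhs = select(lambda t: theta1(t) and theta2(t), account)
rule1_rhs = select(theta1, select(theta2, account))

# Rule 2: the order of the cascade does not matter.
rule2_rhs = select(theta2, select(theta1, account))

# Rule 3: only the outermost projection determines the result.
rule3_lhs = project(["branch"], project(["branch", "balance"], account))
rule3_rhs = project(["branch"], account)
```

A check on one relation does not prove the rules, but it shows what each side of the equivalence computes.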
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
Optimization Process
Rule 5: Theta join operation is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
(Figure: query trees for E1 ⋈_θ E2 and E2 ⋈_θ E1.)
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(Figure: query trees for both sides of Rule 6.)
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
For an equality search on an attribute with i distinct values, the selectivity (assuming the values are uniformly distributed) is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)| = |r(R)| / i
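These estimates are simple enough to compute directly. In the sketch below the function names are hypothetical, the cardinality of 5000 echoes the Employee relation used elsewhere in these slides, and the figure of 50 distinct values is an assumption made up for the example.

```python
def selectivity_key_equality(cardinality):
    """Equality search on a key attribute: at most one tuple qualifies."""
    return 1 / cardinality


def selectivity_equality(distinct_values):
    """Equality search on an attribute with i distinct values,
    assuming the values are uniformly distributed."""
    return 1 / distinct_values


def estimated_matches(cardinality, distinct_values):
    """Expected number of tuples satisfying the equality search."""
    return cardinality * selectivity_equality(distinct_values)


n_employee = 5000          # |r(R)|
distinct_dno = 50          # assumed number of distinct Dno values

s_key = selectivity_key_equality(n_employee)
s_dno = selectivity_equality(distinct_dno)
expected = estimated_matches(n_employee, distinct_dno)
```

Under these assumptions an equality search on Dno is expected to return about 100 of the 5000 tuples, which is the size estimate the optimizer would carry forward to the next operation.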
Optimization Process
Rule 8: Selection operation distributes over the theta join under the following condition:
when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case),
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
(Figure: query trees for both sides of Rule 8. On the left, σ_{θ0} is applied above the join of E1 and E2; on the right, σ_{θ0} is applied to E1 below the join.)
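The benefit of Rule 8 shows up in the size of the join input. The following sketch compares applying the selection after the join with pushing it down to E1; `select`, `theta_join`, and the sample relations are hypothetical, and the "pairs examined" count assumes a plain nested loop join.

```python
def select(pred, relation):
    return [t for t in relation if pred(t)]


def theta_join(e1, e2, theta):
    """Nested loop theta join; also reports the tuple pairs examined."""
    result = [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)]
    return result, len(e1) * len(e2)


# Hypothetical relations: E1(id, category), E2(category, label).
e1 = [(i, i % 4) for i in range(8)]
e2 = [(i % 4, "v%d" % i) for i in range(6)]

theta = lambda t1, t2: t1[1] == t2[0]     # join condition
theta0 = lambda t: t[0] >= 6              # uses attributes of E1 only

# Left-hand side of Rule 8: join first, select afterwards.
joined, pairs_late = theta_join(e1, e2, theta)
late = select(theta0, joined)             # position 0 is still E1's id

# Right-hand side of Rule 8: select on E1 first, then join.
early, pairs_early = theta_join(select(theta0, e1), e2, theta)
```

Both plans produce the same tuples, but the second examines 12 pairs instead of 48 because the selection shrinks E1 before the join.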
Optimization Process
Rule 9: The projection operation distributes over theta join under the following condition:
the join condition θ involves only attributes in L1 ∪ L2.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: Set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: Selection operation distributes over the set union, set intersection, and set difference operations.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Optimization Process
Rule 12 (union):
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Optimization Process
Rule 12 (intersection):
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
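Optimization Process
Rule 13: Projection operation distributes over the set union, set intersection, and set difference operations.
Π_L(E1 − E2) = (Π_L(E1)) − (Π_L(E2))
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)
Note: the union form holds for any E1 and E2. The intersection and difference forms are guaranteed only when no two distinct tuples of E1 and E2 agree on L (for example, when L includes a key that holds across both relations).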
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.
Definition
Materialized evaluation: Generation of an intermediate result (relation).
Pipeline evaluation: Combining several operations so that the result of one operation is passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
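The difference between the two evaluation styles can be sketched with Python generators, which hand each tuple to the consumer as soon as it is produced. The functions `join` and `project` and the sample relations are hypothetical, and duplicate elimination in the projection is left out to keep the sketch short.

```python
def join(r, s, a, b):
    """Equi-join as a generator: yields each result tuple immediately."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield tr + ts


def project(tuples, positions):
    """Projection as a generator: consumes tuples one at a time."""
    for t in tuples:
        yield tuple(t[p] for p in positions)


# Hypothetical relations.
r = [(1, "r1"), (2, "r2")]
s = [(2, "s1"), (1, "s2"), (2, "s3")]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s, 0, 0))
materialized = list(project(intermediate, (1, 3)))

# Pipelined evaluation: each joined tuple flows straight into the
# projection and no intermediate relation is stored.
pipelined = list(project(join(r, s, 0, 0), (1, 3)))
```

Both styles give the same answer; the pipelined version never holds the full join result, which matters when the intermediate relation is large.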
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
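Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
(Figure: query tree for this expression. Π_{Sname} is at the root, σ_{bid = 100 ∧ rating > 5} is below it, and the join on Sid = Sid over R and S is at the bottom.)
Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))
(Figure: query tree for this expression. Π_{Sname} is at the root, the join on Sid = Sid is below it, σ_{bid = 100} is applied to R, and σ_{rating > 5} is applied to S.)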
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
(Figure: the two query trees for the previous query, with "on the fly" annotations marking the operators that are evaluated in pipeline fashion.)
51
Optimization ProcessRule 2 Selection operation is commutative
σθ1(σθ2(E)) = σθ2(σθ1(E))
Database Systems
52
Optimization ProcessRule 3 A sequence of projections is the
same as the last projection operation(cascade of projections)
ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)
Database Systems
53
Optimization ProcessRule 4 A combination of selection and
Cartesian product operations isequivalent to theta join operation
This can be extended toσθ (E1 X E2) = E1 θ E2
σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2
Database Systems
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).
σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2
[Figure: query trees for σθ0 (E1 ⋈θ E2) and (σθ0 (E1)) ⋈θ E2]
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are lists of attributes of E1 and E2, respectively.
ΠL1∪L2 (E1 ⋈θ E2) = (ΠL1 (E1)) ⋈θ (ΠL2 (E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2
Set union:
σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2
Set intersection:
σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2
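Optimization Process
Rule 13: The projection operation distributes over the set union operation.
ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)
Note: under set semantics the corresponding equalities do not hold in general for set intersection and set difference, because two different tuples can agree on the attributes in L. Only the following containments are guaranteed:
ΠL (E1 ∩ E2) ⊆ ΠL (E1) ∩ ΠL (E2)
ΠL (E1 − E2) ⊇ ΠL (E1) − ΠL (E2)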
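A quick check on small sets shows why projection has to be handled with care in Rule 13: under set semantics it distributes over union, but equality can fail for intersection and difference when tuples agree on L and differ elsewhere. The relations and the project helper below are made up for illustration.

```python
def project(relation, positions):
    """Set-semantics projection: keep the given attribute positions, drop duplicates."""
    return {tuple(t[p] for p in positions) for t in relation}

# Hypothetical two-attribute relations (id, tag).
e1 = {(1, "a"), (2, "b")}
e2 = {(1, "x"), (2, "b")}
L = (0,)  # project on the first attribute only

# Compare the left-hand and right-hand sides of each candidate equivalence.
union_holds = project(e1 | e2, L) == project(e1, L) | project(e2, L)
intersection_holds = project(e1 & e2, L) == project(e1, L) & project(e2, L)
difference_holds = project(e1 - e2, L) == project(e1, L) - project(e2, L)

print(union_holds, intersection_holds, difference_holds)
```

Here the tuples (1, "a") and (1, "x") share the projected value 1, so it survives on the right-hand side of the intersection but not on the left, and the reverse happens for the difference.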
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined, so that the output of one is consumed by the next without being stored.
Assume we want to perform
Πa1, a2 (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
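The difference between the two evaluation styles can be sketched with Python generators. This is only an illustration under assumed data: the relations r and s, the attribute positions, and the helper names are hypothetical, and the projection here does not eliminate duplicates.

```python
def join(r, s, match):
    """Yield concatenated tuples one at a time (nested loop, for simplicity)."""
    for tr in r:
        for ts in s:
            if match(tr, ts):
                yield tr + ts

def project(tuples, positions):
    """Yield the requested attribute positions of each incoming tuple."""
    for t in tuples:
        yield tuple(t[p] for p in positions)

r = [(1, "x"), (2, "y")]           # hypothetical r(a1, b)
s = [(1, 10), (2, 20), (3, 30)]    # hypothetical s(k, a2)

def same_key(tr, ts):
    return tr[0] == ts[0]

# Materialized evaluation: the whole join result is built before projecting.
intermediate = list(join(r, s, same_key))
materialized = list(project(intermediate, (0, 3)))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever stored.
pipelined = list(project(join(r, s, same_key), (0, 3)))
```

Both styles produce the same answer; the pipelined version simply never holds the full join result in memory.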
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid = Sid S))
[Figure: query tree with ΠSname at the root, above σbid = 100 AND rating > 5, above the join of R and S on Sid = Sid]
ΠSname ((σbid = 100 R) ⋈Sid = Sid (σrating > 5 S))
[Figure: query tree with ΠSname at the root, above the join on Sid = Sid of σbid = 100 (R) and σrating > 5 (S)]
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees for the previous query, with the operators that are evaluated in pipelined fashion annotated "On the fly"]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case the data is not organized and no metadata is available).
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
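The two search methods can be contrasted with a small sketch. It assumes an in-memory list of records standing in for the file, with the key in position 0; the function names and the probe counter are illustrative only.

```python
def linear_search_select(file, predicate):
    """Test every record against the selection condition."""
    return [record for record in file if predicate(record)]

def binary_search_select(ordered_file, key_value):
    """Equality selection on the key attribute the file is ordered on.

    Returns (record or None, number of records probed).
    """
    low, high = 0, len(ordered_file) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        record_key = ordered_file[mid][0]
        if record_key == key_value:
            return ordered_file[mid], probes
        if record_key < key_value:
            low = mid + 1       # the match, if any, lies in the upper half
        else:
            high = mid - 1      # the match, if any, lies in the lower half
    return None, probes

# Hypothetical file of 1024 records ordered on the key in position 0.
departments = [(n, f"dept{n}") for n in range(1, 1025)]
found, probes = binary_search_select(departments, 700)
```

On 1024 ordered records the binary search needs at most 11 probes, whereas the linear search always inspects all 1024.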
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σSSN = 123456789 (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note: in this case the data is also sorted.
σDNUMBER > 5 (DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σDNO = 5 (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σθ1∧θ2∧...∧θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1∨θ2∨...∨θn (r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
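The union-of-pointers idea, including the fallback to a linear scan, can be sketched as follows. The index layout (a dictionary from attribute values to sets of record positions), the function name, and the sample records are assumptions made for illustration.

```python
def disjunctive_select(relation, indexes, conditions):
    """Evaluate attr1 = v1 OR attr2 = v2 OR ... on a list of dict records.

    indexes    -- {attribute: {value: set of record positions}} (hypothetical layout)
    conditions -- list of (attribute, value) equality conditions
    """
    if any(attribute not in indexes for attribute, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [rec for rec in relation
                if any(rec[attribute] == value for attribute, value in conditions)]
    pointers = set()
    for attribute, value in conditions:
        pointers |= indexes[attribute].get(value, set())   # union of record pointers
    return [relation[p] for p in sorted(pointers)]

employees = [
    {"ssn": 1, "dno": 5, "sex": "F"},
    {"ssn": 2, "dno": 4, "sex": "M"},
    {"ssn": 3, "dno": 5, "sex": "M"},
]
indexes = {
    "dno": {5: {0, 2}, 4: {1}},
    "ssn": {1: {0}, 2: {1}, 3: {2}},
}

via_indexes = disjunctive_select(employees, indexes, [("dno", 5), ("ssn", 2)])
via_scan = disjunctive_select(employees, indexes, [("dno", 4), ("sex", "F")])
```

The first call is answered from the two indexes alone; the second involves the unindexed attribute sex, so the whole relation is scanned.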
Query Optimization: JOIN Operation
R ⋈A=B S
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈r.A Θ s.B s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
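A direct Python rendering of the nested loop algorithm is given here as a sketch. Relations are assumed to be lists of tuples, theta is any Boolean function of two tuples, and the comparison counter is added only to make the cost visible.

```python
def nested_loop_join(r, s, theta):
    """Join r and s by testing theta on every pair (tr, ts).

    Returns (result tuples, number of pairs examined).
    """
    result = []
    comparisons = 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts
    return result, comparisons

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y")]

equi_result, equi_comparisons = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
less_result, _ = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
```

Because theta is an arbitrary predicate, the same routine handles both the equality join and the inequality join, at the price of examining all nr × ns pairs.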
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (every pair of tuples is examined).
In the best case scenario, both relations fit into the physical space and hence we need bs + br block accesses.
If one of the relations fits in the physical space (and is used as the inner relation), then bs + br block accesses will be the cost.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
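The block access count of the block nested loop can be checked with a small simulation. This is a sketch under simplifying assumptions: a block is modelled as a Python list of tuples, every block fetch counts as one access, and the helper to_blocks and the block size are hypothetical.

```python
def to_blocks(relation, tuples_per_block):
    """Split a list of tuples into fixed-size blocks."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested loop join; returns (result tuples, number of block accesses)."""
    result = []
    block_accesses = 0
    for block_r in r_blocks:
        block_accesses += 1               # read one block of r
        for block_s in s_blocks:
            block_accesses += 1           # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_accesses

r_blocks = to_blocks([(i,) for i in range(6)], 2)          # br = 3
s_blocks = to_blocks([(i,) for i in range(0, 8, 2)], 2)    # bs = 2

joined, accesses = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0])
```

With br = 3 and bs = 2 the simulation performs 3 × 2 + 3 = 9 block accesses, matching the formula br × bs + br.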
Query Optimization: JOIN Operation
r ⋈A=B s
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
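A minimal sketch of this access-structure join follows, with a Python dictionary standing in for the index or hash key on B of s. The function names and the positional addressing of attributes are assumptions for illustration.

```python
from collections import defaultdict

def build_index(s, b):
    """Build an in-memory index on attribute position b of relation s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_nested_loop_join(r, a, s_index):
    """For each tr in r, use the index to fetch the ts with tr[a] == ts[b]."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # index lookup instead of scanning s
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (4, "d")]
s = [(2, "x"), (2, "y"), (1, "z")]

s_index = build_index(s, 0)
joined = index_nested_loop_join(r, 0, s_index)
```

Each tuple of r triggers one lookup rather than a full pass over s, which is where the saving over the plain nested loop comes from.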
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in the main memory.
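The merge step can be sketched as follows for an equality join. Both inputs are assumed to be lists of tuples already sorted on the join attribute, and, as the slide assumes, each group of tuples with the same join value is taken to fit in memory; the function and parameter names are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
s = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a finished group, which is why each relation is scanned only once.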
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
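A simplified, in-memory sketch of the hash-join follows. The bucket count, the use of Python's built-in hash, and the function name are assumptions; a real system would hash records to buckets on disk.

```python
def hash_join(r, s, a, b, num_buckets=8):
    """Equi-join via a shared set of hash buckets on the join attributes."""
    # Each bucket holds the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(num_buckets)]
    for tr in r:                                    # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                    # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Only tuples in the same bucket can match, but different values may
        # collide, so the join condition is still checked explicitly.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (9, "c")]
s = [(9, "x"), (2, "y"), (7, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

With 8 buckets the keys 1 and 9 land in the same bucket, which shows why the explicit equality test inside each bucket is needed.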
Query Optimization: Complex JOIN Operation
The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈θ1∧θ2∧...∧θn s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈θ1∨θ2∨...∨θn s
The join can be performed as the union of the tuples in the individual joins r ⋈θi s.
Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
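Both duplicate-elimination strategies can be sketched briefly. Relations are assumed to be lists of tuples addressed by position, and the function names are illustrative; a Python set plays the role of the hash table.

```python
def project_by_sorting(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result

def project_by_hashing(relation, positions):
    """Project and drop duplicates using a hash-based membership test."""
    seen = set()
    result = []
    for t in relation:
        p = tuple(t[q] for q in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

# Hypothetical EMPLOYEE(name, dno): dno is not a key, so duplicates appear.
employees = [("Ann", 5), ("Bob", 4), ("Cid", 5)]
by_sorting = project_by_sorting(employees, (1,))
by_hashing = project_by_hashing(employees, (1,))
```

The two versions return the same set of tuples; the sorted version yields them in key order, the hashed version in order of first appearance.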
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, ...)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN (Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree with ΠPnumber, Dnum, Lname, Address, Bdate at the root, above σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN, above the Cartesian products of Project, Department, and Employee]
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
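The size of the Cartesian product follows from simple arithmetic, spelled out here as a short check. The dictionary names are illustrative; the numbers are the ones given above.

```python
from math import prod

tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}   # bytes

# Every combination of one tuple from each relation appears in the product.
product_tuples = prod(tuple_counts.values())        # 100 * 20 * 5000
# Each product tuple is the concatenation of one tuple from each relation.
product_tuple_size = sum(tuple_sizes.values())      # 100 + 50 + 150 bytes
product_bytes = product_tuples * product_tuple_size

print(product_tuples, product_tuple_size, product_bytes)
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes each, about 3 GB in total, before any selection is applied.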
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = 'California' (Project)) ⋈Dnum = Dnumber (Department)) ⋈MGRSSN = SSN (Employee))
Query Optimization: An Example
[Figure: query tree with ΠPnumber, Dnum, Lname, Address, Bdate at the root, above the join on MGRSSN = SSN with Employee, above the join on Dnum = Dnumber of σPlocation = 'California' (Project) with Department]
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN} (Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree for the expression above. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; below it Project × (Department × Employee).]
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
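The figures above follow directly from multiplying the cardinalities and adding the tuple widths. A short check, using only the numbers stated on the slide (the variable names are made up for the example):

```python
# (number of tuples, tuple size in bytes) as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
tuple_width = 0
for count, size in relations.values():
    product_tuples *= count   # cardinalities multiply under Cartesian product
    tuple_width += size       # tuple widths add up

total_bytes = product_tuples * tuple_width

print(product_tuples)  # 10000000
print(tuple_width)     # 300
print(total_bytes)     # 3000000000
```

The intermediate relation is therefore about 3 GB, even though the three base relations together hold well under 1 MB.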
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = "California"} (Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
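Query Optimization: An Example
[Figure: query tree for the rewritten expression. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it ⋈_{MGRSSN = SSN} with Employee as one operand; the other operand is ⋈_{Dnum = Dnumber}, joining σ_{Plocation = "California"}(Project) with Department.]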
Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:
σ_θ (E1 × E2) = E1 ⋈_θ E2
This can be extended to:
σ_{θ1} (E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
Optimization Process
Rule 5: The theta join operation is commutative:
E1 ⋈_θ E2 = E2 ⋈_θ E1
[Figure: the two equivalent query trees, with the operands of ⋈_θ swapped.]
Optimization Process
Rule 6: Natural join is associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: the two equivalent query trees, one joining E1 and E2 first, the other joining E2 and E3 first.]
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is:
s = 1 / |r(R)|
The selectivity on an attribute with i distinct values, assuming the values are uniformly distributed, is:
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence, the number of tuples that satisfy an equality search is:
|r(R)| / i
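The selectivity formulas translate directly into a size estimate. The sketch below assumes uniformly distributed values; the function names and the figure of 50 distinct department numbers are hypothetical and chosen only for illustration.

```python
import math


def estimated_matches(cardinality, distinct_values):
    """Expected tuples matching an equality search: |r(R)| / i."""
    return cardinality / distinct_values


def equality_selectivity(cardinality, distinct_values):
    """Selectivity = matching tuples / |r(R)|, which reduces to 1 / i."""
    return estimated_matches(cardinality, distinct_values) / cardinality


# Key attribute: every value is distinct, so i = |r(R)| and s = 1 / |r(R)|.
s_key = equality_selectivity(5000, 5000)

# Non-key attribute with a hypothetical 50 distinct values.
s_dno = equality_selectivity(5000, 50)
matches_dno = estimated_matches(5000, 50)   # 100.0
```

An optimizer would read `cardinality` and `distinct_values` from its catalog statistics rather than take them as arguments.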
Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 belong to only one of the relations being joined (E1 in this case):
σ_{θ0} (E1 ⋈_θ E2) = (σ_{θ0} (E1)) ⋈_θ E2
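Optimization Process
Rule 8:
σ_{θ0} (E1 ⋈_θ E2) = (σ_{θ0} (E1)) ⋈_θ E2
[Figure: the two equivalent query trees. In the first, σ_{θ0} sits above the join of E1 and E2; in the second, σ_{θ0} is applied to E1 below the join.]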
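Rule 8 can be checked on a small example. The relations, attribute names, and helper functions below are hypothetical; the point is only that both sides produce the same result while the pushed-down form feeds fewer tuples into the join.

```python
def select(relation, pred):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in relation if pred(t)]


def theta_join(r, s, theta):
    """Nested-loop theta join over dictionaries with distinct attribute names."""
    return [{**tr, **ts} for tr in r for ts in s if theta(tr, ts)]


E1 = [{"a": 1, "x": 10}, {"a": 2, "x": 20}, {"a": 3, "x": 30}]
E2 = [{"b": 1}, {"b": 3}, {"b": 4}]

theta = lambda tr, ts: tr["a"] == ts["b"]   # join condition
theta0 = lambda t: t["x"] > 15              # uses attributes of E1 only

late = select(theta_join(E1, E2, theta), theta0)    # select after the join
early = theta_join(select(E1, theta0), E2, theta)   # select pushed below it

outer_late = len(E1)                    # 3 tuples enter the join
outer_early = len(select(E1, theta0))   # 2 tuples enter the join
```

Both `late` and `early` contain the single tuple with a = 3, which is why the heuristic of performing selections early is safe whenever the condition refers to one relation only.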
Optimization Process
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:
Π_{L1 ∪ L2} (E1 ⋈_θ E2) = (Π_{L1} (E1)) ⋈_θ (Π_{L2} (E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative:
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
For set difference:
σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2
For set union:
σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2
For set intersection:
σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2
Optimization Process
Rule 13: The projection operation distributes over set union:
Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)
Unlike selection, projection does not in general distribute over set intersection or set difference. For example, if E1 = {(1, a)}, E2 = {(1, b)}, and L is the first attribute, then Π_L (E1 ∩ E2) is empty while Π_L (E1) ∩ Π_L (E2) = {(1)}.
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
Optimization Process
So, in short, after scanning and parsing:
The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on the heuristic rules, the optimizer uses the equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: the intermediate result of an operation is generated and stored as a relation.
Pipelined evaluation: several operations are combined, so that the output of one is passed directly to the next.
Assume we want to perform:
Π_{a1, a2} (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
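The difference between the two evaluation styles can be sketched with Python generators. This is an illustration under stated assumptions: the relations, the shared join attribute k, and the function names are hypothetical, and duplicate elimination in the projection is omitted to keep the sketch short.

```python
def join(r, s, attr):
    """Equijoin on a shared attribute, producing one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}


def project(rows, attrs):
    """Projection that consumes its input one tuple at a time."""
    for row in rows:
        yield tuple(row[a] for a in attrs)


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 3, "a2": "q"}]

# Materialized evaluation: the join result is stored before projecting.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever stored.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both variants return the same tuples; the pipelined one simply avoids the space and I/O cost of storing `intermediate`.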
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
Π_{Sname} (σ_{bid = 100 ∧ rating > 5} (R ⋈_{Sid = Sid} S))
[Figure: query tree for the expression above. Root: Π_{Sname}; below it σ_{bid = 100 ∧ rating > 5}; below it ⋈_{Sid = Sid} with operands R and S.]
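Π_{Sname} ((σ_{bid = 100} (R)) ⋈_{Sid = Sid} (σ_{rating > 5} (S)))
[Figure: query tree for the expression above. Root: Π_{Sname}; below it ⋈_{Sid = Sid}, whose operands are σ_{bid = 100} applied to R and σ_{rating > 5} applied to S.]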
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees above, with operators annotated "On the fly" to mark pipelined execution.]
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
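The two basic search methods can be sketched as follows. The record layout, attribute names, and function names are assumptions made for the example; an in-memory list stands in for the file.

```python
def linear_search(records, attr, value):
    """Test every record; works for any attribute and any file organization."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(ordered, key, value):
    """Equality search on the key attribute the file is ordered on."""
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k = ordered[mid][key]
        if k == value:
            return ordered[mid]
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None   # no record has this key value


# Hypothetical file ordered on the key Ssn (100, 110, ..., 190).
employees = [{"Ssn": ssn, "Dno": ssn % 3} for ssn in range(100, 200, 10)]

found = binary_search(employees, "Ssn", 150)
missing = binary_search(employees, "Ssn", 155)
in_dept_1 = linear_search(employees, "Dno", 1)
```

Binary search inspects about log2(n) records, whereas the linear search over the non-key attribute `Dno` has to read all n.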
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σ_{SSN = 123456789} (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.
σ_{DNUMBER > 5} (DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5} (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn} (r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn} (r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
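Disjunctive selection by union of record pointers can be sketched as below. The in-memory dictionary index, the use of list positions as record pointers, and the restriction to equality conditions are all simplifying assumptions for the example.

```python
from collections import defaultdict


def build_index(records, attr):
    """Hypothetical secondary index: attribute value -> list of record ids."""
    index = defaultdict(list)
    for rid, rec in enumerate(records):
        index[rec[attr]].append(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions is a list of (attribute, value) equality tests, OR-ed together."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids.update(indexes[attr].get(value, []))   # union of record pointers
    return [records[rid] for rid in sorted(rids)]


employees = [
    {"Dno": 5, "Salary": 30},
    {"Dno": 4, "Salary": 40},
    {"Dno": 5, "Salary": 40},
    {"Dno": 1, "Salary": 10},
]
conditions = [("Dno", 5), ("Salary", 40)]

both_indexed = {a: build_index(employees, a) for a in ("Dno", "Salary")}
only_dno = {"Dno": build_index(employees, "Dno")}

via_pointers = disjunctive_select(employees, both_indexed, conditions)
via_scan = disjunctive_select(employees, only_dno, conditions)
```

Taking the union over record ids rather than over the records themselves is what keeps a tuple matching both conditions from being returned twice.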
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A = B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform:
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further, assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
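The nested loop algorithm maps almost line for line onto runnable code. In this sketch, tuples are Python tuples, A and B are given as positions, and Θ defaults to equality; these representation choices and the function name are assumptions for the example.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.A theta s.B; a and b are attribute positions."""
    result = []
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            if theta(tr[a], ts[b]):     # test the join condition
                result.append(tr + ts)  # tr || ts
    return result


r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                    # A = B
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)  # A < B
```

Because the condition is just a callable, the same loop handles any Θ, which is why the nested loop join works regardless of the join condition.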
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns, since every pair of tuples is examined.
In the best-case scenario, both relations fit into the physical space, and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then the cost will be bs + br block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
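The block access count br × bs + br can be confirmed by instrumenting the algorithm. In this sketch a "block" is simply a slice of a Python list and `block_size` is the number of tuples per block; both are assumptions made so that the accesses can be counted without real disk I/O.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of block_size tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equijoin on r.A = s.B; returns (result, number of block accesses)."""
    accesses = 0
    result = []
    for r_block in blocks(r, block_size):
        accesses += 1                       # read one block of r
        for s_block in blocks(s, block_size):
            accesses += 1                   # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, accesses


r = [(i,) for i in range(6)]        # br = 3 blocks of 2 tuples
s = [(i,) for i in range(4, 8)]     # bs = 2 blocks of 2 tuples

joined, accesses = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

Here br = 3 and bs = 2, so the count is 3 × 2 + 3 = 9. Since s is scanned once per block of r, placing the relation with fewer blocks in the outer loop lowers the br × bs + br total.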
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
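A join that probes an access structure can be sketched as below. The dictionary built inside the function merely stands in for an index or hash key on s.B that would already exist in a real system; that substitution, the tuple representation, and the function name are assumptions for the example.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equijoin on r.A = s.B, probing an access structure on s.B."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)

    result = []
    for tr in r:                            # one record of r at a time
        for ts in index.get(tr[a], []):     # only the matching records of s
            result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (2, "z")]
s = [(2, "p"), (3, "q"), (2, "w")]

joined = index_nested_loop_join(r, s, 0, 0)
```

Each record of r triggers one probe instead of a full scan of s, which is where the saving over the plain nested loop comes from.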
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
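The merge step can be sketched with two pointers, one per relation. The sketch assumes both inputs are already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory (here, a list slice); the function name and tuple layout are made up for the example.

```python
def merge_join(r, s, a, b):
    """Equijoin on r.A = s.B for r sorted on A and s sorted on B."""
    i = j = 0                      # one pointer per relation
    result = []
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a finished group, so each relation is read once, matching the bs + br block access count.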
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
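The hash-join described above can be sketched with an in-memory list of buckets standing in for the hash file. The bucket count, the use of Python's built-in `hash`, and the function name are assumptions made for the example.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equijoin on r.A = s.B using one shared set of hash buckets."""
    # Each bucket keeps the r records and the s records that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                    # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                    # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                  # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                  # hash collisions are filtered here
                    result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (6, "z")]
s = [(2, "p"), (6, "q"), (7, "t")]

joined = sorted(hash_join(r, s, 0, 0))
```

Records with equal join values always land in the same bucket, so only records within a bucket need to be compared. The explicit equality test is still required because different values (such as 2 and 6 with four buckets) can share a bucket.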
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
A join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
54
Optimization ProcessRule 5 Theta join operation is
commutative
E1 θ E2 = E2 θ E1 θ
E1 E2
θ
E2 E1
Database Systems
55
Optimization ProcessRule 6 Natural join is associative
(E1 E2) E3 = E1 (E2 E3)
E1 E2
E3
E3E2
E1
Database Systems
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tuple tr ∈ r do begin
    For each tuple ts ∈ s do begin
        If (tr.A Θ ts.B) is true then add tr || ts to the result
    end
end
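The nested loop algorithm translates almost line for line into runnable code. In this sketch, relations are lists of Python tuples, `a` and `b` are the positions of the join attributes, and the comparison Θ is passed in as a function. These representation choices are assumptions made for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b], examining every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition
                result.append(tr + ts)  # concatenation tr || ts
    return result
```

Since the condition is an arbitrary function, the same code handles equality joins and general theta joins, for example `theta=operator.lt`.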
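Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines n_r × n_s pairs of tuples.
In the worst case, when the buffer can hold only one block of each relation, the inner relation s is read once for every tuple of r, which costs n_r × b_s + b_r block accesses.
In the best case, both relations fit into main memory, and hence we need only b_r + b_s block accesses.

Query Optimization: JOIN Operation (nested loop)
If even one of the relations fits entirely in main memory, the cost is still b_r + b_s block accesses: keep that relation in memory as the inner relation and read the other relation once.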
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tuple tr ∈ Br do begin
            For each tuple ts ∈ Bs do begin
                If (tr.A Θ ts.B) is true then add tr || ts to the result
            end
        end
    end
end
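A runnable sketch of the block nested loop join follows. It simulates blocks by slicing a list into fixed-size chunks and counts simulated block reads. The chunking helper, the `block_size` parameter, and the equality-only condition are simplifying assumptions.

```python
def blocks(relation, block_size):
    """Yield the relation one simulated disk block at a time."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, a, b, block_size=2):
    """Equi-join r and s on r[a] == s[b]; also return the number of block reads."""
    result = []
    block_reads = 0
    for block_r in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads
```

With 2 blocks of r and 3 blocks of s, the counter ends at 2 × 3 + 2 = 8 reads, matching the b_r × b_s + b_r cost of this algorithm.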
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve the block nested loop?
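To compare the cost formulas, the following sketch plugs in hypothetical sizes (5000 tuples in 100 blocks for r, 400 blocks for s). The worst-case tuple-at-a-time formula n_r × b_s + b_r is the standard textbook one and is an assumption here, since the slides only state the n_r × n_s pair count.

```python
def nested_loop_cost(n_r, b_r, b_s):
    """Worst case: the inner relation s is re-read once per tuple of r."""
    return n_r * b_s + b_r


def block_nested_loop_cost(b_r, b_s):
    """The inner relation s is re-read once per block of r."""
    return b_r * b_s + b_r


def in_memory_cost(b_r, b_s):
    """One relation fits in memory, so each relation is read exactly once."""
    return b_r + b_s


# Hypothetical relation sizes, chosen only for illustration.
n_r, b_r, b_s = 5000, 100, 400
costs = {
    "nested loop": nested_loop_cost(n_r, b_r, b_s),
    "block nested loop": block_nested_loop_cost(b_r, b_s),
    "in memory": in_memory_cost(b_r, b_s),
}
```

With these numbers, the block-at-a-time version needs roughly fifty times fewer block accesses than the tuple-at-a-time worst case.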
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
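A sketch of the index-based join, with a dictionary standing in for the index on s.B. In a real system the index would already exist on disk; building it inside the function is an assumption made only to keep the example self-contained.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r and s on r[a] == s[b], probing an index on s[b]."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)

    result = []
    for tr in r:                            # retrieve each record of r once
        for ts in index.get(tr[a], []):     # index lookup replaces the scan of s
            result.append(tr + ts)
    return result
```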
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_r + b_s
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
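The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes. The group of s tuples sharing one join value is held in memory, which mirrors the assumption stated on the slide. The function name and tuple layout are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                  # advance the pointer with the smaller value
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```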
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
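A sketch of the hash join as described above: both relations are partitioned into the same set of buckets, and matching is done bucket by bucket. The in-memory list of buckets, the `n_buckets` parameter, and the use of Python's built-in `hash` are assumptions for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r[a] == s[b] using a shared set of hash buckets."""
    # Each bucket holds the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # different values may share a bucket
                    result.append(tr + ts)
    return result
```

The explicit equality test inside each bucket is still needed, because two different join values can hash to the same bucket.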
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
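Both duplicate-elimination strategies can be sketched briefly. Tuples are Python tuples and `attrs` lists the positions to keep. The function names and the sample data in the usage note are assumptions.

```python
def project_sort(relation, attrs):
    """Projection with duplicates removed by sorting, then dropping equal neighbours."""
    projected = sorted(tuple(t[i] for i in attrs) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result


def project_hash(relation, attrs):
    """Projection with duplicates removed by hashing each projected row."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[i] for i in attrs)
        if row not in seen:                   # hash lookup detects a repeat
            seen.add(row)
            result.append(row)
    return result
```

For example, projecting a hypothetical relation `[("Ann", 5), ("Bob", 4), ("Cy", 5)]` on its second attribute gives two rows with either method. The sort version returns them in sorted order, and the hash version in order of first appearance.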
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree for the expression above (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
        ×
            Project
            ×
                Department
                Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
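The size estimate above is simple arithmetic and can be checked directly. The variable names below are illustrative. The tuple counts and tuple sizes are the ones assumed in the example.

```python
from math import prod

# Tuple counts and tuple sizes (bytes) assumed in the example.
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple ...
product_tuples = prod(tuple_count.values())
# ... and each result tuple is the concatenation of one tuple from each relation.
product_tuple_bytes = sum(tuple_bytes.values())
product_total_bytes = product_tuples * product_tuple_bytes
```

That is 10 million tuples of 300 bytes each, or about 3 GB of intermediate data, produced from three relations that together occupy well under 1 MB.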
Query Optimization: An Example
However, the above query, based on the schemas of the relations, can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
Query Optimization: An Example
Query tree for the improved expression (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MGRSSN = SSN}
        ⋈_{Dnum = Dnumber}
            σ_{Plocation = 'California'}
                Project
            Department
        Employee
Optimization Process
Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(Figure: the two equivalent query trees, one joining E1 and E2 first, the other joining E2 and E3 first.)
Optimization Process
Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.
Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of the intermediate relation, and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|
The selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence, the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
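A small sketch of these estimates, using exact fractions to avoid rounding noise. The figures (5000 tuples, 20 distinct values) are hypothetical, and the uniform-distribution assumption is what makes 1/i a reasonable estimate.

```python
from fractions import Fraction


def selectivity_key(cardinality):
    """Equality search on a key: at most one of |r(R)| tuples matches."""
    return Fraction(1, cardinality)


def selectivity_distinct(distinct_values):
    """Equality search on an attribute with i distinct, uniformly spread values."""
    return Fraction(1, distinct_values)


def estimated_matches(selectivity, cardinality):
    """Estimated number of tuples in the result of the selection."""
    return selectivity * cardinality


# Hypothetical EMPLOYEE relation: 5000 tuples, DNO takes 20 distinct values.
by_key = estimated_matches(selectivity_key(5000), 5000)
by_dno = estimated_matches(selectivity_distinct(20), 5000)
```

Here the key lookup is expected to return 1 tuple and the DNO lookup about 250 tuples.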
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
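Optimization Process
Rule 8:
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
(Figure: two equivalent query trees. In the first, σ_{θ0} sits above the join ⋈_θ of E1 and E2. In the second, σ_{θ0} is applied to E1 alone, and its result is joined with E2 by ⋈_θ.)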
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: Set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Rule 12 (union):
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Rule 12 (intersection):
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
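Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general. For example, if E1 = {(1, a)} and E2 = {(1, b)}, and L is the first attribute, then Π_L(E1 − E2) = {1} but Π_L(E1) − Π_L(E2) is empty; likewise Π_L(E1 ∩ E2) is empty but Π_L(E1) ∩ Π_L(E2) = {1}.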
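The set-operation rules can be checked on a small example with Python sets. The relations `e1` and `e2`, the predicate `p`, and the helper names are all hypothetical. A single example can refute an identity but not prove one, so the passing equalities below only illustrate Rule 12 and do not establish it.

```python
def select(p, rel):
    """sigma_p(rel) on a set of tuples."""
    return {t for t in rel if p(t)}


def project(attrs, rel):
    """Pi_attrs(rel) on a set of tuples; the set removes duplicates."""
    return {tuple(t[i] for i in attrs) for t in rel}


e1 = {(1, "a"), (2, "b"), (3, "c")}
e2 = {(1, "z"), (2, "b")}


def p(t):
    return t[0] >= 2


# Selection distributes over union, intersection, and difference.
sel_union = select(p, e1 | e2) == select(p, e1) | select(p, e2)
sel_inter = select(p, e1 & e2) == select(p, e1) & select(p, e2) == select(p, e1) & e2
sel_diff = select(p, e1 - e2) == select(p, e1) - select(p, e2) == select(p, e1) - e2
# ... but for union the selection must be applied to both operands.
sel_union_one_side = select(p, e1 | e2) == select(p, e1) | e2

# Projection distributes over union, but not over difference or intersection.
proj_union = project([0], e1 | e2) == project([0], e1) | project([0], e2)
proj_diff = project([0], e1 - e2) == project([0], e1) - project([0], e2)
proj_inter = project([0], e1 & e2) == project([0], e1) & project([0], e2)
```

On this data the three selection checks and the projection-over-union check come out true, while the one-sided union and the projection checks for difference and intersection come out false.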
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of the best of these plans.
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: Heuristic Rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: Heuristic Rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition
Materialized evaluation: generation of an intermediate result (relation), which is stored and then read by the next operation.
Pipeline evaluation: combining several operations, so that the output of one operation is passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
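The two evaluation styles can be contrasted with Python generators, which hand over one tuple at a time. This is only a sketch: the tuple layout (join attribute at position 0, attributes a1 and a2 at positions 1 and 3 of the joined tuple) and all function names are assumptions.

```python
def join(r, s, a, b):
    """Equi-join that yields each result tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield tr + ts


def project(tuples, attrs):
    """Projection (without duplicate elimination) over a stream of tuples."""
    for t in tuples:
        yield tuple(t[i] for i in attrs)


def materialized(r, s):
    """Store the whole join result first, then project it."""
    temp = list(join(r, s, 0, 0))          # intermediate relation is materialized
    return list(project(temp, [1, 3]))


def pipelined(r, s):
    """Each joined tuple flows straight into the projection."""
    return list(project(join(r, s, 0, 0), [1, 3]))
```

Both functions return the same answer. The difference is that the pipelined version never holds the full join result, only one tuple at a time.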
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
Query tree (root at the top):
Π_{Sname}
    σ_{bid = 100 ∧ rating > 5}
        ⋈_{Sid = Sid}
            R
            S
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
Query tree (root at the top):
Π_{Sname}
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

Plan 1 (root at the top):
Π_{Sname} (on the fly)
    σ_{bid = 100 ∧ rating > 5} (on the fly)
        ⋈_{Sid = Sid}
            R
            S

Plan 2 (root at the top):
Π_{Sname} (on the fly)
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search Methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search Methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
Optimization Process: Search Methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
56
Optimization ProcessRule 7 Theta join is associative in the
following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)
Where θ2 involves attributes from only E2 and E3
Database Systems
DefinitionSelectivity is defined as the ratio of the number of
tuples that satisfy the equality condition to thecardinality of the relation
119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905
|119904119904(119877119877)|Selectivity is used to estimate size of intermediate
relation and hence number of accesses
Database Systems
57
In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization
Database Systems
58
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate (temporary) relation before the next operation starts.
Pipeline evaluation: Several operations are combined, so that the tuples produced by one operation are passed directly to the next without storing the intermediate relation.

Assume we want to perform
Πa1, a2 (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
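
The difference between the two evaluation styles can be sketched with Python generators. This is a minimal illustration, not how a real engine is built: the relations r and s, the helper names join and project, and the attribute positions are assumptions, and duplicate elimination is left out.

```python
def join(r, s, pred):
    """Yield concatenated tuples one at a time (simple nested-loop join)."""
    for tr in r:
        for ts in s:
            if pred(tr, ts):
                yield tr + ts

def project(tuples, positions):
    """Project a stream of tuples; duplicate elimination is omitted for brevity."""
    for t in tuples:
        yield tuple(t[i] for i in positions)

r = [(1, "x"), (2, "y")]
s = [(1, 10), (2, 20), (2, 30)]
same_key = lambda tr, ts: tr[0] == ts[0]

# Materialized: the whole join result is stored before the projection starts.
temp = list(join(r, s, same_key))
materialized = list(project(temp, (1, 3)))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, same_key), (1, 3)))
```

Both variants produce the same tuples; the pipelined one never holds the full join result in a temporary list.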

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))
[Query tree: ΠSname at the root, above σbid = 100 AND rating > 5, above the join ⋈Sid=Sid with leaves R and S]

ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))
[Query tree: ΠSname at the root, above the join ⋈Sid=Sid, whose inputs are σbid = 100 over R and σrating > 5 over S]
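
The two expressions above can be compared on a small instance. The sketch below assumes made-up contents for R and S and hypothetical function names; it shows that both plans return the same names while the plan with early selections joins fewer tuples.

```python
S = [  # (Sid, Sname, rating, age), hypothetical data
    (1, "Ann", 7, 30.0),
    (2, "Bob", 3, 41.5),
    (3, "Cy", 9, 25.0),
]
R = [  # (Sid, bid, day, rname), hypothetical data
    (1, 100, "d1", "r1"),
    (2, 100, "d2", "r2"),
    (3, 101, "d3", "r3"),
]

def plan_select_late(R, S):
    """Join first, then select, then project on Sname."""
    joined = [r + s for r in R for s in S if r[0] == s[0]]
    selected = [t for t in joined if t[1] == 100 and t[6] > 5]
    return {t[5] for t in selected}, len(joined)

def plan_select_early(R, S):
    """Select on each relation first, then join, then project on Sname."""
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    joined = [r + s for r in r_sel for s in s_sel if r[0] == s[0]]
    return {t[5] for t in joined}, len(joined)

late_names, late_join_size = plan_select_late(R, S)
early_names, early_join_size = plan_select_early(R, S)
```

On this instance the late plan builds three joined tuples and the early plan builds one, with identical answers.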

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, annotated for pipelined execution. In the first tree, σbid = 100 AND rating > 5 and ΠSname are each marked "On the fly". The second tree carries one "On the fly" annotation.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
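
The two search methods can be sketched as follows. This is a minimal in-memory illustration using the standard bisect module; the employees list and the helper names are assumptions, and a real system would count block accesses rather than list positions.

```python
from bisect import bisect_left

def linear_search(records, key_of, value):
    """Scan every record; works on unordered files and on non-key attributes."""
    return [rec for rec in records if key_of(rec) == value]

def binary_search(sorted_keys, records, value):
    """Equality search on a key attribute of an ordered file; one record or None."""
    pos = bisect_left(sorted_keys, value)
    if pos < len(sorted_keys) and sorted_keys[pos] == value:
        return records[pos]
    return None

# Hypothetical file ordered on its key (first field).
employees = [(101, "Lee"), (205, "Kim"), (317, "Roy"), (440, "Ada")]
keys = [e[0] for e in employees]

found_linear = linear_search(employees, lambda e: e[0], 317)
found_binary = binary_search(keys, employees, 317)
missing = binary_search(keys, employees, 999)
```

Linear search inspects all n records, while binary search inspects about log2(n) of them, which is why the ordering on the key attribute matters.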

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).
σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file for > and ≥, or all the preceding records for < and ≤ (note that in this case the data is also sorted).
σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (the data is clustered on that attribute).
σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σθ1∧θ2∧…∧θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1∨θ2∨…∨θn (r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then check whether each retrieved record satisfies the rest of the conditions.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
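
Both strategies can be sketched with in-memory indexes that map attribute values to record pointers. Everything below is an assumption made for illustration: the employees data, the build_index helper, and the use of list positions as record pointers.

```python
from collections import defaultdict

def build_index(records, pos):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[pos]].add(rid)
    return index

employees = [  # (Ssn, Dno, Salary), hypothetical data
    (1, 5, 30000), (2, 4, 45000), (3, 5, 52000), (4, 1, 45000), (5, 2, 38000),
]
dno_index = build_index(employees, 1)
salary_index = build_index(employees, 2)

# Conjunctive: Dno = 5 AND Salary > 40000.
# Use the index for one condition, then test the rest on the retrieved records.
candidates = dno_index.get(5, set())
conjunctive = [employees[rid] for rid in sorted(candidates)
               if employees[rid][2] > 40000]

# Disjunctive: Dno = 5 OR Salary = 45000.
# Every condition has an index, so the result is the union of the pointer sets.
pointers = dno_index.get(5, set()) | salary_index.get(45000, set())
disjunctive = [employees[rid] for rid in sorted(pointers)]
```

If the salary index did not exist, the disjunctive case could not avoid scanning the whole relation, since any record might qualify through the unindexed condition.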

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].
R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈r.A Θ s.B s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
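
The algorithm above translates almost line for line into Python. In this sketch the comparison Θ is passed in as a function from the standard operator module; the sample relations and the name nested_loop_join are assumptions for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on positions a and b, examining every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts
    return result

r = [(1, "p"), (2, "q"), (3, "r")]
s = [(2, "x"), (3, "y"), (3, "z")]

equi = nested_loop_join(r, s, 0, 0)                 # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)    # r.A < s.B
```

Because every pair is tested, the same code handles any comparison operator, which is what makes nested loop the fallback for arbitrary join conditions.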

Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the worst case, when the buffer can hold only one block of each relation, nr × bs + br block accesses are needed (with r as the outer relation).
In the best-case scenario, both relations fit into the physical memory, and hence we need bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical memory, then the cost is also bs + br block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve block nested loop?
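
The br × bs + br figure can be checked by counting block reads in a small simulation. The sketch below is an illustration under simple assumptions: blocks are Python lists, one counter increment stands for one block access, and the data and block sizes are invented.

```python
def blocks(relation, block_size):
    """Split a list of tuples into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r_blocks, s_blocks, pred):
    """Block nested loop join that also counts simulated block accesses."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1              # read one block of the outer relation
        for block_s in s_blocks:
            block_reads += 1          # read one block of the inner relation
            for tr in block_r:
                for ts in block_s:
                    if pred(tr, ts):
                        result.append(tr + ts)
    return result, block_reads

r = [(i,) for i in range(10)]         # nr = 10 tuples
s = [(i % 5,) for i in range(20)]     # ns = 20 tuples
r_blocks = blocks(r, 2)               # br = 5 blocks
s_blocks = blocks(s, 4)               # bs = 5 blocks

result, reads = block_nested_loop_join(r_blocks, s_blocks,
                                       lambda tr, ts: tr[0] == ts[0])
```

With br = bs = 5 the counter ends at 5 × 5 + 5 = 30 block accesses, compared with nr × bs + br = 10 × 5 + 5 = 55 for the tuple-at-a-time worst case on the same data.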

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈A=B s
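
A minimal sketch of this technique, assuming the access structure is an in-memory hash index on attribute B of s. The helper names and the sample data are hypothetical.

```python
from collections import defaultdict

def build_hash_index(s, b):
    """Index the tuples of s by the value of their join attribute (position b)."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index on s instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "p"), (2, "q"), (3, "r")]
s = [(2, "x"), (3, "y"), (3, "z")]

s_index = build_hash_index(s, 0)
joined = index_join(r, s_index, 0)
```

Each tuple of r triggers one index probe rather than a full pass over s, which is where the saving over nested loop comes from.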

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
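
The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and that each group of equal values fits in memory, as stated above. The function name and sample data are assumptions for illustration.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                    # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                    # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a finished group, so each relation is read once from start to end.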

Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
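
A minimal in-memory sketch of the idea, assuming Python's built-in hash as the hashing function and a small fixed number of buckets. The function name, the bucket count, and the data are assumptions made for the example.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into shared buckets, then match within each bucket."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (tuples of r, tuples of s)
    for tr in r:                              # single pass through r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass through s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Different values can share a bucket, so the join condition is rechecked.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "p"), (2, "q"), (3, "r"), (6, "t")]
s = [(2, "x"), (3, "y"), (3, "z"), (7, "w")]
joined = sorted(hash_join(r, s, 0, 0))
```

Tuples with equal join values always land in the same bucket, so no match can be missed by comparing only within buckets.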

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈θ1∧θ2∧…∧θn s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈θ1∨θ2∨…∨θn s
The join can be performed as the union of the tuples in the individual joins r ⋈θi s.

Query Optimization: Project Operation
A project operation Π<attribute-list> (R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
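
Both duplicate-elimination strategies can be sketched in a few lines. The helper names and the sample relation are assumptions for illustration; a Python set stands in for the hash table.

```python
def project_sort(relation, positions):
    """Project, sort, then drop duplicates, which are adjacent after sorting."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(relation, positions):
    """Project and keep a tuple only if its hash-table lookup misses."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

employees = [(1, "Lee", 5), (2, "Kim", 4), (3, "Roy", 5)]  # (Ssn, Lname, Dno)

by_sort = project_sort(employees, (2,))   # projection on Dno, a non-key attribute
by_hash = project_hash(employees, (2,))
with_key = project_hash(employees, (0, 2))  # includes the key: nothing to remove
```

The sort-based version returns its result in sorted order, while the hash-based version keeps the order of first appearance.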

Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
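
The sort-based approach can be sketched as one merge-style scan that produces all three results at once. This is only an illustration: the function name is hypothetical, and the inputs are assumed to be duplicate-free relations over the same schema.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, r - s) using one scan over the sorted inputs."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:               # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:             # tuple only in s
            union.append(s[j])
            j += 1
        else:                         # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])               # leftovers of r belong to union and difference
    diff.extend(r[i:])
    union.extend(s[j:])               # leftovers of s belong to the union only
    return union, inter, diff

r = [(5,), (1,), (3,)]
s = [(4,), (3,)]
union, inter, diff = sorted_set_ops(r, s)
```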

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN (Project × (Department × Employee)))

[Query tree: ΠPnumber, Dnum, Lname, Address, Bdate at the root, above σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN, above Project × (Department × Employee)]

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
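
The arithmetic behind these figures can be reproduced directly. The dictionary names below are only for illustration; the numbers are the ones stated above.

```python
import math

tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinality = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product multiplies the cardinalities and adds the tuple widths.
product_tuples = math.prod(cardinality.values())
product_tuple_bytes = sum(tuple_bytes.values())
product_total_bytes = product_tuples * product_tuple_bytes
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes each, which is 3,000,000,000 bytes before any selection is applied.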

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = 'California' (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))

[Query tree: ΠPnumber, Dnum, Lname, Address, Bdate at the root, above the join ⋈MGRSSN=SSN, whose inputs are Employee and the join ⋈Dnum=Dnumber of σPlocation = 'California' (Project) with Department]

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
s = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

For an equality search on an attribute with i distinct values, assuming the values are uniformly distributed, the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
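
The estimates above can be turned into a small calculation. This sketch uses exact fractions to avoid rounding noise; the function names, the 5000-tuple relation, and the figure of 20 distinct values are assumptions chosen for illustration.

```python
from fractions import Fraction

def equality_selectivity(distinct_values):
    """Estimated selectivity s = 1 / i, assuming uniformly distributed values."""
    return Fraction(1, distinct_values)

def estimated_matches(cardinality, selectivity):
    """Estimated number of tuples satisfying the condition: s * |r(R)|."""
    return selectivity * cardinality

cardinality = 5000                      # |r(R)|, hypothetical

# Non-key attribute with i = 20 distinct values.
s_nonkey = equality_selectivity(20)
matches_nonkey = estimated_matches(cardinality, s_nonkey)

# Key attribute: every value is distinct, so i = |r(R)| and s = 1 / |r(R)|.
s_key = equality_selectivity(cardinality)
matches_key = estimated_matches(cardinality, s_key)
```

The key case is just the general formula with i equal to the cardinality, which is why it yields exactly one expected tuple.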

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all the attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):
σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

[Figure: two equivalent query trees. Left: σθ0 above the join ⋈θ with leaves E1 and E2. Right: the join ⋈θ whose inputs are σθ0 over E1, and E2.]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
ΠL1 ∪ L2 (E1 ⋈θ E2) = (ΠL1 (E1)) ⋈θ (ΠL2 (E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2
In practice, the selectivities of all conditions are not available, so estimated selectivities, kept as part of the statistical data, are used to aid query optimization.
For an equality search on a key attribute, the selectivity is
$s = \frac{1}{|r(R)|}$
For an equality search on an attribute with $i$ distinct values (assumed to be uniformly distributed), the selectivity is
$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$
Hence the number of tuples that satisfy an equality search is
$\frac{1}{i} \, |r(R)|$
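The two selectivity estimates can be turned into a small calculation. This is a minimal sketch, not part of the slides: the function names and the sample numbers (5000 tuples, 20 distinct values) are assumptions chosen for illustration, and the non-key estimate assumes uniformly distributed values.

```python
from fractions import Fraction

def selectivity(num_tuples, distinct_values=None, is_key=False):
    """Estimated fraction of tuples that satisfy an equality condition."""
    if is_key:
        return Fraction(1, num_tuples)       # s = 1 / |r(R)|
    return Fraction(1, distinct_values)      # s = 1 / i (uniform values assumed)

def estimated_matches(num_tuples, distinct_values=None, is_key=False):
    """Estimated number of qualifying tuples: s * |r(R)|."""
    return selectivity(num_tuples, distinct_values, is_key) * num_tuples

# Hypothetical relation with 5000 tuples.
key_matches = estimated_matches(5000, is_key=True)             # equality on a key
nonkey_matches = estimated_matches(5000, distinct_values=20)   # 20 distinct values
print(key_matches, nonkey_matches)
```

Exact fractions are used instead of floats so that the estimates come out as whole numbers where the formula says they should.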
Optimization Process: Rule 8
The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
Query trees for the two sides of Rule 8:

    Before:              After:
       σ θ0                 ⋈ θ
        |                  /   \
       ⋈ θ              σ θ0    E2
      /   \               |
    E1     E2             E1
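Rule 8 can be checked on a small example. The sketch below is illustrative only: the relations E1 and E2, their attribute names, and the helper functions are assumptions, and the selection condition deliberately refers only to attributes of E1.

```python
def theta_join(left, right, theta):
    """Naive theta join over lists of dict tuples."""
    return [{**l, **r} for l in left for r in right if theta(l, r)]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Hypothetical relations with disjoint attribute names.
E1 = [{"a": 1, "x": 10}, {"a": 2, "x": 20}, {"a": 3, "x": 30}]
E2 = [{"b": 1, "y": "p"}, {"b": 3, "y": "q"}]

theta = lambda l, r: l["a"] == r["b"]   # join condition
theta0 = lambda t: t["x"] > 10          # uses only attributes of E1

late = select(theta_join(E1, E2, theta), theta0)    # select after the join
early = theta_join(select(E1, theta0), E2, theta)   # select pushed below the join
print(late == early, early)
```

Pushing the selection down means the join sees two E1 tuples instead of three, which is the point of the rule: the result is the same but the intermediate input is smaller.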
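Optimization Process: Rule 9
The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$ respectively.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$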
Optimization Process: Rule 10
The set union and set intersection operations are commutative:
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
Note: set difference is not commutative.
Optimization Process: Rule 11
The set union and set intersection operations are associative:
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
Optimization Process: Rule 12
The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$
Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$ (in general)
Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
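The Rule 12 identities can be verified with Python sets standing in for relations. This is only a sanity check on one hypothetical pair of relations (sets of integers) and one predicate, not a proof.

```python
def select(relation, p):
    """Selection over a relation modelled as a set."""
    return {t for t in relation if p(t)}

# Hypothetical single-attribute relations and predicate.
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0

checks = {
    "union":          select(E1 | E2, p) == select(E1, p) | select(E2, p),
    "intersection":   select(E1 & E2, p) == select(E1, p) & select(E2, p),
    "difference":     select(E1 - E2, p) == select(E1, p) - select(E2, p),
    "intersection_1": select(E1 & E2, p) == select(E1, p) & E2,
    "difference_1":   select(E1 - E2, p) == select(E1, p) - E2,
    # The one-sided form fails for union: E2 contributes unfiltered tuples.
    "union_1":        select(E1 | E2, p) == select(E1, p) | E2,
}
print(checks)
```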
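Optimization Process: Rule 13
The projection operation distributes over the set union operation:
$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$
Note: the corresponding identities for set intersection and set difference do not hold in general. For example, if E1 = {(1, x)}, E2 = {(1, z)}, and L is the first attribute, then $\Pi_L(E_1 \cap E_2)$ is empty while $\Pi_L(E_1) \cap \Pi_L(E_2)$ = {(1)}.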
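A quick check of how projection interacts with the set operations, using sets of tuples as relations. The relations and the choice of L (the first column) are assumptions for illustration. The sketch confirms the union case and shows that, on this data, the intersection and difference forms do not hold.

```python
def project(relation, cols):
    """Projection with duplicate elimination (the result is a set)."""
    return {tuple(t[c] for c in cols) for t in relation}

# Hypothetical two-attribute relations; L keeps only the first attribute.
E1 = {(1, "x"), (2, "y")}
E2 = {(1, "z"), (3, "y")}
L = (0,)

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
inter_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
diff_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)
print(union_ok, inter_ok, diff_ok)
```

The intersection fails because E1 and E2 share the value 1 in the projected column without sharing any whole tuple.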
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
- the existence of indexes or other access paths (to reduce I/O cost), and
- the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
- the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
- an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
- heuristic rules
- systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
- Perform selection operations as early as possible.
- Perform projections early.
- It is usually better to perform selections earlier than projections.
Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
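Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined into a pipeline, so that the tuples produced by one operation are passed directly to the next.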
Assume we want to perform
$\Pi_{a_1, a_2}(r \bowtie s)$
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
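The difference between the two evaluation styles can be sketched with Python generators. The relations, the join attribute k, and the helper names are assumptions; the projection here does not eliminate duplicates, to keep the sketch short.

```python
def join(r, s, key):
    """Generator: yields each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Generator: consumes tuples one at a time and keeps only attrs."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

# Hypothetical relations r(k, a1) and s(k, a2).
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the whole join result is stored before projecting.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
print(materialized == pipelined)
```

Both plans produce the same answer; the pipelined one never holds the full join result in memory.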
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
    SELECT S.Sname
    FROM R, S
    WHERE R.Sid = S.Sid
      AND R.bid = 100 AND S.rating > 5
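First plan:
$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

              Π Sname
                 |
    σ bid = 100 AND rating > 5
                 |
            ⋈ Sid = Sid
             /       \
            R         S

Second plan, with the selections pushed below the join:
$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

              Π Sname
                 |
            ⋈ Sid = Sid
             /        \
    σ bid = 100    σ rating > 5
         |              |
         R              S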
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
First plan, pipelined:

              Π Sname                      (on the fly)
                 |
    σ bid = 100 AND rating > 5             (on the fly)
                 |
            ⋈ Sid = Sid
             /       \
            R         S

Second plan, pipelined:

              Π Sname                      (on the fly)
                 |
            ⋈ Sid = Sid
             /        \
    σ bid = 100    σ rating > 5
         |              |
         R              S
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
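The two search methods can be sketched over an in-memory list of records. The record layout (ssn, dno) and the sample data are assumptions; in a real system each probe would touch a disk block rather than a list element.

```python
def linear_search(records, attr, value):
    """Scan every record and test the selection condition."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_records, key_attr, value):
    """Equality search on the key attribute the file is ordered on."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with this key

# Hypothetical EMPLOYEE file, ordered on the key attribute ssn.
employees = [
    {"ssn": 111, "dno": 5},
    {"ssn": 222, "dno": 4},
    {"ssn": 333, "dno": 5},
]
print(linear_search(employees, "dno", 5))
print(binary_search(employees, "ssn", 222))
```

Linear search works for any condition but inspects all n records; binary search needs the ordering but inspects about log2(n) of them.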
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record (note that in this case at most one record is retrieved).
$\sigma_{SSN = 123456789}(EMPLOYEE)$
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.
$\sigma_{DNUMBER > 5}(DEPARTMENT)$
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
$\sigma_{DNO = 5}(EMPLOYEE)$
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
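The union-of-record-pointers idea can be sketched with dictionaries standing in for indexes and list positions standing in for record pointers. All names, the equality-only conditions, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record pointers (positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of pointers
    return [records[rid] for rid in sorted(pointers)]

# Hypothetical EMPLOYEE records.
employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 1, "salary": 30000},
    {"ssn": 4, "dno": 4, "salary": 55000},
]
conditions = [("dno", 5), ("salary", 40000)]
both = {"dno": build_index(employees, "dno"),
        "salary": build_index(employees, "salary")}
via_indexes = disjunctive_select(employees, both, conditions)
via_scan = disjunctive_select(employees, {"dno": both["dno"]}, conditions)
print(via_indexes == via_scan)
```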
Query Optimization: JOIN Operation
Nested loop: For each record $t \in R$ (outer loop), retrieve every record $s \in S$ (inner loop) and then check the join condition $t[A] = s[B]$.
$R \bowtie_{A = B} S$
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
$r \bowtie_{r.A \, \Theta \, s.B} s$
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
    for each tuple tr in r do begin
        for each tuple ts in s do begin
            if tr.A Θ ts.B is true then add tr || ts to the result
        end
    end
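The nested loop pseudocode translates almost line for line into Python. In this sketch tuples are Python tuples, the join condition Θ is passed in as a function, and the sample relations are assumptions.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested loop join; theta(tr, ts) is the join condition."""
    result = []
    for tr in r:              # outer loop over r
        for ts in s:          # inner loop over s
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts
    return result

# Hypothetical relations; the join attribute is the first field of each tuple.
r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y"), (1, "z")]
joined = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
print(joined)
```

Because the condition is an arbitrary function, the same code handles any Θ, at the price of examining every pair of tuples.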
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm, in terms of tuple pairs examined, is $n_r \times n_s$.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.
If only one of the relations fits in the physical space (and it is used as the inner relation), the cost is still $b_s + b_r$ block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr in Br do begin
                for each tuple ts in Bs do begin
                    if tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end
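A block nested loop join can be simulated by slicing each relation into fixed-size blocks and counting block reads. The block size, the relations, and the counter are assumptions added for illustration; the counter assumes one buffer block per relation.

```python
def blocks(relation, block_size):
    """Split a relation (a list) into consecutive blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def block_nested_loop_join(r, s, theta, block_size):
    """Returns (result, number of block reads)."""
    result, block_reads = [], 0
    for r_block in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for s_block in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads

# Hypothetical relations: 4 and 6 tuples, 2 tuples per block (br = 2, bs = 3).
r = [(i,) for i in range(4)]
s = [(i,) for i in range(6)]
joined, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
print(joined, reads)
```

Each block of s is read once per block of r rather than once per tuple of r, which is where the saving comes from.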
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.
How can we improve block nested loop?
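The cost formulas can be compared numerically. The worst-case tuple-at-a-time figure of nr × bs + br block accesses is the standard textbook estimate when only one buffer block per relation is available; it is not stated on the slides and is included here as an assumption, as are the sample sizes.

```python
def nested_loop_cost(nr, br, bs, inner_fits_in_memory=False):
    """Block accesses for tuple-at-a-time nested loop join with r as outer."""
    if inner_fits_in_memory:
        return br + bs              # each relation is read once
    return nr * bs + br             # s is rescanned for every tuple of r

def block_nested_loop_cost(br, bs):
    """Block accesses for block nested loop join with r as outer."""
    return br * bs + br             # s is rescanned for every block of r

# Hypothetical sizes: r has 5000 tuples in 100 blocks, s occupies 400 blocks.
nr, br, bs = 5000, 100, 400
print(nested_loop_cost(nr, br, bs))                             # 2000100
print(block_nested_loop_cost(br, bs))                           # 40100
print(nested_loop_cost(nr, br, bs, inner_fits_in_memory=True))  # 500
print(block_nested_loop_cost(bs, br))                           # 40400, s as outer
```

One possible answer to the question above follows from the last two block nested loop figures: using the smaller relation as the outer relation gives the lower cost.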
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$, one at a time, and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
$r \bowtie_{A = B} s$
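An index-based join can be sketched with a dictionary playing the role of the access structure on s.B. In a DBMS the index would already exist on disk; building it in memory here is an assumption made to keep the example self-contained, as are the attribute names.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    """Join r and s on r[a] == s[b] using an index on s[b]."""
    index = defaultdict(list)          # stand-in for an existing index on B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                       # one record of r at a time
        for ts in index.get(tr[a], []):    # probe instead of scanning s
            result.append({**tr, **ts})
    return result

# Hypothetical relations with join attributes A and B.
r = [{"A": 1, "rname": "r1"}, {"A": 2, "rname": "r2"}, {"A": 9, "rname": "r9"}]
s = [{"B": 2, "sname": "s2"}, {"B": 1, "sname": "s1"}, {"B": 2, "sname": "s2b"}]
joined = index_nested_loop_join(r, s, "A", "B")
print(joined)
```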
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
$b_s + b_r$
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
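The merge step can be sketched with two cursors over relations that are already sorted on the join attribute. Tuples are Python tuples, a and b are the positions of the join attributes, and the data is hypothetical. The group of matching s tuples is held in memory, in line with the assumption stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on field a) and s (sorted on field b)."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                         # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                         # advance the pointer into s
        else:
            value = r[i][a]
            j_end = j                      # find all s tuples with this value
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

# Hypothetical relations, both sorted on their first field.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```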
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
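A hash join along these lines can be sketched with in-memory buckets. The number of buckets, the use of Python's built-in hash, and the sample relations are assumptions; a real implementation would write the buckets to a hash file.

```python
from collections import defaultdict

def hash_join(r, s, a, b, num_buckets=4):
    """Equi-join on r[a] == s[b] by hashing both relations into shared buckets."""
    buckets = defaultdict(lambda: ([], []))      # bucket -> (r part, s part)
    for tr in r:                                 # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                 # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():      # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:               # different values can collide
                    result.append(tr + ts)
    return result

# Hypothetical relations; the join attribute is the first field.
r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (7, "z"), (1, "w")]
joined = hash_join(r, s, 0, 0)
print(sorted(joined))
```

Only tuples that land in the same bucket are ever compared, so the equality test inside a bucket is still needed to separate values that merely share a hash bucket.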
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
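Query Optimization: Complex JOIN Operation
Now consider the following join operation:
$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$
The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$:
$(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \dots \cup (r \bowtie_{\theta_n} s)$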
Query Optimization: Project Operation
A project operation $\Pi_{\langle attribute\ list \rangle}(R)$ is straightforward to implement if the attribute list includes a key of relation R.
If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
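Both duplicate-elimination strategies for projection can be sketched briefly. Tuples are Python tuples, cols lists the positions to keep, and the sample relation is hypothetical.

```python
def project_sort(relation, cols):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(t[c] for c in cols) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(relation, cols):
    """Project and use a hash set to skip tuples already seen."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[c] for c in cols)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

# Hypothetical relation; the key (first field) is projected away.
relation = [(1, "a", 10), (2, "b", 10), (3, "a", 10)]
print(project_sort(relation, (1, 2)))
print(project_hash(relation, (1, 2)))
```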
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
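The sort-then-scan approach to union, intersection, and difference can be sketched with one merge pass over the two sorted relations. The function name and the sample data are assumptions; the relations are treated as sets, so duplicates are removed while sorting.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, difference r - s) in one merge pass."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i]); diff.append(r[i]); i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j]); j += 1
        else:                           # tuple in both relations
            union.append(r[i]); inter.append(r[i]); i += 1; j += 1
    union.extend(r[i:]); diff.extend(r[i:])   # leftovers of r
    union.extend(s[j:])                       # leftovers of s
    return union, inter, diff

# Hypothetical single-attribute relations.
union, inter, diff = sorted_set_ops([3, 1, 2, 5], [2, 5, 7])
print(union, inter, diff)
```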
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
      AND Mgr_ssn = Ssn
      AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge Mgr\_ssn = Ssn}(Project \times (Department \times Employee)))$
Query Optimization: An Example

      Π Pnumber, Dnum, Lname, Address, Bdate
                      |
      σ Plocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn
                      |
                      ×
                    /   \
              Project     ×
                        /   \
               Department   Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
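The size of the intermediate relation follows directly from the figures given for the three relations. The short calculation below reproduces it; the total in bytes is derived from those figures and is not stated on the slide.

```python
from math import prod

# Tuple counts and tuple sizes (bytes) as given for the three relations.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

product_tuples = prod(tuple_counts.values())         # one tuple per combination
product_tuple_bytes = sum(tuple_bytes.values())      # fields are concatenated
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples)        # 10000000
print(product_tuple_bytes)   # 300
print(total_bytes)           # 3000000000
```

At 300 bytes per tuple, the Cartesian product would occupy about 3 GB before the selection discards most of it.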
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{Mgr\_ssn = Ssn} Employee)$
Query Optimization: An Example

      Π Pnumber, Dnum, Lname, Address, Bdate
                      |
               ⋈ Mgr_ssn = Ssn
                /           \
      ⋈ Dnum = Dnumber      Employee
        /          \
  σ Plocation = 'California'   Department
        |
     Project
Selectivity on key attribute and search onequality then
119904119904 =1
|119904119904(119877119877)
Database Systems
59
Selectivity on an attribute with i distinctvalues is
119904119904 = |119904119904(119877119877)
119904119904|119904119904(119877119877)
Hence the number of tuples that satisfy anequality search is
1119894119894
|r(R)|
Database Systems
60
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
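Query Optimization: An Example
[Figure: query tree for the rewritten expression. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: the join ⋈_{MGRSSN = SSN} with Employee; below that: the join ⋈_{Dnum = Dnumber} of σ_{Plocation = 'California'}(Project) with Department.]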
Selectivity on an attribute with i distinct values is
s = (|r(R)| / i) / |r(R)| = 1/i
Hence, the number of tuples that satisfy an equality search is
(1/i) |r(R)|
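As a small worked example of the estimate above, the following sketch computes the expected number of matching tuples for an equality search. It assumes the values are spread evenly over the i distinct values; the function name and the sample figures are hypothetical.

```python
def estimate_equality_matches(num_tuples, distinct_values):
    """Estimated tuples matching an equality condition: (1/i) * |r(R)|."""
    return num_tuples / distinct_values


# Example: 5000 tuples and 20 distinct values of the searched attribute.
estimated = estimate_equality_matches(5000, 20)
```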
Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
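Optimization Process
Rule 8:
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
[Figure: the two equivalent query trees for Rule 8. In one, σ_{θ0} is applied above the join of E1 and E2; in the other, σ_{θ0} is applied to E1 before the join with E2.]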
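Rule 8 can be checked on a toy instance. The sketch below assumes relations are in-memory lists of dictionaries; the helper names theta_join and select and the sample rows are illustrative assumptions, not taken from the slides.

```python
def theta_join(e1, e2, theta):
    """Join two relations, keeping the tuple pairs that satisfy theta."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2 if theta(t1, t2)]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 5},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 4},
]
department = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Admin"}]

theta = lambda p, d: p["Dnum"] == d["Dnumber"]      # join condition
theta0 = lambda t: t["Plocation"] == "California"   # uses attributes of E1 only

late = select(theta_join(project, department, theta), theta0)
early = theta_join(select(project, theta0), department, theta)
```

Both orders give the same result, but the early selection feeds two tuples into the join instead of three.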
Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
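The Rule 12 identities can be sanity-checked with Python sets. This is a sketch on one small example, not a proof; the sets e1 and e2 and the predicate p (even numbers) are made up for illustration.

```python
e1 = {1, 2, 3, 4, 5, 6}
e2 = {4, 5, 6, 7, 8}


def p(x):
    """Selection predicate (hypothetical): keep even values."""
    return x % 2 == 0


def sel(e):
    """Selection applied to a set-valued relation."""
    return {x for x in e if p(x)}


difference_ok = sel(e1 - e2) == sel(e1) - sel(e2) == sel(e1) - e2
union_ok = sel(e1 | e2) == sel(e1) | sel(e2)
union_one_side_differs = sel(e1 | e2) != sel(e1) | e2
intersection_ok = sel(e1 & e2) == sel(e1) & sel(e2) == sel(e1) & e2
```

On this example the one-sided form holds for difference and intersection but not for union, in line with the slides.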
Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: in general, projection does not distribute over set intersection or set difference. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.
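A quick check on a small instance suggests that pushing a projection through a set operation is safe for union but needs care for intersection and difference. The relations and the choice of projecting on the first attribute are assumptions made for this sketch.

```python
e1 = {(1, "a"), (2, "b")}
e2 = {(1, "b"), (2, "b")}


def proj(e):
    """Projection on the first attribute, with duplicates removed by the set."""
    return {(t[0],) for t in e}


union_holds = proj(e1 | e2) == proj(e1) | proj(e2)
intersection_holds = proj(e1 & e2) == proj(e1) & proj(e2)
difference_holds = proj(e1 - e2) == proj(e1) - proj(e2)
```

Here the union identity holds, while the intersection and difference forms give different results because two distinct tuples share the same projected value.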
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
The query is translated into an equivalent representation. This internal representation is in the form of a query tree or query graph.
An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined, so that the output of one operation is passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
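The difference between the two evaluation styles can be sketched with Python generators. This is an illustration only: the join attribute k, the sample tuples, and the function names are assumptions, and a real engine would work on disk blocks rather than lists.

```python
def join(r, s):
    """Yield joined tuples one at a time (equality on the assumed attribute k)."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Yield each incoming tuple restricted to attrs."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the whole join result is stored before projecting.
intermediate = list(join(r, s))
result_materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection.
result_pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both styles produce the same answer; the pipelined form never holds the full intermediate relation.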
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further, assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree for this expression. Root: Π_{Sname}; below it: σ_{bid = 100 AND rating > 5}; below it: the join ⋈_{Sid = Sid} with leaves R and S.]
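Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree for this expression. Root: Π_{Sname}; below it: the join ⋈_{Sid = Sid}; its inputs are σ_{bid = 100} over R and σ_{rating > 5} over S.]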
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
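[Figure: the two query trees above, annotated for pipelined execution. In the first tree (selection above the join of R and S), two operators are marked "on the fly"; in the second tree (selections pushed below the join), one operator is marked "on the fly".]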
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
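The two search methods can be contrasted in a short sketch. It assumes the file is an in-memory list of records (dictionaries) and, for binary search, that the list is ordered on a key attribute; the Ssn and Lname values are made up.

```python
def linear_search(records, attr, value):
    """Scan every record and keep those satisfying attr = value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on the ordering key; returns the record or None."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key_attr]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


employees = [  # ordered on the key attribute Ssn
    {"Ssn": 101, "Lname": "Smith"},
    {"Ssn": 205, "Lname": "Wong"},
    {"Ssn": 310, "Lname": "Zelaya"},
    {"Ssn": 477, "Lname": "Smith"},
]
by_name = linear_search(employees, "Lname", "Smith")
by_key = binary_search(employees, "Ssn", 310)
```

Linear search touches all n records, while binary search on the ordering key needs about log2(n) probes.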
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.
σ_{DNUMBER > 5}(DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
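The union-of-pointers idea can be sketched with dictionary-based indexes. This is a simplified model: record positions in a list stand in for record pointers, and the Salary attribute, the sample data, and build_index are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record ids (pointers) holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


employees = [
    {"Dno": 5, "Salary": 30000},
    {"Dno": 4, "Salary": 40000},
    {"Dno": 5, "Salary": 40000},
    {"Dno": 1, "Salary": 55000},
]
dno_index = build_index(employees, "Dno")
salary_index = build_index(employees, "Salary")

# Dno = 5 OR Salary = 40000: union the pointer sets, then fetch each record once.
pointers = dno_index.get(5, set()) | salary_index.get(40000, set())
result = [employees[rid] for rid in sorted(pointers)]
```

Taking the union on pointers rather than on records means a tuple satisfying both conditions is fetched only once.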
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A = B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
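The nested loop algorithm translates almost line for line into Python. In this sketch relations are lists of tuples, the join condition Θ is passed in as a function named theta, and the sample data is made up.

```python
def nested_loop_join(r, s, theta):
    """For each tr in r, scan all of s and keep the pairs satisfying theta."""
    result = []
    for tr in r:                         # outer loop
        for ts in s:                     # inner loop
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts (tuple concatenation)
    return result


r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y"), (2, "z")]
joined = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
```

Because theta is an arbitrary function, the same loop handles any join condition, at the price of examining every pair of tuples.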
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (pairs of tuples examined).
In the best case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then the cost will be bs + br block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
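The block access count can be observed by simulating the algorithm. In this sketch a relation is a list of blocks and each block is a list of tuples; block_reads is a counter added for illustration, and the data is hypothetical.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested loop join that also counts simulated block reads."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                 # read one block of r
        for block_s in s_blocks:
            block_reads += 1             # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


r_blocks = [[(1,), (2,)], [(3,), (4,)]]                   # br = 2
s_blocks = [[(2,), (3,)], [(4,), (5,)], [(6,), (7,)]]     # bs = 3
joined, reads = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]
)
```

With br = 2 and bs = 3 the counter reaches br × bs + br = 8. Swapping the roles gives bs × br + bs = 9, which hints at one improvement: use the relation with fewer blocks as the outer relation.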
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
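The merge step can be sketched as follows. The sketch assumes both inputs are already sorted on the join attribute and are held as lists of tuples; key_r and key_s are hypothetical functions that extract the join attribute.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join of two relations that are already sorted on the join attribute."""
    result = []
    i = j = 0                            # one pointer per relation
    while i < len(r) and j < len(s):
        a, b = key_r(r[i]), key_s(s[j])
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples sharing this join value
            # (this group is assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == a:
                j_end += 1
            while i < len(r) and key_r(r[i]) == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
first = lambda t: t[0]
joined = merge_join(r, s, first, first)
```

Neither pointer ever moves backwards past a completed group, which is why each relation is read only once.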
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
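A minimal in-memory sketch of the hash-join idea is shown below. The bucket count, the use of Python's built-in hash, and the helper names are assumptions for illustration; a disk-based implementation would write buckets to a hash file instead.

```python
from collections import defaultdict


def hash_join(r, s, key_r, key_s, num_buckets=4):
    """Hash both relations into shared buckets, then match within each bucket."""
    buckets = defaultdict(lambda: ([], []))   # bucket id -> (r tuples, s tuples)
    for tr in r:                              # single pass over r
        buckets[hash(key_r(tr)) % num_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(key_s(ts)) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare the keys.
                if key_r(tr) == key_s(ts):
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (9, "z")]
first = lambda t: t[0]
joined = sorted(hash_join(r, s, first, first))
```

With four buckets the keys 1, 5, and 9 land in the same bucket, which is why records in a bucket are only possible matches until their join values are compared.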
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
A join with a complex join condition (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
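Hash-based duplicate elimination for a projection can be sketched as below. A Python set plays the role of the hash table, and the relation, attribute names, and function name are illustrative assumptions.

```python
def project_distinct(relation, attrs):
    """Project each tuple on attrs and drop duplicates using a hash set."""
    seen = set()
    result = []
    for t in relation:
        projected = tuple(t[a] for a in attrs)
        if projected not in seen:        # hash lookup instead of sorting
            seen.add(projected)
            result.append(projected)
    return result


employees = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Smith", "Dno": 4},
]
without_key = project_distinct(employees, ["Dno"])        # duplicates possible
with_key = project_distinct(employees, ["Ssn", "Dno"])    # key included
```

When the attribute list includes the key Ssn, no two projected tuples can coincide, so the duplicate check never removes anything.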
61
Optimization ProcessRule 8 Selection operation distribute
over the theta join under the followingconditionsWhen all attributes in selection condition θ0
involve only the attributes of one relation (E1in this case)
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
Database Systems
62
Optimization ProcessRule 8
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
σθ0
θ
E1 E2
θ
σθ0 E2
E1
Database Systems
63
Optimization ProcessRule 9 The projection operation
distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in
L1 cup L2
ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))
Database Systems
64
Optimization ProcessRule 10 Set union and set intersection
operations are commutative
Note set difference is not commutative
(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)
Database Systems
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
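Both duplicate-elimination options can be illustrated as follows. This is a sketch that assumes tuples are dicts and that the projected values are hashable and comparable; the function names and the sample employee data are invented for the example.

```python
def project(relation, attrs):
    """Project each tuple (a dict) onto attrs, removing duplicates by hashing."""
    seen = set()
    result = []
    for t in relation:
        projected = tuple(t[a] for a in attrs)
        if projected not in seen:   # hash lookup detects duplicates
            seen.add(projected)
            result.append(projected)
    return result

def project_sorted(relation, attrs):
    """Same set of tuples, but duplicates are removed after sorting."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    # After sorting, duplicates are adjacent, so one scan removes them.
    return [row for i, row in enumerate(rows) if i == 0 or row != rows[i - 1]]

employee = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 4},
]
```

Projecting on Dno alone needs duplicate elimination; projecting on a list that contains the key Ssn does not produce duplicates in the first place.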
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement union, intersection, and difference operations.
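The sort-then-scan approach can be sketched as one merge pass that produces union, intersection and difference together. The sketch assumes both inputs are duplicate-free lists of comparable values; merge_set_ops is a name chosen for the example.

```python
def merge_set_ops(r, s):
    """Return (union, intersection, difference r - s) of two duplicate-free lists."""
    r, s = sorted(r), sorted(s)
    i = j = 0
    union, inter, diff = [], [], []
    while i < len(r) and j < len(s):
        if r[i] == s[j]:            # in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:           # only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        else:                       # only in s
            union.append(s[j])
            j += 1
    # Leftover r tuples belong to the union and the difference;
    # leftover s tuples belong only to the union.
    union.extend(r[i:])
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff

union, inter, diff = merge_set_ops([3, 1, 5], [5, 2, 3, 8])
```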
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_(Pnumber, Dnum, Lname, Address, Bdate) (σ_(Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN) (Project × (Department × Employee)))
Query Optimization: An Example
Query tree for the expression above:
Π Pnumber, Dnum, Lname, Address, Bdate
  σ Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
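The figures above can be checked with a few lines of arithmetic. The dictionary layout is only a convenience for the example; the numbers are the ones stated in the text.

```python
# (number of tuples, bytes per tuple) for each relation, as given above
sizes = {"Project": (100, 100), "Department": (20, 50), "Employee": (5000, 150)}

tuples = 1
width = 0
for count, tuple_bytes in sizes.values():
    tuples *= count        # a Cartesian product multiplies cardinalities
    width += tuple_bytes   # and concatenates tuples, so widths add up

total_bytes = tuples * width
```

So the intermediate relation holds 10,000,000 tuples of 300 bytes each, or 3,000,000,000 bytes before the selection is applied.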
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_(Pnumber, Dnum, Lname, Address, Bdate) (((σ_(Plocation='California') (Project)) ⋈_(Dnum=Dnumber) Department) ⋈_(MGRSSN=SSN) Employee)
Query Optimization: An Example
Query tree for the expression above:
Π Pnumber, Dnum, Lname, Address, Bdate
  ⋈ MGRSSN=SSN
    ⋈ Dnum=Dnumber
      σ Plocation='California'
        Project
      Department
    Employee
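Optimization Process: Rule 8
σ_θ0 (E1 ⋈_θ E2) = (σ_θ0 (E1)) ⋈_θ E2
[Figure: query trees for the two sides of Rule 8. On the left, σ_θ0 sits above the θ-join of E1 and E2; on the right, σ_θ0 is applied to E1 before the θ-join with E2.]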
Optimization Process: Rule 9
The projection operation distributes over theta-join under the following condition:
The join condition θ only involves attributes in L1 ∪ L2.
Π_(L1 ∪ L2) (E1 ⋈_θ E2) = (Π_L1 (E1)) ⋈_θ (Π_L2 (E2))
Optimization Process: Rule 10
Set union and set intersection operations are commutative:
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.
Optimization Process: Rule 11
Set union and set intersection operations are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process: Rule 12
Selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2
Set union:
σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2
Set intersection:
σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2
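Rule 12 can be spot-checked on small sets. The following is only an illustration on one hand-picked example, not a proof; relations are modeled as Python sets of integers and the predicate p (an even-number test) is an arbitrary choice.

```python
def select(p, rel):
    """sigma_p(rel) for a relation modeled as a Python set."""
    return {t for t in rel if p(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0

# Set difference: both rewritten forms agree with the original.
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
# Set intersection: both rewritten forms agree as well.
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2
# Set union: selection must be pushed into BOTH operands.
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2   # False here
```

The one-sided union form fails because the unfiltered E2 contributes tuples such as 5 and 7 that do not satisfy p.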
Optimization Process: Rule 13
Projection operation distributes over the set union operation:
Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)
Note: the analogous equalities for set intersection and set difference do not hold in general, because tuples that differ only in attributes outside L project to the same tuple.
Optimization Process
Choose candidate low-level procedure: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.
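Definition
Materialized evaluation: Each operation generates an intermediate result (relation) that is stored before the next operation uses it.
Pipeline evaluation: Several operations are combined so that the tuples produced by one operation are passed directly to the next.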
Assume we want to perform
Π_(a1, a2) (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
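The two evaluation styles can be contrasted with Python generators. This is a sketch under stated assumptions: tuples are dicts, the join is an equi-join on a shared attribute named "k" (an assumption, since the join condition is not spelled out above), and join and project are illustrative names.

```python
def join(r, s):
    """Generator join on attribute "k": yields merged tuples one at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Generator projection: consumes tuples as they arrive."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both produce the same answer; the pipelined form never holds the full join result in memory.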
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_Sname (σ_(bid = 100 ∧ rating > 5) (R ⋈_(Sid=Sid) S))
Query tree:
Π Sname
  σ bid = 100 ∧ rating > 5
    ⋈ Sid = Sid
      R
      S
Π_Sname ((σ_(bid = 100) R) ⋈_(Sid=Sid) (σ_(rating > 5) S))
Query tree:
Π Sname
  ⋈ Sid = Sid
    σ bid = 100
      R
    σ rating > 5
      S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
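[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree (Π_Sname at the root, σ_(bid = 100 ∧ rating > 5) above the join R ⋈_(Sid=Sid) S), two operators are labeled "On the fly". In the second tree (σ_(bid = 100) on R and σ_(rating > 5) on S below the join), one operator is labeled "On the fly".]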
Cost of Plan
The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).
σ_(SSN = 123456789) (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.
σ_(DNUMBER > 5) (DEPARTMENT)
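The range case can be sketched with Python's bisect module standing in for the primary index. It assumes the file is a list of dict records already ordered on the key; select_greater_than is a made-up name.

```python
from bisect import bisect_right

def select_greater_than(sorted_records, key, value):
    """sigma_{key > value} on a file ordered by key."""
    keys = [rec[key] for rec in sorted_records]
    start = bisect_right(keys, value)   # first position whose key exceeds value
    return sorted_records[start:]       # all subsequent records qualify

department = [{"DNUMBER": n} for n in (1, 4, 5, 7, 9)]
result = select_greater_than(department, "DNUMBER", 5)
```

Because the file is sorted, one index probe locates the boundary and the rest is a sequential read.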
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_(DNO = 5) (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_(θ1 ∧ θ2 ∧ … ∧ θn) (r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_(θ1 ∨ θ2 ∨ … ∨ θn) (r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
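Conjunctive and disjunctive selection can be sketched with simple in-memory indexes. The assumptions here are that a "record pointer" is a list position and an index is a dictionary from attribute value to a set of positions; all names and the sample data are hypothetical.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the positions (record pointers) holding it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        index[rec[attr]].add(pos)
    return index

def conjunctive_select(records, index, value, other_conditions):
    # Use the one indexed condition to fetch candidates, then test the rest.
    candidates = (records[pos] for pos in sorted(index.get(value, ())))
    return [rec for rec in candidates if all(c(rec) for c in other_conditions)]

def disjunctive_select(records, lookups):
    # lookups: (index, value) pairs, one per condition; union the pointers.
    pointers = set()
    for index, value in lookups:
        pointers |= index.get(value, set())
    return [records[pos] for pos in sorted(pointers)]

employee = [
    {"Ssn": 1, "Dno": 5, "Salary": 30000},
    {"Ssn": 2, "Dno": 4, "Salary": 40000},
    {"Ssn": 3, "Dno": 5, "Salary": 40000},
    {"Ssn": 4, "Dno": 1, "Salary": 55000},
]
dno_index = build_index(employee, "Dno")
salary_index = build_index(employee, "Salary")
conj = conjunctive_select(employee, dno_index, 5, [lambda e: e["Salary"] > 35000])
disj = disjunctive_select(employee, [(dno_index, 5), (salary_index, 40000)])
```

The disjunctive version only works because every condition has an index; with one unindexed condition the whole relation would have to be scanned.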
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_(A=B) S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_(r.A Θ s.B) s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
  For each ts ∈ s do begin
    If r.A Θ s.B is true then add tr || ts to the result
  end
end
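The pseudocode above translates almost line for line into Python. In this sketch tuples are Python tuples, a and b are attribute positions, and Θ is passed in as a comparison function from the standard operator module; the sample relations are invented.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b]; theta is any binary comparison."""
    result = []
    for tr in r:                          # outer loop
        for ts in s:                      # inner loop
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts (tuple concatenation)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
equi = nested_loop_join(r, s, 0, 0)
less = nested_loop_join(r, s, 0, 0, operator.lt)
```

Every pair of tuples is compared, which is why the method works for any join condition.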
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns pairs of tuples examined.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then bs + br block accesses will be the cost.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
  For each block Bs of s do begin
    For each tr ∈ Br do begin
      For each ts ∈ Bs do begin
        If r.A Θ s.B is true then add tr || ts to the result
      end
    end
  end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop in terms of the number of block accesses is br × bs + br.
How can we improve block nested loop?
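The block-access count can be verified by instrumenting a small block nested loop. This sketch assumes each relation is a list of blocks, each block a list of tuples, and counts one access per block read; the names and data are illustrative.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """r_blocks / s_blocks are lists of blocks; each block is a list of tuples."""
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads

r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]   # br = 3
s_blocks = [[(2,), (4,)], [(6,), (5,)]]           # bs = 2
result, reads = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]
)
```

With br = 3 and bs = 2 the counter ends at 3 × 2 + 3 = 9, matching the formula.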
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_(A=B) s
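A sketch of the access-structure join, assuming the index on s.B is a Python dictionary from value to matching records. Here the index is built on the spot for the sake of a runnable example; in the scenario described above it would already exist.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    # Stand-in for an existing index on s.B: value -> matching records of s.
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    # Probe the index once per record of r instead of scanning all of s.
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(3, "x"), (1, "y"), (3, "z")]
joined = index_join(r, s, 0, 0)
```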
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
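The merge step can be sketched as follows, assuming both relations are Python lists of tuples already sorted on the join attribute and that each group of equal join values in s fits in memory. The function name merge_join and the sample data are made up for the example.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result = []
    i = j = 0                               # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (3, "b"), (3, "c"), (5, "d")]
s = [(2, "w"), (3, "x"), (3, "y"), (5, "z")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group boundary, which is why each relation is read only once.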
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization mdash An Example
However the above query based on theschemas of the relations can be translatedinto
Database Systems
ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))
116
Query Optimization mdash An Example
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo
Employee
MNGSSN=SSN
Dnum=Dnumber
Department
Database Systems
Optimization Process
Rule 10: The set union and set intersection operations are commutative.
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.
Optimization Process
Rule 11: The set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
Optimization Process
Rule 12:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
Optimization Process
Rule 12:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
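Rule 12 can be spot-checked on small sets. The sketch below is only an illustration: the relations e1 and e2, the predicate p, and the helper select are made-up names, and a check on one data set is not a proof.

```python
def select(pred, relation):
    """Selection: keep the tuples of a relation that satisfy pred."""
    return {t for t in relation if pred(t)}


def p(t):
    """Hypothetical selection predicate: keep even values."""
    return t % 2 == 0


# Toy single-attribute relations modelled as Python sets (illustrative data).
e1 = {1, 2, 3, 4}
e2 = {3, 4, 5, 6}

# Selection distributes over union, intersection and difference.
union_ok = select(p, e1 | e2) == select(p, e1) | select(p, e2)
inter_ok = select(p, e1 & e2) == select(p, e1) & select(p, e2)
diff_ok = select(p, e1 - e2) == select(p, e1) - select(p, e2)

# One-sided variants: valid for intersection and difference, not for union.
union_one_sided = select(p, e1 | e2) == select(p, e1) | e2
inter_one_sided = select(p, e1 & e2) == select(p, e1) & e2
diff_one_sided = select(p, e1 - e2) == select(p, e1) - e2
```

On this data every identity holds except the one-sided union, which lets unfiltered tuples of E2 into the result.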
Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
The corresponding identities for set intersection and set difference do not hold in general, because tuples that differ only in attributes outside L project to the same tuple.
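A small experiment shows how projection interacts with the three set operations. This is a sketch on made-up data (e1, e2, and the helper project are illustrative names): on these relations the union identity holds, while the intersection and difference versions fail because two tuples agree on the projected attribute but differ elsewhere.

```python
def project(relation, attrs):
    """Projection with set semantics: keep the listed positions, drop duplicates."""
    return {tuple(t[i] for i in attrs) for t in relation}


# Toy relations over (id, tag); the tuples (1, "a") and (1, "b") agree only on id.
e1 = {(1, "a"), (2, "b")}
e2 = {(1, "b"), (2, "b")}
L = [0]  # project onto the first attribute

union_holds = project(e1 | e2, L) == project(e1, L) | project(e2, L)
inter_holds = project(e1 & e2, L) == project(e1, L) & project(e2, L)
diff_holds = project(e1 - e2, L) == project(e1, L) - project(e2, L)
```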
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
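Definition
Materialized evaluation: each operation generates an intermediate result (relation) that is stored before the next operation uses it.
Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.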
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
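The two evaluation styles can be sketched with Python generators. This is a minimal illustration, not a description of any particular engine: the relations, the rule that tuples join when their first fields match, and the projected positions are all assumptions made for the example.

```python
def join(r, s):
    """Stand-in join: pair tuples whose first fields match (an assumption)."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts


def project(rows, attrs):
    """Projection that consumes its input one tuple at a time."""
    for row in rows:
        yield tuple(row[i] for i in attrs)


r = [(1, "a1"), (2, "a2")]
s = [(1, "b1"), (2, "b2"), (2, "b3")]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, [1, 3]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever built.
pipelined = list(project(join(r, s), [1, 3]))
```

Both strategies return the same tuples; the difference is that the pipelined version never holds the full join result.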
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Query tree, from root to leaves]
Π_{Sname}
    σ_{bid = 100 ∧ rating > 5}
        ⋈_{Sid = Sid}
            R
            S
Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))
[Query tree, from root to leaves]
Π_{Sname}
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S
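The effect of pushing the selections below the join can be seen on a handful of tuples. The rows below are invented for illustration, and the join is a plain nested loop; the point is only that both plans return the same names while the second builds a smaller intermediate join result.

```python
# Toy instances of S(Sid, Sname, rating, age) and R(Sid, bid, day, rname).
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "x"), (2, 100, "d2", "y"), (3, 101, "d3", "z"), (3, 100, "d4", "w")]


def join_on_sid(r_rows, s_rows):
    """Nested-loop join on R.Sid = S.Sid; result rows are R fields then S fields."""
    return [tr + ts for tr in r_rows for ts in s_rows if tr[0] == ts[0]]


# Plan 1: join first, then select, then project Sname.
# In a joined row, bid is position 1, Sname position 5, rating position 6.
joined_all = join_on_sid(R, S)
plan1 = [row[5] for row in joined_all if row[1] == 100 and row[6] > 5]

# Plan 2: apply each selection to its own relation before joining.
r_selected = [tr for tr in R if tr[1] == 100]
s_selected = [ts for ts in S if ts[2] > 5]
joined_small = join_on_sid(r_selected, s_selected)
plan2 = [row[5] for row in joined_small]
```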
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
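[Figure: the two query trees above, annotated for pipelined execution. In the first tree, the selection σ_{bid = 100 ∧ rating > 5} and the projection Π_{Sname} above the join are marked "On the fly"; the second tree carries one "On the fly" annotation.]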
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
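A minimal sketch of the two search methods, assuming an in-memory list stands in for the file. The records and SSN values are invented, and ssn_keys is a helper list introduced only so that the standard bisect module can be used.

```python
from bisect import bisect_left

# Toy EMPLOYEE file, ordered on the key attribute SSN (illustrative values).
employees = [(111, "Ann"), (222, "Bob"), (333, "Cy"), (444, "Dee")]
ssn_keys = [rec[0] for rec in employees]


def linear_search(records, value):
    """Scan every record; works even when the file is unordered."""
    return [rec for rec in records if rec[0] == value]


def binary_search(records, keys, value):
    """Equality lookup on the ordering key; returns the record or None."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None


found_linear = linear_search(employees, 333)
found_binary = binary_search(employees, ssn_keys, 333)
missing = binary_search(employees, ssn_keys, 555)
```

The linear scan touches all n records, while the binary search needs about log2(n) probes but requires the file to be ordered on the search attribute.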
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). (Note: in this case, the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
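The range selection σ_{DNUMBER > 5}(DEPARTMENT) can be sketched as follows. A sorted Python list stands in for the ordered file and a list of keys stands in for the primary index; both, and the department rows themselves, are assumptions for illustration.

```python
from bisect import bisect_right

# Toy DEPARTMENT file ordered on DNUMBER (illustrative rows).
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
dnumbers = [d[0] for d in departments]  # stands in for the primary index


def select_greater_than(value):
    """Locate the boundary through the index, then read all subsequent records."""
    start = bisect_right(dnumbers, value)
    return departments[start:]


result = select_greater_than(5)
```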
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to each retrieved record.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
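Both strategies can be sketched with in-memory indexes that map an attribute value to a set of record pointers (here, list positions). The employee rows, the two indexes, and the specific conditions are assumptions chosen for the example.

```python
from collections import defaultdict

# Toy EMPLOYEE(name, DNO, salary) file; a record pointer is the list position.
employees = [
    ("Ann", 5, 30000),
    ("Bob", 4, 45000),
    ("Cy", 5, 52000),
    ("Dee", 1, 38000),
]


def build_index(records, field):
    """Hypothetical secondary index: attribute value -> set of record pointers."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[field]].add(pointer)
    return index


dno_index = build_index(employees, 1)
salary_index = build_index(employees, 2)

# Disjunctive selection (DNO = 5 OR salary = 45000): union of the pointer sets.
pointers = dno_index.get(5, set()) | salary_index.get(45000, set())
disjunctive = [employees[p] for p in sorted(pointers)]

# Conjunctive selection (DNO = 5 AND salary > 40000): use the index for one
# condition, then test the remaining condition on each retrieved record.
conjunctive = [
    employees[p] for p in sorted(dno_index.get(5, set())) if employees[p][2] > 40000
]
```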
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (pairs of tuples examined).
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then the cost is bs + br block accesses.
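The nested loop algorithm translates almost line for line into code. The sketch below uses in-memory lists and an invented condition r.A < s.B to show that any comparison Θ works; the counter confirms that nr × ns tuple pairs are examined.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested loop join; returns (result, pairs examined)."""
    result = []
    comparisons = 0
    for tr in r:                 # outer loop over r
        for ts in s:             # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts
    return result, comparisons


# Toy relations with a single attribute each (illustrative data).
r = [(1,), (5,)]
s = [(3,), (4,), (7,)]
joined, comparisons = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
```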
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If r.A Θ s.B is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
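The cost formula can be checked with a small simulation. In this sketch each relation is a list of blocks and each block a list of tuples; the data and the equality condition are made up, and a "block access" is simply counted each time a block is visited.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested loop join; returns (result, number of block accesses)."""
    result = []
    block_accesses = 0
    for block_r in r_blocks:          # each block of r is read once: br accesses
        block_accesses += 1
        for block_s in s_blocks:      # s is rescanned per block of r: br * bs accesses
            block_accesses += 1
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)   # tr || ts
    return result, block_accesses


# Toy data: r occupies 2 blocks, s occupies 3 blocks.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
s_blocks = [[(1, "x")], [(3, "y")], [(4, "z")]]
joined, cost = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]
)
```

With br = 2 and bs = 3 the counter reaches 2 × 3 + 2 = 8, matching br × bs + br.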
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
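A sketch of this index-based join, assuming a Python dictionary plays the role of the existing index on s.B. The function name, the attribute positions a and b, and the data are illustrative.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Join r and s on r[a] == s[b] by probing an index on s instead of scanning it."""
    index_on_b = defaultdict(list)        # stands in for an existing index on s.B
    for ts in s:
        index_on_b[ts[b]].append(ts)
    result = []
    for tr in r:                          # one pass over r
        for ts in index_on_b.get(tr[a], []):   # probe retrieves only matching records
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(3, "x"), (1, "y"), (1, "z")]
matches = index_nested_loop_join(r, s, a=0, b=0)
```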
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
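The pointer movement can be sketched as follows, assuming both inputs are lists already sorted on the join attribute. The group of s tuples sharing a join value is taken as a list slice, which mirrors the assumption that such a group fits in memory; the names and data are illustrative.

```python
def merge_join(r, s, a, b):
    """Join r and s on r[a] == s[b]; both inputs must be sorted on the join attribute."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the end of the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with this value against the whole s group.
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r_sorted = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s_sorted = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r_sorted, s_sorted, a=0, b=0)
```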
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
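A minimal in-memory sketch of the idea, assuming Python's built-in hash as the hashing function and a small fixed number of buckets. The bucket count, function name, and data are illustrative choices.

```python
def hash_join(r, s, a, b, num_buckets=4):
    """Join r and s on r[a] == s[b] by hashing both into a shared set of buckets."""
    # Each bucket holds the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(num_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # different values can share a bucket
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (6, "z")]
hash_joined = hash_join(r, s, a=0, b=0)
```

The equality test inside each bucket is still needed: here the keys 2 and 6 land in the same bucket without matching.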
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
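The two strategies for complex join conditions can be written as algebraic identities. They are stated here under set semantics, so the union removes tuples that satisfy more than one of the conditions θi.

```latex
r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s
  \;=\; \sigma_{\theta_2 \land \dots \land \theta_n}\bigl( r \bowtie_{\theta_1} s \bigr)

r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s
  \;=\; \bigcup_{i=1}^{n} \bigl( r \bowtie_{\theta_i} s \bigr)
```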
Query Optimization: Project Operation
A project operation Π_{attribute-list}(R) is straightforward to implement if the attribute list includes a key of relation R.
If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
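Both duplicate-elimination strategies can be sketched in a few lines. The relation and attribute positions are invented for illustration, and a Python set plays the role of the hash table.

```python
def project_sort(relation, attrs):
    """Project, sort, then drop duplicates (which are adjacent after sorting)."""
    projected = sorted(tuple(t[i] for i in attrs) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


def project_hash(relation, attrs):
    """Project and drop duplicates by remembering the rows already seen."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[i] for i in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Toy EMPLOYEE(id, name, DNO); projecting on DNO alone drops the key.
employees = [(1, "Ann", 5), (2, "Bob", 5), (3, "Cy", 4)]
by_sort = project_sort(employees, [2])
by_hash = project_hash(employees, [2])
```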
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
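The sort-then-scan approach can be sketched as a single merge pass that produces union, intersection, and difference together. The function name and data are illustrative, and the inputs are assumed to be union-compatible relations of plain tuples.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, difference r - s) using one scan of sorted inputs."""
    r, s = sorted(set(r)), sorted(set(s))
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                  # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:                # tuple only in s
            union.append(s[j])
            j += 1
        else:                            # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])                  # leftovers of r belong to union and r - s
    difference.extend(r[i:])
    union.extend(s[j:])                  # leftovers of s belong to the union only
    return union, intersection, difference


u, n, d = sorted_set_ops([(3,), (1,), (2,)], [(4,), (2,)])
```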
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr-ssn = Ssn
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ Mgr-ssn = Ssn}(Project × (Department × Employee)))
Query Optimization: An Example
[Query tree of the initial plan, from root to leaves]
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ Mgr-ssn = Ssn}
        ×
            Project
            ×
                Department
                Employee
Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
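The arithmetic behind these figures is spelled out below: the tuple counts multiply and the tuple sizes add. The dictionary layout is just a convenient way to hold the numbers from the slide.

```python
# Statistics taken from the example: number of tuples and bytes per tuple.
relations = {
    "Project": {"tuples": 100, "tuple_bytes": 100},
    "Department": {"tuples": 20, "tuple_bytes": 50},
    "Employee": {"tuples": 5000, "tuple_bytes": 150},
}

product_tuples = 1
product_tuple_bytes = 0
for stats in relations.values():
    product_tuples *= stats["tuples"]            # cardinalities multiply
    product_tuple_bytes += stats["tuple_bytes"]  # concatenated tuples add their sizes

total_bytes = product_tuples * product_tuple_bytes
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes, about 3 × 10^9 bytes in total.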
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{Mgr-ssn = Ssn} (Employee))
Query Optimization: An Example
[Query tree of the improved plan, from root to leaves]
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{Mgr-ssn = Ssn}
        ⋈_{Dnum = Dnumber}
            σ_{Plocation = "California"}
                Project
            Department
        Employee
65
Optimization ProcessRule 11 Set union and set intersection
operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)
(E1 cap E2) cap E3 = E1 cap (E2 cap E3)
Database Systems
66
Optimization ProcessRule 12 Selection operation distributes over
the set union set intersection and set differenceoperations
σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)
Database Systems
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
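A small in-memory sketch of the hash-join idea described above. The bucket count, the use of Python's built-in `hash`, and the name `hash_join` are assumptions made for illustration; a real system would hash to disk-resident buckets.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Hash r on position a and s on position b into shared buckets, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                    # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                    # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                  # equal hash does not imply equal value
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "e")]
s = [(2, "x"), (5, "y"), (6, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

The explicit equality test inside each bucket matters because different join values (here 1 and 5) can land in the same bucket.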
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
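Both strategies for complex join conditions can be sketched as follows. The relations are assumed to be duplicate-free, each condition θi is a Python predicate over a pair of tuples, and the function names are hypothetical. The simple joins are evaluated here with a plain nested loop, although any applicable join technique could take its place.

```python
def theta_join(r, s, theta):
    """Plain nested loop join used for each simple condition."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]


def conjunctive_join(r, s, thetas):
    """Join on the first condition, then filter by the remaining ones."""
    first, rest = thetas[0], thetas[1:]
    intermediate = [(tr, ts) for tr in r for ts in s if first(tr, ts)]
    return [tr + ts for tr, ts in intermediate if all(t(tr, ts) for t in rest)]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, keeping each result tuple once."""
    seen = set()
    result = []
    for theta in thetas:
        for row in theta_join(r, s, theta):
            if row not in seen:
                seen.add(row)
                result.append(row)
    return result


r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 10), (2, 99), (4, 30)]
eq_first = lambda tr, ts: tr[0] == ts[0]
eq_second = lambda tr, ts: tr[1] == ts[1]
both = conjunctive_join(r, s, [eq_first, eq_second])
either = disjunctive_join(r, s, [eq_first, eq_second])
```

The `seen` set is what turns the concatenation of the individual joins into a union: a pair that satisfies several conditions would otherwise appear more than once.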
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
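The two duplicate-elimination strategies can be sketched as below. The function names are hypothetical, and attributes are addressed by position rather than by name.

```python
def project_sort(rows, positions):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[p] for p in positions) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result


def project_hash(rows, positions):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[p] for p in positions)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


rows = [(1, "x", "p"), (2, "y", "p"), (3, "x", "q")]
by_sort = project_sort(rows, [1])
by_hash = project_hash(rows, [1])
with_key = project_hash(rows, [0, 1])   # position 0 is a key: no duplicates arise
```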
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
A hashing technique is another way to implement union, intersection, and difference operations.
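A sketch of the sort-then-scan approach, computing union, intersection, and difference in one pass over two sorted, duplicate-free inputs. `sorted_set_ops` is a hypothetical name, and the sorting is done in memory here purely for illustration.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one merge-style scan."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j])
            j += 1
        else:                           # tuple in both
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff


union, inter, diff = sorted_set_ops([(3,), (1,), (2,)], [(2,), (4,)])
```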
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN} (Project × (Department × Employee)))
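Query Optimization: An Example
[Query tree for the expression above: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root; below it σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; below it a Cartesian product × of Project with the Cartesian product × of Department and Employee.]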
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
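The figures in this example can be checked with a few lines of arithmetic. The sketch below uses only the tuple counts and tuple sizes stated on the slide; the variable names are illustrative.

```python
# (number of tuples, tuple size in bytes) as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_size = 0
for n_tuples, tuple_size in relations.values():
    product_tuples *= n_tuples          # cardinalities multiply
    product_tuple_size += tuple_size    # tuple widths add

product_bytes = product_tuples * product_tuple_size
```

The product therefore holds 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, about 3 × 10^9 bytes, before the selection discards almost all of them.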
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = "California"} (Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))
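Query Optimization: An Example
[Query tree for the expression above: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root; below it the join ⋈_{MGRSSN = SSN}, whose children are Employee and the join ⋈_{Dnum = Dnumber}; the children of that join are Department and σ_{Plocation = "California"} over Project.]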
Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2
Optimization Process
Rule 12
σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2
Optimization Process
Rule 12
σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2
Optimization Process
Rule 13: The projection operation distributes over the set union operation.
Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)
For set intersection and set difference the corresponding equalities do not hold in general; only the following containments are guaranteed:
Π_L (E1 ∩ E2) ⊆ Π_L (E1) ∩ Π_L (E2)
Π_L (E1) − Π_L (E2) ⊆ Π_L (E1 − E2)
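The behaviour of selection and projection over the set operations can be checked on small relations. In this sketch relations are Python sets of tuples, and `select`, `project`, the sample relations E1 and E2, and the predicate p are all made up for illustration. It confirms the selection identities, shows that projection distributes over union, and gives an instance where projection over intersection and over difference yields only a containment.

```python
def select(p, e):
    return {t for t in e if p(t)}


def project(positions, e):
    return {tuple(t[i] for i in positions) for t in e}


E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(0, "z"), (2, "b"), (3, "x")}
p = lambda t: t[0] >= 2
L = [0]

# Selection over difference, intersection, and union
sel_diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
sel_inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2
sel_union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
sel_union_one_sided_differs = select(p, E1 | E2) != select(p, E1) | E2

# Projection over union, intersection, and difference
proj_union_ok = project(L, E1 | E2) == project(L, E1) | project(L, E2)
proj_inter_strict = project(L, E1 & E2) < project(L, E1) & project(L, E2)
proj_diff_strict = project(L, E1) - project(L, E2) < project(L, E1 - E2)
```

Here (3, "c") and (3, "x") agree on the projected attribute but are different tuples, which is what breaks equality for intersection and difference.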
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.
Definition
Materialized evaluation: Generation of an intermediate result (relation) that is stored before the next operation reads it.
Pipeline evaluation: Combining several operations so that the output of one is passed directly to the next.
Assume we want to perform
Π_{a1, a2} (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
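The difference between the two evaluation styles can be sketched with Python generators. This is an illustration only: the operator names are hypothetical, the join is a plain nested loop, and the projection here does not eliminate duplicates.

```python
def join(r, s, theta):
    """Produce joined tuples one at a time."""
    for tr in r:
        for ts in s:
            if theta(tr, ts):
                yield tr + ts


def project(rows, positions):
    """Consume tuples one at a time and emit the projected attributes."""
    for row in rows:
        yield tuple(row[p] for p in positions)


def materialized(r, s, theta, positions):
    temp = list(join(r, s, theta))              # intermediate relation stored in full
    return list(project(temp, positions)), len(temp)


def pipelined(r, s, theta, positions):
    # each joined tuple flows straight into the projection; nothing is stored
    return project(join(r, s, theta), positions)


r = [(1, "a"), (2, "b")]
s = [(1, "x"), (2, "y"), (2, "z")]
same_key = lambda tr, ts: tr[0] == ts[0]
mat_rows, temp_size = materialized(r, s, same_key, [1, 3])
pipe_rows = list(pipelined(r, s, same_key, [1, 3]))
```

Both versions produce the same tuples; the materialized one additionally holds the whole intermediate join result (three tuples here) before projecting.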
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname} (σ_{bid = 100 ∧ rating > 5} (R ⋈_{Sid = Sid} S))
[Query tree: Π_{Sname} at the root; below it σ_{bid = 100 ∧ rating > 5}; below it the join ⋈_{Sid = Sid} with children R and S.]
Π_{Sname} ((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))
[Query tree: Π_{Sname} at the root; below it the join ⋈_{Sid = Sid}, whose children are σ_{bid = 100} over R and σ_{rating > 5} over S.]
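The effect of moving the selections below the join can be seen on a toy instance. The sample tuples below are invented for illustration, and the two functions are hypothetical names for the two expressions; each returns the query answer together with the size of the join result it had to build.

```python
# S(Sid, Sname, rating, age) and R(Sid, bid, day, rname): made-up sample data
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "r1"), (2, 100, "d2", "r2"), (3, 200, "d3", "r3"), (1, 200, "d4", "r4")]


def plan_select_late(R, S):
    """Join first, then select, then project."""
    joined = [r + s for r in R for s in S if r[0] == s[0]]
    # in r + s: bid is position 1, Sname position 5, rating position 6
    selected = [t for t in joined if t[1] == 100 and t[6] > 5]
    return sorted({(t[5],) for t in selected}), len(joined)


def plan_select_early(R, S):
    """Select on each relation first, then join, then project."""
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    joined = [r + s for r in r_sel for s in s_sel if r[0] == s[0]]
    return sorted({(t[5],) for t in joined}), len(joined)


late_answer, late_join_size = plan_select_late(R, S)
early_answer, early_join_size = plan_select_early(R, S)
```

On this data both plans return the same answer, but the join builds four tuples when the selection is applied late and only one when it is applied early.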
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
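[Figure: the two query trees above, annotated for pipelined execution. In the first tree (Π_{Sname} over σ_{bid = 100 ∧ rating > 5} over R ⋈_{Sid = Sid} S), the selection and the projection are marked "On the fly". In the second tree (Π_{Sname} over the join of σ_{bid = 100} R and σ_{rating > 5} S), one operator is marked "On the fly".]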
Cost of Plan
The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
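The two search methods can be sketched as follows, with a file modelled as a Python list of records. The function names are hypothetical; `binary_search` assumes the list is sorted on the key position and that key values are unique.

```python
def linear_search(file, pos, value):
    """Scan every record and test the condition."""
    return [rec for rec in file if rec[pos] == value]


def binary_search(sorted_file, key_pos, value):
    """Equality search on the ordering key; returns the record or None."""
    lo, hi = 0, len(sorted_file)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_file[mid][key_pos] < value:
            lo = mid + 1                # the key can only be in the upper half
        else:
            hi = mid                    # the key is at mid or in the lower half
    if lo < len(sorted_file) and sorted_file[lo][key_pos] == value:
        return sorted_file[lo]
    return None


file = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]
found = binary_search(file, 0, 30)
missing = binary_search(file, 0, 25)
scanned = linear_search(file, 1, "b")
```

Linear search inspects every record, while binary search halves the remaining range at each step and so needs roughly log2 of the file size in probes.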
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789} (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case data is also sorted.)
σ_{DNUMBER > 5} (DEPARTMENT)
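A sketch of range retrieval on a file sorted by its key. A sorted list of key values stands in for the primary index, and `range_select` is a hypothetical name; `bisect_left` and `bisect_right` are from Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right


def range_select(sorted_file, keys, op, value):
    """keys[i] is the key of sorted_file[i]; both are in ascending key order."""
    if op == ">":
        return sorted_file[bisect_right(keys, value):]
    if op == ">=":
        return sorted_file[bisect_left(keys, value):]
    if op == "<":
        return sorted_file[:bisect_left(keys, value)]
    if op == "<=":
        return sorted_file[:bisect_right(keys, value)]
    raise ValueError(f"unsupported comparison: {op}")


# DEPARTMENT(DNUMBER, DNAME): made-up sample rows, sorted on DNUMBER
department = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales")]
dnumbers = [d[0] for d in department]
greater_than_5 = range_select(department, dnumbers, ">", 5)
```

The index lookup locates the boundary position, and the sorted order of the file guarantees that all qualifying records lie contiguously on one side of it.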
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5} (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn} (r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn} (r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
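Both selection strategies can be sketched with simple in-memory indexes that map an attribute value to a list of record positions (standing in for record pointers). The names `build_index`, `conjunctive_select`, and `disjunctive_select`, and the sample EMPLOYEE rows, are assumptions for illustration.

```python
def build_index(file, pos):
    """Map each value at attribute position pos to the record ids holding it."""
    index = {}
    for rid, rec in enumerate(file):
        index.setdefault(rec[pos], []).append(rid)
    return index


def conjunctive_select(file, index, value, other_conditions):
    """Use one indexed condition to fetch records, then test the rest."""
    return [file[rid] for rid in index.get(value, [])
            if all(cond(file[rid]) for cond in other_conditions)]


def disjunctive_select(file, lookups):
    """lookups: (index, value) per condition; union the record pointers."""
    rids = set()
    for index, value in lookups:
        rids.update(index.get(value, []))
    return [file[rid] for rid in sorted(rids)]


# EMPLOYEE(name, DNO, salary): made-up sample rows
employee = [("Ann", 5, 30000), ("Bob", 4, 40000), ("Cy", 5, 55000), ("Di", 1, 40000)]
dno_index = build_index(employee, 1)
salary_index = build_index(employee, 2)

dno5_and_high_pay = conjunctive_select(employee, dno_index, 5, [lambda e: e[2] > 40000])
dno5_or_pay_40000 = disjunctive_select(employee, [(dno_index, 5), (salary_index, 40000)])
```

Collecting pointers in a set before fetching means a record that satisfies several disjuncts is still retrieved only once.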
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end
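The algorithm above translates almost line for line into Python. In this sketch the join condition Θ is passed in as a predicate, and the function also counts how many tuple pairs it examines; `nested_loop_join` is a hypothetical name.

```python
def nested_loop_join(r, s, theta):
    """Return (result, number of tuple pairs examined)."""
    result = []
    comparisons = 0
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)  # tr || ts
    return result, comparisons


r = [(1,), (5,), (9,)]
s = [(4,), (6,)]
# a non-equality condition: r.A < s.B
rows, comparisons = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
```

Because the predicate is arbitrary, the same function handles conditions such as < that index, merge, or hash techniques cannot use directly; the price is that all nr × ns pairs are examined.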
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns tuple-pair comparisons.
In the best-case scenario, both relations fit into main memory, and hence we need only bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits entirely in main memory, then the cost will also be bs + br block accesses: that relation is read once and kept in memory while the other is scanned once.
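The cost figures for the join variants can be compared with a short calculation. The relation sizes below are hypothetical, and the worst-case formula nr × bs + br for the tuple-at-a-time nested loop is the usual textbook figure when only one block of each relation can be buffered; it is an assumption added here, not something stated on the slides.

```python
def nested_loop_worst_case(nr, br, bs):
    """One buffer block per relation: s is rescanned for every tuple of r."""
    return nr * bs + br


def block_nested_loop(br, bs):
    """s is rescanned once per block of r."""
    return br * bs + br


def one_relation_in_memory(br, bs):
    """Each relation is read exactly once."""
    return br + bs


# hypothetical sizes: r has 5000 tuples in 100 blocks, s occupies 400 blocks
nr, br, bs = 5000, 100, 400
costs = {
    "nested loop (worst case)": nested_loop_worst_case(nr, br, bs),
    "block nested loop": block_nested_loop(br, bs),
    "one relation in memory": one_relation_in_memory(br, bs),
}
```

For these sizes the three costs are 2,000,100, 40,100, and 500 block accesses, which shows how much depends on the available buffer space.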
67
Optimization ProcessRule 12
σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)
Database Systems
68
Optimization ProcessRule 12
σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)
Database Systems
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.
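The following sketch shows the two-step evaluation, assuming θ1 is an equality condition and the remaining conditions are given as Python predicates over the joined tuple. The helper names are hypothetical, and any of the simple join techniques could replace the naive equi_join.

```python
def equi_join(r, s, a, b):
    """Simple join on the single condition r[a] == s[b] (stands in for θ1)."""
    return [{**tr, **ts} for tr in r for ts in s if tr[a] == ts[b]]


def conjunctive_join(r, s, a, b, remaining):
    """Join on (r[a] == s[b]) AND every predicate in `remaining`."""
    intermediate = equi_join(r, s, a, b)      # the simpler join on θ1
    # Keep only intermediate tuples that satisfy the remaining conditions.
    return [t for t in intermediate if all(theta(t) for theta in remaining)]
```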
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s, for i = 1, …, n.
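A sketch of the union-based evaluation, assuming tuples are plain Python tuples so that results can be held in a set (which also removes pairs produced by more than one θi). The function names are illustrative only.

```python
def theta_join(r, s, theta):
    """All concatenated pairs (tr, ts) that satisfy a single condition theta."""
    return {tr + ts for tr in r for ts in s if theta(tr, ts)}


def disjunctive_join(r, s, thetas):
    """Join on θ1 ∨ θ2 ∨ … ∨ θn as the union of the individual joins."""
    result = set()
    for theta in thetas:
        result |= theta_join(r, s, theta)   # set union drops repeated pairs
    return result
```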
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
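Both duplicate-elimination strategies can be sketched in a few lines. The example assumes records are dictionaries and that projected values are hashable and comparable; the function names are made up for illustration.

```python
def project_hash(relation, attrs):
    """Projection with duplicates removed through a hash set."""
    seen = set()
    out = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:        # hashing detects a repeated row
            seen.add(row)
            out.append(row)
    return out


def project_sort(relation, attrs):
    """Projection with duplicates removed by sorting, then one scan."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    # After sorting, duplicates are adjacent, so compare with the previous row.
    return [row for k, row in enumerate(rows) if k == 0 or row != rows[k - 1]]
```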
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
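The sort-then-scan approach can be sketched as one merge pass that produces all three results. This assumes each input is a duplicate-free list of comparable tuples; the function name is hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one merge scan over sorted inputs."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:            # tuple only in r
            union.append(r[i]); diff.append(r[i]); i += 1
        elif r[i] > s[j]:          # tuple only in s
            union.append(s[j]); j += 1
        else:                      # tuple in both relations
            union.append(r[i]); inter.append(r[i]); i += 1; j += 1
    # Whatever remains occurs in exactly one relation.
    union.extend(r[i:]); diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff
```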
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MNGSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MNGSSN = SSN}
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
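The arithmetic behind these figures can be checked directly. The dictionary and variable names below are only for illustration; the numbers are the ones given in the example.

```python
from math import prod

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}    # bytes
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}  # tuples

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add.
product_tuples = prod(tuple_counts.values())          # 100 * 20 * 5000
product_tuple_size = sum(tuple_sizes.values())        # 100 + 50 + 150
product_bytes = product_tuples * product_tuple_size
```

The intermediate relation would therefore occupy 3 × 10^9 bytes, even though the three base relations together hold well under one megabyte.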
Query Optimization: An Example
However, the above query, based on the schemas of the relations, can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MNGSSN = SSN} Employee)
Query Optimization: An Example
Query tree:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MNGSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = "California"}
        Project
      Department
    Employee
Optimization Process
Rule 12
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
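Optimization Process
Rule 13: The projection operation distributes over the set union operation:
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
The corresponding forms for set intersection and set difference,
Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) = Π_L(E1) − Π_L(E2)
hold only when no two distinct tuples of E1 ∪ E2 agree on L (for example, when L includes a key that holds across both E1 and E2). In general, Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2) and Π_L(E1 − E2) ⊇ Π_L(E1) − Π_L(E2).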
Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.
Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
Definition
Materialized evaluation: generation of an intermediate result (relation) that is stored before the next operation starts.
Pipeline evaluation: combining several operations so that the output of one operation is passed directly to the next.
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
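The difference between the two evaluation styles can be illustrated with Python generators. This is only a sketch: records are assumed to be dictionaries, the join is a naive nested loop, and all function names are invented for the example.

```python
def join(r, s, theta):
    """Yield joined tuples one at a time."""
    for tr in r:
        for ts in s:
            if theta(tr, ts):
                yield {**tr, **ts}


def project(tuples, attrs):
    """Yield the projected form of each incoming tuple."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


def materialized(r, s, theta, attrs):
    temp = list(join(r, s, theta))        # intermediate relation stored in full
    return list(project(temp, attrs))


def pipelined(r, s, theta, attrs):
    # No intermediate list: each joined tuple flows straight into project.
    return project(join(r, s, theta), attrs)
```

Both variants produce the same tuples; the pipelined one simply never holds the whole join result at once.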
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
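To see that the two forms are equivalent, both can be evaluated on a few sample rows. The rows below are invented purely for illustration; tuples follow the attribute order of the schemas S (Sid, Sname, rating, age) and R (Sid, bid, day, rname).

```python
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 25.0), (3, "Cy", 9, 41.5)]
R = [(1, 100, "d1", "x"), (2, 100, "d2", "y"), (3, 101, "d3", "z")]


def plan_select_late(R, S):
    """Join first, then apply both selections, then project Sname."""
    joined = [(r, s) for r in R for s in S if r[0] == s[0]]
    return sorted(s[1] for r, s in joined if r[1] == 100 and s[2] > 5)


def plan_select_early(R, S):
    """Apply each selection to its own relation, then join, then project."""
    R1 = [r for r in R if r[1] == 100]      # selection bid = 100 on R
    S1 = [s for s in S if s[2] > 5]         # selection rating > 5 on S
    return sorted(s[1] for r in R1 for s in S1 if r[0] == s[0])
```

In the second plan the join sees only the tuples that survive the selections, which is the point of performing selections early.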
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
In this case, articulate the way the previous query is going to be executed.
First plan:
Π_{Sname}   (on the fly)
  σ_{bid = 100 ∧ rating > 5}   (on the fly)
    ⋈_{Sid = Sid}
      R
      S
Second plan:
Π_{Sname}   (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
Cost of Plan
The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
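Both search methods can be sketched over an in-memory list standing in for the file. Records are assumed to be dictionaries; for the binary search the list must be ordered on the key attribute. The function names are illustrative.

```python
def linear_search(file, attr, value):
    """Scan every record and keep those satisfying attr = value."""
    return [rec for rec in file if rec[attr] == value]


def binary_search(file, attr, value):
    """Find the record with key attr = value in a file ordered on attr."""
    lo, hi = 0, len(file) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if file[mid][attr] == value:
            return file[mid]
        if file[mid][attr] < value:
            lo = mid + 1           # the record, if any, lies in the upper half
        else:
            hi = mid - 1           # the record, if any, lies in the lower half
    return None                    # no record has this key value
```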
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.
σ_{DNUMBER > 5}(DEPARTMENT)
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records, and then check whether each retrieved record satisfies the rest of the conditions.
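A sketch of this strategy, assuming a dictionary-based index on one attribute and the remaining conditions supplied as Python predicates. The helper names and the sample attributes are hypothetical.

```python
from collections import defaultdict


def build_index(file, attr):
    """Stand-in for an existing access path on attr: value -> records."""
    index = defaultdict(list)
    for rec in file:
        index[rec[attr]].append(rec)
    return index


def conjunctive_select(index, value, other_conditions):
    """Use the indexed condition to fetch candidates, then test the rest."""
    candidates = index.get(value, [])     # records satisfying the indexed condition
    return [rec for rec in candidates
            if all(cond(rec) for cond in other_conditions)]
```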
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
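The pointer-union idea, including the fallback to a linear scan, might look like the sketch below. List positions play the role of record pointers, conditions are restricted to equality tests, and all names are assumptions made for the example.

```python
from collections import defaultdict


def build_pointer_index(file, attr):
    """Stand-in for an index on attr: value -> set of record positions."""
    index = defaultdict(set)
    for pos, rec in enumerate(file):
        index[rec[attr]].add(pos)
    return index


def disjunctive_select(file, indexes, conditions):
    """conditions is a list of (attr, value) pairs combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition has no access path: scan the whole file.
        return [rec for rec in file
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of record pointers
    return [file[pos] for pos in sorted(pointers)]
```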
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
  For each ts ∈ s do begin
    If tr[A] Θ ts[B] is true, then add tr || ts to the result
  end
end
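The algorithm translates almost line for line into Python. In this sketch tuples are plain Python tuples, so tr || ts becomes tuple concatenation, and the condition Θ is passed in as a function; the function name is made up.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r and s by comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr, ts):         # join condition on the pair
                result.append(tr + ts)    # tr || ts
    return result
```

Because the condition is an arbitrary predicate, the same code handles equality, inequality, and any other comparison.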
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best-case scenario both relations fit into main memory, and hence we need only bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If even one of the relations fits in main memory, the cost is still bs + br block accesses: keep that relation in memory as the inner relation and read the other one once.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
  For each block Bs of s do begin
    For each tr ∈ Br do begin
      For each ts ∈ Bs do begin
        If tr[A] Θ ts[B] is true, then add tr || ts to the result
      end
    end
  end
end
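A sketch of the same algorithm in Python, with list slices standing in for disk blocks and a counter recording how many blocks are read. The block size, helper names, and the counter are assumptions added for illustration.

```python
def blocks(relation, block_size):
    """Split a relation (a list of tuples) into consecutive blocks."""
    for k in range(0, len(relation), block_size):
        yield relation[k:k + block_size]


def block_nested_loop_join(r, s, theta, block_size=2):
    """Return (result, block_reads) for a block nested loop join."""
    result = []
    block_reads = 0
    for Br in blocks(r, block_size):
        block_reads += 1                  # read one block of r
        for Bs in blocks(s, block_size):
            block_reads += 1              # read one block of s
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append(tr + ts)   # tr || ts
    return result, block_reads
```

With r stored in two blocks and s in three, the counter ends at 2 × 3 + 2 = 8 block reads.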
69
Optimization ProcessRule 13 Projection operation distributes over
the set union set intersection and setdifference operations
ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)
Database Systems
70
Optimization ProcessChoose candidate low-level procedure mdash After
transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce
IO cost andphysical clustering of records To reduce IO cost hellip
comes into play
Database Systems
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique

Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform; hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement union, intersection, and difference operations.
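
As a rough illustration of the sort-then-scan approach, the sketch below computes union, intersection and difference in one merge pass. It assumes both relations are duplicate-free and that their tuples are comparable; the function name sorted_set_ops is hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, r minus s) using one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    i = j = 0
    union, inter, diff = [], [], []
    while i < len(r) and j < len(s):
        if r[i] == s[j]:            # tuple appears in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:           # tuple appears only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        else:                       # tuple appears only in s
            union.append(s[j])
            j += 1
    union.extend(r[i:])             # leftovers of r belong to union and difference
    diff.extend(r[i:])
    union.extend(s[j:])             # leftovers of s belong to the union only
    return union, inter, diff

u, n, d = sorted_set_ops([3, 1, 2], [2, 4, 3])   # ([1, 2, 3, 4], [2, 3], [1])
```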

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'

The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}(Project × (Department × Employee)))

Query tree for this expression:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}
    ×
      Project
      ×
        Department
        Employee

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
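
The figures quoted above follow directly from multiplying the cardinalities and adding the tuple sizes. A short check, using only the numbers given on the slide (the variable names are made up for the example):

```python
# relation -> (number of tuples, tuple size in bytes), as stated on the slide
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_size = 0
for count, size in relations.values():
    product_tuples *= count        # cardinalities multiply in a Cartesian product
    product_tuple_size += size     # tuple sizes add up

total_bytes = product_tuples * product_tuple_size
print(product_tuples, product_tuple_size, total_bytes)
# 10000000 300 3000000000
```

So the intermediate relation would occupy about 3 billion bytes before the selection removes anything.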

However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{Mgr_ssn=Ssn} Employee)

Query tree for this expression:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{Mgr_ssn=Ssn}
    ⋈_{Dnum=Dnumber}
      σ_{Plocation='California'}
        Project
      Department
    Employee

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
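
Definition
Materialized evaluation: The result of each operation is generated as an intermediate relation before the next operation starts.
Pipeline evaluation: Several operations are combined, so that the result of one operation is passed directly to the next.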

Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
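
The difference between the two evaluation styles can be mimicked with Python generators. This is only an analogy under stated assumptions: tuples are dictionaries, the join is an equi-join on made-up attributes k and j, and the helper names join and project are hypothetical.

```python
def join(r, s, a, b):
    """Yield each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Consume tuples one at a time and keep only the requested attributes."""
    for t in tuples:
        yield {k: t[k] for k in attrs}

r = [{"a1": 1, "k": 7}, {"a1": 2, "k": 8}]
s = [{"a2": "x", "j": 7}, {"a2": "y", "j": 9}]

# Materialized evaluation: the whole join result is stored before projecting.
temp = list(join(r, s, "k", "j"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k", "j"), ["a1", "a2"]))
```

Both variants produce the same answer; the pipelined one never holds the full intermediate relation.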

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))
Query tree:
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))
Query tree:
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

Π_{Sname}   (on the fly)
  σ_{bid = 100 ∧ rating > 5}   (on the fly)
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}   (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
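
The two basic search methods can be contrasted in a short sketch. Records are modeled as dictionaries held in a Python list that stands in for the file; the function names and the sample Ssn values are assumptions for illustration.

```python
def linear_select(records, attr, value):
    """Scan every record and keep those satisfying attr == value."""
    return [rec for rec in records if rec[attr] == value]

def binary_select(sorted_records, key, value):
    """Equality search on a key attribute; the file must be ordered on that key."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid]   # key attribute: at most one match
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

employees = [{"Ssn": n, "Dno": n % 3} for n in (101, 104, 107, 110, 113)]
found = binary_select(employees, "Ssn", 110)    # {"Ssn": 110, "Dno": 2}
scanned = linear_select(employees, "Dno", 2)    # records with Ssn 101, 104, 107, 110, 113 checked
```

The linear scan inspects all five records, while the binary search inspects at most three of them.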

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
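
A range selection of this kind can be sketched with the standard bisect module. The sorted list index_keys is only a stand-in for a primary index (it holds the key of each record in file order), and the function name range_select is hypothetical.

```python
from bisect import bisect_left, bisect_right

def range_select(sorted_records, index_keys, op, value):
    """Locate the boundary through the index, then take the records on one side of it."""
    if op == ">":
        return sorted_records[bisect_right(index_keys, value):]
    if op == ">=":
        return sorted_records[bisect_left(index_keys, value):]
    if op == "<":
        return sorted_records[:bisect_left(index_keys, value)]
    if op == "<=":
        return sorted_records[:bisect_right(index_keys, value)]
    raise ValueError(f"unsupported comparison: {op}")

departments = [{"DNUMBER": n} for n in (1, 4, 5, 7, 9)]   # file ordered on DNUMBER
keys = [d["DNUMBER"] for d in departments]

above_5 = range_select(departments, keys, ">", 5)    # DNUMBER 7 and 9
up_to_5 = range_select(departments, keys, "<=", 5)   # DNUMBER 1, 4 and 5
```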

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
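
A minimal sketch of this strategy follows. It assumes the access path is a dictionary mapping an attribute value to record positions, and that the remaining conditions are given as functions of a record; the Salary attribute and the function name conjunctive_select are made up for the example.

```python
def conjunctive_select(records, index, indexed_value, other_conditions):
    """Fetch candidates through one indexed condition, then test the other conditions."""
    candidates = [records[pos] for pos in index.get(indexed_value, [])]
    return [rec for rec in candidates if all(cond(rec) for cond in other_conditions)]

employees = [
    {"Ssn": 1, "Dno": 5, "Salary": 40000},
    {"Ssn": 2, "Dno": 4, "Salary": 50000},
    {"Ssn": 3, "Dno": 5, "Salary": 25000},
]
dno_index = {5: [0, 2], 4: [1]}   # Dno value -> positions of matching records

# Dno = 5 AND Salary > 30000: only two records are fetched, one survives the filter.
selected = conjunctive_select(employees, dno_index, 5, [lambda e: e["Salary"] > 30000])
```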

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
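
The pointer-union idea can be sketched as follows, again assuming each index is a dictionary from attribute value to record positions. The name disjunctive_select and the sample data are illustrative only.

```python
def disjunctive_select(records, probes):
    """probes is a list of (index, value) pairs, one per condition in the disjunction."""
    pointers = set()
    for index, value in probes:
        pointers.update(index.get(value, []))   # union of the record pointers
    return [records[pos] for pos in sorted(pointers)]

employees = [{"Ssn": 1, "Dno": 5}, {"Ssn": 2, "Dno": 4}, {"Ssn": 3, "Dno": 5}]
dno_index = {5: [0, 2], 4: [1]}
ssn_index = {1: [0], 2: [1], 3: [2]}

# Dno = 4 OR Ssn = 3
selected = disjunctive_select(employees, [(dno_index, 4), (ssn_index, 3)])
```

Using a set means a record that satisfies several conditions is fetched only once.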

Query Optimization: JOIN Operation
R ⋈_{A=B} S
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the number of blocks of each relation.

The following algorithm performs the nested loop join operation:
For each t_r ∈ r do begin
    For each t_s ∈ s do begin
        If t_r.A Θ t_s.B is true then add t_r || t_s to the result
    end
end
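
The algorithm above translates almost line by line into Python. In this sketch tuples are Python tuples, A and B are given as positions a and b, and Θ is passed as a comparison function; those representation choices and the name nested_loop_join are assumptions for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Compare every tuple of r with every tuple of s under the condition theta."""
    result = []
    for tr in r:                        # outer loop
        for ts in s:                    # inner loop
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts
    return result

r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                 # [(2, "y", 2, "p")]
less = nested_loop_join(r, s, 0, 0, operator.lt)    # three pairs with r.A < s.B
```

Because the condition is an arbitrary function, the same code handles any join condition, which is the property the slides attribute to nested loop join.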

The cost of the nested loop algorithm is n_r × n_s pairs of tuples examined.
In the best-case scenario both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.

If one of the relations fits entirely in the physical space, then the cost will also be b_s + b_r block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If t_r.A Θ t_s.B is true then add t_r || t_s to the result
            end
        end
    end
end

The cost of block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
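
The cost formula can be checked by counting block reads in a small simulation. Each relation is modeled as a list of blocks, each block being a list of tuples; the equality condition, the read counter, and the name block_nested_loop_join are assumptions made for this sketch.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join that reads whole blocks and reports how many block reads it made."""
    reads = 0
    result = []
    for block_r in r_blocks:
        reads += 1                      # each block of r is read once
        for block_s in s_blocks:
            reads += 1                  # s is re-read once per block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, reads

r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]   # b_r = 3
s_blocks = [[(2,), (4,)], [(6,)]]                 # b_s = 2

joined, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
print(reads)   # 9, which equals b_r * b_s + b_r = 3 * 2 + 3
```

Swapping the roles of the relations gives 2 × 3 + 2 = 8 reads, so in this model it pays to put the relation with fewer blocks in the outer loop.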
How can we improve block nested loop?

Query Optimization: JOIN Operation
r ⋈_{A=B} s
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
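
A sketch of this technique, with a Python dictionary standing in for the index or hash key on attribute B of s. The helper names build_index and index_join and the sample tuples are hypothetical.

```python
from collections import defaultdict

def build_index(s, b):
    """Map each value of attribute b to the tuples of s that carry it."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tuple of r, probe the index instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (2, "z")]
s = [(2, "p"), (3, "q"), (2, "w")]

joined = index_join(r, build_index(s, 0), 0)
```

Each tuple of r costs one index probe rather than a full pass over s.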

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_s + b_r,
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
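
The pointer movement described above can be sketched as follows. Both inputs are assumed to be lists of tuples already sorted on the join positions a and b, and the group of s tuples sharing one join value is assumed to fit in memory, as the slide requires; the name merge_join is made up for the example.

```python
def merge_join(r, s, a, b):
    """Equi-join two inputs that are already sorted on their join attributes."""
    i = j = 0                      # one pointer per relation
    result = []
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples with this join value (kept in memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple carrying the value with the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)    # five result tuples
```

Neither pointer ever moves backwards past a group, which is why each input is scanned only once.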
71
Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent
representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution
strategy is a plan for accessing the data executingthe query and storing the intermediate results
Database Systems
72
Optimization ProcessGenerate query plans mdash The final stage of
optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a
method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip
Database Systems
73
Optimization ProcessThere are two main techniques for query
optimizationHeuristic rulesSystematic estimation approach
In this course as noted before we will talkabout the heuristic rules
Database Systems
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = 'California'} (Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))
Query Optimization: An Example
Query tree for the improved expression:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee
Optimization Process
Generate query plans: The final stage of optimization involves constructing a set of candidate query plans and choosing the best of these plans.
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …
Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.
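Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation, which then serves as input to the next operation.
Pipelined evaluation: Several operations are combined, so that the tuples produced by one operation are passed directly to the next without storing an intermediate relation.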
Assume we want to perform
Π_{a1, a2} (r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
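The difference between the two evaluation strategies can be sketched with Python generators. This is only an illustration under assumed names: `join`, `project`, the join attribute `k`, and the sample tuples are all hypothetical.

```python
def join(r, s, key):
    """Nested-loop equijoin that yields each result tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized evaluation: the whole join result is stored before projecting.
temp = list(join(r, s, "k"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: project pulls tuples from join one at a time,
# so no intermediate relation is ever stored.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both strategies return the same tuples; they differ only in whether the intermediate relation `temp` exists.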
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname} (σ_{bid = 100 ∧ rating > 5} (R ⋈_{Sid = Sid} S))
Query tree:
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S
Π_{Sname} ((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))
Query tree:
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
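The effect of performing the selections early can be seen on a toy instance. The sketch below evaluates both expressions over hypothetical sample data (the tuples and function names are assumptions) and reports the size of the join result alongside the answer.

```python
# Hypothetical sample data. S: (Sid, Sname, rating, age); R: (Sid, bid, day, rname).
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "r1"), (2, 100, "d2", "r2"), (3, 101, "d3", "r3")]

def plan_late_selection():
    """Join first, then select, then project."""
    joined = [(r, s) for r in R for s in S if r[0] == s[0]]
    selected = [(r, s) for r, s in joined if r[1] == 100 and s[2] > 5]
    return [s[1] for r, s in selected], len(joined)

def plan_early_selection():
    """Select on each relation first, then join, then project."""
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    joined = [(r, s) for r in r_sel for s in s_sel if r[0] == s[0]]
    return [s[1] for r, s in joined], len(joined)

print(plan_late_selection())   # (['Ann'], 3)
print(plan_early_selection())  # (['Ann'], 1)
```

Both plans return the same names, but the second one joins smaller inputs and produces a smaller intermediate result.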
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
Plan 1 (selection after the join):
Π_{Sname} (on the fly)
  σ_{bid = 100 ∧ rating > 5} (on the fly)
    ⋈_{Sid = Sid}
      R
      S
Plan 2 (selections before the join):
Π_{Sname} (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
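The two search methods can be contrasted in a few lines. This is a minimal sketch: the in-memory list `employees` stands in for a file ordered on the key `ssn`, and all names and values are hypothetical.

```python
def linear_search(records, key, value):
    """Test every record; works on unordered files without any index."""
    return [rec for rec in records if rec[key] == value]

def binary_search(sorted_records, key, value):
    """Equality lookup on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k = sorted_records[mid][key]
        if k == value:
            return sorted_records[mid]
        if k < value:
            lo = mid + 1     # continue in the upper half
        else:
            hi = mid - 1     # continue in the lower half
    return None              # no record has this key value

# Hypothetical file, ordered on ssn.
employees = [
    {"ssn": 111, "name": "A"},
    {"ssn": 222, "name": "B"},
    {"ssn": 333, "name": "C"},
]
```

Linear search inspects all n records, whereas binary search inspects about log2(n) of them, which is why the ordering of the file matters.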
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789} (EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5} (DEPARTMENT)
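A range selection on an ordered key can be sketched with Python's standard `bisect` module. The sorted key list `index` is a simplified stand-in for a primary index, and the DEPARTMENT tuples are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
index = [d[0] for d in departments]  # primary index simplified to a sorted key list

def select_gt(value):
    """DNUMBER > value: everything after the last record equal to value."""
    return departments[bisect_right(index, value):]

def select_ge(value):
    """DNUMBER >= value: start at the first record equal to value."""
    return departments[bisect_left(index, value):]

def select_lt(value):
    """DNUMBER < value: everything before the first record equal to value."""
    return departments[:bisect_left(index, value)]

def select_le(value):
    """DNUMBER <= value: up to and including the records equal to value."""
    return departments[:bisect_right(index, value)]

print(select_gt(5))  # [(7, 'Sales'), (9, 'Support')]
```

In each case the index locates the boundary position, and the qualifying records are then read sequentially from the sorted file.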
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5} (EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn} (r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn} (r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
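Disjunctive selection by union of record pointers can be sketched as follows. This is an illustration only: the record pointer is modeled as a list position, the indexes are plain dictionaries, and `employees`, `build_index`, and `disjunctive_select` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file; a record pointer is the position in this list.
employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 40000},
    {"ssn": 4, "dno": 1, "salary": 55000},
]

def build_index(records, attr):
    """Map each attribute value to the set of pointers of matching records."""
    index = defaultdict(set)
    for ptr, rec in enumerate(records):
        index[rec[attr]].add(ptr)
    return index

# Access paths exist only for dno and salary in this sketch.
indexes = {a: build_index(employees, a) for a in ("dno", "salary")}

def disjunctive_select(records, conditions):
    """conditions: list of (attribute, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path, so a linear scan is unavoidable.
        return [r for r in records if any(r[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of record pointers
    return [records[p] for p in sorted(pointers)]
```

For example, `disjunctive_select(employees, [("dno", 5), ("salary", 40000)])` unions the pointer sets {0, 2} and {1, 2} and fetches each qualifying record once.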
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
  For each ts ∈ s do begin
    If tr.A Θ ts.B is true then add tr || ts to the result
  end
end
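The nested loop pseudocode translates almost line for line into Python. In this sketch, tuples are Python tuples, `a` and `b` are attribute positions, and the comparison Θ is passed as a function; these representation choices are assumptions for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b], comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition tr.A theta ts.B
                result.append(tr + ts)   # tr || ts (concatenation)
    return result

r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]
print(nested_loop_join(r, s, 0, 0))               # [(2, 'y', 2, 'p')]
print(len(nested_loop_join(r, s, 0, 0, operator.lt)))  # 3
```

Because every pair is tested, the same function handles any comparison operator, not only equality.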
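Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples. In the worst case, when the buffer can hold only one block of each relation, this takes nr × bs + br block accesses.
In the best-case scenario, both relations fit into main memory, and hence we need only bs + br block accesses.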
Query Optimization: JOIN Operation (nested loop)
If only one of the relations fits in main memory, the cost is still bs + br block accesses, provided that relation is used as the inner relation.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
  For each block Bs of s do begin
    For each tr ∈ Br do begin
      For each ts ∈ Bs do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
      end
    end
  end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve block nested loop?
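The block access count br × bs + br can be checked with a small simulation. This sketch assumes an equijoin and models a block as a Python list of tuples; `to_blocks` and the counter `accesses` are hypothetical helpers, not part of the lecture.

```python
def to_blocks(tuples, per_block):
    """Split a relation into blocks holding `per_block` tuples each."""
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]

def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equijoin that reads s once per block of r and counts block accesses."""
    result, accesses = [], 0
    for r_block in r_blocks:
        accesses += 1                 # read one block of r
        for s_block in s_blocks:
            accesses += 1             # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, accesses

r_blocks = to_blocks([(i,) for i in range(6)], 2)      # br = 3
s_blocks = to_blocks([(i,) for i in range(2, 6)], 2)   # bs = 2
result, accesses = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
print(len(result), accesses)  # 4 9
```

With br = 3 and bs = 2 the simulation counts 3 × 2 + 3 = 9 block accesses, matching the formula.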
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
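A join that probes an access structure can be sketched as below. A dictionary stands in for the index on s.B; in a real system the index would already exist, whereas here it is built inside the function purely so that the example is self-contained.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equijoin r[a] = s[b] that probes an index on s[b] for each tuple of r."""
    index = defaultdict(list)          # stand-in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                       # one record of r at a time
        for ts in index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (2, "z")]
s = [(2, "p"), (3, "q"), (2, "w")]
print(index_join(r, s, 0, 0))
```

Each tuple of r touches only its matching tuples in s, instead of scanning all of s as the nested loop does.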
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
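The merge step can be sketched with two pointers. The sketch assumes both inputs are already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory, as stated above; tuples are Python tuples and `a`, `b` are attribute positions.

```python
def merge_join(r, s, a, b):
    """Equijoin of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                      # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                 # advance the pointer with the smaller value
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
```

Neither pointer ever moves backward past a group, which is why each tuple is read only once.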
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
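The hash-join description above can be sketched as follows. The number of buckets, the use of Python's built-in `hash`, and the in-memory bucket lists are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equijoin r[a] = s[b] using one shared set of hash buckets."""
    # Each bucket keeps the r-records and s-records that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                # Different values may share a bucket, so the test is still needed.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (10, "c")]
s = [(2, "x"), (10, "y"), (18, "z")]
```

Matching tuples always land in the same bucket, so tuples in different buckets never need to be compared.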
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
74
Optimization Process heuristic rules
Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier
than projections
Database Systems
75
Optimization Process heuristic rules
Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution
Database Systems
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5
Π_{Sname} (σ_{bid = 100 ∧ rating > 5} (R ⋈_{Sid=Sid} S))
[Query tree: Π_{Sname} at the root, above σ_{bid = 100 ∧ rating > 5}, above the join ⋈_{Sid=Sid} of R and S]
Π_{Sname} ((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))
[Query tree: Π_{Sname} at the root, above the join ⋈_{Sid=Sid}, whose inputs are σ_{bid = 100} applied to R and σ_{rating > 5} applied to S]
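As a rough check that the two expressions are equivalent, and that applying the selections first shrinks the join input, here is a small Python sketch. The sample rows and the helper join_on_sid are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data for R(Sid, bid, day, rname) and S(Sid, Sname, rating, age).
R = [
    {"Sid": 1, "bid": 100, "day": "d1", "rname": "x"},
    {"Sid": 2, "bid": 101, "day": "d2", "rname": "y"},
    {"Sid": 3, "bid": 100, "day": "d3", "rname": "z"},
]
S = [
    {"Sid": 1, "Sname": "Ann", "rating": 7, "age": 30.0},
    {"Sid": 2, "Sname": "Bob", "rating": 9, "age": 41.5},
    {"Sid": 3, "Sname": "Cy", "rating": 3, "age": 25.0},
]

def join_on_sid(left, right):
    """Nested-loop equi-join on Sid."""
    return [{**l, **r} for l in left for r in right if l["Sid"] == r["Sid"]]

# Plan 1: join first, then select, then project.
joined = join_on_sid(R, S)
plan1 = [t["Sname"] for t in joined if t["bid"] == 100 and t["rating"] > 5]

# Plan 2: select on each relation first, then join, then project.
r_sel = [t for t in R if t["bid"] == 100]
s_sel = [t for t in S if t["rating"] > 5]
joined2 = join_on_sid(r_sel, s_sel)
plan2 = [t["Sname"] for t in joined2]
```

On this data both plans return the same answer, but the second plan joins 2 × 2 tuples instead of 3 × 3 and produces a smaller intermediate result.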
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
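[Query tree, first plan: the join ⋈_{Sid=Sid} of R and S feeds σ_{bid = 100 ∧ rating > 5} (on the fly), which feeds Π_{Sname} (on the fly)]
[Query tree, second plan: σ_{bid = 100} on R and σ_{rating > 5} on S feed the join ⋈_{Sid=Sid}, which feeds Π_{Sname} (on the fly)]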
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
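The difference between the two search methods can be sketched as follows. This is an illustrative in-memory model, not a disk-based implementation: the employees file, its SSN key, and the probe counters are assumptions made for the example.

```python
def linear_search(records, key, target):
    """Examine every record; works even when the file is unordered."""
    matches = [rec for rec in records if rec[key] == target]
    return matches, len(records)  # every record is examined

def binary_search(sorted_records, key, target):
    """Equality lookup on a key attribute of a file ordered on that key."""
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        value = sorted_records[mid][key]
        if value == target:
            return sorted_records[mid], probes
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical file of 1024 records, ordered on the key SSN.
employees = [{"SSN": n, "NAME": f"emp{n}"} for n in range(1, 1025)]
found_linear, examined = linear_search(employees, "SSN", 777)
found_binary, probes = binary_search(employees, "SSN", 777)
```

With 1024 ordered records the binary search needs at most 11 probes, while the linear search examines all 1024 records.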
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
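A range selection over an ordered file might look like the sketch below. The DEPARTMENT rows are made up, and the sorted list dnumbers only stands in for a primary index; Python's standard bisect module is used to locate the boundary.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
department = [{"DNUMBER": n, "DNAME": f"dept{n}"} for n in (1, 3, 5, 7, 9)]
dnumbers = [d["DNUMBER"] for d in department]  # stands in for the primary index

def select_greater_than(value):
    """Evaluate sigma_{DNUMBER > value}: find the boundary, then read forward."""
    start = bisect_right(dnumbers, value)  # first position with DNUMBER > value
    return department[start:]

result = select_greater_than(5)
```

Because the file is sorted, every qualifying record lies after the boundary position, so no record before it has to be examined.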
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search methods for Selection
Conjunctive selection: A conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: A disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
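A minimal sketch of this strategy, assuming an in-memory EMPLOYEE file with an index on DNO. The SALARY attribute, the sample rows, and the function name conjunctive_select are hypothetical and serve only to provide a second condition.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file; SALARY is an illustrative attribute.
employee = [
    {"SSN": 1, "DNO": 5, "SALARY": 30000},
    {"SSN": 2, "DNO": 5, "SALARY": 45000},
    {"SSN": 3, "DNO": 4, "SALARY": 50000},
]

# Access path: an index on DNO mapping each value to record positions.
dno_index = defaultdict(list)
for pos, rec in enumerate(employee):
    dno_index[rec["DNO"]].append(pos)

def conjunctive_select(dno, min_salary):
    """Evaluate sigma_{DNO = dno AND SALARY > min_salary}(EMPLOYEE)."""
    # Step 1: use the indexed condition to retrieve candidate records.
    candidates = (employee[pos] for pos in dno_index.get(dno, []))
    # Step 2: apply the remaining condition to the candidates only.
    return [rec for rec in candidates if rec["SALARY"] > min_salary]

selected = conjunctive_select(5, 40000)
```

Only the records reached through the index are tested against the second condition; the rest of the file is never touched.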
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: If even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
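The union-of-pointers idea, including the fallback to a linear scan, can be sketched as follows. Record positions play the role of record pointers; the helper names, the SALARY attribute, and the sample rows are assumptions for this example, and only equality conditions are handled.

```python
def build_index(records, attr):
    """Map each attribute value to the set of record positions (pointers)."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(pos)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attribute, value) equality tests joined by OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: a linear scan is unavoidable.
        return [rec for rec in records if any(rec[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())  # union of record pointers
    return [records[pos] for pos in sorted(pointers)]

# Hypothetical EMPLOYEE file.
employee = [
    {"SSN": 1, "DNO": 5, "SALARY": 30000},
    {"SSN": 2, "DNO": 5, "SALARY": 45000},
    {"SSN": 3, "DNO": 4, "SALARY": 50000},
]
conditions = [("DNO", 4), ("SALARY", 30000)]
both = {"DNO": build_index(employee, "DNO"), "SALARY": build_index(employee, "SALARY")}
via_indexes = disjunctive_select(employee, both, conditions)
via_scan = disjunctive_select(employee, {"DNO": both["DNO"]}, conditions)
```

Taking the union of pointer sets also removes duplicates, so a record that satisfies several conditions is fetched only once.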
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
    for each tr ∈ r do begin
        for each ts ∈ s do begin
            if tr[A] Θ ts[B] is true then add tr || ts to the result
        end
    end
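The pseudocode above translates almost line for line into Python. In this sketch tuples are plain Python tuples, A and B are attribute positions, Θ is passed as a comparison function, and the comparison counter is an addition for illustration; the sample data is hypothetical.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta-join of r and s; a and b are the positions of the join attributes."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts: concatenate the two tuples
    return result, comparisons

# Hypothetical sample relations.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
equi, n_compared = nested_loop_join(r, s, 0, 0)
less_than, _ = nested_loop_join(r, s, 0, 0, theta=operator.lt)
```

The counter always ends at n_r × n_s, whatever the join condition, which is why this method works for any Θ but is the most expensive one.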
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s (pairs of tuples examined).
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, it can be kept in memory as the inner relation; each block is then read only once, and the cost is again b_s + b_r block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
    for each block Br of r do begin
        for each block Bs of s do begin
            for each tr ∈ Br do begin
                for each ts ∈ Bs do begin
                    if tr[A] Θ ts[B] is true then add tr || ts to the result
                end
            end
        end
    end
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve the block nested loop?
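To see where the b_r × b_s + b_r figure comes from, the sketch below simulates the block nested loop in memory and counts simulated block reads. The block size, the sample relations, and the function names are assumptions for illustration; a real system would read blocks from disk through a buffer manager.

```python
def blocks(relation, block_size):
    """Split a relation into fixed-size blocks, the unit of disk transfer."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join on positions a and b, counting simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1              # each block of r is read once
        for block_s in s_blocks:
            block_reads += 1          # every block of s is read per block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

# Hypothetical relations: 6 tuples in r and 4 in s, 2 tuples per block.
r = [(i,) for i in range(6)]
s = [(i, i * 10) for i in range(2, 6)]
joined, reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

Here b_r = 3 and b_s = 2, so the counter ends at 3 × 2 + 3 = 9 block reads.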
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
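A sketch of this method, with a Python dictionary standing in for the index or hash key on s.B. In the slide's setting the access structure already exists; here it is built inside the function only so that the example is self-contained. The names and sample data are hypothetical.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r and s on positions a and b, probing an index on s[b]."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index_on_b = defaultdict(list)
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                                   # one record of r at a time
        for ts in index_on_b.get(tr[a], []):       # probe instead of scanning s
            result.append(tr + ts)
    return result

# Hypothetical sample relations.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = index_join(r, s, 0, 0)
```

Each record of r triggers one probe rather than a full pass over s, which is the saving over the plain nested loop.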
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
b_s + b_r
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
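The merge step can be sketched in Python as below. Both inputs are assumed to be already sorted on the join attribute, and the group of s tuples sharing one join value is assumed to fit in memory, as stated above. The function name and sample data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple carrying the value with the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

# Hypothetical relations, both sorted on position 0.
r = [(1, "a"), (3, "b"), (3, "c"), (5, "d")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards, which is why each tuple, and therefore each block, is read only once.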
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
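A simplified in-memory sketch of the hash join described above. The number of buckets, the use of Python's built-in hash, and the sample data are assumptions for illustration; a disk-based hash join would partition the files into bucket files first.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r and s on positions a and b using a shared set of buckets."""
    # Each bucket keeps the r records and the s records that hash to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Sharing a bucket does not guarantee equal values, so recheck.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

# Hypothetical sample relations; 10 and 2 collide when n_buckets is 8.
r = [(1, "a"), (2, "b"), (10, "c")]
s = [(2, "x"), (10, "y"), (3, "z")]
joined = hash_join(r, s, 0, 0)
```

Matching records can only meet inside the same bucket, so records in different buckets are never compared.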
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
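Both ways of eliminating duplicates after a projection can be sketched as follows. The function names and the EMPLOYEE rows are hypothetical; the hashing variant uses a Python set as the hash table.

```python
def project_distinct_hash(rows, attrs):
    """Project onto attrs and drop duplicates with a hash table."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:          # first time this projected tuple appears
            seen.add(t)
            out.append(t)
    return out

def project_distinct_sort(rows, attrs):
    """Project onto attrs, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    out = []
    for t in projected:
        if not out or out[-1] != t:  # duplicates are adjacent after sorting
            out.append(t)
    return out

# Hypothetical EMPLOYEE rows; Dno is not a key, so duplicates arise.
employee = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 4},
]
by_hash = project_distinct_hash(employee, ["Dno"])
by_sort = project_distinct_sort(employee, ["Dno"])
with_key = project_distinct_hash(employee, ["Ssn", "Dno"])
```

The two variants return the same set of tuples, in arrival order for hashing and in sorted order for sorting; when the attribute list includes the key Ssn, no tuple is dropped.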
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
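The sort-then-scan approach can be illustrated with one merge-style pass that produces union, intersection, and difference together. This is a sketch under the assumption that both inputs are duplicate-free relations with the same schema; the function name and sample tuples are made up.

```python
def sort_based_set_ops(r, s):
    """Return (r UNION s, r INTERSECT s, r MINUS s) using one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:              # tuple only in s
            union.append(s[j])
            j += 1
        else:                          # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever remains belongs to exactly one relation.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference

# Hypothetical single-attribute relations.
r = [(3,), (1,), (2,)]
s = [(2,), (4,), (3,)]
u, n, d = sort_based_set_ops(r, s)
```

After sorting, each relation is scanned exactly once, as the slide states.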
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN} (Project × (Department × Employee)))
Query Optimization: An Example
[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}, above the Cartesian product Project × (Department × Employee)]
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
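The arithmetic behind these figures can be checked with a few lines of Python. The dictionary and variable names are illustrative; the numbers are the ones given above.

```python
# Tuple sizes (bytes) and tuple counts as stated in the example.
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = 1
for n in tuple_count.values():
    product_tuples *= n
product_tuple_bytes = sum(tuple_bytes.values())
product_total_bytes = product_tuples * product_tuple_bytes
```

This gives 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, that is 3 × 10^9 bytes for the intermediate relation, even though the three base relations together hold only 761,000 bytes.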
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation='California'} (Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)
Query Optimization: An Example
[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above the join ⋈_{MGRSSN=SSN}, whose inputs are Employee and the join ⋈_{Dnum=Dnumber} of σ_{Plocation='California'}(Project) with Department]
DefinitionMaterialized evaluation Generation of
intermediate result (relation)Pipeline evaluation Combining several
operations
76
Database Systems
Assume we want to perform
77
Πa1 a2 (r s)
We can perform the join operation materialize the resultant and then apply projection
Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing
Database Systems
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best-case scenario, both relations fit into physical memory, and hence we need only bs + br block accesses.
If only one of the relations fits entirely in memory, the cost is still bs + br block accesses, provided that relation is used as the inner relation, so that each relation is read only once.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
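The block nested loop can be sketched in Python with a counter for block reads. This is an illustration under assumptions: blocks are simulated by slicing an in-memory list, block_size (tuples per block) is a made-up parameter, and only the equality condition is shown.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join that pairs blocks first, then tuples within the blocks."""
    result, block_reads = [], 0
    s_blocks = blocks(s, block_size)
    for Br in blocks(r, block_size):
        block_reads += 1                 # read one block of r
        for Bs in s_blocks:
            block_reads += 1             # read one block of s again
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

r = [(i,) for i in range(6)]             # 6 tuples -> br = 3 blocks
s = [(i, "s") for i in range(2, 6)]      # 4 tuples -> bs = 2 blocks
joined, reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

With br = 3 and bs = 2 the counter ends at 3 × 2 + 3 = 9 block reads.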
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br: every block of s is read once for each block of r, and r itself is read once.
How can we improve the block nested loop?
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
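A small sketch of this technique, assuming the access structure on s.B is an in-memory hash index (a dictionary from attribute value to matching tuples). The names build_hash_index and index_join are hypothetical.

```python
from collections import defaultdict

def build_hash_index(s, b):
    """Hypothetical access structure on attribute position b of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = index_join(r, build_hash_index(s, 0), 0)
```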
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
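The merge step can be sketched as below, assuming both relations are lists of tuples already sorted on their join attribute positions a and b. The function name merge_join is illustrative, and the group of s tuples sharing a join value is assumed to fit in memory, as the slide requires.

```python
def merge_join(r, s, a, b):
    """Equi-join of two relations that are already sorted on the join attributes."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            j_end = j                          # find the group of s tuples with this value
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end                          # both pointers only ever move forward
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```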
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result tuples of the join operation.
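A minimal in-memory sketch of this idea, assuming Python's built-in hash as the hashing function and a fixed, made-up number of buckets (n_buckets). Each bucket keeps the r records and the s records that hashed to it.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join by hashing both relations into the same set of buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # matches can only share a bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # still compare: hashes may collide
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

The output order depends on bucket order, so the example sorts the result before using it.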
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
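Both cases can be sketched as follows, assuming each condition θi is a Python function of a pair (tr, ts) and that relations are lists of tuples. The helper names are hypothetical, and a plain nested loop stands in for whichever join technique would actually be chosen for the simple joins.

```python
def simple_join(r, s, theta):
    """Join on a single condition (nested loop used here for brevity)."""
    return [(tr, ts) for tr in r for ts in s if theta(tr, ts)]

def conjunctive_join(r, s, thetas):
    """Compute the join on theta_1, then filter by the remaining conditions."""
    intermediate = simple_join(r, s, thetas[0])
    return [tr + ts for tr, ts in intermediate
            if all(theta(tr, ts) for theta in thetas[1:])]

def disjunctive_join(r, s, thetas):
    """Union of the individual joins; a set removes tuples found more than once."""
    result = set()
    for theta in thetas:
        result |= {tr + ts for tr, ts in simple_join(r, s, theta)}
    return result

r = [(1, 10), (2, 20)]
s = [(1, 20), (2, 20), (3, 10)]
same_first = lambda tr, ts: tr[0] == ts[0]
same_second = lambda tr, ts: tr[1] == ts[1]
both = conjunctive_join(r, s, [same_first, same_second])
either = disjunctive_join(r, s, [same_first, same_second])
```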
Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
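The two duplicate-elimination strategies might look like this in Python, assuming a relation is a list of tuples and the attribute list is given as tuple positions. Both function names are illustrative.

```python
def project_sort(relation, positions):
    """Project, sort, then drop adjacent duplicates in one scan."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result

def project_hash(relation, positions):
    """Project and keep a row only the first time its hash entry is seen."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[p] for p in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

employee = [("Ann", 5), ("Bob", 4), ("Cy", 5)]
by_sort = project_sort(employee, [1])    # department numbers only
by_hash = project_hash(employee, [1])
```

The sort-based version returns the rows in sorted order, while the hash-based version keeps the order of first appearance.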
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
A hashing technique is another way to implement the union, intersection, and difference operations.
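The sort-then-scan approach can be sketched as one merge pass that produces all three results at once. This assumes duplicate-free, union-compatible relations held as lists of tuples; merge_set_ops is a hypothetical name.

```python
def merge_set_ops(r, s):
    """Return (union, intersection, difference r - s) using sort + one scan."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:            # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:          # tuple only in s
            union.append(s[j])
            j += 1
        else:                      # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])            # leftovers of r belong to union and difference
    diff.extend(r[i:])
    union.extend(s[j:])            # leftovers of s belong to the union only
    return union, inter, diff

union, inter, diff = merge_set_ops([(3,), (1,), (2,)], [(2,), (4,), (3,)])
```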
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN}
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
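The arithmetic behind these figures can be checked with a few lines of Python. The dictionary names are only for illustration; the numbers are the ones given above.

```python
import math

tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add up.
product_tuples = math.prod(tuple_count.values())     # 100 * 20 * 5000
product_tuple_bytes = sum(tuple_bytes.values())      # 100 + 50 + 150
product_total_bytes = product_tuples * product_tuple_bytes
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes each, about 3 × 10^9 bytes in total, before a single selection condition has been applied.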
Query Optimization: An Example
However, the above query, based on the schemas of the relations, can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation="California"}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)
Query Optimization: An Example
Query tree (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN=SSN}
    ⋈_{Dnum=Dnumber}
      σ_{Plocation="California"}
        Project
      Department
    Employee
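To see that the two translations return the same answer while handling very different numbers of intermediate tuples, here is a toy comparison. The sample rows are invented for illustration, relations are lists of dictionaries, and the counters simply report how many intermediate tuples each plan builds.

```python
from itertools import product

# Invented sample data (assumption); attribute names follow the example query.
project = [
    {"Pname": "P1", "Pnumber": 1, "Plocation": "California", "Dnum": 10},
    {"Pname": "P2", "Pnumber": 2, "Plocation": "Texas", "Dnum": 20},
]
department = [
    {"Dname": "Research", "Dnumber": 10, "MGRSSN": 111},
    {"Dname": "Sales", "Dnumber": 20, "MGRSSN": 222},
]
employee = [
    {"Lname": "Smith", "SSN": 111, "Bdate": "1960-01-01", "Address": "Rolla"},
    {"Lname": "Wong", "SSN": 222, "Bdate": "1970-02-02", "Address": "Houston"},
    {"Lname": "Diaz", "SSN": 333, "Bdate": "1980-03-03", "Address": "Austin"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")

def cartesian_plan():
    """Cartesian products first, then one big selection, then projection."""
    rows = [{**p, **d, **e} for p, d, e in product(project, department, employee)]
    selected = [t for t in rows if t["Plocation"] == "California"
                and t["Dnum"] == t["Dnumber"] and t["MGRSSN"] == t["SSN"]]
    return [tuple(t[c] for c in OUT) for t in selected], len(rows)

def select_then_join_plan():
    """Selection on Project first, then the two joins, then projection."""
    p_sel = [p for p in project if p["Plocation"] == "California"]
    pd = [{**p, **d} for p in p_sel for d in department
          if p["Dnum"] == d["Dnumber"]]
    pde = [{**t, **e} for t in pd for e in employee if t["MGRSSN"] == e["SSN"]]
    return [tuple(t[c] for c in OUT) for t in pde], len(p_sel) + len(pd) + len(pde)

answer_1, built_1 = cartesian_plan()
answer_2, built_2 = select_then_join_plan()
```

On this data the first plan builds 12 intermediate tuples and the second builds 3, for the same one-row answer.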
Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
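The difference between the two approaches can be sketched with Python generators. Assumptions: relations are lists of tuples, the join is an equi-join on the first attribute, and the function names are hypothetical.

```python
def join(r, s):
    """Generator: yields joined tuples one at a time (join on the first attribute)."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts[1:]

def project(rows, positions):
    """Generator: consumes one tuple at a time and emits the projected tuple."""
    for row in rows:
        yield tuple(row[p] for p in positions)

def materialized(r, s, positions):
    temp = list(join(r, s))                    # whole join result is stored first
    return list(project(temp, positions))

def pipelined(r, s, positions):
    return list(project(join(r, s), positions))   # no intermediate relation is kept

r = [(1, "a"), (2, "b")]
s = [(1, "x"), (2, "y"), (2, "z")]
stored = materialized(r, s, (1, 2))
streamed = pipelined(r, s, (1, 2))
```

Both calls give the same answer; the pipelined version just never holds the full join result in memory.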
Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5
Π_{Sname}(σ_{bid=100 ∧ rating>5}(R ⋈_{Sid=Sid} S))
Query tree (root first, children indented):
Π_{Sname}
  σ_{bid=100 ∧ rating>5}
    ⋈_{Sid=Sid}
      R
      S
Π_{Sname}((σ_{bid=100} R) ⋈_{Sid=Sid} (σ_{rating>5} S))
Query tree (root first, children indented):
Π_{Sname}
  ⋈_{Sid=Sid}
    σ_{bid=100}
      R
    σ_{rating>5}
      S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
Plan 1 (selection applied after the join):
Π_{Sname} (on the fly)
  σ_{bid=100 ∧ rating>5} (on the fly)
    ⋈_{Sid=Sid}
      R
      S
Plan 2 (selections pushed below the join):
Π_{Sname} (on the fly)
  ⋈_{Sid=Sid}
    σ_{bid=100}
      R
    σ_{rating>5}
      S
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
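Both methods, as a minimal sketch over an in-memory list of tuples. The sample file and function names are made up; key is the position of the attribute being compared.

```python
def linear_search(records, key, value):
    """Test every record; works on unordered data, costs a full scan."""
    return [rec for rec in records if rec[key] == value]

def binary_search(records, key, value):
    """Equality search on a key attribute the file is ordered on."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if records[mid][key] == value:
            return records[mid]
        if records[mid][key] < value:
            lo = mid + 1          # continue in the upper half
        else:
            hi = mid - 1          # continue in the lower half
    return None                   # no record has this key value

# File ordered on the first attribute (assumed to be a key).
emp_file = [(101, "Ann"), (205, "Bob"), (309, "Cy"), (412, "Di")]
found = binary_search(emp_file, 0, 309)
by_name = linear_search(emp_file, 1, "Bob")   # name is not the ordering attribute
```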
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)
σ_{SSN=123456789}(EMPLOYEE)
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case, the data is also sorted.)
σ_{DNUMBER>5}(DEPARTMENT)
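A sketch of the range case for the condition DNUMBER > 5. The department rows are invented, the file is assumed to be ordered on DNUMBER, and a sorted list of key values stands in for the primary index.

```python
from bisect import bisect_right

# File ordered on DNUMBER (invented rows, for illustration only).
department_file = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Legal")]
dnumber_index = [row[0] for row in department_file]   # stand-in for the primary index

def select_greater_than(data_file, index, bound):
    """Locate the boundary through the index, then read all subsequent records."""
    start = bisect_right(index, bound)   # first position whose key is > bound
    return data_file[start:]

result = select_greater_than(department_file, dnumber_index, 5)
```

Only the records after the boundary are read; nothing before DNUMBER = 5 is examined beyond the index probe.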
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO=5}(EMPLOYEE)
Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)
Further assume the following querySELECT SSname
FROM R SWHERE RSid = SSid
AND Rbid = 100 AND Srating gt 5
Database Systems
ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
Database Systems
ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
Database Systems
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization: An Example
However, the above query, based on the schemas of the relations, can be translated into:
$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}\big(((\sigma_{\text{Plocation = 'California'}}(\text{Project})) \bowtie_{\text{Dnum = Dnumber}} \text{Department}) \bowtie_{\text{Mgr\_ssn = Ssn}} \text{Employee}\big)$
Query Optimization: An Example
Query tree for the join-based translation (root at the top):
Π Pnumber, Dnum, Lname, Address, Bdate
  ⋈ Mgr_ssn = Ssn
    ⋈ Dnum = Dnumber
      σ Plocation = 'California'
        Project
      Department
    Employee
$\Pi_{\text{Sname}}\big(\sigma_{\text{bid} = 100 \,\wedge\, \text{rating} > 5}(R \bowtie_{\text{Sid} = \text{Sid}} S)\big)$
Query tree (root at the top):
Π Sname
  σ bid = 100 and rating > 5
    ⋈ Sid = Sid
      R
      S
$\Pi_{\text{Sname}}\big((\sigma_{\text{bid} = 100}(R)) \bowtie_{\text{Sid} = \text{Sid}} (\sigma_{\text{rating} > 5}(S))\big)$
Query tree (root at the top):
Π Sname
  ⋈ Sid = Sid
    σ bid = 100
      R
    σ rating > 5
      S
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.
Plan 1 (selection and projection applied on the fly):
Π Sname [on the fly]
  σ bid = 100 and rating > 5 [on the fly]
    ⋈ Sid = Sid
      R
      S
Plan 2 (selections pushed below the join):
Π Sname [on the fly]
  ⋈ Sid = Sid
    σ bid = 100
      R
    σ rating > 5
      S
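Pipelined ("on the fly") execution can be imitated with Python generators, where each operator pulls tuples from its child one at a time instead of writing a temporary relation. The sketch below is illustrative only: the operator functions, the lowercase attribute names, and the sample rows for R and S are assumptions, not part of the slides.

```python
def scan(relation):
    """Leaf operator: produce the stored tuples one by one."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Selection applied on the fly to whatever the child produces."""
    for row in rows:
        if predicate(row):
            yield row


def nested_loop_join(outer, inner, key):
    """Equi-join on a shared attribute; the inner input is buffered for rescans."""
    inner = list(inner)
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                yield {**o, **i}


def project(rows, attrs):
    """Projection applied on the fly."""
    for row in rows:
        yield tuple(row[a] for a in attrs)


# Hypothetical sample data.
R = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 101}, {"sid": 3, "bid": 100}]
S = [{"sid": 1, "sname": "Ann", "rating": 7},
     {"sid": 2, "sname": "Bob", "rating": 9},
     {"sid": 3, "sname": "Cy", "rating": 4}]

# Plan 1: join first, then select and project on the fly.
plan1 = project(
    select(nested_loop_join(scan(R), scan(S), "sid"),
           lambda t: t["bid"] == 100 and t["rating"] > 5),
    ["sname"])

# Plan 2: selections pushed below the join.
plan2 = project(
    nested_loop_join(select(scan(R), lambda t: t["bid"] == 100),
                     select(scan(S), lambda t: t["rating"] > 5),
                     "sid"),
    ["sname"])

result1 = list(plan1)
result2 = list(plan2)
```

Both plans return the same answer; they differ only in how many tuples reach the join.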
Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.
Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.
Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
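The two basic search methods can be contrasted in a short sketch. It assumes a file is a Python list of dictionaries and, for binary search, that the list is ordered on the key attribute; the function names are illustrative.

```python
def linear_search(records, attr, value):
    """Test every record; works on unordered data and on non-key attributes."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute of a file ordered on that attribute."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1      # continue in the upper half
        else:
            hi = mid - 1      # continue in the lower half
    return None               # no record has this key value
```

Linear search inspects all n records, while binary search inspects roughly log2(n) of them, which is why the ordering of the file matters to the optimizer.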
Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record. (Note: in this case at most one record is retrieved.)
$\sigma_{\text{SSN} = 123456789}(\text{EMPLOYEE})$
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). (Note: in this case the data is also sorted. A hash key does not help here, because hashing does not preserve order.)
$\sigma_{\text{DNUMBER} > 5}(\text{DEPARTMENT})$
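A range selection over an ordered file can be sketched with the standard bisect module, which plays the role of the primary index lookup. The layout below (a sorted list of key values parallel to the list of records) and the name range_select are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def range_select(sorted_keys, records, op, value):
    """Return the records whose key satisfies `key op value`.

    records[i] is the record with key sorted_keys[i]; both are in key order.
    """
    if op == ">":
        return records[bisect_right(sorted_keys, value):]   # strictly after value
    if op == ">=":
        return records[bisect_left(sorted_keys, value):]    # from value onward
    if op == "<":
        return records[:bisect_left(sorted_keys, value)]    # strictly before value
    if op == "<=":
        return records[:bisect_right(sorted_keys, value)]   # up to and including value
    raise ValueError(f"unsupported operator: {op}")
```

For example, the condition DNUMBER > 5 locates the position of 5 once and then returns the contiguous tail of the file, without testing the remaining records individually.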
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
$\sigma_{\text{DNO} = 5}(\text{EMPLOYEE})$
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then check each retrieved record against the rest of the conditions.
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
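The union-of-pointers technique, together with its fallback to a linear scan, can be sketched as follows. The sketch assumes equality conditions only, uses list positions as record pointers, and models an index as a dictionary from attribute value to positions; build_index and disjunctive_select are hypothetical names.

```python
def build_index(relation, attr):
    """Map each value of attr to the positions (pointers) of matching records."""
    index = {}
    for pos, rec in enumerate(relation):
        index.setdefault(rec[attr], []).append(pos)
    return index


def disjunctive_select(relation, indexes, conditions):
    """Evaluate attr1 = v1 OR attr2 = v2 OR ... over the relation.

    conditions is a list of (attr, value) pairs; indexes maps attr -> index.
    """
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: a linear scan is unavoidable.
        return [rec for rec in relation
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= set(indexes[attr].get(value, []))   # union of pointers
    return [relation[p] for p in sorted(pointers)]
```

Taking the union over pointers rather than over records means a tuple that satisfies several conditions is fetched only once.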
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
$R \bowtie_{A = B} S$
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
$r \bowtie_{r.A \,\Theta\, s.B} s$
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume that $b_r$ and $b_s$ are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tuple t_r ∈ r do begin
    for each tuple t_s ∈ s do begin
        if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
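A direct Python rendering of the nested loop algorithm might look like the sketch below. It assumes tuples are Python tuples, that A and B are given as positions within them, and that Θ is passed as a comparison function; these representation choices are illustrative.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta-join of r and s on r[a] theta s[b], testing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition
                result.append(tr + ts)   # concatenation t_r || t_s
    return result
```

Because every pair is tested, the same function handles equality as well as conditions such as operator.lt, which is why nested loop works regardless of the join condition.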
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines $n_r \cdot n_s$ pairs of tuples.
In the best-case scenario, both relations fit into main memory, and hence we need only $b_s + b_r$ block accesses.
Query Optimization: JOIN Operation (nested loop)
If even one of the relations fits in main memory, the cost is still $b_s + b_r$ block accesses, provided that relation is used as the inner relation.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block B_r of r do begin
    for each block B_s of s do begin
        for each tuple t_r ∈ B_r do begin
            for each tuple t_s ∈ B_s do begin
                if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is $b_r \cdot b_s + b_r$: every block of s is read once for each block of r, and each block of r is read once.
How can we improve block nested loop?
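The block-access formulas can be compared numerically. In the sketch below, the block nested loop and in-memory costs come from the slides, while the tuple-at-a-time worst case n_r · b_s + b_r is a standard textbook figure assumed here for comparison; the relation sizes are hypothetical.

```python
def nested_loop_worst_cost(n_r, b_r, b_s):
    """Assumed worst case: s is rescanned once per tuple of r."""
    return n_r * b_s + b_r


def block_nested_loop_cost(b_r, b_s):
    """From the slides: s is rescanned once per block of r."""
    return b_r * b_s + b_r


def in_memory_cost(b_r, b_s):
    """From the slides: one relation fits in memory, so each block is read once."""
    return b_r + b_s


# Hypothetical sizes: r has 5,000 tuples in 100 blocks, s occupies 400 blocks.
n_r, b_r, b_s = 5000, 100, 400
costs = {
    "nested loop (worst case)": nested_loop_worst_cost(n_r, b_r, b_s),
    "block nested loop": block_nested_loop_cost(b_r, b_s),
    "one relation in memory": in_memory_cost(b_r, b_s),
}
```

With these numbers the three strategies need 2,000,100, 40,100, and 500 block accesses respectively. Since the formula is not symmetric in r and s, choosing the relation with fewer blocks as the outer one lowers the block nested loop cost.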
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
$r \bowtie_{A = B} s$
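Joining through an access structure can be sketched with an in-memory dictionary standing in for the index on attribute B of s. The names build_hash_index and index_nested_loop_join, and the use of tuple positions for A and B, are assumptions made for illustration.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on s: join-attribute value -> list of matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_nested_loop_join(r, a, index_on_s):
    """For each tuple of r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in index_on_s.get(tr[a], []):   # only the matching tuples of s
            result.append(tr + ts)
    return result
```

Each tuple of r costs one index probe rather than a full pass over s, which is where the saving over the plain nested loop comes from.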
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
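The merge step can be sketched as follows, assuming both inputs are lists of tuples already sorted on the join attribute positions a and b and that the join is an equi-join; the function name merge_join is illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                  # r is behind: advance its pointer
        elif r[i][a] > s[j][b]:
            j += 1                  # s is behind: advance its pointer
        else:
            value = r[i][a]
            # Collect the group of s tuples sharing this value
            # (assumed to fit in main memory, as the slide requires).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```

Neither pointer ever moves backward, so each relation is read once, matching the $b_s + b_r$ block accesses quoted above.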
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
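The hash-join description can be sketched in memory as follows. The bucket count, the use of Python's built-in hash, and the function name hash_join are assumptions; a real system would hash to disk-resident buckets.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join of r and s on r[a] = s[b] using one shared set of buckets."""
    # Each bucket keeps the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # different values may share a bucket
                    result.append(tr + ts)
    return result
```

Matching tuples always land in the same bucket because both files use the same hash function, so tuples in different buckets never need to be compared. The order of the output depends on the hash values.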
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$
The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
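A disjunctive join computed as a union of individual joins can be sketched as below. Each individual join is evaluated here with a plain nested loop for brevity, and pairs of tuple positions are remembered so that a pair satisfying several conditions appears only once; the function name and the predicate representation are assumptions.

```python
def disjunctive_join(r, s, conditions):
    """Join r and s on theta_1 OR theta_2 OR ... as a union of individual joins.

    conditions is a list of predicates theta_i(tr, ts) returning True or False.
    """
    matched = set()     # (i, j) positions of pairs already in the result
    result = []
    for theta in conditions:                   # one individual join per condition
        for i, tr in enumerate(r):
            for j, ts in enumerate(s):
                if (i, j) not in matched and theta(tr, ts):
                    matched.add((i, j))        # union: keep each pair once
                    result.append(tr + ts)
    return result
```

In practice each individual join could use whichever technique suits its condition (index, sort-merge, or hash), with only the final union step shared.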
Query Optimization: Project Operation
A project operation $\Pi_{\langle\text{attribute-list}\rangle}(R)$ is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
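Both duplicate-elimination strategies for projection can be sketched briefly. The sketch assumes tuples are Python tuples and that the attribute list is given as positions; the function names are illustrative.

```python
def project_sort(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result


def project_hash(relation, positions):
    """Project and use a hash set to detect rows that were already produced."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[p] for p in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result
```

The sort-based version returns rows in sorted order, while the hash-based version keeps the order of first appearance and needs no sorting pass.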
Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous
query is going to be executed
Database Systems
σbid = 100 and rating gt 5
Sid = Sid
R S
ΠSname
On the fly
On the fly
σrating gt 5
Sid = Sid
R S
ΠSname
σbid = 100
On the fly
Database Systems
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization - JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.
Query Optimization - JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
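The algorithm above can be sketched in Python as follows, assuming tuples are Python tuples, attributes are identified by position, and Θ is passed in as a comparison function. The sample relations are made up for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Compare every tr in r with every ts in s; keep tr || ts when theta holds."""
    result = []
    for tr in r:                      # outer loop
        for ts in s:                  # inner loop
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tuple concatenation, tr || ts
    return result


r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

equi_join = nested_loop_join(r, s, 0, 0)                # r.A = s.B
theta_join = nested_loop_join(r, s, 0, 0, operator.lt)  # r.A < s.B
print(equi_join)
print(len(theta_join))
```

Because the condition is an arbitrary function, the same loop handles equality and inequality joins alike.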
Query Optimization - JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns tuple comparisons.
In the best case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.
Query Optimization - JOIN Operation (nested loop)
If one of the relations fits entirely in the physical space, then the cost will be bs + br block accesses.
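The cost figures can be put side by side with a small calculation. The relation sizes below are hypothetical, and the worst-case formula nr × bs + br (the inner relation s is re-read once for every tuple of r) is the usual textbook figure rather than something stated on the slides.

```python
def tuple_comparisons(nr, ns):
    """Number of tuple pairs examined by the nested loop."""
    return nr * ns


def block_accesses_one_fits(br, bs):
    """Block accesses when at least one relation fits in memory: each block is read once."""
    return br + bs


def block_accesses_worst_case(nr, br, bs):
    """Textbook worst case (an assumption here): s is scanned once per tuple of r."""
    return nr * bs + br


# Hypothetical sizes: r has 5000 tuples in 250 blocks, s has 100 tuples in 10 blocks.
nr, ns, br, bs = 5000, 100, 250, 10

print(tuple_comparisons(nr, ns))
print(block_accesses_one_fits(br, bs))
print(block_accesses_worst_case(nr, br, bs))
```

With these numbers the gap between 260 and 50,250 block accesses shows why the amount of available buffer space matters so much for this algorithm.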
Query Optimization - JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization - JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization - JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve block nested loop?
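To check the br × bs + br figure, the sketch below simulates the block nested loop on relations split into fixed-size blocks and counts one access every time a block is brought in. The block sizes, sample data, and helper names are assumptions for illustration, and only equality conditions are handled.

```python
def to_blocks(tuples, tuples_per_block):
    """Split a relation into blocks of at most tuples_per_block tuples."""
    return [tuples[i:i + tuples_per_block]
            for i in range(0, len(tuples), tuples_per_block)]


def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join block by block; also return the number of block accesses."""
    result = []
    block_accesses = 0
    for block_r in r_blocks:
        block_accesses += 1            # read one block of r
        for block_s in s_blocks:
            block_accesses += 1        # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_accesses


r_blocks = to_blocks([(i,) for i in range(6)], 2)         # br = 3
s_blocks = to_blocks([(i, i * i) for i in range(8)], 4)   # bs = 2

joined, accesses = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
br, bs = len(r_blocks), len(s_blocks)
print(accesses, br * bs + br)
```

The counter matches br × bs + br because each block of r is read once and the whole of s is read once per block of r, not once per tuple of r.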
Query Optimization - JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A=B} s
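A small sketch of this technique, assuming a Python dictionary plays the role of the index on s.B. In a real system the index would already exist on disk; here it is built up front only so the example is self-contained, and the data is made up.

```python
from collections import defaultdict


def build_index(s, b):
    """Stand-in for an existing index on attribute position b of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, index_on_b, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in index_on_b.get(tr[a], []):
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

joined = index_join(r, build_index(s, 0), 0)
print(joined)
```

Each tuple of r costs one index probe plus the retrieval of its matches, rather than a full pass over s.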
Query Optimization - JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization - JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
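The merge step can be sketched as follows, assuming both inputs are Python lists already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory (the same assumption made above). The sample data is illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations that are already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                          # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
s = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]

joined = merge_join(r, s, 0, 0)
print(joined)
```

Neither pointer ever moves backwards past a group it has finished, which is why a single linear scan of each relation suffices.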
Query Optimization - JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
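A minimal in-memory sketch of this description, assuming a list of buckets stands in for the hash file and Python's built-in hash is the shared hashing function. The bucket count and data are arbitrary choices for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into the same buckets, then match within each bucket."""
    # Each bucket keeps the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so the equality
                # test on the join attributes is still required.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "f")]
s = [(2, "x"), (6, "y"), (10, "z")]

joined = sorted(hash_join(r, s, 0, 0))
print(joined)
```

Tuples with different join values can never match if they land in different buckets, so only tuples within the same bucket need to be compared.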
Query Optimization - Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization - Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
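As a sketch of this two-step evaluation, the code below computes the simpler join on θ1 first and then filters the intermediate result with the remaining conditions. A plain nested loop is used for the first join only to keep the example short (any applicable join technique could be substituted); the conditions and data are illustrative.

```python
def conjunctive_join(r, s, first_condition, remaining_conditions):
    """Join on one condition, then apply the other conditions to the result."""
    # Step 1: the simpler join on the first condition.
    intermediate = [tr + ts for tr in r for ts in s if first_condition(tr, ts)]
    # Step 2: keep only the tuples that satisfy every remaining condition.
    return [t for t in intermediate
            if all(cond(t) for cond in remaining_conditions)]


r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 5), (2, 25), (3, 30)]

# theta1: first attributes are equal; theta2: r's second attribute exceeds s's.
joined = conjunctive_join(
    r, s,
    first_condition=lambda tr, ts: tr[0] == ts[0],
    remaining_conditions=[lambda t: t[1] > t[3]],
)
print(joined)
```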
Query Optimization - Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
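A short sketch of the union of individual joins, assuming a Python set is used so that a pair satisfying several conditions appears only once. The nested-loop comprehension and the data are illustrative stand-ins.

```python
def disjunctive_join(r, s, conditions):
    """Union of the joins computed separately for each condition."""
    result = set()
    for cond in conditions:
        result |= {tr + ts for tr in r for ts in s if cond(tr, ts)}
    return result


r = [(1, 10), (2, 20)]
s = [(1, 10), (3, 20)]

joined = disjunctive_join(r, s, [
    lambda tr, ts: tr[0] == ts[0],   # theta1
    lambda tr, ts: tr[1] == ts[1],   # theta2
])
print(sorted(joined))
```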
Query Optimization - Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
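Both duplicate-elimination strategies can be sketched as follows, assuming tuples with positional attributes and a Python set as the hash structure. The sample relation is made up.

```python
def project_with_sort(records, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(rec[p] for p in positions) for rec in records)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


def project_with_hash(records, positions):
    """Project and keep a row only the first time its hash entry is seen."""
    seen = set()
    result = []
    for rec in records:
        row = tuple(rec[p] for p in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical (SSN, Lname, Dno) tuples; projecting on (Lname, Dno) drops the key.
employees = [(1, "Smith", 5), (2, "Wong", 4), (3, "Smith", 5), (4, "Zelaya", 4)]

by_sort = project_with_sort(employees, (1, 2))
by_hash = project_with_hash(employees, (1, 2))
print(by_sort)
```

The sort-based version returns rows in sorted order, while the hash-based version keeps first-seen order; on this data the two orders happen to coincide.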
Query Optimization - Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
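A sketch of the sort-based approach, assuming both relations are duplicate-free lists of comparable tuples. One merge-style scan produces union, intersection, and difference (r − s) at the same time; the function name and data are illustrative.

```python
def sorted_set_operations(r, s):
    """Return (union, intersection, r minus s) using one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:               # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:             # tuple only in s
            union.append(s[j])
            j += 1
        else:                         # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # At most one of the two inputs still has tuples left.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


r = [(3,), (1,), (2,)]
s = [(2,), (4,), (3,)]

union, intersection, difference = sorted_set_operations(r, s)
print(union, intersection, difference)
```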
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization - An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)
Query Optimization - An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization - An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization - An Example
[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, applied to σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}, applied to Project × (Department × Employee)]
Query Optimization - An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
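The arithmetic behind these figures can be checked directly. The total size in bytes is an added observation that follows from the numbers above; it is not stated on the slide.

```python
# Tuple counts and tuple sizes (bytes) as given in the example.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# The Cartesian product pairs every tuple with every other tuple.
product_tuples = 1
for count in tuple_counts.values():
    product_tuples *= count

# Each result tuple is the concatenation of one tuple from each relation.
product_tuple_size = sum(tuple_sizes.values())

total_bytes = product_tuples * product_tuple_size
print(product_tuples, product_tuple_size, total_bytes)
```

The intermediate relation would occupy 3,000,000,000 bytes, even though the inputs together hold only 5,120 tuples.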
Query Optimization - An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))
Query Optimization - An Example
[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, applied to the join ⋈_{MGRSSN = SSN} of Employee with the join ⋈_{Dnum = Dnumber} of σ_{Plocation = "California"}(Project) and Department]
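To compare the two translations, the sketch below evaluates both plans on a tiny made-up database and reports the size of the largest intermediate result each one builds. The tuple layouts, the sample rows, and the use of nested loops for the joins are all assumptions for illustration.

```python
from itertools import product

# Hypothetical rows: Project(Pname, Pnumber, Plocation, Dnum),
# Department(Dname, Dnumber, MgrSsn), Employee(Lname, Ssn, Bdate, Address).
projects = [("P1", 1, "California", 5), ("P2", 2, "Texas", 4),
            ("P3", 3, "California", 4)]
departments = [("Research", 5, 111), ("Admin", 4, 222)]
employees = [("Wong", 111, "1955-12-08", "Houston"),
             ("Wallace", 222, "1941-06-20", "Bellaire"),
             ("Smith", 333, "1965-01-09", "Houston")]


def plan_cartesian(projects, departments, employees):
    """Cartesian products first, then one selection, then the projection."""
    intermediate = list(product(projects, departments, employees))
    rows = [(p[1], p[3], e[0], e[3], e[2])
            for p, d, e in intermediate
            if p[2] == "California" and p[3] == d[1] and d[2] == e[1]]
    return rows, len(intermediate)


def plan_pushdown(projects, departments, employees):
    """Selection on Project first, then the two joins, then the projection."""
    selected = [p for p in projects if p[2] == "California"]
    with_dept = [(p, d) for p in selected for d in departments if p[3] == d[1]]
    with_mgr = [(p, d, e) for p, d in with_dept for e in employees
                if d[2] == e[1]]
    rows = [(p[1], p[3], e[0], e[3], e[2]) for p, d, e in with_mgr]
    return rows, max(len(selected), len(with_dept), len(with_mgr))


rows_a, largest_a = plan_cartesian(projects, departments, employees)
rows_b, largest_b = plan_pushdown(projects, departments, employees)
print(rows_a == rows_b, largest_a, largest_b)
```

Both plans return the same rows, but the first materializes every combination of tuples while the second never holds more than the tuples that survive the selection.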
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization mdash An Example
However the above query based on theschemas of the relations can be translatedinto
Database Systems
ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))
116
Query Optimization mdash An Example
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo
Employee
MNGSSN=SSN
Dnum=Dnumber
Department
Database Systems
- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
Cost of PlanThe cost associated with each plan needs to be
estimated This will be accomplished byestimating the cost of each operation
Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration
Database Systems
83
Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search
space
84
Database Systems
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization mdash An Example
However the above query based on theschemas of the relations can be translatedinto
Database Systems
ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))
116
Query Optimization mdash An Example
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo
Employee
MNGSSN=SSN
Dnum=Dnumber
Department
Database Systems
Optimization Process: Search Methods for Selection
General philosophy: Make an effort to reduce the search space.
Optimization Process: Search Methods for Selection
Linear search: Retrieve every record in the file and test whether its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
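The two search methods above can be sketched as follows. This is a minimal illustration, assuming the file is an in-memory list of records; the names `linear_search`, `binary_search`, and the sample `employees` data are hypothetical and not part of the original slides.

```python
def linear_search(records, predicate):
    """Scan every record and keep those that satisfy the selection condition."""
    return [rec for rec in records if predicate(rec)]


def binary_search(sorted_records, key_name, value):
    """Equality lookup on a key attribute; the file must be ordered on key_name."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_name]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with this key value


# Hypothetical sample file, ordered on the key attribute "ssn".
employees = [
    {"ssn": 111, "lname": "Smith", "dno": 5},
    {"ssn": 222, "lname": "Wong", "dno": 4},
    {"ssn": 333, "lname": "Zelaya", "dno": 5},
]
dept5 = linear_search(employees, lambda rec: rec["dno"] == 5)
wong = binary_search(employees, "ssn", 222)
```

The linear search touches every record, while the binary search halves the remaining range at each step, which is why it requires the file to be ordered on the key.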
Optimization Process: Search Methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
Optimization Process: Search Methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
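A range selection of this kind can be sketched with Python's standard `bisect` module. This is only an illustration under stated assumptions: the sorted list `dnumbers` stands in for the primary index, the keys are unique, and `range_select` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right


def range_select(sorted_keys, records, op, value):
    """records[i] has key sorted_keys[i]; sorted_keys is ascending and unique."""
    if op == ">":
        return records[bisect_right(sorted_keys, value):]
    if op == ">=":
        return records[bisect_left(sorted_keys, value):]
    if op == "<":
        return records[:bisect_left(sorted_keys, value)]
    if op == "<=":
        return records[:bisect_right(sorted_keys, value)]
    raise ValueError(f"unsupported comparison: {op}")


# Hypothetical DEPARTMENT file, ordered on DNUMBER.
dnumbers = [1, 4, 5, 7, 9]
departments = [{"dnumber": d} for d in dnumbers]
greater_than_5 = range_select(dnumbers, departments, ">", 5)
```

The lookup locates the boundary position once, and everything on one side of it qualifies because the file is sorted on the key.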
Query Optimization: Search Methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)
Query Optimization: Search Methods for Selection
Conjunctive selection: A conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: A disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)
Query Optimization: Search Methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
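The conjunctive strategy can be sketched as below. This is a minimal illustration, assuming an in-memory dictionary plays the role of the access path on one attribute; `build_index`, `conjunctive_select`, and the sample data are hypothetical names introduced here.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the positions of the records holding it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return index


def conjunctive_select(records, index, indexed_value, other_predicates):
    """Use the indexed condition first, then test the remaining conditions."""
    candidates = (records[pos] for pos in index.get(indexed_value, []))
    return [rec for rec in candidates if all(p(rec) for p in other_predicates)]


# Hypothetical data: select DNO = 5 AND SALARY > 30000.
employees = [
    {"ssn": 1, "dno": 5, "salary": 40000},
    {"ssn": 2, "dno": 5, "salary": 25000},
    {"ssn": 3, "dno": 4, "salary": 50000},
]
dno_index = build_index(employees, "dno")
selected = conjunctive_select(
    employees, dno_index, 5, [lambda rec: rec["salary"] > 30000]
)
```

Only the records reached through the index are tested against the remaining conditions, so the rest of the file is never read.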
Query Optimization: Search Methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
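A possible sketch of the union-of-pointers approach follows, including the fallback to a linear scan when some condition lacks an access path. It assumes equality conditions only and in-memory dictionaries as indexes; all names (`build_index`, `disjunctive_select`, the sample data) are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the positions (record pointers) holding it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return index


def disjunctive_select(records, conditions, indexes):
    """conditions: list of (attribute, value) equality tests joined by OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition has no access path: linear scan is required.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers.update(indexes[attr].get(value, []))  # union of record pointers
    return [records[pos] for pos in sorted(pointers)]


records = [
    {"lname": "Smith", "dno": 5},
    {"lname": "Wong", "dno": 4},
    {"lname": "Zelaya", "dno": 5},
    {"lname": "Smith", "dno": 4},
]
conditions = [("dno", 4), ("lname", "Smith")]
all_indexes = {"dno": build_index(records, "dno"),
               "lname": build_index(records, "lname")}
via_pointers = disjunctive_select(records, conditions, all_indexes)
via_scan = disjunctive_select(records, conditions, {"dno": all_indexes["dno"]})
```

Using a set for the pointers removes the duplicates that arise when one tuple satisfies more than one condition.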
Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A=B} S
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
where A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each t_r ∈ r do begin
    For each t_s ∈ s do begin
        If t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
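The nested loop algorithm translates almost directly into runnable code. The sketch below assumes relations are lists of dictionaries with distinct attribute names, so that merging two dictionaries models the concatenation of the two tuples; `nested_loop_join` and the sample relations are hypothetical.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a theta s.b by examining every pair of tuples."""
    result = []
    for tr in r:            # outer loop over r
        for ts in s:        # inner loop over s
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})  # concatenate the two tuples
    return result


projects = [
    {"pnumber": 1, "dnum": 5},
    {"pnumber": 2, "dnum": 4},
    {"pnumber": 3, "dnum": 5},
]
departments = [
    {"dnumber": 4, "dname": "Admin"},
    {"dnumber": 5, "dname": "Research"},
]
equi = nested_loop_join(projects, departments, "dnum", "dnumber")
less_than = nested_loop_join(projects, departments, "dnum", "dnumber", operator.lt)
```

Because the comparison is passed in as a function, the same loop handles any join condition, not only equality.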
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s, since every pair of tuples is examined.
In the best-case scenario, both relations fit into physical memory and hence we need b_s + b_r block accesses.
Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in physical memory, then the cost will also be b_s + b_r block accesses.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end
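A runnable sketch of the block nested loop join is given below, with a counter for block reads. It assumes each relation is a list that is cut into fixed-size blocks and that only equality conditions are needed; the helper names and the sample relations are hypothetical.

```python
def blocks(relation, block_size):
    """Yield the relation one block (a slice of block_size tuples) at a time."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r and s block by block; also count how many blocks are read."""
    result, block_reads = [], 0
    for r_block in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for s_block in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads


# Hypothetical relations: 4 tuples in r and 6 in s, 2 tuples per block,
# so b_r = 2 and b_s = 3.
r = [{"a": i} for i in range(4)]
s = [{"b": i % 4, "tag": i} for i in range(6)]
joined, block_reads = block_nested_loop_join(r, s, "a", "b", block_size=2)
```

With b_r = 2 and b_s = 3 the counter ends at 2 × 3 + 2 = 8 block reads, since every block of s is read once for each block of r rather than once for each tuple of r.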
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.
How can we improve the block nested loop?
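To get a feel for these cost formulas, the short calculation below plugs in hypothetical block counts (b_r = 100 and b_s = 400 are assumptions made only for illustration). One observation that follows directly from the formula is that the choice of outer relation matters.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses of the block nested loop: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer


def in_memory_cost(b_r, b_s):
    """Block accesses when at least one relation fits in memory: b_r + b_s."""
    return b_r + b_s


b_r, b_s = 100, 400  # hypothetical block counts
r_outer = block_nested_loop_cost(b_r, b_s)   # r as the outer relation
s_outer = block_nested_loop_cost(b_s, b_r)   # s as the outer relation
best_case = in_memory_cost(b_r, b_s)
```

Under these assumed numbers, using the smaller relation as the outer one costs 40,100 block accesses instead of 40,400, while the in-memory case needs only 500.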
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
r ⋈_{A=B} s
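The access-structure join can be sketched as follows. The dictionary built at the start merely stands in for an index that is assumed to exist already on s.B; `index_join` and the sample relations are hypothetical.

```python
from collections import defaultdict


def index_join(r, s, a, b):
    """Equi-join: probe an index on s.b once for every tuple of r."""
    index = defaultdict(list)   # stand-in for an existing index on s.b
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                            # one record of r at a time
        for ts in index.get(tr[a], []):     # matching records found via the index
            result.append({**tr, **ts})
    return result


projects = [
    {"pnumber": 1, "dnum": 5},
    {"pnumber": 2, "dnum": 4},
    {"pnumber": 3, "dnum": 5},
]
departments = [
    {"dnumber": 4, "dname": "Admin"},
    {"dnumber": 5, "dname": "Research"},
]
joined = index_join(projects, departments, "dnum", "dnumber")
```

Compared with the nested loop, the inner relation is never scanned in full; only the records the index points to are touched.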
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once and hence the number of block accesses is b_s + b_r, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
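The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes and held as in-memory lists. The function name `merge_join` and the sample data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on a) and s (sorted on b) with two moving pointers."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result


r = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}, {"a": 2, "x": "r3"}, {"a": 4, "x": "r4"}]
s = [{"b": 2, "y": "s1"}, {"b": 2, "y": "s2"}, {"b": 3, "y": "s3"}, {"b": 4, "y": "s4"}]
joined = merge_join(r, s, "a", "b")
```

The group `s[j:j_end]` is the set of tuples with the same join value that the cost estimate assumes fits in main memory.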
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
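A minimal sketch of the hash-join idea follows. It assumes the hash file is a small in-memory list of buckets and uses Python's built-in `hash` as the hashing function; `hash_join`, `n_buckets`, and the sample relations are hypothetical.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join: hash both relations into shared buckets, then match per bucket."""
    buckets = [([], []) for _ in range(n_buckets)]   # (r-part, s-part) per bucket
    for tr in r:                                     # single pass through r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                     # single pass through s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:
        # Records in the same bucket may still differ, so compare the values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result


projects = [{"pnumber": 1, "dnum": 5}, {"pnumber": 2, "dnum": 4}, {"pnumber": 3, "dnum": 9}]
departments = [{"dnumber": 4, "dname": "Admin"}, {"dnumber": 5, "dname": "Research"}]
joined = hash_join(projects, departments, "dnum", "dnumber")
```

Tuples with equal join values always land in the same bucket, so matches only need to be sought within a bucket; the explicit comparison is still needed because different values can collide.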
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
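Both strategies for complex joins can be sketched compactly. In this illustration each simple join is computed by a plain nested loop and returns pairs of tuple positions, which makes the union straightforward; in practice any applicable join technique could compute the simple joins. All names and data are hypothetical.

```python
def join_pairs(r, s, theta):
    """Simple join on one condition; returns matching (i, j) position pairs."""
    return {(i, j) for i, tr in enumerate(r) for j, ts in enumerate(s)
            if theta(tr, ts)}


def conjunctive_join(r, s, thetas):
    """Compute the join on the first condition, then filter by the rest."""
    first, *rest = thetas
    pairs = join_pairs(r, s, first)
    return {(i, j) for i, j in pairs if all(t(r[i], s[j]) for t in rest)}


def disjunctive_join(r, s, thetas):
    """Union of the results of the individual joins."""
    pairs = set()
    for theta in thetas:
        pairs |= join_pairs(r, s, theta)
    return pairs


r = [{"a": 1, "c": 10}, {"a": 2, "c": 20}]
s = [{"b": 1, "d": 10}, {"b": 2, "d": 99}, {"b": 3, "d": 20}]
thetas = [lambda tr, ts: tr["a"] == ts["b"], lambda tr, ts: tr["c"] == ts["d"]]
both = conjunctive_join(r, s, thetas)
either = disjunctive_join(r, s, thetas)
```

Representing the result as a set of position pairs keeps the union free of duplicates when a pair satisfies more than one condition.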
Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
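The two duplicate-elimination approaches can be sketched as follows, assuming in-memory relations; a Python `set` plays the role of the hashing technique. The function names and the sample relation are hypothetical.

```python
def project_hash(relation, attrs):
    """Projection with duplicate elimination by hashing."""
    seen, result = set(), []
    for rec in relation:
        row = tuple(rec[attr] for attr in attrs)
        if row not in seen:          # hash lookup detects a duplicate
            seen.add(row)
            result.append(row)
    return result


def project_sort(relation, attrs):
    """Projection with duplicate elimination by sorting."""
    rows = sorted(tuple(rec[attr] for attr in attrs) for rec in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result


employees = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 4},
    {"ssn": 3, "lname": "Smith", "dno": 5},
]
by_hash = project_hash(employees, ["lname", "dno"])
by_sort = project_sort(employees, ["lname", "dno"])
```

Projecting on `ssn`, a key, would never produce duplicates, which is why that case needs no extra work.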
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement union, intersection, and difference operations.
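The sort-then-scan approach can be sketched as below. The example assumes both relations are duplicate-free and union-compatible, and it models tuples as plain comparable values; `merge_set_ops` is a hypothetical name.

```python
def merge_set_ops(r, s):
    """Return (r UNION s, r INTERSECT s, r MINUS s) using one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:              # only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:            # only in s
            union.append(s[j])
            j += 1
        else:                        # in both
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


union, intersection, difference = merge_set_ops([3, 1, 2], [2, 4, 3])
```

All three results come out of the same pass, since each comparison decides where the current tuple belongs.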
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree for the expression above (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
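The figures quoted above can be checked with a few lines of arithmetic. The tuple counts and sizes come from the slide; the total size in bytes is a derived value, not one stated in the original.

```python
# Tuple counts and tuple sizes (bytes) as given in the example.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies the cardinalities and adds the tuple widths.
product_tuples = 1
for count in tuple_counts.values():
    product_tuples *= count
product_tuple_size = sum(tuple_sizes.values())

total_bytes = product_tuples * product_tuple_size
```

This gives 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or 3,000,000,000 bytes of intermediate data before the selection is even applied.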
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into an expression that applies the selection first and replaces the Cartesian products with joins:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)
Query Optimization: An Example
Query tree for the optimized expression (root first, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = "California"}
        Project
      Department
    Employee
- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
85
Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file
and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the
selection condition involves an equality comparisonon a key attribute on which the file is ordered
Database Systems
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate
AddressFROM Project Department EmployeeWHERE Dnum = Dnumber
AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo
Database Systems
Query Optimization mdash An Example
The above query can be translated into
ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and
MNGSSN=SSN (Project times (Department times Employee)))
Database Systems
112
Query Optimization mdash An Example
Database Systems
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN
Employee
Department
times
times
113
Database Systems
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Database Systems
114
115
Query Optimization mdash An Example
However the above query based on theschemas of the relations can be translatedinto
Database Systems
ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))
116
Query Optimization mdash An Example
ΠPnumberDnumLnameAddressBdate
Project
σPlocation=ldquocaliforniardquo
Employee
MNGSSN=SSN
Dnum=Dnumber
Department
Database Systems
- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
86
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a
single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)
σSSN = 123456789(EMPLOYEE)
Database Systems
87
Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve
multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)
σDNUMBER gt 5(DEPARTMENT)
Database Systems
88
Query Optimization mdash Search methods for Selection
Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)
σDNO = 5(EMPLOYEE)
Database Systems
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(\sigma_{Plocation = \text{'California'} \,\wedge\, Dnum = Dnumber \,\wedge\, MGRSSN = SSN}(Project \times (Department \times Employee)))$
Query Optimization: An Example
Query tree for the initial translation (root at the top):
Π Pnumber, Dnum, Lname, Address, Bdate
    σ Plocation = 'California' and Dnum = Dnumber and MGRSSN = SSN
        ×
            Project
            ×
                Department
                Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes, that is, about 3 GB of intermediate data.
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$
Query Optimization: An Example
Query tree for the improved translation (root at the top):
Π Pnumber, Dnum, Lname, Address, Bdate
    ⋈ MGRSSN = SSN
        ⋈ Dnum = Dnumber
            σ Plocation = 'California'
                Project
            Department
        Employee
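The difference between the two query trees can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than how an optimizer actually evaluates plans: relations are in-memory lists of dictionaries, only the attributes used in the conditions are modelled, the final projection is omitted, and the function names naive_plan and pushed_down_plan are hypothetical. Each function returns its result together with the size of its largest intermediate relation.

```python
from itertools import product


def naive_plan(project, department, employee):
    """First tree: Cartesian products first, then a single selection on top."""
    intermediate = list(product(project, department, employee))
    rows = [(p, d, e) for p, d, e in intermediate
            if p["Plocation"] == "California"
            and p["Dnum"] == d["Dnumber"]
            and d["MGRSSN"] == e["SSN"]]
    return rows, len(intermediate)


def pushed_down_plan(project, department, employee):
    """Second tree: select on Plocation first, then perform the two joins."""
    selected = [p for p in project if p["Plocation"] == "California"]
    proj_dept = [(p, d) for p in selected for d in department
                 if p["Dnum"] == d["Dnumber"]]
    rows = [(p, d, e) for p, d in proj_dept for e in employee
            if d["MGRSSN"] == e["SSN"]]
    return rows, max(len(selected), len(proj_dept), len(rows))
```

Both plans return the same tuples, but the first one materializes every combination of the three relations before discarding almost all of them.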
Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data file is also sorted on the key field. A hash key does not help with such range conditions, because hashing does not preserve the order of the key values.
$\sigma_{DNUMBER > 5}(DEPARTMENT)$
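The range retrieval described above can be sketched with a binary search over a sorted file. This is a minimal sketch under stated assumptions: the file is an in-memory list of dictionaries sorted on the key, the list of key values stands in for a prebuilt primary index, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def select_greater_than(sorted_file, key, value):
    """sigma_{key > value} on a file that is sorted on `key`."""
    # Stand-in for the primary index; a real index is built ahead of time.
    keys = [rec[key] for rec in sorted_file]
    start = bisect_right(keys, value)   # first position whose key exceeds value
    return sorted_file[start:]          # every subsequent record qualifies


def select_less_than(sorted_file, key, value):
    """sigma_{key < value}: every record before the boundary position qualifies."""
    keys = [rec[key] for rec in sorted_file]
    end = bisect_left(keys, value)      # first position whose key is >= value
    return sorted_file[:end]
```

Once the boundary position is located, no further comparisons are needed because the file order guarantees that the remaining records satisfy the condition.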
Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (the data is clustered on that attribute).
$\sigma_{DNO = 5}(EMPLOYEE)$
Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$
Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records, and then check each retrieved record against the rest of the conditions.
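The conjunctive selection strategy can be sketched as an index lookup followed by a filter. This is only an illustration: the access path is modelled as a dictionary from attribute value to record positions, the remaining conditions are Python predicates, and the names build_index and conjunctive_select are hypothetical.

```python
def build_index(records, attr):
    """Map each value of attr to the positions of the records holding it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], []).append(pos)
    return index


def conjunctive_select(records, index, value, other_conditions):
    """Use the access path for one equality condition, then apply the rest."""
    candidates = [records[pos] for pos in index.get(value, [])]
    return [rec for rec in candidates
            if all(cond(rec) for cond in other_conditions)]
```

A call such as conjunctive_select(employees, build_index(employees, "DNO"), 5, [lambda e: e["SALARY"] > 35000]) touches only the records with DNO = 5 before testing the salary condition.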
Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
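The union-of-pointers strategy, including its fallback to a linear scan, can be sketched as follows. This is a minimal sketch with assumptions: each condition is an equality test given as an (attribute, value) pair, record pointers are list positions, each index is a dictionary from value to positions, and the function names are hypothetical.

```python
def build_index(records, attr):
    """Map each value of attr to the positions (record pointers) holding it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], []).append(pos)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value); indexes: attr -> index built on that attr."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition has no access path, so a linear scan is unavoidable.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= set(indexes[attr].get(value, []))  # union of record pointers
    return [records[pos] for pos in sorted(pointers)]
```

Taking the union on pointers rather than on records means a tuple that satisfies several conditions is fetched only once.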
Query Optimization: JOIN Operation
Nested loop: To compute $R \bowtie_{A=B} S$, for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
$r \bowtie_{r.A \,\Theta\, s.B} s$
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
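A runnable counterpart of the nested loop algorithm might look like the sketch below. It assumes relations are lists of Python tuples, the join attributes are given as positions a and b, and the comparison Θ is passed in as a function; the name nested_loop_join is hypothetical.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Compare every tuple of r with every tuple of s under the condition theta."""
    result = []
    for tr in r:                         # outer loop over r
        for ts in s:                     # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts: concatenate the two tuples
    return result
```

Because theta is an arbitrary comparison, the same loop handles equality as well as conditions such as operator.lt, which is why nested loop works regardless of the join condition.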
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is $n_r \times n_s$ (the number of tuple pairs examined).
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need only $b_s + b_r$ block accesses.
Query Optimization: JOIN Operation (nested loop)
If only one of the relations fits in the physical space, the cost is still $b_s + b_r$ block accesses, provided that relation is used as the inner relation.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.
How can we improve block nested loop?
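The block access count of block nested loop can be checked with a small simulation. This is a sketch under stated assumptions: each relation is a list of blocks, each block is a list of tuples, the join is an equi-join on positions a and b, and a counter stands in for real disk reads; the name block_nested_loop_join is hypothetical.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join two relations stored as lists of blocks; also count block reads."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                 # each block of r is read once
        for block_s in s_blocks:
            block_reads += 1             # each block of s is read once per block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads
```

With 3 blocks in r and 2 blocks in s the counter ends at 3 × 2 + 3 = 9. Since the outer relation's block count appears in both terms, choosing the relation with fewer blocks as the outer one lowers the total.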
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): To compute $r \bowtie_{A=B} s$, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
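The access-structure join can be sketched with a dictionary playing the role of the index on s.B. This is an illustration only: relations are lists of tuples, the index is built inside the function for self-containment (the technique assumes it already exists), and the name index_join is hypothetical.

```python
def index_join(r, s, a, b):
    """Equi-join on r[a] == s[b], probing an access structure on s instead of scanning it."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index_on_b = {}
    for ts in s:
        index_on_b.setdefault(ts[b], []).append(ts)

    result = []
    for tr in r:                                  # one record of r at a time
        for ts in index_on_b.get(tr[a], []):      # only the matching records of s
            result.append(tr + ts)
    return result
```

Each record of r triggers one probe rather than a full pass over s.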
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
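The merge step can be sketched with two positions that advance through the sorted relations. This is a minimal sketch, assuming both inputs are in-memory lists of tuples already sorted on the join positions a and b; the name merge_join is hypothetical. The group of s tuples sharing one join value is sliced out as a unit, which mirrors the assumption that such a group fits in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join r (sorted on position a) with s (sorted on position b)."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                              # r is behind: advance its pointer
        elif r[i][a] > s[j][b]:
            j += 1                              # s is behind: advance its pointer
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple having this value with the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```

Neither pointer ever moves backwards past a group boundary, so each relation is scanned once.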
Query Optimization mdash Search methods for Selection
Conjunctive selection conjunctive selection isof the following form
σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of
the following formσθ1orθ2or hellip orθn (r)
Database Systems
89
90
Query Optimization mdash Search methods for Selection
Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions
Database Systems
Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access
path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition
The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition
Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation
Database Systems
91
92
Query Optimization mdash JOIN Operation
Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]
R A=B S
Database Systems
Query Optimization mdash JOIN Operation (nested loop)
Suppose we want to perform
A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation
Database Systems
r rA Θ sB s
93
Query Optimization mdash JOIN Operation (nested loop)
The following algorithm performs the nestedloop join operation
For each tr ε r do beginFor each ts ε s do begin
If rA Θ sB true then add tr || ts to the resultend
end
Database Systems
94
Query Optimization mdash JOIN Operation (nested loop)
Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the
physical space and hence we need bs + br blockaccesses
Database Systems
95
Query Optimization mdash JOIN Operation (nested loop)
If one of the relations fits in the physical spacethen bs + br block accesses will be the cost
Database Systems
96
Query Optimization mdash JOIN Operation (block nestedloop)
If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses
Database Systems
97
Query Optimization mdash JOIN Operation (block nested loop)
For each block Br of r do beginFor each block Bs of s do begin
For each tr ε Br do beginFor each ts ε Bs do begin
If rA Θ sB true then add tr || ts to the resultend
endend
end
Database Systems
98
Query Optimization mdash JOIN Operation (block nestedloop)
Cost of block nested loop in term of numberof block accesses is br bs + br
How can we improve block nested loop
Database Systems
99
100
Query Optimization mdash JOIN Operation
Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]
r A=B s
Database Systems
101
Query Optimization mdash JOIN Operation
Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly
Database Systems
Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to
each relation As the algorithm proceeds the pointers movethrough the relations
Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is
bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory
Database Systems
102
103
Query Optimization mdash JOIN Operation
hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation
Database Systems
Query Optimization mdash Complex JOIN Operation
Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e
conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections
Database Systems
104
Query Optimization mdash Complex JOIN Operation
Consider the following join operation
One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing
one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions
Database Systems
105
r θ1andθ2and hellip andθn s
r θ1 s
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization mdash Set Operations
Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by
sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement
Union intersection and difference operations
Database Systems
QuestionsDevise algorithms to perform variation of outer
join operationsDevise algorithms to perform aggregate
operations
Database Systems
109
Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)
Database Systems
111
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn} (Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree for the expression above. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}; below it: × with children Project and × (Department, Employee).]
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
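The arithmetic behind these figures can be checked with a few lines of Python. The tuple sizes and counts come from the example; the variable names and the total size in bytes are derived here for illustration.

```python
# Figures taken from the example: (tuple size in bytes, number of tuples).
relations = {
    "Project": (100, 100),
    "Department": (50, 20),
    "Employee": (150, 5000),
}

tuples = 1
tuple_bytes = 0
for size, count in relations.values():
    tuples *= count        # cardinalities multiply in a Cartesian product
    tuple_bytes += size    # tuple widths add up

total_bytes = tuples * tuple_bytes
print(tuples, tuple_bytes, total_bytes)
```

The product has 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, that is, 3 × 10^9 bytes in total, before the selection discards almost all of them.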
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation='California'} (Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{Mgr_ssn=Ssn} (Employee))
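Query Optimization: An Example
[Figure: query tree for the rewritten expression. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: ⋈_{Mgr_ssn=Ssn} with children ⋈_{Dnum=Dnumber} and Employee; ⋈_{Dnum=Dnumber} has children σ_{Plocation='California'} (Project) and Department.]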
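To make the difference between the two translations concrete, the sketch below runs both plans on tiny, made-up instances of the three relations and records how large the intermediate results get. All rows are hypothetical, and the joins are plain nested loops; only the shape of the two plans follows the expressions in the example.

```python
# Hypothetical toy instances; attribute names follow the example schemas.
project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 5},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 4},
]
department = [
    {"Dnumber": 4, "Mgr_ssn": 111},
    {"Dnumber": 5, "Mgr_ssn": 222},
]
employee = [
    {"Ssn": 111, "Lname": "Wallace", "Address": "Bellaire", "Bdate": "1941-06-20"},
    {"Ssn": 222, "Lname": "Wong", "Address": "Houston", "Bdate": "1955-12-08"},
    {"Ssn": 333, "Lname": "Smith", "Address": "Houston", "Bdate": "1965-01-09"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")

def naive_plan():
    """Cartesian products first, then one big selection."""
    product = [{**p, **d, **e}
               for p in project for d in department for e in employee]
    rows = [t for t in product
            if t["Plocation"] == "California"
            and t["Dnum"] == t["Dnumber"] and t["Mgr_ssn"] == t["Ssn"]]
    return len(product), {tuple(t[a] for a in OUT) for t in rows}

def pushed_down_plan():
    """Select on Plocation first, then join one relation at a time."""
    step1 = [p for p in project if p["Plocation"] == "California"]
    step2 = [{**p, **d} for p in step1 for d in department
             if p["Dnum"] == d["Dnumber"]]
    step3 = [{**pd, **e} for pd in step2 for e in employee
             if pd["Mgr_ssn"] == e["Ssn"]]
    largest = max(len(step1), len(step2), len(step3))
    return largest, {tuple(t[a] for a in OUT) for t in step3}

naive_size, naive_rows = naive_plan()
pushed_size, pushed_rows = pushed_down_plan()
```

Both plans return the same two result tuples, but the first materializes all 18 combinations of the toy rows while the second never holds more than 2 tuples at any step.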
Query Optimization: Search Methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
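A small Python sketch of conjunctive selection follows. The `account` rows, the hash index on `branch`, and the function name are assumptions for illustration; list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account relation; the list position acts as a record pointer.
account = [
    {"acct": 101, "branch": "Rolla", "balance": 500},
    {"acct": 102, "branch": "Rolla", "balance": 3000},
    {"acct": 103, "branch": "Salem", "balance": 700},
]

# Assumed access path: a hash index on branch (value -> record pointers).
branch_index = defaultdict(list)
for rid, row in enumerate(account):
    branch_index[row["branch"]].append(rid)

def conjunctive_select(branch, max_balance):
    """Select branch = ... AND balance < ...: use the index for the branch
    condition, then test the remaining condition on each fetched record."""
    candidates = (account[rid] for rid in branch_index.get(branch, []))
    return [row for row in candidates if row["balance"] < max_balance]

result = conjunctive_select("Rolla", 2500)
```

Only the records reached through the index are examined; the condition on `balance`, which has no access path here, is checked on those records alone.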
Query Optimization: Search Methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
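The union-of-pointers idea, including the fallback to a linear scan, might be sketched as follows. The data, the choice of indexed attributes, and the helper names are hypothetical, and only equality conditions are handled.

```python
def build_index(relation, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = {}
    for rid, row in enumerate(relation):
        index.setdefault(row[attr], set()).add(rid)
    return index

def disjunctive_select(relation, indexes, conditions):
    """Select a1 = v1 OR a2 = v2 ...; conditions is a list of (attr, value).
    If every attribute has an index, union the pointer sets; otherwise
    fall back to a linear scan of the whole relation."""
    if all(attr in indexes for attr, _ in conditions):
        rids = set()
        for attr, value in conditions:
            rids |= indexes[attr].get(value, set())
        return [relation[rid] for rid in sorted(rids)]
    return [row for row in relation
            if any(row[attr] == value for attr, value in conditions)]

# Hypothetical data: indexes exist on branch and acct, but not on balance.
account = [
    {"acct": 101, "branch": "Rolla", "balance": 500},
    {"acct": 102, "branch": "Rolla", "balance": 3000},
    {"acct": 103, "branch": "Salem", "balance": 700},
]
indexes = {"branch": build_index(account, "branch"),
           "acct": build_index(account, "acct")}

via_index = disjunctive_select(account, indexes,
                               [("branch", "Salem"), ("acct", 101)])
via_scan = disjunctive_select(account, indexes,
                              [("branch", "Salem"), ("balance", 500)])
```

The second call involves `balance`, for which no index exists, so the whole relation is scanned even though `branch` is indexed.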
Query Optimization: JOIN Operation
R ⋈_{A=B} S
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of the two relations.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
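A runnable Python counterpart of the algorithm could look like this. Relations are modeled as lists of tuples and attributes as tuple positions, which is an assumption of this sketch; the comparison Θ is passed in as a function.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a theta s.b by comparing every pair of tuples."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts (tuple concatenation)
    return result

# Hypothetical relations stored as tuples; a and b are attribute positions.
r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
equi = nested_loop_join(r, s, 0, 0)
less = nested_loop_join(r, s, 0, 0, operator.lt)
```

Because every pair is tested, the same code handles an equality condition (`equi`) and an inequality such as `<` (`less`) without change.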
Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best case, both relations fit into main memory, and hence we need only bs + br block accesses.
If only one of the relations fits in main memory, the cost is still bs + br block accesses, provided that relation is kept in memory and used as the inner relation.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
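The block-at-a-time variant can be simulated in Python by splitting each relation into fixed-size chunks and counting how many chunks are read. The block size, the sample data, and the read counter are assumptions of this sketch and do not model a real buffer manager.

```python
import operator

def blocks(relation, block_size):
    """Split a relation into consecutive blocks of at most block_size tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, theta=operator.eq, block_size=2):
    """Join r and s block by block; also count simulated block reads."""
    result, block_reads = [], 0
    for r_block in blocks(r, block_size):       # each block of r is read once
        block_reads += 1
        for s_block in blocks(s, block_size):   # s is re-read per block of r
            block_reads += 1
            for tr in r_block:
                for ts in s_block:
                    if theta(tr[a], ts[b]):
                        result.append(tr + ts)
    return result, block_reads

r = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]    # 2 blocks of 2 tuples
s = [(2, "p"), (3, "q"), (3, "r")]              # 2 blocks (2 + 1 tuples)
joined, reads = block_nested_loop_join(r, s, 0, 0)
```

With two blocks per relation, the counter reaches 2 × 2 + 2 = 6 simulated block reads, and the join result is the same as for the tuple-at-a-time nested loop.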
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
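The cost expressions can be compared numerically with a short Python sketch. The relation sizes are hypothetical. The worst-case formula nr × bs + br for the tuple-at-a-time nested loop is the usual textbook estimate when the inner relation is rescanned for every outer tuple; it does not appear on the slides and is an assumption here.

```python
def nested_loop_worst(nr, br, bs):
    """Tuple-at-a-time nested loop with r as the outer relation and no
    buffering: s is scanned once per tuple of r, plus one scan of r."""
    return nr * bs + br

def block_nested_loop(br, bs):
    """Block nested loop with r as the outer relation: br * bs + br."""
    return br * bs + br

def best_case(br, bs):
    """Each relation is read exactly once (the inner relation fits in memory)."""
    return br + bs

# Hypothetical sizes: r has 5000 tuples in 100 blocks, s occupies 400 blocks.
nr, br, bs = 5000, 100, 400
costs = {
    "nested loop (worst case)": nested_loop_worst(nr, br, bs),
    "block nested loop": block_nested_loop(br, bs),
    "best case": best_case(br, bs),
}
for name, cost in costs.items():
    print(f"{name}: {cost} block accesses")

swapped = block_nested_loop(bs, br)   # s as the outer relation instead
```

Under these assumed sizes the block variant needs 40,100 block accesses instead of 2,000,100. Swapping the roles so that the larger relation s is the outer one gives 40,400, which suggests that using the smaller relation as the outer relation is one possible improvement.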
Query Optimization: JOIN Operation
r ⋈_{A=B} s
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
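A minimal Python sketch of this technique is shown below. The dictionary built at the start stands in for an index that would already exist on attribute B of s; the relations and names are illustrative assumptions.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b] using an index on s[b]."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index_on_b = defaultdict(list)
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                                # one record of r at a time
        for ts in index_on_b.get(tr[a], []):    # index lookup, no scan of s
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
joined = index_join(r, s, 0, 0)
```

Each record of r triggers one lookup rather than a full pass over s, which is where the saving over the nested loop comes from.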
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
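The merge step can be sketched in Python as follows, assuming both inputs are already sorted on the join attribute. The two indices play the role of the pointers, and the slice of s with equal join values is the group assumed to fit in main memory. The sample data is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value; the
            # group is assumed to fit in main memory.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "x"), (3, "y"), (3, "z"), (5, "w")]
s = [(2, "p"), (3, "q"), (3, "r"), (5, "t")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backward, so each relation is traversed once.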
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
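A simple in-memory Python sketch of the hash-join idea follows. The number of buckets, the use of Python's built-in `hash`, and the sample relations are assumptions for illustration; a real system would hash to disk-resident buckets.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r[a] = s[b] by hashing both into shared buckets."""
    # Each bucket keeps its r tuples and its s tuples separately.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Tuples in the same bucket only *may* match, so compare the values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (6, "z")]
s = [(2, "p"), (6, "q"), (10, "t")]
joined = sorted(hash_join(r, s, 0, 0))
```

With four buckets, the keys 2, 6, and 10 all land in the same bucket, which shows why each bucket still has to be examined for matching join attribute values rather than assumed to contain only matches.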
Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform the join
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.
Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
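The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: relations are assumed to be lists of tuples, the join attributes are given as column positions `a` and `b`, and the comparison Θ is passed as a function (the name `nested_loop_join` and the sample data are hypothetical).

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join: pair every tr in r with every ts in s and keep the matches."""
    result = []
    for tr in r:                        # outer relation
        for ts in s:                    # inner relation, rescanned for each tr
            if theta(tr[a], ts[b]):     # join condition tr[A] theta ts[B]
                result.append(tr + ts)  # tr || ts (tuple concatenation)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
equi = nested_loop_join(r, s, 0, 0)               # A = B
less = nested_loop_join(r, s, 0, 0, operator.lt)  # A < B
```

Because the condition is an arbitrary function, the same loop handles equality and inequality conditions alike; the price is that all nr × ns tuple pairs are examined.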
Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns, the number of tuple pairs examined.
In the best-case scenario, both relations fit into physical (main) memory, and hence we need bs + br block accesses.
Query Optimization: JOIN Operation (nested loop)
If only one of the relations fits in physical memory, the cost is still bs + br block accesses: the relation that fits is kept in memory as the inner relation, so each relation is read only once.
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
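A small Python sketch of the block nested loop algorithm above, under the assumption that a "block" is simply a fixed-size slice of a list of tuples. The helper `to_blocks`, the block sizes, and the read counter are illustrative additions, not part of the slides.

```python
import operator

def to_blocks(relation, tuples_per_block):
    """Split a relation (list of tuples) into fixed-size blocks."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]

def block_nested_loop_join(r_blocks, s_blocks, a, b, theta=operator.eq):
    """Join block by block and count how many blocks are read."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                 # read one block of r
        for block_s in s_blocks:
            block_reads += 1             # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append(tr + ts)
    return result, block_reads

r = [(i,) for i in range(6)]             # 6 tuples -> 3 blocks of 2
s = [(i, i * i) for i in range(8)]       # 8 tuples -> 2 blocks of 4
joined, reads = block_nested_loop_join(to_blocks(r, 2), to_blocks(s, 4), 0, 0)
```

With br = 3 and bs = 2 the counter reaches 3 × 2 + 3 = 9 block reads, while the tuple-at-a-time loop would rescan s once for each of the 6 tuples of r.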
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
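To make the cost formulas concrete, the sketch below evaluates them for hypothetical block counts (br = 100, bs = 400 are assumed values, not figures from the slides). It also shows one simple observation that follows directly from br × bs + br: the outer relation contributes the additive term, so using the smaller relation as the outer one is slightly cheaper.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses when neither relation fits in the buffer: br * bs + br."""
    return b_outer * b_inner + b_outer

def fits_in_memory_cost(br, bs):
    """Block accesses when at least one relation fits in memory: br + bs."""
    return br + bs

br, bs = 100, 400  # hypothetical block counts
costs = {
    "r outer": block_nested_loop_cost(br, bs),
    "s outer": block_nested_loop_cost(bs, br),
    "one relation in memory": fits_in_memory_cost(br, bs),
}
for plan, cost in costs.items():
    print(f"{plan}: {cost} block accesses")
```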
Query Optimization: JOIN Operation
Consider the join r ⋈_{A=B} s.
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
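The idea can be sketched with an in-memory dictionary standing in for the index or hash key on s.B. This is only an illustration under that assumption; a real system would probe a disk-based index, and the function names below are hypothetical.

```python
from collections import defaultdict

def build_index(s, b):
    """Map each value of the join attribute B to the s-tuples that carry it."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # all ts with ts[B] = tr[A]
            result.append(tr + ts)
    return result

r = [(10, "a"), (20, "b"), (30, "c")]
s = [("p", 20), ("q", 10), ("r", 20)]
matches = index_join(r, build_index(s, 1), 0)
```

Each tuple of r triggers one lookup rather than a full scan of s, which is where the saving over the nested loop comes from.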
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
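A minimal Python sketch of the merge step, assuming both inputs are lists of tuples already sorted on their join attributes (column positions `a` and `b`). The group of s-tuples sharing one join value is held in memory, mirroring the assumption stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result = []
    i = j = 0                              # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                         # r is behind: advance r
        elif r[i][a] > s[j][b]:
            j += 1                         # s is behind: advance s
        else:
            key = r[i][a]
            j_end = j                      # find the group of s-tuples with this key
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:      # pair tr with the whole group
                    result.append(r[i] + ts)
                i += 1
            j = j_end                      # skip past the group in s
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```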
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
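The hash-join description above might be sketched as follows. As a simplification, the "hash file" is modelled as two parallel lists of in-memory buckets that share one hash function; the bucket count and the function name are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Partition r and s with the same hash function, then join bucket by bucket."""
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        r_buckets[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:                                   # single pass over s
        s_buckets[hash(ts[b]) % n_buckets].append(ts)
    result = []
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        for tr in r_bucket:                        # matching tuples can only
            for ts in s_bucket:                    # share the same bucket
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (2, "y"), (9, "z")]
hashed = hash_join(r, s, 0, 0)
```

The equality test inside each bucket is still needed because different join values can hash to the same bucket.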
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.
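A hedged sketch of the conjunctive strategy: compute one simple join first (here an equi-join on θ1, done with a hash lookup) and then filter the intermediate result by the remaining conditions. The relation layout and the condition θ2 are invented for the example.

```python
from collections import defaultdict

def equi_join(r, s, a, b):
    """Simple join on theta1: r.A = s.B, using a hash lookup on s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]

def conjunctive_join(r, s, a, b, remaining):
    """Join on theta1, then keep tuples satisfying theta2 ... thetan."""
    intermediate = equi_join(r, s, a, b)
    return [t for t in intermediate if all(cond(t) for cond in remaining)]

r = [(1, 10), (2, 20), (3, 30)]   # hypothetical schema (A, C)
s = [(1, 5), (2, 25), (3, 30)]    # hypothetical schema (B, D)
# theta1: A = B; theta2: C > D (C is column 1 and D is column 3 of tr || ts)
conj = conjunctive_join(r, s, 0, 0, [lambda t: t[1] > t[3]])
```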
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
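The disjunctive case can be sketched as a union of individual joins. The example assumes set semantics (relations without duplicate tuples), so a pair that satisfies several conditions appears only once in the union; each individual join is evaluated with a plain nested loop purely for brevity.

```python
def theta_join(r, s, theta):
    """One individual join, returned as a set of concatenated tuples."""
    return {tr + ts for tr in r for ts in s if theta(tr, ts)}

def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition theta_i."""
    result = set()
    for theta in thetas:
        result |= theta_join(r, s, theta)
    return result

r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 99), (5, 20), (3, 30)]
thetas = [
    lambda tr, ts: tr[0] == ts[0],   # theta1: first columns equal
    lambda tr, ts: tr[1] == ts[1],   # theta2: second columns equal
]
disj = disjunctive_join(r, s, thetas)
# Reference result: evaluate the whole disjunction in a single join.
direct = theta_join(r, s, lambda tr, ts: any(t(tr, ts) for t in thetas))
```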
Query Optimization: Project Operation
A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
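Both duplicate-elimination strategies can be sketched briefly. The functions below are illustrative: attributes are given as column positions, the sample relation is made up, and Python's built-in `set` plays the role of the hash table.

```python
def project_sort(relation, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in attrs) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result

def project_hash(relation, attrs):
    """Project and drop duplicates with a hash set, keeping first-seen order."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[i] for i in attrs)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

employee = [("Ann", "Lee", 5), ("Bob", "Lee", 5), ("Cy", "Kim", 4)]
by_sort = project_sort(employee, (1, 2))
by_hash = project_hash(employee, (1, 2))
```

The two variants return the same set of tuples; only the output order differs, since sorting imposes an order and hashing preserves the order of first appearance.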
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
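The sort-then-scan approach can be sketched as a single merge pass that produces all three results at once. This assumes union-compatible relations without duplicate tuples; the function name and sample data are hypothetical.

```python
def merge_set_ops(r, s):
    """Union, intersection and difference (r - s) via one scan of sorted inputs."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:              # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:            # tuple only in s
            union.append(s[j])
            j += 1
        else:                        # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])              # leftover r-tuples are in r only
    diff.extend(r[i:])
    union.extend(s[j:])              # leftover s-tuples are in s only
    return union, inter, diff

r = [(3,), (1,), (2,)]
s = [(2,), (4,), (3,)]
u, n, d = merge_set_ops(r, s)
```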
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))
Query Optimization: An Example
Query tree of the initial translation (root at the top, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}
        ×
            Project
            ×
                Department
                Employee
Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
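The arithmetic behind these figures can be checked with a few lines of Python. The tuple counts and sizes are the ones given in the example; the comparison with the total size of the base relations is an added illustration.

```python
from math import prod

# (number of tuples, tuple size in bytes), as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = prod(n for n, _ in relations.values())   # 100 * 20 * 5000
product_width = sum(w for _, w in relations.values())     # 100 + 50 + 150
product_bytes = product_tuples * product_width
base_bytes = sum(n * w for n, w in relations.values())    # size of the inputs

print(f"{product_tuples:,} tuples of {product_width} bytes = {product_bytes:,} bytes")
print(f"base relations together: {base_bytes:,} bytes")
```

The intermediate relation is about 3 × 10^9 bytes, while the three base relations together occupy well under one megabyte.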
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)
Query Optimization: An Example
Query tree of the improved translation (root at the top, children indented):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MGRSSN=SSN}
        ⋈_{Dnum=Dnumber}
            σ_{Plocation='California'}
                Project
            Department
        Employee
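To illustrate why the second plan is preferable, the sketch below runs both plans on a tiny, made-up database and compares the size of the largest intermediate result. The sample rows and the simplified column layouts are assumptions for the example only; the joins are written as plain comprehensions rather than any particular join algorithm.

```python
from itertools import product

project = [      # (Pname, Pnumber, Plocation, Dnum)
    ("Alpha", 1, "California", 10),
    ("Beta", 2, "Texas", 10),
    ("Gamma", 3, "California", 20),
]
department = [   # (Dname, Dnumber, MgrSsn)
    ("Research", 10, "111"),
    ("Sales", 20, "222"),
]
employee = [     # (Lname, Ssn, Bdate, Address)
    ("Smith", "111", "1970-01-01", "Rolla"),
    ("Wong", "222", "1980-02-02", "Houston"),
    ("Diaz", "333", "1990-03-03", "Austin"),
]

def naive_plan():
    """Cartesian products first, then one big selection, then projection."""
    cross = list(product(project, department, employee))
    rows = [(p[1], p[3], e[0], e[3], e[2]) for p, d, e in cross
            if p[2] == "California" and p[3] == d[1] and d[2] == e[1]]
    return rows, len(cross)

def optimized_plan():
    """Selection pushed down to Project, then two joins, then projection."""
    selected = [p for p in project if p[2] == "California"]
    pd = [(p, d) for p in selected for d in department if p[3] == d[1]]
    pde = [(p, d, e) for p, d in pd for e in employee if d[2] == e[1]]
    rows = [(p[1], p[3], e[0], e[3], e[2]) for p, d, e in pde]
    return rows, max(len(selected), len(pd), len(pde))

naive_rows, naive_size = naive_plan()
opt_rows, opt_size = optimized_plan()
```

Both plans return the same rows, but the naive plan materializes 18 combined tuples while the optimized plan never holds more than 2 at any stage.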
Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.
Query Optimization: JOIN Operation (block nested loop)
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr[A] θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
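The pseudocode above could be rendered in Python roughly as follows. This is a minimal in-memory sketch: each relation is assumed to be a list of blocks, each block a list of tuples, and `theta` is a caller-supplied predicate; none of these representations come from the slides.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join two relations given as lists of blocks (each block is a list of tuples).

    theta(tr, ts) is the join predicate; matching pairs are concatenated.
    """
    result = []
    for br in r_blocks:          # one pass over the blocks of the outer relation
        for bs in s_blocks:      # the inner relation is re-read once per outer block
            for tr in br:
                for ts in bs:
                    if theta(tr, ts):
                        result.append(tr + ts)  # tr || ts
    return result


r = [[(1, "a"), (2, "b")], [(3, "c")]]
s = [[(2, "x")], [(3, "y"), (3, "z")]]
joined = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
print(joined)  # [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```

Because `theta` is an arbitrary predicate, the same loop structure works for any join condition, not only equality.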
Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is $b_r \cdot b_s + b_r$, where $b_r$ and $b_s$ are the numbers of blocks of r and s.
How can we improve the block nested loop?
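The cost formula can be evaluated directly. The sketch below uses hypothetical block counts (100 and 400) to show one consequence of the formula: since the product term is symmetric, only the added term depends on which relation is the outer one.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses of a block nested loop join: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer


# Hypothetical block counts, chosen only for illustration.
print(block_nested_loop_cost(100, 400))  # smaller relation as outer: 40100
print(block_nested_loop_cost(400, 100))  # larger relation as outer: 40400
```

Under this cost model, choosing the smaller relation as the outer one is slightly cheaper.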
Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): to compute $r \bowtie_{A=B} s$, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$ one at a time and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
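A minimal sketch of this technique follows. An in-memory dictionary stands in for the index or hash key on attribute B of s (an assumption; a real system would probe an existing on-disk structure), and `a` and `b` are the positions of the join attributes within the tuples.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b], probing an in-memory index on s[b]."""
    index = defaultdict(list)  # stands in for an existing index or hash key on B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                         # retrieve each record of r one at a time
        for ts in index.get(tr[a], []):  # fetch only the matching records of s
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b")]
s = [(2, "x"), (2, "y"), (3, "z")]
print(index_nested_loop_join(r, s, 0, 0))  # [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```

Unlike the nested loop, s is never scanned in full for each record of r; only the matching records are touched.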
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
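The merge step can be sketched as below, assuming both inputs are Python lists of tuples already sorted on the join attribute (positions `a` and `b`). The group of s tuples sharing one join value is held in a list slice, which mirrors the assumption that such a group fits in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join two lists of tuples already sorted on r[a] and s[b]."""
    result = []
    i = j = 0  # one pointer per relation, both starting at the first tuple
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the end of the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards, which is why each tuple is read only once.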
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
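A small in-memory sketch of the idea is given below. The bucket count `n_buckets` and the use of Python's built-in `hash` are assumptions for illustration; the point is that both relations are distributed with the same function, so matching records can only meet inside the same bucket.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r[a] = s[b] by hashing both into shared buckets."""
    buckets = defaultdict(lambda: ([], []))  # bucket id -> (r tuples, s tuples)
    for tr in r:                             # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                             # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:  # same bucket does not imply same value
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (10, "z")]
hashed = sorted(hash_join(r, s, 0, 0))
```

With four buckets, the values 2, 6, and 10 all land in the same bucket, so the equality check inside each bucket is still required.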
Query Optimization: Complex JOIN Operation
The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions, such as natural joins or equi-joins.
A join with a complex join condition (i.e., a conjunctive or disjunctive condition) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
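The two steps described above can be sketched as follows. The function name `conjunctive_join` and the representation of conditions as Python predicates are assumptions; the first step uses a plain nested loop only for brevity, and any applicable join technique could take its place.

```python
def conjunctive_join(r, s, conditions):
    """Join r and s on the conjunction of all predicates in `conditions`."""
    first, *rest = conditions
    # Step 1: compute the simpler join on the first condition alone.
    intermediate = [(tr, ts) for tr in r for ts in s if first(tr, ts)]
    # Step 2: keep the intermediate tuples that satisfy the remaining conditions.
    return [tr + ts for tr, ts in intermediate
            if all(theta(tr, ts) for theta in rest)]


r = [(1, 5), (2, 7)]
s = [(1, 3), (2, 9)]
conditions = [
    lambda tr, ts: tr[0] == ts[0],  # theta_1: equality, suited to an efficient join
    lambda tr, ts: tr[1] > ts[1],   # theta_2: checked on the intermediate result
]
print(conjunctive_join(r, s, conditions))  # [(1, 5, 1, 3)]
```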
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$
The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
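A sketch of the union approach is shown below, again with conditions as hypothetical Python predicates and a nested loop standing in for each individual join. A set is used for the union so that a pair satisfying several conditions appears only once.

```python
def disjunctive_join(r, s, conditions):
    """Join r and s on a disjunction of predicates as a union of individual joins."""
    result = set()  # set union drops pairs matched by more than one condition
    for theta in conditions:
        for tr in r:
            for ts in s:
                if theta(tr, ts):
                    result.add(tr + ts)
    return result


r = [(1, 5), (2, 7)]
s = [(1, 5), (3, 7)]
conditions = [
    lambda tr, ts: tr[0] == ts[0],  # theta_1
    lambda tr, ts: tr[1] == ts[1],  # theta_2
]
union_result = disjunctive_join(r, s, conditions)
```

Here the pair (1, 5) with (1, 5) satisfies both conditions but is kept once, while (2, 7) with (3, 7) is contributed by the second condition only.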
Query Optimization: Project Operation
A project operation $\Pi_{\langle\text{attribute-list}\rangle}(R)$ is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.
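Both ways of eliminating duplicates can be sketched in a few lines. The function names and the representation of a relation as a list of tuples (with `attrs` giving attribute positions) are assumptions for illustration.

```python
def project_sort(relation, attrs):
    """Project onto attrs, then sort and drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:  # duplicates are adjacent after sorting
            result.append(row)
    return result


def project_hash(relation, attrs):
    """Project onto attrs, using a hash set to skip rows already seen."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


relation = [(1, "x", "p"), (2, "y", "p"), (3, "x", "p")]
print(project_sort(relation, (1, 2)))  # [('x', 'p'), ('y', 'p')]
print(project_hash(relation, (1, 2)))  # [('x', 'p'), ('y', 'p')]
```

Projecting onto positions 1 and 2 drops the key in position 0, so the first and third rows collapse into one.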
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
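The sort-then-scan approach can be sketched as a single merge pass that produces all three results at once. The function name `sorted_set_ops` is hypothetical, and the inputs are assumed to be small in-memory collections of comparable values.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, difference r - s) from one scan of sorted inputs."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:        # value only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:      # value only in s
            union.append(s[j])
            j += 1
        else:                  # value in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])        # leftovers of r belong to the union and to r - s
    diff.extend(r[i:])
    union.extend(s[j:])        # leftovers of s belong to the union only
    return union, inter, diff


print(sorted_set_ops([3, 1, 2], [2, 3, 4]))  # ([1, 2, 3, 4], [2, 3], [1])
```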
Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
(the numbers of blocks holding s and r), assuming that the set of all tuples with the same value for the join attributes fits in main memory.
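The pointer-based merge described above can be sketched in Python as follows. This is a minimal in-memory illustration, not the slides' own algorithm text: relations are assumed to be lists of dicts already sorted on their join attributes, and the function name merge_join and the sample data are hypothetical.

```python
def merge_join(r, s, key_r, key_s):
    """Join two lists of dicts that are already sorted on their join keys."""
    result = []
    i = j = 0  # one pointer per relation, both starting at the first tuple
    while i < len(r) and j < len(s):
        kr, ks = r[i][key_r], s[j][key_s]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s-tuples sharing this key value
            # (assumed to fit in main memory, as the slide requires).
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == kr:
                j_end += 1
            # Pair every r-tuple with this key against the buffered group.
            while i < len(r) and r[i][key_r] == kr:
                for t in s[j:j_end]:
                    result.append({**r[i], **t})
                i += 1
            j = j_end
    return result


# Hypothetical sample data, sorted on Dnum and Dnumber respectively.
projects = [
    {"Pnumber": 10, "Dnum": 1},
    {"Pnumber": 20, "Dnum": 2},
    {"Pnumber": 30, "Dnum": 2},
]
departments = [
    {"Dnumber": 1, "Dname": "HQ"},
    {"Dnumber": 2, "Dname": "Research"},
    {"Dnumber": 4, "Dname": "Admin"},
]
joined = merge_join(projects, departments, "Dnum", "Dnumber")
```

Each pointer only moves forward, which is why every tuple (and hence every block) is read once.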
Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
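A rough Python sketch of the hash-join idea, under the assumption that both relations fit in memory as lists of dicts. The function name hash_join and the parameter n_buckets are illustrative choices, and Python's built-in hash stands in for the hashing function.

```python
from collections import defaultdict


def hash_join(r, s, key_r, key_s, n_buckets=8):
    """Hash both relations into shared buckets, then match within each bucket."""
    # Each bucket holds the r-records and the s-records that hashed to it.
    buckets = defaultdict(lambda: ([], []))
    for t in r:  # single pass over r
        buckets[hash(t[key_r]) % n_buckets][0].append(t)
    for t in s:  # single pass over s
        buckets[hash(t[key_s]) % n_buckets][1].append(t)

    result = []
    for r_part, s_part in buckets.values():
        # Different key values can share a bucket, so compare the values.
        for tr in r_part:
            for ts in s_part:
                if tr[key_r] == ts[key_s]:
                    result.append({**tr, **ts})
    return result


# Hypothetical sample data.
hj_projects = [{"Pnumber": 10, "Dnum": 1}, {"Pnumber": 20, "Dnum": 2}]
hj_departments = [{"Dnumber": 2, "Dname": "Research"}, {"Dnumber": 3, "Dname": "Admin"}]
hj_result = hash_join(hj_projects, hj_departments, "Dnum", "Dnumber")
```

Only records that land in the same bucket are ever compared, which is what saves work relative to a nested loop.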
Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
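The strategy for a conjunctive condition can be sketched in Python as below, assuming θ1 is an equality condition that a hash lookup can handle and that the remaining conditions are given as predicates. The names conjunctive_join, equi_keys, and remaining are hypothetical.

```python
def conjunctive_join(r, s, equi_keys, remaining):
    """Compute r join s on theta1 (an equality), then filter by the other conditions."""
    key_r, key_s = equi_keys
    # Simpler join first: index s on its join attribute.
    index = {}
    for ts in s:
        index.setdefault(ts[key_s], []).append(ts)
    intermediate = [(tr, ts) for tr in r for ts in index.get(tr[key_r], [])]
    # Keep only intermediate tuples that satisfy every remaining condition.
    return [
        {**tr, **ts}
        for tr, ts in intermediate
        if all(theta(tr, ts) for theta in remaining)
    ]


# Hypothetical data: theta1 is a = b, theta2 is x < y.
cj_r = [{"a": 1, "x": 5}, {"a": 2, "x": 9}]
cj_s = [{"b": 1, "y": 7}, {"b": 2, "y": 3}]
cj_result = conjunctive_join(cj_r, cj_s, ("a", "b"), [lambda tr, ts: tr["x"] < ts["y"]])
```

In practice it usually pays to pick as θ1 the condition expected to give the smallest intermediate result.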
Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s, for i = 1, …, n.
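A small Python sketch of the disjunctive case, assuming each individual join is computed with a plain nested loop and that the union is a set union, so a pair satisfying several conditions appears once. The function names are illustrative only.

```python
def theta_join_pairs(r, s, theta):
    """Nested-loop join returning the index pairs (i, j) that satisfy theta."""
    return {
        (i, j)
        for i, tr in enumerate(r)
        for j, ts in enumerate(s)
        if theta(tr, ts)
    }


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition theta_i."""
    pairs = set()
    for theta in thetas:
        pairs |= theta_join_pairs(r, s, theta)
    return [{**r[i], **s[j]} for i, j in sorted(pairs)]


# Hypothetical data and conditions.
dj_r = [{"a": 1}, {"a": 2}]
dj_s = [{"b": 1}, {"b": 3}]
dj_thetas = [
    lambda tr, ts: tr["a"] == ts["b"],
    lambda tr, ts: tr["a"] + 1 == ts["b"],
    lambda tr, ts: tr["a"] < ts["b"],
]
dj_result = disjunctive_join(dj_r, dj_s, dj_thetas)
```

Tracking index pairs rather than merged tuples keeps the duplicate elimination independent of the attribute values.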
Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
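Both duplicate-elimination approaches can be illustrated with a short Python sketch. It assumes rows are dicts held in memory; project_sort and project_hash are hypothetical names, and a Python set plays the role of the hash table.

```python
def project_sort(rows, attrs):
    """Project, sort, then drop adjacent duplicates in one scan."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(rows, attrs):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical data: Dno is not a key of Employee, so duplicates appear.
pj_employees = [
    {"Ssn": "111", "Lname": "Smith", "Dno": 5},
    {"Ssn": "222", "Lname": "Wong", "Dno": 4},
    {"Ssn": "333", "Lname": "Lee", "Dno": 5},
]
by_sort = project_sort(pj_employees, ["Dno"])
by_hash = project_hash(pj_employees, ["Dno"])
```

The sort-based version returns the result in sorted order, while the hash-based version preserves first-seen order.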
Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
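The sort-then-scan approach for union, intersection, and difference can be sketched in Python as follows. The sketch assumes both inputs are already sorted lists of comparable tuples without duplicates; the function name sorted_set_ops is hypothetical.

```python
def sorted_set_ops(r, s):
    """Single merge-style scan producing (r union s, r intersect s, r minus s)."""
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            # Present only in r.
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:
            # Present only in s.
            union.append(s[j])
            j += 1
        else:
            # Present in both relations.
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    # Whatever remains in r belongs to the union and the difference.
    union.extend(r[i:])
    diff.extend(r[i:])
    # Whatever remains in s belongs to the union only.
    union.extend(s[j:])
    return union, inter, diff


# Hypothetical single-attribute relations.
so_union, so_inter, so_diff = sorted_set_ops([1, 3, 5, 7], [3, 4, 7, 9])
```

All three results come out of the same scan, and each is itself sorted.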
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}(Project × (Department × Employee)))
Query Optimization: An Example
[Figure: query tree for the expression above. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}; below it: × with children Project and a second ×, whose children are Department and Employee.]
Query Optimization: An Example
The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
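The size estimate above can be checked with a few lines of Python. The dictionary names are illustrative; the figures are the ones given on the slide.

```python
import math

# Cardinalities and tuple sizes (in bytes) as stated on the slide.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = math.prod(tuple_counts.values())
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes

# The inner product Department x Employee on its own.
inner_tuples = tuple_counts["Department"] * tuple_counts["Employee"]
inner_tuple_bytes = tuple_bytes["Department"] + tuple_bytes["Employee"]
```

With these figures the full product holds 10,000,000 tuples of 300 bytes, about 3 × 10^9 bytes in total, while the inner product alone already holds 100,000 tuples of 200 bytes.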
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{Mgr_ssn=Ssn} (Employee))
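Query Optimization: An Example
[Figure: query tree for the expression above. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: ⋈_{Mgr_ssn=Ssn} with children a lower join and Employee; the lower join is ⋈_{Dnum=Dnumber} with children σ_{Plocation='California'}(Project) and Department.]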
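To make the difference between the two plans concrete, the following Python sketch evaluates both on a tiny hypothetical instance and reports the largest intermediate result each one builds. The sample rows, the function names, and the attribute spelling Mgr_ssn are assumptions made for illustration; the lookups further assume that Dnumber and Ssn are keys.

```python
from itertools import product

ATTRS = ["Pnumber", "Dnum", "Lname", "Address", "Bdate"]


def naive_plan(project, department, employee):
    """Cartesian products first, then one big selection, then projection."""
    cross = [{**p, **d, **e} for p, d, e in product(project, department, employee)]
    rows = [
        t for t in cross
        if t["Plocation"] == "California"
        and t["Dnum"] == t["Dnumber"]
        and t["Mgr_ssn"] == t["Ssn"]
    ]
    return [tuple(t[a] for a in ATTRS) for t in rows], len(cross)


def pushed_down_plan(project, department, employee):
    """Selection on Project first, then the two joins, then projection."""
    selected = [p for p in project if p["Plocation"] == "California"]
    dept_by_number = {d["Dnumber"]: d for d in department}  # Dnumber assumed a key
    emp_by_ssn = {e["Ssn"]: e for e in employee}            # Ssn assumed a key
    step1 = [{**p, **dept_by_number[p["Dnum"]]}
             for p in selected if p["Dnum"] in dept_by_number]
    step2 = [{**t, **emp_by_ssn[t["Mgr_ssn"]]}
             for t in step1 if t["Mgr_ssn"] in emp_by_ssn]
    largest = max(len(selected), len(step1), len(step2))
    return [tuple(t[a] for a in ATTRS) for t in step2], largest


# Hypothetical sample instance.
ex_project = [
    {"Pname": "P1", "Pnumber": 1, "Plocation": "California", "Dnum": 1},
    {"Pname": "P2", "Pnumber": 2, "Plocation": "Texas", "Dnum": 2},
    {"Pname": "P3", "Pnumber": 3, "Plocation": "California", "Dnum": 2},
]
ex_department = [
    {"Dname": "HQ", "Dnumber": 1, "Mgr_ssn": "111"},
    {"Dname": "Research", "Dnumber": 2, "Mgr_ssn": "222"},
]
ex_employee = [
    {"Lname": "Smith", "Ssn": "111", "Bdate": "1970-01-01", "Address": "1 A St"},
    {"Lname": "Wong", "Ssn": "222", "Bdate": "1975-05-05", "Address": "2 B St"},
    {"Lname": "Lee", "Ssn": "333", "Bdate": "1980-09-09", "Address": "3 C St"},
]
naive_rows, naive_size = naive_plan(ex_project, ex_department, ex_employee)
pushed_rows, pushed_size = pushed_down_plan(ex_project, ex_department, ex_employee)
```

Both plans return the same rows, but the first materializes every combination of the three relations, while the second never holds more tuples than the selection on Project produces.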
- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
Query Optimization mdash Complex JOIN OperationNow consider the following join operation
The join can be performed as the union of the tuples inindividual joins
Database Systems
106
r θ1orθ2or hellip orθn s
r θi s
107
Query Optimization mdash Project Operation
A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we
may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique
Database Systems
108
Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
A hashing technique is another way to implement the union, intersection, and difference operations.
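The sort-then-scan approach to union, intersection, and difference can be sketched in Python as below. The function name `merge_set_ops` and the sample relations are assumptions for illustration, and the sketch assumes each input is a proper relation (no duplicate tuples) that fits in memory.

```python
def merge_set_ops(r, s):
    """Return (r UNION s, r INTERSECT s, r MINUS s) with one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:        # tuple is in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:       # tuple is only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        else:                   # tuple is only in s
            union.append(s[j])
            j += 1
    # One input is exhausted; the leftovers belong to exactly one relation.
    union.extend(r[i:])
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff


# Hypothetical single-attribute relations.
r = [(3,), (1,), (2,)]
s = [(2,), (4,), (3,)]
union, inter, diff = merge_set_ops(r, s)
```

After the sort, each relation is read exactly once, which is the single scan mentioned above. A hash-based variant would instead build a hash table on one relation and probe it with the tuples of the other.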
Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.
Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)
Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'
Query Optimization: An Example
The above query can be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation="California" and Dnum=Dnumber and MGRSSN=SSN (Project × (Department × Employee)))
Query Optimization: An Example
Query tree for the expression above (root at the top, children indented):
ΠPnumber, Dnum, Lname, Address, Bdate
  σPlocation="California" and Dnum=Dnumber and MGRSSN=SSN
    ×
      Project
      ×
        Department
        Employee
Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5,000), each of 300 bytes (100 + 50 + 150).
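The size estimate for the Cartesian-product plan can be checked with a few lines of Python. The tuple counts and tuple sizes are the ones assumed on the slide; the variable names are made up for this sketch, and the total byte count is a derived figure that the slide does not state.

```python
import math

# Figures assumed on the slide.
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple,
# so the tuple counts multiply and the tuple widths add up.
product_tuples = math.prod(tuple_count.values())
product_tuple_bytes = sum(tuple_bytes.values())

# Derived here, not on the slide: the size of the whole intermediate relation.
product_total_bytes = product_tuples * product_tuple_bytes
```

Under these assumptions the intermediate relation is about 3 × 10^9 bytes, even though the three input relations together hold only 100 × 100 + 20 × 50 + 5,000 × 150 = 761,000 bytes.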
Query Optimization: An Example
However, based on the schemas of the relations, the above query can also be translated into:
ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation="California" (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))
Query Optimization: An Example
Query tree for the rewritten expression (root at the top, children indented):
ΠPnumber, Dnum, Lname, Address, Bdate
  ⋈MGRSSN=SSN
    ⋈Dnum=Dnumber
      σPlocation="California"
        Project
      Department
    Employee
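To make the difference between the two query trees concrete, the following Python sketch evaluates both on a tiny made-up database and records the largest intermediate result each plan builds. All rows, the function names `naive_plan` and `pushed_down_plan`, and the use of simple nested-loop joins are assumptions for illustration only.

```python
from itertools import product

# Hypothetical toy instances of the three relations (only the needed attributes).
project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 10},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 10},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 20},
]
department = [
    {"Dnumber": 10, "MGRSSN": 111},
    {"Dnumber": 20, "MGRSSN": 222},
]
employee = [
    {"SSN": 111, "Lname": "Lee", "Address": "1 Elm St", "Bdate": "1970-01-01"},
    {"SSN": 222, "Lname": "Ray", "Address": "2 Oak St", "Bdate": "1980-02-02"},
    {"SSN": 333, "Lname": "Kim", "Address": "3 Ash St", "Bdate": "1990-03-03"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def naive_plan():
    """Cartesian products first, then one big selection, then the projection."""
    rows = [{**p, **d, **e} for p, d, e in product(project, department, employee)]
    kept = [t for t in rows
            if t["Plocation"] == "California"
            and t["Dnum"] == t["Dnumber"]
            and t["MGRSSN"] == t["SSN"]]
    return len(rows), {tuple(t[a] for a in OUT) for t in kept}


def pushed_down_plan():
    """Selection on Project first, then the two joins, then the projection."""
    step1 = [p for p in project if p["Plocation"] == "California"]
    step2 = [{**p, **d} for p in step1 for d in department
             if p["Dnum"] == d["Dnumber"]]
    step3 = [{**pd, **e} for pd in step2 for e in employee
             if pd["MGRSSN"] == e["SSN"]]
    largest = max(len(step1), len(step2), len(step3))
    return largest, {tuple(t[a] for a in OUT) for t in step3}


naive_size, naive_result = naive_plan()
pushed_size, pushed_result = pushed_down_plan()
```

Both plans return the same two result tuples, but the first materializes all 3 × 2 × 3 = 18 combinations, while the second never holds more than 2 tuples at any step. The gap grows with the size of the relations, which is why the optimizer prefers the tree that applies the selection early and replaces Cartesian products with joins.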
- Query Processing and Query Optimization in Centralized Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
- Database Systems
-
Query Optimization mdash An Example
The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes
Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into the following expression, where ⋈ denotes a join on the subscripted condition:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation="california"} (Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{MNGSSN=SSN} (Employee))
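A minimal Python sketch can illustrate why the translated form is cheaper. It assumes the earlier scenario is a Cartesian product of the three relations followed by a single selection. The toy tuples, the helper names (`naive_plan`, `join_plan`, `project_out`), and the nested-loop joins are hypothetical and chosen only for illustration; a real execution engine would choose its own join algorithms.

```python
from itertools import product

# Hypothetical toy instances; attribute names follow the slide.
project = [
    {"Pnumber": 1, "Plocation": "california", "Dnum": 10},
    {"Pnumber": 2, "Plocation": "texas", "Dnum": 20},
    {"Pnumber": 3, "Plocation": "california", "Dnum": 20},
]
department = [
    {"Dnumber": 10, "MNGSSN": "111"},
    {"Dnumber": 20, "MNGSSN": "222"},
]
employee = [
    {"SSN": "111", "Lname": "Lee", "Address": "1 A St", "Bdate": "1970-01-01"},
    {"SSN": "222", "Lname": "Kim", "Address": "2 B St", "Bdate": "1975-02-02"},
    {"SSN": "333", "Lname": "Roy", "Address": "3 C St", "Bdate": "1980-03-03"},
    {"SSN": "444", "Lname": "Fox", "Address": "4 D St", "Bdate": "1985-04-04"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def project_out(rows):
    """Final projection onto the output attributes (sorted for comparison)."""
    return sorted(tuple(r[a] for a in OUT) for r in rows)


def naive_plan():
    """Cartesian product of all three relations, then one selection."""
    combined = [{**p, **d, **e}
                for p, d, e in product(project, department, employee)]
    selected = [r for r in combined
                if r["Plocation"] == "california"
                and r["Dnum"] == r["Dnumber"]
                and r["MNGSSN"] == r["SSN"]]
    return project_out(selected), len(combined)


def join_plan():
    """Selection first, then the two joins, as in the translated expression."""
    calif = [p for p in project if p["Plocation"] == "california"]
    pd = [{**p, **d} for p in calif for d in department
          if p["Dnum"] == d["Dnumber"]]
    pde = [{**r, **e} for r in pd for e in employee
           if r["MNGSSN"] == e["SSN"]]
    # Report the largest intermediate relation that was materialized.
    return project_out(pde), max(len(calif), len(pd), len(pde))


naive_rows, naive_size = naive_plan()
join_rows, join_size = join_plan()
print(naive_rows == join_rows)  # True: both plans return the same answer
print(naive_size, join_size)    # 24 2
```

Both plans produce the same result on this toy data, but the first materializes every combination of tuples before filtering, whereas the second never stores more than the tuples that survive the selection and join conditions.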
Query Optimization: An Example
Query tree (each operator's inputs are indented beneath it):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MNGSSN=SSN}
        ⋈_{Dnum=Dnumber}
            σ_{Plocation="california"}
                Project
            Department
        Employee
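As a rough illustration, the query tree can be held in a small data structure and walked bottom-up to obtain one possible evaluation order. The `Node` class and the `postorder` helper are hypothetical names introduced for this sketch; the slides do not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a query tree: an operator (or base relation) and its inputs."""
    op: str
    children: list = field(default_factory=list)


# The query tree from the slide, root first.
tree = Node("Π Pnumber, Dnum, Lname, Address, Bdate", [
    Node("⋈ MNGSSN=SSN", [
        Node("⋈ Dnum=Dnumber", [
            Node('σ Plocation="california"', [Node("Project")]),
            Node("Department"),
        ]),
        Node("Employee"),
    ]),
])


def postorder(node):
    """List operators so that every input appears before its consumer."""
    order = []
    for child in node.children:
        order.extend(postorder(child))
    order.append(node.op)
    return order


for step, op in enumerate(postorder(tree), start=1):
    print(step, op)
```

The walk visits Project and its selection first, then Department and the first join, then Employee and the second join, and finally the projection at the root.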