part 4 data storage and query. chapter 14 query optimization
Click here to load reader
Post on 18Jan2018
227 views
Embed Size (px)
DESCRIPTION
May 2008Database System Concepts  Chapter 14 Query Optimization 3 §14.1 Overview why optimization needed §14.2 Transformation of relational expressions equivalence rules §14.4 Choice of evaluation plans — Query optimization costbased optimization,§14.2 heuristic optimization, §14.3 Main parts in Chapter 14TRANSCRIPT
PART 4
DATA STORAGE AND QUERY
Database System Concepts  Chapter 14 Query Optimization 
Chapter 14
Query OptimizationDatabase System Concepts  Chapter 14 Query Optimization 
 14.1 Overviewwhy optimization needed 14.2 Transformation of
relational expressionsequivalence rules14.4 Choice of evaluation
plans Query optimization costbased optimization,14.2heuristic
optimization, 14.3
Main parts in Chapter 14
Database System Concepts  Chapter 14 Query Optimization 
 A SQL query statement may corresponds to several equivalent
expressions ( or query evaluation plans ), each of which may have
different costsQuery optimization is needed to choice an efficient
queryevaluation plan with lower costsE.g.1. select *
from r, s
where r.A = s.A
r.A=s.A (r s) is much slower than r s
14.1 Overview
Database System Concepts  Chapter 14 Query Optimization 
 E.g.2 Find the names of all customers who have an account at
any branch located in Brooklyn select customername
from branch, account, depositor
where branch.name=account.name
AND account.number =depositor.number
AND branchcity = Brooklyn
Fig.14.114.1 Overview
Database System Concepts  Chapter 14 Query Optimization 
Fig. 14.1 Equivalent Expression
Database System Concepts  Chapter 14 Query Optimization 
 The procedures of optimizationgenerating the equivalent
expressions/query trees, by transforming of relational algebra
expressions according to equivalence rules in14.2generating
alternative evaluation plans, by annotating the resultant
equivalent expressions with implementation algorithms for each
operations in the expressionschoosing the optimal (that is,
cheapest) or nearoptimal plan based on the estimated cost, by
costbased optimizationheuristic optimization
14.1 Overview (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 The cost of operations is needed to estimate based on
statistical information about relations e.g. number of tuples,
number of distinct values for join attributes, etc.
14.1 Overview (cont.)
Database System Concepts  Chapter 14 Query Optimization 
14.2 Transformation of Relational Expressions
Definition. Two relational algebra expressions are equivalent, if on every legal database instances, the two expression generate the same set of tuplesDatabase System Concepts  Chapter 14 Query Optimization 
Fig.14.0.1 Schema diagram for the banking enterprise
14.2.1 Equivalence Rules
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule1.(, 12) Conjunctive selection operations can be deconstructed into a sequence of individual selectionse.g. refer to Rule2. () Selection operations are commutative :Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule3. () Only the final operation in a sequence of project operations is needednote: it is required that L1 L2 . Lne.g. {loannumber} { {loannumber, amount} (loan) }= {loannumber} (loan)
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule4. () Selections can be combined with Cartesian products and theta joinsa. (E1 E2) = E1 E2
note: definition of join
b. 1(E1 2 E2) = E1 1 2 E2
e.g. branchname=Perryridge( borrower loan) =borrower (branchname=Perryridge) (borrower.loannumbr=loan.loannumber) loan
By Rule4, the number of operations can be reduced, and the costs of the lefthand expressions are less than that of the righthand expressions often used in heuristic query optimizingDatabase System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule5. () Theta join operations are commutativeE1 E2 = E2 E1
Fig. 14.2Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule6. (, associative)a. natural join operations are associative
(E1 E2) E3 = E1 (E2 E3)
b. Theta joins are associative in the following manner:
(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3), where 2 involves attributes from only E2 and E3
Database System Concepts  Chapter 14 Query Optimization 
Fig.14.2 Pictorial Depiction of Equivalence Rules
Rule 5
Rule 6a
Rule 7a
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule7. Selection operation distributes over thetajoin(,)a. when all the attributes in 0 involve only the attributes of one of the expressions , e.g. E1, being joined
0E1 E2) = (0(E1)) E2
b. when 1 involves only the attributes of E1, and 2 involves only the attributes of E21 E1 E2) = (1(E1)) ( (E2))
e.g. in Fig.14.0.2Database System Concepts  Chapter 14 Query Optimization 
 When Rule7 is applied, the size of relations participating in
thetajoin operation is reduced, thus evaluation costs are reduced
account
branch
Fig.14.0.2 An example of Rule7
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule8. Projection operation distributes over thetajoin(,)a. L1 and L2 are the attributes of E1 and E2 respectively. if join condition involves only attributes in L1 L2, thenDatabase System Concepts  Chapter 14 Query Optimization 
{branchname, branchcity} {branchname, amount}
branch
loan
{branchname, branchcity}
{branchname, amount}
branch
loan
Fig.14.0.3 An example of Rule 8a
L1
L2
E1:
branch (branchname, branchcity, assets)
E2:
loan(loannumber, branchname, amount)
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
b. consider a join E1 E2
let L1 and L2 be sets of attributes from E1 and E2, respectivelylet L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, andlet L4 be attributes of E2 that are involved in join condition , but are not in L1 L2.Database System Concepts  Chapter 14 Query Optimization 
{branchcity} { loannumber}
branch
loan
{branchcity, assets}
{loannumber, amount}
branch
loan
Fig.14.0.4 An example of Rule 8b
B. assets> L. amount
L1
L2
L4
L3
B. assets> L. amount
E1:
branch (branchname, branchcity, assets)
E2:
loan(loannumber, branchname, amount)
{branchcity} { loannumber}
Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule9. The set operations union and intersection are commutative
E1 E2 = E2 E1
E1 E2 = E2 E1set difference is not commutative
Rule10. Set union and intersection are associative(E1 E2) E3 = E1 (E2 E3)
(E1 E2) E3 = E1 (E2 E3)Database System Concepts  Chapter 14 Query Optimization 
14.2.1 Equivalence Rules (cont.)
Rule11. () The selection operation distributes over , and . (E1 E2) = (E1) (E2) similarly for and in place of (E1 E2) = (E1) E2 similarly for in place of , but not for Rule12. () The projection operation distributes over unionL(E1 E2) = (L(E1)) (L(E2))
Database System Concepts  Chapter 14 Query Optimization 
14.2.2 Example One
Find the names of all customers who have an account at some branch located in Brooklyn
customername(branchcity = Brooklyn
(branch (account depositor)))Transformation using rule 7a.
customername
((branchcity =Brooklyn (branch))
(account depositor))Performing the selection as early as possible reduces the size of the relation to be joined.Database System Concepts  Chapter 14 Query Optimization 
14.2.2 Example Two
For When we computet (branchcity = Brooklyn (branch) account )
Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get:
we obtain a temporal relation t whose schema is:
T(branchname, branchcity, assets, accountnumber, balance)
customername (
accountnumber t(branchname, branchcity, assets, accountnumber, balance)
depositor(customername, accountnumber))customername((branchcity = Brooklyn (branch) account) depositor)
Database System Concepts  Chapter 14 Query Optimization 
14.2.2 Example Three
Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000.
customername((branchcity = Brooklyn balance > 1000
(branch (account depositor))) Transformation using join associatively (Rule 6a), refer to Fig.14.3Database System Concepts  Chapter 14 Query Optimization 
14.2.2 Example Three (cont.)
Fig. 14.3 Multiple transformations
Database System Concepts  Chapter 14 Query Optimization 
14.2.3 Join Ordering
For all relations r1, r2, and r3,(r1 r2) r3 = r1 (r2 r3 )
If r2 r3 is quite large and r1 r2 is small, we choose
(r1 r2) r3so that we compute and store a smaller temporary relation.
Database System Concepts  Chapter 14 Query Optimization 
 Consider the expression
customername ((branchcity = Brooklyn (branch))
Could compute account depositor first, and join result with
account depositor)
branchcity = Brooklyn (branch)
, but account depositor is likely to be a large relation.Since it is more likely that only a small fraction of the banks customers have accounts in branches located in Brooklyn, it is better to computebranchcity = Brooklyn (branch) account
first.
14.2.3 An Example
Database System Concepts  Chapter 14 Query Optimization 
 How to estimate the cost of an operationestimating the size of
the resultant relations produced by the operation p, according to
statistics in the catalog about the relations on which the
operation is conducted e.g. customername(balance2500(account)
customer)computing the cost of the operation in terms of disk
access, by means of the cost formulae shown in chapter13disk access
is the predominant cost, and the number of block transfers from
disk is used as a measure of the actual cost of evaluation.
14.3 Estimating Statistics of Expression Results
Database System Concepts  Chapter 14 Query Optimization 
 nr: number of tuples in a relation rbr: number of blocks
containing tuples of rlr: size of a tuple of r in bytesfr: blocking
factor of rthe number of tuples of r that fit into one blockV(A,
r): number of distinct values that appear in r for attribute Asame
as the size of A(r)Statistics about indices, such as heights of
B+trees, that is HTi, and the number of leaf pages in the indices,
are also in catalog
14.3.1 Statistics in Catalog for Optimization
Database System Concepts  Chapter 14 Query Optimization 
 If tuples of r are stored together physically in a file, then:
Statistics in Catalog for Optimization (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 An example for statistical informationfaccount= 20 (20 tuples
of account fit in one block)V(branchname, account) = 50 (50
branches)V(balance, account) = 500 (500 different balance
values)naccount = 10000 (account has 10,000 tuples)assume the
following indices exist on accounta primary, B+tree index for
attribute branchnamea secondary, B+tree index for attribute
balance
Statistics in Catalog for Optimization (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 The size estimation of the result of a selection operation
depends on the selection predicates, that is single equality
predicate, single comparison predicate, and combination of
predicatesEquality selection A=a (r)assuming uniform distribution
of values in relations, and the value a appears in attribute A of
some record in r the selection is estimated to have nr / V(A, r)
tuples nr /V(A, r) /fr blocks that these tuples occupy
14.3.2 Selection Size Estimation
Database System Concepts  Chapter 14 Query Optimization 
 Comparison selections AV(r) the lowest and highest values of
attributes, that is min(A, r) and max(A, r), are available in
catalogthe estimated number of records/tuples satisfying comparison
condition A V0, if v < min(A, r)nr, if v max(A, r) , otherwise
in absence of statistical information cost is assumed to be
nr / 2
14.3.2 Selection Size Estimation (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 The selectivity of a condition i is the probability that a
tuple in the relation r satisfies i . If si is the number of
satisfying tuples in r, the selectivity of i is given by si
/nr.Conjunction: 1 2. . . n (r). The estimate for number of
tuples in the result is:
Disjunction:1 2 . . . n (r). Estimated number of tuples:
Negation: (r). Estimated number of tuples:
nr size((r))14.3.2 Selection Size Estimation (cont.)
Database System Concepts  Chapter 14 Query Optimization 
14.4 Choice of Evaluation Plans
How to generate the evaluation plan for an relational algebra expression with lower or the lowest costgenerating of the optimized query treesselecting implementation algorithms for each operations in the treepipeline or materialization strategy for the expressionE.g. Fig.14.5
Query optimizationDatabase System Concepts  Chapter 14 Query Optimization 
Fig.14.5 An evaluation plan
Database System Concepts  Chapter 14 Query Optimization 
 Query optimization choosing the evaluation plan with lowest or
the lower costQuery optimization is conducted by two
strategies/approachescostbased optimization,14.4.2heuristic
optimization, 14.4.3!Practical query optimizers incorporate
elements of these two approaches
14.4 Choice of Evaluation Plans
(cont.)Database System Concepts  Chapter 14 Query Optimization 
 Costbased optimizationsearch all the possible plans and choose
the best plana optimal plan with the minimum costs can be
obtainedCostbased optimization is expensive, though results
optimal plan
14.4.2 Costbased Optimization
Database System Concepts  Chapter 14 Query Optimization 
 Heuristic optimizationuses heuristics (or heuristic rules) to
choose a better plana suboptimal plan with lower costs can be
obtainedPrinciples: transforms the querytree by heuristics to
reduce costperform selection early to reduce the number of
tuplesperform projection early to reduces the number of
attributessubstitute Cartesian product and selection with join
14.4.3 Heuristic Optimization
Database System Concepts  Chapter 14 Query Optimization 
 perform most restrictive selection and join operations before
other similar operationsthe most restrictive operations generate
resulting relations with smallest size
14.4.3 Heuristic Optimization (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 Step1 rule1conjunctive selection Step2 rule2, rule7.a, rule7.b,
rule11 Step3 rule6restrictive selectionrestrictive selection refer
to Fig. 14.3
Steps in Heuristic Optimization
Database System Concepts  Chapter 14 Query Optimization 
 Step4 rule4.a, Step5 rule3, 8.a, 8.b,12, Step6 e.g. Fig.14.5
5
Steps in Heuristic Optimization
(cont.)
Database System Concepts  Chapter 14 Query Optimization 
 Given S(S#, sname, age, sex)C(C#, cname, teacher)SC(S#, C#,
grade) Find all the students, who are male, and get grades more
than 90 when they learn some one courseStep1. SQL statement SELECT
DISTINCT sname
FROM S, SC
WHERE S.s# = SC.s# AND sex=M AND grade > 90
join condition
selection conditions
Example One
Database System Concepts  Chapter 14 Query Optimization 
 Principles
select A1, A2, , An
from r1, r2, , rn
where P
corresponds to
A1, A2, , An (P ( r1 r2 rn ) )
Example One (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 Step2. Initial Relational algebra expression E1 =
sname( s.s#=sc.s# sex=M grade>90 (S SC ) )
Step3. Initial query tree Fig. 14.0.5 (a), E1Step4. By means of Rule1, Rule7b, distribute operations over relation S and SC Fig. 14.0.5 (b), E2Example One (cont.)
Database System Concepts  Chapter 14 Query Optimization 
sname
grade>90
sex=M
S
SC
bE2
sname
S
(S#, sname, age, sex)
SC (S#, C#, grade)
(a) Initial query tree
E1
s.s#=sc.s# sex=M grade>90
s.s#=sc.s#
Database System Concepts  Chapter 14 Query Optimization 
 Step5. By means of Rule4.a, replace selection and Cartesian
product with natural join Fig. 14.0.5 (c), E3Step6. !! By means of
Rule8b, distribute operations over relation S and SCFig. 14.0.5
(d),The final optimized expression E4 that is equivalent to initial
expression E1 is sname {sname, S# (sex=M (S))
grade>90 (grade, S# (SC)) }
Example One (cont.)
Database System Concepts  Chapter 14 Query Optimization 
cE3
s.s#=sc.s#
sname, S#
SC(S#, C#, grade)
dE4
sname
grade>90
sex=M
S
(S#, sname, age, sex)
SC (S#, C#, grade)
s.s#=sc.s#
sname
sex=M
grade>90
S#
S
(S#, sname, age, sex)
Database System Concepts  Chapter 14 Query Optimization 
 Consider the following insurance database, where the primary
keys are underlined, Person(driverid, name, address) Car(license,
model, year)Accident(reportnumber, date,
location)Participated(driverid , license, reportnumber,
damageamount)
Example Two
Database System Concepts  Chapter 14 Query Optimization 
 ProblemsGive a SQL statement to find out the driver name,
license, reportnumber of the accidents which happened in Beijing
and before May 3, 2008Give the initial query tree for this query,
and construct an optimized and equivalent relational algebra
expression for it by means of heuristic optimization
Example Two (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 Step1. SQL statement Select name, license,
Accident.reportnumber
From Person, Participate, Accident
Where Person.driver_id = Participate.driver_id AND
Accident.report_number = Participate.report_number
AND Location=Beijing AND date < 5/3/2008
Step2. Initial expressionname, license, Accident.reportnumber
( Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2003
(Person Participate Accident )
Example Two (cont.)
Database System Concepts  Chapter 14 Query Optimization 
name, license, Accident.reportnumber
Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2003
Person
Participate
Accident
(a) initial query tree
Database System Concepts  Chapter 14 Query Optimization 
name, license, Accident.reportnumber
Accident.report_number
Person
(driverid, name, address)
Participate
(driverid, license, reportnumber, damageamount)
Accident
(reportnumber, date ,location)
(b) optimal query tree
Accident.report.number = Participate.report_number
Person.driver_id = Participate.driver_id
driver_id,name
driver_id,license,
reportnumber
Location=BeijingAND date < 5/3/2003
name, license,
reportnumber
Database System Concepts  Chapter 14 Query Optimization 
 Consider the following relations in banking enterprise
database, where the primary keys are underlinedbranch (branchname,
branchcity, assets), loan ( loannumber, branchname ,
amount)borrower( customername, loannumber, borrowdate)customer
(customername, customerstreet, customercity)account
(accountnumber, branchname, balance)depositor (customername,
accountnumber , depositdate)
Example Three
Database System Concepts  Chapter 14 Query Optimization 
 For the query Find the names of all customers who have an loan
at any branch that is located in Brooklyn and have assets more than
$100,000, requiring that loanamount is less than $1000give an SQL
statement for this query given a initial query tree for the query,
and convert it into an optimized query tree by means of heuristic
optimization
Example Three (cont.)
Database System Concepts  Chapter 14 Query Optimization 
 SQL
select customername
from borrower, loan, branch
where loan.loannumber=borrower.loannumber
and branch.branchname=loan.branchname
and branchcity=Brooklyn
and assets>100000 and amount
customername
loan.loannumber
customer name, loannumber
borrower ( customername,
loannumber, borrowdate)
loan.loannumber
,branchname
amount< 100loan( loannumber, branchname , amount)
branchname
branchcity=BrooklynAND assets>100000
Branch (branchname,
branchcity, assets)
Database System Concepts  Chapter 14 Query Optimization 
Employee(Employee#, Name, Address, SuperE#, ED#)
Department(D#, Dname, manager#, departlocation)
Project(P#, Pname, Plocation, PD#)
P# PD#, NameAddressSQL Select P#, PD#, Name, Address)From Project, Department, Employee
Where Plocation = AND PD#= D#
AND manager# = Employee#
Example Four (cont.)
Database System Concepts  Chapter 14 Query Optimization 
P#, PD#, Name, Address
P#, PD#, manager#
employee#, name, address
Employee
P#, PD#
Plocation=Project
D#, manager#
Department
Database System Concepts  Chapter 14 Query Optimization 
 Consider the following schema, where the primary keys are
underlined , Suppliers (supplierid, suppliername, city,
telephone, address) Parts ( partsid, partsname, partscolor)
Catalog (supplierid, partsid, price ) . (1) Give an SQL statement
to find out the name and telephone of the suppliers who supply a
red part whose price is below $2000. (2) Translate this SQL
statement into an initial query tree, and give an optimized query
tree for it, by means of heuristic query optimization.
Database System Concepts  Chapter 14 Query Optimization 
Appendix A Sybase
Select max(BscpID)From BSC
Database System Concepts  Chapter 14 Query Optimization 
Appendix A Sybase
Database System Concepts  Chapter 14 Query Optimization 
Database System Concepts  Chapter 14 Query Optimization 
 3
delete from loan
where amount between 0 and 50
Appendix A Sybase
Database System Concepts  Chapter 14 Query Optimization 
delete from loan
where amount in (
select amount
from loan
where amount >= 0 and amount
Appendix B
DDBS
DDBS, /. S(S#SNAMEAGESEX) , 104A C(C#CNAMETEACHER), 105B SC(S#C#GRADE), 106A 100bitv = 104 bit/s d=1s DDBSAB 2 MATHDatabase System Concepts  Chapter 14 Query Optimization 
 SQLSELECT S#SNAME
FROM SSC C
WHERE S.S# = SC.S# AND SC.C# = C.C# ()
AND SEX=M AND CNAME= MATH
(query cost)TC (x) =1s + bit 104 bit/s
Appendix B
DDBS ()
Database System Concepts  Chapter 14 Query Optimization 
 1CAATC1 = 1s + 105 100 bit 104 bit/s 103 s 16.7 min2SSCBBTC2 =
2s + 104 +106 100 bit 104 bit/s 10100 s28h
Appendix B
DDBS ()
Database System Concepts  Chapter 14 Query Optimization 
 3ASCS105C#, BC() CNAME=MATH105TC3 2 105 1
2.34BCCNAME=MATH10C#ASSCMATHSC.C#= C#?(S.SEX=M?)
Appendix B
DDBS ()
Database System Concepts  Chapter 14 Query Optimization 
 TC4 2 10 1 20s5ASSC105BBTC5= 1s + 105 100 bit 10 4bit/s 103 s
16.7 m6BCCNAME=MATH10AATC6 = 1s + 10 100 bit 104 bit/s 1s
Appendix B
DDBS ()
Database System Concepts  Chapter 14 Query Optimization 
Appendix C DB2
IBM DB2SQLDB2DB2SQLDB2SET CURRENT QUERY OPTION01DB2/6000 Version 1Database System Concepts  Chapter 14 Query Optimization 
Appendix C DB2
2opt level 53DB2 for Z/OS5Heuristic Rules7Heuristic Rules9Database System Concepts  Chapter 14 Query Optimization 
Appendix C DB2
4SQLSQLCLIDRDADRDA ARExplain TableSELECTOSSQLVisual ExplainwindowswindowsUNIXDB2SQLDADA ARSQLDatabase System Concepts  Chapter 14 Query Optimization 
Appendix C DB2
db2exfmtexplain tableDADA ARdb2explnstatic packageCLIDRDA ARDatabase System Concepts  Chapter 14 Query Optimization 
Have
a
break
Database System Concepts  Chapter 14 Query Optimization 
)
(
))
))
(
(
(
(
1
2
1
E
E
L
Ln
L
L
P
=
P
P
P
K
K
)))
(
(
))
(
((
)
.....
(
2
......
1
2
1
4
2
3
1
2
1
2
1
E
E
E
E
L
L
L
L
L
L
L
L
=
q
q
=
r
f
r
n
r
b
)
,
min(
)
,
max(
)
,
min(
.
r
A
r
A
r
A
v
n
r


n
r
n
r
n
s
s
s
n
*
*
*
*
.
.
.
2
1

*
*

*


*
)
1
(
...
)
1
(
)
1
(
1
2
1
r
n
r
r
r
n
s
n
s
n
s
n
))
(
(
)
(
2
1
2
1
E
E
q
q
q
q
s
s
s
=
))
(
(
))
(
(
1
2
2
1
E
E
q
q
q
q
s
s
s
s
=
))
(
(
))
(
(
)
(
2
......
1
2
.......
1
2
1
2
1
E
E
E
E
L
L
L
L
=
q
q