cse441w database systems - dr. ali r....

A.R. Hurson, 323 CS Building, hurson@mst.edu
Query Processing and Query Optimization in Centralized Database Systems


Query processing is defined as the activities involved in parsing, validating, translating, optimizing, and executing a query.

The aims of query processing are to transform a query written in a high-level language (such as SQL) into a correct and efficient execution strategy expressed in a low-level language, and to execute that strategy to generate the result.

Database Systems

Query processing
Query processing involves three steps:
  • Parsing, validation, and translation
  • Optimization
  • Evaluation (execution)

Query processing
[Figure: query-processing pipeline. Query → Parser & Translator → Internal Representation → Optimizer (consulting statistics about the data) → Execution Plan → Execution Engine (accessing the database) → Query Output]

Query processing: An Example

SELECT balance
FROM account
WHERE balance < 2500

Query processing: An Example
σ_{balance < 2500}(Π_{balance}(account))
or
Π_{balance}(σ_{balance < 2500}(account))

Note: there may be several ways to define and execute a query. It is the role of the optimizer to select an efficient one. The optimizer therefore needs to determine the different ways (plans) in which the query can be executed, estimate the execution cost of each plan, and then choose the most cost-effective plan for execution.
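The equivalence of the two expressions can be checked on a toy relation. The sketch below is only illustrative: the sample rows, the `account_no` attribute, and the helper names `select` and `project` are assumptions, not part of the slides.

```python
# Hypothetical sample data; only the attribute name `balance` comes from the slides.
account = [
    {"account_no": "A-101", "balance": 500},
    {"account_no": "A-102", "balance": 4000},
    {"account_no": "A-103", "balance": 2499},
]

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Relational projection with duplicate elimination (input order preserved)."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def below_2500(row):
    return row["balance"] < 2500

plan_a = select(project(account, ["balance"]), below_2500)  # project, then select
plan_b = project(select(account, below_2500), ["balance"])  # select, then project
print(plan_a == plan_b)
```

Both plans return the same tuples; what differs is the size of the intermediate result each one has to carry, which is what the optimizer's cost estimate is meant to capture.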

Query processing: An Example
  • Factors such as the number of disk accesses and CPU time must be taken into consideration to estimate the cost of a plan.
  • In large databases, however, disk accesses (the number of data block transfers) are usually the dominant cost factor. Hence the number of block transfers can be used as the cost metric.

Query processing: An Example
  • To simplify the cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).
  • For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.

Query processing: An Example
  • One also needs to distinguish between the number of data blocks read and the number written.
  • Techniques such as pipelining and parallelism can be applied to execute the basic operations, if the underlying platform supports them.
  • Different algorithms can be developed to execute the basic operations.

Query processing: An Example
[Figure: query tree over the relation Account applying σ_{balance < 2500} and Π_{balance}]

  • Query optimization is the activity of choosing an efficient execution strategy for processing a query.
  • Query optimization can be carried out in two ways: statically or dynamically.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):

  • One option is to carry out the decomposition and optimization dynamically, every time the query is run.

  • The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization
  • In general, optimization is required in such a system if the system is expected to meet acceptable objectives (e.g., performance).
  • It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

Query Optimization
  • The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
  • An optimizer might actually do better than a human programmer, since:
      • An optimizer has a wealth of information available to it that human programmers typically do not have.
      • If the database statistics change drastically, an optimizer may choose a different strategy.
      • An optimizer can potentially consider several strategies for a given request.
      • The optimizer is written by an expert.


Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 who work on a project named "Aquarius".

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example
[Figure: initial query tree, equivalent to the expression below]
Π_{Lname}(σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}((Employee × Works_on) × Project))

Query Optimization: Running Example
  • Executing the previous query tree generates a very large relation, because Cartesian products are performed on the input relations.
  • It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

[Figure: query tree with the Select operations moved down toward the base relations, equivalent to the expression below]
Π_{Lname}(σ_{Pnumber = Pno}(σ_{Essn = Ssn}(σ_{Bdate > '1957-12-31'}(Employee) × Works_on) × σ_{Pname = 'Aquarius'}(Project)))

Query Optimization: Running Example
On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of operations on the input relations, so that the most restrictive Select is applied first.

[Figure: query tree with Project and Employee interchanged, equivalent to the expression below]
Π_{Lname}(σ_{Essn = Ssn}(σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(Project) × Works_on) × σ_{Bdate > '1957-12-31'}(Employee)))

Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

[Figure: query tree with each Cartesian product and its Select replaced by a Join, equivalent to the expression below]
Π_{Lname}((σ_{Pname = 'Aquarius'}(Project) ⋈_{Pnumber = Pno} Works_on) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(Employee))

Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping only the attributes that are needed for correct execution of the query.

[Figure: query tree with Project operations moved down the tree, equivalent to the expression below]
Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname = 'Aquarius'}(Project)) ⋈_{Pnumber = Pno} Π_{Essn, Pno}(Works_on)) ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(Employee)))

[Figure: the initial query tree and the final optimized query tree, shown side by side for comparison]

[Figure: phases of query processing. Query → Query Decomposition (using the System Catalog) → Relational Algebra Expression → Query Optimization (using Database Statistics) → Execution Plan → Code Generation → Runtime Execution (against the Database) → Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example
Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example
[Figure: query tree equivalent to the expression below]
Π_{Sname}(σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}(S × SP))

Query Optimization: A Simple Example

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

A ⋈_{S.S# = SP.S#} B

Query Optimization: A Simple Example
Without an optimizer, the system will:
  • Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, which is too large to be kept in main memory.
  • Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
  • Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
  • Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
  • Joins the result of the previous step with relation S over S#. This involves retrieving only 100 tuples and generates a relation with at most 50 tuples.
  • Projects the result of the last operation over Sname.

Query Optimization: A Simple Example
[Figure: optimized query tree, equivalent to the expression below]
Π_{Sname}(σ_{SP.P# = 'P2'}(SP) ⋈_{S.S# = SP.S#} S)

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples; in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, is a good heuristic.
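The two tuple-I/O figures can be reproduced with a small back-of-the-envelope calculation. This is a sketch under stated assumptions: the product is formed by pairing every supplier tuple with every shipment tuple, written out, and read back for the restriction, while the restricted 50-tuple relation in the second strategy stays in memory. The function names are illustrative.

```python
# Cardinalities taken from the example: 100 suppliers, 10,000 shipments.
CARD_S = 100
CARD_SP = 10_000

def product_then_restrict():
    """Tuple I/Os when the Cartesian product is materialized first (assumed model)."""
    product_size = CARD_S * CARD_SP
    reads_to_build_product = product_size   # every pairing is read
    writes_of_product = product_size        # intermediate result written out
    reads_for_restriction = product_size    # intermediate result read back
    return reads_to_build_product + writes_of_product + reads_for_restriction

def restrict_then_join():
    """Tuple I/Os when the shipments are restricted to part P2 before the join."""
    return CARD_SP + CARD_S                 # one scan of each relation

print(product_then_restrict())  # 3000000
print(restrict_then_join())     # 10100
```

Under these assumptions the ratio is roughly 300 to 1, which is the gap the heuristic exploits.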

Optimization Process
  • Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process
[Figure: query tree, read bottom-up. S and SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result]

Optimization Process
  • Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process
General rule: it is a good idea to perform the restriction before the join, because:
  • It reduces the size of the input to the join operation.
  • It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process
General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because:
  • A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" as soon as any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
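The equivalence of p OR (q AND r) and (p OR q) AND (p OR r) is the distributive law of Boolean algebra, and it can be confirmed exhaustively. The function names below are illustrative only.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)

# Compare the two conditions on all 8 truth assignments.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)  # True
```

Python's `and` stops at the first false operand, which mirrors the point made above: once one conjunct is false, the remaining conjuncts need not be evaluated.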

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be combined into a single projection. (For the original expression to be well formed, the attributes of projection-2 must be a subset of those of projection-1.)

Optimization Process
General rule: a restriction followed by a projection can be converted into a projection followed by a restriction, and vice versa, provided the restriction refers only to attributes that the projection keeps.

Optimization Process
Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP, so every shipment in SP matches a part in P. Therefore the above query is semantically equivalent to

SP [S#]

Optimization Process
  • An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
  • Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
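Rules 1 to 3 can be checked on a small relation. The following is a minimal sketch, assuming a relation stored as a set of tuples; the relation `E`, the predicates `theta1` and `theta2`, and the helper functions are illustrative and do not come from the slides.

```python
# Hypothetical relation E with attributes (a, b, c), stored as a set of tuples.
E = {(1, 10, "x"), (2, 20, "y"), (3, 30, "x"), (4, 40, "y")}
A, B, C = 0, 1, 2  # attribute positions in E

def select(relation, predicate):
    """Selection: keep the tuples satisfying the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, positions):
    """Projection onto the given positions; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

def theta1(t):
    return t[A] > 1

def theta2(t):
    return t[C] == "x"

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(E, lambda t: theta1(t) and theta2(t)) == select(select(E, theta2), theta1)

# Rule 2: selections commute.
rule2 = select(select(E, theta2), theta1) == select(select(E, theta1), theta2)

# Rule 3: projecting onto (a, b) and then onto a equals projecting onto a directly.
# Position 0 in the outer projection refers to column a of the inner result.
rule3 = project(project(E, [A, B]), [0]) == project(E, [A])

print(rule1, rule2, rule3)
```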

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.
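Rule 4 can be illustrated by comparing a selection over a Cartesian product with a theta join computed directly. This is a sketch only: the relations, the join condition, and the function names are assumptions made for the example.

```python
from itertools import product

# Hypothetical relations; each tuple is (key, value).
E1 = [(1, "a"), (2, "b"), (3, "c")]
E2 = [(1, "x"), (3, "y"), (3, "z"), (5, "w")]

def theta(t1, t2):
    """Join condition: the first attributes match."""
    return t1[0] == t2[0]

def select_over_product(r, s, condition):
    """Selection over a Cartesian product: build every pair, then filter."""
    pairs = list(product(r, s))  # |r| * |s| intermediate tuples
    matches = [t1 + t2 for t1, t2 in pairs if condition(t1, t2)]
    return matches, len(pairs)

def theta_join(r, s, condition):
    """Theta join: emit a pair only when it satisfies the condition."""
    return [t1 + t2 for t1 in r for t2 in s if condition(t1, t2)]

filtered, intermediate_size = select_over_product(E1, E2, theta)
joined = theta_join(E1, E2, theta)
print(filtered == joined, intermediate_size, len(joined))
```

The results agree, but the first form materializes all 12 pairs before filtering, whereas the join never stores more than the 3 matching tuples.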

Definition
  • Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

    selectivity = (number of tuples satisfying the condition) / |r(R)|

  • Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

    s = 1 / |r(R)|

The selectivity on an attribute with i distinct values is

    s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

    (1 / i) · |r(R)|
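The selectivity estimates translate directly into a size estimate for an equality selection. The sketch below assumes values are uniformly distributed over the distinct values of the attribute; the cardinality of 5,000 and the 20 distinct values are illustrative, and the function names are hypothetical.

```python
from fractions import Fraction

def selectivity_on_key(cardinality):
    """Equality on a key attribute: at most one tuple out of the whole relation."""
    return Fraction(1, cardinality)

def selectivity_on_attribute(distinct_values):
    """Equality on an attribute with i distinct values (uniform distribution assumed)."""
    return Fraction(1, distinct_values)

def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result."""
    return selectivity * cardinality

cardinality = 5000      # assumed number of tuples in the relation
distinct_values = 20    # assumed number of distinct values of the attribute

print(estimated_matches(cardinality, selectivity_on_key(cardinality)))             # 1
print(estimated_matches(cardinality, selectivity_on_attribute(distinct_values)))   # 250
```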

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
  • All attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

[Figure: the two equivalent query trees for Rule 8. On the left, σ_{θ0} is applied above the join of E1 and E2; on the right, σ_{θ0} is applied to E1 before the join with E2]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
  • The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

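Optimization Process
Rule 13: The projection operation distributes over the set union operation.

Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)

Note: the corresponding identities for set intersection and set difference do not hold in general. Two different tuples of E1 and E2 may agree on the attributes in L, so Π_{L}(E1 ∩ E2) can be a proper subset of Π_{L}(E1) ∩ Π_{L}(E2), and Π_{L}(E1) − Π_{L}(E2) can be a proper subset of Π_{L}(E1 − E2).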
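How projection interacts with the three set operations can be tested on a two-tuple example. In this sketch the relations `E1` and `E2` are hypothetical and chosen so that their tuples agree on the projected attribute but differ elsewhere; with such data, projection distributes over union but not over intersection or difference.

```python
def project(relation, positions):
    """Projection onto the given positions; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}

# Two relations whose tuples share the first attribute but differ in the second.
E1 = {(1, "a")}
E2 = {(1, "b")}
L = [0]  # project onto the first attribute only

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)

print(union_ok, intersection_ok, difference_ok)  # True False False
```

An optimizer can therefore push a projection below a union freely, while for intersection and difference it may do so only when it knows more about the data, for example when L contains a key.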

Optimization Process
  • Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
      • the existence of indexes or other access paths (to reduce I/O cost), and
      • the physical clustering of records (to reduce I/O cost), …
    come into play.

Optimization Process
So, in short, after scanning and parsing:
  • The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
  • An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
  • Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
  • Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

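Definition
  • Materialized evaluation: each operation generates its full intermediate result (relation) before the next operation starts.
  • Pipelined evaluation: several operations are combined so that the tuples produced by one operation are passed directly to the next.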

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
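The difference between the two evaluation styles can be sketched with Python generators. The relations, their attribute layout, and the function names are assumptions made for illustration, and the projection here does not eliminate duplicates.

```python
# Hypothetical relations r(a1, k) and s(k, a2); the join is on the shared attribute k.
r = [("r1", 1), ("r2", 2), ("r3", 2)]
s = [(1, "s1"), (2, "s2")]

def join(left, right):
    """Nested-loop join on k that yields one result tuple at a time."""
    for a1, k_left in left:
        for k_right, a2 in right:
            if k_left == k_right:
                yield (a1, k_left, a2)

def project(rows):
    """Keep only a1 and a2 from each joined tuple."""
    for a1, _k, a2 in rows:
        yield (a1, a2)

# Materialized evaluation: the whole join result is stored before projecting.
materialized_join = list(join(r, s))
materialized_result = list(project(materialized_join))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so the full join result is never stored.
pipelined_result = list(project(join(r, s)))

print(materialized_result == pipelined_result)
```

Both styles produce the same answer; pipelining simply avoids the space and I/O needed to hold the intermediate relation.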

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

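Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree with Π_{Sname} at the root, σ_{bid = 100 ∧ rating > 5} below it, and the join of R and S on Sid = Sid at the bottom]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree with Π_{Sname} at the root, the join on Sid = Sid below it, and the selections σ_{bid = 100} on R and σ_{rating > 5} on S at the leaves]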

  • Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
  • In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query annotated for pipelined execution. The first tree carries two "on the fly" annotations and the second tree carries one.]

Cost of Plan
  • The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
  • Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
  • Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
  • Binary search: use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
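The two search methods can be contrasted on an in-memory stand-in for an ordered file. This is a sketch under assumptions: the records, the key name `ssn`, and the function names are hypothetical, and Python's standard `bisect_left` plays the role of the binary search.

```python
from bisect import bisect_left

# Hypothetical file of records, ordered on the key attribute `ssn`.
records = [{"ssn": ssn, "name": f"emp{ssn}"} for ssn in range(100, 200, 10)]
keys = [record["ssn"] for record in records]

def linear_search(predicate):
    """Test every record in the file; works for any selection condition."""
    return [record for record in records if predicate(record)]

def binary_search(ssn):
    """Equality on the ordering key: about log2(n) probes instead of n."""
    position = bisect_left(keys, ssn)
    if position < len(keys) and keys[position] == ssn:
        return records[position]
    return None

print(binary_search(150))
print(linear_search(lambda record: record["ssn"] == 150))
```

Linear search remains the fallback because it makes no assumption about ordering or about the form of the condition.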

Optimization Process: Search methods for Selection
  • Using a primary index or hash key to retrieve a single record: if the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record. (Note: in this case at most one record is retrieved.)

    σ_{SSN = 123456789}(EMPLOYEE)

  • Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

    σ_{DNUMBER > 5}(DEPARTMENT)

  • Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

    σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

  • A conjunctive selection is of the form σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r).
  • A disjunctive selection is of the form σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r).

  • Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

  • Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition. The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
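Disjunctive selection by union of record pointers can be sketched with dictionary-based indexes. Everything here is illustrative: the sample records, the use of list positions as record pointers, and the helper `build_index` are assumptions.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE records; the list position acts as the record pointer.
employees = [
    {"Ssn": "111", "Dno": 5, "Salary": 30000},
    {"Ssn": "222", "Dno": 4, "Salary": 40000},
    {"Ssn": "333", "Dno": 5, "Salary": 25000},
    {"Ssn": "444", "Dno": 1, "Salary": 55000},
]

def build_index(attribute):
    """Map each attribute value to the set of pointers of records holding it."""
    index = defaultdict(set)
    for pointer, record in enumerate(employees):
        index[record[attribute]].add(pointer)
    return index

dno_index = build_index("Dno")
salary_index = build_index("Salary")

# Selection with the disjunctive condition Dno = 5 OR Salary = 55000:
# scan each index separately, then take the union of the pointer sets.
pointers = dno_index.get(5, set()) | salary_index.get(55000, set())
result = [employees[p]["Ssn"] for p in sorted(pointers)]
print(result)  # ['111', '333', '444']
```

Working on pointer sets means each qualifying record is fetched once, even if it satisfies more than one of the conditions.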

Query Optimization: JOIN Operation

R ⋈_{A = B} S

  • Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested-loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end

Query Optimization: JOIN Operation (nested loop)

  • The nested-loop algorithm examines nr · ns pairs of tuples.
  • In the best-case scenario both relations fit into the physical space (main memory), and hence we need bs + br block accesses.
  • If just one of the relations fits in the physical space, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is br · bs + br.

How can we improve the block nested loop?
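The block nested-loop algorithm and its cost formula can be traced with a small simulation. This is a minimal sketch: relations are lists of tuples split into fixed-size lists that stand in for disk blocks, and the names `to_blocks` and `block_nested_loop_join` are hypothetical.

```python
def to_blocks(tuples, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join block by block and count the simulated block accesses."""
    result = []
    block_accesses = 0
    for block_r in r_blocks:
        block_accesses += 1              # read one block of r
        for block_s in s_blocks:
            block_accesses += 1          # read one block of s
            for t_r in block_r:
                for t_s in block_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)
    return result, block_accesses

r = [(i, f"r{i}") for i in range(6)]        # 6 tuples
s = [(i % 3, f"s{i}") for i in range(8)]    # 8 tuples
r_blocks = to_blocks(r, 2)                  # br = 3
s_blocks = to_blocks(s, 4)                  # bs = 2

result, accesses = block_nested_loop_join(r_blocks, s_blocks, lambda a, b: a[0] == b[0])
print(len(result), accesses)  # 8 9
```

The count of 9 matches br · bs + br = 3 · 2 + 3. Using the relation with fewer blocks as the outer relation lowers the second term.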

Query Optimization: JOIN Operation

r ⋈_{A = B} s

  • Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

Query Optimization: JOIN Operation

  • Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
  • A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
  • Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

    bs + br

    assuming that the set of all tuples with the same value for the join attributes fits in main memory.
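The merge phase can be sketched as follows, assuming both inputs are already sorted on the join attribute and that every group of tuples with the same join value fits in memory. The function name and sample relations are illustrative.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples that are sorted on their join attributes."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1                       # advance the pointer into r
        elif a > b:
            j += 1                       # advance the pointer into s
        else:
            # Collect the group of s tuples sharing this join value.
            group_start = j
            while j < len(s) and s[j][key_s] == a:
                j += 1
            # Pair every r tuple with this join value against the group.
            while i < len(r) and r[i][key_r] == a:
                for t_s in s[group_start:j]:
                    result.append(r[i] + t_s)
                i += 1
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(sort_merge_join(r, s, 0, 0))
```

Each pointer only moves forward, which is why every tuple (and therefore every block) is read once.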

Query Optimization: JOIN Operation

  • Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes its records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
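The hash join can be sketched with an in-memory dictionary standing in for the hash file. The bucket count, the sample relations, and the function name are assumptions made for the illustration.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, buckets=4):
    """Hash both relations into the same buckets, then match within each bucket."""
    # Each bucket holds two lists: the r tuples and the s tuples hashed to it.
    table = defaultdict(lambda: ([], []))
    for t in r:                                   # single pass over r
        table[hash(t[key_r]) % buckets][0].append(t)
    for t in s:                                   # single pass over s
        table[hash(t[key_s]) % buckets][1].append(t)

    result = []
    for r_part, s_part in table.values():
        for t_r in r_part:
            for t_s in s_part:
                # Different values can share a bucket, so the equality is re-checked.
                if t_r[key_r] == t_s[key_s]:
                    result.append(t_r + t_s)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (9, "z")]
print(sorted(hash_join(r, s, 0, 0)))
```

Only tuples that land in the same bucket are ever compared, which is what makes the approach cheaper than comparing every pair.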

Query Optimization: Complex JOIN Operation

  • Nested-loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
  • Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

  • One or more of the join techniques may be applicable for joins on the individual conditions.
  • We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

  • A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
  • If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.

Query Optimization: Set Operations

  • Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
  • The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
  • Hashing is another way to implement the union, intersection, and difference operations.

Questions
  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ Mgr_ssn = Ssn}(Project × (Department × Employee)))

Query Optimization: An Example
[Figure: query tree for the expression above, with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ Mgr_ssn = Ssn} below it, and the Cartesian products of Project, Department, and Employee at the bottom]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
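The size of the Cartesian product follows from multiplying the cardinalities and adding the tuple sizes. A short calculation confirms the figures; the dictionary layout and variable names are only illustrative.

```python
# (number of tuples, tuple size in bytes) for each relation, as given above.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_bytes = 0
for tuple_count, tuple_bytes in relations.values():
    product_tuples *= tuple_count        # cardinalities multiply
    product_tuple_bytes += tuple_bytes   # tuple sizes add up

total_bytes = product_tuples * product_tuple_bytes
print(product_tuples, product_tuple_bytes, total_bytes)
# 10000000 300 3000000000
```

The intermediate relation would occupy about 3 GB, even though the three base relations together hold under 1 MB of data.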

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{Mgr_ssn = Ssn} Employee)

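Query Optimization: An Example
[Figure: optimized query tree, with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, the join with Employee on Mgr_ssn = Ssn below it, then the join with Department on Dnum = Dnumber, and σ_{Plocation = 'California'} applied to Project at the bottom]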

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems



  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems




Query processing
[Figure: Query → Parser & Translator → Internal Representation → Optimizer (consults statistics about the data) → Execution Plan → Execution Engine (accesses the database) → Query Output]

Query processing: An Example
SELECT balance
FROM account
WHERE balance < 2500

This query can be expressed as
σ_{balance < 2500}(Π_{balance}(account))
or
Π_{balance}(σ_{balance < 2500}(account))

Note: there may be several ways to define and execute a query. It is the role of the optimizer to select an efficient one. The optimizer therefore needs to determine the different ways (plans) in which the query can be executed, determine the execution cost of each plan, and then choose the most cost-effective plan for execution.

Query processing: An Example
• Factors such as the number of disk accesses and CPU time must be taken into consideration to estimate the cost of a plan.
• In large databases, however, disk accesses (the number of data block transfers) are usually the dominant cost factor. Hence the number of block transfers can be used as a cost metric.
• To simplify the cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).
• For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.
• One also needs to distinguish between the number of data blocks being read and the number being written.
• Techniques such as pipelining and parallelism can be applied to execute basic operations, if the underlying platform allows it.
• Different algorithms can be developed to execute the basic operations.

[Figure: query tree for the example, with the operators Π_{balance} and σ_{balance < 2500} applied to the Account relation.]

Query optimization is the activity of choosing an efficient execution strategy for processing a query.
Query optimization can be done in two ways: statically or dynamically.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):
• One option is to carry out the decomposition and optimization dynamically, every time the query is run.
• The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization
• In general, optimization is required in such a system if the system is expected to meet acceptable objectives (e.g., performance).
• It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.
• The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
• An optimizer might actually do better than a human programmer, since:
  • An optimizer has a wealth of information available to it that human programmers typically do not have.
  • If the database statistics change drastically, an optimizer may choose a different strategy.
  • An optimizer can potentially consider several strategies for a given request.
  • The optimizer is written by an expert.

Running Example
EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 who work on a project named 'Aquarius'.

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example
Initial query tree:
Π_{Lname}(σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}((Employee × Works_on) × Project))

Executing this query tree generates a very large relation, because it performs Cartesian products on the input relations. It makes sense to perform some Select operations on the base relations before performing the Cartesian products:
Π_{Lname}(σ_{Pnumber = Pno}((σ_{Essn = Ssn}(σ_{Bdate > '1957-12-31'}(Employee) × Works_on)) × σ_{Pname = 'Aquarius'}(Project)))

On closer observation, one should realize that just one tuple of Project is involved in the query. So it makes sense to switch the order of operations on the input relations.

Query Optimization: Running Example
Query tree after switching the order, so that the selection on Project is used first:
Π_{Lname}(σ_{Essn = Ssn}((σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(Project) × Works_on)) × σ_{Bdate > '1957-12-31'}(Employee)))

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation:
Π_{Lname}((σ_{Pname = 'Aquarius'}(Project) ⋈_{Pnumber = Pno} Works_on) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(Employee))

It also makes sense to reduce the size of the intermediate results by keeping only the attributes needed for correct execution of the query:
Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname = 'Aquarius'}(Project)) ⋈_{Pnumber = Pno} Π_{Essn, Pno}(Works_on)) ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(Employee)))

[Figure: the initial query tree and the final query tree shown side by side.]

[Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (accesses the Database) → Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Get the names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Unoptimized query tree:
Π_{Sname}(σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}(S × SP))

Join of S and SP over S.S# = SP.S# (A ⋈_{S.S# = SP.S#} B):
S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example
Without an optimizer, the system will:
• Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
• Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
• Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand:
• Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
• Joins the result of the previous step with relation S over S#. This involves retrieving only 100 tuples and generates a relation with at most 50 tuples.
• Projects the result of the last operation over Sname.

Optimized query tree:
Π_{Sname}(S ⋈_{S.S# = SP.S#} σ_{SP.P# = 'P2'}(SP))

If the number of tuple I/Os is used as the performance measure, the second approach is clearly far faster than the first. In the first case we read and write about 3,000,000 tuples; in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, is a good heuristic.
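The difference between the two strategies can be made concrete with a small simulation. This is a sketch under stated assumptions: the synthetic contents of `S` and `SP` and the function names are hypothetical, and only the cardinalities (100, 10,000, and 50 tuples for part P2) come from the example.

```python
from itertools import product

# Synthetic data with the assumed cardinalities (the values themselves are made up).
S = [(f"S{i}", f"Name{i}") for i in range(100)]
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{3 + i % 7}") for i in range(10_000)]

def naive_plan():
    """Product first, then restriction, then projection."""
    examined, names = 0, set()
    for s, sp in product(S, SP):          # every pair of the Cartesian product
        examined += 1
        if s[0] == sp[0] and sp[1] == "P2":
            names.add(s[1])
    return names, examined

def optimized_plan():
    """Restriction first, then join, then projection."""
    p2 = [sp for sp in SP if sp[1] == "P2"]     # one scan of SP
    by_id = {s[0]: s for s in S}                # one scan of S
    names = {by_id[sp[0]][1] for sp in p2 if sp[0] in by_id}
    return names, len(p2)

naive_names, naive_examined = naive_plan()
fast_names, fast_intermediate = optimized_plan()
print(naive_examined, fast_intermediate, naive_names == fast_names)
```

Both plans return the same supplier names, but the first inspects every pair in the product while the second only carries the restricted tuples into the join.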

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

[Figure: query tree, from the leaves up: S and SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result]

Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: it is a good idea to perform the restriction before the join, because:
• It reduces the size of the input to the join operation.
• It reduces the size of the output from the join.

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
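The conversion to conjunctive normal form can be checked mechanically, and the early-exit property is easy to illustrate. In this sketch the function names and the sample row are hypothetical; it verifies the equivalence over all truth assignments and then evaluates a list of conjuncts, stopping at the first false one.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Exhaustive truth-table check of p OR (q AND r) against (p OR q) AND (p OR r).
equivalent = all(original(p, q, r) == cnf(p, q, r)
                 for p, q, r in product([False, True], repeat=3))

def evaluate_cnf(conjuncts, row):
    """Evaluate conjuncts one by one; a single false conjunct decides the result."""
    for test in conjuncts:
        if not test(row):
            return False
    return True

# Hypothetical restriction: QTY > 100 AND City = 'Paris'.
conjuncts = [lambda row: row["QTY"] > 100, lambda row: row["City"] == "Paris"]
row = {"QTY": 50, "City": "Paris"}
print(equivalent, evaluate_cnf(conjuncts, row))
```

In a parallel setting each conjunct could be handed to a separate worker; the sequential loop here only shows why one false conjunct is enough to reject a tuple.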

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be reduced to a single projection.

General rule: a restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.
(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to
SP [S#]

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Optimization Process
Rule 1: A conjunctive selection can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections only the last (outermost) projection is needed (cascade of projections).
Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Rule 4: A selection applied to a Cartesian product is equivalent to a theta join.
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
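Rule 4 can be tried out on two tiny relations. The sketch below is illustrative only: the relations `E1` and `E2`, the predicates `theta` and `theta1`, and the helper names are all assumptions made for the example.

```python
from itertools import product

E1 = [(1, "a"), (2, "b"), (3, "c")]
E2 = [(2, "x"), (3, "y"), (3, "z")]

def theta(t1, t2):
    """Join condition: first attributes are equal."""
    return t1[0] == t2[0]

def theta1(t1, t2):
    """An extra selection condition on the E2 side."""
    return t2[1] != "y"

def select_over_product(e1, e2, cond):
    """Materialize the Cartesian product, then apply the selection."""
    product_relation = list(product(e1, e2))     # 9 pairs
    return [t1 + t2 for t1, t2 in product_relation if cond(t1, t2)]

def theta_join(e1, e2, cond):
    """Emit only the pairs that satisfy the join condition."""
    return [t1 + t2 for t1 in e1 for t2 in e2 if cond(t1, t2)]

# First form of the rule: selection over a product equals a theta join.
same_result = select_over_product(E1, E2, theta) == theta_join(E1, E2, theta)

# Extended form: a selection over a theta join folds into the join condition.
left = [t for t in theta_join(E1, E2, theta) if theta1(t[:2], t[2:])]
right = theta_join(E1, E2, lambda t1, t2: theta1(t1, t2) and theta(t1, t2))
print(same_result, left == right)
```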

Optimization Process
Rule 5: The theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

For an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) · |r(R)|
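These estimates are simple enough to write down as code. The sketch below assumes uniformly distributed values; the function names are illustrative, and the figure of 200 distinct part numbers is a hypothetical value chosen so that the estimate for a 10,000-tuple relation comes out at 50 tuples.

```python
def selectivity_on_key(cardinality):
    """Equality search on a key attribute: at most one tuple qualifies."""
    return 1 / cardinality

def selectivity_on_attribute(distinct_values):
    """Equality search on an attribute with i distinct, uniformly spread values."""
    return 1 / distinct_values

def estimated_matches(cardinality, distinct_values):
    """Expected number of tuples satisfying the equality condition."""
    return cardinality / distinct_values

print(selectivity_on_key(100))            # selectivity for a key of a 100-tuple relation
print(estimated_matches(10_000, 200))     # hypothetical: 200 distinct part numbers
```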

Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case).
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2.
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: Set union and set intersection are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Rule 11: Set union and set intersection are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over set union.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Unlike selection, projection does not in general distribute over set intersection or set difference; only the containments Π_{L}(E1 ∩ E2) ⊆ Π_{L}(E1) ∩ Π_{L}(E2) and Π_{L}(E1) − Π_{L}(E2) ⊆ Π_{L}(E1 − E2) are guaranteed.
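A quick way to sanity-check distributivity claims for projection is to try them on a tiny example. In this sketch the relations `E1` and `E2` and the helper `project` are hypothetical; the two tuples agree on the projected attribute but differ elsewhere, which is exactly the case where intersection and difference can misbehave.

```python
def project(relation, positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(t[i] for i in positions) for t in relation}

E1 = {(1, "a")}
E2 = {(1, "b")}
L = (0,)   # keep only the first attribute

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)
print(union_ok, intersection_ok, difference_ok)
```

On this data the identity holds for union but fails for intersection and difference, so an optimizer can push a projection through a union safely but needs extra conditions for the other two operations.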

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
• the existence of indexes or other access paths (to reduce I/O cost),
• the physical clustering of records (to reduce I/O cost), …

So, in short, after scanning and parsing:
• the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
• an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves constructing a set of candidate query plans and choosing the best of them.
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

There are two main techniques for query optimization:
• heuristic rules
• systematic cost estimation
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
• Materialized evaluation: the intermediate result of an operation is generated and stored as a relation before the next operation is applied.
• Pipelined evaluation: several operations are combined, so that the results of one are passed directly to the next.

Assume we want to perform
Π_{a1, a2}(r ⋈ s)
• We can perform the join operation, materialize the result, and then apply the projection.
• Alternatively, when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
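The two evaluation styles can be contrasted using Python generators, which hand over one tuple at a time. This is a sketch, not a description of any particular engine: the relations `r` and `s`, their layout (key, a1) and (key, a2), and the function names are assumptions.

```python
r = [(1, "x"), (2, "y"), (3, "z")]      # tuples (key, a1)
s = [(1, 10), (3, 30), (3, 31)]         # tuples (key, a2)

def join(left, right):
    """Yield joined tuples one at a time (natural join on the key)."""
    for key_l, a1 in left:
        for key_r, a2 in right:
            if key_l == key_r:
                yield (key_l, a1, a2)

def project(tuples):
    """Keep only a1 and a2 from each incoming tuple."""
    for _, a1, a2 in tuples:
        yield (a1, a2)

# Materialized evaluation: the join result is stored before projecting.
materialized = list(join(r, s))
result_materialized = list(project(materialized))

# Pipelined evaluation: each joined tuple flows straight into the projection.
result_pipelined = list(project(join(r, s)))
print(result_pipelined)
```

Both variants return the same tuples; the pipelined one simply never builds the intermediate list.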

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree for this expression, with Π_{Sname} at the root, then σ_{bid = 100 ∧ rating > 5}, then the join of R and S on Sid.]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree for this expression, with the selections pushed below the join.]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees above, with the operators that are evaluated as tuples arrive marked "On the fly".]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
• Linear search: retrieve every record in the file and test whether its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
• Binary search: use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
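A minimal in-memory sketch of the two search methods follows. The record layout (key, payload), the sample file, and the probe counter are assumptions added for illustration; a real system would count block accesses rather than record probes.

```python
# A file of 20 records ordered on the key (hypothetical data).
records = [(ssn, f"emp{ssn}") for ssn in range(100, 200, 5)]

def linear_search(file, predicate):
    """Scan every record and keep those that satisfy the condition."""
    return [rec for rec in file if predicate(rec)]

def binary_search(file, key):
    """Equality search on the ordering key; returns (record or None, probes)."""
    lo, hi, probes = 0, len(file) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if file[mid][0] == key:
            return file[mid], probes
        if file[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

found, probes = binary_search(records, 150)
scanned = linear_search(records, lambda rec: rec[0] == 150)
print(found, probes, scanned)
```

The linear search touches all 20 records, while the binary search needs only a handful of probes, at the price of requiring the file to be ordered on the search key.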

Optimization Process: Search methods for Selection
• Using a primary index or hash key to retrieve a single record: if the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the index or hash key to retrieve the record. (Note: in this case at most one record is retrieved.)
σ_{SSN = 123456789}(EMPLOYEE)
• Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent (or preceding) records in the file. (Note: in this case the data is also sorted.)
σ_{DNUMBER > 5}(DEPARTMENT)
• Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)

• Conjunctive selection: a selection of the form
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
• Disjunctive selection: a selection of the form
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection
• Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining conditions.
• Disjunctive selection by union of record pointers: if an access path exists for every attribute involved in the disjunctive selection, each index is scanned for pointers to tuples that satisfy the corresponding individual condition. The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
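The union-of-pointers idea can be sketched with dictionaries standing in for indexes. Everything here is an assumption made for illustration: the `employees` rows, the use of list positions as record pointers, and the helper names.

```python
from collections import defaultdict

employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 40000},
    {"ssn": 4, "dno": 1, "salary": 55000},
]

def build_index(rows, attribute):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for position, row in enumerate(rows):
        index[row[attribute]].add(position)
    return index

dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

def disjunctive_select(rows, lookups):
    """Union the pointers found through each index, then fetch each record once."""
    pointers = set()
    for index, value in lookups:
        pointers |= index.get(value, set())
    return [rows[p] for p in sorted(pointers)]

# dno = 5 OR salary = 40000
matches = disjunctive_select(employees, [(dno_index, 5), (salary_index, 40000)])
print([m["ssn"] for m in matches])
```

Taking the union of pointers before fetching means a record that satisfies several conditions is still retrieved only once.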

Query Optimization: JOIN Operation
Nested loop: to compute R ⋈_{A = B} S, for each record t ∈ R (outer loop) retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A θ s.B} s
A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] θ ts[B] is true then add tr || ts to the result
    end
end

Query Optimization: JOIN Operation (nested loop)
• The cost of the nested loop algorithm, in terms of pairs of tuples examined, is nr · ns.
• In the best case, both relations fit into main memory, and hence we need bs + br block accesses.
• If just one of the relations fits into main memory, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr[A] θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

The cost of block nested loop, in terms of the number of block accesses, is br · bs + br.
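The block access count can be checked with a small simulation. This sketch models a block as a fixed-size slice of a list and counts one access per slice read; the block size, the sample relations, and the function names are assumptions, and an equality condition stands in for θ.

```python
def blocks(relation, block_size):
    """Split a relation into consecutive blocks of block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Join r and s block by block; return (result, number of block reads)."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

r = [(i, f"r{i}") for i in range(8)]       # 8 tuples, 4 blocks of 2
s = [(i % 4, f"s{i}") for i in range(6)]   # 6 tuples, 3 blocks of 2
joined, block_reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
print(len(joined), block_reads)
```

With br = 4 and bs = 3 the simulation performs 4 · 3 + 4 = 16 block reads, in line with the formula br · bs + br.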

How can we improve block nested loop?

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): to compute r ⋈_{A = B} s, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

Sort-merge: if the records of r and s are physically sorted by the values of the join attributes, this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
• A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
• Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
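The merge step can be sketched with two indexes playing the role of the pointers. The code assumes both inputs are already sorted on the join attribute and that each group of equal values fits in memory, as stated above; the function name and sample data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations that are already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                                   # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                                   # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```

Neither pointer ever moves backwards past a group it has finished, which is why each tuple is read only once.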

Query Optimization: JOIN Operation
Hash join: the records of both files r and s are hashed to the same hash file, using the same hash function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result tuples of the join operation.
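A compact in-memory sketch of this description follows. The number of buckets, the use of Python's built-in `hash`, and the function name are assumptions; a real implementation would write the buckets to disk partitions.

```python
from collections import defaultdict

def hash_join(r, s, a, b, buckets=4):
    """Hash both relations into shared buckets, then match within each bucket."""
    table = defaultdict(lambda: ([], []))       # bucket -> (r tuples, s tuples)
    for tr in r:                                # single pass over r
        table[hash(tr[a]) % buckets][0].append(tr)
    for ts in s:                                # single pass over s
        table[hash(ts[b]) % buckets][1].append(ts)
    result = []
    for r_part, s_part in table.values():       # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:              # different values may share a bucket
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(hash_join(r, s, 0, 0))
print(joined)
```

The equality test inside each bucket is still needed because distinct join values can collide in the same bucket.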

Query Optimization: Complex JOIN Operation
• Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
• Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
• One or more of the join techniques may be applicable to the joins on the individual conditions.
• We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation
• A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
• If <attribute-list> does not include a key, we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
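Both duplicate-elimination strategies are short enough to sketch. The sample `employees` relation and the function names are hypothetical; attribute lists are given as tuple positions.

```python
def project_sorted(relation, positions):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result

def project_hashed(relation, positions):
    """Project and use a hash set to skip rows that were already emitted."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[i] for i in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# (Ssn, Lname, Dno): Ssn is the key, Dno is not.
employees = [(1, "Smith", 5), (2, "Wong", 5), (3, "Zelaya", 4), (4, "Smith", 4)]
print(project_sorted(employees, (2,)), project_hashed(employees, (2,)))
```

Projecting on Dno alone produces duplicates that must be removed, whereas any attribute list containing the key Ssn yields one output row per input tuple.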

Query Optimization: Set Operations
• Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.

Questions
• Devise algorithms to perform the variations of the outer join operation.
• Devise algorithms to perform aggregate operations.

Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)

Database Systems

111

Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate

AddressFROM Project Department EmployeeWHERE Dnum = Dnumber

AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo

Database Systems

Query Optimization mdash An Example

The above query can be translated into

ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and

MNGSSN=SSN (Project times (Department times Employee)))

Database Systems

112

Query Optimization mdash An Example

Database Systems

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN

Employee

Department

times

times

113

Database Systems

Query Optimization mdash An Example

The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes

Database Systems

114

115

Query Optimization mdash An Example

However the above query based on theschemas of the relations can be translatedinto

Database Systems

ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))

116

Query tree for the improved expression (each operator is applied to the subtrees indented below it):

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{Mgr_ssn = Ssn}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee



Query Processing: An Example

SELECT balance
FROM account
WHERE balance < 2500

This query can be expressed as

σ_{balance < 2500}(Π_{balance}(account))

or

Π_{balance}(σ_{balance < 2500}(account))

Note that there may be different ways to define and execute a query. It is the role of the optimizer to select an efficient way to execute a query. Therefore, the optimizer needs to determine the different ways (plans) in which a query can be executed, estimate the execution cost of each plan, and then choose the most cost-effective plan for execution.
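A small sketch can make the equivalence of the two plans concrete. The toy account relation, its account_number attribute, and the helper names select and project are assumptions for illustration only.

```python
# Hypothetical toy instance of the account relation.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 3000},
    {"account_number": "A-103", "balance": 700},
    {"account_number": "A-104", "balance": 500},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def low_balance(row):
    return row["balance"] < 2500


plan_a = select(project(account, ["balance"]), low_balance)   # select after project
plan_b = project(select(account, low_balance), ["balance"])   # project after select
print(plan_a == plan_b, plan_b)
```

Both plans return the same answer; they differ only in how much intermediate data each step has to handle, which is what the optimizer's cost estimate compares.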

Factors such as the number of disk accesses and CPU time must be taken into consideration to estimate the cost of a plan.

In large databases, however, disk accesses (the number of data block transfers) are usually the dominant cost factor. Hence the number of block transfers can be used as a cost metric.

To simplify the cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).

For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.

One also needs to distinguish between the number of data blocks being read and the number being written.

Techniques such as pipelining and parallelism can be applied to execute the basic operations, where the underlying platform allows it.

Different algorithms can be developed to execute the basic operations.

[Figure: query tree over the account relation with the operators Π_{balance} and σ_{balance < 2500}.]

Query optimization is the activity of choosing an efficient execution strategy for processing a query.

Query optimization can be done in two fashions: static or dynamic.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization).

One option is to carry out the decomposition and optimization dynamically, every time the query is run.

The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization

In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).

It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).

An optimizer might actually do better than a human programmer, since:

• An optimizer has a wealth of information available to it that human programmers typically do not have.
• If the database statistics change drastically, an optimizer may choose a different strategy.
• An optimizer can potentially consider several strategies for a given request.
• The optimizer is written by an expert.

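[Figure: query processing steps. The query goes to the parser and translator, which produce an internal representation; the optimizer, using statistics about the data, turns it into an execution plan; the execution engine runs the plan against the database and produces the query output.]

Running Example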

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example

Find the last names of employees born after 1957 and working on a project named 'Aquarius'.

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Initial query tree (each operator is applied to the subtrees indented below it):

Π_{Lname}
  σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}
    ×
      ×
        Employee
        Works_on
      Project

Execution of the previous query tree generates a very large relation, because it performs Cartesian products on the input relations.

It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query tree after moving the Select operations down:

Π_{Lname}
  σ_{Pnumber = Pno}
    ×
      σ_{Essn = Ssn}
        ×
          σ_{Bdate > '1957-12-31'}
            Employee
          Works_on
      σ_{Pname = 'Aquarius'}
        Project

On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of the operations on the input relations.

Query tree after applying the more restrictive Select operation first:

Π_{Lname}
  σ_{Essn = Ssn}
    ×
      σ_{Pnumber = Pno}
        ×
          σ_{Pname = 'Aquarius'}
            Project
          Works_on
      σ_{Bdate > '1957-12-31'}
        Employee

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree after replacing each Cartesian product and Select with a Join:

Π_{Lname}
  ⋈_{Essn = Ssn}
    ⋈_{Pnumber = Pno}
      σ_{Pname = 'Aquarius'}
        Project
      Works_on
    σ_{Bdate > '1957-12-31'}
      Employee

It also makes sense to reduce the size of the intermediate results by keeping only the attributes that are needed for the correct execution of this query.

Query tree after moving the Project operations down:

Π_{Lname}
  ⋈_{Essn = Ssn}
    Π_{Essn}
      ⋈_{Pnumber = Pno}
        Π_{Pnumber}
          σ_{Pname = 'Aquarius'}
            Project
        Π_{Essn, Pno}
          Works_on
    Π_{Ssn, Lname}
      σ_{Bdate > '1957-12-31'}
        Employee


[Figure: phases of query processing: query decomposition, query optimization, code generation, and runtime execution. Labels in the figure: query, system catalog, relational algebra expression, database statistics, execution plan, database, result.]

Query Optimization: A Simple Example

Relation S:

S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Relation SP:

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…    …    …

Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query tree without optimization:

Π_{Sname}
  σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}
    ×
      S
      SP

A ⋈_{S.S# = SP.S#} B (first rows of the result):

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…    …       …        …        …    …    …

Without an optimizer, the system will:

• Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
• Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
• Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand, will:

• Restrict SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
• Join the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and generates a relation with at most 50 tuples.
• Project the result of the last operation over Sname.
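The two strategies can be compared on synthetic data with the same cardinalities (100 suppliers, 10,000 shipments, 50 of them for part P2). This is a rough sketch: the generated supplier names and part numbers are made up, and the functions naive and optimized are hypothetical names.

```python
from itertools import product

# Synthetic relations: S(S#, Sname) and SP(S#, P#). Only the sizes follow the text.
S = [(f"S{i}", f"Name{i}") for i in range(1, 101)]
SP = [(f"S{(k % 100) + 1}", "P2" if k < 50 else f"P{3 + k % 97}")
      for k in range(10_000)]


def naive(S, SP):
    """Cartesian product first, then restriction, then projection."""
    examined, names = 0, set()
    for s, sp in product(S, SP):          # one product tuple at a time
        examined += 1
        if s[0] == sp[0] and sp[1] == "P2":
            names.add(s[1])
    return names, examined


def optimized(S, SP):
    """Restriction first, then join over S#, then projection."""
    p2 = [sp for sp in SP if sp[1] == "P2"]       # reads 10,000 tuples, keeps 50
    name_of = {s[0]: s[1] for s in S}             # S# is the key of S
    names = {name_of[sp[0]] for sp in p2 if sp[0] in name_of}
    return names, len(p2)


naive_names, product_tuples = naive(S, SP)
opt_names, restricted_tuples = optimized(S, SP)
print(product_tuples, restricted_tuples, naive_names == opt_names)
```

Both functions return the same supplier names, but the first examines 1,000,000 product tuples while the second joins only 50 restricted tuples with S.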

Query tree chosen by the optimizer:

Π_{Sname}
  ⋈_{S.S# = SP.S#}
    σ_{SP.P# = 'P2'}
      SP
    S

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Query tree (each operator is applied to the subtrees indented below it):

Result
  Project (Sname)
    Restrict (SP.P# = 'P2')
      Join (S.S# = SP.S#)
        S
        SP

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A JOIN B) WHERE restriction-on-B
can be transformed into
A JOIN (B WHERE restriction-on-B)

(A JOIN B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) JOIN (B WHERE restriction-on-B)

General rule: it is a good idea to perform the restriction before the join, because:

• It reduces the size of the input to the join operation.
• It reduces the size of the output from the join.

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
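The equivalence of the two conditions can be verified by enumerating all truth assignments. The sketch below uses plain Python booleans; the function names original and cnf are hypothetical.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    # Conjunctive normal form: a conjunction of the disjunctions (p OR q) and (p OR r).
    return (p or q) and (p or r)


# Compare the two conditions on all 8 combinations of truth values.
equivalent = all(original(*v) == cnf(*v)
                 for v in product([False, True], repeat=3))

# In CNF, a single false conjunct is enough to reject a tuple.
conjuncts = [lambda p, q, r: p or q, lambda p, q, r: p or r]
rejected = not all(c(False, False, True) for c in conjuncts)
print(equivalent, rejected)
```

Because all() stops at the first false conjunct, the second check also shows the early rejection that conjunctive normal form makes possible.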

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be transformed into a single projection.

General rule: a restriction and projection can be converted into a projection and restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP JOIN P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to:

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: The natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: The theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the condition to the cardinality of the relation:

s = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

For an attribute with i distinct values, the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) · |r(R)|
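These estimates are simple enough to express as functions. The sketch below assumes values are uniformly distributed over the i distinct values; the function names and the sample statistics (5,000 tuples, 20 distinct values) are hypothetical.

```python
def selectivity_on_key(cardinality):
    """Equality search on a key attribute: at most one tuple qualifies."""
    return 1 / cardinality


def selectivity_on_attribute(distinct_values):
    """Equality search on a non-key attribute with i distinct values,
    assuming the values are uniformly distributed."""
    return 1 / distinct_values


def estimated_matches(cardinality, distinct_values):
    """Estimated number of tuples satisfying the equality search: |r(R)| / i."""
    return cardinality / distinct_values


# Hypothetical statistics: a relation of 5,000 tuples and an attribute
# with 20 distinct values.
print(selectivity_on_key(5000))
print(selectivity_on_attribute(20))
print(estimated_matches(5000, 20))
```

An optimizer would read the cardinality and the number of distinct values from its statistics rather than scanning the data.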

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

[Figure: the two equivalent query trees. On the left, σ_{θ0} is applied to the result of E1 ⋈_{θ} E2; on the right, σ_{θ0} is applied to E1 before the join with E2.]

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note that set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Rule 13: The projection operation distributes over the set union operation.

Π_{L}(E1 ∪ E2) = (Π_{L}(E1)) ∪ (Π_{L}(E2))

The analogous identities for set intersection and set difference do not hold in general. For example, with E1 = {(1, a), (1, b)}, E2 = {(1, a)}, and L the first attribute, Π_{L}(E1 − E2) = {(1)}, whereas Π_{L}(E1) − Π_{L}(E2) is empty.

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

• the existence of indexes or other access paths (to reduce I/O cost), and
• the physical clustering of records (to reduce I/O cost), …

come into play.

So, in short, after scanning and parsing:

• The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
• An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:

• Heuristic rules
• Systematic estimation

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: Heuristic Rules

• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the intermediate result (relation) of an operation is generated and stored before the next operation is applied.

Pipelined evaluation: several operations are combined, so that the output of one operation is passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
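Python generators give a convenient way to sketch the difference between the two evaluation styles. Everything here is illustrative: the relations r and s, the attribute names k and k2, and the functions join and project are assumptions, not part of the slides.

```python
def join(r, s, key_r, key_s):
    """Yield joined tuples one at a time (simple nested loop equi-join)."""
    for tr in r:
        for ts in s:
            if tr[key_r] == ts[key_s]:
                yield {**tr, **ts}


def project(rows, attrs):
    """Yield the projection of each incoming tuple as soon as it arrives."""
    for row in rows:
        yield {a: row[a] for a in attrs}


r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"k2": 10, "a2": "x"}, {"k2": 20, "a2": "y"}, {"k2": 10, "a2": "z"}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s, "k", "k2"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k", "k2"), ["a1", "a2"]))

print(materialized == pipelined, len(intermediate))
```

The results are identical; the pipelined version never holds the full join result in memory.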

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: date, rname: string)

Further, assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

Query tree:

Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

Query tree:

Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with operators annotated "on the fly" to indicate pipelined evaluation.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection

General philosophy: make an effort to reduce the search space.

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case, the data is not organized and no metadata is available).

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
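The difference in search effort can be sketched by counting how many records each method inspects. The file is modeled as an in-memory list of (ssn, name) tuples ordered on ssn; the data and the function names are hypothetical.

```python
def linear_search(records, value):
    """Scan every record until the key matches; returns (record, probes)."""
    probes = 0
    for rec in records:
        probes += 1
        if rec[0] == value:
            return rec, probes
    return None, probes


def binary_search(records, value):
    """Equality search on the ordering key; returns (record, probes)."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][0] == value:
            return records[mid], probes
        if records[mid][0] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# Hypothetical file of 1,000 records ordered on the key attribute.
records = [(ssn, f"emp{ssn}") for ssn in range(1000, 2000)]

lin_rec, lin_probes = linear_search(records, 1999)
bin_rec, bin_probes = binary_search(records, 1999)
print(lin_probes, bin_probes)
```

On a real file the probes would be block accesses rather than record inspections, but the contrast between a full scan and a logarithmic number of probes is the same.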

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note that if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.

Query Optimization: JOIN Operation

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further, assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A θ ts.B is true then add tr || ts to the result
    end
end
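A runnable version of this algorithm might look as follows. It is a sketch that models tuples as Python tuples and the join attributes as positions; the parameter names and the default comparison (equality) are assumptions.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Nested loop theta join of r and s on r[a] theta s[b]."""
    result = []
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts (concatenation)
    return result


r = [(1, "x"), (2, "y")]
s = [(1, "p"), (1, "q"), (3, "z")]

equi = nested_loop_join(r, s, 0, 0)
less_than = nested_loop_join(r, s, 0, 0, theta=operator.lt)
print(equi)
print(less_than)
```

Passing a different comparison function for theta shows that the nested loop works for any join condition, not only equality.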

The cost of the nested loop algorithm is n_r · n_s, since every pair of tuples is examined.

In the best-case scenario, both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.

If only one of the relations fits in the physical space, it can be used as the inner relation, and the cost is again b_s + b_r block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A θ ts.B is true then add tr || ts to the result
            end
        end
    end
end
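The sketch below implements the block nested loop for an equi-join and counts block reads. Blocks are simulated by slicing in-memory lists; the block size, the sample relations, and the function names are hypothetical.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into blocks of block_size tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r[a] = s[b]; returns (result, number_of_block_reads)."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    reads, result = 0, []
    for br in r_blocks:
        reads += 1                       # read one block of r
        for bs in s_blocks:
            reads += 1                   # read one block of s
            for tr in br:
                for ts in bs:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, reads


r = [(i,) for i in range(10)]            # 10 tuples, 3 blocks of size 4
s = [(i % 5, i) for i in range(20)]      # 20 tuples, 5 blocks of size 4
result, reads = block_nested_loop_join(r, s, 0, 0, block_size=4)
print(len(result), reads)
```

With 3 blocks of r and 5 blocks of s, the join reads 3 · 5 + 3 = 18 blocks, compared with one pass over s per tuple of r in the tuple-at-a-time version.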

The cost of the block nested loop, in terms of the number of block accesses, is b_r · b_s + b_r.

How can we improve the block nested loop?

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
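A minimal sketch of the merge step, assuming both inputs are already sorted on their join attributes and that each group of equal-valued tuples fits in memory. Tuples are modeled as Python tuples, and the function name merge_join is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```

Neither pointer ever moves backwards past a group, which is why each relation is read only once.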

Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.
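The description above can be sketched with an in-memory dictionary standing in for the hash file. The number of buckets, the use of Python's built-in hash, and the function name hash_join are assumptions for illustration.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r[a] = s[b] by hashing both relations into shared buckets."""
    # Each bucket keeps the r tuples and the s tuples that hashed to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Only tuples in the same bucket can have matching join values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (9, "w")]
s = [(1, "p"), (1, "q"), (3, "z"), (9, "v")]
joined = sorted(hash_join(r, s, 0, 0))
print(joined)
```

The equality test inside each bucket is still needed, because different join values can hash to the same bucket (here 1 and 9 collide when there are 8 buckets).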

Query Optimization: Complex JOIN Operation

The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.

A join with a complex join condition (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.
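Both duplicate-elimination strategies can be sketched in a few lines. Tuples are modeled as Python tuples and attributes as positions; the sample relation and the function names are hypothetical.

```python
def project_hash(relation, positions):
    """Projection with hash-based duplicate elimination (keeps first-seen order)."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:               # a set lookup plays the role of the hash table
            seen.add(p)
            result.append(p)
    return result


def project_sort(relation, positions):
    """Projection with sort-based duplicate elimination."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    result = []
    for p in projected:
        if not result or result[-1] != p:   # after sorting, duplicates are adjacent
            result.append(p)
    return result


# Hypothetical (Lname, Dno) tuples; Dno is not a key, so duplicates appear.
employee = [("Smith", 5), ("Wong", 5), ("Zelaya", 4)]
print(project_hash(employee, [1]))
print(project_sort(employee, [1]))
```

Projecting on a list that includes a key would never trigger either duplicate check, which is why that case is the straightforward one.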

108

Query Optimization mdash Set Operations

Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by

sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement

Union intersection and difference operations

Database Systems

QuestionsDevise algorithms to perform variation of outer

join operationsDevise algorithms to perform aggregate

operations

Database Systems

109

Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)

Database Systems

111

Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate

AddressFROM Project Department EmployeeWHERE Dnum = Dnumber

AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo

Database Systems

Query Optimization mdash An Example

The above query can be translated into

ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and

MNGSSN=SSN (Project times (Department times Employee)))

Database Systems

112

Query Optimization mdash An Example

Database Systems

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN

Employee

Department

times

times

113

Database Systems

Query Optimization mdash An Example

The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes

However, based on the schemas of the relations, the above query can be translated into:

$$\Pi_{\text{Pnumber},\,\text{Dnum},\,\text{Lname},\,\text{Address},\,\text{Bdate}}\big(((\sigma_{\text{Plocation}=\text{'California'}}(\text{Project})) \bowtie_{\text{Dnum}=\text{Dnumber}} \text{Department}) \bowtie_{\text{Mgr\_ssn}=\text{Ssn}} \text{Employee}\big)$$

The corresponding query tree (root at the top, base relations at the leaves):

Π Pnumber, Dnum, Lname, Address, Bdate
  ⋈ Mgr_ssn = Ssn
    ⋈ Dnum = Dnumber
      σ Plocation = 'California'
        Project
      Department
    Employee
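To make the difference between the two plans concrete, the sketch below runs the rewritten plan on synthetic data and compares its intermediate result sizes with the size of the Cartesian product. Only the cardinalities (100, 20, 5,000) come from the example. Every attribute value, the assumption that 10% of projects are located in California, and the choice of a simple in-memory hash join as the join method are illustrative assumptions.

```python
from typing import Dict, List

Row = Dict[str, object]

def hash_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Equi-join: build a hash index on `right`, then probe it with each row of `left`."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

# Synthetic relations with the cardinalities from the example (values are made up).
projects = [{"Pnumber": p, "Dnum": p % 20 + 1,
             "Plocation": "California" if p % 10 == 0 else "Texas"}
            for p in range(1, 101)]
departments = [{"Dnumber": d, "Mgr_ssn": 1000 + d} for d in range(1, 21)]
employees = [{"Ssn": 1000 + e, "Lname": f"L{e}", "Address": f"A{e}", "Bdate": "1960-01-01"}
             for e in range(1, 5001)]

# Naive plan: size of the Cartesian product, computed rather than materialized.
naive_intermediate = len(projects) * len(departments) * len(employees)

# Rewritten plan: select first, then join with Department, then with Employee.
california = [p for p in projects if p["Plocation"] == "California"]
with_dept = hash_join(california, departments, "Dnum", "Dnumber")
with_mgr = hash_join(with_dept, employees, "Mgr_ssn", "Ssn")
optimized_sizes = [len(california), len(with_dept), len(with_mgr)]

# Final projection on the attributes requested by the query.
result = [(r["Pnumber"], r["Dnum"], r["Lname"], r["Address"], r["Bdate"]) for r in with_mgr]
```

With this synthetic data no intermediate result of the rewritten plan exceeds 10 tuples, whereas the Cartesian product would contain 10 million. The exact numbers depend on the assumed selectivity of the Plocation condition.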



Query processing An Exampleσbalance lt2500 (Πbalance (account) )

orΠbalance(σbalance lt2500 (account) )

Note there might be different ways to define and execute a queryIt is the role of optimizer to select an efficient way to execute aquery Therefore the optimizer needs to determine differentways (plans) that one can execute a query determine theexecution cost of each plan and then choose the most costeffective plan for execution

Database Systems

7

Query processing An ExampleFactors such as number of accesses to the disks

and CPU time must be taken into considerationto estimate cost of a planIn large databases however disk accesses (the

number of data block transfers) are usually themost dominating cost factor Hence it can beused as a cost metric

Database Systems

Query processing An ExampleTo simplify the cost estimation we can assume

that all block transfers cost the same (ievariances in rotational latency and seek time areignored)For more accurate measure one also need to

distinguish the difference between sequentialIO and random IO as well

Database Systems

8

Query processing An ExampleOne also needs to distinguish between the

number of data blocks being read and writtenTechniques such as pipelining and parallelism

if possible depending on the underlyingplatform can be applied to execute basicoperationsDifferent algorithms can be developed to

execute basic operations

Database Systems

9

10

Query processing An Example

Account

Πbalance

σbalance lt2500

Database Systems

Query optimization is the activity ofchoosing an efficient execution strategy forprocessing a queryQuery optimization can be done in two

fashion Static or dynamic

Database Systems

15

There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing

One option is to dynamically carry out thedecomposition and optimization every time thequery is run

Alternative is static query optimization wherethe query is parsed validated and optimizedonce

16

Database Systems

13

Query OptimizationIn general optimization is required in such a

system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra

that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level

Database Systems

14

Query OptimizationThe overall goal of an optimization is to choose

an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a

human programmer since

Database Systems

15

Query OptimizationAn optimizer will have a wealth of information

available to it that human programmers typicallydo not haveIf the data base statistics changes drastically

then an optimizer may choose a differentstrategyOptimizer can potentially considers several

strategies for a given requestOptimizer is written by an expert

Database SystemsQuery Parser amp

TranslatorInternal

Representation

ExecutionPlan

QueryOutput

Optimizer

Statisticsabout data

ExecutionEngine

DATA BASE

Running Example

16

Database Systems

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

EMPLOYEE

DEPARTMENT

Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation

DEPT_Location

Pname Pnumber Plocation Dnum

PROJECTEssn Pno Hours

WORKS_ON

DEPENDENTEssn Dependent_name Sex Bdate Relationship

Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working

on a project named ldquoAquariusrdquo

SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT

WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo

Database Systems

17

Query Optimization mdash Running Example

Database Systems

18

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

Query Optimization mdash An Example

Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations

on base relations before performing the Cartesianproducts

Database Systems

19

Query Optimization mdash Running Example

Database Systems

20

ΠLname

Works_on

times

Employee

σBdate gt lsquo1957-12-31rsquo

timesσEssn = Ssn

Project

σPname = lsquoAquariusrsquo

σPnumber = Pnno

Query Optimization mdash Running Example

By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations

Database Systems

21

Query Optimization mdash Running Example

Database Systems

22

ΠLname

times

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

σEssn = Ssn

Works_on

times

σPnumber = Pno

Query Optimization mdash Running Example

It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation

Database Systems

23

Query Optimization mdash Running Example

Database Systems

24

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquoWorks_on

Pnumber = Pno

Essn = Ssn

Query Optimization mdash Running Example

It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery

Database Systems

25

Query Optimization mdash Running Example

Database Systems

26

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

31

Database SystemsSystem CatalogQuery

Decomposition

Query Optimization Database Statics

Code Generation

Runtime Execution

Result

Database

Relational AlgebraExpression

Execution Plan

Query

28

Query Optimization mdash A Simple Example

S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens

SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull

SP

Database Systems

29

Query Optimization mdash A Simple ExampleGet names of suppliers who supply part P2

SELECT DISTINCT SnameFROM S SPWHERE SS = SPSAND SPP = lsquoP2rsquo

Suppose that the cardinality of S and SP are 100and 10000 respectively Furthermore assume50 tuples in SP are for part P2

Database Systems

Query Optimization mdash A Simple Example

Database Systems

30

S SP

times

σ(SS = SPS and SPP = lsquoP2rsquo)

ΠSname

31

Query Optimization mdash A Simple Example

S Sname Status SCity S P QTY S1 Smith 20 London S1 P1 300 S1 Smith 20 London S1 P2 200 S1 Smith 20 London S1 P3 400 S1 Smith 20 London S1 P4 200 S1 Smith 20 London S1 P5 100 S1 Smith 20 London S1 P6 100 S2 Jones 10 Paris S2 P1 300 S2 Jones 10 Paris S2 P2 400 bull bull

A SS=SPS B

Database Systems

32

Query Optimization mdash A Simple ExampleWithout an optimizer the system willGenerates Cartesian product of S and SP This will

generate a relation of size 1000000 tuples mdash Toolarge to be kept in the main memoryRestricts results of previous step as specified by

WHERE clause This means reading 1000000tuples of which 50 will be selectedProjects the result of previous step over Sname to

produce the final result

Database Systems

33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to

each relation As the algorithm proceeds the pointers movethrough the relations

Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is

bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory

Database Systems

102

103

Query Optimization mdash JOIN Operation

hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation

Database Systems

Query Optimization mdash Complex JOIN Operation

Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e

conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections

Database Systems

104

Query Optimization mdash Complex JOIN Operation

Consider the following join operation

One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing

one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions

Database Systems

105

r θ1andθ2and hellip andθn s

r θ1 s

Query Optimization mdash Complex JOIN OperationNow consider the following join operation

The join can be performed as the union of the tuples inindividual joins

Database Systems

106

r θ1orθ2or hellip orθn s

r θi s

107

Query Optimization mdash Project Operation

A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we

may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique

Database Systems

108

Query Optimization mdash Set Operations

Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by

sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement

Union intersection and difference operations

Database Systems

QuestionsDevise algorithms to perform variation of outer

join operationsDevise algorithms to perform aggregate

operations

Database Systems

109

Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)

Database Systems

111

Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate

AddressFROM Project Department EmployeeWHERE Dnum = Dnumber

AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo

Database Systems

Query Optimization mdash An Example

The above query can be translated into

ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and

MNGSSN=SSN (Project times (Department times Employee)))

Database Systems

112

Query Optimization mdash An Example

Database Systems

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN

Employee

Department

times

times

113

Database Systems

Query Optimization: An Example

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5,000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
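The size estimate above is simple arithmetic, worked out here in a few lines. The dictionary names are assumptions made for the illustration; the figures are the ones stated in the example.

```python
# Cardinalities and tuple sizes (bytes) as stated in the example.
tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = 1
for n in tuples.values():
    product_tuples *= n
product_tuple_bytes = sum(tuple_bytes.values())
product_total_bytes = product_tuples * product_tuple_bytes

print(product_tuples)       # 10000000
print(product_tuple_bytes)  # 300
print(product_total_bytes)  # 3000000000, i.e. about 3 GB for the intermediate result
```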

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{Mgr_ssn = Ssn} (Employee))

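Query Optimization: An Example
[Figure: query tree in which σ_{Plocation = 'California'} is applied to Project first, the result is joined with Department on Dnum = Dnumber, that result is joined with Employee on Mgr_ssn = Ssn, and Π_{Pnumber, Dnum, Lname, Address, Bdate} is at the root.]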



Query processing: An Example
Factors such as the number of disk accesses and CPU time must be taken into consideration to estimate the cost of a plan.
In large databases, however, disk accesses (the number of data block transfers) are usually the dominant cost factor. Hence, the number of block transfers can be used as the cost metric.

To simplify the cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).
For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.

Query processing: An Example
One also needs to distinguish between the number of data blocks being read and the number being written.
Techniques such as pipelining and parallelism can be applied to execute basic operations, if the underlying platform supports them.
Different algorithms can be developed to execute basic operations.

Query processing: An Example
[Figure: query tree over Account built from the operators Π_{balance} and σ_{balance < 2500}.]

Query optimization is the activity of choosing an efficient execution strategy for processing a query.
Query optimization can be done in two ways: statically or dynamically.

There are two choices for when to carry out the first phases of query processing (i.e., parsing, validation, translation, and optimization):
One option is to dynamically carry out the decomposition and optimization every time the query is run.
The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization
In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).
One of the strengths of relational algebra is that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
An optimizer might actually do better than a human programmer, since:
An optimizer has a wealth of information available to it that human programmers typically do not have.
If the database statistics change drastically, an optimizer may choose a different strategy.
An optimizer can potentially consider several strategies for a given request.
The optimizer is written by an expert.

[Figure: query processing pipeline. Components: Query, Parser & Translator, Internal Representation, Optimizer (with statistics about data), Execution Plan, Execution Engine (with the database), Query Output.]

Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 who work on a project named 'Aquarius'.

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example
[Figure: initial query tree, equivalent to the following expression]
Π_{Lname}(σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}((EMPLOYEE × WORKS_ON) × PROJECT))

Query Optimization: Running Example

Executing the previous query tree generates a very large relation, because it performs Cartesian products on the input relations.
It makes sense to apply the Select operations to the base relations before performing the Cartesian products.

[Figure: query tree after moving the Select operations down, equivalent to the following expression]
Π_{Lname}(σ_{Pnumber = Pno}(σ_{Essn = Ssn}(σ_{Bdate > '1957-12-31'}(EMPLOYEE) × WORKS_ON) × σ_{Pname = 'Aquarius'}(PROJECT)))

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from PROJECT will be involved in the query. So it makes sense to switch the order of the operations on the input relations, so that PROJECT is processed first.

[Figure: query tree after reordering the input relations, equivalent to the following expression]
Π_{Lname}(σ_{Essn = Ssn}(σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(PROJECT) × WORKS_ON) × σ_{Bdate > '1957-12-31'}(EMPLOYEE)))

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

[Figure: query tree after introducing joins, equivalent to the following expression]
Π_{Lname}((σ_{Pname = 'Aquarius'}(PROJECT) ⋈_{Pnumber = Pno} WORKS_ON) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(EMPLOYEE))

Query Optimization: Running Example

It also makes sense to reduce the size of the intermediate results by keeping just the attributes that are needed for the correct execution of this query.

[Figure: query tree after moving the Project operations down, equivalent to the following expression]
Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname = 'Aquarius'}(PROJECT)) ⋈_{Pnumber = Pno} Π_{Essn, Pno}(WORKS_ON)) ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(EMPLOYEE)))


[Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (against the Database) → Result.]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example
Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example
[Figure: unoptimized query tree, equivalent to the following expression]
Π_{Sname}(σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}(S × SP))

Query Optimization: A Simple Example

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

A ⋈_{S.S# = SP.S#} B

Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves retrieving only 100 tuples and generates a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query Optimization: A Simple Example
[Figure: optimized query tree, equivalent to the following expression]
Π_{Sname}(σ_{SP.P# = 'P2'}(SP) ⋈_{S.S# = SP.S#} S)

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction first and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
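The difference between the two strategies can be observed on synthetic data. The sketch below is only an illustration under stated assumptions: the generated supplier names and part numbers are invented, sized to match the cardinalities in the example (100 suppliers, 10,000 shipments, 50 of them for part P2), and the intermediate result is counted rather than stored.

```python
import itertools

# Synthetic data (an assumption): tuples are (S#, Sname) and (S#, P#, QTY).
S = [(f"S{i}", f"Name{i}") for i in range(100)]
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{1000 + i}", 100) for i in range(10_000)]


def product_then_restrict():
    """Unoptimized plan: Cartesian product, then restriction, then projection."""
    intermediate = 0
    names = set()
    for s, sp in itertools.product(S, SP):
        intermediate += 1  # one tuple of S x SP is produced and inspected
        if s[0] == sp[0] and sp[1] == "P2":
            names.add(s[1])
    return names, intermediate


def restrict_then_join():
    """Optimized plan: restrict SP first, then join with S, then project."""
    p2 = [sp for sp in SP if sp[1] == "P2"]  # reads all of SP, keeps 50 tuples
    supplier = {s[0]: s for s in S}          # reads the 100 tuples of S
    names = {supplier[sp[0]][1] for sp in p2 if sp[0] in supplier}
    return names, len(p2)


names_a, size_a = product_then_restrict()
names_b, size_b = restrict_then_join()
print(size_a, size_b, names_a == names_b)  # 1000000 50 True
```

Both plans return the same supplier names; only the size of the intermediate result differs.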

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process
[Figure: query tree, from leaves to root: S and SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result.]

Optimization Process
Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A JOIN B) WHERE restriction-on-B
can be transformed into
A JOIN (B WHERE restriction-on-B)

(A JOIN B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) JOIN (B WHERE restriction-on-B)

Optimization Process
General rule: It is a good idea to perform the restriction before the join, because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because:
A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
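The equivalence behind the conjunctive-normal-form rewrite can be checked exhaustively over all truth assignments. The sketch below is an illustration only; the function names are assumptions, and `evaluate_cnf` simply shows that a conjunctive condition can stop as soon as one conjunct is false.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    return (p or q) and (p or r)


# Compare the two forms on all 8 combinations of truth values.
equivalent = all(original(*v) == cnf(*v) for v in product([False, True], repeat=3))
print(equivalent)  # True


def evaluate_cnf(conjuncts, row):
    """Evaluate a list of conjuncts on a row; all() stops at the first false one."""
    return all(conjunct(row) for conjunct in conjuncts)


# Hypothetical row and conjuncts, for illustration only.
row = {"p": False, "q": True, "r": False}
conjuncts = [lambda t: t["p"] or t["q"], lambda t: t["p"] or t["r"]]
print(evaluate_cnf(conjuncts, row))  # False
```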

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2
General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
General rule: A sequence of projections can be combined into a single projection.

Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction, i.e.,
(A [projection]) WHERE restriction
can be converted into
(A WHERE restriction) [projection]

Optimization Process
Finally, consider the following query:
Get the supplier numbers of suppliers who supply at least one part.
(SP JOIN P) [S#]

However, we know that P# is a foreign key in SP; every SP tuple therefore matches a tuple in P, and the above query is semantically equivalent to:

SP [S#]

Optimization Process
An equivalence rule says that expressions of two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.

Optimization Process
Rule 1: A conjunctive selection can be deconstructed into a sequence of individual selections (cascade of selections):
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: Selection operations are commutative:
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections):
Π_{L1}(Π_{L2}(...(Π_{Ln}(E))...)) = Π_{L1}(E)
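Rules 1 to 3 can be spot-checked on a small relation. The sketch below uses hypothetical `select` and `project` helpers over lists of dicts and an invented sample relation `E`; it checks the identities on one instance and is not a proof. Note that the cascade of projections assumes each inner attribute list contains the outer one.

```python
def select(pred, rel):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Π: keep only attrs and eliminate duplicate tuples."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


# Hypothetical sample relation.
E = [
    {"Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},
]
theta1 = lambda t: t["Dno"] == 5
theta2 = lambda t: t["Salary"] > 35000

# Rule 1: cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: commutativity of selections.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: cascade of projections (["Dno"] is a subset of ["Dno", "Lname"]).
rule3 = project(["Dno"], project(["Dno", "Lname"], E)) == project(["Dno"], E)

print(rule1, rule2, rule3)  # True True True
```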

Optimization Process
Rule 4: A selection over a Cartesian product is equivalent to a theta join:
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to:
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative:
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1
[Figure: the two equivalent query trees, a θ join with children E1, E2 and a θ join with children E2, E1.]

Optimization Process
Rule 6: Natural join is associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: the two equivalent query trees for the left-deep and right-deep grouping of E1, E2, and E3.]

Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is:

s = 1 / |r(R)|

For an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is:

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is:

(1 / i) × |r(R)|
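The selectivity estimates can be turned into a small calculation. The sketch below is an illustration under the uniform-distribution assumption; the function names and the sample figures (a relation of 5,000 tuples with 20 distinct values of the searched attribute) are assumptions chosen for the example. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def selectivity_key(cardinality):
    """Equality search on a key attribute: at most one of |r(R)| tuples matches."""
    return Fraction(1, cardinality)


def selectivity_distinct(distinct_values):
    """Equality search on an attribute with i distinct, uniformly distributed values."""
    return Fraction(1, distinct_values)


def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result: selectivity x |r(R)|."""
    return selectivity * cardinality


n = 5000  # |r(R)|, hypothetical
i = 20    # distinct values of the searched attribute, hypothetical

print(estimated_matches(n, selectivity_key(n)))       # 1
print(estimated_matches(n, selectivity_distinct(i)))  # 250
```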

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

[Figure: the two equivalent query trees. Left: σ_{θ0} applied above the θ join of E1 and E2. Right: the θ join of σ_{θ0}(E1) and E2.]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

For set difference:
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

For set union:
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

For set intersection:
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
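The identities of Rule 12, including the one that fails for union, can be checked on a small instance. The relations `E1`, `E2`, the predicate `p`, and the helper `sel` below are assumptions chosen for the illustration; the check covers one instance and is not a proof.

```python
def sel(pred, rel):
    """σ over a relation represented as a Python set of tuples."""
    return {t for t in rel if pred(t)}


E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(2, "b"), (4, "d")}
p = lambda t: t[0] <= 2

# Set difference: selecting on E1 alone is enough.
diff_both = sel(p, E1 - E2) == sel(p, E1) - sel(p, E2)
diff_left = sel(p, E1 - E2) == sel(p, E1) - E2

# Set intersection: selecting on E1 alone is also enough.
inter_both = sel(p, E1 & E2) == sel(p, E1) & sel(p, E2)
inter_left = sel(p, E1 & E2) == sel(p, E1) & E2

# Set union: the selection must be applied to both operands.
union_both = sel(p, E1 | E2) == sel(p, E1) | sel(p, E2)
union_left = sel(p, E1 | E2) == sel(p, E1) | E2  # False: (4, "d") leaks through

print(diff_both, diff_left, inter_both, inter_left, union_both, union_left)
# True True True True True False
```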

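Optimization Process
Rule 13: The projection operation distributes over the set union operation:
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general, because tuples that are distinct in E1 and E2 may agree on the attributes in L. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_{L}(E1 ∩ E2) is empty while Π_{L}(E1) ∩ Π_{L}(E2) = {(1)}.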

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: the result of each operation is generated as an intermediate (temporary) relation.
Pipelined evaluation: several operations are combined, with the tuples produced by one operation passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
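The contrast between materialized and pipelined evaluation maps naturally onto Python generators. The sketch below is an illustration only: `join` and `project` are hypothetical helpers, the relations `r` and `s` are invented, and the join is a plain nested loop on one shared attribute.

```python
def join(r, s, key):
    """Equijoin on a shared attribute, yielding one joined tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}


def project(rows, attrs):
    """Projection that consumes its input one tuple at a time."""
    for row in rows:
        yield tuple(row[a] for a in attrs)


# Hypothetical relations sharing the join attribute "k".
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 3, "a2": 30}]

# Materialized evaluation: the join result is stored before projecting.
temp = list(join(r, s, "k"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever built.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))

print(materialized, pipelined)  # [('x', 10)] [('x', 10)]
```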

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

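Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree for this expression, with the join of R and S at the bottom, σ_{bid = 100 ∧ rating > 5} above it, and Π_{Sname} at the root.]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree for this expression, with σ_{bid = 100} applied to R and σ_{rating > 5} applied to S below the join, and Π_{Sname} at the root.]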

Assume the underlying platform can perform the basic relational operations in a "pipeline" fashion, i.e., the result of one operation is fed directly to the next operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the query, annotated for pipelined execution. In the first tree, σ_{bid = 100 ∧ rating > 5} and Π_{Sname} are marked "on the fly". In the second tree, an operator is marked "on the fly".]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
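The two search methods can be sketched with an in-memory list standing in for the file. This is a minimal illustration under stated assumptions: the function names and the sample `departments` records are hypothetical, and the list of keys stands in for a file that is physically ordered on the key attribute.

```python
from bisect import bisect_left


def linear_search(records, attr, value):
    """Scan every record and test the selection condition attr = value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records, key_attr, value):
    """Equality search on a key attribute; records must be ordered on key_attr."""
    keys = [rec[key_attr] for rec in records]  # stands in for the ordered file
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None  # a key matches at most one record


# Hypothetical file ordered on the key attribute Dnumber.
departments = [
    {"Dnumber": 1, "Dname": "Headquarters"},
    {"Dnumber": 4, "Dname": "Administration"},
    {"Dnumber": 5, "Dname": "Research"},
]

print(linear_search(departments, "Dname", "Research"))
print(binary_search(departments, "Dnumber", 4))
print(binary_search(departments, "Dnumber", 3))  # None
```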

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record. (Note: in this case, at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case, the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection has the following form:

σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)

Disjunctive selection: a disjunctive selection has the following form:

σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r)

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then check whether each retrieved record satisfies the remaining conditions.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for every attribute involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume that br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
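The nested loop algorithm translates almost line for line into runnable code. The sketch below is a minimal illustration: relations are lists of tuples, `a` and `b` are the positions of the join attributes, and the comparison Θ is passed in as a function (equality by default). These representation choices and the sample data are assumptions, not part of the original algorithm.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] Θ s[b] by comparing every pair of tuples."""
    result = []
    for tr in r:                       # outer loop over r
        for ts in s:                   # inner loop over s
            if theta(tr[a], ts[b]):    # join condition
                result.append(tr + ts) # tr || ts
    return result


r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

print(nested_loop_join(r, s, 0, 0))               # equijoin: [(2, 'y', 2, 'p')]
print(nested_loop_join(r, s, 0, 0, operator.lt))  # theta join with r.A < s.B
```

Because every pair of tuples is examined, the condition can be any comparison, which is why this method works regardless of the join condition.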

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns tuple comparisons.
In the best-case scenario, both relations fit into main memory, and hence we need only bs + br block accesses.

If just one of the relations fits in main memory, the cost is still bs + br block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
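The block access count br × bs + br can be confirmed by instrumenting a block nested loop join. The sketch below is an illustration under simplifying assumptions: relations are lists of tuples split into fixed-size blocks, each pass over a block counts as one block access, and the helper names and sample data are invented.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, theta, block_size):
    """Block nested loop join that also counts block accesses."""
    result = []
    block_reads = 0
    for block_r in blocks(r, block_size):
        block_reads += 1          # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1      # read one block of s for this block of r
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical relations of 10 tuples each; with 4 tuples per block, br = bs = 3.
r = [(i,) for i in range(10)]
s = [(i, i * i) for i in range(0, 20, 2)]

joined, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 4)
print(len(joined), reads)  # 5 12, and 12 = br * bs + br = 3 * 3 + 3
```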

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
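The merge step can be sketched with two pointers that advance through the sorted relations. This is a minimal illustration: relations are lists of tuples already sorted on the join attribute positions `a` and `b`, the function name is hypothetical, and the group of matching `s` tuples is assumed to fit in memory, as stated above.

```python
def merge_join(r, s, a, b):
    """Equijoin of r and s, both already sorted on positions a and b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```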

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
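The hash join description can be sketched with an in-memory dictionary of buckets standing in for the hash file. The function name, the number of buckets, and the sample data are assumptions made for the illustration; a real system would keep the buckets on disk.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equijoin of r and s on r[a] = s[b] using a shared set of hash buckets."""
    # Each bucket holds the r tuples and the s tuples that hash to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Only tuples in the same bucket can match; different join values may
        # still share a bucket, so the join attributes are compared explicitly.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (17, "z")]
print(sorted(hash_join(r, s, 0, 0)))  # [(2, 'b', 2, 'x'), (9, 'c', 9, 'y')]
```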

Query Optimization: Complex JOIN Operation

The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s

One or more of the join techniques may be applicable to joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.




Query Processing: An Example

To simplify cost estimation, we can assume that all block transfers cost the same (i.e., variances in rotational latency and seek time are ignored).

For a more accurate measure, one also needs to distinguish between sequential I/O and random I/O.

Query Processing: An Example

One also needs to distinguish between the number of data blocks read and the number written.

Techniques such as pipelining and parallelism can be applied to execute basic operations, if the underlying platform supports them.

Different algorithms can be developed to execute basic operations.

Query Processing: An Example

[Figure: query tree over the relation account, built from the operators $\Pi_{balance}$ and $\sigma_{balance < 2500}$.]

Query optimization is the activity of choosing an efficient execution strategy for processing a query.

Query optimization can be done in two fashions: static or dynamic.

There are two choices in carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):

One option is to dynamically carry out the decomposition and optimization every time the query is run.

The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization

In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).

It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

Query Optimization

The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).

An optimizer might actually do better than a human programmer, since:

An optimizer has a wealth of information available to it that human programmers typically do not have.

If the database statistics change drastically, an optimizer may choose a different strategy.

An optimizer can potentially consider several strategies for a given request.

The optimizer is written by an expert.


Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example

Find the last names of employees born after 1957 who work on a project named 'Aquarius'.

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example

[Figure: initial query tree.] As an expression:

$\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'} \wedge Pnumber = Pno \wedge Essn = Ssn \wedge Bdate > \text{'1957-12-31'}}((Employee \times Works\_on) \times Project))$

Query Optimization: Running Example

Executing the previous query tree generates a very large relation, because it performs Cartesian products on the input relations.

It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query Optimization: Running Example

[Figure: query tree with the selections moved down.] As an expression:

$\Pi_{Lname}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee) \times Works\_on) \times \sigma_{Pname = \text{'Aquarius'}}(Project)))$

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from Project is involved in the query. So it makes sense to switch the order of operations on the input relations, so that the selection on Project is applied first.

Query Optimization: Running Example

[Figure: query tree with the input relations reordered.] As an expression:

$\Pi_{Lname}(\sigma_{Essn = Ssn}(\sigma_{Pnumber = Pno}(\sigma_{Pname = \text{'Aquarius'}}(Project) \times Works\_on) \times \sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

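Query Optimization: Running Example

[Figure: query tree with Cartesian products replaced by joins.] As an expression:

$\Pi_{Lname}((\sigma_{Pname = \text{'Aquarius'}}(Project) \bowtie_{Pnumber = Pno} Works\_on) \bowtie_{Essn = Ssn} \sigma_{Bdate > \text{'1957-12-31'}}(Employee))$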

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.

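Query Optimization: Running Example

[Figure: query tree with projections moved down.] As an expression:

$\Pi_{Lname}(\Pi_{Essn}(\Pi_{Pnumber}(\sigma_{Pname = \text{'Aquarius'}}(Project)) \bowtie_{Pnumber = Pno} \Pi_{Essn, Pno}(Works\_on)) \bowtie_{Essn = Ssn} \Pi_{Ssn, Lname}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$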

[Figure: the initial query tree (one selection applied above two Cartesian products) shown next to the final query tree (selections and projections moved down, Cartesian products replaced by joins).]

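[Figure: phases of query processing. The query enters Query Decomposition, which consults the System Catalog and produces a relational algebra expression; Query Optimization, which consults the Database Statistics, produces an execution plan; Code Generation and Runtime Execution then run the plan against the Database to produce the Result.]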

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example

Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

[Figure: query tree without optimization.] As an expression:

$\Pi_{Sname}(\sigma_{S.S\# = SP.S\# \wedge SP.P\# = \text{'P2'}}(S \times SP))$

Query Optimization: A Simple Example

Tuples of S and SP combined on S.S# = SP.S#:

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example

Without an optimizer, the system will:

Generate the Cartesian product of S and SP. This produces a relation of 100 × 10,000 = 1,000,000 tuples, too large to be kept in main memory.

Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.

Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand, will:

Restrict SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.

Join the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.

Project the result of the last operation over Sname.

Query Optimization: A Simple Example

[Figure: optimized query tree.] As an expression:

$\Pi_{Sname}(S \bowtie_{S.S\# = SP.S\#} \sigma_{SP.P\# = \text{'P2'}}(SP))$

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
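The two strategies can be compared on synthetic data with the cardinalities used in this example (100 suppliers, 10,000 SP tuples, 50 of them for part P2). The sketch below is only illustrative: the data generator, the function names, and the choice to report the size of the first intermediate relation are assumptions, not part of the original slides.

```python
def product_then_restrict(S, SP):
    """Unoptimized plan: Cartesian product, then restriction, then projection."""
    product = [(s, sp) for s in S for sp in SP]           # |S| * |SP| tuples
    restricted = [(s, sp) for s, sp in product
                  if s[0] == sp[0] and sp[1] == "P2"]
    return {s[1] for s, _ in restricted}, len(product)


def restrict_then_join(S, SP):
    """Optimized plan: restrict SP first, then join with S, then project."""
    sp_p2 = [sp for sp in SP if sp[1] == "P2"]            # one scan of SP
    by_sno = {s[0]: s for s in S}                         # lookup table on S#
    joined = [(by_sno[sp[0]], sp) for sp in sp_p2 if sp[0] in by_sno]
    return {s[1] for s, _ in joined}, len(sp_p2)


# Synthetic relations (hypothetical values): S(S#, Sname) and SP(S#, P#).
S = [(f"S{i}", f"Name{i}") for i in range(100)]
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{3 + i // 100}")
      for i in range(10_000)]

names_naive, first_intermediate_naive = product_then_restrict(S, SP)
names_opt, first_intermediate_opt = restrict_then_join(S, SP)
```

Both plans return the same set of supplier names, but the first intermediate relation has 1,000,000 tuples in the unoptimized plan and only 50 in the optimized one.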

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{P\# = \text{'P2'}}(S \bowtie_{S.S\# = SP.S\#} SP))$

Optimization Process

[Figure: query tree, bottom to top: S and SP feed Join (S.S# = SP.S#), then Restrict (SP.P# = 'P2'), then Project (Sname), producing the Result.]

Optimization Process

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process

General rule: it is a good idea to perform the restriction before the join, because:

It reduces the size of the input to the join operation.

It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because:

A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
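The equivalence between the two WHERE forms can be checked exhaustively over all truth assignments. This is a small sanity check rather than an optimizer component; the function names are hypothetical.

```python
from itertools import product


def original(p, q, r):
    """The condition as written: p OR (q AND r)."""
    return p or (q and r)


def conjunctive_normal_form(p, q, r):
    """The same condition in CNF: (p OR q) AND (p OR r)."""
    return (p or q) and (p or r)


# Compare both forms on all 8 combinations of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
```

In the CNF form, evaluation can stop as soon as one conjunct is false, which is the property the general rule exploits.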

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be reduced to a single projection (the last one applied).

Optimization Process

General rule: a restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP, so every tuple of SP matches a tuple of P. Therefore the above query is semantically equivalent to:

SP [S#]

Optimization Process

An equivalence rule says that expressions of two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one that produces the same result.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to:

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Optimization Process

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

[Figure: the two equivalent join trees, with operands in the order E1, E2 and E2, E1.]

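Optimization Process

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

[Figure: the two equivalent join trees, one joining E1 and E2 first, the other joining E2 and E3 first.]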

Optimization Process

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$s = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is:

$s = \dfrac{1}{|r(R)|}$

The selectivity of an equality search on an attribute with $i$ distinct values, assuming the values are uniformly distributed, is:

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is:

$\dfrac{1}{i}\,|r(R)|$
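These estimates translate directly into code. The sketch below assumes uniformly distributed attribute values; the function names and the sample figures (5,000 tuples, 20 distinct values) are illustrative assumptions.

```python
def equality_selectivity(num_tuples, distinct_values=None, is_key=False):
    """Estimated selectivity s of an equality condition on one attribute."""
    if num_tuples == 0:
        return 0.0
    if is_key:
        return 1 / num_tuples        # at most one tuple matches a key value
    return 1 / distinct_values       # uniform-distribution assumption


def estimated_matches(num_tuples, distinct_values):
    """Estimated number of tuples satisfying the equality: |r(R)| / i."""
    return num_tuples / distinct_values


s_key = equality_selectivity(5000, is_key=True)          # 1 / 5000
s_nonkey = equality_selectivity(5000, distinct_values=20)
matches = estimated_matches(5000, 20)                     # 250 tuples expected
```

An optimizer would use `matches` as the estimated size of the intermediate relation produced by the selection.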

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:

When all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case):

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Figure: the two equivalent trees for Rule 8. On the left, $\sigma_{\theta_0}$ is applied above the join of E1 and E2; on the right, $\sigma_{\theta_0}$ is applied to E1 before the join with E2.]

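Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:

The join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$, respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$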

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

For set difference:
$\sigma_{p}(E_1 - E_2) = \sigma_{p}(E_1) - \sigma_{p}(E_2)$
$\sigma_{p}(E_1 - E_2) = \sigma_{p}(E_1) - E_2$

For set union:
$\sigma_{p}(E_1 \cup E_2) = \sigma_{p}(E_1) \cup \sigma_{p}(E_2)$
$\sigma_{p}(E_1 \cup E_2) \neq \sigma_{p}(E_1) \cup E_2$

For set intersection:
$\sigma_{p}(E_1 \cap E_2) = \sigma_{p}(E_1) \cap \sigma_{p}(E_2)$
$\sigma_{p}(E_1 \cap E_2) = \sigma_{p}(E_1) \cap E_2$

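Optimization Process

Rule 13: The projection operation distributes over the set union operation.

$\Pi_{L}(E_1 \cup E_2) = \Pi_{L}(E_1) \cup \Pi_{L}(E_2)$

The analogous identities for set intersection and set difference do not hold in general. For example, if $E_1 = \{(1, a)\}$, $E_2 = \{(1, b)\}$, and $L$ is the first attribute, then $\Pi_{L}(E_1 \cap E_2)$ is empty while $\Pi_{L}(E_1) \cap \Pi_{L}(E_2) = \{(1)\}$.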
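Several of the equivalence rules can be spot-checked on small relations. The sketch below models a relation as a Python set of tuples; the helper names `select` and `project` and the sample data are assumptions made for illustration, and a check on one instance is of course not a proof.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}


def project(cols, rel):
    """Projection on column positions, with duplicate elimination."""
    return {tuple(t[c] for c in cols) for t in rel}


E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "b"), (3, "c"), (4, "d")}
p = lambda t: t[0] >= 2
q = lambda t: t[1] != "c"

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: p(t) and q(t), E1) == select(p, select(q, E1))
# Rule 2: selections commute.
rule2 = select(p, select(q, E1)) == select(q, select(p, E1))
# Rule 12: selection distributes over set difference and set union.
rule12_difference = (select(p, E1 - E2)
                     == select(p, E1) - select(p, E2)
                     == select(p, E1) - E2)
rule12_union = select(p, E1 | E2) == select(p, E1) | select(p, E2)
# Rule 13: projection distributes over set union.
rule13_union = project([0], E1 | E2) == project([0], E1) | project([0], E2)
```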

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:

the existence of indexes or other access paths (to reduce I/O cost), and

the physical clustering of records (to reduce I/O cost), …

Optimization Process

So, in short, after scanning and parsing:

the query is translated into an equivalent internal representation, in the form of a query tree or query graph;

an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:

Heuristic rules
Systematic estimation

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: Heuristic Rules

Perform selection operations as early as possible.

Perform projections early.

It is usually better to perform selections earlier than projections.

Optimization Process: Heuristic Rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the result of each operation is generated as an intermediate relation before the next operation uses it.

Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform:

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
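The difference between the two evaluation styles can be sketched with Python generators, which produce tuples one at a time. The function names, the equality join on one column, and the sample data are assumptions for illustration; the projection here does not eliminate duplicates.

```python
def join(r, s, key_r, key_s):
    """Nested-loop equijoin that yields each result tuple as soon as it is found."""
    for tr in r:
        for ts in s:
            if tr[key_r] == ts[key_s]:
                yield tr + ts


def project(tuples, cols):
    """Projection applied tuple by tuple to whatever its input produces."""
    for t in tuples:
        yield tuple(t[c] for c in cols)


r = [(1, "x"), (2, "y")]
s = [(1, 10), (2, 20), (2, 30)]

# Materialized evaluation: the join result is stored before projecting.
intermediate = list(join(r, s, 0, 0))
result_materialized = list(project(intermediate, [1, 3]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
result_pipelined = list(project(join(r, s, 0, 0), [1, 3]))
```

Both evaluations give the same answer; the pipelined version never holds the full join result in memory.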

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

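$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

[Figure: query tree for this expression. R and S are joined on Sid = Sid; the selection bid = 100 and rating > 5 is applied to the join result; the projection on Sname is the root.]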

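$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

[Figure: query tree for this expression. The selection bid = 100 is applied to R and the selection rating > 5 is applied to S; the results are joined on Sid = Sid; the projection on Sname is the root.]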

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above annotated for pipelined execution. Operators marked "on the fly" are applied to tuples as they are produced, without materializing an intermediate relation.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, memory size, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search Methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
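The two search methods can be sketched as follows. The sketch assumes records are Python dictionaries held in a list that plays the role of the file; the function names and the sample records are hypothetical. Binary search uses the standard-library `bisect_left` on the list of key values.

```python
from bisect import bisect_left


def linear_search(records, attr, value):
    """Scan every record and keep those whose attribute equals value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, attr, value):
    """Equality search on a key attribute on which the file is ordered."""
    keys = [rec[attr] for rec in sorted_records]   # key values in file order
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return sorted_records[pos]
    return None                                    # no record has this key


# Sample file, ordered on the key attribute Ssn (hypothetical values).
employees = [
    {"Ssn": 111, "Lname": "Adams", "Dno": 5},
    {"Ssn": 222, "Lname": "Blake", "Dno": 4},
    {"Ssn": 333, "Lname": "Clark", "Dno": 5},
]
in_dept_5 = linear_search(employees, "Dno", 5)
found = binary_search(employees, "Ssn", 222)
missing = binary_search(employees, "Ssn", 999)
```

Linear search touches every record, while binary search needs only about log2(n) probes of the ordered file.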

Optimization Process: Search Methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search Methods for Selection

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Query Optimization: Search Methods for Selection

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search Methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$

Query Optimization: Search Methods for Selection

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to them.

Query Optimization: Search Methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
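Disjunctive selection by union of record pointers can be sketched as below. The sketch assumes equality conditions only, uses list positions as record pointers, and models each index as a dictionary from value to a set of pointers; all names and sample records are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over records."""
    if any(attr not in indexes for attr, _ in conditions):
        # Some condition has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of record pointers
    return [records[rid] for rid in sorted(pointers)]


employees = [
    {"Lname": "Adams", "Dno": 5, "Sex": "F"},
    {"Lname": "Blake", "Dno": 4, "Sex": "M"},
    {"Lname": "Clark", "Dno": 1, "Sex": "F"},
]
indexes = {"Dno": build_index(employees, "Dno"),
           "Lname": build_index(employees, "Lname")}

via_indexes = disjunctive_select(employees, indexes, [("Dno", 5), ("Lname", "Blake")])
via_scan = disjunctive_select(employees, indexes, [("Dno", 1), ("Sex", "M")])
```

The second call has no index on `Sex`, so it takes the linear-scan path described in the note above.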

Query Optimization: JOIN Operation

$R \bowtie_{A = B} S$

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (Nested Loop)

Suppose we want to perform:

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of the two relations.

Query Optimization: JOIN Operation (Nested Loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end

Query Optimization: JOIN Operation (Nested Loop)

The nested loop algorithm examines nr × ns pairs of tuples.

In the best-case scenario, both relations fit into the physical space (main memory), and hence we need only bs + br block accesses.

If just one of the relations fits in the physical space, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (Block Nested Loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (Block Nested Loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (Block Nested Loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
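The block nested loop and its block-access count can be sketched as follows. Relations are modelled as lists of tuples split into fixed-size blocks; the function names, the `block_size` parameter, and the explicit access counter are assumptions added for illustration.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, theta, block_size):
    """Join r and s on theta(tr, ts); also count simulated block accesses."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, accesses = [], 0
    for block_r in r_blocks:
        accesses += 1                       # read one block of r
        for block_s in s_blocks:
            accesses += 1                   # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, accesses


r = [(i,) for i in range(6)]                # 6 tuples -> br = 3 blocks of 2
s = [(i, i * i) for i in range(0, 8, 2)]    # 4 tuples -> bs = 2 blocks of 2
joined, accesses = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
```

With br = 3 and bs = 2, the counter reports 3 × 2 + 3 = 9 block accesses, matching the formula br × bs + br.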

Query Optimization: JOIN Operation

$r \bowtie_{A = B} s$

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

Query Optimization: JOIN Operation

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
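The merge step can be sketched as below. The sketch assumes both relations are lists of tuples already sorted on their join attribute and that the join is an equijoin; the function name and sample data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equijoin on r[a] == s[b]; both inputs must be sorted on the join attribute."""
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                          # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                          # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s-tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value with the whole s-group.
            while i < len(r) and r[i][a] == value:
                out.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return out


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

The group `s[j:j_end]` is the set of tuples with the same join value that the slide assumes fits in main memory.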

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.
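A minimal in-memory sketch of the hash-join idea follows. It assumes an equijoin, uses Python's built-in `hash` modulo a fixed number of buckets as the hashing function, and keeps the buckets in a dictionary instead of a hash file; the function name and `num_buckets` parameter are hypothetical.

```python
from collections import defaultdict


def hash_join(r, s, a, b, num_buckets=8):
    """Equijoin on r[a] == s[b] using one shared set of hash buckets."""
    buckets = defaultdict(lambda: ([], []))      # bucket -> (r-tuples, s-tuples)
    for tr in r:                                 # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                 # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)
    out = []
    for r_part, s_part in buckets.values():      # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:               # different values can share a bucket
                    out.append(tr + ts)
    return out


r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (17, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

Only tuples that land in the same bucket are compared, so the equality test inside a bucket is still needed to filter out collisions such as 1, 9, and 17 here.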

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if the attribute list includes a key of relation R.

If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.



S

Sname

Status

SCity

S

P

QTY

S1

Smith

20

London

S1

P1

300

S1

Smith

20

London

S1

P2

200

S1

Smith

20

London

S1

P3

400

S1

Smith

20

London

S1

P4

200

S1

Smith

20

London

S1

P5

100

S1

Smith

20

London

S1

P6

100

S2

Jones

10

Paris

S2

P1

300

S2

Jones

10

Paris

S2

P2

400

(

(

S

P

QTY

S1

P1

300

S1

P2

200

S1

P3

400

S1

P4

200

S1

P5

100

S1

P6

100

(

(

(

S

Sname

Status

City

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

Query Processing: An Example

  • One also needs to distinguish between the number of data blocks read and the number written.
  • Techniques such as pipelining and parallelism can be applied to execute the basic operations, if the underlying platform permits.
  • Different algorithms can be developed to execute the basic operations.

Query Processing: An Example

[Query tree: $\Pi_{balance}$ applied to $\sigma_{balance < 2500}$ applied to Account, i.e. $\Pi_{balance}(\sigma_{balance < 2500}(Account))$]

Query optimization is the activity of choosing an efficient execution strategy for processing a query. It can be done in two fashions: static or dynamic.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):

  • Dynamic query optimization: the decomposition and optimization are carried out every time the query is run.
  • Static query optimization: the query is parsed, validated, and optimized once.

Query Optimization

  • In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).
  • It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

Query Optimization

The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query). An optimizer might actually do better than a human programmer, since:

  • An optimizer has a wealth of information available to it that human programmers typically do not have.
  • If the database statistics change drastically, an optimizer may choose a different strategy.
  • An optimizer can potentially consider several strategies for a given request.
  • The optimizer is written by an expert.

[Figure: query-processing pipeline. Query → Parser & Translator → Internal Representation → Optimizer (using statistics about data) → Execution Plan → Execution Engine (accessing the database) → Query Output]

Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example

Find the last names of employees born after 1957 who work on a project named "Aquarius".

    SELECT Lname
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE Pname = 'Aquarius' AND Pnumber = Pno
      AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example

Initial query tree (selection applied after both Cartesian products):

$\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'} \wedge Pnumber = Pno \wedge Essn = Ssn \wedge Bdate > \text{'1957-12-31'}}((Employee \times Works\_on) \times Project))$

Query Optimization: Running Example

  • Executing the previous query tree generates a very large relation, because it performs Cartesian products on the input relations.
  • It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query Optimization: Running Example

Query tree with the Select operations moved down toward the base relations:

$\Pi_{Lname}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee) \times Works\_on) \times \sigma_{Pname = \text{'Aquarius'}}(Project)))$

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from Project is involved in the query. So it makes sense to switch the order of operations on the input relations.

Query Optimization: Running Example

Query tree with the most restrictive selection (on Project) applied first:

$\Pi_{Lname}(\sigma_{Essn = Ssn}(\sigma_{Pnumber = Pno}(\sigma_{Pname = \text{'Aquarius'}}(Project) \times Works\_on) \times \sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

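Query Optimization: Running Example

Query tree with each Cartesian product and following selection replaced by a join:

$\Pi_{Lname}((\sigma_{Pname = \text{'Aquarius'}}(Project) \bowtie_{Pnumber = Pno} Works\_on) \bowtie_{Essn = Ssn} \sigma_{Bdate > \text{'1957-12-31'}}(Employee))$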

Query Optimization: Running Example

It also makes sense to reduce the size of the intermediate results by keeping just the attributes that are needed for correct execution of the query.

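Query Optimization: Running Example

Final query tree, with projections moved down so that only the needed attributes are kept:

$\Pi_{Lname}(\Pi_{Essn}(\Pi_{Pnumber}(\sigma_{Pname = \text{'Aquarius'}}(Project)) \bowtie_{Pnumber = Pno} \Pi_{Essn, Pno}(Works\_on)) \bowtie_{Essn = Ssn} \Pi_{Ssn, Lname}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$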


[Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (accesses the Database) → Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example

Get the names of suppliers who supply part P2.

    SELECT DISTINCT Sname
    FROM S, SP
    WHERE S.S# = SP.S#
      AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

Query tree without optimization:

$\Pi_{Sname}(\sigma_{S.S\# = SP.S\# \wedge SP.P\# = \text{'P2'}}(S \times SP))$

Query Optimization: A Simple Example

$A \bowtie_{S.S\# = SP.S\#} B$

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

Query Optimization: A Simple Example

Without an optimizer, the system will:

  • Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
  • Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
  • Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand:

  • Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
  • Joins the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
  • Projects the result of the last operation over Sname.

Query Optimization: A Simple Example

Query tree chosen by the optimizer:

$\Pi_{Sname}(S \bowtie_{S.S\# = SP.S\#} \sigma_{SP.P\# = \text{'P2'}}(SP))$

Query Optimization: A Simple Example

  • If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.
  • So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
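The effect of the two strategies on intermediate result size can be illustrated with a small sketch. The data below is synthetic and only mimics the cardinalities assumed in the example (100 suppliers, 10,000 shipments, 50 of them for part P2); the function names are hypothetical, and the sketch counts intermediate tuples rather than modelling the tuple I/O totals quoted above.

```python
from itertools import product

# Synthetic relations (assumption): S holds (S#, Sname), SP holds (S#, P#, QTY).
S = [(f"S{i}", f"Name{i}") for i in range(100)]
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{3 + i % 7}", 100)
      for i in range(10_000)]

def product_then_restrict(s, sp):
    """Naive plan: Cartesian product, then restriction, then projection."""
    pairs = 0
    names = set()
    for (s_no, sname), (sp_s_no, p_no, _qty) in product(s, sp):
        pairs += 1                      # one intermediate tuple per pair
        if s_no == sp_s_no and p_no == "P2":
            names.add(sname)
    return names, pairs

def restrict_then_join(s, sp):
    """Optimized plan: restrict SP first, then join with S, then project."""
    p2 = [t for t in sp if t[1] == "P2"]            # one scan of SP
    wanted = {s_no for s_no, _p_no, _qty in p2}
    names = {sname for s_no, sname in s if s_no in wanted}  # one scan of S
    return names, len(p2)

naive_names, naive_size = product_then_restrict(S, SP)
fast_names, fast_size = restrict_then_join(S, SP)
print(naive_size, fast_size, naive_names == fast_names)
```

Both plans return the same supplier names; only the size of the intermediate relation differs.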

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{P\# = \text{'P2'}}(S \bowtie_{S.S\# = SP.S\#} SP))$

Optimization Process

[Query tree, from the leaves to the root: S and SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result]

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

Optimization Process

General rule: It is a good idea to perform the restriction before the join, because:

  • It reduces the size of the input to the join operation.
  • It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
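The equivalence of the two WHERE forms can be checked exhaustively, since there are only eight truth assignments. The sketch below does that, and also shows the early-exit behaviour of a conjunctive normal form: evaluation can stop at the first false conjunct. The helper `evaluate_cnf` is a hypothetical name introduced for illustration.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Compare both forms over all 8 truth assignments.
equivalent = all(original(p, q, r) == cnf(p, q, r)
                 for p, q, r in product([False, True], repeat=3))

def evaluate_cnf(conjuncts, row):
    """Evaluate a list of conjunct predicates on one row.

    all() stops at the first conjunct that is false, so the remaining
    conjuncts are never evaluated for that row.
    """
    return all(conjunct(row) for conjunct in conjuncts)

print(equivalent)
```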

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction, so that the restriction is performed first.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process

  • An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
  • Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$
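Rules 1 to 3 can be spot-checked on a small relation. The sketch below is an illustration under assumptions: relations are modelled as lists of dictionaries, `select` and `project` are hypothetical helpers, and the sample tuples are made up. A check on one relation is not a proof, only a sanity test of the identities.

```python
def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """pi: keep only attrs, removing duplicates (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 10, "c": "x"},
    {"a": 4, "b": 20, "c": "x"},
]
theta1 = lambda t: t["b"] == 10
theta2 = lambda t: t["c"] == "x"

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = (select(lambda t: theta1(t) and theta2(t), E)
         == select(theta1, select(theta2, E)))
# Rule 2: selections commute.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["b"], project(["b", "c"], E)) == project(["b"], E)

print(rule1, rule2, rule3)
```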

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$s = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivity as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

The selectivity on an attribute with $i$ distinct values is

$s = \dfrac{|r(R)| / i}{|r(R)|}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
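The selectivity formulas translate directly into a size estimate for an equality selection. The sketch below is a minimal illustration; the function names are hypothetical, and the figure of 200 distinct values is an assumption chosen only to produce a round number. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def key_equality_selectivity(cardinality):
    """Equality search on a key attribute: s = 1 / |r(R)|."""
    return Fraction(1, cardinality)

def equality_selectivity(distinct_values):
    """Equality search on an attribute with i distinct values: s = 1 / i."""
    return Fraction(1, distinct_values)

def estimated_tuples(cardinality, selectivity):
    """Estimated size of the selection result: s * |r(R)|."""
    return cardinality * selectivity

# A relation of 10,000 tuples whose attribute has 200 distinct values (assumed).
non_key_estimate = estimated_tuples(10_000, equality_selectivity(200))
# A relation of 100 tuples searched on its key.
key_estimate = estimated_tuples(100, key_equality_selectivity(100))
print(non_key_estimate, key_estimate)
```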

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

Rule 9: The projection operation distributes over the theta join when the join condition $\theta$ involves only attributes in $L_1 \cup L_2$.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Rule 10: The set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

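Optimization Process

Rule 13: The projection operation distributes over set union.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

The corresponding identities for set intersection and set difference, $\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$ and $\Pi_L(E_1 - E_2) = \Pi_L(E_1) - \Pi_L(E_2)$, do not hold in general under set semantics, because two different tuples can agree on $L$. They are safe only when no two distinct tuples of $E_1 \cup E_2$ agree on $L$.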
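The distribution rules for selection and projection over set operations can be tested on small sets of tuples. The sketch below uses made-up relations and hypothetical helpers `select` and `project`. On this data the selection identities and the projection-over-union identity hold, while projection over intersection fails, which is why that form needs care.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}

def project(cols, rel):
    return {tuple(t[c] for c in cols) for t in rel}

# Two union-compatible relations (assumed sample data).
E1 = {(1, "a"), (2, "b")}
E2 = {(1, "c"), (2, "b")}
p = lambda t: t[0] >= 2

# Rule 12: selection distributes over union, intersection, and difference.
sel_union = select(p, E1 | E2) == select(p, E1) | select(p, E2)
sel_inter = select(p, E1 & E2) == select(p, E1) & E2
sel_diff = select(p, E1 - E2) == select(p, E1) - E2

# Rule 13: projection distributes over union ...
proj_union = project([0], E1 | E2) == project([0], E1) | project([0], E2)
# ... but not over intersection here: (1, "a") and (1, "c") agree on column 0.
proj_inter = project([0], E1 & E2) == project([0], E1) & project([0], E2)

print(sel_union, sel_inter, sel_diff, proj_union, proj_inter)
```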

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …

come into play.

Optimization Process

So, in short, after scanning and parsing:

  • The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
  • An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

  • Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
  • Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:

  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: Heuristic Rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

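Definition

  • Materialized evaluation: generation of an intermediate result (relation), which is stored before the next operation uses it.
  • Pipelined evaluation: combining several operations so that the output of one is passed directly to the next.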

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

  • We can perform the join operation, materialize the result, and then apply the projection.
  • Alternatively, when the join operation generates a tuple, it is passed directly to the project operation for processing.
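The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. The relations, attribute names, and helper functions below are assumptions made for illustration, and the projection here does not remove duplicates.

```python
def join(r, s, attr):
    """Yield joined tuples one at a time (equality join on a shared attribute)."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}

def project(rows, attrs):
    """Project each incoming tuple as soon as it arrives."""
    for t in rows:
        yield tuple(t[a] for a in attrs)

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}, {"k": 2, "a2": "z"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s, "k"))
result_materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever stored.
result_pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))

print(result_materialized == result_pipelined)
```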

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

    SELECT S.Sname
    FROM R, S
    WHERE R.Sid = S.Sid
      AND R.bid = 100 AND S.rating > 5

Plan 1 (selection after the join):

$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

Plan 2 (selections pushed below the join):

$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

  • Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
  • In this case, articulate the way the previous query is going to be executed.

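[Query trees annotated for pipelined execution. First plan: R and S are joined on Sid = Sid; the selection $\sigma_{bid = 100 \wedge rating > 5}$ and the projection $\Pi_{Sname}$ are both applied on the fly. Second plan: $\sigma_{bid = 100}(R)$ and $\sigma_{rating > 5}(S)$ feed the join on Sid = Sid; the projection $\Pi_{Sname}$ is applied on the fly.]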

Cost of Plan

  • The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
  • Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection

General philosophy: make an effort to reduce the search space.

  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
  • Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
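The two search methods can be contrasted on a small in-memory file. The sketch below is illustrative only: the records, the `ssn` and `lname` field names, and both function names are assumptions, and each function also returns how many records it examined so that the reduction in search space is visible.

```python
def linear_search(records, attr, value):
    """Examine every record; works on unordered files and non-key attributes."""
    matches = [rec for rec in records if rec[attr] == value]
    return matches, len(records)

def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute the file is ordered on."""
    lo, hi = 0, len(sorted_records) - 1
    examined = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        examined += 1
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid], examined
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, examined

# Hypothetical file ordered on ssn: 1,000 records with keys 0, 10, 20, ...
employees = [{"ssn": 10 * i, "lname": f"Emp{i}"} for i in range(1000)]

hit, steps_binary = binary_search(employees, "ssn", 7770)
matches, steps_linear = linear_search(employees, "ssn", 7770)
print(hit["lname"], steps_binary, steps_linear)
```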

Optimization Process: Search Methods for Selection

  • Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

    $\sigma_{SSN = 123456789}(EMPLOYEE)$

  • Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

    $\sigma_{DNUMBER > 5}(DEPARTMENT)$

  • Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

    $\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search Methods for Selection

  • Conjunctive selection: a selection of the form $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
  • Disjunctive selection: a selection of the form $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search Methods for Selection

  • Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
  • The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
  • Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
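A minimal sketch of disjunctive selection by union of record pointers follows. Record pointers are modelled as list positions, each index is a dictionary from attribute value to a set of positions, and the relation, attribute names, and function names are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality tests combined with OR."""
    if all(attr in indexes for attr, _value in conditions):
        pointers = set()
        for attr, value in conditions:
            pointers |= indexes[attr].get(value, set())   # union of pointers
        return [records[rid] for rid in sorted(pointers)]
    # At least one condition has no access path: fall back to a linear scan.
    return [rec for rec in records
            if any(rec[attr] == value for attr, value in conditions)]

employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 40000},
    {"ssn": 4, "dno": 1, "salary": 55000},
]
indexes = {"dno": build_index(employees, "dno"),
           "salary": build_index(employees, "salary")}

via_indexes = disjunctive_select(employees, indexes,
                                 [("dno", 5), ("salary", 40000)])
via_scan = disjunctive_select(employees, {"dno": indexes["dno"]},
                              [("dno", 5), ("salary", 40000)])
print([e["ssn"] for e in via_indexes], via_indexes == via_scan)
```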

Query Optimization: JOIN Operation (nested loop)

Nested loop: To compute $R \bowtie_{A = B} S$, for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

More generally, suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume that $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

    for each tuple tr ∈ r do begin
        for each tuple ts ∈ s do begin
            if tr.A Θ ts.B is true then add tr || ts to the result
        end
    end

Query Optimization: JOIN Operation (nested loop)

  • The cost of the nested loop algorithm is $n_r \times n_s$ tuple-pair comparisons.
  • In the best case, both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.
  • If one of the relations fits in the physical space, then the cost is also $b_s + b_r$ block accesses.
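A runnable sketch of the nested loop join follows. It models tuples as Python tuples, takes the positions of the join attributes as `a` and `b`, and accepts the comparison Θ as a function; these names and the sample relations are assumptions. The function also counts tuple-pair comparisons, which come to n_r × n_s.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b], comparing every pair of tuples."""
    result, comparisons = [], 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts
    return result, comparisons

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]

equi_result, equi_comparisons = nested_loop_join(r, s, 0, 0)
less_result, _ = nested_loop_join(r, s, 0, 0, theta=operator.lt)
print(equi_result, equi_comparisons)
```

Because the comparison is a parameter, the same loop handles any join condition, which is why nested loop is the fallback when the more specialized join methods do not apply.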

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr ∈ Br do begin
                for each tuple ts ∈ Bs do begin
                    if tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end

Query Optimization: JOIN Operation (block nested loop)

  • The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.
  • How can we improve block nested loop?
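The block access count can be confirmed with a small simulation. In the sketch below a block is simply a slice of the relation, and a counter is incremented whenever a block would be fetched; the block size, the sample relations, and the function names are assumptions.

```python
def to_blocks(relation, block_size):
    """Split a relation into fixed-size blocks (the last one may be smaller)."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equality join that pairs blocks first, then tuples within the blocks."""
    result, block_accesses = [], 0
    for block_r in r_blocks:
        block_accesses += 1                 # fetch one block of r
        for block_s in s_blocks:
            block_accesses += 1             # fetch one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_accesses

r = [(i, f"r{i}") for i in range(6)]        # 6 tuples
s = [(i % 4, f"s{i}") for i in range(8)]    # 8 tuples
r_blocks = to_blocks(r, 2)                  # b_r = 3
s_blocks = to_blocks(s, 2)                  # b_s = 4

joined, accesses = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
print(len(joined), accesses)
```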

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): To compute $r \bowtie_{A = B} s$, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
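As a sketch of this method, the code below uses a dictionary as a stand-in for an existing index or hash key on attribute B of s; in a real system the access structure would already exist rather than be built for the join. The function names and sample data are hypothetical.

```python
from collections import defaultdict

def build_index(s, b):
    """Stand-in for an existing index or hash key on attribute position b of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the access structure instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (5, "z")]
s = [(2, "p"), (1, "q"), (2, "t"), (7, "u")]

joined = index_join(r, build_index(s, 0), 0)
print(joined)
```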

Query Optimization: JOIN Operation (sort-merge)

  • Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
  • A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
  • Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
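The merge phase can be sketched as follows, assuming both inputs are already sorted on the join attribute and that each group of equal-valued tuples fits in memory, as stated above. Tuples are Python tuples, `a` and `b` are the positions of the join attributes, and the sample data is made up.

```python
def merge_join(r, s, a, b):
    """Equality join of two relations already sorted on r[a] and s[b]."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                       # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                       # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```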

Query Optimization: JOIN Operation (hash-join)

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
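A minimal in-memory sketch of the hash-join follows. The number of buckets, the use of Python's built-in `hash`, and the sample relations are assumptions; a real system would hash to disk-resident buckets.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    """Hash both relations with the same function, then join bucket by bucket."""
    buckets = defaultdict(lambda: ([], []))      # bucket -> (r tuples, s tuples)
    for tr in r:                                 # single pass through r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                 # single pass through s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Tuples in the same bucket are only candidates: different values
        # can collide, so the join attributes are still compared.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (9, "z")]
s = [(2, "p"), (1, "q"), (2, "t"), (17, "u")]
hashed = hash_join(r, s, 0, 0)
print(sorted(hashed))
```

With 8 buckets the keys 1, 9, and 17 land in the same bucket, so the explicit comparison inside each bucket is what keeps non-matching tuples out of the result.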

Query Optimization: Complex JOIN Operation

  • Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
  • Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

  • One or more of the join techniques may be applicable for joins on individual conditions.
  • We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

  • A project operation Π<attribute list>(R) is straightforward to implement if <attribute list> includes a key of relation R.
  • If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
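Both duplicate-elimination strategies can be sketched briefly. The relation, the column positions, and the function names below are assumptions for illustration; the sort-based version returns rows in sorted order, while the hash-based version keeps first-seen order.

```python
def project_sort(relation, cols):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[c] for c in cols) for t in relation)
    out = []
    for row in projected:
        if not out or out[-1] != row:
            out.append(row)
    return out

def project_hash(relation, cols):
    """Project and use a hash set to skip rows that were already produced."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[c] for c in cols)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# (S#, City, Status): projecting on City alone drops the key, so duplicates appear.
suppliers = [("S1", "London", 20), ("S2", "Paris", 10),
             ("S3", "Paris", 30), ("S4", "London", 20)]

cities_sorted = project_sort(suppliers, [1])
cities_hashed = project_hash(suppliers, [1])
with_key = project_hash(suppliers, [0, 1])      # includes the key: no duplicates
print(cities_sorted, cities_hashed, len(with_key))
```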

Query Optimization: Set Operations

  • Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
  • The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
  • Hashing is another way to implement the union, intersection, and difference operations.

Questions

  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
      AND MGRSSN = SSN
      AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

[Query tree: the projection at the root, the combined selection below it, and the two Cartesian products over Project, Department, and Employee at the bottom.]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
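The size of the Cartesian product follows directly from the assumed cardinalities and tuple widths: cardinalities multiply and tuple sizes add. A short arithmetic sketch (the dictionary layout and variable names are illustrative only):

```python
from math import prod

# Cardinalities and tuple sizes as assumed in the example.
relations = {
    "Project":    {"tuples": 100,  "tuple_bytes": 100},
    "Department": {"tuples": 20,   "tuple_bytes": 50},
    "Employee":   {"tuples": 5000, "tuple_bytes": 150},
}

# Cartesian product: cardinalities multiply, tuple widths add.
product_tuples = prod(rel["tuples"] for rel in relations.values())
product_tuple_bytes = sum(rel["tuple_bytes"] for rel in relations.values())
product_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, product_bytes)
```

Under these assumptions the intermediate relation would occupy about 3 GB before the selection removes any tuples.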

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

[Query tree: the selection on Plocation is applied to Project first; the result is joined with Department on Dnum = Dnumber, then with Employee on MGRSSN = SSN; the projection is at the root.]


Query processing: An Example
[Figure: query tree for the account example, with nodes Account, Π balance, and σ balance < 2500.]

Query optimization is the activity of choosing an efficient execution strategy for processing a query.
Query optimization can be done in two fashions: static or dynamic.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):

One option is to carry out the decomposition and optimization dynamically, every time the query is run.

The alternative is static query optimization, where the query is parsed, validated, and optimized once.

Query Optimization
In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).
It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.

The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
An optimizer might actually do better than a human programmer, since:
An optimizer has a wealth of information available to it that human programmers typically do not have.
If the database statistics change drastically, an optimizer may choose a different strategy.
An optimizer can potentially consider several strategies for a given request.
An optimizer is written by an expert.

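[Figure: query processing pipeline. The query enters the Parser & Translator, which produces an internal representation; the Optimizer, using statistics about the data, turns it into an execution plan; the Execution Engine runs the plan against the database and returns the query output.]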

Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 and working on a project named "Aquarius".

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Initial query tree, written as a relational algebra expression:

$\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'} \land Pnumber = Pno \land Essn = Ssn \land Bdate > \text{'1957-12-31'}}((Employee \times Works\_on) \times Project))$

Query Optimization: An Example
Executing the previous query tree generates a very large intermediate relation, because it performs Cartesian products on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products:

$\Pi_{Lname}(\sigma_{Pnumber = Pno}((\sigma_{Essn = Ssn}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee) \times Works\_on)) \times \sigma_{Pname = \text{'Aquarius'}}(Project)))$

Query Optimization: Running Example
On closer observation, only one tuple of Project is involved in the query. So it makes sense to switch the order of operations on the input relations, so that the most restrictive selection is applied first:

$\Pi_{Lname}(\sigma_{Essn = Ssn}((\sigma_{Pnumber = Pno}(\sigma_{Pname = \text{'Aquarius'}}(Project) \times Works\_on)) \times \sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$

Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation:

$\Pi_{Lname}((\sigma_{Pname = \text{'Aquarius'}}(Project) \bowtie_{Pnumber = Pno} Works\_on) \bowtie_{Essn = Ssn} \sigma_{Bdate > \text{'1957-12-31'}}(Employee))$

Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping only the attributes that are needed for correct execution of the query:

$\Pi_{Lname}(\Pi_{Essn}(\Pi_{Pnumber}(\sigma_{Pname = \text{'Aquarius'}}(Project)) \bowtie_{Pnumber = Pno} \Pi_{Essn, Pno}(Works\_on)) \bowtie_{Essn = Ssn} \Pi_{Ssn, Lname}(\sigma_{Bdate > \text{'1957-12-31'}}(Employee)))$


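[Figure: phases of query processing. The query goes through Query Decomposition (consulting the System Catalog), which yields a relational algebra expression; Query Optimization (consulting Database Statistics) turns it into an execution plan; Code Generation and Runtime Execution against the Database then produce the result.]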

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example
Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Unoptimized query tree, written as a relational algebra expression:

$\Pi_{Sname}(\sigma_{S.S\# = SP.S\# \land SP.P\# = \text{'P2'}}(S \times SP))$

Query Optimization: A Simple Example
Join of the two relations over S# ($A \bowtie_{S.S\# = SP.S\#} B$):

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves retrieving only 100 tuples and generates a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query Optimization: A Simple Example
Optimized query tree, written as a relational algebra expression:

$\Pi_{Sname}(S \bowtie_{S.S\# = SP.S\#} \sigma_{SP.P\# = \text{'P2'}}(SP))$

If the number of tuple I/Os is used as the performance measure, then the second approach is clearly far faster than the first. In the first case we read and write about 3,000,000 tuples; in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, is a good heuristic.
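The gap between the two strategies can be checked on synthetic data. The sketch below is illustrative only: the relation contents, the function names `product_then_restrict` and `restrict_then_join`, and the decision to count examined tuples are assumptions; only the cardinalities (100, 10,000, and 50) come from the example.

```python
from itertools import product

# Synthetic relations sized as in the example; the values are invented.
S = [(f"S{i}", f"Name{i}") for i in range(100)]             # (S#, Sname)
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{3 + i % 7}")  # (S#, P#)
      for i in range(10_000)]


def product_then_restrict():
    """Cartesian product, then restriction, then projection over Sname."""
    examined = 0
    names = set()
    for (s_no, sname), (sp_s_no, p_no) in product(S, SP):
        examined += 1                      # one tuple of the product S x SP
        if s_no == sp_s_no and p_no == "P2":
            names.add(sname)
    return names, examined


def restrict_then_join():
    """Restriction on SP, then join with S, then projection over Sname."""
    sp_p2 = [t for t in SP if t[1] == "P2"]   # reads every tuple of SP
    sname_of = dict(S)                        # reads every tuple of S
    names = {sname_of[s_no] for s_no, _ in sp_p2 if s_no in sname_of}
    return names, len(SP) + len(S), len(sp_p2)


names_1, product_size = product_then_restrict()
names_2, tuples_read, restricted_size = restrict_then_join()
print(product_size, tuples_read, restricted_size, names_1 == names_2)
```

On these inputs the first plan examines 1,000,000 product tuples, while the second reads 10,100 input tuples and keeps an intermediate result of 50 tuples; both return the same set of names.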

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{SP.P\# = \text{'P2'}}(S \bowtie_{S.S\# = SP.S\#} SP))$

[Figure: query tree, read bottom-up. S and SP feed Join (S.S# = SP.S#), then Restrict (SP.P# = 'P2'), then Project (Sname), which yields the Result.]

Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: it is a good idea to perform the restriction before the join, because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
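The equivalence of the two WHERE forms can be confirmed by enumerating all truth assignments. The following is a small sketch; the function names `original`, `cnf`, and `cnf_is_true` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    return (p or q) and (p or r)


def cnf_is_true(conjuncts):
    """Evaluate conjuncts (zero-argument callables) one by one.

    all() stops at the first conjunct that is false, so the remaining
    conjuncts are never evaluated.
    """
    return all(conjunct() for conjunct in conjuncts)


assignments = list(product([False, True], repeat=3))
equivalent = all(original(*a) == cnf(*a) for a in assignments)
print(equivalent)
```

The helper `cnf_is_true` mirrors the observation above: as soon as one conjunct is false, the whole condition is known to be false.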

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be combined into a single projection.

General rule: a restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP. Therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent form that better satisfies the performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \land \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$

Rule 4: A selection applied to a Cartesian product is equivalent to a theta join.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \land \theta_2} E_2$

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \land \theta_3} E_3 = E_1 \bowtie_{\theta_1 \land \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$s = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \frac{1}{|r(R)|}$

For an attribute with $i$ distinct values, the selectivity is

$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\frac{1}{i}\,|r(R)|$
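The selectivity formulas above translate directly into a few lines of arithmetic. In this sketch the function names and the sample figures (a relation of 10,000 tuples with 200 distinct values on the searched attribute) are hypothetical, and values are assumed to be uniformly distributed.

```python
def key_equality_selectivity(cardinality):
    """Equality search on a key attribute: at most one tuple qualifies."""
    return 1 / cardinality


def equality_selectivity(distinct_values):
    """Equality search on an attribute with i distinct values: s = 1 / i."""
    return 1 / distinct_values


def expected_matches(cardinality, distinct_values):
    """Estimated number of qualifying tuples: |r(R)| / i."""
    return cardinality / distinct_values


cardinality = 10_000        # |r(R)|, an assumed figure
distinct_values = 200       # i, an assumed figure
print(key_equality_selectivity(cardinality))
print(equality_selectivity(distinct_values))
print(expected_matches(cardinality, distinct_values))
```

An optimizer would use such estimates to predict the size of an intermediate relation before choosing among alternative plans.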

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition $\theta_0$ involve only the attributes of one of the relations being joined ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are sets of attributes of $E_1$ and $E_2$, respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Optimization Process
Rule 10: Set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \ne \sigma_p(E_1) \cup E_2$

Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

Rule 13: The projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

The analogous forms for set intersection and set difference, $\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$ and $\Pi_L(E_1 - E_2) = \Pi_L(E_1) - \Pi_L(E_2)$, are valid only when no two distinct tuples of $E_1 \cup E_2$ agree on the attributes in $L$.

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.

So, in short, after scanning and parsing:
The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves constructing a set of candidate query plans and choosing the best of these plans.
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: each operation generates its intermediate result (relation) in full before the next operation starts.
Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

$\Pi_{a1, a2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
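The contrast between the two evaluation styles can be sketched with Python generators. This is only an illustration: the relations, the shared join attribute `k`, and the function names `join` and `project` are assumptions made for the example.

```python
def join(r, s, attr):
    """Nested loop join on a shared attribute; yields one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Project each incoming tuple onto attrs as soon as it arrives."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 3, "a2": 30}]

# Materialized evaluation: the whole join result is built first.
joined = list(join(r, s, "k"))
materialized = list(project(joined, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))

print(materialized == pipelined)
```

Both styles produce the same tuples; the pipelined version simply never stores the intermediate join result.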

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

A first plan joins R and S and then applies both selections:

$\Pi_{Sname}(\sigma_{bid = 100 \land rating > 5}(R \bowtie_{Sid = Sid} S))$

A second plan pushes the selections below the join:

$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, annotated "on the fly" where an operator is applied to tuples as they arrive: twice in the first tree and once in the second.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
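The two search methods can be compared by counting how many records each one examines. The sketch below assumes an in-memory list of records ordered on a key attribute; the record layout and the function names are hypothetical.

```python
def linear_search(records, attr, value):
    """Test every record; returns the matches and the number examined."""
    matches = [rec for rec in records if rec[attr] == value]
    return matches, len(records)


def binary_search(sorted_records, attr, value):
    """Equality search on a key attribute on which the file is ordered.

    Returns the matching record (or None) and the number of probes.
    """
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        current = sorted_records[mid][attr]
        if current == value:
            return sorted_records[mid], probes
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# 1,000 records ordered on the key attribute Ssn (even values only).
employees = [{"Ssn": ssn} for ssn in range(0, 2000, 2)]
found_linear, examined = linear_search(employees, "Ssn", 1998)
found_binary, probes = binary_search(employees, "Ssn", 1998)
print(examined, probes)
```

With 1,000 records the linear search examines all of them, while the binary search needs at most 10 probes.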

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form

$\sigma_{\theta_1 \land \theta_2 \land \dots \land \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form

$\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then check whether each retrieved record satisfies the rest of the conditions.

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
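Disjunctive selection by union of record pointers can be sketched as follows. Record positions in a list stand in for record pointers, dictionaries stand in for indexes, and only equality conditions are handled; all names and sample records are assumptions made for this illustration.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the set of record positions holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """Evaluate OR-ed equality conditions given as (attr, value) pairs."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of record pointers
    return [records[rid] for rid in sorted(rids)]


employees = [
    {"Ssn": 1, "Dno": 5, "Salary": 30000},
    {"Ssn": 2, "Dno": 4, "Salary": 40000},
    {"Ssn": 3, "Dno": 5, "Salary": 40000},
    {"Ssn": 4, "Dno": 1, "Salary": 55000},
]
conditions = [("Dno", 5), ("Salary", 40000)]
both_indexed = {a: build_index(employees, a) for a in ("Dno", "Salary")}
only_dno = {"Dno": build_index(employees, "Dno")}

via_pointers = disjunctive_select(employees, both_indexed, conditions)
via_scan = disjunctive_select(employees, only_dno, conditions)
print(via_pointers == via_scan)
```

When only `Dno` is indexed, the function falls back to the linear scan, matching the note above.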

Query Optimization: JOIN Operation

$R \bowtie_{A = B} S$

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check whether the join condition t[A] = s[B] holds.

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
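The nested loop algorithm can be written out as runnable code. This is a minimal sketch in which relations are lists of Python tuples, `a` and `b` are the positions of the join attributes, and `theta` is any comparison function; these representation choices and the sample data are assumptions.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s: compare every tr with every ts."""
    result = []
    for tr in r:                         # outer loop over r
        for ts in s:                     # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts
    return result


r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q"), (2, "z")]

equi = nested_loop_join(r, s, 0, 0)
less_than = nested_loop_join(r, s, 0, 0, theta=operator.lt)
print(equi)
print(len(less_than))
```

Because the condition is an arbitrary function, the same loop handles equality and non-equality join conditions alike.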

Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best case, both relations fit into the physical space (main memory), and hence we need only bs + br block accesses.
If just one of the relations fits in the physical space and is used as the inner relation, then bs + br block accesses will still be the cost.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.
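The block-access count can be verified by instrumenting a small simulation. In this sketch a "block" is simply a slice of a list, and the block size, relation contents, and function names are assumptions chosen for the illustration.

```python
def blocks(relation, block_size):
    """Split a relation (a list of tuples) into consecutive blocks."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join that also counts simulated block accesses."""
    result = []
    block_accesses = 0
    for block_r in blocks(r, block_size):
        block_accesses += 1                  # read one block of r
        for block_s in blocks(s, block_size):
            block_accesses += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_accesses


r = [(i, f"r{i}") for i in range(10)]        # 10 tuples, so br = 2
s = [(i % 10, f"s{i}") for i in range(20)]   # 20 tuples, so bs = 4
joined, accesses = block_nested_loop_join(r, s, 0, 0, block_size=5)
print(len(joined), accesses)
```

With br = 2 and bs = 4 the simulation performs 2 × 4 + 2 = 10 block accesses, in line with the formula br × bs + br.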

How can we improve block nested loop?

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
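A join that uses an access structure on s can be sketched with a dictionary standing in for the index or hash key on attribute B. The function names and sample data below are hypothetical.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tr in r, look up the matching ts through the index."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (5, "w")]
s = [(2, "p"), (3, "q"), (2, "z"), (1, "k")]
s_index = build_hash_index(s, 0)
joined = index_join(r, s_index, 0)
print(joined)
```

Each tuple of r triggers one lookup instead of a full pass over s, which is where the saving over the nested loop comes from.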

Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
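The merge step can be sketched with two positions that advance through the sorted relations. The code assumes both inputs are already sorted on the join attribute and that each group of equal values fits in memory; the function name and sample data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on their join attributes."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple having this value with the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```

Neither position ever moves backwards past a group boundary, so each relation is scanned once.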

Query Optimization: JOIN Operation
Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes its records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce the result of the join operation.
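A hash join along these lines can be sketched in a few lines. The number of buckets, the use of Python's built-in `hash`, and the sample data are assumptions; a real system would hash to disk-resident buckets.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join by hashing both relations into the same set of buckets."""
    # Each bucket holds the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # same bucket, so check values
                    result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (9, "v")]
s = [(2, "p"), (3, "q"), (2, "z"), (1, "k")]
joined = hash_join(r, s, 0, 0)
print(sorted(joined))
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared.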

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation
A project operation Π<attribute list>(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
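Both ways of eliminating duplicates after a projection can be sketched briefly. The supplier rows are taken from the S relation used earlier in these slides; the function names, and the use of a Python set to play the role of the hash table, are assumptions.

```python
def project_distinct_hash(relation, attrs):
    """Projection with duplicate elimination through a hash table."""
    seen = set()
    result = []
    for t in relation:
        projected = tuple(t[i] for i in attrs)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


def project_distinct_sort(relation, attrs):
    """Projection with duplicate elimination through sorting."""
    projected = sorted(tuple(t[i] for i in attrs) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are now adjacent
            result.append(t)
    return result


suppliers = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
by_hash = project_distinct_hash(suppliers, [3])   # project over City
by_sort = project_distinct_sort(suppliers, [3])
print(by_hash)
print(by_sort)
```

The hashing version keeps the order of first appearance, while the sorting version returns the values in sorted order; both yield the same set of cities.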

Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \land Dnum = Dnumber \land Mgr\_ssn = Ssn}(Project \times (Department \times Employee)))$

This scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
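The size of the Cartesian product follows directly from the stated figures. The short calculation below uses only the tuple sizes and cardinalities given in the example; the variable names are chosen for illustration.

```python
from math import prod

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}     # bytes
cardinalities = {"Project": 100, "Department": 20, "Employee": 5000}  # tuples

product_tuples = prod(cardinalities.values())     # 100 * 20 * 5000
product_tuple_size = sum(tuple_sizes.values())    # 100 + 50 + 150
product_bytes = product_tuples * product_tuple_size

print(product_tuples, product_tuple_size, product_bytes)
```

The product therefore holds 10,000,000 tuples of 300 bytes each, about 3,000,000,000 bytes in total, before any selection is applied.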

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{Mgr\_ssn = Ssn} Employee)$


Query optimization is the activity of choosing an efficient execution strategy for processing a query. It can be carried out in two fashions: static or dynamic.

There are two choices for carrying out the first phases of query processing (i.e., parsing, validation, translation, and optimization):
  • Dynamic optimization: the decomposition and optimization are carried out every time the query is run.
  • Static optimization: the query is parsed, validated, and optimized once.

Query Optimization
  • In general, optimization is required in such a system if the system is expected to achieve acceptable objectives (e.g., performance).
  • It is one of the strengths of relational algebra that optimization can be done automatically, since relational expressions are at a sufficiently high semantic level.
  • The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
  • An optimizer might actually do better than a human programmer, since:
      - An optimizer has a wealth of information available to it that human programmers typically do not have.
      - If the database statistics change drastically, an optimizer may choose a different strategy.
      - An optimizer can potentially consider several strategies for a given request.
      - The optimizer is written by an expert.

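[Figure: query-processing architecture. A query enters the Parser & Translator, which produces an internal representation; the Optimizer, using statistics about the data, produces an execution plan; the Execution Engine runs the plan against the database and returns the query output.]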

Running Example

    EMPLOYEE      (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
    DEPARTMENT    (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
    DEPT_Location (Dnumber, Dlocation)
    PROJECT       (Pname, Pnumber, Plocation, Dnum)
    WORKS_ON      (Essn, Pno, Hours)
    DEPENDENT     (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 who work on a project named "Aquarius".

    SELECT Lname
    FROM   EMPLOYEE, WORKS_ON, PROJECT
    WHERE  Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Initial query tree:

    Π_{Lname}
      σ_{Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'}
        ×
          ×
            Employee
            Works_on
          Project

Query Optimization: Running Example
  • Executing the previous query tree generates a very large relation because of the Cartesian products over the input relations.
  • It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query tree after moving the Select operations down:

    Π_{Lname}
      σ_{Pnumber = Pno}
        ×
          σ_{Essn = Ssn}
            ×
              σ_{Bdate > '1957-12-31'}
                Employee
              Works_on
          σ_{Pname = 'Aquarius'}
            Project

Query Optimization: Running Example
On closer observation, one should realize that just one tuple of Project is involved in the query. So it makes sense to switch the order of operations on the input relations.

Query tree after switching the order of the input relations:

    Π_{Lname}
      σ_{Essn = Ssn}
        ×
          σ_{Pnumber = Pno}
            ×
              σ_{Pname = 'Aquarius'}
                Project
              Works_on
          σ_{Bdate > '1957-12-31'}
            Employee

Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree after replacing Cartesian product and Select with Join:

    Π_{Lname}
      ⋈_{Essn = Ssn}
        ⋈_{Pnumber = Pno}
          σ_{Pname = 'Aquarius'}
            Project
          Works_on
        σ_{Bdate > '1957-12-31'}
          Employee

Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of the query.

Query tree after moving the Project operations down:

    Π_{Lname}
      ⋈_{Essn = Ssn}
        Π_{Essn}
          ⋈_{Pnumber = Pno}
            Π_{Pnumber}
              σ_{Pname = 'Aquarius'}
                Project
            Π_{Essn, Pno}
              Works_on
        Π_{Ssn, Lname}
          σ_{Bdate > '1957-12-31'}
            Employee

[Figure: the initial query tree and the final query tree shown side by side for comparison.]

[Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (over the Database) → Result.]

Query Optimization: A Simple Example

S
    S#   Sname   Status   City
    S1   Smith   20       London
    S2   Jones   10       Paris
    S3   Blake   30       Paris
    S4   Clark   20       London
    S5   Adams   30       Athens

SP
    S#   P#   QTY
    S1   P1   300
    S1   P2   200
    S1   P3   400
    S1   P4   200
    S1   P5   100
    S1   P6   100
    …

Get the names of suppliers who supply part P2:

    SELECT DISTINCT Sname
    FROM   S, SP
    WHERE  S.S# = SP.S#
    AND    SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

Initial query tree:

    Π_{Sname}
      σ_{S.S# = SP.S# AND SP.P# = 'P2'}
        ×
          S
          SP

Join of S and SP over S.S# = SP.S#:

    S#   Sname   Status   City     S#   P#   QTY
    S1   Smith   20       London   S1   P1   300
    S1   Smith   20       London   S1   P2   200
    S1   Smith   20       London   S1   P3   400
    S1   Smith   20       London   S1   P4   200
    S1   Smith   20       London   S1   P5   100
    S1   Smith   20       London   S1   P6   100
    S2   Jones   10       Paris    S2   P1   300
    S2   Jones   10       Paris    S2   P2   400
    …

Query Optimization: A Simple Example
Without an optimizer, the system will:
  • Generate the Cartesian product of S and SP. This produces a relation of 100 × 10,000 = 1,000,000 tuples, too large to be kept in main memory.
  • Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
  • Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand:
  • Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
  • Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
  • Projects the result of the last operation over Sname.

Optimized query tree:

    Π_{Sname}
      ⋈_{S.S# = SP.S#}
        σ_{SP.P# = 'P2'}
          SP
        S

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, is a good heuristic.
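The arithmetic behind this comparison can be sketched as follows. The cardinalities are the ones given in the example; the function names and the simple counting model (count product tuples for the first plan, count tuples read for the second) are assumptions made for illustration only.

```python
# Cardinalities from the example; the counting model itself is an assumption.
CARD_S = 100       # tuples in S
CARD_SP = 10_000   # tuples in SP
CARD_P2 = 50       # SP tuples that refer to part P2


def product_first():
    """Tuples produced by the Cartesian product S x SP."""
    return CARD_S * CARD_SP


def restrict_first():
    """Tuples read when SP is restricted first and the result is joined with S."""
    return CARD_SP + CARD_S


print(product_first())   # size of the intermediate product
print(restrict_first())  # tuples read by the optimized plan
```

Under these assumptions the product alone has 1,000,000 tuples, while the restrict-then-join plan reads roughly 10,000 tuples and never builds an intermediate relation of more than 50 tuples.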

Optimization Process
Cast the query into some internal representation: convert the query into an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

    Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

    S, SP
      ↓
    Join (S.S# = SP.S#)
      ↓
    Restrict (SP.P# = 'P2')
      ↓
    Project (Sname)
      ↓
    Result

Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

    (A Join B) WHERE restriction-on-B
can be transformed into
    A Join (B WHERE restriction-on-B)

    (A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
    (A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: It is a good idea to perform the restriction before the join, because
  • it reduces the size of the input to the join operation;
  • it reduces the size of the output from the join.
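As a small illustration of the restriction-before-join transformation, the sketch below evaluates both forms on a few hand-made tuples and checks that they agree. The helper names `join` and `restrict` and the sample data are hypothetical; relations are modelled as lists of dictionaries.

```python
def join(a, b, pred):
    """Theta join by brute force: pair every tuple of a with every tuple of b."""
    return [{**x, **y} for x in a for y in b if pred(x, y)]


def restrict(rel, pred):
    """Keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


# Hypothetical sample data in the spirit of the S and SP relations.
S = [{"S#": "S1", "Sname": "Smith"}, {"S#": "S2", "Sname": "Jones"}]
SP = [
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P2"},
    {"S#": "S2", "P#": "P1"},
]

same_supplier = lambda s, sp: s["S#"] == sp["S#"]
is_p2 = lambda t: t["P#"] == "P2"

late = restrict(join(S, SP, same_supplier), is_p2)    # (A Join B) WHERE restriction-on-B
early = join(S, restrict(SP, is_p2), same_supplier)   # A Join (B WHERE restriction-on-B)

join_input_late = len(SP)                    # the join sees all of SP
join_input_early = len(restrict(SP, is_p2))  # the join sees only the P2 tuples
```

Both forms return the same tuples, but in the second form the join receives only the restricted relation as input.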

Optimization Process

    WHERE p OR (q AND r)
can be converted into
    WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where the conjuncts can be evaluated in parallel.
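The equivalence of the two WHERE conditions can be checked exhaustively, since there are only eight truth assignments. This is a minimal sketch; the function names `original` and `cnf` are chosen here for illustration.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    # Conjunctive normal form: a conjunction of two disjunctions.
    return (p or q) and (p or r)


# Compare both forms on every combination of truth values.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)
```

The conjunctive form can be rejected as soon as a single conjunct is false, which is the property the general rule relies on.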

Optimization Process

    (A WHERE restriction-1) WHERE restriction-2
can be converted into
    A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

    (A [projection-1]) [projection-2]
can be converted into
    A [projection-2]

General rule: A sequence of projections can be combined into a single projection.

General rule: A restriction and a projection can be reordered, so that a restriction of a projection becomes a projection of a restriction.

Optimization Process
Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

    (SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

    SP [S#]

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one that better satisfies the performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

    σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

    σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

    Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)
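Rules 1 to 3 can be checked on a small relation with a few lines of code. The sketch below is only an illustration: `select` and `project` are hypothetical helpers written for this example, relations are lists of dictionaries, and the sample tuples are loosely based on the supplier relation.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


E = [
    {"S#": "S1", "Status": 20, "City": "London"},
    {"S#": "S2", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Status": 30, "City": "Paris"},
    {"S#": "S4", "Status": 20, "City": "London"},
]
theta1 = lambda t: t["City"] == "Paris"
theta2 = lambda t: t["Status"] > 15

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: selections commute.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["City"], project(["City", "Status"], E)) == project(["City"], E)
```

A check on one relation is of course not a proof; it only shows the rules at work on concrete data.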

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

    σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

    σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.

    E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

    (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

    (E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$$\text{selectivity} = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$$

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are used as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

$$s = \frac{1}{|r(R)|}$$

For an attribute with $i$ distinct values, the selectivity is

$$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$$

Hence the number of tuples that satisfy an equality search is

$$\frac{1}{i}\,|r(R)|$$
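The selectivity estimates above amount to a few divisions. The sketch below assumes values are spread evenly over the distinct values of an attribute; the function names and the figure of 200 distinct part numbers are hypothetical, chosen so that the result matches the 50 tuples per part used in the supplier example.

```python
from fractions import Fraction


def selectivity_key(cardinality):
    """Equality search on a key attribute: at most one matching tuple."""
    return Fraction(1, cardinality)


def selectivity_distinct(i):
    """Equality search on an attribute with i distinct values."""
    return Fraction(1, i)


def estimated_matches(cardinality, selectivity):
    """Estimated number of tuples that satisfy the condition."""
    return cardinality * selectivity


card_sp = 10_000       # |r(R)| for SP, as in the simple example
distinct_parts = 200   # hypothetical number of distinct P# values

per_part = estimated_matches(card_sp, selectivity_distinct(distinct_parts))
per_key = estimated_matches(card_sp, selectivity_key(card_sp))
```

Exact fractions are used here only to avoid rounding noise in the illustration.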

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

    σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

As query trees:

    Before:                After:
    σ_{θ0}                 ⋈_{θ}
      ⋈_{θ}                  σ_{θ0}
        E1                     E1
        E2                   E2

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2.

    Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.

    E1 ∪ E2 = E2 ∪ E1
    E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

    (E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
    (E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:

    σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
    σ_{p}(E1 − E2) = σ_{p}(E1) − E2

Set union:

    σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
    σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

Set intersection:

    σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
    σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Rule 13: The projection operation distributes over the set union operation.

    Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)

Note: the corresponding identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L. Only the following containments are guaranteed:

    Π_{L}(E1 ∩ E2) ⊆ Π_{L}(E1) ∩ Π_{L}(E2)
    Π_{L}(E1 − E2) ⊇ Π_{L}(E1) − Π_{L}(E2)
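A quick check on small sets suggests which of these distributive identities are safe. In the sketch below relations are Python sets of tuples, `select` and `project_first` are hypothetical helpers, and the data is made up. It shows selection distributing over difference and union, the one-sided form failing for union, and projection behaving well for union while it can fail for intersection when the projection drops the attribute that distinguishes two tuples.

```python
E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "x"), (2, "b")}


def select(pred, rel):
    return {t for t in rel if pred(t)}


def project_first(rel):
    """Projection onto the first attribute only."""
    return {(t[0],) for t in rel}


p = lambda t: t[0] >= 2

# Selection over set difference: both forms agree.
sel_diff = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
# Selection over union: the two-sided form holds ...
sel_union = select(p, E1 | E2) == select(p, E1) | select(p, E2)
# ... but the one-sided form does not.
sel_union_one_sided = select(p, E1 | E2) == select(p, E1) | E2

# Projection over union holds.
proj_union = project_first(E1 | E2) == project_first(E1) | project_first(E2)
# Projection over intersection fails here: (1, "a") and (1, "x") differ,
# yet they agree on the projected attribute.
proj_inter = project_first(E1 & E2) == project_first(E1) & project_first(E2)
```

With this data the intersection case is a counterexample: the left side contains only the value 2, while the right side contains both 1 and 2.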

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …
come into play.

So, in short, after scanning and parsing:
  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
  • Materialized evaluation: the result of each operation is generated as an intermediate relation.
  • Pipelined evaluation: several operations are combined, so that the output of one is passed directly to the next.

Assume we want to perform

    Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
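The difference between the two evaluation styles can be sketched with Python generators. Everything here is an illustrative assumption: the join attribute is called `k`, the relations are tiny lists of dictionaries, and the projection does not eliminate duplicates.

```python
def join(r, s):
    """Join on the shared attribute 'k', yielding one result tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(attrs, tuples):
    """Projection (without duplicate elimination) over a stream of tuples."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the whole join result is stored before the projection starts.
temp = list(join(r, s))
materialized = list(project(["a1", "a2"], temp))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(["a1", "a2"], join(r, s)))
```

Both styles give the same answer; the pipelined version simply never holds the full join result in memory.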

Assume the following relations:

    S (Sid: integer, Sname: string, rating: integer, age: real)
    R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

    SELECT S.Sname
    FROM   R, S
    WHERE  R.Sid = S.Sid
    AND    R.bid = 100 AND S.rating > 5

First plan:

    Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid = Sid} S))

    Π_{Sname}
      σ_{bid = 100 AND rating > 5}
        ⋈_{Sid = Sid}
          R
          S

Second plan, with the selections pushed below the join:

    Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

    Π_{Sname}
      ⋈_{Sid = Sid}
        σ_{bid = 100}
          R
        σ_{rating > 5}
          S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree two operators are marked "on the fly"; in the second tree one operator is marked "on the fly".]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
  • Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
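The two search methods can be contrasted in a short sketch. The record layout, the `Ssn` values, and the function names are hypothetical; the binary search assumes the file is ordered on the key and that the list of keys has been built once in advance.

```python
from bisect import bisect_left


def linear_search(records, attr, value):
    """Scan every record and keep those that satisfy attr = value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records, keys, value):
    """Equality search on a key attribute.

    `records` must be ordered on the key and `keys` must hold the key of
    each record in the same order.
    """
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None


# Hypothetical file, ordered on Ssn.
employees = [
    {"Ssn": 111, "Lname": "Smith"},
    {"Ssn": 222, "Lname": "Wong"},
    {"Ssn": 333, "Lname": "Zelaya"},
    {"Ssn": 444, "Lname": "Narayan"},
]
ssn_keys = [e["Ssn"] for e in employees]  # built once, not per search

found = binary_search(employees, ssn_keys, 333)
missing = binary_search(employees, ssn_keys, 999)
scanned = linear_search(employees, "Lname", "Wong")
```

The linear scan works for any attribute, while the binary search is only applicable to the attribute on which the file is ordered.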

Optimization Process: Search methods for Selection
  • Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

        σ_{SSN = 123456789}(EMPLOYEE)

  • Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file. (Note: in this case the data is also sorted.)

        σ_{DNUMBER > 5}(DEPARTMENT)

  • Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

        σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection
  • Conjunctive selection: a conjunctive selection is of the following form:

        σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

  • Disjunctive selection: a disjunctive selection is of the following form:

        σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
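Disjunctive selection by union of record pointers can be sketched with dictionaries standing in for the indexes. The records, the indexed attributes, and the helper `build_index` are hypothetical; a record pointer is modelled simply as a position in the list.

```python
# Hypothetical EMPLOYEE records; a "pointer" is the position in this list.
records = [
    {"Ssn": 1, "Dno": 5, "Sex": "F"},
    {"Ssn": 2, "Dno": 4, "Sex": "M"},
    {"Ssn": 3, "Dno": 5, "Sex": "M"},
    {"Ssn": 4, "Dno": 1, "Sex": "F"},
]


def build_index(attr):
    """Map each attribute value to the set of pointers of matching records."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(pos)
    return index


dno_index = build_index("Dno")
sex_index = build_index("Sex")

# Evaluate Dno = 5 OR Sex = 'F': scan each index, then union the pointers.
pointers = dno_index.get(5, set()) | sex_index.get("F", set())
result = [records[p] for p in sorted(pointers)]
```

Taking the union of pointer sets means a record that satisfies both conditions is fetched only once.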

Query Optimization: JOIN Operation

    R ⋈_{A = B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

    r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume that br and bs are the numbers of blocks of each relation.

The following algorithm performs the nested-loop join operation:

    for each tr ∈ r do begin
        for each ts ∈ s do begin
            if tr.A Θ ts.B is true then add tr || ts to the result
        end
    end
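The nested-loop algorithm translates almost line for line into runnable code. In this sketch tuples are Python tuples, `theta` is any predicate over a pair of tuples, and a comparison counter is added (an assumption of the example, not part of the algorithm) to make the nr × ns behaviour visible.

```python
def nested_loop_join(r, s, theta):
    """Nested-loop join: test every pair (tr, ts) against the join condition."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)  # tuple concatenation tr || ts
    return result, comparisons


# Hypothetical sample relations; the join attribute is the first field.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(1, "x"), (3, "y")]

result, comparisons = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
```

Because the condition is an arbitrary predicate, the same function handles equality and non-equality join conditions alike.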

Query Optimization: JOIN Operation (nested loop)
  • The cost of the nested-loop algorithm is nr × ns pairs of tuples examined.
  • In the best-case scenario, both relations fit into main memory, and hence we need bs + br block accesses.
  • If just one of the relations fits into main memory, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

    for each block Br of r do begin
        for each block Bs of s do begin
            for each tr ∈ Br do begin
                for each ts ∈ Bs do begin
                    if tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
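The block-access count br × bs + br can be confirmed with a small simulation. The sketch below is an illustration under simple assumptions: a "block" is a fixed-size slice of a list, the helper names are hypothetical, and every access to a slice is counted as one block read.

```python
def blocks(rel, size):
    """Split a relation into blocks of at most `size` tuples."""
    return [rel[i:i + size] for i in range(0, len(rel), size)]


def block_nested_loop_join(r, s, theta, block_size):
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1              # read one block of r
        for block_s in s_blocks:
            block_reads += 1          # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical data: 8 tuples in r, 6 tuples in s, 2 tuples per block.
r = [(i,) for i in range(8)]
s = [(i,) for i in range(0, 12, 2)]

result, block_reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
b_r, b_s = len(blocks(r, 2)), len(blocks(s, 2))
```

With 4 blocks of r and 3 blocks of s, the simulation performs 4 × 3 + 4 = 16 block reads, which agrees with the formula.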

Query Optimization: JOIN Operation

    r ⋈_{A = B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
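A join that uses an access structure on s can be sketched with a dictionary standing in for the index on B. The function name, the sample data, and the fact that the index is built inside the function are assumptions of this example; in a real system the index would already exist.

```python
from collections import defaultdict


def index_join(r, s, a, b):
    """Join r and s on r[a] == s[b], probing an index on s[b] for each tr."""
    index = defaultdict(list)          # stands in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    # For each tr, fetch only the matching ts through the index.
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(1, "x"), (3, "y"), (3, "z")]

joined = index_join(r, s, 0, 0)
```

Each tuple of r triggers one index probe instead of a full scan of s.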

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

    bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
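The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and that each group of tuples with the same join value fits in memory. The function name and the sample data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Join r and s on r[a] == s[b]; both inputs must be sorted on that attribute."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                    # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                    # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

merged = merge_join(r, s, 0, 0)
```

Both pointers only ever move forward, which is why each relation is scanned once.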

Query Optimization: JOIN Operation
Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.
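A minimal in-memory sketch of this idea is shown below. The number of buckets, the use of Python's built-in `hash`, and the sample data are assumptions of the example; a real system would hash to buckets on disk.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into the same buckets, then match within each bucket."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                       # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                       # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:     # examine each bucket independently
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:     # values can collide without being equal
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (1, "y"), (6, "z")]

joined = sorted(hash_join(r, s, 0, 0))
```

The equality test inside a bucket is still needed because different join values can hash to the same bucket.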

Query Optimization: Complex JOIN Operation
  • Nested-loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
  • Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

    r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

  • One or more of the join techniques may be applicable to the joins on the individual conditions.
  • We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

    r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation
  • A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
  • If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
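Both ways of eliminating duplicates after a projection can be sketched briefly. The function names are hypothetical, tuples are addressed by position, and the sample data is the supplier relation S used earlier in these notes.

```python
def project_sort(rel, idx):
    """Project onto the positions in idx, then drop duplicates by sorting."""
    projected = sorted(tuple(t[i] for i in idx) for t in rel)
    out = []
    for t in projected:
        if not out or out[-1] != t:   # duplicates are adjacent after sorting
            out.append(t)
    return out


def project_hash(rel, idx):
    """Project onto the positions in idx, dropping duplicates with a hash set."""
    return {tuple(t[i] for i in idx) for t in rel}


S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

cities_sorted = project_sort(S, [3])   # City is not a key, so duplicates arise
cities_hashed = project_hash(S, [3])
with_key = project_sort(S, [0, 3])     # includes the key S#, so no duplicates
```

When the attribute list includes the key, as in the last call, every projected tuple is already distinct.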

Query Optimization: Set Operations
  • Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
  • The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
  • A hashing technique is another way to implement the union, intersection, and difference operations.

Questions
  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:

    Department (Dname, Dnumber, Mgr_ssn, …)
    Project    (Pname, Pnumber, Plocation, Dnum)
    Employee   (Fname, Lname, Ssn, Bdate, Address, Dno, …)

    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM   Project, Department, Employee
    WHERE  Dnum = Dnumber
    AND    Mgr_ssn = Ssn
    AND    Plocation = 'California'

The above query can be translated into

    Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn}(Project × (Department × Employee)))

Query Optimization: An Example

    Π_{Pnumber, Dnum, Lname, Address, Bdate}
      σ_{Plocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn}
        ×
          Project
          ×
            Department
            Employee

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume that the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5,000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
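The size of this intermediate relation follows directly from the figures in the example. The sketch below just multiplies the cardinalities and adds the tuple sizes; the dictionary and variable names are chosen for illustration.

```python
from math import prod

# Figures from the example.
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinality = {"Project": 100, "Department": 20, "Employee": 5_000}

product_tuples = prod(cardinality.values())       # tuples in the Cartesian product
product_tuple_bytes = sum(tuple_bytes.values())   # bytes per product tuple
product_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, product_bytes)
```

At 300 bytes per tuple, the 10 million product tuples come to 3,000,000,000 bytes before any selection is applied.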

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

    Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{Mgr_ssn = Ssn} Employee)

    Π_{Pnumber, Dnum, Lname, Address, Bdate}
      ⋈_{Mgr_ssn = Ssn}
        ⋈_{Dnum = Dnumber}
          σ_{Plocation = 'California'}
            Project
          Department
        Employee

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

S

Sname

Status

SCity

S

P

QTY

S1

Smith

20

London

S1

P1

300

S1

Smith

20

London

S1

P2

200

S1

Smith

20

London

S1

P3

400

S1

Smith

20

London

S1

P4

200

S1

Smith

20

London

S1

P5

100

S1

Smith

20

London

S1

P6

100

S2

Jones

10

Paris

S2

P1

300

S2

Jones

10

Paris

S2

P2

400

(

(

S

P

QTY

S1

P1

300

S1

P2

200

S1

P3

400

S1

P4

200

S1

P5

100

S1

P6

100

(

(

(

S

Sname

Status

City

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

There are two choices in carrying the firstphases (ie parsing validation translation andoptimization) of query processing

One option is to dynamically carry out thedecomposition and optimization every time thequery is run

Alternative is static query optimization wherethe query is parsed validated and optimizedonce

16

Database Systems

13

Query OptimizationIn general optimization is required in such a

system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra

that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level

Database Systems

14

Query Optimization
The overall goal of optimization is to choose an efficient strategy for evaluating a given relational expression (i.e., a query).
An optimizer might actually do better than a human programmer, since:
  An optimizer has a wealth of information available to it that human programmers typically do not have.
  If the database statistics change drastically, an optimizer may choose a different strategy.
  An optimizer can potentially consider several strategies for a given request.
  The optimizer is written by an expert.

[Figure: query processing pipeline. Query → Parser & Translator → Internal Representation → Optimizer (using statistics about data) → Execution Plan → Execution Engine (accessing the database) → Query Output]

Running Example

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_Location (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)

Query Optimization: Running Example
Find the last names of employees born after 1957 who work on a project named "Aquarius".

  SELECT Lname
  FROM EMPLOYEE, WORKS_ON, PROJECT
  WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Initial query tree, written as an expression:
Π_{Lname}(σ_{Pname = 'Aquarius' and Pnumber = Pno and Essn = Ssn and Bdate > '1957-12-31'}((EMPLOYEE × WORKS_ON) × PROJECT))

Query Optimization: An Example

Executing the previous query tree generates a very large intermediate relation, because Cartesian products are performed on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query tree with the Select operations moved down, written as an expression:
Π_{Lname}(σ_{Pnumber = Pno}(σ_{Essn = Ssn}(σ_{Bdate > '1957-12-31'}(EMPLOYEE) × WORKS_ON) × σ_{Pname = 'Aquarius'}(PROJECT)))

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from PROJECT is involved in the query. So it makes sense to switch the order of operations on the input relations.

Query tree with the more restrictive Select applied first, written as an expression:
Π_{Lname}(σ_{Essn = Ssn}(σ_{Pnumber = Pno}(σ_{Pname = 'Aquarius'}(PROJECT) × WORKS_ON) × σ_{Bdate > '1957-12-31'}(EMPLOYEE)))

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree with Joins, written as an expression:
Π_{Lname}((σ_{Pname = 'Aquarius'}(PROJECT) ⋈_{Pnumber = Pno} WORKS_ON) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'}(EMPLOYEE))

Query Optimization: Running Example

It also makes sense to reduce the size of the intermediate results by keeping only the attributes that are needed for correct execution of the query.

Query tree with Project operations moved down, written as an expression:
Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname = 'Aquarius'}(PROJECT)) ⋈_{Pnumber = Pno} Π_{Essn, Pno}(WORKS_ON)) ⋈_{Essn = Ssn} Π_{Ssn, Lname}(σ_{Bdate > '1957-12-31'}(EMPLOYEE)))


[Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (accesses the Database) → Result]

Query Optimization: A Simple Example

Relation S:
  S#   Sname   Status   City
  S1   Smith   20       London
  S2   Jones   10       Paris
  S3   Blake   30       Paris
  S4   Clark   20       London
  S5   Adams   30       Athens

Relation SP:
  S#   P#   QTY
  S1   P1   300
  S1   P2   200
  S1   P3   400
  S1   P4   200
  S1   P5   100
  S1   P6   100
  ...

Get the names of suppliers who supply part P2.

  SELECT DISTINCT S.Sname
  FROM S, SP
  WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query tree, written as an expression:
Π_{Sname}(σ_{S.S# = SP.S# and SP.P# = 'P2'}(S × SP))

Query Optimization: A Simple Example

Tuples of S × SP that satisfy S.S# = SP.S#:
  S.S#  Sname  Status  City    SP.S#  P#  QTY
  S1    Smith  20      London  S1     P1  300
  S1    Smith  20      London  S1     P2  200
  S1    Smith  20      London  S1     P3  400
  S1    Smith  20      London  S1     P4  200
  S1    Smith  20      London  S1     P5  100
  S1    Smith  20      London  S1     P6  100
  S2    Jones  10      Paris   S2     P1  300
  S2    Jones  10      Paris   S2     P2  400
  ...

Query Optimization: A Simple Example
Without an optimizer, the system will:
  Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
  Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
  Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
  Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
  Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
  Projects the result of the last operation over Sname.

Query tree, written as an expression:
Π_{Sname}(S ⋈_{S.S# = SP.S#} σ_{SP.P# = 'P2'}(SP))

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
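The comparison can be tried out on synthetic data. The following is a minimal sketch, not the course's own code: the supplier names and part numbers are made up, and only the sizes (100 tuples in S, 10,000 in SP, 50 of them for P2) come from the example. It runs both strategies and compares the size of the intermediate relation each one builds.

```python
# Hypothetical data matching the example's assumptions:
# |S| = 100, |SP| = 10,000, and exactly 50 SP tuples for part P2.
S = [(f"S{i}", f"Name{i}") for i in range(1, 101)]        # (S#, Sname)
SP = [(f"S{i}", "P2") for i in range(1, 51)]               # 50 tuples for P2
SP += [(f"S{i % 100 + 1}", f"P{3 + i // 100}") for i in range(9950)]

def naive_plan():
    """Product first, then restriction, then projection."""
    product_size = 0
    names = set()
    for s in S:
        for sp in SP:
            product_size += 1                  # one tuple of S x SP
            if s[0] == sp[0] and sp[1] == "P2":
                names.add(s[1])
    return names, product_size

def optimized_plan():
    """Restriction first, then join, then projection."""
    p2 = [sp for sp in SP if sp[1] == "P2"]    # scans SP once, keeps 50 tuples
    by_id = {s[0]: s for s in S}               # scans S once
    joined = [(by_id[sp[0]], sp) for sp in p2 if sp[0] in by_id]
    return {s[1] for s, _ in joined}, len(p2)

naive_names, naive_intermediate = naive_plan()
opt_names, opt_intermediate = optimized_plan()
print(naive_intermediate, opt_intermediate)
```

Both plans return the same set of names; only the amount of intermediate data differs.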

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Query tree, from the leaves to the root: S, SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result

Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: It is a good idea to perform the restriction before the join, because:
  It reduces the size of the input to the join operation.
  It reduces the size of the output from the join.

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
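The conversion of p OR (q AND r) into conjunctive normal form can be checked exhaustively. This is a small illustrative sketch; the function names are made up for the example.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    # Conjunctive normal form: a conjunction of disjunctions.
    return (p or q) and (p or r)

# Compare the two conditions on all 8 truth assignments.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)
```

Because the CNF form is a plain conjunction, an evaluator can stop as soon as one conjunct is false.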

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be combined into a single projection.

General rule: A restriction applied to a projection can be converted into a projection applied to a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.
(SP Join P) [S#]
However, we know that P# is a foreign key in SP, so every SP tuple matches some tuple of P. Therefore the above query is semantically equivalent to
SP [S#]

Optimization Process
An equivalence rule says that expressions of two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational costs of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Rule 1: A conjunctive selection can be deconstructed into a sequence of individual selections (cascade of selections).
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)
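Rules 1 to 3 can be checked on a small relation. The sketch below assumes a hypothetical representation (a relation as a list of dicts) and made-up employee data; `select` and `project` are illustrative helpers, not part of any library.

```python
# A tiny relation E with invented tuples, used only to test the rules.
E = [
    {"Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection with duplicate elimination, preserving first occurrences."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

theta1 = lambda t: t["Dno"] == 5
theta2 = lambda t: t["Salary"] > 35000

# Rule 1: cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: selections commute.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["Lname"], project(["Lname", "Dno"], E)) == project(["Lname"], E)
print(rule1, rule2, rule3)
```

A check on one relation is of course not a proof; it only illustrates what each rule asserts.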

Rule 4: A selection combined with a Cartesian product is equivalent to a theta join.
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is
s = 1 / |r(R)|

For an attribute with i distinct values, the selectivity is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
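These estimates are simple arithmetic. The sketch below assumes that attribute values are uniformly distributed, and the numbers (10,000 tuples, 200 distinct values) are hypothetical; the function names are invented for illustration.

```python
def selectivity(matching_tuples, cardinality):
    """Ratio of qualifying tuples to the cardinality |r(R)|."""
    return matching_tuples / cardinality

def estimate_equality_matches(cardinality, distinct_values):
    """Expected matches for attribute = constant, assuming a uniform
    distribution over the attribute's distinct values."""
    return cardinality / distinct_values

card = 10_000                                     # |r(R)|, hypothetical
est = estimate_equality_matches(card, 200)        # attribute with i = 200 values
key_sel = selectivity(1, card)                    # equality on a key attribute
print(est, selectivity(est, card), key_sel)
```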

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case).
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: Set union and set intersection are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Rule 11: Set union and set intersection are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
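Rule 12, including the one-sided forms, can be tried out with Python sets standing in for relations. The relations and the predicate below are arbitrary examples chosen for illustration.

```python
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0          # an arbitrary selection predicate

def sigma(pred, rel):
    """Selection over a set-valued relation."""
    return {t for t in rel if pred(t)}

union_ok = sigma(p, E1 | E2) == sigma(p, E1) | sigma(p, E2)
inter_ok = sigma(p, E1 & E2) == sigma(p, E1) & sigma(p, E2)
diff_ok = sigma(p, E1 - E2) == sigma(p, E1) - sigma(p, E2)

# One-sided forms: valid for intersection and difference, not for union.
inter_one_sided = sigma(p, E1 & E2) == sigma(p, E1) & E2
diff_one_sided = sigma(p, E1 - E2) == sigma(p, E1) - E2
union_one_sided = sigma(p, E1 | E2) == sigma(p, E1) | E2
print(union_ok, inter_ok, diff_ok, inter_one_sided, diff_one_sided, union_one_sided)
```

The one-sided union form fails here because E2 contributes tuples (5 and 7) that do not satisfy p.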

Rule 13: The projection operation distributes over set union.
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Note: the analogous identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L. For example, if E1 = {(1, a)} and E2 = {(1, b)}, projecting on the first attribute gives Π(E1 ∩ E2) = {} but Π(E1) ∩ Π(E2) = {(1)}.

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
  the existence of indexes or other access paths (to reduce I/O cost), and
  the physical clustering of records (to reduce I/O cost).

So, in short, after scanning and parsing:
  The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
  An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves constructing a set of candidate query plans and choosing "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, and so on.

Optimization Process
There are two main techniques for query optimization:
  Heuristic rules
  Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
  Perform selection operations as early as possible.
  Perform projections early.
  It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined, with the output of one passed directly to the next.

Assume we want to perform
Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, it is passed directly to the project operation for processing.
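The two evaluation styles can be contrasted with Python generators, which produce one tuple at a time. This is only an illustrative sketch: the join attribute `k` and the sample tuples are assumptions, and the projection here does not eliminate duplicates.

```python
def join(r, s):
    """Nested-loop join on the hypothetical common attribute 'k',
    yielding one joined tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(rel, attrs):
    """Projection that processes tuples as they arrive."""
    for t in rel:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the whole join result is stored before projecting.
temp = list(join(r, s))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection,
# so no intermediate relation is stored.
pipelined = list(project(join(r, s), ["a1", "a2"]))
print(materialized == pipelined)
```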

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
  SELECT S.Sname
  FROM R, S
  WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid = Sid} S))

Pushing the selections below the join gives the equivalent expression
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for this query, with the operators that are evaluated in pipelined fashion annotated "On the fly".]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
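The difference between the two search methods can be sketched on an in-memory list standing in for a file. The records below are hypothetical, and `bisect_left` from Python's standard library plays the role of the binary search over the ordered key.

```python
from bisect import bisect_left

# Hypothetical file of (Ssn, name) records, ordered on the key attribute Ssn.
records = [(ssn, f"emp{ssn}") for ssn in range(100, 200, 5)]
keys = [rec[0] for rec in records]

def linear_search(pred):
    """Scan every record and test the selection condition."""
    return [rec for rec in records if pred(rec)]

def binary_search(key):
    """Equality search on the ordering key; examines O(log n) records."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

print(binary_search(150), binary_search(151))
print(linear_search(lambda rec: rec[0] == 150))
```

Linear search works for any condition, while the binary search applies only because the file is ordered on the attribute being compared.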

Using a primary index or hash key to retrieve a single record: if the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent (or preceding) records in the file (note: in this case the data is also sorted).
σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection has the form
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection has the form
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.

Query Optimization: JOIN Operation

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].
R ⋈_{A = B} S

Suppose we want to perform
r ⋈_{r.A θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested-loop join operation:

  for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
      if tr[A] θ ts[B] is true then add tr || ts to the result
    end
  end
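The nested-loop algorithm translates almost line for line into Python. In this sketch relations are lists of tuples, the join attributes are given as column positions, and the comparison θ is passed in as a function; the sample data is made up.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b] by testing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts (concatenation)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

equi = nested_loop_join(r, s, 0, 0)                   # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)      # r.A < s.B
print(equi)
print(len(less))
```

Because every pair is tested, the same code works for any comparison θ, not just equality.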

Query Optimization: JOIN Operation (nested loop)

The nested-loop algorithm examines nr × ns pairs of tuples.
In the best-case scenario both relations fit into main memory (the physical space), and hence we need bs + br block accesses.
If only one of the relations fits in main memory, then the cost is still bs + br block accesses, provided the relation that fits is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

  for each block Br of r do begin
    for each block Bs of s do begin
      for each tuple tr ∈ Br do begin
        for each tuple ts ∈ Bs do begin
          if tr[A] θ ts[B] is true then add tr || ts to the result
        end
      end
    end
  end

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
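The block access count can be confirmed by simulating the block nested loop. The sketch below is illustrative only: blocks are modelled as fixed-size slices of a list, each slice visited counts as one block access, and the relations and block size are made up.

```python
def blocks(rel, size):
    """Split a relation into blocks of at most `size` tuples."""
    return [rel[i:i + size] for i in range(0, len(rel), size)]

def block_nested_loop_join(r, s, block_size):
    """Equi-join on the first column, counting simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    accesses, result = 0, []
    for r_block in r_blocks:
        accesses += 1                     # read one block of r
        for s_block in s_blocks:
            accesses += 1                 # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[0] == ts[0]:
                        result.append(tr + ts)
    return result, accesses

r = [(i, f"r{i}") for i in range(20)]         # 20 tuples: br = 4 blocks
s = [(i % 10, f"s{i}") for i in range(30)]    # 30 tuples: bs = 6 blocks
result, accesses = block_nested_loop_join(r, s, block_size=5)
print(accesses, len(result))
```

With br = 4 and bs = 6 the simulation performs 4 × 6 + 4 = 28 block accesses, matching the formula.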

How can we improve the block nested loop?

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
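A sketch of this index-based join follows. A Python dictionary stands in for the access structure on s.B; in a real system the index would already exist on disk, whereas here it is built inside the function purely for illustration. The sample relations are made up.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r[a] = s[b], probing an index on s instead of scanning s."""
    index = defaultdict(list)        # stand-in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    # One index probe per tuple of r.
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
joined = index_join(r, s, 0, 0)
print(joined)
```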

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
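The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute. The group of s tuples sharing the current join value is located once and reused, which corresponds to the assumption that such a group fits in memory. The sample data is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```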

Query Optimization: JOIN Operation

Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, which produce the result tuples of the join operation.
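A minimal in-memory sketch of the hash join is given below. The number of buckets and the use of Python's built-in `hash` are arbitrary choices made for the example, and the sample relations are made up.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r[a] = s[b]: hash both relations into the same buckets,
    then match tuples bucket by bucket."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:
        # Only tuples in the same bucket can have equal join values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
hashed = hash_join(r, s, 0, 0)
print(sorted(hashed))
```

The equality test inside each bucket is still needed, since different join values can hash to the same bucket.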

Query Optimization: Complex JOIN Operation

Nested-loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
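Both duplicate-elimination strategies can be sketched in a few lines. The relation below is a made-up fragment with (Lname, Dno) tuples, and a Python set stands in for the hash table.

```python
def project_sort(rel, cols):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[c] for c in cols) for t in rel)
    return [row for k, row in enumerate(rows) if k == 0 or row != rows[k - 1]]

def project_hash(rel, cols):
    """Project and drop duplicates using a hash-based set of seen rows."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[c] for c in cols)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

employees = [("Smith", 5), ("Wong", 5), ("Zelaya", 4)]   # (Lname, Dno)
by_sort = project_sort(employees, [1])    # project on Dno, which is not a key
by_hash = project_hash(employees, [1])
print(by_sort, by_hash)
```

The two methods return the same set of rows; the sort-based version returns them in sorted order, the hash-based one in order of first appearance.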

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

  SELECT Pnumber, Dnum, Lname, Bdate, Address
  FROM Project, Department, Employee
  WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' and Dnum = Dnumber and Mgr_ssn = Ssn}(Project × (Department × Employee)))

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
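The size of this intermediate relation follows directly from the stated figures. The short calculation below only restates that arithmetic; the total in bytes is derived here and is not given in the original.

```python
from math import prod

cardinality = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and concatenates tuples.
product_tuples = prod(cardinality.values())
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes
print(product_tuples, product_tuple_bytes, total_bytes)
```

So the unoptimized plan would build an intermediate result of roughly 3 GB before any selection is applied.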

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{Mgr_ssn = Ssn} Employee)

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

S

Sname

Status

SCity

S

P

QTY

S1

Smith

20

London

S1

P1

300

S1

Smith

20

London

S1

P2

200

S1

Smith

20

London

S1

P3

400

S1

Smith

20

London

S1

P4

200

S1

Smith

20

London

S1

P5

100

S1

Smith

20

London

S1

P6

100

S2

Jones

10

Paris

S2

P1

300

S2

Jones

10

Paris

S2

P2

400

(

(

S

P

QTY

S1

P1

300

S1

P2

200

S1

P3

400

S1

P4

200

S1

P5

100

S1

P6

100

(

(

(

S

Sname

Status

City

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

13

Query OptimizationIn general optimization is required in such a

system if the system is expected to achieveacceptable objectives (eg performance)It is one of the strength of relational algebra

that optimization can be done automaticallysince relational expression are at a sufficientlyhigh semantic level

Database Systems

14

Query OptimizationThe overall goal of an optimization is to choose

an efficient strategy for evaluation of a givenrelational expression (ie a query)An optimizer might actually do better than a

human programmer since

Database Systems

15

Query OptimizationAn optimizer will have a wealth of information

available to it that human programmers typicallydo not haveIf the data base statistics changes drastically

then an optimizer may choose a differentstrategyOptimizer can potentially considers several

strategies for a given requestOptimizer is written by an expert

Database SystemsQuery Parser amp

TranslatorInternal

Representation

ExecutionPlan

QueryOutput

Optimizer

Statisticsabout data

ExecutionEngine

DATA BASE

Running Example

16

Database Systems

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

EMPLOYEE

DEPARTMENT

Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation

DEPT_Location

Pname Pnumber Plocation Dnum

PROJECTEssn Pno Hours

WORKS_ON

DEPENDENTEssn Dependent_name Sex Bdate Relationship

Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working

on a project named ldquoAquariusrdquo

SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT

WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo

Database Systems

17

Query Optimization mdash Running Example

Database Systems

18

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

Query Optimization mdash An Example

Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations

on base relations before performing the Cartesianproducts

Database Systems

19

Query Optimization mdash Running Example

Database Systems

20

ΠLname

Works_on

times

Employee

σBdate gt lsquo1957-12-31rsquo

timesσEssn = Ssn

Project

σPname = lsquoAquariusrsquo

σPnumber = Pnno

Query Optimization mdash Running Example

By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations

Database Systems

21

Query Optimization: Running Example

Π_Lname (σ_{Essn = Ssn} (σ_{Pnumber = Pno} (σ_{Pname = 'Aquarius'} (Project) × Works_on) × σ_{Bdate > '1957-12-31'} (Employee)))

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation

Query Optimization: Running Example

Π_Lname ((σ_{Pname = 'Aquarius'} (Project) ⋈_{Pnumber = Pno} Works_on) ⋈_{Essn = Ssn} σ_{Bdate > '1957-12-31'} (Employee))

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query

Query Optimization: Running Example

Π_Lname (Π_Essn (Π_Pnumber (σ_{Pname = 'Aquarius'} (Project)) ⋈_{Pnumber = Pno} Π_{Essn, Pno} (Works_on)) ⋈_{Essn = Ssn} Π_{Ssn, Lname} (σ_{Bdate > '1957-12-31'} (Employee)))

[Figure: the initial query tree (Cartesian products followed by a single selection) and the final optimized query tree (selections and projections pushed down, products replaced by joins) shown side by side]

[Figure: phases of query processing. Query → Query Decomposition (using the System Catalog) → Relational Algebra Expression → Query Optimization (using Database Statistics) → Execution Plan → Code Generation → Runtime Execution (against the Database) → Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example

Get names of suppliers who supply part P2

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000 respectively. Furthermore, assume 50 tuples in SP are for part P2

Query Optimization: A Simple Example

Π_Sname (σ_{S.S# = SP.S# AND SP.P# = 'P2'} (S × SP))

Query Optimization: A Simple Example

Tuples of S × SP that satisfy S.S# = SP.S#

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

Query Optimization: A Simple Example

Without an optimizer the system will:
Generate the Cartesian product of S and SP. This will generate a relation of 1,000,000 tuples, too large to be kept in main memory
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected
Project the result of the previous step over Sname to produce the final result

Query Optimization: A Simple Example

An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This will involve reading 10,000 tuples but produces a relation with only 50 tuples
Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples
Projects the result of the last operation over Sname

Query Optimization: A Simple Example

Π_Sname (σ_{SP.P# = 'P2'} (SP) ⋈_{S.S# = SP.S#} S)

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first approach. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic
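The two tuple-I/O figures can be reproduced with a few lines of arithmetic. The sketch below uses the cardinalities stated in the example; how the "about 3,000,000" total splits into reads and writes is an assumption made here for illustration, not something the slides spell out.

```python
# Tuple-I/O estimate for the two strategies of the suppliers example.
# Cardinalities come from the example; the per-step breakdown is an assumption.
S_CARD = 100        # tuples in S
SP_CARD = 10_000    # tuples in SP

def product_then_restrict():
    product = S_CARD * SP_CARD      # 1,000,000 tuples in S x SP
    build_reads = product           # assumed: one tuple read per product tuple produced
    spill_writes = product          # the product does not fit in memory, so it is written out
    restrict_reads = product        # the restriction reads the product back
    return build_reads + spill_writes + restrict_reads

def restrict_then_join():
    restrict_reads = SP_CARD        # scan SP once, keeping the 50 tuples for P2 in memory
    join_reads = S_CARD             # scan S once against those 50 tuples
    return restrict_reads + join_reads

print(product_then_restrict())  # 3000000
print(restrict_then_join())     # 10100
```

Under these assumptions the second strategy touches roughly 300 times fewer tuples.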

Optimization Process

Cast the query into some internal representation: convert the query to some internal representation that is more suitable for machine manipulation, e.g. relational algebra

Now we can build a query tree very easily

Π_Sname (σ_{P# = 'P2'} (S ⋈_{S.S# = SP.S#} SP))

Optimization Process

[Figure: query tree, read bottom-up]
S, SP
Join (S.S# = SP.S#)
Restrict (SP.P# = 'P2')
Project (Sname)
Result

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

Optimization Process

General rule: It is a good idea to perform the restriction before the join, because
It reduces the size of the input to the join operation
It reduces the size of the output from the join

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
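The equivalence of the two WHERE conditions can be confirmed by brute force over all truth assignments. This is only a small check of the distributive law; the function names are illustrative.

```python
from itertools import product

def original(p, q, r):
    # WHERE p OR (q AND r)
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    # WHERE (p OR q) AND (p OR r)
    return (p or q) and (p or r)

# Compare the two conditions on all 8 combinations of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)  # True
```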

Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because
A condition that is in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be combined into a single projection

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction, so that the restriction is performed first

Optimization Process

Finally, consider the following query:
Get the supplier numbers of suppliers who supply at least one part

(SP Join P) [S#]

However, we know that P# is a foreign key in SP. Therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections)

σ_{θ1 ∧ θ2} (E) = σ_θ1 (σ_θ2 (E))

Optimization Process

Rule 2: Selection operation is commutative

σ_θ1 (σ_θ2 (E)) = σ_θ2 (σ_θ1 (E))

Optimization Process

Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections)

Π_L1 (Π_L2 (… (Π_Ln (E)) …)) = Π_L1 (E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation

σ_θ (E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_θ1 (E1 ⋈_θ2 E2) = E1 ⋈_{θ1 ∧ θ2} E2
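Rule 4 can be checked on a tiny instance. In the sketch below relations are modelled as lists of dicts, and the attribute names (s_no, sp_s_no, p_no) are illustrative stand-ins for the supplier and shipment attributes, chosen so that the two relations have no attribute name in common.

```python
def cartesian_product(e1, e2):
    """E1 x E2: pair every tuple of e1 with every tuple of e2."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2]

def select(theta, e):
    """sigma_theta(E): keep the tuples that satisfy theta."""
    return [t for t in e if theta(t)]

def theta_join(theta, e1, e2):
    """E1 join_theta E2: combine only the pairs that satisfy theta."""
    result = []
    for t1 in e1:
        for t2 in e2:
            combined = {**t1, **t2}
            if theta(combined):
                result.append(combined)
    return result

S = [{"s_no": "S1", "sname": "Smith"}, {"s_no": "S2", "sname": "Jones"}]
SP = [{"sp_s_no": "S1", "p_no": "P1"}, {"sp_s_no": "S1", "p_no": "P2"},
      {"sp_s_no": "S2", "p_no": "P2"}]

def same_supplier(t):
    return t["s_no"] == t["sp_s_no"]

via_product = select(same_supplier, cartesian_product(S, SP))
via_join = theta_join(same_supplier, S, SP)
print(via_product == via_join)                        # True
print(len(cartesian_product(S, SP)), len(via_join))   # 6 3
```

Both forms return the same tuples, but the join never stores the six-tuple product as an intermediate result.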

Optimization Process

Rule 5: Theta join operation is commutative

E1 ⋈_θ E2 = E2 ⋈_θ E1

Optimization Process

Rule 6: Natural join is associative

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Optimization Process

Rule 7: Theta join is associative in the following manner

(E1 ⋈_θ1 E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_θ2 E3)

where θ2 involves attributes from only E2 and E3

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses

In practice, the selectivities of all conditions are not available, so we use estimated selectivities as part of the statistical data to aid query optimization

For a selection on a key attribute with a search on equality, the selectivity is

s = 1 / |r(R)|

Selectivity on an attribute with i distinct values, assuming the values are uniformly distributed, is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) |r(R)|
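The two selectivity estimates can be written as small helper functions. This is a sketch under the uniform-distribution assumption; the function names and the figure of 200 distinct part numbers are hypothetical, chosen so that the estimate matches the 50 matching tuples of the earlier suppliers example.

```python
from fractions import Fraction

def selectivity_on_key(cardinality):
    """Equality search on a key attribute: at most one tuple matches."""
    return Fraction(1, cardinality)

def selectivity_on_attribute(distinct_values):
    """Equality search on an attribute with i distinct, uniformly spread values."""
    return Fraction(1, distinct_values)

def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result: selectivity * |r(R)|."""
    return selectivity * cardinality

sp_cardinality = 10_000   # |r(R)| for SP, from the earlier example
distinct_parts = 200      # hypothetical number of distinct P# values

print(estimated_matches(sp_cardinality, selectivity_on_key(sp_cardinality)))        # 1
print(estimated_matches(sp_cardinality, selectivity_on_attribute(distinct_parts)))  # 50
```

Exact fractions are used instead of floats so the estimates come out as whole numbers without rounding noise.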

Optimization Process

Rule 8: Selection operation distributes over the theta join under the following condition
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case)

σ_θ0 (E1 ⋈_θ E2) = (σ_θ0 (E1)) ⋈_θ E2

[Figure: query trees for Rule 8. On the left, σ_θ0 is applied above the join of E1 and E2; on the right, σ_θ0 is pushed down onto E1 before the join]

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition
Let L1 and L2 be attributes of E1 and E2 respectively. The join condition θ involves only attributes in L1 ∪ L2

Π_{L1 ∪ L2} (E1 ⋈_θ E2) = (Π_L1 (E1)) ⋈_θ (Π_L2 (E2))

Optimization Process

Rule 10: Set union and set intersection operations are commutative

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative

Optimization Process

Rule 11: Set union and set intersection operations are associative

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: Selection operation distributes over the set union, set intersection and set difference operations

σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2

σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2

σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2

Optimization Process

Rule 13: Projection operation distributes over the set union operation

Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)

Note: the corresponding identities for set intersection and set difference do not hold in general. For example, with E1 = {(1, a)}, E2 = {(1, b)} and L the first attribute, Π_L (E1 ∩ E2) is empty while Π_L (E1) ∩ Π_L (E2) = {(1)}
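A two-tuple example is enough to see how projection interacts with the set operations. The sketch below models relations as Python sets of tuples and projects by attribute position; it suggests that the identity holds for union but, in general, not for intersection or difference. The relations E1 and E2 are made up for illustration.

```python
def project(relation, positions):
    """Pi_L(E): keep only the attributes at the given positions (duplicates collapse)."""
    return {tuple(t[i] for i in positions) for t in relation}

E1 = {(1, "a")}
E2 = {(1, "b")}
L = (0,)  # project on the first attribute only

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_holds = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_holds = project(E1 - E2, L) == project(E1, L) - project(E2, L)

print(union_holds)         # True
print(intersection_holds)  # False: set() versus {(1,)}
print(difference_holds)    # False: {(1,)} versus set()
```

The failures arise because two different tuples can agree on the projected attributes.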

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage issues such as
existence of indexes or other access paths (to reduce I/O cost), and
physical clustering of records (to reduce I/O cost), …
come into play

Optimization Process

So, in short, after scanning and parsing:
the query will be translated into an equivalent internal representation, in the form of a query tree or query graph
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans"
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules

Optimization Process: heuristic rules

Perform selection operations as early as possible
Perform projections early
It is usually better to perform selections earlier than projections

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution

Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate result (relation)
Pipeline evaluation: several operations are combined, with the tuples produced by one operation passed directly to the next

Assume we want to perform

Π_{a1, a2} (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing
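The difference between the two evaluation styles can be sketched with Python generators. The join attribute k and the sample tuples are assumptions made for illustration; the slides do not name the common attribute of r and s.

```python
def join(r, s):
    """Natural join on the (assumed) common attribute k, yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attributes):
    """Projection, consuming tuples one at a time as they arrive."""
    for t in tuples:
        yield {a: t[a] for a in attributes}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized_result = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined_result = list(project(join(r, s), ["a1", "a2"]))

print(pipelined_result == materialized_result)  # True
print(len(intermediate))                        # 3
```

Both styles give the same answer; the pipelined version never holds the three-tuple intermediate relation in a list.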

Assume the following relations
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_Sname (σ_{bid = 100 AND rating > 5} (R ⋈_{Sid = Sid} S))

[Figure: query tree with Π_Sname at the root, σ_{bid = 100 AND rating > 5} below it, and the join of R and S on Sid = Sid at the bottom]

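Π_Sname ((σ_{bid = 100} (R)) ⋈_{Sid = Sid} (σ_{rating > 5} (S)))

[Figure: query tree with Π_Sname at the root, the join on Sid = Sid below it, and the selections σ_{bid = 100} on R and σ_{rating > 5} on S at the bottom]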

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e. the result of one operation is fed to another operation
In this case, articulate the way the previous query is going to be executed

[Figure: the two query trees above annotated for pipelined execution; operators marked "On the fly" are applied to tuples as the operator below produces them]

Cost of Plan

The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration

Optimization Process: Search methods for Selection

General Philosophy: Make an effort to reduce the search space

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case, the data is not organized and no metadata is available)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered
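The two search methods can be sketched over an in-memory list standing in for the file. The record layout and the ssn values are hypothetical; the binary search assumes the records are ordered on the search key, as the slide requires.

```python
def linear_search(records, attribute, value):
    """Test every record; works on unordered data and on any attribute."""
    return [rec for rec in records if rec[attribute] == value]

def binary_search(sorted_records, key_attribute, value):
    """Equality search on the key the file is ordered on; returns one record or None."""
    low, high = 0, len(sorted_records) - 1
    while low <= high:
        mid = (low + high) // 2
        current = sorted_records[mid][key_attribute]
        if current == value:
            return sorted_records[mid]
        if current < value:
            low = mid + 1
        else:
            high = mid - 1
    return None

# Hypothetical EMPLOYEE file, ordered on ssn.
employees = [
    {"ssn": 111, "lname": "Smith"},
    {"ssn": 222, "lname": "Wong"},
    {"ssn": 333, "lname": "Zelaya"},
    {"ssn": 444, "lname": "Wallace"},
]

print(linear_search(employees, "lname", "Wong"))  # [{'ssn': 222, 'lname': 'Wong'}]
print(binary_search(employees, "ssn", 333))       # {'ssn': 333, 'lname': 'Zelaya'}
print(binary_search(employees, "ssn", 555))       # None
```

The linear search inspects all n records, while the binary search inspects about log2(n) of them.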

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved)

σ_{SSN = 123456789} (EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records) (note: in this case the data is also sorted)

σ_{DNUMBER > 5} (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data)

σ_{DNO = 5} (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form

σ_{θ1 ∧ θ2 ∧ … ∧ θn} (r)

Disjunctive selection: a disjunctive selection is of the following form

σ_{θ1 ∨ θ2 ∨ … ∨ θn} (r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation
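The union-of-pointers idea can be sketched with in-memory indexes that map an attribute value to a set of record positions. The index structure, the helper names and the sample EMPLOYEE records are all assumptions made for illustration.

```python
def build_index(records, attribute):
    """Map each attribute value to the set of record positions (the 'pointers')."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[attribute], set()).add(position)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions is a list of (attribute, value) equality tests combined with OR."""
    if any(attribute not in indexes for attribute, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[attribute] == value for attribute, value in conditions)]
    pointers = set()
    for attribute, value in conditions:
        pointers |= indexes[attribute].get(value, set())   # union of record pointers
    return [records[position] for position in sorted(pointers)]

employees = [
    {"lname": "Smith", "dno": 5, "sex": "M"},
    {"lname": "Wallace", "dno": 4, "sex": "F"},
    {"lname": "English", "dno": 5, "sex": "F"},
    {"lname": "Borg", "dno": 1, "sex": "M"},
]
conditions = [("dno", 5), ("sex", "F")]

both_indexed = {"dno": build_index(employees, "dno"),
                "sex": build_index(employees, "sex")}
only_dno_indexed = {"dno": build_index(employees, "dno")}

via_pointers = disjunctive_select(employees, both_indexed, conditions)
via_scan = disjunctive_select(employees, only_dno_indexed, conditions)
print([e["lname"] for e in via_pointers])  # ['Smith', 'Wallace', 'English']
print(via_pointers == via_scan)            # True
```

The record for English satisfies both conditions but appears once, because the pointers are combined as a set.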

Query Optimization: JOIN Operation

R ⋈_{A = B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B]

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
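The nested loop algorithm translates almost line for line into Python. In this sketch relations are lists of dicts, the comparison Θ is passed in as a function (equality by default), and the sample relations are hypothetical.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a THETA s.b by comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition tr.A THETA ts.B
                result.append({**tr, **ts})   # tr || ts
    return result

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 3, "x": "c"}]
s = [{"B": 2, "y": 10}, {"B": 3, "y": 20}, {"B": 3, "y": 30}]

equi_join = nested_loop_join(r, s, "A", "B")
less_than_join = nested_loop_join(r, s, "A", "B", theta=operator.lt)

print(len(equi_join))       # 3
print(len(less_than_join))  # 5
```

Because every pair is examined, the same function handles any comparison operator, which is why the nested loop works regardless of the join condition.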

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines nr · ns pairs of tuples
In the best case scenario, both relations fit into the physical space (main memory) and hence we need bs + br block accesses
If just one of the relations fits in the physical space, then bs + br block accesses will still be the cost, provided that relation is used as the inner relation

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop in terms of the number of block accesses is br · bs + br

How can we improve block nested loop?
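The block-access counts of the join variants can be compared with a few formulas. The block nested loop and in-memory costs are the ones given on the slides; the worst-case cost of the tuple-at-a-time nested loop, nr · bs + br, is the standard textbook figure and is added here as an assumption. The relation sizes are hypothetical.

```python
def nested_loop_worst_case(n_r, b_r, b_s):
    """Tuple-at-a-time nested loop, worst case: s is rescanned for every tuple of r."""
    return n_r * b_s + b_r

def block_nested_loop(b_r, b_s):
    """Block nested loop: s is rescanned once per block of r."""
    return b_r * b_s + b_r

def both_fit_in_memory(b_r, b_s):
    """Best case: each relation is read exactly once."""
    return b_r + b_s

# Hypothetical sizes: r has 5,000 tuples in 100 blocks, s occupies 400 blocks.
n_r, b_r, b_s = 5_000, 100, 400

print(nested_loop_worst_case(n_r, b_r, b_s))  # 2000100
print(block_nested_loop(b_r, b_s))            # 40100
print(both_fit_in_memory(b_r, b_s))           # 500
```

With these figures, processing r a block at a time cuts the block accesses by a factor of about 50, the number of tuples per block of r.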

Query Optimization: JOIN Operation

r ⋈_{A = B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B]

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory
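The pointer movement of the merge step can be sketched as follows. Both inputs are assumed to be already sorted on their join attribute, and the group of s tuples sharing one join value is assumed to fit in memory, as stated above; the sample relations are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on r.a and s.b respectively."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                              # advance the pointer with the smaller value
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            group_end = j                       # find the s tuples sharing this value
            while group_end < len(s) and s[group_end][b] == value:
                group_end += 1
            while i < len(r) and r[i][a] == value:
                for ts in s[j:group_end]:       # pair each matching r tuple with the group
                    result.append({**r[i], **ts})
                i += 1
            j = group_end
    return result

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": 10}, {"B": 2, "y": 20}, {"B": 3, "y": 30}, {"B": 4, "y": 40}]

joined = merge_join(r, s, "A", "B")
print([(t["x"], t["y"]) for t in joined])
# [('b', 10), ('b', 20), ('c', 10), ('c', 20), ('d', 40)]
```

Neither pointer ever moves backwards past a group, which is what keeps the cost at one pass over each relation.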

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation
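A minimal in-memory sketch of the hash-join idea: both relations are hashed into the same set of buckets with the same hash function, and only tuples that land in the same bucket are compared. The number of buckets and the sample relations are assumptions for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join of r and s on r.a = s.b using a shared set of hash buckets."""
    buckets = [([], []) for _ in range(n_buckets)]   # (r tuples, s tuples) per bucket
    for tr in r:                                     # single pass through r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                     # single pass through s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                   # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                   # different values may share a bucket
                    result.append({**tr, **ts})
    return result

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 5, "x": "d"}]
s = [{"B": 2, "y": 10}, {"B": 5, "y": 20}, {"B": 5, "y": 30}, {"B": 7, "y": 40}]

joined = hash_join(r, s, "A", "B")
print(sorted((t["x"], t["y"]) for t in joined))
# [('b', 10), ('c', 10), ('d', 20), ('d', 30)]
```

The equality test inside each bucket is still needed because the values 1 and 5, for example, fall into the same bucket when four buckets are used.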

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections

Query Optimization: Complex JOIN Operation

Consider the following join operation

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions
We can perform the overall join by first computing one of the simpler joins, say r ⋈_θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions

Query Optimization: Complex JOIN Operation

Now consider the following join operation

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_θi s

Query Optimization: Project Operation

A project operation Π_<attribute-list> (R) is straightforward to implement if <attribute-list> includes a key of relation R
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result
Hashing is another way to implement the union, intersection and difference operations

Questions

Devise algorithms to perform variations of the outer join operation
Devise algorithms to perform aggregate operations

Query Optimization: An Example

Assume the following relations
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN} (Project × (Department × Employee)))

[Figure: query tree for the expression above, with the projection at the root, the combined selection below it, and the Cartesian products of Project, Department and Employee at the bottom]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department and Employee relations had tuple sizes of 100, 50 and 150 bytes and contained 100, 20 and 5,000 tuples respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes
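The size of the intermediate relation follows directly from the stated figures: the tuple counts multiply and the tuple widths add. The short calculation below uses only the numbers given above; the variable names are illustrative.

```python
# (number of tuples, bytes per tuple) for each relation, as stated in the example.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5_000, 150),
}

product_tuples = 1
tuple_width = 0
for tuples, width in relations.values():
    product_tuples *= tuples   # cardinalities multiply in a Cartesian product
    tuple_width += width       # each result tuple concatenates one tuple per relation

total_bytes = product_tuples * tuple_width

print(product_tuples)  # 10000000
print(tuple_width)     # 300
print(total_bytes)     # 3000000000
```

That is about 3 GB of intermediate data for input relations that together hold under 1 MB.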

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = 'California'} (Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

[Figure: query tree for the expression above, with the projection at the root, the join on MGRSSN = SSN with Employee below it, and the join on Dnum = Dnumber between σ_{Plocation = 'California'} (Project) and Department at the bottom]



then an optimizer may choose a differentstrategyOptimizer can potentially considers several

strategies for a given requestOptimizer is written by an expert

Database SystemsQuery Parser amp

TranslatorInternal

Representation

ExecutionPlan

QueryOutput

Optimizer

Statisticsabout data

ExecutionEngine

DATA BASE

Running Example

16

Database Systems

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

EMPLOYEE

DEPARTMENT

Dname Dnumber Mgr_ssn Mgr_start_date Dnumber Dlocation

DEPT_Location

Pname Pnumber Plocation Dnum

PROJECTEssn Pno Hours

WORKS_ON

DEPENDENTEssn Dependent_name Sex Bdate Relationship

Query Optimization mdash Running ExampleFind the last name of employees born after 1957 and working

on a project named ldquoAquariusrdquo

SELECT LnameFROM EMPLOYEE WORKS_ON PROJECT

WHERE Pname = lsquoAquariusrsquo AND Pnumber = Pno AND Essn= Ssn AND Bdate gt lsquo1957-12-31rsquo

Database Systems

17

Query Optimization mdash Running Example

Database Systems

18

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

Query Optimization mdash An Example

Execution of the previous query tree generates avery large relation because of performing Cartesianproducts on input relationsIt makes sense to perform some Select operations

on base relations before performing the Cartesianproducts

Database Systems

19

Query Optimization mdash Running Example

Database Systems

20

ΠLname

Works_on

times

Employee

σBdate gt lsquo1957-12-31rsquo

timesσEssn = Ssn

Project

σPname = lsquoAquariusrsquo

σPnumber = Pnno

Query Optimization mdash Running Example

By closer observation one should realize that justone tuple from the Project will be involved with thequery So it makes sense to switch the order ofoperations on input relations

Database Systems

21

Query Optimization mdash Running Example

Database Systems

22

ΠLname

times

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

σEssn = Ssn

Works_on

times

σPnumber = Pno

Query Optimization mdash Running Example

It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation

Database Systems

23

Query Optimization mdash Running Example

Database Systems

24

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquoWorks_on

Pnumber = Pno

Essn = Ssn

Query Optimization mdash Running Example

It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery

Database Systems

25

Query Optimization mdash Running Example

Database Systems

26

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

31

Database SystemsSystem CatalogQuery

Decomposition

Query Optimization Database Statics

Code Generation

Runtime Execution

Result

Database

Relational AlgebraExpression

Execution Plan

Query

28

Query Optimization mdash A Simple Example

S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens

SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull

SP

Database Systems

29

Query Optimization mdash A Simple ExampleGet names of suppliers who supply part P2

SELECT DISTINCT SnameFROM S SPWHERE SS = SPSAND SPP = lsquoP2rsquo

Suppose that the cardinality of S and SP are 100and 10000 respectively Furthermore assume50 tuples in SP are for part P2

Database Systems

Query Optimization mdash A Simple Example

Database Systems

30

S SP

times

σ(SS = SPS and SPP = lsquoP2rsquo)

ΠSname

31

Query Optimization: A Simple Example

Tuples of S × SP that satisfy S.S# = SP.S#:

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...  ...     ...      ...      ...  ...  ...

Query Optimization: A Simple Example

Without an optimizer, the system will:

Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.

Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.

Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand:

Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.

Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.

Projects the result of the last operation over Sname.

Query Optimization: A Simple Example

[Figure: optimized query tree]

Π_{Sname}(S ⋈_{S.S# = SP.S#} (σ_{SP.P# = 'P2'}(SP)))

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
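The two totals can be reproduced with a few lines of arithmetic. This is only a sketch: the counting rules (SP re-read once per S tuple while forming the product, the product written out and read back, the 50-tuple intermediate result kept in memory) are assumptions chosen to match the figures quoted on the slide.

```python
# Tuple-I/O accounting for the two strategies (counting rules are assumptions).
N_S = 100        # cardinality of S
N_SP = 10_000    # cardinality of SP

def naive_plan_ios(n_s, n_sp):
    """Product first, then restriction."""
    product_reads = n_s * n_sp    # assumed: SP is re-read once per S tuple
    product_writes = n_s * n_sp   # the product does not fit in memory, so it is written out
    restrict_reads = n_s * n_sp   # the restriction reads the product back
    return product_reads + product_writes + restrict_reads

def optimized_plan_ios(n_s, n_sp):
    """Restriction first, then join; the small intermediate result stays in memory."""
    return n_sp + n_s             # read SP once, then read S once

naive = naive_plan_ios(N_S, N_SP)
optimized = optimized_plan_ios(N_S, N_SP)
print(naive, optimized)
```

Under these assumptions the first plan costs 3,000,000 tuple I/Os and the second 10,100, which agrees with the "about 3,000,000" and "about 10,000" quoted above.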

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process

[Figure: query tree, read from the leaves to the root]

S, SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process

General rule: It is a good idea to perform the restriction before the join, because:

It reduces the size of the input to the join operation.

It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because:

A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" as soon as any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
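The conversion to conjunctive normal form can be checked exhaustively, since there are only eight truth assignments for p, q, and r. The sketch below does that and also illustrates why the form is convenient: evaluation can stop at the first false conjunct. The helper names (`original`, `cnf`, `evaluate_cnf`) and the sample row are hypothetical.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Compare the two forms on every truth assignment.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)

def evaluate_cnf(conjuncts, row):
    """Evaluate a list of conjuncts on a row; all() stops at the first false one."""
    return all(conjunct(row) for conjunct in conjuncts)

conjuncts = [
    lambda row: row["p"] or row["q"],
    lambda row: row["p"] or row["r"],
]
sample = {"p": False, "q": True, "r": False}
print(equivalent, evaluate_cnf(conjuncts, sample))
```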

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

Optimization Process

General rule: A restriction applied to a projection can be converted into a projection applied to a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one while satisfying the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

[Figure: the join tree with children E1, E2 is equivalent to the join tree with children E2, E1]

Optimization Process

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: the left-deep tree joining E1 and E2 first is equivalent to the tree joining E2 and E3 first]

Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

The selectivity of an equality search on an attribute with i distinct values (assuming the values are evenly distributed) is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
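The two selectivity estimates can be turned into a small size estimator. This is a sketch under the uniform-distribution assumption; the function names and the sample numbers (a relation of 10,000 tuples and an attribute with 50 distinct values) are hypothetical.

```python
def equality_selectivity(distinct_values):
    """Estimated selectivity of an equality search: 1 / i."""
    return 1 / distinct_values

def estimated_matches(cardinality, distinct_values):
    """Estimated number of qualifying tuples: |r(R)| / i."""
    return cardinality / distinct_values

cardinality = 10_000                                   # hypothetical |r(R)|
non_key = estimated_matches(cardinality, 50)           # attribute with 50 distinct values
key = estimated_matches(cardinality, cardinality)      # key attribute: every value is distinct
print(non_key, key)
```

For a key attribute the number of distinct values equals the cardinality, so the estimate collapses to a single tuple, matching s = 1 / |r(R)|.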

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:

When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

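Optimization Process

Rule 8:

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

[Figure: the tree with σ_{θ0} above the join of E1 and E2 is equivalent to the tree in which σ_{θ0} is applied to E1 before the join]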
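Rule 8 can be illustrated on two tiny relations. The sketch below uses hypothetical relations E1 and E2 (lists of dictionaries) and hypothetical helper functions; it checks that selecting after the join and selecting before the join give the same tuples, while the second form feeds fewer tuples into the join.

```python
# Hypothetical relations; theta0 refers only to attributes of E1.
E1 = [{"a": 1, "x": 10}, {"a": 2, "x": 20}, {"a": 3, "x": 30}]
E2 = [{"b": 1, "y": "p"}, {"b": 2, "y": "q"}, {"b": 2, "y": "r"}]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(rel1, rel2, theta):
    return [{**t1, **t2} for t1 in rel1 for t2 in rel2 if theta(t1, t2)]

def theta(t1, t2):
    return t1["a"] == t2["b"]

def theta0(t):
    return t["x"] >= 20

late = select(theta0, theta_join(E1, E2, theta))    # selection after the join
early = theta_join(select(theta0, E1), E2, theta)   # selection pushed below the join
print(late == early, len(early))
```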

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:

The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Optimization Process

Rule 13: The projection operation distributes over the set union operation.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

Note: the corresponding identities for set intersection and set difference do not hold in general. For example, with E1 = {(1, a)} and E2 = {(1, b)}, projecting on the first attribute gives Π_L(E1 ∩ E2) = ∅, whereas Π_L(E1) ∩ Π_L(E2) = {(1)}.

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:

The existence of indexes or other access paths (to reduce I/O cost).

The physical clustering of records (to reduce I/O cost).

Optimization Process

So, in short, after scanning and parsing:

The query is translated into an equivalent internal representation, in the form of a query tree or query graph.

An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, and so on.

Optimization Process

There are two main techniques for query optimization:

Heuristic rules

Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.

Perform projections early.

It is usually better to perform selections earlier than projections.

Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

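Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.

Pipelined evaluation: several operations are combined, so that tuples produced by one operation are passed directly to the next.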

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
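The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. The relations r and s, the join attribute k, and the function names are hypothetical; the join is a plain nested loop chosen only for brevity.

```python
def join(r, s):
    """Yield joined tuples one at a time (equality join on attribute 'k')."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Yield each incoming tuple restricted to the given attributes."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "u"}, {"k": 2, "a2": "v"}]

# Materialized: the full join result is stored before the projection starts.
materialized = list(join(r, s))
result_materialized = list(project(materialized, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection.
result_pipelined = list(project(join(r, s), ["a1", "a2"]))
print(result_pipelined)
```

Both variants return the same tuples; the pipelined one simply never builds the intermediate list.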

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

[Figure: query tree with the selection applied after the join]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

[Figure: query tree with the selections pushed below the join]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, annotated for pipelined execution. In the first tree two operators are marked "on the fly"; in the second tree one operator is marked "on the fly".]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
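The two search methods can be contrasted on an in-memory list standing in for a file. This is a sketch only: the EMPLOYEE records and function names are hypothetical, and a real system would probe disk blocks rather than build a list of keys.

```python
from bisect import bisect_left

def linear_search(records, attr, value):
    """Scan every record and keep those that satisfy attr = value."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_records, attr, value):
    """Equality search on a key attribute on which the records are ordered."""
    keys = [rec[attr] for rec in sorted_records]   # stand-in for probing an ordered file
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return sorted_records[i]
    return None

employees = [                      # hypothetical records, ordered on ssn
    {"ssn": 100, "lname": "Smith"},
    {"ssn": 200, "lname": "Wong"},
    {"ssn": 300, "lname": "Zelaya"},
]
print(linear_search(employees, "lname", "Wong"), binary_search(employees, "ssn", 300))
```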

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
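The union-of-pointers idea can be sketched with in-memory indexes that map an attribute value to a set of record identifiers. The records, attribute names, and the `build_index` helper are hypothetical; the condition evaluated is dno = 5 OR salary = 55000.

```python
from collections import defaultdict

records = {                        # record id -> record (hypothetical data)
    0: {"dno": 5, "salary": 30000},
    1: {"dno": 4, "salary": 40000},
    2: {"dno": 5, "salary": 40000},
    3: {"dno": 1, "salary": 55000},
}

def build_index(records, attr):
    """Map each attribute value to the set of record ids holding it."""
    index = defaultdict(set)
    for rid, rec in records.items():
        index[rec[attr]].add(rid)
    return index

index_dno = build_index(records, "dno")
index_salary = build_index(records, "salary")

# Scan each index for its own condition, then take the union of the pointers.
pointers = index_dno.get(5, set()) | index_salary.get(55000, set())
result = [records[rid] for rid in sorted(pointers)]
print(sorted(pointers))
```

Because the pointers are combined as a set, a record that satisfies both conditions is fetched only once.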

Query Optimization: JOIN Operation

R ⋈_{A = B} S

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
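The nested loop algorithm translates almost line for line into Python. In this sketch tuples are dictionaries, tr || ts is modelled by merging the two dictionaries, and Θ is any two-argument comparison; the relations and names are hypothetical. The function also counts comparisons, which come to nr × ns.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return the joined tuples and the number of tuple pairs compared."""
    result, comparisons = [], 0
    for tr in r:                       # outer loop over r
        for ts in s:                   # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # tr || ts
    return result, comparisons

r = [{"A": 1}, {"A": 2}]
s = [{"B": 1}, {"B": 2}, {"B": 3}]

equi, n_equi = nested_loop_join(r, s, "A", "B")               # Θ is =
less, n_less = nested_loop_join(r, s, "A", "B", operator.lt)  # Θ is <
print(equi, len(less), n_equi)
```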

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns tuple comparisons.

In the best-case scenario, both relations fit into the physical space, and hence we need bs + br block accesses.

If just one of the relations fits in the physical space, then the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
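The br × bs + br figure can be checked by counting block reads in a small simulation. This is a sketch: relations are plain lists, a "block" is a slice of `block_size` tuples, and both the helper names and the sample sizes are hypothetical.

```python
def blocks(rel, block_size):
    """Split a relation into consecutive blocks of at most block_size tuples."""
    return [rel[i:i + block_size] for i in range(0, len(rel), block_size)]

def block_nested_loop_join(r, s, block_size, theta):
    """Return the matching tuple pairs and the number of block accesses."""
    result, block_accesses = [], 0
    s_blocks = blocks(s, block_size)
    for r_block in blocks(r, block_size):
        block_accesses += 1                  # read one block of r
        for s_block in s_blocks:
            block_accesses += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append((tr, ts))
    return result, block_accesses

r = list(range(6))    # 6 tuples -> br = 3 blocks of 2
s = list(range(4))    # 4 tuples -> bs = 2 blocks of 2
pairs, accesses = block_nested_loop_join(r, s, 2, lambda tr, ts: tr == ts)
print(len(pairs), accesses)
```

With br = 3 and bs = 2 the count is 3 × 2 + 3 = 9 block accesses, compared with nr × bs + br = 15 if the inner relation were rescanned for every tuple of r.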

Query Optimization: JOIN Operation

r ⋈_{A = B} s

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
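A dictionary can stand in for the access structure on s.B. In the sketch below the index is built on the fly purely for illustration (the slide assumes it already exists); the relations and names are hypothetical.

```python
from collections import defaultdict

def build_index(s, b):
    """Stand-in for an existing index or hash key on attribute b of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, index, a):
    """For each tr in r, probe the index instead of scanning s."""
    return [{**tr, **ts} for tr in r for ts in index.get(tr[a], [])]

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 9, "x": "c"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 1, "y": "r"}]

joined = index_join(r, build_index(s, "B"), "A")
print(joined)
```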

Query Optimization: JOIN Operation

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
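The merge step can be sketched as follows, assuming both inputs are already sorted on their join attributes. Tuples with equal join values are handled as a group, which corresponds to the assumption that such a group fits in memory. The relations and names are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equality join of r (sorted on a) and s (sorted on b) in one linear pass."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1                   # group of s tuples with this value
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]
merged = merge_join(r, s, "A", "B")
print(len(merged))
```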

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
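A minimal in-memory sketch of the hash-join idea: both relations are hashed into the same set of buckets with the same function, and only tuples that land in the same bucket are compared. The bucket count, relations, and names are hypothetical.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Equality join: hash both inputs into shared buckets, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():        # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # different values can share a bucket
                    result.append({**tr, **ts})
    return result

r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 5, "x": "c"}]
s = [{"B": 2, "y": "p"}, {"B": 5, "y": "q"}, {"B": 5, "y": "r"}, {"B": 7, "y": "s"}]
hashed = hash_join(r, s, "A", "B")
print(sorted((t["A"], t["x"], t["y"]) for t in hashed))
```

The explicit equality test inside each bucket is needed because two different join values may hash to the same bucket.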

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
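Both duplicate-elimination strategies can be sketched in a few lines. The EMPLOYEE-like sample data and the function names are hypothetical; projected rows are represented as tuples so that they can be sorted and hashed.

```python
def project_sort(rel, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in rel)
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result

def project_hash(rel, attrs):
    """Project and keep a row only the first time its hash entry is seen."""
    seen, result = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

employees = [                       # hypothetical data; dno is not a key
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 4},
    {"ssn": 3, "dno": 5},
]
print(project_sort(employees, ["dno"]), project_hash(employees, ["dno"]))
```

The sort-based version returns rows in sorted order, while the hash-based version keeps the order of first appearance.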

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree for the expression above. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it is σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; below that are two Cartesian products over Project, Department, and Employee.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
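The size of the intermediate relation follows directly from the stated figures: cardinalities multiply and tuple sizes add. A short check, with hypothetical variable names:

```python
import math

tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinality = {"Project": 100, "Department": 20, "Employee": 5000}

product_tuples = math.prod(cardinality.values())       # cardinalities multiply
product_tuple_bytes = sum(tuple_bytes.values())        # tuple widths add
total_bytes = product_tuples * product_tuple_bytes
print(product_tuples, product_tuple_bytes, total_bytes)
```

That is 10,000,000 tuples of 300 bytes each, or about 3 GB for the intermediate result alone.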

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example

[Figure: optimized query tree. σ_{Plocation = 'California'} is applied to Project; the result is joined with Department on Dnum = Dnumber, then with Employee on MGRSSN = SSN; the root is Π_{Pnumber, Dnum, Lname, Address, Bdate}.]

Query Optimization mdash Running Example

It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery

Database Systems

25

Query Optimization mdash Running Example

Database Systems

26

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

31

Database SystemsSystem CatalogQuery

Decomposition

Query Optimization Database Statics

Code Generation

Runtime Execution

Result

Database

Relational AlgebraExpression

Execution Plan

Query

28

Query Optimization mdash A Simple Example

S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens

SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull

SP

Database Systems

29

Query Optimization mdash A Simple ExampleGet names of suppliers who supply part P2

SELECT DISTINCT SnameFROM S SPWHERE SS = SPSAND SPP = lsquoP2rsquo

Suppose that the cardinality of S and SP are 100and 10000 respectively Furthermore assume50 tuples in SP are for part P2

Database Systems

Query Optimization mdash A Simple Example

Database Systems

30

S SP

times

σ(SS = SPS and SPP = lsquoP2rsquo)

ΠSname

31

Query Optimization mdash A Simple Example

S Sname Status SCity S P QTY S1 Smith 20 London S1 P1 300 S1 Smith 20 London S1 P2 200 S1 Smith 20 London S1 P3 400 S1 Smith 20 London S1 P4 200 S1 Smith 20 London S1 P5 100 S1 Smith 20 London S1 P6 100 S2 Jones 10 Paris S2 P1 300 S2 Jones 10 Paris S2 P2 400 bull bull

A SS=SPS B

Database Systems

32

Query Optimization mdash A Simple ExampleWithout an optimizer the system willGenerates Cartesian product of S and SP This will

generate a relation of size 1000000 tuples mdash Toolarge to be kept in the main memoryRestricts results of previous step as specified by

WHERE clause This means reading 1000000tuples of which 50 will be selectedProjects the result of previous step over Sname to

produce the final result

Database Systems

33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: It is a good idea to perform the restriction before the join because:
- It reduces the size of the input to the join operation.
- It reduces the size of the output from the join.

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
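The equivalence of the two forms can be checked by brute force over all truth assignments. This is a minimal sketch assuming ordinary two-valued logic (SQL NULLs are not modelled); the function names are illustrative only.

```python
from itertools import product


def original(p: bool, q: bool, r: bool) -> bool:
    """The condition as written: p OR (q AND r)."""
    return p or (q and r)


def cnf(p: bool, q: bool, r: bool) -> bool:
    """The conjunctive normal form: (p OR q) AND (p OR r)."""
    return (p or q) and (p or r)


def equivalent_on_all_inputs() -> bool:
    """Compare both forms on all eight truth assignments."""
    return all(
        original(*values) == cnf(*values)
        for values in product([False, True], repeat=3)
    )


def first_false_conjunct(conjuncts, row):
    """Return the index of the first conjunct that rejects row, or None.

    A condition in conjunctive normal form is false as soon as one
    conjunct is false, so evaluation can stop at that point.
    """
    for index, conjunct in enumerate(conjuncts):
        if not conjunct(row):
            return index
    return None
```

Because each conjunct is tested independently in `first_false_conjunct`, the same list of conjuncts could also be handed to separate workers, which is the property the slide points to for parallel systems.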

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]
(provided projection-2 uses only attributes kept by projection-1).

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.
(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to
SP [S#]

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections only the last one applied (the outermost) is needed; the others can be omitted (cascade of projections).
$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
This can be extended to
$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Rule 5: The theta join operation is commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
[Figure: the two equivalent join trees, with the operands E1 and E2 swapped.]

Rule 6: Natural join is associative.
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
[Figure: the two equivalent join trees, one joining E1 and E2 first and the other joining E2 and E3 first.]

Rule 7: Theta join is associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition: Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
$selectivity = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is
$s = \frac{1}{|r(R)|}$

For an equality search on an attribute with $i$ distinct values (assumed to be uniformly distributed), the selectivity is
$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$
Hence the number of tuples that satisfy an equality search is
$\frac{1}{i} \cdot |r(R)|$
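The selectivity formulas above translate directly into a small estimator. This is a sketch only: the function names are illustrative, the uniform-distribution assumption is the one stated for the formula, and the figure of 200 distinct part numbers in the usage note is a made-up statistic chosen to match the 50 P2 tuples of the earlier example.

```python
def selectivity_key_equality(cardinality: int) -> float:
    """Equality search on a key attribute: at most one matching tuple."""
    return 1 / cardinality


def selectivity_equality(distinct_values: int) -> float:
    """Equality search on an attribute with i distinct, uniformly spread values."""
    return 1 / distinct_values


def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Expected number of tuples satisfying the equality: |r(R)| / i."""
    return cardinality / distinct_values
```

For instance, `estimated_matches(10_000, 200)` estimates 50 matching tuples for a relation of 10,000 tuples whose searched attribute has 200 distinct values.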

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition $\theta_0$ involve only the attributes of one of the relations being joined ($E_1$ in this case):
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
[Figure: on the left, $\sigma_{\theta_0}$ applied above the join of E1 and E2; on the right, $\sigma_{\theta_0}$ applied to E1 below the join.]

Rule 9: The projection operation distributes over the theta join when the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$ respectively:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Rule 10: Set union and set intersection operations are commutative.
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
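The Rule 12 identities can be tried out on small relations modelled as Python sets of tuples. This is a sketch on one hand-picked example, not a proof; the relations `E1`, `E2`, the predicate `p` and the helper names are all illustrative assumptions.

```python
def select(predicate, relation):
    """Relational selection over a relation modelled as a set of tuples."""
    return {t for t in relation if predicate(t)}


# Two small union-compatible relations (hypothetical data).
E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(2, "b"), (3, "c"), (4, "d")}


def p(t):
    """Hypothetical selection predicate: first attribute is at least 3."""
    return t[0] >= 3


def check_rule_12(pred, e1, e2):
    """Evaluate each Rule 12 form on the given relations and report whether it holds."""
    return {
        "difference": select(pred, e1 - e2) == select(pred, e1) - select(pred, e2),
        "difference_left_only": select(pred, e1 - e2) == select(pred, e1) - e2,
        "union": select(pred, e1 | e2) == select(pred, e1) | select(pred, e2),
        "union_left_only": select(pred, e1 | e2) == select(pred, e1) | e2,
        "intersection": select(pred, e1 & e2) == select(pred, e1) & select(pred, e2),
        "intersection_left_only": select(pred, e1 & e2) == select(pred, e1) & e2,
    }
```

On this data every form holds except `union_left_only`, which matches the inequality stated for set union: applying the selection to only one operand of a union lets unfiltered tuples of the other operand through.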

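Rule 13: The projection operation distributes over the set union operation.
$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$
Note: projection does not distribute over set intersection or set difference in general. Only $\Pi_L(E_1 \cap E_2) \subseteq \Pi_L(E_1) \cap \Pi_L(E_2)$ and $\Pi_L(E_1 - E_2) \supseteq \Pi_L(E_1) - \Pi_L(E_2)$ are guaranteed. For example, with $E_1 = \{(1, a)\}$, $E_2 = \{(1, b)\}$ and $L$ the first attribute, $\Pi_L(E_1 \cap E_2)$ is empty while $\Pi_L(E_1) \cap \Pi_L(E_2) = \{(1)\}$.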

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
- the existence of indexes or other access paths (to reduce I/O cost);
- the physical clustering of records (to reduce I/O cost); …

So, in short, after scanning and parsing:
- The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
- An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans". Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
- Heuristic rules
- Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
- Perform selection operations as early as possible.
- Perform projections early.
- It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
- Materialized evaluation: generation of an intermediate result (relation).
- Pipelined evaluation: combining several operations.

Assume we want to perform
$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
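The difference between the two evaluation styles can be sketched with Python generators. The sketch assumes rows are dictionaries, that r and s share a single join attribute named `k` (a hypothetical name), and that the projection keeps duplicates; none of these details come from the slides.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def join(r: Iterable[Row], s: List[Row]) -> Iterator[Row]:
    """Natural join on the shared attribute "k", producing one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Projection (without duplicate elimination) over a stream of tuples."""
    for row in rows:
        yield {a: row[a] for a in attrs}


def materialized(r, s, attrs) -> Tuple[List[Row], int]:
    """Store the whole join result first, then project it."""
    intermediate = list(join(r, s))  # the materialized intermediate relation
    return list(project(intermediate, attrs)), len(intermediate)


def pipelined(r, s, attrs) -> Iterator[Row]:
    """Feed each joined tuple straight into the projection; nothing is stored."""
    return project(join(r, s), attrs)
```

Calling `next(pipelined(r, s, ["a1", "a2"]))` produces the first result tuple as soon as the join finds its first match, whereas `materialized` must finish the entire join before the projection starts.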

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

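$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$
[Query tree, bottom to top: R and S are joined on Sid = Sid; the selection bid = 100 and rating > 5 is applied to the join result; the projection over Sname is applied last.]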

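$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$
[Query tree, bottom to top: the selection bid = 100 is applied to R and the selection rating > 5 is applied to S; the results are joined on Sid = Sid; the projection over Sname is applied last.]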

Assume the underlying platform can perform the basic relational operations in a "pipeline" fashion, i.e. the result of one operation is fed directly to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with operators annotated "On the fly" to mark results that are pipelined to the next operator rather than materialized.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
- Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
- Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
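The gap between the two search methods can be seen by counting how many records each one examines. This is a sketch that assumes the file is an in-memory list of dictionaries; for the binary search the list must already be ordered on the key attribute. The function names and the returned probe counts are illustrative.

```python
def linear_search(records, key, value):
    """Scan every record; works on unordered data and on non-key attributes."""
    matches = [rec for rec in records if rec[key] == value]
    return matches, len(records)  # every record is examined


def binary_search(sorted_records, key, value):
    """Equality search on a key attribute of a file ordered on that attribute."""
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid], probes
        if current < value:
            lo = mid + 1  # the record, if present, lies in the upper half
        else:
            hi = mid - 1  # the record, if present, lies in the lower half
    return None, probes
```

On a file of 1,000 ordered records the binary search needs at most 10 probes, while the linear search always examines all 1,000.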

Optimization Process: Search methods for Selection
- Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index or hash key, use the primary index or hash key to retrieve the record. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection
- Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Query Optimization: Search methods for Selection
- Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection
- Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$
- Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection
- Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection
- Disjunctive selection by union of record pointers: If access paths exist for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.

Query Optimization: JOIN Operation
- Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
$r \bowtie_{r.A \,\Theta\, s.B} s$
where A and B are attributes or sets of attributes (i.e. join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
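The nested loop algorithm can be written out as runnable code. The version below is a sketch that assumes tuples are Python tuples, that `a` and `b` are the positions of the join attributes (an illustrative convention), and that the comparison Θ is passed in as a function; it also counts comparisons so the cost can be observed.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b] by comparing every pair of tuples."""
    result = []
    comparisons = 0
    for tr in r:            # outer loop over r
        for ts in s:        # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts: concatenate the two tuples
    return result, comparisons
```

For relations of 2 and 3 tuples the function always performs 2 × 3 = 6 comparisons, whatever the join condition is; passing `operator.lt` instead of the default equality gives a theta join on "less than".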

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is $n_r \times n_s$ tuple-pair comparisons. In the best-case scenario both relations fit into main memory, and hence we need only $b_s + b_r$ block accesses.

Even if only one of the relations fits in main memory, the cost is still $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve the block nested loop?
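The block access count of the block nested loop can be reproduced with a small simulation. This is a sketch under simplifying assumptions: a "block" is a slice of a Python list holding a fixed number of tuples, reading a block is modelled by incrementing a counter, and the join condition is equality on the attribute positions `a` and `b`. All names are illustrative.

```python
def blocks(relation, tuples_per_block):
    """Split a relation (list of tuples) into fixed-size blocks."""
    return [
        relation[i:i + tuples_per_block]
        for i in range(0, len(relation), tuples_per_block)
    ]


def block_nested_loop_join(r, s, a, b, tuples_per_block):
    """Equi-join r and s block by block, counting simulated block reads."""
    r_blocks = blocks(r, tuples_per_block)
    s_blocks = blocks(s, tuples_per_block)
    block_reads = 0
    result = []
    for block_r in r_blocks:
        block_reads += 1          # read one block of r
        for block_s in s_blocks:
            block_reads += 1      # read one block of s for this block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads
```

With 10 tuples in r, 6 tuples in s and 2 tuples per block we have b_r = 5 and b_s = 3, and the counter ends at 5 × 3 + 5 = 20 block reads, in line with the cost formula.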

Query Optimization: JOIN Operation
- Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$

Query Optimization: JOIN Operation
- Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
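The merge step can be sketched as follows, assuming both inputs are Python lists of tuples already sorted on their join attribute (positions `a` and `b`, an illustrative convention). Groups of tuples that share a join value are combined in memory, which corresponds to the assumption that such a group fits in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations that are already sorted on r[a] and s[b]."""
    result = []
    i = j = 0  # one pointer per relation, both start at the first tuple
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of equal join values in each relation.
            i_end = i
            while i_end < len(r) and r[i_end][a] == value:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Combine every tuple of one group with every tuple of the other.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    result.append(tr + ts)
            i, j = i_end, j_end
    return result
```

Each pointer only ever moves forward, so every tuple of r and s is visited once during the scan.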

Query Optimization: JOIN Operation
- Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
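The hash-join description can be sketched in memory as follows. The assumptions here are illustrative: the "hash file" is a dictionary of buckets, Python's built-in `hash` plays the role of the hashing function, `num_buckets` is an arbitrary parameter, and `a` and `b` are the positions of the join attributes.

```python
from collections import defaultdict


def hash_join(r, s, a, b, num_buckets=8):
    """Equi-join r and s by hashing both into the same set of buckets."""
    # Each bucket holds the r-tuples and the s-tuples that hash to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Only tuples in the same bucket can match, but different values may
        # collide in one bucket, so the join condition is still checked.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result
```

The order of the output depends on the order in which buckets are visited, so callers that need a stable order should sort the result.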

Query Optimization: Complex JOIN Operation
- Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
- Joins with complex join conditions (i.e. conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

- One or more of the join techniques may be applicable for joins on the individual conditions.
- We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation
- A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.
- If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
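Both duplicate-elimination strategies can be sketched on an in-memory relation. The sketch assumes tuples are Python tuples and that the attribute list is given as a list of positions; the function names are illustrative.

```python
def project_sort(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:  # duplicates are adjacent after sorting
            result.append(row)
    return result


def project_hash(relation, positions):
    """Project and use a hash set to skip rows that were already produced."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[p] for p in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result
```

The sort-based version returns the rows in sorted order, while the hash-based version keeps the order of first appearance; both return the same set of rows.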

Query Optimization: Set Operations
- Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
- The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
- Hashing is another way to implement the union, intersection, and difference operations.

Questions
- Devise algorithms to perform the variations of the outer join operation.
- Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr-ssn = Ssn
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into
$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge \text{Mgr-ssn} = Ssn}(Project \times (Department \times Employee)))$

[Query tree, bottom to top: Department × Employee; Project × that result; the selection Plocation = 'California' and Dnum = Dnumber and Mgr-ssn = Ssn; the projection over Pnumber, Dnum, Lname, Address, Bdate.]

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
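The size of the Cartesian product follows from multiplying the cardinalities and adding the tuple sizes. The short sketch below redoes that arithmetic; the function name and the `(tuple_count, tuple_bytes)` pair representation are illustrative.

```python
from math import prod


def cartesian_product_size(relations):
    """relations: list of (tuple_count, tuple_bytes) pairs, one per relation."""
    tuples = prod(count for count, _ in relations)    # cardinalities multiply
    tuple_bytes = sum(size for _, size in relations)  # tuple widths add up
    return tuples, tuple_bytes, tuples * tuple_bytes


# Project, Department, Employee as described in the example.
EXAMPLE = [(100, 100), (20, 50), (5000, 150)]
```

For the example relations this gives 100 × 20 × 5,000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, i.e. 3,000,000,000 bytes for the intermediate relation.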

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{\text{Mgr-ssn} = Ssn} Employee)$

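[Query tree, bottom to top: the selection Plocation = 'California' is applied to Project; the result is joined with Department on Dnum = Dnumber; that result is joined with Employee on Mgr-ssn = Ssn; the projection over Pnumber, Dnum, Lname, Address, Bdate is applied last.]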




Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Optimization Process
Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Optimization Process
Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Optimization Process
Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are kept as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

|r(R)| / i
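The selectivity estimates above can be written as two small functions. This is a minimal sketch that assumes uniformly distributed attribute values; the function names and the sample figures (10,000 tuples, 200 distinct values) are made up for illustration.

```python
def equality_selectivity(distinct_values):
    """s = 1/i for an equality search on an attribute with i distinct values."""
    return 1.0 / distinct_values

def estimated_matches(cardinality, distinct_values):
    """Estimated number of qualifying tuples: |r(R)| / i."""
    return cardinality / distinct_values

# Hypothetical relation: 10,000 tuples, 200 distinct values in the attribute.
matches_non_key = estimated_matches(10_000, 200)

# Key attribute: every value is distinct, so i = |r(R)| and one tuple qualifies.
matches_key = estimated_matches(10_000, 10_000)
```

For a key attribute the same formula reduces to s = 1 / |r(R)|, which matches the key-attribute case stated above.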

Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 belong to only one of the relations (E1 in this case):

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

Query trees for Rule 8 (root first):

σ_{θ0}
  ⋈_{θ}
    E1
    E2

is equivalent to

⋈_{θ}
  σ_{θ0}
    E1
  E2

Optimization Process
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))
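Rule 8 is the basis for pushing selections below joins. The sketch below compares both sides of the rule on hypothetical data and counts the tuple comparisons made by a simple nested-loop join; the relations, predicates, and counts are illustrative assumptions only.

```python
def theta_join(r, s, theta):
    """Nested-loop theta join that also reports the number of comparisons."""
    out, comparisons = [], 0
    for tr in r:
        for ts in s:
            comparisons += 1
            if theta(tr, ts):
                out.append({**tr, **ts})
    return out, comparisons

# Hypothetical relations: E1 has 100 tuples, E2 has 50 tuples.
E1 = [{"a": k, "x": 10 * k} for k in range(1, 101)]
E2 = [{"b": k} for k in range(1, 51)]

theta = lambda tr, ts: tr["a"] == ts["b"]   # join condition
theta0 = lambda t: t["x"] > 400             # refers to attributes of E1 only

# Left-hand side: join first, then select.
joined, cost_late = theta_join(E1, E2, theta)
late = [t for t in joined if theta0(t)]

# Right-hand side: select on E1 first, then join.
early, cost_early = theta_join([t for t in E1 if theta0(t)], E2, theta)
```

Both plans return the same 10 tuples, but the early selection needs 3,000 comparisons instead of 5,000 because only 60 tuples of E1 reach the join.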

Optimization Process
Rule 10: Set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

Π_{L}(E1 − E2) = (Π_{L}(E1)) − (Π_{L}(E2))
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Π_{L}(E1 ∩ E2) = Π_{L}(E1) ∩ Π_{L}(E2)

Note: the union case always holds. The intersection and difference cases hold only when no two distinct tuples of E1 ∪ E2 agree on the attributes in L (for example, when L contains a key); otherwise projecting first can change the result.

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: Several operations are combined, with the output of one operation passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
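The difference between the two evaluation styles can be sketched with Python generators. The relations r and s, the join attribute k, and the helper names below are hypothetical; duplicate elimination in the projection is left out to keep the sketch short.

```python
def join(r, s, attr):
    """Equi-join on attr; yields each result tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

# Hypothetical relations sharing the join attribute k.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}, {"k": 2, "a2": "r"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever stored.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both strategies produce the same tuples; the pipelined version simply never holds the full join result in memory.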

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

Query tree (root first):
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))

Query tree (root first):
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

First plan, annotated (root first):
Π_{Sname} (on the fly)
  σ_{bid = 100 ∧ rating > 5} (on the fly)
    ⋈_{Sid = Sid}
      R
      S

Second plan, annotated (root first):
Π_{Sname} (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search Methods for Selection
Linear search: Retrieve every record in the file and test whether its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.

Optimization Process: Search Methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search Methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search Methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search Methods for Selection
Conjunctive selection: A conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: A disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search Methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search Methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
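Disjunctive selection by union of record pointers can be sketched as follows. The sketch assumes in-memory dictionaries standing in for indexes and list positions standing in for record pointers; the records, attribute names, and function names are hypothetical.

```python
def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions)."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index

def disjunctive_select(records, indexes, conditions):
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over records."""
    # If any condition lacks an access path, fall back to a linear scan.
    if any(attr not in indexes for attr, _ in conditions):
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    # Otherwise take the union of the pointers found through each index.
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())
    return [records[rid] for rid in sorted(pointers)]

# Hypothetical employee records.
records = [
    {"ssn": 1, "dno": 5, "city": "Rolla"},
    {"ssn": 2, "dno": 4, "city": "Rolla"},
    {"ssn": 3, "dno": 5, "city": "Austin"},
    {"ssn": 4, "dno": 1, "city": "Miami"},
]
indexes = {"dno": build_index(records, "dno"), "city": build_index(records, "city")}
conditions = [("dno", 5), ("city", "Rolla")]

via_indexes = disjunctive_select(records, indexes, conditions)
via_scan = disjunctive_select(records, {"dno": indexes["dno"]}, conditions)
```

Using a set for the pointers removes records that satisfy more than one condition, so each qualifying record is fetched once.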

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

r ⋈_{r.A θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
  For each ts ∈ s do begin
    If tr[A] θ ts[B] is true then add tr || ts to the result
  end
end
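The nested loop algorithm above translates almost line for line into Python. This is a sketch under simple assumptions: relations are lists of dictionaries, the comparison θ is passed in as a function (equality by default), and the sample data is made up.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a θ s.b by comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition tr[A] θ ts[B]
                result.append({**tr, **ts})   # tr || ts
    return result

# Hypothetical relations.
r = [{"A": 1}, {"A": 2}]
s = [{"B": 2}, {"B": 3}]

equi = nested_loop_join(r, s, "A", "B")
less_than = nested_loop_join(r, s, "A", "B", theta=operator.lt)
```

Because every pair is examined, the method works for any comparison θ, at the price of nr × ns comparisons.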

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns tuple comparisons.
In the best case scenario, both relations fit into main memory and hence we need bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in main memory, then the cost is also bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
  For each block Bs of s do begin
    For each tr ∈ Br do begin
      For each ts ∈ Bs do begin
        If tr[A] θ ts[B] is true then add tr || ts to the result
      end
    end
  end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop in terms of the number of block accesses is br × bs + br.
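The cost formulas for the nested loop variants can be compared with a few lines of arithmetic. The block counts used here (100 and 400 blocks) are hypothetical and only serve to show the effect of choosing the outer relation.

```python
def in_memory_join_accesses(b_r, b_s):
    """Block accesses when at least one relation fits in memory: br + bs."""
    return b_r + b_s

def block_nested_loop_accesses(b_outer, b_inner):
    """Block accesses for block nested loop: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer

# Hypothetical sizes: r occupies 100 blocks, s occupies 400 blocks.
b_r, b_s = 100, 400

best_case = in_memory_join_accesses(b_r, b_s)
r_outer = block_nested_loop_accesses(b_r, b_s)
s_outer = block_nested_loop_accesses(b_s, b_r)
```

With these numbers, using the smaller relation as the outer one costs 40,100 block accesses versus 40,400 the other way round, and both are far above the 500 accesses of the in-memory case.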

How can we improve block nested loop?

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s
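A join that uses an access structure on s.B can be sketched with a dictionary playing the role of the index. In a real system the index would already exist on disk; here it is built in memory purely as an assumption for the example, and all names and data are hypothetical.

```python
from collections import defaultdict

def build_index(s, b):
    """Stand-in for an existing index on s.b: value -> list of matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, index_on_b, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in index_on_b.get(tr[a], []):   # only the matching records
            result.append({**tr, **ts})
    return result

# Hypothetical relations.
r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 5, "x": "c"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 1, "y": "r"}]

joined = index_join(r, build_index(s, "B"), "A")
```

Each tuple of r triggers one index probe, so s is never scanned in full once the index is available.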

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the values of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
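The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes and that each group of equal-valued tuples fits in memory. The function name and the sample relations are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on r.a and s.b."""
    result = []
    i = j = 0                              # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                         # advance the pointer on r
        elif r[i][a] > s[j][b]:
            j += 1                         # advance the pointer on s
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value with the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

# Hypothetical sorted relations.
r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]

merged = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backwards past a group, which is why each relation is read only once.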

Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
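A hash join along these lines can be sketched in memory. The bucket count, the use of Python's built-in hash function, and the sample relations are assumptions made for the illustration.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join by hashing both relations into the same set of buckets."""
    # Each bucket keeps the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare explicitly.
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result

# Hypothetical relations.
r = [{"A": 1, "x": "a"}, {"A": 9, "x": "b"}, {"A": 3, "x": "c"}]
s = [{"B": 9, "y": "p"}, {"B": 1, "y": "q"}, {"B": 17, "y": "r"}]

hashed = hash_join(r, s, "A", "B")
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared.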

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation
A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
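Duplicate elimination by hashing can be sketched with a Python set, which is hash-based. The function name and the sample supplier data are assumptions for the illustration.

```python
def project(rel, attrs):
    """Π_attrs(rel) with duplicates removed through a hash-based set."""
    seen = set()
    result = []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:            # hash lookup instead of sorting
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Hypothetical supplier relation.
suppliers = [
    {"S#": "S1", "Sname": "Smith", "City": "London"},
    {"S#": "S2", "Sname": "Jones", "City": "Paris"},
    {"S#": "S3", "Sname": "Blake", "City": "Paris"},
    {"S#": "S4", "Sname": "Clark", "City": "London"},
]

cities = project(suppliers, ["City"])           # no key: duplicates must go
with_key = project(suppliers, ["S#", "City"])   # key included: nothing removed
```

When the attribute list contains the key, every projected tuple is distinct and the duplicate check never discards anything.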

Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query tree (root first):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
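The size of the Cartesian product follows directly from the figures given above. The short calculation below reproduces it; the variable names are chosen only for this illustration.

```python
from math import prod

# Tuple counts and tuple sizes (bytes) as stated in the example.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and concatenates tuples.
product_tuples = prod(tuple_counts.values())
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size
```

This gives 10,000,000 tuples of 300 bytes each, that is 3,000,000,000 bytes for the intermediate relation alone.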

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example
Query tree (root first):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee



Query Optimization: Running Example
Find the last names of employees born after 1957 and working on a project named "Aquarius".

SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '1957-12-31'

Query Optimization: Running Example
Initial query tree (root first):
Π_{Lname}
  σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}
    ×
      ×
        Employee
        Works_on
      Project

Query Optimization: An Example
Executing the previous query tree generates a very large relation because it performs Cartesian products on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query Optimization: Running Example
Query tree after moving the Select operations down (root first):
Π_{Lname}
  σ_{Pnumber = Pno}
    ×
      σ_{Essn = Ssn}
        ×
          σ_{Bdate > '1957-12-31'}
            Employee
          Works_on
      σ_{Pname = 'Aquarius'}
        Project

Query Optimization: Running Example
On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of operations on the input relations.

Query Optimization: Running Example
Query tree after reordering the input relations (root first):
Π_{Lname}
  σ_{Essn = Ssn}
    ×
      σ_{Pnumber = Pno}
        ×
          σ_{Pname = 'Aquarius'}
            Project
          Works_on
      σ_{Bdate > '1957-12-31'}
        Employee

Query Optimization: Running Example
It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query Optimization: Running Example
Query tree after replacing products and selections with joins (root first):
Π_{Lname}
  ⋈_{Essn = Ssn}
    ⋈_{Pnumber = Pno}
      σ_{Pname = 'Aquarius'}
        Project
      Works_on
    σ_{Bdate > '1957-12-31'}
      Employee

Query Optimization: Running Example
It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of the query.

Query Optimization: Running Example
Query tree after moving the Project operations down (root first):
Π_{Lname}
  ⋈_{Essn = Ssn}
    Π_{Essn}
      ⋈_{Pnumber = Pno}
        Π_{Pnumber}
          σ_{Pname = 'Aquarius'}
            Project
        Π_{Essn, Pno}
          Works_on
    Π_{Ssn, Lname}
      σ_{Bdate > '1957-12-31'}
        Employee

[Figure: phases of query processing. Elements shown: Query, Query Decomposition, System Catalog, Relational Algebra Expression, Query Optimization, Database Statistics, Execution Plan, Code Generation, Runtime Execution, Database, Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example
Get the names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

Query Optimization: A Simple Example
Query tree (root first):
Π_{Sname}
  σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}
    ×
      S
      SP

Query Optimization: A Simple Example

A ⋈_{S.S# = SP.S#} B
S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This will generate a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query Optimization: A Simple Example
Query tree (root first):
Π_{Sname}
  ⋈_{S.S# = SP.S#}
    σ_{SP.P# = 'P2'}
      SP
    S

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing a restriction and then a join instead of doing a product and then a restriction, sounds like a good heuristic.

Optimization Process
Cast the query into some internal representation: Convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process
Query tree (leaves first):
S, SP
Join (S.S# = SP.S#)
Restrict (SP.P# = 'P2')
Project (Sname)
Result

Optimization Process
Convert the result of the previous step into a canonical form: During this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process
General rule: It is a good idea to perform the restriction before the join because
it reduces the size of the input to the join operation, and
it reduces the size of the output from the join.

Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
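The equivalence of the two conditions can be confirmed by checking all eight truth assignments. The sketch below does this exhaustively; the variable names mirror the p, q, and r of the rule.

```python
from itertools import product

def original(p, q, r):
    """p OR (q AND r)"""
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    """(p OR q) AND (p OR r)"""
    return (p or q) and (p or r)

# Compare the two forms on every combination of truth values.
assignments = list(product([False, True], repeat=3))
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in assignments
)
```

The flag equivalent is True, which is the distributive law of OR over AND.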

Optimization Process
General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because
a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where the conjuncts can be evaluated in parallel.

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

Optimization Process
General rule: A sequence of restrictions can be combined into a single restriction.

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
existence of indexes or other access paths (to reduce I/O cost),
physical clustering of records (to reduce I/O cost), …

Optimization Process

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation, which is then used as input to the next operation.
Pipeline evaluation: Several operations are combined, so that tuples produced by one operation are passed directly to the next.

Assume we want to perform

Πa1, a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
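The difference between the two evaluation styles can be sketched with Python generators. This is an illustrative model only: the relations r and s, the join attribute k and the helper names are hypothetical.

```python
def join(r, s, key):
    """Equi-join that yields joined tuples one at a time (nothing is stored)."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}


def project(rows, attrs):
    """Projection that consumes its input tuple by tuple."""
    for row in rows:
        yield {a: row[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))

print(materialized == pipelined)
```

Both styles give the same answer; the pipelined form never holds the full join result in memory.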

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: date, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

ΠSname (σbid = 100 ∧ rating > 5 (R ⋈Sid=Sid S))

[Figure: query tree with ΠSname at the root, above σbid = 100 ∧ rating > 5, above the join ⋈Sid=Sid of R and S.]

ΠSname ((σbid = 100 (R)) ⋈Sid=Sid (σrating > 5 (S)))

[Figure: query tree with ΠSname at the root, above the join ⋈Sid=Sid of σbid = 100 (R) and σrating > 5 (S).]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above annotated for pipelined execution. The first tree carries two "On the fly" labels; the second tree carries one "On the fly" label.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
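The two search methods can be sketched as follows. The in-memory list stands in for a file of records, and the record layout and function names are assumptions made for illustration.

```python
def linear_search(records, attr, value):
    """Scan every record and test the selection condition."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with that key value


# Hypothetical EMPLOYEE file, ordered on the key attribute ssn.
employees = [
    {"ssn": 101, "dno": 5},
    {"ssn": 205, "dno": 4},
    {"ssn": 310, "dno": 5},
    {"ssn": 477, "dno": 1},
]

in_dept_5 = linear_search(employees, "dno", 5)   # non-key attribute: scan all
emp_310 = binary_search(employees, "ssn", 310)   # ordered key: halve the range
missing = binary_search(employees, "ssn", 999)
print(len(in_dept_5), emp_310, missing)
```

The linear search inspects every record, while the binary search needs a number of probes that grows only logarithmically with the file size.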

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).

σSSN = 123456789 (EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). Note: in this case the data is also sorted.

σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: A conjunctive selection is of the following form:

σθ1 ∧ θ2 ∧ … ∧ θn (r)

Disjunctive selection: A disjunctive selection is of the following form:

σθ1 ∨ θ2 ∨ … ∨ θn (r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
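Both strategies can be sketched with dictionary-based indexes that map attribute values to record pointers (here, list positions). The data, the indexes and the helper build_index are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


records = [
    {"dno": 5, "salary": 30000},
    {"dno": 4, "salary": 40000},
    {"dno": 5, "salary": 40000},
    {"dno": 1, "salary": 55000},
]
dno_index = build_index(records, "dno")
salary_index = build_index(records, "salary")

# Conjunctive selection (dno = 5 AND salary = 40000):
# use one access path, then apply the remaining condition to the candidates.
candidates = dno_index.get(5, set())
conjunctive = sorted(rid for rid in candidates if records[rid]["salary"] == 40000)

# Disjunctive selection (dno = 5 OR salary = 40000):
# every condition has an access path, so take the union of the pointer sets.
disjunctive = sorted(dno_index.get(5, set()) | salary_index.get(40000, set()))

print(conjunctive, disjunctive)
```

If one of the disjuncts had no index, the union of pointers could miss qualifying tuples, which is why a linear scan is then required.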

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].

R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈r.A θ s.B s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr[A] θ ts[B] is true then add tr || ts to the result
    end
end
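A runnable rendering of the nested loop algorithm might look like the sketch below. Tuples are plain Python tuples, a and b are the positions of the join attributes, and theta is any comparison; these conventions are assumptions made for the example.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s: compare every tuple of r with every tuple of s."""
    result = []
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            if theta(tr[a], ts[b]):     # join condition tr[A] theta ts[B]
                result.append(tr + ts)  # tr || ts
    return result


r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                      # r.A = s.B
less_than = nested_loop_join(r, s, 0, 0, operator.lt)    # r.A < s.B
print(equi, len(less_than))
```

Because the condition is tested for every pair, the method works for any θ, at the price of nr × ns comparisons.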

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines nr × ns pairs of tuples.
In the best-case scenario both relations fit into main memory, and hence we need bs + br block accesses.

If just one of the relations fits in main memory, the cost is still bs + br block accesses: that relation is kept in memory as the inner relation, and the other is read once.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr[A] θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve block nested loop?
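The cost formula can be confirmed by counting block reads in a small simulation. In this sketch each relation is a list of blocks and each block a list of tuples; the data and the counter are illustrative assumptions, and only an equality condition is handled.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join performed block by block, counting simulated block reads."""
    result = []
    block_reads = 0
    for r_block in r_blocks:
        block_reads += 1                # read one block of r
        for s_block in s_blocks:
            block_reads += 1            # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical relations: r has 3 blocks, s has 2 blocks.
r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]
s_blocks = [[(2, "a"), (4, "b")], [(5, "c"), (7, "d")]]

joined, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
b_r, b_s = len(r_blocks), len(s_blocks)
print(reads, b_r * b_s + b_r)
```

The counted reads match br × bs + br; choosing the relation with fewer blocks as the outer relation keeps this figure smaller.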

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈A=B s

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
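The merge step can be sketched as follows, assuming both inputs are already sorted on their join attributes and that each group of equal-valued tuples fits in memory. Tuples are Python tuples and a, b are the positions of the join attributes; these are conventions of the example only.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                            # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value with the whole s group.
            while i < len(r) and r[i][a] == value:
                for k in range(j, j_end):
                    result.append(r[i] + s[k])
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```

Neither pointer ever moves backwards past a group, which is why each tuple, and hence each block, is read only once.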

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
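A minimal in-memory sketch of this idea is given below. The number of buckets, the use of Python's built-in hash and the tuple layout are assumptions for illustration; a real system would hash to disk-resident buckets.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join: hash both relations into shared buckets, then match per bucket."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare explicitly.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (5, "y"), (5, "z"), (9, "w")]
hashed = sorted(hash_join(r, s, 0, 0))
print(hashed)
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared.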

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈θ1 ∧ θ2 ∧ … ∧ θn s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say

r ⋈θ1 s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈θ1 ∨ θ2 ∨ … ∨ θn s

The join can be performed as the union of the tuples in the individual joins

r ⋈θi s

Query Optimization: Project Operation

A project operation Π<attribute-list> (R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
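Both duplicate-elimination techniques can be sketched in a few lines. The relation, the attribute positions and the function names are hypothetical; Python's set plays the role of the hash table.

```python
def project_by_sorting(rows, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(row[p] for p in positions) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_by_hashing(rows, positions):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[p] for p in positions)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical (Lname, Dno) tuples; Dno alone is not a key.
employees = [("Smith", 5), ("Wong", 5), ("Zelaya", 4), ("Smith", 4)]

by_sort = project_by_sorting(employees, [1])
by_hash = project_by_hashing(employees, [1])
print(by_sort, by_hash)
```

The sorting variant returns the result in sorted order, while the hashing variant keeps the order of first appearance.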

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection and difference operations.

Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN (Project × (Department × Employee)))

[Figure: query tree for this expression, with the projection at the root, above the selection, above the Cartesian product of Project with the Cartesian product of Department and Employee.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department and Employee relations have tuple sizes of 100, 50 and 150 bytes and contain 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
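The arithmetic behind these figures can be spelled out as a short calculation. The dictionary layout is just a convenient way to hold the numbers quoted above; the total size in bytes is derived here and is not stated in the original slide.

```python
# (number of tuples, tuple size in bytes) for each relation, as given above.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_size = 0
for n_tuples, tuple_size in relations.values():
    product_tuples *= n_tuples        # cardinalities multiply
    product_tuple_size += tuple_size  # tuple widths add up

total_bytes = product_tuples * product_tuple_size
print(product_tuples, product_tuple_size, total_bytes)
```

Under these numbers the intermediate relation would occupy about 3 × 10^9 bytes, even though the final answer is small.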

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = 'California' (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))

[Figure: query tree for this expression, with the projection at the root, above the join ⋈MGRSSN=SSN with Employee, above the join ⋈Dnum=Dnumber of σPlocation = 'California' (Project) with Department.]



Query Optimization: Running Example

Query tree, written as an expression:

ΠLname (σPname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31' ((Employee × Works_on) × Project))

Query Optimization: Running Example

Execution of the previous query tree generates a very large relation because of performing Cartesian products on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query tree, written as an expression:

ΠLname (σPnumber = Pno ((σEssn = Ssn ((σBdate > '1957-12-31' (Employee)) × Works_on)) × (σPname = 'Aquarius' (Project))))

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of operations on the input relations.

Query tree, written as an expression:

ΠLname (σEssn = Ssn ((σPnumber = Pno ((σPname = 'Aquarius' (Project)) × Works_on)) × (σBdate > '1957-12-31' (Employee))))

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree, written as an expression:

ΠLname (((σPname = 'Aquarius' (Project)) ⋈Pnumber = Pno Works_on) ⋈Essn = Ssn (σBdate > '1957-12-31' (Employee)))

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.

Query tree, written as an expression:

ΠLname ((ΠEssn ((ΠPnumber (σPname = 'Aquarius' (Project))) ⋈Pnumber = Pno (ΠEssn, Pno (Works_on)))) ⋈Essn = Ssn (ΠSsn, Lname (σBdate > '1957-12-31' (Employee))))


[Figure: phases of query processing. Boxes: Query, Query Decomposition (with the System Catalog), Relational Algebra Expression, Query Optimization (with Database Statistics), Execution Plan, Code Generation, Runtime Execution (with the Database), Result.]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example

Get names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

ΠSname (σS.S# = SP.S# ∧ SP.P# = 'P2' (S × SP))

[Figure: query tree for this expression, with ΠSname at the root, above the selection, above the Cartesian product of S and SP.]

Query Optimization: A Simple Example

Tuples of S and SP combined on S.S# = SP.S#:

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example

Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

ΠSname ((σSP.P# = 'P2' (SP)) ⋈S.S# = SP.S# S)

[Figure: query tree for this expression, with ΠSname at the root, above the join ⋈S.S# = SP.S# of σSP.P# = 'P2' (SP) with S.]

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
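One way to arrive at figures of this order is sketched below. The accounting is an assumption made for illustration and is not spelled out in the original: without an optimizer, S is assumed to be re-read once for every SP tuple while forming the product, the product is written out, and then read back for the restriction.

```python
def tuple_io_without_optimizer(n_s: int, n_sp: int) -> int:
    """Tuple I/Os for product-then-restrict (assumed accounting, see lead-in)."""
    product_size = n_s * n_sp
    reads_for_product = n_sp + n_s * n_sp   # SP once, S once per SP tuple
    writes_of_product = product_size        # product does not fit in memory
    reads_for_restrict = product_size       # restriction re-reads the product
    return reads_for_product + writes_of_product + reads_for_restrict


def tuple_io_with_optimizer(n_s: int, n_sp: int) -> int:
    """Tuple I/Os for restrict-then-join: read SP once, then S once."""
    return n_sp + n_s


first_plan = tuple_io_without_optimizer(100, 10_000)
second_plan = tuple_io_with_optimizer(100, 10_000)
print(first_plan, second_plan)
```

Under these assumptions the first plan costs roughly 3,000,000 tuple I/Os and the second roughly 10,000, in line with the figures quoted above.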

Optimization Process

Cast the query into some internal representation: Convert the query to some internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

ΠSname (σP# = 'P2' (S ⋈S.S# = SP.S# SP))

[Figure: query tree, bottom to top: S and SP feed Join (S.S# = SP.S#), then Restrict (SP.P# = 'P2'), then Project (Sname), producing the Result.]

Optimization Process

Convert the result of the previous step into a canonical form: During this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join, because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
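The equivalence used in this conversion can be checked exhaustively over all truth assignments, and the "false as soon as one conjunct is false" property shows up as short-circuit evaluation. The function names below are illustrative.

```python
from itertools import product


def original(p: bool, q: bool, r: bool) -> bool:
    return p or (q and r)


def conjunctive_normal_form(p: bool, q: bool, r: bool) -> bool:
    conjuncts = [p or q, p or r]
    # all() stops at the first conjunct that is False.
    return all(conjuncts)


# Compare the two forms on all 8 combinations of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
print(equivalent)
```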

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction and projection can be converted into a projection and restriction.

Optimization Process

Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):

σθ1 ∧ θ2 (E) = σθ1 (σθ2 (E))

Rule 2: The selection operation is commutative:

σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections):

ΠL1 (ΠL2 (… (ΠLn (E)) …)) = ΠL1 (E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

σθ (E1 × E2) = E1 ⋈θ E2

This can be extended to

σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1 ∧ θ2 E2

Rule 5: The theta join operation is commutative:

E1 ⋈θ E2 = E2 ⋈θ E1

[Figure: two equivalent query trees, ⋈θ over (E1, E2) and ⋈θ over (E2, E1).]

Rule 6: Natural join is associative:

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: two equivalent query trees, one joining E1 and E2 first, the other joining E2 and E3 first.]

Rule 7: Theta join is associative in the following manner:

(E1 ⋈θ1 E2) ⋈θ2 ∧ θ3 E3 = E1 ⋈θ1 ∧ θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
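
As a rough illustration of the two search methods, the sketch below treats a file as a Python list of records. The record layout and the attribute name "ssn" are assumptions made for this example only.

```python
def linear_search(records, attr, value):
    """Scan every record and keep those that satisfy attr = value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute; the file must be ordered on it."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]  # a key matches at most one record
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Hypothetical EMPLOYEE file ordered on the key attribute "ssn".
employees = [{"ssn": n, "dno": n % 3} for n in range(100, 200, 10)]
```

The linear search touches all records regardless of the data, while the binary search halves the remaining range at each step, which is why it requires the file to be ordered on the search key.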

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ ... ∧ θn} (r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ ... ∨ θn} (r)

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
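
The union-of-pointers idea can be sketched as follows. This is a simplified illustration: an "index" is assumed to be an in-memory dictionary from attribute value to record positions, only equality conditions are handled, and the names build_index and disjunctive_select are hypothetical.

```python
from collections import defaultdict


def build_index(relation, attr):
    """Map each value of attr to the set of record positions (pointers)."""
    index = defaultdict(set)
    for pos, rec in enumerate(relation):
        index[rec[attr]].add(pos)
    return index


def disjunctive_select(relation, indexes, conditions):
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over the relation.

    conditions is a list of (attr, value) pairs; indexes maps an
    attribute name to an index built by build_index.
    """
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition has no access path: fall back to a linear scan.
        return [rec for rec in relation
                if any(rec[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())  # union of record pointers
    return [relation[p] for p in sorted(pointers)]


employee = [
    {"ssn": 1, "dno": 5, "city": "Rolla"},
    {"ssn": 2, "dno": 4, "city": "Austin"},
    {"ssn": 3, "dno": 5, "city": "Austin"},
    {"ssn": 4, "dno": 1, "city": "Boston"},
]
indexes = {"dno": build_index(employee, "dno"),
           "city": build_index(employee, "city")}
```

Because the pointers are collected in a set, a record that satisfies several of the conditions is fetched only once.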

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the number of blocks of each relation.

The following algorithm performs the nested loop join operation:

For each t_r ∈ r do begin
    For each t_s ∈ s do begin
        If r.A Θ s.B is true then add t_r || t_s to the result
    end
end

The cost of the nested loop algorithm is n_r × n_s, since every pair of tuples is examined.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.

If one of the relations fits in the physical space, then b_s + b_r block accesses will be the cost.
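
A direct transcription of the nested loop algorithm into Python might look like the sketch below. It assumes relations are lists of dictionaries, takes Θ as any two-argument comparison function (equality by default), and returns the number of tuple pairs examined so that the n_r × n_s figure can be checked. The function and variable names are chosen for this example.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return (result, pairs_examined) for r joined with s on r.a theta s.b."""
    result = []
    pairs_examined = 0
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            pairs_examined += 1
            if theta(tr[a], ts[b]):
                # t_r || t_s; attribute names are assumed not to clash
                result.append({**tr, **ts})
    return result, pairs_examined


r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}, {"A": 3, "x": "r"}]
s = [{"B": 2, "y": 10}, {"B": 3, "y": 20}]
```

Passing theta=operator.lt or another comparison turns the same loop into a general theta join, which is why the nested loop works for any join condition.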

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If r.A Θ s.B is true then add t_r || t_s to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve the block nested loop?
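
The block access count b_r × b_s + b_r can be checked with a small simulation. The sketch below is only illustrative: a "block" is assumed to be a fixed-size slice of a Python list, one counter increment stands for one block read, and only equi-joins are handled.

```python
def blocks(relation, block_size):
    """Split a relation (a list of tuples) into consecutive blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, a, b, block_size):
    """Return (result, block_reads) for the equi-join r.a = s.b."""
    result = []
    block_reads = 0
    for r_block in blocks(r, block_size):
        block_reads += 1                  # read one block of r
        for s_block in blocks(s, block_size):
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads


r = [{"A": i} for i in range(6)]          # b_r = 3 with block_size = 2
s = [{"B": i} for i in range(2, 6)]       # b_s = 2 with block_size = 2
```

With these sample relations the counter reaches 3 × 2 + 3 = 9 block reads, compared with reading s once per tuple of r in the tuple-at-a-time version.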

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r one at a time and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].

r ⋈_{A = B} s

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
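
The merge step can be sketched as below, assuming both inputs are Python lists already sorted on their join attributes. The group of s tuples sharing the current join value is held in memory, which mirrors the assumption stated above. The function name merge_join is chosen for this example.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on r.a and s.b."""
    result = []
    i = j = 0                              # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result


r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": 10}, {"B": 2, "y": 20}, {"B": 3, "y": 30}, {"B": 4, "y": 40}]
```

Neither pointer ever moves backwards past a group, so each input is scanned once.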

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
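
A minimal in-memory sketch of this idea follows. The "hash file" is assumed to be a fixed number of Python lists, Python's built-in hash function stands in for the hashing function, and the bucket count of 8 is an arbitrary choice for this example.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r.a = s.b by hashing both relations into shared buckets."""
    # Each bucket keeps the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Only tuples in the same bucket can have equal join values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result


r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}, {"A": 10, "x": "r"}]
s = [{"B": 2, "y": 10}, {"B": 10, "y": 20}, {"B": 18, "y": 30}]
```

The equality test inside each bucket is still needed because different join values can hash to the same bucket.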

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND Mgr_ssn = Ssn
AND Plocation = 'California'

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn} (Project × (Department × Employee)))

Query Optimization: An Example

Query tree:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
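
The size figures follow from multiplying the cardinalities and adding the tuple widths. The short calculation below reproduces them; the dictionary layout and the function name are assumptions of this sketch, and the total byte count is a derived figure rather than one stated in the slides.

```python
# (number of tuples, tuple size in bytes) for each relation, as given above
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}


def cartesian_product_size(stats):
    """Return (tuple count, tuple width in bytes) of the full Cartesian product."""
    count, width = 1, 0
    for n_tuples, n_bytes in stats:
        count *= n_tuples      # cardinalities multiply
        width += n_bytes       # tuple widths add up
    return count, width


count, width = cartesian_product_size(relations.values())
total_bytes = count * width
```

Here count is 100 × 20 × 5000 = 10,000,000 tuples and width is 100 + 50 + 150 = 300 bytes, so the intermediate relation would occupy about 3 × 10^9 bytes.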

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = 'California'} (Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{Mgr_ssn = Ssn} (Employee))

Query tree:
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{Mgr_ssn = Ssn}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee



Query Optimization: An Example

Execution of the previous query tree generates a very large relation because it performs Cartesian products on the input relations.
It makes sense to perform some Select operations on the base relations before performing the Cartesian products.

Query Optimization: Running Example

Query tree:
Π_{Lname}
  σ_{Pnumber = Pno}
    ×
      σ_{Essn = Ssn}
        ×
          σ_{Bdate > '1957-12-31'}
            Employee
          Works_on
      σ_{Pname = 'Aquarius'}
        Project

Query Optimization: Running Example

On closer observation, one should realize that just one tuple from Project will be involved in the query. So it makes sense to switch the order of operations on the input relations.

Query tree:
Π_{Lname}
  σ_{Essn = Ssn}
    ×
      σ_{Pnumber = Pno}
        ×
          σ_{Pname = 'Aquarius'}
            Project
          Works_on
      σ_{Bdate > '1957-12-31'}
        Employee

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree:
Π_{Lname}
  ⋈_{Essn = Ssn}
    ⋈_{Pnumber = Pno}
      σ_{Pname = 'Aquarius'}
        Project
      Works_on
    σ_{Bdate > '1957-12-31'}
      Employee

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.

Query tree:
Π_{Lname}
  ⋈_{Essn = Ssn}
    Π_{Essn}
      ⋈_{Pnumber = Pno}
        Π_{Pnumber}
          σ_{Pname = 'Aquarius'}
            Project
        Π_{Essn, Pno}
          Works_on
    Π_{Ssn, Lname}
      σ_{Bdate > '1957-12-31'}
        Employee


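[Figure: phases of query processing. Query → Query Decomposition → Relational Algebra Expression → Query Optimization → Execution Plan → Code Generation → Runtime Execution → Result. The figure also shows the System Catalog, the Database Statistics, and the Database as inputs to these phases.]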

Query Optimization: A Simple Example

S
S#   Sname  Status  City
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Get the names of suppliers who supply part P2:

SELECT DISTINCT S.Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

Query tree:
Π_{Sname}
  σ_{S.S# = SP.S# AND SP.P# = 'P2'}
    ×
      S
      SP

S#   Sname  Status  City    S#   P#   QTY
S1   Smith  20      London  S1   P1   300
S1   Smith  20      London  S1   P2   200
S1   Smith  20      London  S1   P3   400
S1   Smith  20      London  S1   P4   200
S1   Smith  20      London  S1   P5   100
S1   Smith  20      London  S1   P6   100
S2   Jones  10      Paris   S2   P1   300
S2   Jones  10      Paris   S2   P2   400
...

A ⋈_{S.S# = SP.S#} B

Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query tree:
Π_{Sname}
  ⋈_{S.S# = SP.S#}
    S
    σ_{SP.P# = 'P2'}
      SP

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing a restriction and then a join instead of doing a product and then a restriction, sounds like a good heuristic.
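
The gap between the two strategies can be made concrete with synthetic data of the stated sizes (100 suppliers, 10,000 shipments, 50 of them for part P2). Everything else in the sketch below is an assumption for illustration: the generated names, the way parts are assigned, and the use of a dictionary lookup for the join. It counts intermediate tuples only and does not try to reproduce the 3,000,000 figure quoted above.

```python
def make_data():
    """Build synthetic S and SP relations with the cardinalities from the example."""
    S = [{"S#": f"S{i}", "Sname": f"Name{i}"} for i in range(1, 101)]
    SP = []
    for k in range(10_000):
        part = "P2" if k < 50 else f"P{3 + k % 200}"   # exactly 50 tuples for P2
        SP.append({"S#": f"S{k % 100 + 1}", "P#": part, "QTY": 100})
    return S, SP


def product_then_restrict(S, SP):
    """Unoptimized plan: form every (s, sp) pair, then apply the WHERE clause."""
    product_size = 0
    names = set()
    for s in S:
        for sp in SP:
            product_size += 1                          # one tuple of S x SP
            if s["S#"] == sp["S#"] and sp["P#"] == "P2":
                names.add(s["Sname"])
    return names, product_size


def restrict_then_join(S, SP):
    """Optimized plan: restrict SP to part P2 first, then join with S."""
    restricted = [sp for sp in SP if sp["P#"] == "P2"]  # reads all of SP once
    by_supplier = {s["S#"]: s for s in S}               # reads all of S once
    names = {by_supplier[sp["S#"]]["Sname"]
             for sp in restricted if sp["S#"] in by_supplier}
    return names, len(restricted)


S, SP = make_data()
```

Both plans return the same supplier names; the first builds a 1,000,000-tuple intermediate result, while the second never holds more than the 50 restricted tuples.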

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname} (σ_{P# = 'P2'} (S ⋈_{S.S# = SP.S#} SP))

Query tree:
Result
  Project (Sname)
    Restrict (SP.P# = 'P2')
      Join (S.S# = SP.S#)
        S
        SP

Optimization Process
Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join, because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
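
The equivalence of the two WHERE conditions is an instance of the distributive law, and it can be checked exhaustively over all truth assignments. The helper names below are chosen for this sketch.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)


def equivalent(f, g, n_vars):
    """True if f and g agree on every assignment of n_vars boolean variables."""
    return all(f(*values) == g(*values)
               for values in product([False, True], repeat=n_vars))
```

Calling equivalent(original, conjunctive_normal_form, 3) compares the two forms on all eight assignments of p, q, and r.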

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process
Finally, consider the following query:
Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

Π_{L1}(Π_{L2}(... (Π_{Ln}(E)) ...)) = Π_{L1}(E)

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
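
Rules 1, 2, and 4 can be spot-checked on small sample relations. The sketch below assumes relations are lists of dictionaries with distinct attribute names, and the helper functions (select, cartesian, theta_join, as_set) are written for this example only. A check on one data set illustrates the rules; it does not prove them.

```python
from itertools import product


def select(pred, rel):
    return [t for t in rel if pred(t)]


def cartesian(e1, e2):
    return [{**a, **b} for a, b in product(e1, e2)]


def theta_join(pred, e1, e2):
    """Nested-loop theta join: keep the concatenated pairs that satisfy pred."""
    return [{**a, **b} for a in e1 for b in e2 if pred({**a, **b})]


def as_set(rel):
    """Order-independent form of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}


E1 = [{"a": 1, "x": 5}, {"a": 2, "x": 7}, {"a": 3, "x": 9}]
E2 = [{"b": 1, "y": 0}, {"b": 3, "y": 4}]
E = cartesian(E1, E2)


def theta1(t):
    return t["x"] > 5


def theta2(t):
    return t["a"] == t["b"]


def both(t):
    return theta1(t) and theta2(t)


rule1 = as_set(select(both, E)) == as_set(select(theta1, select(theta2, E)))
rule2 = as_set(select(theta1, select(theta2, E))) == as_set(select(theta2, select(theta1, E)))
rule4 = as_set(select(theta2, E)) == as_set(theta_join(theta2, E1, E2))
rule4_ext = as_set(select(theta1, theta_join(theta2, E1, E2))) == as_set(theta_join(both, E1, E2))
```

On this data all four comparisons hold, and the only tuple that survives both conditions pairs a = 3 with b = 3.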

Optimization Process
Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

The selectivity on an attribute with i distinct values, assuming the values are uniformly distributed, is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence, the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
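
The estimate can be compared with the true selectivity on a small synthetic relation. The sketch assumes a hypothetical EMPLOYEE relation of 5000 tuples whose Dno attribute takes 50 distinct values, each equally often, so the uniform-distribution estimate is exact here; on skewed data the two numbers would differ.

```python
import math


def estimated_selectivity(distinct_values):
    """Uniform-distribution estimate for an equality condition: 1 / i."""
    return 1 / distinct_values


def estimated_result_size(cardinality, distinct_values):
    """Estimated number of tuples satisfying the equality condition."""
    return cardinality * estimated_selectivity(distinct_values)


def actual_selectivity(relation, attr, value):
    """Fraction of tuples that really satisfy attr = value."""
    matches = sum(1 for t in relation if t[attr] == value)
    return matches / len(relation)


# Hypothetical relation: 5000 tuples, Dno uniformly spread over 50 values.
employee = [{"Ssn": n, "Dno": n % 50} for n in range(5000)]
```

For this relation the estimated selectivity of Dno = 5 is 1/50 = 0.02, giving an estimated result size of 100 tuples.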

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2 (in general)

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over the set union operation.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

Note: in general, projection does not distribute over set intersection or set difference. The corresponding equalities for ∩ and − can fail when two different tuples agree on the attributes in L.
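
A small check suggests that the projection identity is safe for union but can fail for intersection and difference when distinct tuples share the projected attributes. The sketch represents relations as Python sets of tuples and projects by position; both choices, and the sample data, are assumptions of this example.

```python
def project(relation, positions):
    """Projection with duplicate elimination (the result is a set)."""
    return {tuple(t[i] for i in positions) for t in relation}


# Two relations whose tuples agree on the first attribute only.
E1 = {(1, "a")}
E2 = {(1, "b")}
L = (0,)

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_holds = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_holds = project(E1 - E2, L) == project(E1, L) - project(E2, L)
```

Here E1 ∩ E2 is empty, so its projection is empty, while the two projections share the value 1. Likewise E1 − E2 still contains (1, "a"), but subtracting the projections leaves nothing.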

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
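To see where br × bs + br comes from, the block nested loop can be simulated in Python with an explicit block-read counter. This is a sketch under simplifying assumptions: a "block" is a fixed-size slice of an in-memory list, and `blocks`, `block_nested_loop_join`, and `block_size` are illustrative names.

```python
def blocks(relation, block_size):
    """Yield the relation one block (slice of block_size tuples) at a time."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]

def block_nested_loop_join(r, s, theta, block_size):
    """Return the joined tuples and the number of simulated block reads."""
    result, block_reads = [], 0
    for block_r in blocks(r, block_size):
        block_reads += 1                # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1            # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads
```

For example, with 6 tuples in r, 4 tuples in s, and 2 tuples per block, br = 3 and bs = 2, so the counter ends at 3 × 2 + 3 = 9.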

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): To compute r ⋈_{A=B} s when an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
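A small Python sketch of this index-based join follows. The dictionary `index_on_b` stands in for an index or hash key on attribute B of s; in a real system that access structure would already exist rather than be built per query. The function name and the positional attribute arguments `a` and `b` are assumptions made for illustration.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b], probing an index on s."""
    index_on_b = defaultdict(list)      # stand-in for an existing index on s.B
    for ts in s:
        index_on_b[ts[b]].append(ts)
    result = []
    for tr in r:                        # retrieve each record of r one at a time
        for ts in index_on_b.get(tr[a], []):   # probe the access structure
            result.append(tr + ts)
    return result
```

Each record of r triggers one probe instead of a full scan of s, which is why this technique only applies to equality conditions on the indexed attribute.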

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
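The merge step can be sketched in Python as follows. The sketch assumes both inputs are lists of tuples already sorted on their join attribute (positions `key_r` and `key_s`, illustrative names) and that the group of s tuples sharing one join value fits in memory, as stated above.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples that are already sorted on the join attribute."""
    result = []
    i = j = 0                           # one pointer per relation
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and r[i][key_r] == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```

Both pointers only ever move forward, which is what gives the single linear pass over each relation.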

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
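A minimal in-memory sketch of the hash join in Python is shown below. Python's built-in `hash` modulo a bucket count plays the role of the shared hashing function, and `n_buckets` is an arbitrary illustrative parameter; a disk-based implementation would write the buckets to a hash file instead.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n_buckets=8):
    """Equi-join r and s by hashing both into the same set of buckets."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (tuples from r, tuples from s)
    for tr in r:                               # single pass over r
        buckets[hash(tr[key_r]) % n_buckets][0].append(tr)
    for ts in s:                               # single pass over s
        buckets[hash(ts[key_s]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():    # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[key_r] == ts[key_s]:     # different values can share a bucket
                    result.append(tr + ts)
    return result
```

The equality test inside each bucket is still needed because two different join values may hash to the same bucket.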

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
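Both duplicate-elimination strategies are easy to sketch in Python. Here rows are dictionaries, `project` keeps the listed attributes, and the two helper names are illustrative; a Python `set` stands in for the hash table.

```python
from itertools import groupby

def project(rows, attrs):
    """Keep only the listed attributes; duplicates may appear."""
    return [tuple(row[a] for a in attrs) for row in rows]

def dedupe_by_sorting(tuples):
    """Sort, then keep one tuple from each run of equal neighbours."""
    return [key for key, _ in groupby(sorted(tuples))]

def dedupe_by_hashing(tuples):
    """Keep a tuple only the first time its hash-table entry is seen."""
    seen, out = set(), []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out
```

Sorting returns the result in attribute order, while hashing preserves the order of first appearance; both yield the same set of tuples.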

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
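The sort-then-scan approach can be sketched in Python as a single merge pass that produces union, intersection, and difference at once. The function name is illustrative, and the sketch assumes each input is duplicate-free, as a relation should be.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) for two duplicate-free lists of tuples."""
    r, s = sorted(r), sorted(s)
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):    # one linear scan through each relation
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j])
            j += 1
        else:                           # tuple in both
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])                 # leftovers of r belong to union and difference
    diff.extend(r[i:])
    union.extend(s[j:])                 # leftovers of s belong to the union only
    return union, inter, diff
```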

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

The corresponding query tree is:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5,000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

The corresponding query tree is:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee


Query Optimization: Running Example

Initial query tree:

Π_{Lname}
  σ_{Pname = 'Aquarius' ∧ Pnumber = Pno ∧ Essn = Ssn ∧ Bdate > '1957-12-31'}
    ×
      ×
        Employee
        Works_on
      Project

Query tree with the select operations applied as early as possible:

Π_{Lname}
  σ_{Pnumber = Pno}
    ×
      σ_{Essn = Ssn}
        ×
          σ_{Bdate > '1957-12-31'}
            Employee
          Works_on
      σ_{Pname = 'Aquarius'}
        Project

On closer observation, one should realize that just one tuple from the Project relation will be involved in the query. So it makes sense to switch the order of operations on the input relations:

Π_{Lname}
  σ_{Essn = Ssn}
    ×
      σ_{Pnumber = Pno}
        ×
          σ_{Pname = 'Aquarius'}
            Project
          Works_on
      σ_{Bdate > '1957-12-31'}
        Employee

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation:

Π_{Lname}
  ⋈_{Essn = Ssn}
    ⋈_{Pnumber = Pno}
      σ_{Pname = 'Aquarius'}
        Project
      Works_on
    σ_{Bdate > '1957-12-31'}
      Employee

It also makes sense to reduce the size of the intermediate results by keeping just the attributes that are needed for correct execution of this query:

Π_{Lname}
  ⋈_{Essn = Ssn}
    Π_{Essn}
      ⋈_{Pnumber = Pno}
        Π_{Pnumber}
          σ_{Pname = 'Aquarius'}
            Project
        Π_{Essn, Pno}
          Works_on
    Π_{Ssn, Lname}
      σ_{Bdate > '1957-12-31'}
        Employee

[Figure: phases of query processing. Components shown: Query, Query Decomposition (System Catalog), Relational Algebra Expression, Query Optimization (Database Statistics), Execution Plan, Code Generation, Runtime Execution (Database), Result]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Get names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

Π_{Sname}
  σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}
    ×
      S
      SP

A ⋈_{S.S# = SP.S#} B

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example

Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 100 × 10,000 = 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Π_{Sname}
  ⋈_{S.S# = SP.S#}
    S
    σ_{SP.P# = 'P2'}
      SP

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
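The difference between the two strategies can be reproduced on synthetic data in Python. The sketch below builds relations with the cardinalities used in the example (100 suppliers, 10,000 shipments, 50 of them for part P2); the generated names and part numbers are made up, and the counters only track how many tuples each plan produces before the final projection, not real disk I/O.

```python
# Synthetic data matching the stated cardinalities (values are invented).
S = [(f"S{k}", f"Name{k}") for k in range(1, 101)]
SP = [(f"S{i % 100 + 1}", "P2" if i < 50 else f"P{3 + i // 100}")
      for i in range(10_000)]

def plan_product_then_restrict(S, SP):
    """Form every (s, sp) pair, then apply the WHERE clause to each pair."""
    produced, names = 0, set()
    for s in S:
        for sp in SP:
            produced += 1                         # one tuple of the Cartesian product
            if s[0] == sp[0] and sp[1] == "P2":
                names.add(s[1])
    return names, produced

def plan_restrict_then_join(S, SP):
    """Restrict SP first, then join the small result with S."""
    restricted = [sp for sp in SP if sp[1] == "P2"]    # one scan of SP
    wanted = {sp[0] for sp in restricted}
    names = {s[1] for s in S if s[0] in wanted}        # one scan of S
    return names, len(restricted)

names_1, product_size = plan_product_then_restrict(S, SP)
names_2, restricted_size = plan_restrict_then_join(S, SP)
```

Both plans return the same supplier names, but the first materializes 1,000,000 intermediate tuples while the second carries only 50 into the join.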

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Result
  Project (Sname)
    Restrict (SP.P# = 'P2')
      Join (S.S# = SP.S#)
        S
        SP

Optimization Process

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join, because:
It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where the conjuncts can be evaluated in parallel.
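The equivalence of the two WHERE forms is the distributive law of OR over AND, and it can be checked exhaustively in Python over all eight truth assignments. The function names are illustrative.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)

# Compare both forms on every combination of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)

def evaluate_cnf(conjuncts):
    """A CNF condition is false as soon as any one conjunct is false."""
    return all(conjuncts)
```

Because `all` stops at the first false conjunct, the second helper mirrors the early-exit behaviour that makes conjunctive normal form attractive.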

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one while satisfying the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities as part of the statistical data to aid query optimization.

For a search on equality on a key attribute, the selectivity is

s = 1 / |r(R)|

The selectivity on an attribute with i distinct values is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
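These estimates are simple enough to compute directly. The Python sketch below uses exact fractions; the function names are illustrative, and the figure of 200 distinct values in the usage example is a made-up statistic, not one taken from the slides.

```python
from fractions import Fraction

def selectivity_key_equality(cardinality):
    """Equality on a key attribute: at most one of |r(R)| tuples matches."""
    return Fraction(1, cardinality)

def selectivity_equality(distinct_values):
    """Equality on an attribute with i distinct values, assumed evenly spread."""
    return Fraction(1, distinct_values)

def estimated_result_size(cardinality, selectivity):
    """Expected number of tuples that satisfy the condition."""
    return selectivity * cardinality

# Hypothetical statistics: 10,000 tuples, 200 distinct values of the attribute.
size_non_key = estimated_result_size(10_000, selectivity_equality(200))
size_key = estimated_result_size(10_000, selectivity_key_equality(10_000))
```

Under these assumed statistics the equality search is expected to return 50 tuples on the non-key attribute and 1 tuple on the key.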

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

σ_{θ0}                    ⋈_{θ}
  ⋈_{θ}           =         σ_{θ0}
    E1                        E1
    E2                      E2

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: The set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Rule 13: The projection operation distributes over the set union operation.

Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)

Note: projection does not, in general, distribute over set intersection or set difference. For example, with E1 = {(1, a)} and E2 = {(1, b)}, projecting on the first attribute gives Π_{L}(E1 ∩ E2) = ∅, whereas Π_{L}(E1) ∩ Π_{L}(E2) = {(1)}.
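Equivalences of this kind can be spot-checked on small relations using Python sets. This is a sanity check on one example, not a proof; `select`, `project`, the sample relations, and the predicate `p` are all illustrative.

```python
def select(p, e):
    """σ_p(E): keep the tuples of e that satisfy predicate p."""
    return {t for t in e if p(t)}

def project(e, positions):
    """Π_L(E): keep the listed attribute positions, with set semantics."""
    return {tuple(t[i] for i in positions) for t in e}

E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "b"), (2, "b"), (4, "d")}
p = lambda t: t[0] >= 2

# Selection distributes over difference, union, and intersection.
selection_distributes = all([
    select(p, E1 - E2) == select(p, E1) - select(p, E2),
    select(p, E1 | E2) == select(p, E1) | select(p, E2),
    select(p, E1 & E2) == select(p, E1) & select(p, E2),
    select(p, E1 - E2) == select(p, E1) - E2,
    select(p, E1 & E2) == select(p, E1) & E2,
])
# Selecting on only one operand is not valid for union.
one_sided_union_differs = select(p, E1 | E2) != select(p, E1) | E2

# Projection distributes over union, but this example breaks it for intersection.
projection_over_union = project(E1 | E2, [0]) == project(E1, [0]) | project(E2, [0])
projection_over_intersection = project(E1 & E2, [0]) == project(E1, [0]) & project(E2, [0])
```

On this data the last comparison is false: the tuples (1, "a") and (1, "b") differ, so they are not in the intersection, yet they agree on the projected attribute.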

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost),
the physical clustering of records (to reduce I/O cost), …
come into play.

So, in short, after scanning and parsing:
The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: Generation of an intermediate result (relation) for each operation.
Pipeline evaluation: Combining several operations so that the result of one is passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
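Python generators give a compact way to contrast the two evaluation styles. In this sketch the relations are lists of dictionaries joined on a shared attribute named `id`; that attribute name and the sample data are assumptions made for illustration.

```python
def join(r, s):
    """Natural join on the shared attribute 'id', yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["id"] == ts["id"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"id": 1, "a1": "x"}, {"id": 2, "a1": "y"}]
s = [{"id": 1, "a2": 10}, {"id": 2, "a2": 20}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both styles produce the same answer; the pipelined version never holds the full intermediate relation in memory.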

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.

In this case, articulate the way the previous query is going to be executed.

Π_{Sname}  (on the fly)
  σ_{bid = 100 ∧ rating > 5}  (on the fly)
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}  (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
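The two search methods can be compared with a short Python sketch. Records are tuples whose first field is the key; `linear_search` works on any file, while `binary_search` assumes the list is ordered on that key and uses the standard-library `bisect_left`. The function names and sample records are illustrative.

```python
from bisect import bisect_left

def linear_search(records, value):
    """Test every record in turn; works on unordered files."""
    comparisons = 0
    for rec in records:
        comparisons += 1
        if rec[0] == value:
            return rec, comparisons
    return None, comparisons

def binary_search(sorted_records, value):
    """Equality search on the ordering key of a sorted file."""
    keys = [rec[0] for rec in sorted_records]
    pos = bisect_left(keys, value)          # first position with key >= value
    if pos < len(keys) and keys[pos] == value:
        return sorted_records[pos]
    return None

employees = [(101, "Smith"), (205, "Jones"), (317, "Blake"), (428, "Clark")]
```

On a file of n records ordered on the key, binary search needs on the order of log2(n) probes, whereas a linear search may have to examine all n records.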

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to

each relation As the algorithm proceeds the pointers movethrough the relations

Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is

bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory

Database Systems

102

103

Query Optimization mdash JOIN Operation

hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation

Database Systems

Query Optimization mdash Complex JOIN Operation

Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e

conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections

Database Systems

104

Query Optimization mdash Complex JOIN Operation

Consider the following join operation

One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing

one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions

Database Systems

105

r θ1andθ2and hellip andθn s

r θ1 s

Query Optimization mdash Complex JOIN OperationNow consider the following join operation

The join can be performed as the union of the tuples inindividual joins

Database Systems

106

r θ1orθ2or hellip orθn s

r θi s

107

Query Optimization mdash Project Operation

A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we

may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique

Database Systems

108

Query Optimization mdash Set Operations

Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by

sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement

Union intersection and difference operations

Database Systems

QuestionsDevise algorithms to perform variation of outer

join operationsDevise algorithms to perform aggregate

operations

Database Systems

109

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

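Query Optimization: An Example

[Figure: query tree for the initial translation. Root $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$; below it $\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}$; below it two Cartesian products over the leaves Project, Department, and Employee.]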

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
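The arithmetic behind these figures can be checked directly. The dictionaries below simply restate the sizes and cardinalities given above; the variable names are illustrative.

```python
sizes = {"Project": 100, "Department": 50, "Employee": 150}     # bytes per tuple
counts = {"Project": 100, "Department": 20, "Employee": 5000}   # tuples per relation

# The Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add.
product_tuples = 1
for n in counts.values():
    product_tuples *= n
tuple_bytes = sum(sizes.values())

print(product_tuples)                 # 10000000
print(tuple_bytes)                    # 300
print(product_tuples * tuple_bytes)   # 3000000000 bytes for the intermediate relation
```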

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

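Query Optimization: An Example

[Figure: query tree for the improved translation. Root $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$; joins on $MGRSSN = SSN$ and $Dnum = Dnumber$; selection $\sigma_{Plocation = \text{'California'}}$ on Project; leaves Project, Department, and Employee.]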



Query Optimization: Running Example

On closer observation, one should realize that just one tuple from Project is involved in the query. So it makes sense to switch the order of operations on the input relations.

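Query Optimization: Running Example

[Figure: query tree after reordering the input relations. Root $\Pi_{Lname}$; selections $\sigma_{Essn = Ssn}$, $\sigma_{Pnumber = Pno}$, $\sigma_{Pname = \text{'Aquarius'}}$ (on Project), and $\sigma_{Bdate > \text{'1957-12-31'}}$ (on Employee); two Cartesian products over the leaves Project, Works_on, and Employee.]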

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

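Query Optimization: Running Example

[Figure: query tree with each Cartesian product and its following selection replaced by a join. Root $\Pi_{Lname}$; joins on $Pnumber = Pno$ and $Essn = Ssn$; selections $\sigma_{Pname = \text{'Aquarius'}}$ and $\sigma_{Bdate > \text{'1957-12-31'}}$; leaves Project, Works_on, and Employee.]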

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.

Query Optimization: Running Example

[Figure: query tree after pushing projections down. Root $\Pi_{Lname}$; joins on $Pnumber = Pno$ and $Essn = Ssn$; selections $\sigma_{Pname = \text{'Aquarius'}}$ on Project and $\sigma_{Bdate > \text{'1957-12-31'}}$ on Employee; intermediate projections $\Pi_{Pnumber}$, $\Pi_{Essn, Pno}$, $\Pi_{Essn}$, and $\Pi_{Ssn, Lname}$.]

[Figure: the initial query tree for comparison. Root $\Pi_{Lname}$; below it $\sigma_{Pname = \text{'Aquarius'} \wedge Pnumber = Pno \wedge Essn = Ssn \wedge Bdate > \text{'1957-12-31'}}$; below it two Cartesian products over Employee, Works_on, and Project.]

[Figure: phases of query processing. Boxes: Query, Query Decomposition, Relational Algebra Expression, Query Optimization, Execution Plan, Code Generation, Runtime Execution, Result. Supporting inputs: System Catalog, Database Statistics, Database.]

Query Optimization: A Simple Example

Relation S:

S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Relation SP:

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example

Get the names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

[Figure: query tree for $\Pi_{Sname}(\sigma_{S.S\# = SP.S\# \wedge SP.P\# = \text{'P2'}}(S \times SP))$.]

Query Optimization: A Simple Example

$A \bowtie_{S.S\# = SP.S\#} B$:

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

Query Optimization: A Simple Example

Without an optimizer, the system will:

Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.

Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.

Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand, will:

Restrict SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.

Join the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.

Project the result of the last operation over Sname.

Query Optimization: A Simple Example

[Figure: query tree for $\Pi_{Sname}(S \bowtie_{S.S\# = SP.S\#} \sigma_{SP.P\# = \text{'P2'}}(SP))$.]

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
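The difference between the two strategies can be made concrete with a small simulation. This is only a sketch: the generated supplier and shipment rows are invented to match the stated cardinalities (100 suppliers, 10,000 shipments, 50 of them for part P2), and the first function counts the pairs of the Cartesian product rather than storing them.

```python
def product_then_restrict(S, SP, part):
    """Enumerate the full Cartesian product, testing the WHERE clause on every pair."""
    names, product_size = set(), 0
    for s in S:
        for sp in SP:
            product_size += 1
            if s["S#"] == sp["S#"] and sp["P#"] == part:
                names.add(s["Sname"])
    return names, product_size


def restrict_then_join(S, SP, part):
    """Restrict SP first, then join the small intermediate result with S."""
    restricted = [sp for sp in SP if sp["P#"] == part]     # one scan of SP
    names = {s["Sname"] for sp in restricted for s in S if s["S#"] == sp["S#"]}
    return names, len(restricted)


# Hypothetical data with the cardinalities assumed in the example.
S = [{"S#": f"S{i}", "Sname": f"name{i}"} for i in range(100)]
SP = [{"S#": f"S{i % 100}", "P#": "P2" if i < 50 else f"P{3 + i % 7}"}
      for i in range(10_000)]

names_a, size_a = product_then_restrict(S, SP, "P2")
names_b, size_b = restrict_then_join(S, SP, "P2")
print(names_a == names_b, size_a, size_b)
```

Both plans return the same supplier names, but the intermediate result shrinks from 1,000,000 pairs to 50 tuples.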

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{P\# = \text{'P2'}}(S \bowtie_{S.S\# = SP.S\#} SP))$

Optimization Process

[Figure: query tree, from the leaves to the root: S and SP, Join (S.S# = SP.S#), Restrict (SP.P# = 'P2'), Project (Sname), Result.]

Optimization Process

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join, because:

It reduces the size of the input to the join operation.
It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
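The equivalence behind this conversion is the distributive law of OR over AND, which can be confirmed with an exhaustive truth table. The function names below are illustrative only.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    return (p or q) and (p or r)


# Exhaustive truth-table check over all 8 assignments of p, q, r.
equivalent = all(original(*v) == cnf(*v) for v in product([False, True], repeat=3))
print(equivalent)
```

In the conjunctive form, the two conjuncts `(p or q)` and `(p or r)` are independent tests, so either one evaluating to false settles the whole condition.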

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be combined into a single projection.

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to:

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform expressions while satisfying the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: Selection operations are commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to:

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
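Rules 1, 2, and 4 can be spot-checked on small relations. The sketch below is not a proof, only a sanity check on invented data; the helper names `select`, `product`, and `theta_join` and the sample predicates are assumptions.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def product(e1, e2):
    return [t1 + t2 for t1 in e1 for t2 in e2]


def theta_join(e1, e2, theta):
    return [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1 + t2)]


E1 = [(1, "a"), (2, "b"), (3, "c")]   # hypothetical tuples
E2 = [(1, 10), (3, 30), (3, 40)]

theta1 = lambda t: t[0] == t[2]       # condition on the concatenated tuple
theta2 = lambda t: t[3] > 10

# Rule 4: a selection over a Cartesian product equals a theta join.
rule4 = select(theta1, product(E1, E2)) == theta_join(E1, E2, theta1)
# Rules 1 and 2: a conjunctive selection equals a cascade, in either order.
both = select(lambda t: theta1(t) and theta2(t), product(E1, E2))
rule1 = both == select(theta1, select(theta2, product(E1, E2)))
rule2 = both == select(theta2, select(theta1, product(E1, E2)))
print(rule4, rule1, rule2)
```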

Optimization Process

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

[Figure: the two equivalent query trees for Rule 5.]

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

[Figure: the two equivalent query trees for Rule 6.]

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities are used as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

For an attribute with $i$ distinct values, the selectivity is

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence, the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
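These estimates translate into a few one-line helpers. The function names and the sample cardinalities are hypothetical, and the distinct-value estimate assumes that values are uniformly distributed over the tuples, which the formula above implicitly requires.

```python
def selectivity_key_equality(cardinality):
    """Equality on a key attribute matches at most one tuple: s = 1 / |r(R)|."""
    return 1 / cardinality


def selectivity_distinct_values(distinct_values):
    """Equality on an attribute with i distinct values (uniform assumption): s = 1 / i."""
    return 1 / distinct_values


def estimated_matches(cardinality, distinct_values):
    """Expected number of tuples satisfying the equality search: |r(R)| / i."""
    return cardinality / distinct_values


# Hypothetical relation: 10,000 tuples and an attribute with 50 distinct values.
print(selectivity_distinct_values(50))
print(estimated_matches(10_000, 50))
```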

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition $\theta_0$ involve only the attributes of one of the relations being joined ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Figure: the two equivalent query trees for Rule 8, with $\sigma_{\theta_0}$ above the join in one tree and applied directly to $E_1$ in the other.]

Optimization Process

Rule 9: The projection operation distributes over the theta join when the join condition $\theta$ involves only attributes in $L_1 \cup L_2$.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Rule 10: The set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:

$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

Set intersection:

$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

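Optimization Process

Rule 13: The projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

Note: in general, projection does not distribute over set intersection or set difference. For example, if $E_1 = \{(1, a)\}$, $E_2 = \{(1, b)\}$, and $L$ is the first attribute, then $\Pi_L(E_1 \cap E_2) = \emptyset$ but $\Pi_L(E_1) \cap \Pi_L(E_2) = \{(1)\}$.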

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …

come into play.

Optimization Process

So, in short, after scanning and parsing:

The query is translated into an equivalent internal representation, in the form of a query tree or query graph.

An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:

Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: Heuristic Rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: each operation generates its result as an intermediate relation, which is stored and then read by the next operation.

Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
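Python generators give a convenient way to sketch the difference between the two evaluation styles. Everything here is illustrative: the relations, the join condition `theta`, and the function names are assumptions, and a real engine would pipeline blocks or iterators over pages rather than Python dictionaries.

```python
def join(r, s, theta):
    """Yield joined tuples one at a time instead of building the whole result."""
    for tr in r:
        for ts in s:
            if theta(tr, ts):
                yield {**tr, **ts}


def project(rows, attrs):
    """Consume tuples as they arrive and yield each projected tuple immediately."""
    for row in rows:
        yield tuple(row[a] for a in attrs)


r = [{"a1": 1, "k": 7}, {"a1": 2, "k": 8}]          # hypothetical data
s = [{"a2": "x", "k2": 7}, {"a2": "y", "k2": 8}]
theta = lambda tr, ts: tr["k"] == ts["k2"]

# Materialized evaluation: the join result is stored in full before projecting.
materialized = list(join(r, s, theta))
result_m = list(project(materialized, ["a1", "a2"]))

# Pipelined evaluation: project pulls tuples from join one by one.
result_p = list(project(join(r, s, theta), ["a1", "a2"]))
print(result_m == result_p, result_p)
```

Both styles produce the same answer; the pipelined version never holds the full join result in memory.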

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

[Figure: query tree for the expression above.]

$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

[Figure: query tree for the expression above, with the selections pushed below the join.]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above annotated for pipelined execution. The first tree carries two "On the fly" annotations and the second tree carries one.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection

General philosophy: make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
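The two search methods can be contrasted in a short sketch. The `employees` file, its ordering on `ssn`, and the function names are assumptions; the standard-library `bisect_left` stands in for the binary search over the ordered file.

```python
from bisect import bisect_left


def linear_search(records, key_of, value):
    """Scan every record and test the selection condition."""
    return [rec for rec in records if key_of(rec) == value]


def binary_search(sorted_records, keys, value):
    """Equality on the key the file is ordered by; `keys` is the sorted key column."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return sorted_records[i]
    return None


# Hypothetical file ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in (101, 205, 317, 428, 590)]
keys = [e["ssn"] for e in employees]
print(binary_search(employees, keys, 317))
print(linear_search(employees, lambda e: e["ssn"], 317))
```

The linear scan inspects all records, while the binary search needs only a logarithmic number of probes because the file is ordered on the search key.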

Optimization Process: Search Methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search Methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search Methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
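A minimal sketch of this technique follows, with in-memory dictionaries standing in for indexes and list positions standing in for record pointers. The `employees` rows, the attribute names, and both function names are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record pointers (list positions here)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [rec for rec in records if any(rec[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of record pointers
    return [records[rid] for rid in sorted(rids)]


employees = [  # hypothetical rows
    {"ssn": 1, "dno": 5, "city": "Rolla"}, {"ssn": 2, "dno": 4, "city": "Paris"},
    {"ssn": 3, "dno": 5, "city": "Paris"}, {"ssn": 4, "dno": 1, "city": "Rome"},
]
indexes = {"dno": build_index(employees, "dno"), "city": build_index(employees, "city")}
hits = disjunctive_select(employees, indexes, [("dno", 5), ("city", "Paris")])
print([e["ssn"] for e in hits])
```

Record 3 satisfies both conditions but appears once, because the pointers are combined with a set union before any record is fetched.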

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (Nested Loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume that $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (Nested Loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end
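The pseudocode above maps almost line for line onto Python. In this sketch, tuples are Python tuples, `tr + ts` plays the role of the concatenation tr || ts, and the join condition is passed in as a function; the sample relations are invented.

```python
def nested_loop_join(r, s, theta):
    """Return the concatenation tr || ts of every pair that satisfies theta."""
    result = []
    for tr in r:          # outer loop
        for ts in s:      # inner loop: s is scanned once per tuple of r
            if theta(tr, ts):
                result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b")]               # hypothetical tuples
s = [(1, "x"), (1, "y"), (3, "z")]
joined = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
print(joined)
```

Because the condition is an arbitrary function, the same loop handles equality, inequality, or any other comparison Θ.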

Query Optimization: JOIN Operation (Nested Loop)

The cost of the nested loop algorithm is $n_r \times n_s$ tuple comparisons.

In the best-case scenario, both relations fit into main memory, and hence we need $b_s + b_r$ block accesses.

If only one of the relations fits in main memory and it is used as the inner relation, then the cost is still $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (Block Nested Loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (Block Nested Loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If r.A Θ s.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (Block Nested Loop)

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve block nested loop?
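The block access count can be observed by instrumenting a small implementation. This is a sketch under simplifying assumptions: a "block" is just a slice of a Python list, `block_size` is a made-up parameter, and every block fetch is counted as one access.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, theta, block_size):
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


r = [(i,) for i in range(6)]         # hypothetical: 6 tuples, 3 blocks of 2
s = [(i,) for i in range(0, 8, 2)]   # hypothetical: 4 tuples, 2 blocks of 2
result, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
print(result, reads)
```

With $b_r = 3$ and $b_s = 2$, the counter reaches $3 \times 2 + 3 = 9$, matching the formula above.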

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
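A small sketch of this technique is given below. The in-memory dictionary is only a stand-in for an index or hash key that already exists on attribute B of s, and the function name, attribute positions, and sample tuples are assumptions.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Join r and s on r[a] == s[b] using an access structure on s[b]."""
    index = defaultdict(list)   # stands in for an existing index or hash key on B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                          # one pass over r
        for ts in index.get(tr[a], []):   # fetch only the matching records of s
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (4, "d")]   # hypothetical tuples
s = [(2, "x"), (4, "y"), (4, "z")]
joined = index_nested_loop_join(r, s, 0, 0)
print(joined)
```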

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
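The merge phase can be sketched with two moving positions, one per relation. The function below assumes both inputs are already sorted on the join attribute and that each group of equal join values fits in memory, as stated above; the names and sample data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]   # hypothetical, sorted on position 0
s = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
print(joined)
```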

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result tuples of the join operation.
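The bucket-based matching can be sketched as follows. The number of buckets, the use of Python's built-in `hash`, and the sample tuples are assumptions for illustration; keeping separate bucket lists for r and s is simply a convenient way to tell the two sources apart within a bucket.

```python
def hash_join(r, s, a, b, buckets=4):
    """Hash both relations with the same function, then match within each bucket."""
    r_buckets = [[] for _ in range(buckets)]
    s_buckets = [[] for _ in range(buckets)]
    for tr in r:                                   # single pass over r
        r_buckets[hash(tr[a]) % buckets].append(tr)
    for ts in s:                                   # single pass over s
        s_buckets[hash(ts[b]) % buckets].append(ts)
    result = []
    for rb, sb in zip(r_buckets, s_buckets):
        # Equal join values always land in the same bucket, but a bucket may
        # also mix different values, so the values are compared explicitly.
        for tr in rb:
            for ts in sb:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "f")]     # hypothetical tuples
s = [(2, "x"), (6, "y"), (10, "z")]
joined = sorted(hash_join(r, s, 0, 0))
print(joined)
```

In this example the keys 2, 6, and 10 all fall into the same bucket, which shows why the explicit comparison inside a bucket is still needed.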




Query Optimization: Running Example

Query tree (figure): Π_{Lname}(σ_{Essn=Ssn}((σ_{Pnumber=Pno}(σ_{Pname='Aquarius'}(Project) × Works_on)) × σ_{Bdate>'1957-12-31'}(Employee)))

Query Optimization: Running Example

It also makes sense to replace any Cartesian product followed by a Select operation with a Join operation.

Query tree (figure): Π_{Lname}((σ_{Pname='Aquarius'}(Project) ⋈_{Pnumber=Pno} Works_on) ⋈_{Essn=Ssn} σ_{Bdate>'1957-12-31'}(Employee))

Query Optimization: Running Example

It also makes sense to reduce the size of intermediate results by keeping just the attributes that are needed for correct execution of this query.

Query tree (figure): Π_{Lname}(Π_{Essn}(Π_{Pnumber}(σ_{Pname='Aquarius'}(Project)) ⋈_{Pnumber=Pno} Π_{Essn,Pno}(Works_on)) ⋈_{Essn=Ssn} Π_{Ssn,Lname}(σ_{Bdate>'1957-12-31'}(Employee)))

Initial query tree (figure): Π_{Lname}(σ_{Pname='Aquarius' ∧ Pnumber=Pno ∧ Essn=Ssn ∧ Bdate>'1957-12-31'}((Employee × Works_on) × Project))

Figure: phases of query processing. Query → Query Decomposition (uses the System Catalog) → Relational Algebra Expression → Query Optimization (uses Database Statistics) → Execution Plan → Code Generation → Runtime Execution (against the Database) → Result.

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
...

Query Optimization: A Simple Example

Get names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

Query tree (figure): Π_{Sname}(σ_{S.S#=SP.S# ∧ SP.P#='P2'}(S × SP))

Query Optimization: A Simple Example

S#   Sname   Status   City     S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
...

A ⋈_{S.S#=SP.S#} B

Query Optimization: A Simple Example

Without an optimizer, the system will:
1. Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
2. Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
3. Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand:
1. Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
2. Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
3. Projects the result of the last operation over Sname.

Query tree (figure): Π_{Sname}(σ_{SP.P#='P2'}(SP) ⋈_{S.S#=SP.S#} S)

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
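The two tuple counts above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the breakdown of the 3,000,000 figure (SP scanned once per S tuple, the product written out, then read back) is an assumption about how the estimate was obtained, and the function names are hypothetical.

```python
# Tuple I/O estimate for the two strategies (assumed breakdown, not from the slides).
CARD_S = 100        # tuples in S
CARD_SP = 10_000    # tuples in SP


def naive_plan_io(card_s: int, card_sp: int) -> int:
    """Product, then restriction."""
    product_size = card_s * card_sp
    product_reads = card_s * card_sp   # SP is scanned once per S tuple
    product_writes = product_size      # product does not fit in memory, so it is written out
    restrict_reads = product_size      # the restriction reads the product back
    return product_reads + product_writes + restrict_reads


def optimized_plan_io(card_s: int, card_sp: int) -> int:
    """Restriction, then join."""
    restrict_reads = card_sp           # one scan of SP to find the P2 tuples
    join_reads = card_s                # one scan of S; the P2 tuples stay in memory
    return restrict_reads + join_reads


naive = naive_plan_io(CARD_S, CARD_SP)
optimized = optimized_plan_io(CARD_S, CARD_SP)
print(naive, optimized, round(naive / optimized))
```

Under these assumptions the first plan costs roughly 300 times as many tuple I/Os as the second.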

Optimization Process

Cast the query into some internal representation: convert the query to some internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P#='P2'}(S ⋈_{S.S#=SP.S#} SP))

Optimization Process

Query tree (figure), from leaves to root: S, SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

Optimization Process

General rule: It is a good idea to perform the restriction before the join because:
• It reduces the size of the input to the join operation.
• It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
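As a quick sanity check, the conversion of p OR (q AND r) into (p OR q) AND (p OR r) can be verified over all eight truth assignments, and the "false as soon as any conjunct is false" property can be shown with a short-circuiting evaluator. This is a minimal sketch; the function names and the sample row are hypothetical.

```python
from itertools import product


def original(p: bool, q: bool, r: bool) -> bool:
    return p or (q and r)


def cnf(p: bool, q: bool, r: bool) -> bool:
    return (p or q) and (p or r)


def equivalent() -> bool:
    """Compare both forms on every combination of truth values."""
    return all(original(p, q, r) == cnf(p, q, r)
               for p, q, r in product([False, True], repeat=3))


def eval_cnf(conjuncts, row) -> bool:
    """Evaluate a CNF condition; all() stops at the first false conjunct."""
    return all(conjunct(row) for conjunct in conjuncts)


# Hypothetical condition: status > 10 AND city = 'London'
condition = [lambda t: t["status"] > 10, lambda t: t["city"] == "London"]
print(equivalent())
print(eval_cnf(condition, {"status": 20, "city": "London"}))
print(eval_cnf(condition, {"status": 10, "city": "London"}))
```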

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be combined into a single projection.

Optimization Process

General rule: A restriction and a projection can be interchanged: a restriction applied to a projection can be converted into a projection applied to a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one while satisfying the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1∧θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: A sequence of projections is the same as the last projection operation (cascade of projections).

Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1∧θ2} E2

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2∧θ3} E3 = E1 ⋈_{θ1∧θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

s = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, at most one tuple qualifies, so

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
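The estimates above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the figure of 200 distinct part numbers is an assumed value chosen so that the estimate matches the 50 P2 tuples of the earlier supplier example.

```python
def selectivity_key_equality(cardinality: int) -> float:
    """Equality search on a key attribute: at most one tuple qualifies."""
    return 1 / cardinality


def selectivity_nonkey_equality(distinct_values: int) -> float:
    """Equality search on an attribute with i distinct, uniformly distributed values."""
    return 1 / distinct_values


def estimated_matches(cardinality: int, selectivity: float) -> float:
    """Estimated size of the intermediate relation produced by the selection."""
    return selectivity * cardinality


# SP has 10,000 tuples; 200 distinct part numbers is an assumption for illustration.
print(estimated_matches(10_000, selectivity_nonkey_equality(200)))
# S has 100 tuples; an equality search on its key yields about one tuple.
print(estimated_matches(100, selectivity_key_equality(100)))
```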

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one of the relations (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

Figure: the two equivalent query trees, one with σ_{θ0} above the join of E1 and E2, the other with σ_{θ0} applied to E1 below the join.
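Rule 8 can be checked on a small instance. The following is a minimal sketch with made-up relations E1 and E2 (lists of dictionaries) and hypothetical helper names; it shows that selecting before or after the join gives the same tuples, while the early selection feeds fewer tuples into the join.

```python
def theta_join(e1, e2, theta):
    """Nested-loop theta join of two lists of dict tuples."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2 if theta(t1, t2)]


def select(e, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in e if pred(t)]


# Hypothetical relations for illustration.
E1 = [{"a": 1, "x": 10}, {"a": 2, "x": 20}, {"a": 3, "x": 30}]
E2 = [{"b": 1, "y": "p"}, {"b": 2, "y": "q"}, {"b": 2, "y": "r"}]


def theta(t1, t2):
    return t1["a"] == t2["b"]      # join condition


def theta0(t):
    return t["x"] >= 20            # mentions only attributes of E1


late = select(theta_join(E1, E2, theta), theta0)    # selection after the join
early = theta_join(select(E1, theta0), E2, theta)   # selection pushed below the join

print(late == early)
print(len(E1) * len(E2), len(select(E1, theta0)) * len(E2))  # pairs compared by each plan
```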

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are sets of attributes of E1 and E2, respectively.

Π_{L1∪L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process

Rule 10: Set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process

Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Optimization Process

Rule 13: The projection operation distributes over the set union operation.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

Note: in general, projection does not distribute over set intersection or set difference. For example, with E1 = {(1, a)} and E2 = {(1, b)}, projecting onto the first attribute gives Π_L(E1 ∩ E2) = ∅, whereas Π_L(E1) ∩ Π_L(E2) = {(1)}.

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
• existence of indexes or other access paths (to reduce I/O cost), and
• physical clustering of records (to reduce I/O cost), …

Optimization Process

So, in short, after scanning and parsing:
• the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
• an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

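Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined into a pipeline, so that the result of one operation is passed directly to the next.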

Assume we want to perform

Π_{a1,a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
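The difference between the two evaluation styles can be sketched with Python lists and generators. This is only an illustration: the relations r and s, the join attribute k, and the function names are hypothetical. The materialized version builds the whole join result before projecting, while the pipelined version hands each joined tuple to the projection as soon as it is produced.

```python
def join_materialized(r, s):
    """Natural join on attribute k; builds the whole intermediate relation first."""
    return [{**tr, **ts} for tr in r for ts in s if tr["k"] == ts["k"]]


def join_pipelined(r, s):
    """Same join, but each result tuple is yielded as soon as it is found."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


# Hypothetical relations for illustration.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "u"}, {"k": 2, "a2": "v"}, {"k": 2, "a2": "w"}]

materialized = list(project(join_materialized(r, s), ["a1", "a2"]))
pipelined = list(project(join_pipelined(r, s), ["a1", "a2"]))
print(materialized == pipelined, pipelined)
```

Both produce the same answer; the pipelined form simply never stores the intermediate join result.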

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid=100 ∧ rating>5}(R ⋈_{Sid=Sid} S))

Π_{Sname}((σ_{bid=100}(R)) ⋈_{Sid=Sid} (σ_{rating>5}(S)))

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation. In this case, articulate the way the previous query is going to be executed.

Figure: the two query trees above, with the operators evaluated in the pipeline marked "on the fly" (two such marks on the first tree, one on the second).

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
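The two search methods can be contrasted in a few lines. This is a minimal sketch using the standard `bisect` module; the EMPLOYEE records, the attribute names `ssn` and `dno`, and the function names are hypothetical, and an in-memory sorted list stands in for a file ordered on the key.

```python
from bisect import bisect_left


def linear_search(records, attr, value):
    """Scan every record and test the selection condition."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, sorted_keys, value):
    """Equality search on the ordering key; returns the record or None."""
    i = bisect_left(sorted_keys, value)
    if i < len(sorted_keys) and sorted_keys[i] == value:
        return sorted_records[i]
    return None


# Hypothetical EMPLOYEE file, ordered on the key attribute ssn.
employees = [{"ssn": ssn, "dno": ssn % 5} for ssn in range(100, 200, 3)]
ssn_keys = [e["ssn"] for e in employees]   # built once; stands in for the ordered file

print(binary_search(employees, ssn_keys, 151))   # about log2(n) probes
print(linear_search(employees, "dno", 4))        # n probes, works for any attribute
```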

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN=123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

σ_{DNUMBER>5}(DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO=5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1∧θ2∧…∧θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1∨θ2∨…∨θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
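The union-of-pointers idea can be sketched as follows, with record positions in a list standing in for record pointers and dictionaries standing in for indexes. All names and the sample data are hypothetical, and only equality conditions are handled.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record ids (pointers) holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attribute, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of the retrieved pointers
    return [records[rid] for rid in sorted(rids)]


# Hypothetical EMPLOYEE records for illustration.
records = [
    {"dno": 5, "city": "London"},
    {"dno": 4, "city": "Paris"},
    {"dno": 5, "city": "Paris"},
    {"dno": 1, "city": "Athens"},
]
conditions = [("dno", 5), ("city", "Paris")]

both = {"dno": build_index(records, "dno"), "city": build_index(records, "city")}
only_dno = {"dno": build_index(records, "dno")}

print(disjunctive_select(records, both, conditions))      # uses the two indexes
print(disjunctive_select(records, only_dno, conditions))  # falls back to a scan
```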

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
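A runnable version of the nested loop algorithm might look like the sketch below. It assumes tuples are Python tuples, that the join attributes are given by position, and that the comparison Θ is passed in as a function; these choices and the sample data are illustrative, not part of the original algorithm.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s comparing r[a] with s[b]; a and b are positions."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts: concatenate the two tuples
    return result


# Hypothetical relations for illustration.
r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q"), (2, "r")]

print(nested_loop_join(r, s, 0, 0))                # equi-join
print(nested_loop_join(r, s, 0, 0, operator.lt))   # join on r.A < s.B
```

Because every pair of tuples is examined, the same routine works for any comparison operator.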

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns tuple comparisons.

In the best-case scenario both relations fit in main memory, and hence we need only bs + br block accesses.

If just one of the relations fits in main memory, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
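The br × bs + br figure can be confirmed by counting block reads in a small simulation. In this sketch a "block" is just a slice of a Python list, and the block size, relation contents, and function names are all hypothetical.

```python
def blocks(relation, tuples_per_block):
    """Split a relation (list of tuples) into fixed-size blocks."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]


def block_nested_loop_join(r, s, theta, tuples_per_block):
    """Return the join result together with the number of block reads."""
    r_blocks = blocks(r, tuples_per_block)
    s_blocks = blocks(s, tuples_per_block)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical relations: r has 10 tuples (5 blocks), s has 6 tuples (3 blocks).
r = [(i,) for i in range(10)]
s = [(i % 4,) for i in range(6)]
joined, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)

b_r, b_s = len(blocks(r, 2)), len(blocks(s, 2))
print(reads, b_r * b_s + b_r, len(joined))
```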

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A=B} s

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
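The merge step can be sketched as below for an equi-join of two relations that are already sorted on their join attributes. Tuples are Python tuples, the join attributes are given by position, and the sample data is hypothetical. As in the text, the group of s tuples sharing one join value is assumed to fit in memory.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on positions a and b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


# Hypothetical relations, sorted on the first attribute.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```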

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
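A minimal in-memory sketch of this idea follows. A list of buckets stands in for the hash file, Python's built-in `hash` stands in for the hashing function, and the number of buckets, the positional join attributes, and the sample data are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join of r and s on r[a] = s[b] using a shared set of hash buckets."""
    buckets = [{"r": [], "s": []} for _ in range(n_buckets)]
    for tr in r:                                  # single pass through r
        buckets[hash(tr[a]) % n_buckets]["r"].append(tr)
    for ts in s:                                  # single pass through s
        buckets[hash(ts[b]) % n_buckets]["s"].append(ts)
    result = []
    for bucket in buckets:                        # examine each bucket in turn
        for tr in bucket["r"]:
            for ts in bucket["s"]:
                if tr[a] == ts[b]:                # different values can share a bucket
                    result.append(tr + ts)
    return result


# Hypothetical relations for illustration.
r = [(1, "a"), (2, "b"), (9, "c"), (10, "d")]
s = [(2, "x"), (9, "y"), (9, "z"), (17, "w")]
print(sorted(hash_join(r, s, 0, 0)))
```

Only tuples that land in the same bucket are ever compared, which is where the saving over the nested loop comes from.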

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1∧θ2∧…∧θn} s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1∨θ2∨…∨θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
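Both duplicate-elimination strategies can be sketched briefly. The relation below reuses the supplier tuples from the earlier example; the function names and the use of attribute positions are illustrative assumptions, and a Python set stands in for the hash table.

```python
def project_sort(relation, positions):
    """Project, sort, then drop duplicates, which are adjacent after sorting."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


def project_hash(relation, positions):
    """Project and drop duplicates by remembering rows already seen."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[p] for p in positions)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


S = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

print(project_sort(S, [3]))     # City is not a key, so duplicates must be removed
print(project_hash(S, [3]))
print(project_hash(S, [0, 1]))  # includes the key S#, so no duplicates arise
```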

Query Optimization: Set Operations

• Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.

Questions

• Devise algorithms to perform variations of the outer join operation.
• Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber,Dnum,Lname,Address,Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

Query tree (figure): the projection Π_{Pnumber,Dnum,Lname,Address,Bdate} at the root, above the selection σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}, above the Cartesian products of Project, Department, and Employee.

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
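The size of that intermediate relation follows from multiplying the cardinalities and adding the tuple widths. The short sketch below redoes the arithmetic with the figures given above; the function name is hypothetical.

```python
# name: (tuple size in bytes, number of tuples), taken from the example above
relations = {
    "Project": (100, 100),
    "Department": (50, 20),
    "Employee": (150, 5000),
}


def cartesian_product_size(rels):
    """Return (tuple count, tuple width in bytes, total bytes) of the product."""
    tuples, width = 1, 0
    for size, count in rels.values():
        tuples *= count      # cardinalities multiply
        width += size        # tuple widths add up
    return tuples, width, tuples * width


count, width, total_bytes = cartesian_product_size(relations)
print(count, width, total_bytes)
```

At 300 bytes per tuple, the 10 million tuples amount to about 3 GB of intermediate data.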

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber,Dnum,Lname,Address,Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)

Query Optimization: An Example

Query tree (figure): the projection Π_{Pnumber,Dnum,Lname,Address,Bdate} at the root, above the join with Employee on MGRSSN=SSN, above the join of σ_{Plocation='California'}(Project) with Department on Dnum=Dnumber.


Query Optimization mdash Running Example

It also makes sense to replace any Cartesianproduct followed by a Select operation with aJoin operation

Database Systems

23

Query Optimization mdash Running Example

Database Systems

24

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquoWorks_on

Pnumber = Pno

Essn = Ssn

Query Optimization mdash Running Example

It also makes sense to reduce the size ofintermediate results by keeping just attributesthat are needed for correct execution of thisquery

Database Systems

25

Query Optimization mdash Running Example

Database Systems

26

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

ΠLname

Employee Works_on

times Project

times

σPname = lsquoAquariusrsquo and Pnumber = Pnno and Essn = Ssn and Bdate gt lsquo1957-12-31rsquo

ΠLname

Employee

σBdate gt lsquo1957-12-31rsquo

Project

σPname = lsquoAquariusrsquo

Pnumber = Pno

Essn = Ssn

ΠEssnLnameΠSsn

Works_on

ΠEssnPnoΠPnumber

31

Database SystemsSystem CatalogQuery

Decomposition

Query Optimization Database Statics

Code Generation

Runtime Execution

Result

Database

Relational AlgebraExpression

Execution Plan

Query

28

Query Optimization mdash A Simple Example

S Sname Status CityS1 Smith 20 LondonS2 Jones 10 ParisS3 Blake 30 ParisS4 Clark 20 LondonS5 Adams 30 Athens

SS P QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 bull bull bull

SP

Database Systems

29

Query Optimization mdash A Simple ExampleGet names of suppliers who supply part P2

SELECT DISTINCT SnameFROM S SPWHERE SS = SPSAND SPP = lsquoP2rsquo

Suppose that the cardinality of S and SP are 100and 10000 respectively Furthermore assume50 tuples in SP are for part P2

Database Systems

Query Optimization mdash A Simple Example

Database Systems

30

S SP

times

σ(SS = SPS and SPP = lsquoP2rsquo)

ΠSname

31

Query Optimization mdash A Simple Example

S Sname Status SCity S P QTY S1 Smith 20 London S1 P1 300 S1 Smith 20 London S1 P2 200 S1 Smith 20 London S1 P3 400 S1 Smith 20 London S1 P4 200 S1 Smith 20 London S1 P5 100 S1 Smith 20 London S1 P6 100 S2 Jones 10 Paris S2 P1 300 S2 Jones 10 Paris S2 P2 400 bull bull

A SS=SPS B

Database Systems

32

Query Optimization mdash A Simple ExampleWithout an optimizer the system willGenerates Cartesian product of S and SP This will

generate a relation of size 1000000 tuples mdash Toolarge to be kept in the main memoryRestricts results of previous step as specified by

WHERE clause This means reading 1000000tuples of which 50 will be selectedProjects the result of previous step over Sname to

produce the final result

Database Systems

33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: it is a good idea to perform the restriction before the join, because:
• It reduces the size of the input to the join operation.
• It reduces the size of the output from the join.

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
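The equivalence of the two conditions can be checked exhaustively, since there are only eight truth assignments. The following is a minimal sketch; the function names are illustrative and not part of the slides.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Compare the two conditions on all 8 truth assignments.
mismatches = [
    (p, q, r)
    for p, q, r in product([False, True], repeat=3)
    if original(p, q, r) != cnf(p, q, r)
]
print(mismatches)  # an empty list means the two forms agree everywhere
```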

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be transformed into a single projection.

General rule: a restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.

Rule 1: conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: selection operations are commutative:
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: in a sequence of projections, only the last (outermost) projection is needed (cascade of projections):
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)
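Rules 1 to 3 can be tried out on a toy relation. This is a small sketch under stated assumptions: the relation E, the predicates theta1 and theta2, and the helper functions select and project are all hypothetical, and the Rule 3 check assumes the inner attribute list contains the outer one.

```python
# Hypothetical toy relation; each tuple is a dict (attribute name -> value).
E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 30, "c": "x"},
]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    # Set semantics: duplicates are dropped, order of first appearance is kept.
    seen, out = set(), []
    for t in rel:
        key = tuple((k, t[k]) for k in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(key))
    return out

theta1 = lambda t: t["a"] >= 2
theta2 = lambda t: t["c"] == "x"

# Rule 1: one conjunctive selection vs. a cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: the order of the two selections does not matter.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters.
rule3 = project(["a"], project(["a", "b"], E)) == project(["a"], E)
print(rule1, rule2, rule3)
```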

Rule 4: a combination of selection and Cartesian product operations is equivalent to a theta join operation:
σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2
This can be extended to:
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: the theta join operation is commutative:
E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: natural join is associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: theta join is associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute:

$s = \dfrac{1}{|r(R)|}$

For an attribute with i distinct values:

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
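As a hedged illustration of how these estimates might be used, the sketch below computes the expected number of matching tuples for an equality search. The function name and the sample cardinalities are assumptions made for the example, and the non-key case assumes values are uniformly distributed.

```python
from fractions import Fraction

def estimated_matches(cardinality, distinct_values=None, is_key=False):
    """Estimated number of tuples matching an equality condition."""
    if is_key:
        selectivity = Fraction(1, cardinality)       # at most one tuple matches
    else:
        selectivity = Fraction(1, distinct_values)   # uniform-distribution assumption
    return selectivity * cardinality

on_key = estimated_matches(10_000, is_key=True)
on_non_key = estimated_matches(10_000, distinct_values=50)
print(on_key, on_non_key)
```

Exact fractions are used only to keep the arithmetic free of rounding; an optimizer would normally work with floating-point estimates.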

Rule 8: the selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations (E1 in this case):
σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

In tree form, the selection σ_{θ0} moves from above the join node to directly above the leaf E1.
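Rule 8 can be checked on small data. The relations E1 and E2, the join condition theta, and the selection condition theta0 below are hypothetical; theta0 refers only to an attribute of E1, as the rule requires.

```python
# Hypothetical toy relations as lists of dicts.
E1 = [{"a": 1, "x": 5}, {"a": 2, "x": 7}, {"a": 3, "x": 9}]
E2 = [{"b": 1, "y": "p"}, {"b": 3, "y": "q"}, {"b": 3, "y": "r"}]

def theta_join(left, right, theta):
    return [{**l, **r} for l in left for r in right if theta(l, r)]

theta = lambda l, r: l["a"] == r["b"]   # join condition
theta0 = lambda t: t["x"] > 5           # uses attributes of E1 only

# Selection applied after the join ...
select_after_join = [t for t in theta_join(E1, E2, theta) if theta0(t)]
# ... and selection pushed below the join, onto E1.
select_before_join = theta_join([t for t in E1 if theta0(t)], E2, theta)

print(select_after_join == select_before_join)
print(len(theta_join(E1, E2, theta)), len(select_before_join))
```

Both plans return the same tuples, but the pushed-down version feeds fewer E1 tuples into the join.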

Rule 9: the projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2 respectively:
Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: the set union and set intersection operations are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.

Rule 11: the set union and set intersection operations are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: the selection operation distributes over the set union, set intersection, and set difference operations.
For set difference:
σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2
For set union, the selection must be applied to both operands:
σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2
For set intersection:
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Rule 13: the projection operation distributes over set union:
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
In general it does not distribute over set intersection or set difference: two tuples that differ only in attributes outside L project to the same tuple, so Π_{L}(E1 ∩ E2) can be smaller than Π_{L}(E1) ∩ Π_{L}(E2), and Π_{L}(E1 − E2) can be larger than Π_{L}(E1) − Π_{L}(E2).
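A quick check with Python sets, treating each relation as a set of tuples, suggests how selection and projection behave with the set operations. The data and helper names are hypothetical. On this data selection distributes over all three operations, while projection distributes over union but fails for intersection and difference, so the projection rule is safest to apply to union only.

```python
# Relations as sets of (k, v) tuples; hypothetical data.
E1 = {(1, "a"), (2, "b")}
E2 = {(1, "b"), (2, "b")}

def select(rel):
    """sigma_p with p: k >= 2."""
    return {t for t in rel if t[0] >= 2}

def project(rel):
    """pi_L with L = (k,)."""
    return {(t[0],) for t in rel}

# Selection over union, intersection and difference.
sel_union = select(E1 | E2) == select(E1) | select(E2)
sel_inter = select(E1 & E2) == select(E1) & select(E2)
sel_diff = select(E1 - E2) == select(E1) - select(E2)

# Projection over union holds ...
proj_union = project(E1 | E2) == project(E1) | project(E2)
# ... but this data is a counterexample for intersection and difference.
proj_inter = project(E1 & E2) == project(E1) & project(E2)
proj_diff = project(E1 - E2) == project(E1) - project(E2)

print(sel_union, sel_inter, sel_diff, proj_union, proj_inter, proj_diff)
```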

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
• the existence of indexes or other access paths (to reduce I/O cost), and
• the physical clustering of records (to reduce I/O cost), …
come into play.

So, in short, after scanning and parsing:
• The query is translated into an equivalent internal representation, in the form of a query tree or query graph.
• An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves constructing a set of candidate query plans and choosing "the best of these plans". Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
• Materialized evaluation: generation of an intermediate result (relation).
• Pipelined evaluation: combining several operations.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
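The difference between the two strategies can be sketched with Python generators. The relations r and s, their attribute names, and the choice of an equality join on a shared attribute k are assumptions made for illustration; duplicate elimination in the projection is left out to keep the sketch short.

```python
def join(r, s):
    """Generator: yields each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Generator: consumes tuples one at a time and keeps only attrs."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized evaluation: the whole join result is stored first.
temp = list(join(r, s))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is stored.
pipelined = list(project(join(r, s), ["a1", "a2"]))

print(materialized == pipelined)
```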

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

A direct translation of the query:
Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

An equivalent expression with the selections pushed below the join:
Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

Plan 1 (query tree, leaves to root): R, S → ⋈ (Sid = Sid) → σ (bid = 100 ∧ rating > 5), on the fly → Π Sname, on the fly

Plan 2 (query tree, leaves to root): σ (bid = 100) on R, σ (rating > 5) on S → ⋈ (Sid = Sid) → Π Sname, on the fly

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

• Linear search: retrieve every record in the file and test whether its attribute values satisfy the selection condition (in this case the data is not organized and no metadata is available).
• Binary search: use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
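The two search methods can be sketched as follows. The in-memory list of records stands in for a file, and the record layout and function names are hypothetical. The binary search uses the standard bisect module and assumes the records are ordered on the key attribute.

```python
from bisect import bisect_left

def linear_search(records, key, value):
    """Scan every record; works on unordered files."""
    return [rec for rec in records if rec[key] == value]

def binary_search(sorted_records, key, value):
    """Equality search on a key attribute the file is ordered on."""
    keys = [rec[key] for rec in sorted_records]   # built here only for brevity
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return sorted_records[i]
    return None

# Hypothetical file, ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(100, 200, 10)]

print(linear_search(employees, "ssn", 150))
print(binary_search(employees, "ssn", 150))
print(binary_search(employees, "ssn", 155))
```

A real system would compare keys block by block on disk rather than copy them into a list; the copy here only keeps the example short.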

• Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).
σ_{SSN = 123456789}(EMPLOYEE)

• Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (note: in this case the data is also sorted).
σ_{DNUMBER > 5}(DEPARTMENT)

• Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: a conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
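Disjunctive selection by union of record pointers can be sketched as below. The employee records, the attribute names, and the dictionary-based indexes are hypothetical stand-ins for real access paths; record positions in the list play the role of record pointers.

```python
from collections import defaultdict

employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Rolla"},
    {"name": "Cyd", "dno": 5, "city": "Paris"},
    {"name": "Dee", "dno": 1, "city": "Rome"},
]

def build_index(records, attr):
    """Maps each attribute value to the set of record positions (pointers)."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        index[rec[attr]].add(pos)
    return index

dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# sigma_{dno = 5 OR city = 'Rolla'}: union the pointer sets,
# then fetch each qualifying record exactly once.
pointers = dno_index.get(5, set()) | city_index.get("Rolla", set())
result = [employees[pos]["name"] for pos in sorted(pointers)]
print(result)
```

Taking the union of pointers before fetching avoids retrieving a record twice when it satisfies more than one condition.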

Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check whether the join condition t[A] = s[B] holds:
R ⋈_{A=B} S

Suppose we want to perform
r ⋈_{r.A Θ s.B} s
where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations, and that br and bs are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end

The nested loop algorithm examines nr × ns pairs of tuples. In the best case, both relations fit in the physical space (main memory), and hence we need bs + br block accesses. If only one of the relations fits in the physical space, the cost is still bs + br block accesses.

JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve block nested loop?
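The block access count can be confirmed with a small simulation. In this sketch each relation is a list of blocks and each block a list of tuples; the data, the function name, and the equality join condition are hypothetical. A counter is incremented every time a block is read.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Returns (result, block_accesses); each relation is a list of blocks."""
    result, accesses = [], 0
    for block_r in r_blocks:
        accesses += 1                  # read one block of r
        for block_s in s_blocks:
            accesses += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)   # tr || ts
    return result, accesses

# Tuples are plain Python tuples; at most two tuples per block.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e")]]   # br = 3
s_blocks = [[(2, "x"), (4, "y")], [(5, "z"), (6, "w")]]               # bs = 2

result, accesses = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]
)
print(result, accesses)
```

With br = 3 and bs = 2 the counter ends at 3 × 2 + 3 = 9, in line with the br × bs + br formula.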

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B]:
r ⋈_{A=B} s
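A minimal sketch of this technique is shown below. A Python dictionary stands in for the existing index or hash key on attribute B of s; the relation contents and the function name are hypothetical.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    """Join r and s on r[a] = s[b] by probing an index on s[b]."""
    index = defaultdict(list)   # stands in for an existing index or hash key on B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                # one tuple of r at a time
        for ts in index.get(tr[a], []):
            result.append({**tr, **ts})
    return result

r = [{"A": 1, "rv": "p"}, {"A": 2, "rv": "q"}, {"A": 9, "rv": "z"}]
s = [{"B": 2, "sv": "x"}, {"B": 1, "sv": "y"}, {"B": 2, "sv": "w"}]
joined = index_nested_loop_join(r, s, "A", "B")
print(joined)
```

Each tuple of r triggers one index probe instead of a full scan of s, which is where the saving over the plain nested loop comes from.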

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, this technique can be applied by scanning r and s linearly.

JOIN Operation (merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
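The merge step might look like the following sketch, which assumes both inputs are already sorted on their join attributes and that each group of equal-valued tuples fits in memory. The relation contents and the function name are hypothetical.

```python
def merge_join(r, s, a, b):
    """Merge join of r and s, both already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1, "rv": "p"}, {"A": 2, "rv": "q"}, {"A": 2, "rv": "r"}, {"A": 4, "rv": "t"}]
s = [{"B": 2, "sv": "x"}, {"B": 2, "sv": "y"}, {"B": 3, "sv": "z"}, {"B": 4, "sv": "w"}]
joined = merge_join(r, s, "A", "B")
print([(t["rv"], t["sv"]) for t in joined])
```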

Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
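A hash join along these lines can be sketched in a few lines. The number of buckets, the use of Python's built-in hash, and the relation contents are assumptions for illustration. Values that collide in a bucket (2 and 6 here, with four buckets) show why the join condition is still tested inside each bucket.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into the same buckets, then match within each bucket."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # only same-bucket tuples can match
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result

r = [{"A": 1, "rv": "p"}, {"A": 2, "rv": "q"}, {"A": 6, "rv": "u"}]
s = [{"B": 2, "sv": "x"}, {"B": 6, "sv": "y"}, {"B": 7, "sv": "z"}]
joined = hash_join(r, s, "A", "B")
print(sorted((t["rv"], t["sv"]) for t in joined))
```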

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
A join with a complex join condition (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say
r ⋈_{θ1} s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins
r ⋈_{θi} s

Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
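Both duplicate-elimination strategies can be sketched briefly. The relation and the function names are hypothetical; the sorting version drops a row when it equals its predecessor, and the hashing version keeps a set of rows already seen.

```python
def project_by_sorting(rel, attrs):
    """Project, sort, then drop each row equal to the one before it."""
    rows = sorted(tuple(t[a] for a in attrs) for t in rel)
    return [row for i, row in enumerate(rows) if i == 0 or row != rows[i - 1]]

def project_by_hashing(rel, attrs):
    """Project and keep a row only the first time it is seen."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

employees = [
    {"name": "Ann", "dno": 5},
    {"name": "Bob", "dno": 4},
    {"name": "Cyd", "dno": 5},
    {"name": "Dee", "dno": 1},
]

print(project_by_sorting(employees, ["dno"]))
print(project_by_hashing(employees, ["dno"]))
```

The two functions return the same set of rows; only the output order differs (sorted order versus order of first appearance).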

Query Optimization: Set Operations
• Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.

Questions
• Devise algorithms to perform the variations of the outer join operation.
• Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query tree, from leaves to root: Department × Employee → Project × (Department × Employee) → σ (Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN) → Π (Pnumber, Dnum, Lname, Address, Bdate)

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
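The size of the intermediate relation follows directly from the figures above; the short calculation below only restates that arithmetic, with dictionary names chosen for illustration.

```python
# Figures from the example above.
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
cardinality = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = 1
for n in cardinality.values():
    product_tuples *= n
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, total_bytes)
```

That is 10,000,000 tuples of 300 bytes each, or 3,000,000,000 bytes for the intermediate result alone.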

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query tree, from leaves to root: Project → σ (Plocation = 'California') → ⋈ (Dnum = Dnumber) with Department → ⋈ (MGRSSN = SSN) with Employee → Π (Pnumber, Dnum, Lname, Address, Bdate)

Database Systems



Query Optimization mdash A Simple ExampleWithout an optimizer the system willGenerates Cartesian product of S and SP This will

generate a relation of size 1000000 tuples mdash Toolarge to be kept in the main memoryRestricts results of previous step as specified by

WHERE clause This means reading 1000000tuples of which 50 will be selectedProjects the result of previous step over Sname to

produce the final result

Database Systems

33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case)

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

[Figure: two equivalent query trees. Left: σθ0 applied above ⋈θ, whose children are E1 and E2. Right: ⋈θ whose children are σθ0 (E1) and E2]
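Rule 8 can be checked on a toy instance. The sketch below is illustrative only: the relations E1(a, b) and E2(c, d), the helper names, and the two conditions are assumptions, and tuples are modelled as Python dictionaries.

```python
from itertools import product


def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def theta_join(theta, left, right):
    """Theta join: merge every pair of tuples that satisfies theta."""
    return [{**l, **r} for l, r in product(left, right) if theta(l, r)]


# Hypothetical toy relations E1(a, b) and E2(c, d).
E1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
E2 = [{"c": 1, "d": "x"}, {"c": 3, "d": "y"}, {"c": 3, "d": "z"}]


def theta(l, r):
    """Join condition θ."""
    return l["a"] == r["c"]


def theta0(t):
    """Selection condition θ0: refers to attributes of E1 only."""
    return t["b"] >= 20


late = select(theta0, theta_join(theta, E1, E2))    # select after the join
early = theta_join(theta, select(theta0, E1), E2)   # select pushed below the join
print(late == early)
```

Both orders produce the same tuples, but the second form feeds fewer tuples into the join.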

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are sets of attributes of E1 and E2, respectively

ΠL1 ∪ L2 (E1 ⋈θ E2) = (ΠL1 (E1)) ⋈θ (ΠL2 (E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative

Optimization Process
Rule 11: Set union and set intersection operations are associative

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations

σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

Optimization Process
Rule 12

σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

Optimization Process
Rule 12

σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2
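The forms of Rule 12 can be verified with Python sets. This is a small sketch under assumed data: E1, E2, and the predicate p are made up for the check, and single integers stand in for tuples.

```python
def select(pred, rel):
    """Selection over a relation modelled as a set."""
    return {t for t in rel if pred(t)}


# Hypothetical relations and predicate.
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}


def p(x):
    return x % 2 == 0


# Set difference: both forms hold.
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
# Set intersection: both forms hold.
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2
# Set union: the selection must be applied to both operands.
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2
print(diff_ok, inter_ok, union_ok, union_one_sided)
```

The one-sided union form fails because the unselected operand E2 contributes tuples that do not satisfy p.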

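Optimization Process
Rule 13: The projection operation distributes over set union

ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)

Note: the analogous forms for set intersection and set difference do not hold in general. Two tuples that differ outside L can still agree on L, so only the following containments are guaranteed:

ΠL (E1 ∩ E2) ⊆ ΠL (E1) ∩ ΠL (E2)
ΠL (E1) − ΠL (E2) ⊆ ΠL (E1 − E2)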
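A small check makes the behaviour of projection over the set operations concrete. In this sketch (relations and helper name are assumptions), the union form holds, while the intersection and difference forms fail as equalities because two tuples that differ outside L agree on L.

```python
def project(rel, idx):
    """Projection onto the attribute positions in idx (duplicates removed)."""
    return {tuple(t[i] for i in idx) for t in rel}


# Hypothetical relations over (id, label); the tuples with id 1 differ in label.
E1 = {(1, "a"), (2, "b")}
E2 = {(1, "c"), (2, "b")}
L = (0,)  # project onto the first attribute only

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
inter_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
diff_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)

# The containments that always hold.
inter_subset = project(E1 & E2, L) <= project(E1, L) & project(E2, L)
diff_subset = project(E1, L) - project(E2, L) <= project(E1 - E2, L)
print(union_ok, inter_ok, diff_ok, inter_subset, diff_subset)
```

An optimizer can therefore push a projection below a union freely, but must take care with intersection and difference.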

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  the existence of indexes or other access paths (to reduce I/O cost), and
  the physical clustering of records (to reduce I/O cost), ...
come into play

Optimization Process
So, in short, after scanning and parsing:
  the query is translated into an equivalent internal representation, in the form of a query tree or query graph
  an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans"
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
  Heuristic rules
  Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules

Optimization Process: heuristic rules

Perform selection operations as early as possible
Perform projections early
It is usually better to perform selections earlier than projections

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution

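Definition
Materialized evaluation: the result of each operation is generated and stored as an intermediate relation
Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next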

Assume we want to perform

Πa1, a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection

Alternatively, we can pipeline the two operations: when the join operation generates a tuple, it is passed directly to the project operation for processing
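The two strategies can be sketched with Python generators. The relations, the join attribute k, and the helper names are assumptions for illustration; the only point is that the pipelined version never stores the full join result.

```python
def join(r, s):
    """Natural join on the shared attribute "k"; yields one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection; consumes its input one tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Hypothetical relations r(k, a1) and s(k, a2).
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 3, "a2": 30}]

# Materialized evaluation: the join result is stored in full before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
print(materialized == pipelined)
```

Both evaluations return the same tuples; they differ only in whether the intermediate relation is ever held as a whole.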

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))

[Figure: query tree. ΠSname at the root, above σbid = 100 AND rating > 5, above ⋈Sid = Sid with children R and S]

ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))

[Figure: query tree. ΠSname at the root, above ⋈Sid = Sid with children σbid = 100 (R) and σrating > 5 (S)]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation
In this case, articulate the way the previous query is going to be executed

[Figure: the two query trees above, annotated for pipelined execution; operators marked "On the fly" are evaluated as tuples arrive, without materializing their input (two such annotations on the first tree, one on the second)]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, ... need to be taken into consideration

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case the data is not organized and no metadata is available)
Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered
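The gap between the two search methods can be illustrated by counting the records examined. This is a sketch under assumptions: the in-memory list stands in for a file ordered on ssn, and the record layout and function names are made up for the example.

```python
def linear_search(records, key, value):
    """Examine records one by one; works on unordered files."""
    probes = 0
    for rec in records:
        probes += 1
        if rec[key] == value:
            return rec, probes
    return None, probes


def binary_search(records, key, value):
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][key] == value:
            return records[mid], probes
        if records[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# Hypothetical file of 1,000 employee records ordered on ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(1000, 2000)]

found_linear, probes_linear = linear_search(employees, "ssn", 1999)
found_binary, probes_binary = binary_search(employees, "ssn", 1999)
print(probes_linear, probes_binary)
```

In a real system the unit of cost would be block accesses rather than individual records, but the proportions are similar.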

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved)

σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted

σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data)

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form

σθ1∧θ2∧ … ∧θn (r)

Disjunctive selection: a disjunctive selection is of the following form

σθ1∨θ2∨ … ∨θn (r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation
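The union-of-pointers technique, including its fallback to a linear scan, can be sketched as follows. Everything here is an assumption made for illustration: record positions in a list stand in for record pointers, dictionaries stand in for indexes, and only equality conditions are handled.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record ids (pointers) holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions is a list of (attribute, value) equality tests joined by OR."""
    # If any condition lacks an access path, a linear scan is unavoidable.
    if any(attr not in indexes for attr, _ in conditions):
        return [r for r in records if any(r[a] == v for a, v in conditions)]
    # Otherwise take the union of the pointers found through each index.
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())
    return [records[rid] for rid in sorted(rids)]


# Hypothetical employee file with indexes on dno and city, none on name.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Paris"},
    {"name": "Cid", "dno": 5, "city": "Paris"},
    {"name": "Dee", "dno": 1, "city": "Athens"},
]
indexes = {a: build_index(employees, a) for a in ("dno", "city")}

indexed = disjunctive_select(employees, indexes, [("dno", 5), ("city", "Paris")])
scanned = disjunctive_select(employees, indexes, [("name", "Dee"), ("dno", 4)])
print([e["name"] for e in indexed], [e["name"] for e in scanned])
```

The set union removes pointers that satisfy more than one condition, so each qualifying tuple is fetched once.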

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B]

R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈r.A Θ s.B s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
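The nested loop algorithm translates almost line for line into Python. In this sketch, tuples are Python tuples, a and b are attribute positions, and the comparison Θ is passed in as a function; the sample relations and the comparison counter are assumptions added for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b]; also count the tuple pairs examined."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts
    return result, comparisons


# Hypothetical relations; the join attribute is in position 0 of each.
r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]

equi_result, equi_comparisons = nested_loop_join(r, s, 0, 0)
less_result, _ = nested_loop_join(r, s, 0, 0, theta=operator.lt)
print(equi_comparisons, len(equi_result), len(less_result))
```

Every pair of tuples is examined regardless of the condition, which is why the method works for any Θ.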

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns (pairs of tuples examined)
In the best case, both relations fit into main memory, and hence we need bs + br block accesses

Query Optimization: JOIN Operation (nested loop)

If one of the relations fits entirely in main memory and is used as the inner relation, then the cost is also bs + br block accesses

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is br × bs + br

How can we improve block nested loop?
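The block access count of block nested loop can be reproduced with a small simulation. This is a sketch under assumptions: a "block" is modelled as a fixed-size slice of a list, each block read is counted once per fetch (no buffering beyond one block of each relation), and the join is an equi-join on position 0.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join on r[a] = s[b], counting simulated block reads."""
    result, block_reads = [], 0
    for Br in r_blocks:
        block_reads += 1              # read one block of r
        for Bs in s_blocks:
            block_reads += 1          # read one block of s
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads


# Hypothetical relations: nr = 8 tuples in br = 4 blocks, ns = 12 tuples in bs = 4 blocks.
r = [(i, f"r{i}") for i in range(8)]
s = [(i % 4, f"s{i}") for i in range(12)]
r_blocks, s_blocks = blocks(r, 2), blocks(s, 3)

result, block_reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
br, bs = len(r_blocks), len(s_blocks)
print(block_reads, br * bs + br, len(result))
```

The simulated count matches br × bs + br, because each block of s is read once per block of r rather than once per tuple of r.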

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B]

r ⋈A=B s
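A join that uses an access structure on s can be sketched with a dictionary playing the role of the index or hash key on B. The helper names and sample relations are assumptions; a real system would use an on-disk index rather than an in-memory dictionary.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tr in r, look up the matching tuples of s through the index."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


# Hypothetical relations; the join attributes A and B are both in position 0.
r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]

s_index = build_hash_index(s, 0)
joined = index_join(r, s_index, 0)
print(joined)
```

Only the tuples of s that actually match are touched, instead of scanning all of s for every tuple of r.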

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory
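The merge step can be sketched with two indexes that play the role of the two pointers. The code assumes both inputs are already sorted on the join attribute and that each group of equal values fits in memory, as stated above; the function name and sample relations are made up for the example.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                          # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


# Hypothetical relations, both sorted on position 0.
r = [(1, "x"), (2, "y"), (3, "z"), (3, "w")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]
merged = merge_join(r, s, 0, 0)
print(len(merged))
```

Neither pointer ever moves backwards past a finished group, which is what keeps the scan of each relation linear.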

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation
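The hash join described above can be sketched with an in-memory table of buckets. The bucket count, the use of Python's built-in hash, and the sample relations are assumptions for illustration; each bucket keeps the tuples of r and of s separately so that only tuples in the same bucket are compared.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join on r[a] = s[b] using one shared set of hash buckets."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (r tuples, s tuples)
    for tr in r:                              # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():   # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:            # different values may share a bucket
                    result.append(tr + ts)
    return result


# Hypothetical relations; the join attribute is in position 0 of each.
r = [(1, "x"), (2, "y"), (3, "z"), (7, "v")]
s = [(2, "p"), (3, "q"), (3, "r"), (4, "t")]
hashed = hash_join(r, s, 0, 0)
print(sorted(hashed))
```

The equality test inside each bucket is still needed, because the values 3 and 7 in this example fall into the same bucket.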

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections

Query Optimization: Complex JOIN Operation

Consider the following join operation

r ⋈θ1∧θ2∧ … ∧θn s

One or more of the join techniques may be applicable for joins on individual conditions
We can perform the overall join by first computing one of the simpler joins, say

r ⋈θ1 s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions

Query Optimization: Complex JOIN Operation
Now consider the following join operation

r ⋈θ1∨θ2∨ … ∨θn s

The join can be performed as the union of the tuples in the individual joins

r ⋈θi s

Query Optimization: Project Operation

A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result
Hashing is another way to implement the union, intersection, and difference operations

Questions
Devise algorithms to perform the variations of the outer join operation
Devise algorithms to perform aggregate operations

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = "California" and Dnum = Dnumber and MGRSSN = SSN (Project × (Department × Employee)))

Query Optimization: An Example

[Figure: initial query tree. ΠPnumber, Dnum, Lname, Address, Bdate at the root, above σPlocation = "California" and Dnum = Dnumber and MGRSSN = SSN, above × whose children are Project and a second × over Department and Employee]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes
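The size of the intermediate relation follows directly from the figures given above. The short calculation below only restates that arithmetic; the dictionary layout and variable names are assumptions.

```python
# (number of tuples, tuple size in bytes) for each relation, as given above.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

tuples = 1
tuple_bytes = 0
for count, size in relations.values():
    tuples *= count        # a Cartesian product multiplies the cardinalities
    tuple_bytes += size    # and concatenates the tuples

total_bytes = tuples * tuple_bytes
print(tuples, tuple_bytes, total_bytes)
```

At 300 bytes per tuple, the 10 million tuples amount to 3 × 10^9 bytes, which motivates restricting and joining before any product is formed.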

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = "California" (Project)) ⋈Dnum = Dnumber (Department)) ⋈MGRSSN = SSN (Employee))

Query Optimization: An Example

[Figure: optimized query tree. ΠPnumber, Dnum, Lname, Address, Bdate at the root, above ⋈MGRSSN = SSN whose children are Employee and ⋈Dnum = Dnumber; the latter's children are σPlocation = "California" (Project) and Department]


Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
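A minimal sketch of the two search methods, assuming a small in-memory "file" of records sorted on a hypothetical key attribute; the record contents and function names are illustrative, and the standard `bisect` module stands in for the binary search.

```python
from bisect import bisect_left

# Hypothetical file: records sorted on the key attribute (first field).
records = [(101, "Ann"), (205, "Bob"), (317, "Cy"), (428, "Dee")]
keys = [rec[0] for rec in records]  # key column of the ordered file

def linear_search(file, predicate):
    """Scan every record and keep those satisfying the condition."""
    return [rec for rec in file if predicate(rec)]

def binary_search(key):
    """Equality lookup on the ordering key; returns the record or None."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return records[pos]
    return None

by_name = linear_search(records, lambda rec: rec[1] == "Bob")
by_key = binary_search(317)
missing = binary_search(300)
```

The linear search works for any condition but touches every record, while the binary search only applies to an equality test on the attribute the file is ordered by.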

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note: in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)
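The range selection above can be sketched as follows, assuming a hypothetical DEPARTMENT file held in memory and sorted on DNUMBER; the sorted key list merely stands in for a primary index, and the department names are made up.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file sorted on the key DNUMBER.
department = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Ops")]
dnumbers = [d[0] for d in department]  # stands in for the primary index

def select_greater_than(value):
    """Locate the boundary via the index, then read all subsequent records."""
    start = bisect_right(dnumbers, value)
    return department[start:]

selected = select_greater_than(5)
```

Because the file is ordered on the key, everything after the located position qualifies and no further comparisons are needed.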

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
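The union-of-pointers idea can be sketched as below. The EMPLOYEE records, the attribute names, and the dictionary-based indexes are assumptions made for illustration; list positions play the role of record pointers.

```python
# Hypothetical EMPLOYEE file; list positions play the role of record pointers.
employee = [
    {"name": "Ann", "dno": 5, "salary": 30000},
    {"name": "Bob", "dno": 4, "salary": 45000},
    {"name": "Cy", "dno": 5, "salary": 45000},
    {"name": "Dee", "dno": 1, "salary": 52000},
]

def build_index(file, attr):
    """Map each attribute value to the set of record pointers holding it."""
    index = {}
    for rid, rec in enumerate(file):
        index.setdefault(rec[attr], set()).add(rid)
    return index

dno_index = build_index(employee, "dno")
salary_index = build_index(employee, "salary")

# Selection dno = 5 OR salary = 45000: union the pointer sets first,
# then fetch each qualifying record exactly once.
pointers = dno_index.get(5, set()) | salary_index.get(45000, set())
result = [employee[rid] for rid in sorted(pointers)]
```

Taking the union on pointers rather than on records means a tuple that satisfies both conditions (here the third record) is fetched only once.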

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
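The pseudocode above can be written as a short runnable sketch. The dictionary representation of tuples, the attribute names "A" and "B", and the comparison counter are assumptions added for illustration; the comparison operator Θ is passed in as a function.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s: compare every pair of tuples on r.a and s.b."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop
        for ts in s:                  # inner loop
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # concatenation tr || ts
    return result, comparisons

# Hypothetical toy relations with distinct attribute names.
r = [{"A": 1}, {"A": 2}, {"A": 3}]
s = [{"B": 2}, {"B": 3}, {"B": 3}]
equi_result, n_compared = nested_loop_join(r, s, "A", "B")
less_result, _ = nested_loop_join(r, s, "A", "B", theta=operator.lt)
```

Every pair of tuples is compared regardless of the condition, which is why the method works for any Θ but is expensive.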

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines n_r × n_s pairs of tuples. In the best case both relations fit into physical (main) memory, and hence we need b_s + b_r block accesses.

If only one of the relations fits in physical memory, the cost is still b_s + b_r block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve block nested loop?
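The block access count b_r × b_s + b_r can be checked with a small simulation. This is a sketch under assumptions: relations are Python lists, a "block" is a fixed-size slice, and `block_reads` is a counter added here purely to mirror the cost formula.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r and s block by block, counting simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads

# Hypothetical data: 6 tuples in r (3 blocks) and 4 tuples in s (2 blocks).
r = [{"A": i} for i in range(6)]
s = [{"B": i} for i in range(2, 6)]
joined, reads = block_nested_loop_join(r, s, "A", "B", block_size=2)
```

With b_r = 3 and b_s = 2 the counter reports 3 × 2 + 3 = 9 block reads, matching the formula.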

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s
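A sketch of this index-based join, assuming an in-memory dictionary plays the role of the access structure on s.B. In a real system the index would already exist; it is built inside the function here only to keep the example self-contained.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join using an access structure on s.b instead of scanning s."""
    index = defaultdict(list)          # stands in for an existing index on s.b
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                       # one record of r at a time
        for ts in index.get(tr[a], []):    # probe for matching records of s
            result.append({**tr, **ts})
    return result

# Hypothetical toy relations.
r = [{"A": 1}, {"A": 2}, {"A": 3}]
s = [{"B": 2, "x": "p"}, {"B": 3, "x": "q"}, {"B": 3, "x": "r"}]
joined = index_join(r, s, "A", "B")
```

Each record of r triggers one probe of the access structure rather than a full scan of s.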

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
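The merge step can be sketched as follows, assuming both inputs are Python lists already sorted on their join attributes. The attribute names and sample data are hypothetical; the slice `s[j:j_end]` corresponds to the group of equal-valued tuples that is assumed to fit in memory.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r.a and s.b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                         # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                         # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2}, {"B": 2}, {"B": 3}, {"B": 4}]
joined = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backwards past a finished group, which is what keeps the scan linear in the size of the inputs.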

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
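A sketch of the hash-join described above, under the assumption that the "hash file" is a fixed list of in-memory buckets and that Python's built-in `hash` serves as the common hashing function; the bucket count and sample relations are arbitrary.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join by hashing both relations into the same set of buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                # Sharing a bucket only makes a pair a candidate, so the
                # join attribute values are still compared.
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result

r = [{"A": i} for i in range(1, 9)]
s = [{"B": i} for i in (2, 6, 6, 10)]
joined = sorted(hash_join(r, s, "A", "B"), key=lambda t: t["A"])
```

Tuples are only compared against tuples in the same bucket, so most non-matching pairs are never examined.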

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.
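Both duplicate-elimination strategies can be sketched briefly. The relation and attribute names are hypothetical, and a Python `set` stands in for the hash table.

```python
def project_sort(relation, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    return [row for k, row in enumerate(rows) if k == 0 or row != rows[k - 1]]

def project_hash(relation, attrs):
    """Project and keep a row only the first time it is seen in the hash set."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Hypothetical relation; dno is not a key, so its projection has duplicates.
employee = [
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 4},
    {"ssn": 3, "dno": 5},
    {"ssn": 4, "dno": 1},
]
sorted_unique = project_sort(employee, ("dno",))
hashed_unique = project_hash(employee, ("dno",))
```

The two functions return the same set of rows; only the output order differs, since the sort-based version emits rows in sorted order.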

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Query tree, from root to leaves: Π_{Pnumber, Dnum, Lname, Address, Bdate}; σ_{Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN}; × of Project with (Department × Employee).]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
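The figures quoted above follow from multiplying the cardinalities and adding the tuple sizes. A short check, using only the numbers given in the text (the variable names are illustrative):

```python
# Cardinalities and tuple sizes as stated in the example.
tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

product_tuples = 1
for n in tuples.values():
    product_tuples *= n                            # 100 * 20 * 5000

product_tuple_bytes = sum(tuple_bytes.values())    # 100 + 50 + 150
total_bytes = product_tuples * product_tuple_bytes
```

The total comes to 3,000,000,000 bytes for the intermediate relation, before any selection has been applied.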

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))

Query Optimization: An Example

[Query tree, from root to leaves: Π_{Pnumber, Dnum, Lname, Address, Bdate}; ⋈_{MGRSSN = SSN} joining Employee with the result of ⋈_{Dnum = Dnumber}, which joins Department with σ_{Plocation = "California"}(Project).]



Query Optimization: Running Example

[Query trees for the running example. Initial tree: Π_{Lname} over σ_{Pname = 'Aquarius' and Pnumber = Pno and Essn = Ssn and Bdate > '1957-12-31'} over the Cartesian product of Employee, Works_on, and Project. Optimized tree: Π_{Lname} over joins on Pnumber = Pno and Essn = Ssn, with σ_{Pname = 'Aquarius'} applied to Project, σ_{Bdate > '1957-12-31'} applied to Employee, and intermediate projections Π_{Essn}, Π_{Ssn, Lname}, Π_{Essn, Pno}, and Π_{Pnumber}.]

[Figure: phases of query processing. The query goes through query decomposition (using the system catalog), which yields a relational algebra expression; query optimization (using database statistics), which yields an execution plan; then code generation and runtime execution against the database, which produce the result.]

Query Optimization: A Simple Example

S
S#   Sname  Status  City
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example

Get names of suppliers who supply part P2.

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

[Query tree, from root to leaves: Π_{Sname}; σ_{S.S# = SP.S# and SP.P# = 'P2'}; × of S and SP.]

Query Optimization: A Simple Example

A ⋈_{S.S# = SP.S#} B

S#   Sname  Status  SCity   S#   P#   QTY
S1   Smith  20      London  S1   P1   300
S1   Smith  20      London  S1   P2   200
S1   Smith  20      London  S1   P3   400
S1   Smith  20      London  S1   P4   200
S1   Smith  20      London  S1   P5   100
S1   Smith  20      London  S1   P6   100
S2   Jones  10      Paris   S2   P1   300
S2   Jones  10      Paris   S2   P2   400
…

Query Optimization: A Simple Example

Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand, will:
Restrict SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Join the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Project the result of the last operation over Sname.

Query Optimization: A Simple Example

[Query tree, from root to leaves: Π_{Sname}; ⋈_{S.S# = SP.S#} joining S with σ_{SP.P# = 'P2'}(SP).]

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
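The gap between the two plans can be reproduced on synthetic data with the same cardinalities (100 suppliers, 10,000 shipments, 50 of them for P2). The generated values, the tuple layout, and the counters below are assumptions made for illustration; only the cardinalities come from the example.

```python
# Synthetic data matching the stated cardinalities (the values are made up).
S = [(f"S{i}", f"name{i}") for i in range(100)]                 # (S#, Sname)
SP = [(f"S{i % 100}", "P2" if i < 50 else f"P{3 + i % 7}")      # (S#, P#)
      for i in range(10_000)]

# Plan 1: product, then restriction, then projection.
product_size = 0
plan1 = set()
for s in S:
    for sp in SP:
        product_size += 1                  # one tuple of the Cartesian product
        if s[0] == sp[0] and sp[1] == "P2":
            plan1.add(s[1])

# Plan 2: restrict SP first, then join with S, then project.
p2_only = [sp for sp in SP if sp[1] == "P2"]    # reads 10,000 tuples, keeps 50
by_supplier = {s[0]: s for s in S}              # reads the 100 tuples of S
plan2 = {by_supplier[sp[0]][1] for sp in p2_only}
```

Both plans return the same supplier names, but the first enumerates a million product tuples while the second handles only the 10,000 tuples of SP plus the 100 tuples of S.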

Optimization Process

Cast the query into some internal representation: convert the query to some internal representation that is more suitable for machine manipulation, namely relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = "P2"}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process

[Query tree, from leaves to root: S and SP; Join (S.S# = SP.S#); Restrict (SP.P# = 'P2'); Project (Sname); Result.]

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join because
it reduces the size of the input to the join operation, and
it reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
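That the two conditions are equivalent can be confirmed by checking all eight truth assignments. A minimal sketch (the variable name `equivalent` is illustrative):

```python
from itertools import product

# Compare both forms of the condition on every combination of truth values.
equivalent = all(
    (p or (q and r)) == ((p or q) and (p or r))
    for p, q, r in product([False, True], repeat=3)
)
```

The rewritten form is in conjunctive normal form: a conjunction of the two disjunctions (p OR q) and (p OR r).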

Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition that is in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
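Rule 4 and its extension can be checked on small relations. The relations E1 and E2, the predicates, and the helper functions below are hypothetical and exist only to illustrate the equivalences.

```python
E1 = [{"a": 1}, {"a": 2}, {"a": 3}]
E2 = [{"b": 2}, {"b": 3}, {"b": 4}]

def cartesian(e1, e2):
    """Cartesian product: concatenate every pair of tuples."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2]

def select(rel, pred):
    """Selection: keep the tuples satisfying the predicate."""
    return [t for t in rel if pred(t)]

def theta_join(e1, e2, pred):
    """Theta join computed pair by pair, without building the full product."""
    return [{**t1, **t2} for t1 in e1 for t2 in e2 if pred({**t1, **t2})]

def theta1(t):
    return t["a"] + t["b"] > 4

def theta2(t):
    return t["a"] < t["b"]

# Rule 4: selection over a product equals the theta join.
lhs = select(cartesian(E1, E2), theta2)
rhs = theta_join(E1, E2, theta2)

# Extension: a selection over a theta join folds into the join condition.
lhs_ext = select(theta_join(E1, E2, theta2), theta1)
rhs_ext = theta_join(E1, E2, lambda t: theta1(t) and theta2(t))
```

The right-hand sides never hold the full product as an intermediate result, which is the practical benefit of applying the rule.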

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈_θ E2 = E2 ⋈_θ E1

[Query trees: ⋈_θ over E1 and E2 is equivalent to ⋈_θ over E2 and E1.]

Optimization Process

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Query trees: the join of (E1 ⋈ E2) with E3 is equivalent to the join of E1 with (E2 ⋈ E3).]

Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivity as part of the statistical data to aid query optimization.

For a search on equality on a key attribute, the selectivity is

s = 1 / |r(R)|

The selectivity on an attribute with i distinct values is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
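These estimates can be written as two small functions. The function names and the sample numbers are hypothetical, and the 1/i estimate assumes that tuples are spread evenly over the i distinct values.

```python
def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition.

    Assumes tuples are spread uniformly over the distinct values.
    """
    if is_key:
        return 1 / cardinality        # at most one tuple can match a key
    return 1 / distinct_values

def estimated_matches(cardinality, distinct_values):
    """Expected number of tuples returned by an equality search."""
    return cardinality / distinct_values

# Hypothetical relation: 10,000 tuples, 125 distinct values of the attribute.
key_sel = equality_selectivity(10_000, is_key=True)
attr_sel = equality_selectivity(10_000, distinct_values=125)
matches = estimated_matches(10_000, 125)
```

For this sample relation an equality search on the non-key attribute is expected to return 80 tuples, while a search on the key returns at most one.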

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Query trees for Rule 8: σ_{θ0} above ⋈_θ of E1 and E2 is equivalent to ⋈_θ of σ_{θ0}(E1) and E2.]

Optimization Process

Rule 9: The projection operation distributes over theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2.

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process

Rule 10: Set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
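The Rule 12 identities, including the one that does not hold for union, can be checked with Python sets. The sets E1 and E2 and the predicate `is_even` are arbitrary choices for illustration.

```python
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}

def select(rel, p):
    """Selection over a set of values."""
    return {t for t in rel if p(t)}

def is_even(t):
    return t % 2 == 0

# Set difference: both forms hold.
diff_full = select(E1 - E2, is_even) == select(E1, is_even) - select(E2, is_even)
diff_short = select(E1 - E2, is_even) == select(E1, is_even) - E2

# Set union: only the form selecting on both operands holds.
union_full = select(E1 | E2, is_even) == select(E1, is_even) | select(E2, is_even)
union_short = select(E1 | E2, is_even) == select(E1, is_even) | E2

# Set intersection: both forms hold.
inter_full = select(E1 & E2, is_even) == select(E1, is_even) & select(E2, is_even)
inter_short = select(E1 & E2, is_even) == select(E1, is_even) & E2
```

Here `union_short` evaluates to False because the unselected E2 contributes the odd values 5 and 7 to the right-hand side.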

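Optimization Process

Rule 13: The projection operation distributes over the set union operation.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

Note: the corresponding forms for set intersection and set difference, Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2) and Π_L(E1 − E2) = Π_L(E1) − Π_L(E2), do not hold in general. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.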

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost) and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
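
The merge step might look like the following Python sketch, which assumes both inputs are already sorted on the join attribute and that each group of equal-valued s tuples fits in memory. Names and data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Merge join of r and s, both already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                          # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with that value against the group.
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```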

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
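
A small in-memory sketch of the hash join idea is shown below. The bucket count, the use of Python's built-in `hash`, and the sample data are all assumptions for illustration; a real implementation would write the buckets to a hash file.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into shared buckets, then match within buckets."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (r part, s part)
    for tr in r:                              # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        # Tuples in different buckets can never match, so only
        # tuples within the same bucket need to be compared.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = sorted(hash_join(r, s, 0, 0))
```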

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
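
Both strategies can be sketched in a few lines of Python. Here each condition θi is modeled as a function of a pair (tr, ts), and the simple join is a plain nested loop standing in for whichever technique applies; all names and data are assumptions.

```python
def theta_join(r, s, theta):
    """Stand-in for any simple join technique."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]

def conjunctive_join(r, s, thetas):
    """Join on the first condition, then filter by the remaining ones."""
    first, rest = thetas[0], thetas[1:]
    width = len(r[0]) if r else 0      # where the r part of a result ends
    intermediate = theta_join(r, s, first)
    return [t for t in intermediate
            if all(th(t[:width], t[width:]) for th in rest)]

def disjunctive_join(r, s, thetas):
    """Union of the individual joins (a set removes duplicates)."""
    result = set()
    for th in thetas:
        result |= set(theta_join(r, s, th))
    return result

r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 10), (2, 99), (4, 30)]
eq_first = lambda tr, ts: tr[0] == ts[0]
eq_second = lambda tr, ts: tr[1] == ts[1]
conj = conjunctive_join(r, s, [eq_first, eq_second])
disj = disjunctive_join(r, s, [eq_first, eq_second])
```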

Query Optimization: Project Operation

A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
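
Both duplicate-elimination strategies can be sketched as follows; the relation, the attribute positions, and the function names are made up for the example.

```python
from itertools import groupby

def project(relation, attrs):
    """Keep only the attribute positions in attrs (duplicates remain)."""
    return [tuple(t[i] for i in attrs) for t in relation]

def dedup_by_sorting(rows):
    """Sort, then keep one row from each run of equal adjacent rows."""
    return [key for key, _ in groupby(sorted(rows))]

def dedup_by_hashing(rows):
    """Keep a row only the first time it is seen in the hash set."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Hypothetical Employee(Lname, Dno); Dno is not a key.
employee = [("Smith", 5), ("Wong", 5), ("Zelaya", 4)]
dnos = project(employee, [1])
by_sort = dedup_by_sorting(dnos)
by_hash = dedup_by_hashing(dnos)
```

The sorted variant returns rows in sorted order, while the hashed variant preserves the order of first appearance.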

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
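
The sort-then-scan approach can be illustrated with the sketch below, which computes union, intersection, and difference in one merge pass. It assumes both relations have the same schema; the function name and data are hypothetical.

```python
def sorted_set_ops(r, s):
    """Union, intersection, and difference r - s via sort plus one scan."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:               # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:             # tuple only in s
            union.append(s[j])
            j += 1
        else:                         # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])               # leftovers of r belong to r - s too
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff

r = [(1,), (2,), (3,)]
s = [(2,), (3,), (4,)]
union, inter, diff = sorted_set_ops(r, s)
```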

Questions

Devise algorithms to perform variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

and the following query:

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

[Figure: query tree for the initial translation. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}; below it: Project × (Department × Employee).]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
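
The arithmetic behind these figures can be checked directly. The numbers are the ones assumed in the example; the total size in bytes is derived from them and is not stated on the slide.

```python
from math import prod

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}   # bytes
cardinalities = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = prod(cardinalities.values())
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size
```

Under these assumptions the intermediate relation holds 10,000,000 tuples of 300 bytes each, about 3 GB in total.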

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)

[Figure: query tree for this expression. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it: ⋈_{MGRSSN=SSN} with Employee as one child; its other child is ⋈_{Dnum=Dnumber}, whose children are σ_{Plocation='California'}(Project) and Department.]



[Figure: phases of query processing, with boxes labeled Query, Query Decomposition, Relational Algebra Expression, Query Optimization, Execution Plan, Code Generation, Runtime Execution, and Result, together with System Catalog, Database Statistics, and Database.]

Query Optimization: A Simple Example

S
S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…

Query Optimization: A Simple Example

Get names of suppliers who supply part P2:

SELECT DISTINCT Sname
FROM S, SP
WHERE S.S# = SP.S#
  AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume 50 tuples in SP are for part P2.

[Figure: query tree. Root: Π_{Sname}; below it: σ_{S.S# = SP.S# ∧ SP.P# = 'P2'}; below it: S × SP.]

Query Optimization: A Simple Example

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

A ⋈_{S.S# = SP.S#} B

Query Optimization: A Simple Example

Without an optimizer, the system will:
• Generate the Cartesian product of S and SP. This will generate a relation of 1,000,000 tuples, too large to be kept in main memory.
• Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
• Project the result of the previous step over Sname to produce the final result.

An optimizer, on the other hand:
• Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
• Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
• Projects the result of the last operation over Sname.

[Figure: optimized query tree. Root: Π_{Sname}; below it: ⋈_{S.S# = SP.S#} with children S and σ_{SP.P# = 'P2'}(SP).]

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
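
The two strategies can be compared on synthetic data with the cardinalities assumed above (100 suppliers, 10,000 shipments, 50 of them for P2). The sketch below counts only how many tuples each plan has to examine, not the reads and writes of intermediate results, so its counts are not the same measure as the 3,000,000 figure. The data generator and function names are made up for the illustration.

```python
def make_data():
    # Hypothetical data: S(S#, Sname) and SP(S#, P#, QTY).
    S = [(f"S{i}", f"Name{i}") for i in range(100)]
    SP = [(f"S{i % 100}", "P2" if i < 50 else "P1", 100)
          for i in range(10_000)]
    return S, SP

def plan_product_first(S, SP):
    """Restrict the Cartesian product; every pair of tuples is examined."""
    examined, names = 0, set()
    for s in S:
        for sp in SP:
            examined += 1
            if s[0] == sp[0] and sp[1] == "P2":
                names.add(s[1])
    return names, examined

def plan_restrict_first(S, SP):
    """Restrict SP to part P2 first, then join the 50 survivors with S."""
    examined, p2 = 0, []
    for sp in SP:                      # read all 10,000 SP tuples
        examined += 1
        if sp[1] == "P2":
            p2.append(sp)
    wanted = {sp[0] for sp in p2}
    names = set()
    for s in S:                        # read the 100 S tuples
        examined += 1
        if s[0] in wanted:
            names.add(s[1])
    return names, examined

S, SP = make_data()
names_slow, examined_slow = plan_product_first(S, SP)
names_fast, examined_fast = plan_restrict_first(S, SP)
```

Both plans return the same 50 supplier names, but the first examines 1,000,000 tuple pairs while the second touches 10,100 tuples.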

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

[Figure: query tree, read bottom-up. S and SP → Join (S.S# = SP.S#) → Restrict (SP.P# = 'P2') → Project (Sname) → Result.]

Optimization Process

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: It is a good idea to perform the restriction before the join, because:
• It reduces the size of the input to the join operation.
• It reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
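
The equivalence of the two WHERE conditions is the distributive law of OR over AND, and it can be confirmed by brute force over all eight truth assignments. The function names are just labels for this check.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def conjunctive_normal_form(p, q, r):
    return (p or q) and (p or r)

# Compare the two conditions on every combination of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
```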

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational costs of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one with respect to the chosen performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed; the others can be omitted (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
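
Rules 1 to 3 can be spot-checked on a toy relation. In this sketch rows are dictionaries, `select` and `project` are hypothetical helpers, and the relation E and the predicates are made up; passing on one example illustrates the rules but of course does not prove them.

```python
def select(rel, pred):
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """Projection with duplicate elimination (set semantics)."""
    out, seen = [], set()
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 10, "c": "x"},
]
theta1 = lambda row: row["a"] > 1
theta2 = lambda row: row["b"] == 10

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = (select(E, lambda row: theta1(row) and theta2(row))
         == select(select(E, theta2), theta1))
# Rule 2: selections commute.
rule2 = (select(select(E, theta2), theta1)
         == select(select(E, theta1), theta2))
# Rule 3: only the outermost projection matters.
rule3 = project(project(E, ["b", "c"]), ["b"]) == project(E, ["b"])
```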

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: The theta join operation is commutative.

E1 ⋈_θ E2 = E2 ⋈_θ E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the condition to the cardinality of the relation:

$$\text{selectivity} = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$$

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

$$s = \frac{1}{|r(R)|}$$

For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is

$$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$$

Hence the number of tuples that satisfy an equality search is

$$\frac{1}{i}\,|r(R)|$$
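
As a quick illustration, selectivity can be measured on a synthetic relation and compared with the two estimates. The relation layout (a key plus an attribute with 10 uniformly distributed values) and all names are assumptions for the example.

```python
def selectivity(relation, predicate):
    """Fraction of tuples in the relation that satisfy the predicate."""
    matching = sum(1 for t in relation if predicate(t))
    return matching / len(relation)

# Hypothetical Employee(ssn, dno): ssn is a key, dno has 10 distinct
# values spread uniformly over 1,000 tuples.
employees = [(ssn, ssn % 10) for ssn in range(1000)]

key_selectivity = selectivity(employees, lambda t: t[0] == 123)
dno_selectivity = selectivity(employees, lambda t: t[1] == 3)

# Estimated result size for an equality search on dno: |r(R)| / i
estimated_matches = len(employees) / 10
```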

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Figure: two equivalent query trees. Left: σ_{θ0} applied above ⋈_θ with children E1 and E2. Right: ⋈_θ with children σ_{θ0}(E1) and E2.]
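
Rule 8 can be tried out on toy data: applying the selection before the join gives the same answer while feeding fewer tuples into the join. The relations, attribute positions, and predicates below are hypothetical.

```python
def theta_join(e1, e2, theta):
    return [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)]

# Hypothetical relations E1(sid, bid) and E2(sid, rating)
E1 = [(1, 100), (2, 100), (3, 200), (4, 100)]
E2 = [(1, 7), (2, 3), (3, 9), (4, 8)]
theta = lambda t1, t2: t1[0] == t2[0]      # join condition
theta0 = lambda t1: t1[1] == 100           # uses attributes of E1 only

# Selection after the join: the join processes all of E1.
joined = theta_join(E1, E2, theta)
late = [t for t in joined if theta0(t[:2])]

# Selection pushed below the join: the join sees fewer E1 tuples.
selected = [t1 for t1 in E1 if theta0(t1)]
early = theta_join(selected, E2, theta)
```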

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Rule 10: Set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Rule 13: The projection operation distributes over set union:

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

The corresponding equalities for set intersection and set difference do not hold in general, because two tuples that differ only on attributes outside L project to the same tuple.
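
These set-operation rules are easy to test with Python sets. The relations and the predicate are made up; the last check is a counterexample showing why projection does not distribute over intersection in general.

```python
def select(rel, pred):
    return {t for t in rel if pred(t)}

def project(rel, attrs):
    return {tuple(t[i] for i in attrs) for t in rel}

E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "z"), (2, "b")}
p = lambda t: t[0] >= 2

# Rule 12: selection distributes over difference, union, intersection.
diff_ok = select(E1 - E2, p) == select(E1, p) - select(E2, p) == select(E1, p) - E2
union_ok = select(E1 | E2, p) == select(E1, p) | select(E2, p)
union_shortcut_fails = select(E1 | E2, p) != select(E1, p) | E2
inter_ok = select(E1 & E2, p) == select(E1, p) & select(E2, p) == select(E1, p) & E2

# Rule 13: projection distributes over union ...
proj_union_ok = project(E1 | E2, [0]) == project(E1, [0]) | project(E2, [0])
# ... but not, in general, over intersection: (1, "a") and (1, "z")
# are different tuples that project to the same value (1,).
proj_inter_differs = project(E1 & E2, [0]) != project(E1, [0]) & project(E2, [0])
```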

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
• the existence of indexes or other access paths (to reduce I/O cost), and
• the physical clustering of records (to reduce I/O cost).

So, in short, after scanning and parsing:
• the query will be translated into an equivalent internal representation, in the form of a query tree or query graph;
• an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation before the next operation is applied.

Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next without storing an intermediate relation.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
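
Python generators give a compact way to picture the difference. In this sketch the join is taken to be on the first attribute of each relation and the projection keeps two positions of the joined tuple; those choices, the names, and the data are assumptions for illustration.

```python
def join(r, s):
    """Join on the first attribute; yields result tuples one at a time."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts[1:]

def project(tuples, attrs):
    """Consumes tuples one at a time and yields the projected tuple."""
    for t in tuples:
        yield tuple(t[i] for i in attrs)

r = [(1, "a"), (2, "b")]
s = [(1, "x"), (2, "y"), (2, "z")]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, [0, 2]))

# Pipelined evaluation: each joined tuple flows straight into project,
# so no intermediate relation is ever built.
pipelined = list(project(join(r, s), [0, 2]))
```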

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

One way to express this query is:

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

[Figure: query tree. Root: Π_{Sname}; below it: σ_{bid = 100 ∧ rating > 5}; below it: ⋈_{Sid = Sid} with children R and S.]

An equivalent expression pushes the selections below the join:

Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))

[Figure: query tree. Root: Π_{Sname}; below it: ⋈_{Sid = Sid} with children σ_{bid = 100}(R) and σ_{rating > 5}(S).]

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for this query, annotated for pipelined execution. The first tree (selection above the join) carries two "on the fly" annotations; the second tree (selections below the join) carries one.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved). Example:

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (note that in this case the data is also sorted). Example:

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data). Example:

σ_{DNO = 5}(EMPLOYEE)
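
A few of these search methods can be sketched with Python's standard `bisect` module, which performs binary search on a sorted list. The file is modeled as a list of records sorted on the key, with the key values extracted into a parallel list; the DEPARTMENT data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def linear_search(records, key_of, value):
    """Scan every record and test the condition."""
    return [rec for rec in records if key_of(rec) == value]

def binary_search_eq(sorted_records, keys, value):
    """Equality search on the ordering key; at most one record."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return sorted_records[i]
    return None

def range_greater_than(sorted_records, keys, value):
    """Locate the boundary, then return all subsequent records."""
    return sorted_records[bisect_right(keys, value):]

# Hypothetical DEPARTMENT(Dnumber, Dname), ordered on Dnumber.
departments = [(n, f"Dept{n}") for n in range(1, 9)]
keys = [d[0] for d in departments]

by_scan = linear_search(departments, lambda d: d[0], 5)
by_bisect = binary_search_eq(departments, keys, 5)
above_five = range_greater_than(departments, keys, 5)   # DNUMBER > 5
```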

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition. The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
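
Both selection strategies can be sketched with in-memory indexes that map an attribute value to a set of record positions, playing the role of record pointers. The EMPLOYEE rows, the indexed attributes, and the helper name are assumptions for the example.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record ids (pointers)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

employees = [
    {"name": "Smith", "dno": 5, "sex": "M"},
    {"name": "Wong", "dno": 5, "sex": "M"},
    {"name": "Zelaya", "dno": 4, "sex": "F"},
    {"name": "Wallace", "dno": 4, "sex": "F"},
    {"name": "Borg", "dno": 1, "sex": "M"},
]
dno_index = build_index(employees, "dno")
sex_index = build_index(employees, "sex")

# Conjunctive selection dno = 5 AND name = 'Wong': only dno is indexed,
# so use the index for dno and test the other condition on the result.
candidates = dno_index.get(5, set())
conjunctive = [employees[rid]["name"] for rid in sorted(candidates)
               if employees[rid]["name"] == "Wong"]

# Disjunctive selection dno = 1 OR sex = 'F': both conditions have an
# index, so take the union of the record pointers.
pointers = dno_index.get(1, set()) | sex_index.get("F", set())
disjunctive = [employees[rid]["name"] for rid in sorted(pointers)]
```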

Query Optimization: JOIN Operation

Nested loop: To compute R ⋈_{A=B} S, for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.



S

Sname

Status

SCity

S

P

QTY

S1

Smith

20

London

S1

P1

300

S1

Smith

20

London

S1

P2

200

S1

Smith

20

London

S1

P3

400

S1

Smith

20

London

S1

P4

200

S1

Smith

20

London

S1

P5

100

S1

Smith

20

London

S1

P6

100

S2

Jones

10

Paris

S2

P1

300

S2

Jones

10

Paris

S2

P2

400

(

(

S

P

QTY

S1

P1

300

S1

P2

200

S1

P3

400

S1

P4

200

S1

P5

100

S1

P6

100

(

(

(

S

Sname

Status

City

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

28

Query Optimization: A Simple Example

Relation S:

S#   Sname   Status   City
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Relation SP:

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
…    …    …

Query Optimization: A Simple Example

Get the names of suppliers who supply part P2:

    SELECT DISTINCT Sname
    FROM S, SP
    WHERE S.S# = SP.S#
      AND SP.P# = 'P2'

Suppose that the cardinalities of S and SP are 100 and 10,000, respectively. Furthermore, assume that 50 tuples in SP are for part P2.

Query Optimization: A Simple Example

Query tree for the straightforward evaluation, root first:

Π_{Sname}
    σ_{S.S# = SP.S# and SP.P# = 'P2'}
        ×
            S
            SP

Query Optimization: A Simple Example

A ⋈_{S.S# = SP.S#} B:

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…    …       …        …        …    …    …

Query Optimization: A Simple Example

Without an optimizer, the system will:
  • Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, which is too large to be kept in main memory.
  • Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
  • Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand, will:
  • Restrict SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
  • Join the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
  • Project the result of the last operation over Sname.

Query tree for the optimized evaluation, root first:

Π_{Sname}
    ⋈_{S.S# = SP.S#}
        σ_{SP.P# = 'P2'}
            SP
        S

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, sounds like a good heuristic.
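The comparison above can be tried on synthetic data. The following is only a sketch: the relation contents, the helper names (`make_relations`, `product_then_restrict`, `restrict_then_join`), and the choice to count intermediate tuples rather than disk I/Os are assumptions made for illustration; only the cardinalities (100, 10,000, and 50) come from the slides.

```python
from itertools import product

def make_relations(n_suppliers=100, n_shipments=10_000, n_p2=50):
    """Build synthetic S(S#, Sname) and SP(S#, P#, QTY) with the slide's cardinalities."""
    S = [(f"S{i}", f"Name{i}") for i in range(1, n_suppliers + 1)]
    SP = []
    for k in range(n_shipments):
        part = "P2" if k < n_p2 else f"P{3 + k % 7}"   # exactly n_p2 tuples are for P2
        SP.append((f"S{k % n_suppliers + 1}", part, 100))
    return S, SP

def product_then_restrict(S, SP):
    """Plan 1: Cartesian product, then restriction, then projection on Sname."""
    examined = 0
    names = set()
    for s, sp in product(S, SP):          # one product tuple per (s, sp) pair
        examined += 1
        if s[0] == sp[0] and sp[1] == "P2":
            names.add(s[1])
    return names, examined

def restrict_then_join(S, SP):
    """Plan 2: restrict SP first, then join with S, then project on Sname."""
    p2_tuples = [sp for sp in SP if sp[1] == "P2"]   # single scan of SP
    name_of = dict(S)                                 # S# -> Sname lookup
    names = {name_of[sp[0]] for sp in p2_tuples if sp[0] in name_of}
    return names, len(p2_tuples)

S, SP = make_relations()
names1, product_size = product_then_restrict(S, SP)
names2, restricted_size = restrict_then_join(S, SP)
print(names1 == names2, product_size, restricted_size)
```

Both plans return the same set of names; what differs is the size of the intermediate relation each one has to produce.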

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

Π_{Sname}(σ_{P# = "P2"}(S ⋈_{S.S# = SP.S#} SP))

Query tree, root first:

Result
    Project (Sname)
        Restrict (SP.P# = 'P2')
            Join (S.S# = SP.S#)
                S
                SP

Optimization Process

Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join because
  • it reduces the size of the input to the join operation, and
  • it reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form. A condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true"; consequently, it evaluates to "false" as soon as any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
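The equivalence used above can be checked exhaustively, since three Boolean variables give only eight cases. The sketch below is illustrative; the function names and the sample row with `bid` and `rating` fields are assumptions, not part of the slides.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

def equivalent(f, g, n_vars=3):
    """True if f and g agree on every truth assignment of n_vars variables."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))

def eval_cnf(conjuncts, row):
    """Evaluate a condition given as a list of conjuncts.

    all() stops at the first conjunct that is false, which is the
    early-exit property the general rule relies on.
    """
    return all(conjunct(row) for conjunct in conjuncts)

# Hypothetical condition: bid = 100 AND rating > 5
conjuncts = [lambda row: row["bid"] == 100, lambda row: row["rating"] > 5]
print(equivalent(original, cnf))
print(eval_cnf(conjuncts, {"bid": 100, "rating": 7}))
```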

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

    selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For a search on equality on a key attribute, the selectivity is

    s = 1 / |r(R)|

For an attribute with i distinct values, assuming the values are uniformly distributed among the tuples, the selectivity is

    s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

    (1 / i) × |r(R)|
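The estimates above are simple enough to compute directly. In this sketch the function names are hypothetical, and the figure of 200 distinct part numbers is an assumption chosen so that the estimate agrees with the 50 tuples for part P2 used in the earlier example.

```python
def selectivity_key_equality(cardinality):
    """Equality on a key attribute: at most one tuple matches."""
    return 1 / cardinality

def selectivity_equality(distinct_values):
    """Equality on an attribute with i distinct values, assumed uniformly distributed."""
    return 1 / distinct_values

def estimated_matches(cardinality, distinct_values):
    """Estimated number of tuples satisfying the equality search: |r(R)| / i."""
    return cardinality * selectivity_equality(distinct_values)

# SP has 10,000 tuples; assume (hypothetically) 200 distinct part numbers.
print(selectivity_key_equality(100))      # key lookup in S, which has 100 tuples
print(estimated_matches(10_000, 200))     # estimated tuples per part number
```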

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

As query trees, root first:

σ_{θ0}
    ⋈_{θ}
        E1
        E2

is transformed into

⋈_{θ}
    σ_{θ0}
        E1
    E2

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

Set union:

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

Set intersection:

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
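Rule 12 can be spot-checked with Python sets standing in for relations. This is a sketch on one small, made-up instance (the sets `E1`, `E2` and the predicate `p` are assumptions); agreeing on a single instance illustrates the identities but does not prove them.

```python
def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0        # hypothetical selection predicate

# Selection distributes over union, intersection, and difference.
distributes = (
    select(p, E1 | E2) == select(p, E1) | select(p, E2)
    and select(p, E1 & E2) == select(p, E1) & select(p, E2)
    and select(p, E1 - E2) == select(p, E1) - select(p, E2)
)

# The one-sided form holds for intersection and difference ...
one_sided = (
    select(p, E1 & E2) == select(p, E1) & E2
    and select(p, E1 - E2) == select(p, E1) - E2
)

# ... but not for union: unselected tuples of E2 leak into the result.
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2

print(distributes, one_sided, union_one_sided)
```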

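Optimization Process

Rule 13: The projection operation distributes over the set union operation.

Π_{L}(E1 ∪ E2) = (Π_{L}(E1)) ∪ (Π_{L}(E2))

Note: the corresponding identities for set intersection and set difference do not hold in general. For example, if E1 = {(1, a)}, E2 = {(1, b)}, and L is the first attribute, then Π_{L}(E1 ∩ E2) is empty while Π_{L}(E1) ∩ Π_{L}(E2) = {(1)}.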

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
  • the existence of indexes or other access paths (to reduce I/O cost),
  • the physical clustering of records (to reduce I/O cost), …
come into play.

So, in short, after scanning and parsing:
  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans". Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
  • heuristic rules
  • systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the intermediate result (relation) of each operation is generated and stored.
Pipelined evaluation: several operations are combined, so that the output of one is passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: whenever the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
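The two evaluation styles can be sketched with Python generators, which produce one tuple at a time on demand. Everything here is illustrative: the relations `r` and `s`, the join attribute `k`, and the function names are assumptions, and the join is a plain nested loop chosen for brevity.

```python
def join(r, s, key_r, key_s):
    """Yield joined tuples one at a time (simple nested loop)."""
    for tr in r:
        for ts in s:
            if tr[key_r] == ts[key_s]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Yield the projection of each incoming tuple on attrs."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)

r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"k": 10, "a2": "x"}, {"k": 20, "a2": "y"}, {"k": 30, "a2": "z"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s, "k", "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection;
# no intermediate relation is ever built.
pipelined = list(project(join(r, s, "k", "k"), ["a1", "a2"]))

print(materialized == pipelined, pipelined)
```

Both styles give the same answer; the pipelined form simply never holds the full join result in memory.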

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

    SELECT S.Sname
    FROM R, S
    WHERE R.Sid = S.Sid
      AND R.bid = 100 AND S.rating > 5

Plan 1:

Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid = Sid} S))

Π_{Sname}
    σ_{bid = 100 and rating > 5}
        ⋈_{Sid = Sid}
            R
            S

Plan 2:

Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))

Π_{Sname}
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

Plan 1, pipelined:

Π_{Sname}                             (on the fly)
    σ_{bid = 100 and rating > 5}      (on the fly)
        ⋈_{Sid = Sid}
            R
            S

Plan 2, pipelined:

Π_{Sname}                             (on the fly)
    ⋈_{Sid = Sid}
        σ_{bid = 100}
            R
        σ_{rating > 5}
            S

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
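The contrast between the two methods can be sketched as follows. The in-memory list of dictionaries stands in for a file, and the `employees` data and function names are made up for illustration; `bisect_left` is from Python's standard `bisect` module.

```python
from bisect import bisect_left

def linear_search(records, attr, value):
    """Scan every record; works on unordered files and on any attribute."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_records, keys, value):
    """Equality search on the ordering key.

    keys is the sorted list of (unique) key values, in the same order
    as sorted_records.
    """
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return sorted_records[i]
    return None

# Hypothetical file of 100 employee records, ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(100, 200)]
ssns = [e["ssn"] for e in employees]

print(binary_search(employees, ssns, 150))
print(linear_search(employees, "name", "emp150"))
```

The linear scan touches every record, while the binary search needs roughly log2(n) probes but only applies when the file is ordered on the searched key.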

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: if an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
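A minimal sketch of this technique, assuming equality conditions only and in-memory dictionaries standing in for indexes (record positions play the role of record pointers). The data, attribute names, and function names are hypothetical.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record positions holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality terms joined by OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one term has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of record pointers
    return [records[rid] for rid in sorted(rids)]

records = [
    {"dno": 5, "city": "Paris"},
    {"dno": 4, "city": "London"},
    {"dno": 5, "city": "London"},
    {"dno": 1, "city": "Athens"},
]
both = {"dno": build_index(records, "dno"), "city": build_index(records, "city")}
only_dno = {"dno": build_index(records, "dno")}
query = [("dno", 5), ("city", "London")]

print(disjunctive_select(records, both, query))
print(disjunctive_select(records, only_dno, query))   # forced linear scan
```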

Query Optimization: JOIN Operation

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

    for each tr ∈ r do begin
        for each ts ∈ s do begin
            if tr.A Θ ts.B is true then add tr || ts to the result
        end
    end

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns tuple comparisons. In the best case, both relations fit into main memory (the physical space), and hence we need bs + br block accesses.

If only one of the relations fits in main memory and it is used as the inner relation, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses:

    for each block Br of r do begin
        for each block Bs of s do begin
            for each tr ∈ Br do begin
                for each ts ∈ Bs do begin
                    if tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
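The block-access count can be confirmed by instrumenting a small block nested loop join. This is a sketch under simplifying assumptions: relations are Python lists of tuples, a "block" is a fixed-size slice, and each block of s is counted as re-read for every block of r. The data and the names `to_blocks` and `block_nested_loop_join` are made up for illustration.

```python
def to_blocks(tuples, per_block):
    """Split a relation into blocks of at most per_block tuples."""
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    result, block_accesses = [], 0
    for r_block in r_blocks:
        block_accesses += 1               # read one block of r
        for s_block in s_blocks:
            block_accesses += 1           # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)   # tr || ts
    return result, block_accesses

r = [(i,) for i in range(20)]             # 20 tuples, 5 per block -> br = 4
s = [(i % 10, i) for i in range(30)]      # 30 tuples, 10 per block -> bs = 3
r_blocks, s_blocks = to_blocks(r, 5), to_blocks(s, 10)

result, accesses = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0])
print(accesses, len(r_blocks) * len(s_blocks) + len(r_blocks), len(result))
```

With br = 4 and bs = 3 the counter reaches br × bs + br, matching the formula above; a tuple-at-a-time nested loop over the same data would instead re-read s once per tuple of r.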

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s

Query Optimization: JOIN Operation

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
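The merge step can be sketched as below for an equality join. It assumes both inputs are already sorted on the join attribute and, as stated above, that each group of tuples sharing a join value fits in memory. The function name and sample tuples are illustrative.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join of two lists of tuples, both sorted on their join attribute."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with the same value against that group.
            while i < len(r) and r[i][key_r] == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```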

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.
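A minimal in-memory sketch of the idea, assuming an equality join and a fixed number of buckets; the bucket count, data, and function name are assumptions. Records that collide in a bucket without actually matching are filtered by the final comparison.

```python
def hash_join(r, s, key_r, key_s, n_buckets=8):
    """Hash both relations into the same buckets, then join bucket by bucket."""
    buckets = [([], []) for _ in range(n_buckets)]   # (r part, s part) per bucket
    for tr in r:                                     # single pass over r
        buckets[hash(tr[key_r]) % n_buckets][0].append(tr)
    for ts in s:                                     # single pass over s
        buckets[hash(ts[key_s]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        for tr in r_part:
            for ts in s_part:
                if tr[key_r] == ts[key_s]:           # same bucket is not enough
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (9, "z"), (5, "w")]
print(sorted(hash_join(r, s, 0, 0)))
```

Only records that land in the same bucket are ever compared, which is what keeps the number of comparisons well below nr × ns when the hash function spreads the values evenly.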

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
      AND MGRSSN = SSN
      AND Plocation = 'California'

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

The corresponding query tree, root first:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
        ×
            Project
            ×
                Department
                Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
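The figures quoted above follow from multiplying the cardinalities and adding the tuple sizes. A short check, using only the numbers given on the slide (the variable names are illustrative):

```python
from math import prod

tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other: cardinalities multiply.
product_tuples = prod(tuples.values())
# Each result tuple is the concatenation of one tuple from each relation.
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, total_bytes)
```

That is 10,000,000 tuples of 300 bytes each, about 3 GB of intermediate data for relations whose combined size is well under 1 MB.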

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

The corresponding query tree, root first:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MGRSSN = SSN}
        ⋈_{Dnum = Dnumber}
            σ_{Plocation = "California"}
                Project
            Department
        Employee

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

S

Sname

Status

SCity

S

P

QTY

S1

Smith

20

London

S1

P1

300

S1

Smith

20

London

S1

P2

200

S1

Smith

20

London

S1

P3

400

S1

Smith

20

London

S1

P4

200

S1

Smith

20

London

S1

P5

100

S1

Smith

20

London

S1

P6

100

S2

Jones

10

Paris

S2

P1

300

S2

Jones

10

Paris

S2

P2

400

(

(

S

P

QTY

S1

P1

300

S1

P2

200

S1

P3

400

S1

P4

200

S1

P5

100

S1

P6

100

(

(

(

S

Sname

Status

City

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

29

Query Optimization mdash A Simple ExampleGet names of suppliers who supply part P2

SELECT DISTINCT SnameFROM S SPWHERE SS = SPSAND SPP = lsquoP2rsquo

Suppose that the cardinality of S and SP are 100and 10000 respectively Furthermore assume50 tuples in SP are for part P2

Database Systems

Query Optimization mdash A Simple Example

Database Systems

30

S SP

times

σ(SS = SPS and SPP = lsquoP2rsquo)

ΠSname

31

Query Optimization mdash A Simple Example

S Sname Status SCity S P QTY S1 Smith 20 London S1 P1 300 S1 Smith 20 London S1 P2 200 S1 Smith 20 London S1 P3 400 S1 Smith 20 London S1 P4 200 S1 Smith 20 London S1 P5 100 S1 Smith 20 London S1 P6 100 S2 Jones 10 Paris S2 P1 300 S2 Jones 10 Paris S2 P2 400 bull bull

A SS=SPS B

Database Systems

32

Query Optimization mdash A Simple ExampleWithout an optimizer the system willGenerates Cartesian product of S and SP This will

generate a relation of size 1000000 tuples mdash Toolarge to be kept in the main memoryRestricts results of previous step as specified by

WHERE clause This means reading 1000000tuples of which 50 will be selectedProjects the result of previous step over Sname to

produce the final result

Database Systems

33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

Optimization Process

General rule: A sequence of projections can be transformed into a single projection.

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process

Rule 2: The selection operation is commutative:

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections):

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
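Rules 1 to 3 can be checked on a small relation. The sketch below is illustrative only: the relation `E`, the predicates `theta1` and `theta2`, and the helpers `select` and `project` are hypothetical, and the projection check assumes, as Rule 3 implicitly does, that the outer attribute list is contained in the inner one.

```python
# A small relation with attributes a, b, c (illustrative data).
E = [{"a": a, "b": b, "c": a + b} for a in range(4) for b in range(4)]

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection with duplicate elimination (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[x] for x in attrs)
        if key not in seen:
            seen.add(key)
            out.append({x: t[x] for x in attrs})
    return out

def theta1(t):
    return t["a"] > 1

def theta2(t):
    return t["b"] < 2

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(E, lambda t: theta1(t) and theta2(t)) == select(select(E, theta2), theta1)
# Rule 2: the order of the two selections does not matter.
rule2 = select(select(E, theta2), theta1) == select(select(E, theta1), theta2)
# Rule 3: only the outermost projection matters (["a"] is a subset of ["a", "c"]).
rule3 = project(project(E, ["a", "c"]), ["a"]) == project(E, ["a"])
```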

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process

Rule 5: The theta join operation is commutative:

E1 ⋈_θ E2 = E2 ⋈_θ E1

[Figure: the join tree ⋈_θ with children E1, E2 and the equivalent tree with children E2, E1.]

Optimization Process

Rule 6: Natural join is associative:

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: the join tree that joins E1 and E2 first and then E3, and the equivalent tree that joins E2 and E3 first and then E1.]

Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.
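The join rules (Rules 4, 5 and 6) can be tried out on toy data. This is a sketch under assumptions: relations are lists of dictionaries, results are compared as sets so that tuple and attribute order do not matter, and all names (`cartesian`, `theta_join`, `natural_join`, `as_set`, the sample relations) are hypothetical.

```python
def as_set(rel):
    """Order-insensitive view of a relation, used to compare results."""
    return {frozenset(t.items()) for t in rel}

def cartesian(r, s):
    return [{**t, **u} for t in r for u in s]

def select(rel, pred):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    return [{**t, **u} for t in r for u in s if theta(t, u)]

def natural_join(r, s):
    """Join on every attribute name the two relations have in common."""
    out = []
    for t in r:
        for u in s:
            if all(t[k] == u[k] for k in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

R = [{"a": 1}, {"a": 2}, {"a": 3}]
S = [{"b": 2}, {"b": 3}]

def less(t, u):
    """Join condition theta: R.a < S.b."""
    return t["a"] < u["b"]

# Rule 4: selection over a Cartesian product is a theta join.
rule4 = as_set(select(cartesian(R, S), lambda t: t["a"] < t["b"])) == as_set(theta_join(R, S, less))
# Rule 5: swapping the operands (and the argument order of theta) gives the same tuples.
rule5 = as_set(theta_join(R, S, less)) == as_set(theta_join(S, R, lambda u, t: less(t, u)))

E1 = [{"a": 1, "x": 1}, {"a": 2, "x": 2}, {"a": 3, "x": 2}]
E2 = [{"x": 1, "b": 5}, {"x": 2, "b": 6}]
E3 = [{"b": 5, "c": 0}, {"b": 6, "c": 1}, {"b": 7, "c": 2}]
# Rule 6: natural join is associative.
rule6 = as_set(natural_join(natural_join(E1, E2), E3)) == as_set(natural_join(E1, natural_join(E2, E3)))
```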

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples that satisfy the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities as part of the statistical data to aid query optimization.

For an equality search on a key attribute, at most one tuple qualifies, so

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed among the tuples), the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the estimated number of tuples that satisfy an equality search is

|r(R)| / i
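The selectivity estimates can be compared with an exact count on a small example. The relation `r` below (1,000 tuples, a unique `id` and a `dept` attribute with 10 evenly spread values) and the helper `selectivity` are assumptions made for this sketch only.

```python
# Hypothetical relation: 1,000 tuples, unique id, dept drawn evenly from 10 values.
r = [{"id": n, "dept": n % 10} for n in range(1000)]

def selectivity(rel, pred):
    """Fraction of the tuples in rel that satisfy pred."""
    return sum(1 for t in rel if pred(t)) / len(rel)

# Equality on a key attribute: exactly one tuple qualifies, so s = 1 / |r(R)|.
s_key = selectivity(r, lambda t: t["id"] == 42)

# Equality on an attribute with i distinct values, assumed uniform: s = 1 / i.
i = len({t["dept"] for t in r})
estimated_tuples = len(r) / i
actual_tuples = sum(1 for t in r if t["dept"] == 3)
```

With evenly spread values the estimate matches the actual count; with skewed data the 1/i estimate can be far off, which is why real optimizers keep richer statistics.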

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Figure: Rule 8 as query trees. On the left, σ_{θ0} is applied above the join ⋈_θ of E1 and E2; on the right, σ_{θ0} is applied to E1 below the join.]

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process

Rule 10: Set union and set intersection operations are commutative:

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process

Rule 11: Set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection and set difference operations.

Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
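The Rule 12 identities can be checked with Python sets. The relations `E1` and `E2`, the predicate `p` and the helper `sigma` are hypothetical and exist only for this sketch.

```python
# Two union-compatible relations as sets of tuples (illustrative data).
E1 = {(1, "x"), (2, "y"), (3, "z"), (4, "w")}
E2 = {(2, "y"), (3, "z"), (5, "v")}

def sigma(rel, pred):
    """Selection over a set of tuples."""
    return {t for t in rel if pred(t)}

def p(t):
    """Keep tuples whose first attribute is even."""
    return t[0] % 2 == 0

diff_both = sigma(E1 - E2, p) == sigma(E1, p) - sigma(E2, p)
diff_left = sigma(E1 - E2, p) == sigma(E1, p) - E2
union_both = sigma(E1 | E2, p) == sigma(E1, p) | sigma(E2, p)
# Selecting on only one operand of a union is NOT equivalent.
union_left = sigma(E1 | E2, p) == sigma(E1, p) | E2
inter_both = sigma(E1 & E2, p) == sigma(E1, p) & sigma(E2, p)
inter_left = sigma(E1 & E2, p) == sigma(E1, p) & E2
```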

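Optimization Process

Rule 13: The projection operation distributes over set union:

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

The analogous identities for set intersection and set difference do not hold in general. Two different tuples can agree on the attributes in L, so only the following containments are guaranteed:

Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) ⊇ Π_L(E1) − Π_L(E2)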
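A small experiment helps to see how projection interacts with the set operations. On the hypothetical data below (relations `E1`, `E2` and the helper `pi_first` are assumptions of this sketch), only the union identity holds, which suggests that the intersection and difference forms need extra conditions, such as L containing a key.

```python
# Two union-compatible relations; the first attribute plays the role of L.
E1 = {(1, "a"), (2, "b")}
E2 = {(1, "c"), (3, "d")}

def pi_first(rel):
    """Project a set of tuples onto its first attribute."""
    return {(t[0],) for t in rel}

union_ok = pi_first(E1 | E2) == pi_first(E1) | pi_first(E2)
# (1, "a") and (1, "c") are different tuples that agree on the projected attribute.
inter_ok = pi_first(E1 & E2) == pi_first(E1) & pi_first(E2)
diff_ok = pi_first(E1 - E2) == pi_first(E1) - pi_first(E2)
```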

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
  • the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph.
  • an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

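Definition

  • Materialized evaluation: the result of each operation is generated and stored as an intermediate relation before the next operation starts.
  • Pipelined evaluation: several operations are combined so that the tuples produced by one operation are passed directly to the next.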

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
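The two evaluation styles can be sketched with Python generators. This is an illustration, not how any particular engine works: the relations `r` and `s`, the join attribute `k` and the functions `join` and `project` are hypothetical, and the projection here does not eliminate duplicates.

```python
def join(r, s):
    """Generator: yields each joined tuple as soon as it is found."""
    for t in r:
        for u in s:
            if t["k"] == u["k"]:
                yield {**t, **u}

def project(tuples, attrs):
    """Generator: projects tuples one at a time (no duplicate elimination)."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized: the whole join result is stored before the projection starts.
temp = list(join(r, s))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both strategies produce the same answer; the pipelined version simply never builds the intermediate list `temp`.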

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

[Figure: query tree for this expression. R and S are joined on Sid = Sid, then σ_{bid = 100 ∧ rating > 5} is applied, then Π_{Sname}.]

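Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

[Figure: query tree for this expression. σ_{bid = 100} is applied to R and σ_{rating > 5} to S, the results are joined on Sid = Sid, then Π_{Sname} is applied.]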

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, annotated with "On the fly" markers on the operators that are applied to tuples as they are produced.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
  • Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
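The difference between the two search methods can be sketched as follows. The in-memory list `records` stands in for a file ordered on its key, and `linear_search` and `binary_search` are hypothetical helper names; a real system would count block accesses rather than records examined.

```python
from bisect import bisect_left

# A "file" ordered on the key attribute (hypothetical records: 500 even keys).
records = [(k, f"name{k}") for k in range(0, 1000, 2)]
keys = [k for k, _ in records]

def linear_search(key):
    """Scan the file in order; returns (record or None, records examined)."""
    for n, rec in enumerate(records, start=1):
        if rec[0] == key:
            return rec, n
    return None, len(records)

def binary_search(key):
    """Equality search on the ordering key using bisection."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return records[pos]
    return None
```

For the last key in the file, the linear search examines all 500 records, while bisection needs only about log2(500), i.e. roughly 9, probes.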

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or the preceding records, for < and ≤). Note that in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then check each retrieved record against the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
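The union-of-pointers idea can be sketched with in-memory indexes. Everything here is assumed for illustration: the `employee` list stands in for a file, a record pointer is simply a list position, and `build_index` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file; a record pointer is the position in this list.
employee = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 25000},
    {"ssn": 4, "dno": 1, "salary": 20000},
]

def build_index(rel, attr):
    """Secondary index: attribute value -> set of record pointers."""
    index = defaultdict(set)
    for pointer, t in enumerate(rel):
        index[t[attr]].add(pointer)
    return index

dno_index = build_index(employee, "dno")
salary_index = build_index(employee, "salary")

# sigma_{dno = 5 OR salary = 40000}: union the pointer sets, then fetch each record once.
pointers = dno_index.get(5, set()) | salary_index.get(40000, set())
result = [employee[p] for p in sorted(pointers)]
```

Taking the union of pointers before fetching means a record that satisfies several of the conditions is still retrieved only once.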

Query Optimization: JOIN Operation

R ⋈_{A = B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
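The pseudocode above translates almost line for line into Python. In this sketch tuples are plain Python tuples, `a` and `b` are the positions of the join attributes, and Θ is passed in as a comparison function; the name `nested_loop_join` and the sample data are hypothetical.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Nested loop join of r and s on r[a] theta s[b]."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # test the join condition
                result.append(tr + ts)  # tr || ts: concatenate the tuples
    return result

r = [(1, "x"), (2, "y")]
s = [(1, 100), (1, 101), (3, 300)]

equi = nested_loop_join(r, s, 0, 0)                    # r.A = s.B
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)  # r.A < s.B
```

Because the condition is just a function, the same loop handles equality and any other comparison, which is why nested loop join works regardless of the join condition.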

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines nr × ns pairs of tuples. In the best-case scenario, both relations fit into main memory and hence we need only bs + br block accesses.

If just one of the relations fits in main memory, the cost is still bs + br block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop join, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop join?
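The block access count can be confirmed by simulating the algorithm. This is a sketch under assumptions: a block is modelled as a slice of a Python list, one "read" is counted each time a block is brought in, and the names `blocks` and `block_nested_loop_join` are hypothetical.

```python
def blocks(rel, size):
    """Split a relation into blocks of at most `size` tuples."""
    return [rel[i:i + size] for i in range(0, len(rel), size)]

def block_nested_loop_join(r, s, block_size):
    """Equijoin on the first attribute; also counts simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, reads = [], 0
    for Br in r_blocks:
        reads += 1                    # read one block of r
        for Bs in s_blocks:
            reads += 1                # read one block of s
            for tr in Br:
                for ts in Bs:
                    if tr[0] == ts[0]:
                        result.append(tr + ts)
    return result, reads

r = [(i, "r") for i in range(40)]          # 40 tuples: br = 4 blocks of 10
s = [(i, "s") for i in range(0, 60, 2)]    # 30 tuples: bs = 3 blocks of 10
joined, reads = block_nested_loop_join(r, s, 10)
```

Here br × bs + br = 4 × 3 + 4 = 16 simulated reads, whereas reading s once per tuple of r would cost nr × bs + br = 40 × 3 + 4 = 124.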

Query Optimization: JOIN Operation

r ⋈_{A = B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
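A minimal sketch of this technique, assuming an in-memory dictionary plays the role of the index on s.B; `build_index` and `index_join` are hypothetical names and the data is made up.

```python
from collections import defaultdict

def build_index(s, b):
    """Hash index on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    return [tr + ts for tr in r for ts in s_index.get(tr[a], [])]

r = [(1, "x"), (2, "y"), (4, "z")]
s = [(1, 10), (2, 20), (2, 21)]
s_index = build_index(s, 0)
joined = index_join(r, s_index, 0)
```

Each tuple of r costs one index probe rather than a full pass over s, which is where the saving over the plain nested loop comes from.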

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
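The merge step can be sketched as below, assuming both inputs are Python lists already sorted on the join attribute. The function name `merge_join` and the sample relations are hypothetical; the group of equal-valued s tuples is held in memory, matching the assumption stated above.

```python
def merge_join(r, s, a=0, b=0):
    """Equijoin of r and s, both already sorted on the join attribute."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            # Collect the group of s tuples that share this join value.
            value, start = r[i][a], j
            while j < len(s) and s[j][b] == value:
                j += 1
            # Pair every r tuple having that value with the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[start:j])
                i += 1
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s)
```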

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
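A simplified in-memory sketch of the idea follows. It assumes everything fits in memory, uses Python's built-in `hash` as the hashing function, and the names `hash_join` and `n_buckets` are hypothetical.

```python
from collections import defaultdict

def hash_join(r, s, a=0, b=0, n_buckets=8):
    """Equijoin: hash both relations into the same buckets, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                    # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                    # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:   # different values can share a bucket
                    result.append(tr + ts)
    return result

r = [(1, "a"), (9, "b"), (2, "c")]
s = [(1, "x"), (9, "y"), (3, "z")]
joined = sorted(hash_join(r, s))
```

Keys 1 and 9 land in the same bucket when there are 8 buckets, so the equality test inside the bucket is still needed to avoid false matches.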

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
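Both duplicate-elimination strategies can be sketched in a few lines. The relation `employee` and the functions `project_sort` and `project_hash` are hypothetical names used only for this illustration.

```python
def project_sort(rel, attrs):
    """Project, sort the rows, then drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in rel)
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out

def project_hash(rel, attrs):
    """Project and keep each distinct row once, using a hash set."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

employee = [
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 4},
    {"ssn": 3, "dno": 5},
]
```

Projecting on `ssn` (a key) cannot create duplicates, while projecting on `dno` does, so only the second case needs the elimination step.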

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection and difference operations.

Questions

  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree for the expression above. Department × Employee is computed first, then its product with Project; σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN} is applied to the result, and Π_{Pnumber, Dnum, Lname, Address, Bdate} is applied at the root.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department and Employee relations have tuple sizes of 100, 50 and 150 bytes and contain 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
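The figures quoted above follow directly from the stated cardinalities and tuple sizes. The short calculation below spells them out; the dictionary names are chosen only for this sketch.

```python
# Cardinalities and tuple sizes (bytes) as stated in the example.
tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = 1
for n in tuples.values():
    product_tuples *= n
product_tuple_bytes = sum(tuple_bytes.values())

# Total size of the intermediate relation in bytes.
total_bytes = product_tuples * product_tuple_bytes
```

That is 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or about 3 GB of intermediate data before the selection is applied.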

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

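Query Optimization: An Example

[Figure: query tree for the expression above. σ_{Plocation = 'California'} is applied to Project, the result is joined with Department on Dnum = Dnumber, that result is joined with Employee on MGRSSN = SSN, and Π_{Pnumber, Dnum, Lname, Address, Bdate} is applied at the root.]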


Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
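The two evaluation styles for Π_{a1, a2}(r ⋈ s) can be sketched in Python as follows. This is only an illustration: the relations are assumed to be lists of dictionaries, the join is assumed to be an equijoin on one shared attribute, and the names join, project, materialized, and pipelined are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def join(r: Iterable[Row], s: List[Row], attr: str) -> Iterator[Row]:
    """Yield joined tuples one at a time (equijoin on one shared attribute)."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}

def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Keep only the listed attributes of each incoming tuple."""
    for row in rows:
        yield {a: row[a] for a in attrs}

def materialized(r, s, attr, attrs) -> List[Row]:
    temp = list(join(r, s, attr))      # the whole intermediate relation is stored
    return list(project(temp, attrs))

def pipelined(r, s, attr, attrs) -> List[Row]:
    # each joined tuple flows straight into project; nothing is stored in between
    return list(project(join(r, s, attr), attrs))

# Hypothetical sample data, not taken from the slides.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 3, "a2": 30}]
```

Both functions return the same tuples; the difference is only whether the join result exists as a stored relation before the projection starts.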

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

Query tree (root first):
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid = Sid} (σ_{rating > 5} S))

Query tree (root first):
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

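First plan (query tree, root first):
Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S
The figure marks two operators of this plan as "On the fly".

Second plan (query tree, root first):
Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
The figure marks one operator of this plan as "On the fly".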

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search Methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search Methods for Selection
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
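As a rough illustration of these two search methods, the sketch below treats a file as a Python list of (key, payload) records. The record layout and the function names are assumptions made for this example only.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (key attribute, payload); hypothetical layout

def linear_search(file: List[Record], key: int) -> List[Record]:
    """Scan every record and test the selection condition key = value."""
    return [rec for rec in file if rec[0] == key]

def binary_search(sorted_file: List[Record], key: int) -> Optional[Record]:
    """Equality search on a key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_file) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_file[mid][0]
        if current == key:
            return sorted_file[mid]
        if current < key:
            lo = mid + 1      # the match, if any, lies in the upper half
        else:
            hi = mid - 1      # the match, if any, lies in the lower half
    return None

# Hypothetical file, ordered on the key attribute.
ordered_file = [(1, "a"), (3, "b"), (5, "c"), (7, "d")]
```

The linear search inspects all records, while the binary search inspects about log2(n) of them, which is why the ordering of the file matters for the choice of method.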

Optimization Process: Search Methods for Selection
Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search Methods for Selection
Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search Methods for Selection
Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search Methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search Methods for Selection
Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search Methods for Selection
Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
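The union-of-pointers idea can be sketched as follows. The sketch assumes in-memory dictionaries as stand-ins for indexes, list positions as stand-ins for record pointers, and equality terms only; the relation contents and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, int]

# Hypothetical EMPLOYEE tuples.
employees: List[Row] = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 40000},
    {"ssn": 3, "dno": 5, "salary": 40000},
    {"ssn": 4, "dno": 1, "salary": 55000},
]

def build_index(relation: List[Row], attr: str) -> Dict[int, Set[int]]:
    """Map each attribute value to the set of positions ("pointers") holding it."""
    index: Dict[int, Set[int]] = {}
    for pos, row in enumerate(relation):
        index.setdefault(row[attr], set()).add(pos)
    return index

def disjunctive_select(relation, indexes, conditions: List[Tuple[str, int]]):
    """Evaluate attr1 = v1 OR attr2 = v2 OR ... over the relation."""
    if any(attr not in indexes for attr, _ in conditions):
        # One term has no access path, so a linear scan is unavoidable.
        return [row for row in relation
                if any(row[a] == v for a, v in conditions)]
    pointers: Set[int] = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of record pointers
    return [relation[p] for p in sorted(pointers)]

indexes = {"dno": build_index(employees, "dno"),
           "salary": build_index(employees, "salary")}
```

With indexes on both attributes the result comes from the pointer union; if the salary index is dropped, the same call falls back to the linear scan and still returns the same tuples.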

Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume that b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
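A direct Python rendering of the nested loop algorithm might look like the sketch below. It assumes relations are lists of dictionaries with disjoint attribute names, models tr || ts as a dictionary merge, and passes the comparison Θ as a function; the function name and sample data are hypothetical.

```python
import operator
from typing import Callable, Dict, List

Row = Dict[str, object]

def nested_loop_join(r: List[Row], s: List[Row], a: str, b: str,
                     theta: Callable[[object, object], bool] = operator.eq) -> List[Row]:
    """Theta join of r and s on r.a Θ s.b, comparing every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # tr || ts
    return result

# Hypothetical sample relations.
r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}]
s = [{"B": 2, "y": "u"}, {"B": 3, "y": "v"}]
```

Passing operator.lt instead of the default operator.eq turns the equijoin into the theta join r.A < s.B, which is the flexibility that the more specialized join methods give up.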

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is n_r × n_s tuple comparisons.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.
If only one of the relations fits in the physical space, the cost is still b_s + b_r block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve the block nested loop?
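The block access count b_r × b_s + b_r can be checked with a small simulation. The sketch assumes each relation is given as a list of blocks, each block being a list of dictionaries, and it counts one block access per block brought into the loop; the names and data are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Block = List[Row]

def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block],
                           a: str, b: str,
                           theta: Callable[[int, int], bool] = operator.eq
                           ) -> Tuple[List[Row], int]:
    """Return the join result together with the number of block accesses."""
    result: List[Row] = []
    block_accesses = 0
    for block_r in r_blocks:
        block_accesses += 1               # read one block of r
        for block_s in s_blocks:
            block_accesses += 1           # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append({**tr, **ts})
    return result, block_accesses

# Hypothetical layout: r has 3 blocks of one tuple, s has 2 blocks of two tuples.
r_blocks = [[{"A": 0}], [{"A": 1}], [{"A": 2}]]
s_blocks = [[{"B": 0}, {"B": 1}], [{"B": 2}, {"B": 5}]]
joined, accesses = block_nested_loop_join(r_blocks, s_blocks, "A", "B")
```

Here b_r = 3 and b_s = 2, so the counter ends at 3 × 2 + 3 = 9, in line with the formula on the slide.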

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A = B} s
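A minimal sketch of this index-based join is shown below. It assumes the access structure on s.B is an in-memory dictionary that already exists when the join starts; build_index is included only to create that stand-in, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

def build_index(s: List[Row], b: str) -> Dict[object, List[Row]]:
    """Stand-in for an existing index or hash key on attribute b of s."""
    index: Dict[object, List[Row]] = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r: List[Row], s_index: Dict[object, List[Row]], a: str) -> List[Row]:
    """For each tr in r, probe the access structure for matching s tuples."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching records are touched
            result.append({**tr, **ts})
    return result

# Hypothetical sample relations.
r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}, {"A": 2, "x": "w"}]
s = [{"B": 2, "y": "u"}, {"B": 3, "y": "v"}]
s_index = build_index(s, "B")
```

Unlike the nested loop, the inner relation is never scanned in full; each outer tuple costs one probe plus the retrieval of its matches.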

Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
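The merge step can be sketched as below, assuming both inputs are Python lists already sorted on their join attributes and that a group of tuples sharing one join value can be held in memory, as the slide requires. The function name and sample data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]

def merge_join(r: List[Row], s: List[Row], a: str, b: str) -> List[Row]:
    """Equijoin of r (sorted on a) and s (sorted on b) in one pass over each."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            j_end = j                   # find the group of s tuples with this value
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

# Hypothetical sorted relations.
r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": "u"}, {"B": 2, "y": "v"}, {"B": 3, "y": "w"}, {"B": 4, "y": "z"}]
```

Neither pointer ever moves backwards past a group boundary, which is what keeps the block accesses at b_s + b_r.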

Query Optimization: JOIN Operation
Hash-join: the records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.

Database Systems
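The hash-join description above can be sketched as follows. The sketch assumes an in-memory list of buckets in place of a hash file, Python's built-in hash as the shared hashing function, and a hypothetical bucket count of 4.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]

def hash_join(r: List[Row], s: List[Row], a: str, b: str,
              n_buckets: int = 4) -> List[Row]:
    """Equijoin: hash both relations into the same buckets, then match per bucket."""
    buckets: List[Tuple[List[Row], List[Row]]] = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # same bucket does not imply same value
                    result.append({**tr, **ts})
    return result

# Hypothetical sample relations.
r = [{"A": i} for i in range(6)]
s = [{"B": i, "y": i * 10} for i in (1, 3, 5, 7)]
joined = hash_join(r, s, "A", "B")
```

The equality test inside each bucket is still needed because different join values can hash to the same bucket, which is why the slide speaks of a "possible" result per bucket.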

Query Optimization: Complex JOIN Operation
The nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
A join with a complex join condition (i.e., a conjunctive or disjunctive condition) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for the joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
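Both duplicate-elimination strategies can be sketched in a few lines. The sketch assumes tuples are dictionaries and that projected rows are represented as Python tuples; the function names and the sample relation are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]

def project_sort(relation: List[Row], attrs: List[str]) -> List[Tuple[int, ...]]:
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(row[a] for a in attrs) for row in relation)
    result: List[Tuple[int, ...]] = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result

def project_hash(relation: List[Row], attrs: List[str]) -> List[Tuple[int, ...]]:
    """Project and use a hash set to detect tuples that were already produced."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

# Hypothetical EMPLOYEE tuples; dno is not a key, so duplicates appear.
employees = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5}]
```

Projecting on ssn (a key) would never produce duplicates, so neither the sort nor the hash set would be needed in that case.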

Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example
Query tree (root first):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
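The 10 million figure follows directly from multiplying the cardinalities and adding the tuple widths. A short check, using only the numbers given on the slide (the variable names are hypothetical):

```python
# (number of tuples, tuple size in bytes) as stated on the slide
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
tuple_width = 0
for cardinality, width in relations.values():
    product_tuples *= cardinality   # cardinalities multiply under Cartesian product
    tuple_width += width            # tuple widths add up

total_bytes = product_tuples * tuple_width
```

This gives 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes, i.e. 3,000,000,000 bytes for the intermediate relation before any selection is applied.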

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example
Query tree (root first):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = 'California'}
        Project
      Department
    Employee



Query Optimization: A Simple Example

S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400
…

A ⋈_{S.S# = SP.S#} B

Query Optimization: A Simple Example
Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example
An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with 50 tuples.
Joins the result of the previous step with the S relation over S#. This involves the retrieval of only 100 tuples and the generation of a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query Optimization: A Simple Example
Query tree (root first):
Π_{Sname}
  ⋈_{S.S# = SP.S#}
    S
    σ_{SP.P# = 'P2'}
      SP

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing a restriction and then a join instead of doing a product and then a restriction, sounds like a good heuristic.

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:
Π_{Sname}(σ_{P# = 'P2'}(S ⋈_{S.S# = SP.S#} SP))

Optimization Process
Query tree (root first):
Result
  Project (Sname)
    Restrict (SP.P# = 'P2')
      Join (S.S# = SP.S#)
        S
        SP

Optimization Process
Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

Optimization Process
(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))
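The first transformation can be checked on a toy instance. The sketch below assumes relations are lists of dictionaries and uses a few supplier/shipment tuples in the spirit of the S and SP example; the helper names join and restrict, and the reduced attribute lists, are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def join(a: List[Row], b: List[Row], attr: str) -> List[Row]:
    """Equijoin of a and b on one shared attribute."""
    return [{**ta, **tb} for ta in a for tb in b if ta[attr] == tb[attr]]

def restrict(rel: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Keep only the tuples satisfying the restriction."""
    return [row for row in rel if pred(row)]

# Hypothetical, trimmed-down S and SP tuples.
S = [{"S#": "S1", "Sname": "Smith"}, {"S#": "S2", "Sname": "Jones"}]
SP = [{"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"},
      {"S#": "S2", "P#": "P1"}, {"S#": "S2", "P#": "P2"}]

def is_p2(row: Row) -> bool:
    return row["P#"] == "P2"

late = restrict(join(S, SP, "S#"), is_p2)      # (S Join SP) WHERE P# = 'P2'
early = join(S, restrict(SP, is_p2), "S#")     # S Join (SP WHERE P# = 'P2')

join_input_late = len(SP)                      # the join sees all of SP
join_input_early = len(restrict(SP, is_p2))    # the join sees only the P2 tuples
```

Both orders produce the same tuples, but the early restriction halves the input to the join in this instance.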

Optimization Process
General rule: it is a good idea to perform the restriction before the join, because:
it reduces the size of the input to the join operation;
it reduces the size of the output from the join.

Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)
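That this conversion preserves the meaning of the condition can be confirmed by enumerating all truth assignments. A brief check (the function names are hypothetical):

```python
from itertools import product

def original(p: bool, q: bool, r: bool) -> bool:
    return p or (q and r)

def conjunctive_normal_form(p: bool, q: bool, r: bool) -> bool:
    return (p or q) and (p or r)

# Compare the two conditions on all 8 combinations of truth values.
equivalent = all(
    original(p, q, r) == conjunctive_normal_form(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)
```

The two forms agree on every assignment, which is the distributive law of OR over AND.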

Optimization Process
General rule: transform a restriction condition into an equivalent condition in conjunctive normal form, because:
a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where the conjuncts can be evaluated in parallel.

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be transformed into a single projection.

Optimization Process
General rule: a restriction and projection can be converted into a projection and restriction.

Optimization Process
Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process
Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process
Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process
Rule 5: The theta join operation is commutative.

E1 ⋈_θ E2 = E2 ⋈_θ E1

(Figure: the two equivalent join trees, ⋈_θ over E1 and E2, and ⋈_θ over E2 and E1.)

Optimization Process
Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

(Figure: the two equivalent join trees, one joining E1 and E2 first, the other joining E2 and E3 first.)

Optimization Process
Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For a selection on a key attribute with an equality condition, the selectivity is

s = 1 / |r(R)|

The selectivity on an attribute with i distinct values, assuming the values are uniformly distributed, is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

|r(R)| / i
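These estimates are simple enough to express as small functions. The sketch below assumes uniformly distributed attribute values; the function names and the figures of 10,000 tuples and 50 distinct values are hypothetical and chosen only to make the arithmetic concrete.

```python
def selectivity_key_equality(cardinality: int) -> float:
    """Equality on a key attribute: at most one of |r(R)| tuples qualifies."""
    return 1 / cardinality

def selectivity_equality(distinct_values: int) -> float:
    """Equality on an attribute with i distinct, uniformly distributed values."""
    return 1 / distinct_values

def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Expected number of tuples satisfying the equality search: |r(R)| / i."""
    return cardinality / distinct_values

# Hypothetical relation: 10,000 tuples, attribute with 50 distinct values.
key_selectivity = selectivity_key_equality(10_000)
attribute_selectivity = selectivity_equality(50)
expected_tuples = estimated_matches(10_000, 50)
```

For this hypothetical relation, an equality search on the non-key attribute is expected to return 10,000 / 50 = 200 tuples, which is the size estimate the optimizer would carry forward to the next operation.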

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
when all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case),

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

Optimization Process
Rule 8:

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

Left-hand tree (root first):
σ_{θ0}
  ⋈_θ
    E1
    E2

Right-hand tree (root first):
⋈_θ
  σ_{θ0}
    E1
  E2

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
the join condition θ involves only attributes in L1 ∪ L2.

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

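Optimization Process
Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

Π_L(E1 − E2) = (Π_L(E1)) − (Π_L(E2))
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)

Note: only the union case holds in general. For intersection and difference, the two sides can differ when tuples of E1 and E2 agree on the attributes in L but differ elsewhere.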

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
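The two-pointer merge can be sketched as follows. The sketch assumes both inputs are Python lists already sorted on the join attribute, and that each group of equal-valued s tuples fits in memory, as the slide requires; `merge_join` is a hypothetical name.

```python
def merge_join(r, s, a, b):
    """Equi-join r and s; r is sorted on position a, s is sorted on position b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                       # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                       # advance the pointer into s
        else:
            key = r[i][a]
            j_end = j                    # find the group of s tuples with this key
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group, so each tuple is read once.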

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
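A small in-memory sketch of the idea, assuming Python's built-in `hash` as the shared hashing function and an arbitrary bucket count of 8. Both choices are illustrative assumptions.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join on r[a] = s[b] by partitioning both inputs into buckets."""
    buckets_r = defaultdict(list)
    buckets_s = defaultdict(list)
    for tr in r:                                   # single pass over r
        buckets_r[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:                                   # single pass over s
        buckets_s[hash(ts[b]) % n_buckets].append(ts)
    result = []
    for k, bucket in buckets_r.items():            # examine each bucket pair
        for tr in bucket:
            for ts in buckets_s.get(k, []):
                if tr[a] == ts[b]:                 # same bucket does not imply same value
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (10, "c")]
s = [(2, "x"), (10, "y"), (18, "z")]
hashed = hash_join(r, s, 0, 0)
```

The equality test inside the bucket is still needed, because different join values can land in the same bucket.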

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

A join with a complex join condition (i.e., a conjunctive or disjunctive condition) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈θ1∧θ2∧…∧θn s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈θ1 s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈θ1∨θ2∨…∨θn s

The join can be performed as the union of the tuples in the individual joins

r ⋈θi s,  i = 1, …, n
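Both strategies for complex join conditions can be sketched on toy data. The helper names are hypothetical, a plain nested loop stands in for whichever join technique applies to each simple condition, and each condition is modelled as a Python function of (tr, ts).

```python
def theta_join(r, s, theta):
    """Stand-in for any join technique that handles one simple condition."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]


def conjunctive_join(r, s, thetas):
    """Compute the join on thetas[0], then filter by the remaining conditions."""
    n = len(r[0]) if r else 0            # width of an r tuple, used to split tr || ts
    intermediate = theta_join(r, s, thetas[0])
    return [t for t in intermediate
            if all(th(t[:n], t[n:]) for th in thetas[1:])]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins; a set removes duplicate tuples."""
    out = set()
    for th in thetas:
        out |= set(theta_join(r, s, th))
    return out


r = [(1, 10), (2, 20)]
s = [(1, 10), (2, 99)]
th1 = lambda tr, ts: tr[0] == ts[0]
th2 = lambda tr, ts: tr[1] == ts[1]
both = conjunctive_join(r, s, [th1, th2])
either = disjunctive_join(r, s, [th1, th2])
```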

Query Optimization: Project Operation

A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then discarding adjacent duplicates, or by using a hashing technique.
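The two duplicate-elimination strategies can be sketched as below, assuming relations are lists of tuples and attributes are given by position. The function names are illustrative.

```python
from itertools import groupby


def project_sort(rows, attrs):
    """Project, sort, then keep one tuple from each run of equal tuples."""
    projected = sorted(tuple(row[i] for i in attrs) for row in rows)
    return [key for key, _ in groupby(projected)]


def project_hash(rows, attrs):
    """Project and use a hash set to skip tuples that were already emitted."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[i] for i in attrs)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


rows = [(1, "London"), (2, "Paris"), (3, "London")]
by_sort = project_sort(rows, [1])
by_hash = project_hash(rows, [1])
```

Projecting on position 0 (a key in this toy relation) would produce no duplicates, so neither step would be needed.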

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
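A sketch of the sort-then-scan approach, computing union, intersection, and difference in one merge pass. It assumes both relations are lists of comparable tuples; `sorted_set_ops` is a hypothetical name.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one scan over the sorted inputs."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                  # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:                # tuple only in s
            union.append(s[j])
            j += 1
        else:                            # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])                  # leftovers of r are only in r
    diff.extend(r[i:])
    union.extend(s[j:])                  # leftovers of s are only in s
    return union, inter, diff


u, n, d = sorted_set_ops([(3,), (1,), (2,)], [(2,), (4,), (3,)])
```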

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation='California' and Dnum=Dnumber and Mgr_ssn=Ssn (Project × (Department × Employee)))

Query Optimization: An Example

ΠPnumber, Dnum, Lname, Address, Bdate
  └─ σPlocation='California' and Dnum=Dnumber and Mgr_ssn=Ssn
       └─ ×
            ├─ Project
            └─ ×
                 ├─ Department
                 └─ Employee

Query Optimization: An Example

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
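The size estimate follows from multiplying the cardinalities and adding the tuple widths. The short calculation below uses only the figures from the slide; the dictionary layout is an illustrative choice.

```python
# (number of tuples, tuple size in bytes) as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
tuple_width = 0
for count, size in relations.values():
    product_tuples *= count      # cardinalities multiply under Cartesian product
    tuple_width += size          # tuple widths add up

total_bytes = product_tuples * tuple_width
```

This gives 100 × 20 × 5,000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or 3,000,000,000 bytes in total for the intermediate relation.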

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation='California' (Project)) ⋈Dnum=Dnumber Department) ⋈Mgr_ssn=Ssn Employee)

Query Optimization: An Example

ΠPnumber, Dnum, Lname, Address, Bdate
  └─ ⋈Mgr_ssn=Ssn
       ├─ ⋈Dnum=Dnumber
       │    ├─ σPlocation='California'
       │    │    └─ Project
       │    └─ Department
       └─ Employee


S#   Sname   Status   SCity    S#   P#   QTY
S1   Smith   20       London   S1   P1   300
S1   Smith   20       London   S1   P2   200
S1   Smith   20       London   S1   P3   400
S1   Smith   20       London   S1   P4   200
S1   Smith   20       London   S1   P5   100
S1   Smith   20       London   S1   P6   100
S2   Jones   10       Paris    S2   P1   300
S2   Jones   10       Paris    S2   P2   400

Query Optimization: A Simple Example

Without an optimizer, the system will:
Generate the Cartesian product of S and SP. This produces a relation of 1,000,000 tuples, too large to be kept in main memory.
Restrict the result of the previous step as specified by the WHERE clause. This means reading 1,000,000 tuples, of which 50 will be selected.
Project the result of the previous step over Sname to produce the final result.

Query Optimization: A Simple Example

An optimizer, on the other hand:
Restricts SP to just the tuples for part P2. This involves reading 10,000 tuples but produces a relation with only 50 tuples.
Joins the result of the previous step with relation S over S#. This involves the retrieval of only 100 tuples and generates a relation with at most 50 tuples.
Projects the result of the last operation over Sname.

Query Optimization: A Simple Example

ΠSname
  └─ ⋈S.S# = SP.S#
       ├─ S
       └─ σSP.P# = 'P2'
            └─ SP

Query Optimization: A Simple Example

If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read and write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy of doing the restriction and then the join, instead of doing the product and then the restriction, is a good heuristic.
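The gap between the two strategies can be reproduced on synthetic data. The sketch assumes 100 suppliers and 10,000 shipments of which exactly 50 are for part P2, matching the slide's figures; the data generator, tuple layouts, and function names are invented for illustration, and the naive plan counts product tuples instead of storing them.

```python
# S: (S#, Sname); SP: (S#, P#, QTY). Synthetic data, not from the slides.
S = [(f"S{i}", f"Name{i}") for i in range(1, 101)]
SP = []
for i in range(1, 101):
    for j in range(1, 101):
        part = f"P{j}"
        if j == 2 and i > 50:
            part = "P101"                # only suppliers S1..S50 supply P2
        SP.append((f"S{i}", part, 100))


def naive_plan(S, SP):
    """Product first, then restriction, then projection."""
    product_size = 0
    selected = []
    for s in S:
        for sp in SP:
            product_size += 1            # one tuple of the Cartesian product
            if s[0] == sp[0] and sp[1] == "P2":
                selected.append(s + sp)
    return {t[1] for t in selected}, product_size, len(selected)


def optimized_plan(S, SP):
    """Restriction first, then join, then projection."""
    restricted = [sp for sp in SP if sp[1] == "P2"]    # reads len(SP) tuples
    by_id = {s[0]: s for s in S}                       # reads len(S) tuples
    joined = [by_id[sp[0]] + sp for sp in restricted if sp[0] in by_id]
    return {t[1] for t in joined}, len(SP) + len(S), len(joined)


naive_names, naive_tuples, naive_hits = naive_plan(S, SP)
opt_names, opt_tuples, opt_hits = optimized_plan(S, SP)
```

Both plans return the same 50 supplier names, but the first touches 1,000,000 product tuples while the second reads 10,100 input tuples.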

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

ΠSname (σP# = 'P2' (S ⋈S.S# = SP.S# SP))

Optimization Process

Result
  └─ Project (Sname)
       └─ Restrict (SP.P# = 'P2')
            └─ Join (S.S# = SP.S#)
                 ├─ S
                 └─ SP

Optimization Process

Convert the result of the previous step into a canonical form: during this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and access paths. For example:

Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

Optimization Process

General rule: it is a good idea to perform the restriction before the join, because
it reduces the size of the input to the join operation, and
it reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process

General rule: transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
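The conversion to conjunctive normal form can be verified with a truth table, and the early-exit property shown with Python's `all`. The function names and the sample conjuncts are illustrative assumptions.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    return (p or q) and (p or r)


# Check every one of the 8 truth assignments.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)


def eval_cnf(conjuncts, row):
    """Evaluate a list of conjuncts; all() stops at the first false one."""
    return all(c(row) for c in conjuncts)


conjuncts = [lambda row: row["qty"] > 100, lambda row: row["city"] == "London"]
keep = eval_cnf(conjuncts, {"qty": 300, "city": "London"})
drop = eval_cnf(conjuncts, {"qty": 50, "city": "London"})
```

In the second call the first conjunct is false, so the second conjunct is never evaluated.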

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: a sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: a sequence of projections can be transformed into a single projection.

Optimization Process

General rule: a restriction of a projection can be converted into a projection of a restriction, so that the restriction is performed first.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into a cheaper one according to its performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σθ1∧θ2(E) = σθ1(σθ2(E))

Rule 2: The selection operation is commutative.

σθ1(σθ2(E)) = σθ2(σθ1(E))

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

ΠL1(ΠL2(…(ΠLn(E))…)) = ΠL1(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σθ (E1 × E2) = E1 ⋈θ E2

This can be extended to

σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
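Rules 1, 2, and 4 can be sanity-checked on small relations. The sketch models relations as Python sets of tuples and conditions as functions on the concatenated tuple; the operator names and the sample conditions are illustrative assumptions, and one example is a check, not a proof.

```python
def select(pred, rel):
    return {t for t in rel if pred(t)}


def product(e1, e2):
    return {a + b for a in e1 for b in e2}


def theta_join(e1, e2, theta):
    return {a + b for a in e1 for b in e2 if theta(a + b)}


E1 = {(1,), (2,), (3,)}
E2 = {(2,), (3,), (4,)}
E = product(E1, E2)

th1 = lambda t: t[0] < t[1]
th2 = lambda t: t[1] % 2 == 0
th1_and_th2 = lambda t: th1(t) and th2(t)

rule1 = select(th1_and_th2, E) == select(th1, select(th2, E))
rule2 = select(th1, select(th2, E)) == select(th2, select(th1, E))
rule4a = select(th1, E) == theta_join(E1, E2, th1)
rule4b = select(th1, theta_join(E1, E2, th2)) == theta_join(E1, E2, th1_and_th2)
```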

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈θ E2 = E2 ⋈θ E1

In query-tree form, the two children of the ⋈θ node are swapped.

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, at most one tuple qualifies, so the selectivity is

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values, assuming the values are uniformly distributed among the tuples, the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
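These estimates are simple enough to compute directly. The sketch below uses exact fractions to avoid rounding noise; the function names are hypothetical, and the figure of 20 distinct department numbers among 5,000 employees is an assumed example, not a statistic from the slides.

```python
from fractions import Fraction


def selectivity_on_key(cardinality):
    """Equality search on a key attribute: at most one matching tuple."""
    return Fraction(1, cardinality)


def selectivity_on_attribute(distinct_values):
    """Equality search on an attribute with i distinct values (uniform assumption)."""
    return Fraction(1, distinct_values)


def estimated_tuples(cardinality, selectivity):
    """Estimated size of the selection result."""
    return selectivity * cardinality


# Assumed example: 5,000 employee tuples, 20 distinct values of Dno.
by_key = estimated_tuples(5000, selectivity_on_key(5000))
by_dno = estimated_tuples(5000, selectivity_on_attribute(20))
```

Under the uniform-distribution assumption, the second estimate is 5,000 / 20 = 250 tuples.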

Optimization Process

Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

In query-tree form, the σθ0 node moves from above the ⋈θ node to directly above E1.

Optimization Process

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

ΠL1∪L2 (E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))

Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2

Optimization Process

Rule 13: The projection operation distributes over the set union operation.

ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)

It does not, in general, distribute over set intersection or set difference. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, ΠL (E1 ∩ E2) is empty while ΠL (E1) ∩ ΠL (E2) = {(1)}.
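The distribution rules for selection and projection over the set operations can be checked on small relations. The sketch models relations as Python sets of tuples; the sample data, the predicate p, and the helper names are illustrative assumptions.

```python
def select(p, rel):
    return {t for t in rel if p(t)}


def project(attrs, rel):
    return {tuple(t[i] for i in attrs) for t in rel}


E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "b"), (2, "b"), (4, "d")}
p = lambda t: t[0] >= 2
L = [0]

# Selection over difference, union, and intersection (Rule 12)
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2      # expected to fail
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2

# Projection: holds for union, fails here for intersection and difference
proj_union = project(L, E1 | E2) == project(L, E1) | project(L, E2)
proj_inter = project(L, E1 & E2) == project(L, E1) & project(L, E2)
proj_diff = project(L, E1 - E2) == project(L, E1) - project(L, E2)
```

On this data the one-sided union form and the projection forms for intersection and difference all evaluate to False, which is why those transformations cannot be applied blindly.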

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

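Definition

Materialized evaluation: the result of each operation is generated as an intermediate relation, which is then used as input to the next operation.
Pipelined evaluation: several operations are combined into a pipeline, with the results of one operation passed directly to the next.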

Assume we want to perform

Πa1, a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, it is passed directly to the project operation for processing.
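The difference between the two evaluation styles can be sketched with Python generators. The sketch assumes a natural join on the first attribute of each relation and a projection that does not remove duplicates; the operator names are illustrative.

```python
def join(r, s):
    """Natural join on the first attribute (assumed), yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts[1:]


def project(rows, attrs):
    """Projection without duplicate elimination, consuming tuples as they arrive."""
    for row in rows:
        yield tuple(row[i] for i in attrs)


r = [(1, "a"), (2, "b")]
s = [(1, "x"), (2, "y"), (2, "z")]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, [1, 2]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), [1, 2]))
```

Both styles produce the same tuples; the pipelined version never builds the intermediate list.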

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))

ΠSname
  └─ σbid = 100 and rating > 5
       └─ ⋈Sid = Sid
            ├─ R
            └─ S

ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))

ΠSname
  └─ ⋈Sid = Sid
       ├─ σbid = 100
       │    └─ R
       └─ σrating > 5
            └─ S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

ΠSname (on the fly)
  └─ σbid = 100 and rating > 5 (on the fly)
       └─ ⋈Sid = Sid
            ├─ R
            └─ S

ΠSname (on the fly)
  └─ ⋈Sid = Sid
       ├─ σbid = 100
       │    └─ R
       └─ σrating > 5
            └─ S

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, memory size, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
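The two search methods can be contrasted in a few lines. The sketch assumes the file is a Python list of tuples and, for binary search, that it is ordered on the key with the keys extracted once into a separate list; the function names are illustrative.

```python
from bisect import bisect_left


def linear_search(records, attr, value):
    """Scan every record and keep those whose attribute equals value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, keys, value):
    """Equality search on the ordering key; returns one record or None."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return sorted_records[pos]
    return None


records = [(101, "Ann"), (205, "Bob"), (309, "Cy")]   # ordered on the key
keys = [rec[0] for rec in records]

found = binary_search(records, keys, 205)
missing = binary_search(records, keys, 300)
scanned = linear_search(records, 1, "Cy")
```

Binary search needs about log2(n) probes on the ordered key, while the linear scan inspects all n records but works for any attribute.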

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent (or, for < and ≤, all the preceding) records in the file (note that in this case the data is also sorted).

σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σθ1∧θ2∧…∧θn (r)

Disjunctive selection: a disjunctive selection is of the following form:

σθ1∨θ2∨…∨θn (r)

Query Optimization: Search methods for Selection

Conjunctive selection: if an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
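Both selection strategies can be sketched with in-memory indexes that map an attribute value to a set of record positions, which stand in for record pointers. The index structure, the sample records, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the set of record positions holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def conjunctive_select(records, index, value, remaining):
    """Use one indexed condition to fetch records, then test the other conditions."""
    return [records[rid] for rid in sorted(index.get(value, set()))
            if all(p(records[rid]) for p in remaining)]


def disjunctive_select(records, lookups):
    """Union the record pointers from every index, then fetch each record once."""
    rids = set()
    for index, value in lookups:
        rids |= index.get(value, set())
    return [records[rid] for rid in sorted(rids)]


# (name, dno, city); sample data invented for illustration
records = [("Ann", 5, "NY"), ("Bob", 4, "LA"), ("Cy", 5, "LA"), ("Di", 3, "SF")]
dno_index = build_index(records, 1)
city_index = build_index(records, 2)

conj = conjunctive_select(records, dno_index, 5, [lambda rec: rec[2] == "LA"])
disj = disjunctive_select(records, [(dno_index, 5), (city_index, "LA")])
```

The disjunctive version only works because both attributes are indexed; without an index on one of them, the whole relation would have to be scanned.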

Query Optimization: JOIN Operation

Nested loop: to compute

R ⋈A=B S

for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈r.A Θ s.B s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
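The algorithm above translates almost line for line into Python. The sketch assumes relations are lists of tuples and passes the condition Θ in as a function; the comparison counter is added only to show that every one of the nr × ns pairs is examined.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested loop join; returns (result, pairs compared)."""
    result = []
    comparisons = 0
    for tr in r:                         # outer loop over r
        for ts in s:                     # inner loop over s
            comparisons += 1
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts
    return result, comparisons


r = [(1,), (2,), (3,)]
s = [(1,), (2,), (3,), (4,)]
# A non-equality condition, to show that any Θ is acceptable here.
pairs, compared = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
```

With nr = 3 and ns = 4 the loop compares 12 pairs, regardless of how many of them satisfy the condition.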



33

Query Optimization mdash A Simple ExampleAn Optimizer on the other handRestricts SP to just the tuples for part P2 This will

involve reading 10000 tuples but produces arelation with 50 tuplesJoins the result of the previous step with S relation

over S This involves the retrieval of only 100tuples and the generation of a relation with at most50 tuplesProjects the result of the last operation over Sname

Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.
So a simple policy, doing the restriction and then the join instead of doing the product and then the restriction, is a good heuristic.

Optimization Process
Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.
Now we can build a query tree very easily:
  ΠSname(σP# = 'P2'(S ⋈S.S# = SP.S# SP))

Optimization Process
[Figure: query tree, read bottom-up. S and SP feed Join (S.S# = SP.S#), then Restrict (SP.P# = 'P2'), then Project (Sname), yielding Result.]

Optimization Process
Convert the result of the previous step into a canonical form: during this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:
  (A Join B) WHERE restriction-on-B
can be transformed into
  A Join (B WHERE restriction-on-B)
and
  (A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
  (A WHERE restriction-on-A) Join (B WHERE restriction-on-B)
General rule: it is a good idea to perform the restriction before the join, because
  it reduces the size of the input to the join operation, and
  it reduces the size of the output from the join.
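The restriction-before-join rule can be checked on a small example. The following is a minimal Python sketch, assuming two toy relations S(s_no, sname) and SP(s_no, p_no) with made-up rows; the helper `join` is hypothetical and only illustrates that both plans return the same tuples while the second one feeds a smaller input to the join.

```python
# Toy relations (hypothetical data): S(s_no, sname) and SP(s_no, p_no).
S = [(1, "Smith"), (2, "Jones"), (3, "Blake")]
SP = [(1, "P1"), (1, "P2"), (2, "P2"), (3, "P3")]


def join(left, right):
    """Equi-join on the first attribute of each relation (nested loop)."""
    return [l + r for l in left for r in right if l[0] == r[0]]


# Plan 1: join first, then restrict on the part number (position 3).
joined = join(S, SP)
plan1 = [t for t in joined if t[3] == "P2"]

# Plan 2: restrict SP first, then join the smaller input.
restricted = [t for t in SP if t[1] == "P2"]
plan2 = join(S, restricted)

print(plan1 == plan2)                # same answer from both plans
print(len(SP), len(restricted))      # join input shrinks from 4 to 2 tuples
```

In the second plan the join never sees the SP tuples for other parts, which is the saving the general rule describes.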

Optimization Process
  WHERE p OR (q AND r)
can be converted into
  WHERE (p OR q) AND (p OR r)
General rule: transform the restriction condition into an equivalent condition in conjunctive normal form (CNF). A condition in CNF evaluates to "true" only if every conjunct evaluates to "true"; consequently, it evaluates to "false" as soon as any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
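The equivalence used in this conversion can be verified by enumerating the truth table. The sketch below is illustrative only: the function names are assumptions, and `eval_cnf` simply shows that a CNF condition can be rejected as soon as one conjunct is false.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    return (p or q) and (p or r)


# Compare both forms on all 8 combinations of truth values.
all_equal = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)


def eval_cnf(conjuncts, row):
    """Evaluate a list of conjuncts on one tuple; all() stops at the first False."""
    return all(c(row) for c in conjuncts)


# Hypothetical tuple and conjuncts: the first conjunct already fails.
row = {"a": 0, "b": 1}
accepted = eval_cnf([lambda t: t["a"] > 1, lambda t: t["b"] < 5], row)
print(all_equal, accepted)
```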

Optimization Process
  (A WHERE restriction-1) WHERE restriction-2
can be converted into
  A WHERE restriction-1 AND restriction-2
General rule: a sequence of restrictions can be combined into a single restriction.
  (A [projection-1]) [projection-2]
can be converted into
  A [projection-2]
General rule: a sequence of projections can be combined into a single projection.
General rule: a restriction and a projection can be interchanged, that is, a restriction of a projection can be converted into a projection of a restriction.

Optimization Process
Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.
  (SP Join P) [S#]
However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to
  SP [S#]

Optimization Process
An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).
  σθ1∧θ2(E) = σθ1(σθ2(E))
Rule 2: Selection operations are commutative.
  σθ1(σθ2(E)) = σθ2(σθ1(E))
Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).
  ΠL1(ΠL2(... (ΠLn(E)) ...)) = ΠL1(E)
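Rules 1 to 3 can be spot-checked on a small relation. This is only a sketch on made-up data, not a proof: `select` and `project` are hypothetical helpers, relations are lists of dictionaries, and `project` removes duplicates to mimic set semantics.

```python
# Hypothetical relation E(a, b, c).
E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 2, "b": 30, "c": "z"},
    {"a": 3, "b": 40, "c": "x"},
]


def select(pred, rel):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination, preserving first-seen order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def theta1(t):
    return t["a"] > 1


def theta2(t):
    return t["b"] < 40


# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: theta1(t) and theta2(t), E) == select(theta1, select(theta2, E))
# Rule 2: selections commute.
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
# Rule 3: only the outermost projection matters (here L1 = {a}, L2 = {a, b}).
rule3 = project(["a"], project(["a", "b"], E)) == project(["a"], E)
print(rule1, rule2, rule3)
```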

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.
  σθ(E1 × E2) = E1 ⋈θ E2
This can be extended to
  σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Rule 5: The theta join operation is commutative.
  E1 ⋈θ E2 = E2 ⋈θ E1
[Figure: two equivalent join trees, one with operands E1, E2 and one with operands E2, E1.]
Rule 6: Natural join is associative.
  (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: two equivalent join trees, one joining E1 and E2 first and then E3, the other joining E2 and E3 first and then E1.]
Rule 7: Theta join is associative in the following manner:
  (E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
$$\text{selectivity} = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$$
Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.
In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.
For an equality search on a key attribute:
$$s = \frac{1}{|r(R)|}$$
For an attribute with i distinct values:
$$s = \frac{|r(R)|}{i\,|r(R)|} = \frac{1}{i}$$
Hence the number of tuples that satisfy an equality search is
$$\frac{1}{i}\,|r(R)|$$
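As a worked illustration of these estimates, the sketch below computes the selectivity and the expected number of matching tuples. The function names and the sample figures (10,000 tuples, 50 distinct values) are assumptions chosen for the example, and the 1/i estimate implicitly treats the values as evenly spread over the tuples.

```python
from fractions import Fraction


def selectivity_key_equality(cardinality):
    """Equality on a key attribute: at most one tuple out of |r(R)| matches."""
    return Fraction(1, cardinality)


def selectivity_equality(distinct_values):
    """Equality on an attribute with i distinct values: s = 1 / i."""
    return Fraction(1, distinct_values)


def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result: s * |r(R)|."""
    return selectivity * cardinality


# Hypothetical relation with 10,000 tuples.
n = 10_000
print(estimated_matches(n, selectivity_key_equality(n)))  # key attribute
print(estimated_matches(n, selectivity_equality(50)))     # 50 distinct values
```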

Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):
  σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
[Figure: the tree with σθ0 above the join of E1 and E2 is equivalent to the tree in which σθ0 is applied to E1 before the join with E2.]
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are sets of attributes of E1 and E2 respectively:
  ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative.
  E1 ∪ E2 = E2 ∪ E1
  E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
Rule 11: Set union and set intersection operations are associative.
  (E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
  (E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
  σp(E1 − E2) = σp(E1) − σp(E2)
  σp(E1 − E2) = σp(E1) − E2
  σp(E1 ∪ E2) = σp(E1) ∪ σp(E2)
  σp(E1 ∪ E2) ≠ σp(E1) ∪ E2
  σp(E1 ∩ E2) = σp(E1) ∩ σp(E2)
  σp(E1 ∩ E2) = σp(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over the set union operation.
  ΠL(E1 ∪ E2) = ΠL(E1) ∪ ΠL(E2)
Note: the corresponding identities for set intersection and set difference do not hold in general. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, ΠL(E1 ∩ E2) is empty while ΠL(E1) ∩ ΠL(E2) = {(1)}.
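How projection interacts with the set operations can be examined on a two-tuple example. This is a small sketch under set semantics with made-up relations; `project_first` is a hypothetical helper that projects on the first attribute. On this data the identity holds for union but fails for intersection and difference.

```python
# Hypothetical relations over the same schema (id, tag).
E1 = {(1, "a")}
E2 = {(1, "b")}


def project_first(rel):
    """Project a set of tuples onto its first attribute."""
    return {(t[0],) for t in rel}


# Union: both sides are {(1,)}.
union_holds = project_first(E1 | E2) == project_first(E1) | project_first(E2)

# Intersection: left side is empty, right side is {(1,)}.
intersection_holds = project_first(E1 & E2) == project_first(E1) & project_first(E2)

# Difference: left side is {(1,)}, right side is empty.
difference_holds = project_first(E1 - E2) == project_first(E1) - project_first(E2)

print(union_holds, intersection_holds, difference_holds)
```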

Optimization Process
Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  the existence of indexes or other access paths (to reduce I/O cost), and
  the physical clustering of records (to reduce I/O cost), ...
come into play.
So, in short, after scanning and parsing:
  the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.
Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
  heuristic rules
  systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.
Optimization Process: Heuristic Rules
  Perform selection operations as early as possible.
  Perform projections early.
  It is usually better to perform selections earlier than projections.
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: the result of each operation is generated (materialized) as an intermediate relation.
Pipelined evaluation: several operations are combined, with the output of one operation passed directly to the next.
Assume we want to perform
  Πa1,a2(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection.
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
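The difference between the two evaluation styles can be sketched with Python generators. The relations, the join attribute k, and the helper names below are assumptions made for the illustration; the projection here does not remove duplicates, so the sketch only shows the flow of tuples.

```python
def join(r, s):
    """Nested loop equi-join on attribute k, yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection applied to each incoming tuple as soon as it arrives."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Hypothetical relations r(k, a1) and s(k, a2).
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized_result = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined_result = list(project(join(r, s), ["a1", "a2"]))

print(materialized_result == pipelined_result)
```

In the pipelined variant no intermediate relation is ever built; the join generator is advanced only when the projection asks for the next tuple.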

Assume the following relations:
  S (Sid: integer, Sname: string, rating: integer, age: real)
  R (Sid: integer, bid: integer, day: date, rname: string)
Further assume the following query:
  SELECT S.Sname
  FROM R, S
  WHERE R.Sid = S.Sid
    AND R.bid = 100 AND S.rating > 5
  ΠSname(σbid = 100 ∧ rating > 5(R ⋈Sid=Sid S))
[Figure: query tree with R and S joined on Sid = Sid, then σ bid = 100 ∧ rating > 5, then ΠSname at the root.]
  ΠSname((σbid = 100(R)) ⋈Sid=Sid (σrating > 5(S)))
[Figure: query tree with σ bid = 100 applied to R and σ rating > 5 applied to S, then the join on Sid = Sid, then ΠSname at the root.]
Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e. the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.
[Figure: the two query trees above, annotated for pipelined execution. The first tree carries two "On the fly" labels and the second tree carries one.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.
Optimization Process: Search Methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search Methods for Selection
Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
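The two search methods can be contrasted in a few lines. This is a minimal in-memory sketch with a made-up EMPLOYEE file ordered on ssn; the function names are assumptions, and a real system would search disk blocks rather than a Python list.

```python
def linear_search(records, attr, value):
    """Scan every record and keep those that satisfy attr = value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key, value):
    """Equality search on a key attribute of a file ordered on that key."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with this key value


# Hypothetical file, ordered on the key attribute ssn.
employees = [
    {"ssn": 100, "dno": 5},
    {"ssn": 200, "dno": 4},
    {"ssn": 300, "dno": 5},
    {"ssn": 400, "dno": 1},
]

print(binary_search(employees, "ssn", 300))
print(linear_search(employees, "dno", 5))
```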

Optimization Process: Search Methods for Selection
Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).
  σSSN = 123456789(EMPLOYEE)
Using a primary index or hash key to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file, or all the preceding records for < and ≤ (note that in this case the data is also sorted).
  σDNUMBER > 5(DEPARTMENT)
Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
  σDNO = 5(EMPLOYEE)
Conjunctive selection: a conjunctive selection is of the following form:
  σθ1∧θ2∧...∧θn(r)
Disjunctive selection: a disjunctive selection is of the following form:
  σθ1∨θ2∨...∨θn(r)

Query Optimization: Search Methods for Selection
Conjunctive selection: if an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.
Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
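Disjunctive selection by union of record pointers can be sketched as follows. The record identifiers, the two dictionary-based indexes, and the function name are all assumptions for the illustration; each index maps an attribute value to the set of record pointers holding that value.

```python
# Hypothetical EMPLOYEE file: record pointer -> (name, dno, salary).
records = {
    0: ("Ann", 5, 30000),
    1: ("Bob", 4, 30000),
    2: ("Cy", 5, 40000),
    3: ("Di", 1, 50000),
}

# Hypothetical indexes: attribute value -> set of record pointers.
index_dno = {5: {0, 2}, 4: {1}, 1: {3}}
index_salary = {30000: {0, 1}, 40000: {2}, 50000: {3}}


def disjunctive_select(conditions):
    """conditions: list of (index, value) pairs combined with OR."""
    pointers = set()
    for index, value in conditions:
        pointers |= index.get(value, set())  # union of the retrieved pointers
    # Fetch each qualifying record exactly once.
    return [records[rid] for rid in sorted(pointers)]


# dno = 5 OR salary = 30000
result = disjunctive_select([(index_dno, 5), (index_salary, 30000)])
print(result)
```

Because the pointers are combined as a set, a record that satisfies several of the conditions is fetched only once.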

Query Optimization: JOIN Operation
Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].
  R ⋈A=B S
Query Optimization: JOIN Operation (Nested Loop)
Suppose we want to perform
  r ⋈r.A θ s.B s
A and B are attributes or sets of attributes (i.e. the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (Nested Loop)
The following algorithm performs the nested loop join operation:
  for each tr ∈ r do begin
    for each ts ∈ s do begin
      if tr[A] θ ts[B] is true then add tr || ts to the result
    end
  end
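The nested loop algorithm translates almost line by line into Python. In this sketch tuples are Python tuples, the join attributes are given as positions, and θ is passed as a comparison function; these representation choices and the sample data are assumptions, not part of the slides.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b]; every pair of tuples is examined."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts
    return result


# Hypothetical relations.
r = [(1, "a"), (2, "b"), (4, "d")]
s = [(2, "x"), (3, "y"), (4, "z")]

equi = nested_loop_join(r, s, 0, 0)                    # r.A = s.B
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)  # r.A < s.B
print(equi)
print(len(less))
```

Because the condition is an arbitrary function, the same loop handles any θ, which is why the nested loop works regardless of the join condition.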

Query Optimization: JOIN Operation (Nested Loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best case, both relations fit into main memory, and hence we need only bs + br block accesses.
If just one of the relations fits entirely in main memory, bs + br block accesses are still sufficient: that relation is used as the inner relation, so each block is read only once.

Query Optimization: JOIN Operation (Block Nested Loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:
  for each block Br of r do begin
    for each block Bs of s do begin
      for each tr ∈ Br do begin
        for each ts ∈ Bs do begin
          if tr[A] θ ts[B] is true then add tr || ts to the result
        end
      end
    end
  end
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
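The cost expressions for the nested loop variants can be compared numerically. The relation sizes below are made up for the example. The tuple-at-a-time worst case of nr × bs + br block accesses is the standard textbook figure and is an addition here, not something stated in the slides.

```python
def tuple_pairs(nr, ns):
    """Number of tuple pairs examined by the nested loop."""
    return nr * ns


def best_case_blocks(br, bs):
    """Block accesses when the relations fit in memory: each block read once."""
    return br + bs


def block_nested_loop_blocks(br, bs):
    """Block accesses of the block nested loop with r as the outer relation."""
    return br * bs + br


def tuple_nested_loop_blocks(nr, br, bs):
    """Textbook worst case (assumption, not from the slides): s is rescanned per tuple of r."""
    return nr * bs + br


# Hypothetical sizes: r has 5,000 tuples in 100 blocks, s has 10,000 tuples in 400 blocks.
nr, ns, br, bs = 5_000, 10_000, 100, 400
print(tuple_pairs(nr, ns))
print(best_case_blocks(br, bs))
print(block_nested_loop_blocks(br, bs))
print(tuple_nested_loop_blocks(nr, br, bs))
```

With these figures the block nested loop needs far fewer block accesses than the tuple-at-a-time worst case, which is the "major saving" mentioned above.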

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
  r ⋈A=B s
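A join that probes an access structure on s can be sketched with a dictionary standing in for the index. The helper names and the data are assumptions; a real index would live on disk and return record pointers rather than tuples.

```python
from collections import defaultdict


def build_index(s, b):
    """Stand-in for an index on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


# Hypothetical relations.
r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (2, "y"), (3, "z")]

s_index = build_index(s, 0)
joined = index_join(r, s_index, 0)
print(joined)
```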

Query Optimization: JOIN Operation
Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.
Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
  bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory.
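The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and held in memory as lists of tuples. The function name and the sample data are assumptions; the group of s tuples sharing one join value is kept in memory, matching the assumption stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on positions a and b respectively."""
    result = []
    i = j = 0  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


# Hypothetical sorted relations.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```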

Query Optimization: JOIN Operation
Hash join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes its records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
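The hash join described above can be sketched in memory. The bucket count, the use of Python's built-in `hash`, and the sample relations are assumptions for the illustration; each bucket keeps the r records and the s records that hashed to it side by side.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join on r[a] = s[b] using one shared set of hash buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    # Single pass through each file: hash every record on its join attribute.
    for tr in r:
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    # Examine each bucket for matching records from r and s.
    result = []
    for r_part, s_part in buckets:
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:  # different values can share a bucket
                    result.append(tr + ts)
    return result


# Hypothetical relations.
r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (6, "z"), (7, "w")]
hashed = hash_join(r, s, 0, 0)
print(sorted(hashed))
```

Only records that land in the same bucket are ever compared, so the number of comparisons is usually much smaller than in the nested loop.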

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e. conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.
Consider the following join operation:
  r ⋈θ1∧θ2∧...∧θn s
One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say
  r ⋈θ1 s
The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
Now consider the following join operation:
  r ⋈θ1∨θ2∨...∨θn s
The join can be performed as the union of the tuples in the individual joins
  r ⋈θi s

Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
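Both duplicate elimination strategies can be sketched in a few lines. The function names and the sample relation are assumptions; attributes are addressed by position, and Python's `set` plays the role of the hash table.

```python
def project_sort(rel, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in positions) for t in rel)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(rel, positions):
    """Project and keep a tuple only if it has not been seen in the hash table."""
    seen = set()
    result = []
    for t in rel:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


# Hypothetical relation whose key is the first attribute.
rel = [(1, "a", 10), (2, "b", 10), (3, "a", 10)]
print(project_sort(rel, (1, 2)))   # projection without the key: duplicates appear
print(project_hash(rel, (0, 1)))   # projection including the key: nothing to remove
```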

Query Optimization: Set Operations
Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
  Department (Dname, Dnumber, Mgr-ssn, ...)
  Project (Pname, Pnumber, Plocation, Dnum)
  Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)
  SELECT Pnumber, Dnum, Lname, Bdate, Address
  FROM Project, Department, Employee
  WHERE Dnum = Dnumber
    AND MGRSSN = SSN
    AND Plocation = 'California'
The above query can be translated into
  ΠPnumber,Dnum,Lname,Address,Bdate(σPlocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN(Project × (Department × Employee)))
[Figure: query tree with Department × Employee at the bottom, then × with Project, then σ Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN, then ΠPnumber,Dnum,Lname,Address,Bdate at the root.]

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
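The figures quoted for the Cartesian products follow from simple arithmetic, reproduced below with the tuple counts and tuple sizes given above. The total size in bytes is derived here and is not stated in the slides.

```python
from math import prod

# Figures from the example: number of tuples and tuple size in bytes.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add up.
product_tuples = prod(tuple_counts.values())
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size  # derived, not from the slides

print(product_tuples, product_tuple_size, product_bytes)
```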

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into
  ΠPnumber,Dnum,Lname,Address,Bdate(((σPlocation='California'(Project)) ⋈Dnum=Dnumber Department) ⋈MGRSSN=SSN Employee)
[Figure: query tree with σ Plocation='California' applied to Project, the result joined with Department on Dnum = Dnumber, then joined with Employee on MGRSSN = SSN, with ΠPnumber,Dnum,Lname,Address,Bdate at the root.]

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

Query Optimization mdash A Simple Example

SP

σ (SPP = lsquoP2rsquo)

Database Systems

SS = SPS

S

ΠSname

35

Query Optimization mdash A Simple ExampleIf the number of tuples IOrsquos is used as the performance

measure then it is clear that the second approach is farfaster that the first approach In the first case wereadwrite about 3000000 tuples and in the secondcase we read about 10000 tuples

So a simple policy mdash doing restriction and then joininstead of doing product and then a restriction sounds agood heuristic

Database Systems

36

Optimization ProcessCast the query into some internal representation

mdash Convert the query to some internalrepresentation that is more suitable for machinemanipulation relational algebra

Now we can build a query tree very easilyΠ(Sname)(σP = ldquoP2rdquo(S SS =SPSSP ))

Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization Process
Generate query plans: The final stage of optimization involves constructing a set of candidate query plans and choosing "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
• Materialized evaluation: each operation generates its intermediate result as a relation, which is stored before the next operation reads it.
• Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, it is passed directly to the project operation for processing.
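The difference between the two evaluation styles can be sketched with Python generators. The relations, attribute positions, and function names below are assumptions made for the illustration, and the projection does not remove duplicates.

```python
def join(r, s):
    """Yield concatenated tuples of r and s that agree on their first attribute."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts

def project(tuples, idx):
    """Yield the chosen attribute positions of each incoming tuple."""
    for t in tuples:
        yield tuple(t[i] for i in idx)

r = [(1, "x"), (2, "y")]
s = [(1, "p"), (2, "q"), (3, "z")]

# Materialized: the whole join result is stored in a list before projection starts.
intermediate = list(join(r, s))
materialized = list(project(intermediate, (1, 3)))

# Pipelined: project pulls tuples from join one at a time; no intermediate list is built.
pipelined = list(project(join(r, s), (1, 3)))
```

Both strategies return the same tuples; the pipelined version simply never holds the full join result in memory.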

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

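$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

[Query tree, root to leaves: $\Pi_{Sname}$; $\sigma_{bid = 100 \wedge rating > 5}$; $\bowtie_{Sid = Sid}$; leaves R and S]

$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

[Query tree, root to leaves: $\Pi_{Sname}$; $\bowtie_{Sid = Sid}$; $\sigma_{bid = 100}$ over leaf R and $\sigma_{rating > 5}$ over leaf S]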
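The two plans for this query can be compared on a toy instance. The sketch below uses invented sample tuples for S and R and counts the tuple comparisons made by a simple nested-loop join; it illustrates the effect of pushing selections down and is not a cost model.

```python
# Hypothetical sample tuples: S(Sid, Sname, rating, age), R(Sid, bid, day, rname).
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "r1"), (2, 100, "d2", "r2"), (3, 101, "d3", "r3")]

def join_on_sid(left, right):
    """Nested-loop equijoin on Sid (position 0 of both inputs); also counts comparisons."""
    out, comparisons = [], 0
    for tl in left:
        for tr in right:
            comparisons += 1
            if tl[0] == tr[0]:
                out.append(tl + tr)
    return out, comparisons

# Plan 1: join first, then select, then project Sname.
# In a joined tuple, positions 0-3 come from R and positions 4-7 from S.
joined1, cost1 = join_on_sid(R, S)
plan1 = {t[5] for t in joined1 if t[1] == 100 and t[6] > 5}

# Plan 2: select on each relation first, then join, then project Sname.
R_sel = [t for t in R if t[1] == 100]
S_sel = [t for t in S if t[2] > 5]
joined2, cost2 = join_on_sid(R_sel, S_sel)
plan2 = {t[5] for t in joined2}
```

Both plans return the same answer, but the second plan joins two 2-tuple inputs instead of two 3-tuple inputs, so the join performs 4 comparisons instead of 9.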

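Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Query tree for the first plan: $\Pi_{Sname}$ over $\sigma_{bid = 100 \wedge rating > 5}$ over $\bowtie_{Sid = Sid}$ with leaves R and S; two operators are annotated "on the fly"]

[Query tree for the second plan: $\Pi_{Sname}$ over $\bowtie_{Sid = Sid}$, with $\sigma_{bid = 100}$ over leaf R and $\sigma_{rating > 5}$ over leaf S; one operator is annotated "on the fly"]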

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, memory size, the "reduction factor" of each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

• Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
• Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
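The two search methods can be sketched as follows. The in-memory list of dictionaries stands in for a file of records, and the record layout and function names are assumptions for the example; the counters show how many records each method examines.

```python
def linear_search(records, key, value):
    """Scan every record; works on unordered files. Returns (matches, records examined)."""
    matches = [rec for rec in records if rec[key] == value]
    return matches, len(records)

def binary_search(records, key, value):
    """Equality search on a file ordered by a key attribute. Returns (match or None, probes)."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][key] == value:
            return records[mid], probes
        if records[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical EMPLOYEE file with 1024 records, ordered on the key attribute ssn.
employees = [{"ssn": i, "name": f"emp{i}"} for i in range(1, 1025)]

linear_matches, examined = linear_search(employees, "ssn", 700)
found, probes = binary_search(employees, "ssn", 700)
```

With 1024 ordered records the linear scan examines all 1024 of them, while binary search needs at most 11 probes.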

Optimization Process: Search methods for Selection

• Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

• Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

• Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

• Conjunctive selection: a conjunctive selection is of the following form:
$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
• Disjunctive selection: a disjunctive selection is of the following form:
$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
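Disjunctive selection by union of record pointers can be sketched with in-memory indexes. In the example below, list positions play the role of record pointers, and the relation, attribute names, and `build_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file: the list position plays the role of the record pointer.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Rolla"},
    {"name": "Cy", "dno": 5, "city": "Tulsa"},
    {"name": "Dee", "dno": 1, "city": "Ames"},
]

def build_index(records, attr):
    """Map each attribute value to the set of pointers of records holding it."""
    index = defaultdict(set)
    for pointer, record in enumerate(records):
        index[record[attr]].add(pointer)
    return index

dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# sigma_{dno = 5 OR city = "Rolla"}: scan each index, union the pointers,
# then fetch every qualifying record exactly once.
pointers = dno_index.get(5, set()) | city_index.get("Rolla", set())
result = [employees[p]["name"] for p in sorted(pointers)]
```

Taking the union of pointer sets before fetching avoids reading a record twice when it satisfies more than one disjunct, as Ann does here.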

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \, \Theta \, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
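The nested loop pseudocode translates almost line for line into Python. In this sketch, tuples are plain Python tuples, `a` and `b` are the positions of the join attributes, and the comparison Θ is passed in as a function; these conventions are assumptions made for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r[a] theta s[b]; also returns the number of pairs tested."""
    result, pairs_tested = [], 0
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            pairs_tested += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts
    return result, pairs_tested

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y")]

equi_result, tested = nested_loop_join(r, s, 0, 0)
less_result, _ = nested_loop_join(r, s, 0, 0, operator.lt)
```

Whatever the comparison, the algorithm tests every pair of tuples, which is why it works for any join condition.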

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines $n_r \cdot n_s$ pairs of tuples.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.

If even one of the relations fits in the physical space, the cost is still $b_s + b_r$ block accesses: the relation that fits is kept in memory while the other is read once.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop join, in terms of the number of block accesses, is $b_r \cdot b_s + b_r$.

How can we improve the block nested loop join?
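The block access count can be verified by simulating blocks as sublists. The sketch below assumes an equijoin, a hypothetical blocking factor of two tuples per block, and a counter that is incremented once per block read.

```python
def to_blocks(tuples, per_block):
    """Split a list of tuples into blocks of at most per_block tuples."""
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]

def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equijoin over relations stored as lists of blocks; counts block reads."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1              # read one block of the outer relation
        for block_s in s_blocks:
            block_reads += 1          # every inner block is re-read per outer block
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

r = [(i, f"r{i}") for i in range(8)]          # 8 tuples -> 4 blocks
s = [(i, f"s{i}") for i in range(0, 12, 2)]   # 6 tuples -> 3 blocks
r_blocks, s_blocks = to_blocks(r, 2), to_blocks(s, 2)

result, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
```

With 4 outer blocks and 3 inner blocks the counter reaches 4 · 3 + 4 = 16, which matches the formula above. Since the outer relation contributes the extra term, using the relation with fewer blocks as the outer one gives the smaller total.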

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
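A minimal sketch of this technique follows. A Python dictionary stands in for the index or hash key on attribute B of s; in a real system the access structure would already exist, whereas here it is built inside the function so that the example is self-contained.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    """Equijoin that probes an index on s[b] once for every tuple of r."""
    index_on_b = defaultdict(list)    # stand-in for an existing index on s.B
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                      # one probe per outer tuple
        for ts in index_on_b.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (2, "y"), (4, "z")]
joined = index_nested_loop_join(r, s, 0, 0)
```

Each tuple of r triggers a single lookup instead of a scan of s, so only the matching tuples of s are touched.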

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
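The merge step can be sketched as below for an equijoin. Both inputs are assumed to be already sorted on their join attributes, and the group of s tuples sharing one join value is assumed to fit in memory, as stated above; the function name and data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equijoin of r and s, both already sorted on their join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                    # advance the pointer on r
        elif r[i][a] > s[j][b]:
            j += 1                    # advance the pointer on s
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backward past a group boundary, which is what keeps the scan of each relation linear.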

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
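The hash join described above can be sketched as follows. The number of buckets, the use of Python's built-in `hash`, and the sample data are assumptions for the illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Equijoin: hash both relations into shared buckets, then match within each bucket."""
    # Each bucket holds a pair of lists: (records from r, records from s).
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    # Equal join values always land in the same bucket, so only
    # records within one bucket need to be compared.
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (2, "y"), (7, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

The equality test inside each bucket is still needed, because different join values can collide in the same bucket.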

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\text{<attribute list>}}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
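Both duplicate-elimination strategies can be sketched in a few lines. The relation and the attribute positions are hypothetical; the first function sorts and drops adjacent duplicates, and the second uses a hash set.

```python
def project_sort(rel, idx):
    """Projection with duplicate elimination by sorting."""
    rows = sorted(tuple(t[i] for i in idx) for t in rel)
    out = []
    for row in rows:
        if not out or out[-1] != row:   # duplicates are adjacent after sorting
            out.append(row)
    return out

def project_hash(rel, idx):
    """Projection with duplicate elimination by hashing; keeps first-seen order."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[i] for i in idx)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Hypothetical EMPLOYEE(name, dno) tuples; dno is not a key, so duplicates appear.
employees = [("Ann", 5), ("Bob", 4), ("Cy", 5)]

by_sort = project_sort(employees, (1,))
by_hash = project_hash(employees, (1,))
```

The two results contain the same rows; only the order differs, since sorting returns them in sorted order and hashing in order of first appearance.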

Query Optimization: Set Operations

• The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.

Questions
• Devise algorithms to perform variations of outer join operations.
• Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{"California"} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

[Query tree, root to leaves: the projection; the selection; $\times$ with operands Project and a second $\times$ over Department and Employee]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products generate a relation of 10 million tuples, each of 300 bytes.
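The size of the Cartesian product follows directly from the figures above: the tuple counts multiply and the tuple sizes add. A short check of the arithmetic (the dictionary names are just for the illustration):

```python
from math import prod

tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

product_tuples = prod(tuple_counts.values())       # 100 * 20 * 5000
product_tuple_size = sum(tuple_bytes.values())     # 100 + 50 + 150
product_total_bytes = product_tuples * product_tuple_size
```

This gives 10,000,000 tuples of 300 bytes each, about 3 GB of intermediate data before a single selection condition has been applied.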

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{"California"}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

[Query tree, root to leaves: the projection; $\bowtie_{MGRSSN = SSN}$ with operand Employee; $\bowtie_{Dnum = Dnumber}$ with operand Department; $\sigma_{Plocation = \text{"California"}}$; leaf Project]


Query Optimization: A Simple Example
If the number of tuple I/Os is used as the performance measure, then it is clear that the second approach is far faster than the first approach. In the first case we read/write about 3,000,000 tuples, and in the second case we read about 10,000 tuples.

So a simple policy, doing the restriction and then the join instead of doing the product and then a restriction, sounds like a good heuristic.

Optimization Process
Cast the query into some internal representation: Convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{P\# = \text{"P2"}}(S \bowtie_{S.S\# = SP.S\#} SP))$

[Query tree, leaves to root: S and SP; Join (S.S# = SP.S#); Restrict (SP.P# = 'P2'); Project (Sname); Result]

Optimization Process
Convert the result of the previous step into a canonical form: During this phase, the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

General rule: It is a good idea to perform the restriction before the join, because
• it reduces the size of the input to the join operation;
• it reduces the size of the output from the join.

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
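The equivalence of the two conditions can be confirmed by enumerating all truth assignments. The sketch below also shows the practical benefit of conjunctive normal form: evaluation can stop at the first conjunct that is false. The function names are chosen for the illustration.

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    return (p or q) and (p or r)

# Check all 8 combinations of truth values.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)

# Each conjunct is an independent test; all() stops at the first false one.
conjuncts = [
    lambda p, q, r: p or q,
    lambda p, q, r: p or r,
]

def evaluate_cnf(p, q, r):
    return all(conjunct(p, q, r) for conjunct in conjuncts)
```

Because the conjuncts do not depend on each other, they could equally be handed to separate workers, which is the parallel-evaluation point made above.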

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process
Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S#]

Optimization Process
An equivalence rule says that expressions of two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative:

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections):

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Rule 5: The theta join operation is commutative:

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Rule 6: Natural join is associative:

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$s = \frac{\text{number of tuples that satisfy the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \frac{1}{|r(R)|}$

For an attribute with i distinct values, the selectivity is

$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\frac{1}{i} \, |r(R)|$
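These estimates are simple enough to compute directly. The sketch below uses exact fractions to avoid rounding noise; the cardinality and the number of distinct values are hypothetical figures, and the non-key estimate implicitly assumes that tuples are spread evenly over the distinct values.

```python
from fractions import Fraction

def selectivity_key(cardinality):
    """Equality search on a key attribute: at most one tuple qualifies."""
    return Fraction(1, cardinality)

def selectivity_nonkey(distinct_values):
    """Equality search on an attribute with i distinct values."""
    return Fraction(1, distinct_values)

def estimated_tuples(cardinality, selectivity):
    """Estimated size of the selection result: s * |r(R)|."""
    return selectivity * cardinality

cardinality = 5000        # hypothetical |r(R)|
distinct_values = 20      # hypothetical number of distinct values of the attribute

key_estimate = estimated_tuples(cardinality, selectivity_key(cardinality))
nonkey_estimate = estimated_tuples(cardinality, selectivity_nonkey(distinct_values))
```

For these figures the optimizer would expect 1 tuple from the key search and 250 tuples from the non-key search.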

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: when all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case),

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Query trees: on the left, $\sigma_{\theta_0}$ above $\bowtie_{\theta}$ over $E_1$ and $E_2$; on the right, $\bowtie_{\theta}$ over $\sigma_{\theta_0}(E_1)$ and $E_2$]

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$, respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$


109

Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)

Database Systems

111

Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate

AddressFROM Project Department EmployeeWHERE Dnum = Dnumber

AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo

Database Systems

Query Optimization mdash An Example

The above query can be translated into

ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and

MNGSSN=SSN (Project times (Department times Employee)))

Database Systems

112

Query Optimization mdash An Example

Database Systems

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN

Employee

Department

times

times

113

Database Systems

Query Optimization mdash An Example

The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes

Database Systems

114

115

Query Optimization mdash An Example

However the above query based on theschemas of the relations can be translatedinto

Database Systems

ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))

116

Query Optimization mdash An Example

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo

Employee

MNGSSN=SSN

Dnum=Dnumber

Department

Database Systems

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

Optimization Process

Cast the query into some internal representation: convert the query to an internal representation that is more suitable for machine manipulation, such as relational algebra.

Now we can build a query tree very easily:

$\Pi_{Sname}(\sigma_{P\# = \text{'P2'}}(S \bowtie_{S.S\# = SP.S\#} SP))$

Query tree, read from the leaves up to the root:

S, SP
Join (S.S# = SP.S#)
Restrict (SP.P# = 'P2')
Project (Sname)
Result

Convert the result of the previous step into a canonical form. During this phase the optimizer performs a number of optimizations that are "guaranteed to be good" regardless of the actual data values and the access paths. For example:

(A Join B) WHERE restriction-on-B
can be transformed into
A Join (B WHERE restriction-on-B)

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
(A WHERE restriction-on-A) Join (B WHERE restriction-on-B)

General rule: It is a good idea to perform the restriction before the join because:
  • It reduces the size of the input to the join operation.
  • It reduces the size of the output from the join.
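The effect of performing the restriction before the join can be sketched on two small relations. This is only an illustration: the relations `A` and `B`, their attributes, and the helper names `join` and `restrict` are assumptions made for the example, not part of the slides.

```python
# Toy relations as lists of dicts (hypothetical data, for illustration only).
A = [{"id": 1, "city": "Rolla"}, {"id": 2, "city": "Austin"}, {"id": 3, "city": "Rolla"}]
B = [{"id": 1, "qty": 10}, {"id": 2, "qty": 500}, {"id": 3, "qty": 700}, {"id": 3, "qty": 5}]


def join(left, right, key):
    """Equijoin two relations on a shared attribute using nested loops."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def restrict(relation, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def on_a(t):
    return t["city"] == "Rolla"


def on_b(t):
    return t["qty"] > 100


# Plan 1: join first, then restrict the (larger) intermediate result.
late = restrict(join(A, B, "id"), lambda t: on_a(t) and on_b(t))

# Plan 2: restrict each input first, then join the (smaller) inputs.
small_a, small_b = restrict(A, on_a), restrict(B, on_b)
early = join(small_a, small_b, "id")

pairs_late = len(A) * len(B)               # tuple pairs examined by plan 1
pairs_early = len(small_a) * len(small_b)  # tuple pairs examined by plan 2
```

Both plans return the same tuples, but in this toy case the second plan compares fewer tuple pairs because the join inputs are smaller.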

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where the conjuncts can be evaluated in parallel.
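The equivalence of the two conditions can be checked by brute force over all truth assignments. The sketch below also shows, under the assumption that conjuncts are supplied as zero-argument callables (a choice made only for this example), how evaluation of a conjunctive normal form can stop at the first false conjunct.

```python
from itertools import product


def original(p, q, r):
    return p or (q and r)


def cnf(p, q, r):
    # Conjunctive normal form: a conjunction of disjunctions.
    return (p or q) and (p or r)


# Compare the two conditions on all 8 truth assignments.
equivalent = all(
    original(p, q, r) == cnf(p, q, r)
    for p, q, r in product([False, True], repeat=3)
)


def cnf_short_circuit(conjuncts):
    """Evaluate conjuncts in order; stop at the first one that is false."""
    evaluated = 0
    for conjunct in conjuncts:
        evaluated += 1
        if not conjunct():
            return False, evaluated
    return True, evaluated
```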

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction, so that the restriction is applied first.

Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.

(SP Join P) [S#]

However, we know that P# is a foreign key in SP. Therefore the above query is semantically equivalent to

SP [S#]

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better meets the performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \land \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \land \theta_2} E_2$

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \land \theta_3} E_3 = E_1 \bowtie_{\theta_1 \land \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.
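Rule 4 can be illustrated on toy data. In this sketch the relations `E1` and `E2`, the condition `theta`, and the function names are all assumptions introduced for the example; it compares filtering a full Cartesian product with testing the condition while the pairs are formed.

```python
from itertools import product

# Hypothetical toy relations: E1(a, b) and E2(c, d) as lists of tuples.
E1 = [(1, "x"), (2, "y"), (3, "z")]
E2 = [(2, 10), (3, 20), (3, 30)]


def theta(t1, t2):
    """Join condition: E1.a = E2.c."""
    return t1[0] == t2[0]


def cartesian_then_select(r, s, cond):
    """Left side of Rule 4: build every pair, then filter."""
    pairs = list(product(r, s))  # |r| * |s| intermediate tuples
    return [t1 + t2 for t1, t2 in pairs if cond(t1, t2)], len(pairs)


def theta_join(r, s, cond):
    """Right side of Rule 4: test the condition while pairing, no intermediate."""
    return [t1 + t2 for t1 in r for t2 in s if cond(t1, t2)]


selected, intermediate_size = cartesian_then_select(E1, E2, theta)
joined = theta_join(E1, E2, theta)
```

The two results are identical; the difference is that the first form materializes all nine pairs before discarding most of them.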

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$s = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice the selectivities of all conditions are not available, so we use estimated selectivity, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

For an attribute with $i$ distinct values (assuming the values are uniformly distributed), the selectivity is

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
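The two selectivity estimates can be written as a small helper. This is a minimal sketch: the function names and the `is_key` flag are hypothetical, and the non-key case assumes values are spread uniformly over the distinct values.

```python
def equality_selectivity(cardinality, distinct_values, is_key=False):
    """Estimated selectivity of `attribute = constant`.

    Assumes values are uniformly distributed over the distinct values.
    """
    if cardinality == 0:
        return 0.0
    if is_key:
        return 1 / cardinality      # at most one tuple matches
    return 1 / distinct_values      # |r(R)|/i matching tuples out of |r(R)|


def estimated_matches(cardinality, distinct_values, is_key=False):
    """Estimated number of tuples returned by the equality selection."""
    if cardinality == 0:
        return 0.0
    return 1.0 if is_key else cardinality / distinct_values
```

For instance, a relation of 5000 tuples with 20 distinct values of the searched attribute would be estimated to return 250 tuples for an equality search.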

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition $\theta_0$ involve only the attributes of one of the relations being joined ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$ respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Rule 10: Set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Rule 12: The selection operation distributes over the set union, set intersection and set difference operations.

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \ne \sigma_p(E_1) \cup E_2$

$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

Rule 13: The projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

The corresponding identities for set intersection and set difference,

$\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$
$\Pi_L(E_1 - E_2) = \Pi_L(E_1) - \Pi_L(E_2)$

do not hold in general. They are valid only when the attributes in $L$ uniquely identify the tuples of $E_1$ and $E_2$, so that two tuples that agree on $L$ are the same tuple.
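How projection interacts with the set operations can be checked on a small case. In the sketch below the relations `E1` and `E2`, the schema (id, colour) and the helper `project` are assumptions made for illustration. Projecting onto an attribute list that does not identify tuples uniquely suggests that the union identity holds while the intersection and difference identities can fail.

```python
def project(relation, positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(t[i] for i in positions) for t in relation}


# Hypothetical relations over the schema (id, colour).
E1 = {(1, "red"), (2, "blue")}
E2 = {(1, "green"), (2, "blue")}
L = (0,)  # project onto id only

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_holds = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_holds = project(E1 - E2, L) == project(E1, L) - project(E2, L)
```

Here the tuples (1, red) and (1, green) agree on id but are different tuples, which is exactly the situation in which projecting before intersecting or subtracting changes the answer.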

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …

So, in short:
  • After scanning and parsing, the query is translated into an equivalent internal representation in the form of a query tree or query graph.
  • An execution strategy is then chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: each operation generates its result as an intermediate relation, which is stored before the next operation uses it.
Pipeline evaluation: several operations are combined so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
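The difference between the two evaluation styles can be sketched with Python generators. The relations `r` and `s`, the join attribute `k`, and the function names are assumptions made for the example, and the projection here does not eliminate duplicates.

```python
def join(r, s):
    """Natural join on the shared attribute "k", yielding one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection (without duplicate elimination) over a stream of tuples."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


# Hypothetical toy relations.
r = [{"k": 1, "a1": "p"}, {"k": 2, "a1": "q"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both styles produce the same answer; the pipelined form never holds the full join result, only the tuple currently passing between the two operators.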

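Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

A first plan joins the two relations and then applies the selection and the projection:

$\Pi_{Sname}(\sigma_{bid = 100 \land rating > 5}(R \bowtie_{R.Sid = S.Sid} S))$

Query tree, from the leaves up to the root: R and S; join on R.Sid = S.Sid; $\sigma_{bid = 100 \land rating > 5}$; $\Pi_{Sname}$.

An equivalent plan pushes the selections below the join:

$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{R.Sid = S.Sid} (\sigma_{rating > 5}(S)))$

Query tree, from the leaves up to the root: $\sigma_{bid = 100}$ over R and $\sigma_{rating > 5}$ over S; join on R.Sid = S.Sid; $\Pi_{Sname}$.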

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, with operators annotated "On the fly" to mark pipelined evaluation.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
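The two search methods can be compared by counting how many records each one examines. The sketch below is illustrative only: the file is modelled as an in-memory list, and the `employees` data and function names are assumptions.

```python
def linear_search(records, key, value):
    """Scan every record; works on unordered files.

    Returns (matching records, number of records examined).
    """
    matches = [rec for rec in records if rec[key] == value]
    return matches, len(records)


def binary_search(sorted_records, key, value):
    """Equality search on a key attribute of a file ordered on that attribute.

    Returns (matching record or None, number of probes).
    """
    lo, hi, probes = 0, len(sorted_records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid], probes
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


employees = [{"ssn": n} for n in range(1, 1025)]  # file ordered on the key ssn
_, examined = linear_search(employees, "ssn", 700)
match, probes = binary_search(employees, "ssn", 700)
```

On this 1024-record file the linear search examines every record, while the binary search needs at most 11 probes.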

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent (or, for < and ≤, preceding) records in the file. (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \land \theta_2 \land \dots \land \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition. The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
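Both strategies can be sketched with in-memory indexes. Everything named here is an assumption for illustration: the `employees` file, the `build_index` helper, and the use of list positions as record pointers.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file: the list position plays the role of a record pointer.
employees = [
    {"ssn": 1, "dno": 5, "city": "Rolla"},
    {"ssn": 2, "dno": 4, "city": "Austin"},
    {"ssn": 3, "dno": 5, "city": "Austin"},
    {"ssn": 4, "dno": 1, "city": "Rolla"},
]


def build_index(records, attr):
    """Map each attribute value to the set of record pointers holding it."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[attr]].add(pointer)
    return index


dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# Conjunctive selection (dno = 5 AND city = 'Austin'):
# use one access path, then apply the remaining condition to the retrieved records.
conjunctive = [
    employees[p] for p in sorted(dno_index.get(5, set()))
    if employees[p]["city"] == "Austin"
]

# Disjunctive selection (dno = 5 OR city = 'Rolla'):
# every condition has an index, so take the union of the record pointers.
pointers = dno_index.get(5, set()) | city_index.get("Rolla", set())
disjunctive = [employees[p] for p in sorted(pointers)]
```

Taking the union on pointers rather than on records means a record that satisfies several conditions is fetched only once.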

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume that $b_r$ and $b_s$ are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end

The nested loop algorithm examines $n_r \times n_s$ pairs of tuples. In the best case scenario both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.

If one of the relations fits in the physical space, then the cost will also be $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve the block nested loop?
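The block nested loop and its cost formula can be sketched as follows. This is a simulation under stated assumptions: relations are in-memory lists of tuples, a "block" is a slice of `block_size` tuples, and a counter stands in for disk block accesses.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, theta, block_size):
    """Join r and s block by block; returns (result, number of block accesses)."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, accesses = [], 0
    for block_r in r_blocks:
        accesses += 1                      # read one block of r
        for block_s in s_blocks:
            accesses += 1                  # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, accesses


# Hypothetical relations r(A, x) and s(B, y), 2 tuples per block.
r = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]             # b_r = 2
s = [(2, "p"), (4, "q"), (4, "r"), (5, "s"), (6, "t")]   # b_s = 3
result, accesses = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
```

With $b_r = 2$ and $b_s = 3$ the counter reaches $2 \times 3 + 2 = 8$, matching the formula $b_r \times b_s + b_r$.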

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
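A minimal sketch of this technique, assuming the access structure is an in-memory hash index on attribute B of s. The relations and the helper names `build_hash_index` and `index_join` are hypothetical.

```python
from collections import defaultdict


def build_hash_index(s, b_pos):
    """Access structure on attribute B of s: value -> list of matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b_pos]].append(ts)
    return index


def index_join(r, s_index, a_pos):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a_pos], []):
            result.append(tr + ts)
    return result


# Hypothetical relations r(A, x) and s(B, y).
r = [(1, "a"), (2, "b"), (4, "d")]
s = [(2, "p"), (4, "q"), (4, "r"), (5, "s")]
joined = index_join(r, build_hash_index(s, 0), 0)
```

Each tuple of r costs one index probe rather than a full pass over s, which is where the saving over the nested loop comes from.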

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
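The merge step can be sketched for an equijoin on relations that are already sorted on the join attribute. The function name, the attribute positions and the sample relations are assumptions for illustration; the group of equal-valued s tuples is held in memory, as the text assumes.

```python
def merge_join(r, s, a_pos=0, b_pos=0):
    """Equijoin of two relations already sorted on their join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][a_pos], s[j][b_pos]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Collect the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b_pos] == a:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a_pos] == a:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (4, "d"), (4, "e")]
s = [(2, "p"), (4, "q"), (4, "r"), (5, "s")]
joined = merge_join(r, s)
```

Neither pointer ever moves backwards past a group, so each relation is scanned once.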

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
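A minimal in-memory sketch of the hash-join idea. The hash file is modelled as a dictionary of buckets, and the bucket count, function name and sample relations are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(r, s, a_pos=0, b_pos=0, n_buckets=4):
    """Equijoin via a shared hash file: both relations use the same hash function."""
    buckets = defaultdict(lambda: ([], []))  # bucket -> (tuples of r, tuples of s)
    for tr in r:                             # single pass over r
        buckets[hash(tr[a_pos]) % n_buckets][0].append(tr)
    for ts in s:                             # single pass over s
        buckets[hash(ts[b_pos]) % n_buckets][1].append(ts)

    result = []
    for from_r, from_s in buckets.values():
        # Tuples in one bucket may still differ, so compare the join values.
        for tr in from_r:
            for ts in from_s:
                if tr[a_pos] == ts[b_pos]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (4, "d"), (6, "f")]
s = [(2, "p"), (4, "q"), (4, "r"), (5, "s")]
joined = hash_join(r, s)
```

Only tuples that land in the same bucket are ever compared, and the explicit equality test handles different join values that collide in one bucket.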

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

A join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

$r \bowtie_{\theta_1} s$

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$

The join can be performed as the union of the tuples in the individual joins

$r \bowtie_{\theta_i} s$

Query Optimization: Project Operation

A project operation $\Pi_{\langle \text{attribute list} \rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection and difference operations.

Questions

  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \land Dnum = Dnumber \land MGRSSN = SSN}(Project \times (Department \times Employee)))$

Query tree, from the leaves up to the root: Department × Employee; Project × (Department × Employee); $\sigma_{Plocation = \text{'California'} \land Dnum = Dnumber \land MGRSSN = SSN}$; $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$.

The previous scenario will result in inefficient query processing. Assume the Project, Department and Employee relations have tuple sizes of 100, 50 and 150 bytes and contain 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
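The size of the Cartesian product follows directly from the figures given in the example. The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
# Figures taken from the example; the variable names are only illustrative.
tuples = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

product_tuples = 1
for count in tuples.values():
    product_tuples *= count                       # 100 * 20 * 5000

product_tuple_bytes = sum(tuple_bytes.values())   # 100 + 50 + 150
product_bytes = product_tuples * product_tuple_bytes
```

That is 10,000,000 tuples of 300 bytes each, about 3 × 10^9 bytes of intermediate data before any selection is applied.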

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

Query tree, from the leaves up to the root: $\sigma_{Plocation = \text{'California'}}$ over Project; join with Department on Dnum = Dnumber; join with Employee on MGRSSN = SSN; $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$.

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

37

Optimization Process

S SP

Join (SS = SPS)

Restrict (SpP = lsquoP2rsquo)

Project (Sname)

Result

Database Systems

38

Optimization ProcessConvert the result of the previous step into a

canonical form mdash during this phase optimizerperforms a number of optimization that areldquoguaranteed to be goodrdquo regardless of the actualdata value and the access paths For Example

Database Systems

39

Optimization Process(A Join B) WHERE restriction-on-B can be transformed into(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B can be transformed into(A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Database Systems

40

Optimization ProcessGeneral rule It is a good idea to perform

the restriction before the join becauseIt reduces the size of the input to the join

operationIt reduces the size of the output from the join

Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Optimization Process

Rule 2: Selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Optimization Process

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$
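Rules 1 to 3 can be tried out on a small relation. The sketch below is only an illustration: relations are modelled as Python lists of dictionaries, and the helper names `select` and `project`, the `account` data and the two predicates are assumptions made for this example.

```python
def select(pred, rel):
    # sigma: keep the tuples (dicts) that satisfy pred
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    # pi: keep only attrs, eliminating duplicates while preserving order
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# Hypothetical sample data
account = [
    {"id": 1, "branch": "A", "balance": 500},
    {"id": 2, "branch": "B", "balance": 3000},
    {"id": 3, "branch": "A", "balance": 1200},
]

cheap = lambda t: t["balance"] < 2500
in_a = lambda t: t["branch"] == "A"

# Rule 1: one conjunctive selection equals a cascade of selections
rule1_lhs = select(lambda t: cheap(t) and in_a(t), account)
rule1_rhs = select(cheap, select(in_a, account))

# Rule 2: the order of the two selections does not matter
rule2_rhs = select(in_a, select(cheap, account))

# Rule 3: only the outermost projection survives
rule3_lhs = project(["branch"], project(["branch", "balance"], account))
rule3_rhs = project(["branch"], account)
```

In the Rule 3 check the inner attribute list contains the outer one, which is what makes the inner projection redundant.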

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
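To see Rule 4 at work, the following Python sketch compares a selection over a Cartesian product with a theta join that tests the condition while pairing tuples. Relations are modelled as lists of tuples; the sample data and the conditions (a = c, d > 10) are assumptions chosen for illustration.

```python
from itertools import product

def cartesian(r, s):
    # E1 x E2: concatenate every pair of tuples
    return [tr + ts for tr, ts in product(r, s)]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    # Test theta while pairing, without storing the full product
    return [tr + ts for tr in r for ts in s if theta(tr + ts)]

E1 = [(1, "x"), (2, "y")]          # attributes (a, b)
E2 = [(1, 10), (2, 20), (2, 30)]   # attributes (c, d)

theta = lambda t: t[0] == t[2]     # a = c
theta1 = lambda t: t[3] > 10       # d > 10

via_product = select(theta, cartesian(E1, E2))
via_join = theta_join(E1, E2, theta)

# Extension: a selection on top of a join folds into the join condition
sel_on_join = select(theta1, theta_join(E1, E2, theta))
folded = theta_join(E1, E2, lambda t: theta1(t) and theta(t))
```

Both forms return the same tuples, but the join form never holds all six pairs of the product at once.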

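Optimization Process

Rule 5: Theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

[Figure: the two equivalent join trees, with E1 and E2 swapped as the children of the theta join.]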

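Optimization Process

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

[Figure: the two equivalent join trees, one joining E1 and E2 first and the other joining E2 and E3 first.]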

Optimization Process

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

Selectivity on a key attribute with a search on equality is

$s = \dfrac{1}{|r(R)|}$

Selectivity on an attribute with i distinct values is

$s = \dfrac{|r(R)|}{i\,|r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
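The two selectivity estimates translate directly into a size estimate for the intermediate relation. The sketch below is a minimal illustration that assumes values are uniformly distributed over the i distinct values; the function names and the sample figures (5000 tuples, 20 distinct values) are assumptions for this example only.

```python
def selectivity_key_equality(cardinality):
    # Equality on a key attribute: at most one tuple qualifies
    return 1 / cardinality

def selectivity_distinct_values(distinct_values):
    # Equality on an attribute with i distinct values (uniform distribution assumed)
    return 1 / distinct_values

def estimated_tuples(selectivity, cardinality):
    # Estimated size of the intermediate relation
    return selectivity * cardinality

n = 5000  # hypothetical |r(R)|
key_est = estimated_tuples(selectivity_key_equality(n), n)
attr_est = estimated_tuples(selectivity_distinct_values(20), n)
```

Under these assumptions the key search is expected to return about one tuple and the search on the 20-valued attribute about 250 tuples.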

Optimization Process

Rule 8: Selection operation distributes over the theta join under the following condition: all attributes in the selection condition $\theta_0$ are attributes of only one of the relations (E1 in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

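Optimization Process

Rule 8:

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Figure: the two equivalent query trees. In the first, $\sigma_{\theta_0}$ is applied above the theta join of E1 and E2; in the second, $\sigma_{\theta_0}$ is applied to E1 below the join.]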
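The benefit of Rule 8 is that the join sees a smaller input. The following Python sketch, under the assumption of a naive join that examines every pair of tuples, compares the two forms and counts the pairs each one examines; the relations R and S and their attribute names are made up for the example.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    # Naive join: every pair (tr, ts) is examined
    return [{**tr, **ts} for tr in r for ts in s if theta(tr, ts)]

# Hypothetical data: R plays the role of E1, S the role of E2
R = [{"sid": i % 4, "bid": 100 + i % 3} for i in range(12)]
S = [{"s_sid": i, "rating": 3 + i} for i in range(4)]

theta = lambda tr, ts: tr["sid"] == ts["s_sid"]
theta0 = lambda t: t["bid"] == 100     # mentions attributes of E1 only

# Selection after the join
late = select(theta0, theta_join(R, S, theta))

# Selection pushed below the join
filtered_R = select(theta0, R)
early = theta_join(filtered_R, S, theta)

pairs_late = len(R) * len(S)
pairs_early = len(filtered_R) * len(S)
```

Both plans produce the same result, while the second examines 16 pairs instead of 48 on this data.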

Optimization Process

Rule 9: The projection operation distributes over theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Optimization Process

Rule 10: Set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Optimization Process

Rule 11: Set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: Selection operation distributes over the set union, set intersection and set difference operations.

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Optimization Process

Rule 12:

$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

Optimization Process

Rule 12:

$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
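The six forms of Rule 12 can be checked on small sets. In the Python sketch below, relations are modelled as sets of single-attribute tuples written as plain integers; the sets E1 and E2 and the predicate p are assumptions chosen for illustration.

```python
def select(pred, rel):
    # sigma_p over a relation represented as a set
    return {t for t in rel if pred(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0

checks = {
    "difference, both sides": select(p, E1 - E2) == select(p, E1) - select(p, E2),
    "difference, left only": select(p, E1 - E2) == select(p, E1) - E2,
    "union, both sides": select(p, E1 | E2) == select(p, E1) | select(p, E2),
    "union, left only": select(p, E1 | E2) == select(p, E1) | E2,
    "intersection, both sides": select(p, E1 & E2) == select(p, E1) & select(p, E2),
    "intersection, left only": select(p, E1 & E2) == select(p, E1) & E2,
}
```

On this data only the "union, left only" form fails, because the unfiltered E2 contributes tuples that do not satisfy p.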

Optimization Process

Rule 13: Projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

Note: the corresponding identities for set intersection and set difference do not hold in general, because tuples that differ only in attributes outside L become identical after the projection.

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
• the existence of indexes or other access paths (to reduce I/O cost), and
• the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
• the query will be translated into an equivalent internal representation, in the form of a query tree or query graph;
• an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the result of an operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined, so that the output of one is passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
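The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. This is only an illustration: the join attribute `k`, the sample relations and the function names are assumptions, and duplicates are deliberately kept by the projection to keep the example short.

```python
def join(r, s):
    # Natural join on the shared attribute "k", produced one tuple at a time
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(attrs, tuples):
    # Projection applied to each tuple as it arrives (duplicates are kept)
    for t in tuples:
        yield {a: t[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

# Materialized evaluation: the whole join result is stored first
intermediate = list(join(r, s))
materialized = list(project(["a1", "a2"], intermediate))

# Pipelined evaluation: each joined tuple flows straight into the projection
pipelined = list(project(["a1", "a2"], join(r, s)))
```

The results are identical; the pipelined form simply never builds the `intermediate` list.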

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

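$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

[Figure: query tree with $\Pi_{Sname}$ at the root, $\sigma_{bid = 100 \wedge rating > 5}$ below it, and the join of R and S on Sid = Sid at the bottom.]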

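$\Pi_{Sname}((\sigma_{bid = 100} R) \bowtie_{Sid = Sid} (\sigma_{rating > 5} S))$

[Figure: query tree with $\Pi_{Sname}$ at the root above the join on Sid = Sid, whose inputs are $\sigma_{bid = 100}$ applied to R and $\sigma_{rating > 5}$ applied to S.]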

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees from the previous slides, annotated for pipelined execution. The first tree carries two "On the fly" annotations and the second carries one.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
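The two basic search methods can be contrasted in a few lines of Python. This is a minimal in-memory sketch: a list of dictionaries stands in for the file, list positions stand in for record addresses, and the `employees` data and function names are assumptions made for the example.

```python
def linear_search(records, key, value):
    # Scan every record and test the selection condition
    return [rec for rec in records if rec[key] == value]

def binary_search(sorted_records, key, value):
    # Equality on a key attribute of a file ordered on that attribute
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_records[mid][key] == value:
            return sorted_records[mid], probes
        if sorted_records[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical file of 10 records, ordered on ssn
employees = [{"ssn": n, "name": f"emp{n}"} for n in range(100, 200, 10)]

found, probes = binary_search(employees, "ssn", 150)
scanned = linear_search(employees, "ssn", 150)
```

The linear search touches all 10 records, while the binary search reaches the record with ssn 150 after 3 probes on this file.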

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note that in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$
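A range selection on an ordered file can be sketched with Python's standard `bisect` module, which finds the boundary position in a sorted list. In this illustration a sorted list of keys stands in for the primary index and list slicing stands in for reading consecutive records; the `departments` data and function names are assumptions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical file ordered on the key DNUMBER
departments = [(n, f"dept{n}") for n in (1, 3, 4, 5, 7, 9)]
index_keys = [d[0] for d in departments]   # stands in for the primary index

def select_greater_than(value):
    # Locate the boundary with the index, then read all subsequent records
    return departments[bisect_right(index_keys, value):]

def select_greater_or_equal(value):
    return departments[bisect_left(index_keys, value):]

def select_less_than(value):
    # For "<" the qualifying records precede the boundary
    return departments[:bisect_left(index_keys, value)]
```

For example, `select_greater_than(5)` corresponds to the selection DNUMBER > 5 on DEPARTMENT and returns the records with keys 7 and 9.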

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
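The union-of-pointers idea can be sketched as follows. This is a hypothetical in-memory model: list positions play the role of record pointers, dictionaries play the role of indexes, and the `employees` data, attribute names and function names are assumptions for illustration.

```python
from collections import defaultdict

employees = [
    {"ssn": 1, "dno": 5, "city": "Rolla"},
    {"ssn": 2, "dno": 4, "city": "Austin"},
    {"ssn": 3, "dno": 5, "city": "Austin"},
    {"ssn": 4, "dno": 1, "city": "Rolla"},
]

def build_index(records, attr):
    # Map each attribute value to the set of record positions (record pointers)
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        index[rec[attr]].add(pos)
    return index

# Access paths exist for dno and city only
indexes = {"dno": build_index(employees, "dno"), "city": build_index(employees, "city")}

def disjunctive_select(records, conditions):
    # conditions: list of (attribute, value) equality tests joined by OR
    if any(attr not in indexes for attr, _ in conditions):
        # One condition without an access path forces a linear scan
        return [r for r in records if any(r[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())
    return [records[pos] for pos in sorted(pointers)]

result = disjunctive_select(employees, [("dno", 5), ("city", "Rolla")])
```

A record that satisfies both conditions appears once in the result, because the pointers are combined with a set union before any record is fetched.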

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if the pair (tr, ts) satisfies r.A Θ s.B then add tr || ts to the result
    end
end
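The nested loop algorithm maps almost line for line onto Python. In the sketch below, relations are lists of tuples, `a` and `b` are the positions of the join attributes, and the comparison Θ defaults to equality; these conventions and the sample data are assumptions for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    # For each tuple of the outer relation r, scan all of the inner relation s
    result = []
    comparisons = 0
    for tr in r:
        for ts in s:
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts
    return result, comparisons

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

joined, comparisons = nested_loop_join(r, s, 0, 0)
```

Passing another comparison, for instance `operator.lt`, turns the same loop into a join on r.A < s.B, which is why the nested loop works for any join condition.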

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is $n_r \times n_s$ tuple comparisons.

In the best-case scenario, both relations fit into main memory, and hence we need $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (nested loop)

If one of the relations fits in main memory, then the cost will also be $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if the pair (tr, ts) satisfies r.A Θ s.B then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve the block nested loop?
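The block access count of the block nested loop can be reproduced with a small simulation. The sketch below models a block as a slice of a Python list and counts one access each time a block is brought in; the block size, the sample relations and the function names are assumptions for illustration.

```python
def blocks(relation, block_size):
    # Split a relation into fixed-size blocks (lists of tuples)
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads, len(r_blocks), len(s_blocks)

r = [(i, f"r{i}") for i in range(8)]
s = [(i % 4, f"s{i}") for i in range(6)]

joined, reads, b_r, b_s = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

With 4 blocks of r and 3 blocks of s, the simulation counts 4 × 3 + 4 = 16 block reads, matching the cost formula.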

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
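A join that probes an access structure instead of scanning s can be sketched as follows. Here a Python dictionary stands in for the index or hash key on attribute B of s; the function name and the sample data are assumptions, and in a real system the index would already exist rather than being built inside the join.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    # Stand-in for an existing access structure on attribute b of s
    index_on_b = defaultdict(list)
    for ts in s:
        index_on_b[ts[b]].append(ts)
    result = []
    for tr in r:                                   # one pass over r
        for ts in index_on_b.get(tr[a], []):       # probe instead of scanning s
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

joined = index_nested_loop_join(r, s, 0, 0)
```

Each tuple of r triggers one probe, so s is never scanned once per outer tuple as it is in the plain nested loop.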

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
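The merge step can be written with two pointers that only move forward. The Python sketch below assumes both inputs are already sorted on the join attribute and that each group of equal values fits in memory, as stated above; the function name and the sample data are assumptions.

```python
def merge_join(r, s, a, b):
    # r and s must already be sorted on r[a] and s[b]
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Collect the group of s tuples sharing this join value
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with that value against the group
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (3, "b"), (3, "c"), (5, "d")]
s = [(2, "w"), (3, "x"), (3, "y"), (5, "z")]

joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group, which is why each relation is read only once.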

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
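The hash-join described above can be sketched in Python as follows. A dictionary of buckets stands in for the hash file, and Python's built-in `hash` modulo a bucket count stands in for the hashing function; the bucket count, function name and sample data are assumptions for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    # Hash both relations into the same set of buckets with the same function
    buckets = defaultdict(lambda: ([], []))
    for tr in r:
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    # Matching tuples can only meet inside the same bucket
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

joined = hash_join(r, s, 0, 0)
```

The equality test inside each bucket is still needed, because different join values can collide in the same bucket. The order of the result depends on the bucket order.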

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.
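Both ways of eliminating duplicates can be sketched briefly. In this illustration relations are lists of tuples and `positions` lists the attribute positions to keep; a Python set stands in for the hash table, and the function names and the `employee` data are assumptions.

```python
def project_by_sorting(relation, positions):
    # Project, sort, then drop adjacent duplicates in one scan
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_by_hashing(relation, positions):
    # Project and keep a tuple only the first time it is seen
    seen, result = set(), []
    for t in relation:
        key = tuple(t[p] for p in positions)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

# Hypothetical relation with attributes (ssn, lname, dno); ssn is the key
employee = [(1, "Smith", 5), (2, "Wong", 4), (3, "Smith", 5), (4, "Zelaya", 4)]

without_key = project_by_sorting(employee, [1, 2])
with_key = project_by_hashing(employee, [0, 1])
```

Projecting on (lname, dno) collapses the two Smith rows into one, while any projection that keeps the key ssn cannot produce duplicates.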

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection and difference operations.
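The single-scan idea for sorted relations can be sketched as one merge pass that produces the union, the intersection and the difference together. The sketch assumes both inputs are sorted lists without duplicates; the function name and sample values are assumptions for illustration.

```python
def sorted_set_ops(r, s):
    # r and s are sorted lists without duplicates; one pass over each
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            union.append(r[i])
            difference.append(r[i])     # in r but not in s
            i += 1
        elif r[i] > s[j]:
            union.append(s[j])
            j += 1
        else:
            union.append(r[i])
            intersection.append(r[i])   # in both relations
            i += 1
            j += 1
    # Whatever remains in r belongs to the union and to r - s
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference

union, intersection, difference = sorted_set_ops([1, 3, 5, 7], [3, 4, 7, 9])
```

Here `difference` is r − s; swapping the arguments gives s − r, which reflects the fact that set difference is not commutative.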

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

Query Optimization: An Example

[Figure: query tree for the expression above, with the projection at the root, the selection on Plocation, Dnum = Dnumber and MGRSSN = SSN below it, and two Cartesian products combining Project, Department and Employee at the leaves.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department and Employee relations have tuple sizes of 100, 50 and 150 bytes and contain 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
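The figures quoted for the Cartesian products follow from simple arithmetic, reproduced here as a short Python calculation. The tuple sizes and counts are the ones given above; the variable names are illustrative.

```python
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}   # bytes per tuple
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}

# The product pairs every tuple with every other, so the counts multiply
product_tuples = 1
for count in tuple_counts.values():
    product_tuples *= count

# Each result tuple is the concatenation of one tuple from each relation
product_tuple_size = sum(tuple_sizes.values())

product_bytes = product_tuples * product_tuple_size
```

This gives 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or 3,000,000,000 bytes for the intermediate relation.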

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

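Query Optimization: An Example

[Figure: query tree for the expression above. The selection on Plocation is applied to Project, the result is joined with Department on Dnum = Dnumber, that result is joined with Employee on MGRSSN = SSN, and the projection is at the root.]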



Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to each retrieved record.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
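The conjunctive and disjunctive strategies can be sketched as follows. This is an illustration only: an index is modeled as a dictionary from attribute value to a set of record positions (the "pointers"), and all names and data are hypothetical.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each attribute value to the set of record positions (the pointers)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

def conjunctive_select(records, index, value, other_conditions):
    """Use one indexed condition to fetch candidates, then test the remaining conditions."""
    return [records[rid] for rid in sorted(index.get(value, set()))
            if all(cond(records[rid]) for cond in other_conditions)]

def disjunctive_select(records, indexed_conditions):
    """Union the pointer sets of every condition, then fetch each record once."""
    rids = set()
    for index, value in indexed_conditions:
        rids |= index.get(value, set())
    return [records[rid] for rid in sorted(rids)]

employees = [
    {"name": "a", "dno": 5, "salary": 30},
    {"name": "b", "dno": 5, "salary": 50},
    {"name": "c", "dno": 4, "salary": 50},
    {"name": "d", "dno": 1, "salary": 20},
]
dno_idx = build_index(employees, "dno")
sal_idx = build_index(employees, "salary")

# dno = 5 AND salary > 40: only the dno condition uses an index.
print(conjunctive_select(employees, dno_idx, 5, [lambda e: e["salary"] > 40]))
# dno = 5 OR salary = 50: both conditions need an index.
print(disjunctive_select(employees, [(dno_idx, 5), (sal_idx, 50)]))
```

Taking the union of pointers before fetching means a record that satisfies several conditions is still read only once.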

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines nr × ns pairs of tuples. In the best case both relations fit in main memory, and hence we need br + bs block accesses.

If only one of the relations fits in main memory, it should be used as the inner relation; the cost is then still br + bs block accesses.
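A runnable sketch of the nested loop join and its block-access estimate is given below. Relations are modeled as lists of tuples; the function names are hypothetical. The worst-case figure nr × bs + br (the inner relation is re-read for every outer tuple) is the usual textbook estimate and is an addition here, not something stated on the slides.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested loop join; theta is any predicate on (tr, ts)."""
    result = []
    for tr in r:                 # outer relation
        for ts in s:             # inner relation, rescanned for every tr
            if theta(tr, ts):
                result.append(tr + ts)   # tr || ts
    return result

def nested_loop_block_accesses(nr, br, bs, inner_fits_in_memory):
    """br + bs if the inner relation stays in memory, otherwise nr * bs + br."""
    return br + bs if inner_fits_in_memory else nr * bs + br

r = [(1, "a"), (2, "b")]
s = [(2, "x"), (3, "y")]
print(nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0]))
print(nested_loop_block_accesses(5000, 100, 10, True))
print(nested_loop_block_accesses(5000, 100, 10, False))
```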

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
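The br × bs + br figure can be checked by counting block reads in a small simulation. This is a sketch under simplifying assumptions: a "block" is a fixed-size slice of a Python list, and `blocks` and `block_nested_loop_join` are hypothetical names.

```python
def blocks(relation, block_size):
    """Split a list of tuples into fixed-size blocks."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, theta, block_size):
    """Return (result, block_reads) for a block nested loop join."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, reads = [], 0
    for block_r in r_blocks:
        reads += 1                       # read one block of r
        for block_s in s_blocks:
            reads += 1                   # every block of s is re-read per block of r
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, reads

r = [(i,) for i in range(6)]             # 6 tuples -> br = 3 blocks of 2
s = [(i,) for i in range(0, 8, 2)]       # 4 tuples -> bs = 2 blocks of 2
result, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0], 2)
print(result, reads)                     # reads equals br * bs + br
```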

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A=B} s
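A sketch of the index-based join follows. For illustration the access structure on s.B is built on the fly as a dictionary, whereas the technique above assumes it already exists; `index_join` is a hypothetical name.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b], probing an index on s[b] for each tuple of r."""
    index = defaultdict(list)            # stands in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    # One index probe per outer tuple instead of a full scan of s.
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]

r = [(1, "a"), (2, "b")]
s = [(2, "x"), (2, "y"), (3, "z")]
print(index_join(r, s, 0, 0))
```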

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

br + bs

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
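The merge step can be sketched as below, assuming both inputs are lists of tuples already sorted on the join attribute; `merge_join` is a hypothetical name.

```python
def merge_join(r, s, a, b):
    """Equi-join of two lists already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s-tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with that value against the whole group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```

The group `s[j:j_end]` is the set of tuples with the same join value that is assumed to fit in main memory.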

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
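A sketch of the hash-join idea is shown below. The bucket count and the use of Python's built-in `hash` are assumptions made for illustration; `hash_join` is a hypothetical name.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    """Partition both relations with the same hash function, then join bucket by bucket."""
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for tr in r:                                   # single pass over r
        r_buckets[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:                                   # single pass over s
        s_buckets[hash(ts[b]) % n_buckets].append(ts)
    result = []
    for k, r_part in r_buckets.items():
        for tr in r_part:
            for ts in s_buckets.get(k, []):
                if tr[a] == ts[b]:                 # same bucket does not imply same value
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (10, "z")]
print(sorted(hash_join(r, s, 0, 0)))
```

Only tuples that land in the same bucket are compared, which is where the saving over a nested loop comes from.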

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
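Both ways of eliminating duplicates can be sketched as follows, assuming records are dictionaries; the function names and sample data are hypothetical.

```python
def project_sort(relation, attrs):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(relation, attrs):
    """Project and use a hash set to keep only the first copy of each tuple."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[a] for a in attrs)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

employees = [{"dno": 5, "name": "a"}, {"dno": 4, "name": "b"}, {"dno": 5, "name": "c"}]
print(project_sort(employees, ["dno"]))
print(project_hash(employees, ["dno"]))
```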

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
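The sort-then-scan approach can be sketched with one merge loop that serves all three operations. This is an illustration that assumes duplicate-free inputs; `merge_set_op` is a hypothetical name.

```python
def merge_set_op(r, s, op):
    """op is 'union', 'intersection' or 'difference' (r - s); inputs are duplicate-free."""
    r, s = sorted(r), sorted(s)
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                  # tuple occurs only in r
            if op != "intersection":
                out.append(r[i])
            i += 1
        elif r[i] > s[j]:                # tuple occurs only in s
            if op == "union":
                out.append(s[j])
            j += 1
        else:                            # tuple occurs in both
            if op != "difference":
                out.append(r[i])
            i += 1
            j += 1
    if op != "intersection":
        out.extend(r[i:])                # leftovers of r occur only in r
    if op == "union":
        out.extend(s[j:])                # leftovers of s occur only in s
    return out

r, s = [5, 1, 7, 3], [9, 3, 4, 7]
print(merge_set_op(r, s, "union"))
print(merge_set_op(r, s, "intersection"))
print(merge_set_op(r, s, "difference"))
```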

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, ...)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' and Dnum=Dnumber and MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree for the translated query. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it is σ_{Plocation='California' and Dnum=Dnumber and MGRSSN=SSN}; below that are two Cartesian products (×) combining Project, Department, and Employee.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
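The arithmetic behind these figures can be reproduced in a few lines. The dictionary layout is only an illustration; the tuple counts and sizes are the ones given above.

```python
# relation -> (number of tuples, bytes per tuple)
sizes = {"Project": (100, 100), "Department": (20, 50), "Employee": (5000, 150)}

tuples = 1
width = 0
for count, tuple_bytes in sizes.values():
    tuples *= count          # a Cartesian product multiplies cardinalities
    width += tuple_bytes     # and concatenates tuples, so widths add up

total_bytes = tuples * width
print(tuples, width, total_bytes)
```

The intermediate relation would therefore occupy about 3 × 10^9 bytes, even though the base relations together hold only 761,000 bytes.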

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{MGRSSN=SSN} (Employee))

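Query Optimization: An Example

[Figure: query tree for the improved translation. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it is a join with Employee on MGRSSN=SSN; below that is a join with Department on Dnum=Dnumber; at the bottom, σ_{Plocation='California'} is applied to Project.]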


Optimization Process

(A Join B) WHERE restriction-on-B
can be transformed into
(A Join (B WHERE restriction-on-B))

(A Join B) WHERE restriction-on-A AND restriction-on-B
can be transformed into
((A WHERE restriction-on-A) Join (B WHERE restriction-on-B))

Optimization Process

General rule: It is a good idea to perform the restriction before the join, because:

It reduces the size of the input to the join operation.

It reduces the size of the output from the join.
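The effect of restricting before joining can be seen in a small experiment. The relations `A` and `B`, the restriction, and the pair-counting `join` helper are all hypothetical; the join is a plain nested loop so that the number of compared pairs is easy to count.

```python
def join(a, b, key_a, key_b):
    """Nested loop equi-join; also returns the number of tuple pairs compared."""
    result = [ta + tb for ta in a for tb in b if ta[key_a] == tb[key_b]]
    return result, len(a) * len(b)

A = [(i,) for i in range(100)]            # hypothetical relation A(id)
B = [(i, i % 10) for i in range(100)]     # hypothetical relation B(id, category)
restriction = lambda tb: tb[1] == 3       # restriction-on-B

# Plan 1: join first, restrict afterwards.
joined, late_pairs = join(A, B, 0, 0)
late = [t for t in joined if restriction(t[1:])]

# Plan 2: restrict B first, then join.
early, early_pairs = join(A, [tb for tb in B if restriction(tb)], 0, 0)

print(late == early, late_pairs, early_pairs)
```

Both plans return the same tuples, but the second compares a tenth as many pairs because the join input is smaller.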

Optimization Process

WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.
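The equivalence of p OR (q AND r) and its conjunctive normal form can be verified with a truth table. A minimal check, with hypothetical function names:

```python
from itertools import product

def original(p, q, r):
    return p or (q and r)

def cnf(p, q, r):
    # Each parenthesised term is one conjunct; a single false conjunct decides the result.
    return (p or q) and (p or r)

# Compare the two forms on all eight truth assignments.
equivalent = all(original(p, q, r) == cnf(p, q, r)
                 for p, q, r in product([False, True], repeat=3))
print(equivalent)
```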

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

Optimization Process

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: Get the supplier numbers of suppliers who supply at least one part.

(SP Join P) [S]

However, we know that P is the foreign key in SP; therefore, the above query is semantically equivalent to

SP [S]

Optimization Process

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process

Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections).

Π_{L1}(Π_{L2}(...(Π_{Ln}(E))...)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈_θ E2 = E2 ⋈_θ E1

[Figure: the join trees for E1 ⋈_θ E2 and E2 ⋈_θ E1.]

Optimization Process

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: the join trees for (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3).]

Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

Selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
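These selectivity figures can be reproduced on synthetic data. The `employee` relation below is hypothetical and deliberately uniform, with 1000 tuples, a key attribute `ssn`, and 20 distinct `dno` values.

```python
def selectivity(relation, attr, value):
    """Fraction of tuples whose attribute equals value."""
    return sum(1 for t in relation if t[attr] == value) / len(relation)

def estimated_matches(cardinality, distinct_values):
    """Expected result size of an equality search, assuming uniformly distributed values."""
    return cardinality / distinct_values

employee = [{"ssn": i, "dno": i % 20} for i in range(1000)]

print(selectivity(employee, "ssn", 7))     # key attribute: 1 / |r(R)|
print(selectivity(employee, "dno", 5))     # 20 distinct values: 1 / 20
print(estimated_matches(len(employee), 20))
```

On real data the values are rarely uniform, which is why these figures are treated as estimates.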

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:

When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Figure: the query trees for σ_{θ0}(E1 ⋈_θ E2) and (σ_{θ0}(E1)) ⋈_θ E2.]

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:

The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
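The Rule 12 identities can be spot-checked on small sets. This is only a sanity check on one hypothetical example, not a proof; relations are modeled as Python sets and the predicate `p` is arbitrary.

```python
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0          # hypothetical selection predicate

def select(pred, rel):
    return {t for t in rel if pred(t)}

checks = {
    "difference, both sides": select(p, E1 - E2) == select(p, E1) - select(p, E2),
    "difference, left only":  select(p, E1 - E2) == select(p, E1) - E2,
    "union, both sides":      select(p, E1 | E2) == select(p, E1) | select(p, E2),
    "union, left only":       select(p, E1 | E2) == select(p, E1) | E2,
    "intersection, both":     select(p, E1 & E2) == select(p, E1) & select(p, E2),
    "intersection, left":     select(p, E1 & E2) == select(p, E1) & E2,
}
for name, holds in checks.items():
    print(name, holds)
```

Only the "union, left only" form fails, because the unfiltered E2 contributes tuples that do not satisfy p.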

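Optimization Process

Rule 13: The projection operation distributes over the set union operation.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

Note: the corresponding identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L. In general only Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2) and Π_L(E1) − Π_L(E2) ⊆ Π_L(E1 − E2) hold.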
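A small check shows how projection behaves with each set operation. The relations are hypothetical sets of tuples chosen so that two different tuples agree on the projected attribute; projecting on position 0 stands in for Π_L.

```python
def project(rel, positions):
    """Projection with duplicate elimination (the result is a set)."""
    return {tuple(t[i] for i in positions) for t in rel}

E1 = {(1, "a"), (2, "b")}
E2 = {(1, "c"), (3, "d")}
L = (0,)

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
inter_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
diff_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)
print(union_ok, inter_ok, diff_ok)
```

On this example the identity holds for union only: (1, "a") and (1, "c") differ as tuples but share the projected value 1, which breaks the intersection and difference forms.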

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), ...

come into play.

Optimization Process

So, in short, after scanning and parsing:

The query is translated into an equivalent internal representation, in the form of a query tree or query graph.

An execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process

There are two main techniques for query optimization:

Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

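Definition

Materialized evaluation: Each operation generates its intermediate result (relation) in full before the next operation starts.

Pipelined evaluation: Several operations are combined, so that the output of one is passed directly to the next.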

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
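The difference between the two evaluation styles can be sketched with Python generators. Records are modeled as dictionaries, the join is a simple nested loop, and all names and data are hypothetical.

```python
def join(r, s, a, b):
    """Yield joined tuples one at a time (nested loop, for brevity)."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Consume tuples as they arrive; nothing is stored in between."""
    for t in tuples:
        yield tuple(t[x] for x in attrs)

def materialized(r, s, a, b, attrs):
    temp = list(join(r, s, a, b))          # intermediate relation is built in full
    return list(project(temp, attrs))

def pipelined(r, s, a, b, attrs):
    # Each joined tuple flows straight into the projection.
    return list(project(join(r, s, a, b), attrs))

r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"k2": 10, "a2": "x"}, {"k2": 30, "a2": "y"}]
print(materialized(r, s, "k", "k2", ["a1", "a2"]))
print(pipelined(r, s, "k", "k2", ["a1", "a2"]))
```

Both produce the same answer; the pipelined version never holds the full join result in memory.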

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 AND rating > 5}(R ⋈_{Sid=Sid} S))

[Figure: query tree for this expression. The root is Π_{Sname}; below it is σ_{bid = 100 and rating > 5}; below that is the join of R and S on Sid = Sid.]

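Π_{Sname}((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))

[Figure: query tree for this expression. The root is Π_{Sname}; below it is the join on Sid = Sid; its inputs are σ_{bid = 100} applied to R and σ_{rating > 5} applied to S.]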

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with operators annotated "on the fly" to indicate pipelined evaluation.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.



Optimization Process
General rule: It is a good idea to perform the restriction before the join because
  • It reduces the size of the input to the join operation
  • It reduces the size of the output from the join

Optimization Process
WHERE p OR (q AND r)
can be converted into
WHERE (p OR q) AND (p OR r)

Optimization Process
General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in parallel systems, where the conjuncts can be evaluated in parallel.
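The conversion of p OR (q AND r) into conjunctive normal form can be checked exhaustively. The following is a minimal sketch in Python; the function names are illustrative and not part of the slides. It enumerates all eight truth assignments and compares the two forms.

```python
from itertools import product

def original(p, q, r):
    # WHERE p OR (q AND r)
    return p or (q and r)

def cnf(p, q, r):
    # WHERE (p OR q) AND (p OR r): a conjunction of two disjunctions
    return (p or q) and (p or r)

def equivalent():
    # Compare the two forms on every combination of truth values.
    return all(original(p, q, r) == cnf(p, q, r)
               for p, q, r in product([False, True], repeat=3))

print(equivalent())
```

Because the second form is a conjunction, an evaluator may stop as soon as one conjunct is false, which is the property the general rule relies on.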

Optimization Process
(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

Optimization Process
General rule: A sequence of restrictions can be combined into a single restriction

Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

Optimization Process
General rule: A sequence of projections can be transformed into a single projection

Optimization Process
General rule: A restriction of a projection can be converted into a projection of a restriction

Optimization Process
Finally, consider the following query: Get the supplier numbers who supply at least one part
(SP JOIN P) [S]
However, we know that P is the foreign key in SP; therefore the above query is semantically equivalent to
SP [S]

Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.
Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections)
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Optimization Process
Rule 2: The selection operation is commutative
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
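Rules 1 and 2 can be illustrated on a small in-memory relation. This is only a sketch: the relation E, its attributes a and b, and the predicates theta1 and theta2 are hypothetical and chosen for illustration.

```python
def select(pred, rows):
    # sigma_pred(rows): keep the rows that satisfy pred
    return [row for row in rows if pred(row)]

# Hypothetical relation E with attributes a and b.
E = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}, {"a": 4, "b": 5}]

theta1 = lambda row: row["a"] >= 2
theta2 = lambda row: row["b"] >= 10

# One conjunctive selection.
conjunctive = select(lambda row: theta1(row) and theta2(row), E)
# Rule 1: the same selection as a cascade of two selections.
cascade_12 = select(theta1, select(theta2, E))
# Rule 2: the cascade applied in the opposite order.
cascade_21 = select(theta2, select(theta1, E))

print(conjunctive == cascade_12 == cascade_21)
```

All three expressions produce the same tuples, so the optimizer is free to pick whichever order filters out the most tuples first.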

Optimization Process
Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections)
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation
σ_θ(E1 × E2) = E1 ⋈_θ E2
This can be extended to
σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Optimization Process
Rule 5: The theta join operation is commutative
E1 ⋈_θ E2 = E2 ⋈_θ E1
[Figure: query trees for E1 ⋈_θ E2 and E2 ⋈_θ E1]

Optimization Process
Rule 6: Natural join is associative
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
[Figure: query trees for (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3)]

Optimization Process
Rule 7: Theta join is associative in the following manner
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:
selectivity = (number of tuples satisfying the condition) / |r(R)|
Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivity as part of the statistical data to aid query optimization.

For a selection on a key attribute with a search on equality:
s = 1 / |r(R)|

Selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is
s = (|r(R)| / i) / |r(R)| = 1 / i
Hence the number of tuples that satisfy an equality search is
(1 / i) × |r(R)|
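The selectivity estimates can be written as small helper functions. This is a sketch under the assumption that attribute values are uniformly distributed; the function names and the sample figures (5000 tuples, 20 distinct values) are illustrative only.

```python
def selectivity_key_equality(cardinality):
    # Equality on a key attribute matches at most one tuple: s = 1 / |r(R)|
    return 1 / cardinality

def selectivity_distinct(distinct_values):
    # Assumption: values are spread uniformly over i distinct values, so s = 1 / i
    return 1 / distinct_values

def estimated_matches(cardinality, distinct_values):
    # Expected number of tuples satisfying an equality search: |r(R)| / i
    return cardinality / distinct_values

print(selectivity_key_equality(5000))
print(selectivity_distinct(20))
print(estimated_matches(5000, 20))
```

An optimizer would multiply such estimates by the block-level cost of each access method to compare candidate plans.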

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition
  • When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case)
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

Optimization Process
Rule 8
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
[Figure: query tree with σ_{θ0} above the join of E1 and E2, and the equivalent tree with σ_{θ0} applied to E1 below the join]

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition
  • The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative

Optimization Process
Rule 11: Set union and set intersection operations are associative
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Optimization Process
Rule 12
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Optimization Process
Rule 12
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Optimization Process
Rule 13: The projection operation distributes over the set union operation
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Note: the analogous equalities for set intersection and set difference do not hold in general, because distinct tuples of E1 and E2 can agree on the attributes in L

Optimization Process
Choose candidate low-level procedure: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  • existence of indexes or other access paths (to reduce I/O cost), and
  • physical clustering of records (to reduce I/O cost), …
come into play

Optimization Process
So, in short, after scanning and parsing
  • the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph
  • an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans"
  • Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization
  • Heuristic rules
  • Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules

Optimization Process: heuristic rules
  • Perform selection operations as early as possible
  • Perform projections early
  • It is usually better to perform selections earlier than projections

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution

Definition
  • Materialized evaluation: Generation of intermediate result (relation)
  • Pipeline evaluation: Combining several operations

Assume we want to perform
Π_{a1, a2}(r ⋈ s)
We can perform the join operation, materialize the result, and then apply the projection
Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing
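The difference between the two evaluation styles can be sketched with Python generators. The relations r and s, the join attribute k, and the helper names are assumptions made for illustration; duplicate elimination is left out to keep the sketch short.

```python
def join(r, s, attr):
    # Generator: yields each joined tuple as soon as it is produced.
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}

def project(rows, attrs):
    # Generator: consumes one tuple at a time and emits the projected tuple.
    for row in rows:
        yield {a: row[a] for a in attrs}

r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))

print(materialized == pipelined)
```

Both styles return the same tuples; the pipelined version simply never holds the full join result in memory.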

Assume the following relations
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)
Further assume the following query
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))
[Figure: query tree with Π_{Sname} at the root, σ_{bid = 100 ∧ rating > 5} below it, and the join of R and S on Sid = Sid at the bottom]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))
[Figure: query tree with Π_{Sname} at the root, the join on Sid = Sid below it, and σ_{bid = 100} applied to R and σ_{rating > 5} applied to S at the bottom]
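The two plans for this query can be compared on a few rows. The sample tuples below are invented for illustration, and the pair counter is only a rough stand-in for join cost; this is a sketch, not a cost model.

```python
# Hypothetical sample data for S(Sid, Sname, rating, age) and R(Sid, bid, day, rname).
S = [
    {"Sid": 1, "Sname": "Ann", "rating": 7, "age": 30.0},
    {"Sid": 2, "Sname": "Bob", "rating": 3, "age": 41.5},
    {"Sid": 3, "Sname": "Cy", "rating": 9, "age": 25.0},
]
R = [
    {"Sid": 1, "bid": 100, "day": "d1", "rname": "r1"},
    {"Sid": 2, "bid": 100, "day": "d2", "rname": "r2"},
    {"Sid": 3, "bid": 101, "day": "d3", "rname": "r3"},
]

def join_on_sid(left, right):
    # Nested-loop equijoin on Sid; also reports how many pairs were compared.
    pairs, out = 0, []
    for left_row in left:
        for right_row in right:
            pairs += 1
            if left_row["Sid"] == right_row["Sid"]:
                out.append((left_row, right_row))
    return out, pairs

# Plan 1: join first, then select, then project.
joined, pairs_late = join_on_sid(R, S)
plan1 = [s["Sname"] for r, s in joined if r["bid"] == 100 and s["rating"] > 5]

# Plan 2: push each selection below the join, then project.
R_sel = [r for r in R if r["bid"] == 100]
S_sel = [s for s in S if s["rating"] > 5]
joined, pairs_early = join_on_sid(R_sel, S_sel)
plan2 = [s["Sname"] for r, s in joined]

print(plan1, plan2, pairs_late, pairs_early)
```

Both plans return the same names, but the second plan feeds fewer tuples into the join, which is the point of performing selections early.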

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation
In this case, articulate the way the previous query is going to be executed

[Figure: the two query trees for the previous query, with operators annotated "On the fly" to mark pipelined evaluation]

Cost of Plan
The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation
Factors such as the size of the relation(s), underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space

Optimization Process: Search methods for Selection
  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition (in this case, data is not organized and no metadata is available)
  • Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered
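The two search methods can be sketched as follows. The employees file, its ssn attribute, and the function names are assumptions for illustration; a real system would probe disk blocks rather than a Python list.

```python
def linear_search(records, key, value):
    # Test every record against the selection condition.
    return [rec for rec in records if rec[key] == value]

def binary_search(sorted_records, key, value):
    # Requires the file to be ordered on a key attribute, so at most one match.
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Hypothetical file ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"e{n}"} for n in range(10, 100, 10)]

print(linear_search(employees, "ssn", 40))
print(binary_search(employees, "ssn", 40))
print(binary_search(employees, "ssn", 45))
```

Linear search inspects every record, while binary search halves the remaining range at each step and therefore needs only a logarithmic number of probes.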

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved)
σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, the preceding records). Note: in this case the data is also sorted
σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data)
σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: a disjunctive selection is of the following form
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation
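Both selection strategies can be sketched with in-memory indexes. The EMPLOYEE rows, the index layout (a dictionary from attribute value to a set of list positions standing in for record pointers), and the function names are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file; list positions stand in for record pointers.
EMPLOYEE = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 45000},
    {"ssn": 3, "dno": 5, "salary": 50000},
    {"ssn": 4, "dno": 1, "salary": 45000},
]

def build_index(records, attr):
    # Maps each attribute value to the set of record pointers holding it.
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[attr]].add(pointer)
    return index

dno_index = build_index(EMPLOYEE, "dno")
salary_index = build_index(EMPLOYEE, "salary")

def conjunctive(dno, min_salary):
    # dno = ? AND salary >= ?: use the access path on dno,
    # then apply the remaining condition to the retrieved records.
    candidates = dno_index.get(dno, set())
    return sorted(p for p in candidates if EMPLOYEE[p]["salary"] >= min_salary)

def disjunctive(dno, salary):
    # dno = ? OR salary = ?: every condition has an access path,
    # so the answer is the union of the record pointers.
    return sorted(dno_index.get(dno, set()) | salary_index.get(salary, set()))

print(conjunctive(5, 40000))
print(disjunctive(4, 50000))
```

If one of the disjuncts had no index, the union could not be built from pointers alone and the whole file would have to be scanned.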

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B]
R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
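The nested loop algorithm translates almost line for line into Python. In this sketch the comparison Θ is passed in as a function (equality by default), and the sample relations are invented for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):   # join condition tr[A] theta ts[B]
                result.append({**tr, **ts})  # tr || ts
    return result

r = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}]
s = [{"B": 2, "y": "s1"}, {"B": 3, "y": "s2"}]

print(nested_loop_join(r, s, "A", "B"))
print(len(nested_loop_join(r, s, "A", "B", theta=operator.lt)))
```

Because the condition is an arbitrary function, the same loop handles equijoins and general theta joins alike.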

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns (pairs of tuples examined)
In the best case scenario, both relations fit into the physical space (main memory) and hence we need bs + br block accesses

Query Optimization: JOIN Operation (nested loop)
If one of the relations fits in the physical space, then bs + br block accesses will be the cost

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)
The cost of block nested loop, in terms of the number of block accesses, is br × bs + br
How can we improve block nested loop?
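The block access count br × bs + br can be reproduced by instrumenting a block nested loop join. This sketch assumes one buffer block per relation and an equality join condition; the block size and sample relations are illustrative.

```python
def blocks(relation, block_size):
    # Split a relation into fixed-size blocks of tuples.
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads

r = [{"A": i} for i in range(6)]         # 6 tuples, so br = 3 blocks of 2
s = [{"B": i} for i in range(0, 8, 2)]   # 4 tuples, so bs = 2 blocks of 2
result, reads = block_nested_loop_join(r, s, "A", "B", block_size=2)

print(len(result), reads)
```

With br = 3 and bs = 2, the counter reports 3 × 2 + 3 = 9 block reads, matching the formula.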

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B]
r ⋈_{A=B} s
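A join that probes an access structure can be sketched with a dictionary standing in for the index on s.B. The index is built inside the function only to keep the example self-contained; the technique assumes it already exists. Names and sample data are illustrative.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    # Access structure on s.B (built here; the technique assumes it already exists).
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                          # one record of r at a time
        for ts in index.get(tr[a], []):   # probe the index instead of scanning s
            result.append({**tr, **ts})
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}]
s = [{"B": 2, "y": "s1"}, {"B": 2, "y": "s2"}, {"B": 5, "y": "s3"}]

print(index_join(r, s, "A", "B"))
```

Each record of r triggers one index lookup rather than a full scan of s.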

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations
Since the relations are sorted, each tuple is accessed once and hence the number of block accesses is
bs + br
assuming that the set of all tuples with the same value for the join attributes fits in main memory
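The merge phase can be sketched with two pointers. This is an illustrative equijoin that assumes both inputs are already sorted on their join attributes and that each group of equal values fits in memory; the names and data are hypothetical.

```python
def merge_join(r, s, a, b):
    # r must be sorted on a and s must be sorted on b.
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with that value against the group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2, "y": "a"}, {"B": 2, "y": "b"}, {"B": 3, "y": "c"}, {"B": 4, "y": "d"}]

print(len(merge_join(r, s, "A", "B")))
```

Neither pointer ever moves backwards past a group, so each relation is scanned once.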

Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation
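A hash join along these lines can be sketched as follows. The number of buckets, the use of Python's built-in hash, and the sample relations are assumptions made for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, buckets=4):
    # One pass over each file: hash every record into a bucket on its join attribute.
    table = defaultdict(lambda: ([], []))
    for tr in r:
        table[hash(tr[a]) % buckets][0].append(tr)
    for ts in s:
        table[hash(ts[b]) % buckets][1].append(ts)
    # Records with equal join values always land in the same bucket,
    # so only records within a bucket need to be compared.
    result = []
    for r_part, s_part in table.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result

r = [{"A": i} for i in range(8)]
s = [{"B": i, "y": i * i} for i in range(0, 16, 2)]

print(sorted(row["A"] for row in hash_join(r, s, "A", "B")))
```

The equality test inside each bucket is still needed because different values can hash to the same bucket.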

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions
Join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections

Query Optimization: Complex JOIN Operation
Consider the following join operation
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
One or more of the join techniques may be applicable for joins on individual conditions
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions

Query Optimization: Complex JOIN Operation
Now consider the following join operation
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s

Query Optimization: Project Operation
A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique
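Both duplicate-elimination strategies can be sketched in a few lines. The EMPLOYEE rows and the function names are hypothetical; projected tuples are represented as Python tuples so that they can be sorted and hashed.

```python
def project_sort(rows, attrs):
    # Sort the projected tuples, then drop adjacent duplicates.
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(rows, attrs):
    # Hashing: a tuple is emitted only the first time it is seen.
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

EMPLOYEE = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 4},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 5},
]

print(project_sort(EMPLOYEE, ["Dno"]))
print(project_hash(EMPLOYEE, ["Dno"]))
print(project_hash(EMPLOYEE, ["Ssn", "Dno"]))
```

When the attribute list includes the key Ssn, no two projected tuples can coincide, so the duplicate check never removes anything.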

Query Optimization: Set Operations
  • Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible
  • The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result
  • Hashing is another way to implement the union, intersection, and difference operations

Questions
  • Devise algorithms to perform variations of outer join operations
  • Devise algorithms to perform aggregate operations

Query Optimization: An Example
Assume the following relations
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example
[Figure: query tree with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN} below it, and the Cartesian products Project × (Department × Employee) at the bottom]

Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes
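The size of the Cartesian product follows directly from the figures in the example. A short check in Python (variable names are illustrative):

```python
# Figures taken from the example: (tuple size in bytes, number of tuples).
relations = {"Project": (100, 100), "Department": (50, 20), "Employee": (150, 5000)}

tuples = 1
tuple_bytes = 0
for size, count in relations.values():
    tuples *= count       # a Cartesian product multiplies the cardinalities
    tuple_bytes += size   # and concatenates the tuples, so the sizes add up

total_bytes = tuples * tuple_bytes
print(tuples, tuple_bytes, total_bytes)
```

The intermediate relation would hold 10,000,000 tuples of 300 bytes each, about 3 × 10^9 bytes, even though the three base relations together occupy well under one megabyte.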

Query Optimization: An Example
However, the above query, based on the schemas of the relations, can be translated into
Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example
[Figure: query tree with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, the join on MGRSSN = SSN with Employee below it, then the join on Dnum = Dnumber with Department, and σ_{Plocation = "California"} applied to Project at the bottom]

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

41

Optimization Process

WHERE p OR (q AND r)can be converted intoWHERE (p OR q) AND (p OR r)

Database Systems

42

Optimization ProcessGeneral rule Transform restriction condition

into an equivalent condition in conjunctivenormal form becauseA condition that is in conjunctive normal form

evaluates to ldquotruerdquo only if every conjunct evaluatesto ldquotruerdquo Consequently it evaluates to ldquofalserdquo ifany conjunct evaluates to ldquofalserdquo This is speciallyuseful in the domain of parallel systems whereconjuncts can be evaluated in parallel

Database Systems

43

Optimization Process(A WHERE restriction-1) WHERE restriction-2can be converted intoA WHERE restriction-1 AND restriction-2

Database Systems

44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
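A small Python sketch of the two search methods, assuming a hypothetical in-memory "file" of (ssn, name) records ordered on ssn; the record values and function names are illustrative only.

```python
from bisect import bisect_left

# Hypothetical file of records ordered on the key attribute "ssn".
records = [(101, "Ann"), (205, "Bob"), (318, "Cat"), (477, "Dan")]
keys = [ssn for ssn, _ in records]


def linear_search(target):
    """Scan every record; works for any condition and any file organization."""
    return [rec for rec in records if rec[0] == target]


def binary_search(target):
    """Equality on the ordering key: O(log n) probes instead of a full scan."""
    pos = bisect_left(keys, target)
    if pos < len(keys) and keys[pos] == target:
        return [records[pos]]
    return []
```

Binary search is only applicable here because the file is ordered on the attribute being compared.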

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).

$\sigma_{\text{SSN} = 123456789}(\text{EMPLOYEE})$

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

$\sigma_{\text{DNUMBER} > 5}(\text{DEPARTMENT})$
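A range selection on a sorted key can be sketched as follows. The DEPARTMENT contents are made up, and Python's bisect module stands in for the primary index lookup.

```python
from bisect import bisect_left, bisect_right

# Hypothetical DEPARTMENT file, sorted on the key field DNUMBER.
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Legal")]
dnumbers = [d for d, _ in departments]


def select_greater_than(value):
    """DNUMBER > value: locate the boundary, then read all subsequent records."""
    return departments[bisect_right(dnumbers, value):]


def select_less_than(value):
    """DNUMBER < value: read all records that precede the boundary."""
    return departments[:bisect_left(dnumbers, value)]
```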

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{\text{DNO} = 5}(\text{EMPLOYEE})$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
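The union-of-pointers idea can be sketched in Python. Everything here is an assumption made for illustration: the EMPLOYEE records, the use of list positions as record pointers, and the dictionary-based indexes on dno and city.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file; the list position plays the role of a record pointer.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Austin"},
    {"name": "Cat", "dno": 5, "city": "Austin"},
    {"name": "Dan", "dno": 1, "city": "Boston"},
]


def build_index(attr):
    """Secondary index: attribute value -> set of record pointers."""
    index = defaultdict(set)
    for rid, rec in enumerate(employees):
        index[rec[attr]].add(rid)
    return index


indexes = {"dno": build_index("dno"), "city": build_index("city")}


def disjunctive_select(conditions):
    """conditions: list of (attribute, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path, so fall back to a linear scan.
        return [r for r in employees if any(r[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())
    return [employees[rid] for rid in sorted(pointers)]
```

Taking the union of pointer sets before fetching records means a tuple that satisfies several conditions is retrieved only once.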

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if (tr, ts) satisfies r.A Θ s.B then add tr || ts to the result
    end
end
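The nested loop algorithm translates almost line for line into Python. In this sketch the relations are lists of tuples, the join attributes are given as positions, and the sample data is invented.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r.a (theta) s.b, examining every pair of tuples."""
    result = []
    for tr in r:            # outer relation
        for ts in s:        # inner relation, rescanned for every tr
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts (tuple concatenation)
    return result


# Hypothetical sample relations, stored as tuples.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]

equi = nested_loop_join(r, s, 0, 0)                 # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)    # r.A < s.B
```

Because the comparison is passed in as a function, the same loop handles any Θ, which is why nested loop works regardless of the join condition.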

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is $n_r \times n_s$ pairs of tuples examined.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (nested loop)

If one of the relations fits in the physical space, then the cost will be $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if (tr, ts) satisfies r.A Θ s.B then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve block nested loop?
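The block access count can be checked with a small simulation. This is a sketch under simplifying assumptions: relations are Python lists, a "block" is a fixed-size slice, and every block visit counts as one access.

```python
def blocks(relation, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, block_size, theta):
    """Join block by block and count the simulated block accesses."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    accesses, result = 0, []
    for br_block in r_blocks:
        accesses += 1                      # read one block of r
        for bs_block in s_blocks:
            accesses += 1                  # read one block of s
            for tr in br_block:
                for ts in bs_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, accesses


# Hypothetical relations: 6 and 4 one-attribute tuples, 2 tuples per block.
r = [(i,) for i in range(6)]      # b_r = 3 blocks
s = [(i,) for i in range(4)]      # b_s = 2 blocks
result, accesses = block_nested_loop_join(r, s, 2, lambda tr, ts: tr[0] == ts[0])
```

With b_r = 3 and b_s = 2 the counter reaches 3 × 2 + 3 = 9, matching the cost formula.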

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
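A minimal sketch of a join that probes an access structure. The slide assumes the index on s.B already exists; here a Python dictionary is built on the fly to stand in for it, and the sample relations are hypothetical.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r.a = s.b that probes an index on s.b instead of scanning s."""
    index = defaultdict(list)          # stands in for an existing index on s.B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                       # one probe per tuple of r
        for ts in index.get(tr[a], []):
            result.append(tr + ts)
    return result


# Hypothetical sample relations.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(3, "y"), (2, "x"), (3, "z")]
joined = index_nested_loop_join(r, s, 0, 0)
```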

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
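The merge step can be sketched as below, assuming both inputs are already sorted on the join attribute and that each group of equal-valued tuples fits in memory. The tuple layout and sample data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on the join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Collect the group of s tuples with this join value (assumed to fit in memory).
            group = []
            while j < len(s) and s[j][b] == value:
                group.append(s[j])
                j += 1
            # Pair every r tuple carrying the same value with that group.
            while i < len(r) and r[i][a] == value:
                result.extend(r[i] + ts for ts in group)
                i += 1
    return result


r = [(1, "a"), (3, "c"), (3, "d"), (5, "e")]
s = [(2, "w"), (3, "y"), (3, "z"), (5, "v")]
joined = merge_join(r, s, 0, 0)
```

Each pointer only ever moves forward, which is what keeps the scan to a single pass over each relation.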

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
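A hash-join sketch in Python. The bucket count, the use of Python's built-in hash, and the sample relations are assumptions made for illustration; a real system would hash to disk-resident buckets.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join: hash both relations with the same function, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))     # bucket -> (r tuples, s tuples)
    for tr in r:                                # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        # Tuples sharing a bucket may still differ, so compare the actual values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c"), (6, "f")]
s = [(2, "x"), (3, "y"), (3, "z"), (7, "q")]
joined = sorted(hash_join(r, s, 0, 0))
```

Only tuples that land in the same bucket are compared, so most non-matching pairs are never examined.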

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$, for $i = 1, \dots, n$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute-list}\rangle}(R)$ is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
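Both duplicate-elimination strategies can be sketched briefly. The EMPLOYEE(ssn, lname, dno) relation is hypothetical, attributes are addressed by position, and a Python set stands in for the hash table.

```python
def project_sort(relation, positions):
    """Project, sort, then drop adjacent duplicates in one scan."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(relation, positions):
    """Project and keep a tuple only the first time the hash table sees it."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


# Hypothetical EMPLOYEE(ssn, lname, dno); ssn is the key.
employee = [(1, "Smith", 5), (2, "Wong", 5), (3, "Smith", 5), (4, "Zelaya", 4)]
```

Projecting on (lname, dno) drops one of the two Smith tuples, while any projection that keeps ssn cannot produce duplicates.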

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(\sigma_{\text{Plocation} = \text{'California'} \wedge \text{Dnum} = \text{Dnumber} \wedge \text{MGRSSN} = \text{SSN}}(\text{Project} \times (\text{Department} \times \text{Employee})))$

Query Optimization: An Example

Query tree (root first):
$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}$
    $\sigma_{\text{Plocation} = \text{'California'} \wedge \text{Dnum} = \text{Dnumber} \wedge \text{MGRSSN} = \text{SSN}}$
        $\times$
            Project
            $\times$
                Department
                Employee

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
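The arithmetic behind these figures can be checked with a few lines of Python, using only the sizes and cardinalities stated in the example (the variable names are ours).

```python
# Figures taken from the example: (tuple size in bytes, number of tuples).
relations = {"Project": (100, 100), "Department": (50, 20), "Employee": (150, 5000)}

tuples = 1
tuple_bytes = 0
for size, count in relations.values():
    tuples *= count        # a Cartesian product multiplies cardinalities
    tuple_bytes += size    # and concatenates tuples, so widths add up

total_bytes = tuples * tuple_bytes
print(tuples, tuple_bytes, total_bytes)
```

That is 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or about 3 GB of intermediate data before the selection is applied.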

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(((\sigma_{\text{Plocation} = \text{'California'}}(\text{Project})) \bowtie_{\text{Dnum} = \text{Dnumber}} \text{Department}) \bowtie_{\text{MGRSSN} = \text{SSN}} \text{Employee})$

Query Optimization: An Example

Query tree (root first):
$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}$
    $\bowtie_{\text{MGRSSN} = \text{SSN}}$
        $\bowtie_{\text{Dnum} = \text{Dnumber}}$
            $\sigma_{\text{Plocation} = \text{'California'}}$
                Project
            Department
        Employee


Optimization Process

General rule: Transform the restriction condition into an equivalent condition in conjunctive normal form, because a condition that is in conjunctive normal form evaluates to "true" only if every conjunct evaluates to "true". Consequently, it evaluates to "false" if any conjunct evaluates to "false". This is especially useful in the domain of parallel systems, where conjuncts can be evaluated in parallel.

Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

Optimization Process

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Optimization Process

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP JOIN P) [S]

However, we know that P is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S]

Optimization Process

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: The selection operation is commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Optimization Process

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \frac{1}{|r(R)|}$

The selectivity on an attribute with $i$ distinct values (assuming the values are uniformly distributed) is

$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\frac{1}{i}\,|r(R)|$
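These estimates are easy to compute. The sketch below assumes uniformly distributed values and uses made-up statistics (5000 tuples, 20 distinct department numbers); the function names are ours.

```python
def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated fraction of tuples matching attribute = constant."""
    if is_key:
        return 1 / cardinality          # at most one tuple can match
    return 1 / distinct_values          # uniform-distribution assumption


def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result."""
    return selectivity * cardinality


# Hypothetical statistics: 5000 employees spread over 20 departments.
n = 5000
key_hits = estimated_matches(n, equality_selectivity(n, is_key=True))
dno_hits = estimated_matches(n, equality_selectivity(n, distinct_values=20))
```

Under these assumptions an equality search on the key is expected to return about one tuple, and a search on the department number about 5000 / 20 = 250 tuples.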

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

Query trees for Rule 8: on the left, $\sigma_{\theta_0}$ is applied above the join of $E_1$ and $E_2$; on the right, $\sigma_{\theta_0}$ is applied to $E_1$ below the join.
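Rule 8 can be checked on a small instance. This is a sketch, not a proof: the relations E1(id, rating) and E2(id, bid), the conditions, and the helper functions are hypothetical.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def theta_join(e1, e2, theta):
    """Theta join: concatenate every pair of tuples that satisfies theta."""
    return [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)]


def theta(t1, t2):
    """Join condition: E1.id = E2.id."""
    return t1[0] == t2[0]


def theta0(t):
    """Selection condition on E1's rating only (position 1 before and after the join)."""
    return t[1] > 5


# Hypothetical relations: E1(id, rating) and E2(id, bid).
e1 = [(1, 3), (2, 7), (3, 9)]
e2 = [(1, 100), (2, 100), (2, 101), (3, 102)]

lhs = select(theta0, theta_join(e1, e2, theta))     # select after the join
rhs = theta_join(select(theta0, e1), e2, theta)     # selection pushed below the join

pairs_before = len(e1) * len(e2)                    # pairs examined without pushdown
pairs_after = len(select(theta0, e1)) * len(e2)     # pairs examined with pushdown
```

Both sides return the same tuples, but pushing the selection down shrinks the join input from 12 pairs to 8, which is the motivation for performing selections early.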

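Optimization Process

Rule 9: The projection operation distributes over theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are sets of attributes of $E_1$ and $E_2$, respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$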

Optimization Process

Rule 10: Set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \ne \sigma_p(E_1) \cup E_2$

Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

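Optimization Process

Rule 13: The projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

Note: the corresponding identities for set intersection and set difference, $\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$ and $\Pi_L(E_1 - E_2) = \Pi_L(E_1) - \Pi_L(E_2)$, do not hold in general. For example, with $E_1 = \{(1, a)\}$, $E_2 = \{(1, b)\}$, and $L$ the first attribute, $\Pi_L(E_1 \cap E_2)$ is empty while $\Pi_L(E_1) \cap \Pi_L(E_2) = \{(1)\}$.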
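A quick check on a deliberately small, hypothetical pair of relations suggests caution with this rule: the union identity holds, but the intersection and difference identities can fail when the projection discards the attribute that distinguishes the tuples.

```python
def project(rel, positions):
    """Set-based projection onto the given attribute positions."""
    return {tuple(t[p] for p in positions) for t in rel}


# Hypothetical relations that agree on the first attribute only.
e1 = {(1, "a")}
e2 = {(1, "b")}
L = [0]

union_ok = project(e1 | e2, L) == project(e1, L) | project(e2, L)
intersection_ok = project(e1 & e2, L) == project(e1, L) & project(e2, L)
difference_ok = project(e1 - e2, L) == project(e1, L) - project(e2, L)
```

Here only union_ok is true, so an optimizer can safely push a projection below a union, while intersection and difference need extra conditions on L.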

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the existence of indexes or other access paths (to reduce I/O cost), the physical clustering of records (to reduce I/O cost), … come into play.

Optimization Process

So, in short, after scanning and parsing:
The query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph.
An execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.



Optimization Process

(A WHERE restriction-1) WHERE restriction-2
can be converted into
A WHERE restriction-1 AND restriction-2

General rule: A sequence of restrictions can be combined into a single restriction.

(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of suppliers who supply at least one part.

(SP JOIN P) [S]

However, we know that P is a foreign key in SP; therefore, the above query is semantically equivalent to

SP [S]

Optimization Process

An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: Selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: A sequence of projections is the same as the last projection operation (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)
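Rules 1 to 3 can be checked on a small relation. The sketch below is only an illustration in Python, not part of the original slides: the relation E, the predicates t1 and t2, and the helpers row, select, and project are made-up names, and a relation is modeled as a set of tuples of (attribute, value) pairs.

```python
def row(**attrs):
    """Build one tuple as a sorted sequence of (attribute, value) pairs."""
    return tuple(sorted(attrs.items()))

def select(pred, rel):
    """σ_pred(rel): keep the tuples for which pred holds."""
    return {t for t in rel if pred(dict(t))}

def project(attrs, rel):
    """Π_attrs(rel): keep only the listed attributes; the set drops duplicates."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in rel}

E = {row(a=1, b=10, c="x"), row(a=2, b=20, c="y"), row(a=3, b=30, c="x")}
t1 = lambda t: t["a"] >= 2      # hypothetical condition θ1
t2 = lambda t: t["c"] == "x"    # hypothetical condition θ2

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: t1(t) and t2(t), E) == select(t1, select(t2, E))
# Rule 2: selections commute.
rule2 = select(t1, select(t2, E)) == select(t2, select(t1, E))
# Rule 3: only the outermost projection matters (here L1 = {a}, L2 = {a, b}).
rule3 = project({"a"}, project({"a", "b"}, E)) == project({"a"}, E)

print(rule1, rule2, rule3)
```

A check on one relation is not a proof, but it is a quick way to catch a mis-stated rule.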

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2

Rule 5: Theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivity as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values (assuming the values are uniformly distributed), the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) · |r(R)|
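These estimates are simple enough to compute directly. The following Python sketch assumes uniformly distributed values; the function names and the sample figures (5000 tuples, 50 distinct values) are hypothetical and chosen only for illustration.

```python
def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition on one attribute."""
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    if is_key:
        return 1 / cardinality       # at most one tuple can match a key value
    if not distinct_values:
        raise ValueError("distinct_values is required for a non-key attribute")
    return 1 / distinct_values       # uniform-distribution assumption

def estimated_matches(cardinality, selectivity):
    """Estimated number of tuples that satisfy the condition."""
    return selectivity * cardinality

n = 5000                                             # |r(R)|, hypothetical
on_key = estimated_matches(n, equality_selectivity(n, is_key=True))
on_non_key = estimated_matches(n, equality_selectivity(n, distinct_values=50))
print(on_key, on_non_key)
```

Real optimizers refine this with histograms, since attribute values are rarely uniform.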

Optimization Process

Rule 8: Selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

As query trees, Rule 8 transforms

σ_{θ0}
  ⋈_{θ}
    E1
    E2

into

⋈_{θ}
  σ_{θ0}
    E1
  E2

Rule 9: The projection operation distributes over theta join under the following condition: the join condition θ only involves attributes in L1 ∪ L2.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Rule 10: Set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: Selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2 (in general)

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2

Rule 13: Projection operation distributes over the set union operation.

Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)

Projection does not, in general, distribute over set intersection or set difference, because two different tuples can agree on the attributes in L. Only the following containments are guaranteed:

Π_{L}(E1 ∩ E2) ⊆ Π_{L}(E1) ∩ Π_{L}(E2)
Π_{L}(E1) − Π_{L}(E2) ⊆ Π_{L}(E1 − E2)
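Which of these set-operation identities hold can be explored on a small example. The Python sketch below is an illustration only: E1, E2, the predicate p, and the attribute list L are made-up test data, relations are modeled as sets of positional tuples, and the helper names are assumptions.

```python
def select(pred, rel):
    """σ_pred(rel) over a set of tuples."""
    return {t for t in rel if pred(t)}

def project(positions, rel):
    """Π over attribute positions; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in rel}

E1 = {(1, "a"), (2, "b"), (3, "c")}
E2 = {(1, "z"), (2, "b"), (4, "d")}
p = lambda t: t[0] >= 2
L = (0,)  # project on the first attribute only

# Selection over difference, union, and intersection.
sel_diff_both = select(p, E1 - E2) == select(p, E1) - select(p, E2)
sel_diff_left = select(p, E1 - E2) == select(p, E1) - E2
sel_union_both = select(p, E1 | E2) == select(p, E1) | select(p, E2)
sel_union_left = select(p, E1 | E2) == select(p, E1) | E2
sel_inter_left = select(p, E1 & E2) == select(p, E1) & E2

# Projection over union, intersection, and difference.
proj_union = project(L, E1 | E2) == project(L, E1) | project(L, E2)
proj_inter = project(L, E1 & E2) == project(L, E1) & project(L, E2)
proj_diff = project(L, E1 - E2) == project(L, E1) - project(L, E2)

print(sel_diff_both, sel_diff_left, sel_union_both, sel_union_left, sel_inter_left)
print(proj_union, proj_inter, proj_diff)
```

On this data the one-sided union form of selection fails, and projection agrees with union but not with intersection or difference, because (1, "a") and (1, "z") are different tuples that share the same projected value.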

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the existence of indexes or other access paths (to reduce I/O cost) and the physical clustering of records (to reduce I/O cost), …, come into play.

So, in short, after scanning and parsing, the query is translated into an equivalent internal representation in the form of a query tree or query graph, and an execution strategy is then chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans". Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization: heuristic rules and the systematic estimation approach. In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition

Materialized evaluation: generation of an intermediate result (relation) that is stored before the next operation starts.
Pipeline evaluation: combining several operations so that the result of one operation is passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
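The difference between the two evaluation styles can be sketched with Python generators. This is an illustration under assumptions, not the slides' own implementation: the relations r and s, the shared attribute k, and the function names are hypothetical.

```python
def natural_join(r, s):
    """Yield merged tuples of r and s that agree on all shared attributes."""
    for tr in r:
        for ts in s:
            shared = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in shared):
                yield {**tr, **ts}

def project(attrs, tuples):
    """Yield the projection of each incoming tuple, skipping duplicates."""
    seen = set()
    for t in tuples:
        out = tuple(t[a] for a in attrs)
        if out not in seen:
            seen.add(out)
            yield out

r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"k": 10, "a2": "x"}, {"k": 20, "a2": "y"}, {"k": 30, "a2": "z"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(natural_join(r, s))
materialized = list(project(("a1", "a2"), intermediate))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# so no intermediate relation is ever stored.
pipelined = list(project(("a1", "a2"), natural_join(r, s)))

print(materialized, pipelined)
```

Both styles return the same answer; the pipelined form simply avoids holding the intermediate relation.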

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S)))

Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, with the operators that are evaluated in pipelined fashion annotated "On the fly" (two annotations on the first tree, one on the second).]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
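The two search methods can be sketched as follows. This is a minimal Python illustration; the in-memory list stands in for a file of records, and the employees data, attribute names, and function names are hypothetical.

```python
def linear_search(records, attr, value):
    """Scan every record and keep those whose attribute equals value."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_records, key_attr, value):
    """Find the record with key_attr == value in a file ordered on key_attr."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1      # the match, if any, lies in the upper half
        else:
            hi = mid - 1      # the match, if any, lies in the lower half
    return None               # no record has this key value

# Hypothetical file, ordered on the key attribute ssn.
employees = [{"ssn": ssn, "dno": ssn % 3} for ssn in range(1000, 1010)]

by_dno = linear_search(employees, "dno", 1)      # non-key attribute: full scan
by_ssn = binary_search(employees, "ssn", 1004)   # key attribute: halving search
missing = binary_search(employees, "ssn", 2000)
print(len(by_dno), by_ssn, missing)
```

Linear search inspects all n records, while binary search needs about log2(n) probes but only works on the ordering attribute.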

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, the preceding records). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
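Disjunctive selection by union of record pointers can be sketched as below. This Python illustration makes several assumptions: record positions in a list play the role of record pointers, an index is a dictionary from attribute value to a set of positions, conditions are restricted to equality tests, and all names and sample data are hypothetical.

```python
from collections import defaultdict

def build_index(records, attr):
    """Map each value of attr to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index

def disjunctive_select(records, indexes, conditions):
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over records."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition lacks an access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[attr] == value for attr, value in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of pointer sets
    return [records[rid] for rid in sorted(pointers)]

employees = [
    {"name": "A", "dno": 5, "city": "Rolla"},
    {"name": "B", "dno": 4, "city": "Rolla"},
    {"name": "C", "dno": 5, "city": "Austin"},
    {"name": "D", "dno": 1, "city": "Boston"},
]
conditions = [("dno", 5), ("city", "Rolla")]

full = {"dno": build_index(employees, "dno"),
        "city": build_index(employees, "city")}
partial = {"dno": build_index(employees, "dno")}

via_indexes = disjunctive_select(employees, full, conditions)
via_scan = disjunctive_select(employees, partial, conditions)
print([rec["name"] for rec in via_indexes])
```

Taking the union of pointers before fetching means each qualifying record is read only once, even when it satisfies several conditions.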

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
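The pseudocode above translates almost line for line into Python. In this sketch, tuples are positional, a and b are the positions of the join attributes, and theta is any two-argument comparison; these conventions and the sample relations are assumptions made for illustration.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return tr || ts for every pair with theta(tr[a], ts[b]) true."""
    result = []
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            if theta(tr[a], ts[b]):     # the join condition r.A Θ s.B
                result.append(tr + ts)  # concatenate the two tuples
    return result

r = [(1, "x"), (2, "y")]
s = [(1, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                      # Θ is =
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)   # Θ is <
print(equi, less)
```

Because the condition is just a callable, the same loop handles any Θ, which is why nested loop works regardless of the join condition.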

The nested loop algorithm examines nr × ns pairs of tuples. In the best-case scenario, both relations fit in main memory and hence we need bs + br block accesses.

If one of the relations fits in main memory, then the cost will also be bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of block nested loop in terms of the number of block accesses is br × bs + br.
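The block access count can be confirmed by instrumenting a small implementation. The Python sketch below assumes an equality join condition, models each relation as a list of blocks (lists of positional tuples), and counts one access per block read; the data and names are hypothetical.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join r and s block by block; also count block accesses."""
    result, block_accesses = [], 0
    for block_r in r_blocks:
        block_accesses += 1              # read one block of r
        for block_s in s_blocks:
            block_accesses += 1          # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_accesses

r_blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e")]]   # br = 3
s_blocks = [[(2, "x"), (3, "y")], [(5, "z")]]                         # bs = 2

joined, accesses = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
br, bs = len(r_blocks), len(s_blocks)
print(joined, accesses, br * bs + br)
```

Each block of s is read once per block of r rather than once per tuple of r, which is where the saving over the tuple-at-a-time loop comes from.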

How can we improve block nested loop?

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A=B} s

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
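A merge join over two sorted inputs might look like the following Python sketch. It assumes both lists are already sorted on the join attribute and held in memory; positional tuples, the parameter names, and the sample data are illustrative assumptions.

```python
def merge_join(r, s, a, b):
    """Equi-join r and s on r[a] == s[b]; both must be sorted on that attribute."""
    result = []
    i = j = 0                                # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                           # advance the pointer with the smaller value
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
print(merged)
```

The inner group handling is the part that relies on the stated assumption: all tuples sharing one join value must be available at once.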

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
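The bucket-based procedure described above can be sketched in Python as follows. The in-memory dictionary stands in for the hash file, and the bucket count, function name, and sample relations are assumptions made for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r and s on r[a] == s[b] by hashing both into shared buckets."""
    # Each bucket holds two lists: records from r and records from s.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Different values can share a bucket, so the join condition
        # is still checked for every pair inside the bucket.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (9, "c")]
s = [(2, "x"), (9, "y"), (17, "z")]
hashed = sorted(hash_join(r, s, 0, 0))
print(hashed)
```

Only records that land in the same bucket are ever compared, which is why hash join avoids the full nr × ns comparison of the nested loop.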

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{attribute-list}(R) is straightforward to implement if the attribute list includes a key of relation R.

If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.



44

Optimization ProcessGeneral rule A sequence of restrictions can be

combined into a single restriction

Database Systems

45

Optimization Process(A [projection-1]) [projection-2]can be converted intoA [projection-2]

Database Systems

Optimization ProcessGeneral rule A sequence of projections can be

transferred into a single projection

46

Database Systems

47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
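The difference between the two search methods can be sketched as follows. The in-memory list stands in for a file ordered on its key attribute; the record layout and the function names are assumptions for the example.

```python
from bisect import bisect_left

# Hypothetical file: records (key, payload) ordered on the key attribute.
records = [(k, f"rec{k}") for k in range(0, 1000, 5)]
keys = [k for k, _ in records]

def linear_search(target):
    """Scan every record; returns (match, number of records examined)."""
    examined = 0
    for rec in records:
        examined += 1
        if rec[0] == target:
            return rec, examined
    return None, examined

def binary_search(target):
    """Equality search on the ordering key via binary search."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return records[i]
    return None
```

The linear scan examines up to all 200 records, while the binary search needs on the order of log2(200) probes; it is applicable only because the file is ordered on the search key.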

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:
σθ1 ∧ θ2 ∧ … ∧ θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1 ∨ θ2 ∨ … ∨ θn (r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
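A minimal sketch of the union-of-pointers idea, assuming in-memory dictionaries as stand-ins for the indexes and list positions as record pointers. The employee rows and the helper build_index are hypothetical.

```python
from collections import defaultdict

# Hypothetical relation: the list position acts as the record pointer,
# each tuple is (dno, salary).
employee = [(5, 30000), (4, 52000), (5, 61000), (1, 52000), (3, 18000)]

def build_index(rows, col):
    """Index on one attribute: value -> set of record pointers."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[col]].add(rid)
    return index

dno_index = build_index(employee, 0)
salary_index = build_index(employee, 1)

# Selection with condition dno = 5 OR salary = 52000:
# union the pointer sets first, then fetch each qualifying record once.
pointers = dno_index.get(5, set()) | salary_index.get(52000, set())
result = [employee[rid] for rid in sorted(pointers)]
```

Taking the union on pointers rather than on tuples means a record that satisfies both conditions is fetched only once.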

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

r ⋈r.A Θ s.B s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume that br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
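The nested loop algorithm translates almost line for line into Python. In this sketch tuples are Python tuples, the join attributes are given as column positions a and b, and the comparison Θ is passed as a function; these representation choices are assumptions for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r[a] theta s[b]; returns concatenated tuples."""
    result = []
    for tr in r:            # outer loop over r
        for ts in s:        # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
equi = nested_loop_join(r, s, 0, 0)               # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)  # r.A < s.B
```

Because the condition is an arbitrary function, the same loop handles equality and inequality joins alike.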

Query Optimization: JOIN Operation (nested loop)
The cost of the nested loop algorithm is nr × ns tuple comparisons.
In the best case, both relations fit into physical memory, and hence we need bs + br block accesses.
If only one of the relations fits in physical memory, the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
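A sketch of the block nested loop that also counts block reads, so the count can be compared with br × bs + br. Blocks are modelled as Python lists of tuples; the helper to_blocks and the sample data are assumptions for the example.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """r_blocks and s_blocks are lists of blocks, each block a list of tuples.
    Returns (result, number of block reads)."""
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # read one block of r
        for s_block in s_blocks:
            block_reads += 1              # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads

def to_blocks(rows, per_block):
    """Split a list of tuples into blocks of at most per_block tuples."""
    return [rows[i:i + per_block] for i in range(0, len(rows), per_block)]

r = [(i,) for i in range(10)]
s = [(i,) for i in range(0, 12, 2)]
r_blocks, s_blocks = to_blocks(r, 2), to_blocks(s, 3)   # br = 5, bs = 2
result, reads = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]
)
```

With br = 5 and bs = 2 the counter reaches 5 × 2 + 5 = 15 block reads, whereas reading s once per tuple of r would take 10 × 2 + 5 = 25.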

Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈A=B s
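The access-structure approach can be sketched with a dictionary playing the role of the index on s.B. In a real system the index already exists on disk; building it inside the function is a simplification made for this example.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    """Equi-join r.A = s.B using an in-memory index on s.B
    (a stand-in for a disk-based index or hash key)."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                              # one record of r at a time
        for ts in index.get(tr[a], []):       # probe the access structure
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
joined = index_nested_loop_join(r, s, 0, 0)
```

Each record of r triggers a lookup instead of a full scan of s, which is where the saving over the plain nested loop comes from.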

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
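The merge step can be sketched as follows, assuming both inputs are already sorted on their join attributes and are small enough to be held as Python lists. The function name and the column-position parameters a and b are assumptions for the example.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            key = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

The group s[j:j_end] is the part that must fit in main memory, which is the assumption stated for the bs + br cost.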

Query Optimization: JOIN Operation
Hash join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
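A minimal in-memory sketch of the hash join described above. The number of buckets, the use of Python's built-in hash, and the sample relations are assumptions; a real hash file would live on disk.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r.A = s.B by hashing both inputs into shared buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # collisions are filtered here
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(5, "x"), (2, "y"), (6, "z")]
joined = hash_join(r, s, 0, 0)
```

Tuples with equal join values always land in the same bucket, so only tuples within one bucket ever need to be compared.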

Query Optimization: Complex JOIN Operation
Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:

r ⋈θ1 ∧ θ2 ∧ … ∧ θn s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈θ1 s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈θ1 ∨ θ2 ∨ … ∨ θn s

The join can be performed as the union of the tuples in the individual joins

r ⋈θi s

Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
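Both duplicate-elimination strategies can be sketched briefly. The relation employee with columns (name, dno) and the two function names are assumptions for the example.

```python
from itertools import groupby

def project_sort(rows, cols):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[c] for c in cols) for row in rows)
    return [key for key, _ in groupby(projected)]

def project_hash(rows, cols):
    """Project and use a hash set to skip tuples that were already emitted."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

employee = [("Ann", 5), ("Bob", 4), ("Cy", 5), ("Dee", 4), ("Eve", 1)]
by_sort = project_sort(employee, [1])   # project on dno, a non-key attribute
by_hash = project_hash(employee, [1])
```

The sort-based version returns the result in sorted order, while the hash-based version keeps the order of first appearance; both contain the same set of tuples.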

Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN (Project × (Department × Employee)))

Query Optimization: An Example
Query tree (root first):
ΠPnumber, Dnum, Lname, Address, Bdate
  σPlocation = 'California' AND Dnum = Dnumber AND MGRSSN = SSN
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example
The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
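The size of the Cartesian product follows directly from the figures given above; the short computation below only restates that arithmetic (the variable names are made up for the example).

```python
from math import prod

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}    # bytes
cardinalities = {"Project": 100, "Department": 20, "Employee": 5000}

product_tuples = prod(cardinalities.values())      # 100 * 20 * 5000
product_tuple_size = sum(tuple_sizes.values())     # 100 + 50 + 150
product_bytes = product_tuples * product_tuple_size
```

The intermediate relation therefore holds 10,000,000 tuples of 300 bytes each, about 3 GB, before a single selection condition has been applied.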

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = 'California' (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))

Query Optimization: An Example
Query tree (root first):
ΠPnumber, Dnum, Lname, Address, Bdate
  ⋈MGRSSN=SSN
    ⋈Dnum=Dnumber
      σPlocation = 'California'
        Project
      Department
    Employee


Optimization Process
(A [projection-1]) [projection-2]
can be converted into
A [projection-2]

Optimization Process
General rule: A sequence of projections can be transformed into a single projection.

Optimization Process
General rule: A restriction and projection can be converted into a projection and restriction.

Optimization Process
Finally, consider the following query: Get the supplier numbers of the suppliers who supply at least one part.

(SP Join P) [S]

However, we know that P is the foreign key in SP; therefore, the above query is semantically equivalent to

SP [S]

Optimization Process
An equivalence rule says that expressions in different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression while satisfying performance metrics.

Optimization Process
Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

σθ1 ∧ θ2 (E) = σθ1 (σθ2 (E))

Optimization Process
Rule 2: The selection operation is commutative.

σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
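The cascade and commutativity of selections can be checked on a small relation. The relation E, the predicates theta1 and theta2, and the helper select are hypothetical.

```python
def select(pred, rows):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in rows if pred(row)]

E = [
    {"bid": 100, "rating": 7},
    {"bid": 100, "rating": 3},
    {"bid": 200, "rating": 9},
]
theta1 = lambda t: t["bid"] == 100
theta2 = lambda t: t["rating"] > 5

conjunctive = select(lambda t: theta1(t) and theta2(t), E)
cascade_12 = select(theta1, select(theta2, E))   # cascade of selections
cascade_21 = select(theta2, select(theta1, E))   # same selections, swapped
```

All three expressions return the same tuples, so the optimizer is free to apply the more selective condition first.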

Optimization Process
Rule 3: A sequence of projections is the same as the last (outermost) projection operation (cascade of projections).

ΠL1 (ΠL2 (… (ΠLn (E)) …)) = ΠL1 (E)

Optimization Process
Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σθ (E1 × E2) = E1 ⋈θ E2

This can be extended to

σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1 ∧ θ2 E2

Optimization Process
Rule 5: The theta join operation is commutative.

E1 ⋈θ E2 = E2 ⋈θ E1

Optimization Process
Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Optimization Process
Rule 7: Theta join is associative in the following manner:

(E1 ⋈θ1 E2) ⋈θ2 ∧ θ3 E3 = E1 ⋈θ1 ∧ θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$$\text{selectivity} = \frac{\text{number of tuples satisfying the condition}}{|r(R)|}$$

Selectivity is used to estimate the size of an intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities as part of the statistical data to aid query optimization.

For an equality search on a key attribute, the selectivity is

$$s = \frac{1}{|r(R)|}$$

For an equality search on an attribute with i distinct values, the selectivity is

$$s = \frac{|r(R)| / i}{|r(R)|} = \frac{1}{i}$$

Hence the number of tuples that satisfy an equality search is

$$\frac{1}{i}\,|r(R)|$$
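These estimates are simple enough to compute directly. The sketch below assumes that values are uniformly distributed over the i distinct values; the function names and the sample cardinality are made up for the example.

```python
from fractions import Fraction

def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition (uniformity assumed)."""
    if is_key:
        return Fraction(1, cardinality)       # at most one matching tuple
    return Fraction(1, distinct_values)       # 1 / i

def estimated_tuples(cardinality, selectivity):
    """Expected number of tuples that satisfy the condition."""
    return selectivity * cardinality

n = 5000                                      # hypothetical |r(R)|
s_key = equality_selectivity(n, is_key=True)
s_nonkey = equality_selectivity(n, distinct_values=20)
```

For a relation of 5000 tuples, an equality search on the key is expected to return one tuple, and a search on an attribute with 20 distinct values about 250 tuples.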

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case).

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

Optimization Process
Rule 8 (query trees):

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

Left tree: σθ0 is applied to the result of E1 ⋈θ E2. Right tree: ⋈θ is applied to σθ0 (E1) and E2.

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition: the join condition θ involves only attributes in L1 ∪ L2.

ΠL1 ∪ L2 (E1 ⋈θ E2) = (ΠL1 (E1)) ⋈θ (ΠL2 (E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

Set union:
σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

Set intersection:
σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2
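The Rule 12 identities can be checked with Python sets standing in for relations. The relations E1 and E2, the predicate p, and the helper select are hypothetical.

```python
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0          # selection predicate: keep even values

def select(pred, rel):
    """Selection over a relation represented as a set."""
    return {t for t in rel if pred(t)}

diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_side = select(p, E1 | E2) == select(p, E1) | E2
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2
```

On this instance every identity holds except the one-sided union, which lets the unfiltered tuples of E2 through.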

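Optimization Process
Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

ΠL (E1 − E2) = (ΠL (E1)) − (ΠL (E2))
ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)
ΠL (E1 ∩ E2) = ΠL (E1) ∩ ΠL (E2)

Note: the union case holds in general. The difference and intersection cases hold only when projecting onto L does not map distinct tuples to the same value.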

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost) and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
the query is translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules
Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: generation of an intermediate result (relation).
Pipelined evaluation: combining several operations.



Optimization Process

General rule: A sequence of projections can be transformed into a single projection.

General rule: A restriction of a projection can be converted into a projection of a restriction.

Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.

(SP JOIN P) [S#]

However, we know that P# is a foreign key in SP; therefore the above query is semantically equivalent to

SP [S#]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into an equivalent one that better satisfies the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections).

$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

Rule 2: Selection operations are commutative.

$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$

Rule 3: In a sequence of projections, only the last (outermost) projection is needed (cascade of projections).

$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$

This requires $L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n$.
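Rules 1 to 3 can be checked on a small example. The sketch below is illustrative only: it assumes a relation is a Python list of dicts, and the helper names `select` and `project`, the predicates, and the sample data are all hypothetical rather than taken from the slides.

```python
def select(pred, rel):
    """Selection: keep the tuples for which pred is true."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Hypothetical relation E with attributes a, b, c.
E = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
    {"a": 3, "b": 20, "c": "y"},
]
theta1 = lambda t: t["a"] >= 2
theta2 = lambda t: t["b"] == 20

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1_lhs = select(lambda t: theta1(t) and theta2(t), E)
rule1_rhs = select(theta1, select(theta2, E))

# Rule 2: the order of the individual selections does not matter.
rule2_rhs = select(theta2, select(theta1, E))

# Rule 3: only the outermost projection matters (L1 = [c] is a subset of L2 = [b, c]).
rule3_lhs = project(["c"], project(["b", "c"], E))
rule3_rhs = project(["c"], E)
```

An optimizer applies these identities symbolically to the query tree; the code only confirms on sample data that both sides produce the same relation.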

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

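Optimization Process

Rule 5: The theta join operation is commutative.

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

[Figure: a $\bowtie_{\theta}$ node with children $E_1$, $E_2$ is equivalent to a $\bowtie_{\theta}$ node with children $E_2$, $E_1$.]

Rule 6: Natural join is associative.

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

[Figure: the left-deep tree joining $E_1$ and $E_2$ first, then $E_3$, is equivalent to the tree joining $E_2$ and $E_3$ first, then $E_1$.]

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.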

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities, kept as part of the statistical data, are used to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

For an attribute with $i$ distinct values (assuming the values are uniformly distributed), the selectivity is

$s = \dfrac{|r(R)|}{i \, |r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i} \, |r(R)|$
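The two selectivity estimates above amount to simple arithmetic. The following minimal sketch assumes uniformly distributed values; the function names and the statistics (5000 tuples, 20 distinct values) are hypothetical and chosen only for illustration.

```python
def equality_selectivity(distinct_values):
    """Estimated selectivity of an equality condition on an attribute
    with the given number of distinct values (uniformity assumed)."""
    return 1 / distinct_values


def estimated_matches(cardinality, distinct_values):
    """Estimated number of tuples satisfying the equality condition."""
    return cardinality / distinct_values


# Hypothetical statistics for a relation r(R).
cardinality = 5000

# Key attribute: every value is distinct, so i = |r(R)|.
key_selectivity = equality_selectivity(cardinality)       # 1 / |r(R)|
key_matches = estimated_matches(cardinality, cardinality)

# Non-key attribute with i = 20 distinct values.
nonkey_selectivity = equality_selectivity(20)              # 1 / i
nonkey_matches = estimated_matches(cardinality, 20)        # |r(R)| / i
```

With these numbers an equality search on the key is expected to return one tuple, while the search on the non-key attribute is expected to return 250 tuples.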

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition: all attributes in the selection condition $\theta_0$ involve only the attributes of one of the relations being joined ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Figure: the tree with $\sigma_{\theta_0}$ above the join of $E_1$ and $E_2$ is equivalent to the tree in which $\sigma_{\theta_0}$ is applied to $E_1$ first and its result is joined with $E_2$.]
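The benefit of Rule 8 is that the join sees fewer tuples when the selection is applied first. The sketch below illustrates this under stated assumptions: relations are lists of dicts with disjoint attribute names, the join is done by exhaustive pairing, and the data and names (`theta_join`, `theta0`, the city value) are hypothetical.

```python
def select(pred, rel):
    """Selection: keep the tuples for which pred is true."""
    return [t for t in rel if pred(t)]


def theta_join(theta, r, s):
    """Theta join by exhaustive pairing; also reports the pairs examined."""
    pairs = 0
    out = []
    for tr in r:
        for ts in s:
            pairs += 1
            if theta(tr, ts):
                out.append({**tr, **ts})
    return out, pairs


# Hypothetical relations E1(id, city) and E2(ref, amount).
E1 = [{"id": i, "city": "Rolla" if i % 4 == 0 else "Other"} for i in range(8)]
E2 = [{"ref": i % 8, "amount": 10 * i} for i in range(16)]

theta = lambda tr, ts: tr["id"] == ts["ref"]   # join condition
theta0 = lambda t: t["city"] == "Rolla"        # uses attributes of E1 only

# Left-hand side of Rule 8: join first, select afterwards.
joined, pairs_late = theta_join(theta, E1, E2)
late = select(theta0, joined)

# Right-hand side of Rule 8: select on E1 first, then join.
early, pairs_early = theta_join(theta, select(theta0, E1), E2)
```

Both plans return the same tuples, but the second examines 32 pairs instead of 128 because only two tuples of `E1` survive the selection.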

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition: the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$ respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

Rule 10: Set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$

$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Rule 11: Set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$

$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:

$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$

$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

Set intersection:

$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$

$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
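The six cases of Rule 12 can be verified on small sets. This is a sketch only: relations are modelled as Python sets of one-attribute tuples, and the data and the predicate `p` (an even-value test) are hypothetical.

```python
def select(pred, rel):
    """Selection over a relation modelled as a set of tuples."""
    return {t for t in rel if pred(t)}


# Hypothetical union-compatible relations with a single attribute.
E1 = {(1,), (2,), (3,), (4,)}
E2 = {(3,), (4,), (5,), (6,)}
p = lambda t: t[0] % 2 == 0

checks = {
    # Set difference: the selection may be applied to both inputs or to E1 only.
    "difference_both": select(p, E1 - E2) == select(p, E1) - select(p, E2),
    "difference_left_only": select(p, E1 - E2) == select(p, E1) - E2,
    # Set union: the selection must be applied to both inputs.
    "union_both": select(p, E1 | E2) == select(p, E1) | select(p, E2),
    "union_left_only": select(p, E1 | E2) == select(p, E1) | E2,
    # Set intersection: both variants hold.
    "intersection_both": select(p, E1 & E2) == select(p, E1) & select(p, E2),
    "intersection_left_only": select(p, E1 & E2) == select(p, E1) & E2,
}
```

Only `union_left_only` evaluates to `False`: leaving `E2` unfiltered in a union lets tuples that fail `p` into the result.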

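Optimization Process

Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

$\Pi_L(E_1 - E_2) = \Pi_L(E_1) - \Pi_L(E_2)$

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

$\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$

Note: the union case holds in general. The intersection and difference cases hold only when no two distinct tuples of $E_1 \cup E_2$ agree on all the attributes in $L$.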

Optimization Process

Choose candidate low-level procedures: after transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …
come into play.

So, in short, after scanning and parsing:
  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: the final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: generation of an intermediate result (relation).

Pipeline evaluation: combining several operations, so that the output of one is passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
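The difference between the two evaluation styles can be made visible with Python generators. This is a sketch under assumptions that are not in the slides: relations are lists of dicts joined on a hypothetical attribute `k`, the projection does not eliminate duplicates, and a `log` list records the order in which the operators handle tuples.

```python
def join(r, s, log):
    """Equi-join on attribute k, yielding each result tuple as it is found."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                log.append("join")
                yield {**tr, **ts}


def project(attrs, tuples, log):
    """Projection (without duplicate elimination), one tuple at a time."""
    for t in tuples:
        log.append("project")
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "p"}, {"k": 2, "a1": "q"}]
s = [{"k": 1, "a2": "x"}, {"k": 2, "a2": "y"}]

# Materialized evaluation: the whole join result is built first.
mat_log = []
temp = list(join(r, s, mat_log))          # intermediate relation
materialized = list(project(["a1", "a2"], temp, mat_log))

# Pipelined evaluation: each joined tuple goes straight to the projection.
pipe_log = []
pipelined = list(project(["a1", "a2"], join(r, s, pipe_log), pipe_log))
```

Both runs return the same result. The logs differ: the materialized run records `join, join, project, project`, while the pipelined run records `join, project, join, project` and never stores the intermediate relation.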

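Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

    SELECT S.Sname
    FROM R, S
    WHERE R.Sid = S.Sid
      AND R.bid = 100 AND S.rating > 5

$\Pi_{\text{Sname}}(\sigma_{\text{bid} = 100 \wedge \text{rating} > 5}(R \bowtie_{\text{Sid} = \text{Sid}} S))$

[Figure: query tree with $\Pi_{\text{Sname}}$ at the root, $\sigma_{\text{bid} = 100 \wedge \text{rating} > 5}$ below it, then the join on Sid = Sid with leaves R and S.]

$\Pi_{\text{Sname}}((\sigma_{\text{bid} = 100}(R)) \bowtie_{\text{Sid} = \text{Sid}} (\sigma_{\text{rating} > 5}(S)))$

[Figure: query tree with $\Pi_{\text{Sname}}$ at the root, then the join on Sid = Sid, whose inputs are $\sigma_{\text{bid} = 100}$ over R and $\sigma_{\text{rating} > 5}$ over S.]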

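Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree ($\Pi_{\text{Sname}}$ over $\sigma_{\text{bid} = 100 \wedge \text{rating} > 5}$ over the join of R and S on Sid = Sid) two operators are marked "on the fly". In the second tree ($\Pi_{\text{Sname}}$ over the join of $\sigma_{\text{bid} = 100}(R)$ and $\sigma_{\text{rating} > 5}(S)$) one operator is marked "on the fly".]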

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
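The two search methods can be contrasted in a few lines. The sketch below assumes an in-memory list of dicts standing in for the file; the attribute names `ssn` and `dno` and the generated records are hypothetical, and `probes` counts records inspected rather than disk blocks.

```python
def linear_search(records, attr, value):
    """Scan every record; works on unordered data and on any attribute."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records, key_attr, value):
    """Equality search on a key attribute; records must be ordered on it.

    Returns the matching record (or None) and the number of probes made.
    """
    lo, hi = 0, len(records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        k = records[mid][key_attr]
        if k == value:
            return records[mid], probes
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# Hypothetical file of 1000 records, ordered on the key attribute ssn.
employees = [{"ssn": 100 + i, "dno": i % 5} for i in range(1000)]

found, probes = binary_search(employees, "ssn", 600)
missing, _ = binary_search(employees, "ssn", 5000)
in_dept_3 = linear_search(employees, "dno", 3)
```

On 1000 ordered records the binary search needs at most 10 probes, whereas the linear search always inspects all 1000 records.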

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case at most one record is retrieved).

$\sigma_{\text{SSN} = 123456789}(\text{EMPLOYEE})$

Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, the preceding records). Note: in this case the data is also sorted.

$\sigma_{\text{DNUMBER} > 5}(\text{DEPARTMENT})$

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{\text{DNO} = 5}(\text{EMPLOYEE})$

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$

Conjunctive selection: if an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to each retrieved record.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
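The union-of-pointers idea, including the fallback to a linear scan, can be sketched as follows. The assumptions here are illustrative: an index is a dict from attribute value to a set of record positions (the "pointers"), each condition is an equality term, and all names and data are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the set of record positions holding it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality terms combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one term has no access path: linear scan of the relation.
        return [rec for rec in records
                if any(rec[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of record pointers
    return [records[rid] for rid in sorted(rids)]


records = [
    {"dno": 5, "salary": 30000},
    {"dno": 4, "salary": 30000},
    {"dno": 5, "salary": 40000},
    {"dno": 1, "salary": 25000},
]
conditions = [("dno", 5), ("salary", 30000)]

all_indexed = {"dno": build_index(records, "dno"),
               "salary": build_index(records, "salary")}
via_pointers = disjunctive_select(records, all_indexed, conditions)

# Only dno is indexed, so the same query falls back to a linear scan.
via_scan = disjunctive_select(records, {"dno": build_index(records, "dno")}, conditions)
```

Both calls return the first three records; the difference is that the first one touches only the records named by the pointer sets.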

Query Optimization: JOIN Operation

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \, \Theta \, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

    for each tuple tr ∈ r do begin
        for each tuple ts ∈ s do begin
            if tr.A Θ ts.B is true then add tr || ts to the result
        end
    end
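The nested loop algorithm translates almost line for line into Python. In this sketch tuples are Python tuples, the join attributes are given as positions `a` and `b`, and the comparison Θ is passed as a function; these representation choices and the sample data are assumptions made for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] theta s[b], examining every pair of tuples."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts
    return result


r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                    # condition A = B
less = nested_loop_join(r, s, 0, 0, operator.lt)       # condition A < B
```

Because the condition is an arbitrary function, the same code handles equality and inequality joins, which is why nested loop works regardless of the join condition.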

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is $n_r \times n_s$ tuple comparisons.

In the best-case scenario both relations fit into physical memory, and hence we need $b_s + b_r$ block accesses.

If only one of the relations fits into physical memory, keep it there as the inner relation; the cost is then still $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.

    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr ∈ Br do begin
                for each tuple ts ∈ Bs do begin
                    if tr.A Θ ts.B is true then add tr || ts to the result
                end
            end
        end
    end

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve block nested loop?
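The block access count of block nested loop can be reproduced with a small simulation. The sketch assumes each relation is a list of blocks, each block is a list of tuples, and every visit to a block counts as one block access; the helper `to_blocks`, the block size, and the data are hypothetical.

```python
def to_blocks(tuples, block_size):
    """Split a list of tuples into blocks of at most block_size tuples."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join on r[a] = s[b]; returns the result and the block accesses."""
    result = []
    block_accesses = 0
    for block_r in r_blocks:
        block_accesses += 1               # read one block of r
        for block_s in s_blocks:
            block_accesses += 1           # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_accesses


r = [(i, f"r{i}") for i in range(6)]        # 6 tuples -> 3 blocks
s = [(i, f"s{i}") for i in range(2, 6)]     # 4 tuples -> 2 blocks
r_blocks, s_blocks = to_blocks(r, 2), to_blocks(s, 2)

result_rs, cost_rs = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
result_sr, cost_sr = block_nested_loop_join(s_blocks, r_blocks, 0, 0)
```

With r as the outer relation the count is 3 × 2 + 3 = 9; with s as the outer relation it is 2 × 3 + 2 = 8. Under this cost model, choosing the relation with fewer blocks as the outer one is a simple improvement.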

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
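A join that probes an access structure on s can be sketched with a dictionary standing in for the index or hash key. The representation (tuples with positional join attributes) and the names `build_hash_index` and `index_join` are assumptions for illustration only.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tuple of r, look up the matching tuples of s in the index."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (2, "z")]
s = [(2, "p"), (3, "q"), (2, "w")]

s_index = build_hash_index(s, 0)
joined = index_join(r, s_index, 0)
```

Each tuple of r triggers one lookup instead of a full pass over s, which is where the saving over the plain nested loop comes from.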

Query Optimization: JOIN Operation

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
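The merge step can be sketched with two positions that move through the sorted inputs. This is one possible formulation, not necessarily the one intended by the slides: tuples are Python tuples, both inputs are already sorted on the join positions `a` and `b`, and the group of s-tuples sharing a join value is assumed to fit in memory, as stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                         # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                         # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s-tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple having this value with the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards, so each input is read once; duplicates in both inputs are handled by pairing each r-tuple with the buffered group of s-tuples.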

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
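The hash-join description can be sketched as a partition phase followed by a per-bucket matching phase. The bucket count, the use of Python's built-in `hash`, and the tuple representation are assumptions made for this illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join on r[a] = s[b] using one shared set of hash buckets."""
    # Each bucket holds the r-tuples and the s-tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # values may collide in a bucket
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "c"), (7, "d")]
s = [(2, "x"), (6, "y"), (6, "z"), (3, "w")]
joined = hash_join(r, s, 0, 0)
```

Tuples with equal join values always land in the same bucket, so only tuples within a bucket need to be compared; the equality test inside the bucket is still required because different values can share a bucket.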

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if the attribute list includes a key of relation R.

If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
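Both duplicate-elimination strategies for projection are short to write down. The sketch below assumes tuples are Python tuples and the attribute list is given as positions; the function names and the sample relation are hypothetical.

```python
def project_sort(rel, positions):
    """Project, then sort the result and drop adjacent duplicates."""
    projected = sorted(tuple(t[p] for p in positions) for t in rel)
    out = []
    for t in projected:
        if not out or out[-1] != t:
            out.append(t)
    return out


def project_hash(rel, positions):
    """Project, using a hash set to detect tuples already produced."""
    seen = set()
    out = []
    for t in rel:
        key = tuple(t[p] for p in positions)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


# Hypothetical relation (ssn, dno, city); ssn is the key.
employee = [(101, 5, "Rolla"), (102, 4, "Salem"), (103, 5, "Rolla")]

with_key = project_hash(employee, [0, 1])        # includes the key: no duplicates arise
by_sort = project_sort(employee, [1, 2])         # key dropped: duplicates removed
by_hash = project_hash(employee, [1, 2])
```

The sort-based version returns its result in sorted order, while the hash-based version keeps the order in which tuples were first seen; as sets the two results are the same.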

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
      AND MGRSSN = SSN
      AND Plocation = 'California'

The above query can be translated into

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(\sigma_{\text{Plocation} = \text{'California'} \wedge \text{Dnum} = \text{Dnumber} \wedge \text{MGRSSN} = \text{SSN}}(\text{Project} \times (\text{Department} \times \text{Employee})))$

[Figure: query tree for the expression above. Root: $\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}$. Below it: $\sigma_{\text{Plocation} = \text{'California'} \wedge \text{Dnum} = \text{Dnumber} \wedge \text{MGRSSN} = \text{SSN}}$. Below that: the Cartesian product of Project with the Cartesian product of Department and Employee.]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
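The size of the Cartesian product follows directly from the statistics given above. The short calculation below only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Tuple counts and tuple sizes (bytes) as stated in the example.
relations = {
    "Project":    {"tuples": 100,  "tuple_bytes": 100},
    "Department": {"tuples": 20,   "tuple_bytes": 50},
    "Employee":   {"tuples": 5000, "tuple_bytes": 150},
}

product_tuples = 1        # cardinalities multiply in a Cartesian product
product_tuple_bytes = 0   # tuple widths add up
for stats in relations.values():
    product_tuples *= stats["tuples"]
    product_tuple_bytes += stats["tuple_bytes"]

product_bytes = product_tuples * product_tuple_bytes
```

The intermediate relation would hold 10,000,000 tuples of 300 bytes each, about 3 × 10^9 bytes in total, even though the three base relations together occupy well under one megabyte.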

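Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(((\sigma_{\text{Plocation} = \text{'California'}}(\text{Project})) \bowtie_{\text{Dnum} = \text{Dnumber}} \text{Department}) \bowtie_{\text{MGRSSN} = \text{SSN}} \text{Employee})$

[Figure: query tree for the expression above. Root: $\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}$. Below it: the join on MGRSSN = SSN, whose inputs are Employee and the join on Dnum = Dnumber. The inputs of that join are Department and $\sigma_{\text{Plocation} = \text{'California'}}(\text{Project})$.]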


47

Optimization ProcessGeneral rule A restriction and projection can

be converted into a projection and restriction

Database Systems

48

Optimization ProcessFinally consider the following queryGet the supplier numbers who supply at least

one part(SP Join P) [S]

However we know that P is the foreign key inSP therefore the above query is semanticallyequivalent to

SP [S]

Database Systems

49

Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records, and then check each retrieved record against the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for every attribute involved in the disjunctive selection, each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
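The conjunctive and disjunctive strategies can be sketched as follows. This is only an illustration under stated assumptions: the "index" is an in-memory dictionary from attribute value to record positions (standing in for record pointers), and the names `build_index`, `conjunctive_select` and `disjunctive_select`, as well as the sample data, are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of `attr` to the set of record positions (pointers)."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        index[rec[attr]].add(rid)
    return index


def conjunctive_select(records, index, indexed_value, other_predicates):
    """Fetch candidates through one indexed condition, then test the rest."""
    candidates = index.get(indexed_value, set())
    return [records[rid] for rid in sorted(candidates)
            if all(pred(records[rid]) for pred in other_predicates)]


def disjunctive_select(records, indexed_conditions):
    """Union the pointer sets of all conditions, given as (index, value) pairs."""
    pointers = set()
    for index, value in indexed_conditions:
        pointers |= index.get(value, set())
    return [records[rid] for rid in sorted(pointers)]


employees = [
    {"name": "Ann", "dno": 5, "salary": 40000},
    {"name": "Bob", "dno": 4, "salary": 30000},
    {"name": "Cy", "dno": 5, "salary": 25000},
    {"name": "Di", "dno": 1, "salary": 50000},
]
dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# dno = 5 AND salary > 30000: only dno has to be indexed.
both = conjunctive_select(employees, dno_index, 5,
                          [lambda rec: rec["salary"] > 30000])
# dno = 5 OR salary = 30000: every condition needs its own access path.
either = disjunctive_select(employees, [(dno_index, 5), (salary_index, 30000)])
```

The asymmetry noted on the slide shows up in the signatures: the conjunctive version needs one index, while the disjunctive version needs an index for each condition, otherwise a linear scan is unavoidable.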

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check whether the join condition t[A] = s[B] holds.

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume that b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each t_r ∈ r do begin
    for each t_s ∈ s do begin
        if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
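A minimal runnable sketch of the nested loop join, assuming relations are lists of dictionaries with distinct attribute names in r and s. The function name and the sample relations are illustrative only; `theta` plays the role of the comparison Θ.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a THETA s.b by comparing every pair of tuples."""
    result = []
    for tr in r:                              # outer loop over r
        for ts in s:                          # inner loop over s
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # concatenation t_r || t_s
    return result


r = [{"A": 1}, {"A": 2}, {"A": 3}]
s = [{"B": 2}, {"B": 3}]
equi = nested_loop_join(r, s, "A", "B")                  # r.A = s.B
less = nested_loop_join(r, s, "A", "B", operator.lt)     # r.A < s.B
```

Because the condition is an arbitrary comparison function, the same loop handles any theta join, at the price of examining all n_r × n_s pairs.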

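Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines n_r × n_s pairs of tuples.

In the best case, both relations fit into main memory (the physical space), each block is read only once, and we need b_r + b_s block accesses.

If only one of the relations fits in main memory, the cost is still b_r + b_s block accesses: read the relation that fits once, keep it in memory as the inner relation, and scan the other relation once.

In the worst case, when the buffer can hold only one block of each relation, the inner relation is rescanned for every outer tuple, giving n_r × b_s + b_r block accesses.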

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple:

for each block B_r of r do begin
    for each block B_s of s do begin
        for each t_r ∈ B_r do begin
            for each t_s ∈ B_s do begin
                if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r: each block of the inner relation s is read once for every block of the outer relation r, and each block of r is read once.

How can we improve the block nested loop?
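The block nested loop cost can be checked with a small simulation. The sketch below assumes a relation is a list split into fixed-size blocks and counts one access per block read; `blocks`, `block_nested_loop_join` and the block size of 2 are assumptions made for the example.

```python
import operator


def blocks(relation, block_size):
    """Yield consecutive blocks (slices) of a relation."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]


def block_nested_loop_join(r, s, a, b, block_size, theta=operator.eq):
    """Return (result, block_reads) for a block nested loop join of r and s."""
    result, block_reads = [], 0
    for block_r in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append({**tr, **ts})
    return result, block_reads


r = [{"A": i} for i in range(6)]              # 6 tuples, so b_r = 3 blocks
s = [{"B": i} for i in (1, 3, 5, 7)]          # 4 tuples, so b_s = 2 blocks
joined, reads = block_nested_loop_join(r, s, "A", "B", block_size=2)
```

With b_r = 3 and b_s = 2 the counter ends at 3 × 2 + 3 = 9 block reads. A tuple-at-a-time nested loop with the same buffer would reread s for each of the 6 tuples of r, that is 6 × 2 + 3 = 15 reads.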

Query Optimization: JOIN Operation

r ⋈_{A=B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
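A sketch of the join through an access structure, assuming a Python dictionary stands in for an index that already exists on s.B. In a real system the index would not be built inside the join; it is built here only to keep the example self-contained. All names and data are hypothetical.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r and s on r.a = s.b by probing an index on s.b."""
    index_on_b = defaultdict(list)        # stand-in for a pre-existing index
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                                   # one record of r at a time
        for ts in index_on_b.get(tr[a], []):       # only the matching records
            result.append({**tr, **ts})
    return result


employees = [
    {"name": "Ann", "dno": 5},
    {"name": "Bob", "dno": 4},
    {"name": "Cy", "dno": 9},
]
departments = [
    {"dnumber": 4, "dname": "Admin"},
    {"dnumber": 5, "dname": "Research"},
]
matches = index_nested_loop_join(employees, departments, "dno", "dnumber")
```

Each record of r costs one index probe instead of a full scan of s, which is the saving the slide refers to.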

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, the join can be computed by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_r + b_s

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
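The merge step can be sketched with two list positions acting as the pointers. The example assumes both inputs are already sorted on the join attributes and that each group of equal values fits in memory, as the slide states; the function name and data are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations that are already sorted on r.a and s.b."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                            # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                            # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result


r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]
pairs = [(t["x"], t["y"]) for t in merge_join(r, s, "A", "B")]
```

Neither pointer ever moves backwards past a group boundary, which is why each block is read only once.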

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, which are combined to produce the result of the join operation.
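A sketch of the hash join described above, assuming an in-memory list of buckets in place of a hash file and Python's built-in `hash` as the hashing function. The bucket count and the names are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r.a = s.b by hashing both into shared buckets."""
    buckets = [([], []) for _ in range(n_buckets)]   # (r part, s part) per bucket
    for tr in r:                                     # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                     # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Matching tuples can only meet inside the same bucket.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result


r = [{"A": i} for i in range(10)]
s = [{"B": value, "tag": tag} for value, tag in ((3, "u"), (3, "v"), (8, "w"), (12, "x"))]
joined = sorted((t["A"], t["tag"]) for t in hash_join(r, s, "A", "B"))
```

The equality test inside a bucket is still needed, because different join values can hash to the same bucket.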

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
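Both duplicate-elimination strategies can be sketched in a few lines. The example assumes relations are lists of dictionaries; the `method` parameter and the sample data are hypothetical.

```python
def project(relation, attrs, method="hash"):
    """Project onto `attrs` and remove duplicates by sorting or by hashing."""
    rows = [tuple(t[attr] for attr in attrs) for t in relation]
    if method == "sort":
        rows.sort()
        # After sorting, duplicates are adjacent: keep a row only if it
        # differs from the row just before it.
        distinct = [row for k, row in enumerate(rows)
                    if k == 0 or row != rows[k - 1]]
    else:
        seen, distinct = set(), []
        for row in rows:
            if row not in seen:           # hash lookup detects a duplicate
                seen.add(row)
                distinct.append(row)
    return [dict(zip(attrs, row)) for row in distinct]


employees = [
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 4},
    {"ssn": 3, "dno": 5},
    {"ssn": 4, "dno": 1},
]
by_sort = project(employees, ["dno"], method="sort")
by_hash = project(employees, ["dno"], method="hash")
with_key = project(employees, ["ssn", "dno"])
```

Projecting on `dno` alone produces duplicates that must be removed, while the projection that includes the key `ssn` returns as many rows as the input.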

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection and difference operations.

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, over σ_{Plocation = 'California' ∧ Dnum = Dnumber ∧ MGRSSN = SSN}, over the Cartesian product Project × (Department × Employee).]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume that the Project, Department and Employee relations have tuple sizes of 100, 50 and 150 bytes and contain 100, 20 and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
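The arithmetic behind these figures is easy to reproduce. The snippet below uses the tuple counts and tuple sizes from the slide; the total size in bytes is derived here and is not a number given in the source.

```python
from math import prod

# relation name: (number of tuples, tuple size in bytes), as given on the slide
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = prod(count for count, _ in relations.values())
product_width = sum(width for _, width in relations.values())
product_bytes = product_tuples * product_width
```

Under these assumptions the intermediate relation holds 10,000,000 tuples of 300 bytes each, that is 3,000,000,000 bytes, before a single selection condition has been applied.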

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = 'California'}(Project)) ⋈_{Dnum = Dnumber} Department) ⋈_{MGRSSN = SSN} Employee)

Query Optimization: An Example

[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, over the join ⋈_{MGRSSN = SSN}, whose inputs are Employee and the join ⋈_{Dnum = Dnumber} of σ_{Plocation = 'California'}(Project) with Department.]


Optimization Process

Finally, consider the following query: get the supplier numbers of the suppliers who supply at least one part.

(SP JOIN P) [S]

However, we know that the part number in SP is a foreign key referencing P, so every tuple of SP joins with some tuple of P. Therefore the above query is semantically equivalent to

SP [S]

Optimization Process

An equivalence rule says that expressions in two different forms are equivalent. In other words, an expression in one form can be replaced by its equivalent expression.

Since the computational cost of equivalent expressions may vary, the optimizer can use equivalence rules to transform an expression into one that better satisfies the performance metrics.

Optimization Process

Rule 1: Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of selections):

σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Rule 2: The selection operation is commutative:

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Rule 3: In a sequence of projections, only the last (outermost) projection operation is needed (cascade of projections):

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

This assumes L1 ⊆ L2 ⊆ … ⊆ Ln, so that every projection in the sequence is well defined.

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

σ_θ(E1 × E2) = E1 ⋈_θ E2

This can be extended to:

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
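Equivalence rules of this kind can be spot-checked on small instances. The sketch below evaluates both sides of Rules 1, 2 and 4 on two tiny relations. The operator implementations and the predicates `theta1` and `theta2` are assumptions made for the example, and agreement on one instance illustrates a rule but does not prove it.

```python
from itertools import product


def select(pred, rel):
    return [t for t in rel if pred(t)]


def cross(e1, e2):
    return [{**t1, **t2} for t1, t2 in product(e1, e2)]


def theta_join(pred, e1, e2):
    return [{**t1, **t2} for t1 in e1 for t2 in e2 if pred({**t1, **t2})]


def theta1(t):
    return t["a"] > 1


def theta2(t):
    return t["a"] < t["b"]


def both(t):
    return theta1(t) and theta2(t)


E1 = [{"a": i} for i in range(5)]
E2 = [{"b": i} for i in range(5)]
E = cross(E1, E2)

rule1 = select(both, E) == select(theta1, select(theta2, E))
rule2 = select(theta1, select(theta2, E)) == select(theta2, select(theta1, E))
rule4 = select(theta2, cross(E1, E2)) == theta_join(theta2, E1, E2)
rule4_extended = select(theta1, theta_join(theta2, E1, E2)) == theta_join(both, E1, E2)
```

All four comparisons evaluate to True on this instance. The practical difference is cost: `theta_join` never materializes the full Cartesian product that `cross` builds.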

Optimization Process

Rule 5: The theta join operation is commutative:

E1 ⋈_θ E2 = E2 ⋈_θ E1

[Figure: the two equivalent join trees, ⋈_θ over (E1, E2) and ⋈_θ over (E2, E1).]

Rule 6: Natural join is associative:

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: the two equivalent join trees, (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3).]

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

s = 1 / |r(R)|

For an equality search on an attribute with i distinct values, the selectivity is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
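These estimates can be turned into a small helper. The sketch assumes, as the formulas do, that the values of a non-key attribute are uniformly distributed over its distinct values; the function names and the relation sizes are hypothetical.

```python
from fractions import Fraction


def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimated selectivity of an equality condition on one attribute."""
    if is_key:
        return Fraction(1, cardinality)       # at most one tuple can match
    return Fraction(1, distinct_values)       # uniform-distribution assumption


def estimated_matches(cardinality, distinct_values=None, is_key=False):
    """Estimated number of tuples that satisfy the equality condition."""
    return equality_selectivity(cardinality, distinct_values, is_key) * cardinality


# Hypothetical EMPLOYEE relation: 5000 tuples, 20 distinct department numbers.
on_key = estimated_matches(5000, is_key=True)
on_dno = estimated_matches(5000, distinct_values=20)
```

An equality search on the key is estimated to return 1 tuple, and one on the department number 5000 / 20 = 250 tuples. Exact fractions are used only to avoid rounding noise in the illustration.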

Optimization Process

Rule 8: The selection operation distributes over the theta join when all the attributes in the selection condition θ0 involve only the attributes of one of the relations being joined (E1 in this case):

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Figure: the query tree with σ_{θ0} above ⋈_θ(E1, E2), and the equivalent tree in which σ_{θ0} is pushed below the join and applied to E1.]

Optimization Process

Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are sets of attributes of E1 and E2 respectively:

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process

Rule 10: The set union and set intersection operations are commutative:

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Rule 11: The set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection and set difference operations.

For set difference:

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

For set union:

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

For set intersection:

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2

Optimization Process

Rule 13: The projection operation distributes over the set union operation:

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

For set intersection and set difference the two sides are not equal in general. Two different tuples can agree on the attributes in L, so only the following containments hold:

Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2)
Π_L(E1) − Π_L(E2) ⊆ Π_L(E1 − E2)
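Whether projection distributes over each set operation can be checked on a two-tuple instance. In the sketch below relations are Python sets of tuples and `L` lists the attribute positions to keep; the helper name and the data are assumptions made for the example.

```python
def project(rel, positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(t[i] for i in positions) for t in rel}


E1 = {(1, "a")}
E2 = {(1, "b")}
L = (0,)                                   # keep only the first attribute

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)

inter_left = project(E1 & E2, L)           # projection of the intersection
inter_right = project(E1, L) & project(E2, L)

diff_left = project(E1 - E2, L)            # projection of the difference
diff_right = project(E1, L) - project(E2, L)
```

On this instance the union form holds, while the intersection and difference forms give different results on the two sides (empty set versus {(1,)}). This is why an optimizer can safely push a projection through a union but not, in general, through an intersection or a difference.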

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:

the existence of indexes or other access paths (to reduce I/O cost),
the physical clustering of records (to reduce I/O cost),
…

Optimization Process

So, in short, after scanning and parsing:

the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:

Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on the heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

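Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation before the next operation starts.

Pipelined evaluation: several operations are combined, so that the output of one operation is passed directly to the next without storing an intermediate relation.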

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: whenever the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
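The two evaluation styles map naturally onto lists versus generators. The sketch below assumes r and s share a join attribute called `k` (a hypothetical name, since the slide does not name the common attribute) and that a1 and a2 come from r and s respectively. Duplicate elimination after the projection is left out for brevity.

```python
def join(r, s, attr):
    """Natural join on one shared attribute, producing tuples one at a time."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection that consumes and produces tuples one at a time."""
    for t in tuples:
        yield {attr: t[attr] for attr in attrs}


def materialized(r, s):
    temp = list(join(r, s, "k"))          # intermediate relation stored in full
    return list(project(temp, ["a1", "a2"])), len(temp)


def pipelined(r, s):
    # Each joined tuple flows straight into the projection; nothing is stored.
    return project(join(r, s, "k"), ["a1", "a2"])


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}, {"k": 2, "a2": "r"}]

rows, stored = materialized(r, s)
streamed = list(pipelined(r, s))
first = next(pipelined(r, s))             # available before the join finishes
```

Both plans return the same rows. The materialized plan stores three intermediate tuples first, while the pipelined plan can hand over its first result as soon as the join produces one.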

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

[Query tree: Π_{Sname} at the root, over σ_{bid = 100 ∧ rating > 5}, over the join ⋈_{Sid = Sid} of R and S.]

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

[Query tree: Π_{Sname} at the root, over the join ⋈_{Sid = Sid} of σ_{bid = 100}(R) and σ_{rating > 5}(S).]
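The effect of applying the selections before the join can be seen on made-up data. The instance below is purely illustrative: 10 tuples in S whose rating equals their Sid, and 30 tuples in R (three bids per Sid). None of these numbers come from the slides.

```python
# Hypothetical instances of R and S.
R = [{"sid": sid, "bid": bid} for sid in range(1, 11) for bid in (100, 101, 102)]
S = [{"sid": sid, "sname": f"s{sid}", "rating": sid} for sid in range(1, 11)]


def join(r, s):
    """Equi-join on sid (plain nested loop, enough for a sketch)."""
    return [{**tr, **ts} for tr in r for ts in s if tr["sid"] == ts["sid"]]


# Plan 1: join first, then select, then project.
joined_first = join(R, S)
plan1 = [t["sname"] for t in joined_first
         if t["bid"] == 100 and t["rating"] > 5]

# Plan 2: select on each relation first, then join, then project.
r_selected = [t for t in R if t["bid"] == 100]
s_selected = [t for t in S if t["rating"] > 5]
joined_after = join(r_selected, s_selected)
plan2 = [t["sname"] for t in joined_after]

pairs_plan1 = len(R) * len(S)
pairs_plan2 = len(r_selected) * len(s_selected)
```

Both plans return the same names, but on this instance the first plan compares 300 pairs and builds a 30-tuple intermediate result, while the second compares 50 pairs and builds 5 tuples.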

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, annotated for pipelined execution. Two operators in the first tree and one operator in the second tree are marked "On the fly".]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.



Optimization ProcessAn equivalence rule says that expressions in different

forms are equivalent In another words an expressionin one form can be replaced by its equivalentexpression

Since the computational cost of equivalent relationsmay vary the optimizer can use equivalence rules totransform expression while satisfying performancemetrics

Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization Process

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$\text{selectivity} = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For a search on equality on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

The selectivity on an attribute with $i$ distinct values is

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence, the number of tuples that satisfy an equality search is

$\dfrac{1}{i}\,|r(R)|$
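The two selectivity estimates above can be written down directly. This is a minimal sketch that assumes values are uniformly distributed over the distinct values; the function names and the sample numbers are illustrative assumptions.

```python
def equality_selectivity(distinct_values: int) -> float:
    """Estimated selectivity of an equality condition: s = 1 / i."""
    return 1 / distinct_values


def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Estimated number of qualifying tuples: |r(R)| / i."""
    return cardinality / distinct_values


# Non-key attribute: 5000 tuples spread over 20 distinct values.
print(estimated_matches(5000, 20))

# Key attribute: every value is distinct, so i = |r(R)| and one tuple qualifies.
print(estimated_matches(5000, 5000))
```

For a key attribute the number of distinct values equals the cardinality, so the general formula reduces to the key case $s = 1/|r(R)|$.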

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:

When all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case):

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

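Optimization Process

Rule 8:

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

[Figure: Rule 8 as query trees. In one tree $\sigma_{\theta_0}$ is applied above the join of $E_1$ and $E_2$; in the equivalent tree $\sigma_{\theta_0}$ is applied to $E_1$ before the join with $E_2$.]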
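The benefit of Rule 8 can be illustrated by counting join comparisons. The following is a sketch under stated assumptions: the relations `E1` and `E2`, their attributes, and the comparison counter are hypothetical, and the join is a plain nested loop.

```python
# Hypothetical relations; theta0 refers only to attributes of E1.
E1 = [{"a": i, "x": i % 3} for i in range(6)]
E2 = [{"b": i, "y": i * 10} for i in range(6)]


def theta_join(left, right, theta):
    """Nested-loop theta join that also counts tuple comparisons."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if theta(l, r):
                out.append({**l, **r})
    return out, comparisons


def theta(l, r):
    return l["a"] == r["b"]


def theta0(t):
    return t["x"] == 0


# Selection applied after the join.
joined, cost_late = theta_join(E1, E2, theta)
late = [t for t in joined if theta0(t)]

# Selection pushed down to E1 before the join (Rule 8).
early, cost_early = theta_join([t for t in E1 if theta0(t)], E2, theta)

print(late == early, cost_late, cost_early)
```

Both plans return the same tuples, but the pushed-down plan joins a smaller input and therefore performs fewer comparisons.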

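Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:

The join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$, respectively:

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$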

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Optimization Process

Rule 11: The set union and set intersection operations are associative.

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$

For set difference and set intersection it is enough to apply the selection to the first operand only. This does not hold for set union:

$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$
$\sigma_p(E_1 \cup E_2) \neq \sigma_p(E_1) \cup E_2$

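Optimization Process

Rule 13: The projection operation distributes over the set union operation.

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

The corresponding identities for set intersection and set difference do not hold in general, because two tuples that differ only outside $L$ become identical after projection. Only the following containments are guaranteed:

$\Pi_L(E_1 \cap E_2) \subseteq \Pi_L(E_1) \cap \Pi_L(E_2)$
$\Pi_L(E_1 - E_2) \supseteq \Pi_L(E_1) - \Pi_L(E_2)$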

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

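Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate result (relation).
Pipelined evaluation: several operations are combined, so that the output of one operation is passed directly to the next.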

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
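The difference between the two evaluation styles can be sketched with Python generators. This is an illustration only: the relations `r` and `s`, the join attribute `k`, and the helper names are assumptions, and the projection does not eliminate duplicates.

```python
# Hypothetical relations sharing the join attribute "k".
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 1, "a2": 20}, {"k": 3, "a2": 30}]


def join(left, right):
    """Natural join on "k"; yields one result tuple at a time."""
    for tr in left:
        for ts in right:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection without duplicate elimination; consumes tuples lazily."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))

print(materialized == pipelined)
```

Both styles compute the same answer; the pipelined version never holds the full intermediate relation, which is the saving the slide refers to.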

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further, assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

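$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

[Figure: query tree with R and S at the leaves, the join on Sid = Sid above them, then $\sigma_{bid = 100 \wedge rating > 5}$, and $\Pi_{Sname}$ at the root]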

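$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

[Figure: query tree with $\sigma_{bid = 100}$ applied to R and $\sigma_{rating > 5}$ applied to S, the join on Sid = Sid above them, and $\Pi_{Sname}$ at the root]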

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, with the operators that are evaluated in pipeline fashion annotated "On the fly"]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
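The gap between the two methods can be seen by counting how many records each one examines. The sketch below assumes an in-memory list of records sorted on a key in position 0; the sample keys and the probe counters are illustrative assumptions, and a real system would count block accesses rather than record probes.

```python
# Hypothetical file of (key, payload) records, ordered on the key.
records = [(101, "emp101"), (205, "emp205"), (318, "emp318"),
           (422, "emp422"), (577, "emp577")]


def linear_search(records, key):
    """Examine records one by one; returns (record or None, probes)."""
    probes = 0
    for rec in records:
        probes += 1
        if rec[0] == key:
            return rec, probes
    return None, probes


def binary_search(records, key):
    """Equality search on the ordering key; returns (record or None, probes)."""
    lo, hi, probes = 0, len(records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][0] == key:
            return records[mid], probes
        if records[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


print(linear_search(records, 577))
print(binary_search(records, 577))
```

Binary search needs the file to be ordered on the search key, which is exactly the condition stated on the slide.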

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent (or, for < and ≤, all the preceding) records in the file. (Note: in this case, the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
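The union-of-pointers idea can be sketched with dictionaries standing in for indexes. Everything concrete here is an assumption made for illustration: the `employees` relation, its attributes, and the use of list positions as record pointers.

```python
# Hypothetical relation; the list position plays the role of a record pointer.
employees = [
    {"name": "Ann", "dno": 5, "city": "Rolla"},
    {"name": "Bob", "dno": 4, "city": "Austin"},
    {"name": "Cy", "dno": 5, "city": "Austin"},
    {"name": "Di", "dno": 1, "city": "Boston"},
]


def build_index(relation, attr):
    """Map each attribute value to the set of record pointers holding it."""
    index = {}
    for rid, t in enumerate(relation):
        index.setdefault(t[attr], set()).add(rid)
    return index


dno_index = build_index(employees, "dno")
city_index = build_index(employees, "city")

# sigma_{dno = 5 OR city = 'Austin'}(employees):
# scan each index for its own condition, then take the union of the pointers.
rids = dno_index.get(5, set()) | city_index.get("Austin", set())
result = [employees[rid] for rid in sorted(rids)]

print([t["name"] for t in result])
```

Because the pointers are combined as a set, a tuple that satisfies both conditions is fetched only once.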

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further, assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
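A runnable version of the nested loop algorithm might look as follows. It is a sketch under assumptions: tuples are Python tuples, the join attributes are given as positions `a` and `b`, and the comparison Θ is passed in as a function (equality by default). The sample relations are hypothetical.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r[a] Θ s[b] using two nested loops."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)    # tr || ts: concatenate the tuples
    return result


r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

print(nested_loop_join(r, s, 0, 0))                    # equi-join
print(len(nested_loop_join(r, s, 0, 0, operator.lt)))  # join on r.A < s.B
```

Because the condition is an arbitrary function, the same loop handles equality and inequality joins alike.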

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is $n_r \times n_s$ tuple comparisons.

In the best-case scenario both relations fit into the physical space (main memory), and hence we need $b_s + b_r$ block accesses.

If only one of the relations fits in the physical space, the cost is still $b_s + b_r$ block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop in terms of the number of block accesses is $b_r \times b_s + b_r$.

How can we improve the block nested loop?
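The cost formula can be checked with a small simulation that counts block reads. This is a sketch with assumed details: blocks are modeled as fixed-size slices of a list, one buffer block is assumed per relation, the join is an equi-join on tuple positions, and the sample relations are hypothetical.

```python
def blocks(relation, tuples_per_block):
    """Split a relation into blocks of at most tuples_per_block tuples."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]


def block_nested_loop_join(r, s, a, b, tuples_per_block):
    """Equi-join on r[a] = s[b]; returns (result, number of block reads)."""
    result, block_reads = [], 0
    for Br in blocks(r, tuples_per_block):
        block_reads += 1                  # read one block of r
        for Bs in blocks(s, tuples_per_block):
            block_reads += 1              # read one block of s
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads


r = [(i,) for i in range(6)]              # 6 tuples -> b_r = 3 blocks
s = [(i, i * i) for i in (1, 3, 5, 7)]    # 4 tuples -> b_s = 2 blocks

result, reads = block_nested_loop_join(r, s, 0, 0, tuples_per_block=2)
print(result, reads)
```

With $b_r = 3$ and $b_s = 2$ the simulation performs $3 \times 2 + 3 = 9$ block reads, matching $b_r \times b_s + b_r$.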

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
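An index-based join can be sketched with a dictionary playing the role of the access structure on s.B. Note the assumption: the slide presumes the index already exists, whereas this illustration builds it first so that the example is self-contained. The relations are hypothetical.

```python
from collections import defaultdict


def build_index(s, b):
    """Stand-in for an existing index or hash key on attribute position b."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_nested_loop_join(r, index, a):
    """For each tr in r, probe the index instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result


r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

s_index = build_index(s, 0)
print(index_nested_loop_join(r, s_index, 0))
```

Each tuple of r triggers one index probe rather than a full scan of s, which is where the saving over the plain nested loop comes from.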

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
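The merge step can be sketched as follows. The code assumes both inputs are already sorted on their join attribute (positions `a` and `b`) and that each group of equal join values fits in memory, as stated above; the sample relations are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                        # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                        # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple in the group with every s tuple in the group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```

Neither pointer ever moves backward past a group, so each relation is scanned once.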

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
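A minimal in-memory sketch of the hash-join idea is given below. The number of buckets, the use of Python's built-in `hash`, and the sample relations are assumptions for illustration; a real system would hash to disk-resident buckets.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join on r[a] = s[b] by hashing both relations the same way."""
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]

    # A single pass through each relation distributes its records to buckets.
    for tr in r:
        r_buckets[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:
        s_buckets[hash(ts[b]) % n_buckets].append(ts)

    # Matching records can only meet inside buckets with the same number.
    result = []
    for rb, sb in zip(r_buckets, s_buckets):
        for tr in rb:
            for ts in sb:
                if tr[a] == ts[b]:        # different values may share a bucket
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (5, "y"), (5, "z"), (9, "w")]
print(sorted(hash_join(r, s, 0, 0)))
```

The equality test inside each bucket is still needed because distinct join values can hash to the same bucket (here 1, 5, and 9 all do).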

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

A join with a complex join condition (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
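Both duplicate-elimination strategies can be sketched in a few lines. The relation and the attribute positions below are hypothetical; a Python `set` stands in for the hash table.

```python
def project_distinct_sort(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    out = []
    for p in projected:
        if not out or out[-1] != p:       # duplicates are adjacent after sorting
            out.append(p)
    return out


def project_distinct_hash(relation, positions):
    """Project and use a hash table to detect tuples already produced."""
    seen, out = set(), []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            out.append(p)
    return out


employees = [("Ann", 5), ("Bob", 4), ("Cy", 5)]   # (name, department number)

print(project_distinct_sort(employees, [1]))
print(project_distinct_hash(employees, [1]))
```

The sort-based version returns the result in sorted order, while the hash-based version keeps the order of first appearance; both contain the same set of tuples.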

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
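The sort-then-scan approach can be sketched as a single merge pass that produces all three results at once. The sketch assumes the two relations are union-compatible and contain no duplicates; the sample data is hypothetical.

```python
def sort_based_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one merge pass over sorted inputs."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                   # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:                 # tuple only in s
            union.append(s[j])
            j += 1
        else:                             # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever remains in one relation has no partner in the other.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


r = [(3,), (1,), (2,)]
s = [(2,), (4,)]
print(sort_based_set_ops(r, s))
```

After sorting, each relation is read exactly once, which is the single scan mentioned on the slide.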

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

Query Optimization: An Example

[Figure: query tree with Project, Department, and Employee at the leaves, combined by two Cartesian products, then $\sigma_{Plocation = \text{'California'} \wedge Dnum = Dnumber \wedge MGRSSN = SSN}$, and $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$ at the root]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
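The arithmetic behind these figures is short enough to spell out. The snippet uses only the sizes given above; the variable names are illustrative.

```python
from math import prod

tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add.
product_tuples = prod(tuple_count.values())
product_tuple_bytes = sum(tuple_bytes.values())
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, total_bytes)
```

That is 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, or 3 × 10^9 bytes of intermediate data before the selection is applied.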

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

Query Optimization: An Example

[Figure: query tree with $\sigma_{Plocation = \text{'California'}}$ applied to Project, joined with Department on Dnum = Dnumber, then joined with Employee on MGRSSN = SSN, and $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$ at the root]

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

50

Optimization ProcessRule 1 Conjunctive selection operations

(cascade of selections) can be deconstructedinto a sequence of individual selections

σθ1andθ2(E) = σθ1(σθ2(E))

Database Systems

51

Optimization ProcessRule 2 Selection operation is commutative

σθ1(σθ2(E)) = σθ2(σθ1(E))

Database Systems

52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
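The merge step can be sketched as below, assuming both inputs are Python lists of tuples already sorted on the join attribute. The name `merge_join` and the positional attribute arguments `a` and `b` are assumptions made for the example.

```python
def merge_join(r, s, a, b):
    """Equi-join r and s, both already sorted on r[a] and s[b]."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the group of s-tuples sharing this key
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r-tuple carrying the key with that group.
            while i < len(r) and r[i][a] == key:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

Each pointer only moves forward, which is why every tuple is visited once.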

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
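A minimal in-memory sketch of the hash-join idea, assuming Python's built-in `hash` as the hashing function and a fixed, arbitrarily chosen number of buckets. The name `hash_join` and the parameter `n_buckets` are hypothetical.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r and s on r[a] = s[b] using one shared set of buckets."""
    # Each bucket holds the r-records and the s-records that hash to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    # Only records in the same bucket can match, so compare within buckets.
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (10, "c")]
s = [(2, "x"), (10, "y"), (18, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

The equality test inside a bucket is still needed, because different values (here 2, 10, and 18) can share a bucket.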

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s:

(r ⋈_{θ1} s) ∪ (r ⋈_{θ2} s) ∪ … ∪ (r ⋈_{θn} s)
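Both strategies for complex join conditions can be sketched in a few lines. This is an illustration under simplifying assumptions: conditions are Python functions of a pair (tr, ts), the simpler join is computed by a plain nested loop, and all function names are hypothetical.

```python
def theta_join(r, s, theta):
    """Plain nested loop join; works for any condition theta(tr, ts)."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]


def conjunctive_join(r, s, thetas):
    """Join on thetas[0] first, then keep tuples meeting the other conditions."""
    first, rest = thetas[0], thetas[1:]
    intermediate = [(tr, ts) for tr in r for ts in s if first(tr, ts)]
    return [tr + ts for tr, ts in intermediate if all(t(tr, ts) for t in rest)]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins; the set drops tuples found more than once."""
    result = set()
    for theta in thetas:
        result |= set(theta_join(r, s, theta))
    return result


r = [(1, 5), (2, 7), (3, 9)]
s = [(1, 6), (2, 7), (4, 9)]
same_first = lambda tr, ts: tr[0] == ts[0]
same_second = lambda tr, ts: tr[1] == ts[1]

both = conjunctive_join(r, s, [same_first, same_second])
either = disjunctive_join(r, s, [same_first, same_second])
```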

Query Optimization: Project Operation

A project operation Π_{attribute-list}(R) is straightforward to implement if the attribute list includes a key of relation R.
If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
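The two duplicate-elimination strategies can be sketched as follows, assuming records are Python dictionaries. The sample rows and the names `project_sort` and `project_hash` are made up for the illustration.

```python
def project_sort(rows, attrs):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(rows, attrs):
    """Project and keep a tuple only the first time its hash-set lookup misses."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


employees = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 4},
]
by_sort = project_sort(employees, ["Dno"])
by_hash = project_hash(employees, ["Dno"])
```

Projecting on `Ssn`, a key, never produces duplicates, so neither step would remove anything in that case.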

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.
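The sort-then-scan approach can be sketched as below. It assumes tuples are represented by comparable Python values (plain integers here, for brevity) and computes union, intersection, and difference in one merge-style pass; the function name `sorted_set_ops` is hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one scan over the sorted inputs."""
    r, s = sorted(set(r)), sorted(set(s))
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:            # only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:          # only in s
            union.append(s[j])
            j += 1
        else:                      # in both
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    # Whatever remains belongs to exactly one relation.
    union.extend(r[i:])
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff


union, inter, diff = sorted_set_ops([3, 1, 5, 7], [5, 2, 3, 8])
```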

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN} (Project × (Department × Employee)))

Query Optimization: An Example

Query tree for this expression:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
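The size estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in the example; the variable names are illustrative.

```python
# (tuple size in bytes, number of tuples) for each relation in the example
relations = {
    "Project": (100, 100),
    "Department": (50, 20),
    "Employee": (150, 5000),
}

result_tuples = 1
result_tuple_bytes = 0
for tuple_bytes, n_tuples in relations.values():
    result_tuples *= n_tuples            # cardinalities multiply
    result_tuple_bytes += tuple_bytes    # tuple widths add

total_bytes = result_tuples * result_tuple_bytes
```

The intermediate relation would therefore occupy about 3 × 10^9 bytes before any selection is applied.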

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = "California"} (Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))

Query Optimization: An Example

Query tree for the rewritten expression:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = "California"}
        Project
      Department
    Employee


Optimization Process

Rule 2: The selection operation is commutative.

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

Optimization Process

Rule 3: A sequence of projections is equivalent to the last (outermost) projection alone (cascade of projections).

Π_{L1}(Π_{L2}(… (Π_{Ln}(E)) …)) = Π_{L1}(E)

Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation.

σ_{θ}(E1 × E2) = E1 ⋈_{θ} E2

This can be extended to:

σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
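Rule 4 and its extension can be checked on a toy instance. The sketch below assumes relations are lists of tuples and conditions are Python functions over the concatenated tuple; `select`, `cartesian`, and `theta_join` are hypothetical helper names, not operators of any particular system.

```python
from itertools import product


def select(pred, rel):
    """σ_pred(rel)"""
    return [t for t in rel if pred(t)]


def cartesian(e1, e2):
    """E1 × E2"""
    return [t1 + t2 for t1, t2 in product(e1, e2)]


def theta_join(e1, e2, theta):
    """E1 ⋈_θ E2, testing θ on the concatenated tuple."""
    return [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1 + t2)]


e1 = [(1, 10), (2, 20), (3, 30)]
e2 = [(1, "x"), (3, "y"), (3, "z")]
theta = lambda t: t[0] == t[2]     # join condition
theta1 = lambda t: t[1] > 10       # extra selection condition

rule4_lhs = select(theta, cartesian(e1, e2))
rule4_rhs = theta_join(e1, e2, theta)
ext_lhs = select(theta1, theta_join(e1, e2, theta))
ext_rhs = theta_join(e1, e2, lambda t: theta1(t) and theta(t))
```

Both sides agree, but the left-hand side of Rule 4 first builds all nine tuples of the product, while the join form never stores them.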

Optimization Process

Rule 5: The theta join operation is commutative.

E1 ⋈_{θ} E2 = E2 ⋈_{θ} E1

[Figure: two equivalent query trees, a θ-join with children E1, E2 and a θ-join with children E2, E1]

Optimization Process

Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: two equivalent query trees, one joining (E1 ⋈ E2) with E3 and one joining E1 with (E2 ⋈ E3)]

Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, at most one tuple can match, so the selectivity is

s = 1 / |r(R)|

The selectivity of an equality condition on an attribute with i distinct values, assuming the values are uniformly distributed, is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence, the estimated number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
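These estimates are easy to compute. The sketch below uses exact fractions to avoid rounding noise; the function names and the sample statistics (5000 tuples, 20 distinct values) are assumptions chosen for illustration, and the uniform-distribution assumption is the one stated for the formula.

```python
from fractions import Fraction


def selectivity_key(cardinality):
    """Equality on a key attribute: at most one matching tuple."""
    return Fraction(1, cardinality)


def selectivity_nonkey(distinct_values):
    """Equality on an attribute with i distinct, uniformly distributed values."""
    return Fraction(1, distinct_values)


def estimated_matches(cardinality, selectivity):
    """Estimated number of tuples satisfying the condition: s × |r(R)|."""
    return selectivity * cardinality


n_tuples = 5000                       # |r(R)|, assumed
on_key = estimated_matches(n_tuples, selectivity_key(n_tuples))
on_nonkey = estimated_matches(n_tuples, selectivity_nonkey(20))
```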

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 belong to only one of the relations (E1 in this case):

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

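Optimization Process

Rule 8:

σ_{θ0}(E1 ⋈_{θ} E2) = (σ_{θ0}(E1)) ⋈_{θ} E2

[Figure: two equivalent query trees, one applying σ_{θ0} above the θ-join of E1 and E2, and one applying σ_{θ0} to E1 before the θ-join with E2]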
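The benefit of Rule 8 shows up in the size of the join input. A small sketch with made-up data, where θ0 refers only to attributes of E1; the helper name `theta_join` and the sample relations are hypothetical.

```python
def theta_join(e1, e2, theta):
    """E1 ⋈_θ E2 by nested loop."""
    return [t1 + t2 for t1 in e1 for t2 in e2 if theta(t1, t2)]


e1 = [(i, i % 4) for i in range(12)]           # (id, dno), 12 tuples
e2 = [(d, f"dept{d}") for d in range(4)]       # (dnumber, dname), 4 tuples
theta = lambda t1, t2: t1[1] == t2[0]          # join condition
theta0 = lambda t: t[0] < 3                    # uses only attributes of E1

# Selection applied after the join: all 12 tuples of E1 take part in the join.
late = [t for t in theta_join(e1, e2, theta) if theta0(t)]

# Selection pushed below the join: only 3 tuples of E1 take part.
filtered_e1 = [t for t in e1 if theta0(t)]
early = theta_join(filtered_e1, e2, theta)
```

Both plans return the same three tuples, but the second one joins 3 tuples of E1 instead of 12.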

Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively.

Π_{L1 ∪ L2}(E1 ⋈_{θ} E2) = (Π_{L1}(E1)) ⋈_{θ} (Π_{L2}(E2))

Optimization Process

Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process

Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

σ_{p}(E1 − E2) = σ_{p}(E1) − σ_{p}(E2)
σ_{p}(E1 − E2) = σ_{p}(E1) − E2

Optimization Process

Rule 12:

σ_{p}(E1 ∪ E2) = σ_{p}(E1) ∪ σ_{p}(E2)
σ_{p}(E1 ∪ E2) ≠ σ_{p}(E1) ∪ E2

Optimization Process

Rule 12:

σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ σ_{p}(E2)
σ_{p}(E1 ∩ E2) = σ_{p}(E1) ∩ E2
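The Rule 12 identities can be verified on small sets. In this sketch relations are Python sets of integers and the predicate p (even values) is an arbitrary choice for the example.

```python
def sel(rel):
    """σ_p(rel) with p: the value is even."""
    return {x for x in rel if x % 2 == 0}


e1 = {1, 2, 3, 4, 5, 6}
e2 = {4, 5, 6, 7, 8}

difference_forms = (sel(e1 - e2), sel(e1) - sel(e2), sel(e1) - e2)
union_forms = (sel(e1 | e2), sel(e1) | sel(e2), sel(e1) | e2)
intersection_forms = (sel(e1 & e2), sel(e1) & sel(e2), sel(e1) & e2)
```

All three difference forms and all three intersection forms coincide; for union, the third form differs because the unfiltered E2 contributes tuples that fail p.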

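Optimization Process

Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

Π_{L}(E1 − E2) = (Π_{L}(E1)) − (Π_{L}(E2))
Π_{L}(E1 ∪ E2) = Π_{L}(E1) ∪ Π_{L}(E2)
Π_{L}(E1 ∩ E2) = Π_{L}(E1) ∩ Π_{L}(E2)

Note: only the union form holds in general. The intersection and difference forms can fail when distinct tuples of E1 and E2 agree on the attributes in L.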

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
the existence of indexes or other access paths (to reduce I/O cost), and
the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
Pipelined evaluation: several operations are combined, so that tuples pass from one operation to the next without being stored.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
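Python generators give a compact picture of the two evaluation styles. This is only a sketch: the join is a naive nested loop on the first attribute, and the relation contents and function names are invented for the example.

```python
def join(r, s):
    """Natural join on the first attribute, producing one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts[1:]


def project(tuples, positions):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield tuple(t[p] for p in positions)


r = [(1, "a1-x"), (2, "a1-y")]
s = [(1, "a2-p"), (2, "a2-q"), (3, "a2-r")]

# Materialized: the whole join result is stored before the projection starts.
materialized = list(join(r, s))
result_materialized = list(project(materialized, (1, 2)))

# Pipelined: each joined tuple flows straight into the projection.
result_pipelined = list(project(join(r, s), (1, 2)))
```

The two results are identical; the pipelined version simply never holds the intermediate relation in a list.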

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid = Sid} S))

Query tree:

Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid = Sid} (σ_{rating > 5}(S)))

Query tree:

Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for this query, annotated for pipelined execution. In the first tree, two operators are marked "On the fly"; in the second tree, one operator is marked "On the fly".]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file, and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
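The two search methods can be contrasted in a short sketch. Records are Python dictionaries, the file is a list ordered on the key attribute `Ssn`, and the data and function names are made up for the illustration.

```python
def linear_search(records, attr, value):
    """Scan every record; works for any attribute and any file organization."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records, key_attr, value):
    """Equality search on a key attribute; records must be ordered on key_attr."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if records[mid][key_attr] == value:
            return records[mid]
        if records[mid][key_attr] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# File ordered on the key attribute Ssn.
employees = [
    {"Ssn": 101, "Dno": 5},
    {"Ssn": 205, "Dno": 4},
    {"Ssn": 317, "Dno": 5},
    {"Ssn": 422, "Dno": 1},
]
in_dept_5 = linear_search(employees, "Dno", 5)
found = binary_search(employees, "Ssn", 317)
missing = binary_search(employees, "Ssn", 300)
```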

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
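Both selection strategies can be sketched with dictionaries playing the role of indexes. Record pointers are simply list positions here, and the relation contents, attribute names, and function names are assumptions for the example.

```python
from collections import defaultdict

employees = [
    {"Ssn": 1, "Dno": 5, "Salary": 30000},
    {"Ssn": 2, "Dno": 5, "Salary": 40000},
    {"Ssn": 3, "Dno": 4, "Salary": 40000},
    {"Ssn": 4, "Dno": 1, "Salary": 55000},
]


def build_index(records, attr):
    """Map each attribute value to the positions (record pointers) holding it."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[attr]].add(pointer)
    return index


dno_index = build_index(employees, "Dno")
salary_index = build_index(employees, "Salary")


def conjunctive_select(dno, salary):
    """Dno = dno AND Salary = salary: use one access path, then test the rest."""
    candidates = dno_index.get(dno, set())
    return sorted(p for p in candidates if employees[p]["Salary"] == salary)


def disjunctive_select(dno, salary):
    """Dno = dno OR Salary = salary: union of the pointers from each index."""
    return sorted(dno_index.get(dno, set()) | salary_index.get(salary, set()))


both = conjunctive_select(5, 40000)
either = disjunctive_select(5, 40000)
```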

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end
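A direct Python rendering of the nested loop algorithm is given below as a sketch. The comparison Θ is passed in as a function, so any join condition can be used; the function name, the positional attribute arguments, and the comparison counter are illustrative additions.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r[a] Θ s[b]; also count the tuple pairs examined."""
    result = []
    comparisons = 0
    for tr in r:                       # outer loop
        for ts in s:                   # inner loop
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts
    return result, comparisons


r = [(1, "a"), (2, "b"), (3, "c")]     # nr = 3
s = [(2, "x"), (3, "y")]               # ns = 2
equi, n_eq = nested_loop_join(r, s, 0, 0)
less, n_lt = nested_loop_join(r, s, 0, 0, theta=operator.lt)
```

Whatever the condition, the loop examines all nr × ns = 6 pairs.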



52

Optimization ProcessRule 3 A sequence of projections is the

same as the last projection operation(cascade of projections)

ΠL1(ΠL2(hellip (ΠLn(E))hellip)) = ΠL1(E)

Database Systems

53

Optimization ProcessRule 4 A combination of selection and

Cartesian product operations isequivalent to theta join operation

This can be extended toσθ (E1 X E2) = E1 θ E2

σθ1 (E1 θ2 E2) = E1 θ1andθ2 E2

Database Systems

54

Optimization ProcessRule 5 Theta join operation is

commutative

E1 θ E2 = E2 θ E1 θ

E1 E2

θ

E2 E1

Database Systems

55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
• Materialized evaluation: each operation generates its result as an intermediate relation, which the next operation then reads.
• Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join, materialize the result, and then apply the projection.

Alternatively, as soon as the join generates a tuple, that tuple is passed directly to the projection for processing.
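The two evaluation styles can be sketched with Python generators. This is a minimal illustration, not how any particular engine is built; the sample relations, the shared attribute name "k", and the helper names join and project are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]

def join(r: Iterable[Row], s: List[Row], attr: str) -> Iterator[Row]:
    """Yield joined tuples one at a time (equi-join on a shared attribute)."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}

def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Keep only the requested attributes of each incoming tuple."""
    for row in rows:
        yield {a: row[a] for a in attrs}

# Hypothetical sample relations r(k, a1) and s(k, a2).
r = [{"k": 1, "a1": 10}, {"k": 2, "a1": 20}]
s = [{"k": 1, "a2": 7}, {"k": 3, "a2": 9}]

# Materialized evaluation: the whole join result is stored before projecting.
temp = list(join(r, s, "k"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both variants produce the same tuples; the pipelined one never holds the full join result in memory.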

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

$\Pi_{Sname}(\sigma_{bid = 100 \wedge rating > 5}(R \bowtie_{Sid = Sid} S))$

Query tree (root first): $\Pi_{Sname}$ → $\sigma_{bid = 100 \wedge rating > 5}$ → $\bowtie_{Sid = Sid}$ → leaves R and S

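$\Pi_{Sname}((\sigma_{bid = 100}(R)) \bowtie_{Sid = Sid} (\sigma_{rating > 5}(S)))$

Query tree (root first): $\Pi_{Sname}$ → $\bowtie_{Sid = Sid}$ → $\sigma_{bid = 100}$ over leaf R, and $\sigma_{rating > 5}$ over leaf S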
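As a rough check that the two plans are equivalent, and that pushing the selections down shrinks the intermediate result, the following sketch evaluates both on a few made-up tuples. The sample data and the helper join_on_sid are assumptions for illustration only.

```python
# Hypothetical sample data: R(Sid, bid, day, rname) and S(Sid, Sname, rating, age).
R = [(1, 100, "d1", "x"), (2, 100, "d2", "y"), (3, 101, "d3", "z")]
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]

def join_on_sid(r_rows, s_rows):
    """Nested loop equi-join on Sid (position 0 in both relations)."""
    return [r + s for r in r_rows for s in s_rows if r[0] == s[0]]

# In a joined tuple: bid is position 1, Sname is position 5, rating is position 6.

# Plan 1: join first, then select, then project.
joined = join_on_sid(R, S)
plan1 = [t[5] for t in joined if t[1] == 100 and t[6] > 5]

# Plan 2: push both selections below the join, then project.
r_sel = [r for r in R if r[1] == 100]
s_sel = [s for s in S if s[2] > 5]
pushed = join_on_sid(r_sel, s_sel)
plan2 = [t[5] for t in pushed]
```

On this data both plans return the same name, but the join in the second plan produces one intermediate tuple instead of three.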

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed directly to another operation.
In this case, articulate how the previous query is going to be executed.

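[Figure: the two query trees above, annotated for pipelined execution. In the first tree ($\Pi_{Sname}$ over $\sigma_{bid = 100 \wedge rating > 5}$ over $R \bowtie_{Sid = Sid} S$) two operators are marked "on the fly". In the second tree ($\Pi_{Sname}$ over $\sigma_{bid = 100}(R) \bowtie_{Sid = Sid} \sigma_{rating > 5}(S)$) one operator is marked "on the fly".]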

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation in the plan.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, and the "reduction factor" of each operation need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

• Linear search: Retrieve every record in the file and test whether its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
• Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
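The two search methods can be sketched as follows. This is a simplified in-memory illustration: the record layout (key, payload) and the function names are assumptions, and a real system would probe disk blocks rather than list positions.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (key, payload); hypothetical record layout

def linear_search(file: List[Record], key: int) -> List[Record]:
    """Scan every record; works on unordered data."""
    return [rec for rec in file if rec[0] == key]

def binary_search(sorted_file: List[Record], key: int) -> Optional[Record]:
    """Equality search on the key attribute on which the file is ordered."""
    lo, hi = 0, len(sorted_file) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        mid_key = sorted_file[mid][0]
        if mid_key == key:
            return sorted_file[mid]
        if mid_key < key:
            lo = mid + 1   # the match, if any, lies in the upper half
        else:
            hi = mid - 1   # the match, if any, lies in the lower half
    return None

unordered = [(5, "e"), (1, "a"), (3, "c")]
ordered = sorted(unordered)
```

Linear search inspects every record, while binary search inspects a number of records logarithmic in the file size, which is why the ordering of the file matters.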

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent (or, for < and ≤, all the preceding) records in the file. (Note: in this case the data is also sorted.)

$\sigma_{DNUMBER > 5}(DEPARTMENT)$
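A small sketch of the range selection above, under the assumption that the file is sorted on DNUMBER and that a sorted list of keys can stand in for the primary index. The sample rows and the function name are invented for the example.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, physically sorted on the key DNUMBER.
department = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
dnumber_index = [row[0] for row in department]  # stand-in for a primary index

def select_dnumber_greater_than(bound: int):
    """Locate the boundary position via the index, then read all subsequent records."""
    start = bisect_right(dnumber_index, bound)
    return department[start:]
```

Because the file is sorted on the key, everything after the boundary position qualifies, so no record before it needs to be read.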

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (the data is clustered on that attribute).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection has the form
$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection has the form
$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for every attribute involved in the disjunctive selection, each index is scanned for pointers to the tuples that satisfy its individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions lacks an access path, we have to perform a linear scan of the relation.
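The conjunctive and disjunctive strategies can be sketched with dictionary-based indexes that map an attribute value to record positions (the "pointers"). The employee rows, the index layout, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE rows; list positions play the role of record pointers.
employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 5, "salary": 55000},
    {"ssn": 3, "dno": 4, "salary": 60000},
    {"ssn": 4, "dno": 1, "salary": 25000},
]

def build_index(rows, attr):
    """Map each attribute value to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[attr]].append(pos)
    return index

dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

def conjunctive(dno, min_salary):
    """dno = x AND salary > y: use the dno index, then test the remaining condition."""
    candidates = dno_index.get(dno, [])
    return [p for p in candidates if employees[p]["salary"] > min_salary]

def disjunctive(dno, salary):
    """dno = x OR salary = y: take the union of the pointers from both indexes."""
    pointers = set(dno_index.get(dno, [])) | set(salary_index.get(salary, []))
    return sorted(pointers)
```

The conjunctive case needs only one usable index, whereas the disjunctive case needs an index for every condition; otherwise the union would miss tuples and a full scan is required.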

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \, \Theta \, s.B} s$

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations, and that $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join:

for each tuple tr ∈ r do begin
    for each tuple ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
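A direct Python transcription of the pseudocode above, given as a sketch: tuples are plain Python tuples, a and b are the (assumed) positions of the join attributes, and theta is any comparison function, defaulting to equality.

```python
import operator
from typing import Callable, List, Tuple

def nested_loop_join(
    r: List[Tuple],
    s: List[Tuple],
    a: int,
    b: int,
    theta: Callable[[object, object], bool] = operator.eq,
) -> List[Tuple]:
    """Compare every tuple of r with every tuple of s; keep the pairs satisfying theta."""
    result = []
    for tr in r:                       # outer loop over r
        for ts in s:                   # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tr || ts: concatenation of the two tuples
    return result

r = [(1, "a"), (2, "b")]
s = [(2, "x"), (3, "y")]
equi = nested_loop_join(r, s, 0, 0)
less_than = nested_loop_join(r, s, 0, 0, operator.lt)
```

Because the condition is an arbitrary function, the same loop handles equi-joins and general theta joins alike.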

Query Optimization: JOIN Operation (nested loop)

• The nested loop algorithm examines $n_r \times n_s$ pairs of tuples.
• In the best case, both relations fit into the available physical memory, and hence we need only $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (nested loop)

If only one of the relations fits in the available physical memory, keep it in memory as the inner relation; the cost is again $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.

Query Optimization: JOIN Operation (block nested loop)

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr ∈ Br do begin
            for each tuple ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve block nested loop?
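The block access count can be checked with a small simulation. In this sketch each relation is a list of blocks and each block is a list of tuples; the counter, the sample blocks, and the equality join condition are assumptions for the example.

```python
from typing import List, Tuple

Block = List[Tuple]

def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block], a: int, b: int):
    """Equi-join r and s block by block; also count simulated block reads."""
    reads = 0
    result = []
    for block_r in r_blocks:
        reads += 1                      # read one block of the outer relation
        for block_s in s_blocks:
            reads += 1                  # read one block of the inner relation
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, reads

r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]   # b_r = 3
s_blocks = [[(2,), (3,)], [(5,), (6,)]]           # b_s = 2
joined, reads = block_nested_loop_join(r_blocks, s_blocks, 0, 0)
_, reads_swapped = block_nested_loop_join(s_blocks, r_blocks, 0, 0)
```

With r as the outer relation the simulation performs 3 × 2 + 3 = 9 block reads; with s as the outer relation it performs 2 × 3 + 2 = 8. Since the formula is not symmetric, using the relation with fewer blocks as the outer one is one simple improvement.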

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

$r \bowtie_{A = B} s$
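A sketch of the index-based join, assuming an in-memory dictionary can stand in for the index on attribute B of s. In a real system the index would already exist; here it is built inside the function only so that the example is self-contained.

```python
from collections import defaultdict
from typing import List, Tuple

def index_join(r: List[Tuple], s: List[Tuple], a: int, b: int) -> List[Tuple]:
    """For each tuple of r, look up the matching tuples of s through an index on s.B."""
    index_on_b = defaultdict(list)      # hypothetical stand-in for an existing index
    for ts in s:
        index_on_b[ts[b]].append(ts)
    result = []
    for tr in r:                        # one pass over r
        for ts in index_on_b.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (2, "y"), (3, "z")]
matches = index_join(r, s, 0, 0)
```

Each tuple of r triggers one lookup instead of a scan of s, which is the saving over the plain nested loop.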

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, the join can be computed by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
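The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes (positions a and b are assumptions of the example). Groups of equal values in s are buffered, which mirrors the requirement that such a group fits in memory.

```python
from typing import List, Tuple

def merge_join(r: List[Tuple], s: List[Tuple], a: int, b: int) -> List[Tuple]:
    """Equi-join two relations that are sorted on r[a] and s[b] respectively."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            j_end = j                   # find the group of s tuples with this value
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a finished group, so each input is scanned once.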

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes its records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, which are combined to produce the result of the join operation.
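A minimal in-memory sketch of this idea, assuming Python's built-in hash as the common hashing function and a small fixed number of buckets (both are choices made for the example, not part of the slides).

```python
from typing import List, Tuple

def hash_join(r: List[Tuple], s: List[Tuple], a: int, b: int, n_buckets: int = 4) -> List[Tuple]:
    """Hash both relations on their join attributes, then match within each bucket."""
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        r_buckets[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:                                   # single pass over s
        s_buckets[hash(ts[b]) % n_buckets].append(ts)
    result = []
    for bucket_r, bucket_s in zip(r_buckets, s_buckets):
        # Matching tuples can only meet inside the same bucket, but a bucket may
        # also hold non-matching values, so the join condition is still checked.
        for tr in bucket_r:
            for ts in bucket_s:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (10, "z")]
hashed = hash_join(r, s, 0, 0)
```

The output order depends on the bucket layout, so callers that need a particular order should sort the result.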

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable to joins on the individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle attribute\ list \rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.
If <attribute list> does not include a key, we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
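Both duplicate elimination strategies can be sketched in a few lines. The row format (dictionaries) and the function names are assumptions for the example.

```python
from typing import Dict, List, Tuple

def project_sort(rows: List[Dict], cols: List[str]) -> List[Tuple]:
    """Project, sort, then drop duplicates, which are now adjacent."""
    projected = sorted(tuple(row[c] for c in cols) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(rows: List[Dict], cols: List[str]) -> List[Tuple]:
    """Project and drop duplicates by remembering seen tuples in a hash set."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

rows = [{"dno": 5, "name": "a"}, {"dno": 4, "name": "b"}, {"dno": 5, "name": "c"}]
```

The sort-based version returns its output in sorted order, while the hash-based version keeps the order of first appearance.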

Query Optimization: Set Operations

• Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.
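The sort-based approach can be sketched as a single merge pass that produces union, intersection, and difference together. Relations are modelled as lists of comparable tuples or values; the function name is an assumption for the example.

```python
from typing import List, Tuple

def sorted_set_ops(r: List, s: List) -> Tuple[List, List, List]:
    """Return (r ∪ s, r ∩ s, r − s) using one scan over each sorted input."""
    r, s = sorted(set(r)), sorted(set(s))
    i = j = 0
    union, intersection, difference = [], [], []
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # only in s
            union.append(s[j])
            j += 1
        else:                           # in both
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # At most one input still has tuples left.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference

u, n, d = sorted_set_ops([3, 1, 2, 5], [2, 5, 7])
```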

Questions
• Devise algorithms to perform the variations of the outer join operation.
• Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(\sigma_{Plocation = 'California' \wedge Dnum = Dnumber \wedge MGRSSN = SSN}(Project \times (Department \times Employee)))$

Query Optimization: An Example

Query tree (root first): $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$ → $\sigma_{Plocation = 'California' \wedge Dnum = Dnumber \wedge MGRSSN = SSN}$ → × with children Project and (× with children Department and Employee)

Query Optimization: An Example

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
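The figures follow from multiplying the cardinalities and adding the tuple sizes. A short check, using only the numbers stated above (the dictionary layout is just a convenience for the example):

```python
# (number of tuples, tuple size in bytes) for each relation, as stated in the text.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_bytes = 0
for n_tuples, tuple_bytes in relations.values():
    product_tuples *= n_tuples          # cardinalities multiply under Cartesian product
    product_tuple_bytes += tuple_bytes  # tuple widths add up

total_bytes = product_tuples * product_tuple_bytes
```

This gives 10,000,000 tuples of 300 bytes each, i.e. 3,000,000,000 bytes for the intermediate relation, before the selection discards almost all of it.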

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

$\Pi_{Pnumber, Dnum, Lname, Address, Bdate}(((\sigma_{Plocation = 'California'}(Project)) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee)$

Query Optimization: An Example

Query tree (root first): $\Pi_{Pnumber, Dnum, Lname, Address, Bdate}$ → $\bowtie_{MGRSSN = SSN}$ with children ($\bowtie_{Dnum = Dnumber}$ with children $\sigma_{Plocation = 'California'}$ over Project, and Department) and Employee


Optimization Process

Rule 4: A combination of selection and Cartesian product operations is equivalent to a theta join operation:

$\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$

This can be extended to

$\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

Optimization Process

Rule 5: The theta join operation is commutative:

$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$

Optimization Process

Rule 6: Natural join is associative:

$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$

Optimization Process

Rule 7: Theta join is associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

$selectivity = \dfrac{\text{number of tuples satisfying the condition}}{|r(R)|}$

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For an equality search on a key attribute, the selectivity is

$s = \dfrac{1}{|r(R)|}$

For an attribute with i distinct values (assuming the values are evenly distributed), the selectivity is

$s = \dfrac{|r(R)| / i}{|r(R)|} = \dfrac{1}{i}$

Hence the number of tuples that satisfy an equality search is

$\dfrac{1}{i} \, |r(R)|$
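The estimates above can be written as two small helper functions. This is only a sketch under the even-distribution assumption; the function names and the sample numbers are invented for illustration.

```python
def equality_selectivity(distinct_values: int) -> float:
    """Selectivity of an equality search on an attribute with i distinct values: 1 / i."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return 1 / distinct_values

def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Estimated number of tuples satisfying the equality search: |r(R)| / i."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return cardinality / distinct_values

# A key attribute has as many distinct values as the relation has tuples,
# so its selectivity is 1 / |r(R)| and one tuple is expected to match.
key_selectivity = equality_selectivity(5000)
key_matches = estimated_matches(5000, 5000)

# A non-key attribute with 50 distinct values in a relation of 5000 tuples.
non_key_matches = estimated_matches(5000, 50)
```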

Optimization Process

Rule 8: The selection operation distributes over the theta join under the following condition:
• all attributes in the selection condition $\theta_0$ involve only the attributes of one relation ($E_1$ in this case).

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

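Optimization Process

Rule 8:

$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

As query trees: $\sigma_{\theta_0}$ over ($\bowtie_{\theta}$ with children $E_1$ and $E_2$) becomes $\bowtie_{\theta}$ with children ($\sigma_{\theta_0}$ over $E_1$) and $E_2$.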

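Optimization Process

Rule 9: The projection operation distributes over the theta join under the following condition:
• the join condition $\theta$ involves only attributes in $L_1 \cup L_2$, where $L_1$ and $L_2$ are attributes of $E_1$ and $E_2$, respectively.

$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$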

Optimization Process

Rule 10: The set union and set intersection operations are commutative:

$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$

Note: set difference is not commutative.

Optimization Process

Rule 11: The set union and set intersection operations are associative:

$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$

Optimization Process

Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - \sigma_p(E_2)$
$\sigma_p(E_1 - E_2) = \sigma_p(E_1) - E_2$

Set union:
$\sigma_p(E_1 \cup E_2) = \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cup E_2) \ne \sigma_p(E_1) \cup E_2$

Set intersection:
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) = \sigma_p(E_1) \cap E_2$

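Optimization Process

Rule 13: The projection operation distributes over the set union operation:

$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$

The union identity holds for arbitrary relations. For set difference and intersection, equality is not guaranteed, because two distinct tuples can agree on L; in general only the following containments hold:

$\Pi_L(E_1 - E_2) \supseteq \Pi_L(E_1) - \Pi_L(E_2)$
$\Pi_L(E_1 \cap E_2) \subseteq \Pi_L(E_1) \cap \Pi_L(E_2)$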



Optimization Process
Rule 5: The theta join operation is commutative.

E1 ⋈θ E2 = E2 ⋈θ E1

[Figure: query trees for E1 ⋈θ E2 and E2 ⋈θ E1]

Optimization Process
Rule 6: Natural join is associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

[Figure: query trees for (E1 ⋈ E2) ⋈ E3 and E1 ⋈ (E2 ⋈ E3)]

Optimization Process
Rule 7: Theta join is associative in the following manner:

(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3.

Definition
Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of the intermediate relation and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so estimated selectivities, kept as part of the statistical data, are used to aid query optimization.

For an equality search on a key attribute, at most one tuple qualifies, so the selectivity is

s = 1 / |r(R)|

The selectivity of an equality condition on an attribute with i distinct values, assuming the values are uniformly distributed, is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence the estimated number of tuples that satisfy an equality search is

(1 / i) · |r(R)|
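The selectivity estimates above can be sketched in Python. This is a minimal illustration, not part of the slides: the function names are hypothetical, and the non-key case assumes uniformly distributed attribute values.

```python
def equality_selectivity(cardinality, distinct_values=None, is_key=False):
    """Estimate the selectivity of an equality condition on one attribute.

    Assumes attribute values are uniformly distributed (an assumption of
    this sketch, used when no better statistics are available).
    """
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    if is_key:
        # At most one tuple can match an equality search on a key.
        return 1 / cardinality
    if not distinct_values:
        raise ValueError("distinct_values is required for a non-key attribute")
    return 1 / distinct_values


def estimated_matches(cardinality, selectivity):
    """Estimated size of the selection result: s * |r(R)|."""
    return selectivity * cardinality


key_s = equality_selectivity(5000, is_key=True)
dno_s = equality_selectivity(5000, distinct_values=20)
print(key_s, dno_s, estimated_matches(5000, dno_s))
```

The optimizer would use such estimates to predict the size of an intermediate relation before choosing among plans.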

Optimization Process
Rule 8: The selection operation distributes over the theta join when all attributes in the selection condition θ0 belong to only one of the relations being joined (E1 in this case):

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

Optimization Process
Rule 8

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

[Figure: Rule 8 as query trees: σθ0 applied above the join E1 ⋈θ E2, and the equivalent tree with σθ0 applied to E1 before the join]
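Rule 8 can be checked on a small instance. The sketch below is illustrative only: relations are modeled as lists of dictionaries, and the helper names (select, theta_join, as_set) and the sample data are assumptions made for the example.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def theta_join(left, right, theta):
    """Theta join; attribute names of the two inputs are assumed disjoint."""
    return [{**l, **r} for l in left for r in right if theta(l, r)]


def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}


e1 = [{"a": 1, "x": 10}, {"a": 2, "x": 20}, {"a": 3, "x": 30}]
e2 = [{"b": 1, "y": "p"}, {"b": 3, "y": "q"}, {"b": 3, "y": "r"}]

theta = lambda l, r: l["a"] == r["b"]   # join condition
theta0 = lambda t: t["x"] >= 20          # uses only attributes of e1

late = select(theta0, theta_join(e1, e2, theta))    # select after the join
early = theta_join(select(theta0, e1), e2, theta)   # select pushed below the join

print(as_set(late) == as_set(early))
```

Both orders produce the same tuples, but the second one feeds fewer tuples of E1 into the join.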

Optimization Process
Rule 9: The projection operation distributes over the theta join when the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2 respectively:

ΠL1∪L2 (E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

Set union:
σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

Set intersection:
σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2

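Optimization Process
Rule 13: The projection operation distributes over the set union operation:

ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)

The analogous identities for set intersection and set difference do not hold in general, because two different tuples can agree on the attributes in L. Only the following containments are guaranteed:

ΠL (E1 ∩ E2) ⊆ ΠL (E1) ∩ ΠL (E2)
ΠL (E1 − E2) ⊇ ΠL (E1) − ΠL (E2)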
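A quick check on small relations shows how projection interacts with each set operation. The sketch is illustrative: relations are Python sets of tuples, attributes are addressed by position, and the sample data is made up. In this instance the identity holds for union but fails for intersection and difference, because two different tuples share the same value on the projected attribute.

```python
def project(rel, positions):
    """Projection with duplicate elimination; rel is a set of tuples."""
    return {tuple(row[i] for i in positions) for row in rel}


e1 = {(1, "a"), (2, "b")}
e2 = {(1, "c"), (2, "b")}
L = (0,)  # project on the first attribute only

union_ok = project(e1 | e2, L) == project(e1, L) | project(e2, L)
inter_ok = project(e1 & e2, L) == project(e1, L) & project(e2, L)
diff_ok = project(e1 - e2, L) == project(e1, L) - project(e2, L)

print(union_ok, inter_ok, diff_ok)
```

An optimizer can therefore push a projection below a union freely, but must treat intersection and difference with care.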

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
  • existence of indexes or other access paths, to reduce I/O cost
  • physical clustering of records, to reduce I/O cost
  • …

Optimization Process
So, in short, after scanning and parsing:
  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

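Definition
Materialized evaluation: The result of each operation is generated and stored as an intermediate relation, which the next operation then reads.
Pipelined evaluation: Several operations are combined so that each tuple produced by one operation is passed directly to the next, without storing the intermediate relation.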

Assume we want to perform

Πa1, a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can pipeline the two operations: when the join operation generates a tuple, it is passed directly to the project operation for processing.
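The difference between the two evaluation styles can be sketched with Python lists and generators. This is only an illustration: the relations are small in-memory lists of dictionaries, the join is a plain nested loop on a shared attribute, and all names are hypothetical.

```python
def join_materialized(r, s, key):
    """Materialized evaluation: build the whole join result as a list."""
    return [{**tr, **ts} for tr in r for ts in s if tr[key] == ts[key]]


def join_pipelined(r, s, key):
    """Pipelined evaluation: hand over each joined tuple as soon as it exists."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection that consumes its input one tuple at a time (duplicates kept)."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 2, "a2": 30}]

materialized = list(project(join_materialized(r, s, "k"), ["a1", "a2"]))
pipelined = list(project(join_pipelined(r, s, "k"), ["a1", "a2"]))
print(materialized == pipelined)
```

Both variants return the same tuples; the pipelined one never holds the full join result in memory.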

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

ΠSname (σbid = 100 ∧ rating > 5 (R ⋈Sid=Sid S))

[Figure: query tree with ΠSname at the root, σbid = 100 ∧ rating > 5 below it, and the join R ⋈Sid=Sid S at the bottom]

ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))

[Figure: query tree with ΠSname at the root, the join on Sid = Sid below it, σbid = 100 applied to R, and σrating > 5 applied to S]
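The two plans can be compared on a toy instance. The sketch below is an assumption-laden illustration: the sample tuples are invented, tuples follow the attribute order of the S and R schemas, and the size of the join result is used as a rough stand-in for cost.

```python
S = [  # (Sid, Sname, rating, age)
    (1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0),
]
R = [  # (Sid, bid, day, rname)
    (1, 100, "d1", "r1"), (1, 101, "d2", "r2"),
    (2, 100, "d3", "r3"), (3, 102, "d4", "r4"),
]


def plan_select_late(r_rel, s_rel):
    """Join first, then apply both selections, then project Sname."""
    joined = [(r, s) for r in r_rel for s in s_rel if r[0] == s[0]]
    kept = [(r, s) for r, s in joined if r[1] == 100 and s[2] > 5]
    return {s[1] for _, s in kept}, len(joined)


def plan_select_early(r_rel, s_rel):
    """Apply each selection to its own relation, then join, then project."""
    r_sel = [r for r in r_rel if r[1] == 100]
    s_sel = [s for s in s_rel if s[2] > 5]
    joined = [(r, s) for r in r_sel for s in s_sel if r[0] == s[0]]
    return {s[1] for _, s in joined}, len(joined)


print(plan_select_late(R, S))
print(plan_select_early(R, S))
```

Both plans return the same names, but the second builds a smaller join result, which is the point of performing selections early.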

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, with operators annotated "On the fly" to mark those evaluated in pipelined fashion]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
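The two basic search methods can be sketched as follows. This is an in-memory illustration only: a "file" is a Python list of tuples, the ordered file is accompanied by a parallel list of its keys, and the function names are hypothetical.

```python
from bisect import bisect_left


def linear_search(records, key_of, value):
    """Scan every record; works on files with no ordering or index."""
    return [rec for rec in records if key_of(rec) == value]


def binary_search(sorted_keys, records, value):
    """Equality search on a key attribute on which the file is ordered.

    sorted_keys[i] must be the key of records[i].
    """
    i = bisect_left(sorted_keys, value)
    if i < len(sorted_keys) and sorted_keys[i] == value:
        return records[i]
    return None


employees = [(101, "A"), (205, "B"), (350, "C")]   # ordered on the key
keys = [rec[0] for rec in employees]

print(linear_search(employees, lambda rec: rec[0], 350))
print(binary_search(keys, employees, 205))
```

Linear search inspects all n records, while binary search needs about log2(n) probes, but only when the file is ordered on the search key.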

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σSSN = 123456789 (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

σDNUMBER > 5 (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:
σθ1∧θ2∧…∧θn (r)
Disjunctive selection: a disjunctive selection is of the following form:
σθ1∨θ2∨…∨θn (r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of one of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to each retrieved record.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
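Disjunctive selection by union of record pointers can be sketched as below. The sketch makes several assumptions: record pointers are list positions, each index is a dictionary from attribute value to a set of positions, only equality conditions are handled, and the function names are hypothetical.

```python
def build_index(records, attr):
    """Map each attribute value to the set of record positions ("pointers")."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index


def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attr, value) equality tests combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # One condition has no access path: fall back to a linear scan.
        return [r for r in records if any(r[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indexes[attr].get(value, set())   # union of pointer sets
    return [records[rid] for rid in sorted(rids)]


employees = [
    {"dno": 5, "city": "Rolla"}, {"dno": 4, "city": "Austin"},
    {"dno": 5, "city": "Austin"}, {"dno": 1, "city": "Boston"},
]
indexes = {a: build_index(employees, a) for a in ("dno", "city")}
query = [("dno", 5), ("city", "Austin")]

print(disjunctive_select(employees, indexes, query))
```

Taking the union of pointer sets before fetching records means a tuple that satisfies several conditions is retrieved only once.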

Query Optimization: JOIN Operation

Nested loop: To compute R ⋈A=B S, for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈r.A Θ s.B s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
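The nested loop algorithm translates almost line for line into Python. In this sketch, tuples are Python tuples, a and b are the positions of the join attributes, and the comparison Θ is passed in as a function; these representation choices are assumptions of the example. The function also counts the comparisons it makes.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r[a] theta s[b], comparing every pair of tuples."""
    result = []
    comparisons = 0
    for tr in r:                      # outer loop
        for ts in s:                  # inner loop
            comparisons += 1
            if theta(tr[a], ts[b]):
                result.append(tr + ts)   # tr || ts (concatenation)
    return result, comparisons


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

equi, n = nested_loop_join(r, s, 0, 0)
less, _ = nested_loop_join(r, s, 0, 0, theta=operator.lt)
print(equi, n, len(less))
```

The comparison count is always nr × ns, regardless of the join condition, which is why this method works for any Θ but is expensive.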

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns tuple comparisons.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need only bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)

If even one of the relations fits entirely in the physical space, it can be used as the inner relation, and the cost is still bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
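The cost expressions for the nested loop variants can be put side by side. This is a small sketch with made-up block counts; the function names are hypothetical, and the formulas are the ones stated on the slides.

```python
def tuple_pairs(n_r, n_s):
    """Number of tuple pairs the nested loop algorithm compares."""
    return n_r * n_s


def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses: one scan of the inner relation per outer block,
    plus one scan of the outer relation."""
    return b_outer * b_inner + b_outer


def in_memory_cost(b_r, b_s):
    """Block accesses when a relation fits in memory: read each relation once."""
    return b_r + b_s


b_r, b_s = 100, 400   # hypothetical block counts
print(block_nested_loop_cost(b_r, b_s))   # r as the outer relation
print(block_nested_loop_cost(b_s, b_r))   # s as the outer relation
print(in_memory_cost(b_r, b_s))
```

With these numbers, choosing the smaller relation as the outer one gives the lower cost, which hints at one answer to the question above.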

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): To compute r ⋈A=B s, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
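A join that uses an access structure on s can be sketched with an in-memory hash index. The dictionary-based index and the function names are assumptions of this example; a real system would use a disk-based index or hash file.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tr in r, probe the index on s instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

s_index = build_hash_index(s, 0)
print(index_join(r, s_index, 0))
```

Each tuple of r now costs one index probe rather than a full scan of s.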

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
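The merge step can be sketched as follows for an equi-join. The sketch assumes both inputs are Python lists already sorted on their join attributes (positions a and b), and that each group of s-tuples with the same join value fits in memory, as stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value with the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (3, "c"), (3, "d")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(merge_join(r, s, 0, 0))
```

Each pointer only moves forward, so every tuple of either relation is read once.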

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result tuples of the join operation.
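The hash-join idea can be sketched in memory as below. The number of buckets, the use of Python's built-in hash, and the function name are assumptions of this example; the buckets stand in for the hash file.

```python
def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join: hash both relations into the same buckets, then match per bucket."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Only tuples in the same bucket can have equal join values.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(hash_join(r, s, 0, 0)))
```

The equality test inside each bucket is still needed because different join values can hash to the same bucket.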

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
A join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈θ1∧θ2∧…∧θn s

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈θ1∨θ2∨…∨θn s

The join can be performed as the union of the tuples in the individual joins r ⋈θi s.

Query Optimization: Project Operation

A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
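Both duplicate-elimination strategies can be sketched in a few lines. The example assumes in-memory relations as lists of tuples with attributes addressed by position; the function names and sample data are hypothetical.

```python
def project_by_sorting(rel, positions):
    """Project, sort, then drop duplicates, which are adjacent after sorting."""
    projected = sorted(tuple(row[i] for i in positions) for row in rel)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_by_hashing(rel, positions):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for row in rel:
        t = tuple(row[i] for i in positions)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


employee = [("John", "Smith", 5), ("Ann", "Lee", 4), ("Joy", "Smith", 5)]
print(project_by_sorting(employee, (1, 2)))
print(project_by_hashing(employee, (1, 2)))
```

The two methods return the same set of tuples; sorting also leaves the result ordered, while hashing preserves the order of first appearance.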

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN (Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree with ΠPnumber, Dnum, Lname, Address, Bdate at the root, σPlocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN below it, and the Cartesian products Project × (Department × Employee) at the bottom]

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes.
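The size of the Cartesian product follows directly from the figures above. The short sketch below just reproduces that arithmetic; the dictionary layout and the function name are choices made for the example.

```python
relations = {
    "Project":    {"tuples": 100,  "tuple_bytes": 100},
    "Department": {"tuples": 20,   "tuple_bytes": 50},
    "Employee":   {"tuples": 5000, "tuple_bytes": 150},
}


def cartesian_product_size(stats):
    """Return (tuple count, bytes per tuple, total bytes) of the full product."""
    tuples = 1
    width = 0
    for rel in stats.values():
        tuples *= rel["tuples"]        # cardinalities multiply
        width += rel["tuple_bytes"]    # tuple widths add up
    return tuples, width, tuples * width


print(cartesian_product_size(relations))
```

The intermediate relation would occupy about 3 GB before any selection is applied, which is why the Cartesian product should be avoided.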

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation="California" (Project)) ⋈Dnum=Dnumber (Department)) ⋈MGRSSN=SSN (Employee))

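Query Optimization: An Example

[Figure: query tree in which σPlocation="California" is applied to Project, the result is joined with Department on Dnum=Dnumber, that result is joined with Employee on MGRSSN=SSN, and ΠPnumber, Dnum, Lname, Address, Bdate is at the root]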


55

Optimization ProcessRule 6 Natural join is associative

(E1 E2) E3 = E1 (E2 E3)

E1 E2

E3

E3E2

E1

Database Systems

56

Optimization ProcessRule 7 Theta join is associative in the

following manner(E1 θ1 E2) θ2andθ3 E3 = E1 θ1andθ3(E2 θ2 E3)

Where θ2 involves attributes from only E2 and E3

Database Systems

DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization: JOIN Operation

r ⋈_{A=B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
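A minimal sketch of this access-structure join, assuming an in-memory dictionary stands in for the index or hash key on s.B. The helper names `build_hash_index` and `index_join` are hypothetical.

```python
from collections import defaultdict


def build_hash_index(s, attr):
    """Map each value of `attr` to the list of s-records carrying it."""
    index = defaultdict(list)
    for t_s in s:
        index[t_s[attr]].append(t_s)
    return index


def index_join(r, s_index, a):
    """For each t_r, probe the index instead of scanning all of s."""
    result = []
    for t_r in r:
        for t_s in s_index.get(t_r[a], []):   # only the matching records
            result.append({**t_r, **t_s})
    return result


r = [{"A": 1}, {"A": 2}, {"A": 2}]
s = [{"B": 2, "y": "p"}, {"B": 3, "y": "q"}]
matches = index_join(r, build_hash_index(s, "B"), "A")
```

The inner scan of s disappears: each record of r costs one index probe plus the retrieval of its matches.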

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
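The merge step can be sketched as below. This assumes both inputs are already sorted on their join attributes and that the group of s-tuples sharing one join value fits in memory, as the slide requires. The function name `merge_join` is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on a) and s (sorted on b) with two pointers."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                       # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                       # advance the pointer into s
        else:
            value = r[i][a]
            # Find the end of the group of s-tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for t_s in s[j:j_end]:
                    result.append({**r[i], **t_s})
                i += 1
            j = j_end
    return result


r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "z"}, {"B": 4, "y": "w"}]
merged = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backwards past a finished group, which is what gives the single linear scan of each relation.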

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.

Database Systems
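A small sketch of the hash-join idea, assuming an in-memory list of buckets plays the role of the hash file and Python's built-in `hash` is the shared hashing function. The name `hash_join` and the bucket count are illustrative choices only.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations into the same buckets, then join bucket by bucket."""
    # Each bucket keeps the r-records and s-records that hashed to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for t_r in r:                                    # single pass over r
        buckets[hash(t_r[a]) % n_buckets][0].append(t_r)
    for t_s in s:                                    # single pass over s
        buckets[hash(t_s[b]) % n_buckets][1].append(t_s)

    result = []
    for r_part, s_part in buckets:
        for t_r in r_part:
            for t_s in s_part:
                # Different values can share a bucket, so re-check equality.
                if t_r[a] == t_s[b]:
                    result.append({**t_r, **t_s})
    return result


r = [{"A": 1}, {"A": 2}, {"A": 6}]
s = [{"B": 2, "y": "p"}, {"B": 6, "y": "q"}, {"B": 7, "y": "z"}]
hashed = sorted(hash_join(r, s, "A", "B"), key=lambda t: t["A"])
```

Values 2 and 6 land in the same bucket here, which shows why each bucket still needs the explicit equality test.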

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s,  i = 1, …, n
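Both strategies for complex join conditions can be sketched as follows. This assumes each θi is a predicate over a pair (t_r, t_s) and uses a plain nested loop for the individual joins; in practice any applicable join technique could compute them. All names and data are hypothetical.

```python
def conjunctive_join(r, s, thetas):
    """Join on theta_1 first, then filter by the remaining conditions."""
    first, rest = thetas[0], thetas[1:]
    intermediate = [(tr, ts) for tr in r for ts in s if first(tr, ts)]
    return [{**tr, **ts} for tr, ts in intermediate
            if all(theta(tr, ts) for theta in rest)]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins; a pair is emitted only once."""
    seen, result = set(), []
    for theta in thetas:
        for i, tr in enumerate(r):
            for j, ts in enumerate(s):
                if theta(tr, ts) and (i, j) not in seen:
                    seen.add((i, j))
                    result.append({**tr, **ts})
    return result


r = [{"A": 1, "C": 10}, {"A": 2, "C": 20}]
s = [{"B": 1, "D": 10}, {"B": 2, "D": 99}]
thetas = [lambda tr, ts: tr["A"] == ts["B"],
          lambda tr, ts: tr["C"] == ts["D"]]
both = conjunctive_join(r, s, thetas)
either = disjunctive_join(r, s, thetas)
```

Tracking pairs by position keeps the union free of duplicates when one pair satisfies several conditions.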

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
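The two duplicate-elimination options can be sketched as below, assuming records are dictionaries and projected rows are represented as tuples. The function names are hypothetical; a Python `set` stands in for the hashing technique.

```python
def project_with_hashing(relation, attrs):
    """Project onto attrs and drop duplicates using a hash set."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:            # hash lookup detects a duplicate
            seen.add(row)
            result.append(row)
    return result


def project_with_sorting(relation, attrs):
    """Project onto attrs, sort, then drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are now adjacent
            result.append(row)
    return result


employees = [
    {"ssn": 1, "dno": 5, "lname": "Smith"},
    {"ssn": 2, "dno": 4, "lname": "Wong"},
    {"ssn": 3, "dno": 5, "lname": "Smith"},
]
by_hash = project_with_hashing(employees, ["dno", "lname"])
by_sort = project_with_sorting(employees, ["dno", "lname"])
```

Projecting on `ssn` (a key in this made-up data) would never trigger either duplicate check, which is the easy case the slide describes.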

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Database Systems
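The sort-then-scan approach can be sketched in one pass that produces union, intersection, and difference together. This assumes both inputs are sorted lists of distinct, union-compatible tuples; the function name is hypothetical.

```python
def sorted_set_operations(r, s):
    """Single merge scan over sorted r and s.

    Returns (r ∪ s, r ∩ s, r − s).
    """
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j])
            j += 1
        else:                           # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever remains belongs to only one of the relations.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


r = [(1,), (2,), (4,)]
s = [(2,), (3,), (4,), (5,)]
u, n, d = sorted_set_operations(r, s)
```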

Questions

  • Devise algorithms to perform variations of outer join operations.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

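Query Optimization: An Example

[Figure: query tree for the expression above. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; its child is σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}; below it is a Cartesian product × whose operands are Project and a second Cartesian product × of Department and Employee.]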

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5,000), each of 300 bytes (100 + 50 + 150).
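The size estimate can be reproduced with a few lines of arithmetic. The tuple counts and tuple sizes are the ones stated on the slide; the variable names and the final byte total are added here for illustration.

```python
# (number of tuples, tuple size in bytes) as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_size = 0
for n_tuples, tuple_size in relations.values():
    product_tuples *= n_tuples          # cardinalities multiply
    product_tuple_size += tuple_size    # tuple widths add up

total_bytes = product_tuples * product_tuple_size
print(product_tuples, product_tuple_size, total_bytes)
```

The intermediate relation therefore occupies about 3 × 10^9 bytes before the selection discards almost all of it.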

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)

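Query Optimization: An Example

[Figure: query tree for the expression above. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; its child is the join ⋈_{MGRSSN=SSN}, whose operands are Employee and the join ⋈_{Dnum=Dnumber}; the operands of that join are Department and σ_{Plocation='California'}(Project).]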


Optimization Process

Rule 7: Theta join is associative in the following manner:

(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)

where θ2 involves attributes from only E2 and E3.

Definition

Selectivity is defined as the ratio of the number of tuples that satisfy the equality condition to the cardinality of the relation:

selectivity = (number of tuples satisfying the condition) / |r(R)|

Selectivity is used to estimate the size of an intermediate relation, and hence the number of accesses.

In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For a selection on a key attribute with a search on equality:

s = 1 / |r(R)|

Selectivity on an attribute with i distinct values (assuming the values are uniformly distributed) is

s = (|r(R)| / i) / |r(R)| = 1 / i

Hence, the number of tuples that satisfy an equality search is

(1 / i) × |r(R)|
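The two selectivity formulas translate directly into code. The sketch below assumes uniformly distributed values; the function names and the sample figures (5,000 tuples, 50 distinct values) are hypothetical illustrations.

```python
def selectivity_key_equality(cardinality: int) -> float:
    """Equality on a key attribute matches at most one tuple: s = 1 / |r(R)|."""
    return 1 / cardinality


def selectivity_nonkey_equality(distinct_values: int) -> float:
    """Equality on an attribute with i distinct values: s = 1 / i."""
    return 1 / distinct_values


def estimated_matches(cardinality: int, distinct_values: int) -> float:
    """Expected result size of an equality selection: (1 / i) * |r(R)|."""
    return selectivity_nonkey_equality(distinct_values) * cardinality


# Assumed figures: 5,000 tuples and 50 distinct values of the attribute
expected = estimated_matches(5000, 50)
```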

Optimization Process

Rule 8: Selection operation distributes over the theta join under the following condition:

When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

Optimization Process

Rule 8:

σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2

[Figure: two equivalent query trees. Left: σ_{θ0} applied to the join ⋈_θ of E1 and E2. Right: the join ⋈_θ of σ_{θ0}(E1) and E2.]

Optimization Process

Rule 9: The projection operation distributes over theta-join under the following condition:

The join condition θ only involves attributes in L1 ∪ L2:

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process

Rule 10: Set union and set intersection operations are commutative:

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process

Rule 11: Set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process

Rule 12: Selection operation distributes over the set union, set intersection, and set difference operations. For set difference:

σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Optimization Process

Rule 12 (set union):

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Optimization Process

Rule 12 (set intersection):

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
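Rule 12 can be checked on a small example with Python sets. This is only a sanity check on made-up data, not a proof; the relations `E1`, `E2` and the predicate `p` are hypothetical.

```python
def select(pred, relation):
    """σ_p(relation) over a set of values."""
    return {t for t in relation if pred(t)}


E1 = {1, 2, 3, 4}
E2 = {3, 4, 5, 6}


def p(t):
    return t % 2 == 0  # selection predicate: keep even values


# Selection distributes over union, intersection, and difference.
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2)
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2)

# For intersection and difference, selecting on E1 alone is enough ...
inter_one_side = select(p, E1 & E2) == select(p, E1) & E2
diff_one_side = select(p, E1 - E2) == select(p, E1) - E2
# ... but not for union: unfiltered tuples of E2 leak into the result.
union_one_side = select(p, E1 | E2) == select(p, E1) | E2
```

Here `union_one_side` evaluates to `False` because the odd values 3 and 5 of E2 survive on the right-hand side.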

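Optimization Process

Rule 13: Projection operation distributes over the set union operation:

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)

The analogous forms for set intersection and set difference,

Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) = (Π_L(E1)) − (Π_L(E2))

hold only when no two distinct tuples of E1 ∪ E2 agree on the attributes in L (for example, when L includes a key); otherwise the two sides can differ.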

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …

come into play.

Optimization Process

So, in short, after scanning and parsing:

  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:

  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

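Definition

  • Materialized evaluation: the result of each operation is generated and stored as an intermediate relation.
  • Pipelined evaluation: several operations are combined, so that the output of one operation is passed directly to the next without storing the intermediate relation.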

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.

Database Systems
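Python generators give a compact way to contrast the two evaluation styles. The sketch below assumes a natural join on a shared attribute `k`, which the slide does not specify; the relation contents and function names are hypothetical.

```python
def join(r, s, key):
    """Yield joined tuples one at a time (natural join on `key`)."""
    for t_r in r:
        for t_s in s:
            if t_r[key] == t_s[key]:
                yield {**t_r, **t_s}


def project(tuples, attrs):
    """Yield each incoming tuple restricted to attrs."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 3, "a2": 30}]

# Materialized evaluation: store the whole join result, then project it.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection;
# no intermediate relation is ever stored.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both forms return the same tuples; the difference lies only in whether `intermediate` ever exists.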

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

[Figure: query tree for the expression above. The root is Π_{Sname}; its child is σ_{bid = 100 ∧ rating > 5}; below it is the join ⋈_{Sid=Sid} of R and S.]

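Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S)))

[Figure: query tree for the expression above. The root is Π_{Sname}; its child is the join ⋈_{Sid=Sid}, whose operands are σ_{bid = 100}(R) and σ_{rating > 5}(S).]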
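The effect of pushing the selections below the join can be seen on a toy instance. The rows below are invented purely for illustration, and the helper `join_on_sid` is hypothetical; the point is that both plans return the same answer while the second joins smaller inputs.

```python
# Made-up sample instances of S and R (only the attributes the query uses)
S = [{"sid": 1, "sname": "Ann", "rating": 7},
     {"sid": 2, "sname": "Bob", "rating": 3},
     {"sid": 3, "sname": "Cy", "rating": 9}]
R = [{"sid": 1, "bid": 100},
     {"sid": 2, "bid": 100},
     {"sid": 3, "bid": 200}]


def join_on_sid(r, s):
    return [{**tr, **ts} for tr in r for ts in s if tr["sid"] == ts["sid"]]


# Plan 1: join first, then select, then project.
joined_all = join_on_sid(R, S)
plan1 = [t["sname"] for t in joined_all
         if t["bid"] == 100 and t["rating"] > 5]

# Plan 2: select on each relation first, then join, then project.
r_filtered = [t for t in R if t["bid"] == 100]
s_filtered = [t for t in S if t["rating"] > 5]
joined_small = join_on_sid(r_filtered, s_filtered)
plan2 = [t["sname"] for t in joined_small]
```

On this data the join of plan 1 produces three tuples, while the join of plan 2 produces one.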

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

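[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree (Π_{Sname} over σ_{bid = 100 ∧ rating > 5} over R ⋈_{Sid=Sid} S) two operators are marked "on the fly"; in the second tree (Π_{Sname} over (σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S))) one operator is marked "on the fly".]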

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
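A minimal sketch of the two search methods, assuming the file is an in-memory list of records and, for binary search, a parallel list of the ordering key values. The standard-library `bisect` module performs the binary search; the other names and the sample data are hypothetical.

```python
import bisect


def linear_search(records, predicate):
    """Test every record against the selection condition."""
    return [rec for rec in records if predicate(rec)]


def binary_search(sorted_keys, records, value):
    """Equality search on the key attribute the file is ordered on.

    sorted_keys[i] is the key of records[i]; returns the record or None.
    """
    pos = bisect.bisect_left(sorted_keys, value)
    if pos < len(sorted_keys) and sorted_keys[pos] == value:
        return records[pos]
    return None


records = [{"ssn": 101, "lname": "Smith"},
           {"ssn": 205, "lname": "Wong"},
           {"ssn": 310, "lname": "Zelaya"},
           {"ssn": 412, "lname": "Wallace"}]
keys = [rec["ssn"] for rec in records]        # file ordered on ssn

found = binary_search(keys, records, 310)
scanned = linear_search(records, lambda rec: rec["lname"].startswith("W"))
```

The linear scan touches all records for any condition, whereas the binary search needs about log2(n) probes but only works for the ordering key.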

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file. (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)
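The range case can be sketched with `bisect` on a sorted key list, which here stands in for the primary index on an ordered file. The function names and department numbers are hypothetical.

```python
import bisect


def select_greater_than(sorted_keys, records, value):
    """σ_{key > value}: locate the boundary, then take all subsequent records."""
    start = bisect.bisect_right(sorted_keys, value)
    return records[start:]


def select_greater_equal(sorted_keys, records, value):
    """σ_{key >= value}: same idea, but the boundary record is included."""
    start = bisect.bisect_left(sorted_keys, value)
    return records[start:]


departments = [{"dnumber": n} for n in (1, 4, 5, 7, 9)]   # ordered on dnumber
dnumbers = [d["dnumber"] for d in departments]

above_five = select_greater_than(dnumbers, departments, 5)
five_and_up = select_greater_equal(dnumbers, departments, 5)
```

Because the file is sorted on the key, everything after the boundary position qualifies, so no further comparisons are needed.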

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
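Both selection strategies can be sketched with dictionary-based indexes that map attribute values to sets of record positions (the "record pointers"). This is an illustrative model only; `build_index`, `conjunctive_select`, `disjunctive_select`, and the sample records are hypothetical.

```python
def build_index(records, attr):
    """Map each value of attr to the set of record positions holding it."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index


def conjunctive_select(records, index, value, remaining_conditions):
    """Use one indexed condition to fetch candidates, then apply the rest."""
    candidates = index.get(value, set())
    return [records[rid] for rid in sorted(candidates)
            if all(cond(records[rid]) for cond in remaining_conditions)]


def disjunctive_select(records, lookups):
    """Union the record pointers obtained from each (index, value) lookup."""
    pointers = set()
    for index, value in lookups:
        pointers |= index.get(value, set())
    return [records[rid] for rid in sorted(pointers)]


employees = [{"dno": 5, "salary": 100},
             {"dno": 4, "salary": 300},
             {"dno": 5, "salary": 300}]
dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# dno = 5 AND salary > 200
conj = conjunctive_select(employees, dno_index, 5,
                          [lambda rec: rec["salary"] > 200])
# dno = 4 OR salary = 100
disj = disjunctive_select(employees, [(dno_index, 4), (salary_index, 100)])
```

The disjunctive version needs an index for every condition; a single unindexed condition would force a scan of all records anyway.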

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].



DefinitionSelectivity is defined as the ratio of the number of

tuples that satisfy the equality condition to thecardinality of the relation

119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904119904 =119900119900119900119900 119904119904119905119905119905119905119904119904119904119904119904119904 119904119904119904119904119904119904119904119904119904119904119900119900119904119904119904119904119904119904119904119904 119904119904119905119904119904 119904119904119904119904119904119904119904119904119904119904119905

|119904119904(119877119877)|Selectivity is used to estimate size of intermediate

relation and hence number of accesses

Database Systems

57

In practice selectivities of all conditions isnot available so we use estimatedselectivity as part of statistical data to aidquery optimization

Database Systems

58

Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

First plan, with the pipelined operators marked "on the fly":
Π_{Sname}  (on the fly)
  σ_{bid = 100 AND rating > 5}  (on the fly)
    ⋈_{Sid = Sid}
      R
      S

Second plan:
Π_{Sname}  (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)
  • Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
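A minimal sketch of the two search methods, assuming records are held in memory as dictionaries and the ordered file is represented by a sorted list of key values. The names `linear_search`, `binary_search`, and the sample `employees` data are hypothetical.

```python
from bisect import bisect_left

def linear_search(records, attr, value):
    """Scan every record and keep those whose attribute equals the value."""
    return [rec for rec in records if rec[attr] == value]

def binary_search(sorted_keys, value):
    """Return the position of value in the ordered key list, or None."""
    pos = bisect_left(sorted_keys, value)
    if pos < len(sorted_keys) and sorted_keys[pos] == value:
        return pos
    return None

# Hypothetical file, ordered on the key attribute ssn.
employees = [{"ssn": 111, "dno": 5}, {"ssn": 222, "dno": 4}, {"ssn": 333, "dno": 5}]
ssn_keys = [e["ssn"] for e in employees]

by_scan = linear_search(employees, "dno", 5)   # non-key attribute: full scan
by_key = binary_search(ssn_keys, 222)          # key attribute: binary search
```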

Optimization Process: Search methods for Selection
  • Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).
σ_{SSN = 123456789} (EMPLOYEE)

Optimization Process: Search methods for Selection
  • Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤ or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.
σ_{DNUMBER > 5} (DEPARTMENT)
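The range case can be sketched as follows, with a sorted list of key values standing in for the ordered file and its primary index. The function name `select_greater_than` and the sample department numbers are assumptions.

```python
from bisect import bisect_right

def select_greater_than(sorted_keys, bound):
    """Locate the position just past `bound`, then take all subsequent keys."""
    start = bisect_right(sorted_keys, bound)  # plays the role of the index lookup
    return sorted_keys[start:]                # sequential read of the sorted file

# Hypothetical DNUMBER values of DEPARTMENT, stored in sorted order.
dnumbers = [1, 3, 5, 7, 9]
result = select_greater_than(dnumbers, 5)
```

Because the file is sorted on the key, only the part of the file at or after the located position has to be read.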

Query Optimization: Search methods for Selection
  • Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).
σ_{DNO = 5} (EMPLOYEE)

Query Optimization: Search methods for Selection
  • Conjunctive selection: a conjunctive selection is of the following form:
    σ_{θ1 ∧ θ2 ∧ … ∧ θn} (r)
  • Disjunctive selection: a disjunctive selection is of the following form:
    σ_{θ1 ∨ θ2 ∨ … ∨ θn} (r)

Query Optimization: Search methods for Selection
  • Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection
  • Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
  • The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
  • Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
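A sketch of disjunctive selection by union of record pointers, restricted to equality conditions. Record positions in a list play the role of pointers, and each index is a dictionary from value to a set of positions; `build_index`, `disjunctive_select`, and the sample data are hypothetical.

```python
def build_index(records, attr):
    """Map each attribute value to the set of positions holding it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(pos)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: (attribute, value) equality terms combined with OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one term has no access path: fall back to a linear scan.
        return [rec for rec in records
                if any(rec[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())  # union of record pointers
    return [records[p] for p in sorted(pointers)]

records = [{"dno": 5, "sex": "F"}, {"dno": 4, "sex": "M"}, {"dno": 1, "sex": "F"}]
conditions = [("dno", 4), ("sex", "F")]

all_indexed = {"dno": build_index(records, "dno"), "sex": build_index(records, "sex")}
one_missing = {"dno": build_index(records, "dno")}

via_pointers = disjunctive_select(records, all_indexed, conditions)
via_scan = disjunctive_select(records, one_missing, conditions)
```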

Query Optimization: JOIN Operation
  • Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].
R ⋈_{A = B} S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform
r ⋈_{r.A Θ s.B} s
A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume that nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume that br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)
The following algorithm performs the nested loop join operation:
For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr[A] Θ ts[B] is true then add tr || ts to the result
    end
end
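The nested loop algorithm translates almost line by line into Python. In this sketch tuples are Python tuples, the join condition Θ is passed in as a function, and `tr || ts` becomes tuple concatenation; the function name and the sample data are assumptions.

```python
def nested_loop_join(r, s, theta):
    """Compare every tuple of r with every tuple of s."""
    result = []
    for tr in r:                  # outer loop over r
        for ts in s:              # inner loop over s
            if theta(tr, ts):     # join condition on the join attributes
                result.append(tr + ts)   # tr || ts
    return result

r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y")]

equi = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
less_than = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
```

The second call shows that the algorithm works for any comparison, not only for equality.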

Query Optimization: JOIN Operation (nested loop)
  • The cost of the nested loop algorithm is nr × ns pairs of tuples examined.
  • In the best-case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.

Query Optimization: JOIN Operation (nested loop)
  • If one of the relations fits in the physical space, then the cost will be bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.

Query Optimization: JOIN Operation (block nested loop)
For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
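A sketch of the block nested loop that also counts block reads. Blocks are modeled as fixed-size slices of an in-memory list, which is an assumption made purely for illustration, as are the names `to_blocks` and `block_nested_loop_join`.

```python
def to_blocks(tuples, block_size):
    """Split a relation into blocks of at most block_size tuples."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    result = []
    block_reads = 0
    for block_r in r_blocks:
        block_reads += 1              # one read per block of r
        for block_s in s_blocks:
            block_reads += 1          # one read of each s block per r block
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads

r = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
s = [(2, "u"), (4, "v"), (6, "w"), (8, "x"), (10, "y"), (12, "z")]
r_blocks = to_blocks(r, 2)   # 2 blocks
s_blocks = to_blocks(s, 2)   # 3 blocks

joined, reads = block_nested_loop_join(r_blocks, s_blocks,
                                       lambda tr, ts: tr[0] == ts[0])
```

With 2 blocks of r and 3 blocks of s, the counter ends at 2 × 3 + 2 block reads.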

Query Optimization: JOIN Operation (block nested loop)
The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.
How can we improve the block nested loop?
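A worked comparison of the cost formulas, using hypothetical block counts br = 100 and bs = 400. One observation that follows directly from the formula br × bs + br: the extra additive term is the number of blocks of the outer relation, so taking the smaller relation as the outer one gives a slightly lower cost. This is offered as one possible answer to the question, not necessarily the one intended on the slide.

```python
def best_case_cost(br, bs):
    """Block accesses when a relation fits in memory: each block read once."""
    return br + bs

def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses of the block nested loop: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer

br, bs = 100, 400   # hypothetical sizes in blocks

best = best_case_cost(br, bs)
cost_r_outer = block_nested_loop_cost(br, bs)   # r as the outer relation
cost_s_outer = block_nested_loop_cost(bs, br)   # s as the outer relation
```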

Query Optimization: JOIN Operation
  • Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
r ⋈_{A = B} s
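A sketch of this technique in which a Python dictionary stands in for the index or hash key on attribute B of s. The names `build_hash_index` and `index_join`, and the attribute positions, are assumptions.

```python
from collections import defaultdict

def build_hash_index(s, b_pos):
    """Access structure on s: join attribute value -> matching tuples of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b_pos]].append(ts)
    return index

def index_join(r, a_pos, s_index):
    """For each tr in r, probe the access structure instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a_pos], []):
            result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c")]
s = [(2, "x"), (2, "y"), (1, "z")]

s_index = build_hash_index(s, 0)
joined = index_join(r, 0, s_index)
```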

Query Optimization: JOIN Operation
  • Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
  • One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
  • Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is
    bs + br
    assuming that the set of all tuples with the same value for the join attributes fits in main memory.
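The merge step can be sketched with two list positions acting as the pointers. Both inputs are assumed to be already sorted on the join attribute, and the group of s tuples sharing one join value is assumed to fit in memory, as stated above. The function name and sample data are hypothetical.

```python
def merge_join(r, s, a_pos, b_pos):
    """Equality join of two relations sorted on their join attributes."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        a, b = r[i][a_pos], s[j][b_pos]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b_pos] == a:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a_pos] == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```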

Query Optimization: JOIN Operation
  • Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
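A sketch of the hash-join idea with in-memory buckets. The number of buckets, the use of Python's built-in `hash`, and the function name are assumptions; a real system would hash to buckets of a hash file on disk.

```python
def hash_join(r, s, a_pos, b_pos, n_buckets=4):
    # Each bucket holds the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a_pos]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b_pos]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:
        # Matching tuples can only meet inside the same bucket.
        for tr in r_part:
            for ts in s_part:
                if tr[a_pos] == ts[b_pos]:
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (5, "c"), (6, "d")]
s = [(2, "x"), (6, "y"), (9, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

The equality test inside each bucket is still needed, because different join values can hash to the same bucket.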

Query Optimization: Complex JOIN Operation
  • Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
  • Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation
Consider the following join operation:
r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s
  • One or more of the join techniques may be applicable for joins on the individual conditions.
  • We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:
r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s
The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation
  • A project operation Π_{<attribute list>} (R) is straightforward to implement if <attribute list> includes a key of relation R.
  • If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
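Both duplicate-elimination strategies can be sketched in a few lines. Here a Python `set` stands in for the hashing technique; the function names and the sample relation are assumptions.

```python
from itertools import groupby

def project_sort(rows, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(row[p] for p in positions) for row in rows)
    return [key for key, _group in groupby(projected)]

def project_hash(rows, positions):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[p] for p in positions)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

# Hypothetical relation (Ssn, Dno, City); projecting on (Dno, City) drops the key.
employee = [(1, 5, "Rolla"), (2, 4, "Salem"), (3, 5, "Rolla")]

by_sort = project_sort(employee, [1, 2])
by_hash = project_hash(employee, [1, 2])
```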

Query Optimization: Set Operations
  • Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
  • The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
  • Hashing is another way to implement the union, intersection, and difference operations.

Questions
  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, ...)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example
The above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation = "California" AND Dnum = Dnumber AND MGRSSN = SSN} (Project × (Department × Employee)))

Query Optimization: An Example
Query tree (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation = "California" AND Dnum = Dnumber AND MGRSSN = SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example
The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
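The arithmetic behind these figures, spelled out. The variable names are illustrative; the numbers are the ones given above.

```python
from math import prod

tuple_counts = [100, 20, 5000]   # Project, Department, Employee
tuple_sizes = [100, 50, 150]     # bytes per tuple, same order

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add up.
product_tuples = prod(tuple_counts)                    # 100 * 20 * 5000
product_tuple_size = sum(tuple_sizes)                  # 100 + 50 + 150
product_bytes = product_tuples * product_tuple_size    # total intermediate size
```

The intermediate relation therefore occupies 3,000,000,000 bytes, even though the three base relations together hold only 10,000 + 1,000 + 750,000 = 761,000 bytes.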

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into:
Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation = "California"} (Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))

Query Optimization: An Example
Query tree (root at the top):
Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN = SSN}
    ⋈_{Dnum = Dnumber}
      σ_{Plocation = "California"}
        Project
      Department
    Employee


In practice, the selectivities of all conditions are not available, so we use estimated selectivities, kept as part of the statistical data, to aid query optimization.

For a selection on a key attribute with an equality search, the selectivity is
s = 1 / |r(R)|

The selectivity on an attribute with i distinct values is
s = (|r(R)| / i) / |r(R)|
Hence the number of tuples that satisfy an equality search is
(1 / i) · |r(R)|
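A small numeric illustration of these estimates, assuming the values of the attribute are spread evenly over the tuples. The function names and the sample figures (5000 tuples, 50 distinct values) are hypothetical; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def selectivity_key_equality(n_tuples):
    """Equality on a key attribute: s = 1 / |r(R)|."""
    return Fraction(1, n_tuples)

def selectivity_equality(distinct_values):
    """Equality on an attribute with i distinct values: s = 1 / i."""
    return Fraction(1, distinct_values)

def estimated_matches(n_tuples, selectivity):
    """Expected number of qualifying tuples: s * |r(R)|."""
    return selectivity * n_tuples

n = 5000   # hypothetical |r(R)|
i = 50     # hypothetical number of distinct values

key_matches = estimated_matches(n, selectivity_key_equality(n))
nonkey_matches = estimated_matches(n, selectivity_equality(i))
```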

Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
  • When all the attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):
σ_{θ0} (E1 ⋈_θ E2) = (σ_{θ0} (E1)) ⋈_θ E2

Optimization Process
Rule 8:
σ_{θ0} (E1 ⋈_θ E2) = (σ_{θ0} (E1)) ⋈_θ E2

Query tree before the transformation (root at the top):
σ_{θ0}
  ⋈_θ
    E1
    E2

Query tree after the transformation:
⋈_θ
  σ_{θ0}
    E1
  E2

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition, where L1 and L2 are attributes of E1 and E2, respectively:
  • The join condition θ only involves attributes in L1 ∪ L2:
Π_{L1 ∪ L2} (E1 ⋈_θ E2) = (Π_{L1} (E1)) ⋈_θ (Π_{L2} (E2))

Optimization Process
Rule 10: The set union and set intersection operations are commutative:
(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)
Note: set difference is not commutative.

Optimization Process
Rule 11: The set union and set intersection operations are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.
Set difference:
σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2
Set union:
σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2
Set intersection:
σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2
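The Rule 12 identities can be checked on small examples with Python sets. The relations here are sets of integers and the predicate p selects even values; all of these are made-up test data, and one example is a sanity check, not a proof.

```python
def select(p, e):
    """σ_p(e): keep the elements of e that satisfy p."""
    return {t for t in e if p(t)}

E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0

checks = {
    # Union: selection must be pushed into both operands.
    "union_both": select(p, E1 | E2) == select(p, E1) | select(p, E2),
    "union_one_side": select(p, E1 | E2) == select(p, E1) | E2,
    # Intersection and difference: pushing into E1 alone is enough.
    "intersect_both": select(p, E1 & E2) == select(p, E1) & select(p, E2),
    "intersect_one_side": select(p, E1 & E2) == select(p, E1) & E2,
    "difference_both": select(p, E1 - E2) == select(p, E1) - select(p, E2),
    "difference_one_side": select(p, E1 - E2) == select(p, E1) - E2,
}
```

Only the `union_one_side` check comes out false, which matches the ≠ in the union case.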

Optimization Process
Rule 13: The projection operation distributes over the set union operation:
Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)
Note: the analogous identities for set intersection and set difference, Π_L (E1 ∩ E2) = Π_L (E1) ∩ Π_L (E2) and Π_L (E1 − E2) = Π_L (E1) − Π_L (E2), do not hold in general, because two different tuples can agree on the attributes in L.
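A quick check of how projection interacts with the three set operations under set semantics. The two relations below are made-up examples chosen so that different tuples agree on the projected attribute; on such data the union identity holds while the intersection and difference versions fail.

```python
def project(e, positions):
    """Π_L(e) under set semantics: duplicates collapse automatically."""
    return {tuple(t[p] for p in positions) for t in e}

E1 = {(1, "a"), (2, "b")}
E2 = {(1, "b"), (3, "c")}
L = [0]   # project on the first attribute only

union_holds = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_holds = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_holds = project(E1 - E2, L) == project(E1, L) - project(E2, L)
```

Here E1 ∩ E2 is empty, yet both projections contain the value 1, so projecting before intersecting produces a tuple that should not be in the result.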

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as
  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), ...
come into play.

Optimization Process
So, in short, after scanning and parsing:
  • the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
  • an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...



Selectivity on key attribute and search onequality then

119904119904 =1

|119904119904(119877119877)

Database Systems

59

Selectivity on an attribute with i distinctvalues is

119904119904 = |119904119904(119877119877)

119904119904|119904119904(119877119877)

Hence the number of tuples that satisfy anequality search is

1119894119894

|r(R)|

Database Systems

60

61

Optimization ProcessRule 8 Selection operation distribute

over the theta join under the followingconditionsWhen all attributes in selection condition θ0

involve only the attributes of one relation (E1in this case)

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

Database Systems

62

Optimization ProcessRule 8

σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2

σθ0

θ

E1 E2

θ

σθ0 E2

E1

Database Systems

63

Optimization ProcessRule 9 The projection operation

distributes over theta-join under thefollowing conditionJoin condition θ only involves attributes in

L1 cup L2

ΠL1cup L2 (E1 θ E2) = (ΠL1(E1)) θ (ΠL2(E2))

Database Systems

64

Optimization Process: Rule 10
Set union and set intersection operations are commutative:

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

Note: set difference is not commutative.

Optimization Process: Rule 11
Set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process: Rule 12
The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σp (E1 − E2) = σp (E1) − σp (E2)
σp (E1 − E2) = σp (E1) − E2

Set union:
σp (E1 ∪ E2) = σp (E1) ∪ σp (E2)
σp (E1 ∪ E2) ≠ σp (E1) ∪ E2

Set intersection:
σp (E1 ∩ E2) = σp (E1) ∩ σp (E2)
σp (E1 ∩ E2) = σp (E1) ∩ E2
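The Rule 12 identities can be verified on a toy example. In this sketch relations are modeled as Python sets of integers and the predicate p ("is even") is an arbitrary choice made for illustration.

```python
def select(relation, p):
    """Selection over a relation modeled as a set."""
    return {t for t in relation if p(t)}


e1 = {1, 2, 3, 4, 5, 6}
e2 = {4, 5, 6, 7, 8}
p = lambda t: t % 2 == 0

# Set difference: selection may be applied to both operands or only to E1.
diff_whole = select(e1 - e2, p)
diff_both = select(e1, p) - select(e2, p)
diff_left = select(e1, p) - e2

# Set union: selection must be applied to both operands.
union_whole = select(e1 | e2, p)
union_both = select(e1, p) | select(e2, p)
union_left = select(e1, p) | e2      # not equivalent: keeps odd values of E2

# Set intersection: selection may be applied to both operands or only to E1.
inter_whole = select(e1 & e2, p)
inter_both = select(e1, p) & select(e2, p)
inter_left = select(e1, p) & e2
```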

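Optimization Process: Rule 13
The projection operation distributes over the set union operation:

ΠL (E1 ∪ E2) = ΠL (E1) ∪ ΠL (E2)

Note: the corresponding identities for set intersection and set difference do not hold in general, because two tuples that differ only outside L project to the same value. Only the following containments are guaranteed:

ΠL (E1 ∩ E2) ⊆ ΠL (E1) ∩ ΠL (E2)
ΠL (E1) − ΠL (E2) ⊆ ΠL (E1 − E2)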
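A small check of how projection interacts with the three set operations. The relations below are invented for illustration and are modeled as sets of tuples, with L being the first attribute. On this sample the union form holds, while the intersection and difference forms do not, since tuples that differ outside L can project to the same value.

```python
def project(relation, positions):
    """Projection with duplicate elimination (the result is a set)."""
    return {tuple(t[i] for i in positions) for t in relation}


e1 = {(1, "a"), (2, "b")}
e2 = {(1, "c"), (3, "d")}
L = (0,)   # keep only the first attribute

union_ok = project(e1 | e2, L) == project(e1, L) | project(e2, L)
intersection_ok = project(e1 & e2, L) == project(e1, L) & project(e2, L)
difference_ok = project(e1 - e2, L) == project(e1, L) - project(e2, L)
```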

Optimization Process: Choose candidate low-level procedures
After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
Existence of indexes or other access paths (to reduce I/O cost)
Physical clustering of records (to reduce I/O cost)
...

Optimization Process: So, in short
After scanning and parsing, the query is translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy is then chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process: Generate query plans
The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization, execution time, space utilization, ...

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition
Materialized evaluation: the result of each operation is generated as an intermediate result (relation), which then serves as input to the next operation.
Pipeline evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.

Assume we want to perform

Πa1, a2 (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, it is passed directly to the project operation for processing.
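The difference between the two evaluation styles can be sketched with Python generators. This is only an illustration under simplifying assumptions: the relations and attribute names (k, k2) are invented, the join is a plain nested loop, and the projection does not eliminate duplicates.

```python
def join(r, s, cond):
    """Yield joined tuples one at a time (a generator, so nothing is stored)."""
    for tr in r:
        for ts in s:
            if cond(tr, ts):
                yield {**tr, **ts}


def project(tuples, attrs):
    """Yield the projection of each incoming tuple as soon as it arrives."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"a1": 1, "k": 7}, {"a1": 2, "k": 8}]
s = [{"k2": 7, "a2": "x"}, {"k2": 8, "a2": "y"}]
cond = lambda tr, ts: tr["k"] == ts["k2"]

# Materialized evaluation: the whole join result is stored before projecting.
temp = list(join(r, s, cond))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, cond), ["a1", "a2"]))
```

Both variants produce the same answer; the pipelined one never builds the temporary relation `temp`.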

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: date, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

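ΠSname (σbid = 100 AND rating > 5 (R ⋈Sid=Sid S))

[Figure: query tree with ΠSname at the root, σbid = 100 AND rating > 5 below it, and the join ⋈Sid=Sid of R and S at the bottom.]

ΠSname ((σbid = 100 R) ⋈Sid=Sid (σrating > 5 S))

[Figure: query tree with ΠSname at the root and the join ⋈Sid=Sid below it; σbid = 100 is applied to R and σrating > 5 is applied to S before the join.]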

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees for the previous query, annotated for pipelined execution. The first tree (σbid = 100 AND rating > 5 above the join ⋈Sid=Sid of R and S, ΠSname at the root) carries two "On the fly" annotations. The second tree (σbid = 100 on R and σrating > 5 on S below the join, ΠSname at the root) carries one "On the fly" annotation.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
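The two search methods can be sketched as follows. This is a minimal in-memory illustration: the file is modeled as a Python list of records, the attribute name ssn and the sample values are invented, and the binary search assumes the list is already ordered on the key attribute.

```python
def linear_search(records, attr, value):
    """Test every record; works on unordered data and any attribute."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on the ordering key; returns one record or None."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key_attr]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1        # continue in the upper half
        else:
            hi = mid - 1        # continue in the lower half
    return None


# Hypothetical file ordered on the key attribute ssn.
employees = [{"ssn": n, "name": f"emp{n}"} for n in (101, 205, 310, 422, 537)]
```

The linear search inspects all five records, while the binary search needs at most three probes for this file.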

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σSSN = 123456789 (EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent (or, for < and ≤, all the preceding) records in the file. (Note: in this case the data is also sorted.)

σDNUMBER > 5 (DEPARTMENT)
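A range selection such as σDNUMBER > 5 (DEPARTMENT) can be sketched with the standard `bisect` module. This is an illustration only: a sorted Python list of key values stands in for the primary index, and the department records are invented.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
departments = [{"DNUMBER": n, "DNAME": f"D{n}"} for n in (1, 3, 5, 7, 9)]

# Stand-in for the primary index: the ordered key values.
dnumber_index = [d["DNUMBER"] for d in departments]


def select_greater_than(records, index, value):
    """Locate the boundary through the index, then read all later records."""
    start = bisect_right(index, value)   # first position with key > value
    return records[start:]


result = select_greater_than(departments, dnumber_index, 5)
```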

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σDNO = 5 (EMPLOYEE)

Query Optimization: Search methods for Selection
Conjunctive selection: a conjunctive selection is of the following form:

σθ1 ∧ θ2 ∧ ... ∧ θn (r)

Disjunctive selection: a disjunctive selection is of the following form:

σθ1 ∨ θ2 ∨ ... ∨ θn (r)

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to each retrieved record.
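The conjunctive strategy can be sketched as follows. The example is hypothetical: a dictionary from attribute value to record positions plays the role of the access path on dno, and the salary condition is the remaining condition checked on each retrieved record.

```python
from collections import defaultdict

employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 5, "salary": 45000},
    {"ssn": 3, "dno": 4, "salary": 50000},
    {"ssn": 4, "dno": 5, "salary": 60000},
]

# Access path on dno: value -> positions of the matching records.
dno_index = defaultdict(list)
for pos, emp in enumerate(employees):
    dno_index[emp["dno"]].append(pos)


def conjunctive_select(records, index, indexed_value, remaining_conditions):
    """Use the index for one condition, then filter by the other conditions."""
    candidates = (records[pos] for pos in index.get(indexed_value, []))
    return [rec for rec in candidates
            if all(cond(rec) for cond in remaining_conditions)]


# dno = 5 AND salary > 40000
result = conjunctive_select(employees, dno_index, 5,
                            [lambda e: e["salary"] > 40000])
```

Only the three records with dno = 5 are fetched; the record of department 4 is never inspected.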

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
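The union-of-pointers idea can be sketched as follows. This is an illustration under simple assumptions: record pointers are list positions, each index is a dictionary from value to positions, and the attribute names and data are invented.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each attribute value to the positions (pointers) that hold it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return index


def disjunctive_select(records, lookups):
    """lookups: one (index, value) pair per simple condition."""
    pointers = set()
    for index, value in lookups:
        pointers.update(index.get(value, []))   # union of record pointers
    return [records[pos] for pos in sorted(pointers)]


employees = [
    {"ssn": 1, "dno": 5, "job": "clerk"},
    {"ssn": 2, "dno": 4, "job": "engineer"},
    {"ssn": 3, "dno": 4, "job": "clerk"},
    {"ssn": 4, "dno": 1, "job": "manager"},
]
dno_index = build_index(employees, "dno")
job_index = build_index(employees, "job")

# dno = 5 OR job = 'clerk'
matches = disjunctive_select(employees, [(dno_index, 5), (job_index, "clerk")])
```

The record with ssn 1 satisfies both conditions but appears once, because the pointers are merged in a set before any record is fetched.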

Query Optimization: JOIN Operation
Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

R ⋈A=B S

Query Optimization: JOIN Operation (nested loop)
Suppose we want to perform

r ⋈r.A Θ s.B s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end
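The nested loop algorithm translates almost line by line into Python. In this sketch relations are lists of dictionaries, Θ is passed in as a comparison function (equality by default), and the sample data is invented.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a theta s.b by testing every pair of tuples."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # tr || ts
    return result


r = [{"A": 1}, {"A": 2}]
s = [{"B": 2}, {"B": 3}]

equi = nested_loop_join(r, s, "A", "B")                   # r.A = s.B
less = nested_loop_join(r, s, "A", "B", operator.lt)      # r.A < s.B
```

Because every pair is tested, the same function handles any comparison operator, at the price of nr × ns comparisons.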

Query Optimization: JOIN Operation (nested loop)
The nested loop algorithm examines nr × ns pairs of tuples.
In the best case, both relations fit into main memory, and hence we need only bs + br block accesses.
If just one of the relations fits in main memory, the cost is still bs + br block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)
If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If r.A Θ s.B is true then add tr || ts to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
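The block nested loop and its cost can be sketched as follows. This is an in-memory simulation: a "block" is a small Python list, the block size of 2 is arbitrary, and a counter stands in for actual disk accesses.

```python
import operator


def to_blocks(records, block_size):
    """Split a relation into blocks of at most block_size records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, a, b, theta=operator.eq):
    """Return the join result and the number of simulated block accesses."""
    result, block_accesses = [], 0
    for block_r in r_blocks:
        block_accesses += 1                  # read one block of r
        for block_s in s_blocks:
            block_accesses += 1              # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append({**tr, **ts})
    return result, block_accesses


r_blocks = to_blocks([{"A": i} for i in range(6)], 2)        # br = 3
s_blocks = to_blocks([{"B": i} for i in (1, 3, 5, 7)], 2)    # bs = 2

joined, accesses = block_nested_loop_join(r_blocks, s_blocks, "A", "B")
```

Here the counter reaches br × bs + br = 3 × 2 + 3 = 9, whereas reading s once per tuple of r would cost nr × bs + br = 6 × 2 + 3 = 15.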

Query Optimization: JOIN Operation
Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈A=B s
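A minimal sketch of this join, assuming a hash-style access structure on attribute B of s. The dictionary index, the relation contents, and the attribute names are illustrative only.

```python
from collections import defaultdict


def build_hash_index(records, attr):
    """Access structure on attr: value -> list of records with that value."""
    index = defaultdict(list)
    for rec in records:
        index[rec[attr]].append(rec)
    return index


def index_join(r, s_index, a):
    """For each tr in r, probe the index on s.B instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):     # only the matching records of s
            result.append({**tr, **ts})
    return result


r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 3, "x": "c"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 4, "y": "r"}]

s_index = build_hash_index(s, "B")
joined = index_join(r, s_index, "A")
```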

Query Optimization: JOIN Operation
Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
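The merge step can be sketched as follows for an equality join. The sketch assumes both inputs are already sorted on their join attributes and, as stated above, that each group of equal join values fits in memory; the sample relations are invented.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on the join attributes a and b."""
    result = []
    i = j = 0                                 # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple of the group with every s tuple of the group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result


r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 2, "x": "c"}, {"A": 4, "x": "d"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]

joined = merge_join(r, s, "A", "B")
```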

Query Optimization: JOIN Operation
Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
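A minimal in-memory sketch of the hash-join. The number of buckets, the use of Python's built-in `hash`, and the sample relations are assumptions made for illustration; a real system would hash to buckets of a hash file on disk.

```python
def hash_join(r, s, a, b, num_buckets=4):
    """Equi-join: hash both relations into the same buckets, then match."""
    # Each bucket holds the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(num_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # same bucket, same value?
                    result.append({**tr, **ts})
    return result


r = [{"A": 1}, {"A": 2}, {"A": 5}, {"A": 6}]
s = [{"B": 2, "y": "p"}, {"B": 6, "y": "q"}, {"B": 6, "y": "r"}, {"B": 9, "y": "t"}]

joined = hash_join(r, s, "A", "B")
```

The equality test inside a bucket is still needed, since different join values can hash to the same bucket.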

Query Optimization: Complex JOIN Operation
A nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than the nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈θ1 ∧ θ2 ∧ ... ∧ θn s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈θ1 s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈θ1 ∨ θ2 ∨ ... ∨ θn s

The join can be performed as the union of the tuples in the individual joins r ⋈θi s.

Query Optimization: Project Operation
A project operation Π<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
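Both duplicate-elimination strategies can be sketched in a few lines. The example is illustrative: projected rows are represented as tuples, a Python set plays the role of the hash table, and the employee data is invented.

```python
from itertools import groupby


def project_with_hashing(records, attrs):
    """Drop duplicates by remembering the rows already produced."""
    seen, result = set(), []
    for rec in records:
        row = tuple(rec[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def project_with_sorting(records, attrs):
    """Sort the projected rows, then keep one row per run of equal rows."""
    rows = sorted(tuple(rec[a] for a in attrs) for rec in records)
    return [row for row, _ in groupby(rows)]


employees = [
    {"ssn": 1, "dno": 5},
    {"ssn": 2, "dno": 4},
    {"ssn": 3, "dno": 5},
]

by_hash = project_with_hashing(employees, ["dno"])
by_sort = project_with_sorting(employees, ["dno"])
```

Projecting on dno, which is not a key, yields two distinct rows either way; the sorted variant returns them in key order.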

Query Optimization: Set Operations
The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

The above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (σPlocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn (Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree for the Cartesian-product form of the query. ΠPnumber, Dnum, Lname, Address, Bdate is at the root; below it is σPlocation = 'California' AND Dnum = Dnumber AND Mgr_ssn = Ssn; below that is the Cartesian product (×) of Project with the Cartesian product (×) of Department and Employee.]

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
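The size of the intermediate relation can be checked with a few lines of arithmetic, using the tuple counts and tuple sizes stated in the example. The variable names are illustrative.

```python
# relation name -> (number of tuples, bytes per tuple), as given in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_tuple_bytes = 0
for num_tuples, tuple_bytes in relations.values():
    product_tuples *= num_tuples          # cardinalities multiply
    product_tuple_bytes += tuple_bytes    # tuple widths add up

total_bytes = product_tuples * product_tuple_bytes
```

The product holds 10,000,000 tuples of 300 bytes each, i.e. 3,000,000,000 bytes of intermediate data before the selection is applied.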

Query Optimization: An Example
However, based on the schemas of the relations, the above query can be translated into

ΠPnumber, Dnum, Lname, Address, Bdate (((σPlocation = 'California' (Project)) ⋈Dnum = Dnumber Department) ⋈Mgr_ssn = Ssn Employee)

[Figure: query tree for this expression. ΠPnumber, Dnum, Lname, Address, Bdate is at the root; below it is the join ⋈Mgr_ssn = Ssn with Employee; below that is the join ⋈Dnum = Dnumber of σPlocation = 'California' (Project) with Department.]




Optimization Process
Rule 8: The selection operation distributes over the theta join under the following condition:
When all attributes in the selection condition θ0 involve only the attributes of one relation (E1 in this case):

σ_θ0 (E1 ⋈_θ E2) = (σ_θ0 (E1)) ⋈_θ E2

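Optimization Process
Rule 8

σ_θ0 (E1 ⋈_θ E2) = (σ_θ0 (E1)) ⋈_θ E2

[Figure: query trees for Rule 8. Left: σ_θ0 applied above the join ⋈_θ of E1 and E2. Right: σ_θ0 applied to E1, below the join ⋈_θ with E2.]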
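Rule 8 can be checked on a toy instance. The sketch below is only an illustration: the relations, the attribute names a, b, c, and the helper functions select and theta_join are assumptions made for this example, not part of the slides.

```python
# Toy relations as lists of dicts (hypothetical data and attribute names).
E1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
E2 = [{"c": 1}, {"c": 2}, {"c": 4}]


def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def theta_join(rel1, rel2, theta):
    """Theta join: concatenate every pair of tuples that satisfies theta."""
    return [{**t1, **t2} for t1 in rel1 for t2 in rel2 if theta(t1, t2)]


def theta(t1, t2):
    """Join condition: E1.a = E2.c."""
    return t1["a"] == t2["c"]


def theta0(t):
    """Selection condition that uses only an attribute of E1."""
    return t["b"] >= 20


lhs = select(theta0, theta_join(E1, E2, theta))   # selection after the join
rhs = theta_join(select(theta0, E1), E2, theta)   # selection pushed below the join
print(lhs == rhs)
```

Pushing the selection down means the join sees two tuples of E1 instead of three, which is the point of the rule.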

Optimization Process
Rule 9: The projection operation distributes over the theta join under the following condition:
The join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of E1 and E2, respectively:

Π_(L1 ∪ L2) (E1 ⋈_θ E2) = (Π_L1 (E1)) ⋈_θ (Π_L2 (E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative:

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative:

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σ_p (E1 − E2) = σ_p (E1) − σ_p (E2)
σ_p (E1 − E2) = σ_p (E1) − E2

Set union:
σ_p (E1 ∪ E2) = σ_p (E1) ∪ σ_p (E2)
σ_p (E1 ∪ E2) ≠ σ_p (E1) ∪ E2

Set intersection:
σ_p (E1 ∩ E2) = σ_p (E1) ∩ σ_p (E2)
σ_p (E1 ∩ E2) = σ_p (E1) ∩ E2
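The Rule 12 identities can be spot-checked with Python sets. This is a sketch on made-up data: the sets E1 and E2, the predicate p, and the helper sigma are assumptions for illustration only.

```python
# Hypothetical single-attribute relations, modelled as sets of integers.
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}


def p(x):
    """Selection predicate: keep even values."""
    return x % 2 == 0


def sigma(pred, rel):
    """Selection over a set-valued relation."""
    return {t for t in rel if pred(t)}


# Set difference: both forms hold.
diff_both = sigma(p, E1 - E2) == sigma(p, E1) - sigma(p, E2)
diff_left_only = sigma(p, E1 - E2) == sigma(p, E1) - E2

# Set intersection: both forms hold.
inter_both = sigma(p, E1 & E2) == sigma(p, E1) & sigma(p, E2)
inter_left_only = sigma(p, E1 & E2) == sigma(p, E1) & E2

# Set union: the selection must be applied to both operands.
union_both = sigma(p, E1 | E2) == sigma(p, E1) | sigma(p, E2)
union_left_only = sigma(p, E1 | E2) == sigma(p, E1) | E2

print(diff_both, diff_left_only, inter_both, inter_left_only, union_both, union_left_only)
```

The union case fails with a one-sided selection because E2 contributes tuples (here 5 and 7) that do not satisfy p.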

Optimization Process
Rule 13: The projection operation distributes over the set union operation:

Π_L (E1 ∪ E2) = Π_L (E1) ∪ Π_L (E2)

Note: the corresponding identities for set intersection and set difference do not hold in general, because two distinct tuples can agree on the attributes in L. For example, Π_L (E1 ∩ E2) can be empty while Π_L (E1) ∩ Π_L (E2) is not.

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:
existence of indexes or other access paths (to reduce I/O cost), and
physical clustering of records (to reduce I/O cost), …

Optimization Process
So, in short, after scanning and parsing:
the query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph;
an execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition
Materialized evaluation: Generation of an intermediate result (relation).
Pipeline evaluation: Combining several operations, so that the output of one operation is passed directly to the next.

Assume we want to perform

Π_(a1, a2) (r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
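The two evaluation styles can be sketched with Python generators. This is an illustration under assumptions: the relations r and s, the join attributes k and k2, and the helper names are hypothetical, and the projection keeps duplicates (bag semantics) to stay short.

```python
def join(r, s, key_r, key_s):
    """Equi-join as a generator: yields each joined tuple as soon as it is found."""
    for tr in r:
        for ts in s:
            if tr[key_r] == ts[key_s]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Projection as a generator: handles one input tuple at a time."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Hypothetical relations; k and k2 play the role of the join attributes.
r = [{"a1": 1, "k": 10}, {"a1": 2, "k": 20}]
s = [{"a2": "x", "k2": 10}, {"a2": "y", "k2": 20}, {"a2": "z", "k2": 10}]

# Materialized evaluation: the whole join result is stored before projecting.
temp = list(join(r, s, "k", "k2"))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection,
# and no intermediate relation is ever stored.
pipelined = list(project(join(r, s, "k", "k2"), ["a1", "a2"]))

print(materialized == pipelined)
```

Both styles return the same tuples; the difference is that the pipelined version never holds the full join result in memory.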

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
AND R.bid = 100 AND S.rating > 5

Π_Sname (σ_(bid = 100 and rating > 5) (R ⋈_(Sid = Sid) S))

[Figure: query tree with Π_Sname at the root, above σ_(bid = 100 and rating > 5), above the join ⋈_(Sid = Sid) of R and S.]

Π_Sname ((σ_(bid = 100) R) ⋈_(Sid = Sid) (σ_(rating > 5) S))

[Figure: query tree with Π_Sname at the root, above the join ⋈_(Sid = Sid) of σ_(bid = 100) applied to R and σ_(rating > 5) applied to S.]
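The two plans for this query can be compared on a small instance. The rows below are invented for illustration, and join_on_sid is a hypothetical helper; only the schemas of R and S come from the slides.

```python
# Hypothetical instances of S and R.
S = [
    {"Sid": 1, "Sname": "Ann", "rating": 7, "age": 30.0},
    {"Sid": 2, "Sname": "Bob", "rating": 3, "age": 41.5},
    {"Sid": 3, "Sname": "Cal", "rating": 9, "age": 25.0},
]
R = [
    {"Sid": 1, "bid": 100, "day": "2024-01-05", "rname": "r1"},
    {"Sid": 2, "bid": 100, "day": "2024-01-06", "rname": "r2"},
    {"Sid": 3, "bid": 101, "day": "2024-01-07", "rname": "r3"},
]


def join_on_sid(left, right):
    """Equi-join on Sid, written as a plain nested loop."""
    return [{**l, **r} for l in left for r in right if l["Sid"] == r["Sid"]]


# Plan 1: join first, then select, then project.
joined_1 = join_on_sid(R, S)
plan_1 = [t["Sname"] for t in joined_1 if t["bid"] == 100 and t["rating"] > 5]

# Plan 2: push both selections below the join, then project.
r_sel = [t for t in R if t["bid"] == 100]
s_sel = [t for t in S if t["rating"] > 5]
joined_2 = join_on_sid(r_sel, s_sel)
plan_2 = [t["Sname"] for t in joined_2]

print(plan_1, plan_2, len(joined_1), len(joined_2))
```

On this instance both plans return the same answer, but the join in the second plan produces one tuple instead of three.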

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees from the previous slides, annotated with "On the fly" to mark operators that are evaluated in pipelined fashion.]

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection
Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
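The two search methods can be sketched as follows. The employee records and the function names are assumptions made for this example; the file is modelled as an in-memory list ordered on the key attribute Ssn.

```python
def linear_search(records, attr, value):
    """Scan every record and keep those whose attr equals value."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(sorted_records, key_attr, value):
    """Equality search on a key attribute; the file must be ordered on key_attr."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid][key_attr]
        if key == value:
            return sorted_records[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no record with this key


# Hypothetical file, ordered on the key attribute Ssn.
employees = [
    {"Ssn": 111, "Lname": "Smith", "Dno": 5},
    {"Ssn": 222, "Lname": "Wong", "Dno": 4},
    {"Ssn": 333, "Lname": "Zelaya", "Dno": 5},
    {"Ssn": 444, "Lname": "Wallace", "Dno": 1},
]

by_key = binary_search(employees, "Ssn", 333)   # ordered key: binary search applies
by_dno = linear_search(employees, "Dno", 5)     # unordered non-key: full scan
print(by_key, by_dno)
```

Binary search inspects about log2(n) records, while the linear scan inspects all n.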

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).

σ_(SSN = 123456789) (EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note: in this case, the data is also sorted.

σ_(DNUMBER > 5) (DEPARTMENT)

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_(DNO = 5) (EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:
σ_(θ1 ∧ θ2 ∧ … ∧ θn) (r)
Disjunctive selection: a disjunctive selection is of the following form:
σ_(θ1 ∨ θ2 ∨ … ∨ θn) (r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to them.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
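A minimal sketch of disjunctive selection by union of record pointers follows. It assumes that a record pointer is simply a position in a list and that each index is a dictionary from attribute value to a set of pointers; the data, the condition Dno = 5 ∨ Salary = 45000, and build_index are hypothetical.

```python
# Hypothetical file; the list position plays the role of the record pointer.
records = [
    {"Dno": 5, "Salary": 30000},   # pointer 0
    {"Dno": 4, "Salary": 45000},   # pointer 1
    {"Dno": 5, "Salary": 45000},   # pointer 2
    {"Dno": 1, "Salary": 55000},   # pointer 3
]


def build_index(records, attr):
    """Map each value of attr to the set of pointers of records holding it."""
    index = {}
    for ptr, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(ptr)
    return index


dno_index = build_index(records, "Dno")
salary_index = build_index(records, "Salary")

# Scan each index for its own condition, then take the union of the pointers.
pointers = dno_index.get(5, set()) | salary_index.get(45000, set())

# Fetch each qualifying record once, even if it satisfies both conditions.
result = [records[p] for p in sorted(pointers)]
print(sorted(pointers), result)
```

Working on pointers rather than records means a tuple that satisfies several conditions is fetched only once.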

Query Optimization: JOIN Operation

R ⋈_(A=B) S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_(r.A Θ s.B) s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = | r | and ns = | s | are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If tr.A Θ ts.B is true then add tr || ts to the result
    end
end
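The nested loop algorithm translates almost line for line into Python. In this sketch the relations are lists of dicts, the comparison Θ is passed in as a function, and the sample data and the name nested_loop_join are assumptions for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta join of r and s on r.a theta s.b using two nested loops."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):       # test the join condition
                result.append({**tr, **ts})   # tr || ts
    return result


# Hypothetical relations.
r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}]
s = [{"B": 2, "y": "u"}, {"B": 3, "y": "v"}, {"B": 2, "y": "w"}]

equi = nested_loop_join(r, s, "A", "B")               # r.A = s.B
less = nested_loop_join(r, s, "A", "B", operator.lt)  # r.A < s.B
print(len(equi), len(less))
```

Because the condition is an arbitrary function, the same loop handles equality and inequality joins alike.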

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns, since every pair of tuples is examined.
In the best-case scenario, both relations fit into the physical space (main memory), and hence we need bs + br block accesses.

If one of the relations fits in the physical space, then the cost will also be bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
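The block access count can be checked with a small simulation. This is a sketch under simplifying assumptions: a block is a fixed-size slice of a list, every block fetch counts as one access, only equality joins are handled, and the data and function names are hypothetical.

```python
def blocks(relation, block_size):
    """Split a relation into consecutive blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join on r.a = s.b that also counts simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    block_reads = 0
    result = []
    for Br in r_blocks:
        block_reads += 1                  # one read per block of r
        for Bs in s_blocks:
            block_reads += 1              # s is re-read once per block of r
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_reads


# Hypothetical relations: 6 and 8 tuples, 2 tuples per block, so br = 3 and bs = 4.
r = [{"A": i} for i in range(6)]
s = [{"B": i % 4} for i in range(8)]

result, reads = block_nested_loop_join(r, s, "A", "B", block_size=2)
print(reads, len(result))
```

With br = 3 and bs = 4 the counter reaches 3 × 4 + 3 = 15, in line with the formula br × bs + br.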

Query Optimization: JOIN Operation

r ⋈_(A=B) s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
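A sketch of this index-based join follows, with an in-memory dictionary standing in for the access structure on s.B. The function names and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def build_hash_index(s, b):
    """Access structure on s.b: maps each value to the records that hold it."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_nested_loop_join(r, s_index, a):
    """For each tr in r, probe the index on s.B instead of scanning all of s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching records of s
            result.append({**tr, **ts})
    return result


# Hypothetical relations.
r = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}]
s = [{"B": 2, "y": "u"}, {"B": 3, "y": "v"}, {"B": 2, "y": "w"}]

s_index = build_hash_index(s, "B")
joined = index_nested_loop_join(r, s_index, "A")
print(joined)
```

Each tuple of r costs one index probe rather than a full scan of s.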

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
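The merge step can be sketched as follows, assuming both inputs are already sorted on their join attributes. The function name, the sample data, and the in-memory handling of each group of equal values are assumptions for illustration.

```python
def merge_join(r, s, a, b):
    """Equi-join on r.a = s.b; r must be sorted on a and s on b."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value with the whole s group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result


# Hypothetical sorted relations with duplicate join values.
r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2, "y": "u"}, {"B": 2, "y": "v"}, {"B": 3, "y": "w"}, {"B": 4, "y": "z"}]

merged = merge_join(r, s, "A", "B")
print(len(merged))
```

Neither pointer ever moves backwards, which is why each relation is read only once.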

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
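The hash join described above can be sketched in memory. The "hash file" is modelled as a list of buckets, each holding the r records and the s records that hash there; the bucket count, the function name, and the data are assumptions for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join on r.a = s.b by hashing both relations into shared buckets."""
    # Each bucket keeps its r records and its s records separately.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                 # examine bucket by bucket
        for tr in r_part:
            for ts in s_part:
                # Sharing a bucket does not imply equal values, so re-check.
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result


# Hypothetical relations; A = 1 and B = 5 collide in a bucket without matching.
r = [{"A": 1}, {"A": 2}, {"A": 6}]
s = [{"B": 2, "y": "u"}, {"B": 5, "y": "v"}, {"B": 6, "y": "w"}]

hashed = hash_join(r, s, "A", "B")
print(hashed)
```

Records with different join values can land in the same bucket, which is why the equality test is repeated inside each bucket.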

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_(θ1 ∧ θ2 ∧ … ∧ θn) s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say

r ⋈_θ1 s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation
Now consider the following join operation:

r ⋈_(θ1 ∨ θ2 ∨ … ∨ θn) s

The join can be performed as the union of the tuples in the individual joins

r ⋈_θi s

Query Optimization: Project Operation

A project operation Π_<attribute-list> (R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
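Both duplicate-elimination strategies can be sketched in a few lines. The employee data and the function names are assumptions for illustration; a Python set plays the role of the hash table.

```python
def project_sort(relation, attrs):
    """Project, sort the result, then drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are adjacent after sorting
            result.append(row)
    return result


def project_hash(relation, attrs):
    """Project and drop duplicates with a hash table of rows already seen."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical relation; Dno is not a key, so projecting on it creates duplicates.
employee = [{"Ssn": 1, "Dno": 5}, {"Ssn": 2, "Dno": 4}, {"Ssn": 3, "Dno": 5}]

by_sort = project_sort(employee, ["Dno"])
by_hash = project_hash(employee, ["Dno"])
print(by_sort, by_hash)
```

The two versions return the same set of rows; only the output order differs (sorted order versus first-seen order).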

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement union, intersection, and difference operations.

Questions
Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_(Pnumber, Dnum, Lname, Address, Bdate) (σ_(Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN) (Project × (Department × Employee)))

Query Optimization: An Example

[Figure: query tree with Π_(Pnumber, Dnum, Lname, Address, Bdate) at the root, above σ_(Plocation = "California" and Dnum = Dnumber and MGRSSN = SSN), above the Cartesian product Project × (Department × Employee).]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
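The arithmetic behind these figures can be reproduced directly. The dictionary and variable names are illustrative; the tuple counts and tuple sizes are the ones given above.

```python
from math import prod

# Tuple counts and tuple sizes (in bytes) from the example.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple,
# so cardinalities multiply and tuple widths add.
result_tuples = prod(tuple_counts.values())       # 100 * 20 * 5000
result_tuple_bytes = sum(tuple_bytes.values())    # 100 + 50 + 150
total_bytes = result_tuples * result_tuple_bytes

print(result_tuples, result_tuple_bytes, total_bytes)
```

The intermediate relation therefore occupies about 3 GB before the selection discards most of it.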

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_(Pnumber, Dnum, Lname, Address, Bdate) (((σ_(Plocation = "California") (Project)) ⋈_(Dnum = Dnumber) (Department)) ⋈_(MGRSSN = SSN) (Employee))

Query Optimization: An Example

[Figure: query tree with Π_(Pnumber, Dnum, Lname, Address, Bdate) at the root, above the join ⋈_(MGRSSN = SSN) with Employee, whose other input is the join ⋈_(Dnum = Dnumber) of σ_(Plocation = "California") (Project) with Department.]



However the above query based on theschemas of the relations can be translatedinto

Database Systems

ΠPnumberDnumLnameAddressBdate(((σPlocation=ldquocaliforniardquo (Project)) Dnum=Dnumber (Department ) ) MNGSSN=SSN (Employee))

116

Query Optimization mdash An Example

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo

Employee

MNGSSN=SSN

Dnum=Dnumber

Department

Database Systems


Optimization Process
Rule 9: The projection operation distributes over the theta-join under the following condition:
• The join condition θ involves only attributes in L1 ∪ L2.

Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))

Optimization Process
Rule 10: Set union and set intersection operations are commutative.

(E1 ∪ E2) = (E2 ∪ E1)
(E1 ∩ E2) = (E2 ∩ E1)

Note: set difference is not commutative.

Optimization Process
Rule 11: Set union and set intersection operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Optimization Process
Rule 12: The selection operation distributes over the set union, set intersection, and set difference operations.

Set difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2

Set union:
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2

Set intersection:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
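The forms of Rule 12 can be checked on a small example. The sketch below models relations as Python sets of tuples; the relations E1 and E2, the predicate p, and the helper select are illustrative assumptions, not part of the slides.

```python
# Relations are modeled as sets of (id, balance) tuples; the data is made up.
E1 = {(1, 500), (2, 3000), (3, 1200)}
E2 = {(2, 3000), (3, 1200), (4, 100)}

def select(pred, rel):
    """sigma_pred(rel): keep the tuples that satisfy the predicate."""
    return {t for t in rel if pred(t)}

def p(t):
    """Hypothetical selection predicate: balance < 2500."""
    return t[1] < 2500

def check_rule_12(e1, e2):
    """Evaluate each form of Rule 12 on the given relations."""
    return {
        "difference": select(p, e1 - e2) == select(p, e1) - select(p, e2),
        "difference_short": select(p, e1 - e2) == select(p, e1) - e2,
        "union": select(p, e1 | e2) == select(p, e1) | select(p, e2),
        "union_short": select(p, e1 | e2) == select(p, e1) | e2,
        "intersection": select(p, e1 & e2) == select(p, e1) & select(p, e2),
        "intersection_short": select(p, e1 & e2) == select(p, e1) & e2,
    }

results = check_rule_12(E1, E2)
```

On this data every form evaluates to true except "union_short", which matches the ≠ on the union slide: leaving E2 unfiltered lets tuples that fail p into the result.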

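Optimization Process
Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

Π_L(E1 − E2) = (Π_L(E1)) − (Π_L(E2))
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)

Note: the union form always holds. The intersection and difference forms hold only when no two distinct tuples of E1 ∪ E2 agree on the attributes in L; otherwise distinct tuples can collapse to the same projected tuple.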
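Whether projection distributes over a given set operation can be tested on a two-tuple example. This is a minimal sketch; the relations, the attribute positions, and the helper project are assumptions chosen so that two distinct tuples agree on the projected attribute.

```python
def project(rel, positions):
    """Pi_L(rel) with duplicate elimination (the result is a set)."""
    return {tuple(t[i] for i in positions) for t in rel}

# Two distinct tuples that agree on the first attribute (illustrative data).
E1 = {(1, "a")}
E2 = {(1, "b")}
L = (0,)  # project on the first attribute only

union_ok = project(E1 | E2, L) == project(E1, L) | project(E2, L)
intersection_ok = project(E1 & E2, L) == project(E1, L) & project(E2, L)
difference_ok = project(E1 - E2, L) == project(E1, L) - project(E2, L)
```

Here only the union form evaluates to true. E1 ∩ E2 is empty, yet the two projections share the tuple (1,), so an optimizer can push a projection through intersection or difference only when L identifies tuples uniquely.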

Optimization Process
Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must decide how to evaluate the transformed query. At this stage, issues such as
• the existence of indexes or other access paths (to reduce I/O cost), and
• the physical clustering of records (to reduce I/O cost), …
come into play.

Optimization Process
So, in short, after scanning and parsing:
• the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
• an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process
Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process
There are two main techniques for query optimization:
• Heuristic rules
• Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules
• Perform selection operations as early as possible.
• Perform projections early.
• It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

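Definition
• Materialized evaluation: each operation generates an intermediate result (relation) that is stored before the next operation uses it.
• Pipeline evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.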

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
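The two evaluation styles can be contrasted with Python generators. This is only a sketch under simple assumptions: relations are lists of tuples, the join is an equi-join on the first attribute of each relation, and the function names are hypothetical.

```python
def join(r, s):
    """Yield tr + ts for every pair whose first attributes match."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts

def project(tuples, positions):
    """Yield the requested attribute positions of each incoming tuple."""
    for t in tuples:
        yield tuple(t[i] for i in positions)

# Illustrative data, not taken from the slides.
r = [(1, "x"), (2, "y")]
s = [(1, 10.0), (2, 20.0), (3, 30.0)]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s))
materialized = list(project(intermediate, (1, 2)))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), (1, 2)))
```

Both variants produce the same tuples; the pipelined one never holds the full intermediate relation, which is the saving the slide describes.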

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_Sname(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

            Π_Sname
               |
  σ_{bid = 100 ∧ rating > 5}
               |
         ⋈_{Sid = Sid}
          /         \
         R           S

Π_Sname((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))

              Π_Sname
                 |
           ⋈_{Sid = Sid}
            /         \
  σ_{bid = 100}   σ_{rating > 5}
        |               |
        R               S
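The effect of moving the selections below the join can be illustrated by counting join comparisons for both expressions. The sample tuples and the two plan functions below are assumptions made for illustration; only the schemas of R and S come from the slides.

```python
# S(Sid, Sname, rating, age) and R(Sid, bid, day, rname); made-up tuples.
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "r1"), (2, 100, "d2", "r2"), (3, 101, "d3", "r3")]

def plan_select_late(R, S):
    """Join first, then apply bid = 100 AND rating > 5."""
    comparisons = 0
    joined = []
    for r in R:
        for s in S:
            comparisons += 1
            if r[0] == s[0]:
                joined.append(r + s)  # positions 0-3 from R, 4-7 from S
    names = [t[5] for t in joined if t[1] == 100 and t[6] > 5]
    return names, comparisons

def plan_select_early(R, S):
    """Apply each selection to its own relation, then join."""
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    comparisons = 0
    names = []
    for r in r_sel:
        for s in s_sel:
            comparisons += 1
            if r[0] == s[0]:
                names.append(s[1])
    return names, comparisons
```

On this data both plans return the same names, but the early-selection plan makes 4 join comparisons instead of 9, because each input has already been reduced.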

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

            Π_Sname                      (on the fly)
               |
  σ_{bid = 100 ∧ rating > 5}             (on the fly)
               |
         ⋈_{Sid = Sid}
          /         \
         R           S

              Π_Sname                    (on the fly)
                 |
           ⋈_{Sid = Sid}
            /         \
  σ_{bid = 100}   σ_{rating > 5}
        |               |
        R               S

Cost of Plan
The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection
General philosophy: make an effort to reduce the search space.

• Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
• Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
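The difference between the two search methods shows up in the number of records probed. The sketch below assumes an in-memory list of records sorted on a key attribute; the function names and the probe counters are illustrative additions.

```python
def linear_search(records, key_of, value):
    """Scan every record until the key matches; return (record, probes)."""
    probes = 0
    for rec in records:
        probes += 1
        if key_of(rec) == value:
            return rec, probes
    return None, probes

def binary_search(sorted_records, key_of, value):
    """Equality search on the ordering key; return (record, probes)."""
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        k = key_of(sorted_records[mid])
        if k == value:
            return sorted_records[mid], probes
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# 1000 made-up records ordered on their key (the first field).
records = [(i * 10, f"n{i}") for i in range(1000)]
key = lambda rec: rec[0]
```

For the key 7770, the linear scan probes 778 records, while the binary search needs at most 10 probes on 1000 records. On disk the same ratio applies to block accesses rather than individual records.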

Optimization Process: Search methods for Selection
Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection
Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note: in this case, the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)
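The range selection above can be sketched with the standard bisect module. The DEPARTMENT tuples are made up, and the sorted list of key values only stands in for a primary index; a real index would be a disk-based structure.

```python
from bisect import bisect_right

# DEPARTMENT(DNUMBER, DNAME), stored in DNUMBER order (illustrative data).
DEPARTMENT = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
DNUMBERS = [d[0] for d in DEPARTMENT]  # stand-in for the primary index

def dnumber_greater_than(v):
    """sigma_{DNUMBER > v}(DEPARTMENT): locate v, then read what follows."""
    start = bisect_right(DNUMBERS, v)  # first position with DNUMBER > v
    return DEPARTMENT[start:]
```

Because the file is ordered on DNUMBER, everything after the located position qualifies, so no record before it has to be read.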

Query Optimization: Search methods for Selection
Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection
• Conjunctive selection: a conjunctive selection is of the following form:
  σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
• Disjunctive selection: a disjunctive selection is of the following form:
  σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection
Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection
Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
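The union-of-pointers idea can be sketched with dictionary indexes whose entries are record positions. The EMPLOYEE tuples, the field positions, and the helper names are hypothetical; list positions play the role of record pointers.

```python
# EMPLOYEE(name, DNO, salary); the list position acts as the record pointer.
EMPLOYEE = [
    ("Ann", 5, 30000),
    ("Bob", 4, 42000),
    ("Cy", 5, 55000),
    ("Di", 1, 42000),
]

def build_index(records, field):
    """Map each value of the given field to the pointers of matching records."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[field], []).append(rid)
    return index

dno_index = build_index(EMPLOYEE, 1)
salary_index = build_index(EMPLOYEE, 2)

def disjunctive_select(conditions):
    """conditions: (index, value) equality tests combined with OR."""
    pointers = set()
    for index, value in conditions:
        pointers |= set(index.get(value, []))  # union of record pointers
    return [EMPLOYEE[rid] for rid in sorted(pointers)]

# sigma_{DNO = 5 OR salary = 55000}(EMPLOYEE)
result = disjunctive_select([(dno_index, 5), (salary_index, 55000)])
```

The record for "Cy" satisfies both conditions but is fetched once, because the union is taken over pointers before any record is read.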

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the number of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each tr ∈ r do begin
    for each ts ∈ s do begin
        if tr.A Θ ts.B is true then add tr || ts to the result
    end
end
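The nested loop algorithm translates almost line for line into Python. In this sketch relations are lists of tuples, a and b are the positions of the join attributes, and theta is any binary comparison; these parameter names are assumptions made for the example.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return tr + ts for every pair (tr, ts) with theta(tr[a], ts[b]) true."""
    result = []
    for tr in r:          # outer loop over r
        for ts in s:      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)
    return result

# Illustrative data.
r = [(1, "x"), (2, "y")]
s = [(2, "p"), (3, "q")]

equi = nested_loop_join(r, s, 0, 0)                      # r.A = s.B
less = nested_loop_join(r, s, 0, 0, theta=operator.lt)   # r.A < s.B
```

Because the comparison is an arbitrary function, the same loop handles equality and inequality conditions alike, at the price of examining every pair of tuples.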

Query Optimization: JOIN Operation (nested loop)

• The cost of the nested loop algorithm is nr × ns, since every pair of tuples is examined.
• In the best-case scenario, both relations fit into main memory, and hence we need bs + br block accesses.
• If one of the relations fits in main memory, the cost will also be bs + br block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

for each block Br of r do begin
    for each block Bs of s do begin
        for each tr ∈ Br do begin
            for each ts ∈ Bs do begin
                if tr.A Θ ts.B is true then add tr || ts to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?
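The cost expressions for the nested loop variants can be compared with a few lines of arithmetic. The relation sizes used below are assumed values chosen only for illustration.

```python
def nested_loop_pairs(n_r, n_s):
    """Tuple pairs examined by the plain nested loop: nr * ns."""
    return n_r * n_s

def block_nested_loop_cost(b_r, b_s):
    """Block accesses with r as the outer relation: br * bs + br."""
    return b_r * b_s + b_r

def in_memory_cost(b_r, b_s):
    """Block accesses when a relation fits in memory: bs + br."""
    return b_r + b_s

# Assumed sizes: r has 5000 tuples in 100 blocks, s has 10000 tuples in 400 blocks.
n_r, b_r = 5000, 100
n_s, b_s = 10000, 400

pairs = nested_loop_pairs(n_r, n_s)                 # 50,000,000 pairs
r_outer = block_nested_loop_cost(b_r, b_s)          # 40,100 block accesses
s_outer = block_nested_loop_cost(b_s, b_r)          # 40,400 block accesses
best_case = in_memory_cost(b_r, b_s)                # 500 block accesses
```

One answer to the closing question follows directly from the formula: since the product term is symmetric, choosing the relation with fewer blocks as the outer relation gives the smaller total.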

Query Optimization: JOIN Operation

r ⋈_{A=B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
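A join that probes an access structure on s can be sketched with an in-memory dictionary standing in for the index on B. The function names and the sample data are assumptions; a real system would probe a disk-based index or hash file.

```python
from collections import defaultdict

def build_hash_index(s, b):
    """Access structure on attribute position b of s: value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index with tr[a] instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result

# Illustrative data.
r = [(1, "x"), (2, "y"), (5, "z")]
s = [(2, "p"), (2, "q"), (3, "w")]
s_index = build_hash_index(s, 0)
joined = index_join(r, s_index, 0)
```

Each tuple of r now costs one index probe rather than a full pass over s, which is where the saving over the nested loop comes from.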

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)
One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

bs + br

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
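The merge step can be sketched as follows, assuming both inputs are lists already sorted on the join attribute and that each group of equal-valued tuples of s fits in memory, as the slide requires. The function and parameter names are illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                      # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                      # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every matching r-tuple with the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

# Illustrative sorted inputs with duplicate join values.
r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group boundary, so each input is read once, which is the bs + br figure given above.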

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
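The hash-join description maps onto a short in-memory sketch. The bucket count, the use of Python's built-in hash, and the function name are assumptions; a real system would write the buckets to a hash file on disk.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r and s on positions a and b by hashing both into shared buckets."""
    # Each bucket keeps the r-tuples and the s-tuples that hashed to it.
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        # Only tuples in the same bucket can have equal join values,
        # but different values may collide, so the values are compared.
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

# Illustrative data; keys 1 and 9 collide when n_buckets is 8.
r = [(1, "x"), (9, "y"), (2, "z")]
s = [(9, "p"), (2, "q"), (3, "w")]
hashed = hash_join(r, s, 0, 0)
```

The explicit equality test inside each bucket matters: here the keys 1 and 9 land in the same bucket, and without the test they would be joined incorrectly.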

Query Optimization: Complex JOIN Operation

• Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
• Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on the individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
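Both strategies for complex join conditions can be sketched over sets of tuples. The helper names, the conditions, and the data are illustrative; each condition is modeled as a function of a pair (tr, ts), and the simple join is done by brute force where a real system would pick one of the faster techniques.

```python
def join_pairs(r, s, theta):
    """All (tr, ts) pairs satisfying one simple condition theta."""
    return {(tr, ts) for tr in r for ts in s if theta(tr, ts)}

def conjunctive_join(r, s, thetas):
    """Compute the join on thetas[0], then filter by the remaining conditions."""
    pairs = join_pairs(r, s, thetas[0])
    return {tr + ts for tr, ts in pairs if all(th(tr, ts) for th in thetas[1:])}

def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition."""
    pairs = set()
    for th in thetas:
        pairs |= join_pairs(r, s, th)
    return {tr + ts for tr, ts in pairs}

# Illustrative relations and conditions.
r = {(1, 10), (2, 20)}
s = {(1, 15), (3, 25)}
theta1 = lambda tr, ts: tr[0] == ts[0]   # equality on the first attribute
theta2 = lambda tr, ts: tr[1] < ts[1]    # inequality on the second attribute

both = conjunctive_join(r, s, [theta1, theta2])
either = disjunctive_join(r, s, [theta1, theta2])
```

Taking the union over pairs rather than over a list means a pair that satisfies several conditions still appears once in the result.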

Query Optimization: Project Operation

• A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
• If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
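The two duplicate-elimination strategies for projection can be sketched side by side. The function names and the EMP tuples are assumptions made for the example; positions identify the attributes in the attribute list.

```python
def project_sort(rel, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in positions) for t in rel)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(rel, positions):
    """Project and use a hash set to skip tuples that were already produced."""
    seen = set()
    result = []
    for t in rel:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result

# EMP(name, DNO); projecting on DNO alone does not include a key.
EMP = [("Ann", 5), ("Bob", 4), ("Cy", 5)]
```

Both functions return the same set of tuples; the sort-based one returns them in sorted order, while the hash-based one keeps the order of first appearance and needs only one pass.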

Query Optimization: Set Operations

• Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
• The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
• Hashing is another way to implement the union, intersection, and difference operations.

Questions
• Devise algorithms to perform variations of outer join operations.
• Devise algorithms to perform aggregate operations.

Query Optimization: An Example
Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example
SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

  Π_{Pnumber, Dnum, Lname, Address, Bdate}
                    |
  σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}
                    |
                    ×
                  /   \
           Project     ×
                     /   \
           Department     Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
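The figures in this example follow from multiplying the cardinalities and adding the tuple sizes. The short calculation below reproduces them; the dictionary layout and the function name are illustrative choices.

```python
# (number of tuples, tuple size in bytes), as given in the example.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

def cartesian_product_size(rels):
    """Return (tuple count, tuple width in bytes, total bytes) of the product."""
    tuples, width = 1, 0
    for n, size in rels.values():
        tuples *= n      # cardinalities multiply
        width += size    # tuple sizes add up
    return tuples, width, tuples * width

count, width, total_bytes = cartesian_product_size(relations)
```

The product has 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, i.e. 3,000,000,000 bytes of intermediate data before the selection discards most of it.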

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{MGRSSN=SSN} (Employee))

Query Optimization: An Example

       Π_{Pnumber, Dnum, Lname, Address, Bdate}
                         |
                  ⋈_{MGRSSN=SSN}
                   /          \
        ⋈_{Dnum=Dnumber}       Employee
          /           \
  σ_{Plocation='California'}   Department
          |
       Project


64

Optimization ProcessRule 10 Set union and set intersection

operations are commutative

Note set difference is not commutative

(E1 cup E2) = (E2 cup E1)(E1 cap E2) = (E2 cap E1)

Database Systems

65

Optimization ProcessRule 11 Set union and set intersection

operations are associative(E1 cup E2) cup E3 = E1 cup (E2 cup E3)

(E1 cap E2) cap E3 = E1 cap (E2 cap E3)

Database Systems

66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.
Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
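A minimal sketch of the merge step, assuming both inputs are lists of tuples already sorted on the join attribute. The positions `a` and `b` and the sample data are illustrative assumptions.

```python
def merge_join(r, s, a, b):
    """Equi-join r and s, where r is sorted on position a and s on position b."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple carrying the value with that group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (3, "c"), (3, "d"), (5, "e")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
```

Both pointers only move forward, which is why each relation is read once.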

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
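The hash-join idea can be sketched as follows. The number of buckets, the use of Python's built-in `hash`, and the data are assumptions for illustration; each bucket keeps the r records and the s records side by side.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r[a] = s[b] by hashing both into the same buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                    # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                    # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets:                  # examine each bucket
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                  # same bucket does not imply same value
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

Records with equal join values always land in the same bucket, so comparisons are needed only within a bucket.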

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
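Both strategies can be sketched on a toy example. The conditions `theta1` and `theta2` and the relations are hypothetical; the generic `join` helper uses a nested loop only to keep the sketch short.

```python
def join(r, s, theta):
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]


r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 15), (2, 5), (4, 30)]
theta1 = lambda tr, ts: tr[0] == ts[0]     # simple equality condition
theta2 = lambda tr, ts: tr[1] < ts[1]      # second condition

# Conjunctive: compute the simpler join on theta1 first, then keep the
# intermediate tuples that also satisfy the remaining condition.
intermediate = join(r, s, theta1)
conjunctive = [t for t in intermediate if theta2(t[:2], t[2:])]

# Disjunctive: union of the individual joins (a set removes duplicates).
disjunctive = set(join(r, s, theta1)) | set(join(r, s, theta2))
```

In the conjunctive case the first join could use any of the faster techniques, since θ1 alone is a simple condition.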

Query Optimization: Project Operation

A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
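The two duplicate-elimination options can be sketched as follows. Attribute positions and the sample relation are assumed for illustration, and a Python set stands in for the hashing technique.

```python
def project_hash(relation, positions):
    """Project onto the given positions; drop duplicates with a hash set."""
    seen, result = set(), []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


def project_sort(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[i] for i in positions) for t in relation)
    result = []
    for p in projected:
        if not result or result[-1] != p:
            result.append(p)
    return result


# (Ssn, Lname, Dno); Ssn is the key in this illustrative relation.
employee = [(1, "Smith", 5), (2, "Wong", 5), (3, "Smith", 4)]
names = project_hash(employee, [1])
dnos = project_sort(employee, [2])
with_key = project_hash(employee, [0, 1])
```

When the attribute list includes the key (`with_key`), no two projected tuples can coincide, so the duplicate check never removes anything.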

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate} (σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN} (Project × (Department × Employee)))

Query Optimization: An Example

Query tree for the above expression:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}
    ×
      Project
      ×
        Department
        Employee

Query Optimization: An Example

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate} (((σ_{Plocation='California'} (Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{MGRSSN=SSN} (Employee))

Query tree for this expression:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
  ⋈_{MGRSSN=SSN}
    ⋈_{Dnum=Dnumber}
      σ_{Plocation='California'}
        Project
      Department
    Employee


Optimization Process

Rule 11: Set union and set intersection operations are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 12: Selection operation distributes over the set union, set intersection, and set difference operations.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
σ_p(E1 − E2) = σ_p(E1) − E2
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
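Rule 12 can be checked on a small example with Python sets. The relations E1 and E2 (sets of integers) and the predicate p are arbitrary choices for illustration; this checks one instance and is not a proof.

```python
def select(p, relation):
    """σ_p: keep the elements of the relation that satisfy predicate p."""
    return {t for t in relation if p(t)}


E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0

# Set difference: both forms hold.
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2) == select(p, E1) - E2
# Set union: selection must be applied to both operands.
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
union_one_sided = select(p, E1 | E2) == select(p, E1) | E2
# Set intersection: both forms hold.
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2) == select(p, E1) & E2
```

The one-sided union form fails here because E2 contributes elements (5 and 7) that do not satisfy p.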

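Optimization Process

Rule 13: Projection operation distributes over the set union operation.
Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
The corresponding identities for set intersection and set difference,
Π_L(E1 ∩ E2) = Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) = Π_L(E1) − Π_L(E2)
do not hold in general. For example, with E1 = {(1, a)}, E2 = {(1, b)}, and L the first attribute, Π_L(E1 ∩ E2) is empty while Π_L(E1) ∩ Π_L(E2) = {(1)}.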

Optimization Process

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the existence of indexes or other access paths and the physical clustering of records (both of which reduce I/O cost), … come into play.

So, in short, after scanning and parsing:
the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach
In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder operations in a query for execution.

Definition

Materialized evaluation: Generation of an intermediate result (relation).
Pipeline evaluation: Combining several operations, so that the output of one operation is passed to the next without storing an intermediate relation.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
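The difference between the two evaluation styles can be sketched with Python generators. The relations, the shared join attribute `k`, and the function names are assumptions for illustration.

```python
def join(r, s, attr):
    """Yield one joined tuple at a time (tuples are dicts sharing attr)."""
    for tr in r:
        for ts in s:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Consume tuples one at a time and yield their projection."""
    for t in tuples:
        yield {a: t[a] for a in attrs}


r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 3, "a2": 30}]

# Materialized evaluation: the join result is stored before projecting.
intermediate = list(join(r, s, "k"))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s, "k"), ["a1", "a2"]))
```

Both styles produce the same answer; the pipelined version simply never holds the whole join result in a temporary relation.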

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

One plan applies the selections after the join:

Π_{Sname} (σ_{bid = 100 ∧ rating > 5} (R ⋈_{Sid=Sid} S))

Π_{Sname}
  σ_{bid = 100 ∧ rating > 5}
    ⋈_{Sid = Sid}
      R
      S

An alternative plan applies the selections before the join:

Π_{Sname} ((σ_{bid = 100} R) ⋈_{Sid=Sid} (σ_{rating > 5} S))

Π_{Sname}
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S
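The effect of performing the selections early can be illustrated by counting the tuple pairs the join has to examine under each plan. The sample tuples are invented, and a plain nested loop is assumed as the join method in both cases.

```python
S = [(1, "Ann", 7), (2, "Bob", 3), (3, "Cy", 9)]      # (Sid, Sname, rating)
R = [(1, 100), (2, 100), (3, 101), (1, 102)]          # (Sid, bid)


def plan_late_selection():
    """Join first, then apply bid = 100 and rating > 5."""
    pairs, names = 0, []
    for r in R:
        for s in S:
            pairs += 1
            if r[0] == s[0] and r[1] == 100 and s[2] > 5:
                names.append(s[1])
    return names, pairs


def plan_early_selection():
    """Apply the selections first, then join the reduced relations."""
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    pairs, names = 0, []
    for r in r_sel:
        for s in s_sel:
            pairs += 1
            if r[0] == s[0]:
                names.append(s[1])
    return names, pairs


late = plan_late_selection()
early = plan_early_selection()
```

On this data both plans return the same name, but the early-selection plan examines 4 pairs instead of 12.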

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

First plan:

Π_{Sname}  (on the fly)
  σ_{bid = 100 ∧ rating > 5}  (on the fly)
    ⋈_{Sid = Sid}
      R
      S

Second plan:

Π_{Sname}  (on the fly)
  ⋈_{Sid = Sid}
    σ_{bid = 100}
      R
    σ_{rating > 5}
      S

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.
Factors such as the size of the relation(s), underlying architecture, buffer size, size of the memory, "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
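The two basic search methods can be sketched as follows. The file is modeled as a Python list of records (dicts), and the attribute names and data are assumptions for illustration.

```python
def linear_search(records, attr, value):
    """Scan every record; works for any attribute and any file order."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records, key_attr, value):
    """Equality lookup on a key attribute; records must be ordered on it."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = records[mid][key_attr]
        if key == value:
            return records[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None                      # no record with this key value


# Illustrative file, ordered on the key attribute "ssn".
employees = [
    {"ssn": 111, "name": "A"},
    {"ssn": 222, "name": "B"},
    {"ssn": 333, "name": "C"},
    {"ssn": 444, "name": "D"},
]
by_name = linear_search(employees, "name", "C")
by_ssn = binary_search(employees, "ssn", 333)
missing = binary_search(employees, "ssn", 250)
```

Linear search touches every record, while binary search halves the remaining range at each step, provided the ordering assumption holds.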

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent (or, for < and ≤, preceding) records in the file (note that in this case the data is also sorted).

σ_{DNUMBER > 5}(DEPARTMENT)
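A range selection such as σ_{DNUMBER > 5}(DEPARTMENT) can be sketched with the standard `bisect` module. The sorted list of key values stands in for the primary index, and the data is invented.

```python
from bisect import bisect_right


def select_greater_than(sorted_records, key_attr, value):
    """Return all records with key > value from a file ordered on key_attr."""
    keys = [rec[key_attr] for rec in sorted_records]   # stand-in for the primary index
    start = bisect_right(keys, value)                  # first position with key > value
    return sorted_records[start:]                      # all subsequent records qualify


departments = [{"dnumber": n} for n in (1, 4, 5, 7, 9)]
result = select_greater_than(departments, "dnumber", 5)
```

Because the file is sorted on the key, one positioning step is enough; everything after that position satisfies the condition.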

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Conjunctive selection: A conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)
Disjunctive selection: A disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions.

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.
The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.
Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
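Both selection strategies can be sketched with dictionary-based indexes that map attribute values to record positions (the "pointers"). The relation, the attribute names, and the `build_index` helper are hypothetical.

```python
def build_index(records, attr):
    """Map each value of attr to the list of record positions holding it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[attr], []).append(pos)
    return index


employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 4, "salary": 50000},
    {"ssn": 3, "dno": 5, "salary": 60000},
    {"ssn": 4, "dno": 1, "salary": 60000},
]
dno_index = build_index(employees, "dno")
salary_index = build_index(employees, "salary")

# Conjunctive (dno = 5 AND salary > 40000): use the index for one condition,
# then apply the remaining condition to the retrieved records only.
conj = [employees[p] for p in dno_index.get(5, [])
        if employees[p]["salary"] > 40000]

# Disjunctive (dno = 5 OR salary = 60000): union of the pointer sets
# obtained from each index, then fetch each record once.
pointers = set(dno_index.get(5, [])) | set(salary_index.get(60000, []))
disj = [employees[p] for p in sorted(pointers)]
```

The union works on pointers rather than records, so a record matching several conditions is fetched only once.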

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.



66

Optimization ProcessRule 12 Selection operation distributes over

the set union set intersection and set differenceoperations

σp (E1 E2) = σp (E1) σp (E2)σp (E1 E2) = σp (E1) (E2)

Database Systems

67

Optimization ProcessRule 12

σp (E1 cup E2) = σp (E1) cup σp (E2)σp (E1 cup E2) ne σp (E1) cup (E2)

Database Systems

68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block:

For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If t_r.A Θ t_s.B is true then add t_r || t_s to the result
            end
        end
    end
end
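
The block nested loop can be sketched in the same style. Here a "block" is simply a list of tuples and a counter stands in for block accesses; the helper `blocks`, the block size, and the data are assumptions made for the illustration.

```python
import operator

def blocks(relation, tuples_per_block):
    """Split a relation into blocks (lists of tuples)."""
    return [relation[i:i + tuples_per_block]
            for i in range(0, len(relation), tuples_per_block)]

def block_nested_loop_join(r_blocks, s_blocks, a, b, theta=operator.eq):
    """Return the join result and the number of simulated block reads."""
    result, block_reads = [], 0
    for br in r_blocks:
        block_reads += 1              # read one block of r
        for bs in s_blocks:
            block_reads += 1          # read one block of s
            for tr in br:
                for ts in bs:
                    if theta(tr[a], ts[b]):
                        result.append(tr + ts)
    return result, block_reads

r = [(i, f"r{i}") for i in range(6)]        # 6 tuples -> 3 blocks of 2
s = [(i % 3, f"s{i}") for i in range(4)]    # 4 tuples -> 2 blocks of 2
joined, reads = block_nested_loop_join(blocks(r, 2), blocks(s, 2), 0, 0)
```

With 3 blocks of r and 2 blocks of s, the counter ends at 3 × 2 + 3 = 9 reads.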

The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve the block nested loop?
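
To get a feel for the formulas, the following sketch plugs hypothetical sizes into them. The worst-case formula n_r × b_s + b_r for the tuple-at-a-time nested loop is the usual textbook figure and is an assumption here, since the slides state only the best case; the sizes are made up.

```python
def nested_loop_worst_case(n_r, b_r, b_s):
    # Textbook worst case (an assumption here, not stated on the slide):
    # the inner relation s is re-read once for every tuple of r.
    return n_r * b_s + b_r

def block_nested_loop(b_outer, b_inner):
    # Formula from the slide: b_r * b_s + b_r, with r as the outer relation.
    return b_outer * b_inner + b_outer

def fits_in_memory(b_r, b_s):
    # Each block of either relation is read exactly once.
    return b_r + b_s

n_r, b_r, b_s = 5000, 100, 400   # hypothetical sizes

costs = {
    "nested loop (worst case)": nested_loop_worst_case(n_r, b_r, b_s),
    "block nested loop, r outer": block_nested_loop(b_r, b_s),
    "block nested loop, s outer": block_nested_loop(b_s, b_r),
    "one relation fits in memory": fits_in_memory(b_r, b_s),
}
for plan, cost in costs.items():
    print(f"{plan}: {cost} block accesses")
```

Under these numbers, choosing the smaller relation as the outer one is slightly cheaper, which suggests one possible answer to the question of how to improve the block nested loop.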

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): To compute r ⋈_{A=B} s, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
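
A small sketch of this index-based join follows. A Python dictionary plays the role of the index or hash key on s.B; in a real system the access structure would already exist rather than being built on the spot, so building it inside the function is purely an assumption of the demo.

```python
from collections import defaultdict

def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b] using an index on s[b]."""
    index = defaultdict(list)         # stands in for the access structure on s.B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                      # one record of r at a time
        for ts in index.get(tr[a], []):   # only the matching records of s
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "w")]
joined = index_join(r, s, 0, 0)
```

Each tuple of r triggers one lookup instead of a full scan of s.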

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is b_s + b_r, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
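
The merge step can be sketched as below, assuming both inputs are lists of tuples already sorted on the join attribute. The function name and data are illustrative; the slice `s[j:j_end]` corresponds to the group of equal-valued tuples that is assumed to fit in memory.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both already sorted on their join attributes."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                    # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                    # advance the pointer into s
        else:
            value = r[i][a]
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(1, "x"), (3, "y"), (3, "z"), (5, "u")]
s = [(2, "p"), (3, "q"), (3, "w"), (5, "v")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a group, which is what keeps the scan linear in the sizes of the two relations.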

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
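
A minimal hash-join sketch is shown below. The bucket count, the use of Python's built-in `hash`, and the sample data are assumptions for illustration; an actual system would hash to disk-resident buckets.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s by hashing both into the same set of buckets."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (r tuples, s tuples)
    for tr in r:                              # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                              # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Sharing a bucket does not guarantee equal join values.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (5, "z")]
s = [(1, "p"), (5, "q"), (6, "w")]
joined = hash_join(r, s, 0, 0)
```

In this data the values 2 and 6 land in the same bucket but do not match, which is why each bucket yields only a possible result that still has to be checked.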

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
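
Both strategies can be sketched with a generic theta-join helper. The helper is a plain nested loop chosen for brevity; in practice the first simple join would use whichever technique applies. The predicates and data are made up for the example.

```python
def theta_join(r, s, theta):
    """Join r and s on an arbitrary predicate theta(tr, ts)."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]

def conjunctive_join(r, s, thetas):
    """Compute the join on thetas[0], then filter by the remaining conditions."""
    first, rest = thetas[0], thetas[1:]
    intermediate = theta_join(r, s, first)
    width = len(r[0]) if r else 0     # number of attributes contributed by r
    return [t for t in intermediate
            if all(th(t[:width], t[width:]) for th in rest)]

def disjunctive_join(r, s, thetas):
    """Union of the individual joins, keeping each result tuple once."""
    seen, result = set(), []
    for th in thetas:
        for t in theta_join(r, s, th):
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 15), (2, 5), (3, 30)]
same_id = lambda tr, ts: tr[0] == ts[0]
smaller = lambda tr, ts: tr[1] < ts[1]

both = conjunctive_join(r, s, [same_id, smaller])
either = disjunctive_join(r, s, [same_id, smaller])
```

The conjunctive case keeps the intermediate result small when the first condition is selective; the disjunctive case has to remove tuples that satisfy more than one condition.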

Query Optimization: Project Operation

A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
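
The two duplicate-elimination approaches can be sketched as follows, with tuples as rows and attribute positions standing in for the attribute list. The `employee` rows are invented for the example.

```python
def project(relation, columns):
    """Keep only the listed attribute positions; duplicates may appear."""
    return [tuple(t[c] for c in columns) for t in relation]

def dedup_by_sorting(rows):
    """Sort, then drop rows equal to their predecessor."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result

def dedup_by_hashing(rows):
    """Keep a row only the first time its hash-set lookup misses."""
    seen, result = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

employee = [(1, "Smith", 5), (2, "Wong", 4), (3, "Smith", 5)]
projected = project(employee, [1, 2])      # the key (position 0) is dropped
by_sort = dedup_by_sorting(projected)
by_hash = dedup_by_hashing(projected)
```

Projecting away the key turns two distinct employees into the same row, which is exactly the case the slide warns about.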

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

  • Devise algorithms to perform the variations of the outer join operation.
  • Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}(Project × (Department × Employee)))

Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}, above the Cartesian product of Project with (Department × Employee).

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
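
The arithmetic behind these figures can be checked with a few lines. The variable names are illustrative; the numbers are the ones given in the text.

```python
# Tuple counts and tuple sizes (bytes) for Project, Department, Employee.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple,
# so the counts multiply and the tuple sizes add up.
product_tuples = 1
for count in tuple_counts.values():
    product_tuples *= count
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size

print(product_tuples, product_tuple_size, product_bytes)
```

The intermediate relation therefore occupies about 3 × 10^9 bytes, even though the three base relations together hold only 761,000 bytes (10,000 + 1,000 + 750,000).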

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{Mgr_ssn=Ssn} Employee)

Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, above the join ⋈_{Mgr_ssn=Ssn} of Employee with the result of the join ⋈_{Dnum=Dnumber} of σ_{Plocation='California'}(Project) with Department.


Optimization Process

Rule 12:

σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
σ_p(E1 ∪ E2) ≠ σ_p(E1) ∪ E2 (in general)

σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∩ E2) = σ_p(E1) ∩ E2
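
These four statements can be checked on a small example with Python sets. The relations `E1` and `E2` and the predicate are made up; a check on one example illustrates the rule but does not prove it.

```python
def select(pred, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if pred(t)}

def is_even(t):
    return t % 2 == 0

E1 = {1, 2, 3, 4}
E2 = {3, 4, 5, 6}

# Union: selection must be pushed to both operands.
union_both = select(is_even, E1 | E2) == select(is_even, E1) | select(is_even, E2)
union_one = select(is_even, E1 | E2) == select(is_even, E1) | E2

# Intersection: pushing the selection to one operand is enough.
inter_both = select(is_even, E1 & E2) == select(is_even, E1) & select(is_even, E2)
inter_one = select(is_even, E1 & E2) == select(is_even, E1) & E2

print(union_both, union_one, inter_both, inter_one)
```

Only the second comparison fails here: leaving E2 unfiltered in a union lets its odd elements leak into the result.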

Rule 13: The projection operation distributes over set union. For set intersection and set difference, only a containment holds in general.

Π_L(E1 ∪ E2) = Π_L(E1) ∪ Π_L(E2)
Π_L(E1 ∩ E2) ⊆ Π_L(E1) ∩ Π_L(E2)
Π_L(E1 − E2) ⊇ Π_L(E1) − Π_L(E2)
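
A small check, on made-up relations, of how projection interacts with the three set operations: equality holds for union, while for intersection and difference only a containment holds in general. The relations and the attribute list `L` are assumptions of the example.

```python
def project(relation, columns):
    """Projection with set semantics (duplicates disappear automatically)."""
    return {tuple(t[c] for c in columns) for t in relation}

E1 = {(1, "a"), (2, "b")}
E2 = {(1, "c"), (3, "d")}
L = [0]   # project onto the first attribute only

union_equal = project(E1 | E2, L) == project(E1, L) | project(E2, L)

inter_lhs = project(E1 & E2, L)                 # nothing in common as whole tuples
inter_rhs = project(E1, L) & project(E2, L)     # but the value 1 appears in both

diff_lhs = project(E1 - E2, L)                  # both tuples of E1 survive
diff_rhs = project(E1, L) - project(E2, L)      # the value 1 is removed

print(union_equal, inter_lhs, inter_rhs, diff_lhs, diff_rhs)
```

The tuples (1, "a") and (1, "c") differ as whole tuples but agree on the projected attribute, which is what breaks equality for intersection and difference.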

Choose candidate low-level procedures: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as

  • the existence of indexes or other access paths (to reduce I/O cost), and
  • the physical clustering of records (to reduce I/O cost), …

come into play.

So, in short, after scanning and parsing:

  • the query is translated into an equivalent internal representation, in the form of a query tree or query graph;
  • an execution strategy is chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

There are two main techniques for query optimization:

  • Heuristic rules
  • Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

  • Perform selection operations as early as possible.
  • Perform projections early.
  • It is usually better to perform selections earlier than projections.

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

  • Materialized evaluation: Generation of an intermediate result (relation).
  • Pipeline evaluation: Combining several operations so that the output of one is passed directly to the next, without storing the intermediate result.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
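
The difference between the two evaluation styles can be sketched with Python generators, which hand over one tuple at a time. The operator functions and data are illustrative assumptions, not a description of any particular engine.

```python
def join(r, s, a, b):
    """Generator: yields each joined tuple as soon as it is produced."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield tr + ts

def project(tuples, columns):
    """Generator: consumes tuples one at a time and emits the projection."""
    for t in tuples:
        yield tuple(t[c] for c in columns)

r = [(1, "x"), (2, "y")]
s = [(1, "p"), (2, "q"), (2, "w")]

# Materialized evaluation: the whole join result is stored first.
intermediate = list(join(r, s, 0, 0))
materialized = list(project(intermediate, [1, 3]))

# Pipelined evaluation: each join tuple flows straight into the projection.
pipelined = list(project(join(r, s, 0, 0), [1, 3]))
```

Both variants return the same rows; the pipelined one never holds the full join result in a list.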

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

Query tree: Π_{Sname} at the root, above σ_{bid = 100 ∧ rating > 5}, above the join ⋈_{Sid=Sid} of R and S.

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S)))

Query tree: Π_{Sname} at the root, above the join ⋈_{Sid=Sid} of σ_{bid = 100}(R) and σ_{rating > 5}(S).
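
The two expressions can be compared on a toy instance of R and S. The rows below are invented for the illustration, and the join is a simple nested loop; the point is only that both plans return the same names while the second joins far fewer tuples.

```python
S = [  # (Sid, Sname, rating, age)
    (1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0),
]
R = [  # (Sid, bid, day, rname)
    (1, 100, "d1", "r1"), (2, 100, "d2", "r2"),
    (3, 101, "d3", "r3"), (1, 102, "d4", "r4"),
]

def join(r, s):
    """Equi-join on Sid (position 0 in both relations)."""
    return [tr + ts for tr in r for ts in s if tr[0] == ts[0]]

# Plan 1: join first, then select, then project Sname.
joined = join(R, S)                    # R columns 0-3, S columns 4-7
plan1 = [t[5] for t in joined if t[1] == 100 and t[6] > 5]

# Plan 2: select on each relation first, then join, then project Sname.
r_selected = [t for t in R if t[1] == 100]
s_selected = [t for t in S if t[2] > 5]
joined_small = join(r_selected, s_selected)
plan2 = [t[5] for t in joined_small]

print(plan1, plan2, len(joined), len(joined_small))
```

Pushing the selections below the join shrinks the intermediate result from four tuples to one in this instance, which is the effect the "selections early" heuristic aims for.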

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation. In this case, articulate the way the previous query is going to be executed.

Query tree for the first plan: Π_{Sname} (on the fly) at the root, above σ_{bid = 100 ∧ rating > 5} (on the fly), above the join ⋈_{Sid=Sid} of R and S.

Query tree for the second plan: Π_{Sname} at the root, above the join ⋈_{Sid=Sid} of σ_{bid = 100}(R) and σ_{rating > 5}(S); the figure carries a single "on the fly" annotation.

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

  • Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
  • Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
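
The two search methods can be contrasted in a short sketch. The `employees` file is a made-up list ordered on `ssn`, and Python's standard `bisect` module supplies the binary search.

```python
from bisect import bisect_left

def linear_search(records, attr, value):
    """Test every record; works on any attribute, ordered or not."""
    return [r for r in records if r[attr] == value]

def binary_search(records, keys, value):
    """records is ordered on a key attribute; keys lists its values in order."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None

employees = sorted(
    [{"ssn": 333, "dno": 5}, {"ssn": 111, "dno": 4}, {"ssn": 222, "dno": 5}],
    key=lambda e: e["ssn"],
)
ssns = [e["ssn"] for e in employees]

found = binary_search(employees, ssns, 222)     # equality on the ordering key
missing = binary_search(employees, ssns, 999)
in_dept_5 = linear_search(employees, "dno", 5)  # non-key attribute: full scan
```

Binary search needs about log2(n) probes but applies only to the attribute the file is ordered on; every other condition falls back to the scan.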

  • Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note that in this case at most one record is retrieved).

σ_{SSN = 123456789}(EMPLOYEE)

  • Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note that in this case the data is also sorted.

σ_{DNUMBER > 5}(DEPARTMENT)
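
A range selection over an ordered file can be sketched as below. The sorted list `dnumbers` stands in for the primary index and the `departments` rows are invented; `bisect` locates the boundary position and a slice returns the subsequent (or preceding) records.

```python
from bisect import bisect_left, bisect_right

# File ordered on DNUMBER (the key field).
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Lab")]
dnumbers = [d[0] for d in departments]   # stands in for the primary index

def select_range(op, value):
    """Evaluate DNUMBER <op> value using the ordering of the file."""
    if op == ">":
        return departments[bisect_right(dnumbers, value):]
    if op == ">=":
        return departments[bisect_left(dnumbers, value):]
    if op == "<":
        return departments[:bisect_left(dnumbers, value)]
    if op == "<=":
        return departments[:bisect_right(dnumbers, value)]
    raise ValueError(f"unsupported operator: {op}")

greater = select_range(">", 5)
at_most = select_range("<=", 5)
```

Only one index lookup is needed per query; everything after it is a sequential read of neighbouring records.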

Query Optimization: Search methods for Selection

  • Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

  • Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

  • Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

  • Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then check whether each retrieved record satisfies the rest of the conditions.
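
A minimal sketch of this strategy, assuming an index exists on `dno` only: the index narrows the candidates and the remaining condition is checked record by record. The data, the dictionary-based index, and the function name are illustrative.

```python
from collections import defaultdict

employees = [
    {"ssn": 1, "dno": 5, "salary": 30000},
    {"ssn": 2, "dno": 5, "salary": 52000},
    {"ssn": 3, "dno": 4, "salary": 61000},
]

# Access path on dno only; salary has no index in this sketch.
dno_index = defaultdict(list)
for e in employees:
    dno_index[e["dno"]].append(e)

def conjunctive_select(dno, min_salary):
    """Evaluate dno = <dno> AND salary > <min_salary>."""
    candidates = dno_index.get(dno, [])       # condition with an access path
    return [e for e in candidates             # remaining condition, checked per record
            if e["salary"] > min_salary]

selected = conjunctive_select(5, 40000)
```

Only the two employees of department 5 are ever inspected for the salary condition.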



68

Optimization ProcessRule 12

σp (E1 cap E2) = σp (E1) cap σp (E2)σp (E1 cap E2) = σp (E1) cap (E2)

Database Systems

69

Optimization ProcessRule 13 Projection operation distributes over

the set union set intersection and setdifference operations

ΠL (E1 E2) = (ΠL (E1)) (ΠL (E2))ΠL (E1 cup E2) = ΠL (E1) cup ΠL (E2)ΠL (E1 cap E2) = ΠL (E1) cap ΠL (E2)

Database Systems

70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tuple tr ∈ Br do begin
            For each tuple ts ∈ Bs do begin
                If tr[A] θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
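A runnable sketch of the block nested loop algorithm, assuming a "block" is simply a fixed-size slice of a list of tuples. The helper `blocks` and the block-read counter are additions for illustration; they are not part of the slides.

```python
import operator


def blocks(relation, block_size):
    """Yield consecutive fixed-size blocks (slices) of a relation."""
    for start in range(0, len(relation), block_size):
        yield relation[start:start + block_size]


def block_nested_loop_join(r, s, a, b, block_size, theta=operator.eq):
    """Return (join result, number of simulated block reads)."""
    result = []
    block_reads = 0
    for block_r in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append(tr + ts)
    return result, block_reads


r = [(i,) for i in range(4)]          # 2 blocks of 2 tuples
s = [(i % 3,) for i in range(6)]      # 3 blocks of 2 tuples
joined, reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

With 2 blocks in r and 3 blocks in s, the counter comes out to 2 × 3 + 2 = 8 reads: every block of s is read once per block of r, plus one read for each block of r.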

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve block nested loop?
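To get a feel for these cost figures, the sketch below evaluates them for one made-up set of sizes. The best-case and block nested loop formulas come from the slides; the worst-case figure for the tuple-at-a-time nested loop, $n_r \times b_s + b_r$, is the usual textbook estimate and is an addition here, not something the slides state.

```python
def join_costs(n_r, b_r, b_s):
    """Block-access estimates for joining r (outer) with s (inner)."""
    return {
        # Both relations (or the inner one) fit in memory.
        "best_case": b_r + b_s,
        # Textbook worst case for tuple-at-a-time nested loop
        # (assumption: not given in the slides).
        "nested_loop_worst_case": n_r * b_s + b_r,
        # Block nested loop, as given in the slides.
        "block_nested_loop": b_r * b_s + b_r,
    }


# Hypothetical sizes: 5000 tuples in r stored in 100 blocks, s in 400 blocks.
costs = join_costs(n_r=5000, b_r=100, b_s=400)
```

For these sizes the estimates are 500, 2,000,100 and 40,100 block accesses respectively, which shows why processing the relations block by block is a major saving.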

Query Optimization: JOIN Operation

$r \bowtie_{A=B} s$

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$ one at a time, and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
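A minimal sketch of the index-based join, assuming a dictionary built on s.B stands in for the existing index or hash key. In a real system the access structure would already exist; here it is built inside the function only to keep the example self-contained.

```python
from collections import defaultdict


def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b] by probing an access structure on s."""
    # Stand-in for an existing index or hash key on attribute B of s.
    index_on_b = defaultdict(list)
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                                  # one record of r at a time
        for ts in index_on_b.get(tr[a], []):      # probe for matches
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z")]
matches = index_join(r, s, 0, 0)
```

Each record of r triggers one probe instead of a full scan of s, which is where the saving over the nested loop comes from.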

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
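The merge step can be sketched as below, assuming both relations are lists of tuples already sorted on the join attribute. The handling of groups of equal values follows the stated assumption that all tuples sharing one join value fit in memory; the function name is illustrative.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations that are already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                       # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Gather the s-tuples sharing this value (assumed to fit in memory).
            group_start = j
            while j < len(s) and s[j][b] == value:
                j += 1
            # Pair every r-tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[group_start:j]:
                    result.append(r[i] + ts)
                i += 1
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
s = [(2, "x"), (2, "y"), (3, "z"), (5, "w")]
merged = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backwards past a finished group, so each relation is scanned once.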

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
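A compact sketch of the hash-join idea, assuming an in-memory dictionary of buckets stands in for the hash file. The bucket count and function name are arbitrary choices for illustration.

```python
from collections import defaultdict


def hash_join(r, s, a, b, num_buckets=8):
    """Equi-join r and s on r[a] = s[b] by hashing both into shared buckets."""
    # Each bucket holds (tuples from r, tuples from s).
    buckets = defaultdict(lambda: ([], []))
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                                   # single pass over s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Sharing a bucket does not guarantee equal join values.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (9, "b"), (2, "c")]
s = [(9, "x"), (1, "y"), (17, "z")]
hashed = hash_join(r, s, 0, 0)
```

The explicit equality test inside each bucket matters: with 8 buckets the values 1, 9 and 17 all land in the same bucket, yet only equal values may be joined.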

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if the attribute list includes a key of relation R.

If the attribute list does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
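Both duplicate-elimination strategies can be sketched in a few lines. Records are assumed to be dictionaries; the sample data and function names are illustrative only.

```python
def project_sort(rows, attrs):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:   # duplicates are adjacent after sorting
            result.append(t)
    return result


def project_hash(rows, attrs):
    """Project and drop duplicates using a hash set; keeps first-seen order."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical EMPLOYEE rows; dno alone is not a key, so duplicates appear.
rows = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 4}, {"ssn": 3, "dno": 5}]
by_sort = project_sort(rows, ["dno"])
by_hash = project_hash(rows, ["dno"])
```

Projecting on `["ssn", "dno"]` instead includes the key, so neither function would find anything to remove.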

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, ...)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(\sigma_{Plocation = \text{'California'} \,\wedge\, Dnum = Dnumber \,\wedge\, MGRSSN = SSN}(Project \times (Department \times Employee)))$

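Query Optimization: An Example

[Figure: query tree for the expression above. Root: $\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}$; below it: $\sigma_{Plocation = \text{'California'} \,\wedge\, Dnum = Dnumber \,\wedge\, MGRSSN = SSN}$; below it: $\times$ over Project and the $\times$ of Department and Employee.]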

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
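The figures quoted above follow from multiplying the cardinalities and adding the tuple widths. The short check below only restates the numbers from the slide; the total size in bytes is a derived figure, not one the slide gives.

```python
# (number of tuples, tuple size in bytes), as stated on the slide.
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1
product_width = 0
for tuple_count, tuple_size in relations.values():
    product_tuples *= tuple_count    # cardinalities multiply
    product_width += tuple_size      # tuple widths add

total_bytes = product_tuples * product_width
```

That is 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes, or 3,000,000,000 bytes (about 3 GB) for the intermediate relation alone.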

Query Optimization: An Example

However, the above query, based on the schemas of the relations, can be translated into:

$\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}(((\sigma_{Plocation = \text{'California'}}(Project)) \bowtie_{Dnum = Dnumber} (Department)) \bowtie_{MGRSSN = SSN} (Employee))$

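Query Optimization: An Example

[Figure: query tree for the expression above. Root: $\Pi_{Pnumber,\, Dnum,\, Lname,\, Address,\, Bdate}$; below it: the join $\bowtie_{MGRSSN = SSN}$ with Employee; below that: the join $\bowtie_{Dnum = Dnumber}$ of Department with $\sigma_{Plocation = \text{'California'}}$ applied to Project.]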


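Optimization Process

Rule 13: The projection operation distributes over the set union, set intersection, and set difference operations.

$\Pi_L(E_1 - E_2) = (\Pi_L(E_1)) - (\Pi_L(E_2))$
$\Pi_L(E_1 \cup E_2) = \Pi_L(E_1) \cup \Pi_L(E_2)$
$\Pi_L(E_1 \cap E_2) = \Pi_L(E_1) \cap \Pi_L(E_2)$

Note: the union identity holds for any $E_1$ and $E_2$. The intersection and difference identities are guaranteed only when tuples that differ in $E_1$ and $E_2$ still differ after projection onto $L$ (for example, when $L$ includes a key); otherwise they can fail.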
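Rule 13 can be checked on small sets. The sketch below models relations as Python sets of tuples and projection as a set comprehension; the two relations and the attribute positions are made up. It suggests that the union identity holds regardless, whereas the intersection and difference identities rely on the projection keeping distinct tuples distinct (for instance, when L includes a key).

```python
def project(relation, positions):
    """Set-based projection onto the given attribute positions."""
    return {tuple(t[p] for p in positions) for t in relation}


def rule13_holds(e1, e2, positions):
    """Check the three Rule 13 identities for one pair of relations."""
    return {
        "union": project(e1 | e2, positions)
                 == project(e1, positions) | project(e2, positions),
        "intersection": project(e1 & e2, positions)
                        == project(e1, positions) & project(e2, positions),
        "difference": project(e1 - e2, positions)
                      == project(e1, positions) - project(e2, positions),
    }


e1 = {(1, "a"), (2, "b")}
e2 = {(1, "c"), (3, "d")}

without_key = rule13_holds(e1, e2, [0])      # L = first attribute only
with_key = rule13_holds(e1, e2, [0, 1])      # L = all attributes
```

With L restricted to the first attribute, the tuples (1, "a") and (1, "c") collapse to the same projected value, which is what breaks the intersection and difference checks in this example.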

Optimization Process

Choose candidate low-level procedure: After transforming the query into a more desirable form, the optimizer must then decide how to evaluate the transformed query. At this stage, issues such as the following come into play:

the existence of indexes or other access paths (to reduce I/O cost), and

the physical clustering of records (to reduce I/O cost), ...

Optimization Process

So, in short, after scanning and parsing:

The query will be translated into an equivalent representation; this internal representation is in the form of a query tree or query graph.

An execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".

Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, ...

Optimization Process

There are two main techniques for query optimization:

Heuristic rules

Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.

Perform projections early.

It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: the result of an operation is generated and stored as an intermediate result (relation).

Pipeline evaluation: several operations are combined, so that the output of one is passed directly to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
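Python generators give a convenient way to contrast the two evaluation styles. In this sketch, `join` and `project` are hypothetical stand-ins for the relational operators (the join matches on the first attribute), and the data is made up.

```python
def join(r, s):
    """Join stand-in: match on the first attribute, yield one tuple at a time."""
    for tr in r:
        for ts in s:
            if tr[0] == ts[0]:
                yield tr + ts[1:]


def project(tuples, positions):
    """Projection that consumes its input one tuple at a time."""
    for t in tuples:
        yield tuple(t[p] for p in positions)


r = [(1, "a1"), (2, "a2")]
s = [(1, "b1"), (2, "b2"), (2, "b3")]

# Materialized evaluation: store the whole join result, then project it.
intermediate = list(join(r, s))
materialized = list(project(intermediate, [1, 2]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), [1, 2]))
```

Both styles produce the same tuples; the difference is that the pipelined version never holds the full join result, whereas `intermediate` stores all three joined tuples before projection starts.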

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further, assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

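$\Pi_{Sname}(\sigma_{bid = 100 \,\wedge\, rating > 5}(R \bowtie_{Sid = Sid} S))$

[Figure: query tree for this expression. Root: $\Pi_{Sname}$; below it: $\sigma_{bid = 100 \,\wedge\, rating > 5}$; below it: the join $\bowtie_{Sid = Sid}$ of R and S.]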

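$\Pi_{Sname}((\sigma_{bid = 100}\, R) \bowtie_{Sid = Sid} (\sigma_{rating > 5}\, S))$

[Figure: query tree for this expression. Root: $\Pi_{Sname}$; below it: the join $\bowtie_{Sid = Sid}$; its inputs are $\sigma_{bid = 100}$ applied to R and $\sigma_{rating > 5}$ applied to S.]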
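The two expressions above can be compared on a toy instance. The tuples below are invented for illustration and follow the attribute order of the S and R schemas; `join_on_sid` is a hypothetical helper.

```python
# S: (Sid, Sname, rating, age)   R: (Sid, bid, day, rname)
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]
R = [(1, 100, "d1", "x"), (2, 100, "d2", "y"),
     (3, 101, "d3", "z"), (3, 100, "d4", "w")]


def join_on_sid(r, s):
    """R joined with S on Sid; result layout is R's fields then S's fields."""
    return [tr + ts for tr in r for ts in s if tr[0] == ts[0]]


# Plan 1: join first, then select (bid at index 1, rating at index 6),
# then project Sname (index 5).
joined_all = join_on_sid(R, S)
plan1 = [t[5] for t in joined_all if t[1] == 100 and t[6] > 5]

# Plan 2: push both selections below the join, then project Sname.
r_selected = [tr for tr in R if tr[1] == 100]
s_selected = [ts for ts in S if ts[2] > 5]
joined_small = join_on_sid(r_selected, s_selected)
plan2 = [t[5] for t in joined_small]
```

Both plans return the same names, but the second plan feeds fewer tuples into the join and produces a smaller intermediate result, which is the point of performing selections early.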

Assume the underlying platform can perform the basic relational operations in a "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: the two query trees above, annotated for pipelined execution with "On the fly" labels: two labels on the first tree (selection applied after the join) and one label on the second tree (selections pushed below the join).]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, ... need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
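The two search methods can be sketched with Python's standard `bisect` module. The file is assumed to be an in-memory list ordered on the key attribute; the records and function names are illustrative.

```python
from bisect import bisect_left

# Hypothetical file, ordered on the key attribute ssn.
employees = [
    {"ssn": 101, "name": "Ann"},
    {"ssn": 205, "name": "Bob"},
    {"ssn": 333, "name": "Cy"},
    {"ssn": 478, "name": "Dee"},
]
ssn_keys = [rec["ssn"] for rec in employees]   # key values in file order


def linear_search(records, predicate):
    """Test every record against the selection condition."""
    return [rec for rec in records if predicate(rec)]


def binary_search(records, keys, value):
    """Equality selection on the ordering key; returns one record or None."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None


found = binary_search(employees, ssn_keys, 333)
missing = binary_search(employees, ssn_keys, 334)
scanned = linear_search(employees, lambda rec: rec["name"] == "Bob")
```

The linear search works for any condition but touches every record; the binary search needs the file to be ordered on the attribute being compared and only handles the equality test on that key.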

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key (note: in this case, at most one record is retrieved).

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). Note: in this case, the data is also sorted.

$\sigma_{DNUMBER > 5}(DEPARTMENT)$
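A range selection over a file ordered on its key can be sketched as follows. A sorted list of key values stands in for the primary index, and the DEPARTMENT rows are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
departments = [(1, "HQ"), (4, "Admin"), (5, "Research"),
               (7, "Sales"), (9, "Labs")]
dnumbers = [d[0] for d in departments]   # stand-in for the primary index


def select_greater_than(file, keys, value):
    """Locate the boundary via the index, then return all following records."""
    start = bisect_right(keys, value)    # first position with key > value
    return file[start:]


selected = select_greater_than(departments, dnumbers, 5)
```

Because the file is sorted on DNUMBER, everything after the boundary position qualifies, so no further comparisons are needed once the starting point is found.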

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$



70

Optimization ProcessChoose candidate low-level procedure mdash After

transferring the query into more desirable form theoptimizer must then decide how to evaluate the transformedquery At this stage issues such asexistence of indexes or other access paths To reduce

IO cost andphysical clustering of records To reduce IO cost hellip

comes into play

Database Systems

71

Optimization ProcessSo in shortafter scanning and parsingthe query will be translated into an equivalent

representation this internal representation is in theform of a query tree or query graphan execution strategy will be chosen The execution

strategy is a plan for accessing the data executingthe query and storing the intermediate results

Database Systems

72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] Θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
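A minimal Python sketch of the block nested loop follows. Blocks are simulated as fixed-size slices of a list, and the function also counts simulated block reads; block_size, the positional attributes a and b, and the counter are assumptions added for illustration.

```python
import operator


def blocks(relation, block_size):
    """Yield the relation one block (list slice) at a time."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, a, b, theta=operator.eq, block_size=2):
    result = []
    block_reads = 0
    for block_r in blocks(r, block_size):
        block_reads += 1                      # read one block of r
        for block_s in blocks(s, block_size):
            block_reads += 1                  # read one block of s
            for tr in block_r:
                for ts in block_s:
                    if theta(tr[a], ts[b]):
                        result.append(tr + ts)
    return result, block_reads


r = [(1,), (2,), (3,), (4,)]                  # 2 blocks of 2 tuples
s = [(2,), (4,), (6,), (8,), (10,), (12,)]    # 3 blocks of 2 tuples
joined, reads = block_nested_loop_join(r, s, 0, 0)
```

With 2 blocks of r and 3 blocks of s, the counter ends at 2 × 3 + 2 = 8 simulated block reads.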

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve the block nested loop?

Query Optimization: JOIN Operation

r ⋈_{A=B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r, one at a time, and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].
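As a rough sketch of this technique, the access structure on s.B can be modelled by a Python dictionary. The slides assume the index already exists; here it is built inside the function only to keep the example self-contained. Positions a and b stand in for attributes A and B.

```python
from collections import defaultdict


def index_join(r, s, a, b):
    """Equi-join r and s on r[a] = s[b] by probing an index on s[b]."""
    index = defaultdict(list)        # stands in for an existing index or hash key on B
    for ts in s:
        index[ts[b]].append(ts)
    result = []
    for tr in r:                     # one record of r at a time
        for ts in index.get(tr[a], []):
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (6, "z"), (3, "w")]
joined = index_join(r, s, 0, 0)
```

Each record of r triggers one index probe instead of a full scan of s.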

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
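The merge step can be sketched as follows, assuming both inputs are Python lists already sorted on the join positions a and b. The group of s tuples sharing the current join value is held in a slice, which mirrors the assumption that such a group fits in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the end of the group of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == key:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(r, s, 0, 0)
```

Both pointers only move forward, so each input is scanned once.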

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce tuples of the join result.
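A small in-memory sketch of the hash join is shown below. The hash file is modelled as a list of buckets, each holding the r records and the s records that hash to it; the bucket count n_buckets and the use of Python's built-in hash are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    """Equi-join r and s on r[a] = s[b] using one shared set of buckets."""
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                  # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                  # single pass over s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                # examine each bucket
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so the join
                # condition is still checked explicitly.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (6, "z"), (3, "w")]
joined = sorted(hash_join(r, s, 0, 0))
```

Only records that land in the same bucket are ever compared with each other.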

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.
Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.
We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
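Both strategies for complex join conditions can be sketched in a few lines. For brevity each individual join below is a plain nested loop; in practice any applicable join technique could compute it. Results are sets of (tr, ts) pairs, and each θi is a Python function of two tuples; these representations are assumptions made for illustration.

```python
def theta_join(r, s, theta):
    """Set of (tr, ts) pairs that satisfy a single condition theta."""
    return {(tr, ts) for tr in r for ts in s if theta(tr, ts)}


def conjunctive_join(r, s, thetas):
    """Compute the join on the first condition, then filter on the rest."""
    first, *rest = thetas
    intermediate = theta_join(r, s, first)
    return {(tr, ts) for tr, ts in intermediate
            if all(theta(tr, ts) for theta in rest)}


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition."""
    result = set()
    for theta in thetas:
        result |= theta_join(r, s, theta)
    return result


def same_key(tr, ts):
    return tr[0] == ts[0]


def smaller_value(tr, ts):
    return tr[1] < ts[1]


r = [(1, 10), (2, 20)]
s = [(1, 15), (2, 5)]
both = conjunctive_join(r, s, [same_key, smaller_value])
either = disjunctive_join(r, s, [same_key, smaller_value])
```

Using sets makes the union in the disjunctive case remove pairs that satisfy more than one condition.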

Query Optimization: Project Operation

A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.
If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
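The two duplicate-elimination strategies can be sketched as follows. Attributes are given as tuple positions, and the sample data is made up for illustration.

```python
def project_sort(relation, positions):
    """Project, sort the result, then drop adjacent duplicates."""
    projected = sorted(tuple(t[p] for p in positions) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(relation, positions):
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result = []
    for t in relation:
        p = tuple(t[i] for i in positions)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


employee = [("Smith", 5), ("Wong", 5), ("Smith", 5), ("Zelaya", 4)]
by_sort = project_sort(employee, [1])
by_hash = project_hash(employee, [1])
```

The sort-based version returns the result in sorted order, while the hash-based version keeps the order of first appearance.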

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.
The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.
A hashing technique is another way to implement the union, intersection, and difference operations.
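The sort-then-scan approach can be sketched with one merge loop that produces union, intersection, and difference together. The sketch assumes each input is a duplicate-free collection of comparable tuples or values; the function name is hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (r ∪ s, r ∩ s, r − s) using one scan of each sorted input."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:
            union.append(s[j])
            j += 1
        else:
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever remains in r belongs to the union and to r − s;
    # whatever remains in s belongs to the union only.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


union, intersection, difference = sorted_set_ops([3, 1, 5], [5, 2, 3])
```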

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}(Project × (Department × Employee)))

Query Optimization: An Example

Π_{Pnumber, Dnum, Lname, Address, Bdate}
└─ σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}
   └─ ×
      ├─ Project
      └─ ×
         ├─ Department
         └─ Employee

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
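The arithmetic behind these figures can be checked with a few lines of Python. The tuple counts and sizes are the ones stated above; the total size in bytes is derived here and is not a figure from the slides.

```python
from math import prod

tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}  # bytes

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = prod(tuple_counts.values())
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size

print(product_tuples, product_tuple_size, product_bytes)
```

The intermediate relation therefore has 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes, about 3 × 10^9 bytes in total, before the selection discards almost all of them.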

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{Mgr_ssn=Ssn} Employee)

Query Optimization: An Example

Π_{Pnumber, Dnum, Lname, Address, Bdate}
└─ ⋈_{Mgr_ssn=Ssn}
   ├─ ⋈_{Dnum=Dnumber}
   │  ├─ σ_{Plocation='California'}
   │  │  └─ Project
   │  └─ Department
   └─ Employee


Optimization Process

So, in short, after scanning and parsing:
The query will be translated into an equivalent internal representation, in the form of a query tree or query graph.
An execution strategy will be chosen. The execution strategy is a plan for accessing the data, executing the query, and storing the intermediate results.

Optimization Process

Generate query plans: The final stage of optimization involves the construction of a set of candidate query plans and the choice of "the best of these plans".
Choosing the cheapest plan naturally requires a method for assigning a cost to any given plan. This cost formula should estimate the number of disk accesses, CPU utilization and execution time, space utilization, …

Optimization Process

There are two main techniques for query optimization:
Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition
Materialized evaluation: Generation of an intermediate result (relation) for each operation.
Pipeline evaluation: Combining several operations, so that the result of one operation is passed directly to the next.

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
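The difference between the two evaluation styles can be illustrated with Python generators. In this sketch the join is a simple equi-join on tuple positions a and b, and projection picks tuple positions; all names are hypothetical.

```python
def join(r, s, a, b):
    """Generate joined tuples one at a time."""
    for tr in r:
        for ts in s:
            if tr[a] == ts[b]:
                yield tr + ts


def project(tuples, positions):
    """Consume tuples one at a time and emit the projected attributes."""
    for t in tuples:
        yield tuple(t[p] for p in positions)


def materialized(r, s, a, b, positions):
    temp = list(join(r, s, a, b))       # the whole intermediate relation is stored
    return list(project(temp, positions))


def pipelined(r, s, a, b, positions):
    # Each joined tuple flows straight into the projection; no temporary relation.
    return list(project(join(r, s, a, b), positions))


r = [(1, "a1"), (2, "a2")]
s = [(1, "b1"), (2, "b2"), (2, "b3")]
m = materialized(r, s, 0, 0, [1, 3])
p = pipelined(r, s, 0, 0, [1, 3])
```

Both functions return the same tuples; they differ only in whether the join result is stored before the projection starts.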

Assume the following relations:
S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:
SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid = 100 ∧ rating > 5}(R ⋈_{Sid=Sid} S))

Π_{Sname}
└─ σ_{bid = 100 ∧ rating > 5}
   └─ ⋈_{Sid=Sid}
      ├─ R
      └─ S

Π_{Sname}((σ_{bid = 100}(R)) ⋈_{Sid=Sid} (σ_{rating > 5}(S)))

Π_{Sname}
└─ ⋈_{Sid=Sid}
   ├─ σ_{bid = 100}
   │  └─ R
   └─ σ_{rating > 5}
      └─ S
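The two expressions for this query can be compared on a tiny example. The tuples below are invented for illustration and follow the attribute order of the schemas: R(Sid, bid, day, rname) and S(Sid, Sname, rating, age).

```python
R = [(1, 100, "d1", "r1"), (2, 100, "d2", "r2"), (3, 101, "d3", "r3")]
S = [(1, "Ann", 7, 30.0), (2, "Bob", 3, 41.5), (3, "Cy", 9, 25.0)]


def select_after_join(R, S):
    # Join first; in a joined tuple, positions 0-3 come from R and 4-7 from S.
    joined = [r + s for r in R for s in S if r[0] == s[0]]
    names = [t[5] for t in joined if t[1] == 100 and t[6] > 5]
    return names, len(joined)


def select_before_join(R, S):
    # Push each selection down to its own relation, then join the smaller inputs.
    r_sel = [r for r in R if r[1] == 100]
    s_sel = [s for s in S if s[2] > 5]
    joined = [r + s for r in r_sel for s in s_sel if r[0] == s[0]]
    return [t[5] for t in joined], len(joined)


late_names, late_size = select_after_join(R, S)
early_names, early_size = select_before_join(R, S)
```

Both plans return the same names, but the second one builds a smaller intermediate join result, which is the point of performing selections as early as possible.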

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.
In this case, articulate the way the previous query is going to be executed.

Π_{Sname}  (on the fly)
└─ σ_{bid = 100 ∧ rating > 5}  (on the fly)
   └─ ⋈_{Sid=Sid}
      ├─ R
      └─ S

Π_{Sname}  (on the fly)
└─ ⋈_{Sid=Sid}
   ├─ σ_{bid = 100}
   │  └─ R
   └─ σ_{rating > 5}
      └─ S

Cost of Plan

The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: Make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)
Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
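The two search methods can be contrasted in a short sketch. The file is modelled as a Python list of tuples ordered on the key in position 0, and the sorted key column is kept in a separate list so that the standard bisect module can be used; the sample records are invented.

```python
from bisect import bisect_left


def linear_search(records, key_pos, value):
    """Test every record against the selection condition."""
    return [rec for rec in records if rec[key_pos] == value]


def binary_search(records, keys, value):
    """Equality selection on the ordering key; returns one record or None."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return records[i]
    return None


employees = [(101, "Smith"), (205, "Wong"), (333, "Zelaya"), (478, "Wallace")]
keys = [rec[0] for rec in employees]

found = binary_search(employees, keys, 333)
missing = binary_search(employees, keys, 400)
scanned = linear_search(employees, 1, "Wong")
```

The linear search works on any attribute, while the binary search relies on the file being ordered on the key it searches.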

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note: in this case, the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)
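A range selection on a sorted file can be sketched with the standard bisect module, which plays the role of the primary index here. The list representation, the separate key column, and the sample departments are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def range_select(records, keys, op, value):
    """records are stored sorted on the key; keys is the sorted key column."""
    if op == ">":
        return records[bisect_right(keys, value):]
    if op == ">=":
        return records[bisect_left(keys, value):]
    if op == "<":
        return records[:bisect_left(keys, value)]
    if op == "<=":
        return records[:bisect_right(keys, value)]
    raise ValueError(f"unsupported operator: {op}")


departments = [(1, "Headquarters"), (4, "Administration"),
               (5, "Research"), (7, "Sales")]
keys = [d[0] for d in departments]

greater = range_select(departments, keys, ">", 5)
at_most = range_select(departments, keys, "<=", 4)
```

The index lookup finds the boundary position, and the result is simply the contiguous run of records on one side of it.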

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: A conjunctive selection is of the following form:
σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: A disjunctive selection is of the following form:
σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.
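This strategy can be sketched as follows, assuming one equality condition has an index (modelled as a dictionary from value to record positions) and the remaining conditions are plain Python predicates. The relation, the attribute names, and the function name are hypothetical.

```python
def conjunctive_select(records, index, indexed_value, remaining):
    """Use the index for one condition, then filter on the other conditions."""
    candidates = (records[rid] for rid in index.get(indexed_value, []))
    return [rec for rec in candidates if all(cond(rec) for cond in remaining)]


employees = [
    {"ssn": 1, "dno": 5, "salary": 30000, "sex": "F"},
    {"ssn": 2, "dno": 5, "salary": 45000, "sex": "M"},
    {"ssn": 3, "dno": 4, "salary": 50000, "sex": "F"},
    {"ssn": 4, "dno": 5, "salary": 52000, "sex": "F"},
]

# Index on dno: value -> list of record positions.
dno_index = {}
for rid, rec in enumerate(employees):
    dno_index.setdefault(rec["dno"], []).append(rid)

# dno = 5 AND salary > 40000 AND sex = 'F'
result = conjunctive_select(
    employees, dno_index, 5,
    [lambda e: e["salary"] > 40000, lambda e: e["sex"] == "F"],
)
```

Only the records reached through the index are tested against the remaining conditions, so the rest of the file is never read.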



72

Optimization ProcessGenerate query plans mdash The final stage of

optimization involve the construction of a set ofcandidate query plans and the choice of ldquothe best ofthese plansrdquoChoosing the cheapest plan naturally requires a

method for assigning a cost to any given plan mdashThis cost formula should estimate the number ofdisk accesses CPU utilization and execution timespace utilizationhellip

Database Systems

73

Optimization ProcessThere are two main techniques for query

optimizationHeuristic rulesSystematic estimation approach

In this course as noted before we will talkabout the heuristic rules

Database Systems

74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to

each relation As the algorithm proceeds the pointers movethrough the relations

Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is

bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory

Database Systems

102

103

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values; these matching pairs form the result of the join operation.
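A small sketch of the hash-join idea, assuming in-memory lists stand in for the files and Python's built-in `hash` modulo a bucket count stands in for the hashing function. The names and the data are hypothetical. Records with different join values can land in the same bucket, so the bucket contents still have to be compared.

```python
from typing import Dict, List

Row = Dict[str, int]


def hash_join(r: List[Row], s: List[Row], attr_r: str, attr_s: str,
              n_buckets: int = 4) -> List[Row]:
    """Partition r and s with the same hash function, then join bucket by bucket."""
    buckets_r: List[List[Row]] = [[] for _ in range(n_buckets)]
    buckets_s: List[List[Row]] = [[] for _ in range(n_buckets)]
    for t_r in r:                                   # single pass over r
        buckets_r[hash(t_r[attr_r]) % n_buckets].append(t_r)
    for t_s in s:                                   # single pass over s
        buckets_s[hash(t_s[attr_s]) % n_buckets].append(t_s)

    result: List[Row] = []
    for bucket_r, bucket_s in zip(buckets_r, buckets_s):
        for t_r in bucket_r:
            for t_s in bucket_s:
                # Same bucket does not imply same value, so check explicitly.
                if t_r[attr_r] == t_s[attr_s]:
                    result.append({**t_r, **t_s})
    return result


# Hypothetical data; 2, 6 and 10 share a bucket when n_buckets is 4.
r = [{"A": a} for a in range(1, 9)]
s = [{"B": b} for b in (2, 4, 4, 6, 10)]
joined = hash_join(r, s, "A", "B")
```

Only records that fall in the same bucket are ever compared, which is where the saving over a nested loop comes from.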

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s, for i = 1, …, n.
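Both strategies for complex join conditions can be illustrated with a short sketch. Assumptions: relations are lists of dictionaries, each condition θi is a Python predicate over a pair of tuples, and results are represented as sets of index pairs (i, j) so that the union removes pairs satisfying more than one condition. A plain nested loop stands in for whichever join technique would apply to an individual condition. All names and data are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, int]
Theta = Callable[[Row, Row], bool]
Pairs = Set[Tuple[int, int]]


def simple_join(r: List[Row], s: List[Row], theta: Theta) -> Pairs:
    """Join on a single condition; returns the matching index pairs (i, j)."""
    return {(i, j) for i, t_r in enumerate(r) for j, t_s in enumerate(s) if theta(t_r, t_s)}


def conjunctive_join(r: List[Row], s: List[Row], thetas: List[Theta]) -> Pairs:
    """Compute the join on the first condition, then filter by the remaining ones."""
    first, *rest = thetas
    intermediate = simple_join(r, s, first)
    return {(i, j) for (i, j) in intermediate if all(t(r[i], s[j]) for t in rest)}


def disjunctive_join(r: List[Row], s: List[Row], thetas: List[Theta]) -> Pairs:
    """Union of the individual joins, one per condition."""
    result: Pairs = set()
    for theta in thetas:
        result |= simple_join(r, s, theta)
    return result


# Hypothetical relations r(A, C) and s(B, D)
r = [{"A": 1, "C": 10}, {"A": 2, "C": 20}]
s = [{"B": 1, "D": 5}, {"B": 2, "D": 30}]
theta1: Theta = lambda t_r, t_s: t_r["A"] == t_s["B"]
theta2: Theta = lambda t_r, t_s: t_r["C"] > t_s["D"]

both = conjunctive_join(r, s, [theta1, theta2])
either = disjunctive_join(r, s, [theta1, theta2])
```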

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
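The two duplicate-elimination strategies can be sketched as follows, assuming the relation is an in-memory list of dictionaries and a Python `set` plays the role of the hash table. The function names and the sample Employee tuples are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def project_sort(rows: List[Row], attrs: List[str]) -> List[Tuple]:
    """Project, sort, then drop adjacent duplicates in a single scan."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result: List[Tuple] = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(rows: List[Row], attrs: List[str]) -> List[Tuple]:
    """Project and keep a tuple only the first time its hash entry is seen."""
    seen = set()
    result: List[Tuple] = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical Employee tuples; Dno is not a key, so projecting on it yields duplicates.
employees = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 4},
]
by_sort = project_sort(employees, ["Dno"])
by_hash = project_hash(employees, ["Dno"])
with_key = project_hash(employees, ["Ssn", "Dno"])
```

The sorted variant returns the tuples in sorted order, while the hashed variant keeps the order of first appearance; projecting on a list that includes the key Ssn removes nothing.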

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

Query tree for the expression Π(σ(Project × (Department × Employee))):

Π_{Pnumber, Dnum, Lname, Address, Bdate}
    σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}
        ×
            Project
            ×
                Department
                Employee

Query Optimization: An Example

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 100 × 20 × 5,000 = 10 million tuples, each of 100 + 50 + 150 = 300 bytes, i.e., about 3 × 10^9 bytes of intermediate data.

Query Optimization: An Example

However, the above query, based on the schemas of the relations, can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)

Query Optimization: An Example

Query tree for the selection-first plan:

Π_{Pnumber, Dnum, Lname, Address, Bdate}
    ⋈_{MGRSSN=SSN}
        ⋈_{Dnum=Dnumber}
            σ_{Plocation='California'}
                Project
            Department
        Employee
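The difference between the two plans can be made concrete with a toy sketch. Everything below is an assumption made for illustration: the three tiny relations, their values, and the function names are hypothetical, and the projection is shortened to (Pnumber, Lname). The sketch evaluates both plans, checks that they return the same answer, and records how many intermediate tuples each one builds.

```python
from itertools import product
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical, tiny instances of the three relations
project: List[Row] = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 10},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 20},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 20},
]
department: List[Row] = [{"Dnumber": 10, "MGRSSN": 111}, {"Dnumber": 20, "MGRSSN": 222}]
employee: List[Row] = [
    {"SSN": 111, "Lname": "Smith"},
    {"SSN": 222, "Lname": "Wong"},
    {"SSN": 333, "Lname": "Zelaya"},
]


def plan_cartesian() -> Tuple[List[Tuple], int]:
    """Cartesian products first, one big selection afterwards."""
    inter = [{**p, **d, **e} for p, d, e in product(project, department, employee)]
    rows = [t for t in inter
            if t["Plocation"] == "California"
            and t["Dnum"] == t["Dnumber"]
            and t["MGRSSN"] == t["SSN"]]
    return sorted((t["Pnumber"], t["Lname"]) for t in rows), len(inter)


def plan_select_first() -> Tuple[List[Tuple], int]:
    """Selection first, then the two joins on their own conditions."""
    sel = [p for p in project if p["Plocation"] == "California"]
    j1 = [{**p, **d} for p in sel for d in department if p["Dnum"] == d["Dnumber"]]
    j2 = [{**t, **e} for t in j1 for e in employee if t["MGRSSN"] == e["SSN"]]
    largest = max(len(sel), len(j1), len(j2))
    return sorted((t["Pnumber"], t["Lname"]) for t in j2), largest


answer_a, size_a = plan_cartesian()
answer_b, size_b = plan_select_first()
```

On this toy data the first plan materializes 3 × 2 × 3 = 18 intermediate tuples, while the largest intermediate result of the second plan has 2 tuples.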


Optimization Process

There are two main techniques for query optimization:

Heuristic rules
Systematic estimation approach

In this course, as noted before, we will talk about the heuristic rules.

Optimization Process: heuristic rules

Perform selection operations as early as possible.
Perform projections early.
It is usually better to perform selections earlier than projections.

Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

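Definition

Materialized evaluation: the result of each operation is generated and stored as an intermediate relation, which then serves as input to the next operation.
Pipelined evaluation: several operations are combined, so that the tuples produced by one operation are passed directly to the next.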

Assume we want to perform

Π_{a1, a2}(r ⋈ s)

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, we can do the following: when the join operation generates a tuple, it is passed directly to the project operation for processing.
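The contrast between the two evaluation styles can be sketched with Python generators. The assumptions here are that r and s share a join attribute named `k` (a hypothetical name), that the relations are small in-memory lists, and that the projection does not eliminate duplicates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def join(r: Iterable[Row], s: List[Row]) -> Iterator[Row]:
    """Natural join on the shared attribute 'k', producing one tuple at a time."""
    for t_r in r:
        for t_s in s:
            if t_r["k"] == t_s["k"]:
                yield {**t_r, **t_s}


def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Tuple]:
    """Projection that consumes its input tuple by tuple."""
    for row in rows:
        yield tuple(row[a] for a in attrs)


# Hypothetical relations r(k, a1) and s(k, a2)
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": "p"}, {"k": 2, "a2": "q"}, {"k": 2, "a2": "r"}]

# Materialized evaluation: the whole join result is stored before projecting.
intermediate = list(join(r, s))
materialized = list(project(intermediate, ["a1", "a2"]))

# Pipelined evaluation: each joined tuple flows straight into the projection.
pipelined = list(project(join(r, s), ["a1", "a2"]))
```

Both variants produce the same tuples; the pipelined one never holds the full join result in memory.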

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

Π_{Sname}(σ_{bid=100 ∧ rating>5}(R ⋈_{Sid=Sid} S))

Π_{Sname}
    σ_{bid=100 ∧ rating>5}
        ⋈_{Sid=Sid}
            R
            S

Π_{Sname}((σ_{bid=100} R) ⋈_{Sid=Sid} (σ_{rating>5} S))

Π_{Sname}
    ⋈_{Sid=Sid}
        σ_{bid=100}
            R
        σ_{rating>5}
            S

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

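[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree (Π_{Sname} over σ_{bid=100 ∧ rating>5} over R ⋈_{Sid=Sid} S), two operators are marked "On the fly". In the second tree (Π_{Sname} over (σ_{bid=100} R) ⋈_{Sid=Sid} (σ_{rating>5} S)), one operator is marked "On the fly".]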

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
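The two search methods can be sketched as follows, assuming the file is an in-memory list of records and, for binary search, that it is ordered on the key attribute. The function names and the sample records are hypothetical; `bisect_left` is from Python's standard `bisect` module.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

Row = Dict[str, object]


def linear_search(records: List[Row], attr: str, value: object) -> List[Row]:
    """Test every record; works on any attribute, ordered or not."""
    return [rec for rec in records if rec[attr] == value]


def binary_search(records: List[Row], keys: List[int], value: int) -> Optional[Row]:
    """Equality lookup on the ordering key; 'keys' is the sorted list of key values."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return records[pos]
    return None


# Hypothetical Employee file, ordered on the key attribute Ssn
employees = [
    {"Ssn": 100, "Lname": "Smith", "Dno": 5},
    {"Ssn": 200, "Lname": "Wong", "Dno": 5},
    {"Ssn": 300, "Lname": "Zelaya", "Dno": 4},
]
ssn_keys = [e["Ssn"] for e in employees]

found = binary_search(employees, ssn_keys, 200)
missing = binary_search(employees, ssn_keys, 250)
in_dept_5 = linear_search(employees, "Dno", 5)
```

Binary search inspects about log2(n) records, whereas the linear scan always inspects all n.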

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)

σ_{SSN=123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: if the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, and then retrieve all the subsequent records in the file (for > and ≥) or all the preceding records (for < and ≤). (Note: in this case, the data is also sorted.)

σ_{DNUMBER>5}(DEPARTMENT)
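A minimal sketch of this range retrieval, assuming the sorted file is an in-memory list and a sorted list of key values stands in for the primary index. The function names and the Department records are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List

Row = Dict[str, int]


def select_greater_than(records: List[Row], keys: List[int], value: int) -> List[Row]:
    """Locate the position just past 'value' and return every subsequent record."""
    return records[bisect_right(keys, value):]


def select_less_than(records: List[Row], keys: List[int], value: int) -> List[Row]:
    """Locate the first position holding 'value' and return every preceding record."""
    return records[:bisect_left(keys, value)]


# Hypothetical Department file, sorted on the key field Dnumber
departments = [{"Dnumber": n} for n in (1, 4, 5, 7, 9)]
dnumber_keys = [d["Dnumber"] for d in departments]

above_5 = select_greater_than(departments, dnumber_keys, 5)   # σ_{DNUMBER>5}
below_5 = select_less_than(departments, dnumber_keys, 5)      # σ_{DNUMBER<5}
```

Because the file is sorted on the key, one lookup fixes the boundary and the rest is a sequential read.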

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO=5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: if an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: if an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
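The union-of-pointers technique, including the fallback to a linear scan, can be sketched as below. Assumptions: record pointers are list positions, each index is a dictionary from attribute value to a set of positions, and all conditions are equality conditions. The names and the sample Employee records are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Index = Dict[object, Set[int]]


def build_index(records: List[Row], attr: str) -> Index:
    """Map each attribute value to the set of record pointers (list positions)."""
    index: Index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index


def disjunctive_select(records: List[Row], indexes: Dict[str, Index],
                       conditions: List[Tuple[str, object]]) -> List[Row]:
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over 'records'."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition has no access path: fall back to a linear scan.
        return [rec for rec in records if any(rec[a] == v for a, v in conditions)]
    pointers: Set[int] = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())    # union of record pointers
    return [records[rid] for rid in sorted(pointers)]


# Hypothetical Employee records
employees = [
    {"Ssn": 1, "Dno": 5, "Sex": "F"},
    {"Ssn": 2, "Dno": 4, "Sex": "M"},
    {"Ssn": 3, "Dno": 5, "Sex": "M"},
    {"Ssn": 4, "Dno": 1, "Sex": "F"},
]
conditions = [("Dno", 5), ("Sex", "F")]

only_dno = {"Dno": build_index(employees, "Dno")}
both = {"Dno": build_index(employees, "Dno"), "Sex": build_index(employees, "Sex")}

via_scan = disjunctive_select(employees, only_dno, conditions)   # Sex has no index
via_union = disjunctive_select(employees, both, conditions)
```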

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: for each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume nr = |r| and ns = |s| are the cardinalities of the relations. Finally, assume br and bs are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each tr ∈ r do begin
    For each ts ∈ s do begin
        If r.A Θ s.B is true then add tr || ts to the result
    end
end

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is nr × ns, since every pair of tuples is examined.

In the best case, both relations fit into main memory, and hence we need only bs + br block accesses.
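The nested loop algorithm and its nr × ns behaviour can be sketched as follows, assuming in-memory lists for the relations and a comparison function from Python's standard `operator` module for Θ. The function name and the data are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


def nested_loop_join(
    r: List[Row],
    s: List[Row],
    attr_r: str,
    attr_s: str,
    theta: Callable[[int, int], bool] = operator.eq,
) -> Tuple[List[Row], int]:
    """Theta-join by examining every pair; returns (result, pairs examined)."""
    result: List[Row] = []
    comparisons = 0
    for t_r in r:                     # outer loop over r
        for t_s in s:                 # inner loop over s
            comparisons += 1
            if theta(t_r[attr_r], t_s[attr_s]):
                result.append({**t_r, **t_s})   # tr || ts
    return result, comparisons


# Hypothetical relations with nr = 3 and ns = 2, joined on r.A < s.B
r = [{"A": 1}, {"A": 3}, {"A": 5}]
s = [{"B": 2}, {"B": 4}]
joined, examined = nested_loop_join(r, s, "A", "B", theta=operator.lt)
```

The counter equals nr × ns regardless of the join condition, which is why nested loop works for any Θ but is the most expensive option.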

Query Optimization: JOIN Operation (nested loop)

If even one of the relations fits entirely in main memory, the cost is still bs + br block accesses: the relation that fits is kept in memory as the inner relation, and the other relation is read once.



74

Optimization Process heuristic rules

Perform selection operations as early aspossiblePerform projections earlyIt is usually better to perform selections earlier

than projections

Database Systems

75

Optimization Process heuristic rules

Based on heuristic rules the optimizer usesequivalence relationships to reorder operationsin a query for execution

Database Systems

DefinitionMaterialized evaluation Generation of

intermediate result (relation)Pipeline evaluation Combining several

operations

76

Database Systems

Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization mdash JOIN Operation

Sort-merge If the records of r and s are physicallysorted by the value of the join attributes then thistechnique can be applied by scanning r and slinearly

Database Systems

Query Optimization mdash JOIN Operation (Merge)1 pointer initially pointing to the first tuple is assigned to

each relation As the algorithm proceeds the pointers movethrough the relations

Since the relations are sorted each tuple is accessed onceand hence the number of block accesses is

bs + brAssuming that the set of all tuples with the same value forthe join attributes fit in the main memory

Database Systems

102

103

Query Optimization mdash JOIN Operation

hash-join The records of both files r and s arehashed to the same hash file using the same hashingfunction A single pass through each file hashesthe records to the hash file buckets Each bucket isthen examined for records from r and s withmatching join attribute values to produce a possibleresult for the join operation

Database Systems

Query Optimization mdash Complex JOIN Operation

Nested loop join can be used regardless of thejoin condition The other join techniquesthough more efficient than nested loop canhandle simple join conditionsJoin with complex join conditions (i e

conjunctive and disjunctive conditions) can beimplemented using techniques discussed forconjunctive and disjunctive selections

Database Systems

104

Query Optimization mdash Complex JOIN Operation

Consider the following join operation

One or more of the join techniques may beapplicable for joins on individual conditionsWe can perform the overall join by first computing

one of the simpler joins say The result ofcomplete join consists of those tuples in theintermediate result that satisfy the remainingconditions

Database Systems

105

r θ1andθ2and hellip andθn s

r θ1 s

Query Optimization mdash Complex JOIN OperationNow consider the following join operation

The join can be performed as the union of the tuples inindividual joins

Database Systems

106

r θ1orθ2or hellip orθn s

r θi s

107

Query Optimization mdash Project Operation

A project operation Πltattribute-listgt(R) isstraightforward to implement if ltattribute listgtincludes a key of relation RIf ltattribute listgt does not include a key then we

may end up with duplicates Duplicates can beeliminated by sorting the result and theneliminating the duplicate or by using hashingtechnique

Database Systems

108

Query Optimization mdash Set Operations

Cartesian product is very expensive operation toperform Hence it is important to avoid it as muchas possibleThe other set operations can be implemented by

sorting the relations and then a single scan througheach relation is sufficient to generate the resultHashing technique is another way to implement

Union intersection and difference operations

Database Systems

QuestionsDevise algorithms to perform variation of outer

join operationsDevise algorithms to perform aggregate

operations

Database Systems

109

Query Optimization mdash An ExampleAssume the following relationsDepartment (Dname Dnumber Mgr-ssn hellip)Project (Pname Pnumber Plocation Dnum)Employee (Fname Lname Ssn Bdate address Dno hellip)

Database Systems

111

Query Optimization mdash An ExampleSELECT Pnumber Dnum Lname Bdate

AddressFROM Project Department EmployeeWHERE Dnum = Dnumber

AND MGRSSN = SSNAND Plocation = lsquoCaliforniarsquo

Database Systems

Query Optimization mdash An Example

The above query can be translated into

ΠPnumberDnumLnameAddressBdate(σPlocation=ldquocaliforniardquo and Dnum=Dnumber and

MNGSSN=SSN (Project times (Department times Employee)))

Database Systems

112

Query Optimization mdash An Example

Database Systems

ΠPnumberDnumLnameAddressBdate

Project

σPlocation=ldquocaliforniardquo and Dnum=Dnumber and MNGSSN=SSN

Employee

Department

times

times

113

Database Systems

Query Optimization mdash An Example

The previous scenario will result in an inefficientquery processing Assume Project Departmentand Employee relations had tuples sizes of 100 50and 150 bytes and contained 100 20 and 5000tuples respectively Then the Cartesian productswould generate a relation of 10 million tuples eachof 300 bytes

However, based on the schemas of the relations, the above query can instead be translated into:

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}\big(((\sigma_{\text{Plocation = 'California'}}(\text{Project})) \bowtie_{\text{Dnum = Dnumber}} \text{Department}) \bowtie_{\text{MGRSSN = SSN}} \text{Employee}\big)$

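[Figure: query tree for the expression above. The root is the projection Π on Pnumber, Dnum, Lname, Address, Bdate; below it is the join on MGRSSN = SSN, whose inputs are Employee and the join on Dnum = Dnumber; the inputs of that join are Department and the selection σ Plocation = 'California' applied to Project.]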


Optimization Process: heuristic rules

Based on heuristic rules, the optimizer uses equivalence relationships to reorder the operations in a query for execution.

Definition

Materialized evaluation: generation of an intermediate result (relation).

Pipeline evaluation: combining several operations, so that the result of one operation is fed to the next.

Assume we want to perform

$\Pi_{a_1, a_2}(r \bowtie s)$

We can perform the join operation, materialize the result, and then apply the projection.

Alternatively, when the join operation generates a tuple, the tuple is passed directly to the project operation for processing.
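The difference between the two evaluation styles can be sketched with Python generators. This is only an illustration under stated assumptions: tuples are dictionaries, the join is an equality join on a shared attribute passed as `key`, and all function names are hypothetical.

```python
def join(r, s, key):
    """Yield joined tuples one at a time (simple nested loop on a shared attribute)."""
    for tr in r:
        for ts in s:
            if tr[key] == ts[key]:
                yield {**tr, **ts}


def project(tuples, attrs):
    """Yield the projection of each incoming tuple as soon as it arrives."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


def materialized(r, s, key, attrs):
    intermediate = list(join(r, s, key))       # the whole join result is stored first
    return list(project(intermediate, attrs))


def pipelined(r, s, key, attrs):
    # Each joined tuple flows straight into project; no intermediate relation is built.
    return list(project(join(r, s, key), attrs))
```

Both functions return the same answer; the pipelined version simply never holds the full join result in memory.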

Assume the following relations:

S (Sid: integer, Sname: string, rating: integer, age: real)
R (Sid: integer, bid: integer, day: dates, rname: string)

Further assume the following query:

SELECT S.Sname
FROM R, S
WHERE R.Sid = S.Sid
  AND R.bid = 100 AND S.rating > 5

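$\Pi_{\text{Sname}}\big(\sigma_{\text{bid} = 100 \,\land\, \text{rating} > 5}(R \bowtie_{\text{Sid} = \text{Sid}} S)\big)$

[Figure: query tree for this expression. The root is Π Sname; below it is σ bid = 100 and rating > 5; below that is the join of R and S on Sid = Sid.]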

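$\Pi_{\text{Sname}}\big((\sigma_{\text{bid} = 100}\, R) \bowtie_{\text{Sid} = \text{Sid}} (\sigma_{\text{rating} > 5}\, S)\big)$

[Figure: query tree for this expression. The root is Π Sname; below it is the join on Sid = Sid, whose inputs are σ bid = 100 applied to R and σ rating > 5 applied to S.]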

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

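[Figure: the two query trees for the previous query, annotated for pipelined execution. In the first tree (Π Sname over σ bid = 100 and rating > 5 over the join of R and S on Sid = Sid), two operators are marked "On the fly". In the second tree (Π Sname over the join on Sid = Sid of σ bid = 100 on R and σ rating > 5 on S), one operator is marked "On the fly".]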

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, … need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
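The two search methods can be contrasted in a small sketch. It assumes the file is an in-memory list of dictionaries and, for the binary search, that the list is sorted on the key attribute; the function names are hypothetical.

```python
def linear_select(records, attr, value):
    """Scan every record; works on unordered data and on non-key attributes."""
    return [rec for rec in records if rec[attr] == value]


def binary_select(sorted_records, key_attr, value):
    """Equality search on a key attribute; sorted_records must be ordered on it."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k = sorted_records[mid][key_attr]
        if k == value:
            return sorted_records[mid]   # key attribute: at most one match
        if k < value:
            lo = mid + 1                 # continue in the upper half
        else:
            hi = mid - 1                 # continue in the lower half
    return None
```

The linear search inspects all n records, whereas the binary search inspects about log2(n) of them.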

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

$\sigma_{\text{SSN} = 123456789}(\text{EMPLOYEE})$

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note that in this case the data is also sorted.)

$\sigma_{\text{DNUMBER} > 5}(\text{DEPARTMENT})$
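A range selection over a file ordered on the key can be sketched as follows. The sorted list `sorted_keys` stands in for the primary index here, which is a simplifying assumption; the function name is hypothetical.

```python
from bisect import bisect_right


def select_greater_than(sorted_keys, records, value):
    """records[i] has key sorted_keys[i]; both are in ascending key order."""
    # Locate the first position whose key is strictly greater than value ...
    start = bisect_right(sorted_keys, value)
    # ... then every subsequent record in the sorted file qualifies.
    return records[start:]
```

For example, with keys 1, 3, 5, 7, 9 the condition DNUMBER > 5 starts the scan at the record with key 7.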

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{\text{DNO} = 5}(\text{EMPLOYEE})$

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \land \theta_2 \land \dots \land \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions to the retrieved records.
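A minimal sketch of this strategy, assuming the access path is a dictionary that maps an attribute value to a list of record positions (a stand-in for a real index) and the remaining conditions are Python predicates; all names are hypothetical.

```python
def conjunctive_select(records, index, indexed_value, other_predicates):
    """Use the index for one condition, then filter with the remaining conditions."""
    # Step 1: the access path narrows the candidates to the matching record positions.
    candidates = (records[rid] for rid in index.get(indexed_value, []))
    # Step 2: the rest of the conjunction is checked only on those candidates.
    return [rec for rec in candidates if all(p(rec) for p in other_predicates)]
```

Only the records reachable through the index are ever tested against the other conditions, which is where the saving comes from.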

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
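The union-of-pointers idea can be sketched as below. As an assumption for illustration, each index is a dictionary from attribute value to a list of record positions, and each disjunct is an equality condition; the function name is hypothetical.

```python
def disjunctive_select(records, lookups):
    """lookups: one (index, value) pair per disjunct; every disjunct needs an index."""
    pointers = set()
    for index, value in lookups:
        # Collect the record pointers that satisfy this individual condition.
        pointers |= set(index.get(value, []))
    # The union of the pointer sets identifies the qualifying tuples, each fetched once.
    return [records[rid] for rid in sorted(pointers)]
```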

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

$R \bowtie_{A = B} S$

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further assume $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume $b_r$ and $b_s$ are the numbers of blocks of each relation.

The following algorithm performs the nested loop join operation:

for each t_r ∈ r do begin
    for each t_s ∈ s do begin
        if r.A Θ s.B is true then add t_r || t_s to the result
    end
end
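The pseudocode above translates almost line for line into Python. In this sketch, relations are assumed to be lists of tuples, `a` and `b` are the positions of the join attributes, and Θ is any two-argument comparison (equality by default); these representation choices are assumptions.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Theta-join of r and s on r[a] Θ s[b] using two nested loops."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr[a], ts[b]):       # test the join condition
                result.append(tr + ts)    # tr || ts: concatenate the two tuples
    return result
```

Passing `operator.lt` instead of the default shows that the nested loop works for any comparison Θ, not only for equality.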

The cost of the nested loop algorithm is $n_r \times n_s$ (the number of tuple pairs examined).

In the best-case scenario, both relations fit into the physical space, and hence we need $b_s + b_r$ block accesses.

If one of the relations fits in the physical space, then the cost will be $b_s + b_r$ block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

for each block B_r of r do begin
    for each block B_s of s do begin
        for each t_r ∈ B_r do begin
            for each t_s ∈ B_s do begin
                if r.A Θ s.B is true then add t_r || t_s to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.
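The block access count can be verified with a small simulation. As an assumption, each relation is represented as a list of blocks, each block being a list of tuples, and reading a block is modelled by incrementing a counter; the function name is hypothetical and the join condition is fixed to equality for brevity.

```python
def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join on r[a] = s[b]; also returns the number of simulated block reads."""
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # read one block of r
        for block_s in s_blocks:
            block_reads += 1              # read one block of s per block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads
```

With two blocks of r and three blocks of s, the counter ends at 2 × 3 + 2 = 8, matching the formula.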

How can we improve the block nested loop?

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].

$r \bowtie_{A = B} s$
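A sketch of this index-based join, assuming relations are lists of tuples and using an in-memory dictionary as a stand-in for the index or hash key on attribute B of s; the helper names are hypothetical.

```python
from collections import defaultdict


def build_index(s, b):
    """Map each value of the join attribute (position b) to the matching tuples of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index


def index_join(r, s_index, a):
    """For each tuple of r, probe the index on s instead of scanning all of s."""
    return [tr + ts for tr in r for ts in s_index.get(tr[a], [])]
```

Each tuple of r costs one index probe rather than a full pass over s.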

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

$b_s + b_r$

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
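The merge phase can be sketched as follows, assuming both relations are lists of tuples already sorted on their join attributes (positions `a` and `b`) and that each group of equal-valued tuples of s fits in memory, as stated above. The function name is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on position a) and s (sorted on position b)."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                             # advance the pointer on r
        elif r[i][a] > s[j][b]:
            j += 1                             # advance the pointer on s
        else:
            v = r[i][a]
            j_end = j                          # find the group of s tuples with value v
            while j_end < len(s) and s[j_end][b] == v:
                j_end += 1
            while i < len(r) and r[i][a] == v:
                for ts in s[j:j_end]:          # pair each matching r tuple with the group
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```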

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
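A minimal in-memory sketch of the hash-join idea. The bucket count `n_buckets`, the use of Python's built-in `hash`, and the tuple representation are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join on r[a] = s[b]; both relations are hashed with the same function."""
    buckets = defaultdict(lambda: ([], []))    # bucket -> (tuples from r, tuples from s)
    for tr in r:                               # single pass through r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                               # single pass through s
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets.values():    # examine each bucket on its own
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:             # equal hash does not guarantee equal value
                    result.append(tr + ts)
    return result
```

Tuples with equal join values always land in the same bucket, so matches never have to be searched for across buckets.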

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
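As a sketch of this two-step approach, assume θ1 is an equality on positions `a` and `b`, so the simpler join can use a dictionary lookup, and the remaining conditions are predicates over the concatenated tuple. The function name and representation are hypothetical.

```python
def conjunctive_join(r, s, a, b, remaining):
    """Join on theta_1 (r[a] = s[b]) first, then apply the remaining conditions."""
    index = {}
    for ts in s:                                   # access structure for theta_1
        index.setdefault(ts[b], []).append(ts)
    # Intermediate result: the simpler join on theta_1 alone.
    intermediate = (tr + ts for tr in r for ts in index.get(tr[a], []))
    # Keep only the tuples that also satisfy theta_2 ... theta_n.
    return [t for t in intermediate if all(p(t) for p in remaining)]
```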

Now consider the following join operation:

$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
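The union of individual joins can be sketched as below. For brevity each individual join is computed with a plain nested loop, although any suitable join technique could be used per condition; the function names and the set-of-tuples representation are assumptions.

```python
def theta_join(r, s, theta):
    """One individual join; theta takes a tuple of r and a tuple of s."""
    return {tr + ts for tr in r for ts in s if theta(tr, ts)}


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per disjunct."""
    result = set()
    for theta in thetas:
        result |= theta_join(r, s, theta)   # set union removes tuples found twice
    return result
```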





Assume we want to perform

77

Πa1 a2 (r s)

We can perform the join operation materialize the resultant and then apply projection

Alternatively we can do the following When the joinoperation generates a tuple it will be passes directly to the project operation for processing

Database Systems

Assume the following relationsS (Sid integer Sname string rating integer age real)R (Sid integer bid integer day dates rname string)

Further assume the following querySELECT SSname

FROM R SWHERE RSid = SSid

AND Rbid = 100 AND Srating gt 5

Database Systems

ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Query tree, plan 1: R and S feed ⋈_{Sid = Sid}; σ_{bid = 100 ∧ rating > 5} and Π_{Sname} are then applied on the fly.]

[Query tree, plan 2: σ_{bid = 100}(R) and σ_{rating > 5}(S) feed ⋈_{Sid = Sid}; Π_{Sname} is then applied on the fly.]

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" for each operation, and so on need to be taken into consideration.

Optimization Process: Search Methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search Methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use binary search if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
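As a rough sketch of the difference, the following Python compares a linear scan with a binary search on an ordered file. The record layout and function names are assumptions; the list of keys stands in for reading keys from disk blocks.

```python
from bisect import bisect_left

# Hypothetical file: records (key, payload) stored in key order.
records = [(3, "c"), (8, "h"), (15, "o"), (23, "w"), (42, "z")]

def linear_search(file, key):
    """Scan every record in turn; this also works on unordered files."""
    probes = 0
    for rec in file:
        probes += 1
        if rec[0] == key:
            return rec, probes
    return None, probes

def binary_search(file, key):
    """Equality search on the ordering key of a sorted file."""
    keys = [rec[0] for rec in file]  # stand-in for reading the key of each block
    i = bisect_left(keys, key)
    if i < len(file) and file[i][0] == key:
        return file[i]
    return None
```

The linear scan reports how many records it touched, which grows with the file size; the binary search needs only a logarithmic number of probes but requires the ordering.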

Optimization Process: Search Methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search Methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or all the preceding records, for < and ≤). (Note that in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)
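A minimal sketch of this range retrieval, assuming a DEPARTMENT file ordered on DNUMBER with invented department names; the sorted list of keys stands in for the primary index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical DEPARTMENT file ordered on the key DNUMBER; the names are made up.
department = [(1, "HQ"), (4, "Admin"), (5, "Research"), (7, "Sales"), (9, "Support")]
dnumbers = [d[0] for d in department]  # stand-in for the primary index

def select_greater(value):
    """DNUMBER > value: locate the boundary, then read all subsequent records."""
    return department[bisect_right(dnumbers, value):]

def select_less(value):
    """DNUMBER < value: read all records that precede the boundary."""
    return department[:bisect_left(dnumbers, value)]
```

For example, select_greater(5) corresponds to σ_{DNUMBER > 5}(DEPARTMENT) on this sample file.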

Query Optimization: Search Methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search Methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search Methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
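The idea can be sketched as follows, assuming a hypothetical EMPLOYEE file in which only the dno attribute has an access path. The record layout, the dictionary index, and the function name are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE records: (ssn, dno, salary).
employee = [(111, 5, 30000), (222, 5, 52000), (333, 4, 61000), (444, 5, 75000)]

# Assume an access path (index) exists on dno only.
dno_index = defaultdict(list)
for pos, rec in enumerate(employee):
    dno_index[rec[1]].append(pos)

def conjunctive_select(dno, min_salary):
    """Evaluate dno = ? AND salary > ? using the index for the first condition."""
    candidates = (employee[pos] for pos in dno_index.get(dno, []))
    # The remaining condition is checked only on the retrieved records.
    return [rec for rec in candidates if rec[2] > min_salary]
```

Only the records reached through the index are tested against the salary condition, so the rest of the file is never read.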

Query Optimization: Search Methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
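A short sketch of the pointer-union approach, assuming hypothetical indexes on both dno and salary; list positions play the role of record pointers, and all names are illustrative.

```python
# Hypothetical records (ssn, dno, salary); positions play the role of record pointers.
employee = [(111, 5, 30000), (222, 5, 52000), (333, 4, 61000), (444, 1, 75000)]

dno_index = {}
salary_index = {}
for ptr, (_, dno, salary) in enumerate(employee):
    dno_index.setdefault(dno, set()).add(ptr)
    salary_index.setdefault(salary, set()).add(ptr)

def disjunctive_select(dno, salary):
    """dno = ? OR salary = ?: union the pointer sets, then fetch each record once."""
    pointers = dno_index.get(dno, set()) | salary_index.get(salary, set())
    return [employee[p] for p in sorted(pointers)]
```

Because the union is taken over pointers rather than records, a tuple that satisfies both conditions is fetched only once.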

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each t_r ∈ r do begin
    For each t_s ∈ s do begin
        If r.A Θ s.B is true then add t_r || t_s to the result
    end
end
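The nested loop algorithm translates almost line for line into Python. In this sketch tuples are plain Python tuples, a and b are the positions of the join attributes, and theta is any comparison function; these conventions and the sample data are assumptions.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return tr || ts for every pair of tuples with tr[a] theta ts[b]."""
    result = []
    for tr in r:          # outer loop over r
        for ts in s:      # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
equi = nested_loop_join(r, s, 0, 0)                 # r.A = s.B
less = nested_loop_join(r, s, 0, 0, operator.lt)    # r.A < s.B
```

Because the condition is passed in as a function, the same loop handles equality and inequality joins alike.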

Query Optimization: JOIN Operation (nested loop)

The nested loop algorithm examines n_r × n_s pairs of tuples.

In the best case, both relations fit into the physical space (main memory), and hence we need b_s + b_r block accesses.

Query Optimization: JOIN Operation (nested loop)

If one of the relations fits in the physical space, then the cost will be b_s + b_r block accesses.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If r.A Θ s.B is true then add t_r || t_s to the result
            end
        end
    end
end

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve block nested loop?
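The block-access count can be checked with a small simulation. This is a sketch under simplifying assumptions: relations are in-memory lists, a "block" is a slice of a fixed number of tuples, and the helper names and the counter are invented for the illustration.

```python
def blocks(relation, block_size):
    """Split a relation (list of tuples) into blocks of at most block_size tuples."""
    return [relation[i:i + block_size] for i in range(0, len(relation), block_size)]

def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r and s block by block, counting simulated block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    reads, result = 0, []
    for Br in r_blocks:
        reads += 1                      # read one block of r
        for Bs in s_blocks:
            reads += 1                  # re-read each block of s for this block of r
            for tr in Br:
                for ts in Bs:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, reads

r = [(i, f"r{i}") for i in range(6)]    # 6 tuples, so b_r = 3 blocks of 2
s = [(i, f"s{i}") for i in range(8)]    # 8 tuples, so b_s = 4 blocks of 2
result, reads = block_nested_loop_join(r, s, 0, 0, block_size=2)
```

With b_r = 3 and b_s = 4 the counter ends at 3 × 4 + 3 = 15 reads, matching the formula b_r × b_s + b_r.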

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r one at a time and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].

r ⋈_{A=B} s
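A minimal sketch of this technique, in which a Python dictionary stands in for the existing index or hash key on attribute B of s. In a real system the index would already exist rather than being built inside the join; the function name and data are assumptions.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, a, b):
    """For each tr in r, probe an index on s[b] instead of scanning all of s."""
    index = defaultdict(list)           # stand-in for an existing index or hash key on B
    for ts in s:
        index[ts[b]].append(ts)
    return [tr + ts for tr in r for ts in index.get(tr[a], [])]

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
joined = index_nested_loop_join(r, s, 0, 0)
```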

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is b_s + b_r, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
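The merge step can be sketched as below, assuming both inputs are already sorted on their join attributes. The function name, the tuple layout, and the sample data are assumptions; the group of matching s tuples is held in memory, as the slide requires.

```python
def merge_join(r, s, a, b):
    """Equi-join two relations already sorted on r[a] and s[b]."""
    i = j = 0
    result = []
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            key = r[i][a]
            # Find the group of s tuples sharing this key (assumed to fit in memory).
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Pair every r tuple with this key against the whole group.
            while i < len(r) and r[i][a] == key:
                result.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return result

r = [(1, "a"), (3, "c"), (3, "d"), (5, "e")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backward, which is why each relation is scanned only once.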

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
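A simplified in-memory sketch of the hash-join idea follows. The number of buckets, the use of Python's built-in hash, and the function name are assumptions made for the illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, n_buckets=4):
    """Partition both relations with the same hash function, then join bucket by bucket."""
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for tr in r:                                   # single pass over r
        r_buckets[hash(tr[a]) % n_buckets].append(tr)
    for ts in s:                                   # single pass over s
        s_buckets[hash(ts[b]) % n_buckets].append(ts)
    result = []
    for bucket, r_part in r_buckets.items():
        for tr in r_part:
            for ts in s_buckets.get(bucket, []):
                if tr[a] == ts[b]:                 # same bucket does not imply equal keys
                    result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (6, "f")]
s = [(2, "x"), (6, "y"), (10, "z")]
joined = sorted(hash_join(r, s, 0, 0))
```

Tuples are compared only within a bucket, and the explicit equality test is still needed because different keys can collide in one bucket.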

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
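Both strategies for complex join conditions can be sketched on a toy example. The generic join helper, the two conditions theta1 and theta2, and the sample tuples are assumptions; a real system would choose a faster join method for each individual condition.

```python
def join(r, s, pred):
    """Generic join helper: keep tr || ts for every pair satisfying pred."""
    return [tr + ts for tr in r for ts in s if pred(tr, ts)]

r = [(1, 10), (2, 20), (3, 30)]
s = [(1, 15), (2, 20), (4, 30)]

def theta1(tr, ts):
    return tr[0] == ts[0]     # first attributes are equal

def theta2(tr, ts):
    return tr[1] == ts[1]     # second attributes are equal

# Conjunction: compute the simpler join on theta1, then filter by the remaining condition.
intermediate = join(r, s, theta1)
conjunctive = [t for t in intermediate if t[1] == t[3]]

# Disjunction: union of the individual joins (a set removes tuples produced twice).
disjunctive = sorted(set(join(r, s, theta1)) | set(join(r, s, theta2)))
```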

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
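The two duplicate-elimination options can be sketched as follows. The sample EMPLOYEE tuples and the function names are assumptions, and a Python set stands in for the hash table.

```python
def project_sort(rows, attrs):
    """Project, sort, then drop adjacent duplicates."""
    projected = sorted(tuple(row[i] for i in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(rows, attrs):
    """Project and keep a tuple only the first time it is seen in the hash table."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[i] for i in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

# Hypothetical records (ssn, lname, dno); projecting on (lname, dno) drops the key.
employee = [(111, "Lee", 5), (222, "Kim", 5), (333, "Lee", 5), (444, "Ng", 4)]
```

The sort-based version returns the result in sorted order, while the hash-based version preserves the order of first appearance.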

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
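The sort-then-scan approach can be sketched in one function that produces union, intersection, and difference together. The function name and the representation of relations as lists of comparable values are assumptions.

```python
def sorted_set_ops(r, s):
    """Compute union, intersection, and difference (r - s) in one merge scan."""
    r, s = sorted(set(r)), sorted(set(s))
    i = j = 0
    union, inter, diff = [], [], []
    while i < len(r) and j < len(s):
        if r[i] < s[j]:              # only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:            # only in s
            union.append(s[j])
            j += 1
        else:                        # in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])              # leftovers of r belong to the union and to r - s
    diff.extend(r[i:])
    union.extend(s[j:])              # leftovers of s belong to the union only
    return union, inter, diff
```

For instance, sorted_set_ops([3, 1, 2], [2, 3, 4]) scans each sorted input once and yields all three results.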

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:
Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Query tree: Department × Employee is combined with Project by a second ×; σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN} is applied to the product, with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root.]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
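The figures quoted for the Cartesian products follow from multiplying the cardinalities and adding the tuple sizes. The total size in bytes in the last line is an extra step, not stated on the slide, and assumes no storage overhead.

```latex
\begin{align*}
|Project \times Department \times Employee| &= 100 \times 20 \times 5000 = 10^{7} \text{ tuples} \\
\text{tuple size} &= 100 + 50 + 150 = 300 \text{ bytes} \\
\text{total size} &= 10^{7} \times 300 = 3 \times 10^{9} \text{ bytes} \approx 3 \text{ GB}
\end{align*}
```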

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))

Query Optimization: An Example

[Query tree: σ_{Plocation = "California"}(Project) is joined with Department on Dnum = Dnumber; the result is joined with Employee on MGRSSN = SSN, with Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root.]




ΠSname (σbid = 100 AND rating gt 5 (R Sid=Sid S ))

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

Database Systems

ΠSname ((σbid = 100 R) Sid=Sid (σrating gt 5 S ))

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

Database Systems

Assume the underlying platform canperform the basic relational operations inldquopipelinerdquo fashion ndash ie result of oneoperation is fed to another operationIn this case articulate the way the previous

query is going to be executed

Database Systems

σbid = 100 and rating gt 5

Sid = Sid

R S

ΠSname

On the fly

On the fly

σrating gt 5

Sid = Sid

R S

ΠSname

σbid = 100

On the fly

Database Systems

Cost of PlanThe cost associated with each plan needs to be

estimated This will be accomplished byestimating the cost of each operation

Factors such as size of relation (s) underlyingarchitecture buffer size size of the memoryldquoreduction factorrdquo for each operation hellip needto be taken into consideration

Database Systems

83

Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is br × bs + br.

How can we improve block nested loop?
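The cost expressions for the nested loop variants can be compared with a few lines of Python. The block counts below are made-up values chosen only for illustration. One observation that follows directly from the formula (and is not necessarily the answer the slides have in mind) is that the relation with fewer blocks should be the outer relation, since the product term is the same either way.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses for block nested loop: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer

def in_memory_cost(br, bs):
    """Block accesses when at least one relation fits in memory: br + bs."""
    return br + bs

br, bs = 100, 400                                 # hypothetical block counts
cost_r_outer = block_nested_loop_cost(br, bs)     # 100 * 400 + 100
cost_s_outer = block_nested_loop_cost(bs, br)     # 400 * 100 + 400
best_case = in_memory_cost(br, bs)                # 100 + 400
```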

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record tr ∈ r one at a time and then use the access structure to retrieve all the matching records ts ∈ s that satisfy tr[A] = ts[B].

r ⋈_{A=B} s
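As a rough sketch of this technique, the code below uses a Python dictionary as a stand-in for the index or hash key on attribute B of s. The helpers build_index and index_join and the sample rows are assumptions made for illustration; a real system would probe a disk-based index instead.

```python
from collections import defaultdict

def build_index(s, attr):
    """Map each value of attr to the list of s tuples that carry it."""
    index = defaultdict(list)
    for ts in s:
        index[ts[attr]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index on s.B with tr[a] instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):   # only the matching s tuples are touched
            result.append({**tr, **ts})
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 5, "y": "z"}]
joined = index_join(r, build_index(s, "B"), "A")
```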

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is bs + br, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
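The merge step can be sketched as below, assuming both inputs are lists of dictionaries already sorted on their join attributes. The function name merge_join and the sample data are hypothetical. The slice s[j:j_end] plays the role of the group of tuples with equal join value that is assumed to fit in main memory.

```python
def merge_join(r, s, a, b):
    """Equi-join r and s, where r is sorted on attribute a and s on attribute b."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1                     # find the end of the matching group in s
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:          # pair each r tuple with the whole group
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result

r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]
joined = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backward, which is what keeps the block accesses at bs + br for sorted inputs.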

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file, using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
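A minimal in-memory sketch of the hash-join idea follows. The bucket list stands in for the hash file, and the function name hash_join, the n_buckets parameter, and the sample data are assumptions made for illustration.

```python
def hash_join(r, s, a, b, n_buckets=4):
    # Each bucket keeps the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for tr in r:                                   # single pass over r
        buckets[hash(tr[a]) % n_buckets][0].append(tr)
    for ts in s:                                   # single pass over s, same hash function
        buckets[hash(ts[b]) % n_buckets][1].append(ts)
    result = []
    for r_part, s_part in buckets:                 # examine each bucket separately
        for tr in r_part:
            for ts in s_part:
                if tr[a] == ts[b]:                 # same bucket does not imply equal values
                    result.append({**tr, **ts})
    return result

r = [{"A": i} for i in range(10)]
s = [{"B": i, "y": i * i} for i in range(5, 15)]
joined = hash_join(r, s, "A", "B")
```

Tuples with equal join values always land in the same bucket, so no match is missed; the equality test inside a bucket is still needed because different values can collide.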

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions, such as equi-joins.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.
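The conjunctive strategy can be sketched as follows, assuming θ1 is an equality condition that a hash-based technique can evaluate and that the remaining conditions are arbitrary predicates on the joined tuple. The function name conjunctive_join and the sample attributes are hypothetical.

```python
from collections import defaultdict

def conjunctive_join(r, s, a, b, remaining):
    """Compute r join s on r[a] == s[b] first, then filter by the remaining conditions."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    # Intermediate result: the simpler join on theta_1 alone.
    intermediate = ({**tr, **ts} for tr in r for ts in index.get(tr[a], []))
    # Complete join: keep tuples that also satisfy theta_2 ... theta_n.
    return [t for t in intermediate if all(cond(t) for cond in remaining)]

r = [{"A": 1, "x": 10}, {"A": 2, "x": 20}]
s = [{"B": 1, "y": 5}, {"B": 2, "y": 50}]
joined = conjunctive_join(r, s, "A", "B", [lambda t: t["x"] > t["y"]])
```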

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
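A small sketch of the disjunctive case is shown below. Each individual join is evaluated here with a plain nested loop for brevity, and the union is taken over pairs of tuple positions so that a pair satisfying several conditions appears only once. The function name and the data are illustrative assumptions.

```python
def disjunctive_join(r, s, conditions):
    """Union of the individual joins of r and s, one per condition."""
    matched = set()                      # set of (position in r, position in s)
    for theta in conditions:             # one individual join per condition
        for i, tr in enumerate(r):
            for j, ts in enumerate(s):
                if theta(tr, ts):
                    matched.add((i, j))  # the set removes duplicates across joins
    return [{**r[i], **s[j]} for i, j in sorted(matched)]

r = [{"A": 1, "x": 1}, {"A": 2, "x": 9}]
s = [{"B": 1, "y": 1}, {"B": 3, "y": 9}]
joined = disjunctive_join(
    r, s,
    [lambda tr, ts: tr["A"] == ts["B"], lambda tr, ts: tr["x"] == ts["y"]],
)
```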

Query Optimization: Project Operation

A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
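Both duplicate-elimination strategies can be sketched in a few lines, assuming tuples are dictionaries and projected rows are represented as Python tuples. The function names and the three-row employee sample are hypothetical.

```python
def project_sort(relation, attrs):
    """Project, sort the result, then drop adjacent duplicates in one scan."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result

def project_hash(relation, attrs):
    """Project and use a hash set to skip rows that were already produced."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

employee = [{"Ssn": 1, "Dno": 5}, {"Ssn": 2, "Dno": 4}, {"Ssn": 3, "Dno": 5}]
by_sort = project_sort(employee, ["Dno"])
by_hash = project_hash(employee, ["Dno"])
```

Projecting on Ssn, which is a key in this sample, yields no duplicates, so neither the sort nor the hash set would be needed.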

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
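The sort-then-scan approach can be sketched as below for union, intersection, and difference (r minus s), all produced in one simultaneous scan. The sketch assumes each input is a duplicate-free list of comparable rows; the function name is hypothetical.

```python
def set_ops_by_sorting(r, s):
    """Return (union, intersection, r minus s) using one scan of the sorted inputs."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # row only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # row only in s
            union.append(s[j])
            j += 1
        else:                           # row in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])                 # leftover rows of r belong to union and difference
    difference.extend(r[i:])
    union.extend(s[j:])                 # leftover rows of s belong to the union only
    return union, intersection, difference

u, n, d = set_ops_by_sorting([(3,), (1,), (2,)], [(2,), (4,)])
```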

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Figure: initial query tree. The root Π_{Pnumber, Dnum, Lname, Address, Bdate} is applied to σ_{Plocation='California' ∧ Dnum=Dnumber ∧ MGRSSN=SSN}, which is applied to the Cartesian product Project × (Department × Employee).]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
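The size of the Cartesian product in this example can be checked with a short calculation. The figures are the ones given in the text; the variable names are only for illustration.

```python
import math

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}    # bytes per tuple
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}  # tuples per relation

# A Cartesian product pairs every tuple with every other, so cardinalities multiply.
product_tuples = math.prod(tuple_counts.values())
# Each product tuple concatenates one tuple from each relation, so widths add.
product_tuple_size = sum(tuple_sizes.values())
product_bytes = product_tuples * product_tuple_size
```

The product holds 10,000,000 tuples of 300 bytes each, i.e. 3,000,000,000 bytes of intermediate data, before the selection discards almost all of it.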

Query Optimization: An Example

However, the above query, based on the schemas of the relations, can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)

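Query Optimization: An Example

[Figure: improved query tree. σ_{Plocation='California'} is applied to Project; the result is joined with Department on Dnum=Dnumber, and that result is joined with Employee on MGRSSN=SSN; Π_{Pnumber, Dnum, Lname, Address, Bdate} is applied at the root.]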


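Π_{Sname}((σ_{bid=100} R) ⋈_{Sid=Sid} (σ_{rating>5} S))

[Figure: query tree for the expression above. σ_{bid=100} is applied to R and σ_{rating>5} to S; the two results are joined on Sid=Sid; Π_{Sname} is applied at the root.]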

Assume the underlying platform can perform the basic relational operations in "pipeline" fashion, i.e., the result of one operation is fed to another operation.

In this case, articulate the way the previous query is going to be executed.

[Figure: two query trees for the previous query. Plan 1: Π_{Sname} (on the fly) over σ_{bid=100 ∧ rating>5} (on the fly) over R ⋈_{Sid=Sid} S. Plan 2: Π_{Sname} (on the fly) over (σ_{bid=100} R) ⋈_{Sid=Sid} (σ_{rating>5} S).]
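Pipelined ("on the fly") evaluation can be illustrated with Python generators, where each operator pulls tuples from the one below it and no intermediate result is stored, except that this sketch materializes the inner join input. This is one possible reading of the exercise, not the slides' own answer; the operator names, the attribute layout of R and S, and the sample rows are all assumptions.

```python
def select(rows, predicate):
    """Selection: pass on each tuple that satisfies the predicate."""
    return (t for t in rows if predicate(t))

def project(rows, attrs):
    """Projection: keep only the listed attributes of each tuple."""
    return ({a: t[a] for a in attrs} for t in rows)

def join(outer_rows, inner_rows, attr):
    """Nested loop equi-join on a shared attribute; the inner input is materialized once."""
    inner = list(inner_rows)
    for to in outer_rows:
        for ti in inner:
            if to[attr] == ti[attr]:
                yield {**to, **ti}

# Hypothetical contents for R(Sid, bid) and S(Sid, Sname, rating).
R = [{"Sid": 1, "bid": 100}, {"Sid": 2, "bid": 101}, {"Sid": 3, "bid": 100}]
S = [
    {"Sid": 1, "Sname": "Ann", "rating": 7},
    {"Sid": 2, "Sname": "Bob", "rating": 9},
    {"Sid": 3, "Sname": "Cal", "rating": 3},
]

# Selections are pushed below the join; the projection is applied to each joined tuple.
plan = project(
    join(select(R, lambda t: t["bid"] == 100), select(S, lambda t: t["rating"] > 5), "Sid"),
    ["Sname"],
)
answer = list(plan)
```

Nothing is computed until list(plan) starts pulling tuples, which mirrors how a pipelined plan produces output without writing temporary relations.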

Cost of Plan

The cost associated with each plan needs to be estimated. This will be accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, buffer size, size of the memory, and the "reduction factor" for each operation need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case, data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
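The two search methods can be contrasted with the sketch below, which treats a file as a list of records. The function names and the sample file are hypothetical; binary_search assumes the file is ordered on the key attribute being compared.

```python
def linear_search(file, predicate):
    """Test every record; works for any condition and any file organization."""
    return [rec for rec in file if predicate(rec)]

def binary_search(sorted_file, key_attr, value):
    """Equality lookup on the ordering key; halves the search range at each step."""
    lo, hi = 0, len(sorted_file) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_file[mid][key_attr]
        if key == value:
            return sorted_file[mid]
        if key < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None                      # no record has this key value

employee = [{"Ssn": n, "Dno": n % 3} for n in range(10, 20)]   # ordered on Ssn
by_scan = linear_search(employee, lambda rec: rec["Dno"] == 0)
by_key = binary_search(employee, "Ssn", 15)
```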

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case, at most one record is retrieved.)

σ_{SSN=123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case, data is also sorted.)

σ_{DNUMBER>5}(DEPARTMENT)
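A range selection on an ordered file can be sketched with the standard bisect module. Here the sorted list of key values stands in for the primary index; the function names and the eight-row DEPARTMENT sample are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def select_greater_than(sorted_file, keys, value):
    """Locate the boundary via the index, then return all subsequent records."""
    start = bisect_right(keys, value)     # first position with key > value
    return sorted_file[start:]

def select_less_than(sorted_file, keys, value):
    """Locate the boundary via the index, then return all preceding records."""
    end = bisect_left(keys, value)        # first position with key >= value
    return sorted_file[:end]

department = [{"DNUMBER": n, "DNAME": f"D{n}"} for n in range(1, 9)]  # ordered on DNUMBER
dnumber_keys = [d["DNUMBER"] for d in department]

above_5 = select_greater_than(department, dnumber_keys, 5)   # DNUMBER > 5
below_5 = select_less_than(department, dnumber_keys, 5)      # DNUMBER < 5
```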

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO=5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
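The conjunctive strategy can be sketched as follows: one condition is answered through an index, and the remaining conditions are checked only on the records retrieved that way. The dictionary index, the function names, and the three-row employee sample are illustrative assumptions.

```python
from collections import defaultdict

def build_index(relation, attr):
    """Map each value of attr to the positions of the records that carry it."""
    index = defaultdict(list)
    for pos, t in enumerate(relation):
        index[t[attr]].append(pos)
    return index

def conjunctive_select(relation, index, value, remaining):
    """Use the access path for one condition, then apply the rest of the conditions."""
    candidates = (relation[pos] for pos in index.get(value, []))
    return [t for t in candidates if all(cond(t) for cond in remaining)]

employee = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Smith", "Dno": 4},
]
dno_index = build_index(employee, "Dno")
# Dno = 5 AND Lname = 'Smith': the index handles Dno, the filter handles Lname.
selected = conjunctive_select(employee, dno_index, 5, [lambda t: t["Lname"] == "Smith"])
```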

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
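The union-of-pointers technique, including the fallback to a linear scan, can be sketched as below. Record positions in a list stand in for record pointers, conditions are restricted to equality tests for brevity, and all names are hypothetical.

```python
def build_pointer_index(relation, attr):
    """Map each value of attr to the set of positions (pointers) of matching records."""
    index = {}
    for pos, t in enumerate(relation):
        index.setdefault(t[attr], set()).add(pos)
    return index

def disjunctive_select(relation, indexes, conditions):
    """conditions is a list of (attribute, value) equality tests joined by OR."""
    if any(attr not in indexes for attr, _ in conditions):
        # At least one condition lacks an access path: fall back to a linear scan.
        return [t for t in relation if any(t[a] == v for a, v in conditions)]
    pointers = set()
    for attr, value in conditions:
        pointers |= indexes[attr].get(value, set())   # union of retrieved pointers
    return [relation[p] for p in sorted(pointers)]

employee = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Smith", "Dno": 4},
]
indexes = {a: build_pointer_index(employee, a) for a in ("Dno", "Lname")}
conditions = [("Dno", 4), ("Lname", "Wong")]
via_indexes = disjunctive_select(employee, indexes, conditions)
via_scan = disjunctive_select(employee, {"Dno": indexes["Dno"]}, conditions)
```

Both calls return the same records; the second one simply has to examine every record because Lname has no index.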





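[Figure: two alternative query plans over relations R and S, each joining on Sid = Sid and projecting Π_{Sname}. The first plan applies the combined selection σ_{bid = 100 ∧ rating > 5}; the second applies σ_{bid = 100} and σ_{rating > 5} as separate selections. Operators are annotated "on the fly".]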

Cost of Plan

The cost associated with each plan needs to be estimated. This is accomplished by estimating the cost of each operation.

Factors such as the size of the relation(s), the underlying architecture, the buffer size, the size of the memory, the "reduction factor" of each operation, etc. need to be taken into consideration.

Optimization Process: Search methods for Selection

General philosophy: make an effort to reduce the search space.

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
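The two search methods above can be sketched as follows. This is a minimal illustration, assuming the file is an in-memory list of records (dictionaries); the function names and the list-of-dictionaries layout are hypothetical and not part of the original slides.

```python
def linear_search(records, attribute, value):
    """Test every record against the selection condition attribute = value."""
    return [rec for rec in records if rec[attribute] == value]


def binary_search(sorted_records, key, value):
    """Equality lookup on a key attribute on which the file is ordered.

    Returns the single matching record, or None if there is no match.
    """
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = sorted_records[mid][key]
        if current == value:
            return sorted_records[mid]
        if current < value:
            lo = mid + 1      # the match, if any, lies in the upper half
        else:
            hi = mid - 1      # the match, if any, lies in the lower half
    return None
```

Linear search inspects all records, while binary search halves the candidate range at every step, which is why it requires the file to be ordered on the searched key.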

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note: in this case at most one record is retrieved.)

σ_{SSN = 123456789}(EMPLOYEE)

Optimization Process: Search methods for Selection

Using a primary index to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). (Note: in this case the data is also sorted.)

σ_{DNUMBER > 5}(DEPARTMENT)
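A range selection such as σ_{DNUMBER > 5}(DEPARTMENT) can be sketched as below. The sketch assumes the file is ordered on DNUMBER and models the primary index as a sorted list of key values; the sample rows and the function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, physically ordered on the key DNUMBER.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 4, "DNAME": "Administration"},
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 7, "DNAME": "Sales"},
    {"DNUMBER": 9, "DNAME": "Support"},
]

# A sorted list of key values stands in for the primary index.
PRIMARY_INDEX = [d["DNUMBER"] for d in DEPARTMENT]


def select_greater_than(value):
    """Locate the boundary through the index, then read all subsequent records."""
    start = bisect_right(PRIMARY_INDEX, value)  # first position whose key > value
    return DEPARTMENT[start:]
```

Only the boundary is located through the index; the remaining records are read sequentially because the file is sorted on the same key.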

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

σ_{DNO = 5}(EMPLOYEE)

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.
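As a rough sketch of this strategy, the example below evaluates σ_{DNO = d ∧ SALARY > m}(EMPLOYEE), assuming an index exists on DNO only. The SALARY attribute, the sample rows, and the helper names are hypothetical; the index is modeled as a dictionary from attribute value to record positions.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file.
EMPLOYEE = [
    {"SSN": 1, "DNO": 5, "SALARY": 30000},
    {"SSN": 2, "DNO": 5, "SALARY": 55000},
    {"SSN": 3, "DNO": 4, "SALARY": 60000},
    {"SSN": 4, "DNO": 5, "SALARY": 70000},
]


def build_index(records, attribute):
    """Map each attribute value to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return index


DNO_INDEX = build_index(EMPLOYEE, "DNO")  # the only available access path


def conjunctive_select(dno, min_salary):
    """Use the indexed condition first, then apply the remaining condition."""
    candidates = (EMPLOYEE[pos] for pos in DNO_INDEX.get(dno, []))
    return [rec for rec in candidates if rec["SALARY"] > min_salary]
```

The remaining condition is checked only on the records returned by the index, so the file is never scanned in full.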

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
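The pointer-union approach, including the fallback to a linear scan, might look like the sketch below. It assumes equality conditions only and models each index as a dictionary from value to record positions; the CITY attribute, the sample rows, and all names are hypothetical.

```python
def build_index(records, attribute):
    """Map each attribute value to the positions (pointers) of matching records."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[attribute], []).append(position)
    return index


def disjunctive_select(records, indexes, conditions):
    """Evaluate (a1 = v1) OR (a2 = v2) OR ... over `records`.

    `indexes` maps an attribute name to its index; `conditions` is a list of
    (attribute, value) pairs.
    """
    if all(attribute in indexes for attribute, _ in conditions):
        pointers = set()
        for attribute, value in conditions:
            pointers |= set(indexes[attribute].get(value, []))  # union of pointers
        return [records[pos] for pos in sorted(pointers)]
    # At least one condition has no access path: fall back to a linear scan.
    return [rec for rec in records
            if any(rec[attribute] == value for attribute, value in conditions)]


# Hypothetical EMPLOYEE file with indexes on DNO and CITY.
EMPLOYEE = [
    {"SSN": 1, "DNO": 5, "CITY": "Houston"},
    {"SSN": 2, "DNO": 4, "CITY": "Dallas"},
    {"SSN": 3, "DNO": 5, "CITY": "Dallas"},
    {"SSN": 4, "DNO": 1, "CITY": "Austin"},
]
INDEXES = {"DNO": build_index(EMPLOYEE, "DNO"), "CITY": build_index(EMPLOYEE, "CITY")}
```

Taking the union over pointers rather than over records means a tuple that satisfies several conditions is fetched only once.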

Query Optimization: JOIN Operation

R ⋈_{A=B} S

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop) and then check the join condition t[A] = s[B].

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

for each t_r ∈ r do begin
    for each t_s ∈ s do begin
        if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
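A direct transcription of the nested loop algorithm into Python could look as follows. It is a sketch that assumes relations are lists of dictionaries; the comparison Θ is passed as a function, and tuple concatenation t_r || t_s is modeled by merging the two dictionaries (which assumes the relations use distinct attribute names).

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on the condition r.a theta s.b by comparing every pair."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if theta(tr[a], ts[b]):
                result.append({**tr, **ts})   # concatenate the two tuples
    return result
```

Because any comparison function can be supplied for `theta`, the same loop handles equi-joins as well as conditions such as < or >.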

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is n_r × n_s tuple pairs examined.

In the best-case scenario, both relations fit into physical memory, and hence we need b_s + b_r block accesses.

If only one of the relations fits in physical memory, the cost is still b_s + b_r block accesses, since that relation can be read once and kept in memory as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses.

Query Optimization: JOIN Operation (block nested loop)

for each block B_r of r do begin
    for each block B_s of s do begin
        for each t_r ∈ B_r do begin
            for each t_s ∈ B_s do begin
                if t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end

The cost of the block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r: every block of the inner relation s is read once for each block of the outer relation r, and each block of r is read once.

How can we improve the block nested loop?
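The block access count b_r × b_s + b_r can be checked with a small simulation. The sketch below assumes a relation is a list of blocks, each block being a list of records, and counts one access per block read; the helper `to_blocks` and the equi-join restriction are simplifications introduced here.

```python
def to_blocks(records, block_size):
    """Split a list of records into blocks of at most `block_size` records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, a, b):
    """Equi-join of two blocked relations; returns (result, block_accesses)."""
    result, block_accesses = [], 0
    for block_r in r_blocks:          # each block of r is read once
        block_accesses += 1
        for block_s in s_blocks:      # s is re-read for every block of r
            block_accesses += 1
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append({**tr, **ts})
    return result, block_accesses
```

With 3 blocks of r and 2 blocks of s, the counter ends at 3 × 2 + 3 = 9, matching the formula. Using the relation with fewer blocks as the outer relation reduces the b_r term.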

Query Optimization: JOIN Operation

r ⋈_{A=B} s

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].
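A minimal sketch of this index-based join follows. The slides assume the access structure on s.B already exists; here it is built on the fly as a dictionary purely so that the example is self-contained, which is an assumption of this sketch.

```python
from collections import defaultdict


def index_join(r, s, a, b):
    """Equi-join r and s by probing an access structure on s.b for each t_r."""
    index_on_b = defaultdict(list)    # stands in for an existing index or hash key
    for ts in s:
        index_on_b[ts[b]].append(ts)

    result = []
    for tr in r:                      # retrieve each record of r one at a time
        for ts in index_on_b.get(tr[a], []):   # probe instead of scanning s
            result.append({**tr, **ts})
    return result
```

Each record of r triggers one probe instead of a full scan of s, which is where the saving over the nested loop comes from.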

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

One pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

assuming that the set of all tuples with the same value for the join attributes fits in main memory.
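The merge step can be sketched as below, assuming both inputs are lists of dictionaries already sorted on their join attributes. The handling of groups of equal join values mirrors the assumption above that such a group fits in main memory; the function name is hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join of r (sorted on a) and s (sorted on b) in one pass over each."""
    result = []
    i = j = 0                          # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the group of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r-tuple with this value against the whole s-group.
            while i < len(r) and r[i][a] == value:
                for ts in s[j:j_end]:
                    result.append({**r[i], **ts})
                i += 1
            j = j_end
    return result
```

Neither pointer ever moves backwards past a finished group, so each input is scanned once.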

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
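The hash-join description above can be sketched as follows. The hash file is modeled as an in-memory dictionary of buckets, and Python's built-in `hash` with a fixed bucket count stands in for the hashing function; both are assumptions of this sketch.

```python
from collections import defaultdict


def hash_join(r, s, a, b, num_buckets=8):
    """Equi-join: hash both relations into shared buckets, then match per bucket."""
    buckets = defaultdict(lambda: ([], []))   # bucket -> (records of r, records of s)
    for tr in r:                              # single pass through r
        buckets[hash(tr[a]) % num_buckets][0].append(tr)
    for ts in s:                              # single pass through s
        buckets[hash(ts[b]) % num_buckets][1].append(ts)

    result = []
    for r_part, s_part in buckets.values():
        for tr in r_part:
            for ts in s_part:
                # Different values can share a bucket, so compare explicitly.
                if tr[a] == ts[b]:
                    result.append({**tr, **ts})
    return result
```

Records with equal join values always land in the same bucket, so only records within a bucket need to be compared with each other.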

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.
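Both strategies can be illustrated with the sketch below. Each condition θi is modeled as a Python function of a pair of tuples, and a joined tuple is identified by the pair of positions (i, j) so that the union can discard duplicates; these representation choices and the function names are assumptions of the sketch.

```python
def theta_join(r, s, theta):
    """Positions (i, j) such that theta(r[i], s[j]) holds (plain nested loop)."""
    return [(i, j) for i, tr in enumerate(r)
            for j, ts in enumerate(s) if theta(tr, ts)]


def conjunctive_join(r, s, thetas):
    """Compute the join on the first condition, then filter by the rest."""
    first, rest = thetas[0], thetas[1:]
    intermediate = theta_join(r, s, first)
    return [(i, j) for i, j in intermediate
            if all(theta(r[i], s[j]) for theta in rest)]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition."""
    pairs = set()
    for theta in thetas:
        pairs |= set(theta_join(r, s, theta))
    return sorted(pairs)
```

In practice the first join of the conjunctive case would use whichever technique (index, sort-merge, hash) fits θ1; the nested loop is used here only to keep the sketch short.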

Query Optimization: Project Operation

A project operation Π_{<attribute list>}(R) is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
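The two duplicate-elimination options can be sketched as follows, assuming records are dictionaries and projected rows are tuples of comparable, hashable values. The function names are hypothetical.

```python
def project(records, attributes):
    """Keep only the listed attributes; duplicates may appear."""
    return [tuple(rec[a] for a in attributes) for rec in records]


def distinct_by_sorting(rows):
    """Sort the rows so duplicates become adjacent, then drop repeats."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def distinct_by_hashing(rows):
    """Keep a hash set of rows seen so far and skip any row already in it."""
    seen, result = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result
```

The sorting variant returns rows in sorted order, while the hashing variant preserves the order of first appearance.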

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
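The sort-then-scan approach to union, intersection, and difference can be sketched with one merging loop. The sketch assumes both inputs are already sorted and duplicate-free lists of comparable rows; the function name and the string-valued `op` parameter are choices made here for brevity.

```python
def merge_set_op(r, s, op):
    """Apply 'union', 'intersection' or 'difference' (r minus s) in one scan."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i] == s[j]:                       # row present in both relations
            if op in ("union", "intersection"):
                result.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:                      # row present only in r
            if op in ("union", "difference"):
                result.append(r[i])
            i += 1
        else:                                  # row present only in s
            if op == "union":
                result.append(s[j])
            j += 1
    if op in ("union", "difference"):
        result.extend(r[i:])                   # leftover rows of r
    if op == "union":
        result.extend(s[j:])                   # leftover rows of s
    return result
```

A hashing variant would instead insert the rows of one relation into a hash table and probe it with the rows of the other.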

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

[Figure: query tree for this expression. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it is σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; below that is ×, whose children are Project and a second × over Department and Employee.]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
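The figures quoted above follow from multiplying the cardinalities and adding the tuple sizes. The short calculation below reproduces them; the total size in bytes is derived here and is not stated on the slide.

```python
import math

# Tuple sizes (bytes) and cardinalities for Project, Department, Employee.
tuple_sizes = [100, 50, 150]
tuple_counts = [100, 20, 5000]

product_tuples = math.prod(tuple_counts)   # 100 * 20 * 5000 tuples in the product
product_tuple_bytes = sum(tuple_sizes)     # 100 + 50 + 150 bytes per tuple
total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, total_bytes)
# prints: 10000000 300 3000000000
```

The intermediate relation therefore occupies about 3 GB, even though the three base relations together hold fewer than one megabyte of data.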

Query Optimization: An Example

However, the above query, based on the schemas of the relations, can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))

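Query Optimization: An Example

[Figure: query tree for the improved expression. The root is Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it is the join ⋈_{MGRSSN = SSN}, whose children are Employee and the join ⋈_{Dnum = Dnumber}; that join's children are Department and σ_{Plocation = "California"} applied to Project.]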
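To make the difference between the two translations concrete, the sketch below evaluates both on a tiny, made-up database and reports the size of the largest intermediate result. All rows are hypothetical, the joins are written as plain nested loops, and the function names are introduced here for illustration.

```python
# Hypothetical sample data; only the attributes used by the query are included.
Project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 5},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 4},
]
Department = [{"Dnumber": 5, "MGRSSN": 111}, {"Dnumber": 4, "MGRSSN": 222}]
Employee = [
    {"SSN": 111, "Lname": "Smith", "Address": "Houston", "Bdate": "1965-01-09"},
    {"SSN": 222, "Lname": "Wong", "Address": "Dallas", "Bdate": "1955-12-08"},
    {"SSN": 333, "Lname": "Zelaya", "Address": "Austin", "Bdate": "1968-07-19"},
]
OUTPUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def project_rows(rows):
    """Final projection, sorted so that the two plans can be compared."""
    return sorted(tuple(row[a] for a in OUTPUT) for row in rows)


def naive_plan():
    """Cartesian products first, then one big selection."""
    product = [{**p, **d, **e}
               for p in Project for d in Department for e in Employee]
    selected = [row for row in product
                if row["Plocation"] == "California"
                and row["Dnum"] == row["Dnumber"]
                and row["MGRSSN"] == row["SSN"]]
    return project_rows(selected), len(product)


def optimized_plan():
    """Selection pushed down to Project, followed by two joins."""
    california = [p for p in Project if p["Plocation"] == "California"]
    with_dept = [{**p, **d} for p in california for d in Department
                 if p["Dnum"] == d["Dnumber"]]
    with_mgr = [{**pd, **e} for pd in with_dept for e in Employee
                if pd["MGRSSN"] == e["SSN"]]
    largest = max(len(california), len(with_dept), len(with_mgr))
    return project_rows(with_mgr), largest
```

Both plans return the same two rows, but the naive plan materializes 3 × 2 × 3 = 18 intermediate tuples while the optimized plan never holds more than 2.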




Optimization Process mdash Search methodsfor SelectionGeneral Philosophy Make effort to reduce the search

space

84

Database Systems

85

Optimization Process mdash Search methods forSelectionLinear search Retrieve every records in the file

and test whether or not its attribute values satisfythe selection condition (In this case data is notorganized and no meta data is available)Binary search Use binary search method if the

selection condition involves an equality comparisonon a key attribute on which the file is ordered

Database Systems

86

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve a

single record Use the primary index or hash key toretrieve the record if the selection conditioninvolves an equality comparison on a key attributewith a primary index or hash key (note in this caseat most one record is retrieved)

σSSN = 123456789(EMPLOYEE)

Database Systems

87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization mdash Search methods for Selection

Conjunctive selection conjunctive selection isof the following form

σθ1andθ2and hellip andθn (r)Disjunctive selection disjunctive selection is of

the following formσθ1orθ2or hellip orθn (r)

Database Systems

89

90

Query Optimization mdash Search methods for Selection

Conjunctive selection If an attribute involved inany single simple condition in the conjunctivecondition has an access path that allows the use ofany aforementioned techniques use that conditionto retrieve the records and then apply the rest of theconditions

Database Systems

Query Optimization mdash Search methods for SelectionDisjunctive selection by union of record pointers If access

path exists for all the attributes involved in disjunctiveselection then each index is scanned for pointers to tuplesthat satisfy individual condition

The union of all the retrieved pointers yields the set ofpointers to tuples satisfying the disjunctive condition

Note even if one of the conditions does not have an accesspath we will have to perform a linear scan of the relation

Database Systems

91

92

Query Optimization mdash JOIN Operation

Nested loop For each record t isin R (outer loop)retrieve every record of s isin S (inner loop) and thencheck the join condition t[A] = s[B]

R A=B S

Database Systems

Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): Consider the join $r \bowtie_{A=B} s$. If an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$ one at a time, and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
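The access-structure approach can be sketched as follows, assuming the index on s.B is an in-memory dictionary from attribute value to the list of matching tuples. The function names `build_index` and `index_join` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(s, key):
    """Stand-in for an existing index or hash key on attribute `key` of s."""
    index = defaultdict(list)
    for t_s in s:
        index[t_s[key]].append(t_s)
    return index


def index_join(r, s_index, a):
    """For each t_r in r, probe the index instead of scanning all of s."""
    result = []
    for t_r in r:                                   # one record of r at a time
        for t_s in s_index.get(t_r[a], []):         # only the matching records
            result.append({**t_r, **t_s})
    return result


r = [{"A": 1}, {"A": 2}, {"A": 2}]
s = [{"B": 2, "y": "q"}, {"B": 2, "y": "w"}, {"B": 9, "y": "z"}]
s_index = build_index(s, "B")
matches = index_join(r, s_index, "A")
```

Each tuple of r costs one index probe rather than a full pass over s, which is where the saving over the nested loop comes from.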

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
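The merge step can be sketched as below, assuming both inputs are Python lists already sorted on their join attributes. The two indices `i` and `j` play the role of the pointers, and the slice `s[j:j_end]` is the group of equal-valued tuples that is assumed to fit in memory. The function name and data are hypothetical.

```python
def merge_join(r, s, a, b):
    """Equi-join two lists of dicts already sorted on r[a] and s[b]."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1                              # advance the pointer into r
        elif r[i][a] > s[j][b]:
            j += 1                              # advance the pointer into s
        else:
            value = r[i][a]
            j_end = j                           # find the group of s tuples
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            while i < len(r) and r[i][a] == value:
                for t_s in s[j:j_end]:          # pair with the whole group
                    result.append({**r[i], **t_s})
                i += 1
            j = j_end
    return result


r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 4}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "r"}, {"B": 4, "y": "s"}]
merged = merge_join(r, s, "A", "B")
```

Neither index ever moves backwards past a finished group, so each input is traversed once.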

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
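A minimal in-memory sketch of the hash-join idea follows. The "hash file" is modelled as a dictionary of buckets, each holding the r tuples and s tuples that hashed there; the bucket count `n_buckets`, the function name, and the data are hypothetical.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=4):
    """Hash both relations with the same function, then join bucket by bucket."""
    buckets = defaultdict(lambda: ([], []))     # bucket -> (r tuples, s tuples)
    for t_r in r:                               # single pass through r
        buckets[hash(t_r[a]) % n_buckets][0].append(t_r)
    for t_s in s:                               # single pass through s
        buckets[hash(t_s[b]) % n_buckets][1].append(t_s)

    result = []
    for r_part, s_part in buckets.values():
        for t_r in r_part:
            for t_s in s_part:
                # Sharing a bucket only makes a match possible, so the
                # join attribute values are still compared explicitly.
                if t_r[a] == t_s[b]:
                    result.append({**t_r, **t_s})
    return result


r = [{"A": 1}, {"A": 5}, {"A": 2}]
s = [{"B": 5, "y": "p"}, {"B": 1, "y": "q"}, {"B": 9, "y": "z"}]
hashed = hash_join(r, s, "A", "B")
```

With four buckets, the values 1, 5, and 9 all land in the same bucket, which shows why the values must still be compared inside each bucket.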

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
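Both decompositions can be sketched in a few lines of Python. This is an illustration only: each simple join is computed here with a plain nested loop, whereas in practice any applicable join technique could be used for it. The function names and toy relations are hypothetical.

```python
def conjunctive_join(r, s, thetas):
    """Join on theta_1 first, then filter by the remaining conditions."""
    first, *rest = thetas
    intermediate = [(t_r, t_s) for t_r in r for t_s in s if first(t_r, t_s)]
    return [(t_r, t_s) for t_r, t_s in intermediate
            if all(theta(t_r, t_s) for theta in rest)]


def disjunctive_join(r, s, thetas):
    """Union of the individual joins; a pair found twice is kept once."""
    seen, result = set(), []
    for theta in thetas:
        for i, t_r in enumerate(r):
            for j, t_s in enumerate(s):
                if theta(t_r, t_s) and (i, j) not in seen:
                    seen.add((i, j))
                    result.append((t_r, t_s))
    return result


r = [(1, 10), (2, 20)]
s = [(1, 10), (2, 99)]
conditions = [
    lambda t_r, t_s: t_r[0] == t_s[0],   # theta_1: first attributes equal
    lambda t_r, t_s: t_r[1] == t_s[1],   # theta_2: second attributes equal
]
both = conjunctive_join(r, s, conditions)
either = disjunctive_join(r, s, conditions)
```

Tracking the position pairs in `seen` is what makes the disjunctive case a set union rather than a concatenation of the individual results.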

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute-list}\rangle}(R)$ is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
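The two duplicate-elimination strategies can be sketched as below. This assumes tuples are dictionaries and uses Python's built-in `set` as a stand-in for the hashing technique. The function names and sample rows are hypothetical.

```python
def project_with_hashing(relation, attributes):
    """Project, dropping duplicates with a hash set (keeps first-seen order)."""
    seen, result = set(), []
    for t in relation:
        projected = tuple(t[a] for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


def project_with_sorting(relation, attributes):
    """Project, sort, then drop duplicates, which are now adjacent."""
    projected = sorted(tuple(t[a] for a in attributes) for t in relation)
    result = []
    for p in projected:
        if not result or result[-1] != p:
            result.append(p)
    return result


employees = [{"Ssn": 1, "Dno": 5}, {"Ssn": 2, "Dno": 5}, {"Ssn": 3, "Dno": 4}]
by_hash = project_with_hashing(employees, ["Dno"])
by_sort = project_with_sorting(employees, ["Dno"])
with_key = project_with_hashing(employees, ["Ssn", "Dno"])
```

Projecting on `Dno` alone yields duplicates that must be removed, while the projection that includes the key `Ssn` keeps all three rows.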

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations, after which a single scan through each relation is sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
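The sort-then-scan approach to union, intersection, and difference can be sketched as follows, assuming both inputs are already sorted and free of duplicates. The function name `merge_set_ops` and the sample lists are hypothetical.

```python
def merge_set_ops(r, s):
    """One pass over two sorted, duplicate-free lists.

    Returns (union, intersection, difference r - s).
    """
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # only in s
            union.append(s[j])
            j += 1
        else:                           # in both
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])                 # leftovers of r are only in r
    difference.extend(r[i:])
    union.extend(s[j:])                 # leftovers of s are only in s
    return union, intersection, difference


r = [1, 3, 5, 7]
s = [3, 4, 7, 8]
union, intersection, difference = merge_set_ops(r, s)
```

All three results come out of the same single scan, which is why sorting the inputs first pays off.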

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}\big(\sigma_{Plocation=\text{'California'} \,\wedge\, Dnum=Dnumber \,\wedge\, MGRSSN=SSN}(Project \times (Department \times Employee))\big)$

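Query Optimization: An Example

Query tree for this translation (root at the top, operands indented below their operator):

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}$
    $\sigma_{Plocation=\text{'California'} \,\wedge\, Dnum=Dnumber \,\wedge\, MGRSSN=SSN}$
        $\times$
            Project
            $\times$
                Department
                Employee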

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
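The arithmetic behind these figures can be checked with a few lines of Python. The dictionary layout and variable names are illustrative only; the sizes are the ones assumed in the example.

```python
from math import prod

# Tuple size in bytes and number of tuples, as assumed in the example.
relations = {
    "Project": {"tuple_bytes": 100, "tuples": 100},
    "Department": {"tuple_bytes": 50, "tuples": 20},
    "Employee": {"tuple_bytes": 150, "tuples": 5000},
}

# A Cartesian product multiplies cardinalities and adds tuple widths.
product_tuples = prod(rel["tuples"] for rel in relations.values())
product_tuple_bytes = sum(rel["tuple_bytes"] for rel in relations.values())
product_total_bytes = product_tuples * product_tuple_bytes

print(product_tuples, product_tuple_bytes, product_total_bytes)
# prints: 10000000 300 3000000000
```

At 300 bytes per tuple, the intermediate relation would occupy about 3 GB before a single selection condition is applied.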

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}\big(((\sigma_{Plocation=\text{'California'}}(Project)) \bowtie_{Dnum=Dnumber} Department) \bowtie_{MGRSSN=SSN} Employee\big)$

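Query Optimization: An Example

Query tree for the rewritten expression (root at the top, operands indented below their operator):

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}$
    $\bowtie_{MGRSSN=SSN}$
        $\bowtie_{Dnum=Dnumber}$
            $\sigma_{Plocation=\text{'California'}}$
                Project
            Department
        Employee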
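The effect of the rewriting can be illustrated on a tiny made-up database. Everything below is a sketch under assumptions: the rows are hypothetical, the attribute Mgr-ssn is spelled `Mgr_ssn` so it can be used as a plain key, and both plans are evaluated with simple list comprehensions rather than a real execution engine.

```python
from itertools import product

projects = [
    {"Pname": "X", "Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pname": "Y", "Pnumber": 2, "Plocation": "Texas", "Dnum": 4},
]
departments = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": 111},
    {"Dname": "Admin", "Dnumber": 4, "Mgr_ssn": 222},
]
employees = [
    {"Lname": "Smith", "Ssn": 111, "Bdate": "1960-01-01", "Address": "Rolla"},
    {"Lname": "Wong", "Ssn": 222, "Bdate": "1955-12-08", "Address": "Houston"},
    {"Lname": "Zelaya", "Ssn": 333, "Bdate": "1968-07-19", "Address": "Spring"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def cartesian_plan():
    """Cartesian products first, then one big selection, then projection."""
    rows = [{**p, **d, **e}
            for p, d, e in product(projects, departments, employees)]
    largest = len(rows)
    rows = [t for t in rows if t["Plocation"] == "California"
            and t["Dnum"] == t["Dnumber"] and t["Mgr_ssn"] == t["Ssn"]]
    return [tuple(t[a] for a in OUT) for t in rows], largest


def rewritten_plan():
    """Selection on Project first, then the two joins, then projection."""
    selected = [p for p in projects if p["Plocation"] == "California"]
    with_dept = [{**p, **d} for p in selected for d in departments
                 if p["Dnum"] == d["Dnumber"]]
    with_mgr = [{**t, **e} for t in with_dept for e in employees
                if t["Mgr_ssn"] == e["Ssn"]]
    largest = max(len(selected), len(with_dept), len(with_mgr))
    return [tuple(t[a] for a in OUT) for t in with_mgr], largest


cartesian_rows, cartesian_largest = cartesian_plan()
rewritten_rows, rewritten_largest = rewritten_plan()
```

Both plans return the same single row, but the largest intermediate result has 12 tuples under the Cartesian plan and only 1 under the rewritten plan.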


Optimization Process: Search methods for Selection

Linear search: Retrieve every record in the file and test whether or not its attribute values satisfy the selection condition. (In this case the data is not organized and no metadata is available.)

Binary search: Use the binary search method if the selection condition involves an equality comparison on a key attribute on which the file is ordered.
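The two search methods can be contrasted with a short Python sketch. It assumes the file is a list of dictionaries, ordered on the key attribute for the binary search. The function names, the probe counter, and the sample data are hypothetical.

```python
def linear_search_select(file, predicate):
    """Test every record against the selection condition."""
    return [record for record in file if predicate(record)]


def binary_search_select(ordered_file, key, value):
    """Equality selection on the key attribute the file is ordered on.

    Returns (record or None, number of records probed).
    """
    lo, hi, probes = 0, len(ordered_file) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if ordered_file[mid][key] == value:
            return ordered_file[mid], probes
        if ordered_file[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# Ten hypothetical records ordered on Ssn: 100, 110, ..., 190.
employees = [{"Ssn": n} for n in range(100, 200, 10)]
found, probes = binary_search_select(employees, "Ssn", 170)
missing, _ = binary_search_select(employees, "Ssn", 175)
scanned = linear_search_select(employees, lambda rec: rec["Ssn"] == 170)
```

The linear search inspects all ten records, while the binary search finds the record with `Ssn` 170 after two probes.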

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve a single record: Use the primary index or hash key to retrieve the record if the selection condition involves an equality comparison on a key attribute with a primary index or hash key. (Note that in this case at most one record is retrieved.)

$\sigma_{SSN = 123456789}(EMPLOYEE)$

Optimization Process: Search methods for Selection

Using a primary index or hash key to retrieve multiple records: If the comparison condition is >, <, ≤, or ≥ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition and then retrieve all the subsequent records in the file (or, for < and ≤, all the preceding records). Note that in this case the data is also sorted.

$\sigma_{DNUMBER > 5}(DEPARTMENT)$
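A range selection over an ordered file can be sketched as follows. The primary index is modelled, as a simplifying assumption, by a sorted list of key values parallel to the file, and Python's `bisect` module stands in for the index lookup. The function name and sample data are hypothetical.

```python
from bisect import bisect_right


def select_greater_than(index_keys, ordered_file, value):
    """sigma_{key > value}: locate the boundary through the index,
    then return all subsequent records of the ordered file."""
    start = bisect_right(index_keys, value)   # first position with key > value
    return ordered_file[start:]


departments = [{"Dnumber": n} for n in (1, 4, 5, 7, 9)]   # ordered on Dnumber
index_keys = [d["Dnumber"] for d in departments]          # stand-in primary index
above_five = select_greater_than(index_keys, departments, 5)
```

Because the file is ordered on the key, everything after the boundary position qualifies and no further comparisons are needed.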

Query Optimization: Search methods for Selection

Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition (clustered data).

$\sigma_{DNO = 5}(EMPLOYEE)$

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

$\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$

Disjunctive selection: a disjunctive selection is of the following form:

$\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records and then apply the rest of the conditions.

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
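The union-of-pointers idea, including the fallback to a linear scan, can be sketched as below. Record pointers are modelled as list positions and each index as a dictionary from value to positions. Only equality conditions are handled, and all names and data are hypothetical.

```python
def build_index(file, attribute):
    """Map each value of `attribute` to the positions (pointers) of its records."""
    index = {}
    for position, record in enumerate(file):
        index.setdefault(record[attribute], []).append(position)
    return index


def disjunctive_select(file, indexes, conditions):
    """conditions is a list of (attribute, value) equality tests joined by OR."""
    if any(attribute not in indexes for attribute, _ in conditions):
        # One condition has no access path, so scan the whole relation.
        return [rec for rec in file
                if any(rec[a] == v for a, v in conditions)]
    pointers = set()
    for attribute, value in conditions:
        pointers |= set(indexes[attribute].get(value, []))   # union of pointers
    return [file[p] for p in sorted(pointers)]


employees = [
    {"Dno": 5, "Salary": 30}, {"Dno": 4, "Salary": 40},
    {"Dno": 5, "Salary": 40}, {"Dno": 1, "Salary": 10},
]
conditions = [("Dno", 5), ("Salary", 40)]
all_indexed = {"Dno": build_index(employees, "Dno"),
               "Salary": build_index(employees, "Salary")}
via_pointers = disjunctive_select(employees, all_indexed, conditions)
via_scan = disjunctive_select(employees, {"Dno": all_indexed["Dno"]}, conditions)
```

Both calls return the same three records; the second one has no index on `Salary` and therefore falls back to scanning every record.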

Query Optimization: JOIN Operation

Nested loop: To compute $R \bowtie_{A=B} S$, for each record $t \in R$ (outer loop), retrieve every record $s \in S$ (inner loop) and then check the join condition $t[A] = s[B]$.





87

Optimization Process mdash Search methods forSelectionUsing a primary index or hash key to retrieve

multiple records If the comparison condition is gtlt le ge on a key field with a primary index use theindex to find the record satisfying thecorresponding equality condition and then retrieveall the subsequent records in the file (note in thiscase data is also sorted)

σDNUMBER gt 5(DEPARTMENT)

Database Systems

88

Query Optimization mdash Search methods for Selection

Using a clustering index to retrieve multiplerecords If the selection condition involves anequality comparison on a non-key attribute withclustering index use the clustering index to retrieveall the records satisfying the selection condition(clustered data)

σDNO = 5(EMPLOYEE)

Database Systems

Query Optimization: Search methods for Selection

Conjunctive selection: a conjunctive selection is of the following form:

σ_{θ1 ∧ θ2 ∧ … ∧ θn}(r)

Disjunctive selection: a disjunctive selection is of the following form:

σ_{θ1 ∨ θ2 ∨ … ∨ θn}(r)

Query Optimization: Search methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions.
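The idea can be sketched in Python as follows. This is an illustration only: the in-memory dictionary standing in for an index, the SALARY attribute, the sample rows, and the function names are all assumptions, not part of the slides.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of `attr` to the positions of the records holding it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return index


def conjunctive_select(records, index, indexed_value, other_conditions):
    """Use the index for one condition, then test the remaining conditions."""
    candidates = (records[pos] for pos in index.get(indexed_value, []))
    return [rec for rec in candidates
            if all(cond(rec) for cond in other_conditions)]


# Hypothetical sample data.
employee = [
    {"SSN": "111", "DNO": 5, "SALARY": 30000},
    {"SSN": "222", "DNO": 5, "SALARY": 55000},
    {"SSN": "333", "DNO": 4, "SALARY": 60000},
]
dno_index = build_index(employee, "DNO")

# sigma_{DNO = 5 AND SALARY > 40000}(EMPLOYEE)
selected = conjunctive_select(
    employee, dno_index, 5, [lambda rec: rec["SALARY"] > 40000]
)
```

Only the records reached through the index are ever tested against the remaining conditions, which is where the saving over a full scan comes from.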

Query Optimization: Search methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we will have to perform a linear scan of the relation.
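A small Python sketch of the union-of-pointers idea, including the fallback to a linear scan. The dictionary-based indexes, the CITY attribute, the sample rows, and the function names are hypothetical and chosen only for illustration; conditions are restricted to equality tests to keep the sketch short.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of `attr` to the positions (pointers) of matching records."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return index


def disjunctive_select(records, indexes, conditions):
    """`conditions` is a list of (attr, value) equality tests joined by OR."""
    if all(attr in indexes for attr, _ in conditions):
        pointers = set()
        for attr, value in conditions:
            pointers.update(indexes[attr].get(value, []))  # union of pointers
        return [records[pos] for pos in sorted(pointers)]
    # At least one condition has no access path: fall back to a linear scan.
    return [rec for rec in records
            if any(rec[attr] == value for attr, value in conditions)]


# Hypothetical sample data.
employee = [
    {"SSN": "111", "DNO": 5, "CITY": "Rolla"},
    {"SSN": "222", "DNO": 4, "CITY": "Rolla"},
    {"SSN": "333", "DNO": 4, "CITY": "Austin"},
    {"SSN": "444", "DNO": 1, "CITY": "Boston"},
]
indexes = {"DNO": build_index(employee, "DNO"),
           "CITY": build_index(employee, "CITY")}
conditions = [("DNO", 4), ("CITY", "Rolla")]

by_pointers = disjunctive_select(employee, indexes, conditions)
by_scan = disjunctive_select(employee, {"DNO": indexes["DNO"]}, conditions)
```

Using a set for the pointers means a tuple that satisfies several conditions is fetched only once.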

Query Optimization: JOIN Operation

Nested loop: For each record t ∈ R (outer loop), retrieve every record s ∈ S (inner loop), and then check the join condition t[A] = s[B].

R ⋈_{A=B} S

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

r ⋈_{r.A Θ s.B} s

A and B are attributes or sets of attributes (i.e., join attributes) of relations r and s. Further, assume n_r = |r| and n_s = |s| are the cardinalities of the relations. Finally, assume b_r and b_s are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

For each t_r ∈ r do begin
    For each t_s ∈ s do begin
        If t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
    end
end
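The pseudocode above maps almost line for line onto Python. In this sketch, relations are assumed to be lists of dictionaries with distinct attribute names, Θ is passed in as a comparison function, and the sample data is invented for illustration.

```python
import operator


def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Join r and s on r.a theta s.b by testing every pair of tuples."""
    result = []
    for t_r in r:                       # outer loop over r
        for t_s in s:                   # inner loop over s
            if theta(t_r[a], t_s[b]):
                # t_r || t_s: concatenate the two tuples (assumes the two
                # relations use distinct attribute names)
                result.append({**t_r, **t_s})
    return result


# Hypothetical sample data.
r = [{"A": 1, "X": "a"}, {"A": 2, "X": "b"}]
s = [{"B": 2, "Y": "p"}, {"B": 3, "Y": "q"}]

equi = nested_loop_join(r, s, "A", "B")                 # A = B
less = nested_loop_join(r, s, "A", "B", operator.lt)    # A < B
```

Because the condition is an arbitrary function, the same loop handles equality, inequality, and any other comparison.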

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is n_r × n_s (every pair of tuples is examined).

In the best-case scenario, both relations fit into the physical space (main memory), and hence we need only b_s + b_r block accesses.

Query Optimization: JOIN Operation (nested loop)

If one of the relations fits in the physical space (main memory), then the cost is again b_s + b_r block accesses, provided that relation is used as the inner relation.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block.

Query Optimization: JOIN Operation (block nested loop)

For each block B_r of r do begin
    For each block B_s of s do begin
        For each t_r ∈ B_r do begin
            For each t_s ∈ B_s do begin
                If t_r[A] Θ t_s[B] is true then add t_r || t_s to the result
            end
        end
    end
end
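A Python sketch of the block nested loop, with a counter for block reads so the cost can be checked on a small case. The block size, the in-memory "blocks", the equality-only join condition, and the sample data are assumptions made for the illustration.

```python
def blocks(relation, block_size):
    """Split a list of tuples into consecutive blocks of `block_size` tuples."""
    return [relation[i:i + block_size]
            for i in range(0, len(relation), block_size)]


def block_nested_loop_join(r, s, a, b, block_size):
    """Equi-join r and s block by block; also return the number of block reads."""
    r_blocks, s_blocks = blocks(r, block_size), blocks(s, block_size)
    result, block_reads = [], 0
    for block_r in r_blocks:
        block_reads += 1                  # one read per block of r
        for block_s in s_blocks:
            block_reads += 1              # s is re-read once per block of r
            for t_r in block_r:
                for t_s in block_s:
                    if t_r[a] == t_s[b]:
                        result.append({**t_r, **t_s})
    return result, block_reads


# Hypothetical sample data: 6 tuples in r and 4 in s, 2 tuples per block,
# so b_r = 3 and b_s = 2.
r = [{"A": i} for i in range(6)]
s = [{"B": i % 3, "Y": i} for i in range(4)]

joined, reads = block_nested_loop_join(r, s, "A", "B", block_size=2)
print(reads)  # 9
```

With b_r = 3 and b_s = 2 the loop reads 3 × 2 + 3 = 9 blocks, compared with one pass over s per tuple of r in the tuple-at-a-time version.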

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is b_r × b_s + b_r.

How can we improve block nested loop?

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): If an index or hash key exists for one of the join attributes, say B of s, retrieve each record t_r ∈ r, one at a time, and then use the access structure to retrieve all the matching records t_s ∈ s that satisfy t_r[A] = t_s[B].

r ⋈_{A=B} s
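A Python sketch of this index-based join. In a real system the index on s.B would already exist on disk; here, purely as an assumption for the example, it is built in memory as a dictionary, and the sample rows are invented.

```python
from collections import defaultdict


def index_nested_loop_join(r, s, a, b):
    """Equi-join r and s on r.a = s.b using an access structure on s.b."""
    # Stand-in for an existing index or hash key on attribute b of s.
    index_on_b = defaultdict(list)
    for t_s in s:
        index_on_b[t_s[b]].append(t_s)

    result = []
    for t_r in r:                                     # one record of r at a time
        for t_s in index_on_b.get(t_r[a], []):        # probe instead of scanning s
            result.append({**t_r, **t_s})
    return result


# Hypothetical sample data.
department = [{"DNUMBER": 4, "DNAME": "Administration"},
              {"DNUMBER": 5, "DNAME": "Research"}]
employee = [{"SSN": "111", "DNO": 5},
            {"SSN": "222", "DNO": 5},
            {"SSN": "333", "DNO": 1}]

joined = index_nested_loop_join(department, employee, "DNUMBER", "DNO")
```

The inner loop touches only the matching records of s, so no full scan of s is needed per record of r.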

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is b_s + b_r, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
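The merge step can be sketched in Python as below. The sketch assumes both inputs are lists of dictionaries already sorted on their join attributes and that each group of equal join values fits in memory; the sample data and the name `merge_join` are illustrative only.

```python
def merge_join(r, s, a, b):
    """Equi-join two inputs that are already sorted on r[a] and s[b]."""
    result = []
    i = j = 0                                   # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple holding this value with the whole group.
            while i < len(r) and r[i][a] == value:
                for t_s in s[j:j_end]:
                    result.append({**r[i], **t_s})
                i += 1
            j = j_end
    return result


# Hypothetical sample data, sorted on A and B respectively.
r = [{"A": 1, "X": "a"}, {"A": 2, "X": "b"}, {"A": 2, "X": "c"}, {"A": 4, "X": "d"}]
s = [{"B": 2, "Y": "p"}, {"B": 2, "Y": "q"}, {"B": 3, "Y": "r"}, {"B": 4, "Y": "s"}]

joined = merge_join(r, s, "A", "B")
```

Neither pointer ever moves backward past a finished group, which is why a single linear pass over each relation suffices.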

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
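A Python sketch of the hash-join as described: both relations are hashed into the same set of buckets, and each bucket is then examined for matches. The bucket count, the in-memory buckets, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(r, s, a, b, n_buckets=8):
    """Equi-join r and s on r.a = s.b by hashing both into shared buckets."""
    buckets = defaultdict(lambda: ([], []))     # bucket -> (r tuples, s tuples)
    for t_r in r:                               # single pass through r
        buckets[hash(t_r[a]) % n_buckets][0].append(t_r)
    for t_s in s:                               # single pass through s
        buckets[hash(t_s[b]) % n_buckets][1].append(t_s)

    result = []
    for r_part, s_part in buckets.values():     # examine each bucket
        for t_r in r_part:
            for t_s in s_part:
                # Different values can share a bucket, so the join
                # condition must still be checked.
                if t_r[a] == t_s[b]:
                    result.append({**t_r, **t_s})
    return result


# Hypothetical sample data; 2 and 10 fall into the same bucket when
# n_buckets is 8.
r = [{"A": 1}, {"A": 2}, {"A": 9}, {"A": 10}]
s = [{"B": 2, "Y": "p"}, {"B": 10, "Y": "q"}, {"B": 10, "Y": "r"}, {"B": 3, "Y": "s"}]

joined = hash_join(r, s, "A", "B")
```

Records with equal join values always land in the same bucket, so matches never have to be sought across buckets.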

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions.

We can perform the overall join by first computing one of the simpler joins, say r ⋈_{θ1} s. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins r ⋈_{θi} s.

Query Optimization: Project Operation

A project operation Π_{<attribute-list>}(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
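Both duplicate-elimination strategies can be sketched in a few lines of Python. Relations are assumed to be lists of dictionaries, and the function name, the `method` parameter, and the sample data are illustrative choices rather than anything prescribed by the slides.

```python
def project(relation, attributes, method="hash"):
    """Project `relation` onto `attributes` and remove duplicate rows."""
    rows = [tuple(rec[attr] for attr in attributes) for rec in relation]

    if method == "sort":
        rows.sort()                         # duplicates become adjacent
        distinct = []
        for row in rows:
            if not distinct or distinct[-1] != row:
                distinct.append(row)
        return distinct

    seen = set()                            # hashing: remember rows already emitted
    distinct = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            distinct.append(row)
    return distinct


# Hypothetical sample data; DNO is not a key, so duplicates appear.
employee = [{"SSN": "111", "DNO": 5},
            {"SSN": "222", "DNO": 5},
            {"SSN": "333", "DNO": 4}]

by_sort = project(employee, ["DNO"], method="sort")
by_hash = project(employee, ["DNO"], method="hash")
```

The sorted variant returns rows in key order, while the hashed variant keeps the order of first appearance; both yield the same set of rows.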

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
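The sort-then-scan approach can be sketched in Python as one merge pass that produces all three results at once. The sketch assumes the two inputs are union-compatible, duplicate-free lists of tuples; the function name and the sample data are made up for illustration.

```python
def sorted_set_ops(r, s):
    """Return (r UNION s, r INTERSECT s, r MINUS s) using one merge scan."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j])
            j += 1
        else:                           # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # At most one of the two inputs still has tuples left.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


# Hypothetical sample data.
r = [(3,), (1,), (5,)]
s = [(5,), (2,), (3,)]

union, intersection, difference = sorted_set_ops(r, s)
```

After sorting, each relation is read exactly once, which matches the single-scan claim above.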

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND MGRSSN = SSN
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN}(Project × (Department × Employee)))

Query Optimization: An Example

[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, applied to σ_{Plocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN}, applied to the Cartesian products of Project, Department, and Employee.]

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
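The figures quoted above can be checked with a few lines of Python. The tuple sizes and counts come from the slide; the variable names and the total-size calculation are additions for illustration.

```python
from math import prod

tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}    # bytes per tuple
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}  # tuples

product_tuples = prod(tuple_counts.values())     # every combination of tuples
product_tuple_size = sum(tuple_sizes.values())   # concatenated tuple width in bytes
product_bytes = product_tuples * product_tuple_size

print(product_tuples)       # 10000000
print(product_tuple_size)   # 300
print(product_bytes)        # 3000000000
```

The intermediate relation therefore occupies about 3 billion bytes, even though the three inputs together hold only 761,000 bytes (10,000 + 1,000 + 750,000).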

Query Optimization: An Example

However, the above query, based on the schemas of the relations, can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation="California"}(Project)) ⋈_{Dnum=Dnumber} Department) ⋈_{MGRSSN=SSN} Employee)
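To make the difference between the two translations concrete, the following Python sketch runs both on a tiny invented data set and reports the size of the largest intermediate result. The rows, the function names, and the reduced attribute lists are assumptions for the illustration; only the shape of the two plans comes from the slides.

```python
from itertools import product


def naive_plan(project, department, employee):
    """Cartesian products first, then one selection over everything."""
    combos = list(product(project, department, employee))
    rows = [(p["Pnumber"], p["Dnum"], e["Lname"])
            for p, d, e in combos
            if p["Plocation"] == "California"
            and p["Dnum"] == d["Dnumber"]
            and d["MGRSSN"] == e["SSN"]]
    return rows, len(combos)


def pushed_down_plan(project, department, employee):
    """Selection first, then the two joins, as in the rewritten expression."""
    selected = [p for p in project if p["Plocation"] == "California"]
    step1 = [(p, d) for p in selected for d in department
             if p["Dnum"] == d["Dnumber"]]
    step2 = [(p, d, e) for p, d in step1 for e in employee
             if d["MGRSSN"] == e["SSN"]]
    rows = [(p["Pnumber"], p["Dnum"], e["Lname"]) for p, d, e in step2]
    return rows, max(len(selected), len(step1), len(step2))


# Hypothetical sample data.
project = [{"Pnumber": 1, "Plocation": "California", "Dnum": 5},
           {"Pnumber": 2, "Plocation": "Texas", "Dnum": 5},
           {"Pnumber": 3, "Plocation": "California", "Dnum": 4}]
department = [{"Dnumber": 4, "MGRSSN": "222"},
              {"Dnumber": 5, "MGRSSN": "111"}]
employee = [{"SSN": "111", "Lname": "Smith"},
            {"SSN": "222", "Lname": "Wong"},
            {"SSN": "333", "Lname": "Zelaya"}]

naive_rows, naive_peak = naive_plan(project, department, employee)
fast_rows, fast_peak = pushed_down_plan(project, department, employee)
```

Both plans return the same rows, but the first materializes 3 × 2 × 3 = 18 combined tuples while the second never holds more than 2. The joins here are written as simple nested loops to keep the sketch short.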

Query Optimization: An Example

[Query tree: Π_{Pnumber, Dnum, Lname, Address, Bdate} at the root, over the join ⋈_{MGRSSN=SSN} whose inputs are Employee and the join ⋈_{Dnum=Dnumber}; that join in turn combines σ_{Plocation="California"}(Project) with Department.]



  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

90

Query Optimization: Search Methods for Selection

Conjunctive selection: If an attribute involved in any single simple condition of the conjunctive condition has an access path that allows the use of any of the aforementioned techniques, use that condition to retrieve the records, and then apply the rest of the conditions to the retrieved records.
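
The conjunctive selection strategy can be sketched as follows. This is only an illustration under stated assumptions: the relation `ACCOUNTS`, its attributes, and the helpers `build_index` and `conjunctive_select` are hypothetical names, and an in-memory dictionary stands in for the access path.

```python
from collections import defaultdict

# Hypothetical toy relation; list positions stand in for record pointers.
ACCOUNTS = [
    {"id": 1, "branch": "Rolla", "balance": 1200},
    {"id": 2, "branch": "Rolla", "balance": 3100},
    {"id": 3, "branch": "Austin", "balance": 900},
    {"id": 4, "branch": "Rolla", "balance": 400},
]

def build_index(records, attribute):
    """Map each attribute value to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return index

def conjunctive_select(records, index, indexed_value, remaining_conditions):
    """Use the index for one condition, then test the rest on each candidate."""
    candidates = (records[p] for p in index.get(indexed_value, []))
    return [r for r in candidates if all(cond(r) for cond in remaining_conditions)]

# branch = 'Rolla' AND balance < 2500: only the first condition uses the index.
branch_index = build_index(ACCOUNTS, "branch")
result = conjunctive_select(
    ACCOUNTS, branch_index, "Rolla", [lambda r: r["balance"] < 2500]
)
print([r["id"] for r in result])
```

Only the records reached through the index are tested against the remaining condition, so the Austin account is never examined.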

Query Optimization: Search Methods for Selection

Disjunctive selection by union of record pointers: If an access path exists for all the attributes involved in the disjunctive selection, then each index is scanned for pointers to the tuples that satisfy the individual condition.

The union of all the retrieved pointers yields the set of pointers to the tuples satisfying the disjunctive condition.

Note: if even one of the conditions does not have an access path, we have to perform a linear scan of the relation.
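
A minimal sketch of disjunctive selection by union of record pointers, including the fallback to a linear scan. The relation `EMPLOYEES` and the helper names are assumptions made for illustration; conditions are restricted to equality tests to keep the example short.

```python
from collections import defaultdict

# Hypothetical toy relation; list positions stand in for record pointers.
EMPLOYEES = [
    {"ssn": 1, "dno": 5, "city": "Rolla"},
    {"ssn": 2, "dno": 4, "city": "Austin"},
    {"ssn": 3, "dno": 5, "city": "Austin"},
    {"ssn": 4, "dno": 1, "city": "Boston"},
]

def build_index(records, attribute):
    """Map each attribute value to the set of pointers of matching records."""
    index = defaultdict(set)
    for pointer, record in enumerate(records):
        index[record[attribute]].add(pointer)
    return index

def disjunctive_select(records, indexes, conditions):
    """conditions: list of (attribute, value) equality tests joined by OR."""
    if any(attribute not in indexes for attribute, _ in conditions):
        # At least one condition has no access path: linear scan of the relation.
        return [r for r in records if any(r[a] == v for a, v in conditions)]
    pointers = set()
    for attribute, value in conditions:
        pointers |= indexes[attribute].get(value, set())  # union of pointers
    return [records[p] for p in sorted(pointers)]

conditions = [("dno", 5), ("city", "Austin")]
indexes = {
    "dno": build_index(EMPLOYEES, "dno"),
    "city": build_index(EMPLOYEES, "city"),
}
by_index = disjunctive_select(EMPLOYEES, indexes, conditions)
by_scan = disjunctive_select(EMPLOYEES, {"dno": indexes["dno"]}, conditions)
```

Both calls return the same records; the second one has no index on `city`, so it cannot avoid scanning the whole relation.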

Query Optimization: JOIN Operation

Nested loop: To compute $R \bowtie_{A=B} S$, for each record $t \in R$ (outer loop), retrieve every record $s \in S$ (inner loop) and then check the join condition $t[A] = s[B]$.

Query Optimization: JOIN Operation (nested loop)

Suppose we want to perform

$r \bowtie_{r.A \,\Theta\, s.B} s$

where A and B are attributes or sets of attributes (i.e., the join attributes) of relations r and s. Further assume that $n_r = |r|$ and $n_s = |s|$ are the cardinalities of the relations. Finally, assume that $b_r$ and $b_s$ are the numbers of blocks of each relation.

Query Optimization: JOIN Operation (nested loop)

The following algorithm performs the nested loop join operation:

    for each tuple tr ∈ r do begin
        for each tuple ts ∈ s do begin
            if tr[A] Θ ts[B] is true then add tr || ts to the result
        end
    end
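
The nested loop algorithm translates almost line for line into Python. The sketch below assumes relations are held in memory as lists of tuples and that the join attributes are given as positions `a` and `b`; the function name and sample data are illustrative only.

```python
import operator

def nested_loop_join(r, s, a, b, theta=operator.eq):
    """Return tr || ts for every pair (tr, ts) with tr[a] theta ts[b]."""
    result = []
    for tr in r:                        # outer loop over r
        for ts in s:                    # inner loop over s
            if theta(tr[a], ts[b]):
                result.append(tr + ts)  # tuple concatenation, tr || ts
    return result

# Hypothetical relations stored as lists of tuples.
r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]

equi = nested_loop_join(r, s, a=0, b=0)                     # r.A = s.B
less = nested_loop_join(r, s, a=0, b=0, theta=operator.lt)  # r.A < s.B
```

Because the comparison is passed in as `theta`, the same loop handles any join condition, which is the property that the other join techniques give up in exchange for speed.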

Query Optimization: JOIN Operation (nested loop)

The cost of the nested loop algorithm is $n_r \times n_s$ tuple comparisons.

In the best case, both relations fit into the physical space (main memory), and hence we need only $b_s + b_r$ block accesses.

If only one of the relations fits in the physical space, the cost is still $b_s + b_r$ block accesses: that relation is used as the inner relation, so it is read once and stays in memory while the other relation is scanned once.

Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)

    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr ∈ Br do begin
                for each tuple ts ∈ Bs do begin
                    if tr[A] Θ ts[B] is true then add tr || ts to the result
                end
            end
        end
    end
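
A runnable sketch of the block nested loop, assuming in-memory lists of tuples split into fixed-size blocks by a hypothetical helper `blocks`. The counter `block_reads` is added purely to make the number of block accesses visible; it is not part of the algorithm on the slide.

```python
def blocks(relation, tuples_per_block):
    """Split a relation (list of tuples) into fixed-size blocks."""
    for start in range(0, len(relation), tuples_per_block):
        yield relation[start:start + tuples_per_block]

def block_nested_loop_join(r, s, a, b, tuples_per_block=2):
    result = []
    block_reads = 0
    for block_r in blocks(r, tuples_per_block):
        block_reads += 1                  # one access per block of r
        for block_s in blocks(s, tuples_per_block):
            block_reads += 1              # s is re-read once per block of r
            for tr in block_r:
                for ts in block_s:
                    if tr[a] == ts[b]:
                        result.append(tr + ts)
    return result, block_reads

r = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]            # 2 blocks
s = [(2, "p"), (3, "q"), (3, "r"), (5, "t"), (6, "u")]  # 3 blocks
joined, reads = block_nested_loop_join(r, s, a=0, b=0)
```

With two blocks of r and three blocks of s, the counter ends at 2 × 3 + 2 = 8 block accesses.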

Query Optimization: JOIN Operation (block nested loop)

The cost of the block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$.

How can we improve the block nested loop?
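
To get a feel for the cost figures, the sketch below evaluates them for one set of made-up sizes. The values of `n_r`, `b_r`, and `b_s` are assumptions chosen for illustration. The worst-case figure for the tuple-at-a-time nested loop, $n_r \times b_s + b_r$, is the standard textbook estimate and is not stated on the slides.

```python
def nested_loop_best(b_r, b_s):
    # Inner relation held entirely in memory: each relation is read once.
    return b_r + b_s

def nested_loop_worst(n_r, b_r, b_s):
    # Textbook worst case (not from the slides): s is rescanned for every
    # tuple of r.
    return n_r * b_s + b_r

def block_nested_loop(b_r, b_s):
    # s is rescanned once per block of r, as in the slide formula.
    return b_r * b_s + b_r

# Hypothetical sizes chosen only for illustration.
n_r, b_r, b_s = 5000, 100, 400
costs = {
    "nested loop, best case": nested_loop_best(b_r, b_s),
    "nested loop, worst case": nested_loop_worst(n_r, b_r, b_s),
    "block nested loop": block_nested_loop(b_r, b_s),
}
for name, cost in costs.items():
    print(f"{name}: {cost} block accesses")
```

Swapping the roles of the two relations in the block nested loop gives 400 × 100 + 400 = 40,400 block accesses instead of 40,100, so under this formula the smaller relation is the slightly better choice for the outer loop.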

Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): To compute $r \bowtie_{A=B} s$, if an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$ one at a time, and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
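
A minimal sketch of the access-structure join, assuming the index on attribute B of s is an in-memory dictionary built by the hypothetical helper `build_hash_index`. In a real system the index would already exist on disk; building it here only keeps the example self-contained.

```python
from collections import defaultdict

def build_hash_index(s, b):
    """Access structure on attribute b of s: value -> list of matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[b]].append(ts)
    return index

def index_join(r, s_index, a):
    """For each tr in r, probe the index instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[a], []):
            result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (3, "z")]
s = [(2, "p"), (3, "q"), (3, "r")]
joined = index_join(r, build_hash_index(s, b=0), a=0)
```

Each tuple of r triggers one probe rather than a scan of s, which is where the saving over the nested loop comes from.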

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
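
The merge step can be sketched with two list positions playing the role of the two pointers. The function name and sample data are illustrative assumptions; both inputs must already be sorted on the join attribute.

```python
def merge_join(r, s, a, b):
    """Join two relations that are already sorted on r[a] and s[b]."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Collect the group of s tuples sharing this join value; the
            # slides assume such a group fits in main memory.
            group_start = j
            while j < len(s) and s[j][b] == value:
                j += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and r[i][a] == value:
                for ts in s[group_start:j]:
                    result.append(r[i] + ts)
                i += 1
    return result

r = [(1, "x"), (3, "y"), (3, "z"), (7, "w")]
s = [(2, "p"), (3, "q"), (3, "r"), (8, "t")]
joined = merge_join(r, s, a=0, b=0)
```

Neither position ever moves backwards, so each relation is traversed exactly once.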

Query Optimization: JOIN Operation

Hash join: The records of both files r and s are hashed to the same hash file, using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values, to produce a possible result for the join operation.
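
A small sketch of the hash join described above, with dictionaries of lists standing in for the hash file buckets. The bucket count and the use of Python's built-in `hash` are assumptions made for illustration.

```python
from collections import defaultdict

def hash_join(r, s, a, b, num_buckets=4):
    """Partition both relations with the same hash function, match per bucket."""
    buckets_r = defaultdict(list)
    buckets_s = defaultdict(list)
    for tr in r:                                   # single pass over r
        buckets_r[hash(tr[a]) % num_buckets].append(tr)
    for ts in s:                                   # single pass over s
        buckets_s[hash(ts[b]) % num_buckets].append(ts)

    result = []
    for bucket, r_tuples in buckets_r.items():
        for tr in r_tuples:
            for ts in buckets_s.get(bucket, []):
                # Tuples in the same bucket may still differ (collisions),
                # so the join condition is checked explicitly.
                if tr[a] == ts[b]:
                    result.append(tr + ts)
    return result

r = [(1, "x"), (2, "y"), (3, "z"), (6, "w")]
s = [(2, "p"), (3, "q"), (3, "r"), (10, "t")]
joined = sorted(hash_join(r, s, a=0, b=0))
```

In this data the values 2, 6, and 10 all land in the same bucket, which is why a bucket yields only a possible result until the join attribute values are compared.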

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

A join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \land \theta_2 \land \dots \land \theta_n} s$

One or more of the join techniques may be applicable for joins on the individual conditions.

We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
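
The two-step evaluation of a conjunctive join can be sketched as below. The relations, the conditions `theta_1` and `theta_2`, and the function name are hypothetical; the first step uses a plain nested loop only for brevity.

```python
def conjunctive_join(r, s, first_condition, remaining_conditions):
    """Compute the join on one condition, then filter by the others."""
    # Step 1: the simpler join on theta_1 (a plain nested loop here; any of
    # the join techniques could be substituted when theta_1 is an equality).
    intermediate = [(tr, ts) for tr in r for ts in s if first_condition(tr, ts)]
    # Step 2: keep the pairs that also satisfy theta_2 ... theta_n.
    return [
        tr + ts
        for tr, ts in intermediate
        if all(cond(tr, ts) for cond in remaining_conditions)
    ]

# Hypothetical relations: r(id, limit), s(id, amount).
r = [(1, 50), (2, 10), (3, 30)]
s = [(1, 40), (2, 20), (3, 30), (3, 5)]

def theta_1(tr, ts):
    return tr[0] == ts[0]   # r.id = s.id

def theta_2(tr, ts):
    return ts[1] < tr[1]    # s.amount < r.limit

joined = conjunctive_join(r, s, theta_1, [theta_2])
```

The filter in step 2 touches only the four pairs produced by step 1, not all twelve pairs of the Cartesian product.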

Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$.
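
A short sketch of the disjunctive case, checking that the union of the individual joins equals the join on the full disjunction. Names and data are illustrative assumptions, and Python sets are used so that a pair satisfying several conditions is kept only once.

```python
def theta_join(r, s, condition):
    """Set of concatenated tuples tr || ts that satisfy the condition."""
    return {tr + ts for tr in r for ts in s if condition(tr, ts)}

def disjunctive_join(r, s, conditions):
    """Union of the individual joins on theta_1, ..., theta_n."""
    result = set()
    for condition in conditions:
        result |= theta_join(r, s, condition)   # set union drops duplicates
    return result

# Hypothetical relations: r(id, limit), s(id, amount).
r = [(1, 50), (2, 10)]
s = [(1, 40), (2, 20), (3, 5)]

def theta_1(tr, ts):
    return tr[0] == ts[0]   # r.id = s.id

def theta_2(tr, ts):
    return ts[1] < tr[1]    # s.amount < r.limit

joined = disjunctive_join(r, s, [theta_1, theta_2])
direct = theta_join(r, s, lambda tr, ts: theta_1(tr, ts) or theta_2(tr, ts))
```

The pair (1, 50) with (1, 40) satisfies both conditions but appears once in `joined`, which is why the union has to eliminate duplicates.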

Query Optimization: Project Operation

A project operation $\Pi_{\langle \text{attribute-list} \rangle}(R)$ is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
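
Both duplicate-elimination strategies can be sketched in a few lines. The relation `employee` and the helper names are assumptions for illustration; attributes are addressed by position.

```python
def project(relation, positions):
    """Keep only the attributes at the given positions."""
    return [tuple(t[p] for p in positions) for t in relation]

def dedup_by_sorting(rows):
    """Sort, then drop each row equal to its predecessor."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result

def dedup_by_hashing(rows):
    """Keep the first occurrence of each row, using a hash set for lookups."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Hypothetical Employee(ssn, lname, dno); ssn is the key.
employee = [(1, "Smith", 5), (2, "Wong", 4), (3, "Smith", 5), (4, "Zelaya", 4)]
with_key = project(employee, [0, 1])     # includes the key: no duplicates
without_key = project(employee, [1, 2])  # no key: ("Smith", 5) appears twice
```

Sorting returns the rows in sorted order, while hashing preserves the order of first appearance; both leave three distinct rows here.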

Query Optimization: Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
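
The sort-then-scan approach for union, intersection, and difference can be sketched with one merge-style pass. The function name and data are illustrative, and the inputs are assumed to be sorted and duplicate-free.

```python
def sorted_set_ops(r, s):
    """Single merge-style scan over two sorted, duplicate-free relations."""
    union, intersection, difference = [], [], []   # difference is r - s
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:            # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:          # tuple only in s
            union.append(s[j])
            j += 1
        else:                      # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference

r = sorted([(3,), (1,), (5,), (7,)])
s = sorted([(5,), (2,), (3,), (8,)])
union, intersection, difference = sorted_set_ops(r, s)
```

Python's built-in `set` type is hash-based, so `set(r) | set(s)`, `set(r) & set(s)`, and `set(r) - set(s)` give a quick way to try the hashing alternative on the same data.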

Questions

Devise algorithms to perform the variations of the outer join operation.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

Query Optimization: An Example

    SELECT Pnumber, Dnum, Lname, Bdate, Address
    FROM Project, Department, Employee
    WHERE Dnum = Dnumber
      AND MGRSSN = SSN
      AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}\bigl(\sigma_{Plocation = \text{``California''} \,\land\, Dnum = Dnumber \,\land\, MGRSSN = SSN}(Project \times (Department \times Employee))\bigr)$

Query Optimization: An Example

Query tree of the initial translation (root at the top):

    Π Pnumber, Dnum, Lname, Address, Bdate
    └─ σ Plocation="California" ∧ Dnum=Dnumber ∧ MGRSSN=SSN
       └─ ×
          ├─ Project
          └─ ×
             ├─ Department
             └─ Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume that the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
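
The arithmetic behind these figures is easy to check. The sketch below uses only the sizes quoted on the slide; the variable names are illustrative.

```python
# Figures taken from the slide: (tuple size in bytes, number of tuples).
relations = {
    "Project": (100, 100),
    "Department": (50, 20),
    "Employee": (150, 5000),
}

tuple_count = 1
tuple_size = 0
for size_bytes, cardinality in relations.values():
    tuple_count *= cardinality   # cardinalities multiply in a Cartesian product
    tuple_size += size_bytes     # tuple widths add up

total_bytes = tuple_count * tuple_size
print(tuple_count, tuple_size, total_bytes)
```

The intermediate relation would therefore occupy about 3 × 10^9 bytes, although the three base relations together hold only 761,000 bytes.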

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{Pnumber,\,Dnum,\,Lname,\,Address,\,Bdate}\bigl((\sigma_{Plocation = \text{``California''}}(Project) \bowtie_{Dnum = Dnumber} Department) \bowtie_{MGRSSN = SSN} Employee\bigr)$

Query Optimization: An Example

Query tree of the improved translation (root at the top):

    Π Pnumber, Dnum, Lname, Address, Bdate
    └─ ⋈ MGRSSN=SSN
       ├─ ⋈ Dnum=Dnumber
       │  ├─ σ Plocation="California"
       │  │  └─ Project
       │  └─ Department
       └─ Employee
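
The improved plan can be traced on a few made-up rows. Everything in this sketch is an assumption for illustration: the sample tuples, the reduced column sets, and the helper `hash_join`. It only shows the order of operations (selection first, then the two joins, then the projection).

```python
# Hypothetical toy instances of the three relations (column subsets only).
project = [      # (Pnumber, Plocation, Dnum)
    (10, "California", 5), (20, "Texas", 5), (30, "California", 4),
]
department = [   # (Dnumber, MGRSSN)
    (4, 111), (5, 222), (1, 333),
]
employee = [     # (SSN, Lname, Address, Bdate)
    (111, "Wong", "Austin", "1955-12-08"),
    (222, "Smith", "Houston", "1965-01-09"),
    (333, "Borg", "Dallas", "1937-11-10"),
]

def hash_join(left, right, left_pos, right_pos):
    """Equijoin: index the right input, then probe it with each left row."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_pos], []).append(right_row)
    return [
        left_row + right_row
        for left_row in left
        for right_row in index.get(left_row[left_pos], [])
    ]

# 1. Apply the selection first, so only California projects are joined.
selected = [p for p in project if p[1] == "California"]
# 2. Join with Department on Dnum = Dnumber.
with_dept = hash_join(selected, department, left_pos=2, right_pos=0)
# 3. Join with Employee on MGRSSN = SSN (MGRSSN sits at position 4 after step 2).
with_mgr = hash_join(with_dept, employee, left_pos=4, right_pos=0)
# 4. Project Pnumber, Dnum, Lname, Address, Bdate.
answer = [(t[0], t[2], t[6], t[7], t[8]) for t in with_mgr]
```

No intermediate result here exceeds two tuples, whereas the Cartesian product of the same three toy relations would contain 27.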






Query Optimization mdash JOIN Operation (nested loop)

Suppose we want to perform

A and B are attributes or set of attributes (iejoin attributes) of relations r and s Furtherassume nr = | r | and ns = | s | are the cardinalityof the relations Finally assume br and bs arethe number of blocks of each relation

Database Systems

r rA Θ sB s

93

Query Optimization mdash JOIN Operation (nested loop)

The following algorithm performs the nestedloop join operation

For each tr ε r do beginFor each ts ε s do begin

If rA Θ sB true then add tr || ts to the resultend

end

Database Systems

94

Query Optimization mdash JOIN Operation (nested loop)

Cost of nested loop algorithm is nr nsIn best case scenario both relations fit into the

physical space and hence we need bs + br blockaccesses

Database Systems

95

Query Optimization mdash JOIN Operation (nested loop)

If one of the relations fits in the physical spacethen bs + br block accesses will be the cost

Database Systems

96

Query Optimization mdash JOIN Operation (block nestedloop)

If the buffer is too small to hold either relationentirely we can still obtain a major saving inthe number of block accesses

Database Systems

97

Query Optimization mdash JOIN Operation (block nested loop)

For each block Br of r do beginFor each block Bs of s do begin

For each tr ε Br do beginFor each ts ε Bs do begin

If rA Θ sB true then add tr || ts to the resultend

endend

end

Database Systems

98

Query Optimization mdash JOIN Operation (block nestedloop)

Cost of block nested loop in term of numberof block accesses is br bs + br

How can we improve block nested loop

Database Systems

99

100

Query Optimization mdash JOIN Operation

Use of access structure to retrieve the matchingrecord(s) If an index or hash key exists for one ofthe join attributes say B of s retrieve each record trisin r one at a time and then use the access structureto retrieve all the matching records ts isin S thatsatisfy tr[A] = ts[B]

r A=B s

Database Systems

101

Query Optimization: JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
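The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and held as Python lists. The names `merge_join`, `a_pos`, and `b_pos` are illustrative choices, not part of the original slides.

```python
from typing import List, Tuple

Row = Tuple


def merge_join(r: List[Row], s: List[Row], a_pos: int, b_pos: int) -> List[Row]:
    """Equi-join of r (sorted on a_pos) and s (sorted on b_pos)."""
    result = []
    i = j = 0                           # one pointer per relation
    while i < len(r) and j < len(s):
        a, b = r[i][a_pos], s[j][b_pos]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples sharing this join value; this group
            # is what is assumed to fit in main memory.
            j_end = j
            while j_end < len(s) and s[j_end][b_pos] == a:
                j_end += 1
            # Pair every r tuple with the same value against that group.
            while i < len(r) and r[i][a_pos] == a:
                for t_s in s[j:j_end]:
                    result.append(r[i] + t_s)
                i += 1
            j = j_end
    return result


# Hypothetical sorted relations joined on position 0.
r = [(1, "a"), (3, "c"), (3, "d"), (5, "e")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = merge_join(r, s, 0, 0)
```

Neither pointer ever moves backward past a group boundary, so each relation is scanned once.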

Query Optimization: JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
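The hash-join description can be sketched as below. The bucket count of 4 and the use of Python's built-in `hash` are assumptions for illustration; a list of bucket pairs stands in for the hash file.

```python
from typing import List, Tuple

Row = Tuple


def hash_join(r: List[Row], s: List[Row], a_pos: int, b_pos: int,
              n_buckets: int = 4) -> List[Row]:
    """Hash both relations into the same buckets, then match within each bucket."""
    # Each bucket keeps its r tuples and s tuples separately.
    buckets = [([], []) for _ in range(n_buckets)]
    for t_r in r:                                   # single pass over r
        buckets[hash(t_r[a_pos]) % n_buckets][0].append(t_r)
    for t_s in s:                                   # single pass over s
        buckets[hash(t_s[b_pos]) % n_buckets][1].append(t_s)

    result = []
    for r_part, s_part in buckets:
        for t_r in r_part:
            for t_s in s_part:
                # Different values can share a bucket, so compare explicitly.
                if t_r[a_pos] == t_s[b_pos]:
                    result.append(t_r + t_s)
    return result


# Hypothetical relations; the values 2, 6, and 10 all land in the same bucket.
r = [(1, "a"), (2, "b"), (6, "c")]
s = [(2, "x"), (6, "y"), (10, "z")]
joined = hash_join(r, s, 0, 0)
```

Matching tuples always hash to the same bucket, so only tuples within a bucket need to be compared.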

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions $\theta_2 \wedge \dots \wedge \theta_n$.
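A sketch of the conjunctive case follows, assuming that $\theta_1$ is an equality condition that can be served by a hash lookup and that the remaining conditions are passed as predicates. All names and data here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Row = Tuple
Condition = Callable[[Row, Row], bool]


def conjunctive_join(r: List[Row], s: List[Row], a_pos: int, b_pos: int,
                     remaining: List[Condition]) -> List[Row]:
    """Compute the simpler join on theta_1 (r.A = s.B) first, then keep only
    the tuples that also satisfy the remaining conditions."""
    index = defaultdict(list)
    for t_s in s:
        index[t_s[b_pos]].append(t_s)

    result = []
    for t_r in r:
        for t_s in index.get(t_r[a_pos], []):           # intermediate result of theta_1
            if all(theta(t_r, t_s) for theta in remaining):
                result.append(t_r + t_s)
    return result


# Hypothetical data: theta_1 is r[0] = s[0], theta_2 is r[1] < s[1].
r = [(1, 10), (2, 20), (2, 5)]
s = [(2, 15), (2, 30), (3, 1)]
joined = conjunctive_join(r, s, 0, 0, [lambda t_r, t_s: t_r[1] < t_s[1]])
```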

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$:

$(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \dots \cup (r \bowtie_{\theta_n} s)$
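The disjunctive case can be sketched in the same spirit. For brevity each individual join here is a plain nested loop, although in practice each could use whichever technique suits its condition. Python sets provide the union, so a pair that satisfies several conditions appears only once. The function names and data are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Row = Tuple
Condition = Callable[[Row, Row], bool]


def theta_join(r: List[Row], s: List[Row], theta: Condition) -> Set[Row]:
    """One individual join, computed here by a simple nested loop."""
    return {t_r + t_s for t_r in r for t_s in s if theta(t_r, t_s)}


def disjunctive_join(r: List[Row], s: List[Row], thetas: List[Condition]) -> Set[Row]:
    """Union of the individual joins, one per condition."""
    result: Set[Row] = set()
    for theta in thetas:
        result |= theta_join(r, s, theta)
    return result


# Hypothetical data: theta_1 is r[0] = s[0], theta_2 is r[1] = s[1].
r = [(1, "a"), (2, "b")]
s = [(1, "a"), (3, "b"), (4, "c")]
conditions = [lambda t_r, t_s: t_r[0] == t_s[0], lambda t_r, t_s: t_r[1] == t_s[1]]
joined = disjunctive_join(r, s, conditions)
```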

Query Optimization: Project Operation

A project operation $\Pi_{\langle\text{attribute list}\rangle}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing adjacent duplicates, or by using a hashing technique.
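Both duplicate-elimination strategies can be sketched briefly. The attribute list is given as tuple positions, and a Python set stands in for the hash table; these are assumptions of the sketch, as are the function names.

```python
from typing import List, Sequence, Tuple

Row = Tuple


def project_with_hashing(rows: List[Row], positions: Sequence[int]) -> List[Row]:
    """Project, using a hash set to skip tuples that were already produced."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


def project_with_sorting(rows: List[Row], positions: Sequence[int]) -> List[Row]:
    """Project, sort the result, then drop adjacent duplicates in one scan."""
    projected = sorted(tuple(row[i] for i in positions) for row in rows)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


# Hypothetical relation (id, city, dept); projecting on (city, dept) drops the key.
rows = [(1, "Rolla", 10), (2, "Ames", 20), (3, "Rolla", 10)]
by_hash = project_with_hashing(rows, [1, 2])
by_sort = project_with_sorting(rows, [1, 2])
```

The hashing variant preserves the input order, while the sorting variant returns the result in sorted order.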

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each sorted relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
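The sort-based approach can be sketched with one merge-style scan that produces all three results at once. This assumes the two relations are union-compatible and duplicate-free; the function name is hypothetical.

```python
from typing import List, Tuple

Row = Tuple


def sorted_set_operations(r: List[Row], s: List[Row]):
    """Return (union, intersection, difference r - s) using one scan of each
    sorted relation."""
    r, s = sorted(r), sorted(s)
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # tuple only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # tuple only in s
            union.append(s[j])
            j += 1
        else:                           # tuple in both relations
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever remains belongs to exactly one relation.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


# Hypothetical single-attribute relations.
r = [(3,), (1,), (2,)]
s = [(2,), (4,), (3,)]
union, intersection, difference = sorted_set_operations(r, s)
```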

Questions

Devise algorithms to perform the variations of the outer join operation.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

The above query can be translated into:

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(\sigma_{\text{Plocation}=\text{'California'} \,\wedge\, \text{Dnum}=\text{Dnumber} \,\wedge\, \text{Mgr\_ssn}=\text{Ssn}}(\text{Project} \times (\text{Department} \times \text{Employee})))$

Query tree for this translation (root at the top, children indented):

Π Pnumber, Dnum, Lname, Address, Bdate
    σ Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn
        ×
            Project
            ×
                Department
                Employee

The previous plan results in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples ($100 \times 20 \times 5000$), each of 300 bytes ($100 + 50 + 150$).
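The size of that intermediate relation can be checked with a short calculation. The tuple counts and sizes come from the example; the variable names are chosen only for illustration.

```python
# (number of tuples, tuple size in bytes) for each relation in the example
relations = {
    "Project": (100, 100),
    "Department": (20, 50),
    "Employee": (5000, 150),
}

product_tuples = 1   # tuple counts multiply in a Cartesian product
product_width = 0    # tuple sizes add up
for tuple_count, tuple_size in relations.values():
    product_tuples *= tuple_count
    product_width += tuple_size

total_bytes = product_tuples * product_width
```

Under these figures the intermediate relation occupies 3,000,000,000 bytes, which is the cost that the next translation avoids.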

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(((\sigma_{\text{Plocation}=\text{'California'}}(\text{Project})) \bowtie_{\text{Dnum}=\text{Dnumber}} \text{Department}) \bowtie_{\text{Mgr\_ssn}=\text{Ssn}} \text{Employee})$

Query tree for this second translation (root at the top, children indented):

Π Pnumber, Dnum, Lname, Address, Bdate
    ⋈ Mgr_ssn=Ssn
        ⋈ Dnum=Dnumber
            σ Plocation='California'
                Project
            Department
        Employee
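The two translations can be compared on a toy database. The rows below are invented for illustration and are far smaller than the sizes assumed in the example; tuple positions follow the attribute order of the schemas, with only the listed attributes included.

```python
from itertools import product

# Hypothetical sample data.
project = [  # (Pname, Pnumber, Plocation, Dnum)
    ("Alpha", 1, "California", 10),
    ("Beta", 2, "Texas", 20),
    ("Gamma", 3, "California", 20),
]
department = [  # (Dname, Dnumber, Mgr_ssn)
    ("Research", 10, "111"),
    ("Sales", 20, "222"),
]
employee = [  # (Fname, Lname, Ssn, Bdate, Address, Dno)
    ("Ann", "Lee", "111", "1970-01-01", "1 Main St", 10),
    ("Bob", "Ray", "222", "1980-02-02", "2 Oak Ave", 20),
    ("Cy", "Poe", "333", "1990-03-03", "3 Elm Rd", 10),
]


def naive_plan():
    """Cartesian products first, then one selection, then the projection."""
    cartesian = [p + d + e for p, d, e in product(project, department, employee)]
    # Positions in the concatenated tuple: Plocation=2, Dnum=3, Dnumber=5,
    # Mgr_ssn=6, Ssn=9.
    selected = [t for t in cartesian
                if t[2] == "California" and t[3] == t[5] and t[6] == t[9]]
    result = [(t[1], t[3], t[8], t[11], t[10]) for t in selected]
    return result, len(cartesian)


def optimized_plan():
    """Selection first, then the two joins, then the projection."""
    california = [p for p in project if p[2] == "California"]
    with_dept = [p + d for p in california for d in department if p[3] == d[1]]
    with_mgr = [pd + e for pd in with_dept for e in employee if pd[6] == e[2]]
    result = [(t[1], t[3], t[8], t[11], t[10]) for t in with_mgr]
    return result, max(len(california), len(with_dept), len(with_mgr))


naive_result, naive_largest = naive_plan()
optimized_result, optimized_largest = optimized_plan()
```

Both plans return the same rows, but the largest intermediate relation has 18 tuples in the first plan and 2 in the second, even on this tiny input.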








Query Optimization: JOIN Operation (block nested loop)

If the buffer is too small to hold either relation entirely, we can still obtain a major saving in the number of block accesses by processing the relations block by block rather than tuple by tuple.

Query Optimization: JOIN Operation (block nested loop)

For each block Br of r do begin
    For each block Bs of s do begin
        For each tr ∈ Br do begin
            For each ts ∈ Bs do begin
                If tr[A] θ ts[B] is true then add tr || ts to the result
            end
        end
    end
end
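The pseudocode above can be sketched in Python as follows. This is only an illustration under stated assumptions: a relation is modeled as a list of blocks, a block as a list of dictionaries, and the names `block_nested_loop_join` and `theta` are hypothetical, not taken from the slides.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join two relations stored as lists of blocks (each block is a list of dicts)."""
    result = []
    for block_r in r_blocks:            # one read of each block of r
        for block_s in s_blocks:        # s is re-read once per block of r
            for t_r in block_r:
                for t_s in block_s:
                    if theta(t_r, t_s):
                        # tr || ts: concatenate the two tuples
                        result.append({**t_r, **t_s})
    return result


# Hypothetical sample data: r has 2 blocks, s has 2 blocks.
r_blocks = [[{"A": 1}, {"A": 2}], [{"A": 3}]]
s_blocks = [[{"B": 2}, {"B": 3}], [{"B": 3}]]
joined = block_nested_loop_join(r_blocks, s_blocks,
                                lambda t_r, t_s: t_r["A"] == t_s["B"])
```

Because `theta` is an arbitrary predicate, the same sketch works for any join condition, not only equality.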

Query Optimization: JOIN Operation (block nested loop)

The cost of block nested loop, in terms of the number of block accesses, is $b_r \times b_s + b_r$, where $b_r$ and $b_s$ are the numbers of blocks of r and s.

How can we improve block nested loop?
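The cost formula can be checked with a few lines of Python. The first function is the formula from the slide. The second is a commonly taught refinement that is an assumption here, not something the slides state: if M buffer blocks are available, read M - 2 blocks of the outer relation at a time, so the inner relation is scanned fewer times. All block counts are hypothetical.

```python
import math


def block_nested_loop_cost(b_outer, b_inner):
    """Block accesses for block nested loop: b_outer * b_inner + b_outer."""
    return b_outer * b_inner + b_outer


def buffered_block_nested_loop_cost(b_outer, b_inner, buffer_blocks):
    """Assumed refinement: hold (buffer_blocks - 2) outer blocks in memory at once,
    leaving one block for the inner relation and one for the output."""
    chunk = buffer_blocks - 2
    if chunk < 1:
        raise ValueError("need at least 3 buffer blocks")
    return math.ceil(b_outer / chunk) * b_inner + b_outer


b_r, b_s = 100, 400                                           # hypothetical sizes
cost_r_outer = block_nested_loop_cost(b_r, b_s)               # 40100
cost_s_outer = block_nested_loop_cost(b_s, b_r)               # 40400
cost_buffered = buffered_block_nested_loop_cost(b_r, b_s, 52) # 900
```

Under these numbers, using the smaller relation as the outer one saves a little, while giving the outer relation more buffer space saves far more.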


Query Optimization: JOIN Operation

Use of an access structure to retrieve the matching record(s): consider the join

$r \bowtie_{A=B} s$

If an index or hash key exists for one of the join attributes, say B of s, retrieve each record $t_r \in r$ one at a time, and then use the access structure to retrieve all the matching records $t_s \in s$ that satisfy $t_r[A] = t_s[B]$.
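A minimal sketch of this access-structure join, assuming an in-memory hash index on attribute B of s. The helper names `build_hash_index` and `index_nested_loop_join` and the sample tuples are hypothetical.

```python
from collections import defaultdict


def build_hash_index(s, attr):
    """Map each value of `attr` to the list of tuples of s carrying that value."""
    index = defaultdict(list)
    for t_s in s:
        index[t_s[attr]].append(t_s)
    return index


def index_nested_loop_join(r, s_index, a):
    """For each tuple of r, probe the index on s instead of scanning s."""
    result = []
    for t_r in r:
        for t_s in s_index.get(t_r[a], []):   # all ts with ts[B] == tr[A]
            result.append({**t_r, **t_s})
    return result


r = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}, {"A": 4, "x": "c"}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 1, "y": "r"}]
joined = index_nested_loop_join(r, build_hash_index(s, "B"), "A")
```

The relation s is never scanned during the join itself: each tuple of r costs one index probe.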

Query Optimization: JOIN Operation

Sort-merge: if the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization: JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is $b_s + b_r$, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
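The two-pointer merge can be sketched as follows, assuming both inputs are Python lists already sorted on their join attributes. The function name `merge_join` and the sample data are hypothetical. The group of s tuples sharing one join value is held in memory while the matching r tuples are processed, which mirrors the assumption stated above.

```python
def merge_join(r, s, a, b):
    """Equi-join of r and s, both sorted on r.a and s.b respectively."""
    result = []
    i = j = 0                                  # one pointer per relation
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value = r[i][a]
            # Find the end of the run of s tuples with this join value.
            j_end = j
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every r tuple with this value against the whole run.
            while i < len(r) and r[i][a] == value:
                for t_s in s[j:j_end]:
                    result.append({**r[i], **t_s})
                i += 1
            j = j_end
    return result


r = [{"A": 1}, {"A": 2}, {"A": 2}, {"A": 5}]
s = [{"B": 2, "y": "p"}, {"B": 2, "y": "q"}, {"B": 3, "y": "z"}, {"B": 5, "y": "w"}]
joined = merge_join(r, s, "A", "B")
```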

Query Optimization: JOIN Operation

Hash-join: the records of both files r and s are hashed to the same hash file using the same hashing function on the join attributes. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce the result of the join operation.
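A minimal in-memory sketch of the hash-join idea, assuming Python's built-in `hash` as the hashing function and a small fixed number of buckets. The name `hash_join`, the parameter `num_buckets`, and the data are hypothetical.

```python
def hash_join(r, s, a, b, num_buckets=4):
    """Equi-join on r.a = s.b by hashing both relations into the same buckets."""
    # Each bucket keeps the r tuples and the s tuples that hashed to it.
    buckets = [([], []) for _ in range(num_buckets)]
    for t_r in r:                                   # single pass over r
        buckets[hash(t_r[a]) % num_buckets][0].append(t_r)
    for t_s in s:                                   # single pass over s
        buckets[hash(t_s[b]) % num_buckets][1].append(t_s)

    result = []
    for r_part, s_part in buckets:
        for t_r in r_part:
            for t_s in s_part:
                # Different values can share a bucket, so compare the values.
                if t_r[a] == t_s[b]:
                    result.append({**t_r, **t_s})
    return result


r = [{"A": 1}, {"A": 2}, {"A": 6}]
s = [{"B": 2, "y": "p"}, {"B": 6, "y": "q"}, {"B": 7, "y": "z"}]
joined = hash_join(r, s, "A", "B")
```

Only tuples in the same bucket are ever compared, which is why the explicit equality test inside each bucket is still required.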

Query Optimization: Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

Joins with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Query Optimization: Complex JOIN Operation

Consider the following join operation:

$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say $r \bowtie_{\theta_1} s$. The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions.
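As a sketch of this strategy, assume the first condition is an equality that an efficient technique can handle, and the remaining conditions are applied as a filter on the intermediate result. The helper names `equi_join` and `conjunctive_join` and the attributes A, B, C, D are hypothetical.

```python
def equi_join(r, s, a, b):
    """Simple join on r.a = s.b using a dictionary as the access structure."""
    index = {}
    for t_s in s:
        index.setdefault(t_s[b], []).append(t_s)
    return [{**t_r, **t_s} for t_r in r for t_s in index.get(t_r[a], [])]


def conjunctive_join(r, s, a, b, remaining):
    """Compute the join on the first condition, then keep the tuples that
    satisfy every remaining condition."""
    intermediate = equi_join(r, s, a, b)
    return [t for t in intermediate if all(cond(t) for cond in remaining)]


r = [{"A": 1, "C": 10}, {"A": 2, "C": 50}]
s = [{"B": 1, "D": 20}, {"B": 2, "D": 30}]
# Join condition: A = B and C < D
joined = conjunctive_join(r, s, "A", "B", [lambda t: t["C"] < t["D"]])
```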

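Query Optimization: Complex JOIN Operation

Now consider the following join operation:

$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$

The join can be performed as the union of the tuples in the individual joins $r \bowtie_{\theta_i} s$:

$(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \ldots \cup (r \bowtie_{\theta_n} s)$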
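A small sketch of the union-of-joins idea. To keep the union from emitting the same pair twice when it satisfies several conditions, each individual join here returns pairs of tuple positions, which is an implementation choice assumed for this illustration. The function names and data are hypothetical.

```python
def theta_join_pairs(r, s, theta):
    """Positions (i, j) of the pairs of tuples that satisfy one condition."""
    return {(i, j)
            for i, t_r in enumerate(r)
            for j, t_s in enumerate(s)
            if theta(t_r, t_s)}


def disjunctive_join(r, s, thetas):
    """Union of the individual joins, one per condition."""
    pairs = set()
    for theta in thetas:
        pairs |= theta_join_pairs(r, s, theta)
    return [{**r[i], **s[j]} for i, j in sorted(pairs)]


r = [{"A": 1, "C": 5}, {"A": 2, "C": 9}]
s = [{"B": 1, "D": 5}, {"B": 3, "D": 9}]
# Join condition: A = B or C = D
joined = disjunctive_join(r, s, [
    lambda t_r, t_s: t_r["A"] == t_s["B"],
    lambda t_r, t_s: t_r["C"] == t_s["D"],
])
```

The first pair satisfies both conditions but appears once in the result, as a set union requires.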

Query Optimization: Project Operation

A project operation $\Pi_{\text{<attribute list>}}(R)$ is straightforward to implement if <attribute list> includes a key of relation R.

If <attribute list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the adjacent duplicates, or by using a hashing technique.
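Both duplicate-elimination approaches can be sketched briefly. The function names `project_sort` and `project_hash` and the sample tuples are hypothetical. The sort-based version relies on duplicates becoming adjacent after sorting, and the hash-based version uses a Python set as the hash structure.

```python
from itertools import groupby


def project_sort(relation, attrs):
    """Project, sort, then keep one row from each run of equal rows."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    return [row for row, _ in groupby(rows)]


def project_hash(relation, attrs):
    """Project and drop a row if an equal row was already seen."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


employees = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Smith", "Dno": 5},
]
by_sort = project_sort(employees, ["Lname", "Dno"])
by_hash = project_hash(employees, ["Lname", "Dno"])
with_key = project_hash(employees, ["Ssn", "Lname"])
```

When the attribute list contains the key (`Ssn` here), no row is dropped, which is why that case needs no duplicate elimination.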

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
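The sort-based approach can be sketched as a single merge scan that produces all three results at once. This assumes both inputs are already sorted and free of duplicates. The function name `sorted_set_ops` and the sample values are hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (r union s, r intersect s, r minus s) for sorted, duplicate-free lists."""
    union, intersection, difference = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:                 # only in r
            union.append(r[i])
            difference.append(r[i])
            i += 1
        elif r[i] > s[j]:               # only in s
            union.append(s[j])
            j += 1
        else:                           # in both
            union.append(r[i])
            intersection.append(r[i])
            i += 1
            j += 1
    # Whatever is left over belongs to exactly one of the relations.
    union.extend(r[i:])
    difference.extend(r[i:])
    union.extend(s[j:])
    return union, intersection, difference


u, n, d = sorted_set_ops([1, 3, 5, 7], [3, 4, 7, 9])
```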

Questions

Devise algorithms to perform variations of outer join operations.
Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, ...)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, ...)

Query Optimization: An Example

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr-ssn = Ssn
  AND Plocation = 'California'

Query Optimization: An Example

The above query can be translated into:

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(\sigma_{\text{Plocation = 'California'} \wedge \text{Dnum = Dnumber} \wedge \text{Mgr-ssn = Ssn}}(\text{Project} \times (\text{Department} \times \text{Employee})))$

Query Optimization: An Example

Query tree of the initial expression (root at the top, operands indented below each operator):

Π Pnumber, Dnum, Lname, Address, Bdate
    σ Plocation = 'California' ∧ Dnum = Dnumber ∧ Mgr-ssn = Ssn
        ×
            Project
            ×
                Department
                Employee

Query Optimization: An Example

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
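The figures quoted above follow directly from the stated sizes, as this short calculation shows. The total size in bytes is derived here and is not stated on the slide.

```python
import math

# Sizes given in the example.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product pairs every tuple with every other tuple,
# so tuple counts multiply and tuple widths add.
product_tuples = math.prod(tuple_counts.values())     # 100 * 20 * 5000
product_width = sum(tuple_bytes.values())             # 100 + 50 + 150
product_bytes = product_tuples * product_width        # derived total size
```

This gives 10,000,000 tuples of 300 bytes each, or about 3 GB of intermediate data produced from inputs that together occupy well under 1 MB.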

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

$\Pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(((\sigma_{\text{Plocation = 'California'}}(\text{Project})) \bowtie_{\text{Dnum = Dnumber}} \text{Department}) \bowtie_{\text{Mgr-ssn = Ssn}} \text{Employee})$
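The rewritten expression can be traced step by step on a toy instance. All rows below are made-up sample data, and the helpers `select`, `equi_join`, and `project` are hypothetical names for the three operators. The point is that the selection shrinks Project before any join is computed.

```python
def select(relation, predicate):
    return [t for t in relation if predicate(t)]


def equi_join(r, s, a, b):
    index = {}
    for t_s in s:
        index.setdefault(t_s[b], []).append(t_s)
    return [{**t_r, **t_s} for t_r in r for t_s in index.get(t_r[a], [])]


def project(relation, attrs):
    return [tuple(t[a] for a in attrs) for t in relation]


# Hypothetical sample rows following the schemas of the example.
projects = [
    {"Pname": "P1", "Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pname": "P2", "Pnumber": 2, "Plocation": "Texas", "Dnum": 4},
]
departments = [
    {"Dname": "Research", "Dnumber": 5, "Mgr-ssn": 111},
    {"Dname": "Admin", "Dnumber": 4, "Mgr-ssn": 222},
]
employees = [
    {"Lname": "Wong", "Ssn": 111, "Bdate": "1955-12-08", "Address": "Addr-1", "Dno": 5},
    {"Lname": "Wallace", "Ssn": 222, "Bdate": "1941-06-20", "Address": "Addr-2", "Dno": 4},
    {"Lname": "Smith", "Ssn": 333, "Bdate": "1965-01-09", "Address": "Addr-3", "Dno": 5},
]

step1 = select(projects, lambda t: t["Plocation"] == "California")
step2 = equi_join(step1, departments, "Dnum", "Dnumber")
step3 = equi_join(step2, employees, "Mgr-ssn", "Ssn")
answer = project(step3, ["Pnumber", "Dnum", "Lname", "Address", "Bdate"])
```

Every intermediate result here holds a single tuple, whereas the Cartesian product of the same three inputs would hold 2 × 2 × 3 = 12 tuples before the selection is applied.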

Query Optimization: An Example

Query tree of the rewritten expression (root at the top, operands indented below each operator):

Π Pnumber, Dnum, Lname, Address, Bdate
    ⋈ Mgr-ssn = Ssn
        ⋈ Dnum = Dnumber
            σ Plocation = 'California'
                Project
            Department
        Employee

  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

Query Optimization - JOIN Operation

Sort-merge: If the records of r and s are physically sorted by the value of the join attributes, then this technique can be applied by scanning r and s linearly.

Query Optimization - JOIN Operation (Merge)

A pointer, initially pointing to the first tuple, is assigned to each relation. As the algorithm proceeds, the pointers move through the relations.

Since the relations are sorted, each tuple is accessed once, and hence the number of block accesses is

b_s + b_r

where b_r and b_s are the numbers of blocks holding r and s, assuming that the set of all tuples with the same value for the join attributes fits in main memory.
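
The merge step can be sketched in Python as follows. This is a minimal in-memory illustration, not the slides' own algorithm text: relations are assumed to be lists of dicts already sorted on the join attribute, and the function name merge_join, the sample rows, and the department names are hypothetical.

```python
def merge_join(r, s, key_r, key_s):
    """Join two lists of dicts that are already sorted on their join keys."""
    result = []
    i = j = 0  # one pointer per relation, both starting at the first tuple
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples sharing this join value
            # (assumed to fit in main memory, as on the slide).
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with the same value against that group.
            while i < len(r) and r[i][key_r] == a:
                for t in s[j:j_end]:
                    result.append({**r[i], **t})
                i += 1
            j = j_end
    return result


# Hypothetical sample data, sorted on Dnum and Dnumber respectively.
projects = [
    {"Pnumber": 10, "Dnum": 4},
    {"Pnumber": 20, "Dnum": 5},
    {"Pnumber": 30, "Dnum": 5},
]
departments = [
    {"Dnumber": 1, "Dname": "Headquarters"},
    {"Dnumber": 4, "Dname": "Administration"},
    {"Dnumber": 5, "Dname": "Research"},
]
joined = merge_join(projects, departments, "Dnum", "Dnumber")
```

Each pointer only moves forward, which is why every tuple (and hence every block) is read once.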

Query Optimization - JOIN Operation

Hash-join: The records of both files r and s are hashed to the same hash file using the same hashing function. A single pass through each file hashes the records to the hash file buckets. Each bucket is then examined for records from r and s with matching join attribute values to produce a possible result for the join operation.
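
A minimal in-memory sketch of this idea in Python is given below. It assumes the "hash file" can be modelled as a fixed list of buckets, each holding the r tuples and the s tuples that hash to it. The function name hash_join, the bucket count, and the sample rows are assumptions made for illustration.

```python
def hash_join(r, s, key_r, key_s, n_buckets=8):
    """Equi-join two lists of dicts by hashing both into the same buckets."""
    # Each bucket keeps the r tuples and the s tuples that hash to it.
    buckets = [([], []) for _ in range(n_buckets)]
    for t in r:  # single pass over r
        buckets[hash(t[key_r]) % n_buckets][0].append(t)
    for t in s:  # single pass over s
        buckets[hash(t[key_s]) % n_buckets][1].append(t)

    result = []
    for r_part, s_part in buckets:
        # Only tuples in the same bucket can have equal join values.
        for tr in r_part:
            for ts in s_part:
                if tr[key_r] == ts[key_s]:
                    result.append({**tr, **ts})
    return result


# Hypothetical sample data.
departments = [{"Dnumber": 5, "Mgr_ssn": 102}, {"Dnumber": 4, "Mgr_ssn": 103}]
employees = [
    {"Ssn": 101, "Lname": "Smith"},
    {"Ssn": 102, "Lname": "Wong"},
    {"Ssn": 103, "Lname": "Zelaya"},
]
managers = hash_join(departments, employees, "Mgr_ssn", "Ssn")
```

The equality test inside each bucket is still needed because different join values can collide in the same bucket.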

Query Optimization - Complex JOIN Operation

Nested loop join can be used regardless of the join condition. The other join techniques, though more efficient than nested loop, can handle only simple join conditions.

A join with complex join conditions (i.e., conjunctive and disjunctive conditions) can be implemented using the techniques discussed for conjunctive and disjunctive selections.

Consider the following join operation:

r ⋈_{θ1 ∧ θ2 ∧ … ∧ θn} s

One or more of the join techniques may be applicable for joins on individual conditions. We can perform the overall join by first computing one of the simpler joins, say

r ⋈_{θ1} s

The result of the complete join consists of those tuples in the intermediate result that satisfy the remaining conditions θ2 ∧ … ∧ θn.
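
As a rough Python sketch of this strategy, assume θ1 is an equality condition (so a hash lookup can serve as the simpler join) and the remaining conditions are given as predicates over a pair of tuples. The function name conjunctive_join, the attribute names a, b, x, y, and the sample rows are hypothetical.

```python
def conjunctive_join(r, s, key_r, key_s, remaining):
    """Compute the join on theta_1 (key_r = key_s), then filter the rest."""
    # Step 1: the simpler join on theta_1, using a hash index on s.
    index = {}
    for ts in s:
        index.setdefault(ts[key_s], []).append(ts)
    intermediate = [(tr, ts) for tr in r for ts in index.get(tr[key_r], [])]

    # Step 2: keep only pairs that satisfy theta_2 ... theta_n.
    return [
        {**tr, **ts}
        for tr, ts in intermediate
        if all(theta(tr, ts) for theta in remaining)
    ]


r = [{"a": 1, "x": 10}, {"a": 2, "x": 5}, {"a": 2, "x": 30}]
s = [{"b": 1, "y": 20}, {"b": 2, "y": 20}]

# theta_1: a = b (handled by the hash lookup); theta_2: x < y.
result = conjunctive_join(r, s, "a", "b", [lambda tr, ts: tr["x"] < ts["y"]])
```

Choosing the most selective simple condition as θ1 keeps the intermediate result small, which is the point of computing one of the simpler joins first.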

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins

r ⋈_{θi} s, for i = 1, …, n.
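
The union of individual joins can be sketched in Python as below. For brevity each individual join is computed here by a nested loop, and a result tuple is identified by the positions of its two source tuples so that the union can discard pairs produced by more than one condition. Both choices, and all names and sample rows, are assumptions of this sketch.

```python
def theta_join_pairs(r, s, theta):
    """Return the set of (i, j) positions whose tuples satisfy theta."""
    return {
        (i, j)
        for i, tr in enumerate(r)
        for j, ts in enumerate(s)
        if theta(tr, ts)
    }


def disjunctive_join(r, s, thetas):
    """Join on theta_1 OR ... OR theta_n as the union of individual joins."""
    pairs = set()
    for theta in thetas:
        pairs |= theta_join_pairs(r, s, theta)  # set union removes repeats
    return [{**r[i], **s[j]} for i, j in sorted(pairs)]


r = [{"a": 1, "x": 25}, {"a": 2, "x": 50}]
s = [{"b": 1, "y": 20}, {"b": 3, "y": 40}]

thetas = [
    lambda tr, ts: tr["a"] == ts["b"],  # theta_1
    lambda tr, ts: tr["x"] > ts["y"],   # theta_2
]
result = disjunctive_join(r, s, thetas)
```

The first pair of tuples satisfies both conditions but appears only once in the result, which is what the union guarantees.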

Query Optimization - Project Operation

A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then eliminating the duplicates, or by using a hashing technique.
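
Both ways of eliminating duplicates can be sketched in Python as follows. The function names project_sort and project_hash and the sample Project rows are hypothetical, and relations are assumed to be small in-memory lists of dicts.

```python
def project_sort(relation, attrs):
    """Project onto attrs, then sort and drop adjacent duplicates."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    out = []
    for row in rows:
        if not out or out[-1] != row:  # duplicates are adjacent after sorting
            out.append(row)
    return out


def project_hash(relation, attrs):
    """Project onto attrs, using a hash set to skip rows already seen."""
    seen = set()
    out = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 5},
    {"Pnumber": 3, "Plocation": "California", "Dnum": 5},
]

no_key_sorted = project_sort(project, ["Plocation", "Dnum"])
no_key_hashed = project_hash(project, ["Plocation", "Dnum"])
with_key = project_hash(project, ["Pnumber"])
```

When the attribute list contains the key (Pnumber here), no two projected rows can coincide, so the duplicate check never removes anything.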

Query Optimization - Set Operations

Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement union, intersection, and difference operations.
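
The sort-then-scan approach can be sketched in Python as below. This is an illustration under stated assumptions: relations are in-memory lists of comparable tuples, sorted(set(...)) stands in for the sort phase with duplicates removed, and the function name sorted_set_ops is hypothetical.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, r - s) using one merge-style scan."""
    r, s = sorted(set(r)), sorted(set(s))  # stand-in for the sort phase
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            # Tuple appears only in r.
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:
            # Tuple appears only in s.
            union.append(s[j])
            j += 1
        else:
            # Tuple appears in both relations.
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    # Whatever is left in r belongs to the union and to r - s;
    # whatever is left in s belongs to the union only.
    union.extend(r[i:])
    diff.extend(r[i:])
    union.extend(s[j:])
    return union, inter, diff


r = [(5,), (1,), (3,)]
s = [(4,), (3,)]
union, inter, diff = sorted_set_ops(r, s)
```

All three results come out of the same single pass over the two sorted inputs.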

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization - An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr_ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
  AND Mgr_ssn = Ssn
  AND Plocation = 'California'

The above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}(Project × (Department × Employee)))

[Figure: query tree for the expression above. Root Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it σ_{Plocation='California' ∧ Dnum=Dnumber ∧ Mgr_ssn=Ssn}; below that × with children Project and × (Department, Employee).]

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
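
The figures above can be checked with a few lines of Python. The variable names are illustrative; the tuple sizes and counts are the ones stated on the slide.

```python
# Tuple sizes (bytes) and tuple counts as given in the example.
tuple_bytes = {"Project": 100, "Department": 50, "Employee": 150}
tuple_count = {"Project": 100, "Department": 20, "Employee": 5000}

# A Cartesian product pairs every tuple with every other tuple,
# so the counts multiply and the tuple widths add.
product_tuples = 1
for n in tuple_count.values():
    product_tuples *= n
product_tuple_bytes = sum(tuple_bytes.values())

total_bytes = product_tuples * product_tuple_bytes
print(product_tuples, product_tuple_bytes, total_bytes)
```

This gives 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, about 3 × 10^9 bytes of intermediate data before the selection is applied.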

However, based on the schemas of the relations, the above query can be translated into

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='California'}(Project)) ⋈_{Dnum=Dnumber} (Department)) ⋈_{Mgr_ssn=Ssn} (Employee))
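
A small Python sketch of this plan (selection first, then the two joins, then the projection) is shown below. The rows are invented sample data, the attribute names are written with underscores as Python-friendly stand-ins for the schema, and the helper equi_join is a hypothetical hash-based equi-join, not something the slides define.

```python
# Hypothetical sample data for the three relations.
project = [
    {"Pnumber": 1, "Plocation": "California", "Dnum": 5},
    {"Pnumber": 2, "Plocation": "Texas", "Dnum": 4},
]
department = [{"Dnumber": 4, "Mgr_ssn": 11}, {"Dnumber": 5, "Mgr_ssn": 22}]
employee = [
    {"Ssn": 11, "Lname": "Wallace", "Address": "Bellaire", "Bdate": "1941-06-20"},
    {"Ssn": 22, "Lname": "Wong", "Address": "Houston", "Bdate": "1955-12-08"},
    {"Ssn": 33, "Lname": "Smith", "Address": "Houston", "Bdate": "1965-01-09"},
]


def equi_join(r, s, key_r, key_s):
    """Equi-join two lists of dicts using a hash index on s."""
    index = {}
    for ts in s:
        index.setdefault(ts[key_s], []).append(ts)
    return [{**tr, **ts} for tr in r for ts in index.get(tr[key_r], [])]


# Selection is applied first, so the joins see few tuples.
selected = [p for p in project if p["Plocation"] == "California"]
with_dept = equi_join(selected, department, "Dnum", "Dnumber")
with_mgr = equi_join(with_dept, employee, "Mgr_ssn", "Ssn")

# Final projection onto the requested attributes.
answer = [
    (t["Pnumber"], t["Dnum"], t["Lname"], t["Address"], t["Bdate"])
    for t in with_mgr
]

# Size of the intermediate result the Cartesian-product plan would build.
cartesian_size = len(project) * len(department) * len(employee)
```

Even on this toy data the intermediate results of the reordered plan hold one tuple each, while the Cartesian-product plan would materialize 12 tuples before filtering.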

[Figure: query tree for the expression above. Root Π_{Pnumber, Dnum, Lname, Address, Bdate}; below it ⋈_{Mgr_ssn=Ssn} with children ⋈_{Dnum=Dnumber} and Employee; ⋈_{Dnum=Dnumber} has children σ_{Plocation='California'}(Project) and Department.]










Query Optimization: Complex JOIN Operation

Now consider the following join operation:

r ⋈_{θ1 ∨ θ2 ∨ … ∨ θn} s

The join can be performed as the union of the tuples in the individual joins:

r ⋈_{θi} s, for i = 1, …, n
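The union-of-joins idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relations are small in-memory lists of dictionaries, each individual join is a plain nested loop, and the names `theta_join_pairs`, `disjunctive_join`, and the sample data are hypothetical rather than taken from the slides. Matching pairs are collected by position so that a pair satisfying several conditions is emitted only once.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, int]
Condition = Callable[[Row, Row], bool]


def theta_join_pairs(r: List[Row], s: List[Row], theta: Condition) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j) such that r[i] and s[j] satisfy theta (nested loop)."""
    return {(i, j) for i, a in enumerate(r) for j, b in enumerate(s) if theta(a, b)}


def disjunctive_join(r: List[Row], s: List[Row], thetas: List[Condition]) -> List[Row]:
    """Evaluate r JOIN s on (theta_1 OR ... OR theta_n) as a union of individual joins."""
    matched: Set[Tuple[int, int]] = set()
    for theta in thetas:
        # Set union drops pairs that satisfy more than one condition.
        matched |= theta_join_pairs(r, s, theta)
    return [{**r[i], **s[j]} for i, j in sorted(matched)]


# Hypothetical sample data, not taken from the slides.
r = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
s = [{"c": 1, "d": 20}, {"c": 3, "d": 10}]
thetas = [lambda x, y: x["a"] == y["c"], lambda x, y: x["b"] == y["d"]]

joined = disjunctive_join(r, s, thetas)
print(joined)
```

In a real system each individual join could use whichever algorithm suits its condition (for example, a hash join for an equality condition); the nested loop here is only the simplest stand-in.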

Query Optimization: Project Operation

A project operation Π_<attribute-list>(R) is straightforward to implement if <attribute-list> includes a key of relation R.

If <attribute-list> does not include a key, then we may end up with duplicates. Duplicates can be eliminated by sorting the result and then removing the duplicates, or by using a hashing technique.
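Both duplicate-elimination strategies can be sketched as follows. This is a minimal in-memory illustration, not a disk-based implementation; the function names and the sample Employee rows are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def project_sort(rows: List[Row], attrs: Sequence[str]) -> List[Tuple]:
    """Project onto attrs, sort, then drop adjacent duplicates in one scan."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result: List[Tuple] = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_hash(rows: List[Row], attrs: Sequence[str]) -> List[Tuple]:
    """Project onto attrs, using a hash set to skip tuples already seen."""
    seen = set()
    result: List[Tuple] = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# Hypothetical rows; Dno is not a key, so projecting on it yields duplicates.
employees = [
    {"Ssn": 1, "Lname": "Smith", "Dno": 5},
    {"Ssn": 2, "Lname": "Wong", "Dno": 5},
    {"Ssn": 3, "Lname": "Zelaya", "Dno": 4},
]

by_sort = project_sort(employees, ["Dno"])
by_hash = project_hash(employees, ["Dno"])
print(by_sort, by_hash)
```

The sort-based version returns the result in sorted order, while the hash-based version preserves the order of first appearance; both contain the same set of tuples.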

Query Optimization: Set Operations

The Cartesian product is a very expensive operation to perform. Hence, it is important to avoid it as much as possible.

The other set operations can be implemented by sorting the relations; a single scan through each relation is then sufficient to generate the result.

Hashing is another way to implement the union, intersection, and difference operations.
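The sort-then-scan approach can be sketched as a two-pointer merge that produces union, intersection, and difference in a single pass. The sketch assumes that both relations fit in memory, have the same schema, and contain no duplicate tuples; the function name and sample data are hypothetical.

```python
from typing import List, Tuple

Relation = List[Tuple]


def sorted_set_ops(r: Relation, s: Relation) -> Tuple[Relation, Relation, Relation]:
    """Return (r UNION s, r INTERSECT s, r MINUS s) using sort plus one merge scan."""
    a, b = sorted(r), sorted(s)
    union: Relation = []
    inter: Relation = []
    diff: Relation = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            # Tuple appears only in r.
            union.append(a[i])
            diff.append(a[i])
            i += 1
        elif a[i] > b[j]:
            # Tuple appears only in s.
            union.append(b[j])
            j += 1
        else:
            # Tuple appears in both relations.
            union.append(a[i])
            inter.append(a[i])
            i += 1
            j += 1
    # Whatever remains in r belongs to the union and the difference.
    union.extend(a[i:])
    diff.extend(a[i:])
    # Whatever remains in s belongs to the union only.
    union.extend(b[j:])
    return union, inter, diff


# Hypothetical single-attribute relations.
r = [(3,), (1,), (5,)]
s = [(5,), (2,), (3,)]
union, inter, diff = sorted_set_ops(r, s)
print(union, inter, diff)
```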

Questions

Devise algorithms to perform variations of outer join operations.

Devise algorithms to perform aggregate operations.

Query Optimization: An Example

Assume the following relations:

Department (Dname, Dnumber, Mgr-ssn, …)
Project (Pname, Pnumber, Plocation, Dnum)
Employee (Fname, Lname, Ssn, Bdate, Address, Dno, …)

SELECT Pnumber, Dnum, Lname, Bdate, Address
FROM Project, Department, Employee
WHERE Dnum = Dnumber
AND MGRSSN = SSN
AND Plocation = 'California'

The above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}(Project × (Department × Employee)))

[Figure: query tree for the initial translation. Node labels: Π_{Pnumber, Dnum, Lname, Address, Bdate}; σ_{Plocation = "California" ∧ Dnum = Dnumber ∧ MGRSSN = SSN}; two × (Cartesian product) nodes; leaves Project, Department, and Employee.]

The previous scenario will result in inefficient query processing. Assume the Project, Department, and Employee relations had tuple sizes of 100, 50, and 150 bytes and contained 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples, each of 300 bytes.
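The figures quoted above follow from multiplying the cardinalities and adding the tuple sizes. A short check of the arithmetic, with dictionary names chosen only for illustration:

```python
import math

# Cardinalities and tuple sizes (in bytes) as stated in the text.
tuple_counts = {"Project": 100, "Department": 20, "Employee": 5000}
tuple_sizes = {"Project": 100, "Department": 50, "Employee": 150}

# A Cartesian product multiplies cardinalities and concatenates tuples.
result_tuples = math.prod(tuple_counts.values())
result_tuple_size = sum(tuple_sizes.values())
result_bytes = result_tuples * result_tuple_size

print(result_tuples)      # 10000000
print(result_tuple_size)  # 300
print(result_bytes)       # 3000000000
```

The intermediate relation would therefore occupy about 3 GB before the selection discards most of it.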

However, based on the schemas of the relations, the above query can be translated into:

Π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation = "California"}(Project)) ⋈_{Dnum = Dnumber} (Department)) ⋈_{MGRSSN = SSN} (Employee))
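The difference between the two plans can be made concrete with a small simulation. The sketch below is an assumption-laden illustration: the data is synthetic and scaled down (10 projects, 4 departments, 50 employees) so that it runs quickly, Dnumber and SSN are assumed to be keys so that each join can be done with a dictionary lookup, and the function names are hypothetical. Each plan returns its result together with the number of intermediate tuples it produced.

```python
from itertools import product

# Synthetic, scaled-down data (not from the slides).
projects = [
    {"Pnumber": p, "Plocation": "California" if p % 5 == 0 else "Texas", "Dnum": p % 4 + 1}
    for p in range(1, 11)
]
departments = [{"Dnumber": d, "MGRSSN": 100 + d} for d in range(1, 5)]
employees = [
    {"SSN": 100 + e, "Lname": f"L{e}", "Address": f"A{e}", "Bdate": f"B{e}"}
    for e in range(1, 51)
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def naive_plan():
    """Cartesian products first, then selection, then projection."""
    intermediate = 0
    result = set()
    for p, d, e in product(projects, departments, employees):
        intermediate += 1  # every combination is materialized
        if (p["Plocation"] == "California"
                and p["Dnum"] == d["Dnumber"]
                and d["MGRSSN"] == e["SSN"]):
            row = {**p, **d, **e}
            result.add(tuple(row[a] for a in OUT))
    return result, intermediate


def pushed_down_plan():
    """Selection on Project first, then the two joins, then projection."""
    selected = [p for p in projects if p["Plocation"] == "California"]
    # Assumes Dnumber and SSN are keys, so a dict lookup performs each join.
    dept_by_number = {d["Dnumber"]: d for d in departments}
    emp_by_ssn = {e["SSN"]: e for e in employees}
    step1 = [{**p, **dept_by_number[p["Dnum"]]} for p in selected if p["Dnum"] in dept_by_number]
    step2 = [{**pd, **emp_by_ssn[pd["MGRSSN"]]} for pd in step1 if pd["MGRSSN"] in emp_by_ssn]
    result = {tuple(row[a] for a in OUT) for row in step2}
    return result, len(selected) + len(step1) + len(step2)


naive_result, naive_count = naive_plan()
fast_result, fast_count = pushed_down_plan()
print(naive_result == fast_result, naive_count, fast_count)
```

Both plans return the same tuples, but the first enumerates every combination of the three relations, while the second only ever handles the tuples that survive the selection.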

[Figure: query tree for the expression above. Root: Π_{Pnumber, Dnum, Lname, Address, Bdate}. Below it: ⋈_{MGRSSN = SSN}, joining Employee with the result of ⋈_{Dnum = Dnumber}, which joins Department with σ_{Plocation = "California"} applied to Project.]







  • Query Processing and Query Optimization in Centralized Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems
  • Database Systems

Query Optimization: An Example

Π Pnumber, Dnum, Lname, Address, Bdate
  σ Plocation = "california" and Dnum = Dnumber and MNGSSN = SSN
    ×
      ×
        Project
        Department
      Employee

Query Optimization: An Example

The previous scenario results in inefficient query processing. Assume the Project, Department, and Employee relations have tuple sizes of 100, 50, and 150 bytes and contain 100, 20, and 5000 tuples, respectively. Then the Cartesian products would generate a relation of 10 million tuples (100 × 20 × 5000), each of 300 bytes (100 + 50 + 150).
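The size estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the tuple sizes and counts come from the slide, while the dictionary layout and the function name `cartesian_product_size` are illustrative assumptions.

```python
# (tuple size in bytes, number of tuples) for each relation, as given on the slide.
relations = {
    "Project": (100, 100),
    "Department": (50, 20),
    "Employee": (150, 5000),
}


def cartesian_product_size(rels):
    """Return (tuple_count, tuple_bytes) for the Cartesian product of all relations."""
    count = 1
    width = 0
    for tuple_bytes, tuple_count in rels.values():
        count *= tuple_count  # cardinalities multiply
        width += tuple_bytes  # tuple widths add up
    return count, width


count, width = cartesian_product_size(relations)
total_bytes = count * width
print(count, width, total_bytes)  # 10000000 300 3000000000
```

At 300 bytes per tuple, the intermediate relation would occupy about 3 GB before the selection discards most of it.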

Query Optimization: An Example

However, based on the schemas of the relations, the above query can be translated into:

Π Pnumber, Dnum, Lname, Address, Bdate (((σ Plocation = "california" (Project)) ⋈ Dnum = Dnumber (Department)) ⋈ MNGSSN = SSN (Employee))

Query Optimization: An Example

Π Pnumber, Dnum, Lname, Address, Bdate
  ⋈ MNGSSN = SSN
    ⋈ Dnum = Dnumber
      σ Plocation = "california"
        Project
      Department
    Employee
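To make the difference between the two plans concrete, the sketch below runs both over a tiny in-memory data set. The rows are hypothetical and invented for illustration; only the attribute names follow the slides. The function names (`naive_plan`, `optimized_plan`, `project_attrs`) are likewise assumptions, and the joins are written as simple nested loops rather than any particular join algorithm.

```python
from itertools import product

# Hypothetical toy data; only the attribute names are taken from the slides.
project = [
    {"Pnumber": 1, "Plocation": "california", "Dnum": 10},
    {"Pnumber": 2, "Plocation": "texas", "Dnum": 20},
    {"Pnumber": 3, "Plocation": "california", "Dnum": 20},
]
department = [
    {"Dnumber": 10, "MNGSSN": "111"},
    {"Dnumber": 20, "MNGSSN": "222"},
]
employee = [
    {"SSN": "111", "Lname": "Smith", "Address": "1 Main St", "Bdate": "1960-01-01"},
    {"SSN": "222", "Lname": "Wong", "Address": "2 Oak Ave", "Bdate": "1970-05-05"},
    {"SSN": "333", "Lname": "Diaz", "Address": "3 Elm Rd", "Bdate": "1980-09-09"},
]
OUT = ("Pnumber", "Dnum", "Lname", "Address", "Bdate")


def project_attrs(rows):
    """Final projection; sorted so that the two plans can be compared directly."""
    return sorted(tuple(r[a] for a in OUT) for r in rows)


def naive_plan():
    # Build the full Cartesian product first, then apply one large selection.
    crossed = [{**p, **d, **e} for p, d, e in product(project, department, employee)]
    selected = [
        r for r in crossed
        if r["Plocation"] == "california"
        and r["Dnum"] == r["Dnumber"]
        and r["MNGSSN"] == r["SSN"]
    ]
    return project_attrs(selected), len(crossed)


def optimized_plan():
    # Push the selection down to Project, then perform the two joins.
    ca = [p for p in project if p["Plocation"] == "california"]
    pd = [{**p, **d} for p in ca for d in department if p["Dnum"] == d["Dnumber"]]
    pde = [{**r, **e} for r in pd for e in employee if r["MNGSSN"] == e["SSN"]]
    # Report the largest intermediate result that is actually materialized.
    return project_attrs(pde), max(len(ca), len(pd), len(pde))


naive_rows, naive_peak = naive_plan()
opt_rows, opt_peak = optimized_plan()
print(naive_rows == opt_rows, naive_peak, opt_peak)
```

Both plans return the same rows, but the naive plan materializes every combination of tuples before filtering, whereas the optimized plan never stores more than the few tuples that survive the selection and the join conditions.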
