
Query Processing and Optimization

General Overview
- Relational model & SQL
- Formal & commercial query languages
- Functional Dependencies
- Normalization
- Physical Design
- Indexing
- Query Processing and Optimization

Review: QP & O
An SQL query goes through the Query Processor: the Parser produces an algebraic expression, the Query Optimizer turns it into an execution plan, and the Evaluator executes the plan to produce the data (the result of the query).

Review: QP & O
Inside the Query Optimizer: the Query Rewriter transforms the algebraic representation into an equivalent algebraic representation, and the Plan Generator, consulting data statistics, produces the Query Execution Plan.

Selections Involving Comparisons
Query: σAtt≤K(r)
A5 (primary index, comparison). (Relation is sorted on Att)
- For σAtt≥V(r): use the index to find the first tuple ≥ v and scan the relation sequentially from there
- For σAtt≤V(r): just scan the relation sequentially until the first tuple > v; do not use the index
Cost: EA5 = HTi + c / fr (where c is the cardinality of the result)

Query: σAtt≤K(r)
Cardinality: more metadata on r are needed:
- min(Att, r): minimum value of Att in r
- max(Att, r): maximum value of Att in r
How big is c? The cardinality of σAtt≤K(r) is estimated as:
c = nr * (K − min(Att, r)) / (max(Att, r) − min(Att, r))
(or nr / 2 if min, max are unknown)
Intuition: assume a uniform distribution of values between min and max.
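A minimal sketch of this estimate in Python (the function and parameter names are my own, not from the slides):

```python
def range_selection_size(n_r, k, att_min=None, att_max=None):
    """Estimate c = |sigma_{Att <= K}(r)|, assuming values of Att are
    uniformly distributed between min(Att, r) and max(Att, r)."""
    if att_min is None or att_max is None:
        return n_r / 2                      # min/max unknown: guess half
    if k < att_min:
        return 0                            # no tuple can satisfy Att <= K
    if k >= att_max:
        return n_r                          # every tuple satisfies Att <= K
    return n_r * (k - att_min) / (att_max - att_min)
```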

Plan generation: Range Queries
A6 (secondary index, comparison).
Cost: EA6 = HTi − 1 + # of leaf nodes to read + # of file blocks to read
= HTi − 1 + LBi * (c / nr) + c, if Att is a candidate key

Plan generation: Range Queries
A6 (secondary index, range query), if Att is NOT a candidate key:

Cost: EA6 = HTi − 1 + # of leaf nodes to read + # of file blocks to read + # of buckets to read
= HTi − 1 + LBi * (c / nr) + c + x
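The EA6 formula can be written as a small cost function (a sketch; names are mine, and `bucket_blocks` stands for the x term above):

```python
def secondary_index_range_cost(ht_i, lb_i, c, n_r, bucket_blocks=0):
    """EA6 = (HTi - 1) + fraction of the LBi leaf blocks scanned
    + one block access per matching record
    (+ x bucket blocks when Att is not a candidate key)."""
    return (ht_i - 1) + lb_i * (c / n_r) + c + bucket_blocks
```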

Join Operation
Size and plans for the join operation.
Running example: depositor ⋈ customer
depositor(cname, acct_no)
customer(cname, cstreet, ccity)
Metadata:
- ncustomer = 10,000    ndepositor = 5000
- fcustomer = 25        fdepositor = 50
- bcustomer = 400       bdepositor = 100
- V(cname, depositor) = 2500 (each customer has on average 2 accounts)
- cname in depositor is a foreign key referencing customer

Cardinality of Join Queries
What is the cardinality (number of tuples) of the join?
E1: Cartesian product: ncustomer * ndepositor = 50,000,000
E2: Attribute cname is common to both relations, with 2500 distinct cnames in depositor.
Size: ncustomer * (avg # of tuples in depositor with the same cname)
= ncustomer * (ndepositor / V(cname, depositor)) = 10,000 * (5000 / 2500) = 20,000

Cardinality of Join Queries
E3: cname in depositor is a foreign key referencing customer.
Size: ndepositor * (avg # of tuples in customer with the same cname) = ndepositor * 1 = 5000
Note: If cname is a key for customer but NOT a foreign key for depositor (i.e., not all cnames of depositor appear in customer), then 5000 is an UPPER BOUND: some cnames in depositor may not match any tuple in customer.

Cardinality of Joins in general
Assume the join R ⋈ S:
- If R, S have no common attributes: nr * ns
- If R, S have attribute A in common: estimate min( nr * ns / V(A, s), nr * ns / V(A, r) ) (take the min)
- If R, S have attribute A in common and:
  - A is a candidate key for R: ≤ ns
  - A is a candidate key in both R and S: ≤ min(nr, ns)
  - A is a key for R and a foreign key for S: = ns
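The general estimate above can be sketched as a function (my names; the running-example numbers come from the earlier metadata slide):

```python
def join_size(n_r, n_s, v_a_r=None, v_a_s=None):
    """Estimated |R join S|. With no common attribute the result is the
    Cartesian product; otherwise take the min of the two V(A, .) estimates."""
    if v_a_r is None and v_a_s is None:
        return n_r * n_s                      # Cartesian product
    estimates = []
    if v_a_s is not None:
        estimates.append(n_r * n_s / v_a_s)   # each R tuple matches n_s / V(A, s)
    if v_a_r is not None:
        estimates.append(n_r * n_s / v_a_r)   # each S tuple matches n_r / V(A, r)
    return min(estimates)
```

With cname a key for customer, V(cname, customer) = ncustomer = 10,000, and the min picks out the foreign-key answer ndepositor = 5000.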

Nested-Loop Join
Algorithm 1: Nested Loop Join
Query: R ⋈ S
Idea: for a tuple t1 of R and tuples u1, u2, u3 of a block of S, compare (t1, u1), (t1, u2), (t1, u3), ...
Then: GET NEXT BLOCK OF S
Repeat: for EVERY tuple of R

Nested-Loop Join
Algorithm 1: Nested Loop Join
for each tuple tr in R do
    for each tuple us in S do
        test the pair (tr, us) to see if they satisfy the join condition
        if they do (a match), add tr · us to the result
R is called the outer relation and S the inner relation of the join.
Query: R S

Nested-Loop Join (Cont.)
Cost:
- Worst case (buffer holds only 3 blocks): br + nr * bs disk accesses
- Best case (buffer big enough for the entire INNER relation + 2 blocks): br + bs disk accesses
ncustomer = 10,000  ndepositor = 5000
fcustomer = 25      fdepositor = 50
bcustomer = 400     bdepositor = 100
Assuming worst-case memory availability, the cost estimate is:
- 5000 * 400 + 100 = 2,000,100 disk accesses with depositor as the outer relation
- 10,000 * 100 + 400 = 1,000,400 disk accesses with customer as the outer relation
If the smaller relation (depositor) fits entirely in memory, the cost estimate drops to 500 disk accesses (actually we need 2 more buffer blocks).
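The worst-case formula is simple enough to check mechanically (a sketch; function name is mine):

```python
def nested_loop_cost(b_outer, n_outer, b_inner):
    """Worst-case disk accesses for nested-loop join with a 3-block buffer:
    read the outer relation once, and the whole inner once per outer tuple."""
    return b_outer + n_outer * b_inner
```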

Join Algorithms
Algorithm 2: Block Nested Loop Join
Query: R ⋈ S
Idea: for a block of R with tuples t1, t2, t3 and a block of S with tuples u1, u2, u3, compare all pairs:
(t1, u1), (t1, u2), (t1, u3), (t2, u1), (t2, u2), (t2, u3), (t3, u1), (t3, u2), (t3, u3)
Then: GET NEXT BLOCK OF S
Repeat: for EVERY BLOCK of R

Block Nested-Loop Join
for each block BR of R do
    for each block BS of S do
        for each tuple tr in BR do
            for each tuple us in BS do
                check if (tr, us) satisfy the join condition
                if they do (a match), add tr · us to the result

Block Nested-Loop Join (Cont.)
Cost:
- Worst case: br * bs + br block accesses
- Best case: br + bs block accesses (same as nested loop)
Improvements to the nested-loop and block nested-loop algorithms for a buffer with M blocks:
- In block nested-loop, use M − 2 disk blocks as the blocking unit for the outer relation, where M = memory size in blocks; use the remaining two blocks to buffer the inner relation and the output.
  Cost = ⌈br / (M − 2)⌉ * bs + br
- If the equijoin attribute forms a key on the inner relation, stop the inner loop on the first match
- Scan the inner loop forward and backward alternately, to make use of the blocks remaining in the buffer (with LRU replacement)
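The improved block nested-loop cost can be sketched as (my function name; M is the buffer size in blocks):

```python
import math

def block_nested_loop_cost(b_outer, b_inner, m):
    """Worst-case block accesses for block nested-loop join with an M-block
    buffer: M - 2 blocks hold the outer chunk, one the inner, one the output."""
    return math.ceil(b_outer / (m - 2)) * b_inner + b_outer
```

With M = 3 this degenerates to br * bs + br; with a buffer large enough to hold the whole outer relation it reaches the br + bs best case.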

Join Algorithms
Algorithm 3: Indexed Nested Loop Join
Query: R ⋈ S
Idea: for each tuple ti of R, if ti.A = K (A is the attribute R and S have in common), use the index to compute σA=K(S).
Requires: an index on A for S (fill the buffer with blocks of S or index blocks).

Indexed Nested-Loop Join
For each tuple tR in the outer relation R, use the index to look up tuples in S that satisfy the join condition with tR.
Worst case: the buffer has space for only one page of R, and for each tuple in R we perform an index lookup on S.
Cost of the join: br + nr * c
- where c is the cost of traversing the index and fetching all matching S tuples for one tuple of R
- c can be estimated as the cost of a single selection on S using the join condition
If indices are available on the join attributes of both R and S, use the relation with fewer tuples as the outer relation.

Example of Nested-Loop Join Costs
Query: depositor ⋈ customer
depositor(cname, acct_no)    customer(cname, ccity, cstreet)
Metadata:
- customer: ncustomer = 10,000  fcustomer = 25  bcustomer = 400
- depositor: ndepositor = 5000  fdepositor = 50  bdepositor = 100
- V(cname, depositor) = 2500
- i: a primary index on cname (dense) for customer (fi = 20)
- Minimal buffer

Plan generation for Joins
Algorithm 2: Block Nested Loop
1a: customer = OUTER relation, depositor = INNER relation
    cost: bcustomer + bcustomer * bdepositor = 400 + (400 * 100) = 40,400
1b: customer = INNER relation, depositor = OUTER relation
    cost: bdepositor + bdepositor * bcustomer = 100 + (100 * 400) = 40,100

Plan generation for Joins
Algorithm 3: Indexed Nested Loop
We have an index on cname for customer; depositor is the outer relation.
Cost: bdepositor + ndepositor * c = 100 + (5000 * c), where c is the cost of evaluating the selection σcname=K using the index.
What is c? With a primary index on cname, and cname a key for customer:
c = HTi + 1

Plan generation for Joins
What is HTi? cname is a key for customer, so V(cname, customer) = 10,000.
fi = 20, i is dense
LBi = 10,000 / 20 = 500
HTi ≈ ⌈logfi(LBi)⌉ + 1 = ⌈log20 500⌉ + 1 = 4
Cost of indexed nested loop: 100 + (5000 * (4 + 1)) = 25,100 block accesses (cheaper than NLJ)
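The index-height and cost arithmetic above can be checked with a small sketch (function names are mine):

```python
import math

def index_height(n_keys, fanout):
    """HTi ~ ceil(log_fanout(LBi)) + 1 for a dense index with
    LBi = ceil(n_keys / fanout) leaf blocks."""
    lb_i = math.ceil(n_keys / fanout)
    return math.ceil(math.log(lb_i, fanout)) + 1

def indexed_nested_loop_cost(b_outer, n_outer, ht_i):
    # one index traversal plus one data-block fetch (HTi + 1) per outer tuple
    return b_outer + n_outer * (ht_i + 1)
```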

Another Join Strategy
Algorithm: Merge Join
Idea: suppose R, S are both sorted on A (the common attribute).
Query: R ⋈ S
Example: R.A = 1, 2, 3, 4 and S.A = 2, 2, 3, 5. Trace:
(1, 2): advance pR
(2, 2): match, add to result, advance pS
(2, 2): match, add to result, advance pS
(2, 3): advance pR
(3, 3): match, add to result, advance pS
(3, 5): advance pR
(4, 5): read next block of R ...

Merge Join
GIVEN: R, S both sorted on A
Initialization:
    read blocks of R, S into the buffer, reserving one block for the result
    Pr = 1, Ps = 1
Join (assuming no duplicate values on A in R):
WHILE !EOF(R) && !EOF(S) DO
    if BR[Pr].A == BS[Ps].A then output to result; Ps++
    else if BR[Pr].A < BS[Ps].A then Pr++
    else Ps++
    if Pr or Ps points past the end of its block, read the next block and reset Pr (Ps) to 1
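The merge phase can be sketched over in-memory lists (a simplification: the real algorithm works block by block), assuming both inputs are already sorted on the join attribute and R has no duplicate join values:

```python
def merge_join(r, s, key=lambda t: t[0]):
    """Merge phase of (sort-)merge join: r and s must already be sorted on
    the join attribute; assumes no duplicate join values within r."""
    result, pr, ps = [], 0, 0
    while pr < len(r) and ps < len(s):
        kr, ks = key(r[pr]), key(s[ps])
        if kr == ks:
            result.append(r[pr] + s[ps])   # output the match, advance s
            ps += 1
        elif kr < ks:
            pr += 1
        else:
            ps += 1
    return result
```

On the example trace, R.A = 1, 2, 3, 4 against S.A = 2, 2, 3, 5 yields three matches.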

Merge Join (Cont.)

Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory).
Thus the number of block accesses for merge join is bR + bS.
But... what if one or both of R, S are not sorted on A?
Answer: it may be worth sorting first and then performing the merge join (Sort-Merge Join).
Cost: bR + bS + sortR + sortS

External Sorting
Not the same as internal sorting.
Internal sorting: minimize CPU time (count comparisons); best: quicksort, mergesort, ...
External sorting: minimize disk accesses (what we're sorting doesn't fit in memory!); best: external merge sort.
WHEN is it used?
1) SORT-MERGE join
2) ORDER BY queries
3) SELECT DISTINCT (duplicate elimination)

External Sorting
Idea:
1. Sort fragments of the file in memory using an internal sort, producing sorted runs. Store the runs on disk.
2. Merge the runs.
(The slide illustrates this on a small file of records: the file is cut into fragments, each fragment is sorted in memory into a run, and the sorted runs are then merged into a single sorted file.)

External Sorting (cont.)
Algorithm: let M = size of the buffer (in blocks)
1. Sort runs of M blocks each (except possibly the last) and store them. Use an internal sort on each run.
2. Merge M − 1 runs at a time into one, and store the result. Do this for all runs.
3. If step 2 results in more than one run, go to step 2.

External Sorting (cont.)
Cost: 2 bR * (⌈logM−1(bR / M)⌉ + 1)
Intuition:
- Step 1: creating the runs reads and writes every block once: cost 2 bR I/Os
- Step 2: every merge iteration reads and writes the entire file (2 bR I/Os), and each iteration reduces the number of runs by a factor of M − 1, so there are ⌈logM−1(bR / M)⌉ merge iterations in total.
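The cost formula can be sketched directly (my function name; b_r is the file size in blocks, m the buffer size):

```python
import math

def external_sort_cost(b_r, m):
    """Block transfers for external merge sort: one read-write pass to build
    ceil(b_r / M) initial runs, plus one read-write pass per (M - 1)-way
    merge level."""
    merge_passes = math.ceil(math.log(b_r / m, m - 1))
    return 2 * b_r * (merge_passes + 1)
```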

What if we need to sort?
Sort-Merge Join. Query: depositor ⋈ customer
Sorting depositor: bdepositor = 100
sort depositor = 2 * 100 * (⌈log2(100 / 3)⌉ + 1) = 1400
Same computation for customer: sort customer = 7200
Total: 100 + 400 + 1400 + 7200 = 9100 I/Os!
Still beats BNLJ (40K) and INLJ (25K).
Why not always use SMJ?
Answer:
1) Sometimes the inner relation fits in memory
2) Sometimes the index is small
3) SMJ only works for natural joins and equijoins

Hash Joins
Applicable only to natural joins and equijoins.
Depends upon a hash function h, used to partition both relations: h must map values of the join attributes to {0, ..., n−1}, where n = # of partitions.

Hash-Join Algorithm
Algorithm: Hash Join
1. Partition the relation S using the hash function h so that each partition si fits in memory. Use one block of memory as the output buffer for each partition (requires at least n blocks).
2. Partition R using h.
3. For each partition i (0, ..., n−1): use BNLJ to compute the join Ri ⋈ Si (optimal since si fits in memory and is the inner relation).
S is called the build input and R is called the probe input.
Note: CPU costs can be reduced by building an in-memory hash index for each si, using a different hash function than h.
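An in-memory sketch of these steps (everything fits in RAM here, so a Python dict stands in for the per-partition BNLJ; names are mine):

```python
from collections import defaultdict

def hash_join(r, s, n, key=lambda t: t[0]):
    """Partition both inputs with the same hash function h, then join
    matching partitions. S is the build input, R the probe input."""
    h = lambda t: hash(key(t)) % n
    parts_r, parts_s = defaultdict(list), defaultdict(list)
    for t in r:
        parts_r[h(t)].append(t)
    for t in s:
        parts_s[h(t)].append(t)
    result = []
    for i in range(n):
        build = defaultdict(list)          # hash index over partition s_i
        for t in parts_s[i]:
            build[key(t)].append(t)
        for t in parts_r[i]:               # probe with partition r_i
            for match in build[key(t)]:
                result.append(t + match)
    return result
```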
Hash Join
Partitioning, must choose:
- # of partitions, n
- hashing function h (maps each tuple to {0, ..., n−1})
Goals (in order of importance):
1. Each partition of the build relation should fit in memory (=> h is uniform, n is large)
2. For the partitioning step, one output block per partition must fit in memory (=> n is small, n < M)

Hash Join
Goal #1: partitions of the build relation should fit in memory (M blocks). How large should n be?
Answer: n ≥ ⌈bS / (M − 2)⌉ (reserving 2 blocks for the R partition and the output of BNLJ)
(In practice n is chosen a little larger, with a fudge factor of ~1.2, as not all memory is available for the partition joins.)

Hash Join
Goal #2: keep n < M. What if that is not possible?
Recursive partitioning!
Idea:
Iteration #1: partition S into M − 1 partitions using h1
Iteration #2: partition each partition of S into M − 1 partitions using a different hash function h2
...
Repeat until each partition of S fits in memory.

Cost of Hash-Join
Case 1: no recursive partitioning
1. Partition S: bS reads and bS + n writes. Why n? Each partition's last, partially filled output block is also written.
2. Partition R: bR reads and bR + n writes.
3. n partition joins: bR + bS + 2n reads.
Total: 3(bR + bS) + 4n
Typically n is small relative to bR + bS, so the 4n term can be ignored.

Cost of Hash-Join
Case 2: recursive partitioning
Recall: we partition the build relation M − 1 ways each time.
Total number of iterations: ⌈logM−1(n)⌉ ≈ ⌈logM−1(bS / (M − 2))⌉ ≈ ⌈logM−1(bS)⌉ − 1
Cost:
1. Partition S: 2 bS (⌈logM−1(bS)⌉ − 1)
2. Partition R: 2 bR (⌈logM−1(bS)⌉ − 1)
3. n partition joins: bR + bS
Total cost estimate: 2(bR + bS)(⌈logM−1(bS)⌉ − 1) + bR + bS
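Both cost cases can be combined in one sketch. The one-pass condition bS ≤ (M − 1)(M − 2) is my reading of the goals above (at most M − 1 partitions per pass, each of at most M − 2 blocks); the function name is mine:

```python
import math

def hash_join_cost(b_r, b_s, m):
    """Estimated block accesses for hash join with an M-block buffer;
    S (b_s blocks) is the build input, R (b_r blocks) the probe input."""
    if b_s <= (m - 1) * (m - 2):              # case 1: one partitioning pass
        n = math.ceil(b_s / (m - 2))
        return 3 * (b_r + b_s) + 4 * n
    # case 2: recursive partitioning
    passes = math.ceil(math.log(b_s, m - 1)) - 1
    return 2 * (b_r + b_s) * passes + b_r + b_s
```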

Example of Cost of Hash-Join
Query: customer ⋈ depositor
Assume that the memory size is M = 3 blocks, bdepositor = 100 and bcustomer = 400; depositor is used as the build input.
Recursive partitioning is required (since n > M):
2(bcust + bdep)(⌈log2(bdep)⌉ − 1) + bdep + bcust
= 1000 * 6 + 500 = 6500 I/Os!
Why ever use Sort-Merge Join?
1) Both input relations may already be sorted
2) Good (skew-free) hash functions are hard to find