Query Processing: Joins

Posted on 03-Jan-2016

  • Query Processing: Joins

  • Computing Joins. One of the most expensive operations. Many methods have been devised to join efficiently: block nested loops join, index nested loops join, sort-merge join, hash join.


  • Schema for Examples. Buyers(id: integer, name: string, rating: integer, age: real); Bids(bid: integer, pid: integer, day: date, product: string). Bids: each tuple is 40 bytes long, 100 tuples per page, 1000 pages (i.e., 100,000 tuples, 4MB total). Buyers: each tuple is 50 bytes long, 80 tuples per page, 500 pages (i.e., 40,000 tuples, 2MB total).

  • Computing Joins: Equality Joins With One Join Column. In algebra: R ⋈ S. Common! Must be carefully optimized. R × S is large, so computing R × S followed by a selection is inefficient. Assume M pages in R with pR tuples per page, and N pages in S with pS tuples per page. We will consider more complex join conditions later. Cost metric: number of page I/Os; we will ignore output costs.

    SELECT *
    FROM Bids R, Buyers S
    WHERE R.bid = S.id

  • Simple Nested Loops Join. For each tuple in the outer relation R, we scan the entire inner relation S.

    foreach tuple tr in Bids R do
      foreach tuple ts in Buyers S do
        if tr.bid = ts.id then output (tr, ts)

    Cost: M + (pR * M) * N = 1000 + 100*1000*500 = 50,001,000 I/Os!
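The tuple-at-a-time loop above can be sketched in Python over small in-memory relations (the sample rows are invented for illustration):

```python
def simple_nested_loops_join(bids, buyers):
    """For each tuple of the outer relation, scan the whole inner relation."""
    result = []
    for tr in bids:                  # outer relation R
        for ts in buyers:            # inner relation S, rescanned per R tuple
            if tr["bid"] == ts["id"]:
                result.append((tr, ts))
    return result

bids = [{"bid": 1, "product": "lamp"}, {"bid": 2, "product": "desk"}]
buyers = [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Bob"}]
print(simple_nested_loops_join(bids, buyers))
```

On disk this access pattern is exactly what makes the method so expensive: every outer tuple forces a full scan of S's pages.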

  • Page-Oriented Simple Nested Loops Join. For each page of R, get each page of S, and write out matching pairs of tuples.

    foreach page r in R do
      foreach page s in S do
        foreach tuple tr in r do
          foreach tuple ts in s do
            if tr.bid = ts.id then output (tr, ts)

    Cost: M + M*N = 1000 + 1000*500 = 501,000. If the smaller relation (S) is the outer, cost = 500 + 500*1000 = 500,500. Better to put the smaller relation in the outer loop.

  • Block Nested Loops Join. The cost can be reduced to M + ⌈M/(B-2)⌉ * N by using B buffer pages.


  • Block Nested Loops Join. Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold a block of the outer R. Use an in-memory hash table on each block for efficient matching.

    [Diagram: R & S on disk; hash table for a block of R (k < B-1 pages); input buffer for S; output buffer; join result]

  • Examples of Block Nested Loops. Cost: scan of outer + (#outer blocks * scan of inner), where #outer blocks = ⌈#pages of outer / blocksize⌉. With Bids (R) as outer and blocksize B-2 = 100: cost of scanning R is 1000 I/Os, a total of 10 blocks; per block of R, we scan Buyers (S): 10*500 I/Os. Total cost: 1000 + 10*500 = 6000 I/Os. With 100-page blocks of Buyers (S) as outer: cost of scanning S is 500 I/Os, a total of 5 blocks; per block of S, we scan Bids (R): 5*1000 I/Os. Total cost: 500 + 5*1000 = 5500 I/Os! So it is cheaper to keep the smaller relation as the outer.
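The cost arithmetic on this slide can be checked with a small helper (the function name is ours):

```python
import math

def bnl_cost(outer_pages, inner_pages, block_size):
    """Block nested loops cost: scan the outer once, plus one full scan
    of the inner relation per block of the outer."""
    num_blocks = math.ceil(outer_pages / block_size)
    return outer_pages + num_blocks * inner_pages

M, N = 1000, 500                 # Bids pages, Buyers pages
print(bnl_cost(M, N, 100))       # Bids as outer:   1000 + 10*500 = 6000
print(bnl_cost(N, M, 100))       # Buyers as outer:  500 + 5*1000 = 5500
```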

  • Index Nested Loops Join. If there is an index on the join column of one relation (say S), make it the inner and exploit the index.

    foreach tuple tr in R do
      foreach tuple ts in S with tr.bid = ts.id (use index) do
        add (tr, ts) to result

    Cost: M + (M*pR) * (cost of finding matching S tuples). For each R tuple, the cost of probing the S index is about 1.2 for a hash index, 2-4 for a B+ tree. The cost of then fetching the S tuples depends on clustering: clustered index, 1 I/O (typical); unclustered, up to 1 I/O per matching S tuple.

  • Examples of Index Nested Loops. Hash index on id of Buyers (as inner): scan Bids R: 1000 page I/Os, 100*1000 tuples. For each R tuple: 1.2 I/Os to get the data entry in the index, plus 1 I/O to get the (exactly one) matching S tuple. Total: 1000 + 100,000*2.2 = 221,000 I/Os. Hash index on bid of Bids (as inner): scan Buyers S: 500 page I/Os, 80*500 tuples. For each S tuple: 1.2 I/Os to find the index page with data entries, plus the cost of retrieving the matching R tuples. Assuming a uniform distribution, 2.5 bids per buyer (100,000 / 40,000); the cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered. Total: 500 + 40,000*2.2 = 88,500 (clustered) or 500 + 40,000*3.7 = 148,500 (unclustered).
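The same arithmetic as a sketch, with the rounding of the fractional per-tuple I/O estimates made explicit (the helper name is ours):

```python
def inl_cost(outer_pages, outer_tuples, probe_io, fetch_io):
    """Index nested loops cost: scan the outer relation, then probe the
    inner relation's index once per outer tuple."""
    return outer_pages + round(outer_tuples * (probe_io + fetch_io))

# Hash index on Buyers.id, Bids as outer: 1.2 I/Os per probe + 1 I/O fetch.
print(inl_cost(1000, 100_000, 1.2, 1.0))   # 221000
# Hash index on Bids.bid, Buyers as outer, clustered: ~1 I/O per fetch.
print(inl_cost(500, 40_000, 1.2, 1.0))     # 88500
# Unclustered: ~2.5 I/Os to fetch the ~2.5 matching Bids tuples.
print(inl_cost(500, 40_000, 1.2, 2.5))     # 148500
```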

  • Sort-Merge Join (R ⋈ S). Sort R and S on the join column, then scan them to do a merge (on the join column) and output result tuples. Advance the scan of R until the current R tuple >= the current S tuple, then advance the scan of S until the current S tuple >= the current R tuple; repeat until the current R tuple = the current S tuple. At this point, all R tuples with the same value in Ri (current R group) and all S tuples with the same value in Sj (current S group) match; output all pairs of such tuples. Then resume scanning R and S. R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group are likely to find the needed pages in the buffer.)

  • Sort-Merge Join

    sort Bids R on bid
    sort Buyers S on id
    while !eof(R) and !eof(S) do
      scan R and S concurrently until tR.bid = tS.id = c
      output σ_bid=c(R) × σ_id=c(S)
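The merge step can be sketched in Python over in-memory lists; note how a matching S group is paired with every R tuple in the corresponding R group:

```python
def sort_merge_join(r, s, r_key, s_key):
    """Sort both inputs on the join column, then merge, emitting the
    cross product of each pair of matching groups."""
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda t: t[s_key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][r_key] < s[j][s_key]:
            i += 1
        elif r[i][r_key] > s[j][s_key]:
            j += 1
        else:
            # Matching groups: pair this R tuple with the whole S group.
            k = j
            while k < len(s) and s[k][s_key] == r[i][r_key]:
                out.append((r[i], s[k]))
                k += 1
            i += 1      # the next R tuple may rejoin the same S group
    return out

bids = [{"bid": 2}, {"bid": 1}, {"bid": 2}]
buyers = [{"id": 2, "name": "Ann"}, {"id": 1, "name": "Bob"}]
print(sort_merge_join(bids, buyers, "bid", "id"))
```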

  • Sort-Merge Join. Cost of the sorts, assuming B buffers: 2M*log_{B-1}M + 2N*log_{B-1}N. The cost of the merge depends on whether the matching groups fit in the buffers; if yes, the merge step takes M + N. Total cost: 2M*log_{B-1}M + 2N*log_{B-1}N + (M + N). Let B-1 = 100; then the cost is 2*1000*1.5 + 2*500*1.35 + 1000 + 500 = 5850 I/Os!
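Plugging the slide's numbers into the formula (B-1 = 100 buffer pages; the function name is ours):

```python
import math

def sort_merge_cost(m, n, b):
    """Two I/Os (read + write) per page per sorting pass, where the number
    of passes is about log base (b-1) of the file size, plus one merge pass."""
    sort_cost = 2 * m * math.log(m, b - 1) + 2 * n * math.log(n, b - 1)
    return sort_cost + (m + n)

print(round(sort_merge_cost(1000, 500, 101)))   # close to the slide's 5850
```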

  • Hash Join. Step 1: hash R on bid and S on id into the same set of buckets. Step 2: since matching tuples must be in the same bucket, read each bucket in turn and output the result of the join. Cost (assuming each bucket fits in memory): 3(M + N), not including the final output cost.

  • Hash Join. Partition both relations using a hash function h: R tuples in partition i will only match S tuples in partition i. Read in a partition of R and hash it using a second hash function h2 (≠ h); then scan the matching partition of S and search for matches.
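A minimal in-memory sketch of the two phases (partition with h, then build-and-probe per partition; a Python dict stands in for the second hash function h2, and the relation contents are invented):

```python
def hash_join(r, s, r_key, s_key, num_partitions=4):
    """Partition both relations with h, then join each partition pair."""
    h = lambda v: hash(v) % num_partitions
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:
        r_parts[h(t[r_key])].append(t)
    for t in s:
        s_parts[h(t[s_key])].append(t)
    out = []
    for rp, sp in zip(r_parts, s_parts):
        # Build an in-memory table on the R partition, probe with S tuples.
        table = {}
        for t in rp:
            table.setdefault(t[r_key], []).append(t)
        for t in sp:
            for match in table.get(t[s_key], []):
                out.append((match, t))
    return out

bids = [{"bid": 1}, {"bid": 2}, {"bid": 1}]
buyers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
print(hash_join(bids, buyers, "bid", "id"))
```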

  • Observations on Hash Join. If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed. If the hash function does not partition uniformly, one or more R partitions may not fit in memory; the hash-join technique can then be applied recursively to join that R partition with the corresponding S partition.

  • Cost of Hash Join. In the partitioning phase, read and write both relations: 2(M+N). In the matching phase, read both: M+N. Cost: 3(M+N) = 3(1000+500) = 4500 I/Os. Sort-Merge Join vs. Hash Join: given a minimum amount of memory, both have a cost of about 3(M+N) I/Os (assuming log_{B-1}M ≈ 1). Hash join is superior if the relation sizes differ greatly, and has been shown to be highly parallelizable. Sort-merge is less sensitive to data skew, and its result is sorted!

  • General Join Conditions. Equalities over several attributes (e.g., R.A = S.A AND R.B = S.B): for index NL, build an index on (A, B) (if S is inner), or use existing indexes on A or B; for sort-merge and hash join, sort/partition on the combination of the two join columns. Inequality conditions (e.g., R.B < S.B): for index NL, a (clustered!) B+ tree index is needed; range probes on the inner mean the number of matches is likely much higher than for equality joins. Hash join is not applicable; sort-merge join can be used but is more expensive; block NL is quite likely the best join method here.

  • Comparison. Hash join seems best, but is sensitive to data skew. Sort-merge join has good asymptotic behavior (i.e., as relation size increases). Sometimes block nested loops join can be the best (depends on relation parameters). The choice may also be affected by indexes (e.g., a clustered B+ tree on one relation favors sort-merge join).

  • Set Operations. Intersection and cross-product are special cases of join. Union (distinct) and except are similar; we'll do union. Sorting-based approach to union: sort both relations (on the combination of all attributes), then scan the sorted relations and merge them. Alternative: merge runs from Pass 0 for both relations. Hash-based approach to union: partition R and S using a hash function h; for each S partition, build an in-memory hash table (using h2), then scan the corresponding R partition and add tuples to the table while discarding duplicates.
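The hash-based union can be sketched with rows as tuples; a Python set plays the role of the in-memory hash table that discards duplicates (the relation contents are invented):

```python
def hash_union(r, s):
    """Union with duplicate elimination: keep each distinct row once.
    Hashing whole rows in a set stands in for partition + in-memory table."""
    seen, out = set(), []
    for row in list(r) + list(s):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(hash_union([("a", 1), ("b", 2)], [("b", 2), ("c", 3)]))
# [('a', 1), ('b', 2), ('c', 3)]
```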

  • Aggregate Operations (AVG, MIN, etc.). Without grouping: in general, requires scanning the relation; given an index whose search key includes all attributes in the SELECT or WHERE clauses, an index-only scan can be done. With grouping: sort on the group-by attributes, then scan the relation and compute the aggregate for each group (this can be improved by combining sorting and aggregate computation); a similar approach is based on hashing on the group-by attributes. Given a tree index whose search key includes all attributes in the SELECT, WHERE and GROUP BY clauses, an index-only scan can be done; if the group-by attributes form a prefix of the search key, data entries/tuples can be retrieved in group-by order.
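The hashing approach to grouped aggregation, sketched here for AVG (the sample schema values are invented):

```python
def hash_group_avg(rows, group_attr, agg_attr):
    """GROUP BY via hashing: accumulate (sum, count) per group in one
    scan, then finish the AVG from the accumulators."""
    acc = {}
    for row in rows:
        s, c = acc.get(row[group_attr], (0, 0))
        acc[row[group_attr]] = (s + row[agg_attr], c + 1)
    return {g: s / c for g, (s, c) in acc.items()}

buyers = [{"rating": 5, "age": 30.0},
          {"rating": 5, "age": 40.0},
          {"rating": 3, "age": 20.0}]
print(hash_group_avg(buyers, "rating", "age"))   # {5: 35.0, 3: 20.0}
```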

  • Impact of BufferingIf several operations are executing concurrently, estimating the number of available buffer pages is guesswork.Repeated access patterns interact with buffer replacement policy.e.g., Inner relation is scanned repeatedly in Simple Nested Loop Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, MRU is best, LRU is worst (sequential flooding).Does replacement policy matter for Block Nested Loops?What about Index Nested Loops? Sort-Merge Join?

  • SummaryA virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).Many alternative implementation techniques for each operator; no universally superior technique for most operators. Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.

  • Query Optimization

  • Query Evaluation. Problem: an SQL query is declarative; it does not specify a query execution plan. A relational algebra expression is procedural; there is an associated query execution plan. Solution: convert the SQL query to an equivalent relational algebra expression and evaluate it using the associated query execution plan. But which equivalent expression is best?

  • Naive Conversion

    SELECT DISTINCT TargetList
    FROM R1, R2, ..., RN
    WHERE Condition

    is equivalent to

    π_TargetList (σ_Condition (R1 × R2 × ... × RN))

    but this may imply a very inefficient query execution plan.

    Example: π_Name (σ_Id=ProfId ∧ CrsCode='CS532' (Professor × Teaching)). The result can be < 100 bytes, but if each relation is 50K then we end up computing an intermediate result Professor × Teaching of size 1G before shrinking it down to just a few bytes.

    Problem: Find an equivalent relational algebra expression that can be evaluated efficiently.

  • Query Processing Architecture

  • Query Optimizer. Uses heuristic algorithms to evaluate relational algebra expressions. This involves: estimating the cost of a relational algebra expression; transforming one relational algebra expression into an equivalent one; choosing access paths for evaluating the subexpressions. Query optimizers do not truly optimize; they just try to find reasonably good evaluation strategies. The optimal plan can be very hard to find!

  • Highlights of the System R Optimizer. Impact: most widely used currently; works well for < 10 joins. Cost estimation: approximate art at best; statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes; considers a combination of CPU and I/O costs. Plan space: too large, must be pruned; only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products are avoided.

  • Overview of Query Optimization. Plan: a tree of relational algebra operators, with a choice of algorithm for each operator. Each operator is typically implemented using a "pull" interface: when an operator is "pulled" for the next output tuple, it "pulls" on its inputs and computes them. Two main issues: for a given query, which plans are considered (the algorithm that searches the plan space for the cheapest estimated plan), and how is the cost of a plan estimated? Ideally, we want to find the best plan; practically, we want to avoid the worst plans!

  • Equivalence-Preserving Transformations. To transform a relational expression into another equivalent expression we need transformation rules that preserve equivalence. Each transformation rule is provably correct (i.e., it does preserve equivalence) and has a heuristic associated with it.

  • Selection and Projection Rules. Break a complex selection into simpler ones: σ_{Cond1 ∧ Cond2}(R) ≡ σ_Cond1(σ_Cond2(R)). Break a projection into stages: π_attr(R) ≡ π_attr(π_attr′(R)), if attr ⊆ attr′. Commute projection and selection: π_attr(σ_Cond(R)) ≡ σ_Cond(π_attr(R)), if attr ⊇ all attributes in Cond.

  • Commutativity and Associativity of Join (and Cartesian Product as a Special Case). Join commutativity: R ⋈ S ≡ S ⋈ R; used to reduce the cost of nested loops evaluation strategies (the smaller relation should be in the outer loop). Join associativity: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T; used to reduce the size of intermediate relations in the computation of a multi-relational join: first compute the join that yields the smaller intermediate result. An N-way join has T(N) * N! different evaluation plans, where T(N) is the number of parenthesized expressions and N! is the number of permutations. The query optimizer cannot look at all plans (it might take longer to find an optimal plan than to compute the query brute-force); hence it does not necessarily produce an optimal plan.
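T(N), the number of parenthesizations, is the (N-1)-th Catalan number, so the plan count grows explosively with N; a quick check (the function name is ours):

```python
import math

def num_join_orders(n):
    """Evaluation plans for an n-way join: T(n) parenthesizations
    (the (n-1)-th Catalan number) times n! orderings of the relations."""
    catalan = math.comb(2 * (n - 1), n - 1) // n
    return catalan * math.factorial(n)

for n in (2, 3, 4, 10):
    print(n, num_join_orders(n))
```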

  • Pushing Selections and Projections. σ_Cond(R × S) ≡ R ⋈_Cond S, if Cond relates attributes of both R and S; reduces the size of the intermediate relation since rows can be discarded sooner. σ_Cond(R × S) ≡ σ_Cond(R) × S, if Cond involves only the attributes of R; reduces the size of the intermediate relation since rows of R are discarded sooner. π_attr(R × S) ≡ π_attr(π_attr′(R) × S), if attr′ = attr ∩ attributes(R); reduces the size of an operand of the product.

  • Equivalence Example

    σ_{C1 ∧ C2 ∧ C3}(R × S) ≡ σ_C1(σ_C2(σ_C3(R × S)))
                            ≡ σ_C1(σ_C2(R) × σ_C3(S))
                            ≡ σ_C2(R) ⋈_C1 σ_C3(S)

    assuming C2 involves only attributes of R, C3 involves only attributes of S, and C1 relates attributes of R and S.