Transcript

- Slide 1
- Query processing and optimization
- Slide 2
- Advanced DatabasesQuery processing and optimization2 Definitions Query processing translation of query into low-level activities evaluation of query data extraction Query optimization selecting the most efficient query evaluation
- Slide 3
- Advanced DatabasesQuery processing and optimization3 Query Processing (1/2) SELECT * FROM student WHERE name=Paul Parse query and translate check syntax, verify names, etc translate into relational algebra (RDBMS) create evaluation plans Find best plan (optimization) Execute plan student cidname 00112233Paul 00112238Rob 00112235Matt takes cidcourseid 00112233312 00112233395 00112235312 course courseidcoursename 312Advanced DBs 395Machine Learning
- Slide 4
- Advanced DatabasesQuery processing and optimization4 Query Processing (2/2) query parser and translator relational algebra expression optimizer evaluation plan evaluation engine output data statistics
- Slide 5
- Advanced DatabasesQuery processing and optimization5 Relational Algebra (1/2) Query language Operations: select: project: union: difference: - product: x join:
- Slide 6
- Advanced DatabasesQuery processing and optimization6 Relational Algebra (2/2) SELECT * FROM student WHERE name=Paul name=Paul (student) name ( cid
- Advanced DatabasesQuery processing and optimization20 External Sort-Merge Algorithm (3/3) Merge stage: merge sorted runs What if N > M ? perform multiple passes each pass merges M-1 runs until relation is processed in next pass number of runs is reduced final pass generated sorted output
- Slide 21
- Advanced DatabasesQuery processing and optimization21 Sort-Merge Example d95 a12 x44 s95 f12 o73 t45 n67 e87 z11 v22 b38 file memory t45 n67 e87 z11 v22 b38d95a12x44a12d95x44 R1R1 f12 o73 s95 R2R2 e87 n67 t45 R3R3 b38 v22 z11 R4R4 a12f a d95d a12d95 x44 s95 f12 o73 run pass v22 t45 s95 z11 x44 o73 a12 b38 n67 f12 d95 e87
- Slide 22
- Advanced DatabasesQuery processing and optimization22 Sort-Merge cost B R the number of pages of R Sort stage: 2 * B R read/write relation Merge stage: initially runs to be merged each pass M-1 runs sorted thus, total number of passes: at each pass 2 * B R pages are read read/write relation apart from final write Total cost: 2 * B R + 2 * B R * - B R
- Slide 23
- Advanced DatabasesQuery processing and optimization23 Projection 1,2 (R) remove unwanted attributes scan and drop attributes remove duplicate records sort resulting records using all attributes as sort order scan sorted result, eliminate duplicates (adjucent) cost initial scan + sorting + final scan
- Slide 24
- Advanced DatabasesQuery processing and optimization24 Join name ( coursename=Advanced DBs ((student cid takes) courseid course) ) implementations nested loop join block-nested loop join indexed nested loop join sort-merge join hash join
- Slide 25
- Advanced DatabasesQuery processing and optimization25 Nested loop join (1/2) R S for each tuple t R of R for each t S of S if (t R t S match) output t R.t S end Works for any join condition S inner relation R outer relation
- Slide 26
- Advanced DatabasesQuery processing and optimization26 Nested loop join (2/2) Costs: best case when smaller relation fits in memory use it as inner relation B R +B S worst case when memory holds one page of each relation S scanned for each tuple in R N R * B s + B R
- Slide 27
- Advanced DatabasesQuery processing and optimization27 Block nested loop join (1/2) for each page X R of R foreach page X S of S for each tuple t R in X R for each t S in X S if (t R t S match) output t R.t S end
- Slide 28
- Advanced DatabasesQuery processing and optimization28 Block nested loop join (2/2) Costs: best case when smaller relation fits in memory use it as inner relation B R +B S worst case when memory holds one page of each relation S scanned for each page in R B R * B s + B R
- Slide 29
- Advanced DatabasesQuery processing and optimization29 Indexed nested loop join R S Index on inner relation (S) for each tuple in outer relation (R) probe index of inner relation Costs: B R + N R * c c the cost of index-based selection of inner relation relation with fewer records as outer relation
- Slide 30
- Advanced DatabasesQuery processing and optimization30 Sort-merge join R S Relations sorted on the join attribute Merge sorted relations pointers to first record in each relation read in a group of records of S with the same values in the join attribute read records of R and process Relations in sorted order to be read once Cost: cost of sorting + B S + B R dD eE xX vV e67 e87 n11 v22 z38
- Slide 31
- Advanced DatabasesQuery processing and optimization31 Hash join R S use h 1 on joining attribute to map records to partitions that fit in memory records of R are partitioned into R 0 R n-1 records of S are partitioned into S 0 S n-1 join records in corresponding partitions using a hash-based indexed block nested loop join Cost: 2*(B R +B S ) + (B R +B S ) R R0R0 R1R1 R n-1...... S S0S0 S1S1 S n-1......
- Slide 32
- Advanced DatabasesQuery processing and optimization32 Exercise: joins R S N R =2 15 B R = 100 N S =2 6 B S = 30 B + index on S order 4 full nodes nested loop join: best case - worst case block nested loop join: best case - worst case indexed nested loop join
- Slide 33
- Advanced DatabasesQuery processing and optimization33 Evaluation evaluate multiple operations in a plan materialization pipelining coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course name
- Slide 34
- Advanced DatabasesQuery processing and optimization34 Materialization create and read temporary relations create implies writing to disk more page writes coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course name
- Slide 35
- Advanced DatabasesQuery processing and optimization35 Pipelining (1/2) creating a pipeline of operations reduces number of read-write operations implementations demand-driven - data pull producer-driven - data push coursename=Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course name
- Slide 36
- Advanced DatabasesQuery processing and optimization36 Pipelining (2/2) can pipelining always be used? any algorithm? cost of R S materialization and hash join: B R + 3(B R +B S ) pipelining and indexed nested loop join: N R * HT i coursename=Advanced DBs studenttakes cid courseid course pipelined materialized R S
- Slide 37
- Query Optimization
- Slide 38
- Advanced DatabasesQuery processing and optimization38 Choosing evaluation plans cost based optimization enumeration of plans R S T, 12 possible orders cost estimation of each plan overall cost cannot optimize operation independently
- Slide 39
- Advanced DatabasesQuery processing and optimization39 Cost estimation operation (, , ) implementation size of inputs size of outputs sorting coursename=Advanced DBs studenttakes cid; hash join courseid; index- nested loop course name
- Slide 40
- Advanced DatabasesQuery processing and optimization40 Size Estimation (1/2) SC(A,R) multiplying probabilities probability that a record satisfy none of :
- Slide 41
- Advanced DatabasesQuery processing and optimization41 Size Estimation (2/2) R x S N R * N S R S R S = : N R * N S R S key for R: maximum output size is N s R S foreign key for R: N S R S = {A}, neither key of R nor S N R *N S / V(A,S) N S *N R / V(A,R)
- Slide 42
- Advanced DatabasesQuery processing and optimization42 Expression Equivalence conjunctive selection decomposition commutativity of selection combining selection with join and product 1 (R x S) = R 1 S commutativity of joins R 1 S = S 1 R distribution of selection over join 1^2 (R S) = 1 (R) 2 (S) distribution of projection over join A1,A2 (R S) = A1 (R) A2 (S) associativity of joins: R (S T) = (R S) T
- Slide 43
- Advanced DatabasesQuery processing and optimization43 Cost Optimizer (1/2) transforms expressions equivalent expressions heuristics, rules of thumb perform selections early perform projections early replace products followed by selection (R x S) with joins R S start with joins, selections with smallest result create left-deep join trees
- Slide 44
- Advanced DatabasesQuery processing and optimization44 Cost Optimizer (2/2) coursenam = Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course name coursename=Advanced DBs studenttakes cid; hash join ccourseid; index- nested loop course name
- Slide 45
- Advanced DatabasesQuery processing and optimization45 Cost Evaluation Exercise name ( coursename=Advanced DBs ((student cid takes) courseid course) ) R = student cid takes S = course N S = 10 records assume that on average there are 50 students taking each course blocking factor: 2 records/page what is the cost of coursename=Advanced DBs (R courseid S) what is the cost of R coursename=Advanced DBs S assume relations can fit in memory
- Slide 46
- Advanced DatabasesQuery processing and optimization46 Summary Estimating the cost of a single operation Estimating the cost of a query plan Optimization choose the most efficient plan