ADBMS - Typical Query Optimizer
INTRODUCTION
• The DBMS describes the data that it manages, including tables and indexes. This descriptive data, or metadata, stored in special tables called the system catalogs, is used to find the best way to evaluate a query.
• SQL queries are translated into an extended form of relational algebra, and query evaluation plans are represented as trees of relational operators, along with labels that identify the algorithm to use at each node.
• Relational operators serve as building blocks for evaluating queries, and the implementation of these operators is carefully optimized for good performance.
• Queries are composed of several operators, and the algorithms for individual operators can be combined in many ways to evaluate a query. The process of finding a good evaluation plan is called query optimization.
Overview of Query Evaluation
The system catalog
• We can store a table using one of several alternative file structures, and we can create one or more indexes - each stored as a file - on every table.
• In a relational DBMS, every file contains either the tuples in a table or the entries in an index.
• The collection of files corresponding to users' tables and indexes represents the data in the database.
• A relational DBMS maintains information about every table and index that it contains.
• The descriptive information is itself stored in a collection of special tables called the catalog tables.
• The catalog tables are also called the data dictionary, the system catalog, or simply the catalog.
Information in the catalog
At a minimum, we have system-wide information about individual tables, indexes, and views:
• For each table:
  - Its table name, file name, and the file structure of the file in which it is stored
  - The attribute name and type of each of its attributes
  - The index name of each index on the table
  - The integrity constraints on the table
• For each index:
  - The index name and the structure of the index
  - The search key attributes
• For each view:
  - Its view name and definition
In addition statistics about tables and indexes are stored in the system catalogs and updated periodically
The following information is commonly stored
• Cardinality: the number of tuples NTuples(R), for each table R
• Size: the number of pages NPages(R), for each table R
• Index cardinality: the number of distinct key values NKeys(I), for each index I
• Index size: the number of pages INPages(I), for each index I
• Index height: the number of nonleaf levels IHeight(I), for each tree index I
• Index range: the minimum present key value ILow(I) and maximum present key value IHigh(I), for each index I
The database contains the following two tables:
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
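To make these statistics concrete, here is a minimal sketch of catalog entries for the two example tables. The structure and names (`TableStats`, `CATALOG`) are hypothetical, not a real DBMS API; the figures for Reserves (1000 pages, 100,000 tuples) come from the join examples later in this section, and the Sailors tuple count is an assumption.

```python
# Hypothetical sketch of per-table catalog statistics (not a real DBMS API).
from dataclasses import dataclass

@dataclass
class TableStats:
    name: str
    ntuples: int   # NTuples(R): cardinality
    npages: int    # NPages(R): size in pages

CATALOG = {
    "Reserves": TableStats("Reserves", ntuples=100_000, npages=1_000),
    "Sailors":  TableStats("Sailors",  ntuples=40_000,  npages=500),  # tuple count assumed
}

# A derived quantity the optimizer uses repeatedly in its cost estimates:
tuples_per_page = CATALOG["Reserves"].ntuples // CATALOG["Reserves"].npages
```

The later cost examples (e.g., 100 tuples per page of Reserves) follow directly from statistics like these.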
• Several alternative algorithms are available for implementing each relational operator, and for most operators no algorithm is universally superior.
• Several factors influence which algorithm performs best, including the sizes of the tables involved, existing indexes and sort orders, the size of the available buffer pool, and the buffer replacement policy.
Algorithms for evaluating relational operators use some simple ideas extensively:
• Indexing: can use WHERE conditions to retrieve a small set of tuples (selections, joins).
• Iteration: sometimes it is faster to scan all tuples, even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
• Partitioning: by using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
Introduction to Operator Evaluation
Access Paths
• An access path is a method of retrieving tuples from a table.
• File scan, or index that matches a selection condition (in the query):
  - A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
  - A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index. E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5, but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
• A note on complex selections:
  - (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
  - Selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)
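The two matching rules can be sketched in a few lines. This is an illustrative check over attribute sets only (function names are made up, and the prefix rule is simplified to ignore which operators appear within the prefix):

```python
# Illustrative sketch of the index-matching rules above; not a real optimizer.

def tree_index_matches(search_key, sel_attrs):
    """A tree index matches a conjunction whose attributes form a
    nonempty prefix of the index's search key."""
    n = len(sel_attrs)
    return 0 < n <= len(search_key) and set(search_key[:n]) == set(sel_attrs)

def hash_index_matches(search_key, equality_attrs):
    """A hash index matches only if there is an equality term on
    every attribute of its search key."""
    return set(equality_attrs) == set(search_key)

key = ["a", "b", "c"]
# Tree index on <a, b, c>:
assert tree_index_matches(key, {"a", "b"})       # a=5 AND b=3 -> matches
assert not tree_index_matches(key, {"b"})        # b=3 alone -> no match
# Hash index on <a, b, c>; a>5 AND b=3 AND c=5 has equality terms on {b, c} only:
assert hash_index_matches(key, {"a", "b", "c"})  # a=5 AND b=3 AND c=5 -> matches
assert not hash_index_matches(key, {"b", "c"})   # no equality term on a -> no match
```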
• Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index.
• Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
• Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must then be checked.
Algorithms for relational operations
1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths.
Given a selection of the form σ_{R.attr op value}(R), if there is no index on R.attr, we have to scan R.
If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.
Using an Index for Selections:
• Cost depends on the number of qualifying tuples and on clustering.
• Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large without clustering).
In the example below, assuming a uniform distribution of names, about 10% of tuples qualify (100 pages, 10,000 tuples). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os.
SELECT * FROM Reserves R WHERE R.rname < 'C'
As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
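The clustered-versus-unclustered gap above can be sketched with a simplified cost model (the function name is hypothetical; Reserves' base figures of 1000 pages and 100,000 tuples come from the join examples later in this section, and the cost of locating the data entries themselves is ignored as "typically small"):

```python
# Simplified cost model for an index-based selection (illustrative only).
import math

def index_selection_cost(npages, ntuples, pct, clustered):
    """Estimated page I/Os to retrieve records matching a selection that
    qualifies `pct` percent of the tuples."""
    if clustered:
        # Matching tuples are packed into roughly pct% of the pages.
        return math.ceil(npages * pct / 100)
    # Without clustering, worst case: one page I/O per matching tuple.
    return math.ceil(ntuples * pct / 100)

# Reserves: 1000 pages, 100,000 tuples; R.rname < 'C' qualifies ~10%.
assert index_selection_cost(1_000, 100_000, 10, clustered=True) == 100
assert index_selection_cost(1_000, 100_000, 10, clustered=False) == 10_000
```

The 100-fold difference between the two cases is what drives the 5% rule of thumb.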
2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates.
SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query.
SELECT DISTINCT R.sid, R.bid FROM Reserves R
• Sorting approach: sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted information while sorting.)
• Hashing approach: hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
• If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort data entries.
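The sorting approach can be shown in miniature: after sorting on <sid, bid>, duplicates are adjacent and fall out in a single scan (the sample rows here are made up):

```python
# Duplicate elimination by sorting: project onto (sid, bid), sort,
# then drop adjacent duplicates in one pass over the sorted data.
reserves = [
    (22, 101), (58, 103), (22, 101), (31, 102), (58, 103), (22, 101),
]

projected = sorted(reserves)          # sort on <sid, bid>
distinct = [projected[0]]
for row in projected[1:]:
    if row != distinct[-1]:           # duplicates are now adjacent
        distinct.append(row)

assert distinct == [(22, 101), (31, 102), (58, 103)]
```

In a real system the sort is external (see the EXTERNAL SORTING section), and the duplicate check is folded into the final merge pass.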
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.
• Consider the join of Reserves and Sailors, with the join condition Reserves.sid=Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: The cost is that of scanning Reserves, plus using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000 I/Os. There are 100 * 1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000 * (1 + 1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
• If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.
Ex: We can sort Reserves and Sailors in two passes each. Reading and writing Reserves in each pass, the sorting cost is 2 * 2 * 1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2 * 2 * 500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
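The two cost estimates above reduce to a few lines of arithmetic (all figures are taken from the examples; the variable names are ours):

```python
# Reproducing the two join cost estimates from the text.
M, per_page = 1_000, 100   # Reserves: pages, tuples per page
N = 500                    # Sailors: pages

# Index nested loops join, Reserves outer, unclustered index on Sailors.sid:
# scan Reserves, then per Reserves tuple ~1.2 I/Os to reach the data entry
# plus 1 I/O to fetch the matching Sailors page.
inl_cost = M + round((M * per_page) * (1 + 1.2))
assert inl_cost == 221_000

# Sort-merge join: both tables sorted in two passes (one read + one write
# of every page per pass), followed by a merging scan of both tables.
smj_cost = 2 * 2 * M + 2 * 2 * N + M + N
assert smj_cost == 7_500
```

Note how strongly the index nested loops cost depends on the size of the outer table; with the smaller Sailors table as the outer, the estimate would change accordingly.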
Introduction to query optimization
• Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
• The space of plans considered by a typical relational query optimizer is best understood in terms of the query's relational algebra expression. Optimizing such a relational algebra expression involves two basic steps:
Fig(1) Query Parsing, Optimization, and Execution: a query is fed to the Query Parser; the parsed query goes to the Query Optimizer (comprising a Plan Generator and a Plan Cost Estimator, both consulting the Catalog Manager), which produces a query evaluation plan; the plan is executed by the Query Plan Evaluator.
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator.
Ex: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
The above query can be expressed in relational algebra as follows:
π_sname(σ_bid=100 ∧ rating>5(Reserves ⋈_sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
π_sname
   |
σ_bid=100 ∧ rating>5
   |
⋈_sid=sid
  /      \
Reserves  Sailors

Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query - we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply selections and projections to each tuple in the result of the join as it is produced; the result of the join before the selections and projections is never stored in its entirety. This query evaluation plan is shown in Fig(3).
π_sname (on-the-fly)
   |
σ_bid=100 ∧ rating>5 (on-the-fly)
   |
⋈_sid=sid (simple nested loops)
  /                  \
Reserves (file scan)  Sailors (file scan)

Fig(3) Query Evaluation Plan for Sample Query
Alternative Plans - Motivating Example:
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
• Cost: 500 + 500 * 1000 I/Os.
• By no means the worst plan!
• Misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer.
The RA tree is as shown in fig(2), and the plan is as shown in fig(3).
Alternative Plan 1 (No Indexes), as shown in fig(4):
• Main difference: push selects.
• With 5 buffers, cost of plan:
  1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
  2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  3. Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
  4. Total: 3560 page I/Os.
• If we used block nested loops (BNL) join: join cost = 10 + 4*250, total cost = 2770.
• If we `push' projections, T1 has only sid, and T2 only sid and sname:
  - T1 fits in 3 pages; cost of BNL drops to under 250 pages, total < 2000.
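The arithmetic behind the 3560 and 2770 figures can be checked directly (the sorting costs 2*2*10 and 2*3*250 are taken as given in the plan above; variable names are ours):

```python
# Reproducing the pushed-selections plan costs from the text.
R, S = 1_000, 500      # pages in Reserves and Sailors
T1, T2 = 10, 250       # temp sizes after pushing bid=100 and rating>5

scan_and_write = (R + T1) + (S + T2)               # 1010 + 750 = 1760
sort_merge = 2 * 2 * T1 + 2 * 3 * T2 + (T1 + T2)   # 40 + 1500 + 260 = 1800
assert scan_and_write + sort_merge == 3_560

# Variant: block nested loops join of T1 and T2 instead of sort-merge.
bnl_join = T1 + 4 * T2                             # 10 + 1000 = 1010
assert scan_and_write + bnl_join == 2_770
```

Pushing the selections shrinks the join inputs so much that even the cheap BNL join dominates only mildly; pushing projections as well shrinks T1 and T2 further, giving the sub-2000 total quoted above.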
π_sname (on-the-fly)
   |
⋈_sid=sid (sort-merge join)
  /                                  \
σ_bid=100 (scan, write to temp T1)    σ_rating>5 (scan, write to temp T2)
   |                                     |
Reserves (file scan)                  Sailors (file scan)

Fig(4) A second query evaluation plan

Alternative Plan 2 (With Indexes), as shown in fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages.
• INL with pipelining (outer is not materialized).
  - Projecting out unnecessary fields from the outer doesn't help.
• Join column sid is a key for Sailors.
  - At most one matching tuple, so an unclustered index on sid is OK.
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, must get the matching Sailors tuple (1000 * 1.2); total 1210 I/Os.
π_sname (on-the-fly)
   |
σ_rating>5 (on-the-fly)
   |
⋈_sid=sid (index nested loops, with pipelining)
  /                                         \
σ_bid=100 (use hash index;                   Sailors (hash index on sid)
 do not write result to temp)
   |
Reserves

Fig(5) A query evaluation plan using indexes
Highlights of the System R Optimizer
• Impact:
  - Most widely used currently; works well for < 10 joins.
• Cost estimation: approximate art at best.
  - Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes.
  - Considers a combination of CPU and I/O costs.
• Plan space: too large, must be pruned.
  - Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
  - Cartesian products are avoided.

Cost Estimation
For each plan considered, we must estimate its cost, i.e., the cost of each operation in the plan tree.
• Depends on input cardinalities.
• We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
• Must also estimate the size of the result for each operation in the tree.
• Use information about the input relations.
• For selections and joins, assume independence of predicates.
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• There are two parts to optimizing a query:
  - Consider a set of alternative plans. The search space must be pruned; typically, only left-deep plans are considered.
  - Estimate the cost of each plan that is considered. This requires estimating the size of the result and the cost for each plan node.
  - Key issues: statistics, indexes, operator implementations.
EXTERNAL SORTING
Why Sort?
• A classic problem in computer science.
• Data requested in sorted order - e.g., find students in increasing gpa order.
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records. (Why?)
• The sort-merge join algorithm involves sorting.
• Problem: sort 1 GB of data with 1 MB of RAM. Why not virtual memory?
When does a DBMS sort data? Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order - for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly larger datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps. Here, we refer to each subfile as a run.
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort, or any other in-memory sorting technique, can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
  // Given a file on disk, sorts it using three buffer pages.
  // Pass 0: produce runs that are one page long.
  Read each page into memory, sort it, write it out.
  // Merge pairs of runs to produce longer runs until only
  // one run (containing all records of the input file) is left.
  While the number of runs at end of previous pass is > 1:
    // Pass i = 1, 2, ...
    While there are runs to be merged from previous pass:
      Choose next two runs (from previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc

Fig(1) Two-Way Merge Sort
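The pseudocode in Fig(1) can be made runnable by simulating pages as small lists; only the run-merging structure is modeled, not the page-at-a-time buffering (which this sketch elides):

```python
# A runnable sketch of two-way external merge sort over simulated pages.

def merge(a, b):
    """Standard two-way merge of two sorted runs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def two_way_extsort(pages):
    """pages: list of unsorted pages. Returns (number of passes, sorted run)."""
    # Pass 0: sort each page individually -> one-page runs.
    runs = [sorted(p) for p in pages]
    passes = 1
    # Passes 1, 2, ...: merge pairs of runs until one run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            merged.append(pair[0] if len(pair) == 1 else merge(pair[0], pair[1]))
        runs = merged
        passes += 1
    return passes, runs[0]

# The seven-page example of Fig(2): pages (3,4) (6,2) (9,4) (8,7) (5,6) (3,1) (2).
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
passes, result = two_way_extsort(pages)
assert passes == 4                       # ceil(log2 7) + 1 passes
assert result == sorted(sum(pages, []))
```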
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each,
Pass 1 produces 2^(k-1) sorted runs of two pages each,
Pass 2 produces 2^(k-2) sorted runs of four pages each,
and so on, until
Pass k produces one sorted run of 2^k pages.
In each pass, we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉ + 1) I/Os.
The algorithm is illustrated in fig(2) on an example input file containing seven pages.
[Fig(2) Two-Way Merge Sort of a Seven-Page File: the input pages (3,4) (6,2) (9,4) (8,7) (5,6) (3,1) (2) become seven sorted 1-page runs after Pass 0, four 2-page runs after Pass 1, two 4-page runs after Pass 2, and one sorted 8-page run (1,2,2,3,3,4,4,5,6,6,7,8,9) after Pass 3.]
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis, because 2 * 7 * (⌈log2 7⌉ + 1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log2 8⌉ + 1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
[Fig(3) Two-Way Merge Sort with Three Buffer Pages: two input buffers and one output buffer in main memory, with runs read from and written back to disk.]
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes, while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In pass 0, read in B pages at a time and sort internally, to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in fig(4), using the input from fig(2) and a buffer pool with four pages.
2. In passes i = 1, 2, ..., use B-1 buffer pages for input, and use the remaining page for output; hence you do a (B-1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in fig(5).
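These two modifications give the general cost formula: ⌈N/B⌉ runs after pass 0, then ⌈log_{B-1}⌈N/B⌉⌉ merge passes, with two I/Os per page per pass. A small sketch (function names ours; the 108-page case is an assumed extra example, not from the text):

```python
# Pass count and I/O cost of external merge sort with B buffer pages.
import math

def num_passes(N, B):
    runs = math.ceil(N / B)            # runs produced by pass 0
    if runs == 1:
        return 1
    return 1 + math.ceil(math.log(runs, B - 1))   # plus (B-1)-way merge passes

def total_io(N, B):
    return 2 * N * num_passes(N, B)    # read + write every page, each pass

# Fig(4)'s setting: 7 pages, B = 4 -> 2 runs after pass 0, then one 3-way merge.
assert num_passes(7, 4) == 2
assert total_io(7, 4) == 28

# Assumed larger example: 108 pages, B = 5 -> 22 runs, then three 4-way merge passes.
assert num_passes(108, 5) == 4
```

Compare this with two-way merge sort on the same 7-page file, which needed 4 passes and 56 I/Os: using the extra buffer pages halves the work.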
EXTERNAL MERGE SORT
[Fig(4) External Merge Sort with B Buffer Pages, Pass 0 (B = 4): the seven-page input file from fig(2) is read four pages at a time into the buffer pool and sorted internally, producing a first output run of four pages and a second output run of three pages.]
[Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0: B-1 input buffers (Input 1 through Input B-1) feed a (B-1)-way merge into a single output buffer, with runs streamed between disk and the main memory buffers.]
The first refinement reduces the number of runs produced by pass 0 from N to ⌈N/B⌉, and the second reduces the number of subsequent merging passes.
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool for sorting.
• External merge sort minimizes disk I/O cost:
  - Pass 0: produces sorted runs of size B (# buffer pages).
  - Later passes: merge runs.
  - The # of runs merged at a time depends on B and on block size.
  - Larger block size means less I/O cost per page.
  - Larger block size means a smaller # of runs merged.
  - In practice, the # of passes is rarely more than 2 or 3.
The system catalog
bull We can store a table using one of several alternative file structures and we can create one or more indexes ndash each stored as a file ndash on every table
bull In a relational DBMS every file contains either the tuples in a table or the entries in an index
bull The collection of files corresponding to users tables and indexes represents the data in the database
bull A relational DBMS maintains information about every table and index that it contains
bull The descriptive information is itself stored in a collection of special tables called the catalog tables
bull The catalog tables are also called the data dictionary the system catalog or simply the catalog
Information in the catalog
At a minimum we have system-wide information about individual tables indexes and views
bull For each table - Its table name file name and the file structure of the file in which it is stored - The attribute name and type of each of its attributes - The index name of each index on the table - The integrity constraints on the tablebull For each index - The index name and the structure of the index - The search key attributesbull For each view - Its view name and definition
In addition statistics about tables and indexes are stored in the system catalogs and updated periodically
The following information is commonly stored
bullCardiality of tuples NTuples(R) for each table RbullSize of pages NPages(R) for each table RbullIndex Cardinality of distinct key values NKeys(I) for each index Ibull Index Size of pages INPages(I) for each index Ibull Index Height of nonleap levels IHeight(I) for each tree indiex IbullIndex Range The minimum present key value ILow(I) and maximum present key value IHigh(I) for each index I
The database contains the two tables areSailors (sid integer sname string rating integer age real)Reserves (sid integer bid integer day dates rname string)
ContdContd
bullSeveral alternative algorithms are available for implementing each relational operator and for most operators no algorithm is universally superior
bullSeveral factors influence which algorithm performs best including the sizes of the tables involved existing indexes and sort orders the size of the available buffer pool and the buffer replacement policy
Algorithms for evaluating relational operators use some simple ideas extensivelybull Indexing Can use WHERE conditions to retrieve small set of tuples (selections joins)bull Iteration Sometimes faster to scan all tuples even if there is an index (And sometimes we can scan the data entries in an index instead of the table itself) Partitioning By using sorting or hashing we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs
Introduction to operator Introduction to operator evaluationevaluation
Access Paths
bullAn access path is a method of retrieving tuples from a table
bull File scan or index that matches a selection condition(in the query) -A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key Eg Tree index on lta b cgt matches the selection a=5 AND b=3 and a=5 AND bgt6 but not b=3
-A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index Eg Hash index on lta b cgt matches a=5 AND b=3 AND c=5 but it does not match b=3 or a=5 AND b=3 or agt5 AND b=3 AND c=5
bullA Note on Complex Selections -(daylt8994 AND rname=lsquoPaulrsquo) OR bid=5 OR sid=3) -Selection conditions are first converted to conjunctive normal form (CNF) (daylt8994 OR bid=5 OR sid=3 ) AND (rname=lsquoPaulrsquo OR bid=5 OR sid=3)
bull Find the most selective access path retrieve tuples using it and apply any remaining terms that donrsquot match the index
bull Most selective access path An index or file scan that we estimate will require the fewest page IOs
bull Terms that match this index reduce the number of tuples retrieved other terms are used to discard some retrieved tuples but do not affect number of tuplespages fetched
Consider daylt8994 AND bid=5 AND sid=3 A B+ tree index on day can be used then bid=5 and sid=3 must be checked for each retrieved tuple Similarly a hash index on ltbid sidgt could be used daylt8994 must then be checked
ContdContd
Algorithms for relational operations
1 Selection The selection operation is a simple retrieval of tuples from a table and its implementation is essentially covered in our discussion of access paths
Given a selection of the form Rattr op value(R) Rattr op value(R) if there is no index if there is no index on R attr we have to scan Ron R attr we have to scan R
If one or more indexes on R match the selection we can If one or more indexes on R match the selection we can use index to retrieve matching tuples and apply any use index to retrieve matching tuples and apply any matching selection conditions to further restrict the matching selection conditions to further restrict the result setresult set
Using an Index for Selectionsbull Cost depends on qualifying tuples and clusteringbull Cost of finding qualifying data entries (typically
small)plus cost of retrieving records (could be large wo clustering)
In example assuming uniform distribution of namesabout 10 of tuples qualify (100 pages 10000 tuples) With a clustered index cost is little more than 100 IOs if unclustered upto 10000 IOs
SELECT FROM Reserves R WHERE Rrname lt lsquoCrsquo As a rule thumb it is probably cheaper to simply scan As a rule thumb it is probably cheaper to simply scan
the entire table(instead of using an unclustered index) if the entire table(instead of using an unclustered index) if over 5 of the tuples are to be retrievedover 5 of the tuples are to be retrieved
ContdContd2 Projection The projection operation requires us to drop certain fields
of the input which is easy to do The expensive part is removing duplicates
SQL systems donrsquot remove duplicates unless the keyword DISTINCT
is specified in a query SELECT DISTINCT Rsid Rbid FROM Reserves R
bull Sorting Approach Sort on ltsid bidgt and remove duplicates (Can optimize this by dropping unwanted information while sorting)
bull Hashing Approach Hash on ltsid bidgt to create partitions Load partitions into memory one at a time build in-memory hash structure and eliminate duplicates
bull If there is an index with both Rsid and Rbid in the search key may be cheaper to sort data entries
ContdContd3 Join Joins are expensive operations and very common This systems
typically support several algorithms to carry out joins
bull Consider the join of Reserves and Sailors with the join condition Reservessid=Sailorssid Suppose one of the tables say Sailors has an index on the sid column We can scan Reserves and for each tuple use the index to probe Sailors for matching tuples This approach is called index nested loops join
Ex The cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple The cost of scanning Reserves is 1000 There are 1001000 =100000 tuples in Reserves For each of these tuples retrieving the index page containing the rid of the matching Sailors tuple costs 12 IOs(on avg) in addition we have to retrieve the Sailors page containing the qualifying tuple Therefore we have 100000(1+12) IOs to retrieve matching Sailors tuples The total cost is 221000 IOs
bull If we do not have an index that matches the join condition on either table we cannot use index nested loops In this case we can sort both tables on the join column and then scan them to find matches This is called sort-merge join
Ex We can sort Reserves and Sailors in two passes Read and Write Reserves in each pass the sorting cost is 221000=4000 IOs Similarly we can sort Sailors at a cost of 22500=2000 IOs In addition the second phase of the sort-merge join algorithm requires an additional scan of both tables Thus the total cost is 4000+2000+1000+500=7500 IOs
Introduction to query optimization
bullQuery optimization is one of the most important tasks of a relational DBMS A more detailed view of the query optimization and execution layer in the DBMS architecture is as shown in Fig(1)Queries are parsed and then presented to query optimizer which is responsible for identifying an efficient execution plan The optimizer generates alternative plan and chooses the plan with the least estimated cost
bullThe space of plans considered by a typical relational query optimizer can be carried out on the result of the - - -- algebra expression Optimizing such a relational algebra expression involves two basic steps
Query Parser
Query OptimizerPlan
Generator
Plan cost Estimator
Parsed query
Query
Evaluation Plan
Query Plan Evaluator
Catalog Manager
Fig(1) Query Parsing Optimization amp Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans A query evaluation plan consists of an extended relational algebra tree with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator
Ex select ssname from Reserves r Sailors s where rsid=ssid and rbid=100 and sratinggt5
The above query can be expressed in relational algebra as follows
Sname( bid=100 ^ ratinggt5(Reserves sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
Sname
bid=100 ^ ratinggt5
sid=sid
Reserves Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
π_sname (on-the-fly)
   |
σ_bid=100 ∧ rating>5 (on-the-fly)
   |
⋈_sid=sid (simple nested loops)
  /            \
Reserves        Sailors
(file scan)     (file scan)
Fig(3) Query Evaluation Plan for Sample Query
Alternative Plans: Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
• Cost: 500 + 500*1000 I/Os.
• By no means the worst plan!
• Misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer. The RA tree is as shown in Fig(2) and the plan is as shown in Fig(3).
Alternative Plan 1 (No Indexes), as shown in Fig(4). Main difference: push selects.
With 5 buffers, the cost of the plan is:
1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution).
2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
3. Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
4. Total: 3560 page I/Os.
• If we used BNL join: join cost = 10 + 4*250, total cost = 2770.
• If we `push' projections: T1 has only sid, T2 has only sid and sname. T1 fits in 3 pages; the cost of BNL drops to under 250 pages, total < 2000.
π_sname (on-the-fly)
   |
⋈_sid=sid (sort-merge join)
  /                          \
σ_bid=100                     σ_rating>5
(scan, write to temp T1)      (scan, write to temp T2)
   |                             |
Reserves (file scan)          Sailors (file scan)
Fig(4) A Second Query Evaluation Plan
Alternative Plan 2 (With Indexes), as shown in Fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
• INL with pipelining (outer is not materialized). Projecting out unnecessary fields from the outer doesn't help.
• Join column sid is a key for Sailors: at most one matching tuple, so an unclustered index on sid is OK.
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.
π_sname (on-the-fly)
   |
σ_rating>5 (on-the-fly)
   |
⋈_sid=sid (index nested loops, with pipelining)
  /                                  \
σ_bid=100                             Sailors
(use hash index;                      (hash index on sid)
 do not write result to temp)
   |
Reserves
Fig(5) A Query Evaluation Plan Using Indexes
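The I/O figures quoted for these three plans can be reproduced with simple arithmetic, using only the numbers stated in the notes (Reserves = 1000 pages, Sailors = 500 pages). A minimal sketch:

```python
# Reproducing the plan costs quoted in the notes (page I/Os).

# Plan of Fig(3): simple nested loops, no pushed selections.
plan_simple_nl = 500 + 500 * 1000            # 500,500 I/Os

# Plan of Fig(4): push selects, then sort-merge join the temps.
plan_pushed = (1000 + 10        # scan Reserves, write temp T1 (10 pages)
               + 500 + 250      # scan Sailors, write temp T2 (250 pages)
               + 2 * 2 * 10     # sort T1 (two passes)
               + 2 * 3 * 250    # sort T2 (three passes)
               + 10 + 250)      # merge T1 and T2

# Plan of Fig(5): clustered index on bid, index nested loops on sid.
plan_indexed = 10 + 1000 * 1.2  # 10 I/Os for the selection, 1.2 I/Os per probe

print(plan_simple_nl, plan_pushed, plan_indexed)  # 500500 3560 1210.0
```

The comparison makes the optimizer's job concrete: the same answer can cost anywhere from about 1210 to over 500,000 I/Os.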
Highlights of the System R Optimizer
• Impact: most widely used currently; works well for < 10 joins.
• Cost estimation: approximate art at best. Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes. Considers a combination of CPU and I/O costs.
• Plan space: too large, must be pruned. Only the space of left-deep plans is considered; left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products are avoided.
Cost Estimation
For each plan considered, we must estimate its cost. Must estimate the cost of each operation in the plan tree:
• Depends on input cardinalities.
• We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
Must also estimate the size of the result for each operation in the tree:
• Use information about the input relations.
• For selections and joins, assume independence of predicates.
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• Must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• Two parts to optimizing a query:
 1. Consider a set of alternative plans. Must prune the search space; typically, only left-deep plans are considered.
 2. Must estimate the cost of each plan that is considered. Must estimate the size of the result and the cost for each plan node. Key issues: statistics, indexes, operator implementations.
EXTERNAL SORTING
Why Sort?
• A classic problem in computer science!
• Data requested in sorted order, e.g., find students in increasing gpa order.
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records. (Why?)
• The sort-merge join algorithm involves sorting.
• Problem: sort 1GB of data with 1MB of RAM. Why not virtual memory?
When does a DBMS sort data? Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order, for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly large datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
// Given a file on disk, sorts it using three buffer pages.
// Pass 0: produce runs that are one page long.
Read each page into memory, sort it, write it out.
// Merge pairs of runs to produce longer runs until only
// one run (containing all records of the input file) is left:
While the number of runs at the end of the previous pass is > 1:
  // Pass i = 1, 2, ...
  While there are runs to be merged from the previous pass:
    Choose the next two runs (from the previous pass);
    Read each run into an input buffer, one page at a time;
    Merge the runs and write to the output buffer;
    force the output buffer to disk one page at a time.
endproc
Fig(1) Two-Way Merge Sort
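A runnable sketch of the algorithm in Fig(1), modeling the file as a list of one-page record lists and counting one read plus one write for every page in every pass (the function and variable names are my own, for illustration):

```python
from heapq import merge

def two_way_extsort(pages):
    """Two-way external merge sort: sort each page (Pass 0), then
    merge pairs of runs until one run remains. Returns the sorted
    records and the total number of page I/Os."""
    runs = [sorted(p) for p in pages]   # Pass 0: one-page runs
    passes = 1
    while len(runs) > 1:                # Pass i: merge pairs of runs
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], 2 * len(pages) * passes  # read + write each page, each pass

# The seven-page example file from Fig(2), two records per page.
data = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
run, io = two_way_extsort(data)
assert run == sorted(sum(data, []))
assert io == 56                          # 2 * 7 * 4, as computed in the text
```

The in-memory lists stand in for disk pages; the I/O count follows the analysis in the text, where every page is read and written once per pass.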
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each,
Pass 1 produces 2^(k-1) sorted runs of two pages each,
Pass 2 produces 2^(k-2) sorted runs of four pages each,
and so on, until
Pass k produces one sorted run of 2^k pages.
In each pass, we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉ + 1) I/Os.
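The formula can be checked directly (a small sketch, with the ceiling written via math.ceil):

```python
import math

def two_way_sort_cost(n):
    """2N(ceil(log2 N) + 1) I/Os for two-way external merge sort of N pages."""
    return 2 * n * (math.ceil(math.log2(n)) + 1)

assert two_way_sort_cost(7) == 56   # four passes over seven pages
assert two_way_sort_cost(8) == 64   # still four passes, one more page per pass
```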
The algorithm is illustrated on an example input file containing seven pages in Fig(2).
Input file (seven pages, two records per page): (3,4) (6,2) (9,4) (8,7) (5,6) (3,1) (2)
Pass 0, 1-page runs: (3,4) (2,6) (4,9) (7,8) (5,6) (1,3) (2)
Pass 1, 2-page runs: (2,3,4,6) (4,7,8,9) (1,3,5,6) (2)
Pass 2, 4-page runs: (2,3,4,4,6,7,8,9) (1,2,3,5,6)
Pass 3, 8-page runs: (1,2,2,3,3,4,4,5,6,6,7,8,9)
(The original figure shades the additional pages that would appear for an eight-page file.)
Fig(2) Two-Way Merge Sort of a Seven-Page File
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis, because 2*7*(⌈log2 7⌉+1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log2 8⌉+1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
[Fig(3): two input buffer pages (Input 1, Input 2) and one output buffer page in main memory; runs are read from disk into the input buffers, merged into the output buffer, and written back to disk.]
Fig(3) Two-Way Merge Sort with Three Buffer Pages
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In Pass 0, read in B pages at a time and sort internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in Fig(4), using the input from Fig(2) and a buffer pool with four pages.
2. In Passes i = 1, 2, ..., use B-1 buffer pages for input and use the remaining page for output; hence you do a (B-1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in Fig(5).
EXTERNAL MERGE SORT
[Fig(4): External Merge Sort with B buffer pages, Pass 0 (B = 4). The seven-page input file (3,4) (6,2) (9,4) (8,7) (5,6) (3,1) (2) is read into the four-page buffer pool B pages at a time and sorted internally: the first four pages produce the 1st output run (2,3,4,4,6,7,8,9), and the remaining three pages produce the 2nd output run (1,2,3,5,6).]
Fig(4) External Merge Sort with B Buffer Pages, Pass 0
[Fig(5): B-1 input buffer pages (Input 1, Input 2, ..., Input B-1) feed a (B-1)-way merge into a single output buffer page; runs are read from disk into the input buffers and the merged output is written back to disk.]
Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0
The first refinement reduces the number of runs produced by Pass 0 to ⌈N/B⌉, compared with N runs for the two-way merge sort; the second refinement reduces the number of passes by merging B-1 runs at a time rather than two.
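Putting the two refinements together, Pass 0 leaves ⌈N/B⌉ runs and each later pass merges B-1 runs at a time, so the number of passes is 1 + ⌈log_(B-1) ⌈N/B⌉⌉ and each pass reads and writes all N pages. A sketch of this standard analysis (the function name is my own):

```python
import math

def external_sort_cost(n_pages, b_buffers):
    """(passes, total I/Os) for external merge sort with B buffer pages."""
    runs = math.ceil(n_pages / b_buffers)       # runs left after Pass 0
    merge_passes = math.ceil(math.log(runs, b_buffers - 1)) if runs > 1 else 0
    passes = 1 + merge_passes
    return passes, 2 * n_pages * passes         # read + write every page, each pass

# Seven-page file with B = 4: Pass 0 makes two runs, one merge pass finishes,
# versus four passes (56 I/Os) for the two-way algorithm.
assert external_sort_cost(7, 4) == (2, 28)
```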
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool for sorting.
• External merge sort minimizes disk I/O cost:
 - Pass 0: produces sorted runs of size B (# buffer pages). Later passes: merge runs.
 - # of runs merged at a time depends on B and on block size.
 - Larger block size means less I/O cost per page.
 - Larger block size means smaller # of runs merged.
 - In practice, # of passes is rarely more than 2 or 3.
Information in the catalog
In addition statistics about tables and indexes are stored in the system catalogs and updated periodically
The following information is commonly stored:
• Cardinality: the number of tuples NTuples(R) for each table R.
• Size: the number of pages NPages(R) for each table R.
• Index cardinality: the number of distinct key values NKeys(I) for each index I.
• Index size: the number of pages INPages(I) for each index I.
• Index height: the number of nonleaf levels IHeight(I) for each tree index I.
• Index range: the minimum present key value ILow(I) and the maximum present key value IHigh(I) for each index I.
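As a sketch, the statistics above could be modeled with simple records (the class and field names here are hypothetical, not from any particular DBMS; the Reserves numbers are the ones used in this chapter's cost examples):

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    ntuples: int   # NTuples(R): number of tuples in R
    npages: int    # NPages(R): number of pages in R

@dataclass
class IndexStats:
    nkeys: int     # NKeys(I): number of distinct key values
    inpages: int   # INPages(I): number of pages in the index
    iheight: int   # IHeight(I): number of nonleaf levels (tree index)
    ilow: int      # ILow(I): minimum key value present
    ihigh: int     # IHigh(I): maximum key value present

# Reserves in the running example: 100,000 tuples on 1000 pages.
reserves = TableStats(ntuples=100_000, npages=1_000)
```

The optimizer's cost estimates are computed from exactly these numbers, which is why they are kept in the catalog and refreshed periodically.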
The database contains the following two tables:
Sailors(sid: integer, sname: string, rating: integer, age: real)
Reserves(sid: integer, bid: integer, day: dates, rname: string)
• Several alternative algorithms are available for implementing each relational operator, and for most operators no algorithm is universally superior.
• Several factors influence which algorithm performs best, including the sizes of the tables involved, existing indexes and sort orders, the size of the available buffer pool, and the buffer replacement policy.
Algorithms for evaluating relational operators use some simple ideas extensively:
• Indexing: can use WHERE conditions to retrieve a small set of tuples (selections, joins).
• Iteration: sometimes faster to scan all tuples even if there is an index. (And sometimes we can scan the data entries in an index instead of the table itself.)
• Partitioning: by using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
Introduction to Operator Evaluation
Access Paths
• An access path is a method of retrieving tuples from a table.
• File scan, or index that matches a selection condition (in the query):
 - A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
 - A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index. E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5, but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
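These matching rules can be sketched as a small check. This is a simplified model (function names and the representation of a selection as a dict of attribute -> (operator, value) terms are my own, for illustration):

```python
def tree_index_matches(search_key, terms):
    """Tree index: the terms must cover exactly a prefix of the search
    key; every prefix attribute except possibly the last must be an
    equality (the last may be a range predicate)."""
    for i in range(len(search_key), 0, -1):
        if set(terms) == set(search_key[:i]):
            return all(terms[a][0] == "=" for a in search_key[:i - 1])
    return False

def hash_index_matches(search_key, terms):
    """Hash index: needs an equality term for every search-key attribute."""
    return all(a in terms and terms[a][0] == "=" for a in search_key)

key = ("a", "b", "c")
assert tree_index_matches(key, {"a": ("=", 5), "b": ("=", 3)})       # a=5 AND b=3
assert tree_index_matches(key, {"a": ("=", 5), "b": (">", 6)})       # a=5 AND b>6
assert not tree_index_matches(key, {"b": ("=", 3)})                  # b=3 alone
assert hash_index_matches(key, {"a": ("=", 5), "b": ("=", 3), "c": ("=", 5)})
assert not hash_index_matches(key, {"a": ("=", 5), "b": ("=", 3)})   # partial key
```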
• A note on complex selections: (day < 8/9/94 AND rname = 'Paul') OR bid = 5 OR sid = 3. Selection conditions are first converted to conjunctive normal form (CNF): (day < 8/9/94 OR bid = 5 OR sid = 3) AND (rname = 'Paul' OR bid = 5 OR sid = 3).
• Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index.
• Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
• Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
Consider day < 8/9/94 AND bid = 5 AND sid = 3. A B+ tree index on day can be used; then bid = 5 and sid = 3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day < 8/9/94 must then be checked.
Algorithms for relational operations
1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths.
Given a selection of the form σ_{R.attr op value}(R), if there is no index on R.attr, we have to scan R.
If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.
Using an Index for Selections
• Cost depends on the number of qualifying tuples and on clustering.
• Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large without clustering).
• In the example below, assuming a uniform distribution of names, about 10% of tuples qualify (100 pages, 10,000 tuples). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os!
SELECT * FROM Reserves R WHERE R.rname < 'C'
• As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
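The clustered/unclustered contrast can be checked with a back-of-the-envelope model (a deliberate simplification that ignores the small cost of traversing the index to the data entries):

```python
def selection_io_cost(selectivity, npages, ntuples, clustered):
    """Rough I/O cost of an index selection: with clustering we fetch
    roughly the qualifying pages; without clustering, potentially one
    page I/O per qualifying tuple."""
    return selectivity * npages if clustered else selectivity * ntuples

# Reserves: 1000 pages, 100,000 tuples; about 10% of tuples qualify.
assert selection_io_cost(0.10, 1000, 100_000, clustered=True) == 100
assert selection_io_cost(0.10, 1000, 100_000, clustered=False) == 10_000
```

The hundred-fold gap between the two cases is what motivates the 5% rule of thumb above.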
2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates.
SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query: SELECT DISTINCT R.sid, R.bid FROM Reserves R
• Sorting approach: sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted information while sorting.)
• Hashing approach: hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
• If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort data entries!
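The hashing approach can be sketched as follows. This is an in-memory toy: a real system writes the partitions to disk and reloads them one at a time, which is what makes the method work when the input exceeds memory:

```python
def distinct(tuples, num_partitions=4):
    """Hashing-based duplicate elimination: partition tuples by hash
    value, then deduplicate each partition independently. Duplicates
    always hash to the same partition, so no cross-partition check
    is needed."""
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t) % num_partitions].append(t)
    result = []
    for part in partitions:      # one partition "in memory" at a time
        seen = set()             # the in-memory hash structure
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

reservations = [(22, 101), (22, 101), (31, 102), (22, 101)]
assert sorted(distinct(reservations)) == [(22, 101), (31, 102)]
```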
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.
• Consider the join of Reserves and Sailors with the join condition Reserves.sid = Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: The cost is that of scanning Reserves plus that of using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000. There are 100*1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000*(1+1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
• If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.
Ex: We can sort Reserves and Sailors in two passes. Reading and writing Reserves in each pass, the sorting cost is 2*2*1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2*2*500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
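Index nested loops join can be sketched like this (a toy model where the "index" on Sailors.sid is just a dict and tables are lists of dicts; all names are illustrative):

```python
def index_nested_loops_join(outer, inner_index, join_col):
    """Scan the outer table; for each tuple, probe the index on the
    inner table's join column and emit the combined tuples."""
    for r in outer:                                   # scan Reserves
        for s in inner_index.get(r[join_col], []):    # probe index on sid
            yield {**r, **s}

sailors = [{"sid": 22, "sname": "dustin", "rating": 7},
           {"sid": 31, "sname": "lubber", "rating": 8}]
sid_index = {}                     # hash "index" on Sailors.sid
for s in sailors:
    sid_index.setdefault(s["sid"], []).append(s)

reserves = [{"sid": 22, "bid": 100}, {"sid": 58, "bid": 103}]
result = list(index_nested_loops_join(reserves, sid_index, "sid"))
assert [t["sname"] for t in result] == ["dustin"]
```

Because sid is a key for Sailors, each probe returns at most one match, which is exactly the property the cost analysis above relies on (1.2 index I/Os plus one page fetch per Reserves tuple).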
Introduction to query optimization
• Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
bullThe space of plans considered by a typical relational query optimizer can be carried out on the result of the - - -- algebra expression Optimizing such a relational algebra expression involves two basic steps
Query Parser
Query OptimizerPlan
Generator
Plan cost Estimator
Parsed query
Query
Evaluation Plan
Query Plan Evaluator
Catalog Manager
Fig(1) Query Parsing Optimization amp Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans A query evaluation plan consists of an extended relational algebra tree with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator
Ex select ssname from Reserves r Sailors s where rsid=ssid and rbid=100 and sratinggt5
The above query can be expressed in relational algebra as follows
Sname( bid=100 ^ ratinggt5(Reserves sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
Sname
bid=100 ^ ratinggt5
sid=sid
Reserves Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
Sname (on-the-fly)
bid=100 ^ ratinggt5 (on-the-fly)
sid=sid (simple nested loops)
(File scan)Reserves Sailors(File scan)
Fig(3) Query Evaluation Plan for Sample Query
ContdContd
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
The following information is commonly stored
bullCardiality of tuples NTuples(R) for each table RbullSize of pages NPages(R) for each table RbullIndex Cardinality of distinct key values NKeys(I) for each index Ibull Index Size of pages INPages(I) for each index Ibull Index Height of nonleap levels IHeight(I) for each tree indiex IbullIndex Range The minimum present key value ILow(I) and maximum present key value IHigh(I) for each index I
The database contains the two tables areSailors (sid integer sname string rating integer age real)Reserves (sid integer bid integer day dates rname string)
ContdContd
bullSeveral alternative algorithms are available for implementing each relational operator and for most operators no algorithm is universally superior
bullSeveral factors influence which algorithm performs best including the sizes of the tables involved existing indexes and sort orders the size of the available buffer pool and the buffer replacement policy
Algorithms for evaluating relational operators use some simple ideas extensivelybull Indexing Can use WHERE conditions to retrieve small set of tuples (selections joins)bull Iteration Sometimes faster to scan all tuples even if there is an index (And sometimes we can scan the data entries in an index instead of the table itself) Partitioning By using sorting or hashing we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs
Introduction to operator Introduction to operator evaluationevaluation
Access Paths
- An access path is a method of retrieving tuples from a table.
- File scan, or index that matches a selection condition (in the query):
  - A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
  - A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index. E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5, but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
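The two matching rules can be sketched as predicates over the terms of a conjunction. This is a simplified model (terms are (attribute, operator) pairs, and the function names are illustrative, not from any real system):

```python
def tree_index_matches(search_key, terms):
    """A tree index on search_key matches a conjunction whose terms involve
    only attributes forming a prefix of the search key (any comparison op)."""
    attrs = {attr for attr, _ in terms}
    n = len(attrs)
    return n > 0 and set(search_key[:n]) == attrs

def hash_index_matches(search_key, terms):
    """A hash index matches only if there is an equality term on every
    attribute of its search key."""
    eq_attrs = {attr for attr, op in terms if op == "="}
    return set(search_key) <= eq_attrs

key = ("a", "b", "c")
print(tree_index_matches(key, [("a", "="), ("b", ">")]))  # True
print(tree_index_matches(key, [("b", "=")]))              # False
print(hash_index_matches(key, [("a", "="), ("b", "="), ("c", "=")]))  # True
```

The examples mirror the ones in the text: the tree index on <a, b, c> accepts a=5 AND b>6 but rejects b=3; the hash index demands an equality on all of a, b, and c.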
- A Note on Complex Selections: (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3. Selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3).
- Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index.
- Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
- Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must then be checked.
Algorithms for relational operations
1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths.
Given a selection of the form σ_{R.attr op value}(R), if there is no index on R.attr, we have to scan R.
If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.
Using an Index for Selections
- Cost depends on the number of qualifying tuples and on clustering: the cost of finding qualifying data entries (typically small) plus the cost of retrieving the records (could be large without clustering).
- In the example, assuming a uniform distribution of names, about 10% of the tuples qualify (100 pages, 10,000 tuples). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os.

SELECT * FROM Reserves R WHERE R.rname < 'C'

- As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
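The rule of thumb follows from simple arithmetic. A rough sketch of the estimate, ignoring the typically small cost of finding the data entries (the function name and interface are illustrative):

```python
def selection_cost(npages, ntuples, selectivity, clustered):
    """Rough I/O estimate for an index-based selection.

    Clustered: qualifying tuples sit on contiguous pages, so cost is about
    selectivity * NPages. Unclustered: in the worst case, one I/O per
    qualifying tuple."""
    if clustered:
        return max(1, int(npages * selectivity))
    return int(ntuples * selectivity)

# Running example: Reserves has 1000 pages and 100,000 tuples; 10% qualify.
full_scan = 1000
print(selection_cost(1000, 100_000, 0.10, clustered=True))   # about 100
print(selection_cost(1000, 100_000, 0.10, clustered=False))  # up to 10000
```

At 10% selectivity the unclustered estimate (10,000 I/Os) is already ten times worse than scanning all 1000 pages, which is why the break-even point sits so low (around 5%).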
2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates.
SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query:
SELECT DISTINCT R.sid, R.bid FROM Reserves R
- Sorting Approach: Sort on <sid, bid> and remove duplicates. (We can optimize this by dropping unwanted information while sorting.)
- Hashing Approach: Hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
- If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort the data entries.
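The hashing approach can be sketched in a few lines. This is a toy in-memory model: a real system writes the partitions to disk and sizes them so each fits in the buffer pool.

```python
def hash_distinct(tuples, num_partitions=4):
    """Sketch of hash-based duplicate elimination: partition the input by
    hash value, then deduplicate within each (memory-sized) partition.
    Duplicates always land in the same partition, so per-partition
    deduplication is globally correct."""
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t) % num_partitions].append(t)
    out = []
    for p in partitions:
        out.extend(set(p))  # in-memory dedup of one partition at a time
    return out

rows = [(1, 100), (2, 100), (1, 100), (2, 101)]
print(sorted(hash_distinct(rows)))  # [(1, 100), (2, 100), (2, 101)]
```

The key property the sketch demonstrates is that deduplication never needs to compare tuples across partitions, which is what lets the real algorithm work one partition at a time in limited memory.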
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.
- Consider the join of Reserves and Sailors with the join condition Reserves.sid = Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: The cost is that of scanning Reserves plus using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000. There are 100 * 1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000 * (1 + 1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
- If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.
Ex: We can sort Reserves in two passes, reading and writing Reserves in each pass; the sorting cost is 2 * 2 * 1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2 * 2 * 500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
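The two cost calculations above can be checked directly (same assumed figures: Reserves 1000 pages with 100 tuples per page, Sailors 500 pages, 1.2 I/Os on average to find a data entry):

```python
# Reproducing the join-cost arithmetic from the text.
M, pR = 1000, 100  # Reserves: pages, tuples per page
N = 500            # Sailors: pages

# Index nested loops: scan Reserves, then per tuple 1.2 I/Os for the data
# entry plus 1 I/O for the Sailors page holding the matching tuple.
inl = M + (M * pR) * (1 + 1.2)

# Sort-merge: sort each table in two passes (read + write per pass),
# then one merging scan of both tables.
smj = 2 * 2 * M + 2 * 2 * N + M + N

print(round(inl))  # 221000
print(smj)         # 7500
```

The comparison makes the point of the section concrete: with these table sizes, sort-merge join is roughly 30 times cheaper than index nested loops, because the per-tuple probing cost dominates.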
Introduction to query optimization
- Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
- The space of plans considered by a typical relational query optimizer can be understood by recognizing that a query is essentially treated as a σ-π-⋈ algebra expression, with any remaining operations carried out on the result of that expression. Optimizing such a relational algebra expression involves two basic steps:
Query → Query Parser → Parsed query → Query Optimizer (Plan Generator and Plan Cost Estimator, which consults the Catalog Manager) → Query Evaluation Plan → Query Plan Evaluator

Fig(1) Query Parsing, Optimization & Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans
A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access method to use for each table and the implementation method to use for each relational operator.
Ex: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
The above query can be expressed in relational algebra as follows:
π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))
This expression is shown in the form of a tree in Fig(2)
π_sname
σ_{bid=100 ∧ rating>5}
⋈_{sid=sid}
Reserves    Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply the selections and projections to each tuple in the result of the join as it is produced; the result of the join, before the selections and projections, is never stored in its entirety. This query evaluation plan is shown in Fig(3).
π_sname (on-the-fly)
σ_{bid=100 ∧ rating>5} (on-the-fly)
⋈_{sid=sid} (simple nested loops)
(File scan) Reserves    Sailors (File scan)
Fig(3) Query Evaluation Plan for Sample Query
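The pipelined, on-the-fly behavior of this plan can be sketched with the iterator model: tuples flow from the file scans through the join, selection, and projection without ever being materialized. The table contents below are made up for illustration.

```python
# Toy iterator-model sketch of the plan in Fig(3).
reserves = [(1, 100, "9/12"), (2, 100, "9/13"), (2, 101, "9/14")]  # (sid, bid, day)
sailors  = [(1, "Dustin", 7), (2, "Lubber", 4)]                    # (sid, sname, rating)

def nested_loops_join(outer, inner):
    """Simple nested loops join on sid (page-oriented in a real system)."""
    for r in outer:
        for s in inner:
            if r[0] == s[0]:      # join condition: sid = sid
                yield r + s       # (sid, bid, day, sid, sname, rating)

def plan():
    """The full plan: join, then on-the-fly selection and projection."""
    for t in nested_loops_join(reserves, sailors):
        if t[1] == 100 and t[5] > 5:  # bid = 100 AND rating > 5
            yield t[4]                # project sname

print(list(plan()))  # ['Dustin']
```

Because every stage is a generator, no intermediate result is stored: each joined tuple is filtered and projected the moment it is produced, exactly the "on-the-fly" annotation in the figure.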
Alternative Plans: Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
- Cost: 500 + 500 * 1000 I/Os.
- By no means the worst plan!
- Misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
- Goal of optimization: to find more efficient plans that compute the same answer.
The RA tree is as shown in Fig(2), and the plan is as shown in Fig(3).
Alternative Plan 1 (No Indexes), as shown in Fig(4). Main difference: push the selections.
With 5 buffers, the cost of the plan is:
1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution).
2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
3. Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
4. Total: 3560 page I/Os.
- If we used block nested loops (BNL) join: join cost = 10 + 4*250, so the total cost = 2770.
- If we `push' projections: T1 has only sid, T2 has only sid and sname. T1 then fits in 3 pages, the cost of BNL drops to under 250 pages, and the total is < 2000.
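The arithmetic behind these totals, using the assumed figures above (Reserves 1000 pages, Sailors 500 pages, 100 boats, 10 ratings, 5 buffer pages):

```python
# Cost arithmetic for Alternative Plan 1 (selections pushed below the join).
scan_r, t1 = 1000, 10    # scan Reserves; temp T1 = 1000/100 boats = 10 pages
scan_s, t2 = 500, 250    # scan Sailors; rating>5 keeps 5 of 10 ratings = 250 pages

# Sort-merge the temps: sort T1 (2 passes), sort T2 (3 passes), then merge.
sort_merge = 2 * 2 * t1 + 2 * 3 * t2 + (t1 + t2)
total_smj = scan_r + t1 + scan_s + t2 + sort_merge
print(total_smj)  # 3560

# Block nested loops instead: T1 outer in blocks of 4, 4 scans of T2.
bnl = t1 + 4 * t2
total_bnl = scan_r + t1 + scan_s + t2 + bnl
print(total_bnl)  # 2770
```

Either way, pushing the selections shrinks the join inputs so much that the whole plan is two orders of magnitude cheaper than the naive 500 + 500 * 1000 plan.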
π_sname (on-the-fly)
⋈_{sid=sid} (sort-merge join)
σ_{bid=100} (scan, write to temp T1)    σ_{rating>5} (scan, write to temp T2)
(File scan) Reserves    Sailors (File scan)
Fig(4) A Second Query Evaluation Plan

Alternative Plan 2 (With Indexes), as shown in Fig(5):
- With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages.
- INL with pipelining (the outer is not materialized). Projecting out unnecessary fields from the outer doesn't help.
- The join column sid is a key for Sailors: at most one matching tuple, so an unclustered index on sid is OK.
- The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
- Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000 * 1.2); total 1210 I/Os.
π_sname (on-the-fly)
σ_{rating>5} (on-the-fly)
⋈_{sid=sid} (index nested loops, with pipelining)
σ_{bid=100} (use index; do not write result to temp)    Sailors (hash index on sid)
Reserves (clustered index on bid)
Fig(5) A Query Evaluation Plan Using Indexes
Highlights of the System R Optimizer
- Impact: most widely used currently; works well for < 10 joins.
- Cost estimation: approximate art at best. Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes. Considers a combination of CPU and I/O costs.
- Plan space: too large, must be pruned. Only the space of left-deep plans is considered; left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products are avoided.

Cost Estimation
For each plan considered, we must estimate its cost.
- Must estimate the cost of each operation in the plan tree; this depends on input cardinalities. We have already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
- Must also estimate the size of the result for each operation in the tree, using information about the input relations.
- For selections and joins, assume independence of predicates.
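Under the independence assumption, result-size estimation reduces to multiplying the input cardinality by a reduction factor (selectivity) per term. The figures below are illustrative, using the common System R-style heuristic RF = 1/NKeys(I) for an equality term on an indexed column:

```python
def estimate_result_size(ntuples, reduction_factors):
    """System R-style size estimate: input cardinality times the reduction
    factor of each term, assuming the predicates are independent."""
    size = float(ntuples)
    for rf in reduction_factors:
        size *= rf
    return round(size)

# Hypothetical example: Reserves has 100,000 tuples; bid=100 has
# RF = 1/NKeys(bid) = 1/100, and rating>5 is given RF = 1/2.
print(estimate_result_size(100_000, [1 / 100, 1 / 2]))  # 500
```

When the predicates are actually correlated (say, one boat is reserved mostly by high-rated sailors), the product underestimates or overestimates badly; this is a well-known weakness of the independence assumption.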
Summary
- There are several alternative evaluation algorithms for each relational operator.
- A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
- One must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
- There are two parts to optimizing a query:
  1. Consider a set of alternative plans; the search space must be pruned, typically to left-deep plans only.
  2. Estimate the cost of each plan that is considered; this requires estimating the size of the result and the cost for each plan node.
- Key issues: statistics, indexes, operator implementations.
EXTERNAL SORTING
Why Sort?
- A classic problem in computer science.
- Data requested in sorted order, e.g., find students in increasing gpa order.
- Sorting is the first step in bulk loading a B+ tree index.
- Sorting is useful for eliminating duplicate copies in a collection of records. (Why?)
- The sort-merge join algorithm involves sorting.
- Problem: sort 1GB of data with 1MB of RAM. (Why not virtual memory?)
When Does a DBMS Sort Data?
Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
- Users may want answers in some order, for example, by increasing age.
- Sorting records is the first step in bulk loading a tree index.
- Sorting is useful for eliminating duplicate copies in a collection of records.
- A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly large datasets as well. When the data to be sorted is too large to fit into the available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
  // Given a file on disk, sorts it using three buffer pages.
  // Pass 0: produce runs that are one page long.
  Read each page into memory, sort it, write it out.
  // Passes i = 1, 2, ...: merge pairs of runs to produce longer runs,
  // until only one run (containing all records of the input file) is left.
  While the number of runs at the end of the previous pass is > 1:
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc

Fig(1) Two-Way Merge Sort
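The pseudocode translates almost directly into a runnable sketch, with pages modeled as small lists. Note this toy merges whole runs in memory; a real implementation streams each run through a one-page input buffer.

```python
from heapq import merge  # k-way merge of already-sorted iterables

def two_way_extsort(pages):
    """Two-way external merge sort over a list of 'pages' (small lists)."""
    runs = [sorted(p) for p in pages]          # Pass 0: sort each page
    while len(runs) > 1:                       # Passes 1, 2, ...
        next_runs = []
        for i in range(0, len(runs), 2):       # merge pairs of runs
            pair = runs[i:i + 2]               # an odd run passes through
            next_runs.append(list(merge(*pair)))
        runs = next_runs
    return runs[0]

# The seven-page example file from Fig(2).
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
print(two_way_extsort(pages))  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
```

Running it on the seven-page example reproduces the pass structure of Fig(2): 7 one-page runs, then 4, then 2, then 1.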
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each.
Pass 1 produces 2^(k-1) sorted runs of two pages each.
Pass 2 produces 2^(k-2) sorted runs of four pages each.
And so on, until Pass k produces one sorted run of 2^k pages.
In each pass we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉ + 1) I/Os.
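The cost formula can be checked against the seven- and eight-page examples discussed next:

```python
from math import ceil, log2

def two_way_cost(n_pages):
    """Total I/Os for two-way external merge sort:
    2N per pass, with ceil(log2 N) + 1 passes."""
    passes = ceil(log2(n_pages)) + 1
    return 2 * n_pages * passes

print(two_way_cost(7))  # 56
print(two_way_cost(8))  # 64
```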
The algorithm is illustrated in Fig(2) on an example input file containing seven pages.
Input file:            [3,4] [6,2] [9,4] [8,7] [5,6] [3,1] [2]
Pass 0 (1-page runs):  [3,4] [2,6] [4,9] [7,8] [5,6] [1,3] [2]
Pass 1 (2-page runs):  [2,3,4,6] [4,7,8,9] [1,3,5,6] [2]
Pass 2 (4-page runs):  [2,3,4,4,6,7,8,9] [1,2,3,5,6]
Pass 3 (8-page runs):  [1,2,2,3,3,4,4,5,6,6,7,8,9]

Fig(2) Two-Way Merge Sort of a Seven-Page File
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis, because 2 * 7 * (⌈log2 7⌉ + 1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log2 8⌉ + 1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
Disk → [Input 1, Input 2] → [Output] → Disk (main memory holds two input buffers and one output buffer)

Fig(3) Two-Way Merge Sort with Three Buffer Pages
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In Pass 0, read in B pages at a time and sort internally, to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in Fig(4), using the input from Fig(2) and a buffer pool with four pages.
2. In Passes i = 1, 2, ..., use B-1 buffer pages for input and use the remaining page for output; hence, you do a (B-1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in Fig(5).
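With both modifications, the number of passes becomes ⌈log_{B-1}⌈N/B⌉⌉ + 1, and the cost is still 2N I/Os per pass. A quick sketch (the 108-page example is an assumed figure for illustration):

```python
from math import ceil, log

def external_sort_cost(n_pages, b_buffers):
    """Passes and total I/O cost for (B-1)-way external merge sort.

    Pass 0 makes ceil(N/B) sorted runs of B pages each; every later pass
    merges B-1 runs at a time; each pass reads and writes all N pages."""
    runs = ceil(n_pages / b_buffers)
    if runs > 1:
        passes = 1 + ceil(log(runs, b_buffers - 1))
    else:
        passes = 1
    return passes, 2 * n_pages * passes

print(external_sort_cost(7, 4))    # (2, 28): the Fig(4) setting
print(external_sort_cost(108, 5))  # (4, 864): a larger assumed example
```

Compare the seven-page file: the two-way algorithm needs 4 passes (56 I/Os), while with B = 4 buffers the generalized algorithm finishes in 2 passes (28 I/Os).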
EXTERNAL MERGE SORT
Input file:      [3,4] [6,2] [9,4] [8,7] [5,6] [3,1] [2]
1st output run:  [2,3,4,4,6,7,8,9]
2nd output run:  [1,2,3,5,6]
(Buffer pool with B = 4 pages)

Fig(4) External Merge Sort with B Buffer Pages, Pass 0
Disk → [Input 1, Input 2, ..., Input B-1] → [Output] → Disk (B main memory buffers: B-1 input buffers for the runs being merged, and one output buffer)

Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0
The first refinement reduces the number of runs produced by Pass 0 to N1 = ⌈N/B⌉, versus N for the two-way merge sort. The second refinement is even more important: by doing a (B-1)-way merge, the number of passes is reduced dramatically, to ⌈log_{B-1} N1⌉ + 1 including the initial pass.
Summary
- External sorting is important; a DBMS may dedicate part of the buffer pool for sorting.
- External merge sort minimizes disk I/O cost:
  - Pass 0: produces sorted runs of size B (the number of buffer pages).
  - Later passes: merge runs. The number of runs merged at a time depends on B and on the block size.
  - A larger block size means less I/O cost per page, but a smaller number of runs merged at a time.
  - In practice, the number of passes is rarely more than 2 or 3.
bullSeveral alternative algorithms are available for implementing each relational operator and for most operators no algorithm is universally superior
bullSeveral factors influence which algorithm performs best including the sizes of the tables involved existing indexes and sort orders the size of the available buffer pool and the buffer replacement policy
Algorithms for evaluating relational operators use some simple ideas extensivelybull Indexing Can use WHERE conditions to retrieve small set of tuples (selections joins)bull Iteration Sometimes faster to scan all tuples even if there is an index (And sometimes we can scan the data entries in an index instead of the table itself) Partitioning By using sorting or hashing we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs
Introduction to operator Introduction to operator evaluationevaluation
Access Paths
bullAn access path is a method of retrieving tuples from a table
bull File scan or index that matches a selection condition(in the query) -A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key Eg Tree index on lta b cgt matches the selection a=5 AND b=3 and a=5 AND bgt6 but not b=3
-A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index Eg Hash index on lta b cgt matches a=5 AND b=3 AND c=5 but it does not match b=3 or a=5 AND b=3 or agt5 AND b=3 AND c=5
bullA Note on Complex Selections -(daylt8994 AND rname=lsquoPaulrsquo) OR bid=5 OR sid=3) -Selection conditions are first converted to conjunctive normal form (CNF) (daylt8994 OR bid=5 OR sid=3 ) AND (rname=lsquoPaulrsquo OR bid=5 OR sid=3)
bull Find the most selective access path retrieve tuples using it and apply any remaining terms that donrsquot match the index
bull Most selective access path An index or file scan that we estimate will require the fewest page IOs
bull Terms that match this index reduce the number of tuples retrieved other terms are used to discard some retrieved tuples but do not affect number of tuplespages fetched
Consider daylt8994 AND bid=5 AND sid=3 A B+ tree index on day can be used then bid=5 and sid=3 must be checked for each retrieved tuple Similarly a hash index on ltbid sidgt could be used daylt8994 must then be checked
ContdContd
Algorithms for relational operations
1 Selection The selection operation is a simple retrieval of tuples from a table and its implementation is essentially covered in our discussion of access paths
Given a selection of the form Rattr op value(R) Rattr op value(R) if there is no index if there is no index on R attr we have to scan Ron R attr we have to scan R
If one or more indexes on R match the selection we can If one or more indexes on R match the selection we can use index to retrieve matching tuples and apply any use index to retrieve matching tuples and apply any matching selection conditions to further restrict the matching selection conditions to further restrict the result setresult set
Using an Index for Selectionsbull Cost depends on qualifying tuples and clusteringbull Cost of finding qualifying data entries (typically
small)plus cost of retrieving records (could be large wo clustering)
In example assuming uniform distribution of namesabout 10 of tuples qualify (100 pages 10000 tuples) With a clustered index cost is little more than 100 IOs if unclustered upto 10000 IOs
SELECT FROM Reserves R WHERE Rrname lt lsquoCrsquo As a rule thumb it is probably cheaper to simply scan As a rule thumb it is probably cheaper to simply scan
the entire table(instead of using an unclustered index) if the entire table(instead of using an unclustered index) if over 5 of the tuples are to be retrievedover 5 of the tuples are to be retrieved
ContdContd2 Projection The projection operation requires us to drop certain fields
of the input which is easy to do The expensive part is removing duplicates
SQL systems donrsquot remove duplicates unless the keyword DISTINCT
is specified in a query SELECT DISTINCT Rsid Rbid FROM Reserves R
bull Sorting Approach Sort on ltsid bidgt and remove duplicates (Can optimize this by dropping unwanted information while sorting)
bull Hashing Approach Hash on ltsid bidgt to create partitions Load partitions into memory one at a time build in-memory hash structure and eliminate duplicates
bull If there is an index with both Rsid and Rbid in the search key may be cheaper to sort data entries
ContdContd3 Join Joins are expensive operations and very common This systems
typically support several algorithms to carry out joins
bull Consider the join of Reserves and Sailors with the join condition Reservessid=Sailorssid Suppose one of the tables say Sailors has an index on the sid column We can scan Reserves and for each tuple use the index to probe Sailors for matching tuples This approach is called index nested loops join
Ex The cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple The cost of scanning Reserves is 1000 There are 1001000 =100000 tuples in Reserves For each of these tuples retrieving the index page containing the rid of the matching Sailors tuple costs 12 IOs(on avg) in addition we have to retrieve the Sailors page containing the qualifying tuple Therefore we have 100000(1+12) IOs to retrieve matching Sailors tuples The total cost is 221000 IOs
bull If we do not have an index that matches the join condition on either table we cannot use index nested loops In this case we can sort both tables on the join column and then scan them to find matches This is called sort-merge join
Ex We can sort Reserves and Sailors in two passes Read and Write Reserves in each pass the sorting cost is 221000=4000 IOs Similarly we can sort Sailors at a cost of 22500=2000 IOs In addition the second phase of the sort-merge join algorithm requires an additional scan of both tables Thus the total cost is 4000+2000+1000+500=7500 IOs
Introduction to query optimization
bullQuery optimization is one of the most important tasks of a relational DBMS A more detailed view of the query optimization and execution layer in the DBMS architecture is as shown in Fig(1)Queries are parsed and then presented to query optimizer which is responsible for identifying an efficient execution plan The optimizer generates alternative plan and chooses the plan with the least estimated cost
bullThe space of plans considered by a typical relational query optimizer can be carried out on the result of the - - -- algebra expression Optimizing such a relational algebra expression involves two basic steps
Query Parser
Query OptimizerPlan
Generator
Plan cost Estimator
Parsed query
Query
Evaluation Plan
Query Plan Evaluator
Catalog Manager
Fig(1) Query Parsing Optimization amp Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans A query evaluation plan consists of an extended relational algebra tree with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator
Ex select ssname from Reserves r Sailors s where rsid=ssid and rbid=100 and sratinggt5
The above query can be expressed in relational algebra as follows
Sname( bid=100 ^ ratinggt5(Reserves sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
Sname
bid=100 ^ ratinggt5
sid=sid
Reserves Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
Sname (on-the-fly)
bid=100 ^ ratinggt5 (on-the-fly)
sid=sid (simple nested loops)
(File scan)Reserves Sailors(File scan)
Fig(3) Query Evaluation Plan for Sample Query
ContdContd
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
  // Given a file on disk, sorts it using three buffer pages.
  // Pass 0: produce runs that are one page long.
  Read each page into memory, sort it, write it out.
  // Merge pairs of runs to produce longer runs until only
  // one run (containing all records of the input file) is left:
  While the number of runs at the end of the previous pass is > 1:
    // Pass i = 1, 2, ...
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc
Fig(1) Two-Way Merge Sort
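The pseudocode in Fig(1) can be made concrete. The following is a minimal Python sketch, for illustration only: a "file" is modeled as a list of pages, each page a small list of records, and runs are plain lists standing in for on-disk run files (the function name and data model are assumptions, not part of the text).

```python
def merge_two_runs(run_a, run_b):
    """Merge two sorted runs (lists of records) into one sorted run."""
    out, i, j = [], 0, 0
    while i < len(run_a) and j < len(run_b):
        if run_a[i] <= run_b[j]:
            out.append(run_a[i]); i += 1
        else:
            out.append(run_b[j]); j += 1
    out.extend(run_a[i:])
    out.extend(run_b[j:])
    return out

def two_way_extsort(pages):
    """Two-way external merge sort of a file given as a list of pages.

    Pass 0: sort each page in memory, producing one-page runs.
    Passes 1, 2, ...: merge pairs of runs until one run remains.
    """
    runs = [sorted(p) for p in pages]          # Pass 0
    while len(runs) > 1:
        next_runs = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_runs.append(merge_two_runs(runs[k], runs[k + 1]))
            else:
                next_runs.append(runs[k])      # odd run out is carried forward
        runs = next_runs
    return runs[0]
```

Running it on the seven-page file of Fig(2), `two_way_extsort([[3,4],[6,2],[9,4],[8,7],[5,6],[3,1],[2]])`, produces the single sorted run after the same sequence of passes the figure shows.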
If the number of pages in the input file is 2^k for some k, then:
  Pass 0 produces 2^k sorted runs of one page each,
  Pass 1 produces 2^(k-1) sorted runs of two pages each,
  Pass 2 produces 2^(k-2) sorted runs of four pages each,
  and so on, until
  Pass k produces one sorted run of 2^k pages.
In each pass we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log₂N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log₂N⌉ + 1) I/Os.
The algorithm is illustrated in Fig(2) on an example input file containing seven pages.
Input file:           3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
Pass 0 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
Pass 1 (2-page runs): 2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
Pass 2 (4-page runs): 2,3,4,4,6,7,8,9 | 1,2,3,5,6
Pass 3 (8-page runs): 1,2,2,3,3,4,4,5,6,6,7,8,9

Fig(2) Two-Way Merge Sort of a Seven-Page File
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis because 2·7·(⌈log₂7⌉+1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log₂8⌉+1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
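The cost analysis above is easy to check mechanically. A small hypothetical helper (not from the text) that evaluates the formula 2N(⌈log₂N⌉ + 1):

```python
import math

def two_way_sort_cost(n_pages):
    """Total I/Os for two-way external merge sort of a file of n_pages.

    Each pass reads and writes every page (2 I/Os per page), and there
    are ceil(log2(N)) + 1 passes: Pass 0 plus the merging passes.
    """
    passes = math.ceil(math.log2(n_pages)) + 1
    return 2 * n_pages * passes
```

For the examples in the text, `two_way_sort_cost(7)` gives 56 and `two_way_sort_cost(8)` gives 64, matching the figure.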
[Fig(3): two input buffer pages and one output buffer page in main memory, with runs streaming between disk and the buffers.]

Fig(3) Two-Way Merge Sort with Three Buffer Pages
EXTERNAL MERGE SORT

Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In Pass 0, read in B pages at a time and sort internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in Fig(4), using the input from Fig(2) and a buffer pool with four pages.
2. In passes i = 1, 2, ..., use B−1 buffer pages for input and use the remaining page for output; hence you do a (B−1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in Fig(5).
Input file:  3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2

Pass 0 (buffer pool with B = 4 pages):
  1st output run: 2,3,4,4,6,7,8,9
  2nd output run: 1,2,3,5,6

Fig(4) External Merge Sort with B Buffer Pages, Pass 0
[Fig(5): B−1 input buffer pages (Input 1 … Input B−1) and one output buffer page in main memory, with runs streaming between disk and the buffers.]

Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0
The first refinement reduces the number of runs produced by Pass 0 from N to ⌈N/B⌉, and the second allows B−1 runs to be merged in each subsequent pass; together they cut the number of passes to ⌈log_(B−1)⌈N/B⌉⌉ + 1, for an overall cost of 2N(⌈log_(B−1)⌈N/B⌉⌉ + 1) I/Os.
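Under the same toy model as before (pages as small lists), the two refinements can be sketched as follows; `heapq.merge` stands in for the (B−1)-way merge that streams one page of each input run at a time. The function name and data model are illustrative assumptions.

```python
import heapq

def external_merge_sort(pages, B):
    """External merge sort with B buffer pages (assumes B >= 3).

    Pass 0: read B pages at a time and sort them in memory, producing
    ceil(N/B) sorted runs. Passes i >= 1: merge B-1 runs at a time,
    with the remaining buffer page reserved for output.
    """
    # Pass 0: sort B pages at a time.
    runs = []
    for k in range(0, len(pages), B):
        chunk = [rec for page in pages[k:k + B] for rec in page]
        runs.append(sorted(chunk))
    # Passes i >= 1: (B-1)-way merge until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[k:k + B - 1]))
                for k in range(0, len(runs), B - 1)]
    return runs[0]
```

With the seven-page input of Fig(2) and B = 4, Pass 0 produces exactly the two runs shown in Fig(4), and a single 3-way-capable merge pass finishes the sort.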
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool for sorting.
• External merge sort minimizes disk I/O cost:
  - Pass 0 produces sorted runs of size B (the number of buffer pages).
  - Later passes merge runs; the number of runs merged at a time depends on B and on the block size.
  - A larger block size means less I/O cost per page, but fewer runs can be merged at a time.
  - In practice, the number of passes is rarely more than 2 or 3.
Access Paths
• An access path is a method of retrieving tuples from a table.
• A file scan, or an index that matches a selection condition (in the query):
  - A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3 and a=5 AND b>6, but not b=3.
  - A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index. E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5, but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
• A note on complex selections:
  - (day < 8/9/94 AND rname = 'Paul') OR bid=5 OR sid=3
  - Selection conditions are first converted to conjunctive normal form (CNF): (day < 8/9/94 OR bid=5 OR sid=3) AND (rname = 'Paul' OR bid=5 OR sid=3).
• Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index.
• Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
• Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples but do not affect the number of tuples/pages fetched.
Consider day < 8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day < 8/9/94 must then be checked on each retrieved tuple.
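The two matching rules above can be phrased as small predicate tests. A sketch under assumed representations (each conjunct is an (attribute, operator) pair; function names are hypothetical):

```python
def tree_index_matches(search_key, conjuncts):
    """A tree index matches a conjunction of terms if the terms involve
    only attributes in a (non-empty) prefix of the search key."""
    attrs = {a for a, _op in conjuncts}
    prefix = []
    for a in search_key:                 # walk the key left to right
        if a in attrs:
            prefix.append(a)
        else:
            break                        # prefix ends at first gap
    return len(attrs) > 0 and attrs == set(prefix)

def hash_index_matches(search_key, conjuncts):
    """A hash index matches only if there is an equality term
    (attribute = value) for every attribute in the search key."""
    eq_attrs = {a for a, op in conjuncts if op == '='}
    return set(search_key) <= eq_attrs
```

On the examples from the text with search key <a, b, c>: the tree index matches a=5 AND b=3 and a=5 AND b>6 but not b=3; the hash index matches a=5 AND b=3 AND c=5 but rejects b=3, a=5 AND b=3, and a>5 AND b=3 AND c=5.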
Algorithms for relational operations
1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths.
Given a selection of the form σ_(R.attr op value)(R), if there is no index on R.attr, we have to scan R.

If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.
Using an Index for Selections
• Cost depends on the number of qualifying tuples and on clustering.
• Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large without clustering).
• In the example below, assuming a uniform distribution of names, about 10% of the tuples qualify (100 pages, 10,000 tuples). With a clustered index the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os.

SELECT * FROM Reserves R WHERE R.rname < 'C'

As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates. SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query:

SELECT DISTINCT R.sid, R.bid FROM Reserves R

• Sorting approach: sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted information while sorting.)
• Hashing approach: hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
• If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort the data entries.
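The hashing approach can be sketched in a few lines (a toy in-memory model, with hypothetical names; `num_partitions` plays the role of the B−1 output buffers used while partitioning):

```python
def distinct_by_hashing(tuples, num_partitions=4):
    """Duplicate elimination by hashing.

    Phase 1: partition the input by a hash of the whole tuple, so all
    copies of a duplicate land in the same partition.
    Phase 2: process one partition at a time, building an in-memory
    hash structure (a set) to drop duplicates.
    """
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t) % num_partitions].append(t)
    result = []
    for part in partitions:
        seen = set()                     # in-memory hash structure
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result
```

Because duplicates always hash to the same partition, each partition can be de-duplicated independently, which is what lets the real algorithm hold only one partition in memory at a time.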
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.

• Consider the join of Reserves and Sailors, with the join condition Reserves.sid = Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.

Ex: The cost is that of scanning Reserves plus that of using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000. There are 100 · 1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000 · (1 + 1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.

• If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.

Ex: We can sort Reserves and Sailors in two passes. Reading and writing Reserves in each pass, the sorting cost is 2 · 2 · 1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2 · 2 · 500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
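Index nested loops join can be sketched with a Python dict standing in for the index on Sailors.sid (tuple layouts and names are illustrative assumptions, not schemas from the text):

```python
def index_nested_loops_join(reserves, sailors_index):
    """Scan the outer table (Reserves); for each tuple, probe the
    index on Sailors.sid for matching tuples and emit joined rows.

    reserves:      iterable of (sid, bid) tuples
    sailors_index: dict mapping sid -> list of (sid, sname, rating)
    """
    result = []
    for r in reserves:                        # outer scan
        sid = r[0]
        for s in sailors_index.get(sid, []):  # index probe on sid
            result.append(r + s)
    return result
```

The dict probe is the analogue of the 1.2-I/O index lookup in the cost example; tuples of the outer table with no match simply produce no output.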
Introduction to query optimization
• Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to a query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
• The space of plans considered by a typical relational query optimizer can be understood by noting that a query is essentially treated as a select-project-cross-product algebra expression, with any remaining operations carried out on the result of that expression. Optimizing such a relational algebra expression involves two basic steps:
[Fig(1): the query passes through the Query Parser to the Query Optimizer, whose Plan Generator and Plan Cost Estimator consult the Catalog Manager; the resulting query evaluation plan is executed by the Query Plan Evaluator.]

Fig(1) Query Parsing, Optimization, and Execution
1. Enumerating alternative plans for evaluating the expression. Typically, an optimizer considers a subset of all possible plans, because the number of possible plans is very large.
2. Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost.
Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator.
Ex:

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

The above query can be expressed in relational algebra as follows:

π_sname(σ_(bid=100 ∧ rating>5)(Reserves ⋈_(sid=sid) Sailors))
This expression is shown in the form of a tree in Fig(2)
π_sname
   |
σ_(bid=100 ∧ rating>5)
   |
⋈_(sid=sid)
  /        \
Reserves   Sailors

Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply the selections and projections to each tuple in the result of the join as it is produced; the result of the join before the selections and projections is never stored in its entirety. This query evaluation plan is shown in Fig(3).
π_sname                  (on-the-fly)
   |
σ_(bid=100 ∧ rating>5)   (on-the-fly)
   |
⋈_(sid=sid)              (simple nested loops)
  /                \
(File scan) Reserves   Sailors (File scan)

Fig(3) Query Evaluation Plan for Sample Query
Alternative Plans: Motivating Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

• Cost: 500 + 500 · 1000 I/Os.
• By no means the worst plan!
• Misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer.

The RA tree is as shown in Fig(2) and the plan is as shown in Fig(3).
Alternative Plan 1 (No Indexes), as shown in Fig(4). Main difference: push selects.

With 5 buffers, cost of plan:
1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
3. Sort T1 (2·2·10), sort T2 (2·3·250), merge (10 + 250).
4. Total: 3560 page I/Os.

• If we used BNL join, join cost = 10 + 4·250; total cost = 2770.
• If we 'push' projections, T1 has only sid, and T2 only sid and sname:
  - T1 fits in 3 pages; the cost of BNL drops to under 250 pages, total < 2000.
π_sname                                  (on-the-fly)
   |
⋈_(sid=sid)                              (sort-merge join)
  /                            \
σ_(bid=100)                    σ_(rating>5)
(scan, write to temp T1)       (scan, write to temp T2)
   |                              |
(File scan) Reserves           Sailors (File scan)

Fig(4) A Second Query Evaluation Plan

Alternative Plan 2 (With Indexes), as shown in Fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages.
• INL with pipelining (the outer is not materialized).
  - Projecting out unnecessary fields from the outer doesn't help.
• The join column sid is a key for Sailors.
  - At most one matching tuple, so an unclustered index on sid is OK.
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000 · 1.2); total 1210 I/Os.
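The cost arithmetic for the three plans can be reproduced in a few lines. This only re-evaluates the numbers already given in the text (Reserves = 1000 pages at 100 tuples/page, Sailors = 500 pages, 100 boats, 10 ratings, uniform distributions):

```python
# Plan of Fig(3): simple nested loops join, selections on the fly.
plan1 = 500 + 500 * 1000               # 500,500 page I/Os

# Plan of Fig(4): push selections, then sort-merge join.
plan2 = (1000 + 10                     # scan Reserves, write T1 (10 pages)
         + 500 + 250                   # scan Sailors, write T2 (250 pages)
         + 2 * 2 * 10                  # sort T1: two passes over 10 pages
         + 2 * 3 * 250                 # sort T2: three passes over 250 pages
         + 10 + 250)                   # merge T1 and T2

# Plan of Fig(5): clustered index on bid, then index NL join on sid.
plan3 = 10 + int(1000 * 1.2)           # 10 I/Os + 1000 probes at 1.2 I/Os
```

Evaluating these gives 500,500, 3560, and 1210 page I/Os respectively, matching the figures quoted in the text.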
π_sname                  (on-the-fly)
   |
σ_(rating>5)             (on-the-fly)
   |
⋈_(sid=sid)              (index nested loops, with pipelining)
  /                                        \
σ_(bid=100)                                Sailors (hash index on sid)
(use index on bid; do not write result to temp)
   |
Reserves (clustered index on bid)

Fig(5) A Query Evaluation Plan Using Indexes
Highlights of the System R Optimizer
• Impact:
  - Most widely used currently; works well for < 10 joins.
• Cost estimation: approximate art at best.
  - Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes.
  - Considers a combination of CPU and I/O costs.
• Plan space: too large, must be pruned.
  - Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
  - Cartesian products are avoided.

Cost Estimation
For each plan considered, we must estimate its cost:
• Must estimate the cost of each operation in the plan tree; this depends on input cardinalities.
• We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
• Must also estimate the size of the result for each operation in the tree; use information about the input relations.
• For selections and joins, assume independence of predicates.
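Under the independence assumption, the reduction factors (selectivities) of individual predicates simply multiply. A hypothetical sketch of the result-size estimate (the function name and the example numbers below are illustrative, not from the text):

```python
def estimate_result_size(input_cardinality, reduction_factors):
    """Estimated result cardinality: input cardinality times the
    product of the predicates' reduction factors, assuming the
    predicates are statistically independent."""
    est = float(input_cardinality)
    for rf in reduction_factors:
        est *= rf
    return round(est)
```

For instance, if a table is assumed to hold 40,000 tuples and two predicates are assumed to have reduction factors 0.5 and 0.1, the estimate is 40,000 · 0.5 · 0.1 = 2000 tuples; when the predicates are actually correlated, this multiplicative estimate can be badly off, which is one reason cost estimation is "approximate art at best".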
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• One must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• There are two parts to optimizing a query:
  - Consider a set of alternative plans; the search space must be pruned, typically to left-deep plans only.
  - Estimate the cost of each plan that is considered; this requires estimating the size of the result and the cost for each plan node.
  - Key issues: statistics, indexes, operator implementations.
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
Algorithms for relational operations
1. Selection: The selection operation is a simple retrieval of tuples from a table, and its implementation is essentially covered in our discussion of access paths.
Given a selection of the form σ_{R.attr op value}(R), if there is no index on R.attr we have to scan R.
If one or more indexes on R match the selection, we can use an index to retrieve matching tuples and apply any remaining selection conditions to further restrict the result set.
Using an Index for Selections
• Cost depends on the number of qualifying tuples and on clustering.
• Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large without clustering).
• Example: SELECT * FROM Reserves R WHERE R.rname < 'C'. Assuming a uniform distribution of names, about 10% of tuples qualify (10,000 tuples on 100 pages). With a clustered index, the cost is little more than 100 I/Os; if unclustered, up to 10,000 I/Os!
• As a rule of thumb, it is probably cheaper to simply scan the entire table (instead of using an unclustered index) if over 5% of the tuples are to be retrieved.
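The clustered/unclustered gap can be sketched with a simple cost model. This is a rough illustration of the reasoning above, not a real optimizer formula; `selection_cost` is a hypothetical helper, and the table sizes (1000 pages, 100,000 tuples) follow the Reserves running example.

```python
def selection_cost(n_pages, n_tuples, selectivity, clustered):
    """Rough I/O cost of retrieving qualifying records via an index,
    ignoring the (typically small) cost of reading data entries."""
    if clustered:
        # qualifying tuples are packed onto contiguous pages
        return round(n_pages * selectivity)
    # unclustered: worst case, one page I/O per qualifying tuple
    return round(n_tuples * selectivity)

# Reserves-like table: 1000 pages, 100,000 tuples, 10% qualify.
print(selection_cost(1000, 100_000, 0.10, clustered=True))   # 100 I/Os
print(selection_cost(1000, 100_000, 0.10, clustered=False))  # 10000 I/Os
# A full scan costs 1000 I/Os, so the unclustered index loses badly here,
# which is exactly what the 5% rule of thumb predicts.
```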
2. Projection: The projection operation requires us to drop certain fields of the input, which is easy to do. The expensive part is removing duplicates. SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query: SELECT DISTINCT R.sid, R.bid FROM Reserves R
• Sorting approach: sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted fields while sorting.)
• Hashing approach: hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
• If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort the data entries.
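Both duplicate-elimination strategies can be sketched in a few lines of Python. These are in-memory stand-ins for the disk-based versions (the function names are ours), meant only to show the two control flows.

```python
def distinct_by_sorting(tuples):
    """Sort on the projected fields; duplicates become adjacent and
    are dropped in a single pass over the sorted data."""
    out = []
    for t in sorted(tuples):
        if not out or out[-1] != t:
            out.append(t)
    return out

def distinct_by_hashing(tuples, n_partitions=4):
    """Hash-partition first; equal tuples land in the same partition,
    so each partition is deduplicated independently with an
    in-memory hash structure (here, a Python set)."""
    partitions = [[] for _ in range(n_partitions)]
    for t in tuples:
        partitions[hash(t) % n_partitions].append(t)
    result = []
    for p in partitions:
        result.extend(set(p))
    return result

rows = [(22, 101), (58, 103), (22, 101), (31, 102)]
print(sorted(distinct_by_sorting(rows)))   # [(22, 101), (31, 102), (58, 103)]
print(sorted(distinct_by_hashing(rows)))   # [(22, 101), (31, 102), (58, 103)]
```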
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.
• Consider the join of Reserves and Sailors with the join condition Reserves.sid = Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: Consider the cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000 I/Os. There are 100 · 1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000 · (1 + 1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
• If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.
Ex: We can sort Reserves and Sailors in two passes. Reading and writing each table in each pass, the cost of sorting Reserves is 2 · 2 · 1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2 · 2 · 500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
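The two cost figures above can be reproduced with straightforward arithmetic (the table sizes are the example's: Reserves = 1000 pages at 100 tuples per page, Sailors = 500 pages):

```python
M, tuples_per_page = 1000, 100   # Reserves: pages, tuples per page
N = 500                          # Sailors: pages

# Index nested loops: scan Reserves once, then for each Reserves tuple
# fetch the index page (~1.2 I/Os on average) plus the Sailors data page.
inl = M + (M * tuples_per_page) * (1.2 + 1)
print(round(inl))    # 221000

# Sort-merge: two passes (read + write) over each table to sort,
# then one final scan of each table to merge.
smj = 2 * 2 * M + 2 * 2 * N + (M + N)
print(smj)           # 7500
```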
Introduction to query optimization
• Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
• The space of plans considered by a typical relational query optimizer can be understood by treating the query as a σ-π-× algebra expression. Optimizing such a relational algebra expression involves two basic steps:
[Fig(1) Query Parsing, Optimization & Execution: a query is fed to the Query Parser; the parsed query goes to the Query Optimizer, which consists of a Plan Generator and a Plan Cost Estimator and consults the Catalog Manager; the resulting query evaluation plan is executed by the Query Plan Evaluator.]
1. Enumerating alternative plans for evaluating the expression. Typically, an optimizer considers only a subset of all possible plans, because the number of possible plans is very large.
2. Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost.
Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access method to use for each table and the implementation method to use for each relational operator.
Ex: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
The above query can be expressed in relational algebra as follows:
π_sname(σ_bid=100 ∧ rating>5(Reserves ⋈_sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2):

                π_sname
                   |
        σ_bid=100 ∧ rating>5
                   |
               ⋈_sid=sid
              /         \
        Reserves       Sailors

Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply selections and projections to each tuple in the result of the join as it is produced; the result of the join, before the selections and projections, is never stored in its entirety. This query evaluation plan is shown in Fig(3).
                π_sname                 (on-the-fly)
                   |
        σ_bid=100 ∧ rating>5            (on-the-fly)
                   |
               ⋈_sid=sid                (simple nested loops)
              /         \
        Reserves       Sailors
       (file scan)    (file scan)

Fig(3) Query Evaluation Plan for Sample Query
Alternative Plans: Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
The RA tree is shown in fig(2) and the plan in fig(3).
• Cost: 500 + 500 · 1000 I/Os.
• By no means the worst plan!
• Misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer.

Alternative Plan 1 (No Indexes), as shown in fig(4). Main difference: push selects.
• With 5 buffers, the cost of the plan is:
  1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
  2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  3. Sort T1 (2 · 2 · 10), sort T2 (2 · 3 · 250), merge (10 + 250).
  4. Total: 3560 page I/Os.
• If we used BNL join: join cost = 10 + 4 · 250, total cost = 2770.
• If we 'push' projections, T1 has only sid, and T2 only sid and sname:
  - T1 fits in 3 pages, the cost of BNL drops to under 250 pages, total < 2000.
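The 3560 and 2770 figures break down as follows (a quick check of the arithmetic above):

```python
# Plan 1 (push selections, sort-merge join), with 5 buffer pages:
cost = (1000 + 10        # scan Reserves, write temp T1 (100 boats, uniform)
        + 500 + 250      # scan Sailors, write temp T2 (10 ratings)
        + 2 * 2 * 10     # sort T1: 2 passes, read + write 10 pages each
        + 2 * 3 * 250    # sort T2: 3 passes, read + write 250 pages each
        + 10 + 250)      # merge phase scans both temps once
print(cost)              # 3560

# Same plan with block nested loops instead of sort-merge:
# T1 (10 pages) as outer in 3-page blocks -> 4 full scans of T2.
bnl = 1000 + 10 + 500 + 250 + (10 + 4 * 250)
print(bnl)               # 2770
```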
                π_sname                 (on-the-fly)
                   |
               ⋈_sid=sid                (sort-merge join)
              /         \
      σ_bid=100        σ_rating>5
(scan, write to        (scan, write to
     temp T1)               temp T2)
         |                  |
     Reserves            Sailors
    (file scan)         (file scan)
Fig(4) A second query evaluation plan

Alternative Plan 2 (With Indexes), as shown in fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages.
• INL with pipelining (the outer is not materialized).
  - Projecting out unnecessary fields from the outer doesn't help.
• The join column sid is a key for Sailors.
  - At most one matching tuple, so an unclustered index on sid is OK.
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000 · 1.2); total 1210 I/Os.
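The 1210 figure follows directly from the numbers above:

```python
# Plan 2 (indexes): the clustered index on Reserves.bid selects
# 100,000 / 100 = 1000 matching tuples, stored on just 10 pages.
matching_tuples = 100_000 // 100
select_cost = 10          # clustered index: read only the matching pages
probe_cost = 1.2          # per-tuple probe of the unclustered sid index
total = select_cost + matching_tuples * probe_cost
print(round(total))       # 1210
```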
                π_sname                 (on-the-fly)
                   |
              σ_rating>5                (on-the-fly)
                   |
               ⋈_sid=sid                (index nested loops, with pipelining)
              /         \
      σ_bid=100        Sailors
(use hash index; do    (hash index on sid)
 not write result
 to temp)
         |
     Reserves

Fig(5) A query evaluation plan using indexes
Highlights of the System R Optimizer
• Impact: most widely used approach; currently works well for < 10 joins.
• Cost estimation: approximate art at best.
  - Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes.
  - Considers a combination of CPU and I/O costs.
• Plan space: too large, must be pruned.
  - Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
  - Cartesian products are avoided.

Cost Estimation
For each plan considered, we must estimate its cost.
• Must estimate the cost of each operation in the plan tree.
  - Depends on input cardinalities.
  - We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
• Must also estimate the size of the result for each operation in the tree.
  - Use information about the input relations.
  - For selections and joins, assume independence of predicates.
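Under the independence assumption, a result size is estimated as the product of the input cardinalities times one reduction factor per predicate. A sketch with hypothetical statistics (the cardinalities and reduction factors below are illustrative, not taken from the text):

```python
def estimate_result_size(input_cardinalities, reduction_factors):
    """System R-style estimate: product of input cardinalities times
    the reduction factor of each predicate, assuming the predicates
    are independent of one another."""
    size = 1.0
    for c in input_cardinalities:
        size *= c
    for rf in reduction_factors:
        size *= rf
    return round(size)

# Hypothetical stats: 100,000 Reserves tuples x 40,000 Sailors tuples,
# with predicates sid=sid (RF = 1/40,000, sid is a key of Sailors),
# bid=100 (RF = 1/100, 100 distinct bids), rating>5 (RF = 1/2).
print(estimate_result_size([100_000, 40_000],
                           [1 / 40_000, 1 / 100, 1 / 2]))   # 500
```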
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• One must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• There are two parts to optimizing a query:
  - Consider a set of alternative plans; the search space must be pruned (typically, left-deep plans only).
  - Estimate the cost of each plan that is considered; this requires estimating the size of the result and the cost for each plan node.
  - Key issues: statistics, indexes, operator implementations.
EXTERNAL SORTING
Why Sort?
• A classic problem in computer science.
• Data requested in sorted order, e.g., find students in increasing gpa order.
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records. (Why?)
• The sort-merge join algorithm involves sorting.
• Problem: sort 1GB of data with 1MB of RAM. (Why not virtual memory?)
When does a DBMS sort data? Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order, for example by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly larger datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps; here we refer to each such subfile as a run.
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
// Given a file on disk, sorts it using three buffer pages.
// Pass 0: produce runs that are one page long.
    Read each page into memory, sort it, write it out.
// Passes i = 1, 2, ...: merge pairs of runs to produce longer runs,
// until only one run (containing all records of the input file) is left.
    While the number of runs at the end of the previous pass is > 1:
        While there are runs to be merged from the previous pass:
            Choose the next two runs (from the previous pass).
            Read each run into an input buffer, one page at a time.
            Merge the runs and write to the output buffer;
            force the output buffer to disk one page at a time.
endproc

Fig(1) Two-Way Merge Sort
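The pseudocode above can be exercised with a small in-memory simulation (each 'page' is a Python list of records, two records per page as in the seven-page example file; the function names are ours, and real disk I/O is not modeled):

```python
def two_way_extsort(pages, page_size=2):
    """Simulate two-way external merge sort. `pages` is a list of
    unsorted pages; returns one sorted run (a list of pages)."""
    # Pass 0: sort each page individually -> one-page runs.
    runs = [[sorted(p)] for p in pages]
    # Passes 1, 2, ...: merge pairs of runs until one run is left.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_runs.append(merge(runs[i], runs[i + 1], page_size))
            else:
                next_runs.append(runs[i])   # odd run carries over unmerged
        runs = next_runs
    return runs[0]

def merge(run1, run2, page_size):
    """Merge two sorted runs. The real algorithm streams through two
    input buffers and one output buffer, one page at a time."""
    a = [r for page in run1 for r in page]
    b = [r for page in run2 for r in page]
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return [out[k:k + page_size] for k in range(0, len(out), page_size)]

# The seven-page example file (records 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2).
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result = [r for page in two_way_extsort(pages) for r in page]
print(result)   # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
```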
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each.
Pass 1 produces 2^(k-1) sorted runs of two pages each.
Pass 2 produces 2^(k-2) sorted runs of four pages each.
And so on, until
Pass k produces one sorted run of 2^k pages.
In each pass we read every page in the file, process it, and write it out; therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log₂ N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log₂ N⌉ + 1) I/Os.
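The pass and cost formulas are easy to evaluate directly:

```python
import math

def two_way_cost(n_pages):
    """Number of passes and total I/Os for two-way external merge sort:
    ceil(log2 N) + 1 passes, reading and writing every page per pass."""
    passes = math.ceil(math.log2(n_pages)) + 1
    return passes, 2 * n_pages * passes

print(two_way_cost(7))   # (4, 56)  -- the seven-page example file
print(two_way_cost(8))   # (4, 64)  -- an eight-page file, same pass count
```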
The algorithm is illustrated in Fig(2) on an example input file containing seven pages.

[Fig(2): Two-Way Merge Sort of a seven-page file. The input file is sorted in four passes: Pass 0 produces 1-page runs, Pass 1 produces 2-page runs, Pass 2 produces 4-page runs, and Pass 3 produces a single 8-page run.]
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis because 2 · 7 · (⌈log₂ 7⌉ + 1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log₂ 8⌉ + 1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
[Fig(3): Two-Way Merge Sort with three buffer pages. Two input buffers and one output buffer sit in main memory between the disk holding the input runs and the disk receiving the output run.]
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In pass 0, read in B pages at a time and sort internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in fig(4), using the input from fig(2) and a buffer pool with four pages.
2. In passes i = 1, 2, ..., use B-1 buffer pages for input and use the remaining page for output; hence you do a (B-1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in fig(5).
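The two modifications combine into a pass count of 1 + ⌈log_{B-1} ⌈N/B⌉⌉, which can be checked numerically (a sketch; real systems refine this further with blocked I/O and double buffering):

```python
import math

def ext_merge_sort_cost(n_pages, n_buffers):
    """Passes and I/O cost of the generalized external merge sort:
    pass 0 yields ceil(N/B) runs of B pages; each later pass
    performs a (B-1)-way merge over the surviving runs."""
    runs = math.ceil(n_pages / n_buffers)
    if runs <= 1:
        passes = 1                      # the whole file fits in the buffer pool
    else:
        passes = 1 + math.ceil(math.log(runs, n_buffers - 1))
    return passes, 2 * n_pages * passes

# Seven-page file with B = 4: pass 0 leaves ceil(7/4) = 2 runs,
# and a single 3-way merge pass finishes the sort.
print(ext_merge_sort_cost(7, 4))        # (2, 28)
# Compare two-way merge sort on the same file: 4 passes, 56 I/Os.
```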
EXTERNAL MERGE SORT
[Fig(4): External Merge Sort with B buffer pages, Pass 0. With a buffer pool of B = 4 pages, the input file from fig(2) is read four pages at a time; each chunk is sorted internally and written out, producing a 1st and a 2nd output run.]
[Fig(5): External Merge Sort with B buffer pages, Pass i > 0. B-1 input buffers (Input 1, Input 2, ..., Input B-1) feed one output buffer in main memory, performing a (B-1)-way merge between the input disk and the output disk.]
The first refinement reduces the number of runs produced by Pass 0 from N to ⌈N/B⌉, and the second refinement reduces the number of subsequent passes by merging B-1 runs at a time rather than two.
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool for sorting.
• External merge sort minimizes disk I/O cost:
  - Pass 0 produces sorted runs of size B (# of buffer pages).
  - Later passes merge runs.
  - The # of runs merged at a time depends on B and on the block size.
  - A larger block size means less I/O cost per page.
  - A larger block size means fewer runs can be merged at a time.
  - In practice, the # of passes is rarely more than 2 or 3.
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
ContdContd2 Projection The projection operation requires us to drop certain fields
of the input which is easy to do The expensive part is removing duplicates
SQL systems donrsquot remove duplicates unless the keyword DISTINCT
is specified in a query SELECT DISTINCT Rsid Rbid FROM Reserves R
bull Sorting Approach Sort on ltsid bidgt and remove duplicates (Can optimize this by dropping unwanted information while sorting)
bull Hashing Approach Hash on ltsid bidgt to create partitions Load partitions into memory one at a time build in-memory hash structure and eliminate duplicates
bull If there is an index with both Rsid and Rbid in the search key may be cheaper to sort data entries
ContdContd3 Join Joins are expensive operations and very common This systems
typically support several algorithms to carry out joins
bull Consider the join of Reserves and Sailors with the join condition Reservessid=Sailorssid Suppose one of the tables say Sailors has an index on the sid column We can scan Reserves and for each tuple use the index to probe Sailors for matching tuples This approach is called index nested loops join
Ex The cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple The cost of scanning Reserves is 1000 There are 1001000 =100000 tuples in Reserves For each of these tuples retrieving the index page containing the rid of the matching Sailors tuple costs 12 IOs(on avg) in addition we have to retrieve the Sailors page containing the qualifying tuple Therefore we have 100000(1+12) IOs to retrieve matching Sailors tuples The total cost is 221000 IOs
bull If we do not have an index that matches the join condition on either table we cannot use index nested loops In this case we can sort both tables on the join column and then scan them to find matches This is called sort-merge join
Ex We can sort Reserves and Sailors in two passes Read and Write Reserves in each pass the sorting cost is 221000=4000 IOs Similarly we can sort Sailors at a cost of 22500=2000 IOs In addition the second phase of the sort-merge join algorithm requires an additional scan of both tables Thus the total cost is 4000+2000+1000+500=7500 IOs
Introduction to query optimization
bullQuery optimization is one of the most important tasks of a relational DBMS A more detailed view of the query optimization and execution layer in the DBMS architecture is as shown in Fig(1)Queries are parsed and then presented to query optimizer which is responsible for identifying an efficient execution plan The optimizer generates alternative plan and chooses the plan with the least estimated cost
bullThe space of plans considered by a typical relational query optimizer can be carried out on the result of the - - -- algebra expression Optimizing such a relational algebra expression involves two basic steps
Query Parser
Query OptimizerPlan
Generator
Plan cost Estimator
Parsed query
Query
Evaluation Plan
Query Plan Evaluator
Catalog Manager
Fig(1) Query Parsing Optimization amp Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans A query evaluation plan consists of an extended relational algebra tree with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator
Ex select ssname from Reserves r Sailors s where rsid=ssid and rbid=100 and sratinggt5
The above query can be expressed in relational algebra as follows
Sname( bid=100 ^ ratinggt5(Reserves sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
Sname
bid=100 ^ ratinggt5
sid=sid
Reserves Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
Sname (on-the-fly)
bid=100 ^ ratinggt5 (on-the-fly)
sid=sid (simple nested loops)
(File scan)Reserves Sailors(File scan)
Fig(3) Query Evaluation Plan for Sample Query
ContdContd
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort data? Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order, for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly larger datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps. Here, we refer to each subfile as a run.
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
  // Given a file on disk, sorts it using three buffer pages.
  // Pass 0: produce runs that are one page long.
  Read each page into memory, sort it, and write it out.
  // Merge pairs of runs to produce longer runs until only
  // one run (containing all records of the input file) is left.
  While the number of runs at the end of the previous pass is > 1:
    // Pass i = 1, 2, ...
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc
Fig(1) Two-Way Merge Sort
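The pseudocode above can be sketched as runnable Python. The page size, the in-memory list representation of pages and runs, and `heapq.merge` standing in for the buffered page-at-a-time merge are all simulation assumptions, not part of the original algorithm.

```python
# Simulation of two-way external merge sort (Fig(1)): pages are small
# Python lists, and runs are lists of records.
import heapq

PAGE_SIZE = 2  # records per page (an assumption for the simulation)

def two_way_extsort(records):
    # Pass 0: read each "page", sort it, write it out as a one-page run.
    runs = [sorted(records[i:i + PAGE_SIZE])
            for i in range(0, len(records), PAGE_SIZE)]
    # Passes 1, 2, ...: merge pairs of runs until one run remains.
    while len(runs) > 1:
        next_runs = []
        for j in range(0, len(runs), 2):
            pair = runs[j:j + 2]  # an odd run out is carried forward
            # heapq.merge streams the sorted runs, mirroring the
            # page-at-a-time merge through input/output buffers.
            next_runs.append(list(heapq.merge(*pair)))
        runs = next_runs
    return runs[0] if runs else []
```

Each iteration of the outer loop corresponds to one pass, and the number of runs halves (rounding up) each time, just as in the analysis that follows.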
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each,
Pass 1 produces 2^(k-1) sorted runs of two pages each,
Pass 2 produces 2^(k-2) sorted runs of four pages each,
and so on, until
Pass k produces one sorted run of 2^k pages.
In each pass, we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉ + 1) I/Os.
The algorithm is illustrated in Fig(2) on an example input file containing seven pages.
[Fig(2) Two-Way Merge Sort of a Seven-Page File: the figure shows the input file, the 1-page runs after Pass 0, the 2-page runs after Pass 1, the 4-page runs after Pass 2, and the single 8-page run after Pass 3.]
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis, because 2 × 7 × (⌈log2 7⌉ + 1) = 56. The dark pages in the figure illustrate what would happen with a file of eight pages: the number of passes remains four (⌈log2 8⌉ + 1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as Fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
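The figures in this analysis can be checked numerically. The sketch below assumes exactly the stated cost model: two I/Os per page per pass, and ⌈log2 N⌉ + 1 passes for a file of N pages.

```python
# Cost model for two-way external merge sort.
import math

def num_passes(n_pages):
    # ceil(log2 N) merging passes plus Pass 0
    return math.ceil(math.log2(n_pages)) + 1

def total_io(n_pages):
    # read + write every page once per pass
    return 2 * n_pages * num_passes(n_pages)

seven = total_io(7)  # the seven-page example
eight = total_io(8)  # the eight-page ("dark pages") example
```

This reproduces the 56 and 64 I/O figures quoted above.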
[Fig(3) Two-Way Merge Sort with Three Buffer Pages: two input buffers and one output buffer in main memory, with pages read from and written back to disk.]
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In Pass 0, read in B pages at a time and sort internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in Fig(4), using the input from Fig(2) and a buffer pool with four pages.
2. In Passes i = 1, 2, ..., use B − 1 buffer pages for input and use the remaining page for output; hence, you do a (B−1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in Fig(5).
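The effect of these two modifications on the pass count can be sketched under the same cost model. The helper names below are illustrative; the logic follows directly from the two rules above (⌈N/B⌉ initial runs, then (B−1)-way merges).

```python
# Pass count and I/O cost for generalized external merge sort.
import math

def ext_sort_passes(n_pages, b_buffers):
    runs = math.ceil(n_pages / b_buffers)  # sorted runs after Pass 0
    passes = 1
    while runs > 1:                        # each merge pass is (B-1)-way
        runs = math.ceil(runs / (b_buffers - 1))
        passes += 1
    return passes

def ext_sort_io(n_pages, b_buffers):
    # same model as before: read + write every page once per pass
    return 2 * n_pages * ext_sort_passes(n_pages, b_buffers)
```

For the seven-page input of Fig(2) with B = 4, this gives 2 passes (28 I/Os) instead of the 4 passes (56 I/Os) needed by the two-way algorithm.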
EXTERNAL MERGE SORT
[Fig(4) External Merge Sort with B Buffer Pages, Pass 0: with a buffer pool of B = 4 pages, groups of four input-file pages are read in, sorted internally, and written out as the 1st and 2nd output runs.]
[Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0: B − 1 input buffers feed a (B−1)-way merge, and the single output buffer is written to disk one page at a time.]
The first refinement reduces the number of runs produced by Pass 0 to ⌈N/B⌉, and the second allows each later pass to merge B − 1 runs at a time rather than two. The number of passes therefore drops to 1 + ⌈log_(B−1) ⌈N/B⌉⌉, and the total cost is 2N times the number of passes.
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool to sorting.
• External merge sort minimizes disk I/O cost:
 - Pass 0 produces sorted runs of size B (the number of buffer pages); later passes merge runs.
 - The number of runs merged at a time depends on B and on the block size. A larger block size means less I/O cost per page, but also a smaller number of runs merged at a time.
 - In practice, the number of passes is rarely more than 2 or 3.
3. Join: Joins are expensive operations and very common, so systems typically support several algorithms to carry out joins.
• Consider the join of Reserves and Sailors with the join condition Reserves.sid = Sailors.sid. Suppose one of the tables, say Sailors, has an index on the sid column. We can scan Reserves and, for each tuple, use the index to probe Sailors for matching tuples. This approach is called index nested loops join.
Ex: Consider the cost of scanning Reserves and using the index to retrieve the matching Sailors tuple for each Reserves tuple. The cost of scanning Reserves is 1000 (pages). There are 100 × 1000 = 100,000 tuples in Reserves. For each of these tuples, retrieving the index page containing the rid of the matching Sailors tuple costs 1.2 I/Os (on average); in addition, we have to retrieve the Sailors page containing the qualifying tuple. Therefore we have 100,000 × (1 + 1.2) I/Os to retrieve matching Sailors tuples. The total cost is 221,000 I/Os.
• If we do not have an index that matches the join condition on either table, we cannot use index nested loops. In this case, we can sort both tables on the join column and then scan them to find matches. This is called sort-merge join.
Ex: We can sort Reserves and Sailors in two passes each. We read and write Reserves in each pass, so the sorting cost is 2 × 2 × 1000 = 4000 I/Os. Similarly, we can sort Sailors at a cost of 2 × 2 × 500 = 2000 I/Os. In addition, the second phase of the sort-merge join algorithm requires an additional scan of both tables. Thus the total cost is 4000 + 2000 + 1000 + 500 = 7500 I/Os.
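The arithmetic in the two examples above can be restated as a short script. The page counts and the 1.2-I/O average index-probe figure are taken from the running example (Reserves: 1000 pages of 100 tuples each; Sailors: 500 pages).

```python
# Numeric check of the two join-cost examples.
R_PAGES = 1000
R_TUPLES = 100 * 1000
S_PAGES = 500

# Index nested loops: scan Reserves, then for each tuple pay 1.2 I/Os
# (on average) for the index page plus 1 I/O for the Sailors data page.
inl_cost = R_PAGES + round(R_TUPLES * (1 + 1.2))

# Sort-merge: sort each table in two passes (a read and a write of
# every page per pass), then scan both sorted tables once to merge.
smj_cost = (2 * 2 * R_PAGES) + (2 * 2 * S_PAGES) + R_PAGES + S_PAGES
```

With these inputs, index nested loops costs 221,000 I/Os while sort-merge costs 7500, matching the totals above.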
Introduction to query optimization
• Query optimization is one of the most important tasks of a relational DBMS. A more detailed view of the query optimization and execution layer in the DBMS architecture is shown in Fig(1). Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan. The optimizer generates alternative plans and chooses the plan with the least estimated cost.
• The space of plans considered by a typical relational query optimizer can be understood by recognizing that a query is essentially treated as a select-project-join algebra expression, with any remaining operations carried out on the result of that expression. Optimizing such a relational algebra expression involves two basic steps:
[Fig(1) Query Parsing, Optimization, and Execution: a query is parsed, the parsed query goes to the query optimizer (a plan generator plus a plan cost estimator, both consulting the catalog manager), and the chosen query evaluation plan is executed by the query plan evaluator.]
1. Enumerating alternative plans for evaluating the expression. Typically, an optimizer considers a subset of all possible plans, because the number of possible plans is very large.
2. Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost.
Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator.
Ex: select s.sname from Reserves r, Sailors s where r.sid = s.sid and r.bid = 100 and s.rating > 5
The above query can be expressed in relational algebra as follows:
π_sname(σ_bid=100 ∧ rating>5(Reserves ⋈_sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
[Fig(2) Query Expressed as a Relational Algebra Tree: π_sname at the root, above the selection σ_bid=100 ∧ rating>5, above the join Reserves ⋈_sid=sid Sailors at the leaves.]
The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply selections and projections to each tuple in the result of the join as it is produced; the result of the join before the selections and projections is never stored in its entirety. This query evaluation plan is shown in Fig(3).
[Fig(3) Query Evaluation Plan for the Sample Query: π_sname (on-the-fly) over σ_bid=100 ∧ rating>5 (on-the-fly) over ⋈_sid=sid (simple nested loops), with file scans of Reserves and Sailors at the leaves.]
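To make the "on-the-fly" pipelining concrete, here is a toy sketch of the plan in Fig(3) using Python generators as operators: tuples flow from the nested loops join through the selection and projection without the join result ever being materialized. The relation contents and the dictionary representation of tuples are made-up illustrations, and the join ignores the page-oriented detail.

```python
# Toy iterator-style pipeline for the plan in Fig(3).

def nested_loops_join(outer, inner, pred):
    for r in outer:              # page-oriented detail omitted in this toy
        for s in inner:
            if pred(r, s):
                yield {**r, **s}

def select(rows, pred):
    for row in rows:
        if pred(row):
            yield row            # on-the-fly: no temporary relation

def project(rows, cols):
    for row in rows:
        yield tuple(row[c] for c in cols)

# Made-up sample data for illustration.
reserves = [{"sid": 22, "bid": 100}, {"sid": 31, "bid": 101}]
sailors = [{"sid": 22, "sname": "dustin", "rating": 7},
           {"sid": 31, "sname": "lubber", "rating": 8}]

joined = nested_loops_join(reserves, sailors,
                           lambda r, s: r["sid"] == s["sid"])
filtered = select(joined, lambda t: t["bid"] == 100 and t["rating"] > 5)
result = list(project(filtered, ["sname"]))
```

Because each operator is a generator, pulling one result tuple drives the whole tree, which is exactly the pipelined evaluation the text describes.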
Alternative Plans: Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
• Cost: 500 + 500 × 1000 I/Os.
• This is by no means the worst plan! It misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer. The RA tree is as shown in Fig(2) and the plan is as shown in Fig(3).
Alternative Plan 1 (No Indexes), as shown in Fig(4). Main difference: push selects. With 5 buffers, the cost of the plan is:
1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution).
2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
3. Sort T1 (2 × 2 × 10), sort T2 (2 × 3 × 250), merge (10 + 250).
4. Total: 3560 page I/Os.
• If we used block nested loops (BNL) join, the join cost would be 10 + 4 × 250, for a total cost of 2770.
• If we also `push' projections, T1 has only sid and T2 has only sid and sname; then T1 fits in 3 pages, the cost of BNL drops to under 250 pages, and the total is under 2000.
[Fig(4) A Second Query Evaluation Plan: π_sname (on-the-fly) over a sort-merge join ⋈_sid=sid; the inputs are σ_bid=100 of Reserves and σ_rating>5 of Sailors, each produced by a file scan and written to a temp (T1 and T2).]
Alternative Plan 2 (With Indexes), as shown in Fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages.
• Use index nested loops (INL) join with pipelining (the outer is not materialized); projecting out unnecessary fields from the outer doesn't help.
• The join column sid is a key for Sailors, so there is at most one matching tuple; an unclustered index on sid is OK.
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000 × 1.2); total 1210 I/Os.
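The three plan costs compared in this discussion can be checked numerically. Page counts are from the running example (Reserves: 1000 pages, Sailors: 500 pages).

```python
# Numeric check of the three alternative plan costs.

# Motivating plan: page-oriented simple nested loops, selections and
# projections applied on the fly.
plan0 = 500 + 500 * 1000                # 500,500 I/Os

# Fig(4): push selections, write temps T1 (10 pages) and T2 (250 pages),
# then sort-merge join the temps.
plan1 = (1000 + 10                      # scan Reserves, write T1
         + 500 + 250                    # scan Sailors, write T2
         + 2 * 2 * 10                   # sort T1 (two passes)
         + 2 * 3 * 250                  # sort T2 (three passes)
         + 10 + 250)                    # merge the sorted temps

# Fig(5): clustered index on bid (10 I/Os for 1000 Reserves tuples),
# then probe Sailors via the sid index at 1.2 I/Os per tuple.
plan2 = 10 + 1000 * 1.2                 # 1210 I/Os
```

Pushing selections cuts the cost by two orders of magnitude, and the index plan cuts it by roughly another factor of three.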
[Fig(5) A Query Evaluation Plan Using Indexes: π_sname (on-the-fly) over σ_rating>5 (on-the-fly) over ⋈_sid=sid (index nested loops with pipelining); the outer input is σ_bid=100 applied to Reserves using its index on bid (the result is not written to a temp), and the inner input is Sailors, probed via the hash index on sid.]
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
Introduction to query optimization
bullQuery optimization is one of the most important tasks of a relational DBMS A more detailed view of the query optimization and execution layer in the DBMS architecture is as shown in Fig(1)Queries are parsed and then presented to query optimizer which is responsible for identifying an efficient execution plan The optimizer generates alternative plan and chooses the plan with the least estimated cost
bullThe space of plans considered by a typical relational query optimizer can be carried out on the result of the - - -- algebra expression Optimizing such a relational algebra expression involves two basic steps
Query Parser
Query OptimizerPlan
Generator
Plan cost Estimator
Parsed query
Query
Evaluation Plan
Query Plan Evaluator
Catalog Manager
Fig(1) Query Parsing Optimization amp Execution
1 Enumerating alternative plans for evaluating the expression Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
2 Estimating the cost of each enumerated plan and choosing the plan with the lowest estimated cost
Query Evaluation Plans A query evaluation plan consists of an extended relational algebra tree with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator
Ex select ssname from Reserves r Sailors s where rsid=ssid and rbid=100 and sratinggt5
The above query can be expressed in relational algebra as follows
Sname( bid=100 ^ ratinggt5(Reserves sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
Sname
bid=100 ^ ratinggt5
sid=sid
Reserves Sailors
Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
Sname (on-the-fly)
bid=100 ^ ratinggt5 (on-the-fly)
sid=sid (simple nested loops)
(File scan)Reserves Sailors(File scan)
Fig(3) Query Evaluation Plan for Sample Query
ContdContd
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B Buffer Pages, Pass 0: with a buffer pool of B = 4 pages, the seven-page input file from fig(2) is read in four pages at a time; each group is sorted internally and written out, producing a sorted first output run of four pages and a sorted second output run of three pages.
Fig(5) External Merge Sort with B Buffer Pages, Pass i > 0: B−1 input buffer pages (Input 1 through Input B−1) and one output buffer page sit in main memory between the disk-resident input runs and the merged output written back to disk.
The first refinement reduces the number of runs produced by pass 0 from N to N1 = ⌈N/B⌉. The second refinement is even more important: by doing a (B−1)-way merge in each later pass, the number of passes is reduced to ⌈log_{B−1} N1⌉ + 1.
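Under the two modifications above, the pass count can be sketched numerically (a toy calculation of the slide's formulas, not a full cost model; the function names are my own):

```python
import math

def external_sort_passes(n_pages, b_buffers):
    # Pass 0 produces ceil(N/B) sorted runs of B pages each.
    runs = math.ceil(n_pages / b_buffers)
    if runs <= 1:
        return 1
    # Each later pass does a (B-1)-way merge of the remaining runs.
    return 1 + math.ceil(math.log(runs, b_buffers - 1))

def external_sort_cost(n_pages, b_buffers):
    # Every pass reads and writes all N pages.
    return 2 * n_pages * external_sort_passes(n_pages, b_buffers)

# Seven pages with B = 4: pass 0 makes two runs, then one 3-way merge.
print(external_sort_cost(7, 4))  # 28, versus 56 for two-way merge sort
```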
Summary
• External sorting is important; a DBMS may dedicate part of the buffer pool for sorting
• External merge sort minimizes disk I/O cost:
 - Pass 0 produces sorted runs of size B (the number of buffer pages); later passes merge runs
 - The number of runs merged at a time depends on B and on the block size
 - A larger block size means less I/O cost per page, but a smaller number of runs merged at a time
 - In practice, the number of passes is rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
Query Evaluation Plans: A query evaluation plan consists of an extended relational algebra tree, with additional annotations at each node indicating the access methods to use for each table and the implementation method to use for each relational operator.
Ex:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

The above query can be expressed in relational algebra as follows:

π_sname(σ_bid=100 ∧ rating>5(Reserves ⨝_sid=sid Sailors))
This expression is shown in the form of a tree in Fig(2)
        π_sname
           |
  σ_bid=100 ∧ rating>5
           |
       ⨝_sid=sid
       /        \
  Reserves    Sailors

Fig(2) Query Expressed as a Relational Algebra Tree
The algebra expression partially specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.
To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply the selections and projections to each tuple in the result of the join as it is produced; the result of the join, before the selections and projections, is never stored in its entirety. This query evaluation plan is shown in Fig(3).
  π_sname                  (on-the-fly)
      |
  σ_bid=100 ∧ rating>5     (on-the-fly)
      |
  ⨝_sid=sid                (simple nested loops)
    /         \
  Reserves    Sailors
  (file scan) (file scan)

Fig(3) Query Evaluation Plan for Sample Query
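The pipelining described above can be sketched with Python generators; the tiny in-memory tables and their tuple layouts are made-up stand-ins for the file scans:

```python
# Hypothetical instances: Reserves(sid, bid), Sailors(sid, sname, rating).
reserves = [(22, 100), (22, 101), (31, 100), (58, 103)]
sailors = [(22, "dustin", 7), (31, "lubber", 8), (58, "rusty", 10)]

def plan():
    # Page-oriented simple nested loops join with Reserves as the outer
    # table; the selection and projection are applied on the fly, so the
    # join result is never stored in its entirety.
    for r_sid, r_bid in reserves:              # "file scan" of Reserves
        for s_sid, sname, rating in sailors:   # "file scan" of Sailors
            if r_sid == s_sid and r_bid == 100 and rating > 5:
                yield sname

print(list(plan()))  # ['dustin', 'lubber']
```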
Alternative Plans: Motivating Example

SELECT S.sname FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

• Cost of the plan in fig(3): 500 + 500*1000 I/Os
• It is by no means the worst plan! It misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: to find more efficient plans that compute the same answer. The RA tree is as shown in fig(2), and the plan is as shown in fig(3).

Alternative Plan 1 (No Indexes), as shown in fig(4). Main difference: push selects. With 5 buffers, the cost of the plan is:
1. Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution)
2. Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings)
3. Sort T1 (2*2*10), sort T2 (2*3*250), merge (10 + 250)
4. Total: 3560 page I/Os
• If we used a BNL join: join cost = 10 + 4*250, so total cost = 2770
• If we `push' projections: T1 has only sid, and T2 only sid and sname; T1 then fits in 3 pages, the cost of BNL drops to under 250 pages, and the total is < 2000
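The page counts for plan 1 can be tallied directly; this sketch just reproduces the arithmetic from the bullets above:

```python
scan_reserves, temp_t1 = 1000, 10   # 100 boats, uniform: 1000/100 pages
scan_sailors, temp_t2 = 500, 250    # 10 ratings: half of Sailors qualifies
sort_t1 = 2 * 2 * 10                # 2 passes over 10 pages, read + write
sort_t2 = 2 * 3 * 250               # 3 passes over 250 pages, read + write
merge = temp_t1 + temp_t2           # one pass over both sorted temps
total = (scan_reserves + temp_t1 + scan_sailors + temp_t2
         + sort_t1 + sort_t2 + merge)
print(total)  # 3560 page I/Os
```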
  π_sname                          (on-the-fly)
      |
  ⨝_sid=sid                        (sort-merge join)
    /                    \
  σ_bid=100              σ_rating>5
  (scan, write to        (scan, write to
   temp T1)               temp T2)
    |                      |
  Reserves               Sailors
  (file scan)            (file scan)
Fig(4) A Second Query Evaluation Plan

Alternative Plan 2 (With Indexes), as shown in fig(5):
• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples, on 1000/100 = 10 pages
• INL with pipelining (the outer is not materialized); projecting out unnecessary fields from the outer doesn't help
• The join column sid is a key for Sailors: there is at most one matching tuple, so an unclustered index on sid is OK
• The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors
• Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000 * 1.2); total 1210 I/Os
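The 1210 figure is just the selection cost plus one index probe per matching tuple (assuming, as the bullets do, roughly 1.2 I/Os per unclustered hash-index probe):

```python
selection_ios = 10       # clustered index on bid: 10 pages of Reserves
matching_tuples = 1000   # 100,000 reservations / 100 boats
probe_cost = 1.2         # average I/Os per hash-index lookup on sid
total = selection_ios + matching_tuples * probe_cost
print(total)  # 1210.0 I/Os
```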
  π_sname          (on-the-fly)
      |
  σ_rating>5       (on-the-fly)
      |
  ⨝_sid=sid        (index nested loops, with pipelining)
    /         \
  σ_bid=100   Sailors
  (use hash   (hash index on sid)
   index; do
   not write
   result to
   temp)
    |
  Reserves

Fig(5) A Query Evaluation Plan Using Indexes
Highlights of the System R Optimizer
• Impact: the most widely used approach currently; works well for < 10 joins
• Cost estimation: approximate art at best
 - Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes
 - Considers a combination of CPU and I/O costs
• Plan space: too large, must be pruned
 - Only the space of left-deep plans is considered; left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation
 - Cartesian products are avoided

Cost Estimation
For each plan considered, we must estimate its cost:
• Must estimate the cost of each operation in the plan tree; this depends on input cardinalities
• We have already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.)
• Must also estimate the size of the result for each operation in the tree, using information about the input relations
• For selections and joins, assume independence of predicates
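The independence assumption in the last bullet amounts to multiplying per-predicate reduction factors; the statistics below are made-up numbers for illustration:

```python
def estimate_result_size(input_cardinality, reduction_factors):
    # Estimated output cardinality = input size times the product of the
    # reduction factors, assuming the predicates are independent.
    size = float(input_cardinality)
    for rf in reduction_factors:
        size *= rf
    return size

# 40,000 Sailors tuples; rating > 5 keeps ~1/2, age = 20 keeps ~1/40.
print(estimate_result_size(40000, [0.5, 1 / 40]))  # 500.0
```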
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• There are two parts to optimizing a query:
 1. Consider a set of alternative plans; the search space must be pruned, typically to left-deep plans only.
 2. Estimate the cost of each plan that is considered; this requires estimating the size of the result and the cost for each plan node.
• Key issues: statistics, indexes, operator implementations.
EXTERNAL SORTING
Why Sort?
• A classic problem in computer science!
• Data requested in sorted order, e.g., find students in increasing gpa order
• Sorting is the first step in bulk loading a B+ tree index
• Sorting is useful for eliminating duplicate copies in a collection of records (Why?)
• The sort-merge join algorithm involves sorting
• Problem: sort 1GB of data with 1MB of RAM; why not virtual memory?
When does a DBMS sort data?

Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order, for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly larger datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps. Here, we refer to each such subfile as a run.

Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
  // Given a file on disk, sorts it using three buffer pages.
  // Pass 0: produce runs that are one page long.
  Read each page into memory, sort it, and write it out.
  // Merge pairs of runs to produce longer runs until only one run
  // (containing all records of the input file) is left.
  // Passes i = 1, 2, ...
  While the number of runs at the end of the previous pass is > 1:
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass).
      Read each run into an input buffer, one page at a time.
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time.
endproc

Fig(1) Two-Way Merge Sort
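A minimal in-memory simulation of Fig(1), with each "page" modeled as a Python list of records and each run as a list of pages (a pedagogical sketch, like the algorithm itself):

```python
import heapq

PAGE_SIZE = 2  # records per page; an assumption for this simulation

def paginate(records):
    return [records[i:i + PAGE_SIZE] for i in range(0, len(records), PAGE_SIZE)]

def two_way_extsort(pages):
    # Pass 0: read each page, sort it, write it out -> one-page runs.
    runs = [[sorted(page)] for page in pages]
    # Passes i = 1, 2, ...: merge pairs of runs into runs twice as long.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 == len(runs):          # odd run out: carry it over
                next_runs.append(runs[i])
                continue
            left = [r for page in runs[i] for r in page]
            right = [r for page in runs[i + 1] for r in page]
            next_runs.append(paginate(list(heapq.merge(left, right))))
        runs = next_runs
    return runs[0]

# An input file like the seven-page example in fig(2).
print(two_way_extsort([[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]))
# [[1, 2], [2, 3], [3, 4], [4, 5], [6, 6], [7, 8], [9]]
```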
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each,
Pass 1 produces 2^(k−1) sorted runs of two pages each,
Pass 2 produces 2^(k−2) sorted runs of four pages each,
and so on, until
Pass k produces one sorted run of 2^k pages.
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
The algebra expression partially specifies how to evaluate the query ndash we first compute the natural join of Reserves and Sailors then perform the selections and finally project the sname field
To obtain a fully specified evaluation plan we must decide on an implementation for each of the algebra operations involved For ex we can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections and projections to each tuple in the result of the join as it is produced the result of the join before the selections and projections is never stored in its entirety This query evaluation plan is shown in Fig(3)
Sname (on-the-fly)
bid=100 ^ ratinggt5 (on-the-fly)
sid=sid (simple nested loops)
(File scan)Reserves Sailors(File scan)
Fig(3) Query Evaluation Plan for Sample Query
ContdContd
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
Alternative PlansMotivating Example SELECT Ssname FROM Reserves R Sailors S WHERE Rsid=Ssid AND
Rbid=100 AND Sratinggt5
bull Cost 500+5001000 IOsbull By no means the worst plan bull Misses several opportunities selections could have been `pushedrsquo
earlier no use is made of any available indexes etcbull Goal of optimization To find more efficient plans that compute the
same answerRA Tree is as shown in fig(2) and plan is as shown in fig(3)
Alternative Plans 1 (No Indexes) as shown in fig(4)bull Main difference push selects
1 With 5 buffers cost of plan2 Scan Reserves (1000) + write temp T1 (10 pages if we have 100
boats uniform distribution)3 Scan Sailors (500) + write temp T2 (250 pages if we have 10
ratings)4 Sort T1 (2210) sort T2 (23250) merge (10+250)5 Total 3560 page IOs
bull If we used BNL join join cost = 10+4250 total cost = 2770bull If we `pushrsquo projections T1 has only sid T2 only sid and sname
1 T1 fits in 3 pages cost of BNL drops to under 250 pages total lt 2000
Sname (on-the-fly)
sid=sid (Sort-merge join)
(scan write to temp T1)(scan write to temp T1) bid=100 ratinggt5 (scan write to (scan write to temp T1)temp T1)
(File scan)Reserves Sailors(File scan)
Fig(4) A second query evaluation planAlternative Plans 2 With Indexes as shown in fig(5)bull With clustered index on bid of Reserves we get 100000100 = 1000 tuples on 1000100 = 10 pagesbullINL with pipelining (outer is not materialized) - Projecting out unnecessary fields from outer doesnrsquot helpbullJoin column sid is a key for Sailors - At most one matching tuple unclustered index on sid OKbullDecision not to push ratinggt5 before the join is based on availability of sid index on Sailorsbull Cost Selection of Reserves tuples (10 IOs) for each must get matching Sailors tuple (100012) total 1210 IOs
ContdContd
Sname (on-the-fly)
ratinggt5 (on-the-fly)
sid=sid (Index nested loops with
pipelining)
(use hash index do not write result to temp)(use hash index do not write result to temp) bid=100 Sailors(Hash index on sid)
(Hash index on sid)Reserves
Fig(6) A query evaluation plan using Indexes
ContdContd
Highlights of System R OptimizerbullImpact -Most widely used currently works well for lt 10 joinsbullCost estimation Approximate art at best -Statistics maintained in system catalogs used to estimate cost of operations and result sizes -Considers combination of CPU and IO costsbullPlan Space Too large must be pruned -Only the space of left-deep plans is considered -gt Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation -Cartesian products avoidedCost EstimationFor each plan considered must estimate cost Must estimate cost of each operation in plan treebull Depends on input cardinalitiesbull Wersquove already discussed how to estimate the cost of operations (sequential scan index scan joins etc)bull Must also estimate size of result for each operation in treebull Use information about the input relationsbull For selections and joins assume independence of predicates
ContdContd
Summary
bullThere are several alternative evaluation algorithms for each relational operator
bullA query is evaluated by converting it to a tree of operators and evaluating the operators in the tree
bullMust understand query optimization in order to fully understand the performance impact of a given database design (relations indexes) on a workload (set of queries)bullTwo parts to optimizing a query
Consider a set of alternative plans - Must prune search space typically left-deep plans onlyMust estimate cost of each plan that is considered - Must estimate size of result and cost for each plan node
- Key issues Statistics indexes operator implementations
ContdContd
EXTERNAL SORTING
Why SortbullA classic problem in computer sciencebullData requested in sorted order - eg find students in increasing gpa orderbull Sorting is first step in bulk loading B+ tree indexbullSorting useful for eliminating duplicate copies in a collection of records (Why)bullSort-merge join algorithm involves sortingbullProblem sort 1Gb of data with 1Mb of RAM - why not virtual memory
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1. In Pass 0, read in B pages at a time and sort them internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in fig(4), using the input from fig(2) and a buffer pool with four pages.
2. In Passes i = 1, 2, …, use B − 1 buffer pages for input and the remaining page for output; that is, do a (B − 1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in fig(5).
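Both modifications can be sketched together. This is again an in-memory simulation under assumed parameters: records live in a plain list, `page_size` is arbitrary, and `heapq.merge` stands in for the page-at-a-time (B − 1)-way merge.

```python
import heapq

def external_merge_sort(records, B, page_size=4):
    """Simulate external merge sort with B buffer pages (B >= 3).

    Pass 0: sort B pages' worth of records (B * page_size) at a time,
    producing ceil(N/B) initial runs.
    Passes i > 0: merge B-1 runs at a time into one output run.
    """
    assert B >= 3, "need at least two input buffers and one output buffer"
    chunk = B * page_size
    # Pass 0: sorted runs of B pages each (the last run may be shorter).
    runs = [sorted(records[i:i + chunk])
            for i in range(0, len(records), chunk)]
    # Merging passes: (B-1)-way merge per group of runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
    return runs[0] if runs else []
```

With B = 3 and one record per page, this degenerates to the two-way merge sort above; larger B cuts the number of merging passes, which is the whole point of the generalization.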
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
Fig(5) External Merge Sort with B buffer pages, Pass i > 0 (buffers Input 1 through Input B−1 feed a (B−1)-way merge into a single Output buffer, between disk and the B main-memory buffers)
The first refinement reduces the number of runs produced by Pass 0 to ⌈N/B⌉, and the second refinement reduces the number of passes by merging B − 1 runs at a time instead of two.
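Combining the two refinements, the total number of passes is 1 (for Pass 0) plus however many (B − 1)-way merging passes it takes to reduce ⌈N/B⌉ runs to one, with each pass costing 2N I/Os. A quick calculator, using exact integer arithmetic rather than a floating-point logarithm (the example parameters are illustrative only):

```python
import math

def merge_sort_passes(N, B):
    """Number of passes to sort an N-page file with B buffer pages:
    Pass 0 leaves ceil(N/B) runs; each later pass merges B-1 at a time."""
    runs = math.ceil(N / B)
    passes = 1                          # Pass 0
    while runs > 1:
        runs = math.ceil(runs / (B - 1))
        passes += 1
    return passes

# e.g. a 100,000,000-page file with 1,000 buffer pages sorts in
# 3 passes, for a total cost of 2 * N * 3 I/Os.
```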
Summary
• External sorting is important: a DBMS may dedicate part of the buffer pool to sorting.
• External merge sort minimizes disk I/O cost:
  - Pass 0 produces sorted runs of size B (the number of buffer pages).
  - Later passes merge runs; the number of runs merged at a time depends on B and on the block size.
  - A larger block size means less I/O cost per page, but it also means fewer runs can be merged at a time.
  - In practice, the number of passes is rarely more than 2 or 3.
π_sname (on-the-fly)
  ⋈ sid=sid (sort-merge join)
    σ_bid=100 (scan; write to temp T1)      σ_rating>5 (scan; write to temp T2)
      Reserves (file scan)                  Sailors (file scan)
Fig(4) A second query evaluation plan

Alternative Plans 2: With Indexes (shown in fig(6))
• With a clustered index on bid of Reserves, we get 100000/100 = 1000 tuples, on 1000/100 = 10 pages.
• Index nested loops with pipelining (the outer is not materialized).
  - Projecting out unnecessary fields from the outer doesn't help.
• The join column sid is a key for Sailors.
  - At most one matching tuple, so an unclustered index on sid is OK.
• The decision not to push rating>5 ahead of the join is based on the availability of the sid index on Sailors.
• Cost: selection of Reserves tuples (10 I/Os); for each, we must fetch the matching Sailors tuple (1000 × 1.2); total 1210 I/Os.
π_sname (on-the-fly)
  σ_rating>5 (on-the-fly)
    ⋈ sid=sid (index nested loops, with pipelining)
      σ_bid=100 (use hash index; do not write result to temp)      Sailors (hash index on sid)
        Reserves
Fig(6) A query evaluation plan using Indexes
Highlights of the System R Optimizer
• Impact: the most widely used approach currently; works well for fewer than 10 joins.
• Cost estimation: an approximate art at best.
  - Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes.
  - Considers a combination of CPU and I/O costs.
• Plan space: too large, so it must be pruned.
  - Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
  - Cartesian products are avoided.

Cost Estimation
For each plan considered, we must estimate its cost:
• Estimate the cost of each operation in the plan tree; this depends on input cardinalities. (We have already discussed how to estimate the cost of operations such as sequential scan, index scan, and joins.)
• Also estimate the size of the result for each operation in the tree, using information about the input relations. For selections and joins, assume independence of predicates.
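Under the independence assumption, the reduction factor of a conjunction of predicates is the product of the predicates' individual factors. A hypothetical sketch in the System R style, where 1/NKeys(I) is the usual estimate for an equality predicate and (High − value)/(High − Low) for a range predicate; all statistics in the example are made up:

```python
def estimate_result_size(input_card, reduction_factors):
    """Estimated result cardinality: input cardinality times the
    product of the reduction factors of all predicates, which are
    assumed to be independent."""
    est = input_card
    for rf in reduction_factors:
        est *= rf
    return max(1, round(est))

# sigma_{bid=100 AND rating>5} over Reserves x Sailors, with
# hypothetical stats: NKeys(bid) = 100, rating in [1, 10]:
rf_bid = 1 / 100                    # equality: 1/NKeys
rf_rating = (10 - 5) / (10 - 1)     # range: (High - value)/(High - Low)
est = estimate_result_size(100_000 * 40_000, [rf_bid, rf_rating])
```

The same machinery applies per node of the plan tree: each operator's output estimate becomes the input cardinality of its parent.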
Summary
• There are several alternative evaluation algorithms for each relational operator.
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
• There are two parts to optimizing a query:
  - Consider a set of alternative plans; the search space must be pruned (typically to left-deep plans only).
  - Estimate the cost of each plan that is considered; this requires estimating the size of the result and the cost for each plan node.
  - Key issues: statistics, indexes, operator implementations.
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
EXTERNAL SORTING
Why Sort?
• A classic problem in computer science.
• Data requested in sorted order, e.g., find students in increasing gpa order.
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records. (Why?)
• The sort-merge join algorithm involves sorting.
• Problem: sort 1GB of data with 1MB of RAM. Why not virtual memory?
When does a DBMS sort data?
Sorting a collection of records on some search key is a very useful operation. The key can be a single attribute or an ordered list of attributes. Sorting is required in a variety of situations, including the following important ones:
• Users may want answers in some order, for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
Although main memory sizes are growing rapidly, the ubiquity of database systems has led to increasingly large datasets as well. When the data to be sorted is too large to fit into available main memory, we need an external sorting algorithm. Such algorithms seek to minimize the cost of disk accesses.
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting. This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes. When sorting a file, several sorted subfiles are typically generated in intermediate steps; here, we refer to each such subfile as a run.
Even if the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them, using a minimal amount of main memory at any given time. In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out. Quicksort or any other in-memory sorting technique can be used to sort the records on a page. In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long. This algorithm is shown in Fig(1).
proc 2-way_extsort(file)
  // Given a file on disk, sorts it using three buffer pages.
  // Produce runs that are one page long (Pass 0):
  Read each page into memory, sort it, write it out.
  // Merge pairs of runs to produce longer runs until only
  // one run (containing all records of the input file) is left:
  While the number of runs at the end of the previous pass is > 1:
    // Pass i = 1, 2, ...
    While there are runs to be merged from the previous pass:
      Choose the next two runs (from the previous pass);
      Read each run into an input buffer, a page at a time;
      Merge the runs and write to the output buffer;
      force the output buffer to disk one page at a time;
endproc
Fig(1): Two-Way Merge Sort
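The pseudocode of Fig(1) can be simulated in a few lines of Python. This is a minimal sketch for illustration only: "disk pages" are modeled as Python lists, and the function names (`two_way_extsort`, `merge_runs`) are our own, not part of the text. For brevity, `merge_runs` flattens its two input runs before merging; a real implementation would read one page of each run into its input buffer at a time.

```python
def two_way_extsort(pages):
    """Two-way external merge sort of Fig(1), simulated in memory.
    `pages` is a list of 'disk pages', each a list of records."""
    page_size = max(len(p) for p in pages)
    # Pass 0: read each page in, sort it, write it out (one-page runs).
    runs = [[sorted(p)] for p in pages]
    # Passes 1, 2, ...: merge pairs of runs until one run is left.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 == len(runs):          # odd run out: carry it over
                next_runs.append(runs[i])
            else:
                next_runs.append(merge_runs(runs[i], runs[i + 1], page_size))
        runs = next_runs
    return runs[0]                          # a single sorted run

def merge_runs(run_a, run_b, page_size):
    """Merge two sorted runs into one; the output buffer is 'forced to
    disk' (appended to the result) whenever it fills a page."""
    a = [r for page in run_a for r in page]
    b = [r for page in run_b for r in page]
    out_run, out_buf = [], []
    ia = ib = 0
    while ia < len(a) or ib < len(b):
        if ib == len(b) or (ia < len(a) and a[ia] <= b[ib]):
            out_buf.append(a[ia]); ia += 1
        else:
            out_buf.append(b[ib]); ib += 1
        if len(out_buf) == page_size:       # output buffer full: write page
            out_run.append(out_buf); out_buf = []
    if out_buf:
        out_run.append(out_buf)
    return out_run
```

For example, sorting the seven two-record pages of fig(2) (`[[3,4],[6,2],[9,4],[8,7],[5,6],[3,1],[2]]`) takes the same four passes described in the text.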
If the number of pages in the input file is 2^k for some k, then:
Pass 0 produces 2^k sorted runs of one page each,
Pass 1 produces 2^(k-1) sorted runs of two pages each,
Pass 2 produces 2^(k-2) sorted runs of four pages each,
and so on, until
Pass k produces one sorted run of 2^k pages.
In each pass, we read every page in the file, process it, and write it out; therefore we have two disk I/Os per page, per pass. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file. The overall cost is 2N(⌈log2 N⌉ + 1) I/Os.
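The cost formula is easy to check numerically. A small sketch (the function name is ours):

```python
import math

def two_way_sort_cost(num_pages):
    """I/O cost of two-way merge sort: 2N(ceil(log2 N) + 1).
    Each pass reads and writes every page (2 I/Os per page), and
    there are ceil(log2 N) + 1 passes (Pass 0 plus the merge passes)."""
    passes = math.ceil(math.log2(num_pages)) + 1
    return 2 * num_pages * passes
```

Here `two_way_sort_cost(7)` gives 56 and `two_way_sort_cost(8)` gives 64, matching the seven- and eight-page examples discussed with fig(2) below.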
The algorithm is illustrated in fig(2) on an example input file containing seven pages.
Fig(2): Two-Way Merge Sort of a seven-page file. (The figure shows the input file; the seven 1-page runs after Pass 0; the 2-page runs after Pass 1; the 4-page runs after Pass 2; and the single sorted 8-page run after Pass 3.)
The sort takes four passes, and in each pass we read and write seven pages, for a total of 56 I/Os. This result agrees with the preceding analysis, because 2·7·(⌈log2 7⌉ + 1) = 56. The dark pages in the figure illustrate what would happen on a file of eight pages: the number of passes remains at four (⌈log2 8⌉ + 1 = 4), but we read and write an additional page in each pass, for a total of 64 I/Os. The algorithm requires just three buffer pages in main memory, as fig(3) illustrates. This observation raises an important point: even if we have more buffer space available, this simple algorithm does not utilize it effectively.
Fig(3): Two-Way Merge Sort with three buffer pages. (The figure shows two input buffer pages and one output buffer page in main memory, with runs read from and written back to disk.)
EXTERNAL MERGE SORT
Suppose that B buffer pages are available in memory and that we need to sort a large file with N pages. The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes. There are two important modifications to the two-way merge sort algorithm:
1. In Pass 0, read in B pages at a time and sort internally to produce ⌈N/B⌉ runs of B pages each (except for the last run, which may contain fewer pages). This modification is illustrated in fig(4), using the input from fig(2) and a buffer pool with four pages.
2. In passes i = 1, 2, ..., use B−1 buffer pages for input and use the remaining page for output; hence you do a (B−1)-way merge in each pass. The utilization of buffer pages in the merging passes is illustrated in fig(5).
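The two modifications can be sketched in Python. As before, this is an in-memory simulation for illustration, with pages modeled as lists and our own function names; the merge step uses `heapq.merge` and, for brevity, consumes whole runs, whereas a real DBMS streams one page of each run through an input buffer at a time.

```python
import heapq

def external_merge_sort(pages, B):
    """(B-1)-way external merge sort, simulated in memory.
    Pass 0: read B pages at a time, sort internally -> ceil(N/B) runs.
    Passes i > 0: merge up to B-1 runs at a time into one output run."""
    assert B >= 3, "need at least three buffer pages"
    page_size = max(len(p) for p in pages)
    # Pass 0: B-page sorted runs (the last run may be shorter).
    runs = []
    for i in range(0, len(pages), B):
        batch = sorted(r for page in pages[i:i + B] for r in page)
        runs.append(_paginate(batch, page_size))
    # Merge passes: B-1 input buffers, one output buffer.
    while len(runs) > 1:
        runs = [_merge(group, page_size)
                for group in (runs[i:i + (B - 1)]
                              for i in range(0, len(runs), B - 1))]
    return runs[0]

def _paginate(records, page_size):
    """Split a flat record list back into fixed-size 'pages'."""
    return [records[i:i + page_size]
            for i in range(0, len(records), page_size)]

def _merge(group, page_size):
    # heapq.merge performs the (B-1)-way merge of the sorted runs.
    merged = list(heapq.merge(*[[r for page in run for r in page]
                                for run in group]))
    return _paginate(merged, page_size)
```

On the seven-page input of fig(2) with B = 4, Pass 0 produces two runs (as in fig(4)) and a single merge pass completes the sort, versus four passes for the two-way algorithm.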
Fig(4): External Merge Sort with B buffer pages, Pass 0. (The figure shows a buffer pool with B = 4 pages: the seven-page input file of fig(2) is read four pages at a time, each batch is sorted internally, and the result is written out as the 1st and 2nd output runs.)
Fig(5): External Merge Sort with B buffer pages, Pass i > 0. (The figure shows B−1 input buffer pages, one per run being merged, and a single output buffer page, with runs read from and written back to disk.)
The first refinement reduces the number of runs produced by Pass 0 from N to ⌈N/B⌉; the second lets each subsequent pass merge B−1 runs at a time rather than just two, so far fewer passes are needed.
Summary
• External sorting is important: a DBMS may dedicate part of its buffer pool to sorting.
• External merge sort minimizes disk I/O cost:
  - Pass 0 produces sorted runs of size B (buffer pages).
  - Later passes merge runs; the number of runs merged at a time depends on B and on the block size.
• A larger block size means less I/O cost per page, but a smaller number of runs merged at a time.
• In practice, the number of passes is rarely more than 2 or 3.
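The claim that the number of passes is rarely more than 2 or 3 is easy to illustrate. A sketch (the buffer and file sizes below are made-up examples, and the function name is ours):

```python
import math

def num_passes(N, B):
    """Passes for (B-1)-way external merge sort of an N-page file:
    Pass 0 produces ceil(N/B) runs, and each later pass divides the
    number of runs by B-1 (rounding up) until one run remains."""
    runs = math.ceil(N / B)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (B - 1))
        passes += 1
    return passes
```

For instance, with B = 1000 buffer pages, even a million-page file sorts in 3 passes (`num_passes(1_000_000, 1000)` returns 3), and with B = 100 it still takes only 4.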
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
When does a DBMS sort dataSorting a collection of records on some search key is a very useful operation The key can be a single attribute or an ordered list of attributes Sorting is required in a variety of situations including the following important ones
bullUsers may want answers in some order for example by increasing age
bullSorting records is the first step in bulk loading a tree index
bullSorting is useful for eliminating duplicate copies in a collection of records
bullA widely used algorithm for performing a very important relational algebra operation called join requires a sorting step
Although main memory sizes are growing rapidly the ubiquity of database systems has lead to increasingly larger datasets as well When the data to be sorted is too large to fit into available main memory we need an external sorting algorithm Such algorithm seek to minimize the cost of disk accesses
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
A SIMPLE TWO-WAY MERGE SORT
We begin by presenting a simple algorithm to illustrate the idea behind external sorting This algorithm utilizes only three pages of main memory and it is presented only for pedagogical purposes When sorting a file several sorted subfiles are typically generated in intermediate steps Here we refer to each subfile as a run
Even if the entire file does not fit into the available main memory we can sort it by breaking it into smaller subfiles sorting these subfiles and then merging them using a minimal amount of main memory at any given time In the first pass the pages in the file are read in one at a time After a page is read in the records on it are sorted and the sorted page is written out Quicksort or any other in ndashmemory sorting technique can be used to sort the records on a page In subsequent passes pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long This algorithm is shown in Fig(1)
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
proc 2-way_extsort (file)
Given a file on disk sorts it using three buffer pages
Produce runs that are one page long Pass 0
Read each page into memory sort it Write it out
Merge pairs of runs to produce longer runs until only
one run ( containing all records of input file) is left
While the number of runs at end of previous pass is gt 1
Pass i=12hellip
While there are runs to be merged from previous pass
Choose next two runs (from previous pass)
Read each run into an input buffer page at a time
Merge the runs and write to the output buffer
force output buffer to disk one page at a time
endproc
Fig(1) Two-Way Merge Sort
ContdContd
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
ContdContd
If the number of pages in the input file is 2k for some k then
Pass 0 produces 2k sorted runs of one page
each
Pass 1 produces 2k-1 sorted runs of two pages each
Pass 2 produces 2k-2 sorted runs of four pages each
And so on until
Pass K produces one sorted run of 2k pages
In each pass we read every page in the file process it and write it out Therefore we have two disk IOs per page per pass The number of passes is [log2N]+1 where N is the number of pages in the file The overall cost is 2N([log2N]+1) IOs
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
The algorithm is illustrated on an example input file containing Two-way Merge Sort of a seven pages in fig(2)
34 62 94 87 56 31 2
34 26 49 78 56 13 2
23 47 13
46 89 56 2
23
44
67
89
12
35
6
12
23
34
45
66
78
9
Input file
Pass 0
1-page runsPass 1
2-page runs
Pass 2
4-page runs
Pass 3
8-page runs
ContdContd
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4) External Merge Sort with B buffer pages Pass 0
34
62
94 89
87
89
56
31
2
8912
35
6
34
23
44
78
671st output run
ContdContd
Input file
2nd output runBuffer pool with B=4 pages
ContdContd
Input 1
Disk Disk
B Main memory buffers
Fig(5) External Merge Sort with B Buffer Pages Pass i gt0
Input 2
Input B-1
Output
The first refinement reduces the
Summary1048576 External sorting is important DBMS may dedicate part of buffer pool for sorting1048576 External merge sort minimizes disk IO cost Pass 0 Produces sorted runs of size B ( buffer pages) Later passes merge runs of runs merged at a time depends on B and block size Larger block size means less IO cost per page Larger block size means smaller runs merged In practice of runs rarely more than 2 or 3
- PowerPoint Presentation
- The system catalog
- Information in the catalog
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
-
The sort takes four passes and in each pass we read and write seven pages for a total of 56 IOs This result agrees with the preceding analysis because 27([log27]+1)=56 The dark pages in the figure illustrate what would happen on a file of eight pages the number of passes remains at four([log28]+1=4) but we read and write an additional page in each pass for a total of 64 IOs The algorithm requires just three buffer pages in main memory as shown in fig(3) illustrates This observation raises an important point Even if we have more buffer space available this simple algorithm does not utilize it effectively
ContdContd
Input 1
Input 2
output
Disk DiskMain memory buffers
Fig(3) Two-Way Merge Sort with Three buffer pages
Suppose that B buffer pages are available in memory and that we need to sort large file with N pages The intuition behind the generalized algorithm that we now present is to retain the basic structure of making multiple passes while trying to minimize the number of passes There are two important modifications to the two-way merge sort algorithm
1 In pass 0 read in B pages at a time and sort internally to produce [NB] runs of B pages each (except for the last run which may contain fewer pages) This modification is illustrated in fig(4) using the input from fig(2) and a buffer pool with four pages
2 In passes i=12hellip use B-1 buffer pages for input and use the remaining page for output hence you do a (B-1)-way merge in each pass The utilization of buffer pages in the merging passes is illustrated in fig(5)
EXTERNAL MERGE SORT
Fig(4): External Merge Sort with B buffer pages, Pass 0 (the input file is read B = 4 pages at a time; each group is sorted in memory and written out as one output run)
Fig(5): External Merge Sort with B buffer pages, Pass i > 0 (B−1 input buffers and one output buffer between disk and main memory)
The first refinement reduces the number of runs produced by Pass 0 from N to ⌈N/B⌉, and the second refinement merges B−1 runs at a time in each subsequent pass; together they reduce the number of passes to 1 + ⌈log_{B−1} ⌈N/B⌉⌉.
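The effect of the two refinements on the pass count can be computed directly: Pass 0 leaves ⌈N/B⌉ runs, and each merging pass divides the run count by B−1. A small sketch (the helper names are my own, not from the text):

```python
import math

def num_passes(N, B):
    """Passes for B-buffer external merge sort on an N-page file."""
    runs = math.ceil(N / B)           # runs remaining after Pass 0
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (B - 1))  # each pass does a (B-1)-way merge
        passes += 1
    return passes

def total_io(N, B):
    # Every pass reads and writes all N pages once.
    return 2 * N * num_passes(N, B)

# 108 pages with 5 buffers: 22 runs after Pass 0, then 6, 2, 1 -> 4 passes.
print(num_passes(108, 5), total_io(108, 5))  # 4 864
```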