chapter 12: query processing - tarleton state university 12: query processing . ... database system...

27
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Chapter 12: Query Processing

Upload: dodang

Post on 13-Mar-2018

235 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

Database System Concepts, 6th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Chapter 12: Query Processing

Page 2: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.2 Database System Concepts - 6th Edition

Basic Steps in Query Processing

1. Parsing and translation

2. Optimization

3. Evaluation

Page 3: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.3 Database System Concepts - 6th Edition

Basic Steps in Query Processing

Parsing and translation

translate the query into its internal form. This is then

translated into relational algebra.

Parser checks syntax, verifies relations

Evaluation

The query-execution engine takes a query-evaluation plan,

executes that plan, and returns the answers to the query.

Page 4: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.4 Database System Concepts - 6th Edition

Basic Steps in Query Processing :

Optimization

A relational algebra expression may have many equivalent

expressions

E.g., salary75000(salary(instructor)) is equivalent to

salary(salary75000(instructor))

Each relational algebra operation can be evaluated using one of

several different algorithms

Correspondingly, a relational-algebra expression can be

evaluated in many ways.

Page 5: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.5 Database System Concepts - 6th Edition

Optimization

Example:

We can use an index on salary to find instructors with salary <

75000,

or we can perform a complete (serial) relation scan and discard

instructors with salary 75000

The sequence of annotated relational algebra expressions is called

an evaluation plan.

An annotated relational algebra

expression specifying the

detailed evaluation strategy is

called an evaluation primitive

Page 6: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.6 Database System Concepts - 6th Edition

Basic Steps: Optimization

The goal of Query Optimization: Amongst all equivalent

evaluation plans choose the one with the lowest cost.

How do we measure cost?

Number of tuples in each relation

Size of tuples

Number of disk access operations

CPU time

RAM space needed

Time needed for communications over a network (LAN, SAN)

Etc.

Page 7: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.7 Database System Concepts - 6th Edition

QUIZ: Query Processing

What are the 3 main steps in query processing?

What is an evaluation primitive?

What is a query evaluation plan?

Page 8: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.8 Database System Concepts - 6th Edition

QUIZ: Query Processing

What are the 3 main steps in query processing?

What is an evaluation primitive?

What is a query evaluation plan?

Page 9: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.9 Database System Concepts - 6th Edition

12.2 Measures of Query Cost

Time “cost” = total elapsed time for answering the query

Many factors contribute to time cost …

Typically disk access is the predominant time cost (also

relatively easy to estimate!).

Measured by taking into account

Number of seeks * average-seek-cost

Number of blocks read * average-block-read-cost

Number of blocks written * average-block-write-cost

Cost to write a block is greater than cost to read a block b/c data is

read back after being written to ensure that the write was free of

errors!

Page 10: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.10 Database System Concepts - 6th Edition

12.2 Measures of Query Cost

Cost to write a block is greater than cost to read a block b/c data is

read back after being written to ensure that the write was free of

errors!

Page 11: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.11 Database System Concepts - 6th Edition

Measures of Query Cost

Simplification: we just use the number of block transfers

from disk and the number of seeks as the cost measures

tT – time to transfer one block

tS – time for one seek

Cost for b block transfers plus S seeks

b * tT + S * tS

Simplification: we ignore CPU costs (time)

Real systems do take CPU cost into account

Simplification: we do not include cost to writing the output

to disk. Why? All execution plans we must choose from

end up with the same data …

4 ms 0.1 ms

Page 12: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.12 Database System Concepts - 6th Edition

QUIZ: Measures of Query Cost

Cost for b block transfers plus S seeks

b * tT + S * tS

If the size of a block is 4 KB, and the transfer rate is

200 MB/s, calculate the block transfer time.

≈ 4 ms ≈ 0.1 ms

Page 13: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.13 Database System Concepts - 6th Edition

QUIZ: Measures of Query Cost

Cost for b block transfers plus S seeks

b * tT + S * tS

If the size of a block is 4 KB, and the transfer rate is

200 MB/s, calculate the block transfer time.

4 KB / 200 MB/s = 4/(200*1024) s = 0.0000195 s =

= 0.0192 ms ≈ 0.02 ms

Note: Today’s best “enterprise level” HDDs have transfer rates around 200 Mb/s when

handling pairs of consecutive blocks (2x4 KB = 8 KB), see for example this test on a

Seagate drive.

≈ 4 ms ≈ 0.1 ms

Page 14: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.14 Database System Concepts - 6th Edition

QUIZ: Measures of Query Cost

Cost for b block transfers plus S seeks

b * tT + S * tS

Evaluation of a query requires 3 disk seeks, each

followed by transfer of 50 blocks.

What is the total time cost?

≈ 4 ms ≈ 0.1 ms

Page 15: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.15 Database System Concepts - 6th Edition

QUIZ: Measures of Query Cost

Cost for b block transfers plus S seeks

b * tT + S * tS

Evaluation of a query requires 3 disk seeks, each

followed by transfer of 50 blocks.

What is the total time cost?

3*50* 0.1 ms + 3*4 ms = 15 ms + 12 ms = 27 ms

3*50*0.02 ms + 3*4 ms = 3 ms + 12 ms = 15 ms

≈ 4 ms ≈ 0.1 ms

If using transfer time from prev. example

Page 16: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.16 Database System Concepts - 6th Edition

Perspective: Transfer Rates

Theoretical maximum for SATA: 600 MB/s (SATA III), but in

practice ~100-200 MB/s

SATA Express announced in 2014: 2 GB/s (theoretical)

PCI Express (PCIe) 3.0 bus: ~1 GB/s per lane, with typical 16x

configurations :

SSD connected through PCI have surpassed 1 GB/s

SATA = Serial ATA (Advanced Technology Attachment)

PCI = Periferal Component Interconnect

http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/

Page 17: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.17 Database System Concepts - 6th Edition

12.3 Selection Operation

Selection w/o an index is a file scan

Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.

Cost estimate: 1 seek + br block transfers

br denotes number of blocks containing records from relation r

If selection is on a key attribute, can stop on finding record

cost = 1 seek + (br /2) block transfers

Linear search can be applied regardless of

selection condition or

ordering of records in the file, or

availability of indices average

Page 18: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.18 Database System Concepts - 6th Edition

Extra-credit

Page 19: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.19 Database System Concepts - 6th Edition

QUIZ: Linear search

A table has 50,000 tuples, each 100 Bytes in length.

What is the average time cost using linear search?

State all your assumptions!

Page 20: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.20 Database System Concepts - 6th Edition

QUIZ: Linear search

A table has 50,000 tuples, each 100 Bytes in length.

What is the average time cost using linear search?

State all your assumptions!

The table has a total of 50,000*100 = 5 million B

5 million B / 4096 B/block = 1,220.7 blocks

1,220.7 / 2 = 610.3 → 611 blocks on average

611 * 0.1 ms + 4 ms = 65.1 ms

Page 21: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.21 Database System Concepts - 6th Edition

12.3 Selection Operation

Why not binary search?

Generally does not make sense since data is not stored consecutively …

except when there is an index available,

and binary search requires more seeks than index search

Page 22: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.22 Database System Concepts - 6th Edition

Selections Using Indices

Index scan – search algorithms that use an index

selection condition must be on search-key of index!

A2 (primary index, equality on key). Retrieve a single record

that satisfies the corresponding equality condition

Cost = (hi + 1) * (tT + tS)

A3 (primary index, equality on nonkey) Retrieve multiple

records (duplicates).

Records will be in consecutive blocks (Why?)

Let b = number of blocks containing matching records

Cost = hi * (tT + tS) + tS + tT * b

Page 23: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.23 Database System Concepts - 6th Edition

Extra-credit

Page 24: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.24 Database System Concepts - 6th Edition

QUIZ: Linear search

A table has 50,000 tuples, each 100 Bytes in length. The

index is a B+ tree with n = 100. In our table, there are no

more than 1000 duplicates for any nonkey value.

What is the average time cost using primary index, equality

on nonkey?

State all your assumptions!

Page 25: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.25 Database System Concepts - 6th Edition

Selections Using Indices

A4 (secondary index, equality on nonkey).

Retrieve a single record if the search-key is a candidate key

Cost = (hi + 1) * (tT + tS)

Retrieve multiple records if search-key is not a candidate key

each of n matching records may be on a different block

Cost = (hi + n) * (tT + tS)

– Can be very expensive!

Page 26: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.26 Database System Concepts - 6th Edition

Selections Involving Comparisons

Can implement selections of the form AV (r) or A V(r) by using

a linear file scan,

or by using indices in the following ways:

A5 (primary index, comparison). (Relation is sorted on A)

For A V(r) use index to find first tuple v and scan relation sequentially from there

For AV (r) just scan relation sequentially until first tuple > v; do not use index

A6 (secondary index, comparison).

For A V(r) use index to find first index entry v and scan index sequentially from there, to find pointers to records.

For AV (r) just scan leaf pages of index finding pointers to records, until first entry > v

In either case, retrieve records that are pointed to

– requires an I/O for each record

– Linear file scan may be cheaper

Page 27: Chapter 12: Query Processing - Tarleton State University 12: Query Processing . ... Database System Concepts - 6th Edition 12.11 ©Silberschatz, ... – time for one seek

©Silberschatz, Korth and Sudarshan 12.27 Database System Concepts - 6th Edition

This is the end of the material covered in

our class

We stop on p.545, before 12.3.3 Complex Selections

Review today in the lab!