08pdf physical optim

59
Advanced Databases 8 Physical data structures and query optimization

Upload: asdqweasdqweasd

Post on 06-Apr-2018

233 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 1/59

Advanced Databases

8 Physical data structuresand query optimization

Page 2: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 2/59

Physical data structures and query optimization82

Study of “inside” DB technology: why?

DBMSs provide “transparent” services:

So transparent that, so far, we could ignore many

implementation details! So far DBMSs have always been a “black box” 

So… why should we open the box?

Knowing how it works may help to use it better

Some services are provided separately

Page 3: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 3/59

Physical data structures and query optimization83

DataBase Management System — DBMS

A system (software product) capable of managing datacollections which are:

large ((much) larger than the central memoryavailable on the computers that run the software)

persistent (with a lifetime which is independent of single executions of the programs that access them)

s are n use y severa app ca ons a a me

guaranteeing reliability (i.e. tolerance to hardwareand software failures) and privacy (by disciplining andcontrolling all accesses).

Page 4: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 4/59

Physical data structures and query optimization84

Access and query manager

 

Buffer manager

Access methods manager

SQL

Query manager

SecondaryMemory

 

manager

Page 5: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 5/59

Physical data structures and query optimization85

Technology of DBMSs - topics

Query management ("optimization")

Physical data structures and access structures

Buffer and secondary memory management Reliability control

Concurrency control

Distributed architectures

Page 6: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 6/59

Physical data structures and query optimization86

Programs can only refer to data stored in main memory

Databases must be stored (mainly) in secondarymemory for two reasons:

size

persistence

Data stored in secondary memory can only be used if 

Main and Secondary memory (1)

first transferred to main memory (which explains the terms "main" and "secondary")

Page 7: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 7/59

Physical data structures and query optimization87

Main and Secondary memory (2)

Secondary memory devices are organized in blocks of (usually) fixed length (order of magnitude: a few KBs)

The only available operations for such devices arereading and writing one page, i.e. the byte streamcorresponding to a block;

For convenience and simplicity, we will use block and

Page 8: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 8/59

Physical data structures and query optimization88

Secondary memory access:

seek time (8-12ms) - head positioning

latency time (2-8ms) - disc rotation transfer time (~1ms) - data transfer 

as an average, hardly less than 10 ms overall

 

Main and Secondary memory (3)

e cos o an access o secon ary memory s or ers omagnitude higher than that to main memory

In "I/O bound" applications the cost exclusively dependson the number of accesses to secondary memory

Page 9: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 9/59

Physical data structures and query optimization89

New storage technologies:

SSD

Only transfer time

Different model, (surprisingly, apparently) similarperformance (for DB-like loads)

Continuous writes stress the erasing process (that is

Main and Secondary memory (4)

 

Other approaches

Main memory databases again, different cost models

Efficient when read operations significantly exceed writes

Page 10: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 10/59

Physical data structures and query optimization810

The File System (FS) is the component of the OperatingSystems which manages access to secondary memory

DBMSs make limited use of FS functionalities: to createand delete files and for reading and writing singleblocks or sequences of consecutive blocks.

The DBMS directly manages the file organization, both

DBMS and file system (1)

 

with respect to the internal structure of each block.

Page 11: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 11/59

Physical data structures and query optimization811

DBMS and file system (2)

The DBMS manages the blocks of allocated files as if they were a single large space in secondary memory.

It builds in such space the physical structures withwhich tables are implemented.

A file is typically dedicated to a single table, but….

It may happen that a file contains data belonging tomore an one a e an a e up es o one a eare split in more than one file.

Page 12: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 12/59

Physical data structures and query optimization812

Blocks and records

Blocks (the "physical" components of a file) and records(the "logical" components) generally have differentsize:

The size of a block depends on the file system

The size of a record depends on the needs of applications and is normally variable within a file

Page 13: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 13/59

Physical data structures and query optimization813

Block Factor

The number of records within a block

SR: Size of a record (assumed constant in the file forsimpicity: "fixed length record")

SB: Size of a block

if SB > SR, there may be many records in each block:

 

The rest of the space can be

used ("spanned" records (or "hung-up" records))

non used ("unspanned" records)

Page 14: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 14/59

Physical data structures and query optimization814

Physical access structures Used for the efficient storage and manipulation of data

within the DBMS

Encoded as access methods, that is, software modules

providing data access and manipulation primitives foreach physical access structure

Each DBMS has a distinctive and limited set of access

We will consider three types of data access structures: Sequential Hash-based Tree-based (or index-based)

Page 15: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 15/59

Physical data structures and query optimization815

Organization of tuples within pages Each access method has its own page organization

In the case of sequential and hash-based methods each pagehas:

An initial part (block header ) and a final part (block trailer )containing control information used by the file system

An initial part ( page header ) and a final part ( page trailer )containing control information about the access method

A page dictionary , which contains pointers to each item of use u e emen ary a a con a ne n e page

A useful part , which contains the data. In general, the pagedictionary and the useful data grow as opposing stacks

A checksum, to detect corrupted data

Tree structures have a different page organization

Page 16: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 16/59

Page 17: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 17/59

Physical data structures and query optimization817

Page manager primitives  Insertion and update of a tuple

may require a reorganization of the page (there isenough space to store the extra bytes) or even

usage of a new page (if there is not enough space)

Deletion of a tuple

often carried out by marking the tuple as ‘invalid’ 

  ccess o a e o a par cu ar up e

after identifying the tuple by means of its key or itsoffset, the field is identified according to the offsetand the length of the field itself 

Page 18: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 18/59

Physical data structures and query optimization818

Sequential structures Characterized by a sequential arrangement of tuples in the

secondary memory

Three cases: entry-sequenced, array, sequentially-ordered

In an entry-sequenced organization, the sequence of thetuples is dictated by their order of entry

In an array organization, the tuples (all of the same size)are arran ed as in an arra and their ositions de end on

the values of an index (or indexes) In a sequentially-ordered organization, the position of each

tuple in the sequence depends on the value of a key field ,that induces the ordering

Page 19: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 19/59

Physical data structures and query optimization819

“Entry-sequenced” sequential structure Optimal for carrying out sequential reading and writing

operations

Optimal for space occupancy, as it uses all the blocksavailable for files and all the space within the blocks

Non optimal with respect to

searching specific data units

  up a es a ncrease e s ze o a up e

Page 20: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 20/59

Physical data structures and query optimization820

“Array” sequential structure

Possible only when the tuples are of fixed length

Made of n adjacent blocks, each block with m slotsavailable to store m tuples

Each tuple has a numeric index i and is placed in thei -th position of the array

Page 21: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 21/59

Physical data structures and query optimization821

Each tuple has a position based on the value of the key field

Historically, such structures were used on sequential devices (tapes).This has fallen out of use, but for data streams and system logs

The main problems are insertions or updates which increase thephysical space - they require reordering techniques for the tuplesalready present:

Options to avoid global reorderings:

 

“Sequentially-ordered” sequential structure

 

Leaving a certain number of slots free at the time of first loading,followed by ‘local reordering’ operations

Integrating the sequentially ordered files with an overflow file, wherenew tuples are inserted into blocks linked to form an overflow chain

Page 22: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 22/59

Physical data structures and query optimization822

Hash-based access structures

Ensure an efficient associative access to data, based onthe value of a key field

A hash-based structure has B blocks (often adjacent)

A hash algorithm is applied to the key field and returnsa value between zero and B-1. This value is interpretedas the position of the block in the file, and used both

This is the most efficient technique for queries withequality predicates, but it is rather inefficient forqueries with interval predicates

Page 23: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 23/59

Physical data structures and query optimization823

Primitive interface: hash(fileId,Key):BlockId

The implementation consists of two parts.

folding, transforms the key values so that they become po-

sitive integer values, uniformly distributed over a large range hashing transforms the positive binary number into a number

between zero and B - 1

Optimal performance if the file is larger than necessary. Let:

Features of hash-based structures

T be the number of tuples expected for the file, F be the average number of tuples stored in each page;

then a good choice for B is T/(0.8 x F), using only 80% of theavailable space

Page 24: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 24/59

Physical data structures and query optimization824

Collisions occur when the same block number is associated to toomany tuples. They are critical when the maximum number of tuplesper block is exceeded

Collisions are solved by adding an overflow chain

This gives the additional cost of scanning the chain The average length of the overflow chain is a function of the ratio

T/(F x B) and of the average number F of tuples per page:

1 2 3 5 10 F 

Collisions

.5 0.5 0.177 0.087 0.031 0.005

.6 0.75 0.293 0.158 0.066 0.015

.7 1.167 0.494 0.286 0.136 0.042

.8 2.0 0.903 0.554 0.289 0.110

.9 4.495 2.146 1.377 0.777 0.345

T /(F xB)

Page 25: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 25/59

Physical data structures and query optimization825

An example

40 records

hash table with 50

positions: 1 collision of 4

values

2 collisions of 310201260

10205610

9201159

5200205

5116455566005

4200604

2116202

2200902

2205802

1205751

166301060600

M mod 50M

41115541

40102690

38102338

38200138

3720588733210533

30200430

28205478

27205977

24205724

22210522

1920561918200268

M mod 50M

5 collisions of 2values

17205667

17205617

14200464

12205762

1220591210205460

10102360

49206049

48200498

46205796

46200296

4520584543205693

42206092

Page 26: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 26/59

Physical data structures and query optimization826

Performs best for direct access based on equality forvalues of the key

Collisions (overflow) are typically managed with linked

blocks into an area called overflow file

Inefficient for access based on interval predicates orbased on the value of non-key attributes

" "

About hashing

as es egenera e e ex ra-space s oo sma

(should be at least 120% of the minimum requiredspace) and if the file size changes a lot over time

Page 27: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 27/59

Physical data structures and query optimization827

Tree structures

The most frequently used in relational DBMSs

SQL indexes are implemented in this way

Gives associative access based on the value of a key

no constraints on the physical location of the tuples

Note: the primary key of the relational model and thekeys for hash-based and tree structures are differentconcepts

Page 28: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 28/59

Physical data structures and query optimization828

Index: an auxiliary structure for the efficient access tothe records of a file based upon the values of a givenfield or record of fields – called the index key

The index concept: analytic index of a book, seen as apair (term-page list), alphabetically ordered, at the endof a book

 

Index file

 

Page 29: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 29/59

Physical data structures and query optimization829

Tree structures

Each tree has:

one root node

several intermediate nodes

several leaf nodes

Each node corresponds to a block

paolomaurobice dino renzo teresa

paolo

mauro renzo

Pointers to tuples (arbitrarily organized)

root node

first level

secondlevel

The links between the nodes are established by pointers to mass memory In general, each node has a large number of descendants (fan out), and

therefore the majority of pages are leaf nodes

In a balanced tree, the lengths of the paths from the root node to the leaf nodes are all equal. Balanced trees give optimal performance.

Page 30: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 30/59

Physical data structures and query optimization830

Structure of the tree nodes

..... .....1 1 i0P K P K  i Pi K F  PF 

sub-tree with keysK K 1 i i+1K K K  K K F sub-tree with keyssub-tree with keys

Page 31: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 31/59

Physical data structures and query optimization831

Primary vs. secondary indexes

Indexes can be used as a primary access structure:

Tuples are stored in the index nodes or in a file orderedaccording to the index key (also: clustered index)

Possibly a “sparse” index: with less index entries than thenumber of tuples of the file, as tuples are ordered in the file

Indexes are (more) often used as secondary access structures

Tu les are stored accordin to another structure hashed,entry-sequenced, another index with a different key)

The index nodes only contain key values and pointers

Necessarily a “dense” index: one index entry pointing to everytuple in the file is required (or the tuple is “lost”)

Page 32: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 32/59

Physical data structures and query optimization832

B and B+ trees

B+ trees

The leaf nodes are linked in a chain ordered by the key

Supports interval queries efficiently

The most used by relational DBMSs B trees

No sequential connection for leaf nodes

Intermediate nodes use two pointers for each key value K i 

one points directly to the block that contains the tuplecorresponding to K i  the other points to a sub-tree with keys greater than K i 

and less than K i +1

33

Page 33: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 33/59

Physical data structures and query optimization833

An example of B+ tree

paolo

mauro renzo

root node

first level

secondlevel

Pointers to index nodes

paolomaurobice dino renzo teresa

Pointers to blocks of tuples (arbitrarily organized)

34

Page 34: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 34/59

Physical data structures and query optimization834

An example of B tree

k1 k6

k2 k3 k4 k5 k7 k8 k9

k10

t(k2) t(k3) t(k4) t(k5) t(k1) t(k6) t(k10) t(k7) t(k8) t(k9)

Page 35: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 35/59

36

Page 36: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 36/59

Physical data structures and query optimization836

Split and Merge operations

SPLIT: required when the insertion of a new tuplecannot be done locally to a node

Causes an increase of pointers in the superior node

and thus could recursively cause another split

MERGE: required when two “close” nodes have entriesthat could be condensed into a single node. Done in

from the root to the leaves. Causes a decrease of pointers in the superior node

and thus could recursively cause another merge

h i l d d i i i37

Page 37: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 37/59

Physical data structures and query optimization83

Split and merge

a. insert k3: split

Initial situationk1 k6

k1 k2 k4 k5

k1 k3 k6

k1 k2 k3 k4 k5

b. delete k2: merge k1

k1 k3

k6

k4 k5

Ph i l d t t t d ti i ti838

Page 38: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 38/59

Physical data structures and query optimization8

Index usage

Syntax in SQL:

create [unique] index IndexName on

TableName( AttributeList)

drop index IndexName

Every table should have:

A primary index , with key-sequenced structure, normallyuni ue, on the rimar ke 

Several secondary indexes, both unique and not unique,on the attributes most used for selections and joins

They are progressively added, checking that the systemactually uses them, and without excess

Ph i l d t t t d ti i ti839

Page 39: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 39/59

Physical data structures and query optimization8

Query optimization

Optimizer: an important module in the architecture of a DBMS

It receives a query written in SQL and produces an accessprogram in ‘object’ or ‘internal’ format, which uses the data

access methods. Steps:

Lexical, syntactic and semantic analysis

 

Algebraic optimization Cost-based optimization

Code generation

Physical data structures and query optimization840

Page 40: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 40/59

Physical data structures and query optimization8

Internal representation of queries

A tree representation, similar to that of relational algebra:

Leaf nodes correspond to the physical data structures(tables, indexes, files).

intermediate nodes represent physical data accessoperations that are supported by the access methods

Typical operations include sequential scans, orderings,

and aggregate queries, as well as materialization choicesfor intermediate results

Physical data structures and query optimization841

Page 41: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 41/59

Physical data structures and query optimization8

Query optimization input-output

Input: query in SQL

SELECT R.a FROM R,S,T WHERE R.a=S.a AND R.b=T.b

Output: execution plan

project

select(R.a=S.a, R.b=T.b)

build3

probe2

dupElim&project

T

cartProd

S

cartProd

R scan T

build2

scan R

probe1

scan S build1

Page 42: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 42/59

Physical data structures and query optimization843

Page 43: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 43/59

Physical data structures and query optimization8

Relation profiles

Profiles contain quantitative information about tables and arestored in the data dictionary:

the cardinality (number of tuples) of each table T

the dimension in bytes of each attribute A j  in T the number of distinct values of each attribute A j  in T

the minimum and maximum values of each attribute A j  in T

  er o ca y ca cu a e y ac va ng appropr a e sys em

primitives (for example, the update statistics command) Used in cost-based optimization for estimating the size of the

intermediate results produced by the query execution plan

Physical data structures and query optimization844

Page 44: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 44/59

Physical data structures and query optimization8

Sequential scan

Performs a sequential access to all the tuples of a tableor of an intermediate result, at the same time executingvarious operations, such as:

Projection to a set of attributes Selection on a simple predicate (of type: Ai = v )

Sort (ordering)

 , ,

currently accessed during the scan Primitives:

O pen, next, read, modify, insert, delete, close

Physical data structures and query optimization845

Page 45: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 45/59

Physical data structures and query optimization8

Sort

This operation is used for ordering the data accordingto the value of one or more attributes. We distinguish:

Sort in main memory, typically performed by means

of ad-hoc algorithms Sort of large files, which can not be transferred to

main memory, performed by merging smaller parts

Physical data structures and query optimization846

Page 46: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 46/59

Physical data structures and query optimization8

Indexed access

Indexes are used when queries include:

simple predicates (of the type Ai = v ) interval predicates (of the type v 1 ≤≤≤≤ Ai ≤≤≤≤ v 2)

These predicates are said to be supported by indexes built on Ai

With conjunctions of supported predicates, the DBMS choosesthe most selective supported predicate for the primary access,and evaluates the other predicates in main memory

With disjunctions of predicates:

if any of them is not supported a scan is needed;

if all are supported, indexes can be used (on all of them) andthen duplicate elimination is normally required

Physical data structures and query optimization847

Page 47: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 47/59

ys ca da a s uc u es a d que y op a o8

Join Methods

Joins are the most frequent (and costly) operations inDBMSs

There are several methods for join evaluation, among

which: nested-loop, merge-scan and hashed .

These three methods are based on scanning, hashing,an or er ng.

Physical data structures and query optimization848

Page 48: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 48/59

y q y p

Nested-loop join

External

scan

a

a ---------------

---------------- a ---------------

External table Internal table

A A

Internal scan or

indexed access

a ---------------

Physical data structures and query optimization849

Page 49: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 49/59

y q y p

Merge-scan join

Left

scan

Right

scan A

ab

b

A

aa

b

----------------

---------------- ---------------

Left Table Right Table

c

ce

c

ee

g

Physical data structures and query optimization850

Page 50: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 50/59

Hashed join

hash(a)

a

hash(a)

a a

d ee m

c

Left Table Right Table

A A

a

 j

z

 j j

Page 51: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 51/59

Physical data structures and query optimization852

Page 52: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 52/59

Approach to query optimization

Optimization approach:

Make use of profiles and of approximate cost formulas

Construct a decision tree, in which each node corresponds

to a choice; each leaf node corresponds to a specificexecution plan.

Assign to each plan a cost:

 total     I/O I/O   cpu cpu

Choose the plan with the lowest cost, based on operationsresearch (branch and bound)

Optimizers should obtain ‘good’ solutions in a very short time

Physical data structures and query optimization853

Page 53: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 53/59

An example of decision tree

1 1 1 1 1

1 2(R ST)

21R S T

21S)(R T

1 2(S T) R

merge-scannested-loopnested-loop, hash-join,R internal R external hash on R

hash-join,hash on S

2 2 2 221

S)(Rnested-loop, nested-loop, hash-join,merge-scanT internal T external hash on T hash on

hash-join,

.............strategy 1 strategy 2 strategy 3 strategy 4 strategy 5

Physical data structures and query optimization854

Page 54: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 54/59

Statistic

profiles

 

Optimizer

Analyzer

User

Result 

Catalog

Query processing components

Query 

Internal representation 

Execution plan 

Mass memory: - Data- Indexes

Execution layer

Physical data structures and query optimization855

Page 55: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 55/59

Centralized architecture (DBMS)

DBMS

Statistic

profiles

 

Optimizer

Analyzer

User

Result 

Catalog

Query 

Internal representation 

Execution plan 

Mass memory: - Data- Indexes

Execution layer

Physical data structures and query optimization856

Page 56: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 56/59

Distributed database with master-slave optimization

Database

Execution layer

Site S2 (slave)

Site S3 (slave)

User

Analyzer

Site S1 (master)

Distributed 

execution plan 

Distributedcatalogs

Distributedprofiles

Optimizer

Database

Execution layer

Database

Execution layer

Physical data structures and query optimization857

Page 57: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 57/59

Distributed optimization with negotiation

Site S2

Execution

Distributed negotiation 

User

Analyzer

Site S1

Distributed 

execution plan 

Distributedcatalogs

Distributedprofiles

Optimizer Database

ExecutionlayerOptimizer

Site S3Database

 layerOptimizer

Database

Execution layer

Page 58: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 58/59

Physical data structures and query optimization859

Page 59: 08pdf Physical Optim

8/3/2019 08pdf Physical Optim

http://slidepdf.com/reader/full/08pdf-physical-optim 59/59

USER

QUERY ANALYZER

OPTIMIZER

[TRANSACTION]

ACCESS MANAGER

 

RELIABILITY MANAGER

CONCURRENCY MANAGER

LOG

DUMP

LOCKTABLESLOCK

TABLES

Overall view: components of a DBMS

 

MAINMEMORY

STATISTICS INDEXES

USER DATA SYSTEM DATA