# fo optimization2

Author: timothy-wood

Posted on 03-Jun-2018


• 8/12/2019 Fo Optimization2


## Query processing and optimization

Reading (5th edition): Chapters 6.1-6.3, 15.1-15.3, 15.7-15.8.2

Jose M. Peña



[Figure: ER diagram, relational model, MySQL.]


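## Relation schema

| Attribute | Domain |
|---|---|
| PNumber | yymmdd-xxxx |
| Name | Textual string less than 30 chars |
| Address | Textual string less than 30 chars |
| Telephone | rrr - nn nn nn |
| E-mail | aaaaannn |
| Age | Positive integer |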


## Relation (state)

| PNumber | Name | Address | Telephone | E-mail | Age |
|---|---|---|---|---|---|
| 123456-7890 | Anders Andersson | Rydsvägen 1 | 013-11 22 33 | andan111 | 25 |
| 112233-4455 | Veronika | Alstersg 2 | 013-22 33 44 | verpe222 | 27 |

Tuple = list of values, each taken from the corresponding domain, or NULL.


## Key constraints

Relation = set of tuples. Hence no duplicate tuples are allowed, and every tuple is uniquely identifiable (superkey, candidate key, primary key, which are all time-invariant).

| PNumber | Name | Address | Telephone | E-mail | Age |
|---|---|---|---|---|---|
| 123456-7890 | Anders Andersson | Rydsvägen 1 | 013-11 22 33 | andan111 | 25 |
| 112233-4455 | Veronika | Alstersg 2 | 013-22 33 44 | verpe222 | 27 |
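Since a relation is a set of tuples, a set of attributes is a superkey of a state exactly when no two tuples agree on all of those attributes. The sketch below (a hypothetical Python encoding of the two tuples above; `ATTRS`, `STATE` and `is_superkey` are illustrative names) checks that property. It also shows why keys must be declared in the schema rather than inferred from one state: in this two-tuple state even Age happens to be unique.

```python
# Hypothetical encoding of the relation state above as a Python set of tuples.
ATTRS = ("PNumber", "Name", "Address", "Telephone", "E-mail", "Age")
STATE = {
    ("123456-7890", "Anders Andersson", "Rydsvägen 1", "013-11 22 33", "andan111", 25),
    ("112233-4455", "Veronika", "Alstersg 2", "013-22 33 44", "verpe222", 27),
}


def is_superkey(attrs, state, candidate):
    """True if no two tuples of `state` agree on every attribute in `candidate`."""
    positions = [attrs.index(a) for a in candidate]
    values = [tuple(t[i] for i in positions) for t in state]
    # A set drops repeated value combinations, so equal lengths mean all are distinct.
    return len(values) == len(set(values))


print(is_superkey(ATTRS, STATE, ["PNumber"]))  # True
print(is_superkey(ATTRS, STATE, ["Age"]))      # True, but only for this particular state
```

Adding a third student aged 25 would make Age fail the check while PNumber still passes, which is the time-invariance argument in miniature.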


## Integrity constraints

Entity integrity constraint = no primary key value is NULL.

A set of attributes FK in relation R1 is a foreign key referencing relation R2 with primary key PK if (i) domain(FK) = domain(PK) and (ii) every value of FK in R1 refers to an existing tuple in R2 or is NULL.

Referential integrity constraint = conditions (i) and (ii) above hold.
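Both constraints can be checked mechanically on a relation state. The following is a minimal sketch assuming single-attribute keys and two hypothetical relations, DEPT (playing R2, primary key DNo) and EMP (playing R1, foreign key DNo); condition (i) is a schema-level property and is not checked here.

```python
# Hypothetical example: R2 = DEPT with primary key DNo (position 0),
# R1 = EMP(SSN, Name, DNo) with foreign key DNo (position 2) referencing DEPT.
NULL = None

DEPT = [(1, "Research"), (2, "Sales")]
EMP = [("111", "Ann", 1), ("222", "Bo", NULL), ("333", "Cia", 2)]


def entity_integrity_ok(rows, pk_pos):
    """Entity integrity: no primary key value is NULL."""
    return all(row[pk_pos] is not NULL for row in rows)


def referential_integrity_ok(r1_rows, fk_pos, r2_rows, pk_pos):
    """Condition (ii): every FK value in R1 is NULL or matches an existing PK in R2."""
    existing = {row[pk_pos] for row in r2_rows}
    return all(row[fk_pos] is NULL or row[fk_pos] in existing for row in r1_rows)


print(entity_integrity_ok(DEPT, 0))               # True
print(referential_integrity_ok(EMP, 2, DEPT, 0))  # True: Bo's NULL is allowed
print(referential_integrity_ok(EMP + [("444", "Dan", 3)], 2, DEPT, 0))  # False: no DEPT 3
```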


## Select

Selects the tuples of a relation that satisfy some condition over its attributes, e.g.

$\sigma_{(A_1 = X \wedge A_2 = Y) \vee A_3 = Z}(R)$


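## Example: select

STUDENT:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 112233-4455 | Elin | Rydsvägen 1 | 112233 |
| 223344-5566 | Nisse | Alstersgatan 3 | 223344 |
| 334455-6677 | Nisse | Rydsvägen 3 | 334455 |
| 113322-1122 | Pelle | Rydsvägen 2 | 113322 |
| 552233-1144 | Monika | Rydsvägen 4 | 443322 |
| 442211-2222 | Patrik | Rydsvägen 6 | 111122 |
| 334433-1111 | Camilla | Alstersgatan 1 | 665544 |

$\sigma_{(Name = \text{'Nisse'} \wedge TelNr = \text{'334455'}) \vee Name = \text{'Camilla'}}(\text{STUDENT})$:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 334455-6677 | Nisse | Rydsvägen 3 | 334455 |
| 334433-1111 | Camilla | Alstersgatan 1 | 665544 |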
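Selection is just a filter over a set of tuples. A minimal Python sketch of the example above, assuming STUDENT is modelled as a set of `NamedTuple`s (the `Student` class and `select` function are illustrative, not part of any DBMS):

```python
from typing import NamedTuple


class Student(NamedTuple):
    PNum: str
    Name: str
    Address: str
    TelNr: str


STUDENT = {
    Student("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    Student("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    Student("334455-6677", "Nisse", "Rydsvägen 3", "334455"),
    Student("113322-1122", "Pelle", "Rydsvägen 2", "113322"),
    Student("552233-1144", "Monika", "Rydsvägen 4", "443322"),
    Student("442211-2222", "Patrik", "Rydsvägen 6", "111122"),
    Student("334433-1111", "Camilla", "Alstersgatan 1", "665544"),
}


def select(relation, condition):
    """sigma_condition(relation): the tuples for which `condition` is true."""
    return {t for t in relation if condition(t)}


result = select(
    STUDENT,
    lambda t: (t.Name == "Nisse" and t.TelNr == "334455") or t.Name == "Camilla",
)
for t in sorted(result):
    print(t.PNum, t.Name)
```

Only the Nisse with TelNr 334455 qualifies; the other Nisse fails the conjunction.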


## Project

Projects a relation onto some of its attributes. The result must be a relation, so duplicates are removed.

$\pi_{A_1, A_2, A_3}(R)$


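## Example: project

STUDENT:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 112233-4455 | Elin | Rydsvägen 1 | 112233 |
| 223344-5566 | Nisse | Alstersgatan 3 | 223344 |
| 334455-6677 | Nisse | Rydsvägen 3 | 334455 |

$\pi_{PNum, Name}(\text{STUDENT})$:

| PNum | Name |
|---|---|
| 112233-4455 | Elin |
| 223344-5566 | Nisse |
| 334455-6677 | Nisse |

$\pi_{Name}(\text{STUDENT})$ = ?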
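Because the result of a projection is a set, projecting away the key can shrink the number of tuples. A minimal sketch with the three STUDENT tuples above (the `Student` class and `project` function are illustrative names):

```python
from typing import NamedTuple


class Student(NamedTuple):
    PNum: str
    Name: str
    Address: str
    TelNr: str


STUDENT = [
    Student("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    Student("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    Student("334455-6677", "Nisse", "Rydsvägen 3", "334455"),
]


def project(relation, attributes):
    """pi_attributes(relation); building a set removes duplicate tuples."""
    return {tuple(getattr(t, a) for a in attributes) for t in relation}


print(len(project(STUDENT, ["PNum", "Name"])))  # 3: PNum keeps the tuples distinct
print(sorted(project(STUDENT, ["Name"])))       # [('Elin',), ('Nisse',)]
```

Note that SQL works on bags: a plain `SELECT Name FROM STUDENT` keeps both Nisse rows unless `DISTINCT` is used.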


## Union, intersection and difference

$R \cup S$, $R \cap S$, $R - S$

R and S must be union-compatible, i.e. have the same number of attributes with the same domains.

The result must be a relation, so duplicates are removed (union).


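Example:

STUDENT:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 112233-4455 | Elin | Rydsvägen 1 | 112233 |
| 223344-5566 | Nisse | Alstersgatan 3 | 223344 |
| 334455-6677 | Nisse | Rydsvägen 3 | 334455 |

Second relation:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 884455-4455 | Monika | Teknikringen 1 | 111112 |
| 223344-5566 | Nisse | Alstersgatan 3 | 223344 |
| 668877-7766 | Patrik | Teknikringen 3 | 332211 |

Intersection:

| PNum | Name | Address | TelNr |
|---|---|---|---|
| 223344-5566 | Nisse | Alstersgatan 3 | 223344 |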
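Python's built-in sets implement exactly these three operators, including duplicate removal for union. In the sketch below `OTHER` is a hypothetical name for the second relation, and the arity check is only a weak stand-in for union compatibility (it does not compare domains):

```python
STUDENT = {
    ("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    ("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    ("334455-6677", "Nisse", "Rydsvägen 3", "334455"),
}
OTHER = {  # hypothetical name for the second relation
    ("884455-4455", "Monika", "Teknikringen 1", "111112"),
    ("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    ("668877-7766", "Patrik", "Teknikringen 3", "332211"),
}


def same_arity(r, s):
    """Weak stand-in for union compatibility: all tuples have the same number of values."""
    return len({len(t) for t in r | s}) <= 1


assert same_arity(STUDENT, OTHER)
union = STUDENT | OTHER         # 5 tuples: the shared Nisse tuple appears once
intersection = STUDENT & OTHER  # only the shared Nisse tuple
difference = STUDENT - OTHER    # Elin and the other Nisse
print(len(union), len(intersection), len(difference))  # 5 1 2
```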


## Cartesian product

R:

| Name | STATE |
|---|---|
| Los Angeles | Calif |
| Oakland | Calif |
| Atlanta | Ga |
| San Francisco | Calif |
| Boston | Mass |

S:

| Key | City |
|---|---|
| 5 | San Francisco |
| 7 | Oakland |
| 8 | Boston |

$R \times S$:

| Name | STATE | Key | City |
|---|---|---|---|
| Los Angeles | Calif | 5 | San Francisco |
| Los Angeles | Calif | 7 | Oakland |
| Los Angeles | Calif | 8 | Boston |
| Oakland | Calif | 5 | San Francisco |
| Oakland | Calif | 7 | Oakland |
| Oakland | Calif | 8 | Boston |
| Atlanta | Ga | 5 | San Francisco |
| Atlanta | Ga | 7 | Oakland |
| Atlanta | Ga | 8 | Boston |
| San Francisco | Calif | 5 | San Francisco |
| San Francisco | Calif | 7 | Oakland |
| San Francisco | Calif | 8 | Boston |
| Boston | Mass | 5 | San Francisco |
| Boston | Mass | 7 | Oakland |
| Boston | Mass | 8 | Boston |
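The product pairs every tuple of R with every tuple of S, so |R × S| = |R| · |S| = 5 · 3 = 15. A minimal sketch using the standard library's `itertools.product`, with R and S as lists so the output order matches the table:

```python
from itertools import product

R = [("Los Angeles", "Calif"), ("Oakland", "Calif"), ("Atlanta", "Ga"),
     ("San Francisco", "Calif"), ("Boston", "Mass")]
S = [(5, "San Francisco"), (7, "Oakland"), (8, "Boston")]


def cartesian_product(r, s):
    """R x S: every tuple of R concatenated with every tuple of S."""
    return [tr + ts for tr, ts in product(r, s)]


rxs = cartesian_product(R, S)
print(len(rxs))  # 15
print(rxs[0])    # ('Los Angeles', 'Calif', 5, 'San Francisco')
```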


## Join

Joins two tuples from two relations if they satisfy some condition over their attributes.

Join = Cartesian product followed by selection.

Tuples with NULL in the condition attributes do not appear in the result.

Recall: join only on foreign key-primary key attributes.

$R \bowtie_{R.A_1 = S.B_3 \wedge R.A_5 \ldots} S$


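## Example: join

R:

| Name | STATE |
|---|---|
| Los Angeles | Calif |
| Oakland | Calif |
| Atlanta | Ga |
| San Francisco | Calif |
| Boston | Mass |

S:

| Key | City |
|---|---|
| 5 | San Francisco |
| 7 | Oakland |
| 8 | Boston |

$R \bowtie_{R.Name = S.City} S$:

| Name | STATE | Key | City |
|---|---|---|---|
| Oakland | Calif | 7 | Oakland |
| San Francisco | Calif | 5 | San Francisco |
| Boston | Mass | 8 | Boston |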
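The definition "join = Cartesian product followed by selection" translates directly into code. This is a naive nested-loop sketch, not how a DBMS would evaluate the join; `theta_join` and `name_equals_city` are illustrative names, and NULL is modelled as Python's `None` with the SQL-style rule that a comparison involving NULL is never true:

```python
from itertools import product

R = [("Los Angeles", "Calif"), ("Oakland", "Calif"), ("Atlanta", "Ga"),
     ("San Francisco", "Calif"), ("Boston", "Mass")]
S = [(5, "San Francisco"), (7, "Oakland"), (8, "Boston")]


def theta_join(r, s, condition):
    """Join = Cartesian product followed by selection on `condition`."""
    return [tr + ts for tr, ts in product(r, s) if condition(tr, ts)]


def name_equals_city(tr, ts):
    """R.Name = S.City, where a comparison involving NULL (None) is never true."""
    name, city = tr[0], ts[1]
    if name is None or city is None:
        return False
    return name == city


for row in theta_join(R, S, name_equals_city):
    print(row)
```

The explicit `None` test matters: in Python `None == None` is `True`, so without it two NULLs would wrongly join.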



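## Example: join

R:

| Name | Area |
|---|---|
| Los Angeles | 2 |
| Oakland | 9 |
| Atlanta | 7 |
| San Francisco | 11 |
| Boston | 16 |

S:

| Key | City |
|---|---|
| 5 | San Francisco |
| 7 | Oakland |
| 8 | Boston |

$R \bowtie_{R.Area \ldots} S$:

| Name | Area | Key | City |
|---|---|---|---|
| Los Angeles | 2 | 5 | San Francisco |
| Los Angeles | 2 | 7 | Oakland |
| Los Angeles | 2 | 8 | Boston |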


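$R \times S$:

| Name | Area | Key | City |
|---|---|---|---|
| Los Angeles | 2 | 5 | San Francisco |
| Los Angeles | 2 | 7 | Oakland |
| Los Angeles | 2 | 8 | Boston |
| Oakland | 9 | 5 | San Francisco |
| Oakland | 9 | 7 | Oakland |
| Oakland | 9 | 8 | Boston |
| Atlanta | 7 | 5 | San Francisco |
| Atlanta | 7 | 7 | Oakland |
| Atlanta | 7 | 8 | Boston |
| San Francisco | 11 | 5 | San Francisco |
| San Francisco | 11 | 7 | Oakland |
| San Francisco | 11 | 8 | Boston |
| Boston | 16 | 5 | San Francisco |
| Boston | 16 | 7 | Oakland |
| Boston | 16 | 8 | Boston |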


## Variants of join

Theta join = join. Equijoin = join with only equality conditions.

Natural join = equijoin in which one of each pair of duplicate attributes is removed (attributes in the conditions must have the same name). Unless otherwise specified, natural join joins all the attributes with the same name in R and S.

Notation: $R *_{A} S$ (A: the join attributes).
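A natural join can be sketched as an equijoin on every attribute name the two schemas share, keeping one copy of each shared attribute. The relations EMP(SSN, Name, DNo) and DEPT(DNo, DName) below are hypothetical, and NULL is again modelled as `None`:

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """R * S: equijoin on all same-named attributes, keeping one copy of each."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out_attrs = list(r_attrs) + s_only
    out_rows = set()
    for tr in r_rows:
        rv = dict(zip(r_attrs, tr))
        for ts in s_rows:
            sv = dict(zip(s_attrs, ts))
            # NULL (None) never matches; with no common attributes this degenerates to R x S.
            if all(rv[a] is not None and rv[a] == sv[a] for a in common):
                out_rows.add(tr + tuple(sv[a] for a in s_only))
    return out_attrs, out_rows


EMP_ATTRS = ("SSN", "Name", "DNo")
EMP = [("111", "Ann", 1), ("222", "Bo", 2), ("333", "Cia", None)]
DEPT_ATTRS = ("DNo", "DName")
DEPT = [(1, "Research"), (2, "Sales")]

attrs, rows = natural_join(EMP_ATTRS, EMP, DEPT_ATTRS, DEPT)
print(attrs)         # ['SSN', 'Name', 'DNo', 'DName']
print(sorted(rows))  # [('111', 'Ann', 1, 'Research'), ('222', 'Bo', 2, 'Sales')]
```

The degenerate case (no shared names) falling back to the Cartesian product follows from `all()` returning `True` on an empty sequence.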


## Query trees

Query tree = tree that represents a relational algebra expression:

- Leaves = base tables.
- Internal nodes = relational algebra operators applied to the node's children.
- The tree is executed from the leaves to the root.

Example: List the last name of the employees born after 1957 who work on the project named 'Aquarius'.

```sql
SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO
  AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'
```

## Canonical query tree

```sql
SELECT attributes
FROM A, B, C
WHERE condition
```

$\pi_{\text{attributes}}(\sigma_{\text{condition}}((A \times B) \times C))$

Construct the canonical query tree as follows:

1. Cartesian product of the FROM-tables.
2. Select with the WHERE-condition.
3. Project to the SELECT-attributes.
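The three construction steps can be sketched as a tiny tree builder. The node classes and the text rendering (`x`, `sigma[...]`, `pi[...]`) are hypothetical choices for illustration; a real parser would produce a richer internal representation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Table:
    name: str


@dataclass
class Product:
    left: object
    right: object


@dataclass
class Select:
    condition: str
    child: object


@dataclass
class Project:
    attributes: list
    child: object


def canonical_tree(select_attrs, from_tables, where):
    """Step 1: x of the FROM-tables; step 2: sigma_WHERE; step 3: pi_SELECT."""
    product_tree = reduce(Product, [Table(t) for t in from_tables])  # (A x B) x C
    return Project(select_attrs, Select(where, product_tree))


def to_algebra(node):
    """Render the tree as a relational algebra expression."""
    if isinstance(node, Table):
        return node.name
    if isinstance(node, Product):
        parts = []
        for side in (node.left, node.right):
            text = to_algebra(side)
            parts.append(f"({text})" if isinstance(side, Product) else text)
        return " x ".join(parts)
    if isinstance(node, Select):
        return f"sigma[{node.condition}]({to_algebra(node.child)})"
    return f"pi[{', '.join(node.attributes)}]({to_algebra(node.child)})"


print(to_algebra(canonical_tree(["attributes"], ["A", "B", "C"], "condition")))
# pi[attributes](sigma[condition]((A x B) x C))
```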


## Equivalent query trees


## Overview

[Figure: real world, model, users 1 to 4, database management system (processing of queries), physical database.]


## Query processing

- StarsIn(movieTitle, movieYear, starName)
- MovieStar(name, address, gender, birthdate)

```sql
SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960');
```

Canonical query tree (usually very inefficient).


## Parsing and validating

- Control of used relations:
  - they have to be declared in FROM;
  - they must exist in the database.
- Control and resolve attributes:
  - attributes must exist in the relations.
- Type checking:
  - attributes that are compared must be of the same type.
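The three checks can be sketched against a catalog lookup. Everything here is a simplifying assumption: `CATALOG` is a hypothetical stand-in for the DBMS catalog (types are just names), attributes are unqualified, and only attribute-to-attribute comparisons are type-checked.

```python
# Hypothetical catalog: relation name -> {attribute name: type name}.
CATALOG = {
    "StarsIn": {"movieTitle": "str", "movieYear": "int", "starName": "str"},
    "MovieStar": {"name": "str", "address": "str", "gender": "str", "birthdate": "str"},
}


def resolve(attr, relations):
    """Find the single FROM-relation that has `attr` and return the attribute's type."""
    owners = [r for r in relations if attr in CATALOG[r]]
    if not owners:
        raise ValueError(f"unknown attribute {attr}")
    if len(owners) > 1:
        raise ValueError(f"ambiguous attribute {attr}")
    return CATALOG[owners[0]][attr]


def validate(from_relations, select_attrs, comparisons):
    """comparisons: (attribute, attribute) pairs compared in the WHERE-clause."""
    for r in from_relations:  # relations must exist in the database
        if r not in CATALOG:
            raise ValueError(f"unknown relation {r}")
    for a in select_attrs:  # only relations declared in FROM are searched
        resolve(a, from_relations)
    for left, right in comparisons:  # compared attributes must have the same type
        lt, rt = resolve(left, from_relations), resolve(right, from_relations)
        if lt != rt:
            raise TypeError(f"cannot compare {left} ({lt}) with {right} ({rt})")
    return True


print(validate(["StarsIn", "MovieStar"], ["movieTitle"], [("starName", "name")]))  # True
```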


## Query optimizer: heuristic

Heuristic: use joins instead of Cartesian product + selection, and do selection and projection as early as possible, in order to keep the intermediate tables as small as possible, because:

- if the tables do not fit in memory, we need to perform fewer disk accesses;
- if the tables fit in memory, we use less memory;
- if the tables have to be sorted, joined, etc., we use less computation power.

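Example: ORDER has n = 6 tuples of 4 + 4 + 27 = 35 bytes each, and 2 of them satisfy ENTRY_DATE > '2001-08-30'. After projecting to ORDER_ID, ENTRY_DATE, a tuple takes 4 + 27 = 31 bytes.

Project first: $\sigma_{\text{ENTRY\_DATE} > \text{'2001-08-30'}}(\pi_{\text{ORDER\_ID},\, \text{ENTRY\_DATE}}(\text{ORDER}))$

| Step | Tuples | Bytes per tuple | Total |
|---|---|---|---|
| ORDER | 6 | 4 + 4 + 27 = 35 | 210 bytes |
| $\pi$ | 6 | 4 + 27 = 31 | 186 bytes |
| $\sigma$ | 2 | 4 + 27 = 31 | 62 bytes |

Select first: $\pi_{\text{ORDER\_ID},\, \text{ENTRY\_DATE}}(\sigma_{\text{ENTRY\_DATE} > \text{'2001-08-30'}}(\text{ORDER}))$

| Step | Tuples | Bytes per tuple | Total |
|---|---|---|---|
| ORDER | 6 | 4 + 4 + 27 = 35 | 210 bytes |
| $\sigma$ | 2 | 4 + 4 + 27 = 35 | 70 bytes |
| $\pi$ | 2 | 4 + 27 = 31 | 62 bytes |

Both plans produce the same 62-byte result, but selecting first shrinks the intermediate table from 186 to 70 bytes.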
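The size arithmetic above is simply tuples × bytes per tuple at each step. A short sketch that recomputes both plans (the step labels and the `sizes` helper are illustrative):

```python
FULL = 4 + 4 + 27   # bytes per ORDER tuple (35)
PROJECTED = 4 + 27  # bytes per tuple after pi_{ORDER_ID, ENTRY_DATE} (31)


def sizes(steps):
    """steps: (label, number of tuples, bytes per tuple) -> (label, total bytes)."""
    return [(label, n * width) for label, n, width in steps]


project_first = sizes([("ORDER", 6, FULL), ("pi", 6, PROJECTED), ("sigma", 2, PROJECTED)])
select_first = sizes([("ORDER", 6, FULL), ("sigma", 2, FULL), ("pi", 2, PROJECTED)])

print(project_first)  # [('ORDER', 210), ('pi', 186), ('sigma', 62)]
print(select_first)   # [('ORDER', 210), ('sigma', 70), ('pi', 62)]
```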


## Query optimizer: heuristic algorithm

1. Break up conjunctive selections into a cascade of selections.
2. Move selections down as far as possible in the tree.
3. Rearrange selection operations: the most restrictive should be executed first.
4. Convert Cartesian products followed by a selection into joins.
5. Move projection operations down as far as possible in the tree. Create new projections so that only the required attributes are involved in the tree.

Most restrictive = fewest tuples? Smallest size? Smallest selectivity? The DBMS catalog contains the required information.
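Steps 2 and 4 can be sketched as a tree rewrite on the EMPLOYEE/WORKS_ON/PROJECT example. This is a simplified illustration under stated assumptions, not an optimizer: step 1 is assumed done (the WHERE condition arrives as a list of conjuncts, each tagged with the attributes it uses), attribute names are qualified and unique across tables, and steps 3 and 5 (selectivity ordering, projection push-down) and the final projection are left out. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Table:
    name: str
    attrs: frozenset


@dataclass(frozen=True)
class Select:
    pred: str        # predicate text, used only for display
    uses: frozenset  # attributes the predicate refers to
    child: object


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Join:
    pred: str
    left: object
    right: object


def attrs_of(node):
    """Attributes available in the output of `node`."""
    if isinstance(node, Table):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)


def push_select(node, conjunct):
    """Step 2: move one selection as far down as possible.
    Step 4: a selection that needs both sides of a product turns it into a join."""
    pred, uses = conjunct
    if isinstance(node, Select):  # selections commute, so pass below it
        return Select(node.pred, node.uses, push_select(node.child, conjunct))
    if isinstance(node, (Product, Join)):
        if uses <= attrs_of(node.left):
            left, right = push_select(node.left, conjunct), node.right
        elif uses <= attrs_of(node.right):
            left, right = node.left, push_select(node.right, conjunct)
        elif isinstance(node, Product):
            return Join(pred, node.left, node.right)
        else:
            return Join(f"{node.pred} AND {pred}", node.left, node.right)
        return Product(left, right) if isinstance(node, Product) else Join(node.pred, left, right)
    return Select(pred, uses, node)  # a base table: the selection stops here


def optimize(tree, conjuncts):
    """Apply push_select once per conjunct of the (already split) WHERE condition."""
    return reduce(push_select, conjuncts, tree)


def show(node):
    if isinstance(node, Table):
        return node.name
    if isinstance(node, Select):
        return f"sigma[{node.pred}]({show(node.child)})"
    op = f"join[{node.pred}]" if isinstance(node, Join) else "x"
    return f"({show(node.left)} {op} {show(node.right)})"


E = Table("EMPLOYEE", frozenset({"E.LNAME", "E.SSN", "E.BDATE"}))
W = Table("WORKS_ON", frozenset({"W.ESSN", "W.PNO"}))
P = Table("PROJECT", frozenset({"P.PNAME", "P.PNUMBER"}))
conjuncts = [
    ("P.PNAME = 'Aquarius'", frozenset({"P.PNAME"})),
    ("P.PNUMBER = W.PNO", frozenset({"P.PNUMBER", "W.PNO"})),
    ("W.ESSN = E.SSN", frozenset({"W.ESSN", "E.SSN"})),
    ("E.BDATE > '1957-12-31'", frozenset({"E.BDATE"})),
]
print(show(optimize(Product(Product(E, W), P), conjuncts)))
```

The single-table selections end up directly above EMPLOYEE and PROJECT, and both products become joins, which is the shape the heuristic aims for.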


## Equivalence rules


## Exercises

True or false?

Optimize the queries below:

```sql
SELECT *
FROM ol_order_line, it_item
WHERE ol_item_id = it_item_id
  AND ol_order_id = 1001
```


## Execution plans

Execution plan: optimized query tree extended with access methods and algorithms to implement the operations.