
Optimization of Nested Queries

Sujatha Thanigaimani

COSC 6421


Outline

• Introduction

• Kim’s Algorithm for efficient processing

• Count bug – Solution

• Inequality bug – Solution

• Alternate Algorithm

• Modification of Kim’s algorithm

Nested Queries

• Queries containing other queries

• Inner query: can appear in the FROM or WHERE clause

Example:

SELECT cname FROM borrower
WHERE cname IN (SELECT cname FROM depositor)

The SELECT over borrower is the "outer query"; the parenthesized SELECT over depositor is the "inner query". Think of the inner query as a function that returns its result to the outer query.

Evaluation of Nested Queries

Naive method: Tuple Iteration Semantics (TIS), which is inefficient.

Kim’s Algorithm

Rationale: nesting is an interesting and powerful feature of SQL.

Unnesting: the process of transforming nested queries into canonical form.

Kim classified nested queries for better understanding and processing.

Types:

SUPPLIER(sno, sname, sloc, sbudget)
PARTS(pno, pname, qoh, color)
PROJECT(jno, jname, pno, jbudget, jloc)
SHIPMENT(sno, pno, jno, qty, shipdate)

Type-A Nesting:

Non-correlated, aggregated subquery

Example:

SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P)

The inner query block can be evaluated independently of the outer query block, and the result of its evaluation is a single constant.

Type-N Nesting:

Non-correlated, not aggregated subquery

SELECT SNO FROM SP
WHERE PNO IS IN
  (SELECT PNO FROM P
   WHERE WEIGHT > 50)

Evaluation: the inner query block Q is processed, resulting in a list of values X, which is then substituted for the inner query block so that PNO IS IN Q becomes PNO IS IN X. The resulting query is then evaluated by nested iteration.
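The type-N evaluation described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table contents are invented, and standard SQL spells the predicate IN rather than IS IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the example query.
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, WEIGHT INTEGER);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER);
    INSERT INTO P  VALUES (1, 40), (2, 60), (3, 75);
    INSERT INTO SP VALUES (100, 1), (101, 2), (102, 3), (103, 2);
""")

# Step 1: process the inner query block Q once, yielding the list X.
x = [row[0] for row in conn.execute("SELECT PNO FROM P WHERE WEIGHT > 50")]

# Step 2: substitute X for the inner block, so "PNO IN Q" becomes "PNO IN X".
placeholders = ", ".join("?" for _ in x)
substituted = sorted(
    row[0]
    for row in conn.execute(f"SELECT SNO FROM SP WHERE PNO IN ({placeholders})", x)
)

# Reference: let the engine evaluate the nested form directly.
nested = sorted(
    row[0]
    for row in conn.execute(
        "SELECT SNO FROM SP WHERE PNO IN (SELECT PNO FROM P WHERE WEIGHT > 50)"
    )
)
```

Because the inner block does not reference the outer relation, it is evaluated exactly once regardless of the size of SP.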

Type-J Nesting:

Correlated, not aggregated subquery

SELECT SNAME FROM S
WHERE SNO IS IN
  (SELECT SNO FROM SP
   WHERE QTY > 100 AND SP.ORIGIN = S.CITY)

Type-JA Nesting:

Correlated, aggregated subquery

SELECT PNAME FROM P
WHERE PNO =
  (SELECT MAX(PNO) FROM SP
   WHERE SP.ORIGIN = P.CITY)

Evaluation: in TIS, the inner query block is processed once for each tuple of the outer relation that satisfies all simple predicates on the outer relation, which is inefficient.

Kim developed alternative algorithms for efficient processing of nested queries.
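To make the cost of tuple iteration concrete, here is a rough Python sketch that evaluates the type-JA example one outer tuple at a time. The rows are made up, and the helper name tuple_iteration is hypothetical; it is a simulation of TIS, not how any particular engine is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the P and SP relations of the example.
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, PNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, ORIGIN TEXT);
    INSERT INTO P  VALUES (1, 'bolt', 'Rome'), (2, 'nut', 'Rome'), (3, 'cam', 'Oslo');
    INSERT INTO SP VALUES (10, 1, 'Rome'), (11, 2, 'Rome'),
                          (12, 3, 'Oslo'), (13, 1, 'Oslo');
""")


def tuple_iteration(conn):
    """Run the inner block once per outer tuple and count the inner runs."""
    result, inner_runs = [], 0
    outer = conn.execute("SELECT PNO, PNAME, CITY FROM P").fetchall()
    for pno, pname, city in outer:
        inner_runs += 1
        (max_pno,) = conn.execute(
            "SELECT MAX(PNO) FROM SP WHERE ORIGIN = ?", (city,)
        ).fetchone()
        # MAX over an empty set is NULL, and "PNO = NULL" never qualifies.
        if max_pno is not None and pno == max_pno:
            result.append(pname)
    return sorted(result), inner_runs


names, inner_runs = tuple_iteration(conn)

# Reference: the same query in nested form.
nested = sorted(
    row[0]
    for row in conn.execute(
        "SELECT PNAME FROM P WHERE PNO = "
        "(SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)"
    )
)
```

The number of inner evaluations equals the number of qualifying outer tuples, which is the inefficiency the unnesting algorithms try to avoid.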

Algorithm NEST-N-J (for type-N or type-J)

1. Combine the FROM clauses of all query blocks into one FROM clause.

2. AND together the WHERE clauses of all query blocks, replacing IS IN by =.

3. Retain the SELECT clause of the outermost query block.

The result is a canonical query logically equivalent to the original nested query.

Nested query:

SELECT Ri.Ck
FROM Ri
WHERE Ri.Ch IS IN
  (SELECT Rj.Cm FROM Rj)

Canonical query:

SELECT Ri.Ck
FROM Ri, Rj
WHERE Ri.Ch = Rj.Cm
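The three NEST-N-J steps can be tried on the type-J example with sqlite3. The rows below are invented for illustration. One caveat worth keeping in mind: the join form can return the same outer value several times where the nested form returns it once, so the two results are compared as sets here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; column names follow the type-J example.
conn.executescript("""
    CREATE TABLE S  (SNO INTEGER, SNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER, ORIGIN TEXT);
    INSERT INTO S  VALUES (1, 'Smith', 'Rome'), (2, 'Jones', 'Oslo'), (3, 'Blake', 'Rome');
    INSERT INTO SP VALUES (1, 10, 150, 'Rome'), (1, 11, 200, 'Rome'),
                          (2, 10, 300, 'Rome'), (3, 12, 50, 'Rome');
""")

# Original nested (type-J) query.
nested = sorted(row[0] for row in conn.execute("""
    SELECT SNAME FROM S
    WHERE SNO IN (SELECT SNO FROM SP
                  WHERE QTY > 100 AND SP.ORIGIN = S.CITY)
"""))

# Canonical form: FROM clauses combined, WHERE clauses ANDed together,
# IN replaced by =, outermost SELECT clause retained.
canonical = sorted(row[0] for row in conn.execute("""
    SELECT SNAME FROM S, SP
    WHERE S.SNO = SP.SNO AND QTY > 100 AND SP.ORIGIN = S.CITY
"""))
```

With this data Smith qualifies through two SP rows, so the canonical query lists the name twice while the nested query lists it once.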

Algorithm NEST-JA

1. Generate a temporary relation Rt(C1, ..., Cn, Cn+1) from R2 such that Rt.Cn+1 is the result of applying the aggregate function AGG on the Cn+1 column of the R2 tuples which have matching values in R1 for C1, C2, etc.

Original nested query:

SELECT R1.Cn+2
FROM R1
WHERE R1.Cn+1 =
  (SELECT AGG(R2.Cn+1)
   FROM R2
   WHERE R2.C1 = R1.C1 AND
         R2.C2 = R1.C2 AND
         ...
         R2.Cn = R1.Cn)

Temporary relation:

Rt(C1, ..., Cn, Cn+1) =
  (SELECT C1, ..., Cn, AGG(Cn+1)
   FROM R2
   GROUP BY C1, ..., Cn)

2. Transform the inner query block of the initial query by changing all references to R2 columns in join predicates that also reference R1 to the corresponding Rt columns. The result is a type-J nested query, which can be passed to algorithm NEST-N-J for transformation to its canonical equivalent.

SELECT R1.Cn+2
FROM R1
WHERE R1.Cn+1 =
  (SELECT Rt.Cn+1
   FROM Rt
   WHERE Rt.C1 = R1.C1 AND
         Rt.C2 = R1.C2 AND
         ...
         Rt.Cn = R1.Cn)
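As a small worked instance of the two NEST-JA steps, the sketch below applies them to the type-JA example (MAX over SP grouped by ORIGIN). The data and the names RT and MAXPNO are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the type-JA example.
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, PNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, ORIGIN TEXT);
    INSERT INTO P  VALUES (1, 'bolt', 'Rome'), (2, 'nut', 'Rome'), (3, 'cam', 'Oslo');
    INSERT INTO SP VALUES (10, 1, 'Rome'), (11, 2, 'Rome'),
                          (12, 3, 'Oslo'), (13, 1, 'Oslo');
""")

# Step 1: temporary relation Rt, grouped on the join column of the inner relation.
conn.execute("""
    CREATE TEMP TABLE RT AS
    SELECT ORIGIN, MAX(PNO) AS MAXPNO FROM SP GROUP BY ORIGIN
""")

# Step 2: the inner block now reads from Rt, giving a type-J query.
type_j = sorted(row[0] for row in conn.execute("""
    SELECT PNAME FROM P
    WHERE PNO = (SELECT MAXPNO FROM RT WHERE RT.ORIGIN = P.CITY)
"""))

# NEST-N-J then turns the type-J query into its canonical join form.
canonical = sorted(row[0] for row in conn.execute("""
    SELECT PNAME FROM P, RT
    WHERE P.PNO = RT.MAXPNO AND RT.ORIGIN = P.CITY
"""))

# Reference: the original nested query.
original = sorted(row[0] for row in conn.execute("""
    SELECT PNAME FROM P
    WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)
"""))
```

The aggregate is computed once per group instead of once per outer tuple.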

Count bug:

PARTS (PNUM, QOH)
SUPPLY (PNUM, QUAN, SHIPDATE)

SELECT PNUM FROM PARTS WHERE QOH =
  (SELECT COUNT(SHIPDATE) FROM SUPPLY
   WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < 1-1-80)

Parts

PNUM  QOH
3     6
10    1
8     0

Supply

PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
10    2     8-10-81
8     5     5-7-83

Result by TIS

PNUM
10
8

Result by NEST-JA

PNUM
10

Part 8 has no SUPPLY tuples shipped before 1-1-80, so the grouped temporary relation has no row for it and the join drops it, although COUNT over the empty set is 0 and matches QOH = 0.
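The discrepancy can be reproduced with sqlite3 using the Parts and Supply rows above. Two assumptions are made for the sketch: dates are stored as ISO strings, reading the slide dates as month-day-year, and the temporary relation is named T_KIM rather than TEMP to avoid clashing with an SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');
""")

# Nested query: the engine evaluates the correlated COUNT per PARTS tuple.
nested = sorted(row[0] for row in conn.execute("""
    SELECT PNUM FROM PARTS WHERE QOH =
        (SELECT COUNT(SHIPDATE) FROM SUPPLY
         WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < '1980-01-01')
"""))

# NEST-JA: group the inner relation first, then join.
conn.execute("""
    CREATE TEMP TABLE T_KIM AS
    SELECT PNUM AS SUPPNUM, COUNT(SHIPDATE) AS CT
    FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM
""")
kim = sorted(row[0] for row in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, T_KIM
    WHERE PARTS.QOH = T_KIM.CT AND PARTS.PNUM = T_KIM.SUPPNUM
"""))
```

The grouped table never contains a row with CT = 0, so no part with QOH = 0 can survive the join.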

Solution using Outer Join

R

X
A
B

S

Y
B
C
E

R =+ S

X     Y
A     null
B     B
null  C
null  E

Solution with outer joins

TEMP (SUPPNUM, CT) =
  (SELECT PARTS.PNUM, COUNT(SHIPDATE)
   FROM PARTS, SUPPLY
   WHERE SHIPDATE < 1-1-80 AND
         PARTS.PNUM =+ SUPPLY.PNUM
   GROUP BY PARTS.PNUM)

PARTS.PNUM =+ SUPPLY.PNUM (for SHIPDATE < 1-1-80)

Parts.PNUM  Parts.QOH  Supply.PNUM  Supply.QUAN  Supply.SHIPDATE
3           6          3            4            7-3-79
3           6          3            2            10-1-78
10          1          10           1            6-8-78
8           0          null         null         null
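In SQLite the =+ operator of the slides can be approximated with LEFT OUTER JOIN, putting the SHIPDATE restriction in the ON clause so that unmatched PARTS rows are kept. The sketch below is an assumption-laden translation (ISO date strings, the table name T_OJ), not the notation of the original algorithm. It also shows why the count is taken over a SUPPLY column: COUNT over a column skips NULLs, whereas COUNT(*) counts the null-padded row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');
""")

OUTER_JOIN = """
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM
"""

# Temporary relation built from the outer join; unmatched parts get CT = 0.
conn.execute(
    "CREATE TEMP TABLE T_OJ AS "
    "SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT" + OUTER_JOIN
)
temp_rows = sorted(conn.execute("SELECT SUPPNUM, CT FROM T_OJ"))

# Join the temporary relation back to PARTS.
final = sorted(row[0] for row in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, T_OJ
    WHERE PARTS.QOH = T_OJ.CT AND PARTS.PNUM = T_OJ.SUPPNUM
"""))

# COUNT(*) counts the null-padded row, so part 8 would get 1 instead of 0.
count_star = sorted(conn.execute("SELECT PARTS.PNUM, COUNT(*)" + OUTER_JOIN))
```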

TEMP

SUPPNUM  CT
3        2
10       1
8        0

SELECT PNUM
FROM PARTS, TEMP
WHERE PARTS.QOH = TEMP.CT AND
      PARTS.PNUM = TEMP.SUPPNUM

Final Result

PNUM
10
8

Drawbacks:

1. If the subquery has COUNT(*), this will always return a result > 0 because of the outer join. The '*' must be changed to a column name from the inner relation.

2. Duplicates Problem:

Parts

PNUM  QOH
3     2
3     6
10    1
10    0
8     0

Supply

PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78

TEMP (built with the outer join)

SUPPNUM  CT
3        4
10       2
8        0

Result by TIS

PNUM
3
10
8

Our Result

PNUM
8

Solution:

1. Remove duplicates before the join that creates the temporary table is performed.

TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)

2. Use the projection instead of the outer relation in any join required to build the temporary table.

TEMP2(SUPPNUM, CT) =
  (SELECT TEMP1.PNUM, COUNT(SHIPDATE)
   FROM TEMP1, SUPPLY
   WHERE SUPPLY.SHIPDATE < 1-1-80
         AND TEMP1.PNUM =+ SUPPLY.PNUM
   GROUP BY TEMP1.PNUM)

TEMP2

SUPPNUM  CT
3        2
10       1
8        0

Result

PNUM
3
10
8
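The duplicates problem and its fix can be checked side by side with sqlite3, using the Parts and Supply rows of the duplicates example. The helper outer_join_answer is a hypothetical name, dates are assumed to be ISO strings, and LEFT OUTER JOIN stands in for the =+ operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 2), (3, 6), (10, 1), (10, 0), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08');
""")


def outer_join_answer(conn, outer_source):
    """Build the count table from outer_source =+ SUPPLY, then join it to PARTS."""
    rows = conn.execute(f"""
        WITH T(SUPPNUM, CT) AS (
            SELECT O.PNUM, COUNT(SUPPLY.SHIPDATE)
            FROM {outer_source} AS O LEFT OUTER JOIN SUPPLY
                 ON O.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
            GROUP BY O.PNUM)
        SELECT PARTS.PNUM FROM PARTS, T
        WHERE PARTS.QOH = T.CT AND PARTS.PNUM = T.SUPPNUM
    """)
    return sorted(row[0] for row in rows)


# Joining PARTS itself inflates the counts: each duplicate PNUM multiplies them.
with_duplicates = outer_join_answer(conn, "PARTS")

# Joining the duplicate-free projection (TEMP1) gives the counts TIS would see.
with_projection = outer_join_answer(conn, "(SELECT DISTINCT PNUM FROM PARTS)")
```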

Another bug: Relations other than equality

SELECT PNUM FROM PARTS WHERE QOH =
  (SELECT MAX(QUAN) FROM SUPPLY
   WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < 1-1-80)

Kim’s transformation:

TEMP (SUPPNUM, MAXQUAN) =
  (SELECT PNUM, MAX(QUAN)
   FROM SUPPLY
   WHERE SHIPDATE < 1-1-80
   GROUP BY PNUM)

SELECT PNUM
FROM PARTS, TEMP
WHERE QOH = TEMP.MAXQUAN AND TEMP.SUPPNUM < PARTS.PNUM

Problem: MAX is calculated for each SUPPLY.PNUM separately, but the query requires MAX over the set of all SUPPLY tuples whose PNUM is less than the given PARTS.PNUM.

Solution:

1. First join, then aggregate (Kim’s was: first group, then join).

TEMP (SUPPNUM, MAXQUAN) =
  (SELECT PARTS.PNUM, MAX(QUAN)
   FROM PARTS, SUPPLY
   WHERE SHIPDATE < 1-1-80 AND
         SUPPLY.PNUM < PARTS.PNUM
   GROUP BY PARTS.PNUM)

SELECT PNUM
FROM PARTS, TEMP
WHERE PARTS.QOH = TEMP.MAXQUAN AND
      PARTS.PNUM = TEMP.SUPPNUM
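A small sqlite3 experiment shows the difference between grouping first and joining first. The rows below are invented for this sketch (they are not the tables from the count bug example), chosen so that one part has two smaller SUPPLY part numbers; dates are assumed to be ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: part 10 sees SUPPLY rows for both part 3 and part 8.
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS VALUES (3, 6), (8, 4), (10, 4);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'),
        (8, 5, '1979-05-07'), (10, 1, '1978-06-08');
""")


def pnums(sql):
    return sorted(row[0] for row in conn.execute(sql))


# Nested query, evaluated by the engine per PARTS tuple.
nested = pnums("""
    SELECT PNUM FROM PARTS WHERE QOH =
        (SELECT MAX(QUAN) FROM SUPPLY
         WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < '1980-01-01')
""")

# Group first, then join: MAX is taken per SUPPLY.PNUM, which is too fine.
group_then_join = pnums("""
    WITH T(SUPPNUM, MAXQUAN) AS (
        SELECT PNUM, MAX(QUAN) FROM SUPPLY
        WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM)
    SELECT PARTS.PNUM FROM PARTS, T
    WHERE PARTS.QOH = T.MAXQUAN AND T.SUPPNUM < PARTS.PNUM
""")

# Join first, then aggregate: MAX is taken over all smaller SUPPLY.PNUM values.
join_then_group = pnums("""
    WITH T(SUPPNUM, MAXQUAN) AS (
        SELECT PARTS.PNUM, MAX(QUAN) FROM PARTS, SUPPLY
        WHERE SHIPDATE < '1980-01-01' AND SUPPLY.PNUM < PARTS.PNUM
        GROUP BY PARTS.PNUM)
    SELECT PARTS.PNUM FROM PARTS, T
    WHERE PARTS.QOH = T.MAXQUAN AND PARTS.PNUM = T.SUPPNUM
""")
```

With this data, grouping first wrongly admits part 10: its QOH of 4 matches the per-group maximum of part 3, although the maximum over all smaller part numbers is 5.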

Modified Algorithm: NEST-JA2

1. Project the join column of the outer relation, and restrict it with any simple predicates applying to the outer relation.

TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)

2. Create a temporary relation, joining the inner relation with the projection of the outer relation. If the aggregate function is COUNT, the join must be an outer join.

TEMP2(PNUM, SHIPDATE) =
  (SELECT PNUM, SHIPDATE
   FROM SUPPLY
   WHERE SHIPDATE < 1-1-80)

TEMP3(PNUM, CT) =
  (SELECT TEMP1.PNUM, COUNT(TEMP2.SHIPDATE)
   FROM TEMP1, TEMP2
   WHERE TEMP1.PNUM =+ TEMP2.PNUM
   GROUP BY TEMP1.PNUM)


3. Join the outer relation with the temporary relation, according to the transformed version of the original query

SELECT PNUM

FROM PARTS,TEMP3

WHERE PARTS.QOH = TEMP3.CT AND

PARTS.PNUM = TEMP3.PNUM

Processing a General Nested Query: Recursive Approach

procedure nest_g(query_block)
  for each predicate in the WHERE clause of query_block
    if predicate is a nested predicate (i.e. contains an inner query block)
      nest_g(inner_query_block)
      /* Determine type of nesting and call appropriate transformation procedure */
      if nesting is type-JA
        nest-JA2(inner_query_block)
        nest-N-J(query_block, inner_query_block)
      else if nesting is type-A
        nest_a(inner_query_block)
      else /* nesting is type-N or type-J */
        nest-N-J(query_block, inner_query_block)
  return
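The control flow of the recursive procedure can be sketched in Python. This is only a skeleton under simplifying assumptions: QueryBlock is a hypothetical stand-in for a parsed query block, the nesting type is derived from two boolean flags, and the transformations are recorded in a log rather than performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBlock:
    """Hypothetical stand-in for a parsed query block."""
    name: str
    correlated: bool = False   # references a relation of the enclosing block
    aggregated: bool = False   # SELECT clause applies an aggregate function
    inner_blocks: List["QueryBlock"] = field(default_factory=list)


def nesting_type(inner):
    """Classify a nested block as type A, N, J or JA."""
    if inner.correlated:
        return "JA" if inner.aggregated else "J"
    return "A" if inner.aggregated else "N"


def nest_g(block, log):
    """Unnest depth-first, logging which transformation each level would get."""
    for inner in block.inner_blocks:
        nest_g(inner, log)  # deeper levels are handled first
        kind = nesting_type(inner)
        if kind == "JA":
            log.append(("nest-JA2", inner.name))
            log.append(("nest-N-J", block.name, inner.name))
        elif kind == "A":
            log.append(("nest-A", inner.name))
        else:  # type-N or type-J
            log.append(("nest-N-J", block.name, inner.name))
    return log


# Q1 contains a type-JA block Q2, which itself contains a type-N block Q3.
query = QueryBlock(
    "Q1",
    inner_blocks=[
        QueryBlock("Q2", correlated=True, aggregated=True,
                   inner_blocks=[QueryBlock("Q3")]),
    ],
)
trace = nest_g(query, [])
```

The trace lists the innermost transformation first, reflecting that each block is flattened before its parent is examined.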

Advantage :

• Simplicity

Analysis

Modified Kim’s Algorithm:

R.B OP1 TEMP1.COUNT : R.B OP1 0

|TEMP1| < |R OJ S|, hence better than the alternate algorithm.


References:

1. Optimization of Nested SQL Queries Revisited. Richard A. Ganski, Harry K. T. Wong

2. Improved Unnesting Algorithms for Join Aggregate SQL Queries. M. Muralikrishna


Thank You