
Practical SQL performance tuning, for developers and DBAs

Kurt Struyf, Competence Partners

Agenda

• One SQL, one access path

• Index, stage1, stage2

• Sort impact

• SQL examples of suboptimal coding and its improvements

• Access path fields in the plan_table

• Other CPU saving techniques

Static SQL: One SQL = One access path

SELECT …
FROM …
WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH

Our table has 4 indexes :

IX1 on NAME
IX2 on FIRSTNAME
IX3 on BIRTHDATE
IX4 on ZIPCODE

At bind time: DB2 chooses IX3.

At run time: the user only fills in a value for ZIPCODE.

At run time: DB2 still uses IX3, which doesn't filter anything.

ONE SQL = ONE access path

Index, Stage1, Stage2

[Diagram: DB2 with its RDS and DM components; predicates are evaluated at the Index, Stage1 or Stage2 level]

Indexable

• Keep it positive and simple!
• '=' : equal to
• '>' : greater than
• '<' : less than
• '>=' : greater than or equal to
• '<=' : less than or equal to
• 'LIKE'
• 'IN'
• 'BETWEEN'

MUST ALSO BE BOOLEAN


Matching columns indicate how well an index is used:
- the more matching columns, the better the index is used
- matching always starts with the first index column
- matching can continue past an '=' predicate and past one 'IN' predicate

Example: Index on (Name, Clientno, Salarycode)

Predicate                                                           Matching
------------------------------------------------------------------  --------
1. Name = 'Smith' AND Clientno = 20 AND Salarycode = 56
2. Name = 'Smith' OR Clientno > 20 AND Salarycode = 56
3. Name IN ('Smith', 'Doe') AND Clientno > 20 AND Salarycode = 56

Matching Columns

Predicate                                                           Matching
------------------------------------------------------------------  --------
4. Name IN ('Smith', 'Doe') AND Clientno IN (20,30) AND Salarycode > 0
5. Name <> 'Smith' AND Clientno = 56
6. Clientno = 56 AND Salarycode = 0
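The rule of thumb for matching columns can be sketched as a small toy model. This is only an illustration of the three rules stated above, not the DB2 optimizer: the function name `matching_columns` and the operator sets are hypothetical, the model handles AND-ed predicates only (an OR is outside its scope), and it assumes that a second 'IN' ends matching without being counted.

```python
# Toy model of the matching-columns rule of thumb; NOT the real DB2 optimizer.
# Assumption: range-type predicates count as one matching column and then stop.
RANGE_OPS = {">", "<", ">=", "<=", "BETWEEN", "LIKE"}


def matching_columns(index_columns, predicates):
    """Count matching columns for a set of AND-ed predicates.

    predicates maps a column name to its operator, e.g. {"Name": "="}.
    """
    matched = 0
    in_list_used = False
    for column in index_columns:  # always start with the first index column
        op = predicates.get(column)
        if op == "=":
            matched += 1  # matching continues past '='
        elif op == "IN" and not in_list_used:
            matched += 1  # matching continues past one 'IN'
            in_list_used = True
        elif op in RANGE_OPS:
            matched += 1  # a range predicate matches, then matching stops
            break
        else:
            break  # no usable predicate on this column (or a second 'IN')
    return matched


INDEX = ("Name", "Clientno", "Salarycode")
print(matching_columns(INDEX, {"Name": "=", "Clientno": "=", "Salarycode": "="}))  # 3
print(matching_columns(INDEX, {"Name": "IN", "Clientno": ">", "Salarycode": "="}))  # 2
print(matching_columns(INDEX, {"Clientno": "=", "Salarycode": "="}))  # 0
```

In the last call the first index column (Name) has no predicate, so matching never starts, whatever the other predicates look like.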

Stage 1

• Keep it positive and simple, but no index!
• '=' : equal to
• '>' : greater than
• '<' : less than
• '>=' : greater than or equal to
• '<=' : less than or equal to
• 'LIKE'
• 'IN'
• 'BETWEEN'


• Keep it simple but not positive!
• Keep it simple but not Boolean!

Stage 2

• All the rest!
• All functions, such as
– SUBSTR
– CONCAT
– CHAR()
• Mismatching data types
– Colchar_6 = '1234567'
• Host variable checking
– AND :HV1 = 5
• Decryption
• Current date between col1 and col2
• Sorting


In COBOL checking “stage 3”

• NEVER (ab)use this!

• All DB2 columns that CAN be checked in SQL SHOULD be checked in SQL

• So a Stage2 predicate is BETTER than NO predicate


Sort impact

• Sort YES:
– at open cursor all rows are retrieved (e.g. 3000)
– fetch 20 rows to build the first screen; the user finds his info and aborts
– RESULT: 2980 rows retrieved needlessly

• Sort NO:
– at fetch time the first row is retrieved
– fetch 20 rows to build the first screen; the user finds his info and aborts
– RESULT: ONLY 20 rows retrieved from DB2, no sort cost
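The difference between the two cases can be imitated with a small simulation. This is a sketch only, with hypothetical function names; a Python generator stands in for the cursor, and the counter shows how many rows were read before the first screen could be shown.

```python
from itertools import islice


def read_rows(total, counter):
    """Yield rows one at a time, counting how many were actually read."""
    for row_id in range(total):
        counter["read"] += 1
        yield row_id


def first_screen_without_sort(total, screen_size):
    counter = {"read": 0}
    cursor = read_rows(total, counter)          # nothing is read at open
    screen = list(islice(cursor, screen_size))  # rows are read only when fetched
    return screen, counter["read"]


def first_screen_with_sort(total, screen_size):
    counter = {"read": 0}
    work_file = sorted(read_rows(total, counter))  # open reads and sorts every row
    screen = work_file[:screen_size]
    return screen, counter["read"]


print(first_screen_without_sort(3000, 20)[1])  # 20
print(first_screen_with_sort(3000, 20)[1])     # 3000
```

With the numbers from the slide, the sorted variant reads all 3000 rows to show 20 of them, which is the 2980 needless rows mentioned above.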

Select *

• SELECT * should almost never be used

• SELECT ONLY the COLUMNS that are needed!

• Reasons:
– Program maintenance
– CPU cost per extra column
– SORT file becomes bigger
– Index-only access may no longer be possible

Select *

• Even for: where … exists (select * …)
Better: where … exists (select 1 …)

• Select … col5, … where col5 = 'AB'
Better: Select … 'AB' … where col5 = 'AB'

• Select col1, col2 … order by col2
Better: Select col1 … order by col2 (if col2 was selected just for the order by)

Mismatching datatype

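Mismatching data types are Stage 2

Stage 2 (mismatch)             Better
-----------------------------  ------------------------------------
COL_char(6) = 'ABCDEF '        COL_char(6) = 'ABCDEF'
COL_dec = :hostvar-int         COL_dec = decimal(:hostvar-integer)
COL_int < 10.5                 COL_int < integer(10.5)
                               COL_int < 10
COL_dec = :hostvar-dec + 2     COL_dec = decimal(:hostvar-dec + 2)
                               move hostvar-dec + 2 to hostvar2-dec
                               COL_dec = :hostvar2-dec

Note that truncating 10.5 to 10 also moves the boundary: COL_int < 10.5 accepts the value 10, whereas COL_int < 10 does not, so use COL_int <= 10 when the original result must be preserved.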

Other easy improvements

Instead of                     Better
-----------------------------  ------------------------------------
:hv between col1 and col2      col1 <= :hv and col2 >= :hv
Substr(name,1,2) = 'MI'        name like 'MI%'
                               (but be careful with '%MI%')
Substr(name,5,2) = 'MI'        denormalize, or V9 index on expression
COL_date <> '9999-12-31'       COL_date < '9999-12-31'
COL_int <> 0                   COL_int > 0 (when values cannot be negative)
COL <> :hv                     COL in ( , , , , , )
COL not <= 5                   COL > 5

More easy improvements

Instead of                     Better
-----------------------------  ------------------------------------
Col1 = 'A' or Col1 = 'B'       Col1 in ('A', 'B')
Col1 >= :hv1 and Col1 <= :hv2  Col1 between :hv1 and :hv2
Col1 = :hv1 or                 Col1 >= :hv1 and
Col1 > :hv1 and Col2 = :hv2    (Col1 = :hv1 or
                               Col1 > :hv1 and Col2 = :hv2)
Col1 = :hva (always 5)         Col1 = 5
:hv = 5                        IN COBOL !!!

Even More easy improvements

Instead of                     Better
-----------------------------  ------------------------------------
Col1 not between 10 and 50     … col1 < 10 …
                               union all
                               … col1 > 50 …
Existence checking             select 1
                               from table
                               where col1 = :hv
                               fetch first 1 row only
Col1 not in ('A', 'B', 'C')    if possible, Col1 in (… the rest)
                               will be cheaper, even when the list is bigger
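Before replacing a predicate by a cheaper form, it is worth checking that both forms return the same rows. The sketch below does this for a few of the rewrites above. It runs on SQLite from the Python standard library, not on DB2, so it only verifies result equivalence and says nothing about stage 1 or stage 2 behaviour; the table `t`, its columns and the helper names are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(c1, c1 + 10, code) for c1 in range(0, 100, 5) for code in "ABCD"],
)


def rows(sql, params=()):
    """Run a query and return its rows sorted, so two results can be compared."""
    return sorted(conn.execute(sql, params).fetchall())


def same_result(original, rewrite, params=()):
    return rows(original, params) == rows(rewrite, params)


checks = [
    ("SELECT * FROM t WHERE col1 NOT BETWEEN 10 AND 50",
     "SELECT * FROM t WHERE col1 < 10 UNION ALL SELECT * FROM t WHERE col1 > 50",
     ()),
    ("SELECT * FROM t WHERE code = 'A' OR code = 'B'",
     "SELECT * FROM t WHERE code IN ('A', 'B')",
     ()),
    ("SELECT * FROM t WHERE :hv BETWEEN col1 AND col2",
     "SELECT * FROM t WHERE col1 <= :hv AND col2 >= :hv",
     {"hv": 42}),
    # Only valid because the test data contains no codes other than A, B, C, D.
    ("SELECT * FROM t WHERE code NOT IN ('A', 'B', 'C')",
     "SELECT * FROM t WHERE code IN ('D')",
     ()),
]

for original, rewrite, params in checks:
    print(same_result(original, rewrite, params), "-", original)
```

The last pair shows why the 'NOT IN' rewrite is marked "if possible": it depends on knowing the complete list of values that can occur in the column.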

Determine Access Path

• Optimization Service Center
– Newest generation of Visual Explain

• Plan_table
– See next slide
– Might require some practice
– Not everything is in it

• DSN_statement_table
– Contains the "cost columns"

DB2 Plan_table

SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO;

QBLOCKNO  PROGNAME  PLANNO  METHOD  TNAME    ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
--------  --------  ------  ------  -------  ----------  ---------  ----------  ---------  --------
1         DSNESM68  1       0       AATEHA1  I           2          AAX0EHA1    N
1         DSNESM68  2       1       AATEHB1  I           2          AAX0EHB1    N
1         DSNESM68  3       3                            0                      N

• Qblockno: indicates the number of query blocks necessary to resolve the query
– General rule: more blocks = lower performance

• Progname: represents the program/package name

Access path: planno, method

• Planno: the number of steps AND the sequence in which a query is resolved
– General rule: more steps = lower performance

• Method: expresses what kind of access is done
– 0 : First access
– 1 : Nested Loop Join
– 3 : Extra sort needed

• Tname: name of the table to be accessed

• Access type: how the data is accessed

…
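As a reading aid, the sample PLAN_TABLE rows shown earlier can be turned into plain sentences with a few lines of code. This is a sketch with hypothetical names (`METHOD_MEANING`, `describe`); the METHOD meanings are the three listed above, and the reading of ACCESSTYPE 'I' as index access is an assumption added here, since the slides do not list the access type codes.

```python
# METHOD codes as described above; other values are not covered by this sketch.
METHOD_MEANING = {
    0: "first access",
    1: "nested loop join",
    3: "extra sort needed",
}

# Sample rows copied from the PLAN_TABLE output shown earlier:
# (planno, method, tname, accesstype, matchcols, accessname, indexonly)
ROWS = [
    (1, 0, "AATEHA1", "I", 2, "AAX0EHA1", "N"),
    (2, 1, "AATEHB1", "I", 2, "AAX0EHB1", "N"),
    (3, 3, "", "", 0, "", "N"),
]


def describe(row):
    """Return a one-line description of a single access path step."""
    planno, method, tname, accesstype, matchcols, accessname, _indexonly = row
    step = f"step {planno}: {METHOD_MEANING.get(method, 'other method')}"
    if accesstype == "I":  # assumption: 'I' means access through an index
        step += f", {tname} via index {accessname} ({matchcols} matching columns)"
    return step


for row in ROWS:
    print(describe(row))
```

Read this way, the sample shows two tables joined with a nested loop join, each accessed through an index with two matching columns, followed by an extra sort.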

DSN_Statement_Table

• Amongst others :

COST_CATEGORY:
A: Indicates that DB2 had enough information to make a cost estimate without using default values.
B: Indicates that some condition exists for which DB2 was forced to use default values.

PROCMS: The estimated processor cost, in milliseconds, for the SQL statement

PROCSU: The estimated processor cost, in service units, for the SQL statement

Multi Row fetch

• Technique to save up to 60% of DB2 CPU

• Easy to use

• Fetches a rowset into an array

• Program can control size of rowset

Multi Row Fetch

• To be able to use this, the cursor should be DECLAREd for rowset positioning, for example:

EXEC SQL
  DECLARE cursor-name CURSOR WITH ROWSET POSITIONING FOR
  SELECT column1, column2 FROM table-name
END-EXEC

instead of

EXEC SQL
  DECLARE cursor-name CURSOR FOR
  SELECT column1, column2 FROM table-name
END-EXEC

Then you can FETCH multiple rows at a time from the cursor

Multi Row Fetch

On the FETCH statement, the number of rows requested can be specified, for example:

EXEC SQL
  FETCH NEXT ROWSET FROM cursor-name
  FOR :rowset-size ROWS
  INTO …
END-EXEC

instead of

EXEC SQL
  FETCH cursor-name INTO …
END-EXEC

• The rowset size can be defined as a constant or a variable, for example:

01 rowset-size PIC S9(09) COMP-5.

Multi Row fetch

• Do not use single and multiple row fetch for the same cursor in one program

• Be aware of compiler limits
– elementary item: max. 16 MB
– complete working storage: max. 128 MB

• Last FETCH on a rowset can be 'incomplete'
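The 'incomplete' last rowset is easy to overlook in the processing loop. The sketch below imitates rowset fetching with `fetchmany` from Python's standard sqlite3 module. It is an analogy rather than DB2 multi-row fetch, and the table, the helper `fetch_in_rowsets` and the row counts are invented for the illustration. The point is that the program has to process the number of rows actually returned, not the rowset size it asked for.

```python
import sqlite3


def fetch_in_rowsets(cursor, rowset_size):
    """Yield lists of at most rowset_size rows until the cursor is exhausted."""
    while True:
        rowset = cursor.fetchmany(rowset_size)
        if not rowset:
            break
        yield rowset


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(250)]
)

cursor = conn.execute("SELECT column1, column2 FROM t ORDER BY column1")
sizes = [len(rowset) for rowset in fetch_in_rowsets(cursor, 100)]
print(sizes)  # [100, 100, 50]
```

In an embedded SQL program the same care applies to the host variable arrays: after the last FETCH only the first part of the array holds new rows, and the rest still contains data from the previous rowset.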

Multi Row Fetch

• Performance results may differ:
– < 5 rows: poor performance (worse than before)
– 10 – 100 rows: best performance
– > 100 rows: no further improvement

• The following figures are based on processing 1 million rows (CPU seconds).

                           Via row  Via rowset  Gain on DB2 (CPU seconds)
-------------------------  -------  ----------  -------------------------
FETCH                      16       6           10 (-60%)
FETCH + UPDATE via row     76       66          10 (-15%)
FETCH + UPDATE via rowset  76       60          16 (-35%)

Sequences

• Easy, fast and cheap way to generate unique numbers if:
– Holes are allowed
– The order isn't important

• Use: the "next value for yy.xxxxxxxx" expression

BASIC SYNTAX:

CREATE SEQUENCE yy.xxxxxxxx
  START WITH 1
  INCREMENT BY 1
  NO MINVALUE
  NO MAXVALUE
  NO CYCLE
  CACHE 200;
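Why holes have to be allowed follows from the CACHE option: values are reserved in blocks, and a block that is lost is never handed out again. The class below is a toy model of that behaviour under simple assumptions (positive increment, a single member, a restart that discards the cached block). The class and method names are hypothetical and it does not reproduce how DB2 implements sequences internally.

```python
class CachedSequence:
    """Hands out numbers from a cached block; a restart discards the block."""

    def __init__(self, start=1, increment=1, cache=200):
        self.increment = increment
        self.cache = cache
        self.next_block_start = start  # first value of the next block to reserve
        self.block = iter(())          # values currently held in the cache

    def _reserve_block(self):
        first = self.next_block_start
        self.next_block_start = first + self.cache * self.increment
        self.block = iter(range(first, self.next_block_start, self.increment))

    def next_value(self):
        for value in self.block:       # take the next cached value, if any
            return value
        self._reserve_block()          # cache empty: reserve a new block
        return next(self.block)

    def restart(self):
        """Simulate losing the in-memory cache, e.g. after a restart."""
        self.block = iter(())


seq = CachedSequence(start=1, increment=1, cache=200)
print([seq.next_value() for _ in range(3)])  # [1, 2, 3]
seq.restart()
print(seq.next_value())                      # 201
```

The values 4 to 200 are never used in this run: the numbers stay unique, but they are neither gap-free nor a reliable record of how many were issued.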

Sequences

[Chart: Effect of concurrency on elapsed time. Duration against the number of jobs (1 to 3), own table compared with sequence object.]

[Chart: Effect on CPU usage. CPU against the number of jobs (1 to 3), own table compared with sequence object.]

• Better response times

• Less CPU needed

Questions ?
