Practical SQL performance tuning, for developers and DBAs
Kurt Struyf, Competence Partners
Agenda
• One SQL, one access path
• Index, stage1, stage2
• Sort impact
• SQL examples of suboptimal coding and their improvements
• Access path fields in the plan_table
• Other CPU saving techniques
Static SQL: One SQL = One access path
SELECT …
FROM …
WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH
Our table has 4 indexes:
• IX1 on NAME
• IX2 on FIRSTNAME
• IX3 on BIRTHDATE
• IX4 on ZIPCODE
AT BIND TIME DB2 CHOOSES IX3
AT RUN TIME the user only fills out a value for ZIP CODE (the empty search fields are typically filled with the lowest and highest possible values, so their BETWEEN predicates cover the whole table)
AT RUN TIME DB2 still USES IX3, which doesn't filter anything
ONE SQL = ONE access path
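A small Python simulation of why IX3 filters nothing here. It assumes, hypothetically, that the program fills an empty search field with the lowest and highest possible values, and it uses invented data for a 10,000-row table; it only counts the rows each index range would visit, not real DB2 cost.

```python
import random
from datetime import date, timedelta

random.seed(42)

# Hypothetical test data: 10,000 rows with a BIRTHDATE and a 4-digit ZIPCODE.
ROWS = [
    {
        "birthdate": date(1940, 1, 1) + timedelta(days=random.randrange(25_000)),
        "zipcode": random.randrange(1000, 10_000),
    }
    for _ in range(10_000)
]


def rows_in_range(column, low, high):
    """Count the rows a range scan on an index over `column` would have to visit."""
    return sum(1 for row in ROWS if low <= row[column] <= high)


# BIRTHDATE was left empty, so the program passes the widest possible range (assumption).
via_ix3 = rows_in_range("birthdate", date(1, 1, 1), date(9999, 12, 31))
# ZIPCODE was filled in as 2000, so low and high are both 2000.
via_ix4 = rows_in_range("zipcode", 2000, 2000)

print(f"IX3 (the access path chosen at bind time) visits {via_ix3} rows")
print(f"IX4 (the column the user actually filtered on) would visit {via_ix4} rows")
```

The bound access path is fixed at bind time, so the program pays for the IX3 scan on every execution, whatever the user fills in.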
Index, Stage1, Stage2
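[Diagram: the DB2 layers RDS and DM above the indexes, with Index, Stage 1 (evaluated in DM) and Stage 2 (evaluated in RDS) marking where each kind of predicate is handled]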
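Indexable
• Keep it positive and simple!
  – '=' : equal to
  – '>' : greater than
  – '<' : less than
  – '>=' : greater than or equal to
  – '<=' : less than or equal to
  – 'LIKE'
  – 'IN'
  – 'BETWEEN'
• The predicate MUST ALSO BE a Boolean term: if it is false for a row, the whole WHERE clause is false for that row (in practice: it is ANDed, not part of an OR)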
Matching columns
Matching columns is an indication of how well an index is used:
– more matching columns = better index use
– matching always starts with the first index column
– after an '=' predicate, or after one 'IN' predicate, matching can continue with the next index column
Example: index on (Name, Clientno, Salarycode). How many matching columns does each predicate get?
1. Name = 'Smith' AND Clientno = 20 AND Salarycode = 56
2. Name = 'Smith' OR Clientno > 20 AND Salarycode = 56
3. Name IN ('Smith', 'Doe') AND Clientno > 20 AND Salarycode = 56
4. Name IN ('Smith', 'Doe') AND Clientno IN (20, 30) AND Salarycode > 0
5. Name <> 'Smith' AND Clientno = 56
6. Clientno = 56 AND Salarycode = 0
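The matching rules above can be expressed as a small Python function. It is a simplification that follows the slide's rules literally (including the single IN-list rule); it is not the DB2 optimizer, DB2 versions differ in how they treat multiple IN-lists, and the real value should always be read from MATCHCOLS in the PLAN_TABLE.

```python
# Operators the slides list as indexable (a simplification of DB2's real rules).
INDEXABLE = {"=", ">", "<", ">=", "<=", "LIKE", "IN", "BETWEEN"}


def matching_columns(index_columns, predicates, boolean_terms=True):
    """Count matching columns following the slide's rules literally.

    index_columns: the index key columns, in order.
    predicates: {column: operator}, all assumed to be ANDed together.
    boolean_terms: pass False when the predicates are combined with OR.
    """
    if not boolean_terms:
        return 0  # predicates inside an OR are not Boolean terms
    matched = 0
    in_list_used = False
    for column in index_columns:  # always start with the first index column
        op = predicates.get(column)
        if op not in INDEXABLE:
            break  # no indexable predicate on this column: matching stops here
        if op == "IN":
            if in_list_used:
                break  # the slide allows only one IN-list to keep matching
            in_list_used = True
            matched += 1
            continue
        matched += 1
        if op != "=":
            break  # a range predicate matches, but ends the matching
    return matched


INDEX = ("NAME", "CLIENTNO", "SALARYCODE")
EXAMPLES = {
    1: ({"NAME": "=", "CLIENTNO": "=", "SALARYCODE": "="}, True),
    2: ({"NAME": "=", "CLIENTNO": ">", "SALARYCODE": "="}, False),  # contains an OR
    3: ({"NAME": "IN", "CLIENTNO": ">", "SALARYCODE": "="}, True),
    4: ({"NAME": "IN", "CLIENTNO": "IN", "SALARYCODE": ">"}, True),
    5: ({"NAME": "<>", "CLIENTNO": "="}, True),
    6: ({"CLIENTNO": "=", "SALARYCODE": "="}, True),
}
results = {n: matching_columns(INDEX, preds, boolean) for n, (preds, boolean) in EXAMPLES.items()}
print(results)
```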
Stage 1
• Keep it positive and simple, but no index is used!
  – '=' : equal to
  – '>' : greater than
  – '<' : less than
  – '>=' : greater than or equal to
  – '<=' : less than or equal to
  – 'LIKE'
  – 'IN'
  – 'BETWEEN'
• Keep it simple but not positive! (e.g. '<>')
• Keep it simple but not Boolean! (e.g. a predicate inside an OR)
Stage 2
• All the rest!!
• All functions, such as
  – SUBSTR
  – CONCAT
  – CHAR()
• Mismatching data types
  – Colchar_6 = '1234567'
• Host variable checking
  – AND :HV1 = 5
• Decryption
• CURRENT DATE BETWEEN col1 AND col2
• Sorting
In COBOL checking “stage 3”
• NEVER (ab)use this!
• All DB2 columns that CAN be checked in SQL SHOULD be checked in SQL
• So BETTER a Stage 2 predicate than NO predicate
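The difference between filtering in SQL and filtering in the program can be sketched with Python's built-in sqlite3 module. sqlite3 has no Stage 1/Stage 2 split, so this only models the part of the argument that carries over: how many rows are handed to the program. The table and data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE client (name TEXT, city TEXT)")
# Hypothetical data: 5,000 clients, 1 in 50 lives in a city starting with 'MI'.
con.executemany(
    "INSERT INTO client VALUES (?, ?)",
    [(f"NAME{i:04d}", "MILANO" if i % 50 == 0 else "GENT") for i in range(5000)],
)

# "Stage 3": fetch every row and filter in the application program.
all_rows = con.execute("SELECT name, city FROM client ORDER BY name").fetchall()
kept_in_program = [row for row in all_rows if row[1][:2] == "MI"]

# Filter in SQL, even with a function (a Stage 2 predicate in DB2 terms).
kept_in_sql = con.execute(
    "SELECT name, city FROM client WHERE substr(city, 1, 2) = 'MI' ORDER BY name"
).fetchall()

print(f"rows passed to the program: stage 3 = {len(all_rows)}, SQL = {len(kept_in_sql)}")
```

Both variants produce the same answer; only the SQL variant avoids shipping the 4,900 non-qualifying rows to the program.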
Sort impact
• Sort YES: at OPEN CURSOR all qualifying rows are retrieved and sorted (e.g. 3000); the program fetches 20 rows to build the first screen, the user finds his info and aborts.
  RESULT: 2980 rows retrieved needlessly
• Sort NO: the first row is retrieved at FETCH time; the program fetches 20 rows to build the first screen, the user finds his info and aborts.
  RESULT: ONLY 20 rows retrieved from DB2, and no sort cost
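A minimal Python model of the two scenarios, assuming a hypothetical cursor that counts the rows it reads. With a sort, every row must be read before the first one can be returned; without a sort (rows already arrive in the requested order, for example through an index), reading stops after the first screen.

```python
from itertools import islice


class CountingCursor:
    """Hypothetical cursor that counts how many rows it had to read."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.rows_read = 0

    def scan(self):
        for key in range(self.n_rows):
            self.rows_read += 1
            yield key


def first_screen_with_sort(cursor, screen_size=20):
    # A sort cannot return its first row before it has read every input row.
    ordered = sorted(cursor.scan())
    return ordered[:screen_size]


def first_screen_without_sort(cursor, screen_size=20):
    # Rows already come in the requested order, so reading stops after one screen.
    return list(islice(cursor.scan(), screen_size))


with_sort = CountingCursor(3000)
without_sort = CountingCursor(3000)
screen_a = first_screen_with_sort(with_sort)
screen_b = first_screen_without_sort(without_sort)
print(with_sort.rows_read, without_sort.rows_read)
```

The first screen is identical in both cases; only the number of rows read differs.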
Select *
• SELECT * should almost never be used
• SELECT ONLY the COLUMNS that are needed!
• Reasons:
  – Program maintenance
  – CPU cost per extra column
  – The SORT file becomes bigger
  – Possibly no index-only access
Select *
• Even for: WHERE … EXISTS (SELECT * …)
  Better: WHERE … EXISTS (SELECT 1 …)
• SELECT …, col5, … WHERE col5 = 'AB'
  Better: SELECT …, 'AB', … WHERE col5 = 'AB'
• SELECT col1, col2 … ORDER BY col2
  Better: SELECT col1 … ORDER BY col2, if col2 is only selected for the ORDER BY
Mismatching datatype
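Mismatching data types are Stage 2
• COL_char(6) = 'ABCDEF ' (literal longer than the column)
  Better: COL_char(6) = 'ABCDEF'
• COL_dec = :hostvar-int
  Better: COL_dec = DECIMAL(:hostvar-int)
• COL_int < 10.5
  Better: COL_int <= 10 (INTEGER(10.5) truncates to 10, so COL_int < INTEGER(10.5) or COL_int < 10 would lose the rows where COL_int = 10)
• COL_dec = :hostvar-dec + 2
  Better: COL_dec = DECIMAL(:hostvar-dec + 2)
  Or compute the value in COBOL first:
    COMPUTE hostvar2-dec = hostvar-dec + 2
    … COL_dec = :hostvar2-dec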
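A quick Python check of the integer rewrite, using Python's int() truncation as a stand-in for DB2's INTEGER() truncation (an assumption of this sketch): for an integer column, COL_int < 10.5 still selects 10, so the literal has to become <= 10. The last lines mirror the idea of computing hostvar-dec + 2 in the program before the SQL runs.

```python
from decimal import Decimal

col_int = range(0, 21)  # hypothetical integer column values 0..20

original = [v for v in col_int if v < Decimal("10.5")]        # COL_int < 10.5
truncated = [v for v in col_int if v < int(Decimal("10.5"))]  # COL_int < INTEGER(10.5), i.e. < 10
rewritten = [v for v in col_int if v <= 10]                   # COL_int <= 10

# Host-side precomputation instead of arithmetic inside the predicate
# (mirrors COMPUTE hostvar2-dec = hostvar-dec + 2).
hostvar_dec = Decimal("123.45")
hostvar2_dec = hostvar_dec + 2

print(original == rewritten, original == truncated, hostvar2_dec)
```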
Other easy improvements
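• :hv BETWEEN col1 AND col2
  Better: col1 <= :hv AND col2 >= :hv
• SUBSTR(name,1,2) = 'MI'
  Better: name LIKE 'MI%'
  But be careful with '%MI%': a leading wildcard cannot be used for index matching
• SUBSTR(name,5,2) = 'MI'
  Better: denormalize, or use a V9 index on expression
• COL_date <> '9999-12-31'
  Better: COL_date < '9999-12-31'
• COL_int <> 0
  Better: COL_int > 0 (only if the column never holds negative values)
• COL <> :hv
  Better: COL IN ( , , , , , ) (the remaining values)
• COL NOT <= 5
  Better: COL > 5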
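The BETWEEN rewrite must keep the direction of the comparisons: :hv BETWEEN col1 AND col2 means col1 <= :hv AND col2 >= :hv. The sketch below checks this with sqlite3 on hypothetical data, and also shows why COL_int <> 0 may only become COL_int > 0 when the column holds no negative values. It verifies result sets only; it says nothing about DB2 access paths.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tariff (id INTEGER, col1 INTEGER, col2 INTEGER, col_int INTEGER)")
# Hypothetical validity ranges [col1, col2] and a signed integer column (-3..3).
con.executemany(
    "INSERT INTO tariff VALUES (?, ?, ?, ?)",
    [(i, i % 37, i % 37 + i % 11, (i % 7) - 3) for i in range(500)],
)

hv = 20
between = con.execute(
    "SELECT id FROM tariff WHERE :hv BETWEEN col1 AND col2 ORDER BY id", {"hv": hv}
).fetchall()
rewritten = con.execute(
    "SELECT id FROM tariff WHERE col1 <= :hv AND col2 >= :hv ORDER BY id", {"hv": hv}
).fetchall()

# COL_int <> 0 is only equivalent to COL_int > 0 if there are no negative values.
not_equal = con.execute("SELECT COUNT(*) FROM tariff WHERE col_int <> 0").fetchone()[0]
greater = con.execute("SELECT COUNT(*) FROM tariff WHERE col_int > 0").fetchone()[0]

print(len(between), between == rewritten, not_equal, greater)
```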
More easy improvements
• Col1 = 'A' OR Col1 = 'B'
  Better: Col1 IN ('A', 'B')
• Col1 >= :hv1 AND Col1 <= :hv2
  Better: Col1 BETWEEN :hv1 AND :hv2
• Col1 = :hv1 OR (Col1 > :hv1 AND Col2 = :hv2)
  Better: Col1 >= :hv1 AND (Col1 = :hv1 OR (Col1 > :hv1 AND Col2 = :hv2))
• Col1 = :hva (always 5)
  Better: Col1 = 5
• :hv = 5
  Better: check this IN COBOL!!!
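The added Col1 >= :hv1 does not change the result, because both branches of the OR already imply it; its purpose is to give DB2 a Boolean term it can use for index matching. A sqlite3 sketch on a hypothetical 10 x 10 grid of values can confirm that the result set is identical (it cannot show the DB2 access path).

```python
import sqlite3
from itertools import product

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
# Hypothetical data: every (col1, col2) combination of 0..9.
con.executemany("INSERT INTO t VALUES (?, ?)", list(product(range(10), range(10))))

params = {"hv1": 4, "hv2": 7}
original = con.execute(
    "SELECT col1, col2 FROM t "
    "WHERE col1 = :hv1 OR (col1 > :hv1 AND col2 = :hv2) ORDER BY 1, 2",
    params,
).fetchall()
with_boolean_term = con.execute(
    "SELECT col1, col2 FROM t "
    "WHERE col1 >= :hv1 AND (col1 = :hv1 OR (col1 > :hv1 AND col2 = :hv2)) "
    "ORDER BY 1, 2",
    params,
).fetchall()

print(len(original), original == with_boolean_term)
```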
Even More easy improvements
• Col1 NOT BETWEEN 10 AND 50
  Better: … Col1 < 10 …
          UNION ALL
          … Col1 > 50 …
• Existence checking:
  SELECT 1
  FROM table
  WHERE col1 = :hv
  FETCH FIRST 1 ROW ONLY
• Col1 NOT IN ('A', 'B', 'C')
  Better, if possible (i.e. when all possible values are known): Col1 IN (… the rest); this will be cheaper even when the list is bigger
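The rewrites on this slide can likewise be checked for identical results with sqlite3, under two assumptions: the hypothetical CODE column only ever contains 'A' to 'F' (that is what makes IN (… the rest) possible), and sqlite's LIMIT 1 stands in for DB2's FETCH FIRST 1 ROW ONLY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, col1 INTEGER, code TEXT)")
CODES = "ABCDEF"  # hypothetical: the complete set of values CODE can take
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, i % 61, CODES[i % len(CODES)]) for i in range(1000)],
)

# Col1 NOT BETWEEN 10 AND 50  versus  two ranges combined with UNION ALL
not_between = con.execute(
    "SELECT id FROM t WHERE col1 NOT BETWEEN 10 AND 50 ORDER BY id"
).fetchall()
union_all = con.execute(
    "SELECT id FROM t WHERE col1 < 10 "
    "UNION ALL "
    "SELECT id FROM t WHERE col1 > 50 ORDER BY id"
).fetchall()

# NOT IN ('A', 'B', 'C')  versus  IN (the rest of the known values)
not_in = con.execute(
    "SELECT id FROM t WHERE code NOT IN ('A', 'B', 'C') ORDER BY id"
).fetchall()
in_rest = con.execute(
    "SELECT id FROM t WHERE code IN ('D', 'E', 'F') ORDER BY id"
).fetchall()

# Existence check: stop at the first qualifying row.
found = con.execute("SELECT 1 FROM t WHERE col1 = ? LIMIT 1", (42,)).fetchone() is not None

print(len(not_between), not_between == union_all, not_in == in_rest, found)
```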
Determine Access Path
• Optimization Service Center
  – Newest generation of Visual Explain
• PLAN_TABLE
  – See next slide
  – Might require some practice
  – Not everything is in it
• DSN_STATEMENT_TABLE
  – Contains the "cost columns"
DB2 Plan_table
SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO ;
QBLOCKNO  PROGNAME  PLANNO  METHOD  TNAME    ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1         DSNESM68  1       0       AATEHA1  I           2          AAX0EHA1    N
1         DSNESM68  2       1       AATEHB1  I           2          AAX0EHB1    N
1         DSNESM68  3       3                            0                      N
• QBLOCKNO: the query block number; it shows how many query blocks are needed to resolve the query
  – General rule: more blocks = worse performance
• PROGNAME: the program/package name
Access path: PLANNO, METHOD
• PLANNO: the step number; it shows the number of steps AND the sequence in which a query is resolved
  – General rule: more steps = worse performance
• METHOD: the kind of access that is done
  – 0: first access
  – 1: nested loop join
  – 3: extra sort needed
• TNAME: the name of the table being accessed
• ACCESSTYPE: how that table is accessed
…
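To practise reading the PLAN_TABLE, a small Python sketch can turn the example rows into sentences. The METHOD codes 0, 1 and 3 and ACCESSTYPE I come from these slides; METHOD 2 (merge scan join), 4 (hybrid join) and ACCESSTYPE R (table space scan) are added from the DB2 for z/OS documentation. Any other code is printed as-is, so this is not a complete decoder.

```python
METHOD = {
    0: "first table accessed",
    1: "nested loop join",
    2: "merge scan join",
    3: "extra sort",
    4: "hybrid join",
}
ACCESSTYPE = {"I": "index", "R": "table space scan"}

# The example rows above, as
# (QBLOCKNO, PLANNO, METHOD, TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY)
ROWS = [
    (1, 1, 0, "AATEHA1", "I", 2, "AAX0EHA1", "N"),
    (1, 2, 1, "AATEHB1", "I", 2, "AAX0EHB1", "N"),
    (1, 3, 3, "", "", 0, "", "N"),
]


def describe(row):
    qblockno, planno, method, tname, accesstype, matchcols, accessname, indexonly = row
    text = f"block {qblockno} step {planno}: " + METHOD.get(method, f"METHOD {method}")
    if tname:  # sort steps have no table name
        text += f" on {tname} via " + ACCESSTYPE.get(accesstype, f"ACCESSTYPE {accesstype}")
        if accesstype == "I":
            text += f" {accessname} ({matchcols} matching columns"
            text += ", index only)" if indexonly == "Y" else ")"
    return text


for row in ROWS:
    print(describe(row))
```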
DSN_Statement_Table
• Amongst others :
COST_CATEGORY:
– A: Indicates that DB2 had enough information to make a cost estimate without using default values.
– B: Indicates that some condition exists for which DB2 was forced to use default values.
PROCMS: The estimated processor cost, in milliseconds, for the SQL statement
PROCSU: The estimated processor cost, in service units, for the SQL statement
Multi Row fetch
• Technique to save up to 60% of DB2 CPU
• Easy to use
• Fetches a rowset into an array
• Program can control size of rowset
Multi Row Fetch
• To be able to use this, the cursor must be DECLAREd for rowset positioning, for example:
    EXEC SQL DECLARE cursor-name CURSOR WITH ROWSET POSITIONING FOR
        SELECT column1, column2 FROM table-name
    END-EXEC
  instead of
    EXEC SQL DECLARE cursor-name CURSOR FOR
        SELECT column1, column2 FROM table-name
    END-EXEC
• Then you can FETCH multiple rows at a time from the cursor
Multi Row Fetch
• On the FETCH statement the number of rows requested can be specified, for example:
    EXEC SQL FETCH NEXT ROWSET FROM cursor-name
        FOR :rowset-size ROWS
        INTO …
    END-EXEC
  instead of
    EXEC SQL FETCH cursor-name INTO … END-EXEC
• The INTO targets are host-variable arrays (OCCURS), one element per row of the rowset
• The rowset size can be defined as a constant or a variable, for example:
    01 rowset-size PIC S9(09) COMP-5.
Multi Row fetch
• Do not mix single-row and multi-row FETCH for the same cursor in one program
• Be aware of compiler limits:
  – elementary item: max. 16 MB
  – complete WORKING-STORAGE: max. 128 MB
• The last FETCH on a rowset can be 'incomplete': it may return fewer rows than requested
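A Python model of a rowset loop that handles an incomplete last rowset. The cursor class is a hypothetical stand-in; it mimics the documented DB2 behavior that the last FETCH returns SQLCODE +100 together with the remaining rows, with SQLERRD(3) holding the number of rows actually returned. The key point is to process those rows before leaving the loop.

```python
SQLCODE_OK = 0
SQLCODE_NOT_FOUND = 100  # +100: end of the result table


class FakeRowsetCursor:
    """Hypothetical stand-in for a cursor declared WITH ROWSET POSITIONING."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetch_next_rowset(self, rowset_size):
        """Mimic FETCH NEXT ROWSET ... FOR n ROWS: return (rows, sqlcode, sqlerrd3)."""
        chunk = self._rows[self._pos:self._pos + rowset_size]
        self._pos += len(chunk)
        sqlcode = SQLCODE_OK if len(chunk) == rowset_size else SQLCODE_NOT_FOUND
        return chunk, sqlcode, len(chunk)


def process_all(cursor, rowset_size):
    processed = []
    rows_per_fetch = []
    while True:
        rows, sqlcode, sqlerrd3 = cursor.fetch_next_rowset(rowset_size)
        # Handle the SQLERRD(3) rows first: the last rowset may be incomplete
        # and still arrive together with SQLCODE +100.
        processed.extend(rows[:sqlerrd3])
        rows_per_fetch.append(sqlerrd3)
        if sqlcode == SQLCODE_NOT_FOUND:
            return processed, rows_per_fetch


processed, rows_per_fetch = process_all(FakeRowsetCursor(range(45)), rowset_size=20)
print(len(processed), rows_per_fetch)
```

A loop that stops as soon as it sees +100, without looking at SQLERRD(3), would silently drop the last 5 rows in this example.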
Multi Row Fetch
• Performance results may differ with the rowset size:
  – < 5 rows: poor performance (worse than single-row fetch)
  – 10 – 100 rows: best performance
  – > 100 rows: no further improvement
• The following data is based on processing 1 million rows (CPU seconds):

                              Via row   Via rowset   Gain on DB2 (CPU seconds)
  FETCH                            16            6   10 (-60%)
  FETCH + UPDATE via row           76           66   10 (-15%)
  FETCH + UPDATE via rowset        76           60   16 (-35%)
Sequences
• Easy, fast and cheap way to generate unique numbers if:
  – holes are allowed
  – the order isn't important
• Use: NEXT VALUE FOR yy.xxxxxxxx
BASIC SYNTAX : CREATE SEQUENCE yy.xxxxxxxx START WITH 1 INCREMENT BY 1
NO MINVALUE NO MAXVALUE NO CYCLE CACHE 200;
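Why holes and ordering are the price of CACHE 200 can be shown with a simplified Python model: each DB2 member (two members here, as in a hypothetical data sharing group) reserves a block of 200 values, and a restart throws away the unused part of its block. This is a model of the idea, not of DB2's internals.

```python
class CachedSequence:
    """Simplified model of CREATE SEQUENCE ... CACHE n (not DB2's real internals)."""

    def __init__(self, start=1, increment=1, cache=200):
        self.next_block_start = start  # first value of the next block to hand out
        self.increment = increment
        self.cache = cache
        self.cached = {}  # member name -> unused values cached in that member

    def next_value(self, member):
        values = self.cached.get(member)
        if not values:
            # Reserve a new block of `cache` values for this member.
            values = [self.next_block_start + i * self.increment for i in range(self.cache)]
            self.next_block_start += self.cache * self.increment
            self.cached[member] = values
        return values.pop(0)

    def restart(self, member):
        # Unused cached values are lost on a restart: these become the holes.
        self.cached.pop(member, None)


seq = CachedSequence(cache=200)
a1 = seq.next_value("DB2A")  # 1
b1 = seq.next_value("DB2B")  # 201: the second member reserved the next block
a2 = seq.next_value("DB2A")  # 2: handed out after 201, so not in time order
seq.restart("DB2A")
a3 = seq.next_value("DB2A")  # 401: values 3..200 are never used
print(a1, b1, a2, a3)
```

The cache is what makes the sequence cheap (one catalog update per 200 values in this model), and it is also exactly what produces the holes and the out-of-order values.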
Sequences
[Chart: Effect of concurrency on elapsed time. Duration vs. amount of jobs (1 to 3), own table vs. sequence object]
[Chart: Effect on CPU usage. CPU vs. amount of jobs (1 to 3), own table vs. sequence object]
Better response times
Less cpu need
Questions ?