CBO Basics: Cardinality
Sidney Chen@zha-dba
Agenda
• Cardinality
• Cardinality under various predicates
• Case study: incorrect high value
Cardinality
The estimated number of rows a query is expected to return.
Cardinality = number of rows in table x predicate selectivity
create table audience as
select mod(rownum-1,12) + 1 month_no,
       mod(rownum-1,10) + 1 city_no
from all_objects
where rownum <= 1200;
Imagine that 1200 people attend the DBA weekly meeting. Their birthdays fall in 12 months and they come from 10 cities, and the data is evenly distributed.
sid@CS10G> select month_no, count(*) from audience group by month_no order by month_no;
  MONTH_NO   COUNT(*)
---------- ----------
         1        100
         2        100
         3        100
         4        100
         5        100
         6        100
         7        100
         8        100
         9        100
        10        100
        11        100
        12        100
12 rows selected.
sid@CS10G> select city_no, count(*) from audience group by city_no order by city_no;
   CITY_NO   COUNT(*)
---------- ----------
         1        120
         2        120
         3        120
         4        120
         5        120
         6        120
         7        120
         8        120
         9        120
        10        120
10 rows selected.
sid@CS10G> @ind sid.audience;
OWNER TABLE_NAME     BLOCKS   NUM_ROWS AVG_ROW_LEN
----- ---------- ---------- ---------- -----------
SID   AUDIENCE            5       1200           6
sid@CS10G> @desc sid.audience;
Column Name  NUM_DISTINCT    DENSITY NUM_BUCKETS Low  High
------------ ------------ ---------- ----------- ---- ----
MONTH_NO               12 .083333333           1 1    12
CITY_NO                10         .1           1 1    10
Critical info
• NDK: number of distinct keys
• Density = 1/NDK (0.1 = 1/10, 0.083333333 = 1/12)
• NUM_BUCKETS = 1: no histogram gathered
• NUM_BUCKETS > 1: histogram gathered
select month_no from audience where month_no=12;
Cardinality: 1200 * (1/12) = 100
sid@CS10G> select month_no from audience where month_no=12;
100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2423062965

-------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   100 |   300 |     3   (0)|
|*  1 |  TABLE ACCESS FULL| AUDIENCE |   100 |   300 |     3   (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MONTH_NO"=12)
select month_no from audience where city_no=1;
Cardinality: 1200 * (1/10) = 120
sid@CS10G> select month_no from audience where city_no=1;
120 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2423062965

-------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   120 |   720 |     3   (0)|
|*  1 |  TABLE ACCESS FULL| AUDIENCE |   120 |   720 |     3   (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CITY_NO"=1)
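The two equality estimates above follow the same arithmetic, which can be sketched in a few lines of Python. This is only an illustration of the formula on the slides (rows x 1/NDK, no histogram); the function name is made up for this example and is not an Oracle API.

```python
def equality_cardinality(num_rows: int, num_distinct: int) -> int:
    """Estimate rows for `col = value` without a histogram: num_rows * (1 / NDK)."""
    density = 1 / num_distinct          # DENSITY column in the statistics
    return round(num_rows * density)    # the plan shows a rounded row count


# Figures taken from the AUDIENCE table statistics shown above.
month_estimate = equality_cardinality(1200, 12)  # where month_no = 12
city_estimate = equality_cardinality(1200, 10)   # where city_no = 1
print(month_estimate, city_estimate)
```

Both results match the Rows column of the two execution plans, because the data is evenly distributed and the predicate values are inside the low/high range.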
select month_no from audience where month_no > 9;
Cardinality: 1200 * ((12-9)/(12-1)) = 327
sid@CS10G> select month_no from audience where month_no > 9;
300 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2423062965

-------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   327 |   981 |     3   (0)|
|*  1 |  TABLE ACCESS FULL| AUDIENCE |   327 |   981 |     3   (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MONTH_NO">9)
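The gap between the estimate (327) and the actual row count (300) comes from the range formula treating month_no as a continuous value between the low and high statistics. A minimal Python sketch of that arithmetic, using the formula from the slide; the function name is hypothetical:

```python
def greater_than_cardinality(num_rows: int, low: float, high: float, value: float) -> int:
    """Estimate rows for `col > value` as num_rows * (high - value) / (high - low)."""
    selectivity = (high - value) / (high - low)
    return round(num_rows * selectivity)


estimate = greater_than_cardinality(1200, low=1, high=12, value=9)
actual = 3 * 100  # months 10, 11 and 12, with 100 rows each
print(estimate, actual)
```

The estimate uses 3/11 of the range, while the real data holds 3 of 12 discrete values, so the optimizer overestimates slightly.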
Equality: If Out of range
explain plan set statement_id = '12' for select month_no from audience where month_no = 12;
explain plan set statement_id = '13' for select month_no from audience where month_no = 13;
explain plan set statement_id = '14' for select month_no from audience where month_no = 14;
explain plan set statement_id = '15' for select month_no from audience where month_no = 15;
explain plan set statement_id = '16' for select month_no from audience where month_no = 16;
explain plan set statement_id = '17' for select month_no from audience where month_no = 17;
explain plan set statement_id = '18' for select month_no from audience where month_no = 18;
explain plan set statement_id = '19' for select month_no from audience where month_no = 19;
explain plan set statement_id = '20' for select month_no from audience where month_no = 20;
sid@CS10G> select statement_id, cardinality from plan_table where id=1 order by statement_id;
STATEMENT_ID CARDINALITY
------------ -----------
12                   100
13                    91
14                    82
15                    73
16                    64
17                    55
18                    45
19                    36
20                    27
9 rows selected.
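The cardinalities above shrink in a straight line as the value moves away from the high value. One formula that reproduces every row of this output is the in-range density scaled down by the distance beyond the range, measured as a fraction of (high - low). The Python sketch below is an illustration fitted to these numbers, not a statement of the optimizer's internal code; the function name and the clamp at zero are assumptions made for the example.

```python
def out_of_range_cardinality(num_rows, num_distinct, low, high, value):
    """Equality estimate that decays linearly once `value` leaves [low, high]."""
    density = 1 / num_distinct
    if low <= value <= high:
        return round(num_rows * density)
    distance = value - high if value > high else low - value
    # Assumption: the scaling factor never goes below zero.
    decay = max(0.0, 1 - distance / (high - low))
    return round(num_rows * density * decay)


# AUDIENCE.MONTH_NO: 1200 rows, 12 distinct values, low 1, high 12.
estimates = {v: out_of_range_cardinality(1200, 12, 1, 12, v) for v in range(12, 21)}
for value, rows in estimates.items():
    print(value, rows)
```

Each step beyond 12 removes 1/11 of the in-range estimate of 100, which is why the numbers fall by roughly 9 per step.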
Range: If Out of range
explain plan set statement_id = '12' for select month_no from audience where month_no > 12;
explain plan set statement_id = '13' for select month_no from audience where month_no > 13;
explain plan set statement_id = '14' for select month_no from audience where month_no > 14;
explain plan set statement_id = '15' for select month_no from audience where month_no > 15;
explain plan set statement_id = '16' for select month_no from audience where month_no > 16;
explain plan set statement_id = '17' for select month_no from audience where month_no > 17;
explain plan set statement_id = '18' for select month_no from audience where month_no > 18;
explain plan set statement_id = '19' for select month_no from audience where month_no > 19;
explain plan set statement_id = '20' for select month_no from audience where month_no > 20;
sid@CS10G> select statement_id, cardinality from plan_table where id=1 order by statement_id;
STATEMENT_ID CARDINALITY
------------ -----------
12                   100
13                    91
14                    82
15                    73
16                    64
17                    55
18                    45
19                    36
20                    27
9 rows selected.
The farther a value is from the low/high range, the less likely you are to find data
Bad news
• If you have a sequence-based or time-based column in a predicate (such as a last-modified date, last_mod_dt) and haven't been keeping the statistics up to date,
• the cardinality estimate will drop as time passes if you use equality or range predicates on that column
sid@CS10G> select month_no from audience where month_no > 16;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2423062965

-------------------------------------+-----------------------------------+
| Id  | Operation          | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |         |       |       |     3 |           |
| 1   |  TABLE ACCESS FULL | AUDIENCE|    64 |   192 |     3 |  00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter("MONTH_NO">16)
Using 10053 event to confirm
sid@CS10G> @53on
alter session set events '10053 trace name context forever, level 1';
Session altered.
sid@CS10G> explain plan for select month_no from audience where month_no > 16;
Explained.
sid@CS10G> @53off
sid@CS10G> @tracefile
TRACEFILE
----------------------------------------------------------------------------------------------------
/home/u02/app/oracle/product/11.1.0/db_1/admin/cs10g/udump/cs10g_ora_24947.trc

SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Column (#1): MONTH_NO(NUMBER)
    AvgLen: 3.00 NDV: 12 Nulls: 0 Density: 0.083333 Min: 1 Max: 12
  Using prorated density: 0.05303 of col #1 as selectivity of out-of-range value pred
  Table: AUDIENCE  Alias: AUDIENCE
    Card: Original: 1200  Rounded: 64  Computed: 63.64  Non Adjusted: 63.64
  -----------------------------------------
  END   Single Table Cardinality Estimation
  -----------------------------------------
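The figures in this trace can be checked by hand. The short Python sketch below takes the Density, Min, Max and Original values printed in the trace and applies a linear proration for the distance beyond Max; the proration formula is an assumption that happens to match the trace, not something the trace states.

```python
density = 0.083333   # Density from the trace (1/12)
low, high = 1, 12    # Min and Max from the trace
value = 16           # predicate value, 4 beyond Max
num_rows = 1200      # Card: Original

# Assumed proration: scale the density by how far the value lies outside the range.
prorated_density = density * (1 - (value - high) / (high - low))
computed = num_rows * prorated_density

print(f"prorated density: {prorated_density:.5f}")
print(f"computed cardinality: {computed:.2f}, rounded: {round(computed)}")
```

The printed values line up with "Using prorated density: 0.05303" and "Rounded: 64 Computed: 63.64" in the trace.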
Between: If Out of range
explain plan set statement_id = '12' for select month_no from audience where month_no between 13 and 15;
explain plan set statement_id = '14' for select month_no from audience where month_no between 14 and 16;
explain plan set statement_id = '15' for select month_no from audience where month_no between 15 and 17;
explain plan set statement_id = '16' for select month_no from audience where month_no between 13 and 20;
explain plan set statement_id = '17' for select month_no from audience where month_no between 14 and 21;
explain plan set statement_id = '18' for select month_no from audience where month_no between 15 and 22;
explain plan set statement_id = '19' for select month_no from audience where month_no between 16 and 23;
sid@CS10G> select statement_id, cardinality from plan_table where id=1 order by statement_id;
STATEMENT_ID CARDINALITY
------------ -----------
12                   100
13                   100
14                   100
15                   100
16                   100
17                   100
18                   100
19                   100
20                   100
9 rows selected.
Case: incorrect low/high value
SELECT count('1') RECCOUNT
FROM Test_ILM_INTERACTION t0
JOIN Test_ILM_INTERACTION_TYPE t1 ON t0.INTERACTION_TYP = t1.INTERACTION_TYP
JOIN Test_ILM_INTERACTION_REF t3 ON t0.interaction_uuid = t3.interaction_uuid
WHERE t1.IS_VIEWABLE = 1
  AND ((t0.DOMAIN_NME = 'DOMAIN_A')
       or (T0.DOMAIN_NME = 'DOMAIN_B' AND T0.APPLICATION_NME = 'APPLICATION_C'))
  AND (t3.REF_CDE = 'BK_NUMBER' AND t3.REF_KEY_VALUE = '2389301444')
  AND t0.INTERACTION_DT BETWEEN TO_DATE('01-06-2011 16:00:00', 'DD-MM-YYYY HH24:MI:SS')
                            AND TO_DATE('16-06-2011 15:59:59', 'DD-MM-YYYY HH24:MI:SS')
--------------------------------------------------------------------+----------------
| Id | Operation                      | Name                         | Rows  | Cost  |
--------------------------------------------------------------------+----------------
| 0  | SELECT STATEMENT               |                              |       |    13 |
| 1  |  SORT AGGREGATE                |                              |     1 |       |
| 2  |   NESTED LOOPS                 |                              |     1 |    13 |
| 3  |    NESTED LOOPS                |                              |     1 |    12 |
| 4  |     INDEX RANGE SCAN           | Test_ILM_INTERACTION_IDX3    |     4 |     4 |
| 5  |     INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_REF_PK  |     1 |     2 |
| 6  |    TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION_TYPE    |     1 |     1 |
| 7  |     INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_TYPE_PK |     1 |     0 |
--------------------------------------------------------------------+----------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T0"."INTERACTION_DT">=TO_DATE('2011-06-01 05:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              "T0"."INTERACTION_DT"<=TO_DATE('2011-06-16 04:59:59', 'yyyy-mm-dd hh24:mi:ss'))
       filter("T0"."DOMAIN_NME"='DOMAIN_A' OR "T0"."APPLICATION_NME"='APPLICATION_C' AND
              "T0"."DOMAIN_NME"='DOMAIN_B')
   5 - access("T0"."INTERACTION_UUID"="T3"."INTERACTION_UUID" AND "T3"."REF_CDE"='BL_NUMBER' AND
              "T3"."REF_KEY_VALUE"='2389301444')
   6 - filter("T1"."IS_VIEWABLE"=1)
   7 - access("T0"."INTERACTION_TYP"="T1"."INTERACTION_TYP")
sys@CS2PRD> @desc Testowner.Test_ilm_interaction
Column Name        NUM_DISTINCT    DENSITY Low                  High
------------------ ------------ ---------- -------------------- --------------------
INTERACTION_DT          7583898 1.3186E-07 2010-05-07 23:45:47  2011-05-31 15:31:35
sys@CS2PRD> @ind Testowner.Test_ilm_interaction
TABLE_NAME              INDEX_NAME                  POS# COLUMN_NAME
----------------------- --------------------------- ---- ----------------
Test_ILM_INTERACTION    Test_ILM_INTERACTION_IDX3      1 INTERACTION_DT
                                                       2 COMPANY_ID
                                                       3 APPLICATION_NME
                                                       4 DOMAIN_NME
                                                       5 INTERACTION_TYP
                                                       6 INTERACTION_UUID
10053 trace
Column (#4): INTERACTION_DT(DATE)
  AvgLen: 8.00 NDV: 7583898 Nulls: 0 Density: 1.3186e-07 Min: 2455325 Max: 2455714
SINGLE TABLE ACCESS PATH
  Using prorated density: 1.3151e-07 of col #4 as selectivity of out-of-range value pred
  Table: Test_ILM_INTERACTION  Alias: T0
    Card: Original: 53138344  Rounded: 4  Computed: 3.75  Non Adjusted: 3.75
  Access Path: index (IndexOnly)
    Index: Test_ILM_INTERACTION_IDX3
    resc_io: 4.00  resc_cpu: 29886
    ix_sel: 1.3151e-07  ix_sel_with_filters: 7.0628e-08
    Cost: 4.00  Resp: 4.00  Degree: 1
  ****** trying bitmap/domain indexes ******
  Best:: AccessPath: IndexRange  Index: Test_ILM_INTERACTION_IDX3
         Cost: 4.00  Degree: 1  Resp: 4.00  Card: 3.75  Bytes: 0
select to_date(2455714,'J') max_date from dual;
MAX_DATE
-------------------
2011-06-01 00:00:00
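The trace prints the Min and Max of a DATE column as Julian day numbers, which is why the to_date(..., 'J') query is needed to read them. The same conversion can be sketched in Python without a database. The constant below is the standard offset between a Julian day number and Python's date ordinal; the helper name is made up for this example.

```python
from datetime import date, datetime

# date(1, 1, 1).toordinal() == 1 corresponds to Julian day number 1721426.
JULIAN_OFFSET = 1721425


def julian_to_date(julian_day: int) -> date:
    """Convert a Julian day number (as printed in the 10053 trace) to a calendar date."""
    return date.fromordinal(julian_day - JULIAN_OFFSET)


trace_min = julian_to_date(2455325)
trace_max = julian_to_date(2455714)
print(trace_min, trace_max)

# The query asks for rows from 2011-06-01 16:00:00 onwards, which is later than
# the stored high value of the column (2011-05-31 15:31:35 in the @desc output).
query_start = datetime(2011, 6, 1, 16, 0, 0)
stored_high = datetime(2011, 5, 31, 15, 31, 35)
print(query_start > stored_high)
```

Because the whole BETWEEN range starts after the stored high value, the optimizer treats the predicate as out of range and falls back to the prorated density seen in the trace.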
Solution: dbms_stats.set_column_stats

m_high := ADD_MONTHS(m_high, 1);
m_val_array := DBMS_STATS.DATEARRAY(m_low, m_high);

dbms_stats.prepare_column_values(
    srec     => m_statrec,
    datevals => m_val_array
);

dbms_stats.set_column_stats(
    ownname       => NULL,
    tabname       => '&&TABLE_NAME',
    colname       => '&&COLUMN_NAME',
    distcnt       => m_distcnt,
    density       => m_density,
    nullcnt       => m_nullcnt,
    srec          => m_statrec,
    avgclen       => m_avgclen,
    no_invalidate => false
);

Please see hack_stats_3.sql for the full script.
sys@CS2PRD> @desc Testowner.Test_ilm_interaction
Column Name        NUM_DISTINCT    DENSITY Low                  High
------------------ ------------ ---------- -------------------- --------------------
INTERACTION_DT          7583898 1.3186E-07 2010-05-07 23:45:47  2011-06-30 15:31:35
----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                          | Rows  | Cost (%CPU)|
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                               |     1 |    14   (0)|
|   1 |  SORT AGGREGATE                 |                               |     1 |            |
|   2 |   NESTED LOOPS                  |                               |     1 |    14   (0)|
|   3 |    NESTED LOOPS                 |                               |     1 |    13   (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION_REF      |     1 |    10   (0)|
|*  5 |      INDEX RANGE SCAN           | Test_ILM_INTERACTION_REF_IDX1 |     9 |     4   (0)|
|*  6 |     TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION          |     1 |     3   (0)|
|*  7 |      INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_PK       |     1 |     2   (0)|
|*  8 |    TABLE ACCESS BY INDEX ROWID  | Test_ILM_INTERACTION_TYPE     |     1 |     1   (0)|
|*  9 |     INDEX UNIQUE SCAN           | Test_ILM_INTERACTION_TYPE_PK  |     1 |     0   (0)|
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."REF_CDE"='BK_NUMBER' AND "T3"."REF_KEY_VALUE"='2389301444')
   6 - filter("T0"."INTERACTION_DT">=TO_DATE('2011-06-01 16:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              ("T0"."DOMAIN_NME"='DOMAIN_A' OR "T0"."APPLICATION_NME"='APPLICATION_C' AND
              "T0"."DOMAIN_NME"='DOMAIN_B') AND
              "T0"."INTERACTION_DT"<=TO_DATE('2011-06-16 15:59:59', 'yyyy-mm-dd hh24:mi:ss'))
   7 - access("T0"."INTERACTION_UUID"="T3"."INTERACTION_UUID")
   8 - filter("T1"."IS_VIEWABLE"=1)
   9 - access("T0"."INTERACTION_TYP"="T1"."INTERACTION_TYP")
10053 trace
***************************************
SINGLE TABLE ACCESS PATH
  Column (#9): DOMAIN_NME(VARCHAR2)
    AvgLen: 12.00 NDV: 10 Nulls: 53692 Density: 9.3545e-09
    Histogram: Freq  #Bkts: 10  UncompBkts: 5973  EndPtVals: 10
  Column (#2): APPLICATION_NME(VARCHAR2)
    AvgLen: 17.00 NDV: 30 Nulls: 0 Density: 9.3451e-09
    Histogram: Freq  #Bkts: 30  UncompBkts: 5979  EndPtVals: 30
  Column (#4): INTERACTION_DT(DATE)
    AvgLen: 8.00 NDV: 7583898 Nulls: 0 Density: 1.3186e-07 Min: 2455325 Max: 2455744
  Table: Test_ILM_INTERACTION  Alias: T0
    Card: Original: 53138344  Rounded: 1024006  Computed: 1024006.31  Non Adjusted: 1024006.31
  Access Path: index (IndexOnly)
    Index: Test_ILM_INTERACTION_IDX3
    resc_io: 28694.00  resc_cpu: 574312799
    ix_sel: 0.035829  ix_sel_with_filters: 0.019271
    Cost: 28722.69  Resp: 28722.69  Degree: 1
  ****** trying bitmap/domain indexes ******
  Best:: AccessPath: IndexRange  Index: Test_ILM_INTERACTION_IDX3
         Cost: 28722.69  Degree: 1  Resp: 28722.69  Card: 1024006.31  Bytes: 0
***************************************
select to_date(2455744,'J') max_date from dual;
MAX_DATE
-------------------
2011-07-01 00:00:00
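With the high value moved forward, the date range is now inside the low/high range and the estimate jumps from about 4 rows to about one million. The Python sketch below checks the trace figures under two assumptions that are not stated in the source: that the adjusted high value is 2011-06-30 15:31:35 (ADD_MONTHS applied to 2011-05-31 15:31:35), and that the in-range selectivity is roughly the width of the requested range divided by (high - low).

```python
from datetime import datetime

num_rows = 53138344  # Card: Original in both traces

low = datetime(2010, 5, 7, 23, 45, 47)
high = datetime(2011, 6, 30, 15, 31, 35)       # assumed adjusted high value
range_start = datetime(2011, 6, 1, 16, 0, 0)
range_end = datetime(2011, 6, 16, 15, 59, 59)

# Assumed in-range formula: requested width over the full low/high width.
date_selectivity = (range_end - range_start) / (high - low)
print(f"date selectivity: {date_selectivity:.6f}")

# Cardinality = rows x combined selectivity, using ix_sel_with_filters from each trace.
card_before = num_rows * 7.0628e-08   # stale high value, out-of-range proration
card_after = num_rows * 0.019271      # adjusted high value
print(f"before: {card_before:.2f}  after: {card_after:.0f}")
```

The date selectivity comes out close to the ix_sel of 0.035829 in the second trace, and the two products are close to the Computed values of 3.75 and 1024006.31 (the small difference in the second comes from the rounded selectivity printed in the trace).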
Performance after tuning
The functional scripts

File_Name                     Function
10053_bad_plan_for_Test.trc   10053 trace file output for the case study before tuning
10053_good_plan_for_Test.trc  10053 trace file output for the case study after tuning
53off.sql                     turn off the 10053 event
53on.sql                      turn on the 10053 event
demo_script.sql               the demo scripts; please try them before the sharing session
desc.sql                      describe the table statistics, including readable low/high values
hack_stats_3.sql              increase the high_value by one month and update the statistics manually with dbms_stats.set_column_stats
ind.sql                       display the table/index statistics
raw_to_value.sql              create the function used by desc.sql
tracefile.sql                 display the trace file name for the current session
Thanks
Q&A
--------------------------------------------------------------------+----------------
| Id | Operation                      | Name                         | Rows  | Cost  |
--------------------------------------------------------------------+----------------
| 0  | SELECT STATEMENT               |                              |       |    13 |
| 1  |  SORT AGGREGATE                |                              |     1 |       |
| 2  |   NESTED LOOPS                 |                              |     1 |    13 |
| 3  |    NESTED LOOPS                |                              |     1 |    12 |
| 4  |     INDEX RANGE SCAN           | Test_ILM_INTERACTION_IDX3    |     4 |     4 |
| 5  |     INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_REF_PK  |     1 |     2 |
| 6  |    TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION_TYPE    |     1 |     1 |
| 7  |     INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_TYPE_PK |     1 |     0 |
--------------------------------------------------------------------+----------------

----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                          | Rows  | Cost (%CPU)|
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                               |     1 |    14   (0)|
|   1 |  SORT AGGREGATE                 |                               |     1 |            |
|   2 |   NESTED LOOPS                  |                               |     1 |    14   (0)|
|   3 |    NESTED LOOPS                 |                               |     1 |    13   (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION_REF      |     1 |    10   (0)|
|*  5 |      INDEX RANGE SCAN           | Test_ILM_INTERACTION_REF_IDX1 |     9 |     4   (0)|
|*  6 |     TABLE ACCESS BY INDEX ROWID | Test_ILM_INTERACTION          |     1 |     3   (0)|
|*  7 |      INDEX UNIQUE SCAN          | Test_ILM_INTERACTION_PK       |     1 |     2   (0)|
|*  8 |    TABLE ACCESS BY INDEX ROWID  | Test_ILM_INTERACTION_TYPE     |     1 |     1   (0)|
|*  9 |     INDEX UNIQUE SCAN           | Test_ILM_INTERACTION_TYPE_PK  |     1 |     0   (0)|
----------------------------------------------------------------------------------------------
B-tree Access Cost
cost = blevel
     + ceiling(leaf_blocks * effective index selectivity)
     + ceiling(clustering_factor * effective table selectivity)
     = blevel
     + ceiling(leaf_blocks * ix_sel)
     + ceiling(clustering_factor * ix_sel_with_filters)

Index Stats::
  Index: Test_ILM_INTERACTION_IDX3  Col#: 4 10 2 9 7 1
  LVLS: 3  #LB: 800771  #DK: 51629818  LB/K: 1.00  DB/K: 1.00  CLUF: 27495746.00
Access Path: index (IndexOnly)
  Index: Test_ILM_INTERACTION_IDX3
  resc_io: 4.00  resc_cpu: 29886
  ix_sel: 1.3151e-07  ix_sel_with_filters: 7.0734e-08
  Cost: 4.00  Resp: 4.00  Degree: 1
sys@CS10G> select 3 + ceil(800771 * 0.00000013151) cost from dual;
      COST
----------
         4
Index only, no need to access table
Index Stats::
  Index: Test_ILM_INTERACTION_REF_IDX1  Col#: 2 3
  LVLS: 3  #LB: 872428  #DK: 10567436  LB/K: 1.00  DB/K: 5.00  CLUF: 60057635.00
Access Path: index (AllEqRange)
  Index: Test_ILM_INTERACTION_REF_IDX1
  resc_io: 10.00  resc_cpu: 74724
  ix_sel: 9.4630e-08  ix_sel_with_filters: 9.4630e-08
  Cost: 10.00  Resp: 10.00  Degree: 1
sys@CS10G> select 3 + ceil(872428 * 0.00000009463) + ceil(60057635 * 0.00000009463) cost from dual;
      COST
----------
        10
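The two cost calculations above can be wrapped into one small Python function so the formula is easy to replay with other index statistics. This is a sketch of the B-tree access cost formula given on the slide; the function name and the index_only flag are made up for the example.

```python
import math


def btree_access_cost(blevel, leaf_blocks, clustering_factor,
                      ix_sel, ix_sel_with_filters, index_only=False):
    """blevel + ceil(leaf_blocks * ix_sel) + ceil(clustering_factor * ix_sel_with_filters)."""
    cost = blevel + math.ceil(leaf_blocks * ix_sel)
    if not index_only:
        # Table visits are only costed when the index alone cannot answer the query.
        cost += math.ceil(clustering_factor * ix_sel_with_filters)
    return cost


# Test_ILM_INTERACTION_IDX3: index-only access, figures from the trace above.
idx3_cost = btree_access_cost(3, 800771, 27495746, 1.3151e-07, 7.0734e-08, index_only=True)

# Test_ILM_INTERACTION_REF_IDX1: index range scan followed by table access.
ref_idx1_cost = btree_access_cost(3, 872428, 60057635, 9.4630e-08, 9.4630e-08)

print(idx3_cost, ref_idx1_cost)
```

The results agree with the Cost: 4.00 and Cost: 10.00 lines in the two Access Path sections.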