How MySQL Chooses the Execution Plan
MySQL Index: How MySQL Chooses the Execution Plan
Li Xinhe @2016 July
You’ve Made a Great Choice!
Understanding indexing is crucial for both devs and DBAs
Poor index choices are responsible for a large portion of production problems
Indexing is not rocket science
Maybe not for Optimizer
Source code: almost 2M lines
Code shipped per year: “5 lines”
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
Quiz: From the Garena Test for Software Developers
Which of the following queries can fully utilize the composite index "INDEX(a, b)" on the columns "a" and "b" in the "user" table? ______
A. SELECT * FROM user WHERE a=0 AND b=0;
B. SELECT * FROM user WHERE a=0 OR b=0;
C. SELECT * FROM user WHERE a>0 AND b=0;
D. SELECT * FROM user WHERE a=0 AND b>0;
Quiz: Your Answer
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;
Quiz: My Answer
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;
Official answer: AD
My answer: A, AD, ACD, or ABCD, depending on the scenario
MySQL & Index
What are indexes for:
Speed up data access in the database
Help to enforce constraints (UNIQUE, FOREIGN KEY)
Types of indexes:
BTree: the majority of indexes we deal with in MySQL
RTree
HASH
FULLTEXT
B+Tree Example
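Indexes in MyISAM vs InnoDB
MyISAM:
Index entries point to a physical offset in the data file
All indexes are equivalent (primary and secondary are structured the same way)
InnoDB:
The clustered index (primary key) stores the row data in its leaf pages, not a pointer
Secondary indexes store the primary key value as the pointer to the row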
Indexes
Multiple-column indexes, or composite indexes
KEY `index1` (`a`,`b`)
Still one B+Tree index
Index query vs post-filter
The storage engine (InnoDB) uses the index to find rows, then the MySQL server filters them further if needed
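To make the "one B+Tree" and "index query vs post-filter" points concrete, here is a minimal Python sketch. It models a composite index as a single sorted list, which is an assumption for illustration only and not how InnoDB stores pages. The table contents and the function names are hypothetical.

```python
from bisect import bisect_right

INF = float("inf")

# Hypothetical rows of a table with columns (id, a, b).
rows = [(1, 0, 5), (2, 1, 0), (3, 0, 0), (4, 2, 0), (5, 0, 9), (6, 1, 7)]

# A composite index on (a, b) is ONE ordered structure: entries are sorted
# by a first, then by b, and each entry carries the row id.
index_ab = sorted((a, b, row_id) for row_id, a, b in rows)

def lookup_a_eq_b_gt(a_val, b_min):
    """a = a_val AND b > b_min: both columns bound one contiguous index range."""
    lo = bisect_right(index_ab, (a_val, b_min, INF))
    hi = bisect_right(index_ab, (a_val, INF, INF))
    return [row_id for _, _, row_id in index_ab[lo:hi]]

def lookup_a_gt_b_eq(a_min, b_val):
    """a > a_min AND b = b_val: only a bounds the range; b is post-filtered."""
    lo = bisect_right(index_ab, (a_min, INF, INF))
    scanned = index_ab[lo:]
    matches = [row_id for _, b, row_id in scanned if b == b_val]
    return matches, len(scanned)

print(lookup_a_eq_b_gt(0, 0))   # row ids found inside one tight range
print(lookup_a_gt_b_eq(0, 0))   # (row ids, number of index entries scanned)
```

In the second lookup the range covers every entry with a > 0, and the condition on b only discards entries afterwards. That is the post-filter step described above.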
Overhead of Indexing
Indexes must be updated on every write
Need more space on disk and in memory
Impact on Cost of Indexing for InnoDB
Long PK
Makes all secondary keys longer and slower
Random PK
Insertions cause a lot of page splits, which reduces the lifetime of SSDs
Low-selectivity index
e.g. an index on gender
Random Read vs Sequential Read
Prefetching
InnoDB read-ahead (innodb_read_ahead_threshold)
Oracle multiblock read
Explain the “EXPLAIN”
id
select_type
SIMPLE, PRIMARY, SUBQUERY, DERIVED, UNION, UNION RESULT
type (best ---> worst)
const, system > eq_ref, ref > range > index >> ALL
possible_keys, key, and rows
key_len: how many bytes of a composite index are used
Extra
Using index: covering index
Using where: post-filter
Using temporary: a temporary table is needed, typically for sorting or GROUP BY
Using filesort: can’t sort using an index
Using index condition: ICP
EXPLAIN
More data in MySQL 5.7
Try EXPLAIN FORMAT=JSON (available since MySQL 5.6)
TRACE
EXPLAIN shows the selected plan
TRACE shows WHY the plan was selected:
Alternative plans
Estimated costs
Decisions made
JSON format
How to use it (MySQL 5.6):
SET optimizer_trace="enabled=on";
Run the query
select trace into dumpfile "/var/lib/mysql-files/trace1.log" from information_schema.optimizer_trace;
SET optimizer_trace="enabled=off";
MySQL Optimizer
Cost-based Query Optimization: General idea
Assign cost to operations
Compute cost of partial or alternative plans
Search for plan with lowest cost
Cost-based optimizations:
Access method
Join order
Subquery strategy
Total Cost = IO cost + CPU cost
Input to Cost Model
IO-cost:
Estimates from storage engine based on number of pages to read
Both index and data pages
Schema:
Length of records and keys
Uniqueness for indexes
Nullability
Input to Cost Model
Statistics:
Number of rows in table
Key distribution/Cardinality:
Average number of records per key value
Only for indexed columns
Maintained by storage engine
Number of records in an index range
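The "average number of records per key value" statistic can be illustrated with a small calculation. This is a simplified sketch: the row count and cardinalities are hypothetical, and the function name is made up for illustration rather than taken from MySQL.

```python
def records_per_key(table_rows, cardinality):
    """Average number of rows sharing one key value (rows / distinct values)."""
    return table_rows / cardinality

table_rows = 1_000_000  # hypothetical table size

# Hypothetical cardinalities for two indexed columns.
gender_like = records_per_key(table_rows, 2)          # very few distinct values
nearly_unique = records_per_key(table_rows, 500_000)  # many distinct values

print(gender_like)    # rows expected per lookup on the low-selectivity index
print(nearly_unique)  # rows expected per lookup on the selective index
```

An equality lookup on the gender-like column is expected to touch about half the table, so an index on it saves little. The nearly unique column narrows the lookup to a couple of rows.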
More on Cost Model
Not just minimizing the number of scanned rows
Lots of other heuristics and hacks
Primary key is special for InnoDB
Covering index benefits
A full table scan can be faster than an index scan
An index can also be used for sorting
Depends on where the data lives: in memory, on disk, or on SSD
Note that the plan can change dynamically based on constants and data

             Memory   Disk        SSD
Table scan   6.8s     36s         15s
Index scan   5.2s     2.5 hours   30 min
Cost Model Example
SELECT * FROM t2 WHERE a BETWEEN x AND y;
Table scan:
IO cost : #pages in table
CPU cost : #rows * ROW_EVALUATE_COST
Range scan:
IO cost : #pages to read from index + #rows_in_range
CPU cost: #rows_in_range * ROW_EVALUATE_COST
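The two formulas above can be turned into a small calculator to see how the choice flips as the range widens. This is a sketch under stated assumptions, not the optimizer's actual code. The constant values (0.2 per row evaluated, 1.0 per page read) and all page and row counts are assumed for illustration, and the function names are hypothetical.

```python
ROW_EVALUATE_COST = 0.2   # assumed value, for illustration only
IO_BLOCK_READ_COST = 1.0  # assumed cost of reading one page

def table_scan_cost(table_pages, table_rows):
    io_cost = table_pages * IO_BLOCK_READ_COST
    cpu_cost = table_rows * ROW_EVALUATE_COST
    return io_cost + cpu_cost

def range_scan_cost(index_pages_read, rows_in_range):
    # One extra read per matching row to fetch the full row.
    io_cost = (index_pages_read + rows_in_range) * IO_BLOCK_READ_COST
    cpu_cost = rows_in_range * ROW_EVALUATE_COST
    return io_cost + cpu_cost

def choose_plan(table_pages, table_rows, index_pages_read, rows_in_range):
    """Return the name of the cheaper access method under this toy model."""
    ts = table_scan_cost(table_pages, table_rows)
    rs = range_scan_cost(index_pages_read, rows_in_range)
    return "range scan" if rs < ts else "table scan"

# Hypothetical table: 100 pages, 10,000 rows.
print(choose_plan(100, 10_000, index_pages_read=3, rows_in_range=1_000))
print(choose_plan(100, 10_000, index_pages_read=5, rows_in_range=2_000))
```

With these made-up numbers the narrower range favors the range scan and the wider one favors the table scan, because every matching row adds a row fetch to the range scan's IO cost.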
Cost Model Example: EXPLAIN
EXPLAIN SELECT * FROM t2 WHERE a BETWEEN 50 AND 60;
EXPLAIN SELECT * FROM t2 WHERE a BETWEEN 50 AND 70;
Cost Model Example: TRACE
ICP: Index Condition Pushdown
Main idea:
Use index data to evaluate the WHERE clause
Push WHERE clause conditions down to the storage engine to filter
SELECT A WHERE B = 2 AND C LIKE '%lee%'
No ICP
Index(B) -- traditional, using the index for the range only
Index(B,C,A) -- covering, all involved columns included
Using ICP
Index(B,C) -- range access by B, filter on C inside the index, read the full row only on a match
ICP: Index Condition Pushdown
[Figure: No ICP vs. Using ICP for WHERE B = 2 AND C LIKE '%lee%' with Index(B, C)]
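The difference between the two paths can be sketched by counting full-row reads. This is a toy simulation, not MySQL code: the table contents are hypothetical, the index is a plain sorted list, and a linear walk stands in for a real index seek to keep the sketch short.

```python
# Hypothetical table rows: id -> (A, B, C).
table = {
    1: ("x", 2, "bruce lee"),
    2: ("y", 2, "jackie chan"),
    3: ("z", 2, "ang lee"),
    4: ("w", 3, "lee majors"),
    5: ("v", 2, "jet li"),
}

# Index(B, C): sorted entries of (B, C, row_id).
index_bc = sorted((b, c, row_id) for row_id, (_, b, c) in table.items())

def run_query(use_icp):
    """SELECT A WHERE B = 2 AND C LIKE '%lee%'; returns (result, full_row_reads)."""
    result, full_row_reads = [], 0
    for b, c, row_id in index_bc:
        if b != 2:
            continue  # outside the B = 2 range (a real index would seek here)
        if use_icp and "lee" not in c:
            continue  # engine rejects the entry using index data alone
        a, _, c_full = table[row_id]  # full-row read from the table
        full_row_reads += 1
        if "lee" in c_full:           # server-side post-filter
            result.append(a)
    return sorted(result), full_row_reads

print(run_query(use_icp=False))
print(run_query(use_icp=True))
```

Both runs return the same rows. With ICP enabled, the entries whose C value fails the LIKE test never trigger a full-row read.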
ICP: Index Condition Pushdown
MySQL 5.6+ (5.7 adds support for partitioned tables)
Used for the range, ref, eq_ref, and ref_or_null access methods
On by default
SELECT @@optimizer_switch;
set @@optimizer_switch = "index_condition_pushdown=off";
ICP demo: Table & Data
create table icp(id int, age int, name varchar(30), memo varchar(600));
alter table icp add index aind(age, name, memo);
while (100K){ --eval insert into icp values($i, 1, 'a$i', repeat('a$i', 100))}
SQL: select * from icp where age = 1 and memo like '%9999%';
show session status like '%handler%';
Handler_read_next: 100000 (ICP off) --> 10+ (ICP on)
Use EXPLAIN to check whether ICP is used: Extra shows “Using index condition”
Quiz: Explain
Scenario 1: Most cases (typical live DB config and data distribution): AD
Scenario 2: Index Condition Pushdown enabled: ACD
Scenario 3: Special data distribution: A
Scenario 4: Special table structure (covering index): ABCD
Scenario 5: Special storage engine (index using a hash table): A
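Scenario 1 follows from the usual leftmost-prefix reasoning for a B+Tree composite index, which can be sketched as a small function. This is a deliberately simplified rule with a hypothetical function name. It only handles conditions joined by AND, so option B is out of scope, and it ignores ICP, covering indexes, data distribution, and hash indexes, which is why the other scenarios give different answers.

```python
def usable_key_parts(index_columns, predicates):
    """Count leading index columns that can bound a single range scan.

    predicates maps column -> "eq" or "range" for conditions joined by AND.
    Equality lets the next column narrow the range further; a range
    condition is the last column that can contribute.
    """
    used = 0
    for col in index_columns:
        op = predicates.get(col)
        if op is None:
            break          # no condition on this column: stop here
        used += 1
        if op == "range":
            break          # columns after a range cannot narrow the scan
    return used

index = ["a", "b"]
print("A:", usable_key_parts(index, {"a": "eq", "b": "eq"}))
print("C:", usable_key_parts(index, {"a": "range", "b": "eq"}))
print("D:", usable_key_parts(index, {"a": "eq", "b": "range"}))
```

Under this rule A and D use both key parts while C uses only the first, which matches the "AD" answer for the common case.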
How to modify the question to make the answer unique?
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;