Indexing the MySQL Index: Key to Performance Tuning
Upload: osscube-llc-a-global-open-source-enterprise-for-open-source-solutions
Posted on 11-May-2015
Indexing the MySQL Index: Guide to Performance Enhancement
Presented by Sonali Minocha, OSSCube
Who Am I?
Chief Technology Officer (MySQL) with OSSCube
MySQL Consulting, Implementation & Training
MySQL Certified DBA & Cluster DBA
What is an Index?
A database index is a data structure that improves
the speed of data retrieval operations on a database
table.
A mechanism to locate and access data within a database.
An index may span one or more columns and can be a means
of enforcing uniqueness on their values.
More about Indexes
• Speedy data retrieval: faster SELECTs.
• Rapid random lookups.
• Efficient for reporting, OLAP and other read-intensive applications.
• However, indexes are expensive:
  – They slow down writes, so be careful with write-heavy (OLTP) applications.
  – They use more disk space.
Properties
An index can be created on one or more columns.
An index contains only the key fields (plus a way to locate each row), kept sorted by those keys.
An index may be unique or non-unique; a unique index enforces uniqueness of the values in its column(s).
EMPLOYEE TABLE
EMPLOYEEID  FIRSTNAME  LASTNAME  AGE  SALARY  GENDER
001         Ashish     Kataria   25   10000   M
002         Rony       Felix     28   20000   M
003         Namita     Misra     24   10000   F
004         Ankur      Aeran     30   25000   M
005         Priyanka   Jain      30   20000   F
006         Pradeep    Pandey    31   30000   M
007         Pankaj     Gupta     25   12000   M
008         Ankit      Garg      30   15000   M
In this table, if we have to search for the employee whose first name is Rony, then without an index the code will look like:
For each row in table:
    if row.FIRSTNAME = 'Rony' then results.append(row)
    else move to next row
So every row is checked against the condition (a full table scan).
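The row-by-row scan above can be sketched in Python. This is a minimal illustration, assuming the table is held in memory as a list of tuples; `EMPLOYEES` and `full_table_scan` are hypothetical names, not MySQL internals. It also counts how many rows were examined.

```python
# Hypothetical in-memory copy of the EMPLOYEE table:
# (employeeid, firstname, lastname, age, salary, gender)
EMPLOYEES = [
    (1, "Ashish", "Kataria", 25, 10000, "M"),
    (2, "Rony", "Felix", 28, 20000, "M"),
    (3, "Namita", "Misra", 24, 10000, "F"),
    (4, "Ankur", "Aeran", 30, 25000, "M"),
    (5, "Priyanka", "Jain", 30, 20000, "F"),
    (6, "Pradeep", "Pandey", 31, 30000, "M"),
    (7, "Pankaj", "Gupta", 25, 12000, "M"),
    (8, "Ankit", "Garg", 30, 15000, "M"),
]


def full_table_scan(rows, firstname):
    """Check every row against the condition; return matches and rows examined."""
    results = []
    examined = 0
    for row in rows:
        examined += 1
        if row[1] == firstname:  # FIRSTNAME is the second field (index 1)
            results.append(row)
        # otherwise move on to the next row
    return results, examined


matches, examined = full_table_scan(EMPLOYEES, "Rony")
print(matches)   # [(2, 'Rony', 'Felix', 28, 20000, 'M')]
print(examined)  # 8
```

Even though only one row matches, all eight rows are read, because firstname is not unique and nothing tells the scan where matches can be.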
HOW DATABASE INDEXES WORK?
• Let's assume we have a table of data like this:
Types of Indexes
Column Index
Concatenated Index
Covering Index
Partial Index
Clustered/Non-clustered Index
Column Index
Index on a single column
Only queries that filter on the indexed column will be
optimized.
Eg:
SELECT employeeid,
firstname FROM Employee
WHERE employeeid = 001
By adding an index to employeeid, the query is optimized
to only look at records that satisfy
your criteria.
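A minimal sketch of what a single-column index buys, assuming an in-memory list of rows: the index is a sorted list of (employeeid, row position) pairs, searched with Python's `bisect` module instead of a real on-disk B-Tree. The names `rows`, `index` and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows: (employeeid, firstname, lastname), stored in arrival order
rows = [(3, "Namita", "Misra"), (1, "Ashish", "Kataria"), (2, "Rony", "Felix")]

# Single-column index on employeeid: (key, row position) pairs kept in key order
index = sorted((row[0], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def lookup(employeeid):
    """Binary-search the index, then fetch only the rows it points to."""
    i = bisect_left(keys, employeeid)
    found = []
    while i < len(keys) and keys[i] == employeeid:
        found.append(rows[index[i][1]])
        i += 1
    return found


print(lookup(1))  # [(1, 'Ashish', 'Kataria')]
```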
Concatenated Index
Index on multiple columns.
Such an index can serve queries that filter on its leading column(s), for example:
SELECT employeeid, lastname FROM Employee
WHERE employeeid = 002 AND lastname = 'Felix';
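Why field order matters can be sketched with sorted tuples. Assuming a hypothetical concatenated index on (employeeid, lastname) held as a sorted Python list, any leftmost prefix of the key can be located by binary search, while a search on lastname alone has no usable prefix and must check every entry. `prefix_range` is a hypothetical helper, not a MySQL function.

```python
from bisect import bisect_left

# Hypothetical concatenated index on (employeeid, lastname);
# tuples sort column by column, which is why field order matters.
index = sorted([(2, "Felix"), (1, "Kataria"), (4, "Aeran"), (3, "Misra")])


def prefix_range(prefix):
    """Entries whose leading columns equal `prefix`, found by binary search."""
    i = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    found = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        found.append(index[i])
        i += 1
    return found


# Both queries use a leftmost prefix, so the index can seek directly.
print(prefix_range((2,)))          # [(2, 'Felix')]
print(prefix_range((2, "Felix")))  # [(2, 'Felix')]

# lastname alone is not a leftmost prefix: every entry has to be checked.
scan = [entry for entry in index if entry[1] == "Felix"]
print(scan)  # [(2, 'Felix')]
```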
Covering Index
Covers all columns in a
query.
The benefit of a covering index is that the lookup of the various B-Tree index pages necessarily
satisfies the query, and no additional data page lookups are
necessary.
SELECT employeeid FROM Employee
WHERE employeeid = 001
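A sketch of why a covering index avoids data page lookups, assuming a hypothetical index on employeeid whose entries store the indexed column plus a pointer to the row. The `select` function counts table reads: zero when every requested column is in the index entry, one otherwise. This models the idea only, not InnoDB or MyISAM page handling.

```python
# Hypothetical table: row position -> (employeeid, firstname, lastname, salary)
COLUMNS = ("employeeid", "firstname", "lastname", "salary")
table = {
    0: (1, "Ashish", "Kataria", 10000),
    1: (2, "Rony", "Felix", 20000),
}

# Index on employeeid: each entry stores the indexed column(s) and a row pointer
index = {row[0]: ({"employeeid": row[0]}, pos) for pos, row in table.items()}


def select(employeeid, wanted):
    """Point query returning (values, number of table reads)."""
    entry = index.get(employeeid)
    if entry is None:
        return None, 0
    stored, pos = entry
    if all(col in stored for col in wanted):
        # Covering: the index entry alone satisfies the query
        return tuple(stored[col] for col in wanted), 0
    # Not covering: follow the pointer to the data row (one extra lookup)
    row = table[pos]
    return tuple(row[COLUMNS.index(col)] for col in wanted), 1


print(select(1, ("employeeid",)))              # ((1,), 0)
print(select(1, ("employeeid", "firstname")))  # ((1, 'Ashish'), 1)
```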
Partial Index
A subset (prefix) of a column is used for the index.
Used on CHAR, VARCHAR, TEXT, etc.
Creating a partial index may greatly reduce the size of the index; choosing a sufficiently selective prefix length keeps the extra row lookups (needed to check the full value) to a minimum.
CREATE TABLE t (name CHAR(255), INDEX (name(15)));
Eg: SELECT employeeid, firstname, lastname FROM Employee WHERE lastname LIKE 'A%';
We should add an index to lastname to improve performance.
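A minimal sketch of a prefix (partial) index, assuming the index keeps only the first 15 characters of each value, as in `INDEX (name(15))`. Because different values can share a prefix, an equality search must re-check the full value in the row, while `LIKE 'A%'` maps to a range over the sorted prefixes. The sample names and the helpers `equals` and `starts_with` are hypothetical.

```python
from bisect import bisect_left

PREFIX_LEN = 15  # mirrors INDEX (name(15))

# Hypothetical values for the name column; the first two share their first 15 chars
names = ["Ashish Kumar Kataria", "Ashish Kumar Katariya", "Ankur Aeran", "Rony Felix"]

# Prefix index: (first PREFIX_LEN characters, row position), kept sorted
index = sorted((name[:PREFIX_LEN], pos) for pos, name in enumerate(names))
keys = [key for key, _ in index]


def equals(value):
    """name = value: the prefix narrows candidates, the row confirms the match."""
    key = value[:PREFIX_LEN]
    i = bisect_left(keys, key)
    hits = []
    while i < len(keys) and keys[i] == key:
        pos = index[i][1]
        if names[pos] == value:  # extra row lookup to compare the full value
            hits.append(pos)
        i += 1
    return hits


def starts_with(prefix):
    """name LIKE 'prefix%': a range scan over the sorted prefixes."""
    assert len(prefix) <= PREFIX_LEN, "longer patterns would also need a row check"
    i = bisect_left(keys, prefix)
    hits = []
    while i < len(keys) and keys[i].startswith(prefix):
        hits.append(index[i][1])
        i += 1
    return hits


print(equals("Ashish Kumar Kataria"))  # [0]
print(sorted(starts_with("A")))        # [0, 1, 2]
```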
Clustered vs. Non-clustered
Describes whether the data records are stored on disk in the order of the index.
MyISAM - non-clustered. InnoDB - clustered.
In InnoDB, secondary indexes are built upon the clustering key:
the primary key value is stored in every secondary index entry.
Because the row data resides within the leaf nodes of the clustered index, more memory is needed to search through the same number of records.
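The two-step lookup through an InnoDB-style secondary index can be sketched as follows. This assumes a hypothetical in-memory model: the clustered index is a list sorted by primary key whose entries hold the full row, and the secondary index on lastname stores the primary key rather than a physical row address.

```python
from bisect import bisect_left

# Hypothetical clustered index: sorted by primary key, leaf entries hold the row
clustered = [
    (1, ("Ashish", "Kataria", 25)),
    (2, ("Rony", "Felix", 28)),
    (3, ("Namita", "Misra", 24)),
]
pk_keys = [pk for pk, _ in clustered]

# Hypothetical secondary index on lastname: stores the primary key, not a row address
secondary = sorted((row[1], pk) for pk, row in clustered)
sec_keys = [lastname for lastname, _ in secondary]


def find_by_pk(pk):
    """Search the clustered index; the row lives in the matching leaf entry."""
    i = bisect_left(pk_keys, pk)
    if i < len(pk_keys) and pk_keys[i] == pk:
        return clustered[i][1]
    return None


def find_by_lastname(lastname):
    """Two searches: secondary index -> primary key -> clustered index."""
    i = bisect_left(sec_keys, lastname)
    rows = []
    while i < len(sec_keys) and sec_keys[i] == lastname:
        rows.append(find_by_pk(secondary[i][1]))
        i += 1
    return rows


print(find_by_lastname("Felix"))  # [('Rony', 'Felix', 28)]
```

Storing the primary key instead of a physical address means rows can move inside the clustered index without rewriting secondary indexes; the cost is the second search.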
How Hash Function Works
How can it be faster?
If we create a hash table whose key is based on empname and whose values are pointers to the database rows, we get a hash index:
• Hash indexes are good for equality searches.
• Hash indexes are not good for range searches.
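Python's built-in `dict` is a hash table, so it can stand in for the hash index described above. This sketch, using hypothetical sample names, shows an equality search as a single probe, whereas a range condition gives the hash no ordering to exploit and has to examine every key.

```python
# Hypothetical column values; list position plays the role of a row pointer
empnames = ["Ashish", "Rony", "Namita", "Ankur", "Rony"]

# Hash index: empname -> positions of the rows holding that value
hash_index = {}
for pos, name in enumerate(empnames):
    hash_index.setdefault(name, []).append(pos)

# Equality search: a single hash probe
print(hash_index.get("Rony", []))  # [1, 4]

# Range search ('A' <= empname < 'N'): hashing keeps no order,
# so every key in the index must be examined
in_range = sorted(
    pos for name, ptrs in hash_index.items() if "A" <= name < "N" for pos in ptrs
)
print(in_range)  # [0, 3]
```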
So what should be the solution for Range Searches?
B-Tree
B-Tree: a balanced tree (not a binary tree) that stores keys in sorted order.
Allows searches, insertions and deletions in logarithmic time.
It allows faster range searches.
Each entry in a B-Tree node contains the index field and a pointer to a data row. So, in the example above, if we create an index on age, an entry in a B-Tree node will look like:
Age   Location of the data
30    0X775800
Each node takes up one disk block, so reading a node is a single disk operation.
B-Tree diagram (employee IDs): root node [003 | 006]; leaf nodes [001 002], [004 005], [007 008]
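Python has no built-in B-Tree, so this sketch uses a sorted list searched with `bisect` as a stand-in for the ordered leaf level of a B-Tree index on age. It assumes (age, employeeid) pairs from the employee table; `age_between` is a hypothetical helper. A real B-Tree gives the same ordered access but splits the keys into disk-block-sized nodes, which keeps inserts and deletes logarithmic.

```python
from bisect import bisect_left, bisect_right

# Ordered index on age: (age, employeeid) pairs kept sorted, like B-Tree leaves
index = sorted([(25, 1), (28, 2), (24, 3), (30, 4), (30, 5), (31, 6), (25, 7), (30, 8)])


def age_between(low, high):
    """Range search: seek to the first key >= low, then read in order up to high."""
    start = bisect_left(index, (low,))
    stop = bisect_right(index, (high, float("inf")))
    return [employeeid for _, employeeid in index[start:stop]]


print(age_between(25, 28))  # [1, 7, 2]
print(age_between(30, 30))  # [4, 5, 8]
```

Because matching keys are contiguous in sorted order, the range costs one seek plus a sequential read, which is exactly what the hash index could not offer.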
R-Tree
MySQL supports another type of index, the spatial index, which is implemented as an R-Tree. Spatial indexes are created the same way as other indexes, except that the SPATIAL keyword is used.
Fulltext Indexes
Ability to search for text.
• Searches are not case sensitive.
• Short words are ignored; the default minimum word length is 4 characters (ft_min_word_len), and the maximum is set by ft_max_word_len.
Only available in MyISAM (InnoDB also supports FULLTEXT indexes from MySQL 5.6 onward).
• ft_stopword_file = '' (an empty value disables the stopword list)
Can be created on TEXT, CHAR or VARCHAR columns.
Important points of fulltext search:
• Words called stopwords are ignored.
• If a word is present in more than 50% of the rows, it has a weight of zero. This is an advantage on large data sets.
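A simplified model of the fulltext rules above: case-insensitive tokens, words shorter than 4 characters and stopwords ignored, and a word found in more than 50% of the rows given weight zero. The stopword set, sample rows and scoring are hypothetical; MySQL's actual relevance ranking is more involved.

```python
import re

MIN_WORD_LEN = 4                # mirrors the default ft_min_word_len
STOPWORDS = {"about", "which"}  # hypothetical stopword list

# Hypothetical rows of a TEXT column
docs = [
    "MySQL index tuning guide",
    "MySQL fulltext search basics",
    "MySQL replication setup",
    "Tuning InnoDB buffers",
]


def words(text):
    """Lower-cased words, minus short words and stopwords."""
    return {
        w for w in re.findall(r"\w+", text.lower())
        if len(w) >= MIN_WORD_LEN and w not in STOPWORDS
    }


# Inverted index: word -> ids of the rows containing it
inverted = {}
for doc_id, text in enumerate(docs):
    for w in words(text):
        inverted.setdefault(w, set()).add(doc_id)


def search(query):
    """Score rows; a word present in more than 50% of rows has weight zero."""
    scores = {}
    for w in words(query):
        hits = inverted.get(w, set())
        weight = 0 if len(hits) > len(docs) / 2 else 1
        for doc_id in hits:
            scores[doc_id] = scores.get(doc_id, 0) + weight
    return {doc_id: s for doc_id, s in scores.items() if s > 0}


print(search("mysql tuning"))  # {0: 1, 3: 1}
```

Here "mysql" appears in three of four rows and contributes nothing, while "tuning" (two of four rows) still ranks results.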
Hash, B-Tree and R-Tree indexes use different strategies to speed up data retrieval.
The best algorithm is picked depending on the expected data and the algorithms the storage engine supports.
Is the Query Using an Index or Not?
Query Execution Plan (EXPLAIN)
With EXPLAIN, the query is sent all the way to the
optimizer, but not to the storage engine.
mysql> explain select * from citylist\G
           id: 1
  select_type: SIMPLE
        table: citylist
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
        Extra:
1 row in set (0.01 sec)
Here type: ALL with key: NULL means no index is used: the whole table (an estimated 4079 rows) is scanned.
Selectivity
• Selectivity of a column is the ratio between the number of distinct values and the total number of values.
• A primary key has selectivity 1.
Eg: The Employee table has 10,000 users with the fields employeeid, email, firstname, lastname, salary and gender.
Our application searches on the following fields: employeeid, firstname, lastname, gender and email. So employeeid, email, firstname and lastname can be candidates for indexes.
Since employeeid is unique, its selectivity will be equal to the primary key selectivity (1).
Gender has only two values, M and F, so its selectivity = 2/10,000 = 0.0002.
Dropping an index on gender would be more beneficial. For firstname and lastname, selectivity depends on the names you are searching for.
As a rule of thumb, an index with selectivity above 15% is a good index.
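The selectivity arithmetic can be checked with a short sketch, assuming 10,000 hypothetical employees with unique IDs and genders split between M and F. `selectivity` simply divides the number of distinct values by the number of values; the 0.15 threshold is the deck's rule of thumb.

```python
def selectivity(values):
    """Number of distinct values divided by the total number of values."""
    return len(set(values)) / len(values)


n = 10_000
employeeids = list(range(1, n + 1))                  # unique, like a primary key
genders = ["M" if i % 2 else "F" for i in range(n)]  # only two distinct values

print(selectivity(employeeids))     # 1.0
print(selectivity(genders))         # 0.0002
print(selectivity(genders) > 0.15)  # False: a poor index candidate
```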
/*
  SQL script to grab the worst performing indexes
  in the whole server
*/
SELECT
    t.TABLE_SCHEMA AS `db`
  , t.TABLE_NAME AS `table`
  , s.INDEX_NAME AS `index name`
  , s.COLUMN_NAME AS `field name`
  , s.SEQ_IN_INDEX AS `seq in index`
  , s2.max_columns AS `# cols`
  , s.CARDINALITY AS `card`
  , t.TABLE_ROWS AS `est rows`
  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `sel %`
FROM INFORMATION_SCHEMA.STATISTICS s
INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
   AND s.TABLE_NAME = t.TABLE_NAME
INNER JOIN (
    SELECT
        TABLE_SCHEMA
      , TABLE_NAME
      , INDEX_NAME
      , MAX(SEQ_IN_INDEX) AS max_columns
    FROM INFORMATION_SCHEMA.STATISTICS
    WHERE TABLE_SCHEMA != 'mysql'
    GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS s2
    ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
   AND s.TABLE_NAME = s2.TABLE_NAME
   AND s.INDEX_NAME = s2.INDEX_NAME
WHERE t.TABLE_SCHEMA != 'mysql'                          /* Filter out the mysql system DB */
  AND t.TABLE_ROWS > 10                                  /* Only tables with some rows */
  AND s.CARDINALITY IS NOT NULL                          /* Need at least one non-NULL value in the field */
  AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 /* Selectivity < 1.0 b/c unique indexes are perfect anyway */
/* Switch to `sel %` DESC for best non-unique indexes */
ORDER BY `sel %`, s.TABLE_SCHEMA, s.TABLE_NAME;
Where to add index
WHERE clauses (the columns on which data is filtered)
• BAD IDEA to index gender or columns like status (low selectivity).
• Good distribution and selectivity in field values.
• Field order is important (for concatenated indexes).
Index join columns
Use covering indexes wherever possible
GROUP BY clauses
Avoid Redundant Indexes
Example: KEY(a), KEY(a,b), KEY(a(10));
KEY(a) and KEY(a(10)) are redundant because they are prefixes of KEY(a,b).
Redundant indexes may still be useful: if A is an integer column and B is a VARCHAR(255), KEY(A) will be faster than using KEY(A,B), because its entries are much smaller.
Indexes on short columns are faster; however, an index on a longer column can be beneficial if it serves as a covering index.
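A sketch of how redundant indexes like these could be detected, assuming each index is described as a tuple of (column, prefix length) pairs, with None meaning the whole column. An index is flagged when its columns form a leftmost prefix of another index. The index names and helpers are hypothetical, and the check ignores index type and uniqueness, so flagged indexes still need review (some redundancy can pay off, as with the integer KEY(A) case).

```python
# Each index: tuple of (column, prefix_length); None means the whole column
INDEXES = {
    "key_a": (("a", None),),
    "key_a_b": (("a", None), ("b", None)),
    "key_a10": (("a", 10),),
}


def is_prefix_of(small, big):
    """True if `small` is a leftmost prefix of `big`, so `big` can serve its lookups."""
    if len(small) > len(big):
        return False
    for (col_s, len_s), (col_b, len_b) in zip(small, big):
        if col_s != col_b:
            return False
        # big must index at least as many characters of the column as small does
        if len_b is not None and (len_s is None or len_s > len_b):
            return False
    return True


def redundant(indexes):
    """Names of indexes that are leftmost prefixes of some other index."""
    return sorted(
        name for name, cols in indexes.items()
        if any(other != name and is_prefix_of(cols, other_cols)
               for other, other_cols in indexes.items())
    )


print(redundant(INDEXES))  # ['key_a', 'key_a10']
```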
Key Caches (MyISAM)
• For tables that are used more often, a dedicated key cache can be used to optimize reads of those tables.
• Create the cache: hot_cache.key_buffer_size = 128K
• Assign tables to caches:
CACHE INDEX table1 IN hot_cache; CACHE INDEX table2 IN cold_cache;
• Preload your indexes for maximum efficiency: LOAD INDEX INTO CACHE table1;
• Use IGNORE LEAVES to preload only the non-leaf index blocks: LOAD INDEX INTO CACHE table1 IGNORE LEAVES;
Cases Where an Index Will Not Be Used
Functions on indexed fields:
WHERE TO_DAYS(NOW()) - TO_DAYS(dateofjoining) <= 7 (doesn't use index)
WHERE dateofjoining >= DATE_SUB(NOW(), INTERVAL 7 DAY) (uses index)
A LIKE pattern that starts with a wildcard: SELECT * FROM employee WHERE name LIKE '%s';
Applying a function such as LEFT() to an indexed column.
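The date rewrite above can be illustrated in Python, assuming a hypothetical sorted list stands in for an index on dateofjoining and that `today` is fixed for repeatability (plain dates, ignoring the time-of-day part of NOW()). Wrapping the column in an expression forces the check to run on every entry, while computing the boundary once allows a seek into the ordered index; both return the same rows.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Hypothetical index on dateofjoining: values kept sorted, as in a B-Tree
joined = sorted([date(2015, 5, 9), date(2015, 4, 20), date(2015, 5, 6), date(2015, 5, 1)])
today = date(2015, 5, 11)  # fixed "now" so the example is repeatable

# Like TO_DAYS(NOW()) - TO_DAYS(dateofjoining) <= 7:
# the expression must be evaluated for every entry
scan = [d for d in joined if (today - d).days <= 7]

# Like dateofjoining >= DATE_SUB(NOW(), INTERVAL 7 DAY):
# compute the boundary once, then seek into the sorted index
cutoff = today - timedelta(days=7)
seek = joined[bisect_left(joined, cutoff):]

print(scan == seek)  # True
print(seek)          # [datetime.date(2015, 5, 6), datetime.date(2015, 5, 9)]
```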
Choosing Indexes
Index columns that you use for searching, sorting or grouping, not columns
you only display as output.
Consider column selectivity.
Index short values.
Index prefixes of string values.
Take advantage of leftmost prefixes.
Don't over-index.
Match index types to the type of comparisons you perform.
Use the slow query log to identify queries that may be performing badly.
Keep data types as small as possible for what you need: don't use BIGINT unless required. The smaller your data types, the more records will fit into the index blocks; the more records fit in each block, the fewer reads are needed to find your records.
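The effect of data type size on index blocks can be estimated with simple arithmetic. This sketch assumes InnoDB's default 16 KB page and ignores per-record headers and row pointers, so real figures are lower; only the ratio between INT (4 bytes) and BIGINT (8 bytes) matters here. `leaf_pages` is a hypothetical helper.

```python
import math

PAGE_SIZE = 16 * 1024  # assumption: InnoDB's default 16 KB page


def leaf_pages(num_keys, key_bytes, page_size=PAGE_SIZE):
    """(keys per page, pages needed), ignoring record headers and pointers."""
    per_page = page_size // key_bytes
    return per_page, math.ceil(num_keys / per_page)


print(leaf_pages(10_000_000, 4))  # INT:    (4096, 2442)
print(leaf_pages(10_000_000, 8))  # BIGINT: (2048, 4883)
```

Doubling the key size halves the keys per page and roughly doubles the pages a full index scan has to read.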
Common indexing mistakes
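Not using an index.
Using CREATE INDEX (in older MySQL versions each statement rebuilds the table; adding several indexes in one ALTER TABLE avoids repeated rebuilds).
Misusing a composite index.
Using an expression on a column.
Appending the primary key to an index on an InnoDB table (InnoDB secondary indexes already contain the primary key).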
Q & A
Thank you for your time and attention
For more information, please feel free to drop a line to [email protected] or visit http://www.osscube.com