Indexing the MySQL Index: Key to Performance Tuning

Indexing the MySQL Index: Guide to Performance Enhancement

Presented by – Sonali Minocha OSSCube

Who Am I?

Chief Technology Officer (MySQL) with OSSCube

MySQL Consulting, Implementation & Training

MySQL Certified DBA & Cluster DBA

What Is an Index?

A database index is a data structure that improves the speed of data retrieval operations on a database table.

A mechanism to locate and access data within a database.

An index may cover one or more columns and can be a means of enforcing uniqueness on their values.

More about Index

• Speedy data retrieval.
• Speeds up SELECTs.
• Rapid random lookups.
• Efficient for reporting, OLAP, and read-intensive applications.

However, it is expensive:
– Slows down writes.
– Be careful with write-heavy applications (OLTP).
– More disk space is used.

Properties

An index can be created on one or more columns.

An index contains only the key fields according to which the table is arranged.

An index may be unique or non-unique.

An index may cover one or more columns and can be a means of enforcing uniqueness of their values.

EMPLOYEE TABLE

EMPLOYEE ID  FIRSTNAME  LASTNAME  AGE  SALARY  GENDER
001          Ashish     Kataria   25   10000   M
002          Rony       Felix     28   20000   M
003          Namita     Misra     24   10000   F
004          Ankur      Aeran     30   25000   M
005          Priyanka   Jain      30   20000   F
006          Pradeep    Pandey    31   30000   M
007          Pankaj     Gupta     25   12000   M
008          Ankit      Garg      30   15000   M

Cont.

In this table, if we have to search for the employee whose name is Rony, the code will look like:

    For each row in table
        if row[2] = 'Rony' then
            results.append[row]
        else
            movenext

So we are checking each row against the condition.
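The row-by-row search described above can be sketched in Python. This is only an illustration of a full table scan, not how MySQL is implemented; the tuple layout and the function name `full_scan_by_firstname` are assumptions made for the example.

```python
# Rows of the EMPLOYEE table as tuples:
# (employee_id, firstname, lastname, age, salary, gender)
EMPLOYEES = [
    ("001", "Ashish", "Kataria", 25, 10000, "M"),
    ("002", "Rony", "Felix", 28, 20000, "M"),
    ("003", "Namita", "Misra", 24, 10000, "F"),
    ("004", "Ankur", "Aeran", 30, 25000, "M"),
    ("005", "Priyanka", "Jain", 30, 20000, "F"),
    ("006", "Pradeep", "Pandey", 31, 30000, "M"),
    ("007", "Pankaj", "Gupta", 25, 12000, "M"),
    ("008", "Ankit", "Garg", 30, 15000, "M"),
]


def full_scan_by_firstname(rows, firstname):
    """Return the matching rows and how many rows had to be examined."""
    results = []
    examined = 0
    for row in rows:
        examined += 1
        if row[1] == firstname:  # row[1] is FIRSTNAME (0-based position)
            results.append(row)
    return results, examined


matches, examined = full_scan_by_firstname(EMPLOYEES, "Rony")
print(matches)
print("rows examined:", examined)
```

Without an index, every row is examined even though only one row matches.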

HOW DO DATABASE INDEXES WORK?

• Let's assume we have a table of data like the EMPLOYEE table above.

Types of Indexes

Column Index

Concatenated Index

Covering Index

Partial Index

Clustered/Non-clustered Index

Column Index

Index on a single column.

Only queries whose conditions use the indexed column will be optimized.

E.g.:

SELECT employeeid, firstname FROM Employee
WHERE employeeid = 001

By adding an index on employeeid, the query is optimized to look only at records that satisfy your criteria.

Concatenated Index

Index on multiple columns.

Use the appropriate index:

SELECT employeeid, lastname FROM Employee
WHERE employeeid = 002 AND lastname = 'Felix';
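As a rough sketch, a concatenated index on (employeeid, lastname) can be pictured as entries sorted first by employeeid and then by lastname. The model below uses a sorted Python list and binary search; `INDEX`, `lookup` and `lookup_by_employeeid` are hypothetical names for illustration and do not reflect MySQL internals.

```python
from bisect import bisect_left

# Table rows: (employeeid, firstname, lastname)
ROWS = [
    ("001", "Ashish", "Kataria"),
    ("002", "Rony", "Felix"),
    ("003", "Namita", "Misra"),
    ("004", "Ankur", "Aeran"),
]

# Assumed model of an index on (employeeid, lastname):
# sorted (employeeid, lastname, row_position) entries.
INDEX = sorted((row[0], row[2], pos) for pos, row in enumerate(ROWS))


def lookup(employeeid, lastname):
    """Both columns given: binary search on the full key."""
    start = bisect_left(INDEX, (employeeid, lastname))
    hits = []
    for entry in INDEX[start:]:
        if entry[:2] != (employeeid, lastname):
            break
        hits.append(ROWS[entry[2]])
    return hits


def lookup_by_employeeid(employeeid):
    """Leftmost column only: still a binary search, because the
    entries are ordered by employeeid first."""
    start = bisect_left(INDEX, (employeeid,))
    hits = []
    for entry in INDEX[start:]:
        if entry[0] != employeeid:
            break
        hits.append(ROWS[entry[2]])
    return hits


print(lookup("002", "Felix"))
print(lookup_by_employeeid("004"))
```

A search on lastname alone could not use the same binary search, because the entries are not ordered by lastname on its own.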

Covering Index

Covers all columns in a query.

The benefit of a covering index is that the lookup of the various B-Tree index pages necessarily satisfies the query, and no additional data page lookups are necessary.

SELECT employeeid FROM Employee
WHERE employeeid = 001
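The saving from a covering index can be illustrated with a toy model that counts how often the table itself has to be read. Everything here (`TABLE`, `INDEX`, the `select` helper) is an assumption made for the sketch; the index is walked linearly for brevity.

```python
# Toy table and a hypothetical index on (employeeid, lastname)
# that maps each key to a row position.
TABLE = [
    {"employeeid": "001", "firstname": "Ashish", "lastname": "Kataria"},
    {"employeeid": "002", "firstname": "Rony", "lastname": "Felix"},
]
INDEX_COLUMNS = ("employeeid", "lastname")
INDEX = {(row["employeeid"], row["lastname"]): pos for pos, row in enumerate(TABLE)}


def select(columns, employeeid):
    """Return (result_rows, table_lookups) for WHERE employeeid = ?."""
    covered = all(column in INDEX_COLUMNS for column in columns)
    results = []
    table_lookups = 0
    for key, pos in INDEX.items():
        if key[0] != employeeid:
            continue
        if covered:
            # Every requested column is stored in the index entry itself.
            source = dict(zip(INDEX_COLUMNS, key))
        else:
            # At least one column is missing from the index: read the row.
            source = TABLE[pos]
            table_lookups += 1
        results.append(tuple(source[column] for column in columns))
    return results, table_lookups


print(select(("employeeid", "lastname"), "002"))   # answered from the index
print(select(("employeeid", "firstname"), "002"))  # needs the table row
```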

Partial Index

Uses a subset (prefix) of a column for the index; MySQL calls this a prefix index.

Use on CHAR, VARCHAR, TEXT, etc.

Creating a partial index may greatly reduce the size of the index, and minimize the additional data lookups required.

CREATE TABLE t ( name CHAR(255), INDEX ( name(15) ) );

E.g.: SELECT employeeid, firstname, lastname FROM Employee WHERE lastname LIKE 'A%'

We should add an index on lastname to improve performance.
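A minimal sketch of the idea behind indexing only a prefix of a string column: the index stores the first few characters, so it is smaller, but a lookup has to confirm each candidate against the full value. The function names and the prefix lengths are assumptions chosen for the example.

```python
from collections import defaultdict

LASTNAMES = ["Kataria", "Felix", "Misra", "Aeran", "Jain", "Pandey", "Gupta", "Garg"]


def build_prefix_index(values, prefix_len):
    """Map the first prefix_len characters of each value to row positions."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value[:prefix_len]].append(pos)
    return index


def find_equal(values, index, prefix_len, wanted):
    """Use the prefix to get candidates, then verify the full value."""
    candidates = index.get(wanted[:prefix_len], [])
    return [pos for pos in candidates if values[pos] == wanted]


index_2 = build_prefix_index(LASTNAMES, 2)
index_1 = build_prefix_index(LASTNAMES, 1)

print(sorted(index_2))                               # distinct 2-character prefixes
print(index_1["G"])                                  # 1 character: two candidates
print(find_equal(LASTNAMES, index_1, 1, "Garg"))
```

A shorter prefix gives a smaller index but more candidates to verify, so the prefix length is a trade-off between index size and how well the prefix distinguishes values.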

Clustered vs. Non-clustered

Describes whether the data records are stored on disk in sorted order.

MyISAM: non-clustered. InnoDB: clustered.

Secondary indexes are built upon the clustering key.

The primary key is added to every secondary index.

Because the data resides within the leaf nodes of the index, more space in memory is needed to search through the same number of records.

How Hash Function Works

How can it be faster?

If we create a hash table, the key of the hash table would be based on empname and the values would be pointers to the database rows. This is a hash index:

• Hash indexes are good for equality searches.
• Hash indexes are not good for range searches.
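The hash index idea can be sketched with a Python dictionary keyed on the employee's first name, with row positions as the "pointers". This is an illustrative model under that assumption, not MySQL's hash index implementation; the helper names are hypothetical.

```python
from collections import defaultdict

# (employee_id, firstname)
EMPLOYEES = [
    ("001", "Ashish"), ("002", "Rony"), ("003", "Namita"), ("004", "Ankur"),
    ("005", "Priyanka"), ("006", "Pradeep"), ("007", "Pankaj"), ("008", "Ankit"),
]

# Hash index: firstname -> list of row positions.
hash_index = defaultdict(list)
for pos, (_, firstname) in enumerate(EMPLOYEES):
    hash_index[firstname].append(pos)


def find_by_firstname(name):
    """Equality search: one hash lookup, no scan."""
    return [EMPLOYEES[pos] for pos in hash_index.get(name, [])]


def find_names_between(low, high):
    """Range search (low <= name < high): hash keys have no order,
    so every key has to be inspected."""
    positions = []
    for name, rows in hash_index.items():
        if low <= name < high:
            positions.extend(rows)
    return [EMPLOYEES[pos] for pos in sorted(positions)]


print(find_by_firstname("Rony"))
print(find_names_between("A", "B"))
```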

So what should be the solution for Range Searches?

B-Tree

B-Tree: a self-balancing tree (not a binary tree; each node can hold many keys) that stores data in an ordered way.

Allows logarithmic selections, insertions and deletions.

It allows faster range searches.

Nodes in a B-Tree contain an index field and a pointer to a data row.

• So, as in the example above, if we create an index on age, a node entry of the B-Tree will look like this:

Age  Location of the data
30   0x775800

Each node takes up one disk block, so reading a node is a single disk operation.


[Figure: B-Tree Diagram. Root node with keys 003 and 006; leaf nodes holding 001-002, 004-005 and 007-008.]
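Why ordered keys help range searches can be shown with a sorted list and binary search. Note the substitution: this is not a real B-Tree (there are no nodes or disk blocks), only a stand-in for the ordered key sequence a B-Tree maintains. `AGE_INDEX` and `ids_with_age_between` are hypothetical names.

```python
from bisect import bisect_left

# (employee_id, age) taken from the EMPLOYEE table
EMPLOYEES = [
    ("001", 25), ("002", 28), ("003", 24), ("004", 30),
    ("005", 30), ("006", 31), ("007", 25), ("008", 30),
]

# Ordered index on age: sorted (age, employee_id) entries.
AGE_INDEX = sorted((age, emp_id) for emp_id, age in EMPLOYEES)


def ids_with_age_between(low, high):
    """Range search low <= age <= high (integer ages).

    Two binary searches locate the boundaries; everything in between
    is a contiguous slice, so no other entries are examined.
    """
    start = bisect_left(AGE_INDEX, (low,))
    stop = bisect_left(AGE_INDEX, (high + 1,))
    return [emp_id for _, emp_id in AGE_INDEX[start:stop]]


print(ids_with_age_between(25, 28))
print(ids_with_age_between(30, 30))
```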

R-Tree

MySQL supports another type of index called a spatial index. Spatial indexes are created the same way other indexes are created; only the additional keyword SPATIAL is used.

Fulltext Indexes

Ability to search for text.

• Searches are not case sensitive.
• Short words are ignored; the default minimum length is 4 characters.
• Word length limits are set with ft_min_word_len and ft_max_word_len.

Only available in MyISAM (InnoDB also supports FULLTEXT indexes from MySQL 5.6 onward).

Can be created on TEXT, CHAR or VARCHAR columns.

Important points of fulltext search:

• Words called stopwords are ignored; ft_stopword_file = '' disables the stopword list.
• If a word is present in more than 50% of the rows, it will have a weight of zero. This is an advantage on large data sets.

Hash, B-Tree and R-Tree indexes use different strategies to speed up data retrieval.

The best algorithm is picked depending on the expected data and the supported algorithms.

Is the Query Using an Index or Not?

Query Execution Plan (EXPLAIN)

With EXPLAIN the query is sent all the way to the optimizer, but not to the storage engine.

mysql> explain select * from citylist\G
           id: 1
  select_type: SIMPLE
        table: citylist
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
         ref: NULL
         rows: 4079
        Extra:
1 row in set (0.01 sec)
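As a small aid to reading that output, the sketch below holds the EXPLAIN row as a Python dictionary and checks the two fields that matter most here: `type: ALL` indicates a full table scan, and `key: NULL` means no index was chosen. The helper names are hypothetical and nothing here talks to a MySQL server.

```python
# The EXPLAIN row shown above, with NULL represented as None.
explain_row = {
    "id": 1,
    "select_type": "SIMPLE",
    "table": "citylist",
    "type": "ALL",
    "possible_keys": None,
    "key": None,
    "key_len": None,
    "ref": None,
    "rows": 4079,
    "Extra": "",
}


def uses_index(row):
    """True when the optimizer picked an index for this table."""
    return row["key"] is not None


def is_full_table_scan(row):
    """True when the access type is ALL (every row is read)."""
    return row["type"] == "ALL"


print("uses index:", uses_index(explain_row))
print("full table scan:", is_full_table_scan(explain_row))
print("estimated rows examined:", explain_row["rows"])
```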

Selectivity

• Selectivity of a column is the ratio between the number of distinct values and the total number of values.
• A primary key has selectivity 1.

E.g.: the Employee table has 10,000 users with the fields employeeid, email, firstname, lastname, salary, gender.

Our application searches on the following fields: employeeid, firstname, lastname, gender, email. So employeeid, email, firstname and lastname can be candidates for indexes.

Since employeeid is unique, its selectivity will be equal to the primary key selectivity.

Gender has only two values, M and F, so its selectivity = 2/10,000 = 0.0002.

Dropping this index will be more beneficial than keeping it. For an index on firstname and lastname, selectivity is a function of the name you are searching for.

A selectivity above 15% makes a good index.
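The selectivity ratio defined above is easy to compute directly. The sketch below builds a synthetic 10,000-row data set (the alternating gender values are made up for the example) and applies the 15% rule of thumb; the `selectivity` function name is an assumption.

```python
def selectivity(values):
    """Distinct values divided by total values (0 for an empty column)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Synthetic data: 10,000 employees with unique ids and two genders.
employee_ids = [f"{n:05d}" for n in range(1, 10001)]
genders = ["M" if n % 2 else "F" for n in range(10000)]

id_selectivity = selectivity(employee_ids)
gender_selectivity = selectivity(genders)

print("employeeid:", id_selectivity)
print("gender:", gender_selectivity)

# Rule of thumb from the slide: above 15% is a good index candidate.
for name, value in (("employeeid", id_selectivity), ("gender", gender_selectivity)):
    verdict = "good candidate" if value > 0.15 else "poor candidate"
    print(name, verdict)
```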

/*
  SQL script to grab the worst performing indexes
  in the whole server
*/
SELECT
    t.TABLE_SCHEMA AS `db`
  , t.TABLE_NAME AS `table`
  , s.INDEX_NAME AS `index name`
  , s.COLUMN_NAME AS `field name`
  , s.SEQ_IN_INDEX AS `seq in index`
  , s2.max_columns AS `# cols`
  , s.CARDINALITY AS `card`
  , t.TABLE_ROWS AS `est rows`
  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `sel %`
FROM INFORMATION_SCHEMA.STATISTICS s
INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
    AND s.TABLE_NAME = t.TABLE_NAME
INNER JOIN (
    SELECT
        TABLE_SCHEMA
      , TABLE_NAME
      , INDEX_NAME
      , MAX(SEQ_IN_INDEX) AS max_columns
    FROM INFORMATION_SCHEMA.STATISTICS
    WHERE TABLE_SCHEMA != 'mysql'
    GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS s2
    ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
    AND s.TABLE_NAME = s2.TABLE_NAME
    AND s.INDEX_NAME = s2.INDEX_NAME
WHERE t.TABLE_SCHEMA != 'mysql' /* Filter out the mysql system DB */
    AND t.TABLE_ROWS > 10 /* Only tables with some rows */
    AND s.CARDINALITY IS NOT NULL /* Need at least one non-NULL value in the field */
    AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 /* Selectivity < 1.0 b/c unique indexes are perfect anyway */
ORDER BY `sel %`, s.TABLE_SCHEMA, s.TABLE_NAME /* Switch to `sel %` DESC for best non-unique indexes */

Where to add index

WHERE clauses (the columns on which data is filtered)

• BAD IDEA to index gender or columns like status

• Good distribution and selectivity in field values

• Field order is important.

Index join columns

Try to create as many covering indexes as possible

GROUP BY clauses

Avoid Redundant Indexes

Example: KEY(a), KEY(a, b), KEY(a(10));

KEY(a) and KEY(a(10)) are redundant because they are prefixes of KEY(a, b).

A redundant index may still be useful:
A – integer column
B – VARCHAR(255)
KEY(A) will be faster than using KEY(A, B).

Indexes on short columns are faster; however, an index on a longer column can be beneficial as a covering index.
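The leftmost-prefix rule for spotting redundant indexes can be sketched as a small check over index definitions. This is a simplified illustration: the index names are hypothetical, and column prefix lengths such as a(10) are ignored, so only whole-column prefixes are detected.

```python
def find_redundant(indexes):
    """Return (redundant_index, covering_index) pairs.

    indexes maps an index name to its tuple of column names. An index is
    reported when its columns are a strict leftmost prefix of another's.
    """
    redundant = []
    for name, columns in indexes.items():
        for other_name, other_columns in indexes.items():
            if name == other_name:
                continue
            is_shorter = len(columns) < len(other_columns)
            if is_shorter and other_columns[:len(columns)] == columns:
                redundant.append((name, other_name))
    return redundant


indexes = {
    "idx_a": ("a",),
    "idx_a_b": ("a", "b"),
    "idx_b": ("b",),
}

print(find_redundant(indexes))
```

Here idx_b is not reported: b is the second column of idx_a_b, not its leftmost prefix, so a search on b alone could not use idx_a_b. As the slide notes, a reported index may still be worth keeping when it is much smaller than the index that covers it.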

Key Caches (MyISAM)

• For tables that are used more often, a key cache can be used to optimize reads of those tables:

hot_cache.key_buffer_size = 128K

• Assign tables to caches:

CACHE INDEX table1 IN hot_cache;
CACHE INDEX table2 IN cold_cache;

• Preload your indexes for maximum efficiency:

LOAD INDEX INTO CACHE table1;

• Use IGNORE LEAVES to preload only the non-leaf index blocks.

Cases Where an Index Will Not Be Used

Functions on indexed fields:

WHERE TO_DAYS(NOW()) - TO_DAYS(dateofjoining) <= 7 (doesn't use index)
WHERE dateofjoining >= DATE_SUB(NOW(), INTERVAL 7 DAY) (uses index)

A leading wildcard in LIKE:

SELECT * FROM employee WHERE name LIKE '%s';

Using the LEFT() function on an indexed column.

Choosing Indexes

Index columns that you use for searching, sorting or grouping, not columns you only display as output.

Consider column selectivity.

Index short values.

Index prefixes of string values.

Take advantage of leftmost prefixes.

Don't over-index.

Match index types to the type of comparisons you perform.

Use the slow-query log to identify queries that may be performing badly.

Keep data types as small as possible for what you need. Don't use BIGINT unless required. The smaller your data types, the more records will fit into the index blocks. The more records fit in each block, the fewer reads are needed to find your records.

Common indexing mistakes

Not using an index.

Using CREATE INDEX.

Misusing a composite index.

Using an expression on a column.

Appending the primary key to an index on an InnoDB table.

Thank you for your time and attention

For more information, please feel free to drop in a line to [email protected] or visit http://www.osscube.com
