MySQL Performance Tips & Best Practices
DESCRIPTION
The technology industry has almost written off MySQL in favor of fancy new NoSQL databases like MongoDB and Cassandra, or even Hadoop for aggregation. But MySQL has a lot to offer in terms of ACID compliance, performance, and simplicity. For many use cases MySQL works well. In this week's ShareThis workshop we discuss tips and techniques to improve performance and extend the lifetime of your MySQL deployment.
TRANSCRIPT
MySQL Best Practices For Scale
ShareThis
MySQL
● Extremely Performant
● Easy to vertically scale
● Well-known, battle-tested
● Lots of information on the web
● Horizontally Scalable with MySQL Cluster
    o ?????
Most Popular Table Types
MyISAM
● Full Text Search
● B-tree Indexes
● Very fast for reads
● Very fast for primarily read/append use cases (table locks for updates and deletes, but not for inserts*)
Table Types
InnoDB
● InnoDB tables consume more space on disk than their MyISAM equivalents.
● ACID compliant
● Row-level locking
● Harder to reclaim disk space
● Supports foreign keys
● Transactions/rollbacks
B-Tree Index
How MySQL Uses Index
select pubId, date, value from pub_counters where date between '2014-06-01' and '2014-07-01';
● Traverses the index (most likely a B-tree) to find the value
    o Or the first match in a range of values, then keeps scanning until it gets the last value in the range.
● Returns a "row id" on disk: a file offset or unique key
● Reads the data from disk (or memory, if it fits) to return the requested fields
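The lookup steps above can be checked with EXPLAIN. This is a minimal sketch: the slides name the columns of pub_counters but not their types, so the table definition here (including the id column) is an assumption, and the index name dateIdx is borrowed from the composite index slide.

```sql
-- Hypothetical schema: column types and the id column are assumptions.
CREATE TABLE IF NOT EXISTS pub_counters (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  pubId INT UNSIGNED NOT NULL,
  date  DATE NOT NULL,
  value BIGINT NOT NULL,
  PRIMARY KEY (id),
  KEY dateIdx (date)          -- B-tree index available for the range scan
) ENGINE=InnoDB;

-- Ask the optimizer how it would execute the range query.
EXPLAIN
SELECT pubId, date, value
FROM pub_counters
WHERE date BETWEEN '2014-06-01' AND '2014-07-01';
```

If the optimizer picks the index, the type column of the EXPLAIN output usually reads range and the key column reads dateIdx. On an empty or very small table it may choose a different plan.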
Composite Index
Assume Query
select pubId from pub_counters where pubId = 1 and date = '2014-06-01';
Most Probable Index
ALTER TABLE pub_counters ADD INDEX `pubIdIdx` (pubId);
ALTER TABLE pub_counters ADD INDEX `dateIdx` (date);
Most Performant Index
ALTER TABLE pub_counters ADD INDEX `pub_date_idx` (pubId, date);
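One way to see the difference is to compare EXPLAIN output for filters on different parts of the composite index. This sketch assumes pub_counters already exists with pubId and date columns; the plans the optimizer actually picks depend on table size and statistics.

```sql
-- Assumes pub_counters exists with pubId and date columns.
ALTER TABLE pub_counters ADD INDEX pub_date_idx (pubId, date);

-- Both columns constrained: the key column should name the chosen index,
-- and key_len shows how many bytes (leading columns) of it are used.
EXPLAIN
SELECT pubId
FROM pub_counters
WHERE pubId = 1 AND date = '2014-06-01';

-- The leftmost prefix alone can also seek into the composite index.
EXPLAIN
SELECT pubId
FROM pub_counters
WHERE pubId = 1;

-- A filter on date alone cannot seek into it, because date is not the
-- leading column (the optimizer may still scan the whole index).
EXPLAIN
SELECT pubId
FROM pub_counters
WHERE date = '2014-06-01';
```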
Covering index
Assume Query
select pubId, date, value from pub_counters where pubId = 1 and date = '2014-06-01';
Most Probable Index
ALTER TABLE pub_counters ADD INDEX `pub_date` (pubId, date);
Most Performant Index
ALTER TABLE pub_counters ADD INDEX `pub_date_val` (pubId, date, value);
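A covering index can be confirmed from the Extra column of EXPLAIN. The sketch below assumes pub_counters exists with the three columns used in the query; it is an illustration, not a benchmark.

```sql
-- Assumes pub_counters exists with pubId, date and value columns.
ALTER TABLE pub_counters ADD INDEX pub_date_val (pubId, date, value);

-- Every selected column lives in the index, so the row itself
-- does not need to be read.
EXPLAIN
SELECT pubId, date, value
FROM pub_counters
WHERE pubId = 1 AND date = '2014-06-01';
```

When the index covers the query, the Extra column typically contains "Using index". The trade-off is a wider index, which costs more disk, memory, and write time.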
Clustered Index
Accessing a row through the clustered index is fast because the row data is on the same page (on disk) where the index search leads.
• If the table defines a PRIMARY KEY, that is the clustered index.
• Otherwise, if the table has UNIQUE indexes, it is the first of them whose columns are all NOT NULL.
• Otherwise, InnoDB itself creates a hidden 6-byte surrogate row ID.
This is why it’s very important to use a natural primary key whenever possible.
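As an illustration, a counter table keyed by its natural key might look like the sketch below. The table name pub_counters_by_day and the column types are hypothetical and not taken from the slides.

```sql
-- Hypothetical table: name and column types are assumptions.
CREATE TABLE pub_counters_by_day (
  pubId INT UNSIGNED NOT NULL,
  date  DATE NOT NULL,
  value BIGINT NOT NULL,
  PRIMARY KEY (pubId, date)   -- natural key becomes the clustered index
) ENGINE=InnoDB;

-- The index named PRIMARY is the clustered one.
SHOW INDEX FROM pub_counters_by_day;

-- Rows are stored in primary-key order, so one publisher's date range
-- is read from neighbouring pages without a second lookup.
SELECT date, value
FROM pub_counters_by_day
WHERE pubId = 1
  AND date BETWEEN '2014-06-01' AND '2014-07-01';
```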
Over Indexing
● Significantly hurts write performance
● Needless disk & memory usage. May kick other "hot" data out of memory
● Don't index low-cardinality values
    o MySQL will choose NOT to use the index if it has to scan ~30% of the index
● Use another table for a sparse index
select pubId, date, value from pub_counters where deleted = True;
Instead use
select id, pub_counter_id from pub_counters_deleted;
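The side-table approach could be sketched as follows. The slides give only the column names id and pub_counter_id; the column types, the unique key, and the assumption that pub_counters has an integer id column are all hypothetical.

```sql
-- Hypothetical side table; assumes pub_counters has an integer id column.
CREATE TABLE pub_counters_deleted (
  id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
  pub_counter_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY pub_counter_id_idx (pub_counter_id)
) ENGINE=InnoDB;

-- Mark a counter row as deleted (42 is a placeholder id).
INSERT INTO pub_counters_deleted (pub_counter_id) VALUES (42);

-- Fetch the deleted rows by joining back to the main table,
-- instead of indexing a low-cardinality deleted flag.
SELECT c.pubId, c.date, c.value
FROM pub_counters_deleted AS d
JOIN pub_counters AS c ON c.id = d.pub_counter_id;
```

The small table only contains the rare rows, so it stays cheap to scan and pub_counters needs no index on the flag.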
Over Indexing
● Don't create both of these indexes:
ALTER TABLE pub_counters ADD INDEX `pub_date_idx` (pubId, date);
ALTER TABLE pub_counters ADD INDEX `pubIdIdx` (pubId);
    o Index searches go from left to right, so the composite index already serves lookups on pubId alone.
● For a MyISAM table, indexing it heavily may cause the index file to reach its maximum size more quickly than the data file.
● Significantly hurts write performance
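A redundant single-column index can be found and dropped roughly as shown below. The index name pubIdIdx is taken from the composite index slide and is an assumption about what exists in your schema.

```sql
-- List the indexes; rows with the same leading Column_name at
-- Seq_in_index = 1 are candidates for redundancy.
SHOW INDEX FROM pub_counters;

-- Drop the single-column index that a composite index starting with
-- pubId already covers. Assumes an index named pubIdIdx exists.
ALTER TABLE pub_counters DROP INDEX pubIdIdx;

-- Confirm that lookups on pubId can still use an index.
EXPLAIN
SELECT pubId
FROM pub_counters
WHERE pubId = 1;
```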
Data Warehouse: Star Index
De-normalize schemas
Performance Best Practices
● If MyISAM, use fixed-length rows for a 20-30% improvement
    o Larger file size, but faster read performance
ALTER TABLE mytable ROW_FORMAT=Fixed;
● Analyze your queries
explain select pubId, date, value from pub_counters where pubId = 1 and date = '2014-06-01';
● Use ints instead of varchar/char
● Slow query log
long_query_time = 1
● Choose the right column size (PROCEDURE ANALYSE is available in MySQL 5.7 and earlier; it was removed in 8.0)
SELECT * FROM tblname PROCEDURE ANALYSE();
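The slow query log can also be switched on at runtime instead of in the config file. This is a sketch that assumes an account with the privileges needed to set global variables; the one-second threshold is the value from the slide.

```sql
-- Enable the slow query log without restarting the server.
SET GLOBAL slow_query_log = 'ON';

-- Log statements that take longer than 1 second.
-- The global value applies to connections opened after this point.
SET GLOBAL long_query_time = 1;

-- Check the current settings and where the log file is written.
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
```

Queries that show up in the log are good candidates for the explain step above.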
Optimization with InnoDB
● Turn off Linux file caching for InnoDB data by enabling O_DIRECT
● innodb_buffer_pool_size = (.70 * total_mem_size)
● bulk-insert-buffer-size=256M
● innodb_buffer_pool_instances = tune for concurrency
● innodb_thread_concurrency = 2 x CPUs
● innodb_flush_method = O_DIRECT (avoids double buffering)
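Before changing these settings, it can help to look at what the server is currently running with. The statements below only read values; how to interpret the counters is a rule of thumb rather than a hard threshold.

```sql
-- Current values of the settings listed above.
SHOW VARIABLES
WHERE Variable_name IN (
  'innodb_buffer_pool_size',
  'innodb_buffer_pool_instances',
  'innodb_thread_concurrency',
  'innodb_flush_method'
);

-- Buffer pool size in GB, for comparison against total memory.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- Innodb_buffer_pool_read_requests counts logical reads;
-- Innodb_buffer_pool_reads counts reads that had to go to disk.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

If Innodb_buffer_pool_reads grows quickly relative to Innodb_buffer_pool_read_requests, the working set probably does not fit in the buffer pool.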
Optimization with MyISAM
● Use "insert delayed" whenever possible
● Concurrent inserts
    o By default, only available if the table has no "gaps" (no entries have been deleted)
    o Remove gaps with OPTIMIZE TABLE pub_date_values;
    o Or allow concurrent inserts even when there are gaps:
[mysqld]
concurrent_insert = ALWAYS
OR
[mysqld]
concurrent_insert = 2
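The gap check and the setting can be tried from a SQL session as well. This sketch assumes pub_date_values is a MyISAM table and that the account may set global variables.

```sql
-- Data_free above 0 suggests the data file has holes left by deletes.
SHOW TABLE STATUS LIKE 'pub_date_values';

-- Defragment the table so the holes are removed.
OPTIMIZE TABLE pub_date_values;

-- Allow concurrent inserts even when the data file has holes
-- (2 is the numeric form of ALWAYS).
SET GLOBAL concurrent_insert = 2;

-- Confirm the value now in effect.
SHOW VARIABLES LIKE 'concurrent_insert';
```

A value set with SET GLOBAL is lost on restart, so the [mysqld] entry is still needed to make it permanent.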
Optimization with InnoDB
RAID0
● No Redundancy
● Fast
● Inexpensive
Optimization with InnoDB
RAID10
● Redundancy
● All Speed
● More expensive
● 2x Disks
EC2 Performance
http://blog.scalyr.com/2012/10/a-systematic-look-at-ec2-io/
EC2 Performance
http://victortrac.com/ec2-ephemeral-disks-vs-ebs-volumes-in-raid.html