Scaling MySQL Strategies for Developers

Upload: jonathan-levin

Post on 14-Jun-2015

TRANSCRIPT

Page 1: Scaling MySQL Strategies for Developers
Page 2: Scaling MySQL Strategies for Developers

Who Am I?

• Jonathan

• MySQL Consultant

• Working with MySQL since 2007

• Specialize in SQL, Indexing and Reporting (Big Data)

Page 3: Scaling MySQL Strategies for Developers

Who is this for?

* Smilies indicate ability to control area

Page 4: Scaling MySQL Strategies for Developers
Page 5: Scaling MySQL Strategies for Developers

This Much

100% 0% 20%

What Will I Cover?

Domain Knowledge

Page 6: Scaling MySQL Strategies for Developers

Occurrences

Problems

Solutions this tutorial will cover

Page 7: Scaling MySQL Strategies for Developers

serenesimplycomplicated.blogspot.co.uk/2012/07/developing-direction.html

Page 8: Scaling MySQL Strategies for Developers

http://frabz.com/3bv6

Page 9: Scaling MySQL Strategies for Developers

What Will I Cover?

• The top 20% of the strategies to resolve 80% of your performance problems

• The strategies that are within reach of developers

• Strategies that are more common and more established

• From my experience.

• To reduce risk

Page 10: Scaling MySQL Strategies for Developers

Table of Contents

Part One

• Indexes

• Finding Bottlenecks

Part Two

• Partitioning

• Intensive Table Optimization

Part Three

• Read Cache

• Scaling Reads

• Reporting

• Write Buffers

• Scaling Writes

• Sharding

Page 11: Scaling MySQL Strategies for Developers

Indexes

Page 12: Scaling MySQL Strategies for Developers

What are Indexes?

Page 13: Scaling MySQL Strategies for Developers

Indexes

• Advantages

• Speed – Use the right path

• Now used in NoSQL stores

• “A properly indexed database will give you very few problems” – me

• My blog – “Indexing and Caching”

Page 14: Scaling MySQL Strategies for Developers

B-Tree Indexes

(http://20bits.com/article/interview-questions-database-indexes)

Page 15: Scaling MySQL Strategies for Developers

(http://www.youtube.com/watch?v=coRJrcIYbF4)

Page 16: Scaling MySQL Strategies for Developers
Page 17: Scaling MySQL Strategies for Developers
Page 18: Scaling MySQL Strategies for Developers

Choosing the best Index

Prevent Table Scans (1 star)

Prevent Extra Processing (2 stars)

Prevent Reading from Disk (3 stars)

Page 19: Scaling MySQL Strategies for Developers

Indexes

• 1 Star – EXPLAIN

• Type

• const - where id=1

• ref - where location='london'

• eq_ref - where t1.id = t2.id

• Extra

• using where

• Limitation – type

• range - where id in (1,2,3,4,5)

Page 20: Scaling MySQL Strategies for Developers

Indexes

• 2 Star – EXPLAIN

• Extra

• using where

• Using index

• And Not

• Using filesort

• Using temporary

• Limitation

• Using temporary - query contains different GROUP BY and ORDER BY columns

Page 21: Scaling MySQL Strategies for Developers

Indexes

• 3 Star – EXPLAIN

• Type

• index

• Extra

• Using index

• Using index for group-by
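The progression from table scan to index lookup to covering index can be tried without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in: SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN and its wording differs, and the `orders` table, its columns and the index names are hypothetical. It only illustrates the idea behind the star levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total INT, note TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total, note) VALUES (?, ?, ?)",
    [(i % 50, i, "x") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT customer_id, total FROM orders WHERE customer_id = 7"

# No usable index: the whole table is scanned.
no_index = plan(query)

# Index on the WHERE column: lookup by index, then fetch each row.
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
one_star = plan(query)

# Index that also holds the selected column: answered from the index alone.
conn.execute("DROP INDEX idx_customer")
conn.execute("CREATE INDEX idx_customer_total ON orders (customer_id, total)")
three_star = plan(query)

for label, text in [("no index", no_index), ("1 star", one_star), ("3 star", three_star)]:
    print(f"{label:9} {text}")
```

In MySQL the equivalent signals would be `type: ALL` for the scan, `type: ref` for the lookup, and `Using index` in the Extra column for the covering case.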

Page 22: Scaling MySQL Strategies for Developers

Index Examples

Page 23: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121;

(PRIMARY KEY )

Regular Usage

Page 24: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

AND BETWEEN 1 AND 100

KEY ( , )

Range Scan

Page 25: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

AND IN (1,100,30,7)

KEY ( , )

Range Scan

Page 26: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

KEY ( , , )

Covering Index

Page 27: Scaling MySQL Strategies for Developers

SELECT FROM ₸ WHERE = 121

AND IN (1,100,30,7)

KEY ( , )

Not Optimal

Page 28: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

AND IN (1,100,30,7);

KEY ( , )

Broken Range

Page 29: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE IN (SELECT FROM ♠)

KEY ( )

Sub Queries

Page 30: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

GROUP BY ORDER BY

KEY ( )

Indexes for Sorting

Page 31: Scaling MySQL Strategies for Developers

SELECT , FROM ₸ WHERE = 121

GROUP BY ORDER BY

KEY ( , )

Indexes for Sorting

or KEY ( , )

Page 32: Scaling MySQL Strategies for Developers

SELECT . , . FROM INNER JOIN ON . = .

WHERE . = 232

KEY ( ) or KEY( )

Indexes for Joins

Page 33: Scaling MySQL Strategies for Developers

WHERE ₸.Ω = 232; INNER JOIN

Page 34: Scaling MySQL Strategies for Developers

INNER JOIN ₸ FULL SCAN

FILTER Ω = 232

Page 35: Scaling MySQL Strategies for Developers

SELECT , (SELECT .. FROM WHERE ) FROM ₸ WHERE IN

(SELECT FROM ♠ WHERE..);

“I need help optimizing the my.cnf”

Page 36: Scaling MySQL Strategies for Developers

KEY ( , , )

PRIMARY KEY ( ) KEY ( , )

Clustered PK and Secondary Indexes

(Can be used in GROUP / ORDER BY SELECT variables to make Covering index)

http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/

Page 37: Scaling MySQL Strategies for Developers

SELECT FROM ₸ WHERE =1 OR =2;

KEY ( ) KEY ( )

Index Merge

Page 38: Scaling MySQL Strategies for Developers

SELECT FROM ₸ WHERE =1 UNION (SELECT FROM ₸ WHERE =2)

KEY ( ) KEY ( )

Index Merge

* 5.6

Page 39: Scaling MySQL Strategies for Developers
Page 40: Scaling MySQL Strategies for Developers

Finding Bottlenecks

Page 41: Scaling MySQL Strategies for Developers

*

*

*

Page 42: Scaling MySQL Strategies for Developers
Page 43: Scaling MySQL Strategies for Developers

Gathering Data

What you will need:

1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch

2. Set long_query_time = 0 for 6 to 24 hours to collect a decent-sized slow log. (Make sure the host has enough disk space.)

Page 44: Scaling MySQL Strategies for Developers

Log Processing

Worst Response Queries

1. echo '' > slow.log

2. mysql> SET GLOBAL long_query_time = 0; SET long_query_time = 0; FLUSH LOGS;

3. Wait X hours, then restore the original value.

4. pt-query-digest slow.log > slow.txt

Bulky Queries: --filter '($event->Rows_examined > 1000)'

Write Queries: --filter '($event->Rows_affected > 0)'

Processing should be done on another host

Page 45: Scaling MySQL Strategies for Developers

Log Processing

• MySQL 5.6

• Statement Digest

• No need for log processing to get Digest.

Page 46: Scaling MySQL Strategies for Developers

Rank  Response time    Calls    R/Call  Item
1     480.9273 16.3%   600      0.8015  SELECT dp_node dp_usernode dp_buddylist dp_users dp_node_access
2     322.4220  4.3%   129258   0.0025  ADMIN INIT DB
3     314.8719  4.2%   30220    0.0104  UPDATE dp_users
4     287.7109  3.8%   51606    0.0056  SET
5     269.3434  3.6%   600      0.4489  SELECT dp_node dp_usernode dp_buddylist dp_users dp_node_access
6     238.8571  3.1%   2902141  0.0001  SELECT dp_url_alias

Worst Response Queries: pt-query-digest slow.log > slow.txt

Page 47: Scaling MySQL Strategies for Developers

* Gap Locking * pt-stalk

Page 48: Scaling MySQL Strategies for Developers

mysql tables in use 4, locked 2

5289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries 363312

MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423

Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916F

TABLE LOCK table `currency` trx id 10ECC90CC lock mode IS

RECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` of table `currency` trx id 10ECC90CC lock mode S

TABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode IS

RECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table `daily_summary` trx id 10ECC90CC lock mode S

TABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode IS

TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS

SELECT r.exchange_rate INTO destination_exchange_rate FROM exchange_rate AS r WHERE r.currency_id = NAME_CONST('destination_currency_id',6) AND r.date = NAME_CONST('day',_latin1'2012-06-30' COLLATE 'latin1_swedish_ci')

Page 49: Scaling MySQL Strategies for Developers

*** (1) TRANSACTION:

TRANSACTION 13DCDF4D9, ACTIVE 0 sec starting index read

mysql tables in use 1, locked 1

LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s)

MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id 118341815748

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table `sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but not gap waiting

Record lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format;

0: len 8; hex 43616d706169676e;

1: len 4; hex 31313436; asc 1146;;

2: len 6; hex 00013dc12d7a; asc = -z;;

UPDATE sys_doctrine_lock_tracking SET timestamp_obtained = '1341839053' WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137'

Page 50: Scaling MySQL Strategies for Developers

Bottlenecks

• Locking Queries

• Try to make them complete as fast as possible

• JOINs vs sub query

• Function wrapped around index

• UDF with SQL inside

• Long Transactions

• Looping with short queries

Page 51: Scaling MySQL Strategies for Developers

Misbehaving Optimizer

• Optimizer Hints:

• USE INDEX

• FORCE INDEX

• IGNORE INDEX

• STRAIGHT_JOIN

• Joins

• LEFT JOIN is sometimes faster than INNER JOIN

* Too many indexes confuse the optimizer

Page 52: Scaling MySQL Strategies for Developers
Page 53: Scaling MySQL Strategies for Developers

Bottlenecks

• Virtualization

• Increase (obscenely) innodb_log_file_size

• Needs restart + deleting old log files

• EXT3

• General I/O improvements

• innodb-flush-log-at-trx-commit

• sync_binlog

• innodb_support_xa

Group Commit

Page 54: Scaling MySQL Strategies for Developers

Bottlenecks

• General I/O improvements

• Percona server

• Better flushing to disk

• Fewer mutexes

• Upgrade MySQL

• Same reasons as above

• Innodb-io-capacity http://www.wmarow.com/strcalc/

Page 55: Scaling MySQL Strategies for Developers

Database Upgrades

• My Secret Sauce for smooth MySQL Migrations

1. Upgrade the dev/staging DBs with desired version

2. Wait 1-2 months till silky-smooth

• All features have been tested

• All the query issues have been fixed

3. Upgrade servers

• Down to Up – Slaves first

Page 56: Scaling MySQL Strategies for Developers

Bottlenecks

• Mutexes

• Query Cache

• “Freeing items” in processlist

• Network

• Skip-name-resolve

• net_write_timeout / read_timeout

• thread_cache_size

Batch Processes

Lots of connections

Page 57: Scaling MySQL Strategies for Developers

Homework

• Haven’t talked fully about:

SQL and EXPLAIN

• Webinars

• Indexes - percona.tv/percona-webinars/tools-and-techniques-for-index-design

• Explain - percona.tv/percona-webinars/explain-demystified

• Websites

• http://www.myxplain.net

Page 58: Scaling MySQL Strategies for Developers

Part Two

Page 59: Scaling MySQL Strategies for Developers

Partitioning

Page 60: Scaling MySQL Strategies for Developers

Big Table

Columns

Rows

Page 61: Scaling MySQL Strategies for Developers

[Diagram: the big table's rows (Columns x Rows) split into nine partitions by an algorithm]

Page 62: Scaling MySQL Strategies for Developers

Partitioning (mind map)

• Use Cases: Reduce Data, Parallelize Data

• Benefits: B-Tree Levels, Short Scans

• Issues: Inserting Algorithm, Select Algorithm, Unique Key, Foreign Keys, Primary Key Overhead

• Automatic: ID, Time, Hash

• Manual: Archive, Shards, Time

• 80-90% DB Usage

Page 63: Scaling MySQL Strategies for Developers

Partitioning

• Use Cases

• Reducing Data – Only get the partition/table that you need

• Parallelizing Data - Get an equal amount of data from each partition in parallel

• Benefits

• Shorter table scans

• Fewer levels for index scans

Page 64: Scaling MySQL Strategies for Developers

Issues

• Algorithm

• INSERTs

• SELECTs

• Keys

• Foreign Key

• Unique Key

• Primary Key Overhead

• Increases Table size

• Can change indexes

Page 65: Scaling MySQL Strategies for Developers

Partition Types

• Range

• List

• Hash

• Key

• Columns

• Sub-partitioning

ID

Time

Hash

Automatic

Page 66: Scaling MySQL Strategies for Developers

Rank  Response time    Calls   Item
1     480.9273 26.3%   129258  SELECT address FROM WHERE BETWEEN '2012-11-10' AND '2012-11-17'
2     322.4220 14.3%   600     SELECT total FROM WHERE BETWEEN '2012-11-01' AND '2012-11-30'
3      34.8719  4.2%   30220   UPDATE SET active=1 WHERE = 17635376
4      28.7109  3.8%   51606   SELECT dispatch_time FROM WHERE = 7387612

Partitioning by Usage vs Partitioning by Maintenance

Page 67: Scaling MySQL Strategies for Developers

CREATE TABLE orders (
  id int unsigned not null auto_increment,
  `date` date not null,
  PRIMARY KEY (date, id),
  KEY id (id),
  ..
) ENGINE=InnoDB DEFAULT CHARSET=utf8
-- RANGE, TO_DAYS, MAXVALUE and the partition names p0..p2 are assumed; the slide's own tokens did not survive
PARTITION BY RANGE (TO_DAYS(date))
(PARTITION p0 VALUES LESS THAN (TO_DAYS('2012-01-01')),
 PARTITION p1 VALUES LESS THAN (TO_DAYS('2012-06-01')),
 PARTITION p2 VALUES LESS THAN MAXVALUE);

Can also do PRIMARY KEY (id, date) KEY date (date)

Page 68: Scaling MySQL Strategies for Developers

Manual Partitioning

• Archive - Main table & Archive Table

• Time – Create table per year, per month..

• Shards – Table per country

* Foreign keys

Page 69: Scaling MySQL Strategies for Developers
Page 70: Scaling MySQL Strategies for Developers

Intensive Table Optimization

Page 71: Scaling MySQL Strategies for Developers

Intensive Table Optimization

Once upon a time, I was researching ways to make as much of a database's working set as possible fit into memory…

Page 72: Scaling MySQL Strategies for Developers

mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/

Page 73: Scaling MySQL Strategies for Developers

Intensive Table Optimization

1. People are usually very liberal with data type sizes

2. There were (usually) so many indexes that they:

a. Multiplied the table size

b. Were not efficient compared to how the table is used

c. Confused the optimizer

3. Discovered partitions were not used or were misused

a. Discovered sub-partitions

b. Primary Key alignment

4. Discovered InnoDB compression

Page 74: Scaling MySQL Strategies for Developers

Intensive Table Optimization

There are tools to help with this, but…

Page 75: Scaling MySQL Strategies for Developers

quickmeme.com /meme/3rmy8y/

Page 76: Scaling MySQL Strategies for Developers

Intensive Table Optimization

Bottleneck Tables

Slowest Queries

User Statistics

Table Sizes

Optimizations

Data Types

Query Logs

Indexes Partitions

Foreign Keys

Optional

Page 77: Scaling MySQL Strategies for Developers

Gathering Data

What you will need:

1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch

2. Set long_query_time = 0 for 6 to 24 hours to collect a decent-sized slow log. (Make sure the host has enough disk space.)

Slowest Queries

Bottleneck Tables

Page 78: Scaling MySQL Strategies for Developers

Gathering Data

Helpful (Optional):

1. Percona/Mariadb user_statistics patch

2. Get list of most read/written tables

3. Get list of used and un-used indexes

4. List of largest tables

User Statistics

Table Sizes

Bottleneck Tables

Page 79: Scaling MySQL Strategies for Developers

Rank Response time Calls Item

1 8589.9513 27.5% 231051 UPDATE dp_users

2 4752.6688 15.2% 257235 SELECT dp_cache_menu

3 1606.4946 5.1% 183542 SELECT community_chats

4 1418.9034 4.5% 259939 SELECT dp_cache

5 564.3305 1.8% 7970165 SELECT dp_url_alias

6 495.0092 1.6% 44940 SELECT dp_event dp_node

Worst Response Queries pt-query-digest slow.log > slow.txt

Page 80: Scaling MySQL Strategies for Developers

Table Statistics

SELECT table_name, rows_read
FROM information_schema.table_statistics
ORDER BY rows_read DESC LIMIT 5;

Other columns to sort by: ROWS_CHANGED, ROWS_CHANGED_X_INDEXES

table_name        rows_read
dp_users          2302477894
dp_node           1231318439
dp_comments       1071462211
dp_userpoints     1033073070
dp_search_index   260154684

Page 81: Scaling MySQL Strategies for Developers

Table Statistics

Rank Response time Calls Item

1 7975.4487 6.5% 124384 advertisement

2 5554.1435 4.5% 1834 info

3 4915.4816 4.0% 208 placement

4 4902.7644 4.0% 158 advert_summary

Worst Response Tables --group-by tables

Page 82: Scaling MySQL Strategies for Developers

Table Sizes

Table_Name           Rows  Data  Idx  Total_size  Idxfrac
total_daily_summary  610M  77G   88G  165G        1.15
advert_summary       478M  57G   45G  102G        0.78
log_messages         92M   47G   10G  57G         0.21

SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME,
  CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M') ROWS,
  CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') DATA,
  CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') IDX,
  CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE,
  ROUND(INDEX_LENGTH / DATA_LENGTH, 2) IDXFRAC
FROM INFORMATION_SCHEMA.TABLES
ORDER BY DATA_LENGTH + INDEX_LENGTH DESC
LIMIT 10;

http://www.mysqlperformanceblog.com/2008/03/17/researching-your-mysql-table-sizes/

Page 83: Scaling MySQL Strategies for Developers

Which table needs your attention?

Table Size

Page 84: Scaling MySQL Strategies for Developers

Intensive Table Optimization

• Table Targeting

• The most “worthy” table to focus your attention on

• Biggest bang for your buck

• If you know which table is the most troublesome

• Ignore most of the investigations

• Apart from slow log

• Investigations help understand DB usage

Page 85: Scaling MySQL Strategies for Developers

Optimizations

Data Types

Query Logs

Indexes Partitions

Foreign Keys Sub Partitioning

Compression

Don’t Need

Page 86: Scaling MySQL Strategies for Developers

Intensive Table Optimization

• Datatypes

• SELECT * FROM table \G

• Example: TINYINT (1 byte) instead of BIGINT (8 bytes) saves 7 bytes per value:

(7 bytes per row + 7 bytes per index entry) * 350 million rows = 4.9 GB

• Enum instead of Varchar

• Remove NULLs when not needed
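The TINYINT versus BIGINT arithmetic above can be reproduced for other types and row counts. This is a rough sketch that uses MySQL's documented integer storage sizes and ignores page and row overhead; the function name and parameters are hypothetical.

```python
# Storage size in bytes of MySQL integer types.
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def bytes_saved(old_type, new_type, rows, indexes_on_column=1):
    """Bytes saved by narrowing one column that also appears in N indexes."""
    per_value = INT_BYTES[old_type] - INT_BYTES[new_type]
    copies = 1 + indexes_on_column  # one copy in the row, one per index entry
    return per_value * copies * rows

saved = bytes_saved("BIGINT", "TINYINT", rows=350_000_000)
print(f"{saved / 1e9:.1f} GB saved")  # prints "4.9 GB saved"
```

Every extra index on the column adds another copy, which is why oversized types in indexed columns cost more than they seem to.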

Page 87: Scaling MySQL Strategies for Developers

Intensive Table Optimization

• Compression

• Best for tables with a lot of varchar/text

• Compress table by x2, x4, x8..

• Need to experiment with innodb_strict_mode = ON;

• On my tests (5.5) – Very very slow

• Alter tables

• INSERTS/UPDATES/DELETES

Optimizations

Compression

Page 88: Scaling MySQL Strategies for Developers

Get Data

Make Assumptions

Test Assumptions

Deploy

New Results

Slow Log

Query Digest

Filtered by Target Table

Index-Usage

EXPLAIN

1.

2.

3.

4.

Page 89: Scaling MySQL Strategies for Developers

Target Table Processing

Filter Log:

pt-query-digest slow.log \
  --filter '$event->arg =~ m/dp_users /' \
  --no-report --print > dp_users.log

Worst Queries from new log:

pt-query-digest dp_users.log --limit 100% > tbl_dp_users.txt

Page 90: Scaling MySQL Strategies for Developers

Rank  Response time    Calls    Item
1     209.2863 10.7%   88850    UPDATE dp_users SET access = 133******3 WHERE = 23****01\G
3     162.2711  8.3%   1309010  SELECT access FROM dp_users WHERE = 21***4\G
4     139.9009  7.1%   197      SELECT uid, name FROM dp_users WHERE = 1 ORDER BY DESC\G
5     133.8691  6.8%   327      SELECT * FROM dp_users u WHERE = 's******s'\G
6     109.6903  5.6%   29152    SELECT name, created, picture FROM dp_users WHERE picture !='' AND = '1' AND BETWEEN '133*****0' AND '133*****60'\G
7      92.9095  4.7%   360642   SELECT dp_node dp_users using ( ) dp_node_revisions
8      74.2426  3.8%   106      SELECT * FROM dp_users u WHERE = 'hoa****rio' AND = '3837********5f9b' AND = 1\G

Page 91: Scaling MySQL Strategies for Developers

Rank Response time Calls Item

1 480.9273 26.3% 129258 SELECT address FROM orders WHERE date BETWEEN ‘2012-11-10’ and ‘2012-11-17’

2 322.4220 14.3% 600 SELECT total FROM orders WHERE date BETWEEN ‘2012-11-01’ and ‘2012-11-30’

3 34.8719 4.2% 30220 UPDATE orders SET active=1 WHERE id = 17635376

4 28.7109 3.8% 51606 SELECT dispatch_time FROM orders WHERE id = 7387612

Partitioning by Usage

Page 92: Scaling MySQL Strategies for Developers

Testing Assumptions

SELECT uid, name FROM WHERE = 1 ORDER BY DESC\G

1.30secs

SELECT uid, name FROM WHERE = 1 ORDER BY DESC\G

0.56secs

Query Digest

EXPLAIN

Page 93: Scaling MySQL Strategies for Developers

Test Environment

• Hardware environment similar to live

• Data size similar to live environment:

• Replicating slave

• Cannot change datatypes on MIXED/ROW replication

• Create table2 and run queries against it

• Xtrabackup – full replica

• Script with mysqldump + WHERE

• mysqldump --where="date > now() - interval 30 day" main table1 table2 > dump.sql

• mysqldump --all-databases --ignore-table=main.table1 --ignore-table=main.table2 >> dump.sql

New Results

Page 94: Scaling MySQL Strategies for Developers

Final Tweaking

(Remember the table log file – dp_users.log ?)

pt-index-usage

• pt-index-usage dp_users.log --host 127.0.0.1 --tables dp_users >idx_dp_users.txt

• Go over recommendations

Test Assumptions

Index-Usage

Page 95: Scaling MySQL Strategies for Developers

Deploy Strategies

1. Rolling Servers

2. pt-online-schema-change

3. Two-part move

a. Create new table – table2

b. Insert table rows that will not change – INSERT INTO table2 SELECT * FROM table1 WHERE date <= curdate() - interval 30 day;

c. Short downtime

d. Rename table1 to table3; rename table2 to table1;

e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >= curdate() - interval 30 day;

4. Alter table – long downtime (pre 5.6, maybe)
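The two-part move (strategy 3) can be rehearsed end to end on a toy table. This is a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, so `INSERT OR IGNORE` takes the place of `INSERT IGNORE`, and the `amount` column, the extra index and the 100 sample rows are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
today = date.today()
cutoff = (today - timedelta(days=30)).isoformat()

# table1 is the live table: 100 rows, one per day, the newest dated today.
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, date TEXT, amount INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(i, (today - timedelta(days=100 - i)).isoformat(), i) for i in range(1, 101)],
)

# a. Create the new table with the improved definition (here: one extra index).
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, date TEXT, amount INT)")
conn.execute("CREATE INDEX idx_table2_date ON table2 (date)")

# b. Copy the rows that will no longer change while the application keeps running.
conn.execute("INSERT INTO table2 SELECT * FROM table1 WHERE date <= ?", (cutoff,))

# c/d. Short downtime: swap the names.
conn.execute("ALTER TABLE table1 RENAME TO table3")
conn.execute("ALTER TABLE table2 RENAME TO table1")

# e. Copy the recent rows; OR IGNORE skips the boundary row already copied in step b.
conn.execute("INSERT OR IGNORE INTO table1 SELECT * FROM table3 WHERE date >= ?", (cutoff,))

old_count = conn.execute("SELECT COUNT(*) FROM table3").fetchone()[0]
new_count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(old_count, new_count)  # prints "100 100"
```

In MySQL the swap in step d can be done atomically with a single `RENAME TABLE table1 TO table3, table2 TO table1;` statement.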

Deploy

Page 96: Scaling MySQL Strategies for Developers

Continuous Self-Learning

Page 97: Scaling MySQL Strategies for Developers

Get Data

Make Assumptions

Test Assumptions

Deploy

New Results

Slow Log

Query Digest

Filtered by Target Table

Index-Usage

EXPLAIN

Page 98: Scaling MySQL Strategies for Developers

Part Three

Page 99: Scaling MySQL Strategies for Developers

Page Cache

File Reverse Proxy

Browser Cache

Summary Tables

Query Cache

Column Cache

Volatile

Streaming

Shield

Denormalize

Subtotal

Conditional Attributes

2nd Level

Data Warehouse

3rd Level

Page 100: Scaling MySQL Strategies for Developers

Read Cache

• Outside the database

• Page Cache

• Query Cache

• Inside the database

• Column Cache

• Summary Table

* Complexity

Page 101: Scaling MySQL Strategies for Developers

Page Cache

• Browser Cache

• Etag, Expires, Last-modified

• Reverse Proxy

• Squid, Varnish, Nginx, Apache, Proprietary.

• File/Full page cache

• mod_file_cache, Zend_Cache_Backend

• W3 Total Cache, sfSuperCache

* Stale

Page 102: Scaling MySQL Strategies for Developers

memegenerator.net/instance/23247230

Page 103: Scaling MySQL Strategies for Developers

Query Cache

• Volatile

• Memcached, Redis, Hibernate Cache, Arrays..

• On-Request, Time-to-Live, Stale and Cache Stampede

• Streaming

• Interval / Async, Stale, Common Queries

• Shield – Mongo Shield

• Script/Tool Replication, Dependency

• Aggregation

• Complexity / Layers
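The on-request, time-to-live and cache stampede points of the volatile cache can be illustrated in a few lines. This is a minimal in-process sketch, not a Memcached or Redis client: the `TTLCache` class, the `slow_query` function and the key name are hypothetical, and a per-key lock is only one of several ways to avoid a stampede.

```python
import threading
import time

class TTLCache:
    """In-process stand-in for a volatile cache: on-request fill with a time-to-live."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}                  # key -> (expires_at, value)
        self._locks = {}                 # key -> lock guarding recomputation
        self._guard = threading.Lock()   # protects the _locks dict itself

    def get(self, key, loader):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]              # fresh hit, no database work
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                       # only one thread recomputes per key
            entry = self._data.get(key)
            if entry and entry[0] > time.monotonic():
                return entry[1]          # filled by another thread while we waited
            value = loader()
            self._data[key] = (time.monotonic() + self.ttl, value)
            return value

db_calls = 0

def slow_query():
    """Hypothetical expensive query; counts how often the database is hit."""
    global db_calls
    db_calls += 1
    time.sleep(0.05)
    return ["row1", "row2"]

cache = TTLCache(ttl_seconds=60)
threads = [
    threading.Thread(target=cache.get, args=("top_users", slow_query))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)  # 20 concurrent requests, one database call
```

Without the per-key lock, all 20 threads would miss at the same moment and each would run the slow query, which is the stampede the slide warns about.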

Page 104: Scaling MySQL Strategies for Developers

147cm

Mongo Shield

img.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg

Page 105: Scaling MySQL Strategies for Developers
Page 106: Scaling MySQL Strategies for Developers

Memcached Memcached

MySQL

Cart

Sessions Sessions

Sticky Sticky

Page 107: Scaling MySQL Strategies for Developers

MySQL

Memcached

Cart

Sessions

Page 108: Scaling MySQL Strategies for Developers

Manipulating Time

* Error Handling

Page 109: Scaling MySQL Strategies for Developers

Column Cache

• Denormalize

• Additional Column(s) to prevent JOINs

• Maintenance, Space on disk

• Example: CustomerID, OrderID, OrderItemID

• Sub Total

• Prevent additional slow GROUP BY queries

• Maintenance, Generation, Space on disk

• Example: totalPurchases, moneyOwed

* Space vs Speed

Page 110: Scaling MySQL Strategies for Developers

Column Cache

• Conditional

• Store conditional (True/False) logic

• Prevents recalculating result – another query

• Can prevent rewriting code

• Example: isDone, hasReview, aboveAvg

• Attributes

• ENUM datatype

• SET datatype - ARRAY of options

• Prevents JOINs

• May save space

Page 111: Scaling MySQL Strategies for Developers

Summary Tables

An additional table which consists of an aggregation of another table or several JOIN’d tables.

Page 112: Scaling MySQL Strategies for Developers

SELECT ...
FROM main_table t1
INNER JOIN table2 t2 ON t1.orderid = t2.id
INNER JOIN table3 t3 ON t1.customerid = t3.id
INNER JOIN table4 t4 ON t1.addressid = t4.id
INNER JOIN table5 t5 ON t2.supplierid = t5.id
INNER JOIN table6 t6 ON t2.warehouse = t6.id
INNER JOIN table2 t7 ON t6.addressid = t7.id
INNER JOIN table8 t8 ON t1.productid = t8.id
INNER JOIN table9 t9 ON t1.buyerid = t9.id
INNER JOIN table10 t10 ON t1.officeid = t10.id
WHERE t1.date BETWEEN '2012-11-01' AND '2012-11-30'
GROUP BY t1.date

Page 113: Scaling MySQL Strategies for Developers

Summary Tables

Processed 1.2million rows

Returned 30 rows

Time 17.52 minutes

Page 114: Scaling MySQL Strategies for Developers

Summary Tables

CREATE TABLE summary_table (PRIMARY KEY (date, addressid, productid)) AS
SELECT ...
FROM main_table t1
INNER JOIN table2 t2 ON t1.orderid = t2.id
INNER JOIN table3 t3 ON t1.customerid = t3.id
INNER JOIN table4 t4 ON t1.addressid = t4.id
INNER JOIN table6 t6 ON t2.warehouse = t6.id
GROUP BY t1.date, t1.addressid, t1.productid

Page 115: Scaling MySQL Strategies for Developers

Summary Tables

SELECT ...
FROM summary_table t1
INNER JOIN table5 t5 ON t2.supplierid = t5.id
INNER JOIN table2 t7 ON t6.addressid = t7.id
INNER JOIN table8 t8 ON t1.productid = t8.id
INNER JOIN table9 t9 ON t1.buyerid = t9.id
INNER JOIN table10 t10 ON t1.officeid = t10.id
WHERE t1.date BETWEEN '2012-11-01' AND '2012-11-30'
GROUP BY t1.date

Page 116: Scaling MySQL Strategies for Developers

Summary Tables

Processed 35000 rows

Returned 30 rows

Time 0.75 seconds
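The effect of a summary table, fewer rows to process for the same answer, can be reproduced on a small scale. The sketch below assumes Python's sqlite3 module in place of MySQL, and the `order_items` and `daily_product_summary` tables with their columns are hypothetical; it does not reproduce the timings quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_date TEXT, product_id INT, qty INT, price INT)")

# 30 days x 5 products x 20 order lines each = 3000 detail rows.
rows = [
    (f"2012-11-{day:02d}", product, 1 + (day + product) % 3, 10 * product)
    for day in range(1, 31)
    for product in range(1, 6)
    for _ in range(20)
]
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", rows)

# Build the summary once (off peak or on an interval): one row per day and product.
conn.execute("""
    CREATE TABLE daily_product_summary AS
    SELECT order_date, product_id, SUM(qty) AS qty, SUM(qty * price) AS revenue
    FROM order_items
    GROUP BY order_date, product_id
""")

detail_sql = (
    "SELECT order_date, SUM(qty * price) FROM order_items "
    "GROUP BY order_date ORDER BY order_date"
)
summary_sql = (
    "SELECT order_date, SUM(revenue) FROM daily_product_summary "
    "GROUP BY order_date ORDER BY order_date"
)

detail_rows = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM daily_product_summary").fetchone()[0]
same_answer = conn.execute(detail_sql).fetchall() == conn.execute(summary_sql).fetchall()
print(detail_rows, summary_rows, same_answer)  # prints "3000 150 True"
```

The report query reads 150 pre-aggregated rows instead of 3000 detail rows; the price is that the summary has to be rebuilt or updated when the detail data changes.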

Page 117: Scaling MySQL Strategies for Developers
Page 118: Scaling MySQL Strategies for Developers

Summary Tables as an

Analytics Sub-System

Page 119: Scaling MySQL Strategies for Developers

Database Design Comparison

                     Operational System                               Analytic System
Purpose              Execution of a business process                  Measurement of a business process
Primary Interaction  Insert, Update, Query, Delete                    Query
Design Optimization  Update concurrency                               High-performance query
Design Principle     Entity-relationship (ER), 3rd Normal Form (3NF)  Dimensional design (star schema or cube)

amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/

Page 120: Scaling MySQL Strategies for Developers

Data Warehouses

Fact Tables

• Measurement

• Narrow

• Long

• Most of the data

Dimension Tables

• Context

• Wide

• Short

• Filters and descriptive data

Page 121: Scaling MySQL Strategies for Developers

Operational Design

Page 122: Scaling MySQL Strategies for Developers

Operational Design

Customers

Orders

Order Items

Products

Addresses

Page 123: Scaling MySQL Strategies for Developers

Star Schema

Page 124: Scaling MySQL Strategies for Developers

Star Schema

OrderItems Fact

Date dim

Address dim

Products dim

Customers dim

Page 125: Scaling MySQL Strategies for Developers

Maintenance

• Hourly/Daily/Weekly/Monthly Aggregations

• Intervals

• Off Peak

• On-Insert

Page 126: Scaling MySQL Strategies for Developers

Scaling Reads

Indexes

Read Slaves

Partitioning

Sub Partitioning

Sharding

InnoDB Buffer Pool

A lot more settings

Intensive Table

Optimization

Read/Write Splitting

Another Master

Read Cache

Galera

Better Hardware

IO Memory

Page 127: Scaling MySQL Strategies for Developers

Scaling Reads

• InnoDB Buffer Pool

• Cache Warming

• Read buffer

• Sort Buffer

• Join Buffer

• Temp Table size / on disk

InnoDB Buffer Pool

A lot more settings

Page 128: Scaling MySQL Strategies for Developers

Scaling Reads

• Better Hardware

• Disk I/O

• Memory

Better Hardware

IO Memory

Page 129: Scaling MySQL Strategies for Developers

Scaling Reads

• Read Slaves

• Read/Write Splitting

• Master/Master

• Galera

Page 130: Scaling MySQL Strategies for Developers

Server Architecture

MySQL

Page 131: Scaling MySQL Strategies for Developers

Server Architecture

Page 132: Scaling MySQL Strategies for Developers
Page 133: Scaling MySQL Strategies for Developers
Page 134: Scaling MySQL Strategies for Developers
Page 135: Scaling MySQL Strategies for Developers

db1

db2

db3

Page 136: Scaling MySQL Strategies for Developers

Reporting

Indexes

Reporting Slaves

Partitioning

Sub Partitioning

Summary Tables

Sharding

InnoDB Buffer Pool

A lot more settings

Intensive Table

Optimization

Different Indexes

Hadoop

OLAP Cubes

Cross Shard Joins

Columnar Store

£££

Better Hardware

IO Memory

Page 137: Scaling MySQL Strategies for Developers

Reporting

• Reporting Slaves

• Different Indexes

• No Foreign Keys

• Partitioning

• If ROW-replication:

• Must have same data types

Page 138: Scaling MySQL Strategies for Developers

Reporting

• Sharding

• Cross-shard JOINs

• Go Fish

• Aggregations Hadoop

Page 139: Scaling MySQL Strategies for Developers

Reporting

• Summary tables - aggregations

• Scripts

• Hadoop

• Ready-made reports

• OLAP Cubes

• Columnar Store

• £££

* RDBMS very fast at GROUP BY

Page 140: Scaling MySQL Strategies for Developers

Innodb Log File Size

A lot more settings

RAID Write-back

Queue

Local Server Storage

Summarized Writes

MySQL settings

Battery

CRUSH

Hadoop

HandCode

ETL

Write Buffers

Page 141: Scaling MySQL Strategies for Developers

Write Buffers

• RAID card

• BBU

• Write-back vs write-through

• Battery Learning/Drain

Page 142: Scaling MySQL Strategies for Developers

Write Buffers

• Innodb log file size

• Buffer pool * dirty read (%) * io capacity

Innodb Log File Size

A lot more settings

* Virtual Environment

Page 143: Scaling MySQL Strategies for Developers

Write Buffers

• innodb-flush-log-at-trx-commit

• sync_binlog

• innodb_support_xa

MySQL settings

Page 144: Scaling MySQL Strategies for Developers

Write Buffers

• ActiveMQ, RabbitMQ, ZeroMQ, Gearman

• Not ACID, may need redundancy

• Summarized Writes

• Memcached counters + interval writes
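Summarized writes can be sketched as an in-memory counter buffer that is flushed in batches. This is an illustration only: the `SummarizedWriter` class is hypothetical, a plain Python `Counter` stands in for Memcached counters, and the flush is triggered by a count threshold where the slide suggests a time interval (a timer could call `flush()` instead).

```python
from collections import Counter

class SummarizedWriter:
    """Buffers counter increments in memory and writes them out in batches."""

    def __init__(self, flush_fn, flush_every=1000):
        self.flush_fn = flush_fn        # called with {key: delta} on each flush
        self.flush_every = flush_every  # flush after this many buffered increments
        self.pending = Counter()
        self.buffered = 0

    def incr(self, key, delta=1):
        self.pending[key] += delta
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(dict(self.pending))
        self.pending.clear()
        self.buffered = 0

# Each flushed batch stands in for a handful of
# "UPDATE pages SET views = views + ? WHERE id = ?" statements.
writes = []
writer = SummarizedWriter(writes.append, flush_every=1000)
for i in range(2500):
    writer.incr(f"page:{i % 3}")
writer.flush()  # final flush, e.g. on shutdown or from a timer

total = sum(sum(batch.values()) for batch in writes)
print(len(writes), total)  # prints "3 2500"
```

Here 2500 increments reach the database as 3 batches of at most 3 rows each. As the slide notes, this is not ACID: increments still in the buffer are lost if the process dies.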

Page 145: Scaling MySQL Strategies for Developers

Write Buffers

• Local Storage (web/app servers)

• Memcached

• SQLite (disk/in-memory)

• Log file

• MySQL

• Independent, isolated

• Need to fetch data

• Prevent missed data and duplicates

<- Most popular

Local Server Storage

ETL

Page 146: Scaling MySQL Strategies for Developers

Write Buffers

• Fetching Data

• Hand code

• ETL tool – Pentaho/Talend

• Flume

• Aggregation/Processing

• Hadoop

• Google CRUSH

<- Very Popular

Page 147: Scaling MySQL Strategies for Developers

Google CRUSH Tools

gunzip -c oldlog.log.gz |

convdate -f 3 -i "%d/%b/%Y:%H:%M:%S %z" -o "%Y-%m-%d" |

reorder -k 3,2 |

aggregate -p -k 3,2 -c 1 |

csvformat |

gzip - > newlog.csv.gz

Page 148: Scaling MySQL Strategies for Developers

Indexes

Partitioning

Sub Partitioning

Intensive Table

Optimization

Write Buffers

Innodb Log File Size

MySQL Settings

Hardware

OS Settings

Sharding

Bypass SQL Layer

Scaling Writes

Remove Bottlenecks

Page 149: Scaling MySQL Strategies for Developers

Scaling Writes

• Fewer indexes

• Fewer writes

• Partitioning

• Less table maintenance

• Fewer B-tree levels

• Less need to organize blocks

• Algorithm overhead

• Mutex/Locks

Page 150: Scaling MySQL Strategies for Developers

Scaling Writes

• Bypass SQL Layer

• Innodb/Memcached

• HandlerSocket

SQL Parser

Optimizer

Storage Engine

* Riak

Page 151: Scaling MySQL Strategies for Developers

Scaling Writes

• IO Scheduler

• File System (+ nobarrier, noatime, nodiratime)

• EXT 3

• EXT 4

• XFS

• ZFS

• Block Sizes

Page 152: Scaling MySQL Strategies for Developers

Scaling Writes

• Faster I/O

• Faster Disks

• SAS

• SSD

• RAID + cache

• PCIe SSD

• FusionIO

• Virident

Hardware

Page 153: Scaling MySQL Strategies for Developers

Scaling Writes

• Master/Master

• Does not scale writes

• Writes still need to replicate

• Sharding

• Does scale writes

Page 154: Scaling MySQL Strategies for Developers

Shared Nothing

By Schema

Global Data

Partitioned Data

Child Data

ID

Area

Lookup

DB App

Function

Proxy

Functional Partitioning

Hash Cross Shard

Go Fish

Hadoop

Sharding

Reporting Key

Page 155: Scaling MySQL Strategies for Developers

Sharding

• Partitioned Data

• Splitting the Data

• Vertically

• Main Tables to partition

• Child Tables

• Global Tables

• By Schema

• Shared Nothing

Page 156: Scaling MySQL Strategies for Developers

Sharding

• Partitioning by Key

• ID – CustomerID, ProductID, App

• Area – Country, City, Continent

• Hash – Random for equal spread
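Hash-based shard selection can be sketched as follows. The shard names and the `shard_for` function are hypothetical, and MD5 is used only because it is stable across processes (Python's built-in `hash()` of strings is salted per process), not for security.

```python
import hashlib

SHARDS = ["db1", "db2", "db3"]  # hypothetical shard names

def shard_for(customer_id, shards=SHARDS):
    """Map an ID to a shard using a hash that is stable across processes."""
    digest = hashlib.md5(str(customer_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Check the spread over 30,000 sequential IDs.
counts = {name: 0 for name in SHARDS}
for customer_id in range(30_000):
    counts[shard_for(customer_id)] += 1
print(counts)
```

The spread comes out close to even, which is the point of hashing. The trade-off is that changing the number of shards remaps most keys, so resharding means moving data.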

Page 157: Scaling MySQL Strategies for Developers

Sharding

• Which shard has the data?

• Store it in a DB

• Flexible but slower

• Some Function in your App

• Faster, less flexible

• Proxy config file

• Faster, less flexible

• Needs some app coding

Page 158: Scaling MySQL Strategies for Developers

Sharding

• Maintenance

• Backups

• Slaves

• Uptime

• Loosely coupled system

Page 159: Scaling MySQL Strategies for Developers

Sharding

• Functional Partitioning

• Different Apps

• Share some tables

Functional Partitioning

Page 160: Scaling MySQL Strategies for Developers

Sharding

• Reporting

• Go Fish – One server / Shared nothing

• Cross Shard – Many servers

• Hadoop – Aggregate to one reporting server
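A cross-shard report boils down to running the same aggregate on every shard and merging the partial results. The sketch below uses hard-coded, hypothetical per-shard rows in place of real connections; `fetch_counts` stands in for a `SELECT country, COUNT(*) ... GROUP BY country` executed on one shard.

```python
from collections import Counter

# Hypothetical per-shard query results: rows of (country, order_count).
shard_results = {
    "db1": [("UK", 120), ("FR", 40)],
    "db2": [("UK", 80), ("DE", 65)],
    "db3": [("FR", 10), ("DE", 5)],
}

def fetch_counts(shard):
    """Stand-in for running the same GROUP BY query on one shard."""
    return shard_results[shard]

def cross_shard_totals(shards):
    """Run the query on every shard and merge the partial aggregates."""
    totals = Counter()
    for shard in shards:
        for country, count in fetch_counts(shard):
            totals[country] += count
    return dict(totals)

totals = cross_shard_totals(["db1", "db2", "db3"])
print(totals)  # prints "{'UK': 200, 'FR': 50, 'DE': 70}"
```

Merging like this works for sums and counts. Averages and distinct counts cannot simply be added together, which is one reason aggregation is often moved to a dedicated reporting server.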

Page 161: Scaling MySQL Strategies for Developers

The End

Page 162: Scaling MySQL Strategies for Developers
Page 163: Scaling MySQL Strategies for Developers
Page 164: Scaling MySQL Strategies for Developers

The End

• Questions & Answers

• Email: [email protected]

• Don’t forget to rate this tutorial

Page 165: Scaling MySQL Strategies for Developers

If we have time

• MySQL 5.6

• NoSQL

• ORM

• Beyond Hadoop

• Bring-Your-Own-Problems