What causes downtime in MySQL, and how can you prevent it?
Espen Braekken
Webinar, 25th of Jan 2012
www.percona.com
Agenda
● What is High Availability?
● What Causes Downtime in MySQL?
● How to Prevent Downtime
● Resources
Part I: High Availability
High Availability
● Absence of Downtime
● MTBF (mean time between failures)
● MTTR (mean time to recovery)
[Figure: MTTR and MTBF]
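The relationship between the two measures can be sketched with the standard steady-state availability formula, MTBF / (MTBF + MTTR). This is a minimal illustration, not part of the original slides: it assumes MTBF is measured as mean uptime between failures, and the hour figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean uptime between
    failures (MTBF) and mean time to recovery (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers: one failure a month, four hours to recover.
baseline = availability(mtbf_hours=720, mttr_hours=4)
# Two ways to improve: recover twice as fast, or fail half as often.
faster_recovery = availability(mtbf_hours=720, mttr_hours=2)
fewer_failures = availability(mtbf_hours=1440, mttr_hours=4)

print(f"baseline:        {baseline:.4%}")
print(f"faster recovery: {faster_recovery:.4%}")
print(f"fewer failures:  {fewer_failures:.4%}")
```

Under this formula, halving MTTR and doubling MTBF give the same availability, so the two are equally valid levers.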
Reducing MTTR
● Find out quickly (monitoring & alerting)
● Recover quickly (redundancy & failover)
Many people focus on technology; limited, reactive
Increasing MTBF
● Understand failures (research, post-mortems)
● Work to prevent or reduce failures
Boring, hard to justify—but proactive!
Goals of this Presentation
● Why does downtime happen?
  ● Prerequisite to preventing it
● Which failures are most common?
  ● Understand and prioritize risks
● What could have prevented the incidents?
  ● Which preventions are effective?
Proactive
● “-adjective. Serving to prepare for, intervene in, or control an expected occurrence or situation, especially a negative or difficult one; anticipatory: proactive measures against crime.” — dictionary.com
Part II: Understanding Downtime Incidents
Research Background
● Our credentials
  ● We provide emergency services for MySQL users
● Source dataset
  ● About 200 emergency issues; 154 selected
● Identify and categorize
  ● Location, causes, preventions of failure
  ● Rank these three by frequency
What Issues Are Reported?
Where Incidents Occur
Top Ten Incident Types
| “Cause” | Category | Count | Percent |
|---|---|---|---|
| SQL | Performance | 20 | 12.9% |
| Data difference | Replication | 14 | 9.1% |
| DROP TABLE | Data loss/corruption | 9 | 5.8% |
| Disk full | Operating environment | 9 | 5.8% |
| Network | Operating environment | 9 | 5.8% |
| Operating system | Operating environment | 8 | 5.2% |
| Schema/indexing | Performance | 8 | 5.2% |
| InnoDB | Performance | 8 | 5.2% |
| Configuration | Performance | 7 | 4.5% |
| Configuration | Replication | 6 | 3.9% |
For much more detail, see the Resources at the end of this slide deck.
Root Cause Analysis
● Incidents have causes, but not “root” causes
● There is always a chain of failures
● A single intervention is a prevention
#1 Cause of Downtime
● Lack of Change Control
  ● Often upgrade-related, but not always
  ● Configuration changes
  ● Schema/query changes; deployments
● Upgrades
  ● Careless Upgrades
    – Query behavior changes, plan changes, bugs
  ● Failure to upgrade
    – Bugs, bugs, bugs
Part III: Prevention and Proactivity
What Prevents Downtime?
Proactivity
● Proactivity requires routine activity
● It's important to document
● Choose appropriate schedules for activities
Documentation
● Document the how (transcript) & what (result)
Scheduling
● Choose appropriate schedules for activities
  ● One-time tasks
  ● Weekly
  ● Monthly
  ● On-demand, irregular
● Following slides list some main points
  ● For much greater detail, see Resources at end.
One-Time Tasks
● Inspect the server and application
  ● External systems
  ● Storage
  ● Privileges
  ● Basic configuration settings
One-Time Tasks Cont'd
● Monitoring and alerting
  ● Be frugal
  ● Avoid false positives
  ● Monitor for problems, not heuristics
● Metrics and trending
  ● Capture everything
  ● Keep as long as practical
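As one illustration of alerting on an actual problem while avoiding false positives, the sketch below alerts on low free disk space only when several consecutive samples are below a threshold. It is not from the original slides: the function names, the 10% threshold, and the three-sample rule are all assumptions chosen for the example.

```python
import shutil
from collections import deque


def free_fraction(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def should_alert(samples, min_free: float = 0.10, consecutive: int = 3) -> bool:
    """Alert only if the last `consecutive` samples are all below
    `min_free`, so a single transient dip does not page anyone."""
    recent = list(samples)[-consecutive:]
    return len(recent) == consecutive and all(s < min_free for s in recent)


if __name__ == "__main__":
    # In practice the path would be the MySQL data directory and the
    # sample would be taken on a timer; "/" keeps the sketch portable.
    history = deque(maxlen=10)
    history.append(free_fraction("/"))
    if should_alert(history):
        print("ALERT: free disk space has stayed below 10%")
```

Every sample can still be recorded for trending. Only the alert itself is held back until the condition persists.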
Weekly Tasks
● Predict replication lag
● Predict performance problems
  ● Use cheap & fast “black-box” analysis
● Analyze workload performance
  ● Find schema, indexing, data distribution, and query problems
Weekly Tasks Cont'd
● Review new queries
● Review schema changes
● Compare my.cnf to SHOW VARIABLES
● Validate backups
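The my.cnf comparison could be automated along the following lines. This is a minimal sketch rather than a tool from the presentation: `config_drift` and `normalize` are hypothetical helpers, the runtime values are assumed to have been fetched beforehand (for example from `SHOW GLOBAL VARIABLES`) into a plain dictionary, and only the simplest value forms (K/M/G size suffixes, boolean spellings, bare options) are normalized.

```python
import configparser

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def normalize(value):
    """Bring a my.cnf or SHOW VARIABLES value into a comparable form."""
    if value is None:  # bare option such as `skip-name-resolve`
        return "ON"
    text = value.strip().strip("'\"")
    upper = text.upper()
    if upper in ("1", "ON", "TRUE"):
        return "ON"
    if upper in ("0", "OFF", "FALSE"):
        return "OFF"
    if len(text) > 1 and upper[-1] in _UNITS and text[:-1].isdigit():
        return str(int(text[:-1]) * _UNITS[upper[-1]])  # e.g. 1G -> bytes
    return text


def config_drift(my_cnf_text, runtime_vars, section="mysqld"):
    """Return {variable: (configured, running)} for settings that differ."""
    # Drop !include / !includedir directives, which configparser cannot read.
    lines = [ln for ln in my_cnf_text.splitlines()
             if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None,
        inline_comment_prefixes=("#",))
    parser.read_string("\n".join(lines))
    drift = {}
    if not parser.has_section(section):
        return drift
    for key, value in parser.items(section):
        name = key.replace("-", "_")  # my.cnf accepts dashes or underscores
        if name not in runtime_vars:
            continue
        configured, running = normalize(value), normalize(runtime_vars[name])
        if configured != running:
            drift[name] = (configured, running)
    return drift


sample_cnf = """[mysqld]
max-connections = 500
innodb_buffer_pool_size = 1G
skip-name-resolve
"""
sample_runtime = {"max_connections": "151",
                  "innodb_buffer_pool_size": "1073741824",
                  "skip_name_resolve": "ON"}
print(config_drift(sample_cnf, sample_runtime))
```

A non-empty result means the running server no longer matches the file. That usually indicates either a runtime change that was never written back or a file edit that has not yet taken effect.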
Weekly Tasks Cont'd
● Check for corruption
● Review all logs, prove they work
● Verify that alerts get through
● Check if it's time to restart
Monthly Tasks
● Review backup & recovery procedures & policy
● Test restore and recovery
● Review privileges and security
● Review archiving/purging plan
● Clean up accumulated cruft
Irregular & On-Demand Tasks
● Check schema changes
● Review change logs for upgrades
● Restart systems
● Test upgrades
● Apply one-time tasks to new servers
Stephen Covey's Four Quadrants

| | Urgent | Not Urgent |
|---|---|---|
| Important | 1 | 2 |
| Not Important | 3 | 4 |
Urgent; Not Important
The Second Quadrant
● Important; Not Urgent
  ● Test restore & recovery
  ● Look for early warnings
  ● Manage and validate changes
Practice. Develop insight and understanding.
Conclusion
It's not sexy to be proactive, but it works.
Resources - I
● All research and results are available online:
  ● percona.com/about-us/mysql-white-papers/
  ● Causes of Downtime in Production MySQL Servers
  ● Preventing MySQL Emergencies
    – Detailed activity lists and scheduling suggestions
    – Detailed advice on what to monitor
● Good reading:
  ● How Complex Systems Fail (Richard Cook)
  ● What The Dog Saw (Malcolm Gladwell)
Resources - II
● Percona toolkit docs, downloads, PDF manual:
  ● http://www.percona.com/software/percona-toolkit/
● Forum:
  ● http://forum.percona.com/
● Mailing list:
  ● https://groups.google.com/group/percona-discussion/
● Training courses worldwide:
  ● http://www.percona.com/training
Resources - III
● High Performance MySQL
  ● 3rd Edition available in April http://t.co/OWG817iz
● Online MySQL Configuration Wizard
  ● http://tools.percona.com/
● MySQL User's Conference April 10-12
  ● http://www.percona.com/live/
  ● Breakout sessions have recently been announced
Q&A