Monitoring MySQL
Kristian Köhntopp
Wednesday, 28 October 2009
I am...
• Kristian Köhntopp
• Database architecture at a small travel agency in Amsterdam
• In previous lives: MySQL, web.de, NetUSE, MMC Kiel, PHP, PHPLIB, various FAQs and HOWTOs
You are
• Job…
• DBA, Developer, General IT, IT Management
• Using version...
• 3.23, 4.0, 4.1, 5.0, 5.1
• Using MySQL for...
• Webapps, Enterprise, Embedded, ...
• Number of servers...
• 1, <3, <10, <25, <100, 100 or more
Why Monitoring?
• Audience
• Consumers of monitoring data
• Metrics
• What kind of data necessary?
• Toolbox
• Which kind of tool to use?
Why Monitoring?
• Who requires monitoring?
• Operations (Incident)
• Infrastructure Development (Capacity)
• Feature Development (Debug)
• Compliance (SLA, Legal)
Why Monitoring?
• Each kind of monitoring has different
• Purpose
• Metric
• Raw data, Notification latency
• Deliverable
• HA requirements
Incident detection
• Purpose: “Are we still online?”
• Metric: “High level availability test w/ binary outcome”
• Deliverable: Ticket to Helpdesk ➜ Operating ➜ Incident Management
• Latency: Seconds
• HA Requirement: Minutes, high
Capacity planning
• Purpose: “When can I guarantee server overload?” (Negative SLA)
• Metric: detailed records of variables in all subsystems
• Deliverable: weekly/monthly report to IT Management, general Management
• Latency: days
• HA Requirement: days/low
Debugging
• Purpose: “Which query crashes the server? Why is this statement slow?”
• Metric: detailed records of variables while processing a single query
• Deliverable: individual report to single developer
• Latency: Seconds, Minutes
• HA Requirement: none
Compliance
• Purpose: “Are we fulfilling our contracts?”
• Metric: “high level availability tests w/ binary outcome”, query times
• Deliverable: weekly/monthly report to IT management/customer
• Latency: Days
• HA Requirement: days, low
For Audit
• Purpose:
• Detect tampering, alteration and access, create accountability records for changes and access
• Metric: high level event records with application semantics
• additionally: inescapable, unforgeable
• Deliverable: daily/weekly report
• HA Requirement: Out of path
Metrics
• Data sources at OS level
• Data sources in MySQL
• Derived data sources
OS Level
• Internal Availability Check:
• presence of PID file
• presence of process
• test -f linux.pid is not good enough
• “kill -0 $(cat linux.pid)” is better than “ps axuwww | grep mysql[d]”
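The "kill -0" idea can also be written in Python. This is a minimal sketch, assuming a POSIX system; the function names are made up for illustration. Signal 0 delivers nothing, it only asks the kernel whether the PID exists, which is why it beats a bare file test: a stale PID file survives a crashed server.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process: the PID file is stale
    except PermissionError:
        return True    # process exists, but belongs to another user
    return True


def mysqld_alive(pid_file: str) -> bool:
    """Read the PID from pid_file and check that the process is really there."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False   # missing, unreadable or garbage PID file
    return pid_is_alive(pid)
```

Note that this only proves that some process owns the PID, not that it is mysqld or that it answers queries; the external checks cover that part.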
OS Level
• External Availability Check:
• ping check
• trivial query
• Set timeouts for the trivial query according to SLA
OS Level
• Memory Checks:
• process size in memory
• VSIZE vs. RSS
• buffer cache size (“free -m”)
• swap check!
• vm.swappiness = 0
OS Level
• “iostat -x 1 3” output
• In general, databases are limited by seek/sec, not MB/sec
• SSD, FusionIO
• Network I/O quality
• Smokeping? (Cluster!)
MySQL
• General Counters and Config:
• SHOW /*!50000 GLOBAL */ STATUS;
• SHOW /*!50000 GLOBAL */ VARIABLES;
• What is running?
• SHOW FULL PROCESSLIST;
MySQL
• File Handles:
• SHOW TABLE STATUS;
• SHOW OPEN TABLES;
MySQL
• Replication:
• SHOW SLAVE STATUS;
• SHOW MASTER LOGS;
• SHOW MASTER STATUS;
MySQL
• InnoDB:
• SHOW ENGINE INNODB STATUS;
• SHOW GLOBAL STATUS LIKE 'inno%';
Status: General
• qps: questions/uptime
• COM_% Counters:
Status: General
• COM_% Counters
• Reads/Writes: ( select + qcache_hits ) / ( insert + update + delete + replace )
• Transactions: #commit, rollback/commit, writes/commit
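As a sketch of how these figures fall out of one SHOW GLOBAL STATUS snapshot: the function below assumes the rows have already been loaded into a dict keyed by variable name (how they get there is up to your client library), and the function name and result keys are invented for illustration.

```python
def general_ratios(status: dict) -> dict:
    """Derive qps, read/write and transaction ratios from one status snapshot."""
    # The server reports values as strings; normalise keys and convert once.
    s = {name.lower(): float(value) for name, value in status.items()}
    reads = s["com_select"] + s["qcache_hits"]
    writes = s["com_insert"] + s["com_update"] + s["com_delete"] + s["com_replace"]
    commits = s["com_commit"]
    return {
        "qps": s["questions"] / s["uptime"],
        "reads_per_write": reads / writes if writes else None,
        "rollbacks_per_commit": s["com_rollback"] / commits if commits else None,
        "writes_per_commit": writes / commits if commits else None,
    }


sample = {
    "Questions": "360000", "Uptime": "3600",
    "Com_select": "9000", "Qcache_hits": "1000",
    "Com_insert": "1000", "Com_update": "500",
    "Com_delete": "300", "Com_replace": "200",
    "Com_commit": "400", "Com_rollback": "4",
}
ratios = general_ratios(sample)
```

These are averages since server start. For a graph you would normally take the difference between two snapshots instead.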
Status: Caches
• Table Cache
• (opened_tables/sec )
• (table_cache – open_tables)
• Thread Cache
• (threads_created/sec)
• (thread_cache_size – threads_cached)
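The "/sec" figures need two samples of the counters, the headroom figures need one sample plus the configuration. A minimal sketch, assuming both SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES have been loaded into dicts of numbers; the function names are hypothetical, and the table cache variable is assumed to be called table_cache (it is table_open_cache from 5.1.3 on).

```python
def per_second(prev: dict, curr: dict, counter: str) -> float:
    """Average rate of a status counter between two samples of the same server."""
    elapsed = curr["Uptime"] - prev["Uptime"]
    if elapsed <= 0:
        raise ValueError("second sample must be newer (was the server restarted?)")
    return (curr[counter] - prev[counter]) / elapsed


def cache_headroom(status: dict, variables: dict) -> dict:
    """Unused slots in the table cache and in the thread cache."""
    return {
        "table_cache_free": variables["table_cache"] - status["Open_tables"],
        "thread_cache_free": variables["thread_cache_size"] - status["Threads_cached"],
    }


prev = {"Uptime": 1000, "Opened_tables": 500, "Threads_created": 40}
curr = {"Uptime": 1060, "Opened_tables": 530, "Threads_created": 46,
        "Open_tables": 60, "Threads_cached": 3}
opened_rate = per_second(prev, curr, "Opened_tables")
headroom = cache_headroom(curr, {"table_cache": 64, "thread_cache_size": 8})
```

A steadily rising opened_tables rate together with zero headroom suggests the cache is too small.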
Status: Caches
• Query-Cache Hit Ratio:
• qcache_hits*100 / ( qcache_hits + com_select )
• Hits vs. Inserts vs. Not Cached
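The hit ratio formula as a small function. A sketch only, assuming a dict of numeric status values; the function name is made up. Com_select counts only the selects that were actually executed, so the two counters add up to all selects seen.

```python
def query_cache_hit_pct(status: dict) -> float:
    """Percentage of SELECTs answered from the query cache."""
    hits = status["Qcache_hits"]
    executed = status["Com_select"]   # selects that missed or bypassed the cache
    total = hits + executed
    return hits * 100 / total if total else 0.0


hit_pct = query_cache_hit_pct({"Qcache_hits": 3000, "Com_select": 1000})
```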
Status: Caches
• Lowmem Prunes:
• qcache_lowmem_prunes / uptime
• qcache_lowmem_prunes per second
Status: Caches
• Increase query cache size:
• fewer prunes, higher hit ratio
• Sometimes it is better to delay writes or to split tables instead
Status: Connections
• Connections
• max_connections – max_used_connections
• max_connections - threads_connected
Status: MyISAM
• Key Cache Hit Ratio:
• key_read_requests / key_reads
• 300-1000 target
• 99.7% or better hit ratio
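The two targets are the same statement in different units: 300 requests per physical read means a miss rate of 1/300, roughly 0.33%, so the hit ratio is about 99.7%. A sketch of the conversion, with a hypothetical function name and a dict of numeric status values assumed:

```python
def key_cache_stats(status: dict) -> dict:
    """MyISAM key cache: requests per disk read, and the same as a hit percentage."""
    requests = status["Key_read_requests"]
    reads = status["Key_reads"]          # requests that had to go to disk
    if requests == 0:
        return {"requests_per_read": None, "hit_pct": None}
    return {
        "requests_per_read": requests / reads if reads else None,
        "hit_pct": 100.0 * (1 - reads / requests),
    }


stats = key_cache_stats({"Key_read_requests": 300000, "Key_reads": 1000})
```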
Status: MyISAM
• MyISAM Lock Contention
• table_locks_waited * 100 / table_locks_immediate
• <1% good, 1% warning, >3% you are currently dying
• distinctly nonlinear behavior
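The thresholds translate into a simple classifier. This is a sketch: the function name and the labels are invented, and treating everything between 1% and 3% as "warning" is an assumption, since the slide only names the endpoints.

```python
def lock_contention(status: dict) -> tuple:
    """Return (percentage, label) for MyISAM table lock contention."""
    immediate = status["Table_locks_immediate"]
    waited = status["Table_locks_waited"]
    pct = waited * 100 / immediate if immediate else 0.0
    if pct < 1:
        label = "good"
    elif pct <= 3:
        label = "warning"    # assumption: the whole 1-3% band counts as warning
    else:
        label = "critical"
    return pct, label


state = lock_contention({"Table_locks_waited": 20, "Table_locks_immediate": 1000})
```

Because the behavior is nonlinear, it is worth alerting on the warning band early instead of waiting for the critical one.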
Status: InnoDB
• Page Cache Usage:
• Innodb_buffer_pool_pages_free *100 / Innodb_buffer_pool_pages_total
• Cache Miss Ratio:
• (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)*100
• target: <3%, <1%
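Both InnoDB figures from one snapshot, as a sketch under the same assumption as before (a dict of numeric status values; the function name is hypothetical):

```python
def buffer_pool_stats(status: dict) -> dict:
    """InnoDB buffer pool: share of free pages and cache miss percentage."""
    total = status["Innodb_buffer_pool_pages_total"]
    requests = status["Innodb_buffer_pool_read_requests"]
    return {
        "free_pct": status["Innodb_buffer_pool_pages_free"] * 100 / total if total else None,
        # Innodb_buffer_pool_reads counts requests that had to read from disk.
        "miss_pct": status["Innodb_buffer_pool_reads"] * 100 / requests if requests else None,
    }


pool = buffer_pool_stats({
    "Innodb_buffer_pool_pages_free": 1000,
    "Innodb_buffer_pool_pages_total": 8000,
    "Innodb_buffer_pool_reads": 2000,
    "Innodb_buffer_pool_read_requests": 1000000,
})
```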
Status: InnoDB
• Cache Monitoring: Innodb_buffer_pool_wait_free must not count up!
• Log Monitoring: Innodb_log_waits must not count up!
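"Must not count up" is a check on the difference between two samples, not on the absolute value. A minimal sketch, with hypothetical names and numeric status dicts assumed:

```python
MUST_NOT_INCREASE = ("Innodb_buffer_pool_wait_free", "Innodb_log_waits")


def counters_that_moved(prev: dict, curr: dict) -> dict:
    """Return the increase of every watched counter that went up between samples."""
    return {
        name: curr[name] - prev[name]
        for name in MUST_NOT_INCREASE
        if curr[name] > prev[name]
    }


moved = counters_that_moved(
    {"Innodb_buffer_pool_wait_free": 0, "Innodb_log_waits": 7},
    {"Innodb_buffer_pool_wait_free": 0, "Innodb_log_waits": 9},
)
```

An empty result means all is well; anything else is worth an alert.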
Status: InnoDB
• InnoDB has many more stats
• See Innotop output, read up on theory
• Worth a talk of its own
Status: temp tables
• Temp tables per second:
• created_tmp_tables / sec
• Temp tables to Disk:
• created_tmp_disk_tables * 100 / created_tmp_tables
Status: temp tables
• Additional hints:
• What kind of filesystem is tmpdir pointing to?
• Are we selecting BLOB/TEXT types?
• tmp_table_size and max_heap_table_size must match
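The two settings must match because the server uses the smaller of them as the limit for in-memory temporary tables, so raising only one has no effect. A sketch that reports the disk percentage and the effective limit; the function name is invented, and numeric dicts for status and variables are assumed.

```python
def tmp_table_report(status: dict, variables: dict) -> dict:
    """Share of temp tables that went to disk, plus the effective in-memory limit."""
    created = status["Created_tmp_tables"]
    on_disk = status["Created_tmp_disk_tables"]
    return {
        "disk_pct": on_disk * 100 / created if created else 0.0,
        # The smaller of the two settings is the one that actually applies.
        "effective_limit": min(variables["tmp_table_size"],
                               variables["max_heap_table_size"]),
        "settings_match": variables["tmp_table_size"] == variables["max_heap_table_size"],
    }


report = tmp_table_report(
    {"Created_tmp_tables": 2000, "Created_tmp_disk_tables": 500},
    {"tmp_table_size": 64 * 1024 * 1024, "max_heap_table_size": 16 * 1024 * 1024},
)
```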
Status: Replication
• Functionality:
• Slave_IO_Running: Yes, Slave_SQL_Running: Yes
• Lag:
• Seconds_Behind_Master
• Rate:
• Read_Master_Log_Pos/sec
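A sketch of the functionality and lag checks, assuming one row of SHOW SLAVE STATUS is available as a dict. The function name, the labels and the 60 second default are made up; pick a lag threshold that fits your application. Seconds_Behind_Master is NULL when the SQL thread is not running, so that case is treated as broken as well.

```python
def replication_state(row: dict, max_lag: int = 60) -> str:
    """Classify one SHOW SLAVE STATUS row as 'ok', 'lagging' or 'broken'."""
    if row.get("Slave_IO_Running") != "Yes" or row.get("Slave_SQL_Running") != "Yes":
        return "broken"
    lag = row.get("Seconds_Behind_Master")
    if lag is None:              # NULL: the lag is unknown
        return "broken"
    return "lagging" if int(lag) > max_lag else "ok"


healthy = replication_state({
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 3,
})
```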
Status: Slow Queries
• Slow Queries in general:
• Slow_queries/sec
• Counting evil queries:
• select_full_join / sec
• select_full_join / com_select
Toolbox: Incidents
• Incident detection:
• Nagios family
• Load Balancer Live Check
• Post Mortem:
• A small shell script running per minute
Toolbox: Incidents
• Nagios Plugins Quality:
• Bad checks
• Scripts
• Not compliant w/ standards
• Incident monitors vs. Compliance monitors
Toolbox: Post Mortem
• Record per minute, keep one week
• logged to /var/log/mysql_pl
• uptime, ps auxwww, df -Th, free -m
• show full processlist; show slave status;
• show engine innodb status
• if HAVE_INNODB == YES
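The original is a small shell script; the following is a rough Python equivalent, not the script from the talk. It assumes the mysql command line client can log in without a prompt (for example through ~/.my.cnf), and the function names and the log file naming are hypothetical. Run it from cron once per minute.

```python
import subprocess
import time
from pathlib import Path


def snapshot_commands(have_innodb: bool) -> list:
    """Commands whose output is recorded for each snapshot."""
    commands = [
        ["uptime"],
        ["ps", "auxwww"],
        ["df", "-Th"],
        ["free", "-m"],
        ["mysql", "-e", "SHOW FULL PROCESSLIST; SHOW SLAVE STATUS\\G"],
    ]
    if have_innodb:   # only ask for InnoDB status if the engine is available
        commands.append(["mysql", "-e", "SHOW ENGINE INNODB STATUS\\G"])
    return commands


def write_snapshot(log_dir: str, commands: list) -> Path:
    """Append the output of all commands to a per-minute log file."""
    out = Path(log_dir) / time.strftime("%Y%m%d-%H%M.log")
    with out.open("a") as fh:
        for cmd in commands:
            fh.write("### " + " ".join(cmd) + "\n")
            try:
                result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
                fh.write(result.stdout + result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                fh.write(f"failed: {exc}\n")
    return out


def prune_old(log_dir: str, keep_days: int = 7) -> None:
    """Delete snapshots older than keep_days (one week by default)."""
    cutoff = time.time() - keep_days * 86400
    for path in Path(log_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
```

A failing command is logged and skipped on purpose: during an incident, partial data is better than none.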
Toolbox: Capacity
• MySQL Enterprise Monitor
• Cacti or Munin
• Actual overload tests
• increase LB weights
• monitor latency of standardized probe to detect breakage
MEM
Tools: Cacti
• Shiny, but creating templates is a pain!
• Ready-made templates from
• http://code.google.com/p/mysql-cacti-templates/
Tools: Capacity
• Free MySQL SNMP tools are rare
• Exporting Variables, Status and Slave Status to SNMP:
• Perl Coprocess (PoC at best)
• http://mysqldump.azundris.com/archives/63-guid.html
Toolbox: Console
• innotop (mytop is dead)
• maatkit (indispensable)
• tuning-primer.sh
• http://www.day32.com/MySQL/
• Self-written tools
• Establish a culture of tool creation
Toolbox: Debug
• MEM w/ proxy
• Proxy sometimes problematic
• Alternative: Rig DB access class
• mk-query-digest
• w/ SPAN port at switch
Toolbox: Audit
• Trailing controls
• Agile development
• Dump comparison
• mysqldump --no-data & git & diff
• mk-show-grants & git & diff
• etc.
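The dump comparison is normally left to git. To show the idea without a repository, this sketch uses Python's difflib as a stand-in and compares two schema dumps held as strings; the function name and the file labels are made up.

```python
import difflib


def schema_changes(old_dump: str, new_dump: str) -> list:
    """Return the added and removed lines between two schema dumps."""
    diff = list(difflib.unified_diff(
        old_dump.splitlines(), new_dump.splitlines(),
        fromfile="yesterday.sql", tofile="today.sql", lineterm="",
    ))
    # The first two lines are the file headers; after that keep only +/- lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


old = "CREATE TABLE t (\n  id INT\n);"
new = "CREATE TABLE t (\n  id INT,\n  name VARCHAR(32)\n);"
changes = schema_changes(old, new)
```

Any non-empty result is a change somebody should be able to account for, which is the point of a trailing control.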
Toolbox: Audit
• We have very many servers
• Critical data on isolated servers
• Limited access
• Wonders of an unlimited license