monitoring mysql with prometheus and grafana

of 107 /107
Monitoring MySQL with Prometheus & Grafana Julien Pivotto (@roidelapluie) Open Source Monitoring Conference November 22nd, 2017

Author: julien-pivotto

Post on 28-Jan-2018




19 download

Embed Size (px)


  1. 1. Monitoring MySQL with Prometheus & Grafana Julien Pivotto (@roidelapluie) Open Source Monitoring Conference November 22nd, 2017
  2. 2. SELECT USER(); Julien "roidelapluie" Pivotto @roidelapluie Sysadmin at inuits Automation, monitoring, HA MySQL/MariaDB user/admin/contributor Grafana and Prometheus user/contributor
  3. 3. inuits
  4. 4. Monitoring / Observability
  5. 5. Monitoring or Observability? Collecting lots of data Not only for alerting, also for understanding You don't know what you will need
  6. 6. Who decides what to monitor? Pre-Define checks Check only what os needed OR Let the system you monitor decide what to expose Alert based on that data
  7. 7. Push or pull ? Push: need to know and configure where your server is What if 2 3 10 servers? What in dev? Pull: Let any server take the metrics Use service discovery to know what to fetch
  8. 8. High availability Monitoring is very important Metamonitoring What is the price to get a highly available monitoring system?
  9. 9. Prometheus
  10. 10. Prometheus Prometheus is a Cloud-Native Data-Centric Open- Source Performant Simple metrics collection, analysis and alerting tool. Nothing more.
  11. 11. Cloud Native Very easy to configure, deploy, maintain Designed in multiple services Container ready Orchestration ready (dynamic config)
  12. 12. Open Source Apache 2.0 CNCF Go Support for multiple OS Many "exporters": lt-port-allocations
  13. 13. Performance Prometheus is designed to fetch data in an interval measured in SECONDS Designed to handle lots of metrics New storage engine in 2.0 to scale even better and support usecases like kubernetes
  14. 14. Data Centric A Metric in Prometheus has metadata: myql_global_status_handlers_total{handler="tmp_write"} 1122 And lots of function to filter, change, remove... those metadata while fetching them.
  15. 15. Metrics type Counters (always go up) Gauge (go up and down) Histograms (aggregate by buckets) Summary (percentiles) (most of the time useless)
  16. 16. A word about Prometheus vs Graphite Prometheus does not see a metric as an "event". Metrics are current value until they are replaced. You can not see when a metric has been included in Prometheus. For Events, Prometheus refers to Elasticsearch.
  17. 17. How? Creative Commons Attribution 2.0[email protected]/6824648824
  18. 18. Warning : this is complicated.
  19. 19. Prometheus uses HTTP and PLAIN TEXT. http(s) is supported basic auth / token also (exposing https is not Prometheus job)
  20. 20. $ curl # HELP prometheus_notifications_queue_length # The number of alert notifications in the queue. # TYPE prometheus_notifications_queue_length gauge prometheus_notifications_queue_length 0 # HELP prometheus_notifications_sent_total # Total number of alerts successfully sent. # TYPE prometheus_notifications_sent_total counter prometheus_notifications_sent_total{alertmanager=" :9093"} 4.796464e+06
  21. 21. $ curl prometheus_notifications_queue_length 0 prometheus_notifications_sent_total{alertmanager=" :9093"} 4.796464e+06
  22. 22. Anything that can embed a http server can serve prometheus metrics.
  23. 23. Native Prometheus integrations Kubernetes Ceph etcd telegraph Grafana mgmt ...
  24. 24. What when you can't? Linux (not counting TUX web server) MySQL Third party software Third party services
  25. 25. Exporters Creative Commons Attribution 2.0
  26. 26. Exporters Exporters expose metrics with an HTTP API They connect to the real target Bindings available for many languages Exporters do not save data ; they are not "proxies" and don't "cache" anything
  27. 27. Official exporters Consul, Memcached, MySQL, Node/System, HAProxy, AWS Cloudwatch, Collectd, Graphite, Statsd, JMX, influxdb, SNMP, Blackbox
  28. 28. Node exporter Linux Metrics Including: CPU, Memory, Disk space, networking, load, time...
  29. 29. Textfile As part of the node_exporter, you can write metrics in file Scripts output, etc ...
  30. 30. Blackbox exporter HTTP, DNS, TCP sockets, ICMP... target= probe_ssl_earliest_cert_expiry 1.52697083e+09
  31. 31. MySQL Creative Commons Attribution 2.0
  32. 32. Frequency Some queries are expensive You can decide to fetch some data every 1m, others every 10s...
  33. 33. Wait .. Exporters don't cache?? How do I fetch some data every 10s and others every 1m?
  34. 34. scrape_configs: job_name:'mysqlglobalstatus' scrape_interval:10s static_configs: targets: '' params: collect[]: global_status job_name:'mysqlperformance' scrape_interval:1m static_configs: targets: '' params: collect[]: perf_schema.tableiowaits perf_schema.indexiowaits perf_schema.tablelocks
  35. 35. MySQL Replication MySQL Master MySQL Master MySQL Master -> MySQL Slave MySQL Master -> MySQL Slave -> MySQL Slave MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves MySQL Master -> MySQL Slaves
  36. 36. pt-heartbeat pt-heartbeart is a daemon that updates an entry with current timestamp on a mysql server every second. On the replica, you can check the timestamp and do NOWtimestampto get the real lag. +++ |ts|server_id| +++ |20170817T16:55:01.001030|1| +++
  37. 37. pt-heartbeat GPL Perl Part of percona toolkit
  38. 38. wait, mysql has that natively mysql>SHOWSLAVESTATUSG ... Seconds_Behind_Master:0 ... aka mysqld_exporter metric: mysql_slave_lag_seconds
  39. 39. Bugs Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.
  40. 40. How it works Checks the heartbeat table (SQL query). It's not calling the ptheartbeatcli. So it is independant from it.
  41. 41. CLI flags collect.heartbeat collect.heartbeat.database collect.heartbeat.table
  42. 42. Metrics mysql_heartbeat_stored_timestamp_seconds{server_id="1"} mysql_heartbeat_now_timestamp_seconds{server_id="1"}
  43. 52. PromQL deriv(mysql_global_status_connections[5m])
  44. 53. PromQL mysql_up==0
  45. 54. PromQL sum(avg_over_time( mysql_info_schema_threads[10m]))by (instance)sum( avg_over_time(mysql_info_schema_thread s[10m]offset1d))by(instance)
  46. 55. PromQL {__name__=~".+innodb.+cache.*"} predict_linear(mysql_heartbeat_lag_seconds[5m],60*2) sum(rate(mysql_global_status_commands_total{command=~" (commit|rollback)"}[5m]))
  47. 56. Recording and alerting rules
  48. 57. Rules Prometheus can record extra metrics Make expensive calculations Alert if value is present
  49. 58. groups: name:mysql rules: record:mysql_heartbeat_lag_second expr:| mysql_heartbeat_now_timestamp_seconds mysql_heartbeat_stored_timestamp_seconds alert:MysqlReplicationNotRunning expr:| mysql_slave_status_slave_io_running==0OR mysql_slave_status_slave_sql_running==0 for:2m annotations: summary:"Slavereplicationisnotrunning" alert:MySQLReplicationLag expr:| (mysql_slave_lag_seconds>30) ANDon(instance) (predict_linear( mysql_slave_lag_seconds[5m],60*2)>0) for:5m labels: priority:immediate
  50. 59. Alertmanager When prometheus has an alert, it sends it. Every minute by default, as long as the alert is ongoing Alertmanager, a separated daemon, do the rest of the work
  51. 60. What is alertmanager doing? Grouping Inhibition Silence Notify
  52. 61. grouping alerts 5 nodes are down. Do you want 5 email? group_by:['alertname','cluster','service']
  53. 62. inhibition Datacenter is on fire. Do you want to know that switches, hosts, services are down? source_match: severity:'critical' target_match: severity:'warning'
  54. 63. Silence I upgrade kernel and reboot the service. Do I want mails during that time?
  55. 64. Notifications Alertmanager can notify you with: Mails Webhooks 3rd party services SMS (sachet: 3rd party integration using webhook)
  56. 65. Alerting routes You can send alerts to defined people based on routes Everything to logs mailbox Critical alerts Network to net team SMS Svc to app team SMS Warning alerts Network to net team mail Svc to app team mail
  57. 66. High Availability: Prometheus Have different prometheis servers with the same config They do not talk to each other All of them fetch the same data They monitor each other
  58. 67. High availability: Alertmanager Have multiple alertmanagers with the same config Prometheis send alerts to all alertmanagers Alert manager talk to each other not to send the same notification
  59. 68. One tool does one job... Prometheus will collect data Alertmanager will send notifications Exporters will expose data Grafana will graph data
  60. 69. Grafana Open Source (Apache 2.0) Web app Specialized in visualization Pluggable Multiple datasources: prometheus, graphite, influxdb... Has an API!
  61. 70. History of Grafana Grafana is a fork of Kibana 3 ; used to be JS- Driven. Now fully featured, requires a database, multi- projects/users support, etc...
  62. 71. Grafana and Prometheus Prometheus shipped its own consoles Now it recommends Grafana and deprecated its own consoles
  63. 72. Grafana Dashboards
  64. 73. Grafana Dashboards
  65. 74. Time Picker
  66. 75. Configure Prometheus in Grafana
  67. 76. Configure Prometheus in Grafana
  68. 77. Prometheus Dashboard
  69. 78. Multiple Prometheus instances You can add multiple prometheus instances on grafana You can add dropdown on the top to select which one you want to use Use case: prometheus HA, local prometheus (with access mode=direct)
  70. 79. Creating Grafana Dashboards Takes time Requires deep knowledge of the tools Improved over time Easy to share (json + online library)
  71. 80. Percona Grafana Dashboard Percona Open Sourced Grafana Dashboards Covering MySQL, Mongo and Linux monitoring Part of a bigger picture, PMM, but usable standalone Open Source (AGPL!) dashboards
  72. 81. Installing Percona Graphes Method 1 Enable File dashboards in Grafana Clone grafana-dashboards to the configured location (or make a package) Method 2 Use the Grafana API to upload the JSON's.
  73. 82. MySQL Setup You'll need mysqld_exporter, with a user MySQL 5.1+ Performance Schema for full set of metrics mysqld_exporter -collect.binlog_size=true -collect.info_schema.processlist=true`
  74. 83. node_exporter setup node_exporter -collectors.enabled= "diskstats,filefd,filesystem,loadavg,meminfo,n etdev,stat,time,uname,vmstat"
  75. 84. Prometheus (static file) scrape_configs: job_name:prometheus static_configs: targets:['localhost:9090'] labels: instance:prometheus job_name:linux static_configs: targets:[''] labels: instance:db1 job_name:mysql static_configs: targets:[''] labels: instance:db1
  76. 85. Dashboards
  77. 86. Dashboards
  78. 87. Dashboards
  79. 88. We don't need all of them? Because Grafana is just viz, you can import only the one you want (e.g. exclude Mongo) You can import later any extra dashboard you need
  80. 89. MySQL Overview
  81. 90. MySQL Overview
  82. 91. MySQL Overview
  83. 92. InnoDB
  84. 93. InnoDB
  85. 94. InnoDB
  86. 95. Replication
  87. 96. Other grafana features Multi tenant Deployments Folders for dashboards (comming soon)
  88. 97. Jsonnet library to create dashboard Brand new, still WIP
  89. 98. Conclusions Prometheus and Grafana are first-class monitoring tools Totally different approach than other tools Embeddable into your apps Percona Dashboards gets your graphes ready in no-time with minimal efforts
  90. 99. Julien Pivotto roidelapluie [email protected] Inuits [email protected] Contact