monitoring mysql replication lag with prometheus & pt-heartbeat
Monitoring MySQL Replication Delay with mysqld_exporter & pt-heartbeat
Julien Pivotto (@roidelapluie)
PromCon Munich
August 18, 2017
SELECT USER();
Julien "roidelapluie" Pivotto
@roidelapluie
Sysadmin at inuits
Automation, monitoring, HA
MySQL/MariaDB user/admin/contributor
Grafana and Prometheus user/contributor
inuits
MySQL Replication
MySQL Master <-> MySQL Master
MySQL Master -> MySQL Slave
MySQL Master -> MySQL Slave -> MySQL Slave
MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves
MySQL Master -> MySQL Slaves
mysqld_exporter
mysqld_exporter is great
Lots of data
Lots of alert examples
Percona's Grafana dashboards collection brings dozens of useful dashboards
Migrating to Prometheus does not mean that we should forget the past ... or lower our monitoring expectations.
pt-heartbeat
pt-heartbeat is a daemon that updates an entry with the current timestamp on a MySQL server every second.
On the replica, you can read that timestamp and compute NOW() - timestamp to get the real lag.
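+----------------------------+-----------+
| ts                         | server_id |
+----------------------------+-----------+
| 2017-08-17T16:55:01.001030 |         1 |
+----------------------------+-----------+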
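The replica-side check described above can be sketched in a few lines. This is only an illustration, not pt-heartbeat's own code: the function name `heartbeat_lag_seconds` is hypothetical, the `ts` value is assumed to be formatted as `YYYY-MM-DDTHH:MM:SS.ffffff`, and both servers are assumed to have synchronized clocks.

```python
from datetime import datetime


def heartbeat_lag_seconds(ts: str, now: datetime) -> float:
    """Return NOW() - ts in seconds for a heartbeat timestamp string.

    Hypothetical helper: assumes ts looks like 2017-08-17T16:55:01.001030.
    """
    stored = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
    return (now - stored).total_seconds()


# Illustrative values: the replica reads a heartbeat written 2.5 seconds ago.
now = datetime(2017, 8, 17, 16, 55, 3, 501030)
lag = heartbeat_lag_seconds("2017-08-17T16:55:01.001030", now)
print(f"replication lag: {lag:.3f}s")
```

Because the heartbeat row is written on the master and read on the replica, the difference reflects how far the replica's applied data is behind, provided the two clocks agree.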
pt-heartbeat
GPL
Perl
Part of Percona Toolkit
pt-heartbeat
Our previous monitoring tool (Munin) had support for pt-heartbeat. Prometheus mysqld_exporter didn't.
wait, MySQL has that natively
mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 0
...
aka mysqld_exporter metric:
mysql_slave_lag_seconds
Bugs
Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.
pt-heartbeat is useful
Okay, so we had that thing. Now that we are moving to Prometheus, we don't want to lose it.
:idea_emoji: let's implement this!
Pull Request 183
https://github.com/prometheus/mysqld_exporter/pull/183
Opened Feb 20
Merged Feb 21
How it works
The collector checks the heartbeat table with an SQL query. It does not call the pt-heartbeat CLI, so it is independent of it.
CLI flags
collect.heartbeat
collect.heartbeat.database
collect.heartbeat.table
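To make the role of the database and table flags concrete, here is a sketch of how such a query could be assembled. It is an approximation written for this transcript, not the exporter's source: the helper `heartbeat_query` is hypothetical, and the default names `heartbeat` / `heartbeat` are assumptions.

```python
def heartbeat_query(database: str = "heartbeat", table: str = "heartbeat") -> str:
    """Build SQL that reads the stored and current timestamps per server_id.

    Hypothetical helper; the defaults are assumed, not taken from the slides.
    """
    for name in (database, table):
        # Identifiers cannot be bound as query parameters, so reject
        # anything that could break out of the backtick quoting.
        if "`" in name:
            raise ValueError(f"invalid identifier: {name!r}")
    return (
        "SELECT UNIX_TIMESTAMP(ts), UNIX_TIMESTAMP(NOW(6)), server_id "
        f"FROM `{database}`.`{table}`"
    )


query = heartbeat_query("heartbeat", "heartbeat")
print(query)
```

Reading both timestamps in the same statement means the "now" value comes from the MySQL server itself rather than from the host running the exporter.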
Metrics
mysql_heartbeat_stored_timestamp_seconds{server_id="1"}
mysql_heartbeat_now_timestamp_seconds{server_id="1"}
Recording Lag
mysql_heartbeat_lag_seconds = mysql_heartbeat_now_timestamp_seconds - mysql_heartbeat_stored_timestamp_seconds
https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
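The subtraction performed by the recording rule can be reproduced outside Prometheus for a quick sanity check. The sketch below parses two exporter lines and computes the lag per `server_id`. The function `heartbeat_lag`, the simplified regular expression, and the sample values are all made up for illustration; real exporter output may carry more labels.

```python
import re

# Simplified matcher: a metric name, a single server_id label, and a value.
SAMPLE = re.compile(r'^(\w+)\{server_id="(\d+)"\}\s+(\S+)$')


def heartbeat_lag(exposition: str) -> dict:
    """Return {server_id: now - stored} from exporter text output."""
    values = {}
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if match:
            name, server_id, value = match.groups()
            values[(name, server_id)] = float(value)
    lag = {}
    for (name, server_id), now in values.items():
        if name != "mysql_heartbeat_now_timestamp_seconds":
            continue
        stored = values.get(("mysql_heartbeat_stored_timestamp_seconds", server_id))
        if stored is not None:
            lag[server_id] = now - stored
    return lag


# Hypothetical scrape output.
scrape = """
mysql_heartbeat_stored_timestamp_seconds{server_id="1"} 1502988901.0
mysql_heartbeat_now_timestamp_seconds{server_id="1"} 1502988903.5
"""
lags = heartbeat_lag(scrape)
print(lags)
```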
Alert
ALERT MySQLReplicationLag
  IF (mysql_heartbeat_lag_seconds > 30) AND on (instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2) > 0)
  FOR 1m
  LABELS {
    severity = "critical"
  }
  ANNOTATIONS {
    summary = "MySQL slave replication is lagging",
    description = "The mysql slave replication has fallen behind and is not recovering",
  }
https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
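The alert expression combines two conditions: the lag is above 30 seconds, and a linear extrapolation of the last five minutes says it will still be above zero two minutes from now. The sketch below imitates that logic with an ordinary least-squares fit. It is an approximation for intuition only: the function names and sample data are hypothetical, and the `FOR 1m` clause and the `on (instance)` label matching are not modeled.

```python
def predict_linear(samples, eval_time, horizon):
    """Fit a least-squares line through (timestamp, value) samples and
    extrapolate it `horizon` seconds past `eval_time`."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope * (eval_time + horizon) + intercept


def lag_alert_condition(samples, eval_time):
    """True when the latest lag exceeds 30s and the fitted trend does not
    reach zero within the next two minutes."""
    current = samples[-1][1]
    return current > 30 and predict_linear(samples, eval_time, 60 * 2) > 0


# Hypothetical lag samples, one every 15 seconds over a 5-minute window.
growing = [(t, 40 + 0.5 * t) for t in range(0, 301, 15)]
recovering = [(t, 340 - t) for t in range(0, 301, 15)]

print(lag_alert_condition(growing, 300))
print(lag_alert_condition(recovering, 300))
```

In the second series the lag is still above 30 seconds at evaluation time, but the trend predicts it will be gone within two minutes, so the condition does not hold. That is the point of the `predict_linear` term: a replica that is catching up does not page anyone.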
Contributing to Percona Grafana Dashboards
less great
PR opened Feb 23
Still open
Takeaways
contributing to Prometheus is easy
pt-heartbeat is the way to monitor MySQL replication lag
and now it's available in prometheus
any volunteers to rewrite pt-heartbeat in go? :)