Monitoring MySQL Replication Lag with Prometheus & pt-heartbeat

Monitoring MySQL Replication Delay with mysqld_exporter & pt-heartbeat. Julien Pivotto (@roidelapluie), PromConf Munich, August 18, 2017

Upload: julien-pivotto

Post on 28-Jan-2018

TRANSCRIPT

Page 1: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Monitoring MySQL Replication Delay with mysqld_exporter & pt-heartbeat

Julien Pivotto (@roidelapluie)

PromConf Munich

August 18, 2017

Page 2: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

SELECT USER();

Julien "roidelapluie" Pivotto

@roidelapluie

Sysadmin at inuits

Automation, monitoring, HA

MySQL/MariaDB user/admin/contributor

Grafana and Prometheus user/contributor

Page 3: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

inuits

Page 4: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

MySQL Replication

MySQL Master <-> MySQL Master

MySQL Master -> MySQL Slave

MySQL Master -> MySQL Slave -> MySQL Slave

MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves

MySQL Master -> MySQL Slaves

Page 5: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

mysqld_exporter

Page 6: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

mysqld_exporter

Page 7: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

mysqld_exporter is great

Lots of data

Lots of alert examples

Percona's Grafana dashboards project brings dozens of useful dashboards

Page 8: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Migrating to Prometheus does not mean that we should forget the past ... Or lower our monitoring expectations.

Page 9: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

pt-heartbeat

pt-heartbeat is a daemon that updates a row with the current timestamp on a MySQL server every second.

On the replica, you can read the replicated timestamp and compute NOW - timestamp to get the real lag.

+----------------------------+-----------+
| ts                         | server_id |
+----------------------------+-----------+
| 2017-08-17T16:55:01.001030 |         1 |
+----------------------------+-----------+
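The NOW - timestamp arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the "current time" is hard-coded rather than read from the replica, and it assumes the master and replica clocks are synchronized.

```python
from datetime import datetime


def heartbeat_lag_seconds(heartbeat_ts: str, replica_now: datetime) -> float:
    """Return replica_now minus the replicated heartbeat timestamp, in seconds."""
    stored = datetime.fromisoformat(heartbeat_ts)
    return (replica_now - stored).total_seconds()


# Heartbeat row from the slide, read on the replica 2.5 seconds
# after the master wrote it (hypothetical "now" value).
now = datetime(2017, 8, 17, 16, 55, 3, 501030)
lag = heartbeat_lag_seconds("2017-08-17T16:55:01.001030", now)
print(f"{lag:.1f}")  # 2.5
```

Because the heartbeat is written every second, a healthy replica should report a value close to zero; the value grows whenever the replica stops applying the master's updates.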

Page 10: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

pt-heartbeat

GPL

Perl

Part of Percona Toolkit

Page 11: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

pt-heartbeat

Our previous monitoring tool (munin) had support for pt-heartbeat. Prometheus mysqld_exporter didn't.

Page 12: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Wait, MySQL has that natively

mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 0
...

aka mysqld_exporter metric:

 mysql_slave_lag_seconds 

Page 13: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat
Page 14: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Bugs

Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.

Page 15: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

pt-heartbeat is useful

Okay, so we had that thing. Now that we are moving to Prometheus, we don't want to lose it.

Idea: let's implement this!

Page 16: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Pull Request 183

https://github.com/prometheus/mysqld_exporter/pull/183

Opened Feb 20

Merged Feb 21

Page 17: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

How it works

The exporter checks the heartbeat table with an SQL query. It does not call the pt-heartbeat CLI, so it is independent of it.
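A rough sketch of the "read the table, don't shell out" idea. It uses Python's built-in sqlite3 as a stand-in for a MySQL connection so that it runs anywhere; the real exporter is written in Go and issues its own query against MySQL. The function name and the table name `heartbeat` are assumptions for illustration, and the columns follow the pt-heartbeat row shown earlier.

```python
import sqlite3


def read_heartbeat(conn, table="heartbeat"):
    """Read (ts, server_id) rows straight from the heartbeat table."""
    # Identifiers cannot be bound as query parameters, so the table name
    # is interpolated; only pass trusted names here.
    cur = conn.execute(f'SELECT ts, server_id FROM "{table}"')
    return cur.fetchall()


# In-memory stand-in for the replicated heartbeat table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heartbeat (ts TEXT, server_id INTEGER)")
conn.execute("INSERT INTO heartbeat VALUES ('2017-08-17T16:55:01.001030', 1)")

rows = read_heartbeat(conn)
print(rows)  # [('2017-08-17T16:55:01.001030', 1)]
```

Anything that keeps such a table up to date would work with this approach, which is why the collector does not depend on the pt-heartbeat binary being installed next to the exporter.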

Page 18: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

CLI flags

collect.heartbeat

collect.heartbeat.database

collect.heartbeat.table

Page 19: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Metrics

mysql_heartbeat_stored_timestamp_seconds{server_id="1"}

mysql_heartbeat_now_timestamp_seconds{server_id="1"}
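To make the two series concrete, here is a small sketch that renders them as lines in the Prometheus text exposition format. The function name and the sample values are hypothetical; the real exporter builds these metrics with the Go client library rather than formatting strings by hand.

```python
def render_heartbeat_metrics(stored_ts: float, now_ts: float, server_id: int) -> str:
    """Render the two heartbeat gauges as text exposition lines."""
    label = f'{{server_id="{server_id}"}}'
    return (
        f"mysql_heartbeat_stored_timestamp_seconds{label} {stored_ts}\n"
        f"mysql_heartbeat_now_timestamp_seconds{label} {now_ts}\n"
    )


# Hypothetical Unix timestamps: the heartbeat was written 2.5 s before "now".
out = render_heartbeat_metrics(1502988901.5, 1502988904.0, 1)
print(out, end="")
```

Exposing both timestamps, rather than a precomputed difference, leaves the subtraction to Prometheus.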

Page 20: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Recording Lag

mysql_heartbeat_lag_seconds =
    mysql_heartbeat_now_timestamp_seconds -
    mysql_heartbeat_stored_timestamp_seconds

https://github.com/prometheus/mysqld_exporter/blob/master/example.rules

Page 21: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Alert

ALERT MySQLReplicationLag
  IF
      (mysql_heartbeat_lag_seconds > 30)
    AND on (instance)
      (predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2) > 0)
  FOR 1m
  LABELS {
    severity = "critical"
  }
  ANNOTATIONS {
    summary = "MySQL slave replication is lagging",
    description = "The mysql slave replication has fallen behind and is not recovering",
  }
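The alert condition combines a threshold with a trend check: the lag must be above 30 seconds and a linear extrapolation two minutes ahead must still be positive, meaning the replica is not on course to catch up. The sketch below mimics that logic with an ordinary least-squares fit. It is an approximation for illustration only: the function names and sample data are made up, it ignores the FOR 1m clause, and Prometheus's own predict_linear may differ in detail.

```python
def predict_linear(samples, eval_time, horizon):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it at eval_time + horizon. Needs at least two samples with
    distinct timestamps."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope * (eval_time + horizon) + intercept


def should_alert(samples, eval_time, threshold=30, horizon=120):
    """True when the latest lag exceeds the threshold and the predicted
    lag `horizon` seconds ahead is still above zero."""
    current = samples[-1][1]
    return current > threshold and predict_linear(samples, eval_time, horizon) > 0


# Hypothetical lag samples as (seconds, lag_seconds) pairs.
growing = [(0, 40), (60, 50), (120, 60)]      # lag keeps increasing
recovering = [(0, 90), (60, 60), (120, 32)]   # lag above 30 but dropping fast

print(should_alert(growing, eval_time=120))     # True
print(should_alert(recovering, eval_time=120))  # False
```

The second case is the reason for the trend check: a replica that is still more than 30 seconds behind but catching up quickly does not need to page anyone.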

https://github.com/prometheus/mysqld_exporter/blob/master/example.rules

Page 22: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Contributing to Percona Grafana Dashboards

less great

PR opened Feb 23

Still open

Page 23: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Takeaways

Contributing to Prometheus is easy

pt-heartbeat is the way to monitor MySQL replication lag

and now it's available in Prometheus

Any volunteers to rewrite pt-heartbeat in Go? :)

Page 24: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat

Julien Pivotto

roidelapluie

[email protected]

Inuits

https://[email protected]

Contact