monitoring mysql with prometheus and grafana

Post on 28-Jan-2018


Monitoring MySQL with Prometheus & Grafana

Julien Pivotto (@roidelapluie)

Open Source Monitoring Conference

November 22nd, 2017

SELECT USER();

Julien "roidelapluie" Pivotto

@roidelapluie

Sysadmin at inuits

Automation, monitoring, HA

MySQL/MariaDB user/admin/contributor

Grafana and Prometheus user/contributor

inuits

Monitoring / Observability

Monitoring or Observability?

Collecting lots of data

Not only for alerting, also for understanding

You don't know what you will need

Who decides what to monitor?

Pre-define checks

Check only what is needed

OR

Let the system you monitor decide what to expose

Alert based on that data

Push or pull?

Push: need to know and configure where your server is

What if there are 2, 3, 10 servers?

What about in dev?

Pull: let any server take the metrics

Use service discovery to know what to fetch

High availability

Monitoring is very important

Metamonitoring

What is the price to get a highly available monitoring system?

Prometheus

https://prometheus.io/

Prometheus

Prometheus is a Cloud-Native, Data-Centric, Open-Source, Performant, Simple metrics collection, analysis and alerting tool.

Nothing more.

Cloud Native

Very easy to configure, deploy, maintain

Designed in multiple services

Container ready

Orchestration ready (dynamic config)

Open Source

Apache 2.0

CNCF

Go

Support for multiple OS

Many "exporters": https://github.com/prometheus/prometheus/wiki/Default-port-allocations

Performance

Prometheus is designed to fetch data at an interval measured in SECONDS

Designed to handle lots of metrics

New storage engine in 2.0 to scale even better and support use cases like Kubernetes

Data Centric

A metric in Prometheus has metadata:

mysql_global_status_handlers_total{handler="tmp_write"} 1122

And lots of functions to filter, change, or remove that metadata while fetching it.
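To make the "metadata" concrete, here is a minimal sketch that splits one sample line into its name, labels and value. The helper name `parse_sample` is hypothetical, and the regular expressions only cover simple label values (no escaped quotes or braces); this is an illustration, not the official parser.

```python
import re

# Hypothetical helper: handles only the simple `name{k="v",...} value` shape.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, float_value) for one sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'mysql_global_status_handlers_total{handler="tmp_write"} 1122'
)
print(name, labels, value)
```

The labels end up as a plain dictionary, which is what the filtering and relabeling functions operate on.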

Metric types

Counters (always go up)

Gauge (go up and down)

Histograms (aggregate by buckets)

Summary (percentiles) (most of the time useless)
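Because a counter only goes up (and restarts from zero when the process restarts), the useful number is usually its per-second increase. A simplified sketch of that idea, assuming a list of (timestamp, value) samples; the function name is hypothetical and this is not the exact algorithm behind PromQL's rate(), which also extrapolates to the window boundaries.

```python
def counter_rate(samples):
    """Per-second increase over (timestamp_seconds, value) samples.

    A drop between two samples is treated as a counter reset.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so count `cur` itself.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Made-up samples: the counter resets between t=10 and t=20.
samples = [(0, 100), (10, 150), (20, 5), (30, 35)]
print(counter_rate(samples))
```

A gauge needs no such treatment: its current value is already the interesting number.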

A word about Prometheus vs Graphite

Prometheus does not see a metric as an "event". Metrics are the current value until they are replaced. You cannot see when a metric has been included in Prometheus. For events, Prometheus refers you to Elasticsearch.

How?

Creative Commons Attribution 2.0 https://www.flickr.com/photos/75001512@N00/6824648824

Warning : this is complicated.

Prometheus uses HTTP and PLAIN TEXT.

http(s) is supported

basic auth / token also

(exposing https is not Prometheus's job)

$ curl http://127.0.0.1:9090/metrics
# HELP prometheus_notifications_queue_length The number of alert notifications in the queue.
# TYPE prometheus_notifications_queue_length gauge
prometheus_notifications_queue_length 0
# HELP prometheus_notifications_sent_total Total number of alerts successfully sent.
# TYPE prometheus_notifications_sent_total counter
prometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06

$ curl http://127.0.0.1:9090/metrics
prometheus_notifications_queue_length 0
prometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06

Anything that can embed an HTTP server can serve Prometheus metrics.
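As an illustration of how little is needed, here is a sketch that serves one counter in the plain-text format using only Python's standard library. The metric name `myapp_requests_total`, the `REQUESTS_SEEN` state and port 8000 are made up for the example; in practice one of the Prometheus client libraries would likely be a better choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state; a real app would update this as it works.
REQUESTS_SEEN = {"count": 0}


def render_metrics():
    """Build the plain-text body that a scrape of /metrics would return."""
    return (
        "# HELP myapp_requests_total Requests handled by this demo app.\n"
        "# TYPE myapp_requests_total counter\n"
        f"myapp_requests_total {REQUESTS_SEEN['count']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call it from your own entry point."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


print(render_metrics(), end="")
```

Calling `serve()` and then running `curl http://127.0.0.1:8000/metrics` should return the same three lines.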

Native Prometheus integrations

Kubernetes

Ceph

etcd

Telegraf

Grafana

mgmt

...

What if you can't?

Linux (not counting TUX web server)

MySQL

Third party software

Third party services

Exporters

Creative Commons Attribution 2.0 https://www.flickr.com/photos/fuzzy/563198182

Exporters expose metrics with an HTTP API

They connect to the real target

Bindings available for many languages

Exporters do not save data; they are not "proxies" and don't "cache" anything

Official exporters

Consul, Memcached, MySQL, Node/System, HAProxy, AWS CloudWatch, Collectd, Graphite, StatsD, JMX, InfluxDB, SNMP, Blackbox

Node exporter

Linux metrics

Including: CPU, memory, disk space, networking, load, time...

Textfile

As part of the node_exporter, you can write metrics in a file

Scripts output, etc ...

Blackbox exporter

HTTP, DNS, TCP sockets, ICMP...

http://127.0.0.1:9115/probe?target=https://inuits.eu&module=http_2xx

probe_ssl_earliest_cert_expiry 1.52697083e+09
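The value of probe_ssl_earliest_cert_expiry is a Unix timestamp, so turning it into "days left" is simple arithmetic. A small sketch, assuming the value shown above; the function name is hypothetical and `now` is pinned to the date of the talk so the result is reproducible.

```python
from datetime import datetime, timezone


def cert_expiry(probe_value, now):
    """Convert the probe's Unix timestamp into an expiry date and days left."""
    expiry = datetime.fromtimestamp(probe_value, tz=timezone.utc)
    days_left = (expiry - now).total_seconds() / 86400
    return expiry, days_left


# `now` is fixed for illustration; use datetime.now(timezone.utc) in practice.
now = datetime(2017, 11, 22, tzinfo=timezone.utc)
expiry, days_left = cert_expiry(1.52697083e+09, now)
print(expiry.isoformat(), round(days_left))
```

In Prometheus itself the same comparison would typically be written as an alerting expression against time().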

MySQL

Creative Commons Attribution 2.0 https://www.flickr.com/photos/lurkerm/262541595

Frequency

Some queries are expensive

You can decide to fetch some data every 1m, others every 10s...

Wait... Exporters don't cache?? How do I fetch some data every 10s and others every 1m?

scrape_configs:
- job_name: 'mysql global status'
  scrape_interval: 10s
  static_configs:
  - targets:
    - '172.31.14.3:9104'
  params:
    collect[]:
    - global_status

- job_name: 'mysql performance'
  scrape_interval: 1m
  static_configs:
  - targets:
    - '172.31.14.3:9104'
  params:
    collect[]:
    - perf_schema.tableiowaits
    - perf_schema.indexiowaits
    - perf_schema.tablelocks
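The trick is that each job sends different `collect[]` query parameters to the same exporter, so no caching is needed. A sketch of the URLs that result, assuming the default `/metrics` path and plain HTTP; the helper name `scrape_url` is hypothetical.

```python
from urllib.parse import urlencode


def scrape_url(target, collectors, path="/metrics"):
    """Build the URL fetched for one job: one collect[] parameter per collector."""
    query = urlencode([("collect[]", name) for name in collectors])
    return f"http://{target}{path}?{query}"


fast_url = scrape_url("172.31.14.3:9104", ["global_status"])
slow_url = scrape_url(
    "172.31.14.3:9104",
    ["perf_schema.tableiowaits", "perf_schema.indexiowaits", "perf_schema.tablelocks"],
)
print(fast_url)
print(slow_url)
```

The brackets are percent-encoded in the final URL; the exporter decodes them back to `collect[]` and runs only the collectors listed.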

MySQL Replication

MySQL Master <-> MySQL Master

MySQL Master -> MySQL Slave

MySQL Master -> MySQL Slave -> MySQL Slave

MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves

MySQL Master -> MySQL Slaves

pt-heartbeat

pt-heartbeat is a daemon that updates an entry with the current timestamp on a MySQL server every second.

On the replica, you can check the timestamp and do NOW - timestamp to get the real lag.

+----------------------------+-----------+
| ts                         | server_id |
+----------------------------+-----------+
| 2017-08-17T16:55:01.001030 |         1 |
+----------------------------+-----------+
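The NOW - timestamp calculation can be sketched as follows, using the `ts` value from the table above. The function name and the `now` value are made up for the example, and both timestamps are assumed to be naive datetimes in the same time zone.

```python
from datetime import datetime


def replication_lag_seconds(heartbeat_ts, now):
    """Lag seen on the replica: current time minus the replicated heartbeat."""
    stored = datetime.fromisoformat(heartbeat_ts)
    return (now - stored).total_seconds()


# Hypothetical "now" on the replica, 2.5 seconds after the heartbeat was written.
now = datetime(2017, 8, 17, 16, 55, 3, 501030)
lag = replication_lag_seconds("2017-08-17T16:55:01.001030", now)
print(lag)
```

Clock differences between master and replica show up directly in this number, so it is only as good as the time synchronization between the hosts.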

GPL

Perl

Part of percona toolkit

Wait, MySQL has that natively

mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 0
...

aka mysqld_exporter metric:

 mysql_slave_lag_seconds 

Bugs

Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.

How it works

Checks the heartbeat table (SQL query). It does not call the pt-heartbeat CLI, so it is independent of it.

CLI flags

collect.heartbeat

collect.heartbeat.database

collect.heartbeat.table

Metrics

mysql_heartbeat_stored_timestamp_seconds{server_id="1"}
mysql_heartbeat_now_timestamp_seconds{server_id="1"}

Back to Prometheus

Creative Commons Attribution 2.0 https://www.flickr.com/photos/andrepierre/14843110667

Exploring Metrics


PromQL

mysql_global_status_commands_total

mysql_global_status_commands_total{command="select"}

mysql_global_status_commands_total{command=~"select|set_options"}

deriv(mysql_global_status_connections[5m])

mysql_up == 0

sum(avg_over_time(mysql_info_schema_threads[10m])) by (instance) - sum(avg_over_time(mysql_info_schema_threads[10m] offset 1d)) by (instance)

{__name__=~".+innodb.+cache.*"}

predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2)

sum(rate(mysql_global_status_commands_total{command=~"(commit|rollback)"}[5m]))
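The idea behind predict_linear can be sketched as an ordinary least-squares line fitted through the samples in the range and extended into the future. This is a simplified illustration with made-up samples, not Prometheus's own code: here the prediction is taken relative to the last sample, while Prometheus uses the evaluation time.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it `seconds_ahead` seconds after the last sample."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Made-up lag samples over 5 minutes, growing by 0.5 s every second.
lag_samples = [(0, 10), (60, 40), (120, 70), (180, 100), (240, 130)]
predicted = predict_linear(lag_samples, 60 * 2)
print(predicted)
```

A positive prediction two minutes ahead means the lag is not on its way back to zero, which is how the expression is used for alerting.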

Recording and alerting rules

Rules

Prometheus can record extra metrics

Make expensive calculations

Alert if value is present

groups:
- name: mysql
  rules:
  - record: mysql_heartbeat_lag_seconds
    expr: |
      mysql_heartbeat_now_timestamp_seconds -
      mysql_heartbeat_stored_timestamp_seconds
  - alert: MysqlReplicationNotRunning
    expr: |
      mysql_slave_status_slave_io_running == 0 OR
      mysql_slave_status_slave_sql_running == 0
    for: 2m
    annotations:
      summary: "Slave replication is not running"
  - alert: MySQLReplicationLag
    expr: |
      (mysql_slave_lag_seconds > 30)
      AND on (instance)
      (predict_linear(
        mysql_slave_lag_seconds[5m], 60*2) > 0)
    for: 5m
    labels:
      priority: immediate
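The `for:` clause means an alert must stay true for that long before it fires; until then it is only pending. A rough sketch of that state machine, assuming one evaluation per minute; the function name and the sample evaluations are hypothetical.

```python
def alert_states(evaluations, hold_for):
    """evaluations: list of (timestamp_seconds, condition_is_true).

    Returns the alert state after each evaluation: inactive, pending or firing.
    """
    states = []
    active_since = None
    for ts, is_true in evaluations:
        if not is_true:
            # The condition cleared, so the waiting period starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= hold_for else "pending")
    return states


# Made-up evaluations, one per minute, with `for: 2m` (120 seconds).
evals = [(0, True), (60, True), (120, True), (180, False), (240, True)]
states = alert_states(evals, hold_for=120)
print(states)
```

A short blip that clears before the hold time never reaches the firing state, which is the point of the clause.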

Alertmanager

When Prometheus has an alert, it sends it.

Every minute by default, as long as the alert is ongoing

Alertmanager, a separate daemon, does the rest of the work

What is Alertmanager doing?

Grouping

Inhibition

Silence

Notify

Grouping alerts

5 nodes are down. Do you want 5 emails?

group_by: ['alertname', 'cluster', 'service']
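Grouping boils down to bucketing alerts by the values of the `group_by` labels and sending one notification per bucket. A minimal sketch with made-up alerts; the function name is hypothetical and this is not Alertmanager's implementation.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (dicts of labels) by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Five made-up alerts that differ only by instance.
alerts = [
    {"alertname": "NodeDown", "cluster": "db", "service": "mysql", "instance": f"db{i}"}
    for i in range(1, 6)
]
groups = group_alerts(alerts, ["alertname", "cluster", "service"])
print(len(groups))
```

The five alerts share the same three label values, so they land in a single group and would produce one email instead of five.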

Inhibition

Datacenter is on fire. Do you want to know that switches, hosts, services are down?

- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
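The rule above reads: while any alert matching `source_match` is active, mute the alerts matching `target_match`. A simplified sketch of that logic with made-up alerts; the function names are hypothetical, and a real rule would usually also restrict which labels must be equal between source and target.

```python
def matches(alert, matcher):
    """True when every label in the matcher has the same value on the alert."""
    return all(alert.get(label) == value for label, value in matcher.items())


def apply_inhibition(alerts, source_match, target_match):
    """Drop target alerts while at least one source alert is active."""
    if not any(matches(alert, source_match) for alert in alerts):
        return list(alerts)
    return [alert for alert in alerts if not matches(alert, target_match)]


# Made-up alerts for the "datacenter on fire" scenario.
alerts = [
    {"alertname": "DatacenterOnFire", "severity": "critical"},
    {"alertname": "SwitchDown", "severity": "warning"},
    {"alertname": "HostDown", "severity": "warning"},
]
remaining = apply_inhibition(alerts, {"severity": "critical"}, {"severity": "warning"})
print([alert["alertname"] for alert in remaining])
```

Only the critical alert is left to notify about; once it resolves, the warnings are no longer inhibited.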

Silence

I upgrade the kernel and reboot the service. Do I want mails during that time?

Notifications

Alertmanager can notify you with:

Mails

Webhooks

3rd party services

SMS (sachet: 3rd party integration using webhook)

Alerting routes

You can send alerts to defined people based on routes

Everything to logs mailbox

Critical alerts

Network to net team SMS

Svc to app team SMS

Warning alerts

Network to net team mail

Svc to app team mail
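The routing above can be sketched as a small lookup: every alert goes to the logs mailbox, and the first matching route adds a team receiver. The label names `severity` and `team` and all receiver names are assumptions made for this example; the real routing tree lives in the Alertmanager configuration.

```python
# Hypothetical routes: (labels that must match, receiver name).
ROUTES = [
    ({"severity": "critical", "team": "network"}, "net-team-sms"),
    ({"severity": "critical", "team": "service"}, "app-team-sms"),
    ({"severity": "warning", "team": "network"}, "net-team-mail"),
    ({"severity": "warning", "team": "service"}, "app-team-mail"),
]


def receivers_for(alert):
    """Everything goes to the logs mailbox, plus the first matching route."""
    receivers = ["logs-mailbox"]
    for matcher, receiver in ROUTES:
        if all(alert.get(label) == value for label, value in matcher.items()):
            receivers.append(receiver)
            break
    return receivers


print(receivers_for({"alertname": "SwitchDown", "severity": "critical", "team": "network"}))
print(receivers_for({"alertname": "SlowQueries", "severity": "warning", "team": "service"}))
```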

High Availability: Prometheus

Have different Prometheus servers with the same config

They do not talk to each other

All of them fetch the same data

They monitor each other

High availability: Alertmanager

Have multiple Alertmanagers with the same config

Prometheis send alerts to all Alertmanagers

Alertmanagers talk to each other so as not to send the same notification

One tool does one job...

Prometheus will collect data

Alertmanager will send notifications

Exporters will expose data

Grafana will graph data

Grafana

Open Source (Apache 2.0)

Web app

Specialized in visualization

Pluggable

Multiple datasources: Prometheus, Graphite, InfluxDB...

Has an API!

History of Grafana

Grafana is a fork of Kibana 3; used to be JS-driven.

Now fully featured, requires a database, multi-projects/users support, etc...

Grafana and Prometheus

Prometheus shipped its own consoles

Now it recommends Grafana and has deprecated its own consoles

Grafana Dashboards


Time Picker

Configure Prometheus in Grafana

Prometheus Dashboard

Multiple Prometheus instances

You can add multiple Prometheus instances in Grafana

You can add a dropdown at the top to select which one you want to use

Use case: Prometheus HA, local Prometheus (with access mode=direct)

Creating Grafana Dashboards

Takes time

Requires deep knowledge of the tools

Improved over time

Easy to share (json + online library)

Percona Grafana Dashboard

Percona Open Sourced Grafana Dashboards

Covering MySQL, Mongo and Linux monitoring

Part of a bigger picture, PMM, but usable standalone

Open Source (AGPL!)

https://github.com/percona/grafana-dashboards

Installing Percona Graphs

Method 1

Enable File dashboards in Grafana

Clone grafana-dashboards to the configured location (or make a package)

Method 2

Use the Grafana API to upload the JSON files.
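A sketch of what Method 2 could look like: building the HTTP request that posts one dashboard JSON to Grafana's dashboard endpoint. The Grafana URL, the API key and the tiny dashboard are placeholders, and the request is only built, not sent; exported dashboards may need extra preparation (for example datasource names) before they import cleanly.

```python
import json
import urllib.request


def build_dashboard_upload(grafana_url, api_key, dashboard):
    """Build (but do not send) the POST request that creates or updates a dashboard."""
    # id=None lets Grafana treat the dashboard as new on this instance.
    payload = {"dashboard": dict(dashboard, id=None), "overwrite": True}
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; in practice the dashboard would be loaded from a JSON file.
request = build_dashboard_upload(
    "http://localhost:3000", "EXAMPLE_KEY", {"title": "MySQL Overview", "panels": []}
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```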

MySQL Setup

You'll need mysqld_exporter, with a user

MySQL 5.1+

Performance Schema for full set of metrics

mysqld_exporter -collect.binlog_size=true -collect.info_schema.processlist=true

node_exporter setup

node_exporter -collectors.enabled="diskstats,filefd,filesystem,loadavg,meminfo,netdev,stat,time,uname,vmstat"

Prometheus (static file)

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus

  - job_name: linux
    static_configs:
      - targets: ['10.0.98.43:9100']
        labels:
          instance: db1

  - job_name: mysql
    static_configs:
      - targets: ['10.0.98.43:9104']
        labels:
          instance: db1

Dashboards


We don't need all of them?

Because Grafana is just viz, you can import only the ones you want (e.g. exclude Mongo)

You can import any extra dashboard you need later

MySQL Overview


InnoDB


Replication

Other Grafana features

Multi tenant

Deployments

Folders for dashboards (coming soon)

Jsonnet library to create dashboard

Brand new, still WIP

https://github.com/grafana/grafonnet-lib

Conclusions

Prometheus and Grafana are first-class monitoring tools

Totally different approach than other tools

Embeddable into your apps

Percona Dashboards get your graphs ready in no time with minimal effort

Julien Pivotto

roidelapluie

roidelapluie@inuits.eu

Inuits

https://inuits.eu

info@inuits.eu

Contact
