ELK Possibilities: Log Management and Beyond

Ashish Billore

ELK Possible Use Cases

• Log Management:
  - Capture logs from all the services on all the nodes
  - Present them in a centralized dashboard for easy query / visualization
• Serviceability, Troubleshooting and Debug Tool:
  - Use ELK to query and report logs filtered on time range, specific node, or any keyword
• Regulatory and Audit Requirements:
  - Retain logs safely with Elasticsearch database snapshots
• Monitoring:
  - Using the CollectD agent, monitor OS nodes: CPU, memory, disk, I/O, OpenIPMI*
  - Present metrics in a centralized dashboard for easy query / visualization
• Alerting based on events:
  - Using the ElastAlert plugin, alert on resource metric thresholds. Of interest are CPU, memory, disk, I/O and IPMI-based thresholds
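The troubleshooting use case (filter on time range, node and keyword) can be sketched as an Elasticsearch query body. This is a minimal illustration, not a reference implementation: the field names `@timestamp`, `host` and `message` are assumptions based on typical Logstash output, the helper `build_log_query` is hypothetical, and the `bool` / `filter` form assumes Elasticsearch 2.0 or later. A `term` match on `host` also assumes that field is indexed as an exact value.

```python
import json
from datetime import datetime, timedelta, timezone


def build_log_query(start, end, node=None, keyword=None):
    """Build a query body that filters logs by time range, node and keyword.

    Field names (@timestamp, host, message) are assumptions; adjust them to
    whatever the Logstash pipeline actually emits.
    """
    # The time range is always applied as a non-scoring filter.
    filters = [{
        "range": {
            "@timestamp": {"gte": start.isoformat(), "lte": end.isoformat()}
        }
    }]
    # Restrict to one node only when a node name is given.
    if node is not None:
        filters.append({"term": {"host": node}})

    bool_query = {"filter": filters}
    # A keyword is matched against the log message text.
    if keyword is not None:
        bool_query["must"] = [{"match": {"message": keyword}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    end = datetime(2016, 1, 2, tzinfo=timezone.utc)
    start = end - timedelta(hours=1)
    # "compute-01" is a made-up node name used only for illustration.
    body = build_log_query(start, end, node="compute-01", keyword="ERROR")
    print(json.dumps(body, indent=2))
```

The resulting JSON could be sent to an index's `_search` endpoint; Kibana issues comparable queries when a time range and search term are set in the dashboard.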

Components and Config Options

ELK:
• Components needed and purpose:
  - Elasticsearch, Logstash, Kibana (E, L, K)
  - Curator: Elasticsearch index management plugin, used for purging older indexes
  - Redis: placed in front of Logstash to optimize log ingestion during heavy traffic or a Logstash outage
  - Elasticsearch is to be configured with a replication factor of 1, which replicates Elasticsearch documents on both nodes
  - Logstash needs to point to the Elasticsearch cluster for indexed output
• Potential issues for investigation:
  - Elasticsearch in cluster mode has master and data nodes. Master election requires a minimum of 3 nodes to avoid the split-brain problem
  - Logstash is CPU intensive: multiple cores / threads need to be allocated for larger datasets or too many indexes
  - Elasticsearch is RAM intensive: deployments with large data need proportional RAM
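The 3-node minimum follows from majority quorum arithmetic. The sketch below assumes the usual rule of thumb for Elasticsearch versions that use the `discovery.zen.minimum_master_nodes` setting, namely (master-eligible nodes / 2) + 1; the function names are hypothetical and exist only for this illustration.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum: (N / 2) + 1 using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerates_single_failure(master_eligible):
    """True if a master can still be elected after losing one node."""
    return master_eligible - 1 >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for nodes in range(1, 6):
        print(nodes, minimum_master_nodes(nodes), tolerates_single_failure(nodes))
```

With two master-eligible nodes the quorum is also two, so losing either node blocks master election; three is the smallest cluster that keeps a majority after a single failure.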

ELK + CollectD: Practical Considerations

Practical investigation tasks on ELK + CollectD to determine:
• How much overhead?
  - CPU, memory, disk and network usage by each server component (E, L, K)
  - Data increments (how much the disk usage grows over time)
  - How much resource is consumed by the agents on the nodes: CollectD, rsyslog
• Non-functional requirements: scalability, HA, recovery, etc.
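One way to approach the data-increment question is to sample disk usage at intervals and fit a straight line through the samples. The sketch below is only an illustration under that linear-growth assumption; the function names are hypothetical and the sample figures are made up, not measurements from any deployment.

```python
def growth_per_day(samples):
    """Least-squares slope of (day, used_gb) samples, in GB per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def days_until_full(samples, capacity_gb):
    """Days from the last sample until capacity is reached, or None if not growing."""
    rate = growth_per_day(samples)
    if rate <= 0:
        return None
    last_used = samples[-1][1]
    return (capacity_gb - last_used) / rate


if __name__ == "__main__":
    # Made-up daily readings: (day number, GB used by the Elasticsearch data dir).
    readings = [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0)]
    print(growth_per_day(readings))
    print(days_until_full(readings, capacity_gb=100.0))
```

Real log volume is rarely perfectly linear, so an estimate like this is best treated as a rough input to retention and capacity decisions.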

• Reason for optimism: the ELK combination is heavily used
  - For distributed, scale-out applications: OpenStack, clouds, in the community
  - Production deployments at: Facebook, Uber, Netflix, eBay, Stack Overflow, Verizon, OpenStack Gerrit, NASA ...
  - More info: https://www.elastic.co/use-cases

Add-on Functionalities

1. Metric collection / reporting with CollectD:
   - Deploy single-node ELK with the CollectD plugin and capture the following metric data:
     o CPU, memory, disk, I/O and IPMI
2. ElastAlert plugin for alerts:
   - On the above setup, configure the ElastAlert plugin
   - Set up alerts for certain metrics: CPU, disk space, network I/O, or errors in logs
   - Configure notification (e.g. through email / dashboard alerts)
3. Curator plugin for Elasticsearch index clean-up:
   - Configure the above setup with the Curator plugin
   - Create various policies for index cleanup:
     o Based on time range: index older than 1 week
     o Based on condition: if disk is 80% full
4. Snapshot and restore for data backup / restore:
   - In the above setup, take a snapshot of the Elasticsearch database
   - Restore an empty Elasticsearch database with the above snapshot
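The two cleanup policies in item 3 can be sketched in plain Python to make the selection logic concrete. This is not Curator's own API: the helper names are hypothetical, and the index naming pattern `logstash-YYYY.MM.DD` is an assumption based on the default Logstash output.

```python
from datetime import date, datetime, timedelta


def index_date(name, prefix="logstash-"):
    """Parse the date from a daily index name, or return None if it does not match."""
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
    except ValueError:
        return None


def expired_indices(names, today, max_age_days=7):
    """Time-range policy: indices strictly older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in names:
        day = index_date(name)
        # Indices that do not follow the daily pattern are never selected.
        if day is not None and day < cutoff:
            expired.append(name)
    return sorted(expired)


def disk_over_threshold(used_bytes, total_bytes, threshold=0.8):
    """Condition policy: True when disk usage is at or above the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


if __name__ == "__main__":
    names = ["logstash-2016.01.01", "logstash-2016.01.09", ".kibana"]
    print(expired_indices(names, today=date(2016, 1, 10)))
    print(disk_over_threshold(used_bytes=85, total_bytes=100))
```

In an actual setup, Curator would perform the deletion; the sketch only shows which indices each policy would select.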

Experiment Links for reference

• ELK Data Management:
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html#archive-indices
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html#retiring-data
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
• ELK Cluster, Scaling and Failover:
  - https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/_add_failover.html
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/backing-up-your-cluster.html
• ELK Curator for roll-over, cleanup:
  - https://www.elastic.co/guide/en/elasticsearch/client/curator/3.5/getting-started.html
• ELK Alerts (ElastAlert):
  - https://github.com/Yelp/elastalert
• ELK resource monitoring with CollectD:
  - https://mtalavera.wordpress.com/2015/02/16/monitoring-with-collectd-and-kibana/
  - https://collectd.org/wiki/index.php/Table_of_Plugins


Quick Recap

• Log Integration Framework (Apache 2.0 License):
  - Log collection
  - Centralization
  - Parsing
  - Storage and search
  - Visualization: searchable time-series dashboards
• Scale log management as systems / cloud scale:
  - Horizontally scale each component as needed
  - DevOps friendly, with Chef, Ansible and Puppet scripts

ELK: Generic Log Management and Beyond

• ELK being a generic framework:
  - Leverage it for any system that can generate logs or data over time: OpenStack, syslog / rsyslog, DB logs, any other application logs, metrics
  - Easy to on-board any application / service in future
• Dashboards:
  - Time-series data visualization
  - Multiple views of different information for multiple users (can be authorization-secured)
  - Embeddable in other widgets / pages / views
• Elasticsearch: “google-search” for entire system logs
  - Search available over REST
  - JSON-based indexed DB
• ELK combination heavily used:
  - For distributed, scale-out applications, OpenStack, clouds, in the community

ELK + CollectD for Utilization Data Visualization, trending

• CollectD
  - Daemon to collect system performance statistics periodically
  - Provides mechanisms to store values in a variety of ways
  - Open source, actively developed, with 90+ plugins available out of the box

Sample Results:

