Scaling an ELK stack at bol.com
Elasticsearch NL meetup
2014.09.22, Utrecht
Who am I?
Renzo Tomà
• IT operations
• Linux engineer
• Python developer
• Likes huge streams of raw data
• Designed metrics & logsearch platform
• Married, proud father of two
And you?
ELK
ELK at bol.com
Logsearch platform.
For developers & operations.
Search & analyze log events using Kibana.
Events from many sources (e.g. syslog, accesslog, log4j, …)
Part of our infrastructure.
Why? Faster root cause analysis, shorter time-to-repair.
Real world examples
Case: release of a new webshop version.
Nagios alert: JBoss processing time.
Metrics: increase in active threads (and processing time).
=> Inconclusive!
Find all HTTP requests to www.bol.com which were slower than 5 seconds:
@type:apache_access AND @fields.site:"www_bol_com" AND @fields.responsetimes:[5000000 TO *]
=> Hits for 1 URL. Enough for DEV to start its RCA.
Real world examples
Case: strange performance spikes on the webshop.
Looks bad, but cause unknown.
Find all errors in webshop log4j logging:
@fields.application:wsp AND @fields.level:ERROR
Compare errors before vs during spike. Spot the difference.
=> Spikes caused by timeouts on a backend service.
Metrics correlation: the timeouts were not the cause, but a symptom of a full GC issue.
Initial design (mid 2013'ish)
[Architecture diagram, roughly:]
- Servers, routers, firewalls, … send syslog to a central syslog server.
- Apache webservers: the accesslog is tailed and shipped by the remote_syslog pkg.
- Java web applications (JVM): log4j syslog appender.
- Everything uses the syslog protocol over UDP as transport, even the accesslog + log4j events.
- Logstash acts as syslog server and converts lines into events, into JSON docs.
- Log events end up in Elasticsearch and are searched with Kibana 2.
Initial attempt #fail
Single Logstash instance not fast enough. Unable to keep up with events created.
High CPU load, due to intensive grokking (regex).
Network buffer overflow. UDP traffic dropped.
Result: missing events.
Initial attempt #fail
Log4j events can be multiline (e.g. stacktraces).
Events are sent per line: 100 lines = 100 syslog msgs.
Merging by Logstash.
Remember the UDP drops?
Result:
- unparseable events (if the 1st line was missing)
- Swiss cheese: stacktrace lines were missing.
Initial attempt #fail
Syslog RFC 3164:
“The total length of the packet MUST be 1024 bytes or less.”
Rich Apache LogFormat + lots of cookies = 4kb easily.
Anything after byte 1024 got trimmed.
Result: unparseable events (grok pattern mismatch).
The only way is up.
Improvement proposals:
- Use queuing to make Logstash horizontally scalable.
- Drop syslog as transport (for non-syslog sources).
- Reduce the amount of grokking. Pre-formatting at the source scales better. Less complexity.
Latest design (mid 2014'ish)
[Architecture diagram, roughly:]
- Servers, routers, firewalls, … still send syslog to a central syslog server, from where events are shipped by a local Logsheep.
- Apache webservers: accesslog in jsonevent format, shipped by a local Logsheep.
- Java web applications (JVM): log4j jsonevent layout + log4j redis appender.
- Lots of other sources.
- Events arrive in jsonevent format: no grokking required.
- Redis acts as a queue in front of many Logstash instances.
- Log events end up in Elasticsearch and are searched with Kibana 2 + 3.
Current status #win- Logstash: up to 10 instances per env (because of logstash 1.1 version)
- ES cluster (v1.0.1): 6 data + 2 client nodes
- Each datanode has 7 datadisks (striping)
- Indexing at 2k – 4k docs added per second
- Avg. index time: 0.5ms
- Peak: 300M docs = 185GB, per day
- Searches: just a few per hour
- Shardcount: 3 per idx, 1 replica, 3000 total
- Retention: up to 60 days
Our lessons learned
Before anything else!
Start collecting metrics so you get a baseline.
No blind tuning. Validate every change fact-based.
Our weapons of choice:
• Graphite
• Diamond (I am a contributor to the ES collector)
• Jcollectd
Alternative: try Marvel.
Logstash tip #1
Insert Redis as a queue between the sources and the Logstash instances:
- Scale Logstash horizontally
- High availability (no events get lost)
[Diagram: two Redis queues feeding three Logstash instances]
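A minimal sketch of what this can look like in Logstash 1.x configuration; the hostname and list key below are placeholders, and option details may differ per Logstash version:

# Shipper side: push events onto a Redis list instead of sending them straight to Logstash.
output {
  redis {
    host      => "redis.example.internal"   # hypothetical queue host
    data_type => "list"
    key       => "logstash"
  }
}

# Indexer side: any number of Logstash instances can pop events from the same list.
input {
  redis {
    host      => "redis.example.internal"
    data_type => "list"
    key       => "logstash"
  }
}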
Logstash tip #2
Tune your workers. Find your chokepoint and increase its workers to improve throughput.
[Diagram: Input → Filter → Output pipeline, with extra filter workers added]
$ top -H -p $(pgrep logstash)
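If the filter stage turns out to be the chokepoint, Logstash 1.x can be started with more filter worker threads; the flag below is from the 1.x docs, verify it against your version:

# Run the agent with 4 filter workers instead of the default 1.
$ bin/logstash agent -f logstash.conf --filterworkers 4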
Logstash tip #3
Grok is very powerful, but CPU intensive. Hard to write, maintain and debug.
Fix: vertical scaling. Increase filterworkers or add more Logstash instances.
Better: feed Logstash with jsonevent input.
Solutions:
• Log4j: use log4j-jsonevent-layout
• Apache: define JSON output with LogFormat
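A rough sketch of the Apache side, along the lines of the untergeek post listed under "Tools we use"; the field names mirror the @fields.* style from the earlier queries and are purely illustrative:

# Emit access log lines as JSON that Logstash can parse with a json filter instead of grok.
# Adjust field names and paths to your own setup.
LogFormat "{ \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@fields\": { \"clientip\": \"%a\", \"verb\": \"%m\", \"request\": \"%U%q\", \"response\": \"%>s\", \"bytes\": \"%B\", \"responsetimes\": %D } }" logstash_json
CustomLog logs/access_json.log logstash_json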
Logstash tip #4 (last one)
Use the HTTP protocol for the Elasticsearch output.
Avoid a version lock-in!
HTTP may be slower, but a newer ES means:
- Lots of new features
- Lots of bug fixes
- Lots of performance improvements
Most important: you decide what versions to use.
Logstash v1.4.2 (June '14) requires ES v1.1.1 (April '14).
Latest ES version is v1.3.2 (Aug '14).
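A sketch for the Logstash 1.4 elasticsearch output; the hostname is a placeholder:

# Talk to the cluster over the REST API instead of joining it as a node/transport client,
# so the Logstash and Elasticsearch versions no longer have to match.
output {
  elasticsearch {
    host     => "es-client.example.internal"   # hypothetical ES client node
    port     => 9200
    protocol => "http"
  }
}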
Elasticsearch tip #1
Do not download a 'great' configuration.
Elasticsearch is very complex. Lots of moving parts. Lots of different use cases. Lots of configuration options. The defaults cannot be optimal for everyone.
Start with the defaults:
• Load it (stresstest or pre-launch traffic).
• Check your metrics.
• Find your chokepoint.
• Change a setting.
• Verify and repeat.
Elasticsearch tip #2
Increase the 'index.refresh_interval' setting.
Refresh: make newly added docs available for search. Default value: one second. High impact on heavy indexing systems (like ours).
Change it at runtime & check the metrics:
$ curl -s -XPUT 'localhost:9200/_all/_settings' -d '{ "index": { "refresh_interval": "5s" } }'
Elasticsearch tip #3
Use Curator to keep the total shard count constant.
Uncontrolled shard growth may trigger a sudden hockey stick effect.
Our setup:
- 6 datanodes
- 6 shards per index
- 3 primary, 3 replica
"One shard per datanode" (YMMV)
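A sketch of how this can be wired up; the template name and the example index name are illustrative:

# Pin the per-index shard count with an index template (3 primaries + 1 replica = 6 shards per index).
$ curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
}'

# Keep the total constant by dropping indices outside the retention window;
# this is what a scheduled Curator run boils down to.
$ curl -XDELETE 'localhost:9200/logstash-2014.07.22'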
Elasticsearch tip #4
Become experienced in rolling cluster restarts:
- to roll out new Elasticsearch releases
- to apply a config setting (e.g. heap, gc, …)
- because it will solve an incident.
Control concurrency + bandwidth:
cluster.routing.allocation.node_concurrent_recoveries
cluster.routing.allocation.cluster_concurrent_rebalance
indices.recovery.max_bytes_per_sec
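For example, via the transient cluster settings API; the values here are illustrative, not our production numbers:

# Throttle recovery and rebalancing during a rolling restart.
$ curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2,
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}'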
Get confident enough to trust doing a rolling restart on a Saturday evening!
Elasticsearch tip #5 (last one)
Cluster restarts improve recovery time.
Recovery: compares replica vs primary shard. If different, recreate the replica. Costly (iowait) and very time consuming.
But … difference is normal. Primary and replica have their own segment merge management: same docs, but different bytes.
After recovery: replica is exact copy of primary.
Note: only works for stale shards (no more updates). You have a lot of those when using daily Logstash indices.
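To see how far a recovery has progressed, e.g. during such a restart, the cat recovery API helps (ES 1.x and later):

# Show per-shard recovery progress (stage, bytes recovered, percentage done).
$ curl -s 'localhost:9200/_cat/recovery?v'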
Thank you for listening. Got questions?
You can contact me via [email protected], or …
Relocation in action
Tools we use
http://redis.io/
Key/value memory store, no-frills queuing, extremely fast. Used to scale Logstash horizontally.
https://github.com/emicklei/log4j-redis-appender
Sends log4j events to a Redis queue: non-blocking, batching, failover.
https://github.com/emicklei/log4j-jsonevent-layout
Formats log4j events in the Logstash event layout. Why have Logstash do lots of grokking if you can feed it Logstash-friendly JSON?
http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
Formats Apache access logging in the Logstash event layout. Again: avoid grokking.
https://github.com/bolcom/ (SOON)
Logsheep: custom multi-threaded logtailer / UDP listener, sends events to Redis.
https://github.com/BrightcoveOS/Diamond/
Great metrics collector framework with an Elasticsearch collector. I am a contributor.
https://github.com/elasticsearch/curator
Tool for automatic Elasticsearch index management (delete, close, optimize, bloom).