scaling an elk stack at bol.com

Scaling an ELK stack
Elasticsearch NL meetup

2014.09.22, Utrecht

Who am I?

Renzo Tomà

• IT operations
• Linux engineer
• Python developer
• Likes huge streams of raw data
• Designed metrics & logsearch platform
• Married, proud father of two

And you?

ELK

ELK at bol.com

Logsearch platform.

For developers & operations.

Search & analyze log events using Kibana.

Events from many sources (e.g. syslog, accesslog, log4j, …)

Part of our infrastructure.

Why? Faster root cause analysis => quicker time-to-repair.

Real world examples

Case: release of new webshop version.
Nagios alert: jboss processing time.
Metrics: increase in active threads (and proctime).
=> Inconclusive!

Find all HTTP requests to www.bol.com which were slower than 5 seconds:

@type:apache_access AND @fields.site:"www_bol_com" AND @fields.responsetimes:[5000000 TO *]

=> Hits for 1 URL. Enough for DEV to start its RCA.
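The same search can also be built programmatically. A minimal sketch, assuming the standard `query_string` query of the Elasticsearch search API; the helper name `slow_request_query` is hypothetical, and treating `@fields.responsetimes` as microseconds is an assumption based on the 5 second threshold.

```python
import json


def slow_request_query(site, min_micros):
    """Build a search body for requests slower than min_micros (hypothetical helper)."""
    lucene = (
        '@type:apache_access AND @fields.site:"{0}" '
        'AND @fields.responsetimes:[{1} TO *]'
    ).format(site, min_micros)
    # query_string accepts the same Lucene syntax that is typed into Kibana
    return {"query": {"query_string": {"query": lucene}}}


# 5 seconds, expressed in microseconds (assumed unit of the field)
body = slow_request_query("www_bol_com", 5 * 1000 * 1000)
print(json.dumps(body, indent=2))
```

The printed body could be posted to the `_search` endpoint of the relevant indices.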

Real world examples

Case: strange performance spikes on webshop.
Looks bad, but cause unknown.

Find all errors in webshop log4j logging:

@fields.application:wsp AND @fields.level:ERROR

Compare errors before vs during spike. Spot the difference.

=> Spikes caused by timeouts on a backend service.

Metrics correlation: timeouts not cause, but symptom of full GC issue.

Initial design (mid 2013’ish)

[Diagram: initial architecture]

Sources:
- Servers, routers, firewalls …: syslog, via a central syslog server.
- Apache webservers: accesslog, tailed by the remote_syslog package.
- Java webapplications (JVM): Log4j syslog appender.

Transport: syslog protocol over UDP. Even for accesslog + log4j.

Logstash: acts as syslog server. Converts lines into events, into json docs.

Flow: log events -> Logstash -> Elasticsearch -> Kibana2

Initial attempt #fail

Single logstash instance not fast enough. Unable to keep up with events created.

High CPU load, due to intensive grokking (regex).
Network buffer overflow. UDP traffic dropped.

Result: missing events.

Initial attempt #fail

Log4j events can be multiline (e.g. stacktraces).

Events are sent per line: 100 lines = 100 syslog msgs.

Merging by Logstash.

Remember the UDP drops?

Result:
- Unparseable events (if 1st line was missing).
- Swiss cheese. Stacktrace lines were missing.
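Both failure modes can be reproduced in a few lines. This is a minimal sketch, not Logstash's multiline filter: the rule "a line starting with whitespace continues the previous event" and the sample stacktrace are assumptions for illustration.

```python
def merge_multiline(lines):
    """Merge continuation lines (leading whitespace) into the preceding event."""
    events, orphans = [], []
    for line in lines:
        if line.startswith((" ", "\t")):
            if events:
                events[-1] += "\n" + line
            else:
                orphans.append(line)  # no first line to attach to
        else:
            events.append(line)
    return events, orphans


sent = [
    "ERROR Something failed: java.lang.IllegalStateException",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
]

complete, _ = merge_multiline(sent)

# First datagram dropped: the remaining lines cannot be parsed as an event
_, orphans = merge_multiline(sent[1:])

# Middle datagram dropped: one event, but with a hole in the stacktrace
swiss_cheese, _ = merge_multiline([sent[0], sent[2]])

print(len(complete), len(orphans), swiss_cheese[0].count("\n"))
```

The merger has no way to notice the missing middle line, which is why the holes only show up when someone reads the stacktrace.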

Initial attempt #fail

Syslog RFC3164:

“The total length of the packet MUST be 1024 bytes or less.”

Rich Apache LogFormat + lots of cookies = 4kb easily.

Anything after byte 1024 got trimmed.

Result: unparseable events (grok pattern mismatch).
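The effect of the 1024 byte limit is easy to demonstrate. A minimal sketch, assuming a simplified, hypothetical access log layout (request, status, cookies, response time); it ignores the syslog header, which also counts towards the limit in practice.

```python
import re

MAX_SYSLOG_PACKET = 1024  # RFC 3164 limit quoted above

# Hypothetical, simplified pattern: the response time is the last field
PATTERN = re.compile(
    r'^"(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'"(?P<cookies>[^"]*)" (?P<micros>\d+)$'
)


def truncate(message, limit=MAX_SYSLOG_PACKET):
    """Cut a message at the byte limit, as a strict syslog relay would."""
    return message.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


line = '"GET / HTTP/1.1" 200 "{0}" 5123456'.format("c=" + "x" * 4000)

full_match = PATTERN.match(line)
cut_match = PATTERN.match(truncate(line))

print(full_match is not None, cut_match is not None)
```

Once the closing quote and the trailing fields are cut off, the pattern no longer matches and the event cannot be parsed.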

The only way is up.

Improvement proposals:

- Use queuing to make Logstash horizontally scalable.

- Drop syslog as transport (for non-syslog).

- Reduce amount of grokking. Pre-formatting at source scales better. Less complexity.
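To illustrate the last proposal: parsing a classic access log line needs a hand-written pattern, while a pre-formatted line only needs a JSON decoder. A minimal sketch; the regex is a simplified stand-in for a grok pattern and the field names are assumptions.

```python
import json
import re

# Hypothetical, simplified pattern for a classic access log line
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_with_regex(line):
    """Grok-style parsing: returns None as soon as the layout deviates."""
    match = ACCESS_RE.match(line)
    return match.groupdict() if match else None


def parse_preformatted(line):
    """Source already emits JSON: no pattern to write, maintain or debug."""
    return json.loads(line)


plain = '10.0.0.1 - - [22/Sep/2014:19:00:00 +0200] "GET / HTTP/1.1" 200 5120'
preformatted = json.dumps(
    {"client": "10.0.0.1", "request": "GET / HTTP/1.1", "status": "200"}
)

print(parse_with_regex(plain))
print(parse_preformatted(preformatted))
```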

Latest design (mid 2014’ish)

[Diagram: latest architecture]

Sources:
- Servers, routers, firewalls …: syslog, via a central syslog server and a local Logsheep.
- Apache webservers: accesslog in jsonevent format, via a local Logsheep.
- Java webapplications (JVM): Log4j jsonevent layout + Log4j redis appender.
- Lots of other sources.

Events in jsonevent format. No grokking required.

Flow: log events -> Redis (queue) -> Logstash (many instances) -> Elasticsearch -> Kibana2 + 3

Current status #win

- Logstash: up to 10 instances per env (because of logstash 1.1 version)

- ES cluster (v1.0.1): 6 data + 2 client nodes

- Each datanode has 7 datadisks (striping)

- Indexing at 2k – 4k docs added per second

- Avg. index time: 0.5ms

- Peak: 300M docs = 185GB, per day

- Searches: just a few per hour

- Shardcount: 3 per idx, 1 replica, 3000 total

- Retention: up to 60 days
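A quick back-of-the-envelope check ties these numbers together. A minimal sketch in Python 3, assuming decimal gigabytes and assuming that the total of 3000 shards counts primaries and replicas alike.

```python
docs_per_day = 300 * 10**6       # peak: 300M docs per day
bytes_per_day = 185 * 10**9      # peak: 185GB per day (decimal GB assumed)

avg_doc_bytes = bytes_per_day / docs_per_day   # average size of one doc
avg_docs_per_sec = docs_per_day / 86400        # peak day spread over 24 hours

primaries_per_index = 3
replicas_per_primary = 1
shards_per_index = primaries_per_index * (1 + replicas_per_primary)

total_shards = 3000
open_indices = total_shards // shards_per_index

print(round(avg_doc_bytes), round(avg_docs_per_sec), shards_per_index, open_indices)
```

Under those assumptions, a peak day averages roughly 3.5k docs per second, which fits the 2k – 4k indexing rate above, and 3000 shards correspond to about 500 open indices.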

Our lessons learned

Before anything else!

Start collecting metrics so you get a baseline. No blind tuning. Validate every change with facts.

Our weapons of choice:
• Graphite
• Diamond (I am a contributor to the ES collector)
• Jcollectd

Alternative: try Marvel.

Logstash tip #1

Insert Redis as queue between source and logstash instances:
- Scale Logstash horizontally
- High availability (no events get lost)

[Diagram: two Redis queues feeding three Logstash instances]
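The pattern behind this tip is a shared queue drained by competing consumers. A minimal sketch, using Python's in-process `queue.Queue` as a stand-in for a Redis list so it runs without a Redis server; the worker names and the `run_workers` helper are hypothetical.

```python
import json
import queue
import threading


def run_workers(events, worker_count):
    """Drain one shared queue with several workers; return the processed docs."""
    shared = queue.Queue()  # stand-in for a Redis list used as queue
    processed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            raw = shared.get()
            if raw is None:  # sentinel: no more events for this worker
                break
            doc = json.loads(raw)
            doc["processed_by"] = name
            with lock:
                processed.append(doc)

    threads = [
        threading.Thread(target=worker, args=("logstash-%d" % i,))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for event in events:
        shared.put(json.dumps(event))
    for _ in threads:
        shared.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return processed


docs = run_workers([{"seq": i} for i in range(100)], worker_count=3)
print(len(docs))
```

Each event is taken off the queue by exactly one worker, so adding workers adds throughput without duplicating events.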

Logstash tip #2

Tune your workers. Find your chokepoint and increase its workers to improve throughput.

[Diagram: Input -> Filter -> Output pipeline, before and after adding a second Filter worker]

$ top -H -p $(pgrep logstash)

Logstash tip #3

Grok is very powerful, but CPU intensive. Hard to write, maintain and debug.

Fix: scaling. Increase filterworkers (vertical) or add more Logstash instances (horizontal).

Better: feed Logstash with jsonevent input.

Solutions:
• Log4j: use log4j-jsonevent-layout
• Apache: define json output with LogFormat
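The same idea, formatting at the source, can be shown with Python's standard `logging` module. This is an analogue for illustration, not the log4j-jsonevent-layout itself. The `@type` and `@fields` names follow the queries shown earlier; `@timestamp`, `@message`, the `log4j` type value and the class name `JsonEventFormatter` are assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonEventFormatter(logging.Formatter):
    """Format each log record as one JSON document per line (hypothetical layout)."""

    def __init__(self, application):
        super().__init__()
        self.application = application

    def format(self, record):
        message = record.getMessage()
        if record.exc_info:
            # The stacktrace stays inside the same event: no multiline merging needed
            message += "\n" + self.formatException(record.exc_info)
        event = {
            "@timestamp": datetime.fromtimestamp(
                record.created, timezone.utc
            ).isoformat(),
            "@type": "log4j",
            "@message": message,
            "@fields": {
                "application": self.application,
                "level": record.levelname,
                "logger": record.name,
            },
        }
        return json.dumps(event)


record = logging.LogRecord(
    "webshop.checkout", logging.ERROR, "example.py", 0,
    "backend timeout after %d ms", (5000,), None,
)
line = JsonEventFormatter("wsp").format(record)
print(line)
```

Because `json.dumps` escapes newlines, a multiline stacktrace still travels as a single line and a single event.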

Logstash tip #4 (last one)

Use the Elasticsearch output with the HTTP protocol.

Avoid version lock-in!

HTTP may be slower, but newer ES means:
- Lots of new features
- Lots of bug fixes
- Lots of performance improvements

Most important: you decide what versions to use.

Logstash v1.4.2 (June ‘14) requires ES v1.1.1 (April ‘14).
Latest ES version is v1.3.2 (Aug ‘14).
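Over HTTP, indexing comes down to plain JSON sent to the REST API, which is why the client and cluster versions are only loosely coupled. A minimal sketch of a bulk API request body, assuming daily `logstash-YYYY.MM.DD` index names and the `_type` field of the ES 1.x era; the helper name `bulk_body` is hypothetical.

```python
import json


def bulk_body(events, day):
    """Build a newline-delimited bulk body: one action line plus one source line per event."""
    index_name = "logstash-" + day  # assumed daily index naming
    lines = []
    for event in events:
        action = {"index": {"_index": index_name, "_type": event["@type"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


body = bulk_body(
    [
        {"@type": "apache_access", "@message": "GET / HTTP/1.1"},
        {"@type": "log4j", "@message": "backend timeout"},
    ],
    "2014.09.22",
)
print(body)
```

The resulting text would be posted to the `_bulk` endpoint.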

Elasticsearch tip #1

Do not download a ‘great’ configuration.

Elasticsearch is very complex. Lots of moving parts. Lots of different use-cases. Lots of configuration options. The defaults cannot be optimal.

Start with defaults:
• Load it (stresstest or pre-launch traffic).
• Check your metrics.
• Find your chokepoint.
• Change a setting.
• Verify and repeat.

Elasticsearch tip #2

Increase the ‘index.refresh_interval’ setting.

Refresh: make newly added docs available for search. Default value: one second. High impact on heavy indexing systems (like ours).

Change it at runtime & check the metrics:
$ curl -s -XPUT '0:9200/_all/_settings?index.refresh_interval=5s'
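The same change can be scripted. A minimal sketch using only the standard library, assuming a node on `localhost:9200` and passing the setting as a JSON body to the update settings API; the helper name `refresh_interval_request` is hypothetical. The request is only built here, not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request


def refresh_interval_request(base_url, interval, index="_all"):
    """Build (but do not send) a PUT request that changes index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url="{0}/{1}/_settings".format(base_url, index),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = refresh_interval_request("http://localhost:9200", "5s")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would apply the change on a live cluster
```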

Elasticsearch tip #3

Use Curator to keep total shardcount constant.

Uncontrolled shard growth may trigger a sudden hockey stick effect.

Our setup:
- 6 datanodes
- 6 shards per index
- 3 primary, 3 replica

“One shard per datanode” (YMMV)
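The retention logic that keeps the shard count flat can be sketched in a few lines. This is not Curator's API, just an illustration of the idea, assuming daily `logstash-YYYY.MM.DD` indices; the helper name `expired_indices` is hypothetical.

```python
from datetime import date, timedelta


def expired_indices(index_names, today, retention_days, prefix="logstash-"):
    """Return the daily indices that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        year, month, day = (int(part) for part in name[len(prefix):].split("."))
        if date(year, month, day) < cutoff:
            expired.append(name)
    return sorted(expired)


today = date(2014, 9, 22)
names = [
    "logstash-" + (today - timedelta(days=n)).strftime("%Y.%m.%d")
    for n in range(70)
]

to_delete = expired_indices(names, today, retention_days=60)
kept = len(names) - len(to_delete)
shards_after_cleanup = kept * 6  # 6 shards per index, as in the setup above

print(len(to_delete), kept, shards_after_cleanup)
```

Run daily, this keeps the number of indices, and therefore the number of shards, constant once the retention window is full.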

Elasticsearch tip #4

Become experienced in rolling cluster restarts:
- to roll out new Elasticsearch releases
- to apply a config setting (e.g. heap, gc, ..)
- because it will solve an incident.

Control concurrency + bandwidth:
cluster.routing.allocation.node_concurrent_recoveries
cluster.routing.allocation.cluster_concurrent_rebalance
indices.recovery.max_bytes_per_sec

Get confident enough to trust doing a rolling restart on a Saturday evening!
(To get this graph.)
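The three settings can be changed at runtime through the cluster settings API. A minimal sketch that only builds the request body; the values are placeholders for illustration, not the values used at bol.com, and the helper name `recovery_throttle_settings` is hypothetical.

```python
import json


def recovery_throttle_settings(node_recoveries, cluster_rebalance, max_bytes_per_sec):
    """Build a transient cluster settings body for recovery concurrency and bandwidth."""
    return {
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": node_recoveries,
            "cluster.routing.allocation.cluster_concurrent_rebalance": cluster_rebalance,
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


# Placeholder values: tune them against your own metrics
settings = recovery_throttle_settings(4, 2, "100mb")
print(json.dumps(settings, indent=2))
```

The body would be sent with a PUT to `_cluster/settings`. Transient settings are dropped after a full cluster restart, which makes them convenient for experimenting.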

Elasticsearch tip #5 (last one)

Cluster restarts improve recovery time.

Recovery: compares replica vs primary shard. If different, recreate the replica. Costly (iowait) and very time consuming.

But … difference is normal. Primary and replica have their own segment merge management: same docs, but different bytes.

After recovery: replica is exact copy of primary.

Note: only works for stale shards (no more updates).
You have a lot of those when using daily Logstash indices.

Thank you for listening.
Got questions?

You can contact me via: [email protected], or

Relocation in action

Tools we use

http://redis.io/
Key/value memory store, no-frills queuing, extremely fast.
Used to scale logstash horizontally.

https://github.com/emicklei/log4j-redis-appender
Send log4j event to Redis queue, non-blocking, batch, failover

https://github.com/emicklei/log4j-jsonevent-layout
Format log4j events in logstash event layout.
Why have logstash do lots of grokking, if you can feed it with logstash friendly json.

http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
Format Apache access logging in logstash event layout. Again: avoid grokking.

https://github.com/bolcom/ (SOON)
Logsheep: custom multi-threaded logtailer / udp listener, sends events to redis.

https://github.com/BrightcoveOS/Diamond/
Great metrics collector framework with Elasticsearch collector. I am contributor.

https://github.com/elasticsearch/curator
Tool for automatic Elasticsearch index management (delete, close, optimize, bloom).
