TRANSCRIPT
Centralized logs the ElasticSearch way
by Seznam.cz
Jan Šimák @ infra SCIF
several years ago
past issues
present
and still continues
the journey
was long
at the beginning
filebeat → logstash → elasticsearch
kibana / elastic api
next step
filebeat → logstash → elasticsearch
kibana / elastic api
+ kafka
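kafka usually sits between filebeat and logstash as a buffer; a minimal logstash pipeline sketch under that assumption (hostnames, topic and group names are illustrative, not from the talk):

  input {
    kafka {
      bootstrap_servers => "kafka1:9092,kafka2:9092"
      topics            => ["logs"]          # illustrative topic name
      group_id          => "logstash-logs"   # one consumer group per pipeline
      codec             => "json"
    }
  }
  output {
    elasticsearch {
      hosts => ["http://es-ingest1:9200"]
      index => "logs"                        # illustrative target index/alias
    }
  }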
now we are here
cross-cluster search
kibana + es api
dc X dc XX dc XXX dc XXXX
loadbalancer “labrador”
cluster breakdown
role: master
role: data+ingest, node.attr.type: indexing-nodes
role: data+ingest, node.attr.type: inactive-logs
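a sketch of how these roles map to elasticsearch.yml in the 5.x/6.x settings style these clusters run (anything beyond the slide's role names is an assumption):

  # dedicated master node
  node.master: true
  node.data: false
  node.ingest: false

  # data+ingest node on the indexing tier
  node.master: false
  node.data: true
  node.ingest: true
  node.attr.type: indexing-nodes   # or inactive-logs on the inactive tier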
cluster breakdown
role: master
5 instances
OpenStack VM
15 GB RAM
14 vCPU
cluster breakdown
role: data+ingest, node.attr.type: indexing-nodes
16 instances
bare metal
128 GB RAM
40 HT CPU
4× 2 TB SSD, w/o RAID
vendor: szn montovna
cluster breakdown
role: data+ingest, node.attr.type: inactive-nodes
48 instances
bare metal
64-128 GB RAM
24-40 HT CPU
SAS or SSD, various sizes, w/ or w/o RAID
various vendors
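the indexing/inactive split is the classic hot/warm pattern driven by node.attr.type; a hedged sketch of the allocation-filtering calls this implies (index names are illustrative, the talk doesn't show the exact requests):

  # keep the actively written index on the indexing tier
  curl -XPUT 'localhost:9200/logs-000042/_settings' \
    -H 'Content-Type: application/json' \
    -d '{ "index.routing.allocation.require.type": "indexing-nodes" }'

  # once it goes cold, drain its shards to the inactive tier
  curl -XPUT 'localhost:9200/logs-000042/_settings' \
    -H 'Content-Type: application/json' \
    -d '{ "index.routing.allocation.require.type": "inactive-logs" }'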
cluster breakdown
common jvm configuration (except heap size)
common elasticsearch.yml (except roles and attrs)
mix of versions:
5.6.x (first cluster)
6.8.x (second cluster, CCS)
no docker deployment (but we love docker)
Ansible deployment
cross-cluster search
2 instances per DC
OpenStack VM
32 GB RAM
24 vCPU
dedicated ES cluster
keeps kibana’s indices only
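a sketch of how a 6.8-era CCS cluster is wired to the per-DC data clusters (cluster aliases and seed hosts are illustrative):

  # register the remote data clusters on the CCS cluster
  curl -XPUT 'localhost:9200/_cluster/settings' \
    -H 'Content-Type: application/json' -d '
  {
    "persistent": {
      "cluster.remote.dc_x.seeds":  ["es-master-dc-x:9300"],
      "cluster.remote.dc_xx.seeds": ["es-master-dc-xx:9300"]
    }
  }'

  # one query fans out across both DCs
  curl 'localhost:9200/dc_x:logs-*,dc_xx:logs-*/_search?q=status:500'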
security, what?
all clusters are secured by PKI for:
mutual node communication
rest api communication
a single all-in-one CA covers both client and server certificates
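a minimal elasticsearch.yml sketch of such a PKI setup, assuming the stock xpack security settings that 6.8 ships for free (the talk doesn't name the security plugin; paths are illustrative):

  xpack.security.enabled: true
  # transport layer: mutual node-to-node TLS
  xpack.security.transport.ssl.enabled: true
  xpack.security.transport.ssl.verification_mode: certificate
  xpack.security.transport.ssl.key: /etc/elasticsearch/node.key
  xpack.security.transport.ssl.certificate: /etc/elasticsearch/node.crt
  xpack.security.transport.ssl.certificate_authorities: ["/etc/elasticsearch/ca.crt"]
  # rest api, trusting the same all-in-one CA
  xpack.security.http.ssl.enabled: true
  xpack.security.http.ssl.key: /etc/elasticsearch/node.key
  xpack.security.http.ssl.certificate: /etc/elasticsearch/node.crt
  xpack.security.http.ssl.certificate_authorities: ["/etc/elasticsearch/ca.crt"]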
how do we manage ES?
curator
periodic cron jobs
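a sketch of a typical curator action file driven from cron, e.g. "0 3 * * * curator --config /etc/curator/curator.yml /etc/curator/delete_old.yml" (index pattern and retention are illustrative; the talk doesn't show its actions):

  actions:
    1:
      action: delete_indices
      description: drop log indices older than 30 days
      options:
        ignore_empty_list: true
      filters:
        - filtertype: pattern
          kind: prefix
          value: logs-          # illustrative index prefix
        - filtertype: age
          source: creation_date
          direction: older
          unit: days
          unit_count: 30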
cerebro
third-party web UI for a visual overview
ad-hoc r/w operational tasks
directly through the rest api (port 9200)
ad-hoc r/w operational tasks
_cat API, the Swiss Army knife
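a few everyday _cat calls of the kind meant here (parameters are illustrative):

  # indices with doc counts, largest first
  curl 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc'

  # shard placement and sizes
  curl 'localhost:9200/_cat/shards?v&s=store:desc'

  # disk usage per node
  curl 'localhost:9200/_cat/allocation?v'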
index management
index rollover based on #documents (see the sketch after this list)
#primary shards = #indexing-nodes - 2
keep shard size small enough (~40 GB)
alerting on shard size
alerting on empty index
some indices still have date suffix -> obsolete
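a sketch of the document-count-based rollover referenced above (alias, index name and threshold are illustrative):

  # bootstrap: the first index carries the write alias
  curl -XPUT 'localhost:9200/logs-000001' \
    -H 'Content-Type: application/json' \
    -d '{ "aliases": { "logs-write": {} } }'

  # run periodically; ES swaps the alias once the condition is met
  curl -XPOST 'localhost:9200/logs-write/_rollover' \
    -H 'Content-Type: application/json' \
    -d '{ "conditions": { "max_docs": 2000000000 } }'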
note: show live demo if you dare
troubles aka F…..
too many indices/shards -> done
md freeze under heavy load -> done
small disks on indexing nodes -> done
HDD on indexing nodes -> done
100% SSD disk utilization causing 429 -> done
mixing ES versions in one cluster -> done
wrong kafka configuration for internal topic -> done
kafka consumer group rebalances -> open
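the still-open rebalance issue is commonly chased through the consumer timeout settings; a hedged sketch of the usual Kafka consumer knobs (values are illustrative, not the talk's):

  session.timeout.ms=30000       # how long a silent consumer stays in the group
  heartbeat.interval.ms=10000    # typically ~1/3 of session.timeout.ms
  max.poll.interval.ms=600000    # max gap between poll() calls before eviction
  max.poll.records=500           # smaller batches keep poll() intervals short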