TRANSCRIPT
Centralized logs the ElasticSearch way
by Seznam.cz
Jan Šimák @ infra SCIF
several years ago
past issues
present
and still continues
the journey
was long
at the beginning
filebeat → logstash → elasticsearch
kibana / elastic api
next step
filebeat → logstash → elasticsearch
kibana / elastic api
+ kafka
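kafka usually sits between filebeat and logstash as a buffer; a minimal logstash pipeline sketch under that assumption (hostnames, topic and group names are illustrative, not from the talk):

  input {
    kafka {
      bootstrap_servers => "kafka1:9092,kafka2:9092"
      topics            => ["logs"]          # illustrative topic name
      group_id          => "logstash-logs"   # one consumer group per pipeline
      codec             => "json"
    }
  }
  output {
    elasticsearch {
      hosts => ["http://es-ingest1:9200"]
      index => "logs"                        # illustrative target index/alias
    }
  }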
now we are here
cross-cluster search
kibana + es api
dc X dc XX dc XXX dc XXXX
loadbalancer “labrador”
cluster breakdown
role: master
role: data+ingest, node.attr.type: indexing-nodes
role: data+ingest, node.attr.type: inactive-logs
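a sketch of how these roles map to elasticsearch.yml in the 5.x/6.x settings style these clusters run (anything beyond the slide's role names is an assumption):

  # dedicated master node
  node.master: true
  node.data: false
  node.ingest: false

  # data+ingest node on the indexing tier
  node.master: false
  node.data: true
  node.ingest: true
  node.attr.type: indexing-nodes   # or inactive-logs on the inactive tier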
cluster breakdown
role: master
5 instances
OpenStack VM
15 GB RAM
14 vCPU
cluster breakdown
role: data+ingest, node.attr.type: indexing-nodes
16 instances
bare metal
128 GB RAM
40 HT CPU
4× 2 TB SSD, w/o RAID
vendor: szn montovna
cluster breakdown
role: data+ingest, node.attr.type: inactive-nodes
48 instances
bare metal
64-128 GB RAM
24-40 HT CPU
SAS or SSD, various sizes, w/ or w/o RAID
various vendors
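the indexing/inactive split is the classic hot/warm pattern driven by node.attr.type; a hedged sketch of the allocation-filtering calls this implies (index names are illustrative, the talk doesn't show the exact requests):

  # keep the actively written index on the indexing tier
  curl -XPUT 'localhost:9200/logs-000042/_settings' \
    -H 'Content-Type: application/json' \
    -d '{ "index.routing.allocation.require.type": "indexing-nodes" }'

  # once it goes cold, drain its shards to the inactive tier
  curl -XPUT 'localhost:9200/logs-000042/_settings' \
    -H 'Content-Type: application/json' \
    -d '{ "index.routing.allocation.require.type": "inactive-logs" }'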
cluster breakdown
common jvm configuration (except heap size)
common elasticsearch.yml (except roles and attrs)
mix of versions:
5.6.x (first cluster)
6.8.x (second cluster, CCS)
no docker deployment (but we love docker)
Ansible deployment
cross-cluster search
2 instances per DC
OpenStack VM
32 GB RAM
24 vCPU
dedicated ES cluster
keeps kibana’s indices only
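a sketch of how a 6.8-era CCS cluster is wired to the per-DC data clusters (cluster aliases and seed hosts are illustrative):

  # register the remote data clusters on the CCS cluster
  curl -XPUT 'localhost:9200/_cluster/settings' \
    -H 'Content-Type: application/json' -d '
  {
    "persistent": {
      "cluster.remote.dc_x.seeds":  ["es-master-dc-x:9300"],
      "cluster.remote.dc_xx.seeds": ["es-master-dc-xx:9300"]
    }
  }'

  # one query fans out across both DCs
  curl 'localhost:9200/dc_x:logs-*,dc_xx:logs-*/_search?q=status:500'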
security, what?
all clusters are secured by PKI for:
mutual node communication
rest api communication
a single all-in-one CA covers both client and server certificates
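a minimal elasticsearch.yml sketch of such a PKI setup, assuming the stock xpack security settings that 6.8 ships for free (the talk doesn't name the security plugin; paths are illustrative):

  xpack.security.enabled: true
  # transport layer: mutual node-to-node TLS
  xpack.security.transport.ssl.enabled: true
  xpack.security.transport.ssl.verification_mode: certificate
  xpack.security.transport.ssl.key: /etc/elasticsearch/node.key
  xpack.security.transport.ssl.certificate: /etc/elasticsearch/node.crt
  xpack.security.transport.ssl.certificate_authorities: ["/etc/elasticsearch/ca.crt"]
  # rest api, trusting the same all-in-one CA
  xpack.security.http.ssl.enabled: true
  xpack.security.http.ssl.key: /etc/elasticsearch/node.key
  xpack.security.http.ssl.certificate: /etc/elasticsearch/node.crt
  xpack.security.http.ssl.certificate_authorities: ["/etc/elasticsearch/ca.crt"]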
how do we manage ES?
curator
periodic cron jobs
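a sketch of a typical curator action file driven from cron, e.g. "0 3 * * * curator --config /etc/curator/curator.yml /etc/curator/delete_old.yml" (index pattern and retention are illustrative; the talk doesn't show its actions):

  actions:
    1:
      action: delete_indices
      description: drop log indices older than 30 days
      options:
        ignore_empty_list: true
      filters:
        - filtertype: pattern
          kind: prefix
          value: logs-          # illustrative index prefix
        - filtertype: age
          source: creation_date
          direction: older
          unit: days
          unit_count: 30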
cerebro
third-party web UI for a visual overview
ad-hoc r/w operational tasks
directly through the rest api (port 9200)
ad-hoc r/w operational tasks
_cat API, the Swiss Army knife
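a few everyday _cat calls of the kind meant here (parameters are illustrative):

  # indices with doc counts, largest first
  curl 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc'

  # shard placement and sizes
  curl 'localhost:9200/_cat/shards?v&s=store:desc'

  # disk usage per node
  curl 'localhost:9200/_cat/allocation?v'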
index management
index rollover based on #documents (see the sketch after this list)
#primary shards = #indexing-nodes - 2
keep shard size small enough (~40 GB)
alerting on shard size
alerting on empty index
some indices still have date suffix -> obsolete
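a sketch of the document-count-based rollover referenced above (alias, index name and threshold are illustrative):

  # bootstrap: the first index carries the write alias
  curl -XPUT 'localhost:9200/logs-000001' \
    -H 'Content-Type: application/json' \
    -d '{ "aliases": { "logs-write": {} } }'

  # run periodically; ES swaps the alias once the condition is met
  curl -XPOST 'localhost:9200/logs-write/_rollover' \
    -H 'Content-Type: application/json' \
    -d '{ "conditions": { "max_docs": 2000000000 } }'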
note: show live demo if you dare
troubles aka F…..
too many indices/shards -> done
md freeze under heavy load -> done
small disks on indexing nodes -> done
HDD on indexing nodes -> done
100% SSD disk utilization causing 429 -> done
mixing ES versions in one cluster -> done
wrong kafka configuration for internal topic -> done
kafka consumer group rebalances -> open
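the still-open rebalance issue is commonly chased through the consumer timeout settings; a hedged sketch of the usual Kafka consumer knobs (values are illustrative, not the talk's):

  session.timeout.ms=30000       # how long a silent consumer stays in the group
  heartbeat.interval.ms=10000    # typically ~1/3 of session.timeout.ms
  max.poll.interval.ms=600000    # max gap between poll() calls before eviction
  max.poll.records=500           # smaller batches keep poll() intervals short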