Centralized logs the ElasticSearch way, by Seznam.cz, Jan Šimák @ infra SCIF

Post on 28-Jun-2020


Page 1: Centralized logs the ElasticSearch way - LinuxDays › media › centralized... · curator periodic cron jobs cerebro third party web UI for a visual overview ad-hoc r/w operational

Centralized logs the ElasticSearch way

by Seznam.cz

Jan Šimák @ infra SCIF

Page 2

several years ago

Page 3

past issues

Page 4

present

Page 5

the journey was long

and still continues

Page 6

at the beginning

filebeat logstash elasticsearch

kibana, elastic api

Page 7

next step

filebeat logstash elasticsearch

kibana, elastic api, kafka

Page 8

now we are here

cross-cluster search

kibana + es api

dc X dc XX dc XXX dc XXXX

loadbalancer “labrador”

Page 9

cluster breakdown

role: master

role: data+ingest, node.attr.type: indexing-nodes

role: data+ingest, node.attr.type: inactive-logs

Page 10

cluster breakdown

role: master

5 instances

OpenStack vm

15GB RAM

14 vCPU

Page 11

cluster breakdown

role: data+ingest, node.attr.type: indexing-nodes

16 instances

bare metal

128GB RAM

40 HT CPU

4x 2TB SSD w/o RAID

vendor: szn montovna (Seznam's in-house assembly shop)

Page 12

cluster breakdown

role: data+ingest, node.attr.type: inactive-nodes

48 instances

bare metal

64 - 128GB RAM

24 - 40 HT CPU

SAS or SSD, various sizes, w/ or w/o RAID

various vendors
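
The slides do not say how the `node.attr.type` attribute is used. One common use of such a custom node attribute is shard allocation filtering, where an index setting pins shards to nodes carrying a given attribute value. A minimal sketch, assuming that is the purpose here; the helper name is hypothetical, and the attribute value follows this slide (the overview slide spells it `inactive-logs`):

```python
import json


def allocation_settings(attr_name, attr_value):
    """Build a body for PUT /<index>/_settings that requires shards of the
    index to live on nodes whose node.attr.<attr_name> equals attr_value.
    (Hypothetical helper; the slides do not show the actual procedure.)"""
    return {f"index.routing.allocation.require.{attr_name}": attr_value}


# Example: move an index that is no longer written to onto the
# "inactive" group of data nodes.
body = allocation_settings("type", "inactive-nodes")
print(json.dumps(body))
```

Sending this body to an index's `_settings` endpoint would ask Elasticsearch to relocate that index's shards to the matching nodes.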

Page 13

cluster breakdown

common jvm configuration (except heap size)

common elasticsearch.yml (except roles and attrs)

mix of versions:

5.6.x (first cluster)

6.8.x (second cluster, CCS)

no docker deployment (but we love docker)

Ansible deployment

Page 14

cross-cluster search

2 instances per DC

OpenStack vm

32GB RAM

24 vCPU

dedicated ES cluster

keeps kibana’s indices only
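
With cross-cluster search, a single request addresses indices on remote clusters by prefixing each index pattern with the remote cluster's alias (`alias:pattern`). A minimal sketch of building such a search path; the aliases `dc1` and `dc2` and the pattern `logs-*` are hypothetical, since the slides do not name the real clusters or indices:

```python
def ccs_target(cluster_aliases, index_pattern):
    """Join 'alias:pattern' pairs into one multi-cluster index expression."""
    return ",".join(f"{alias}:{index_pattern}" for alias in cluster_aliases)


def ccs_search_path(cluster_aliases, index_pattern):
    """REST path for a search fanned out to every listed remote cluster."""
    return f"/{ccs_target(cluster_aliases, index_pattern)}/_search"


# Hypothetical aliases for two data centers.
print(ccs_search_path(["dc1", "dc2"], "logs-*"))
```

The cross-cluster search nodes then only coordinate the query, which matches the slide's note that the dedicated cluster keeps nothing but Kibana's own indices.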

Page 15

security, what?

all clusters are secured by

PKI for:

mutual node communication

rest api communication

a single CA issues both client and server certificates

Page 16

how do we manage ES?

curator

periodic cron jobs

cerebro

third party web UI for a visual overview

ad-hoc r/w operational tasks

directly through the rest api (port 9200)

ad-hoc r/w operational tasks

_cat API: the Swiss Army knife
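
As an illustration of the `_cat` API, the sketch below builds a request URL that asks for JSON output, selected columns, sizes in bytes, and a sort order. The host name and the helper are hypothetical; `format`, `bytes`, `h` and `s` are standard `_cat` query parameters:

```python
from urllib.parse import urlencode


def cat_url(base_url, endpoint, columns, sort_by=None):
    """Build a _cat URL returning JSON with the given columns.
    (Hypothetical helper for illustration.)"""
    params = {"format": "json", "bytes": "b", "h": ",".join(columns)}
    if sort_by:
        params["s"] = sort_by
    # Keep ',' and ':' readable instead of percent-encoding them.
    return f"{base_url}/_cat/{endpoint}?{urlencode(params, safe=',:')}"


# Largest indices first, with document counts and on-disk size.
print(cat_url("https://es.example:9200", "indices",
              ["index", "docs.count", "store.size"],
              sort_by="store.size:desc"))
```

On clusters secured with PKI, as described earlier, the request would also need a client certificate, which this sketch leaves out.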

Page 17

index management

index rollover based on #documents

#primary shards = #indexing-nodes - 2

keep shard size small enough ~40GB

alerting on shard size

alerting on empty index

some indices still have date suffix -> obsolete
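
The rules on this slide can be sketched as a few small functions. This is an illustration under assumptions, not the talk's actual tooling: the average document size is a made-up estimate, the lower bound of one shard is an added safeguard, and the row shapes mimic `_cat/shards` and `_cat/indices` JSON output requested with sizes in bytes.

```python
GB = 1024 ** 3


def primary_shards(indexing_nodes):
    """Slide rule: #primary shards = #indexing-nodes - 2.
    The floor of 1 is an assumption added for very small clusters."""
    return max(1, indexing_nodes - 2)


def rollover_body(shards, target_shard_bytes, avg_doc_bytes):
    """Rollover conditions with max_docs chosen so that each primary shard
    stays near target_shard_bytes (avg_doc_bytes is an assumed estimate)."""
    max_docs = shards * target_shard_bytes // avg_doc_bytes
    return {"conditions": {"max_docs": max_docs}}


def oversized_shards(shard_rows, limit_bytes):
    """(index, shard) pairs whose store size exceeds the limit."""
    return [(r["index"], r["shard"]) for r in shard_rows
            if r.get("store") is not None and int(r["store"]) > limit_bytes]


def empty_indices(index_rows):
    """Names of indices that report zero documents."""
    return [r["index"] for r in index_rows
            if int(r.get("docs.count") or 0) == 0]


shards = primary_shards(16)                    # 16 indexing nodes, per the talk
print(shards, rollover_body(shards, 40 * GB, 1024))
```

With the 16 indexing nodes mentioned earlier, the rule gives 14 primary shards; the document threshold then depends entirely on the assumed average document size.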

Page 18

note: show live demo if you dare


Page 20

troubles aka F…..

too many indices/shards -> done

md freeze under heavy load -> done

small disks on indexing nodes -> done

HDD on indexing nodes -> done

100% SSD disk utilization causing 429 -> done

mixing ES versions in one cluster -> done

wrong kafka configuration for internal topic -> done

kafka consumer group rebalances -> open
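
Elasticsearch answers with HTTP 429 when it rejects a request because its queues are full, as in the disk-utilization item above. The slides do not say how clients dealt with it; one common client-side approach is to retry with exponential backoff. A minimal sketch, where `send` is a hypothetical callable that performs one request and returns its HTTP status code:

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns something other than 429 or the
    attempts run out. The delay doubles after each rejected attempt."""
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


# Simulated server: rejects twice, then accepts.
responses = iter([429, 429, 200])
delays = []
result = send_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Backoff only smooths over short bursts; the talk marks the underlying problem as solved, so the retry loop is a complement rather than the fix.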
